Tampere University of Technology

TUTCRIS Research Portal

Using OpenCL to Rapidly Prototype FPGA Designs

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Scientific; peer-review

Details

Original language: English
Title of host publication: 2016 IEEE Nordic Circuits and Systems Conference (NORCAS)
Publisher: IEEE
ISBN (Electronic): 978-1-5090-1095-0
DOIs
Publication status: Published - 1 Nov 2016
Publication type: A4 Article in a conference publication
Event: Nordic circuits and systems conference
Duration: 1 Jan 2000 → …

Abstract

Field Programmable Gate Arrays (FPGAs) have gained popularity because their reconfigurability can speed up development and verification at relatively low cost. However, the deep understanding of hardware logic programming that they require has discouraged many software engineers. An interface between host devices and FPGAs that allows FPGAs to be designed and programmed using a software programming standard, while encapsulating hardware details, is highly desirable. In this paper, we evaluate the use of the Open Computing Language (OpenCL) to rapidly design FPGAs, considering both hardware logic utilization efficiency and computing performance. On a heterogeneous computer system consisting of ARM processors and an Altera FPGA, we execute an OpenCL host program on the ARM processors and an OpenCL kernel on the FPGA to compute a parametrizable two-dimensional Mandelbrot fractal. We explore three design aspects to improve the FPGA computation performance: adjusting the OpenCL work-group size, coalescing memory accesses, and replicating compute units. After optimizing the core algorithm, we reduced the logic utilization and the number of Digital Signal Processing (DSP) blocks required for a single compute unit and increased the number of replicated compute units from four to six, delivering a 1.5X increase in the parallel computation capacity of the FPGA and improving the computing speed by 1.5X and the memory bandwidth by 1.7X.
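
For readers unfamiliar with the workload, the following is a minimal Python sketch of a parametrizable two-dimensional Mandelbrot computation. It is not the paper's OpenCL kernel: the function names, the default coordinate window, and the iteration cap are assumptions made for illustration. Each (gx, gy) pair plays the role of one OpenCL work-item, and the row-major index `gy * width + gx` is the kind of layout in which neighbouring work-items touch neighbouring addresses.

```python
def mandelbrot_point(cx, cy, max_iter):
    """Escape-time count for c = cx + i*cy, iterating z <- z*z + c from z = 0."""
    zx = zy = 0.0
    for i in range(max_iter):
        # |z|^2 > 4 means the orbit has left the radius-2 disc and will diverge.
        if zx * zx + zy * zy > 4.0:
            return i
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
    return max_iter


def mandelbrot_grid(width, height, x_min=-2.0, x_max=1.0,
                    y_min=-1.5, y_max=1.5, max_iter=64):
    """Fill a flat row-major buffer; window and max_iter are illustrative defaults."""
    out = [0] * (width * height)
    dx = (x_max - x_min) / width
    dy = (y_max - y_min) / height
    for gy in range(height):          # would be get_global_id(1) in a kernel
        for gx in range(width):       # would be get_global_id(0) in a kernel
            # Adjacent gx values write adjacent elements of the buffer.
            out[gy * width + gx] = mandelbrot_point(
                x_min + gx * dx, y_min + gy * dy, max_iter)
    return out
```

Calling `mandelbrot_grid(8, 4)` returns a list of 32 iteration counts, one per pixel. In an OpenCL version the two loops would disappear and each work-item would compute a single element.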
