Tampere University of Technology

TUTCRIS Research Portal

Design and Implementation of Multi-Purpose DCT/DST-Specific Accelerator on Heterogeneous Multicore Architecture

Research output: Chapter in Book/Report/Conference proceedingConference contributionScientificpeer-review

Details

Original languageEnglish
Title of host publication2018 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC)
PublisherIEEE
ISBN (Electronic)978-1-5386-7656-1
DOIs
Publication statusPublished - 13 Dec 2018
Publication typeA4 Article in a conference publication
EventIEEE Nordic Circuits and Systems Conference - Tallinn, Estonia
Duration: 30 Oct 201831 Oct 2018

Conference

ConferenceIEEE Nordic Circuits and Systems Conference
Abbreviated titleNorCAS 2018
CountryEstonia
CityTallinn
Period30/10/1831/10/18

Abstract

This paper presents the implementation of various sizes of Discrete Cosine transform (DCT) and Discrete Sine Transform (DST) dedicated for High Efficiency Video Coding (HEVC) standard by using template-based Coarse-Grained Reconfigurable Arrays (CGRAs) as accelerators on Heterogeneous Accelerator-Rich Platform (HARP). The proposal makes multipurpose DCT/DST specific accelerators in such a way that final architecture consists of 4/8/16/32-point DCT and 4-point DST. The accelerators are primarily designed by crafting template-based CGRA devices at different dimensions and then arranging them on a Network-on-Chip platform along with a few RISC cores. In this research work, the performance of each DCT/DST-specific accelerator, the collective performance of the whole platform and the NoC traffic are recorded in terms of the number of clock cycles and several high-level performance metrics. Conducted experiments show that 4-point DCT and 4-point DST can be implemented completely in 54 and 56 clock cycles, respectively, while for 8/16/32-point DCT, 67, 179 and 354 clock cycles are required, respectively. The achieved total power dissipation and energy consumption based on post placement and routing information are equal to 4.1 W and 10.87 μJ, respectively with 256 instantiated Processing Elements (PEs) at 200.0 MHz operating frequency. It resulted to a performance of 51.2 Giga Operations Per Second (GOPS) and 12 MOPS/mW as an architectural constant for the HARP template on 28 nm Altera Stratix-V chip. The proposed architecture is able to sustain Full HD 1080p format at 30 fps on FPGA.

ASJC Scopus subject areas

Publication forum classification