Tampere University of Technology

TUTCRIS Research Portal

Design and Implementation of 2D IDCT/IDST-Specific Accelerator on Heterogeneous Multicore Architecture

Research output: Chapter in Book/Report/Conference proceedingConference contributionScientificpeer-review


Original languageEnglish
Title of host publication 2018 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC)
ISBN (Electronic)978-1-5386-7656-1
Publication statusPublished - 13 Dec 2018
Publication typeA4 Article in a conference publication
EventIEEE Nordic Circuits and Systems Conference - Tallinn, Estonia
Duration: 30 Oct 201831 Oct 2018


ConferenceIEEE Nordic Circuits and Systems Conference
Abbreviated titleNorCAS 2018


The paper talks about how to implement different sizes of Inverse Discrete Cosine Transform (IDCT) as well as Inverse Discrete Sine transform (IDST) that are dedicated on High Efficiency Video Coding (HEVC) standard through employing Coarse-Grained Reconfigurable Arrays (CGRAs) as a template-based accelerators on Heterogeneous Accelerator-Rich Platform (HARP). The proposal designs multi-purpose IDCT/IDST-based accelerators in a manner that the final architecture is made up of 4-point IDST and 4/8-point IDCT. The designing of the accelerators is done by creating template-based CGRA devices at various dimensions after which they are arranged in a sequential manner over a structure that is Network-on-Chip (NoC) based accompanied by a number of RISC cores. The research records the IDCT/IDST-specific accelerator performance, the entire platform's performance, as well as the traffic of the NoC with regard to the total number of clock cycles made as well as several other high-level metrics of performance. The experiments that were conducted found that 4-point IDCT and 4-point IDST can be totally implemented in 56 clock cycles. For 8-point IDCT, the clock cycles required are 64. The total power dissipation, as well as energy consumption centred on information on routing and post placement, are all equal to 4.03 mW and 1.76 μJ for 4-point IDCT/IDST and 3.06 μJ for 8-point IDCT, respectively. Furthermore, the use of 256 instantiated Processing Elements (PEs) at an operating frequency of 200.0 MHz results to a 51.2 Giga Operations Per Second (GOPS) performance and 0.012 GOPS/mW architectural constant for the HARP model on the 28 nm Altera Stratix-V chip. The architecture under the proposal is capable of fully sustaining a format of Full HD 1080P at 30 fps on FPGA.

ASJC Scopus subject areas

Publication forum classification