Tampere University of Technology

TUTCRIS Research Portal

Reducing the overheads of hardware acceleration through datapath integration

Research output: Scientific - peer-review - Conference contribution

Details

Original language: English
Title of host publication: Multimedia on Mobile Devices 2008, 28-29 January, 2008, San Jose, California, USA. Proceedings of SPIE-IS&T Electronic Imaging
Editors: R. Creutzburg, J. Takala
Pages: pp. 68210R-1-10
Number of pages: 10
State: Published - 2008
Publication type: A4 Article in a conference publication
Event: SPIE CONFERENCE PROCEEDINGS

Conference

Conference: SPIE CONFERENCE PROCEEDINGS
Period: 1/01/00 → …

Abstract

Hardware accelerators are used to speed up the execution of specific tasks such as video coding. Often the purpose of hardware acceleration is to allow a cheaper or, for example, more energy-efficient processor to execute the majority of the application in software. However, hardware acceleration introduces new overheads, mainly from transferring data to and from the accelerator and from signaling the completion of the accelerator computation to the processor. We find the traditional mechanisms suboptimal for fine-grain hardware acceleration, especially when energy efficiency is important. This paper explores a technique unique to Transport Triggered Architectures for interfacing with hardware accelerators. The proposed technique places hardware accelerators in the processor data path, making them visible to the programmer as regular function units. This reduces communication costs, because data can be transferred directly to the accelerator from other processor data path components, and synchronization can be done by polling a simple ready flag in the accelerator function unit. Additionally, this setup enables the compiler's instruction scheduler to schedule the hardware accelerator like any other operation and thus partially hide its latency behind other program operations. The paper presents a case study with an audio decoder application in which fine-grain and coarse-grain hardware accelerators are integrated into the processor data path as function units. The case study is used to examine several synchronization, communication, and latency-hiding techniques enabled by this kind of setup.
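The ready-flag polling and latency hiding described in the abstract can be illustrated with a toy cycle-counting model. This is only a sketch under stated assumptions, not the paper's implementation: the class `AcceleratorFU`, the function `run`, the one-cycle cost per operation, and the stand-in computation (doubling the operand) are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class AcceleratorFU:
    """Hypothetical accelerator modeled as a function unit with a ready flag."""
    latency: int          # cycles between trigger and result (assumed fixed)
    _remaining: int = 0   # cycles left until the result is available
    _pending: int = 0     # value the unit will produce

    def trigger(self, operand: int) -> None:
        # Writing the operand starts the computation (stand-in: doubling).
        self._pending = operand * 2
        self._remaining = self.latency

    def tick(self) -> None:
        # Advance the unit by one processor cycle.
        if self._remaining > 0:
            self._remaining -= 1

    @property
    def ready(self) -> bool:
        return self._remaining == 0

    def result(self) -> int:
        assert self.ready, "result read before the ready flag was set"
        return self._pending


def run(latency: int, independent_ops: int, overlap: bool) -> tuple[int, int]:
    """Return (result, total cycles) for one accelerator call plus
    `independent_ops` one-cycle operations that do not need the result."""
    fu = AcceleratorFU(latency)
    cycles = 1                       # one cycle to move the operand and trigger
    fu.trigger(21)
    if overlap:
        # Scheduled like any other operation: do useful work while it runs.
        for _ in range(independent_ops):
            fu.tick()
            cycles += 1
    while not fu.ready:              # poll the ready flag, one cycle per poll
        fu.tick()
        cycles += 1
    value = fu.result()
    cycles += 1                      # one cycle to read the result
    if not overlap:
        cycles += independent_ops    # independent work runs only afterwards
    return value, cycles
```

In this model, with a latency of 8 cycles and 5 independent operations, `run(8, 5, overlap=False)` takes 15 cycles while `run(8, 5, overlap=True)` takes 10, because five of the eight latency cycles are spent on useful work instead of polling. The figures follow only from the assumed costs above and say nothing about the results reported in the paper.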

Publication forum classification