Tampere University of Technology

TUTCRIS Research Portal

OpenCL Programmable Exposed Datapath High Performance Low-Power Image Signal Processor

Research output: Scientific - peer-review › Conference contribution

Standard

OpenCL Programmable Exposed Datapath High Performance Low-Power Image Signal Processor. / Multanen, Joonas; Kultala, Heikki; Koskela, Matias; Viitanen, Timo; Jääskeläinen, Pekka; Takala, Jarmo; Danielyan, Aram; Cruz, Cristóvão.

2016 IEEE Nordic Circuits and Systems Conference (NORCAS). IEEE, 2016.

Harvard

Multanen, J, Kultala, H, Koskela, M, Viitanen, T, Jääskeläinen, P, Takala, J, Danielyan, A & Cruz, C 2016, OpenCL Programmable Exposed Datapath High Performance Low-Power Image Signal Processor. in 2016 IEEE Nordic Circuits and Systems Conference (NORCAS). IEEE, Nordic circuits and systems conference, 1 January. DOI: 10.1109/NORCHIP.2016.7792906

Author

Multanen, Joonas; Kultala, Heikki; Koskela, Matias; Viitanen, Timo; Jääskeläinen, Pekka; Takala, Jarmo; Danielyan, Aram; Cruz, Cristóvão / OpenCL Programmable Exposed Datapath High Performance Low-Power Image Signal Processor.

2016 IEEE Nordic Circuits and Systems Conference (NORCAS). IEEE, 2016.

BibTeX

@inproceedings{b2ec7c604352444d86410848a540c2cd,
title = "OpenCL Programmable Exposed Datapath High Performance Low-Power Image Signal Processor",
abstract = "Sophisticated computational imaging algorithms require both high performance and good energy-efficiency when executed on mobile devices. Recent trend has been to exploit the abundant data-level parallelism found in general purpose programmable GPUs. However, for low-power mobile use cases, generic GPUs consume excessive amounts of power. This paper proposes a programmable computational imaging processor with 16-bit half-precision SIMD floating point vector processing capabilities combined with power efficiency of an exposed datapath. In comparison to traditional VLIW architectures with similar computational resources, the exposed datapath reduces the register file traffic and complexity. These and the specific optimizations enabled by the explicit programming model enable extremely good power-performance. When synthesized on a 28nm ASIC technology, the accelerator consumes 71mW of power while running a state-of-the-art denoising algorithm, and occupies only 0.2mm² of chip area. For the algorithm, energy usage per frame is 7mJ, which is 10x less than the best found GPU-based implementation.",
author = "Joonas Multanen and Heikki Kultala and Matias Koskela and Timo Viitanen and Pekka Jääskeläinen and Jarmo Takala and Aram Danielyan and Cristóvão Cruz",
year = "2016",
doi = "10.1109/NORCHIP.2016.7792906",
booktitle = "2016 IEEE Nordic Circuits and Systems Conference (NORCAS)",
publisher = "IEEE",
}

RIS (suitable for import to EndNote)

TY  - CHAP
T1  - OpenCL Programmable Exposed Datapath High Performance Low-Power Image Signal Processor
AU  - Multanen, Joonas
AU  - Kultala, Heikki
AU  - Koskela, Matias
AU  - Viitanen, Timo
AU  - Jääskeläinen, Pekka
AU  - Takala, Jarmo
AU  - Danielyan, Aram
AU  - Cruz, Cristóvão
PY  - 2016
Y1  - 2016
AB  - Sophisticated computational imaging algorithms require both high performance and good energy-efficiency when executed on mobile devices. Recent trend has been to exploit the abundant data-level parallelism found in general purpose programmable GPUs. However, for low-power mobile use cases, generic GPUs consume excessive amounts of power. This paper proposes a programmable computational imaging processor with 16-bit half-precision SIMD floating point vector processing capabilities combined with power efficiency of an exposed datapath. In comparison to traditional VLIW architectures with similar computational resources, the exposed datapath reduces the register file traffic and complexity. These and the specific optimizations enabled by the explicit programming model enable extremely good power-performance. When synthesized on a 28nm ASIC technology, the accelerator consumes 71mW of power while running a state-of-the-art denoising algorithm, and occupies only 0.2mm² of chip area. For the algorithm, energy usage per frame is 7mJ, which is 10x less than the best found GPU-based implementation.
U2  - 10.1109/NORCHIP.2016.7792906
DO  - 10.1109/NORCHIP.2016.7792906
M3  - Conference contribution
BT  - 2016 IEEE Nordic Circuits and Systems Conference (NORCAS)
PB  - IEEE
ER  -