Tampere University of Technology

TUTCRIS Research Portal

Exposed Datapath optimizations for Loop Scheduling

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Standard

Exposed Datapath optimizations for Loop Scheduling. / Kultala, Heikki; Jääskeläinen, Pekka; IJzerman, Johannes; Lehtonen, Lasse; Viitanen, Timo; Mäkitalo, Markku; Takala, Jarmo.

Embedded Computer Systems: Architectures, Modeling, and Simulation 2017 IEEE International Conference (IC-SAMOS 2017). IEEE, 2018. p. 171-178.

Harvard

Kultala, H, Jääskeläinen, P, IJzerman, J, Lehtonen, L, Viitanen, T, Mäkitalo, M & Takala, J 2018, Exposed Datapath optimizations for Loop Scheduling. in Embedded Computer Systems: Architectures, Modeling, and Simulation 2017 IEEE International Conference (IC-SAMOS 2017). IEEE, pp. 171-178, International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation. https://doi.org/10.1109/SAMOS.2017.8344625

APA

Kultala, H., Jääskeläinen, P., IJzerman, J., Lehtonen, L., Viitanen, T., Mäkitalo, M., & Takala, J. (2018). Exposed Datapath optimizations for Loop Scheduling. In Embedded Computer Systems: Architectures, Modeling, and Simulation 2017 IEEE International Conference (IC-SAMOS 2017) (pp. 171-178). IEEE. https://doi.org/10.1109/SAMOS.2017.8344625

Vancouver

Kultala H, Jääskeläinen P, IJzerman J, Lehtonen L, Viitanen T, Mäkitalo M et al. Exposed Datapath optimizations for Loop Scheduling. In Embedded Computer Systems: Architectures, Modeling, and Simulation 2017 IEEE International Conference (IC-SAMOS 2017). IEEE. 2018. p. 171-178 https://doi.org/10.1109/SAMOS.2017.8344625

Author

Kultala, Heikki ; Jääskeläinen, Pekka ; IJzerman, Johannes ; Lehtonen, Lasse ; Viitanen, Timo ; Mäkitalo, Markku ; Takala, Jarmo. / Exposed Datapath optimizations for Loop Scheduling. Embedded Computer Systems: Architectures, Modeling, and Simulation 2017 IEEE International Conference (IC-SAMOS 2017). IEEE, 2018. pp. 171-178

BibTeX

@inproceedings{344bcf5c2def43d1b2819e92311f2302,
title = "Exposed Datapath optimizations for Loop Scheduling",
abstract = "Transport Triggered Architecture (TTA) processors allow unique low-level compiler optimizations such as software bypassing and operand sharing. Previously, these optimizations have mostly been performed inside single basic blocks, leaving much of their potential unused. In this work, software bypassing and operand sharing are integrated with loop scheduling, allowing optimizations across loop iteration boundaries. This further reduces register file accesses and immediate value transfers in tight loops considerably, in some cases even eliminating all register file accesses from the loop body. In the 12 small loops benchmarked, compared to traditional VLIW-style processors, on average 63{\%} of register file reads and 77{\%} of register file writes could be eliminated. Compared to a compiler which performs these optimizations only inside a basic block, on average 58{\%} of register file reads and 28{\%} of register file writes could be eliminated. The additional register access reductions allow both direct energy savings from fewer register accesses and indirect energy savings by allowing the use of simpler register files with fewer read and write ports and a simpler interconnect network with fewer transport buses.",
author = "Heikki Kultala and Pekka J{\"a}{\"a}skel{\"a}inen and Johannes IJzerman and Lasse Lehtonen and Timo Viitanen and Markku M{\"a}kitalo and Jarmo Takala",
year = "2018",
doi = "10.1109/SAMOS.2017.8344625",
language = "English",
pages = "171--178",
booktitle = "Embedded Computer Systems: Architectures, Modeling, and Simulation 2017 IEEE International Conference (IC-SAMOS 2017)",
publisher = "IEEE",

}

RIS (suitable for import to EndNote)

TY - GEN

T1 - Exposed Datapath optimizations for Loop Scheduling

AU - Kultala, Heikki

AU - Jääskeläinen, Pekka

AU - IJzerman, Johannes

AU - Lehtonen, Lasse

AU - Viitanen, Timo

AU - Mäkitalo, Markku

AU - Takala, Jarmo

PY - 2018

Y1 - 2018

N2 - Transport Triggered Architecture (TTA) processors allow unique low-level compiler optimizations such as software bypassing and operand sharing. Previously, these optimizations have mostly been performed inside single basic blocks, leaving much of their potential unused. In this work, software bypassing and operand sharing are integrated with loop scheduling, allowing optimizations across loop iteration boundaries. This further reduces register file accesses and immediate value transfers in tight loops considerably, in some cases even eliminating all register file accesses from the loop body. In the 12 small loops benchmarked, compared to traditional VLIW-style processors, on average 63% of register file reads and 77% of register file writes could be eliminated. Compared to a compiler which performs these optimizations only inside a basic block, on average 58% of register file reads and 28% of register file writes could be eliminated. The additional register access reductions allow both direct energy savings from fewer register accesses and indirect energy savings by allowing the use of simpler register files with fewer read and write ports and a simpler interconnect network with fewer transport buses.

AB - Transport Triggered Architecture (TTA) processors allow unique low-level compiler optimizations such as software bypassing and operand sharing. Previously, these optimizations have mostly been performed inside single basic blocks, leaving much of their potential unused. In this work, software bypassing and operand sharing are integrated with loop scheduling, allowing optimizations across loop iteration boundaries. This further reduces register file accesses and immediate value transfers in tight loops considerably, in some cases even eliminating all register file accesses from the loop body. In the 12 small loops benchmarked, compared to traditional VLIW-style processors, on average 63% of register file reads and 77% of register file writes could be eliminated. Compared to a compiler which performs these optimizations only inside a basic block, on average 58% of register file reads and 28% of register file writes could be eliminated. The additional register access reductions allow both direct energy savings from fewer register accesses and indirect energy savings by allowing the use of simpler register files with fewer read and write ports and a simpler interconnect network with fewer transport buses.

U2 - 10.1109/SAMOS.2017.8344625

DO - 10.1109/SAMOS.2017.8344625

M3 - Conference contribution

SP - 171

EP - 178

BT - Embedded Computer Systems: Architectures, Modeling, and Simulation 2017 IEEE International Conference (IC-SAMOS 2017)

PB - IEEE

ER -