Tampere University of Technology

TUTCRIS Research Portal

A Hybrid Task Graph Scheduler for High Performance Image Processing Workflows

Research output: Contribution to journal > Article > Scientific > peer-review


Original language: English
Pages (from-to): 457–467
Number of pages: 11
Journal: Journal of Signal Processing Systems
Issue number: 3
Publication status: Published - 2017
Publication type: A1 Journal article-refereed


Designing applications for scalability is key to improving their performance in hybrid and cluster computing. Scheduling code to utilize parallelism is difficult, particularly when dealing with data dependencies, memory management, data motion, and processor occupancy. The Hybrid Task Graph Scheduler (HTGS) is an abstract execution model, framework, and API that increases programmer productivity when implementing hybrid workflows for multi-core and multi-GPU systems. HTGS manages dependencies between tasks, represents CPU and GPU memories independently, overlaps computations with disk I/O and memory transfers, keeps multiple GPUs occupied, and uses all available compute resources. Through these abstractions, data motion and memory are explicit; this makes data locality decisions more accessible. To demonstrate the HTGS application program interface (API), we present implementations of two example algorithms: (1) a matrix multiplication that shows how easily task graphs can be used; and (2) a hybrid implementation of microscopy image stitching that reduces code size by ≈ 43% compared to a manually coded hybrid workflow implementation and showcases the minimal overhead of task graphs in HTGS. Both HTGS-based implementations perform well. In image stitching, the HTGS implementation achieves performance similar to that of the manually coded hybrid workflow implementation. Matrix multiplication with HTGS achieves 1.3x and 1.8x speedups over the multi-threaded OpenBLAS library for matrices of size 16k × 16k and 32k × 32k, respectively.
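The task-graph idea behind the matrix multiplication example can be illustrated with a minimal sketch. This is not the HTGS API (HTGS is not used here at all); it is a plain Python illustration, under the assumption of square matrices whose size is a multiple of the block size, of how a blocked multiplication decomposes into independent product tasks followed by accumulation steps that depend on them. The names `matmul_block`, `add_block`, `split`, and `blocked_matmul` are hypothetical and chosen only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_block(a, b):
    """Multiply two dense blocks given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def add_block(c, d):
    """Element-wise sum of two blocks of equal shape."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(c, d)]


def split(m, bs):
    """Split a square matrix into bs x bs blocks keyed by (block_row, block_col)."""
    n = len(m) // bs
    return {(i, j): [row[j * bs:(j + 1) * bs] for row in m[i * bs:(i + 1) * bs]]
            for i in range(n) for j in range(n)}


def blocked_matmul(a, b, bs, workers=4):
    """Compute a @ b block by block (assumes len(a) is a multiple of bs)."""
    n = len(a) // bs
    a_blk, b_blk = split(a, bs), split(b, bs)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Stage 1: one independent product task per (i, j, k) triple.
        products = {(i, j, k): pool.submit(matmul_block, a_blk[i, k], b_blk[k, j])
                    for i in range(n) for j in range(n) for k in range(n)}

        # Stage 2: each output block C[i, j] depends on its n partial products.
        c_blk = {}
        for i in range(n):
            for j in range(n):
                acc = products[i, j, 0].result()
                for k in range(1, n):
                    acc = add_block(acc, products[i, j, k].result())
                c_blk[i, j] = acc

    # Reassemble the output blocks into a single list-of-rows matrix.
    c = []
    for i in range(n):
        for r in range(bs):
            row = []
            for j in range(n):
                row.extend(c_blk[i, j][r])
            c.append(row)
    return c
```

Calling `blocked_matmul(a, b, 2)` on two 4 × 4 matrices yields the same result as a direct multiplication. The sketch shows only the dependency structure; it does not model the separate CPU and GPU memories, the overlap of I/O with computation, or the multi-GPU occupancy that the abstract attributes to HTGS, and Python threads would not by themselves reproduce the reported speedups.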


  • Dataflow, Heterogeneous architectures, Hybrid workflows, Image processing, Matrix multiplication, Task graph
