Hardware Deceleration of Kvazaar HEVC Encoder
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
|Title of host publication||Embedded Computer Systems|
|Subtitle of host publication||Architectures, Modeling, and Simulation - 19th International Conference, SAMOS 2019, Proceedings|
|Number of pages||14|
|Publication status||Published - 4 Oct 2019|
|Publication type||A4 Article in a conference publication|
|Event||International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation - Samos, Greece|
Duration: 7 Jul 2019 → 11 Jul 2019
|Name||Lecture Notes in Computer Science|
|Conference||International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation|
|Period||7/07/19 → 11/07/19|
of the prior Advanced Video Coding (AVC) standard but tackling its huge complexity calls for efficient HEVC codec implementations. The recent advances in Graphics Processing Units (GPUs) have made programmable general-purpose GPUs (GPGPUs) a popular option for accelerating various video coding tools. Massively parallel GPU architectures are particularly well suited for the hardware-oriented full search (FS) algorithm in HEVC integer motion estimation (IME). This paper analyzes the feasibility of a GPU-accelerated FS implementation in the practical Kvazaar open-source HEVC encoder. According to our evaluations, implementing FS on an AMD Radeon RX 480 GPU makes Kvazaar 12.5 times as fast as the respective anchor implemented entirely on an Intel 8-core i7 processor. However, the obtained speed gain is lost when fast IME algorithms are put into use in the anchor. For example, executing the anchor with the hexagon-based search (HEXBS) algorithm is almost two times as fast as our GPU-accelerated proposal, and the benefit of GPU offloading is reduced to a slight coding gain of 1.2%. Our results show that accelerating IME on a GPU speeds up non-practical encoders due to their enormous inherent complexity, but the price paid with practical encoders tends to be too high. Conditional processing schemes of fast IME algorithms can be efficiently executed on processors without any substantial coding loss over that of FS. Nevertheless, we still believe there might be room for exploiting GPUs in IME acceleration, but GPU-parallelized fast algorithms are needed to get value for the additional implementation cost and power budget.
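
The contrast between FS and HEXBS drawn in the abstract can be illustrated with a minimal Python sketch. It is not Kvazaar's code or the GPU kernel evaluated in the paper. The function names (`sad`, `full_search`, `hexagon_search`), the plain SAD cost without a motion vector bit cost, and the frame and block sizes are all assumptions made for illustration. FS evaluates every candidate in the search window independently of the others, which is what makes it easy to parallelize. HEXBS chooses each step from the result of the previous one, so it visits far fewer candidates but runs as a chain of conditional decisions.

```python
import numpy as np

# Offsets are (dy, dx). Large hexagon: (0, +-2) and (+-2, +-1); small pattern: 4 neighbours.
LARGE_HEX = [(0, -2), (0, 2), (-2, -1), (-2, 1), (2, -1), (2, 1)]
SMALL = [(0, -1), (0, 1), (-1, 0), (1, 0)]


def sad(cur_block, ref, y, x):
    """Sum of absolute differences between cur_block and the ref block at (y, x)."""
    h, w = cur_block.shape
    cand = ref[y:y + h, x:x + w]
    return int(np.abs(cur_block.astype(np.int32) - cand.astype(np.int32)).sum())


def in_frame(ref, y, x, h, w):
    """True if an h-by-w block at (y, x) lies fully inside the reference frame."""
    return 0 <= y and 0 <= x and y + h <= ref.shape[0] and x + w <= ref.shape[1]


def full_search(cur_block, ref, y0, x0, search_range):
    """Evaluate every integer displacement in the window. Returns (mv, cost, evals)."""
    h, w = cur_block.shape
    best_mv, best_cost, evals = (0, 0), None, 0
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if not in_frame(ref, y0 + dy, x0 + dx, h, w):
                continue
            cost = sad(cur_block, ref, y0 + dy, x0 + dx)
            evals += 1
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost, evals


def hexagon_search(cur_block, ref, y0, x0, search_range):
    """Hexagon-based search; assumes the zero displacement lies inside the frame."""
    h, w = cur_block.shape
    cache = {}  # displacement -> SAD, so no candidate is evaluated twice

    def cost_at(dy, dx):
        if max(abs(dy), abs(dx)) > search_range:
            return None
        if not in_frame(ref, y0 + dy, x0 + dx, h, w):
            return None
        if (dy, dx) not in cache:
            cache[(dy, dx)] = sad(cur_block, ref, y0 + dy, x0 + dx)
        return cache[(dy, dx)]

    best_mv, best_cost = (0, 0), cost_at(0, 0)
    while True:
        # Re-centre the large hexagon on the best point until the centre wins.
        center = best_mv
        for oy, ox in LARGE_HEX:
            c = cost_at(center[0] + oy, center[1] + ox)
            if c is not None and c < best_cost:
                best_mv, best_cost = (center[0] + oy, center[1] + ox), c
        if best_mv == center:
            break
    center = best_mv
    for oy, ox in SMALL:  # final refinement around the converged centre
        c = cost_at(center[0] + oy, center[1] + ox)
        if c is not None and c < best_cost:
            best_mv, best_cost = (center[0] + oy, center[1] + ox), c
    return best_mv, best_cost, len(cache)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    cur = ref[27:35, 22:30].copy()  # 8x8 block displaced by (3, -2) from (24, 24)
    print("FS   :", full_search(cur, ref, 24, 24, 8))
    print("HEXBS:", hexagon_search(cur, ref, 24, 24, 8))
```

In this sketch FS always returns the minimum-SAD displacement inside the window, while the hexagon search can stop in a local minimum, particularly on noise-like content such as the random frame used in the demo. The number of SAD evaluations returned by each function gives a rough feel for the complexity gap that the abstract describes.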