Tampere University of Technology

TUTCRIS Research Portal

Parallel memory architectures for video coding

Research output: Book/Report, Doctoral thesis, Collection of Articles


Original language: English
Place of Publication: Tampere
Publisher: Tampere University of Technology
Number of pages: 86
ISBN (Electronic): 952-15-1471-X
ISBN (Print): 952-15-1260-1
Publication status: Published - 26 Nov 2004
Publication type: G5 Doctoral dissertation (article)

Publication series

Name: Tampere University of Technology. Publication
Publisher: Tampere University of Technology
ISSN (Print): 1459-2045


Most current processor architectures have word-addressable internal memories and wide data paths, which are used efficiently only when data is aligned to word boundaries. In video coding, however, operands are typically 8 to 16 bits wide, so such an architecture is poorly utilized. Common solutions are to modify the data path so that multiple subwords can be processed in parallel and to provide dedicated instructions for data alignment. Internal parallel memory architectures with versatile memory access properties, by contrast, have not been widely used. This thesis provides new insight into the design of internal on-chip data memory architectures for standards-based video compression. The results can be employed in both programmable and hardware-oriented solutions. This work shows that internal parallel data memory can be a viable design choice for subword-parallel and SIMD-parallel processors running video coding applications. Based on an analysis of the key functions of a video codec, a conventional word-addressable architecture needs an average cycle count higher by a factor of 1.44-1.98 and an average instruction count higher by a factor of 1.22-1.62 than the proposed parallel memory. With a modulo-addressable parallel memory, the external memory bandwidth can be decreased by a factor of about 1.6 while preserving efficient memory access performance in block matching operations. The memory access benefits of the parallel memory are application specific and need to be weighed against the complexity of the design task. Compared with a conventional word-addressable memory, the parallel memories studied required larger silicon area (a factor of 1.14-1.93), had higher power consumption per memory access (1.30-2.77), and had longer total memory access delay (1.16-2.36). The results improve the understanding of design trade-offs related to video codecs. Furthermore, they provide implementors with data on the gate counts, area, power consumption, cycle times, and other performance figures of parallel memory solutions.
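The alignment problem described in the abstract can be illustrated with a minimal sketch. It is not the memory architecture proposed in the thesis. It assumes 32-bit words holding four 8-bit pixels and a hypothetical parallel memory of four byte-wide modules with plain low-order interleaving (byte address modulo the module count). The function names are illustrative only.

```python
WORD_BYTES = 4  # assumption: 32-bit words, four 8-bit pixels per word


def word_accesses(start, count=WORD_BYTES, word_bytes=WORD_BYTES):
    """Count the aligned words touched when reading `count` bytes at byte address `start`."""
    first_word = start // word_bytes
    last_word = (start + count - 1) // word_bytes
    return last_word - first_word + 1


def parallel_access(start, count=WORD_BYTES, modules=WORD_BYTES):
    """Map each requested byte to a (module, row) pair under low-order interleaving."""
    return [((start + i) % modules, (start + i) // modules) for i in range(count)]


def is_conflict_free(mapping):
    """True if every byte comes from a different module, so one parallel access suffices."""
    used_modules = [module for module, _ in mapping]
    return len(set(used_modules)) == len(used_modules)


if __name__ == "__main__":
    for start in range(8):
        print(start, word_accesses(start), is_conflict_free(parallel_access(start)))
```

In this sketch an unaligned four-pixel read touches two words in the word-addressable memory and would need an extra merge step, whereas the interleaved modules serve any start address in one conflict-free access. The area, power, and delay costs reported in the abstract are not modelled here.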
