Tampere University of Technology

TUTCRIS Research Portal

Application-Specific Parallel Structures for Discrete Cosine Transform and Variable Length Decoding

Research output: Book/Report, Doctoral thesis, Collection of Articles


Original language: English
Place of Publication: Tampere
Publisher: Tampere University of Technology
Number of pages: 127
ISBN (Electronic): 952-15-1405-1
ISBN (Print): 952-15-1196-6
Publication status: Published - 18 Jun 2004
Publication type: G5 Doctoral dissertation (article)

Publication series

Name: Tampere University of Technology. Publication
Publisher: Tampere University of Technology
ISSN (Print): 1459-2045


This thesis considers the design of application-specific parallel structures for digital signal processing. Because the subject is broad, the discussion is restricted to two case studies: the discrete cosine transform and variable length decoding. New area-efficient parallel structures, which process data in sequential form at the data rate, are developed for the discrete cosine transform. The development of the structures begins with the derivation of novel regular fast algorithms. The algorithms lend themselves to vertical mapping, resulting in modular cascaded structures that can be freely pipelined because they contain no loops. To prove feasibility and estimate performance, a unified transform kernel for the discrete cosine transform and its inverse is implemented in a standard-cell CMOS technology with data path synthesis. A comparison with a state-of-the-art design shows an estimated area up to 15% smaller than that of the reference design. For variable length decoding, a novel multiple-symbol decoding scheme is proposed. The critical path of the resulting decoder is minimized by introducing a new multiplexed add unit. To prove feasibility and determine the limiting factors of the scheme, the decoder has been implemented in an FPGA technology. When applied to MPEG-2 standard benchmark scenes, the decoder decodes 4.8 codewords per cycle on average, resulting in a throughput of 106 million symbols per second. Although a straightforward and fair comparison of variable length decoders is extremely difficult because of differing implementation approaches, the performance of the decoder can be considered promising, with 16-100% better throughput at 2-3.6 times lower frequencies than the reference designs on the same FPGA technology.
In both case studies, the discrete cosine transform and variable length decoding, the modularity and the achievable high-speed operation provide flexibility for design reuse in current and future applications.
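
As a rough consistency check on the decoder figures quoted above, the following minimal sketch derives the clock frequency they imply. It assumes that throughput is simply the average number of codewords per cycle multiplied by the clock frequency; the abstract does not state the clock rate, so the result is an inference, not a reported value. The function name `implied_clock_hz` is hypothetical.

```python
def implied_clock_hz(symbols_per_second: float, symbols_per_cycle: float) -> float:
    """Clock frequency implied by: throughput = symbols_per_cycle * clock.

    This relation is an assumption made for illustration only.
    """
    if symbols_per_cycle <= 0:
        raise ValueError("symbols_per_cycle must be positive")
    return symbols_per_second / symbols_per_cycle


# Figures quoted in the abstract.
THROUGHPUT = 106e6    # symbols per second
AVG_CODEWORDS = 4.8   # average codewords decoded per cycle

clock_mhz = implied_clock_hz(THROUGHPUT, AVG_CODEWORDS) / 1e6
print(f"implied clock: {clock_mhz:.1f} MHz")
```

Under that assumption the quoted figures correspond to a clock of roughly 22 MHz, which fits the abstract's emphasis on reaching high throughput at comparatively low frequencies.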
