## Application-Specific Parallel Structures for Discrete Cosine Transform and Variable Length Decoding

Research output: Book/Report › Doctoral thesis › Collection of Articles

### Standard

**Application-Specific Parallel Structures for Discrete Cosine Transform and Variable Length Decoding.** / Nikara, J.

### Harvard

Nikara, J 2004, *Application-Specific Parallel Structures for Discrete Cosine Transform and Variable Length Decoding*. Tampere University of Technology. Publication, vol. 481, Tampere University of Technology, Tampere.

### APA

Nikara, J. (2004). *Application-Specific Parallel Structures for Discrete Cosine Transform and Variable Length Decoding*. (Tampere University of Technology. Publication; Vol. 481). Tampere: Tampere University of Technology.

### RIS (suitable for import to EndNote)

TY - BOOK

T1 - Application-Specific Parallel Structures for Discrete Cosine Transform and Variable Length Decoding

AU - Nikara, J.

N1 - Awarding institution:Tampere University of Technology

PY - 2004/6/18

Y1 - 2004/6/18

N2 - This thesis considers the design of application-specific parallel structures for digital signal processing. Because the subject is broad, the discussion is restricted to the discrete cosine transform and variable length decoding. New area-efficient parallel structures, which process data in sequential form at the data rate, are developed for the discrete cosine transform. The development of the structures begins with the derivation of novel regular fast algorithms. The algorithms lend themselves to vertical mapping, resulting in modular cascaded structures that can be freely pipelined because they are loop-free. To prove the feasibility and estimate the performance, the unified transform kernel for the discrete cosine transform and its inverse is implemented in a standard-cell CMOS technology with data path synthesis. Finally, comparison with a state-of-the-art design reveals an estimated area up to 15% smaller than that of the reference design. For variable length decoding, a novel multiple-symbol decoding scheme is proposed. The critical path of the resulting decoder is minimized by introducing a new multiplexed add unit. To prove the feasibility and determine the limiting factors of the scheme, the decoder has been implemented in an FPGA technology. When applied to MPEG-2 standard benchmark scenes, 4.8 codewords are decoded per cycle on average, resulting in a throughput of 106 million symbols per second. Although a straightforward and fair comparison of variable length decoders is extremely difficult because of differing implementation approaches, the performance of the decoder can be considered promising, with 16-100% better throughput at 2-3.6 times lower frequencies than the reference designs on the same FPGA technology.
In both case studies, the discrete cosine transform and variable length decoding, the modularity and achievable high-speed operation provide flexibility for design reuse in current and future applications.

M3 - Doctoral thesis

SN - 952-15-1196-6

T3 - Tampere University of Technology. Publication

BT - Application-Specific Parallel Structures for Discrete Cosine Transform and Variable Length Decoding

PB - Tampere University of Technology

CY - Tampere

ER -