Evaluation of a Heterogeneous Multicore Architecture by Design and Test of an OFDM Receiver
Research output: Contribution to journal › Article › Scientific › peer-review
|Number of pages||3187|
|Journal||IEEE Transactions on Parallel and Distributed Systems|
|Early online date||22 May 2017|
|Publication status||Published - 1 Nov 2017|
|Publication type||A1 Journal article-refereed|
This paper presents an evaluation of a Heterogeneous Multicore Architecture (HMA) by implementing Orthogonal Frequency- Division Multiplexing (OFDM) receiver blocks as designs for the test of functionality. OFDM receiver consists of computationally intensive and general-purpose processing tasks that can provide maximum coverage to test and evaluate a massively-parallel as well as a general-purpose platform like the HMA. The blocks of the receiver are primarily designed by crafting template-based Coarse-Grained Reconfigurable Array (CGRA) devices and then arranging them in a sequence over a Network-on-Chip (NoC) structure along with a few RISC cores for complete OFDM processing. The OFDM blocks such as Fast Fourier Transform (FFT) and Time Synchronization are computationally intensive and require parallel processing. The OFDM receiver also contains tasks such as frequency offset estimation which require the processing of Taylor series and CORDIC algorithms that are serial in nature. Such a combination of serial and parallel algorithms can perform a thorough exploration and evaluation of almost all the design features of an HMA. The OFDM implementation has led to scale CGRAs to different dimensions, instantiate Processing Elements (PEs) as multiple arithmetic resources and to establish almost all possible ways of PE interconnections. It further explores time-multiplexed patterns for data placement in the CGRA memories. Nevertheless, the data can also be exchanged among different nodes over NoC structure simultaneously and independently by using direct memory access devices. In this experimental work, the performance of each CGRA, the collective performance of the whole platform and the NoC traffic are recorded in terms of the number of clock cycles and several high-level performance metrics. Today’s HMAs are generally over or under resourced for the applications that they are designed for and thus not an optimal choice for the end user. Apart from the interesting comparisons to the other state-of-the-art, our experimental setup has provided important insight and guidelines that the designers can use to implement near-optimal solutions for their target applications.