TUTCRIS - Tampere University of Technology

Designing globally-asynchronous locally-synchronous on-chip communication networks

Research output

Details

Original language: English
Place of publication: Tampere
Publisher: Tampere University of Technology
Number of pages: 90
ISBN (electronic): 978-952-15-2005-1
ISBN (print): 978-952-15-1987-1
Status: Published - 3 June 2008
OKM publication type: G5 Doctoral dissertation (article-based)

Publication series

Name: Tampere University of Technology. Publication
Publisher: Tampere University of Technology
Volume: 742
ISSN (print): 1459-2045

Abstract

This thesis addresses two aspects of designing on-chip communication networks. The first is applying the Globally-Asynchronous Locally-Synchronous (GALS) communication scheme to Network-on-Chip (NoC) designs. The second is designing and realizing different types of on-chip communication structures within the GALS scheme. The work on applying the GALS scheme to on-chip networks covers the strategy for realizing the scheme in a NoC, the synchronization method in a GALS NoC, and asynchronous circuit design. In the NoC designs presented in this thesis, the GALS scheme is applied by using synchronous communication between network nodes and their attached function hosts, and asynchronous communication among the network nodes. The asynchronous circuits developed for realizing the GALS on-chip networks include an asynchronous First-In First-Out (FIFO) design, control pipeline structures, a C-element structure, and an arbiter design. Three types of on-chip networks are designed and presented: a direct network, a Code-Division Multiple-Access (CDMA) network, and a crossbar network. The direct on-chip network is a bidirectional ring network, which serves as an example of realizing the GALS scheme in the Proteo NoC architecture. The ring network realization consists of six nodes and requires an area of 177K equivalent gates when realized with a 0.18 μm Application-Specific Integrated Circuit (ASIC) standard-cell library. Although the ring network has a scalable structure, its data transfer latency can vary widely depending on the data destination and the routing process. This drawback makes it difficult for the ring network to provide a constant quality of communication service.
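The dependence of ring latency on the destination can be illustrated with a minimal sketch. It assumes shortest-direction routing and uses the hop count as a rough stand-in for latency; neither assumption comes from the abstract, and the function name `ring_hops` is hypothetical.

```python
def ring_hops(src, dst, nodes=6):
    """Hop count between two nodes of a bidirectional ring.

    Assumption (not from the thesis): a packet always travels in the
    shorter of the two directions.
    """
    clockwise = (dst - src) % nodes
    counterclockwise = (src - dst) % nodes
    return min(clockwise, counterclockwise)


# Hop counts from node 0 to every other node of a six-node ring.
hops_from_node0 = [ring_hops(0, dst) for dst in range(1, 6)]
print(hops_from_node0)  # [1, 2, 3, 2, 1]
```

Under these assumptions, the farthest destination in a six-node ring is three hops away while a neighbour is one hop away, which is one way to picture why the latency is not constant.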
Therefore, a network structure that applies the CDMA technique is developed and presented in this thesis to provide non-blocking data transfers among network nodes, so that data transfer latencies have small variances. The CDMA NoC achieves this by using orthogonal codes to build non-blocking data transfer channels among network nodes. The six-node realization of the CDMA NoC has an area of 272K equivalent gates when realized with a 0.18 μm standard-cell library and a 32-bit data path. In return for the larger area cost, the asynchronous data transfer latency in the six-node CDMA NoC is equivalent to the best-case latency in the ring network. With a 32-bit data path, the realized CDMA network can transfer a packet with a 96-bit payload between network nodes within 49 ns through a four-phase handshake protocol if there is no congestion at the destination, which is equivalent to a network throughput of 11.76 Gbit/s. A crossbar is a well-known structure that can also provide non-blocking data transfers. Therefore, a six-node crossbar network is developed in this work as a reference for evaluating the CDMA network. Compared with the six-node crossbar network, the CDMA network realization has a 39.4% larger logic gate area when the data path width is 8 bits, whereas the number of data wires in the CDMA network is 80.1% lower than in the crossbar network when there are 31 network nodes. Besides the ASIC realizations, a four-node GALS bidirectional ring network is realized on a Field-Programmable Gate Array (FPGA) device as an example of prototyping a mixed synchronous-asynchronous NoC design on a Look-Up-Table (LUT) based FPGA device. The realization consumes 41.7K LUTs on an Altera Stratix II FPGA device.
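How orthogonal codes can let several senders share one channel without blocking each other can be sketched as follows. This is a generic textbook CDMA model, not the circuit from the thesis: the choice of Walsh-Hadamard codes, the +1/-1 chip representation, and all function names are assumptions made for illustration.

```python
def walsh_codes(order):
    """Return 2**order mutually orthogonal Walsh-Hadamard codes (+1/-1 chips)."""
    codes = [[1]]
    for _ in range(order):
        # Sylvester construction: each code c yields (c, c) and (c, -c).
        codes = ([row + row for row in codes]
                 + [row + [-chip for chip in row] for row in codes])
    return codes


def spread(bit, code):
    """Encode one data bit as +code (bit 1) or -code (bit 0)."""
    symbol = 1 if bit else -1
    return [symbol * chip for chip in code]


def combine(signals):
    """Model the shared channel as the chip-wise sum of all senders."""
    return [sum(chips) for chips in zip(*signals)]


def despread(channel, code):
    """Recover one sender's bit by correlating the channel with its code."""
    correlation = sum(s * c for s, c in zip(channel, code))
    return correlation > 0


# Six senders, each with its own code out of eight (hypothetical assignment).
codes = walsh_codes(3)
sent_bits = {1: True, 2: False, 3: True, 4: True, 5: False, 6: False}
channel = combine([spread(bit, codes[node]) for node, bit in sent_bits.items()])
received = {node: despread(channel, codes[node]) for node in sent_bits}
print(received == sent_bits)  # True
```

Because distinct codes correlate to zero, each receiver sees only the contribution of the sender whose code it uses, so all six bits are transferred at the same time in this model.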
