# Design and Analysis of Energy Efficient Semi-Serial Link for On-Chip Communication

# <sup>1</sup>M. Chennakesavulu, <sup>2</sup>J. Raghu

<sup>1</sup>Associate Professor, School of Electronics & Communication Engineering, RGMCET, Nandyal-518 501, Kurnool (dist), Andhra Pradesh, India

<sup>2</sup>M.Tech (DSCE), student, RGMCET, Nandyal-518501, Kurnool (dist), Andhra Pradesh, India

Abstract: Network-on-chip (NoC) designs currently use several kinds of communication links: parallel, serial, and semi-serial. This work designs a low-energy semi-serial on-chip communication link. The protocol used by the proposed link is defined, and its key elements (serializer, deserializer, driver, receiver, and data validity decoder) are designed. The energy efficiency of the proposed semi-serial link, which consists of bit-serial links operating in parallel, comes mainly from sharing the novel serializer's control circuit among the bit-serial links. In addition, the integration of pulse signaling with wave-pipelining and the use of new low-complexity data decoding logic reduce power. The links are designed and simulated in 180 nm, 120 nm, and 65 nm CMOS technologies using the Microwind 3.1 CAD tool. When the technology scales down from 180 nm to 65 nm, power decreases from 22.216 mW to 10.464 mW, delay decreases from 0.247 ns to 0.115 ns, and the power-delay product decreases from 5.487 pW·s to 1.203 pW·s.

Keywords: Differential current-mode signaling, network-on-chip (NoC), pulse signaling, wave-pipelining.

## 1. Introduction

Technology scaling into the nanometer regime creates an opportunity to integrate hundreds of cores on a single chip. In this many-core era, network-on-chip (NoC) is emerging as a scalable and modular solution for communication between cores [1] [2]. A NoC needs to be designed efficiently to maximize system performance and minimize power consumption and area. Large network latencies degrade performance in high-performance systems, and the power consumption of NoCs implemented with current techniques is too high, by a factor of 10, to meet the expected needs of future applications [3]. Long-range links in particular therefore have to be designed in a performance- and energy-efficient manner. Thus, the main goal of this work is to design a high-throughput, low-power, small-area global on-chip interconnect that can be used as a long-range link in a NoC. This in turn increases the overall network throughput and decreases its power consumption, besides minimizing traffic congestion.

NoCs with a globally asynchronous locally synchronous (GALS) clocking style are used in most of the proposed network designs and are expected to be an attractive approach to overcoming many timing problems [4]. GALS simplifies the clock tree design and results in easily scalable clocking systems. It also allows improved energy savings, since each functional unit can easily have its own independent clock and voltage [5]. Furthermore, it enables easy implementation of a distributed power management system for the entire chip [6]. Because of these advantages, the proposed long-range link design is based on self-timed design principles. Specifically, it uses two-phase handshaking and delay-insensitive data encoding and transfer techniques. Delay-insensitive data encoding and transfer is a viable way to communicate reliably in the presence of signal propagation delay variations, which arise from crosstalk and from process, voltage, and temperature (PVT) variations.

The proposed serial link communication protocol and the design of its circuits are presented in Section 2. Transistor-level simulation results and analysis of the proposed link are presented in Section 3. Finally, the conclusions drawn from this work are discussed in Section 4.

## 2. Low-Power Serial On-Chip Communication Link

A high-throughput and low-power serial on-chip communication link that integrates pulse dual-rail data encoding, wave-pipelining, pulse signaling, and differential current-mode signaling is presented. Two-phase pulse dual-rail encoding is performed at low cost using two AND gates, one for data bit '1' and the other for data bit '0'. This encoding allows pulse signaling to be used directly together with differential signaling. Furthermore, both latency and power consumption are reduced because no data decoding logic is needed at the receiver. The ability to detect each bit through pulse signaling in the wave-pipelined communication makes the link delay-insensitive and also allows the transmission to be acknowledged per word instead of per bit, improving throughput and saving energy. The semi-serial link consists of a serializer and deserializer, a dual-rail encoder, a driver, a receiver, and a data validity decoder, as shown in Fig. 2.1. The following subsections discuss the communication protocol, the design details of the link circuits, and the signaling technique.





## Volume 2 Issue 12, December2013 www.ijsr.net

### **2.1. Communication Protocol**

It is assumed that the sender and receiver modules have a two-phase bundled-data interface. As soon as the sender module issues a request, indicating that the data to be sent are ready and stable, the data are loaded into the shift register. A Stop bit is loaded along with the data; it is used to stop the shifting in the deserializer without additional control logic such as a data bit counter. The locally generated clock starts running once parallel data loading is complete and is used for data shifting and dual-rail data encoding. It is a stoppable clock that runs only while there is data in the shift register to be transmitted and is stopped at all other times, which saves significant communication power. Data is shifted at the negative edge of the clock and encoded while the clock is high. The counter counts at the negative edge of the clock and signals the completion of data shifting when it reaches its maximum count value, which in turn stops the clock. Dual-rail, differential pulse current-mode signaling is used for data transmission through the wire. Acknowledgment is sent per word instead of per bit, thanks to the delay-insensitive wave-pipelining devised for the wire.

On the receiving side, the transmitted data is retrieved directly from the receiver without data decoding logic. The extracted data validity indicator is used as a clock for shifting the data in the deserializer, with shifting performed at both edges of this signal. The arrival of the Stop bit at the last flip-flop of the deserializer indicates that shifting is complete and the data are ready to be read out in parallel. At this point, a request is sent to the receiving router. The deserializer shift register is cleared when an acknowledgment from the data-receiving module is received.

The deserializer consists of a shift register and an interfacing circuit between the serial link and the receiving router. The data receiver output Wdout is shifted into the deserializer shift register at both edges of the data validity indicator signal, DVIout. Shifting stops when the Stop bit reaches the shift register's last flip-flop. Req2R and Ack2R form the bundled-data interface between the link and the receiving block (Figure 2.1). The RstH signal resets the deserializer's shift register.
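The Stop-bit mechanism described above can be illustrated with a small behavioral model. This is a minimal sketch in Python, not the circuit itself. It assumes that the Stop bit is a '1' sent ahead of the data bits and that the deserializer register is one flip-flop longer than the data word; neither detail is stated in the text, and the names `serialize` and `deserialize` are hypothetical.

```python
STOP_BIT = 1  # assumed polarity; the text only says that a Stop bit is loaded


def serialize(word_bits):
    """Build the serial frame: Stop bit first (assumption), then the data bits."""
    return [STOP_BIT] + list(word_bits)


def deserialize(frame_bits, n_data_bits):
    """Shift bits in until the Stop bit reaches the last flip-flop."""
    reg = [0] * (n_data_bits + 1)      # cleared register, as after RstH
    for bit in frame_bits:
        reg = [bit] + reg[:-1]         # one shift per DVIout edge
        if reg[-1] == STOP_BIT:        # Stop bit at the last flip-flop: word complete
            break
    return reg[:-1][::-1]              # data bits restored to transmit order


word = [1, 0, 1, 1]
received = deserialize(serialize(word), n_data_bits=len(word))
```

Because the register starts cleared and the Stop bit travels ahead of the data, the last flip-flop can only go high once every data bit has been shifted in, so no bit counter is needed on the receiving side.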

### 2.2. Serializer and Pulse Dual-Rail Encoder

The bit-parallel data from the sender is serialized using a novel shift register that uses the locally generated clock to shift the stored data. As shown in Fig. 2.2, the serializer consists of a shift register, a counter, a clock-generating circuit, and other interfacing elements. The shift register is shown in Fig. 2.4 and the counter in Fig. 2.3. The shift register is based on True Single-Phase-Clocked (TSPC) flip-flops [7], customized to support parallel data loading. TSPC is chosen because it can embed logic, parallel data loading in this case, with very little delay overhead. In addition, it has a much smaller setup time and propagation delay than other dynamic flip-flops, making it the most suitable choice for high-speed shift registers. The customized TSPC circuit with parallel loading is shown in Fig. 2.5. In the loading phase, transistors Mns and Mnr load bit '1' and bit '0', respectively, and transistor Mps decouples D from node L1, preventing an error when D is '0' and the data to be loaded is '1'. The tri-state weak inverter acts as a keeper for the loaded data. Two 3-input upper asymmetric C-elements (C1 and C2) in the serializer circuit, shown in Fig. 2.2, generate the local clock and keeper enable signals. The output of the NOR gates acts as the active-low reset signal for C1 and C2. The implementation of the 3-input upper asymmetric C-element is shown in Fig. 2.6.



Figure 2.2: Serializer and Pulse dual-rail encoder

The one-hot counter is built from a shift register so that its delay matches that of the data shift register in the serializer. Like the serializer's shift register, the counter shifts its one-hot code at the negative edge of the clock. Its shift register is built from TSPC flip-flops customized to support active-low reset. For an *N*-bit word counter, *N* TSPC flip-flops are connected in series, and the last flip-flop's output is inverted and fed back to the first flip-flop's input.
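The feedback structure described for the counter can be sketched behaviorally. The model below follows the wiring as literally stated: N flip-flops in series, cleared by reset, with the inverted last output fed back to the first input. Under that reading the register fills with ones from the first stage rather than circulating a single '1'. The completion condition used here (last flip-flop high) is an assumption made for illustration, as are the names `counter_states` and `clocks_until_done`.

```python
def counter_states(n):
    """Yield the register state after each negative clock edge, starting from reset."""
    state = [0] * n                        # active-low reset clears every flip-flop
    for _ in range(n):
        feedback = 1 - state[-1]           # last output inverted and fed back
        state = [feedback] + state[:-1]    # shift by one position
        yield tuple(state)


def clocks_until_done(n):
    """Count clock edges until the last flip-flop goes high (assumed 'done' flag)."""
    for count, state in enumerate(counter_states(n), start=1):
        if state[-1] == 1:
            return count
    return None


states_4 = list(counter_states(4))
```

In this model the assumed completion flag rises after exactly N clock edges, which matches the requirement that the counter signal the end of shifting for an N-bit word.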

Delay-insensitive data transfer, such as a dual-rail encoded interconnect, is a necessity in the global interconnects of a nanometer SoC [8]. Delay insensitivity makes the data transfer robust, because the sender and receiver modules can communicate reliably regardless of delays in the transceivers and wires. A delay-insensitive data encoding technique requires 2N wires to transmit N-bit data. Pulse dual-rail encoding, in which the presence of a new valid bit is represented by a pulse instead of a voltage transition or level, is formulated and used in the presented serial link. This encoding allows straightforward use of pulse signaling. Furthermore, when used together with differential signaling, it has simpler and faster encoding and decoding logic than transition-based protocols. When the clock is high, the dual-rail encoder shown in Fig. 2.2 encodes each bit into a Pulse and No Pulse (P, NP) pair depending on its value. For example, when the output of the shift register is bit '1' and the clock is high, there is a pulse at the output of AND1 and no pulse at the output of AND0.
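The two-AND-gate encoding can be written out as a truth-table-level sketch. The function names `pulse_dual_rail_encode` and `encode_stream` are hypothetical, and the model ignores pulse width and gate delay; it only shows which rail carries a pulse for each bit.

```python
def pulse_dual_rail_encode(bit, clk):
    """Return (P1, P0), the outputs of AND1 and AND0 for one bit and clock level."""
    p1 = clk & bit          # AND1 pulses when the bit is '1' and the clock is high
    p0 = clk & (1 - bit)    # AND0 pulses when the bit is '0' and the clock is high
    return p1, p0


def encode_stream(bits):
    """Encode bits as a return-to-zero sequence: clock high, then clock low, per bit."""
    samples = []
    for b in bits:
        samples.append(pulse_dual_rail_encode(b, 1))  # clock high: exactly one rail pulses
        samples.append(pulse_dual_rail_encode(b, 0))  # clock low: both rails idle
    return samples


rails = encode_stream([1, 0])
```

Exactly one rail is active during each clock-high phase and both are idle in between, so the presence of a pulse on either rail marks a new valid bit.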



Figure 2.3: Counter



Figure 2.4: Parallel in serial out shift register

### 2.3. High-Speed Differential Pulse Current-Mode Signaling

In pulse signaling, only a small portion of the wire is charged during pulse propagation. This significantly reduces the capacitance that needs to be charged and hence saves a considerable amount of power compared with level-based signaling. It has been shown that pulse signaling can save up to 50% of energy compared with level-based signaling with repeater insertion [9]. Furthermore, analytical models have demonstrated that more than 70% power saving can be achieved by combining pulse signaling with wave-pipelining, without a data throughput penalty [10]. Since the main goal of this work is to achieve both high-speed global communication and low power consumption, pulse signaling is employed together with wave-pipelining. In addition, differential current-mode signaling is used because of its high performance, better energy efficiency, and noise immunity [11][12][13]. Dual-rail encoding and differential signaling have been integrated using only two wires per link instead of four (two for dual-rail and two for differential signaling), which further reduces both the power and the area of the link. In addition to saving power, pulse current-mode signaling mitigates the effect of dispersion through its return-to-zero signaling scheme, in which sharp current pulses are used to transmit data and receiver termination is employed. To exploit these advantages, the wires need to be modeled with the lossy on-chip environment in mind. Wider and thicker wires with larger-than-minimum spacing are preferred to reduce attenuation and preserve pulse integrity. This can be realized with a smaller area overhead in a serial link than in parallel links.

#### 2.3.1. Driver

In this link, a source-coupled differential current-steering driver, shown in Fig. 2.7, is used. It is fast because it has an extremely sharp transient response. The driver also reduces the AC component of the power supply noise, because the circuit draws constant current from the supply when it is operating. This driver is naturally suited to driving a balanced differential pair of wires. The complementary outputs of the driver are attached to the two wires, and the other end of the transmission line is parallel-terminated into a positive voltage. Depending on the output of the dual-rail encoder, which is the input to the driver, current from the current source is steered into one of the wires. When bit '1' is transmitted, a voltage pulse drives the gate of Mnp1, which steers a current pulse into wire1 and no current into wire0. When bit '0' is transmitted, the current pulse is steered through Mnp0, which steers the current pulse into wire0 and no current into wire1.

Figure 2.5: Shift register's TSPC FF with parallel loading

Figure 2.6: Upper asymmetric c-element



#### 2.3.2. Receiver

The termination load and receiver design are shown in Fig. 2.8. Diode-connected transistors Mpt0 and Mpt1 are used as the termination load. In addition to termination, they also mirror the wire current, which is needed to decode the data validity indicator. The transconductance of these transistors is regulated through Mpr0 and Mpr1. The receiver needs high common-mode noise rejection in order to take full advantage of differential signaling. For this reason, a high-speed self-biased differential amplifier is used. The differential amplifier used in this design is less sensitive to process, temperature, and supply voltage variations. It operates at high speed because its output switching currents are significantly greater than its quiescent current. Furthermore, the adopted amplifier has a higher differential-mode gain than conventional amplifiers and a large common-mode input range, because its bias condition adjusts itself to accommodate the input swing [14] [15].



Figure 2.8: Receiver

#### 2.3.3. Data Validity Decoder

In delay-insensitive transmission, both the data and the data validity indicator must be decoded at the receiving end. The transmitted data is received and decoded directly in the receiver without separate data decoding logic, owing to the novel integration of pulse and differential signaling. The remaining issue is the data validity indicator, which is also used as a clock to shift the data bits in the deserializer. From the encoding, it is known that current flows in only one of the wires during a valid bit transmission and in neither wire between two consecutive bit transmissions. Each wire's current is compared with a reference current using a current comparator, and the outputs of the two current comparators are fed to a differential amplifier. The output of the differential amplifier is the data validity indicator (DVIout). This method of completion detection makes the communication robust to both delay variations and noise, because it takes both wires into account and the differential amplifier used has a high common-mode noise rejection ratio. Both edges of the DVIout signal indicate the availability of valid new data at the receiver output. The circuit of the data validity decoder is shown in Fig. 2.9.



### 2.4. Deserializer

The deserializer consists of a shift register and an interfacing circuit (between the receiving module and the deserializer), as shown in Fig. 2.10. In the shift register, data is shifted at both edges of the DVIout signal. The shift register is built from double-edge-triggered flip-flops. Each flip-flop is formed by tying together the outputs of a negative and a positive edge-triggered TSPC flip-flop, which provides the multiplexer function for free. It stores data dynamically during opposite clock phases and drives its output actively on both clock edges. The circuit of the double-edge-triggered flip-flop is shown in Fig. 2.11. Transistors Mnrs1 and Mnrs2 are used to reset the flip-flop. When the Stop bit reaches the last flip-flop, shifting stops and the data can be read out in parallel. The bundled-data two-phase request signal for the parallel data-receiving module is generated using a D flip-flop, as shown in the interfacing circuit in Fig. 2.10. As soon as an acknowledgment is received, the deserializer's shift register is cleared (reset).
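The double-edge-triggered behavior can be modeled at the logic level as two storage elements, one sampling on each clock edge, whose outputs are selected by the clock level. This is a behavioral sketch only: the class name `DoubleEdgeFF` and its attributes are hypothetical, and dynamic storage, reset transistors, and timing are not modeled.

```python
class DoubleEdgeFF:
    """Logic-level model of a flip-flop that captures D on both clock edges."""

    def __init__(self):
        self.clk = 0
        self.q_rise = 0   # value captured by the positive edge-triggered half
        self.q_fall = 0   # value captured by the negative edge-triggered half

    @property
    def q(self):
        # The clock level selects which half drives the shared output.
        return self.q_rise if self.clk else self.q_fall

    def step(self, d, clk):
        """Apply one input sample and return the output."""
        if clk == 1 and self.clk == 0:
            self.q_rise = d               # rising edge
        elif clk == 0 and self.clk == 1:
            self.q_fall = d               # falling edge
        self.clk = clk
        return self.q


ff = DoubleEdgeFF()
outputs = [ff.step(d, clk) for d, clk in [(1, 1), (0, 0), (1, 0), (1, 1)]]
```

In the sample sequence the output changes only when the clock input changes level, on either edge, which is what lets a data validity signal with one edge per bit clock the deserializer directly.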



Figure 2.10: Deserializer and handshaking Circuits Implementation



Figure 2.11: Double-edge triggered TSPC FF of the deserializer

## 3. Simulation Results and Analysis

The performance, power consumption, and energy per bit of the presented serial link are discussed in this section. The simulation results are shown below.

Figure 2.9: Data validity decoder


# International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064

Table 3.1: Serializer

| Technology | Power | Delay | Power-delay product | Area ($\lambda^2$) | Idd max | Idd avg |
|------------|-------|-------|---------------------|--------------------|---------|---------|
| CMOS 018.rul | 14.573 mW | 0.259 ns | 3.774 pW·s | 1487689 | 13.554 mA | 3.818 mA |
| CMOS 012.rul | 7.185 mW | 0.141 ns | 1.013 pW·s | 1500696 | 11.207 mA | 3.136 mA |
| CMOS 65n | 4.509 mW | 0.064 ns | 0.288 pW·s | 1471680 | 9.391 mA | 2.362 mA |

Table 3.2: Counter

| Technology | Power | Delay | Power-delay product | Area ($\lambda^2$) | Idd max | Idd avg |
|------------|-------|-------|---------------------|--------------------|---------|---------|
| CMOS 018.rul | 97.424 μW | 0.038 ns | $3.702 \times 10^{-15}$ W·s | 59823 | 4.096 mA | 0.186 mA |
| CMOS 012.rul | 26.252 μW | 0.015 ns | $0.997 \times 10^{-15}$ W·s | 59041 | 4.148 mA | 0.229 mA |
| CMOS 65n | 14.831 μW | 0.009 ns | $0.133 \times 10^{-15}$ W·s | 60522 | 3.497 mA | 0.205 mA |

Table 3.3: Parallel in Serial Out Shift Register

| Technology | Power | Delay | Power-delay product | Area | Idd max | Idd avg |
|------------|-------|-------|---------------------|------|---------|---------|
| CMOS 018.rul | 4.934 mW | 0.096 ns | 0.473 pW·s | dx = 2196 λ, dy = 496 λ, area = 1089216 $\lambda^2$ | 7.304 mA | 1.675 mA |
| CMOS 012.rul | 2.716 mW | 0.043 ns | 0.116 pW·s | dx = 2189 λ, dy = 489 λ, area = 1070421 $\lambda^2$ | 7.588 mA | 1.578 mA |
| CMOS 65n | 1.673 mW | 0.035 ns | 0.058 pW·s | dx = 2191 λ, dy = 486 λ, area = 1064826 $\lambda^2$ | 5.740 mA | 1.153 mA |

Table 3.4: Shift Register's TSPC FF with Parallel Loading

| Technology | Power | Delay | Power-delay product | Area ($\lambda^2$) | Idd max | Idd avg |
|------------|-------|-------|---------------------|--------------------|---------|---------|
| CMOS 018.rul | 1.384 mW | 0.063 ns | 0.087 pW·s | 338910 | 8.592 mA | 0.330 mA |
| CMOS 012.rul | 70.626 μW | 0.026 ns | $1.836 \times 10^{-15}$ W·s | 374220 | 2.338 mA | 0.009 mA |
| CMOS 65n | 0.488 mW | 0.017 ns | 0.008 pW·s | 350472 | 6.073 mA | 0.356 mA |

Table 3.5: Upper Asymmetric C-Element

| Technology | Power | Delay | Power-delay product | Area ($\lambda^2$) | Idd max | Idd avg |
|------------|-------|-------|---------------------|--------------------|---------|---------|
| CMOS 018.rul | 4.362 mW | 0.033 ns | 0.143 pW·s | 444564 | 9.286 mA | 0.472 mA |
| CMOS 012.rul | 0.646 mW | 0.011 ns | 0.001 pW·s | 468046 | 6.082 mA | 0.538 mA |
| CMOS 65n | 1.527 mW | 0.007 ns | 0.010 pW·s | 439318 | 8.146 mA | 0.561 mA |

Table 3.6: Driver

| Technology | Power | Delay | Power-delay product | Area ($\lambda^2$) | Idd max | Idd avg |
|------------|-------|-------|---------------------|--------------------|---------|---------|
| CMOS 018.rul | 4.510 mW | 0.045 ns | 0.202 pW·s | 241301 | 5.580 mA | 0.884 mA |
| CMOS 012.rul | 2.032 mW | 0.015 ns | 0.030 pW·s | 270643 | 4.583 mA | 0.655 mA |
| CMOS 65n | 1.656 mW | 0.010 ns | 0.016 pW·s | 244352 | 4.445 mA | 0.665 mA |

Table 3.7: Data Validity Decoder

| Technology | Power | Delay | Power-delay product | Area ($\lambda^2$) | Idd max | Idd avg |
|------------|-------|-------|---------------------|--------------------|---------|---------|
| CMOS 018.rul | 26.031 μW | 0.050 ns | $1.301 \times 10^{-15}$ W·s | 92518 | 0.058 mA | 0.001 mA |
| CMOS 012.rul | 3.972 μW | 0.018 ns | $0.071 \times 10^{-15}$ W·s | 94010 | 0.029 mA | 0.000 mA |
| CMOS 65n | 2.225 μW | 0.012 ns | $0.026 \times 10^{-15}$ W·s | 96188 | 0.036 mA | 0.000 mA |

Table 3.8: Receiver

| Technology | Power | Delay | Power-delay product | Area ($\lambda^2$) | Idd max | Idd avg |
|------------|-------|-------|---------------------|--------------------|---------|---------|
| CMOS 018.rul | 56.608 μW | 0.054 ns | $3.056 \times 10^{-15}$ W·s | 49200 | 0.791 mA | 0.009 mA |
| CMOS 012.rul | 11.531 μW | 0.026 ns | $0.299 \times 10^{-15}$ W·s | 53111 | 1.031 mA | 0.004 mA |
| CMOS 65n | 4.985 μW | 0.038 ns | $0.189 \times 10^{-15}$ W·s | 52438 | 0.553 mA | 0.002 mA |

Table 3.9: Double-Edge Triggered TSPC FF of the Deserializer

| Technology | Power | Delay | Power-delay product | Area ($\lambda^2$) | Idd max | Idd avg |
|------------|-------|-------|---------------------|--------------------|---------|---------|
| CMOS 018.rul | 1.001 mW | 0.038 ns | 0.038 pW·s | 80229 | 0.791 mA | 0.009 mA |
| CMOS 012.rul | 0.451 mW | 0.014 ns | 0.006 pW·s | 84862 | 1.031 mA | 0.004 mA |
| CMOS 65n | 0.352 mW | 0.008 ns | 0.002 pW·s | 75870 | 0.553 mA | 0.002 mA |

Table 3.10: Deserializer

| Technology | Power | Delay | Power-delay product | Area ($\lambda^2$) | Idd max | Idd avg |
|------------|-------|-------|---------------------|--------------------|---------|---------|
| CMOS 018.rul | 2.532 mW | 0.151 ns | 0.382 pW·s | 49200 | 4.293 mA | 0.608 mA |
| CMOS 012.rul | 1.150 mW | 0.067 ns | 0.077 pW·s | 1032264 | 4.449 mA | 0.526 mA |
| CMOS 65n | 0.738 mW | 0.052 ns | 0.038 pW·s | 1022070 | 3.923 mA | 0.412 mA |

Table 3.11: Semi-Serial Link

| Technology | Power | Delay | Power-delay product | Area ($\lambda^2$) | Idd max | Idd avg |
|------------|-------|-------|---------------------|--------------------|---------|---------|
| CMOS 018.rul | 22.216 mW | 0.247 ns | 5.487 pW·s | 3944790 | 17.969 mA | 5.583 mA |
| CMOS 012.rul | 6.518 mW | 0.043 ns | 0.280 pW·s | 4144628 | 11.586 mA | 3.333 mA |
| CMOS 65n | 10.464 mW | 0.115 ns | 1.203 pW·s | 3989131 | 13.887 mA | 4.341 mA |
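The power-delay product column of Table 3.11 can be cross-checked directly from the power and delay columns, since 1 mW × 1 ns = 10⁻¹² W·s = 1 pW·s. The short script below is an illustrative check using only the values from Table 3.11; the helper names `power_delay_product` and `reduction_percent` are hypothetical.

```python
# (power in mW, delay in ns) for the complete semi-serial link, from Table 3.11
link = {
    "180 nm": (22.216, 0.247),
    "120 nm": (6.518, 0.043),
    "65 nm": (10.464, 0.115),
}


def power_delay_product(power_mw, delay_ns):
    """Return the power-delay product in pW*s (mW x ns = 1e-12 W*s)."""
    return power_mw * delay_ns


def reduction_percent(old, new):
    """Relative reduction from old to new, in percent."""
    return 100.0 * (old - new) / old


pdp = {node: power_delay_product(p, d) for node, (p, d) in link.items()}
power_saving = reduction_percent(link["180 nm"][0], link["65 nm"][0])
pdp_saving = reduction_percent(pdp["180 nm"], pdp["65 nm"])

for node, value in pdp.items():
    print(f"{node}: {value:.3f} pW*s")
print(f"power reduction 180 nm to 65 nm: {power_saving:.1f}%")
print(f"PDP reduction 180 nm to 65 nm: {pdp_saving:.1f}%")
```

The computed products agree with the tabulated values to three decimal places. Going from 180 nm to 65 nm, link power falls by roughly 53% and the power-delay product by roughly 78%.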

## 4. Conclusion

The design and analysis of a high-throughput and low-power serial on-chip communication link have been presented. The combination of pulse dual-rail encoding, wave-pipelining, pulse signaling, and differential current-mode signaling, together with customized serializer and deserializer circuits, yields a high-throughput serial link with low power consumption. The simulation results are given in Tables 3.1 to 3.11. The links were designed and simulated in 180 nm, 120 nm, and 65 nm CMOS technologies using the Microwind 3.1 CAD tool. When the technology scales down from 180 nm to 65 nm, power decreases from 22.216 mW to 10.464 mW, delay decreases from 0.247 ns to 0.115 ns, and the power-delay product decreases from 5.487 pW·s to 1.203 pW·s. This link is a promising candidate for long-range NoC channels, which are needed either inherently because of the topology or through customization of regular 2D networks.

# References

- [1] D. Truong, W. Cheng, T. Mohsenin, Z. Yu, A. Jacobson, G. Landge, M. Meeuwsen, C. Watnik, A. Tran, Z. Xiao, E. Work, J. Webb, P. Mejia, and B. Baas, "A 167-processor computational platform in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1130–1144, Apr. 2009.
- [2] S. Bell, B. Edwards, J. Amann, R. Conlin, K. Joyce, V. Leung, J. MacKay, and M. Reif, "Tile64TM processor: A 64-core SoC with mesh interconnect," in *Dig. Techn. Papers, IEEE Solid-State Circuits Conf.*, 2008, pp. 88–598.
- [3] J. D. Owens, W. J. Dally, R. Ho, D. N. Jayasimha, S. W. Keckler, and L.-S. Peh, "Research challenges for on-chip interconnection networks," *IEEE Micro*, vol. 27, no. 5, pp. 96–108, Sep.-Oct. 2007.
- [4] L. Benini and G. Micheli, *Networks on Chips: Technology and Tools*. San Francisco, CA: Morgan Kaufmann, 2006.
- [5] M. Krstic, E. Grass, F. K. Gurkaynak, and P. Vivet, "Globally asynchronous, locally synchronous circuits: Overview and outlook," *IEEE Design Test Comput.*, vol. 24, no. 5, pp. 430–441, Sep.-Oct. 2007.
- [6] E. Beigne, F. Clermidy, H. Lhermet, S. Miermont, Y. Thonnart, X.-T. Tran, A. Valentian, D. Varreau, P. Vivet, X. Popon, and H. Lebreton, "An asynchronous power aware and adaptive NoC based circuit," *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1167–1176, 2009.
- [7] P. Wang, G. Pei, and E. C.-C. Kan, "Pulsed wave interconnect," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 12, no. 5, pp. 453–463, May 2004.
- [8] S. Borkar, "Designing reliable systems from unreliable components: the challenges of transistor variability and degradation," *IEEE Micro*, vol. 25, no. 6, pp. 10–16, Nov.-Dec. 2005.
- [9] M. Chen and Y. Cao, "Analysis of pulse signaling for low-power on-chip global bus design," in *Proc. 7th IEEE Int. Symp. Quality Electron. Design*, 2006, p. 6.
- [10] T. Kuboki, A. Tsuchiya, and H. Onodera, "A 10 Gbps/channel on-chip signaling circuit with an impedance-unmatched CML driver in 90 nm CMOS technology," in *Proc. IEEE Asia South Pacific Design Autom. Conf.*, 2007, pp. 120–121.
- [11] Maheshwari and W. Burleson, "Differential current-sensing for on-chip interconnects," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 12, no. 12, pp. 1321–1329, 2002.
- [12] R. Bashirullah, L. Wentai, and R. K. Cavin, "Current-mode signaling in deep submicrometer global interconnects," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 11, no. 3, pp. 406–417, Jun. 2003.
- [13] W. J. Dally and J. W. Poulton, *Digital Systems Engineering*. Cambridge, MA: Cambridge University Press, 1998.
- [14] M. Bazes, "Two novel fully complementary self-biased CMOS differential amplifiers," *IEEE J. Solid-State Circuits*, vol. 26, no. 2, pp. 165–168, Feb. 1991.

## **Author Profile**



**M. Chennakesavulu** is currently working as an associate professor at RGM Engineering College, Nandyal. He completed his M.Tech in Embedded Systems at JNT University, Anantapur. He obtained his B.Tech from JNT University and has presented and published seven papers in national and international conferences. His current areas of research are fault-tolerant data buses and low-power interconnects in system-on-chip.



**Jestadi Raghu** received his B.Tech degree in Electronics and Communication from Alfa College of Engineering and Technology and is pursuing an M.Tech degree at Rajeev Gandhi Memorial College of Engineering and Technology, specializing in Digital Systems and Computer Electronics. His research interests are low-power and high-performance VLSI designs.