# Design and Implementation of Argo and FALP Methods on NOC and Their Analysis

#### A. Chezhiyan<sup>1</sup>, M. R. Mahalakshmi<sup>2</sup>

<sup>1</sup>P.G. Scholar, Department of Electronics and Communication Engineering, Sri Muthukumaran Institute of Technology, Chennai, Tamilnadu, India

<sup>2</sup>Assistant Professor, Department of Electronics and Communication Engineering, Sri Muthukumaran Institute of Technology, Chennai, Tamilnadu, India

Abstract: Most communication traffic in today's Networks-on-Chip (NoCs) is router based, for volatile-memory-based designs. The NoC should be designed to handle efficiently the many-to-one communication pattern of data accesses to and from the routing controller. This paper motivates the use of a separate network for routing and shows that the Argo and FALP methods reduce power consumption and improve performance compared with the traditional round-robin method. The paper shows how congestion is avoided using the Argo method, and a state-transition-based finite state machine (FSM) memory controller is designed for on-chip cores to optimize area and power. Our experiments on a realistic NoC multimedia benchmark show a large reduction in power consumption and an improvement in throughput compared with existing solutions.

Keywords: NOC, RRB, Argo, FALP, Router

## 1. Introduction

Network-on-Chip (NoC) is a general-purpose on-chip communication concept that offers better throughput and copes with the complexity of modern systems. An NoC is a complex interconnection of various functional elements. Bus-structured architectures create a communication bottleneck in gigabit communication. There was therefore a need for a system that offers modularity and parallelism; NoC provides many such attractive properties and solves the communication bottleneck problem. It basically works on the idea of interconnecting cores using an on-chip network.

## 2. Network on Chip

Network-on-chip (NoC), a comparatively new concept that has emerged as a system-on-chip (SoC) communication methodology, borrows many ideas from computer networks, the knowledge domain in which research on routers and packet switching has matured. A scheduling algorithm determines which packet is forwarded ahead of the others. The scaling of microchip technologies has enabled large-scale SoCs. The NoC considered here is structured as a 4-by-4 grid that provides global chip-level communication. It employs a grid of routing nodes spread out across the chip, connected by communication links. For now, we adopt a simplified perspective in which the NoC contains the following fundamental components. Network adapters implement the interface by which cores (IP blocks) connect to the NoC; their function is to decouple computation (the cores) from communication.



**Figure 1:** 4-by-4 grid-structured NoC

The NoC in the figure could thus employ packet or circuit switching or something entirely different, and be implemented using asynchronous, synchronous, or other logic.
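To make the grid topology concrete, the following Python sketch builds the routing nodes and links of a 4-by-4 mesh. The function names `build_mesh` and `neighbours` and the (x, y) coordinate scheme are assumptions made for illustration; the paper does not specify them.

```python
def build_mesh(width=4, height=4):
    """Return (nodes, links) for a width-by-height mesh of routing nodes.

    Each node is an (x, y) tuple; each link is a pair of neighbouring
    nodes and stands for one bidirectional communication link.
    """
    nodes = [(x, y) for y in range(height) for x in range(width)]
    links = set()
    for x, y in nodes:
        if x + 1 < width:    # link to the east neighbour
            links.add(((x, y), (x + 1, y)))
        if y + 1 < height:   # link to the north neighbour
            links.add(((x, y), (x, y + 1)))
    return nodes, links


def neighbours(node, width=4, height=4):
    """Routers directly connected to `node` in the mesh."""
    x, y = node
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(cx, cy) for cx, cy in candidates
            if 0 <= cx < width and 0 <= cy < height]


nodes, links = build_mesh()
print(len(nodes), "routers,", len(links), "links")
```

In this model a corner router has two neighbours and an interior router has four, so traffic through the centre of the grid has more path choices than traffic at the edges.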

## 3. Review of Existing Method

An effective adaptive routing algorithm can help minimize path congestion. However, conventional adaptive routing schemes use only channel-based information to detect the congestion status. Because switch-based information is missing, channel-based information alone cannot reveal the real congestion status along the routing path. Switch congestion: when a packet is transmitted through the north output port, other packets whose output requests fail are blocked and queued at the input buffers. A routed packet has to wait for this channel to be released. Channel congestion: because of the limited input buffer size, the switch runs out of buffer space. Notably, due to the backpressure effect of link-level flow control, switch and channel congestion are strongly correlated. That is, switch congestion in one router can manifest itself as channel congestion in one of the adjacent routers. Therefore, path congestion can start to build and spread from a congested switch to the source nodes, growing into a congestion tree. This severely degrades overall system performance, especially in real-time applications with strict latency requirements.
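The backpressure effect described above can be illustrated with a small simulation. This is a minimal sketch, assuming a single chain of routers with one finite input buffer each and one flit moved per hop per cycle; the function name and parameters are hypothetical and not taken from the paper.

```python
def simulate_backpressure(num_routers=3, capacity=2, cycles=8,
                          sink_blocked=True):
    """Simulate a chain of routers with finite input buffers.

    Returns the buffer occupancy of every router after each cycle and
    the number of cycles in which the source could not inject a flit.
    """
    occupancy = [0] * num_routers
    history = []
    stalled = 0
    for _ in range(cycles):
        # Drain the last router unless the sink is blocked.
        if not sink_blocked and occupancy[-1] > 0:
            occupancy[-1] -= 1
        # Move one flit per hop, downstream routers first, so a slot
        # freed in this cycle can be reused by the upstream neighbour.
        for i in range(num_routers - 2, -1, -1):
            if occupancy[i] > 0 and occupancy[i + 1] < capacity:
                occupancy[i] -= 1
                occupancy[i + 1] += 1
        # The source injects one flit if the first buffer has room.
        if occupancy[0] < capacity:
            occupancy[0] += 1
        else:
            stalled += 1
        history.append(list(occupancy))
    return history, stalled


history, stalled = simulate_backpressure()
for cycle, state in enumerate(history, start=1):
    print(cycle, state)
print("source stalled for", stalled, "cycles")
```

With the sink blocked, the buffers fill from the last router back toward the source until the source itself stalls, which is the one-dimensional analogue of a congestion tree.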



Figure 2: Switch contention, switch congestion, and channel congestion

Congestion-aware adaptive routing selects an output channel based on various types of network congestion information. Consequently, these selection functions can adjust path selection based on a time-variant congestion status. Congestion-aware adaptive routing adopts two types of spatial information: local information and regional information. Local routing considers local information, such as the downstream buffer count and the available flit slots, to assess the traffic status.
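A selection function based on local information can be sketched as follows. This is an illustrative assumption about how such a function might look, not the scheme evaluated in the paper: `select_output`, the port names, and the tie-breaking rule are all hypothetical.

```python
def select_output(candidates, free_slots):
    """Pick the admissible output port whose downstream buffer is least full.

    candidates: output ports allowed by the routing function, e.g. ["N", "E"].
    free_slots: dict mapping port name -> free flit slots downstream.
    Ties are broken in favour of the port listed first in `candidates`.
    """
    if not candidates:
        raise ValueError("routing function returned no admissible port")
    return max(candidates, key=lambda port: free_slots.get(port, 0))


# Example: the east neighbour has more free flit slots than the north one.
print(select_output(["N", "E"], {"N": 1, "E": 3}))
```

Because the free-slot counts change over time, repeated calls with fresh counts give the time-variant path selection described above.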

## 4. Block Diagram



Nios II is a 32-bit embedded-processor architecture designed specifically for the Altera family of FPGAs. It is a RISC soft-core architecture implemented entirely in the programmable logic and memory blocks of Altera FPGAs. User-defined instructions accept values from up to two 32-bit source registers and optionally write back a result to a 32-bit destination register.

DRAM plays a vital role in system design, for example in RAM and cache memories. To increase row-buffer locality, an existing paper introduced a thread row buffer (TRB) to increase throughput and overall DRAM performance. The TRB increases the row hit rate by reusing a row that holds the same information. Data access between DRAM banks is controlled by a logic controller that consists of sequential elements. The logic controller is accessed by almost every block in the DRAM circuitry. Memory controllers contain the logic necessary to read and write to DRAM, and to "refresh" the DRAM. Without constant refresh, DRAM loses the data written to it, because the capacitors leak their charge within a fraction of a second. Reading and writing to DRAM is performed by selecting the row and column data addresses of the DRAM as the inputs to the multiplexer circuit, where the demultiplexer on the DRAM uses the converted inputs to select the correct memory location and return the data, which is then passed back through a multiplexer to consolidate the data in order to reduce the required bus width for the operation.

Figure 4: DRAM Bank

Figure 5: Network Interface
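The row and column addressing and the benefit of row reuse can be illustrated with a short sketch. It assumes a single bank, an open-row policy, and a 10-bit column field; these parameters and the function names are illustrative assumptions, and the code is not the TRB scheme itself.

```python
def split_address(address, column_bits=10):
    """Split a flat address into (row, column) for a single bank."""
    row = address >> column_bits
    column = address & ((1 << column_bits) - 1)
    return row, column


def row_buffer_hits(addresses, column_bits=10):
    """Count accesses served from the currently open row."""
    open_row = None
    hits = 0
    for address in addresses:
        row, _ = split_address(address, column_bits)
        if row == open_row:
            hits += 1           # the row is already in the row buffer
        else:
            open_row = row      # a different row has to be opened first
    return hits


interleaved = [0, 1, 2, 1024, 1025, 3]
grouped = [0, 1, 2, 3, 1024, 1025]
print(row_buffer_hits(interleaved), row_buffer_hits(grouped))
```

Grouping the accesses that fall in the same row raises the hit count for the same set of addresses, which is the kind of locality a row-reuse scheme tries to exploit.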

## 5. Network Interface

The Network Interface (NI) is the module that acts as an abstraction layer between the network protocol and the internal UDN/MDN switch protocols. There are two types of NI: the input NI (INI) and the output NI (ONI). An NI consists of a packetization unit (PU), a de-packetization unit (DU), and a PE interface. It is located between a router and a PE, decoupling communication from computation. When a packet is injected into the switch, the INI encapsulates (packetizes) it into U(M)DN packets and carries them to the next router in equally sized flits. The ONI, in turn, receives the U(M)DN flits, strips (depacketizes) the original packet, and ejects it from the switch.

#### International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Impact Factor (2012): 3.358

The NI offers a memory-mapped view of all its control registers; that is, the registers in the NI can be accessed through a conventional bus interface. The output side contains N-to-1-bit serializers, one for each outgoing wire, and data distributors that send data from the output queues to one of the serializers. Each distributor can send data to each of the serializers. Because not all distributors are loaded all the time, a single distributor can serve all the serializers of a particular sending channel. At the input of each 32-bit to 1-bit serializer, one OR gate allows all the data distributors to forward data to that serializer. The other OR gate and the 1-bit output carry the handshaking signals between the data distributor and the serializer. The advantage of the new NI design is its area savings.
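The packetization, de-packetization, and serialization steps can be sketched in a few lines of Python. The flit size, the length-carrying header flit, and the MSB-first bit order are assumptions chosen for illustration; the paper does not define the U(M)DN packet format.

```python
def packetize(payload, flit_bytes=4):
    """Split `payload` (bytes) into equally sized flits.

    The first flit is a hypothetical header carrying the payload length,
    so that the receiver can strip the zero padding of the last flit.
    """
    if len(payload) >= 1 << (8 * flit_bytes):
        raise ValueError("payload too long for the header flit")
    header = len(payload).to_bytes(flit_bytes, "big")
    padded_len = -(-len(payload) // flit_bytes) * flit_bytes  # round up
    padded = payload.ljust(padded_len, b"\x00")
    body = [padded[i:i + flit_bytes]
            for i in range(0, padded_len, flit_bytes)]
    return [header] + body


def depacketize(flits):
    """Recover the original payload from a list of flits."""
    length = int.from_bytes(flits[0], "big")
    return b"".join(flits[1:])[:length]


def serialize_word(word, width=32):
    """Emit the bits of a `width`-bit word one at a time, MSB first."""
    return [(word >> i) & 1 for i in range(width - 1, -1, -1)]


flits = packetize(b"hello NoC")
print(len(flits), "flits of", len(flits[0]), "bytes")
print(depacketize(flits))
```

Here `packetize` plays the role of the PU in the INI, `depacketize` that of the DU in the ONI, and `serialize_word` models one 32-bit to 1-bit serializer.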

## 6. Argo and FALP Method

In this paper we describe the use of asynchronous routers in a time-division-multiplexed (TDM) network-on-chip (NoC), Argo, that is being developed for a multi-processor platform for real-time systems. TDM needs a common time reference, and existing TDM-based NoC designs are either synchronous or mesochronous. We use asynchronous routers to achieve a simpler, smaller, and more robust self-timed design. Our design exploits the fact that pipelined asynchronous circuits also act as ripple FIFOs, which avoids the need for explicit synchronization FIFOs between the routers. Argo has interesting timing properties that allow it to tolerate skew between the network interfaces (NIs).



Figure 6: Argo Block Diagram
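The idea behind a TDM NoC is that every flow owns fixed time slots on the links of its route, so flits never compete for a link. The sketch below checks a slot table for such conflicts. It assumes that a flit advances one link per slot; the function names, link names, and schedule format are hypothetical and are not Argo's actual schedule representation.

```python
def link_slots(route, start_slot, period):
    """Slots in which a flit injected in `start_slot` occupies each link.

    route: ordered list of link names from the source NI to the
    destination NI. Assumes the flit advances one link per slot.
    """
    return [(link, (start_slot + hop) % period)
            for hop, link in enumerate(route)]


def find_conflicts(schedule, period):
    """Return the (link, slot) pairs claimed by more than one flow.

    schedule: dict mapping flow name -> (route, start_slot).
    """
    owners = {}
    conflicts = set()
    for flow, (route, start_slot) in schedule.items():
        for key in link_slots(route, start_slot, period):
            if key in owners and owners[key] != flow:
                conflicts.add(key)
            owners.setdefault(key, flow)
    return conflicts


clashing = {"A": (["l0", "l1"], 0), "B": (["l2", "l1"], 0)}
shifted = {"A": (["l0", "l1"], 0), "B": (["l2", "l1"], 1)}
print(find_conflicts(clashing, period=4))
print(find_conflicts(shifted, period=4))
```

Shifting flow B by one slot removes the clash on the shared link, which shows why all NIs need a common notion of the current slot and why skew between them matters.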

The paper presents the Argo NoC architecture and provides a quantitative analysis of its ability to absorb skew between the NIs, using a signal transition graph model and realistic component delays.

Networks-on-Chip (NoCs) are regarded as the future communication infrastructure for many-core systems. They are prone to malfunction in the presence of faults as technology sizes shrink, which degrades the performance of the system. According to the results, crosstalk faults account for 71% of link failures. A new fault-adaptive and low-power method for NoC routers, called FALP, is presented in this article to mitigate unexpected behavior on the links. It reduces the switching power consumption overhead by keeping track of the frequency and lifetime of faults. Moreover, based on the VHDL implementation of an NoC router, the routing unit has a negligible area compared with the other router components: the routing component with the embedded FALP technique occupies less than 11% of the NoC router area.
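One way to keep track of the frequency and lifetime of link faults is sketched below. This is not the FALP algorithm: the class name, the threshold, and the transient/permanent classification are hypothetical, since the paper does not give FALP's actual rules.

```python
class LinkFaultTracker:
    """Track how often and for how long a link reports faults.

    The threshold and the classification are hypothetical assumptions
    made for illustration only.
    """

    def __init__(self, permanent_after=3):
        self.permanent_after = permanent_after  # consecutive faulty cycles
        self.fault_count = 0        # frequency: fault episodes seen so far
        self.current_lifetime = 0   # lifetime: length of the ongoing episode

    def observe(self, faulty):
        """Record one cycle of link status and return the classification."""
        if faulty:
            if self.current_lifetime == 0:
                self.fault_count += 1   # a new fault episode starts
            self.current_lifetime += 1
        else:
            self.current_lifetime = 0
        return self.status()

    def status(self):
        if self.current_lifetime == 0:
            return "healthy"
        if self.current_lifetime >= self.permanent_after:
            return "permanent"
        return "transient"


tracker = LinkFaultTracker(permanent_after=3)
for faulty in [False, True, False, True, True, True]:
    print(tracker.observe(faulty))
print("fault episodes:", tracker.fault_count)
```

A router could, for example, simply retry on a transient fault and route around a link only once it is classified as permanent, which avoids paying the rerouting cost for short-lived faults. This policy is likewise an assumption rather than a description of FALP.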

## 7. Simulation Result

In this paper the analysis is based on typical and constant gate delay values. Figure 7 shows the output waveform of the existing system, and Figure 8 shows the waveform of the proposed system. We have explored the use of min-max delay intervals, which leads to higher throughput values and therefore reduced skew tolerance. The table below summarizes the performance evaluation.



Figure 7: Output waveform for existing system



Figure 8: Output waveform for proposed system

| Parameter        | Existing system | Proposed system |
|------------------|-----------------|-----------------|
| Area             | 1281            | 388             |
| Delay (ms)       | 0.0044          | 0.0041          |
| Total power (mW) | 87.42           | 74.89           |
| Throughput       | 7152            | 7744            |

Figure 9: Performance Evaluations Table
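The relative improvements implied by the table can be computed directly. The sketch below uses only the values reported in the table; the dictionary keys and the function name are illustrative, and area and throughput are treated as unitless because the table gives no units for them.

```python
def relative_change(existing, proposed):
    """Relative change of `proposed` with respect to `existing`, in percent."""
    return (proposed - existing) / existing * 100.0


# (existing system, proposed system), values taken from the table above.
results = {
    "area": (1281, 388),
    "delay_ms": (0.0044, 0.0041),
    "total_power_mW": (87.42, 74.89),
    "throughput": (7152, 7744),
}

for name, (existing, proposed) in results.items():
    print(f"{name}: {relative_change(existing, proposed):+.1f}%")
```

On these figures the proposed system uses about 69.7% less area, has about 6.8% lower delay and about 14.3% lower total power, and achieves about 8.3% higher throughput than the existing system.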

## 8. Conclusion

In this paper we emulated the performance of path-congestion-aware routing and Argo on the Quartus II platform. The paper extended previous work on synchronous and mesochronous TDM-based NoCs by exploring the use of asynchronous routers that allow a truly GALS-style implementation of a NoC-based multi-core platform. We compared the two routing methods in terms of the number of resources used, throughput, area, power, and delay. Path-congestion-aware routing consumes more resources, which means it uses more silicon area. Argo has a higher clock frequency than path-congestion-aware routing, which means Argo can process data more quickly. In such a design, a trade-off must be made among resources (silicon area), maximum clock frequency, and delay, and a suitable arbitration mechanism chosen accordingly. A novel fault-tolerant method has also been proposed to improve the reliability of NoC router links.

Volume 3 Issue 11, November 2014 <u>www.ijsr.net</u> Licensed Under Creative Commons Attribution CC BY

This method is composed of fault-detection and fault-tolerance techniques, which are implemented in different zones of the network. We have not yet explored this property.
