# Analysis, Physical Design and Power Optimization of Block Signal Estimator for High Speed Serial Interface

### Sachin Revannavar<sup>1</sup>, Dr. H. V. Ravish Aradhya<sup>2</sup>

<sup>1, 2</sup>Department of ECE, R. V. College of Engineering, Bangalore, India

Abstract: As the complexity of IC technology advances, sound design methods and power optimization schemes for physical design (PD), together with timing closure and physical verification, become increasingly important. Today's complex IC designs require good physical design strategies to ensure high quality and to meet the required timing targets. Several methodologies can be used for design; this paper concentrates on the top-down approach. Traditionally, blocks were tested from the top level, which gave low coverage because of the lack of flexibility and also risked losing some freedom to floorplan the design at the block level.

Keywords: floorplan, placement, CTS, routing, sign off timing and physical verification

#### 1. Introduction

Physical design is an important part of the ASIC design flow. As technology nodes shrink, power has also become an important factor; leakage power in particular is a major concern in the VLSI industry because of the high gate count of modern chips. This paper describes the physical design approach for the signal estimator block and the use of different methods to reduce its power. The signal estimator is one of the blocks used in the high-speed serial interface (HSI). An HSI is also called a SERDES: SER for serializer and DES for deserializer. The core data rate is much lower than the interface data rate, and digital signal processing usually employs a parallel architecture, so an HSI requires a data-rate converting unit. The serializer converts low-speed parallel data to high-speed serial data; the deserializer converts high-speed serial data to low-speed parallel data.
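The data-rate conversion described above can be illustrated with a small behavioural sketch. It is not the authors' design: the function names, the MSB-first bit order and the example word width are assumptions chosen only for illustration.

```python
from typing import List


def serialize(words: List[int], width: int) -> List[int]:
    """Flatten parallel words into a serial bit stream (MSB first, an assumption)."""
    bits: List[int] = []
    for word in words:
        if not 0 <= word < (1 << width):
            raise ValueError(f"word {word} does not fit in {width} bits")
        for i in range(width - 1, -1, -1):
            bits.append((word >> i) & 1)
    return bits


def deserialize(bits: List[int], width: int) -> List[int]:
    """Regroup a serial bit stream into parallel words of the given width."""
    if len(bits) % width != 0:
        raise ValueError("bit stream length is not a multiple of the word width")
    words: List[int] = []
    for start in range(0, len(bits), width):
        word = 0
        for bit in bits[start:start + width]:
            word = (word << 1) | bit
        words.append(word)
    return words


def serial_bit_rate(parallel_word_rate_hz: float, width: int) -> float:
    """The serial line must run `width` times faster than the parallel core."""
    return parallel_word_rate_hz * width


if __name__ == "__main__":
    stream = serialize([0b1010, 0b0011], width=4)
    print(stream)
    print(deserialize(stream, width=4))
```

The last helper makes the rate relationship explicit: a parallel bus of `width` bits at a given word rate needs a serial line that is `width` times faster.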



Figure 1.1: High-speed serial interface block diagram

#### 2. Implementation Details

The physical design flow starts with the floorplan and is followed by placement, CTS, routing, sign-off timing and physical verification. Floor planning is the first and a very important step in implementing a netlist to GDSII. Different technology nodes have different requirements to be met during floor planning and power planning, depending on the process. At this stage there is no scope for power optimization. Floor planning generally involves:

1. Defining the chip or die geometry.
2. Pin placement.
3. Macro placement.

Once floor planning is finished, power routing is considered. Power routing is often the most straightforward step. Power is supplied to the chip through power pads or bumps. Pads are placed on the chip boundary, whereas bumps are placed all over the chip area, above the highest routable layer. In both cases, power needs to be distributed across the entire chip, connecting to the macros and standard cells.

After floor planning and power planning, standard cell placement and optimization are done to achieve various design goals, which can be timing, power and area. Placement optimization tries to achieve these goals while making sure that the design remains routable. In a nutshell, the tool first places the cells and then performs high fan-out buffering, logical restructuring to meet timing, cell sizing, buffering, cloning, threshold voltage (Vt) swapping for leakage optimization, and area reduction. The database at the end of placement optimization is a legally placed and optimized design.

The goals of placement optimization are to reduce:

1. Congestion
2. Area
3. Leakage power
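The Vt swapping mentioned above can be sketched as a simple greedy pass. This is a deliberately simplified illustration and not the algorithm of any particular tool: the `Cell` class, the two Vt labels, the fixed delay penalty and the leakage numbers are all hypothetical, and each cell is treated as if its slack were independent of the others.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    name: str
    slack: float      # slack of the worst path through the cell (hypothetical units)
    vt: str = "LVT"   # "LVT" = low Vt (fast, leaky), "HVT" = high Vt (slow, low leakage)


def swap_to_high_vt(cells: List[Cell], delay_penalty: float,
                    leak_lvt: float, leak_hvt: float) -> float:
    """Swap a cell to HVT only if its slack can absorb the extra delay.

    Returns the total leakage after swapping. Real tools re-time the design
    after each change because cells share paths; that is ignored here.
    """
    total_leakage = 0.0
    for cell in cells:
        if cell.vt == "LVT" and cell.slack >= delay_penalty:
            cell.vt = "HVT"
            cell.slack -= delay_penalty
        total_leakage += leak_hvt if cell.vt == "HVT" else leak_lvt
    return total_leakage


if __name__ == "__main__":
    design = [Cell("u1", 0.30), Cell("u2", 0.02), Cell("u3", -0.05)]
    print(swap_to_high_vt(design, delay_penalty=0.05, leak_lvt=10.0, leak_hvt=1.0))
    print([(c.name, c.vt) for c in design])
```

Only cells with enough positive slack are swapped, so timing-critical cells keep the faster, leakier variant.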

The next step is clock tree building. Until this stage, all clock nets are treated as ideal nets, which means that they are not optimized by placement optimization (placement optimization only places buffers in free space). Clock Tree Synthesis (CTS), however, has full freedom to change the placement that was already done. Also, from a timing perspective, all clock net and cell delays are assumed to be zero before CTS, so the clock skew is zero. Building the clock tree, as shown in Fig. 2.1 [1], means building a tree on the high fan-out clock nets and optimizing the clock paths. The targets of CTS are to reduce clock skew, minimize clock insertion delay (also known as clock latency) and fix maximum capacitance/slew violations. Before starting the actual CTS, its specifications and targets should be defined. The specifications are global clock skew, minimum insertion delay, maximum capacitance, maximum slew and maximum fan-out.

www.ijsr.net

Licensed Under Creative Commons Attribution CC BY

The goals of CTS are to reduce:

1. Clock skew
2. Clock latency
3. Logical DRCs
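Clock latency and global skew can be computed from the clock arrival times at the sinks. The sketch below assumes a dictionary of hypothetical sink names and arrival times; the units and target values are illustrative and are not taken from the design in this paper.

```python
from typing import Dict


def clock_latency(arrivals: Dict[str, float]) -> float:
    """Insertion delay (latency): the longest clock arrival time at any sink."""
    return max(arrivals.values())


def global_skew(arrivals: Dict[str, float]) -> float:
    """Global skew: difference between the latest and earliest sink arrival."""
    return max(arrivals.values()) - min(arrivals.values())


def meets_targets(arrivals: Dict[str, float], max_skew: float,
                  max_latency: float) -> bool:
    """Check a built clock tree against its skew and latency targets."""
    return (global_skew(arrivals) <= max_skew
            and clock_latency(arrivals) <= max_latency)


if __name__ == "__main__":
    # Hypothetical arrival times at three flip-flop clock pins.
    sinks = {"ff_a": 0.42, "ff_b": 0.47, "ff_c": 0.45}
    print(clock_latency(sinks), global_skew(sinks))
    print(meets_targets(sinks, max_skew=0.10, max_latency=0.50))
```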



Figure 2.1: Clock tree synthesis

Once placement and CTS optimization are finished, the final major step is routing. In this step the tool routes all the signal nets in the design: when a net is logically connected to multiple pins, the tool connects them physically using metal layers and vias. Every foundry defines a set of routing rules (also called DRC rules). These routing rules contain:

1) Definitions of the metal layers and their preferred routing directions.
2) Width and spacing constraints for each metal layer.
3) Other complex rules such as minimum area, minimum edge and notch rules.
4) Definitions of the vias that connect two adjacent metal layers, with their sizes and spacing requirements.
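The width and spacing rules listed above can be illustrated with a minimal check on one metal layer. This is a sketch under simplifying assumptions: all wires are horizontal rectangles, only minimum width and parallel-run spacing are checked, and the `Wire` and `LayerRule` classes and all rule values are hypothetical rather than taken from any foundry rule deck.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List


@dataclass(frozen=True)
class Wire:
    name: str
    y_center: float   # centre line of a horizontal wire
    width: float
    x_start: float
    x_end: float


@dataclass(frozen=True)
class LayerRule:
    min_width: float
    min_spacing: float


def check_layer(wires: List[Wire], rule: LayerRule) -> List[str]:
    """Return a list of minimum-width and minimum-spacing violations."""
    violations: List[str] = []
    for wire in wires:
        if wire.width < rule.min_width:
            violations.append(f"min_width: {wire.name}")
    for a, b in combinations(wires, 2):
        # Spacing only matters where the two wires run side by side.
        overlap = min(a.x_end, b.x_end) - max(a.x_start, b.x_start)
        if overlap <= 0:
            continue
        gap = abs(a.y_center - b.y_center) - (a.width + b.width) / 2
        if gap < rule.min_spacing:
            violations.append(f"min_spacing: {a.name}-{b.name}")
    return violations


if __name__ == "__main__":
    rule = LayerRule(min_width=0.05, min_spacing=0.05)
    layout = [
        Wire("n1", 0.00, 0.05, 0.0, 10.0),
        Wire("n2", 0.08, 0.05, 5.0, 15.0),   # too close to n1
        Wire("n3", 0.50, 0.04, 0.0, 3.0),    # too narrow
    ]
    print(check_layer(layout, rule))
```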

Routing is typically done in three steps for best control over QoR and runtime: global routing, track routing and detail routing.

#### 3. Results

There are no macros in the design, hence power is not an important factor at the floorplan stage.

| Design dimensions | Value      |
|-------------------|------------|
| Specify by        | Size       |
| Core size by      | Aspect ratio |
| Ratio (H/W)       | 1.89795918 |
| Core utilization  | 0.519325   |
| Cell utilization  | 0.519325   |
| Core width        | 150.528    |
| Core height       | 285.696    |
| Die width         | 158.528    |
| Die height        | 285.896    |

Figure 3.1: Design dimensions

The floorplan stage has several steps. First, the core area is created with the size {150.528 285.696} given in the design constraints, as shown in Fig. 3.1.
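The relationship between the core size, the aspect ratio and the utilization reported in Fig. 3.1 can be checked with a few lines of arithmetic. The sketch below uses the core width, height and utilization from the source; the function names are illustrative, and the length unit is not stated in the source, so areas are left in squared design units.

```python
import math
from typing import Tuple


def aspect_ratio(width: float, height: float) -> float:
    """Aspect ratio as height over width (H/W), as reported in Fig. 3.1."""
    return height / width


def cell_area(width: float, height: float, utilization: float) -> float:
    """Standard-cell area implied by the core area and its utilization."""
    return width * height * utilization


def core_dimensions(std_cell_area: float, utilization: float,
                    ratio_h_over_w: float) -> Tuple[float, float]:
    """Inverse problem: derive the core width and height from the cell area."""
    core_area = std_cell_area / utilization
    width = math.sqrt(core_area / ratio_h_over_w)
    return width, width * ratio_h_over_w


if __name__ == "__main__":
    w, h, util = 150.528, 285.696, 0.519325
    ratio = aspect_ratio(w, h)
    area = cell_area(w, h, util)
    print(round(ratio, 8))
    print(core_dimensions(area, util, ratio))
```

The computed ratio agrees with the H/W value of 1.89795918 in Fig. 3.1, which confirms that the core size in the text and the figure describe the same floorplan.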

Since there are no macros in the design, the next step is to add physical-only cells such as endcaps and welltaps.



Figure 3.2: Welltaps and Endcaps

Fig. 3.2 shows the insertion of welltaps and endcaps into the design. These cells maintain nwell continuity, and they are also placed around macros to reduce the congestion around them. The next step is routing the power grid: Fig. 3.3(a) shows how the power lines are routed and Fig. 3.3(b) shows how the different metal layers are connected using vias.



Figure 3.3: (a) Power routing (b) Connection of metal layers

Once power routing is completed, the next step is pin placement and adding the boundary blockage to the design. Fig. 3.4(a) shows the pin assignment and Fig. 3.4(b) shows the final floorplan with the boundary blockage added. Pin assignment is an important step that depends on the chip-level guidance about which side the block uses to communicate with other macros. When the signal estimator is assembled at the top level, it communicates with the macro on its top side, so all transmitting and receiving pins are placed at the top edge. Placing every pin at the top edge would increase congestion on that side and could cause DRC violations, so the remaining control pins are placed at the left edge.



Figure 3.4: (a) Pin assignment (b) Boundary blockage

The second stage of physical design is placement. Fig. 3.5 shows the optimization performed by the tool during placement. At the placement stage the tool completes two tasks: placement of the standard cells, and optimization such as fixing DRVs and setup violations. Fig. 3.5 clearly indicates that before placement and optimization there are many setup, max\_cap and max\_trans violations. Once placement is completed, all DRVs are fixed and the setup violations are fixed up to a certain level. The clock tree is not built yet, so the tool cannot fix all setup violations and cannot calculate hold slack without the actual clock.
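The WNS and TNS figures reported throughout the results can be derived from per-path slacks. The following minimal sketch assumes a plain list of endpoint slacks with hypothetical values; WNS is taken here as the smallest slack and TNS as the sum of the negative slacks, which is a common convention, although individual tools may report these slightly differently.

```python
from typing import Dict, List


def timing_summary(slacks: List[float]) -> Dict[str, float]:
    """Summarise endpoint slacks into WNS, TNS and path counts."""
    violating = [s for s in slacks if s < 0]
    return {
        "wns": min(slacks) if slacks else 0.0,   # worst (most negative) slack
        "tns": sum(violating),                   # total of all negative slacks
        "violating_paths": len(violating),
        "all_paths": len(slacks),
    }


if __name__ == "__main__":
    # Hypothetical slacks for four timing paths.
    print(timing_summary([0.120, -0.067, -0.010, 0.300]))
```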

| Setup mode: all | Initial  | Final  |
|-----------------|----------|--------|
| WNS             | -0.630   | -0.067 |
| TNS             | -263.944 | -7.703 |
| Violating paths |          | 500    |
| All paths       |          | 7300   |

| DRV type    | Initial: nets (terms) | Initial: worst violation | Final: nets (terms) | Final: worst violation |
|-------------|-----------------------|--------------------------|---------------------|------------------------|
| max\_cap    | 11 (11)               | -3.24                    | 0 (0)               | 0.000                  |
| max\_tran   | 297 (3718)            | -31.40                   | 0 (0)               | 0.000                  |
| max\_fanout | 0 (0)                 |                          | 0 (0)               | 0                      |
| max\_length | 0 (0)                 |                          | 0 (0)               | 0                      |

Figure 3.5: Initial and Final placement optimization report

Table 3.1 gives the complete details of the design after placement, such as block density, the cell count for each Vt class and the power numbers. Three experiments were completed with different multi-Vt cell counts. The results in Table 3.1 show that using multi-Vt cells reduces power. After placement the setup timing (WNS/TNS) looks good and is well within the threshold. The density is also low, which is important: if it were high, there could be routing congestion at later stages. The next row reports congestion: the routing overflow is less than 1% in both the vertical and horizontal directions. The last row of the table is the density map snapshot, which gives information about placement congestion. Dark blue indicates that the placement density in that Gcell area is higher than 50%; as the blue becomes lighter, the percentage of congestion is higher. Any Gcell that is 100% dense is shown in red.

Volume 6 Issue 8, August 2017 www.ijsr.net

#### Table 3.1: Placement Reports

| Parameter / experiment            | exp                                      | exp1                                     | exp2                                     |
|-----------------------------------|------------------------------------------|------------------------------------------|------------------------------------------|
| V4                                | 385                                      | 385                                      | 385                                      |
| V3                                | 21                                       | 4277                                     | 8369                                     |
| V2                                |                                          | 3046                                     | 14832                                    |
| V1                                | 40956                                    | 36005                                    | 19272                                    |
| Total power / Leakage power (mW)  | 173.6 / 31.38                            | 173.6 / 28.21                            | 147.7 / 15.04                            |
| WNS/TNS (ps)                      | -0.627 / -14.87                          | -0.067 / -7.703                          | -0.067 / -37.712                         |
| Density                           | 44.882%                                  | 45.852%                                  | 42.806%                                  |
| Congestion                        | Overflow: 48 = 9 (0.01% H) + 39 (0.03% V) | Overflow: 32 = 4 (0.00% H) + 28 (0.02% V) | Overflow: 47 = 3 (0.00% H) + 44 (0.03% V) |
| Density map                       |                                          |                                          |                                          |
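The effect of the Vt mix on leakage in Table 3.1 can be quantified with simple arithmetic. The sketch below uses the exp1 and exp2 cell counts and leakage values transcribed from the table; the helper names are illustrative, and the source does not state which Vt class is the leakiest, so the code only reports shares and the relative leakage change.

```python
from typing import Dict

# Values transcribed from Table 3.1 (placement stage).
VT_COUNTS: Dict[str, Dict[str, int]] = {
    "exp1": {"V4": 385, "V3": 4277, "V2": 3046, "V1": 36005},
    "exp2": {"V4": 385, "V3": 8369, "V2": 14832, "V1": 19272},
}
LEAKAGE_MW: Dict[str, float] = {"exp1": 28.21, "exp2": 15.04}


def vt_share(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of the total cell count that falls in each Vt class."""
    total = sum(counts.values())
    return {vt: n / total for vt, n in counts.items()}


def relative_reduction(before: float, after: float) -> float:
    """Relative reduction, e.g. 0.25 means 25% lower than `before`."""
    return (before - after) / before


if __name__ == "__main__":
    for name, counts in VT_COUNTS.items():
        shares = {vt: round(s, 3) for vt, s in vt_share(counts).items()}
        print(name, shares)
    print(round(relative_reduction(LEAKAGE_MW["exp1"], LEAKAGE_MW["exp2"]), 3))
```

Between exp1 and exp2 the share of V1 cells falls while the placement-stage leakage drops by roughly 47%, which is consistent with the paper's statement that the multi-Vt mix reduces power.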

The next stage is CTS. The inputs to clock tree building are given below. Table 3.2 gives the complete report of the clock tree building stage. A buffered clock tree is built in the design, which reduces the clock tree power, and clock gating is adopted to reduce power further. The experiment in the highlighted column gave higher power numbers, so it is not carried forward to the remaining stages. After CTS completes, optimization (timing fixing) takes place; Table 3.3 and Table 3.4 give the reports of the timing optimization. In the two optimization steps the tool first tries to fix the setup violations, then the DRVs and finally the hold violations.

| CTS input                   | Value           |
|-----------------------------|-----------------|
| 1. CTS                      | concurrent      |
| 2. max fanout               | 55              |
| 3. max slew                 | 100 & 60        |
| 4. max skew                 | 100             |
| 5. buffer depth             | 20              |
| 6. max\_trans               | 80              |
| 7. preferred routing layers | M8 and M9       |
| 8. NDR                      | double\_isolate |

#### Table 3.2: CTS Reports

| experiment                        | exp           | exp1            | exp2<br>slew 100              | exp3<br>slew 60                |
|-----------------------------------|---------------|-----------------|-------------------------------|--------------------------------|
| Total power / Leakage power (mW)  | 157.8 / 31.57 | 124.3 / 19.8    | 130.3 / 12.68                 | 101.6 / 10.94                  |
| WNS/TNS (ps)                      | 0 / 0         | -0.041 / -1.180 | -0.041 / -7.820<br>(801/1084) | -0.016 / -8.554<br>(1700/1168) |
| Density                           | 45.145%       | 46.191%         | 42.878%                       | 41.436%                        |

#### Table 3.3: Set up Optimization Reports

| OptMode<br>PowerEffort /<br>Slew | Total power /<br>Leakage<br>power<br>(mW) | WNS/TNS<br>(ps)     | Vt cell report                        | density |
|----------------------------------|-----------------------------------------|---------------------|---------------------------------------|---------|
| exp1                             | 146.2 / 19.11                           | -0.066 /<br>-2.498  | V3 : 11991<br>V2 : 9362<br>V1 : 21753 | 46.072% |
| exp2<br>Slew 100                 | 123.8 / 7.713                           | -0.037 /<br>-22.277 | V3 : 19911<br>V2 : 15151<br>V1 : 7279 | 42.840% |
| exp3<br>Slew 60                  | 118.8/7.766                             | -0.024 /<br>-22.291 | V3 : 18685<br>V2 : 14516<br>V1 : 8249 | 41.490% |

#### Table 3.4: Hold Optimization Reports

| OptMode<br>PowerEffort<br>/ Slew  | Total power /<br>Leakage<br>power (mW) | WNS/TNS<br>(ps)<br>setup | WNS/TNS<br>(ps)<br>hold | Vt cell<br>report                     | density  |
|-----------------------------------|-------------------------------------|--------------------------|-------------------------|---------------------------------------|----------|
| exp1                              | 151 / 18.92                         | -0.046 /<br>-7.861       | -0.029 /<br>-4.124      | V3 : 18695<br>V2 : 9410<br>V1 : 13450 | 50.302%  |
| exp2<br>Slew 100                  | 128.9 /<br>7.948                    | -0.042 /<br>-26.917      | -0.037 /<br>-6.092      | V3 : 25784<br>V2 : 15166<br>V1 : 7279 | 46.727%  |
| exp3<br>Slew 60                   | 122.8 /<br>7.969                    | -0.024 /<br>-24.013      | -0.072 /<br>-5.461      | V3 : 23192<br>V2 : 14512<br>V1 : 8275 | 44.540 % |

Routing is the final stage of physical design; Table 3.5 shows the final routing reports. Compared to the previous stage, the results are worse in all aspects because at the routing stage the tool uses the actual RC values, whereas the previous stages use ideal values.

#### Table 3.5: Routing Reports

| OptMode<br>PowerEffort<br>/ Slew | Total power /<br>Leakage<br>power<br>(mW) | (setup)<br>WNS/TNS<br>(ps)   | (hold)<br>WNS/TNS<br>(ps)  | Vt cell<br>report                      | density  |
|----------------------------------|-----------------------------------------|------------------------------|----------------------------|----------------------------------------|----------|
| exp2 100                         | 138.3 /<br>11.37                        | -0.064 /<br>-12.403<br>(619) | -0.038 /<br>-0.318<br>(47) | V3 : 27504<br>V2 : 13556<br>V1 : 10566 | 48.632 % |
| exp3 60                          | 134 / 11.99                             | -0.039 /<br>-8.411<br>(773)  | -0.071 /<br>-0.0579<br>(5) | V3 : 25502<br>V2 : 12438<br>V1 : 11938 | 46.877 % |
| exp2 100                         | 136.2 /<br>9.986                        | -0.027 /<br>-3.962<br>(2078) | -0.038 /<br>-0.318<br>(12) | V3 : 28374<br>V2 : 14099<br>V1 : 9184  | 48.632 % |


Final timing is checked using the Tempus tool. Fig. 3.6 shows the number of violations for the different path groups, together with WNS and TNS, for both the setup and the hold reports.

Figure 3.6: Tempus timing summary (setup and hold reports)

The physical design of the signal estimator block was completed. The floorplan was done in a way that is helpful at the chip level, and pin placement was completed as per the top-level guidance. Scan chain reordering was done to reduce placement and routing congestion. Buffering and cloning were done for cells placed far apart and for cells with high fan-out. Crosstalk and noise were reduced by maintaining a larger spacing between nets that run in parallel over a long distance. This project adopted power optimization techniques such as the use of multi-threshold cells and reducing the slew rate, which effectively reduced power consumption.

#### 4. Conclusion

The approach proposed in this paper completes the physical design of the signal estimator block with a block leakage power of 9.986 mW, which is a 33% improvement in leakage power compared to the previous design. This paper also identifies a few power optimization techniques that can be used at the physical design stage.
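The relative improvement quoted above follows the usual formula below. The source does not state the baseline leakage value; the second line assumes, purely for illustration, that the baseline is the placement-stage leakage of exp2 in Table 3.1 (15.04 mW). This baseline is an assumption and not a statement by the authors.

$$
\text{improvement} = \frac{P_{\text{baseline}} - P_{\text{final}}}{P_{\text{baseline}}} \times 100\%
$$

$$
\frac{15.04 - 9.986}{15.04} \times 100\% \approx 33.6\%
$$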

# References

- [1] Vazgen Melikyan, Eduard Babayan, Anush Melikyan, Davit Babayan, Poghos Petrosyan, Edvard Mkrtchyan, "Clock gating and multi-VTH low power design methods based on 32/28 nm ORCA processor", East-West Design & Test Symposium (EWDTS), IEEE, 2015.
- [2] www.mentor.com/dsm
- [3] V. Melikyan, E. Babayan, "Low power design of digital ICs (based on SAED 90nm Educational Design Kit)", in Instructional guidelines for advanced laboratory works, pp. 80, 2012.
- [4] Ashutosh Gupta, Kiran Rawat, Sujata Pandey, Pradeep Kumar, Saket Kumar, H. P. Singh, "Physical Design Implementation of 32-bit AMBA ASB APB module with improved performance", International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), 2016.
- [5] Seongbo Shim, Woohyun Chung, Youngsoo Shin, "Lithography Defect Probability and Its Application to Physical Design Optimization", IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
- [6] Atsushi Kurokawa, Takashi Sato, Toshiki Kanamoto, Masanori Hashimoto, "Interconnect Modeling: A Physical Design Perspective", IEEE Transactions on Electron Devices, vol. 56, no. 9, September 2009.