# Low Power Variable Latency Multiplier With AH Logic

## Roobitha Nujum<sup>1</sup>, Jini Cheriyan<sup>2</sup>

<sup>1</sup>P G Scholar, VLSI and Embedded Systems, Department of ECE, T K M Institute of Technology, Kollam, India

<sup>2</sup>Assistant Professor, Department of ECE, T K M Institute of Technology, Kollam, India

Abstract: Low power design is an important part of VLSI system design. Digital multipliers are among the most critical functional units of digital filters, and the overall performance of a digital filter depends on the throughput of its multipliers. Transistor aging has a significant effect on the performance of these systems, and in the long term the system may fail due to delay problems. The aging effect can be tolerated by using over-design approaches, but these approaches lead to area and power inefficiency. Moreover, timing violations occur when fixed latency designs are used. Hence, to reduce timing violations and to ensure reliable operation under the aging effect, a low power variable latency multiplier with adaptive hold logic is used. This multiplier design can be applied to a digital filter to enhance its performance. The design is coded in VHDL, synthesized using Xilinx ISE, and simulated using Model-Sim.

Keywords: Adaptive Hold Logic (AHL), Bypassing Technique, Negative-Bias Temperature Instability (NBTI), Variable Latency, FIR Filter Design

### 1. Introduction

Filters are widely used in signal processing and communication systems. Digital finite impulse response (FIR) filters are the basic building block of many digital signal processing systems. In signal processing, the function of a filter is to remove unwanted parts of the signal, such as random noise, or to extract useful parts of the signal, such as the components lying within a certain frequency range. The main uses of digital FIR filters are filtering out undesirable parts of a signal, shaping the spectrum of signals in communication channels, and detecting or analyzing signals in radar applications.

An analog filter uses analog electronic circuits made up from components such as resistors, capacitors and op-amps to produce the required filtering effect. Such filter circuits are widely used in such applications as noise reduction, video signal enhancement, graphic equalizers in hi-fi systems, and many other areas. A digital filter uses a digital processor to perform numerical calculations on sampled values of the signal. The processor may be a general-purpose computer, or a specialized DSP (Digital Signal Processor) chip. Digital filters are more advantageous when compared with analog ones. Digital filters are easily designed, tested and implemented on a general-purpose computer or workstation.

Multiplication is an essential arithmetic operation for common DSP applications, such as filtering and fast Fourier transform (FFT). To achieve high execution speed, parallel array multipliers are widely used. These multipliers tend to consume most of the power in DSP computations, and thus power-efficient multipliers are very important for the design of low-power DSP systems. The throughput of digital filter systems depends on these multipliers, and if the multipliers are too slow, the performance of entire circuits will be reduced.
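To make the role of the multiplier concrete, the following minimal Python sketch computes the output of a direct-form FIR filter, $y[n] = \sum_k h[k]\,x[n-k]$. The function name `fir_filter` and the sample coefficients are illustrative assumptions and are not taken from the design described in this paper. The sketch only shows that every output sample needs one multiplication per filter tap.

```python
def fir_filter(coeffs, samples):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * samples[n - k].

    Samples before the start of the input are treated as zero.
    """
    output = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                # One multiply-accumulate per tap: this is the operation
                # that the hardware multiplier has to perform.
                acc += h * samples[n - k]
        output.append(acc)
    return output


# Hypothetical 3-tap filter applied to a short integer sequence.
print(fir_filter([1, 2, 1], [1, 0, 0, 2]))
```

Because the inner multiplication is executed for every tap and every sample, the delay and power of the multiplier directly bound the throughput and power of the filter.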

Furthermore, negative bias temperature instability (NBTI) occurs when a pMOS transistor is under negative bias, which results in the aging effect. The aging effect degrades transistor speed by increasing the threshold voltage, which results in delay problems in real-time operation. The corresponding effect on an nMOS transistor is positive bias temperature instability (PBTI), which occurs when an nMOS transistor is under positive bias. Traditional methods to reduce this aging effect are area and power inefficient [1].

Traditional circuits are based on fixed-latency design. In a fixed-latency design, the critical path delay is used as the overall circuit clock cycle so that the circuit operates correctly. However, the probability that the critical paths are activated is low. For the noncritical paths, using the critical path delay as the overall cycle period results in significant timing waste. Hence, the variable-latency design was proposed to reduce the timing waste of traditional circuits. In a variable-latency design, shorter paths are executed within one cycle and longer paths within two or more cycles. When shorter paths are activated frequently, the average latency of a variable-latency design is better than that of a fixed-latency design [1].
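The latency argument can be illustrated with a small Python sketch. It assumes that an operation takes either one short cycle or `long_cycles` short cycles, and that the fraction of one-cycle operations is known. The function name `average_latency` and all numbers are hypothetical and are not measurements from this work.

```python
def average_latency(p_one_cycle, short_period, long_cycles=2):
    """Average latency per operation of a variable-latency design.

    p_one_cycle  -- fraction of operations that finish in one cycle
    short_period -- shortened clock period of the variable-latency design
    long_cycles  -- cycles spent on an operation that activates a long path
    """
    if not 0.0 <= p_one_cycle <= 1.0:
        raise ValueError("p_one_cycle must be a probability")
    one_cycle_time = short_period
    multi_cycle_time = long_cycles * short_period
    return p_one_cycle * one_cycle_time + (1.0 - p_one_cycle) * multi_cycle_time


# Hypothetical numbers (assumptions, not results from this paper).
critical_path_ns = 10.0  # cycle period a fixed-latency design must use
short_period_ns = 6.0    # cycle period chosen for the variable-latency design
avg_ns = average_latency(p_one_cycle=0.8, short_period=short_period_ns)
print(f"fixed: {critical_path_ns} ns, variable (average): {avg_ns:.2f} ns")
```

Under these assumptions the variable-latency design wins whenever the fraction of one-cycle operations exceeds 2 − (critical path delay / short period), which is about 0.33 for the numbers above.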

The main objective of this work is to design a digital filter using a low power variable latency multiplier with AH logic. The low power variable latency multiplier is designed to ensure minimum performance degradation. The code is synthesized with the Xilinx ISE Design Suite and simulated using the Model-Sim simulator.

### 2. Related Works

Much work has been carried out in the field of low power digital FIR filter design. A digital FIR filter based on a bypassing multiplier was proposed earlier [2]. However, bypassing multiplier designs are not based on the variable-latency technique and do not consider the delay due to the aging effect.

The NBTI mechanism can be explained using the reaction-diffusion (RD) model. In the RD model, NBTI is described in two phases: a stress phase and a recovery phase. In the stress phase (Vgs = -Vdd), the interaction between inversion layer holes and hydrogen-passivated Si atoms breaks the Si–H bonds generated during the oxidation process, generating H or H2 molecules. When these molecules diffuse away, interface traps are left. The interface traps accumulated at the interface between the silicon and the gate oxide result in an increased threshold voltage (*V*th), reducing the circuit switching speed. When the bias voltage is removed, the reverse reaction occurs, reducing the NBTI effect. However, the reverse reaction does not eliminate all the interface traps generated during the stress phase, and *V*th increases in the long term. Hence, it is important to design a reliable high-performance multiplier.

International Journal of Science and Research (IJSR), ISSN (Online): 2319-7064, Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438

Volume 4 Issue 2, February 2015, www.ijsr.net

Hot carrier injection (HCI) is a phenomenon in solid-state electronic devices where an electron or a "hole" gains sufficient kinetic energy to overcome a potential barrier necessary to break an interface state. The term "hot" refers to the effective temperature used to model carrier density, not to the overall temperature of the device. Since the charge carriers can become trapped in the gate dielectric of a MOS transistor, the switching characteristics of the transistor can be permanently changed. Hot carrier injection is one of the mechanisms that adversely affect the reliability of semiconductor solid-state devices. Methods to reduce NBTI and HCI include delay guard-banding and over-design approaches [8]. However, these approaches are area and power inefficient, as they require additional circuitry.

An array multiplier is an efficient layout of a combinational parallel multiplier. The multiplication of two binary numbers can be performed in one micro-operation by a combinational circuit that forms all the product bits at once. This makes it a fast way of multiplying two numbers, since the only delay is the time for the signals to propagate through the gates that form the multiplication array. Structurally, this multiplier is based on add and shift operations. Each partial product is generated from the multiplicand and one bit of the multiplier. The partial products are then added by carry-save addition, and the final product is obtained using a fast adder.

In the carry-save addition method, the first row can be designed with either half adders or full adders. Each partial product bit is formed by multiplying one bit from X with one bit from Y. If the first row of partial products is implemented with full adders, the third input, $C_{in}$, is set to '0'. The carry of each full adder is forwarded diagonally to the next row of adders. The resulting multiplier is called a carry-save multiplier, because the carry bits are not added immediately but are saved for the next stage. The basic idea is to implement the design with full adders only. Hence, if a full adder has only two input data bits at any stage, the third input is set to zero. In the final stage, the carries and sums are merged in a carry-propagate adder stage (e.g. ripple-carry or carry-lookahead). This is the conventional array multiplier with carry-save addition (CSA).
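The carry-save scheme can be checked with a bit-level Python sketch. This is a behavioral model under stated assumptions, not the VHDL used in this work: the names `full_adder` and `carry_save_multiply` are hypothetical, cells are indexed by bit weight rather than by their position in the layout, and a ripple-carry adder is assumed for the final carry-propagate stage.

```python
def full_adder(x, y, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    total = x + y + cin
    return total & 1, total >> 1


def carry_save_multiply(a, b, n):
    """Multiply two n-bit unsigned integers with a carry-save adder array."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]

    # Row 0: the partial products a_i AND b_0; no carries exist yet.
    sums = [a_bits[i] & b_bits[0] for i in range(n)]
    carries = [0] * n
    product_bits = [sums[0]]  # product bit 0 is already final

    # Rows 1..n-1: each cell adds its partial product, the sum bit of the
    # next-higher cell in the previous row, and the saved carry of the cell
    # with the same index in the previous row (all three share one weight).
    for j in range(1, n):
        new_sums, new_carries = [], []
        for i in range(n):
            partial = a_bits[i] & b_bits[j]
            sum_in = sums[i + 1] if i + 1 < n else 0
            s, c = full_adder(partial, sum_in, carries[i])
            new_sums.append(s)
            new_carries.append(c)
        sums, carries = new_sums, new_carries
        product_bits.append(sums[0])  # product bit j is now final

    # Final stage: merge the remaining sums and saved carries with a
    # ripple-carry (carry-propagate) adder.
    carry = 0
    for i in range(n):
        sum_in = sums[i + 1] if i + 1 < n else 0
        s, carry = full_adder(sum_in, carries[i], carry)
        product_bits.append(s)

    return sum(bit << k for k, bit in enumerate(product_bits))


# Exhaustive check against integer multiplication for 4-bit operands.
assert all(carry_save_multiply(a, b, 4) == a * b
           for a in range(16) for b in range(16))
```

Only the final loop propagates a carry from bit to bit; inside the array every carry is handed to the next row, which is what keeps the array regular.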

Bypassing multipliers are modifications of the normal array multiplier. The bypassing method reduces dynamic power consumption when the input data contain many zeros [3]. In the column-bypassing multiplier, the path delay of an operation is strongly tied to the number of zeros in the multiplicand. Traditional filter designs using bypassing multipliers do not consider the variable-latency technique. Moreover, no digital FIR filter has yet been developed that uses a variable-latency multiplier which considers the aging effect and can adjust dynamically.

### 3. Methodology

The performance of digital filters depends on the throughput of their multipliers. The primary objective is power reduction with small area and delay overhead. The low power variable latency multiplier with AH logic is a multiplier whose associated AHL circuit adjusts the circuit when timing violations occur, so as to ensure minimum performance degradation.

#### 3.1 Low Power Variable Latency Multiplier With AH Logic

The block diagram of the low power variable latency multiplier with AH logic is shown in Figure 1. It includes two m-bit inputs (m is a positive integer), one 2m-bit output, one column-bypassing multiplier, 2m 1-bit Razor flip-flops, and an AHL circuit. The clock is supplied through the AND gate at the input.

The variable latency multiplier works as follows. When an input pattern arrives, the column-bypassing multiplier and the AHL circuit execute simultaneously. Depending on the number of zeros in the multiplicand, the AHL circuit decides the number of clock cycles required for the current input pattern. If the input pattern requires two cycles to complete, the AHL outputs 0 to disable the clock signal of the input flip-flops. Otherwise, the AHL outputs 1 for normal operation.
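The cycle decision described above can be sketched in Python as follows. The sketch assumes that more zeros in the multiplicand mean more bypassed adders and therefore a shorter path, so a pattern is treated as one-cycle when its zero count reaches a threshold. The threshold value and the names `count_zeros`, `cycles_required` and `ahl_output` are hypothetical; the paper does not state the threshold that was used.

```python
def count_zeros(value, width):
    """Number of 0 bits in the `width`-bit unsigned representation of value."""
    masked = value & ((1 << width) - 1)
    return width - bin(masked).count("1")


def cycles_required(multiplicand, width, zero_threshold):
    """Predict the cycle count: 1 if enough multiplicand bits are zero, else 2.

    zero_threshold is an assumed design parameter, not a value from the paper.
    """
    return 1 if count_zeros(multiplicand, width) >= zero_threshold else 2


def ahl_output(multiplicand, width, zero_threshold):
    """AHL output: 1 for normal operation, 0 to hold the input flip-flops."""
    return 1 if cycles_required(multiplicand, width, zero_threshold) == 1 else 0


# Hypothetical 4-bit example with an assumed threshold of two zeros.
for multiplicand in (0b1010, 0b1111):
    print(f"{multiplicand:04b}",
          cycles_required(multiplicand, 4, 2),
          ahl_output(multiplicand, 4, 2))
```

With this assumed threshold, the multiplicand 1010 is classified as a one-cycle pattern and 1111 as a two-cycle pattern.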



Figure 1: Low Power Variable Latency Multiplier With AH Logic

When the column bypassing multiplier completes the operation, the result will be passed to the Razor flip-flops. The Razor flip-flops check for any path delay timing violations. If timing violations occur, it means that the cycle period is not long enough for the current operation to complete and that the execution result of the multiplier is incorrect. Thus, the Razor flip-flops will output an error to inform the system that the current operation needs to be re-executed using two cycles to ensure the operation is correct.

Licensed Under Creative Commons Attribution CC BY

#### 3.2 Column Bypassing Multiplier



Figure 2: Column Bypassing Multiplier

A column-bypassing multiplier is a modified array multiplier. It is an excellent candidate for the variable-latency design, since we can simply examine the number of zeros in the multiplicand to predict whether an operation requires one cycle or two cycles to complete. A column-bypassing multiplier is designed so that the FA operations are disabled if the corresponding bit in the multiplicand is 0. Figure 2 shows a 4x4 column-bypassing multiplier. Supposing the inputs are 1010 × 1111, it can be seen that for the full adders (FAs) in the first and third diagonals, two of the three input bits are 0: the carry bit from the upper right FA and the partial product $a_i b_i$. Therefore, the carry output of the adders in both diagonals is 0, and the output sum bit is simply equal to the third input bit, which is the sum output of the upper FA.

Hence, the FA is modified by adding two tristate gates and one multiplexer. The multiplicand bit $a_i$ is used as the selector of the multiplexer to decide the output of the FA, and $a_i$ is also used as the selector of the tristate gates to turn off the input paths of the FA. If $a_i$ is 0, the inputs of the FA are disabled, and the sum bit of the current FA is equal to the sum bit from its upper FA, thus reducing the power consumption of the multiplier. If $a_i$ is 1, the normal sum result is selected.
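The behavior of the modified cell can be sketched in Python. The sketch assumes that the carry output of a bypassed cell is forced to 0, in line with the description above. The name `bypass_full_adder` is hypothetical, and the tristate gates are modeled simply as "do not evaluate the adder".

```python
def full_adder(x, y, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    total = x + y + cin
    return total & 1, total >> 1


def bypass_full_adder(a_i, partial, sum_in, carry_in):
    """Column-bypassing FA cell, selected by the multiplicand bit a_i.

    a_i == 0: the adder inputs are disabled and the multiplexer forwards
              the sum bit of the upper FA; the carry output is taken as 0
              (a modeling assumption).
    a_i == 1: the normal full-adder result is selected.
    """
    if a_i == 0:
        return sum_in, 0
    return full_adder(partial, sum_in, carry_in)


# When a_i is 0, the partial product a_i AND b_j is 0 and, as described
# above, the incoming carry is 0. Check that bypassing then gives the same
# result as evaluating the full adder.
for b_j in (0, 1):
    for sum_in in (0, 1):
        partial = 0 & b_j
        assert bypass_full_adder(0, partial, sum_in, 0) == full_adder(partial, sum_in, 0)
```

The check confirms that skipping the adder changes only its switching activity and not its result, which is why the bypass saves power without affecting the product.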

#### 3.3 The Razor Flip-Flop

Figure 3 shows the block diagram of the Razor flip-flop. The input to the Razor flip-flops is the 2m-bit output of the column-bypassing multiplier, with one 1-bit Razor flip-flop used for each output bit. A 1-bit Razor flip-flop contains a main flip-flop, a shadow latch, an XOR gate and a MUX. The main flip-flop captures the execution result of the combinational circuit using the normal clock signal, and the shadow latch captures the execution result using a delayed clock signal, which lags the normal clock signal. If the bit latched by the shadow latch differs from that of the main flip-flop, the path delay of the current operation exceeds the cycle period, and the main flip-flop has captured an incorrect result. When an error occurs, the Razor flip-flop sets the error signal to 1 to notify the system to re-execute the operation and to notify the AHL circuit that an error has occurred. Razor flip-flops are thus used to detect whether an operation that is considered to be a one-cycle pattern can really finish in one cycle. If not, the operation is re-executed using two cycles.
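The error detection can be illustrated with a simple timing model in Python. It assumes that an output bit holds its old value until the path delay has elapsed and then switches to the new value, and that the result has settled by the time the shadow latch samples it. The function `razor_sample` and all timing values are hypothetical.

```python
def razor_sample(old_bit, new_bit, path_delay, clock_period, shadow_delay):
    """Model one 1-bit Razor flip-flop for a single operation.

    The main flip-flop samples at clock_period; the shadow latch samples
    shadow_delay later. Returns (main_bit, shadow_bit, error).
    """
    # Each storage element sees the new value only if the path has settled.
    main_bit = new_bit if path_delay <= clock_period else old_bit
    shadow_bit = new_bit if path_delay <= clock_period + shadow_delay else old_bit
    # The XOR gate flags a mismatch between the two captured values.
    error = main_bit ^ shadow_bit
    return main_bit, shadow_bit, error


# Hypothetical timing values in nanoseconds.
print(razor_sample(0, 1, path_delay=3.0, clock_period=4.0, shadow_delay=2.0))
print(razor_sample(0, 1, path_delay=5.0, clock_period=4.0, shadow_delay=2.0))
```

In the second call the path delay exceeds the clock period, so the main flip-flop keeps the stale value while the shadow latch captures the correct one, and the error flag is raised. A bit whose value does not change cannot reveal a violation in this model.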



Figure 3: The Razor Flip-Flop

#### 3.4 Adaptive Hold Logic

The adaptive hold logic (AHL) circuit is the key component of the variable-latency multiplier. The block diagram of the adaptive hold logic is shown in Figure 4. The AHL circuit contains a decision block, a MUX and a D flip-flop. If the cycle period is too short, the column-bypassing multiplier is not able to complete its operations in time, causing timing violations. These timing violations are caught by the Razor flip-flops, which generate error signals. If errors happen frequently, the circuit has suffered significant timing degradation due to the aging effect.



The AHL circuit operates as follows. When an input pattern arrives, the decision block decides whether the pattern requires one cycle or two cycles to complete and passes both results to the multiplexer. The multiplexer selects one of the two results based on the output of the Razor flip-flops. An *OR* operation is then performed between the output of the multiplexer and the *Qbar* signal to determine the input of the D flip-flop. When the pattern requires one cycle, the output of the multiplexer is 1. The !(gating) signal becomes 1, and the input flip-flops latch new data in the next cycle. On the other hand, when the output of the multiplexer is 0, which means that the input pattern requires two cycles to complete, the *OR* gate outputs 0 to the D flip-flop. The !(gating) signal is therefore 0, which disables the clock signal of the input flip-flops in the next cycle.
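The hold behavior of the OR gate and D flip-flop can be traced with a cycle-level Python sketch. It assumes that the D flip-flop output Q is the !(gating) signal, that Q starts at 1, and that the multiplexer output for each pattern is already known (1 for a one-cycle pattern, 0 for a two-cycle pattern). The function `simulate_ahl_gating` is a hypothetical name, and the decision block and multiplexer themselves are not modeled.

```python
def simulate_ahl_gating(mux_outputs):
    """Cycle-level trace of the AHL OR gate and D flip-flop.

    mux_outputs -- one entry per input pattern: 1 = one-cycle pattern,
                   0 = two-cycle pattern.
    Returns (total_cycles, trace), where trace lists the !(gating) value
    produced at each clock edge.
    """
    q = 1          # assumed reset state: input flip-flops are enabled
    index = 0      # pattern currently held in the input flip-flops
    trace = []
    while index < len(mux_outputs):
        mux_out = 1 if mux_outputs[index] else 0
        d = mux_out | (1 - q)   # OR of the multiplexer output and Qbar
        q = d                   # D flip-flop captures d at the clock edge
        trace.append(q)
        if q == 1:
            index += 1          # input flip-flops latch the next pattern
        # q == 0: the clock is gated and the same pattern is held
    return len(trace), trace


# One-cycle, two-cycle, one-cycle patterns.
print(simulate_ahl_gating([1, 0, 1]))
```

For a two-cycle pattern, Q drops to 0 for one cycle, which makes Qbar 1 and forces Q back to 1 on the following edge, so the input is held for exactly one extra cycle.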

### 4. Results and Discussion

The design is modeled in VHDL using Xilinx ISE Design Suite 8.1, with the project targeted to the Spartan 2E Starter Kit Board, to obtain the synthesis report. The design is simulated using Model-Sim SE 6.3f from Mentor Graphics to validate its functionality. Structural models of the fixed latency multiplier and of the low power variable latency multiplier with AH logic were developed in 4x4, 16x16 and 32x32 sizes. The low power variable latency multiplier with AH logic contains a column bypassing multiplier, Razor flip-flops and an adaptive hold logic module. The results for the column bypassing based multipliers are shown in Table 1.

Table 1: Area, Power & Delay Comparison

| Multiplier | Area (Gate Count) | Power (mW) | Delay (ns) |
|------------|-------------------|------------|------------|
| Fixed Latency 4x4 Column Bypassing Multiplier | 590 | 30 | 5.522 |
| Variable Latency 4x4 Column Bypassing Multiplier With AH Logic | 590 | 29 | 3.334 |
| Fixed Latency 16x16 Column Bypassing Multiplier | 12,826 | 71 | 6.901 |
| Variable Latency 16x16 Column Bypassing Multiplier With AH Logic | 8,199 | 44 | 4.107 |
| Fixed Latency 32x32 Column Bypassing Multiplier | 53,488 | 231 | 6.199 |
| Variable Latency 32x32 Column Bypassing Multiplier With AH Logic | 32,676 | 110 | 3.683 |

In the 32x32 multiplier, the area of the variable latency column bypassing multiplier is around 20% less than that of the fixed latency column bypassing multiplier. In the 16x16 multiplier, the area of the variable latency column bypassing multiplier is 40% less than that of the fixed latency one. In the case of the 4x4 multiplier, the area of the variable latency column bypassing multiplier is the same as that of the fixed latency one. The AHL and the Razor flip-flops both occupy a smaller share of the area in larger multipliers.

To make the comparison fair, the power of the fixed latency column bypassing multiplier includes the power of the flip-flops at the input and output, and the power of the variable latency column bypassing multiplier includes the power of the flip-flops at the input and the power of the Razor flip-flops at the output. In the 32x32 multiplier, the power of the variable latency column bypassing multiplier is around 15% less than that of the fixed latency one. In the 16x16 multiplier, it is 27% less, and in the 4x4 multiplier it is 1% less. A conventional variable-latency multiplier has higher power than the fixed-latency multiplier because of its more complicated circuitry. However, with the modified framework, the proposed variable-latency multiplier still consumes less power than the fixed-latency multiplier, because it uses both clock gating and the bypassing power reduction technique.

Similarly, in the case of delay, variable latency based multipliers have less delay than fixed latency ones. Based on the comparison of circuit area, power and delay, the 16x16 variable latency based multiplier with AH logic is the most efficient one and the 4x4 variable latency based multiplier with AH logic is the least efficient one. Hence, the 16x16 variable latency multiplier with AH logic can be applied to digital FIR filter design to enhance its performance.

### 5. Conclusion

Low power utilization is the most important criterion for a high-performance DSP system. A high-performance system can be achieved by reducing the dynamic power, which in turn reduces the total power dissipation. The low power variable-latency multiplier design uses the adaptive hold logic to mitigate performance degradation due to delay problems. The variable-latency design minimizes the timing waste of the noncritical paths. The Razor flip-flops detect timing violations, and the affected operations are re-executed using two cycles. The variable-latency design can adjust the number of clock cycles required by each input pattern to minimize performance degradation. Hence, variable latency multipliers show less performance degradation than traditional fixed latency multipliers, which need to consider the degradation caused by the NBTI effect and use the worst-case delay as the cycle period. Therefore, the performance of digital FIR filters using the low power variable latency multiplier with AH logic can be enhanced through reduced delay, area and power.

### References

- [1] Ing-Chao Lin, Yu-Hung Cho, and Yi-Ming Yang, "Aging-Aware Reliable Multiplier Design With Adaptive Hold Logic", IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
- [2] Prabhu E, Mangalam H, Saranya K, "Design of Low Power Digital FIR Filter based on Bypassing Multiplier", International Journal of Computer Applications (0975 – 8887), Vol. 70, No. 9, May 2013.
- [3] J. Sudha Rani, D. Ramadevi, B. Santhosh Kmar, Jeevan Reddy K, "Design of Low Power Column bypass Multiplier using FPGA", IOSR Journal of VLSI and Signal Processing (IOSR-JVSP), Vol. 1, Issue 3, Nov.-Dec. 2012.
- [4] Yu-Shih Su, Da-Chung Wang, Shih-Chieh Chang, and Malgorzata Marek-Sadowska, "Performance Optimization Using Variable-Latency Design Style", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 19, No. 10, October 2011.
- [5] N. Ravi, T. S. Rao, T. J. Prasad, "Performance Evaluation of Bypassing Array Multiplier with Optimized Design", International Journal of Computer Applications, Vol. 28, No. 5, August 2011.
- [6] K.-C. Wu and D. Marculescu, "Aging-aware timing analysis and optimization considering path sensitization", in Proc. DATE, 2011.
- [7] Yiran Chen, Hai Li, Cheng-Kok Koh, Guangyu Sun, Jing Li, Yuan Xie, and Kaushik Roy, "Variable-Latency Adder (VL-Adder) Designs for Low Power and NBTI Tolerance", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 18, No. 11, November 2010.
- [8] Bipul C. Paul, Kunhyuk Kang, Haldun Kufluoglu, "Impact of NBTI on the Temporal Performance Degradation of Digital Circuits", IEEE Electron Device Letters, Vol. 26, No. 8, August 2005.
- [9] J. Ohban, V. G. Moshnyaga, and K. Inoue, "Multiplier energy reduction through bypassing of partial products", in Proc. APCCAS, 2002.