Design and Implementation of 64-Bit Multiplier Using CLAA and CSLA

Shaik Meerabi¹, Krishna Prasad Satamraju²

Department of Electronics and Communication Engineering, VVIT, Nambur, Guntur, A.P, India

Abstract: This paper presents a comparative study of a Carry Look-Ahead Adder (CLAA) based 64-bit unsigned integer multiplier and a Carry Select Adder (CSLA) based 64-bit unsigned integer multiplier. Multiplication is a fundamental operation in most signal processing algorithms. Multipliers occupy a large area, have long latency and consume considerable power, so there is a need for a multiplier that consumes less power. Moreover, the efficiency of a digital system is generally determined by the performance of its multiplier, because the multiplier is generally the slowest element in the system and consumes the most area. Hence, optimizing the area and delay of the multiplier is a major design issue. The Carry Select Adder (CSLA) is one of the fastest adders and is used in many applications to perform fast arithmetic functions. The structure of the CSLA leaves scope for reducing area and delay by using Common Boolean Logic (CBL). This work evaluates the performance of the proposed designs in terms of area, delay, and power. The power dissipation is the same for both the CLAA based multiplier and the CSLA based multiplier, but the area delay product of the modified CSLA based multiplier is reduced by about 6% when compared to the CLAA based multiplier. The multipliers are simulated and synthesized using ModelSim 6.4b and Xilinx 10.1.

Keywords: CLAA, CSLA, CBL, Delay, Area

1. Introduction

Digital computer arithmetic is an aspect of logic design whose objective is to develop appropriate algorithms that make efficient use of the available hardware. The basic operations are addition, subtraction, multiplication and division. Addition plays an important role in the design of a multiplier, because multiplication is carried out as repeated addition and shifting.

Multiplication is a mathematical operation that, at its simplest, is an abbreviated process of adding an integer to itself a specified number of times. It is a fundamental arithmetic operation in many processors and digital signal processing systems. Multiplying two k-bit numbers requires a multi-operand addition that can be realized in k cycles of shifting and adding, in hardware, firmware or software. Multiplication-based operations such as multiply and accumulate (MAC) and inner product are among the most frequently used computation-intensive arithmetic functions in digital signal processing (DSP) applications such as convolution, Fast Fourier Transform (FFT) and filtering, and in the arithmetic and logic unit of microprocessors. Portable multimedia and DSP systems, which typically require low power consumption, a short design cycle and flexible processing ability, have become increasingly popular over the past few years. Because many multimedia and DSP applications are highly multiplication intensive, the performance of these systems is dominated by their multipliers. Unfortunately, portable devices mostly operate on standalone batteries, while multipliers consume a large amount of power. DSP algorithms such as filtering place the multiplication directly on the critical path. In multimedia and 3D graphics as well, performance in most cases depends strongly on the effectiveness of the hardware used for computing multiplications. As more sophisticated signal processing systems are implemented on a VLSI chip, these applications not only demand great computation capacity but also consume a considerable amount of energy. While speed and area remain the two major design constraints, the size of a multiplier is proportional to the square of its resolution.

Given that the hardware can only perform a relatively simple and primitive set of Boolean operations, arithmetic operations are based on a hierarchy of operations that are built upon the simple ones. In VLSI design, speed, power and chip area are the most often used measures for determining the performance and efficiency of the VLSI architecture.

Ripple carry adders have the most compact design but are the slowest, whereas carry look-ahead adders are the fastest but consume more area. Carry select adders act as a compromise between the two. A new concept, Common Boolean Logic, is presented to speed up the addition process. The CSLA is not area efficient because it uses multiple pairs of Ripple Carry Adders (RCA) to generate the partial sum and carry for carry inputs $C_{in} = 0$ and $C_{in} = 1$; the final sum and carry are then selected by multiplexers (mux). In this paper we compare the performance of the different adders used to design the multipliers, based on the area and time needed to perform multiplication.

2. Carry Look Ahead Adder

The Carry Look Ahead Adder (CLAA) solves the carry delay problem by calculating the carry signals in advance, based on the input signals. It is based on the fact that a carry signal will be generated in two cases:

1) When both bits aᵢ and bᵢ are 1,
2) When one of the two bits is 1 and the carry-in is 1.
These two conditions can be written in terms of two new signals, $P_i$ and $G_i$, which are shown in Figure 1. The 64-bit Carry Look Ahead Adder is shown in Figure 2.

Let $G_i$ be the carry generate function and $P_i$ be the carry propagate function. Then we can rewrite the carry function as follows:

$$G_i = A_i \cdot B_i \tag{1}$$
$$P_i = A_i \oplus B_i \tag{2}$$
$$S_i = P_i \oplus C_i \tag{3}$$
$$C_{i+1} = G_i + P_i \cdot C_i \tag{4}$$
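Equations (1) to (4) can be illustrated with a short behavioural sketch. The Python function below is only a software model, not the authors' HDL: the name `cla_add` and its parameters are assumptions made for illustration, and the loop evaluates the carry recurrence one bit at a time, whereas look-ahead hardware expands the same recurrence so that the carries are available in parallel.

```python
def cla_add(a: int, b: int, c_in: int = 0, width: int = 64):
    """Behavioural model of equations (1)-(4); returns (sum, carry_out).

    Hypothetical helper for illustration only, not the paper's HDL.
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # G_i = A_i . B_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # P_i = A_i xor B_i

    c = [c_in]                                             # c[i] holds C_i
    for i in range(width):
        c.append(g[i] | (p[i] & c[i]))                     # C_{i+1} = G_i + P_i . C_i

    s = 0
    for i in range(width):
        s |= (p[i] ^ c[i]) << i                            # S_i = P_i xor C_i
    return s, c[width]
```

For example, `cla_add(5, 3, 0, 4)` returns `(8, 0)`, and adding 1 to the all-ones 64-bit value returns a zero sum with a carry-out of 1.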

3. Carry Select Adders

The Carry Select Adder belongs to the category of conditional sum adders. A conditional sum adder computes its outputs for each possible value of the incoming carry: the sum and carry are calculated assuming an input carry of 1 and of 0 before the actual input carry arrives. When the actual carry input arrives, the corresponding precomputed values of sum and carry are selected using a multiplexer.

The conventional carry select adder contains one $k/2$ bit adder for the lower half of the bits and two $k/2$ bit adders for the upper half of the bits, where $k$ is the length of the input. Of the two upper-half (MSB) adders, one assumes a carry input of one and the other assumes a carry input of zero. The carry-out of the lower-half stage, i.e. the least significant stage, is used to select the actual values of output carry and sum. The selection is done by a multiplexer.
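The two-half organisation described above can be sketched behaviourally as follows. This is a minimal Python model under stated assumptions: `block_add` stands in for a $k/2$ bit adder of any kind, both function names are hypothetical, and `width` is assumed to be even.

```python
def block_add(a: int, b: int, c_in: int, width: int):
    """Stand-in for one width-bit adder block; returns (sum, carry_out)."""
    total = a + b + c_in
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a: int, b: int, c_in: int = 0, width: int = 64):
    """Behavioural model of a two-block carry select adder (width must be even)."""
    half = width // 2
    mask = (1 << half) - 1
    a_lo, b_lo = a & mask, b & mask
    a_hi, b_hi = (a >> half) & mask, (b >> half) & mask

    # Lower half: a single adder driven by the real carry input.
    s_lo, c_mid = block_add(a_lo, b_lo, c_in, half)

    # Upper half: two adders, one assuming carry-in 0 and one assuming 1.
    s_hi0, c_out0 = block_add(a_hi, b_hi, 0, half)
    s_hi1, c_out1 = block_add(a_hi, b_hi, 1, half)

    # Multiplexer: the lower-half carry-out selects the precomputed result.
    s_hi, c_out = (s_hi1, c_out1) if c_mid else (s_hi0, c_out0)
    return (s_hi << half) | s_lo, c_out
```

In hardware the three block additions proceed at the same time, so the upper half only waits for the multiplexer once the lower-half carry is known. The sequential Python calls do not model that timing, only the selected values.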

4. Proposed System: Modified Carry Select Adder

For area minimization, the number of slices should be minimized. This can be achieved by altering the Full Adder. The logic design of the Full Adder is shown in Figure 5, and its truth table is shown in Table 1. The proposed system is designed with NOT, OR, XOR and AND gates. The size of the adder is reduced by Boolean algebra so that the number of slices is reduced. When the number of slices is reduced, the area and delay are reduced as well.

Figure 1: Full adder at stage i with $P_i$ and $G_i$

Figure 2: 64-bit Carry Look-Ahead Adder

Figure 3: Basic Carry Select Adder circuit

Figure 4: Block diagram of regular 64-bit Carry Select Adder

Figure 5: Logic diagram of Full Adder

Table 1: Truth table for full adder

<table>
<thead>
<tr>
<th>C<sub>In</sub></th>
<th>A</th>
<th>B</th>
<th>SUM</th>
<th>CARRY</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

The logical equations for the full adder are:

\[
\text{Sum} = A \oplus B \oplus C_{In} \tag{5}
\]

\[
\text{Carry} = (A \cdot B) + C_{In} \cdot (A + B) \tag{6}
\]
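Equations (5) and (6) can be checked by enumerating all eight input combinations. The sketch below is an illustrative Python model; the function name `full_adder` and the `rows` list are assumptions, not part of the original design.

```python
def full_adder(a: int, b: int, c_in: int):
    """One-bit full adder following equations (5) and (6)."""
    s = a ^ b ^ c_in                      # Sum = A xor B xor C_In
    carry = (a & b) | (c_in & (a | b))    # Carry = A.B + C_In.(A + B)
    return s, carry


# Each row is (C_In, A, B, SUM, CARRY), in the column order of the truth table.
rows = [
    (c_in, a, b, *full_adder(a, b, c_in))
    for c_in in (0, 1)
    for a in (0, 1)
    for b in (0, 1)
]
```

Every row satisfies `SUM + 2 * CARRY == A + B + C_In`, which is the arithmetic meaning of a one-bit addition.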

Minimized logical equations for the full adder:

For C<sub>In</sub> = 0:

\[
S_v = A \oplus B \tag{7}
\]

\[
C_v = A \cdot B \tag{8}
\]

For C<sub>In</sub> = 1:

\[
S_k = \overline{S_v} \tag{9}
\]

\[
C_k = A + B \tag{10}
\]

The outputs for the actual carry are chosen using 2:1 mux circuits. This requires fewer gates than existing carry select adders, so the design has a smaller area while keeping a good computational speed.
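The minimized equations (7) to (10) and the 2:1 mux selection can be sketched as a behavioural model. The Python code below is an assumption-laden illustration: the names `cbl_bit` and `cbl_add` are hypothetical, and chaining the per-bit mux outputs bit by bit is a simplification, since the text does not specify how the bits are grouped into blocks.

```python
def cbl_bit(a: int, b: int, c_in: int):
    """One bit slice built from Common Boolean Logic and a 2:1 mux."""
    s_v, c_v = a ^ b, a & b        # equations (7), (8): outputs for C_In = 0
    s_k, c_k = s_v ^ 1, a | b      # equations (9), (10): outputs for C_In = 1
    # 2:1 mux: the incoming carry picks one of the two precomputed pairs.
    return (s_k, c_k) if c_in else (s_v, c_v)


def cbl_add(a: int, b: int, c_in: int = 0, width: int = 64):
    """Chain the bit slices (an assumed arrangement); returns (sum, carry_out)."""
    s, carry = 0, c_in
    for i in range(width):
        bit, carry = cbl_bit((a >> i) & 1, (b >> i) & 1, carry)
        s |= bit << i
    return s, carry
```

Because both candidate outputs share the single XOR, AND and OR of the two operand bits, no second adder is needed for the carry-in-1 case, which is where the gate saving comes from.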

5. Multiplication Algorithm

Let the product register be 128 bits wide and the multiplicand register be 64 bits wide. Store the multiplier in the least significant half of the product register and clear the most significant half. Repeat the following steps 64 times:

1) If the least significant bit of the product register is "1", add the multiplicand to the most significant half of the product register.
2) Shift the content of the product register one bit to the right (ignore the shifted-out bit).
3) Shift the carry bit from the addition into the most significant bit of the product register.
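The steps above can be sketched in software as follows. This is a minimal Python model of the shift-and-add procedure; the function name `shift_add_multiply` and the parameter `n` are assumptions for illustration, and the built-in `+` stands in for whichever adder (CLAA or CSLA) performs the addition in hardware.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n: int = 64) -> int:
    """Shift-and-add multiplication of two n-bit unsigned values (2n-bit result)."""
    mask_n = (1 << n) - 1
    multiplicand &= mask_n
    # Multiplier goes in the least significant half; the upper half starts cleared.
    product = multiplier & mask_n

    for _ in range(n):
        carry = 0
        if product & 1:
            # Step 1: add the multiplicand to the most significant half.
            high = (product >> n) + multiplicand
            carry = high >> n                          # carry out of the n-bit adder
            product = ((high & mask_n) << n) | (product & mask_n)
        product >>= 1                                  # Step 2: shift right, drop the LSB
        product |= carry << (2 * n - 1)                # Step 3: carry enters the MSB
    return product
```

For example, `shift_add_multiply(3, 5, 4)` returns `15` and `shift_add_multiply(15, 15, 4)` returns `225`. The second case exercises step 3, because the addition to the upper half overflows four bits.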

Figure 7 shows a block diagram of the multiplier for two n-bit values.

6. Simulation Results

This section presents the HDL simulation of the two multipliers: the waveforms, timing diagrams, design summary and power analysis for both the CLAA and CSLA based multipliers. HDL code was generated for both multipliers, and the HDL model was developed using ModelSim 6.4b. Each multiplier takes two 64-bit values, so the comparison is made for n × n (64 × 64) bit inputs and a 2n (128) bit output.

The performance analysis of area and delay is shown in Table 2. The area delay product of the modified CSLA based multiplier is reduced by about 6% when compared to the CLAA based multiplier.
The power analysis for the CLAA and CSLA based multipliers is shown in Table 3. The power dissipation is the same for the CLAA and CSLA based multipliers.

Table 3: Power analysis

<table>
<thead>
<tr>
<th>Multiplier type</th>
<th>Total Power (mW)</th>
</tr>
</thead>
<tbody>
<tr>
<td>CLAA based 64-bit Multiplier</td>
<td>203</td>
</tr>
<tr>
<td>Regular CSLA based 64-bit Multiplier</td>
<td>203</td>
</tr>
<tr>
<td>Modified CSLA based 64-bit Multiplier</td>
<td>203</td>
</tr>
</tbody>
</table>

7. Conclusion

We presented the design and implementation of a 64-bit unsigned multiplier with CLAA and CSLA. The design was tested against the existing adders. The analysis shows a reduction of about 6% in the area delay product. The multiplier also proved to be efficient in terms of power utilization.

Author Profile

Shaik Meerabi is pursuing an M. Tech with specialization in VLSI and Embedded Systems at Vasireddy Venkataadri Institute of Technology (VVIT), Nambur. She completed her B. Tech in Electronics and Communication Engineering at Acharya Nagarjuna University.

Krishna Prasad Satamraju obtained his Master's degree in Technology from SRM University, with specialization in Digital Communications and Networking. He is presently working as Assistant Professor in the Department of ECE, VVIT, Nambur. He has over eight years of teaching experience. His research areas include Embedded Systems, RTOS and Linux based device driver development.