International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Impact Factor (2012): 3.358

# Design of FFT Processor for OFDM to Achieve High Efficiency Parameters

# A Murali<sup>1</sup>, Belcy Mathews<sup>2</sup>

<sup>1</sup>Associate Professor, Department of ECE, Lords Institute of Engineering & Technology, Hyderabad, India

<sup>2</sup>Assistant Professor, Department of ECE, Lords Institute of Engineering & Technology, Hyderabad, India

Abstract: In today's communication world, more and more OFDM systems are brought online, with an ever-increasing number of standards, services, and applications, all with different requirements on the physical layer of the transceiver. Because of power and speed constraints, the physical layer is frequently implemented as an ASIC and is thus locked to one specific case. Hence, the number of required implementations will grow fast. For this reason, industry has an interest in a flexible architecture that can be configured to function with several standards, applications, or services. A flexible architecture leads to reuse of design and implementation and thus to reduced cost. This document addresses both OFDM in theory and the implementation aspects of a flexible hardware solution for digital baseband OFDM. The FFT processor is a central part of an OFDM transceiver, and has been fabricated both as a standalone chip and as part of an OFDM transmitter chip. This paper also proposes a scheme in which the hardware and delay of an OFDM transceiver are significantly reduced by using a cyclic suffix and a bidirectional FFT processor.

Keywords: FFT, OFDM

#### **1. Introduction**

The FFT processor is widely used in applications such as image processing, electromagnetic spectrum measurement, radar, and multimedia communication services.

Analogue multi-carrier systems have been around since the 1950s, and the concept of orthogonal frequency division multiplexing (OFDM) with overlapping sub-channel spectra was introduced by Chang in the mid-1960s [1]. Looking further into the future, there is an ongoing discussion about including an OFDM transceiver in the fourth generation mobile system. With an OFDM transceiver, fourth generation mobiles can connect at a high data rate to an increasing number of hot spots, i.e. wireless local area networks installed in, e.g., coffee shops and offices. When the focus is turned from transmitter to transceiver, a higher level of co-optimisation can be explored. In this design, more than half of the memory needed to insert a cyclic prefix is removed if a bidirectional FFT processor and a cyclic suffix are used. Since the unit that inserts the cyclic prefix has a large area, the hardware savings are significant. However, in order to use the bidirectional FFT with as high an SNR as the one-way FFT, semi-floating-point arithmetic was used. This increased the hardware for small FFTs, and no area savings are obtained for systems with fewer than 256 points. The OFDM scheme has moved from research and military applications to everyday products in just a few years.

The great strength of OFDM is its spectrum efficiency [bps/Hz] and its ability to deal with multipath channels, i.e. the type of channels that appear in wireless environments. Since OFDM is computationally demanding and therefore power hungry, it was not until recently that technology made it possible to build mobile OFDM devices with an adequate operation time. Thus, with a large market for wireless devices operating in a multipath environment and the technology to build energy-efficient devices, the time of OFDM has finally arrived.

#### **A. OFDM**



OFDM is a broadband multicarrier modulation method that offers superior performance over older, more traditional single-carrier modulation methods because it is a better fit for today's high-speed data requirements and for operation in the UHF and microwave spectrum. It is used for digital radio broadcasting, specifically Europe's DAB and Digital Radio Mondiale, and in HD Radio in the U.S. OFDM is also used in wired communications such as power-line networking.

#### **B. OFDM Working**

OFDM is based on the concept of frequency-division multiplexing (FDM), the method of transmitting multiple data streams over a common broadband medium. That medium could be coaxial cable, radio spectrum, fiber-optic cable, or twisted pair. Each data stream is modulated onto a number of adjacent carriers within the bandwidth of the medium, and all are transmitted simultaneously. The best example of such a system is cable TV, which transmits several parallel channels of video and audio over a single fiber-optic or coaxial cable.

The digital baseband parts of an OFDM transceiver [12] are shown in Figure 1.1. The basic idea of OFDM is to divide the available spectrum into N orthogonal sub-channels. In the mapper, data is converted to signals located in the frequency domain, where each sub-channel is assigned one signal. The signals are then transformed to the time domain with the inverse fast Fourier transform (IFFT). The FFT is an efficient method to implement the DFT algorithm, based on a divide-and-conquer approach [13]. The last digital part of the transmitter inserts a cyclic extension to remove the effects of intersymbol interference (ISI) and interchannel interference (ICI).
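The transmitter chain described above (mapper, IFFT, cyclic extension) can be sketched in a few lines. This is only an illustration under stated assumptions: the QPSK table, the function names, and the use of a cyclic prefix with a naive inverse DFT are hypothetical choices made here, not the authors' implementation.

```python
import cmath

def idft(X):
    """Naive inverse DFT, O(N^2); a real design would use an IFFT."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def qpsk_map(bits):
    """Mapper: two bits per sub-channel (hypothetical Gray-coded QPSK table)."""
    table = {(0, 0): 1 + 1j, (0, 1): -1 + 1j, (1, 1): -1 - 1j, (1, 0): 1 - 1j}
    return [table[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

def ofdm_symbol(bits, cp_len):
    """Build one OFDM symbol; cp_len must be between 1 and the number of sub-channels."""
    X = qpsk_map(bits)       # one frequency-domain signal per sub-channel
    x = idft(X)              # transform to the time domain
    return x[-cp_len:] + x   # cyclic prefix: copy of the tail placed in front

bits = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0]  # 16 bits -> 8 sub-channels
symbol = ofdm_symbol(bits, cp_len=2)
print(len(symbol))  # 8 time samples plus 2 prefix samples
```

A cyclic suffix would instead append a copy of the first samples after the symbol.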

# 2. System Design

Design strategy is an important issue when designing a chip, and it becomes even more important when the number of design criteria increases. In this case, a broad spectrum of application areas requires high flexibility and the ability to handle high as well as low speed, all with high power efficiency.



Figure 1.2: Throughput versus mobility in OFDM systems

A flexible OFDM transceiver can potentially be used for all kinds of applications: one solution serves all. It might not be possible to cover all applications with one flexible design, but if it can be done, the benefits are high enough to justify the attempt.

## 2.1 Design strategy



The designer's goal is to find the optimal point in the design space, one with enough throughput for the application and minimum power and area requirements. However, an optimal search takes time and effort, and therefore money, and the money will often run out before the search is over. Hence, the designer most often settles for a "good enough" point in the design space, a point that just about passes the requirements. The optimal line is the line that optimises power for all throughputs with a minimal area, i.e. minimises the angle. Finding an optimal line is of course even harder than finding a perfect point, but herein lies the challenge. Since we are designing a flexible wireless device, throughput is the parameter that changes and power must be kept to a minimum, which leaves only area to trade with. As area is traded for both power and throughput, there is a possibility of ending up with a large design. However, improvements in process technology allow more and more to be placed on the same chip, making this a relevant trade-off.

#### 2.2 Design Flow

A simplified design flow is shown in Figure 1.4. First, a floating-point model is specified in a suitable programming language, e.g. Matlab or C/C++. After functionality has been verified, a fixed-point model is implemented. Using the fixed-point model, the required word lengths are determined, for example to reach a certain signal-to-noise ratio. The word length is a critical parameter that affects speed, power, and area, and must thus be chosen with care. Test vectors for later use are extracted from these models. At this point, sufficient knowledge about the algorithm has usually been obtained to start on an architectural description. The designs presented use Matlab to specify the floating-point model and C for the fixed-point model. A standard library from Alcatel is used, and the chips are fabricated in a 0.35 µm CMOS process with five metal layers.
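The word-length step of this flow can be illustrated with a small experiment that compares a floating-point signal with its fixed-point version and reports the resulting SNR. This is a minimal sketch, assuming a near full-scale sine as test signal and rounding with saturation. The functions `quantize` and `snr_db` are hypothetical helpers; the authors' Matlab and C models are not shown in the paper.

```python
import math

def quantize(x, bits):
    """Round x in [-1, 1) to a signed fixed-point word of `bits` bits."""
    scale = 2 ** (bits - 1)
    q = round(x * scale)
    q = max(-scale, min(scale - 1, q))   # saturate instead of wrapping around
    return q / scale

def snr_db(signal, bits):
    """Signal-to-quantization-noise ratio in dB for a given word length."""
    noise = [s - quantize(s, bits) for s in signal]
    p_signal = sum(s * s for s in signal)
    p_noise = sum(e * e for e in noise)
    return 10 * math.log10(p_signal / p_noise)

# Floating-point reference: a sine slightly below full scale (assumed test vector)
signal = [0.9 * math.sin(2 * math.pi * 7 * n / 1024 + 0.3) for n in range(1024)]

for bits in (8, 10, 12):
    print(bits, "bits:", round(snr_db(signal, bits), 1), "dB")
```

Each extra bit buys roughly 6 dB of SNR in this simple model, which is why the word length trades directly against memory size, area, and power.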

## 2.3 IFFT and FFT

The first designed chip is an FFT processor. The FFT processor has a central position in both the OFDM transmitter and the receiver. The FFT is a computationally demanding operation that requires an ASIC implementation to reach high performance, i.e. high throughput combined with low energy consumption.

Figure 1.4: Design flow

The Inverse Fast Fourier Transform (IFFT) transforms signals from the frequency domain to the time domain, and the FFT performs the reverse operation. Keeping signals in the frequency domain simplifies some signal processing operations, e.g. convolution in the time domain becomes multiplication in the frequency domain. The transformation into the time domain is done in order to reduce the number of backend RF oscillators and demodulators.
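The statement that convolution in the time domain becomes multiplication in the frequency domain can be checked numerically. The sketch below uses a naive DFT and circular convolution on two short, arbitrarily chosen sequences; all names are illustrative.

```python
import cmath

def dft(x, inverse=False):
    """Naive DFT, or inverse DFT with 1/N scaling when inverse=True."""
    N = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N) for n in range(N))
           for k in range(N)]
    return [v / N for v in out] if inverse else out

def circular_convolve(a, b):
    """Direct circular convolution in the time domain, O(N^2)."""
    N = len(a)
    return [sum(a[m] * b[(n - m) % N] for m in range(N)) for n in range(N)]

a = [1, 2, 3, 4]
b = [1, 0, -1, 0]

direct = circular_convolve(a, b)
# Same result through the frequency domain: element-wise product of the spectra
via_dft = dft([A * B for A, B in zip(dft(a), dft(b))], inverse=True)

print(direct)
print([round(v.real, 6) for v in via_dft])
```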

The available bandwidth (B) in frequency is split into N subchannels, one for each subcarrier, where each has a power spectrum with the shape of a squared sinc pulse. The individual power spectra of a number of subcarriers, after the IFFT in the transmitter, are shown in Figure 1.5. Since the IFFT is a linear operation, the subcarriers can be separated again with the FFT, even though their spectra overlap.
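One way to see why the overlapping spectra do no harm is that the subcarriers are mutually orthogonal over one symbol of N samples, so the FFT in the receiver picks out each one without interference from the others. The sketch below checks this for N = 16; the subcarrier indices are arbitrary.

```python
import cmath

def subcarrier(k, N):
    """N time samples of the complex exponential at subcarrier index k."""
    return [cmath.exp(2j * cmath.pi * k * n / N) for n in range(N)]

def inner(a, b):
    """Inner product of two complex sequences."""
    return sum(x * y.conjugate() for x, y in zip(a, b))

N = 16
same = inner(subcarrier(3, N), subcarrier(3, N))        # energy of one subcarrier
different = inner(subcarrier(3, N), subcarrier(5, N))   # cross term between two

print(abs(same), abs(different))
```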



Figure 1.5: The power spectra of the individual subchannels in an OFDM signal.

In addition, spectrum efficiency is higher with many subcarriers, since the two subcarriers at the outer edges, a and b in Figure 1.5, contribute more than the central subcarriers to the width of the power spectrum. Subcarriers a and b have a spectrum width of approximately KB/N [Hz], while all other subcarriers only have a width of approximately B/N [Hz].

#### 3. FFT Processor Architecture

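The FFT and IFFT have the property that, if

$$\mathrm{FFT}\big(\mathrm{Re}(x_i) + j\,\mathrm{Im}(x_i)\big) = \mathrm{Re}(X_i) + j\,\mathrm{Im}(X_i)$$

and

$$\mathrm{IFFT}\big(\mathrm{Re}(X_i) + j\,\mathrm{Im}(X_i)\big) = \mathrm{Re}(x_i) + j\,\mathrm{Im}(x_i),$$

where $x_i$ and $X_i$ are N-word-long sequences of complex-valued samples and sub-carriers respectively, then

$$\frac{1}{N}\,\mathrm{FFT}\big(\mathrm{Im}(X_i) + j\,\mathrm{Re}(X_i)\big) = \mathrm{Im}(x_i) + j\,\mathrm{Re}(x_i).$$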

Thus, it is only necessary to discuss and implement the FFT equation. To calculate the inverse transform, the real and imaginary parts of the input and of the output are swapped. Since N is a power of two, scaling by 1/N is the same as shifting the binary word right by $\log_2(N)$ bits. Even simpler is to just remember that the binary point has moved $\log_2(N)$ bits to the left, and to postpone the bit shift until, if ever, it is necessary, which depends on how the output from the IFFT will be used.
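The swap property can be tried out in software. The sketch below computes an inverse transform using only a forward FFT, swapping real and imaginary parts before and after. The recursive radix-2 routine is a textbook formulation used here for illustration and is not the pipelined architecture of the chip.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]   # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t
    return out

def swap(z):
    """Exchange the real and imaginary parts of a complex number."""
    return complex(z.imag, z.real)

def ifft_via_fft(X):
    """Inverse transform: swap, forward FFT, swap back, scale by 1/N."""
    N = len(X)
    y = fft([swap(v) for v in X])
    return [swap(v) / N for v in y]   # in hardware, 1/N is a shift by log2(N) bits

x = [1 + 2j, -1 + 0.5j, 0.25 - 1j, 3 + 0j, 0j, -2 - 2j, 1j, 0.5 + 0.5j]
recovered = ifft_via_fft(fft(x))
print(max(abs(a - b) for a, b in zip(x, recovered)))  # round-trip error, close to zero
```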



Figure 1.6: A radix-2 DIF butterfly (a) and a radix-2 DIT butterfly (b), where W is the twiddle factor

The FFT algorithm can be realized with a butterfly operation as the basic building block [23]. There are two types of butterfly operations, decimation in time (DIT) and decimation in frequency (DIF), both shown in Figure 1.6. The difference between DIT and DIF lies in the position of the twiddle factor multiplication, which is performed either before (DIT) or after (DIF) the subtraction and addition.
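The two butterflies differ only in where the twiddle factor W is applied, which a pair of small functions makes explicit. The function names are illustrative, and the input values are arbitrary.

```python
def dif_butterfly(a, b, w):
    """Radix-2 DIF: add and subtract first, then multiply the difference by W."""
    return a + b, (a - b) * w

def dit_butterfly(a, b, w):
    """Radix-2 DIT: multiply the second input by W first, then add and subtract."""
    t = b * w
    return a + t, a - t

a, b, w = 1 + 0j, 0 + 1j, -1j   # W = -j is a trivial twiddle factor

print(dif_butterfly(a, b, w))
print(dit_butterfly(a, b, w))

# With W = 1 both reduce to the same 2-point DFT
print(dif_butterfly(a, b, 1), dit_butterfly(a, b, 1))
```

Multiplication by W = -j only swaps the real and imaginary parts and changes a sign, which is why such twiddle factors can be handled by a trivial multiplier.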



Figure 1.7: FFT architecture, where CG is a clock gate.

All internal control in the FFT processor is managed with a counter that ripples through the stages in time with the data. The counter controls which twiddle factor to use, when to activate the trivial multiplier, and when data is placed in the FIFOs. The counter starts at zero when the first input is present and increments by one, modulo N, for each input sample. Hardware to create the counter signal is placed internally on the chip. As a result of the pipelined structure, high throughput is obtained in the FFT processor [26]. Apart from the latency of N-1 clock cycles, the processor produces one output for each input value once the pipe is filled.

# 4. FFT Processor Result

An FFT processor has been implemented in a standard 0.35 µm CMOS technology with five metal layers. The processor was estimated to compute a 1024-point FFT in less than 13 µs, with a clock frequency of 83 MHz. The FFT processor is resizable between 32 and 1024 points, and unused blocks are deactivated with clock gates. The designed processor reaches an SNR of 49 dB with 8 input bits for a 1024-point FFT. Figure 1.10 shows the FFT processor chip. The chip has 84 pins, three twiddle factor ROMs, and 6 RAMs to implement the FIFOs. The core area is 4.94 mm<sup>2</sup>.
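The quoted transform time follows from the pipelined throughput of one sample per clock cycle. The short calculation below reproduces it and, for comparison, the N-1 cycle pipeline latency mentioned in Section 3. It is a back-of-the-envelope sketch with hypothetical helper names, not a timing model of the chip.

```python
def transform_time_us(n_points, clock_mhz):
    """Time to stream one n_points transform at one sample per clock cycle."""
    return n_points / clock_mhz          # cycles divided by cycles per microsecond

def pipeline_latency_us(n_points, clock_mhz):
    """Delay before the first output, using the N-1 cycle latency from the text."""
    return (n_points - 1) / clock_mhz

for n in (32, 256, 1024):
    print(n, round(transform_time_us(n, 83), 2), round(pipeline_latency_us(n, 83), 2))
```

At 83 MHz, 1024 samples take 1024 / 83 µs, which is consistent with the estimate of less than 13 µs.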

**Table 4.1:** A comparison between 1024 point FFTs.

| Design          | Voltage [V] | Frequency [MHz] | Power [mW] |
|-----------------|-------------|-----------------|------------|
| This FFT design | 2.0         | 25              | 76         |
| Low power FFT   | 1.5         | 25              | 200        |

Figure 1.8 shows the power consumption in the core when the chip is operating at 20 MHz. In Figure 1.9 the core power consumption is shown as a function of frequency at a core voltage of 2 V. The FFT processor functioned in all modes up to 50 MHz at 2 V. Since the test equipment did not support any frequencies between 50 and 100 MHz, it was not possible to verify the estimated maximum frequency of 83 MHz. In Table 4.1 this design is compared to another pipelined FFT, implemented in the same technology [26]. As seen, the presented design consumes less than half the power at the same frequency. Two reasons for this can be found. First, the other design uses a radix-4 single-path delay commutator, which requires 2N words of memory, twice as much as the radix-2<sup>2</sup> architecture [29]. Second, no low-power memories are used in that design.



Figure 1.8: Voltage scaling, with frequency = 20 MHz.



Figure 1.9: Frequency scaling, with core voltage = 2 V.

A flexible OFDM transmitter has been implemented in a standard 0.35 µm CMOS process with five metal layers. The transmitter is synthesized for a clock speed of 50 MHz, and the core area of the chip is 8.5 mm<sup>2</sup>. Care has been taken during the design to keep power consumption low, e.g. low-power memories are used as much as possible, unused parts of the design are turned off, and the word lengths are kept short. The OFDM transmitter functioned in all modes up to 50 MHz at 3.3 V. Figure 1.11 shows the power consumption in the core for different voltages, when the chip is operating at 20 MHz.



Figure 1.10: The OFDM transmitter chip.

The expected square-law dependency of power consumption on voltage is seen. In Figure 1.12 the core power consumption is shown as a function of frequency at a core voltage of 3.3 V. An almost linear dependency between power and frequency is seen, as expected. The effect of the split memory architecture in the signal reordering unit is seen in both Figure 1.11 and Figure 1.12, where there is a larger gap between the 128-point and 512-point modes than between the other modes. This is due to the fact that all modes below 512 points use only the low-power memories in the signal reordering unit.
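Both trends match the usual first-order model of dynamic power in CMOS, given here as a reminder and not as a fit to the measured data. The symbols $\alpha$ (switching activity) and $C$ (switched capacitance) are not used elsewhere in this paper and are introduced only for this illustration:

$$P_{\mathrm{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f$$

At a fixed clock frequency $f$ the power grows with the square of the supply voltage $V_{DD}$, and at a fixed voltage it grows linearly with $f$.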



Figure 1.11: Voltage scaling, with frequency = 20 MHz.



Figure 1.12: Frequency scaling, with core voltage = 3.3 V.

# 5. Conclusions

To achieve high spectrum efficiency [bps/Hz], parameters such as cyclic prefix, number of subcarriers, and constellation in an OFDM system have to adapt to the present state of the channel. Since a wireless channel continuously changes its state, adaptation must be performed in real-time. Hence, the physical layer of a mobile high performance OFDM transceiver must be fast, flexible, and energy efficient, i.e. an ASIC with run time flexibility is required. In this document it is shown that flexibility can be obtained with a reasonable amount of extra hardware. The flexibility will contribute to a larger set of possible applications and thus to the possibility of larger fabrication volumes and lower price per volume.

Two ASIC chips have been fabricated in a 0.35 µm process with five metal layers. The first chip is a flexible FFT processor that can be reconfigured to perform all FFTs and IFFTs from 32 to 1024 points. The processor is word-length optimized to reach a high SNR with small memories. The second chip is a flexible OFDM baseband transmitter, which supports bit-loading and a free choice of the cyclic prefix length. The FFT processor is incorporated in this design, but the previously used array multipliers have been exchanged for low-power multipliers that use distributed arithmetic. Both fabricated designs use high-level clock gating to turn off unused parts of the design and thus save power.

#### References

- [1] W. Ullah, "A low power FFT-processor for OFDM transceivers using cyclic postfix," in Proc. of 20th NORCHIP Conference, Copenhagen, Nov. 2002, pp. 68-73.
- [2] OFDM Forum, "Broadband Mobile Wireless Group," www.ofdm-forum.com.
- [3] W. Eberle, V. Derudder, G. Vanwijnsberghe, M. Vergara, L. Deneire, L. Van der Perre, M. Engels, I. Bolsens, and H. De Man, "80-Mb/s QPSK and 72-Mb/s 64-QAM flexible and scalable digital OFDM transceiver ASICs for wireless local area networks in the 5-GHz band," IEEE Journal of Solid-State Circuits, vol. 36, pp. 1829-1838, Nov. 2001.
- [4] "A Flexible FFT Processor," in Proc. of 20th NORCHIP Conference, Copenhagen, Nov. 2002, pp. 121-126.
- [5] G. Bi and E. V. Jones, "A Pipelined FFT Processor for Word-Sequential Data," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, pp. 1982-1985, Dec. 1989.
- [6] W. Li and L. Wanhammar, "A Pipelined FFT Processor," in IEEE Workshop on Signal Processing Systems, 1999, pp. 654-662.
- [7] N. Petersson, "Peak and power reduction in multicarrier systems," 2002, Licentiate Thesis, Lund University, Sweden.
- [8] S. Johansson, "ASIC Implementation of an OFDM Synchronization Algorithm," 2000, Licentiate Thesis, Lund University, Sweden.
- [9] R. Morrison, L. J. Cimini, and S. K. Wilson, "On the Use of a Cyclic Extension in OFDM," in Proc. of Vehicular Technology Conference, VTC 2001 Fall, vol. 2, Atlantic City, NJ, USA, Oct. 7-11 2001, pp. 664-668.
- [10] J. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits, A Design Perspective. Prentice-Hall, 2003.
- [11] K. K. Parhi, VLSI Digital Signal Processing Systems. New York, NY, USA: John Wiley & Sons, 1999.

## **Author Profile**



A. Murali is currently Associate Professor in the Department of ECE at Lords Institute of Engineering & Technology, Hyderabad, India. He completed his M.Tech at IIT Madras, Chennai. He has 10 years of experience in academia and industry. His research interests are embedded systems, integrated circuits, and VLSI design.



Belcy Mathews is currently Assistant Professor in the Department of ECE at Lords Institute of Engineering & Technology, Hyderabad, India. She has 3 years of academic experience. Her research interests are communications and DSP.