International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value (2013): 6.14 | Impact Factor (2014): 5.611

# Review on Scalable FFT Architecture for High Speed Communication Standard

### Rutuja C. Tamhane<sup>1</sup>, Shrikant J. Honade<sup>2</sup>

<sup>1</sup>G. H. Raisoni College of Engineering and Management, Sant Gadge Baba Amravati University, Amravati, Maharashtra, India

<sup>2</sup>Assistant Professor, Dept. of Electronics and Telecommunication Engineering, G. H. Raisoni College of Engineering and Management, Amravati, Maharashtra, India

Abstract: The fast Fourier transform (FFT) plays a key role in signal processing applications. Most systems need high flexibility, high speed and high efficiency. The baseband hardware should be economical and capable of computing the FFT within the time constraints necessary to support multiple wireless standards. It should also be scalable, so that it supports multiple wireless standards while meeting performance constraints such as high speed, low area and low power consumption. Hence, the baseband hardware needs a scalable FFT module that meets the performance constraints required by multiple wireless standards. This paper presents a highly efficient hierarchical design of an application-specific instruction set processor (ASIP), covering architecture exploration, software tools design, system verification and design implementation. Simulation and synthesis results show that our FFT-ASIP achieves higher energy efficiency and flexibility at a low area cost.

Keywords: Application-specific instruction set processor (ASIP), fast Fourier transforms (FFT), hierarchical design, TMS320C6X kit, code composer.

#### 1. Introduction

Wired telecommunication networks provide both low-bit-rate and high-bit-rate services. Voice services need low bit rates, whereas broadband transmission services need high bit rates. Wireless communication networks also provide these services. However, some of the high-bit-rate services are restricted by various performance constraints. Over time, the demand for high-bit-rate services in wireless communication systems has grown. The fast Fourier transform (FFT), the most time-intensive block in electronic communication systems, faces both high flexibility and high throughput requirements in current high-speed wireless systems. It should be easy to reprogram or reconfigure so that it supports various standards and operating modes. For instance, the FFT size should be changeable under different operating environments. Existing application-specific integrated circuits (ASICs) can provide high throughput, but they cannot provide the desired flexibility and programmability. At the same time, high throughput remains important for FFT computation. The FFT and inverse FFT (IFFT) algorithms are core processing blocks used for conversion from the time domain to the frequency domain (FFT) and from the frequency domain to the time domain (IFFT), and they represent the most computation-intensive tasks.
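For reference, the two conversions mentioned above can be written with the standard definitions of the N-point discrete Fourier transform and its inverse. The symbols $x[n]$, $X[k]$ and $W_N$ are conventional notation assumed here, not taken from the reviewed papers.

$$
X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1, \qquad W_N = e^{-j 2\pi / N}
$$

$$
x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, W_N^{-nk}, \qquad n = 0, 1, \ldots, N-1
$$

Evaluating these sums directly takes on the order of $N^2$ complex multiplications, whereas a radix-2 FFT needs on the order of $N \log_2 N$, which is why the FFT size and structure dominate the processing budget.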

The proposed FFT ASIP design can achieve both high computation parallelism and low memory access, based on a hierarchical array structure. It adopts the epoch idea, splits an N-point FFT into two smaller FFT loops, and then reconstructs the inner-loop FFT data flow into an array structure. The structure contains a fixed computation module. The proposed design extends the basic processor core with the computation module and adds custom register files to store all the intermediate results of the inner-loop FFT computations. It uses efficient addressing methods for both data and coefficients, which may remove the address-changing logic between stages as proposed in. It then customizes the instruction set accordingly, with one additional data manipulation instruction and two data transfer instructions. With the support of an array structure and efficient data addressing logic, the FFT algorithm can be easily scaled to any computation size.
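The idea of splitting an N-point FFT into two smaller FFT loops can be illustrated with a minimal sketch. It assumes the classic Cooley-Tukey factorisation N = n1 × n2, which may differ in detail from the epoch scheme of the reviewed design; the function names `dft` and `two_loop_fft` are hypothetical.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT; stands in for the small inner transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]


def two_loop_fft(x, n1, n2):
    """N-point DFT (N = n1 * n2) built from two loops of smaller DFTs.

    Assumption: plain Cooley-Tukey index mapping, used only to
    illustrate the split; not the authors' exact epoch scheme.
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # First loop: n2 transforms of length n1 over the decimated
    # inputs x[t2], x[t2 + n2], x[t2 + 2*n2], ...
    inner = [dft(x[t2::n2]) for t2 in range(n2)]

    # Twiddle factors that link the two loops.
    for t2 in range(n2):
        for k1 in range(n1):
            inner[t2][k1] *= cmath.exp(-2j * cmath.pi * t2 * k1 / n)

    # Second loop: n1 transforms of length n2 across the first-loop results.
    out = [0j] * n
    for k1 in range(n1):
        column = dft([inner[t2][k1] for t2 in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = column[k2]
    return out
```

For example, `two_loop_fft(x, 8, 8)` computes a 64-point transform using only 8-point transforms plus one twiddle multiplication pass, which is the property that lets a fixed-size computation module be reused for larger sizes.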

Figure: Flow diagram of FFT (input label: Time Domain Data)

#### 2. Literature Review

A lot of work has been done on FFT processing in both software and hardware. The paper by Wei Han et al. presented the development of high-performance FFT IP cores through a hybrid low-power algorithmic methodology, using a parallel-pipelined design for high-throughput and power-efficient FFT IP cores. Low power consumption can be gained through the combination of hybrid low-power algorithms and architectures. The results showed that up to 55% and 52% power savings can be achieved. [1]

In "A Multistandard FFT Processor for Wireless System-on-Chip Implementations", Ramesh Chidambaram, Rene van Leuken et al. presented a programmable solution that was scalable in the order of the FFT and capable of satisfying the performance requirements of various OFDM wireless standards. The ASIP adopted a vectorial Ultralong Instruction Word (ULIW) architecture and was evaluated in terms of processing speed, area and power dissipation. [2]

In "A Family of Scalable FFT Architectures and an Implementation of 1024-Point Radix-2 FFT for Real-Time Communications", Hani Saleh et al. provided a systematic, scalable pipeline architecture. It presents a new efficient method for decomposing the perfect shuffle permutation and data reordering of the FFT algorithm, and it can be designed with a variable number of processing elements. This gives designers a trade-off between speed and complexity (cost and area). [3]
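As background for the perfect shuffle permutation mentioned above, the following minimal sketch shows the basic permutation itself and its well-known address interpretation: for N = 2^m elements, the element at index i moves to the index obtained by rotating the m-bit address of i left by one bit. This is not the decomposition method of [3]; the function names are hypothetical.

```python
def perfect_shuffle(data):
    """Interleave the first and second halves of an even-length sequence."""
    if len(data) % 2:
        raise ValueError("length must be even")
    half = len(data) // 2
    out = []
    for first, second in zip(data[:half], data[half:]):
        out.extend((first, second))
    return out


def rotate_left(index, bits):
    """Rotate a `bits`-wide address left by one position."""
    mask = (1 << bits) - 1
    return ((index << 1) | (index >> (bits - 1))) & mask


# Eight samples labelled by their original index.
shuffled = perfect_shuffle(list(range(8)))
print(shuffled)  # [0, 4, 1, 5, 2, 6, 3, 7]
```

Because the permutation is a one-bit address rotation, applying it m times to 2^m elements restores the original order, which is one reason it maps well onto fixed inter-stage wiring.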

The paper by Xuan Guan et al. presented a novel hierarchical design of an application-specific instruction set processor (ASIP) for high-throughput FFT. The FFT computation flow was reconstructed into a scalable array structure based on an 8-point butterfly unit (BU). The design incorporated custom register files to reduce memory access and derived a regular data addressing rule. Along with the microarchitecture modifications, it extended the instruction set with three custom instructions. Using an Xtensa implementation, the FFT ASIP achieved data throughput improvements of 866.5X, 5.9X and 2.3X over the standard FFT software implementation. [4]

Yazan Samir Algnabi et al. presented a novel architecture of a pipeline Radix-2<sup>2</sup> SDF FFT based on the digit-slicing technique. The implementation was coded in Verilog HDL and tested on a Xilinx Virtex-4 FPGA prototyping board. A maximum clock frequency of 669.277 MHz with a total equivalent gate count of 14,854 was obtained from the synthesis report for the 8-point pipeline Radix-2<sup>2</sup> DIF SDF FFT, which is 3.35 times faster than the conventional butterfly. [5]

A hierarchical design of an ASIP for high-throughput and scalable FFT processing, given by Xuan Guan et al., further reconstructed the FFT computation flow into a scalable array structure based on an 8-point butterfly unit (BU). Along with the microarchitecture modifications, it extended the instruction set architecture (ISA) with new instructions to accelerate FFT operations. The FFT ASIP was implemented on Tensilica's reconfigurable processor platform. It achieved a data throughput of 405.7 Mb/s for a 1K-point FFT, which meets the UWB-OFDM specifications. [6]

In the paper "MDC FFT/IFFT Processor With Variable Length for MIMO-OFDM Systems", Kai-Jiun Yang et al. proposed a radix-r based MDC MIMO FFT/IFFT processor for processing Ns streams of parallel inputs. The proposed approach is suitable for MIMO-OFDM baseband processors such as those in LTE applications, where Ns = 4 and N can be configured as 2048, 512, 256 or 128. It is worth emphasizing that the proposed design is based on an MDC architecture, which is generally not preferred due to its low utilization rate in memory and computational elements such as adders and multipliers. [7]

In "Software-Defined DVB-T2 Demodulator using Scalable DSP Processors", Ho Yang et al. proposed a flexible platform architecture with two CGRA cores for the major demodulation functions, together with hardware blocks for front-end filters and channel decoders. Special permutation patterns of the frequency and cell interleavers were implemented with specific intrinsics, resulting in 17% processing time among the five blocks of the demodulator. It is quite possible to define the DVB-T2 demodulator in software using the two CGRA cores. However, special intrinsics must be prepared to run the FIR filtering and frequency/cell interleaving functions within the cycle budget of the given processors. Software-defined demodulators still require enhanced performance and greater flexibility based on multiple algorithms. [8]

In "An In-Place FFT Architecture for Real-Valued Signals", Manohar Ayinala et al. proposed a novel memory-based FFT architecture that computes the real-valued FFT (RFFT) based on a modified radix-2 algorithm. The algorithm computes only half of the output samples and removes the redundant operations from the flow graph. The advantage of the proposed RFFT architecture is that the length of the required memory can be reduced by a factor of 2. [9]
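Computing only half of the output samples is sufficient for real-valued input because the spectrum is conjugate-symmetric: X[N-k] equals the complex conjugate of X[k]. The sketch below demonstrates only this symmetry with a direct DFT; it is not the modified radix-2 in-place algorithm of [9], and the function names are hypothetical.

```python
import cmath


def real_dft_half(x):
    """Bins 0 .. N/2 of the DFT of a real, even-length sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def expand_full_spectrum(half, n):
    """Rebuild all N bins from bins 0 .. N/2 using X[N-k] = conj(X[k])."""
    full = list(half)
    for k in range(n // 2 + 1, n):
        full.append(half[n - k].conjugate())
    return full


signal = [1.0, 3.0, -2.0, 0.5, 4.0, -1.0, 0.0, 2.5]
half = real_dft_half(signal)           # 5 bins computed
spectrum = expand_full_spectrum(half, len(signal))  # 8 bins recovered
```

Only N/2 + 1 bins are ever computed here; the remaining bins are obtained by conjugation, which is the redundancy that the reviewed architecture exploits to halve the memory length.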

LISA, a highly efficient machine description language for ASIP design exploration, software tools design, system verification and design implementation, has been used to design an FFT processor. Ting Chen et al. presented a rapid prototype and implementation of a high-throughput and flexible FFT ASIP based on LISA 2.0. They proposed a scalable butterfly processing structure with a fixed shuffling mode between the stages. Simulation and synthesis results showed that the FFT-ASIP achieved higher energy efficiency and flexibility at a low area cost. [10]

Although a lot of work on FFT processing has been done in both software and hardware, providing both flexibility and high throughput is difficult. To meet this stringent requirement, an application-specific instruction set processor is planned that provides flexibility together with high-throughput computation parallelism and low memory access, based on a hierarchical array structure.

#### 3. Proposed Work

- 1) First, adopt the epoch idea, which splits the N-point FFT into two or three small FFT loops depending on the number of points.
- 2) The structure contains a fixed computation module, which is an 8-point radix-2 butterfly unit. The butterfly unit is a uniform, stage-independent basic structure, reducing the processing complexity and simplifying the data addressing method.
- 3) Use the TMS320C6X processor as a platform, extending the basic processor core with the computation module and adding custom register files to store the intermediate results within the inner-loop FFT computation.
- 4) Propose efficient addressing methods for both data and coefficients.
- 5) Customize the instruction set.
- 6) With the support of an array structure and efficient data addressing logic, the FFT algorithm can be scaled for any size of computation.
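The fixed computation module in step 2 can be modelled in software as follows. This is a minimal sketch, assuming a decimation-in-frequency (DIF) radix-2 flow with three butterfly stages and a final bit-reversal reorder; the source does not state which radix-2 variant the hardware unit uses, and the function names are hypothetical.

```python
import cmath


def direct_dft(x):
    """Direct O(N^2) DFT, used here only as a reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]


def butterfly_unit_8(x):
    """Software model of an 8-point radix-2 butterfly unit (DIF variant assumed)."""
    n = 8
    if len(x) != n:
        raise ValueError("the unit accepts exactly 8 points")
    data = [complex(v) for v in x]

    span = n // 2  # distance between the two inputs of one butterfly
    while span >= 1:
        for start in range(0, n, 2 * span):
            for j in range(span):
                twiddle = cmath.exp(-2j * cmath.pi * j / (2 * span))
                a = data[start + j]
                b = data[start + j + span]
                data[start + j] = a + b
                data[start + j + span] = (a - b) * twiddle
        span //= 2

    # DIF leaves the results in bit-reversed order; restore natural order.
    return [data[int(format(k, "03b")[::-1], 2)] for k in range(n)]
```

Every stage applies the same add, subtract and multiply pattern and differs only in the span and the twiddle values, which is what makes a uniform, stage-independent unit attractive for simple data addressing. Larger transforms can then be assembled from repeated calls to this one unit.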

#### 4. Conclusion

This paper proposes a hierarchical FFT ASIP design that will be flexible and efficient enough to meet the requirements of contemporary digital communication standards. It will also provide high-throughput computation parallelism and low memory access, based on a hierarchical array structure. The hierarchical array structure offers good scalability to any FFT size, and both hardware and software are easy to implement. The cost, area and power consumption are expected to be acceptable.

#### References

- [1] Wei Han, Erdogan, Arslan, and M. Hasan, "The Development of High Performance FFT IP Cores through Hybrid Low Power Algorithmic Methodology," 2005 IEEE, pp. 549-552.
- [2] Ramesh Chidambaram, Rene van Leuken, Marc Quax, Ingolf Held, Jos Huisken, "A Multistandard FFT Processor for Wireless System-on-Chip Implementations," 2006 IEEE, pp. 1099-1102.
- [3] Adnan Suleiman, Hani Saleh, Adel Hussein, and David Akopian, "A Family of Scalable FFT Architectures and an Implementation of 1024-Point Radix-2 FFT for Real-Time Communications," 2008 IEEE, pp. 321-327.
- [4] Xuan Guan, Yunsi Fei, and Hai Lin, "A Hierarchical Design of an Application-Specific Instruction Set Processor for High-Throughput FFT," 2009 IEEE, pp. 2513-2516.
- [5] Yazan Samir Algnabi, Furat A. Aldaamee, Rozita Teymourzadeh, Masuri Othman, and Md Shabiul Islam, "Novel Architecture of Pipeline Radix-2<sup>2</sup> SDF FFT Based on Digit-Slicing Technique," 2012 IEEE, pp. 470-474.
- [6] Xuan Guan, Yunsi Fei, and Hai Lin, "Hierarchical Design of an Application-Specific Instruction Set Processor for High-Throughput and Scalable FFT Processing," IEEE, vol. 20, no. 3, pp. 551-563, March 2012.
- [7] Shang-Ho Tsai, Kai-Jiun Yang, "MDC FFT/IFFT Processor With Variable Length for MIMO-OFDM Systems," IEEE, vol. 21, no. 4, April 2013.
- [8] Ho Yang, Navneet Basutkar, Peng Xue, Kyeongyeon Kim, and Young-Hwan Park, "Software-Defined DVB-T2 Demodulator using Scalable DSP Processors," IEEE Transactions on Consumer Electronics, vol. 59, no. 2, pp. 428-434, May 2013.
- [9] Manohar Ayinala, Yingjie Lao, and Keshab K. Parhi, "An In-Place FFT Architecture for Real-Valued Signals," IEEE Transactions on Circuits and Systems II, vol. 60, no. 10, pp. 652-656, October 2013.
- [10] Ting Chen, Xiaowei Pan, Hengzhu Liu, Tiebin Wu, "Rapid Prototype and Implementation of a High-Throughput and Flexible FFT ASIP Based on LISA 2.0," 2014 IEEE, pp. 681-687.