Abstract
In multi carrier OFDM systems parameters like speed, throughput and hardware area can be improved by using efficient Fast Fourier Transform approach. In this paper an area efficient and high speed 32 bit floating point FFT processor for OFDM using Vedic multiplication process is presented. Proposed FFT processor is based on memory based architecture and utilizing Urdhavatiraykbhyam sutra for Vedic multiplication. As the number of inbuilt multipliers available in FPGAs are limited, hence external multiplication module are required in the multicarrier OFDM systems, in order to reduce the complexity of FPGA implementation. By the use of Vedic multiplication process in FFT of OFDM high throughput with smaller area can be achieved. Simulation results explain that the proposed scheme is having high speed and throughput.
Keywords
Introduction
Orthogonal frequency division multiplexing (OFDM) belongs to the techniques in which single data stream is transmitted with smaller rate of subcarriers [1]. OFDM system is more robust for frequency selective fading and interferences, and usually used in applications such as Digital Video Broadcasting and Digital Audio Broadcasting [1]. Main significance of Fast Fourier Transform (FFT) is in the modulation and demodulation of OFDM. In the OFDM transmitter, Subcarriers are generated using IFFT module and in OFDM receiver, subcarriers are separated using FFT module [2].
OFDM FFT processor can be implemented on the system using the software based implementation or hardware based implementation. Software based implementation is done using “c” as programming language [3]. While, the implementation of OFDM FFT processor using hardware is much faster than the software based process, hence preferred to implement OFDM FFT processor.
Hardware based implementation of OFDM FFT processor can be achieved using seven methods. These are, cache memory based, sequential, parallel, parallel iterative, array, pipeline and memory based architecture [4]. From these five methods pipeline [5] and memory based architectures are commonly used.
Memory based floating point FFT Processor using Vedic multiplication is presented in this paper. Two types of algorithms are available for FFT implementation; these are Decimation in time (DIT) and Decimation in frequency (DIF) algorithms. In the proposed OFDM FFT processor DIT algorithm with Radix-2 method is used. By the use of Vedic multiplication process in proposed processor, computational complexity and power dissipation can be reduces. Vedic multiplication process can be achieved using Urdhvtirvakabhyam sutra [6]. Proposed processor works at very high speed without increasing the clock frequency hence speed gets increases. Propagation delay of proposed architecture is 85.451 ns. As the number of bits increases then hardware area does not increases rapidly, hence it is also an area efficient process.
In this paper, Section 2 explains Vedic Multiplication process using Urdhvtirvakabhyam sutra. Section 3 contains the multiplication process of 32*32 bits Vedic multiplier using carry save adder (CSA) unit and Binary to Excess-1(BEC-1) code convertor unit. Section 4 explains the main block of proposed OFDM FFT processor. Synthesis and simulation results of proposed OFDM FFT processor using Xilinx ISE design Suite 14.2 are explained in Section 5. Performance analysis of proposed FFT processor is reported in Section 6.
Vedic multiplication process using Urdhvtirvakabhyam sutra
In Digital Signal Processing multiplier comes under the category of key switching element and significant computation of multiplier is quite high. Earlier, different multiplier such as Array, Wallace tree and booth multipliers are used for multiplication [7]. In the array multiplier partial products are parallel generated; which increases the speed of multiplier but worst case delay increases with the width of array, a primary drawback of array multiplier. In Wallace tree multiplier, partial products are added in carry save adder, hence delay reduces. But the complex layout is the drawback for Wallace tree multiplier. quite complex. In Booth multiplier complexity gets reduces, but it has high speed, so cost of circuit gets increases [8]. Hence, to get high multiplication speed with less complex layout Vedic multiplier is used in proposed processor.
Vedic multiplier
Normal multiplier requires N2 multiplication units for N*N bits multiplications. In the proposed Vedic multiplier to multiply N*N bits, only N/2*N/2 bits multiplication is required till the process goes on 2*2 multiplication [7]. Hence the Vedic multiplier reduces the complexity of multiplication [6]. Mainly fourteen Sutras are available for multiplication using Vedic method [6]. Urdhvtirvakabhyam multiplication sutra is used in the proposed OFDM FFT Processor for multiplication.
Urdhvtiryakabhyam multiplication sutra
Urdhavatiryakbhyam Sutra works on “Vertical and crosswise” multiplication process [5]. An example of decimal numbers multiplication using Urdhvtriyakabhyam formula is explained in Fig. 1.
Two decimal numbers 25 and 31 are taken for multiplication using Urdhvtiryakabhyam sutra. Least significant bit (LSB) of these two numbers is multiplied in the first step and generates a resultant digit and one carry digit. This generated carry digit is added to the next stage in step number 2 where crosswise multiplication takes place. In Each step unit digit works as resultant digit and higher digit works as carry digit.
32*32 bit Vedic multiplier using Urdhvtriyakabhyam sutra
32*32 bit Vedic multiplication using Urdhvatiryabhyam formula is used in proposed OFDM FFT processor. In 32*32 bit Vedic multiplication size of multiplicand bits A1 and A2 is 32 bit. A1 and A2 further divided into chunks of size n/2 = 16, and fed to the input of 16*16 Vedic multiplier module. After that these chunks are grouped into chunks of size n/4 = 8 and given to 8*8 bit multiplier block. Again these Chunks are divided into chunks of size 4 and given to 4*4 multiplier unit (module). Same process is repeated for chunk size 2.
Figure 2 shows designing of chunk with size 2. This size 2 chunk is 2*2 bit Vedic multiplier. In multiplication process of 2*2 bits Vedic multiplication for Q0Q1 and R0R1 is implemented by simple using two half adders and resultant bits are S0S1S2S3. This 2*2 bit multiplier module is used in the designing of 32*32 bit Vedic multiplier.
32*32 bit multiplier shown in Fig. 3, uses four 16*16 bit Vedic multipliers, one carry save adder (CSA), one binary to excess-1 (BEC-1) code convertor and one multiplexer unit [7]. In this multiplication process, BEC-1 code convertor is used because numbers of logic gates required for full adder operation are less hence small area is required and high speed can be achieved [9]. Multiplexer selects BEC-1 input or output according to requirement of thesystem.
Proposed OFDM FFT processor architecture
A 32 bit memory based floating point FFT processor using Vedic multiplication is presented in this session. Proposed FFT processor is of length 8 and uses Radix-2 DIT algorithm. IEEE computer association gave two format of floating point numbers; one is of size 64 bit and other is of size 32 bit. In this paper, 32 bit precision is used, which is grouped into 3 parts as shown in Fig. 4. First group is of sign bit(31st bit), second group is of exponent bit (30 to 23 bits) and third group contain mantissa (22 to 0 bits). In process of floating point addition, operation exponent values arrangement, addition of mantissa and calculation of most significant bit (MSB) whether it is 1 or 0, all operations are performs [10].
Floating point numbers can be represented as:
(–1)P is the sign bit, for positive numbers value of “P” will be “0” and for negative numbers value of “P” will be “1”. “R” lies in the range between 0 to 225. “Q” is 23 bit binary number. Architecture of proposed FFT processor basically consists control unit, address generation unit, butterfly processing unit, RAM and ROM unit as shown in Fig. 5. These all units are explained in subsections.
Butterfly processing unit consists of one multiplication operation and two addition operations as depicted in Fig. 6. Multiplication is done by Urdhavatiryakbhyam Sutra. Multiplication process of 32*32 bits Vedic Multiplier is explained in Section 3. This external Vedic multiplier effectively reduces the complexity at the time of Field Programmable gate array (FPGA) implementation. 32*32 bit Vedic multiplier multiplies the outputs of multiplexer unit and ROM, and gives input to the positive and negative triggered flip flops. In butterfly unit negative and positive edge triggered D flip flops (D-FF) are used.
RAM unit
First input of FFT processor is written into RAM unit and when the computation process of FFT gets completed this input again written into the same position into the RAM unit.
ROM unit
ROM unit stores the values of sine and cosine functions.
Control unit
Controller in FFT processor works as a finite state machine and it consist of 7 states (rst1, rst2, rst3, rst4, rst5, rst6, rst7).
Address generation unit
The address generation unit (AGU) provides the RAM and the coefficients of ROM with the correct addresses. It also keeps track of which butterfly is being computed in which stage. For Proposed FFT Processor, there are 3 stages, each stage consisting of 4 butterflies. In addition to this, since address generation during input, output and FFT computation processes are different; it keeps track of the mode of operation of the chip and generates the required address. Controller gives the information related to Mode of operation. A block level description of the AGU is shown in Fig. 7. The different blocks of the AGU are explained separately.
Butterfly generator
The butterfly generator describes which butterfly is being computed in a particular stage. It is basically a 16-bit up counter since for proposed FFT there are 4 butterflies per stage and 4 data words per butterfly (2 real and 2 imaginary).
Stage generator
The stage generator contains information of the current stage in the FFT computation. The stage generator supplies, the base index generator with the current computed stage. For proposed FFT, there are 3 stages hence stage generator is basically a two bit counter.This stage generator is incremented by one in each 4 butterfly counts (by the “stage done”signal).
Stage Done_Io done block
It generates four signals called “iod”, “staged” “fftd” and “butterfly” as shown in Fig. 7. “iod” is generated when the “butterfly” count is 15. This informs to the controller that either the data input or output process is finished. The “staged” signal is generated when the current “butterfly” count is 4, it increments the stage generator by one. “fftd” is generated when the stage count is three. This informs to the controller that the FFT computation process is done.
IO-Address generator
IO Address Generator generates the addresses for RAM during the data input and output processes. During the data input process, output of the butterfly generator “butterfly” can be used for addressing 16 locations in the RAM. However, at the time of data output process, data should be bit-reversed while being written to outside world. Once in the output process bit-reversed address is selected by the multiplexers in the AGU, controller gives the information whether the process is in IO-mode through the signal “iomode”.
Base index generator
The base index generator generates the addresses for reading from the RAM.
Synthesis and simulation results
Synthesis results of 32*32 bit Vedic multiplier and Proposed FFT Processor are shown in Figs. 8 and 9 respectively. In 32*32 bit Vedic Multiplier module main components are four 16*16 bit Vedic Multipliers, one carry save adder and one binary to excess one (BEC-1) code convertor. In the proposed OFDM FFT processor for multiplication of D-FF output and multiplexer output this 32*32 Bit Vedic Multiplier is used.
Simulation of Proposed OFDM FFT processor is done on Xilinx ISE simulator ISIM P.28xd. Simulation results of 32*32 bit Vedic multiplier and Proposed FFT processor is shown in Figs. 10 and 11.
Performance analysis of proposed OFDM FFT processor
Proposed processor is designed in VHDL as an HDL language and implementation is done in Xilinx ISE Design Suite 14.2. Design contains 4656 slices and from them 1572 is used. Total 4 input LUTs are 9312 and from these 2839 are used. Performance report of proposed processor is explained in Table 1.
Total propagation delay of proposed OFDM FFT processor is 85.451 ns, from this total delay 47.434 ns is logic delay and 38.017 ns is route delay.
Conclusion
A Memory based floating point FFT processor for OFDM application is proposed in this paper. Proposed architecture is based on Redix-2 DIT algorithm and utilizes Vedic multiplication based Urdhvtiraykabhyam sutra for Multiplication of Multiplexer value and ROM output. By this proposed methodology external multiplication module is provided at the time of FPGA implementation of OFDM FFT processor. It reduces internal complexity of FPGA. Hence using proposed architecture power consumption of circuit and hardware area gets reduces and speed increases because of low propagation delay (85.451 ns) of which is beneficial for real time OFDM application. Hardware cost and power dissipation of proposed processor is also low.
