Using DSPLIB FFT Implementation for Real Input and Without Data Scaling

DSPLIB for TI C64x+ and C66x DSP architecture provides optimized implementation for FFT with assumptions for format and scaling of the input data. The application report provides guidance for users of DSPLIB for the following use cases:

Efficient compute of single precision FFT using real input without reformatting data to complex format
Fixed point FFT compute with no data scaling to avoid overflow

The content discussed benefits all DSPLIB users on C64x+, C674x and C66x DSP devices.

Trademarks

All other trademarks are the property of their respective owners.

1 Real Input Introduction

Algorithms to perform forward and Inverse Fast Fourier Transforms (FFT and IFFT) typically assume complex input and output data. However, many applications use only real-valued data in the time domain. A simple but inefficient solution to this problem is to pad N-length real input signals to N-length complex input signals with zero-valued imaginary components.

x_real = { 1, 2, 3, ... }
      
      
x_cplx = { 1, 0, 2, 0, 3, 0, ...}

The complex FFT can then be applied to the double-length sequence. However, this method is obviously inefficient. This topic explains how to use the complex-valued FFT and IFFT algorithms to efficiently process real-valued sequences without padding those sequences. There are two key advantages to this approach:

Lower memory footprint - M bytes per sequence instead of 2M bytes
Fewer cycles - length N/2 FFT and IFFT computation instead of length N

For more information on the derivation of this method, see Implementing Fast Fourier Transform Algorithms of Real-Valued Sequences With the TMS320 DSP Platform.

Example C code for this method is also available for download.

1.1 Prerequisites

Install the C674x DSPLIB or C66x DSPLIB based on the DSP being used.
- Familiarize yourself with the FFT and IFFT kernels
- Look at demo code (*_d.c) for usage examples

1.2 Computing a Length N/2 Complex FFT From a Length N Real Input Sequence

Let g(n) be an N-point real sequence (N must be even). Compute the N-point complex FFT, but only use an N/2-point FFT computation. This can be accomplished by using the following steps:

Form the the N/2-point complex valued sequence, x(n) = x₁(n) + jx₂(n), where x₁(n) = g(2n) and x₂(n) = g(2n + 1).
Compute the N/2-point complex FFT on the complex valued sequence x(n) to obtain X(k) = FFT{x(n)}. Note that the FFT should be performed with bit reversal.
Perform an additional computation to get G(k) from X(k).

G_r(k) = X_r(k)A_r(k) – X_i(k)A_i(k) + X_r(N/2–k)B_r(k) + X_i(N/2–k)B_i(k), for k = 0, 1, ..., N/2–1 and X(N/2) = X(0)

G_i(k) = X_i(k)A_r(k) + X_r(k)A_i(k) + X_rr(N/2–k)B_i(k) – X_i(N/2–k)B_r(k)

Note that only N/2 points of the N-point sequence of G(k) are computed in the above equations. Because the FFT of a real-sequence has symmetric properties, you can easily compute the remaining N/2 points of G(k) with the following equations:

G_r(N/2) = X_r(0) – X_i(0)
G_i(N/2) = 0
G_r(N–k) = G_r(k), for k = 1, 2, ..., N/2–1
G_i(N–k) = –G_i(k)

As you can see, the above equations require A(k) and B(k), which are sine and cosine coefficients. These values can be precomputed with the following C code, where even indices contains the real part and odd indices contain the imaginary part.

for (i = 0; i < N/2; i++)
{
  A[2 * i]     = 0.5 * (1.0 - sin (2 * PI / (double) (2 * N) * (double) i));
  A[2 * i + 1] = 0.5 * (-1.0 * cos (2 * PI / (double) (2 * N) * (double) i));
  B[2 * i]     = 0.5 * (1.0 + sin (2 * PI / (double) (2 * N) * (double) i));
  B[2 * i + 1] = 0.5 * (1.0 * cos (2 * PI / (double) (2 * N) * (double) i));
}

1.3 Returning to a Length N Real Sequence Using a Length N/2 Complex IFFT

Let G(k) be a N-point complex valued sequence derived from a real-valued sequence g(n). You want to get back g(n) = IFFT{G(k)}. However, you want to do this with an N/2-point IFFT. This can be accomplished using the following procedure:

X_r(k) = G_r(k)IA_r(k) – G_i(k)IA_i(k) + G_r(N/2–k)IB_r(k) + G_i(N/2–k)IB_i(k), for k = 0, 1, ..., N/2–1 and G(N/2) = G(0)
X_i(k) = G_i(k)IA_r(k) + G_r(k)IA_i(k) + G_r(N/2–k)IB_i(k) – G_i(N/2–k)IB_r(k)

Find X(k) using the above equations.
Compute the N/2-point inverse FFT of X(k) to get x(n) = x1(n) + jx2(n). Note that the IFFT should be performed with bit reversal.
Get g(n) from x(n) using the following equations:

g(2n) = x₁(n), for n = 0, 1, ..., N/2–1
g(2n+1) = x₂(n)

The above equations look similar to those used in our forward FFT computation. However, the pre-computed coefficients are slightly different. The following C code can be used to initialize IA(k) and IB(k), again, even indices contains the real part and odd indices contain the imaginary part.

for (i = 0; i < N/2; i++)
{
  IA[2 * i]     = 0.5 * (1.0 - sin (2 * PI / (double) (2 * N) * (double) i));
  IA[2 * i + 1] = 0.5 * (1.0 * cos (2 * PI / (double) (2 * N) * (double) i));
  IB[2 * i]     = 0.5 * (1.0 + sin (2 * PI / (double) (2 * N) * (double) i));
  IB[2 * i + 1] = 0.5 * (-1.0 * cos (2 * PI / (double) (2 * N) * (double) i));
}

Note that IA(k) is the complex conjugate of A(k) and IB(k) is the complex conjugate of B(k).

1.4 Benchmark of the Efficient Compute of FFT

Table 1 compares the performance of two methods of calculating the FFT of an N-point real sequence. Complex FFT refers to using an N-point complex FFT in the standard way, while Real FFT refers to using an N/2-point complex FFT as described in this topic. As expected, the Real FFT method yields superior performance.

Table 1. Performance Comparison

N	Cycle Count
N	Complex FFT	Real FFT	Improvement
128	1134	827	27.07%
256	2094	1709	18.38%
512	4944	3181	35.65%
1024	9680	7055	27.11%
2048	22770	13839	39.22%
4096	45298	31025	31.50%
8192	104724	61745	41.04%

NOTE

The data was measured on C6748 DSP with the data in L2 SRAM.
L1D and L1P cache was enabled.
Compiler CGT 6.x and CGT 7.4.x was used due to need to support legacy COFF binary format.

2 Fixed Point FFT With No Data Scaling

The FFT (DSP_fft16x16) and iFFT (DSP_ifft16x16) implementation provided with the C64x+ DSPLIB [1] apply scaling of data to avoid overflow. For more information, see the TMS320C64x+ DSP Little-Endian Library Programmer's Reference.

All stages are radix-4 except the last one, which can be radix-2 or radix-4, depending on the size of the FFT. All stages except the last one scale by two the stage output data.

It is desirable in certain use cases that the data scaling is not applied in the FFT routines. This article suggests modifications to the provided FFT source in the DSPLIB such that the data is not scaled. Also, an example is provided to demonstrate the affect of suggested change.