SPRACN4 August 2019 66AK2G12 , 66AK2H06 , 66AK2H12 , 66AK2H14 , OMAP-L132 , OMAP-L138 , TMS320C6452 , TMS320C6454 , TMS320C6455 , TMS320C6457 , TMS320C6652 , TMS320C6654 , TMS320C6655 , TMS320C6657 , TMS320C6672 , TMS320C6674 , TMS320C6678 , TMS320C6742 , TMS320C6743 , TMS320C6745 , TMS320C6746 , TMS320C6747 , TMS320C6748
DSPLIB for TI C64x+ and C66x DSP architecture provides optimized implementation for FFT with assumptions for format and scaling of the input data. The application report provides guidance for users of DSPLIB for the following use cases:
The content discussed benefits all DSPLIB users on C64x+, C674x and C66x DSP devices.
All other trademarks are the property of their respective owners.
Algorithms to perform forward and Inverse Fast Fourier Transforms (FFT and IFFT) typically assume complex input and output data. However, many applications use only real-valued data in the time domain. A simple but inefficient solution to this problem is to pad N-length real input signals to N-length complex input signals with zero-valued imaginary components.
xreal = { 1, 2, 3, ... }
xcplx = { 1, 0, 2, 0, 3, 0, ...}
The complex FFT can then be applied to the double-length sequence. However, this method is obviously inefficient. This topic explains how to use the complex-valued FFT and IFFT algorithms to efficiently process real-valued sequences without padding those sequences. There are two key advantages to this approach:
For more information on the derivation of this method, see Implementing Fast Fourier Transform Algorithms of Real-Valued Sequences With the TMS320 DSP Platform.
Example C code for this method is also available for download.
Let g(n) be an N-point real sequence (N must be even). Compute the N-point complex FFT, but only use an N/2-point FFT computation. This can be accomplished by using the following steps:
Gr(k) = Xr(k)Ar(k) – Xi(k)Ai(k) + Xr(N/2–k)Br(k) + Xi(N/2–k)Bi(k), for k = 0, 1, ..., N/2–1 and X(N/2) = X(0)
Gi(k) = Xi(k)Ar(k) + Xr(k)Ai(k) + Xrr(N/2–k)Bi(k) – Xi(N/2–k)Br(k)
Note that only N/2 points of the N-point sequence of G(k) are computed in the above equations. Because the FFT of a real-sequence has symmetric properties, you can easily compute the remaining N/2 points of G(k) with the following equations:
Gr(N/2) = Xr(0) – Xi(0)
Gi(N/2) = 0
Gr(N–k) = Gr(k), for k = 1, 2, ..., N/2–1
Gi(N–k) = –Gi(k)
As you can see, the above equations require A(k) and B(k), which are sine and cosine coefficients. These values can be precomputed with the following C code, where even indices contains the real part and odd indices contain the imaginary part.
for (i = 0; i < N/2; i++)
{
A[2 * i] = 0.5 * (1.0 - sin (2 * PI / (double) (2 * N) * (double) i));
A[2 * i + 1] = 0.5 * (-1.0 * cos (2 * PI / (double) (2 * N) * (double) i));
B[2 * i] = 0.5 * (1.0 + sin (2 * PI / (double) (2 * N) * (double) i));
B[2 * i + 1] = 0.5 * (1.0 * cos (2 * PI / (double) (2 * N) * (double) i));
}
Let G(k) be a N-point complex valued sequence derived from a real-valued sequence g(n). You want to get back g(n) = IFFT{G(k)}. However, you want to do this with an N/2-point IFFT. This can be accomplished using the following procedure:
Xr(k) = Gr(k)IAr(k) – Gi(k)IAi(k) + Gr(N/2–k)IBr(k) + Gi(N/2–k)IBi(k), for k = 0, 1, ..., N/2–1 and G(N/2) = G(0)
Xi(k) = Gi(k)IAr(k) + Gr(k)IAi(k) + Gr(N/2–k)IBi(k) – Gi(N/2–k)IBr(k)
g(2n) = x1(n), for n = 0, 1, ..., N/2–1
g(2n+1) = x2(n)
The above equations look similar to those used in our forward FFT computation. However, the pre-computed coefficients are slightly different. The following C code can be used to initialize IA(k) and IB(k), again, even indices contains the real part and odd indices contain the imaginary part.
for (i = 0; i < N/2; i++)
{
IA[2 * i] = 0.5 * (1.0 - sin (2 * PI / (double) (2 * N) * (double) i));
IA[2 * i + 1] = 0.5 * (1.0 * cos (2 * PI / (double) (2 * N) * (double) i));
IB[2 * i] = 0.5 * (1.0 + sin (2 * PI / (double) (2 * N) * (double) i));
IB[2 * i + 1] = 0.5 * (-1.0 * cos (2 * PI / (double) (2 * N) * (double) i));
}
Note that IA(k) is the complex conjugate of A(k) and IB(k) is the complex conjugate of B(k).
Table 1 compares the performance of two methods of calculating the FFT of an N-point real sequence. Complex FFT refers to using an N-point complex FFT in the standard way, while Real FFT refers to using an N/2-point complex FFT as described in this topic. As expected, the Real FFT method yields superior performance.
N | Cycle Count | ||
---|---|---|---|
Complex FFT | Real FFT | Improvement | |
128 | 1134 | 827 | 27.07% |
256 | 2094 | 1709 | 18.38% |
512 | 4944 | 3181 | 35.65% |
1024 | 9680 | 7055 | 27.11% |
2048 | 22770 | 13839 | 39.22% |
4096 | 45298 | 31025 | 31.50% |
8192 | 104724 | 61745 | 41.04% |
NOTE
The FFT (DSP_fft16x16) and iFFT (DSP_ifft16x16) implementation provided with the C64x+ DSPLIB [1] apply scaling of data to avoid overflow. For more information, see the TMS320C64x+ DSP Little-Endian Library Programmer's Reference.
All stages are radix-4 except the last one, which can be radix-2 or radix-4, depending on the size of the FFT. All stages except the last one scale by two the stage output data.
It is desirable in certain use cases that the data scaling is not applied in the FFT routines. This article suggests modifications to the provided FFT source in the DSPLIB such that the data is not scaled. Also, an example is provided to demonstrate the affect of suggested change.