This site hosted by Free.ProHosting.com
Google

« JAMI | Main | Turbo charged multiplication using FFT »

May 28, 2002

Fast Fourier Transforms in C++

Linear transforms, especially Fourier and Laplace transforms, are widely used in solving problems in science and engineering. The Fourier transform is used in linear system analysis, antenna studies, optics, random process modelling, probability theory, quantum physics and boundary-value problems and have also been successfully applied to restoration of astronomical data. The Fourier transform, has often been used in many fields of science as a mathematical tool or physical tool to alter a problem into one that can be more easily solved.



Introduction
Linear transforms, especially Fourier and Laplace transforms, are widely used in solving problems in science and engineering. The Fourier transform is used in linear system analysis, antenna studies, optics, random process modelling, probability theory, quantum physics and boundary-value problems and have also been successfully applied to restoration of astronomical data. The Fourier transform, has often been used in many fields of science as a mathematical tool or physical tool to alter a problem into one that can be more easily solved.

Fourier methods as we can see are commonplace in research and digital signal processing. However many computer programmers are unaware of the potentials of this powerful mathematical tool which easily lends itself to the programming field. The SETI@Home project utilizes FFT (Fast Fourier Transforms) to analyze the signals by converting it into a frequency domain and examing the "hot spots" of standard extra terrestrial communication frequencies (around 1420 MHz) for intelligent "noise". The GIMPS (Great Internet Mersenne Prime Search) project locates record large primes by applying fourier methods to square the large numbers (otherwise impossible using standard computer data types) required for the Lucas Lehmer test on mersenne primes.

Although this article does cover some of the theory behind Fourier Transforms, it is impossible to give complete proofs and details behind the mathematics. Interested readers should refer to engineering/mathematics text books for more information, proofs and sample problems.


The Fourier Transform
A physical process can be described either in the time domain by values of a certain quantity h as a function of t. Note that such a process can also be described in the frequency domain where the process is specified by giving its amplitude H as a function of the frequency 'f'. It is possible to go back and forth between the two representations of the functions by means of the "Fourier transform" equations.

If t is measured in seconds then the fourier transform would give rise to the function using the first integral, which would be in cycles per second or Hertz. However, the equation works with other units too.
The Fourier transform in essence decomposes or separates a waveform or function into sinusoids of different frequency which sum to the original waveform. It identifies or distinguishes the different frequency sinusoids and their respective amplitudes.

If you are having trouble visualizing what Fourier transform is all about, you should read this excellent example of fourier transform applied to X-ray crystallography by Kevin Cowtan.

Fourier Transform Properties
Shifting Property
If FT{f(x)} = F(s) and x0 is a real constant, then

Let B = x-x0
Integral is F(s)

This time shifting property states that the Fourier transform of a shifted function is just the transform of the unshifted function multiplied by an exponential factor having a linear phase.

Convolution Theorem
Suppose g(x)=f(x)*h(x). Then, given that FT{g(x)}=G(s), FT{f(x)}=F(s), and FT{h(x)}=H(s),

Applying the fourier transform.
Shifting property to get H(s)*exponent
Integral is fourier transform of f(x), ie. F(s)

Thus fourier transform of a covolution is simply given by the product of the individual transforms, that is

FT{ f(x)*h(x) } = F(s)H(s)

Discrete Fourier Transform (DFT)
Because digital computers work only with discrete data, numerical computation of the Fourier transform of f(t) requires discrete sample values of f(t). In addition, a computer can compute the transform F(s) only at discrete values of s, that is, it can only provide discrete samples of the transform. Approximating the original fourier integral by a discrete sum:

Hence the problem reduces to multiplication of two complex numbers and taking their sum. The discrete Fourier transform maps N complex numbers into N complex numbers. The inverse of the discrete transform is similar with the exception of the negative power of "e" and the division by N to the result.

Fast Fourier Transform (FFT)
Looking back at the DFT equation,

we notice that the complex number h[k] is multiplied by a matrix whose (n,k)th element is the constant W to the power n*k. The matrix multiplication produces a complex result whose components are the H[n]. Thus the matrix multiplication requires N^2 complex multiplications plus a smaller number of operations to generate the required powers of W. Fortunately the problem is much simpler than it looks computationally. Thanks to J.W. Cooley and J.W. Tukey, efficient methods for computing the DFT were developed in the mid 1960s. However we will look at the Danielson and Lanczos version of the FFT developed in 1942 which provides one of the clearest derivations of the algorithm.

Danielson-Lanczos Lemma
Danielson and Lanczos showed that a discrete Fourier transform of length N can be rewritten as the sum of two discrete Fourier transforms, each of length N/2. One of the two formed from even-numbered points of the original N, the other from the odd-numbered points. The proof is as follows:

F[k](e) denotes the kth component of the Fourier transform of length N/2 formed from the even components of the original f[j] while F[k](o) is for the odd components. The transforms are periodic in k with length N/2. So each is repeated through two cycled to obtain F[k]. The wonderful thing about the Danielson-Lanczos Lemma is that it can be used recursively. Having reduced the problem of computing F[k] to the odd and even components of length N/2, the problem can be further divided into components of length N/4. We continue this process until we have subdivided the data into components of length 1. This would be a good time to note the fact that when selecting the size of N it is advantageous to have it as a power of 2. This is especially useful since computers are very good at operating at binary levels and in powers of 2! In case the data size is not aligned to a power of 2, it should be ensured that the size is set to nearest power of 2 and the extra elements padded with zeros.

The big question is what is the fourier transform of length one? It is just the identity operation that copies its one input number into its one output slot! In other words, for very pattern of e's and o's, there is a one point transform that is just one of the input numbers f[n] ! The next trick is to figure out which value of n corresponds to which pattern of e's and o's. Reverse the pattern of e's and o's, let e=0 and o=1 and you will have in binary the value of n! Thus the idea of bit reversal can be exploited in a very clever way which along with the Danielson-Lanczos Lemma makes FFTs practical. Combining the adjacent pairs to get two point transform, further to get four point transforms and so on until the first and second half of the whole data set are combined into the final transform. Each combination takes of order N operations, and there are log N (base 2) combinations. Thus the total number of operations has been reduced from N*N to N*log(base 2)N.

Thus the FFT algorithm has two sections. The first section sorts the data into bit-reversed order. This does not require additional storage space as we should aim at inplace bit order reversal. The second section has an outer loop that is executed log N (base 2) times and calculates transforms of length 2,4,8...N. For each stage of this process, two nested inner loops range over the subtransforms already computed and the elements of each transform.

Bit Reversal Permutation
Understanding the methods behind the bit reversal technique is rather simple. One plainly examines the element by looking at n in binary and reverses or flips that binary number. We look at two algorithms that carry out this task.

Naive Method
We first examine one of the simplest technique that can be used for bit reversal. Consider N=16 and the value we need bit reversed is 11 (binary: 1011).

Source CODE
#include <stdio.h>
#include <math.h>

main()
{
long int fftsize=pow(2,4);
long int bt,i,j,ser;
for (i=0; i<fftsize; i++)
{
bt=fftsize >> 1;
j=0;
ser=i;
while (bt>0)
{
j+=(ser % 2)*bt;
ser >>= 1;
bt >>= 1;
}
printf("\n%d -> %d",i,j);
}
}

Explanation
All that the above C code does is to multiply each binary digit by the fftsize, reducing the fftsize to half after every iteration. Thus to bit reverse the decimal number 11 (1011) for fftsize of 16,
= 8 * 1 + 4 * 1 + 2 * 0 + 1 * 1
= 8 + 4 + 1
= 13 (1101)
One should note that this method of conversion is cumbersome and time consuming.


Bracewell-Buneman Algorithm
The Bracewell-Buneman scheme utilizes storage techniques to gain speed within the algorithm. It starts out with the first row, and then it performs the swap. Next, it utilizes the definition of N=2^m and it adds another power of 2, which effectively doubles the row. This means that it looks at the mth row and adds 1 to the power. Further examination stems from the following chart of permuted positions:

n 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
r1[n] 0 1                            
r2[n] 0 2 1 3                        
r3[n] 0 4 2 6 1 5 3 7                
r4[n] 0 8 4 12 2 10 6 14 1 9 5 13 3 11 7 15

Each row is essentially a double of the previous. It is noticed that 2 terms in the first row do not change positions (0 and 1). 2 terms in the second row do not change positions (0 and 3). 4 terms in both the third and fourth rows do not change positions (3rd:0,2,5,7 .. 4th:0,6,9,15). This pattern stems from the storage type algorithm.

Source CODE
#include <stdio.h>
#include <math.h>

main()
{
long int sersize=pow(2,4);
long int newser[sersize];
long int itersersize=1;
long int iterval,iterpos;
newser[0]=0;
while (itersersize<sersize)
{
iterval=0;
for(iterpos=0;iterpos<itersersize;iterpos++)
{
iterval=newser[iterpos]*2;
newser[iterpos]=iterval;
newser[iterpos+itersersize]=iterval+1;
}
itersersize=itersersize*2;
}
for (iterpos=0;iterpos<sersize;iterpos++)
printf("%d\t",newser[iterpos]);
}

Explanation
The code initially performs the swap and then it sequentially multiplies the swapped values by 2. Effectively doubling each row of permuted values saves space, due to the simple fact that each row is not calculated separately for each individual 'm' term. Compared to the naive method, the Bracewell-Buneman method is much faster.

Inplace Ordering
Although the Bracewell-Buneman algorithm is nice if you have a lot of storage space, it is desirable to have an algorithm which can immediately give us the result. Now look at the data for N=4 bits.

Do you notice the pattern in the reversed bits? It starts off with 0000 then 1000, then 0100 then 1100. The reversed number can be obtained by simply adding a 1 to in reverse. The trick is to realize that this is just a simple XOR routine applied in reverse on the number. This code appears in Jorg Arndt's book Algorithms for Programmers, available at http://www.jjj.de/fxt/

Source CODE
long revbin_update(long r,long n)
{
do {
n=n>>1;
r=r^n;
} while ((r&n)==0);
return r;
}
void revbin_permute(complex *a,long n)
{
if (n<=2) return;
long r=0;
for (long x=1; x<n; x++)
{
r=revbin_update(r,n);
if (r>x) swap(a[x],a[r]);
}
}

The Complex Class
We have covered the important aspects of the fourier transform, but we have to develop the complex class before we can proceed to write our fast fourier transform functions. There do exist fast fourier transforms which can work on real data rather than complex input. I recommend reading "Numerical Recipes in C: The Art of Scientific Computing (ISBN: 0-521-43108-5)", available online at http://www.nr.com. The complex class will have a real and imaginary data members alongwith overloaded addition, subtraction, multiplication and division operators.

Source CODE
class complex {
public:
double real;
double imag;

complex()
{ real=0.0; imag=0.0; }
complex(double rl,double im)
{ real=rl; imag=im; }

complex operator+(complex& b)
{ return complex(real+b.real,imag+b.imag); }
complex operator-(complex& b)
{ return complex(real-b.real,imag-b.imag); }
complex operator*(complex& b)
{ return complex((real*b.real-imag*b.imag),(real*b.imag+imag*b.real)); }
complex operator/(complex& b)
{ return complex((real*b.real+imag*b.imag)/(b.real*b.real+b.imag*b.imag),
(imag*b.real-real*b.imag)/(b.real*b.real+b.imag*b.imag)); }
};

Danielson-Lanczos Lemma Code
We have to define two fourier functions, the fourier transform and the fourier convolution. The source codes are meant to provide a sufficiently quick implementation, however scope for extension and optimization still exist. The biggest limitation is that the data is limited to heap memory. Handling data sets larger than available memory (without using virtual memory) should be investigated and is left as an exercise to the reader.
C CODE

#define PI 3.1415926535897932384626

complex omega(double theta)
{ return complex(cos(theta),sin(theta)); }

void fft(complex *a, long ldn, int is)
// O(N log N)
//
// a[] is the complex input data set, lsb in 0 and msb in N
// ldn is the power of 2 which contains the entire data set, try to
// align the data to a power of 2 by padding with zeroes.
// is is the direction of the transform, +1 = forward, -1 = backward
//
// result is a[]
{
long n=1 << ldn;

revbin_permute(a,n);
for (long ldm=1; ldm<=ldn; ldm++)
{
long m=1 << ldm;
long mh=m/2;

for (long j=0; j<mh; j++)
{
complex e=omega(is*2*PI*j/m);

for (long r=0; r<=n-m; r+=m)
{
complex u=a[r+j];
complex v=a[r+j+mh]*e;

a[r+j]=(u+v);
a[r+j+mh]=(u-v);
}
}
}
}

void fft_convolution(complex *x,complex *y, long n)
// x[], y[] are the two input complex data sets.
// n is the power of 2 which contains the entire data set.
//
// result is y[]
{
long pw=1 << n;

fft(x,n,+1); // forward transform of x[]
fft(y,n,+1); // forward transfomr of y[]

for (long i=0; i<pw; i++)
y[i]=y[i]*x[i]; // element wise multiplication

fft(y,n,-1); // backward transform of y[]

for (i=0; i<pw; i++)
y[i]=y[i]/complex(pw,0); // normalize
}

That brings us to an end of our discussion on fast fourier transforms. In a future article I will discuss how we can use the FFT to perform fast multiplication on very large numbers. I hope the article helped you understand how to implement fast fourier transforms in C++.

Posted by amitc at May 28, 2002 10:06 PM

Comments

This article is really good. Learnt a lot and refreshed from this article.

Posted by: Balaji at October 30, 2005 12:27 PM

Post a comment




Remember Me?