I was recently given a programming assignment that required me to generate a bitmap representation of the Julia set for the complex function $f(z) = z^3 - 1$, using Newton’s method, with both serial (C) and GPU (CUDA) implementations.

Phew, that’s a mouthful!

## The Code

While I was working on the assignment I had trouble finding C implementations on the internet to serve as examples, so I’ve decided to share mine on GitHub. Note that I’m using libbmp for saving my output.

## Implementation Details

For our purposes, Newton’s method essentially means iteratively computing $x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$, which for our function $f(z) = z^3 - 1$ gives us the iteration $z_{n+1} = z_n - \frac{z_n^3 - 1}{3z_n^2}$. After several iterations (I’ve found that to be under 20) each starting point $z$ should converge to one of three basins of attraction (the cube roots of unity: $1$, $e^\frac{2i\pi}{3}$, and $e^\frac{-2i\pi}{3}$).

Doing this requires a little arithmetic on complex numbers. There are a number of open source C/C++ libraries out there that can take care of the details for us, but I found them to be a bit overkill for computing just one function. As such, I’ve written a few very simple helper functions.

```cuda
////////////////////////////////////////////////////////////////////////////////
// Complex Number Helper Functions
////////////////////////////////////////////////////////////////////////////////
__device__ __host__ void complex_add(double a, double b, double c, double d,
                                     double *realOut, double *imgOut)
{
    *realOut = a + c;
    *imgOut  = b + d;
}

__device__ __host__ void complex_sub(double a, double b, double c, double d,
                                     double *realOut, double *imgOut)
{
    *realOut = a - c;
    *imgOut  = b - d;
}

__device__ __host__ void complex_mul(double a, double b, double c, double d,
                                     double *realOut, double *imgOut)
{
    *realOut = (a * c) - (b * d);
    *imgOut  = (b * c) + (a * d);
}

__device__ __host__ void complex_div(double a, double b, double c, double d,
                                     double *realOut, double *imgOut)
{
    *realOut = ((a * c) + (b * d)) / (pow(c, 2) + pow(d, 2));
    *imgOut  = ((b * c) - (a * d)) / (pow(c, 2) + pow(d, 2));
}
```

Notice the `__device__` and `__host__` prefixes? Those indicate that these functions are available on both the CPU (host) and the GPU (device).

Since we’re trying to map a complex function to an image, the x coördinate in our image will correspond to the real component of z, and the y coördinate will map to the imaginary component of z. Essentially what we’re going to do is, for each pixel, iteratively evaluate $f(z)$ until we converge on one of our basin attractors, and colour that pixel accordingly. Pretty simple!

Here’s the meaty part:

```c
////////////////////////////////////////////////////////////////////////////////
// CPU Implementation
////////////////////////////////////////////////////////////////////////////////
void cpu_julia(int *matrix)
{
    double newRe, newIm, oldRe, oldIm;
    double z_3_r, z_3_i, z_2_r, z_2_i, inner_r, inner_i;

    double ratio = (double)height / (double)width;

    for (int x = 0; x < width; x++)
        for (int y = 0; y < height; y++) {
            ///////////
            // Set up starting value based on x, y (x = real, y = imaginary).
            ///////////
            newRe = (((double)x / (double)width) - 0.5) * 2 * zoom;
            newIm = ratio * (((double)y / (double)height) - 0.5) * 2 * zoom;

            ///////////
            // Newton's Method. z[n+1] = z - ((z^3 - 1) / 3z^2)
            ///////////
            for (int i = 0; i < max_iterations; i++) {
                oldRe = newRe;
                oldIm = newIm;

                // Clear everything.
                z_3_r = z_3_i = z_2_r = z_2_i = inner_r = inner_i = 0;

                complex_mul(oldRe, oldIm, oldRe, oldIm, &z_2_r, &z_2_i); // z^2
                complex_mul(z_2_r, z_2_i, oldRe, oldIm, &z_3_r, &z_3_i); // z^3
                z_3_r -= 1; // z^3 - 1

                z_2_r *= 3; // 3z^2
                z_2_i *= 3;

                complex_div(z_3_r, z_3_i, z_2_r, z_2_i, &inner_r, &inner_i); // (z^3 - 1) / 3z^2

                complex_sub(oldRe, oldIm, inner_r, inner_i, &newRe, &newIm); // z - ((z^3 - 1) / 3z^2)

                // If we've mostly converged, break out early.
                if (fabs(newRe - oldRe) < epsilon && fabs(newIm - oldIm) < epsilon)
                    break;
            }

            ///////////
            // Figure out which root we've converged to.
            ///////////
            if (fabs(newRe - 1) < epsilon && fabs(newIm) < epsilon)
                matrix[x * height + y] = 1;
            else if (fabs(newRe + 0.5) < epsilon && fabs(newIm - 0.86603) < epsilon)
                matrix[x * height + y] = 2;
            else if (fabs(newRe + 0.5) < epsilon && fabs(newIm + 0.86603) < epsilon)
                matrix[x * height + y] = 3;
            else
                matrix[x * height + y] = 0;
        }
}
```

## And CUDA?

Well, it turns out we don’t need to make too many changes for our CUDA version! The main iteration loop is going to stay the same – the only difference is that we no longer need the outer x and y loops. Instead, each CUDA thread is going to map directly to one pixel in our output matrix.

```cuda
__global__ void gpu_julia(int *matrix, int width, int height,
                          int max_iterations, double zoom, double epsilon)
{
    // Compute global thread id to index global memory.
    // Each thread is one pixel.
    int threadID = (blockIdx.x * blockDim.x) + threadIdx.x;

    double newRe, newIm, oldRe, oldIm;
    double z_3_r, z_3_i, z_2_r, z_2_i, inner_r, inner_i;

    // Guard to make sure we're not writing to memory we don't own.
    if (threadID < width * height) {
        ///////////
        // Set up starting value based on x, y (x = real, y = imaginary).
        ///////////
        int x = (threadID / height);
        int y = (threadID % height);

        newRe = (((double)x / (double)width) - 0.5) * 2 * zoom;
        newIm = ((double)height / (double)width) * (((double)y / (double)height) - 0.5) * 2 * zoom;

        ///////////
        // Newton's Method. z[n+1] = z - ((z^3 - 1) / 3z^2)
        ///////////
        for (int i = 0; i < max_iterations; i++) {
            ...
```
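For completeness, the host side then allocates the matrix on the device, launches one thread per pixel, and copies the result back. This is a rough sketch (variable names like `d_matrix` and the block size of 256 are my choices, and error checking is omitted); the real glue code is in the repo:

```cuda
// Allocate device memory for the output matrix.
int *d_matrix;
size_t bytes = width * height * sizeof(int);
cudaMalloc(&d_matrix, bytes);

// One thread per pixel, rounded up to whole blocks.
int threadsPerBlock = 256;
int blocks = (width * height + threadsPerBlock - 1) / threadsPerBlock;
gpu_julia<<<blocks, threadsPerBlock>>>(d_matrix, width, height,
                                       max_iterations, zoom, epsilon);

// Copy the finished matrix back to the host and clean up.
cudaMemcpy(matrix, d_matrix, bytes, cudaMemcpyDeviceToHost);
cudaFree(d_matrix);
```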

## Results

Thanks for reading! Don’t forget to check out the full project on GitHub, which includes all the boring glue code I’ve omitted in the post.