I was recently given a programming assignment that required me to generate a bitmap representation of the Julia set for the complex function $f(z) = z^3 - 1$, using Newton’s method, with both serial (C) and GPU (CUDA) implementations.

Phew, that’s a mouthful!

## The Code

While I was working on the assignment I had trouble finding C implementations on the internet to serve as examples, so I’ve decided to share mine on Github. Note that I’m using libbmp for saving my output.

## Implementation Details

For our purposes, Newton’s method means iteratively computing $x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$, which for our complex function $f(z) = z^3 - 1$ (with $f'(z) = 3z^2$) becomes $z_{n+1} = z_n - \frac{z_n^3 - 1}{3z_n^2}$. After several iterations (I’ve found that to be under 20), each starting $z$ should converge to one of three basins of attraction (the cube roots of unity: $1$, $e^\frac{2i\pi}{3}$, and $e^\frac{-2i\pi}{3}$).

To do this requires us to do a little arithmetic on complex numbers. There are a number of open source C/C++ libraries out there that can take care of the details for us, but I found them to be a bit overkill for computing just one function. As such, I’ve written a few very simple helper functions.

```cuda
////////////////////////////////////////////////////////////////////////////////
// Complex Number Helper Functions
////////////////////////////////////////////////////////////////////////////////
__device__ __host__ void complex_add(double a, double b, double c, double d,
                                     double *realOut, double *imgOut)
{
    *realOut = a + c;
    *imgOut = b + d;
}

__device__ __host__ void complex_sub(double a, double b, double c, double d,
                                     double *realOut, double *imgOut)
{
    *realOut = a - c;
    *imgOut = b - d;
}

__device__ __host__ void complex_mul(double a, double b, double c, double d,
                                     double *realOut, double *imgOut)
{
    *realOut = (a * c) - (b * d);
    *imgOut = (b * c) + (a * d);
}

__device__ __host__ void complex_div(double a, double b, double c, double d,
                                     double *realOut, double *imgOut)
{
    *realOut = ((a * c) + (b * d)) / (pow(c, 2) + pow(d, 2));
    *imgOut = ((b * c) - (a * d)) / (pow(c, 2) + pow(d, 2));
}
```

Notice the `__device__` and `__host__` prefixes? Those indicate that these functions are available both on the CPU (host) and the GPU (device).

Since we’re trying to map a complex function to an image, the x coördinate in our image will correspond to the real component of z, and the y coördinate will map to the imaginary component of z. Essentially what we’re going to do is, for each pixel, iteratively evaluate $f(z)$ until we converge on one of our basin attractors, and colour that pixel accordingly. Pretty simple!

Here’s the meaty part:

```c
////////////////////////////////////////////////////////////////////////////////
// CPU Implementation
////////////////////////////////////////////////////////////////////////////////
void cpu_julia(int *matrix)
{
    double newRe, newIm, oldRe, oldIm;
    double z_3_r, z_3_i, z_2_r, z_2_i, inner_r, inner_i;

    double ratio = (double)height / (double)width;

    for (int x = 0; x < width; x++)
        for (int y = 0; y < height; y++) {
            ///////////
            // Set up starting value based on x, y (x = real, y = imaginary).
            ///////////
            newRe = (((double)x / (double)width) - 0.5) * 2 * zoom;
            newIm = ratio * (((double)y / (double)height) - 0.5) * 2 * zoom;

            ///////////
            // Newton's Method. z[n + 1] = z - ((z^3 - 1) / 3z^2)
            ///////////
            for (int i = 0; i < max_iterations; i++) {
                oldRe = newRe;
                oldIm = newIm;

                // Clear everything.
                z_3_r = z_3_i = z_2_r = z_2_i = inner_r = inner_i = 0;

                complex_mul(oldRe, oldIm, oldRe, oldIm, &z_2_r, &z_2_i); // z^2
                complex_mul(z_2_r, z_2_i, oldRe, oldIm, &z_3_r, &z_3_i); // z^3
                z_3_r -= 1; // z^3 - 1

                z_2_r *= 3; // 3z^2
                z_2_i *= 3;

                complex_div(z_3_r, z_3_i, z_2_r, z_2_i, &inner_r, &inner_i); // (z^3 - 1) / 3z^2

                complex_sub(oldRe, oldIm, inner_r, inner_i, &newRe, &newIm); // z - ((z^3 - 1) / 3z^2)

                // If we've mostly converged, break out early. Note fabs():
                // the integer abs() would silently truncate our doubles.
                if (fabs(newRe - oldRe) < epsilon && fabs(newIm - oldIm) < epsilon)
                    break;
            }

            ///////////
            // Figure out which root we've converged to. The roots are 1,
            // -0.5 + 0.86603i, and -0.5 - 0.86603i.
            ///////////
            if (fabs(newRe - 1) < epsilon && fabs(newIm) < epsilon)
                matrix[x * height + y] = 1;
            else if (fabs(newRe + 0.5) < epsilon && fabs(newIm - 0.86603) < epsilon)
                matrix[x * height + y] = 2;
            else if (fabs(newRe + 0.5) < epsilon && fabs(newIm + 0.86603) < epsilon)
                matrix[x * height + y] = 3;
            else
                matrix[x * height + y] = 0;
        }
}
```

## And CUDA?

Well, it turns out we don’t need to make too many changes for our CUDA version! The main iteration loop stays the same – the only difference is that we no longer need the outer X and Y loops. Instead, each CUDA thread maps directly to one pixel in our output matrix.

```cuda
__global__ void gpu_julia(int *matrix, int width, int height,
                          int max_iterations, double zoom, double epsilon)
{
    // Compute global thread id to index global memory.
    // Each thread is one pixel.
    int threadID = (blockIdx.x * blockDim.x) + threadIdx.x;

    double newRe, newIm, oldRe, oldIm;
    double z_3_r, z_3_i, z_2_r, z_2_i, inner_r, inner_i;

    // Guard to make sure we're not writing to memory we don't own.
    if (threadID < width * height) {
        ///////////
        // Set up starting value based on x, y (x = real, y = imaginary).
        ///////////
        int x = (threadID / height);
        int y = (threadID % height);

        newRe = (((double)x / (double)width) - 0.5) * 2 * zoom;
        newIm = ((double)height / (double)width)
                * (((double)y / (double)height) - 0.5) * 2 * zoom;

        ///////////
        // Newton's Method. z[n + 1] = z - ((z^3 - 1) / 3z^2)
        ///////////
        for (int i = 0; i < max_iterations; i++) {
            ...
```
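On the host side, launching the kernel might look something like the sketch below. The block size, error handling, and variable names here are my own choices, not necessarily what the project uses; `matrix` is assumed to be a host buffer of `width * height` ints.

```cuda
// Hypothetical host-side driver for gpu_julia.
int *d_matrix;
size_t bytes = (size_t)width * height * sizeof(int);
cudaMalloc((void **)&d_matrix, bytes);

// One thread per pixel; round the block count up so every pixel is
// covered (the in-kernel guard discards the excess threads).
int threadsPerBlock = 256;
int blocks = (width * height + threadsPerBlock - 1) / threadsPerBlock;
gpu_julia<<<blocks, threadsPerBlock>>>(d_matrix, width, height,
                                       max_iterations, zoom, epsilon);

cudaDeviceSynchronize();
cudaMemcpy(matrix, d_matrix, bytes, cudaMemcpyDeviceToHost);
cudaFree(d_matrix);
```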

## Results


Thanks for reading! Don’t forget to check out the full project on Github, which includes all the boring glue code I’ve omitted in the post.