 When I was doing my undergrad I took a course that encouraged students to explore various browser technologies (such as Javascript, canvas, etc) in their assignments. For one of these assignments I thought it would be fun to re-implement an interactive Javascript version of the swarming/biod simulation I had previously written in Java.

You can play around with a live demo of the code, or check out the source on Github. The biods themselves are colour-coded based on which force/rule currently dominates their heading:

• Red: Claustrophobia (biods don’t like getting too close to one another, and will try to fan out a bit).
• Green: Cohesion (biods will, in general attempt to move in the same average direction as their peers).
• Blue: An repulsor/attractor is pulling/pushing on the biod.

The little purple guys in the bottom-left are a couple of predators. They’re programmed to chase and eat the biods (they go after the closest one in their field of vision), are faster than the biods (but have a slower turning radius) and are generally pretty fun to watch. They have a repulsor attached to them that causes our biods to flee when a predator gets close.

Have fun!

I was recently given a programming assignment that required me to generate a bitmap representation of the Julia set for the complex function $f(z) = z^3 - 1$, using Newton’s method, with both serial (C) and GPU (CUDA) implementations.

Phew, that’s�a mouthful!

## The Code

While I was working on the assignment I had trouble finding C implementations to serve as examples on the internet, so I’ve decided to share mine on Github. Note that I’m using libbmp for saving my output.

## Implementation Details

For our purposes, Newton’s method essentially means that by iteratively computing $f(x_{+1}) = f(x) - \frac{f(x)}{f$, such that we obtain the complex function $f(z_{+1}) = \frac{z^3 - 1}{3z^2}$. After a several iterations (I’ve found that to be under 20) each $f(z)$ should converge to one of three basins of attraction (the cubic roots of unity: $1$, $e^\frac{2i\pi}{3}$, and $e^\frac{-2i\pi}{3}$).

To do this requires us to do a little arithmetic on complex numbers. There are a number of open source C/C++ libraries out there that can take care of the details for us, but I found them to be a bit overkill for computing just one function. As such, I’ve written a few very simple helper functions.

//////////////////////////////////////////////////////////////////////////////// // Complex Number Helper Functions //////////////////////////////////////////////////////////////////////////////// __device__ __host__ void complex_add(double a, double b, double c, double d, double *realOut,  double *imgOut) { *realOut = a + c; *imgOut = b + d; }   __device__ __host__ void complex_sub(double a, double b, double c, double d, double *realOut, double *imgOut) { *realOut = a - c; *imgOut = b - d; }   __device__ __host__ void complex_mul(double a, double b, double c, double d, double *realOut, double *imgOut) { *realOut = (a * c) - (b * d); *imgOut = (b * c) + (a * d); }   __device__ __host__ void complex_div(double a, double b, double c, double d, double *realOut, double *imgOut) { *realOut = ((a * c) + (b * d)) / (pow(c, 2) + pow(d, 2)); *imgOut = ((b * c) - (a * d))/ (pow(c, 2) + pow(d, 2)); }

Notice the __device__ and __host__ prefixes? Those indicate that these functions are available both on the cpu (host) and the gpu (device).

Since we’re trying to map a complex function to an image, the x coördinate in our image will correspond to the real component of z, and the y coördinate will map to the imaginary component of z. Essentially what we’re going to do is, for each pixel, iteratively evaluate $f(z)$ until we converge on one of our basin attractors, and colour that pixel accordingly. Pretty simple!

Here’s the meaty part:

//////////////////////////////////////////////////////////////////////////////// // CPU Implementation //////////////////////////////////////////////////////////////////////////////// void cpu_julia(int *matrix) { double newRe, newIm, oldRe, oldIm; double z_3_r, z_3_i, z_2_r, z_2_i, inner_r, inner_i;   double ratio = (double)height / (double)width;   for(int x = 0; x < width; x++) for(int y = 0; y < height; y++) { /////////// // Set up starting value based on x, y (x = real, y = imaginary). /////////// newRe = (((double)x / (double)width) - 0.5) * 2 * zoom; newIm = ratio * (((double)y / (double)height) - 0.5) * 2 * zoom;   /////////// // Newton's Method. z[+ 1] = z - ((z^3 - 1) / 3z^2) /////////// for(int i = 0; i < max_iterations; i++) { oldRe = newRe; oldIm = newIm;   //Clear everything. z_3_r = z_3_i = z_2_r = z_2_i = inner_r = inner_i = 0;   complex_mul(oldRe, oldIm, oldRe, oldIm, &z_2_r, &z_2_i); // z^2 complex_mul(z_2_r, z_2_i, oldRe, oldIm, &z_3_r, &z_3_i); // z^3 z_3_r -= 1; //z^3 - 1   z_2_r *= 3; // 3z^2 z_2_i *= 3;   complex_div(z_3_r, z_3_i, z_2_r, z_2_i, &inner_r, &inner_i); // ((z^3 - 1) / 3z^2)   complex_sub(oldRe, oldIm, inner_r, inner_i, &newRe, &newIm); //z - ((z^3 - 1) / 3z^2)   //If we've mostly converged, break out early. if (abs(newRe - oldRe) < epsilon && abs(newIm - oldIm) < epsilon) break; }   /////////// // Figure out which root we've converged to. /////////// if (abs(1 - newRe) < epsilon && abs(0 - newIm) < epsilon) matrix[x * height + y] = 1; else if (newRe - 0.5 < epsilon && 0.86603 - newIm < epsilon) matrix[x * height + y] = 2; else if (newRe - 0.5 < epsilon && newIm - 0.86603 < epsilon) matrix[x * height + y] = 3; else matrix[x * height + y] = 0; } }

## And Cuda?

Well, it turns out we don’t need to make too many changes for our Cuda version! The main iteration loop is going to stay the same – the only difference is that we no longer need the outer X and Y loops. Instead, each Cuda threads is going to map directly to one pixel in our output matrix.

__global__ void gpu_julia(int *matrix, int width, int height, int max_iterations, double zoom, double epsilon) { //Compute global thread id to index global memory. //Each thread is one pixel. int threadID = (blockIdx.x * blockDim.x) + threadIdx.x;   double newRe, newIm, oldRe, oldIm; double z_3_r, z_3_i, z_2_r, z_2_i, inner_r, inner_i;   //Guard to make sure we're not writing to memory we don't own. if (threadID < width * height) { /////////// // Set up starting value based on x, y (x = real, y = imaginary). /////////// int x = (threadID / height); int y = (threadID % height);   newRe = (((double)x / (double)width) - 0.5) * 2 * zoom; newIm = ((double)height / (double)width) * (((double)y / (double)height) - 0.5) * 2 * zoom;   /////////// // Newton's Method. z[+ 1] = z - ((z^3 - 1) / 3z^2) /////////// for(int i = 0; i < max_iterations; i++) {...

## Results Click for a larger version.

Thanks for reading! Don’t forget to check out the full project on Github, which includes all the boring glue code I’ve omitted in the post.

I’ve found a pretty nifty plugin to do syntax highlighting in WordPress – and because it’s Ajax/jQuery, colourization gets done after the fact so that those of you who run without javascript will still get to see some (ugly) code.

Giawa recommended a WordPress plugin for syntax highlighting that seems to work pretty well (previously I used jQuery.Syntax, but it seemed to have issues with line numbers in IE/FF).

The plugin is called wp-syntax.

using System; using System.Collections.Generic; using System.Linq; using System.Text;   namespace HelloWorld { class Program { static void Main(string[] args) { Console.WriteLine("Hello World!"); } } }

The Hough transform is a powerful feature recognition algorithm that was originally designed (by Paul Hough of IBM) to extract lines from images. In this post, I’m going to describe a very simple modification that allows for the extraction of not lines, but circles.

Note: I will not discuss the theory behind the Hough transform, as both Wikipedia and PlanetMath have excellent in-depth descriptions. Instead I’m going to simply provide a quick and dirty implementation for circle detection.

The most important aspect of this algorithm is that the radius of the circle you are looking to detect needs to be known before-hand. While there is a little wiggle room, the algorithm is about as precise as your guess is close to the radius of the circle you are trying to detect. In my case I wanted to detect a small high-density disc in a series of phantom mammograms (shown below), which is more or less always going to be about 1 inch in diameter. Low Contrast Phantom Mammogram

## Step 1: Noise Suppression

The crux of the algorithm is going to rely upon detecting the silhouette of the circle (each pixel that belongs to its edge counts as a “vote”, to use the academic lingo). Most edge-detecting algorithms tend to be sensitive to noise – especially the simpler-to-implement ones, so some blurring is required. While it is perfectly legitimate to use a naive box or gaussian filter, my suggestion is to use something like the median filter which suppresses noise whilst preserving edges. Median Filter on Low Contrast Phantom Mammogram Median Filter on Noisy Phantom Mammogram

## Step 2: Edge Detection

In order to extract features from our image, we need to obtain feature sillouhettes. For best results, the type of edge detector to use depends on the composition of the image you’re working with. For the phantom mammograms I use a slightly modified Sobel operator. The resulting edge map should be a binary image – either a pixel belongs to the silhouette or it doesn’t. Sobel Filter on Low Contrast Phantom Mammogram Sobel Filter on Noisy Phantom Mammogram

## Step 3: Build Hough Map

The idea here is to count the number of pixels that belong to each edge in the image. If we do things right, the feature with the highest number of votes (pixels) is the bounding edge of our circle, allowing us to calculate a center position. However, with the edge map we calculated in the previous step we have no information as to which pixels belong to what feature. To remedy this we’ll take advantage of one very nice property of a circle – it is a completely uniform shape. Because of this, if you were to draw circles all around the perimeter of another circle of equal radius, the point of greatest overlap would be the very center of that circle. Very convenient! To accomplish this, at each non-zero pixel in our edge map we additively plot a circle (this is why we needed to know the radius beforehand!) centered over that pixel in our Hough map. Hough Transform on Low Contrast Phantom Mammogram Hough Transform on Noisy Phantom Mammogram

## Step 4: Detect the Origin

Extracting the origin of our detected circle from the Hough map is a relatively simple process – scan for the pixel of highest intensity, and you’re done! However due to factors like noise, the Hough map is unlikely to be clean enough for such an approach. In my instance I ended up searching for the pixel of highest intensity and region-filling at that value (using a very simple blob fill), and taking the center point of that region as my circle origin.

## Results

So far, I’ve been very happy with this algorithm in my current implementation. An image has to be very, very noisy in order for things to be tripped up. The algorithm also runs fairly quickly (the median filter is the greatest bottleneck), but optimization of convolution filters is a good topic for another day!

Flocking algorithms are a very interesting subset of n-body simulations that allow for the modeling of fairly complex behaviour (such as the swarming of moths around a light) using a very simple set of base rules. Chris Reynolds developed the first computer simulation back in 1986, outlining the 3 basic rules that govern the behaviour of individual “Biods” .

During the last winter semester at school I was developing a graphical demo to showcase some research I had done for a directed studies, and stumbled across Reynolds’ work while I was looking for a way to create a swarm of fireflies. Thanks to the very simple nature of biod interaction, (you can think of Reynolds’ three rules as being the basic instincts for flies) implementation only took a couple of days. While my graphical demo ended up going in a different direction, meaning I didn’t get to use the flocking code I’d written, I did manage to put together a fun little java demo.

## Basic Implementation

Each biod is an autonomous object that gets simulated independently (think of a particle in a standard n-body simulator). Updates are done in 2 passes – the first pass applies Reynolds’ three rules:

1. Separation: Biods will steer to avoid one-another.
2. Alignment: Biods will steer to match the same general heading as their neighbors.
3. Cohesion: Biods will steer to move towards the center position of their neighbors.

Implementation of the first pass is pretty straight forward. First, each biod scans each other biod in the simulation and builds a list of neighbours (those biods which are both within a certain “visibility” range, and fit within a front-facing cone that mimics field-of-view). Next a vector is calculated for each of the three rules, with separation modeled like a spring so as to scale with distance to surrounding elements. Finally the three vectors are summed and normalized, providing a new heading for that biod.

The second pass iterates through each biod and advances them forward along the heading vector that was calculated in the first pass.

## Repulsers and Attractors

Having a bunch of biods swarm about aimlessly in a cloud is kind of cool, but in order to make things more visually exciting I decided to give my biods some direction. I extended them to incorporate a fourth force, an “outside influence” force if you will, that simply gets added in during normalization with the other three forces. Prior to running the update passes on the biods, I iterate through each attractor/repulser, having each apply an exponential force to all biods based on distance (think the force of gravity between two entities), which is accumulated in the new fourth force vector of each biod. For fun, I added several different kinds of methods to model the interaction between attractors/repulsers, such as linear, logarithmic, and sinusoidal (weird eh? Creates a very cool pulsing-wave effect). For kicks I also threw in support for walls, so I could force my flock through a maze.

## Predators

Finally, to wrap everything up, I added one final feature. Predators. No point in having a heard of gazelle without a lion to hunt them, eh? Predators are just repulsers that tend to move towards the center position of all visible biods. This creates a nice chasing effect as the predator moves in towards the bulk of the flock, and the flock disperses to avoid the predator.

## Improvements

For the most part I figure my flocking simulation is pretty complete – it has all the basics implemented, looks pretty cool, and even has an extra feature or two. However it is highly unoptimized (about 4000 biods is the maximum my poor MacbookPro can handle) – everything is done in a single thread, I’m using the basic built-in Java drawing routines, and there’s a nasty Thread.sleep() in the main loop to maintain a semi-constant framerate. If I end up getting back to this and improving it any, I think I’d like to take a look at jCuda (Java bindings for Nvidia’s CUDA library) and see what kind of a benefit massively-parallelizing the simulation on the GPU brings.