(Insert standard gosh-I-haven't-blogged-in-a-while blurb here)

(UPDATE: If you get an out-of-video-memory exception when starting the program, find the line of code that says "public const int numNodesToAllocate = 250;" and change 250 to a smaller number.)

I decided to improve my old XNA Mandelbrot set explorer to take advantage of the GPU horsepower available on PCs and the Xbox. It was a fair bit of work, but the end result is much more interactive and fun to use than the old version.

The core Mandelbrot set algorithm is a natural fit for a pixel shader. The harder part is making it incremental. Once you're zoomed in a fair bit, it can take hundreds or thousands of iterations to determine if a point is in the set or not, and it's more fun if you can zoom and pan around while progressively improving the image. So you want a program that can iterate a little, then move and draw, then iterate a little more. That is a little off the beaten path of the standard rendering pipeline.

I decided to use render-to-texture and ping-pong between two 256x256 textures. So for the first frame, the shader reads from texture A, computes some iterations, and writes to texture B. Then texture B is rendered onscreen. Then in the next frame, the shader reads from texture B, computes some further iterations, and writes to texture A. Then texture A is rendered onscreen. Then we repeat the whole process. (Life would be easier if we could read to and write to the same texture, but Direct3D doesn't work that way).

But instead of there just being one texture A and one texture B, there are a whole lot of them. The two-dimensional space is subdivided with a quadtree, and each node in the quadtree has a "texture A" and a "texture B." The program both iterates on and draws all visible nodes, every frame.

Also, what I'm calling "texture A" for one node is actually more than one Direct3D texture. I need to maintain four pieces of state for each texel: the current calculated "x", the current calculated "y", the iteration count, and whether the point is "done." Ideally I'd use floats for "x" and "y", a uint for "iterations", and a boolean for "done." SurfaceFormat.Vector4 would have been sufficient (albeit a bit wasteful), but the Xbox doesn't support it. I decided to use three SurfaceFormat.Single textures, and encode "done" into the iterations count by negating the value if the texel is done.

I had never tried using multiple render targets (MRT) before, but it worked fine. Instead of having the pixel shader output a float4, it needs to output a structure with values for each render target:

// MRT output. Even though each RT only has a red channel,
// we have to write out a four-component vector to each RT.
struct PS_OUTPUT
{
    float4 x : COLOR0;
    float4 y : COLOR1;
    float4 iterations : COLOR2;
};

The shader also needs to read from three textures, but that's just a matter of using three samplers, with a different texture bound to each sampler:

    float4 colorData = tex2D(dataSamp, pos);
    float4 colorData1 = tex2D(dataSamp1, pos);
    float4 colorData2 = tex2D(dataSamp2, pos);
    float x = colorData.r;
    float y = colorData1.r;
    float iterations = colorData2.r;
    bool done = false;
    if( iterations < 0 )
    {
        done = true;
        iterations = -iterations;
    }

    (Not shown here: update x, y, iterations, done)

    if( done )
        iterations = -iterations; 
    PS_OUTPUT output;
    output.x = float4(x,0,0,0);
    output.y = float4(y,0,0,0);
    output.iterations = float4(iterations,0,0,0);
    return output;

Okay, so each node consists of six 256x256 32 bpp textures, or 1.5MB of texture memory. How many of these nodes do we need around, to be able to view a frame at full screen resolution? At 1280x720, there are generally about 50 to 100 nodes recalculated per frame. Since we want to zoom and pan around, it's helpful to have a few extra nodes. So I allocate a pool of 250 nodes up front when the program starts, and purge least-recently-used nodes when I need fresh ones for newly-exposed areas of the screen. So, yeah, that's 375MB of textures altogether, and 75-150MB of that is read/written per frame. That's a lot of texture memory to eat up, especially on an Xbox, but it works.

There are actually three key pixel shaders. One is used to set the initial values for x, y, and iterations/done when a node is initialized. A second shader does the iterations. A third shader just reads the iterations/done info and translates it into a color for display.

You can use the "B" button to tell the "CalcCore" pixel shader to perform 1, 4, or 16 iterations per pixel per frame. More iterations makes a given node calculate faster, and is nice if you're not panning/zooming, but the frame rate suffers a bit. As with previous versions of Wendybrot, you can use the right stick to play around with the color palette.

The attached zip file contains all the source code, plus prebuilt ccgame packages for both Xbox and Windows. You can undefine "USE_RTT" in the source code to make Wendybrot use the old method of performing the iterations on the CPU instead of with the pixel shader -- but it's much slower! Also, you can try rendering at 1920x1080, but the frame rate struggles a bit here. Still, it's staggering to think about how much math and memory transfer is happening per frame.

I hope you enjoy playing with the program and learn a few things from the code.

-Mike