Welcome to MSDN Blogs Sign in | Join | Help

Comments on 2D vs. 3D

Some great comments on my previous post.

A popular theory is that our affinity for 2D could be an artifact of the displays we are using. JeBus says: "If we had holographic technology, where the game objects actually appeared around the player in a 3-dimensional space, we'd be much better at it in-game."

But people were creating artwork and visualizing mathematics in 2D long before the invention of the computer screen! Archeologists believe cave paintings and sculptures appeared at roughly the same time. Since then, our species has devoted far more effort to creating 2D images than to carving in 3D.

I find it interesting that early paintings were not 2D projections of a 3D world, but a true 2D geometry where the third axis is simply non-existent. The idea of projecting 3D space with a vanishing point was not discovered until much later.

Beringela suggests what I'm going to call the Ender's Game theory: "We have evolved on what is essentially a 2D plane. When our ancestors hunted, they really only needed to think in 2D - how far away is that deer and what direction is it going. Its height above the ground didn't really matter."

This makes a lot of sense to me. Sure, reality is 3D, but for a species living on the surface of a planet with gravity, one axis is fundamentally different from the other two.

If this was the entire story, I would expect us to be happy to discard the vertical axis when playing a top-down game, but less comfortable with 2D sidescrollers where we keep the not-important-for-hunting-deer vertical axis while collapsing our usual north/south/east/west coordinate system into just a single left/right dimension. Yet most people seem equally at home with top-down or side-on 2D views.

DMG suggests our brains may simplify geometric problems by choosing between a variety of different projections: "We've evolved to naturally project 3D problems into 2D models. Need to find your way home to the cave? Refer to your mental 2D map. Need to hit a mammoth with a spear? Consider the vertical plane joining you and the mammoth."

That certainly matches my experience of using simplified coordinate systems for game AI.

Why do 2D worlds make sense?

It makes sense that our brains have evolved to be better at calculus than probability. This is a survival trait: calculus is necessary any time you want to kill a rabbit with a sling, but you can get by with only crude estimates of probability (see tiger, run: no need to bother computing the exact probability it will try to eat you :-)

What makes less sense to me is why our brains are so much better at doing math and physics in 2D than 3D.

Sure, 2D is simpler because there are fewer numbers involved, but this is more than just reducing vectors from three to two components. There is something fundamentally mind-warpingly hard about trying to visualize 3D math, where in 2D we would just sketch the problem on the back of an envelope, or construct an imaginary sketch in our mind.

Reality is 3D, or at least does a good job of appearing that way, theoretical physics notwithstanding. So how come my brain can easily remember, compare, and reason about many complex 2D scenarios, yet struggles with even simple 3D? Wouldn't it have been more useful if we had evolved a better ability to process the world the way it truly is?

It seems to me that this quirk of our mental geometry processing explains the everlasting popularity of 2D games. If you stop to think about it, the idea of flattening realistic physics onto a 2D plane is somewhat ridiculous! As babies, we learned the rules of inertia, friction, and gravity, but we have only ever experienced these things in a 3D universe. Yet when we pick up the controls of a 2D platformer, we instantly recognize when physics is working as it should, thinking "yeah, that jump felt nice and realistic", without even noticing an entire axis has been removed!

Brains are weird.

Shawn is a twit

For a guy who spends his life working with, thinking about, and talking about technology, I'm actually a bit of a luddite.

Can you believe I didn't own a cellphone until 2006?

My friends make fun of me because my car has no AC, and you have to physically wind a handle to open the windows.

And I'm one of the few people left who still buys music on physical CDs. MP3 player? What's one of them then?

But, bowing to the power of 2010, I am now on Twitter. Follow me @ShawnHargreaves

Angles, integers, and modulo arithmetic

These days most people represent angles using degrees or radians, but many old-skool game programmers preferred a binary integer format where a full circle is 65536. To convert:

    short ToBinary(float degrees)
    {
        return (short)(degrees * 65536 / 360);
    }

If you store such a value in a 16 bit integer type, it will automatically overflow any time you increment it past a full circle, so you no longer need special case wrapping code to handle the circular (modulo) nature of angle arithmetic.

When manipulating degrees using floats, you might have to write:

    angle += turnAmount;

    if (angle >= 360)
        angle -= 360;
    else if (angle < 0)
        angle += 360;

But with a binary angle represented as a short, you can get the same result with just:

    angle += turnAmount;

This simplifies many common angle computations. For instance the TurnToFace method from the Aiming sample would no longer need to bother calling WrapAngle if it used this binary format.

Back in the day most games used integer or fixed point math, so this was a major win. But today most people use floating point, and converting between integer and float formats is awkward and slow. So it is debatable whether this format is still useful.

MotoGP: AI coordinate systems

I occasionally get requests to write about game AI, especially the AI from MotoGP. I have resisted this topic for the simple reason that I never worked directly on AI code, so I don't really know much about it. But hey, this is the Internet, right? You don't have to know anything about a subject in order to write about it on the Internet :-)

Disclaimer: what follows is mostly hearsay, filtered through a couple years of forgetfulness.

My previous post reminded me how many tasks can be simplified by choosing the right coordinate system. Anyone who works with 3D graphics will be familiar with the concept of moving back and forth between different coordinate spaces, common examples being object space, world space, view space, projection space, bone space, and tangent space. Many AI problems can also benefit from their own specialized coordinate systems.

A common simplification is to collapse 3D into 2D. Even though rendering and physics may be truly 3D, decision making logic need not treat all three axes equally. MotoGP tracks have few hills, so our AI was able to ignore the y component.

Next, we switched from x/z cartesian coordinates to a track-relative system. Positions were represented by a pair of values:

  • int distance = how far around the track, stored in 16.16 fixed point format 
    • 0 = starting line
    • 0x8000 = half way around
    • 0x10000 = looped back to the start
    • 0x1C000 = three quarters of the way through the second lap

  • float cross = how far sideways across the track
    • 0 = on the center line
    • -1 = left edge of racing surface
    • 1 = right edge of racing surface

To convert between this and the cartesian coordinates used by our physics and rendering code, we stored a list of segments defining the shape of the racing surface:

    struct TrackSegment
    {
        Vector CenterPoint;
        float DistanceToLeftEdge;
        float DistanceToRightEdge;
    }

We created several hundred of these structures, spaced evenly around the track, by tessellating the Bezier curves from which the tracks were originally created. This gave us enough information to write the necessary coordinate conversion functions.

With track-relative coordinates, many useful calculations become trivially simple:

    if (abs(cross) > 1)
        // You are off the track and should steer back toward the center line


    if (this.distance > other.distance)
        // You are ahead of the other player (even though you may be
        // physically behind in 3D space if you have lapped them)


    short difference = (short)(this.distance - other.distance);

    if (abs(difference) < threshold)
        // These two bikes are physically close together,
        // so we should run obstacle avoidance checks

Because of the fixed point data format, casting the distance counter from 32 to 16 bits was an easy way to discard the lap number, so we could pick and choose which computations cared if two bikes were on different laps, versus wanting to know if they were close in physical space. Thanks to the magic of two's compliment, treating the difference as signed 16 bit gives the shortest distance regardless of which bike is in front (remember that in a modulo arithmetic system such as a looping racetrack there are two possible distances, as you can measure in either direction around the track). This works even when the two bikes are on opposite sides of the starting line, a situation which would require error prone special case logic in most other coordinate systems.

Flattening and straightening out this virtual gameplay area made it easy to reason about things like "am I on the racing line?" or "I'm coming up fast behind this other bike: do I have more room to pass them on the left or right?" which would have been tricky to implement in a full 3D world space. Once we decided to pass on the left, we would convert the resulting track-relative coordinate back into world space, at which point the curvature of the track gets taken into account, showing how we should steer to accomplish our chosen goal.

Pathfinding can also benefit from specialized coordinate systems, the London underground map being a classic example.

Bug or feature?

Writing about randomness reminded me of an interesting bug in the first commercial game I ever released. Extreme G was a futuristic racer for the Nintendo 64. Each vehicle had a limited number of turbo boosts, which increased your speed as long as you drove cleanly, but cut out as soon as you clipped the edge of the track. It was impossible to corner cleanly at turbo speed, so the best tactic was to save your boosts for the longest straight section, while trying to knock boosting opponents off the track.

Ash (the lead programmer) wrote some AI code that decided when the computer players should use their turbo. They would boost when they reached a straight piece of track, and also (for variety) at occasional randomly chosen intervals.

Shortly after the game came out, we read a review that went something like:

"We especially loved the aggressive AI, which does everything in its power to stop you overtaking. If you pass a computer player, they will fight back by firing their turbo, even midway through a turn where that is sure to cause a massive pileup. Perhaps not the best winning strategy, but an awesome f-you attitude!"

"Hah!", quoth Ash, "what a foolish reviewer! That's not how it works. I bet the random number generator just happened to trigger its turbo at the same time he was overtaking, and he read too much into that. My actual AI code is nowhere near so subtle."

But then he went back to look at this actual AI code :-)

A universal challenge of racing game design is how to keep the pack of vehicles close together. If they become too widely separated, the player can be left racing on what appears to be an empty track, with the computer players out of sight in front or behind them, which is no fun at all. In Extreme G, this problem was exacerbated by two things:

  • For performance reasons, we could not afford as many computer vehicles as we ideally wanted, so the pack was overly small in the first place.

  • Weapons and turbo boosts made the gameplay less predictable than in a pure racer, tending to spread out the vehicles.

To help keep the pack together, Ash biased the turbo AI based on race position. When a computer player was in front of the human they would rarely use their turbo, but if they were far behind they would fire it in an attempt to catch up.

What he meant to write was:

    if (computerPlayer is behind human)
    {
        ushort difference = human.TrackPos - computerPlayer.TrackPos;

        if (difference > catchUpThreshold)
            FireTurboBoost();
    }

The TrackPos field held a 16 bit counter of how far around the track each vehicle was, normalized so that 0 represented the starting line, 32768 was halfway around, and 65535 was a full lap nearly back to the start. This format was convenient because the rounding rules of integer arithmetic automatically handled the modulo operations necessary when comparing positions from different laps, or from either side of the start line.

Thanks to an order of processing bug, what actually happened was:

    if (computerPlayer is behind human)
    {
        ushort difference = human.TrackPos - computerPlayer.TrackPosFromPreviousFrame;

        if (difference > catchUpThreshold)
            FireTurboBoost();
    }

This worked as intended, except when you overtook an AI vehicle:

  • Computer was in front of human
  • Now human is in front, so this code kicks in
  • Because it erroneously uses the computer position from the previous frame, the position difference comes out negative
  • Because the type is unsigned, this is interpreted as a large positive integer
  • It looks like the computer is a long way behind, so the turbo is triggered

Oops!

But the reviewer was absolutely right. Although accidental, the resulting behavior was good gameplay. We left this bug unchanged in the sequel.

The psychology of randomness

Due to some quirk of evolution, human beings are remarkably good at intuitively approximating solutions to complex calculus problems, but appallingly bad at estimating probability.

Pretty much anyone is able to catch a ball, or judge when it is safe to pull out into traffic. These tasks require complex predictions at least up to the second derivative (involving position, velocity, and acceleration), yet our subconscious can solve them in seconds.

Our intuitions are not so smart when dealing with probability:

  • When asked to pick a random number between 1 and 100, most people will choose a number that is odd, often prime, and approximately 1/3 or 2/3 of the way between the lower and upper limits. For some reason we think these values are "more random" than other numbers. Try it sometime! It is amazing how many people will choose 67, while nobody ever goes for 50.

  • My mother, a smart lady who is generally good at math, is absolutely convinced that if you roll an (unloaded) dice 10 times and get a 6 each time, the odds of getting another 6 on your 11th attempt must be vanishingly small. Indeed, 11 in a row is unlikely, but her intuition simply cannot accept that, having already rolled 10, the odds of an 11th remain 1/6.

  • The shared birthday party trick and Monty Hall problem are famous examples of how unintuitive even simple probability questions can be.

This matters to game programmers, because we often use random numbers to control our game simulations. If I had a penny for every time I heard someone say "I think the random number generator must be broken, because the numbers I get back aren't very random", well, that would be a lot of pennies.

I ran this test program:

    Random random = new Random();

    for (int i = 0; i < 10; i++)
    {
        Trace.WriteLine(random.Next(100));
    }

Results:

    50
    76
    23
    1
    17
    16
    19    // Huh? Three similar numbers in a row
    8
    90
    90    // Yowzer! Same value twice in a row can't be right?

But these numbers are perfectly random. In fact, when you choose ten random numbers, there will often be strings of repeated or similar values. One of the most unlikely outcomes would be if the results were evenly distributed without any clustering! When we hear "random", we tend to think "evenly distributed, with a slight random perturbation over the top". These are not the same thing.

Our brains are tremendously good at spotting subtle patterns in seemingly random noise. Perhaps too good. We are so focused on trying to identify a pattern that our subconscious fixates on even the slightest deviation from an even distribution. A truly random series of numbers will never be evenly spaced, but when you combine a talent for identifying patterns with a poor understanding of probability, it's no wonder our poor little brains get confused :-)

Try this experiment some time: pick any movie other than Wizard of Oz, any music other than Dark Side of the Moon (preferably things you know well), and play them at the same time. Isn't it amazing how well they fit together? This occurs because our pattern detection skills focus on the handful of places where things do randomly happen to line up, and we are too bad at probability to contrast this with the much larger number of things that don't line up at all.

As game developers, we often use random number generators as an approximation for things that are not truly random at all, but which are too complex for us to properly evaluate.  We could simulate vegetation growth patterns, soil erosion, light and shade, the impacts of logging and lightning fires and deer grazing and the wind shadow of that distant mountain range. Nah. Let's just randomly scatter our tree sprites over the hillside...

The trick to getting good results from random numbers is to think about what characteristics we would like the resulting data to have, and understand that a series of truly independent random numbers is rarely the best distribution.

When planting trees, rather than just repeating a random position selection, we could cover our terrain with a regular grid. Perhaps we could perturb the grid on a coarse level, making some areas have smaller and denser cells than others. We can then plant one tree per cell, including a small random offset to hide the repeating pattern, and perhaps randomly skipping some cells while planting more than one tree in others. The resulting distribution will be far from random in the mathematical sense, but will look more random to the average human being, and that's what counts, right?

Likewise, a random song shuffle algorithm should avoid repeating the same song too close together. Sure, playing it twice in a row is a valid random result, but probably not what the user wanted!

Randomness in games is usually an approximation for something more complex, and not independent samples like you get when tossing a coin, rolling a dice, or calling Random.Next(). By understanding this, we can process our random numbers to produce results that are less random, yet more convincing and artistically pleasing.

Algorithm versus implementation

When I first started making games, most things were written in C, with critical pieces optimized in assembly language. A skilled assembly programmer could beat the C compilers of the day by a factor of two or more, so this was an important optimization (for instance the blit routines in Allegro were all written in assembly).

But assembly language could be a dangerous pitfall. Rewriting a function in assembly language changed the details of the way it worked (the how, or implementation), but did not fundamentally alter its purpose (the what, or algorithm). Doing the same basic thing in a more efficient way could give a 2x, 3x, or even 4x speed boost, but larger gains could only be achieved by switching to a smarter algorithm.

The danger of assembly language was that it made code faster, but also more complex and harder to understand. By obscuring the underlying algorithm, this 2x optimization often hid the potential for much more significant algorithmic improvements.

I was reminded of this by a recent thread on the creators.xna.com forums, where a word search algorithm was taking more than half a second. Early suggestions focused on optimizing the implementation:

  • Use StringBuilder to reduce garbage
  • Pack short text strings into 64 bit integers to speed comparison
  • Use threads to parallelize the work across multiple CPU cores

These techniques are the C# equivalent of rewriting something in assembly language. They did indeed work, reducing the half second search time to 80 milliseconds (a 6x gain), but there is a fairly low upper bound on how much such things can help. For instance on a quad core machine, threaded code can run at most 4x faster than a single core version (in practice the gain is usually far less).

Then Jon Watte suggested a better algorithm, using a tree structure to replace the original O(N!) (factorial) algorithm with a log(N) version.

Result? The search now completes in less than 1 millisecond. That's more than a 500x improvement!

Just goes to show that, while languages and programming techniques continue to evolve, there are still few things as important as the fundamentals of algorithms and data structures.

Premultiplied alpha content processor

As mentioned in my previous post, a Content Processor that converts textures into premultiplied alpha format:

using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Content.Pipeline;
using Microsoft.Xna.Framework.Content.Pipeline.Graphics;
using Microsoft.Xna.Framework.Content.Pipeline.Processors;

namespace PremultipliedAlpha
{
    [ContentProcessor]
    class PremultipliedAlphaTextureProcessor : TextureProcessor
    {
        public override TextureContent Process(TextureContent input, ContentProcessorContext context)
        {
            input.ConvertBitmapType(typeof(PixelBitmapContent<Color>));

            foreach (MipmapChain mipChain in input.Faces)
            {
                foreach (PixelBitmapContent<Color> bitmap in mipChain)
                {
                    for (int y = 0; y < bitmap.Height; y++)
                    {
                        for (int x = 0; x < bitmap.Width; x++)
                        {
                            Color c = bitmap.GetPixel(x, y);

                            c.R = (byte)(c.R * c.A / 255);
                            c.G = (byte)(c.G * c.A / 255);
                            c.B = (byte)(c.B * c.A / 255);

                            bitmap.SetPixel(x, y, c);
                        }
                    }
                }
            }

            return base.Process(input, context);
        }
    }
}

Premultiplied alpha in XNA Game Studio

It is possible to use premultiplied alpha with XNA Game Studio, but the bad news is we don't do much to help you with it.

Why not?

Yeah.  Our bad.  In fact one of my biggest regrets about the design of the XNA Framework is that we didn't do more to make this easier!

To use premultiplied alpha, you must do three things:

 

1 – Set The Blend State

Premultiplied alpha blending is configured like this:

    graphicsDevice.RenderState.AlphaBlendEnable = true;
    graphicsDevice.RenderState.SourceBlend = Blend.One; 
    graphicsDevice.RenderState.DestinationBlend = Blend.InverseSourceAlpha; 

Or if you are using SpriteBatch:

    spriteBatch.Begin(SpriteBlendMode.AlphaBlend, SpriteSortMode.Immediate, SaveStateMode.None);
    graphicsDevice.RenderState.SourceBlend = Blend.One; 

 

2 – Premultiply Your Colors

Premultiplied blending only works if all your color values are in premultiplied format. But most paint programs and image file formats do not use premultiplied alpha! So we must find a good place to apply this conversion.

For most games, both 2D and 3D, the flow of color data goes something like this:

Untitled

  • Color values are edited in a paint program, then saved into a .bmp or .png file, which is built into .xnb format by the Content Pipeline, then loaded into a Texture2D object.

  • Other colors may be used to tint the texture. These can be specified as part of the vertex data, or passed as the color parameter to SpriteBatch.Draw.

  • Still more colors may be computed on the fly (eg. realtime lighting).

  • All these colors are passed into the shader, which could do all kinds of interesting things with them, but most often just multiplies them together.

  • The shader produces a final combined color, which is passed to the alpha blending operation.

There are several options for where in this process we choose to convert from conventional to premultiplied format. Two in particular I think can be sensible depending on the situation:

Convert colors at the end of the pixel shader Convert colors in a custom Content Processor
Add "result.rgb *= result.a" right before the end of all your shaders Write a PremultipliedAlphaTextureProcessor, which automatically converts all your textures to premultiplied format while they are being built
Requires custom pixel shaders Does not require custom shaders, so works with SpriteBatch, BasicEffect, etc.
Premultiplication happens after the shader has processed any tint colors, so tints are still specified in conventional, non-premultiplied format All runtime colors, including tint values, are specified in premultiplied format
Alpha blending is done with premultiplied colors, so image composition works properly, but texture filtering happens before the premultiply conversion, so alpha cutouts remain a problem All rendering uses premultiplied colors, so both image composition and alpha cutouts work nicely
Reasonably easy retrofit to existing code, or even just specific parts of that code, as long as you have custom shaders Affects everywhere you do color math, which may be a lot of places, so it can be a pain to retrofit if you don't plan this from the start

 

3 – Use The Right Math

Any time you do computations on color values, you need to use the right math for the type of colors you are dealing with. Some things to bear in mind:

  • Keep track of which data is in which format. Mixing premultiplied with conventional colors is a recipe for chaos!

  • This distinction is only important for colors that have fractional or zero alpha. Opaque colors are the same in both formats, so nothing changes for computations that use RGB color without alpha.

  • Conventional colors change opacity by leaving RGB alone while decreasing alpha. For instance to draw a half transparent orange sprite we would specify a tint of (255, 128, 0, 128). But with premultiplied colors, we must also decrease the RGB values, so that same tint would be specified as (128, 64, 0, 128).

  • By far the most common color operation is tinting, which is done by multiplying two colors together (for instance this is how SpriteBatch.Draw combines its texture and color parameter). Color multiplication works the same for premultiplied and conventional colors, as long as both operands are in the same format.

  • This means that SpriteBatch works as-is with premultiplied data.

  • BasicEffect diffuse lighting and vertex coloring also use color multiplication, and therefore work correctly with premultiplied data.

  • The BasicEffect specular lighting and fog computations will not work correctly with premultiplied data. If you want specular lighting or fog while using premultiplied alpha, you will have to write a custom shader.

Premultiplied alpha and image composition

From Wikipedia:

"Alpha compositing is the process of combining an image with a background to create the appearance of partial transparency. It is often useful to render image elements in separate passes, and then combine the resulting multiple 2D images into a single, final image in a process called compositing."

 

Example

I am making a Pong game. I start by clearing my background to CornflowerBlue (100, 149, 237). I then draw a translucent grey box (128, 128, 128, 128) at the top of the screen, which will form the background of my score display. With conventional alpha blending:

((128, 128, 128) * 0.5)  +  ((100, 149, 237) * (1 - 0.5))  =  (114, 138, 182)

Well and good. But over time, my overlays get more complex, including a map, radar, waypoint indicators, mission objectives, leaderboard rankings, etc. (what can I say, this is a complex Pong game :-)  To avoid redrawing so much stuff every frame, wouldn't it be cool if I could draw the overlays just once to a rendertarget, then copy the resulting static image over the top of my animating gameplay?

 

A Problem

Unfortunately, this doesn't work using conventional alpha blending:

Clear rendertarget to (0, 0, 0, 0)

Blend 50% grey into the rendertarget:

      ((128, 128, 128, 128) * 0.5)  +  ((0, 0, 0, 0) * (1 - 0.5))  =  (64, 64, 64, 64)

Clear backbuffer to CornflowerBlue (100, 149, 237)

Blend rendertarget over the backbuffer:

      ((64, 64, 64) * 0.25)  +  ((100, 149, 237) * (1 - 0.25))  =  (91, 127, 194)

Whoah! That's not even remotely the same result as before. The blend operation is happening twice: first when we draw into the rendertarget, then again when we draw the rendertarget into the backbuffer. Our alpha value ends up getting squared, so 0.5 becomes 0.25.

We could fix this by removing one of the blend operations:

  • Disable blending while drawing to the rendertarget
  • Or disable blending while drawing the rendertarget over the backbuffer

But these are rarely good solutions, since you most likely need blending in both places  (the overlays in my Pong game are built from many layers of alpha blended graphics, and the gameplay scene should be visible behind them).

 

Some Math

Given a series of drawing operations which blend over the top of each other:

result = a -> b -> c -> d

we would like to be able to group a set of calls from the middle of this sequence, storing their result in a rendertarget, then later replace that set of calls with the contents of the rendertarget:

tmp = b -> c

result = a -> tmp -> d

The problem is that this changes the order of the blend operations. What was:

result = blend(blend(blend(a, b), c), d)

becomes:

result = blend(blend(a, blend(b, c)), d)

Because conventional alpha blending is not associative, changing the order of evaluation changes the result.

 

The Solution

Premultiplied alpha blending is associative, so any number of blending operations can be grouped and reordered without affecting the final result.

To use premultiplied alpha, we make two changes:

  • Convert our source colors into premultiplied format, so translucent grey becomes (64, 64, 64, 128) rather than (128, 128, 128, 128)

  • Change RenderState.SourceBlend from Blend.SourceAlpha to Blend.One

Working through the same example as before:

Clear rendertarget to (0, 0, 0, 0)

Blend 50% grey into the rendertarget:

      (64, 64, 64, 128) +  ((0, 0, 0, 0) * (1 - 0.5))  =  (64, 64, 64, 128)

Clear backbuffer to CornflowerBlue (100, 149, 237)

Blend rendertarget over the backbuffer:

      (64, 64, 64) +  ((100, 149, 237) * (1 - 0.5))  =  (114, 138, 182)

Tada! That's the same result as when we were drawing everything directly to the backbuffer.

Premultiplied alpha r0x0rz.

 

Historical Aside

Alvy Ray Smith writes about the invention of the alpha channel and image composition. I find it interesting that this was invented in the 1970s, and fully grokked as of the classic Porter-Duff paper in 1984, yet here we are a quarter of a century later, still having a hard time making composition work right because for some reason premultiplied alpha never became as widely understood as it deserves to be.

Premultiplied alpha

Remember when you first figured out Santa Claus wasn't real? The growing doubt, tempered by the fact that all your friends believed in him, and surely they can't ALL be wrong, then the gradual realization that everybody was in fact wrong...

Well, I've got another one for you: the way most people do alpha blending is bogus!

At a fundamental level, alpha has no special meaning at all. Graphics cards manipulate numeric values, but what they do with these values is entirely determined by what shader code you write and which render states you set. The hardware works with Vector4 data types. Because these often represent colors, it is natural to use the first three components for red, green, and blue. That leaves a spare component, called alpha, which can be used for absolutely anything we like.

Some people use alpha to store shininess, or ambient occlusion, or a collision material ID. In the MotoGP particle system I used it to pass a per-particle random number into my vertex shader physics. But most often, alpha is used to represent transparency. It is important to understand that this is just a convention, and there are several different ways it can be done.

 

Conventional Alpha Blending

The majority of programmers, programs, programming APIs, file formats, etc, define transparency as:

  • RGB specifies the color of the object
  • Alpha specifies how solid it is

In math:

blend(source, dest)  =  (source.rgb * source.a) + (dest.rgb * (1 - source.a))

In code:

    RenderState.SourceBlend = Blend.SourceAlpha; 
    RenderState.DestinationBlend = Blend.InverseSourceAlpha; 

In this world, RGB and alpha are independent. You can change one without affecting the other. Even when an object is fully transparent it still has the same RGB as if it was opaque. Thus 100% transparency can be represented by many different color values.

There isn't really a direct physical analogy for this. I guess it's similar to how a magic cloak of invisibility works in a fantasy universe, where I can be wearing the same red sweater as I am right now, and it remains red even though it currently happens to be invisible.

 

Premultiplied Alpha Blending

Here's a different way to think about transparency:

  • RGB specifies how much color the object contributes to the scene
  • Alpha specifies how much it obscures whatever is behind it

In math:

blend(source, dest)  =  source.rgb + (dest.rgb * (1 - source.a))

In code:

    RenderState.SourceBlend = Blend.One; 
    RenderState.DestinationBlend = Blend.InverseSourceAlpha; 

In this world, RGB and alpha are linked. To make an object transparent you must reduce both its RGB (to contribute less color) and also its alpha (to obscure less of whatever is behind it). Fully transparent objects no longer have any RGB color, so there is only one value that represents 100% transparency (RGB and alpha all zero).

This is more like how light behaves in the real world. What is the RGB of my car windscreen? None: it is transparent, so has no color. How about my sunglasses? These have a fractional alpha value (letting some light some through, while blocking some) and also contribute some RGB for that nice rose-tinted glow.

To use premultiplied alpha, in addition to setting the appropriate renderstates, you must also convert your source graphics into premultiplied format. Drawing a non premultiplied color with premultiplied blending will not give sensible results!

To convert a non premultiplied color into premultiplied format:

color.rgb *= color.a

(hence why this is called "premultiplied" format)

Look at the blend equations for conventional vs. premultiplied alpha. If you substitute this color format conversion into the premultiplied blend function, you get the conventional blend function, so either way produces the same end result. The difference is that premultiplied alpha applies the (source.rgb * source.a) computation as a preprocess rather than inside the blending hardware.

I will write more about how to convert graphics into premultiplied format in a later post.

 

Why premultiplied r0x0rz

Premultiplied alpha is better than conventional blending for several reasons:

  • It works properly when filtering alpha cutouts (see below)

  • It works properly when doing image composition (stay tuned for my next post)

  • It is a superset of both conventional and additive blending. If you set alpha to zero while RGB is non zero, you get an additive blend. This can be handy for particle systems that want to smoothly transition from additive glowing sparks to dark pieces of soot as the particles age.

  • It plays nice with DXT compression, which only supports transparent pixels with an RGB of zero.

 

Premultiplied alpha cutouts

Remember how filtering can produce ugly fringes around the edges of alpha cutouts? Not a problem when using premultiplied alpha! Revisiting the example from my previous post:

tree  =  (0, 255, 0, 255)

border  =  (0, 0, 0, 0)

background  =  (0, 0, 255)

filtered  =  (tree + border) / 2  =  (0, 128, 0, 128)

With conventional alpha blending, the result is darker than we wanted:

result  =  lerp(background, filtered.rgb, filtered.a)  =  (0, 64, 128)

But premultiplied blending produces the right answer:

result  =  filtered.rgb + (background * filtered.a)  =  (0, 128, 128)

This works because texture filtering and premultiplied blending are both linear transforms, and are therefore associative:

blend(filter(a, b),  c)  ==  filter(blend(a, c),  blend(b, c))

Texture filtering: alpha cutouts

Consider a cutout texture that contains a solid shape surrounded by transparency. Let's say this is a tree, although it could equally well be a cat or an overweight Italian plumber. Our tree is opaque and colored green:

tree  =  (0, 255, 0, 255)

The surrounding pixels are transparent:

border  =  (0, 0, 0, 0)

We are drawing over the top of a blue sky:

background  =  (0, 0, 255)

If we draw without filtering, everything works as expected. Opaque pixels will replace the background, leaving solid green, while the transparent pixels have no effect, leaving solid blue.

 

Problem

What if the texture needs to be filtered? For instance our tree could be positioned in such a way that some destination pixels are covered half by opaque green and half by the transparent border. First the filtering hardware interpolates between these two parts of the texture:

filtered  =  (tree + border) / 2  =  (0, 128, 0, 128)

Now the alpha blending hardware combines this filtered color with our blue background:

result  =  lerp(background, filtered.rgb, filtered.a)  =  (0, 64, 128)

Huh? Halfway between a green tree and blue background should be (0, 128, 128), not (0, 64, 128). The output is darker than we wanted.

It seems logical that if a pixel has zero alpha, its RGB value should be irrelevant, right? Not so when filtering is enabled...

Filtering applies equally to the RGB and alpha channels. When used on the alpha channel of a cutout texture it will produce new fractional alpha values around the edge of the shape, which makes things look nice and antialiased. But filtering also produces new RGB colors, part way in between the RGB of the solid and transparent parts of the texture. Thus the RGB values of supposedly transparent pixels can bleed into our final image.

This most often results in dark borders around alpha cutouts, since the RGB of transparent pixels is often black. Depending on the texture, the bleeding could alternatively be white, pink, etc.

 

Alternative Explanation For The Mathematically Inclined

What we really wanted was:

filter( blend(tree, background),  blend(border, background) )

But instead we got:

blend( filter(tree, border),  background )

Because texture filtering and alpha blending are not associative, these do not produce the same result.

 

Workaround

What if we change our tree texture to have green in the RGB channels of its transparent areas, replacing (0, 0, 0, 0) with (0, 255, 0, 0)? We will still get color bleeding around the edges, but because the transparent RGB now matches the color of the main image, this will not look so ugly.

This workaround can be useful, but is far from perfect:

  • Not every paint program is able to edit the RGB values of transparent pixels.

  • Even if your paint program supports this, beware of codecs that may discard these colors when saving out the image, incorrectly figuring that since these pixels are transparent, their RGB values must be irrelevant (for some reason .png codecs seem especially bad at this).

  • Can't use DXT1 compression, which only supports transparent pixels with an RGB of zero.

If you get bored of having to manually fix-up alpha cutout textures, you could automate it using a custom content processor. This could check your textures as part of the build process, automatically replacing the RGB of transparent pixels with a copy of the RGB from the closest opaque pixel.

 

Solution

The best way to fix this is by using premultiplied alpha.  Stay tuned for my next post...

New Game Studio samples

Hot off the press, this sample provides handy classes for generating cubes, spheres, and teapots (great for debug rendering!) while this one shows how to do various hopefully useful things with skinned character models.

Notice how Skinned Model Extensions reuses code from the Primitives3D sample for drawing the collision spheres? You'd almost think Primitives3D was created because halfway through writing the skinning stuff I realized I had no way to display that data :-)

Texture filtering: mipmapped sprite sheets

Beringela comments on my previous post:

"Why don't spritesheets (with an edge around each sprite) and mipmaps go well together and are there any workarounds other than not using spritesheets or not using mipmaps?"

Let's work through an example, which I shall simplify down to one dimension for clarity. Let's say we have two textures, sized 4x1:

A B C D

and 2x1:

E F

With gutters, we could pack these into a 10x1 sprite sheet:

a A B C D d e E F f

Now let's compute the first mip level:

A (B+C)/2 D E F

Hmm. That does somewhat resemble the original sheet, but it no longer has proper gutters, so filtering will bleed from one sprite to the other.

Real trouble starts with the second mip level:

(A*2 + B + C + D) / 5 (D + E*2 + F*2) / 5

Our second sprite has disappeared entirely! That rightmost pixel includes colors E and F from sprite #2, but also some amount of color D from sprite #1.

And of course the final mip is entirely meaningless:

(A*2 + B + C + D*2 + E*2 + F*2) / 10

There are two ways you can try to make sprite sheets and mipmaps play together:

  • Wishful thinking: include gutters wider than a single pixel, cross your fingers, and hope it turns out ok. Any time you see color bleeding problems, increase the gutter size until they go away.

  • The scientific approach: restrict all sprites in a sheet to be the same size, so you can terminate your mip chain at the point where an individual sprite has reached 1x1. Calculate the gutter size so you will still have an entire pixel gutter at this smallest mip level, with corresponding larger gutters for larger mips. Generate mip images separately for each sprite, rather than for the sheet as a whole, and regenerate the gutter with different colors for each mip. This is a considerable pain, and wastes a lot of memory (you end up with significantly more gutter pixels than actual data!) but it does work.

Personally, I would avoid using mipmaps and sprite sheets at the same time. Sprite sheets are never necessary: they're just an optimization to reduce texture switching. When an optimization ends up causing this amount of trouble, you have to wonder if it's really worth it?

More Posts Next page »
 
Page view tracker