If you've ever done work with Web graphics, I'm sure that at some point you reduced an image with thousands of colors, such as a screenshot or photograph, to an image with only 256 colors, for example to store the image in the GIF format. Remarkably, the reduced image is usually very visually similar to the original image, despite the massive loss of information. Most paint programs are capable of performing this transformation, and operating systems such as Windows will perform the transformation on-the-fly when the user is running a video mode with only 256 colors.  But how does it work?

In the literature, this problem is called the color quantization problem -- generally, the word "quantization" refers to any process that maps several values to a single representative value. In this case, the values that we're mapping are colors. There are two main steps to color quantization:

  1. Generate a good palette
  2. Dither

Generate a good palette

First we'll talk about generating a good palette. The simplest approach is to use a fixed palette for all images. We can map pixels in the source image to pixels in the reduced image by simply choosing the closest color. What do we mean by "closest"? Well, we can identify each color with a point in three-dimensional space whose coordinates are the red, green, and blue values of the color.  Then, we consider two colors to be close if their points are close in this three-dimensional space.  This isn't ideal, but it's very simple.

The fixed palette approach has the advantage that all images using the same palette can be rendered on the same screen simultaneously; unfortunately, it severely compromises image quality. Making this even worse is that typical fixed palettes, such as the Windows standard palette and the web colors palette, are chosen as uniform palettes, meaning that the colors are evenly spaced throughout RGB space. This is a terrible way to construct a palette, as humans can distinguish much finer color differences in some areas of RGB space than others.

Below, we show a photograph of a rose taken by Stan Shebs, but modified to remove its blue channel. This means that all of its colors lie in a single plane, the plane blue = 0. That color plane is shown in the graph below.  We also show a 16-color optimized palette found by Adobe Photoshop - each blue point is a palette color, and each region shows the section of the color space mapping to that palette color. As you can see, the palette is far from uniform, and there are many colors that the image doesn't even use.

From this picture, we can see that generating a good palette is essentially a point clustering problem. We want to identify clusters in the color space that contain a large number of points close together and assign a single palette entry to those colors.

One of the most popular algorithms for this problem is called the median cut algorithm. It was invented by Paul Heckburt in 1980 in his paper "Color Image Quantization for Frame Buffer Display" and is still in wide use today. The algorithm follows just a few basic steps:

  1. Create a block or box surrounding the points; it should be just large enough to contain them all.
  2. While the number of blocks is less than the desired palette size:
    1. Find the block with the longest side out of all blocks.
    2. Cut this block into two blocks along its longest side. Cut it in such a way that half of the points inside the block fall into each of the two new blocks (that is, we split it through the median, thus the name).
    3. Shrink these two new boxes so that they just barely contain their points.
  3. Average the points in each box to obtain the final set of palette colors.

Despite a number of more sophisticated modern algorithms, this strikingly simple approach is adequate for generating 256 color palettes for most images. You can read about a C++ implemention of the median cut algorithm at my wiki.

Dither

If the palette is large enough and of high enough quality, Reducing by nearest color is usually sufficient. If not, however, distinct artifacts will be visible, such as banding, where what were once smooth gradients become striped regions (see this image; look closely at the cat's neck). One effective technique for dealing with these palette limitations is called dithering, which mixes dots of different colors to simulate a larger number of colors and smooth out gradients. For example, if you mix a grid of red and blue dots, the result will appear purple. Dithering is also commonly used in printing, where for example a grayscale image can be simulated with great accuracy using a higher-resolution dithered black and white image.

The trick to dithering is to effectively simulate the original color in each area of the image while avoiding telltale artifacts formed by poor dithering patterns; see ordered dithering, commonly used in newspapers, as an example of a dithering scheme with clearly visible artifacts. One of the most popular dithering schemes to this day is the Floyd-Steinberg algorithm, a classic dithering algorithm falling into the general class of error diffusion dithering algorithms.

The basic concept behind error diffusion is to begin with nearest color matching, but whenever we can't match a pixel color exactly, the error between the original color and the palletized color is distributed into adjacent pixels not yet visited. In Floyd-Steinberg, the pixel to the right receives 7/16 of the error, the pixel below 5/16, the pixel below and to the left 3/16, and the pixel below and to the right 1/16. When these pixels are visited, this added error affects the palette entry selected for them, leading to an average colour over that portion of the image close to the actual average color of the pixels in that area.

What's great about Floyd-Steinberg is that it generalizes easily to reducing full 24-bit color images to palettes of any size. The only modification we need to make is that we perform the error distribution step for each of the three channels. The palette entry is still chosen by nearest color. You can read about a C implementation of Floyd-Steinberg dithering on my wiki.

This isn't the end of the story of course - there is ongoing research and all kinds of fancy variations on these approaches, many of which can produce much nicer results than these classical algorithms. But at least now the next time you see a GIF or 8-bit PNG file on the web, or use a computer in 256-color mode, you can appreciate the algorithms under the cover that helped make your images look good under these severe limitations.