The fine folks at the Graphics and Media Lab at Moscow State University (MSU) have completed a comparison of Windows Media Photo and nine different implementations of JPEG-2000.  Their results are right in line with our own analysis, and they confirm our message that Windows Media Photo delivers image quality comparable to JPEG-2000 and significantly better than JPEG.

The complete report is available here: 

http://www.compression.ru/video/codec_comparison/wmp_codecs_comparison_en.html

Reviewing the Results

While I'm pleased to see this report and I agree with all the objective results, I do have a few comments related to the assertions, conclusions and the test methodology.

First, the report implies we've claimed Windows Media Photo delivers better compression than JPEG-2000.  In fact, we've claimed on many occasions that our compression efficiency is comparable to JPEG-2000, and overall the format delivers a number of significant advantages.  Let's take a look at all these points in more detail.

The MSU report cites our presentation earlier this year at WinHEC 2006.  We presented one slide (and one portion of the demo) that included references to JPEG-2000.  Here's the slide:

For this particular image, the measured RGB peak signal-to-noise ratio (PSNR) of Windows Media Photo is slightly better than JPEG-2000 except at very high compression ratios (at the far left of the curves.)  These results will vary significantly based on the image content, but these are pretty typical results.  In this case, we're comparing with the JPEG-2000 encoder/decoder provided by Kakadu Software (http://www.kakadusoftware.com), which we've used as our baseline reference.

The Tale of Multiple JPEG-2000's

In the MSU report, their tests show Kakadu Software's JPEG-2000 implementation to deliver the poorest quality results of the nine implementations they tested.  Frankly, I was surprised that there was that much difference among different implementations of JPEG-2000.  One of Kakadu Software's significant claims is their computational performance: "Kakadu offers higher processing throughput, lower memory consumption and many more features than the [JPEG-2000 Verification Model.]"  While the MSU report doesn't provide any details on this topic, typically different implementations of the JPEG-2000 encoder/decoder achieve better compression quality at the expense of additional processing.  Since we have designed Windows Media Photo from the start for very high performance and a low memory footprint (suitable for implementation in cameras), we chose the highest performance version of JPEG-2000 we could find for comparison.  We'll certainly take a look at including other JPEG-2000 implementations in future comparisons we present.

Again, I was a bit surprised by the range of quality differences among the different implementations of JPEG-2000.  The design of the basic Windows Media Photo compression algorithm doesn't leave room for that big a range of differences among different implementations.  This is in part because our algorithm supports both lossy and lossless compression - by definition, all implementations will return identical results with lossless compression.  While we believe there is a lot of room for implementers to optimize the performance for specific platforms, the design of the algorithm promises a more consistent level of quality for all implementations.

While not implemented in our current encoder, the Windows Media Photo compressed format is designed to support adaptive quantization.  Rather than use a single quantization value for the entire image, a smarter (and more processor intensive) encoder can analyze the image and adjust the quantization dynamically (at macro-block resolution) based on the image content.  This will enable a new class of smart encoders in the future that should deliver dramatic improvements in overall compression quality.
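To make the idea concrete, here is a minimal sketch of how such an encoder might pick a quantizer per macro-block.  Everything in it is an assumption for illustration: the function names, the use of sample variance as the "activity" measure, the thresholds, and the policy of quantizing smooth blocks more finely than busy ones.  It says nothing about how the Windows Media Photo bit stream actually signals these values.

```python
def block_variance(samples):
    """Variance of the sample values in one macro-block (hypothetical activity measure)."""
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)


def choose_quantizers(blocks, base_q=8, smooth_below=10.0, busy_above=1000.0):
    """Return one quantizer step per block.

    Assumed policy: smooth blocks get a finer step (errors are easy to see there),
    busy blocks get a coarser step (detail tends to mask errors).
    """
    quantizers = []
    for samples in blocks:
        activity = block_variance(samples)
        if activity < smooth_below:
            quantizers.append(max(1, base_q // 2))
        elif activity > busy_above:
            quantizers.append(base_q * 2)
        else:
            quantizers.append(base_q)
    return quantizers


# Three 16x16 blocks flattened to 256 samples: flat, gently textured, very busy.
blocks = [[128] * 256, [100, 110] * 128, [0, 255] * 128]
print(choose_quantizers(blocks))
```

A real encoder would likely use a perceptual model rather than raw variance; the point is only that the decision is made per block instead of once per image.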

Looking Beyond Compression Quality

Overall, the MSU results show Windows Media Photo as "middle of the pack" compared to the nine different JPEG-2000 implementations, which fully supports the claims we have made of delivering image quality comparable to JPEG-2000.

When we look beyond compression quality, Windows Media Photo provides a number of key advantages compared to JPEG-2000:

  • Windows Media Photo provides both lossless and lossy compression using the exact same algorithm.  Any implementation will provide both compression options.  This is not possible with JPEG-2000; two separate algorithms are required, and in most cases only the lossy version is implemented.
  • Windows Media Photo is designed for lightweight processors and low memory footprints.  The entire algorithm uses only integer mathematics, with no complex operations (mostly adds and shifts; limited multiplies and no divisions.)  The image can be processed in segments, minimizing the amount of memory required.  Overall it is ideally suited for implementation in low-end DSPs or even ASICs; this has not proved possible with the much more complex algorithm used for JPEG-2000.
  • Windows Media Photo supports an extremely wide range of pixel formats (discussed in detail elsewhere in this blog.)  JPEG-2000 has far less flexible support for image formats.
  • The Windows Media Photo compressed bit stream design enables future implementation of adaptive quantization encoders, providing the option for significant quality improvements when processing power is available.
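As a rough illustration of the first two points in the list above, the sketch below shows a generic two-sample integer lifting step built only from adds, subtracts and shifts.  This is not the actual Windows Media Photo transform, and the function names and the `qshift` parameter are hypothetical.  It simply shows how one integer code path can be exactly reversible when nothing is discarded and lossy when low-order bits are dropped.

```python
def forward(a, b):
    """Integer lifting step: returns (floor average, difference)."""
    d = a - b            # detail: a plain subtraction
    s = b + (d >> 1)     # floor((a + b) / 2) using one add and one shift
    return s, d


def inverse(s, d):
    """Exactly undoes forward() with the same integer operations."""
    b = s - (d >> 1)
    a = b + d
    return a, b


def encode_pair(a, b, qshift=0):
    """Transform a pair, then quantize the detail by dropping qshift low bits."""
    s, d = forward(a, b)
    return s, d >> qshift


def decode_pair(s, dq, qshift=0):
    """Rescale the quantized detail and invert the transform."""
    return inverse(s, dq << qshift)


print(decode_pair(*encode_pair(200, 57)))                      # qshift=0: exact round trip
print(decode_pair(*encode_pair(200, 57, qshift=3), qshift=3))  # qshift=3: approximate
```

With `qshift=0` every input pair is reconstructed exactly; with a larger shift the same functions return an approximation, with no second algorithm involved.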

It's important to emphasize the benefits of a significantly more efficient, lightweight algorithm.  Windows Media Photo can be (and has been) implemented in low-end digital signal processors, typical of those used in consumer cameras.  It could also be implemented in very low cost ASICs.  The JPEG-2000 algorithm is considerably more complex and is not appropriate for consumer device implementation.  Even if a sufficiently capable processor becomes cost effective, JPEG-2000's more complex algorithm will invariably require more power, significantly reducing battery life in portable consumer devices.

Not All Compression Errors are Equal

Getting back to image quality, while the measured PSNR values for Windows Media Photo and JPEG-2000 are comparable, the nature of the lossy compression artifacts in the two formats is very different.  Depending on the particular image content (and the amount of compression) the effect of these differences can favor one approach over the other.

JPEG-2000 uses wavelet technology.  As the compression ratio is increased, image detail is sacrificed.  While this can often deliver visually pleasing results, it can damage an image in a way that cannot easily be recovered.  (Once the detail is gone, it's hard to ever get it back again.)

Windows Media Photo delivers a more random distribution of error from lossy compression, independent of the frequency of the image content.  The end result is artifacts that are closer to random noise, and in many cases, this is significantly less damaging to an image, especially when it will be encoded/decoded multiple times.

Here's an example (from my demo at WinHEC):

In this example, the uncompressed image in upper left has been compressed to 12.5% of the original size using JPEG, JPEG-2000 (from Kakadu Software) and Windows Media Photo.  What is displayed here are the differences - the compressed image is subtracted from the uncompressed image and the absolute value is amplified so we can compare the nature of the errors.
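For readers who want to reproduce this kind of error visualization, here is a minimal sketch of the operation just described.  The function name and the `gain` value are assumptions (the amplification factor used for the demo isn't stated here), and the image is represented simply as rows of 8-bit sample values.

```python
def amplified_difference(original, decoded, gain=8, max_value=255):
    """Per-sample |original - decoded| * gain, clipped to max_value.

    Both images are lists of rows of integer sample values with identical dimensions.
    """
    if len(original) != len(decoded):
        raise ValueError("images must have the same number of rows")
    result = []
    for row_o, row_d in zip(original, decoded):
        if len(row_o) != len(row_d):
            raise ValueError("rows must have the same length")
        result.append([min(max_value, abs(o - d) * gain) for o, d in zip(row_o, row_d)])
    return result


original = [[10, 200, 50], [0, 255, 128]]
decoded = [[12, 190, 50], [0, 200, 129]]
print(amplified_difference(original, decoded))
```

Areas where the codec reproduced the source exactly come out black, and larger errors come out brighter, which makes the spatial pattern of the errors easy to compare across codecs.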

From this we can clearly see that JPEG-2000 errors are concentrated in the portions of the image with high frequency content. (That pink jacket has a visible weave compared to the smooth fabric of the blue shirt.)  Windows Media Photo has a more random error distribution, resulting in less error in the high frequency areas, but more error in the out-of-focus low frequency areas.  Overall, our experience shows this is less damaging to the integrity of the original image.

It will certainly be interesting to prepare a similar comparison using one of the very high quality JPEG-2000 implementations identified in the MSU analysis.  I will definitely pick up a copy of ACDSee and share the results of this comparison based on that implementation of JPEG-2000.

A Closer Look at PSNR

I'd also like to make a few comments on the test methodology used in the MSU analysis.  Again, overall I applaud their efforts and I don't disagree with any of their measured results.  While PSNR is a very useful objective measurement of the total errors from compression, it is only one metric of image quality.  The MSU report does present visual comparisons, but this requires a high compression ratio to generate visual differences.  These high compression ratios are not typical of normal usage.

It's generally accepted that, when evaluating image quality using PSNR measurements, an image with a PSNR value above 45 dB is visually indistinguishable from the original.  At the other extreme, when the PSNR value of an image is below 35 dB, the quality is sub-standard; it may be acceptable for web or reduced resolution display, but not for subsequent editing or re-encoding.  That is the quality range we have focused on during the development, testing and tuning of Windows Media Photo.

I am a little surprised that the MSU researchers only measured Y-PSNR.  If I understand their methodology, they are only measuring noise in the luminance channel.  This is measuring the sum of a portion of each of the three color channels.  This type of measurement doesn't account for chroma errors, which are very common in many compression technologies that favor luminance quality over chrominance quality.  The measurements I presented above are RGB-PSNR; we measure the total PSNR across all three channels in an RGB color space.  This is the error that matters for image display and editing, since this is the color space that is used.  For comparison, here's the complete table of my measurements for the image of the boy shown above:

psnrRGB includes measurements of all the image content, not just luminance, and therefore for all codecs and compression ratios, the psnrRGB values are lower than the psnrY (luminance-only) measurements.  The JPEG-2000 measurements in the table above are using the Kakadu Software implementation.  It will be very interesting to look at the same measurements with one or more of the higher quality JPEG-2000 implementations to see how much of their improvement is in the luminance channel only, or whether there are equal improvements in the total RGB PSNR measurement.
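To show the difference between the two measurements, here is a small sketch that computes both from the same pixel data.  The function names are mine, and the luminance weights are the common BT.601 values, which is an assumption; I don't know exactly which conversion the MSU tools use.  The test pixels are synthetic, not taken from the table above.

```python
import math

# BT.601 luma weights (an assumption; the report's exact conversion isn't stated here).
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def psnr(reference, test, peak=255):
    """PSNR in dB between two equal-length sequences of sample values."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def psnr_rgb(ref_pixels, test_pixels):
    """PSNR over every R, G and B sample of two lists of (r, g, b) pixels."""
    flat_ref = [c for pixel in ref_pixels for c in pixel]
    flat_test = [c for pixel in test_pixels for c in pixel]
    return psnr(flat_ref, flat_test)


def psnr_y(ref_pixels, test_pixels):
    """PSNR of the weighted luminance only; chroma-only errors largely cancel out."""
    def luma(pixel):
        return sum(w * c for w, c in zip(LUMA_WEIGHTS, pixel))
    return psnr([luma(p) for p in ref_pixels], [luma(p) for p in test_pixels])


ref = [(100, 100, 100)] * 4
shifted = [(110, 100, 90)] * 4   # a colour shift that barely changes luminance
print(round(psnr_rgb(ref, shifted), 2), round(psnr_y(ref, shifted), 2))
```

In this synthetic case the colour shift costs far more in the RGB measurement than in the luminance-only one, which is the kind of chroma error a Y-only measurement can understate.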

Lenna is Not as Young as She Used to Be

My bigger concern is the choice of images used for the MSU evaluation.  In all fairness, these are standard images for quality comparison; they are from an informal library of the universally used set of images that have been the baseline standard for compression evaluation for over 30 years.

But that's the problem.  All of these images are very far removed from the typical images (digital photos) that we all use today.  The "House" and "Lighthouse" images are from a set of test images produced by Kodak many years ago.  "Lenna" was scanned from the printed centerfold of the November 1972 issue of Playboy magazine.  (Oh, those crazy compression scientists!)  I'm not sure of the provenance of "Barbara", but I believe it dates from the same time.

All these photos were originally captured on film, underwent an unspecified chain of optical and/or digital processing, and were then scanned either from film or print.  The files used for the MSU tests are 512x512 pixels (1/4 Mbyte.)  This down-sampling removes significant detail and virtually erases any film grain or other noise-like content that would be typical of digital photos.  These images are very different from the typical digital camera photos that are the most common content we deal with today (and what we've used to optimize and tune Windows Media Photo.)  These low-detail, down-sampled images (especially the very soft scanned-from-print image of Lenna) are also very friendly to JPEG-2000's wavelet compression.

The MSU results are valid (and typical of the way image compression quality comparisons have been done for years.)  However, they're not as meaningful for today's use of image compression as they could be if full resolution original digital photos were used for comparison instead.

The two examples I showed above use images that originated from digital cameras.  They were shot in RAW mode, converted to an uncompressed 16bpc TIFF file, then reduced to an uncompressed 8bpc BMP file that was used as the source for compression into the various formats.  These are two photos from a reference set of 30 different photos I regularly use for Windows Media Photo testing.  This library of photos includes examples from a wide range of digital cameras, representing virtually every type of sensor technology in use today (CMOS, CCD, Foveon, SuperCCD-SR, RGBE.)  They also include a wide variety of "typical" digital photos, including snapshots, landscapes, night shots, flash photography and more.

Here's a preview:

I believe this image library better represents the images we use today, and is more appropriate for comparing different compression solutions.  I also believe it's essential to perform compression tests using the full resolution images as originally captured, not a down-sampled smaller image.

So, my homework assignment is to package up this image library and make it freely available for general use.  Additionally, I'll prepare and publish updated test results comparing Windows Media Photo and at least a couple implementations of JPEG-2000 using these test images.

Summary

My thanks to the researchers at the Graphics and Media Lab at Moscow State University for adding Windows Media Photo to the many compression technologies they have analyzed.  We welcome and appreciate any and all independent reviews of this new file format.  Their report is comprehensive and informative, and while (as I've described above) I don't think it tells the whole story of Windows Media Photo, it's certainly an accurate summary of PSNR-measured quality of this standard set of test images compared to JPEG-2000.  It confirms our own results and gives me some additional insights on future test and analysis procedures we can use moving forward.