Encoding Windows Media Photo Files - Part 1: DPK Tools
This is the first installment of a series of blogs on creating Windows Media Photo files. This series will cover the tools that are currently available, an explanation of the various encoding options, and recommendations and guidelines on how to achieve the best results for a wide variety of different scenarios.
Part 1 starts out with an overview of the tools currently available. The bulk of this blog provides a detailed discussion of the encoder provided as part of the Device Porting Kit (DPK), including how to download it if you don’t have the DPK and a detailed description of the numerous encoder parameters.
In upcoming installments, we’ll discuss an encoder utility based on the Windows Imaging Components (WIC) interfaces that we developed for our own internal testing. We’ll also cover metadata, color profiles, more on alpha channels, and encoding best practices for different scenarios including RAW acquisition, JPEG transcoding, editing, web or email delivery, mobile devices, printing, archiving, and a few more I’m sure we’ll think up along the way. As always, if you have recommendations or specific suggestions for topics you’d like to see covered, send an email or post a comment.
Now, on with the discussion of Windows Media Photo encoding...
The Tools
When you want a JPEG, TIFF or BMP file, it either comes from your camera or scanner in that format, or you load your image editing application of choice and save in that format. At present, there aren’t any commercial applications available that save images as Windows Media Photo files. In the interim, we’ve developed two different conversion utilities as part of creating Windows Media Photo, and we’re actively working on new tools that will be available in the near future.
Our current encoder tools include a choice between the sample applications that are part of the Windows Media Photo Device Porting Kit (DPK) or an encoder utility based on the Windows Imaging Components (WIC) interfaces that we developed internally for our own testing purposes. Both tools are command line utilities, and each has its own specific strengths. For the remainder of this discussion, we’ll refer to these two different encoders (and their associated decoder) as the DPK Tools and the WIC Tools.
The DPK Tools are very limited in the uncompressed formats they can convert to and from, and have no ability to convert among pixel formats when encoding or decoding. However, they do provide a few special tricks, discussed below. Also, because the DPK Tools are built from the DPK sample application source code, they can be easily customized for any special-purpose requirement. The WIC Tools rely on the installed WIC codecs for source and destination image file formats, and can take advantage of all the facilities provided by WIC, including pixel format conversion. Relying on installed codecs imposes some restrictions, but the set of supported formats will grow as new codecs are added, including the ability to encode directly from RAW file formats.
DPK Tools
The Windows Media Photo Device Porting Kit (DPK) includes sample application source code for a Windows Media Photo encoder application (WMPEncApp.exe) and a Windows Media Photo decoder application (WMPDecApp.exe). These are command line utilities that convert between Windows Media Photo and an equivalent uncompressed image format.
Any licensee of the DPK can build these tools from the source code included in the DPK. If you don’t have a copy of the DPK, I’ve provided the compiled DPK 1.0 RC Windows compatible version of these utilities here. These should run on just about any version of Windows (though we’ve never tested on anything prior to Windows XP.) Using the DPK, you can also make versions of these utilities for other platforms and operating systems.
The DPK Tools are based on the DPK reference source code. They are completely stand-alone and make no calls to external libraries other than basic operating system support. The DPK Tools contain no platform-specific optimization.
WMPEncApp.exe
This command line utility converts certain uncompressed file formats into equivalent Windows Media Photo files. It provides a complete set of command line options to control all supported Windows Media Photo Encoder options. Here is a summary of the usage of WMPEncApp and the various command line options. All of these options will be discussed in detail in later sections.
wmpencapp [options]...
-i input.bmp/tif/hdr Input image file name
bmp: <=8bpc, BGR
tif: >=8bpc, RGB
hdr: 24bppRGBE only
-o output.wdp Output Windows Media Photo file name
-q quality [1 - 255] Default = 1, lossless
-c format Required to define uncompressed
source pixel format
0: 24bppBGR
1: BlackWhite
2: 8bppGray
3: 16bppGray
4: 16bppGrayFixedPoint
5: 16bppGrayHalf
7: 32bppGrayFixedPoint
8: 32bppGrayFloat
9: 24bppRGB
10: 48bppRGB
11: 48bppRGBFixedPoint
12: 48bppRGBHalf
14: 96bppRGBFixedPoint
15: 128bppRGBFloat
16: 32bppRGBE
17: 32bppCMYK
18: 64bppCMYK
22: 32bppBGRA
23: 64bppRGBA
24: 64bppRGBAFixedPoint
25: 64bppRGBAHalf
27: 128bppRGBAFixedPoint
28: 128bppRGBAFloat
29: 16bppBGR555
30: 16bppBGR565
31: 32bppBGR101010
32: 40bppCMYKA
33: 80bppCMYKA
34: 32bppBGR
-d chroma sub-sampling 0: Y-only
1: YCoCg 4:2:0
2: YCoCg 4:2:2
3: YCoCg 4:4:4 (default)
-l overlapping 0: No overlapping
1: One level overlapping (default)
2: Two level overlapping
-f Frequency order bit stream
(default is spatial)
-t Display timing information
-v Display verbose encoder information
-V tile_wd0 tile_wd1 ... Macro block columns per tile
-H tile_ht0 tile_ht1 ... Macro block rows per tile
-U num_h_tiles num_v_tiles Horizontal & vertical tile count for
uniform tiling
-b Black/White Applies to 1bpp black/white images
0: 0 = black (default)
1: 0 = white
-a alpha channel format Required for any pixel format
with an alpha channel
2: Planar alpha
3: Interleaved alpha
Other: Reserved, do not use
-F trimmed flexbits [0 – 15] 0: no trimming (default)
15: trim all
-s skip subbands 0: All subbands included (default)
1: Skip flexbits
2: Skip highpass
3: Skip highpass & lowpass (DC only)
So, for example, to create a Windows Media Photo file from a typical 24-bit .bmp using reasonably high quality lossy compression, the command line would be:
wmpencapp -i input.bmp -o output.wdp -q 10
This scenario uses the default settings for most of the encoder options. Obviously, we’d like to take full control of these options to choose exactly how the Windows Media Photo file is created. The following sections describe these options in detail.
NOTE: The -F and -s command line options control compressed domain transformation features that go beyond the scope of this documentation. We’ll save a discussion of compressed domain operations for another day.
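For scripted batch conversion, the command line can be assembled programmatically. The following is a minimal Python sketch and is not part of the DPK: the helper name build_wmpencapp_args and its parameter names are hypothetical, while the flags themselves (-i, -o, -q, -c, -d, -l) and their value ranges come from the usage summary above. It assumes wmpencapp is on the PATH if the resulting list is handed to something like subprocess.run.

```python
def build_wmpencapp_args(src, dst, quality=1, pixel_format=None,
                         chroma=3, overlap=1):
    """Return an argument list for wmpencapp (hypothetical helper).

    The defaults mirror the defaults in the usage summary:
    -q 1 (lossless), -d 3 (4:4:4), -l 1 (one level of overlap).
    """
    if not 1 <= quality <= 255:
        raise ValueError("quality (-q) must be in the range 1..255")
    if chroma not in (0, 1, 2, 3):
        raise ValueError("chroma sub-sampling (-d) must be 0, 1, 2 or 3")
    if overlap not in (0, 1, 2):
        raise ValueError("overlap (-l) must be 0, 1 or 2")

    args = ["wmpencapp", "-i", src, "-o", dst, "-q", str(quality)]
    if pixel_format is not None:
        # -c takes one of the numeric pixel format codes from the usage list.
        args += ["-c", str(pixel_format)]
    args += ["-d", str(chroma), "-l", str(overlap)]
    return args


if __name__ == "__main__":
    # Same conversion as the example above, with the defaults spelled out.
    print(" ".join(build_wmpencapp_args("input.bmp", "output.wdp",
                                        quality=10, pixel_format=0)))
```

Spelling out -d and -l even when they equal the defaults makes a batch log self-describing, at the cost of a slightly longer command line.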
Pixel Format: -c Option & the Uncompressed Source File Format
The DPK Tools are certainly not general purpose file format conversion utilities. They provide the absolute minimum support for uncompressed source and destination file formats; the specific file formats supported are TIFF, BMP and HDR. Only certain variations of these formats are supported and they are, for the most part, tied to the pixel format being converted to or from. The DPK Tools do not perform any pixel format conversion. Therefore the source uncompressed image must be in the desired pixel format for the encoded Windows Media Photo file. Only minimal data validation is performed; if the source pixel format is incorrect, you will most likely create a bad Windows Media Photo file.
Here are the uncompressed source file formats supported by the DPK Tools, and the specifics for each pixel format supported. Each of the file formats and specific image formats listed below correspond to a mode that can be created using Adobe PhotoShop CS2.
TIFF The image should be flattened; it should not contain any layers. If it does contain layers, the “Discard Layers and Save a Copy” option should be selected under Layer Compression. Image Compression must be set to “None”; any compressed TIFF format will cause an error or convert to a bad Windows Media Photo file. TIFF images must always be stored in “Interleaved” pixel order; “Per Channel” pixel order is not supported. Byte order should be set to “IBM PC.” “Save Image Pyramid” should be unchecked.
The specific pixel format created depends on the correct combination of Image Mode in PhotoShop, the specific TIFF save options specified, and the encoder pixel format option specified for WMPEncApp.exe. The following table lists all the possible combinations:
| PhotoShop Mode | Bit Depth | Encodes as | -c | Notes |
|---|---|---|---|---|
| RGB/8 | | 24bppRGB | 9 | |
| RGB/16 | | 48bppRGB | 10 | |
| RGB/16 w/ alpha | | 64bppRGBA | 23 | |
| RGB/32 | 16 bit (Half) | 48bppRGBHalf | 12 | |
| RGB/32 w/ alpha | 16 bit (Half) | 64bppRGBAHalf | 25 | |
| RGB/32 w/ alpha | 32 bit (Float) | 128bppRGBFloat | 15 | Fill alpha black |
| RGB/32 w/ alpha | 32 bit (Float) | 128bppRGBAFloat | 28 | |
| CMYK/8 | | 32bppCMYK | 17 | |
| CMYK/8 w/ alpha | | 40bppCMYKA | 32 | |
| CMYK/16 | | 64bppCMYK | 18 | |
| CMYK/16 w/ alpha | | 80bppCMYKA | 33 | |
| Gray/8 | | 8bppGray | 2 | |
| Gray/16 | | 16bppGray | 3 | |
| Gray/32 | 16 bit (Half) | 16bppGrayHalf | 5 | |
| Gray/32 | 32 bit (Float) | 32bppGrayFloat | 8 | |
| Bitmap | | 1bppBlackWhite | 1 | |
For 1bppBlackWhite encoding, the -b option allows you to specify how to interpret black vs. white values. When encoding from TIFF files, the default will generate the correct results and this option should not be required.
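Because WMPEncApp.exe does little validation, a script that drives it can benefit from checking the -c value before encoding. The sketch below simply restates the TIFF table above as a Python lookup; the names TIFF_SOURCES and tiff_c_option are hypothetical and exist only for this illustration.

```python
# Target pixel format -> (PhotoShop mode, bit depth option, -c value),
# transcribed from the TIFF table above. An empty bit depth string means
# the table lists no bit depth option for that mode.
TIFF_SOURCES = {
    "24bppRGB":        ("RGB/8", "", 9),
    "48bppRGB":        ("RGB/16", "", 10),
    "64bppRGBA":       ("RGB/16 w/ alpha", "", 23),
    "48bppRGBHalf":    ("RGB/32", "16 bit (Half)", 12),
    "64bppRGBAHalf":   ("RGB/32 w/ alpha", "16 bit (Half)", 25),
    "128bppRGBFloat":  ("RGB/32 w/ alpha", "32 bit (Float)", 15),
    "128bppRGBAFloat": ("RGB/32 w/ alpha", "32 bit (Float)", 28),
    "32bppCMYK":       ("CMYK/8", "", 17),
    "40bppCMYKA":      ("CMYK/8 w/ alpha", "", 32),
    "64bppCMYK":       ("CMYK/16", "", 18),
    "80bppCMYKA":      ("CMYK/16 w/ alpha", "", 33),
    "8bppGray":        ("Gray/8", "", 2),
    "16bppGray":       ("Gray/16", "", 3),
    "16bppGrayHalf":   ("Gray/32", "16 bit (Half)", 5),
    "32bppGrayFloat":  ("Gray/32", "32 bit (Float)", 8),
    "1bppBlackWhite":  ("Bitmap", "", 1),
}


def tiff_c_option(pixel_format):
    """Return the -c value for a pixel format that the table lists for TIFF."""
    try:
        return TIFF_SOURCES[pixel_format][2]
    except KeyError:
        raise ValueError(
            f"{pixel_format} is not listed as a TIFF source format") from None
```

Keying the lookup on the target pixel format, rather than on the PhotoShop mode, avoids the ambiguity of the two RGB/32 w/ alpha float rows, which differ only in whether the alpha channel is kept.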
BMP This file format is only used for 8 bit per channel (bpc) or smaller bit depths. It differs from the equivalent TIFF format because RGB data is stored in the uncompressed bit stream in BGR rather than RGB channel order. When creating the uncompressed image in PhotoShop, only RGB/8 mode should be used. Under BMP Options, the File Format should always be set to “Windows”. “Compress (RLE)” and “Flip row order” should never be checked. The Basic or Advanced Modes under the BMP Save options should be set in combination with the appropriate encoder options to achieve the desired pixel format according to the following table.
| PhotoShop Mode | BMP Options | Encodes as | -c | Notes |
|---|---|---|---|---|
| RGB/8 | Basic: 24 bit | 24bppBGR | 0 | |
| RGB/8 w/ alpha | Basic: 32 bit | 32bppBGR | 34 | Fill alpha black |
| RGB/8 w/ alpha | Basic: 32 bit | 32bppBGRA | 22 | |
| RGB/8 | Adv: X1 R5 G5 B5 | 16bppBGR555 | 29 | |
| RGB/8 | Adv: R5 G6 B5 | 16bppBGR565 | 30 | |
HDR This file format is only used for encoding to the Windows Media Photo 32bppRGBE pixel format (-c 16). This special floating point pixel format uses a shared exponent and three independent mantissas to encode the entire pixel. Unfortunately, while PhotoShop CS2 supports saving in .HDR file mode, it only saves using the compressed option. WMPEncApp.exe only supports uncompressed .HDR files. So, we’ve provided a simple command line utility called HDR2HDR that only does one thing: It reads and decompresses a compressed .HDR file and saves it as an uncompressed .HDR file. HDR2HDR is included in the DPK. If you don’t have the DPK, it’s also available here.
| PhotoShop Mode | PhotoShop Save As… | Encodes as | -c |
|---|---|---|---|
| RGB/32 | Radiance (*.HDR) | 32bppRGBE | 16 |
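To make the shared-exponent idea concrete, here is a small Python sketch that expands one RGBE pixel into three floating point values. It follows the usual Radiance .HDR convention (three 8-bit mantissas, one 8-bit exponent biased by 128). That Windows Media Photo interprets the four bytes in exactly this way is an assumption of the example rather than something stated in this post, and the function name rgbe_to_float is hypothetical.

```python
import math


def rgbe_to_float(r, g, b, e):
    """Expand one RGBE pixel (four 8-bit values) into three floats.

    Radiance convention (assumed here): value = mantissa * 2**(e - 136),
    and an exponent byte of 0 means the pixel is black.
    """
    if e == 0:
        return (0.0, 0.0, 0.0)
    scale = math.ldexp(1.0, e - (128 + 8))
    return (r * scale, g * scale, b * scale)


if __name__ == "__main__":
    # All three channels share the exponent byte 129.
    print(rgbe_to_float(128, 64, 32, 129))
```

Because all three channels share one exponent, a pixel costs four bytes instead of the twelve needed for three 32-bit floats, at the price of reduced precision in the darker channels of a strongly colored pixel.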
Fixed Point Pixel Formats
The one significant omission from the pixel formats listed in the tables above is the set of fixed point formats. As discussed in our previous blog on high dynamic range, wide gamut pixel formats, the fixed point pixel formats provide an excellent solution for retaining the full content of an image source while still providing efficient storage and processing.
However, these fixed point formats are one of the innovations introduced with Windows Media Photo and the new graphics infrastructure in Windows Vista and .NET Framework 3.0. No other file format supports fixed point encoding. Because WMPEncApp.exe has no built-in capability for pixel format conversion, there is no way we can encode fixed point Windows Media Photo files from standard TIFF or BMP files.
In reality, if we write a little software, we could convert floating point image data to fixed point and still store it in a TIFF file. While the resulting TIFF file containing fixed point image data would not display correctly, it could be converted to a fixed point Windows Media Photo file using WMPEncApp.exe. That’s why there are -c option values defined for the various fixed point pixel formats. But for this to work, you’re on your own to first convert the uncompressed image data to fixed point. I do have a utility that does this, but trust me, it’s not ready for any public use!
If you do choose to write your own software to generate uncompressed fixed point data, you might as well simply write a WIC or Windows Presentation Foundation (WPF) application and directly call the Windows Media Photo converter. This makes a lot more sense than writing out fixed point data in a non-standard TIFF file and then using WMPEncApp.exe.
If you can hang in there for Part 2, we will discuss how you can use the WIC Tools to create fixed point Windows Media Photo files. Since the WIC Tools have full access to all the capabilities of WIC, the encoder can call the appropriate pixel format converter and create a Windows Media Photo file in any pixel format, independent of the uncompressed source image format.
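For readers curious what "converting floating point image data to fixed point" involves, here is a minimal Python sketch. It assumes a signed 16-bit value with 13 fractional bits; that layout is an assumption made for the example and is not specified anywhere in this post, so check the pixel format documentation before relying on it. The function names are hypothetical.

```python
def float_to_fixed16(value, fraction_bits=13):
    """Convert a float to a signed 16-bit fixed point integer.

    fraction_bits=13 is an assumed layout for this illustration.
    Values outside the representable range are clamped.
    """
    scaled = round(value * (1 << fraction_bits))
    return max(-32768, min(32767, scaled))


def fixed16_to_float(fixed, fraction_bits=13):
    """Inverse of float_to_fixed16, up to rounding and clamping."""
    return fixed / (1 << fraction_bits)


if __name__ == "__main__":
    for sample in (0.0, 0.5, 1.0, -0.25, 10.0):
        encoded = float_to_fixed16(sample)
        print(sample, encoded, fixed16_to_float(encoded))
```

With this layout, 1.0 maps to 8192, which leaves headroom above 1.0 and room for negative values. That headroom is what lets a fixed point format carry content beyond the nominal black-to-white range.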
Unsupported Pixel Formats
There are several other pixel formats that can’t easily be encoded using WMPEncApp.exe:
It is possible to encode images in any of the pre-multiplied alpha pixel formats, but the uncompressed image will have to first be pre-processed to multiply the RGB channels by the alpha channel value. This can be done using the appropriate blending operations in Photoshop. If there is any interest, I’ll describe that process in another installment (and provide an action to offer some automation.) Like fixed point pixel formats, a TIFF file in a pre-multiplied pixel format will not display correctly, but can be converted to an equivalent Windows Media Photo file using WMPEncApp.exe. But also like fixed point pixel formats, if you wait for Part 2, we’ll discuss how this can be done much more easily using the WIC Tools.
While WMPEncApp.exe includes a -c option value for the 32bppBGR101010 pixel format, there is no way to create this pixel format in an uncompressed file using PhotoShop. This is another pixel format that we’ll handle using the WIC Tools, which we will discuss in Part 2.
WMPEncApp.exe does not support encoding any of the n-Channel pixel formats. Creating uncompressed n-Channel data will be a whole topic in itself, which we will address in a future installment of this blog series. Once we understand how to create this n-Channel data, we’ll be able to use the WIC Tools to encode it in Windows Media Photo file format.
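As an illustration of the pre-processing needed for the pre-multiplied alpha formats mentioned above, the following Python sketch multiplies the color channels of a single 8-bit RGBA pixel by its alpha value. It is a generic pre-multiplication with rounding, not code from the DPK, and the function name premultiply_rgba8 is hypothetical.

```python
def premultiply_rgba8(r, g, b, a):
    """Pre-multiply 8-bit color channels by an 8-bit alpha value.

    Each channel becomes round(channel * alpha / 255), computed with
    integer arithmetic. The alpha value itself is left unchanged.
    """
    def scale(channel):
        return (channel * a + 127) // 255

    return (scale(r), scale(g), scale(b), a)


if __name__ == "__main__":
    print(premultiply_rgba8(200, 100, 50, 128))  # roughly half-transparent
    print(premultiply_rgba8(200, 100, 50, 255))  # opaque pixels are unchanged
```

This also shows why such a file does not display correctly in a viewer that expects straight alpha: partially transparent pixels have already been darkened once, and the viewer darkens them again.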
Compression Choices: -q, -d & -l Options
There are three parameters that control the tradeoffs between image quality and compressed file size. In addition to their individual effect, it’s also important to understand how these three parameters interact with each other.
-q Quantization (1-255)
One of the principal ways lossy compression is achieved is to “quantize” a set of continuous values into a smaller set of representative values. In this way, “loss” is achieved by mapping values that are close together to the same value. Only the remaining set of values needs to be coded and saved, reducing the amount of storage required. The greater the degree of quantization, the more the content can be compressed, but in doing so, more of the small differences among similar values are lost.
The quantization level, specified by the -q option, basically defines the amount of similarity that can be discarded. It is an arbitrary range from 1-255; the quantization value does not correspond directly to any specific difference amount. Quantization determines the desired image quality rather than the desired compression ratio. The actual compression ratio is a function of both the quantization and the image content; images with less complex content will have fewer differences among values and will achieve better compression without the need for greater quantization. Additionally, the quantization is also highly dependent on the specific pixel format, most importantly the bit depth. Higher values will be required for greater bit depths to achieve a comparable compression ratio, since larger bit depths provide a greater range of possible values and therefore will need more quantization.
Windows Media Photo provides the unique capability of preserving all data values during quantization, effectively providing mathematically lossless compression. When the quantization is set to 1, no values are discarded and all encoded pixel values will be returned with absolutely no loss. This is the default setting if no value for the -q option is specified.
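The general idea of quantization can be sketched in a few lines of Python. This is a generic uniform quantizer for illustration only: it is not the Windows Media Photo quantizer, and the step argument is not the same thing as the -q value, since the post notes that -q does not correspond directly to any specific difference amount. The function names are hypothetical.

```python
def quantize(values, step):
    """Map each value to the index of its nearest multiple of step."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Reconstruct representative values from quantization indices."""
    return [i * step for i in indices]


if __name__ == "__main__":
    samples = [97, 100, 103, 104, 131, 138]

    # A step of 1 keeps every value, which is the lossless case.
    print(dequantize(quantize(samples, 1), 1))

    # A larger step collapses nearby values onto the same representative.
    print(dequantize(quantize(samples, 10), 10))
```

With a step of 10, the first four samples all collapse to the same representative value, so fewer distinct values need to be coded. The small differences among them are what is lost.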
-d Chroma Sub-sampling (0, 1, 2 or 3)
We can choose to reduce the resolution of the chrominance of an image prior to the quantization process. Reducing the chrominance resolution, or chroma sub-sampling, has long been understood as an effective way to reduce image content with very little perceptible degradation. In fact, virtually all television or video you watch, whether analog or digital, takes advantage of chroma sub-sampling to reduce the required bandwidth. The JPEG compression format, as commonly used, relies on chroma sub-sampling as well. In fact, the unique capability of Windows Media Photo is not that we provide chroma sub-sampling, but that we provide a mechanism for you to reduce or eliminate this technique to improve image quality. Of course, this only applies to RGB color images.
An image is first reorganized from RGB into a channel for luminance and two channels to describe the color information (or chrominance.) If all chrominance is discarded, what’s left is a monochrome image. Typically, we don’t want to go that far!
Many video systems, as well as the JPEG compression format (or at least the most common variant of it that we all use), discard 75% of the chrominance information. The resolution of the color information is reduced by a factor of two in both dimensions. So every four pixels in an image are represented by four luminance values but only two chrominance values (one for each chroma channel). What started out as 12 values (four pixels with three channels each) has been cut in half; only 6 values (four luminance values and two chrominance values) have to be saved.
In the world of digital imaging, this is referred to as 4:2:0 chroma sub-sampling, or more simply as 4:2:0. When all chrominance information is retained (no values are discarded), this is referred to as 4:4:4. Another popular approach, particularly for professional video applications, is to only discard 50% of the chroma values; two values for each chroma channel, or four values in total are retained. This is referred to as 4:2:2. Finally, if we discard all color information, retaining only the luminance, this is described as 4:0:0. Windows Media Photo supports all these modes.
-d 3 (4:4:4) All color information is retained, assuring full resolution of the chrominance information. This is the default and is the recommended setting to achieve the best overall image quality. Whenever an image is stored as an intermediate format and further editing is anticipated, it is highly recommended to use 4:4:4.
-d 2 (4:2:2) The color information is encoded at ½ the resolution of the luminance information. For each set of four pixels, four luminance values are used and the eight chrominance values are reduced to four (two for each chroma channel). This provides perceptually lossless color encoding for the final delivery of an image. However, if further editing of the image is anticipated, it’s recommended that any chroma sub-sampling be avoided.
-d 1 (4:2:0) The color information is encoded at ¼ the resolution of the luminance information. For each set of four pixels, four luminance values are used and the eight chrominance values are reduced to two (one for each chroma channel). This is the same sub-sampling used by JPEG. When converting a JPEG file to Windows Media Photo, there is no need to specify a higher chroma sub-sampling mode than 4:2:0.
-d 0 (4:0:0) All color information is discarded and only the luminance information is retained, effectively creating a monochrome image. For performance reasons, Windows Media Photo uses a non-traditional method to calculate luminance. Therefore, the resulting monochrome image will not appear identical to a monochrome version of the image created using other tools. Additionally, although all color information is discarded, the pixel format is not changed, so the image is still stored using an RGB pixel format. It is strongly recommended that if you want to create a monochrome image, the image should first be converted to monochrome using an appropriate image editing application to achieve the desired result, and then this monochrome image should be encoded using the appropriate Gray pixel format.
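The arithmetic behind the four chroma sub-sampling modes can be checked with a short Python sketch. It only counts how many values remain for each group of four pixels before any quantization or entropy coding, so it says nothing about final file size. The names CHROMA_PER_CHANNEL and values_per_four_pixels are hypothetical.

```python
# Chroma values kept per chroma channel for each group of four pixels,
# keyed by the -d option value: 3 = 4:4:4, 2 = 4:2:2, 1 = 4:2:0, 0 = 4:0:0.
CHROMA_PER_CHANNEL = {3: 4, 2: 2, 1: 1, 0: 0}


def values_per_four_pixels(d_option):
    """Count luminance plus chrominance values stored for four pixels."""
    luma = 4                                   # one luminance value per pixel
    chroma = 2 * CHROMA_PER_CHANNEL[d_option]  # two chroma channels
    return luma + chroma


if __name__ == "__main__":
    for d in (3, 2, 1, 0):
        kept = values_per_four_pixels(d)
        print(f"-d {d}: {kept} of 12 values kept")
```

The counts match the discussion above: 12 values for 4:4:4, 8 for 4:2:2, 6 for 4:2:0, and 4 for luminance only.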
-l Overlap Processing (0, 1, 2)
Windows Media Photo uses an advanced version of a macro-block based compression scheme. To achieve the best performance and minimize the amount of memory required to encode or decode an image, the overall image is subdivided into a set of 16x16 pixel macro blocks. Each macro block is further divided into sixteen 4x4 pixel blocks. All image encoding and decoding operations are performed on these blocks and macro blocks. As a result, for high quantization values (when we are discarding a higher amount of similar pixel values), the steps between blocks and macro blocks may become visible as artifacts in the compressed image. This is very common with JPEG (which also uses macro blocks) and significantly reduces the amount of compression that can be used without creating these visible artifacts.
Windows Media Photo addresses this problem through a combination of better quantization and an additional step of overlap processing. This overlap processing takes into account the values of pixels in neighboring blocks and macro blocks when choosing the quantization values that represent similar adjacent pixels. By doing so, the visible differences among adjacent blocks and macro blocks are dramatically reduced.
Two levels of optional overlap processing can be specified via the -l parameter. Single level overlap processing (-l 1) is performed at the 4x4 block level. For all pixels in the block, bordering pixels in adjacent blocks are also evaluated when choosing the quantization values for that block. Double level overlap processing (-l 2) also analyzes neighboring adjacent pixels when choosing quantization values at the 16x16 macro-block level.
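To get a feel for the block structure that overlap processing operates on, the Python sketch below counts the 16x16 macro blocks and 4x4 blocks needed to cover an image. It assumes that partial macro blocks at the right and bottom edges are counted as whole ones; how the codec actually handles image dimensions that are not multiples of 16 is not covered in this post. The function names are hypothetical.

```python
def macroblock_grid(width, height, mb_size=16):
    """Return (columns, rows) of macro blocks covering the image.

    Partial macro blocks at the edges are rounded up to whole ones,
    which is an assumption of this illustration.
    """
    cols = -(-width // mb_size)   # ceiling division
    rows = -(-height // mb_size)
    return cols, rows


def block_counts(width, height):
    """Return (macro block count, 4x4 block count) for the image."""
    cols, rows = macroblock_grid(width, height)
    macroblocks = cols * rows
    blocks = macroblocks * (16 // 4) ** 2  # sixteen 4x4 blocks per macro block
    return macroblocks, blocks


if __name__ == "__main__":
    print(macroblock_grid(1920, 1080))
    print(block_counts(1920, 1080))
```

Every edge between neighboring blocks is a potential seam at high quantization values, which is why smoothing across those edges has such a visible effect.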