
Today I am going to write a short development note on a GPE-based display driver for Windows CE, as we noticed that this material is not clearly documented online and has confused many people, including those who are writing display drivers for new platforms at MS. Hopefully this entry will save a few strands of hair for some people out there.

The topic is the screen resolutions and orientations supported by the display driver. It is really a simple thing, but there is a set of rules that driver writers must know to get it right. When the rules are not followed correctly, your device might still boot, but you may start noticing weird glitches here and there.

A GPE-based display driver has to override a function called GetModeInfo(), which is a pure virtual function defined by the GPE base class. What it does is very simple: it just returns the screen resolution (and some extra information such as bit depth) supported by the device at the given index. It is used in combination with NumModes(), which gives the number of modes that GetModeInfo() supports. What is important here is that the display driver framework requires GetModeInfo() to return the screen resolution relative to the current screen orientation. Not the native screen orientation. Not the boot-time screen orientation. It is always the current screen orientation.

The screen can take four possible orientations: 0 degrees, 90 degrees, 180 degrees, and 270 degrees. Unless you are dealing only with square displays, the width and height have to be swapped depending on the angle. For example, if your screen resolution is 800 x 600 at 0 degrees and that is what your GetModeInfo() reports, then it has to return 600 x 800 when the current orientation is 90 or 270 degrees. Many GWES components that deal with screen resolutions and orientations depend on this behavior, but this doesn't become a problem until you actually try to rotate the screen. So initially it may look like everything is working perfectly, but once you rotate the screen you start seeing issues.
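As a rough illustration of the rule, the width/height swap can be sketched as a small helper. ModeSize and ModeForOrientation are hypothetical names invented for this example and are not part of GPE; a real driver would apply the same logic inside its own GetModeInfo() override.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type for this sketch: a resolution in pixels.
struct ModeSize {
    int width;
    int height;
};

// Hypothetical helper, not part of GPE: given the panel's resolution at
// 0 degrees and the current rotation angle, return the width and height
// that should be reported for the current orientation.
ModeSize ModeForOrientation(ModeSize native, int angleDegrees)
{
    // Only the four orientations described above are valid.
    assert(angleDegrees == 0 || angleDegrees == 90 ||
           angleDegrees == 180 || angleDegrees == 270);

    ModeSize current = native;
    if (angleDegrees == 90 || angleDegrees == 270) {
        // Quarter turns exchange the roles of width and height.
        std::swap(current.width, current.height);
    }
    return current;
}
```

With an 800 x 600 panel, the helper yields 600 x 800 at 90 and 270 degrees and 800 x 600 at 0 and 180 degrees.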

As far as I can think of off the top of my head, most (if not all) APIs that deal with resolutions, such as ChangeDisplaySettingsEx() and SetDisplayMode(), expect resolution parameters to be relative to the current screen orientation. So the rule of thumb appears to be this: always think in terms of the current screen orientation.

Here's a quick blog about an issue that we just hit today; most will merely find it interesting, but I hope it saves someone somewhere a little time, effort, and confusion.

We recently got a new codec library drop which we integrated into our mainline code tree. The codec team spends a lot of time developing optimized ARM versions of Windows Media codecs, and every once in a while we get a new library that we need to integrate into our build system.

When we checked the library into our source tree and ran a Smartphone build, we got roughly this error from one of our build tools:

    wmvdmod.dll(0) : fatal error RM0024 : Input File has more than 16 sections

In the ensuing investigation we discovered two things we hadn't previously known:

1. Our codec team has been subdividing their C/C++/Assembly language routines into multiple sections to keep certain code paths together and improve cache/page hit rates. As a result, they had created about 14 extra sections with names like ".decodeX_Pass1" (names changed to protect the innocent ;-). In general, one can view this type of information for any lib or dll by running "dumpbin -headers" on it.

2. Windows CE has some limitations on the number of sections that a module can contain (due to design decisions in the kernel and ROM image filesystems). Ultimately this results in a limit of 16 sections for some scenarios, which is the case we hit in our build tools.

The simplest short-term solution to this problem was to use the merge linker directive to force the linker to merge the different sections in the library back into the .text section. To accomplish this, we added something like the following to the appropriate sources file. This solved the build error without the need to rebuild the library (at the expense of removing all the goodness of using multiple sections to control code placement).

LDEFINES=$(LDEFINES) \
 -merge:.decodeX_Pass1=.text \
 -merge:.decodeX_Pass2=.text \
 -merge:.decodeY_Pass1=.text \
 -merge:.decodeY_Pass2=.text \
...

Note: I'm told one can accomplish the same feature within c/cpp files using #pragma comment(linker,"-merge:.foo=.bar")

In the ensuing discussion of how to fix this in the correct way (e.g. removing the restriction on the number of sections, or using fewer sections in the codec lib), our compiler/linker guru came down firmly on the side that there's no reason to need more than 16 sections (or really more than four or five), and noted that this whole situation could have been easily avoided using the following techniques:

For performance, if you want page alignment, use __declspec(align).  If you need to control code layout, use the linker’s /ORDER switch with a file containing the symbol ordering you need.  Alternatively, use the linker’s automatic sorting of section suffixes, e.g. .text$FOO_A, .text$FOO_B, and .text$FOO_C are automatically merged with .text in alphabetical order.

 

We didn't previously know about the linker's ability to automatically sort and merge sections using the $ delimiter, and I suspect that most other people don't either. We'll now go back to the codec team and suggest that future drops just use the automatic sorting mechanism to ensure that code is grouped as needed while keeping all the code in the .text section. As a nice side benefit, grouping code into the same section reduces the amount of ROM required for the code. Each section must start on a 4k boundary, so on average each section will waste 2k of ROM. Note that section names are case sensitive, so .TEXT is not the same as .text.
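To put rough numbers on the ROM saving, here is a small sketch that compares the footprint of several separately aligned sections with the footprint of the same code merged into one section. The function names are made up for this example, and the model assumes that alignment padding between functions inside a section is negligible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each section starts on a 4k boundary, as described above.
const std::size_t kSectionAlignment = 4096;

// Round size up to the next multiple of alignment.
std::size_t AlignUp(std::size_t size, std::size_t alignment)
{
    assert(alignment != 0);
    return (size + alignment - 1) / alignment * alignment;
}

// Footprint when every section is padded out to its own 4k boundary.
std::size_t SeparateFootprint(const std::vector<std::size_t>& sectionSizes)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i < sectionSizes.size(); ++i) {
        total += AlignUp(sectionSizes[i], kSectionAlignment);
    }
    return total;
}

// Footprint when all the code is merged into a single section, so that
// only the end of the merged section is padded.
std::size_t MergedFootprint(const std::vector<std::size_t>& sectionSizes)
{
    std::size_t sum = 0;
    for (std::size_t i = 0; i < sectionSizes.size(); ++i) {
        sum += sectionSizes[i];
    }
    return AlignUp(sum, kSectionAlignment);
}
```

For three sections of 1000, 5000, and 300 bytes, the separate layout occupies 16384 bytes while the merged layout occupies 8192 bytes.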

 

Here are some other related details about sections which I've shamelessly stolen from some other developers here at MS:

Paging:

Code may be paged into a size-limited RAM buffer called a "page pool". The page pool helps limit the RAM impact of code by keeping resident only the code pages currently in use. Code that must always stay resident in RAM can be marked as non-pageable, but this will cause the full extent of that code section to be copied into RAM for as long as the module is loaded.

To limit the footprint of a module in the page pool, it’s best to group the functions and constant data that are in the working set together.  This will allow the working set of code to exist in the page pool in the smallest number of pages. You can group them together using custom section naming. If section names are unique they will each be page-aligned (4k), so unless they truly need unique attributes, it’s best to name them such that automatic section merging can take place. Automatic section merging happens on sections named using a “section_name$subsection_name” convention, such that they all merge into one section named “section_name”.

 

For readability, give the subsection a name related to the grouping reason, such as “initialization”, “debug”, or “core”.

 

Example

To group function1 and function3 together in a custom subsection, you can do the following.

 

#pragma code_seg(".text$initialization")  // Code that follows goes into named subsection

void function1(void)  {return;}

#pragma code_seg()                        // Code that follows goes into default .text section

void function2(void)  {return;}

#pragma code_seg(".text$initialization")  // Code that follows goes into named subsection

void function3(void)  {return;}

#pragma code_seg()                        // Code that follows goes into default .text section

 

Non-Pageable Sections

If you need only a small bit of the code to stay in RAM always for performance or reliability reasons (like time-critical driver code), you can make the module partially pageable by creating a completely new section with custom attributes.

 

The following pragma defines a section called "NonPageableCode" which is set to non-pageable.

 

#pragma comment(linker, "/SECTION:NonPageableCode,ER!P")

 

There is also a newer, more readable way of specifying the section properties, which has been available since CE5:

 

#pragma section("NonPageableCode", execute, read, nopage)

 

Now, in the source code, to make a section of code non-pageable, put the following line before the code:

 

#pragma code_seg("NonPageableCode")

 

Afterward, you may use the following line to force following code to be placed back in the default .text section:

 

#pragma code_seg()
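Putting the pieces together, a source file that keeps one routine non-pageable might look roughly like the following sketch. The function names are hypothetical, and the _MSC_VER guards are an addition for this example so that the file still compiles with toolchains that don't understand these pragmas.

```cpp
#include <cassert>

#ifdef _MSC_VER
// Declare the custom section and its attributes, then direct the code
// that follows into it.
#pragma section("NonPageableCode", execute, read, nopage)
#pragma code_seg("NonPageableCode")
#endif

// Hypothetical time-critical routine that should always stay in RAM.
int TimeCriticalHandler(int value)
{
    return value + 1;
}

#ifdef _MSC_VER
// Code that follows goes back into the default .text section.
#pragma code_seg()
#endif

// Hypothetical ordinary routine; it stays pageable in .text.
int ScaleSample(int sample, int gain)
{
    assert(gain >= 0);
    return sample * gain;
}
```

Running DUMPBIN /HEADERS on the resulting module should then list NonPageableCode as a section of its own alongside .text.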

 

Tools

DUMPBIN /HEADERS FOO.DLL (to see what sections exist in the module)

 

That's it for now.

 

I haven't seen this information consolidated online, so here it is:

 

A DLL Forwarder is used if you want to export an entry point from one DLL (or, more likely, for historical purposes you've already exported it from one dll), but you want to actually implement it in a different DLL.

 

For example, suppose you want to implement a function Foo() in a DLL I'll call impl.dll, but for whatever reason you need to export it from a DLL I'll call export.dll. Of course, one simple solution is to add code to export.dll to explicitly call into impl.dll. However, there are a couple of issues with this:

 

- There's an additional function call/return overhead.

- If the functions exported from the two DLLs have the same name, you might have some trouble convincing the linker to call the Foo() in impl.dll from the Foo() in export.dll.

 

A forwarder solves this problem by directly interacting with the loader to forward exports from one DLL to another DLL without actually adding anything to the code path.

 

Forwarders are implemented in the .def file of the DLL you're forwarding through, and have the following syntax:

 

EXPORTS

    <FuncName>=<ForwardedDll>.<ForwardedFuncName>|#<ForwardedOrdinal> @<FuncOrdinal> NONAME

 

Where: 

<FuncName> is the name of the function as exported (e.g. from export.dll).

@<FuncOrdinal> is the ordinal of the function as exported (e.g. from export.dll). Note the use of the '@' symbol. Optional.

 

<ForwardedDll> is the name of the DLL into which you're forwarding the call (e.g. into impl.dll). Optional; if not specified, the forwarded function is assumed to be in this dll.

<ForwardedFuncName> is the name of the function in the ForwardedDll.

#<ForwardedOrdinal> is the ordinal of the function in the ForwardedDll. Note the use of the '#' symbol.

 

NONAME is the keyword that causes the linker to throw away the name of the function you're exporting so that it can only be referenced by its ordinal. This saves some space in the DLL and forces all callers to use the ordinal to link or GetProcAddress on the function. Optional.

Note: You need to specify the forwarded function name or ordinal (not both). You'll get slightly better load perf and smaller code size by specifying the ordinal. The ordinal is also necessary if the function in the DLL you're forwarding to is specified as NONAME in its def file.

 

Using our example, to forward Foo() from export.dll to impl.dll, the export.def file would have a line that looks like this:

 

EXPORTS 

    Foo = impl.Foo

 

If you run "dumpbin /exports" against export.dll, you should see an entry showing the forward that looks something like this:

 

    ordinal  hint   RVA      name
    nnnn    mm                Foo (forwarded to Impl.Foo)

 

Tricky detail 1: When linking export.dll, the linker needs to figure out that the function you're exporting is a "C" style function, which it would normally do by looking at the function signature in the code implementation of the function. I've found that the easiest way to work around this is to implement a code stub that is linked into export.dll so it can get the right name in the export.lib file (e.g. decorated/undecorated). The actual code is thrown away at link time, so it doesn't contribute to the size of the Dll. For example, one would need to implement the following and link it into export.dll to make the linker happy.

 

#define FORWARD(fn) extern "C" void fn(){}

FORWARD(Foo)

 

Tricky detail 2: I've run into one issue with ROMIMAGE: if export.dll is in the modules section of the .bib file, but impl.dll is in the files section, ROMIMAGE will generate an error when it tries to resolve the import at makeimg time. This will likely be fixed in the future, but for now it's just something that needs to be avoided.

 

Tricky detail 3: Forwarders are used at load time to forward references between DLLs. The linker will not use forwarders at link time to resolve links within your DLL. Therefore, if the DLL you're forwarding from includes code with references to the function you're forwarding, the link phase of your DLL will fail with an unresolved external error.

 

Tricky example: If you want to export a function at ordinal 1000 (and not export it by name), and want to forward it to a DLL which exports it at ordinal 2000 (without a name), the syntax in your .def file is:

 

EXPORTS

    SHIM_ORD_1000=IMPL.@2000 @1000 NONAME

 

The SHIM_ORD_1000 is an arbitrary name; it's only there to satisfy the def file syntax rules. It doesn't really matter what you call it as long as it doesn't alias to another function exported in your .def. If you then run dumpbin /exports on the resulting dll, you'll see something like:

 

    1000 [NONAME] (forwarded to IMPL.@2000)

 

 

In our previous entry, we talked about how video is synchronized to audio. In this short entry, we will talk about time stamps, master clocks, how adjustments to the master clock are made and how to deal with live streams.

About Reference Clocks and Stream Time

All filters in a filter graph are synchronized to the same clock, the reference clock. The stream time is based on the reference time, but it is relative, and depends on which state the graph is in. For instance, stream time doesn't move when the graph is paused, and it goes back to 0 after a seek.

DirectShow provides a base class CBaseReferenceClock that implements the IReferenceClock interface. The base class clock object maintains two times internally:

  • internal private time
  • reference time

The internal private time is the actual time kept by the clock, and can be accessed through GetPrivateTime(). The internal private time can go backwards for brief periods of time. The reference time is based off the private time, and cannot go backwards.
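The relationship between the two times can be pictured with a toy model. This is not the CBaseReferenceClock source; ToyReferenceClock, SetPrivateTime, and RefTime are names made up for this sketch, and the only behavior it tries to capture is that the reference time follows the private time forward but never steps backwards.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Stand-in for REFERENCE_TIME (a 64-bit count of 100 ns units).
typedef std::int64_t RefTime;

class ToyReferenceClock {
public:
    ToyReferenceClock() : m_privateTime(0), m_lastReturned(0) {}

    // Hypothetical hook standing in for the device or system timer.
    void SetPrivateTime(RefTime t) { m_privateTime = t; }

    // The private time is returned as is, even if it moved backwards.
    RefTime GetPrivateTime() const { return m_privateTime; }

    // The reference time holds its last value whenever the private
    // time is behind it, so callers never see time run backwards.
    RefTime GetTime()
    {
        const RefTime previous = m_lastReturned;
        m_lastReturned = std::max(previous, m_privateTime);
        assert(m_lastReturned >= previous);
        return m_lastReturned;
    }

private:
    RefTime m_privateTime;
    RefTime m_lastReturned;
};
```

If the private time goes from 100 back to 90, GetPrivateTime() reports 90 while GetTime() keeps returning 100 until the private time catches up.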

Whenever a filter provides the reference clock, it will usually inherit from CBaseReferenceClock. It can either override the GetPrivateTime() function to directly return the time from the device (if available), or it can issue adjustments to the stream time through the SetTimeDelta() function. If it chooses the second method, it will need to monitor the difference between the system time and the time provided by the device.

The default reference clock in Windows CE is provided by our audio renderer. It uses the SetTimeDelta() method to issue adjustments to the stream time. To change the reference clock in a filter graph, query the filter graph for the IMediaFilter interface, then use SetSyncSource() to change the reference clock.

All filters can access both the reference time and the stream time. The base filter class CBaseFilter has an m_pClock member. The reference time can be read by calling m_pClock->GetTime(), and the stream time is available through the CBaseFilter member function StreamTime().

About Time Stamps & Stream Time

The samples being processed in the filter graph may or may not have a time stamp, which is the media sample start and finish time. The time stamps are used in conjunction with the stream time. If a sample has a time stamp that is greater than the current stream time, it means that the sample is early. If a sample has a time stamp that is smaller than the current stream time, it is late. In a playback scenario, usually a splitter is the one that attaches time stamps to the samples. Filters may use time stamps for different reasons. For instance, time stamps may be used for presentation purposes, or to control the amount of buffering. The video renderer will use time stamps to schedule the samples for presentation, and thus will end up throttling the video playback pipeline. When a sample arrives at the video renderer, there are several possibilities:

  • no timestamp - sample is scheduled immediately
  • in the future (timestamp > stream time) - video renderer needs to schedule the sample, and will usually call m_pClock->AdviseTime()
  • in the past (timestamp < stream time) - may render immediately, or not render at all.
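The three cases can be sketched as a small classification function. This is only a toy model of the decision, not the video renderer's code. ToySample, RenderAction, and ClassifySample are hypothetical names, and treating a time stamp exactly equal to the stream time as "render now" is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for REFERENCE_TIME (a 64-bit count of 100 ns units).
typedef std::int64_t RefTime;

// Hypothetical, simplified view of a media sample.
struct ToySample {
    bool hasTimeStamp;
    RefTime startTime;
    RefTime stopTime;
};

enum class RenderAction {
    RenderNow,         // no time stamp, or due exactly now
    ScheduleForLater,  // early: wait, e.g. via m_pClock->AdviseTime()
    LateRenderOrDrop   // late: render immediately or not at all
};

RenderAction ClassifySample(const ToySample& sample, RefTime streamTime)
{
    if (!sample.hasTimeStamp) {
        return RenderAction::RenderNow;
    }
    // A stamped sample should not finish before it starts.
    assert(sample.stopTime >= sample.startTime);

    if (sample.startTime > streamTime) {
        return RenderAction::ScheduleForLater;
    }
    if (sample.startTime == streamTime) {
        return RenderAction::RenderNow;  // assumption of this sketch
    }
    return RenderAction::LateRenderOrDrop;
}
```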

The Reference Clock & The Audio Renderer

Let's assume in this section that the default Windows CE audio renderer is the reference clock. The audio renderer uses time stamps and stream time in a different way. Being the reference clock implies that the stream time is controlled by this component, so it will not follow a behavior similar to the video renderer.

In this case, as soon as the audio renderer receives a sample, it is ready to send it out to the audio driver. If the sample is late, the renderer will drop it. Otherwise, it will send the sample immediately if the audio driver has buffer space available. It will never wait for the time to be right. For cases where the media sample times are not contiguous, that is, the end time of a sample is smaller than the start time of the next sample, the audio renderer will write silence to the driver, and will wait until the start time for the second sample has arrived. In the normal scenario, there is no gap between media samples, so the audio renderer will write samples as fast as it can.

When the default audio renderer has finished processing a sample, it will read the device clock and the system clock, and compute the difference between them. Unfortunately, there is no way to get to the device clock directly, so the audio renderer uses the amsndOutGetPosition(), which can be imprecise. The audio renderer will accumulate differences, and will use a low pass filter on these differences. Whenever the average difference has gone above a certain threshold, then it will issue an adjustment to the stream time, through the usage of the SetTimeDelta() function. As soon as it does that, all filters calling GetTime() will receive the adjusted time - so the stream time will not be continuous. Note that all other filters that used m_pClock->AdviseTime() to get notified when a certain stream time has arrived (such as the video renderer) will not have to know of the stream time "change" that happened because of an adjustment. They will be advised when the stream time reached the desired value.
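The accumulate, filter, and adjust cycle can be pictured with the following toy model. It is not the audio renderer's code. ToyDriftTracker and its parameters are hypothetical, the first-order filter is just one possible low pass filter, and resetting the average after an adjustment is an assumption of this sketch. The value returned by AddMeasurement() plays the role of the delta that would be passed to SetTimeDelta().

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Stand-in for REFERENCE_TIME (a 64-bit count of 100 ns units).
typedef std::int64_t RefTime;

class ToyDriftTracker {
public:
    // smoothing: weight given to each new measurement, in (0, 1].
    // threshold: average drift that triggers an adjustment.
    ToyDriftTracker(double smoothing, RefTime threshold)
        : m_smoothing(smoothing), m_threshold(threshold), m_average(0.0)
    {
        assert(smoothing > 0.0 && smoothing <= 1.0);
        assert(threshold > 0);
    }

    // Feed one (device clock, system clock) reading. Returns the
    // adjustment to issue, or 0 if the filtered drift is still small.
    RefTime AddMeasurement(RefTime deviceTime, RefTime systemTime)
    {
        const double difference =
            static_cast<double>(deviceTime - systemTime);

        // First-order low pass filter over the measured differences.
        m_average += m_smoothing * (difference - m_average);

        if (std::fabs(m_average) <= static_cast<double>(m_threshold)) {
            return 0;
        }
        const RefTime adjustment = static_cast<RefTime>(m_average);
        m_average = 0.0;  // the adjustment absorbs the accumulated drift
        return adjustment;
    }

private:
    double m_smoothing;
    RefTime m_threshold;
    double m_average;
};
```

Filtering before adjusting keeps a single imprecise position reading from moving the stream time on its own.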

Live Sources & Clock Slaving

If the default Windows CE audio renderer is not the reference clock, it will write samples as it receives them. There's no automatic slave mode in Windows CE, so the audio renderer will not wait for the time to be right before it sends the next sample.

For the case of live streams, there is one interface in our audio renderer that causes a speed up or slow down in the audio driver so that we try to match against the live source. The source filter will usually be using the IAudioRenderer->SetDriftRate() to control the audio matching speed. In this case, the audio renderer continues to be the master clock.

Another possibility when the audio renderer can't be the master clock is to simulate a slaving mode by inserting a filter in front of the audio renderer. This filter's responsibility would be to throttle the samples, so that they will be delivered just when it is almost time to send them to the audio driver. Of course, more complicated schemes are possible to try to match the rate of the incoming samples, but we will not go there here...

Foreword 

I've been working on a wavedev2 porting guide over the last few weeks and decided that it's better to post what I've got so far rather than wait until it's what I would consider finished. Expect future updates/additions as time allows, and feel free to ask for specific information in the comments.

Overview

 

This whitepaper gives an overview of porting the wavedev2 audio driver to new hardware. For additional background on the history and features of wavedev2, please refer to other articles on http://blogs.msdn.com/medmedia.

Different Versions

 

Like most software, each release of Windows CE includes new features and bug fixes. For this article I’ll be referring to the version of wavedev2 which shipped with Windows CE 6 (AKA Yamazaki) under public\COMMON\oak\drivers\wavedev\wavedev2\ensoniq. This version is backward compatible with previous versions and the porting process is comparable. I’ve included some notes on the differences between Windows CE 6 and previous implementations at the end.

File Layout 

All the files needed to build wavedev2 are in a single directory. For porting purposes, these files can be grouped into the following categories:

1.       Files which are device independent, and which you should not need to touch during the porting process beyond just copying them:

audiosys.h: Proprietary wave message definitions used by wavedev2.

devctxt.cpp: Implementation of device context class.

devctxt.h: Definition of device context class.

input.cpp: Implementation of audio input streams.

makefile: Used by build system

midinote.cpp: Implementation of tone generator.

midistrm.cpp: Implementation of MIDI stream and MIDI parser.

midistrm.h: Definition of MIDI note and stream classes.

mixerdrv.cpp: Implementation of Mixer API classes (* may need to change if you want to take advantage of mixer API extensions).

mixerdrv.h: Definition of MIXER API classes.

output.cpp: Implementation of audio output streams.

strmctxt.cpp: Implementation of base audio stream class.

strmctxt.h: Definition of base, input, output stream classes.

wavemain.cpp: Device driver interface.

wavemain.h: Common include header used by all source files.

wavepdd.h: Basic PCM sample definitions.

wfmtmidi.h: MIDI structure definitions.

2.       Files which are device dependent but are logically part of the wavedev2 driver infrastructure. These files will need to be copied and modified during the port. These files are:

 

hwctxt.cpp: Implementation of hardware context class.

hwctxt.h: Definition of hardware context class.

oemsettings.h: HW-specific definitions used by hw-independent code.

sources: Used by build system.

wavedev2_ensoniq.def: Driver exports. Probably just need to rename.

wavedev2_ensoniq.reg: Registry entries used to install driver.

3.       Files which are device dependent and are not logically part of the driver infrastructure. You can ignore these files in your port (unless they happen to be appropriate to your hardware). For the Ensoniq sample driver, these are:

 

AC97.H: AC97 codec-specific definitions.

Es1371.cpp: Ensoniq 1371-specific functions.

Es1371.h: Ensoniq 1371-specific header.

Hw_ac97.cpp: AC97 codec-specific functions.

 

In the above list, note that the vast majority of files shouldn’t need to be touched, and there is really only one source file and two headers you will need to modify to bring up a driver.

Class Descriptions

Class Overview

 

The wavedev2 driver largely consists of three main base classes:

HardwareContext

 

This class represents the actual audio hardware. This is the only class you will typically need to modify to port the driver. It is the only device dependent class, and takes care of hardware initialization, power management, DMA and Codec control, handling audio interrupts, and any other proprietary features the driver may implement. There is one instantiated HardwareContext object in the driver which is pointed to by the g_pHWContext global variable. I’ll go into detail about each HardwareContext’s methods later.

DeviceContext

 

This class represents a specific audio device. You should not have to modify anything in this class to port the driver. There is a DeviceContext virtual base class, from which are derived an InputDeviceContext and an OutputDeviceContext. A typical wavedev2 audio driver (such as the Ensoniq driver) implements a single input device represented by an InputDeviceContext and a single output device represented by an OutputDeviceContext. In the Ensoniq sample, these objects are directly embedded within the HardwareContext class as member variables.

DeviceContext methods include:

StreamContext

 

This class represents a specific audio stream. You should not have to modify anything in this class to port the driver. There is a StreamContext virtual base class from which are derived a variety of stream classes for various flavors of PCM audio and MIDI data. Each stream is associated with a specific device context. This association is implemented as a linked list of stream contexts hanging off of each device context. In addition, each stream context includes a pointer back to its associated device context.

The class hierarchy is roughly as follows:

StreamContext

        CMidiStream 

        WaveStreamContext

                InputStreamContext

                OutputStreamContext

                    OutputStreamContextM8

                    OutputStreamContextM16

                    OutputStreamContextS8

                    OutputStreamContextS16

The reason for the multitude of output contexts is that the mixing/sample-rate-conversion code on the output side is optimized for each type of PCM data (Stereo/Mono, 8/16-bit samples). This avoids some tests in the inner loop. The same optimization wasn’t done for the input side (input isn’t typically used as often as output, and the code is a little simpler).

StreamContext methods include:

Porting HardwareContext

 

In this section, I’ll go over each of the methods in HardwareContext and describe what they do.

HardwareContext::CreateHWContext

This is a static method which is called during driver initialization (from the WAV_Init code in wavemain.cpp). This function should create and initialize the global g_pHWContext with a new HardwareContext object and call g_pHWContext->Init. You probably won’t need to change this function, as most changes will be in the Init method.

HardwareContext::Init

This method is only called by CreateHWContext, and is where initialization of the hardware is typically implemented. Portions of this function may need to be modified for new hardware. Its role is to initialize any hardware, allocate DMA buffers, and start up the interrupt service thread. In addition, during initialization it needs to call into some of the device independent sections to initialize them; specifically:

 

-          Call SetBaseSampleRate on each device context to tell it what sample rate the hardware is running at. Note that these functions can be called at any time to tell the device context that the hardware sample rate has changed, but for devices with a fixed sample rate setting this up during initialization makes sense.

-          Call InitMixerControls to initialize the Mixer API support.

HardwareContext::Deinit

This method is called when the driver is unloaded and the system calls WAV_Deinit. In the current design, wave drivers are never unloaded so this method has limited usefulness.

HardwareContext::UpdateOutputGain

HardwareContext::UpdateInputGain

HardwareContext::SetOutputGain

HardwareContext::SetOutputMute

HardwareContext::GetOutputGain

HardwareContext::GetOutputMute

HardwareContext::GetInputMute

HardwareContext::SetInputMute

HardwareContext::GetInputGain

HardwareContext::SetInputGain

 

These methods are associated with the master input and output gain controls provided by the default mixer API implementation and the device gain waveOutSetVolume API. In the Ensoniq implementation these defer processing to the DeviceContext SetGain methods, which automatically handle volume control in software. There is no need to modify the existing code unless you want to handle some aspects of volume control in hardware. However, keep in mind that individual stream gain controls are still handled in software, and there is no additional overhead in handling device gain as well. Therefore, there is no performance advantage in modifying this code to use hardware gain controls.

HardwareContext::StartOutputDMA

This method starts the DMA controller for audio output. This includes:

1.       Check to see if output dma is already running and ignore the call if it is.

2.       Clear the variables that track how much “live” data is in each DMA buffer.

3.       “Prime” the output DMA buffer with data.

4.       Start the DMA channel if (and only if) data was available to be transferred.
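The four steps above can be sketched with a toy model that replaces the real hardware with counters. This is not the Ensoniq source. ToyHardwareContext and all of its members are hypothetical, and the 2048-byte page size assumes a 4k DMA buffer split into two equal pages.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Assumed page size: half of a 4k DMA buffer.
const std::size_t kDmaPageSize = 2048;

class ToyHardwareContext {
public:
    ToyHardwareContext()
        : m_outputDmaRunning(false), m_pendingBytes(0), m_channelStarts(0)
    {
        m_liveBytes[0] = m_liveBytes[1] = 0;
    }

    // Stand-in for an application queuing wave data for playback.
    void QueueApplicationData(std::size_t bytes) { m_pendingBytes += bytes; }

    void StartOutputDMA()
    {
        // 1. Ignore the call if output DMA is already running.
        if (m_outputDmaRunning) {
            return;
        }
        // 2. Clear the per-page "live" data counters.
        m_liveBytes[0] = m_liveBytes[1] = 0;

        // 3. Prime both pages of the output DMA buffer, in order.
        std::size_t transferred = PrimePage(0);
        transferred += PrimePage(1);

        // 4. Start the channel only if there was data to transfer.
        if (transferred > 0) {
            m_outputDmaRunning = true;
            StartDmaChannel();
        }
    }

    bool IsOutputDmaRunning() const { return m_outputDmaRunning; }
    int ChannelStarts() const { return m_channelStarts; }
    std::size_t LiveBytes(int page) const
    {
        assert(page == 0 || page == 1);
        return m_liveBytes[page];
    }

private:
    // Move up to one page of pending application data into a DMA page.
    std::size_t PrimePage(int page)
    {
        assert(page == 0 || page == 1);
        const std::size_t count = std::min(m_pendingBytes, kDmaPageSize);
        m_pendingBytes -= count;
        m_liveBytes[page] = count;
        return count;
    }

    // The one hardware-specific line in a real port goes here.
    void StartDmaChannel() { ++m_channelStarts; }

    bool m_outputDmaRunning;
    std::size_t m_pendingBytes;
    std::size_t m_liveBytes[2];
    int m_channelStarts;
};
```

Keeping the hardware access in one small function mirrors the point made here: in a port, only the line that turns on the DMA channel should need to change.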

The only line you should need to change is the one that specifically turns on the DMA channel, which in the Ensoniq implementation is:

m_CES1371.StartDMAChannel( ES1371_DAC0 );

HardwareContext::StopOutputDMA

This method stops the audio output DMA controller. The only line you should need to change is:

m_CES1371.StopDMAChannel( ES1371_DAC0 );

HardwareContext::StartInputDMA

HardwareContext::StopInputDMA

 

These methods are analogous to the methods described above for the output DMA. However, note that the code to start the input DMA doesn’t need to “prime” the buffer or keep track of how much application data is in the buffer, and is therefore somewhat simpler than the output case.

 

HardwareContext::GetDriverRegValue

HardwareContext::SetDriverRegValue

These methods relate to reading driver-specific registry keys. You should not need to change them.

 

HardwareContext::InitInterruptThread

This method initializes the audio driver’s IST thread and sets it to a realtime priority. If your driver has a single IST thread shared by both input and output you will not need to modify this code.

 

HardwareContext::PowerUp

HardwareContext::PowerDown

These methods are called by the system’s power management subsystem. In the Ensoniq driver they are stubbed out.

 

HardwareContext::TransferInputBuffer

HardwareContext::TransferOutputBuffer

These methods are called from the IST to transfer data into or out of the DMA buffers. The code determines the starting address and size of the DMA buffer and passes the information to the device context, which performs the actual transfer. You will not need to modify this code unless you change the organization or data structures representing the DMA buffers.

 

HardwareContext::InterruptThread

This method implements the Interrupt Service Thread which is shared by both input and output DMA. Its operation is basically:

1.       Wait for an input or output DMA done interrupt.

2.       Determine whether an input or output (or both) DMA interrupt occurred.

3.       If an output DMA interrupt occurred:

-          Transfer/mix application data into the DMA buffer that was just completed.

-          If there is no application data remaining in either DMA buffer, halt output DMA.

4.       If an input DMA interrupt occurred:

-          Transfer data out of the DMA buffer that was just completed into application buffers.

-          If we were unable to transfer any data (due to no application buffer being available), halt input DMA.

5.       Go back to step 1.
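The loop above can be sketched as a toy model in which a list of recorded events stands in for waiting on the interrupt. This is not the wavedev2 source. ToyInterruptThread, ToyInterrupt, and the 2048-byte page size are hypothetical, and the model assumes both output pages were already primed with data when output DMA started.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed page size: half of a 4k DMA buffer.
const std::size_t kIstPageSize = 2048;

// One wake-up of the IST and which DMA directions signalled it.
struct ToyInterrupt {
    bool outputDone;
    bool inputDone;
};

class ToyInterruptThread {
public:
    ToyInterruptThread(std::size_t appOutputBytes, std::size_t appInputSpace)
        : m_appOutputBytes(appOutputBytes), m_appInputSpace(appInputSpace),
          m_outputRunning(true), m_inputRunning(true), m_outputPage(0)
    {
        m_live[0] = m_live[1] = kIstPageSize;  // both pages primed
    }

    void Run(const std::vector<ToyInterrupt>& interrupts)
    {
        // Steps 1 and 5: each element is one trip around the loop.
        for (std::size_t i = 0; i < interrupts.size(); ++i) {
            // Steps 2 and 3: refill the output page that just completed.
            if (interrupts[i].outputDone && m_outputRunning) {
                assert(m_outputPage == 0 || m_outputPage == 1);
                m_live[m_outputPage] = Take(m_appOutputBytes);
                if (m_live[0] + m_live[1] == 0) {
                    m_outputRunning = false;  // halt output DMA
                }
                m_outputPage = 1 - m_outputPage;
            }
            // Steps 2 and 4: drain the input page that just completed.
            if (interrupts[i].inputDone && m_inputRunning) {
                if (Take(m_appInputSpace) == 0) {
                    m_inputRunning = false;   // halt input DMA
                }
            }
        }
    }

    bool OutputRunning() const { return m_outputRunning; }
    bool InputRunning() const { return m_inputRunning; }

private:
    // Consume up to one page from a byte budget; returns the amount.
    static std::size_t Take(std::size_t& budget)
    {
        const std::size_t count = std::min(budget, kIstPageSize);
        budget -= count;
        return count;
    }

    std::size_t m_appOutputBytes;
    std::size_t m_appInputSpace;
    bool m_outputRunning;
    bool m_inputRunning;
    int m_outputPage;
    std::size_t m_live[2];
};
```

Note that output DMA is halted only once both pages are empty, so the last page of queued data still gets played.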

 

HardwareContext::SetSpeakerEnable

HardwareContext::RecalcSpeakerEnable

HardwareContext::ForceSpeaker

These methods handle the WODM_FORCESPEAKER message, which may be used to request that audio data be routed to an auxiliary speaker on the back of the phone (this speaker is typically larger and more powerful than the earpiece speaker, and is used for ringtones). If your hardware supports this functionality, you will need to add code to SetSpeakerEnable to switch the speaker on or off.

 

HardwareContext::PmControlMessage

This method receives messages from the Power Manager IOCTL calls:

IOCTL_POWER_CAPABILITIES

IOCTL_POWER_QUERY

IOCTL_POWER_SET        

IOCTL_POWER_GET

 

You probably will not need to modify this code.

 

HardwareContext::IsSupportedOutputFormat

This method is called during waveOutOpen to allow the OEM to support additional custom audio formats beyond the standard PCM functionality. Normally this method should just return FALSE. The Ensoniq driver supports directly playing WMAPro compressed audio content over its S/PDIF interface and therefore returns TRUE for this specific case.

DMA Buffer Organization and Data Transfer

 

In the Ensoniq implementation, which is fairly typical, input and output are each allocated a DMA buffer using HalAllocateCommonBuffer. The size of each buffer is only 4k (the same as a memory page), so it’s unlikely that the allocation will fail (especially since it takes place during boot). Other implementations may choose to preallocate a fixed area of memory for the audio DMA buffers.

During audio transfer, each DMA buffer is logically subdivided into equally-sized DMA pages 0 and 1, and the hardware is programmed to:

a.       Transfer nonstop from the DMA buffer to the codec, and automatically reload the DMA address register with the start address of the buffer when it reaches the end.

b.      Generate an interrupt to the audio system whenever the DMA address moves either past the midpoint of the buffer (e.g. from page 0 to page 1), or reaches the end and restarts itself (e.g. from page 1 to page 0).

On each DMA interrupt, the HardwareContext code needs to determine which DMA page the DMA controller has just finished copying data into/out of and call the DeviceContext’s TransferBuffer method to copy application data into/out of that buffer.
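As a rough illustration of the page bookkeeping described above, here is a minimal sketch. It assumes the hardware exposes the current DMA offset within the buffer; many controllers report completion through status bits instead. The function names are hypothetical and do not come from the Ensoniq driver.

```cpp
#include <cstddef>

// Returns the index (0 or 1) of the DMA page the controller has just finished
// with, given where the controller currently is within the whole buffer.
int CompletedDmaPage(std::size_t currentOffset, std::size_t bufferSize)
{
    const std::size_t pageSize = bufferSize / 2;
    // If the controller is now working in page 1, page 0 was just completed,
    // and vice versa (the wrap to offset 0 means page 1 was just completed).
    return (currentOffset >= pageSize) ? 0 : 1;
}

// Byte offset of a DMA page from the start of the buffer.
std::size_t DmaPageOffset(int page, std::size_t bufferSize)
{
    return static_cast<std::size_t>(page) * (bufferSize / 2);
}
```

With the 4k buffer mentioned above, each page is 2048 bytes, so an interrupt taken with the controller at offset 2048 means page 0 is the one to refill or drain.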

Buffer Security/Copying

 

(Still working on this section) 

Support for S/PDIF

 

(Still working on this section) 

Differences between Windows CE 5 & Windows CE 6

 

All current versions of Windows Mobile (including Windows Mobile 6) are based on Windows CE 5 or earlier OS releases. However, the most recent version of wavedev2 is shipped with Windows CE 6, and that’s the version I’m examining for this guide. Therefore, it’s important to touch on the differences in how the two OSes interact with wave drivers.

For the most part, the audio driver architecture between CE5 and CE6 is the same. Audio drivers written for CE5 can generally run with little or no modification on CE6.

Virtual Addressing Differences

 

When porting the Ensoniq wavedev2 driver from CE5 to CE6, the only change specifically related to moving between operating systems was to surround the call to SetProcPermissions in hwctxt.cpp as follows:

#if (_WINCEOSVER < 600)

    SetProcPermissions((DWORD)-1);

#endif

 

On Windows CE6, the SetProcPermissions and GetProcPermissions APIs are no longer supported due to changes in the virtual memory architecture. They are still exported for backward-compatibility purposes, but they will have no effect (other than printing out a nasty warning message on the debugger). This change bears a little explanation:

On pre-Windows CE6 systems there is a limit of 32 processes, and all processes run in a shared virtual memory space. The system provides cross-process protection to ensure that processes don’t access each other’s memory, and this protection is enforced on a per-thread basis. Device drivers (such as the wave driver) run inside device.exe, which is one of these 32 processes. The Interrupt Service Thread (IST) in the driver is responsible for accessing audio data in various application buffers residing in multiple processes. The audio driver’s IST overrides this protection by calling SetProcPermissions(0xFFFFFFFF); each bit in the parameter represents one of 32 processes in the system, so 0xFFFFFFFF enables access to all of them.
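To make the bitmask concrete, here is a small sketch of how a 32-bit permission mask covers 32 process slots. It assumes bit N corresponds to process slot N, and the helper names are hypothetical. It also shows why the (DWORD)-1 in the code above and 0xFFFFFFFF are the same value.

```cpp
#include <cstdint>

constexpr unsigned kMaxProcesses = 32;  // pre-CE6 process limit

// Access bit for one process slot (assumption: bit N <-> slot N).
std::uint32_t AccessBitForSlot(unsigned slot)
{
    return (slot < kMaxProcesses) ? (std::uint32_t{1} << slot) : 0u;
}

// OR together the access bits of every slot.
std::uint32_t AccessMaskForAllSlots()
{
    std::uint32_t mask = 0;
    for (unsigned slot = 0; slot < kMaxProcesses; ++slot) {
        mask |= AccessBitForSlot(slot);
    }
    return mask;
}
```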

Windows CE 6 adopts a more traditional memory architecture, with the kernel taking the upper 2GB of virtual space and each user process occupying the same lower 2GB region. Switching between user processes involves swapping a new process into the lower 2GB. While this greatly expands the amount of virtual memory available to each user process, it also means that the IST (now running in the kernel) may no longer freely access the address spaces of arbitrary other processes.

To solve this problem, the waveapi middleware (which sits above the audio driver on the stack) now takes care of mapping each application data buffer into the kernel's address space (via CeAllocAsynchronousBuffer). The memory mapping/unmapping takes place during waveOutPrepareHeader/waveOutUnprepareHeader, so the cost of memory management doesn’t impact performance.

 

The video renderer is the last filter in the video pipe, and it is responsible for displaying the output of upstream filters. The video renderer is just a controller for the underlying display driver, and does not do any processing on the image samples themselves.

The video renderer operates in two distinct modes:

  • GDI
  • DirectDraw

When the graph is first connected, the video renderer always tries to connect using GDI, and for that, it will need a connection with an RGB media type that matches the display format of the primary monitor. Only when it goes into Paused mode will the video renderer try to allocate surfaces using DirectDraw. This dual mode of operation was envisioned so as to always have a fallback plan in case DirectDraw surfaces are not available in some circumstances.

Choosing an Accelerated Media Type

When the video renderer goes into Paused mode, it is time to allocate the DirectDraw surfaces. The video renderer will do so by enumerating all media types of the upstream filter, and then trying to allocate a surface matching that media type. For instance, let's assume the upstream filter is the WMV DMO. It currently supports the following output media types (the preferred media type is the first one):

  • YV12
  • NV12
  • YUY2
  • I420
  • IYUV
  • UYVY
  • YVYU
  • RGB565
  • RGB555
  • RGB32
  • RGB24
  • RGB8

The video renderer will try to allocate flipping overlay surfaces first, then non-flipping surfaces:

    • For each media type of the upstream filter, in the order dictated by the upstream filter
      • try to allocate a flipping surface of that media type
      • If it succeeds, call QueryAccept on the upstream filter's output pin
      • If it succeeds, use it
    • If the previous didn't succeed, try to allocate a primary flipping surface (if enabled)
      • If it succeeds, call QueryAccept on the upstream filter's output pin
      • If it succeeds, use it
    • If the previous didn't succeed, for each media type of the upstream filter, in the order dictated by the upstream filter
      • try to allocate a surface (not flipping) of that media type
      • If it succeeds, call QueryAccept on the upstream filter's output pin
      • If it succeeds, use it

In this way, if the upstream filter has optimized for certain YUV formats, it can control the choice of media type. In case the display driver can also provide a surface of that type, the accelerated media type is chosen. The whole process is driven by the upstream filter, with the display driver in a passive role.
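The three-pass search described above can be sketched as follows. This is only an illustration of the ordering, not the video renderer's implementation: ChooseSurface, SurfaceKind and the two callbacks (standing in for DirectDraw surface creation and for QueryAccept on the upstream output pin) are hypothetical, and media types are plain strings for brevity.

```cpp
#include <functional>
#include <string>
#include <vector>

enum class SurfaceKind { FlippingOverlay, PrimaryFlipping, NonFlipping, None };

struct SurfaceChoice {
    SurfaceKind kind;
    std::string mediaType;
};

using AllocateFn    = std::function<bool(SurfaceKind, const std::string&)>;
using QueryAcceptFn = std::function<bool(const std::string&)>;

SurfaceChoice ChooseSurface(const std::vector<std::string>& upstreamTypes,
                            const std::string& primaryType,
                            bool primaryFlippingEnabled,
                            const AllocateFn& tryAllocate,
                            const QueryAcceptFn& queryAccept)
{
    // Pass 1: flipping surfaces, in the order dictated by the upstream filter.
    for (const std::string& type : upstreamTypes) {
        if (tryAllocate(SurfaceKind::FlippingOverlay, type) && queryAccept(type)) {
            return {SurfaceKind::FlippingOverlay, type};
        }
    }
    // Pass 2: primary flipping surface, if enabled.
    if (primaryFlippingEnabled &&
        tryAllocate(SurfaceKind::PrimaryFlipping, primaryType) &&
        queryAccept(primaryType)) {
        return {SurfaceKind::PrimaryFlipping, primaryType};
    }
    // Pass 3: non-flipping surfaces, again in upstream order.
    for (const std::string& type : upstreamTypes) {
        if (tryAllocate(SurfaceKind::NonFlipping, type) && queryAccept(type)) {
            return {SurfaceKind::NonFlipping, type};
        }
    }
    return {SurfaceKind::None, ""};  // caller stays on the GDI path
}
```

Because the outer loops walk the upstream filter's list in order, the first media type the display driver can back with a surface wins, which is how the upstream filter's preference drives the result.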

Dynamic Format Changes from the Video Renderer

Of course, for an optimal pipe, we would like to always have overlay flipping surfaces available. Nevertheless, that may not be the case in some situations. For instance, depending on the display driver capabilities, the flipping overlay may only be available when the user is watching the video at its original size. When the user is stretching or shrinking, overlays might not be available. This is controlled by the DirectDraw hardware capabilities dwMinOverlayStretch and dwMaxOverlayStretch (see http://msdn2.microsoft.com/en-us/library/aa915204.aspx). So, if the display driver doesn't support overlay stretching, and the video renderer is currently using overlays, it will need to swap to GDI (and thus to RGB format), so that GDI will do the necessary scaling.

Note that every time the upstream filter requests a new buffer from the video renderer, the video renderer will try to return a DirectDraw buffer. If all the conditions to use the DirectDraw buffer are OK (clipping, stretching, video memory, etc.), then it will use it. Only if one of the conditions fails will it resort to using GDI.

Debugging Video Renderer Connection Problems

We have seen some common connection problems when initially bringing up new decoder filters and/or capture drivers:

  • Color space converter is inserted in the graph
  • Video renderer doesn't connect
  • YUV surfaces are not being used, just GDI

Analyzing the DirectShow logs 

The first step in this case is to turn on the debug zones for the DirectShow DLL, quartz.dll, and observe the connection and video renderer messages. 

Run your test scenario, and save the debug output. Look for the section that says "Filter Graph Dump", and verify which filters got inserted in the graph. Here's an example of a filter graph dump:

Filter graph dump
Filter 1a199a30 'Video Renderer' Iunknown 1a199a20
    Pin 1a199f10 Input (Input) connected to 1a0e1880
Filter 1a0e1200 'WMVideo & MPEG4 Decoder DMO' Iunknown 1a0e11f0
    Pin 1a0e16e0 in0 (Input) connected to 1a0e0600
    Pin 1a0e1880 out0 (PINDIR_OUTPUT) connected to 1a199f10
    Pin 1a0e1a00 ~out1 (PINDIR_OUTPUT) connected to 0
Filter 1a0ecc60 'ASF ICM Handler' Iunknown 1a0ecc50
    Pin 1a0ecd70 In (Input) connected to 1a0aa3a0
    Pin 1a0e0600 Out (PINDIR_OUTPUT) connected to 1a0e16e0
Filter 1a0ec240 'Audio Renderer' Iunknown 1a0ec230
wo: GetPin, 0
    Pin 1a0ec4e0 Audio Input pin (rendered) (Input) connected to 1a0eb880
Filter 1a0eb220 'WMAudio Decoder DMO' Iunknown 1a0eb210
    Pin 1a0eb660 in0 (Input) connected to 1a0ea800
    Pin 1a0eb880 out0 (PINDIR_OUTPUT) connected to 1a0ec4e0
Filter 1a0e9380 'ASF ACM Handler' Iunknown 1a0e9370
    Pin 1a0e9490 In (Input) connected to 1a0aa000
    Pin 1a0ea800 Out (PINDIR_OUTPUT) connected to 1a0eb660
Filter 1a0a2ae0 '\Hard Disk2\clips\wmv\0-1.asf' Iunknown 1a0a2ad0
    Pin 1a0aa000 Stream 1 (PINDIR_OUTPUT) connected to 1a0e9490
    Pin 1a0aa3a0 Stream 2 (PINDIR_OUTPUT) connected to 1a0ecd70
End of filter graph dump

After that, verify which media type the video renderer is using when trying accelerated mode (and if it succeeded). Search for "Allocating video resources":

Allocating video resources
Initialising DCI/DirectDraw
Searching for direct format
Entering ReleaseSurfaces
Entering HideOverlaySurface
Enumerated 32315659
Entering FindSurface
Entering GetMediaType
Not a RGB format
Entering CreateYUVFlipping
Entering CheckCreateOverlay
GWES Hook fails surface creation. IDirectDraw::CreateSurface fails.
No surface
Entering ReleaseSurfaces
Entering HideOverlaySurface
Enumerated 3231564e
Entering FindSurface
Entering GetMediaType
Not a RGB format
Entering CreateYUVFlipping
Entering CheckCreateOverlay
GWES Hook fails surface creation. IDirectDraw::CreateSurface fails.
No surface
Entering ReleaseSurfaces
Entering HideOverlaySurface
Enumerated 32595559
Entering FindSurface
Entering GetMediaType
Not a RGB format
Entering CreateYUVFlipping
Entering CheckCreateOverlay
Entering InitOverlaySurface
Entering InitDrawFormat
Entering InitDrawFormat
Entering GetDefaultColorKey
Returning default colour key
Entering InitDefaultColourKey
Entering SetSurfaceSize
Preparing source and destination rectangles
Entering ClipPrepare
Entering InitialiseClipper
Entering InitialiseColourKey
overlay color key on
Colour key
No palette
Found AMDDS_YUVFLP surface
Proposing output type  M type MEDIATYPE_Video  S type MEDIASUBTYPE_YUY2

Note in the above log that the video renderer tried to create surfaces in the order specified by the WMV DMO. For the display driver in use for the above log, it managed to create a YUY2 surface, the third option for the WMV decoder. The last section in this blog entry has more information about FourCC codes.

Here are some solutions for common connection problems we have faced in the past.

Color Space Converter is inserted in the graph 

The number one problem is that the upstream filter doesn't report any RGB format, just YUV formats. If that's the case, the video renderer can't connect directly to the filter since it requires a matching RGB format. Usually, the color space converter will be inserted in the graph in these cases. We don't want this to happen, as it will imply a memory copy of each frame buffer, so we want to make sure the upstream filter does provide RGB formats.

Sometimes the color converter gets inserted in the graph even though the upstream filter does support the needed RGB format. This can happen because the upstream filter requires an alignment other than 1 when the allocator is being decided. Currently, the video renderer only accepts 1-byte alignments.

Another common reason for the color converter to be inserted in the graph is that the bitmasks are missing or incorrect at the end of the BITMAPINFOHEADER that the upstream filter supplies with its output media types. Please make sure that the bitmasks are inserted correctly. For instance, for RGB565, we should have:

    *pdwBitfield++  = 0xF800;       // Red - 5
    *pdwBitfield++  = 0x07E0;       // Green - 6
    *pdwBitfield    = 0x001F;       // Blue - 5
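To see why these three masks describe RGB565, the sketch below packs an 8-bit-per-channel color into a 16-bit pixel using the same mask values. The function and constant names are hypothetical and only illustrate the bit layout.

```cpp
#include <cstdint>

// The same masks the upstream filter reports for RGB565.
constexpr std::uint32_t kRedMask565   = 0xF800;  // 5 bits, bits 15..11
constexpr std::uint32_t kGreenMask565 = 0x07E0;  // 6 bits, bits 10..5
constexpr std::uint32_t kBlueMask565  = 0x001F;  // 5 bits, bits 4..0

// Pack 8-bit R, G, B into one RGB565 pixel by dropping the low bits of each channel.
std::uint16_t PackRgb565(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    const std::uint32_t red   = (static_cast<std::uint32_t>(r >> 3) << 11) & kRedMask565;
    const std::uint32_t green = (static_cast<std::uint32_t>(g >> 2) << 5) & kGreenMask565;
    const std::uint32_t blue  = static_cast<std::uint32_t>(b >> 3) & kBlueMask565;
    return static_cast<std::uint16_t>(red | green | blue);
}
```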

Graph doesn't connect at all

If the upstream filter supports only a subset of YUV formats, and none of these are recognized by the color space converter, then it won't be possible to connect the video renderer at all. Again, in this case the solution is for the upstream filter to provide RGB formats.

YUV Surfaces are not used, just GDI

Another common occurrence is for the upstream filters to provide allocators. If this is the case, the video renderer will be tied to not using DirectDraw (as it can't pass upstream memory buffers to DirectDraw). If we want the optimal overlay flipping path, the video renderer *needs* to be the allocator, so that it is possible for it to provide DirectDraw surfaces upstream.

Surface Types: Controlling Which Surfaces the Video Renderer Creates

The set of accelerated surfaces the video renderer is allowed to create can be restricted, which is useful when debugging the connection process, especially to reduce the number of options and the number of attempts made against the display driver. This is controlled via a registry key (see http://msdn2.microsoft.com/en-us/library/aa930626.aspx):

HKEY_LOCAL_MACHINE\Software\Microsoft\DirectX\DirectShow\Video Renderer\SurfaceTypes

The following table shows the AMDDS values for use with the SurfaceTypes named value.

Flag            Hexadecimal value   Description

AMDDS_NONE      0x00                No support for Device Control Interface (DCI) or DirectDraw.
AMDDS_DCIPS     0x01                Use DCI primary surface.
AMDDS_PS        0x02                Use DirectDraw primary surface.
AMDDS_RGBOVR    0x04                RGB overlay surfaces.
AMDDS_YUVOVR    0x08                YUV overlay surfaces.
AMDDS_RGBOFF    0x10                RGB off-screen surfaces.
AMDDS_YUVOFF    0x20                YUV off-screen surfaces.
AMDDS_RGBFLP    0x40                RGB flipping surfaces.
AMDDS_YUVFLP    0x80                YUV flipping surfaces.
AMDDS_ALL       0xFF                Use all available surfaces.
AMDDS_DEFAULT   0xFF                Use all available surfaces.
AMDDS_YUV       0xA8                (AMDDS_YUVOFF | AMDDS_YUVOVR | AMDDS_YUVFLP)
AMDDS_RGB       0x58                (AMDDS_RGBOFF | AMDDS_RGBOVR | AMDDS_RGBFLP)
AMDDS_PRIMARY   0x03                (AMDDS_DCIPS | AMDDS_PS)

If you just want to enable YUV overlay flipping surfaces for debugging purposes, you should set the SurfaceTypes registry value to AMDDS_YUVFLP (0x80). Remember to turn all surfaces back on after you have finished debugging your problem.

About FourCC codes:

Note that in the example log we list the FOURCC codes that are being used in the line "Enumerated 32315659". Here's how to map this hex number into a character sequence that will help identify the code:

 Enumerated 32315659

0x32 = '2', 0x31 = '1', 0x56 = 'V', 0x59 = 'Y' ===> 0x32315659 = YV12

 Enumerated 3231564e

0x32 = '2', 0x31 = '1', 0x56 = 'V', 0x4e = 'N' ===> 0x3231564e = NV12

 Enumerated 32595559

0x32 = '2', 0x59 = 'Y', 0x55 = 'U', 0x59 = 'Y' ===> 0x32595559 = YUY2

 Enumerated 56555949

0x56 = 'V', 0x55 = 'U', 0x59 = 'Y', 0x49 = 'I' ===> 0x56555949 = IYUV

 Enumerated 59565955

0x59 = 'Y', 0x56 = 'V', 0x59 = 'Y', 0x55 = 'U' ===> 0x59565955 = UYVY

 Enumerated 55595659

0x55 = 'U', 0x59 = 'Y', 0x56 = 'V', 0x59 = 'Y' ===> 0x55595659 = YVYU
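The same mapping can be done in code. In this small sketch (FourCCToString is a hypothetical helper name), the least significant byte of the logged number is the first character of the code, which is why the hex digits read backwards relative to the text.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Convert a FOURCC value, as printed in the "Enumerated ..." log lines,
// into its four-character form. The lowest byte is the first character.
std::string FourCCToString(std::uint32_t code)
{
    std::string text(4, ' ');
    for (std::size_t i = 0; i < 4; ++i) {
        text[i] = static_cast<char>((code >> (8 * i)) & 0xFF);
    }
    return text;
}
```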

Also, the file public\directx\sdk\inc\uuids.h contains several FOURCC media subtype definitions:

    // 32595559-0000-0010-8000-00AA00389B71 'YUY2' == MEDIASUBTYPE_YUY2
    OUR_GUID_ENTRY(MEDIASUBTYPE_YUY2,
    0x32595559, 0x0000, 0x0010, 0x80, 0x00, 0x00, 0xaa, 0x00, 0x38, 0x9b, 0x71)
    // 55595659-0000-0010-8000-00AA00389B71 'YVYU' == MEDIASUBTYPE_YVYU
    OUR_GUID_ENTRY(MEDIASUBTYPE_YVYU,
    0x55595659, 0x0000, 0x0010, 0x80, 0x00, 0x00, 0xaa, 0x00, 0x38, 0x9b, 0x71)
    // 59565955-0000-0010-8000-00AA00389B71 'UYVY' == MEDIASUBTYPE_UYVY
    OUR_GUID_ENTRY(MEDIASUBTYPE_UYVY,
    0x59565955, 0x0000, 0x0010, 0x80, 0x00, 0x00, 0xaa, 0x00, 0x38, 0x9b, 0x71)
    // 31313259-0000-0010-8000-00AA00389B71 'Y211' == MEDIASUBTYPE_Y211
    OUR_GUID_ENTRY(MEDIASUBTYPE_Y211,
    0x31313259, 0x0000, 0x0010, 0x80, 0x00, 0x00, 0xaa, 0x00, 0x38, 0x9b, 0x71)
    // 32315659-0000-0010-8000-00AA00389B71 'YV12' == MEDIASUBTYPE_YV12
    OUR_GUID_ENTRY(MEDIASUBTYPE_YV12,
    0x32315659, 0x0000, 0x0010, 0x80, 0x00, 0x00, 0xaa, 0x00, 0x38, 0x9b, 0x71)
    // 36315659-0000-0010-8000-00AA00389B71 'YV16' == MEDIASUBTYPE_YV16
    OUR_GUID_ENTRY(MEDIASUBTYPE_YV16,
    0x36315659, 0x0000, 0x0010, 0x80, 0x00, 0x00, 0xaa, 0x00, 0x38, 0x9b, 0x71)
    // 56595549-0000-0010-8000-00AA00389B71 'IUYV' == MEDIASUBTYPE_IUYV
    OUR_GUID_ENTRY(MEDIASUBTYPE_IUYV,
    0x56595549, 0x0000, 0x0010, 0x80, 0x00, 0x00, 0xaa, 0x00, 0x38, 0x9b, 0x71)
    // 3231564E-0000-0010-8000-00AA00389B71 'NV12' == MEDIASUBTYPE_NV12
    OUR_GUID_ENTRY(MEDIASUBTYPE_NV12,
    0x3231564E, 0x0000, 0x0010, 0x80, 0x00, 0x00, 0xaa, 0x00, 0x38, 0x9b, 0x71)
    // 30323449-0000-0010-8000-00AA00389B71 'I420' == MEDIASUBTYPE_I420
    OUR_GUID_ENTRY(MEDIASUBTYPE_I420,
    0x30323449, 0x0000, 0x0010, 0x80, 0x00, 0x00, 0xaa, 0x00, 0x38, 0x9b, 0x71)
    // 56555949-0000-0010-8000-00AA00389B71 'IYUV' == MEDIASUBTYPE_IYUV
    OUR_GUID_ENTRY(MEDIASUBTYPE_IYUV,
    0x56555949, 0x0000, 0x0010, 0x80, 0x00, 0x00, 0xaa, 0x00, 0x38, 0x9b, 0x71)

Please do leave feedback, and let us know if this has been useful. Thanks,

Lucia

All filters in a DirectShow graph should be synchronized to the same clock, the reference clock. The filter graph manager picks one component to be the reference clock, in the following simplified order: a user-specified clock, a renderer (usually the audio renderer), or the system clock if neither of those is available.

The stream time is based off the reference clock, but relative to the time the graph last started running (so the stream time doesn't move if the graph is paused). If a media sample that enters a renderer has a time stamp t, then it means it should be rendered at stream time t. This is the basic mechanism by which a/v synchronization occurs.

There is usually a crystal in the audio hardware though, and no guarantees that the hardware timer will match the system clock. That's why usually we have the audio renderer being the reference clock for the whole DirectShow graph. If the audio renderer receives a sample late, or if the audio clock is consistently drifting from the system clock, then the audio renderer will issue stream time adjustments.

An audio renderer implementation will usually inherit from the CBaseReferenceClock class, and will call the SetTimeDelta() function whenever it needs to adjust the stream time. Note that it should apply a low-pass filter before sending adjustments to the master clock so that no unnecessary jitter is introduced.

The video renderer schedules samples for presentation using their incoming timestamps, its scheduler is based on stream time, and the audio renderer can adjust the stream time. As a result, the video and audio renderers use the same timeline.
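As an illustration of the low-pass filtering mentioned above, here is a minimal sketch of a first-order (exponential) smoothing filter for the measured clock offset. It is not taken from any shipping audio renderer: the DriftFilter class and the smoothing factor are hypothetical, and a real renderer would pass the smoothed value (or a delta derived from it) to SetTimeDelta() only when it exceeds some threshold.

```cpp
// First-order low-pass filter over measured audio-clock offsets.
// Offsets are in arbitrary time units (e.g. 100-nanosecond units).
class DriftFilter {
public:
    // alpha in (0, 1]: smaller values smooth more and react more slowly.
    explicit DriftFilter(double alpha) : alpha_(alpha) {}

    // Feed one measured offset, get back the smoothed offset.
    double Update(double measuredOffset)
    {
        if (!primed_) {
            smoothed_ = measuredOffset;  // first sample seeds the filter
            primed_ = true;
        } else {
            smoothed_ += alpha_ * (measuredOffset - smoothed_);
        }
        return smoothed_;
    }

private:
    double alpha_;
    double smoothed_ = 0.0;
    bool primed_ = false;
};
```

With alpha set to 0.25, a single outlier of 100 units after a steady offset of 0 only moves the smoothed value to 25, so one late sample does not yank the stream time around.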

About the Video Renderer & Frame Dropping

If the video is running slow, and all video frames are being rendered, then theoretically the video renderer will receive samples with timestamps in the past and schedule them for immediate rendering. If this situation persists, the video will fall behind the audio. This shows the need for frame dropping.

In fact, audio and video synchronization in DirectShow works by a combination of two elements:

  • Audio renderer controlling the DirectShow stream time;
  • IQualityControl and IDMOQualityControl interfaces guiding frame dropping algorithm

Dropping frames at the video renderer level is of course not very effective. If using overlay flipping surfaces, for instance, dropping a frame doesn't get you much farther trying to catch up (because the flipping itself is very cheap). Even in the case of Blits, it is still going to help very little (rendering time is small compared to decoding time). That's why there is the need to indicate the state and lateness of the renderer to upstream filters/DMOs, which is done through the quality notification messages.

The video renderer originates the notification messages (since it is the filter that needs to run in real time), and sends them upstream. If the upstream filter is a decoder and it can handle the message, it doesn't pass it further upstream. If it can't handle it, it passes the message upstream. Note that the video renderer will drop frames anyway if it is very late.

Here's a coarse example of how to use the Quality interface to be able to drop frames in a decoder filter:

HRESULT CDecoderFilterPin::Notify(IBaseFilter *pSender, Quality q)
{
    if (quality sink has been set)           // m_pQSink
    {
        status = Pass Notify on the quality sink   (base sender is the decoder filter now)
    }
    else
    {
        if (has frame dropping algorithm)
        {
            status = Call decoder filter to do frame dropping
        }
        else
        {
            if (upstreamQualityControl)
            {
                status = Pass Notify() on to upstream quality control interface (base sender is the decoder filter now)
            }
            else
            {
                status = not handled;
            }
        }
    }
    return status
}

 

The decoder needs to decide, given the current time (provided through the IQualityControl/IDMOQualityControl interface), whether or not it needs to drop frames. Algorithms for frame dropping can vary from the extremely simple to the very complicated. An example of an extremely simple one follows:

 

The quality notification message will indicate how late we were when we last rendered a video frame. A very simple algorithm would be to drop all frames until you arrive at that "time", so that catch-up happens fast:

 

CatchupPTS = q.Late + q.TimeStamp

Drop all frames until PTS_frame >= CatchupPTS
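The rule above can be sketched in a few lines. This is a simplified illustration, not decoder code from any product: ReferenceTime stands in for REFERENCE_TIME, QualityMessage mirrors only the Late and TimeStamp fields of the DirectShow Quality structure, and CatchUpDropper is a hypothetical helper.

```cpp
#include <cstdint>
#include <limits>

using ReferenceTime = std::int64_t;  // stand-in for REFERENCE_TIME (100 ns units)

// Only the two fields of the quality message that this algorithm needs.
struct QualityMessage {
    ReferenceTime Late;       // how late the renderer was on its last frame
    ReferenceTime TimeStamp;  // timestamp of the frame that was late
};

class CatchUpDropper {
public:
    // Called when a quality notification arrives from the renderer.
    void OnQuality(const QualityMessage& q)
    {
        catchUpPts_ = q.Late + q.TimeStamp;
    }

    // Called per frame before decoding: true means skip this frame.
    bool ShouldDrop(ReferenceTime framePts) const
    {
        return framePts < catchUpPts_;
    }

private:
    // Until the first notification arrives, nothing is dropped.
    ReferenceTime catchUpPts_ = std::numeric_limits<ReferenceTime>::min();
};
```

A negative Late value (the renderer was early) puts the catch-up point before the reported timestamp, so later frames are not dropped.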

 

Of course, there are many variations for this. If B-frames are available, start trying to catch up by dropping B-frames. If the decoder has decoding decoupled from output generation (or color conversion etc.), a first step when you're late is to start dropping the output generation, and try to catch up. If neither is working, then you may have to drop P-frames, in which case you'll have to wait until the next I-frame, which is never a good user experience because the spacing between I-frames may be large.

 

When doing trick modes, the algorithm for dropping frames will usually be more forgiving, since frames are arriving at rates other than 1.0. In fact, at high rates, all samples that the decoder receives are probably going to be key frames anyway. In this case, you may drop any frame without a significant penalty, making catch-up much easier.

 

This is just a quick note describing the behavior of the MM_WOM_FORCESPEAKER API built into the wavedev2 wave driver.

 

One issue in Smartphone devices is determining where to route ringtones and other high-priority notifications. This is driven by two scenarios:

 

1. A fairly common design feature of Smartphone devices is a separate "high-volume" speaker on the rear of the device to play ringtones or fairly loud notifications. The OEM may want most system sounds to play through the normal handset speaker, but to have incoming call ringtones routed through the rear speaker.

 

2. When the user has a headset plugged in, all audio is routed to that headset. However, what about the situation where the headset is plugged in but the user isn't wearing it? For this reason, some OEMs may choose to play ringtones through the device speaker even if a headset is attached.

 

The common thread to each of these scenarios is that the audio device driver needs to make some decision about where to route the audio based on whether the sound being played is a ringtone (or other high-priority notification), or is just normal audio such as music being played through media player.

 

The wave API doesn't include any built-in provision for the wave driver to determine who is trying to play a sound or where it should be routed. One might use the mixer API to address this, but the mixer API is somewhat complicated and doesn't lend itself to handling this on a per-wave-stream basis. Instead, when we first developed the wavedev2 driver back in 2000, we defined a proprietary wave message, MM_WOM_FORCESPEAKER, which is sent to the driver as a hint that the associated wave stream is a ringtone or notification that should possibly be played over the device's speaker.

 

The format of this call is: 

 

                waveOutMessage(<DeviceID | hWaveOut >, MM_WOM_FORCESPEAKER, bSpeaker, 0);

 

The first parameter can be either a Device ID (e.g. 0 through N-1 if there are N devices in the system), or the handle of an open wave device. 

The third param tells the driver whether to route sound to the speaker (TRUE) or not (FALSE).

 

If param1 is a wave handle (and bSpeaker is TRUE), the sound will be routed to the speaker until you either call the API again with the same handle and bSpeaker==FALSE, or you close the wave handle (or your app exits). If an application calls this API with bSpeaker==FALSE, audio should be routed to the earpiece rather than the speaker.

 

If param1 is a device ID and bSpeaker==TRUE, all audio is routed to the speaker until there's a matching call to the API with that device ID and bSpeaker==FALSE. This behavior will persist even if the application that made the original call exits. We generally never use this form of the API.

 

Other design details:

 

-  This API is more of a hint than a command to the driver. It’s up to the OEM to decide how to interpret this message; we don’t place any requirements on how it is interpreted. Some drivers might just ignore it.

 

- The sample driver does a pretty good job of refcounting which streams have enabled the speaker, so if you turn the speaker on twice for a specific stream it won't really do any harm (we'll ignore the second call).

 

- If multiple streams are playing and only one of them has turned the speaker on, all of the streams will be routed out the speaker until the stream that turned it on either goes away or turns it back off. As an example, if you're listening to music over WMP and a ringtone gets played, while the incoming ring is being played you may hear both the incoming ring and the music being played through the speaker. If this is objectionable, OEMs may choose to mute (in software) these other streams for the duration that the ringtone is being played over the speaker.

 

- Most of this design is geared toward hardware implementations where a single piece of audio hardware is multiplexed between one or more speakers and headset jacks (which was the only thing available when the API was designed).

 

- The EventSound subsystem (which manages playback of ringtones) keeps track of which tones should be played with ForceSpeaker turned on via a set of reg keys. OEMs can therefore customize which notifications should play over the speaker.

 

- Don't be surprised if this goes away or gets redesigned in some future release; it's really a Smartphone/PPC proprietary feature, and after 7 years it's getting a little long in the tooth.
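The per-stream bookkeeping described in the design details above can be sketched as follows. This is not the sample driver's code: SpeakerRouter and its methods are hypothetical, and streams are identified by plain integers. It shows that a repeated enable from the same stream is harmless, and that the speaker stays on while any stream still requests it.

```cpp
#include <set>

// Tracks which wave streams have asked for the speaker.
class SpeakerRouter {
public:
    // Handle a force-speaker request for one stream.
    // Returns true if the speaker should be on afterwards.
    bool ForceSpeaker(int streamId, bool enable)
    {
        if (enable) {
            streams_.insert(streamId);  // a second insert of the same id is a no-op
        } else {
            streams_.erase(streamId);
        }
        return SpeakerOn();
    }

    // A closed stream gives up its request automatically.
    bool CloseStream(int streamId)
    {
        streams_.erase(streamId);
        return SpeakerOn();
    }

    // The speaker is on while at least one stream still wants it.
    bool SpeakerOn() const { return !streams_.empty(); }

private:
    std::set<int> streams_;
};
```

Using a set rather than a plain counter is one way to get the "ignore the second call" behavior: one disable from a stream undoes any number of enables from that same stream.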

 

Feel free to leave feedback (does anyone read this stuff?).

 

-Andy

 

Responses to questions:

1. Where is MM_WOM_FORCESPEAKER defined?

- It should be in audiosys.h, but that might have moved around a bit. The definitions you're looking for are:

    #define MM_WOM_SETSECONDARYGAINCLASS   (WM_USER)

    #define MM_WOM_SETSECONDARYGAINLIMIT   (WM_USER+1)

    #define MM_WOM_FORCESPEAKER            (WM_USER+2)

Keep in mind that these are very likely to change or go away in a future release of the operating system.

 

 

Most of the infrastructure is in place to support multichannel audio in Windows CE, although the number of components that we ship to actually implement it is limited. In this blog I'll cover the varying types of multichannel audio and what features are in place in Windows CE to support it.

For the purposes of this blog I'll define multichannel audio to mean any audio stream containing more than two-channel stereo. We'll further subdivide multichannel audio into three types, differentiated by how the audio data gets from your CE device (e.g. Set Top Box, Smartphone/PPC, whatever) to your receiver:

1. Analog Matrix Decoders: In this type of decoding, multichannel audio is sent as left/right stereo data to the receiver. In a simple stereo receiver the audio can still be played through left and right speakers and will sound more-or-less correct. However, a receiver supporting the appropriate decoder can use cues placed in the audio to decode to more than two speakers and synthesize additional channels. The most well known decoders are Dolby Pro Logic and Pro Logic II.

2. Compressed audio over S/PDIF: S/PDIF was originally designed to support a maximum of 4 decompressed PCM channels. While one might use this to pass four discrete audio signals to a receiver, today's multichannel content typically has at least six channels (e.g. 5.1). There's no way to squeeze 6 decompressed audio channels across S/PDIF. The solution to this has been to pass the compressed audio over S/PDIF and let the receiver decompress it (as long as the compressed data bandwidth is less than that which would be required for four decompressed channels). Apart from enabling use of a single cable from the device to the receiver, this has the added benefit of offloading the audio decompression processing into the receiver. The downside of this architecture is that it relies on the receiver to be able to correctly decode the audio data. This is complicated because S/PDIF is a one-way transmission mechanism, so there's no way to query the receiver at runtime to determine what it supports. There are potential timing/latency issues with lip synch.

Transferring compressed audio over S/PDIF typically involves massaging the compressed audio into a format that matches the S/PDIF frame format (e.g. by padding the data with zeros as needed) and adding some header information that lets the receiver figure out that you're sending a compressed audio stream rather than PCM. Both WMAPro and Dolby AC3 have a spec for this. Almost every receiver in the world supports AC3 decoding. A small but growing number also support WMAPro (Pioneer in particular has spread WMAPro support to even the low end of its product line).

Info on WMAPro is here:
http://download.microsoft.com/download/5/b/5/5b5bec17-ea71-4653-9539-204a672f11cf/wmadrv.doc

Info on AC3-over-S/PDIF is here in Appendix B:
http://www.dolby.com/assets/pdf/tech_library/46_DDEncodingGuidelines.pdf

3. Multiple discrete audio outputs: In this type of connection multiple PCM audio channels are sent to the receiver. Until recently this has meant a separate RCA cable for each channel: a six channel (e.g. 5.1) signal would require six cables between components. HDMI has the potential to overcome this limitation by supporting 6 or more decompressed audio channels (and video) via a single cable.

Outputting to 6 DAC channels presumes that you've already got decompressed multichannel content or you've got a compressed multichannel content that is going to get decompressed before being sent to the wave driver (e.g. AC3 or WMAPro). The latter case is most likely, which means you'll need the appropriate DirectShow decompression filter for CE.

Now, on to what CE supports (and doesn't): 

Device Drivers 

If you want to support either S/PDIF or multiple discrete audio channels, you'll probably want to start with the Ensoniq wavedev2 sample driver that shipped in the Windows CE 5.0 Networked Media Device Feature Pack under public\fp_nmd\common\oak\drivers\wavedev\wavedev2\ensoniq. (Note: everything in this feature pack was rolled forward to CE6, so there's nothing in the feature pack that isn't available in CE6, although it might be in a different place.)

This version of the Ensoniq driver has S/PDIF support built into it (the Ensoniq 1371 chip has a sort-of-undocumented S/PDIF mode which we take advantage of), and supports passing WMAPro-over-S/PDIF compressed data. Support for AC3-over-S/PDIF would be a fairly trivial modification.

One other issue with passing compressed data over S/PDIF is that since the data isn't decompressed to PCM until it gets to your receiver, there's no way for you to programmatically control the volume or mix it with other PCM audio data. The former isn't really a big issue (the user can always control the volume on their receiver). The latter doesn't have a really great solution.

In the sample Ensoniq driver, whenever we're playing compressed WMAPro out the S/PDIF port we just throw away any PCM data that we're asked to
play so it's never heard (although we maintain the appropriate playback timing, so from the application standpoint everything appears to behave as expected).

To support multichannel discrete outputs in the wave driver one would need to modify the driver to accept a WAVEFORMATEX structure which looks like a normal PCM format but for which the nChannels field is 6 (or more). This is not a trivial exercise, but it should be pretty straightforward. As part of this, for wavedev2 one would have to rewrite the output.cpp file to add a new output stream class that accepts 6 streams, and modify the render functions that handle sample-rate-conversion to support all 6 channels.
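As a rough sketch of what such a format looks like, the block below fills in a 6-channel PCM description and shows the kind of check a driver's open path might make. `PcmFormat` is a stand-in that mirrors the WAVEFORMATEX fields so the example builds anywhere; `MakePcmFormat` and `IsDiscreteMultichannelPcm` are hypothetical helpers, not part of wavedev2.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in mirroring the fields of WAVEFORMATEX so the sketch builds anywhere;
// in a real driver you would use the structure from the platform headers.
struct PcmFormat
{
    uint16_t wFormatTag;       // 1 == WAVE_FORMAT_PCM
    uint16_t nChannels;
    uint32_t nSamplesPerSec;
    uint32_t nAvgBytesPerSec;
    uint16_t nBlockAlign;
    uint16_t wBitsPerSample;
    uint16_t cbSize;
};

PcmFormat MakePcmFormat(uint16_t channels, uint32_t sampleRate, uint16_t bits)
{
    assert(channels > 0 && bits % 8 == 0);
    PcmFormat fmt = {};
    fmt.wFormatTag = 1;
    fmt.nChannels = channels;
    fmt.nSamplesPerSec = sampleRate;
    fmt.wBitsPerSample = bits;
    fmt.nBlockAlign = static_cast<uint16_t>(channels * bits / 8);   // bytes per sample frame
    fmt.nAvgBytesPerSec = sampleRate * fmt.nBlockAlign;
    return fmt;
}

// The check a driver might make in its open path (hypothetical helper).
bool IsDiscreteMultichannelPcm(const PcmFormat& fmt)
{
    return fmt.wFormatTag == 1 && fmt.nChannels >= 6 &&
           fmt.nBlockAlign == fmt.nChannels * fmt.wBitsPerSample / 8;
}
```

A 48kHz, 16-bit, 6-channel stream works out to 12 bytes per sample frame and 576,000 bytes per second, which gives a feel for the extra buffer and DMA bandwidth the driver has to handle.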

Note that the kernel software mixer only supports stereo streams, so it won't do any multichannel mixing for you. This is one reason wavedev2 is probably a good starting place, as it already has code built into it to mix stereo streams which could be extended to more channels.

DirectShow Filters

DirectShow is Windows CE's media processing infrastructure. The architecture is media-type agnostic, meaning that there's nothing in the overall design that makes it support one type of media any better than another. A number of outside customers are working on multichannel audio products using their own DirectShow filters (or filters they licensed from third-parties). The description below only discusses what Microsoft currently ships with CE5 and CE6. 

WMAPro-over-SPDIF filter: The above-mentioned Feature Pack also includes a WMAPro-over-SPDIF DirectShow filter to massage WMAPro data into a format which can be sent over S/PDIF. Used in conjunction with a wave driver that supports WMAPro-over-SPDIF content and a receiver which supports decoding WMAPro, this allows the best decoding quality and performance. To be honest, this isn't currently a terribly common scenario given the limited WMAPro receiver penetration in the market; we did this partly as a proof-of-concept, partly to support our own (Microsoft) technology, and partly because all the pieces were available to us within the company so it wasn't a major development effort. In addition, the architecture and driver changes are applicable to other more common formats (e.g. AC3); although we don't currently ship any explicit support for AC3 streams, OEMs have implemented AC3 support using a similar set of components based on some of this work.

WMAPro decoder: Windows CE includes a WMAPro decoder to decode 5.1, 6.1, and 7.1 compressed content. However, when CE5 was first shipped all our existing customers were still using stereo outputs, so there was no value in passing the discrete channels down to the wave driver. Therefore, while the version that shipped in CE5 decodes all the discrete channels internally, it downmixes them to stereo for output. As a result, there is currently no way to get the discrete channels out of the WMAPro decoder. The NMD feature pack improved on this situation by introducing matrix-encoding into the downmix algorithm: a receiver supporting Pro Logic or Pro Logic II should be able to make use of this information to partially regenerate the discrete channels which were lost during the downmix. We'll look into improving this situation if there's sufficient customer demand.

Dolby AC3: Dolby AC3 is probably the most common/popular multichannel format. Microsoft doesn't currently ship a Dolby AC3 DirectShow decoder, although there are probably lots of third party companies that produce such a thing and there may be open source versions (google "ac3filter").

That's all I've got for now. Please let me know if you found this useful, if there were any errors, or if you have any questions.

Responses to comments (if I misunderstood anyone's question, please let me know):

1. How can I playback audio content simultaneously to both analog audio jacks and S/PDIF (Ianbing)

If I understand correctly, you're trying to play the same audio content over two connections simultaneously (one RCA analog audio jack, and one S/PDIF jack). Assuming that's correct:
- If your audio hardware can simultaneously send a single audio stream over both connections, have the audio driver handle it internally and just expose a single device at the waveapi level.
- If you have two separate pieces of audio hardware (one to handle analog, the other for S/PDIF), you'll need to split the PCM output of the decoder (using a Tee filter; I think there's one under public\directx\sdk\samples\dshow\filters\inftee) and hook each output of the tee to its own wave renderer.

The latter design causes an additional problem because you'll need a way to tell each renderer which audio device to play back to. To do this, you'll need to hand-construct the graph, get pointers to each of the two audio render filters, and tell each wave renderer which device ID to play to. I don't believe I've ever tried this, but it should be possible by creating an IPropertyBag object (I think you'll have to roll your own, but it's not too difficult), setting the "WaveOutId" property to the ID you want to use, and passing that property bag to the IPersistPropertyBag interface on the wave renderer.

Your code would look something like this (sorry, I haven't compiled/tested this):

    // CPropertyBag is your implementation of the IPropertyBag interface.
    // We might have a public sample of this (search for cpropertybag.cpp), but I'm not sure.
    CPropertyBag PropBag;

    // Set up your desired device ID
    VARIANT var;
    VariantInit(&var);
    var.vt = VT_I4;
    var.lVal = <desired device ID>;

    // Write the desired ID to your property bag
    PropBag.Write(L"WaveOutId", &var);

    // Find the waveout renderer in the graph that you want to talk to...
    ...

    // QI for the IID_IPersistPropertyBag interface... something like this...
    IPersistPropertyBag *pPersistPropertyBag = NULL;
    HRESULT hr = pWaveOutFilter->QueryInterface(IID_IPersistPropertyBag, (void **)&pPersistPropertyBag);
    if (SUCCEEDED(hr))
    {
        // Pass the property bag into the wave renderer, then drop our reference
        hr = pPersistPropertyBag->Load(&PropBag, NULL);
        pPersistPropertyBag->Release();
    }

Deep inside the Load call, the waveout renderer will do something like this with the PropBag pointer you passed in:

    VARIANT var;
    var.vt = VT_I4;
    HRESULT hr = pPropBag->Read(L"WaveOutId", &var, 0);
    if(SUCCEEDED(hr))
    {
        m_iWaveOutId = var.lVal;
    }

 

In the Windows CE audio stack, the term "mixer" is used to refer to a couple of different, unrelated components. This blog will try to define each of them and how they differ.

There are usually three different contexts in which "mixer" is used: the "Software Mixer", the "WaveDev2 Mixer", and the "Mixer API".

The "Software Mixer"

Inside the waveapi module there is a software mixer, sometimes also called the "kernel mixer", which can be used to mix and sample-rate-convert multiple PCM audio output streams. This software mixer was added in CE 4.2 to allow audio drivers which only support one output stream to automagically support multiple concurrent streams at different sampling rates.

Internally, the software mixer spins off a thread for each wave device to which it's attached. This thread takes application audio buffers and mixes them together into a set of mixer buffers which are then passed down to the audio driver. During this process the software mixer performs these tasks:

  • Converts all data to a common 16-bit 2-channel (stereo) format.

    Note that this means that for an audio driver to work with the software mixer it must support 16-bit stereo data. If your underlying hardware only supports mono data and you want to make use of the software mixer, your audio driver will need to accept the stereo data and mix it down internally to mono. In this scenario if an application passes mono data down through the stack, the software mixer will duplicate it to the two audio channels and the driver will then remerge the data: not elegant, but the typical performance impact is trivial.

    The mixer only supports application buffers with PCM formats of 8/16 bit samples and mono/stereo channels. The software mixer can't handle compressed data and can't handle multichannel (e.g. 5.1) PCM data. If you need the software mixer to play any of these formats, they need to be converted to something the mixer understands (e.g. 16-bit stereo PCM) first. This is typically done at the DShow or ACM level, above the software mixer. It only becomes an issue if you want to pass compressed or multichannel audio to the driver. When the software mixer sees a format it doesn't recognize or support, it steps out of the way and passes the waveOutOpen request directly to the device driver. At that point it's totally up to the driver to decide how to handle the request.

  • Sample-rate-converts all data to whatever sample rate the driver requires.

    The sample-rate-converter is currently a 5-point FIR. Without going into too much detail, the quality and performance have historically been a fairly good tradeoff for most devices, although there's certainly work we'd like to do in the future to improve the quality and performance.

    Note that to do the sample rate conversion the mixer needs to know what sample rate the wave driver requires (or pick a reasonable default). I'll cover how this works in another blog entry, along with other configuration details and some more info about the internal design.

  • Performs per-stream gain control.

    When an application calls waveOutSetVolume and passes a wave handle, the software mixer absorbs the call and handles this, so it will never be seen by the driver (the driver will still see waveOutSetVolume calls using a device ID though).

  • Implements support for waveOutSetRate.
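The format check, conversion and gain steps in the list above can be sketched roughly as follows. This is only an illustration of the ideas, not the waveapi code (which is private); `StreamFormat`, `MixerCanHandle`, `Mono8ToStereo16` and `MixInto` are hypothetical names, and sample-rate conversion is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of an application buffer's format.
struct StreamFormat { bool isPcm; int channels; int bitsPerSample; };

// Formats outside this set bypass the mixer and go straight to the driver.
bool MixerCanHandle(const StreamFormat& f)
{
    return f.isPcm && (f.channels == 1 || f.channels == 2) &&
           (f.bitsPerSample == 8 || f.bitsPerSample == 16);
}

// Convert 8-bit unsigned mono PCM to 16-bit signed stereo by rescaling each
// sample and duplicating it to both channels.
std::vector<int16_t> Mono8ToStereo16(const std::vector<uint8_t>& in)
{
    std::vector<int16_t> out;
    out.reserve(in.size() * 2);
    for (size_t i = 0; i < in.size(); ++i)
    {
        const int16_t s = static_cast<int16_t>((static_cast<int>(in[i]) - 128) * 256);
        out.push_back(s);   // left
        out.push_back(s);   // right
    }
    return out;
}

// Add one stream into the mix buffer with a per-stream gain (0.0 to 1.0),
// clamping to the 16-bit range instead of letting the sum wrap.
void MixInto(std::vector<int16_t>& mix, const std::vector<int16_t>& stream, float gain)
{
    assert(gain >= 0.0f && gain <= 1.0f);
    for (size_t i = 0; i < mix.size() && i < stream.size(); ++i)
    {
        int sum = mix[i] + static_cast<int>(stream[i] * gain);
        if (sum > 32767) sum = 32767;
        if (sum < -32768) sum = -32768;
        mix[i] = static_cast<int16_t>(sum);
    }
}
```

Clamping rather than wrapping matters because a wrapped sum turns a loud passage into a full-scale click.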

The "WaveDev2 Mixer"

The WaveDev2 sample wave driver includes its own output mixer to mix PCM wave streams. This mixer performs basically the same function as the software mixer. Why do we have more-or-less the exact same feature in two different places? You can read my blog Windows CE Audio Driver Samples for more background, but basically because:

  • The waveapi software mixer didn't exist at the time WaveDev2 was developed (it first ran on CE 3.0).
  • The WaveDev2 mixer handles some proprietary calls that Smartphone/PPC need (see The Wavedev2 Gainclass Implementation). Someday we might fold those into the software mixer, but it hasn't happened yet.
  • The WaveDev2 mixer uses a less-cpu-intensive linear interpolation algorithm to perform mixing, and runs in the context of the driver's IST rather than a separate thread. This yields better performance in terms of CPU bandwidth, battery life, and latency (all of which are really important on battery-powered mobile devices).
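For readers who haven't seen it, linear interpolation between neighbouring input samples looks roughly like this. It is a generic mono sketch, not the WaveDev2 code; `ResampleLinear` is a hypothetical name, and a production driver would use fixed-point arithmetic rather than `double`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Resample a mono 16-bit stream from inRate to outRate by linear interpolation.
std::vector<int16_t> ResampleLinear(const std::vector<int16_t>& in,
                                    uint32_t inRate, uint32_t outRate)
{
    assert(inRate > 0 && outRate > 0);
    std::vector<int16_t> out;
    if (in.empty())
        return out;

    const size_t outCount = static_cast<size_t>(
        static_cast<uint64_t>(in.size()) * outRate / inRate);
    out.reserve(outCount);
    for (size_t n = 0; n < outCount; ++n)
    {
        // Position of output sample n measured in input samples,
        // split into an integer index and a fractional part.
        const uint64_t pos = static_cast<uint64_t>(n) * inRate;
        const size_t i = static_cast<size_t>(pos / outRate);
        const double frac = static_cast<double>(pos % outRate) / outRate;
        const int a = in[i];
        const int b = (i + 1 < in.size()) ? in[i + 1] : in[i];   // hold the last sample
        out.push_back(static_cast<int16_t>(a + (b - a) * frac));
    }
    return out;
}
```

Each output sample costs one multiply and a couple of adds, which is where the CPU saving over a multi-tap FIR comes from; the price is more aliasing and high-frequency loss.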

Having the same feature in two different places and with slightly different/incompatible feature sets is generally not a good thing. Someday we'll rationalize this situation so the mixer only exists in one place.

The WaveDev2 sample driver also supports "input mixing"; this isn't really mixing, but it allows a single hardware input stream to be split and sample-rate-converted to multiple input clients. I'm including it here because inside the driver it shares the same architecture and a lot of the same code paths.

The other difference between the Software Mixer and the WaveDev2 Mixer is that the former is in private code and isn't generally modifiable by OEMs, while the latter, being part of the OEM device driver, may be modified as needed. This can be a good or bad thing (depending on your desire to change the code and your expertise at not breaking it ;-)

The "Mixer API"

While the Software Mixer and WaveDev2 Mixer are concerned with mixing PCM audio streams coming down through the Wave API, there's a totally different and unrelated thing called the "Mixer API".

The Mixer API is an API which conceptually sits alongside other top-level APIs like the Wave API, ACM API, TAPI API, etc. The role of the Mixer API is to expose various low-level audio-related controls at the application level (e.g. things like volume, bass/treble, surround sound, etc.). When someone talks about the Mixer API, they're talking about the set of APIs including mixerOpen, mixerClose, etc. An MSDN reference page is here: http://msdn2.microsoft.com/en-us/library/ms705739.aspx, and there are some interesting discussions of the Mixer API in "Mixer API" and in the Larry Osterman's WebLog post "Mapping audio topologies to mixer topologies".

I believe the mixer API was first introduced as part of the Windows Sound System (WSS) DDK. The Windows Sound System was a hardware reference design that Microsoft developed in the early 1990's to evangelize audio hardware support on the PC platform. One of the features of WSS was inclusion of a Crystal Semiconductor 4231 codec, which eventually evolved into the AC'97 codec spec. This codec had a number of hardware mixing and volume control features to support multiple inputs and outputs (cs4231a multimedia audio codec), but there was no API defined to allow applications to access them. Thus was born the mixer API.

The Mixer API was designed to allow a mixer application with no knowledge of the underlying audio architecture to create interactive UI for the end user. As a consequence, the Mixer API allows the application to query information suggesting what type of UI element to use to represent a control (e.g. a pushbutton, slider, multiple-select, etc.), and even query the labels that the application should display. On desktop Windows, when you run SndVol32 to bring up the mixer control panel (which uses the desktop's implementation of the Mixer API), remember that SndVol32 has absolutely no a priori knowledge of what your soundcard supports. All those labels and controls are derived by calling down to the driver level.

All Mixer API calls into the audio device driver are done via an IOCTL_MIX_MESSAGE IoControl. An audio driver can add support for the Mixer API by adding support for this call and the myriad messages that route through it. Although Windows CE audio sample drivers typically include sample code to support the Mixer API, in general it's an optional part of the wave driver and there are very few applications that make use of it (although there are some, notably VoIP).

 

Time to switch gears a bit away from audio...

First a minor digression into what we were building (which some people may find interesting, and this is still a Multimedia blog ;-). Then I'll get to how this involves the FAT filesystem.

One of the major components of the Tomatin feature pack was a DVR engine to allow OEMs to develop their own digital video recorders for IP video broadcasts. At its heart, a DVR must support the ability to write a video stream to hard disk and at the same time read a video stream back from hard disk for playback, all at real-time speeds and without glitches or dropouts. Other important features include the ability to fast-forward and rewind playback (while still displaying video onscreen).

The core DVR design was ported from another Microsoft team building an NTSC broadcast DVR. I'm never sure how much info I can give out in these blogs about other groups and project code names, so I don't mean to deny credit- those guys (IMHO) did a nice job and delivered to us exactly what they promised (how often can you say that about code that gets dropped in your lap from another team).

In any case, their DVR implementation included a hardware decoder which took NTSC video and encoded it to an MPEG2 Program Stream. The DVR engine handled writing/reading this stream to/from a hard disk, and also managed a myriad of other details like tracking and indexing key frames for fast forward/rewind/seek, managing A/V synchronization, compensating for drift between the record and playback streams, and a ton of other stuff. The output of the DVR engine was also an MPEG2 Program Stream, for which they had a hardware decoder which would generate and output the final video and audio data.

IPTV systems typically transmit their data as MPEG2 Transport Streams. These are somewhat different from MPEG2 Program Streams, and the two are neither subsets nor supersets of each other. To convert from one to the other you really need to do a full demultiplex/remultiplex operation. The bad news for us was that our DVR was designed to accept Program Streams, not Transport Streams; and we frankly didn't have the time or expertise to rewrite and retest that portion of the code. The good news was that IPTV hardware typically includes a Transport Stream demultiplexer which can produce demultiplexed (AKA elementary) video and audio streams, so we just needed to remultiplex the streams to get a Program Stream. The bad news was that we didn't have such a multiplexer, nor did anyone else inside of Microsoft as far as we could tell. 

To make a long story short, one of our developers wrote, from scratch, an MPEG2 Program Stream multiplexer which takes elementary audio and video streams and turns them into an MPEG2 Program Stream suitable for use by the DVR engine. This DShow component shipped as part of the Tomatin feature pack.

For testing purposes, we also ported the desktop's implementation of the MPEG2 Transport/Program Stream Demultiplexer. We used this component strictly for our own testing, and never ran it through its own system/unit test cycle, so this component didn't ship with Tomatin. However, it did turn out to be quite stable and it was made available at http://codegallery.gotdotnet.com/wincedemux.

Back to our story... 

At the start of the project our goal was to record and playback standard definition content (e.g. 640x480). Partway through the project a key OEM came onboard and added the requirement to handle HDTV content at 20Mbit/sec. Ugh.

The hardware this had to run on was essentially a low-end x86-compatible processor with hardware acceleration of MPEG2 TS demultiplexing and audio/video decoding and playback. Storage was on a fairly generic 7200RPM 300GB IDE hard disk. All our code had to do (performance wise) was remultiplex the elementary streams into a program stream, write it to disk, read it from disk, demultiplex it to separate audio and video streams, and send those streams to the decoding/playback hardware. While this might not seem like that much work, at 20Mbits/sec it was pretty tough to do on low-end x86 hardware, and we spent a lot of time optimizing our inner loops (e.g. the code to scan for the next MPEG2 packet start code).
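To give a flavour of that kind of inner loop, here is one common way to scan for the 00 00 01 start-code prefix while skipping bytes that cannot be part of a match. It is a generic sketch, not the code that shipped in Tomatin; `FindStartCode` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the offset of the next 00 00 01 start-code prefix at or after
// 'from', or 'size' if none is found.
size_t FindStartCode(const uint8_t* data, size_t size, size_t from)
{
    assert(data != NULL || size == 0);
    size_t i = from + 2;            // index of the byte that would be the 0x01
    while (i < size)
    {
        if (data[i] > 1)
            i += 3;                 // this byte can't be any part of a prefix
        else if (data[i] == 0)
            i += 1;                 // could be the first or second zero
        else if (data[i - 1] == 0 && data[i - 2] == 0)
            return i - 2;           // found 00 00 01
        else
            i += 3;                 // a 0x01 that isn't preceded by two zeros
    }
    return size;
}
```

Because most bytes in compressed video are greater than 1, the loop usually advances three bytes per comparison instead of one.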

The only filesystem well tested and available from within Microsoft for WinCE is the FAT filesystem. We had heard anecdotal stories of performance issues with FAT with large hard disks, but we didn't have time in the schedule to write our own filesystem (we did spend a bit of time looking around for something elsewhere in the company that we could reuse, but ultimately didn't find anything suitable from a risk/schedule standpoint).

During design and testing we did experience a number of bottlenecks due to the FATFS design, and we got really familiar with the FAT FS architecture and its limitations. Ultimately, working with the filesystem team, we got fixes for issues that could be fixed (which have since been released as CE 5.0 QFEs), and developed workarounds in our code for problems inherent in the filesystem. That's what this blog is about. Keep in mind that I'm a multimedia guy, so I'll probably gloss over things a bit. Feel free to comment on things I get wrong or wasn’t clear about.

To understand FAT limitations you need a little background.

Sectors: Sectors are the smallest unit of storage on a disk. A hard disk sector is (typically) 512 bytes. A 300GB hard disk will have approximately 586 million sectors.

Clusters: The cluster is a logical grouping of sectors. For a given FAT disk format, a cluster is made up of a fixed number of sectors which is a power of 2. Filesystems typically organize data at the cluster level. Choosing an appropriate cluster size is a trade-off: the bookkeeping overhead for a file is going to be proportional to the number of clusters it contains, and a file always utilizes an integer number of clusters. If you choose too small a cluster size, the amount of overhead (both storage and CPU) associated with a file may be quite large. If you choose too large a cluster size, you'll end up wasting unused cluster space at the end of each file. (Note: Other non-FAT filesystems have come up with a variety of ways to alleviate this problem, but we're dealing with FAT here).

FAT: The FAT, or File Allocation Table, is an array at the start of the disk where each element in the array is associated with a specific cluster. There are different variants of FAT which are represented by the size of each FAT entry. We're only interested in FAT32, where each FAT entry is a 32-bit value, and which is the only version appropriate for relatively large hard disks.

For FAT32 the cluster size may vary depending on how it was formatted. However, the maximum cluster size for FAT32 is 32k, which is 64 sectors. On a 300GB hard disk there will be roughly 9 million clusters. The FAT table itself will take approximately 37.5MB of disk space, spanning 73 thousand sectors. If you've enabled the backup FAT table, the FAT filesystem keeps an additional copy of the FAT (I'm not sure if it does this by default, or if that's controlled by the FATFS_ENABLE_BACKUP_FAT flag).
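The arithmetic behind these figures can be sketched as below. It ignores the reserved area and the space the FAT itself takes, so treat it as a back-of-the-envelope estimate; `FatLayout` and `ComputeFat32Layout` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct FatLayout
{
    uint64_t sectors;      // total sectors on the disk
    uint64_t clusters;     // total clusters on the disk
    uint64_t fatBytes;     // size of one copy of the FAT
    uint64_t fatSectors;   // sectors occupied by one copy of the FAT
};

FatLayout ComputeFat32Layout(uint64_t diskBytes, uint32_t sectorBytes,
                             uint32_t sectorsPerCluster)
{
    assert(sectorBytes > 0 && sectorsPerCluster > 0);
    FatLayout l;
    l.sectors = diskBytes / sectorBytes;
    l.clusters = l.sectors / sectorsPerCluster;
    l.fatBytes = l.clusters * 4;                                  // one 32-bit entry per cluster
    l.fatSectors = (l.fatBytes + sectorBytes - 1) / sectorBytes;  // round up
    return l;
}
```

Calling `ComputeFat32Layout(300000000000ULL, 512, 64)` with a 32k cluster taken as exactly 32,768 bytes gives about 9.16 million clusters, a FAT of about 36.6MB and roughly 71.5 thousand FAT sectors, slightly under the rounded figures quoted above.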

This is one of the first limitations of the FAT design: the 32k maximum cluster size is too small for such large hard disks, and contributes to the overhead in a variety of ways. 

Each entry in the FAT represents a specific cluster on the disk. The set of clusters which make up a specific file are tracked by using the FAT array as a singly linked list: the FAT entry for a given cluster will contain the number of the next cluster in the file. The FAT entry of the last cluster in the file will have the FAT entry value of 0xFFFFFFFF. To seek to a specific location in a file, one must traverse the FAT chain of that file from the start until reaching the cluster associated with that location. A 4GB file with 32k clusters will contain roughly 131,000 clusters (2^17), so seeking to the end of that file will entail traversing roughly 131,000 FAT entries. This is one source of performance bottlenecks.
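A minimal sketch of that traversal, with the FAT modelled as an in-memory array, makes the cost visible: the number of entries read grows linearly with the seek offset. `ClusterForOffset` is a hypothetical name, and the end-of-chain marker simply follows the value given in the text.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

const uint32_t kEndOfChain = 0xFFFFFFFF;   // end marker as described in the text

// Walk the chain from firstCluster to the cluster holding byte 'offset'.
// Returns kEndOfChain if the offset is past the end of the file.
// 'hops' reports how many FAT entries were read.
uint32_t ClusterForOffset(const std::vector<uint32_t>& fat, uint32_t firstCluster,
                          uint64_t offset, uint32_t clusterBytes, uint32_t* hops)
{
    assert(clusterBytes > 0 && hops != 0);
    uint64_t remaining = offset / clusterBytes;   // links still to follow
    uint32_t cluster = firstCluster;
    *hops = 0;
    while (remaining > 0 && cluster != kEndOfChain)
    {
        cluster = fat[cluster];
        ++*hops;
        --remaining;
    }
    return cluster;
}
```

On a real volume each of those reads may also miss the FAT cache and go to disk, which is what makes long chains so expensive.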

Unallocated clusters are represented by the value 0 in the cluster's associated FAT entry. There is no mechanism built into the FAT on-disk architecture to track unallocated clusters. When allocating a new cluster to a file, the filesystem code must search the FAT until it finds an unallocated cluster. On a large mostly allocated hard disk this can take an extremely large amount of time- yet another bottleneck.

It might help to understand how FAT searches for a free cluster, which is optimized to attempt to allocate clusters contiguously: whenever FAT needs to allocate a new cluster to a file, it searches the FAT table starting at the last cluster it successfully allocated. If you’re doing a single allocation of multiple clusters, it remembers (as a local variable in the loop) this last cluster it allocated to that specific file and will do a good job allocating contiguously from that point for that file. However, if you’re growing the file via individual cluster allocations (e.g. performing lots of relatively small append operations to the file to grow it), each allocation is going to start based on the last cluster allocated on the volume (by any file, not just yours). In this case, if you’ve got multiple threads allocating individual clusters at the same time, there’s a higher probability the files are going to take alternating clusters, causing worse fragmentation.
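The search itself can be pictured as a wrap-around scan that starts just after a hint (the last cluster allocated). The sketch below is an illustration of that idea only, not the FATFS source; `FindFreeCluster` and the constants are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

const uint32_t kNoFreeCluster = 0xFFFFFFFF;
const uint32_t kFirstDataCluster = 2;   // FAT entries 0 and 1 are reserved

// Scan for a free entry (value 0) starting just after 'hint', wrapping
// around once. 'examined' counts the entries looked at.
uint32_t FindFreeCluster(const std::vector<uint32_t>& fat, uint32_t hint,
                         uint32_t* examined)
{
    assert(fat.size() > kFirstDataCluster && examined != 0);
    const uint32_t count = static_cast<uint32_t>(fat.size()) - kFirstDataCluster;
    if (hint < kFirstDataCluster || hint >= fat.size())
        hint = static_cast<uint32_t>(fat.size()) - 1;   // makes the scan begin at the first data cluster
    *examined = 0;
    for (uint32_t n = 1; n <= count; ++n)
    {
        const uint32_t c = kFirstDataCluster + (hint - kFirstDataCluster + n) % count;
        ++*examined;
        if (fat[c] == 0)
            return c;
    }
    return kNoFreeCluster;
}
```

With a per-file hint, successive allocations tend to land next to each other; with a single volume-wide hint shared by several writers, the clusters of different files interleave, which is the fragmentation case described above.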

 

Moving on to the directory structure: Each directory on a disk consists of a file entry table. Each file within a directory occupies an entry in that table (I'll get to long file names in a bit). Each of these entries includes (among other things) the file's 8.3 filename, the index of the first cluster in the file, and a 32-bit filesize. The use of a 32-bit filesize means that individual files are limited to 4GB. This is another limitation, especially for DVR scenarios where extremely large files may be generated.

File entries within a given directory are not kept in any particular order. When a file is opened, the table for the directory is scanned, starting at the beginning, to search for the file. When a new file is created, the entire table for that directory must be scanned to determine if the file already exists. When a file is deleted, the entry for that file is cleared, but the remainder of the table is not repacked to reclaim the entry (although the entry may be reused if a new file is created). What all this means is the amount of time to open or create a file in that directory will rise linearly with the number of files in that directory. A directory with a large number of files may impact performance.

The use of long file names can impact directory performance as well: long file names are allocated as additional directory entries, so each normal file which also includes a long file name will occupy additional directory entries. On WinCE, long filenames do not need to be generated if the filename is 8.3 compatible: it must contain only uppercase characters, no Unicode characters, and be no more than 8 characters long with an extension of no more than 3 characters. For example, FILENAME.TXT would not generate a long filename and uses only one directory entry. FileName.Txt, filename.txt, and FILENAME0.TXT would all generate a long filename. Frankly, this wasn't something we thought of during the DVR project (I didn't think of it until I started working on this blog). Our DVR filenames weren't 8.3 compatible and thus did generate long filenames, but we never observed a problem caused by this that I recall.
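If you want to check your own naming scheme, the rule as stated above can be sketched like this. It only encodes the conditions described in this post; the real FAT rules also exclude certain punctuation, which this sketch does not attempt to cover. `FitsIn83Entry` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if 'name' can be stored as a bare 8.3 entry under the rules described
// above: ASCII only, no lowercase letters, a base name of 1 to 8 characters
// and an optional extension of up to 3 characters.
bool FitsIn83Entry(const std::wstring& name)
{
    const size_t dot = name.find(L'.');
    if (dot != std::wstring::npos && name.find(L'.', dot + 1) != std::wstring::npos)
        return false;                                   // more than one dot
    const size_t baseLen = (dot == std::wstring::npos) ? name.size() : dot;
    const size_t extLen = (dot == std::wstring::npos) ? 0 : name.size() - dot - 1;
    if (baseLen < 1 || baseLen > 8 || extLen > 3)
        return false;
    for (size_t i = 0; i < name.size(); ++i)
    {
        const wchar_t ch = name[i];
        if (ch > 0x7F || ch <= L' ')
            return false;                               // non-ASCII, space or control character
        if (ch >= L'a' && ch <= L'z')
            return false;                               // lowercase forces a long filename
    }
    return true;
}
```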

As an aside: during our testing we hit a bug in the FAT Filesystem code (which has since been fixed via QFE). There was an overflow in the cluster calculation when creating files which used the last cluster before the 4GB limit (i.e. larger than 4GB-32K). The result was that when attempting to create a file of that size, the system would allocate all available clusters on the disk to the file until it ran out of disk space and returned a failure code. At that point the disk would be left in a state where the directory would show the file we created as having 0 bytes allocated to it, yet the disk would appear full. Deleting the file would correct the situation.

Now, to summarize the problems we had, and the fixes or workarounds we came up with:

 

1.  The 4GB filesize limit: There's really nothing that can be done about this. For our project the DVR engine code was already designed to record video streams as a set of relatively small files (e.g. holding a few minutes of data each), rather than one large file. This solved a couple of different problems: it avoided the 4GB filesize limit and it reduced the seek overhead of large files. It also solved a problem peculiar to DVR implementations: when recording to a "temporary" buffer which should only keep the last 60 minutes of video, one needs to conceptually record to the end of the file while simultaneously truncating the file from the beginning. There's no way to do this with a single file, whereas with multiple files it can be done easily by just deleting the earliest file in the set. I believe we ended up with files in the 128MB to 256MB range (this is a registry-tunable parameter). 
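The bookkeeping for such a rolling set of files can be sketched as below. This is only an illustration of the idea, not the DVR engine's code; `RollingBuffer` and `SegmentsFor` are hypothetical names, and the actual file creation and deletion are left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Tracks the segment files of a rolling recording by index. The caller
// deletes whatever file Append() reports as expired.
class RollingBuffer
{
public:
    explicit RollingBuffer(uint32_t maxSegments) : m_max(maxSegments), m_next(0)
    {
        assert(maxSegments > 0);
    }

    // Start a new segment; returns the index of the segment that fell out of
    // the window and should be deleted, or -1 if none did.
    int64_t Append()
    {
        m_live.push_back(m_next++);
        if (m_live.size() <= m_max)
            return -1;
        const int64_t expired = m_live.front();
        m_live.pop_front();
        return expired;
    }

    uint32_t Oldest() const { assert(!m_live.empty()); return m_live.front(); }
    size_t Count() const { return m_live.size(); }

private:
    uint32_t m_max;
    uint32_t m_next;
    std::deque<uint32_t> m_live;
};

// Segments needed to hold 'seconds' of video at 'bitsPerSecond'.
uint32_t SegmentsFor(uint64_t seconds, uint64_t bitsPerSecond, uint64_t segmentBytes)
{
    assert(segmentBytes > 0);
    const uint64_t totalBytes = seconds * bitsPerSecond / 8;
    return static_cast<uint32_t>((totalBytes + segmentBytes - 1) / segmentBytes);
}
```

As a sanity check on the numbers: 60 minutes at 20Mbit/sec is 9,000,000,000 bytes, which would not fit in a single FAT32 file anyway, and works out to 34 segments of 256MB each.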

 

2.   Seeking within very large files: When seeking within a file, we have to traverse the file’s FAT chain to find the desired cluster. As the file grows toward the 4GB filesize limit, the number of links that need to be traversed gets very large (for 32k clusters, up to 2^17 entries). As mentioned above, we partially avoided this issue by not using the full 4GB filesize available to us.

However, at the same time, the Filesystem team developed an improved algorithm which should greatly help this situation: The original FAT code we were working with included a single per-open-file 32-bit cached entry of the last cluster accessed within the file, so sequential reads/writes always hit this cache and didn’t need to retraverse the FAT chain. However, for DVR applications we were sometimes thrashing this entry by simultaneously reading and writing different locations within the file. The recent QFE fixed this by adding additional cache entries so multiple threads shouldn’t thrash against each other. There are probably still pathological situations which might have perf issues (e.g. repeatedly seeking backward within a very large file). Note: the “cache” entries I’m talking about here are not the same as the FAT cache I'll mention at the end of this blog; that's a cache of the actual sectors comprising the FAT table. Increasing the FAT cache size would also help with this issue, though.
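To see why one cache entry thrashes and a few do not, here is a toy model of a per-file position cache. I don't know how the QFE is actually implemented, so this is only a sketch of the general idea; `ChainCache` is a hypothetical name, and it assumes every requested index lies within the file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Remembers a few (index-within-file -> cluster) pairs so a lookup can resume
// from the nearest earlier position instead of the start of the chain.
class ChainCache
{
public:
    ChainCache(uint32_t firstCluster, size_t slots)
        : m_first(firstCluster), m_entries(slots), m_nextSlot(0)
    {
        assert(slots > 0);
    }

    // Returns the cluster at position 'index' in the file's chain and the
    // number of FAT entries read to get there.
    uint32_t Lookup(const std::vector<uint32_t>& fat, uint32_t index, uint32_t* hops)
    {
        uint32_t startIndex = 0, cluster = m_first;
        for (size_t i = 0; i < m_entries.size(); ++i)
        {
            const Entry& e = m_entries[i];
            if (e.valid && e.index <= index && e.index >= startIndex)
            {
                startIndex = e.index;       // closest cached position so far
                cluster = e.cluster;
            }
        }
        *hops = 0;
        for (uint32_t i = startIndex; i < index; ++i)
        {
            cluster = fat[cluster];
            ++*hops;
        }
        // Remember this position, replacing slots round-robin.
        Entry& slot = m_entries[m_nextSlot];
        slot.valid = true;
        slot.index = index;
        slot.cluster = cluster;
        m_nextSlot = (m_nextSlot + 1) % m_entries.size();
        return cluster;
    }

private:
    struct Entry
    {
        bool valid; uint32_t index; uint32_t cluster;
        Entry() : valid(false), index(0), cluster(0) {}
    };
    uint32_t m_first;
    std::vector<Entry> m_entries;
    size_t m_nextSlot;
};
```

With one slot, a reader near the start and a writer near the end keep evicting each other, so the reader walks from the start of the chain every time; with two slots each of them resumes from its own last position.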

 

3.   Appending to a file on a very large, almost full disk: When appending to the end of a file, we need to search the FAT table looking for a free cluster. On very large disks which are almost full this search can potentially take a very long time.  The DVR code already had a solution to this problem by spinning off a low-priority thread which would run out ahead of our disk-write thread and preallocate data files to be used later by the write thread. Since that time the filesystem team has shipped a QFE which keeps an in-memory data structure to track free clusters; this QFE should greatly improve performance in this situation.

 

4.   Opening/creating a file in a directory filled with a large number of files: When you open or create a file, we need to search the directory looking for a file with the requested name (we need to do this even if creating a file to ensure we don’t create two files with the same name). If the directory has a very large number of files, this algorithm can take a long time. The FAT directory structure isn’t sorted and there’s no efficient way of searching it other than looking at every entry; the worst-case situation is creating a new file, since we need to search the entire list before we realize the entry doesn’t already exist. There is no QFE to solve this issue. Increasing the Data cache may help, but the best thing is not to create too many files within a single directory.

This was a serious issue for our DVR code, which was creating hundreds of files. However, we had control over the filesystem names (which were typically something like xxxx0000.yyy, xxxx0001.yyy, xxxx0002.yyy) and where they were stored. Our solution was to generate a directory name hash off of the filename and divide the files across multiple subdirectories.
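As a rough illustration of the idea (not the actual DVR code), the sketch below hashes a filename into one of a fixed number of subdirectories. The bucket count, the hash function, and the names `HashFileName` and `BucketedPath` are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical bucket count; the post does not say how many
// subdirectories the DVR code actually used.
const unsigned kNumBuckets = 16;

// Simple multiplicative string hash (an illustrative choice only).
unsigned HashFileName(const std::string& name)
{
    unsigned hash = 0;
    for (std::string::size_type i = 0; i < name.size(); ++i)
    {
        hash = hash * 31u + static_cast<unsigned char>(name[i]);
    }
    return hash;
}

// Builds "<root>\<bucket>\<name>", where <bucket> is a two-digit
// subdirectory index derived from the filename hash.
std::string BucketedPath(const std::string& root, const std::string& name)
{
    assert(!name.empty());
    char bucket[8];
    std::snprintf(bucket, sizeof(bucket), "%02u", HashFileName(name) % kNumBuckets);
    return root + "\\" + bucket + "\\" + name;
}
```

Because the bucket is computed from the name alone, the same function can be used both when creating a file and when opening it later, and each directory search only has to scan roughly 1/16th of the files.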

 

Other useful bits of information: 

 

The FAT filesystem code keeps in-memory caches for both the disk FAT and disk data. The sizes of these caches may be configured via the registry:

 

[HKEY_LOCAL_MACHINE\System\StorageManager\FATFS]

            "FatCacheSize"=0xXXXXX  - Size of the FAT Table Cache

            "DataCacheSize"=0xXXXX - Size of the Data Cache

 

Where XXX is the cache size in sectors. The value must be a power of 2 and at least 16 sectors. If it is 0, FATFS will determine the cache size itself; the maximum size that it defaults to is 256K, which is way too small for a 120G disk. Pick a number and multiply by 512: that is the amount of memory (in bytes) the cache will take.
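To make the arithmetic concrete, here is a small sketch that checks a candidate registry value against the rules above and computes the memory it would consume. The helper names are made up for this example, and the 512-byte sector size is simply the multiplier quoted above.

```cpp
#include <cassert>
#include <cstdint>

const std::uint32_t kSectorSize = 512;      // bytes per sector, per the text above
const std::uint32_t kMinCacheSectors = 16;  // minimum non-zero cache size

// Returns true if 'sectors' is acceptable: 0 (let FATFS choose),
// or a power of 2 that is at least 16.
bool IsValidCacheSize(std::uint32_t sectors)
{
    if (sectors == 0)
    {
        return true;
    }
    bool isPowerOfTwo = (sectors & (sectors - 1)) == 0;
    return isPowerOfTwo && sectors >= kMinCacheSectors;
}

// Memory consumed by a cache of 'sectors' sectors, in bytes.
std::uint64_t CacheBytes(std::uint32_t sectors)
{
    assert(IsValidCacheSize(sectors));
    return static_cast<std::uint64_t>(sectors) * kSectorSize;
}
```

For example, a value of 0x2000 (8192 sectors) costs 4MB of RAM, while the 256K default ceiling corresponds to only 0x200 sectors.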

 

 

If you do need to preallocate space to a file, keep in mind that the minimum set of calls to do so are something like:

 

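    // Create (or truncate) the file for writing.
    HANDLE hFile = CreateFile(szFilePath, GENERIC_WRITE, FILE_SHARE_READ, NULL, CREATE_ALWAYS, 0, NULL);

    LONG FileSizeLow = 0x80000000;  // e.g. create a 2GB file
    LONG FileSizeHigh = 0;

    // Move the file pointer to the desired size; passing &FileSizeHigh is required (see below).
    DWORD dwRet = SetFilePointer(hFile, FileSizeLow, &FileSizeHigh, FILE_BEGIN);

    // Set the end of file at the current pointer position, which preallocates the space.
    BOOL bRet = SetEndOfFile(hFile);

    CloseHandle(hFile);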

 

You do not need to actually write any data (doing so would just be wasted overhead).

 

Note that the docs on SetFilePointer are a little vague; you must include the &FileSizeHigh param: a side-effect of including this is to force SetFilePointer to interpret FileSizeLow as an unsigned value rather than a signed value. The other caveat is that if you’re preallocating the file you’ll need to keep track of how much data you’ve actually written to the file elsewhere (you can’t rely on GetFileSize, or just seeking to the end of the file and writing to append data to the file).
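The signed/unsigned point is easy to see with plain integer arithmetic. The sketch below does not call SetFilePointer; it only models the two ways the low 32 bits can be interpreted. `FileOffset` and the helper names are hypothetical stand-ins for the two LONG arguments.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the two LONG arguments of SetFilePointer.
struct FileOffset
{
    std::int32_t low;   // lDistanceToMove (low 32 bits)
    std::int32_t high;  // *lpDistanceToMoveHigh (high 32 bits)
};

// Interpretation when the high pointer is supplied:
// low is treated as unsigned and combined with high.
std::int64_t CombinedOffset(const FileOffset& off)
{
    std::uint64_t high = static_cast<std::uint32_t>(off.high);
    std::uint64_t low  = static_cast<std::uint32_t>(off.low);
    return static_cast<std::int64_t>((high << 32) | low);
}

// Interpretation when the high pointer is NULL:
// low alone, as a signed 32-bit value.
std::int64_t SignedLowOnly(const FileOffset& off)
{
    return off.low;
}

// Splits a 64-bit size into the low/high pair.
FileOffset SplitOffset(std::uint64_t size)
{
    FileOffset off;
    off.low  = static_cast<std::int32_t>(static_cast<std::uint32_t>(size & 0xFFFFFFFFu));
    off.high = static_cast<std::int32_t>(static_cast<std::uint32_t>(size >> 32));
    assert(CombinedOffset(off) == static_cast<std::int64_t>(size));
    return off;
}
```

With a 2GB size (0x80000000), the combined interpretation gives +2147483648, while the low-only interpretation gives -2147483648, which is a seek backward from the start of the file.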

 

Finally, I should note that the ExFAT filesystem, which recently shipped with Windows CE 6.0, should further alleviate some of these issues. My understanding is that ExFAT supports files larger than 4GB and implements a bitmap-based mechanism for tracking unallocated clusters. I'm not intimately familiar with it, though, and haven't tried it yet, so I can't go into any more detail.

That’s about all I have in me for now. Feel free to leave any comments, good or bad.

-Andy Raffman

 

Goals

 

Back in 2000, while we were defining the requirements for the Windows Mobile Smartphone audio design, one of our goals was to mute most audio applications while a phone call is in progress.

 

A secondary issue is the fact that most of the time the user keeps their phone in their pocket or backpack, or is holding it out in front of them looking at the screen. In this situation we need to play notifications and incoming rings loud enough to be heard. However, during a phone call the phone is being held tightly to the user's ear, and a sound played loudly enough to be heard in the first situation would be far too loud.

 

A final goal was to minimize any changes at the application level. We didn't want applications to need to monitor whether we were in a call and adjust their own volume level, or even require any modifications at all to third party applications that wanted some reasonable behavior.

 

Design

 

To solve these problems, the wavedev2 audio driver implements something we called gain classes. Each wave stream is associated with a specific gain class, and the driver implements an additional gain control for each of these classes. This additional gain is separate from the controls exposed by waveOutSetVolume, and is transparent to the application. The effects of the various gain controls are cumulative: the total gain applied to a specific output stream will therefore be the product of the stream gain, the device gain, and the class gain.

 

A quick digression here: there are two standard ways that an application can control the volume of a wave stream, each of which involves a call to waveOutSetVolume.

  • Calling waveOutSetVolume and passing in a wave device ID is used to set the device gain, and will theoretically affect all streams playing on the device.
  • Calling waveOutSetVolume and passing in a wave handle (the thing you get back from waveOutOpen) is used to set the stream volume, and will only affect that stream.

By the way, a common (and hard to diagnose) application error is calling waveOutSetVolume on a wave handle before the handle has been initialized to something other than 0. This won't generate an error: the call will be interpreted as a request to change the device volume of device ID 0 (typically the only device in the system), which will affect the volume of all the other apps in the system.

 

Application Usage

 

Whenever an application opens a wave stream, that stream is automatically associated with class 0. However, applications may move their stream to a different class by calling waveOutMessage with the proprietary MM_WOM_SETSECONDARYGAINCLASS. For example, to open a stream and associate it with class 2 one would do the following:

 

    waveOutOpen(&hWaveOut, ...);

    ...

    // Set gain class to 2

    waveOutMessage(hWaveOut, MM_WOM_SETSECONDARYGAINCLASS, 2, 0);

 

Classes are differentiated from each other in two ways:

·         During a call, the amount of attenuation is controlled on a per-class basis. Some classes may be muted; others are attenuated (made a little more quiet on the assumption that the phone is being held up to the user’s ear); and others may have no attenuation at all. The amount of attenuation is controlled by the shell based on a set of registry values.

·         Each class may or may not be affected by the “system volume”. This behavior is hard-coded in the audio device driver.

In the currently shipping implementation there are four classes with the following behavior:

 

Class   Behavior During Call   Affected by system volume   Used by
-----------------------------------------------------------------------------------------------------
0       Muted                  Yes                         Default setting for all sounds
1       Attenuated             Yes                         ?
2       Attenuated             No                          Alarm, Reminder, Notification, Ring, In-call sounds
3       Muted                  No                          System event sounds

 

In addition, future implementations supporting VoIP may include two additional classes which have no attenuation during a call:

 

Class   Behavior During Call   Affected by system volume   Used by
-------------------------------------------------------------------
4       No attenuation         Yes                         ?
5       No attenuation         No                          ?

 

Shell Usage

 

Normally, the gains of all classes are set to 0xffff, meaning there’s no attenuation. At the beginning of a phone call the shell calls waveOutMessage with MM_WOM_SETSECONDARYGAINLIMIT to attenuate each class by some amount, and calls it again during hangup to reset the attenuations. The code to do this looks something like:

 

    for (iClass=0;iClass<NUMCLASSES;iClass++)

    {

        waveOutMessage(<device ID>, MM_WOM_SETSECONDARYGAINLIMIT, iClass, <volume from 0-0xffff>);

    }

 

The amount of attenuation which the shell applies during a call is controlled by the following registry keys:

 

[HKEY_CURRENT_USER\ControlPanel\SoundCategories\Attenuation]

"0"=dword:0

"1"=dword:2

"2"=dword:2

"3"=dword:0

 

The key name is the class index, and the associated value is the amount of gain to allow during a call. The value ranges from 0 to 5, with 0 meaning totally muted and 5 meaning no attenuation.

 

Note that existing apps which don’t set their class will default to class 0, will be muted during a call, and will be affected by system volume (which is generally the behavior that is desired).

 

A note on volume values

 

Volume levels are typically represented as unsigned 16-bit values, with 0xFFFF being full volume and 0x0000 representing the muted state. For example, waveOutSetVolume encodes the volume parameter as a 32-bit DWORD, with the lower 16 bits holding left channel volume and the upper 16 bits holding the right channel volume. On the other hand, the gain class API only accepts a single 16 bit value which is meant to apply to both channels (we didn't think there would be a need to attenuate left and right by different amounts).
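As a quick sketch of that encoding, the helpers below pack and unpack the 32-bit value in the layout just described. The function names are invented for this example and are not part of any Windows header.

```cpp
#include <cassert>
#include <cstdint>

// Extracts the left channel volume (low 16 bits).
std::uint16_t LeftVolume(std::uint32_t packed)
{
    return static_cast<std::uint16_t>(packed & 0xFFFF);
}

// Extracts the right channel volume (high 16 bits).
std::uint16_t RightVolume(std::uint32_t packed)
{
    return static_cast<std::uint16_t>(packed >> 16);
}

// Packs per-channel volumes into the 32-bit layout used by waveOutSetVolume:
// left channel in the low 16 bits, right channel in the high 16 bits.
std::uint32_t PackVolume(std::uint16_t left, std::uint16_t right)
{
    std::uint32_t packed = left | (static_cast<std::uint32_t>(right) << 16);
    assert(LeftVolume(packed) == left && RightVolume(packed) == right);
    return packed;
}

// The gain class API takes a single 16-bit value for both channels;
// the stereo equivalent simply duplicates it.
std::uint32_t PackMono(std::uint16_t volume)
{
    return PackVolume(volume, volume);
}
```

So full volume on both channels is 0xFFFFFFFF, and "left only at full volume" is 0x0000FFFF.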

 

Historically there's been a lot of disagreement over how these 16 bits map to actual dB attenuation values, and how the stream and device gains interact. In the wavedev2 sample driver the behavior is as follows:

  • Stream gains map from 0 to -100dB attenuation. The idea here was to provide applications with a large enough range to handle any potential situation and also maintain some compatibility with the desktop's usage in DirectSound, which uses the same 0 to -100dB range.
  • Device gains map from 0 to -35dB attenuation. The idea behind this was that historically the device gain has been implemented by going directly to the codec hardware, which at the time the API was designed typically was limited to something in the -32dB range.
  • Gainclass gain values map from 0 to -100dB.
  • When calculating the aggregate attenuation of the various gain values, the code converts each gain value to a dB attenuation and then adds the attenuations. For example, if the stream gain is 0x8000 (half scale), the device gain is 0x8000 (also half scale), and the gainclass gain is 0xFFFF (no attenuation), the total attenuation would be -(.5 * 100) - (.5 * 35) - (0 * 100) = -67.5dB.
  • Any gain value of 0 represents the totally muted state. For example, if the stream gain in the above calculation started off at 0x0000, the stream would be totally muted independent of the other gain values.

The calculation above and the ranges that each gain type maps to are implemented inside the wavedev2 driver, and OEMs may choose to modify the values to meet their specific needs.
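The calculation can be sketched in a few lines. This is not the driver's code: it assumes a linear mapping from the 16-bit gain to dB (inferred from the half-scale example above), and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Attenuation ranges described above for the wavedev2 sample driver (in dB).
const double kStreamRangeDb = 100.0;
const double kDeviceRangeDb = 35.0;
const double kClassRangeDb  = 100.0;

// Maps a 16-bit gain to a dB value in [-rangeDb, 0].
// ASSUMPTION: a linear mapping, inferred from the worked example in the text
// (half scale -> half of the range); the real driver may use a different curve.
double GainToDb(std::uint16_t gain, double rangeDb)
{
    assert(rangeDb >= 0.0);
    return -rangeDb * (1.0 - gain / 65535.0);
}

// Total attenuation in dB; returns -infinity when any gain is 0 (muted).
double TotalAttenuationDb(std::uint16_t stream, std::uint16_t device, std::uint16_t gainClass)
{
    if (stream == 0 || device == 0 || gainClass == 0)
    {
        return -std::numeric_limits<double>::infinity();
    }
    return GainToDb(stream, kStreamRangeDb) +
           GainToDb(device, kDeviceRangeDb) +
           GainToDb(gainClass, kClassRangeDb);
}
```

Adding attenuations in dB is the same thing as multiplying the linear gains, which is why the overall effect of the three controls is described as a product.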

 

Areas for improvement

In retrospect, there are a couple of things I wish we had done differently and which we’ll keep in mind for the future:

·         We should have made the number of gain classes and how they’re affected by device volume programmable, rather than hard-coding them into the sample driver. With the current design whenever we need to add a new gain class, or change whether the device volume affects a given class, we need to touch the OEM's device driver. This is typically only a one or two line change, but it still makes life difficult.

·         There is currently no way to query the current attenuation level for a class.

·         The waveapi component of the core OS implements a “gain class” infrastructure as part of the software mixer component to accomplish a similar goal. However, this was implemented after wavedev2 had shipped and its design is incompatible with the way Smartphone needs it to work. This is the main reason Smartphone/PPC-Phone devices need to use wavedev2 as a starting point for a driver. It would be nice if waveapi’s software mixer implemented the same gain class design so we could pull the code out of wavedev2 and simplify its design.

 

Responses to questions:

1. Where is MM_WOM_SETSECONDARYGAINCLASS defined?

- It should be in audiosys.h, but that might have moved around a bit. The definitions you're looking for are:

    #define MM_WOM_SETSECONDARYGAINCLASS   (WM_USER)

    #define MM_WOM_SETSECONDARYGAINLIMIT   (WM_USER+1)

    #define MM_WOM_FORCESPEAKER            (WM_USER+2)

Keep in mind that these are very likely to change or go away in future releases of the operating system.

The wavedev2 wave driver sample code includes a fairly primitive MIDI synthesizer as part of the source code. The reason for its relative simplicity dates back to the question: "What can you implement in two weeks with no additional ROM hit to run on a 100MHz ARM processor?"

Seriously, the original goal was just to provide a sample implementation, with the assumption that OEMs would replace it by licensing or developing their own synthesizer (which, in fact, many OEMs do). We also (correctly, I believe) anticipated that in the future ringtones would more likely be implemented as compressed audio files (e.g. WMA). It made little sense to pour money/development resources into developing our own high-quality MIDI synthesizer, and there are third parties who handle that quite well anyway  (such as Beatnik).

Having said that, the wavedev2 sample still includes the primitive MIDI synthesizer, and it probably ships unmodified on some platforms, so it might be interesting to know how to use it.

The wavedev2 sample MIDI synth has the following attributes:

- Instruments: Only sine wave generation, no other instruments.

- Polyphony: No limit on the number of midi streams. Number of concurrent notes per stream is limited to 32 (controlled by a #define in the driver). Realistically, the total number of notes will be limited by the amount of CPU MIPS. I don’t think we’ll have any problem with 8-10 notes.

- Sample accurate timing

- Extensions to support arbitrary frequency tone generation (e.g. for things like DTMF, ringback, busy, etc.) and tempo changes.

 

The OS doesn't support the standard Win32 MIDI apis, so we had to invent our own somewhat proprietary method. To do this without creating new API entry points, we implemented MIDI as a proprietary wave format. To play MIDI notes, you open the wave device using waveOutOpen with a WAVEFORMAT_MIDI format structure, which is defined in wfmtmidi.h as:

 

typedef struct _WAVEFORMAT_MIDI

{

    WAVEFORMATEX wfx;

    UINT32 USecPerQuarterNote;

    UINT32 TicksPerQuarterNote;

} WAVEFORMAT_MIDI, *LPWAVEFORMAT_MIDI;

 

The wfx.wFormatTag field should be filled in with WAVE_FORMAT_MIDI, which is defined in the header as:

 

#define WAVE_FORMAT_MIDI 0x3000
 

(In retrospect we should have used a WaveFormatExtensible structure, which uses a GUID, rather than arbitrarily allocating another format tag, since there's a chance we'll collide with some other OEM format tag).

 

You then start passing buffers to the driver using waveOutWrite, just as you would for wave data. The data in the buffers consists of an array of WAVEFORMAT_MIDI_MESSAGE structures, which are defined as:

 

typedef struct _WAVEFORMAT_MIDI_MESSAGE

{

    UINT32 DeltaTicks;

    DWORD  MidiMsg;

} WAVEFORMAT_MIDI_MESSAGE;

 

The wave driver will automatically take care of the timing of when to sequence each midi message, based on the relationship between the DeltaTicks field of the next midi event and the USecPerQuarterNote and TicksPerQuarterNote fields of the WAVEFORMAT_MIDI structure.
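To make that relationship concrete, here is a sketch of the conversion from DeltaTicks to a delay in microseconds. The formula and the fallback defaults (500000 and 96) are taken from the comments in the miditest.cpp sample further down; the function names are hypothetical, and the driver's actual arithmetic may differ in rounding.

```cpp
#include <cassert>
#include <cstdint>

// Defaults the sample driver substitutes when a field is 0
// (per the comments in miditest.cpp).
const std::uint32_t kDefaultUSecPerQuarterNote  = 500000;
const std::uint32_t kDefaultTicksPerQuarterNote = 96;

std::uint32_t OrDefault(std::uint32_t value, std::uint32_t fallback)
{
    return value != 0 ? value : fallback;
}

// Delay, in microseconds, represented by deltaTicks.
std::uint64_t DeltaTicksToUSec(std::uint32_t deltaTicks,
                               std::uint32_t usecPerQuarterNote,
                               std::uint32_t ticksPerQuarterNote)
{
    usecPerQuarterNote  = OrDefault(usecPerQuarterNote, kDefaultUSecPerQuarterNote);
    ticksPerQuarterNote = OrDefault(ticksPerQuarterNote, kDefaultTicksPerQuarterNote);
    assert(ticksPerQuarterNote != 0);  // guards the division below

    // Multiply first, in 64 bits, so the division does not lose precision.
    return static_cast<std::uint64_t>(deltaTicks) * usecPerQuarterNote / ticksPerQuarterNote;
}
```

With USecPerQuarterNote=1000000 and TicksPerQuarterNote=100, a DeltaTicks of 100 is a one second delay; with 100000 and 1, each tick is a tenth of a second.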

 

You can send just about any MIDI message to the driver, but the sample drivers only process the midi messages for “note on”, “note off”, and the control change message for “all notes off”; any other MIDI message will be ignored by the driver.
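For reference, the note on/off values used by the samples follow the usual packed layout of a short MIDI message: status byte in the low 8 bits, note number in the next 8, velocity in the next 8. A minimal helper, with names invented for this example, might look like this:

```cpp
#include <cassert>
#include <cstdint>

// Standard MIDI status bytes for channel 0.
const std::uint32_t kStatusNoteOff = 0x80;
const std::uint32_t kStatusNoteOn  = 0x90;

// Builds a MidiMsg value in the layout used by miditest.cpp:
// status in bits 0-7, note in bits 8-15, velocity in bits 16-23.
std::uint32_t MakeNoteMsg(std::uint32_t status, std::uint8_t note, std::uint8_t velocity)
{
    assert(note <= 0x7F && velocity <= 0x7F);  // MIDI data bytes are 7-bit
    return status |
           (static_cast<std::uint32_t>(note) << 8) |
           (static_cast<std::uint32_t>(velocity) << 16);
}
```

MakeNoteMsg(kStatusNoteOn, 63, 0x7F) produces the same value as the expression 0x7F0090 | (63<<8) in the sample.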

 

The sample driver also supports proprietary messages for playing an arbitrary frequency and for changing the tempo during playback:

 

MIDI_MESSAGE_FREQGENON and MIDI_MESSAGE_FREQGENOFF are roughly analogous to NoteOn/NoteOff, but they take a 16-bit frequency value rather than a 7-bit note value, and they always play a sine wave. This can be useful for things like DTMF, ringback, busy, and other call progress tones which require exact frequencies and which don’t map exactly to the frequencies supported by the musical scale. For these messages, the upper 8 bits of MidiMsg (which are normally 0) are set to either MIDI_MESSAGE_FREQGENON or MIDI_MESSAGE_FREQGENOFF. The next 8 bits are the 7-bit velocity (e.g. volume) (the top bit must be 0), and the lowest 16 bits are the desired frequency.
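The bit layout can be sketched as a small packing helper. The type codes 0x20 and 0x30 below are inferred from the literal values used in the tonetest.cpp sample (0x207F0000 and 0x307F0000) rather than copied from wfmtmidi.h, and the constant and function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Upper-byte type codes, inferred from the literals in tonetest.cpp.
// Check wfmtmidi.h for the real MIDI_MESSAGE_FREQGENON/OFF definitions.
const std::uint32_t kFreqGenOn  = 0x20;
const std::uint32_t kFreqGenOff = 0x30;

// Builds a MidiMsg value for the proprietary frequency messages:
// type in bits 24-31, 7-bit velocity in bits 16-23, frequency in bits 0-15.
std::uint32_t MakeFreqMsg(std::uint32_t type, std::uint8_t velocity, std::uint16_t frequencyHz)
{
    assert(type <= 0xFF);
    assert(velocity <= 0x7F);  // top bit of the velocity byte must be 0
    return (type << 24) |
           (static_cast<std::uint32_t>(velocity) << 16) |
           frequencyHz;
}
```

For instance, MakeFreqMsg(kFreqGenOn, 0x7F, 440) yields 0x207F01B8, which is the "Note on 440Hz" value used for the ringback tone.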

 

MIDI_MESSAGE_UPDATETEMPO can be used to update the USecPerQuarterNote parameter in the middle of a stream. For these messages, the upper 8 bits of MidiMsg (which are normally 0) are set to MIDI_MESSAGE_UPDATETEMPO. The low 24 bits are the updated tempo value.

Other notes:

  • While the MIDI synth handles sequencing and tone generation, it doesn't include any provision for MIDI file parsing. If you want to play MIDI files, you'll need to implement a MIDI file parser, extract the data and timestamp information, and feed it down to the wave driver.
  • The sample MIDI synth only supports sine waves, and has no concept of channels, instruments, or patch changes: all instruments are going to be remapped to sine wave tones. In general this yields a recognizable melody, with one major exception: in MIDI, percussion is considered a single instrument, with different types of drums, cymbals, etc. mapped to different note values. If you try to play an arbitrary MIDI stream which includes percussion, the various percussive sounds are going to be played as apparently random sine waves. It's going to sound awful.

I've appended two samples below. The first, miditest.cpp, plays a midi scale. The second plays a ringback tone for 30 seconds.

miditest.cpp (plays an 8-note midi scale):

#include "windows.h"
#include "wfmtmidi.h"

int _tmain(int argc, TCHAR *argv[])
{
    // Code to play a simple 8-note scale.
    unsigned char Scale[8] =
    {
        63,65,67,68,70,72,74,75
    };

    // Build a MIDI waveformat header
    WAVEFORMAT_MIDI wfm;
    memset(&wfm,0,sizeof(wfm));
    wfm.wfx.wFormatTag=WAVE_FORMAT_MIDI;
    wfm.wfx.nChannels=1;
    wfm.wfx.nBlockAlign=sizeof(WAVEFORMAT_MIDI_MESSAGE);
    wfm.wfx.cbSize=WAVEFORMAT_MIDI_EXTRASIZE;

    // These fields adjust the interpretation of DeltaTicks, and thus the rate of playback
    wfm.USecPerQuarterNote=1000000;   // Set to 1 second. Note driver will default to 500000 if we set this to 0
    wfm.TicksPerQuarterNote=100;      // Set to 100. Note driver will default to 96 if we set this to 0

    HANDLE hEvent;
    hEvent = CreateEvent( NULL,TRUE,FALSE,NULL);

    MMRESULT Result;
    HWAVEOUT hWaveOut;

    // Open the waveout device
    Result = waveOutOpen(&hWaveOut, 0, (LPWAVEFORMATEX)&wfm, (DWORD)hEvent, 0, CALLBACK_EVENT);

    if (Result!=MMSYSERR_NOERROR)
    {
        return -1;
    }

    // Build a MIDI buffer with 16 MIDI messages.
    int i,j;
    WAVEFORMAT_MIDI_MESSAGE MidiMessage[16];
    for (i=0,j=0;i<8;i++,j+=2)
    {
        MidiMessage[j].DeltaTicks=100;      // Wait 1 second : (DeltaTicks * (UsecPerQuarterNote/TicksPerQuarterNote))
        MidiMessage[j].MidiMsg=0x7F0090 | ((Scale[i])<<8);   // Note on
        MidiMessage[j+1].DeltaTicks=100;    // Wait 1 second
        MidiMessage[j+1].MidiMsg=0x7F0080 | ((Scale[i])<<8); // Note off
    }

    WAVEHDR WaveHdr;
    WaveHdr.lpData = (LPSTR)MidiMessage;
    WaveHdr.dwBufferLength = sizeof(MidiMessage);
    WaveHdr.dwFlags = 0;
    Result = waveOutPrepareHeader(hWaveOut,&WaveHdr,sizeof(WaveHdr));

    // Play the data
    Result = waveOutWrite(hWaveOut,&WaveHdr,sizeof(WaveHdr));

    // Wait for playback to complete
    WaitForSingleObject(hEvent,INFINITE);

    // Cleanup
    Result = waveOutUnprepareHeader(hWaveOut,&WaveHdr,sizeof(WaveHdr));
    Result = waveOutClose(hWaveOut);
    return 0;
}

tonetest.cpp (Plays a 30 second ringback tone):

#include "windows.h"
#include "wfmtmidi.h"

/*
DTMF frequencies:

DTMF stands for Dual Tone Multi Frequency. These are the tones you get when
you press a key on your telephone touchpad. The tone of the button is the
sum of the column and row tones. The ABCD keys do not exist on standard
telephones.

                        Frequency 1

                    1209  1336  1477  1633

                697   1     2     3     A

                770   4     5     6     B
Frequency 2
                852   7     8     9     C

                941   *     0     #     D

Frequencies of other telephone tones

Type                Hz          On      Off
---------------------------------------------------------------------
Dial Tone         350 & 440     ---     ---
Busy Signal       480 & 620     0.5     0.5
Toll Congestion   480 & 620     0.2     0.3
Ringback (Normal) 440 & 480     2.0     4.0
Ringback (PBX)    440 & 480     1.5     4.5
Reorder (Local)   480 & 620     3.0     2.0
Invalid Number    200 & 400
Hang Up Warning 1400 & 2060     0.1     0.1
Hang Up         2450 & 2600     ---     ---
*/

int _tmain(int argc, TCHAR *argv[])
{
    WAVEFORMAT_MIDI wfm = {0};
    wfm.wfx.wFormatTag=WAVE_FORMAT_MIDI;
    wfm.wfx.nBlockAlign=sizeof(WAVEFORMAT_MIDI_MESSAGE);
    wfm.wfx.cbSize=WAVEFORMAT_MIDI_EXTRASIZE;

    // Force each tick to be 1/10 sec
    wfm.USecPerQuarterNote=100000;
    wfm.TicksPerQuarterNote=1;

    MMRESULT Result;
    HWAVEOUT hWaveOut;
    HANDLE hEvent=CreateEvent( NULL,TRUE,FALSE,NULL);

    Result = waveOutOpen(&hWaveOut, 0, (LPWAVEFORMATEX)&wfm, (DWORD)hEvent, 0, CALLBACK_EVENT);
    if (Result!=MMSYSERR_NOERROR)
    {
        return -1;
    }

    // Create a buffer for 5 midi messages
    WAVEFORMAT_MIDI_MESSAGE MidiMessage[5];

    MidiMessage[0].DeltaTicks=0;
    MidiMessage[0].MidiMsg=0x207F0000 | 440;    // Note on 440Hz

    MidiMessage[1].DeltaTicks=0;
    MidiMessage[1].MidiMsg=0x207F0000 | 480;    // Note on 480Hz

    MidiMessage[2].DeltaTicks=20;               // Wait 2 sec
    MidiMessage[2].MidiMsg=0x307F0000 | 440;    // Note off 440Hz

    MidiMessage[3].DeltaTicks=0;
    MidiMessage[3].MidiMsg=0x307F0000 | 480;    // Note off 480Hz

    MidiMessage[4].DeltaTicks=40;               // Wait 4 sec
    MidiMessage[4].MidiMsg=0;                   // Dummy msg, does nothing

    WAVEHDR WaveHdr;

    // Point wave header to MIDI data
    WaveHdr.lpData = (LPSTR)MidiMessage;
    WaveHdr.dwBufferLength = sizeof(MidiMessage);

    // Loop on this buffer 20 times
    WaveHdr.dwFlags = WHDR_BEGINLOOP|WHDR_ENDLOOP;
    WaveHdr.dwLoops = 20;

    // Play it!
    Result = waveOutPrepareHeader(hWaveOut,&WaveHdr,sizeof(WaveHdr));
    Result = waveOutWrite(hWaveOut,&WaveHdr,sizeof(WaveHdr));

    // Wait for it to be done or 30 seconds, whichever comes first
    WaitForSingleObject(hEvent,30000);
    Result = waveOutReset(hWaveOut);
    Result = waveOutUnprepareHeader(hWaveOut,&WaveHdr,sizeof(WaveHdr));
    Result = waveOutClose(hWaveOut);

    return 0;
}

Q&A (I'll start moving my responses to comments here):

Q. I changed the definition of MidiMessage in the above sample to make it a dynamically allocated pointer, and now the sample doesn't work.

A. You need to also change the line that says "WaveHdr.dwBufferLength = sizeof(MidiMessage);" or else dwBufferLength will end up being 4 (the size of your pointer) rather than the size of the buffer. If you don't, the call to waveOutPrepareHeader will fail, as will everything else past that point. Blame me for not doing error checking in the sample code above.
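A sketch of the size calculation for a dynamically allocated buffer is below. `MidiMessage` is a local stand-in that mirrors the layout of WAVEFORMAT_MIDI_MESSAGE so the snippet compiles without wfmtmidi.h, and `MidiBufferBytes` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Local stand-in mirroring WAVEFORMAT_MIDI_MESSAGE (two 32-bit fields).
struct MidiMessage
{
    std::uint32_t DeltaTicks;
    std::uint32_t MidiMsg;
};

// Byte length to store in WAVEHDR.dwBufferLength for 'count' messages.
// With a pointer, sizeof(pMessages) is only the pointer size, so the
// length has to be computed from the element count instead.
std::size_t MidiBufferBytes(std::size_t count)
{
    assert(count > 0);
    return count * sizeof(MidiMessage);
}
```

In the sample this would become something like WaveHdr.dwBufferLength = (DWORD)MidiBufferBytes(16), where 16 is the number of messages you allocated.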

Q. wfmtmidi.h is no longer present in the Windows Mobile 5 SDK.

A. You should be able to grab it from an older SDK. Keep in mind that this is really a fairly "unofficial" API which was really designed solely for OEM use to play ringtones. One shouldn't expect it to be present on every device.

Q. How come the ringtones on my phone sound like high-quality MIDI, but when I use the interface described above I only get low-quality sine waves?

A. There are two possibilities. The first is that the ringtone you hear is actually a compressed WMA file; WMA files are supported as ringtones. The second is that the OEM may have implemented their own proprietary MIDI synthesizer elsewhere in the system. To elaborate on the latter situation: There's a higher-level API, known as EventSound, where OEMs can plug in their own MIDI synthesizer (or arbitrary audio codec) to play ringtones or other system sounds. This API isn't open to ISVs (it's very subject to change from release to release). The actual implementation of this will vary greatly from device to device.

Q. Didn't Windows CE support a higher quality MIDI synth at some point in the past as part of DirectShow?

A. At one point DirectMusic was ported to Windows CE and shipped to support playback of MIDI files. However, for a variety of reasons (performance, code size, RAM usage, stability) it was unacceptable. As far as I know no one ever shipped a product using it, and it was dropped from the product in successive releases.

 

This is my first blog post, so please feel free to leave feedback with questions or comments, especially if you feel I've gotten anything wrong or if there's some critical bit of information missing.

Windows CE currently ships audio driver samples descended from three distinct codebases: MDD/PDD, WaveDev2, and UAM. There are historical and functional reasons for this, but the existence of different driver models that all do more-or-less the same thing has caused some confusion. I'll try to clear things up a little in this posting.

First off, all three sample designs adhere to the same WaveAPI driver interface. They all hook into the system as device drivers, export WAV_Open, WAV_IOControl, WAV_Close, etc. entry points, and handle IOCTL_WAV_MESSAGE IoControl codes to interact with the waveapi subsystem. That upper-edge is hardware independent, and all the hardware dependent code goes into the driver. The difference between the samples is in their internal design.

MDD/PDD

The oldest design and the one most in use today among Windows CE embedded platforms is the MDD/PDD model. The MDD/PDD implementation splits the driver into two pieces, a "sort-of hardware independent" MDD layer, and a "really hardware dependent" PDD layer. The MDD portion is shipped as public code (in public\COMMON\oak\drivers\wavedev\mdd), and generates a library named wavemdd.lib. The PDD layer must be written (or ported from public\COMMON\oak\drivers\wavedev\pdd) by the OEM. To build a complete driver, the two layers are statically linked together. Between the MDD and PDD layers there is a functional interface defined by public\common\oak\inc\waveddsi.h.

The waveapi driver interface already does a pretty good job of distilling hardware dependencies down to the driver level, so one might wonder how MDD/PDD can further separate hardware independent/dependent layers. To do this, the MDD layer makes a couple of assumptions about the way the hardware works and what types of features you want to support.

Here are some assumptions MDD/PDD makes:

·         Only one device (waveOutGetNumDevs always returns 1)

·         Only one stream per device (e.g. one input and one output stream). Note that waveapi includes an internal “software mixer” which can virtualize the single output stream into multiple streams at the application level.

·         Input and output DMA share the same interrupt.

By making these assumptions, the MDD/PDD model greatly simplifies the PDD layer, and the MDD/PDD driver is relatively easy to port if you have fairly generic audio hardware and you have fairly generic needs. However, if your hardware is nonstandard, or if you need to implement some special handling, you may find yourself itching to modify the MDD source code. At that point you may be fighting against the MDD/PDD interface design and creating more complexity than needed.

Wavedev2

At the start of the Smartphone project in 2000 we had a number of audio requirements which we found the MDD/PDD model could not meet without major changes to the MDD/PDD interface. In addition, at that point in time (WinCE 3.0) there was no waveapi “software mixer” to allow us to play multiple sounds concurrently, so we knew we would have to take care of that in the driver. The solution was to start over and implement a new design which became informally known as wavedev2 (the original wave driver was under platform\hornet\drivers\wavedev, so when it came time to start on the new design it got put in the wavedev2 subdirectory).

Wavedev2 is a monolithic design in which all the source files are located in a single directory. To port a wavedev2 driver you just copy all the files from an existing sample and start modifying. This actually isn’t as bad as it sounds because in most cases the only files you need to modify are hwctxt.h and hwctxt.cpp. In retrospect it would have been better to put the files in different directories to make this a little more clear, reduce the tendency of OEMs to make random changes in the other files, and simplify the task of fixing bugs in the other files. That's probably something we'll be looking at cleaning up in the future.

The most recent wavedev2 sample was shipped as part of WinCE 6 under public\common\oak\drivers\wavedev\wavedev2\ensoniq. This latest wavedev2 driver includes the following features which are not found on the other driver implementations:

-        “MIDI” synthesizer. I put MIDI in quotes because, frankly, it’s a pretty minimal implementation which only supports sine wave output (this will probably be another blog topic). However, it works great for the types of things a phone needs to do: play DTMF and call progress tones and simple melodies. (See The Wavedev2 MIDI Implementation )

-       Sample-rate-conversion and mixing on both input and output streams. The driver can mix multiple output streams at different sample rates into a single output stream (something that can now be done with the MDD/PDD driver using the software mixer). It can also split the single input stream and source it to multiple applications at different sample rates (something no other driver design can currently do).

-       A “gain class” interface. Each output stream is associated with a specific class. Whenever an app creates a new stream it is associated with class 0, although the app can move its stream to a different class via a waveOutMessage call to the driver. The system can use a separate waveOutMessage call to the driver to control the volume level on a per-class basis. This interface is used by the shell to do things like mute audio playback when a phone call is in progress. This is probably another blog topic for later. (See The Wavedev2 Gainclass Implementation )

-       A “forcespeaker” interface which is used by the shell to “hint” to the driver that a specific sound should be played out the speaker even if a headset is plugged in. This is typically used to allow an OEM to play ringtones out a speaker even if a headset is plugged in. (See The Wavedev2 ForceSpeaker API )

-       Support for an S/PDIF interface and for streaming of WMAPro compressed content across S/PDIF. This is a recent addition, specific to the Ensoniq version, which was used as a proof-of-concept for the Tomatin project. (See Multichannel Audio in Windows CE )

[Note: a previous version of this blog claimed that a wavedev2 sample shipped with the Tomatin (NMD) feature pack under public\fp_nmd\common\oak\drivers\wavedev\wavedev2\ensoniq. I was wrong; the files did not ship in that release. I apologize to anyone I misled. The sample code in the CE6 release should be backward compatible, although I have no idea of whether there are any licensing issues with using CE6 sample code with a CE5 device]

If you’re developing a Windows Mobile Smartphone or PocketPC Phone, you pretty much have to start with the wavedev2 sample: the system depends on a number of the extensions implemented in wavedev2. On the other hand, if you’re developing an embedded Windows CE product you can use whichever design best fits your needs.

UAM

During the development of WinCE 4.2 the audio team was working on adding support for DirectSound and needed a sample driver to demonstrate exposing DirectSound support from the device driver. As was discovered during the Smartphone effort, retrofitting the MDD/PDD driver would entail a number of changes. Instead, a new monolithic driver was written using some bits of the wavedev2 design, with added support for the Ensoniq-specific feature of mixing two audio streams in hardware (and falling back to the software mixer for any additional streams). While there are superficial similarities between UAM and Wavedev2, they're still pretty different.

However, support for DirectSound was dropped in WinCE 5.0, and it’s very rare to find audio designs that support mixing audio streams in hardware. There’s absolutely nothing wrong with UAM, and many OEMs still use it as the basis for their audio driver ports; but for new designs it doesn’t add much value over either of the other models.

 

Still Image Capture

 

Windows Mobile 5.0 contains an Image Sink Filter (a DShow filter). This filter encodes the image data and writes the encoded image to a file. For encoding, the Image Sink Filter uses the Imaging API to access the installed Imaging encoders. The still image encoders that Microsoft ships with Windows Mobile 5.0 are not optimized and should, in general, be replaced by OEMs with encoders optimized for the particular platform. One limitation of the Imaging framework is that it only accepts RGB input, i.e., the raw buffer sent to an Imaging encoder must be in RGB format. For this reason, the Image Sink Filter does not accept YCbCr.

 

Besides accepting RGB data, the Image Sink Filter also accepts pre-encoded JPG data. This is useful when an optimized still image encoder that accepts YCbCr is available on a given platform. For this to work, the camera driver does the actual encoding and exposes a format whose CSDATARANGE SubFormat field is equal to the MEDIASUBTYPE_IJPG GUID. The encoded buffer is then passed down to the Image Sink Filter simply for file I/O. This optimized encoding can also be done in a DShow Transform filter, with the transform filter exposing MEDIASUBTYPE_IJPG on its output pin.

 

OEMs can still use the encoders that are shipped with Windows Mobile 5.0. These encoders provide a stable, platform-independent solution. However, as the resolution of the still image increases, the performance of these encoders deteriorates.

 

The following steps show how you would change the sample null camera driver to provide JPG data on the STILL pin.

 

Step 1: Define the format

 

In adapterprops.h, define the following:

 

#define FOURCC_IJPG mmioFOURCC('I', 'J', 'P', 'G')
#define MEDIASUBTYPE_IJPG {0x47504A49, 0x0000, 0x0010, 0x80, 0x00, 0x00, 0xaa, 0x00, 0x38, 0x9b, 0x71}

#define DX          176
#define DY          144
#define DBITCOUNT    16
#define FRAMERATE    15

CS_DATARANGE_VIDEO DCAM_StreamMode =
{
    // CSDATARANGE
    {
        sizeof (CS_DATARANGE_VIDEO), // FormatSize
        0, // Flags
        SAMPLESIZE, // SampleSize: replace with the maximum sample size that the JPG data would take for the current resolution and image quality
        0, // Reserved
        STATIC_CSDATAFORMAT_TYPE_VIDEO, // MajorFormat
        MEDIASUBTYPE_IJPG, // SubFormat
        STATIC_CSDATAFORMAT_SPECIFIER_VIDEOINFO // Specifier
    },
    TRUE, // BOOL, bFixedSizeSamples (all samples same size?)
    TRUE, // BOOL, bTemporalCompression (all I frames?)
    CS_VIDEOSTREAM_CAPTURE, // StreamDescriptionFlags (CS_VIDEO_DESC_*)
    0, // MemoryAllocationFlags (CS_VIDEO_ALLOC_*)
    // _CS_VIDEO_STREAM_CONFIG_CAPS
    {
        // Omitted for this sample. Please refer to adapterprops.h in the sample null camera driver for ways to fill in this structure
        ………
        ………
    },
    // CS_VIDEOINFOHEADER
    {
        0,0,DX,DY, // RECT rcSource;
        0,0,DX,DY, // RECT rcTarget;
        BITRATE, // DWORD dwBitRate;
        0L, // DWORD dwBitErrorRate;
        REFTIME_15FPS, // REFERENCE_TIME AvgTimePerFrame;
        sizeof (CS_BITMAPINFOHEADER), // DWORD biSize;
        DX, // LONG biWidth;
        DY, // LONG biHeight;
        3, // WORD biPlanes;
        DBITCOUNT, // WORD biBitCount;
        FOURCC_IJPG | BI_SRCPREROTATE, // DWORD biCompression;
        SAMPLESIZE, // DWORD biSizeImage;
        0, // LONG biXPelsPerMeter;
        0, // LONG biYPelsPerMeter;
        0, // DWORD biClrUsed;
        0, // DWORD biClrImportant;
        0, 0, 0 // DWORD dwBitMasks[3]
    }
};
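The FOURCC and the first field of the GUID in Step 1 are the same 32-bit value, which is easy to get wrong if you retype either one by hand. The following is a minimal standalone sketch that checks this at compile time. MakeFourCC is a hypothetical local helper that mirrors the byte packing of the mmioFOURCC macro (first character in the low byte); it is not part of the camera driver headers.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the mmioFOURCC packing:
// the first character ends up in the least significant byte.
constexpr std::uint32_t MakeFourCC(char c0, char c1, char c2, char c3)
{
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(c0))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c1)) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c2)) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c3)) << 24);
}

// 'I','J','P','G' packed the same way FOURCC_IJPG is.
constexpr std::uint32_t kFourCcIjpg = MakeFourCC('I', 'J', 'P', 'G');

// First field (Data1) of the MEDIASUBTYPE_IJPG GUID from Step 1.
constexpr std::uint32_t kIjpgGuidData1 = 0x47504A49;

// Fails to compile if the FOURCC and the GUID ever drift apart.
static_assert(kFourCcIjpg == kIjpgGuidData1,
              "FOURCC_IJPG must match Data1 of MEDIASUBTYPE_IJPG");
```

A check like this could sit next to the definitions in a real driver, using the real macros instead of the stand-ins above.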

Step 2: Add this format to the list of formats of the STILL pin

 

In the sample null camera driver, you would modify CCameraDevice::Initialize() in cameradevice.cpp to have:

 

    m_PinVideoFormat[STILL].categoryGUID = PINNAME_VIDEO_STILL;
    m_PinVideoFormat[STILL].ulAvailFormats = 1;
    m_PinVideoFormat[STILL].pCsDataRangeVideo[0] = &DCAM_StreamMode;

 

Step 3: Modify the BufferFill() function

 

Use the biCompression field of the current format to determine whether the MEDIASUBTYPE_IJPG format is selected. In BufferFill() you can add the following check:

 

UINT
BufferFill( PUCHAR pImage, PCS_VIDEOINFOHEADER pCsVideoInfoHdr, IMAGECOMMAND Command, bool FlipHorizontal, LPVOID lpParam )
{
    if( NULL == pCsVideoInfoHdr )
    {
        return -1;
    }

    DWORD biCompression = pCsVideoInfoHdr->bmiHeader.biCompression;

    if ( FOURCC_IJPG == (biCompression & ~BI_SRCPREROTATE) )
    {
        // Real drivers would add code here to handle JPEG encoding.
        // The value returned here should be the actual size of the
        // JPEG data that is written to the buffer. The biSizeImage
        // field of the format is used below only as a placeholder.
        return pCsVideoInfoHdr->bmiHeader.biSizeImage;
    }

    ……….
    ……….
}
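The check in Step 3 can be tried outside the driver with a small standalone sketch like the one below. Everything in it is a stand-in: kSrcPreRotate is assumed to have the value 0x8000 (take the real BI_SRCPREROTATE definition from your platform headers), and CopyEncodedJpeg is a hypothetical helper that only illustrates the idea of returning the number of bytes actually written instead of the maximum sample size.

```cpp
#include <cstdint>
#include <cstring>

// Assumed value of BI_SRCPREROTATE; use the platform header definition in a real driver.
constexpr std::uint32_t kSrcPreRotate = 0x8000;

// 'I','J','P','G' packed with the first character in the low byte.
constexpr std::uint32_t kFourCcIjpg = 0x47504A49;

// True when biCompression selects IJPG, with or without the pre-rotate flag set.
constexpr bool IsIjpgFormat(std::uint32_t biCompression)
{
    return (biCompression & ~kSrcPreRotate) == kFourCcIjpg;
}

// Hypothetical helper: copies already-encoded JPEG bytes into the sample
// buffer and returns how many bytes were written, or 0 if they do not fit.
inline std::uint32_t CopyEncodedJpeg(unsigned char* dst, std::uint32_t dstCapacity,
                                     const unsigned char* jpeg, std::uint32_t jpegSize)
{
    if (dst == nullptr || jpeg == nullptr || jpegSize > dstCapacity)
    {
        return 0;
    }
    std::memcpy(dst, jpeg, jpegSize);
    return jpegSize;
}
```

In a real BufferFill() the encoder output size would take the place of jpegSize, and the capacity would come from the biSizeImage value negotiated for the pin.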

 

Step 4: Handle PROPSETID_VIDCAP_VIDEOCOMPRESSION

 

The default implementation (AdapterHandleCompressionRequests()) for PROPSETID_VIDCAP_VIDEOCOMPRESSION returns ERROR_INVALID_PARAMETER. The camera application on Windows Mobile 5.0 and later uses this interface to set the encoder quality. You would need to change this implementation to return ERROR_SUCCESS after correctly handling the various properties in this property set. Failure to do so would result in the camera application raising an exception at initialization.
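As a rough illustration of Step 4, here is a standalone sketch of a handler that accepts a quality setting and reports success. The names CompressionProperty, EncoderState and HandleCompressionSet, as well as the 0 to 10000 quality range, are all hypothetical; a real driver would use the property IDs and structures from the camera driver headers inside AdapterHandleCompressionRequests(). Only the two error code values are the standard Win32 ones.

```cpp
#include <cstdint>

// Standard Win32 error code values.
constexpr std::uint32_t kErrorSuccess = 0;            // ERROR_SUCCESS
constexpr std::uint32_t kErrorInvalidParameter = 87;  // ERROR_INVALID_PARAMETER

// Hypothetical stand-in for the property IDs in the compression property set.
enum class CompressionProperty { Quality, Unknown };

// Hypothetical encoder state; the 0..10000 quality scale is an assumption.
struct EncoderState
{
    long quality = 7500;
};

// Applies a "set" request and returns a Win32-style status code.
inline std::uint32_t HandleCompressionSet(EncoderState& state,
                                          CompressionProperty prop,
                                          long value)
{
    switch (prop)
    {
    case CompressionProperty::Quality:
        if (value < 0 || value > 10000)
        {
            return kErrorInvalidParameter;  // reject out-of-range quality
        }
        state.quality = value;              // remember it for the next encode
        return kErrorSuccess;
    default:
        return kErrorInvalidParameter;      // property not handled in this sketch
    }
}
```

The point is only that a recognized property ends in ERROR_SUCCESS with the value stored for the encoder, instead of falling through to the default error.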

 

 