New book: Developing Microsoft Media Foundation Applications

Microsoft Press is pleased to announce the new book Developing Microsoft Media Foundation Applications (Print ISBN: 978-0-7356-5659-8, 384 pages) by Anton Polinger.

Media Foundation provides the core tools you need to create powerful and professional media applications and components. The author, Anton Polinger, is an expert who has been developing Microsoft media technologies for over a decade. This book shows you how to build applications that can capture different types of video and audio files, process media information, and stream media over the Internet. Filled with useful examples, the book gives you a complete introduction to the Media Foundation API and teaches you to develop custom Media Foundation components and create advanced video and audio applications—while showing you how to solve common development problems along the way.

Read Chapter 3, “Media Playback,” to get a feel for the depth of this book.

Chapter 3 - Media Playback

In this chapter:

Basic File Rendering with Media Sessions

Building the Media Pipeline

Now that we have covered some of the basic Microsoft Media Foundation (MF) concepts, we can look at the process of writing MF applications. In this chapter, you’ll create a simple video file player—a Windows application that accepts a media file and plays it back in a window. This file player demonstrates the core ideas of Media Foundation. With those core concepts in hand, you will be able to build more advanced applications in later chapters.

Before you begin writing Media Foundation applications, you need at least a basic understanding of COM APIs. COM is a Microsoft technology, introduced in the early 1990s, that enables disparate objects to link to each other. At its core, COM provides a way for various objects to interoperate without knowing each other’s internal structure or details.

More Info You can find a brief overview of COM in Appendix B. The overview is not meant to be a full description of COM but is designed to give you a refresher of some core COM concepts.

MF uses COM extensively but is not itself a pure COM API. A pure COM API consists of only COM objects that can be instantiated exclusively by the COM CoCreateInstance() function. Instead, MF uses a mix of COM and normal objects, as well as several other standard functions. For example, to instantiate an MF playback topology, you cannot create an instance of a well-known and registered class by calling CoCreateInstance() with the topology’s class ID (CLSID) but must instead use the MFCreateTopology() function.
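For example, here is a minimal sketch of that difference, using the ATL CComPtr smart pointer that the rest of the chapter’s code also relies on:

// MF factory function - the only supported way to create a topology object
CComPtr<IMFTopology> pTopology;
HRESULT hr = MFCreateTopology(&pTopology);

// CoCreateInstance() is not an option here - the topology object has no
// registered CLSID that could be passed to it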

Despite that, internally MF uses some COM functionality. Therefore, at the beginning of your application you must initialize the COM system by calling the CoInitializeEx() function. This will initialize the COM library and prepare it for use. After the application is done executing, you should also call CoUninitialize() to shut down the COM library.

Note MF is a free-threaded system, which means that COM interface methods can be invoked from arbitrary threads. When calling CoInitializeEx(), the sample initializes COM with apartment-threaded concurrency by passing in the COINIT_APARTMENTTHREADED parameter. Because calls can arrive on arbitrary threads, your objects might also need to use synchronization primitives, such as locks, to control access to internal variables.

In most cases you should use implementations of interfaces that are provided by Media Foundation. Some MF APIs rely on additional functionality that is only present in the MF objects that implement these interfaces. For example, if you are going to use the default media session, you should not implement your own IMFTopology object, but use the object returned by the MFCreateTopology() API.

To demonstrate some of the core concepts of Media Foundation, this chapter uses and analyzes a sample player provided with the downloadable code for this book.

More Info For instructions on how to obtain and download the sample code, see the Introduction.

The player is a very simple MF application designed to play various media files using the MF components that are built into Windows Vista and Windows 7.

Background This player application started out as the BasicPlayback MSDN sample (http://msdn.microsoft.com/en-us/library/bb970475(v=VS.85).aspx) but has been heavily modified. The media session eventing model has been separated from the Win32 application layer, and the topology-building functions were pulled into a separate class. In addition, redundant code was removed. The purpose of these modifications was to segment various concepts of the player into self-contained classes, and thus simplify exploring them in manageable chunks.

The player is built on top of a basic Win32 application, with all its functionality encapsulated in winmain.cpp. Because the core purpose of this book is to explore Media Foundation, it does not cover the application itself.

Over the course of this chapter, you’ll explore the two main classes that compose the player: CPlayer, which encapsulates media session behavior; and CTopoBuilder, which builds the media topology for the player. These two classes contain all the MF functionality of the player and abstract it from the Win32 components, thus greatly simplifying the code.

The CPlayer class wraps and encapsulates everything you need to do to instantiate core playback Media Foundation objects and control video playback. You can find the code for the CPlayer class in the Player.h and Player.cpp files in the downloadable code.

The CTopoBuilder class, in turn, encapsulates the functions and objects needed to create a Media Foundation topology. The topology is used to initialize and connect the MF objects that are actually responsible for processing media data. The CTopoBuilder class is separate from CPlayer to simplify the code and separate each object’s areas of responsibility. You’ll explore topology building and MF pipeline object instantiation in the second half of this chapter. The CTopoBuilder class is defined in the TopoBuilder.h and TopoBuilder.cpp files in the downloadable code.

Article I. Basic File Rendering with Media Sessions

Just like Microsoft DirectShow, the Media Foundation architecture uses a series of components connected to each other to process media files and display them to the user. Although these components can be instantiated and hooked up together in various ways, one of the most common means of managing these sets of data processing components is with a media session object. By using a media session object, you can generate a media pipeline and control the playback of the content.

The media session in MF serves roughly the same purpose as the graph in DirectShow. The media session holds all of the MF components that process the data and gives you the ability to start and stop playback. During playback, the media session pulls samples from the source, passes them through the MFTs, and sends them to the renderers. At the moment, however, you can ignore all that. Internal media session functionality will be discussed in more detail in Chapter 8, “Custom Media Sessions.”

Note In addition to the media session, MF contains another method for building and controlling playback topologies—the MFPlay API. However, MFPlay has been deprecated and therefore will not be covered in this book.

Because the media session is a basic concept used in most Media Foundation applications, it lies at the core of the player application example. As such, the core sample player class—CPlayer—wraps and encapsulates most of the functionality exposed by the media session. The CPlayer class uses the media session to first build a media pipeline, then control video playback, and finally, to clean up resources on shutdown. In addition, the CPlayer class also wraps and hides the complexities of the MF asynchronous messaging system from the main Win32 application.

As mentioned earlier, most MF operations are asynchronous, which makes them difficult to represent graphically without resorting to Unified Modeling Language (UML). In this case, I’ve chosen to demonstrate the basic life cycle of the sample player as a modified UML sequence diagram. Don’t worry: this book does not delve deeper into UML’s complexity and uses it only sparingly.

UML sequence diagrams are well suited for demonstrating lifelines and behaviors of multiple objects. The sequence diagrams used in this book do not follow all of the UML sequence diagram conventions but are used to convey the basic underlying ideas. The following diagram shows which object instantiates which, and the sequence of calls made by individual components. In addition, this book sometimes uses the vertical rectangles visible along some of the object lifelines to indicate the duration of various function calls. Thus the first vertical rectangle under the Player object indicates the duration of the CPlayer::OpenURL() method, and shows what calls that function makes. The vertical dimension in the diagram represents time, with later calls shown below earlier ones. For example, in the diagram, the Create Player call happens before the Set File URL call—which is why the first call is above the second.

Here is the basic sequence of calls in the application to start video playback:

[Figure G03DW01: Sequence diagram of the calls made to start video playback]

Here’s a detailed description of these calls and steps:

1. The application instantiates the CPlayer component. In the diagram, the rectangle with the name “Player” represents the object, and the dashed vertical line underneath the rectangle indicates the lifeline of the CPlayer object.

2. The application sends the file URL to the player by calling the CPlayer::OpenURL() method. This call triggers the rest of the playback sequence, shown in the subsequent steps. The application does not need to make any further calls to play the file.

3. The player instantiates the media session. During session creation, the player registers itself as the receiver for the session’s asynchronous MF events by passing a pointer to itself to the IMFMediaSession::BeginGetEvent() method.

4. The player then instantiates the topology. Topology building in this sample is done by the CTopoBuilder helper class, which will be covered in the “Building the Media Pipeline” section later in this chapter.

5. The player passes the topology to the media session.

6. After the topology is loaded, the media session instantiates all of the components indicated in the topology and fires an asynchronous MF event indicating that the topology is ready. Because the player registered itself as the receiver of session events in step 3, it now gets called by the session with a “topology-ready” event.

7. The topology ready event causes the player to start playback.

MF uses an asynchronous model for most operations. Internally, MF has special worker thread objects that execute scheduled asynchronous calls on separate threads, outside of the normal flow of execution that you might have seen in single-threaded applications. The asynchronous model will be covered in greater detail in Chapter 6, “Media Foundation Sources.” For now you can ignore the worker thread objects, and assume that the asynchronous objects make calls of their own volition. For example, in the diagram just shown, the media session does not directly call the player’s IMFAsyncCallback::Invoke() method, but instead schedules work with the worker thread object. The worker thread is the component that actually executes the Invoke() call.

More Info The asynchronous nature of MF applications makes debugging and determining the source of problems rather difficult. See Appendix A for a discussion on how to debug MF applications.

To understand how the player operates, you’ll examine each of these steps in detail.

Section 1.01 Creating the Player

Little actually occurs during player creation. Player creation involves calling the CPlayer constructor, which in turn initializes the MF system. Before you can start using Media Foundation calls, you need to initialize the MF system, which you do with the MFStartup() function. Here’s the code for the CPlayer constructor:

//
// CPlayer constructor - instantiates internal objects and initializes MF
//
CPlayer::CPlayer(HWND videoWindow, HRESULT* pHr) :
    m_pSession(NULL),
    m_hwndVideo(videoWindow),
    m_state(Closed),
    m_nRefCount(1)
{
    HRESULT hr = S_OK;

    do
    {
        // initialize COM
        hr = CoInitializeEx(NULL, COINIT_APARTMENTTHREADED);
        BREAK_ON_FAIL(hr);

        // Start up Media Foundation platform.
        hr = MFStartup(MF_VERSION);
        BREAK_ON_FAIL(hr);

        // create an event that will be fired when the asynchronous
        // IMFMediaSession::Close() operation is complete
        m_closeCompleteEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
        BREAK_ON_NULL(m_closeCompleteEvent, E_UNEXPECTED);
    }
    while(false);

    *pHr = hr;
}

The player constructor receives a handle to the main window as a parameter. It uses this handle when building the topology, to inform the video renderer where it should draw the video.

The player also initializes COM by calling CoInitializeEx(), and MF by calling MFStartup(). Both calls are necessary for MF to function properly. Each of these calls has a corresponding shutdown call that must be made when the player is finalizing. For each successful CoInitializeEx() call you must make a CoUninitialize() call to close and unload COM, and for each successful MFStartup() call you must make a corresponding MFShutdown() call to shut down MF.
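As a minimal sketch, assuming the cleanup lives in the CPlayer destructor (the chapter text does not show where the sample performs it), the teardown calls would mirror the constructor in reverse order:

//
// Hypothetical CPlayer destructor - tears down MF and COM in reverse
// order of initialization.
//
CPlayer::~CPlayer(void)
{
    // close and release the media session first
    CloseSession();

    // destroy the Win32 event created in the constructor
    if (m_closeCompleteEvent != NULL)
    {
        CloseHandle(m_closeCompleteEvent);
        m_closeCompleteEvent = NULL;
    }

    // matches the successful MFStartup() call
    MFShutdown();

    // matches the successful CoInitializeEx() call
    CoUninitialize();
}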

Note Media Foundation APIs come only in the Unicode flavor—there are no ANSI versions of Media Foundation functions.

In addition, the constructor creates a Win32 event that will be used later to signal that the session has finished closing. You’ll see more about this event—and the reason for its existence—in the “Media Session Asynchronous Events” section later in this chapter.

After all of the components have been initialized, the constructor sets the output HRESULT parameter that returns the success or failure to the caller.

Section 1.02 Initializing the Media Session

Most of the initialization begins only after the application tells the player which file to play. The application passes a file URL to the player (which may be either a network URL or a local file path). The player then uses the URL to initialize the topology that will play the content. Here is the OpenURL() function that initiates this work:

//
// OpenURL is the main initialization function that triggers building of the core
// MF components.
//

HRESULT CPlayer::OpenURL(PCWSTR sURL)
{
    CComPtr<IMFTopology> pTopology = NULL;
    HRESULT hr = S_OK;

    do
    {
        // Step 1: create a media session if one doesn't exist already
        if(m_pSession == NULL)
        {
            hr = CreateSession();
            BREAK_ON_FAIL(hr);
        }

        // Build the topology. Here we are using the TopoBuilder helper class.
        hr = m_topoBuilder.RenderURL(sURL, m_hwndVideo);
        BREAK_ON_FAIL(hr);

        // get the topology from the TopoBuilder
        pTopology = m_topoBuilder.GetTopology();
        BREAK_ON_NULL(pTopology, E_UNEXPECTED);

        // Add the topology to the internal queue of topologies associated with this
        // media session
        hr = m_pSession->SetTopology(0, pTopology);
        BREAK_ON_FAIL(hr);

        // If a brand new topology was just created, set the player state to "open pending"
        // - not playing yet, but ready to begin.
        if(m_state == Ready)
        {
            m_state = OpenPending;
        }
    }
    while(false);

    if (FAILED(hr))
    {
        m_state = Closed;
    }

    return hr;
}

As you can see, the main job of the OpenURL() method is to pass calls to other functions. There are three steps to player initialization:

· Creating and initializing the media session

· Building the playback topology

· Passing the topology to the session

The function responsible for creating and initializing the media session is shown here:

//
// Creates a new instance of the media session.
//
HRESULT CPlayer::CreateSession(void)
{
    HRESULT hr = S_OK;

    do
    {
        // close the session if one is already created
        BREAK_ON_FAIL( CloseSession() );

        if(m_state != Closed)
        {
            hr = E_UNEXPECTED;
            break;
        }

        // Create the media session.
        BREAK_ON_FAIL( MFCreateMediaSession(NULL, &m_pSession) );

        m_state = Ready;

        // designate this class as the one that will be handling events from the media
        // session
        hr = m_pSession->BeginGetEvent((IMFAsyncCallback*)this, NULL);
        BREAK_ON_FAIL(hr);
    }
    while(false);

    return hr;
}

To create a media session, you need to call the MFCreateMediaSession() API. This function takes an optional configuration object as a parameter and returns a pointer to the new media session. After the session has been created, you need to specify a callback object that will receive asynchronous events from the session. You do that by calling the IMFMediaEventGenerator::BeginGetEvent() function, which notifies the media event generator (in this case, the session) that the next event should be sent to the passed-in object.

To simplify the sample code, in this case the CPlayer class itself is the event callback object. It implements the IMFAsyncCallback interface used by the event generator to pass back various events. Whenever anything of note happens in the media session, the player will be notified with a call to its IMFAsyncCallback::Invoke() method.

Finally, after the topology has been built by the CTopoBuilder class, the player passes the topology to the media session by calling the IMFMediaSession::SetTopology() function. The topology contains information about the playback components that the session needs to instantiate, and holds a reference to the source of the media—in this case the file source.

The first parameter to the IMFMediaSession::SetTopology() function is an optional combination of the following flags:

· MFSESSION_SETTOPOLOGY_IMMEDIATE Stop playback of the currently loaded media (if any) and apply the topology immediately.

· MFSESSION_SETTOPOLOGY_NORESOLUTION This is a full topology and doesn’t need to be resolved. MF doesn’t need to attempt to add any missing MFTs, and the session can accept the topology as is.

· MFSESSION_SETTOPOLOGY_CLEAR_CURRENT If the second parameter is NULL, clear any topology association with the session. If the second parameter is not NULL, reset the session only if the passed-in topology matches the one currently associated with the session.

Passing in 0 is valid as well and indicates that no flags are specified. If the MFSESSION_SETTOPOLOGY_IMMEDIATE flag is not specified and a topology is already queued in the session, the new topology will be added to the queue. In that case, after the session is done playing the first topology, the second one will start. This is useful when topology resolution might take a long time and you do not want to have any gaps between playback of multiple pieces of content.
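For example, assuming a second, already-built topology in a hypothetical pTopology2 variable, queuing it for gapless playback is a single call:

// queue pTopology2 behind the currently loaded topology (no flags); it will
// start automatically when the current presentation finishes
HRESULT hr = m_pSession->SetTopology(0, pTopology2);

// alternatively, stop the current presentation and switch immediately:
// hr = m_pSession->SetTopology(MFSESSION_SETTOPOLOGY_IMMEDIATE, pTopology2);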

Section 1.03 Media Session Asynchronous Events

As you may already know, all Windows applications are inherently asynchronous. A Windows application does not execute as one linear sequence of operations. Instead, it has a message loop that constantly waits for events and performs different actions based on the event information. For example, each time you press a key, the system sends an event with the key ID to the focused application’s message loop. The application decodes that event and performs some corresponding action.

To conform more closely to the asynchronous nature of standard Windows applications, MF uses a similar asynchronous architecture. This design choice greatly improves responsiveness of MF applications. If a Windows application directly calls a synchronous function that takes a long time to execute, the call will block the application’s event loop, which causes the window to stop updating and appear to hang. As a result, the UI will freeze, and users will be unable to click buttons, resize the window, or manipulate its menus. To bypass that problem at the root—or to at least make it easier to write asynchronous applications without designing a secondary messaging system—MF also uses an asynchronous model.

In this asynchronous system, most MF functions do not directly execute any operations; instead, the calls only schedule the work with the main object, and return immediately. The work items themselves then execute on a background thread. After the work is completed, the objects fire events that notify the caller of the status of the scheduled work item.

To facilitate this asynchronous communication system, the media session receives a pointer to an event callback object. This callback object will be called whenever something of note happens, such as when the topology is ready for playback, when the video playback has ended, or when errors occur. This example uses the player itself as the asynchronous event callback object. The player implements the IMFAsyncCallback interface, registers itself with the media session in the CPlayer::CreateSession() method, and gets called by the session whenever something of note happens.

More Info As mentioned earlier, there is actually an extra component in this system—the worker thread object. This worker thread does not drastically change the explanation but instead provides more detail and an additional level of understanding to the MF asynchronous design. For now, the worker object will be glossed over in this chapter to simplify the explanation. The worker thread will be discussed in more detail in Chapter 6.

The MF event cycle is actually fairly complex and contains several stages. The following sequence diagram demonstrates how MF events work in our sample. It shows a sequence of calls made to process an asynchronous event fired from the media session to the player object.

[Figure G03DW02: Sequence diagram of asynchronous event processing between the media session and the player]

Here is a detailed breakdown of the calls shown in the diagram:

1. The player creates a new session by calling MFCreateMediaSession() from the CPlayer::CreateSession() function.

2. The player registers itself as the event handler for the next event from the media session. The player does so by passing a pointer to an IMFAsyncCallback object (in this case, itself) to the IMFMediaEventGenerator::BeginGetEvent() method. The call returns immediately. This call tells the media event generator which object it needs to call whenever an event occurs.

3. The event generator (the session) calls IMFAsyncCallback::Invoke() to notify the event handler of a new event in the queue. Note that this call is made on a separate thread, not the main thread—that is why this is known as an asynchronous system. Other work can be happening concurrently in the player on other threads.

4. The event handler calls the event generator back and gets the actual event from the queue with the IMFMediaEventGenerator::EndGetEvent() method. This call returns immediately.

5. The event handler processes the retrieved event.

6. The player registers itself to receive the next event, the same as in step 2.

7. Go back to step 3.

By using this looping mechanism, MF ensures that events always have an object that can process them, and that responsibility for processing events can be passed smoothly between several objects. For example, an event-handling object (an implementer of the IMFAsyncCallback interface) can easily hand over processing of the next event to a new object by passing the new object’s pointer in the next BeginGetEvent() call.
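A minimal sketch of such a hand-off, made from inside an Invoke() implementation and assuming a hypothetical pNextHandler callback object, is a single call:

// deliver the next session event to pNextHandler instead of this object
HRESULT hr = m_pSession->BeginGetEvent(pNextHandler, NULL);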

Note You may have noticed that the IMFAsyncCallback interface also defines a helper GetParameters() method. This method is used by the media event generator to detect how the callback object will handle the calls. Most of the time you will want to use default behavior and have the method return the E_NOTIMPL HRESULT.
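Here is a minimal sketch of that default implementation; the signature matches the IMFAsyncCallback declaration in mfobjects.h:

//
// GetParameters() - returning E_NOTIMPL requests the default callback
// behavior from the media event generator.
//
HRESULT CPlayer::GetParameters(DWORD* pdwFlags, DWORD* pdwQueue)
{
    return E_NOTIMPL;
}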

Here is the sample player’s implementation of the IMFAsyncCallback::Invoke() method.

//
// Receive asynchronous event.
//
HRESULT CPlayer::Invoke(IMFAsyncResult* pAsyncResult)
{
    CComPtr<IMFMediaEvent> pEvent;
    HRESULT hr = S_OK;

    do
    {
        CComCritSecLock<CComAutoCriticalSection> lock(m_critSec);

        BREAK_ON_NULL(pAsyncResult, E_UNEXPECTED);

        // Get the event from the event queue.
        hr = m_pSession->EndGetEvent(pAsyncResult, &pEvent);
        BREAK_ON_FAIL(hr);

        // If the player is not closing, process the media event - if it is, do nothing.
        if (m_state != PlayerState_Closing)
        {
            hr = ProcessMediaEvent(pEvent);
            BREAK_ON_FAIL(hr);
        }

        // If the media event is MESessionClosed, it is guaranteed to be the last event. If
        // the event is MESessionClosed, ProcessMediaEvent() will return S_FALSE. In that
        // case do not request the next event - otherwise tell the media session that this
        // player is the object that will handle the next event in the queue.
        if(hr != S_FALSE)
        {
            hr = m_pSession->BeginGetEvent(this, NULL);
            BREAK_ON_FAIL(hr);
        }
    }
    while(false);

    return S_OK;
}

Note For the extraction of the asynchronous result to work correctly, you must pass exactly the same IMFAsyncResult object to the IMFMediaEventGenerator::EndGetEvent() function as the one you received in the Invoke() parameter.

As you can see, the Invoke() method does not do any processing of the event data. After the actual IMFMediaEvent object is retrieved from the session with the IMFMediaEventGenerator::EndGetEvent() call, the media event is passed to the helper ProcessMediaEvent() function. ProcessMediaEvent() parses the internal values in the media event and determines what, if anything, needs to happen. If the media event type is MESessionClosed, then that event is guaranteed to be the last to come from the session. In that case, ProcessMediaEvent() returns S_FALSE, which indicates that there is no need for the player to register for the next event.

Notice also the CComCritSecLock object instantiated in the Invoke() method. This object is an ATL construct that wraps around Win32 synchronization primitives. It ensures that only one thread at a time can execute in a section locked with the same m_critSec critical section. For more information about ATL objects and thread synchronization, see Appendix C.

Section 1.04 Event Processing and Player Behavior

After the Invoke() function has pre-processed the event, it passes control to the player event-processing function:

//
// Called by Invoke() to do the actual event processing, and determine what, if anything,
// needs to be done. Returns S_FALSE if the media event type is MESessionClosed.
//
HRESULT CPlayer::ProcessMediaEvent(CComPtr<IMFMediaEvent>& pMediaEvent)
{
    HRESULT hrStatus = S_OK;            // Event status
    HRESULT hr = S_OK;
    UINT32 TopoStatus = MF_TOPOSTATUS_INVALID;
    MediaEventType eventType;

    do
    {
        BREAK_ON_NULL( pMediaEvent, E_POINTER );

        // Get the event type.
        hr = pMediaEvent->GetType(&eventType);
        BREAK_ON_FAIL(hr);

        // Get the event status. If the operation that triggered the event did
        // not succeed, the status is a failure code.
        hr = pMediaEvent->GetStatus(&hrStatus);
        BREAK_ON_FAIL(hr);

        // Check if the async operation succeeded.
        if (FAILED(hrStatus))
        {
            hr = hrStatus;
            break;
        }

        // Switch on the event type. Update the internal state of the CPlayer as needed.
        if(eventType == MESessionTopologyStatus)
        {
            // Get the status code.
            hr = pMediaEvent->GetUINT32(MF_EVENT_TOPOLOGY_STATUS, (UINT32*)&TopoStatus);
            BREAK_ON_FAIL(hr);

            if (TopoStatus == MF_TOPOSTATUS_READY)
            {
                m_state = PlayerState_Stopped;
                hr = OnTopologyReady();
            }
        }
        else if(eventType == MEEndOfPresentation)
        {
            m_state = PlayerState_Stopped;
        }
        else if (eventType == MESessionClosed)
        {
            // signal to anybody listening that the session is closed
            SetEvent(m_closeCompleteEvent);
            hr = S_FALSE;
        }
    }
    while(false);

    return hr;
}

Event processing in the CPlayer::ProcessMediaEvent() function identifies the event and performs various operations based on the type of the event. First of all, the function extracts the type of the event—the type will be a value from the MediaEventType enumeration. The type provides clues about the information in the event, who fired it, and why. For example, when the topology is first set on the session, the player receives a media event with the MESessionTopologySet type; when the session is started, the player receives an event with the MESessionStarted type; and so on. This naming convention is very useful in identifying the information and context of an event.

More Info The MediaEventType enumeration is defined in the mfobjects.h header file. If you receive a failure event of some sort, you should start by looking up the event type and using it to figure out who fired it. That will provide a clue about the error. For more information about debugging MF asynchronous failures, see Appendix A.

When the ProcessMediaEvent() function has identified the type of the event, it checks whether the event indicates a failure of some sort. If the IMFMediaEvent object was fired because of a failure, the HRESULT stored in the event object will indicate what sort of failure that is. To get the failing HRESULT, use the IMFMediaEvent::GetStatus() method. If the event indicates a failure, the function just exits, aborting further event processing.

If the event status does not indicate failure, ProcessMediaEvent() parses the event type to determine the next step. If the event type is MESessionTopologyStatus, and the status is that the topology is ready for playback, the function calls OnTopologyReady() to start playback. If the type is MEEndOfPresentation, indicating the end of playback, the function updates the player state to signal that the player has stopped. Finally, if the type is MESessionClosed, the function signals to any threads waiting for the m_closeCompleteEvent Win32 event that the close operation is done, and sets the return HRESULT to S_FALSE. This tells the caller—the Invoke() function—that no more new events should be requested from the session.

As you can see, the ProcessMediaEvent() function has special handling for the MESessionClosed event. This is needed to properly dispose of all the resources associated with the session. The problem with session shutdown is that you need to make sure that the closing operation is complete before you can safely dispose of the session. Because the Close() function is asynchronous, however, you need a synchronization mechanism that will let the thread in charge of shutting down the session (executing in the CPlayer::CloseSession() function) wait for the session Close() call to complete before proceeding with cleanup.

To solve this problem, you can use a Win32 event to synchronize individual threads responsible for the session shutdown. The main thread in the player initiates the IMFMediaSession::Close() call from the CPlayer::CloseSession() function, and then waits for the m_closeCompleteEvent to be set. Calling the Close() function, however, triggers a separate thread that actually performs the closing operation. That second thread does the work required to close the session, and then calls the CPlayer::Invoke() method. The Invoke() method sets the m_closeCompleteEvent, which in turn unblocks the first thread that is still waiting in the CloseSession() function. After the first thread resumes, it finishes the shutdown operation.

Here is a sequence diagram of these steps:

[Figure G03DW03: Sequence diagram of the two threads involved in closing the media session]

The diagram uses two vertical rectangular boxes under the Player object to show two threads running concurrently in two functions. The longer rectangle represents the CPlayer::CloseSession() method. The shorter, darker rectangle represents the player’s Invoke() function, which is called by a different thread. Here are these steps listed in chronological order:

1. Thread 1: The CloseSession() thread calls IMFMediaSession::Close() to start session shutdown, and waits for the m_closeCompleteEvent to be set. The thread blocks until the m_closeCompleteEvent event is signaled.

2. Thread 2: After the Close() operation completes, the media session calls Invoke() on a separate thread.

3. Thread 2: The Invoke() function calls into the session to get the actual media event by invoking its IMFMediaEventGenerator::EndGetEvent() method.

4. Thread 2: The Invoke() call thread gets the media event.

5. Thread 2: EndGetEvent() returns the media event with the MESessionClosed event type.

6. Thread 2: Because this is the close event and is guaranteed to be the last event from the session, Invoke() doesn’t need to call BeginGetEvent() again. Instead, the thread signals the Win32 m_closeCompleteEvent event and exits the Invoke() function.

7. Thread 1: After the first thread detects that m_closeCompleteEvent was set, it resumes execution and calls IMFMediaSession::Shutdown() on the media session.

8. Thread 1: CloseSession() releases the media session object.

Note Again, this explanation is glossing over some details. The media session does not directly call into the player’s Invoke() function. This is just a simplification used to describe the asynchronous operation.

Here is the code for the CPlayer::CloseSession() method—note that the function blocks in the middle, waiting until the m_closeCompleteEvent event is signaled.

//
// Closes the media session, blocking until the session closure is complete
//
HRESULT CPlayer::CloseSession(void)
{
    HRESULT hr = S_OK;
    DWORD dwWaitResult = 0;

    do
    {
        CComCritSecLock<CComAutoCriticalSection> lock(m_critSec);

        m_state = PlayerState_Closing;

        // release the video display object
        m_pVideoDisplay = NULL;

        // Call the asynchronous Close() method and then wait for the close
        // operation to complete on another thread
        if (m_pSession != NULL)
        {
            hr = m_pSession->Close();

            // IMFMediaSession::Close() may return MF_E_SHUTDOWN if the session is already
            // shut down. That's expected and acceptable.
            if (SUCCEEDED(hr))
            {
                // Begin waiting for the Win32 close event, fired in CPlayer::Invoke(). The
                // close event will indicate that the close operation is finished, and the
                // session can be shut down.
                dwWaitResult = WaitForSingleObject(m_closeCompleteEvent, 5000);
                if (dwWaitResult == WAIT_TIMEOUT)
                {
                    hr = E_UNEXPECTED;
                    break;
                }
            }
        }

        // Shut down the media session. (Synchronous operation, no events.) Releases all of
        // the internal session resources.
        if (m_pSession != NULL)
        {
            m_pSession->Shutdown();
        }

        // release the session
        m_pSession = NULL;
        m_state = PlayerState_Closed;
    }
    while(false);

    return hr;
}

The CPlayer::CloseSession() method initiates session shutdown. The method first puts the player into the right state by setting the internal m_state variable, and then sends the IMFMediaSession::Close() call to the session. After that, the thread blocks, waiting for the Win32 m_closeCompleteEvent event to be signaled. As you saw earlier, that event is signaled from a separate thread, in the CPlayer::ProcessMediaEvent() method. Finally, when the close complete event has been signaled, the function sends the Shutdown() call to the session, releases the session by setting the m_pSession variable to NULL, and sets the m_state variable to indicate that the player is closed.

One possible source of confusion with the preceding description might lie in the difference between Win32 and MF events. Although both systems use the same term, the two event systems function quite differently.

You can think of MF events as conceptually “active” objects. When an MF object “fires” an event, it actively calls an event consumer and passes control over to its IMFAsyncCallback::Invoke() function.

Core Windows events, however, are more “passive” entities. Win32 events are actually Windows synchronization primitives used to synchronize multiple threads. Conceptually the events are similar to locks or critical sections—they are usually used to block thread execution until either the event has been signaled or some timeout expires.

Therefore, you can think of Windows events as flags rather than as execution calls. When using events, one thread sets the event “flag,” while another thread waits for that flag. Unlike with MF, the execution thread that will act on the event is in a suspended wait state, passively waiting for the event to be set. After the execution thread detects that the event was set, it resumes execution, relying on the operating system to tell it when the event is signaled.
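Here is a minimal generic sketch of that pattern, independent of Media Foundation; the two halves run on different threads:

// shared auto-reset Win32 event, initially unsignaled
HANDLE hDone = CreateEvent(NULL, FALSE, FALSE, NULL);

// thread 1 - passively waits for the "flag", blocking for up to 5 seconds
DWORD dwWaitResult = WaitForSingleObject(hDone, 5000);

// thread 2 - sets the "flag", which unblocks thread 1
SetEvent(hDone);

// close the handle when the event is no longer needed
CloseHandle(hDone);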

To wrap up this exploration of the player’s asynchronous behavior, here’s a sequence diagram that shows what happens when you schedule multiple topologies for playback with a session.

[Figure G03DW04: Sequence diagram of scheduling multiple topologies for playback]

Here are the steps that the diagram demonstrates:

1. The application instantiates the player.

2. The application sets the file URL on the player object.

a. The player creates the media session.

b. The player creates topology 1.

c. The player queues topology 1 with the session.

3. When the topology is ready, the session fires the topology-ready event.

4. When the player receives the topology-ready event, it tells the session to start playback.

5. The application sets a new file URL.

a. The player creates topology 2.

b. The player queues topology 2 with the session.

6. After topology 1 is finished playing, the media session fires the end-of-presentation event.

7. The media session gets the next topology and prepares it for playback. When topology 2 is ready, the session fires the topology-ready event again.

8. The topology-ready event tells the player to start playback.

The actual playback command to the session is triggered by the topology-ready event. To be more specific, the session calls the CPlayer::Invoke() function, which gets the event from the session by calling its IMFMediaEventGenerator::EndGetEvent() implementation. The Invoke() function then passes the event to the ProcessMediaEvent() method, which in turn starts playback. Playback is started by calling the CPlayer::OnTopologyReady() method:

//
// Handler for MESessionTopologyReady event - starts video playback.
//
HRESULT CPlayer::OnTopologyReady(void)
{
    HRESULT hr = S_OK;

    do
    {
        // release any previous instance of the m_pVideoDisplay interface
        m_pVideoDisplay.Release();

        // Ask the session for the IMFVideoDisplayControl interface. This interface is
        // implemented by the EVR (Enhanced Video Renderer) and is exposed by the media
        // session as a service. The session will query the topology for the right
        // component and return this EVR interface. The interface will be used to tell the
        // video to repaint whenever the hosting window receives a WM_PAINT window message.
        hr = MFGetService(m_pSession, MR_VIDEO_RENDER_SERVICE, IID_IMFVideoDisplayControl,
            (void**)&m_pVideoDisplay);
        BREAK_ON_FAIL(hr);

        // since the topology is ready, start playback
        hr = Play();
    }
    while(false);

    return hr;
}

The OnTopologyReady() method has two major operations inside of it:

1. When a topology is ready, the method queries the session for the IMFVideoDisplayControl interface, which is extracted from the video renderer. This interface can be used to force repainting of the video surface and to control aspect ratio, video size, and more.

2. After the session fires the event indicating that the topology is ready, the method immediately starts video playback by calling Play(), which in turn invokes the StartPlayback() function shown later in this section.

The MFGetService() function is a generic helper subroutine used to query various components for other objects associated with them. Although in this case it’s used to extract the Enhanced Video Renderer’s (EVR’s) IMFVideoDisplayControl interface, you can also pass in other flags to extract the MF source, the audio renderer, mixer components, byte streams, proxies for remote objects, and so on.
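For example, here is a brief sketch, not part of the sample player, that uses the same MFGetService() helper to obtain the master audio volume control from the session; the MR_POLICY_VOLUME_SERVICE service returns the IMFSimpleAudioVolume interface:

// query the media session for the master audio volume service
CComPtr<IMFSimpleAudioVolume> pVolume;
HRESULT hr = MFGetService(m_pSession, MR_POLICY_VOLUME_SERVICE,
    IID_IMFSimpleAudioVolume, (void**)&pVolume);
if (SUCCEEDED(hr))
{
    // 0.0f is silence, 1.0f is full volume
    hr = pVolume->SetMasterVolume(0.5f);
}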

After obtaining the EVR pointer for the video-renderer control, you can start playing the video by calling the CPlayer::StartPlayback() method:

//
// Start playback from the current position.
//
HRESULT CPlayer::StartPlayback(void)
{
    HRESULT hr = S_OK;
    PROPVARIANT varStart;

    do
    {
        BREAK_ON_NULL(m_pSession, E_UNEXPECTED);

        PropVariantInit(&varStart);
        varStart.vt = VT_EMPTY;

        // If Start fails later, we will get an MESessionStarted event with an error code,
        // and will update our state. Passing in GUID_NULL and VT_EMPTY indicates that
        // playback should start from the current position.
        hr = m_pSession->Start(&GUID_NULL, &varStart);
        if (SUCCEEDED(hr))
        {
            m_state = Started;
        }

        PropVariantClear(&varStart);
    }
    while(false);

    return hr;
}

To start playback, you need to use the IMFMediaSession::Start() method, passing in several parameters. The parameters indicate where the video should start playing—at the beginning of the piece of content or at some later point. You indicate the point at which playback should start with a PROPVARIANT structure, which is essentially a generic data type that can contain any value. The start position parameter can indicate that the content should start playing from an absolute position, from the current position, or even from a relative position within a playlist.

To indicate how to interpret the starting position parameter, the IMFMediaSession::Start() function receives a GUID as its first parameter. The GUID indicates how the method should interpret the second parameter. Here are the possible values for the time format GUID:

· GUID_NULL Indicates that the second parameter will either be VT_EMPTY, meaning that the video should start from the current position, or VT_I8 (an 8-byte signed integer), meaning that the video should start this many 100-nanosecond “ticks” from the beginning of the clip. You can use this format to implement seek behavior. For example, passing in GUID_NULL and VT_EMPTY will resume playback from the position at which the video was paused. Passing in GUID_NULL with VT_I8 set to 300,000,000 will start playback at 30 seconds from the beginning of the video.

· MF_TIME_FORMAT_SEGMENT_OFFSET A custom format supported by the sequencer source.

Note To discover the current playback position within the video, you can call the IMFMediaSession::GetClock() method to extract the presentation clock object for this presentation. You can then use that clock object to extract the current playback position.

In this case, the Start() method receives GUID_NULL as the time format, and VT_EMPTY as the start position, meaning that playback should start at the current position.
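For example, here is a sketch of a hypothetical seek helper built on the same IMFMediaSession::Start() call; it converts a position in seconds into the 100-nanosecond units that the VT_I8 variant expects:

//
// Hypothetical helper - seeks to an absolute position, in seconds, from the
// beginning of the content.
//
HRESULT CPlayer::SeekToPosition(double seconds)
{
    PROPVARIANT varStart;
    PropVariantInit(&varStart);

    // pack the target position as an 8-byte integer of 100-ns ticks
    varStart.vt = VT_I8;
    varStart.hVal.QuadPart = (LONGLONG)(seconds * 10000000.0);

    // GUID_NULL time format plus VT_I8 means an absolute offset from the start
    HRESULT hr = m_pSession->Start(&GUID_NULL, &varStart);

    PropVariantClear(&varStart);
    return hr;
}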

Article II. Building the Media Pipeline

The previous section glossed over the topology-building steps. The CPlayer class delegates all the work required to build the media pipeline to the CTopoBuilder helper class. This section examines how the CTopoBuilder object assembles the various MF components into a structure that can be used to play content, as well as how the player uses the partial topology to build the media pipeline used to play the file.

The purpose of the CTopoBuilder class is to separate topology-building complexity from the main player class. This helper class takes the URL of a piece of content (usually a string containing the path to a file), creates an MF source that can load that content, and then builds a topology from it. The CTopoBuilder generates not a full topology but a partial one, adding only source and sink nodes and connecting them together. These source and sink nodes are the hints that the media session needs to find the rest of the components needed to play the content. Right before playing the topology, the session finds the right combination of MF transforms that can convert the source media streams into a format that the renderer sinks can process. The procedure of filling in the blanks in partial topologies is known as “resolving” or “rendering” a topology.

The following diagram shows the topology built by the CTopoBuilder and MF session when preparing for playback.

[Figure G03DW05: Partial topology built by the CTopoBuilder, with nodes auto-resolved by the media session]

The CTopoBuilder class is responsible for instantiating the source nodes and the renderer nodes and connecting them to each other (in the diagram, these connections are represented by the dashed arrows). Later, the session will examine the data stored in the topology nodes and automatically discover (“resolve”) the additional nodes needed by the topology. That process is represented by the two Auto Resolved Node boxes being directed by the large arrow.

Here are the required steps to create a media pipeline:

1. The CTopoBuilder class creates the MF source component. The CTopoBuilder needs the source to discover the type and number of streams present in the file.

2. The CTopoBuilder object generates a partial topology by repeating the following steps for each stream:

a. Create a source node.

b. Create a sink node.

c. Connect source and sink nodes to each other.

3. Finally, after the partial topology is generated, it is given to a component that resolves the topology, finds all the missing (but implied) components, and instantiates all the objects. In this player, the media session does this job.

You’ll examine these steps in more detail in the following sections.

Section 2.01 Creating the Media Foundation Source

At the very beginning, the only thing a player application has is a file name. To be able to play that file, the application needs a component that can load the file, unwrap the data from the file container, and expose the data to the MF system. This job is done by the MF source.

To create a media source object capable of understanding the container and data format of the specified file, Media Foundation uses a built-in component called the source resolver. The source resolver takes the file path or stream URL and attempts to create the right media source component for that file type. Here’s the CTopoBuilder function that calls the source resolver and asks it to create a media source for a specified file:

//
// Create a media source for the specified URL string. The URL can be a path to a stream,
// or it can be a path to a local file.
//

HRESULT CTopoBuilder::CreateMediaSource(PCWSTR sURL)
{
    HRESULT hr = S_OK;
    MF_OBJECT_TYPE objectType = MF_OBJECT_INVALID;
    CComPtr<IMFSourceResolver> pSourceResolver;
    CComPtr<IUnknown> pSource;

    do
    {
        // Create the source resolver.
        hr = MFCreateSourceResolver(&pSourceResolver);
        BREAK_ON_FAIL(hr);

        // Use the synchronous source resolver to create the media source.
        hr = pSourceResolver->CreateObjectFromURL(
            sURL,                   // URL of the source.
            MF_RESOLUTION_MEDIASOURCE |
                MF_RESOLUTION_CONTENT_DOES_NOT_HAVE_TO_MATCH_EXTENSION_OR_MIME_TYPE,
                                    // indicate that we want a source object, and
                                    // pass in optional source search parameters
            NULL,                   // Optional property store for extra parameters
            &objectType,            // Receives the created object type.
            &pSource                // Receives a pointer to the media source.
            );
        BREAK_ON_FAIL(hr);

        // Get the IMFMediaSource interface from the media source.
        m_pSource = pSource;
        BREAK_ON_NULL(m_pSource, E_NOINTERFACE);
    }
    while(false);

    return hr;
}

After creating the source resolver (using the appropriately named MFCreateSourceResolver() function), you can ask the IMFSourceResolver object to create an MF source. Notice that the second parameter to the IMFSourceResolver::CreateObjectFromURL() function is a set of flags that indicate what sort of object to create and how to search for the matching source. These flags indicate the algorithm that the source resolver uses to pick the right source for the media type—a source that can understand and parse the file in question.

Here is the logic that the source resolver follows when provided with the (ridiculously named) MF_RESOLUTION_CONTENT_DOES_NOT_HAVE_TO_MATCH_EXTENSION_OR_MIME_TYPE flag to discover the right source for the file:

[Figure G03DW06: Source resolver logic for discovering the right source for a file]


Initially, the source resolver attempts to discover the right source by looking at the file name and checking whether Windows has a source registered to handle such files. Media Foundation stores the mapping between file extensions and sources in the Windows registry. If the resolver finds a source that claims to be able to handle files of this type, it passes the file name to the source. During construction, the source might load several megabytes of the file and double-check that the file contents are in a format that it can process. If the source returns success, the resolver is done. However, if the source fails to read the file (which can happen if a file has an incorrect extension—for example, if you rename a WMV file with an AVI extension), then the resolver tries all the sources on the machine. If none of the sources can play the file, the source resolver gives up and returns a failure.

More Info You will learn more about the source resolver and the process it uses to create media sources in Chapter 6.

Note that in this case, the code is using the synchronous IMFSourceResolver::CreateObjectFromURL() function. However, the source resolver also supports asynchronous methods for creating sources; for example, it also has the BeginCreateObjectFromURL() and EndCreateObjectFromURL() methods. Asynchronous object creation is useful when dealing with network streams and other data sources that can take a long time to access; asynchronous creation will not block the main application UI thread. If the operation takes too long—for example, if a network location is unreachable—the asynchronous object creation process can also be cancelled.
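For illustration, here is a sketch of how the asynchronous variant might be started, assuming the calling object implements IMFAsyncCallback; the cancel cookie is what makes cancellation possible:

// begin asynchronous source creation; Invoke() will be called on completion
CComPtr<IUnknown> pCancelCookie;
HRESULT hr = pSourceResolver->BeginCreateObjectFromURL(
    sURL,                        // URL of the source
    MF_RESOLUTION_MEDIASOURCE,   // we want a media source object
    NULL,                        // optional property store
    &pCancelCookie,              // receives the cancellation cookie
    this,                        // IMFAsyncCallback notified on completion
    NULL);                       // optional state object

// later, inside Invoke(), extract the new source:
//     hr = pSourceResolver->EndCreateObjectFromURL(pAsyncResult, &objectType,
//         &pSource);
// or, if the operation is taking too long, abort it:
//     hr = pSourceResolver->CancelObjectCreation(pCancelCookie);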

You can use the source resolver to create not just media source objects but also IMFByteStream objects. Byte stream objects are responsible for actually opening the streams and passing the unparsed sequence of bytes to a source. The source then parses that stream of bytes, unwraps the actual data from the file container, separates individual elementary media streams, and presents those streams to the rest of the topology. If you use the CreateObjectFromURL() function to create a source directly, MF creates a matching byte stream object in the background automatically. The byte stream object will be of a type that can actually load the content. For example, if you specify a network URL (such as “http://www.contoso.com/file.wmv”), the source resolver will automatically create a network byte stream object. If you specify a file path, the source resolver will generate a byte stream that can load data from a file. Here is a conceptual diagram of how data flows through the topology and how it gets transformed by individual components:

[Figure G03DW07: Conceptual diagram of data flow through the topology components]


These are the individual data flow sections represented in that diagram:

1. The byte stream reader loads a file from disk (or from the network, or some other location). The reader is responsible only for getting the byte stream and giving it to the source. For example, if content is playing from a network stream, the reader is responsible for accessing the network, negotiating with the server using network protocols, and presenting the source simply with byte arrays. The byte stream reader does not parse the stream and has no idea what the stream contains.

2. The MF source receives a byte stream from the byte stream reader. For example, if the source were processing a file, at this point it would receive byte arrays with file data just as it appears on disk. The bytes would contain a set of media streams wrapped in a media container, multiplexed together according to the file container scheme—AVI, ASF, MP4, or some other format.

3. The source unwraps the media streams from their media container, de-multiplexes them if necessary, and presents them to the topology.

4. The encoded media streams go into the decoder MF transform components, which are in charge of decoding the encoded data and preparing the data for the sinks.

5. The sinks receive completely or partially decoded data and send it to their respective presentation devices.

More Info You’ll see a detailed examination of MF media sources in Chapter 6.

As a result of all these operations, at the end of the CTopoBuilder::CreateMediaSource() method call you have a media source capable of parsing the file. At the same time, the source resolver loads the file byte stream reader behind the scenes. Typically, the file reader and the source work seamlessly, so most of the time you won’t need to care about what the reader does.

Section 2.02 Building the Partial Topology

After the CTopoBuilder object creates the source for the specified file, it can start building the actual topology. This entails analyzing the individual streams, creating appropriate topology source and sink nodes, and pairing the source nodes with their respective sinks. It’s worth noting that the CTopoBuilder doesn’t create the media source and sink objects. Instead, it generates topology nodes that serve as placeholders for those objects. After generating the nodes, the CTopoBuilder arranges them into a structure that can be used later to instantiate the real MF components and generate the actual media pipeline.

The topology-building steps are as follows:

1. Get the presentation descriptor from the source. The presentation descriptor lists the individual streams and their status.

2. Get a stream descriptor from the presentation descriptor.

3. Create the source node for the stream descriptor.

4. Create the sink node of the right type for this stream.

5. Connect the source and sink nodes to each other.

6. If there are streams left to process, go back to step 2.

The following code shows the function responsible for steps 1 and 2 in the preceding list. This function creates a presentation descriptor for the stream loaded in the source, and then extracts the stream descriptors. Just as the name implies, stream descriptors describe the actual streams in the source. This is the main function that generates the topology:

//
// Create a playback topology from the media source by extracting presentation
// and stream descriptors from the source, and creating a sink for each of them.
//

HRESULT CTopoBuilder::CreateTopology(void)
{
    HRESULT hr = S_OK;
    CComPtr<IMFPresentationDescriptor> pPresDescriptor;
    DWORD nSourceStreams = 0;

    do
    {
        // release the old topology if there was one
        m_pTopology.Release();

        // Create a new topology.
        hr = MFCreateTopology(&m_pTopology);
        BREAK_ON_FAIL(hr);

        // Create the presentation descriptor for the media source - a container object that
        // holds a list of the streams and allows selection of streams that will be used.
        hr = m_pSource->CreatePresentationDescriptor(&pPresDescriptor);
        BREAK_ON_FAIL(hr);

        // Get the number of streams in the media source
        hr = pPresDescriptor->GetStreamDescriptorCount(&nSourceStreams);
        BREAK_ON_FAIL(hr);

        // For each stream, create source and sink nodes and add them to the topology.
        for (DWORD x = 0; x < nSourceStreams; x++)
        {
            hr = AddBranchToPartialTopology(pPresDescriptor, x);

            // if we failed to build a branch for this stream type, then deselect it -
            // that will cause the stream to be disabled, and the source will not produce
            // any data for it
            if( FAILED(hr) )
            {
                hr = pPresDescriptor->DeselectStream(x);
                BREAK_ON_FAIL(hr);
            }
        }
    }
    while(false);

    return hr;
}

To build any topology, you first need to know what streams it will need to render—and to discover that information, you must query the source for an object known as a presentation descriptor, represented by the IMFPresentationDescriptor interface.

An MF presentation descriptor is a container object that describes what sort of streams are available for playback, and allows you to indicate which of those streams should be active and played back. For example, for a file containing two audio streams, you usually want to play only one of those streams. The presentation descriptor allows you to activate (“select”) or to deactivate (“deselect”) a stream using the IMFPresentationDescriptor::SelectStream() and DeselectStream() methods. If a stream is deselected, the source will not send any data to that stream. By default, all streams in a source are selected. Therefore, in cases with multiple audio streams, you must explicitly deselect the streams you do not want to play.
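As an illustration, here is a sketch, not part of the sample player, that walks a presentation descriptor and deselects every audio stream after the first one it finds, so that only a single audio track plays:

DWORD cStreams = 0;
bool audioStreamFound = false;

HRESULT hr = pPresDescriptor->GetStreamDescriptorCount(&cStreams);

for (DWORD i = 0; SUCCEEDED(hr) && i < cStreams; i++)
{
    BOOL selected = FALSE;
    GUID majorType = GUID_NULL;
    CComPtr<IMFStreamDescriptor> pStreamDescriptor;
    CComPtr<IMFMediaTypeHandler> pTypeHandler;

    hr = pPresDescriptor->GetStreamDescriptorByIndex(i, &selected,
        &pStreamDescriptor);
    if (FAILED(hr)) break;

    // the media type handler exposes the stream's major type (audio, video)
    hr = pStreamDescriptor->GetMediaTypeHandler(&pTypeHandler);
    if (FAILED(hr)) break;

    hr = pTypeHandler->GetMajorType(&majorType);
    if (FAILED(hr)) break;

    // deselect every audio stream except the first one encountered
    if (majorType == MFMediaType_Audio)
    {
        if (audioStreamFound)
        {
            hr = pPresDescriptor->DeselectStream(i);
        }

        audioStreamFound = true;
    }
}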

Note If a new stream appears somewhere in the middle of a file, the presentation descriptor will most likely not contain that information. This depends on the file format and on the design of the source itself. If the file format provides an easy way to enumerate all the streams in the file, and if the source is smart enough to discover that information, the presentation descriptor will contain the complete information. However, if this is a basic source that doesn’t provide this information and an unexpected stream suddenly appears, you will need to create a brand-new presentation descriptor and regenerate the topology.

After discovering which streams are in the file, you can start building data paths for them. This entails first creating a source topology node for each stream, then creating a sink (renderer) node, and finally, connecting the two. The following conceptual diagram demonstrates a partial topology that will be generated by the CTopoBuilder::CreateTopology() function:

[Figure G03DW08: Partial topology generated by the CTopoBuilder::CreateTopology() function]

The topology can connect any source and sink nodes together—even if they don’t expose matching media types. This is allowed because the connection is only a hint for the topology resolver. During the topology resolution step, the session tries various MF transforms until it finds a matching combination capable of consuming a media type exposed by the source node, and producing the media type expected by the sink node.

Here is the function that drives source-sink pair generation for our partial topology:

//
// Adds a topology branch for one stream.
//
// pPresDescriptor: The source's presentation descriptor.
// iStream: Index of the stream to render.
//
// For each stream, we must do the following steps:
// 1. Create a source node associated with the stream.
// 2. Create a sink node for the renderer.
// 3. Connect the two nodes.
// The media session will resolve the topology, inserting intermediate decoder and other
// transform MFTs that will process the data in preparation for consumption by the
// renderers.
//

HRESULT CTopoBuilder::AddBranchToPartialTopology(
    IMFPresentationDescriptor* pPresDescr,
    DWORD iStream)
{
    HRESULT hr = S_OK;

    CComPtr<IMFPresentationDescriptor> pPresDescriptor = pPresDescr;
    CComPtr<IMFStreamDescriptor> pStreamDescriptor;
    CComPtr<IMFTopologyNode> pSourceNode;
    CComPtr<IMFTopologyNode> pOutputNode;
    BOOL streamSelected = FALSE;

    do
    {
        BREAK_ON_NULL(m_pTopology, E_UNEXPECTED);

        // Get the stream descriptor for this stream (information about the stream).
        hr = pPresDescriptor->GetStreamDescriptorByIndex(iStream, &streamSelected,
            &pStreamDescriptor);
        BREAK_ON_FAIL(hr);

        // Create the topology branch only if the stream is selected - in other words,
        // if the user wants to play it.
        if (streamSelected)
        {
            // Create a source node for this stream.
            hr = CreateSourceStreamNode(pPresDescriptor, pStreamDescriptor, pSourceNode);
            BREAK_ON_FAIL(hr);

            // Create the sink node for the renderer.
            hr = CreateOutputNode(pStreamDescriptor, m_videoHwnd, pOutputNode);
            BREAK_ON_FAIL(hr);

            // Add the source and sink nodes to the topology.
            hr = m_pTopology->AddNode(pSourceNode);
            BREAK_ON_FAIL(hr);

            hr = m_pTopology->AddNode(pOutputNode);
            BREAK_ON_FAIL(hr);

            // Connect the source node to the sink node. The topology will find the
            // intermediate nodes needed to convert media types.
            hr = pSourceNode->ConnectOutput(0, pOutputNode, 0);
        }
    }
    while(false);

    return hr;
}

As you can see, the code is fairly straightforward. CTopoBuilder::CreateTopology() calls this function for each stream found in the presentation descriptor. If the passed-in stream is selected, the function generates a source and sink node pairing. The function adds a source node first, then creates the output sink node, adds both nodes to the topology, and finally, connects them to each other. This is all the information the media session requires to generate the actual MF data-processing components from the topology nodes, find any intermediate transforms that are missing, and render the stream.

Here’s what you need to do to build a source node:

//
// Create a source node for the specified stream
//
// pPresDescriptor: Presentation descriptor for the media source.
// pStreamDescriptor: Stream descriptor for the stream.
// pNode: Reference to a pointer to the new node - returns the new node.
//
HRESULT CTopoBuilder::CreateSourceStreamNode(
    IMFPresentationDescriptor* pPresDescr,
    IMFStreamDescriptor* pStreamDescr,
    CComPtr<IMFTopologyNode> &pNode)
{
    HRESULT hr = S_OK;

    CComPtr<IMFPresentationDescriptor> pPresDescriptor = pPresDescr;
    CComPtr<IMFStreamDescriptor> pStreamDescriptor = pStreamDescr;

    do
    {
        BREAK_ON_NULL(pPresDescriptor, E_UNEXPECTED);
        BREAK_ON_NULL(pStreamDescriptor, E_UNEXPECTED);

        pNode = NULL;

        // Create the topology node, indicating that it must be a source node.
        hr = MFCreateTopologyNode(MF_TOPOLOGY_SOURCESTREAM_NODE, &pNode);
        BREAK_ON_FAIL(hr);

        // Associate the node with the source by passing in a pointer to the media
        // source and indicating that it is the source.
        hr = pNode->SetUnknown(MF_TOPONODE_SOURCE, m_pSource);
        BREAK_ON_FAIL(hr);

        // Set the presentation descriptor attribute of the node by passing in a
        // pointer to the presentation descriptor.
        hr = pNode->SetUnknown(MF_TOPONODE_PRESENTATION_DESCRIPTOR, pPresDescriptor);
        BREAK_ON_FAIL(hr);

        // Set the stream descriptor attribute by passing in a pointer to the stream
        // descriptor.
        hr = pNode->SetUnknown(MF_TOPONODE_STREAM_DESCRIPTOR, pStreamDescriptor);
        BREAK_ON_FAIL(hr);
    }
    while(false);

    // if failed, clear the output parameter
    if (FAILED(hr))
    {
        pNode = NULL;
    }

    return hr;
}

As mentioned earlier, topology nodes are not the actual objects that perform the work in a media pipeline. Instead, the node objects hold a set of attributes that describe settings for the MF components that will process the data. These attributes are used during the topology rendering phase to instantiate and initialize the MF components. Therefore, the code in CTopoBuilder::CreateSourceStreamNode() is again relatively simple. All this function does is instantiate the node for the source stream and set various attributes on it by using the node's IMFAttributes interface.

Creating the sink nodes is slightly more complex. You need to figure out what type of stream this is, create a controller object for the right type of renderer, and store that controller in the sink node. Here is the code for the function that creates sink nodes:

//
// This function creates an output node for a stream (sink).
//
HRESULT CTopoBuilder::CreateOutputNode(
    IMFStreamDescriptor* pStreamDescr,
    HWND hwndVideo,
    CComPtr<IMFTopologyNode> &pNode)
{
    HRESULT hr = S_OK;

    CComPtr<IMFMediaTypeHandler> pHandler;
    CComPtr<IMFActivate> pRendererActivate;
    CComPtr<IMFStreamDescriptor> pStreamDescriptor = pStreamDescr;
    GUID majorType = GUID_NULL;

    do
    {
        BREAK_ON_NULL(pStreamDescriptor, E_UNEXPECTED);

        // Get the media type handler for the stream, which will be used to process
        // the media types of the stream. The handler stores the media type.
        hr = pStreamDescriptor->GetMediaTypeHandler(&pHandler);
        BREAK_ON_FAIL(hr);

        // Get the major media type (e.g., video or audio).
        hr = pHandler->GetMajorType(&majorType);
        BREAK_ON_FAIL(hr);

        // Create an IMFActivate controller object for the renderer, based on the
        // media type. The activation objects are used by the session in order to
        // create the renderers only when they are needed - i.e., only right before
        // starting playback. The activation objects are also used to shut down the
        // renderers.
        if (majorType == MFMediaType_Audio)
        {
            // if the stream major type is audio, create the audio renderer
            hr = MFCreateAudioRendererActivate(&pRendererActivate);
        }
        else if (majorType == MFMediaType_Video)
        {
            // if the stream major type is video, create the video renderer, passing
            // in the video window handle - that's where the video will be playing
            hr = MFCreateVideoRendererActivate(hwndVideo, &pRendererActivate);
        }
        else
        {
            // fail if the stream type is not video or audio - for example, if we
            // encounter a CC stream
            hr = E_FAIL;
        }
        BREAK_ON_FAIL(hr);

        pNode = NULL;

        // Create the node that will represent the renderer.
        hr = MFCreateTopologyNode(MF_TOPOLOGY_OUTPUT_NODE, &pNode);
        BREAK_ON_FAIL(hr);

        // Store the IMFActivate object in the sink node - it will be extracted later
        // by the media session during the topology render phase.
        hr = pNode->SetObject(pRendererActivate);
        BREAK_ON_FAIL(hr);
    }
    while(false);

    // if failed, clear the output parameter
    if (FAILED(hr))
    {
        pNode = NULL;
    }

    return hr;
}

Here’s a breakdown of the steps this function takes, and an analysis of its responsibilities:

1. Get the IMFMediaTypeHandler object for the stream. The media type handler stores all of the stream media types and has functions for comparing and matching media types.

2. Extract the major media type for the stream. The major type indicates whether the stream contains video data, audio data, or closed captioning, for example.

a. If the major type is audio, create an activator object for an audio renderer using the MFCreateAudioRendererActivate() function.

b. If the major type is video, create an activator for the video renderer using the MFCreateVideoRendererActivate() function. Note that this function receives the handle to the target video window.

c. If the stream contains any other type of data, exit the function and return the generic E_FAIL error code. The failure will cause this stream to be deselected in the CTopoBuilder::CreateTopology() function, and will allow the rest of the streams to play normally.

3. Create the topology node, passing in the MF_TOPOLOGY_OUTPUT_NODE, indicating that you want to create a sink node.

4. Store the activation object created in step 2 inside the output node.

In step 2, the function does not actually create the renderers themselves; to save resources, renderers are not created until the moment they are needed. Instead, the function creates an activation controller object for the renderer (an object that implements the IMFActivate interface). These activation objects are used later by the media session to instantiate the renderers, and to shut them down after playback is complete.

Important If in another application you bypass the media session and call the IMFActivate::ActivateObject() method directly, do not forget to call the IMFActivate::ShutdownObject() method after you are done with the renderer. Failure to do so will cause your application to leak resources. However, if you let the session handle renderer activation, it calls the shutdown function when necessary, and you don't need to worry about this step.
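As a brief sketch of that direct-activation path (assuming MFStartup() has already been called; the variable names are illustrative):

CComPtr<IMFActivate> pActivate;
CComPtr<IMFMediaSink> pSink;

// Create the activation object for the audio renderer.
HRESULT hr = MFCreateAudioRendererActivate(&pActivate);
if (SUCCEEDED(hr))
{
    // ActivateObject() instantiates the renderer behind the activator.
    hr = pActivate->ActivateObject(IID_PPV_ARGS(&pSink));
}

// ... use the media sink ...

if (pSink != NULL)
{
    // Without this call, the renderer's resources leak.
    pActivate->ShutdownObject();
}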

After completing all of these operations, you are left with source and sink node pairs—or “branches.” These branches encompass what is known as a partial topology—they do not contain all the final components needed to stream the data, but instead provide hints for the final topology resolver. As demonstrated here, the nodes themselves contain only the information required to instantiate various objects and build the full topology. After CTopoBuilder is done, you have only a partial topology. In the sample player application, the media session does the job of resolving the topology and adding the missing-but-implied components.

Resolving the Partial Topology

During the rendering phase of the topology-building operation—right before playback—the media session instantiates a topology loading object. The topology loading object (which implements the IMFTopoLoader interface) is a COM object that accepts a partial topology and resolves it, activating internal components and adding any missing components. Here are the steps involved in resolving a partial topology:

1. The session activates components that the nodes represent. In the simple topology shown here, the session needs to instantiate only the audio and video renderers, because the source is already active and available.

2. The media session searches for the MF transforms or combination of transforms needed to convert data from the media type produced by the source into the media type expected by the renderer. For the partial topology shown just previously, the session finds and adds video and audio decoder transforms.

The algorithm used by the default topology loader to “render” a partial topology is in many ways similar to the one used by the source resolver to find the right source. The session loads various MF transforms and keeps trying different ones until it either succeeds or runs out of possible transforms. The following flowchart demonstrates the algorithm used by the media session to resolve a partial topology and to find a combination of transforms between a source and a sink that will successfully convert the data.

[Figure omitted: flowchart of the partial-topology resolution algorithm, showing the loop of trying MFT combinations between the source and the sink.]

This algorithm is exhaustive: if any combination of registered MFTs can transform the data in the ways necessary to connect the source and sink nodes, the topology loader will eventually find it. The steps are as follows:

1. Begin processing.

2. Get the output media types exposed by the source.

3. Get the input media types expected by the sink.

4. Attempt to connect the components in the topology, checking whether they can all agree on the input and output media types. Even if all the components claimed at registration time that they could handle the media types, they may not agree on a connection now.

5. If the connection succeeded, and all components agree on the media types, you are done.

6. If the connection failed, try to find an MFT registered with matching input and output media types (see the enumeration sketch following this list).

7. If an MFT is found, add it to the topology between the source and sink, and go to step 4.

8. If an MFT is not found, then the first pass at building this branch of the topology has failed. Go back to step 3, but this time try combinations of several MFTs at once instead of just one.

This loop repeats until the session either finds a combination of MFTs that can all agree to connect or it runs out of untried MFT combinations. You’ll see more about connections and topology rendering in Chapter 5, “Media Foundation Transforms.”
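To make step 6 more concrete, the following sketch enumerates registered MFTs in roughly the way a topology loader searches for a decoder. The H.264 input type and the video decoder category are illustrative assumptions, not details taken from this chapter:

// Sketch: find registered video decoders that accept H.264 input.
MFT_REGISTER_TYPE_INFO inputInfo = { MFMediaType_Video, MFVideoFormat_H264 };
IMFActivate** ppActivates = NULL;
UINT32 activateCount = 0;

HRESULT hr = MFTEnumEx(
    MFT_CATEGORY_VIDEO_DECODER,    // search among video decoders
    MFT_ENUM_FLAG_SYNCMFT,         // standard synchronous MFTs
    &inputInfo,                    // required input type
    NULL,                          // match any output type
    &ppActivates,
    &activateCount);

if (SUCCEEDED(hr))
{
    // A topology loader would try each candidate until one agrees to connect.
    for (UINT32 i = 0; i < activateCount; i++)
    {
        ppActivates[i]->Release();
    }
    CoTaskMemFree(ppActivates);
}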

Note If you want to use a more intelligent or custom algorithm for topology building, you can tell the media session to use a custom topology loader object. To do this, you set a configuration parameter during the MFCreateMediaSession() function call and pass in the COM CLSID of your private topology loader, which must implement the IMFTopoLoader interface. This custom topology loader will receive partial topologies from the session and will be asked to resolve them into full topologies.
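A minimal configuration sketch might look like the following, where CLSID_MyTopoLoader is a hypothetical CLSID for your registered IMFTopoLoader implementation:

CComPtr<IMFAttributes> pConfig;
CComPtr<IMFMediaSession> pSession;

HRESULT hr = MFCreateAttributes(&pConfig, 1);
if (SUCCEEDED(hr))
{
    // Tell the session which topology loader to instantiate.
    hr = pConfig->SetGUID(MF_SESSION_TOPOLOADER, CLSID_MyTopoLoader);
}
if (SUCCEEDED(hr))
{
    hr = MFCreateMediaSession(pConfig, &pSession);
}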

Note that the media session and the default topology loader use the same logic as the DirectShow graph builder. The DirectShow graph builder loads the file in a source filter, discovers how many streams it contains, and then attempts to use the filters registered on the machine to build a pipeline that can render those streams.

The main difference between the DirectShow and MF pipeline generation systems is that, by default, the DirectShow graph builder assumes that you want to display the content. As a result, the DirectShow graph builder adds the renderers to the graph automatically. Media Foundation, on the other hand, requires you to tell it the final destination for each stream. This simplifies cases in which you want to create custom pipelines for advanced scenarios, such as transcoding and stream analysis, albeit at the cost of making the video playback scenario slightly more complex.

Conclusion

This chapter covered several core Media Foundation ideas that are required to build an MF player. You were introduced to the media session, examined some of its internal functionality, and looked at how you can use the media session to control playback of a topology. You saw the basic concepts of Media Foundation asynchronous event handling and analyzed how that affects the structure of the sample program.

In addition, you looked at partial topology-building concepts, examined the structure of a topology, and looked at the algorithms needed to render it.

Using these concepts, you can start building more advanced MF applications that can handle all sorts of content types and produce various effects.

Class Listings

For reference, and to simplify your reading, here are the definitions of the core classes presented in this chapter.

The CPlayer class allows the main application to control the player. It wraps the media session and hides the session's asynchronous implementation details. To receive session events, CPlayer implements the IMFAsyncCallback interface, which the session uses to call back into the player whenever an event occurs.

In addition, this listing presents the PlayerState enumeration used in the player to indicate its current state.

enum PlayerState
{
    PlayerState_Closed = 0,    // No session.
    PlayerState_Ready,         // Session was created, ready to open a file.
    PlayerState_OpenPending,   // Session is opening a file.
    PlayerState_Started,       // Session is playing a file.
    PlayerState_Paused,        // Session is paused.
    PlayerState_Stopped,       // Session is stopped (ready to play).
    PlayerState_Closing        // Application has closed the session, but is
                               // waiting for MESessionClosed.
};
//
// The CPlayer class wraps MediaSession functionality and hides it from a calling
// application.
//
class CPlayer : public IMFAsyncCallback
{
public:
    CPlayer(HWND videoWindow, HRESULT* pHr);
    ~CPlayer();

    // Playback control
    HRESULT OpenURL(PCWSTR sURL);
    HRESULT Play();
    HRESULT Pause();
    PlayerState GetState() const { return m_state; }

    // Video functionality
    HRESULT Repaint();
    BOOL HasVideo() const { return (m_pVideoDisplay != NULL); }

    //
    // IMFAsyncCallback implementation.
    //

    // Skip the optional GetParameters() function - it is used only in advanced
    // players. Returning the E_NOTIMPL error code causes the system to use
    // default parameters.
    STDMETHODIMP GetParameters(DWORD *pdwFlags, DWORD *pdwQueue) { return E_NOTIMPL; }

    // Main MF event handling function
    STDMETHODIMP Invoke(IMFAsyncResult* pAsyncResult);

    //
    // IUnknown methods
    //
    STDMETHODIMP QueryInterface(REFIID iid, void** ppv);
    STDMETHODIMP_(ULONG) AddRef();
    STDMETHODIMP_(ULONG) Release();

protected:
    // internal initialization
    HRESULT Initialize();

    // private session and playback controlling functions
    HRESULT CreateSession();
    HRESULT CloseSession();
    HRESULT StartPlayback();

    // MF event handling functionality
    HRESULT ProcessMediaEvent(CComPtr<IMFMediaEvent>& mediaEvent);
    // Media event handlers
    HRESULT OnTopologyReady(void);

    long m_nRefCount;               // COM reference count.
    CCritSec m_critSec;             // critical section

    CTopoBuilder m_topoBuilder;

    CComPtr<IMFMediaSession> m_pSession;
    CComPtr<IMFVideoDisplayControl> m_pVideoDisplay;

    HWND m_hwndVideo;               // Video window.
    PlayerState m_state;            // Current state of the media session.
    HANDLE m_closeCompleteEvent;    // event fired when session close is complete
};
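For orientation, here is a minimal sketch of what CPlayer::Invoke() might look like, assuming the callback was registered with IMFMediaSession::BeginGetEvent(); it is an outline, not the chapter's full implementation:

STDMETHODIMP CPlayer::Invoke(IMFAsyncResult* pAsyncResult)
{
    CComPtr<IMFMediaEvent> pEvent;
    MediaEventType eventType = MEUnknown;

    // Complete the asynchronous request and extract the event.
    HRESULT hr = m_pSession->EndGetEvent(pAsyncResult, &pEvent);
    if (SUCCEEDED(hr))
    {
        hr = pEvent->GetType(&eventType);
    }

    // Request the next event before handling this one so that no session
    // events are missed - but not after the session has closed.
    if (SUCCEEDED(hr) && eventType != MESessionClosed)
    {
        hr = m_pSession->BeginGetEvent(this, NULL);
    }

    if (SUCCEEDED(hr))
    {
        hr = ProcessMediaEvent(pEvent);
    }

    return hr;
}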

The CTopoBuilder class is used to construct the playback topology. This class hides the complexity of creating the source and adding the source and sink nodes to the topology.

//
// The CTopoBuilder class constructs the playback topology.
//
class CTopoBuilder
{
public:
    CTopoBuilder(void) {};
    ~CTopoBuilder(void) { ShutdownSource(); };

    // create a topology for the URL that will be rendered in the specified window
    HRESULT RenderURL(PCWSTR sURL, HWND videoHwnd);

    // get the created topology
    IMFTopology* GetTopology(void) { return m_pTopology; }

    // shut down the media source for the topology
    HRESULT ShutdownSource(void);

private:
    CComQIPtr<IMFTopology> m_pTopology;                  // the topology itself
    CComQIPtr<IMFMediaSource> m_pSource;                 // the MF source
    CComQIPtr<IMFVideoDisplayControl> m_pVideoDisplay;   // the EVR
    HWND m_videoHwnd;                                    // the target window

    HRESULT CreateMediaSource(PCWSTR sURL);
    HRESULT CreateTopology(void);

    HRESULT AddBranchToPartialTopology(
        IMFPresentationDescriptor* pPresDescriptor,
        DWORD iStream);
    HRESULT CreateSourceStreamNode(
        IMFPresentationDescriptor* pPresDescr,
        IMFStreamDescriptor* pStreamDescr,
        CComPtr<IMFTopologyNode> &pNode);
    HRESULT CreateOutputNode(
        IMFStreamDescriptor* pStreamDescr,
        HWND hwndVideo,
        CComPtr<IMFTopologyNode> &pNode);
};
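As a usage sketch, driving CTopoBuilder from player code might look like the following; the file path is illustrative, and pSession is assumed to be an IMFMediaSession that has already been created:

CTopoBuilder topoBuilder;

// Build the partial topology for the specified file and target window.
HRESULT hr = topoBuilder.RenderURL(L"C:\\media\\sample.wmv", hwndVideo);
if (SUCCEEDED(hr))
{
    // Hand the partial topology to the media session for resolution.
    hr = pSession->SetTopology(0, topoBuilder.GetTopology());
}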

 
