In our previous entry, we talked about how video is synchronized to audio. In this short entry, we will talk about time stamps, master clocks, how adjustments to the master clock are made and how to deal with live streams.

About Reference Clocks and Stream Time

All filters in a filter graph are synchronized to the same clock, the reference clock. The stream time is based off the reference time, but it is relative, and depends on which state the graph is in. For instance, stream time doesn't advance while the graph is paused, and it goes back to 0 after a seek.
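The relationship between the two times can be sketched with a small model. This is only an illustration, not DirectShow code: StreamTimeModel and its methods are hypothetical names, and a real graph handles seeks through state changes rather than a single call. Times are in 100-nanosecond units.

```cpp
#include <cstdint>

// Time in 100-nanosecond units, the same unit as DirectShow's REFERENCE_TIME.
using RefTime = std::int64_t;

// Hypothetical model (not a DirectShow class) of stream time derived from
// reference time.
class StreamTimeModel {
public:
    // Start or resume running at reference time `now`.
    void Run(RefTime now) {
        if (running_) return;
        // Pick the start offset so stream time continues from where it froze.
        start_ = now - frozen_;
        running_ = true;
    }

    // Pause at reference time `now`; stream time stops advancing.
    void Pause(RefTime now) {
        if (running_) frozen_ = now - start_;
        running_ = false;
    }

    // Seek at reference time `now`; stream time restarts from 0.
    void Seek(RefTime now) {
        frozen_ = 0;
        start_ = now;
    }

    // Stream time as seen at reference time `now`.
    RefTime StreamTime(RefTime now) const {
        return running_ ? now - start_ : frozen_;
    }

private:
    bool running_ = false;
    RefTime start_ = 0;   // reference time that corresponds to stream time 0
    RefTime frozen_ = 0;  // stream time held while not running
};
```

In this model the reference time keeps increasing across Run(), Pause() and Seek(), while the stream time only advances between Run() and Pause().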

DirectShow provides a base class CBaseReferenceClock that implements the IReferenceClock interface. The base class clock object maintains two times internally:

  • internal private time
  • reference time

The internal private time is the actual time kept by the clock, and can be accessed through GetPrivateTime(). The internal private time can go backwards for brief periods of time. The reference time is based off the private time, and cannot go backwards.

Whenever a filter provides the reference clock, it will usually inherit from CBaseReferenceClock. It can either override the GetPrivateTime() function to return the time directly from the device (if available), or it can issue adjustments to the stream time through the SetTimeDelta() function. If it chooses the second method, it will need to monitor the difference between the system time and the time provided by the device.
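To make the two internal times and the effect of an adjustment more concrete, here is a minimal sketch. It is a simplified model and not the CBaseReferenceClock source: TwoTimeClockModel, PrivateTime(), ReferenceTime() and ApplyDelta() are hypothetical names, and the raw tick value is passed in by the caller so the example stays deterministic.

```cpp
#include <algorithm>
#include <cstdint>

// Time in 100-nanosecond units, the same unit as DirectShow's REFERENCE_TIME.
using RefTime = std::int64_t;

// Hypothetical model of a clock that keeps a private time and a reference
// time. It only mirrors the behavior described in the text.
class TwoTimeClockModel {
public:
    // Private time: raw ticks plus every adjustment issued so far.
    // It can go backwards right after a negative adjustment.
    RefTime PrivateTime(RefTime rawTicks) const { return rawTicks + offset_; }

    // Reference time: follows the private time but never returns a value
    // smaller than one it has already returned.
    RefTime ReferenceTime(RefTime rawTicks) {
        lastReturned_ = std::max(lastReturned_, PrivateTime(rawTicks));
        return lastReturned_;
    }

    // Plays the role of an adjustment such as SetTimeDelta() in this model.
    void ApplyDelta(RefTime delta) { offset_ += delta; }

private:
    RefTime offset_ = 0;        // sum of all adjustments
    RefTime lastReturned_ = 0;  // largest reference time handed out
};
```

After a negative adjustment the reference time in this model simply holds its last value until the private time catches up, which is how it avoids going backwards.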

The default reference clock in Windows CE is provided by our audio renderer. It uses the SetTimeDelta() method to issue adjustments to the stream time. To change the reference clock in a filter graph, query the filter graph for the IMediaFilter interface, then call SetSyncSource() with the new reference clock.

All filters can access both the reference time and the stream time. The base filter class CBaseFilter has an m_pClock member. The reference time can be read by calling m_pClock->GetTime(), and the stream time is returned by the CBaseFilter member function StreamTime().

About Time Stamps & Stream Time

The samples being processed in the filter graph may or may not have a time stamp, which is the media sample start and finish time. The time stamps are used in conjunction with the stream time. If a sample has a time stamp that is greater than the current stream time, it means that the sample is early. If a sample has a time stamp that is smaller than the current stream time, it is late. In a playback scenario, usually a splitter is the one that attaches time stamps to the samples.

Filters may use time stamps for different reasons. For instance, time stamps may be used for presentation purposes, or to control the amount of buffering. The video renderer will use time stamps to schedule the samples for presentation, and thus will end up throttling the video playback pipeline. When a sample arrives at the video renderer, there are several possibilities:

  • no timestamp - sample is scheduled immediately
  • in the future (timestamp > stream time) - video renderer needs to schedule the sample, and will usually call m_pClock->AdviseTime()
  • in the past (timestamp < stream time) - may render immediately, or not render at all.
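The three cases above can be sketched as a small decision function. This is only an illustration: SampleTimes, RenderAction and ClassifySample() are hypothetical names, and treating a time stamp exactly equal to the stream time as "render now" is an assumption, since the list does not cover that case.

```cpp
#include <cstdint>

// Time in 100-nanosecond units, the same unit as DirectShow's REFERENCE_TIME.
using RefTime = std::int64_t;

// Hypothetical description of a sample's time stamp.
struct SampleTimes {
    bool hasTimes;  // false when the sample carries no time stamp
    RefTime start;  // start time, relative to stream time
    RefTime stop;   // finish time, relative to stream time
};

enum class RenderAction {
    RenderNow,        // no time stamp, or due exactly now (assumption)
    WaitUntilDue,     // early: schedule it, e.g. with an advise request
    RenderLateOrDrop  // late: render immediately or skip it
};

// Decide what a renderer could do with a sample at the given stream time.
RenderAction ClassifySample(const SampleTimes& times, RefTime streamTime) {
    if (!times.hasTimes) return RenderAction::RenderNow;
    if (times.start > streamTime) return RenderAction::WaitUntilDue;
    if (times.start < streamTime) return RenderAction::RenderLateOrDrop;
    return RenderAction::RenderNow;
}
```

Whether a late sample is rendered or dropped is a policy choice of the renderer, so the sketch leaves that decision to the caller.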

The Reference Clock & The Audio Renderer

Let's assume in this section that the default Windows CE audio renderer is the reference clock. The audio renderer uses time stamps and stream time in a different way. Being the reference clock implies that the stream time is controlled by this component, so it will not follow a behavior similar to the video renderer.

In this case, as soon as the audio renderer receives a sample, it is ready to send it out to the audio driver. If the sample is late, the renderer will drop it. Otherwise, it will send the sample immediately if the audio driver has buffer space available. It will never wait for the time to be right. When the media sample times are not contiguous, that is, the end time of a sample is smaller than the start time of the next sample, the audio renderer will write silence to the driver, and will wait until the start time of the second sample has arrived. In the normal scenario, there is no gap between media samples, so the audio renderer will write samples as fast as it can.
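As a rough illustration of the gap case, the amount of silence needed to cover a gap can be computed from the two time stamps and the audio format. This is a sketch under stated assumptions, not the renderer's actual code: GapBetween() and SilenceBytes() are hypothetical helpers, the audio is assumed to be PCM, and blockAlign stands for the number of bytes per audio frame (all channels).

```cpp
#include <cstdint>

// Time in 100-nanosecond units, the same unit as DirectShow's REFERENCE_TIME.
using RefTime = std::int64_t;

const RefTime kUnitsPerSecond = 10000000;  // 100 ns units in one second

// Gap between the end of one sample and the start of the next.
// Returns 0 when the samples are contiguous or overlap.
RefTime GapBetween(RefTime prevStop, RefTime nextStart) {
    return nextStart > prevStop ? nextStart - prevStop : 0;
}

// Bytes of PCM silence needed to cover `gap` at the given sample rate.
// The frame count is rounded down to a whole number of frames.
std::int64_t SilenceBytes(RefTime gap, int sampleRate, int blockAlign) {
    const std::int64_t frames = gap * sampleRate / kUnitsPerSecond;
    return frames * blockAlign;
}
```

For example, a 10 ms gap in 48 kHz, 16-bit stereo audio (4 bytes per frame) corresponds to 480 frames, or 1920 bytes of silence.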

When the default audio renderer has finished processing a sample, it will read the device clock and the system clock, and compute the difference between them. Unfortunately, there is no way to get to the device clock directly, so the audio renderer uses amsndOutGetPosition(), which can be imprecise. The audio renderer will accumulate the differences, and will apply a low pass filter to them. Whenever the average difference goes above a certain threshold, it will issue an adjustment to the stream time through the SetTimeDelta() function. As soon as it does that, all filters calling GetTime() will receive the adjusted time - so the stream time will not be continuous. Note that all other filters that used m_pClock->AdviseTime() to get notified when a certain stream time has arrived (such as the video renderer) do not need to know about the stream time "change" caused by an adjustment. They will be advised when the stream time reaches the desired value.
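The accumulate, filter and adjust cycle can be sketched as follows. This is a simplified model and not the audio renderer's implementation: DriftTrackerModel is a hypothetical name, the first-order filter and the integer arithmetic are assumptions, and the actual filter and threshold used by the renderer are not given here.

```cpp
#include <cstdint>

// Time in 100-nanosecond units, the same unit as DirectShow's REFERENCE_TIME.
using RefTime = std::int64_t;

// Hypothetical drift tracker: smooths (device - system) differences and
// reports an adjustment once the smoothed value exceeds a threshold.
class DriftTrackerModel {
public:
    // `smoothing` must be >= 1; larger values react more slowly.
    DriftTrackerModel(RefTime smoothing, RefTime threshold)
        : smoothing_(smoothing), threshold_(threshold) {}

    // Feed one pair of clock readings. Returns the delta to apply to the
    // clock, or 0 when no adjustment is needed yet.
    RefTime AddMeasurement(RefTime deviceTime, RefTime systemTime) {
        // Error that remains after the corrections already issued.
        const RefTime diff = (deviceTime - systemTime) - applied_;
        // First-order low pass filter: move a fraction of the way to diff.
        average_ += (diff - average_) / smoothing_;
        const RefTime magnitude = average_ < 0 ? -average_ : average_;
        if (magnitude <= threshold_) return 0;
        // Issue the smoothed error as one adjustment and start over.
        const RefTime delta = average_;
        applied_ += delta;
        average_ = 0;
        return delta;
    }

    // Sum of all adjustments reported so far.
    RefTime TotalApplied() const { return applied_; }

private:
    RefTime smoothing_;
    RefTime threshold_;
    RefTime average_ = 0;  // low-pass filtered difference
    RefTime applied_ = 0;  // adjustments already issued
};
```

The filter keeps a single imprecise position reading from triggering an adjustment; only a difference that persists over several samples pushes the average past the threshold.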

Live Sources & Clock Slaving

If the default Windows CE audio renderer is not the reference clock, it will write samples as it receives them. There's no automatic slave mode in Windows CE, so the audio renderer will not wait for the time to be right before it sends the next sample.

For the case of live streams, our audio renderer exposes an interface that speeds up or slows down the audio driver to try to match the rate of the live source. The source filter will usually call IAudioRenderer->SetDriftRate() to control the audio matching speed. In this case, the audio renderer continues to be the master clock.

Another possibility when the audio renderer can't be the master clock is to simulate a slaving mode by inserting a filter in front of the audio renderer. This filter's responsibility would be to throttle the samples, so that they will be delivered just when it is almost time to send them to the audio driver. Of course, more complicated schemes are possible to try to match the rate of the incoming samples, but we will not go there here...
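The core decision of such a throttling filter can be sketched as a single function. This is only an illustration of the idea: HoldTime() is a hypothetical helper, and leadTime (how far ahead of its start time a sample is released) is an assumed tuning parameter that would depend on the audio driver's buffering.

```cpp
#include <cstdint>

// Time in 100-nanosecond units, the same unit as DirectShow's REFERENCE_TIME.
using RefTime = std::int64_t;

// How long a throttling filter should hold a sample before delivering it
// downstream. The sample is released `leadTime` before its start time so
// the audio renderer can still queue it in time. Late samples and samples
// that are already inside the lead window are released immediately.
RefTime HoldTime(RefTime sampleStart, RefTime streamTime, RefTime leadTime) {
    const RefTime releaseAt = sampleStart - leadTime;
    return releaseAt > streamTime ? releaseAt - streamTime : 0;
}
```

A filter built around this would block, or schedule a wake-up on the reference clock, for the returned duration before passing the sample on to the audio renderer.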