For Windows 8, we made some changes in our audio system designed to improve the app experience. In this blog, I discuss these changes, and how you can take advantage of them in your media apps. Hopefully this info helps you better understand how audio works in Windows Store apps, especially when it comes to audio playback in the background. Let me begin by describing a common user scenario.

You’re listening to your favorite band in a Windows music app, and a friend sends you an mp3 of a sweet live version of a song by his favorite band. When you open the mp3, Windows Media Player opens and starts playing the song. Now both your favorite band and his favorite band are playing at the same time. Then your alarm goes off reminding you to pick up your sister from school and it’s a cacophonous mess. You panic, and shut the lid to your laptop. Well, if you’re like me, you’re probably looking for a better way to manage the sounds coming out of your machine.

For Windows 8 apps, we addressed this problem with the introduction of the Playback Manager and Media Transport Controls. The Playback Manager uses audio categories, which assign behaviors to audio streams—and, by extension, audio apps. Incoming audio streams in the foreground are always allowed to play audio. But by tagging streams, Playback Manager can make intelligent decisions about how to handle multiple streams in the foreground and background.

For example, if a background capable app is playing music and is moved to the background, and then the user opens a new app in the foreground to play music, Playback Manager mutes the audio in the background app. This allows for a more fluid and intuitive user experience. Users hear what they want to hear, not what they don’t. Add in intelligent attenuation of background music for alerts (like alarms and ringtones), and easy access to Media Transport Controls to stop, start, and skip to next/previous tracks, and you have a system that is designed to keep you from panicking and frantically looking for apps that are making noise when you don’t want them to.

Why did we do what we did?

In previous versions of Windows, users could have multiple applications open and running simultaneously. On the audio front, they could minimize their media player, work on a document, and surf the web at the same time on one big monitor. With two monitors, they could do much more multitasking. In the new immersive environment in Windows 8, things have changed with respect to how many apps you can see on the screen at once. With only one app in the body and one snapped, finding the app that’s making sound becomes more of a problem if it’s on the back stack. This is the primary reason behind the evolution of Playback Manager and the Media Transport controls. With these features in place, it is now much easier to make music start and stop when you need it to. And because Playback Manager can mute apps that should not be making sound, it keeps users from having to swipe back through a stack of apps to quiet something down.

There is no audio mixer in this environment as there is in the desktop. It still exists for desktop applications, but your app won’t show up here because we felt it was not a great experience to pop out into desktop to adjust relative app volumes. Instead, we are encouraging apps to not include volume controls. This way users are focused on the master volume control which helps simplify the entire volume experience as well – but that’s a different story.

Stream categories

To allow the system to manage when audio streams can or cannot be heard, we added stream categories.  By doing this, Windows now has the ability to make some logical decisions about whether audio should be heard.  Audio categories determine how streams are handled when apps are in the foreground (app visible on the device display) or the background (app hidden by another app).  Apps enter the foreground when they are launched or swiped onto the screen.  They are also in foreground and kept there when snapped.

For example, an app that plays some incidental sounds, such as interactive dings or clicks, likely doesn’t need to be heard in the background. It would be annoying.  So Playback Manager mutes this app as soon as it goes to the background.  But a media app playing your playlist of music should continue playback so you can surf the web or work while you listen to music.

Here is a brief description of how to use the audio categories. The easiest way to play audio in a Windows 8 app is by using an audio tag.

<audio controls="controls"> 
<source src="song.mp3"/>
</audio>

JavaScript

// Create the audio tag and set its category before adding it to the page.
var audtag = document.createElement('audio');
audtag.setAttribute("id", "audtag");
audtag.setAttribute("msAudioCategory", "BackgroundCapableMedia");
document.getElementById("MediaElement").appendChild(audtag);
audtag.load();

C#

In C#, you can set the audio category like this:

Playback.SetAudioCategory(AudioCategory.BackgroundCapableMedia);
Playback.SelectFile();

There are several audio categories to choose from. The table lists the available categories, and provides a description of the behavior associated with each one. You need to decide very carefully which category should be associated with which stream because your app will behave differently in each case.

Available stream categories

| Stream type | Description | Background capable? |
| --- | --- | --- |
| ForegroundOnlyMedia | Games or other sounds designed to work only in the foreground, but which mute existing background media sounds. Examples: game audio needed for a game (dancing games, music games); feature films (designed to pause when they go to the background). | No |
| BackgroundCapableMedia | For audio that needs to continue playing in the background. Examples: local media playback; local playlist; streaming radio; streaming playlist; music videos; streaming audio/radio, YouTube, Netflix, and so on. | Yes |
| Communications | For streaming communication audio, such as Voice over IP (VoIP) and real-time chat or other types of phone call. | Yes |
| Alert | Looping or longer-running alert sounds: alarms; ring tones; ringing notifications; sounds that need to decrease existing audio. | No |
| GameMedia | Background music played by a game. | No |
| GameEffects | Game sound effects designed to mix with existing audio, such as balls bouncing, engine sounds, and so on; characters talking; all non-music sounds. | No |
| SoundEffects | Sounds designed to mix with existing audio, such as beeps, dings, and other brief sounds. | No |
| Other | Default stream category used for uncategorized streams. | No |
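If your app needs to check at run time whether a category permits background playback, the table can be restated as a small lookup. This is only a sketch: `STREAM_CATEGORIES` and `isBackgroundCapable` are hypothetical names, not part of any Windows API, and the values simply mirror the table.

```javascript
// Hypothetical lookup restating the table: category name -> background capable?
var STREAM_CATEGORIES = {
    ForegroundOnlyMedia: false,
    BackgroundCapableMedia: true,
    Communications: true,
    Alert: false,
    GameMedia: false,
    GameEffects: false,
    SoundEffects: false,
    Other: false
};

// Returns true if the category is listed as background capable.
// Throws on unknown names so that typos are caught early.
function isBackgroundCapable(category) {
    if (!Object.prototype.hasOwnProperty.call(STREAM_CATEGORIES, category)) {
        throw new Error("Unknown stream category: " + category);
    }
    return STREAM_CATEGORIES[category];
}
```

For example, `isBackgroundCapable("Communications")` returns true, while `isBackgroundCapable("GameEffects")` returns false.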

Here are some rules of thumb to help you decide how to categorize your audio stream:

  • To use multiple stream types, you can generate your audio and video tags dynamically, tearing them down as you go.
  • Using the Communications category automatically defaults to low-latency audio. This creates one less step for you when designing two-way communications apps.
  • Don’t configure your audio tag to use low latency mode unless it is absolutely necessary. This is because communication streams already default to low-latency mode. A large amount of CPU resources will be required if you have an audio stream in low latency mode and a communication stream is also initialized.
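The first rule of thumb, generating tags dynamically and tearing them down, might look something like the following sketch. `playWithCategory` is a hypothetical helper, and the document and container are passed in as parameters only to keep the function self-contained. Setting the category before the source is an assumption about the safest ordering.

```javascript
// Creates an audio element tagged with the given category, appends it to the
// container, and removes it again once playback ends.
// `doc` is the page's document object; `container` is any parent element.
function playWithCategory(doc, container, src, category) {
    var tag = doc.createElement("audio");
    // Assumption: tag the stream before giving it a source.
    tag.setAttribute("msAudioCategory", category);
    tag.setAttribute("src", src);
    tag.addEventListener("ended", function () {
        // Tear the element down so the next sound can use another category.
        if (tag.parentNode) {
            tag.parentNode.removeChild(tag);
        }
    }, false);
    container.appendChild(tag);
    tag.play();
    return tag;
}
```

In an app you could call it as `playWithCategory(document, document.getElementById("MediaElement"), "ding.mp3", "SoundEffects");`, where the file name is just a placeholder.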

Stream categories and app behavior

[App behaviors are documented in detail in Audio Playback in a Metro Style App]

The overall rationale for determining if an app can be heard is largely based on whether it is in the foreground. But when you factor in other considerations, like communications apps and background capable media apps, things get more complicated. But don’t worry - in general, app behavior based on stream type is fairly easy to explain. These rules apply:

  • If your app is in the foreground, it doesn’t matter what category its audio stream is. It will always play sound, unless the system is muted or the volume is down.
  • Only one background capable audio app can play at a time, except when two are in the foreground.
  • Communications apps will always attenuate other system sounds when a call comes in on a communications stream type. If background music was playing, to hear the music again while in a call, the user can bring the music app to the foreground (snap the app or just bring it forward full screen) and then the first rule applies.
  • Sounds from an app in the foreground will mix with background-capable audio if the foreground streams are not incompatible with the background stream. For example, a ForegroundOnlyMedia stream will mute a BackgroundCapableMedia stream playing in the background, but GameEffects will mix with the BackgroundCapableMedia so users can play games and listen to music.
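To make these rules concrete, here is a deliberately simplified model of what happens to BackgroundCapableMedia playing in the background when a foreground stream starts. The function name and the returned strings are made up for illustration, and only the cases described in this post are covered. Anything else is reported as unspecified rather than guessed.

```javascript
// Simplified model, not a Windows API: effect on background music when a
// foreground stream of the given category starts playing.
function backgroundMusicEffect(foregroundCategory) {
    switch (foregroundCategory) {
        case "ForegroundOnlyMedia":
        case "BackgroundCapableMedia":
            // A new media stream in the foreground mutes the background one.
            return "muted";
        case "Communications":
        case "Alert":
            // Calls and alerts turn existing audio down instead of silencing it.
            return "attenuated";
        case "GameEffects":
        case "SoundEffects":
            // Effects are designed to mix with existing audio.
            return "mixed";
        default:
            // This post does not describe the remaining categories.
            return "unspecified";
    }
}
```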

When you choose a category for an app, you must follow some rules for your app to work as expected. To have your app play audio in the background, you must:

  • Add a background audio declaration in the app manifest
  • Set msAudioCategory to either Communications or BackgroundCapableMedia
  • Register for media transport controls (more on this later)
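The last two requirements can be sketched in code; the manifest declaration has to be made separately in the app manifest. `setUpBackgroundAudio` is a hypothetical helper, and the event names are assumed to be the lowercase forms of the C# MediaControl events (PlayPressed, PausePressed, and so on). In an app, `doc` would be `document` and `mediaControl` would be `Windows.Media.MediaControl`.

```javascript
// Button events this sketch wires up (assumed JavaScript names of the
// MediaControl events).
var REQUIRED_EVENTS = ["playpressed", "pausepressed", "playpausetogglepressed", "stoppressed"];

// Creates a background capable audio element and hooks it to the
// transport control buttons.
function setUpBackgroundAudio(doc, mediaControl, src) {
    var tag = doc.createElement("audio");
    tag.setAttribute("msAudioCategory", "BackgroundCapableMedia");
    tag.setAttribute("src", src);

    var handlers = {
        playpressed: function () { tag.play(); },
        pausepressed: function () { tag.pause(); },
        playpausetogglepressed: function () {
            if (tag.paused) { tag.play(); } else { tag.pause(); }
        },
        // The audio element has no stop method, so this sketch just pauses.
        stoppressed: function () { tag.pause(); }
    };
    REQUIRED_EVENTS.forEach(function (name) {
        mediaControl.addEventListener(name, handlers[name], false);
    });
    return tag;
}
```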

For categories that don’t play audio in the background, you don’t need to use a background capable media type. Instead, choose Other, or use GameEffects, or another relevant type.

For general info about using apps in the background, see http://blogs.msdn.com/b/windowsappdev/archive/2012/05/24/being-productive-in-the-background-background-tasks.aspx.

Background audio and connected standby

Connected standby is a new low power state in Windows 8 that enables a smartphone-like power mode on system-on-a-chip (SoC) devices. In connected standby, the device looks like it’s off, but it’s not. Apps can continue to play music in the background, receive updates, and use the network. For more info see the blog post on improving power efficiency for applications. For streaming music apps, there are some important steps to take if you want your background audio app to continue streaming from the network when a device enters connected standby.

In a nutshell, if the network connection is not up, your app can’t play streaming audio. Local audio will work fine, but if you have a server that’s dictating a playlist, your app won’t get the next song until your network is up.

For your app to continue to stream music even when the device is in connected standby, you have 3 options:

  1. Use the Background Transfer API, which will do all the work for you.
  2. Wrap an existing MF bytestream, when just a small amount of data needs to be transferred.
  3. Use Custom Media Foundation source or bytestream.

For details about these options, see Writing a power savvy background media app.

SoundLevel system notifications

SoundLevel notifications help apps know when they can be heard. When we first started developing the Playback Manager, we were not certain whether we would end up sending notifications to apps to inform them they were muted. We soon realized that without them, apps in the background could keep rendering audio or video that nobody can hear, using up system resources unnecessarily and draining the battery. So, when you register your app to receive SoundLevel notifications, Playback Manager sends your app one of 3 notifications, depending on the audible state of your app:

  • SoundLevel(Full) means audio from your app is audible
  • SoundLevel(Mute) means audio from your app is not audible (but the app can still play audio)
  • SoundLevel(Low) means your app has been attenuated by -28dB and is barely (but still) audible.

Let’s talk about why each notification is sent and what your app should do in response to each one.

SoundLevel(Full)

| A SoundLevel(Full) notification will be sent to your app when: | What your app must do: |
| --- | --- |
| A non-background capable app moves from the background to the foreground | No action needed |
| The app first becomes visible after startup | No action needed |

SoundLevel(Mute)

| A SoundLevel(Mute) notification will be sent to your app when: | What your app must do: |
| --- | --- |
| A non-background capable app moves from the foreground to the background | Pause |
| Another similar stream type begins to play while your app is playing (e.g. when a second BackgroundCapableMedia stream begins while another is playing, the older one gets a Mute) | Pause (unless you have a good reason not to) to save system resources |
| A Communications stream is in the background and another Communications stream is started in the foreground | Place the call on hold |

SoundLevel(Low)

| A SoundLevel(Low) notification will be sent to your app when: | What your app must do: |
| --- | --- |
| A communications stream begins to play while your app is playing audio | Pause if it is playing content that you don’t want the user to miss. It can be heard, but the volume will be very low unless it is moved to the foreground. So although it is your choice here, we recommend you pause. |

Here is a sample showing how to register for and use the SoundLevel events:

JavaScript

// Get the media control object.
var mediaControl = Windows.Media.MediaControl;

// Listen for Playback Manager notifications so the app learns
// whenever its SoundLevel changes.
mediaControl.addEventListener("soundlevelchanged", soundLevelChanged, false);

// Catch SoundLevel notifications and determine the SoundLevel state.
// This sample only logs the new level; a real app would pause the player
// when it is muted. log and getTimeStampedMessage are assumed to be
// helpers defined elsewhere in the app.
function soundLevelChanged() {
    var soundLevel = Windows.Media.MediaControl.soundLevel;

    switch (soundLevel) {
        case Windows.Media.SoundLevel.muted:
            log(getTimeStampedMessage("App sound level is: Muted"));
            break;
        case Windows.Media.SoundLevel.low:
            log(getTimeStampedMessage("App sound level is: Low"));
            break;
        case Windows.Media.SoundLevel.full:
            log(getTimeStampedMessage("App sound level is: Full"));
            break;
    }
}

C#

// add new handlers
MediaControl.SoundLevelChanged += MediaControl_SoundLevelChanged;
MediaControl.PlayPauseTogglePressed += MediaControl_PlayPauseTogglePressed;
MediaControl.PlayPressed += MediaControl_PlayPressed;
MediaControl.PausePressed += MediaControl_PausePressed;
MediaControl.StopPressed += MediaControl_StopPressed;

// save current handlers
SoundLevelChangedHandler = MediaControl_SoundLevelChanged;

// Convert a SoundLevel value to a display string
string SoundLevelToString(SoundLevel level)
{
    string LevelString;

    switch (level)
    {
        case SoundLevel.Muted:
            LevelString = "Muted";
            break;
        case SoundLevel.Low:
            LevelString = "Low";
            break;
        case SoundLevel.Full:
            LevelString = "Full";
            break;
        default:
            LevelString = "Unknown";
            break;
    }
    return LevelString;
}
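The samples above only report the new level. As a sketch of acting on it, the function below maps a level and the app's stream category to the response recommended in the SoundLevel tables earlier. `actionForSoundLevel` and its return strings are hypothetical, and levels are plain strings here; in an app you would compare against the Windows.Media.SoundLevel values instead.

```javascript
// Maps a sound level ("full", "muted", "low") and the app's stream category
// to the recommended response: "none", "pause", or "hold".
function actionForSoundLevel(level, category) {
    if (level === "full") {
        return "none"; // audible: nothing to do
    }
    if (level === "muted") {
        // A muted call is placed on hold; other audio is paused.
        return category === "Communications" ? "hold" : "pause";
    }
    if (level === "low") {
        // Still barely audible, but pausing is the recommended choice.
        return "pause";
    }
    throw new Error("Unknown sound level: " + level);
}
```

A soundlevelchanged handler could call this and then pause the audio tag or hold the call accordingly.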

Capture and loopback

We added a couple of small considerations to capture and loopback. First, if you decide to create a capture-based app, you’ll need to use the “Other” category to tag your capture stream (or the system will assign that category), but more importantly, you can’t categorize your capture stream with any other category. For Communications apps, when you have labeled your render stream as a Communications stream, you don’t need to tag your capture stream.

In addition, when the system sends SoundLevel(Mute), render, capture and loopback are all muted.

Why not use Visibility Notifications?

Although app visibility is closely tied to app audibility, it is important to only listen to SoundLevel events if you want to know when your app is heard. Don’t use visibility events for this, or your app’s audible state may be out of sync with its visible state.

Media Transport controls

[See System Transport Controls for a complete usage guide.]

The addition of the globally available Media Transport control (shown in the next figure) makes playing and controlling music from a music app a breeze.

Figure 1 - View of the Start screen with Media Transport Control

Figure 2 - Cropped view of just the Media Transport Control showing album art and metadata.

The MTC UI (as it is affectionately known in the halls of the Media Platform team) allows a user to play/pause audio whether it is in the foreground or the background, even if the app has been suspended in the background. This UI is invoked when a user presses the volume buttons on a keyboard or slate. It shows up everywhere: the new immersive environment, desktop, and even in the lock screen. It is a great addition to Windows 8 and your apps can (and should) make use of it where appropriate.

Going back to the opening scenario (where the user has a bunch of audio playing at once and closes the lid), it is for this reason that we felt a global transport control like this was needed. Because there are not multiple windows to switch to, like on the Desktop, users must have a way to quickly stop an app that is making sound. We felt that tying the MTC UI to the volume controls would be intuitive and it would provide a lot of detail about what’s playing on a user’s system in one button press.

If you are writing a media app (and especially one that is background capable), here’s how you can tie in to the controls:

JavaScript:

// Get the MediaControl object
var MediaControls = Windows.Media.MediaControl;

// Add event listeners for the buttons
MediaControls.addEventListener("playpressed", play, false);
MediaControls.addEventListener("pausepressed", pause, false);
MediaControls.addEventListener("playpausetogglepressed", playpausetoggle, false);

C#

using Windows.Media;
MediaControl.SoundLevelChanged += MediaControl_SoundLevelChanged;
MediaControl.PlayPauseTogglePressed += MediaControl_PlayPauseTogglePressed;
MediaControl.PlayPressed += MediaControl_PlayPressed;
MediaControl.PausePressed += MediaControl_PausePressed;
MediaControl.StopPressed += MediaControl_StopPressed;

The next code snippet shows how to enable the Previous Track and Next Track buttons by adding event listeners to the MediaControl object.

JavaScript

// enable the previous track button
MediaControls.addEventListener("previoustrackpressed", previoustrack, false);

// enable the next track button
MediaControls.addEventListener("nexttrackpressed", nexttrack, false);

C#

MediaControl.NextTrackPressed += MediaControl_NextTrackPressed;
MediaControl.PreviousTrackPressed += MediaControl_PreviousTrackPressed;

You can also add artist metadata and artwork to the flyout. See the white paper referenced above for details.
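As a rough sketch of how the track buttons and metadata could work together, the helper below steps through a playlist and updates the text shown in the flyout. `wireTrackButtons` and the playlist shape are hypothetical. The `trackName` and `artistName` properties and the lowercase event names are assumptions about how MediaControl is exposed to JavaScript, so check them against the reference before relying on them.

```javascript
// Wires next/previous buttons to a playlist (an array of
// { title, artist } objects) and keeps the displayed metadata current.
function wireTrackButtons(mediaControl, playlist, onTrackChanged) {
    if (playlist.length === 0) {
        throw new Error("Playlist must contain at least one track");
    }
    var index = 0;

    function show() {
        var track = playlist[index];
        // Assumed metadata properties on the media control object.
        mediaControl.trackName = track.title;
        mediaControl.artistName = track.artist;
        onTrackChanged(track);
    }

    mediaControl.addEventListener("nexttrackpressed", function () {
        index = (index + 1) % playlist.length; // wrap to the first track
        show();
    }, false);

    mediaControl.addEventListener("previoustrackpressed", function () {
        index = (index + playlist.length - 1) % playlist.length; // wrap to the last
        show();
    }, false);

    show();
}
```

Wrapping around at either end of the playlist is a design choice for this sketch; an app could just as well stop at the last track.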

The MTC UI is populated by the media app in use. The UI stays populated until the system clears it. The situations when the UI will be cleared are:

  1. User closes the app
  2. User clears recently used apps from app switch list
  3. The system terminates the app to free up resources
  4. The app crashes
  5. User shuts down the machine
  6. App unregisters for MTC controls
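For the last case, one way to make unregistering easy is to keep registration and removal together. In this sketch, `registerTransportControls` is a hypothetical helper that adds every handler it is given and returns a function that removes them again. It assumes the media control object supports `removeEventListener` alongside `addEventListener`.

```javascript
// Registers each handler under its event name and returns a function
// that unregisters all of them.
function registerTransportControls(mediaControl, handlers) {
    var names = Object.keys(handlers);
    names.forEach(function (name) {
        mediaControl.addEventListener(name, handlers[name], false);
    });
    return function unregister() {
        names.forEach(function (name) {
            mediaControl.removeEventListener(name, handlers[name], false);
        });
    };
}
```

An app could hold on to the returned function and call it when it no longer wants to appear in the MTC UI.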

Summary

We’ve worked hard to bring a totally new audio paradigm your way for Windows 8 that lines up with new form factors and a new user experience in Windows. Audio Categories provide Windows with context so the system will treat your app appropriately. If your app needs to play background audio, all you do is set the category, hook up media transport controls and declare it in the manifest.

SoundLevel notifications let you know if your app can be heard by the user. This gives you info you need to help save battery life by pausing the audio if it can’t be heard in the background.

The Media Transport UI shows volume state, and allows users to quickly start and stop audio and see artist info from a media app.

We hope you use these new technologies in your apps to enable great audio experiences for your customers. All in all they are easy to implement, and can have huge payback for users.

--Johnny Bregar, Program Manager, Windows