# Matthew van Eerde's web log

• #### Why is 1 Pascal equal to 94 dB Sound Pressure Level? (1 Pa = 94 dB SPL)

Last time we talked about why a full-scale digital sine wave has a power measurement of -3.01 dB FS. (Spoiler: because it's not a square wave.)

This time we'll discuss why an atmospheric sound which generates a root-mean-square pressure of 1 Pascal has a power measurement of 94 dB SPL.

As before, dB is defined as $10 \log_{10}(P_A^2 / P_B^2)$, where $P_B$ is a reference level.

Before, we had a digital measurement with an obvious ceiling: sample values of -1 and 1. So the reference point 0 dB FS was defined in terms of the signal with the greatest possible energy.

In the analog domain, there isn't an obvious ceiling. We instead consider the floor - the quietest possible signal that is still audible by human ears.

This is a rather wishy-washy definition, but the convention is to take $P_B$ = 20 μPa = 0.00002 Pa exactly.

So our 0 dB SPL reference point is when $P_A = P_B$: 0 dB SPL = $10 \log_{10}(0.00002^2 / 0.00002^2) = 10 \log_{10}(1) = 10 \cdot 0 = 0$.

What if the pressure level is 1 Pascal? This is a quite loud sound, somewhere between heavy traffic and a jackhammer.

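$$
\begin{aligned}
1\ \text{Pa in dB SPL} &= 10 \log_{10}(1^2 / P_B^2) \\
&= 20 \log_{10}(1 / P_B) \\
&= -20 \log_{10}(P_B) \\
&= -20 \log_{10}(2 \cdot 10^{-5}) \\
&= -20 (\log_{10} 2 + \log_{10} 10^{-5}) \\
&= -20 ((\log_{10} 2) - 5) \\
&= 100 - 20 \log_{10} 2 \approx 93.9794\ \text{dB SPL}
\end{aligned}
$$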

So 1 Pa is actually a tiny bit less than 94 dB SPL; it's closer to 93.98 = (100 - 6.02) dB SPL.
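The derivation above can be checked numerically. Here is a minimal sketch in portable C++; the helper name `pascals_to_db_spl` and the constant name are made up for illustration, and the only input taken from the post is the 20 μPa reference level.

```cpp
#include <cassert>
#include <cmath>

// Reference pressure for dB SPL: 20 micropascals, per the convention above.
constexpr double kReferencePressurePa = 20e-6;

// Converts a root-mean-square pressure in pascals to dB SPL.
// Hypothetical helper for illustration; the pressure must be positive.
double pascals_to_db_spl(double rms_pressure_pa) {
    assert(rms_pressure_pa > 0.0);
    // 10 log10(PA^2 / PB^2) simplifies to 20 log10(PA / PB).
    return 20.0 * std::log10(rms_pressure_pa / kReferencePressurePa);
}
```

Calling `pascals_to_db_spl(1.0)` gives a value just under 94, and passing the reference pressure itself gives 0.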

• #### Arbitrary HTML and JavaScript injection

Any HTML you want

4

• #### Getting peak meters and volume settings for all apps and audio devices on the system

A few previous posts have touched on how to get peak meter readings, both for the device and per app.

Let's put it all together and write a single command-line tool which:
1. Enumerates all active audio devices (both playback and recording)
2. Dumps the peak meter and volume levels for each device
3. Enumerates all active audio applications (audio sessions) per device
4. Dumps the peak meter and volume levels for each audio session
Note there is no API for enumerating individual streams within a session.
Pseudocode:
For each flow in (render, capture)
    For each device in IMMDeviceEnumerator::EnumAudioEndpoints(flow)
        Display the name of the device
        Get and display IAudioMeterInformation::GetPeakValue for the device
        Get and display IAudioEndpointVolume data for the device
        For each session in IAudioSessionManager2::GetSessionEnumerator
            Skip the session unless the state is "active"
            Get and display IAudioMeterInformation::GetPeakValue for the session
            Display session information
            Get and display ISimpleAudioVolume information
            Get and display IChannelAudioVolume information

Sample output:

>meters.exe
-- Playback devices --
Line out (High Definition Audio Device)
Peak: 0.404736
Mute: 0
Volume range: 0% to 100% (-46.5 dB to 0 dB in steps of 1.5 dB)
Master: 74% (-4.32831 dB)
Channel 1 of 2: 74% (-4.32831 dB)
Channel 2 of 2: 74% (-4.32831 dB)

Active session #1
Peak value: 0.240089
Icon path:
Display name:
Grouping parameter: {98710e41-6535-4cf0-b9b3-4181a0b7103e}
Process ID: 3496 (single-process)
System sounds session: no
Package full name: Microsoft.ZuneMusic_2.2.41.0_x64__8wekyb3d8bbwe
Master volume: 1 (0 dB FS)
Not muted
Channel #1 volume: 1 (0 dB FS)
Channel #2 volume: 1 (0 dB FS)

Active session #2
Peak value: 0.329753
Icon path:
Display name:
Grouping parameter: {fc078096-d2fc-4883-8b0d-af4619266c02}
Process ID: 6720 (multi-process)
System sounds session: no
HWND: 0x00000000000D1390 Windows Media Player
Master volume: 1 (0 dB FS)
Not muted
Channel #1 volume: 1 (0 dB FS)
Channel #2 volume: 1 (0 dB FS)

Internal speakers (High Definition Audio Device)
Peak: 0
Mute: 1
Volume range: 0% to 100% (-46.5 dB to 0 dB in steps of 1.5 dB)
Master: 65.7804% (-6 dB)
Channel 1 of 1: 65.7804% (-6 dB)

-- Recording devices --
Microphone (High Definition Audio Device)
Peak: 0.000274658
Mute: 0
Volume range: 0% to 100% (-34.5 dB to 12 dB in steps of 1.5 dB)
Master: 84.7652% (6 dB)
Channel 1 of 2: 84.7652% (6 dB)
Channel 2 of 2: 84.7652% (6 dB)

Active session #1
Peak value: 0.000274658
Icon path:
Display name:
Grouping parameter: {cee77f5a-d651-4392-8ffc-232c6eecdf51}
Process ID: 8212 (single-process)
Session identifier: {0.0.1.00000000}.{878a0979-89d6-43ec-9cff-e3f70dac2618}|\Device\HarddiskVolume1\Program Files\WindowsApps\Microsoft.WindowsSoundRecorder_6.3.9600.16384_x64__8wekyb3d8bbwe\soundrec.exe%b{00000000-0000-0000-0000-000000000000}
Session instance identifier: {0.0.1.00000000}.{878a0979-89d6-43ec-9cff-e3f70dac2618}|\Device\HarddiskVolume1\Program Files\WindowsApps\Microsoft.WindowsSoundRecorder_6.3.9600.16384_x64__8wekyb3d8bbwe\soundrec.exe%b{00000000-0000-0000-0000-000000000000}|1%b8212
System sounds session: no
Package full name: Microsoft.WindowsSoundRecorder_6.3.9600.16384_x64__8wekyb3d8bbwe
Master volume: 0.847652 (-1.43565 dB FS)
Not muted
Channel #1 volume: 1 (0 dB FS)
Channel #2 volume: 1 (0 dB FS)

Active session #2
Peak value: 0.000274658
Icon path:
Display name:
Grouping parameter: {c346e9e3-a37e-427b-a2be-1feb2c81b469}
Process ID: 2608 (single-process)
Session identifier: {0.0.1.00000000}.{878a0979-89d6-43ec-9cff-e3f70dac2618}|\Device\HarddiskVolume1\Windows\System32\SoundRecorder.exe%b{00000000-0000-0000-0000-000000000000}
Session instance identifier: {0.0.1.00000000}.{878a0979-89d6-43ec-9cff-e3f70dac2618}|\Device\HarddiskVolume1\Windows\System32\SoundRecorder.exe%b{00000000-0000-0000-0000-000000000000}|1%b2608
System sounds session: no
HWND: 0x00000000004611FA Sound Recorder
Master volume: 0.847652 (-1.43565 dB FS)
Not muted
Channel #1 volume: 1 (0 dB FS)
Channel #2 volume: 1 (0 dB FS)
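
The session lines in the sample output pair a linear volume scalar with a dB FS figure, for example 0.847652 with -1.43565 dB FS. Those numbers are consistent with the usual amplitude conversion, 20 log10(scalar). The sketch below is an inference from the printed values, not code taken from meters.exe, and the helper name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Converts a linear volume scalar in (0, 1] to dB relative to full scale.
// Hypothetical helper; a scalar of 0 (silence) has no finite dB value,
// so this sketch simply requires a positive input.
double scalar_to_db_fs(double scalar) {
    assert(scalar > 0.0);
    return 20.0 * std::log10(scalar);
}
```

A scalar of 1 maps to 0 dB FS, matching the "Master volume: 1 (0 dB FS)" lines.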

Source and binaries attached.

EDIT September 22 2015: moved source to github https://github.com/mvaneerde/blog/tree/master/meters

• #### shellproperty.exe v2: read all properties on a file; set properties of certain non-VT_LPWSTR types

This version of shellproperty.exe can:

1. Read all the properties from a given file in one go.
2. Recognize properties by their canonical name (if they have one).
3. Set a property to VT_EMPTY (removing it), or "VT_VECTOR | VT_LPWSTR", or VT_UI4, in addition to VT_LPWSTR.

Usage:

>shellproperty.exe
shellproperty read [ <key> | all ] from <filename>
shellproperty set <key> on <filename> to <vartype> <vartype-specific-arguments>

<vartype>: VT_EMPTY | VT_LPWSTR | "VT_VECTOR | VT_LPWSTR" | VT_UI4

Example of reading all properties from a file:

>shellproperty read all from "I 01 Track 1.mp3" | sort
{9E5E05AC-1936-4A75-94F7-4704B8B01923} 0: VT_BSTR I 01 Track 1.mp3
{CFA31B45-525D-4998-BB44-3F7D81542FA4} 1: VT_LPWSTR MP3
System.AppUserModel.ID:
System.AppUserModel.ParentID:
System.Audio.ChannelCount: 2 (stereo)
System.Audio.EncodingBitrate: 320kbps
System.Audio.Format: {00000055-0000-0010-8000-00AA00389B71}
System.Audio.IsVariableBitRate: No
System.Audio.PeakValue: 23841
System.Audio.SampleRate: 44 kHz
System.Audio.SampleSize: 16 bit
System.Audio.StreamNumber: 0
System.Author: Unknown artist
System.ComputerName: MATEER-D (this PC)
System.ContentType: audio/mpeg
System.DateAccessed: 9/3/2013 5:55 PM
System.DateCreated: 9/3/2013 5:55 PM
System.DateImported: 9/3/2013 5:55 PM
System.DateModified: 9/24/2013 3:21 PM
System.Document.DateCreated: 9/3/2013 5:55 PM
System.Document.DateSaved: 9/24/2013 3:21 PM
System.DRM.IsProtected: No
System.ExpandoProperties:
System.FileAttributes: A
System.FileAttributesDisplay:
System.FileExtension: .mp3
System.FileName: I 01 Track 1.mp3
System.FileOwner: REDMOND\mateer
System.FilePlaceholderStatus: 7
System.IsFolder: Files
System.IsShared: No
System.ItemAuthors: Unknown artist
System.ItemDate: 9/3/2013 5:55 PM
System.ItemFolderNameDisplay: Les Misérables (concept album)
System.ItemFolderPathDisplay: C:\music\Claude-Michel Schönberg & Alain Boublil\Les Misérables (concept album)
System.ItemFolderPathDisplayNarrow: Les Misérables (concept album) (C:\music\Claude-Michel Schönberg & Alain Boublil)
System.ItemName: I 01 Track 1.mp3
System.ItemNameDisplay: I 01 Track 1.mp3
System.ItemNameDisplayWithoutExtension: I 01 Track 1
System.ItemParticipants: Unknown artist
System.ItemPathDisplay: C:\music\Claude-Michel Schönberg & Alain Boublil\Les Misérables (concept album)\I 01 Track 1.mp3
System.ItemPathDisplayNarrow: I 01 Track 1 (C:\music\Claude-Michel Schönberg & Alain Boublil\Les Misérables (concept album))
System.ItemType: MP3 File
System.ItemTypeText: MP3 File
System.Kind: Music
System.KindText: Music
System.Media.AverageLevel: 4219
System.Media.ClassPrimaryID: {D1607DBC-E323-4BE2-86A1-48A42A28441E}
System.Media.ClassSecondaryID: {00000000-0000-0000-0000-000000000000}
System.Media.CollectionGroupID: {3B02CC9D-BE3E-43A4-81AA-DC23DFD20083}
System.Media.CollectionID: {3B02CC9D-BE3E-43A4-81AA-DC23DFD20083}
System.Media.ContentID: {3780156C-B516-4897-B6AC-CB632A0CA4A5}
System.Media.DlnaProfileID: MP3
System.Media.Duration: 00:04:47
System.Media.Publisher: Colosseum
System.Media.UniqueFileIdentifier: AMGt_id=T 987037;AMGp_id=P 1857378;AMGa_id=R 189777;X_id={9D0F0F00-0500-11DB-89CA-0019B92A3933};XA_id={51E50200-0400-11DB-89CA-0019B92A3933};XAP_id={6357088C-778C-11DC-9403-0019B9B20868}
System.Media.Year: 1989
System.MIMEType: audio/mpeg
System.Music.AlbumArtist: Various Artists
System.Music.AlbumID: Various Artists - Les Miserables - French Concept Album: 1 of 2
System.Music.AlbumTitle: Les Miserables - French Concept Album: 1 of 2
System.Music.Artist: Unknown artist
System.Music.Composer: Alain Boublil; Claude-Michel Schönberg
System.Music.DisplayArtist: Various Artists
System.Music.Genre: Unknown genre
System.Music.PartOfSet: 1/1
System.Music.TrackNumber: 1
System.NetworkLocation:
System.NotUserContent: No
System.OfflineAvailability: Available offline
System.OfflineStatus:
System.ParsingName: I 01 Track 1.mp3
System.ParsingPath: C:\music\Claude-Michel Schönberg & Alain Boublil\Les Misérables (concept album)\I 01 Track 1.mp3
System.PerceivedType: Audio
System.SFGAOFlags: 1077936503
System.SharedWith:
System.ShareScope: music\Claude-Michel Schönberg & Alain Boublil\Les Misérables (concept album)
System.SharingStatus: Not shared
System.Shell.SFGAOFlagsStrings: filesys; stream
System.Size: 10.9 MB
System.ThumbnailCacheId: 16520045390528741485
System.Title: Track 1
System.VolumeId: {14FF6E9D-14F5-11E3-824C-806E6F6E6963}
System.ZoneIdentifier: 0

Example of updating a file:

>type _fixup.bat
@echo off

for /f "usebackq delims=" %%f in (`dir /s /b "I *.mp3"`) do (
shellproperty set System.Music.AlbumTitle on "%%f" to VT_LPWSTR "Madama Butterfly - Sinopoli / Freni: 1 of 3"
)

for /f "usebackq delims=" %%f in (`dir /s /b "II *.mp3"`) do (
shellproperty set System.Music.AlbumTitle on "%%f" to VT_LPWSTR "Madama Butterfly - Sinopoli / Freni: 2 of 3"
)

for /f "usebackq delims=" %%f in (`dir /s /b "III *.mp3"`) do (
shellproperty set System.Music.AlbumTitle on "%%f" to VT_LPWSTR "Madama Butterfly - Sinopoli / Freni: 3 of 3"
)

Source and binaries (x86 and amd64) attached.

EDIT September 22 2015: moved source to github https://github.com/mvaneerde/blog/tree/master/shellproperty

• #### Sample app for RECT functions

Riffing on Raymond Chen's post today about SubtractRect, I threw together a toy app which demonstrates three rectangle functions: UnionRect, IntersectRect, and SubtractRect.

Usage:

>rects.exe
rects.exe
union     (left1 top1 right1 bottom1) (left2 top2 right2 bottom2) |
intersect (left1 top1 right1 bottom1) (left2 top2 right2 bottom2) |
subtract  (left1 top1 right1 bottom1) (left2 top2 right2 bottom2)

Sample output:

>rects.exe union (2 2 6 6) (4 4 8 8)
(left = 2; top = 2; right = 6; bottom = 6)
union (left = 4; top = 4; right = 8; bottom = 8)
= (left = 2; top = 2; right = 8; bottom = 8)
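
To make the union result above concrete, here is a portable sketch of the idea: take the smallest rectangle that contains both inputs, ignoring any input that is empty. The `Rect` struct and `union_rect` function are stand-ins written for this example; they approximate the documented behavior of UnionRect but are not the Win32 implementation, and they are not the source of rects.exe.

```cpp
#include <algorithm>
#include <cassert>

// Stand-in for the Win32 RECT structure (an assumption for portability).
struct Rect {
    long left, top, right, bottom;
};

// A rectangle with no area is treated as empty.
bool is_empty(const Rect& r) {
    return r.right <= r.left || r.bottom <= r.top;
}

// Writes the smallest rectangle containing both a and b to *dst.
// Empty inputs are ignored. Returns false if the result is empty.
bool union_rect(Rect* dst, const Rect& a, const Rect& b) {
    assert(dst != nullptr);
    if (is_empty(a) && is_empty(b)) {
        *dst = Rect{0, 0, 0, 0};
        return false;
    }
    if (is_empty(a)) { *dst = b; return true; }
    if (is_empty(b)) { *dst = a; return true; }
    *dst = Rect{
        std::min(a.left, b.left),
        std::min(a.top, b.top),
        std::max(a.right, b.right),
        std::max(a.bottom, b.bottom)
    };
    return true;
}
```

With the inputs (2 2 6 6) and (4 4 8 8) from the sample run, this produces (2 2 8 8).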

Source and binaries (amd64 and x86) attached.

Still no pictures though.

Exercise: implement BOOL SymmetricDifferenceRect(_Out_ RECT *C, const RECT *A, const RECT *B).

EDIT September 22 2015: moved source to github https://github.com/mvaneerde/blog/tree/master/rects

• #### shellproperty.exe - set/read string properties on a file from the command line

Yesterday Raymond Chen blogged a "Little Program" which could edit audio metadata. As it happens, I have a similar tool I threw together which accepts a property key and a string property value to update a property, or can read a string or string-vector property.

Usage:

>shellproperty
shellproperty set <key> to <string> on <filename>

Here's an example _fixup.bat script I use to set audio metadata on my copy of Giuseppe Sinopoli's recording of Madama Butterfly, to help distinguish it from other recordings of the same opera that I have.

@echo off
dir /s /b "I *.mp3" | xargs /addquotes shellproperty set PKEY_Music_AlbumTitle to "Madama Butterfly - Sinopoli / Freni: 1 of 3" on
dir /s /b "II *.mp3" | xargs /addquotes shellproperty set PKEY_Music_AlbumTitle to "Madama Butterfly - Sinopoli / Freni: 2 of 3" on
dir /s /b "III *.mp3" | xargs /addquotes shellproperty set PKEY_Music_AlbumTitle to "Madama Butterfly - Sinopoli / Freni: 3 of 3" on

Source and amd64/x86 binaries attached, but in substance it's very similar to Raymond's "Little Program".

Possible future improvements:

1. When setting, allow specifying a vartype on the command line.
2. Allow specifying a property key by fmtid and pid.
3. Handle more vartypes for displaying properties.
4. Allow dumping all properties on a given file.

EDIT September 22 2015: removed source and binaries as this is obsoleted by http://blogs.msdn.com/b/matthew_van_eerde/archive/2013/09/24/shellproperty-exe-v2-read-all-properties-on-a-file-set-properties-of-certain-non-vt-lpwstr-types.aspx

• #### Even if someone's signaling right, they still have the right of way

I was driving to work this morning and I had an experience which vindicated my paranoia, and may even have passed it on to someone else.

I was heading East on NE 80th St approaching 140th Ave NE in Redmond. This is a two-way stop; drivers on 140th have the right of way and do not stop. Drivers on 80th (me) have to stop.

I came to a full stop and signaled right (I wanted to head South on 140th). A driver (let's call him Sam) pulled up behind me, also signaling right. There were three cars heading South on 140th, all of them signaling right (they wanted to head West on 80th).

At this point I had a conversation with myself that went something like this.

Well, Matt, you could turn right now. All those cars are turning right, so they won't hit you.
But wait, Matt. Those cars have the right of way. Sure they're signaling right. But that doesn't mean they'll actually turn right.
Yup, you're right, Matt. Better to wait to see what actually happens.

So I waited, and sure enough, all three cars actually turned right. So I suppose I could have gone. And more cars were feeding in to 140th from Redmond Way. And all of these cars were signaling right. And one was a school bus.

At this point Sam (remember Sam?) got impatient and honked his horn. This shocked me a little.

I imagine anyone who is from New York or Los Angeles is shaking their heads at me now. Not for waiting, but for being shocked. "He honked his horn? So what?"

(This is a cultural difference. In New York or Los Angeles, if you're waiting at a red light, you will get honked at as soon as the other guy's light turns yellow. But in Washington, the guy behind you will calmly wait through two full greens, then politely knock on your window and ask if everything is OK.)

I trust the school bus even less than the cars, so I let the school bus go.

The car behind the school bus is a minivan. He's signaling right, too. But I let him go as well... and he goes straight!

Behind the minivan, there's enough of a gap that I feel comfortable pulling out, so I do. And Sam pulls up to the line.

As I'm cruising down 140th, I glance in my rear-view mirror. I see a line of cars coming down 140th to the intersection I just left, all signaling right...

... and I see my friend Sam...

... patiently waiting.

May the Force be with you, Sam.

• #### Getting the package full name of a Windows Store app, given the process ID

Last time I talked about enumerating audio sessions and showed an example which listed several Desktop apps and one Windows Store app.

It's possible to guess that this is a Windows Store app by the presence of the WWAHost.exe string in the session instance identifier. Don't rely on this, though; the session identifiers are opaque strings, and their formula can change at any time.

We were able to get some additional information on the Desktop apps by enumerating their top-level windows and reading the window text. But how do we get more information on the Windows Store app? And how do we even know it's a Windows Store app without cracking the session identifier?

By using the Application Model APIs - for example, GetPackageFullName.

Pseudocode:

... get a process ID...

OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, FALSE, pid);

GetPackageFullName(...)

if APPMODEL_ERROR_NO_PACKAGE then the process has no associated package and is therefore not a Windows Store app.

Updated sample output:

-- Active session #4 --
Icon path:
Display name:
Grouping parameter: {8dbd87b0-9fce-4c27-b7ff-4b20b0dae1a3}
Process ID: 11644 (single-process)
System sounds session: no
Peak value: 0.395276
Package full name: Microsoft.ZuneMusic_2.0.132.0_x64__8wekyb3d8bbwe

Source and binaries attached.

EDIT September 22 2015: removed source and binaries since this is obsoleted by http://blogs.msdn.com/b/matthew_van_eerde/archive/2013/09/26/getting-peak-meters-and-volume-settings-for-all-apps-and-audio-devices-on-the-system.aspx

• #### More on IAudioSessionControl and IAudioSessionControl2, plus: how to log a GUID

I decided to go back and push this a little further and see what information there was to dig out. Pseudocode:

CoCreate(IMMDeviceEnumerator)
MMDevice = IMMDeviceEnumerator::GetDefaultAudioEndpoint(...)
AudioSessionManager2 = MMDevice::Activate(...)
AudioSessionEnumerator = AudioSessionManager2::GetSessionEnumerator()

for each session in AudioSessionEnumerator {
    AudioSessionControl = AudioSessionEnumerator::GetSession(...)
    if (AudioSessionStateActive != AudioSessionControl::GetState()) { continue; }

    AudioSessionControl::GetIconPath (usually blank)
    AudioSessionControl::GetDisplayName (usually blank)
    AudioSessionControl::GetGroupingParam

    AudioSessionControl2 = AudioSessionControl::QueryInterface(...)
    AudioSessionControl2::GetSessionIdentifier (treat this as an opaque string)
    AudioSessionControl2::GetSessionInstanceIdentifier (treat this as an opaque string)
    AudioSessionControl2::GetProcessId (some sessions span multiple processes)
    AudioSessionControl2::IsSystemSoundsSession

    AudioMeterInformation = AudioSessionControl::QueryInterface(...)
    AudioMeterInformation::GetPeakValue

    for each top level window in the process pointed to by AudioSessionControl2::GetProcessId {
        Use WM_GETTEXTLENGTH and WM_GETTEXT to get the window text, if any
    }
}

Here's the output of the new version of meters.exe.

>meters.exe
-- Active session #1 --
Icon path:
Display name:
Process ID: 11812 (single-process)
System sounds session: no
Peak value: 0.2837

-- Active session #2 --
Icon path:
Display name:
Grouping parameter: {a2e2e0f5-81bb-407e-b701-f4f3695f9dac}
Process ID: 15148 (single-process)
Session identifier: {0.0.0.00000000}.{125eeed2-3cd2-48cf-aac9-8ae0157564ad}|\Device\HarddiskVolume1\Program Files (x86)\Internet Explorer\iexplore.exe%b{00000000-0000-0000-0000-000000000000}
Session instance identifier: {0.0.0.00000000}.{125eeed2-3cd2-48cf-aac9-8ae0157564ad}|\Device\HarddiskVolume1\Program Files (x86)\Internet Explorer\iexplore.exe%b{00000000-0000-0000-0000-000000000000}|1%b15148
System sounds session: no
Peak value: 0.428589
HWND: 0x0000000001330B12
HWND: 0x0000000000361CA2
HWND: 0x00000000019A07A8
HWND: 0x0000000001411BF2
HWND: 0x0000000000B60706
HWND: 0x000000000231165A
HWND: 0x0000000002631472
HWND: 0x0000000000441D94

-- Active session #3 --
Icon path:
Display name:
Grouping parameter: {e191c91d-dc24-468d-b542-0d5f12ce8c48}
Process ID: 2324 (multi-process)
System sounds session: no
Peak value: 0.294137
HWND: 0x0000000002900C86 Windows Media Player

-- Active session #4 --
Icon path: @%SystemRoot%\System32\AudioSrv.Dll,-203
Display name: @%SystemRoot%\System32\AudioSrv.Dll,-202
Grouping parameter: {e7d6e107-ca03-4660-a067-1a1f3dc1619c}
Process ID: 0 (multi-process)
System sounds session: yes
Peak value: 0.0502903

-- Active session #5 --
Icon path:
Display name:
Grouping parameter: {2a3e30fb-2ded-471e-9c2f-cbd8572b2af2}
Process ID: 15948 (single-process)
Session instance identifier: {0.0.0.00000000}.{125eeed2-3cd2-48cf-aac9-8ae0157564ad}|\Device\HarddiskVolume1\Program Files (x86)\VideoLAN\VLC\vlc.exe%b{00000000-0000-0000-0000-000000000000}|1%b15948
System sounds session: no
Peak value: 0.287567
HWND: 0x0000000000C8160C Opening Ceremony - VLC media player

Active sessions: 5

Part of this was logging the grouping parameter, which is a GUID. I've seen a lot of code that converts the GUID to a string and logs it using %s. Another way is to use a couple of macros and let the format string do the conversion for you:

#define GUID_FORMAT L"{%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x}"
// the argument is parenthesized so that expressions like *p expand correctly
#define GUID_VALUES(g) \
    (g).Data1, (g).Data2, (g).Data3, \
    (g).Data4[0], (g).Data4[1], (g).Data4[2], (g).Data4[3], \
    (g).Data4[4], (g).Data4[5], (g).Data4[6], (g).Data4[7]

...

GUID someGuid = ...;

LOG(L"The value of someGuid is " GUID_FORMAT L".", GUID_VALUES(someGuid));

Standard caveats about not using side effects inside a macro apply. For example, this would be a bug:

for (GUID *p = ...) {
    LOG(L"p = " GUID_FORMAT L".", GUID_VALUES(*(p++))); // BUG! the macro expands p++ eleven times
}
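
For readers who want to try the format string outside of Windows, here is a self-contained sketch of the same two-macro technique. It makes several substitutions that are not in the original: a local `Guid` struct stands in for the Windows `GUID` type, narrow `std::snprintf` stands in for the `LOG` macro, and `guid_to_string` is a hypothetical wrapper. The casts to `unsigned` are there only so that every argument matches `%x`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Stand-in for the Windows GUID structure (an assumption for portability).
struct Guid {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
};

#define GUID_FORMAT "{%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x}"
#define GUID_VALUES(g) \
    (unsigned)(g).Data1, (unsigned)(g).Data2, (unsigned)(g).Data3, \
    (unsigned)(g).Data4[0], (unsigned)(g).Data4[1], \
    (unsigned)(g).Data4[2], (unsigned)(g).Data4[3], \
    (unsigned)(g).Data4[4], (unsigned)(g).Data4[5], \
    (unsigned)(g).Data4[6], (unsigned)(g).Data4[7]

// Formats a Guid in the braced, lowercase form seen in the output above.
std::string guid_to_string(const Guid& g) {
    char buffer[39];  // 38 characters plus the terminating NUL
    int written = std::snprintf(buffer, sizeof(buffer),
                                GUID_FORMAT, GUID_VALUES(g));
    assert(written == 38);
    (void)written;  // silence the unused warning when asserts are disabled
    return std::string(buffer);
}
```

Because `guid_to_string` takes its argument by reference, the macro only ever sees a plain variable name, which sidesteps the side-effect problem described above.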

Source, amd64 binaries, and x86 binaries attached.

EDIT September 22 2015: removed source and binaries since this is obsoleted by http://blogs.msdn.com/b/matthew_van_eerde/archive/2013/09/26/getting-peak-meters-and-volume-settings-for-all-apps-and-audio-devices-on-the-system.aspx

• #### Buffer size alignment and the audio period

I got an email from someone today, paraphrased below:

Q: When I set the sampling frequency to 48 kHz, and ask Windows what the audio period is, I get exactly 10 milliseconds. When I set it to 44.1 kHz, I get very slightly over 10 milliseconds: 10.1587 milliseconds, to be precise. Why?
A: Alignment.

A while back I talked a bit about the WASAPI exclusive-mode alignment dance. Some audio drivers have a requirement that they deal in buffer sizes which are multiples of a certain byte size - for example, a common alignment restriction for HD Audio hardware is 128 bytes.

A more general audio requirement is that buffer sizes be a multiple of the size of a PCM audio frame.
For example, suppose the audio format of a stream is stereo 16-bit integer. A single PCM audio frame will be 2 * 2 = 4 bytes. The first two bytes will be the 16-bit signed integer with the sample value for the left channel; the last two bytes will be the right channel.
As another example, suppose the audio format of a stream is 5.1 32-bit floating point. A single PCM audio frame will be 6 * 4 = 24 bytes. Each of the six channels is a four-byte IEEE floating-point value; the channel order in Windows will be {Left, Right, Center, Low-Frequency Effects, Side Left, Side Right}.
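
The stereo 16-bit frame layout described above can be sketched in a few lines. This example assumes the samples are stored little-endian, as in WAVE PCM data; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// One stereo 16-bit PCM frame: 2 channels * 2 bytes = 4 bytes.
struct StereoFrame16 {
    int16_t left;
    int16_t right;
};

// Decodes a 4-byte frame: the first two bytes are the left channel,
// the last two bytes are the right channel (little-endian assumed).
StereoFrame16 decode_stereo16_frame(const uint8_t* bytes) {
    assert(bytes != nullptr);
    auto read_le16 = [](const uint8_t* p) {
        return static_cast<int16_t>(
            static_cast<uint16_t>(p[0] | (p[1] << 8)));
    };
    return StereoFrame16{read_le16(bytes), read_le16(bytes + 2)};
}
```

For instance, the bytes 56 00 E4 FB decode to a left sample of 86 and a right sample of -1052.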

The audio engine tries to run at as close to a 10 millisecond cadence as possible, subject to the two restrictions above. Given a "desired minimum interval" (in milliseconds), and a streaming format, and an "alignment requirement" in bytes, you can calculate the closest achievable interval (without going under the desired interval) as follows:

Note: this only works for uncompressed formats
aligned_buffer(desired_milliseconds, format, alignment_bytes)
    desired_frames = nearest_integer(desired_milliseconds / 1000.0 * format.nSamplesPerSec)
    alignment_frames = least_common_multiple(alignment_bytes, format.nBlockAlign) / format.nBlockAlign
    actual_frames = ceiling(desired_frames / alignment_frames) * alignment_frames
    actual_milliseconds = actual_frames / format.nSamplesPerSec * 1000.0
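
The pseudocode above translates fairly directly into C++. The following is a sketch under a few assumptions: `PcmFormat` is a cut-down stand-in for WAVEFORMATEX holding only the two fields the calculation needs, and all function names are invented for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Stand-in for the two WAVEFORMATEX fields used here (an assumption).
struct PcmFormat {
    uint32_t samples_per_sec;  // e.g. 44100 or 48000
    uint32_t block_align;      // bytes per PCM frame
};

uint32_t greatest_common_divisor(uint32_t a, uint32_t b) {
    while (b != 0) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Smallest frame count that is at least the desired duration and whose
// byte size is a multiple of alignment_bytes. Uncompressed formats only.
uint32_t aligned_buffer_frames(double desired_ms, const PcmFormat& fmt,
                               uint32_t alignment_bytes) {
    assert(fmt.block_align > 0 && alignment_bytes > 0);
    uint32_t desired_frames = static_cast<uint32_t>(
        std::lround(desired_ms / 1000.0 * fmt.samples_per_sec));
    uint32_t lcm = alignment_bytes /
        greatest_common_divisor(alignment_bytes, fmt.block_align) *
        fmt.block_align;
    uint32_t alignment_frames = lcm / fmt.block_align;
    // Integer ceiling of desired_frames / alignment_frames.
    uint32_t chunks = (desired_frames + alignment_frames - 1) / alignment_frames;
    return chunks * alignment_frames;
}

double frames_to_milliseconds(uint32_t frames, const PcmFormat& fmt) {
    return 1000.0 * frames / fmt.samples_per_sec;
}
```

For 44.1 kHz stereo 16-bit audio (4 bytes per frame) with a 128-byte alignment and a 10 millisecond target, `aligned_buffer_frames` returns 448 frames, which `frames_to_milliseconds` reports as roughly 10.16 milliseconds.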

Here's a table of the actual buffer size (in frames and milliseconds), given a few typical inputs:

| Desired (milliseconds) | Format | Alignment (bytes) | Desired frames | Alignment (frames) | Actual (frames) | Actual (milliseconds) |
|---|---|---|---|---|---|---|
| 10 | 44.1 kHz stereo 16-bit integer | 128 | 441 | 32 | 448 | 10.16 |
| 10 | 48 kHz stereo 16-bit integer | 128 | 480 | 32 | 480 | 10 |
| 10 | 44.1 kHz 5.1 16-bit integer | 128 | 441 | 32 | 448 | 10.16 |
| 10 | 48 kHz 5.1 16-bit integer | 128 | 480 | 32 | 480 | 10 |
| 10 | 44.1 kHz 5.1 24-bit integer | 128 | 441 | 64 | 448 | 10.16 |
| 10 | 48 kHz 5.1 24-bit integer | 128 | 480 | 64 | 512 | 10.67 |

So to be really precise, the buffer size at 44.1 kHz is 448 frames, which is actually 640/63 ≈ 10.158730 milliseconds.

• #### An attempt to explain the twin prime conjecture to a five-year-old

Back in April, Zhang Yitang came up with a result that is a major step toward proving the twin prime conjecture that there are infinitely many primes p for which p + 2 is also prime.
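
To make the statement concrete, here is a small sketch that lists the twin prime pairs up to a limit by trial division. It only illustrates the definition; it says nothing about whether the list goes on forever, which is the open question. The function names are made up for this example.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Trial division; fine for the small limits used here.
bool is_prime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; ++d) {
        if (n % d == 0) return false;
    }
    return true;
}

// Returns every pair (p, p + 2) with both members prime and p + 2 <= limit.
std::vector<std::pair<int, int>> twin_primes_up_to(int limit) {
    assert(limit >= 0);
    std::vector<std::pair<int, int>> pairs;
    for (int p = 2; p + 2 <= limit; ++p) {
        if (is_prime(p) && is_prime(p + 2)) {
            pairs.emplace_back(p, p + 2);
        }
    }
    return pairs;
}
```

Up to 20, this finds (3, 5), (5, 7), (11, 13), and (17, 19).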

In a reddit.com/r/math thread on the subject, I made the following comment as an attempt to explain the twin prime conjecture to a five-year-old:

ELI5 attempt at the twin prime conjecture

If you have 100 cookies, you could share them evenly:

• by yourself (you get all the cookies)
• with two people (each person gets 50 cookies)
• with four people (each person gets 25 cookies)
• with five people (each person gets 20 cookies)
• with ten people (each person gets ten cookies)
• with 20 people (each person gets five cookies)
• with 25 people (each person gets four cookies)
• with 50 people (each person gets two cookies)
• with 100 people (each person gets one cookie)

If you're the only person at your party, it's a sad party.

If everyone at the party gets only one cookie, it's a sad party.

If someone gets more than someone else, it's a sad party.

You don't want your party to be sad, so you have to be careful to have the right number of people to share your cookies.

If you have two cookies, or three, or five, or seven, or eleven, then it's not possible to have a happy party. There's no "right number of people."

People used to wonder whether you could be sure to have a happy party if you just had enough cookies. A famous person named Euclid figured out that, no matter how many cookies you had, even if it was, like, more than a million, you might be unlucky and have a sad number of cookies.

If it's a birthday party, the birthday kid's mom might give the birthday kid an extra cookie. (Or they might get something else instead.) That would be OK.

If it's a birthday party, then, yes, you can be sure to have a happy party if you just had enough cookies. In fact, even three cookies would be enough; you could have the birthday kid, and one friend; they would each have one cookie, and the birthday kid would get the extra one.

But Sam and Jane have a problem. They're twins, and they always have the same birthday. One year they had 13 cookies, and it was a big problem. 13 is a sad number. Even if they both had an extra cookie, that would leave 11, and that is still a sad number.

(If you allow the birthday kid to have two extra cookies, that would leave nine; they could invite one more person, give everyone three cookies, and then Sam and Jane could each have two extras. But this is not a happy party because the guests will get upset that the birthday kids got two extra cookies. I mean, come on!)

Sam and Jane wondered whether they could be sure to have a happy party if they just had enough cookies.

So they asked their mom, who is, like, super smart.

But even she didn't know.

In fact, no-one knows. They don't think so. But they're not, like, super-sure.

• #### Grabbing the output of the Microsoft Speech API text-to-speech engine as audio data

A while ago I wrote a post on Implementing a "say" command using ISpVoice from the Microsoft Speech API which showed how to use Speech API to do text-to-speech, but was limited to playing the generated audio out of the default audio device.

Recently on the Windows Pro Audio forums, user falven asked a question about how to grab the output of the text-to-speech engine as a stream for further processing.

Here's how to do it.

The key part is to use ISpStream::BindToFile to save the audio data to a .wav file, or ISpStream::SetBaseStream to save to a given IStream. Then call ISpVoice::SetOutput with the ISpStream, prior to calling ISpVoice::Speak.

ISpStream *pSpStream = nullptr;
hr = CoCreateInstance(
    CLSID_SpStream, nullptr, CLSCTX_ALL,
    __uuidof(ISpStream),
    (void**)&pSpStream
);
if (FAILED(hr)) {
    ERR(L"CoCreateInstance(ISpStream) failed: hr = 0x%08x", hr);
    return -__LINE__;
}
ReleaseOnExit rSpStream(pSpStream);

if (File == where) {
    // write the generated audio to a .wav file
    hr = pSpStream->BindToFile(
        file,
        SPFM_CREATE_ALWAYS,
        &SPDFID_WaveFormatEx,
        &fmt,
        0
    );
    if (FAILED(hr)) {
        ERR(L"ISpStream::BindToFile failed: hr = 0x%08x", hr);
        return -__LINE__;
    }
} else {
    // capture the generated audio in an in-memory stream
    pStream = SHCreateMemStream(nullptr, 0);
    if (nullptr == pStream) {
        ERR(L"SHCreateMemStream failed");
        return -__LINE__;
    }

    hr = pSpStream->SetBaseStream(
        pStream,
        SPDFID_WaveFormatEx,
        &fmt
    );
    if (FAILED(hr)) {
        ERR(L"ISpStream::SetBaseStream failed: hr = 0x%08x", hr);
        return -__LINE__;
    }
}

// redirect the voice's output before calling ISpVoice::Speak
hr = pSpVoice->SetOutput(pSpStream, TRUE);
if (FAILED(hr)) {
    ERR(L"ISpVoice::SetOutput failed: hr = 0x%08x", hr);
    return -__LINE__;
}

Updated source and binaries attached.

Usage:

>say.exe
say "phrase" [--file <filename> | --stream]
runs phrase through text-to-speech engine
if --file is specified, writes to .wav file
if --stream is specified, captures to a stream
if neither is specified, plays to default output

Here's how to generate a .wav file (uh.wav attached)

>say.exe "uh" --file uh.wav
Stream is 1

And here's how to generate an output stream. The app consumes this and prints the INT16 sample values to the console. uh.txt attached.

>say.exe "uh" --stream
Stream is 1
0        0;        0        0;        0        0;        0        0
0        0;        0        0;        0        0;        0        0
...
86       86;    -1052    -1052;    -2839    -2839;    -3774    -3774
-4199    -4199;    -4581    -4581;    -4284    -4284;    -3640    -3640
-3100    -3100;    -2011    -2011;     -393     -393;      533      533
...

EDIT September 22 2015: moved source to github https://github.com/mvaneerde/blog/tree/master/say

• #### How to dump Speech API object properties

Stamatis Pap asked in a forum thread how to use a Speech API ISpVoice with a non-default audio device. This MSDN article shows how to use SpEnumTokens to list all the currently active audio outputs, but the number and order of audio outputs is subject to change as things come and go, or as the default audio device changes.

I spent some time poking around the Speech API documentation and discovered that each audio output object has a DeviceId string value which is the WASAPI endpoint ID; this is the way to recognize a given audio output rather than relying on enumeration order.

As a side effect of figuring this out, I created a command-line tool to dump all the speech objects and all of their properties.

Source and binaries attached.

Pseudocode:

for each object category in
    (audio outputs; audio inputs; voices; recognizers; etc.)
    SpEnumTokens(category)
    ISpEnumTokens::GetCount();

    for each token
        ISpEnumTokens::Next(1);

        SpGetDescription(ISpObjectToken);
        Log the description

        the ISpObjectToken is also an ISpDataKey
        the ISpDataKey may also contain subkeys
        log all subkeys and their values recursively
            using ISpDataKey::EnumKeys and ISpDataKey::OpenKey
        for each subkey including this one
            log all values in the ISpDataKey
                using ISpDataKey::EnumValues and ISpDataKey::GetStringValue

Here's the output on my system.  Note the audio output has a DeviceId string value which matches the WASAPI endpoint ID.

>speech-attributes.exe

-- SPCAT_AUDIOOUT --
#1: [[Speakers] ([High Definition Audio Device])]
Attributes
Vendor = Microsoft
Technology = MMSys
(default) = [[Speakers] ([High Definition Audio Device])]
CLSID = {A8C680EB-3D32-11D2-9EE7-00C04F797396}
DeviceName = [[Speakers] ([High Definition Audio Device])]
DeviceId = {0.0.0.00000000}.{c2cbdacb-a70d-4629-8368-542a00f5a4b0}

-- SPCAT_AUDIOIN --

-- SPCAT_VOICES --
#1: Microsoft Zira Desktop - English (United States)
Attributes
Version = 10.4
Language = 409
Gender = Female
SharedPronunciation =
Name = Microsoft Zira Desktop
Vendor = Microsoft
(default) = Microsoft Zira Desktop - English (United States)
LangDataPath = C:\Windows\Speech\Engines\TTS\en-US\MSTTSLocEnUS.dat
VoicePath = C:\Windows\Speech\Engines\TTS\en-US\M1033ZIR
409 = Microsoft Zira Desktop - English (United States)
CLSID = {C64501F6-E6E6-451f-A150-25D0839BC510}

-- SPCAT_RECOGNIZERS --

-- SPCAT_APPLEXICONS --

-- SPCAT_PHONECONVERTERS --
#1: Simplified Chinese Phone Converter
Attributes
Language = 804
(default) = Simplified Chinese Phone Converter
PhoneMap = (lengthy value redacted)
CLSID = {9185F743-1143-4C28-86B5-BFF14F20E5C8}
#2: English Phone Converter
Attributes
Language = 409
(default) = English Phone Converter
PhoneMap = (lengthy value redacted)
CLSID = {9185F743-1143-4C28-86B5-BFF14F20E5C8}
#3: French Phone Converter
Attributes
Language = 40C
(default) = French Phone Converter
PhoneMap = (lengthy value redacted)
CLSID = {9185F743-1143-4C28-86B5-BFF14F20E5C8}
#4: German Phone Converter
Attributes
Language = 407
(default) = German Phone Converter
PhoneMap = (lengthy value redacted)
CLSID = {9185F743-1143-4C28-86B5-BFF14F20E5C8}
#5: Japanese Phone Converter
Attributes
Language = 411
NumericPhones =
NoDelimiter =
(default) = Japanese Phone Converter
PhoneMap = (lengthy value redacted)
CLSID = {9185F743-1143-4C28-86B5-BFF14F20E5C8}
#6: Spanish Phone Converter
Attributes
Language = 40A;C0A
(default) = Spanish Phone Converter
PhoneMap = (lengthy value redacted)
CLSID = {9185F743-1143-4C28-86B5-BFF14F20E5C8}
#7: Traditional Chinese Phone Converter
Attributes
Language = 404
NumericPhones =
NoDelimiter =
(default) = Traditional Chinese Phone Converter
PhoneMap = (lengthy value redacted)
CLSID = {9185F743-1143-4C28-86B5-BFF14F20E5C8}
#8: Universal Phone Converter
Attributes
Language = (lengthy value redacted)
(default) = Universal Phone Converter
PhoneMap = (lengthy value redacted)
CLSID = {9185F743-1143-4C28-86B5-BFF14F20E5C8}

-- SPCAT_RECOPROFILES --
None found.

EDIT September 22 2015: moved source to github https://github.com/mvaneerde/blog/tree/master/speech-attributes

• #### Generating sample first names

I had a need to write a script that would give me a random first name.  I grabbed the top 200 first names for baby boys in the US from 2000-2009, and the same list for baby girls:

Boys      Girls
Jacob     Emily
Michael   Madison
...       ...

My initial implementation just printed out the name, but I quickly realized I needed to print out the gender if I wanted to talk about what the (fictitious) person did.  So I updated it to print out the gender as well.

In the course of this I realized that some names appeared on both lists.  In particular they are:

• Alexis
• Angel
• Jordan
• Peyton
• Riley
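
One way to find such overlaps is a set intersection over the two lists. The sketch below is in C++ rather than the Perl of the actual script, and NamesOnBothLists is a hypothetical helper name; it assumes each list is simply a collection of name strings.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Returns the names that appear on both lists, in sorted order.
// The lists are taken by value so they can be sorted locally.
std::vector<std::string> NamesOnBothLists(std::vector<std::string> boys,
                                          std::vector<std::string> girls) {
    std::sort(boys.begin(), boys.end());
    std::sort(girls.begin(), girls.end());

    std::vector<std::string> both;
    std::set_intersection(boys.begin(), boys.end(),
                          girls.begin(), girls.end(),
                          std::back_inserter(both));
    return both;
}
```

A name generator that also prints a gender has to decide what to do with these names, for example by picking the list first and the name second.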

The script is called like this:

>perl -w name.pl
Wesley (male)

EDIT 2015-10-31: moved source to https://github.com/mvaneerde/blog/blob/master/scripts/name.pl

• #### Programmatically adding a folder to a shell library (e.g., the Music library)

I wrote a selfhost tool which allows me to add a folder (for example, C:\music) to a shell library (for example, the Music library.)

This was before I found out about the shlib shell library sample which Raymond Chen blogged about.  If you're looking for a sample on how to manipulate shell libraries, prefer that one to this.

Pseudocode:

CoInitialize
pShellLibrary->Commit()
CoUninitialize

Usage:

>shelllibrary
<library> must be one of:
documents
music
pictures
videos
recorded tv

Source and binaries attached.

EDIT September 22 2015: moved source to github https://github.com/mvaneerde/blog/tree/master/shelllibrary

• #### Changing the desktop wallpaper using IDesktopWallpaper

About a year ago I wrote about how to change the desktop wallpaper using SystemParametersInfo(SPI_SETDESKWALLPAPER).

Windows 8 desktop apps (not Store apps) can use the new IDesktopWallpaper API to get finer-grained control.  So I wrote an app which uses the new API, though I just set the background on all monitors to the same image path, and I don't exercise any of the advanced features of the API.

Pseudocode:

CoInitialize
CoCreateInstance(DesktopWallpaper)
pDesktopWallpaper->SetWallpaper(NULL, full-path-to-image-file)
pDesktopWallpaper->Release()
CoUninitialize

Usage:

>desktopwallpaper.exe "%userprofile%\pictures\theda-bara.bmp"
Setting the desktop wallpaper to C:\Users\MatEer\pictures\theda-bara.bmp succeeded.

Source and binaries attached.

EDIT September 22 2015: moved source to github https://github.com/mvaneerde/blog/tree/master/desktopwallpaper

• #### Grabbing large amounts of text from STDIN in O(n) time

Last time I blogged about an O(n log n) solution to finding the longest duplicated substring in a given piece of text; I have since found an O(n) algorithm, which I linked to in the comments.

But my blog post used an O(n²) algorithm to read the text from STDIN! It looked something like this:

while (!done) {
    grab 2 KB of text
    allocate a new buffer which is 2 KB bigger
    copy the old text and the new text together into the new buffer
    free the old buffer
}

There are two better algorithms:

while (!done) {
    grab an amount of text equal to the amount we've grabbed so far
    allocate a new buffer which is twice as large as the last buffer
    copy the old text and the new text together into the new buffer
    free the old buffer
}

And:

while (!done) {
    grab 2 KB of text
    add this to the end of a linked list of text chunks
}

allocate a buffer whose size is the total size of all the chunks added together
walk the linked list and copy the text of each chunk into the buffer

Both "better" algorithms are O(n) but the latter wastes less space.
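
A rough count of characters copied shows where the difference comes from. This is a back-of-the-envelope sketch, assuming n characters of input in total, a fixed chunk size c, and n > c; the symbols n, c, k and m are introduced here for illustration.

```latex
% Fixed-size growth: pass k copies the (k-1)c old characters plus c new ones.
\sum_{k=1}^{n/c} kc \;=\; c \cdot \frac{(n/c)(n/c + 1)}{2} \;=\; \frac{n(n + c)}{2c} \;=\; O(n^2)

% Doubling: buffer sizes are c, 2c, 4c, \dots, c\,2^m, where c\,2^{m-1} < n \le c\,2^m.
\sum_{k=0}^{m} c\,2^k \;=\; c\,(2^{m+1} - 1) \;<\; 2 \cdot c\,2^m \;<\; 4n \;=\; O(n)

% Linked list of chunks: each character is written once into a chunk
% and copied once into the final buffer.
n + n \;=\; 2n \;=\; O(n)
```

The doubling buffer can end up nearly twice as large as the text it holds, which is the wasted space mentioned above.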

Here's the improved code:

struct Chunk {
    WCHAR text[1024];
    Chunk *next;

    Chunk() : next(nullptr) { text[0] = L'\0'; }
};

// frees every chunk in the linked list when it goes out of scope
class DeleteChunksOnExit {
public:
    DeleteChunksOnExit() : m_p(nullptr) {}
    ~DeleteChunksOnExit() {
        Chunk *here = m_p;
        while (here) {
            Chunk *next = here->next;
            delete here;
            here = next;
        }
    }
    void Set(Chunk *p) { m_p = p; }

private:
    Chunk *m_p;
};

...

    Chunk *head = nullptr;
    Chunk *tail = nullptr;

    DeleteChunksOnExit dcoe;

    size_t total_length = 0;

    bool done = false;
    while (!done) {
        Chunk *buffer = new Chunk();
        if (nullptr == buffer) {
            LOG(L"Could not allocate memory for buffer");
            return nullptr;
        }

        if (nullptr == head) {
            // this runs on the first pass only
            head = buffer;
            tail = buffer;
            dcoe.Set(buffer);
        } else {
            tail->next = buffer;
            tail = buffer;
        }

        if (fgetws(buffer->text, ARRAYSIZE(buffer->text), stdin)) {
            total_length += wcslen(buffer->text);
        } else if (feof(stdin)) {
            done = true;
        } else {
            return nullptr;
        }
    }

    // gather all the allocations into a single string
    size_t size = total_length + 1;
    WCHAR *text = new WCHAR[size];
    if (nullptr == text) {
        LOG(L"Could not allocate memory for text");
        return nullptr;
    }
    DeleteArrayOnExit<WCHAR> deleteText(text);

    WCHAR *temp = text;
    for (Chunk *here = head; here; here = here->next) {
        if (wcscpy_s(temp, size, here->text)) {
            LOG(L"wcscpy_s returned an error");
            return nullptr;
        }

        size_t len = wcslen(here->text);
        temp += len;
        size -= len;
    }

    deleteText.Cancel();
    return text;
}

• #### Finding the longest substring which occurs twice in a given string

I'm reading Jon Bentley's Programming Pearls and one of the interesting exercises was to find the longest substring which occurs twice in a given string of length n.

There's a naïve solution where you look at every pair of (distinct) indexes (i, j), and calculate the length of the common prefix of the substrings starting at those locations; this is O(n²) assuming that the length of the eventual maximum substring is O(1) (that is, << n.)

Jon shows that there is an O(n log n) algorithm, which is a significant savings if n is large (e.g., War and Peace.)  This involves building an array of all substrings, then sorting them lexically, then walking the sorted array; the trick is that now we only need to check pairs of indexes that correspond to adjacent entries in the array.  That step can be done in O(n) time; the O(n log n) comes from the sorting step.

I wrote up a quick app that implements his suggested algorithm.  Source follows.

// main.cpp

#include <windows.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <search.h>

#define LOG(format, ...) wprintf(format L"\n", __VA_ARGS__)

template<typename T>
class DeleteArrayOnExit {
public:
    DeleteArrayOnExit(T *p) : m_p(p), m_canceled(false) {}
    ~DeleteArrayOnExit() { if (!m_canceled) { delete [] m_p; } }
    void Cancel() { m_canceled = true; }
    void Swap(T *p) { m_p = p; }
private:
    T *m_p;
    bool m_canceled;
};

// reads all of STDIN into a single string
// (the name text_from_stdin is assumed)
// caller must delete [] the buffer when done with it
LPWSTR text_from_stdin();

// qsort comparison function for an array of string pointers
int __cdecl pwcscmp(const void *pstra, const void *pstrb) {
    return wcscmp(*(LPWSTR*)pstra, *(LPWSTR*)pstrb);
}

// returns length of longest common prefix
// don't include the null terminator if both strings are identical
// e.g., comlen(banana, bananas) == comlen(banana, banana) == 6
int comlen(LPCWSTR a, LPCWSTR b) {
    int i = 0;

    // keep going while the strings are the same and not ended
    while (a[i] && (a[i] == b[i])) { i++; }

    return i;
}

int __cdecl wmain() {
    LPWSTR text = text_from_stdin();
    if (nullptr == text) {
        return -__LINE__;
    }
    DeleteArrayOnExit<WCHAR> deleteText(text);

    // one suffix per character, plus the empty suffix at the null terminator
    size_t size = wcslen(text) + 1;
    LPWSTR *suffixes = new LPWSTR[size];
    if (nullptr == suffixes) {
        LOG(L"Could not allocate memory for suffixes");
        return -__LINE__;
    }
    DeleteArrayOnExit<LPWSTR> deleteSuffixes(suffixes);

    for (size_t i = 0; i < size; i++) {
        suffixes[i] = &text[i];
    }

    qsort(suffixes, size, sizeof(LPWSTR), pwcscmp);

    // find the longest common adjacent pair
    LPWSTR szMax = suffixes[0];
    size_t lenMax = 0;
    for (size_t i = 0; i < size - 1; i++) {
        size_t len = comlen(suffixes[i], suffixes[i + 1]);
        if (len > lenMax) {
            lenMax = len;
            szMax = suffixes[i];
        }
    }

    WCHAR *substring = new WCHAR[lenMax + 1];
    if (nullptr == substring) {
        LOG(L"Could not allocate memory for substring");
        return -__LINE__;
    }
    DeleteArrayOnExit<WCHAR> deleteSubstring(substring);
    if (0 != wcsncpy_s(substring, lenMax + 1, szMax, lenMax)) {
        LOG(L"wcsncpy_s failed");
        return -__LINE__;
    }
    substring[lenMax] = L'\0';

    // intentionally not using LOG to avoid trailing newline
    wprintf(L"%s", substring);

    return 0;
}

// every chunk read reallocates and recopies all the text read so far
LPWSTR text_from_stdin() {
    WCHAR *text = new WCHAR[1];
    if (nullptr == text) {
        LOG(L"Could not allocate memory for text");
        return nullptr;
    }
    DeleteArrayOnExit<WCHAR> deleteText(text);
    text[0] = L'\0';

    // read a 2 KB chunk
    const size_t buffer_size = 1024;
    WCHAR *buffer = new WCHAR[buffer_size];
    if (nullptr == buffer) {
        LOG(L"Could not allocate memory for buffer");
        return nullptr;
    }
    DeleteArrayOnExit<WCHAR> deleteBuffer(buffer);

    bool done = false;
    do {
        if (fgetws(buffer, buffer_size, stdin)) {
            size_t size = wcslen(text) + wcslen(buffer) + 1;
            WCHAR *new_text = new WCHAR[size];
            if (nullptr == new_text) {
                LOG(L"Could not allocate memory for new text");
                return nullptr;
            }
            DeleteArrayOnExit<WCHAR> deleteNewText(new_text);

            WCHAR *dest = new_text;

            if (0 != wcscpy_s(dest, size, text)) {
                LOG(L"wcscpy_s failed");
                return nullptr;
            }
            dest += wcslen(text);
            size -= wcslen(text);

            if (0 != wcscpy_s(dest, size, buffer)) {
                LOG(L"wcscpy_s failed");
                return nullptr;
            }

            // that should do it for copying
            // now swap new_text => text

            delete [] text;
            text = new_text;

            deleteText.Swap(new_text);
            deleteNewText.Cancel();

        } else if (feof(stdin)) {
            done = true;
        } else {
            return nullptr;
        }
    } while (!done);

    deleteText.Cancel();
    return text;
}
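
The same sort-the-suffixes idea can also be sketched without any Windows-specific types. This is a minimal portable version, assuming the whole text is already in a std::string; LongestRepeatedSubstring is a hypothetical name, and it sorts suffix start offsets rather than pointers. Like the code above, it allows the two occurrences to overlap.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns the longest substring that occurs at least twice in text.
std::string LongestRepeatedSubstring(const std::string &text) {
    // one entry per suffix, identified by its starting offset
    std::vector<std::size_t> starts(text.size());
    for (std::size_t i = 0; i < starts.size(); i++) {
        starts[i] = i;
    }

    // sort the suffixes lexically; this is the O(n log n) comparisons step
    std::sort(starts.begin(), starts.end(), [&text](std::size_t a, std::size_t b) {
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });

    // only adjacent suffixes in sorted order need to be compared
    std::size_t best_start = 0;
    std::size_t best_len = 0;
    for (std::size_t i = 0; i + 1 < starts.size(); i++) {
        const std::size_t a = starts[i];
        const std::size_t b = starts[i + 1];

        std::size_t len = 0;
        while (a + len < text.size() && b + len < text.size() &&
               text[a + len] == text[b + len]) {
            len++;
        }

        if (len > best_len) {
            best_len = len;
            best_start = a;
        }
    }

    return text.substr(best_start, best_len);
}
```

For "banana" the sorted suffixes are a, ana, anana, banana, na, nana; the adjacent pair ana/anana shares the longest prefix, so the result is "ana".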

• #### Enumerating mixer devices, mixer lines, and mixer controls

The WinMM multimedia APIs include an API for enumerating and controlling all the paths through the audio device; things like bass boost, treble control, pass-through audio from your CD player to your headphones, etc.  This is called the "mixer" API and is the forerunner of the IDeviceTopology API.

I wrote a quick app to enumerate all the mixer devices on the system; for each mixer device, enumerate each mixer line (that is, each source and destination); for each mixer line, enumerate all the controls (volume, mute, equalization, etc.); and for each control, query the associated text (if any) and the current value.

Source and binaries attached.

Pseudocode:

mixerGetNumDevs()
for each mixer device
    mixerGetDevCaps(dev)
    for each destination (line) on the device
        mixerGetLineInfo(dest)
        mixerGetLineControls(dest)
        for each control on the line
            if the control supports per-item description
                mixerGetControlDetails(control, MIXER_GETCONTROLDETAILSF_LISTTEXT)
                log the per-item description
            mixerGetControlDetails(control, MIXER_GETCONTROLDETAILSF_VALUE)
            log the value(s)

Usage:

>mixerenum.exe
Mixer devices: 5
Device ID: 0
Manufacturer identifier: 1
Product identifier: 104
Driver version: 6.2
Support: 0x0
Destinations: 1
-- Destination 0: Master Volume --
Destination: 0
Source: -1
Line ID: 0xffff0000
Status: MIXERLINE_LINEF_ACTIVE (1)
User: 0x00000000
Channels: 1
Connections: 2
Controls: 2
Short name: Volume
Long name: Master Volume
-- Target:  --
Type: MIXERLINE_TARGETTYPE_UNDEFINED (0)
Device ID: 0
Manufacturer identifier: 65535
Product identifier: 65535
Driver version: 0.0
Product name:
-- Control 1: Mute --
Type: MIXERCONTROL_CONTROLTYPE_MUTE (0x20010002)
Status: MIXERCONTROL_CONTROLF_UNIFORM (0x1)
Item count: 0
Short name: Mute
Long name: Mute
-- Values --
FALSE
-- Control 2: Volume --
Type: MIXERCONTROL_CONTROLTYPE_VOLUME (0x50030001)
Status: MIXERCONTROL_CONTROLF_UNIFORM (0x1)
Item count: 0
Short name: Volume
Long name: Volume
-- Values --
0xffff on a scale of 0x0 to 0xffff
-- 1: HDMI Audio (Contoso --
Device ID: 1
Manufacturer identifier: 1
Product identifier: 104
Driver version: 6.2
Product name: HDMI Audio (Contoso
Support: 0x0
Destinations: 1
-- Destination 0: Master Volume --
Destination: 0
Source: -1
Line ID: 0xffff0000
Status: MIXERLINE_LINEF_ACTIVE (1)
User: 0x00000000
Component Type: MIXERLINE_COMPONENTTYPE_DST_DIGITAL (1)
Channels: 1
Connections: 2
Controls: 2
Short name: Volume
Long name: Master Volume
-- Target:  --
Type: MIXERLINE_TARGETTYPE_UNDEFINED (0)
Device ID: 0
Manufacturer identifier: 65535
Product identifier: 65535
Driver version: 0.0
Product name:
-- Control 1: Mute --
Type: MIXERCONTROL_CONTROLTYPE_MUTE (0x20010002)
Status: MIXERCONTROL_CONTROLF_UNIFORM (0x1)
Item count: 0
Short name: Mute
Long name: Mute
-- Values --
FALSE
-- Control 2: Volume --
Type: MIXERCONTROL_CONTROLTYPE_VOLUME (0x50030001)
Status: MIXERCONTROL_CONTROLF_UNIFORM (0x1)
Item count: 0
Short name: Volume
Long name: Volume
-- Values --
0xffff on a scale of 0x0 to 0xffff
-- 2: Speakers (Contoso --
Device ID: 2
Manufacturer identifier: 1
Product identifier: 104
Driver version: 6.2
Product name: Speakers (Contoso
Support: 0x0
Destinations: 1
-- Destination 0: Master Volume --
Destination: 0
Source: -1
Line ID: 0xffff0000
Status: MIXERLINE_LINEF_ACTIVE (1)
User: 0x00000000
Component Type: MIXERLINE_COMPONENTTYPE_DST_SPEAKERS (4)
Channels: 1
Connections: 2
Controls: 2
Short name: Volume
Long name: Master Volume
-- Target:  --
Type: MIXERLINE_TARGETTYPE_UNDEFINED (0)
Device ID: 0
Manufacturer identifier: 65535
Product identifier: 65535
Driver version: 0.0
Product name:
-- Control 1: Mute --
Type: MIXERCONTROL_CONTROLTYPE_MUTE (0x20010002)
Status: MIXERCONTROL_CONTROLF_UNIFORM (0x1)
Item count: 0
Short name: Mute
Long name: Mute
-- Values --
FALSE
-- Control 2: Volume --
Type: MIXERCONTROL_CONTROLTYPE_VOLUME (0x50030001)
Status: MIXERCONTROL_CONTROLF_UNIFORM (0x1)
Item count: 0
Short name: Volume
Long name: Volume
-- Values --
0xffff on a scale of 0x0 to 0xffff
Device ID: 3
Manufacturer identifier: 1
Product identifier: 104
Driver version: 6.2
Support: 0x0
Destinations: 1
-- Destination 0: Master Volume --
Destination: 0
Source: -1
Line ID: 0xffff0000
Status: MIXERLINE_LINEF_ACTIVE (1)
User: 0x00000000
Component Type: MIXERLINE_COMPONENTTYPE_DST_WAVEIN (7)
Channels: 1
Connections: 1
Controls: 2
Short name: Volume
Long name: Master Volume
Type: MIXERLINE_TARGETTYPE_WAVEIN (2)
Device ID: 0
Manufacturer identifier: 1
Product identifier: 101
Driver version: 6.2
-- Control 1: Mute --
Type: MIXERCONTROL_CONTROLTYPE_MUTE (0x20010002)
Status: MIXERCONTROL_CONTROLF_UNIFORM (0x1)
Item count: 0
Short name: Mute
Long name: Mute
-- Values --
FALSE
-- Control 2: Volume --
Type: MIXERCONTROL_CONTROLTYPE_VOLUME (0x50030001)
Status: MIXERCONTROL_CONTROLF_UNIFORM (0x1)
Item count: 0
Short name: Volume
Long name: Volume
-- Values --
0xf332 on a scale of 0x0 to 0xffff
-- 4: Microphone (Contoso --
Device ID: 4
Manufacturer identifier: 1
Product identifier: 104
Driver version: 6.2
Product name: Microphone (Contoso
Support: 0x0
Destinations: 1
-- Destination 0: Master Volume --
Destination: 0
Source: -1
Line ID: 0xffff0000
Status: MIXERLINE_LINEF_ACTIVE (1)
User: 0x00000000
Component Type: MIXERLINE_COMPONENTTYPE_DST_WAVEIN (7)
Channels: 1
Connections: 1
Controls: 2
Short name: Volume
Long name: Master Volume
-- Target: Microphone (Contoso --
Type: MIXERLINE_TARGETTYPE_WAVEIN (2)
Device ID: 1
Manufacturer identifier: 1
Product identifier: 101
Driver version: 6.2
Product name: Microphone (Contoso
-- Control 1: Mute --
Type: MIXERCONTROL_CONTROLTYPE_MUTE (0x20010002)
Status: MIXERCONTROL_CONTROLF_UNIFORM (0x1)
Item count: 0
Short name: Mute
Long name: Mute
-- Values --
FALSE
-- Control 2: Volume --
Type: MIXERCONTROL_CONTROLTYPE_VOLUME (0x50030001)
Status: MIXERCONTROL_CONTROLF_UNIFORM (0x1)
Item count: 0
Short name: Volume
Long name: Volume
-- Values --
0xf332 on a scale of 0x0 to 0xffff

EDIT September 22 2015: moved source to github https://github.com/mvaneerde/blog/tree/master/mixerenum

• #### Enumerating MIDI devices

In addition to audio playback and recording, Windows Multimedia (WinMM) provides a Musical Instrument Digital Interface (MIDI) API.

Here's how to make a list of all the MIDI devices on the system, their capabilities, and the hardware device interface associated with each of them.

Source and binaries attached.

Pseudocode:

midiInGetNumDevs or midiOutGetNumDevs
for each device
    midiInGetDevCaps or midiOutGetDevCaps
    log device capabilities
    midiInMessage or midiOutMessage
        with DRV_QUERYDEVICEINTERFACESIZE
        and DRV_QUERYDEVICEINTERFACE
    log the device interface

Output:

>midienum.exe
midiIn devices: 1
-- 0: USB2.0 MIDI Device --
Device ID: 0
Manufacturer identifier: 65535
Product identifier: 65535
Driver version: 1.6
Product name: USB2.0 MIDI Device
Support: 0x0
Device interface: "\\?\usb#vid_xxxx&pid_yyyy&..."
midiOut devices: 2
-- 0: Microsoft GS Wavetable Synth --
Device ID: 0
Manufacturer identifier: 1
Product identifier: 27
Driver version: 1.0
Product name: Microsoft GS Wavetable Synth
Technology: 7 (MOD_SWSYNTH)
Voices: 32
Notes: 32
Support: 0x1
MIDICAPS_VOLUME
Device interface: ""
-- 1: USB2.0 MIDI Device --
Device ID: 1
Manufacturer identifier: 65535
Product identifier: 65535
Driver version: 1.6
Product name: USB2.0 MIDI Device
Technology: 1 (MOD_MIDIPORT)
Voices: 0
Notes: 0
Support: 0x0
Device interface: "\\?\usb#vid_xxxx&pid_yyyy&..."

(Actual device interface string suppressed.)

Note the Microsoft GS Wavetable Synth device, which is always present.

Why would you want to know the device interface? In our case, because we want to test all the audio-related interfaces of a particular device on the system.

EDIT September 22 2015: moved source to github https://github.com/mvaneerde/blog/tree/master/midienum

• #### Implementing a "listen" command using ISpRecoContext from the Microsoft Speech API

Earlier today I posted a quick "say.exe" sample app: you give it text, and it speaks the text aloud using the text-to-speech part of the Windows Speech API.  It was very straightforward - only 67 lines of C++ code.

It took me a little longer to figure out how to do this "listen.exe" sample app; you run it, speak into the microphone, and it uses the speech-to-text part of the Windows Speech API to print what you're saying to the console.  This is a little more involved: 202 lines of C++ code.

Pseudocode:

CoInitialize()
CoCreateInstance(ISpRecoContext)
pSpRecoContext->SetInterest(recognition events only, thanks)
pSpRecoContext->CreateGrammar()
pSpRecoGrammar->SetDictationState(active)
while(...) {
    wait for a speech event (or the user to press Enter)
    pSpRecoContext->GetEvents()
    for each speech event {
        make sure SPEVENT.eEventId is SPEI_RECOGNITION
        event.lParam is an ISpRecoResult
        pSpRecoResult->GetText()
        print the text
    }
}

Usage:

>listen.exe
Speak into the microphone naturally; I will print what I understand.
Press ENTER to quit.
(At this point you start talking into the microphone. Text shows up here shortly after you say it.)

Source and binaries attached.

EDIT September 22 2015: removed source and binaries as this is obsoleted by http://blogs.msdn.com/b/matthew_van_eerde/archive/2014/07/11/using-the-speech-api-to-convert-speech-to-text.aspx

• #### Implementing a "say" command using ISpVoice from the Microsoft Speech API

I've known for a while that Microsoft Windows comes with text-to-speech and speech-to-text APIs, which power the Narrator and Speech Recognition features respectively.

This forum post prompted me to mess around with them a little.

I came up with this implementation of a say.exe command which takes a single argument as text, and then uses the ISpVoice text-to-speech API to have the computer speak it aloud.

Source and binaries attached.

Pseudocode:

CoInitialize(nullptr);
CoCreateInstance(ISpVoice)
pSpVoice->Speak(text);

Usage:

EDIT September 22 2015: removed source and binaries as this is obsoleted by http://blogs.msdn.com/b/matthew_van_eerde/archive/2013/03/13/grabbing-the-output-of-the-microsoft-speech-api-text-to-speech-engine-as-audio-data.aspx

• #### Muting all audio outputs with IAudioEndpointVolume

I have a selfhost tool that I use to mute all audio outputs programmatically.

Pseudocode:

IMMDeviceEnumerator::EnumAudioEndpoints
for each device:
    IMMDevice::Activate(IAudioEndpointVolume)
    IAudioEndpointVolume::SetMute(TRUE)

Source and binaries attached.

EDIT September 22 2015: moved source to github https://github.com/mvaneerde/blog/tree/master/mute-all-outputs

• #### Getting audio peak meter values for all active audio sessions

The Windows Vista volume mixer shows a peak meter for the device.  In Windows 7 we added a peak meter for each application.

The audio interface for both is IAudioMeterInformation; I've used this before in my post about the linearity of Windows volume APIs.  That post showed how an application can get the peak meter reading for the device meter.

In Windows 7 we also added APIs to allow applications to get a list of audio sessions.  I wrote up a quick app which shows how to get a list of all audio sessions, filter it down to the active ones (sessions which have an active audio client), and then get a peak meter reading.

>meters.exe
{0.0.0.00000000}.{c05f2f54-7294-422a-bb0d-8d690c365b73}|#%b{917F8618-9C57-4720-9B7C-88CA45FC983B}: 0.215705
Active sessions: 1

{0.0.0.00000000}.{c05f2f54-7294-422a-bb0d-8d690c365b73}|#%b{917F8618-9C57-4720-9B7C-88CA45FC983B} is the session identifier, which should be considered opaque.  0.215705 is the value of the peak meter, ranging from 0 to 1 linearly in amplitude.  If you are populating a visual peak meter with this information, you will need to apply a curve.
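
One common choice of curve is a decibel scale. The sketch below is only an illustration, not what the Windows volume mixer does: MeterPosition is a hypothetical helper, and the -60 dB floor is an arbitrary assumption. It converts the linear amplitude to dB with 20 log10(peak) and maps the range from the floor up to 0 dB onto a 0-to-1 meter position.

```cpp
#include <algorithm>
#include <cmath>

// Maps a linear peak value (0 to 1) to a meter position (0 to 1) on a dB scale.
// floor_db is the level drawn as an empty meter; -60 dB is an assumed default.
double MeterPosition(double peak, double floor_db = -60.0) {
    if (peak <= 0.0) {
        return 0.0; // silence: log10(0) is undefined, so treat it as empty
    }

    const double db = 20.0 * std::log10(peak); // 1.0 => 0 dB, 0.1 => -20 dB
    const double position = (db - floor_db) / (0.0 - floor_db);

    // clamp: anything below the floor is empty, anything above 0 dB is full
    return std::min(1.0, std::max(0.0, position));
}
```

With these assumptions, a peak of 0.1 (-20 dB) lands two thirds of the way up the meter instead of one tenth.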

Source and binaries attached.  Pseudocode follows:

CoCreate(IMMDeviceEnumerator);
IMMDeviceEnumerator::GetDefaultAudioEndpoint;
IMMDevice::Activate(IAudioSessionManager2);
IAudioSessionManager2::GetSessionEnumerator;
for (each session) {
    IAudioSessionEnumerator::GetSession
    IAudioSessionControl::GetState
    if the state is anything but "active", skip to the next session
    QI IAudioSessionControl to IAudioSessionControl2
    IAudioSessionControl2::GetSessionIdentifier
    QI IAudioSessionControl to IAudioMeterInformation
    IAudioMeterInformation::GetPeakValue
    Log the session identifier and the peak value
}

EDIT September 22 2015: removed source and binaries as this is obsoleted by http://blogs.msdn.com/b/matthew_van_eerde/archive/2013/09/26/getting-peak-meters-and-volume-settings-for-all-apps-and-audio-devices-on-the-system.aspx

• #### Windows Sound test team rowing morale event

Last Friday the Windows Sound test team went kayaking.  We went to the Agua Verde paddle club and kayaked around Union Bay for a while.

Here's the route we took:

More detail:

http://connect.garmin.com/activity/179545084
