# Matthew van Eerde's web log

• #### How to calculate a sine sweep - the wrong way

Suppose you want to generate a continuous sine-sweep from time tstart to time tend. You want the starting frequency to be ωstart, and the ending frequency to be ωend; you want the sweep to be logarithmic, so that octaves are swept out in equal times. (The alternative would be to sweep linearly, but usually logarithmic sweeping is what you want.) For today we're going to have continuous time and sample values, so sample rate and bit depth will not be part of this discussion; in particular, the units of t will be seconds, and our sample values will go from -1 to 1.

Here's a picture of want we want to end up with:

Without loss of generality, set tstart = 0. This simplifies some of the math.

Let ω(t) be the "instantaneous frequency" at time t. Advanced Exercise: Define "instantaneous frequency" mathematically.

Sweeping ω(t) logarithmically means that log ω(t) is swept linearly from log ωstart to log ωend. So

Quick check: ω(0) = ωstart, ω(tend) = ωend. Yup.

Let's define R (the rate of change of the frequency) as (ωend / ωstart)1 / tend. This formula reduces to

ω(t) = ωstart Rt

Next step is to look at phase. For a sine wave of constant frequency ω, phase progresses linearly with t on the circular interval [0, 2π):
φ = k ω t
for some constant k. If the frequency of a sine wave was 1 cycle per second, the phase would go from 0 to 2π in a single second - this reveals that the correct value of k is 2π:
φ = 2π ω t

What happens with a variable frequency? A little calculus provides the answer. Consider a small change Δt, small enough so that ω(t) is almost constant over the interval:
Δφ ≈ 2π ω(t) Δt
because over that interval the sine sweep function is well approximated by a sine wave of constant frequency ω(t).

Starting from an initial phase φstart, let Δt → 0, and:

Substitute:
u = τ log R
τ = u / log R
= du / log R

Define:
B = 2π ωstart / log R
A = φstart - B
and we have:

φ(t) = A + BRt

Just for fun, here's the explicit solution without intermediate values:

Now, the payoff: the signal is just the sine of the phase, so our sine sweep is:

or, more explicitly:

Ah, beautiful mathematically.

But useless practically.

Why useless?

Well, note that our equation for phase is an exponential. That expression inside the sin(...) gets very big, very quickly. Any practical implementation of sin(...) is going to be utterly useless once its argument gets beyond a certain threshold - you'll get ugly chunky steps before you're halfway into the sweep.

Nonetheless, there is a practical way to generate (pretty good) sine sweeps. More on that next time.

EDIT: all logs are base e.

• #### How to calculate a sine sweep - the right way

Three quarters are better than a dollar because they make noise!
-- Lilly, Lilly's Purple Plastic Purse

Last time I talked about how to calculate a sine sweep mathematically.  There's a closed-form solution which is quite elegant but, alas, useless in a practical computer implementation due to the very big numbers that are being fed to the sin(...) function.

A computer implementation will care only about the discrete samples that need to be generated.  The integral in the previous post turns out to be counterproductive - we're much more interested in the infinite sequence of φi, and more particularly in their residue mod 2π.

Here are Matlab scripts that implement a sine sweep both ways:

First, the naïve mathematical solution that just calculates the exponential:

function signal = sinesweep_nostate( ...
start, ...
finish, ...
seconds, ...
sample_rate, ...
amplitude, ...
initial_phase, ...
dc ...
)
% sinesweep_nostate returns a single-channel sine logarithmic sweep
%     DO NOT USE THIS FUNCTION IN PRODUCTION
%     THIS IS A PEDAGOGICAL EXERCISE ONLY
%     THE SIGNAL THIS PRODUCES GRADUALLY DISTORTS
%
%     start: starting frequency in Hz
%     finish: ending frequency in Hz
%     seconds: duration of sweep in seconds
%     samplerate: samples per second
%     amplitude: amplitude
%     initial_phase: starting phase
%     dc: dc

% echo these interesting intermediate values to the console
R = (finish / start) ^ (1 / seconds)
B = 2 * pi * start / log(R)
A = initial_phase - B

time = 0 : 1 / sample_rate : seconds;
phase = A + B * R .^ time;
signal = amplitude * sin(phase) + dc;

end % sinesweep_nostate

Now, the iterative version that adds up the little Δφs to calculate φi iteratively:

function signal = sinesweep( ...
start, ...
finish, ...
seconds, ...
sample_rate, ...
amplitude, ...
initial_phase, ...
dc ...
)
% sinesweep returns a single-channel sine logarithmic sweep
%     start: starting frequency in Hz
%     finish: ending frequency in Hz
%     seconds: duration of sweep in seconds
%     samplerate: samples per second
%     amplitude: amplitude
%     initial_phase: starting phase
%     dc: dc

time = 0 : 1 / sample_rate : seconds;
frequency = exp( ...
log(start) * (1 - time / seconds) + ...
log(finish) * (time / seconds) ...
);

phase = 0 * time;
phase(1) = initial_phase;
for i = 2:length(phase)
phase(i) = ...
mod(phase(i - 1) + 2 * pi * frequency(i) / sample_rate, 2 * pi);
end

signal = amplitude * sin(phase) + dc;

end % sinesweep

If anything, one would expect the first to be more accurate, since it is mathematically precise, and the second is only an approximation.  But let's see how the signal produced by each of them holds up.

I calculate, and plot, the same sweep both ways, with a slight dc offset to facilitate seeing the different sweeps on the same plot.  In particular this is a logarithmic sweep from 20 kHz to 21 kHz, over 30 seconds, using 44100 samples per second, an amplitude of 0.2, an initial phase of π, and a dc of 0.25 for one and 0.26 for the other:

plot(...
x, sinesweep(20000, 21000, 30, 44100, 0.2, pi, 0.25), 'd', ...
x, sinesweep_nostate(20000, 21000, 30, 44100, 0.2, pi, 0.26), 's' ...
)

(x is just 1:30*44100 - necessary to allow a single plot statement.)

Note that the approximate method is plotted in diamonds, and will be 0.01 lower on the resulting graph than the "exact" method, plotted in squares.

(This takes a while to plot because there are a heckuva lot of points...)

Ah, a nice solid mass of green and blue.  I won't show it here.

OK, let's zoom in to a very early chunk of the signal - I expect these to match up closely, with only the dc offset separating the points:

Yup.  Looks pretty good - give or take a pixel (quantization error in the display.)

Now let's see what happens to the signal towards the end of the 30 seconds.

Ick.  Something's rotten in the state of Denmark.  This distortion is so bad that it's visible - you don't even have to take an FFT to see it.

Exercise: take an FFT and look at the distortion.

Advanced Exercise: demonstrate that the distortion is, in fact, in the "exact" method and not in the iterative method... that is to say, show which clock is right.

EDIT: On second glance it looks like the second picture is just showing a horizontal shift between the two signals, which is expected.  I'll need to dig deeper if I'm to prove my assertion that the iterative method produces less distortion than the "exact" method.

• #### Jeopardy conundrum

What happens if Double Jeopardy ends and all contestants are in the red (or have no money?)

Do they skip Final Jeopardy and pick three new people for the next show?

• #### Spot the bug - Outlook "block sender" icon

Back in Windows 7 I did a fair amount of UI testing for the Sound team.

One of the problems with testing UI is it gives you an unconscious attention to detail that's difficult to unlearn.

For example, this Outlook bug didn't bother me before... but now it does.

Exercise: what is the bug?

• #### How to write a leading apostrophe in Word

Disclaimer: I don't work on the Office team.

Word has a smart quotes feature where it will automagically transform

straight "double quotes," 'single quotes,' and greengrocer's apostrophes

into

curly “double quotes,” ‘single quotes,’ and greengrocer’s apostrophes

as you type.  You send a Unicode Character 'APOSTROPHE' (U+0027) to Word, and Word turns it into a Unicode Character 'LEFT SINGLE QUOTATION MARK' (U+2018) or a Unicode Character 'RIGHT SINGLE QUOTATION MARK' (U+2019) as appropriate.  If you type it at the beginning of a word, it's an opening-single-quote; if you type it in the middle of a word, it's an apostrophe; if you type it at the end, it's either an apostrophe or a closing-single-quote, but both are the same character, so it doesn't matter.

Right?

Usually.

The apostrophe is occasionally used at the beginning of a word, to mark elided letters.  Word has trouble with this.

Twas brillig, and the slithy toves did gyre and gimbol in the wabe.

-- Jabberwocky

I want to give the Word folks credit here.  A common use of an apostrophe at the beginning of a word is to abbreviate a year.  Word gets this right (I'm using Word 2010 with default settings:)

Check it out: a 57 Chevy!

But Word also gets it wrong when you want to single-quote a clause that begins with a number:

“They were singing 99 bottles of beer on the wall’.”

Also, if you pluralize the date, Word gets suckered:

10 models on sale! Check out the new 11s!

Little nifties, from the 50s, innocent and sweet;
Sexy ladies from the 80s, who are indiscreet
-- 42nd Street

For those times when Word gets it wrong, here's how to fix it.

If you want to type a word that starts with an apostrophe:

1. Type the apostrophe.  Word assumes you want an opening quote.  Fine
2. Type the apostrophe again.  Word shrugs its shoulders and gives you a closing quote in a little soixante-neuf of punctuation.
3. Type the magic sequence:
1. Left arrow
2. Backspace
3. Right arrow
4. Et voilà.  Continue typing your word.
Twas

If you want to use an opening single quote with a sentence that starts with a number:

1. Type the opening quote and the number in its entirety.  So far, so good.
99
2. Type the word breaker (usually a space.)  Word "helpfully" turns the opening single quote into an apostrophe.
99
3. Invoke "Undo" via your favorite mechanism (I prefer Ctrl-Z since I'm already on the keyboard.)
99
4. Et voilà.  Continue typing your quoted clause.
99 bottles of beer on the wall

That "Undo" tip is actually a fairly powerful curative against all sorts of Word voodoo.

• #### How much is a perfect Jeopardy! score?

Some sports and games have a notion of a "perfect score."  For example, a perfect score in bowling is 300 (twelve consecutive strikes;) a perfect score in golf would be 18 (18 holes-in-one.)

After watching Watson destroy Ken Jennings and Brad Rutter I began to wonder what a perfect score in Jeopardy! would be.

Assumptions:

• The player knows (or can guess) all the correct questions for the given answers.
• The player can consistently beat the other players to the buzzer.
• The player knows (or can guess) where all three Daily Doubles are.
• The Daily Double in the Jeopardy! round happens to be on a \$200 clue.
• The two Daily Doubles in the Double Jeopardy! round both happen to be on \$400 clues.
• The player is willing to "bet it all" at all four opportunities (the three Daily Doubles and Final Jeopardy!).

Jeopardy! has three rounds:

1. The Jeopardy! round
The player starts this round with \$0.
There are six categories with five clues each: \$200, 400, 600, 800, 1000.  That's \$3000 for each category for a nominal total of \$18,000.
But wait - one of those clues is a Daily Double!  For maximum value we're assuming it's a \$200 clue.
The player gets all the normal clues correct and amasses \$17,800.
The player then "makes it a true daily double", gets it right, and doubles their total to \$35,600.
2. The Double Jeopardy! round
The player starts this round with \$35,600.  Again, there are six categories with five clues each.  But this time the values are \$400, 800, 1200, 1600, and \$2000.
That's \$6000 for each category for a nominal total of \$36,000.
But wait - two of those clues are Daily Doubles!  For maximum value we're assuming they're both \$400 clues.
The player gets all the normal clues correct and amasses an additional \$35,200.  They now have \$70,800.
The player then selects the first of the two Daily Doubles for this round, "makes it a true daily double", gets it right, and doubles their total to \$141,600.  (Jeopardy!, unlike some game shows, does not materialize winnings at the end of a round; there's just a single running total for each player throughout the episode.)
The player then selects the last Daily Double, "makes it a true daily double", gets it right, and doubles their total to \$283,200.
3. Final Jeopardy!
The player starts this round with \$283,200.
The player looks at the category, bets everything, looks at the answer, writes down the correct question, and doubles their total to \$566,400.

Conclusion: A perfect score for a single episode of Jeopardy! is \$566,400.

The actual record appears to be Roger Craig's win of \$77,000 on September 14 2010.

• #### Spot the bug - control flow macro

Found this in a code review.

Well, in all candor, I didn't find it; our automated code review tool found it.  But I'm taking credit for it because I read the output of the automated code review tool. :-)

Code rewritten because I'm too lazy to look it up.

// macros.h
#define ENSURE_EQUAL(x, y) if (x != y) { bail(); }

// implementation.cpp
...
hr = foo();
ENSURE_EQUAL( foo_supported() ? S_OK : E_FOO_NOT_SUPPORTED, hr );
...

• #### Riffing on Raymond - incrementing the reference count on a smart pointer

He made two suggestions, and I offered a third:

1. (Raymond) tell the smart pointer class to release ownership to the function
2. (Raymond) use a different function that doesn't release the input
3. (Me) take an explicit reference on the function's behalf

Raymond suggested that I should actually try my suggestion. So I did.

For each of the four smart pointer types, I tried four different ways to add a reference. Here are the results:

 Smart pointer type .AddRef() ->AddRef() Cast to IUnknown * Get underlying pointer Microsoft::WRL::ComPtr Compile error Compile error Compile error .Get() ATL::CComPtr Compile error Compile error OK .p _com_ptr_t OK OK OK .GetInterfacePtr() std::unique_ptr Compile error OK Compile error .get()

Here's the code I used.

• #### Generating formulae for polygonal numbers (triangular, square, pentagonal...)

The formula for square numbers is easy to remember:
sn = n2
There are also triangular numbers, pentagonal numbers, hexagonal numbers, and in general k-gonal numbers. These have formulae too... for example, triangular numbers:
tn = n (n + 1) / 2
... but they can be hard to remember.

If you see a fact, try to see it as intuitively as possible.

Some formulae are not worth memorizing - it is more worthwhile to grasp the ideas behind the formulae and rederive them when you need them (or look them up.) This blog post shows how to derive the formulae for k-gonal numbers.

Let's define a few terms. I've defined sn and tn above, and I'll define pn as the nth pentagonal number and hn as the nth hexagonal number, but let's generalize... let gk,n = g(k, n) be the nth k-gonal number. For example:
g(3, n) = tn = n (n + 1) / 2
g(4, n) = sn = n2
The task, then, is to find a formula for g(k, n) for general k and n.

Consider the nth k-gonal number. For simplicity of illustration, I will focus on a particular number, but the approach followed here should be followed while keeping the general case in mind. So, specifically, let's take a look at the 4th pentagonal number p4 = g(5, 4):

The idea is to conceptually "cut" (n – 1) specific lines...

... and then "unroll" the structure to form a kind of "bar chart":

The base of the chart is precisely one edge of the original diagram and so has length n = 4 (counting dots, not lines.) There are a couple of ways to find the height of the last bar - you can observe that each bar (not counting the bottom dot) adds (k – 2) = 3 over the bar before it (as marked below,) and there are (n – 1) = 3 nontrivial bars.
Or you can note that each bar is the bottom dot, plus the sum of (k – 2) = 3 edges, each of which has (n – 1) = 3 unique dots, if we don't count the first dot as being part of the edge.
Either way, you come up with the height being:
h = 1 + (k – 2)(n – 1) = 1 + 3(3) = 10

So we know the base and the height of the triangle. How do we find the area? Well, if we double the triangle, we get a rectangle:

The base of the rectangle is n = 4, and the height is:
h + 1 = 2 + (k – 2) (n – 1) = (k – 2) n + (4 – k) = (5 – 2) 4 + (4 – 5) = 11
The area of the triangle is half of the area of the rectangle:
g(k, n) = n ((k – 2) n + (4 – k)) / 2 = 4 ((5 – 2) 4 + (4 – 5)) / 2 = 22
Counting dots confirms that there are, in fact, 22.

So we have a general theorem - let's try it out on a few special cases. We can use the ones we know as a check.

g(3, n) = n ((3 – 2) n + (4 – 3)) / 2 = n (n + 1) / 2 = tn
g(4, n) = n ((4 – 2) n + (4 – 4)) / 2 = n2 = sn
g(5, n) = n ((5 – 2) n + (4 – 5)) / 2 = n (3n – 1) / 2 = pn
g(6, n) = n ((6 – 2) n + (4 – 6)) / 2 = n (2n – 1) = hn
g(7, n) = n ((7 – 2) n + (4 – 7)) / 2 = n (5n – 3) / 2
...
g(k, n) = n ((k – 2) n + (4 – k)) / 2
...

Further checks: g(k, 1) should be 1 for all k, and g(k, 2) should be k for all k. These also check out.

• #### How to enumerate DirectShow filters on your system

Last time I showed how to enumerate Media Foundation transforms.  This time, DirectShow filters.  Source and binaries attached.

In pseudocode:

GUID directshow_categories[] = { ... }; // static list
ICreateDevEnum = CoCreate(...);
for (each category in directshow_categories) {
display the category name and GUID

IEnumMoniker = ICreateDevEnum::CreateClassEnumerator(category);

for (each IMoniker that IEnumMoniker finds) {
IBindCtx = CreateBindCtx(...);
IPropertyBag = IMoniker::BindToStorage(IBindCtx);

pull the friendly name from the IPropertyBag and display it
}
}

Output of the tool on my system:

>devenum.exe
-- Audio Capture Sources ({33D9A762-90C8-11D0-BD43-00A0C911CE86}) --
Front Microphone (High Definiti
Line In (High Definition Audio
Rear Microphone (High Definitio

-- Audio Compressors ({33D9A761-90C8-11D0-BD43-00A0C911CE86}) --
WM Speech Encoder DMO
WMAudio Encoder DMO
PCM
GSM 6.10
CCITT A-Law
CCITT u-Law
MPEG Layer-3

-- Audio Renderers ({E0F158E1-CB04-11D0-BD4E-00A0C911CE86}) --
Speakers (High Definition Audio
Default DirectSound Device
Default WaveOut Device
DirectSound: Speakers (High Definition Audio Device)

-- Device Control Filters ({CC7BFB46-F175-11D1-A392-00E0291F3959}) --

-- DirectShow Filters ({083863F1-70DE-11D0-BD40-00A0C911CE86}) --
WMAudio Decoder DMO
WMAPro over S/PDIF DMO
WMSpeech Decoder DMO
MP3 Decoder DMO
Mpeg4s Decoder DMO
WMV Screen decoder DMO
WMVideo Decoder DMO
Mpeg43 Decoder DMO
Mpeg4 Decoder DMO
DV Muxer
Color Space Converter
AVI Splitter
VGA 16 Color Ditherer
SBE2MediaTypeProfile
Microsoft DTV-DVD Video Decoder
AC3 Parser Filter
StreamBufferSink
Microsoft TV Captions Decoder
MJPEG Decompressor
CBVA DMO wrapper filter
MPEG-I Stream Splitter
SAMI (CC) Parser
VBI Codec
MPEG-2 Splitter
Closed Captions Analysis Filter
SBE2FileScan
Microsoft MPEG-2 Video Encoder
Internal Script Command Renderer
MPEG Audio Decoder
DV Splitter
Video Mixing Renderer 9
Microsoft MPEG-2 Encoder
ACM Wrapper
Video Renderer
MPEG-2 Video Stream Analyzer
Line 21 Decoder
Video Port Manager
Video Renderer
VPS Decoder
WM ASF Writer
VBI Surface Allocator
File writer
iTV Data Sink
iTV Data Capture filter
DVD Navigator
Microsoft TV Subtitles Decoder
Overlay Mixer2
AVI Draw
RDP DShow Redirection Filter
Microsoft MPEG-2 Audio Encoder
WST Pager
MPEG-2 Demultiplexer
DV Video Decoder
SampleGrabber
Null Renderer
MPEG-2 Sections and Tables
Microsoft AC3 Encoder
StreamBufferSource
Smart Tee
Overlay Mixer
AVI Decompressor
NetBridge
AVI/WAV File Source
Wave Parser
MIDI Parser
Multi-file Parser
File stream renderer
Microsoft DTV-DVD Audio Decoder
StreamBufferSink2
AVI Mux
Line 21 Decoder 2
File Source (Async.)
File Source (URL)
Media Center Extender Encryption Filter
AudioRecorder WAV Dest
AudioRecorder Wave Form
SoundRecorder Null Renderer
Infinite Pin Tee Filter
Enhanced Video Renderer
BDA MPEG2 Transport Information Filter
MPEG Video Decoder

-- External Renderers ({CC7BFB41-F175-11D1-A392-00E0291F3959}) --

-- Midi Renderers ({4EFE2452-168A-11D1-BC76-00C04FB9453B}) --
Default MidiOut Device
Microsoft GS Wavetable Synth

-- Video Capture Sources ({860BB310-5D01-11D0-BD3B-00A0C911CE86}) --

-- Video Compressors ({33D9A760-90C8-11D0-BD43-00A0C911CE86}) --
WMVideo8 Encoder DMO
WMVideo9 Encoder DMO
MSScreen 9 encoder DMO
DV Video Encoder
MJPEG Compressor
Intel IYUV codec
Intel IYUV codec
Microsoft RLE
Microsoft Video 1

-- WDM Stream Decompression Devices ({2721AE20-7E70-11D0-A5D6-28DB04C10000}) --

-- WDM Streaming Capture Devices ({65E8773D-8F56-11D0-A3B9-00A0C9223196}) --
HD Audio Microphone
HD Audio Muxed capture

-- WDM Streaming Crossbar Devices ({A799A801-A46D-11D0-A18C-00A02401DCD4}) --

-- WDM Streaming Rendering Devices ({65E8773E-8F56-11D0-A3B9-00A0C9223196}) --
HD Audio Speaker

-- WDM Streaming Tee/Splitter Devices ({0A4252A0-7E70-11D0-A5D6-28DB04C10000}) --
Tee/Sink-to-Sink Converter

-- WDM Streaming TV Audio Devices ({A799A802-A46D-11D0-A18C-00A02401DCD4}) --

-- WDM Streaming TV Tuner Devices ({A799A800-A46D-11D0-A18C-00A02401DCD4}) --

-- WDM Streaming VBI Codecs ({07DAD660-22F1-11D1-A9F4-00C04FBBDE8F}) --

-- WDM Streaming Communication Transforms ({CF1DDA2C-9743-11D0-A3EE-00A0C9223196}) --
Tee/Sink-to-Sink Converter

-- WDM Streaming Data Transforms ({2EB07EA0-7E70-11D0-A5D6-28DB04C10000}) --

-- WDM Streaming Interface Transforms ({CF1DDA2D-9743-11D0-A3EE-00A0C9223196}) --

-- WDM Streaming Mixer Devices ({AD809C00-7B88-11D0-A5D6-28DB04C10000}) --

-- BDA Network Providers ({71985F4B-1CA1-11D3-9CC8-00C04F7971E0}) --
Microsoft ATSC Network Provider
Microsoft DVBC Network Provider
Microsoft DVBS Network Provider
Microsoft DVBT Network Provider
Microsoft Network Provider

-- BDA Receiver Components ({FD0A5AF4-B41D-11D2-9C95-00C04F7971E0}) --

-- BDA Rendering Filters ({71985F4A-1CA1-11D3-9CC8-00C04F7971E0}) --

-- BDA Source Filters ({71985F48-1CA1-11D3-9CC8-00C04F7971E0}) --

-- BDA Transport Information Renderers ({A2E3074F-6C3D-11D3-B653-00C04F79498E}) --
BDA MPEG2 Transport Information Filter
MPEG-2 Sections and Tables

-- Video Effects (1 input) ({CC7BFB42-F175-11D1-A392-00E0291F3959}) --
BasicImage
Convolution
Chroma
Matrix
Pixelate
DxtAlphaSetter Class
Text Label Class
Scale
Blur
Glow
ICMFilter
Alpha
Wave
MotionBlur
Emboss
Engrave
Light

-- Video Effects (2 inputs) ({CC7BFB43-F175-11D1-A392-00E0291F3959}) --
CrBlinds
Iris
ZigZag
RandomBars
CrIris
Spiral
Pixelate
Wheel
Strips
CrStretch
Inset
CrSlide
CrInset
Compositor
Blinds
CrSpiral
Wipe
CheckerBoard
DxtCompositor Class
CrBarn
DxtKey Class
Slide
DxtJpeg Class
CrZigzag
Barn
Stretch
RandomDissolve

-- EncAPI Encoders ({7D22E920-5CA9-4787-8C2B-A6779BD11781}) --
Microsoft MPEG-2 Audio Encoder
Microsoft MPEG-2 Video Encoder

Microsoft MPEG-2 Encoder

EDIT September 28 2015: moved source to https://github.com/mvaneerde/blog/tree/master/devenum

• #### xargs start

Like many programmers, I've messed around in a lot of development environments - including the UNIX/Linux family of operating systems.

One of the UNIX commands I like is xargs... this is very handy when writing one-off command lines or scripts.

I like it so much that I miss it terribly when I'm in a Windows OS.  So I wrote an xargs.bat and stuck it in my PATH.  Now instead of writing complicated for loops, I can just pipe to xargs.

C:\Users\MatEer\Desktop\custom-path>type xargs.bat
@echo off

setlocal

if /i (%1)==(/?) goto USAGE

if /i (%1)==() goto USAGE

goto NOQUOTES

:USAGE
echo usage: something-that-produces-output ^| %0 [/?] [/addquotes] thing-to-run
echo   xargs.bat by Matthew van Eerde 10/3/2005
echo.
echo   something-that-produces-output should write to its STDOUT
echo   thing-to-run will have each line of the output appended to it,
echo   then will be run successively
echo.
echo   If /addquotes is set, quotes will be added around the line
echo   before appending the line to thing-to-run
echo.
echo   If you call xargs without piping output to it, xargs will wait
echo   for you to type something in on STDIN.
echo   Ctrl-Z on its own line to finish.
goto END

shift

rem Alas, shift doesn't affect %*
if (%1)==() goto USAGE
set basecommand=%1
shift

:BUILDBASECOMMAND
if (%1)==() goto DONEBASECOMMAND
set basecommand=%basecommand% %1
shift
goto BUILDBASECOMMAND
:DONEBASECOMMAND

rem run the program specified by %*
rem as many times as there are lines in STDIN
rem with one extra argument -- defined by each line of STDIN -- in quotes
rem
rem all that the find command does is intercept STDIN
rem
for /F "usebackq delims=" %%a in (`find /v ""`) do %basecommand% "%%a"

goto END

:NOQUOTES

rem run the program specified by %*
rem as many times as there are lines in STDIN
rem with extra arguments defined by each line of STDIN
rem
rem all that the find command does is intercept STDIN
rem
for /F "usebackq delims=" %%a in (`find /v ""`) do call %* %%a

goto END

:END

endlocal

This allows wonderfully anaOStic things like findstr /m IFoo * | xargs start

EDIT 2015-10-31: added script to https://github.com/mvaneerde/blog/blob/master/scripts/xargs.bat

• #### Why a full-scale sine wave has an intensity of -3 dB FS

I was asked one day why this full-scale sine wave was being measured by our signal analysis tools as -3 dB FS, even though it hits the maximum and the minimum sample values:

The answer is "because it's a sine wave, not a square wave."  The intensity of a signal can be calculated from the following formula:

The inner integral does not depend on t - it's just the average sample value - so it's usually precalculated:

The importance of taking dc into account during analysis can be appreciated if you try to calculate the intensity of a signal with a high dc.

Exercise: calculate the intensity of x(t) ≡ -0.5 using the formulas above; calculate the "naïve intensity" by using the last formula above and omitting dc. Note the difference.

Now that we have the necessary formulas, let's analyze our full-scale sine wave signal.  Plugging in a full-scale sine wave and analyzing over a full period we get:

As expected, the average value of the signal over a single period is 0.

Evaluating the intensity requires finding the antiderivative of (sin(t))2. This can be ascertained most easily by plotting a few values and realizing that (sin(t))2 = (1 - cos(2t))/2:

This is a dimensionless number that ranges from 0 (signal is a flat line) to 1 (... we'll get to that later.)

We can convert this into a dB FS measurement using the formula IdB FS = 20 log10 IRMS:

Et voilà - that's where the -3 dB comes from.

In contrast to a sine wave, a full-scale square wave centered at 0 has an intensity of 0 dB FS:

To finish up, a couple of advanced exercises:

Advanced Exercise: prove that, provided t1 and t2 are sufficiently far apart, the intensity of a general sine wave x(t) = a sin(ωt + φ) + c depends only on a and not on ω, φ, or c.

Advanced Exercise: completely characterize the set of digital signals that achieve 0 dB FS intensity.  If you have, say, 1000 samples of mono 16-bit integer audio to play with, how many distinct signals achieve 0 dB FS intensity?

• #### Nitpicking Sam Loyd's "Organ Pipes" mate in two

Sam Loyd was a wacky guy.  His legacy of puzzles is well worth digging through.  (I'd stay away from the 15/14 puzzle though.)

A sprinkling of his puzzles were chess-related, and plenty of them stand up well in the greater lexicon of chess puzzles.  For example, his "Organ Pipes" mate-in-two problem is quite well known:

White to move and mate in two (Sam Loyd, 1859)

(If you haven't seen it before, go ahead and try it out.  It's a cute little interference problem.)

OK, now we get to it.

From a technical point of view, there's a slight flaw here.  Ideally, in a chess problem Black has very many options, and White has only a single path to victory in all variations.

This is not the case here.  If the Bishop on f8 wanders away (1. ... Bg7, 1. ... Bh6) or is blocked by the Rook on e8 (1. ... Re7) then White has two mates: 2. Qb6# or 2. Qxb4#.

 1. Qa5 1. ... Bb7 2. Nf5# 1. ... Bd7 2. Qd5# 1. ... Be6 2. Qe5# 1. ... Bf5 2. Nxf5# 1. ... Rd7 2. Nf5# 1. ... Rd6 2. Qxb4# 1. ... Rd5 2. Qxd5# 1. ... Re7 2. Qb6# or 2. Qxb4# 1. ... Rd6 2. Nf5# 1. ... Re5 2. Qxe5# 1. ... Bc5 2. Qa1# 1. ... Bd6 2. Qd5# 1. ... Be7 2. Qe5# 1. ... Bg7 2. Qb6# or 2. Qxb4# 1. ... Bh6 2. Qb6# or 2. Qxb4#

Luckily, the problem can be easily "fixed" - add a black pawn on a7:

White to move and mate in two
(Sam Loyd, 1859; version by Matthew van Eerde, 2008)

This covers the b6 square.  Now the only mate is to take on b4.

Beauty is a tricky beast though.  This is now a more "technically correct" problem, but the position is no longer quite as "natural"-looking.  (I quote "natural" because I've seen plenty of ugly over-the-board positions.)  The pawn on a6 must have come from b7, and the pawn on b4 must have come from c7 (or perhaps from further away.)

I'm not sure which version I prefer, but it's nice that the flaw is not serious.

• #### The case of the default audio device from the future

Hover the mouse over each picture (xkcd-style) in turn to get the text that tells the story.

An amusing behavior of Windows Vista recently came to my attention... sometimes the Sound control panel wants to go back... to the future!

So far, so good.

D'oh! Luckily, there's a way to recover...

Voilà tout.
• #### Car Talk puzzler analysis - palindromic odometers

Last week's Car Talk question was again of mathematical interest - it dealt with palindromic numbers on a car's odometer. (This is not the first palindromic odometer question they've done.)

Read Car Talk's "Tommy's Drive to Work" puzzler question

Short version:

Tommy goes for a short drive (under an hour; possibly well under an hour) in his 19-year old new car. He notices that the odometer reading at the beginning and the end of the drive are both palindromes (we're given that his odometer shows six digits.) The question is, how many miles did he drive?

You're supposed to do a bunch of math, and then experience an epiphany when you realize that, against all odds, there's a single unique answer, even though the odometer readings themselves cannot be uniquely determined.

Alas, the problem is flawed. There are two perfectly good (and different) answers.

The first reading is given as a palindrome - the first three digits (let's call them a, b, c) are the same as the last three digits in reverse (c, b, a). So we can write the first number as abccba.

The second reading is also given as a palindrome, and we're given that it's a different palindrome (we infer that the odometer is not broken, and he drove at least a mile.) So we can write the second number as xyzzyx.

We want to find whether it's possible to get these two readings while driving a reasonably small distance. The distance driven is the difference between the odometer readings: xyzzyxabccba.

xyzzyxabccba
= (100001 x + 10010 y + 1100 z) – (100001 a + 10010 b + 1100 c)
= 100001 (xa) + 10010 (yb) + 1100 (zc)

Because xyzzyx > abccba, we know that xa. Also, if x = a, we know that yb. Finally, if x = a and y = b, we know that z > c (z cannot equal c in this case.) Let's look at these three cases separately.

Case 1: x = a, y = b, z > c
xyzzyxabccba
= 100001 (xa) + 10010 (yb) + 1100 (zc)
= 100001 (0) + 10010 (0) + 1100 (zc)
= 1100 (zc)

z and c can range from 0 to 9, and z > c, so (zc) can range from 1 to 9. So the smallest the distance can be is 1100 miles - for example, 250052 → 259952 (1100 miles.) This is a bit far for an hour's drive.

Case 2: x = a, y > b, no conditions imposed on z vs. c
xyzzyxabccba
= 100001 (xa) + 10010 (yb) + 1100 (zc)
= 100001 (0) + 10010 (yb) + 1100 (zc)
= 10010 (yb) + 1100 (zc)

y, b, z, and c can range from 0 to 9; y > b; and no conditions are imposed on z vs. c. (yb) can range from 1 to 9, (zc) can range from -9 to 9. So the smallest the distance can be is 100101 + 1100 (-9) = 110 miles - for example, 379973 → 380083 (110 miles.) It's certainly possible to drive 110 miles in an hour but not without attracting the attention of the local police!

Case 3: x > a, no conditions imposed on y vs. b or z vs. c
xyzzyxabccba
= 100001 (xa) + 10010 (yb) + 1100 (zc)

x, a, y, b, z, and c can range from 0 to 9; x > a; and no conditions are imposed on y vs. b, or z vs. c. (xa) can range from 1 to 9, (yb) and (zc) can range from -9 to 9. So the smallest the distance can be is 100001 (1) + 10010 (-9) + 1100 (-9) = 11 miles - for example, 499994 → 500005 (11 miles.) This is much more reasonable.

The intended answer, then, is 11 miles. There are 9 possible ways to get this distance with palindromic starting and ending readings:

099990 → 100001 (11 miles)
199991 → 200002 (11 miles)
299992 → 300003 (11 miles)
399993 → 400004 (11 miles)
499994 → 500005 (11 miles)
599995 → 600006 (11 miles)
699996 → 700007 (11 miles)
799997 → 800008 (11 miles)
899998 → 900009 (11 miles)

## Not a flaw

Do we consider numbers with unmatched leading zeros (like 001221) to be palindromes? I would say no. If we did this, it would open up a whole new field of answers such as 77 → 101 (24 miles.) The problem statement seems to imply that all six digits of the number, including leading zeros, need to participate in the palindrome. We can probably infer that Tommy's odometer is an analog odometer that actually shows leading zeros, rather than a digital odometer which hides them; this is consistent with the car being 19 years old.

## The flaw

All well and good. Alas, there's a flaw in this beautiful analysis - namely, odometers wrap. It's entirely possible for the assumption that xyzzyx > abccba to break down if Tommy started his drive with a number close to 999999, and his odometer wrapped around to 000000 during the drive.

999999 → 000000 (1 mile.)

There are other "wrapped" palindromes, but the next smallest are 998899 → 000000 (1101 miles) and 999999 → 001100 (1101 miles) which are even worse than case 1.

Could a 19-year-old car have a million miles on it?

That comes out to an average of 144.1 miles a day, which is high, but achievable (it's only 6 mph.) It's true that Tommy is unlikely to have put this many miles on a car if he lives only a mile from work, but remember that this is his new car (that is, it only recently came into his possession.)

• #### Phase and stereo-to-mono downmix

Last time we looked at how to downmix a stereo signal to mono (M = L/2 + R/2).  The resulting M signal can only be as powerful as the L and R signals if they are perfectly correlated (L = R); if the L and R signals are uncorrelated (no relationship between L and R), the resulting M signal is only half as powerful; and if the L and R signals are anticorrelated (L = -R), the resulting M signal vanishes.

This time I want to look more closely at what happens to M as {L, R} varies from correlated via uncorrelated to anticorrelated.  In particular I'll take L and R to be full-scale sine waves of equal frequency and offset by a constant phase: {L, R} = {sin θ, sin(θ - φ)}.  We'll see what happens to IM / IL as a function of φ.

Recall from before that the intensity of a signal s(t) over a time interval (a, b) is defined as

where the dc value is

Our downmixed signal is defined as

So our dc is zero (note that I am taking (a, b) to be a single period:)

A refresher on the angle-sum identities, which I will use without proof:

• sin(A + B) = sin A cos B + cos A sin B
• Corollary: sin(A - B) = sin A cos B - cos A sin B
• cos(A + B) = cos A cos B - sin A sin B
• Corollary: cos(A - B) = cos A cos B + sin A sin B

We know from before that IL = IR = sqrt(1/2) = 0.707.... Let's calculate IM as a function of φ:

Nice and simple.

Are we done? No.

When you see a fact, try to see it as intuitively as possible -- Pólya

OK, let's graph this.

That looks suspiciously like a cosine in absolute value.  In particular, after messing around with a graphing calculator, it looks like:

Is this in fact a trigonometric identity?  Let's see...

Much nicer and simpler.  So why didn't our original derivation land us here?

Let's go back to our original graph and shift our frame of reference forward by φ/2.   Instead of {L, R} = {sin θ, sin(θ - φ)} we get {L, R} = {sin(θ + φ/2), sin(θ - φ/2)}.  Solving for IM now gives:

This result is much easier to digest.  The sqrt(1/2) is the intensity of the pre-mixed signal; the loss of intensity due to the downmixing, then, is |cos(φ/2)|.

• #### Grabbing the output of the Microsoft Speech API text-to-speech engine as audio data

A while ago I wrote a post on Implementing a "say" command using ISpVoice from the Microsoft Speech API which showed how to use Speech API to do text-to-speech, but was limited to playing the generated audio out of the default audio device.

Recently on the Windows Pro Audio forums, user falven asked a question about how to grab the output of the text-to-speech engine as a stream for further processing.

Here's how to do it.

The key part is to use ISpStream::BindToFile to save the audio data to a .wav file, and ISpStream::SetBaseStream to save to a given IStream. Then call ISpVoice::SetOutput with the ISpStream, prior to calling ISpVoice::Speak.

ISpStream *pSpStream = nullptr;
hr = CoCreateInstance(
CLSID_SpStream, nullptr, CLSCTX_ALL,
__uuidof(ISpStream),
(void**)&pSpStream
);
if (FAILED(hr)) {
ERR(L"CoCreateInstance(ISpVoice) failed: hr = 0x%08x", hr);
return -__LINE__;
}
ReleaseOnExit rSpStream(pSpStream);

if (File == where) {
hr = pSpStream->BindToFile(
file,
SPFM_CREATE_ALWAYS,
&SPDFID_WaveFormatEx,
&fmt,
0
);
if (FAILED(hr)) {
ERR(L"ISpStream::BindToFile failed: hr = 0x%08x", hr);
return -__LINE__;
}
} else {
// stream
pStream = SHCreateMemStream(NULL, 0);
if (nullptr == pStream) {
ERR(L"SHCreateMemStream failed");
return -__LINE__;
}

hr = pSpStream->SetBaseStream(
pStream,
SPDFID_WaveFormatEx,
&fmt
);
if (FAILED(hr)) {
ERR(L"ISpStream::SetBaseStream failed: hr = 0x%08x", hr);
return -__LINE__;
}
}

hr = pSpVoice->SetOutput(pSpStream, TRUE);
if (FAILED(hr)) {
ERR(L"ISpVoice::SetOutput failed: hr = 0x%08x", hr);
return -__LINE__;
}

Updated source and binaries attached.

Usage:

>say.exe
say "phrase" [--file <filename> | --stream]
runs phrase through text-to-speech engine
if --file is specified, writes to .wav file
if --stream is specified, captures to a stream
if neither is specified, plays to default output

Here's how to generate a .wav file (uh.wav attached)

>say.exe "uh" --file uh.wav
Stream is 1

And here's how to generate an output stream. The app consumes this and prints the INT16 sample values to the console. uh.txt attached.

>say.exe "uh" --stream
Stream is 1
0        0;        0        0;        0        0;        0        0
0        0;        0        0;        0        0;        0        0
...
86       86;    -1052    -1052;    -2839    -2839;    -3774    -3774
-4199    -4199;    -4581    -4581;    -4284    -4284;    -3640    -3640
-3100    -3100;    -2011    -2011;     -393     -393;      533      533
...

EDIT September 22 2015: moved source to github https://github.com/mvaneerde/blog/tree/master/say

• #### Enumerating inactive volume sessions

However, I recently had a need to pull information for inactive sessions too - and it needed to run downlevel as far as possible (in particular, to Windows 7.) This meant I couldn't use the Store APIs, since they were new for Windows 8. I actually wanted to go down to Windows Vista, but I needed the IAudioSessionControl2 interface, and that was new for windows 7.

Source and binaries attached.

Output on my system:

>volume-sessions.exe

-- Devices of EDataFlow eRender (0) --
Speakers (Realtek High Definition Audio)
Peak: 0
Peak, channel 1 of 2: 0
Peak, channel 2 of 2: 0
Not muted
Volume range: 0% to 100% (-46.5 dB to 0 dB in steps of 0.03125 dB)
Master (%): 65.7804%
Master (dB): -6 dB
Volume, channel 1 of 2: 65.7804%
Volume, channel 2 of 2: 65.7804%
Volume, channel 1 of 2: -6 dB
Volume, channel 2 of 2: -6 dB

Session #1 of 3
State: AudioSessionStateInactive (0)
Icon path:
Display name:
Grouping parameter: {a68c25a9-fb30-4144-8002-d8c7615e81ed}
Session identifier: {0.0.0.00000000}.{2cbea9df-ca93-4e17-9708-e14139fd7044}|\Device\HarddiskVolume1\Program Files (x86)\Internet Explorer\iexplore.exe%b{00000000-0000-0000-0000-000000000000}
Session instance identifier: {0.0.0.00000000}.{2cbea9df-ca93-4e17-9708-e14139fd7044}|\Device\HarddiskVolume1\Program Files (x86)\Internet Explorer\iexplore.exe%b{00000000-0000-0000-0000-000000000000}|1%b10288
Process ID: 10288 (single-process)
System sounds session: no
Peak value: 0
ERROR: IAudioMeterInformation::GetMeteringChannelCount() reports zero channels
Master volume: 1 (0 dB FS)
Not muted
Volume, channel #1 of 2: 100% (0 dB FS)
Volume, channel #2 of 2: 100% (0 dB FS)

Session #2 of 3
State: AudioSessionStateInactive (0)
Icon path:
Display name:
Grouping parameter: {8bd029a1-b7da-48ed-bf89-0094a5961dd0}
Session identifier: {0.0.0.00000000}.{2cbea9df-ca93-4e17-9708-e14139fd7044}|\Device\HarddiskVolume1\Windows\System32\WWAHost.exe%b{00000000-0000-0000-0000-000000000000}
Session instance identifier: {0.0.0.00000000}.{2cbea9df-ca93-4e17-9708-e14139fd7044}|\Device\HarddiskVolume1\Windows\System32\WWAHost.exe%b{00000000-0000-0000-0000-000000000000}|1%b220
Process ID: 220 (single-process)
System sounds session: no
Peak value: 0
Peak, channel 1 of 2: 0
Peak, channel 2 of 2: 0
Master volume: 1 (0 dB FS)
Not muted
Volume, channel #1 of 2: 100% (0 dB FS)
Volume, channel #2 of 2: 100% (0 dB FS)

Session #3 of 3
State: AudioSessionStateInactive (0)
Icon path: @%SystemRoot%\System32\AudioSrv.Dll,-203
Display name: @%SystemRoot%\System32\AudioSrv.Dll,-202
Grouping parameter: {f12ec33c-5337-4bed-8bb5-2114e7d91d23}
Session identifier: {0.0.0.00000000}.{2cbea9df-ca93-4e17-9708-e14139fd7044}|#%b{A9EF3FD9-4240-455E-A4D5-F2B3301887B2}
Session instance identifier: {0.0.0.00000000}.{2cbea9df-ca93-4e17-9708-e14139fd7044}|#%b{A9EF3FD9-4240-455E-A4D5-F2B3301887B2}|1%b#
Process ID: 0 (multi-process)
System sounds session: yes
Peak value: 0
ERROR: IAudioMeterInformation::GetMeteringChannelCount() reports zero channels
Master volume: 1 (0 dB FS)
Not muted
Volume, channel #1 of 2: 100% (0 dB FS)
Volume, channel #2 of 2: 100% (0 dB FS)
-- Devices of EDataFlow eCapture (1) --
No such devices.

EDIT September 22 2015: moved source to github https://github.com/mvaneerde/blog/tree/master/inactive-volume-sessions

• #### Taking audio glitch traces on Windows 10: desktop edition

Sometimes if audio is glitching we will reach out to people and ask them to take glitch traces so we can look at them and try to figure out what is going on.

One of the tools we use to take audio glitch traces on Windows 10 desktop editions, is Windows Performance Recorder; this is part of the Windows Performance Toolkit, which ships as part of the Windows Assessment and Deployment Kit (ADK) for Windows 10.

To take audio glitch traces on a Windows 10 desktop:

2. Go to the "Download kits and tools for Windows 10" page: https://msdn.microsoft.com/en-us/windows/hardware/dn913721%28v=vs.8.5%29.aspx
4. Run. Check the Windows Performance Toolkit feature; you don't need the other features to take audio glitch traces.
5. Start | Windows Performance Recorder - this will prompt for elevation
6. Choose the kind of tracing you want:
• First level triage | First level triage
• Scenario Analysis | Audio glitches
7. Changing Logging mode to "File"
8. Hit the "Start" button within the Windows Performance Recorder window

9. Do the thing that causes the glitches
10. Hit Save

At this point you have an .etl file. This can be analyzed using any tool that consumes .etl files, including:

EDIT 8/27/2015:

Windows Performance Toolkit also comes with a command-line tool: ;">wpr.exe. To capture the same traces using the command line tool:

• Start tracing:
wpr.exe -start GeneralProfile -start Audio -start enhanced-audio-glitches.wprp!MediaProfile -filemode
• Stop tracing and save the results to a file (say, my-wpr-glitches.etl:)
wpr.exe -stop my-wpr-glitches.etl
• (Optional) if you want to cancel tracing:
wpr.exe -cancel
• (Optional) if you want to see whether tracing is currently active:
wpr.exe -status
• #### Walking the IDeviceTopology tree to see audio driver settings

In particular, given an audio endpoint, you can map out all the knobs and widgets of all the signal paths that feed into that endpoint (for playback) or out of it (for recording.)

A version of this has been floating around the forums for some years, but I thought I would spiffy it up a little and make a blog post out of it.

Pseudocode:

for each endpoint exposed by IMMDeviceEnumerator:
figure out whether it's a playback endpoint or a recording endpoint
find the IPart which is
the very end of the signal path for a playback endpoint, or
the very beginning for a recording endpoint:
if the IPart advertises jack information, query it
if the IPart is a volume control, query the range and the current setting
if the IPart is a mute control, query the current setting
(we could continue here for various other kinds of controls, but I'm lazy)
for all parts which
feed in to the IPart for a playback endpoint, or
are fed by the IPart for a recording endpoint:
recursively process that IPart
if the IPart is a connector:
find the IPart on the other side of the connection
process it recursively
take care not to immediately hop back to this graph!

Output:

>devicetopology.exe
eRender endpoint
Name: Speakers (High Definition Audio Device)
Endpoint ID: {0.0.0.00000000}.{351933db-e9b5-41df-8fa5-69c3d84531df}
0x10000:
0x10001: Speakers
Jacks: 1
-- Jack 1 --
ChannelMapping: 0x3
Color: 0x00ff00 (red = 0, green = 255, blue = 0)
Connection Type: 1 (eConnType3Point5mm)
Geometric Location: 1 (eGeoLocRear)
General Location: 0 (eGenLocPrimaryBox)
Port Connection: 0 (ePortConnJack)
IsConnected: No
0x20001: Master Mute
0x20000: Speakers
0x10000:
{0.0.1.00000000}.{6dd94b4a-90d7-4ddc-926d-69312eb53841}
0x10000:

eRender endpoint
Name: Speakers (High Definition Audio Device)
Endpoint ID: {0.0.0.00000000}.{6ce8bdf4-d22f-4ec7-a007-2228540b6705}
0x10000:
0x10001: Speakers
Jacks: 1
-- Jack 1 --
ChannelMapping: 0x3
Color: 0x000000 (red = 0, green = 0, blue = 0)
Connection Type: 3 (eConnTypeAtapiInternal)
Geometric Location: 13 (eGeoLocATAPI)
General Location: 0 (eGenLocPrimaryBox)
Port Connection: 1 (ePortConnIntegratedDevice)
IsConnected: Yes
0x20001: Master Mute
Mute node: MUTED
0x20000: Speakers
Channel 0 volume, -46.5 dB to 0 dB in steps of 1.5 dB: -6 dB
0x10000:
0x10001:
0x20000: HD Audio line out
0x10000: PC Speaker

eRender endpoint
Name: Speakerphone (USB Audio Device)
Endpoint ID: {0.0.0.00000000}.{916afdff-2ba3-4a7f-bca9-0bf75c47c946}
0x10000:
0x10003: Speakerphone
0x20009: DAC
0x20008: Volume
Channel 0 volume, -30.0039 dB to 0 dB in steps of 0.5 dB: -17.747 dB
0x20007: Mute
Mute node: NOT MUTED
0x20006: Sample Rate Converter
0x10000:

eCapture endpoint
Name: Microphone (High Definition Audio Device)
Endpoint ID: {0.0.1.00000000}.{38178e43-ed21-4731-b8f0-1330c3b1cd0a}
0x10000:
0x10001: Microphone
Jacks: 1
-- Jack 1 --
ChannelMapping: 0x0
Color: 0xff80c0 (red = 255, green = 128, blue = 192)
Connection Type: 1 (eConnType3Point5mm)
Geometric Location: 1 (eGeoLocRear)
General Location: 0 (eGenLocPrimaryBox)
Port Connection: 0 (ePortConnJack)
IsConnected: No
0x20001: Microphone
0x20002: Microphone
0x20003: Microphone Boost
0x20000: Sum
0x10000:
{0.0.0.00000000}.{f7bf26d7-89a3-4049-8cd3-8148a6615565}
0x10000:

eCapture endpoint
Name: Speakerphone (USB Audio Device)
Endpoint ID: {0.0.1.00000000}.{a21f51bc-5d0b-47af-93b6-ef9a968bb8ff}
0x10000:
0x10002: Speakerphone
0x20001: Mute
Mute node: NOT MUTED
0x20002: Volume
Channel 0 volume, 0 dB to 32 dB in steps of 0.5 dB: 29 dB
0x20003: SuperMix
0x20004:
0x20005: Sample Rate Converter
0x10001:

eCapture endpoint
Name: Microphone (High Definition Audio Device)
Endpoint ID: {0.0.1.00000000}.{a9d04431-a99f-4ebe-b995-33f0d027559c}
0x10000:
0x10002: Microphone
Jacks: 1
-- Jack 1 --
ChannelMapping: 0x0
Color: 0x000000 (red = 0, green = 0, blue = 0)
Connection Type: 1 (eConnType3Point5mm)
Geometric Location: 2 (eGeoLocFront)
General Location: 0 (eGenLocPrimaryBox)
Port Connection: 0 (ePortConnJack)
IsConnected: No
0x20004: Microphone
0x20005: Microphone
0x20006: Microphone Boost
0x20000: Sum
0x10000:
{0.0.0.00000000}.{f7bf26d7-89a3-4049-8cd3-8148a6615565}
0x10000:

eCapture endpoint
Name: Line In (High Definition Audio Device)
0x10000:
0x10003: Line In
Jacks: 1
-- Jack 1 --
ChannelMapping: 0x0
Color: 0x0000ff (red = 0, green = 0, blue = 255)
Connection Type: 1 (eConnType3Point5mm)
Geometric Location: 1 (eGeoLocRear)
General Location: 0 (eGenLocPrimaryBox)
Port Connection: 0 (ePortConnJack)
IsConnected: No
0x20007: Line In
0x20008: Line In
0x20000: Sum
0x10000:
{0.0.0.00000000}.{f7bf26d7-89a3-4049-8cd3-8148a6615565}
0x10000:

And here's a screenshot of the Windows Driver Kit's Kernel Streaming Studio showing the topology exposed by the audio drivers on my machine:

Source and x86/amd64 binaries attached.

EDIT September 22 2015: moved source to github https://github.com/mvaneerde/blog/tree/master/devicetopology

• #### World Cup tiebreaks: how the United States can survive Group G

With World Cup Group G standings being the way they are, I decided to read up on tiebreaks.

Here are the FIFA rules (PDF)

The relevant sections are:

18.4.a: ... with three points for a win, one point for a draw and no points for a defeat (league format)

...

18.6: In the league format, the ranking in each group is determined as follows:

a) greatest number of points obtained in all group matches;
b) goal difference in all group matches;
c) greatest number of goals scored in all group matches.

If two or more teams are equal on the basis of the above three criteria, their rankings shall be determined as follows:

d) greatest number of points obtained in the group matches between the teams concerned;
e) goal difference resulting from the group matches between the teams concerned;
f) greater number of goals scored in all group matches between the teams concerned;
g) the goals scored away from home count double between the teams concerned (if the tie is only between two teams).

18.7 discusses play-off matches between teams that are still tied after all of 18.6.a-g are applied. As we'll see, there is no possibility of the United States being involved in a play-off.

Let's start with 18.6.a. The current point standings in Group G are:

Germany
Four points, with a win and a tie
United States
Four points, with a win and a tie
Ghana
One point, with a tie and a defeat
Portugal
One point, with a tie and a defeat

The remaining games are United States-Germany and Portugal-Ghana.

If the United States-Germany game results in a tie, then the United States and Germany will have five points apiece, with a win and two ties. They will be the top two teams in the group, regardless of the outcome of the Portugal-Ghana game.

If the Portugal-Ghana game results in a tie, then Ghana and Portugal will have two points apiece, with two ties and a defeat. Regardless of the results of the United States-Germany game, the United States and Germany will be the top two teams in the group.

So we need only consider what happens if both games are decisive.

The winner of the United States-Germany game will have seven points, with two wins and a tie. They are in on points.

The loser of the Portugal-Ghana game will have one point, with one tie and two defeats. They are out on points.

The remaining two teams (the loser of the United States-Germany game, and the winner of the Portugal-Ghana game) will have four points apiece, with a win, a tie, and a defeat.

So in this scenario, 18.6.b will be invoked. At this point we have to look at the goal difference for the teams. (The goal difference for, say, Germany is just the number of goals Germany scored, minus the number of goals Germany allowed.) These are currently:

Germany
+4
United States
+1
Ghana
-1
Portugal
-4

The raw numbers look good for Germany and the United States, and bad for Ghana and Portugal. But let's bear in mind that we are considering the scenario where Germany or the United States has just lost, and Ghana or Portugal has just won. How do you win? By scoring more goals than you allow! So the goal differential of the loser of the United States-Germany game will be lower than it is now, and the goal differential of the winner of the Portugal-Ghana game will be higher than it is now.

So as a United States fan:

1. Beat Germany and we're in on points.
2. If we can't beat Germany, tie Germany and we're in on points.
3. If Germany beats us, root for a Portugal-Ghana tie and we're in points.
4. If Germany beats us, and the Portugal-Ghana game is also decisive, root for Portugal to win, and for the margin of victory for both games to add up to no higher than four, and we're in on goal differential.
More explicitly:
1. If Germany wins by one goal, our GD will be zero; Portugal can win by as many as three goals and we will be in on goal differential. If Portugal wins by five or more, we're out on goal differential. If Portugal wins by precisely four, see below.
2. If Germany wins by two goals, our GD will be -1; Portugal can win by as many as two goals and we will be in on goal differential. If Portugal wins by four or more, we're out on goal differential. If Portugal wins by precisely three, see below.
3. If Germany wins by three goals, our GD will be -2; Portugal can win by one goal and we will be in on goal differential. If Portugal wins by three or more, we're out on goal differential. If Portugal wins by precisely two, see below.
4. If Germany wins by four goals, our GD will be -3. If Portugal wins by two or more, we're out on goal differential. If Portugal wins by precisely one, see below.
5. If Germany wins by five goals or more, we're out on goal differential.
5. If Germany beats us, and Ghana beats Portugal, root for the margin of victory in both games to add up to two.
1. If Germany wins by one goal, our GD will be zero; if Ghana wins by one goal, their GD will also be zero. See below.
2. If Germany wins by two goals or more, we're out on goal differential.

What if Germany beats us, and Portugal beats Ghana, but the margin of victory of both games adds up to exactly four? Or what if Germany beats us and Ghana beats Portugal, and the margin of victory of both games adds up to precisely two - that is, each game is decided by a single goal?

We can still get in on 18.6.c. At this point we have to look at the number of goals scored by each team. These are currently:

Germany
6
United States
4
Ghana
3
Portugal
2

Note that the United States has one more goal than Ghana, and two more goals than Portugal. This is good. In the "goal differential" case, there was a strong relationship between winning and changing your goal differential. But there is only a weak relationship between winning and changing your raw goal count.

For example, if Germany beats the United States 3-2, and Ghana beats Portugal 1-0, the United States will actually have pulled even farther ahead in the 18.6.c race!

So:

1. If Germany beats us, and Portugal beats Ghana, and the margin of victory of both games adds up to exactly four, root for Portugal's winning score to be no more than one goal higher than our losing score, and we're in on goals scored.
1. If Portugal's winning score is three or more goals higher than our losing score, we're out on goals scored.
2. If Portugal's winning score is precisely two goal higher than our losing score, see below.
2. If Germany beats us, and Ghana beats Portugal, and the margin of victory of both games adds up to exactly two, root for Ghana's winning score to be no more than our losing score, and we're in on goals scored.
1. If Ghana's winning score is two or more goals than our losing score, we're out on goals scored.
2. If Ghana's winning score is precisely one goal higher than our losing score, see below.

What if Germany beats us, and Portugal beats Ghana, and the margin of victory of the two games adds up to exactly four, and Portugal's winning score is precisely two goals more than our losing score? Or what if Germany beats us and Ghana beats Portugal, and the margin of victory of both games adds up to precisely two, and Ghana's winning score is precisely one goal higher than our losing score?

Then we are in or out on 18.6.d-g. At this point we have to look at the results of the Ghana-United States and United States-Portugal matches. In each match, the first team is the home team and the second team is the away team.

Ghana-United States
1-2
United States-Portugal
2-2

So:

1. If Germany beats us, and Portugal beats Ghana, and the margin of victory of both games adds up to exactly four, and Portugal's winning score is precisely two goals higher than our losing score, we're out:
1. We're still tied on direct match points (18.6.d)
2. We're still tied on direct match goal differential (18.6.e)
3. We're still tied on direct match goals scored (18.6.f)
4. We're out on direct match goals scored with away goals counting double (18.6.g)
2. If Germany beats us, and Ghana beats Portugal, and the margin of victory of both games adds up to exactly two, and Ghana's winning score is precisely one more than our losing score, then we're in on direct match points (18.6.d)

Note there is no possibility of a playoff. Had the United States-Portugal game been a 0-0 tie instead of a 2-2 tie, that would have been a possibility, since double 0 is still 0.

• #### shellproperty.exe - set/read string properties on a file from the command line

Yesterday Raymond Chen blogged a "Little Program" which could edit audio metadata. As it happens, I have a similar tool I threw together which accepts a property key and a string property value to update a property, or can read a string or string-vector property.

Usage:

>shellproperty
shellproperty set <key> to <string> on <filename>

Here's an example _fixup.bat script I use to set audio metadata on my copy of Giuseppe Sinopoli's recording of Madama Butterfly, to help distinguish it from other recordings of the same opera that I have.

@echo off
dir /s /b "I *.mp3" | xargs /addquotes shellproperty set PKEY_Music_AlbumTitle to "Madama Butterfly - Sinopoli / Freni: 1 of 3" on
dir /s /b "II *.mp3" | xargs /addquotes shellproperty set PKEY_Music_AlbumTitle to "Madama Butterfly - Sinopoli / Freni: 2 of 3" on
dir /s /b "III *.mp3" | xargs /addquotes shellproperty set PKEY_Music_AlbumTitle to "Madama Butterfly - Sinopoli / Freni: 3 of 3" on

Source and amd64/x86 binaries attached, but in substance it's very similar to Raymond's "Little Program".

Possible future improvements:

1. When setting, allow specifying a vartype on the command line.
2. Allow specifying a property key by fmtid and pid.
3. Handle more vartypes for displaying properties.
4. Allow dumping all properties on a given file.

EDIT September 22 2015: removed source and binaries as this is obsoleted by http://blogs.msdn.com/b/matthew_van_eerde/archive/2013/09/24/shellproperty-exe-v2-read-all-properties-on-a-file-set-properties-of-certain-non-vt-lpwstr-types.aspx

• #### Buffer size alignment and the audio period

I got an email from someone today, paraphrased below:

Q: When I set the sampling frequency to 48 kHz, and ask Windows what the audio period is, I get exactly 10 milliseconds. When I set it to 44.1 kHz, I get very slightly over 10 milliseconds: 10.1587 milliseconds, to be precise. Why?
A: Alignment.

A while back I talked a bit about the WASAPI exclusive-mode alignment dance. Some audio drivers have a requirement that they deal in buffer sizes which are multiples of a certain byte size - for example, a common alignment restriction for HD Audio hardware is 128 bytes.

A more general audio requirement is that buffer sizes be a multiple of the size of a PCM audio frame.
For example, suppose the audio format of a stream is stereo 16-bit integer. A single PCM audio frame will be 2 * 2 = 4 bytes. The first two bytes will be the 16-bit signed integer with the sample value for the left channel; the last two bytes will be the right channel.
As another example, suppose the audio format of a stream is 5.1 32-bit floating point. A single PCM audio frame will be 6 * 4 = 24 bytes. Each of the six channels are a four-byte IEEE floating-point value; the channel order in Windows will be {Left, Right, Center, Low-Frequency Effects, Side Left, Side Right}.

The audio engine tries to run at as close to a 10 millisecond cadence as possible, subject to the two restrictions above. Given a "desired minimum interval" (in milliseconds), and a streaming format, and an "alignment requirement" in bytes, you can calculate the closest achievable interval (without going under the desired interval) as follows:

Note: this only works for uncompressed formats
aligned_buffer(desired_milliseconds, format, alignment_bytes)
desired_frames = nearest_integer(desired_milliseconds / 1000.0 * format.nSamplesPerSec)
alignment_frames = least_common_multiple(alignment_bytes, format.nBlockAlign) / format.nBlockAlign
actual_frames = ceiling(desired_frames / alignment_frames) * alignment_frames
actual_milliseconds = actual_frames / format.nSamplesPerSec * 1000.0

Here's a table of the actual buffer size (in frames and milliseconds), given a few typical inputs:

 Desired (milliseconds) Format Alignment (bytes) Desired frames Alignment (frames) Actual (frames) Actual (milliseconds) 10 44.1 kHz stereo 16-bit integer 128 441 32 448 10.16 10 48 kHz stereo 16-bit integer 128 480 32 480 10 10 44.1 kHz 5.1 16-bit integer 128 441 32 448 10.16 10 48 kHz 5.1 16-bit integer 128 480 32 480 10 10 44.1 kHz 5.1 24-bit integer 128 441 64 448 10.16 10 48 kHz 5.1 24-bit integer 128 480 64 512 10.66

So to be really precise, the buffer size is actually 640/63 = 10.158730 milliseconds.

• #### How many numbers are straights?

As many people, I have to deal with a lot of numbers all the time (bug numbers, changelist numbers, tracking numbers, ...) and as a mathematician I sometimes notice when numbers have peculiar properties.

For example, I recently ran into 152463, which is a straight (made up of consecutive digits, not necessarily in order.)  I then ran into 35426, which is also a straight.  Is this odd?

To put it another way - what proportion of numbers are straights?

We consider only non-negative integer numbers {0, 1, 2, ... }.  No duplicate digits are allowed, so the very last straight is 9,876,543,210.

Break it down by the length of the number in digits.  For example, how many four-digit numbers are there?  We can choose any number from 1-9 for the first digit, and any number from 0-9 for each of the other three digits; so there are 9 * 103 numbers of length 4.

How many of these are straights?  There are seven ways to choose a set of four consecutive digits to make up a four-digit straight: {0,1, 2, 3}, {1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6}, {4, 5, 6, 7}, {5, 6, 7, 8}, {6, 7, 8, 9}.

Each of these seven ways can then be shuffled to make the straights themselves.  For six of these seven sets of digits, all possible shufflings are allowed; there are 4! shufflings for each set.  This gives 6 * 4! straights.

But for {0, 1, 2, 3}, we have to be careful not to end up with 0 as a leading digit.  We choose the leading digit first out of the set {1, 2, 3} (there are three ways to do this) and then we have complete freedom to shuffle the three remaining digits (including 0.)  This gives 3 * 3! straights.

So there are 6 * 4! + 3 * 3! straights which are four digits long.

Considering all possible lengths from 1 to 10 (we'll consider 0 as a number of length 1) gives the following table:

 Length Numbers Straights % Straights 1 10 10 100.000% 2 9 * 101 17 18.889% 3 9 * 102 46 5.111% 4 9 * 103 162 1.800% 5 9 * 104 696 0.773% 6 9 * 105 3480 0.387% 7 9 * 106 19,440 0.216% 8 9 * 107 115,920 0.129% 9 9 * 108 685,440 0.076% 10 9 * 109 3,265,920 0.036% Total 1010 40,91,131 0.041%

Since roughly 1% of five-digit numbers are straights, it is not surprising that I see them that often.

• #### Implementing a "listen" command using ISpRecoContext from the Microsoft Speech API

Earlier today I posted a quick "say.exe" sample app which you give text and it speaks the text aloud using the text-to-speech part of the Windows Speech API.  It was very straightforward - only 67 lines of C++ code.

It took me a little longer to figure out how to do this "listen.exe" sample app; you run it, speak into the microphone, and it uses the speech-to-text part of the Windows Speech API to print what you're saying to the console.  This is a little more involved: 202 lines of C++ code.

Pseudocode:

CoInitialize()
CoCreateInstance(ISpRecoContext)
pSpRecoContext->SetInterest(recognition events only, thanks)
pSpRecoContext->CreateGrammar()
pSpRecoGrammar->SetDictationState(active)
while(...) {
wait for a speech event (or the user to press Enter)
pSpRecoContext->GetEvents()
for each speech event {
make sure SPEVENT.eEventId is SPEI_RECOGNITION
event.lParam is an ISpRecoResult
pSpRecoResult->GetText()
print the text
}
}

Usage:

>listen.exe
Speak into the microphone naturally; I will print what I understand.
Press ENTER to quit.
(At this point you start talking into the microphone. Text shows up here shortly after you say it.)

Source and binaries attached.

EDIT September 22 2015: removed source and binaries as this is obsoleted by http://blogs.msdn.com/b/matthew_van_eerde/archive/2014/07/11/using-the-speech-api-to-convert-speech-to-text.aspx

Page 2 of 6 (149 items) 12345»