I've avoided writing about this because it's "complicated", but people are starting to ask questions that indicate that they're confused so here goes. It's going to take several posts to cover this, so please bear with me.
Simply stated, volume is a measurement of the “loudness” of a sound (to a physicist or audio engineer, the answer is MUCH more complicated than that). There are lots of ways to measure volume; one common unit is the decibel (dB), a logarithmic unit that is often used to express the sound pressure level (SPL) emitted by a speaker.
In general, when discussing volume, there are two terms typically used – attenuation and gain. Attenuation represents a reduction in the amplitude of an audio signal; gain represents an amplification of that signal. If you look at a pro audio receiver, you’ll notice that the receiver represents its loudness as a negative number of decibels (-20dB, for example). This indicates that the receiver is attenuating the input signal by 20dB. By tradition, attenuation is measured in negative decibels and amplification is measured in positive decibels.
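To make the decibel figures concrete, here is a minimal sketch of converting between decibels and a linear multiplier for sample amplitudes. It assumes the usual amplitude convention (20 times the base-10 logarithm of the ratio); the function names are made up for illustration and are not part of any Windows API.

```python
import math

def db_to_amplitude_ratio(db: float) -> float:
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)

def amplitude_ratio_to_db(ratio: float) -> float:
    """Convert a positive linear amplitude multiplier to decibels."""
    return 20.0 * math.log10(ratio)

# Negative values attenuate, positive values amplify, 0 dB leaves samples alone.
for db in (-20.0, -6.0, 0.0, 6.0):
    print(f"{db:+.0f} dB -> multiply samples by {db_to_amplitude_ratio(db):.3f}")
```

Under this convention, a receiver showing -20dB multiplies the signal amplitude by 0.1, and -6dB is very close to (but not exactly) one half.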
Audio signals flow through an electrical path and at different points in the path, there are opportunities to either amplify or attenuate the signal – the locations at which the amplification or attenuation occur are called “gain stages”, and they can occur in either analog or digital signals (an amplifier represents a gain stage, as does a potentiometer).
When converting an analog signal to digital, the analog signal is sampled – the system measures the amplitude of the signal at a fairly high rate (44,100 samples per second for CD audio), then converts each measurement to a digital value (a 16-bit integer for CD, or potentially a 32-bit floating point value). In both cases, there is a reference range of legal samples – for this example, let’s assume that values range from -1.0 to 1.0 (it makes things easier).
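As a rough illustration of sampling and conversion, the sketch below generates samples of a sine wave at the CD rate and converts between the -1.0 to 1.0 range and 16-bit integers. The helper names are hypothetical, and scaling by 32767 is one common convention that I am assuming here (some systems scale by 32768 instead).

```python
import math

SAMPLE_RATE = 44_100  # samples per second for CD audio

def sample_sine(freq_hz: float, n_samples: int, amplitude: float = 1.0) -> list:
    """Return n_samples of a sine wave as floats in [-amplitude, amplitude]."""
    step = 2.0 * math.pi * freq_hz / SAMPLE_RATE
    return [amplitude * math.sin(step * n) for n in range(n_samples)]

def float_to_int16(x: float) -> int:
    """Map a float in [-1.0, 1.0] to a 16-bit integer sample (assumed scale: 32767)."""
    clamped = max(-1.0, min(1.0, x))
    return int(round(clamped * 32767))

def int16_to_float(sample: int) -> float:
    """Map a 16-bit integer sample back to the -1.0 to 1.0 range."""
    return sample / 32767.0

samples = sample_sine(440.0, 8)
print([float_to_int16(s) for s in samples])
```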
Consider the following waveform:
When the sample is digitized, it is converted to individual samples like below:
Attenuation simply reduces the amplitude of the digital samples, and amplification simply increases it.
For the reference sample, if you attenuate the sample by 50%, you get something like this:
Note that the waveform hasn’t changed shape, it’s just smaller.
If, on the other hand, you amplify the same signal by 50%, you get:
Note that the samples that went beyond the +1 and -1 range were “clipped” – the samples can’t be represented digitally, so they were cut off. This clipping is very bad, and causes significant audio distortions. The new waveform doesn't really reflect the original waveform.
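Here is a small sketch of the same idea in code, assuming samples in the -1.0 to 1.0 range and a made-up eight-sample waveform (not the one from the figures). The apply_gain helper is hypothetical; it scales each sample and then clips anything outside the legal range.

```python
def apply_gain(samples, gain):
    """Scale each sample by gain, then clip the result to [-1.0, 1.0]."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]

# An invented waveform with peaks at +/-0.9.
wave = [0.0, 0.5, 0.9, 0.5, 0.0, -0.5, -0.9, -0.5]

quieter = apply_gain(wave, 0.5)  # attenuate by 50%: same shape, just smaller
louder = apply_gain(wave, 1.5)   # amplify by 50%: the peaks would be +/-1.35

print(quieter)
print(louder)
```

In the amplified result the peaks are pinned at +1.0 and -1.0, so the flattened tops no longer follow the original waveform.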
In addition, if a fixed point digital signal is attenuated and then later digitally amplified, the signal resolution will be degraded – if, for example, you apply a -6dB attenuation (which reduces the amplitude by roughly 50%), you divide each of the samples in half (32767 becomes 16383, because the fractional part is discarded). If you later amplify the signal digitally by the same amount, you get 32766 – and thus you’ve lost some of the original signal information.
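The arithmetic can be sketched as follows, treating -6dB as exactly one half (as the example in the text does) and assuming the integer division truncates toward zero. The function names are illustrative only.

```python
def halve_int16(sample: int) -> int:
    """Attenuate an integer sample by half, discarding the fractional part."""
    return int(sample / 2)  # truncates toward zero (an assumption of this sketch)

def double_int16(sample: int) -> int:
    """Amplify an integer sample by two, clamped to the 16-bit range."""
    return max(-32768, min(32767, sample * 2))

def round_trip_int16(sample: int) -> int:
    """Attenuate by half, then amplify by two."""
    return double_int16(halve_int16(sample))

print(halve_int16(32767))       # 16383
print(round_trip_int16(32767))  # 32766
```

Every odd sample value loses its lowest bit on the way through, which is the lost information described above.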
If, on the other hand, you’re using a floating point digital signal, you can attenuate and amplify with less worry – if you apply the same -6dB attenuation and amplification to a floating point sample, the division and multiplication cancel out (0.75 becomes 0.375, then becomes 0.75 again).
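A quick way to check this is the sketch below, which simulates 32-bit floats with Python's struct module (Python's own floats are 64-bit). The helper names are made up for illustration. Note that halving and doubling are exact here because they are powers of two; other gain values can still round, just with a far smaller error than 16-bit integers.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def round_trip_float32(sample: float) -> float:
    """Halve a 32-bit float sample, then double it again."""
    attenuated = to_float32(to_float32(sample) * 0.5)
    return to_float32(attenuated * 2.0)

print(round_trip_float32(0.75))  # 0.75
```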
This is a large part of the reason the audio pipeline was converted to floating point in Vista – a floating point pipeline allows significantly more resolution and higher accuracy when manipulating the digital samples.
Btw, this loss of fidelity doesn’t happen when amplifying analog signals. That’s why it’s important that any amplification be done using analog signals, not digital signals – digital amplification is always lossy.
For audio hardware in Windows, the audio driver specification requires that, for all hardware volume controls on the system, 0dB represent a full fidelity pass-through of audio samples – audio hardware can support either amplification or attenuation (or neither), but 0dB always represents “don’t change the samples”.
Please note that some audio hardware on the market does NOT follow this recommendation. We’ve seen audio devices that support a volume range of +0dB to +96dB. We’ve also seen devices that support volume ranges of +10dB to +60dB (mostly these are microphones).
Ok, so much for the basics on "volume". Tomorrow I'll start discussing how volume works in the audio engine on Vista.
Very interesting! Yes, more on audio please:)
Yes, I would like to hear much more about this subject.
Cool plots. Not simplistic, not too complicated. I'm betting it's not Excel. :-)
The graphs were built by a contractor working for one of the DSP architects, they were done using Octave (http://www.gnu.org/software/octave/).
Isn't -3dB equivalent to 50% volume reduction:-
10 * log 0.5 = -3.01
You use -6dB in your examples. Am I missing something?
3dB is a 50% POWER reduction, not a 50% volume reduction. 6dB is a 50% volume reduction.
Recorded level difference is 20 log x, total sound pressure is 10 log x. That's because total sound pressure is measured from the positive to the negative peak, whereas recording level is measured from the 0 to peak.
Question, why choose float instead of 32-bit audio? Either would be as good as the other, so the slightly more common was chosen? Or is it 64-bit float?
Is that 0DB meaning +0DB amplification? i.e. it wouldn't correspond to muting the sound, which is what first jumped into my head?
0dB means no attenuation and no amplification, or 100% volume.
Practically, -96dB == silence.
Really cool! I went back to my old days in polytechnic :)
A picture's worth a thousand words. The graphs helped for better understanding. Keep the drift on with full tank "Audio Diesel".
To expand on foxyshadis' point, the problems attributed to fixed-point representations are not really problems with fixed point so much as simply insufficient range, precision, and care in rounding intermediate results. Use 22.10 fixed point with proper rounding, and your 16-bit audio will come through fine. On the other hand, introduce a temporary DC bias of 2^20 in your floating-point mixing, and you'll start running into precision problems again, because floating point trades off precision bits in the mantissa for range bits in the exponent, and that trade-off isn't always in your favor.
Now, this isn't to say that using hardware floating point is a bad idea, because you get graceful degradation around range/precision problems as well as proper rounding a lot faster than you could otherwise in fixed point. On DSP hardware with extra-wide accumulators and free rounding, though, you'd find fixed point arithmetic very competitive.
BTW, loss of fidelity doesn't happen when amplifying analog signals? What about noise?
>> why choose float instead of 32-bit audio
It all depends on your standard. x-bit integers in audio and graphics are usually used as a fixed point standard ranging from 0 to 1 (or -1 to 1, depending on the context). You have a uniform distribution of values in the range.
In floating point, instead, you have a separate mantissa and exponent (and sign, but that is not exactly a factor). This allows for a uniform distribution (using somewhat fewer bits, of course) in the desired range (-1 to 1). But it also allows the numbers to grow unpredictably (and exponentially) beyond the expected range and come back (and vice versa).
If you want to "see" it in effect, take any last generation game and see the difference in colors when floating point surfaces are enabled and when they are not. Even with both at 16 bits (integer or floating point), the floating point result is much better because it handles all the lighting calculations (which are mostly muls and divs) with almost no loss.
@foxyshadis: Question, why choose float instead of 32-bit audio? Either would be as good as the other, so the slightly more common was chosen? Or is it 64-bit float?
32-bit is just a bit overkill (http://blogs.msdn.com/audiofool/archive/2006/08/23/715608.aspx).
For any reasonable audio system, you really don't need more than about 20 bits. The IEEE float32 format offers 24 bit audio (23 mantissa bits + 1 sign bit) with an added 8-bit sliding gain (exponent) that is mainly unused.
What kind of ramifications does having a floating point audio path have when using a Pentium 4?
I know from various DSP forums (link below) that Pentium 4s, the older ones at least, experience extremely significant slowdowns when it comes to denormalized floats, i.e., really really small ones, to the point where the solution is generally to either cut to zero below a certain point, or add a constant noise or DC bias to the signal to prevent it from reaching the denormalized levels.
So, does Vista use SSE/SSE2 to avoid this? Do you add noise/bias/offset? Do you hope it never happens? Or is it classified :D
Here's a relevant link, which references the popular MusicDSP group, Intel, and several others: