Yesterday's post by Raymond struck a chord when I read it. He discussed how the property sheets wrote to resource sections and how the kernel would patch up a resource section by making it read/write when the resource was written to.
It turns out that this bit of functionality in the kernel (actually in ntdll.dll, fwiw) was removed in Vista. How do I know this? Well, it turns out that the PlaySound API depended on this functionality to play sounds from resources on 64bit machines.
And how did I know THAT? Well, the Windows startup tone that's played during boot is a resource that's embedded in one of the system DLLs, and is played with the SND_RESOURCE flag. You see, on 64bit machines, if the wave format structure within the .WAV file is aligned incorrectly (on a 16 bit boundary, for example), the PlaySound API will move the wave format structure around until it is properly aligned. This is safe to do, because it's normally operating on an allocated in-memory copy of the WAV file (when you specify a filename). But when you're playing from a resource, it moves the data around inside the resource itself - again, not a problem in terms of space, because there's enough padding within the resource so that this is ok.
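To make the alignment problem concrete, here is a minimal Python sketch that walks the chunks of a RIFF/WAVE blob and reports where the 'fmt ' payload (the wave format structure) starts. This is not the PlaySound code; the helper names `find_fmt_payload`, `is_aligned` and `build_wav` are made up for illustration, and it only looks at the offset within the blob, not at the address the resource is mapped at.

```python
import struct


def find_fmt_payload(data: bytes) -> int:
    """Return the byte offset of the 'fmt ' chunk payload in a RIFF/WAVE blob."""
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE blob")
    pos = 12  # first chunk header follows the 12-byte RIFF/WAVE header
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        if chunk_id == b"fmt ":
            return pos + 8
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    raise ValueError("no 'fmt ' chunk found")


def is_aligned(offset: int, alignment: int) -> bool:
    """True if offset is a multiple of alignment (in bytes)."""
    return offset % alignment == 0


def build_wav(extra_chunk_payload: bytes = b"") -> bytes:
    """Build a tiny WAV blob (hypothetical test data), optionally with a
    chunk placed before 'fmt ' to shift the format structure."""
    fmt = struct.pack("<HHIIHH", 1, 1, 8000, 8000, 1, 8)  # PCM, mono, 8 kHz, 8-bit
    body = b"WAVE"
    if extra_chunk_payload:
        pad = b"\x00" if len(extra_chunk_payload) & 1 else b""
        body += b"JUNK" + struct.pack("<I", len(extra_chunk_payload))
        body += extra_chunk_payload + pad
    body += b"fmt " + struct.pack("<I", len(fmt)) + fmt
    body += b"data" + struct.pack("<I", 0)
    return b"RIFF" + struct.pack("<I", len(body)) + body


plain = build_wav()
shifted = build_wav(b"\x00" * 6)
print(find_fmt_payload(plain), is_aligned(find_fmt_payload(plain), 4))
print(find_fmt_payload(shifted), is_aligned(find_fmt_payload(shifted), 4))
```

In the second blob a 6-byte chunk ahead of 'fmt ' leaves the format structure on a 2-byte boundary only, which is the kind of layout that would need to be moved before it could be read as a naturally aligned structure.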
However, when they removed the code to automatically mark touched resource pages as read/write, this code that moved the memory around started access violating. We learned about it when every single 64bit machine in the core OS division started crashing on boot if you had a kernel debugger installed.
Fortunately we were able to track down the change relatively quickly and the guys who made the change fixed it (they moved the logic that fixed up resources into our code). While they were at it, they also made the same change to the property sheet logic, because it turns out that the problem that Raymond mentioned is still a real issue - there are still apps with broken property sheets that require modifying resources on the fly. Sigh.
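The before-and-after behavior can be modeled with a toy Python sketch. It is only an analogy: a read-only `memoryview` stands in for a read-only resource page, Python's `TypeError` stands in for the access violation, and swapping in a writable copy stands in for making the page read/write. `ResourcePage` and `write_byte` are hypothetical names, not Windows APIs.

```python
class ResourcePage:
    """Toy stand-in for a resource page that starts out read-only."""

    def __init__(self, data: bytes):
        # A memoryview over immutable bytes rejects writes.
        self.view = memoryview(data)


def write_byte(page: ResourcePage, offset: int, value: int, fix_up_on_fault: bool) -> None:
    """Write one byte; optionally patch the page up when the write faults."""
    try:
        page.view[offset] = value
    except TypeError:
        if not fix_up_on_fault:
            raise  # models the behavior after the automatic fix-up was removed
        # Models the fix-up: replace the page with a private writable copy, then retry.
        page.view = memoryview(bytearray(page.view))
        page.view[offset] = value


page = ResourcePage(b"RIFF")
write_byte(page, 0, ord("r"), fix_up_on_fault=True)
print(bytes(page.view))
```

With `fix_up_on_fault=False` the same write propagates the exception, which is roughly the position the resource-moving code found itself in once nothing was patching the page up on its behalf.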
Heh, I was waiting for this post ever since I saw your comment on Raymond's blog ;)
> there are still apps with broken property sheets that require modifying resources on the fly.
I guess that's because nobody noticed they had their styles wrong, since the property sheet manager always fixed it up for them...
"But when you're playing from a resource, it copies the data around inside the resource"
What happens if you call PlaySound from two threads simultaneously with the same resource name? Will this potentially corrupt the resource, or does it use a critical section to protect against that?
"I guess that's because nobody noticed they had their styles wrong, since the property sheet manager always fixed it up for them..."
I had the same problem when I left home and went to college. "Hey, why aren't my dirty clothes being automatically picked up and washed by the mystery faerie who visits my room?"
> the kernel would patch up a resource section by making it
> read/write when the resource was written to [...]
Do I understand this properly? Before Vista, when an application tried to write to a read-only page, the question of whether to create a writeable copy or to fire an access violation was decided by the kernel not by user mode stuff. And the criterion on which it decided was whether the page was in a resource section or some other kind of section. Really, the kernel was given this responsibility of figuring out what kind of read-only application page that the application was stepping on?
> when they removed the code to automatically mark touched
> resource pages as read/write this code that moved the
> memory around started access violating [...] every single
> 64bit machine in the core OS division started crashing on boot
> if you had a kernel debugger installed.
So access violations didn't cause crashes when kernel debuggers weren't installed?
I'm all in favour of minimizing crashes in some ways. For example if a sound driver crashes then it's better to let the rest of the OS keep working long enough for the user to save their files to disk instead of BSODing immediately, and let the user reboot when they want their sound back. But when access violations were ignored, what actually were the effects?
Nah, it was all user mode stuff that decided. Basically the "patch up the page" logic is done in the unhandled exception filter in ntdll (this is also the code that handles guard page exceptions).
And the "no exceptions without kernel debuggers" was just me not fully explaining what was going on (because I figured it wasn't important). What actually happened is that a process early in system boot crashed. With a kernel debugger attached, the process broke into the debugger. Without the kernel debugger, the process just broke into Watson and generated a crash dump (which was then reported to MS by the Windows Error Reporting facility). We noticed the KD crashes first.