Bill Littlefield of NPR's sports program Only a Game interviews Susan Warren about competitive pumpkin-growing. An excerpt from her book Backyard Giants was printed in The Wall Street Journal: The Race to Break the Squash Barrier, the quest to grow a one-ton pumpkin. I'm fascinated by these subcultures of people obsessed with one thing.
Prerequisite: Understanding what __purecall means.
I was asked to help diagnose an issue in which a program managed to stumble into the __purecall function.
XYZ!_purecall:
00a14509 a100000000 mov eax,dword ptr ds:[00000000h] ds:0023:00000000=????????
The stack at the point of failure looked like this:
XYZ!_purecall
XYZ!CViewFrame::SetFrame+0x14d
XYZ!CViewFrame::SetPresentation+0x355
XYZ!CViewFrame::BeginView+0x1fe
The line at XYZ!CViewFrame::SetFrame that called the mystic __purecall was a simple AddRef:
pSomething->AddRef(); // crashes in __purecall
From what we know of __purecall, this means that somebody called into a virtual method on a derived class after the derived class's destructor has run. Okay, well, let's see if we can find the object in question. Since the method being called is a COM method, the __stdcall calling convention applies, which means that the this pointer is on the stack.
0:023> dd esp+4 l1
0529f76c  06a88d58
Using our knowledge of the layout of a COM object, we can navigate through memory to find the vtable.
0:023> dps 06a88d58
06a88d58  009b2eac XYZ!CRegistrationSink::`vftable'
06a88d5c  06b20058
06a88d60  00000002
06a88d64  00998930 XYZ!CObjectWithBrush::`vftable'
06a88d68  00000000
06a88d6c  009c9c80 XYZ!CBrowseSite::`vftable'
06a88d70  009c9c70 XYZ!CBrowseSite::`vftable'
06a88d74  00000000
....
0:023> dps 009b2eac
009b2eac  00a14509 XYZ!_purecall // virtual QueryInterface() = 0
009b2eb0  00a14509 XYZ!_purecall // virtual AddRef() = 0
009b2eb4  00a14509 XYZ!_purecall // virtual Release() = 0
009b2eb8  009cb1e4 XYZ!CRegistrationSink::Register
009b2ebc  009b3d2d XYZ!CRegistrationSink::Unregister
We see that the object has been destructed down to the CRegistrationSink base class, and the attempt to increment its reference count has led us into the abyss of __purecall.
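If the "destructed down to the base class" business feels abstract, here is a minimal sketch of the mechanism in portable C++. The names Base, Derived, and DestroyAndObserve are made up for illustration and have nothing to do with the XYZ code. The sketch uses an ordinary (non-pure) virtual method so that the behavior is well-defined; in the real crash the base class slot was pure, which is why the call landed in __purecall instead.

```cpp
#include <string>
#include <vector>

// Records what each destructor observes through virtual dispatch.
std::vector<std::string> g_log;

struct Base {
    virtual ~Base() {
        // By the time we get here, the Derived part is gone and the
        // object is a plain Base again, so Name() resolves to Base::Name.
        g_log.push_back(std::string("~Base sees ") + Name());
    }
    virtual const char* Name() const { return "Base"; }
};

struct Derived : Base {
    ~Derived() override {
        // The object is still a Derived while this destructor runs.
        g_log.push_back(std::string("~Derived sees ") + Name());
    }
    const char* Name() const override { return "Derived"; }
};

// Creates and destroys a Derived, returning what the destructors saw.
std::vector<std::string> DestroyAndObserve() {
    g_log.clear();
    { Derived d; }
    return g_log;
}
```

Had Name been declared pure in Base, the call from the Base destructor would have been the portable equivalent of what happened to pSomething->AddRef().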
But what was this object before it descended into madness?
Well, we know that the object was something derived from CRegistrationSink. And the other values in memory tell us that the object most likely also derived from CObjectWithBrush and CBrowseSite. Just for fun, here's the CObjectWithBrush vtable, to confirm that we destructed down to that point:
00998930  00a14509 XYZ!_purecall // virtual QueryInterface() = 0
00998934  00a14509 XYZ!_purecall // virtual AddRef() = 0
00998938  00a14509 XYZ!_purecall // virtual Release() = 0
0099893c  0099880d XYZ!CObjectWithBrush::SetBrush
00998940  00a319ee XYZ!CObjectWithBrush::GetBrush
00998944  00a13fd9 XYZ!CObjectWithBrush::`scalar deleting destructor'
Ooh, it looks like CObjectWithBrush has a virtual destructor. Probably to destroy the brush.
A check of the source code tells us that nobody derives from CBrowseSite, so that is almost certainly the original object type.
As a cross-check, we check whether what we have matches the memory layout of a CBrowseSite:
0:023> dt XYZ!CBrowseSite 06a88d58
   +0x000 __VFN_table : 0x009b2eac
   +0x004 m_prgreg    : 0x06b20058 Registration
   +0x008 m_creg      : 2
   +0x00c __VFN_table : 0x00998930
   +0x010 m_hbr       : (null)
   +0x014 __VFN_table : 0x009c9c80
   +0x018 __VFN_table : 0x009c9c70
   +0x01c m_cRef      : 0
Looks not unreasonable. (Well, aside from the fact that we have a bug...) The object has most likely begun its destruction because its reference count (m_cRef) went to zero.
At this point, there was enough information to ask the developers responsible for CViewFrame and CBrowseSite to work out how the CViewFrame ended up running around with a pointer to an object that has already been destructed.
If you work with owner-data listviews, you take the responsibility for managing the data associated with each item in the list view. The list view control itself only knows how many items there are; when it needs information about an item, it asks you for it. It's the fancy name for a "virtual list view" control.
When you use an ownerdata list view, you will receive a new notification, LVN_ODSTATECHANGED. The OD stands for ownerdata, so this is an "owner data state changed" notification. The list view sends this notification when the state of one or more items in an owner data list view control changes simultaneously. Mind you, the list view control can also send the LVN_ITEMCHANGED notification if the state of an item changes, so you need to be on the lookout for both.
If there is a notification LVN_ITEMCHANGED, then what's the purpose of the LVN_ODSTATECHANGED message? It's redundant, after all.
Well yes, it's redundant, but it's faster, too. The LVN_ODSTATECHANGED notification tells you that the state of all items in the specified range has changed. It's a shorthand for sending an individual LVN_ITEMCHANGED for all items in the range [iFrom..iTo]. If you have an ownerdata list view with 500,000 items and somebody does a select-all, you'll be glad that you get a single LVN_ODSTATECHANGED notification with iFrom=0 and iTo=499999 instead of a half million individual little LVN_ITEMCHANGED notifications.
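Here is a rough sketch of what "being on the lookout for both" might look like on the application side. To keep it compilable anywhere, it does not include the Windows headers: ItemChanged and OdStateChanged are simplified stand-ins for the real notification structures (only the fields used here are modeled), SelectionModel is a hypothetical class, and kSelected merely mirrors the value of LVIS_SELECTED. Treat it as an illustration of the idea, not as a drop-in handler.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Simplified stand-ins for the notification payloads (assumption:
// only the fields this sketch needs are modeled).
struct ItemChanged    { int iItem; unsigned uNewState; };
struct OdStateChanged { int iFrom; int iTo; unsigned uNewState; };

constexpr unsigned kSelected = 0x0002; // same value as LVIS_SELECTED

// Hypothetical application-side record of which items are selected.
class SelectionModel {
public:
    explicit SelectionModel(std::size_t count) : selected_(count, false) {}

    // One item changed state.
    void OnItemChanged(const ItemChanged& n) {
        selected_[static_cast<std::size_t>(n.iItem)] =
            (n.uNewState & kSelected) != 0;
    }

    // Every item in the inclusive range [iFrom..iTo] changed state.
    void OnOdStateChanged(const OdStateChanged& n) {
        const bool isSelected = (n.uNewState & kSelected) != 0;
        for (int i = n.iFrom; i <= n.iTo; ++i) {
            selected_[static_cast<std::size_t>(i)] = isSelected;
        }
    }

    std::size_t CountSelected() const {
        return static_cast<std::size_t>(
            std::count(selected_.begin(), selected_.end(), true));
    }

private:
    std::vector<bool> selected_;
};
```

A select-all on a 500,000-item model is then a single OnOdStateChanged call with iFrom=0 and iTo=499999, and a later click that deselects one item arrives through OnItemChanged.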
A customer wanted to know if there was a way for their application to invoke the Aero Peek feature so that their window appeared and all the other windows on the system turned transparent.
No, there is no such programmatic interface exposed. Aero Peek is a feature for the user to invoke, not a feature for applications to invoke so they can draw attention to themselves.
Yes, I realize you wrote a program so awesome that all other programs pale in comparison, and that part of your mission is to make all the other programs literally pale in comparison to your program.
Sorry.
Maybe you can meet up with that other program that is the most awesome program in the history of the universe and share your sorrows over a beer.
Many years ago, Microsoft produced a collection of interactive toys called ActiMates, and one of the features was that television programs could broadcast an encoded signal which would enable the toy to interact with the program. The idea would be that the Barney doll would do something that was coordinated with what was happening on Barney & Friends.
When this came out, a bunch of us wondered what it would take to hack into the device and get Barney to say and do, um, very un-Barneyish things. One of us managed to get a schematic for the device, but since none of us was an electrical engineer, that pretty much dead-ended the project.
Over ten years later, I learned that we weren't the only people to get that idea. I met someone who told me that he managed to get his hands on the internal devkit for the ActiMates series and control a Barney doll from his PC. Not satisfied with being limited to the built-in Barney phrases, he was able to "take additional creative steps with the devkit" to stream his own replacement audio to the device (although he was never able to get the sound quality of his streamed audio to sound as good as the built-in phrases). As a result, he could make Barney say whatever he wanted, and if he really felt like it, he could wake up all the Barney toys in his apartment complex at midnight and give orders to his robot army of purple dinosaurs.
The catch was that his robot army most likely would have consisted of just one robot.
Bonus reading: SWEETPEA: Software Tools for Programmable Embodied Agents [pdf], Michael Kaminsky, Paul Dourish, W. Keith Edwards, Anthony LaMarca, Michael Salisbury and Ian Smith, CHI'99.
On Friday, the marketing folks informed me that they decided to put me on the Microsoft Careers United States home page in recognition of Windows 7's first birthday. It's an honor and to be honest a bit scary to be chosen to be the face of Windows on a day of such significance. (They told me that they had narrowed it down to me and "some Director of Test". Sorry, Director of Test; maybe they'll pick you for Windows 7's second birthday.)
I think my picture is still there (they didn't tell me how long it was going to be up), but here's a screen capture just to prove it to my relatives:
(Thank goodness they cropped out my withered hand.)
I wondered what would happen if I clicked Find jobs like mine. What did they consider to be jobs like mine? Alas, it just takes you to the job search page with no criteria filled in. Maybe every job at Microsoft is like mine?
One of my colleagues teased me, "Did you really legally change your last name to Windows?"
Consider this code fragment:
void foo()
{
  while (true) {
    bar();
    baz();
  }
}
When foo calls bar(), and bar has not yet returned, does foo continue executing? Does baz get called before bar returns?
No, it does not.
The basic structure of the C/C++ language imposes sequential execution. Control does not return to the foo function until bar returns control, either by reaching the end of the function or by an explicit return.
Commenter Norman Diamond asks a bunch of questions, but they're all mooted by the first:
I can't find any of the answers in MSDN, and even an answer to one doesn't make answers to others obvious.
Unless failures occur, the DialogBox function doesn't return until the new dialog's DialogProc calls EndDialog. It starts its own message loop. During this time the hwndParent (i.e. owner not parent) window is disabled. However, disabling doesn't prevent delivery of some kinds of messages to the parent window's WindowProc or DialogProc, and doesn't prevent delivery of any messages to the application's main message loop, right? So aren't there two or more message loops running in parallel?
As long as the function DialogBox has not yet returned, control does not return to the application's main message loop, since it is the one which called DialogBox (most likely indirectly).
MSDN doesn't explain this because it is a fundamental property of the C and C++ languages and is not peculiar to Win32.
Disabling a window does not prevent it from receiving messages in general; it only disables mouse and keyboard input. This is called out in the opening sentence of the EnableWindow function documentation:
The EnableWindow function enables or disables mouse and keyboard input to the specified window or control.
Messages unrelated to mouse and keyboard input are delivered normally. And they aren't dispatched by the application's main message loop because, as we saw above, the main message loop isn't executing!
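A toy model may make the point concrete. The sketch below is plain C++ with no Win32 in it: Pump, RunMain, and RunModal are invented names, and the strings standing in for messages are made up. RunModal plays the role of the loop inside DialogBox. The only thing it is meant to show is that while the nested loop is running, the outer loop is stuck in the middle of a function call and dispatches nothing.

```cpp
#include <deque>
#include <string>
#include <vector>

// Toy single-threaded message pump (all names hypothetical).
struct Pump {
    std::deque<std::string> queue;  // pending "messages"
    std::vector<std::string> log;   // which loop dispatched what

    // Stand-in for the loop that runs inside DialogBox.
    void RunModal() {
        while (!queue.empty()) {
            std::string msg = queue.front();
            queue.pop_front();
            if (msg == "end-dialog") return; // like EndDialog
            log.push_back("modal:" + msg);
        }
    }

    // Stand-in for the application's main message loop.
    void RunMain() {
        while (!queue.empty()) {
            std::string msg = queue.front();
            queue.pop_front();
            if (msg == "open-dialog") {
                RunModal();   // control does not come back here
                continue;     // until RunModal returns
            }
            log.push_back("main:" + msg);
        }
    }
};
```

Feed it paint, open-dialog, timer, paint, end-dialog, timer and the log shows the two middle messages dispatched by the modal loop, with the main loop resuming only after the dialog ends. One loop at a time, never two in parallel.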
I would recommend reviewing a book that covers the basics of Win32 GUI programming, since there appear to be some fundamental misunderstandings. Since I try to target an advanced audience, I generally assume that everybody understands the basics and is ready to move on to the intermediate and advanced topics. If you have trouble with the basics, you should work on that part first.
We finish our tour of the evolution of the ICO file format with the introduction of PNG-compressed images in Windows Vista.
The natural way of introducing PNG support for icon images would be to allow the biCompression field of the BITMAPINFOHEADER to take the value BI_PNG, in which case the image would be represented not by a DIB but by a PNG. After all, that's why we have a biCompression field: For forward compatibility with future encoding systems. Wipe the dust off your hands and declare victory.
Unfortunately, it wasn't that simple. If you actually try using ICO files in this format, you'll find that a number of popular icon-authoring tools crash when asked to load a PNG-compressed icon file for editing.
The problem appeared to be that the new BI_PNG compression type was encountered at a point in the parsing code where the parser was not prepared to handle such a failure (or the failure was never detected). The solution was to change the file format so that PNG-compressed images fail these programs' parsers at an earlier, safer step. (This is sort of the opposite of penetration testing, which keeps tweaking data to make the failure occur at a deeper, more dangerous step.)
Paradoxically, the way to be more compatible is to be less compatible.
The format of a PNG-compressed image consists simply of a PNG image, starting with the PNG file signature. The image must be in 32bpp ARGB format (known to GDI+ as PixelFormat32bppARGB). There is no BITMAPINFOHEADER prefix, and no monochrome mask is present.
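Since a PNG-compressed image begins with the PNG file signature rather than a BITMAPINFOHEADER, a parser can tell the two apart by peeking at the first eight bytes. The following is a minimal sketch of such a check; the function name LooksLikePngIconImage is my own invention, and the caller is assumed to have already located the image data within the ICO file.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// The eight-byte signature that begins every PNG file.
constexpr std::uint8_t kPngSignature[8] =
    {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

// Returns true if the image data starts with the PNG signature.
// (Hypothetical helper; "data" points at one image inside an ICO file.)
bool LooksLikePngIconImage(const std::uint8_t* data, std::size_t size) {
    return size >= sizeof(kPngSignature) &&
           std::memcmp(data, kPngSignature, sizeof(kPngSignature)) == 0;
}
```

A traditional image starts with the biSize field of the BITMAPINFOHEADER (40, stored as the bytes 28 00 00 00), so it cannot be mistaken for a PNG by this test.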
Since we had to break compatibility with the traditional format for ICO images, we may as well solve the problem we saw last time of people who specify an incorrect mask. With PNG-compressed images, you do not provide the mask at all; the mask is derived from the alpha channel on the fly. One fewer thing for people to get wrong.
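To give a feel for what "derived from the alpha channel" could mean, here is a sketch that builds a mask from 32bpp ARGB pixels. Be warned that the rule it uses (a pixel counts as transparent only when its alpha is exactly zero) is an assumption made for illustration; the text above does not say what rule Windows actually applies. The name MaskFromAlpha and the one-byte-per-pixel output are likewise just conveniences of the sketch, whereas a real mask is a 1bpp bitmap.

```cpp
#include <cstdint>
#include <vector>

// Builds a mask from ARGB pixels: 1 = transparent, 0 = opaque.
// ASSUMPTION: alpha == 0 is treated as transparent and anything else
// as opaque. The actual rule used by the system is not documented here.
std::vector<std::uint8_t> MaskFromAlpha(const std::vector<std::uint32_t>& argb) {
    std::vector<std::uint8_t> mask;
    mask.reserve(argb.size());
    for (std::uint32_t px : argb) {
        const std::uint8_t alpha = static_cast<std::uint8_t>(px >> 24);
        mask.push_back(alpha == 0 ? 1 : 0);
    }
    return mask;
}
```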
Windows XP introduced the ability to provide icon images which contain an 8-bit alpha channel. Up until this point, you had only a 1-bit alpha channel, represented by a mask.
The representation of an alpha-blended image in your ICO file is pretty straightforward. Recall that the old ICO format supports 0RGB 32bpp bitmaps. To use an alpha-blended image, just drop in an ARGB 32bpp bitmap instead. When the window manager sees a 32bpp bitmap, it looks at the alpha channel. If it's all zeroes, then it assumes that the image is in 0RGB format; otherwise it assumes it is in ARGB format. Everything else remains the same as for the non-alpha version.
Note carefully that everything else remains the same. In particular, you are still required to provide a mask. I've seen some people be a bit lazy about providing a meaningful mask and just pass in all-zeroes. And everything seems to work just fine, until you hit a case where it doesn't work. (Read on.)
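The all-zeroes test that decides between 0RGB and ARGB can be sketched in a few lines. This is not the window manager's code, just an illustration of the rule as described; TreatAsArgb is a made-up name, and the pixels are assumed to be packed as 0xAARRGGBB values.

```cpp
#include <cstdint>
#include <vector>

// Returns true if any pixel has a nonzero alpha byte, in which case
// the bitmap is treated as ARGB; an all-zero alpha channel means 0RGB.
// (Hypothetical helper illustrating the rule, not actual system code.)
bool TreatAsArgb(const std::vector<std::uint32_t>& pixels) {
    for (std::uint32_t px : pixels) {
        if ((px >> 24) != 0) return true;
    }
    return false;
}
```

One consequence worth noticing: a single stray nonzero alpha byte is enough to flip the whole image into ARGB mode.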
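There are basically three ways of drawing an alpha-blended icon image.
DrawIconEx(DI_NORMAL): Draws the icon the normal way, blending the image into the destination according to its alpha channel.
DrawIconEx(DI_IMAGE): Draws only the image portion of the icon.
DrawIconEx(DI_MASK): Draws only the mask portion of the icon.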
The DI_IMAGE and DI_MASK flags let an application draw just one of the two images contained in an icon image. Applications do this if they want finer control over the icon-drawing process. For example, they might ask for the mask so they can build a shadow effect under the icon. The mask tells them which parts of the icon are opaque and therefore should cast a shadow.
If you understand this, then you can see how people who set their mask image to all-zeroes managed to get away with it most of the time. Since most programs just use DI_NORMAL to draw icons, the incorrect mask is never used, so the error never shows up. It's only when the icon is used by a program that wants to do fancy icon effects and asks for DI_MASK (or calls GetIconInfo and looks at the hbmMask) that the incorrect mask results in an ugly icon.
The ironic thing is that the people who incorrectly set the mask to all-zeroes are probably the same people who will then turn around and say, "When I try to use alpha-blended icons, the result is hideously ugly under conditions X and Y. Those Microsoft programmers are such idiots. More proof that Windows is a buggy pile of manure." What they don't realize is that the hideous ugliness was caused by their own error.
Given a HICON or a HCURSOR, how do you get the dimensions of the icon or cursor?
The GetIconInfo function gets you most of the way there, returning you an ICONINFO structure which gives you the mask and color bitmaps (and the hotspot, if a cursor). You can then use the GetObject function to get the attributes of the bitmap. And then here's the tricky part: You have to massage the data a bit.
// Also works for cursors
BOOL GetIconDimensions(__in HICON hico, __out SIZE *psiz)
{
  ICONINFO ii;
  BOOL fResult = GetIconInfo(hico, &ii);
  if (fResult) {
    BITMAP bm;
    fResult = GetObject(ii.hbmMask, sizeof(bm), &bm) == sizeof(bm);
    if (fResult) {
      psiz->cx = bm.bmWidth;
      psiz->cy = ii.hbmColor ? bm.bmHeight : bm.bmHeight / 2;
    }
    if (ii.hbmMask)  DeleteObject(ii.hbmMask);
    if (ii.hbmColor) DeleteObject(ii.hbmColor);
  }
  return fResult;
}
As we've learned over the past few days, an icon consists of two bitmaps, a mask and an image. A cursor is the same as an icon, but with a hotspot.
To get the dimensions of the icon or cursor, just take the dimensions of the color bitmap. If you have one.
If the icon/cursor is monochrome, then there is no color bitmap. As we've learned, in that case, the mask and image bitmaps are combined into a single double-height bitmap, and the color is reported as NULL. To get the size of the image, you therefore have to take the mask bitmap and divide its height by two.
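The arithmetic is small enough to isolate from the Win32 calls. Here is the same size rule as a standalone function, so you can convince yourself of the two cases without an HICON in hand. IconSize and IconSizeFromMask are names invented for this sketch.

```cpp
// Result type for the sketch (hypothetical; stands in for SIZE).
struct IconSize {
    int cx;
    int cy;
};

// Computes the icon dimensions from the mask bitmap's dimensions.
// With a color bitmap, the mask is the same size as the icon.
// Without one (monochrome), the mask is double-height: the mask
// stacked with the image, so the real height is half.
IconSize IconSizeFromMask(int maskWidth, int maskHeight, bool hasColorBitmap) {
    return { maskWidth, hasColorBitmap ? maskHeight : maskHeight / 2 };
}
```

For example, a color icon with a 32x32 mask and a monochrome cursor with a 32x64 mask both come out as 32x32.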