September, 2009

Larry Osterman's WebLog

Confessions of an Old Fogey
  • I can make it arbitrarily fast if I don’t actually have to make it work.

    Digging way back into my pre-Microsoft days, I was recently reminded of a story that I believe was told to me by Mary Shaw back when I took her Computer Optimization class at Carnegie-Mellon…

    During the class, Mary told an anecdote about a developer, “Sue”, who found a bug in code written by another developer, “Joe”.  “Joe” had introduced the bug with a performance optimization.  When “Sue” pointed the bug out to “Joe”, his response was “Oops, but it’s WAY faster with the bug”.  “Sue” exploded: “If it doesn’t have to be correct, I can calculate the result in 0 time!” [1].

    Immediately after telling this anecdote, she discussed a contest that the CS faculty held for the graduate students every year.  Each year the CS faculty posed a problem to the graduate students with a prize awarded to the grad student who came up with the most efficient (fastest) solution to the problem.  She then assigned the exact same problem to us:

    “Given a copy of the “Declaration of Independence”, calculate the 10 most common words in the document”

    We all went off and built programs to parse the words in the document, inserting them into a tree (tracking usage) and reading off the 10 most frequent words.  The next assignment was “Now make it fast – the 5 fastest apps get an ‘A’, the next 5 get a ‘B’, etc.”

    So everyone in the class (except me :)) went out and rewrote their apps to use a hash table so that their insertion time was constant and then they optimized the heck out of their hash tables[2].

    After our class had our turn, Mary shared the results of what happened when the CS grad students were presented with the exact same problem.

    Most of them basically did what most of the students in my class did – built hash tables and tweaked them.  But a couple of results stood out.

    • The first one simply hard coded the 10 most common words in their app and printed them out.  This was disqualified because it was perceived as breaking the rules.
    • The next one was quite clever.  The grad student in question realized that the program would run much faster if it were written in assembly language, but the rules of the contest required Pascal.  So he created an array on the stack, introduced a buffer overflow, loaded his assembly language program into the buffer, and used the overflow to get the assembly language version to run.  IIRC he wasn’t disqualified, but he didn’t win because he had circumvented the rules (I’m not sure; it’s been more than a quarter century since Mary told the class this story).
    • The winning entry was even more clever.  He realized that he didn’t actually need to track all the words in the document.  Instead he decided to track only some of the words in a fixed array.  His logic was that each of the 10 most frequent words was likely to appear in the first <n> words of the document, so all he needed to do was figure out what “n” is and he’d be golden.
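
The post doesn't show any of the contest entries, so the following is only a minimal sketch of the winning idea under stated assumptions. It counts words in a hash table but only admits a new word as a candidate if it first appears within the first `candidateWindow` words. The names `splitWords`, `topWords` and `candidateWindow` are hypothetical, and `std::unordered_map` stands in for the fixed array the story mentions.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Split text into lowercase alphabetic words; anything else is a separator.
std::vector<std::string> splitWords(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalpha(c)) {
            current.push_back(static_cast<char>(std::tolower(c)));
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

// Return up to k (word, count) pairs, most frequent first.
// A word is only tracked if its first occurrence falls within the first
// candidateWindow words; new words seen later are ignored (the "guess").
std::vector<std::pair<std::string, int>> topWords(
        const std::vector<std::string>& words,
        std::size_t k,
        std::size_t candidateWindow) {
    std::unordered_map<std::string, int> counts;
    for (std::size_t i = 0; i < words.size(); ++i) {
        auto it = counts.find(words[i]);
        if (it != counts.end()) {
            ++it->second;                 // already a candidate: count it
        } else if (i < candidateWindow) {
            counts.emplace(words[i], 1);  // still inside the window: admit it
        }
    }
    std::vector<std::pair<std::string, int>> result(counts.begin(), counts.end());
    // Sort by descending count, breaking ties alphabetically for stable output.
    std::sort(result.begin(), result.end(),
              [](const std::pair<std::string, int>& a,
                 const std::pair<std::string, int>& b) {
                  if (a.second != b.second) return a.second > b.second;
                  return a.first < b.first;
              });
    if (result.size() > k) result.resize(k);
    return result;
}
```

Passing `words.size()` as `candidateWindow` gives the exact answer. A smaller window does less bookkeeping but is only correct if the guess about where the frequent words first appear holds for the document.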

     

    So the moral of the story is “Yes, if it doesn’t have to be correct, you can calculate the response in 0 time.  But sometimes it’s ok to guess and if you guess right, you can get a huge performance benefit from the result”. 

     

     

    [1] This anecdote might also come from Jon L. Bentley’s “Writing Efficient Programs”, I’ll be honest and say that I don’t remember where I heard it (but it makes a great introduction to the subsequent story).

    [2] I was stubborn and decided to take my binary tree program and make it as efficient as possible but keep the basic structure of the solution (for example, instead of comparing strings, I calculated a hash for the string and compared the hashes to determine if strings matched).  I don’t remember if I was in the top 5 but I was certainly in the top 10.  I do know that my program beat out most of the hash table based solutions.
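
As a rough illustration of the footnote's idea (not the original Pascal program, which isn't shown), here is a sketch of a binary tree that orders nodes by a precomputed hash, so most comparisons are integer comparisons. The names `WordNode` and `addWord` are hypothetical. Falling back to a string comparison when two hashes are equal is my assumption; the post doesn't say how collisions were handled.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// One tree node per distinct word, ordered by (hash, word).
struct WordNode {
    std::size_t hash;
    std::string word;
    int count = 1;
    std::unique_ptr<WordNode> left, right;
    WordNode(std::size_t h, std::string w) : hash(h), word(std::move(w)) {}
};

// Insert the word or bump its count; returns the word's count so far.
int addWord(std::unique_ptr<WordNode>& root, const std::string& word) {
    const std::size_t h = std::hash<std::string>{}(word);
    std::unique_ptr<WordNode>* slot = &root;
    while (*slot) {
        WordNode& node = **slot;
        if (h != node.hash) {
            // Common case: a cheap integer comparison decides the branch.
            slot = (h < node.hash) ? &node.left : &node.right;
        } else {
            // Equal hashes: confirm with a string comparison (collision guard).
            const int cmp = word.compare(node.word);
            if (cmp == 0) return ++node.count;
            slot = (cmp < 0) ? &node.left : &node.right;
        }
    }
    *slot = std::make_unique<WordNode>(h, word);
    return 1;
}
```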

  • Building a flicker free volume control

    When we shipped Windows Vista, one of the really annoying UI problems with the volume control was that whenever you resized it, it would flicker.

    To be more specific, the right side of the control would flicker – the rest didn’t flicker (which was rather strange).

     

    Between the Win7 PDC release (what we called M3 internally) and the Win7 Beta, I decided to bite the bullet and see if I could fix the flicker.  It seems like I tried everything to make the flickering go away but I wasn’t able to do it until I ran into the WM_PRINTCLIENT message which allowed me to direct all of the internal controls on the window to paint themselves.

    Basically on a paint call, I’d take the paint DC and send a WM_PRINTCLIENT message to each of the controls in sndvol asking them each to paint themselves to the new DC.  This worked almost perfectly – I was finally able to build a flicker free version of the UI.  The UI wasn’t perfect (for instance the animations that faded in the “flat buttons” didn’t fire) but the UI worked just fine and looked great so I was happy that I’d finally nailed the problem.  That happiness lasted until I got a bug report that I simply couldn’t figure out.  It seems that if you launched the volume mixer, set the focus to another application, then selected the volume mixer’s title bar and moved the mixer, there were a ton of drawing artifacts left on the screen.

    I dug into it a bunch and was stumped.  It appeared that the clipping rectangle sent in the WM_PAINT message to the top level window didn’t include the entire window, thus portions of the window weren’t erased.  I worked on this for a couple of days trying to figure out what was going wrong and I finally asked for help on one of our internal mailing lists.

    The first response I got was that I shouldn’t use WM_PRINTCLIENT because it was going to cause me difficulty.  I’d already come to that conclusion – by trying to control every aspect of the drawing experience for my app, I was essentially working against the window manager – that’s why the repaint problem was happening.  By sending WM_PRINTCLIENT I was essentially putting a band-aid on the real problem, but I hadn’t solved the real problem; all I’d done was hide it.

     

    So I had to go back to the drawing board.  Eventually (with the help of one of the developers on the User team) I finally tracked down the original root cause of the problem and it turns out that the root cause was somewhere totally unexpected.

    Consider the volume UI:

    [Image: screenshot of the volume mixer UI]

    The UI is composed of two major areas: The “Devices” group and the “Applications” group.  There’s a group box control wrapped around the two areas.

    Now let’s look at the group box control.  For reasons that are buried deep in the early history of Windows, a group box is actually a form of the “button” control.  If you look at the window styles for a button in SpyXX, you’ll see:

    [Image: SpyXX view of the button control’s window and class styles]

     

    Notice the CS_VREDRAW and CS_HREDRAW window class styles.  The MSDN documentation for class styles says:

    CS_HREDRAW - Redraws the entire window if a movement or size adjustment changes the width of the client area.
    CS_VREDRAW - Redraws the entire window if a movement or size adjustment changes the height of the client area.

    In other words, every window whose class has the CS_HREDRAW or CS_VREDRAW style will be fully repainted whenever a resize changes its width or height, respectively (including all the controls inside the window).  And ALL buttons have these styles.  That means that whenever you resize a button, it’s going to flicker, and so will all of the content that lives below the button.  For most buttons this isn’t a big deal, but for group boxes it can be a big issue because group boxes contain other controls.

    In the case of sndvol, when you resize the volume control, we resize the applications group box (because it’s visually pinned to the right side of the dialog).  That causes the group box and all of its contained controls to repaint and thus flicker like crazy.  The only way to fix this is to remove the CS_HREDRAW and CS_VREDRAW styles from the window class for the control.

    The good news is that once I’d identified the root cause, the solution to my problem was relatively simple.  I needed to build my own custom version of the group box which handled its own painting and didn’t have the CS_HREDRAW and CS_VREDRAW class styles.  Fortunately it’s really easy to draw a group box – if themes are enabled, a group box can be drawn with the DrawThemeBackground API and the BP_GROUPBOX part, and if theming is disabled, you can use the DrawEdge API to draw the group box.  Once I added the new control and dealt with a number of other clean-up issues (for example, making sure that the right portions of the window were invalidated when the window was resized, that my top level control had the WS_CLIPCHILDREN style, and that each of the sub windows had the WS_CLIPSIBLINGS style), I had a version of sndvol that was flicker free AND which let the window manager handle all the drawing complexity.  There are still some minor visual gotchas in the UI (for example, if you resize the window using the left edge the right side of the group box “shudders” a bit – this is apparently an artifact that’s outside my control – other apps have similar issues when resized on the left edge) but they’re acceptable.

    As an added bonus, now that I was no longer painting everything manually, the fade-in animations on the flat buttons started working again!

     

    PS: While I was writing this post, I ran into this tutorial on building flicker free applications.  I wish I’d run into it while I was trying to deal with the flickering problem, because it nicely lays out how to solve the problem.

  • The story behind the mysterious “Ding” in Windows Vista.

    I just ran into this fairly old post on Channel 9.  mstefan reported that his applications played a “Ding” noise when selecting an item in a listview (or tree) control.

     

    It turns out that I’d had the problem independently reported to me by one of the people here at Microsoft.  Here were his reproduction steps:

    1. Clean Install Windows
    2. Open the sounds control panel
    3. Set the “Select” sound to a value
    4. Clear the “Select” sound (set it to “(None)”).
    5. Change the selection in a listview

    Complicating matters is that this didn’t occur with many apps, just a couple of Windows Forms applications that the person at Microsoft had.  So I dug into it a bit.

    I quickly realized that the problem was that the application was calling PlaySound without the “SND_NODEFAULT” flag.  I’ve written about this flag before (in fact I wrote that post shortly after investigating this issue, so that post contains part of the story).

    Digging in deeper, I realized that the application was using version 5 of the common controls (there are two versions in Windows, v5 and v6).  The problem didn’t occur in the version 6 common controls (this is why most applications didn’t reproduce the problem).

    So why did the problem occur?

    The common controls have a single routine which calls PlaySound, and that code attempts to be somewhat clever.  Instead of simply calling PlaySound, the code first read the HKEY_CURRENT_USER\AppEvents\Schemes\.Default\CCSelect registry key.  If the key didn’t exist, it skipped calling PlaySound.  If the CCSelect registry key existed, it would call PlaySound specifying the CCSelect alias.

    I’ve actually run into this pattern a bunch in Windows – teams decided that calling PlaySound was “too slow”, or had “too much overhead”, so they tried to avoid the call to PlaySound if the call wasn’t going to do anything (once upon a time it was important to avoid calling PlaySound because it could sometimes hang your application for several seconds – this isn’t the case any more if you specify SND_ASYNC).

    The problem is that when you hit step 3 above (setting the “Select” sound), the control panel created the CCSelect registry key with a default value of the sound file you’re setting.  When you set it to “None”, the control panel cleared the value of the sound contained under the key.  But the key itself wasn’t deleted.  So the check above that checked for the registry key’s existence succeeded and it called PlaySound.  But because the call didn’t specify SND_NODEFAULT, the PlaySound API decided to play the default sound when it realized that there wasn’t a sound associated with the CCSelect alias.
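
The interaction described above can be modelled without touching the real registry or PlaySound. This is a toy model under my own assumptions, not the common controls source: `SoundKey`, `Outcome`, `playAlias` and `selectionChanged` are hypothetical names, and `noDefault` stands in for passing SND_NODEFAULT.

```cpp
#include <string>

// Toy stand-in for the CCSelect registry key: the key can exist while its
// default value (the sound file) is empty, which is the state "(None)" leaves.
struct SoundKey {
    bool exists;
    std::string soundFile;
};

enum class Outcome { Silent, ConfiguredSound, DefaultDing };

// Models playing a sound alias: if no sound file is associated with the
// alias, the default sound plays unless the caller asked for no default.
Outcome playAlias(const SoundKey& key, bool noDefault) {
    if (key.exists && !key.soundFile.empty()) return Outcome::ConfiguredSound;
    return noDefault ? Outcome::Silent : Outcome::DefaultDing;
}

// Models the "clever" check: skip the call only when the key is missing.
Outcome selectionChanged(const SoundKey& key, bool noDefault) {
    if (!key.exists) return Outcome::Silent;
    return playAlias(key, noDefault);
}
```

In this model a missing key is silent, and a key that exists with an empty value produces the default ding unless `noDefault` is set. That is the combination the reproduction steps create.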

     

    I wanted to understand how this bug was introduced so I looked back at the source history of the file containing the bug.  It turns out that the fix for this bug was actually made to Windows XP but wasn’t incorporated into Windows Server 2003.  When Windows Vista was created, the team started with the Windows Server 2003 code base and incorporated all the bug fixes made since Windows XP forked into the Win2K3 code base.  For whatever reason, this particular fix was missed when the team did the merge.  Because the behavior was so subtle (to trigger it you MUST go through steps 2 through 4 to get the registry key created, and your app must use v5 of the common controls), the regression went unnoticed.

    Needless to say, it’s fixed in Windows 7.

  • What’s the difference between GetTickCount and timeGetTime?

    I’ve always believed that the most frequently used multimedia API in winmm.dll was the PlaySound API.  However I recently was working with the results of some static analysis tools that were run on the Windows 7 codebase and I realized that in fact the most commonly used multimedia API (in terms of code breadth) was actually the timeGetTime API.  In fact almost all the multimedia APIs use timeGetTime which was somewhat surprising to me at the time.

    The MSDN article for timeGetTime says that timeGetTime “retrieves the system time, in milliseconds. The system time is the time elapsed since the system started.”.

    But that’s almost exactly what the GetTickCount API returns: “the number of milliseconds that have elapsed since the system was started, up to 49.7 days.” (obviously timeGetTime has the same 49.7 day limit since both APIs return 32-bit counts of milliseconds).
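
The 49.7 day figure is just 2^32 milliseconds expressed in days. The small sketch below (with hypothetical helper names `wrapDays` and `elapsedMs`) works through that arithmetic. It also shows why subtracting two 32-bit readings still gives the right elapsed time across a wrap, provided the interval itself is shorter than 2^32 ms.

```cpp
#include <cstdint>

// Days until a 32-bit millisecond counter wraps around: 2^32 ms.
double wrapDays() {
    const double msPerDay = 24.0 * 60.0 * 60.0 * 1000.0;  // 86,400,000 ms
    return 4294967296.0 / msPerDay;                        // roughly 49.71
}

// Elapsed milliseconds between two 32-bit readings.  Unsigned subtraction is
// performed modulo 2^32, so the result is correct even if the counter wrapped
// once between the two readings.
std::uint32_t elapsedMs(std::uint32_t earlier, std::uint32_t later) {
    return static_cast<std::uint32_t>(later - earlier);
}
```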

    So why are all these multimedia APIs using timeGetTime and not GetTickCount since the two APIs apparently return the same value?  I wasn’t sure so I dug in a bit deeper.

    The answer is that they don’t.  You can see this with a tiny program:

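    #include <windows.h>
    #include <mmsystem.h>   // timeGetTime; link with winmm.lib
    #include <stdio.h>
    #include <tchar.h>

    int _tmain(int argc, _TCHAR* argv[])
    {
        int i = 100;
        DWORD lastTick = 0;
        DWORD lastTime = 0;
        while (--i)
        {
            DWORD tick = GetTickCount();
            DWORD time = timeGetTime();
            // DWORD is an unsigned long, so print it with %lu rather than %d.
            // The deltas on the first line are meaningless (the "last" values start at 0).
            printf("Tick: %lu, Time: %lu, dTick: %3lu, dTime: %3lu\n", tick, time, tick-lastTick, time-lastTime);
            lastTick = tick;
            lastTime = time;
            Sleep(53);
        }
        return 0;
    }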

    If you run this program, you’ll notice that the difference between the timeGetTime results is MUCH more stable than the difference between the GetTickCount results (note that the program sleeps for 53ms which usually doesn’t match the native system timer resolution):

    Tick: 175650292, Time: 175650296, dTick:  46, dTime:  54
    Tick: 175650355, Time: 175650351, dTick:  63, dTime:  55
    Tick: 175650417, Time: 175650407, dTick:  62, dTime:  56
    Tick: 175650464, Time: 175650462, dTick:  47, dTime:  55
    Tick: 175650526, Time: 175650517, dTick:  62, dTime:  55
    Tick: 175650573, Time: 175650573, dTick:  47, dTime:  56
    Tick: 175650636, Time: 175650628, dTick:  63, dTime:  55
    Tick: 175650682, Time: 175650683, dTick:  46, dTime:  55
    Tick: 175650745, Time: 175650739, dTick:  63, dTime:  56
    Tick: 175650792, Time: 175650794, dTick:  47, dTime:  55
    Tick: 175650854, Time: 175650850, dTick:  62, dTime:  56

    That’s because GetTickCount is only updated on each clock tick, where it advances by the clock tick period, and as such the delta values waver around the actual time (note that the deltas average to 55ms, so on average GetTickCount returns an accurate result, but not with spot measurements), whereas timeGetTime’s delta is highly predictable.
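
The wavering deltas can be reproduced with a small simulation that doesn't call any Windows API. It is only a sketch: the 15.625 ms tick period (64 ticks per second) is my assumption, since the post doesn't state the tick length, and `tickCounterAt` and `sampledDeltas` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Value shown at real time tMs by a counter that only advances once per
// clock tick, jumping forward by tickMs each time (truncated to whole ms).
std::uint32_t tickCounterAt(double tMs, double tickMs) {
    const std::uint64_t ticks = static_cast<std::uint64_t>(tMs / tickMs);
    return static_cast<std::uint32_t>(ticks * tickMs);
}

// Sample the tick-driven counter every intervalMs and return the deltas
// between consecutive samples.
std::vector<std::uint32_t> sampledDeltas(double tickMs, double intervalMs, int samples) {
    std::vector<std::uint32_t> deltas;
    std::uint32_t last = tickCounterAt(0.0, tickMs);
    for (int i = 1; i <= samples; ++i) {
        const std::uint32_t now = tickCounterAt(i * intervalMs, tickMs);
        deltas.push_back(now - last);
        last = now;
    }
    return deltas;
}
```

With `sampledDeltas(15.625, 55.0, 20)`, every delta is 46, 47, 62 or 63, the same values as in the dTick column above. Each 55 ms interval spans either three ticks (46.875 ms) or four ticks (62.5 ms), never the true 55 ms.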

    It turns out that for isochronous applications (those that depend on clear timing) it is often important to be able to retrieve the current time in a fashion that doesn’t vary; that’s why those applications use timeGetTime.

  • Digging into the history bin (AKA: Microsoft Developer says that Windows is useless)

    As I was writing my “25 years of Larry’s history at Microsoft in 1 year chunks” blog posts, I spent a fair amount of time digging through my email archives (trying to figure out exactly what happened at what time).  During this, I ran into a link to a post I’d made on the Info-IBMPC mailing list back in 1992:

    Date: Thu, 12 Mar 92 12:44:39 PST
    From: lar...@microsoft.com
    Subject: What do you do with your windows? (V92 #36)

    || >From: m...@Violin.CC.MsState.Edu (Mubashir Cheema)

    ||   I recently acquired Windows 3.0 and I don't seem to understand one
    || thing.  What is it for?  What do I do with it?  What major advantage
    || does it have over Dos?  (I don't see any except being able to use mouse
    || and also the thing is bit more colorful) I think it was made for lazy
    || people who couldn't learn couple of DOS commands.

    ||   Don't tell me I could multi-task with it. I've been using Amigas
    || extensively

    I've got to jump in here, even though I suspect that there will probably be some form of an "official" response from MS if anyone in the DOS/Windows group is listening...... :)

    I'm going to be brutally honest about this one. Basically, Windows by itself IS pretty useless. The thing that makes Windows great is the same thing that has made DOS the most popular operating system in history. It's the applications that are available for it.

    GUI's (Graphical User Interfaces) have been proven to be significantly easier for users to understand for beginning users, and are arguably the wave of the future. I don't know of a significant operating system being introduced for the PC market that doesn't have a GUI available on it, be it PM, X, GEM, or Windows. Windows is arguably the best GUI available for DOS based on what I consider the most significant criteria: What applications are available for the platform.

    Consider the list of available windows apps: Excel, WinWord, PageMaker, Corel Draw, WordPerfect, Lotus 123, etc just to name a couple off the
    top of my head.

    You also hit on one of the significant reasons to use Windows - Multi-tasking.

    Windows is a non pre-emptive multi-tasking operating system.  On a 386, it does an ok job of multi-tasking multiple DOS applications, but on a
    286 it functions as a simple task switcher like DOS 5 does.  It really shines when multi-tasking Windows applications however.

    In addition, when you couple the multi-tasking capabilities of Windows with a windows mechanism known as DDE (for Dynamic Data Exchange), you
    can generate some truly incredible synergy between Windows applications. With Win 3.0/Win 3.1 Microsoft has introduced a concept
    known as OLE (Open Linking and Embedding) which allows you to cut and past from multiple "applets" allowing applications to take advantage of
    the capabilities of other shipped applications.  This allows an applet like an equation editor to manage all the information about formatting
    an equation even when the equation is embedded in a word document. With OLE, you can simply double-click on the object and bring up the
    "agent" that manages it (in my example, the equation editor).

    For application developers, Windows gives developers the ability to develop their applications without knowing anything about the
    underlying hardware of the machine - a windows application that runs on a machine with a CGA adapter will also run on a machine with a graphics
    accelerator that runs in 1024x1024 with 24 bits of color.

    In addition, when you write an application for windows, your application instantly will support literally hundreds of printers
    transparently - Windows does all the work for you.

    To re-iterate, Windows as a stand-alone product is not extraordinarily interesting - there are lots of productivity packages that provide
    similar functionality to users, the real benefit of Windows is the applications that run on it.

    I will also point out that there are more than 5000 Windows applications available today and still more will come out with Win 3.1.
    The available windows applications span all ranges of applications from games (Microsoft's Entertainment pack, Berkley-Soft's After Dark, and
    Sierra's Laffer Utilities for Windows) to Spreadsheets (Microsoft Excel, Lotus 1-2-3) to Word processors (Microsoft Word For Windows,
    Lotus Ami), to Desktop publishing (Aldus Pagemaker, Microsoft Publisher), to presentation graphics (Microsoft Powerpoint), to
    development tools (Microsoft Visual Basic) etc......

    Larry Osterman

    Disclaimer:  The opinions above are my own.  They are not necessarily the same as those of Microsoft.  I only work here.

    Remember that this was written back in 1992 after Windows 3.0 had come out but before Windows 3.1.  There was no Win32, no web browser, no multimedia support, none of the things that we all take for granted in a modern system.  Back then a display card that supported 1Kx1K with 24-bit color was considered a monster display card (and hard disks still came in “megabytes” – I remember buying a 2G hard disk back then for about a thousand dollars).

    Reading this again, I find it vaguely funny that in many ways my feelings about Windows haven’t really changed that much in 18 years – the value of the Windows platform is STILL the applications available for that platform (although the number of applications has grown from the 5000 or so back in 1992 to several million applications).
