Larry Osterman's WebLog

Confessions of an Old Fogey

    One year ago today (August 2008)…

    I was finishing Windows 7 M3 (the build that was eventually delivered at the PDC).  During M3, I spent most of my time working on the “Ducking” feature.  I was also working on my PDC presentation, although the slides I had in August bore almost no resemblance to the slides I eventually presented (I started with 50-some slides and ended up with 23).

    At home, I’d replaced all our 100-megabit switches with new gigabit Ethernet switches to boost performance (I was bored one weekend when Valorie and the kids were out of town).  Daniel was attending the pre-college program at Carnegie Mellon University and came back at the end of the week.

    And long-time readers of my blog know where this particular series is going :).

    6.1.7600.16385

    ‘nuf said.

    Tonight’s the night, won’t be just any night.

    Ok, I’m a day late on this one…

    Daniel’s summer show opened last night: West Side Story at the Village Theatre Summer Independent.  As usual, the show was amazing.  The young actors playing Tony and Maria (Kyle Anderson and Elise Myette) are extraordinary (especially Elise; her voice is simply astonishing for someone so young).

    Daniel plays Pepe, Bernardo’s number 2 on the Sharks.

    The show runs for 5 more performances: Sunday the 19th at 2:00 PM and 7:30 PM, Thursday the 23rd through Sunday the 26th at 7:30 PM, and a Saturday matinee at 2:00 PM on the 25th.

    The show is absolutely amazing.  The Kidstage summer independent productions are done entirely by students – everyone on the production, from the director to the stagehands, is 20 or younger.

    It’s quite remarkable what they’ve done and the show is well worth seeing.

    Thinking about Windows Build numbers

    There’s been an ongoing thread internally speculating about the Windows build number that will be chosen for Windows 7 when it finally ships.  What’s interesting is that there’s any speculation at all about how the build number gets chosen.

    The Windows version is actually composed of a bunch of different fields, all packed into an OSVERSIONINFO structure.  The relevant parts of the OSVERSIONINFO are:

    • Major Version (dwMajorVersion)
    • Minor Version (dwMinorVersion)
    • Build # (dwBuildNumber)

    The major and minor version numbers are primarily marketing numbers – they’re broad-brush fields that the marketing department decides are appropriate for the OS.  For Windows 7, the major and minor versions have been fixed at 6.1 for many months now, but the build numbers change more or less daily.

    Back to my story… In the dark ages when Windows NT was first developed, the rules for build numbers were relatively simple: today’s build number is yesterday’s build number + 1.  That’s why Windows NT 3.1 was build 511, NT 3.5 was build 807, NT 3.51 was build 1057, and NT 4.0 was build 1381.

    But after NT 4.0, things changed.

    When Brian Valentine moved from the Exchange team to the Windows team, he brought with him a tradition that the Exchange team used – Exchange build numbers were bumped up to round numbers at major milestones in the product.  So Exchange 4.0’s RTM version was 4.0.837 but Exchange 5.0 started at build 1000 (maybe it was 900, I honestly don’t remember).  For NT, Brian and his team adopted this scheme but used it to ensure that the OS build number at release was a round number – so Windows 2000 (the first version of Windows that shipped with Brian as the lead) had a (relatively) round version number of 5.0.2195.

    That tradition was continued with Windows XP (5.1.2600) and Vista (6.0.6000).  In the Vista case, it appears that there was some massaging of the numbers to make the build number work out so evenly – this list of build numbers shows that the build numbers jumped from 5825 to 5840 to 5920 to 6000 during the final push – the last few build numbers were incremented by 80 each build with sub-build numbers (QFE number) incrementing by 1 between the builds.

    For Windows 7, we’ve also seen a number of jumps in build numbers.  The PDC build was build 6801, the Beta build was 7000 and the RC build was 7100.  It’ll be interesting to see what the final build number will be (whenever that happens).  I honestly have no idea what the number’s going to be.

    I get still more spam

    This morning I awoke to find the following spam email in my inbox:

    Greetings from Amazon Payments.

    Your bank has contacted us regarding some attempts of charges from your credit card via the Amazon system. We have reasons to believe that you changed your registration information or that someone else has unauthorized access to your Amazon account Due to recent activity, including possible unauthorized listings placed on your account, we will require a second confirmation of your identity with us in order to allow us to investigate this matter further. Your account is not suspended, but if in 48 hours after you receive this message your account is not confirmed we reserve the right to suspend your Amazon registration. If you received this notice and you are not the authorized account holder, please be aware that it is in violation of Amazon policy to represent oneself as another Amazon user. Such action may also be in violation of local, national, and/or international law. Amazon is committed to assist law enforcement with any inquires related to attempts to misappropriate personal information with the intent to commit fraud or theft. Information will be provided at the request of law enforcement agencies to ensure that perpetrators are prosecuted to the full extent of the law.

    To confirm your identity with us click here: <LINK REDACTED>

    After responding to the message, we ask that you allow at least 72 hours for the case to be investigated. Emailing us before that time will result in delays. We apologize in advance for any inconvenience this may cause you and we would like to thank you for your cooperation as we review this matter.

    Thank you for your interest in selling at Amazon.com.

    Amazon.com Customer Help Service

    In many ways this tickled my fancy.  The first paragraph (“Greetings from Amazon Payments”) indicates that it’s directed to Amazon affiliates, and I’m not an Amazon affiliate.  If it were directed to customers, it wouldn’t come from Amazon’s Payments department; it would come from some other department (maybe Amazon billing?).

    But they immediately discuss “attempts of charges from your credit card” (let’s ignore the fractured English; it’s a phishing email, so you sort of expect crappy English).  If I’m an affiliate, why would Amazon be charging my credit card?

    They then go on to indicate that if this isn’t resolved right away they’ll cancel my Amazon account – very scary.  In fact the risk is so severe, they’re going to ask that I provide a second confirmation of my identity.  And Amazon is going to be totally helpful in ensuring that law enforcement is notified of the charges.  How very helpful of them.

    But what made this email stand out to me is the next-to-last paragraph, the one where they say:

    “…we ask that you allow at least 72 hours for the case to be investigated. Emailing us before that time will result in delays.”

    To paraphrase that fragment: “we figure it’s going to take us at least 3 days to clean out your credit card and get away.  So please don’t bother us before then.”

    Somewhat OT: On a more serious note, a friend of the family recently had her email account hacked (we don’t know how it happened, but it did).  The criminals who did this then proceeded to send fraudulent emails to all the contacts in her address book asking for money.  The good news is that she complained to the Live Mail folks about it and they were able to reclaim the account for her within 24 hours, so hopefully the damage is minimal.  And she’s gone out and changed all her online passwords in case they figured out those passwords while they had access to her email.  Live Mail also has an excellent “what to do when you think your account’s been stolen” resource, which lays out the various options available when this happens.  The local police department also pointed her to the FBI’s Internet Crime Complaint Center; it’s not clear whether engaging them will make a difference (especially if the crooks are international), but it’s something.

    Thinking about Last Check-in Chicken

    Raymond Chen’s post today started me thinking about “Last Check-in Chicken” again.  Back in the days when we were close to shipping Windows Vista, I wrote about “Last Check-in Chicken”.  What I didn’t mention was who ultimately won the game for Windows Vista.

    It turns out that the very last change to Windows Vista was actually made by one of the developers on the sound team.

    When you reach the last few days of a project, the bar for taking changes is insanely high.  The teams that approve changes to the product get increasingly conservative, because every change taken is an opportunity for regression and resets some amount of the testing that has gone before.  So the number of bugs that are accepted towards the end of a product gets smaller and smaller.  You can think of the ability to take bugs as a series of ever-higher barriers.  It starts fairly low – just about any bug fix will be accepted into the tree.  This is the normal state during most of product development.  As time goes on and the team gets closer to shipping, the bug bar gets raised and the only bugs considered are those that are going to affect customers directly (as opposed to bugs found during testing that customers won’t necessarily encounter).  Then the bar gets raised again (and again, and again) until eventually it gets to the point where the only bugs that are accepted are “recall class” bugs[1].

    The idea behind a “recall class bug” is that it’s a bug so bad that we’d be willing to call the manufacturer and pull the product off the assembly line (at a cost of millions of dollars) to fix it.  These are the worst-of-the-worst bugs, and typically involve major scenarios not working.  When the bug bar is at “recall class only”, there are typically only two or three bugs considered each day across all of Windows, and even then most of the bugs brought up to the triage team aren’t accepted.

    At some point the bug bar gets beyond even “recall class only” – this is when you’re REALLY close to being done (typically the last two or three days of a product).  Normally builds of the product are done daily because there are one or two “recall class” bugs still being accepted.  But eventually all those bugs are fixed and the build team stops doing daily builds because there have been no changes since the previous build.  The test team is hard at work doing its final sign-off of the bits and everybody is on tenterhooks waiting for the final build to come out.  When you’re at this stage of the product, every once in a while a change comes in that would be really nice to have because it fixes a critical issue with an important scenario, but it’s just not important enough to justify cracking open the bits to take the change.  Raymond calls this type of change a “Remora Check-in”.  The idea is that if another bug was discovered during the final testing phase that forced us to rebuild the system, we would take these “Remora Check-ins” along for the ride.

    In our case, the change we made was a Remora check-in – it was an important bug, but it wasn’t important enough to justify resetting the final test pass.  But someone else’s component had a critical bug that HAD to be fixed and our change came along for the ride (and no, I don’t remember exactly what either of the changes were, I just know that our check-in was chronologically the last one made).

    Nitpicker’s corner: None of the information in this post should be particularly controversial – much of what I’ve described here is software engineering 101.  There’s always a bar for taking bug fixes in every product – if there weren’t, you’d never ship the product (for example, the Mozilla Foundation shipped Firefox version 3.5 today (congrats!) and they still have several dozen critical bugs active in their database – I’m sure that these are all bugs that didn’t meet their bug bar).  Heck, there’s even a book that’s all about the process of shipping NT 3.1 that covers much of this information.

    ----

    [1] In the past these bugs would be called “Show Stoppers”.

    What’s wrong with this code, part 26 – the answer

    Yesterday I posted a code snippet taken from a real piece of code in the client side of a client/server utility in a Microsoft product.

    static DWORD WINAPI _PlayBeep(__in void* pv)
    {
        UNREFERENCED_PARAMETER(pv);
        PlaySound(L".Default", NULL, SND_SYNC | SND_ALIAS);
        return 0;
    }
    
    LRESULT WndProc(...)
    {
        :
        :
        case WM_KEYDOWN:
            if (!_AcceptInputKeys(wParam, lParam))
            {
                QueueUserWorkItem(_PlayBeep, NULL, 0);
            }
            break;
    }

    The bug here was that the call to PlaySound is synchronous.  Like many Win32 APIs, PlaySound is made thread-safe by taking a critical section around the code that does the work inside the call to PlaySound (since the semantics of PlaySound are that only one PlaySound call plays at any time, this is a reasonable thing to do).  Because the call to PlaySound specifies SND_SYNC, that processing is done on the thread that calls PlaySound and thus blocks the _PlayBeep call until the sound completes.  The “.Default” sound is approximately 1 second long, so the call to _PlayBeep takes about a second to complete.

    Unfortunately keyboards repeat a lot faster than 1 keystroke/second, and each keystroke causes a call to QueueUserWorkItem.  Because there’s no thread available to process the request, the thread pool logic creates a new thread to process the next work item for each keystroke.  This quickly backs up and eventually you run into the 500-thread limit in the default process thread pool.  So you’re going to have the machine play “Dings” for hours before this cleans itself up.
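
    A rough model shows why this falls over so quickly. The sketch below is a simplified simulation in C, not the real thread pool algorithm. It assumes a hypothetical typematic rate of 30 keystrokes per second and a roughly 1 second “.Default” sound. It also assumes that PlaySound's critical section lets only one queued beep finish per second, and that every unfinished work item pins one pool thread. The 500-thread limit comes from the text, and the function name is made up for illustration.

```c
#include <assert.h>

/*
 * Simplified model (an assumption, not the real thread pool algorithm):
 * every pending work item pins one pool thread, invalid keystrokes queue
 * keys_per_second work items, and the PlaySound critical section lets only
 * beeps_per_second of them finish.
 *
 * Returns the number of whole seconds until the pending count reaches
 * thread_limit, or -1 if the backlog never grows.
 */
static int seconds_until_pool_exhausted(int keys_per_second,
                                        int beeps_per_second,
                                        int thread_limit)
{
    assert(keys_per_second >= 0 && beeps_per_second >= 0 && thread_limit > 0);

    if (keys_per_second <= beeps_per_second)
        return -1; /* beeps drain at least as fast as they are queued */

    int pending = 0;
    int seconds = 0;
    while (pending < thread_limit) {
        pending += keys_per_second;  /* QueueUserWorkItem calls this second */
        pending -= beeps_per_second; /* serialized SND_SYNC beeps that finished */
        seconds++;
    }
    return seconds;
}
```

    Under those assumptions, seconds_until_pool_exhausted(30, 1, 500) returns 18. That is in the same ballpark as the roughly 15 seconds of key-hammering it took to break the utility. Any repeat rate faster than one key per second eventually exhausts the pool; only the timing changes.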

    But in this case the effects were even worse.

    Remember when I told you that it was a client/server utility?  In this case, that meant it used async RPC to communicate with the server.  And for certain applications, async RPC uses the default thread pool.  When the response to a server request came in, there were no RPC threads running, so RPC tried to create a thread in the default thread pool, which failed because the default thread pool was full.

    So the client/server utility stopped working.  It took about 15 seconds of hammering on invalid keys to get the utility into this state.  NOT pretty.

    The first fix was to change the WM_KEYDOWN to WM_KEYUP (so that you actually had to release the keys instead of letting the typematic feature repeat for you).  That didn’t work: the problem could still be reproduced, it just took longer.

    The final fix was:

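    static void _PlayBeep(void)
    {
        // SND_ASYNC: PlaySound hands the request to its own worker thread and
        // returns immediately, cancelling any sound that is still playing.
        PlaySound(L".Default", NULL, SND_ASYNC | SND_ALIAS);
    }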
    
    LRESULT WndProc(...)
    {
        :
        :
        case WM_KEYDOWN:
            if (!_AcceptInputKeys(wParam, lParam))
            {
                _PlayBeep();
            }
            break;
    }

    So instead of queueing a work item, the team changed the code to simply call PlaySound with SND_ASYNC, and they were done.  When you specify SND_ASYNC, PlaySound queues the request to a worker thread internal to PlaySound, and it cancels the old sound before it starts playing a new one (so the sounds don’t back up).

    The object lesson here is: Don’t queue long-running items to the thread pool, because it can lead to unexpected results.  And even a 1 second “Ding” can count as “long-running” if it can be queued rapidly enough.
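
    The team's fix was SND_ASYNC, which lets PlaySound itself discard stale sounds. A different, general-purpose way to stop work items from piling up is to coalesce them so that at most one is ever outstanding. Here is a minimal C11 sketch of that alternative. try_begin_beep and end_beep are hypothetical helpers, not part of any Windows API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical coalescing gate: true while a beep is queued or playing. */
static atomic_bool g_beep_pending = false;

/*
 * Called from the window procedure on an invalid keystroke.  Returns true
 * only if no beep was outstanding, meaning the caller should queue one;
 * otherwise this keystroke's beep is simply dropped.
 */
static bool try_begin_beep(void)
{
    bool expected = false;
    /* Atomically flip false -> true; fails if another beep already owns it. */
    return atomic_compare_exchange_strong(&g_beep_pending, &expected, true);
}

/* Called by the work item once its synchronous beep has finished. */
static void end_beep(void)
{
    atomic_store(&g_beep_pending, false);
}
```

    The window procedure would call QueueUserWorkItem only when try_begin_beep() returns true, and the work item would call end_beep() after its synchronous PlaySound returns. However fast the keyboard repeats, at most one pool thread is ever tied up beeping.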

    What’s wrong with this code, part 26 – a real-world example

    This is an example of a real-world bug that was recently fixed in an unreleased Microsoft product.  I was told about the bug because it involved the PlaySound API (and thus they asked me to code review the fix), but it could happen with any application.

    static DWORD WINAPI _PlayBeep(__in void* pv)
    {
        UNREFERENCED_PARAMETER(pv);
        PlaySound(L".Default", NULL, SND_SYNC | SND_ALIAS);
        return 0;
    }
    
    LRESULT WndProc(...)
    {
        :
        :
        case WM_KEYDOWN:
            if (!_AcceptInputKeys(wParam, lParam))
            {
                QueueUserWorkItem(_PlayBeep, NULL, 0);
            }
            break;
    }

    This is actual code from inside the client side of a client/server component in Windows that was attempting to “beep” on invalid input (I’ve changed the code slightly to hide the actual origin and undoubtedly introduced issues).  And it has a whopper of a bug in it.

    Given the simplicity of the code above, to get the answer right, it’s not enough to say what’s wrong with the code (the problem should be blindingly obvious).  You also need to be able to explain why this is so bad (in other words, what breaks when you do this).

    Bonus points if you can identify the fix that was eventually applied.

    Windows 7 fixes the PlaySound(XXX, SND_MEMORY|SND_ASYNC) anti-pattern

    A number of times in the past, I’ve mentioned that the PlaySound(xxx, xxx, SND_MEMORY|SND_ASYNC) pattern is almost always a bad idea.  After the last wave of crash dumps for this problem was received, our team decided to do something about it.  Starting with Windows 7, if you call PlaySound with SND_MEMORY|SND_ASYNC, instead of relying on the memory passed in by the application, we allocate our own buffer for the sound file on the heap and copy the file into that buffer.  We’ll only do it for WAV files that are smaller than 2 MB, and if the allocation of the buffer fails, we fall back on the original code path, but it should dramatically reduce the number of apps that crash while using this pattern.

    It’s a little thing, but it should make life much easier for those applications.
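
    The idea behind the fix is ordinary defensive copying: take a private snapshot of the caller's buffer before returning, so the application can free its memory whenever it likes. The sketch below only illustrates that idea in portable C; it is not the actual Windows 7 code. It assumes the size of the WAV image is already known and reads “2M” as 2 * 1024 * 1024 bytes. The name snapshot_wave_buffer is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed cutoff; the text only says "smaller than 2M". */
#define SNAPSHOT_LIMIT ((size_t)2 * 1024 * 1024)

/*
 * Hypothetical helper: returns a heap copy of the caller's WAV image that
 * the asynchronous player owns (and frees when playback ends), or NULL if
 * the image is too large or the allocation fails.  On NULL the caller falls
 * back to the original code path and plays from the caller's memory.
 */
static void *snapshot_wave_buffer(const void *wave, size_t size)
{
    if (wave == NULL || size == 0 || size >= SNAPSHOT_LIMIT)
        return NULL;

    void *copy = malloc(size);
    if (copy == NULL)
        return NULL; /* allocation failed: use the original code path */

    memcpy(copy, wave, size);
    return copy;
}
```

    A caller that gets NULL back keeps using the application's memory, exactly like the fallback described above. Otherwise the asynchronous player owns the copy and frees it when playback finishes.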

    Good News! strlen isn’t a banned API after all.

    We were doing some code reviews on the new Win7 SDK samples the other day and one of the code reviewers noticed that the code used wcslen to compute the length of a string.

    He pointed out that the SDL Banned API page calls out strlen/wcslen as being banned APIs:

    For critical functions, such as those accepting anonymous Internet connections, strlen must also be replaced:

    Table 19. Banned string length functions and replacements

    Banned APIs                                          StrSafe Replacement  Safe CRT Replacement
    strlen, wcslen, _mbslen, _mbstrlen, StrLen, lstrlen  String*Length        strnlen_s

    I was quite surprised to see this, since I’m not aware of any issues where the use of strlen/wcslen could cause security bugs.

    I asked Michael Howard about this and his response was that Table 19 has a typo – the word “server” is missing in the text, it should be “For critical server functions, such as those accepting anonymous Internet connections, strlen must also be replaced”. 

    Adding that one word makes all the difference.  And it makes sense – if you’re a server accepting anonymous data over the internet, an attacker could cause you to crash by sending a long enough string that isn’t null-terminated; banning the API forces the developer to think about the length of the string.
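
    “Thinking about the length” in practice means never reading past the end of the buffer you were actually handed. strnlen_s lives in the optional Annex K of C11, which many C libraries don't ship, so this sketch uses a hand-written stand-in with the same contract. The name bounded_strlen is hypothetical.

```c
#include <stddef.h>

/*
 * Returns the length of the NUL-terminated string in buf, but never
 * examines more than max_len bytes: if no terminator appears in that
 * range it returns max_len.  Like strnlen_s, a NULL buf yields 0.
 */
static size_t bounded_strlen(const char *buf, size_t max_len)
{
    if (buf == NULL)
        return 0;

    size_t n = 0;
    while (n < max_len && buf[n] != '\0')
        n++;
    return n;
}
```

    A server would pass the number of bytes it actually received as max_len. A peer that leaves off the terminator then produces a truncated length instead of a read off the end of the buffer.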

    Somewhat OT, but I also think that the table is poorly formatted – the “For critical…” text should be AFTER the table title – the way the text is written, it appears to be a part of the previous section instead of being attached as explanatory text on Table 19 (but that’s just the editor in me).

    Apparently in SDL v5.0 (which hasn’t yet shipped) the *len functions are removed from the banned API list entirely.

    What was Valorie doing last weekend? Competing at the Sweet Adelines Region 13 competition…

    And her performance video just got posted to YouTube…

    They came in 8th out of 15 in the competition and won best novice quartet. 

    And they rocked :).

    Well Hello Daniel!

    Gaarggghh.  I can't believe I didn't write about this.

    Tonight is the first night of previews for Daniel's first show on a professional stage.  He's appearing as a performance intern at the 5th Avenue Theatre's production of "Hello Dolly!" starring Jenifer Lewis and Pat Cashman.

    Daniel has been working his butt off for the past 3 weeks rehearsing every day for the show (12 hours a day for the past week) and tonight the curtain rises for the first time on the production.

    As a performance intern, he is a member of the ensemble.  Most of the time he sits backstage in a booth with the other interns providing vocal support for the cast, but he IS on stage for one big number (Before the Parade Passes By).

    Valorie and I aren't seeing the show until Thursday evening (the official opening night) but I can't wait to see it.  I've been hearing so many great things about this production (especially the Waiters Gallop scene in the Harmonia Gardens - Daniel says it's completely insane) so I'm really looking forward to it.

    Ok, enough gushing.  Come see the show, it should be great!

    Delay Load is not a good way to check for functionality

    On my previous post, Koro made the following comment:

    “Don't ever check windows versions.  Instead check for functionality being present or not."

    You can't always do that.

    Do I want to add a __try/__except to catch delay-load exceptions around every UxTheme call or just do:

    g_bTheme=(g_bWinNT&&(g_nWinVer>0x00050001));

    Then check that flag before calling OpenThemeData?

    In some other cases too (all the Crypt Hash functions - trying to compute an MD5) the functions is documented as working fine in Win98 but it just fails - there is no way to know except of checking the version beforewards.

    At least, as implied earlier, I just pack the Windows version in a DWORD at program startup to avoid nasty version comparision errors.

    IMHO Koro’s misusing the delayload functionality.

    DelayLoad is primarily a performance tool – when you DelayLoad a function, a tiny stub is inserted into your application that calls LoadLibrary/GetProcAddress to resolve the function the first time it’s called.  That means that when your application is launched, the loader doesn’t have to load the DLL or resolve references to the DelayLoaded function, and thus your application will launch faster.

    For example, many components in the OS delay load WINMM.DLL because all they use in WINMM is the PlaySound API, and even then they only use it in relatively rare circumstances.  By delayloading WINMM.DLL they avoid the performance penalty of having WINMM.DLL loaded into their address space until it’s needed.

    As Koro mentioned, DelayLoad can also be used as a mechanism to check whether a particular piece of OS functionality is present, but the challenge is that now you need to wrap every API call with an exception handler (or you need to specify a delay load handler that provides a more reasonable default behavior).  Personally I wouldn’t do that – instead I’d manually call LoadLibrary/GetProcAddress to load the required functions, because it gives you complete control over your error handling.  It also allows you to avoid using structured exception handling (which should be avoided if at all possible).
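
    Here is a minimal sketch of the explicit LoadLibrary/GetProcAddress approach in C. It probes once and caches the answer, so the rest of the program can key off a flag. It assumes the only optional function needed is IsThemeActive from uxtheme.dll. The helper names are hypothetical, and on non-Windows builds the sketch simply treats the function as absent so that it still compiles.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>

/* IsThemeActive is exported by uxtheme.dll as BOOL WINAPI IsThemeActive(void). */
typedef BOOL (WINAPI *is_theme_active_fn)(void);

/* Hypothetical helper: explicit load, no delay-load stub, no SEH. */
static is_theme_active_fn resolve_is_theme_active(void)
{
    /* The module is intentionally never freed because the result is cached. */
    HMODULE uxtheme = LoadLibraryW(L"uxtheme.dll");
    if (uxtheme == NULL)
        return NULL; /* DLL not present on this OS */
    return (is_theme_active_fn)GetProcAddress(uxtheme, "IsThemeActive");
}
#else
/* Non-Windows build: model the optional API as not present. */
typedef int (*is_theme_active_fn)(void);

static is_theme_active_fn resolve_is_theme_active(void)
{
    return NULL;
}
#endif

static int g_theme_probe_done = 0;
static int g_themes_enabled = 0;

/* Probe once (single-threaded startup assumed), then answer from the cache. */
static int themes_enabled(void)
{
    if (!g_theme_probe_done) {
        is_theme_active_fn is_theme_active = resolve_is_theme_active();
        g_themes_enabled = (is_theme_active != NULL) && is_theme_active();
        g_theme_probe_done = 1;
    }
    return g_themes_enabled;
}
```

    Call sites then check themes_enabled() before calling OpenThemeData and friends. No delay-load stub and no structured exception handling are involved: a missing DLL or export is just a NULL return.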

    If you DO need to use DelayLoad as a functionality check, you could try this trick (which works only for Koro’s problem).  Instead of wrapping all the theme API calls with SEH, you just add code to your app like this (I haven’t compiled this code, it’s just an example):

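    BOOL g_EnableThemes = FALSE;
    __try
    {
        // The first call into the delay-loaded UxTheme DLL resolves it; if the
        // DLL or export is missing, the delay-load helper raises an exception.
        g_EnableThemes = IsThemeActive();
    }
    __except(<Your Exception Filter>)
    {
        // Themes unavailable: leave g_EnableThemes set to FALSE.
    }
    if (g_EnableThemes)
    {
        g_ThemeHandle = OpenThemeData(…);
    }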

    In other words, check once (with an exception handler) whether the functionality is enabled, and from then on use just the g_EnableThemes global variable to key off the behavior.

    But this doesn’t change the fact that (IMHO) you’re abusing the DelayLoad functionality and using it as a versioning mechanism.

    Checking file versions is surprisingly hard.

    I was wandering around the web the other day and ran into this post.  In general I don’t have many issues with the post, until you get to the bottom of the article.  The author mentions that his code only runs on Win7 or newer, so he helpfully included a check to make sure that his code only runs on Win7:

    // Example in C#.
    
    internal bool SupportsTaskProgress() {
        if (System.Environment.OSVersion.Version.Major >= 6) {
            if (System.Environment.OSVersion.Version.Minor >= 1) {
                return true;
            }
        }
        return false;
    }

    This is a great example of why it’s so hard to write code that checks for versions.  The problem here is that this code is highly likely to fail on the next version of Windows (or, more precisely, whenever a version of Windows numbered 7.0 is released): the major version check passes but the minor version check fails.  In that case SupportsTaskProgress will incorrectly return false.

    Personally I wouldn’t even bother writing the SupportsTaskProgress function this way.  Instead I’d check whether the “new TaskbarLib.TaskbarList()” call fails and assume that if it failed, the API wasn’t supported (the non-COM-interop equivalent would be to check for a failure on the call to CoCreateInstance).  That way the code would work even if (for some obscure reason) the taskbar logic was ported to a previous OS version.

    If I simply HAD to keep the SupportsTaskProgress function, I’d rewrite it as:

    // Example in C#.
    
    internal bool SupportsTaskProgress() {
        if (System.Environment.OSVersion.Version.Major >= 6) {
            if (System.Environment.OSVersion.Version.Major == 6) {
                if (System.Environment.OSVersion.Version.Minor >= 1) {
                    return true;
                }
                return false;
            }
            return true;
        }
        return false;
    }

    That way it would only check the minor version (for being greater than or equal to 1) if the major version is 6.  I suspect that this code could be tightened up further as well.
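
    One way to tighten it up is to treat the version as a lexicographic pair: compare the major versions first and look at the minor version only when the majors are equal. The original example is C#, but the logic is language-neutral. Here it is as a small C sketch with hypothetical function names. It also shows the equivalent trick of packing major and minor into one 32-bit number, so that a single integer comparison does the work.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if major.minor is at least req_major.req_minor. */
static bool version_at_least(uint32_t major, uint32_t minor,
                             uint32_t req_major, uint32_t req_minor)
{
    if (major != req_major)
        return major > req_major; /* a newer major version always qualifies */
    return minor >= req_minor;    /* same major: the minor version decides */
}

/*
 * Same ordering as a single integer: major in the high 16 bits, minor in
 * the low 16 bits.  Assumes both parts fit in 16 bits.
 */
static uint32_t pack_version(uint32_t major, uint32_t minor)
{
    return (major << 16) | (minor & 0xFFFFu);
}
```

    With this helper, SupportsTaskProgress reduces to version_at_least(major, minor, 6, 1), which keeps returning true for 7.0 and beyond. The packed form preserves the ordering only as long as both parts fit in 16 bits.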

    This is a part of the reason that picking a version number for the OS is so complicated.

    PlaySound(xxx, SND_MEMORY | SND_ASYNC) is almost always a bad idea.

    Whenever you submit a crash report to OCA, a bug gets filed in the relevant product database and gets automatically assigned to the developer responsible for the code.  I had a crashing bug in the PlaySound API assigned to me. 

    In this case, the call was crashing deep inside of the waveOutOpen API and it was crashing because the input WAVEFORMATEX structure was bogus.  The strange thing is that the PlaySound API does some fairly thorough validation of the input WAVEFORMATEX read from the .WAV file and that validation had to have passed to get to the call to waveOutOpen.

    I looked a bit deeper and came to the realization that every single one of the crashes (in maybe a dozen different applications) had specified SND_MEMORY | SND_ASYNC in their call to PlaySound.

    I’ve talked about that particular combination before in my blog, but I wanted to call it out in a top level post in the hopes that people will stop making this common mistake.

    When you call PlaySound with the SND_MEMORY flag, it tells the PlaySound API that instead of reading the audio data from a file, you’re passing in a pointer to memory which holds the wave contents for you.  That’s not controversial, and can be quite handy if (for instance) you want to build a .WAV file in memory instead of calling the wave APIs directly.

    When you call PlaySound with the SND_ASYNC flag, that flag tells PlaySound to return immediately instead of blocking until the sound has finished playing.

    Neither of these flags is controversial and neither of them is particularly dangerous until you combine the two together.

    The problem is that there’s really no way of knowing when the sound has finished playing, so when the application frees the memory, it’s entirely possible that the PlaySound API is still using it.  That means that if you ever call PlaySound with both of these flags, you stand a very high chance of crashing.

    The unfortunate thing is that this behavior has existed since the SND_MEMORY flag was added back in Windows 3.1.  The only safe way of dealing with this that works on all current Windows operating systems is to call PlaySound(NULL, 0, 0) before freeing the memory – the call to PlaySound(NULL, 0, 0) will block until the currently playing sound has completed playing (or abort the playsound if it hasn’t started yet).

    Everyone wants a shiny new UI

    Surfing around the web, I often run into web sites that contain critiques of various aspects of Windows UI.

    One of the most common criticisms on those sites is "old style" dialogs.  In other words, dialogs that don't have the most up-to-date theming.  Here's an example I ran into earlier today:

    [Screenshot: an old-style AutoComplete dialog]

    Windows has a fair number of dialogs like this - they're often fairly old dialogs that were written before new theming elements were added (or contain animations that predate newer theming options).  They all work correctly but they're just ... old.

    Usually the web site wants the Windows team to update the dialog to match the newest styling because the dialog is "wrong".

    Whenever someone asks (or more often insists) that the Windows team update their particular old dialog, I sometimes want to turn around and ask them a question:

    "You get to choose: You can get this dialog fixed OR you can cut a feature from Windows, you can't get both.  Which feature in Windows would you cut to change this dialog?"

    Perhaps an automotive analogy would help explain my rather intemperate reaction:

    One of the roads near my house is a cement road and the road is starting to develop a fair number of cracks in it.  The folks living near the road got upset at the condition of the road and started a petition drive to get the county to repair the road.  Their petition worked and the county came out a couple of weeks later, inspected the road, and rendered its verdict on the repair (paraphrasing):  We've looked at the road surface and it is 60% degraded.  The threshold for immediate repairs on county roads is 80% degradation.  Your road was built 30 years ago and cement roads in this area have a 40 year expected lifespan.  Since the road doesn't meet our threshold for immediate repair and it hasn't met the end of its lifespan, we can't justify moving this section of road up ahead of the hundreds of other sections of road that need immediate repair.

    In other words, the county had a limited budget for road repairs and there were a lot of other sections of road in the county that were in a lot worse shape than the one near my house.

    The same thing happens in Windows - there are thousands of features in Windows and a limited number of developers who can change those features.   Changing a dialog does not happen for free.  It takes time for the developers to fix UI bugs.  As an example, I just checked in a fix for a particularly tricky UI bug.  I started working on that fix in early October and it's now January.

    Remember, this dialog works just fine; it's just a visual inconsistency.  But it's going to take a developer some amount of time to fix the dialog.  Maybe it's only one day.  Maybe it's a week.  Maybe the fix requires coordination between multiple people (for example, changing an icon usually requires the time of both a developer AND a graphic designer).  That time could be spent fixing other bugs.  Every feature team goes through a triage process on incoming bugs to decide which bugs they should fix.  They make choices based on their limited budget (if there are n developers on the team and m bugs to fix, and each bug takes t time to fix on average, then it will take (m*t)/n time to fix all of those bugs before we can ship).
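The triage arithmetic above can be sketched in a few lines of C: m bugs times t average time per bug, divided by n developers, gives the time needed to clear the list. This is a deliberately simplified model that assumes the work divides evenly across developers; the function name and the sample numbers are hypothetical.

```c
#include <assert.h>

/* Simplified triage model: m bugs, t days per bug on average, n developers
   working in parallel. Returns (m * t) / n days. Assumes the work splits
   evenly across developers, which real bugs rarely do. */
static double days_to_fix_backlog(double m_bugs, double t_days_per_bug, double n_devs)
{
    assert(n_devs > 0);
    return (m_bugs * t_days_per_bug) / n_devs;
}
```

With hypothetical numbers, `days_to_fix_backlog(200, 1.5, 10)` is 30 days. Every additional one-day fix, such as re-theming a working dialog, adds 1/n of a day to the whole team's schedule, and that time comes out of some other bug.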

    Fixing a theming bug like this takes time that could be spent fixing other bugs.  And (as I've said before) the dialog does work correctly; it's just outdated.

    So again I come back to the question: "Is fixing a working but ugly dialog really more important than all the other bugs?"  It's unfortunate but you have to make a choice.

     

    PS: Just because we have to make choices like this doesn't mean that you shouldn't send feedback like this.  Just like the neighbors complaining to the county about the road, it helps to let the relevant team know about the issue.  Feedback like this is invaluable for the Windows team (that's what the "Send Feedback" link is there for, after all).  Even if the team decides not to fix a particular bug in this release, it doesn't mean that it won't be fixed in the next release.

    Fixing an accessibility bug with the trackbar common control

    The trackbar common control is a strange beast. 

    The trackbar can be oriented either horizontally or vertically.  On LTR language machines, when the trackbar is horizontal, it works much as you’d expect it to: The minimum value of the trackbar is on the left, the maximum value is on the right (it’s reversed for RTL languages so it works consistently).

    When the trackbar is vertical, it’s a different beast entirely.  For whatever reason, the trackbar control designers set the minimum value of the trackbar at the top, the maximum value at the bottom.  If you think about how the trackbar designers actually implemented the trackbar this makes sense.  If you orient the trackbar so that the minimum value is at the top, then when you need to draw the trackbar you can use the same drawing code to draw the horizontal trackbar – you just swap the X and Y axes (I have absolutely no idea if that’s how it works internally, it just seems to make sense that way).

    While this works great for the implementer, for the consumer of the trackbar, it’s a pain in the neck.  That’s because the users who interact with the control expect the maximum value to be at the top of the control, not the bottom.

    As a result, when you look at code that uses the trackbar control, you end up seeing a lot of:

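    LRESULT MyDialog::OnNeedTTText(int idCtrl, LPNMHDR pnmh)
    {
        LPTOOLTIPTEXT pTT = (LPTOOLTIPTEXT)pnmh;
        if (idCtrl == IDD_TRACKBAR)
        {
            // TBM_GETPOS must be sent to the trackbar control, not to the dialog.
            int nPos = (int)SendDlgItemMessage(IDD_TRACKBAR, TBM_GETPOS, 0, 0);
    
            // The vertical trackbar puts its minimum at the top, so flip 0..100 to 100..0.
            StringCchPrintf(pTT->szText, ARRAYSIZE(pTT->szText), TEXT("%d"), 100 - nPos);
        }
        return 0;
    }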

    In other words, retrieve the position from the trackbar, convert it from 0..100 to 100..0 and return that as the tooltip text.
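The `100 - nPos` pattern is a special case of mirroring a position within the trackbar's range. A minimal portable sketch, assuming an arbitrary range such as one set with TBM_SETRANGE (the helper name is hypothetical, not part of the common controls API):

```c
#include <assert.h>

/* Convert a raw vertical-trackbar position (minimum at the top) into the value
   a user expects to see (maximum at the top) by mirroring it within the range.
   For a 0..100 range this reduces to 100 - pos. */
static int user_value_from_trackbar_pos(int pos, int range_min, int range_max)
{
    assert(range_min <= pos && pos <= range_max);
    return range_min + range_max - pos;
}
```

Every place that reads or writes the position needs this conversion, which is why these expressions end up scattered through the code.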

    All of this works great – you have a lot of 100 - <n> scattered throughout your code, but it’s not the end of the world.  And then one day a tester comes to you and says that when he uses the Narrator tool to read the contents of your dialog, the value it reports for the control is wrong – when the slider’s at the top (tooltip 100), Narrator says the value is 0, and when it’s at the 1/4 point (tooltip 75), Narrator says the value is 25.

    Crud.  At this point many developers start scratching their heads and start thinking about subclassing the trackbar to replace the reported position.  However that’s way more work than they need to do.

    It turns out that the designers of the trackbar control thought of this problem.  If you’re using version 5.8 or higher of the common controls, you can apply the TBS_REVERSED style to your trackbar.  If you do, the visuals and behavior of the trackbar are unchanged, but the Microsoft accessibility framework looks for the TBS_REVERSED style; if it is present, the framework assumes that the control is “backwards” and reports the trackbar’s position as if the maximum and minimum values were reversed.

    And no, I didn’t know about this before today.  But it was too good a trick not to share.

    When you do UX work, sometimes you have to worry about the strangest things…

    I recently got a bug reported to me about the visuals in the sound control panel applet not being aligned properly (this is from the UI for a new Windows 7 feature):

    [Image: the sound control panel layout as reported in the bug]

    The problem as reported was that the microphone was aligned incorrectly w.r.t. the down arrow – the microphone was too far to the right.

     

    But if you look carefully, you’ll see that that isn’t the case – drawing a box around the controls makes it clearer:

    [Image: the same controls with boxes drawn around them]

    Nitpickers Corner: For those of you who love to count pixels, it’s entirely possible that the arrow might be off by a couple of pixels, but fixing that wouldn’t fix the problem, because then the arrow would be off-center with respect to the speakers.  The real problem is that the microphone icon is visually weighted to the right – the actual icon resource was lined up with the arrow, but because its visual weight was to the right, it displayed poorly.

     

    It turns out that there’s really no good way of fixing this – if we were to adjust the location of the icons, it wouldn’t help, because a different device would have a different visual center (as the speaker icon does)…
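One rough way to see why each device icon has its own "visual center" is to compare an icon's geometric center with the alpha-weighted centroid of its pixels. This is only an illustrative approximation (an assumption, not how the Windows team evaluated the layout), and the function name is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Approximate the horizontal "visual center" of an icon by treating each
   pixel's alpha as its weight and returning the alpha-weighted mean x
   coordinate (using pixel centers, x + 0.5). The alpha mask is row-major,
   width * height bytes. Falls back to the geometric center for an empty mask. */
static double visual_center_x(const unsigned char *alpha, size_t width, size_t height)
{
    assert(width > 0 && height > 0);
    double weighted = 0.0, total = 0.0;
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            unsigned char a = alpha[y * width + x];
            weighted += a * ((double)x + 0.5);
            total += a;
        }
    }
    return total > 0.0 ? weighted / total : (double)width / 2.0;
}
```

For a mask whose opaque pixels lean right, the result lands to the right of `width / 2.0` even though the bitmap itself is perfectly centered, and a different icon yields a different offset. That is why nudging icon positions for one device wouldn't carry over to another.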

     

    Instead, we looked at the visuals and realized that there was an alternative solution: Adjust the layout for the dialog and the problem more-or-less goes away:

    [Image: the revised layout, with the devices arranged horizontally]

    The problem still exists at some level because the arrow is centered with the icons but some icons (like the stalk microphone above) are bottom heavy.  But for whatever reason, the visuals aren’t as disconcerting when laid out horizontally.

    As I said in the title – sometimes you need to worry about the strangest things.

    XKCD tries Windows 7 and loves it!

     

     

    Well I thought it was cute :).

    Why do people think that a server SKU works well as a general purpose operating system?

    Sometimes the expectations of our customers mystify me.

     

    One of the senior developers at Microsoft recently complained that the audio quality on his machine (running Windows Server 2008) was poor.

    To me, it’s not surprising.  Server SKUs are tuned for high performance in server scenarios; they’re not configured for desktop scenarios.  That’s the entire POINT of having a server SKU – one of the major differences between server SKUs and client SKUs is that the client SKUs are tuned to balance the OS in favor of foreground responsiveness and the server SKUs are tuned in favor of background responsiveness (after all, it’s a server, there’s usually nobody sitting at the console, so there’s no point in optimizing for the console).

     

    In this particular case, the documentation for the MMCSS service describes a large part of the root cause for the problem:  The MMCSS service (which is the service that provides glitch-resilient services for Windows multimedia applications) is essentially disabled on server SKUs.  It’s just one of probably hundreds of settings that are tweaked in favor of server responsiveness on server SKUs.

     

    Apparently we’ve got a bunch of support requests coming in from customers who are running server SKUs on their desktop and are upset that audio quality is poor.  And this mystifies me.  It’s a server operating system – if you want client operating system performance, use a client operating system.

     

     

    PS: To change the MMCSS tuning options, you should follow the suggestions from the MSDN article I linked to above:

    The MMCSS settings are stored in the following registry key:

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Multimedia\SystemProfile

    This key contains a REG_DWORD value named SystemResponsiveness that determines the percentage of CPU resources that should be guaranteed to low-priority tasks. For example, if this value is 20, then 20% of CPU resources are reserved for low-priority tasks. Note that values that are not evenly divisible by 10 are rounded up to the nearest multiple of 10. A value of 0 is also treated as 10.

    On Windows Vista this value is set to 20; on Windows Server 2008 it is set to 100 (which disables MMCSS).
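The rounding rule quoted from MSDN is easy to misread, so here is a minimal sketch of it in C. It only models the documented arithmetic; the function name is hypothetical, and it assumes the value is a percentage in the 0..100 range.

```c
#include <assert.h>

/* Model of the quoted SystemResponsiveness rule: values not evenly divisible
   by 10 round up to the next multiple of 10, and 0 is treated as 10.
   Assumes a percentage in the range 0..100. */
static unsigned effective_system_responsiveness(unsigned value)
{
    assert(value <= 100);
    if (value == 0)
        return 10;
    return ((value + 9) / 10) * 10;
}
```

So a client default of 20 stays 20, a value of 15 behaves like 20, and the server default of 100 reserves all CPU resources for low-priority tasks, leaving MMCSS nothing to guarantee.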

    Fixing a customer problem: “No Audio Device is Installed” when launching sndvol on Windows Vista

    Yesterday someone forwarded me an email from one of our DirectShow MVPs – he was having problems playing audio on his Windows Vista machine.

     

    Fortunately David (the MVP) had done most of the diagnostic work – the symptoms he saw were that he was receiving a “No Audio Device is Installed” error launching sndvol (and other assorted problems). 

    David tried the usual things, like confirming that the driver for his audio solution was correctly installed (this probably fixes 99% of the problems).  He also tried reinstalling the driver, to no avail.

    He next ran the Sysinternals Process Monitor tool to see what was going on, and very quickly found the following line in its output:

    "RegOpenKey", "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\MMDevices\Audio\Render\{e4ee1234-fc70-4925-94e9-4117395f7995}", "ACCESS DENIED", "Desired Access: Write"

    With that information, he looked at the ACL on that registry key:

    [Image: the ACL on that registry key]

    He then looked at the configuration for the Windows Audio service:

    [Image: the Windows Audio service configuration]

    Woah – the Windows Audio service doesn’t have access rights to that registry key – the Windows Audio service is running as LocalService and the LocalService account doesn’t have any access to the registry key.

    At this point he decided to contact Microsoft with his problem.

    I looked at his info and quickly realized that the problem was that somehow the ACL on the registry key had been corrupted: something had removed the entries for the audio services.  On a normal Windows Vista installation this registry key’s ACL should look something like:

    [Image: the default ACL on the endpoint registry key]

    Something that ran on David’s machine went in and reset the permissions for this registry key to the ACL that is on the root node of the HKEY_LOCAL_MACHINE\Software registry hive.  I have no idea what did this, but messing with the ACLs on the registry is a known cause of various compatibility problems.  That’s why Microsoft KB 885409 has such strong warnings about why it’s important to not apply blind modifications to files or registry keys in Windows.  It’s unfortunate, but the warnings in the KB articles that say that modifying registry keys or permissions can cause your machine to malfunction are absolutely right – it’s not hard to make modifications to registry keys that can really screw up a machine, if you make the right ones.  From the KB article:

    For example, modifications to registry ACLs affect large parts of the registry hives and may cause systems to no longer function as expected. Modifying the ACLs on single registry keys poses less of a problem to many systems. However, we recommend that you carefully consider and test these changes before you implement them. Again, we can only guarantee that you can return to the recommended out-of-the-box settings if you reformat and reinstall the operating system.

    The good news is that it should be relatively simple to fix David’s problem – as far as I know, he has two options.  The first is to reinstall Windows Vista – that should reset the ACLs on the property keys to their default values (because it will recreate the property keys), which should resolve the problem.

    The second solution is to add an entry to the ACL on the registry keys under the MMDevices registry key that gives the LocalService account permission to modify those keys.

    "The Story about Ping", someone else's book review

    I was looking up old favorite children's books on Amazon the other day and I ran into one of my childhood favorites: The Story about Ping, the story of a family of ducks living on the Yangtze River in China...

    I eventually scrolled down and noticed the first review of the book by John E. Fracisco:

    Ping! I love that duck!, January 25, 2000

    By 
    John E. Fracisco (El Segundo, CA USA)

    This review is from: The Story about Ping (Hardcover)

    PING! The magic duck!

    Using deft allegory, the authors have provided an insightful and intuitive explanation of one of Unix's most venerable networking utilities. Even more stunning is that they were clearly working with a very early beta of the program, as their book first appeared in 1933, years (decades!) before the operating system and network infrastructure were finalized.

    The book describes networking in terms even a child could understand, choosing to anthropomorphize the underlying packet structure. The ping packet is described as a duck, who, with other packets (more ducks), spends a certain period of time on the host machine (the wise-eyed boat). At the same time each day (I suspect this is scheduled under cron), the little packets (ducks) exit the host (boat) by way of a bridge (a bridge). From the bridge, the packets travel onto the internet (here embodied by the Yangtze River).

    The title character -- er, packet, is called Ping. Ping meanders around the river before being received by another host (another boat). He spends a brief time on the other boat, but eventually returns to his original host machine (the wise-eyed boat) somewhat the worse for wear.

    If you need a good, high-level overview of the ping utility, this is the book. I can't recommend it for most managers, as the technical aspects may be too overwhelming and the basic concepts too daunting.

    Problems With This Book

    As good as it is, The Story About Ping is not without its faults. There is no index, and though the ping(8) man pages cover the command line options well enough, some review of them seems to be in order. Likewise, in a book solely about Ping, I would have expected a more detailed overview of the ICMP packet structure.

    But even with these problems, The Story About Ping has earned a place on my bookshelf, right between Stevens' Advanced Programming in the Unix Environment, and my dog-eared copy of Dante's seminal work on MS Windows, Inferno. Who can read that passage on the Windows API ("Obscure, profound it was, and nebulous, So that by fixing on its depths my sight -- Nothing whatever I discerned therein."), without shaking their head with deep understanding. But I digress.

    'nuf said :).

     

    Edit 1: It appears that this review might not be original: http://ftp.arl.mil/~mike/ping.html

    Edit 2: I just realized that this is 2 "Duck" posts in a row.  I guess I've been living that feature too much :)

     

    The ducking whitepaper is now online

    I just got an email indicating that the powers that be have just published the Ducking (automatic volume adjustment for communications applications) whitepaper that I mentioned during my PDC Talk.    It's the whitepaper entitled "Stream Attenuation in Windows 7".  This is a preliminary version of the final ducking documentation and should be good enough to help anyone wanting to work with the ducking feature get started.  It also covers some of the subtleties that I wasn’t able to cover in my talk.

    Enjoy!

     

    EDIT: Added which paper it was and a definition of "Ducking".

     

    I get more spam :)

    I just received this phishing letter.  I liked it simply because it was so remarkably brazen:

    --

    Dear Webmail User,

    This message was sent automatically by a program on Webmail which periodically checks the size of inbox, where new messages are received. The program is run weekly to ensure no one's inbox grows too large. If your inbox becomes too large, you will be unable to receive new email.

    Just before this message was sent, you had 18 Megabytes (MB) or more of messages stored in your inbox on Webmail. To help us re-set your SPACE on our database prior to maintain our INBOX, you must reply

    to this e-mail and enter your Current UserID: ( ) and

    Password ( ) Select server ( ) if any

    You will continue to receive this warning message periodically if your inbox size continues to be between 18 and 20 MB. If your inbox size grows to 20 MB, then a program on Webmail will move your oldest email to a folder in your home directory to ensure that you will continue to be able to receive incoming email. You will be notified by email that this has taken place. If your inbox grows to 25 MB, you will be unable to receive new email as it will be returned to the sender.After you read a message, it is best to REPLY and SAVE it to another folder.

    Thank you for your cooperation.

    Webmail Help Desk

    ---------------------------------------------------------------------------

    3webXS HiSpeed Dial-up...surf up to 5x faster than regular dial-up alone...

    just $14.90/mo...visit www.get3web.com for details

     

    The email was in plain text from “Webmail Service Support [general@3web.net]” (I don’t feel bad about including their real email address in a post on the web; after all, they deserve to get spam, right?).

     

    As I said, I thought it was remarkably brazen and very low budget.  Why bother trying to set up a domain when you can get the victim to send you their credentials by email :).

    What’s wrong with this code, part 25 – the answers

    Yesterday I described a very real bug in some of the Windows UI.

    CControlLayout::CControlLayout(const HWND hWndControl, const HWND hWndDlg)
        : m_hWnd(hWndControl)
        , m_hWndDlg(hWndDlg)
    {
        // Get the parent (dialog) rect, and the control rect
        ::GetClientRect(m_hWndDlg, &m_rcRefDlg);
        ::GetWindowRect(m_hWnd, &m_rcRef);
        ScreenToClientRect(hWndDlg, m_rcRef);
    }
    void ScreenToClientRect(/* [in] */ const HWND hWndClient,
                            /* [in/out] */ RECT &rcInOut)
    {
        CPoint ptTopLeft(rcInOut.left, rcInOut.top); 
        CPoint ptBottomRight(rcInOut.right, rcInOut.bottom); 
        ::ScreenToClient(hWndClient, &ptTopLeft); 
        ::ScreenToClient(hWndClient, &ptBottomRight); 
        rcInOut.left = ptTopLeft.x; 
        rcInOut.top = ptTopLeft.y; 
        rcInOut.right = ptBottomRight.x; 
        rcInOut.bottom = ptBottomRight.y;
    }

    And as David Gladfelter pointed out, the root cause of the problem is that the routine calls ScreenToClient.  This works just fine when you’re running on Left-to-Right builds of Windows, but on Right-to-Left languages (Arabic, Hebrew, etc.), where the dialog is mirrored, this code swaps the left and right edges of the rectangle, so rcInOut.left ends up at the wrong location.

    It turns out that MSDN has a warning that is explicitly about this kind of problem:

    For example, applications often use code similar to the following to position a control in a window:


    // DO NOT USE THIS IF APPLICATION MIRRORS THE WINDOW
    
    // get coordinates of the window in screen coordinates
    GetWindowRect(hControl, (LPRECT) &rControlRect);  
    
    // map screen coordinates to client coordinates in dialog
    ScreenToClient(hDialog, (LPPOINT) &rControlRect.left); 
    ScreenToClient(hDialog, (LPPOINT) &rControlRect.right);

    This causes problems in mirroring because the left edge of the rectangle becomes the right edge in a mirrored window, and vice versa. To avoid this problem, replace the ScreenToClient calls with a call to MapWindowPoints as follows:


    // USE THIS FOR MIRRORING
    
    GetWindowRect(hControl, (LPRECT) &rControlRect);
    MapWindowPoints(NULL, hDialog, (LPPOINT) &rControlRect, 2);

    It turns out that this is explicitly the mistake that was made in the code.  The good news is that the “Use this for mirroring” code listed in the article is exactly the fix necessary to solve this problem.
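To see why converting the two corners one at a time goes wrong, here is a toy model in portable C rather than real Win32 code; the `Rect` and `ClientWindow` types and all function names are hypothetical. It assumes that in a mirrored window client x = 0 sits at the right edge of the client area, and it models the documented MapWindowPoints behavior of swapping left and right when exactly two points (a RECT) are mapped to or from a mirrored window.

```c
#include <assert.h>

/* Toy types for the model; these are NOT the Win32 RECT/HWND. */
typedef struct { long left, top, right, bottom; } Rect;

typedef struct {
    long screen_left;  /* screen x of the client area's left edge */
    long screen_top;   /* screen y of the client area's top edge */
    long width;        /* client-area width in pixels */
    int  mirrored;     /* nonzero if the window is RTL-mirrored */
} ClientWindow;

/* Map one screen point into client coordinates, like ScreenToClient.
   In a mirrored window, client x runs leftward from the right edge. */
static void point_to_client(const ClientWindow *w, long *x, long *y)
{
    assert(w->width > 0);
    long offset = *x - w->screen_left;
    *x = w->mirrored ? w->width - offset : offset;
    *y -= w->screen_top;
}

/* The buggy pattern: convert the two corners independently. */
static Rect rect_to_client_pointwise(const ClientWindow *w, Rect r)
{
    point_to_client(w, &r.left, &r.top);
    point_to_client(w, &r.right, &r.bottom);
    return r;  /* in a mirrored window, left now ends up > right */
}

/* The fixed pattern: map both corners as a rectangle and restore
   left <= right when the window is mirrored. */
static Rect rect_to_client_as_rect(const ClientWindow *w, Rect r)
{
    r = rect_to_client_pointwise(w, r);
    if (w->mirrored) {
        long tmp = r.left;
        r.left = r.right;
        r.right = tmp;
    }
    return r;
}
```

For a mirrored 400-pixel-wide client area starting at screen x 100, a control spanning screen x 150..250 comes back from the pointwise version with left = 350 and right = 250, an inverted rectangle. The rectangle-aware version yields 250..350. On an unmirrored window both versions agree, which is why the bug only showed up on RTL builds.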

     

    As I mentioned, David Gladfelter was the first person to pick up the problem, kudos to him!
