Larry Osterman's WebLog

Confessions of an Old Fogey
  • Larry Osterman's WebLog

    What’s wrong with this code–a real world example

    • 15 Comments

    I was working on a new feature earlier today and I discovered that while the code worked just fine when run as a 32bit app, it failed miserably when run as a 64bit app.

    If I was writing code that used polymorphic types (like DWORD_PTR) or something that depended on platform specific differences, this wouldn’t be a surprise, but I wasn’t.

     

    Here’s the code in question:

            DWORD cchString;
            DWORD cbValue;
            HRESULT hr = CorSigUncompressData(pbBlob, cbBlob, &cchString, &cbValue);
            if (SUCCEEDED(hr))
            {
                cbBlob -= cbValue;
                pbBlob += cbValue;
    
                if (cbBlob >= cchString)
                {
                    //  Convert to unicode
                    wchar_t rgchTypeName[c_cchTypeNameMax];
                    DWORD cchString = MultiByteToWideChar(CP_UTF8, 0, reinterpret_cast<LPCSTR>(pbBlob), static_cast<int>(cchString),
                                                          rgchTypeName, ARRAYSIZE(rgchTypeName));
                    if (cchString != 0 && cchString < ARRAYSIZE(rgchTypeName))
                    {
                        // Ensure that the string is null terminated.
                        rgchTypeName[cchString] = L'\0';
                    }
                    cbBlob -= cchString;
                    pbBlob += cchString;
                }
            }

    This code parses an ECMA-335 SerString. I’ve removed a bunch of error checking and other code to make the code simpler (and the bug more obvious).

    When I ran the code when compiled for 32bits, the rgchTypeName buffer contained the expected string contents. However, when I ran it on a 64bit compiled binary, I only had the first 6 characters of the string. Needless to say, that messed up the subsequent parsing of the blob containing the string.

    Stepping into the code in the debugger, I saw that the cchString variable had the correct length at the point of the call to MultiByteToWideChar, however when I stepped into the call to MultiByteToWideChar, the value had changed from the expected length to 6!

    After a couple of minutes staring at the code, I realized what was happening: Through a cut-and-paste error, I had accidentally double-declared the cchString local variable.  That meant that the cchString variable passed to the MultiByteToWideChar call was actually an uninitialized local variable, so it’s not surprising that the string length was bogus.

    So why did this work in 32bit code but fail in 64bit?  Well, it turns out that the 32bit compiler’s code generation had re-used the stack storage for the inner and outer cchString variables (this is safe to do because the outer cchString variable was not used anywhere outside this snippet), so when the processor pushed the uninitialized inner cchString variable it happened to push the right value.  Since the 64bit compiler allocates local variables differently, the uninitialized variable was immediately obvious.
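    One way to avoid this class of bug is to give the declared length and the converted length different names, so the conversion call cannot pick up a freshly declared local. The sketch below is portable C++ rather than the original Win32 code: CopyBytes is a hypothetical stand-in for MultiByteToWideChar, and the one-byte length prefix is a simplification of the compressed length that CorSigUncompressData decodes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a conversion call such as MultiByteToWideChar:
// copies at most `capacity - 1` bytes and returns the number copied.
std::size_t CopyBytes(const std::uint8_t* source, std::size_t sourceLength,
                      char* destination, std::size_t capacity)
{
    std::size_t copied = 0;
    while (copied < sourceLength && copied + 1 < capacity)
    {
        destination[copied] = static_cast<char>(source[copied]);
        ++copied;
    }
    return copied;
}

// Reads a one-byte length prefix followed by that many bytes of text.
// declaredLength and convertedLength are distinct names, so the argument
// passed to CopyBytes is always the initialized outer value.
std::string ReadPrefixedString(const std::uint8_t* blob, std::size_t blobSize)
{
    if (blobSize == 0)
    {
        return std::string();
    }

    const std::size_t declaredLength = blob[0];
    ++blob;
    --blobSize;

    if (blobSize < declaredLength)
    {
        return std::string();
    }

    char buffer[64];
    const std::size_t convertedLength =
        CopyBytes(blob, declaredLength, buffer, sizeof(buffer));
    return std::string(buffer, convertedLength);
}
```

    Compiler warnings can also catch the original mistake: GCC and Clang report a local that hides another local when built with -Wshadow.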

  • Larry Osterman's WebLog

    Insecure vs. Unsecured

    • 5 Comments

    A high school classmate of mine recently posted on Facebook:

    Message just popped up up my screen from Microsoft, I guess. "This site has insecure content." Really? Is the content not feeling good about itself, or, perchance, did they mean "unsecured?" What the ever-lovin' ****?

    I was intrigued, because it was an ambiguous message and it brings up an interesting discussion.   Why the choice of the word “insecure” instead of “unsecured”?  

    It turns out that this message (which doesn’t come from Internet Explorer, but instead from another browser) is generated when you attempt to access a page which contains mixed content.  In other words, a page where the primary page is protected via SSL yet there are child elements in the page that are not protected by SSL. 

    Given that this is a mixed content warning, wouldn’t my friend’s suggestion (that they use “unsecured” in the message rather than “insecure”) be a better choice?   After all, the message is complaining that there is content that hasn’t been secured via SSL on the page, so the content is unsecured (has no security applied).

     

    Well, actually I think that insecure is a better word choice than unsecured, for one reason:  If you have a page with mixed content on it, an attacker can use the unsecured elements to attack the secured elements.  This page from the IE blog (and this article from MSDN) discuss the risks associated with mixed content – the IE blog post points out that even wrapping the unsecured content in a frame won’t make the page secure.

     

    So given a choice between using “insecure” or “unsecured” in the message, I think I prefer “insecure” because it is a slightly stronger statement – “unsecured” implies that it’s a relatively benign configuration error.

     

    Having said all that, IMHO there’s a much better word to use in this scenario than “insecure” – “unsafe”.  To me, “unsafe” is a better term because it more accurately reflects the state – it says that the reason that the content is being blocked is because it’s not ”safe”.

    On the other hand, I’m not sure that describing content secured via SSL as “safe” vs. “unsafe” is really any better, since SSL can only ensure two things: that a bystander cannot listen to the contents of your conversation and that the person you’re talking to is really the person who they say they are (and the last is only as reliable as the certificate authority who granted the certificate is).   There’s nothing that stops a bad guy from using SSL on their phishing site.

    I actually like what IE 9 does when presented with mixed content pages – it blocks the non SSL content with a gold bar which says “Only secure content is displayed” with a link describing the risk and a button that allows all the content to be displayed.  Instead of describing what was blocked, it describes what was shown (thus avoiding the “insecure” vs “unsecured” issue) and it avoids the “safe” vs “unsafe” nomenclature.  But again, it does say that the content is secure – which may be literally true, but many customers believe that “secure” == “safe” which isn’t necessarily true.

  • Larry Osterman's WebLog

    Read-Only and Write-Only computer languages

    • 17 Comments

    A colleague and I were chatting the other day and we were talking about STL implementations (in the context of a broader discussion about template meta-programming and how difficult it is).

     

    During our discussion, I described the STL implementation as “read-only” and he instantly knew what I was talking about.  As we dug in further, I realized that you can characterize many computer languages as read-only or write-only[1].

    Of course there’s a huge amount of variation here – it’s always possible to write incomprehensible code, but there are languages that just lend themselves to being read-only or write-only.

    A “read-only” language is a language that anyone can understand when reading it, but you wouldn’t even begin to know how to write (or modify) code in that language.  Languages that are read-only tend to have very subtle syntax – it looks like something familiar, but there are magic special characters that change the meaning of the code.  As I mentioned above, template meta-programming can be thought of as read-only.  If you’ve ever worked with COBOL code, it could also be considered read-only.

    A “write-only” language is a language where only the author of the code understands what it does.  Languages can be write-only because of their obscure syntax, or they can be write-only because of their flexibility.   The canonical example of the first type of write-only language is Teco (which was once described to me as “the only computer language whose syntax is indistinguishable from line noise”[2]).  But there are other languages that are also write-only.   For instance JavaScript and Perl are often considered to be write-only – the code written is often indecipherable to a knowledgeable viewer (but is almost always totally understandable to the author of the code).  It’s possible to write legible JS and Perl, but all too often, the code is impenetrable to the casual observer.

     

    Of course, for someone who’s very familiar with a particular language, the code written in that language is often understandable – back when I was coding in Teco on a daily basis (and there was a time when I spent weeks working on Emacs (the original Emacs written by RMS, not the replacement written by Jim Gosling) extensions), I could easily read Teco code.  But that’s only when you spend all your time living and breathing the code.

     

     

     

     

    [1] I can’t take credit for the term “read-only”, I first heard the term from Miguel de Icaza at the //Build/ conference a couple of weeks ago.

    [2] “line noise” – that’s the random characters that are inserted into the character stream received by an acoustic modem – these beasts no longer exist in today’s broadband world, but back in the day, line noise was a real problem.

  • Larry Osterman's WebLog

    What has Larry been doing for two years (and why has the blog been dark for so long)?

    • 21 Comments

    As many of you may know, I tend to blog about things I encounter in my day-to-day work that I think might be of general interest.  And for the past two years, even though I've run into things that were "blog-worthy", I couldn't write about them in public.  And thus no blog posts.

    But that's changed now that the //Build conference is over.  I can finally talk about some of the things I've worked on over the past few years.  Most of the things will come through more official channels: the "Building Windows 8" blog, the windows dev center, etc.  But I do hope to write more about what I have done in the past and what I'm doing these days.

    So what *have* I been doing for the past two years?  After we shipped Windows 7, I moved from the Windows Audio team to a new team known as the "Runtime Experience" team.  The Runtime Experience team is responsible for the architectural elements that make up the new "Windows Runtime" which is a part of the next version of Windows.  My development manager, Martyn Lovell, gave a great talk about the runtime at the //Build conference.

    My work has focused on developer tools to enable authoring windows runtime APIs and designing the metadata format used to represent the windows runtime APIs.  It's a bit esoteric and geeky, but I've had a huge amount of fun working on this over the past two years. 

    Anyway, that's a very brief version of my job, and as I said, I hope to be able to write more often in the near future.

     

  • Larry Osterman's WebLog

    Getting started with test driven development

    • 9 Comments

    I'm at the build conference in Anaheim this week, and I was in the platform booth when a customer asked me a question I'd not been asked before: "How do you get started with test driven development".  My answer was simply "just start - it doesn't matter how much existing code you already have, just start writing tests alongside your new code.  Get a good unit test framework like the one in Visual Studio, but it really doesn't matter what framework you use, just start writing the tests".

    This morning, I realized I ought to elaborate on my answer a bit.

    I'm a huge fan of Test Driven Development.  Of all the "eXtreme Programming" methodologies, TDD is by far the one that makes the most sense.  I started using TDD back in Windows 7.  I had read about TDD over the years, and was intrigued by the concept but like the customer, I didn't really know where to start.  My previous project had extensive unit tests, but they really didn't use any kind of methodology when developing them.  When it came time to develop a new subsystem for the audio stack for Windows 7 (the feature that eventually became the "capture monitor/listen to" feature), I decided to apply TDD when developing the feature just to see how well it worked.  The results far exceeded my expectations.

    To be fair, I don't follow the classic TDD paradigm where you write the tests first, then write the code to make sure the tests pass.  Instead I write the tests at the same time I'm writing the code.  Sometimes I write the tests before the code, sometimes the code before the tests, but they're really written at the same time.

    In my case, I was fortunate because the capture monitor was a fairly separate piece of the audio stack - it is essentially bolted onto the core audio engine.  That meant that I could develop it as a stand-alone system.  To ensure that the capture monitor could be tested in isolation, I developed it as a library with a set of clean APIs.  The interface with the audio engine was just through those clean APIs.  By reducing the exposure of the capture monitor APIs, I restricted the public surface I needed to test.

    But I still needed to test the internal bits.  The good news is that because it was a library, it was easy to add test hooks and enable the ability to drive deep into the capture monitor implementation.  I simply made my test classes friends of the implementation classes and then the test code could call into the protected members of the various capture monitor classes.  This allowed me to build test cases that had the ability to simulate internal state changes which allowed me to build more thorough tests.
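    The friend-class approach described above can be sketched as follows. The class names, members and the OnDeviceLost event are all hypothetical, since the real capture monitor code is not shown here. The point is only that a friend declaration lets test code reach protected members and simulate internal state changes.

```cpp
#include <cstddef>

// Hypothetical implementation class standing in for a capture monitor component.
class CaptureMonitor
{
public:
    void Start() { m_running = true; }
    bool IsRunning() const { return m_running; }
    std::size_t DeviceLossCount() const { return m_deviceLossCount; }

protected:
    // Internal state transition that ordinary callers cannot trigger directly.
    void OnDeviceLost()
    {
        m_running = false;
        ++m_deviceLossCount;
    }

private:
    // Grants the test class access to the protected and private members.
    friend class CaptureMonitorTests;

    bool m_running = false;
    std::size_t m_deviceLossCount = 0;
};

// Test class: as a friend, it can drive the internal transition itself.
class CaptureMonitorTests
{
public:
    static bool StopsWhenDeviceIsLost()
    {
        CaptureMonitor monitor;
        monitor.Start();
        monitor.OnDeviceLost();   // simulate the internal event
        return !monitor.IsRunning() && monitor.DeviceLossCount() == 1;
    }
};
```

    The trade-off is that the production header has to name the test class, which is usually acceptable when the component is a library with its own test suite.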

    I was really happy with how well the test development went, but the proof of the benefits of TDD really showed when it was deployed as a part of the product.

    During the development of Windows 7, there were extremely few (maybe a half dozen?) bugs found in the capture monitor that weren't first found by my unit tests.  And because I had such an extensive library of tests, I was able to add regression test cases for those externally found bugs.

    I've since moved on from the audio team, but I'm still using TDD - I'm currently responsible for two tools in the Windows build system/SDK and both of them have been developed with TDD.  One of them (the IDL compiler used by Windows developers for creating Windows 8 APIs) couldn't be developed using the same methodology as I used for the capture monitor, but the other (mdmerge, the metadata composition tool) was.  Both have been successful - while there have been more bugs found externally in both the IDL compiler and mdmerge than were found in the capture monitor, the regression rate on both tools has been extremely low thanks to the unit tests.

    As I said at the beginning, I'm a huge fan of TDD - while there's some upfront cost associated with creating unit tests as you write the code, it absolutely pays off in the long run with a higher initial quality and a dramatically lower bug rate.

  • Larry Osterman's WebLog

    Nobody ever reads the event logs…

    • 19 Comments

    In my last post, I mentioned that someone was complaining about the name of the bowser.sys component that I wrote 20 years ago.  In my post, I mentioned that he included a screen shot of the event viewer.

    What was also interesting was the contents of the screen shot.

    “The browser driver has received too many illegal datagrams from the remote computer <redacted> to name <redacted> on transport NetBT_Tcpip_<excluded>.  The data is the datagram.  No more events will be generated until the reset frequency has expired.”

    I added this message to the browser 20 years ago to detect computers that were going wild sending illegal junk on the intranet.  The idea was that every one of these events indicated that something had gone horribly wrong on the machine which originated the event and that a developer or network engineer should investigate the problem (these illegal datagrams were often caused by malfunctioning networking hardware (which was not uncommon 20 years ago)).

    But you’ll note that the person reporting the problem only complained about the name of the source of the event log entry.  He never bothered to look at the contents of this “error” event log entry to see if there was something that was worth reporting.

    Part of the reason that nobody bothers to read the event logs is that too many components log to the eventlog.  The event logs on customers’ computers are filled with unactionable meaningless events (“The <foo> service has started.  The <foo> service has entered the running state.  The <foo> service is stopping.  The <foo> service has entered the stopped state.”).  So customers stop reading the event log because there’s never anything actionable in the logs.

    There’s a pretty important lesson here: Nobody ever bothers reading event logs because there’s simply too much noise in the logs. So think really hard about when you want to write an event to the event log.  Is the information in the log really worth generating?  Is there important information that a customer will want in those log entries?

    Unless you have a way of uploading troublesome logs to be analyzed later (and I know that several enterprise management solutions do have such mechanisms), it’s not clear that there’s any value to generating log entries.

  • Larry Osterman's WebLog

    Reason number 9,999,999 why you don’t ever use humorous elements in a shipping product

    • 4 Comments

    I just saw an email go by on one of our self hosting aliases:

    From: <REDACTED>
    Sent: Saturday, April 30, 2011 12:27 PM
    To: <REDACTED>
    Subject: Spelling Mistake for browser in event viewer

    Not sure which team to assign this to – please pick up this bug – ‘bowser’ for ‘browser’

    And he included a nice screen shot of the event viewer pointing to an event generated by bowser.sys.

    The good news is that for once I didn’t have to answer the question.  Instead my co-workers answered for me:

    FYI: People have been filing bugs for this for years. Larry Osterman wrote a blog post about it. :)

    http://blogs.msdn.com/b/larryosterman/archive/2006/03/14/551368.aspx

    <Redacted>

    From: <Redacted>
    Sent: Saturday, April 30, 2011 1:54 PM
    To: <Redacted>

    Subject: RE: Spelling Mistake for browser in event viewer

    The name of the service is (intentionally) bowser and has been so for many releases.

    My response:

    “many releases”.  That cracks me up.  If I had known that I would literally spend the next 20 years paying for that one joke, I would have reconsidered it.

    And yes, bowser.sys has been in the product for 20 years now.

     

    So take this as an object lesson.  Avoid humorous names in your code or you’ll be answering questions about them for the next two decades and beyond.  If I had named the driver “brwsrhlp.sys” (at that point setup limited us to 8.3 file names) instead of “bowser.sys” it would never have raised any questions.  But I chose to go with a slightly cute name and…

     

    PS: After posting this, several people have pointed out that the resources on bowser.sys indicate that its name should be "browser.sys".  And they're right.  To my knowledge, nobody has noticed that in the past 20 years...

  • Larry Osterman's WebLog

    How do people keep coming up with this stuff (mspaint as an audio track).

    • 14 Comments

    The imagination of people on the internet continues to astound me.

    Today's example: Someone took mspaint.exe and turned it into a PCM .WAV file and then played it.

    The truly terrifying thing is that it didn't sound that horribly bad.

    There’s also a version of the same soundtrack with annoying comments.

  • Larry Osterman's WebLog

    Someone is a glutton for punishment

    • 15 Comments

    From Long Zheng, a video of someone who decided to upgrade every version of Windows from Windows 1.0 to Windows 7.

    The amazing thing is that it worked.

  • Larry Osterman's WebLog

    The case of the inconsistent right shift results…

    • 17 Comments

    One of our testers just filed a bug against something I’m working on.  They reported that if they compiled code which calculated: 1130149156 >> -05701653 it generated different results on 32bit and 64bit operating systems.  On 32bit machines it reported 0 but on 64bit machines, it reported 0x21a.

    I realized that I could produce a simple reproduction for the scenario to dig into it a bit deeper:

    int _tmain(int argc, _TCHAR* argv[])
    {
        __int64 shift = 0x435cb524;
        __int64 amount = 0x55;
        __int64 result = shift >> amount;
        std::cout << shift << " >> " << amount << " = " << result << std::endl;
        return 0;
    }

    That’s pretty straightforward and it *does* reproduce the behavior.  On x86 it reports 0 and on x64 it reports 0x21a.  I can understand the x86 result (you’re shifting right more than the processor size, it shifts off the end and you get 0) but not the x64. What’s going on?

    Well, for starters I asked our C language folks.  I know I’m shifting by more than the processor word size (85), but the results should be the same, right?

    Well no.  The immediate answer I got was:

    From C++ 03, 5.8/1: The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.

    Ok.  It’s undefined behavior.  But that doesn’t really explain the difference.  When in doubt, let’s go to the assembly….

    000000013F5215D3  mov         rax,qword ptr [amount]  
    000000013F5215D8  movzx       ecx,al  
    000000013F5215DB  mov         rax,qword ptr [shift]  
    000000013F5215E0  sar         rax,cl  
    000000013F5215E3  mov         qword ptr [result],rax  
    000000013F5215E8  mov         rdx,qword ptr [shift] 

    The relevant instruction is the sar.  It’s doing a shift arithmetic right of “shift” by “amount”.

    What about the x86 version?

    00CC14CA  mov         ecx,dword ptr [amount]  
    00CC14CD  mov         eax,dword ptr [shift]  
    00CC14D0  mov         edx,dword ptr [ebp-8]  
    00CC14D3  call        @ILT+85(__allshr) (0CC105Ah)  
    00CC14D8  mov         dword ptr [result],eax  
    00CC14DB  mov         dword ptr [ebp-28h],edx  

    Now that’s interesting.  The x64 version is using a processor shift function but on 32bit machines, it’s using a C runtime library function (__allshr).  And the one that’s weird is the x64 version.

    While I don’t have an x64 processor manual, I *do* have a 286 processor manual from back in the day (I have all sorts of stuff in my office).  And in my 80286 manual, I found:

    “If a shift count greater than 31 is attempted, only the bottom five bits of the shift count are used. (the iAPX 86 uses all eight bits of the shift count.)”

    A co-worker gave me the current text:

    The destination operand can be a register or a memory location. The count operand can be an immediate value or the CL register. The count is masked to 5 bits (or 6 bits if in 64-bit mode and REX.W is used). The count range is limited to 0 to 31 (or 63 if 64-bit mode and REX.W is used). A special opcode encoding is provided for a count of 1.

    So the mystery is now solved.  The shift of 0x55 only considers the low 6 bits.  The low 6 bits of 0x55 is 0x15 or 21.  0x435cb524 >> 21 is 0x21a.
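    The arithmetic above can be checked with a small sketch. ShiftRightMasked64 models the count masking described in the processor manual text, and ShiftRightSaturating is one possible way to get the same answer on every platform. Both function names are made up for this illustration, and neither is what the compiler or __allshr actually does internally.

```cpp
#include <cstdint>

// Models the quoted 64-bit SAR behavior: only the low 6 bits of the
// count are used, so the effective count is always in the range 0 to 63.
std::int64_t ShiftRightMasked64(std::int64_t value, std::int64_t amount)
{
    const int maskedCount = static_cast<int>(amount & 0x3F);
    return value >> maskedCount;
}

// A shift with a defined result for every count: counts of 64 or more
// shift out every value bit, leaving only the sign fill.
// Note: before C++20, right-shifting a negative signed value is
// implementation-defined (arithmetic on the usual compilers).
std::int64_t ShiftRightSaturating(std::int64_t value, std::uint32_t amount)
{
    if (amount >= 64)
    {
        return value < 0 ? -1 : 0;
    }
    return value >> amount;
}
```

    With these, ShiftRightMasked64(0x435cb524, 0x55) gives 0x21a (the x64 result), while ShiftRightSaturating(0x435cb524, 0x55) gives 0 (the x86 result). Checking the count before shifting is what keeps the code out of undefined behavior.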

    One could argue that this is a bug in the __allshr function on x86 but you really can’t argue with “the behavior is undefined”.  Both scenarios are doing the “right thing”.  That’s the beauty of the “behavior is undefined” wording.  The compiler would be perfectly within spec if it decided to reformat my hard drive when it encountered this (although I’m happy it doesn’t).

    Now our feature crew just needs to figure out how best to resolve the bug.

  • Larry Osterman's WebLog

    Why does Windows still place so much importance on filenames?

    • 35 Comments

    Earlier today, Adrian Kingsley-Hughes posted a rant (his word, not mine) about the fact that Windows still relies on text filenames.

    The title says it all really. Why is it that Windows still place so much importance on filenames.

    Take the following example - sorting out digital snaps. These are usually automatically given daft filenames such as IMG00032.JPG at the time they are stored by the camera. In an ideal world you’d only ever have one IMG00032.JPG on your entire system, but the world is far from perfect. Your camera might decide to restart its numbering system, or you might have two cameras using the same naming format. What happens then?

    I guess I’m confused.  I could see a *very* strong argument against Windows’ dependency on file extensions, but I’m totally mystified about why having filenames is such a problem.

    At some level, Adrian’s absolutely right – it IS possible to have multiple files on the hard disk named “recipe.txt”.  And that’s bad.  But is it the fault of Windows for allowing multiple files to have colliding names? Or is it the fault of the user for choosing poor names?  Maybe it’s a bit of both.

    What would a better system look like?  Well, Adrian gives an example of what he’d like to see:

    Why? Why is the filename the deciding factor? Why not something more unique? Something like a checksum? This way the operating system could decide is two files really are identical or not, and replace the file if it’s a copy, or create a copy if they are different. This would save time, and dramatically reduce the likelihood of data loss through overwriting.

    But how would that system work?  What if we did just that.  Then you wouldn’t have two files named recipe.txt (which is good).

    Unfortunately that solution introduces a new problem: You still have two files.  One named “2B1015DB-30CA-409E-9B07-234A209622B6” and the other named “5F5431E8-FF7C-45D4-9A2B-B30A9D9A791B”. It’s certainly true that those two files are uniquely named and you can always tell them apart.  But you’ve also lost a critical piece of information: the fact that they both contain recipes.

    That’s the information that the filename conveys.  It’s human specific data that describes the contents of the file.  If we were to go with unique monikers, we’d lose that critical information.
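    To make the trade-off concrete, here is a minimal sketch of checksum-derived naming. It uses the 64-bit FNV-1a hash purely because it is short. That choice is an assumption for illustration, not something Adrian proposed or Windows does, and a real system would want a much stronger hash.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// 64-bit FNV-1a hash of the file contents (illustrative choice only).
std::uint64_t Fnv1a64(const std::string& content)
{
    std::uint64_t hash = 0xcbf29ce484222325ULL;   // FNV offset basis
    for (unsigned char byte : content)
    {
        hash ^= byte;
        hash *= 0x100000001b3ULL;                 // FNV prime
    }
    return hash;
}

// Builds a 16-digit hexadecimal name from the contents alone.
std::string ContentDerivedName(const std::string& content)
{
    std::ostringstream name;
    name << std::hex << std::setw(16) << std::setfill('0') << Fnv1a64(content);
    return name.str();
}
```

    Identical contents always map to the same name and different contents almost always map to different ones, which is the property Adrian wants. But nothing in the resulting string of hex digits tells you which file is the pumpkin pie recipe.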

    But I don’t actually think that the dependency on filenames is really what’s annoying him.  It’s just a symptom of a different problem. 

    Adrian’s rant is a perfect example of jumping to a solution without first understanding the problem.  It also shows why it’s so hard for Windows UI designers to figure out how to solve customer problems – this example is a customer request that we remove filenames from Windows.  Obviously something happened to annoy Adrian that was related to filenames, but the question is: What?  He doesn’t describe the problem, but we can hazard a guess about what happened from his text:

    Here’s an example. I might have two files in separate folders called recipe.txt, but one is a recipe for a pumpkin pie, and the other for apple pie. OK, it was dumb of me to give the files the same name, but it’s in situations like this that the OS should be helping me, not hindering me and making me pay for my stupidity. After all, Windows knows, without asking me, that the files, even if they are the same size and created at exactly the same time, are different. Why does Windows need to ask me what to do? Sure, it doesn’t solve all problems, but it’s a far better solution than clinging to the notion of filenames as being the best metric by which to judge whether files are identical or not.

    The key information here is the question: “Why does Windows need to ask me what to do?”  My guess is that he had two “recipe.txt” files in different directories and copied a recipe.txt from one directory to the other.  When you do that, Windows presents you with the following dialog:

    [Screenshot: Windows Copy Dialog]

    My suspicion is that he’s annoyed because Windows is forcing him to make a choice about what to do when there’s a conflict.  The problem is that there’s no one answer that works for all users and all scenarios.    Even in my day-to-day work I’ve had reason to choose all three options, depending on what’s going on.  From the rant, it appears that Adrian would like it to choose “Copy, but keep both files” by default.  But what happens if you really *do* want to replace the old recipe.txt with a new version?  Maybe you edited the file offline on your laptop and you’re bringing the new copy back to your desktop machine.  Or maybe you’re copying a bunch of files from one drive to another (I do this regularly when I sync my music collection from home and work).  In that case, you want to ignore the existing copy of the file (or maybe you want to copy the file over to ensure that the metadata is in sync).

    Windows can’t figure out what the right answer is here – so it prompts the user for advice about what to do.

    Btw, Adrian’s answer to his rhetorical question is “the reason is legacy”.  Actually that’s not quite it.  The reason is that filenames provide valuable information for the user that would be lost if we went away from them.

    Next time I want to spend a bit of time brainstorming about ways to solve his problem (assuming that the problem I identified is the real problem – it might not be). 

     

     

    PS: I’m also not sure why he picked on Windows here.  Every operating system I know of has similar dependencies on filenames.  I think that’s another indication that he’s jumping to a solution without first describing the problem.

  • Larry Osterman's WebLog

    Hacking Windows with Phones… I don’t get it.

    • 12 Comments

    Over the weekend, Engadget and CNet ran a story discussing what was described as a new and novel attack using Android smartphones to attack PCs.  Apparently someone took an Android smartphone and modified the phone to emulate a USB keyboard.

    When the Android phone was plugged into Windows, Windows thought it was a keyboard and allowed the phone to inject keystrokes (not surprisingly, OSX and Linux did the same).  The screenshots I’ve seen show WordPad running with the word “owned!” on the screen, presumably coming from the phone.

     

    I have to say, I don’t get why this is novel.  There’s absolutely no difference between this hack and plugging in an actual keyboard to the computer and typing keys – phones running the software can’t do anything that the user logged into the computer can’t do, they can’t bypass any of Windows security features.  All they can do is be a keyboard.

    If the novelty is that it’s a keyboard that’s being driven by software on the phone, a quick search for “programmable keyboard macro” shows dozens of keyboards which can be programmed to insert arbitrary key sequences.  So even that’s not particularly novel.

     

    I guess the attack could be used to raise awareness of plugging in devices, but that’s not a unique threat.  In fact the 1394 “FireWire” bus is well known for having significant security issues (1394 devices are allowed full DMA access to the host computer). 

    Ultimately this all goes back to Immutable Law #3.  If you let the bad guys tamper with your machine, they can 0wn your machine.  That includes letting the bad guys tamper with the devices which you then plug into your machine.

    Sometimes the issues which tickle the fancy of the press mystify me.

  • Larry Osterman's WebLog

    It’s a bad idea to have a TEMP environment variable longer than about 130 characters

    • 8 Comments

    I've been working with the Win32 API for almost 20 years - literally since the very first Win32 APIs were written.  Even after all that time, I'm occasionally surprised by the API behavior.

    Earlier today I was investigating a build break that took out one of our partner build labs.  Eventually I root caused it to an issue with (of all things) the GetTempPath API.

    Consider the following code (yeah, I don’t check for errors, <slaps wrist />):

    #include "stdafx.h"
    #include <string>
    #include <iostream>
    #include <Windows.h>
    
    using namespace std;
    
    const wchar_t longEnvironmentName[] = 
        L"c:\\users\\larry\\verylongdirectory\\withaverylongsubdirectory"
        L"\\andanotherlongsubdirectory\\thatisstilldeeper\\withstilldeeper"
        L"\\andlonger\\untilyoustarttorunoutofpatience\\butstillneedtobelonger"
        L"\\untilitfinallygetslongerthanabout130characters";
    
    int _tmain(int argc, _TCHAR* argv[])
    {
        wchar_t environmentBuffer[ MAX_PATH ];
        wchar_t tempPath[ MAX_PATH ];
        SetEnvironmentVariable(L"TEMP", longEnvironmentName);
        SetEnvironmentVariable(L"TMP", longEnvironmentName);
        GetEnvironmentVariable(L"TEMP", environmentBuffer, _countof(environmentBuffer));
        wcout << L"Temp environment variable is: " << environmentBuffer << " length: " << wcslen(environmentBuffer) << endl;
        GetTempPath(_countof(tempPath), tempPath);
        wcout << L"Temp path: " << tempPath << " length: " << wcslen(tempPath) << endl;
        return 0;
    }

    When I ran this program, I got the following output:

    Temp environment variable is: c:\users\larry\verylongdirectory\withaverylongsubdirectory\andanotherlongsubdirectory\thatisstilldeeper\withstilldeeper\andlonger\
    untilyoustarttorunoutofpatience\butstillneedtobelonger\untilitfinallygetslongerthanabout130characters length: 231
    Temp path: C:\Users\larry\ length: 15

    So what’s going on?  Why did GetTempPath return a pointer to my profile directory and not the (admittedly long) TEMP environment variable?

    There’s a bunch of stuff here.  First off, let’s consider the documentation for GetTempPath:

    The GetTempPath function checks for the existence of environment variables in the following order and uses the first path found:

    1. The path specified by the TMP environment variable.
    2. The path specified by the TEMP environment variable.
    3. The path specified by the USERPROFILE environment variable.
    4. The Windows directory.

    So that explains where the c:\Users\larry came from – something must have gone wrong retrieving the “TMP” and “TEMP” environment variables so it fell back to step 3[1].  But what could have happened?  We know that at least the TEMP environment variable was correctly set, we retrieved it in our test application.  This was where I got surprised.

    It turns out that under the covers (at least on Win7), the function which retrieves the TEMP and TMP environment variables uses a UNICODE_STRING structure to initialize the string.  And, for whatever reason, they set MaximumLength to MAX_PATH+1.  If we look at the documentation for UNICODE_STRING, we find:

    MaximumLength

    Specifies the total size, in bytes, of memory allocated for Buffer. Up to MaximumLength bytes may be written into the buffer without trampling memory.

    So the function expects at most 261 bytes,  or about 130 characters.  I often see behaviors like this in the "A" version of system APIs, but in this case both the "A" and "W" version of the API had the same unexpected behavior.

    The moral of the story: If you set your TEMP environment variable to something longer than 130 characters or so, GetTempPath will return your USERPROFILE.  Which means that you may unexpectedly find temporary files scribbled all over your profile directory.
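Here's a minimal sketch of that behavior in portable C++. It is a model of what I observed, not the actual Windows implementation: ModelGetTempPath, Environment and the constants are made-up names, it assumes MAX_PATH is 260 and that a UTF-16 character takes 2 bytes, and it doesn't try to model whether the null terminator counts against the limit.

```cpp
#include <cstddef>
#include <initializer_list>
#include <map>
#include <string>

// Assumptions: MAX_PATH is 260 and one UTF-16 character is 2 bytes.
constexpr std::size_t kMaxPath = 260;
constexpr std::size_t kBufferBytes = kMaxPath + 1;   // MaximumLength is in BYTES
constexpr std::size_t kBytesPerChar = 2;

// Whole characters that fit in the internal buffer: 261 / 2 == 130.
constexpr std::size_t kCharsThatFit = kBufferBytes / kBytesPerChar;

// Hypothetical stand-in for the process environment block.
using Environment = std::map<std::u16string, std::u16string>;

// Model of the documented lookup order: TMP, TEMP, USERPROFILE, then the
// Windows directory.  A variable that is missing, or whose value is too long
// for the internal buffer, is skipped.
std::u16string ModelGetTempPath(const Environment& env, const std::u16string& windowsDir)
{
    for (const char16_t* name : {u"TMP", u"TEMP", u"USERPROFILE"})
    {
        const auto it = env.find(name);
        if (it != env.end() && it->second.size() <= kCharsThatFit)
        {
            return it->second;
        }
    }
    return windowsDir;
}
```

Feed the model a 231 character TMP and TEMP plus a short USERPROFILE and it returns the USERPROFILE value, which matches the output of the test program above.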

     

    The fix was to replace the calls to GetTempPath with direct calls to GetEnvironmentVariable - it doesn't have the same restriction.

     

     

    [1] Note that the 4th step is the Windows directory.  You can tell that this API has been around for a while because apparently the API designers thought it was a good idea to put temporary files in the windows directory.

     

    EDIT: Significantly revised to improve readability - I'm rusty at this.

    What does “size_is” mean in an IDL file?

    • 7 Comments

    My boss (who has spent a really long time working on RPC) and I got into a discussion the other day about the “size_is” IDL attribute (yeah, that’s what Microsoft developers chat about when they’re bored).

    For context, there are two related attributes which are applied to an array in IDL files: size_is(xxx) and length_is(xxx).  They both relate to the amount of memory which is marshaled in a COM or RPC interface, but we were wondering about the exact semantics of the parameter.

    The documentation for “size_is” says:

    Use the [size_is] attribute to specify the size of memory allocated for sized pointers, sized pointers to sized pointers, and single- or multidimensional arrays.

    The documentation for “length_is” says:

    The [length_is] attribute specifies the number of array elements to be transmitted. You must specify a non-negative value.

    So the length_is attribute clearly refers to the number of elements in the array to be transmitted.  But what are the units for the size_is attribute?  The MSDN documentation doesn’t say – all you see is that it “specif[ies] the size of memory allocated for … single- or multidimensional arrays”.  Typically memory allocations are specified in bytes, so this implies that the size_is attribute measures the number of bytes transferred.

    And that’s what I’ve thought for years and years.  length_is was the number of elements and size_is was the number of bytes.

    But my boss thought that size_is referred to a number of elements.  And since he’s actually worked on RPC for years, I figured he’d know best.

     

    To see if the problem was just that the current MSDN documentation was incorrect, I dug into the oldest RPC documentation I have – from the original Win32 SDK that was shipped with Windows NT 3.1 way back in 1993 (I have my own personal wayback machine in my office).

    The old SDK documentation says:

    “the size_is attribute is used to specify an expression or identifier that designates the maximum allocation size of the array”

    Well, allocation sizes are always in bytes, so size_is is in bytes, right?

    Well maybe not.  It further goes on to say:

    “the values specified by the size_is, max_is and min_is attributes have the following relationship: size_is = max_is + 1.  The size_is attribute provides an alternative to max_is for specifying the maximum amount of data”

    So what is “max_is”?  Maybe there’s a clue there…

    Go to max_is and it says “designates the maximum value for a valid array index” – so clearly it is a count of elements.  And thus by induction, size_is must be in number of elements and not number of bytes… 

    Ok, so the old documentation is ambiguous but it implies that both length_is and size_is refer to a count of elements.

     

    To confirm, I went to the current owner of the MIDL compiler for the definitive word on this and he said:

    Always in elements for all the XXX_is attributes.  And everything else except allocation routines IIRC.

    <Your boss> is correct that we allocate the buffer based on size_is, but we transmit elements based on length_is if they’re both present.  BTW, [string] is basically [length_is(<w>strlen(…))].

    So that’s the definitive answer:

    size_is and length_is both refer to a count of elements.  size_is defines the size of the buffer allocated for the transfer and length_is specifies the number of elements transferred within that buffer.
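To make the units concrete, here's a small sketch that turns the two element counts into byte counts.  MarshalSizes and ComputeMarshalSizes are names I made up for illustration; this is not MIDL-generated code and it ignores any headers, alignment or padding that the wire format adds.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type for this sketch.
struct MarshalSizes
{
    std::size_t allocatedBytes;    // buffer allocated for the transfer
    std::size_t transmittedBytes;  // array data actually sent
};

// sizeIs and lengthIs are both ELEMENT counts, just like the values passed
// to [size_is] and [length_is].  Bytes only appear once the element size is
// multiplied in.
template <typename Element>
MarshalSizes ComputeMarshalSizes(std::size_t sizeIs, std::size_t lengthIs)
{
    return MarshalSizes{ sizeIs * sizeof(Element), lengthIs * sizeof(Element) };
}
```

For an array of 4 byte integers declared with size_is(100) and length_is(10), ComputeMarshalSizes<std::int32_t>(100, 10) gives a 400 byte buffer of which 40 bytes of elements are transmitted.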

     

     

    And yes, I’ve asked the documentation folks to update the documentation to correct this.

     

    EDIT: Oops, fixed a typo.  Thanks Sys64738:)

    Microsoft Office team deploys botnet for security research

    • 4 Comments

    Even though it’s posted on April 1st, this is actually *not* an April Fools prank.

    It turns out that the Office team runs a “botnet” internally that’s dedicated to file fuzzing.  Basically they have a tool that’s run on a bunch of machines that runs file fuzzing jobs in their spare time.  This really isn’t a “botnet” in the strictest sense of the word, it’s more like SETI@home or other distributed computing efforts, but “botnet” is the word that the Office team uses when describing the effort.

     

    For those that don’t know what fuzz testing is, it’s a remarkably effective technique that can be used to find bugs in file parsers.  Basically you build a file with random content and you try to parse the file.  Typically you start with a known good file and randomly change the contents of the file.  If you iterate over that process many times, you will typically find dozens or hundreds of bugs.  The SDL actually requires that every file parser be fuzz tested for a very large (hundreds of thousands) number of iterations.
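A minimal sketch of that mutate-and-parse loop might look like the following.  Everything in it is made up for illustration: the toy file format (one length byte followed by that many payload bytes), the function names and the seed.  It has nothing to do with the frameworks the Office or Windows teams actually use.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Toy parser for a made-up format: one length byte followed by exactly that
// many payload bytes.  Throws std::runtime_error on malformed input.
std::size_t ParseToyRecord(const Bytes& file)
{
    if (file.empty())
    {
        throw std::runtime_error("empty file");
    }
    const std::size_t declared = file[0];
    if (declared != file.size() - 1)
    {
        throw std::runtime_error("length byte does not match payload size");
    }
    return declared;
}

// Start from a known good file and overwrite `flips` randomly chosen bytes
// with random values.
Bytes Mutate(const Bytes& good, std::size_t flips, std::mt19937& rng)
{
    Bytes mutated = good;
    if (mutated.empty())
    {
        return mutated;
    }
    std::uniform_int_distribution<std::size_t> pickOffset(0, mutated.size() - 1);
    std::uniform_int_distribution<int> pickByte(0, 255);
    for (std::size_t i = 0; i < flips; ++i)
    {
        mutated[pickOffset(rng)] = static_cast<std::uint8_t>(pickByte(rng));
    }
    return mutated;
}

// Run `iterations` mutated files through the parser and count how many it
// rejected.  A real fuzzer cares about crashes and hangs, not clean rejections.
std::size_t FuzzToyParser(const Bytes& good, std::size_t iterations, unsigned seed)
{
    std::mt19937 rng(seed);
    std::size_t rejected = 0;
    for (std::size_t i = 0; i < iterations; ++i)
    {
        try
        {
            ParseToyRecord(Mutate(good, 1, rng));
        }
        catch (const std::runtime_error&)
        {
            ++rejected;
        }
    }
    return rejected;
}
```

The toy parser here is deliberately robust, so all the loop ever sees is rejected files.  Point the same loop at a parser that trusts its length byte and you start finding the interesting bugs.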

    The Windows team has an entire lab that is dedicated to nothing but fuzz testing.  The testers author fuzz tests (using one of several fuzz testing frameworks) and they hand the tests to the fuzz test lab which actually runs the tests.  This centralizes the fuzz testing effort and keeps teams from having to keep dozens of machines dedicated to fuzz testing.  The Office team took a different tack on the effort – instead of dedicating an entire lab to fuzz testing they dedicated the spare cycles of their machines.  Very cool.

     

    I’ve known about the Office team’s effort for a while now (Tom Gallagher gave a talk about it at a recent BlueHat conference) but I didn’t know that the Office team had discussed it at CanSecWest until earlier today.

    Not Invented Here’s take on software security

    • 3 Comments

    One of my favorite web comics is Not Invented Here by Bill Barnes and Paul Southworth.  I started reading Bill’s stuff with his other web comic Unshelved (a librarian comic).

     

    NIH is a web comic about software development and this week Bill and Paul have decided to take on software security…

    Here’s Monday’s comic:

    Not Invented Here strip for 2/15/2010

     

    Check them out – Bill and Paul both have a good feel for how the industry actually works :).

    NextGenHacker101 owes me a new monitor

    • 102 Comments

    Because I just got soda all over my current one…

    One of the funniest things I’ve seen in a while. 

     

    And yes, I know that I’m being cruel here and I shouldn’t make fun of the kid’s ignorance, but he is SO proud of his new discovery and is so wrong in his interpretation of what actually is going on…

     

     

     

    For my non net-savvy readers: The “tracert” command lists the route that packets take from the local computer to a remote computer.  So if I want to find out what path a packet takes from my computer to www.microsoft.com, I would issue “tracert www.microsoft.com”.  This can be extremely helpful when troubleshooting networking problems.  Unfortunately the young man in the video had a rather different opinion of what the command did.

    What’s up with the Beep driver in Windows 7?

    • 93 Comments

    Earlier today, someone asked me why 64bit versions of Windows don’t support the internal PC speaker beeps.  The answer is somewhat complicated and ends up being an interesting intersection between a host of conflicting tensions in the PC ecosystem.

     

    Let’s start by talking about how the Beep hardware worked way back in the day[1].  The original IBM PC contained an Intel 8254 programmable interval timer chip to manage the system clock.  Because the IBM engineers felt that the PC needed to be able to play sound (but not particularly high quality sound), they decided that they could use the 8254 as a very primitive square wave generator.  To do this, they programmed the 3rd timer on the chip to operate in Square Wave mode and to count down with the desired output frequency.  This caused the Out2 line on the chip to toggle from high to low every time the clock went to 0.  The hardware designers tied the Out2 line on the chip to the PC speaker and voila – they were able to use the clock chip to program the PC speaker to make a noise (not a very high quality noise but a noise nonetheless).

    The Beep() Win32 API is basically a thin wrapper around the 8254 PIT functionality.  So when you call the Beep() API, you program the 8254 to play sounds on the PC speaker.
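To give a feel for what “count down with the desired output frequency” means: the timer counts down from a 16 bit reload value at the rate of its input clock, so the reload value is roughly the input clock divided by the tone you want.  The sketch below just does that arithmetic and touches no hardware.  The 1,193,182 Hz input clock is the commonly quoted PC value and is my assumption (it isn’t stated above), and TryGetTimerReloadValue is a made-up name, not part of the Beep driver.

```cpp
#include <cstdint>

// Assumption: the PC timer's input clock, roughly 1.193182 MHz.
constexpr std::uint32_t kTimerInputClockHz = 1193182;

// Computes the 16 bit reload value that makes the timer's square wave come
// out near `frequencyHz`.  Returns false when the tone is too low (the value
// would not fit in 16 bits) or too high (the value would be below 2).
bool TryGetTimerReloadValue(std::uint32_t frequencyHz, std::uint16_t& reloadValue)
{
    if (frequencyHz == 0)
    {
        return false;
    }
    const std::uint32_t divisor = kTimerInputClockHz / frequencyHz;
    if (divisor < 2 || divisor > 0xFFFF)
    {
        return false;
    }
    reloadValue = static_cast<std::uint16_t>(divisor);
    return true;
}
```

With that clock a 1000 Hz tone works out to a reload value of 1193, and anything below about 19 Hz can’t be represented in the 16 bit counter.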

     

    Fast forward about 25 years…  The PC industry has largely changed and the PC architecture has changed with it.  At this point they don’t actually use the 8254 as the programmable interval timer, but it’s still in modern PCs.  And that’s because the 8254 is still used to drive the PC speaker. 

    One of the other things that happened in the intervening 25 years was that machines got a whole lot more capable.  Now machines come with capabilities like newfangled hard disk drives (some of which can even hold more than 30 megabytes of storage (but I don’t know why on earth anyone would ever want a hard disk that can hold that much stuff)).  And every non server machine sold today has a PC sound card.  So every single machine sold today has two ways of generating sounds – the PC sound card and the old 8254 which is tied to the internal PC speaker (or to a dedicated input on the sound card – more on this later).

     

    There’s something else that happened in the past 25 years.  PCs became commodity systems.  And that started exerting a huge amount of pressure on PC manufacturers to cut costs.  They looked at the 8254 and asked “why can’t we remove this?”

    It turns out that they couldn’t.  And the answer to why they couldn’t came from a totally unexpected place.  The Americans with Disabilities Act.

     

    The ADA?  What on earth could the ADA have to do with a PC making a beep?   Well it turns out that at some point in the intervening 25 years, the Win32 Beep() was used for assistive technologies – in particular the sounds made when you enable the assistive technologies like StickyKeys were generated using the Beep() API.   There are about 6 different assistive technology (AT) sounds built into Windows, and their implementation is plumbed fairly deep inside the win32k.sys driver. 

    But why does that matter?  Well it turns out that many enterprises (both governments and corporations) have requirements that prevent them from purchasing equipment that lacks accessible technologies and that meant that you couldn’t sell computers that didn’t have beep hardware to those enterprises.

     

    This issue was first noticed when Microsoft was developing the first 64bit version of Windows.  Because the original 64bit Windows was intended for servers, the hardware requirements for 64bit machines didn’t include support for an 8254 (apparently the AT requirements are relaxed on servers).  But when we started building a client 64bit OS, we had a problem – client OS’s had to support AT so we needed to bring the beep back even on machines that didn’t have beep hardware.

    For Windows XP this was solved with some custom code in winlogon which worked but had some unexpected complications (none of which are relevant to this discussion).  For Windows Vista, I redesigned the mechanism to move the accessibility beep logic to a new “user mode system sounds agent”. 

    Because the only machines with this problem were 64bit machines, this functionality was restricted to 64bit versions of Windows. 

    That in turn meant that PC manufacturers still had to include support for the 8254 hardware – after all if the user chose to buy the machine with a 32bit operating system on it they might want to use the AT functionality.

    For Windows 7, we resolved the issue completely – we moved all the functionality that used to be contained in Beep.Sys into the user mode system sounds agent – now when you call the Beep() API instead of manipulating the 8254 chip the call is re-routed into a user mode agent which actually plays the sounds.

     

    There was another benefit associated with this plan: Remember above when I mentioned that the 8254 output line was tied to a dedicated input on the sound card?  Because of this input to the sound card, the sound hardware needed to stay powered on at full power all the time because the system couldn’t know when an application might call Beep and thus activate the 8254 (there’s no connection between the 8254 and the power management infrastructure so the system can’t power on the sound hardware when someone programs the 3rd timer on the 8254).  By redirecting the Beep calls through the system audio hardware the system was able to put the sound hardware to sleep until it was needed.

     

    This redirection also had a couple of unexpected benefits.  For instance when you accidentally type (or grep) through a file containing 0x07 characters (like a .obj file) you can finally turn off the annoying noise – since the beeps are played through the PC speakers, the PC mute key works to shut them up.  It also means that you can now control the volume of the beeps. 

    There were also some unexpected consequences.  The biggest was that people started noticing when applications called Beep().  They had placed their PCs far enough away (or there was enough ambient noise) that they had never noticed when their PC was beeping at them until the sounds started coming out their speakers.

     

     

    [1] Thus providing me with a justification to keep my old Intel component data catalogs from back in the 1980s.

    More fun with Amazon reviews.

    • 8 Comments

    A co-worker sent this around and I just HAD to share it…   It’s not nearly as geeky as “The Story of Ping”.

     

    For those that don’t want to follow the link, these are some of the reviews for this:

    image

     

    “I bought this review based on all of the positive reviews, but am having some issues with space. The desk is great shape and fit for my 2004 Chevy Caprice, but I am really having a heck of a time fitting the Chevy in the office. I put my printer/fax, coffee machine, shredder, and inappropriate family photos snugly in the backseat, yet I still don't have enough room to walk around the Chevy. And worse, I can barely open the door and squeeze in. Most of the time I squeeze in the passenger side and scramble into the driver seat. I tried leaving the window open and crawling through, but Scrappy hops in and hangs his head out the window. I just don't have the heart to tell him we aren't going to the dogpark and that I am still working.
    When the contractor comes to tear out the adjoining wall to my living room, I know he will ask me "Why don't you just use the garage?" But has he actually *seen* the garage! It's filled with stuff I never use like my treadmill, ab-lounger, roto-till, lawnmower... I couldn't work in that kind of clutter! NO WAY!
    Don't get me wrong! I LOVE THIS PRODUCT! Now I can turn my Neil Diamond up as loud as I want to and not have Sandra from Accounting peek over my cubical with those angry little eyes. And when I want to go to Starbucks or McDonalds I can work while I drive AND in the drive thru, too! It works perfectly with my Universal Portable Urinal - Unisex and Reliance Products Hassock Portable Lightweight Self-Contained Toilet
    Working from my car, and home, has never been easier. I finally threw out that crappy IKEA desk and printer stand combo! Good riddance! Best of all now I don't have to "tele-commute" to work, I "auto-commute." AWESOME.”

    “My 16 year old daughter just got her license a few weeks ago. Since then, she's been going out for drives a lot after school. Unfortunately, all the time spent in the car for her has meant less time for homework. Her grades have noticeably slipped, but instead of taking away her car privileges, I bought this steering wheel desk. It's perfect for young drivers with heavy academic loads! Now she can work on her homework and still be out driving, improving her road skills and staying on top of her grades. I couldn't be prouder and would encourage all parents with new drivers to set their kids up with this super-portable work station!”

    “My copilot and I both used these during our "daily grind" transcontinental flights from San Diego to Minneapolis. We had to modify them a bit to fit snug against the instrument panels (when we bought them we didn't realize the planes we fly don't have steering wheels!), but in the end it did the job. With our laptops firmly in place we were able to focus our attention on what really mattered, participating in raids with our WoW clan. During our last flight we were so immersed in trying to take down Eranikus that we overshot Minneapolis by a full hour and a half before some annoying flight attendant interrupted us, babbling something about "FAA and F16 fighters."
    We'll definitely use this product again at our next gig, whatever and whenever that happens to be...
    Highly recommended! “ [Editors Note: A reference to this]

    “This is definitely one of the best products out there!
    While commuting through downtown Seattle, I always had to make my sushi on the little center console in my Honda - try making a perfect California roll there! No way!
    Now, I can safely make my California rolls, Spider rolls and Rainbow rolls - all while steering with my knee.
    Downsides - it doesn't have a receptical for my egg drop soup bowl. Also, I have to keep my knife on the dashboard and my chopsticks tucked in my crotch. So, there are some flaws with this product. I would also recommend keeping your sake in a flask under the passengers seat.”

    The reviews go on and on in this vein.  Very funny.

    Why are they called “giblets” anyway?

    • 0 Comments

    Five years ago, I attended one of the initial security training courses as a part of the XP SP2 effort.  I wrote this up in one of my very first posts entitled “Remember the giblets” and followed it up last year with “The Trouble with Giblets”.  I use the term “giblets” a lot but I’d never bothered to go out and figure out where the term came from.

    Well, we were talking about giblets in an email discussion today and one of my co-workers went and asked Michael Howard where the term came from.  Michael forwarded the question to Steve Lipner who was the person who originally coined the term and he came back with the origin of the term.

     

    It turns out that “giblets” is a term that was used at Digital Equipment Corporation back in the 1980s.  DEC used to sell big iron machines (actually I used DEC machines exclusively until I started at Microsoft).  The thing about big machines is that you usually need more than just the machine to build a complete solution – things like Ethernet repeaters and adapters and other fiddly bits.  And of course DEC was more than willing to sell you all these fiddly bits.  It seems that some of the DEC marketing people liked to refer to these bits and pieces as “giblets”. 

    Over time Steve started using the term for the pieces of software that were incidental to the product but which weren’t delivered by the main development team – things like the C runtime library, libJPG, ATL, etc. 

    Later on, someone else (Steve wasn’t sure who, it might have been Eric Bidstrup) pointed out that the giblets that came from a turkey didn’t necessarily come from the actual turkey that you’re eating which makes the analogy even more apt.

    Thanks to Craig Gehre for the picture.

    Windows 7 Reflections…

    • 16 Comments

    Today[1] Microsoft formally launched Windows 7.  I can’t say how proud I am of the work we did in Windows 7 – it’s been an amazing journey.  This is the 4th version of Windows I’ve worked on and I have never felt this way about a release of Windows.  I have to admit that I get a bit weepy with pride whenever I see a Win7 commercial (Kylie is simply too cute :)). 

    I thought I’d write a bit about the Win7 experience from my point of view.  I’ve written a bit of this stuff in my post on the Engineering 7 blog but that was more about the changes in engineering processes as opposed to my personal experiences in the process.

    For me, the Windows 7 work basically started just after we shipped Vista.  While the PMs and leads on the sound team were busy working on planning for Win7, I spent most of the time between Vista RTM and the start of the Win7 feature design cleaning up some of the Vista code I was responsible for.  During the final Vista testing, I realized that there were some architectural deficiencies in some of the sound code that caused some really subtle bugs (that I don’t believe anyone outside of the sound team has ever found) so I took the opportunity to clean up those deficiencies. 

    I also fixed an issue that occurred when someone called the wave APIs from their DLL entry point.  Yes I know that apps aren’t supposed to call any APIs from DllMain but anyone who’s read either my blog or Raymond Chen’s blog will realize that a lot of apps do stuff like that and get away with it.  This fix was actually first deployed in Vista SP1 – we had identified the DllMain problem before we shipped Vista and included a workaround for the problem but we also added some telemetry so we could determine the number of customers that were affected by the bug.  Based on that telemetry we decided that we had to include the fix in Vista SP1 because of the number of users affected by the issue.  This is a perfect example of some of the ways that the customer experience improvement program directly leads to product improvements.  Before we had the CEIP, we would have had no way of knowing how much impact the bug had on customers; the CEIP gave us visibility into the severity of the problem that we wouldn’t have had before.

    During this interim time, I also worked on a number of prototype projects and helped the SDL tools team work on the current version of the threat modeling tool.  I also played around with some new (to me) development strategies – RAII, exception based programming and test driven development. 

    In June of 2007, we started working on actual feature planning – the planning team had come up with a set of tentative features for Win7 and we started the actual design for the features – figuring out the user experience for the features, the internal implementation details, etc.   During the first milestone, I worked on the capture monitor feature – I ended up writing the feature from start to finish.  This was my first time writing features using RAII and using TDD.  I’ve got to say that I like it – as I remember, there were only 2 or 3 bugs found in the capture monitor that weren’t found by my unit tests and I’m not aware of any bugs that were found outside the team (which might just be an indication of how little the feature is used :)).

    After the capture monitor work, I spent the next milestone working on UI features – I was on a feature crew of 2 developers working on enhancing sndvol to make it work better with multiple devices and to fix a number of accessibility issues with the UI.  This was the first time in my almost 25 years at Microsoft where I had an opportunity to do “real” UI development.  I know that there’s a lot of controversy about the UI choices we made but I’m pretty happy with the UI changes.  

    The third milestone for Win7 I worked on the “Ducking” feature.  Of all the features I worked on for Win7, the ducking feature is the closest to a “DWIM” feature in Windows – the system automatically decreases the volume for applications when you start communicating with other people (this feature requires some application changes to work correctly though which is why you don’t see it in use right now (although it has shown up in at least one application by accident)).

     

    The remarkable thing about Win7 development was that it was almost friction free.  During the Vista development process (and in every other product I’ve worked on) development was marked by a constant stream of new issues which were a constant drain on time and energy.  It felt like we moved from one crisis to another crisis.  For Win7 it was different.  I think it was some time during the second milestone that I realized that Win7 was “special”.  The newer development process that was deployed for Win7 was clearly paying off and my life was far less stressed.  In fact I don’t think I worked late or came in on weekends once during the entire 3 years that Win7 was under development – this was a HUGE change.  Every other product I’ve ever worked on has required late nights and weekends (sometimes it required all-nighters).  But for Win7 it just didn’t happen.  Instead we set goals that were reasonable with achievable schedules and we executed on those goals and delivered the features we promised.

     

    I’m so happy that customers are now going to be able to play with the stuff we’ve all worked so hard to deliver to you.  Enjoy :).

     

    [1] I started writing this on the 22nd but didn’t finish it until today.

    Win7 Whoppers

    • 8 Comments

    Wow, one of my co-workers just sent this image out.  It’s totally awesome (IMHO)…

    http://twitpic.com/mehp6/full

     

     

    Edit: The image tag didn't work for some reason so I removed it and just left the link...

    Bonus: The first Win7 ad: http://video.nytimes.com/video/2009/10/21/multimedia/1247465293593/an-ad-for-windows-7-from-microsoft.html#

     

    Looking for new skillz (turning the blog around)…

    • 42 Comments

    Just for giggles, I went looking at the various job listings within Microsoft and outside Microsoft (no, I’m not going anywhere, I was just curious).  While looking, I realized that I had absolutely no marketable skills :).  Nobody seems to be hiring an OS developer these days.

    To repeat and be even more clear: I’m *not* leaving Microsoft.  I’m *not* leaving Windows. 

    I’m just looking for a book or two to read to improve my skills (I do this regularly – most of my recent reading has either been on Security or WPF and to be honest, I’m kinda bored of those topics so I’m interested in branching out beyond security and UI topics)…

    I could run out and browse the bookstores (and I might just do that) but I figured “Hey, I’ve got a blog, why don’t I ask the folks who read my blog?”.  So let me turn the blog around and ask:

    If I wanted to go out and learn web development, which books should I read? 

    I’ve already read “JavaScript: The Good Parts” and it was fascinating, but it’s a language book (and a very good language book), not a web development book.  So what books should I read to learn web development?

    I can make it arbitrarily fast if I don’t actually have to make it work.

    • 27 Comments

    Digging way back into my pre-Microsoft days, I was recently reminded of a story that I believe was told to me by Mary Shaw back when I took her Computer Optimization class at Carnegie-Mellon…

    During the class, Mary told an anecdote about a developer “Sue” who found a bug in the code of another developer, “Joe”, that “Joe” had introduced with a performance optimization.  When “Sue” pointed the bug out to “Joe”, his response was “Oops, but it’s WAY faster with the bug”.  “Sue” exploded “If it doesn’t have to be correct, I can calculate the result in 0 time!” [1].

    Immediately after telling this anecdote, she discussed a contest that the CS faculty held for the graduate students every year.  Each year the CS faculty posed a problem to the graduate students with a prize awarded to the grad student who came up with the most efficient (fastest) solution to the problem.  She then assigned the exact same problem to us:

    “Given a copy of the “Declaration of Independence”, calculate the 10 most common words in the document”

    We all went off and built programs to parse the words in the document, insert them into a tree (tracking usage) and read off the 10 most frequent words.  The next assignment was “Now make it fast – the 5 fastest apps get an ‘A’, the next 5 get a ‘B’, etc.”

    So everyone in the class (except me :)) went out and rewrote their apps to use a hash table so that their insertion time was constant and then they optimized the heck out of their hash tables[2].
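    The hash table approach can be sketched roughly as follows.  This is only an illustration in C++ (the original programs aren’t shown); the names SplitWords and TopWords, the rule that anything other than an ASCII letter separates words, and the alphabetical tie-break are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using WordCount = std::pair<std::string, std::size_t>;

// Splits text into lowercase words.  Anything that is not an ASCII letter
// separates words (an assumption; the assignment's rules are not given).
std::vector<std::string> SplitWords(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalpha(ch)) {
            current.push_back(static_cast<char>(std::tolower(ch)));
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

// Returns the k most frequent words, most frequent first.
// Ties are broken alphabetically so the result is deterministic.
std::vector<WordCount> TopWords(const std::string& text, std::size_t k) {
    std::unordered_map<std::string, std::size_t> counts;
    for (const std::string& word : SplitWords(text)) {
        ++counts[word];  // insert-or-increment, constant time on average
    }
    std::vector<WordCount> ranked(counts.begin(), counts.end());
    k = std::min(k, ranked.size());
    // Only the first k positions need to be in order.
    std::partial_sort(ranked.begin(), ranked.begin() + k, ranked.end(),
        [](const WordCount& a, const WordCount& b) {
            if (a.second != b.second) return a.second > b.second;
            return a.first < b.first;
        });
    ranked.resize(k);
    return ranked;
}
```

    Calling TopWords(text, 10) on the text of the document gives the list the assignment asks for; the partial sort avoids ordering words that can’t make the top 10.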

    After our class had our turn, Mary shared the results of what happened when the CS grad students were presented with the exact same problem.

    Most of them basically did what most of the students in my class did – built hash tables and tweaked them.  But a couple of results stood out.

    • The first one simply hard coded the 10 most common words in their app and printed them out.  This was disqualified because it was perceived as breaking the rules.
    • The next one was quite clever.  The grad student in question realized that they could make the program much faster if they wrote it in assembly language, but the rules of the contest required that they use Pascal.  So the grad student essentially created an array on the stack, introduced a buffer overflow, loaded his assembly language program into the buffer and used that as a way of getting his assembly language version of the program to run.  IIRC he wasn’t disqualified but he didn’t win because he circumvented the rules (I’m not sure, it’s been more than a quarter century since Mary told the class this story).
    • The winning entry was even more clever.  He realized that he didn’t actually need to track all the words in the document.  Instead he decided to track only some of the words in the document in a fixed array.  His logic was that each of the 10 most frequent words was likely to appear in the first <n> words in the document so all he needed to do was to figure out what “n” is and he’d be golden.
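
    The idea behind the winning entry can be sketched like this, in C++ rather than the Pascal the contest required.  The details of the real program aren’t known, so the function name TopWordsGuess, the linear search over the tracked words and the alphabetical tie-break are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using WordCount = std::pair<std::string, std::size_t>;

// Counts only the words that first appear among the first `n` words.
// Words that first show up later are ignored, so the answer is a guess:
// it is right only when every top-k word occurs early in the document.
std::vector<WordCount> TopWordsGuess(const std::vector<std::string>& words,
                                     std::size_t n, std::size_t k) {
    std::vector<WordCount> tracked;  // small table of candidate words
    for (std::size_t i = 0; i < words.size(); ++i) {
        auto it = std::find_if(tracked.begin(), tracked.end(),
            [&](const WordCount& entry) { return entry.first == words[i]; });
        if (it != tracked.end()) {
            ++it->second;                       // already a candidate
        } else if (i < n) {
            tracked.emplace_back(words[i], 1);  // still inside the window
        }                                       // otherwise: not tracked
    }
    k = std::min(k, tracked.size());
    std::partial_sort(tracked.begin(), tracked.begin() + k, tracked.end(),
        [](const WordCount& a, const WordCount& b) {
            if (a.second != b.second) return a.second > b.second;
            return a.first < b.first;
        });
    tracked.resize(k);
    return tracked;
}
```

    If “n” is too small the result is simply wrong (a frequent word that first appears late is never counted), which is exactly the gamble the story describes.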

     

    So the moral of the story is “Yes, if it doesn’t have to be correct, you can calculate the response in 0 time.  But sometimes it’s ok to guess and if you guess right, you can get a huge performance benefit from the result”. 

     

     

    [1] This anecdote might also come from Jon L. Bentley’s “Writing Efficient Programs”; I’ll be honest and say that I don’t remember where I heard it (but it makes a great introduction to the subsequent story).

    [2] I was stubborn and decided to take my binary tree program and make it as efficient as possible but keep the basic structure of the solution (for example, instead of comparing strings, I calculated a hash for the string and compared the hashes to determine if strings matched).  I don’t remember if I was in the top 5 but I was certainly in the top 10.  I do know that my program beat out most of the hash table based solutions.
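
    A tree that compares hashes before strings might look something like the sketch below.  It is not the original program: the Node layout, the AddWord name, the use of std::hash and the fallback to a string comparison when two hashes are equal (to stay correct if two different words collide) are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>

struct Node {
    std::size_t hash;
    std::string word;
    std::size_t count;
    std::unique_ptr<Node> left, right;
    Node(std::size_t h, const std::string& w) : hash(h), word(w), count(1) {}
};

// Inserts `word` into the tree or bumps its count; returns the new count.
// The tree is ordered by (hash, word), so most steps down the tree cost
// one integer comparison instead of a string comparison.
std::size_t AddWord(std::unique_ptr<Node>& root, const std::string& word) {
    const std::size_t h = std::hash<std::string>{}(word);
    std::unique_ptr<Node>* slot = &root;
    while (*slot) {
        Node& node = **slot;
        if (h != node.hash) {
            slot = (h < node.hash) ? &node.left : &node.right;
        } else if (word == node.word) {
            return ++node.count;  // same hash and same text: a repeat
        } else {
            // Same hash, different text: order by the strings instead.
            slot = (word < node.word) ? &node.left : &node.right;
        }
    }
    *slot = std::unique_ptr<Node>(new Node(h, word));
    return 1;
}
```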


    Building a flicker free volume control

    • 30 Comments

    When we shipped Windows Vista, one of the really annoying UI problems with the volume control was that whenever you resized it, it would flicker. 

    To be more specific, the right side of the control would flicker – the rest didn’t flicker (which was rather strange).

     

    Between the Win7 PDC release (what we called M3 internally) and the Win7 Beta, I decided to bite the bullet and see if I could fix the flicker.  It seems like I tried everything to make the flickering go away but I wasn’t able to do it until I ran into the WM_PRINTCLIENT message which allowed me to direct all of the internal controls on the window to paint themselves.

    Basically on a paint call, I’d take the paint DC and send a WM_PRINTCLIENT message to each of the controls in sndvol asking them each to paint themselves to the new DC.  This worked almost perfectly – I was finally able to build a flicker free version of the UI.  The UI wasn’t perfect (for instance the animations that faded in the “flat buttons” didn’t fire) but the UI worked just fine and looked great so I was happy that I’d finally nailed the problem.  That happiness lasted until I got a bug report that I simply couldn’t figure out.  It seems that if you launched the volume mixer, set the focus to another application, then selected the volume mixer’s title bar and moved the mixer, there were a ton of drawing artifacts left on the screen.

    I dug into it a bunch and was stumped.  It appeared that the clipping rectangle sent in the WM_PAINT message to the top level window didn’t include the entire window, thus portions of the window weren’t erased.  I worked on this for a couple of days trying to figure out what was going wrong and I finally asked for help on one of our internal mailing lists.

    The first response I got was that I shouldn’t use WM_PRINTCLIENT because it was going to cause me difficulty.  I’d already come to that conclusion – by trying to control every aspect of the drawing experience for my app, I was essentially working against the window manager – that’s why the repaint problem was happening.  By sending WM_PRINTCLIENT I was essentially putting a band-aid on the real problem but I hadn’t solved the real problem, all I’d done was hide it.

     

    So I had to go back to the drawing board.  Eventually (with the help of one of the developers on the User team) I finally tracked down the original root cause of the problem and it turns out that the root cause was somewhere totally unexpected.

    Consider the volume UI:

    image

    The UI is composed of two major areas: The “Devices” group and the “Applications” group.  There’s a group box control wrapped around the two areas.

    Now let’s look at the group box control.  For reasons that are buried deep in the early history of Windows, a group box is actually a form of the “button” control.  If you look at the window styles for a button in SpyXX, you’ll see:

    image

     

    Notice the CS_VREDRAW and CS_HREDRAW window class styles.  The MSDN documentation for class styles says:

    CS_HREDRAW - Redraws the entire window if a movement or size adjustment changes the width of the client area.
    CS_VREDRAW - Redraws the entire window if a movement or size adjustment changes the height of the client area.

    In other words every window whose class has the CS_HREDRAW or CS_VREDRAW style will be fully repainted whenever the window is resized (including all the controls inside the window).  And ALL buttons have these styles.  That means that whenever you resize any buttons, they’re going to flicker, and so will all of the content that lives below the button.  For most buttons this isn’t a big deal but for group boxes it can be a big issue because group boxes contain other controls.
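
    The same check can be made in code.  The sketch below is only an illustration: the helper names RepaintsFullyOnResize and WindowRepaintsFullyOnResize are made up for the example, and the stand-in constants exist only so that the bit test also compiles off Windows (on Windows the real CS_* definitions from windows.h are used).  GetClassLongPtrW with GCL_STYLE is the documented way to read a window’s class style.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#else
// Stand-ins so the bit test also compiles off Windows; the values match
// the CS_* definitions in winuser.h.
constexpr unsigned CS_VREDRAW = 0x0001;
constexpr unsigned CS_HREDRAW = 0x0002;
#endif

// True when either redraw-on-resize bit is set in a class style.
constexpr bool RepaintsFullyOnResize(unsigned classStyle) {
    return (classStyle & (CS_HREDRAW | CS_VREDRAW)) != 0;
}

#ifdef _WIN32
// Reads the class style of a live window and applies the same test.
bool WindowRepaintsFullyOnResize(HWND hwnd) {
    const ULONG_PTR style = GetClassLongPtrW(hwnd, GCL_STYLE);
    return RepaintsFullyOnResize(static_cast<unsigned>(style));
}
#endif
```

    Given the styles described above, passing the handle of any standard button (including a group box) to WindowRepaintsFullyOnResize would be expected to return true.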

    In the case of sndvol, when you resize the volume control, we resize the applications group box (because it’s visually pinned to the right side of the dialog), which causes the group box and all of its contained controls to repaint and thus flicker like crazy.  The only way to fix this is to remove the CS_HREDRAW and CS_VREDRAW styles from the window class for the control.

    The good news is that once I’d identified the root cause, the solution to my problem was relatively simple.  I needed to build my own custom version of the group box which handled its own painting and didn’t have the CS_HREDRAW and CS_VREDRAW class styles.  Fortunately it’s really easy to draw a group box – if themes are enabled a group box can be drawn with the DrawThemeBackground API with the BP_GROUPBOX part and if theming is disabled, you can use the DrawEdge API to draw the group box.  Once I added the new control and dealt with a number of other clean-up issues (making sure that the right portions of the window were invalidated when the window was resized, for example, and making sure that my top level control had the WS_CLIPCHILDREN style and that each of the sub windows had the WS_CLIPSIBLINGS style), I had a version of sndvol that was flicker free AND which let the window manager handle all the drawing complexity.  There are still some minor visual gotchas in the UI (for example, if you resize the window using the left edge the right side of the group box “shudders” a bit – this is apparently an artifact that’s outside my control – other apps have similar issues when resized on the left edge) but they’re acceptable.

    As an added bonus, now that I was no longer painting everything manually, the fade-in animations on the flat buttons started working again!

     

    PS: While I was writing this post, I ran into this tutorial on building flicker free applications.  I wish I’d run into it while I was trying to deal with the flickering problem because it nicely lays out how to solve the problem.
