Larry Osterman's WebLog

Confessions of an Old Fogey
  • Larry Osterman's WebLog

    Free security CD from Microsoft


    This isn’t news, but it came up in a discussion in the comments section of my blog yesterday.

    Microsoft offers a free CD that contains all the security patches for Windows XP, Windows ME, Windows 2000, Windows 98, and Windows 98SE up to February of this year.

    You can order it here.


  • Larry Osterman's WebLog

    Unique pointers in COM interfaces


    One issue that keeps on coming up day after day has to do with what RPC (or COM) does with pointers as parameters in RPC (or COM) interfaces.  I’m going to talk about string parameters, since they’re relatively simple, but everything I say applies to non string pointer parameters as well.

     If the string parameter to your method isn’t optional, then it’s really easy.  You declare your method with the appropriate attributions and you’re done.

          long MyCOMMethod([in, string] wchar_t *StringParameter1,
                           [in, string] wchar_t *StringParameter2,
                       [in] unsigned long DWORDParameter1,
                       [in] unsigned long DWORDParameter2,
                       [out] unsigned long *DWORDReturned);

    Simple and straightforward.  RPC (or COM) will marshal the StringParameter1 and StringParameter2 parameters as null terminated Unicode strings and will pass them to the server.

    The thing is that by default, pointer parameters to RPC interfaces are treated as “ref” pointers (passed by reference).  And “ref” pointers can’t be NULL.

    What’s even more subtle about this is that the uniqueness of “ref” pointers is enforced in the RPC marshaler.  So it’s entirely possible you could have an in-proc COM object and pass a null pointer in for StringParameter1 and never notice that there’s a problem.

    Until you pass the null pointer to a COM object that was created in a different apartment, or the COM object goes out-of-proc.  All of a sudden, your RPC call starts failing with error 1780, RPC_X_NULL_REF_POINTER (if it’s a COM API, then you’ll get 0x800706F4 returned).

    The good news is that RPC has a mechanism to handle this; all you need to do is to declare the pointer to be “unique”.  A unique pointer has some useful characteristics:

    ·         It can be NULL.

    ·         It can change from NULL to non NULL, which means that it’ll allocate new memory on the client to hold the new value.  This makes it ideal for returning structures to the client.

    ·         It can’t be aliased – in other words, the memory pointed to by a unique pointer will not be pointed to by any other parameter to the function.  More on that later.

    So to fix the interface, all you need to do is:

          long MyCOMMethod([in, string, unique] wchar_t *StringParameter1,
                           [in, string] wchar_t *StringParameter2,
                       [in] unsigned long DWORDParameter1,
                       [in] unsigned long DWORDParameter2,
                       [out] unsigned long *DWORDReturned);

    And now you can pass null pointers to StringParameter1.

    But there’s a gotcha that has bitten almost every programmer that’s dealt with this.

    You see, RPC also has this attribute called “pointer_default” which is applied to an entire interface:

    interface dictionary

    People see this attribute and make the logical assumption that if they have the pointer_default attribute on their interface that it means that all pointers in all the methods are declared as “unique”. 

    Unfortunately that’s not the case.  The pointer_default attribute on the interface specifies the behavior of all pointers EXCEPT the pointers passed in as parameters to routines.

    By the way, the reason that “unique” pointers are called “unique” is to differentiate them from “ptr” pointers.  “ptr” pointers have many of the same qualities of “unique” pointers, with one critical difference.  A “ptr” pointer is a full C pointer.  This means that it allows aliasing of data – two “ptr” parameters to a routine (or two “ptr” fields in a structure) can point to the same piece of memory.  As I mentioned above, a “unique” pointer can’t be aliased.

    So why would I ever use “unique” pointers instead of “ptr” pointers?  Well, there’s a non trivial amount of overhead associated with “ptr” pointers – the RPC runtime library has to scan the parameters to see if there are any aliases so it can resurrect them on the server side.  If you can guarantee that there aren’t any aliases in your parameters (which is 99.9% of the time for the interfaces I’ve worked with), then the “unique” attribute is faster.


  • Larry Osterman's WebLog

    Microsoft just doesn't get Security - NOT!


    I was reading Robert Scoble’s post on “Longhorn Myths”, and I noticed this comment from “Dave” in his comments thread:

    Most outlandish Longhorn myth? I mean this with all due respect, and say it with complete sincerity.... it will be one that MS will in fact say: that Longhorn will be a very secure sytstem.

    Yes, it will be much more secure than any other verison of Windows. Yes, it will be as secure as MS can possibly make it. But try as they might, a few factors come into play that will make it next to impossible for Longhorn to be a very secure system.

    (1) Longhorn, being a Microsoft product and a popular product, is destined to be targeted by hackers around the world. If there's a hole to be found, they'll find it. And nobody can make a system 100% secure.

    (2) MS still places a higher emphasis on new forms of functionality/interaction than they do on security. Yes, they have a greater emphasis on security than even one year ago, but their concern - at this point in the Longhorn product life cycle - is more on getting things to work and work well than it is to play devil's advocate and find all the security holes they can find.

    My response (updated and edited): Um Compared to what? Linux? Hands down, Longhorn will be more secure out-of-the-box than any Linux distribution available at the time.

    There will be holes found in Longhorn, absolutely. But Microsoft GETS security nowadays. In general, Linux/Open Source community doesn't yet (The OpenBSD guys appear to get it, but I’ve not seen any indications of this level of scrutiny associated with the other distributions).

    The Linux guys will eventually, but they don't get it yet.

    If you're going to argue that Linux/OSX is somehow safer because they're not popular, then that's relying on security by obscurity. And that dog don’t hunt :)

    Even today, I'd stack a Win2K3 machine against ANY Linux distribution out there on the internet. And Longhorn's going to be BETTER than Win2K3.  After all, Longhorn’s starting with an amalgam of Win2K3 and XP SP2, and we’re enhancing system security even beyond what’s gone into the previous releases.


    “Dave’s” comment #2 is the one I wanted to write about though.  Microsoft doesn’t place a higher emphasis on new forms of functionality than they do on security.  Security is an integral part of every one of our development processes here at Microsoft.  This hits every aspect of a developer’s life.  Every developer is required to attend security training, starting at New Employee Orientation, continuing through annual security refreshers.

    Each and every new feature that’s added to the system has to be thoroughly threat-modeled, we need to understand every aspect that any new component can be attacked, and the kind of compromise that can result from a failure of each system.  If there’s a failure mode, then we need to understand how to defend against it, and we need to design mitigations against those threats.

    Every test group at Microsoft is required to have test cases written that test exploiting ALL of our interfaces, by various means.  Our test developers have all gone through the same security testing that the other developers have gone through, with an intensive focus on how to test security holes.

    Every line of code in the system is code reviewed before it’s checked into the mainline source tree, to check for security problems, and we’ve got a security push built-into the schedule where we’ll go back and re-review all the code that was checked in during the lifetime of the project.

    This is a totally new way of working, it’s incredibly resource intensive, but the results are unmistakable.  Every system we’ve released since we started implementing these paradigms has been significantly more secure than the previous ones, Longhorn will be no different.

    I’m not saying that Longhorn will be security hole free, it won’t be.  We’re human, we screw up.  But it’ll be orders of magnitude better than anything out there. 

    Edit: Added the following:

    By the way, I want to be clear: I'm not trying to denegrate the entire open source community.  There ARE people who get it in the open source community.  The OpenBSD link I mentioned above is a good example of a team that I believe DOES understand what's needed these days.

    I just don't see the same level of rigor being applied by the general community.  Maybe I'm just not looking in the right places.  Believe me, I'd LOVE to be proven wrong on this one.

    Edit: Replaced thread-modeled with threat-modeled :)


  • Larry Osterman's WebLog

    Things not to do, Part 3: Don't call CreateFile on your Windows message pump thread.


    Sometimes it’s not a big deal, but there are some situations where it’s disastrous to call CreateFile on a message pump thread.

    The problem occurs because CreateFile might take a long time to complete.  Conceivably it could actually take HOURS (although that is highly unlikely).  And if you’re calling CreateFile on your message pump thread, you’re not pumping messages (since CreateFile is synchronous).  Which means your app freezes and you get the dreaded “(not responding)” on your window if the user attempts to interact with your app.

    Why would it take so long?  Well, what happens if the file’s located on slow media?  Like a floppy – remember floppy disks?  They can spend several seconds spinning up, especially if the floppy is newly inserted in the disk drive. 

    Another even more common cause is if the file in question is located on a network drive – the first thing that happens when you’re opening a networked file is a DNS lookup which can take several seconds to complete.  Then you need to contact the file server, which in turn has to authenticate the user, and then process the open, all of which may also take several seconds to complete. 

    If the file being opened is a message mode named pipe, you might block until an instance of the pipe becomes available on the server end, which could conceivably be hours (whenever the current named pipe clients disconnect).


    Bottom line: Don’t do ANY file I/O on your message pump threads; offload them to worker threads so you can make your app responsive.


  • Larry Osterman's WebLog

    Business Conduct Training Complaints


    Oh man, I’m gonna get in SO much trouble for this one. 

    So I get an email this morning telling that I’ve got to watch web training video about Microsoft’s standards of business conduct.

    It’s a standard corporate training video – discussion about what our standards are, and then you come to the meat of the video: Six scenarios, where you’ve got to make a judgment call about the issues associated with various things that could happen in the course of business.

    Good scenarios actually – covering things like insider trading, sexual harassment, use of software purchased for work at home, etc.

    But at the end of each scenario, there’s a question: “Can <the person in the scenario> do this?”

    Duh!  The answer’s No!  The answer is ALWAYS “No” (ok, the sexual harassment question was “Does this scenario violate Microsoft’s sexual harassment policy?”  The answer to that one was “Yes”). 

    Why is it that people who write these training videos always chose scenarios where the answer is clearly “No”?  Why not show a somewhat ambiguous scenario where the answer is “Yes”?

    This isn’t restricted to Microsoft by the way.  We had a discussion about this last year at my kid brother’s high school graduation celebration – my father (a lawyer at Whiteman, Osterman and Hanna), my brother (a lawyer at Weil, Gotshaw & Manges), and my uncle (a lawyer at Hale and Dorr) had a discussion about the ethics training that the bar association requires that all lawyers receive (we were discussing a college course in ethics my brother had taken)

    My father’s only comment was “These things are easy – the answer’s always that it’s unethical”.

    So my question is: Why don’t they give us ambiguous ethical dilemmas in this training?  Why do the people authoring this training always give us scenarios that have a clear and obvious answer? 


  • Larry Osterman's WebLog

    Software archaeology


    There are times that I think my job is the same as an archeologist. 

    Rick touched on this a bit on his “Anatomy of a Software Bug” post (an excellent read, btw, if you haven’t already seen it).

    Code, like people, gets old.  And, just like old people, old code tends to be pretty brittle.

    This is especially true after the original developer of the code has moved on to greener pastures and the remaining members of the team are left to maintain the code.  Since they don’t always understand the details of the implementation, they’re not willing modify the code, for fear of breaking the code.  But they’re still tasked with adding new features to the code, fixing bugs in the code, etc.

    As a result, they tend to make their changes as small surgical changes that affect the component as little as possible.  And the more these surgical changes add up, the more fragile the code gets.

    As these changes build up, the code in the component starts to resemble a piece of conglomerate rock.  The more people add onto the base, the more blobs of other rocks get added to the conglomerate.

    When I work on the code like this, it sometimes feels like I’m an archaeologist digging through the layers of an ancient city – if you pay really close attention, you can see which developers wrote which portions of the code.

    One of the clues I have to work with is the tab styles – different developers use different tabbing conventions, which is a clear clue.  Similarly, different developers use different bracing styles or different naming conventions on variables.

    I’ve been working in winmm.dll for the past several months, and it is a classic example of this.  The base code is about fifteen years old, it was written for Windows 3.1.  Over the years, it’s been ported to Win95, then ported to Windows NT, then ported to Win64.  The code’s been tweaked, support for new technologies have been added (WDM, PnP), functionality’s been replaced (the current joystick logic is totally different from the original logic).  So there’s been a lot of stuff added to the original core.  But it’s all been bolted on piecemeal.

    In addition, since the code was written, Microsoft as a whole has gotten a whole lot getter at software engineering – the coding standards and conventions that are standard today are orders of magnitude of what was done 15 years ago, so sometimes it’s painful to see the old code sometimes – as I’ve written before, back in the late 1980’s, code size was king, memory was expensive, so design decisions were made to favor techniques that reduced code size at the cost of code clarity.

    By the way, this isn’t a dig on WINMM, or Microsoft’s coding – ANY sufficiently old piece of code will have these characteristics.  I’m sure that you could perform the exact same archaeological analysis of any old piece of code and find the exact same kind of issues.


  • Larry Osterman's WebLog

    I win the google prize!


    My post on “how do I explain dividing fractions” from yesterday is the #1 google hit when you search for:

    what do i do to figure out fractions with different denominators


  • Larry Osterman's WebLog

    Managed Code can't have memory leaks, right?


    One of the axioms of working with managed code is that since you’re using managed code, you don’t have to worry about memory leaks.

    This can’t be further from the truth.  It’s totally possible to write code that leaks memory, even with managed code.

    The trick is when you realize that your code can leak, even if the CLR hasn’t.

    Here’s some code I wrote earlier today that leaks memory, for example:

                public static void AppendXxxNodes(XmlNode Node)
                      if (Node.SelectSingleNode("Xxx") == null)
                            if (Node.HasChildNodes)
                                  foreach (XmlNode childNode in Node.ChildNodes)
                                        XmlNode systemProtectionNode = Node.OwnerDocument.CreateElement("Xxx");

    The problem is of course that Node.AppendChild adds an entry to the ChildNode array, so the foreach node is effectively in an infinite loop.  The CLR hasn’t leaked any memory; it knows where all the objects are.  But my app leaked memory just the same.

    I had this happen to me in the Exchange NNTP server many years ago in unmanaged code.  I had a credentials cache that was used to hold onto the logon information for users connecting to the NNTP server.  But there was a bug in the cache that prevented cache hits.  As a result, every user logon caused a new record to be added to the credentials cache.

    Now, the memory didn’t “leak” – when we shut down the server, it quite properly cleaned out all the entries in the credentials cache.  My code didn’t forget about the memory, it just was using too much of it.

    There are LOTS of ways that this can happen, both obvious (like my example above), and subtle.

    Edit: Fixed stupid mistake :)

  • Larry Osterman's WebLog

    How do I divide fractions?


    Valorie works as a teacher's aid in a 6th grade classroom at a local elementary school.

    They've been working on dividing fractions recently, and she spent about two hours yesterday working with one student trying to explain exactly how division of fractions works.

    So I figured I'd toss it out to the blogsphere to see what people's answers are.  How do you explain to a 6th grader that 1/2 divided by 1/4 is 2? 

    Please note that it's not sufficient to say: Division is the same as multiplication by the inverse, so when you divide two fractions, you take the second one, invert it, and multiply.  That's stating division of fractions as an axiom, and not a reason.

    In this case in particular, the teacher wants the students to be able to graphically show how it works.

    I can do this with addition and subtraction of numbers (both positive and negative) using positions on a number line. Similarly, I can do multiplication of fractions graphically - you have a whole, divide it into 2 halves.  When you multiply the half by a quarter, you are quartering the half, so you take the half, divide it into fours, and one of those fours is the answer.

    But how do you do this for division?

    My wife had to type this part because we have a bit of, um, discussion, about how simple this part is....

    How can you explain to 9-11 year old kids why you multiply by the reciprocal without resorting to the axiom? It's easy to show graphically that 1/2 divided by 1/4 is 2 quarters because the kids can see that there are two quarters in one half. Equally so, the kids can understand that 1/4 divided by 1/2 is 1/2 of a half because the kids can see that only half of the half is covered by the original quarter. The problem comes in when their intuition goes out.  They can solve it mathematically, but the teacher is unwilling to have them do the harder problems “on faith“ and the drawing is really confusing the kids. Having tried to draw the 5/8 divided by 3/10, I can assure you, it is quite challenging. And no, the teacher is not willing to keep the problems easy. And no, don't get me started on that aspect of this issue.

    I'm a big fan that if one method of instruction isn't working, I try to find another way to explain the concept. I visited my usual math sites and found that most people don't try to graph this stuff until 10th grade or adulthood. Most of the sites have just had this “go on faith“ response (show the kids the easy ones, and let them “go on faith“ that it will hold true for all cases). I really wish I could figure out a way to show successive subtraction, but even that gets difficult on the more complicated examples.

    What I am hoping is that someone out there can provide me with the “aha!“ I need to come up with a few more ways to explain this. What this has been teaching me is that I've been doing this “on faith“ most of my life and never stopped to think about why myself.

    Any ideas/suggestions would be much appreciated.


  • Larry Osterman's WebLog

    Should I check the parameters to my function?


    I just had an interesting discussion with one of the testers in my group.

    He had just finished filing a series of bugs against our components because they weren’t failing when he passed bogus pointers to the API.  Instead, they raised a 0xC0000005 exception and crashed his application.

    The APIs did fail if he passed a null pointer in, with E_POINTER. 

    But he felt that the API should check all the bogus pointers passed in and fail with E_POINTER if the pointer passed in didn’t point to valid memory.

    This has been a subject of a lot of ongoing discussion over the years internally here at Microsoft.  There are two schools of thought:

    School one says “We shouldn’t crash the application on bogus data.  Crashing is bad.  So we should check our parameters and return error codes if they’re bogus”.

    School two says “GIGO – if the app hands us garbage, then big deal if it crashes”.

    I’m firmly in the second camp (not surprisingly, if you know me).  There are a lot of reasons for this.  The biggest one is security.  The way you check for bad pointers on Win32 is by calling the IsBadReadPtr and IsBadWritePtr API.  Michael Howard calls these APIs “CrashMyApplication” and “CorruptMemoryAndCrashMySystem” respectively.  The problem with IsBadReadPtr/IsBadWritePtr is that they do exactly what they’re advertised as doing:  They read and/or write to the memory location specified, with an exception handler wrapped around the read/write.  If an exception is thrown, they fail, if not, they succeed.

    There are two problems with this.  The only thing that IsBadReadPtr/IsBadWritePtr verifies is that at the instant that the API is called, there was valid memory at that location.  There’s nothing to prevent another thread in the application from unmapping the virtual address passed into IsBadReadPtr immediately after the call is made.  Which means that any error checks you made based on the results of this API aren’t valid (this is called out in the documentation for IsBadWritePtr/IsBadReadPtr).

    The other one is worse.  What happens if the memory address passed into IsBadReadPtr is a stack guard page (a guard page is a page kept at the bottom of the stack – when the system top level exception handler sees a fault on a guard page, it will grow the threads stack (up to the threads stack limit))?  Well, the IsBadReadPtr will catch the guard page exception and will handle it (because IsBadReadPtr handles all exceptions).  So the system exception handler doesn’t see the exception.  Which means that when that thread later runs, its stack won’t grow past the current limit.  By calling IsBadReadPtr in your API, you’ve turned an easily identifiable application bug into a really subtle stack overflow bug that may not be encountered for many minutes (or hours) later.

    The other problem with aggressively checking for bad parameters on an API is that what happens if the app doesn’t check the return code from the API?  This means that they could easily have a bug in their code that passes a bogus pointer into IsBadWritePtr, thus corrupting memory.  But, since they didn’t check the return code, they don’t know about their bug.  And, again, much later the heap corruption bug that’s caused by the call to IsBadWritePtr shows up.  If the API had crashed, then they’d find the problem right away.

    Now, having said all this, if you go with school two, you’ve still got a problem – you can’t trust the user’s buffers.  At all.  This means you’ve got to be careful when touching those buffers to ensure that you’re not going to deadlock the process by (for instance holding onto a critical section while writing to the user’s buffer).

    The other thing to keep in mind is that there are some situations where it’s NOT a good idea to crash the user’s app.  For example, if you’re using RPC, then RPC uses structured exception handling to communicate RPC errors back to the application (as opposed to API return codes).  So sometimes you have no choice but to catch the exceptions and return them.  The other case is if someone has written and shipped an existing API that uses IsBadReadPtr to check for bad pointers on input, it may not be possible to remove this because there may be applications that depend on this behavior.

    So in general, it’s a bad idea to use IsBadXxxPtr on your input parameters to check for correctness.  Your users may curse you for crashing their app when they screw up, but in the long term, it’s a better idea.

  • Larry Osterman's WebLog

    Office Decorations


    One of the long standing traditions here at Microsoft is decorating other employee’s offices.

    Over the years, people have come up with some extraordinarily creative ways to trash others offices.  It’s truly awe inspiring how people use their imagination when they want to make mischief.

    One of my all-time favorites was done to one of the Xenix developers for his birthday.

    This particular developer had recently taken up golf.  So the members of his team came in one night, removed all the furniture from the office, and brought in sod to cover the office floor.

    They then cut a hole in the sod for the golf cup, mounted a golf pole (stolen from a nearby golf course, I believe), and put all the office furniture back in the office, making him his own in-office putting green.

    You could smell the sod from one side of the building to the other, it was that strong.

    I don’t want to think of how they ended cleaning it up.


  • Larry Osterman's WebLog

    What assembly language DOES your code generate?


    Pat Niemeyer had a fascinating comment in my article about programmers knowing roughly what assembly language their code generates:

    Your example serves to demonstrate that developers should, under normal circumstances, *not* care about low level code or performance issues... They should just do the "right thing" where that is usually the simplest and most staightforward thing. If developers had followed this rule in Java all along then they woudln't have been bitten by all these stupid optimization rules that ended up being anti-optimizations when the VMs got smarter.

    Trust the compiler people... trust the VM people and then when your code is working trust the profilers to tell you what is actually going on. People's intuition about performance issues is wrong more often thant it's right. You don't know anything unless you profile it - zip, nada, nothing. You may think you're a hotshot Java programmer (we all do) but you're wrong most of what you think about optimization. That's the real rule.

    At some level, Pat’s absolutely right – the VM and compiler people know far better than you do how to optimize your code.  But please note my use of the phrase “Roughly” – You don’t need to know the exact instructions that are generated, actually, as I mentioned above Pat’s comment – in the face of modern optimizing compilers, it’s actually a bad idea to write your code in assembly unless you REALLY know what you’re doing (I’d be willing to trust someone like Mike Abrash or maybe a couple of others to write assembly language code for modern processors, but I certainly wouldn’t attempt to do it myself, even though I spent the first 10 years of my career at Microsoft programming exclusively in x86 assembly language) .

    When I started writing x86 assembly language, the rule of thumb for performance was pretty simple:  As long as you don’t use multiply or divide, the smaller your code, the faster it is.  That was all you needed to know.  With the Pentium, this changed for Intel processors (it had always been the case for RISC processors).  The infamous Appendix H described in full how to write optimal code for the Pentium series processors.

    All of a sudden, it became quite difficult to actually write efficient code.  And for every release of Intel’s processors, it’s become harder and harder to write the fastest code possible in assembly – typically compilers can do a much better job of it than mere mortals.

    But I still stand by my statement – I disagree with Pat, especially if you’re writing systems code.  If you’re introducing several thousand extra instructions inside your POP3 server because you chose to do error processing by throwing exceptions (no, I didn’t do that in the Exchange POP3 server), or if your I/O subsystem is twice as large as it should be because you used macros for critical sections, you in fact be using the most “straightforward thing”.  But unless you know what the performance implications of doing the “most straightforward thing” are, it’s highly likely that doing the “straightforward thing” is the thing that stops your POP3 server from being able to support 50,000 users on a single Exchange server  Or it’ll be the difference between serving 100 million hits a day and 50 million hits a day.


  • Larry Osterman's WebLog

    Why is it FILE_SHARE_READ and FILE_SHARE_WRITE anyway?


    Raymond’s post about FILE_SHARE_* bits reminded me of the story about why the bits are FILE_SHARE_READ in the first place.

    MS-DOS had the very same file sharing semantics as NT does (ok, NT adds FILE_SHARE_DELETE, more on that later).  But on MS-DOS, the file sharing semantics were optional – you had to load in the utility to enable them.  This was because on a single tasking operating system, there was only ever going to be one application running, so the sharing semantics were considered optional.  Unless you were running a file server, in which case Microsoft strongly suggested that you should load the utility.

    On MS-DOS, the sharing mode was controlled by the three “sharing mode” bits.  They legal values for “sharing mode” were:

                000 – Compatibility mode. Any process can open the file any number of times with this mode.  It fails if the file’s opened in any other sharing mode.
                001 – Deny All.  Fails if the file has been opened in compatibility mode or for read or write access, even if by the current process
                010 – Deny Write.  Fails if the file has been opened in compatibility mode or for write access by any other process
                011 – Deny Read – Fails if the file has been opened in compatibility mode or for read access by any other process.
                100 – Deny None – Fails if the file has been opened in compatibility mode by any other process.

    Coupled with the “sharing mode” bits is the four “access code” bits.  There were only three values defined for them, Read, Write, and Both (Read/Write).

    The original designers of the Win32 API set (in particular, the designer of the I/O subsystem) took one look at these permissions and threw up his hands in disgust.  In his opinion, there are two huge problems with these definitions:

    1)                  Because the sharing bits are defined as negatives, it’s extremely hard to understand what’s going to be allowed or denied.  If you open a file for write access in deny read mode, what happens?  What about deny write mode – Does it allow reading or not?

    2)                  Because the default is “compatibility” mode, it means that by default most applications can’t ensure the integrity of their data.  Instead of your data being secure by default, you need to take special actions to guarantee that nobody else messes with the data.

    So the I/O subsystem designer proposed that we invert the semantics of the sharing mode bits.  Instead of the sharing rights denying access, they GRANT access.  Instead of the default access mask being to allow access, the default is to deny access.  An application needs to explicitly decide that it wants to let others see its data while it’s manipulating the data.

    This inversion neatly solves a huge set of problems that existed while running multiple MS-DOS applications – if one application was running; another application could corrupt the data underneath the first application.

    We can easily explain FILE_SHARE_READ and FILE_SHARE_WRITE as being cleaner and safer versions of the DOS sharing functionality.  But what about FILE_SHARE_DELETE?  Where on earth did that access right come from?  Well, it was added for Posix compatibility.  Under the Posix subsystem, like on *nix, a file can be unlinked when it’s still opened.  In addition, when you rename a file on NT, the rename operation opens the source file for delete access (a rename operation, after all is a creation of a new file in the target directory and a deletion of the source file).

    But DOS applications don’t expect that files can be deleted (or renamed) out from under them, so we needed to have a mechanism in place to prevent the system from deleting (or renaming) files if the application cares about them.  So that’s where the FILE_SHARE_DELETE access right comes from – it’s a flag that says to the system “It’s ok for someone else to rename this file while it’s running”. 

    The NT loader takes advantage of this – when it opens DLL’s or programs for execution, it specifies FILE_SHARE_DELETE.  That means that you can rename the executable of a currently running application (or DLL).  This can come in handy when you want to drop in a new copy of a DLL that’s being used by a running application.  I do this all the time when working on winmm.dll.  Sine winmm.dll’s used by lots of processes in the system, including some that can’t be stopped, I can’t stop all the processes that reference the DLL, so instead, when I need to test a new copy of winmm, I rename winmm.dll to winmm.old, copy in a new copy of winmm.dll and reboot the machine.


  • Larry Osterman's WebLog

    Things you shouldn't do, part 2: Dll's can't ever call CoInitialize* on the application's thread.


    When a Dll’s executing code on an application’s behalf, the Dll can NEVER call CoInitalizeEx on the application’s thread.

    Why?  Because you can’t know the application’s threading model, so you can’t get it right.  If the app’s initialized COM in a single-threaded apartment and you attempt to put the thread into the multi-threaded apartment, the CoInitializeEx call will fail.  Similarly, if the app’s called CoInitializeEx put the thread in the MTA, you can’t reinitialize in the STA.  You could add code to allow the RPC_E_CHANGED_MODE error return that CoInitializeEx returns, but there are sometimes COM objects that require a particular threading model.

    This is especially true if your DLL allocates some object that contains pointers to other COM objects and returns a pointer to that object. A good example of an API that uses this pattern is CryptAcquireContext – I’m not saying that it uses COM, it probably doesn’t, but it COULD. 

    An API written using the XxxAcquireContext/XxxReleaseContext design pattern could be tempted to call CoInitializeEx() in the XxxAcquireContext routine and call CoUnitialize in the XxxReleaseContext routine.  And that would be utterly wrong.  The problem is that in this case, if you initialized COM in the MTA during the XxxAcquireContext routine, and then the application attempted to create an STA for the thread, the app’s call will fail.  And the app is entirely likely to be quite unhappy with its call to CoInitialize failing. 

    Even if the application ignored the error, then the application would potentially get callbacks on other threads, which is likely to break the application.  So you say “Ok, I can deal with this, I’ll just initialize myself in an STA.  But then if the app attempted to put the thread in the MTA, you’ve once again messed up the application. 

    You just can’t win.

    In addition, this rule has some interesting corollaries:  You either need to rely on the application to call CoInitialize for you, and add this to your documentation, or you need to do all your COM interactions on a separate worker thread.  If you’re dealing with a legacy component however, you have no choice but to move your COM interactions to a separate worker thread.  I first ran into this issue when I made some modifications to winmm.dll to add support that required interacting with a COM component.  The good news is that it’s pretty easy to set up a worker thread to do the work – just create the thread and use PostQueuedCompletionStatus to post work items to the queue.


  • Larry Osterman's WebLog

    What's wrong with this code, part 2 (the answers)


    So here are the intentional bugs in the “what’s wrong with this code, take two” post:

    The first two bugs are quite straightforward, but insidious.  I can’t think of the number of times I’ve seen code that gets this wrong. 

    Reading (or writing) data from (to) a socket is like putting your mouth on the end of hose and turning on the faucet.  Water flows out until it’s turned off.  But there’s a valve in between you and the faucet, and someone else controls that valve.  It’s possible that conditions can cause the data received to be shorter than the amount that was sent (or is shorter than the receive buffer).

    Given that, the first two bugs are straightforward:  This code:

           //     Receive the byte count from the socket.
           cbReceived = recv(socket, (char *)&cbMessage, sizeof(short), 0);
           if (cbReceived == SOCKET_ERROR)
                  return WSAGetLastError();

    Should really be written like:

           char lengthBuffer[sizeof(short)];
           cbReceived = 0;
                  int cbThisReceive = recv(socket, &lengthBuffer[cbReceived], sizeof(short) - cbReceived, 0);
                  if (cbThisReceive == SOCKET_ERROR)
                         return WSAGetLastError();
                         cbReceived += cbThisReceive;
           } while (cbReceived != sizeof(short));
           cbMessage = *((short *)lengthBuffer);

    Similarly, the code that receives the actual buffer needs to be coded to deal with data received being too short for the input buffer (I’m not going to show the corrected code, it’ll make this post too long).

    The third bug’s a spec bug.  Nowhere in the specification does it say what byte order is to be used for the length of the message.  So if you compile the server for this code on a big endian machine, and then run the client on a little-endian machine, you’ll have a problem.

    And the fourth problem’s the big one.  This would result in a security hotfix.

    The problem is:

           //     Allocate a buffer to hold the network message.
           pnetmsg = (PNETMSG)new BYTE[cbMessage + FIELD_OFFSET(NETMSG, Message)];
           if (pnetmsg == NULL)
    The cbMessage  value isn’t checked, and when we add FIELD_OFFSET(NETMSG, Message) to cbReceived, you have an integer overflow security vulnerability.  Jerry Pisk says that it shouldn’t overflow, but if the value sent by the server causes cbMessage to be negative, then it will sign extend the value to a negative 32 bit integer and…

    Credits:  B.Y. and Jerry Pisk, and Mike for being the first to pick up on the two receive bugs, B.Y. and Jerry for the cbReceived security hole.  BJ Herrington (‘d0x) for finding the spec problem

    Now for the unintentional bugs. 

    B.Y. pointed out that recv() can return 0 if the client gracefully tore down the connection.  The code correctly handles that, but unnecessarily calls recv() a second time.  He also pointed out that there are two code paths that don’t free the buffer.

    Jerry Pisk pointed out that the allocation above should be:

           pnetmsg = (PNETMSG)new BYTE[cbMessage + FIELD_OFFSET(NETMSG, Message)];
           if (pnetmsg == NULL)

    The result is the same (security hole) but the effect is different (and easier to exploit).  The good news is that Jerry’s bug is likely to be caught by tools like PageHeap or appverifier.

     AT also pointed another documentation error – the routine isn’t specified as working only on TCP streams, if you tried to use it on a UDP socket, it would fail.

    There were also several other very valid points made during the discussion, thanks all J

    I’ll probably have another of these in the future, they’re fascinating.


  • Larry Osterman's WebLog

    What's wrong with this code, take two...


    Well, my last “What’s wrong with this code” post was such a rollicking good success, I figured I’d do it one more time (I’m SUCH a glutton for punishment).

    This time, I not only bench checked the code, but I verified that it actually worked (which is more than I can say for the last one) J.

    Basically this is a routine that receives a “message” from the network.  The “message” is to be time stamped and sequenced on arrival (this allows the message to be posted to worker threads while preserving ordering semantics).  A “message” consists of a short containing the length of the message and a buffer containing the actual contents of the message. 

    Just for reference, there are four intentional bugs in this code.  Tomorrow, I’ll post what the bugs are.       

    Assume that there is only one thread calling ReceiveSocketMessage on a particular socket at any given time – in other words, the logic involving the sequence numbers and/or timestamps isn’t the source of the bugs J.

     *     Network Message – Holds a message received from the network.
    typedef struct _NetworkMessage
           long                 MessageSequence;
           FILETIME             MessageTime;
           unsigned short       MessageSize;
           char                 Message[ANYSIZE_ARRAY];
    long GlobalMessageSequence = 0;   // Global variable containing the current sequence count.
     * ReceiveSocketMessage
     *     Receives a message from a socket.
     *     A “message” consists of a short message length and a buffer containing the actual message. 
     * Inputs:
     *     socket – Opened socket on which to receive the message.
     *     ppnetmsg – Output pointer that will hold a newly allocated PNETMSG structure with the MessageSequence, MessageTime, MessageSize and Message fields filled in.
     * Returns:
     *      Winsock result code if error, 0 if success.
    int ReceiveSocketMessage(SOCKET socket, PNETMSG *ppnetmsg)
           int cbReceived = SOCKET_ERROR;
           short cbMessage;
           PNETMSG pnetmsg;
           assert(ppnetmsg != NULL);
           //     Receive the byte count from the socket.
           cbReceived = recv(socket, (char *)&cbMessage, sizeof(short), 0);
           if (cbReceived == SOCKET_ERROR)
                  return WSAGetLastError();
           //     Allocate a buffer to hold the network message.
           pnetmsg = (PNETMSG)new BYTE[cbReceived + FIELD_OFFSET(NETMSG, Message)];
           if (pnetmsg == NULL)
                  // Couldn't allocate the buffer for the socket.
                  return WSAENOBUFS;
           //     Fill in the static header values.
           pnetmsg->MessageSequence = InterlockedIncrement(&GlobalMessageSequence);
           pnetmsg->MessageSize = cbMessage;
           //     Receive the actual buffer from the server.
           cbReceived = recv(socket, (char *)&pnetmsg->Message, pnetmsg->MessageSize, 0);
           if (cbReceived == SOCKET_ERROR)
                  return WSAGetLastError();
           //     If we didn't get the amount requested, return an error.
           if (cbReceived != cbMessage)
                  return WSAEMSGSIZE;
           //     Return success.
           *ppnetmsg = pnetmsg;
           return 0;

  • Larry Osterman's WebLog

    Things not to do when you run a combined build and BVT lab...


    First, a caveat: I have nothing but respect for the people who run build and BVT labs.  It’s a thankless task that requires obscene amounts of time and energy.

    Now on with the story J

    My favorite example of things that you shouldn’t do in a build/BVT lab:  Page a developer in on a weekend to look at a BVT failure when the developer's got an 18 month old son and his wife's off enjoying a "Mom's Day Out".

    I was watching Daniel one weekend when he was 18 months old (back in 1993ish) when I got a call from the build lab to come in to look at a BVT failure.  In those days, the BVT lab and the build lab were in a single lab – the BVT machines on one side of a partition, and the build machines on the other side.  The BVT and build labs were filled with lab shelving – a desk-like bench that held the monitors and a shelf above that held computers.  All the machines were connected to power strips (and network hubs) connected to power supplies run through poles that fed into the ceiling.  A typical lab setup here in Redmond (it still is).  The builds ran all day, so the build team had people in the lab basically 24/7.

    I told them that I was watching Daniel, so it would be really inconvenient for me to come in, but they indicated that they couldn’t find anyone else on the team to look at the problem.  And what the heck, they’d met Daniel before, they knew he was a good kid, so what the heck, they’d keep an eye on him while I looked at the dead machine.

    So Daniel and I came into work to look at the problem.  The build guys took Daniel to their side of the room, and started playing with him (as much as you can “play” with an 18 month old) and I started working on the machine that had failed BVTs.

    Things went just fine until all of a sudden, I heard an exclamation coming from the build lab side of the wall:

    Kyle: “Hey, what the heck?”

    Orson: “What just happened?  My machine just went dead.”

    Kyle: “Wait, the MIPS machine just went dead too”.

    I felt that familiar feeling you get when you realized that your kid’s probably just gotten into trouble and came running.

    Yup.  The build guys had decided that Daniel was safe exploring the area behind the lab shelving – looking at all the cool wires, stuff like that.

    They had forgotten about all the really pretty red lights attached to the power strips that were supplying power to the build machines.

    You see, Daniel’s a really curious kid.  It didn’t take him very long to realize that when you pushed the little red light, it turned off.  And if you pushed it on the other side, it went back on.  So he turned it into a game and went down the entire rack of build machines flipping the little light on every one of the power strips.

    And every build machine quietly turned off, one by one.

    The good news is that the build guys took it in stride, the better news was that since we were running on our own Dogfood, while the builds that were running had stopped, the disks weren’t corrupted – we had to run chkdsk on one or two of them, but the builds restarted without a hitch.

    But they never did leave Daniel alone in the build lab again.


  • Larry Osterman's WebLog

    Longhorn system requirements


    Mary Jo’s comments about the system requirements for longhorn have been stirring up quite the discussion about the “average” Longhorn machine would be.  Robert picked this up with some insightful commentary on it as well.  And I figured I’d toss in my own two cents.

    First off, Robert’s totally right about the developer machines here.  Longhorn’s being developed for the most part on 3gHz or lesser machines – my test machine’s a year old hand-me-down 2gHz P4 (no HT) with 700M of RAM.  Longhorn works just fine on it.

    And it’s important to realize that we’ve barely started performance tuning on Longhorn.  There are WAY too many processes running on the system, and far too many things going on behind the scenes that suck down performance.  When we’re done, things will look very different to the way they look today.

    Now, on a vaguely related historical note:

    One of the reasons that Windows 3.x was so successful is directly related to the development lead for Windows at the time.  The dev lead, Phil Barrett, insisted that the dev team for Windows 3.x run on the machines that the Windows team’s customers were running.  At the time, the cutting edge machines were 33 MHz 486 boxes, the doublespeed machines (66MHz machines) were just hitting the market.  Many Microsoft developers had these machines (we tend to get new development machines every two years or so).

    But Phil REQUIRED that his developers run on 386/20’s.  With 1M or 2M of RAM.  At the same time, most developers were running 486/33 boxes with 4 or 8M of RAM.  But, because the expected customers of Windows 3.x were running were 386/20’s, that’s just what the Windows development team ran.  His logic was that if the developers were forced to run the same machines that their customers ran on, then they’d make darned sure that the system ran just fine – it forced them to feel their customers pain and ensured that they’d make the system as small and as fast as possible.

    In many ways, we’re still doing this today in Longhorn, but instead of looking at the machines our customers are running today, we’re running on the machines that the vast majority of our customers will be running 18 months from now.   Today, without even trying, if I go to, I can find a Dell Dimension 2400, with a 2.6gHz P4 processor, 40G hard disk, a 48x CDROM and a 48X CDRW drive, and 128M of RAM for $599.  That’s almost comparable to my test system (I’ve got a bit more disk space and RAM). 


    18 months from now, what will be the $500 computer?


  • Larry Osterman's WebLog

    So what WILL the postal service accept?


    Valorie somehow found this yesterday and forwarded it to me.

    Someone over at AIR decided to see what exactly the post office would accept for delivery.  The results, to say the least were “fascinating”.

    The post office delivered a deer tibia without a fuss.

    They also delivered a fresh green coconut.

    When a brick was sent, it was received.  But only after the brick had been pulverized by the DEA.

    Anyway, it’s a fun read.


  • Larry Osterman's WebLog

    Mea Culpa (it's corrections time).


    One of the rules in Tim Bray’s version of Sun’s blogging policy is “Write What You Know”.

    Well, I should have listened to this when I posted my 3rd post, “So why does NT require such a wonking great big paging file on my machine”.  I’ve since been gently corrected by the guys on the memory management team who really DO know how this stuff works, and there are a couple of really important mistakes in my post that need to be corrected.

    The biggest one is in the title itself.  NT doesn’t require a wonking great big paging file.  In fact, NT works with a paging file that’s too small, or even without a paging file at all.

    But, when you run without a paging file, you run certain risks. 

    One of the things that the system keeps track of is the “system wide commit limit”.  It’s composed of the system RAM plus the aggregate size of all the pagefiles on the system.  If there’s no pagefile, then the commit limit is lower.  When you allocate memory that requires commitment (some examples would be if you allocate memory with the MEM_COMMIT flag, or you create shared memory backed by the paging file), the memory is “charged” to the commitment limit.  If there’s not enough room in the commit limit, then the memory allocation will fail.  So your app won’t run until you either free up other memory (by shutting down running applications) or you increase the commit limit by increasing the pagefile space (or adding more RAM).

    The other risk of running with too small a paging file occurs when NT attempts to create a dump file in the event of a bluescreen.  If the paging file isn’t big enough to hold physical memory, then the dump file isn’t saved, and an opportunity to improve the system is lost.  You could specify a minidump or a kernel-only dump, but won’t necessarily have all the information needed to debug the problem.

    When NT runs setup, it chooses a default paging file size based on 1.5x memory (for certain sizes of RAM, the actual size chosen will be different, but this covers most of the case).  This is a guess, but it’s a pretty good guess, especially since the cost of NOT getting it right is likely a call to PSS.

    One of the people commenting on my blog asked: “Is there any reason to set the paging file greater than 4095 (ie. the maximum addressable memory space)?  If so, why.”  The answer to this is actually pretty straightforward: If the memory load on your system is more than 4G+physical RAM, then you will need to have a bigger paging file.  If you’ve got 20 copies of autocad running on your machine, and each of them is sucking down 400M of memory, then you’re going to need 8G of paging file space to hold all those pages.

    Btw, one of the tidbits that came out of my discussion with the MM guys was the maximum paging file size on a system:

    ·         On a regular x86 machine, you can have 16 paging files, each is 4G in size, for a total of 64G.

    ·         On an x86 running in PAE mode, you can have 16 paging files, but each can be 16tb in size, for a total of 256tb of paging file space.


  • Larry Osterman's WebLog

    Remember the Giblets!


    I was planning on writing something else today, but Raymond’s post about a buffer overflow in the LHA libraries convinced me to write this up.

    Yesterday afternoon, I spent a really quite enjoyable 4 hours sitting in listening to Michael Howard give a refresher course in security to about a thousand product group developers.

    Most, if not all of the people attending had been through the security training before, during the first windows security push, this was the first of the annual security refresher courses we’re required to take.  Yes, that’s right, required.  Microsoft’s REALLY serious about security these days, and as a part of that, every developer, tester and program manager at Microsoft (regardless of division) is going to be required to attend this training yearly.  Security training is even a part of new employee orientation.

    Fortunately, Michael’s a truly entertaining speaker – he has a knack for telling war stories, and his comments about code are remarkably insightful.  If you haven’t bought (and read) his book (here, here, or here), you should do so immediately.  I cannot stress how important it is.

    Anyway, this new security training session brought up some things I had previously considered (see the discussion here about the zlib library) but hadn’t formalized.  Fortunately, Michael framed it brilliantly: Remember the “Giblets”.  “Giblets” are the pieces of software that you include in your product that you don’t always remember.  Like zlib, or LHA, or MSXML, or the C runtime library

    Whenever you ship code, you need to consider what your response strategy is when a security hole occurs in your giblets.  Do you even have a strategy?  Are you monitoring all the security mailing lists (bugtraq, ntbugtraq) daily?  Are you signed up for security announcements from the creator of your giblets?  Are you prepared to offer a security update for your product when a problem is found in one of your giblets?  How do your customers know what giblets your application includes?

    Have you checked YOUR Giblets lately?

  • Larry Osterman's WebLog

    Measure Twice, Optimize Once


    When I move offices, it takes 16 moving boxes to hold my junk (I‘ve got a lot of it).  One reason is because of all the books I’ve collected over the years.

    Many of them were read once and discarded, others are treasures I come back to time and time again.  One of the ones I come back to regularly (at least once a year) is Jon L. Bentley’s “Writing Efficient Programs”, a book that is sadly out of print.

    In that book, I have a bookmark (actually it’s a ripped off piece of tissue) that’s been there for at least 10 years now because I continually refer back to the story.  As Jon tells it:

    Victor Vyssotsky enhanced a FORTRAN compiler in the early 1960s under the design constraint that compilation time could not be noticeably slower. A particular routine in his program was executed rarely (he estimated during design that it would be called in about one percent of the compilations, and just once in each of these) but was very slow, so Vyssotsky spent a week squeezing every last unneeded cycle out of the routine. The modified compiler was fast enough. After two years of extensive use the compiler reported an internal error during compilation of a program. When Vyssotsky inspected the code he found that the error occurred in the prologue of the "critical" routine, and that the routine had contained this bug for its entire production life. This implied that the routine had never been called during more than 100,000 compilations, so the week Vyssotsky put into prematurely optimizing it was completely wasted.

    I don’t know how many times I’ve opened the book up and referred to this story when talking to co-workers about performance.

    As an example, when we were merging the Exchange Server and MCIS products, we had a bunch of architectural review discussions where each of the two groups described their internal architecture to the other.  During the one where we described the Exchange 5.5 POP3 server, I described how it rendered the user’s email message to a temporary temporary file, then used the TransmitFile API to send it to the remote client.

    The dev lead for MCIS expressed disbelief at the design: “What on earth are you guys doing?  You’re making dozens of system calls to write the file to the disk, just to be able to call TransmitFile!  There’s no way that that’s going to be fast enough for production systems.”  Instead, he suggested (rather strongly) that we render the file to user mode memory, then convert the server to use non blocking sockets and keep the kernel socket buffer as full as possible (without blocking the user mode threads).

    To humor him, we made the changes he suggested, and then ran some of our performance tests on the new bits.  And what did you know, it didn’t make a bit of difference in our actual throughput (it may have been slightly slower actually, I’m not 100% sure at this point).

    So of course we backed out all those changes and went back to our old mechanism.

    The fundamental problem that the MCIS dev lead made in this case was to assume that the number of system calls made by the application was an accurate measurement of the throughput of the application.  He misunderstood a couple of things: First, he was operating under the mistaken belief that system calls (and thus ring transitions) on NT take a significant amount of time. Second, he failed to realize that there was significant synergy between temporary temporary files and the TransmitFile API – because the file data was never written to disk, the TransmitFile API was able to read the data to send on the socket directly from the filesystem cache.

    Bottom line: Understand where your bottlenecks are before you begin to optimize.  And make sure you understand the underlying performance characteristics of your system BEFORE you start to optimize.

    The carpenters have it right: Measure Twice, Cut Once.  In our case, it’s more like: Measure Twice, Optimize Once; Wash, Rinse, Repeat.


  • Larry Osterman's WebLog

    Units of measurement


     Whenever people find out that I work at Microsoft, invariably the next question they ask is “Have you met Bill?” (The next question is: “So what’s with the stock?” – as if I had a magic 8-ball to tell them).

    I’ve met Bill socially a couple of times (at various company functions); he doesn’t know who I am thoughJ.  But there was one memorable meeting I attended with him.

    It was back in 1986ish; we were presenting the plans for Lan Manager 1.0 to him.  One portion of the meeting was about my component, DOS Lan Manager (basically an enhanced version of the MS-NET redirector, with support for a fair number of the Lan Manager APIs on the client).  My boss and I were given the job of presenting the data for that portion.

    One of the slides (not Powerpoint, it didn’t exist at the time – Lucite slides on an overhead projector) we had covered the memory footprint of the DOS Lan Manager redirector.

    For DOS LM 1.0, the redirector took up 64K of RAM.

    And Bill went ballistic.

    “What do you mean 64K?  When we wrote BASIC, it only took up 8K of RAM.  What the f*k do you think idiots think you’re doing?  Is this thing REALLY 8 F*ing BASIC’s?”

    The only answer we could give him was “Yes”J.

    To this day, I sometimes wonder if he complains that Windows XP is “16,000 F*ing BASIC’s”.

    Edit: To add what we finally did with DOS Lan Manager's memory footprint. 

    We didn't ignore Bill's comment, btw.  We worked on reducing the footprint of the DOS redirector by first moving the data into LIM Expanded memory, next by moving the code into expanded memory.  For LAN Manager 2.1, we finally managed to reduce the below 640K footprint of the DOS redirector to 128 bytes.  It took a lot of work, and some truely clever programming, but it did work.


  • Larry Osterman's WebLog

    How much is too much?


    There was a fascinating discussion on one of our internal mailing lists earlier today, towards the tail end of it, someone asked what the memory management guys would do when machines started coming with 8TB of memory.

    5 minutes later came the following response:

    There are already TPCC benchmarks being done with 1TB of RAM.

    I bet we will run out of 8TB for giant machines sometime in 2008.5, if Moore’s Law applies to memory capacity (double once every 18 months).


    I find the concept of a machine with more than 8G of RAM to be mind boggling.  And this guy's saying that big machines will be running with 8TB of of RAM in four years.  That's 8,000,000,000,000 bytes of RAM (give or take a couple of gigabytes).  8,000 GIGABYTES of RAM.

    We truely do live in remarkable times.


  • Larry Osterman's WebLog

    Choosing a C runtime library


    Yesterday a developer in my group came by asking about a failure he saw when running the application verifier on his component.  The app verifier was reporting that he was using a HEAP_NO_SERIALIZE heap from a thread other than the one that created the heap.

    I looked a bit deeper and realized that he was running with the single threaded statically linked C runtime library.  An honest mistake, given that it’s the default version of the C runtime library.

    You see, there are 3 different versions of the C runtime library shipped (and 3 different versions of the ATL and MFC libraries too). 

    The first is the statically linked single-threaded library.  This one can be used only on single threaded applications, and all the object code for the C runtime library functions used is included in the application binary.  You get this with the /ML compiler switch.

    The second is the statically linked, multi-threaded library.  This one’s the same as the first, but you can use it in a multithreaded application.  You get this one with the /MT compiler switch.

    The third is the dynamically linked library.  This one keeps all the C runtime library code in a separate DLL (MSVCRTxx.DLL).  Since the runtime library code’s in a DLL, it also handles multi-threaded issues.   The DLL library is enabled with the /MD switch.

    But I’ve been wondering.  Why on earth would anyone ever choose any option OTHER than multi-threaded DLL version of the runtime library?

    There are LOTS of reasons for always using the multithreaded DLL:

    1)      Your application is smaller because it doesn’t have the C runtime library loaded into it.

    2)      Because of #1, your application will load faster.  The C runtime library is almost certainly in memory, so the pages containing the library don’t have to be read from disk.

    3)      Using the multithreaded DLL future-proofs your application.  If you ever add a second thread to your application (or call into an API that creates multiple threads), you don’t have to remember to change your C runtime library.  And unless you’re running the app verifier regularly, the only way you’ll find out about the problem is if you get a heap corruption (if you’re lucky).

    4)      If your application has multiple DLL’s, then you need to be VERY careful about allocation – each DLL will have its own C runtime library heap, as will the application.  If you allocate a block in one DLL, you must free it in the same DLL.

    5)      If a security bug is ever found in the C runtime library, you don’t have to release an update to your app.

    The last one’s probably the most important IMHO.  Just to be clear - There haven’t been any security holes found in the C runtime library.  But it could happen.  And when it happens, it’s pretty ugly.  A really good example of this can be seen with the security vulnerability that was found in the zlib compression library. This library was shipped in dozens of products, and every single one of them had to be updated.  If you do a google search for “zlib library security vulnerability” you can see some of the chaos that resulted from this disclosure.  If your app used the DLL C runtime library, then you’d get the security fix for free from windows update when Microsoft posted the update.

    The only arguments I’ve been able to come up with for using the static C runtime libraries are:

    1)      I don’t have to distribute two binaries with my application – If I use the DLL, I need to redistribute the DLL.  This makes my application setup more complicated.

    Yes, but not significantly (IMHO).  This page lists the redistribution info for the C runtime library and other components.

    2)      If I statically link to the C runtime library, I avoid DLL hell.

    This is a red herring IMHO.  Ever since VC6, the C runtime library has been tightly versioned, as long as your installer follows the rules for version checking of redistributable files (found here) you should be ok.

    3)      My code is faster since the C runtime library doesn’t have to do all that heap synchronization stuff.

    Is it really?  How much checking is involved in the multithreaded library?  Let’s see.  The multithreaded library puts some stuff that was kept in global variable in thread local storage.  So there’s an extra memory indirection involved on routines like strtok etc.  Also, the single threaded library creates it’s heap with HEAP_NO_SERIALIZE (that’s what led to this entire post J).  But that just wraps the heap access with an EnterCriticalSection/ExitCriticalSection.  Which is very very fast if there’s no contention.  And since this is a single threaded application, by definition there’s no contention for the critical section.

    Using the multithreaded DLL C runtime library is especially important for systems programmers.  First off, if your system component is a DLL, it’s pretty safe to assume that you’ll be called from multiple threads, so at an absolute minimum, you’re going to want to use the multithreaded static C runtime library.  And if you’re using the multithreaded static C runtime library, why NOT use the DLL version?

    If you’re not writing a DLL, then it’s highly likely that your app does (or will) use multiple threads.  Which brings me back to the previous comment – why NOT use the DLL version? 

    You’re app will be smaller, more secure, future-proof, and no slower than if you don’t.


Page 31 of 33 (815 items) «2930313233