Rob Earhart's WebLog

  • Applied Software Project Management

    So, I recently read Applied Software Project Management, by Andrew Stellman and Jennifer Greene.  I absolutely loved it; here's the review I wrote for Amazon...


    (Disclosure: I went to college with one of the authors.)

    I like that it doesn't evangelize. The authors simply point out that software project management isn't obvious, that well-intentioned people often make the same kinds of mistakes, and that they're preventable with a little knowledge and foresight.

    I've seen a lot of software projects. I've seen them succeed, I've seen them fail, and I've seen them produce the wrong products (not actually doing what the customer wants). Almost all took far longer than expected.

    Reading the book, I was consistently impressed by how well the authors identified almost all of the problems I've noticed (as well as a number of problems I haven't), explored why they occur, and provided straightforward solutions to those problems. I've sometimes seen things go wrong, but not really known how to fix them; this book's helped me understand what's happened and how to help things go right in the future.

    I wish I'd read this book ten years ago. (Or to be more precise, I wish the project managers for whom I've worked had read this book ten years ago.) I can't recommend it strongly enough; if you manage software projects, and you get the sense that things could maybe go better, read this book.

  • Hypervisor

    So, after my last post way-back-when, I got busy fixing bugs for a while.  Then I decided I wanted to do something other than fix bugs, and switched over to the Hypervisor team.

    So for those who missed the announcement a few months ago: the hypervisor's an OS which happens to look like ordinary hardware to the operating systems which run on it.  So you can run multiple copies of Windows on it (or Linux, or DOS, or anything else which runs on x86/x64), bring them up and down individually, and everything just works.

    Sounds a lot like Microsoft's existing Virtual Server product, eh?  It's basically the same idea, with a much better architecture which should let us run guests significantly more efficiently.

    On the personal angle: we're getting to write an OS, basically from scratch.  It's amazingly fun; there've been a lot of interesting design problems to solve, and the team as a whole has really bought into taking the steps necessary to produce a very high-quality product (it's always a plus when you think your project's really going to succeed).  It's been good.

  • Threadpool

    So I've checked in the new Longhorn threadpool (the unmanaged one).

    If you use the threadpool, I think you'll enjoy it; there are a number of API features which should make it very easy to write correct code which uses it, particularly in the area of cleanup and dll unload synchronization, which the old QueueUserWorkItem() interface was sorely lacking in.  Performance and scalability should be a little better, too.

    So now it's back to fixing bugs, at least for a little while; there are a lot of edge cases in ntdll.dll and kernel32.dll which never really were thought through properly.  I spent a bit of time this past fall plowing through these; this is a good time to do some more before the next big project.

    I wish I had time before the kernel team stops Longhorn work to do some serious work in the PE loader code; there are a number of performance problems with having one big loader lock per process, so making that a bit more granular would be great.  It'd also be nice to untangle it a bit; currently there're callouts to various components sprinkled throughout the code.  Maybe once Longhorn ships...

  • Hungarian

    (In this post, I demonstrate just how young I am; I'm sure this stuff's been hashed out many times over the years...)

    So, I like Hungarian notation.  This puts me at odds with most of my colleagues, who intensely dislike it.  The few times it's come up, though, I've asked them, "Have you read anything by Charles Simonyi on the subject, or are you just reading winbase.h and thinking it looks like useless junk?"  Every time, they've only read winbase.h.  And in winbase.h, it is mostly useless junk: lots of identifiers such as dwFoo and pBuffer, in which the little sprinkling of Hungarian-ish syntax does absolutely nothing to help the reader understand the code.

    I don't use Hungarian very much, though.

    For one thing, in ntdll.dll and in the kernel, that's not the naming convention, and I like to follow the existing style so closely that it's hard to tell from a stylistic standpoint where I've made modifications.

    For another... despite the aesthetic appeal, the identifiers can be difficult to read.  I once wrote some code which was grovelling through a PE's export table, and I wound up with names like rgiipfnNameOrdinals, rgipaszNames, and rgipfnFunctions.  They just don't trip lightly off the fingertips, and they're hard to read, even though they're very easy to break down and understand.

    (rgipaszNames, for instance, is an array of image pointers (which need to be translated through the section table to get the offsets to add to the image base to get the actual pointer within the data mapping), and those pointers point to the first characters of zero-terminated strings.  It's not explicit from the names, but the array index can then be used as an offset into rgiipfnNameOrdinals to obtain the index of an image pointer to a function within the image.  That index can be used with rgipfnFunctions to get the image pointer to the corresponding function, and that can be translated to the function.  Clear, eh?  It could be more explicit by creating a type for the name/ordinal array index, maybe "nai" or something, which would then make the identifiers things like mpnaiipaszNames and mpnaiiipfnNameOrdinals, but that doesn't really improve readability; at that point, one would probably want to start inventing more basic types for these arrays, giving the reader more types to memorize in order to understand the code...)
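    To make that chain of lookups a little more concrete, here's a rough sketch in C which keeps the names above.  It's a model under stated assumptions, not loader code: the SECTION and EXPORTS structs, PvFromIp, and IpfnLookup are hypothetical simplifications of the real PE structures (IMAGE_SECTION_HEADER and IMAGE_EXPORT_DIRECTORY in winnt.h), and there's no bounds checking on the name strings themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified section header. */
typedef struct {
    uint32_t ipFirst;   /* first image pointer (RVA) in the section */
    uint32_t cbSize;    /* size of the section in bytes */
    uint32_t ibRaw;     /* where the section starts in the data mapping */
} SECTION;

/* Hypothetical, simplified view of the export directory's three arrays. */
typedef struct {
    const uint32_t *rgipaszNames;        /* image pointers to names */
    const uint16_t *rgiipfnNameOrdinals; /* indices into rgipfnFunctions */
    const uint32_t *rgipfnFunctions;     /* image pointers to functions */
    size_t cNames;
    size_t cFunctions;
} EXPORTS;

/* Translate an image pointer through the section table into a pointer
   within the data mapping; NULL if no section covers it. */
static const void *PvFromIp(const uint8_t *pbMap, const SECTION *rgsec,
                            size_t csec, uint32_t ip)
{
    for (size_t isec = 0; isec < csec; isec++) {
        const SECTION *psec = &rgsec[isec];
        if (ip >= psec->ipFirst && ip - psec->ipFirst < psec->cbSize)
            return pbMap + psec->ibRaw + (ip - psec->ipFirst);
    }
    return NULL;
}

/* Find an export by name and return its function's image pointer,
   or 0 if it isn't found or the tables are inconsistent. */
static uint32_t IpfnLookup(const uint8_t *pbMap, const SECTION *rgsec,
                           size_t csec, const EXPORTS *pexp,
                           const char *szWanted)
{
    assert(pbMap != NULL && pexp != NULL && szWanted != NULL);
    for (size_t i = 0; i < pexp->cNames; i++) {
        const char *sz = PvFromIp(pbMap, rgsec, csec, pexp->rgipaszNames[i]);
        if (sz == NULL || strcmp(sz, szWanted) != 0)
            continue;
        /* The name's index selects an entry in the ordinal array, and
           that entry is the index into the function array. */
        uint16_t iipfn = pexp->rgiipfnNameOrdinals[i];
        return iipfn < pexp->cFunctions ? pexp->rgipfnFunctions[iipfn] : 0;
    }
    return 0;
}
```

    The real name table is sorted, so real code could binary-search it; the linear scan just keeps the sketch short.  The result is still an image pointer, which would go through PvFromIp once more to get at the function itself.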


    So last weekend, my wife mentioned a study in which someone noticed that people can read words pretty well even when the letters are scrambled, as long as the first and last letters are unaltered, and you have some context to go on--so, you suohld be albe to raed tihs ptrety eislay.

    I think this explains a lot about Hungarian, where the exact ordering of the letters matters quite a bit, and the names don't match pre-existing words in the reader's brain.  The reader has to actually slow down and pay attention to all of the letters; the mental mechanism most people seem to have, which is capable of decoding scrambled words given just a few clues, doesn't work.  So the reader's reading speed goes down, and reading comprehension probably drops as well.

    For all of its plusses, Hungarian doesn't match the way people think.

    (And then there's the whole lack of validation problem, so nothing catches bad Hungarian, and people think dwFoo is a good Hungarian identifier for something which just happens to be implemented as a DWORD, and think Hungarian must be pointless, because come on, what good is adding a little dw there?)

    It'd be nice to see a better naming convention; I don't know of one, though.  I'd love to see better language / development environment support for the sort of semantic typing Hungarian provides.  (If I had my way, languages would support and validate SI units on identifiers, too--it's roughly the same idea.)
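    C has nothing like validated Hungarian or checked units, but a single-member struct per unit gets a little of the effect: the compiler refuses to pass one kind of quantity where another is expected.  A minimal sketch; Meters, Seconds, MetersPerSecond, MetersAdd, and Speed are made-up names for illustration.

```c
#include <assert.h>

/* One struct per unit: the same representation, but distinct types. */
typedef struct { double v; } Meters;
typedef struct { double v; } Seconds;
typedef struct { double v; } MetersPerSecond;

static Meters MetersAdd(Meters a, Meters b)
{
    return (Meters){ a.v + b.v };
}

/* The result's unit follows from the arguments' units. */
static MetersPerSecond Speed(Meters d, Seconds t)
{
    assert(t.v != 0.0);
    return (MetersPerSecond){ d.v / t.v };
}

/* Swapping the arguments, Speed(t, d), is a compile-time error:
   incompatible type for argument 1. */
```

    It only catches confusion between distinct types, though; nothing stops someone from writing (Meters){ t.v }, and every combination of units needs its own hand-written type and functions, which is exactly the sort of bookkeeping real language support would take care of.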

    In the meantime, I do tend to use a few Hungarian ideas, like the standard qualifiers (so in code I write, Foo < FooLim implies that Foo is usable).  But I don't use Hungarian.
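    For anyone who hasn't run into the qualifiers: as I understand Simonyi's conventions, First names the first element of a range, Lim names one past its last element, Mac names the current end of the in-use part of an array, and Max names its allocated size.  A small sketch with a hypothetical character buffer (BUF, FAppendCh, and CchCount are illustrative, not from any real codebase):

```c
#include <assert.h>
#include <stddef.h>

enum { ichMax = 16 };   /* allocated size of rgch */

/* Hypothetical buffer: rgch[0..ichMac) is in use, and ichMac <= ichMax. */
typedef struct {
    char rgch[ichMax];
    size_t ichMac;
} BUF;

/* Append ch if there's room; returns nonzero on success. */
static int FAppendCh(BUF *pbuf, char ch)
{
    if (pbuf->ichMac >= ichMax)
        return 0;
    pbuf->rgch[pbuf->ichMac++] = ch;
    return 1;
}

/* Count occurrences of ch in rgch[ichFirst..ichLim).  Because the caller
   guarantees ichLim <= ichMac, every ich < ichLim is usable. */
static size_t CchCount(const BUF *pbuf, size_t ichFirst, size_t ichLim, char ch)
{
    size_t cch = 0;
    assert(ichFirst <= ichLim && ichLim <= pbuf->ichMac);
    for (size_t ich = ichFirst; ich < ichLim; ich++)
        if (pbuf->rgch[ich] == ch)
            cch++;
    return cch;
}
```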

  • Childhood computers

    So I was visiting my parents the other day.  They've been going through old boxes of stuff, simplifying a little, and had stumbled across the first home computer we bought: a Commodore 128.

    This was the state of the art when I was in sixth grade, I think: a 1 MHz 8502 processor (2 MHz if you disabled the 40-column video chip, which shared bus bandwidth with the processor), an extra Z80 for running CP/M, 128k of RAM, and a 5.25" floppy drive which could hold about 360k (double-sided, at that).  Video was either the old C64-style 40x25 character display (or 320x200 pixels if you wanted graphics, with some pretty severe color restrictions), or the new 80-column chip (with its own memory, so you could use it while running the processor at 2 MHz).

    IIRC, at least part of the system was written by Microsoft (I think there was a "Microsoft Basic" banner when you turned the thing on).  I probably don't have coworkers who worked on it, though--anyone who was at Microsoft that long ago would be extremely rich and most likely retired; Bill Gates would've been around, and maybe Steve Ballmer, but the next in line after them is Mark Zbikowski (look at the beginning of any Windows PE file--see the "MZ" signature?  That guy.), and I think Mark got there after that.

    So I spent the better part of a day re-reading the programmer's reference manual, just completely inhaling it.  It was great (although my wife and parents were less amused; I told them that if I didn't like computing machines to this extent, I probably wouldn't have gotten into kernel development).  It's amazing how much the perspective of twenty years of experience with other machines makes a difference; I think I understand the thing a lot better now.

    Of course, reading it, the idea came to me that I could easily write an assembler for the thing... maybe a C compiler... and it hit me that implementing multiple threads and a single-priority round-robin scheduler would be pretty easy... I wonder how hard it'd be to write an MSIL->8502 translator, and implement just enough of the CLR to run something interesting...?
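    The scheduler half of that really is small.  Here's a sketch, in C rather than 8502 assembly, of only the "pick the next thread" step of a single-priority round-robin scheduler; the thread table and the names are hypothetical, and saving and restoring registers and stacks is left out entirely.

```c
#include <assert.h>

enum { cThreadMax = 4 };

typedef struct {
    int fRunnable;   /* nonzero if the thread can be scheduled */
} THREAD;

static THREAD rgthread[cThreadMax];

/* Return the next runnable thread after ithreadCur, wrapping around and
   considering ithreadCur itself last; returns -1 if nothing can run. */
static int IthreadNext(int ithreadCur)
{
    assert(ithreadCur >= 0 && ithreadCur < cThreadMax);
    for (int dithread = 1; dithread <= cThreadMax; dithread++) {
        int ithread = (ithreadCur + dithread) % cThreadMax;
        if (rgthread[ithread].fRunnable)
            return ithread;
    }
    return -1;
}
```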

    I probably won't get around to all that; I have other fun side projects I'd rather work on.  But I still really like the machine, for all of its limitations; it's fun to write assembly for a processor with 8-bit registers, a 16-bit hardware address space, addressing 128k of RAM.  It's fun to work with a machine where the processor logic diagram actually ends with the physical pins; it's so simple.  It's like writing a sonnet; the constraints make it more challenging, more fun.

  • Yet another reason not to handle all SEH exceptions

    ... you'll probably forget to call _resetstkoflw() in your exception handler when it's a stack overflow exception.  This re-establishes the guard page, so that the next time the thread hits a stack overflow, it'll get another stack overflow exception; if you forget to call it, then the next time the thread runs off the end of its stack, it'll AV instead (if you're lucky), and not have anywhere to run the exception filter or termination handlers, which could be a problem if there are locks to be dropped, &c.

    Everyone understands that you usually don't want to be using try/except, right?  try/finally is almost always more appropriate.

    "Gee, Rob, why are you talking about exception handling so much?"  Mostly because in the core OS code, it's an area where there are a lot of unnecessary defects; it'd be nice if more people understood and tested this stuff.

  • SEH stack walking

    I noticed another fun thing about structured exception handling the other day; it's probably old news to the compiler team, but I thought it was interesting.

    Imagine you have code like this:

    DoSomething(); // This may raise an exception.
    try {
       *x = 0;  // This may raise an exception, too.
       ...
    } except (EXCEPTION_EXECUTE_HANDLER) {
       ...
    }

    On x86, this is pretty straightforward.  There are exception registration records on the stack, chained together, with pointers to functions to be called to filter the exception and unwind the stack.  The exception handling code just walks a linked list.
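    A rough, portable model of that walk follows.  The real x86 records are reached through FS:[0], end with a 0xFFFFFFFF sentinel, and use a handler signature with several more parameters; the REGRECORD struct, the NULL terminator, the two-value disposition, and the example handlers below are simplified and hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified dispositions. */
typedef enum { DISP_CONTINUE_SEARCH, DISP_HANDLED } DISPOSITION;

typedef struct REGRECORD REGRECORD;
typedef DISPOSITION (*PFNHANDLER)(REGRECORD *prec, int code);

/* One registration record; in real code it lives in the stack frame of
   the function which set up the try block. */
struct REGRECORD {
    REGRECORD *precNext;    /* next-older record; NULL ends the chain here */
    PFNHANDLER pfnHandler;  /* decides what to do with the exception */
};

/* Walk the chain from the newest record, returning the record whose
   handler claimed the exception, or NULL if none did. */
static REGRECORD *PrecDispatch(REGRECORD *precHead, int code)
{
    for (REGRECORD *prec = precHead; prec != NULL; prec = prec->precNext) {
        assert(prec->pfnHandler != NULL);
        if (prec->pfnHandler(prec, code) == DISP_HANDLED)
            return prec;
    }
    return NULL;
}

/* Example handlers: one declines everything, one claims code 42. */
static DISPOSITION HandlerDecline(REGRECORD *prec, int code)
{
    (void)prec;
    (void)code;
    return DISP_CONTINUE_SEARCH;
}

static DISPOSITION HandlerClaim42(REGRECORD *prec, int code)
{
    (void)prec;
    return code == 42 ? DISP_HANDLED : DISP_CONTINUE_SEARCH;
}
```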

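    On amd64, there are no exception registration records; instead, the stack itself needs to be examined.  The return addresses are walked to find the call stack, and each one is looked up in the image's function table (the .pdata section) to find the unwind information and exception handling context for that call frame.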

    So if you're a compiler, you must not generate code like this:

      0000000100001C6E: FF 15 A4 F4 FF FF  call        qword ptr [__imp_DoSomething]
      0000000100001C74: 66 C7 03 00 00     mov         word ptr [rbx],0

    ... because then the return address of the call would be the mov instruction, and exceptions raised in DoSomething() would be in the same context as exceptions raised by the mov instruction.

    So the compiler adds a nop:

      0000000100001C6E: FF 15 A4 F4 FF FF  call        qword ptr [__imp_DoSomething]
      0000000100001C74: 90                 nop
      0000000100001C75: 66 C7 03 00 00     mov         word ptr [rbx],0

    That nop's sole purpose in life (so far as I can determine, anyway) is to provide a return address from the call.
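    A simplified model shows why the nop matters.  It assumes, hypothetically, that the try body is described by a half-open range of code addresses starting at its first instruction (and, since the listing elides the rest of the body, that it ends right after the mov), and that a caller's frame is matched against that range by its return address.  SCOPE, IpReturn, and FInScope are illustrative names, not the real table layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scope record: code addresses in [ipBegin, ipEnd) are
   inside the try body. */
typedef struct {
    uint64_t ipBegin;
    uint64_t ipEnd;
} SCOPE;

/* A call pushes the address of the instruction following it. */
static uint64_t IpReturn(uint64_t ipCall, uint64_t cbCall)
{
    return ipCall + cbCall;
}

/* Nonzero if a frame whose return address is ip would be treated as
   executing inside the scope. */
static int FInScope(const SCOPE *pscope, uint64_t ip)
{
    assert(pscope->ipBegin <= pscope->ipEnd);
    return ip >= pscope->ipBegin && ip < pscope->ipEnd;
}
```

    With the 6-byte call at 0x100001C6E, the return address is 0x100001C74.  Without the nop, that's the first byte of the mov, so it lands inside the guarded range; with the nop, the range starts at 0x100001C75, and an exception raised inside DoSomething() isn't attributed to the try block.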

    Incidentally, ia64 doesn't appear to require this--at least, the compiler doesn't generate it. I'd assume that since ia64 instructions are more consistently sized, it's easier to figure out the address of the actual call instruction for a given return address (and this appears to be what the exception handling code actually does).

  • SEH Ordering

    A number of devs don't quite understand the order of operations in structured exception handling.  This leads to some interesting bugs...

    Take this snippet of pseudocode, for example:

    try {
        EnterCriticalSection();

        try {
            DoSomething();
            RaiseException();
            DoSomethingElse();

        } finally {
            LeaveCriticalSection();
        }
    } except(ExceptionFilter()) {
        ReportException();
    }

    Assuming that ExceptionFilter() returns EXCEPTION_EXECUTE_HANDLER, in what order will these functions be called?


    It goes like this:
    EnterCriticalSection();
    DoSomething();
    RaiseException();
    ExceptionFilter();
    LeaveCriticalSection();
    ReportException();

    ... that is, the exception filter--which is deciding how to deal with the exception--is called before the termination handlers on the stack are executed.

    This is pretty obvious if you recall that with a continuable exception, the filter may return EXCEPTION_CONTINUE_EXECUTION, causing execution to resume from the point where the exception was raised.  In that case, DoSomethingElse() would be called, and you wouldn't want to have left the critical section--that'll happen when DoSomethingElse() returns.
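    The two passes can be modeled in portable C.  This is a hypothetical sketch, not the real dispatcher: active frames sit in an array (innermost last), each frame is either a try/except (a filter plus a handler) or a try/finally (a termination handler), and every call is logged under the name used in the example above so the ordering is visible.  The FRAME struct, the FILTER_* values (which mirror the meanings of the EXCEPTION_* constants), and the stub functions are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; the values mirror EXCEPTION_CONTINUE_EXECUTION,
   EXCEPTION_CONTINUE_SEARCH, and EXCEPTION_EXECUTE_HANDLER. */
enum {
    FILTER_CONTINUE_EXECUTION = -1,
    FILTER_CONTINUE_SEARCH = 0,
    FILTER_EXECUTE_HANDLER = 1
};

typedef struct {
    int (*pfnFilter)(void);    /* non-NULL for a try/except frame */
    void (*pfnHandler)(void);  /* the except block for that frame */
    void (*pfnFinally)(void);  /* non-NULL for a try/finally frame */
} FRAME;

enum { cLogMax = 16 };
static const char *rgszLog[cLogMax];
static int cLog;

static void Log(const char *sz)
{
    assert(cLog < cLogMax);
    rgszLog[cLog++] = sz;
}

/* Stubs standing in for the functions in the example above. */
static int FilterExecute(void)  { Log("ExceptionFilter"); return FILTER_EXECUTE_HANDLER; }
static int FilterContinue(void) { Log("ExceptionFilter"); return FILTER_CONTINUE_EXECUTION; }
static void FinallyLeave(void)  { Log("LeaveCriticalSection"); }
static void HandlerReport(void) { Log("ReportException"); }

/* Dispatch an exception raised while rgframe[0..cframe) are active,
   rgframe[cframe - 1] being the innermost.  Returns the index of the
   frame whose handler ran, or -1 if execution resumes at the raise
   point or nothing handles the exception. */
static int Dispatch(const FRAME *rgframe, int cframe)
{
    /* Pass one (search): call filters, innermost first.  Nothing has
       been unwound yet, so every lock is still held. */
    for (int iframe = cframe - 1; iframe >= 0; iframe--) {
        if (rgframe[iframe].pfnFilter == NULL)
            continue;
        int disp = rgframe[iframe].pfnFilter();
        if (disp == FILTER_CONTINUE_EXECUTION)
            return -1;
        if (disp == FILTER_EXECUTE_HANDLER) {
            /* Pass two (unwind): run the termination handlers of the
               frames inside the target, innermost first... */
            for (int jframe = cframe - 1; jframe > iframe; jframe--)
                if (rgframe[jframe].pfnFinally != NULL)
                    rgframe[jframe].pfnFinally();
            /* ...and only then run the except block. */
            rgframe[iframe].pfnHandler();
            return iframe;
        }
    }
    return -1;
}
```

    Feeding it the two frames from the example (an outer try/except, an inner try/finally) logs ExceptionFilter, then LeaveCriticalSection, then ReportException.  If the filter instead returns FILTER_CONTINUE_EXECUTION, Dispatch returns before any termination handler runs, so in the real case the critical section would still be held when execution resumes.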

    The implications get interesting.

    If an exception filter is wrapping code the author doesn't control (say, it's making a call into some other dll), it can't know very much about the state of the thread and the system.  The thread might be impersonating, it might be holding locks, who knows?

    This limits what the exception filter can do.  For instance, if the thread faulted in the heap, it might be holding the heap lock (since the heap lock's dropped in a termination handler); another thread might be holding the loader lock and attempting to acquire the heap lock, so if the faulting thread blocks on the loader lock, it'll deadlock.

    It takes some thought to get this right.  The moral is, unless you fully understand and control the code your exception filter is wrapping, you can't make assumptions about the state of the system in your exception filter.

    (Standard Microsoft disclaimer: This posting is provided "AS IS" with no warranties, and confers no rights.)


© 2008 Microsoft Corporation. All rights reserved.