February, 2004

  • The Old New Thing

    Sure, we do that

    The DirectX video driver interface for Windows 95 had a method that each driver exposed called something like "DoesDriverSupport(REFGUID guidCapability)" where we handed it a capability GUID and it said whether or not that feature was supported.

    There were various capability GUIDs defined, things like GUID_CanStretchAlpha to ask the driver whether it was capable of stretching a bitmap with an alpha channel.

    There was one driver that returned TRUE when you called DoesDriverSupport(GUID_XYZ), but when DirectDraw tried to use that capability, it failed, and in a pretty spectacular manner.

    So one of the DirectDraw developers called the vendor and asked them, "So does your card do XYZ?"

    Their response: "What's XYZ?"

    Turns out that their driver's implementation of DoesDriverSupport was something like this:

    BOOL DoesDriverSupport(REFGUID guidCapability)
    {
      return TRUE;
    }

    In other words, whenever DirectX asked, "Can you do this?" they answered, "Sure, we do that," without even checking what the question was.

    (The driver must have been written by the sales department.)

    So the DirectDraw folks changed the way they queried for driver capabilities. One of the developers went into his boss's office, took a network card, extracted the MAC address, and then smashed the card with a hammer.

    You see, this last step was important: The GUID generation algorithm is based on a combination of time and space. When you ask CoCreateGuid to create a new GUID, it encodes the time of your request in the first part of the GUID and, in the last part, information that uniquely identifies your machine (the network card's MAC address, which is required to be unique by the standards that apply to network cards).

    By smashing the network card with a hammer, he prevented that network card from ever being used to generate a GUID.

    Next, he added code to DirectDraw so that when it starts up, it manufactures a random GUID based on that network card (which - by its having been destroyed - can never be validly created) and passes it to DoesDriverSupport. If the driver says, "Sure, we do that", DirectDraw says, "Aha! Caught you! I will not believe anything you say from now on."
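
    The canary check might be sketched like this. Everything here is made up for illustration: the Guid struct stands in for the real GUID type so the sketch builds anywhere, and the capability and canary values, the two sample drivers, and the helper names are all hypothetical. This is not the actual DirectDraw code.

    ```cpp
    #include <cassert>
    #include <cstring>

    // Hypothetical stand-in for the Windows GUID type.
    struct Guid { unsigned char bytes[16]; };

    inline bool SameGuid(const Guid& a, const Guid& b) {
        return std::memcmp(a.bytes, b.bytes, sizeof a.bytes) == 0;
    }

    // Hypothetical values; the real capability and canary GUIDs are not given.
    const Guid kCanStretchAlpha     = {{0x01}};
    const Guid kCanaryNoSuchFeature = {{0xFF, 0xEE}};

    using DoesDriverSupportFn = bool (*)(const Guid&);

    // A driver that actually looks at the question.
    bool HonestDriver(const Guid& cap) { return SameGuid(cap, kCanStretchAlpha); }

    // A driver that says "Sure, we do that" to everything.
    bool YesManDriver(const Guid&) { return true; }

    // Ask about a capability that cannot exist. A "yes" means the driver
    // is not checking the question, so none of its answers can be trusted.
    bool DriverIsTrustworthy(DoesDriverSupportFn query) {
        return !query(kCanaryNoSuchFeature);
    }

    // Believe a "yes" only from a driver that passed the canary check.
    bool SupportsCapability(DoesDriverSupportFn query, const Guid& cap) {
        return DriverIsTrustworthy(query) && query(cap);
    }
    ```

    With these definitions, SupportsCapability(HonestDriver, kCanStretchAlpha) is true, but the same question put to YesManDriver comes back false, because that driver already claimed to support the canary.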
  • The Old New Thing

    Dunkin Donuts vs. Krispy Kreme

    Having grown up on the east coast, I imprinted on Dunkin Donuts. Once a month we would stop at DD on the way home and buy a shoebox of doughnuts. Toasted coconut and butternut, those were my favorites.

    Ironically, Dunkin Donuts is really a coffee shop disguised as a doughnut shop. (Doughnuts account for only 20% of their sales; coffee 50%.)

    So during my travels through Manhattan, I walked past one of the twenty-five zillion Dunkin Donuts stores there and popped in for a toasted coconut doughnut. One bite and I was a little kid again.

    Some people say that DD's doughnuts are awful, but that's pretty much irrelevant to me by now. It's all about the memories that are invoked.

    And besides, those people are wrong. I don't understand the appeal of KK donuts. They have no flavor; it's just sugar.

  • The Old New Thing

    Answer to exercise: Pointer to member function cast


    Yesterday's exercise asked you to predict and explain the codegen for the following fragment:

    class Base1 { int b1; void Base1Method(); };
    class Base2 { int b2; void Base2Method(); };
    class Derived : public Base1, Base2
      { int d; void DerivedMethod(); };
    class Derived2 : public Base3, public Derived { };
    void (Derived::*pfnDerived)();
    void (Derived2::*pfnDerived2)();
    pfnDerived2 = pfnDerived;

    Well, the codegen might go something like this:

      mov  ecx, pfnDerived[0]       ; ecx = address
      mov  pfnDerived2[0], ecx
      mov  ecx, pfnDerived[4]       ; ecx = adjustor
      add  ecx, sizeof(Base3)       ; adjust the adjustor!
      mov  pfnDerived2[4], ecx

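    Let's use one of our fancy pictures:

    [Picture: a Derived2 object in memory. p points to the start of the object, where Base3 is stored; q points sizeof(Base3) bytes further in, at the start of the Derived part, which holds Base2, then Base1, then d.]
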
    Just for fun, I swapped the order of Base1 and Base2. There is no requirement in the standard about the order in which storage is allocated for base classes, so the compiler is completely within its rights to put Base2 first, if it thinks that would be more efficient.

    A pointer to member function for class Derived expects the "this" pointer to be at "q". So when we have a "p", we need to add sizeof(Base3) to it to convert it to "q", on top of whatever other adjustment the original function pointer wanted. That's why we add sizeof(Base3) to the existing adjustor to make a new combined adjustor.
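
    You can watch the combined adjustment happen without reading any assembly. The following is a portable sketch, not the Visual C++ codegen: the classes are written as structs so everything is public, and the seenThis member and the two helper functions are additions made purely for illustration.

    ```cpp
    #include <cassert>
    #include <cstddef>

    struct Base1 { int b1 = 1; };
    struct Base2 { int b2 = 2; };
    struct Base3 { int b3 = 3; };

    struct Derived : Base1, Base2 {
        int d = 4;
        const Derived* seenThis = nullptr;          // illustration only
        void DerivedMethod() { seenThis = this; }   // record the "this" we got
    };

    struct Derived2 : Base3, Derived {};

    // Converts the pointer as in the exercise, calls through it, and reports
    // whether the method saw "q" (the Derived part) rather than "p".
    bool ConvertedPointerAdjustsThis() {
        void (Derived::*pfnDerived)() = &Derived::DerivedMethod;
        void (Derived2::*pfnDerived2)() = pfnDerived;  // the conversion in question

        Derived2 obj;
        (obj.*pfnDerived2)();

        const Derived* q = &obj;   // implicit upcast finds the Derived part
        return obj.seenThis == q;
    }

    // Byte distance from "p" to "q" inside a Derived2 object.
    std::ptrdiff_t DerivedOffsetInDerived2() {
        Derived2 obj;
        const char* p = reinterpret_cast<const char*>(&obj);
        const char* q = reinterpret_cast<const char*>(
            static_cast<const Derived*>(&obj));
        return q - p;
    }
    ```

    On compilers that place Base3 first, DerivedOffsetInDerived2() should come out equal to sizeof(Base3), which is exactly the amount added to the adjustor above. The standard does not promise that layout, so treat the number as an observation rather than a guarantee.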

  • The Old New Thing

    Orkut's privacy policy and terms of service

    It was bound to happen sooner or later. I was invited to join Orkut. But before clicking Submit, I always read the fine print: their Terms of Service and their Privacy Policy. (Oh great, you have to have scripting enabled just to read their Terms of Service and Privacy Policy!)

    Notice, for example, the terms for changes to their terms of service:

    We also reserve the right to modify these Terms of Service from time to time without notice. You are responsible for regularly reviewing these Terms of Service so that you will be apprised of any changes.

    (Emphasis mine.) Notice that they do not say that they will notify you when the Terms of Service change. It is your responsibility to check the Terms of Service. So tomorrow, they could quietly amend their Terms of Service to read, "By agreeing to these Terms of Service, you also agree to pay orkut.com a fee of $50 per day in perpetuity, and you grant that orkut.com or its agents are authorized to use physical force or threats of force to compel such payment," and it is your responsibility to notice this.

    And even if you do manage to notice this, their termination clause says

    Once your membership terminates, you will have no right to use the orkut.com service. Our proprietary rights, disclaimer of warranties, indemnities, limitations of liability and miscellaneous provisions shall survive any termination of your membership.

    Suppose you alertly notice that they changed their Terms of Service and you quickly contact them to cancel your membership. Does that relieve you of your $50/day habit? Nope. The $50/day fee survives termination of the membership. Even though you lose your right to the benefits of membership, they retain the rights to exploit your membership (that you don't have any more but are still paying for).

    Of course, any interpretation of the above paragraph is meaningless since they can totally rewrite their Terms of Service at any time and hold you to the rewritten version.

    What about the privacy policy?

    We reserve the right to transfer your personal information in the event of a transfer of ownership of orkut.com, such as acquisition by or merger with another company. In such an event, orkut.com will notify you before information about you is transferred and becomes subject to a different privacy policy.

    Note that there is no way to opt out of being subjected to a different privacy policy. So Orkut could be bought by vast-left-wing-conspiracy.com, whose privacy policy reads, "We reserve the right to use all information gathered about you, both aggregate and personally identifiable, for any means whatsoever, without compensation or recourse. The terms of this policy remain in effect even after membership is cancelled."

    And now vast-left-wing-conspiracy.com can sell your name to hate groups and you can't do anything about it.

    I find it interesting that there are no provisions in the Privacy Policy for changes to the Privacy Policy.

  • The Old New Thing

    I think this counts as having come full circle

    First, ABBA rises to stardom in their native Sweden with Ring, Ring. They then win the Eurovision Song Contest with Waterloo, which is also recorded in English, French, German, and probably Spanish.

    Twenty-five years later, the English-language musical Mamma-Mia premieres in London and subsequently spreads through large portions of the world not yet civilized enough to ban fluorescent pink tuxedos or platform shoes.

    And now, the musical is being translated into Swedish and auditions are being taken for a scheduled opening in Stockholm on 12 February 2005.

  • The Old New Thing

    Pointers to member functions are very strange animals


    Pointers to member functions are very strange animals.

    Warning: The discussion that follows is specific to the way pointers to member functions are implemented by the Microsoft Visual C++ compiler. Other compilers may do things differently.

    Well, okay, if you only use single inheritance, then pointers to member functions are just a pointer to the start of the function, since all the base classes share the same "this" pointer:

    class Simple { int s; void SimpleMethod(); };
    class Simple2 : public Simple
      { int s2; void Simple2Method(); };
    class Simple3 : public Simple2
      { int s3; void Simple3Method(); };

    Since they all use the same "this" pointer (p), a pointer to a member function of Simple can be used as if it were a pointer to a member function of Simple3 without any adjustment necessary.

    The size of a pointer-to-member-function of a class that uses only single inheritance is just the size of a pointer.

    But if you have multiple base classes, then things get interesting.

    class Base1 { int b1; void Base1Method(); };
    class Base2 { int b2; void Base2Method(); };
    class Derived : public Base1, Base2
      { int d; void DerivedMethod(); };

    There are now two possible "this" pointers. The first (p) is used by both Derived and Base1, but the second (q) is used by Base2.

    A pointer to a member function of Base1 can be used as a pointer to a member function of Derived, since they both use the same "this" pointer. But a pointer to a member function of Base2 cannot be used as-is as a pointer to a member function of Derived, since the "this" pointer needs to be adjusted.

    There are many ways of solving this. Here's how the Visual Studio compiler decides to handle it:

    A pointer to a member function of a multiply-inherited class is really a structure.

    Address of function
    Adjustor

    The size of a pointer-to-member-function of a class that uses multiple inheritance is the size of a pointer plus the size of a size_t.

    Compare this to the case of a class that uses only single inheritance.

    The size of a pointer-to-member-function can change depending on the class!
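
    If you want to see what your own compiler does, a few sizeof expressions will tell you. This is a minimal sketch; the constant names and PrintPointerSizes are made up for the occasion, and the classes are structs with empty method bodies so the fragment builds on its own. No expected output is shown because the numbers are implementation-specific: the sizes described above are what Visual C++ produces, whereas GCC and Clang typically use one size for every pointer to member function.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdio>

    struct Simple { int s; void SimpleMethod() {} };
    struct Base1 { int b1; void Base1Method() {} };
    struct Base2 { int b2; void Base2Method() {} };
    struct Derived : Base1, Base2 { int d; void DerivedMethod() {} };

    // Sizes as seen by whichever compiler builds this file.
    constexpr std::size_t kPlainPointerSize = sizeof(void*);
    constexpr std::size_t kSinglePmfSize    = sizeof(void (Simple::*)());
    constexpr std::size_t kMultiplePmfSize  = sizeof(void (Derived::*)());

    void PrintPointerSizes() {
        std::printf("void*:                        %zu\n", kPlainPointerSize);
        std::printf("ptr-to-mem-func (single):     %zu\n", kSinglePmfSize);
        std::printf("ptr-to-mem-func (multiple):   %zu\n", kMultiplePmfSize);
    }
    ```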

    Aside: Sadly, this means that Rich Hickey's wonderful technique of Callbacks in C++ Using Template Functors cannot be used as-is. You have to fix the place where he writes the comment

    // Note: this code depends on all ptr-to-mem-funcs being same size

    Okay, back to our story.

    To call through a pointer to a member function, the "this" pointer is adjusted by the Adjustor, and then the function provided is called. A call through a function pointer might be compiled like this:

    void (Derived::*pfn)();
    Derived d;

    (d.*pfn)();

      lea  ecx, d       ; ecx = "this"
      add  ecx, pfn[4]  ; add adjustor
      call pfn[0]       ; call

    When would an adjustor be nonzero? Consider the case above. The function Derived::Base2Method() is really Base2::Base2Method() and therefore expects to receive "q" as its "this" pointer. In order to convert a "p" to a "q", the adjustor must have the value sizeof(Base1), so that when the first line of Base2::Base2Method() executes, it receives the expected "q" as its "this" pointer.
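
    The nonzero adjustor can be observed from portable C++ as well. This is only a sketch: the seenThis member and the helper functions are illustrative additions, and the classes are structs so the members are reachable. It checks that calling Base2Method through a pointer to member of Derived delivers "q", not "p".

    ```cpp
    #include <cassert>
    #include <cstddef>

    struct Base1 { int b1 = 1; };
    struct Base2 {
        int b2 = 2;
        const Base2* seenThis = nullptr;         // illustration only
        void Base2Method() { seenThis = this; }  // record the "this" we got
    };
    struct Derived : Base1, Base2 { int d = 3; };

    // Byte distance from "p" (the Derived) to "q" (its Base2 part).
    std::ptrdiff_t OffsetOfBase2InDerived(const Derived& obj) {
        const char* p = reinterpret_cast<const char*>(&obj);
        const char* q = reinterpret_cast<const char*>(
            static_cast<const Base2*>(&obj));
        return q - p;
    }

    // Calls Base2Method through a Derived pointer to member and reports
    // whether the method received the Base2 part as its "this".
    bool Base2MethodReceivesQ() {
        Derived obj;
        void (Derived::*pfn)() = &Derived::Base2Method;
        (obj.*pfn)();
        return obj.seenThis == static_cast<Base2*>(&obj);
    }
    ```

    When the compiler lays out Base1 first, OffsetOfBase2InDerived returns sizeof(Base1), the adjustor value described above.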

    "But why not just use a thunk instead of manually adding the adjustor?" In other words, why not just use a simple pointer to a thunk that goes like this:

    Derived::Base2Method thunk:
        add ecx, sizeof(Base1)  ; convert "p" to "q"
        jmp Base2::Base2Method  ; continue

    and use that thunk as the function pointer?

    The reason: Function pointer casts.

    Consider the following code:

    void (Base2::*pfnBase2)();
    void (Derived::*pfnDerived)();

    pfnDerived = pfnBase2;

      mov  ecx, pfnBase2            ; ecx = address
      mov  pfnDerived[0], ecx
      mov  pfnDerived[4], sizeof(Base1) ; adjustor!

    We start with a pointer to a member function of Base2, which is a class that uses only single inheritance, so it consists of just a pointer to the code. To assign it to a pointer to a member function of Derived, which uses multiple inheritance, we can re-use the function address, but we now need an adjustor so that the pointer "p" can properly be converted to a "q".
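
    To make the two-field structure concrete, here is a hand-rolled model of it. This is emphatically not how you would write real code, and not the compiler's actual representation: Base2Code, DerivedPmfModel, ReadB2, Widen and CallThrough are hypothetical names, and a free function taking an explicit "this" stands in for the member function.

    ```cpp
    #include <cassert>
    #include <cstddef>

    struct Base1 { int b1 = 1; };
    struct Base2 { int b2 = 2; };
    struct Derived : Base1, Base2 { int d = 3; };

    // Model of a single-inheritance pointer to member: just a code address.
    using Base2Code = int (*)(Base2* self);

    // Model of a multiple-inheritance pointer to member: address plus adjustor.
    struct DerivedPmfModel {
        Base2Code address;
        std::ptrdiff_t adjustor;
    };

    // Stand-in for a member function of Base2.
    int ReadB2(Base2* self) { return self->b2; }

    // Byte distance from a Derived ("p") to its Base2 part ("q").
    std::ptrdiff_t Base2Offset() {
        Derived probe;
        char* p = reinterpret_cast<char*>(&probe);
        char* q = reinterpret_cast<char*>(static_cast<Base2*>(&probe));
        return q - p;
    }

    // Models "pfnDerived = pfnBase2": reuse the address, record the adjustor.
    DerivedPmfModel Widen(Base2Code pfnBase2) {
        return DerivedPmfModel{pfnBase2, Base2Offset()};
    }

    // Models "(obj.*pfn)()": adjust "this", then call the address.
    int CallThrough(Derived& obj, const DerivedPmfModel& pfn) {
        char* adjusted = reinterpret_cast<char*>(&obj) + pfn.adjustor;
        return pfn.address(reinterpret_cast<Base2*>(adjusted));
    }
    ```

    Note that Widen never needs to know which function it was handed. It copies the address and attaches a fixed adjustor, which is the point of keeping the adjustor as data.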

    Notice that the code doesn't know what function pfnBase2 points to, so it can't just replace it with the matching thunk. It would have to generate a thunk at runtime and somehow use its psychic powers to decide when the memory can safely be freed. (This is C++. No garbage collector here.)

    Notice also that when pfnBase2 got cast to a pointer to member function of Derived, its size changed, since it went from a pointer to a function in a class that uses only single inheritance to a pointer to a function in a class that uses multiple inheritance.

    Casting a function pointer can change its size!

    I bet that you didn't know that before reading this entry.

    There's still an awful lot more to this topic, but I'm going to stop here before everybody's head explodes.

    Exercise: Consider the class

    class Base3 { int b3; void Base3Method(); };
    class Derived2 : public Base3, public Derived { };

    How would the following code be compiled?

    void (Derived::*pfnDerived)();
    void (Derived2::*pfnDerived2)();
    pfnDerived2 = pfnDerived;

    Answer to appear tomorrow.

  • The Old New Thing

    Adjustor thunks


    Yesterday we learned about the layout of COM objects and I hinted at "adjustor thunks".

    If you find yourself debugging in disassembly, you'll sometimes find strange little functions called "adjustor thunks". Let's take another look at the object we laid out last time:

    class CSample : public IPersist, public IServiceProvider
    {
    public:
      // *** IUnknown ***
      STDMETHODIMP QueryInterface(REFIID riid, void** ppv);
      STDMETHODIMP_(ULONG) AddRef();
      STDMETHODIMP_(ULONG) Release();

      // *** IPersist ***
      STDMETHODIMP GetClassID(CLSID* pClassID);

      // *** IServiceProvider ***
      STDMETHODIMP QueryService(REFGUID guidService,
                      REFIID riid, void** ppv);
    private:
      LONG m_cRef;
    };

    p -> lpVtbl -> QueryInterface (1), AddRef (1), Release (1), GetClassID (1)
    q -> lpVtbl -> QueryInterface (2), AddRef (2), Release (2), QueryService (2)
         m_cRef
         ...

    In the diagram, p is the pointer returned when the IPersist interface is needed, and q is the pointer for the IServiceProvider interface.

    Now, there is only one QueryInterface method, but there are two entries, one for each vtable. Remember that each function in a vtable receives the corresponding interface pointer as its "this" parameter. That's just fine for QueryInterface (1); its interface pointer is the same as the object's interface pointer. But that's bad news for QueryInterface (2), since its interface pointer is q, not p.

    This is where the adjustor thunks come in.

    The entry for QueryInterface (2) is a stub function that changes q to p, and then lets QueryInterface (1) do the rest of the work. This stub function is the adjustor thunk.

      sub     DWORD PTR [esp+4], 4 ; this -= sizeof(lpVtbl)
      jmp     CSample::QueryInterface

    The adjustor thunk takes the "this" pointer and subtracts 4, converting q into p, then it jumps to the QueryInterface (1) function to do the real work.

    Whenever you have multiple inheritance and a virtual function is implemented on multiple base classes, you will get an adjustor thunk for the second and subsequent base class methods in order to convert the "this" pointer into a common format.
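
    You can see the effect of the adjustment without opening a disassembler. The sketch below uses plain C++ rather than COM, with made-up names (IFirst, ISecond, Sample, WhoAmI, ThunkObservation), and it does not show the thunk itself, only its consequence: the two interface pointers differ, yet the single implementation receives the same "this" whichever one you call through. How the compiler arranges that is up to the implementation.

    ```cpp
    #include <cassert>

    struct IFirst {
        virtual const void* WhoAmI() = 0;
        virtual ~IFirst() = default;
    };
    struct ISecond {
        virtual const void* WhoAmI() = 0;
        virtual ~ISecond() = default;
    };

    // One implementation of WhoAmI fills the slot in both vtables.
    struct Sample : IFirst, ISecond {
        int value = 0;
        const void* WhoAmI() override { return this; }
    };

    struct ThunkObservation {
        bool interfacePointersDiffer;  // p != q
        bool sameThisViaFirst;         // call through p sees the object
        bool sameThisViaSecond;        // call through q sees the object too
    };

    ThunkObservation ObserveAdjustment() {
        Sample s;
        IFirst* p = &s;
        ISecond* q = &s;
        const void* object = &s;
        return {
            static_cast<const void*>(p) != static_cast<const void*>(q),
            p->WhoAmI() == object,
            q->WhoAmI() == object,
        };
    }
    ```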

  • The Old New Thing

    The layout of a COM object


    The Win32 COM calling convention specifies the layout of the virtual method table (vtable) of an object. If a language/compiler wants to support COM, it must lay out its object in the specified manner so other components can use it.

    It is no coincidence that the Win32 COM object layout matches closely the C++ object layout. Even though COM was originally developed when C was the predominant programming language, the designers saw fit to "play friendly" with the up-and-coming new language C++.

    The layout of a COM object is made explicit in the header files for the various interfaces. For example, here's IPersist from objidl.h, after cleaning up some macros.

    typedef struct IPersistVtbl
    {
        HRESULT ( STDMETHODCALLTYPE *QueryInterface )(
            IPersist * This,
            /* [in] */ REFIID riid,
            /* [iid_is][out] */ void **ppvObject);
        ULONG ( STDMETHODCALLTYPE *AddRef )(
            IPersist * This);
        ULONG ( STDMETHODCALLTYPE *Release )(
            IPersist * This);
        HRESULT ( STDMETHODCALLTYPE *GetClassID )(
            IPersist * This,
            /* [out] */ CLSID *pClassID);
    } IPersistVtbl;
    struct IPersist
    {
        const struct IPersistVtbl *lpVtbl;
    };

    This corresponds to the following memory layout:

    p -> lpVtbl -> QueryInterface, AddRef, Release, GetClassID

    What does this mean?

    A COM interface pointer is a pointer to a structure that consists of just a vtable. The vtable is a structure that contains a bunch of function pointers. Each function in the list takes that interface pointer (p) as its first parameter ("this").

    The magic to all this is that since your function gets p as its first parameter, you can "hang" additional stuff onto that vtable:

    p -> lpVtbl -> QueryInterface, AddRef, Release, GetClassID
         other stuff

    The functions in the vtable can use offsets relative to the interface pointer to access its other stuff.
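
    Here is a small sketch of "hanging other stuff" onto the vtable pointer, built by hand in the style of the header above. It is not real COM: ICounter, ICounterVtbl, CounterImpl and the functions are hypothetical, the calling-convention macros are omitted, and Release only decrements because the caller owns the storage in this sketch.

    ```cpp
    #include <cassert>

    struct ICounter;  // hypothetical interface, declared before its vtable

    struct ICounterVtbl {
        unsigned long (*AddRef)(ICounter* This);
        unsigned long (*Release)(ICounter* This);
    };

    struct ICounter {
        const ICounterVtbl* lpVtbl;
    };

    // The object: the interface comes first, so the interface pointer p is
    // also the address of the whole object. The rest is the "other stuff".
    struct CounterImpl {
        ICounter iface;
        unsigned long cRef;
    };

    // Each function receives p and reaches the other stuff relative to it.
    unsigned long CounterAddRef(ICounter* This) {
        CounterImpl* self = reinterpret_cast<CounterImpl*>(This);
        return ++self->cRef;
    }

    unsigned long CounterRelease(ICounter* This) {
        CounterImpl* self = reinterpret_cast<CounterImpl*>(This);
        return --self->cRef;  // no deallocation in this sketch
    }

    const ICounterVtbl kCounterVtbl = { CounterAddRef, CounterRelease };

    // Fills in caller-provided storage and hands back the interface pointer.
    ICounter* InitCounter(CounterImpl* storage) {
        storage->iface.lpVtbl = &kCounterVtbl;
        storage->cRef = 1;
        return &storage->iface;
    }
    ```

    A caller that knows only ICounter writes p->lpVtbl->AddRef(p) and never sees cRef, which is the whole idea.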

    If an object implements multiple interfaces but they are all descendants of each other, then a single vtable can be used for all of them. For example, the object above is already set to be used either as an IUnknown or as an IPersist, since IUnknown is a subset of IPersist.

    On the other hand, if an object implements multiple interfaces that are not descendants of each other, then you get multiple inheritance, in which case the object is typically laid out in memory like this:

    p -> lpVtbl -> QueryInterface (1), AddRef (1), Release (1), ...
    q -> lpVtbl -> QueryInterface (2), AddRef (2), Release (2), ...
         other stuff

    If you are using an interface that comes from the first vtable, then the interface pointer is p. But if you're using an interface that comes from the second vtable, then the interface pointer is q.

    Hang onto that diagram, because tomorrow we will learn about those mysterious "adjustor thunks".

  • The Old New Thing

    Answers to exercises - mismatching new/delete

    Answers to yesterday's exercises:

    What happens if you allocate with scalar "new" and free with vector "delete[]"?

    The scalar "new" will allocate a single object with no hidden counter. The vector "delete[]" will look for the hidden counter, which isn't there, so it will either crash (accessing nonexistent memory) or grab a random number and attempt to destruct that many items. If the random number is greater than one, you will start corrupting memory after the object. If the random number is zero, you fail to destruct anything. If the random number is exactly one, then the one object is destructed.

    Next, the vector "delete[]" will attempt to free the memory block starting one size_t in front of the actual memory block. Depending on how the heap feels today, this may be detected as an invalid parameter and ignored, or this can result in heap corruption.

    Final result: not good.

    What happens if you allocate with vector "new[]" and free with scalar "delete"?

    The vector "new[]" allocates several objects and stores the "howmany" in the hidden counter. The scalar "delete" destructs the first object in the vector. If it was a vector of zero objects, you corrupted memory. If it was a vector of two or more objects, then objects 2 and onward will not be destructed. (Result: Memory or other leak.)

    Next, the scalar "delete" will free the memory block directly, which will fail because the memory block actually starts at the hidden size_t in front of the vector. This again corrupts the heap since you are freeing memory that is not a valid heap pointer.

    Final result: also not good.

    What optimizations can be performed if the destructor MyClass::~MyClass() is removed from the class definition?

    If the class does not have a destructor, then no special work needs to be done when the vector is freed aside from freeing the memory. In this case, no hidden counter is necessary; the block can be allocated directly with no overhead and freed with no overhead.

    More specifically, if the class has a trivial destructor (none of its base classes or sub-objects - if any - have a destructor), then the scalar and vector new/delete allocate and free the memory the same way, and mixing them does not generate a runtime error. You got lucky.

    Of course, somebody might add a destructor to your class tomorrow, and then you won't be so lucky any more.

    Note of course that all of this discussion assumes compiler behavior as described yesterday. That behavior is implementation-dependent so you should not rely on it. You may be lucky today, but the next version of the compiler may change the way it manages vectors and your luck will have run out.
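
    For the curious, the hidden counter can be modeled by hand. This is a sketch of the behavior these answers assume, namely a size_t stored just in front of the vector; it is not what any particular compiler is guaranteed to emit. Tracked, ModelVectorNew and ModelVectorDelete are made-up names, and the model assumes the element type needs no stricter alignment than size_t and that constructors do not throw.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdlib>
    #include <new>

    // Counts live objects so we can tell which destructors ran.
    struct Tracked {
        static int live;
        Tracked() { ++live; }
        ~Tracked() { --live; }
    };
    int Tracked::live = 0;

    // Model of vector "new[]": allocate room for the counter plus the
    // objects, store "howmany" in front, then construct each object.
    Tracked* ModelVectorNew(std::size_t howmany) {
        void* block = std::malloc(sizeof(std::size_t) + howmany * sizeof(Tracked));
        if (!block) throw std::bad_alloc();
        std::size_t* counter = static_cast<std::size_t*>(block);
        *counter = howmany;
        Tracked* first = reinterpret_cast<Tracked*>(counter + 1);
        for (std::size_t i = 0; i < howmany; ++i) new (first + i) Tracked();
        return first;
    }

    // Model of vector "delete[]": read the counter in front of the vector,
    // destruct that many objects in reverse order, then free the block
    // starting at the counter, not at the first object.
    void ModelVectorDelete(Tracked* first) {
        if (!first) return;
        std::size_t* counter = reinterpret_cast<std::size_t*>(first) - 1;
        for (std::size_t i = *counter; i > 0; --i) first[i - 1].~Tracked();
        std::free(counter);
    }
    ```

    Seen this way, the mismatches above are easy to picture: a scalar delete would hand "first" straight to the heap and run one destructor, and a vector delete on a scalar allocation would read a counter that was never written.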

  • The Old New Thing

    The Glass Engine and Ishkur's Guide to Electronic Music

    The Glass Engine is an interactive guide to the music of Philip Glass, organized by... um... at least they're organized. By something.

    Bizarre yet oddly compelling.

    (Perhaps if we ask nicely, we can get Marc Miller to tell the story of the time he actually met Philip Glass...)

    In a similar vein, a friend of mine directed me to Ishkur's Guide to Electronic Music, an attitude-filled tour of the world of electronic music.

    I'd like to say that I learned something, but that would be overstating it.