Shipping default values in the registry

Recently, someone asked how their code should behave if a certain configuration setting was missing from the registry. The responses were mixed.  One suggested checking the return value of RegQueryValue. Another suggested that the missing value should cause the app to fail. The right answer is "it should use the default value."  "But hark!" you say - "the question was about how to deal with the missing default value!"  And the answer doesn't change - have a fallback strategy that can still pull a default value out of your ear at runtime.

Ok then, let's suppose you've done this and now have code that looks like "For this configuration widget, read the current value; if there is no current value, read the default; if there is no default, use <mystery value> instead."  That's three levels of decision making, three independent code paths for your test team to validate.  What if you could just cut out that middle piece and go back to two - a default in-code and a possible per-user override?

This leads me to the following broad statement - Defaults should be shipped in code, user configurations saved on the device in stable storage.  Three reasons:

1.       Setting a value during setup is a point of failure.  Yes, that failure almost never happens.  But it does happen.  The best setup you can imagine is one done by (logical) xcopy.  Imagine your app being run from a network share.  The vast majority of applications - web apps, for instance - have no "install" step.  They just start up and use default settings until overridden by a user-initiated configuration step.

2.       "Reset to known good state" should be "delete all overridden settings and start again."  Apps that have "repair" fail mightily when they try to distinguish important things to keep from unimportant things to keep.  If the app is able to start up and run correctly from a completely clean state "repair" can be "delete this key, restart the app."

3.       Servicing of user-defaults is impossible without an override model.  Imagine a default color value like "blue".  The user changes her mind and picks "orange" instead.  Later she decides she likes "blue" again.  Now imagine an update coming down for your application that changes the default to "white".  A naïve updater might look at the old default (blue) and the current setting (blue) and say "aha! She's still on the default, change to white!" which is not what the customer intended.  She consciously chose "blue".  Granted, this means your settings model needs a "use default" button, but that's a small price to pay for customer satisfaction.
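
The override model argued for above can be sketched in a few lines of portable C++. This is only an illustration under stated assumptions: SettingsStore and its method names are hypothetical, the maps stand in for whatever stable storage the app really uses, and only the overrides map would ever be persisted.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical settings store: defaults live in code, and overrides are the
// only thing that would be saved per user.
class SettingsStore {
 public:
  explicit SettingsStore(std::map<std::string, std::string> defaults)
      : defaults_(std::move(defaults)) {}

  // An explicit choice is recorded even when it equals the current default.
  void SetOverride(const std::string& name, const std::string& value) {
    overrides_[name] = value;
  }

  // The "use default" button: forget the user's choice for one setting.
  void UseDefault(const std::string& name) { overrides_.erase(name); }

  // "Repair": delete every override and start again from a clean state.
  void ResetAll() { overrides_.clear(); }

  // Servicing: an update ships a new in-code default.
  void ShipNewDefault(const std::string& name, const std::string& value) {
    defaults_[name] = value;
  }

  // Two levels only: the override if present, otherwise the default.
  // An unknown setting name yields an empty string in this sketch.
  std::string Get(const std::string& name) const {
    auto o = overrides_.find(name);
    if (o != overrides_.end()) return o->second;
    auto d = defaults_.find(name);
    if (d != defaults_.end()) return d->second;
    return std::string();
  }

 private:
  std::map<std::string, std::string> defaults_;
  std::map<std::string, std::string> overrides_;
};
```

Because the user's "blue" is stored as an override, shipping "white" as the new default leaves her choice alone. Only UseDefault or ResetAll brings her back onto the shipped default.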

Every interesting app has some form of configuration the user or the environment can supply it.  Well-designed apps work correctly even when there are no configurations provided at all.

Rude Q&A:

1.       "But I don't want to recompile to change my defaults!"  Have your default values in a text file in your app deployment unit.  Managed code provides an app configuration model based on yourexe.exe.config, which is reasonably OK.  Others use a .txt file structured like an INF.

2.       "But I can't possibly change all my code to use an override system!"  Have you checked?  I bet there's one or two interesting functions in your code that read those settings that you could change.  If there are many functions that all do configuration checks, refactor them into a single point of access where you can do this.

3.       "But having defaults in code means image bloat!" A dword in code is at most 4 bytes in the const data section, less so if the optimizer inlines uses of it.  A reg_dword is those four bytes plus the size of the registry value, the key that references it, the parent key of that key, and so on.  You're trading a small fixed overhead for a larger unknown overhead.

Note: this is orthogonal to registering with the system (such as creating services, registering COM goo, etc.).  Apps that integrate with the system do need an "announce" step (call it 'install' if you prefer).  The announcement-management should be owned, debugged, and tested by a central team.  Windows, CE, and others all do this wrong today for legacy reasons.  MSI tries to make this better by requiring you to register COM objects through their tables, but it's a stopgap because they still allow calling into DllRegisterServer.  One can imagine a system where a data-driven “announcement” mechanism is the only way to get integrated with the system, and the announcing application gets to run no actual code of its own.

Posted by jonwis | 0 Comments

Deleting from the WinSxS directory

In Windows Vista, the directory %windir%\WinSxS has much stronger protection on it than it did in Windows XP and Server 2003.  The owner/group is now a SID named "Trusted Installer", a service SID used to start the TrustedInstaller service.  Users other than the trusted installer are granted only generic-read/generic-execute by default.  This increased protection ensures that only the trusted installer service is allowed to modify the servicing-related metadata and files.  If a limited user could modify a file in the directory, for example, they could convince the servicing stack to overwrite one binary with another when the next administrator comes along to enable the Games for Windows package.

Content is added to this directory in response to installing applications, enabling packages in the add-remove-programs UI, and installing Windows Out-of-Band releases.  Content is removed from this directory as a result of uninstall + scavenging - a topic for another time.  One important note - uninstalling your application or Windows app will not necessarily remove the physical bits from the system.  The servicing stack marks the bits as unusable and prevents their use through "normal" means.  Files and directories will be removed over time as the servicing system cleans up after itself.  Administrators should not, for any reason, take it upon themselves to clean out the directory - doing so may prevent Windows Update and MSI from functioning properly afterwards.  Preventing accidental deletion from the directory is accomplished by putting a strong security descriptor on the directory that inherits to its children.

The directory itself is marked with the SDDL:

  • O:S-1-5-80-956008885-3418522649-1831038044-1853292631-2271478464
  • G:S-1-5-80-956008885-3418522649-1831038044-1853292631-2271478464
  • D:PAI
    • (A;OICI;FA;;;S-1-5-80-956008885-3418522649-1831038044-1853292631-2271478464)
    • (A;OICI;0x1200a9;;;BA)
    • (A;OICI;0x1200a9;;;SY)
    • (A;OICI;0x1200a9;;;BU)

That long SID (S-1-5-80-956008885-3418522649-1831038044-1853292631-2271478464) is the service SID for Trusted Installer.  By parts, that SDDL means "the owner and group are Trusted Installer; there is a protected and auto-inherit DACL; the ACEs of that DACL grant the trusted installer full access; the ACEs of the DACL also grant builtin-administrators, local system, and builtin users generic-read and generic-execute access (the 0x1200a9 mask); each ACE is also automatically inherited by child containers and child objects."
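
To see why 0x1200a9 amounts to "read and execute, nothing else", the mask can be taken apart bit by bit. The sketch below is portable C++ and restates the file access-right values from the Windows SDK header winnt.h under local names (the k-prefixed constants and the two helper functions are names made up for this example, not SDK identifiers).

```cpp
#include <cstdint>

// File access-right bits, with the values winnt.h gives them.
constexpr std::uint32_t kFileReadData        = 0x000001;
constexpr std::uint32_t kFileWriteData       = 0x000002;  // add file, on a directory
constexpr std::uint32_t kFileAppendData      = 0x000004;  // add subdirectory, on a directory
constexpr std::uint32_t kFileReadEa          = 0x000008;
constexpr std::uint32_t kFileWriteEa         = 0x000010;
constexpr std::uint32_t kFileExecute         = 0x000020;  // traverse, on a directory
constexpr std::uint32_t kFileDeleteChild     = 0x000040;
constexpr std::uint32_t kFileReadAttributes  = 0x000080;
constexpr std::uint32_t kFileWriteAttributes = 0x000100;
constexpr std::uint32_t kDelete              = 0x010000;
constexpr std::uint32_t kReadControl         = 0x020000;
constexpr std::uint32_t kSynchronize         = 0x100000;

// FILE_GENERIC_READ (0x120089) and FILE_GENERIC_EXECUTE (0x1200A0).
constexpr std::uint32_t kFileGenericRead =
    kReadControl | kFileReadData | kFileReadAttributes | kFileReadEa | kSynchronize;
constexpr std::uint32_t kFileGenericExecute =
    kReadControl | kFileReadAttributes | kFileExecute | kSynchronize;

// Every right that would let a caller create, change, or delete content.
constexpr std::uint32_t kMutatingRights =
    kFileWriteData | kFileAppendData | kFileWriteEa | kFileWriteAttributes |
    kFileDeleteChild | kDelete;

// The mask granted to BA, SY, and BU in the SDDL above.
constexpr std::uint32_t kWinSxSAceMask = 0x1200a9;

constexpr bool GrantsAll(std::uint32_t mask, std::uint32_t rights) {
  return (mask & rights) == rights;
}
constexpr bool GrantsAny(std::uint32_t mask, std::uint32_t rights) {
  return (mask & rights) != 0;
}

static_assert((kFileGenericRead | kFileGenericExecute) == kWinSxSAceMask,
              "0x1200a9 is exactly generic-read plus generic-execute");
static_assert(!GrantsAny(kWinSxSAceMask, kMutatingRights),
              "no write, create, or delete right is granted");
```

Neither the delete bit nor any of the add-file, add-subdirectory, or delete-child bits is set, so an administrator's token gets no way to create or remove anything through these ACEs.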

So, the security descriptor on the object will prevent even administrators from creating or deleting objects underneath %windir%\winsxs, which is why even an elevated administrator cannot (by default) delete content.

Elevated administrators do have (by default) the SeTakeOwnership privilege enabled.  Such a token could take ownership of anything under %windir%\winsxs.  Once the owner, the administrator could then reset the SDDL on the object, granting themselves full access to the object.  Of course, the generic LUA administrator token does not allow enabling the "take ownership" privilege, so this is only possible after answering the LUA elevation prompt successfully.

I leave it to the intrepid user to figure out the correct combination of "takeown.exe", "icacls.exe", and "rmdir" to actually remove content.

On-demand multithreaded critical section creation

A question came up on an internal newsgroup recently - "How do I do on-demand initialization of critical sections in a multithread-aware library?"  The asker didn't have an explicit Initialize function in which his critical section could be created, and instead wanted to know what the right approach was for creating one on demand. Below I provide a sample of how this could be achieved.  Next time, I'll have a "clarification" of the sample that has a better debugging profile.

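#include <windows.h>

// This structure is only a holder for a pointer for now.
typedef struct _ONETIME_CS {
  // The pointer itself is what gets swapped atomically, so the pointer
  // (not the CRITICAL_SECTION it points at) is the volatile thing.
  CRITICAL_SECTION * volatile Cs;
} ONETIME_CS, *PONETIME_CS;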

BOOL InitOnetimeCs(PONETIME_CS pCs) {
  MemoryBarrier();
  if (pCs->Cs == NULL) {
    CRITICAL_SECTION *NewCs;
    NewCs = (CRITICAL_SECTION *)HeapAlloc(GetProcessHeap(), 0, sizeof(*NewCs));
    if (NewCs == NULL) {
      SetLastError(ERROR_OUTOFMEMORY);
      return FALSE;
    }
    // InitCs and DeleteCs fail by raising SEH exceptions. When
    // those fly by, make sure to free the heap. The loser of
    // the race deletes the losing CS, and the __finally ensures
    // that it gets freed. The winner sees 'null' from the ICEP,
    // all others get what the winner put there.
    __try {
      PVOID OldCs;
      InitializeCriticalSection(NewCs);
      OldCs = InterlockedCompareExchangePointer(
        (volatile PVOID *)&pCs->Cs, NewCs, NULL);
      if (OldCs == NULL) {
        NewCs = NULL;
      } else {
        DeleteCriticalSection(NewCs);
      }
    } __finally {
      if (NewCs != NULL)
        HeapFree(GetProcessHeap(), 0, (PVOID)NewCs);
    }
  }
  MemoryBarrier();
  return (pCs->Cs != NULL);
}

BOOL EnterOnetimeCs(PONETIME_CS pCs) {
  // Only take the init path when no critical section exists yet
  if (pCs->Cs == NULL) {
    if (!InitOnetimeCs(pCs))
      return FALSE;
  }
  EnterCriticalSection(pCs->Cs);
  return TRUE;
}

VOID LeaveOnetimeCs(PONETIME_CS pCs) {
  LeaveCriticalSection(pCs->Cs);
}

VOID DeleteOnetimeCs(PONETIME_CS pCs) {
  PVOID OldCs;
  // Delete can be multithread-aware as well, so use IEP to
  // atomically swap out the value with NULL.  The winner gets
  // a non-null value back, losers get NULL.
  OldCs = InterlockedExchangePointer((PVOID volatile *)&pCs->Cs, NULL);
  if (OldCs != NULL) {
    DeleteCriticalSection((CRITICAL_SECTION*)OldCs);
    HeapFree(GetProcessHeap(), 0, OldCs);
  }
}

ONETIME_CS g_Cs = { NULL };
DWORD WINAPI MyThreadProc(PVOID) {
  if (!EnterOnetimeCs(&g_Cs))
    RaiseException(ERROR_INTERNAL_ERROR, 0, 0, NULL);
  // do something
  LeaveOnetimeCs(&g_Cs);
  return 0;
}

This is just another example of "racy initialization" using InterlockedCompareExchangePointer. Some "racy init" patterns are dangerous - especially those that involve "read after write" of the data that's been initialized. That's what the memory barrier is there for - it guarantees (expensively) that any writes InitializeCriticalSection might have performed will be written back before the following call to EnterCriticalSection.
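
For readers who want to try the pattern off Windows, here is a rough portable analogue. It is a swapped-in technique rather than the code above: std::atomic's compare_exchange_strong plays the part of InterlockedCompareExchangePointer, acquire/release ordering replaces the explicit MemoryBarrier calls, and std::mutex replaces the critical section. OnetimeMutex, GetOrCreate, and Destroy are names invented for this sketch.

```cpp
#include <atomic>
#include <mutex>

// Portable stand-in for ONETIME_CS: a holder for a lazily created mutex.
struct OnetimeMutex {
  std::atomic<std::mutex*> ptr{nullptr};
};

// Returns the one shared mutex, creating it on first use. Several threads may
// each allocate a candidate; exactly one compare-exchange succeeds, and every
// loser deletes its own candidate and uses the winner's.
inline std::mutex& GetOrCreate(OnetimeMutex& holder) {
  std::mutex* existing = holder.ptr.load(std::memory_order_acquire);
  if (existing != nullptr) return *existing;

  std::mutex* candidate = new std::mutex();
  std::mutex* expected = nullptr;
  if (holder.ptr.compare_exchange_strong(expected, candidate,
                                         std::memory_order_acq_rel,
                                         std::memory_order_acquire)) {
    return *candidate;  // we won the race
  }
  delete candidate;     // we lost; 'expected' now holds the winner's pointer
  return *expected;
}

// Counterpart of DeleteOnetimeCs: atomically swap the pointer out so that
// only one caller ever frees it. Deleting a null pointer is a no-op.
inline void Destroy(OnetimeMutex& holder) {
  delete holder.ptr.exchange(nullptr, std::memory_order_acq_rel);
}
```

A caller would write something like std::lock_guard<std::mutex> lock(GetOrCreate(g_holder)); around its protected work. As with the original, contention can briefly create extra objects that are thrown away.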

This method is cheap, but it does require extra heap and may create a few critical sections when there is contention. Luckily, the cost in CS creation is about zero in Vista - entering the CS when there is contention is where the real cost lies. Certain analysis methods are made harder - when the app verifier prints the address of a busted CS, "ln thataddress" won't tell you which symbol contained the critical section.

Next time, a method that avoids spurious CS creation, allows for "ln thataddress" on the CS address, and has zero heap cost!

DLLs and resource ID 2 manifests

Manifests at resource ID 2 help simplify the lives of DLL authors who want to consume side-by-side components via static imports. Just before processing your DLL's static imports and calling its entrypoint, the loader will create and activate an activation context based on the manifest that it finds at resource ID 2. This ensures that all your static imports are processed in the correct context - if this weren't the case, binding your DLL's static imports would be affected by whoever was calling LoadLibrary("yourdll.dll"). Not a big deal, until you find out that the caller already had a different DLL loaded, also called "zop.dll", with an export "Flarn" whose parameter list looked completely different. The autoactivation ensures that your DLL is bound to the right "zop.dll" with the right "Flarn" export (at least subject to publisher policy.)

The logic (not the implementation) looks something like the following:

  1. LoadLibrary("yourdll.dll") is called by the application
  2. "yourdll.dll" is found using the current context plus the usual DLL lookup mechanisms.
  3. When found, if yourdll.dll is not already loaded into this context, the DLL is mapped.
  4. CreateActCtx is called using the mapped DLL's HMODULE, passing resource ID MAKEINTRESOURCE(2) as the resource number.
  5. If that succeeded, the resulting context is activated.
  6. The static imports of "yourdll.dll" are resolved, possibly recursively executing these operations until all the statically referenced DLLs have been loaded.
  7. The DLL entrypoint for "yourdll.dll" is invoked.
  8. If a context was activated above, it's deactivated.
  9. Control returns to the application with an appropriate HMODULE

You can use any manifest you care to in your resource section. Visual Studio has options for embedding manifests at the right resource ID. Nikola has some tips for doing so in both makefiles and the VS IDE.

Publisher policy is of course applied, but application policy (ie: foo.exe.policy) is not. The usual "look in the app directory for private assemblies" logic applies as well, so this works even if your DLL is distributed privately with an application.  See also Using Side-By-Side Assemblies as a Resource, which covers when various resource values will be used during the loading and initialization of DLLs and executables.

C++ object for activating and Deactivating Contexts

As explained earlier, the activation context system is implemented as a per-thread (or per-fiber) stack. Activating a context pushes the context onto the stack, deactivating it pops it from the stack. To ensure that the same sequence of pushes and pops is performed, each ActivateActCtx call gets a cookie - an opaque value that identifies the activation request. That cookie is then passed to the DeactivateActCtx call, and the system verifies that it matches the context at the top of the stack. If the cookie you ask to deactivate is not the top of the stack, the Win32 SEH exception STATUS_SXS_EARLY_DEACTIVATION is raised; if the cookie is not in the current thread's stack at all, STATUS_SXS_INVALID_DEACTIVATION is raised. These errors are extremely hard to track down.
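
The cookie rules are easier to see in a toy model. The following portable C++ is emphatically not the Windows implementation: ToyActivationStack is a made-up class, contexts are plain integers, and a C++ exception stands in for each SEH status code. It only mirrors the push, pop, and cookie-check behavior described above.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy model of one thread's activation stack.
class ToyActivationStack {
 public:
  using Cookie = std::uintptr_t;

  // Push a context and hand back an opaque cookie for this activation.
  Cookie Activate(int contextId) {
    Cookie cookie = nextCookie_++;
    frames_.push_back(Frame{cookie, contextId});
    return cookie;
  }

  // Pop only when the cookie names the activation at the top of the stack.
  void Deactivate(Cookie cookie) {
    if (!frames_.empty() && frames_.back().cookie == cookie) {
      frames_.pop_back();
      return;
    }
    for (const Frame& f : frames_) {
      if (f.cookie == cookie)  // on the stack, but not on top
        throw std::logic_error("early deactivation");
    }
    // Not on this stack at all.
    throw std::logic_error("invalid deactivation");
  }

  // Context currently in effect, or -1 when nothing is active.
  int Current() const {
    return frames_.empty() ? -1 : frames_.back().contextId;
  }

 private:
  struct Frame {
    Cookie cookie;
    int contextId;
  };
  std::vector<Frame> frames_;
  Cookie nextCookie_ = 1;
};
```

Deactivating an outer cookie while an inner activation is still live hits the "early" case, and reusing a cookie that was already popped hits the "invalid" case. Both are ordering mistakes made somewhere else in the program, which is why the real errors are so hard to trace.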

Being a good citizen of activation contexts can be done with a simple rule - perform both the activation and deactivation in the same function. Here's a sample C++ object that does this for you. Be sure to compile with asynchronous exception unwinding - the Visual C++ compiler option is /EHa. If you're using some other compiler, look for an option that calls destructors as both C++ and SEH exceptions flow past your frame and forces the compiler to assume any individual instruction may cause an SEH exception to be raised.

class CActCtxActivator {
  HANDLE m_hContext;
  ULONG_PTR m_Cookie;
  CActCtxActivator(const CActCtxActivator &);
  void operator=(const CActCtxActivator &);
public:
  // Addref the context to avoid someone dereffing it under you
  CActCtxActivator(HANDLE hContext) {
    m_Cookie = 0;
    m_hContext = hContext;
    if (m_hContext != INVALID_HANDLE_VALUE)
      AddRefActCtx(m_hContext);
  }
  // Deactivate if it was active, deref if we had a real one
  ~CActCtxActivator() {
    Deactivate();
    if (m_hContext != INVALID_HANDLE_VALUE) {
      HANDLE hContext = m_hContext;
      m_hContext = INVALID_HANDLE_VALUE;
      ReleaseActCtx(hContext);
    }
  }
  // Activate the context. Feel free to use "throw" if you're a C++ EH kind of
  // person.  Double activation is a programmer's error - this class is not
  // reentrant
  bool Activate() {
    if (m_Cookie != 0)
      RaiseException(ERROR_INTERNAL_ERROR, 0, 0, 0);
    return ActivateActCtx(m_hContext, &m_Cookie) ? true : false;
  }
  // Deactivate if activated - somewhat asymmetric behavior with Activate, as
  // it doesn't fail when the context isn't active.
  void Deactivate() {
    if (m_Cookie != 0) {
      ULONG_PTR Cookie = m_Cookie;
      m_Cookie = 0;
      DeactivateActCtx(0, Cookie);
    }
  }
};

So now, you have the ability to control your activation and deactivation automagically with an instance and a global context handle. Assuming that context handle is called g_hActCtx, you can use this in the following way:

void Foo() {
  CActCtxActivator Activator(g_hActCtx);
  if (Activator.Activate()) {
    if (Bar()) {
      if (!Zot())
        return;
    }
  }
}

I'll soon use the above object to deal with calling out to plugins to avoid polluting their contexts (the other direction of being a good citizen.) As with any code posted here, it's provided as-is without warranty for any purpose.

[Edit 1/17/2006: I knew this felt familiar; my memory is hazy, but I vaguely recall writing this up before many moons ago...]

Fixing Activation Context Pollution

As the number of apps in the world that use side-by-side activation (as a result of depending on the new Visual C++ Runtime v8.0) increases, providers of callable code (libraries, control packs, whatever) may start seeing odd and potentially unexpected behavior. Typically it's hard to diagnose. Somewhere deep inside your publicly exposed surface area, a LoadLibrary(gdiplus.dll) fails, or CreateWindow(ComCtlv6WindowClass) fails. You might have fallen victim to context pollution inadvertently caused by your caller. There is hope, however, in that you can control your own destiny even in the face of your callers' context.

The activation context stack is as it sounds, a per thread stack (on Windows Server 2003 it's per thread or per fiber) of contexts that have been activated. When the process is launched, an initial context may be created and pushed into the process as the process default context. When thread A creates thread B, the currently-active context for A will flow onto B as the top of B's thread context stack. Sometimes, this flow of contexts works in your favor. The windowing system captures the current stack top along with a SendMessage/PostMessage, then ensures that the context is activated before it invokes the target window procedure. Similarly, sending an APC or making a cross-apartment COM call both capture the stack top and push it onto the stack of whatever thread ends up servicing the request before calling whatever code is to be invoked. The important point is that the stack top at the target is the same as the stack top at the call site. When whatever target code was invoked returns, the top of the stack is popped so that the thread returns to the state before the call was performed.

This automatic flow of contexts ensures that the call executes as if it was running in the same context as when the call was initiated. It only covers a small subset of the possible ways code can be invoked, and it certainly doesn't help the usual case - the "call" instruction generated by the compiler. Unless you (the provider of a library) take special action, your library code will execute using the current context stack top - whether that's what you wanted or not!

Let's go back to our example to see what this might mean. Assume your library depends on some other side-by-side components that use registration-free COM activation - maybe RTC as an example. When control transfers into your function, the first thing you do is CoCreateInstance(CLSID_SomeRTCObject). Your tests work - you authored "testharness.exe.manifest" which contains a reference to Microsoft.Windows.Networking.RtcDll, so that the process default context created at process launch always contains component registration information for CLSID_SomeRTCObject. Your clients are writing a rendering engine on top of your functionality that uses GDI+, and complain that your function always fails with "class not registered." Doing a little debugging, it turns out that CoCreateInstance is returning that error.

Your library has just fallen prey to context pollution. The rendering client executable had its own "client.exe.manifest", but it only referred to the GDI+ assembly that it knew it needed. The context it has on the stack when it calls into your code does not have the RtcDll assembly in it, so it can't have any of the COM registration information present either. One immediate fix is to demand that client executables always reference the RtcDll assembly as well, so that the activation information will be present at the time of your CoCreateInstance call. Fortunately, the client balks at this - their app has worked forever, at least until they started using your communication DLL.

The way out of this mess is to be Master Of Your Own Destiny. Don't like the context you're handed? Create and activate another one! Creating an activation context is easy, if you've burned your manifest into your resource section (a topic for another day). Activating a context is easy. Deactivating a context is easy. The only "hard" part is remembering to activate & deactivate in the right places and at the right times.

Assume that you have a finely crafted manifest that refers to the correct set of components. You've gone through the right steps, and this manifest is now baked into your DLL's resource section with type RT_MANIFEST at resource ID 1000. To create a new activation context from this manifest, use CreateActCtx:

HANDLE hMyActCtx = INVALID_HANDLE_VALUE;
ACTCTX Request = { sizeof(Request) };
Request.Flags = ACTCTX_FLAG_HMODULE_VALID | ACTCTX_FLAG_RESOURCE_NAME_VALID;
Request.hModule = g_hMyModuleHandle;
Request.lpResourceName = MAKEINTRESOURCE(1000);
hMyActCtx = CreateActCtx(&Request);

On success, hMyActCtx will be something other than INVALID_HANDLE_VALUE. Stash it somewhere long-lived (the global g_hMyActCtx below), and you can now use it with ActivateActCtx and DeactivateActCtx:

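HRESULT MyPublicFunction(...) {
  ULONG_PTR ulpCookie = 0;
  IUnknown *pUnk = NULL;
  HRESULT hr;
  // Bail out if our own context can't be activated
  if (!ActivateActCtx(g_hMyActCtx, &ulpCookie))
    return HRESULT_FROM_WIN32(GetLastError());
  hr = CoCreateInstance(CLSID_SomeRTCObject, NULL, CLSCTX_INPROC_SERVER,
                        IID_IUnknown, (void **)&pUnk);
  // (use pUnk here, and Release it when done)
  DeactivateActCtx(0, ulpCookie);
  return hr;
}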

Now, when CoCreateInstance looks at the active context, it'll see the one you activated rather than the one your client had activated. Pollution solved - you now control your own destiny. Go nuts - have different manifests for different top-level entrypoints. Have one mondo context that contains everything you'll need. No matter what context your caller might have stuck you with, you have complete control over the context you decide to bind code with.

Next time - making your life easier with C++ RAII around activation and deactivation; "racy initialization" of the activation context, and more.

What's that awful directory name under Windows\WinSxS?

Now that the Visual C++ Runtime version 8.0 is a side-by-side component, you may have seen what looks like an unreasonably complexly named path from which parts of the CRT are loaded. "Golly, what can they possibly be thinking - creating a directory whose name is full of underscores, numbers, and dots?" The good news is, we definitely were thinking something. You may or may not have ever peeked into the %windir%\winsxs directory on your system. If you haven't, now would be a good time. First thing you'll notice is that there are a lot of those funkily-named directories. You might further notice that there seem to be several that differ only by what looks like a version number and some random-looking eight characters on the end of the name. Next you might see that some of them differ only by the second-to-last stringish thing. Lastly, note that mostly, the strings can be deciphered with a little help.

In the component world, each component has what's called an identity. This is the unique name of the component, generated by the component author and referred-to in manifests and user interfaces. No two components have exactly the same set of properties; if they did, even if the file contents were different, they would be considered the same component. (Note: the CLR abuses this rule and often ships new bits under old identities - that's outside the scope of today's missive.) There's a whole set of rules around how identities work, which I may get to at some point. The key thing is that identities are basically property bags of string triplets - namespace, name, and value - for each attribute. Those attributes in the bag without a namespace are called well known attributes, and there are only a few of those (only Microsoft is currently allowed to define new ones...). Further, certain well-known attributes have rules around their values - the version attribute has to be a dotted-quad version, the public key token attribute has to be a string of hex digits of nonzero but even length. Other well-known attributes like name can be whatever you like - "Foo:Bar:Bas" is ok, as is just "Q".

Each shared component (in the winsxs directory) gets its own directory into which its payload bits are placed. Somehow, we have to generate (mostly) unique & repeatable directory names for this purpose. The requirements of directory names are reasonably simple - can't overall be more than MAX_PATH (260) characters, can't contain certain characters, etc. Given the naming requirement, it was impossible to use the entire identity as the name of the directory, as someone could name their component "foo\bar" and mess things up. With the extensibility requirement for identities themselves, we couldn't possibly use the entire identity, as the set of tuples would end up being far longer than MAX_PATH. Most importantly, we wanted the directory names to be readable to your average administrator or PSS representative. Finally, generation of the keyform from an identity had to be fast.

So, Mike Grier came up with the idea for a key form of identities. This key form would be a reasonably-unique one-way noncryptographic readable representation of the major defining attributes of an identity. What he ended up with was the following:

proc-arch_name_public-key-token_version_culture_hash

Each field (except for the hash) is replaced by the value from the identity for its respective property. If the property was unset, then "none" was put in its place. In the identity model, name, processor architecture, and culture are allowed to have very laxly-validated contents, so they may contain "unfriendly" characters that have to be filtered. Characters not in the group "A-Za-z0-9.\-_" are removed from the attribute value before being written into the string. Certain attributes have upper limits placed on their values (name is limited to 64 characters, processor architecture to 8, culture to 12), achieved by dropping characters from the middle of the filtered string and replacing them with "..". Finally, the whole string is lower-cased using a clone of the Unicode casing table that shipped in Windows XP RTM.
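
The filtering and shortening rules can be approximated in portable C++ as below. Treat this as a guess at the shape of the algorithm, not the real thing, which is undocumented. The assumptions are: an empty string stands for "unset"; the ".." counts toward the length limit; the surviving characters are split as evenly as possible between head and tail; ASCII lower-casing stands in for the XP casing table; and the trailing hash is left out entirely. ToyIdentity, KeyFormPart, and KeyFormWithoutHash are names invented for this sketch.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Keep only characters from the set A-Z a-z 0-9 . - _
inline bool IsKeyFormChar(unsigned char c) {
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
         (c >= '0' && c <= '9') || c == '.' || c == '-' || c == '_';
}

// Filters, shortens, and lower-cases one attribute value.
inline std::string KeyFormPart(const std::string& value, std::size_t maxLen) {
  if (value.empty()) return "none";  // assumption: empty means "unset"
  std::string kept;
  for (unsigned char c : value) {
    if (IsKeyFormChar(c)) kept.push_back(static_cast<char>(std::tolower(c)));
  }
  if (kept.size() > maxLen && maxLen > 2) {
    // Drop characters from the middle and mark the gap with "..".
    std::size_t head = (maxLen - 2 + 1) / 2;
    std::size_t tail = (maxLen - 2) - head;
    kept = kept.substr(0, head) + ".." + kept.substr(kept.size() - tail);
  }
  return kept;
}

// Hypothetical holder for the five well-known attributes in the key form.
struct ToyIdentity {
  std::string procArch, name, publicKeyToken, version, culture;
};

// Builds everything except the trailing _hash.
inline std::string KeyFormWithoutHash(const ToyIdentity& id) {
  const std::size_t kNoLimit = static_cast<std::size_t>(-1);
  return KeyFormPart(id.procArch, 8) + "_" +
         KeyFormPart(id.name, 64) + "_" +
         KeyFormPart(id.publicKeyToken, kNoLimit) + "_" +
         KeyFormPart(id.version, kNoLimit) + "_" +
         KeyFormPart(id.culture, 12);
}
```

With these rules, the names "Foo!" and "Foo?" both come out as "foo". The readable part of the string cannot tell them apart on its own.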

Voila! A string representing the identity that's filesystem friendly!

But wait... what about all those characters that got dropped? Couldn't I construct an identity whose keyform matched the keyform of another? Yes, if it weren't for the _hash value on the end of the keyform. This hash (not in the cryptographic sense) is of all the namespaces, names and values of properties in the identity. Anything that didn't appear in the keyform text would have been represented in this hash. The two identities whose names are "Foo!" and "Foo?" will generate different keyforms - while the ! and ? were dropped from the keyform, they still appear in the hash. A coworker did some experiments and determined that while it was possible to reverse the hash generation function, it would involve a ridiculous amount of work.

The algorithm for generating the keyform overall (especially the hash) is undocumented at this time. Not because we're trying for security through obscurity, but because the keyform is merely an implementation artifact at this point. Maybe someday we'll lose our heads entirely and store the component payloads in a database of some sort, in a compound storage document, CAB file, whatever.  Also, the algorithm has changed for Vista, and no "normal" use cases for knowing the algorithm exist. If you're trying to find files in the WinSxS directory, you should be using the CreateActCtx/ActivateActCtx/SearchPath set of APIs. If you're trying to write files into the WinSxS directory, you should be using MSI which knows about installing components into the right places. If you're writing your own binder, don't - it's really hard to get right. It should be sufficient to say "the generation of this string is opaque and must be assumed to change."

But wait... what stops Evil Bob from creating a component with the exact same name and overwriting what Nice Jill had already shipped? Suffice it to say, that public key token has something to do with it - I'll explain that next time, when I talk about signing catalogs for side-by-side components.

Making an MSM from whole cloth – Series Intro

Not too long ago, I was asked to provide a set of merge modules for an external team. The restriction was that they could accept an exe whose output was the merge module in question, and it had to generate stable – based on the content – component IDs. I’m being cagy because the team in question hasn’t hit RTM yet and I don’t want to steal their thunder.

My first inclination was to use the excellent WiX tools. I’d have my tool cobble up a .wxs file and CreateProcess on both "light" and "candle", creating the "all in one" tool experience the team wanted. Here I was, coding along, when the team in question came back and said "Sorry, we don’t think we can support using WiX in our build process either directly or indirectly." Enter roadblock number one. I’d never used the MSI APIs to generate modules, let alone interact with them! Several hours and hundreds of MSDN links later, I was headed down a path of hand-creating the entire merge module from scratch – cabs and all.

The most frequent criticism I have about our documentation is that while it provides small use cases for individual APIs, it’s rare (outside the Crypto docs, which are excellent) to find an end-to-end all-encompassing demo for complex use cases. All the items I found were disjoint micro-samples about how to read and query MSI databases and what - at a broad level – a merge module should contain. So, almost nine months, four intensive test cycles, three frustrated PMs, and two major design changes later, it’s done. I plan to share my long strange odyssey through MSI-land in a series of posts over the next month – going from "what tables do I need?" to "what do you mean you want to merge the cabs?"

And yes, this time the entries will get written and posted, as I’ve scheduled myself an hour a day this week and next to write installments.

Designing Data Structures for Longevity

One of the more problematic areas of long-lived software is the versioning and updating of shared structures. As improvements come to a package, it's inevitable that you'll have to add more fields to a structure. Your structure today may contain two pointers, but tomorrow needs to contain three. If you ship all the bits at the same time, then there's never any problem - everything just gets recompiled and the right sizes are reserved on various stacks, and accesses to those fields don't accidentally overlap existing slots. If you don't ship all at the same time, or you persist data structures out to disk, you're eventually going to run into a problem. Here's an example:

struct Foo {
    void *pvSomething;
    void *pvSomethingElse;
};
void Bar(const Foo *pcFoo);
void Zot() {
    Foo f = { NULL, NULL };
    Bar(&f);
}

In this v0 of the structure layout, the compiler reserved two pointer-sized things on the stack, and initialized them both to zero. Let's further say that Bar and Zot live in different modules. Now, assume that something comes along and you decide to add one more pointer-sized thing to your structure. After rebuilding the module containing Bar, you try testing and notice "random" access violations when reading that new field. What's wrong? With a little debugging of Bar, you notice that the new field is a random value when coming from Zot. You've just been bitten by structure versioning!

The problem here is that since you didn't recompile Zot with the new structure header, the compiler's baked-in two-pointers structure is smaller than the three-pointers structure that the new Bar expects. That third pointer, when read from Bar, is really whatever sits just past "f" in Zot's stack frame, oftentimes the return address. How do you solve this problem? What can be done to avoid it in the future?

First off, it's important to recognize that this problem happens with any layout-dependent object, from classes to structures to vtables. If you change the layout of an object in a way that changes its behavior for existing producers/consumers, you must provide a versioning mechanism. In the world of vtables, the typical answer is that new functions are added to the end of a vtable. Existing callers continue to call vtable[x], new callers call vtable[y], and as long as x < y all is good. Structures, on the other hand, don't do as well here. Often, structures are embedded in persisted storage, in an array, or other such places that are hard to version. The case of being passed on the stack (or from heap!) is a simplification of this problem.
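Here's a rough sketch of the append-only vtable rule, assuming a compiler that lays out virtual functions in declaration order with derived-interface slots following the base's (the usual COM-style arrangement, but not something the C++ standard promises). The names IWidget, IWidget2 and Box are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical v1 interface: the two slots old callers were compiled against.
struct IWidget {
    virtual int GetWidth() const = 0;   // slot x
    virtual int GetHeight() const = 0;  // slot x + 1
};

// Hypothetical v2 interface: the new function is appended, never inserted,
// so the slots that v1 callers use keep their positions.
struct IWidget2 : IWidget {
    virtual int GetDepth() const = 0;   // slot y, where y > x
};

// One implementation serves both generations of caller.
struct Box : IWidget2 {
    int GetWidth() const override { return 3; }
    int GetHeight() const override { return 4; }
    int GetDepth() const override { return 5; }
};

// "Old" caller: knows only the v1 slots.
int OldCallerArea(const IWidget *w) {
    assert(w != NULL);
    return w->GetWidth() * w->GetHeight();
}

// "New" caller: may also use the appended slot.
int NewCallerVolume(const IWidget2 *w) {
    assert(w != NULL);
    return w->GetWidth() * w->GetHeight() * w->GetDepth();
}
```

Passing the same Box to both functions works because nothing the old caller depends on has moved.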

Solution 1 - Add a Version field

This is the simplest, but possibly least useful answer over time. In each structure, have the first field be called Version, or something similar. In your shared headers, create #define values starting at 1 and moving forward as you revise the structure. Generators of structure instances specify the version number that they are generating, and consumers of the structure change how they interpret the structure based on the version found.

As an upside, as long as this version field is consistently maintained, you may completely shift the layout of the structure over time - remove some fields, add some fields, retype some, whatever strikes your fancy. The version number makes sure that it's correctly interpreted by the callees of your functions.

The downside to this approach is that sometime, somewhere, you're going to forget to update the version number. Additionally, you'll have to maintain all those old structure definitions you created over time so that consumers can know what "version 3" really looked like. This approach is attractive because of its simplicity, but the fact that it requires manual intervention to stay correct suggests that there's more to be done.
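A minimal sketch of the Version approach might look like the following. FooV1, FooV2, pvThird and BarGetThird are hypothetical names, not anything from a shipping header; the point is only that the consumer reads Version first and then picks the matching definition, which is why every old definition has to stay around.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#define FOO_VERSION_1 1
#define FOO_VERSION_2 2

// Every revision keeps Version as its first member so that a consumer
// can read it before knowing anything else about the layout.
struct FooV1 {
    unsigned long Version;
    void *pvSomething;
    void *pvSomethingElse;
};

struct FooV2 {
    unsigned long Version;
    void *pvSomething;
    void *pvSomethingElse;
    void *pvThird;          // hypothetical member added in version 2
};

// Consumer: returns the third pointer when the producer's version has one,
// and NULL for older or unrecognized versions.
void *BarGetThird(const void *pv) {
    assert(pv != NULL);
    unsigned long version;
    std::memcpy(&version, pv, sizeof(version));  // Version sits at offset 0
    switch (version) {
    case FOO_VERSION_1:
        return NULL;                             // v1 never had the field
    case FOO_VERSION_2:
        return static_cast<const FooV2 *>(pv)->pvThird;
    default:
        return NULL;                             // unknown version: touch nothing
    }
}
```

A producer built against the old header would pass something like FooV1 a = { FOO_VERSION_1, NULL, NULL }; and the consumer never looks past the two pointers it actually supplied.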

Solution 2 - Add a Size field

Aside from explicit versioning of the structure, you can definitely get a hand with this problem from the compiler itself. To do this, add a field called Size to the top of all structures whose size may shift over time. Initializations of these structures must also change somewhat, as they must also specify the size. You can do this as such:

struct Foo {
    unsigned long Size;
    void *pvSomething;
    void *pvSomethingElse;
};
void Zot() {
    Foo f = { sizeof(f) };
}

This has the lovely side-effect of zero-initializing the remaining members of the Foo instance. The callee changes as such, using the included handy macros:

#define FIELD_OFFSET(TType, Field) ((size_t)(&(((TType*)NULL)->Field)))
#define FIELD_SIZE(TType, Field) (sizeof(((TType*)NULL)->Field))
#define SIZE_THROUGH_FIELD(TType, Field) (FIELD_OFFSET(TType, Field) + FIELD_SIZE(TType, Field))
void Bar(Foo *pf) {
    /* Touch a member only if the caller's structure is big enough to contain it */
    if (pf->Size >= SIZE_THROUGH_FIELD(Foo, pvSomething)) {
        /* Use pf->pvSomething */
    }
    if (pf->Size >= SIZE_THROUGH_FIELD(Foo, pvSomethingElse)) {
        /* Use pf->pvSomethingElse */
    }
}

So when the Foo structure gets another two pointers, the "old" code in Zot continues to set the size to what it knew about (which on 32-bit machines would have been 12), and the "new" code in Bar (which contains extra uses of SIZE_THROUGH_FIELD) does the Right Thing when it detects that the structure passed is only 12 bytes long rather than the 16 or 20 bytes that it might need. In this manner, Bar is able to adjust its behavior when it finds old clients, and even newer clients could behave better if they only have to set certain members.
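To make that concrete, here's a sketch of the "new" Bar side after one pointer has been appended. pvThird, kOldFooSize and BarGetThird are hypothetical names, and SIZE_THROUGH is just SIZE_THROUGH_FIELD rewritten on top of the standard offsetof macro. An old caller is simulated by putting the old size into the Size field rather than by actually compiling against an old header.

```cpp
#include <cassert>
#include <cstddef>

// Bytes from the start of TType through the end of Field.
#define SIZE_THROUGH(TType, Field) \
    (offsetof(TType, Field) + sizeof(((TType *)0)->Field))

// The "new" layout: pvThird was appended after the first release.
struct Foo {
    unsigned long Size;
    void *pvSomething;
    void *pvSomethingElse;
    void *pvThird;          // hypothetical new member
};

// What sizeof(Foo) would have been for a caller built before pvThird existed
// (this layout has no trailing padding, so the two values agree).
static const std::size_t kOldFooSize = SIZE_THROUGH(Foo, pvSomethingElse);

// New consumer: reads pvThird only when the caller's structure reaches it.
void *BarGetThird(const Foo *pf) {
    assert(pf != NULL);
    if (pf->Size >= SIZE_THROUGH(Foo, pvThird)) {
        return pf->pvThird;
    }
    return NULL;  // old caller: whatever lies past its structure is not ours
}
```

A current caller writes Foo f = { sizeof(f) }; and gets its pvThird honored, while a structure whose Size is only kOldFooSize makes BarGetThird return NULL no matter what bytes happen to follow it.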

Solution 3 - Use a Flags field

As a hybrid of #1 and #2, adding a field containing flags is another good answer, but it doesn't directly answer the problem of size and layout changes over time. As you add members to a structure, add a #define for the member's validity. When that member is set and available, OR its flag into the Flags member. Consumers of the structure must look at the Flags member before inspecting other members of the structure. As "old" generators won't know about new flags, and don't set those new fields, you're guaranteed that new consumers of old structures won't behave poorly. Here's an example:

#define FOO_FLAGS_PVSOMETHING_VALID (0x00000001)
#define FOO_FLAGS_PVSOMETHINGELSE_VALID (0x00000002)
struct Foo {
    unsigned long Flags;
    void *pvSomething;
    void *pvSomethingElse;
};
void Bar(const Foo *pf);
void Zot() {
    Foo f = { 0 };
    f.pvSomething = &f; /* placeholder for whatever Zot really wants to pass */
    f.Flags |= FOO_FLAGS_PVSOMETHING_VALID;
    Bar(&f);
}
void Bar(const Foo *pf) {
    if (pf->Flags & FOO_FLAGS_PVSOMETHING_VALID) {
        /* Consume pf->pvSomething */
    }
    if (pf->Flags & FOO_FLAGS_PVSOMETHINGELSE_VALID) {
        /* Consume pf->pvSomethingElse */
    }
}

When you add another field to Foo, you also create #define FOO_FLAGS_ANOTHERFIELD_VALID (0x00000004). Bar changes to check that flag before using the AnotherField member. Even though Zot didn't know about AnotherField, since it didn't set that new flag, there's no harm done when an old-sized structure is passed to a new-sized-reading function. The downside with this approach is that you can typically only have 32 fields (one per bit of a 32-bit Flags member). And, if you steal flag values for other purposes (changing the meaning of a member when a second flag is set), you've got even fewer. This method says nothing about the actual layout of the data structure, and requires that new fields are also always added at the end.

Conclusions

Versioning structures is hard, but you must do it if you plan to independently service two components that share an object (structure or otherwise). The easy answer of setting a version member in the structure makes this very easy from the client side, but makes it much harder from the consumer side, where each structure definition must be kept around. Adding a "valid through" size field makes the code obviously tied to the size of the structure, but requires some help from the compiler and some macro magic on the consumer's side to interpret the results. Flags indicating which fields are set and valid are probably the easiest, but they push that "what is valid" work onto the caller - they have to remember to set the correct bit when they want the callee to interpret a field.

What's the optimal solution? Take both #2 and #3 together. Have the first field be the valid size in bytes, because you get that for free from the compiler; saying Foo f = { sizeof(f) }; will just be natural after a while. The second field contains flags, indicating which of the possibly-valid fields should be interpreted. Not all data types have a handy "not valid" value like NULL for pointers. How do you indicate that an int is invalid and should not be read? If you use the Flags approach, you can know this - if the flag for the field isn't set, then don't read it.
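Putting #2 and #3 together might look roughly like this. The Timeout member, its flag, and BarGetTimeout are hypothetical and exist only to show a field with no natural "not valid" value; SIZE_THROUGH is the same size-through-field idea written with the standard offsetof macro.

```cpp
#include <cassert>
#include <cstddef>

// Bytes from the start of TType through the end of Field.
#define SIZE_THROUGH(TType, Field) \
    (offsetof(TType, Field) + sizeof(((TType *)0)->Field))

#define FOO_FLAGS_PVSOMETHING_VALID (0x00000001)
#define FOO_FLAGS_TIMEOUT_VALID     (0x00000002)

struct Foo {
    unsigned long Size;   // how many bytes the producer knew about
    unsigned long Flags;  // which members the producer actually filled in
    void *pvSomething;
    int Timeout;          // hypothetical: zero is a perfectly legal value
};

// Reads Timeout only when the structure is large enough to contain it AND
// the producer said it was valid; otherwise falls back to defaultTimeout.
int BarGetTimeout(const Foo *pf, int defaultTimeout) {
    assert(pf != NULL);
    if (pf->Size < SIZE_THROUGH(Foo, Flags)) {
        return defaultTimeout;  // too small to even carry Flags
    }
    const bool present = pf->Size >= SIZE_THROUGH(Foo, Timeout);
    const bool valid = (pf->Flags & FOO_FLAGS_TIMEOUT_VALID) != 0;
    return (present && valid) ? pf->Timeout : defaultTimeout;
}
```

The Size check protects against reading past an old caller's structure, and the Flags check distinguishes "caller asked for a timeout of zero" from "caller never set it", which a zero-initialized int alone can't tell you.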

Future-proofing your data structures should be done as part of the initial design. Once the first set of compiled bits has walked out the door, it's too late to introduce these mechanisms. Design with serviceability in mind and you won't have to worry about structure-layout skew.

Posted by jonwis | 1 Comments

Registration-free applications and components

An area of new technology in Windows XP and Windows Server 2003 that didn't get nearly enough coverage is the ability to write applications and components that take full advantage of COM without actually registering anything on the target system.  Apps developed with this registration-free mechanism don't require a call to RegSvr32 during install to get their intra-application COM objects set up - no tampering with the registry to get progids listed, no screwing around with INFs and installers - just xcopy (or “net use”) and go.  Isolated applications don't just do COM, however - they also do self-contained xcopy-deployed applications using resources and MUI.  There's an entire book to be written on the topic of side-by-side apps & components that I just don't have time to sit down and pound out given my schedule.  However, I'll be posting a series of “mini chapters” covering this topic - designs, implementations, strategies, etc. Bear with me, as I'm definitely not an author by training.  The first chapter, “Being Isolated” should appear in this space in about a week.

If you happen to be actively developing side-by-side components and applications to take advantage of registration-free COM work (hey, this works from managed code, too...), or you've heard of the topic but want to know more specifics, I'm interested in hearing what areas I should cover.  Do you want code samples?  Airy academic discussion?  Dissection of what CoCreateInstance really does? Something in between those three?  Let me know by posting feedback.

Posted by jonwis | 4 Comments

Intro

Howdy - I'm a software design engineer at Microsoft, in the Windows core technologies group. My work involves isolating applications, components, and the operating system from each other. I don't have much original or interesting to say, but by gum - I'll say it here. Like many of my generation, I grew up using computers. I won't enumerate them all, but the list starts with an Atari 800 XL and hasn't ended yet.

The unfortunate state of the world today is that once an installer exits (either for an operating system or a software package), the resulting state of the world is completely unknown and undefined. Sure, judicious use of regmon and filemon will let you guess at what those installers did, and maybe you'll be able to uninstall. Compound this with the large set of installations required to get a running system, differing versions of shared components, and even more fun around points of extensibility, and you end up with a morass of crap - the only hope is to flatten and reinstall the system.

So what happens when we uninstall? Well, generally, the uninstaller has a list of objects that it installed, and some sort of reference tracking to know when stuff should go away. The uninstaller decrements those references, and whacks the stuff that should be deleted. Too bad the installer for FoobleSoft's Bongoblam III forgot to increment the reference on the shared component ponkfoo.dll - the count hit zero, so the uninstaller does away with it. Practical upshot? Next time you run Bongoblam III, it crashes while binding static imports. Well crap.

How do you help this? You could move to a single installer (MSI), so that only that installer knows how to do the right thing. You could move to a world where the system deeply understands the requirements between DLLs and EXEs (or images in general) and refuses to allow deletion of images if any references on them are alive. Both of these, when done correctly, fix the problem of shared components accidentally disappearing out from under you. That's a good step forward...

But, it's not sufficient. That package you uninstalled - call it Barbaszot - and Bongoblam both wanted to be the .XYZ shell extension handler. The installer's simple logic noticed that Barbaszot had written to HKCR\.xyz, so it deletes everything that was updated. Now when the shell wants to activate an .xyz file, it ought to find Bongoblam III, but it can't! The .xyz handler metadata is gone, and the shell pops up a "what do you want to do with this file" dialog.

Smarter installers - like MSI - can help with this as well, by noticing the state of the world before values are written or updated in the registry. Maybe when you uninstall Barbaszot, it knows to just rewrite the old values, and suddenly Bongoblam III is now the handler again. That's groovy. But wait - what if Bongoblam was installed after Barbaszot, but Barbaszot stole back the extension handler (this horrible practice started about five minutes after the first alternate handler for .whatever files was created...)? The installer in all its logic says "well there was no value before" and helpfully deletes the keys. Now Bongoblam III is out in the cold again.

What can we do about all this? My work involves doing completely self-described, simple, and nondestructive operations to a running system. In my world, when the shell went looking for an .xyz handler after Barbaszot had been removed, it would look through a data-driven list of applications that had declared themselves to be .xyz handlers. Maybe it would have prompted the user, maybe it would have picked the last-used, maybe it would have picked the first random one. In any case, removal of Barbaszot would not have destroyed the list of possible choices.

It would appear I'm rambling again. I'll have more later - less ranting on how bad software is now, and more things I've discovered while working here.

Posted by jonwis | 6 Comments
 