A question came up on an internal newsgroup recently - "How do I do on-demand initialization of critical sections in a multithread-aware library?"  The asker didn't have an explicit Initialize function in which his critical section could be created, and instead wanted to know what the right approach was for creating one on demand. Below I provide a sample of how this could be achieved.  Next time, I'll have a "clarification" of the sample that has a better debugging profile.

// This structure is only a holder for a pointer for now.
typedef struct _ONETIME_CS {
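  // The pointer itself is volatile (it is swapped with interlocked
  // operations), not the CRITICAL_SECTION it points to.
  CRITICAL_SECTION * volatile Cs;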
} ONETIME_CS, *PONETIME_CS;

BOOL InitOnetimeCs(PONETIME_CS pCs) {
  MemoryBarrier();
  if (pCs->Cs == NULL) {
    CRITICAL_SECTION *NewCs;
    NewCs = (CRITICAL_SECTION *)HeapAlloc(GetProcessHeap(), 0, sizeof(*NewCs));
    if (NewCs == NULL) {
      SetLastError(ERROR_OUTOFMEMORY);
      return FALSE;
    }
    // InitCs and DeleteCs fail by raising SEH exceptions. When
    // those fly by, make sure to free the heap. The loser of
    // the race deletes the losing CS, and the __finally ensures
    // that it gets freed. The winner sees 'null' from the ICEP,
    // all others get what the winner put there.
    __try {
      PVOID OldCs;
      InitializeCriticalSection(NewCs);
      OldCs = InterlockedCompareExchangePointer(
        (volatile PVOID *)&pCs->Cs, NewCs, NULL);
      if (OldCs == NULL) {
        NewCs = NULL;
      } else {
        DeleteCriticalSection(NewCs);
      }
    } __finally {
      if (NewCs != NULL)
        HeapFree(GetProcessHeap(), 0, (PVOID)NewCs);
    }
  }
  MemoryBarrier();
  return (pCs->Cs != NULL);
}

BOOL EnterOnetimeCs(PONETIME_CS pCs) {
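  // Reject a NULL holder up front instead of dereferencing it below.
  if (pCs == NULL) {
    SetLastError(ERROR_INVALID_PARAMETER);
    return FALSE;
  }
  if (!InitOnetimeCs(pCs))
    return FALSE;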
  EnterCriticalSection(pCs->Cs);
  return TRUE;
}

VOID LeaveOnetimeCs(PONETIME_CS pCs) {
  LeaveCriticalSection(pCs->Cs);
}

VOID DeleteOnetimeCs(PONETIME_CS pCs) {
  PVOID OldCs;
  // Delete can be multithread-aware as well, so use IEP to
  // atomically swap out the value with NULL.  The winner gets
  // a non-null value back, losers get NULL.
  OldCs = InterlockedExchangePointer((PVOID volatile *)&pCs->Cs, NULL);
  if (OldCs != NULL) {
    DeleteCriticalSection((CRITICAL_SECTION*)OldCs);
    HeapFree(GetProcessHeap(), 0, OldCs);
  }
}

ONETIME_CS g_Cs = { NULL };
DWORD WINAPI MyThreadProc(PVOID Context) {
  UNREFERENCED_PARAMETER(Context);
  if (!EnterOnetimeCs(&g_Cs))
    RaiseException(ERROR_INTERNAL_ERROR, 0, 0, NULL);
  // do something
  LeaveOnetimeCs(&g_Cs);
  return 0;
}

This is just another example of "racy initialization" using InterlockedCompareExchangePointer. Some "racy init" patterns are dangerous - especially those that involve a "read after write" of the data that's been initialized, where one thread reads what another thread has only just written. That's what the memory barriers in InitOnetimeCs are there for - they guarantee (expensively) that any writes InitializeCriticalSection might have performed are visible before the following call to EnterCriticalSection.
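For readers who want to try the pattern away from a Windows box, here is a minimal portable sketch of the same racy init. It is an analogue, not the sample above: it assumes C11 <stdatomic.h> and POSIX threads, swaps in a pthread mutex for the critical section, and uses acquire/release ordering on the pointer in place of the full MemoryBarrier() calls. The names onetime_lock, onetime_lock_get, g_discarded and run_demo are all made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical analogue of ONETIME_CS: an atomically published pointer. */
typedef struct onetime_lock {
    _Atomic(pthread_mutex_t *) lock;
} onetime_lock;

/* Illustration only: counts threads that lost the race and threw
 * their freshly created mutex away. */
static atomic_int g_discarded;

/* Returns the shared mutex, creating it on first use.
 * Returns NULL if allocation or initialization fails. */
static pthread_mutex_t *onetime_lock_get(onetime_lock *ol) {
    /* The acquire load pairs with the release half of the exchange below. */
    pthread_mutex_t *cur = atomic_load_explicit(&ol->lock, memory_order_acquire);
    if (cur != NULL)
        return cur;

    pthread_mutex_t *fresh = malloc(sizeof *fresh);
    if (fresh == NULL)
        return NULL;
    if (pthread_mutex_init(fresh, NULL) != 0) {
        free(fresh);
        return NULL;
    }

    /* Publish only if the slot is still NULL. */
    pthread_mutex_t *expected = NULL;
    if (atomic_compare_exchange_strong_explicit(
            &ol->lock, &expected, fresh,
            memory_order_acq_rel, memory_order_acquire)) {
        return fresh;               /* winner: ours is now the shared mutex */
    }

    /* Loser: 'expected' now holds the winner's pointer. */
    pthread_mutex_destroy(fresh);
    free(fresh);
    atomic_fetch_add(&g_discarded, 1);
    return expected;
}

#define DEMO_THREADS 8
#define DEMO_ITERATIONS 1000

static onetime_lock g_lock;         /* static storage: starts out NULL */
static long g_counter;              /* protected by the on-demand mutex */

static void *worker(void *arg) {
    (void)arg;
    pthread_mutex_t *m = onetime_lock_get(&g_lock);
    if (m == NULL)
        return NULL;
    for (int i = 0; i < DEMO_ITERATIONS; i++) {
        pthread_mutex_lock(m);
        g_counter++;
        pthread_mutex_unlock(m);
    }
    return NULL;
}

/* Starts the workers, waits for them, and returns the final counter. */
long run_demo(void) {
    pthread_t threads[DEMO_THREADS];
    for (int i = 0; i < DEMO_THREADS; i++) {
        if (pthread_create(&threads[i], NULL, worker, NULL) != 0)
            abort();
    }
    for (int i = 0; i < DEMO_THREADS; i++)
        pthread_join(threads[i], NULL);
    return g_counter;
}
```

If every worker ends up with the same mutex, run_demo() returns DEMO_THREADS * DEMO_ITERATIONS, and g_discarded tells you how many threads built a mutex only to throw it away. That number depends on scheduling, so it will vary from run to run.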

This method is cheap, but it does require extra heap memory and may create a few extra critical sections when there is contention, each of which the losing thread deletes right away. Luckily, the cost of CS creation is about zero in Vista - entering the CS when there is contention is where the real cost lies. Certain analysis methods are made harder - when the app verifier prints the address of a busted CS, "ln thataddress" won't tell you which symbol contained the critical section, because the CS lives on the heap rather than in a named global.

Next time, a method that avoids spurious CS creation, allows for "ln thataddress" on the CS address, and has zero heap cost!