Welcome to MSDN Blogs Sign in | Join | Help

What possible use are those extra bits in kernel handles? Part 3: New object types

Last time, we saw how those extra bits can be used to multiplex HANDLE with other values. That was a specific case of a more general scenario: Expanding the handle namespace to include things that aren't handles. (You can also view today's example as a generalization of the sentinel value problem, where we need to generate an arbitrary number of sentinel values dynamically. Actually, multiplexing HANDLE with HRESULT is also just another special case: We expanded the handle namespace to include error codes too.)

As I noted in the base article, the people who are most interested in this sort of thing are people writing low-level class libraries or wrapping kernel objects inside a larger framework.

Suppose, for example, you are writing a library that manipulates kernel objects, but you also have private types of objects (say, a handle to a remote computer) that you also want this library to be able to manipulate. One way of doing this is to wrap everything inside some base class that virtualizes away what type of handle you're working on:

class ExtendedHandle {
public:
  virtual ExtendedHandleType GetType() = 0;
  virtual ~ExtendedHandle() { }
};

class KernelExtendedHandle : public ExtendedHandle {
public:
  static KernelExtendedHandle *Create(...);
  ExtendedHandleType GetType() { return KernelHandle; }
  HANDLE GetHandle() { return Handle; }
private:
  KernelExtendedHandle(...);
  HANDLE Handle;
};

class RemoteComputerExtendedHandle : public ExtendedHandle {
public:
  static RemoteComputerExtendedHandle *Create(...);
  ExtendedHandleType GetType() { return RemoteComputer; }
  LPCTSTR GetComputerName() { ... }
  ... other remote computer methods ...
private:
  RemoteComputerExtendedHandle(...);
  ... stuff necessary for tracking remote computers ...
};

Now, your library spends only 1% of its time manipulating these private object types; most of the time, it's dealing with regular kernel handles. In other words, 99% of your objects are of type KernelExtendedHandle. What used to be just a HANDLE (4 bytes) is now a EHANDLE that in turn points to an 8-byte structure (4 bytes for the vtable and 4 bytes for the HANDLE). Your memory requirements have tripled and you added another level of indirection (costing you locality), just for that 1% case.

But you can do better if you have those extra bits to play with. Since 99% of the objects you're wrapping are just plain old kernel handles, you can say that if the bottom bit is clear, then it's just a kernel handle, and if the bottom bit is set, then the upper bits tell us what we are really operating on.

typedef void *EHANDLE;

BOOL IsKernelHandle(EHANDLE Handle)
{
  return (!((INT_PTR)Handle & 1));
}

EHANDLE CreateHandleFromKernelHandle(HANDLE Handle)
{
  // if this assertion fires, then somebody tried to
  // use an invalid kernel handle!
  assert((!((INT_PTR)Handle & 1));
  return (EHANDLE)Handle;
}

EHANDLE CreateHandleFromExtendedHandle(ExtendedHandle Handle)
{
  // if this assertion fires, then somebody allocated
  // a misaligned ExtendedHandle!
  assert(!((INT_PTR)Handle & 1));
  return (EHANDLE)((INT_PTR)Handle | 1));
}

ExtendedHandle *GetExtendedHandle(EHANDLE Handle)
{
  assert(!IsKernelHandle(Handle));
  return (ExtendedHandle*)((INT_PTR)Handle & ~1);
}

ExtendedHandleType GetType(EHANDLE Handle)
{
 if (IsKernelHandle(Handle)) {
   return KernelHandleType;
 } else {
   return GetExtendedHandle()->GetType();
 }
}

void ECloseHandle(EHANDLE Handle)
{
  if (IsKernelHandle(Handle))
  {
    CloseHandle(GetKernelHandle(Handle));
  } else {
   delete GetExtendedHandle();
  }
}

Now the cost of a wrapped kernel handle is just 4 bytes: 4 bytes for the EHANDLE, which also doubles as the actual kernel handle. The cost of wrapped pseudo-handles is the same as before (4 bytes for the EHANDLE, plus the size of the corresponding XxxExtendedHandle class). We used the trick from last time in order to pack 33 bits into only 32 bits: Since we know that the bottom bit of both kernel HANDLEs and ExtendedHandle pointers are zero, we can use it as a discriminator.

If you are not confident that your ExtendedHandle classes all use aligned pointers, you can use a different packing mechanism by using your own handle translation table:

ExtendedHandle *ExtendedHandleTable;

ExtendedHandle *GetExtendedHandle(EHANDLE Handle)
{
  assert(!IsKernelHandle(Handle));
  return ExtendedHandleTable[(INT_PTR)Handle >> 1];
}

Using this technique, the upper 31 bits of an EHANDLE which refers to an ExtendedHandle is an index into a privately-managed table of ExtendedHandle objects.

This secondary handle table technique is entirely analogous to the trick which one Posix library uses to distinguish "real" process IDs from "virtual" process IDs, except that they are relying on undocumented behavior because the bottom bits of process IDs are not guaranteed to be zero!

So there you have it, three scenarios where you can take advantage of knowing that the bottom bits of kernel handles are always zero.

Published Friday, August 29, 2008 7:00 AM by oldnewthing
Filed under:

Comments

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Friday, August 29, 2008 10:22 AM by Adam

Not a big deal, but near the end of the snippet:

delete GetExtendedHandle();

should be

delete GetExtendedHandle(Handle);

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Friday, August 29, 2008 10:40 AM by An Hero

So, how much of a performance difference between the two methods do your extensive benchmarks reveal?

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Friday, August 29, 2008 11:02 AM by Pierre B.

I'll admit right away that I didn't *all* the comments on each part of this serie.

My question is: the fact that the bottom two bits are available is something at the NT-kernel level (in ntdef.h). How can we be sure those bits really are available when we are writing code at the Win32 level?

Some APIs (the GetQueuedCompletionStatus function being the initial example) might use them now. Some APIs or sub-system might use them in the future.

Is this trick reserved for the Windows team?

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Friday, August 29, 2008 11:05 AM by Eddie

@An Hero

Yeah, but he is not talking about today's computers, he is talking about yesterday's computers that the code would be written for.  I am sure benchmarks would reveal that the difference is little.

I think he is simply trying to justify why the design was made the way it was

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Friday, August 29, 2008 12:09 PM by Leo Davidson

4 bytes can make a huge difference, even on today's computers.

I worked on a system where someone added a single int member to an object without thinking. An easy thing to do since it's just 4 bytes and we do that all the time. Problem was, there were a lot of those objects in memory during large operations and that 4 bytes per object pushed the process to the limits of the 2gig memory space for 32-bit apps (this stuff had to run inside Excel so even if we had 64-bit CPUs it wouldn't matter).

Okay, you might say that 32-bit is "yesterday's machines" but that still isn't a reality for the majority of code out there and using up several hundred meg of additional memory is a big deal even on a 64-bit machine.

I don't generally like having overloaded values (e.g. the outright wrongness that is atoi() returning zero on failure) but sometimes they are better than the alternative (though atoi() is still wrong!).

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Friday, August 29, 2008 12:36 PM by Reginald Wellington III

"So, how much of a performance difference between the two methods do your extensive benchmarks reveal?"

Uh oh, here comes the "NO PREMATURE OPTIMIZATION!!!!!" crowd.  There's nothing wrong with cutting memory usage of an object by 1/3 and getting rid of an indirection, especially if the implementation difficulty of the better method is roughly the same.

Not all of us have to profile our code extensively to figure out what's better.  We just do it the better way the first time.

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Friday, August 29, 2008 12:53 PM by DrkMatter

The important difference between atoi and the current discussion is that 0 is part of the set of results you would expect from atoi, whereas you should never expect an HANDLE with the bottom bits set: it's part of the HANDLE definition.

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Friday, August 29, 2008 1:11 PM by An Hero

Leo:  I don't mean this in a derogatory way; I am genuinely curious.  What the hell does your Excel plug-in do that it can use up almost the entire process address space?  I think I am going to have nightmares tonight.

Reginald:  So you've never used a profiler, or even a simple GetTickCount() measurement in your entire life?  I guess you must be God.  Truly, I am in awe and blessed to be in your presence.

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Friday, August 29, 2008 1:47 PM by Reginald Wellington III

"Reginald:  So you've never used a profiler, or even a simple GetTickCount() measurement in your entire life?  I guess you must be God.  Truly, I am in awe and blessed to be in your presence."

And the reward for best melodramatic response to a comment goes to.... YOU!  Bravo!  I look forward to more ridiculous conclusions from you in the future.

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Friday, August 29, 2008 1:56 PM by Pierre B.

Hero: don't confuse the issues. That bit of wisdom about not doing early optimizations has too often been abused in exactly the way Reginald says. And it's not only me and Reginald saying this, although I won't go googling for references.

Not doing early optimization is not a ticket to write bad code, choose bad data structures, not doing up-front design and choosing your implementation wildly from the get-go. It's about not writing obscure and complex code without scientific measurement justifying the added complexity and increased maintenance and debugging costs.

Often, the right and good and simple design is also efficient. You just have to think about your code before writing it.

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Friday, August 29, 2008 2:28 PM by mikefried

Hero:

Premature optimization for time can be bad (though it isn't always - inner loops expected to run on millions of items, for example should be optimized, and profilers/perf counters are one tool to help), but premature optimization for space without a time or coding readability impact is usually smart, and Raymond is pointing out a specific instance where you save time as well as space.

There is a space/time tradeoff when you need to redo calculations instead of lookups, but that is mostly to do cacheing of complex calculations and not bit manipulation of the kind Raymond is espousing.

A profiler or even calls to QueryPerformanceCounter (GetTickCount is NOT a high precision timer) is all well and good when measuring cycles, but working set can be a critical factor in performance. Once your program work becomes large enough that cache line fills start happening, you can measure the impact to performance in cache line fills.

The cost of allocating one 4-byte integer is more than the size of the integer. This costs more in 64-bits than in 32-bits. Small improvements here and there can make a huge difference overall.

The overall theme of the message is: Don't waste space needlessly - you have bits, so use them wisely.

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Friday, August 29, 2008 2:29 PM by Ulric

it could be said the memory has a lot more than tripple with the c++ approach.

I haven't checked if each object has in fact two vtable pointer, rather than 1, but in any case in order to use that approach, you'd have to 'new' the object of the right class and use pointers.  So you've got multiple other pointers (probably two or three DWORD_PTR in the heap block around your object).  Now if you actually did need to store or associate data with the handle, and if you're making a framework/libary it's likely, then this may indeed be a needless early optimization.  but the article was just an example.

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Friday, August 29, 2008 2:38 PM by Ulric

to guys above:  this who 'memory has doubled' thing is an over simplication.  I understand why you think it's a big optimization benefit if you see it that way.  

But you need to know how much RAM a handle actually cost. It probably links to structure in the OS, there may be stuff associated with like security descriptor, heap block header/footer, etc.  Saving a DWORD in your code may in fact increase the memory per handle of just 5%, not 200%.

Isn't it in these comments a few weeks ago that someone was talking about lottery and chances.  A statement had been made that you had a 50% chance: either you win or you loose.  Same thing here, the problem is more complex.

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Friday, August 29, 2008 6:53 PM by Mark Sowul

I propose yet another new tagline for Raymond's blog - "You will never find a more wretched hive of carping and pedantry."

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Friday, August 29, 2008 11:23 PM by KJK::Hyperion

Pierre: to the best of my knowledge, the only tagged handles you'll ever come across are console handles (console input and console screen buffers), which aren't kernel handles in the first place. Win32 also special-cases STD_INPUT_HANDLE, STD_OUTPUT_HANDLE and STD_ERROR_HANDLE, which, zero-extended to a handle with UlongToHandle, count as an implicit GetStdHandle call - but this is an undocumented feature you most definitely didn't hear about from me and should forget for your own good

Curiously enough, handle tags are in fact used by the NT layer in cases where adding a new system call for a single flag was presumably deemed too expensive

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Sunday, August 31, 2008 6:09 PM by Sys64738

Thanks for sharing these low-level optimization techniques. While for business apps all the "managed" stuff (.NET, CLR, Java, etc.) can be good, for low-level stuff like operating systems, device drivers, and in general for high-performance applications these techniques can save lots of memory and increase speed.

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Monday, September 01, 2008 7:22 AM by Medinoc

@Ulric: You're right on one point, adding four bytes to a dynamically allocated structure is usually cheap. Adding a dynamically allocated structure where there weren't one in the first place, however, isn't.

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Monday, September 01, 2008 8:36 AM by Richard

The thing which has most struck me about this whole series is (apologies to Raymond) /what if two people did this?/.

Suppose some team maintaining some lower-layer code decides they need a flag with their handle, and add one. Further suppose some team maintaining some higher-layer code decides the same thing. Suppose further that the release schedules of the two teams are unrelated (possibly they're in different companies altogether?). Result: assertions if you're lucky/careful, all kinds of failure if not.

At most one group can own the spare bits here, even if they're not used by anyone at the moment. Fortunately, in my experience, the documentation's been reasonably good in identifying who that is, but the thought of applications doing this still fills me with dread.

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Monday, September 01, 2008 6:08 PM by SuperKoko

"How can we be sure those bits really are available when we are writing code at the Win32 level?"

The two lower bits are documented to be zero.

"Some APIs (the GetQueuedCompletionStatus function being the initial example) might use them now."

I don't care if it uses them internally. I just need to pass a proper HANDLE, and it has to work. If I store data in the two lower bits, I have to mask them (myhandle_with_data & 3), so that it becomes a proper HANDLE, of course!

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Tuesday, September 02, 2008 12:54 PM by 640k

In the old days a true handle was defined as "a pointer to a pointer", how can any old application still work now that this concept has been broken by the kernel by manipulating the addresses?

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Tuesday, September 02, 2008 5:20 PM by Pierre B.

SuperKoko: obviously, you didn't understand my comment. The fact that the two lower bits are zero is a contract at the NT kernel level. Win32 lives above the NT kernel, it's a separate layer, and is not necessarily bound by the promises made by the kernel.

KJK::Hyperion already provided an example of a Win32 call (GetStdHandle) which will return HANDLE with low-bits set. You can't "mask them out" when you make Win32 calls later on, it will not work!

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Tuesday, September 02, 2008 5:44 PM by SuperKoko

"Win32 lives above the NT kernel, it's a separate layer, and is not necessarily bound by the promises made by the kernel."

I thought it was guaranteed at the Win32 level too, but it looks like I'm  wrong, as the documentation is in ntdef.h. Consequently, Raymond's hack mayn't be portable to Windows 9x/Me or future non-NT based versions of Windows and should never be used by Win32 programs intending to be portable. Basically, it should never be used.

# re: What possible use are those extra bits in kernel handles? Part 3: New object types

Wednesday, September 03, 2008 5:59 PM by Good Point

I guess Raymond doesn't update his mistakes anymore.

As Adam pointed out, ECloseHandle() is wrong.  Also I don't think CreateHandleFromExtendedHandle() was meant to accept an ExtendedHandle by value.

[I guess Raymond is too busy working (y'know that thing that he gets paid to do) to update mistakes immediately after they are reported, preferring to wait a few days to let all the corrections settle out. -Raymond]
New Comments to this post are disabled
 
Page view tracker