# October, 2009

• #### What is the format of a double-null-terminated string with no strings?

One of the data formats peculiar to Windows is the double-null-terminated string. If you have a bunch of strings and you want to build one of these elusive double-null-terminated strings out of them, it's no big deal.

 H e l l o \0 w o r l d \0 \0

But what about the edge cases? What if you want to build a double-null-terminated string with no strings?

Let's step back and look at the double-null-terminated string with two strings in it. But I'm going to insert line breaks to highlight the structure.

    H e l l o \0
    w o r l d \0
    \0

Now I'm going to move the lines around.

    Hello\0
    world\0
    \0

This alternate way of writing the double-null-terminated string is the secret. Instead of viewing the string as something terminated by two consecutive null terminators, let's view it as a list of null-terminated strings, with a zero-length string at the end. Alternatively, think of it as a packed array of null-terminated strings, with a zero-length string as the terminator.

This type of reinterpretation happens a lot in advanced mathematics. You have some classical definition of an object, and then you invent a new interpretation which agrees with the classical definition, but which gives you a different perspective on the system and even generalizes to cases the classical definition didn't handle.

For example, this "modern reinterpretation" of double-null-terminated strings provides another answer to a standard question:

How do I build a double-null-terminated string with an empty string as one of the strings in the list?

You can't, because the empty string is treated as the end of the list. It's the same reason why you can't put a null character inside a null-terminated string: The null character is treated as the terminator. And in a double-null-terminated string, an empty string is treated as the terminator.

    One\0
    \0
    Three\0
    \0

If you try to put a zero-length string in your list, you end up accidentally terminating it prematurely. Under the classical view, you can see the two consecutive null terminators: They come immediately after the `"One"`. Under the reinterpretation I propose, it's more obvious, because the zero-length string is itself the terminator.

If you're writing a helper class to manage double-null-terminated strings, make sure you watch out for these empty strings.
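Here's a minimal sketch of what "watching out" might look like in such a helper. It is an illustration, not code from any real library: the name `multi_sz_append` and its buffer-plus-length interface are made up for this example, and it uses plain `char` and `strlen` rather than `TCHAR` and `lstrlen` so that it stays portable. The caller is assumed to start with `used` set to zero and `buf[0]` set to the null character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: append `s` to the list stored in `buf` (capacity
 * `cap` bytes). `*used` counts the bytes occupied by the strings so far,
 * not counting the zero-length string that ends the list.
 * Returns 1 on success, 0 if the string is rejected.
 */
int multi_sz_append(char *buf, size_t cap, size_t *used, const char *s)
{
    size_t len;

    assert(buf != NULL && used != NULL && s != NULL);

    len = strlen(s);
    if (len == 0) {
        return 0;   /* an empty string would end the list prematurely */
    }
    /* Need room for the string, its null, and the list terminator. */
    if (*used > cap || len + 2 > cap - *used) {
        return 0;
    }
    memcpy(buf + *used, s, len + 1);    /* the string and its null */
    *used += len + 1;
    buf[*used] = '\0';                  /* zero-length string ends the list */
    return 1;
}
```

Whether rejecting the empty string is the right policy (as opposed to silently skipping it) depends on the caller; the point is only that the helper has to make a decision instead of letting the list get cut short.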

This reinterpretation of a double-null-terminated string as really a list of strings with an empty string as the terminator makes writing code to walk through a double-null-terminated string quite straightforward:

`for (LPTSTR pszz = pszzStart; *pszz; pszz += lstrlen(pszz) + 1) { ... do something with pszz ...}`

Don't think about looking for the double null terminator. Instead, just view it as a list of strings, and you stop when you find a string of length zero.

This reinterpretation also makes it clear how you express a list with no strings in it at all: All you have is the zero-length string terminator.

 \0
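To make the "list of strings that ends with an empty string" view concrete, here is a small portable sketch of the same loop. It assumes `char` strings and `strlen` in place of `TCHAR` and `lstrlen`, and the function name `multi_sz_count` is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the strings in a list that ends with a zero-length string. */
size_t multi_sz_count(const char *pszz)
{
    size_t count = 0;

    assert(pszz != NULL);
    for (; *pszz != '\0'; pszz += strlen(pszz) + 1) {
        count++;    /* pszz points at one complete null-terminated string */
    }
    return count;
}
```

Under these assumptions, `multi_sz_count("Hello\0world\0")` reports two strings, and a buffer that holds nothing but the terminator reports zero, with no special case needed for the empty list.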

Why do we even have double-null-terminated strings at all? Why not just pass an array of pointers to strings?

That would have worked, too, but it makes allocating and freeing the array more complicated, because the memory for the array and the component strings is now scattered about. (Compare absolute and self-relative security descriptors.) A double-null-terminated string occupies a single block of memory which can be allocated and freed at one time, which is very convenient when you have to serialize and deserialize. It also avoids questions like "Is it legal for two entries in the array to point to the same string?"
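As a rough illustration of the single-block convenience, the sketch below duplicates an entire list with one allocation and one `memcpy`. The names `multi_sz_size` and `multi_sz_dup` are hypothetical, and the code uses the C runtime's `malloc` and `free` rather than any Windows allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Total size in bytes, including the zero-length string at the end. */
size_t multi_sz_size(const char *pszz)
{
    const char *p = pszz;

    assert(pszz != NULL);
    while (*p != '\0') {
        p += strlen(p) + 1;     /* skip one string and its null */
    }
    return (size_t)(p - pszz) + 1;
}

/* Duplicate the whole list with one allocation; release it with free(). */
char *multi_sz_dup(const char *pszz)
{
    size_t size = multi_sz_size(pszz);
    char *copy = malloc(size);

    if (copy != NULL) {
        memcpy(copy, pszz, size);   /* one block in, one block out */
    }
    return copy;
}
```

The same byte count is what you would write to a file or a pipe if you were serializing the list, which is the convenience the paragraph above is getting at.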

Keeping it in a single block of memory reduces the number of selectors necessary to represent the data in 16-bit Windows. (And this data representation was developed long before the 80386 processor even existed.) An array of pointers to 16 strings would require 17 selectors, if you used `GlobalAlloc` to allocate the memory: one for the array itself, and one for each string. Selectors were a scarce resource in 16-bit Windows; there were only 8192 of them available in the entire system. You don't want to use 1% of your system's entire allocation just to represent an array of 100 strings.

One convenience of double-null-terminated strings is that you can load one directly out of your resources with a single call to `LoadString`:

    STRINGTABLE
    BEGIN
     IDS_FILE_FILTER "Text files\0*.txt\0All files\0*.*\0"
    END

    TCHAR szFilter[80];
    LoadString(hinst, IDS_FILE_FILTER, szFilter, 80);

This is very handy because it allows new filters to be added by simply changing a resource. If the filter were passed as an array of pointers to strings, you would probably put each string in a separate resource, and then the number of strings becomes more difficult to update.

But there is a gotcha in the above code, which we will look at next time.

Bonus Gotcha: Even though you may know how double-null-terminated strings work, this doesn't guarantee that the code you're interfacing with understands it as well as you do. Consequently, you'd be best off putting the extra null terminator at the end if you are generating a double-null-terminated string, just in case the code you are calling expects the extra null terminator (even though it technically isn't necessary). Example: The ANSI version of `CreateProcess` locates the end of the environment block by looking for two consecutive null bytes instead of looking for the empty string terminator.
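The difference between the two ways of finding the end can be sketched as follows. This is an illustration only: both function names are invented, and the second one merely imitates the "look for two consecutive nulls" strategy described above; it is not the actual `CreateProcess` code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size in bytes found by walking the list until the zero-length string. */
size_t size_by_walking(const char *pszz)
{
    const char *p = pszz;

    assert(pszz != NULL);
    while (*p != '\0') {
        p += strlen(p) + 1;
    }
    return (size_t)(p - pszz) + 1;
}

/*
 * Size in bytes found by scanning for two consecutive null bytes.
 * The buffer must actually contain such a pair, or this runs off the end.
 */
size_t size_by_double_null(const char *pszz)
{
    const char *p = pszz;

    assert(pszz != NULL);
    while (p[0] != '\0' || p[1] != '\0') {
        p++;
    }
    return (size_t)(p - pszz) + 2;
}
```

For a non-empty list the two agree. They part ways on the empty list: given the bytes `\0 X \0 \0`, the walker stops after the first byte, while the scanner keeps going until it finds the pair of nulls later in the buffer. Writing the extra null after an empty list keeps both kinds of reader happy.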

• #### Why doesn't the mail image resizer check the image size before offering to resize?

Commenter Igor lambastes the image resizer dialog that appears when you select Send To Mail Recipient. (And people think I'm the one with the social skills of a thermonuclear device.) This dialog pisses him off so much, he complained about it again.

The root of the diatribe appears to be that the image resizer dialog appears, even if it turns out the resizer won't do anything. For example, the resizer dialog appears even if the images are already small, or if the files have a .jpg extension but aren't actually JPG images. Why is it so idiotic that it fails to check these simple things before offering to do its work?

Because checking these simple things before showing the dialog is even more idiotic.

One of the grave errors when doing work with files is accessing the file before the user asks for it. This is a grave error because accessing the file can take an excruciatingly long time if the file is stored on a server halfway across the world over a slow network connection, or if the file has been archived to tape.

This particular code path is sensitive to the file access time because the user has just picked a menu item. Suppose the dialog box went ahead and opened the files to confirm that, yes, they really are images, and yes, the dimensions of the image are larger than what the dialog offers to resize them to. You select 1000 small images on a slow server, right-click them, and pick Send To... Mail Recipient.

Then you wait 30 minutes while the dialog box goes off and does something like this:

```
shouldOfferResize = false;
foreach (file in selection)
{
    if (file.IsJPGThatIsNotCorrupted() &&
        file.IsWorthResizing()) {
        shouldOfferResize = true;
        break; // can early-out the loop once we find something
    }
}
```

Opening each file, parsing it to verify that it is a valid JPG file that decodes without error, and extracting its dimensions takes, say, 2 seconds per file. (The file is slow to access, say, it's on a network server or on a slow medium like a CD-ROM or a tape drive. Or the file is large and it takes 2 seconds to read it off the disk and parse it to verify that there are no decoding errors.)

After about 15 seconds with no response, you give up and say "I hate computers." and go off and do something else, frustrated that you were unable to email your photos.

And then in the middle of working in your word processor, this dialog box suddenly appears: "Windows can resize the pictures you send in e-mail so that they transfer faster and are easier to view by the recipient."

Gee thanks, Windows, for finally getting around to asking me about that thing I wanted to do a half hour ago.

Idiot.

And then when you click Yes, Windows has to go and decode the files a second time in order to resize them. (Unless Igor's recommendation is to cache the decoded bits from the first pass. Then you'd complain that selecting 1000 files and clicking "Send To... Mail Recipient" causes your computer to run out of memory. As Igor is fond of saying when insulting the Windows team: "Looks like this feature was designed without any adult supervision.")
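For contrast, here is a sketch of a check that never touches the files. It assumes the decision is made from the file names alone, which is what the ".jpg extension but aren't actually JPG images" complaint suggests; the function names are hypothetical, and the real dialog may well recognize other extensions too.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if `name` ends with `ext`, compared case-insensitively (ASCII). */
static int has_extension(const char *name, const char *ext)
{
    size_t n = strlen(name);
    size_t e = strlen(ext);
    size_t i;

    if (n < e) {
        return 0;
    }
    for (i = 0; i < e; i++) {
        if (tolower((unsigned char)name[n - e + i]) !=
            tolower((unsigned char)ext[i])) {
            return 0;
        }
    }
    return 1;
}

/* Decide from the names alone; no file is opened, so no disk or network. */
int should_offer_resize(const char *const *names, size_t count)
{
    size_t i;

    assert(names != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (has_extension(names[i], ".jpg")) {
            return 1;   /* ASSUMPTION: only .jpg is considered here */
        }
    }
    return 0;
}
```

This costs a few string comparisons per file no matter where the files live, at the price of occasionally offering to resize something that turns out not to need it.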

Sidebar: A good fraction of these blog entries are just elaborations on very simple concepts. When I toss an entry onto the "I should blog this" pile, it usually gets a short remark of five to ten words which captures what I want to say. Then when it floats to the head of the queue, I expand those ten words into a 300-word entry. The short version of today's entry: "That would hit the disk."

• #### Why won't my computer go to sleep? Where is the energy going?

The `powercfg` utility has been around for a while, but in Windows 7, it gained a little bit more awesome.

`powercfg /energy` will analyze your computer's power consumption and report on things like devices that prevent the computer from sleeping, devices which won't suspend, and processes which are increasing your battery drain.

Another neat flag is `powercfg /requests` which will report on why your computer can't go to sleep, for example, because it has open files on the network, or because the clown will eat it.

• #### What is the format for FirstInstallDateTime on Windows 95?

Public Service Announcement: Daylight Saving Time ends in most parts of the United States this weekend.

Windows 95/98/Me recorded the date and time at which Setup was run in the registry under `HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion` as a binary value named `FirstInstallDateTime`. What is the format of this data?

Take the binary value and treat it as a 32-bit little-endian value. The format of the value is basically DOS date/time format, except that the seconds are always 0 or 1 (usually 1), due to a programming error.

Exercise: What error would result in the seconds always being 0 or 1 (usually 1)?
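If you want to pick the value apart, something like the following sketch may help. It assumes the usual packed layout of DOS date/time values and, since the text above doesn't say which half is which, it also assumes the date occupies the high 16 bits and the time the low 16 bits. The structure and function names are made up for this example, and the five-bit seconds field is returned raw so as not to spoil the exercise.

```c
#include <assert.h>
#include <stdint.h>

struct install_time {
    unsigned year, month, day, hour, minute;
    unsigned raw_seconds;   /* low five bits of the time word, uninterpreted */
};

/* Decode the four bytes of the registry value (little-endian). */
struct install_time decode_first_install(const unsigned char bytes[4])
{
    struct install_time t;
    uint32_t v;
    unsigned date_word, time_word;

    assert(bytes != 0);
    v = (uint32_t)bytes[0] |
        ((uint32_t)bytes[1] << 8) |
        ((uint32_t)bytes[2] << 16) |
        ((uint32_t)bytes[3] << 24);

    date_word = (unsigned)(v >> 16);      /* ASSUMPTION: date in high word */
    time_word = (unsigned)(v & 0xFFFF);   /* ASSUMPTION: time in low word  */

    t.year   = 1980 + (date_word >> 9);   /* 7 bits: years since 1980 */
    t.month  = (date_word >> 5) & 0x0F;   /* 4 bits */
    t.day    = date_word & 0x1F;          /* 5 bits */
    t.hour   = time_word >> 11;           /* 5 bits */
    t.minute = (time_word >> 5) & 0x3F;   /* 6 bits */
    t.raw_seconds = time_word & 0x1F;     /* 5 bits */
    return t;
}
```

If the decoded year or month comes out as nonsense, the first thing to question is the assumption about which half holds the date.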

• #### In Hawaiʻi, "mahalo" might officially mean "thank you"

In Hawaiʻi, you see the word Mahalo on signs everywhere. In theory, the word means Thank you, but my friend Joe Beda pointed out that in practice the word has a completely different meaning. Here are some examples:

• We do not accept coupons at this location. Mahalo.
• No refills. Mahalo.
• This counter is closed. Mahalo.
• Elevator is out of order. Please use stairs. Mahalo.
• Rest rooms for customers only. Mahalo.

In practice, the word mahalo means You're screwed.

Obligatory clarification: This was a joke, an attempt at observational humor.

• #### If you have to cast, you can't afford it

A customer reported a crash inside a function we'll call `XyzConnect`:

```
DWORD XyzConnect(
    __in DWORD ConnectionType,
    __in PCWSTR Server,
    __in PCWSTR Target,
    __out void **Handle);
...
// HACK - Create a dummy structure to pass to the XyzConnect
// function to avoid AV within the function.
DWORD dummy = 0;
if ( NO_ERROR != ( XyzConnect( 0, L"", L"", (PVOID*)&dummy ) ) )
{
    TRACE( L"XyzConnect failed." );
    return FALSE;
}
...
```

The title of today's entry gives the answer away. (The title is also an exaggeration, but it's a pun on the saying If you have to ask, you can't afford it.)

The last parameter to the `XyzConnect` function is declared as a `void**`: A pointer to a generic pointer. Note that it is not itself a generic pointer, however. A generic pointer can point to anything, possibly unaligned. But this is an aligned pointer to a generic pointer. Therefore, the memory for the generic pointer must be aligned in a manner appropriate to its type.

But this caller didn't pass a pointer to a pointer; the caller passed a pointer to a `DWORD`, and a `DWORD` has different alignment requirements from a pointer on 64-bit systems. (You might conclude that this decision was the stupidest decision on the face of the planet, but that's a different argument for a different time. For example, I can think of decisions far stupider.)

When the `XyzConnect` function tries to dereference this purported `void **` pointer, it encounters an alignment fault, because it does not in fact point to a `void *` as the type claims, but rather points to a `DWORD`. A `DWORD` requires only 32-bit alignment, so you have a 50% chance that the `DWORD*` is not suitably aligned to be a `void**`.

Mind you, on 64-bit systems you also have a 100% chance of a buffer overflow, because a `DWORD` is only four bytes, whereas a `void*` is eight bytes. The function is going to write eight bytes into your four-byte buffer.

When this question was posed, one person suggested changing the `DWORD` to a `__int64`, since the `__int64` is an 8-byte value, which is big enough to hold a pointer on both 32-bit and 64-bit Windows. Then again, it's overkill on 32-bit systems, since you allocated eight bytes when you only needed four. Another suggestion was to use `DWORD_PTR`, since that type changes in size to match the size of a `void*`.

Well, yeah, but here's another type that matches the size of a `void*`: It's called `void*`.

Just declare the variable as a `void *` and get rid of the cast. And get rid of the comment while you're at it. If you do it right, you don't need the cast or the hack.

```
void *handle = 0;
if ( NO_ERROR != ( XyzConnect( 0, L"", L"", &handle ) ) )
{
    TRACE( L"XyzConnect failed." );
    return FALSE;
}
```
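If you want to convince yourself without the real function, here is a portable sketch. `fake_connect` is a hypothetical stand-in for `XyzConnect` that does nothing except write a pointer through its out parameter, and `uint32_t` plays the role of `DWORD`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int g_connection;    /* dummy object for the handle to point at */

/* Hypothetical stand-in for XyzConnect: writes a full void* via Handle. */
int fake_connect(void **Handle)
{
    assert(Handle != NULL);
    *Handle = &g_connection;    /* writes sizeof(void *) bytes */
    return 0;                   /* 0 plays the role of NO_ERROR */
}

/* Nonzero when a 32-bit integer cannot hold a pointer on this platform. */
int dword_is_too_small(void)
{
    return sizeof(uint32_t) < sizeof(void *);
}
```

With a `void *handle` you call `fake_connect(&handle)` and no cast is needed. Try passing the address of a `uint32_t` and the compiler complains, which is exactly the warning the original cast was hiding.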

A large number of porting problems can be traced to incorrect casts. The original author probably inserted the cast to "shut up the compiler" but the compiler was trying to tell you something.

Any time you see a function cast or see a cast to/from something other than `void*` or `BYTE*`, then you should be suspicious, because there's a chance somebody is simply trying to shut up the compiler.

• #### Still working out the finer details of how this Hallowe'en thing works

Here's an excerpt from a conversation on the subject of Hallowe'en which I had with my niece some time ago. Let's call her "Cathy". (This is a different Cathy from last time.)

"Cathy, what do you do on Hallowe'en?"

"You get all dressed up and people give you candy."

• #### How to write like Raymond: Start a sentence with a question mark

Another installment in the extremely sporadic series on how to write like Raymond.

I use the question mark as an emoticon to indicate befuddlement or confusion. (This is not to be confused with the use of an inverted question mark in Spanish.) Here's an imaginary example:

Hi, I'm trying to bake a carrot cake, but I'm having trouble finding the right staple gun. Does anybody have any recommendations?

My reply might go something like this:

? What the heck are you planning to do with that staple gun?

Whenever somebody reports that the `SHFileOperation` function or the `lpstrFilter` member of the `OPENFILENAME` structure is not working, my psychic powers tell me that they failed to manage the double-null-terminated strings.

Because string resources take the form of a counted string, they can contain embedded null characters; the null character is not being used as the string terminator. The `LoadString` function knows about this, but other functions might not.

Here's one example:

    TCHAR szFilters[80];
    strcpy_s(szFilters, 80, "Text files\0*.txt\0All files\0*.*\0");
    // ... or ...
    strlcpy(szFilters, "Text files\0*.txt\0All files\0*.*\0", 80);

The problem is that you're using a function which operates on null-terminated strings but you're giving it a double-null-terminated string. Of course, it will stop copying at the first null terminator, and the result is that `szFilters` is not a valid double-null-terminated string.
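One way around this, sketched under the assumption that you're working with plain `char` buffers, is to measure the whole list yourself and copy it as a block of bytes. The name `multi_sz_copy` is invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy an entire double-null-terminated list into dst.
 * Returns the number of bytes copied, or 0 if it does not fit.
 */
size_t multi_sz_copy(char *dst, size_t dst_size, const char *src)
{
    const char *p = src;
    size_t size;

    assert(dst != NULL && src != NULL);
    while (*p != '\0') {
        p += strlen(p) + 1;             /* step over each string */
    }
    size = (size_t)(p - src) + 1;       /* include the final terminator */
    if (size > dst_size) {
        return 0;
    }
    memcpy(dst, src, size);             /* byte copy; embedded nulls survive */
    return size;
}
```

For the filter string above, this copies 32 bytes, whereas `strlen` on the same data stops at 10.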

Here's another example:

`sprintf_s(szFilters, 80, "%s\0*.txt\0%s\0*.*\0", "Text files", "All files");`

Same thing here. Functions from the `sprintf` family take a null-terminated string as the format string. If you "embed" a null character into the format string, the `sprintf` function will treat it as the end of the format string and stop processing.
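If you do need to format the pieces, one option is to format each string separately and place the nulls yourself. The sketch below uses the standard `snprintf` rather than `sprintf_s`, and `build_filter` is a hypothetical helper, not an existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build "<text_desc>\0*.txt\0<all_desc>\0*.*\0\0" in buf.
 * Returns the total number of bytes written, or 0 on failure.
 */
size_t build_filter(char *buf, size_t cap,
                    const char *text_desc, const char *all_desc)
{
    const char *parts[4];
    size_t used = 0;
    size_t i;

    assert(buf != NULL && text_desc != NULL && all_desc != NULL);
    parts[0] = text_desc;
    parts[1] = "*.txt";
    parts[2] = all_desc;
    parts[3] = "*.*";

    for (i = 0; i < 4; i++) {
        int n;

        if (used >= cap) {
            return 0;
        }
        /* Format one piece; snprintf adds that piece's own null. */
        n = snprintf(buf + used, cap - used, "%s", parts[i]);
        if (n <= 0 || (size_t)n >= cap - used) {
            return 0;   /* error, empty piece, or truncated output */
        }
        used += (size_t)n + 1;
    }
    if (used >= cap) {
        return 0;
    }
    buf[used] = '\0';   /* the zero-length string that ends the list */
    return used + 1;
}
```

Each call to `snprintf` sees an ordinary null-terminated format string, so nothing gets cut short, and an empty description is rejected because it would end the list early.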

Here's a more subtle example:

    CString strFilter;
    strFilter.LoadString(g_hinst, IDS_FILE_FILTER);

There is no obvious double-null-termination bug here, but there is if you look deeper.

    BOOL CString::LoadString(UINT nID)
    {
      // try fixed buffer first (to avoid wasting space in the heap)
      TCHAR szTemp[256];
      int nCount = sizeof(szTemp) / sizeof(szTemp[0]);
      int nLen = _LoadString(nID, szTemp, nCount);
      if (nCount - nLen > CHAR_FUDGE)
      {
        *this = szTemp;
        return nLen > 0;
      }

      // try buffer size of 512, then larger size until entire string is retrieved
      int nSize = 256;
      do
      {
        nSize += 256;
        nLen = _LoadString(nID, GetBuffer(nSize - 1), nSize);
      } while (nSize - nLen <= CHAR_FUDGE);
      ReleaseBuffer();

      return nLen > 0;
    }

Observe that this function loads the string into a temporary buffer, and then if it succeeds, stores the result via `operator=`, which assumes a null-terminated string. If your string resource contains embedded nulls, `operator=` will stop at the first null.

The mistake here was taking a class designed for null-terminated strings and using it for something that isn't a null-terminated string. After all, it's called a `CString` and not a `CDoubleNullTerminatedString`.