Larry Osterman's WebLog

Confessions of an Old Fogey

My new "favorite" Win32 API

Every once in a while, you discover a new Win32 API that you've never heard of.  The other day, one of the guys in my group sent an email extolling the virtues of a new Win32 API that was added for Windows XP Professional x64 Edition and Windows Server 2003 SP1 (and of course Windows Vista).

To read a value from the registry, historically you called RegQueryValueEx.  Unfortunately, the RegQueryValueEx API suffered from a number of fatal problems.  The biggest one was that it didn't adequately type check the data being returned - for example, if the registry contained a string value, it was possible that the data in the registry might not be null terminated, resulting in the following warning in the documentation:

If the data has the REG_SZ, REG_MULTI_SZ or REG_EXPAND_SZ type, the string may not have been stored with the proper null-terminating characters. For example, if the string data is 12 characters and the buffer is larger than that, the function will add the null character and the size of the data returned is 13*sizeof(TCHAR) bytes. However, if the buffer is 12*sizeof(TCHAR) bytes, the data is stored successfully but does not include a terminating null. Therefore, even if the function returns ERROR_SUCCESS, the application should ensure that the string is properly terminated before using it; otherwise, it may overwrite a buffer. (Note that REG_MULTI_SZ strings should have two null-terminating characters, but the function only attempts to add one.)

Unfortunately, many people didn't implement this logic correctly (it's quite hard to get this right for all cases).  In addition to the null termination issue, the caller needed to deal with ANY data type being returned - you had to add in checks to ensure that the type of data returned matched the type of data you expected.  The root cause of this is a "leaky abstraction" issue - the NT base registry API simply stores blobs of data with the type information maintained as metadata alongside the data being stored.  Thus when you retrieve a value from the registry, you get the data in the underlying store and the metadata back.  But there's no attempt at ensuring that the metadata matches the intent of the application because the intent of the application isn't known.
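As a rough illustration of the checks a caller had to write by hand, here is a minimal, portable sketch. It is not code from the routine discussed in this post. The helper name `StringFromRegistryBytes` and the `kRegSz`/`kRegExpandSz` constants are made up for the example (the constants mirror the numeric values of REG_SZ and REG_EXPAND_SZ), and the sketch assumes the bytes came from a Unicode read on a little-endian machine. It does not expand environment variables.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Local stand-ins with the same numeric values as REG_SZ and REG_EXPAND_SZ.
constexpr std::uint32_t kRegSz = 1;
constexpr std::uint32_t kRegExpandSz = 2;

// Hypothetical helper: turn the (type, bytes) pair handed back by a Unicode
// registry read into a string. Returns false if the data is not a
// well-formed string value.
bool StringFromRegistryBytes(std::uint32_t type, const unsigned char* data,
                             std::size_t cb, std::u16string& out)
{
    assert(data != nullptr || cb == 0);

    // 1. The stored type is only metadata, so check it matches our intent.
    if (type != kRegSz && type != kRegExpandSz) return false;

    // 2. A UTF-16 string must occupy a whole number of 2-byte code units.
    if (cb % sizeof(char16_t) != 0) return false;

    // 3. Copy the raw bytes, then cut at the first null. The stored data may
    //    carry no terminator at all, exactly one, or several.
    std::u16string text(cb / sizeof(char16_t), u'\0');
    if (cb != 0) std::memcpy(&text[0], data, cb);
    const std::size_t nul = text.find(u'\0');
    if (nul != std::u16string::npos) text.resize(nul);

    out = text;
    return true;
}
```

Copying into a `std::u16string` sidesteps the "is there room for the terminator" question entirely, which is where much of the hand-written logic tends to go wrong.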

So a new API was added to the Windows API set that resolves these issues, RegGetValue.  I just converted a 50-line routine to use it; the entire 50-line routine turned into a one-line call to RegGetValue.  Using RegGetValue, I was able to remove:

  • The code that checked the type of data in the registry
  • The logic to handle REG_EXPAND_SZ (it's automatically handled by RegGetValue)
  • Code to ensure null termination of the registry string.
  • Code to validate that the length of the registry string was "appropriate" (a multiple of 2).

The bottom line was that I was able to remove a whole chunk of potentially buggy code and replace it with a single API call.  Heck, I didn't even need to open the registry key, since the RegGetValue API will even open and close the key for you (it opens the key for KEY_QUERY_VALUE if you care).
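For illustration, a call along those lines might look like the sketch below. This is not the routine from the post: the wrapper name `ReadUserString`, the fixed 260-character buffer, and the choice of HKEY_CURRENT_USER are assumptions made for the example. On non-Windows platforms the function compiles to a stub that always fails, so the snippet can be built anywhere.

```cpp
#include <cassert>
#include <string>

#ifdef _WIN32
#include <windows.h>
#pragma comment(lib, "advapi32.lib")
#endif

// Hypothetical wrapper: read a named string value under HKEY_CURRENT_USER.
// Returns false if the value is missing, is not a string, or does not fit
// in the fixed-size buffer used here.
bool ReadUserString(const wchar_t* subKey, const wchar_t* valueName,
                    std::wstring& out)
{
    // This sketch only handles named values under a subkey.
    assert(subKey != nullptr && valueName != nullptr);
#ifdef _WIN32
    wchar_t buffer[260] = {};
    DWORD cb = sizeof(buffer);  // size in bytes, not characters

    // RRF_RT_REG_SZ asks the API to fail unless the value is a string;
    // on success the returned data is null terminated.
    const LONG status = RegGetValueW(HKEY_CURRENT_USER, subKey, valueName,
                                     RRF_RT_REG_SZ, nullptr, buffer, &cb);
    if (status != ERROR_SUCCESS) return false;

    out.assign(buffer);
    return true;
#else
    (void)out;
    return false;  // stub: there is no registry on this platform
#endif
}
```

Note that no key is opened or closed explicitly, and there is no separate type check or termination fix-up after the call.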

  • For new API like this, I wish we could get wrappers such as the ones that deal with multi-monitors. Thus we can convert our source to use the new API and when the program is run on a legacy OS, it uses the supplied wrapper code instead of the real API.
  • Cool -- a brand new function, and I get to report three different bugs against the documentation! I know it's not your job Larry -- but please mention to your boss that the Microsoft documentation department clearly needs more bodies! The number of silly mistakes that I find is leaps and bounds higher than for any other "big" company.

    The bugs are:

    1. It doesn't cross-reference SHRegGetValue

    2. The RRF_RT values are clearly a bit field, but they are documented as an exclusive enum (that is, according to the documentation you can't specify both a REG_SZ and a REG_EXPAND_SZ).

    3. It doesn't mention that the environment variables are not expanded the same way that the shell expands them.
  • I do like this new API. The Reg* functions are horribly error-prone. The other big logic error I've seen is with RegEnum*, pcbData is set to the actual data length on return. Some people set it once and make multiple calls, then don't understand why the return data is being truncated (sometimes) on calls 2, 3, etc.

    Still, the potentially buggy code will have to stay there and will be the most frequent code path for years to come. I don't write Vista-only apps, so the "real" calling sequence would involve a wrapper that tries LoadLibrary+GetProcAddress for RegGetValue and then falls back to using RegQueryValueEx.
  • Speaking of Win32 APIs, I just discovered CreateFile() because OpenFile() was failing to create files > 128 characters.

    It takes more parameters, but it works. Sometimes I wish Microsoft would just make OpenFile() use CreateFile() when creating a file, so I could use either.

    -greg-
  • "For new API like this, I wish we could get wrappers such as the ones that deal with multi-monitors. Thus we can convert our source to use the new API and when the program is run on a legacy OS, it uses the supplied wrapper code instead of the real API."

    Agreed. Of course, if the new API can be implemented in terms of the old APIs, I have to wonder why it's an API in the first place. Making it an API introduces a new potential OS version dependency...
  • Ho ! Docs say it's implemented both as Unicode and ANSI.

    It's surprising that such new APIs are implemented as ANSI as well. I think Michael Kaplan once wrote that new APIs were usually implemented as Unicode only. Maybe this is one of the 'unusual' cases.
  • I'd love to be able to use this function, but I can't - because Windows XP32 will still be in use for about 10 years. (We're only just starting to think about removing Windows 98 support from our products, because a significant chunk of our customers still uses it.)

    Why can't this function be backported to at least 9x / 2k / XP32 in a service pack? Requiring the latest service pack for an OS is reasonable. Requiring that the user install a different OS isn't.

    Overall, it's great that this function is introduced, but it's not really going to help us for about a decade. :(
  • Greg: That limit for OpenFile is documented, and the API itself is only provided for compatibility with Win16 (which means behavior changes, like making it use CreateFile some of the time, are bad). Why on earth would you be using it in the first place?

    http://msdn.microsoft.com/library/en-us/fileio/fs/openfile.asp
  • Serge: I think that the ANSI version just converts lpSubKey and lpValue to Unicode and calls the Unicode version with that, like pretty much all the ANSI versions apparently do.

    I don't remember where I saw it, but I do remember reading that the ANSI versions of API functions are created automatically and just do that, converting inputs to Unicode and outputs back to ANSI.
  • The old function's documentation needs some bug reports too.

    > For example, if the string data is 12
    > characters and the buffer is larger than
    > that, the function will add the null
    > character

    So far so good. You need to take more care than Microsoft did in trying to figure out whether the buffer is larger than that, but anyway it looks OK up to this point.

    > and the size of the data returned is
    > 13*sizeof(TCHAR) bytes.

    That is true for Unicode but false for ANSI. Microsoft even went to the trouble of using the TCHAR macro and doing a computation but still didn't test it.

    In a Unicode compilation the size of the data returned is indeed 13*sizeof(TCHAR) bytes because that's 13 wchar_t elements and sizeof(TCHAR) is sizeof(wchar_t).

    In an ANSI compilation the 12 characters can occupy anywhere from 12 to 24 bytes, and with an appended null character that's anywhere from 13 to 25 bytes. The size of the data returned is anywhere from 13*sizeof(TCHAR) bytes to 25*sizeof(TCHAR) bytes because sizeof(TCHAR) is sizeof(char).

    > However, if the buffer is 12*sizeof(TCHAR)
    > bytes, the data is stored successfully

    Sometimes it is, sometimes it isn't.

    > but does not include a terminating null.

    That part of it is true again.

    Thursday, January 12, 2006 2:05 PM by Greg Wishart
    > Speaking of Win32 APIs, I just discovered
    > CreateFile() because OpenFile() was failing
    > to create files > 128 characters.

    That's OK. In an ANSI compilation CreateFile() can't open some existing files either.
  • A little bit too late. Who will use it anyway (and when, in 2020?) I wouldn't use it just to break compatibility with previous Windows versions...
  • One thing that’s been bugging me for a while is that many Windows API functions seem to be invitations for race conditions.

    For example, suppose we want to read a string from the registry. Clearly we will need to use RegQueryValueEx (or RegGetValue). But it wants a pointer to a buffer to store the data in. So let’s allocate a buffer.

    > char buf[20];
    > DWORD cb = sizeof(buf);
    > RegQueryValueEx(HKEY_CURRENT_USER, "Test", NULL, NULL, (LPBYTE)buf, &cb);

    Oh, but this won’t do. What if the data is longer?

    > DWORD cb;
    > RegQueryValueEx(HKEY_CURRENT_USER, "Test", NULL, NULL, NULL, &cb); // 1
    > char* buf = new char[cb];
    > RegQueryValueEx(HKEY_CURRENT_USER, "Test", NULL, NULL, (LPBYTE)buf, &cb); // 2
    > …
    > delete[] buf;

    Oops. Between points 1 and 2, the value got changed by a concurrent thread or process. And, of course, it grew. So we have a buffer for 317 characters, but the value is now 323 characters long. The second call will return ERROR_MORE_DATA.

    So, to ensure that we read the value, the whole value and nothing but the value, we now have to wrap the second call into a while (ERROR_MORE_DATA == result) loop, reallocating the buffer each time to accommodate the new value.

    Many other API functions follow the same pattern: you ask for the size, then allocate the buffer, then ask for the data, praying it didn’t grow while you were busy. How many applications out there actually follow the reallocation loop pattern? How many just allocate a static-sized buffer on the stack and thus impose an arbitrary limitation on the value length (and in the worst case have their buffer overrun and their stack smashed)?

    I do not know what the correct solution would be, design-wise. Maybe it would accept a flag that would cause the function to allocate the buffer for us, as FormatMessage does, and require us to free it afterwards with a specific API function. Or maybe opening a key for querying values would create a snapshot on which concurrent changes would have no effect, as versioning databases do. Or the concurrent process would get blocked until we’re done reading, as locking databases do.
  • What happens to those of us who want to write Win32 programs that run under Windows in general? Do we have to reimplement RegGetValue ourselves?
  • Sorry, but why is this being called an API, instead of just a function?

    I always thought that the term "API" referred to a related collection of functions and datatypes - so the "Win32 API" referred to *everything* in Win32, the "Win32 registry API" referred to the set of functions and datatypes in Win32 for dealing with the registry, etc...

    So, when I saw "new API", I thought it meant "a whole new set of functions and datatypes", and got excited. Talk about a let-down. :-)

    So, where does this usage come from? I've never been aware of it before. Is it common?
  • I believe RegGetValue is just the port of SHRegGetValue from shlwapi.dll to be exported from a lower level binary of the OS (thus making it available to a broader set of consumers, since not everyone can link to shlwapi). SHRegGetValue should be available as a public export from shlwapi.dll as far back as XPSP2 if that helps.
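
The reallocation loop described in the comment on race conditions above can be sketched in portable C++ as follows. This is only an illustration under stated assumptions: `QueryFn` is a hypothetical callback that behaves like a RegQueryValueEx call bound to one value, `FakeValue` is a made-up stand-in that simulates a value growing between calls, and `kErrorSuccess`/`kErrorMoreData` are local constants with the same numeric values as ERROR_SUCCESS and ERROR_MORE_DATA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <vector>

// Local stand-ins with the same numeric values as the Win32 status codes.
constexpr long kErrorSuccess = 0;
constexpr long kErrorMoreData = 234;

// Hypothetical callback with RegQueryValueEx-like size semantics:
//  - data == nullptr: store the required size in *cb and succeed
//  - *cb too small:   store the required size in *cb, return kErrorMoreData
//  - otherwise:       copy the value, store the bytes written in *cb
using QueryFn = std::function<long(unsigned char* data, unsigned long* cb)>;

// Read the whole value, retrying if it grew between the size query and the
// data query.
long ReadWholeValue(const QueryFn& query, std::vector<unsigned char>& out)
{
    assert(query);  // the callback must be set

    unsigned long cb = 0;
    long status = query(nullptr, &cb);  // point 1: ask for the size
    if (status != kErrorSuccess) return status;

    std::vector<unsigned char> buf(cb ? cb : 1);
    for (;;) {
        cb = static_cast<unsigned long>(buf.size());
        status = query(buf.data(), &cb);  // point 2: ask for the data
        if (status != kErrorMoreData) break;
        // The value grew in the meantime; cb now holds the new size.
        buf.resize(cb > buf.size() ? cb : buf.size() * 2);
    }
    if (status == kErrorSuccess) {
        buf.resize(cb);
        out.swap(buf);
    }
    return status;
}

// Made-up test double: each call sees the next "version" of the value.
struct FakeValue {
    std::vector<std::string> versions;
    std::size_t calls = 0;

    long operator()(unsigned char* data, unsigned long* cb)
    {
        const std::size_t i =
            calls < versions.size() ? calls : versions.size() - 1;
        ++calls;
        const std::string& v = versions[i];
        const unsigned long needed = static_cast<unsigned long>(v.size());
        if (data == nullptr) { *cb = needed; return kErrorSuccess; }
        if (*cb < needed) { *cb = needed; return kErrorMoreData; }
        std::memcpy(data, v.data(), needed);
        *cb = needed;
        return kErrorSuccess;
    }
};
```

In real code the callback would wrap the actual registry call. The loop still cannot guarantee a consistent snapshot; it only guarantees that the buffer was large enough for whatever the last successful call returned.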