Larry Osterman's WebLog

Confessions of an Old Fogey

Erroneous assumptions


Has anyone noticed that all of the Win32 documentation has something like this for each API:

Return Values

If the function succeeds, the return value is NO_ERROR.

If the function fails, the return value is one of the following error codes.

Value                      Meaning
ERROR_INVALID_PARAMETER    Something about the ERROR_INVALID_PARAMETER error
Other                      A system error code defined in WinError.h.

I can’t count the number of people who have complained about the last line in the table.  Why on Earth can’t Microsoft bother to document the errors returned from this API anyway?  Are they being stupid or something?

Actually the answer’s somewhat simpler.  We’ve been burned by doing this in the past, and we’re not willing to get burned again.

One of the pieces of memorabilia I have on my desk is a copy of the MS-DOS 2.0 reference manual (published by Microsoft in 1984).  On page 1-143, the description of the Create Handle API (the MS-DOS equivalent of open()) indicates that the API has the following return values:

    Carry set:
        AX = 3   Path not found
             4   Too many open files
             5   Access denied

That’s it.  Microsoft’s (and IBM’s) documentation specified the complete set of errors returned by all the DOS APIs.  We told all our customers that the ONLY error codes the INT 21h function 3Ch (Create Handle) API would return were 3, 4, and 5.  And you know what?  Our customers believed us, and they wrote their apps with that assumption.

Well, along came DOS 3.1, which added support for networking.  And with that came a whole host of ways for the APIs to fail.  Things like “Network path not found” (the file is on a server and the server’s down).  Or “Sharing Violation” (someone else has the file open and they’re not letting you access the file).

Originally, the DOS developers just returned the new error codes, thinking that most app authors were smart enough to realize that there might be other error codes returned from the APIs in the future.  And we started testing.

And we discovered just how wrong that assumption was.  Apps crashed left and right.  EVERYONE’S apps crashed.  Why?  Because Microsoft and IBM had told them that they would never see any errors other than 3, 4 or 5.  And since RAM was at an absolute premium on these machines, they didn’t waste valuable code space on useless features like error checking for errors that could never ever be generated.  When your app is going to be running on a machine with 64K of RAM, then defensive programming becomes an optional feature.
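To make the failure concrete, here is a minimal C sketch. It assumes an app that turned the three documented codes into a message table. The names `dos2_messages` and `dos2_message` are hypothetical and do not come from any real program. The range check in `dos2_message` is exactly the code those apps left out.

```c
#include <stddef.h>

/* DOS 2.0 documented exactly three failure codes for Create Handle,
   so a three-entry table looked complete. */
static const char *const dos2_messages[] = {
    "Path not found",      /* 3 */
    "Too many open files", /* 4 */
    "Access denied",       /* 5 */
};

#define DOS2_FIRST_ERROR   3
#define DOS2_MESSAGE_COUNT (sizeof dos2_messages / sizeof dos2_messages[0])

/* Returns the message for a DOS 2.0 error code, or NULL for any other code.
   Apps of the era typically wrote dos2_messages[err - 3] with no check; for
   a DOS 3.1 code such as 32 (sharing violation) that reads element 29 of a
   three-element array, which is undefined behavior and in practice a crash. */
const char *dos2_message(int err)
{
    size_t index;

    if (err < DOS2_FIRST_ERROR)
        return NULL;
    index = (size_t)(err - DOS2_FIRST_ERROR);
    if (index >= DOS2_MESSAGE_COUNT)
        return NULL; /* a code the DOS 2.0 manual never mentioned */
    return dos2_messages[index];
}
```

With the check in place, a DOS 3.1 code such as 32 comes back as NULL. The caller can then fall back to a generic "error 32" message instead of reading past the end of the table.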

So Microsoft invented the DOS error mapping table.  It defined a mapping from all the new error codes into the DOS 2.0 set of error codes.  To find the REAL error code, you called the “Get Extended Error” API, which returned the “real” reason for the failure.
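The mechanism can be sketched in C as a lookup table plus a saved "real" error. Everything here is an assumption for illustration. The function names are hypothetical. The two mappings (sharing violation to access denied, network path not found to path not found) are plausible guesses, not entries copied from the real DOS table. The numeric codes are the standard DOS/Win32 values.

```c
#include <stddef.h>

/* The DOS 2.0 codes that old apps were told to expect. */
enum {
    DOS_PATH_NOT_FOUND      = 3,
    DOS_TOO_MANY_OPEN_FILES = 4,
    DOS_ACCESS_DENIED       = 5
};

/* Two of the codes that arrived with networking in DOS 3.1. */
enum {
    DOS_SHARING_VIOLATION = 32,
    DOS_BAD_NETPATH       = 53
};

struct error_mapping {
    int real_code;   /* what actually went wrong */
    int legacy_code; /* what a DOS 2.0 app is shown */
};

/* Hypothetical mapping entries, chosen for illustration only. */
static const struct error_mapping mapping_table[] = {
    { DOS_SHARING_VIOLATION, DOS_ACCESS_DENIED  },
    { DOS_BAD_NETPATH,       DOS_PATH_NOT_FOUND },
};

/* The real reason for the most recent failure. */
static int last_extended_error = 0;

/* Records the real error and returns a code from the DOS 2.0 set. */
int report_legacy_error(int real_code)
{
    size_t i;

    last_extended_error = real_code;
    for (i = 0; i < sizeof mapping_table / sizeof mapping_table[0]; i++) {
        if (mapping_table[i].real_code == real_code)
            return mapping_table[i].legacy_code;
    }
    if (real_code >= DOS_PATH_NOT_FOUND && real_code <= DOS_ACCESS_DENIED)
        return real_code;     /* already a DOS 2.0 code: pass it through */
    return DOS_ACCESS_DENIED; /* assumed catch-all for unmapped codes */
}

/* Stand-in for the "Get Extended Error" call: returns the unmapped code. */
int get_extended_error(void)
{
    return last_extended_error;
}
```

An old app sees `report_legacy_error(32)` return 5 and handles it like any other access-denied failure. A newer app that knows to ask calls `get_extended_error()` and gets 32.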

This table still exists in the Longhorn source tree (just for grins, I looked it up the other day).  It’s in the NTVDM logic, so it’s not a part of any of the Win32 logic, but the bottom line is that it’s still there.  And it’s likely that we’ll never be able to get rid of it (at a minimum, we’re not going to be able to get rid of it until we get rid of the 16 bit DOS support, which isn’t gonna happen anytime soon).

And ever since then, Microsoft has refused to completely document the error codes from its APIs.  By not documenting the complete set of error codes possible, it moves the onus of handling new error codes from Microsoft to the application author, where it belongs.
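In practice, putting the onus on the application means every error switch needs a branch for codes you have never seen. Here is a minimal C sketch, assuming a hypothetical caller that is loading a settings file. The constants are defined locally with their WinError.h values so the example compiles without windows.h. `choose_recovery` is an illustrative name, not a real API.

```c
#include <stdio.h>

/* Values as in WinError.h, defined here so the sketch builds anywhere. */
#define ERROR_FILE_NOT_FOUND     2UL
#define ERROR_PATH_NOT_FOUND     3UL
#define ERROR_ACCESS_DENIED      5UL
#define ERROR_SHARING_VIOLATION 32UL

enum recovery {
    RECOVERY_USE_DEFAULTS, /* no settings file yet: start from defaults */
    RECOVERY_ASK_USER,     /* permissions or locking: tell the user */
    RECOVERY_FAIL          /* anything else: report the code and give up */
};

/* Maps the error from a failed open of a settings file to a recovery step. */
enum recovery choose_recovery(unsigned long error)
{
    switch (error) {
    case ERROR_FILE_NOT_FOUND:
    case ERROR_PATH_NOT_FOUND:
        return RECOVERY_USE_DEFAULTS;
    case ERROR_ACCESS_DENIED:
    case ERROR_SHARING_VIOLATION:
        return RECOVERY_ASK_USER;
    default:
        /* Unlisted codes are expected, not impossible: log and fail. */
        fprintf(stderr, "unexpected error %lu\n", error);
        return RECOVERY_FAIL;
    }
}
```

An unrecognized code such as 9874 lands in the `default` branch. It is reported as-is rather than being mistaken for one of the failures the caller knows how to fix.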

 

  • That's no excuse for not documenting the errors that do exist and keeping the documentation up to date.
  • I have to say that it's a poor excuse.

    A strong note saying that current documentation cannot pre-empt future error codes (and noting the "current" version being documented) would have been all that was needed - and possibly suggesting using elseif/case else to handle such things.

    Not that I don't have sympathy with the position that IBM and Microsoft found themselves in, you understand. But this does smack of sloppy documentation in both the original case and the fix...

    (And I hated the Get Extended Error API because of this. So I guess I'm quite, quite biased. *grins*)
  • Did I say that there was? If so, then I apologize.

    On the other hand, I still FIRMLY claim that Microsoft should only document the conditions that are NORMAL - in other words, the error conditions where there is a reasonable degree of certainty of what the application should do when it encounters the error.

    I spent about 10 minutes looking for documentation for the system error codes and found it at: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/debug/base/system_error_codes.asp

    This page is the definitive reference for all the Win32 errors. And ANY Win32 API can return ANY of these errors.

    To my knowledge, this table is pretty up-to-date.
  • No, I guess I should apologise. :-)

    You didn't say it was an excuse, or anything of the kind. And I'm not attacking you when I say it's a poor excuse. I'm just being very sloppy and provocative with my language. *sighs*
    (I'm in the UK, and it's 02:20 as I write this - maybe I should get some sleep, but I'm not tired and I'm addicted to RSS feeds... *grins*)

    I guess we got the impression that it was an attempt to excuse it - not perhaps from you personally, but more as a "corporate line" that's been perpetuated ever since. I know that Microsoft (and IBM) take compatibility very seriously for new releases, so I can understand why they took such actions.

    But it seems to me that to get burned in that way - and have to create a new API to compensate - is just a complex way to get yourself burned again. If the documentation for GetExtendedError isn't clear and doesn't tell people to look for future errors, then there's a chance that you're going to have to implement GetExpandedExtendedError in the next version, because of stupid programs that are doing the same thing but with the new API. The solution in the first place may well have to be an API - but the long-term solution is better documentation. To direct people to a header file is not my idea of better documentation - it's only going to disappoint readers.

    As I said, I think a warning about how to code properly and a note on the current version would be more appropriate. Then you can put a table of values below that. In future versions, you can add a column saying which version each item was added in. That would produce truly useful documentation, and there would be no complaints.

    (OK, fewer complaints. We all know that complaints are like zero-point energy - there even in a vacuum, just not quite so noticeable.)

    Of course, it's not the OS writer's fault - it's boneheaded programmers who are writing for performance at the cost of stability. (Now there's an argument I don't want to get into... *grins*)

    Anyway, sorry if I annoyed you. Thanks for the polite rebuff, and the ten minutes of research. :-)
  • Hey, it's no big deal :) You didn't annoy me in the least (actually, to be honest, I hadn't read your response when I posted my response to joe's original comment :)). I know how frustrating it can be dealing with the documentation (hey, I use MSDN all the time at work).

    Especially when you get error 9874 back from the MumbleFrotz() API and have absolutely no idea where that came from.
  • Hmm. I should have dug a bit deeper when I was writing the original article. A few links from the URL posted above, I ran into the following:
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/debug/base/error_handling_reference.asp

    which is the generic MSDN description of how to deal with errors of all sorts. It's actually worth reading :).
  • The most annoying problem I see is not the lack of a complete list of all the errors that could possibly be returned by some API function. I would much rather have more details about the possible causes of at least the most common errors for _that specific function_. Something like
    ERROR_INVALID_PARAMETER ... dwSize does not contain the proper value, or dwAccess has an invalid value
    ERROR_ILLEGAL_ACCESS ... dwAccess requests an access mode that is not ...blah blah

    You know -- I can find in MSDN that ERROR_ILLEGAL_ACCESS means illegal access. But that does not help much, does it... In 1000 functions, that error code means 1000 different problems, and it would be fine if each function specified what it means by that error code.

    (Such notes are present for _some_ functions, but often they only copy the generic error code description, and even more often they are not present at all.)
  • On the point of 16-bit DOS compatibility, why exactly is the DOS error mapping table still required by Longhorn?
  • Hmm ... How about a more serious problem?
    .NET Framework/MSDN documentation sometimes does not bother to list all the exceptions raised.

    This is a much bigger problem than error codes, which everybody can ignore. Exceptions cannot be ignored and may cause some really tricky things to happen.
  • Petr, my entire point is that providing such a table is a recipe for disaster. It's far better to follow the lead of the documentation for HttpReceiveHttpRequest (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/http/http/httpreceivehttprequest.asp) - document some of the error conditions but not all of them.

    Grant - the DOS error mapping table is required because the very same DOS applications that caused us to add it back in DOS 3.1 will continue to run on Longhorn under the NTVDM. So as long as those applications are going to be supported, the table is supported.

    AT - An interesting point. At some point in the future, I'll write about exceptions and exception handling. For a start, you should check out: http://weblogs.asp.net/mgrier/archive/2004/02/18/75324.aspx
  • Another problem with not listing the error codes is that sometimes you won't have a copy of WinError.h -- I code mostly in Delphi, so referring me to a C header file is of little use. (Well, it is of use because I do actually have a copy, but you get the idea.)
  • The good news is that you don't need winerror.h. See: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/debug/base/system_error_codes.asp
  • When you code in Delphi, you develop a habit of wrapping every possible Win32 API function call in a Win32Check(), if it’s a function that returns a BOOL. If it returns something you need and it can be a special error value, you do something like:

    var hf: THandle;
    […]
    hf := CreateFile(…);
    Win32Check(INVALID_HANDLE_VALUE <> hf);
    try
      // Now do what you want with hf
    finally
      CloseHandle(hf);  // CloseFile is for Pascal file variables; a raw handle needs CloseHandle
    end;

    Under the hood, the function checks its parameter, and if it is True, returns True immediately. If False, it calls GetLastError() and then FormatMessage(), and then raises an exception of class EOSError with the message returned from FormatMessage().

    Actually, I do that in C++ too.
  • That was exactly my idea! ("at least the most common errors") The solution of HttpReceiveHttpRequest seems almost perfect to me. I am not against the "Other ... A system error code defined in WinError.h" line, as I understand the reasons now (after reading your article). ;-)

    (A little bit of experience: some time ago, I experienced strange error codes returned by LoadLibrary, like ERROR_NOACCESS, meaning "Invalid access to memory location.". That description didn't help much... after some googling, I learned that such an error is returned when DllMain raises an exception, or something like that. It would be fine if such information was present on the help page of LoadLibrary...)
  • One of the many problems in this area (I feel like we're just starting to realize how f.... messed up error reporting and propagation are) is that software tends to be layered.

    What are all the error codes that kernel32!CreateFileW() could return? Well, let's see... oh maybe the user mode part allocates some memory so there's ERROR_OUT_OF_MEMORY. The conversion from a win32 path to an nt path may generate ERROR_INVALID_PATH. The same may happen once the path is passed to the kernel to start parsing it... oh but wait! Once the path reaches a filesystem, the filesystem finishes parsing it. What errors does it return? Who knows! If we started restricting what errors a filesystem could return, probably someone would feel it's anti-competitive...

    Layering makes this kind of problem really hard. The CLR explicitly chose not to have exception specifications because it tended to make everyone write "try { pFoo->Bar(); } catch (Exception e) { throw new MyException(e); }". People don't have to do this now but instead random exceptions are propagated from underneath. Is that FileNotFound exception that you caught for the right file and was it thrown from the context that means that the file you were looking for wasn't present?

    That said, we should do better. A great Win32 example of this is GetFileAttributesW(). About 70% of the code in the world seems to just assume that when this fails, it means the file is not present. That's pretty bad - under stress, apps start to act mysteriously. The rest of the code tries valiantly to do the right thing, but uh, what *is* the right thing?

    ERROR_FILE_NOT_FOUND. Yup, the file wasn't found. ERROR_PATH_NOT_FOUND. Hmm... what's up there? Well maybe this is OK. File not found again. ERROR_BAD_NETPATH. Hmmm... the server's there but the share isn't available right now. Does this mean the file isn't there or that you couldn't tell if it was there? ERROR_BAD_NET_NAME. The machine can't be reached. I can't figure out if the file exists or not at all! Do I treat this as file not found (possibly influencing a security decision!)?

    I don't have a good answer here, but getting better documentation is only a step in the right direction. It's not clear what the "best" world is as long as you have extensible systems whose implementations evolve.

    My discussion on exceptions concludes that the only safe exception is the one you don't catch. So what do you do about these kind of "alternate success" cases? Trying to enumerate them doesn't scale.
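The GetFileAttributesW() dilemma in the last comment can at least be made explicit by returning three answers instead of two. This is a minimal C sketch under stated assumptions:

- `classify_attributes_result` and its enum are hypothetical names.
- The caller captures GetLastError() right after the call fails.
- Treating ERROR_PATH_NOT_FOUND as "absent" is a policy choice, not something the API promises.

```c
#include <stdbool.h>

/* Codes mentioned in the comment (values as in WinError.h). */
#define ERROR_FILE_NOT_FOUND  2UL
#define ERROR_PATH_NOT_FOUND  3UL
#define ERROR_BAD_NETPATH    53UL
#define ERROR_BAD_NET_NAME   67UL

enum existence {
    FILE_EXISTS,
    FILE_ABSENT,
    FILE_UNKNOWN /* the query failed for a reason that says nothing about existence */
};

/* Interprets the outcome of an attributes query (GetFileAttributesW on Windows).
   'succeeded' is false when the call returned INVALID_FILE_ATTRIBUTES;
   'error' is the GetLastError() value captured immediately afterwards. */
enum existence classify_attributes_result(bool succeeded, unsigned long error)
{
    if (succeeded)
        return FILE_EXISTS;
    switch (error) {
    case ERROR_FILE_NOT_FOUND:
    case ERROR_PATH_NOT_FOUND: /* assumed: a missing parent implies a missing file */
        return FILE_ABSENT;
    case ERROR_BAD_NETPATH:
    case ERROR_BAD_NET_NAME:
    default:
        /* Network failures, access denied, low memory, codes added in future
           releases: we could not find out, so do not claim "absent". */
        return FILE_UNKNOWN;
    }
}
```

A caller making a security decision would treat FILE_UNKNOWN as a failure rather than as "not present". That keeps the stress-time misbehavior described above from turning into a wrong answer.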