What's Up With Hungarian Notation?

I mentioned Hungarian Notation in my last post -- a topic of ongoing religious controversy amongst COM developers. Some people swear by it, some people swear about it.

The anti-Hungarian argument usually goes something like this:

"What is the point of having these ugly, hard-to-read prefixes in my code which tell me the type? I already know the type because of the declaration! If I need to change the type from, say, unsigned to signed integer, I need to go and change every place I use the variable in my code. The benefit of being able to glance at the name and know the declaring type is not worth the maintenance headache."

For a long time I was mystified by this argument, because that's not how I use Hungarian at all. Eventually I discovered that there are two completely contradictory philosophical approaches to Hungarian Notation. Unfortunately, each can be considered "definitive", and the bad one is in widespread use.

The one I'll call "the sensible philosophy" is the one actually espoused by Charles Simonyi in his original article. Here's a quote from Simonyi's paper:

The basic idea is to name all quantities by their types. [...] the concept of "type" in this context is determined by the set of operations that can be applied to a quantity. The test for type equivalence is simple: could the same set of operations be meaningfully applied to the quantities in question? If so, the types are thought to be the same. If there are operations that apply to a quantity in exclusion of others, the type of the quantity is different. [...] Note that the above definition of type [...] is a superset of the more common definition, which takes only the quantity's representation into account. Naturally, if the representations of x and y are different, there will exist some operations that could be applied to x but not y, or the reverse.

(Emphasis added.)

What Simonyi is saying here is that the point of Hungarian Notation is to extend the concept of "type" to encompass semantic information in addition to storage representation information.
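
For example (a sketch of my own, with invented names, using standard Hungarian prefixes): all three of the variables below share the same storage type, int, yet they are three distinct types in Simonyi's sense, because a different set of operations is meaningful on each.

int cchName;    // count of characters: compare against string lengths
int ichFirst;   // index of a character: use as a subscript into a string
int cbBuffer;   // count of bytes: pass to a memory allocator

Adding cchName to cbBuffer is perfectly legal as far as the representation goes -- they are both ints -- but it is a type error in the semantic sense, and the prefixes make that visible right at the point of use.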

There is another philosophy which I call "the pointless philosophy". That's the one espoused by Charles Petzold in "Programming Windows". On page 51 of the fifth edition he says:

Very simply, the variable name begins with a lowercase letter or letters that denote the data type of the variable.  For example [...] the i prefix in iCmdShow stands for "integer".

And that's all! According to Petzold, Hungarian is for connoting the storage type of the variable.

All of the arguments raised by the anti-Hungarians (with the exception of "it's ugly") are arguments against the pointless philosophy! And I agree with them: that is in fact a pointless interpretation of Hungarian notation which is more trouble than it is worth.

But Simonyi's original insight is extremely powerful! When I see a piece of code that says

iFoo = iBar + iBlah;

I know that there are a bunch of integers involved, but I don't know the semantics of any of these. But if I see

cbFoo = cchBar + cbBlah;

then I know that there is a serious bug here! Someone is adding a count of bytes to a count of characters, which will break on any Unicode or DBCS platform. Hungarian is a concise notation for semantics like "count", "index", "upper bound", and other common programming concepts.
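
To make the failure mode concrete, here is a minimal sketch of my own (the function and variable names are invented) showing how a cch/cb mix-up breaks on a Unicode platform, where each character occupies more than one byte:

#include <stdlib.h>
#include <string.h>
#include <wchar.h>

void CopyName(const wchar_t *wszName)
{
    size_t cchName = wcslen(wszName);             // count of characters
    size_t cbName  = cchName * sizeof(wchar_t);   // count of bytes

    // BUG: malloc takes a count of bytes, but cchName is a count of
    // characters. The mismatch -- a cch passed into a cb slot -- is
    // visible in the names alone.
    wchar_t *wszCopy = (wchar_t *)malloc(cchName);

    memcpy(wszCopy, wszName, cbName);             // overruns the buffer
    free(wszCopy);
}

The prefixes make the defect reviewable without running anything: malloc consumes a cb, so handing it a cch should look wrong on sight.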

In fact, back in 1996 I changed every variable name in the VBScript string library to have its proper Hungarian prefix. I found a considerable number of DBCS and Unicode bugs just by doing that, bugs which would have taken our testers weeks to find by trial and error.

By using the semantic approach rather than the storage approach we eliminate the anti-Hungarian arguments:

I already know the type because of the declaration!

No, the Hungarian prefix tells you the semantic usage, not the storage type. A cBar is a count of Bars whether the storage is a ushort or a long.

If I need to change the type from, say, unsigned to signed integer, I need to go and change every place I use the variable in my code.

Annotate the semantics, not the storage. If you change the semantics of a variable then you need to also change every place it is used!

The benefit of being able to glance at the name and know the declaring type is not worth the maintenance headache.

But the benefit of knowing that you will never accidentally assign indexes to counts, or add apples to oranges, is worth it in many situations.
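
For instance (hypothetical names, a sketch of my own), the index/count confusion is the classic off-by-one generator, and the prefixes surface it immediately:

int main(void)
{
    int rgItems[100] = {0};   // rg: array ("range") of items
    int cItems = 10;          // c: count of items currently in use

    int iLast = cItems;           // reads as wrong: a count assigned to an
                                  // index (cItems - 1 is what was meant)
    int last  = rgItems[iLast];   // reads one past the last live element
    return last;
}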

UPDATE: Joel Spolsky has written a similar article: Making Wrong Code Look Wrong. Check it out!

  • Another comment about 'I already know the type because of the declaration' is: often the declaration and use are distant from each other. The benefit of using Hungarian in that case is that you don't have to look at a line of code and then grovel around to find out how the variables are declared.
  • Though you make a good point, I'd counter that by saying that (a) many developers now use some pretty sophisticated editors that can find the declaration very quickly, and (b) I try to keep my routines under three screens long. If it's a local variable, the type is nearby. If it's a member variable or a global variable then sure, sometimes it is a pain to find the storage type, but really, how often do you care whether that counter is a UINT or a DWORD?
  • I've always thought that it would be handy to make that sort of thing a compile-time error. Describe the set of prefixes you plan to use and how they can be associated with each other, then make the compiler scream at you if you do something Wrong.
  • There's always the Ada way of doing things: creating new integer types and new subtypes. However, I suspect that most of us don't have the patience for that. As far as I'm aware, there aren't any mainstream languages apart from Ada that allow us to declare the exact valid range of an integer variable, or differentiate apples from oranges within the type system (i.e. with support from the compiler). There's less need for Hungarian prefixes in a strong type system like Ada's, or with the user-defined type systems of C++, because the compiler can tell you when you're making mistakes (see the sketch after the comments). However, if you have conversion operators and alternate constructors in C++, there's a chance of introducing type errors.
  • My primary argument for Hungarian has to do with making code easier to read. Even if the declaration of a local/member/global variable is just a screen away, that's still too far if I'm trying to quickly figure out this line of code where the debugger dropped me. My philosophy for Hungarian notation actually falls between your "sensible" and "pointless" philosophies. I don't look to Hungarian to tell me whether an integer is 16- or 32-bit, signed or unsigned, but I also don't look to it to tell me much semantic information -- I figure the actual variable name is good for that. I do want Hungarian to tell me that the variable is an integer (versus a real number, or a Boolean, or a rectangle, or a pen, or whatever). So "nChildren" is probably the number of children, whereas "bChildren" is probably true if there are any children, and "astrChildren" is probably an array of their names. It communicates enough of the type to know what the basic operations on the variable are, and, combined with the variable name, gives solid semantic information.
  • As no-one has commented on this yet, I will note that there is no argument against using it in a language like JScript where there are variants involved. It makes the code 10 times easier to read and looks pretty cool to boot :)
  • << ...there is no argument against using it in a language like JScript where there are variants involved. It makes the code 10 times easier to read and looks pretty cool to boot... >>

    Ugh. JScript is the WORST kind of language in which to use Hungarian notation, as there is no way to enforce typing.

    And I disagree with claims that Hungarian Notation makes code easier to read.
  • I love Hungarian for all the right reasons. I agree with the blogger's comments, but you've got to start somewhere...

    http://CodeInsight.com/Docs/Programming Standards.doc
  • If you do use Hungarian, it also seems like you need to set up some standards so everyone uses the SAME prefixes (however you plan on using them). If you have multiple people using different prefixes for the same thing, then you still end up having to go look at the definition because you don't know what the stupid thing is. Also, we'd always run into problems with people manufacturing prefixes for classes they create, so you end up with wpbmObject or something indecipherable like that.

    I used to use it but we hit so many problems due to people making up their own rules that we don't use it anymore. At first I was pretty skeptical about not using it but as long as you take care and name things appropriately it's not really missed. Of course if people named things appropriately in the first place then using it would probably not have caused as many problems as it did.
  • Hungarian doesn't have any place in production code. It goes beyond "messy" or "dirty". Most variable names imply their data type: ID is an int, Name is a string, and so on. For complex data types, you get into trouble when people start using their own abbreviations. Does class People abbreviate to pplPerson? Or plePerson? And what if there's a pointer, ppple? Also, most IDEs nowadays (especially VS) provide abundant hover-over information that tells you the data type, among other things.

    Redundancy is bad.

  • Hungarian notation is only useful if you name your variables things like "blah" and "foo". If you give your variables good, descriptive names, this is a complete non-issue.
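
On the C++ point raised in the comments above, here is a minimal sketch (my own illustration; the wrapper and tag names are invented) of how user-defined types can turn the byte-count/character-count confusion into a compile-time error rather than a naming convention:

#include <cstddef>

// Distinct wrapper types for byte counts and character counts. The tag
// parameter makes ByteCount and CharCount unrelated types to the compiler.
template <typename Tag>
struct Count {
    std::size_t value;
    explicit Count(std::size_t v) : value(v) {}
    Count operator+(Count other) const { return Count(value + other.value); }
};

struct ByteTag {};
struct CharTag {};
using ByteCount = Count<ByteTag>;
using CharCount = Count<CharTag>;

int main()
{
    ByteCount cbFoo(10);
    CharCount cchBar(5);
    ByteCount cbBlah(3);

    ByteCount cbSum = cbFoo + cbBlah;       // fine: bytes plus bytes
    // ByteCount cbBad = cchBar + cbBlah;   // compile error: the types don't mix
    (void)cbSum;
    (void)cchBar;
    return 0;
}

The design choice is the empty tag parameter: ByteCount and CharCount have identical representations, but the compiler treats them as unrelated types, so accidentally mixing them in arithmetic simply fails to compile -- the same check the cb/cch prefixes ask a human reviewer to perform.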
