Larry Osterman's WebLog

Confessions of an Old Fogey
Hungarian notation - it's my turn now :)

Following on the heels of Eric Lippert’s posts on Hungarian and of course Rory Blyth’s classic “Die, Hungarian notation… Just *die*”, I figured I’d toss my hat into the fray (what the heck, I haven’t had a good controversial post in a while).

One thing to keep in mind about Hungarian is that there are two totally different Hungarian implementations out there.

The first one, which is the one that most people know about, is “Systems Hungarian”.  Systems Hungarian is also “Hungarian-as-interpreted-by-Scott-Ludwig” (Edit: For Scott's side of this comment, see here - the truth is better than my original post).  In many ways, it’s a bastardization of “real” (or Apps) Hungarian as proposed by Charles Simonyi.

Both variants of Hungarian have two things in common.  The first is the concept of a type-related prefix, and the second is a suffix (although Systems Hungarian rarely uses the suffix, if at all).  But the prefix is where the big difference lies.

In Systems Hungarian, the prefix for a type is almost always related to the underlying data type.  So a parameter to a Systems Hungarian function might be “dwNumberOfBytes” – the “dw” prefix indicates that the type of the parameter is a DWORD, and the “name” of the parameter is “NumberOfBytes”.  In Apps Hungarian, the prefix is related to the USE of the data.  The same parameter in Apps Hungarian is “cb” – the “c” prefix indicates that the parameter is a count, and the “b” indicates that what it counts is bytes.

Now consider what happens if the parameter is the number of characters in a string.  In Systems Hungarian, the parameter might be “iMaxLength”.  It might be “cchWideChar”.  There’s no consistency between different APIs that use Systems Hungarian.  But in Apps Hungarian, there is only one way of representing the parameter; the parameter would be “cch” – the “c” prefix again indicates a count, the “ch” type indicates that it’s a character.
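
To make the two count prefixes concrete, here is a minimal sketch. The helper names CbFromCch and CchFromCb are made up for illustration (they are not Win32 APIs); the only point is that a “cb” is always a count of bytes and a “cch” is always a count of characters, so a conversion between them is visible in the names.

```cpp
#include <cstddef>   // std::size_t

// Hypothetical helpers for illustration only, not part of any real API.
// cch = count of characters, cb = count of bytes.

// Convert a count of wide characters to a count of bytes.
inline std::size_t CbFromCch(std::size_t cch)
{
    return cch * sizeof(wchar_t);
}

// Convert a count of bytes to a count of whole wide characters.
inline std::size_t CchFromCb(std::size_t cb)
{
    return cb / sizeof(wchar_t);
}
```

With names like these, a line such as cchName = cbName reads as suspicious at a glance, while cchName = CchFromCb(cbName) reads as correct.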

Now please note that most developers won’t use “cch” or “cb” as parameters to their routines in Apps Hungarian.  Let’s consider the Win32 lstrcpyn function:

 LPTSTR lstrcpyn(
     LPTSTR lpString1,
     LPCTSTR lpString2,
     int iMaxLength
 );

This is the version in Systems Hungarian.  Now, the same function in Apps Hungarian:

 LPTSTR Szstrcpyn(
     LPTSTR szDest,
     LPCTSTR szSrc,
     int cchLen
 );

Let’s consider the differences.  First off, the name of the function changed to reflect the type returned by the function: since it returns an LPTSTR, which is a variant of a string, the function name changed to “SzXxx”.  Second, the names of the first two parameters changed.  Instead of “lpString1” and “lpString2”, they changed to the more descriptive “szDest” and “szSrc”.  The “sz” prefix indicates that the variable is a null terminated string.  The “Src” and “Dest” are standard suffixes, which indicate the “source” and “destination” of the operation.  The iMaxLength parameter, which indicates the maximum number of characters (TCHARs) to copy, is changed to cchLen: the “cch” prefix indicates that it’s a count of characters, and the standard “Len” suffix indicates that it’s a length to be copied.

The interesting thing that happens when you convert from Systems Hungarian to Apps Hungarian is that now the usage of all the parameters of the function becomes immediately clear to the user.  Instead of the parameter name indicating the type (which is almost always uninteresting), the parameter name now contains indications of the usage of the parameter.
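
As a rough sketch of what usage-carrying names buy you inside a routine, here is a small bounded copy written in that style. SzCopyN and its parameter names are hypothetical, chosen only for illustration; it works on plain char strings and is not the Win32 function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical routine for illustration; not a real API.
//   szDest  : destination buffer, null terminated on return
//   szSrc   : null-terminated source string
//   cchDest : count of characters szDest can hold, terminator included
// Returns szDest, which is why the function name starts with "Sz".
char *SzCopyN(char *szDest, const char *szSrc, std::size_t cchDest)
{
    assert(szDest != nullptr && szSrc != nullptr && cchDest > 0);

    std::size_t ich = 0;   // ich = index of a character
    while (ich + 1 < cchDest && szSrc[ich] != '\0')
    {
        szDest[ich] = szSrc[ich];
        ++ich;
    }
    szDest[ich] = '\0';    // always leave room for, and write, the terminator
    return szDest;
}
```

Every name in the loop says what it is for: an index into characters is compared against a count of characters, and nothing needs to be looked up in the declaration.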

The bottom line is that when you’re criticizing Hungarian, you need to understand which Hungarian you’re really complaining about.  Hungarian as defined by Simonyi isn’t nearly as bad as some have made it out to be.

This is not to say that Apps Hungarian was without issue.  The original Hungarian specification was written by Doug Klunder in 1988.  One of the things that was missing from that document was a discussion about the difference between “type” and “intent” when defining prefixes.  This can be a source of great confusion when defining parameters in Hungarian.  For example, if you have a routine that takes a pointer to a “foo” as a parameter, and internally the routine treats the parameter as a pointer to a single foo, it’s clear that the parameter name should be “pfoo”.  However, if the routine treats the parameter as an array of foos, the original document was not clear about what should happen: should the parameter be “pfoo” or “rgfoo”?  Which wins, intent or type?  To me, there’s no argument, it should be intent, but there have been some heated debates about this over the years.  The current Apps Hungarian document is quite clear about this: intent wins.
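
The “intent wins” rule can be sketched as follows. The Foo type and both routines are invented for illustration; the point is that the two parameters have the same C type (a pointer to Foo) but different names, because the routines use them differently.

```cpp
#include <cstddef>

struct Foo { int n; };   // hypothetical type, for illustration only

// The routine treats its argument as ONE Foo, so the name is pfoo.
void ResetFoo(Foo *pfoo)
{
    pfoo->n = 0;
}

// Same underlying type, but the routine treats it as an ARRAY of Foo,
// so the name is rgfoo. cfoo is the count of elements and ifoo is an
// index into rgfoo.
int SumFoos(const Foo *rgfoo, std::size_t cfoo)
{
    int nSum = 0;
    for (std::size_t ifoo = 0; ifoo < cfoo; ++ifoo)
    {
        nSum += rgfoo[ifoo].n;
    }
    return nSum;
}
```

A caller who sees SumFoos(rgfoo, cfoo) knows an array and its length are expected, which the compiler alone would never tell them.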

One other issue with the original document was that it predated C++.  So concepts like classes weren’t really covered and everyone had to come up with their own standard.  At this point those issues have been resolved.  Classes don’t have a “C” prefix, since a class is really just a type.  Members have “m_” prefixes before their actual name.  There are a bunch of other standard conventions but they’re relatively unimportant.
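
Here is a small sketch of what those class conventions might look like in practice. The CharBuffer class is hypothetical, and the accessor names are my own guess at the style, not taken from the Apps Hungarian document.

```cpp
#include <cstddef>

// Hypothetical class for illustration. Note: no "C" prefix on the class
// name, and every data member carries an "m_" prefix before its
// Hungarian name.
class CharBuffer
{
public:
    CharBuffer(const char *rgch, std::size_t cch)
        : m_rgch(rgch), m_cch(cch)
    {
    }

    // Count of characters in the buffer.
    std::size_t Cch() const { return m_cch; }

    // Character at index ich, or '\0' if ich is out of range.
    char ChAt(std::size_t ich) const
    {
        return (ich < m_cch) ? m_rgch[ich] : '\0';
    }

private:
    const char *m_rgch;   // array of characters (not owned by this class)
    std::size_t m_cch;    // count of characters in m_rgch
};
```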

I used Hungarian exclusively when I was in the Exchange team; my boss was rather a Hungarian zealot and he insisted that we code in strict Apps Hungarian.  Originally I chafed at it, having always assumed that Hungarian was stupid, but after using it for a couple of months, I started to see how it worked.  It certainly made more sense than the Hungarian I saw in the Systems division.  I even got to the point where I could understand what an irgch was (an index into an array of characters) without even flinching.

Now, having said all that, I don’t use Hungarian these days.  I’m back in the systems division, and I’m using a home-brewed coding convention that’s based on the CLR standards, with some modifications I came up with myself (local variables are camel cased, parameters are Pascal cased (to allow easy differentiation between parameters and local variables), class members start with _ as a prefix, globals are g_Xxx).  So far, it’s working for me.
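
For comparison, the non-Hungarian convention just described might look roughly like this. The ByteCounter class and g_CallCount global are invented for illustration; only the casing and prefix rules come from the description above.

```cpp
#include <cstddef>

int g_CallCount = 0;   // global: g_Xxx

// Hypothetical class for illustration only.
class ByteCounter
{
public:
    // Parameters are Pascal cased.
    void Add(const unsigned char *Buffer, std::size_t BufferLength)
    {
        // Local variables are camel cased, so they can be told apart
        // from parameters at a glance.
        std::size_t nonZeroBytes = 0;
        for (std::size_t index = 0; index < BufferLength; ++index)
        {
            if (Buffer[index] != 0)
            {
                ++nonZeroBytes;
            }
        }
        _total += nonZeroBytes;
        ++g_CallCount;
    }

    std::size_t Total() const { return _total; }

private:
    std::size_t _total = 0;   // class member: leading underscore
};
```

One caveat in C++: an underscore followed by a lowercase letter is fine for a class member, but identifiers that start with an underscore and an uppercase letter are reserved for the implementation, so the member prefix should be followed by a lowercase name.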

I’ve drunk the kool-aid from both sides of the Hungarian debate though, and I’m perfectly happy working in either camp.

 

  • Hungarian Notation Lite®

    Since bitching (or otherwise) about Hungarian notation appears to be a common pastime right now, I thought I'd shove my oar in and deliver my 2 cents...

    http://www.accidentalscientist.com/2004/06/hungarian-notation-lite.html
  • Everyone nowadays likes to throw away tried and true practices because... well, they can.

    Hungarian has its uses in a bunch of cases where the type system lacks information.

    int *Foo;

    is that a pointer to an int or an array/vector of ints? It's clear in these cases:

    int *prgnElements;
    int **prgprgiCurrentPositions;
    int nElements;

    differentiating between pointers to singletons and arrays is very useful. Differentiating whether something is in an index or a count is useful. Differentiating between a count of bytes and a count of characters is useful.

    But I guess a bunch of people from other companies didn't invent it so we have to throw away the good with the bad.

    The "apps hungarian" tyranny was stupid. Having to scroll up and down constantly to try to find the nature of an identifier is also stupid.
  • "My copy of Exceptional C++ uses an underscore for data members, but as a suffix, not prefix"

    Typical, the one page that I choose to base my complaint on (Item 20) seems to be the only page in the entire book to use an underscore prefix :-)

    I'll concede the point about leading underscores only being reserved for global identifiers, though.
  • Hungarian's useful in situations where the type system isn't strong enough to express your intent fully. In C, practically anything can be promoted to anything else with no casts, and strings and arrays aren't first-class types (or even types at all in the case of strings) so you need some way for the programmer to be able to see whether the operations performed are in fact correct - because the compiler can't help you.

    However, there's little need to be warty when using user-defined types (structs) except around the implicit conversion between any pointer type and void*.

    In C++, there's even less need because there is no implicit conversion from void* - you must use a cast. A C++ program will tend to have less typeswitching and fewer peculiar casts due to the use of polymorphism.

    The ultimate in static type systems still has to be Ada, in which you can define new integer types that don't have implicit conversions between themselves, the built-in Integer type, or any other integer types, and you can also define range-restrictions of types (keeping the implicit conversions of its parent type). The main problems with Ada are that its interop with C requires writing declarations, its syntax (derived from or inspired by Pascal) is verbose (due to a requirement to be largely LL(1) parsable) and that the object-oriented extensions of Ada 95 aren't. That is, you type Fn( obj, arg1, arg2 ) rather than obj.Fn( arg1, arg2 ).

    In Ada, warting is completely unnecessary. Instead you should define new types. It doesn't completely prevent errors (there was a space project where a lander completely missed its target because the programmers were working in traditional units while the scientists were working in metric) but it can be helpful.

    Turning to more practical languages <g> in C# it's also largely unnecessary to wart, although the index/count difference can be necessary. If you're programming in VB.NET, no warts are required if you turn on Option Strict.

    (to wart: to decorate your variables and parameters with the type; warts: the decorations themselves)
  • "What happens with Systems Hungarian when you decide to change a signed long to an unsigned long? You need to rename your variables from l to ul."

    Exactly. Which is precisely the reason that I encode type in variable names in my personal notation, which seems to be similar to Systems Hungarian. If I change the type of a variable, I need to check every line where that variable is used to see whether I accidentally introduced hidden bugs by the type change. And I want the compiler to complain loudly should I have missed an occurrence, which it will do if the name of the variable also changed. As Mike mentioned, in C/C++ the change of type is not necessarily enough to get a compiler error or warning.

    The problem is that people who argue that Hungarian notation, especially Systems, is not needed anymore tend to leave out "if you code with MSVS or a similar IDE in C#, VB or Java for the Wintel platform". If you code in C/C++ for Embedded, probably with an editor without MS gadgets, it's a whole different story and other rules apply, IMHO.
  • "The first one, which is the one that most people know about, is “Systems Hungarian”.  System’s Hungarian is also “Hungarian-as-interpreted-by-Scott-Ludwig”.  In many ways, it’s a bastardization of “real” (or Apps) Hungarian as proposed by Charles Simonyi."

    Hi Larry. Now I'm famous :). The story is that the Hungarian bastardization originally came from the documentation folks. In the systems group we originally produced raw documentation for them that had standard "apps like" Hungarian. They decided it was too obtuse for documentation so they did some serious readability changes to it. They are not programmers so this wasn't a graceful operation. This had a *huge* secondary effect because new programmers in the systems group would read the documentation and "more or less" reproduce that "documentation group" style. Not to mention books were written about the API that referenced that style. Pretty soon we had more code in this "docs group" style than in any other style.

    A smaller effect came from Win32 birthing. Many new kernel32 APIs were created and what you see in those APIs is MarkL's personal interpretation of what he read as the style in the documentation. Many "CountOfBytes" instead of "cb", "IndexOfX" instead of "ix", etc.

    Good to hear you are still cranking away. I hope all is going well with you.
  • Wow! Thanks Scott for the clarification.

    I apologize for taking your name in vain, btw, I didn't have contact info for you so I didn't check it with you. I REALLY appreciate the clarification.

    And yes, I'm still cranking away here, I'm over in multimedia land (talk about strange journeys) but I'm still having fun. It's scary, I hit 20 years in 2 months.

  • Dude, that was NINE months ago. How is this _possibly_ following on the heels? Just how big do you think my heels are?

    :-)

  • One of the best arguments for HN in the "old" days was to reduce the need to PageUp to see variable declarations. Two (unrelated?) factors that render this less necessary/cumbersome are (1) large, hi-res displays that let the developer see more lines - I can easily see 60-80 LOC on my screen right now; and (2) "improved" programming practices/languages leading to smaller, more cohesive routines, where definition and usage are extremely close.

    Where the definition of a variable cannot be local (forms controls are the most obvious) I still feel the need to see some qualifier for ease-of-comprehension. That said, these days I prefer UserNameTextBox to txtUserName...
  • Larry,

    Thanks to you, I learned today that I've been 'doing' Apps Hungarian forever. I didn't even know it had a name ;-)

    My personal rules are pretty much consistent with Simon's Lite rules (although I have a few more).

    I tend to specify the type only if it has some importance, e.g. if I have a WORD variable, there's most likely a reason why it's not simply an int. So people who read the code had better be aware of it.

    But in most cases, my Hungarian prefixes are just abbreviations for common words that should otherwise appear in the variable name. e.g. I truly hate nNumberOfBytes. cBytes does a way better job IMHO.
  • I was bored this weekend so I ended up trawling through a bunch of blog archives and came across posts

  • PingBack from http://www.electricmonk.nl/log/2005/05/16/joel-on-software-linkdump/

  • PingBack from http://dukelupus.wordpress.com/2008/04/29/muutujate-nimetamine-kige-raskem-osa-programmeerimisest/

  • PingBack from http://inside.echobit.net/dreijer/archives/2008/08/12/reflections-on-hungarian-notation/
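
One of the comments above argues that in a language with a strong enough type system you define new types instead of warting names. A minimal C++ sketch of that idea, with type and function names that are entirely hypothetical:

```cpp
// Hypothetical wrapper types for illustration: the compiler, rather than
// a name prefix, keeps the two units apart.
struct Meters { double value; };
struct Feet   { double value; };

// Explicit, named conversion between the two units.
inline Meters MetersFromFeet(Feet ft)
{
    return Meters{ ft.value * 0.3048 };
}

inline Meters AddMeters(Meters a, Meters b)
{
    return Meters{ a.value + b.value };
}

// AddMeters(Meters{1.0}, Feet{3.0});   // would not compile: Feet is not Meters
```

The trade-off is the one the thread keeps circling: a wrapper type turns a mix-up into a compile error, while a prefix only makes it visible to a careful reader.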
