Larry Osterman's WebLog

Confessions of an Old Fogey

Hungarian notation - it's my turn now :)



Following on the heels of Eric Lippert’s posts on Hungarian and, of course, Rory Blyth’s classic “Die, Hungarian notation… Just *die*”, I figured I’d toss my hat into the ring (what the heck, I haven’t had a good controversial post in a while).

One thing to keep in mind about Hungarian is that there are two totally different Hungarian implementations out there.

The first one, which is the one that most people know about, is “Systems Hungarian”.  Systems Hungarian is also “Hungarian-as-interpreted-by-Scott-Ludwig” (Edit: For Scott's side of this comment, see here - the truth is better than my original post).  In many ways, it’s a bastardization of “real” (or Apps) Hungarian as proposed by Charles Simonyi.

Both variants of Hungarian have two things in common.  The first is the concept of a type-related prefix, and the second is a suffix (although Systems Hungarian rarely, if ever, uses the suffix).  But the two variants differ sharply in what that prefix means.

In Systems Hungarian, the prefix for a type is almost always related to the underlying data type.  So a parameter to a Systems Hungarian function might be “dwNumberOfBytes”: the “dw” prefix indicates that the type of the parameter is a DWORD, and the “name” of the parameter is “NumberOfBytes”.  In Apps Hungarian, the prefix is related to the USE of the data.  The same parameter in Apps Hungarian is “cb”: the “c” prefix indicates that the parameter is a count, and the “b” indicates that what it counts is bytes.

Now consider what happens if the parameter is the number of characters in a string.  In Systems Hungarian, the parameter might be “iMaxLength”.  It might be “cchWideChar”.  There’s no consistency between different APIs that use Systems Hungarian.  But in Apps Hungarian, there is only one way of representing the parameter; the parameter would be “cch” – the “c” prefix again indicates a count, the “ch” type indicates that it’s a character.
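A minimal C sketch of why the count prefix matters. The helper names below (CbFromCch, CchFromCb, CchMaxFromCbBuf) are hypothetical, wchar_t stands in for a wide character, and none of this is a Win32 API. Because each name says what is being counted, a byte count cannot quietly stand in for a character count; the conversion is always an explicit, visible step.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helpers; names follow Apps Hungarian:
 *   cb  = count of bytes
 *   cch = count of characters (wchar_t in this sketch)           */

/* Bytes occupied by cch wide characters. */
static size_t CbFromCch(size_t cch)
{
    return cch * sizeof(wchar_t);
}

/* Whole wide characters that fit in cb bytes. */
static size_t CchFromCb(size_t cb)
{
    return cb / sizeof(wchar_t);
}

/* Characters a cbBuf-byte buffer can hold, excluding the terminating
 * null character. Passing cbBuf where a cch is expected would look
 * wrong on sight, which is the point of the convention.            */
static size_t CchMaxFromCbBuf(size_t cbBuf)
{
    size_t cchBuf = CchFromCb(cbBuf);

    assert(cchBuf > 0);  /* the buffer must at least hold the null */
    return cchBuf - 1;
}
```

With a 2-byte wchar_t (as on Windows), a 20-byte buffer holds 10 characters, so CchMaxFromCbBuf(20) leaves room for 9 characters plus the null.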

Now please note that most developers won’t use a bare “cch” or “cb” as a parameter name in Apps Hungarian; they’ll add a suffix that describes the parameter’s role.  Let’s consider the Win32 lstrcpyn function:

LPTSTR lstrcpyn(
    LPTSTR lpString1,
    LPCTSTR lpString2,
    int iMaxLength
);

This is the version in Systems Hungarian.  Now, the same function in Apps Hungarian:

LPTSTR Szstrcpyn(
    LPTSTR szDest,
    LPCTSTR szSrc,
    int cchLen
);

Let’s consider the differences.  First off, the name of the function changed to reflect the type returned by the function: since it returns an LPTSTR, which is a pointer to a string, the function name changed to “SzXxx”.  Second, the names of the first two parameters changed.  Instead of “lpString1” and “lpString2”, they became the more descriptive “szDest” and “szSrc”.  The “sz” prefix indicates that the variable is a null-terminated string.  The “Src” and “Dest” are standard suffixes, which indicate the “source” and “destination” of the operation.  The iMaxLength parameter, which indicates the maximum number of characters (TCHARs, not bytes) to copy, is changed to cchLen: the “cch” prefix indicates that it’s a count of characters, and the standard “Len” suffix indicates that it’s a length to be copied.
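For readers who want to see the renamed parameters at work, here is a portable C sketch of the hypothetical Szstrcpyn. It assumes plain char in place of TCHAR and follows lstrcpyn’s convention that the length argument counts characters and includes room for the terminating null; it is not the Win32 implementation. Because that argument counts TCHARs rather than bytes, the sketch names it cchLen.

```c
#include <assert.h>
#include <stddef.h>

/* Portable sketch of the hypothetical Szstrcpyn (not the Win32 code).
 * Assumption: char stands in for TCHAR. cchLen counts characters and
 * includes room for the terminating null, so at most cchLen - 1
 * characters are copied before the null is written.                 */
char *Szstrcpyn(char *szDest, const char *szSrc, int cchLen)
{
    int ich;  /* index of the character being copied */

    assert(szDest != NULL && szSrc != NULL);
    if (cchLen <= 0)
        return szDest;  /* no room even for the null; leave szDest alone */

    for (ich = 0; ich < cchLen - 1 && szSrc[ich] != '\0'; ich++)
        szDest[ich] = szSrc[ich];
    szDest[ich] = '\0';
    return szDest;
}
```

For example, Szstrcpyn(szBuf, "Hungarian", 5) leaves "Hung" in szBuf: four characters plus the null.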

The interesting thing that happens when you convert from Systems Hungarian to Apps Hungarian is that now the usage of all the parameters of the function becomes immediately clear to the user.  Instead of the parameter name indicating the type (which is almost always uninteresting), the parameter name now contains indications of the usage of the parameter.

The bottom line is that when you’re criticizing Hungarian, you need to understand which Hungarian you’re really complaining about.  Hungarian as defined by Simonyi isn’t nearly as bad as some have made it out to be.

This is not to say that Apps Hungarian was without issue.  The original Hungarian specification was written by Doug Klunder in 1988.  One of the things missing from that document was a discussion of the difference between “type” and “intent” when defining prefixes.  This can be a source of great confusion when naming parameters in Hungarian.  For example, if a routine takes a pointer to a “foo” and internally treats it as a pointer to a single foo, it’s clear that the parameter name should be “pfoo”.  However, if the routine treats the parameter as an array of foos, the original document was not clear about what should happen: should the parameter be “pfoo” or “rgfoo”?  Which wins, intent or type?  To me, there’s no argument: it should be intent, but there have been some heated debates about this over the years.  The current Apps Hungarian document is quite clear about this: intent wins.
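A small C sketch of “intent wins”, using a hypothetical FOO type. Both routines take a FOO * as far as C is concerned, but the one that walks an array names its parameter rgfoo and pairs it with a count (cfoo) and an index (ifoo). The type and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type used only to illustrate naming. */
typedef struct FOO {
    int cItem;  /* count of items held by this foo */
} FOO;

/* Uses its argument as a pointer to ONE foo, so: pfoo. */
void ResetFoo(FOO *pfoo)
{
    assert(pfoo != NULL);
    pfoo->cItem = 0;
}

/* Same C type, but used as an ARRAY of cfoo foos, so intent wins:
 * rgfoo, with ifoo as the index into it.                          */
int CItemFromRgfoo(const FOO *rgfoo, size_t cfoo)
{
    size_t ifoo;
    int cItemTotal = 0;

    assert(rgfoo != NULL || cfoo == 0);
    for (ifoo = 0; ifoo < cfoo; ifoo++)
        cItemTotal += rgfoo[ifoo].cItem;
    return cItemTotal;
}
```

The C compiler cannot tell these two parameters apart; only the names record whether the routine expects one foo or many.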

One other issue with the original document was that it predated C++.  So concepts like classes weren’t really covered and everyone had to come up with their own standard.  At this point those issues have been resolved.  Classes don’t have a “C” prefix, since a class is really just a type.  Members have “m_” prefixes before their actual name.  There are a bunch of other standard conventions but they’re relatively unimportant.

I used Hungarian exclusively when I was on the Exchange team; my boss was something of a Hungarian zealot, and he insisted that we code in strict Apps Hungarian.  Originally I chafed at it, having always assumed that Hungarian was stupid, but after using it for a couple of months, I started to see how it worked.  It certainly made more sense than the Hungarian I saw in the Systems division.  I even got to the point where I could understand what an irgch (an index into an array of characters) was without even flinching.

Now, having said all that, I don’t use Hungarian these days.  I’m back in the Systems division, and I’m using a home-brewed coding convention that’s based on the CLR standards, with some modifications I came up with myself: local variables are camel cased, parameters are Pascal cased (to make it easy to tell parameters from local variables), class members start with an _ prefix, and globals are g_Xxx.  So far, it’s working for me.
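A rough C rendering of that convention, with the assumptions flagged: C has no classes, so a struct stands in for one, and the names (Widget, InitWidget, g_WidgetCount) are hypothetical. The members put a lowercase letter after the underscore, since C reserves names that begin with an underscore followed by an uppercase letter.

```c
#include <assert.h>
#include <stddef.h>

/* Global: g_Xxx */
int g_WidgetCount = 0;

/* "Class" members: leading underscore (lowercase after it, so the
 * names stay out of C's reserved _Uppercase space).              */
typedef struct Widget {
    int _id;
    int _refCount;
} Widget;

/* Parameters are PascalCase; locals are camelCase. */
void InitWidget(Widget *Target, int Id)
{
    int initialRefs = 1;  /* local */

    assert(Target != NULL);
    Target->_id = Id;
    Target->_refCount = initialRefs;
    g_WidgetCount++;
}
```

Inside InitWidget, Target and Id read as parameters, initialRefs as a local, _id as a member, and g_WidgetCount as a global, with no type information in any of the names.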

I’ve drunk the kool-aid from both sides of the Hungarian debate though, and I’m perfectly happy working in either camp.

 

  • How is that blog entry a "classic"? He's belittling people who use Hungarian improperly, not talking about Hungarian itself.

    I personally HATE reading code w/o proper Hungarian. As Eric Lippert has been saying lately, reading code is harder than writing code. When I read code w/o Hungarian, I always find myself paging up to find the type of a variable, then paging back down and (after finding my place) resuming reading the code.

    Proper prefixes also keep the developer honest about matching types (signed vs. unsigned; MBCS vs. Unicode strings, etc.) which helps prevent bugs related to mismatched types. Sure, the compiler MIGHT warn you about them, if you have the warning level high enough... but we all know people who ignore (or worse, turn off) warnings, and besides, anything that helps find bugs earlier is a Good Thing in my book.

    Rory also asks why use Hungarian "in this day and age of the superpowered IDE". Well, you don't always have your IDE to hold your hand. I often skim code in an 80-column 4NT window because it's faster than using VC.

    In any case, it's another case of someone bashing bad Hungarian and then condemning Hungarian altogether, without recognizing the benefits that good Hungarian brings.
  • Your points are absolutely valid. But Rory's "classic" made me laugh, which is always a good thing.

    You're absolutely right: his point was to bash bad Hungarian and throw the baby out with the dishtowel (purposely mixing metaphors).

    I'm hoping to write about the negatives of Hungarian sometime in the future (maybe tomorrow). They can be quite significant, which is why I don't code in it these days, especially with Systems Hungarian, where the Hungarian represents the type, not the intent.

    What happens with Systems Hungarian when you decide to change a signed long to an unsigned long? You need to rename your variables from l to ul. Also, what's the difference between dw and ul? Why differentiate them? Apps Hungarian actually forbids the use of dw because it's a compiler/hardware-specific type.

  • I've seen this growing fashion for using an underscore prefix for member variables. Sutter's "Exceptional C++" uses it extensively, for example.

    Frankly, I don't like it. As I understand it, the C/C++ standards state that identifiers beginning with an underscore are reserved for the use of the language/library implementer.

    Thus, by using an underscore prefix, you're explicitly allowing your implementation to break your code -- by defining macros with underscores, for example.

    Do you have any comments about this?
  • An interesting point. I'm trying to avoid the m_Xxx thingy because it's too "mfc-ish" for my tastes, do you have an alternative suggestion?
  • Yeah, I don't think you can really hold any Rory Blyth post up as a "classic" except as a "classic Rory Blyth post".

    People who make the argument that you should do away with a programming notation because of the IDE you are using aren't very good programmers. If you are leaning on the IDE too much, you don't understand the language/framework you are using well enough.

    People who say Hungarian notation makes the code harder for them to read have a point. It may be harder for THEM to read the code. I knew a guy once who couldn't make heads or tails of SQL stored procedures unless they were indented with line breaks a certain way. He had the same problem with variable declarations: they all had to be on separate lines.

    I still use those old VB/VBA guidelines (I don't remember where they came from) for naming the controls on a form, e.g. btnGo instead of just go. Say you are designing a WinForm with a text box for the name and a label for that text box. I'd name the label "lblName" and the text box "txtName", which is "pointless Hungarian" (as Eric L. called it). Given that in ASP.NET the form markup where a control is declared and the code-behind that responds to it are separate, it makes sense to me to use some kind of prefix (or a suffix like "nameTextBox", except that I hate typing THAT much just for a variable name) to tell me what type the control is. It also lets me group related controls together semantically. Take a search control that lets the user enter some text and then use a drop-down to choose whether to search the web or just "this site". The three controls I would need are a Button, a TextBox, and a DropDownList: btnSearch, txtSearch, and ddlSearch (or cboSearch). How does Apps Hungarian handle widgets?
  • "What happens with Systems Hungarian when you decide to change a signed long to an unsigned long? You need to rename your variables from l to ul."

    Most every text editor supports a "find and replace" function that easily lets me change the variable names. Only Visual Studio allows me to hover over a var name and find out the type (which doesn't always work and sometimes requires a restart to make it work) or lets me right-click on a var/method and "go to definition" (the best trivial IDE function I've ever seen).
  • No, I don't have an alternative solution. Since my day job still requires me to use (and abuse) MFC, I tend to stick with m_.

    Is there any real reason (other than that it's an MFC-ism) that you don't like the m_?

    I'm starting to do a bit of C# coding these days, and I'm trying to stick with the coding guidelines (from MSDN), which seem to imply that member variable names should look likeThis.

    Frankly, I miss my m_.

    I started using Hungarian naming religiously back in the days of Windows 3.0, but I've mellowed to the point that I think the only useful prefixes are m_ and g_, to signify scope, rather than type.
  • The C# coding guidelines are intended for public classes, which is why they don't differentiate between fields and non-fields; in fact, if you run FxCop, it complains about having public fields.

    I agree that m_ and g_ are the only carryovers, I may start using them again :)
  • I have to admit that Go To Definition is the best feature since the pop-up list of an object's methods and properties.
  • My copy of Exceptional C++ uses an underscore for data members, but as a suffix, not prefix, which is ok as far as the standard is concerned.

    I have also seen (and used) a convention where instance data members start with “its” (itsName) and class data members start with “their” (theirCount).

    Although, in C++ on Win32 it’s hard to stick to any naming convention, with STL using lowercase for almost everything, and Win32 using Pascal case for functions and all caps for data types. (I once worked with a guy who consistently named (non-POD) classes with all caps… I almost heard the code scream every time I looked at it!)
  • > As I understand it, the C/C++ standards state that identifiers beginning with an underscore are reserved for the use of the language/library implementer.

    "In addition to the names documented in this manual, reserved names include all external identifiers (global functions and variables) that begin with an underscore (_) and all identifiers regardless of use that begin with either two underscores or an underscore followed by a capital letter are reserved names. This is so that the library and header files can define functions, variables, and macros for internal purposes without risk of conflict with names in user programs."

    http://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html

    So, a member name beginning with an underscore and a lowercase letter would be safe, since it's not a global symbol (and macros would use two underscores or an underscore and an uppercase letter).

    Of course, you also have to avoid all the hundreds of function and variable names defined in the standards, because any of them can be implemented as a macro (for instance, errno). Some other prefixes and suffixes are reserved too; see the page above for details (no source code, so probably Safe For Microsoft Employees).

    I think there's no need to be that pedantic; if you can't compile something, it means you have the source code for that something, and can edit it to remove any incompatibilities. Of course, I avoid naming my types something_t, since I know it's reserved, but I don't check every name I create with the manual or the standards.
  • Actually in the Exchange Store, the classes are all in all caps (OFOLD, OMSG, etc). They're used internally as all lowercase (pofoldFoo, etc).
  • I use F as prefix for data members and A as prefix for parameters. It's something that's left over from Borland's TV/OWL/VCL coding standards (which have been close to the same since the very early 90's). I don't like m_, g_ or anything else with an underscore in it. There's really no need for an underscore unless you can't find the shift-key on your keyboard and write everything in lower or UPPER case, neither of which I want to see in my code.
  • The main problem is that it makes names too long and too similar in shape to other names. They then need to be read rather than recognised: fluent readers recognise word shapes, while people with poor literacy try to spell out each word, which takes so long that they forget the previous words and can't extract meaning from the sentence (those who can't read at all are only a small fraction of illiterate people).

    So, in short: it sounds like a good idea, but not for humans. I'll also speak for the cat: it can't make heads or tails of it either.
  • I second the comment about go to definition / xref browsing being the best reason to use an IDE over just text editors. I honestly have no idea how people can debug / code in large source bases without them. :P

    Personally, I use "the" as my prefix for member variables, although this is mostly because I hate typing underscores (otherwise I'd probably use a leading underscore). Either way, I agree with the gist of the article that Apps Hungarian is FAR more sensible than Systems Hungarian. Again, though, I'm coming from the stance that I read code in my debugger, which gives me right-click type information, goto def, xref browsing, etc. YMMV.
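To make the reserved-name discussion in the comments concrete, here is a C sketch based on the C standard’s reserved-identifier rules (section 7.1.3 in C99 and C11). The Counter type and CounterNext function are hypothetical. C++ has similar but not identical rules, so check its standard separately.

```c
#include <assert.h>
#include <stddef.h>

/* Fine: a struct member lives in its own name space, and the
 * underscore is followed by a lowercase letter.               */
struct Counter {
    int _count;
};

/* Reserved, so user code should NOT declare these:
 *   int _Count;    underscore + uppercase: reserved for any use
 *   int __count;   two underscores: reserved for any use
 *   int _count;    at file scope: reserved for file-scope names  */

/* Fine: a block-scope local may start with underscore + lowercase. */
int CounterNext(struct Counter *pctr)
{
    int _next;

    assert(pctr != NULL);
    _next = ++pctr->_count;
    return _next;
}
```

This matches the point made above: a leading underscore followed by a lowercase letter is safe for members and locals, but not for globals.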