Larry Osterman's WebLog

Confessions of an Old Fogey
Blog - Title

What does style look like?

What does style look like?

Rate This
  • Comments 17
Building on my post on style last week.

So what does "coding style" look like, anyway?

Well, the facile answer is "I know it when I see it".  But it's more than that.  A good coding style should result in high quality self-documented code that is esthetically pleasing to look at.  This is not to reduce the value of external specifications, they are utterly critical to the success of a project, but external specifications serve a different purpose from the code.  Neither can function independently.

I thought it might be interesting to see what happens to a single piece of code when you use different styles on it.

For the purposes of illustration, I opened my copy of Robert Sedgewick's "Algorithms in C, 3rd edition" to a random page, and picked the most reasonable example from that page.  On that page is the following code:

#include "list.h"
main(int argc, char *argv[])
  {
    int i, N = atoi(argv[1]), M = atoi(argv[2]);
    Node t, x; initNodes(N);
    for (i = 2, x = newNode(1); i <= N; i++)
      { t = newNode(i); insertNext(x, t); x = t; }
    while (x != Next(x))
      {
        for (i = 1; i < M ; i++) x = Next(x);
        freeNode(deleteNext(x));
      }
    printf("%d\n", Item(x));
  }

This is the code as it exists in the book, any errors come from my poor transcription efforts.

Let's see what happens to the code if you apply a couple of different coding styles.  Please note: there's no one-true coding style.  They're all different, and they all have their advantages.

First up, Hungarian.  Hungarian's a structure-neutral coding style, so all I did was to hungarian-ize the variables and functions, without touching the other aspects.

#include "list.h"
main(C cArg, SZ rgszArg[])
  {
    I iNode, cNodes = atoi(rgszArg[1]), cNodesToSkip = atoi(rgszArg[2]);
    PNODE pnodeT, pnodeCur; InitNodes(cNodes);
    for (iNode = 2, pnodeCur = PnodeNew(1); iNode <= cNodes ; iNode++)
      { pnodeT = PnodeNew(i); InsertNext(pnodeCur, pnodeT); pnodeCur = pnodeT; }
    while (pnodeCur != PnodeNext(x))
      {
        for (iNode = 1; iNode < cNodesToSkip ; iNode++) pnodeCur = PnodeNext(pnodeCur);
        FreeNode(PnodeDeleteNext(pnodeCur));
      }
    printf("%d\n", Item(nodeCur));
  }

So what changed?  First off, all the built-in types are gone.  Hungarian can use them, but not for most reasons.  Next, the hungarian types "I", "C", and "SZ" are used to replace indexes, counts and strings.  Obviously the C runtime library functions remain the same.  Next, I applied the appropriate prefix - i for indices, c for counts, p<type> for "pointer to <type>".  The Node type was renamed PNODE, in Hungarian, all types are uppercased.  In Hungarian, the name of the routine describes the return value - so a routine that returns a "pointer to foo" is named "Pfoo<something relevent to which pfoo is being returned>".

Looking at the transformation, I'm not sure it's made the code any easier to read, or more maintainable.  The next examples will try to improve things.

  • It was not clear to me if Hungarian notation was of any help in improving source code readability. I thought that it was more a matter of being used to it, because variable names alone (without including the prefix that indicates the type) should indicate the context/use for it.

    The context, again, gains more importance if you consider the rule of writting procedures/functions for resolving a specific and bounded problem (the context.) So, for example, in the case you're processing a set of command line arguments, say:

    char* cmd = argv[2];

    everybody will easily realize that, for instance, using

    int i = cmd;

    is a mistake, because in this context, cmd points to a command line argument. However, using the more reasonable code:

    int i = atoi(cmd);

    seems to be the most common sense thing to do.

    As I said, that was my previous thinking... Now, looking at the code you wrote, it is clear to me that Hungarian notation does _not_ help improving software readability :)

    Regards,
    diego
  • Diego,
    Actually I don't think that the hungarian helped at all. But that's only the first step in the transformation. I suspect that with some of next changes will improve things.
  • OK, Larry, let me see if I can contribute to enhance a little bit more the readability of this code.

    First, the code seems to have been written trying to save space for the book. This is why it joins a lot of variable declarations in one line, and, in one case, a call to a function (initNodes) in the same line of a variable declaration.

    So the first two rules that come to my mind are (in no particular order):

    1. Try to define each variable in one line, specially if you give an initial value to this variable.
    2. Do not mix variable declarations with function calls in the same line.

    The indenting style seems to be a question of taste. I particularly like the BSD style:

    if (xxx)
    {
    yyy (8 spaces of indent)
    }

    I think it is more clear. In particular, a big indenting helps two things: better identify the code (loops, etc.); and write shorter functions (indentation goes far from the left margin). So, for my taste, two more rules:

    3. Use BSD indent :)
    4. Use 8-space indent

    Then, the first for loop initializes a variable that it is not tested in the test part of the if. Another rule would be:

    5. Do not use the initialization of the "for" statement to initialize other variables than the ones checked in the check part of that "for".

    The body of this same for could have been expanded to write a statement in a line.

    I can't think of more things right now. Perhaps, that it would be nice to have some kind of prefix for functions so that it is clear that they are refered to lists. Having "Next" as the name of a function is kind of looking for problems of duplicate definitions... maybe using "ListNext"? Which leads me to the last and more important rule:

    6. Use appropriate names for variables and functions :)

    Finally, it is not clear to me that this piece of code is working properly... What if N is 1? The call to "newNode" includes it into the list as the first element?

    Just my €0.02 :)

    Best regards,
    diego
  • Diego,
    You're absolutely right about the code being dense for use in a book. I'm not attempting to criticize the code, I'm just investigating how differing programming styles affect a particular piece of code. Sedgewick's book happened to be on my desktop and I know it's got a lot of code in it (and I can't use any production Microsoft code for obvious reasons).
  • Larry, you didn't use Real World Hungarian, you used Really Bad Hungarian (the kind of Hungarian that knee-jerk anti-Hungarian zealots always write). ;) No one in their right mind would use "I" or "SZ" for type names, so I trust this was an intentional illustration of bad style.

    Diego> "it is clear to me that Hungarian notation does _not_ help improving software readability"

    Not quite right. Bad Hungarian does not help readability. Good Hungarian is a *huge* benefit, in fact I personally hate reading code that doesn't use Hungarian because half the time I'm paging up to remind myself what the variable types are.
  • The thing about Hungarian -- as I remark several times in my blog -- is that its value really shines through only in a very small number of situations -- but when it does, it really helps.

    One of the first things I did as a full-timer at Microsoft was went through the VBScript string library -- which had to work on everything from fully UTF-16 WinNT to partially UTF-16 Win95 to ANSI Chinese Windows 3.1 -- with a fine-toothed comb renaming every variable so that it CORRECTLY indicated whether it was a count/index of bytes/characters, etc.

    I found SO MANY BUGS that way, bugs that it would have taken testing an _insane_ amount of time to find. Every time I looked at code and saw hey, this thing is assigning cb to a cch, I knew that was a bug in either Far East Windows 3.1 or Windows NT and could enter a bug and assign it to testing for regression.

    That's the value of Hungarian -- that it makes extra information about the semantics of a variable apparent that are not captured in the storage type. I don't need Hungarian to tell me what I already know -- what type a function returns, for instance, or what the storage type of a variable is.
  • Mike,
    That's 100% pure hungarian as laid out in Doug Klunder's original Hungarian document. Every production Hungarian application I've worked on uses I, C, CB, SZ, etc as types.


    Eric's 100% right - Hungarian's value shows up in rare cases. I chose it for starters just as an example, in a subsequent example, I'm going to rewrite the function using the .Net framework coding style, for now, bear with it :).

  • Anyone who doesn't switch to Firefox 1.0 is a bozo!
  • It's no coincidence that most examples of Hungarian being useful involve poor encapsulation and multiple alternative representations - the classic cb/cch examples come from C's mess of memcpy, wcscmp et al, and simply don't happen with well designed string types in C++. On the other hand, it's no coincidence, then, that Larry's example of good Hungarian uses SZ & co as types as well as name prefixes.

    When you stop thinking about low-level stuff, replace 'int' with 'CompanyID' and enforce your new type system statically, you can drop the prefixes. Then you can get on with your life, forget what Hungarian is, and accuse it of being bad because it's ugly.
  • Norman, I renamed the variable names to match hungarian. cN isn't valid hungarian - the suffix (N) should be meaningful in hungarian.

    I may have the pnodeT vs pnodeCur vs pnodeNew wrong, pnodeT is legal hungarian (T is a valid suffix for a temporary variable), but pnodeX isn't.

    Hungarian's more than just encoding the type as a prefix of the name - it also involves having meaningful names, and meaningful suffixes (Mac, Max, Cur, etc).
  • In the base note you already did more than Hungarianize the variable names, you also renamed some of the base names. Here are some efforts that more or less successfully avoid doing that kind of renaming:
    N -> cN (instead of cNodes)
    M -> cM (instead of cNodesToSkip)
    t -> pnodeT (instead of pnodeCur)
    x -> pnodeX (instead of pnodeNew)
    Then it is apparent that Hungarian notation is not usually the most important part of a variable name.

    In my experience it is valuable to distinguish type names from variable names, and it is usually valuable to distinguish constant names from variable names. This much of Hungarian notation is indeed useful. Though when a constant changes into a variable the renaming operation can be painful.

    Regarding a book's typography being a reason for using different style from production programs, that is somewhat true. Historically there was a benefit to having a function fit on a single page. But when maintenance costs exceed original development costs (the number of times a program is altered exceed the number of times it is initially coded), there are benefits to making some preparations for them. For example, always putting braces in if and else sections, instead of omitting them when the substatement is a single statement.
  • I thought the idea of using meaningful names for variables had already been proposed[*], long before the Hungarian proposal to prefix each name with a cryptic^H^H^H^H^H^H^Habbreviated designation of its type. OK, then Hungarian was half-meaningful.

    [* proposed repeatedly and rejected repeatedly, by Real Programmers who felt that if it was difficult to code then it should be difficult to understand[**]]

    [** What was that magazine article, Real Programmers Code in what language, I forgot. Cryptic variable names used to be enforced by Fortran, but I don't think the article's title said Fortran.]
  • Norman,
    T and X aren't meaningful, ever, so it would be inappropriate hungarianization to leave them alone.
  • Almost without exception, Hungarian sucks (In my opinion). I hate reading code written that way or writing it (and I've been required to do both).

    There's definitely a better way (not quite *any* other way, but almost)
Page 1 of 2 (17 items) 12