Larry Osterman's WebLog

Confessions of an Old Fogey
Blog - Title

What does style look like, part 6

What does style look like, part 6

Rate This
  • Comments 14
Previously in the series, I've touched on indentation and commenting, The next aspect of "style" that I want to talk about is variable (and function) naming conventions.

Upon reflection, I hungarianized the code way too early in the series, it properly belongs in this article, so lets go back to the beginning, and see what happens to the code with different naming styles.  Here's the original code:

#include "list.h"
main(int argc, char *argv[])
  {
    int i, N = atoi(argv[1]), M = atoi(argv[2]);
    Node t, x; initNodes(N);
    for (i = 2, x = newNode(1); i <= N; i++)
      { t = newNode(i); insertNext(x, t); x = t; }
    while (x != Next(x))
      {
        for (i = 1; i < M ; i++) x = Next(x);
        freeNode(deleteNext(x));
      }
    printf("%d\n", Item(x));
  }

 

Rewriting the code in hungarian yields:

#include "list.h"
main(C cArg, SZ rgszArg[])
  {
    I iNode, cNodes = atoi(rgszArg[1]), cNodesToSkip = atoi(rgszArg[2]);
    PNODE pnodeT, pnodeCur; InitNodes(cNodes);
    for (iNode = 2, pnodeCur = PnodeNew(1); iNode <= cNodes ; iNode++)
      { pnodeT = PnodeNew(i); InsertNext(pnodeCur, pnodeT); pnodeCur = pnodeT; }
    while (pnodeCur != PnodeNext(x))
      {
        for (iNode = 1; iNode < cNodesToSkip ; iNode++) pnodeCur = PnodeNext(pnodeCur);
        FreeNode(PnodeDeleteNext(pnodeCur));
      }
    printf("%d\n", Item(nodeCur));
  }

So what changed?  First off, all the built-in types are gone.  Hungarian can use them, but not for most reasons.  Next, the hungarian types "I", "C", and "SZ" are used to replace indexes, counts and strings.  Obviously the C runtime library functions remain the same.  Next, I applied the appropriate prefix - i for indices, c for counts, p<type> for "pointer to <type>".  The Node type was renamed PNODE, in Hungarian, all types are uppercased.  In Hungarian, the name of the routine describes the return value - so a routine that returns a "pointer to foo" is named "Pfoo<something relevent to which pfoo is being returned>".

But, of course, there are LOTs of alternative naming conventions other than Hungarian.  For example, Microsoft's published extensive documentation on design guidelines for class library developers who want to publish managed libraries.  Lets recast the sample application in the .Net coding style:

So what does the .Net/CLR coding style look like?  First off, the CLR naming guidelines for methods require that they use PascalCase, and that verbs or verb phrases are used to name the methods.  Second, the CLR naming guidelines require that parameters to functions obey camelCase.  This example doesn't have any parameters declared other than argv and argc, so that aspect dosn't change.  But the naming guidelines are pretty clear about the choice of names: parameters should always be spelled out, abbreviations should only be used if developers generally understand them.  And you should NOT use hungarian notation.  They go on to say that "good names describe semantics, not type".  I find this last one fascinating, since it highlights one of the major differences between systems Hungarian and applications Hungarian - Systems Hungarian represents the type, apps Hungarian represents the semantics.

Because the CLR naming conventions don't describe anything about local variables, the CLR standards for local variables have to be inferred.  This makes sense because the CLR coding standards are all about describing the public interfaces to components, and not their internal standards.  As such, they explicitly don't cover local variables (or member variables (fields), except to say that you're not allowed to have any).  But a CLR-ish style can be inferred.  The first thing that's clear is that abbreviations are out, and that variables should represent full English words.  I'm going to go out on a limb and keep the "i" as an indexer because it's a convention that's pretty standardized.  This still leaves the camelCase vs PascalCase issue open, just for grins, lets go with camelCase. 

#include "list.h"
main(int argCount, char *args[])
  {
    int i,  numberOfNodes = atoi(args[1]), nodesToSkip = atoi(args[2]);
    Node newNode, currentNode; InitNodes(numberOfNodes);
    for (i = 2, currentNode = NewNode(1); i <= numberOfNodes; i++)
      { newNode = NewNode(i); InsertNext(currentNode, newNode); currentNode = newNode; }
    while (newNode != Next(currentNode))
      {
        for (i = 1; i < nodesToSkip ; i++) currentNode = Next(currentNode);
        FreeNode(DeleteNextNode(currentNode));
      }
    printf("%d\n", Item(currentNode));
  }

One thing that was pointed out by one of my co-workers is that it's often extremely useful to visually differentiate between parameters, local variables, and member variables, and in this example you can't differentiate between parameters and locals.  This isn't as big a deal in a small example like this, but for a larger, more complicated example, it might turn into a big deal.

If we decide that parameters are declared in camelCase, if you apply this paradigm, you can name local variables with PascalCase.  That leaves member variables.  Even though the CLR guidelines for members explicitly prevents prefix characters (like m_ and g_), if you just use "_" as a prefix character, it allows you the ability to differentiate between them.

Lets look at the example using camelCase for parameters and PascalCase for local variables.

#include "list.h"
main(int argCount, char *args[])
  {
    int I,  NumberOfNodes = atoi(args[1]), NodesToSkip = atoi(args[2]);
    Node NewNode, CurrentNode; InitNodes(NumberOfNodes);
    for (I = 2, CurrentNode = NewNode(1); I <= NumberOfNodes; I++)
      { NewNode = NewNode(i); InsertNext(CurrentNode, NewNode); CurrentNode = NewNode; }
    while (NewNode != Next(CurrentNode))
      {
        for (I = 1; I < NodesToSkip ; I++) CurrentNode = Next(CurrentNode);
        FreeNode(DeleteNextNode(CurrentNode));
      }
    printf("%d\n", Item(CurrentNode));
  }

Hmm. Two major issues jump out immediately with this example.  The first one is:

NewNode = NewNode(i);

There's a major inconsistency here - the local variable and the function have exactly the same name (of course, in a non case-sensitive language, this issue is more significant).  With luck, the ambiguity can be resolved by the compiler, but that's only for this case - and a style has to work for ALL cases (pun unintended).  It's pretty horrible when you get to the point when you realize that youve got to either (a) change your code for the sake of style of (b) change your style for the sake of the compiler.

This illustrates an important aspect of a naming convention: Whatever you do, you shouldn't have a naming convention that allows for ambiguity.  In this case, each naming (and capitalization) convention can be considered as a puzzle piece.  When you come up with your final coding style, you need to take all of these into account.  You have four variables: Global Function/Member Function naming convention/case, Parameter naming convention/case, Local Variable naming convention/case, and Member Variable naming convention/case.  And your have a myriad of possible naming convention/case options, from hungarian to CLR, from PascalCase, to camelCase, to whatever else you can come up with.

When defining a style, you need to balance all of these aspects to come up with a coherent whole.  Fundamentally, it doesn't matter which naming/case convention you chose for each of the variables, just as long as it's aesthetically pleasing to you and your team.

So lets take the sample application and reframe it in the first of the CLR conventions I mentioned above (camelCase local variables):

/*++
 *        <Copyright Notice>
 *
 *  Module Name:
 *        MyProgram.c
 *
 *  Abstract:
 *        This program implements a solution to the Josephus problem. 
 *        Code taken from Robert Sedgewick's "Algorithms in C, Third Edition",
 *        Program 3.13, on Page 103.
 *
 *  Author:
 *        Robert Sedgewick
 *
 *  Revision History:
 *
 *    LarryO             11/12/2004     Restructured for weblog posts.
 *    Robert Sedgewick   <Unknown>      Created. 
 --*/

#include "list.h"


/*++    main - Implements the Josephus problem and prints the result.
 *
 *
 *     main (cArg, rgszArg)
 *
 *     ENTRY   cArg - number of arguments.
 *             rgszArg - array of cArg null terminated strings
 *     EXIT    None.
 *
 *     <discussion of internals, one or more paragraphs>
 *
 *
 --*/
main(int argCount, char *args[])
{
    int i
    int numberOfNodes = atoi(args[1]);
    int nodesToSkip = atoi(args[2]);
    Node newNode;
    Node currentNode;

    InitNodes(numberOfNodes);

    //
    // Create a list of cNodes nodes. 
    // 
    for (i = 2, currentNode = NewNode(1); i <= numberOfNodes; i++)
    {
        newNode = NewNode(i);
        InsertNext(currentNode, newNode);
        currentNode = newNode;
    }

    //
    // Walk the list of nodes, freeing the node that occurs at every cNodesToSkip nodes in the list.
    //
    while (newNode != Next(currentNode))
    {
        for (i = 1; i < nodesToSkip ; i++)
        {
            currentNode = Next(currentNode);
        }
        FreeNode(DeleteNextNode(currentNode));
    }


    //
    // Print out the value of the current node.
    //
    printf("%d\n", Item(currentNode));
}

When I first rewrote the code in hungarian, I got screams from the commenters about how hideous it looked in hungarian. Just for grins, lets compare the code portions of the two examples (hungarian and CLR).

First the hungarian:

main(C cArg, SZ rgszArg[])
{
    I iNode;
    I cNodes = atoi(rgszArg[1]);
    I cNodesToSkip = atoi(rgszArg[2]);
    PNODE pnodeT;
    PNODE pnodeCur;

    InitNodes(cNodes);

    //
    // Create a list of cNodes nodes. 
    // 
    for (iNode = 2, pnodeCur = PnodeNew(1); iNode <= cNodes ; iNode++)
    {
        pnodeT = PnodeNew(iNode);
        InsertNext(pnodeCur, pnodeT);
        pnodeCur = pnodeT;
    }

    //
    // Walk the list of nodes, freeing the node that occurs at every cNodesToSkip nodes in the list.
    //
    while (pnodeCur != PnodeNext(x))
    {
        for (iNode = 1; iNode < cNodesToSkip ; iNode++)
        {
            pnodeCur = PnodeNext(pnodeCur);
        }
        FreeNode(PnodeDeleteNext(pnodeCur));
    }

    //
    // Print out the value of the current node.
    //
    printf("%d\n", Item(nodeCur));
}

Now, the CLR:

main(int argCount, char *args[])
{
    int i;
    int numberOfNodes = atoi(args[1]);
    int nodesToSkip = atoi(args[2]);
    Node newNode;
    Node currentNode;

    InitNodes(numberOfNodes);

    //
    // Create a list of numberOfNodes nodes. 
    // 
    for (i = 2, currentNode = NewNode(1); i <= numberOfNodes; i++)
    {
        newNode = NewNode(i);
        InsertNext(currentNode, newNode);
        currentNode = newNode;
    }

    //
    // Walk the list of nodes, freeing the node that occurs at every cNodesToSkip nodes in the list.
    //
    while (currentNode!= Next(currentNode))
    {
        for (i = 1; i < nodesToSkip ; i++)
        {
            currentNode = Next(currentNode);
        }
        FreeNode(DeleteNextNode(currentNode));
    }


    //
    // Print out the value of the current node.
    //
    printf("%d\n", Item(currentNode));
}
 

What I find fascinating about putting these two examples side-by-side is that stylistically, there's actually not that much difference between the two - if you hold all things constant and just vary the naming convention, the code isn't that different.

Tomorrow, lets look at some C/C++ minutae.

 

EDIT: Fixed some typos (Thanks David Mellis).

 

  • "good names describe semantics, not type" [and therefore do not use Hungarian]

    *sigh* Where to start with that one...
  • Mike, I'm just the messenger :)

    I agree with the comment otoh - good variable names SHOULD describe semantics.
  • Everyone pooh poohs Pascal, but Wirth got a lot right. I would say that Java and C# owe more to Pascal, in some ways, than to C.
  • I didnt see the part of the CLR guidelines that discourages prefixes for member variables, and if so, that's stupid.

    There should be a differentiation between member variables, parameters and local variables, if desired.


    I typically prefix member variables with the "_" underscore prefix, and to some degree I also qualify references with "this" so you'd see in my code

    "this._name = value;"
    "this._id = value;"

    The added benefit is intellisense once you type "this._" you see all your class member variables lined up. Why this is explicitly discouraged in the CLR guidelines is dumb, and goes against what Microsoft's own programmers apply to a good 75% of the classes.

    Additionally, related to naming conventions/clashes with member variable names/etc, I have a problem with people writing methods that are more than 10-20 lines.

    IMO, If you have a function with 15 local variables, (which incidentally slows down CLR efficiency, I hear because of certain IL instructions are optimized for the first 5 locals), then you've got a function that is too broad in scope and should be refactored.

    Consider when running release builds of your methods, and an Null reference exception happens on a 100 line method. No line number, just the method name. Figure that one out.

    Frankly, I dont care how your internals look. Just please follow the public interface guidelines so I understand the semantics of calling the public functions: parameter names and function names that define what they do.

    I just find it amusing (to the tune of Daily WTF) of just how little common sense devs apply to their code.
  • Eric,
    That's pretty much what I do - members get an _ in front of the name to differentiate them.

    I don't use visual studio for my editor (I use Epsilon), so I don't have the benefit of intellisense :(

  • CLR == Common Language Runtime - Also known as the .Net framework.
  • In all this talk about indenting, naming conventions, and comment standards, I think you've ignored the most important aspects of this code's style.

    First, there's cleary some global state involved (otherwise, what's the call to initNodes() do?). Second, there seems to be some trickery with the default next node for new nodes. Otherwise, how does the linked list end up as a circle? Third, what does DeleteNextNode do? If it sets node->next to node->next->next, it's badly named, as "delete" is associated with memory deallocation, not removal from a list. Fourth, what's the Josephus problem? I'd prefer a brief description to the comments for each section of code.

    I know that you're just using this piece of code as an example, but it's difficult to evaluate the effectiveness of these surface changes when more fundamental problems remain.

    BTW, you have some typos in your final snippet. In the line:

    while (newNode != Next(currentNode))

    the "newNode" should be "currentNode". You're missing a semi-colon after the "int i" and you've forgotten to change the variable names in the comments.
  • David,
    Your point is valid, there IS global state. This may be a poor example, as I said in the first post, it was chosen by opening a textbook at random and picking the nearest sample code fragment with significant amounts of code in it (I was lazy and didn't want to spend the time working on a good example :))

    I'll fix the typos.

  • One thing to keep in mind is that C++ specifications reserve leading underscores for the implementation.

    So calling a local variable _foo is a no-no in C++, and calling a local variable m_foo in C# is discouraged.

    Interesting.

    Myself, I like the camelCase and _local style when writing C++. I'm just not disciplined enough for true Hungarian sadly, so I've stopped trying to half-ass it with strFoo and iFoo.
  • > The first thing that's clear is that
    > abbreviations are out,

    So the typedef formerly known as I has to be INTEGER?

    > and that variables should represent full
    > English words.

    Not Hungarian? Of course the language keywords are English, but why should variables be? Actually I think the C committee went too far when they allowed non-Italian characters in variable names, I think those should stay in comments and character strings, but it seems rather reasonable to have variable names using Italian characters to represent Hungarian or Japanese words.

    Sometimes it seems kind of irrelevant, for example Gosa could be named ErrorRange instead of Gosa, but sometimes it's highly relevant. Consider an IME that recognizes ConversionKey and NonConversionKey and others in deciding how to convert among ItalianCharacters and JapanesePhoneticCharactersUsuallyUsedForPartsOfJapaneseWords and JapanesePhoneticCharactersUsuallyUsedForForeignWords and ChineseCharacters. It would be more readable to recognize the HenkanKey and MuHenkanKey and others in deciding how to convert among Romaji and Hiragana and Katakana and Kanji. (By the way, Microsoft's English language help files for the IME would also be more readable if they did the same. Some months after trying to guess what kind of key is a conversion key and what kind of key is a non conversion key, I finally figured out that Microsoft meant the keyboard's henkan key and muhenkan key.)
  • Norman, I'm describing the CLR conventions, which are pretty clearly anti-hungarian.
  • For reading code from other people, I do prefer the CLR style, because it's much more "human" readable (versus "developer" readable in the hungarian notation).
    But I see that in some cases you need to have prefixes to know what you have to expect from that variable.
    But when programming with classes, you will sooner or later run out of prefixes that are short enough to be typed in (at least that's what I find), so the hungarian paradigm seems to fit more to the funcional world than to the OO-World (where the .net citizens _should_ live).
    By using .net (and Visual Studio as your editor), you really don't need to put this into the name, because IntelliSense will pop up the type just when moving the mouse over the variable and just entering a period (hopefully in whidbey at any character) the list of members and locals will appear, so you get instant information about the type.

    (maybe you missed that because you're hooked to Epsilon ?)
  • Guenter,
    Even if I used visual studio, I wouldn't have the benefit of intellisense. The NT build environment (or the Exchange build environment) doesn't run from within visual studio, and the browser database that intellisense uses isn't generated.

    And, like source level debugging, intellisense is a crutch, you shouldn't depend on it.
  • Larry,

    if you refer to VStudio 6.0, you're perfectly right, but although I am a programmer since Windows 2.11 (using the MS C compiler) and did not use intellisense at all in C and C++, I instantly switched to this behavior when turning into C# development, as it works much more reliable than the VS6 thing did...

    Maybe this will get better inside MS when all code is ported to CLR, but I think this won't be too soon for Exchange and NT :-)
Page 1 of 1 (14 items)