While some may think that naming conventions are much ado about nothing, no other subject of coding standards evokes as much fervent discourse.
When I first started programming for Windows back in college (ca. 1990), I was baffled that all these really bright programmers at Microsoft would use cryptic symbol names like LPCWSTR and crgpcsz, a style called Hungarian notation. In short, it’s a naming convention that incorporates a symbol’s type information into its name using a series of short prefix identifiers. At the time, this seemed anathema to everything I was being taught about producing readable, maintainable code. Indeed, this notation presented a particularly difficult barrier for me when entering the world of Windows programming, and played no small part in my choosing to be a Unix developer for many years.
In 2000 I had the privilege of working in Microsoft Research for Dr. Charles Simonyi, the inventor of Microsoft Word and yes, the infamous Hungarian notation. When I asked Charles about his popular notation and why he proposed such a confusing convention, he got this amused look on his face and told me that most people had actually missed the point entirely. His intention, he explained, was not simply to conflate type information into the name of a symbol; rather, he wanted to free the developer from the burden of name selection, a “frustrating and time consuming task”. His premise was that if two programmers, using the same convention, would independently choose the same name for the same program text, then both goals of readability and write-ability have been served. Readability, he argued, becomes a natural artifact of write-ability, and thus emphasis on the latter is rightly placed. Although I remained skeptical, I had to admit that given the historically weak type safety of C and its lack of name encapsulation, it wasn’t difficult to understand the broad appeal of Charles’ proposal amongst early Microsoft programmers. Indeed, to this day some developers at Microsoft adhere to Hungarian notation with near religious fervor.
My personal experience is that Hungarian notation tends to obfuscate rather than illuminate. Different programmers using Hungarian do not independently choose the same names for the same program text, for the same reason that different programmers don’t usually write algorithms in identical ways: there are often many ways to implement the same algorithm using different structures and types. To make matters worse, independent teams inevitably choose subtle style variations, further increasing confusion and inhibiting long-term maintainability. Over time even a single team’s notation will evolve such that each successive generation of code will look progressively different from legacy code. I recently joined a team at Microsoft with a continuous product line that’s more than twenty years old. This team’s codebase includes some legacy C components so old that they look entirely different from code written in recent years. If they had instead chosen plain English words and phrases for symbol names, their old code would be just as readable as the new code (though still in C instead of C++). And while I agree with Dr. Simonyi that naming a symbol with just the right words can be difficult, even frustrating at times, I think the effort pays off in more readable, maintainable code.
In recent years I’ve had the joy of working with more and more programmers from Generation Y. These brilliant “kids” cut their teeth on object-oriented programming, have never had reason to use ancient editors like vi or Emacs, and have never programmed without the aid of basic semantic tools like Intellisense. To them, Hungarian notation is not just an anachronism; it’s a pedantic scheme that gets in the way of their efficiency and creativity.
In twenty years of programming, I’ve found one thing to be universally true: consistency, above all else, is crucial to writing readable, maintainable code. That means consistency between programmers on the same team as well as consistency for the same programmer from year to year.
Last year I proposed a simple naming convention for my team’s C++ development. It can be summarized quite succinctly: “Use consistent, meaningful English names that reflect the object described or action being taken. Name types, functions, properties and namespaces LikeThis, variables and parameters likeThis, private fields likeThis_ and C++ macros LIKE_THIS.” Notice the intentional similarity to the very practical naming convention for CLR development. However, C++ is different enough from C# to necessitate a few changes.
The details of my C++ naming proposal are as follows:
Casing Styles Defined
UpperCamelCase : The first letter in the identifier and the first letter of each subsequent concatenated word are capitalized. You can use UpperCamelCase for identifiers of three or more characters. No underscores are used. For example: DeviceLock, Scene, TabScene
camelCase : The first letter of an identifier is lowercase and the first letter of each subsequent concatenated word is capitalized. No underscores are used. For example: deviceLock, scene, tabScene
UPPER_CASE : All letters in the identifier are capitalized. Concatenated words are separated by an underscore.
Type Names
Type names in C++ include class, struct and interface identifiers, enum typenames, and typedefs. In general, type names should be noun phrases, where the noun is the entity represented by the type. For example, Button, Stack and File each have names that identify the entity represented by the type. Choose names that identify the entity from the developer’s perspective; names should reflect usage scenarios. Use these guidelines:
- Use UpperCamelCase
- Use nouns, noun phrases or occasionally adjective phrases. Do not use verbs.
- Consider ending the name of a derived class with the name of the base class.
- Prefix interfaces with the letter I. Do not prefix class names with the letter C, nor structs with the letter S, nor template class names with the letter T.
- For a class that simply implements an interface, consider ending the class with the interface name, sans the I prefix.
- Do not use abbreviations, except those that are commonly recognized (Io, Ctrl etc).
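As a rough illustration of the type-name guidelines above, here is a minimal sketch. The names IMessageReceiver, MessageReceiver and DeviceId are hypothetical, invented only for this example; Scene and TabScene are borrowed from the casing examples earlier.

```cpp
#include <string>

// Interface: noun phrase in UpperCamelCase, prefixed with the letter I.
class IMessageReceiver {
public:
    virtual ~IMessageReceiver() = default;
    virtual std::string Receive() = 0;
};

// A class that simply implements the interface takes the interface
// name without the I prefix (and without a C prefix).
class MessageReceiver : public IMessageReceiver {
public:
    std::string Receive() override { return "hello"; }
};

// Base class named for the entity it represents.
class Scene {
public:
    virtual ~Scene() = default;
    virtual int GetLayerCount() const { return 1; }
};

// Derived class whose name ends with the name of its base class.
class TabScene : public Scene {
public:
    int GetLayerCount() const override { return 2; }
};

// A typedef is a type name too, so it also uses UpperCamelCase.
typedef unsigned long DeviceId;
```

A caller would write `TabScene tabScene;` or `IMessageReceiver& receiver = ...;`, so the type and the variable read naturally side by side.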
Template parameter names
Template parameter names are a bit of a special case. Choose descriptive names for template parameters, unless a single-letter name is completely self-explanatory and a descriptive name would not add value (consider using the letter T in such cases).
- Use UpperCamelCase
- Prefix the parameter name with the letter T. Although template parameter names are usually types, it’s usually important to differentiate them from non-parameterized names. Someday our integrated development environments may provide a nice method of doing this without mangling the name; say, by displaying templated names in italics.
- Consider indicating semantic constraints placed on a type parameter in the naming of the parameter. For instance, a parameter constrained to the type ISignInMessageReceiver may be called TSignInMessageReceiver.
- Use nouns or noun phrases for object types and object instances and verbs for functor or function object parameters.
- Do not use abbreviations except those that are commonly recognized.
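The template parameter rules might look like this in practice. This is only a sketch: SelectLarger, CountMatching and EvenMatcher are hypothetical names made up for the example, and it assumes a C++11 compiler.

```cpp
#include <cstddef>
#include <vector>

// A single-letter T is enough here: the parameter is self-explanatory.
template <typename T>
const T& SelectLarger(const T& first, const T& second) {
    return (first < second) ? second : first;
}

// Descriptive parameters: TItem is a noun (an object type) and
// TMatch is a verb (a function object), each prefixed with T.
template <typename TItem, typename TMatch>
std::size_t CountMatching(const std::vector<TItem>& items, TMatch match) {
    std::size_t count = 0;
    for (const TItem& item : items) {
        if (match(item)) {
            ++count;
        }
    }
    return count;
}

// A small function object to pass as the TMatch argument.
struct EvenMatcher {
    bool operator()(int value) const { return value % 2 == 0; }
};
```

For instance, `CountMatching(values, EvenMatcher())` counts the even entries in a `std::vector<int>` named `values`.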
Enumeration value names
In C#, references to enumeration value names must be prefixed by the enumeration type name. Unfortunately, C++ has no such requirement; only the value name is referenced. To accommodate this in our naming convention, follow these rules:
- Use camelCase
- Declare enum types within a scope appropriate to their usage
- Use nouns, noun phrases or occasionally adjective phrases. Do not use verbs.
- Do not use abbreviations except those that are commonly recognized.
- Optional: many C++ developers prefer to prefix enumeration value names with the name of the enumeration type.
For instance, enum State { stateIdle, stateReading, stateWriting }.
The prefix should not be an abbreviation of the enum typename.
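Here is one way the enumeration rules could be applied, using the State example above. The enclosing Connection class and its members are assumptions added for illustration; declaring the enum inside the class keeps the value names out of the global scope.

```cpp
// The enum is declared inside the class that uses it, so callers
// refer to the values as Connection::stateIdle and so on.
class Connection {
public:
    // camelCase values, each prefixed with the full enum type name.
    enum State { stateIdle, stateReading, stateWriting };

    Connection() : state_(stateIdle) {}

    void BeginRead() { state_ = stateReading; }
    void BeginWrite() { state_ = stateWriting; }
    State GetState() const { return state_; }

private:
    State state_;
};
```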
Preprocessor symbol names
Preprocessor symbols (macros) do not exist in C#, so the CLR naming convention offers little guidance. The C++ industry standard is to declare macros LIKE_THIS.
- Use UPPER_CASE
- #undef temporary macro names
- Do not use abbreviations except those that are commonly recognized.
- Macro names should be at least three characters
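A short sketch of the macro rules follows. MAX_RETRY_COUNT, SQUARE_OF and GetGridCellCount are hypothetical names chosen for the example; the point is the UPPER_CASE spelling and the #undef once a temporary macro has served its purpose.

```cpp
// Long-lived macro: UPPER_CASE, words separated by underscores.
#define MAX_RETRY_COUNT 3

int GetGridCellCount() {
    // Temporary helper macro, needed only inside this function.
#define SQUARE_OF(value) ((value) * (value))
    const int cellCount = SQUARE_OF(8);
#undef SQUARE_OF  // removed so it cannot leak into later code
    return cellCount;
}
```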
Method and function names
Methods are actions upon an object and their names should employ verbs and verb phrases. Do not select a name that describes how the method operates; in other words, do not use implementation details for your method names.
- Use UpperCamelCase
- Use verbs or verb phrases
- Do not use implementation details in a method’s name
- Consider prefixing event handling methods with “On”, such as “OnInitialize”
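To make the method guidelines concrete, here is a small hypothetical class (NameRegistry and its members are invented for this sketch). Note that ContainsName says what the method does, not how; a name like LinearSearchForName would expose an implementation detail.

```cpp
#include <algorithm>
#include <string>
#include <vector>

class NameRegistry {
public:
    // Verb phrase describing the action taken on the object.
    void AddName(const std::string& name) {
        names_.push_back(name);
        OnNameAdded();
    }

    // Describes the result, not the search strategy used to get it.
    bool ContainsName(const std::string& name) const {
        return std::find(names_.begin(), names_.end(), name) != names_.end();
    }

    int GetChangeCount() const { return changeCount_; }

private:
    // Event-handling method, prefixed with "On".
    void OnNameAdded() { ++changeCount_; }

    std::vector<std::string> names_;
    int changeCount_ = 0;
};
```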
Field names
Although the C# naming convention proscribes exposing any fields with public or protected protection, and recommends UpperCamelCase for private fields, I have found this to be a bit clumsy to employ in C++. Moreover, it’s inconsistent with C#'s variable naming rules. Instead, I propose the following for field names.
- Use camelCase
- Use descriptive names, typically nouns (though function object fields may be verbs)
- Use plural names for collection fields rather than suffixing with the container type. Ex: “names” instead of “nameList”
- Some developers prefix private class field names with “m_” (as with MFC classes). I personally prefer to suffix private class field names with an underscore likeThis_, as do Alexandrescu and other prominent C++ developers and authors. Perhaps someday most IDEs will allow us to visually identify member field names (say, with italics) to differentiate them from other names; until then this kind of name decoration has proven useful.
- Do not decorate field names with type information (as in Hungarian notation).
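The field rules above could be sketched as follows. WindowSize and Playlist are hypothetical types made up for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Public fields of a plain struct: camelCase, no decoration.
struct WindowSize {
    int width;
    int height;
};

class Playlist {
public:
    explicit Playlist(const std::string& title) : title_(title) {}

    void AddTrack(const std::string& trackName) {
        trackNames_.push_back(trackName);
    }

    std::size_t GetTrackCount() const { return trackNames_.size(); }
    const std::string& GetTitle() const { return title_; }

private:
    // Private fields: camelCase with a trailing underscore and
    // no type information in the name.
    std::string title_;
    // Collection field: plural name rather than trackNameList_.
    std::vector<std::string> trackNames_;
};
```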
C++ Property Names
Did you know that Microsoft’s C++ has properties? Well it does, though the syntax is a bit clumsy. It uses a special __declspec directive. I usually define a set of macros that ameliorate the clunky syntax (I will write more on that in another blog posting).
- Use UpperCamelCase
- Use nouns, noun phrases or adjectives.
- Prefix Boolean property names with Is, Can, Has etc. where it contributes to readability.
- Prefer Boolean property names with affirmative phrases (CanSeek instead of CantSeek).
- Properties imply simple value lookups or trivial computations so do not use a property when non-trivial computations are involved. Instead use a method.
- Prefer to name the "getter" and "setter" functions with GetPropertyName and SetPropertyName respectively (though this is not strictly necessary).
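For readers who have not seen the syntax, here is a minimal sketch of the raw __declspec(property) form, without the helper macros mentioned above. The Stream class and its members are hypothetical. The property declarations are guarded by _MSC_VER because this is a Microsoft extension; on other compilers only the Get/Set functions are available.

```cpp
class Stream {
public:
    Stream() : position_(0), canSeek_(true) {}

    // Getter and setter named GetPropertyName and SetPropertyName.
    long GetPosition() const { return position_; }
    void SetPosition(long position) { position_ = position; }

    // Boolean property phrased affirmatively, with a Can prefix.
    bool GetCanSeek() const { return canSeek_; }

#if defined(_MSC_VER)
    // Microsoft-specific: lets callers write stream.Position = 10;
    // the compiler rewrites that into a call to SetPosition(10).
    __declspec(property(get = GetPosition, put = SetPosition)) long Position;
    __declspec(property(get = GetCanSeek)) bool CanSeek;
#endif

private:
    long position_;
    bool canSeek_;
};
```

Both accessors are trivial lookups, which is what makes them reasonable candidates for properties rather than methods.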
Parameters and auto/local variable names
- Use camelCase
- Use descriptive names which reflect how the variable will be used
- Do not decorate names with type information (as in Hungarian notation)
- Use nouns, noun phrases or an adjective, except for function objects
- Use plural names for collection/container variables rather than suffixing with the container type. Ex: “names” instead of “nameList”
- Avoid declaring variables in the global scope; instead declare them as variables within an appropriate namespace and follow the naming convention for C++ Properties.
- Do not declare a variable with the static keyword at global scope; this use was deprecated in C++03 (the deprecation was lifted in C++11). Instead, use an anonymous namespace.
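A sketch of these rules, assuming C++11: all of the names below (Limits, MaxNameLength, RejectedNameCount, AcceptNames) are hypothetical. The file-local counter lives in an anonymous namespace instead of being declared static, and both namespace-scope variables follow the property convention.

```cpp
#include <string>
#include <vector>

// Shared value declared in a named namespace, not at global scope.
namespace Limits {
const int MaxNameLength = 8;
}

// File-local state: anonymous namespace instead of a global static.
namespace {
int RejectedNameCount = 0;
}

// Parameter and locals use camelCase; the collection is plural
// ("names"), not suffixed with its container type ("nameList").
bool AcceptNames(const std::vector<std::string>& names) {
    bool allAccepted = true;
    for (const std::string& name : names) {
        if (static_cast<int>(name.size()) > Limits::MaxNameLength) {
            ++RejectedNameCount;
            allAccepted = false;
        }
    }
    return allAccepted;
}

int GetRejectedNameCount() { return RejectedNameCount; }
```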
Namespaces
- Use UpperCamelCase
- Use nouns or noun phrases
- Do not use generic type names that might conflict with class names (eg. Element, Node, Log, Message)
- Consider using plural names where appropriate: eg. Strings instead of String
- Do not use the same name for a namespace as a type within the namespace.
- Do not place application specific namespaces within the namespace of a shared library namespace.
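The namespace guidelines might be applied like this. Strings, Messaging, Repeat and Message are hypothetical names used only for the sketch; note the plural Strings and that Message lives in Messaging rather than in a namespace that shares its name.

```cpp
#include <string>

// Plural noun: avoids clashing with a String class.
namespace Strings {
std::string Repeat(const std::string& text, int count) {
    std::string repeatedText;
    for (int index = 0; index < count; ++index) {
        repeatedText += text;
    }
    return repeatedText;
}
}  // namespace Strings

// The namespace is not named Message, so it cannot be confused
// with the type it contains.
namespace Messaging {
class Message {
public:
    explicit Message(const std::string& text) : text_(text) {}
    const std::string& GetText() const { return text_; }

private:
    std::string text_;
};
}  // namespace Messaging
```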
Use of Acronyms
Acronyms are generally proscribed, as they reduce readability, especially for those programmers for whom English is a second language. However, an acronym may be used if it is generally recognized by your programming community and if it doesn’t reduce readability. Examples include DB, IO, Xml, Cpu, Gpu, Html, etc.
- Capitalize both characters of two-character acronyms, except the first word of a camelCase identifier. Example: DB, IO for UpperCamelCase and ioChannel for camelCase.
- Capitalize only the first character of acronyms with three or more characters, except the first word in a camelCase identifier.
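Finally, a small sketch of the acronym casing rules. IOChannel, HtmlEncoder and OpenDefaultChannel are hypothetical names invented for the example, and it assumes a C++11 compiler.

```cpp
#include <string>

// Two-character acronym: both letters capitalized in UpperCamelCase.
class IOChannel {
public:
    explicit IOChannel(int channelId) : channelId_(channelId) {}
    int GetChannelId() const { return channelId_; }

private:
    int channelId_;
};

// Longer acronym: only the first character is capitalized.
class HtmlEncoder {
public:
    std::string Encode(const std::string& text) const {
        std::string encodedText;
        for (char character : text) {
            if (character == '<') {
                encodedText += "&lt;";
            } else if (character == '>') {
                encodedText += "&gt;";
            } else if (character == '&') {
                encodedText += "&amp;";
            } else {
                encodedText += character;
            }
        }
        return encodedText;
    }
};

int OpenDefaultChannel() {
    // As the first word of a camelCase name, the acronym is lowercased.
    IOChannel ioChannel(1);
    return ioChannel.GetChannelId();
}
```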