I have always been interested in reading great code from others because -- like Scott Hanselman -- I strongly believe that reading other people's code is the most effective way of becoming a better developer. It's a bit unfortunate then that our [development] culture has an obsession with bad and poorly written code. I must confess that I too syndicate and peruse The Daily WTF with morbid fascination. I have even dabbled in the notion of submitting my own entry to the IOCCC. But why is it that poor code is so much easier to come by? Are there any sites out there that celebrate achievements in great code? Sure it's easier to write bad code than great code; but could it be that there is really no such thing as [standalone] great code? To answer that, we first need to know whether you would recognize great code if you saw it.
If you ask most professional developers what great code looks like they'll inevitably throw around words like elegant, performant and maintainable. Others might use words like robust, scalable or resilient. None of these people are actually great developers. It's like asking college students what they are looking for in a significant other; what they say they're looking for versus what they're actually looking for are worlds apart. And they often times don't even know it. Everything I learned about writing great code, I learned from the following mantra: correct, clear, concise and then fast. Even then, this is just a prescription for how to write great code; not necessarily any attributes of it. In fact, great code often goes undetected (drawing minimal attention to itself) and as a result is generally a thankless job. Like thousands of IT professionals or operations engineers already know -- called upon or scrutinized only -- when trouble hits the fan. Trouble? No praise. No trouble? No praise.
The truth is that a lot of code starts out pretty great. Then quickly starts to decay as bugs are fixed and features are inevitably added. Sometimes this decay begins before the project is even completed. The rate at which your software decays is directly proportional to how well it was originally designed. But designing software is all about trade-offs and compromise. I won't delve into the process or merits of design here since there are reams of books from people much, much smarter than I who have already, and continue to, tackle these problems. A revealing synopsis though would probably read as follows: designing software well is hard. The magic of software ultimately comes from its malleability and it is unfortunately this very same characteristic that has turned this science into an art.
You can validate this by asking any developer [working on a fairly sizable project] what they think of their code base and they'll almost always roll their eyes or sigh deeply. They might comment that a single component or feature was exceptionally well-done but the overall system is convoluted, unmaintainable, and sacrilegious by any engineering standard. As a bonus, especially if they're young and ambitious, they might even suggest that a complete rewrite is necessary. I used to be a big fan of rewrites until I learned that new code is always more buggy or less functional than old code (and sometimes both!). What's the take away from all this? Is it that there is no such thing as great venerable design and, by extension, no such thing as great code? We know that change is inevitable. We know that absolute correctness is impossible. What can we then do as software developers to help ourselves? There are infinite potential processes to mitigate against these unjust constraints but processes alone are not enough.
Great code conveys its intent clearly and concisely. It's clarity comes from its simplicity. Its simplicity from its conciseness. Just as the mastery of the English language is the cornerstone of great English writing, mastery of said programming language is also a prerequisite in producing great, high-quality code. A large source of TDWTFs -- albeit entertaining -- come simply from a developer's ignorance of a language or framework feature. Be it manually parsing dates, rolling your own hashing algorithm, or writing functions with cyclomatic complexities in excess of two hundred and forty-three. All pitfalls that could have been avoided by simply learning the tools of the trade.
All this preamble finally brings me to my original point. In the past month or so, I've started to systematically fix (or rather try to fix) the majority of style violations throughout our code base. It is a modest but long-term effort to increase code quality on our team. The code base is several hundred thousand lines of C# and ASP.NET code. As I started to chug through the code, I noticed that some projects were much easier to fix than others and there was little correlation between the actual number of violations and the effort required to bring these projects into compliance. This is because the vast majority of violations are actually simple (and low risk) changes. What became most troubling though was code whose intent was not clear. Or more appropriately: it was obvious what the code was doing but the intention of the developer was clearly not. I could cite many different examples but one simple yet very common pattern was in the initialization of private members. Consider the following highly simplified and contrived example:
1: public class ServiceConfig
2: {
3: private string serviceUrl = "";
4: private string friendlyName = "";
5: private bool isEnabled;
6:
7: public string ServiceUrl
8: {
9: get { return serviceUrl; }
10: set { serviceUrl = value; }
11: }
12:
13: public string FriendlyName
14: {
15: get { return friendlyName; }
16: set { friendlyName = value; }
17: }
18:
19: public bool IsEnabled
20: {
21: get { return isEnabled; }
22: set { isEnabled = value; }
23: }
24: }
This code has three specific style violations: (1) lack of documentation for public members, (2) missing the 'this.' prefix when referencing instance members and (3) the hard-coded value of "". It makes no odds really whether you agree with the proposed style. The key take away is that it is unclear whether the original developer had the specific intention to initialize the values to empty strings or if it was just out of (bad) habit. You could take an educated guess that maybe the developer wanted to avoid returning null values when the object was uninitialized. A bit odd since both empty and null strings are almost always considered in the same scenario (hence the prevalence of string.IsNullOrEmpty checks). Moreover, the developer makes no effort to prevent the property from being set to null either; further enforcing the original assumption that it was done for no particular reason other than habit or lack of forethought. On the other hand, the IsEnabled property is defaulted to false by the compiler automatically and clearly identifies the intention of the developer. That is, if the developer did not intend for the IsEnabled property to start out as false then this is an issue of (in-) correctness. Therefore, it is easy to refactor the IsEnabled property to an automatic property without changing the semantics of the class. This though does not hold regarding the other two properties. I can almost assure you that the developer had no specific intention given the context of the class but the change would no doubt cause subtle differences in its semantics. And the last thing I want to do in my style compliance effort is to change the semantics of the code in any way, shape or form. That is not the point of style standards. I ended up refactoring the code to something along these lines:
3: private string serviceUrl = string.Empty;
4: private string friendlyName = string.Empty;
5:
6: /// <summary>
7: /// Gets or sets the url of the service.
8: /// </summary>
9: public string ServiceUrl
10: {
11: get { return this.serviceUrl; }
12: set { this.serviceUrl = value; }
13: }
14:
15: /// <summary>
16: /// Gets or sets the friendly name of the service.
17: /// </summary>
18: public string FriendlyName
19: {
20: get { return this.friendlyName; }
21: set { this.friendlyName = value; }
22: }
23:
24: /// <summary>
25: /// Gets or sets a value indicating whether the service has been enabled.
26: /// </summary>
27: public bool IsEnabled { get; set; }
28: }
The code is now style compliant but not necessarily any better (readable) than the original. In fact, I would argue that the next developer who comes by this code would be more inclined to believe that the empty string defaulting was intentional (whereas we made our changes under the exact opposite notion). For one, the inconsistent use of automatic properties leads one to believe that there was a reason behind the inconsistency. Secondly, the explicit use of string.Empty potentially reinforces the notion that the developer wanted to initialize these values to empty strings (e.g. it was not out of sheer laziness or habit). By trying to improve the style of the class, we have potentially changed (either better reinforced and completely misunderstood) the original intention. The repercussions in this contrived scenario are obviously minimal but these class of regressions are subtle and often times very difficult to track down because the code looks easy and simple.
How could this debacle have been avoided in the first place? For starters, the first thing that may come to mind is the addition of comments. If the original developer had explicitly added a comment along the lines of "initializing to empty strings"; there would be minimal guesswork involved on our side. It may still not have been the correct thing to do but at least the intent was clear. But comments such as these often times seem superfluous. If I had read those comments in a different context my gut reaction would have been something cynical and sarcastic like; "O rly? And here I thought it was initializing the darned thing to 42. Thanks for the clarification, dimwit!" But really, as Jeff Atwood points out, you should strive to code without comments. To clearly convey the intent of the original developer (if indeed the intent was to initialize the properties to an empty string), the class should have looked something like this (with style violations intact):
3: private string serviceUrl;
4: private string friendlyName;
5: private string isEnabled;
7: public ServiceConfig()
9: serviceUrl = string.Empty;
10: friendlyName = string.Empty;
13: public string ServiceUrl
15: get { return serviceUrl; }
16: set { serviceUrl = value; }
19: public string FriendlyName
21: get { return friendlyName; }
22: set { friendlyName = value; }
24:
25: public bool IsEnabled
26: {
27: get { return isEnabled; }
28: set { isEnabled = value; }
29: }
30: }
This not only avoids the superfluous comments but also makes the intent much clearer. We can then move along without much thought or hesitation and cleanup the class per our style and coding standards without changing the semantics of the code:
3: /// <summary>
4: /// Initializes a new instance of the ServiceConfig class.
5: /// </summary>
6: public ServiceConfig()
7: {
8: this.ServiceUrl = string.Empty;
9: this.FriendlyName = string.Empty;
10: }
11:
12: /// <summary>
13: /// Gets or sets the url of the service.
14: /// </summary>
15: public string ServiceUrl { get; set; }
16:
17: /// <summary>
18: /// Gets or sets the friendly name of the service.
19: /// </summary>
20: public string FriendlyName { get; set; }
21:
22: /// <summary>
23: /// Gets or sets a value indicating whether the service has been enabled.
24: /// </summary>
25: public bool IsEnabled { get; set; }
26: }
The question of why these properties are being initialized is still unclear but at least we know with greater certainty that they ought to be. Sometimes that makes all the difference.
Adieu. Navid.