How to Easily Understand Unfamiliar Code

I suspect my coworkers may consider me mildly insane. 

In our group, we typically require that code be seen and reviewed by at least one other person before it is checked in.  I try to do my part, and tend to respond to a lot of the code reviews that come across my desk.  The thing is, I almost never review for algorithmic correctness.  My coworkers are fairly bright people, and honest enough to check that something actually works before sending out the code review.  Nor am I much of a coding guidelines nazi.  Actually, we still have some fairly ridiculous guidelines hanging about; nearly everything that makes C++ different from C is considered somewhat esoteric, obscure, and prone to abuse.  I don't share in this particular form of language bigotry, so I don't bother to point out such violations.

Rather, my reviews tend to focus on issues of design, structure, and readability.  That's right, I'm the annoying git who sends you dozens of suggestions where a public method could have been private, or a variable declaration scoped inside a loop, or the use of std::auto_ptr could have saved you from manually coding up an object dtor, or that you should have moved an if statement from the caller to the callee, or that you duplicated five lines that ought to be pulled out into a separate function, or that you've added a superfluous member variable that really ought to be passed around as a function parameter.  All these ridiculous, nitpicky, useless suggestions ...

Or are they so useless?  The thing is, I have a bit of a reputation in my group for being able to jump into unfamiliar code and understand the overall design with unusual speed.  And as I think about it, I don't think it's because my brain's neurons fire faster than anyone else's.  It's just that I look at different things than they do.
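To make one of those review suggestions concrete, here is a minimal sketch of the "superfluous member variable" case.  The class names (OutlineBefore, OutlineAfter) and the Render/Line methods are hypothetical, invented purely for illustration; they don't come from any real codebase.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Before: indent_ only means something while Render() is running, yet it
// sits in the class as if it were part of the object's lasting state.
class OutlineBefore {
public:
    std::string Render(const std::vector<std::string>& items) {
        indent_ = 2;  // scratch value smuggled to Line() through a member
        std::string out;
        for (const std::string& item : items) {
            out += Line(item);
        }
        return out;
    }

private:
    std::string Line(const std::string& item) const {
        assert(indent_ >= 0);
        return std::string(static_cast<std::size_t>(indent_), ' ') + item + "\n";
    }

    int indent_ = 0;  // superfluous: a reader has to work out that this is not real state
};

// After: the value travels as a parameter, so the class has no fields at
// all and Render() can be const.
class OutlineAfter {
public:
    std::string Render(const std::vector<std::string>& items) const {
        const int indent = 2;
        std::string out;
        for (const std::string& item : items) {
            out += Line(item, indent);
        }
        return out;
    }

private:
    static std::string Line(const std::string& item, int indent) {
        assert(indent >= 0);
        return std::string(static_cast<std::size_t>(indent), ' ') + item + "\n";
    }
};
```

Both versions produce the same text.  The difference only shows up when you read the declaration: the second class visibly carries no state, which is exactly the kind of thing a reader skimming the header wants to know.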

Suppose that you've suddenly been dumped into a project with tens of thousands of lines of C++ code, and you need to come up to speed on it yesterday.  Obviously you can't read it all in one day.  You've got to start somewhere.  You've got to simplify the task somehow.  But how? 

Do you jump into the code?  Find the biggest function you can find and start tracing through it -- reasoning being that most of the interesting code must be in the longest functions?  Good luck with that.  Maybe you go scavenging among the function header comments.  If so ... you poor misguided fool.  Don't you know that function header comments only come in three flavors? 

  • Comments that are blank templates no one ever filled out
  • Comments that are two years out of date with the code
  • Comments that are gee-duh rephrasings of the function name and parameter list. 

Ignore what your college professor told you; function header comments are useless.  Don't even bother reading them.  In fact, try to avoid writing them, except in cases where they're absolutely necessary.

Let me share a simple trick with you.  Ignore the CPP files.  Look at the headers only.  In fact, don't look at all of the headers, just look at the classes.  And don't look at the entire class - ignore all the private methods. The secret is that nearly everything you really need to know about a class can be inferred from the public methods it supports and the private fields it contains.  And if there's a comment there explaining the class's role in life, you're really in business. 

Back in the Structured Programming Era (back in the Jurassic age, I believe), Fred Brooks observed, "Show me your flowcharts and conceal your tables, and I shall continue to be mystified.  Show me your tables, and I won't usually need your flowcharts; they'll be obvious."  In modern terms: understand the underlying data structures, and the algorithms can be inferred.  Once you know what's in an object, and especially what operations the outside world can perform on it, the actual method implementations write themselves.
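Here is a small sketch of what that looks like in practice.  RecentCache and everything in it is a hypothetical class made up for this example.  The first half is all you'd read in the header; the second half is roughly what you could predict about the .CPP file before opening it, assuming the fields mean what their names suggest.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// ---- What the header shows --------------------------------------------
// Role in life: a bounded key/value cache that forgets the entry that was
// used least recently.
class RecentCache {
public:
    explicit RecentCache(std::size_t capacity);
    void Put(const std::string& key, int value);
    bool Get(const std::string& key, int* value_out);  // false if key is absent
    std::size_t Size() const;

private:
    typedef std::list<std::pair<std::string, int> > EntryList;

    std::size_t capacity_;                                         // fixed upper bound
    EntryList entries_;                                            // most recently used first
    std::unordered_map<std::string, EntryList::iterator> index_;   // key -> position in entries_
};

// ---- What the fields let you guess about the implementation ------------
inline RecentCache::RecentCache(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
}

inline void RecentCache::Put(const std::string& key, int value) {
    auto found = index_.find(key);
    if (found != index_.end()) {
        // Replacing an existing key: drop the old entry first.
        entries_.erase(found->second);
        index_.erase(found);
    } else if (entries_.size() == capacity_) {
        // Full: evict whatever sits at the "least recent" end.
        index_.erase(entries_.back().first);
        entries_.pop_back();
    }
    entries_.push_front(std::make_pair(key, value));
    index_[key] = entries_.begin();
}

inline bool RecentCache::Get(const std::string& key, int* value_out) {
    assert(value_out != nullptr);
    auto found = index_.find(key);
    if (found == index_.end()) {
        return false;
    }
    // Move the entry to the front; list iterators stay valid across splice.
    entries_.splice(entries_.begin(), entries_, found->second);
    *value_out = found->second->second;
    return true;
}

inline std::size_t RecentCache::Size() const {
    return entries_.size();
}
```

A list paired with a map of iterators into that list is the giveaway.  Once you've seen those two fields next to a capacity, there are not many sensible ways left to write Put and Get.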

For this reason, class diagrams are always far more interesting than sequence diagrams. 

If you start doing this, then you actually start caring about whether an object has too much state in it, or whether it has too many public methods.  The excess becomes a distraction.  It actually makes a difference if you use std::auto_ptr to define a member variable, as opposed to a plain old pointer -- std::auto_ptr explicitly documents that the object is contained rather than shared, which is enormously useful information to know.  And you finally stop that amazingly annoying knee-jerk habit of declaring every class and every function in a .H file, regardless of whether it is used outside of a single .CPP file.
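A rough sketch of both points, with some loud assumptions.  Importer, Parser, Logger and Describe are hypothetical names made up for this example.  Also, std::auto_ptr was deprecated in C++11 and removed in C++17, so the sketch swaps in std::unique_ptr, its modern replacement; the ownership message to the reader is the same.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical collaborators, kept trivial on purpose.
struct Logger {
    int lines_written = 0;
    void Write(const std::string&) { ++lines_written; }
};

struct Parser {
    int Parse(const std::string& text) const {
        return static_cast<int>(text.size());
    }
};

namespace {
// Used only inside this one .CPP file, so it lives in an unnamed namespace
// here instead of being declared in a header for the whole project to see.
std::string Describe(int char_count) {
    return "parsed " + std::to_string(char_count) + " chars";
}
}  // namespace

class Importer {
public:
    explicit Importer(Logger* shared_log)
        : parser_(new Parser), log_(shared_log) {
        assert(shared_log != nullptr);
    }

    int Import(const std::string& text) {
        const int count = parser_->Parse(text);
        log_->Write(Describe(count));
        return count;
    }

private:
    std::unique_ptr<Parser> parser_;  // contained: created and destroyed with this Importer
    Logger* log_;                     // shared: owned elsewhere, must outlive this Importer
};
```

Just from the two member declarations you can tell that each Importer has its own Parser but that several Importers may write to the same Logger.  No destructor needs to be written, and nobody has to go digging in the .CPP file to find out who deletes what.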

To be sure, if the code is genuinely a rat's nest of dependencies, no technique in the world is going to help you.  But more often, deep within that mess of code you've inherited, there's some semblance of structure there struggling to break free.  With practice you can learn to discern the patterns which indicate that underlying structure.

And, I think, you become a better programmer because of it. 

Posted: Tuesday, February 20, 2007 11:03 PM by cashto

Comments

Monkeyget said:

Interesting.

I'm really bad/slow at reading source code, no matter if it's deliberately obfuscated code, good code, or code I wrote 6 months ago.

I usually open the various source files trying to get a general sense of what's going on (and fail).

Also, I don't like modifying code until I have a deep understanding of how it works (which is hard, since there are a lot of details to keep in mind).

I'll definitely try the 'public interface' technique the next time I read foreign code.

# February 22, 2007 3:13 PM

cashto's blog said:

After having reviewed three ... suboptimal ... design documents this week, I feel I should really say

# August 21, 2007 4:40 AM

Gopi said:

Works for me; I'll just go through your steps.

# October 15, 2007 2:45 AM

mohammed said:

or just read "Code Reading", a very useful book on the same subject :)

http://www.spinellis.gr/codereading

# October 11, 2009 4:30 PM

cleek said:

"It's just that I look at different things than they do."

well, sure. not having to work through the chore of solving the problem the original coder solved is a luxury that allows you to look at things like organization and structure. you've skipped over the "thank god i've got this working" stage and landed smack in the middle of "since i've got nothing better to do, how about i make this code pretty" stage.

# November 4, 2009 3:47 PM