Update: this blog is no longer active. For new posts and RSS subscriptions, please go to http://saintgimp.org.

In the process of my continuous learning about agile development, one of my biggest problems is that it’s easy to find materials that say, “Do this, don’t do that,” but offer only trivial examples at best.  I’m always wishing for some non-trivial examples of what the principles, or the violation of the principles, look like in the real world.  Part of the reason why I put GenesisEngine up on GitHub and am blogging about it here is to provide some slightly-less-than-trivial examples of good techniques, but just as importantly, examples of mistakes and how to fix them.

True confessions

So, I have a confession to make.

I work on GenesisEngine in my spare (ha!) time and progress has been kind of slow.  I spent a fair amount of time up front on infrastructure and was careful to build it all with TDD and SOLID principles.  I had a good time building that stuff but really, all of the interesting parts have to do with the generation and rendering of the terrain itself.  Everything else is just plumbing.

So after spending quite a number of weeks telling my friends that I was working on this nifty terrain engine project but having only a featureless white ball to show them, I really, really wanted to get something working that resembled actual planetary terrain.  The problem was moderately complex and I was growing impatient.  I started cutting corners.  My QuadNode class started out fairly small but it quickly accumulated a lot of responsibilities and started to sprawl all over the place.  I was violating the Single Responsibility Principle, and frankly, I made a mess.

Warning signs

One of the early warning signs that you have a fat class that does too many thing is that it’s not fun to write tests for it.  Rather than just having simple inputs and outputs to manage in your tests, you have to construct elaborate chains of actions just to get your object into the state that you want to test.  The tests aren’t elegant statements of intent; they’re full of confusing noise.

You’ll also see a lot of combinatorial explosion going on where you get a ridiculous number of different contexts that say, “when A is true and B is true and C is true”, then “when A is true and B is true and C is false”, and so on through all the combinations of states.  It’s tedious to write tests for each combination, especially when they’re messy tests anyway.

As I got deeper into the functionality of my quad tree terrain generation and rendering system, I started to clearly see those warning signs in my code.  But . . . I just wanted to get something working.  Like, now, darn it!  I was tired of waiting, and I resisted the obvious need to refactor the QuadNode class because it would take more time than I was willing to spend.  Rather than stopping to figure out how many responsibilities I had running around in QuadNode and then figuring out how to tease them apart into separate classes, I simply stopped writing tests for that class.  Once I did that then it was easy to not build the Perlin noise generation system test-first either.

Stampede!

In my non-technical life I’m into long-distance backpacking and in that world we have a term for when you’re about half a day out from town after four or five days and 100+ miles on the trail, and someone says the magic word.  “Pizza.”  Or maybe “hamburgers”.  The technical term for what happens then is “stampede”.  All common sense and self-preservation go right out the window and everyone hurtles down the trail at breakneck speed in an effort to get to town.  Sometimes people punish their bodies in ways they end up regretting later.

We stampede in software development, too.  We spend a lot of time being careful, doing thing right, making steady progress, but at some point close to the end we sometimes say, “Ah, screw it, let’s just hack it together and make it work!”  The result is usually something we call technical debt.  You build up a pile of messy stuff that you’ve got to go back and fix later.

I guess that’s not always a bad thing.  If you’re trying to hit an aggressive deadline and you need to just throw yourself forward right at the end, building up some technical debt is a valid way to do that.  Or if, like me, you just want to see something working and you’re not willing to wait, you can hack something together to scratch that itch.

The really evil thing about technical debt is not the short-term impact of creating it.  You can sometimes derive a lot of benefit from technical debt in the short term.  No, the evil thing about technical debt is when you don’t immediately go back and clean it up once your short-term goal is realized.

Anatomy of an SRP violation

Right now the QuadNode class is 472 text lines long.  Visual Studio code analysis reports that it has one of the worst maintainability indexes of any class in the project.  It has at least three big responsibilities jammed into it right now:

  1. As the name implies, it acts as a node in the quad tree.
  2. It also owns the job of generating heightfield data, vertex buffers, and index buffers.  This clearly has nothing to do with #1.
  3. It also has to decide when it’s appropriate to split itself into four node children or to merge and delete its children.  I first thought that was a trivial aspect of #1 but it turns out to be a huge deal in its own right.

Here’s one of the QuadNode spec contexts.  When a node is updated, it may decide to do certain things based on the state of the world.  In this case, when a non-leaf node (that is, a node that has children) is far enough away from the camera, the children nodes should be removed and disposed because we don’t need that level of detail any more.

[Subject(typeof(QuadNode))]

public class when_a_nonleaf_node_is_updated_and_the_camera_is_far : QuadNodeContext

{

    public static DoubleVector3 _nearCameraLocation;

    public static DoubleVector3 _farCameraLocation;

 

    Establish context = () =>

    {

        _nearCameraLocation = DoubleVector3.Up * 11;

        _farCameraLocation = DoubleVector3.Up * 15 * 10 * 2;

 

        _node.InitializeMesh(10, Vector3.Up, Vector3.Backward, Vector3.Right, _extents, 0);

        _node.Update(new TimeSpan(), _nearCameraLocation, DoubleVector3.Zero, _clippingPlanes);

    };

 

    Because of = () =>

        _node.Update(new TimeSpan(), _farCameraLocation, DoubleVector3.Zero, _clippingPlanes);

 

    It should_remove_subnodes = () =>

        _node.Subnodes.Count.ShouldEqual(0);

 

    It should_dispose_subnodes = () =>

    {

        foreach (var subnode in _node.Subnodes)

        {

            ((IDisposable)subnode).AssertWasCalled(x => x.Dispose());

        }

    };

}

This is not a horrible test set.  Believe me, I’ve seen (and written!) worse.  But let’s look at a couple of things that it’s trying to tell me:

  • The process of setting up a non-leaf node in the context is built on indirect side-effects.  Instead of just telling my class under test, “Hey, assume you’re a non-leaf node”, I have to initialize the node’s mesh, then call .Update() with camera coordinates that are near enough to cause the node to split itself and generate children, then call .Update() again with different camera coordinates that are far enough to cause the node to merge its children.  The spec isn’t able to say what it means explicitly; it’s very roundabout.  Someone unfamiliar with the code base would probably have to put in significant effort to understand how the spec works.
  • There’s no way to determine whether the QuadNode we’re testing decided to merge its children except by inspecting its children.  Again, this is relying on indirect side-effects.  There’s no way to get a clear statement from the class that says, “Yes, I’ve decided to merge!”, which is really what I’m interested in testing here.
  • This spec context is one of four that test a combinatorial set of conditions:
    • When a non-leaf node is far away from the camera
    • When a non-leaf node is close to the camera
    • When a leaf node is far away from the camera
    • when a leaf node is close to the camera
    • when a leaf node is at the maximum allowable tree depth and is close to the camera
  • There is another factor that isn’t even mentioned in these specs because I didn’t want to deal with a doubling of the condition set.  A node should only be split if it’s not over the horizon and out of sight, and it should be merged if it does get too far over the horizon even if the camera isn’t far enough to cause a merge on its own.  That would turn my five contexts into nine.  Yuck.

The implementation of .Update() in QuadNode is about as circuitous as these specs would lead you to believe.  There’s a lot of stuff going on in Update but it’s not clearly explained.  There are quite a few tests and branches and it’s not very maintainable.

So what’s the root problem here?  The root problem is that I violated the Single Responsibility Principle.  The decision of whether to split or merge a quad node is a good-sized responsibility all on its own.  There are different ways to make that decision and it’s probably something I’ll want to fiddle with a lot over time since it heavily impacts performance and memory footprint.  I probably need a SplitMergeStrategy class for the QuadNode to depend on, or maybe even separate SplitStrategy and MergeStrategy classes.

What would that buy me?  First, it would help break apart the combinatorial set.  The QuadNode wouldn’t have to care anything about the position of the camera or whether it’s below the horizon.  All it would have to know is that if it’s a leaf node, make a call to SplitStrategy, otherwise make a call to MergeStrategy.  If the return value is true, do the appropriate thing.

SplitStrategy and MergeStrategy, for their part, wouldn’t have to know whether they’re being called by a leaf or non-leaf node.  They trust the QuadNode to take care of that question.  They just need to think about the camera distance and the horizon and respond with yes or no.  Not only does that reduce the combinatorial set but it also makes the inputs and outputs very explicit.  Inputs are numbers, output is a boolean.  No mysterious multiple calls to QuadNode.Update to set up the context and no mysterious poking at child nodes to determine the results.

Cleaning up my mess

The technical debt I incurred certainly accomplished my short-term goal.  I’ve got a working proof of concept of a planetary terrain engine and I feel satisfied at reaching that milestone.  However, now I have a problem.  The implementation of my terrain generation is very naive and does all of its work on the main thread.  At low altitudes this causes so much stuttering as to render the program virtually unusable unless you first turn off updates, move somewhere, then turn updates back on and wait awhile.  The fix for that is obviously to a) enlist my other cores for terrain generation and b) do the generation asynchronously so that camera movement and frame rate aren’t impacted, even if I have to wait a bit for higher levels of detail to show up.

Well, yes, that’s a great plan except that my QuadNode class is a mess.  The code that I need to make more complex with threading and async logic is exactly the code that’s already overly-complex and obtuse and isn’t fully covered by tests.  Ah, ok, now we see the downside of technical debt.  You get a quick spike of progress and then a long, slow, painful slide into hell.

I’ve promised myself that before I do any more significant work on this project, I’m going to clean up my mess and break QuadNode into multiple classes with single responsibilities.  I’m curious to see how it turns out.  If you want to take a closer look at the code as it is at the time of this writing, the permalink to the current tree is here.