On configurable code


I have a confession to make, but before I make it I want to clarify where I come from. I've been in software for many years now, and over those years I've configured many systems, been exposed to many ways software can be configured, and even written a few configuration systems myself. I've come across systems ranging from the bad (like .ini files) to the great (like Hadoop's hierarchical configurations). And I've been in situations where the ability to change configuration saved the day. So, with that clarified, here is my confession:

I hate configurable code.

I phrased that sentence quite deliberately. I didn't say configurable code is bad; I said I hate it, as in a gut-feel reaction when I see such code or get involved in discussions about it. There are of course rational reasons for this, which I'll go into, but yes, this is subjective. One reason I'm writing this post, though, is that I've often seen developers beaming with pride at how configurable their code is, and I wanted to add my voice to the dissenting side, if only to give such people pause: maybe having their code be that configurable is not as awesome as they think.

Since this is a subjective opinion post, I won't structure it as a balanced pros-and-cons article. Instead, I'll start with my personal reasons for hating configurable code, then give a rundown of situations that people feel justify configurability, along with my opinion of each.

Reasons for my hate

Reason 1: expanded test matrix

As I'm writing this post, the tech world is abuzz with news and analyses of the Heartbleed bug in OpenSSL. One relevant thing that struck me about it is that, in theory, the code was configurable with an #ifdef knob (one of my least favorite configuration mechanisms, by the way) that, by bypassing OpenSSL's custom memory allocator, should have significantly mitigated the risk of this bug. In practice people couldn't use it, because nobody had ever tested the code with the knob in that setting. This reinforced my internal bias: if you haven't tested your code with a given value of a configuration knob, you might as well assume it doesn't work with that value. People generally cite combinatorial explosion as the blocker for testing all combinations of configuration knobs, but in practice, at least in my experience, it's not even that. People already struggle to find the time and diligence to test the normal code paths, and they rarely go out of their way to also test the code paths under non-standard knob settings.
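To put a number on the test matrix: with n independent on/off knobs there are 2^n configurations. Here's a hypothetical Java sketch (the names `KnobMatrix` and `echo` are made up for illustration and have nothing to do with OpenSSL's actual code) that enumerates every combination and runs the same check under each. For a handful of knobs this loop is cheap to write, which supports the point that skipping it is usually about diligence; it's at dozens of knobs that the 2^n growth becomes the real blocker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: run the same check under every combination of n boolean knobs.
public class KnobMatrix {

    // Enumerate all 2^n on/off combinations; bit i of mask decides knob i.
    static List<boolean[]> allCombinations(int n) {
        List<boolean[]> combos = new ArrayList<>();
        for (int mask = 0; mask < (1 << n); mask++) {
            boolean[] knobs = new boolean[n];
            for (int i = 0; i < n; i++) {
                knobs[i] = (mask & (1 << i)) != 0;
            }
            combos.add(knobs);
        }
        return combos;
    }

    // Hypothetical component under test: knob 0 selects an alternate code path
    // (think "custom allocator on/off"), but the result must not change.
    static String echo(String payload, boolean[] knobs) {
        StringBuilder out = knobs.length > 0 && knobs[0]
                ? new StringBuilder(payload.length()) // pre-sized path
                : new StringBuilder();                // default path
        return out.append(payload).toString();
    }

    // Count configurations whose output differs from the expected payload.
    static int countFailures(int n, String payload) {
        int failures = 0;
        for (boolean[] knobs : allCombinations(n)) {
            if (!echo(payload, knobs).equals(payload)) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 5, 10}) {
            System.out.println(n + " knobs: " + allCombinations(n).size()
                    + " configurations, " + countFailures(n, "heartbeat") + " failures");
        }
    }
}
```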

Reason 2: hard-to-read and follow code

There are many reasons why you would want to read code to understand your system. Countless smart people have emphasized the point that code is read many more times than it is written, so it should be optimized for that. So when I'm trying to understand system behavior and I reach a point in the code that says:

int numThreads = Configuration.GetNumThreadsForFabricator();

I'm now going to be sent on a lovely discovery journey to find out where the hell that value comes from. Perhaps it's in some file that's checked in with the code. But perhaps that file gets overridden by another file that's bundled with the deployment code. Or perhaps our ops team changed that value back in June, when we had that crazy problem with the fabricator going out of control, and never checked in that change anywhere. Or it could be something we let individual customers change, so there's no one true answer, but in practice everyone either uses the default or sets it to 10, since that's what awesome consultant X recommended for everyone. In any case, it's going to be a long and error-prone journey, especially when contrasted with something like:

// We experimented and 15 seems like a good number. See bug 555 for investigation results.
int numThreads = 15;
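The configurable version's discovery journey is, in effect, a walk through layered configuration sources where the last layer to set a key wins. Here's a minimal Java sketch, assuming hypothetical layer names and a hypothetical key `fabricator.numThreads`. Note that the code reading the value can't tell you the answer by itself; only walking every layer as it exists at runtime can.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: one key resolved through several configuration layers.
public class LayeredConfig {
    // Layers in insertion order = increasing priority; later layers win.
    private final LinkedHashMap<String, Map<String, Integer>> layers = new LinkedHashMap<>();

    void addLayer(String name, Map<String, Integer> values) {
        layers.put(name, values);
    }

    // Name of the highest-priority layer that sets the key, or null if none does.
    String sourceOf(String key) {
        String source = null;
        for (Map.Entry<String, Map<String, Integer>> layer : layers.entrySet()) {
            if (layer.getValue().containsKey(key)) {
                source = layer.getKey();
            }
        }
        return source;
    }

    // Effective value of the key, or null if no layer sets it.
    Integer get(String key) {
        String source = sourceOf(key);
        return source == null ? null : layers.get(source).get(key);
    }

    public static void main(String[] args) {
        LayeredConfig config = new LayeredConfig();
        config.addLayer("checked-in file", Map.of("fabricator.numThreads", 15));
        config.addLayer("deployment bundle", Map.of());
        config.addLayer("ops change from June", Map.of("fabricator.numThreads", 50));
        // Prints the effective value and which layer it came from.
        System.out.println(config.get("fabricator.numThreads")
                + " (from " + config.sourceOf("fabricator.numThreads") + ")");
    }
}
```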

Reason 3: you (frequently) lose the goodness of source control

Source control is awesome. If you don't have source control for your code (no matter how big or small), then stop reading this post, go read up on why you're a low-life criminal who barely deserves to live, then repent and go fix your ways. The problem here is that I've worked with decent software developers (including myself) who are all bought into the goodness of source control for code, but whose configuration is stored in a database, on a file share, or in the Cloud somewhere. Then a bug happens, we see that it's probably because there are 50 threads per fabricator in production, and we bump into the eternal questions that everyone without source control ends up asking: Has it always been this way? If not, who changed it? When? Why? What was the value before the change? Did anyone review this change?

Good teams usually solve this by tracking their configuration in source control; if you do that, thanks, and you can in good conscience consider this point moot for your situation. I'm including it here because, in my experience (again, subjective post here), more often than not this is a real problem.

Reasons I've encountered for configurability

Configurability as delaying the decision

Situation: I'm writing code for a buffering component, and I've decided the best design is to keep a fixed number of entries in the buffer at any time, except I have no idea what that number should be. This is for the production website we run, so my team needs to make that choice, but right now I'm writing complex code and need to get the logic right, so I can't be bothered with that detail. Solution: make it a configurable knob! I'll heroically finish the code, getting all the pesky logical details right, pass code reviews without getting bogged down in discussions about why this particular number, and then surely in integration testing we'll figure out the magic number to use. Or we'll just tune it in production - isn't configurable code great?

I think this type of situation is the primary reason for my gut-hatred of configurable code. I think of this type of situation as a milder version of this: I'm tasked with writing the method to compute the ideal number of cats in a given house, and I write the following code and call it done:

int computeIdealNumberOfCats(House house) {
    CatCalculator concreteCatCalculator = getCatCalculatorFromConfiguration();
    return concreteCatCalculator.compute(house);
}

It's very flexible! We'll just stick the ideal code for calculating this in the configuration once we test it out...

I hope you can see why I think this extreme example is repugnant (hint: see all my reasons for hating configurable code above). I put "put the number of buffer entries in the configuration" in the same category because the number of buffer entries is code: it's an implementation detail. I fail to see the fundamental difference between that and putting snippets of code directly in configuration.

Personally, I think the responsible way of resolving this situation is by putting something like this in the code instead of a configuration knob:

// This is an arbitrary number for now. I've created a follow-up task, ID: 1234, to test and tune this.
private const int NumberOfEntriesInBuffer = 10;

Then, once testing reveals a good value to hold, follow up with check-ins that carry good comments. And if testing reveals that a fixed value is not the right strategy and that the value should instead be calculated based on total machine memory, then we change the code to do that. See: it's all code refactoring, whether it's changing the value, changing how we calculate it, or changing how the whole component works so that this value doesn't even mean anything (for example, we decide to buffer everything and not cap it at all).
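If testing says the buffer size should follow memory, the change is still just code. A minimal Java sketch, assuming an illustrative entry size, heap fraction and floor (none of these numbers come from any real testing), and using the JVM's heap limit from `Runtime.getRuntime().maxMemory()` as a stand-in for total machine memory:

```java
// Hypothetical sketch: the buffer size stays code, but is derived from memory instead of fixed.
public class BufferSizing {
    // Illustrative numbers only; real values would come out of the follow-up testing.
    static final long BYTES_PER_ENTRY = 64 * 1024;
    static final double FRACTION_OF_HEAP = 0.10;
    static final int MIN_ENTRIES = 10;

    // Entries that fit in the given fraction of memory, never fewer than MIN_ENTRIES.
    static int entriesFor(long memoryBytes) {
        long budget = (long) (memoryBytes * FRACTION_OF_HEAP);
        long entries = budget / BYTES_PER_ENTRY;
        return (int) Math.max(MIN_ENTRIES, Math.min(entries, Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        // Runtime.maxMemory() is the JVM's heap limit, used here as a stand-in for machine memory.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Buffer entries: " + entriesFor(maxHeap));
    }
}
```

The design choice being illustrated: the policy (how the number is computed) lives in one reviewed, tested place, and changing it is an ordinary check-in.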

Configurability as a short-cut for long check-in or deployment times

Situation: I work on a team that maintains the awesome cloud service Ancient Weather Guesser. We care about the quality of our guesses about weather in ancient times, so we have a rigorous test pipeline where any code change has to go through a series of functional, reliability, performance, and other tests that take about a week to run. The problem is that every now and then some blogger notices that we think Julius Caesar was killed in cloudy weather, when obviously he was killed in ironically sunny weather, and raises a stink about it, and the big boss wants this fixed right now. So instead of fighting the process, we make the code wonderfully configurable, so we can change our guess for any day in history with a simple configuration change that doesn't have to go through our barrage of tests and can go in right away. Problem solved, day saved, where's my bonus?

My position on this one is mostly loathing, with an occasional reluctant acceptance. I think treating configuration changes as simple changes that don't need to go through a proper test pass is irresponsible and dangerous office politics: in a lot of cases configuration changes are code changes (see above), and they most certainly need to go through a proper test pass. I've personally witnessed this attitude lead to bizarre contradictions: a verifiably harmless check-in, e.g. into a little-used test tool, triggers a test pass (because it's a code check-in), while a change to a configuration knob that controls a fundamental aspect of the service doesn't.

My occasional reluctant acceptance is for knobs that are explicitly put in as last-resort mitigations and are responsibly tested. A good example is a knob that turns a whole new feature of the service off: we're enhancing the service by expanding to guesses about ancient Australia, but we're adding this enhancement with a kill switch, so that if our code for whatever reason ends up bogging down the entire service, we can quickly kill the enhancement; the site then runs as well as before, and the only people pissed off are in Australia. My acceptance even of this (by now fairly standard) practice is reluctant because a) in a lot of cases it's not responsibly tested, and b) in some cases it's not responsibly used: e.g. we end up with a mix of deployments with different sets of features on and off, and people lose track of what's going on, leading to bizarre situations.
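A kill switch only earns that reluctant acceptance if both of its positions are exercised. Here's a hypothetical Java sketch: the class, the guessing logic and the "region not supported" response are all placeholders, and the switch is a `BooleanSupplier` (a standard `java.util.function` interface) so it can be flipped without rebuilding. The `main` method runs the service with the switch on and off, rather than only in the default position.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a kill switch around a new feature (guesses for ancient Australia).
public class AncientWeatherGuesser {
    // The kill switch; a supplier so it can be flipped without restarting.
    private final BooleanSupplier australiaEnabled;

    AncientWeatherGuesser(BooleanSupplier australiaEnabled) {
        this.australiaEnabled = australiaEnabled;
    }

    String guess(String region) {
        if (region.equals("Australia")) {
            // Switch off: behave exactly as the service did before the enhancement.
            return australiaEnabled.getAsBoolean() ? guessAustralia() : "region not supported";
        }
        return guessClassic(region);
    }

    // Placeholder logic for the new code path.
    private String guessAustralia() {
        return "sunny";
    }

    // Placeholder logic for the existing code path.
    private String guessClassic(String region) {
        return region.equals("Rome") ? "sunny" : "cloudy";
    }

    public static void main(String[] args) {
        // Exercise both switch positions, not just the default.
        for (boolean on : new boolean[] {true, false}) {
            AncientWeatherGuesser guesser = new AncientWeatherGuesser(() -> on);
            System.out.println("switch " + (on ? "on" : "off")
                    + ": Australia=" + guesser.guess("Australia")
                    + ", Rome=" + guesser.guess("Rome"));
        }
    }
}
```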

Configurability as giving control and flexibility to customers

Situation: while testing out my future-telling app I found that some of my customers want precise predictions and are willing to forgive my mistakes there as obvious relativistic effects in the star positions, but some want vague general predictions that can't really go wrong. Rather than lose out on either group of people that want to give me money, I decide to put in a configurable precision knob that each customer can set to their desire.

This is the classic use of configurability. A hallmark of well-designed products (software and otherwise) is that they strike a good balance between flexibility and complexity in this use case. My highly subjective opinion is that bad software developers tend to err on the side of complexity (giving customers too many knobs); OK ones err on the side of rigidity and just make best-judgment choices ("I've talked to people and most customers want precise predictions, let's go with that"); and great ones go the extra mile, doing sound statistical analysis of real data to make great choices ("people say they want precise predictions, but look at the correlation of app rating a month after we give them a vague vs. precise prediction; clearly they prefer a correct-sounding vague prediction") and/or making the extra effort to present choices to customers as high-level concepts instead of raw configuration knobs ("let's have several virtual future tellers with names, faces and personalities, each with a different precision level, and let customers choose among them").
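The last option, presenting high-level concepts rather than raw knobs, can be sketched as a small closed set of named personas that each bundle a hidden precision level. The persona names, precision values and prediction strings below are all hypothetical; the point is that customers pick from a short list and the raw number never leaves the code.

```java
// Hypothetical sketch: expose a small set of named personas instead of a raw precision knob.
public enum FutureTeller {
    // Persona names and precision values are illustrative assumptions.
    THE_ASTRONOMER(0.9),
    THE_MYSTIC(0.2);

    private final double precision; // internal detail; customers never see this number

    FutureTeller(double precision) {
        this.precision = precision;
    }

    double precision() {
        return precision;
    }

    // Higher precision yields a more specific (and riskier) prediction.
    String predict(String topic) {
        return precision >= 0.5
                ? topic + ": a specific change on the 14th of next month"
                : topic + ": a period of change lies ahead";
    }

    public static void main(String[] args) {
        for (FutureTeller teller : values()) {
            System.out.println(teller + " -> " + teller.predict("career"));
        }
    }
}
```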

The common ground

This is obviously a complex subject with no one-size-fits-all answer. On the customer-facing side, some products have died (or at least should have died) because they were too configurable and thus perceived as too complex, but some have also died because they were not configurable enough and were thus perceived as too rigid. The same is true on the internal-workings side: some products are so configurable that they are impossible to understand and can't be changed without breaking at least some of the configuration settings (and who knows if those are the important ones), while some are so hard-wired that adapting any component to different circumstances always requires major refactorings. As explained, I personally prefer to err on the side of rigidity and hard-wiring, but I can definitely appreciate the nuances here.

So I wanted to close on some common ground between my camp and the pro-configuration camp: having a single point of change. I may prefer to write my code like this:

private const int NumberOfDogs = 10;

While the pro-config camp prefers to write this:

private int NumberOfDogs = config.get("Number Of Dogs");

But I think we agree that both ways are better than this:

Dog[] dogs = new Dog[10];
for (int i = 0; i < 10; i++)
    dogs[i] = fetch();

In the last snippet, if I wanted to change the number of dogs I'd have to hunt through every place in the code where that number is specified, whereas in the pro-config way I'd change the configuration, and in my way I'd change a constant in the code. So we can at least all agree on this: write your code so that you specify each parameter in only one place (obey the great DRY principle). After that, we can hash out over beers whether to leave that place as a constant in the code or hoist it into config files.
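Here's a hypothetical Java version of the dogs snippet with a single point of change; `Dog` and `fetch()` are stand-ins, since the original doesn't define them. The constant is the only place the number appears, and the loop bound comes from the array itself rather than a second literal `10`.

```java
// Sketch of the single-point-of-change dogs snippet; Dog and fetch() are hypothetical stand-ins.
public class Kennel {
    // The only place the number of dogs is specified (DRY).
    static final int NUMBER_OF_DOGS = 10;

    static class Dog {}

    static Dog fetch() {
        return new Dog();
    }

    // Size the array from the constant and loop to dogs.length, so no second literal 10 exists.
    static Dog[] fetchDogs() {
        Dog[] dogs = new Dog[NUMBER_OF_DOGS];
        for (int i = 0; i < dogs.length; i++) {
            dogs[i] = fetch();
        }
        return dogs;
    }

    public static void main(String[] args) {
        System.out.println("Fetched " + fetchDogs().length + " dogs");
    }
}
```

The pro-config camp could keep everything else identical and only swap the constant's initializer for a lookup, for example the JDK's `Integer.getInteger("kennel.numberOfDogs", 10)`, which reads a system property with a default (the property name here is made up). Either way, there's still exactly one place to change.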

Comments
  • Very well written! And I totally agree with you, I also hate configurable code. I liked the CatCalculator example, have seen a lot of code like that in the wild.

  • Your entire premise here seems to be based on the idea that if a hard-coded value needs to be tweaked, you can simply change it, check in the change, and "redeploy", whatever that process entails.

    While some of us may live and work in environments conducive to this type of highly iterative process, that is not always the case. Sometimes the end user or the administrator, not the developer, needs to be able to tweak these values without incurring the people cost of a new release / deployment, which is what the idea of configurable code is all about.

  • Hi!

    I think configurability has its place, and in many cases shouldn't really impact the "test matrix". E.g.: I'm a web developer, so the code I develop has to work on different platforms: multiple development machines, plus staging and live environments.

    So it is useful to be able to configure certain paths and calls to utility software on the server, because every environment might be different.

    The underlying code shouldn't be bothered whether to read from directory a or b or c, it just needs to be able to get the configured directory path, and then see if it can do what it needs to do there.

    Unit tests should check the code in isolated units anyway, which means the code under test would expect the path to be handed to it in some way and be tested that way; it shouldn't have different paths of execution depending on the configuration.

    The same with calls to external APIs etc.

    If pieces of functionality can be enabled / disabled via configuration, the code which decides / differentiates inside the software can be tested to see if its dispatching logic works as expected; and the executed code for each feature should be unit tested in itself and should work no matter how it's called, right?

    ===

    I think there are some simple distinctions to be made:

    * Some things don't belong in version control and therefore need to be configurable somehow: passwords, AWS keys, API keys and other "private" information, because once they're in it you most likely won't get them out again. Not good.

    * Some things need to be configurable: Namely things which differ in different environments - 3rd Party URLs, Database Connection Parameters, Paths, ...

    * Also candidates for configuration:

       * Things which might be changed in the future and shouldn't require a developer to interrupt his workflow

       * If you're doing a module which might be used in different Projects / Contexts, it might also be apt to make it work in different environments depending on the user (but that's actually a similar point to the one above, see "Things which differ in different environments" )

    ===

    Also, in the case of magic numbers like "NumberOfDogs" and "numThreads", I think it should be really easy for the developer to drop that value, or a default value, into a configuration file somewhere and get at it as easily as possible. Then why not use that and reap the benefits of having a list of configurable parameters and their defaults in one easy-to-find location?

    Maybe there just isn't a good and simple enough configuration facility in some of your projects?

    Thanks & Best wishes

    Daniel

  • @Johan: Thanks!

    @Chris: I tried to address almost exactly this point in the "Configurability as a short-cut for long check-in or deployment times", though of course I oversimplified in there as one must when generalizing a complex concept. Please see my thoughts there, and of course take it with a grain of salt since real life situations are messy. Thanks for your thoughts!

    @Daniel: You make very good points. While writing this post I thought of making this distinction between "inputs" and "knobs" for the cases you mention (but ended up taking it out for the sake of brevity). Let's take your example of a DB password as an input: our DB component obviously needs to know the password to connect to the production database, and as you correctly point out, this is one thing we don't want to store in version control. Traditionally people solve this by putting it in a config file (and hopefully following proper security practices around where that config file is placed). That's OK for passwords, but to my mind it has the same problems as global variables do in general, and I'd much rather have such values be explicit inputs that are passed around to precisely the locations that need them, in an obvious and less error-prone fashion. It's a complex topic that I don't think I can address in a comment, and honestly, practically speaking, if I started a website today I'd probably put passwords in a config file, because that's what all the tools want and I pick my fights; but I'm hoping we continue to be aware of the downsides and keep looking for better options. Anyway, thanks for thoughtfully weighing in!

  • Thank you for the excellent post.

    You provide well articulated points supported by examples of patterns we have all seen.

    Your points about source control and external configuration sources particularly hit home with me.

    I have worked on software designed to be so configurable that its behavior depends entirely on environmental state, e.g. the current schema and contents of a database used for many other purposes, to the point that it becomes next to impossible to predict if and how it will behave when deployed.

    Your post makes me recall a quote attributed to James Gosling:

    "Every configuration file eventually becomes its own programming language"
