Ow! My Foot!

Numbers are bad.

That might sound like a strange statement coming from a professional software developer with a degree in mathematics. Let me try to justify that.

Numbers are bad the way that power tools are bad. I mean, I like power tools. A lot. But you give me a power tool and pretty soon I start making up excuses to use it, whether it makes any sense to use it or not. And do I take the time to use it carefully, and read the manual first, and put on safety glasses? Usually. But maybe not always -- I haven't nailed my foot to the floor yet, but it's probably only a matter of time. When used judiciously by well-trained people, power tools can do great things, but there are a lot of trips to the emergency room for the rest of us dilettantes.

I see this all the time at work, on the news, everywhere: people lured in by the power of numbers, making up new, but not necessarily sensible, uses for the numeric power tools at their disposal.

Let me give you an example of a time when I almost misused numbers recently.

Once a year in the summer at Microsoft we have a formal review of what we've accomplished over the last twelve months and what we want to accomplish in the next. The review process is kind of complicated, and I don't want to go into the boring details of how it all works. Suffice it to say that by the time we're done we have a long document describing accomplishments and goals, where the last thing on the review before the signatures is one of three categories. We could call the categories "I Met My Goals", "I Exceeded My Goals" and "I Totally Rocked!" (In actuality there are more categories, but the vast majority of employees fall into one of these three buckets, so we might as well consider just these three.) The review system here is not perfect, but it works pretty well and I have no complaints about it.

I said that we could call the categories "I Rocked!" and so on, but in fact we call them "3.0", "3.5" and "4.0". By naming the categories after numbers, it becomes very tempting to use power tools on them. Last night I got to thinking "I wonder what my career average for yearly reviews has been?" Fortunately, I stopped myself because I realized that I was about to nail my foot to the floor.

When you make something into a number you make it amenable to all the power tools that three thousand years of mathematicians have invented, whether or not those power tools make the slightest bit of sense. For example, you can't take an average of some Golds, some Silvers and some Bronzes. "Average" simply doesn't apply to those things! But (let's make up some fictional numbers for a fictional employee, Alice) you can take an average of three 4.0's, two 3.5's and a 3.0! What could it possibly mean, that average? What can we deduce from it? Nothing, because we do not know whether the weighting is actually sensible.

After six years, Alice has a career average of 3.67 -- so what? Compare Alice against Bob, who has a career of six 3.0's. How do we compare two numbers? How about percentages? Yeah, that's the ticket! Clearly Alice rocks 22% more than Bob. Great! But to what do we apply that 22%? Salary? Bonus? Vacation time? What does it mean?

The 22% has no meaning because I'm not taking a percentage of anything meaningful! I have no evidence that the system was deliberately designed so that 3.0, 3.5 and 4.0 have meaningful mathematical properties when averaged. What if the numbers were 1.0, 10.0 and 100.0? Then Alice would have an average of 53.5, Bob would have an average of 1, and Alice would be over 5000% "better"! What if the numbers were 0.0, 1.0 and 2.0? Then Alice would have an average of 1.33, Bob would have an average of 0.0, and percentage differences would suddenly cease to make any sense.
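
To make the arbitrariness concrete, here is a little sketch (Python, purely for illustration; the fictional numbers are the ones above, and the names of the three labelings are mine) that computes Alice's and Bob's averages under each of those equally arbitrary labelings:

    # Purely illustrative: Alice's and Bob's fictional reviews, averaged
    # under three equally arbitrary labelings of the same three categories.
    alice = ["rocked"] * 3 + ["exceeded"] * 2 + ["met"] * 1
    bob = ["met"] * 6

    labelings = {
        "official":   {"met": 3.0, "exceeded": 3.5, "rocked": 4.0},
        "lopsided":   {"met": 1.0, "exceeded": 10.0, "rocked": 100.0},
        "zero-based": {"met": 0.0, "exceeded": 1.0, "rocked": 2.0},
    }

    def average(reviews, labels):
        return sum(labels[r] for r in reviews) / len(reviews)

    for name, labels in labelings.items():
        a, b = average(alice, labels), average(bob, labels)
        comparison = f'Alice is {a / b - 1:.0%} "better"' if b else "the percentage is undefined"
        print(f"{name:>10}: Alice {a:.2f}, Bob {b:.2f} -> {comparison}")

The only thing that changes from run to run is the meaningless choice of labels, and Alice goes from 22% "better" to 5250% "better" to not comparable at all.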

By changing the weightings, the comparative benefit of being an overachiever changes, and even the set of mathematical tools you can use changes. If the weightings are arbitrary, an average is also arbitrary, so two averages cannot be sensibly compared. This is the old computer science maxim -- "garbage in, garbage out" -- in action, only this time it's "arbitrariness in, arbitrariness out". You can slice and dice numbers until the cows come home, but since the original data were completely arbitrary choices that do not reflect any measurable feature of reality, the results are going to be little more than curiosities.

ASIDE: I am reminded of a sign I saw in a pottery factory I once toured in England. It said that the 1100 degree Celsius kiln was "eleven times hotter than a kettle boiling at 100 degrees Celsius." I, being a smartass, asked the tour guide whether it was also "negative eleven hundred times hotter than an ice cube at -1 degrees Celsius." I got in reply a very strange look. Some things you just can't sensibly add, subtract, multiply and divide even though they are numbers. (Temperature ratios only make sense on an absolute scale such as the Kelvin scale: in kelvins the kiln, at about 1373 K, is really only about 3.7 times hotter than the kettle at about 373 K.)

It gets worse. In this particular case, we don't even need to consider the question of whether the weighting is sensible to determine that averages are meaningless. We can determine from first principles that an average like "3.67" is meaningless.

Consider a pile of two-by-fours, all of which are exactly 3.0, 3.5 or 4.0 feet long. You toss them into three piles based on their size, multiply the number in each pile by the length, add 'em up, divide through by the total number, and you obtain the exact average. Why you care what the average is, I don't know -- but at least you definitely have the exact average.

Now consider a pile of two-by-fours of completely random lengths, but all between 2.75 and 4.25 feet long. Divide them up into piles by rounding to the nearest of 3.0, 3.5 and 4.0, and again, multiply the number in each pile by 3.0, 3.5 or 4.0, add 'em up, divide through, and what have you got?

You've lost so much information by rounding off that you've got an "average" which is only likely to be close to the actual average when the number of two-by-fours is large. Furthermore, I said "random", but I didn't say what the distribution was. In a "normal" or "bell curve" distribution, a 2.76 is NOT necessarily just as likely as a 3.01, and you have to take that into account when determining what the likelihood of error in the average is.

When the total number of two-by-fours is tiny -- say, six -- and you're averaging three 4.0 +/- 0.25, two 3.5 +/- 0.25 and one 3.0 +/- 0.25, well, I'm not sure what you've got. You've got 3.67 +/- "something in some neighbourhood of 0.25, some of the time", where I could work out the "somethings" if I dug out my STATS 241 notes (or called up my statistician friend David, which would be faster).

My point is that because of rounding error, the so-called "average" is smeared over so large a range that it is largely useless.
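
If you would rather see that smearing happen than take my word for it, here is a rough little simulation sketch (Python again, purely illustrative; I have assumed a uniform distribution of lengths, which is precisely the kind of arbitrary assumption I just warned about):

    import random

    # Rough, illustrative simulation: draw six board lengths uniformly
    # between 2.75 and 4.25 feet (an assumed distribution), compute the
    # true average, then the "average" you get after rounding each board
    # to the nearest of 3.0, 3.5 or 4.0, and measure how far apart they are.
    random.seed(1)
    buckets = [3.0, 3.5, 4.0]
    trials = 10_000

    def nearest_bucket(length):
        return min(buckets, key=lambda b: abs(b - length))

    gaps = []
    for _ in range(trials):
        boards = [random.uniform(2.75, 4.25) for _ in range(6)]
        true_avg = sum(boards) / 6
        rounded_avg = sum(nearest_bucket(b) for b in boards) / 6
        gaps.append(abs(true_avg - rounded_avg))

    print(f"typical gap between true and rounded average: {sum(gaps) / trials:.3f} feet")
    print(f"largest gap seen in {trials} trials: {max(gaps):.3f} feet")
    print(f"fraction of trials off by more than 0.1 feet: {sum(g > 0.1 for g in gaps) / trials:.1%}")

With only six boards per trial, the rounded "average" lands a tenth of a foot or more away from the true one in a noticeable fraction of the trials, which is exactly why a figure like 3.67 claims more precision than the underlying data can support.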

Probably many of those 3.5's are "actually" 3.7's who didn't quite make it to the 4.0 bucket. But that information about the extra 0.2 is lost when it comes time to take an average. 3.67 is way, way too precise. All we know about Alice's average is that her actual average is somewhere between 3.0 and 4.0, probably closer to 4.0 -- which we knew already just from the range of the data!

And we're just on averages! We haven't even begun to consider the power-tool-mishap possibilities of, say, trend lines.

I'm glad I stopped myself. As a developer and a mathematician, I love both the practicality of numbers and the beauty of mathematics for its own sake. But just because something has a number attached to it doesn't mean that any of that mathematical machinery is the right tool for the job. Sometimes numbers are just convenient labels, not mathematical entities.


In other news, I was looking at the blog server statistics last night. Of the 207 Microsoft bloggers on this site, I'm ninth in terms of page hits from non-RSS-aggregator browsers on a per-week basis. Clearly, I rock. But what about you guys, my readers? If we take the number of comments and divide by the number of posts on a per-week basis, and then take a (geometric fit) trend line, I see that... OW! MY FOOT!

  • Excellent point, and I like what Damit brought up as well.

    I have been particularly annoyed at the current GPA system, which doesn't really say anything about what a student is good at.

    I had friends in high school with near-4.0 GPAs who were taking classes that an elementary school kid could probably get an A in. Yet I was getting a 3.03 taking numerous AP classes. When you looked at the numbers, it looked like I was the dumb one (maybe I was, since I should have just skipped the worthless AP classes and enjoyed my life a little).

    I often wondered if it wouldn't be better to break things down into simple skill sets with a rating of pass or fail. Either you know it or you don't.

    Then there wouldn't be those people who, with a 3.0 in algebra, get into trig but really shouldn't have gone on, since the things they missed may have been really key concepts.

    Heck, if I were an employer, I would much rather see whether a person's skill set matched my job requirements than whether the person got a 4.0 in a related major. A 4.0 doesn't tell me how good the person is, or what they are particularly good at.

    Anyway, sorry for going on and on; this one struck a nerve (who am I kidding, I always go on and on).

    Later

  • Re: "I'm ninth in terms of page hits "

    Dude..
    A Google search for VBScript+Pause+reference landed me on your blog and I HAVE to tell you...

    YOU DO ROCK ! ! !
    Fantastically entertaining yet still very educational blog. Thank you very much for sharing. Bravo! Sincerely, SloLearner



  • Hey, thank you for writing a great article about a topic that so many people seem to get wrong. A professor at the university I study at, just back from a long hospital stay, was rambling about the general lack of understanding of exactly this topic. He said that the whole medical system could be much more efficient if all the fancy values that can be measured were really understood and interpreted correctly by doctors. Sadly, he has since died of a heart attack. Maybe, if he was right, he would still be alive; maybe not. But I strongly believe he had a point.
  • I really find this article interesting as well as helpful for preparing my statistics presentation.
