Fabulous Adventures In Coding
Eric Lippert is a principal developer on the C# compiler team. Learn more about Eric.
No computers today, but some interesting - and important - math. (And, happy Canada Day, Canadians!)
"Car Talk" is a popular weekly phone-in program that has been on National Public Radio for several decades now, in which Bostonian brothers Tom and Ray crack wise and diagnose car (and relationship) problems. On many programs they feature a "puzzler". Here's the puzzler from a couple of weeks ago, which I reprint in full:
Tim and Jethro were happy to have their jobs at the new self-serve gas station in town. And, since the Farmer's Almanac had predicted this to be the coldest winter since the last ice age, they were happy to be working indoors, while the customers pumped their own gas.
This station was so modern that it had a video camera for each of the pumps, and a TV monitor that would show the rear of everyone's vehicles as soon as they pulled up to the pumps.
When the boredom of their jobs finally set in, Tim and Jethro began playing a little game. The game involved trying to figure out which customers had pulled up to a pump with the fuel door on the wrong side-- that is, facing away from the pump.
Now, they couldn't see the cars pull in to the gas station. The video cameras were only aimed at the back of the vehicles. So, there was no time during which they could see the side of a vehicle where the fuel door was located. They could only see the vehicle after it was in position to refuel.
They had to make their bets before the driver shut off the key and exited the vehicle-- before he dope slapped himself for pulling in on the wrong side.
Jethro was correct 99 percent of the time. Tim was correct about 50 percent of the time, because he was just guessing.
What did Jethro know that enabled him to tell when a driver had pulled up to the pump with the fuel door facing the wrong way?
When I heard that puzzler, the answer was immediately obvious to me. Imagine my surprise when the answer that was immediately obvious to me was not the answer Ray gave this past Saturday! The answer given by Ray was that in almost all cars, the tailpipe is on the opposite side of the car from the fuel door. Jethro knows that if a car pulls up with their tailpipe on the same side as the pump then they're on the wrong side. Jethro is 99% accurate because almost all cars you actually see on the road follow this pattern; very few cars have a tailpipe in the middle, have two tailpipes, have the fuel door in the middle, have the tailpipe on the same side as the door, or other anomalies.
Now, I'm not willing to go as far as Tommy sometimes does and call BOOOOOOOOOOOGUS! on this one; I believe them that this is a 99% reliable heuristic for deducing the side the fuel door is on, and that if Jethro knew that, then Jethro could consistently defeat Tim. (I note that we are not given the parameters of how Tim is guessing, but we have reason to assume that Tim is guessing by some process akin to flipping a fair coin.) Clearly, when Jethro and Tim play this game, about half the games are going to be ties (because Tim is guessing right at random), but of the non-ties, Jethro is going to win most of the time. I'm not disputing that.
However, there is an extremely important aspect of this analysis which has been ignored. The percentage of drivers who drive up to the wrong side we know from experience is low. The only times I have driven up to the wrong side of the pump and had to back out is when in a borrowed or rented car (and many cars now have an arrow on the fuel gauge telling you where the fuel door is.) The vast majority of customers will pull up to the correct side. This additional fact, not given in the statement of the puzzler but reasonably assumed, changes the analysis.
Let's call a car which pulls up to the wrong side a "positive" (for reasons which will become apparent later) and a car which pulls up to the correct side a "negative". Jethro and Tim each make a prediction of whether a given car is going to be a positive or a negative. Let's call a car whose fuel door side can be correctly predicted by Jethro to be a "normal" car and one which cannot, an "unusual" car.
Let's assume that the percentage of "positive" drivers is 1%. Notice that this is the same percentage of cars that Jethro's heuristic predicts incorrectly. I've made the assumption that these percentages are about the same deliberately but really all that matters is that they are both small. We'll do a bit of fiddling with the numbers later to see what happens when we change those around. But for now, just assume that it is a coincidence that the rate of boneheaded drivers is roughly the same rate as the number of cars that Jethro cannot correctly predict the side of the fuel door.
Given that reasonable assumption, Jethro does not need to have his fancy heuristic in the first place! Suppose we replace Jethro with Bob, who always bets negative, regardless of the position of the tailpipe. This strategy is 99% accurate because 99% of the drivers are negatives!
That was the solution that was immediately obvious to me: deny the premise of the question. You can defeat Tim's coin-flipping random strategy by simply observing that negatives vastly outnumber positives; it is reasonable to always bet on negative.
Suppose Bob and Tim play this game a million times (it's a long winter in Boston) and Bob uses the "always negative" strategy, which, as we know, will be accurate 99% of the time solely on the basis of distribution of negatives vs positives. On average there will be 990000 negatives and 10000 positives. Bob will predict the negatives with 100% accuracy and Tim will predict them with 50% accuracy. So for those 990000 cases, Bob wins 495000 times, ties 495000 times, and loses never. Bob will predict the positives with 0% accuracy, Tim will predict them with 50% accuracy. So for those 10000 cases, Bob wins never, ties 5000 times, and loses 5000 times. Of the million games, Bob wins 495000 times, ties 500000 times, and loses 5000 times. As we'd naively expect, Bob wins 99% of the games which are not ties.
Now suppose Jethro and Tim play this game a million times and Jethro uses his fancy tailpipe strategy, which is 99% accurate on the basis not of distribution of negatives, but on ability to detect normals. The analysis appears to be just the same as before: There will be 990000 normals and 10000 unusuals. Jethro will predict the normals with 100% accuracy, the unusuals with 0% accuracy, blah blah blah, and will win 495000 times, tie 500000 times and lose 5000 times, same as Bob.
This demonstrates my point that Jethro doesn't need his fancy-pants strategy to beat Tim. Bob and Jethro both beat Tim exactly the same number of times, on average, with their respective strategies. But we can look deeper at Jethro's strategy:
Of those million trials, 99% of them will be normals. Of those normals, 99% of them will be negatives. Working out the percentages, on average we should see:
980100 negative normals - Jethro predicts negative, correctly9900 positive normals - Jethro predicts positive, correctly9900 negative unusuals - Jethro predicts positive, incorrectly100 positive unusuals - Jethro predicts negative, incorrectly.
Holy smackers! Jethro predicts positive 19800 times and is wrong 50% of the time, even with an overall 99% accurate heuristic! Considering only those cases where Jethro says positive, he is on average no more accurate than Tim, who is flipping a coin.
Now suppose instead of yokels trying to predict whether a driver is boneheaded, we have three doctors each trying to predict whether you have Tappet's Disease. Suppose further that 99% of the population is negative for Tappet's Disease: they do not have it. Dr. Jethro has a test for Tappet's Disease which is 99% accurate. (People for whom the test works are "normals", and 99% of people are normals.) Dr. Bob doesn't even bother to diagnose you, he just says "you're negative" every time. Dr. Tim flips a coin.
Suppose you go to all three doctors and they all say "you're negative". Remember, Dr. Bob and Dr. Jethro are both accurate 99% of the time, but Dr. Tim is only accurate 50% of the time. Clearly you have learned absolutely nothing from Dr. Bob, who didn't even look at you; the fact that he is 99% accurate tells you nothing about you. Clearly you have learned absolutely nothing from Dr. Tim; he flipped a coin right in front of you. But of every 980200 patients where Dr. Jethro says "negative", he is correct 980100 times and incorrect 100 times. Dr. Jethro's 99% accurate test is actually 99.99% accurate when he says negative.
Put another way: odds of Drs. Bob and Tim being wrong when they predict negative are both 1 in 100. Odds of Dr. Jethro being wrong if he predicts negative are 1 in 10000, a hundred times less likely. Dr. Jethro is taking advantage of the low incidence of positives and the accuracy of his test in a way that the other two are not. Dr. Jethro is way, way more reliable than either of the other two, provided that your result is negative.
But what if the result had been positive? (Obviously Dr. Bob would not say positive, so let's ignore him.) Of Dr. Jethro and Dr. Tim, which should you trust? If Dr. Tim says positive then you have a 99 in 100 chance that Dr. Tim is wrong. If Dr. Jethro says positive then you have a 50 in 100 chance that Dr. Jethro is wrong. Dr. Jethro is clearly still the winner here, but it is deeply counterintuitive to people that a test which is overall accurate 99% of the time can have a 50% false positive rate.
And of course the more rare the disease is, obviously the more likely it is that a negative result is correct; most people don't have the disease, so a negative result is likely. But the flip side is that the more rare a disease is, the more likely it is that a positive is a false positive: an artefact of flaws in the test.
Imagine if only one in ten thousand drivers pulled up to the wrong side. Jethro's 99% accurate heuristic would now be worse than Bob's "always guess negative" strategy because Jethro gets so many false positives.
This is a serious problem in medicine! The worst false outcome is a false negative - that is, the test says you do not have the condition when really you do. That's why Dr. Bob's strategy is completely unacceptable; all his false results are false negatives. But as we've seen, the mathematics of the situation means that given a Dr. Jethro with a reasonably accurate test for a condition with low incidence, false negatives are very rare.
But false positives cause unnecessary, potentially harmful or expensive treatment, not to mention unnecessary anxiety. The mathematics of the situation is that false positives are an extremely high percentage of positives when the inaccuracy of the test and the rarity of the disease are close to each other. Tests for rare conditions have to be incredibly accurate for the false positive rate to be low. The inaccuracy of the test has to be orders of magnitude less than the rarity of the condition.
This sort of probability analysis is based on Bayes' Theorem and it has many fascinating implications beyond just this quick sketch. It has implications in law, in spam detection, it comes up all over the place.
If Tom and Ray have any comments on this critique of their puzzler, I'd be happy to hear them.
They could easily tweak the puzzle to make sense with their reasoning if they rephrased it to limit the results to the set of those in which the driver drove up on the wrong side, i.e. "In all the situations in which a driver pulled up on the wrong side, Tim called it 50% of the time and Jethro called it 99% of the time." That should resolve the hangup.
If you like this sort of post, I recommend "The Drunkard's Walk" by Leonard Mlodinow.
It explains numerous statistical "gotcha's":
- Let's Make a Deal puzzle
- Lady with twins and one is X and Y
- The one from the post (I believe it's the prosecutor's fallacy)
- And the gambler's fallacy (it's gotta land on black, it's been red five times now... it's due!)
I am a practicing internist and I think you did a great job of explaining the implications of Bayes Theorem on the practice of medicine. I would like to share how this very logic effects my practice.
First of all the decision to test someone for a disease is obviously not as simple as someone coming up to me and asking for me to run a test. (although this does happen) Usually after taking a history and physical I have suspicion that someone might have a disease. Then I order a test that can confirm or refute my impression. So even if the disease in question is relatively rare in the general population, it is no longer unlikely in this particular patient. This of course depends on my ability to glean clues in my interview and exam that I feel increase the "pretest probability" of disease. If my impression of the pretest probability is sufficiently high, I will order the test and be relatively pleased with both the positive predictive value and the negative predictive value. Then I will make a firm diagnosis based on the result of the test.
Now for three different scenarios that really drive this home:
John is a 25 year old male with no risk factors for heart disease. He comes in complaining of sharp, burning chest pain after eating spicy mexican food. He is very concerned that he is having a heart attack. It is my opinion that his chest pain is very unlikely to be cardiac. I do not order further testing. (I might recommend Zantac.) I know that if I order a stress test and even if it is positive, it is still extremely unlikely that the patient has heart disease. (A nuclear stress test is considered to be about 90% sensitive and specific)
George is a 50 year old male whose dad died at age 70 with a heart attack. He smokes a pack per day, he is about 30 pounds overweight, and he doesn't like to take his blood pressure pills, He gets a funny feeling in his left chest when he carries his garbage to the curb. This is the most strenuous thing he does all week. Now my pretest probability is significantly higher. Maybe even 20%. I order a stress test. He passes or fails. Either way I'm pretty confident in the result. If it's positive I send him to the cardiologist. If it's negative, I reassure him and try to get him to work on his risk factors.
Joe is a 70 year old male with Diabetes, High blood pressure, hypercholesterolemia. He also smokes. Last year he lost his leg due to peripheral arterial disease. He comes in with crushing chest pain radiating to his jaw, shortness of breath, nausea, and heavy diaphoresis. I don't order a stress test because now my pretest probability is so high that I wouldn't be satisfied with a negative result. His chances of a false negative are probably higher than of a true negative. Instead I would send him straight to the cath lab.
Its important to note that even John's chances of having heart disease are not zero, there just small. However if I ran stress tests on people like him, about 10% or so would come back positive. Now we have a population of people with positive stress tests who are now worried that they might have heart disease. If we send all of those people for heart caths, about one in one thousand will have a serious complication like a stroke. So if his chances of having heart disease are less than one in 10 thousand (and they are) then it would be much riskier to be tested than not tested.