Floating Point And Benford's Law, Part Two
http://blogs.msdn.com/b/ericlippert/archive/2005/01/13/floating-point-and-benford-s-law-part-two.aspx

A number of readers asked for an explanation of my offhand comment that Benford's Law can be used to show that binary is the best base for doing floating point math. One of the desired properties of a floating point system is that the "representation…

re: Floating Point And Benford's Law, Part Two
Stack Of Toast | Fri, 14 Jan 2005 23:19:00 GMT

Check Out The Big Brain on Eric

re: Floating Point And Benford's Law, Part Two
Eric Lippert | Thu, 13 Jan 2005 23:22:00 GMT

Yeah, I deliberately glossed over that.
It's not just that base ten isn't special either -- we'd also expect scientific constants to begin with 1 in base ten about 30% of the time, whether they were measured in Calories or in dynes!

Essentially, Benford's Law applies to any system in which the probability distribution of first digits ought to be invariant over changes of scale. You can use straightforward differential equations to show that the only such distribution is logarithmic.
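The scale-invariance point is easy to check empirically. A minimal sketch (Python, purely illustrative): data built from many multiplicative factors follows the logarithmic first-digit law, and rescaling every value -- i.e., changing units -- leaves the distribution essentially unchanged.

```python
import random

def leading_digit(x):
    """First significant decimal digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

random.seed(42)
# Multiplicative processes spread over many orders of magnitude are
# exactly the kind of data Benford's Law describes.
data = []
for _ in range(20000):
    v = 1.0
    for _ in range(50):
        v *= random.uniform(0.5, 2.0)
    data.append(v)

def freq_of_one(values):
    return sum(1 for v in values if leading_digit(v) == 1) / len(values)

f1 = freq_of_one(data)                              # Benford predicts ~0.301
f1_rescaled = freq_of_one([3.7 * v for v in data])  # a change of "units"
print(f1, f1_rescaled)
```

Both frequencies come out near 30%, and the rescaled data matches the original -- the distribution does not care what units you measured in.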
re: Floating Point And Benford's Law, Part Two
Zac | Thu, 13 Jan 2005 23:03:00 GMT

Somewhere at home I have a math journal (you may have had the same one, Eric) which describes the reason for the distribution of the numbers in slightly more detail than "multiplication explains it" ...
It goes something like this: whatever distribution the leading digits you encounter have, you'd expect that distribution to be the same no matter what base you used to represent the numbers. A uniform distribution of leading decimal digits would only be stable across representations if the numbers you encounter were evenly distributed across the real number system, which is preposterous (we encounter numbers between 1 and 999 far more often than numbers between six googol + 8635001 and six googol + 8635999). This dropoff of leading digits mirrors the dropoff of the numbers you typically encounter (well, it is the same dropoff, really).
It's the same reason why more scientific constants begin with 1 than with any other digit, which is sometimes taken to indicate mystical forces at work in the structure of the universe. Actually the opposite is true; any other scenario would mean there was something magic about base 10.

re: Floating Point And Benford's Law, Part Two
Eric Lippert | Thu, 13 Jan 2005 21:17:00 GMT

> Is a significant digit of 8 several orders of magnitude less likely than a significant digit of 0?
I assume you mean "of 1", not "of 0" -- no numbers begin with zero except zero!

It falls off quickly, yes, though the law is logarithmic rather than a true exponential. In decimal, the probability that a number begins with a 1 is about 30%, while the probability that it begins with a 9 is about 4.6% -- less likely, but not by several orders of magnitude.
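Those figures fall straight out of the Benford distribution, P(leading digit = d) = log10(1 + 1/d). A quick check (Python, purely illustrative):

```python
import math

# Benford's Law: P(leading digit = d) = log10(1 + 1/d)
probs = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

print(round(probs[1], 3))  # 0.301 -- a leading 1 about 30% of the time
print(round(probs[9], 3))  # 0.046 -- a leading 9 about 4.6% of the time

# The nine cases cover every positive number, so the product inside the
# logs telescopes (2/1 * 3/2 * ... * 10/9 = 10) and the sum is exactly 1.
print(round(sum(probs.values()), 6))  # 1.0
```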
It is an irony of the hex scheme I mentioned above that the typical relative error is smallest in exactly the situations where the leading digit is least likely.

As you conjecture, if we could come up with a system that had the opposite property -- one in which precision diminishes as the leading digit gets larger, thereby granting better precision to the more common numbers -- we might be able to squeeze out an even lower typical relative representation error.
However, such a system is probably too complicated to be practical.

re: Floating Point And Benford's Law, Part Two
Mark Mullin | Thu, 13 Jan 2005 21:08:00 GMT

Hmmmmm. OK, for non-normalized numeric ranges, and ruling out NYC investment bankers calculating their bonuses (and, once upon a time, MSFT people calculating option yields), I'd agree.
However, now something has intrigued me -- is it perchance an inverse exponential that the dropoff follows? Is a significant digit of 8 several orders of magnitude less likely than a significant digit of 0?

If it is exponential, and purely as a mental exercise (the vast majority of the world has enough problems with IEEE 754 already), I wonder if Benford's Law implies there is a more efficient representational form. In short, is there a mechanism where one can steal the magnitude bits for lower values (0, 1, 2, 3) and use them in the mantissa? (Basically, make the representation more expensive/less accurate for the larger numbers in order to benefit the majority of the use cases.)

Consider an extreme case, where you dropped the range by permanently stealing a bit from the exponent (a partial bit swipe is possible but too complicated to be fun). Now, if this bit was set to 1, you could assume that there was no exponent, and that all remaining bits were part of the mantissa.

I only say this because of another interesting side effect of base 2 systems -- every bit you add doubles the resolution, so swiping back one or two bits for the most common exponents could double or quadruple the resolution of the mantissa.
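The flag-bit scheme described above can be made concrete with a toy encoding. The 16-bit layout below is entirely hypothetical (invented here for illustration, not any real standard): when the flag bit is set there is no exponent field and all fifteen remaining bits are fraction bits, giving small values a much finer step than the five-bit-exponent mode would.

```python
import math

# Hypothetical 16-bit toy format illustrating the "steal the exponent"
# idea. Flag bit set: no exponent, 15 fraction bits covering [0, 2) in
# steps of 2^-14. Flag bit clear: 5-bit biased exponent (bias 15) plus
# a 10-bit mantissa, ordinary-float style.
FLAG = 1 << 15

def encode(x):
    """Encode a positive float into the toy format (nearest value)."""
    if x < 2.0:
        # All-mantissa mode: resolution 2^-14 instead of ~2^-10.
        return FLAG | min(round(x * (1 << 14)), 0x7FFF)
    e = min(math.floor(math.log2(x)), 16)
    m = round((x / 2.0 ** e - 1) * 1024)
    if m == 1024:              # rounding carried into the next binade
        m, e = 0, min(e + 1, 16)
    return ((e + 15) << 10) | m

def decode(w):
    if w & FLAG:
        return (w & 0x7FFF) / (1 << 14)
    e = ((w >> 10) & 0x1F) - 15
    return (1 + (w & 0x3FF) / 1024) * 2.0 ** e

print(decode(encode(1.5)), decode(encode(12.0)))  # 1.5 12.0
```

Near 1.0 the exponent mode can only resolve steps of about 2^-10, while the flag mode resolves 2^-14 -- a sixteen-fold improvement for exactly the magnitudes Benford says occur most often.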
This is purely a bit of hypothetical fun, however -- in the real world I'll stay with fixed point :-)

re: Floating Point And Benford's Law, Part Two
Eric Lippert | Thu, 13 Jan 2005 19:41:00 GMT

Indeed, the typical relative representation error for the binary system is always between 24 and 25 binary orders of magnitude smaller than the value being approximated.
The typical relative representation error for the hex system is between 22 and 26 binary orders of magnitude smaller.

So the binary system does have the nice property that the relative representation error is more tightly bounded, and that alone might be a good reason to choose it.

But we need to demonstrate not just that the bounds are tighter -- we also need to show that the best case of 26 for the hex system does not outweigh the 24/25 case for the binary system.
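The tighter binary band can be demonstrated by brute force. This sketch is illustrative only (it assumes a 24-bit significand for both systems, a simplification): it rounds log-uniform samples under binary and hex normalization and compares the worst relative error. The hex band comes out several times wider, because a leading hex digit of 1 wastes up to three significand bits.

```python
import math
import random

random.seed(1)

def rel_err_binary(x, bits=24):
    # Round x to a binary-normalized significand: the leading bit of the
    # significand is always 1, so every value gets the full `bits` bits.
    e = math.floor(math.log2(x))
    step = 2.0 ** (e - (bits - 1))          # ulp within this binade
    return abs(round(x / step) * step - x) / x

def rel_err_hex(x, bits=24):
    # Round x to a `bits`-bit fraction normalized base 16: the fraction
    # lies in [1/16, 1), so it may carry up to 3 leading zero bits.
    e = math.floor(math.log(x, 16)) + 1
    f = x / 16.0 ** e                        # fraction in [1/16, 1)
    q = round(f * 2.0 ** bits) / 2.0 ** bits
    return abs(q * 16.0 ** e - x) / x

xs = [16.0 ** random.random() for _ in range(50000)]  # log-uniform in [1, 16)
max_bin = max(rel_err_binary(x) for x in xs)
max_hex = max(rel_err_hex(x) for x in xs)
print(max_hex / max_bin)  # the hex error band is several times wider
```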
It IS the case that floating point chips will, on average, do more math on values with a leading significant hex digit of 1, 2 or 3 than with 8 through F put together.
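That claim follows directly from Benford's Law applied in base 16, where P(leading hex digit = d) = log16(1 + 1/d): digits 1 through 3 cover half of all values, while 8 through F together cover only a quarter. A quick check (Python, purely illustrative):

```python
import math

# Benford's Law in base 16: P(leading hex digit = d) = log16(1 + 1/d).
p = {d: math.log(1 + 1 / d, 16) for d in range(1, 16)}

low = p[1] + p[2] + p[3]                 # leading digit 1, 2 or 3
high = sum(p[d] for d in range(8, 16))   # leading digit 8 through F

# The sums telescope: low = log16(4) = 0.5 and high = log16(2) = 0.25.
print(round(low, 3), round(high, 3))  # 0.5 0.25
```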
re: Floating Point And Benford's Law, Part Two
Mark Mullin | Thu, 13 Jan 2005 19:27:00 GMT

Effectively, in IEEE floating point the granularity is a function of magnitude; absolute error varies but relative error remains (more or less) constant. I don't think it's good to mix Benford's Law in here, as it isn't that some numbers are better than others, or more popular (think of Google :-D); it's just a mechanism that attempts to keep the relative error constant.

If you wanted to maintain constant absolute error (and hence varying relative error), then normalized numbers work pretty nicely, and all the math can be done with integers and a little fiddling.

That said, nice post.
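The integers-plus-fiddling approach Mark describes is straightforward to sketch: pick a scale factor, keep every quantity as an integer count of that unit, and the absolute error is a constant half-unit regardless of magnitude. (Illustrative Python; the 1/256 scale is an arbitrary choice.)

```python
SCALE = 256  # each integer counts 1/256ths -- an arbitrary choice

def to_fixed(x):
    """Quantize to the nearest 1/SCALE; absolute error <= 1/(2*SCALE)."""
    return round(x * SCALE)

def from_fixed(n):
    return n / SCALE

# Addition is plain integer addition; multiplication needs one rescale,
# since (a*SCALE) * (b*SCALE) carries a factor of SCALE too many.
def fx_mul(a, b):
    return round(a * b / SCALE)

a, b = to_fixed(3.75), to_fixed(0.5)
print(from_fixed(a + b))         # 4.25
print(from_fixed(fx_mul(a, b)))  # 1.875
```

Unlike floating point, the worst-case absolute error here never changes with magnitude -- which is exactly why the relative error grows as values shrink.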