<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Fabulous Adventures In Coding : Floating Point Arithmetic</title><link>http://blogs.msdn.com/ericlippert/archive/tags/Floating+Point+Arithmetic/default.aspx</link><description>Tags: Floating Point Arithmetic</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>As Timeless As Infinity</title><link>http://blogs.msdn.com/ericlippert/archive/2009/10/15/as-timeless-as-infinity.aspx</link><pubDate>Thu, 15 Oct 2009 13:25:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9904452</guid><dc:creator>Eric Lippert</dc:creator><slash:comments>32</slash:comments><comments>http://blogs.msdn.com/ericlippert/comments/9904452.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ericlippert/commentrss.aspx?PostID=9904452</wfw:commentRss><description>&lt;DIV class=mine&gt;
&lt;P&gt;&lt;STRONG&gt;User:&lt;/STRONG&gt; Recently I found out about a peculiar behaviour concerning division by zero in floating point numbers in C#. It does not throw an exception, as with integer division,&amp;nbsp;but rather returns an "infinity". Why is that?&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Eric:&lt;/STRONG&gt; As I've often said, "why" questions are difficult for me&amp;nbsp;to answer. My first attempt at an answer to a "why" question is usually "because that's what the specification says to do"; this time is no different. The C# specification says to do that in section 4.1.6. But we're only doing that because that's what&amp;nbsp;the IEEE standard for floating point arithmetic says to do. We wish to be compliant with the established industry standard. See &lt;A href="http://en.wikipedia.org/wiki/IEEE_754-1985" mce_href="http://en.wikipedia.org/wiki/IEEE_754-1985"&gt;IEEE standard 754-1985&lt;/A&gt; for details. Most floating point&amp;nbsp;arithmetic&amp;nbsp;is done in hardware these days, and most hardware is compliant with this specification.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;User:&lt;/STRONG&gt;&amp;nbsp;It seems to me that division by zero is a bug no matter how you look at it!&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Eric:&lt;/STRONG&gt; Well, since clearly that is not how the members of the IEEE standardization committee looked at it in 1985, your statement that it must be a bug "no matter how you look at it" must be incorrect. Some industry experts do not look at it that way.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;User:&lt;/STRONG&gt;&amp;nbsp;Good point. What motivated this design decision?&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Eric:&lt;/STRONG&gt; I wasn't there; I was busy playing&amp;nbsp;Jumpman on my Commodore 64 at the time. But my educated guess is that &lt;STRONG&gt;it is desirable for all possible operations on all floats to produce a well-defined float result&lt;/STRONG&gt;. Mathematicians would call this a "closure" property; that is, the set of floating point numbers is "closed" under all operations.&lt;/P&gt;
&lt;P&gt;Positive infinity seems like a reasonable choice for dividing a positive number by zero. It seems plausible because of course the limit of 1 / x as x goes to zero (from above)&amp;nbsp;is "positive infinity", so why shouldn't 1/0 be the number "positive infinity"?&lt;/P&gt;
&lt;P&gt;Now, speaking &lt;EM&gt;as a mathematician&lt;/EM&gt;, I find that argument specious. A thing&amp;nbsp;and its limit need not have any particular property in common; it is fallacious to reason that just because, say, a sequence has a particular limit that a fact about the limit is also a fact about the sequence. Mathematically, "positive infinity" (in the sense of a limit of a real-valued function; let's leave transfinite ordinals, hyperbolic geometry, and all of that&amp;nbsp;other stuff out of this discussion)&amp;nbsp;is not a number at all and should not be treated as one; rather, it's a terse way of saying "the limit does not exist because the sequence diverges upwards". &lt;/P&gt;
&lt;P&gt;When we divide by zero, essentially what we are saying is "solve the equation x * 0 = 1"; the solution to that equation is not "positive infinity", it is "I cannot because there is no solution to that equation". It's just the same as asking to solve the equation "x + 1 = x" -- saying "x is positive infinity" is not a solution; there is no solution.&lt;/P&gt;
&lt;P&gt;But speaking &lt;EM&gt;as a practical engineer&lt;/EM&gt; who uses floating point numbers to do an imprecise approximation of ideal arithmetic, this seems like a perfectly reasonable choice. &lt;/P&gt;
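&lt;P&gt;To make the closure idea concrete, here is a minimal sketch in modern JavaScript, which follows the same IEEE rules for these operations. The &lt;SPAN class=code&gt;describe&lt;/SPAN&gt; helper and the property names are hypothetical, invented purely for this illustration.&lt;/P&gt;
&lt;PRE&gt;
```javascript
// Hypothetical helper (not part of any library): names the kind of value we got.
function describe(value) {
  if (Number.isNaN(value)) return "NaN";
  if (value === Infinity) return "positive infinity";
  if (value === -Infinity) return "negative infinity";
  return "finite";
}

// Every operation below yields a float; none of them throws.
const results = {
  positiveOverZero: describe(1 / 0),
  negativeOverZero: describe(-1 / 0),
  zeroOverZero: describe(0 / 0),
  // The float "infinity" even satisfies x + 1 == x, unlike any real number.
  infinityPlusOneIsInfinity: Infinity + 1 === Infinity,
};

console.log(results);
```
&lt;/PRE&gt;
&lt;P&gt;Note that zero divided by zero gives a NaN rather than an infinity; either way the result is still a float, which is the whole point of the closure argument.&lt;/P&gt;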
&lt;P&gt;&lt;STRONG&gt;User:&lt;/STRONG&gt;&amp;nbsp;But surely it is impossible for the hardware to represent "infinity".&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Eric:&lt;/STRONG&gt; It certainly is possible. You've got 32 bits in a single-precision float; that's over four billion possible floats. All bit patterns of the form &lt;/P&gt;
&lt;P&gt;?11111111??????????????????????? &lt;/P&gt;
&lt;P&gt;are reserved for special values rather than ordinary numbers. That's over sixteen million bit patterns. Two of them, the ones whose low 23 bits are all zero, are reserved to mean positive and negative infinity; all the rest are "not-a-number" (NaN) values. Positive infinity is the bit pattern 01111111100000000000000000000000 and negative infinity is 11111111100000000000000000000000. &lt;/P&gt;
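&lt;P&gt;If you would like to check those bit patterns yourself, something like the following sketch should do it in modern JavaScript. The helper name &lt;SPAN class=code&gt;float32FromBitString&lt;/SPAN&gt; is an assumption made up for this example; &lt;SPAN class=code&gt;DataView&lt;/SPAN&gt; is the standard way to reinterpret raw bytes as a single-precision float.&lt;/P&gt;
&lt;PRE&gt;
```javascript
// Hypothetical helper: reinterpret a 32-character string of bits as a single-precision float.
function float32FromBitString(bitString) {
  const view = new DataView(new ArrayBuffer(4));
  view.setUint32(0, parseInt(bitString, 2)); // DataView reads and writes big-endian by default
  return view.getFloat32(0);
}

// Layout: one sign bit, then eight exponent bits, then twenty-three fraction bits.
const positiveInfinity = float32FromBitString("0" + "1".repeat(8) + "0".repeat(23));
const negativeInfinity = float32FromBitString("1" + "1".repeat(8) + "0".repeat(23));
// Same all-ones exponent, but a nonzero fraction: a NaN rather than an infinity.
const notANumber = float32FromBitString("0" + "1".repeat(8) + "1" + "0".repeat(22));

console.log(positiveInfinity, negativeInfinity, notANumber);
```
&lt;/PRE&gt;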
&lt;P&gt;&lt;STRONG&gt;User:&lt;/STRONG&gt;&amp;nbsp;Do all languages and applications use this convention of division-by-zero-becomes-infinity?&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Eric: &lt;/STRONG&gt;No.&amp;nbsp;For example, C#&amp;nbsp;and JScript do but&amp;nbsp;VBScript does not. VBScript gives an error if you do that.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;User:&lt;/STRONG&gt;&amp;nbsp;Then how do language implementors get the desired behaviour for each language if these semantics are implemented by the hardware?&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Eric: &lt;/STRONG&gt;There are two basic techniques. First, many chips which implement this standard allow&amp;nbsp;the programmer&amp;nbsp;to make float division by zero an exception rather than an infinity. On the 80x87 chip, for example, you can use bit two of the control word (the zero-divide mask bit) to determine whether division by zero returns an infinity or&amp;nbsp;throws a hardware exception. &lt;/P&gt;
&lt;P&gt;Second, if you don't want it to be a hardware exception but do want it to be a software exception, then you can check bit two of the status register after each division;&amp;nbsp;it records whether there was&amp;nbsp;a recent divide-by-zero event. &lt;/P&gt;
&lt;P&gt;The latter strategy is used by VBScript; after we perform a division operation we check to see whether the status register recorded a divide-by-zero operation; if it did, then the VBScript runtime creates a divide-by-zero error and the usual VBScript error management&amp;nbsp;process takes over, same as any other error.&lt;/P&gt;
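&lt;P&gt;The shape of that "let it happen, then check" strategy can be sketched in JavaScript, with one big caveat: JavaScript gives you no access to the floating point status register, so in this sketch a test on the divisor stands in for reading the divide-by-zero bit. The &lt;SPAN class=code&gt;checkedDivide&lt;/SPAN&gt; helper is hypothetical and is not how the VBScript runtime is actually written.&lt;/P&gt;
&lt;PRE&gt;
```javascript
// Hypothetical helper mimicking the "let it happen, then check" strategy.
// JavaScript exposes no FPU status flags, so testing the divisor stands in
// for reading the divide-by-zero bit after the operation.
function checkedDivide(numerator, denominator) {
  const quotient = numerator / denominator; // never throws; may be Infinity or NaN
  if (denominator === 0) {
    throw new RangeError("Division by zero");
  }
  return quotient;
}

let message;
try {
  message = String(checkedDivide(1, 0));
} catch (error) {
  message = error.message; // the language-level error, raised in software
}
console.log(message);
```
&lt;/PRE&gt;
&lt;P&gt;The division itself always completes and produces a float; it is the runtime, not the hardware, that decides to turn the event into an error.&lt;/P&gt;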
&lt;P&gt;Similar bits exist for other operations that seem like they might be better treated as exceptions, like numeric overflow.&lt;/P&gt;
&lt;P&gt;The existence of the "hardware exception" bits creates problems for the modern&amp;nbsp;language implementor, because we are now often in a world where code written in multiple languages from multiple vendors is running in the same process. Control bits on hardware are the ultimate "global state", and we all know how irksome it is to have global, public state that random code can stomp on. &lt;/P&gt;
&lt;P&gt;For example: I might be misremembering some details, but I seem to recall that Delphi-authored controls set the "overflows cause exceptions" bit. That is, the Delphi implementors did not use the VBScript strategy of "try it, allow it to succeed, and check to see whether the&amp;nbsp;overflow bit was set in the status register". Rather, they used the "make the hardware throw an exception and then catch the exception" strategy. This is deeply unfortunate.&amp;nbsp;When a VBScript script&amp;nbsp;calls a Delphi-authored control, the control flips the bit to force exceptions but it never "unflips" it. If, later on in the script, the VBScript program does an overflow, then we get an unhandled hardware exception because the bit is still set, even though the Delphi control might be long gone! I fixed that by saving away the state of the control register before calling into a component and restoring it when control returns. That's not ideal, but there's not much else we can do.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;User:&lt;/STRONG&gt;&amp;nbsp;Very enlightening! I will be sure to pass this information along to my coworkers. I would be delighted to see a blog post on this.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Eric: &lt;/STRONG&gt;And here you go!&lt;/P&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;&lt;/DIV&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9904452" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ericlippert/archive/tags/VBScript/default.aspx">VBScript</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/C_2300_/default.aspx">C#</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Floating+Point+Arithmetic/default.aspx">Floating Point Arithmetic</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Dialogue/default.aspx">Dialogue</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Language+Design/default.aspx">Language Design</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/exception+handling/default.aspx">exception handling</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Delphi/default.aspx">Delphi</category></item><item><title>Every Number Is Special In Its Own Special Way</title><link>http://blogs.msdn.com/ericlippert/archive/2006/11/28/every-number-is-special-in-it-s-own-special-way.aspx</link><pubDate>Tue, 28 Nov 2006 21:21:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:1166651</guid><dc:creator>Eric Lippert</dc:creator><slash:comments>28</slash:comments><comments>http://blogs.msdn.com/ericlippert/comments/1166651.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ericlippert/commentrss.aspx?PostID=1166651</wfw:commentRss><description>&lt;DIV class=mine&gt;
&lt;P&gt;I got a question recently about where in the .NET framework the "special numbers" were defined. The questioner was actually asking about the &lt;SPAN class=code&gt;Double.NaN&lt;/SPAN&gt;, &lt;SPAN class=code&gt;Double.PositiveInfinity&lt;/SPAN&gt;, etc, special values for floating point numbers. Of course there are other "special numbers" defined by the framework, such as &lt;SPAN class=code&gt;Math.PI&lt;/SPAN&gt;. &lt;/P&gt;
&lt;P&gt;The question was easily answered but it got me thinking, which, as we know, is usually trouble. &lt;/P&gt;
&lt;P&gt;Clearly zero is a very special number, being the first natural number. &lt;/P&gt;
&lt;P&gt;One is pretty special too, being the multiplicative identity. &lt;/P&gt;
&lt;P&gt;Two is the only even prime. &lt;/P&gt;
&lt;P&gt;Three is the lowest odd prime... &lt;/P&gt;
&lt;P&gt;Clearly lots of numbers are special. This led me to propose the following theorem: &lt;/P&gt;
&lt;P&gt;&lt;B&gt;Theorem:&lt;/B&gt; &lt;I&gt;Every&lt;/I&gt; natural number (0, 1, 2, ...) is a special number. &lt;/P&gt;
&lt;P&gt;&lt;B&gt;Proof:&lt;/B&gt; &lt;/P&gt;
&lt;P&gt;Let’s posit that the set of nonspecial natural numbers is nonempty, and deduce a contradiction. &lt;/P&gt;
&lt;P&gt;If there exists a nonspecial natural number then there must be a &lt;I&gt;lowest&lt;/I&gt; nonspecial natural number. &lt;/P&gt;
&lt;P&gt;What an unusual property! The lowest nonspecial natural number! &lt;/P&gt;
&lt;P&gt;Whatever number has that unusual property must be kinda... special. &lt;/P&gt;
&lt;P&gt;Therefore the lowest nonspecial natural number is special. &lt;/P&gt;
&lt;P&gt;Therefore, if the set of nonspecial natural numbers is nonempty then it contains a special number. &lt;/P&gt;
&lt;P&gt;That is clearly nonsensical, therefore the set of nonspecial natural numbers is empty. &lt;/P&gt;
&lt;P&gt;Therefore all natural numbers are special, QED. &lt;/P&gt;
&lt;P&gt;And yet I can't find anything special about 7920687935872092847630945767548023. But it must be special somehow! &lt;/P&gt;
&lt;P&gt;Extending the proof to show that all real numbers are special is left as an exercise for the reader. &lt;/P&gt;&lt;/DIV&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=1166651" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Rarefied+Heights/default.aspx">Rarefied Heights</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Floating+Point+Arithmetic/default.aspx">Floating Point Arithmetic</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Mathematics/default.aspx">Mathematics</category></item><item><title>Fun With Floating Point Arithmetic, Part Six</title><link>http://blogs.msdn.com/ericlippert/archive/2005/01/26/fun-with-floating-point-arithmetic-part-six.aspx</link><pubDate>Wed, 26 Jan 2005 21:02:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:361041</guid><dc:creator>Eric Lippert</dc:creator><slash:comments>4</slash:comments><comments>http://blogs.msdn.com/ericlippert/comments/361041.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ericlippert/commentrss.aspx?PostID=361041</wfw:commentRss><description>&lt;font face="Lucida Sans Unicode"&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;One more thing -- I said earlier that the VBScript float-to-string algorithm was a little bit different than the JScript algorithm. 
We can demonstrate quite easily by comparing the outputs of two nigh-identical programs:&lt;/p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#333399" size="2"&gt; &lt;p&gt;' VBScript&lt;br /&gt;print 9.2 * 100.0 &amp;lt; 920.0&lt;br /&gt;print 919.9999999999999 &amp;lt; 920.0 &lt;br /&gt;print 920.0000000000001 &amp;gt; 920.0&lt;/p&gt; &lt;p&gt;' JScript&lt;br /&gt;print(9.2*100.00 &amp;lt; 920.0);&lt;br /&gt;print(919.9999999999999 &amp;lt; 920.0);&lt;br /&gt;print (920.0000000000001 &amp;gt; 920.0);&lt;/p&gt;&lt;/font&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;As you'd expect, the last two comparisons of each result in true. But why does the first also result in true? Because of the very issues we've been talking about in the last five parts -- 9.2 cannot be exactly represented as a float. There is some representation error. When the float is multiplied by 100, the representation error also gets 100 times larger, and that's big enough to make it slightly smaller than 920.&lt;/p&gt; &lt;p&gt;If that's the case then why do these programs produce different output?&lt;/p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#333399" size="2"&gt; &lt;p&gt;print 9.2 * 100.0&lt;br /&gt;print 919.9999999999999 &lt;br /&gt;print 920.0000000000001&lt;/p&gt; &lt;p&gt;print(9.2*100.00);&lt;br /&gt;print(919.9999999999999);&lt;br /&gt;print(920.0000000000001);&lt;/p&gt;&lt;/font&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;The VBScript program produces 920, 920, 920. The JScript program produces 919.9999999999999, 919.9999999999999, 920.0000000000001. What is up with that?&lt;/p&gt; &lt;p&gt;The JScript algorithm for converting floats to strings is designed to have as much precision as possible. Since 919.9999999999999 and 920.0 have different binary representations as floats, they have different string representation. 
&lt;/p&gt; &lt;p&gt;The VBScript algorithm, on the other hand, assumes that if you have 919.9999999999999 or 920.0000000000001, you have probably run into a floating point error accrual&amp;nbsp;issue, and it rounds the value back to the nearby "correct" value for you when it displays the string. &lt;/p&gt; &lt;p&gt;This heuristic means that VBScript (paradoxically) loses a small amount of precision and yet displays more accurate results for typical cases. The downside is that VBScript is unable to display full&amp;nbsp;precision when you really DO want to represent 919.9999999999999. Such cases are quite rare though, and the error created in such cases is tiny.&lt;/p&gt;&lt;/font&gt;&lt;/font&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=361041" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ericlippert/archive/tags/JScript+.NET/default.aspx">JScript .NET</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/VBScript/default.aspx">VBScript</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Scripting/default.aspx">Scripting</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Floating+Point+Arithmetic/default.aspx">Floating Point Arithmetic</category></item><item><title>Fun with Floating Point Arithmetic, Part Five</title><link>http://blogs.msdn.com/ericlippert/archive/2005/01/20/fun-with-floating-point-arithmetic-part-five.aspx</link><pubDate>Thu, 20 Jan 2005 18:56:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:357407</guid><dc:creator>Eric Lippert</dc:creator><slash:comments>10</slash:comments><comments>http://blogs.msdn.com/ericlippert/comments/357407.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ericlippert/commentrss.aspx?PostID=357407</wfw:commentRss><description>&lt;font face="lucida sans unicode"&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;I went to &lt;a href="http://www.joelonsoftware.com"&gt;Joel 
Spolsky's &lt;/a&gt;geek dinner at Crossroads the other night, which was a lot of fun. I didn't get much of a chance to chat with Joel, as he was surrounded by a cadre of adoring fans three deep the whole time.&amp;nbsp; I mostly hung out with &lt;A href="http://blogs.msdn.com/kclemson/"&gt;KC &lt;/a&gt;and &lt;A href="http://blogs.msdn.com/larryosterman/"&gt;Larry &lt;/a&gt;and some other attendees and had an interesting talk about digital rights management, corporate blogging, and the difficulties of finding good quality Jackie Chan movies in the original Chinese.&lt;/p&gt; &lt;p&gt;I ran into &lt;a href="http://wesnerm.blogs.com/net_undocumented/"&gt;Wesner Moise&lt;/a&gt;, who I first met long ago when he was working on Access or Excel or one of those Office kind of products but haven't really run into since.&amp;nbsp;He was rather surprised that I remembered his name, but hey, how many Wesner Moises do you think I meet?&amp;nbsp; &lt;/p&gt; &lt;p&gt;Anyway, coincidentally, Wesner has also been running a series on the perils of floating point and integer mathematics&amp;nbsp;on his blog recently.&amp;nbsp;Check it out!&amp;nbsp; &lt;/p&gt; &lt;p&gt;*********************************&lt;/p&gt; &lt;p&gt;I said a while back that floating point math is nothing like the math we're used to.&amp;nbsp;Consider some of the properties that define real number addition. For instance:&lt;/p&gt; &lt;p&gt;&lt;strong&gt;closure&lt;/strong&gt;: x + y is a number&lt;br /&gt;&lt;strong&gt;commutative&lt;/strong&gt;: x + y = y + x&lt;br /&gt;&lt;strong&gt;unique zero&lt;/strong&gt;: a + b = a&amp;nbsp;if and only if&amp;nbsp;b&amp;nbsp;= 0&lt;br /&gt;&lt;strong&gt;associative&lt;/strong&gt;: (x + y ) + z = x + (y + z)&lt;/p&gt; &lt;p&gt;and so on, and similarly for multiplication. 
Commutativity still holds, but many of these rules do not work in floating point arithmetic.&amp;nbsp; I'll dig into a few of them here -- for more math rules that don't work in floating point, see Wesner's blog articles on the subject.&amp;nbsp; He lists dozens of them!&lt;/p&gt; &lt;p&gt;Consider the closure property, for example. It's not true in VBScript:&lt;/p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt; &lt;p&gt;print 10^308 + 10^308 ' Overflow error&lt;/p&gt;&lt;/font&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;It is true in JScript, if you consider Infinity to be a number.&lt;/p&gt; &lt;p&gt;The commutative property is true in both VBScript and JScript, but the unique zero property, and hence the associative property, hold in neither: &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;print 10^20&amp;nbsp;= 10^20 + 5000 &lt;/font&gt;&lt;font color="#800080" size="2"&gt;prints &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;True -- &lt;/font&gt;&lt;font color="#800080" size="2"&gt;So clearly, zero is not the unique number which, when added, leaves a value unchanged. (Zero is unique in that it is the only number which, when added to EVERY number, results in no change. But almost every number has multiple values which result in no change when added.)&lt;/p&gt; &lt;p&gt;That means that the associative property goes out the window: &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;print 10^20 + (5000 + 5000) = (10^20 + 5000) + 5000 &lt;/font&gt;&lt;font color="#800080" size="2"&gt;prints &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;False&lt;/p&gt;&lt;/font&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;The fact that the order in which you make the additions can affect the result matters if you are designing algorithms that must add up lots of little things to one big thing. 
In those cases, &lt;strong&gt;you should try to add together all the little things first, and then add the total to the big thing.&lt;/strong&gt; That way, the small additions are done with the most precision possible.&lt;/p&gt; &lt;p&gt;There's a related error due to rounding off. 1/100 cannot be represented perfectly accurately in binary any more than 1/3 can be represented perfectly in decimal:&lt;/p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt; &lt;p&gt;var steps = 100;&lt;br /&gt;var start = 10;&lt;br /&gt;var stop = 11;&lt;br /&gt;var current = start;&lt;br /&gt;do&lt;br /&gt;{&lt;br /&gt;&amp;nbsp; print(current);&lt;br /&gt;&amp;nbsp; current = current + (stop-start)/steps;&lt;br /&gt;} while (current &amp;lt; stop)&lt;/p&gt;&lt;/font&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;Which ends up with&lt;/p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt; &lt;p&gt;...&lt;br /&gt;10.96999999999998&lt;br /&gt;10.979999999999979&lt;br /&gt;10.989999999999979&lt;br /&gt;10.999999999999978&lt;/p&gt;&lt;/font&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;This actually takes 101 steps, because the last one through the loop gets to 10.999999999999978, which is less than 11.0. &amp;nbsp;This tiny accumulated error results in the algorithm running for 1% too many steps. 
A better algorithm is to do the looping in integers and compute the current value anew every time:&lt;/p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt; &lt;p&gt;var steps = 100;&lt;br /&gt;var start = 10;&lt;br /&gt;var current;&lt;br /&gt;for (var step = 0; step &amp;lt; steps ; ++step)&lt;br /&gt;{&lt;br /&gt;&amp;nbsp; current = start + step/steps;&lt;br /&gt;&amp;nbsp; print(current);&lt;br /&gt;}&lt;/p&gt;&lt;/font&gt;&lt;font color="#800080"&gt; &lt;p&gt;&lt;font size="2"&gt;That actually runs the right number of steps.&lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font size="2"&gt;A corollary of this is that you should almost never compare two floating point numbers for equality, because you never know when some rounding error might have crept in. Rather, subtract them and look at the absolute difference.&amp;nbsp; For instance, if you know that x and y are positive floats that are likely to be close to each other, don't say&amp;nbsp;&lt;font face="Lucida Console" color="#000080"&gt;if (x==y)&lt;/font&gt;&lt;font size="3"&gt;, say &lt;/font&gt;&lt;font color="#800080"&gt;&lt;font face="Lucida Console" color="#000080"&gt;if (Math.abs(x-y) &amp;lt; 0.0001)&lt;font face="Lucida Sans Unicode" color="#800080"&gt; or &lt;/font&gt;&lt;font color="#800080"&gt;&lt;font face="Lucida Console" color="#000080"&gt;if (Math.abs(x-y) &amp;lt; 0.0000001 * x) &lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;font color="#800080"&gt;&lt;font size="2"&gt;or whatever makes sense in your application.&amp;nbsp; (If you need to deal with NaNs and infinities, simple subtraction is anything but!)&lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font size="2"&gt;We saw above that addition of a small number to a large number can lead to an erroneous result, but the error was extremely small.&amp;nbsp; An error of 5000 in a number as big as 10&lt;sup&gt;20&lt;/sup&gt; is 0.00000000000002%, which ain't bad.&amp;nbsp;&amp;nbsp;W&lt;/font&gt;&lt;font 
size="2"&gt;e'd expect that subtraction of two numbers very close to each other would also produce an erroneous result, but with a similar error percentage.&lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font size="2"&gt;We'd be wrong. H&lt;/font&gt;&lt;font size="2"&gt;ere's an example:&lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font size="2"&gt;Suppose you have to solve an arbitrary quadratic equation:&lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font size="2"&gt;A x&lt;sup&gt;2&lt;/sup&gt; + B x + C = 0&lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font size="2"&gt;The solutions are well known, so we can write a little subroutine:&lt;/font&gt;&lt;/p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#000080"&gt; &lt;p&gt;&lt;font size="2"&gt;Sub SolveQuadratic(A, B, C)&lt;br /&gt;&amp;nbsp; &lt;/font&gt;&lt;font size="2"&gt;Discriminant = B*B-4*A*C&lt;br /&gt;&amp;nbsp; &lt;/font&gt;&lt;font size="2"&gt;If Discriminant &amp;lt; 0 Then&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/font&gt;&lt;font size="2"&gt;Print "No real solutions"&lt;br /&gt;&amp;nbsp; &lt;/font&gt;&lt;font size="2"&gt;Else&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/font&gt;&lt;font size="2"&gt;Print (-B + Sqr(Discriminant)) / (2 * A)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/font&gt;&lt;font size="2"&gt;Print (-B - Sqr(Discriminant)) / (2 * A)&lt;br /&gt;&amp;nbsp; &lt;/font&gt;&lt;font size="2"&gt;End If&lt;br /&gt;&lt;/font&gt;&lt;font size="2"&gt;End Sub&lt;br /&gt;&lt;/font&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;SolveQuadratic 2, 5, -12&lt;/p&gt;&lt;/font&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;And sure enough, it prints out -4 and 1.5, done. 
What about&lt;/p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt; &lt;p&gt;SolveQuadratic 1, -10000000.0000001, 1&lt;/p&gt;&lt;/font&gt;&lt;font color="#800080"&gt; &lt;p&gt;&lt;font size="2"&gt;The correct solutions are 10000000 and 0.0000001, but this prints out 10000000 and 9.96515154838562E-08, yielding an error of nearly 0.35% in the smaller solution! We've got &lt;strong&gt;more than ten&amp;nbsp;trillion times as much relative error&lt;/strong&gt; here as we did in the addition. &lt;/font&gt;&lt;font size="2"&gt;Why? &lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font size="2"&gt;Because -B and the square root of the discriminant are &lt;em&gt;very, very&lt;/em&gt; close to each other.&amp;nbsp; You only get about 15 significant decimal digits of accuracy, and we've used them all up.&amp;nbsp;Therefore, the difference is going to be&amp;nbsp;&lt;em&gt;very&lt;/em&gt; inaccurate. Their sum, however, is going to be quite accurate, since they're of similar size and precision.&lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font size="2"&gt;Fortunately, in this example there's a trick. 
The product of the two solutions is always C / A, so we can use this fact to write a better algorithm:&lt;/font&gt;&lt;/p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#000080"&gt; &lt;p&gt;&lt;font size="2"&gt;Sub SolveQuadratic(A, B, C)&lt;br /&gt;&amp;nbsp; &lt;/font&gt;&lt;font size="2"&gt;Discriminant = B*B-4*A*C&lt;br /&gt;&amp;nbsp; &lt;/font&gt;&lt;font size="2"&gt;If Discriminant &amp;lt; 0 Then&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/font&gt;&lt;font size="2"&gt;Print "No real solutions"&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/font&gt;&lt;font size="2"&gt;Exit Sub&lt;br /&gt;&amp;nbsp; &lt;/font&gt;&lt;font size="2"&gt;End If&lt;br /&gt;&amp;nbsp; &lt;/font&gt;&lt;font size="2"&gt;Soln1 = (-B - Sqr(Discriminant)) / (2 * A)&lt;br /&gt;&amp;nbsp; &lt;/font&gt;&lt;font size="2"&gt;Soln2 = (-B + Sqr(Discriminant)) / (2 * A)&lt;br /&gt;&amp;nbsp; &lt;/font&gt;&lt;font size="2"&gt;If Abs(Soln1) &amp;lt; Abs(Soln2) Then Soln1 = Soln2&lt;br /&gt;&amp;nbsp; &lt;/font&gt;&lt;font size="2"&gt;Soln2 = C / (A * Soln1)&lt;br /&gt;&amp;nbsp; &lt;/font&gt;&lt;font size="2"&gt;Print Soln1&lt;br /&gt;&amp;nbsp; &lt;/font&gt;&lt;font size="2"&gt;Print Soln2&lt;br /&gt;&lt;/font&gt;&lt;font size="2"&gt;End Sub&lt;/font&gt;&lt;/p&gt;&lt;/font&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;Which produces a much more accurate result. &lt;/p&gt; &lt;p&gt;But wait a minute -- addition is order-dependent, as we've seen, and so&amp;nbsp;are multiplication and division. Should that be &lt;font color="#000080"&gt;( C / A ) / Soln1&lt;/font&gt; or &lt;font color="#000080"&gt;C / (A * Soln1)&lt;/font&gt; ? Or does it even matter? 
Figuring that out&amp;nbsp;is left as an exercise for the reader!&lt;/p&gt;&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=357407" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ericlippert/archive/tags/JScript+.NET/default.aspx">JScript .NET</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/VBScript/default.aspx">VBScript</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Scripting/default.aspx">Scripting</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Floating+Point+Arithmetic/default.aspx">Floating Point Arithmetic</category></item><item><title>Fun with Floating Point Arithmetic, Part Four</title><link>http://blogs.msdn.com/ericlippert/archive/2005/01/18/fun-with-floating-point-arithmetic-part-four.aspx</link><pubDate>Tue, 18 Jan 2005 19:07:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:355351</guid><dc:creator>Eric Lippert</dc:creator><slash:comments>7</slash:comments><comments>http://blogs.msdn.com/ericlippert/comments/355351.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ericlippert/commentrss.aspx?PostID=355351</wfw:commentRss><description>&lt;font face="lucida sans unicode"&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;A reader also asked the other day why it is that in VBScript, &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;CSng(0.1) = CDbl(0.1)&lt;/font&gt;&lt;font color="#800080" size="2"&gt; is &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;False&lt;/font&gt;&lt;font color="#800080" size="2"&gt;. &lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font color="#800080" size="2"&gt;Forget about binary floating point for a moment. Suppose that we had two fixed-point decimal systems, say one with five digits after the decimal place and one with ten. You want to represent one-third. 
In our first system, the closest we can get is 0.33333. In our second system, the closest we can get is 0.3333333333. &lt;/p&gt; &lt;p&gt;Now we compare these two things. But this is comparing apples to oranges -- two things need to be the same type to sensibly compare them. We have a choice -- we can either convert the type with more precision to the less-precise format and then compare, or we can convert the less-precise type to the more precise format and compare.&lt;/p&gt; &lt;p&gt;If we did the former, then we'd truncate the long one and it would compare equal to the short one. If we did the latter, then clearly they would not be equal because we'd be comparing 0.3333300000 with 0.3333333333. &lt;/p&gt; &lt;p&gt;The analogy holds for doubles and singles. In a single, 1/10 in binary is &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;0.00011001100110011001100&lt;/font&gt;&lt;font color="#800080" size="2"&gt;. In a double, it's &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;0.000110011001100110011001100110011001100110011001100110&lt;/font&gt;&lt;font color="#800080" size="2"&gt;. If we compare by converting the double to a single, then clearly they are equal -- and, in fact a billion or so doubles which are close enough to 1/10 also compare equal. If we compare by converting the single to a double before comparing, clearly they are not equal.&lt;/p&gt; &lt;p&gt;VBScript always converts to the more precise format before doing the comparison. &lt;/p&gt; &lt;p&gt;You might think that this is kind of bogus. Surely if we're comparing a more precise value to a less precise value, it makes sense to say that the less significant bits are, well, &lt;i&gt;less significant&lt;/i&gt; and throw them away. By converting the less precise format to a more precise format, we are essentially &lt;em&gt;manufacturing new precision&lt;/em&gt; that didn't previously exist. We're just making it up out of whole cloth. 
&lt;/p&gt; &lt;p&gt;In the world of science, there's a word for that.&amp;nbsp; It's called "fraud".&lt;/p&gt; &lt;p&gt;Yep, we are totally cheating. This is one of those unfortunate "gotchas" which you've got to be very careful of if you're mixing doubles with singles. There is &lt;i&gt;some&lt;/i&gt; justification for it though. &lt;/p&gt; &lt;p&gt;Consider addition, for example. If you have a single, say &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;0.10000000000000000000000&lt;/font&gt;&lt;font color="#800080" size="2"&gt;, and a double &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;0.10000000000000000000000111000&lt;/font&gt;&lt;font color="#800080" size="2"&gt;…, and you add them together, do you expect that the result will be a single or a double? For this situation, many people would say that the sensible thing to do is to treat the single as a double and add them together, rather than losing the information in the less significant bits of the double. Yet this is once more manufacturing new precision for the single. &lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font color="#800080" size="2"&gt;It comes down to a simple decision. Which is more important: &lt;b&gt;not losing existing information&lt;/b&gt;, or &lt;b&gt;not creating new information arbitrarily&lt;/b&gt;?&lt;/p&gt; &lt;p&gt;Once you pick which&amp;nbsp;factor is more important, you've got to apply&amp;nbsp;the rules&amp;nbsp;that entails&amp;nbsp;consistently. You can't say that for addition and subtraction, you convert singles to doubles, but do it the other way for comparisons. If you do that then you get into the rather ridiculous situation that two numbers can have a nonzero difference and yet compare as equal! &lt;/p&gt; &lt;p&gt;The Visual Basic designers decided that loss of information is worse than manufacturing new information and applied that rule consistently to the variant arithmetic logic. 
Hence, the same goes for operations between integer and floating point types; the integer types are converted to floating point types and the operations are done in floats. You'd certainly never say that 100 + 0.25 should avoid manufacturing new precision, convert the double to an integer, and result in 100, I hope. Similarly, comparisons between the integer 100 and the double 100.25 are done by converting the integer to a double, not converting the double to an integer. &lt;/p&gt; &lt;p&gt;In one case, a comparison can be done by converting to neither type. If you're comparing a 32 bit integer to a single-precision float, you can't convert the single to an integer or the integer to a single without one of them being potentially lossy. In that case, both are converted to doubles. In the VBScript implementation we consult this handy table for what conversion is used when comparing currency, 8-byte float, 4-byte float, 4-byte integer and 2-byte integer to each other:&lt;/p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt; &lt;p&gt;&amp;nbsp;&amp;nbsp; I2 I4 R4 R8 CY&lt;br /&gt;&amp;nbsp;&amp;nbsp;+--------------&lt;br /&gt;CY|CY CY CY CY CY&lt;br /&gt;R8|R8 R8 R8 R8 &lt;br /&gt;R4|R4 R8 R4 &lt;br /&gt;I4|I4 I4 &lt;br /&gt;I2|I2 &lt;/p&gt;&lt;/font&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;(As an aside, in JScript .NET, where we have 64 bit integers and 64 bit floats which could be compared, we're in this cleft stick again, but this time with no clear way out! There is no larger type to which both can be losslessly converted. Comparing a 64 bit integer to a 64 bit float is a bad idea.)&lt;/p&gt;&lt;b&gt; &lt;p&gt;Unless you have really compelling backwards-compatibility reasons, avoid using single precision floats altogether&lt;/b&gt;. In VBScript both a single and a double are stored as a 16 byte VARIANT, so there is no space savings. 
And on the chip, both single and double precision floats are converted to an internal extended format (which I believe is 80 bits), processed in that format, and then converted back to singles or doubles when the operation is done. There are no significant savings in either time or space obtained by using singles, and you get potentially a lot of pain because things don't compare the way you might think they do. Avoid, avoid, avoid.&lt;/p&gt; &lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;/font&gt;&lt;/font&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=355351" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ericlippert/archive/tags/VBScript/default.aspx">VBScript</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Scripting/default.aspx">Scripting</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Floating+Point+Arithmetic/default.aspx">Floating Point Arithmetic</category></item><item><title>Fun With Floating Point Arithmetic, Part Three</title><link>http://blogs.msdn.com/ericlippert/archive/2005/01/17/fun-with-floating-point-arithmetic-part-three.aspx</link><pubDate>Mon, 17 Jan 2005 21:54:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:354658</guid><dc:creator>Eric Lippert</dc:creator><slash:comments>3</slash:comments><comments>http://blogs.msdn.com/ericlippert/comments/354658.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ericlippert/commentrss.aspx?PostID=354658</wfw:commentRss><description>&lt;font face="Lucida Sans Unicode"&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;I've been getting lots of mail, questions and pointers to interesting articles on some of the trials and tribulations of using floating point arithmetic correctly. 
Please do keep it coming!&amp;nbsp;Though I am certainly no expert in this area, I'm happy to take a crack at any questions you might have.&lt;/p&gt; &lt;p&gt;To sum up the story so far, "normal" floats are all numbers representable by this pattern:&lt;/p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#333399" size="2"&gt; &lt;p&gt;+/- 1.(52 binary digits after the binary point) x 2&lt;sup&gt;exp&lt;/sup&gt;&lt;/font&gt;&lt;font color="#800080" size="2"&gt;, where exp is any integer from -1022 through +1023. &lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font color="#800080" size="2"&gt;We also have special floats to represent zero, other tiny floats with fewer than 52 binary digits of precision, infinities and NaNs, but we'll ignore those for now.&lt;/p&gt; &lt;p&gt;Clearly a float can represent every 53 bit integer with full fidelity. After you get to 54 bit integers though, only every other integer is going to be representable with full fidelity. With 55 bit integers, only every fourth, and so on. &lt;/p&gt; &lt;p&gt;A reader wrote in to ask some questions about how floating point numbers are displayed in decimal. He noticed a weirdness in VBScript, but it's actually easier to show the scenario in JScript. (There are additional factors at play in VBScript which I may get to in another article later.) Consider the following:&lt;/p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#800080" size="2"&gt; &lt;p&gt;var x = &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;0x8000000000000800&lt;/font&gt;&lt;font face="Lucida Console" color="#800080" size="2"&gt;;&lt;/p&gt;&lt;/font&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;That would be a 64 bit unsigned integer. It's obviously too large to fit into a 32 bit signed integer, so JScript generates a float and assigns it to the variable slot. However, since the significant bits of this number span exactly 53 positions, from bit 63 down to bit 11, &lt;strong&gt;it can be represented with full fidelity as a float&lt;/strong&gt;. It is not rounded. 
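&lt;/p&gt; &lt;p&gt;If you'd like to check those claims yourself, here is a minimal sketch. It assumes a modern JavaScript engine that stores numbers as IEEE doubles rather than the JScript of the day, and the variable names are made up purely for illustration.&lt;/p&gt; &lt;pre&gt;
```javascript
// 2^53 is the point past which doubles can no longer hold every integer.
var limit = Math.pow(2, 53);

// 2^53 - 1 and 2^53 are distinct, exactly representable values.
var belowLimitIsExact = (limit - 1) !== limit;

// 2^53 + 1 needs 54 bits, so it rounds back to 2^53 ...
var oddAboveLimitIsLost = (limit + 1) === limit;

// ... but 2^53 + 2 is representable: only every other integer survives.
var evenAboveLimitIsKept = (limit + 2) !== limit;

// The value from the article: bit 63 and bit 11 set, a span of 53 bits.
var x = 0x8000000000000800;
var articleValueIsExact = x === Math.pow(2, 63) + Math.pow(2, 11);
```
&lt;/pre&gt; &lt;p&gt;On an engine that uses IEEE doubles, all four flags should come out true.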
&lt;/p&gt; &lt;p&gt;In decimal notation, this&amp;nbsp;value should be 922337203685477&lt;/font&gt;&lt;font size="2"&gt;7856&lt;/font&gt;&lt;font color="#800080" size="2"&gt;. But if we print out the value of x, we get 922337203685477&lt;/font&gt;&lt;font size="2"&gt;7000&lt;/font&gt;&lt;font color="#800080" size="2"&gt;! Why is it rounded off when this particular float has full precision?&lt;/p&gt; &lt;p&gt;Maybe it doesn't have full precision. Maybe I've been lying to you this whole time.&amp;nbsp;Maybe in fact the float is stored in decimal internally, with a 16 slot decimal digit buffer! Fortunately, we can test this hypothesis out.&lt;/p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#800080" size="2"&gt; &lt;p&gt;print(x % &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;0x800&lt;/font&gt;&lt;font face="Lucida Console" color="#800080" size="2"&gt;); // 0&lt;br /&gt;print(x % 10);&amp;nbsp; // 6&lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font color="#800080" size="2"&gt;Whew! The mod operator shows that JScript believes that this number is evenly divisible by 2048, and that the last digit when represented in base 10 is in fact 6.&lt;/p&gt; &lt;p&gt;But that just makes it even more confusing! If JScript knows that the last decimal digit is a six, why does converting the number to a string end in a zero?&lt;/p&gt; &lt;p&gt;Because &lt;strong&gt;we do not want to ever make it look like a float has more precision than it actually does&lt;/strong&gt;. By lopping off the last few decimal digits and replacing them with zeros, we emphasize that floats are accurate only to about fifteen or sixteen significant decimal digits. 
Imagine the confusion that would result if the situation were reversed:&lt;/p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#800080" size="2"&gt; &lt;p&gt;var x = 9223372036854777000;&lt;br /&gt;print(x); // prints 9223372036854777856&lt;/p&gt;&lt;/font&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;Where did the extra precision come from? To the naïve user who does not realize that numbers are stored in binary internally, this looks really bizarre. They put in something with 16 significant digits and something with 19 comes out!&amp;nbsp;We do this rounding because in the real world, people expect floating point numbers to act like decimal numbers, not binary numbers.&amp;nbsp; &lt;/p&gt; &lt;p&gt;A correct and efficient float-to-string algorithm which shows numbers with a prescribed level of decimal precision is surprisingly difficult to write, particularly if you add the requirement that the string-to-float algorithm have nice "round trip" properties. There are lots of places where things can go slightly wrong.&lt;/p&gt; &lt;p&gt;You've probably noticed already, for instance, that the algorithm which JScript uses does NOT have the property that the decimal integer which comes out is the &lt;em&gt;closest&lt;/em&gt; decimal integer to the actual value. Given that we're going to round to a fixed number of decimal significant digits, we would expect that 9223372036854777856 would be rounded to 922337203685477&lt;/font&gt;&lt;font size="2"&gt;8000&lt;/font&gt;&lt;font color="#800080" size="2"&gt;, not 922337203685477&lt;/font&gt;&lt;font size="2"&gt;7000&lt;/font&gt;&lt;font color="#800080" size="2"&gt;. &lt;/font&gt;&lt;/p&gt; &lt;p&gt;&lt;font color="#800080" size="2"&gt;In fact, the&amp;nbsp;specification categorically states that the last digit need not be correctly rounded, because doing so is a pain. 
Section 9.8.1 of ECMA specification 262, Revision 3 reads as follows: (emphasis added)&lt;/p&gt;&lt;/font&gt;&lt;font size="2"&gt; &lt;p&gt;The operator ToString converts a number m to string format as follows:&lt;/p&gt; &lt;ol&gt; &lt;li&gt;If m is NaN, return the string "NaN". &lt;li&gt;If m is +0 or &lt;/font&gt;&lt;font face="Symbol" size="2"&gt;-&lt;/font&gt;&lt;font size="2"&gt;0, return the string "0". &lt;li&gt;If m is less than zero, return the string concatenation of the string "-" and ToString(&lt;/font&gt;&lt;font face="Symbol" size="2"&gt;-&lt;/font&gt;&lt;font size="2"&gt;m). &lt;li&gt;If m is infinity, return the string "Infinity". &lt;li&gt;Otherwise, let n, k, and s be integers such that k &lt;/font&gt;&lt;font face="Symbol" size="2"&gt;³&lt;/font&gt;&lt;font size="2"&gt; 1, 10&lt;sup&gt;k&lt;/sup&gt;&lt;/font&gt;&lt;font face="Symbol" size="2"&gt;&lt;sup&gt;-&lt;/sup&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;sup&gt;1&lt;/sup&gt; &lt;/font&gt;&lt;font face="Symbol" size="2"&gt;£&lt;/font&gt;&lt;font size="2"&gt; s &amp;lt; 10&lt;sup&gt;k&lt;/sup&gt;, the number value for s &lt;/font&gt;&lt;font face="Symbol" size="2"&gt;´&lt;/font&gt;&lt;font size="2"&gt; 10&lt;sup&gt;n&lt;/sup&gt;&lt;/font&gt;&lt;font face="Symbol" size="2"&gt;&lt;sup&gt;-&lt;/sup&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;sup&gt;k&lt;/sup&gt; is m, and k is as small as possible. 
Note that k is the number of digits in the decimal representation of s, that s is not divisible by 10, and that &lt;b&gt;the least significant digit of s is not necessarily uniquely determined by these criteria.&lt;/b&gt; &lt;li&gt;If k &lt;/font&gt;&lt;font face="Symbol" size="2"&gt;£&lt;/font&gt;&lt;font size="2"&gt; n &lt;/font&gt;&lt;font face="Symbol" size="2"&gt;£&lt;/font&gt;&lt;font size="2"&gt; 21, return the string consisting of the k digits of the decimal representation of s (in order, with no leading zeroes), followed by n&lt;/font&gt;&lt;font face="Symbol" size="2"&gt;-&lt;/font&gt;&lt;font size="2"&gt;k occurrences of the character ‘0’. &lt;li&gt;If 0 &amp;lt; n &lt;/font&gt;&lt;font face="Symbol" size="2"&gt;£&lt;/font&gt;&lt;font size="2"&gt; 21, return the string consisting of the most significant n digits of the decimal representation of s, followed by a decimal point, followed by the remaining k&lt;/font&gt;&lt;font face="Symbol" size="2"&gt;-&lt;/font&gt;&lt;font size="2"&gt;n digits of the decimal representation of s. &lt;li&gt;If &lt;/font&gt;&lt;font face="Symbol" size="2"&gt;-&lt;/font&gt;&lt;font size="2"&gt;6 &amp;lt; n &lt;/font&gt;&lt;font face="Symbol" size="2"&gt;£&lt;/font&gt;&lt;font size="2"&gt; 0, return the string consisting of the character ‘0’, followed by a decimal point, followed by &lt;/font&gt;&lt;font face="Symbol" size="2"&gt;-&lt;/font&gt;&lt;font size="2"&gt;n occurrences of the character ‘0’, followed by the k digits of the decimal representation of s. 
&lt;li&gt;Otherwise, if k = 1, return the string consisting of the single digit of s, followed by lowercase character ‘e’, followed by a plus sign ‘+’ or minus sign ‘&lt;/font&gt;&lt;font face="Symbol" size="2"&gt;-&lt;/font&gt;&lt;font size="2"&gt;’ according to whether n&lt;/font&gt;&lt;font face="Symbol" size="2"&gt;-&lt;/font&gt;&lt;font size="2"&gt;1 is positive or negative, followed by the decimal representation of the integer abs(n&lt;/font&gt;&lt;font face="Symbol" size="2"&gt;-&lt;/font&gt;&lt;font size="2"&gt;1) (with no leading zeros). &lt;li&gt;Return the string consisting of the most significant digit of the decimal representation of s, followed by a decimal point, followed by the remaining k&lt;/font&gt;&lt;font face="Symbol" size="2"&gt;-&lt;/font&gt;&lt;font size="2"&gt;1 digits of the decimal representation of s, followed by the lowercase character ‘e’, followed by a plus sign ‘+’ or minus sign ‘&lt;/font&gt;&lt;font face="Symbol" size="2"&gt;-&lt;/font&gt;&lt;font size="2"&gt;’ according to whether n&lt;/font&gt;&lt;font face="Symbol" size="2"&gt;-&lt;/font&gt;&lt;font size="2"&gt;1 is positive or negative, followed by the decimal representation of the integer abs(n&lt;/font&gt;&lt;font face="Symbol" size="2"&gt;-&lt;/font&gt;&lt;font size="2"&gt;1) (with no leading zeros).&lt;/li&gt;&lt;/ol&gt; &lt;p&gt;NOTE The following observations may be useful as guidelines for implementations, but are not part of the normative requirements of this standard.&lt;/p&gt; &lt;ul&gt; &lt;li&gt;If x is any number value other than &lt;/font&gt;&lt;font face="Symbol" size="2"&gt;-&lt;/font&gt;&lt;font size="2"&gt;0, then ToNumber(ToString(x)) is exactly the same number value as x.&lt;b&gt; &lt;li&gt;The least significant digit of s is not always uniquely determined by the requirements listed in step 5.&lt;/b&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;For implementations that provide more accurate conversions than required by the rules above, it is recommended that the 
following alternative version of step 5 be used as a guideline:&lt;/p&gt; &lt;dir&gt; &lt;p&gt;5. Otherwise, let n, k, and s be integers such that k &lt;/font&gt;&lt;font face="Symbol" size="2"&gt;³&lt;/font&gt;&lt;font size="2"&gt; 1, 10&lt;sup&gt;k&lt;/sup&gt;&lt;/font&gt;&lt;font face="Symbol" size="2"&gt;&lt;sup&gt;-&lt;/sup&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;sup&gt;1&lt;/sup&gt; &lt;/font&gt;&lt;font face="Symbol" size="2"&gt;£&lt;/font&gt;&lt;font size="2"&gt; s &amp;lt; 10&lt;sup&gt;k&lt;/sup&gt;, the number value for s &lt;/font&gt;&lt;font face="Symbol" size="2"&gt;´&lt;/font&gt;&lt;font size="2"&gt; 10&lt;sup&gt;n&lt;/sup&gt;&lt;/font&gt;&lt;font face="Symbol" size="2"&gt;&lt;sup&gt;-&lt;/sup&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;sup&gt;k&lt;/sup&gt; is m, and k is as small as possible. If there are multiple possibilities for s, choose the value of s for which s &lt;/font&gt;&lt;font face="Symbol" size="2"&gt;´&lt;/font&gt;&lt;font size="2"&gt; 10&lt;sup&gt;n&lt;/sup&gt;&lt;/font&gt;&lt;font face="Symbol" size="2"&gt;&lt;sup&gt;-&lt;/sup&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;sup&gt;k&lt;/sup&gt; is &lt;b&gt;closest in value to m&lt;/b&gt;. 
&lt;b&gt;If there are two such possible values of s, choose the one that is even.&lt;/b&gt; Note that k is the number of digits in the decimal representation of s and that s is not divisible by 10.&lt;/p&gt;&lt;/font&gt;&lt;font color="#800080" size="2"&gt;&lt;/dir&gt; &lt;p&gt;If you are &lt;em&gt;really&lt;/em&gt; interested in this subject, read this excellent paper: &lt;/font&gt;&lt;a href="http://www.ampl.com/REFS/rounding.pdf"&gt;&lt;u&gt;&lt;font color="#0000ff" size="2"&gt;http://www.ampl.com/REFS/rounding.pdf&lt;/u&gt;&lt;/font&gt;&lt;/a&gt;&lt;font color="#800080" size="2"&gt;.&amp;nbsp;This paper was highly influential in the design and implementation&amp;nbsp;of the JScript float-to-string conversion algorithm.&lt;/p&gt; &lt;p&gt;VBScript has some even weirder rules for conversion of floats to strings -- VBScript's FormatNumber method will convert that float to the string 9,223,372,036,854,780,000.00, which has one fewer digits of precision. The reasoning behind that will have to wait for another post!&lt;/p&gt; &lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;/font&gt;&lt;/font&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=354658" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ericlippert/archive/tags/JScript/default.aspx">JScript</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/VBScript/default.aspx">VBScript</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Scripting/default.aspx">Scripting</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Floating+Point+Arithmetic/default.aspx">Floating Point Arithmetic</category></item><item><title>Floating Point And Benford's Law, Part Two</title><link>http://blogs.msdn.com/ericlippert/archive/2005/01/13/floating-point-and-benford-s-law-part-two.aspx</link><pubDate>Thu, 13 Jan 2005 17:54:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:352284</guid><dc:creator>Eric 
Lippert</dc:creator><slash:comments>7</slash:comments><comments>http://blogs.msdn.com/ericlippert/comments/352284.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ericlippert/commentrss.aspx?PostID=352284</wfw:commentRss><description>&lt;font face="lucida sans unicode"&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;A number of readers asked for an explanation of my offhand comment that Benford's Law can be used to show that binary is the best base for doing floating point math. &lt;/p&gt; &lt;p&gt;One of the desired properties of a floating point system is that the "representation error" is as small as possible. For example, suppose we want to express "one third", in ten-place fixed point decimal notation. The closest we can get is "0.3333333333", which has a representation error of 0.0000000000333333… &lt;/p&gt; &lt;p&gt;A useful way to get a handle on representation error is to look at the &lt;b&gt;granularity&lt;/b&gt; of the system. By granularity I mean the smallest difference we can make between two values. For ten-place fixed point decimal notation, the smallest nonzero difference between any two numbers is 10&lt;sup&gt;-10&lt;/sup&gt;. In a floating point system, the granularity changes as the exponent changes. Really large numbers have large granularity; the difference between two successive floats might be millions or billions. And really small numbers have tiny granularity. &lt;b&gt;The maximum possible representation error is half the granularity&lt;/b&gt;. Since there is a clear relationship between granularity and representation error, I’m just going to talk about granularity from now on. Small granularity = small representation error = goodness.&lt;/p&gt; &lt;p&gt;Let's consider two similar systems, one which uses a binary mantissa and one which uses a hexadecimal mantissa. 
I’m going to continue my convention of showing binary numbers in blue, and we’ll do hex in green.&lt;/p&gt; &lt;p&gt;Suppose we've got, on the one hand, a 32 bit binary float in the spirit of IEEE "singles". (Real IEEE singles use eight exponent bits and a 23 bit mantissa; the layout here is simplified to keep the example tidy.) That is, we've got one sign bit, nine bits of exponent biased by -255 (so exponents can be from -255 to 256), a 22 bit mantissa, and an implicit leading "1." So a number like &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;(0, 100000011, 1100000000000000000000)&lt;/font&gt;&lt;font color="#800080" size="2"&gt; would be +&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;1.1100&lt;/font&gt;&lt;font color="#800080" size="2"&gt; x 2&lt;sup&gt;4&lt;/sup&gt; = 28.&lt;/p&gt; &lt;p&gt;I’m getting sick of spelling out the exponents in biased binary. From now on, I’m just going to give them in decimal, with the bias already removed. And I’m going to give the sign bit by plus and minus, not zero and one.&lt;/p&gt; &lt;p&gt;We now want a hex system with roughly similar range that doesn't take up more storage. Suppose we've got one sign bit, seven bits of exponent biased by -63, and six hex digits for the mantissa. We don't get a leading "1." because not every hex number can be so expressed, so we'll have to use a leading 0. That system has roughly the same range, and if we were to "behind the scenes" express this thing as bits, we'd still be using 32 bits. 
One for the sign, seven for the exponent, and 24 for the mantissa.&lt;/p&gt; &lt;p&gt;In this system, &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;(+, &lt;font color="#800080"&gt;2&lt;/font&gt;, &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;A00000&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;)&lt;/font&gt;&lt;font color="#800080" size="2"&gt; would be +&lt;/font&gt;&lt;font color="#008000" size="2"&gt;0.A00000&lt;/font&gt;&lt;font color="#800080" size="2"&gt; x 16&lt;sup&gt;2&lt;/sup&gt; = 160&lt;/p&gt; &lt;p&gt;An immediate disadvantage of this system is that numbers have multiple representations. &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;(+, &lt;font color="#800080"&gt;2&lt;/font&gt;, &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;A00000&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;) &lt;/font&gt;&lt;font color="#800080" size="2"&gt;and &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;(+, &lt;font color="#800080"&gt;3&lt;/font&gt;, &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;0A0000&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;) &lt;/font&gt;&lt;font color="#800080" size="2"&gt;are the same number. Let’s ignore that for now. We’ll ignore all hex mantissas with leading zeros. (And besides, the binary IEEE system wastes lots of cases for denormals and NaNs too, so it's not clear that this is any worse.)&lt;/p&gt; &lt;p&gt;What's the granularity of the hex system when, say, the exponent is 2? 
Well, consider a number like &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;(+, 2, &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;A00000&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;)&lt;/font&gt;&lt;font color="#800080" size="2"&gt; – the smallest possible number higher than this is &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;(+, 2, &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;A00001&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;) &lt;/font&gt;&lt;font color="#800080" size="2"&gt;which is 2&lt;sup&gt;-16&lt;/sup&gt; larger. Clearly, 2&lt;sup&gt;-16&lt;/sup&gt; is the granularity for all values with an exponent of 2. More generally, if the exponent is N then the granularity is 2&lt;sup&gt;4N-24&lt;/sup&gt;.&lt;/p&gt; &lt;p&gt;How would we represent &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;(+, &lt;font color="#800080"&gt;2&lt;/font&gt;, &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;A00000&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;)&lt;/font&gt;&lt;font color="#800080" size="2"&gt; in our binary system? That would be &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;(+, &lt;font color="#800080"&gt;7&lt;/font&gt;, 0100000000000000000000) &lt;/font&gt;&lt;font color="#800080" size="2"&gt;= +&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;1.01&lt;/font&gt;&lt;font color="#800080" size="2"&gt; x 2&lt;sup&gt;7&lt;/sup&gt; = 160. The next largest number that can be represented in our system is &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;(+, &lt;font color="#800080"&gt;7&lt;/font&gt;, 0100000000000000000001)&lt;/font&gt;&lt;font color="#800080" size="2"&gt; which is 2&lt;sup&gt;-15&lt;/sup&gt; larger. 
So the hex system has smaller granularity and hence smaller representation error, and is therefore the better system, right?&lt;/p&gt; &lt;p&gt;Not so fast. Let’s make a chart.&lt;/p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt; &lt;p&gt;Decimal&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Hex&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Binary&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Binary Granularity Exponent&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;16&amp;nbsp;&amp;nbsp;&amp;nbsp; (+, &lt;font color="#000080"&gt;2&lt;/font&gt;,&amp;nbsp;&lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;100000&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;) (+, 4, 0000000000000000000000) -18&lt;br /&gt;&amp;nbsp;&amp;nbsp;32&amp;nbsp;&amp;nbsp;&amp;nbsp; (+, 2, &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;200000&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;) (+, 5, 0000000000000000000000) -17&lt;br /&gt;&amp;nbsp;&amp;nbsp;48&amp;nbsp;&amp;nbsp;&amp;nbsp; (+, 2, &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;300000&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;) (+, 5, 1000000000000000000000) -17&lt;br /&gt;&amp;nbsp;&amp;nbsp;64&amp;nbsp;&amp;nbsp;&amp;nbsp; (+, 2, &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;400000&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;) (+, 6, 0000000000000000000000) -16&lt;br /&gt;&amp;nbsp;&amp;nbsp;80&amp;nbsp;&amp;nbsp;&amp;nbsp; (+, 2, &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;500000&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;) (+, 6, 
0100000000000000000000) -16&lt;br /&gt;&amp;nbsp;&amp;nbsp;96&amp;nbsp;&amp;nbsp;&amp;nbsp; (+, 2, &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;600000&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;) (+, 6, 1000000000000000000000) -16&lt;br /&gt;&amp;nbsp;112&amp;nbsp;&amp;nbsp;&amp;nbsp; (+, 2, &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;700000&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;) (+, 6, 1100000000000000000000) -16&lt;br /&gt;&amp;nbsp;128&amp;nbsp;&amp;nbsp;&amp;nbsp; (+, 2, &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;800000&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;) (+, 7, 0000000000000000000000) -15&lt;br /&gt;&amp;nbsp;144&amp;nbsp;&amp;nbsp;&amp;nbsp; (+, 2, &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;900000&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;) (+, 7, 0010000000000000000000) -15&lt;br /&gt;&amp;nbsp;160&amp;nbsp;&amp;nbsp;&amp;nbsp; (+, 2, &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;A00000&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;) (+, 7, 0100000000000000000000) -15&lt;br /&gt;&amp;nbsp;176&amp;nbsp;&amp;nbsp;&amp;nbsp; (+, 2, &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;B00000&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;) (+, 7, 0110000000000000000000) -15&lt;br /&gt;&amp;nbsp;192&amp;nbsp;&amp;nbsp;&amp;nbsp; (+, 2, &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;C00000&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;) (+, 7, 1000000000000000000000) -15&lt;br /&gt;&amp;nbsp;208&amp;nbsp;&amp;nbsp;&amp;nbsp; (+, 2, &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;D00000&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;) (+, 
7, 1010000000000000000000) -15&lt;br /&gt;&amp;nbsp;224&amp;nbsp;&amp;nbsp;&amp;nbsp; (+, 2, &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;E00000&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;) (+, 7, 1100000000000000000000) -15&lt;br /&gt;&amp;nbsp;240&amp;nbsp;&amp;nbsp;&amp;nbsp; (+, 2, &lt;/font&gt;&lt;font face="Lucida Console" color="#008000" size="2"&gt;F00000&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;) (+, 7, 1110000000000000000000) -15&lt;/p&gt;&lt;/font&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;The hex system always has a granularity exponent of -16.&amp;nbsp; In 8/15ths of the cases, the binary system has worse granularity than the hex system. In 4/15ths of the cases, they have the same granularity, and in only 3/15ths of the cases, the binary system has better granularity. Therefore the hex system is clearly the same or better most of the time.&amp;nbsp; We should use this system instead of binary floats.&lt;/p&gt; &lt;p&gt;Not so fast! We’ve forgotten Benford’s Law! That's true if any number is as likely as any other, but that's not realistic. &lt;/p&gt; &lt;p&gt;Suppose the numbers that we are manipulating obey Benford’s Law. In that case, we would expect that fully a quarter of them would begin with &lt;font color="#006400"&gt;1&lt;/font&gt; when encoded in hex. 
We’d expect another quarter of them to begin with &lt;font color="#006400"&gt;2&lt;/font&gt; or &lt;font color="#006400"&gt;3&lt;/font&gt;, another quarter to begin with &lt;font color="#006400"&gt;4&lt;/font&gt;, &lt;font color="#006400"&gt;5&lt;/font&gt;, &lt;font color="#006400"&gt;6&lt;/font&gt; or &lt;font color="#006400"&gt;7&lt;/font&gt;, and the remaining quarter to begin with &lt;font color="#006400"&gt;8&lt;/font&gt;, &lt;font color="#006400"&gt;9&lt;/font&gt;, &lt;font color="#006400"&gt;A&lt;/font&gt;, &lt;font color="#006400"&gt;B&lt;/font&gt;, &lt;font color="#006400"&gt;C&lt;/font&gt;, &lt;font color="#006400"&gt;D&lt;/font&gt;, &lt;font color="#006400"&gt;E&lt;/font&gt; and &lt;font color="#006400"&gt;F&lt;/font&gt;. If we make that assumption then we must conclude that the binary system is better half the time, equal a quarter of the time, and worse a quarter of the time.&lt;/p&gt; &lt;p&gt;Clearly this isn't the case just for the exponent 2. For any hex exponent N, the hex system will have a granularity exponent of 4N-24. For numbers in the range expressible by the hex system with that exponent, a quarter of the time, the binary system will have a granularity exponent of 4N-26, a quarter of the time it'll be 4N-25, a quarter of the time it'll be 4N-24, and the remaining quarter it'll be 4N-23, so three-quarters of the time it'll be as good or better. &lt;b&gt;On average, the binary system is considerably better if data are distributed according to Benford's Law. 
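&lt;/b&gt;&lt;/p&gt; &lt;p&gt;Where do those quarters come from? Here is a rough sketch, assuming the multi-base form of Benford's Law from the previous post, ln (1 + 1/n) / ln B. The helper name leadingDigitShare is hypothetical, and the closed form it uses is just that formula summed over a run of consecutive leading digits.&lt;/p&gt; &lt;pre&gt;
```javascript
// Benford's Law, multi-base form: the share of values whose leading digit
// in base `base` lies between `lo` and `hi` inclusive. Summing
// ln(1 + 1/n) / ln(base) over n = lo..hi telescopes to the closed form below.
function leadingDigitShare(lo, hi, base) {
  return Math.log((hi + 1) / lo) / Math.log(base);
}

// Hex leading digits, grouped by how many binary digits each one needs.
var shareOne = leadingDigitShare(1, 1, 16);          // leading hex digit 1
var shareTwoToThree = leadingDigitShare(2, 3, 16);   // 2 or 3
var shareFourToSeven = leadingDigitShare(4, 7, 16);  // 4 through 7
var shareEightToF = leadingDigitShare(8, 15, 16);    // 8 through F
```
&lt;/pre&gt; &lt;p&gt;Each group spans one doubling, so each share should work out to roughly a quarter, give or take floating point rounding.&lt;b&gt;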
&lt;/b&gt;&lt;/p&gt; &lt;p&gt;This is just one example, not&amp;nbsp;a proof.&amp;nbsp;But we could generalize this example and show that for any system with a given number of bits, binary mantissas yield smaller representation errors on average than mantissas in any other base.&lt;/p&gt;&lt;/font&gt;&lt;/font&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=352284" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Rarefied+Heights/default.aspx">Rarefied Heights</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Floating+Point+Arithmetic/default.aspx">Floating Point Arithmetic</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Benford_2700_s+Law/default.aspx">Benford's Law</category></item><item><title>Benford's Law</title><link>http://blogs.msdn.com/ericlippert/archive/2005/01/12/benford-s-law.aspx</link><pubDate>Wed, 12 Jan 2005 20:18:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:351693</guid><dc:creator>Eric Lippert</dc:creator><slash:comments>25</slash:comments><comments>http://blogs.msdn.com/ericlippert/comments/351693.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ericlippert/commentrss.aspx?PostID=351693</wfw:commentRss><description>&lt;font face="Lucida sans unicode"&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;While I was poking through my old numeric analysis textbooks to refresh my memory for this series on floating point arithmetic, I came across one of my favourite weird facts about math. &lt;/p&gt; &lt;p&gt;A nonzero base-ten integer starts with some digit other than zero. You might naively expect that given a bunch of "random" numbers, you'd see every digit from 1 to 9 about equally often. You'd see as many 2's as 9's. You'd see each digit as the leading digit about 11% of the time.&amp;nbsp; For example, consider a random integer between 100000 and 999999. 
One ninth begin with 1, one ninth begin with 2, etc.&lt;/p&gt; &lt;p&gt;But in real-life datasets, that's not the case at all. If you just start grabbing thousands or millions of "random" numbers from newspapers and magazines and books, you soon see that about 30% of the numbers begin with 1, and it falls off rapidly from there. About 18% begin with 2, all the way down to less than 5% for 9.&lt;/p&gt; &lt;p&gt;This oddity was discovered by Simon Newcomb in 1881, and then rediscovered by Frank Benford, a physicist, in 1937. As is often the case, the fact became associated with the second discoverer and is now known as Benford's Law.&lt;/p&gt; &lt;p&gt;Benford's Law has lots of practical applications. For instance, people who just make up numbers wholesale on their tax returns tend to pick "average seeming" numbers, and to humans, "average seeming" means "starts with a five". People think, I want something between $1000 and $10000, let's say, $5624. The IRS routinely scans tax returns to find unusually high percentages of leading 5's and examines those more carefully.&lt;/p&gt; &lt;p&gt;Benford's result was carefully studied by many statisticians and other mathematicians, and we now have a multi-base form of the law. Given a bunch of numbers in base B, we'd expect to see leading digit n approximately &lt;font color="#000080"&gt;ln (1 + 1/n) / ln B&lt;/font&gt; of the time. &lt;/p&gt; &lt;p&gt;But what could possibly explain Benford's Law? &lt;/p&gt; &lt;p&gt;Multiplication. Most numbers we see every day are not random quantities in and of themselves. They're usually computed quantities with some aspect of multiplication to them. &lt;/p&gt; &lt;p&gt;Consider, for example, any property which grows on a percentage basis. Like, say, the Dow Jones Industrial Average. It typically grows a few percent a year. Suppose, just to pick a rate, that on average the DJIA grows at 7% a year. At that rate, it doubles about every ten years. Suppose that&amp;nbsp;the DJIA is 10000. 
After ten years of having 1 as the leading digit, it finally gets to 20000. Ten years go by again, but in those ten years it doubles to 40000, not 30000. Therefore, those ten years were spent about half starting with 2, and about half starting with 3.&amp;nbsp;Ten more years go by, and it doubles again to 80000. Now 4, 5, 6 and 7 each get a turn as the leading digit, all within only ten years. Eventually we get up to 100000, and spend another ten years starting with 1.&amp;nbsp; Pick a random date and you'd expect that the DJIA on that day would be twice as likely to start with 1 as 2, and four times as likely to start with 1 as 5.&lt;/p&gt; &lt;p&gt;We can easily write a program that demonstrates Benford's Law. As we multiply more and more numbers together, they tend to clump based on Benford's Law:&lt;/p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt; &lt;p&gt;var counters = [&lt;br /&gt;[0,0,0,0,0,0,0,0,0,0],&lt;br /&gt;[0,0,0,0,0,0,0,0,0,0],&lt;br /&gt;[0,0,0,0,0,0,0,0,0,0],&lt;br /&gt;[0,0,0,0,0,0,0,0,0,0] ];&lt;/p&gt; &lt;p&gt;for (var multiplications = 0 ; multiplications &amp;lt;= 3; ++multiplications)&lt;br /&gt;{&lt;br /&gt;&amp;nbsp; for (var trial = 0 ; trial &amp;lt; 10000 ; ++trial)&lt;br /&gt;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; var num = 1;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;for (var mult = 0 ; mult &amp;lt;= multiplications; ++mult)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; num = num * (Math.floor(Math.random() * 1000) + 1);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; var lead = num.toString().substr(0,1);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; counters[multiplications][lead] ++;&lt;br /&gt;&amp;nbsp; }&lt;br /&gt;}&lt;br /&gt;print(counters.join("\n"));&lt;/p&gt;&lt;/font&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;A typical run produces data from which we can draw this table:&lt;/p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt; 
&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Leading Digit&lt;br /&gt;Mults&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;2&amp;nbsp;&amp;nbsp;&amp;nbsp; 3&amp;nbsp;&amp;nbsp;&amp;nbsp; 4&amp;nbsp;&amp;nbsp; &amp;nbsp;5&amp;nbsp;&amp;nbsp; &amp;nbsp;6&amp;nbsp;&amp;nbsp; &amp;nbsp;7&amp;nbsp;&amp;nbsp; &amp;nbsp;8&amp;nbsp;&amp;nbsp; &amp;nbsp;9&lt;br /&gt;&amp;nbsp; 0&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1102 1069 1085 1125 1167 1107 1083 1124 1138&lt;br /&gt;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2416 1752 1443 1162 1019&amp;nbsp; 756&amp;nbsp; 643&amp;nbsp; 453&amp;nbsp; 356&lt;br /&gt;&amp;nbsp; 2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3046 1854 1265&amp;nbsp; 979&amp;nbsp; 778&amp;nbsp; 632&amp;nbsp; 551&amp;nbsp; 468&amp;nbsp; 427 &lt;br /&gt;&amp;nbsp; 3&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3090 1814 1197&amp;nbsp; 924&amp;nbsp; 779&amp;nbsp; 661&amp;nbsp; 582&amp;nbsp; 491&amp;nbsp; 462&lt;br /&gt;predicted 3010 1761 1249&amp;nbsp; 969&amp;nbsp; 792&amp;nbsp; 669&amp;nbsp; 580&amp;nbsp; 511&amp;nbsp; 458&lt;/p&gt;&lt;/font&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;As you can see, with no multiplications, the distribution is a flat 11% for each. But by the time we get up to two or three multiplications, we're almost exactly at the distribution predicted by Benford's Law.&lt;/p&gt; &lt;p&gt;What does this have to do with floating point math? Well, we could conceivably design chips that did decimal or hexadecimal floating-point arithmetic. Would such chips yield more accurate results? Well, recall that last time, we used the fact that you can stuff a leading 1 onto a bit field to define a number. 
Binary is the only system in which every number except 0 begins with a leading 1! You can make a statistical argument which shows that bases other than binary, in which you cannot always assume the leading digit, have on average a larger representation error. The argument is somewhat subtle, so I'm not going to actually go through the details of it, but suffice it to say that we can show that for typical uses, binary is the least error-producing system we can come up with given that we'll almost always be working with data which follow Benford's Law.&lt;/p&gt;&lt;/font&gt;&lt;/font&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=351693" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Rarefied+Heights/default.aspx">Rarefied Heights</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Floating+Point+Arithmetic/default.aspx">Floating Point Arithmetic</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Benford_2700_s+Law/default.aspx">Benford's Law</category></item><item><title>Floating Point Arithmetic, Part One</title><link>http://blogs.msdn.com/ericlippert/archive/2005/01/10/floating-point-arithmetic-part-one.aspx</link><pubDate>Mon, 10 Jan 2005 19:34:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:350108</guid><dc:creator>Eric Lippert</dc:creator><slash:comments>18</slash:comments><comments>http://blogs.msdn.com/ericlippert/comments/350108.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ericlippert/commentrss.aspx?PostID=350108</wfw:commentRss><description>&lt;font face="Lucida Sans Unicode" color="#800080" size="2"&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;A month ago I was discussing some of the &lt;/font&gt;&lt;A href="http://blogs.msdn.com/ericlippert/archive/2004/12/03/274360.aspx"&gt;&lt;u&gt;&lt;font color="#0000ff" size="2"&gt;issues in integer arithmetic&lt;/u&gt;&lt;/font&gt;&lt;/a&gt;&lt;font 
color="#800080" size="2"&gt;, and I said that issues in floating point arithmetic were a good subject for another day. Over the weekend I got some questions from a reader about floating point arithmetic, so this seems like as good a time as any to address them. &lt;/p&gt; &lt;p&gt;Before I talk about some of the things that can go terribly wrong with floating point arithmetic, it's helpful (and character building) to understand how exactly a floating point number is represented internally.&lt;/p&gt; &lt;p&gt;To distinguish between decimal and binary numbers, I'm going to do all binary numbers in &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;blue fixed-width&lt;/font&gt;&lt;font color="#800080" size="2"&gt;.&lt;/p&gt; &lt;p&gt;Here's how floating point numbers work.&amp;nbsp; A float is 64 bits.&amp;nbsp; Of that, one bit represents the &lt;b&gt;sign&lt;/b&gt;: &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;0&lt;/font&gt;&lt;font color="#800080" size="2"&gt; is positive, &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;1&lt;/font&gt;&lt;font color="#800080" size="2"&gt; is negative.&amp;nbsp;&amp;nbsp;&lt;/p&gt; &lt;p&gt;Eleven bits represent the &lt;b&gt;exponent&lt;/b&gt;.&amp;nbsp; To determine the exponent value, treat the exponent field as an eleven-bit unsigned integer, then subtract 1023.&amp;nbsp; However, note that the exponent fields &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;00000000000&lt;/font&gt;&lt;font color="#800080" size="2"&gt; and &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;11111111111&lt;/font&gt;&lt;font color="#800080" size="2"&gt; have special meaning, which we'll come to later.&lt;/p&gt; &lt;p&gt;The remaining 52 bits represent the &lt;strong&gt;mantissa&lt;/strong&gt;.&lt;/p&gt; &lt;p&gt;To compute the value of a float, here's what you do.&amp;nbsp; You take the mantissa, and you stick a "&lt;/font&gt;&lt;font 
face="Lucida Console" color="#000080" size="2"&gt;1.&lt;/font&gt;&lt;font color="#800080" size="2"&gt;" onto its left hand side.&amp;nbsp; Then you compute that value as a 53 bit fraction with 52 fractional places.&amp;nbsp; Then you multiply that by&amp;nbsp;two to the power of the given exponent value, and sign it appropriately.&lt;/p&gt; &lt;p&gt;So for example, the number -5.5 is represented like this: (sign, exponent, mantissa)&lt;/p&gt; &lt;p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#800080" size="2"&gt;(&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;1&lt;/font&gt;&lt;font face="Lucida Console" color="#800080" size="2"&gt;, &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;10000000001&lt;/font&gt;&lt;font face="Lucida Console" color="#800080" size="2"&gt;, &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;0110000000000000000000000000000000000000000000000000&lt;/font&gt;&lt;font face="Lucida Console" color="#800080" size="2"&gt;)&lt;/p&gt;&lt;/font&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;The sign is &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;1&lt;/font&gt;&lt;font color="#800080" size="2"&gt;, so it's a negative number.&amp;nbsp; The exponent is 1025 - 1023 = 2.&amp;nbsp; Put a &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;1.&lt;/font&gt;&lt;font color="#800080" size="2"&gt; on the front of the mantissa and you get &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;1.0110000000000000000000000000000000000000000000000000&lt;/font&gt;&lt;font color="#800080" size="2"&gt; = 1.375 and sure enough, -1.375 x 2&lt;sup&gt;2&lt;/sup&gt; = -5.5.&lt;/p&gt; &lt;p&gt;This system is nice because it means that every number in the range of a float has a unique representation, and therefore doesn't waste bits on duplicates.&amp;nbsp;&lt;/p&gt; &lt;p&gt;However, you might be wondering how zero is represented, since 
every bit pattern has &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;1.&lt;/font&gt;&lt;font color="#800080" size="2"&gt; plunked onto the beginning.&amp;nbsp; That's where the special values for the exponent come in.&amp;nbsp; If the exponent is &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;00000000000&lt;/font&gt;&lt;font color="#800080" size="2"&gt;, then the float is considered a "denormal".&amp;nbsp; It gets &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;0.&lt;/font&gt;&lt;font color="#800080" size="2"&gt; plunked onto the beginning, not &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;1.&lt;/font&gt;&lt;font color="#800080" size="2"&gt;, and the exponent is assumed to be -1022.&amp;nbsp; This has the nice property that if all bits in the float are zero, it's representing zero. Note that this lets you represent smaller numbers than you would be able to otherwise, as we'll see, though you pay the price of lower precision.&amp;nbsp; Essentially, denormals exist so that the chip can do "graceful underflow" -- represent tiny values without having to go straight to zero.&lt;/p&gt; &lt;p&gt;&lt;/p&gt; &lt;p&gt;If the exponent is &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;11111111111&lt;/font&gt;&lt;font color="#800080" size="2"&gt; and the mantissa is all zeros, that's Infinity (positive or negative, according to the sign bit).&amp;nbsp; If the exponent is &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;11111111111&lt;/font&gt;&lt;font color="#800080" size="2"&gt; and the mantissa is not all zeros, that's considered to be &lt;b&gt;Not A Number&lt;/b&gt; -- this is a bit pattern reserved for errors.&lt;/p&gt; &lt;p&gt;So the biggest and smallest positive &lt;strong&gt;normalized&lt;/strong&gt; floats are&lt;/p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt; &lt;p&gt;(0, 11111111110, 
1111111111111111111111111111111111111111111111111111)&lt;/p&gt;&lt;/font&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;which is &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;1.1111111111111111111111111111111111111111111111111111 &lt;/font&gt;&lt;font color="#800080" size="2"&gt;x 2&lt;sup&gt;1023&lt;/sup&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#800080" size="2"&gt;, and&lt;/font&gt;&lt;font color="#800080" size="2"&gt; &lt;/p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt; &lt;p&gt;(0, 00000000001, 0000000000000000000000000000000000000000000000000000)&lt;/p&gt;&lt;/font&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;which is &amp;nbsp;&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;1.000 &lt;/font&gt;&lt;font color="#800080" size="2"&gt;x 2&lt;sup&gt;-1022&lt;/sup&gt;&lt;/p&gt; &lt;p&gt;The smallest and biggest positive &lt;strong&gt;denormalized&lt;/strong&gt; floats are &lt;/p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt; &lt;p&gt;(0, 00000000000, 0000000000000000000000000000000000000000000000000001)&lt;/p&gt;&lt;/font&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;which is &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;0.0000000000000000000000000000000000000000000000000001 &lt;/font&gt;&lt;font color="#800080" size="2"&gt;x 2&lt;sup&gt;-1022&amp;nbsp;&lt;/sup&gt; = 2&lt;sup&gt;-1074&lt;/sup&gt;, and&lt;/p&gt;&lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt; &lt;p&gt;(0, 00000000000, 1111111111111111111111111111111111111111111111111111)&lt;/p&gt;&lt;/font&gt;&lt;font color="#800080" size="2"&gt; &lt;p&gt;which is &lt;/font&gt;&lt;font face="Lucida Console" color="#000080" size="2"&gt;0.1111111111111111111111111111111111111111111111111111 &lt;/font&gt;&lt;font color="#800080" size="2"&gt;x 2&lt;sup&gt;-1022&lt;/sup&gt;&lt;/p&gt; &lt;p&gt;Next time: floating point math is 
&lt;strong&gt;nothing&lt;/strong&gt; like real number math.&lt;/p&gt;&lt;/font&gt;&lt;/font&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=350108" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Rarefied+Heights/default.aspx">Rarefied Heights</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Floating+Point+Arithmetic/default.aspx">Floating Point Arithmetic</category></item><item><title>Bankers' Rounding</title><link>http://blogs.msdn.com/ericlippert/archive/2003/09/26/bankers-rounding.aspx</link><pubDate>Fri, 26 Sep 2003 21:03:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:53107</guid><dc:creator>Eric Lippert</dc:creator><slash:comments>24</slash:comments><comments>http://blogs.msdn.com/ericlippert/comments/53107.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ericlippert/commentrss.aspx?PostID=53107</wfw:commentRss><description>&lt;SPAN&gt;
&lt;P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;A number of people have pointed out to me over the years that VBScript's &lt;/SPAN&gt;&lt;/FONT&gt;&lt;FONT face="Lucida Console" color=#333399 size=2&gt;&lt;SPAN&gt;Round&lt;/SPAN&gt;&lt;/FONT&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt; function is a bit weird.&amp;nbsp; It seems like it should be pretty straightforward -- you pick the integer closest to the number you've got, end of story.&amp;nbsp; But what about, say, 1.5?&amp;nbsp; There are &lt;B&gt;&lt;SPAN&gt;two&lt;/SPAN&gt;&lt;/B&gt; closest integers.&amp;nbsp; Do you go up or down? 
&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;The &lt;/SPAN&gt;&lt;/FONT&gt;&lt;FONT face="Lucida Console" color=#333399 size=2&gt;&lt;SPAN&gt;Round&lt;/SPAN&gt;&lt;/FONT&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt; function goes to the nearest integer, and &lt;I&gt;&lt;SPAN&gt;if there are two nearest integers then it goes to the even one.&lt;/SPAN&gt;&lt;/I&gt;&amp;nbsp; 1.5 rounds to 2, 0.5 rounds to 0. 
&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;
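&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;The rule is easy to sketch in script. What follows is a minimal illustration in JavaScript of "nearest integer, ties go to the even one"; it is not VBScript's actual implementation. The helper name roundHalfEven is made up for this example, and the sketch assumes inputs small enough that an exact half is representable as a float.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;
```javascript
// Round to the nearest integer; exact halves go to the even neighbour.
// roundHalfEven is a hypothetical helper name, not a built-in function.
function roundHalfEven(x) {
  var lower = Math.floor(x);
  if (x - lower !== 0.5) {
    // Not a tie, so ordinary nearest-integer rounding applies.
    return Math.round(x);
  }
  // A tie: choose whichever of lower and lower + 1 is even.
  return lower % 2 === 0 ? lower : lower + 1;
}

console.log(roundHalfEven(0.5)); // 0
console.log(roundHalfEven(1.5)); // 2
console.log(roundHalfEven(2.5)); // 2
console.log(roundHalfEven(3.5)); // 4
```
&lt;/PRE&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;Note that 1.5 and 2.5 both land on 2, which is exactly the behaviour that surprises people.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;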
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;Why's that?&amp;nbsp; Why not just arbitrarily say that we always round down in this situation?&amp;nbsp; Why round down sometimes and up some other times?&amp;nbsp; There actually is a good reason!&amp;nbsp; 
&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;This algorithm is called the &lt;STRONG&gt;&lt;B&gt;&lt;FONT face="Lucida Sans Unicode"&gt;&lt;SPAN&gt;Bankers' Rounding&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/B&gt;&lt;/STRONG&gt; algorithm because, unsurprisingly, it's used by bankers.&amp;nbsp; Suppose a data source has data which often falls on exactly split quantities -- half dollars, half cents, half shares, whatever -- but wishes to provide rounded-off quantities.&amp;nbsp; Suppose further that a data consumer is going to derive summary statistics from the rounded data -- an average, say.&amp;nbsp; &lt;/SPAN&gt;&lt;/FONT&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;Ideally when you are taking an average you want to take an average of the raw data with as much precision as you can get.&amp;nbsp; But in the real world we often have to take averages of data which has lost some precision.&amp;nbsp; In such a situation the Bankers' Rounding algorithm produces less biased results because it does not push half-quantities consistently down or consistently up.&amp;nbsp; It assumes that on average, as many half-quantities will be rounded up as down, and the errors will cancel out.&amp;nbsp; &lt;/SPAN&gt;&lt;/FONT&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;If you don't believe me, try it.&amp;nbsp; Generate a random list of numbers that end in 0.5, round them off, and average them.&amp;nbsp; You'll find that Bankers' Rounding gives you results closer to the real average than "always round down" rounding does.&lt;/SPAN&gt;&lt;/FONT&gt; 
&lt;P&gt;&lt;/P&gt;
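&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;Here is one way the experiment might look in JavaScript. This is a sketch under stated assumptions rather than a definitive test harness: the names roundHalfEven and averageErrors are invented for the example, and the sample values are random integers between 0 and 999 with 0.5 added.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;
```javascript
// Hypothetical helper: nearest integer, exact halves go to the even neighbour.
function roundHalfEven(x) {
  var lower = Math.floor(x);
  if (x - lower !== 0.5) {
    return Math.round(x);
  }
  return lower % 2 === 0 ? lower : lower + 1;
}

// For each rounding policy, report (mean of rounded data) - (mean of raw data).
function averageErrors(values) {
  function mean(list) {
    var total = list.reduce(function (sum, v) { return sum + v; }, 0);
    return total / list.length;
  }
  var trueMean = mean(values);
  return {
    roundDown: mean(values.map(Math.floor)) - trueMean,
    bankers: mean(values.map(roundHalfEven)) - trueMean
  };
}

// 10000 random values that all end in 0.5.
var samples = Array.from({ length: 10000 }, function () {
  return Math.floor(Math.random() * 1000) + 0.5;
});

console.log(averageErrors(samples));
```
&lt;/PRE&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;On a typical run the "always round down" error sits at about -0.5, while the Bankers' Rounding error hovers near zero, because roughly half of the halves go up and half go down.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;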
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;The &lt;/SPAN&gt;&lt;/FONT&gt;&lt;FONT face="Lucida Console" color=#333399 size=2&gt;&lt;SPAN&gt;Round&lt;/SPAN&gt;&lt;/FONT&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;/FONT&gt;&lt;FONT face="Lucida Console" color=#333399 size=2&gt;&lt;SPAN&gt;CInt&lt;/SPAN&gt;&lt;/FONT&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt; and &lt;/SPAN&gt;&lt;/FONT&gt;&lt;FONT face="Lucida Console" color=#333399 size=2&gt;&lt;SPAN&gt;CLng&lt;/SPAN&gt;&lt;/FONT&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt; functions in VBScript all use the Bankers' Rounding algorithm.&amp;nbsp; 
&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;There are two other VBScript functions which turn floats into integers.&amp;nbsp; The &lt;/SPAN&gt;&lt;/FONT&gt;&lt;FONT face="Lucida Console" color=#333399 size=2&gt;&lt;SPAN&gt;Int&lt;/SPAN&gt;&lt;/FONT&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt; function gives you the first integer &lt;I&gt;&lt;SPAN&gt;less than or equal to its input&lt;/SPAN&gt;&lt;/I&gt;, and the &lt;/SPAN&gt;&lt;/FONT&gt;&lt;FONT face="Lucida Console" color=#333399 size=2&gt;&lt;SPAN&gt;Fix&lt;/SPAN&gt;&lt;/FONT&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt; function gives you the first integer &lt;I&gt;&lt;SPAN&gt;closer to zero or equal to its input&lt;/SPAN&gt;&lt;/I&gt;.&amp;nbsp; These functions do not round to the nearest integer at all; they simply discard the fractional part, and they differ only for negative inputs: Int(-1.5) is -2, but Fix(-1.5) is -1.&amp;nbsp; 
&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;
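&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;For comparison, JavaScript has close analogues: Math.floor rounds toward negative infinity, as Int does, and Math.trunc rounds toward zero, as Fix does. The wrapper names vbInt and vbFix below are hypothetical stand-ins for this sketch, not real VBScript bindings, and Math.trunc assumes a reasonably modern engine.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;
```javascript
// Hypothetical stand-ins for VBScript's Int and Fix.
function vbInt(x) {
  return Math.floor(x); // first integer less than or equal to x
}

function vbFix(x) {
  return Math.trunc(x); // drop the fractional part, moving toward zero
}

console.log(vbInt(1.5), vbFix(1.5));   // 1 1
console.log(vbInt(-1.5), vbFix(-1.5)); // -2 -1
```
&lt;/PRE&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;The two agree for positive inputs and disagree only when the input is negative and not already an integer.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;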
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;UPDATE: What about FormatNumber?&amp;nbsp; See &lt;A title=http://blogs.msdn.com/ericlippert/archive/2003/09/26/53112.aspx href="http://blogs.msdn.com/ericlippert/archive/2003/09/26/53112.aspx"&gt;this post&lt;/A&gt;.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=53107" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ericlippert/archive/tags/VBScript/default.aspx">VBScript</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Scripting/default.aspx">Scripting</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Floating+Point+Arithmetic/default.aspx">Floating Point Arithmetic</category></item><item><title>Why does JScript have rounding errors?</title><link>http://blogs.msdn.com/ericlippert/archive/2003/09/15/why-does-jscript-have-rounding-errors.aspx</link><pubDate>Mon, 15 Sep 2003 20:41:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:53000</guid><dc:creator>Eric Lippert</dc:creator><slash:comments>7</slash:comments><comments>http://blogs.msdn.com/ericlippert/comments/53000.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ericlippert/commentrss.aspx?PostID=53000</wfw:commentRss><description>&lt;SPAN&gt;
&lt;DIV&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;Try this in JScript: 
&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Console" color=#333399 size=2&gt;&lt;SPAN&gt;window.alert(9.2 * 100.0); 
&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;You might expect to get 920, but in fact you get 919.9999999999999.&amp;nbsp; What the heck is going on here? 
&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;Boy, have I ever heard this question a lot. 
&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;Well, let me answer that question with another question.&amp;nbsp; Suppose you did a simple division, say 
&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Console" color=#333399 size=2&gt;&lt;SPAN&gt;window.alert(1.0 / 3.0); 
&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;Would you expect an &lt;B&gt;&lt;SPAN&gt;infinitely large window &lt;/SPAN&gt;&lt;/B&gt;that said "0.33333333333..." with &lt;B&gt;&lt;SPAN&gt;an infinite number of threes&lt;/SPAN&gt;&lt;/B&gt;, or would you expect ten or so digits?&amp;nbsp; Clearly you'd expect it to not &lt;B&gt;&lt;SPAN&gt;fill your computer's entire memory with threes&lt;/SPAN&gt;&lt;/B&gt;.&amp;nbsp; But that means that I must ask &lt;B&gt;&lt;SPAN&gt;why are you willing to accept an error of 0.00000000000333333... in the case of dividing one by three but not willing to accept a smaller error of 0.000000000001 in the case of multiplying 9.2 by 100?&lt;/SPAN&gt;&lt;/B&gt;&amp;nbsp; 
&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;The simple fact &lt;B&gt;&lt;SPAN&gt;is that computer arithmetic frequently accumulates tiny errors.&amp;nbsp; &lt;/SPAN&gt;&lt;/B&gt;Any computation whose result cannot be written exactly as a sum of a small number of powers of two will produce a rounding error.&amp;nbsp; 
&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;Let's look at this case a little more closely.&amp;nbsp; We're trying to multiply 9.2 by 100.0.&amp;nbsp; 100.0 can be EXACTLY represented as a floating point number because it's a small integer.&amp;nbsp; But 9.2 can't be -- you can't represent 46 / 5 exactly in base two any more than you can represent 1 / 3 exactly in base ten.&amp;nbsp; So when converting from the string "9.2" to the internal binary representation, a tiny error is accrued.&amp;nbsp; However, it's not all bad -- (a) the 64 bit binary number which represents 9.2 internally is the 64 bit float closest to 9.2, and (b) the algorithm which converts back and forth between strings and binary representation will &lt;B&gt;&lt;SPAN&gt;round-trip&lt;/SPAN&gt;&lt;/B&gt; -- that binary representation will be converted back to 9.2 if you try to convert it to a string. 
&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;
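&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;The round-trip property can be checked directly. The sketch below assumes a modern JavaScript engine (console.log rather than window.alert), and the helper name roundTrips is invented for the example.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;
```javascript
// Hypothetical helper: does number -> string -> number give back the same float?
function roundTrips(x) {
  return parseFloat(String(x)) === x;
}

var x = 9.2;
console.log(String(x));     // 9.2
console.log(roundTrips(x)); // true

// Asking for more digits than the shortest round-trip string
// reveals that the stored value is not exactly 9.2.
console.log(x.toFixed(20));
```
&lt;/PRE&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;The string conversion prints the shortest text that converts back to the same float, which is why the tiny stored error in 9.2 stays hidden until you do arithmetic with it.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;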
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;But now we go and throw a wrench in the works by multiplying by one hundred.&amp;nbsp; &lt;B&gt;&lt;SPAN&gt;That's going to lose the last few bits of precision because we just multiplied the &lt;I&gt;&lt;SPAN&gt;accrued error&lt;/SPAN&gt;&lt;/I&gt; by a factor of one hundred&lt;/SPAN&gt;&lt;/B&gt;.&amp;nbsp; The accrued error is now large enough that the string which most exactly represents the computed value is NOT "920" but rather "919.9999999999999".&amp;nbsp; 
&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;
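&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;To see the accrued error itself, and two common ways of tidying it up for display, something like the following can be tried. It is a sketch assuming a modern JavaScript engine; Math.round and toFixed are standard, but whether rounding is appropriate depends on what the number means in your program.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;
```javascript
var product = 9.2 * 100.0;

console.log(product);         // 919.9999999999999
console.log(product === 920); // false
console.log(920 - product);   // the accrued error, a little over 1e-13

// Two ways to present the value a human expects:
console.log(Math.round(product)); // 920
console.log(product.toFixed(2));  // 920.00
```
&lt;/PRE&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;Rounding for display changes only the string; the underlying float is still the one just below 920.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;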
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;The ECMAScript specification mandates that JScript display &lt;B&gt;&lt;SPAN&gt;as much precision as possible&lt;/SPAN&gt;&lt;/B&gt; when displaying floating point numbers as strings.&amp;nbsp; 
&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;You may note that VBScript does not do this -- VBScript has heuristics which look for this situation and deliberately break the round-trip property in order to make the output look better.&amp;nbsp; In JScript you are guaranteed that &lt;B&gt;&lt;SPAN&gt;when you convert back and forth between string and binary representations you lose no data&lt;/SPAN&gt;&lt;/B&gt;.&amp;nbsp; In VBScript you sometimes lose data; there are some legal floating point values in VBScript which are impossible to represent in strings accurately. In VBScript a value like 919.9999999999999 cannot be converted to a string exactly, because it is automatically rounded! 
&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;Ultimately, the reason this is an issue is that we as human beings see numbers like "920" as SPECIAL.&amp;nbsp; If you multiply 3.63874692874 by 4.2984769284 and get a result which is one-billionth of one percent off, no one cares, but when you multiply 9.2 by 100.0 and get a result which is one-billionth of one percent off, everyone yells (at me!).&amp;nbsp; &lt;B&gt;&lt;SPAN&gt;The computer doesn't know that 9.2 is more special than 3.63874692874 -- it uses the same lossy algorithms for both. 
&lt;P&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/B&gt;&lt;/SPAN&gt;&lt;/FONT&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;FONT face="Lucida Sans Unicode" color=purple size=2&gt;&lt;SPAN&gt;All languages which use floating point arithmetic have this feature -- C++, VBScript, JScript, whatever.&amp;nbsp; If you don't like it, either get in the habit of calling rounding functions, or don't use floating point arithmetic at all and use only integer arithmetic.&amp;nbsp; (Note that VBScript supports a "currency" type which is a fixed-point number not subject to rounding errors.&amp;nbsp; It can only represent four places after the decimal point, though.) &lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/DIV&gt;&lt;/SPAN&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=53000" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ericlippert/archive/tags/JScript/default.aspx">JScript</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/VBScript/default.aspx">VBScript</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Scripting/default.aspx">Scripting</category><category domain="http://blogs.msdn.com/ericlippert/archive/tags/Floating+Point+Arithmetic/default.aspx">Floating Point Arithmetic</category></item></channel></rss>