Fabulous Adventures In Coding
Eric Lippert is a principal developer on the C# compiler team.
A month ago I was discussing some of the odd results that floating point arithmetic can produce.
Before I talk about some of the things that can go terribly wrong with floating point arithmetic, it's helpful (and character building) to understand how exactly a floating point number is represented internally.
To distinguish between decimal and binary numbers, I'm going to do all binary numbers in a fixed-width font.
Here's how floating point numbers work. A float is 64 bits. Of that, one bit represents the sign: 0 for positive, 1 for negative.
Eleven bits represent the exponent. To determine the exponent value, treat the exponent field as an eleven-bit unsigned integer, then subtract 1023. However, note that the exponent fields consisting of all zeros and all ones are reserved for special purposes, so the exponent of an ordinary float ranges from -1022 to 1023.
The remaining 52 bits represent the mantissa.
To compute the value of a float, here's what you do. You take the mantissa, and you stick a "1." onto the front of it. Then you treat the result as a binary fraction, multiply it by two raised to the exponent, and negate it if the sign bit is set.
So for example, the number -5.5 is represented like this: (sign, exponent, mantissa)
(1, 10000000001, 0110000000000000000000000000000000000000000000000000)
The sign is 1 because the number is negative. The exponent field is 10000000001, which is 1025, so the exponent is 1025 - 1023 = 2. Sticking a "1." onto the front of the mantissa gives 1.0110, and 1.0110 x 2^2 = 101.1, which is binary for 5.5.
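That decoding can be checked mechanically. Here's a sketch in Python (my own illustration, not part of the original post; the helper name `fields` is mine):

```python
import struct

def fields(x):
    """Unpack an IEEE 754 double into its (sign, exponent field, mantissa) integers."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))  # raw 64-bit pattern
    sign = bits >> 63                     # one sign bit
    exponent = (bits >> 52) & 0x7FF       # eleven-bit exponent field
    mantissa = bits & ((1 << 52) - 1)     # fifty-two-bit mantissa
    return sign, exponent, mantissa

s, e, m = fields(-5.5)
print(s)                       # 1: the number is negative
print(e - 1023)                # 2: unbiased exponent
print(format(m, "052b")[:6])   # 011000: the mantissa starts 0110...
```

Note that the leading "1." never appears in the unpacked mantissa bits; it's implied.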
This system is nice because it means that every number in the range of a float has a unique representation, and therefore doesn't waste bits on duplicates.
However, you might be wondering how zero is represented, since every bit pattern has an implied leading 1, and therefore the rules above can never produce zero.
If the exponent field is all zeros then the rules change: there is no implied leading 1, and the exponent is taken to be -1022. These are the "denormalized" floats, and zero is simply the denormalized float whose mantissa is also all zeros.
So the biggest and smallest positive normalized floats are
(0, 11111111110, 1111111111111111111111111111111111111111111111111111)
(0, 00000000001, 0000000000000000000000000000000000000000000000000000)
The smallest and biggest positive denormalized floats are
(0, 00000000000, 0000000000000000000000000000000000000000000000000001)
(0, 00000000000, 1111111111111111111111111111111111111111111111111111)
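As a sketch (Python, not from the original post), these extremes can be reconstructed from the rules above and compared against what the runtime reports:

```python
import sys

# Biggest normalized: mantissa all ones, exponent field 11111111110 (unbiased 1023).
biggest_normal = (2 - 2**-52) * 2.0**1023
assert biggest_normal == sys.float_info.max

# Smallest normalized: mantissa all zeros, exponent field 00000000001 (unbiased -1022).
smallest_normal = 2.0**-1022
assert smallest_normal == sys.float_info.min

# Smallest denormalized: only the last mantissa bit set, so 2**-52 x 2**-1022.
smallest_denormal = 2.0**-1074

# Biggest denormalized: mantissa all ones, one step below the smallest normal.
biggest_denormal = smallest_normal - smallest_denormal

print(smallest_denormal)  # 5e-324
```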
Next time: floating point math is nothing like real number math.
Mat Hall said: "....n a PDP-10..".
Do you have a PDP-10 to prove that?
If you are interested in old DEC hardware [I have a PDP-8 a few PDP-11's and Vaxen], please drop me a note: email@example.com
Eric - forgive me, but can you explain how you "compute that value as a 53 bit fraction with 52 fractional places"? Thanks.
Because there are 52 "actual" bits plus one "implied" bit at the beginning. Go back and look at the example:
the stored mantissa 011000... is read as 1.011000...
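A sketch of that 53-bit reading in Python (my own illustration, not from the thread): prepend the implied bit to the 52 stored bits to get a 53-bit integer, then divide by 2^52 to place the point after the first bit.

```python
import struct

# Rebuild 5.5 from its fields: the 53-bit fraction with 52 fractional places.
(bits,) = struct.unpack(">Q", struct.pack(">d", 5.5))
mantissa = bits & ((1 << 52) - 1)
exponent = ((bits >> 52) & 0x7FF) - 1023
value = (2**52 + mantissa) / 2**52 * 2.0**exponent  # 1.0110... x 2^2
print(value)  # 5.5
```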
What is the hardware difference between decimal floating point arithmetic (DFA) and binary floating point arithmetic (BFA)? In actual hardware, decimal numbers are also stored as 1s and 0s, so how does DFA differ from BFA, and what makes it more helpful?
Can anyone tell me what the output is when the input is a qNaN or an sNaN?
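The post doesn't cover NaNs, but as a hedged sketch: an all-ones exponent field with a nonzero mantissa encodes a NaN (a zero mantissa would be infinity), and arithmetic on a NaN of either kind typically produces a quiet NaN. In Python (illustration only, not from the original post):

```python
import math
import struct

def fields(x):
    """Split an IEEE 754 double into (sign, exponent field, mantissa)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

sign, exponent, mantissa = fields(float("nan"))
print(exponent == 0x7FF)   # True: exponent field is all ones
print(mantissa != 0)       # True: nonzero mantissa means NaN, not infinity
print(math.isnan(float("nan") + 1.0))  # True: arithmetic on a NaN yields a NaN
```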