Welcome to MSDN Blogs Sign in | Join | Help

Microsoft Excel

The team blog for Microsoft Excel and Excel Services.
Function Improvements in Excel 2010

Thanks to Jessica Liu for putting together the next few posts on function improvements.

In Excel 2010, we made many improvements to Excel's function library. Excel 2010 will feature an accurate and consistent function library while remaining compatible with previous versions of Excel. In this first blog post, I will be giving an overview of the work we did in this area as well as talk about the function accuracy improvements. Subsequent posts will go into the details of the consistency improvements as well as the backward compatibility story.

The first area we invested in was to improve the accuracy of functions. Over the years there have been various academic papers detailing issues in Excel's worksheet functions. In Excel 2003, we started the work to address the most serious of the issues reported in these papers and in Excel 2010 we have addressed even more of these issues. Our goal for Excel 2010 was to address the most significant function accuracy issues reported. For any function we modified, we corrected all known bugs relating to that function.

We implemented new algorithms in order to improve the accuracy of our statistical, financial and math functions. We worked very closely with industry experts to determine which algorithms to use as well as to validate these new algorithms. Our hope is that Excel 2010 users will be able to utilize functions in our library with confidence knowing that they have comparable accuracy to those of other statistical packages.

The other area we invested in was making our function library more consistent. This was in response to the other set of concerns voiced in these academic papers as well as by our users. Users have noted that there were consistency issues with Excel's function names and definitions. In Excel 2010, we will offer users a set of consistently and accurately named functions as well as function definitions that are consistent with user expectations. We have introduced over 50 new functions in order to do this.

Finally, the last piece of work we did in this area was to update the functions user interface. We have improved the function auto complete feature, and we have also made changes to support the new function set.

Improved Function Accuracy

For Excel 2010, we overhauled the function library and implemented completely new algorithms for many of our statistical, financial and math functions. The Excel team partnered with Frontline Systems, the Numerical Algorithms Group, and ScienceOps to select, implement and validate these algorithms.

The algorithms for calculating the follow statistical distribution functions have been modified or redesigned completely for better accuracy:

Binomial distribution

BINOMDIST, CRITBINOM

Chi squared distribution

CHIDIST, CHIINV

Exponential distribution

EXPONDIST

F distribution

FDIST, FINV

Gamma distribution

GAMMADIST, GAMMAINV

Hypergeometric distribution

HYPGEOMDIST

Lognormal distribution

LOGNORMDIST, LOGINV

Negative Binomial distribution

NEGBINOMDIST

Normal distribution

NORMDIST, NORMINV

Standard Normal distribution

NORMSDIST, NORMSINV

Poisson distribution

POISSON

Student's t distribution

TDIST,TINV

Weibull distribution

WEIBULL

The following financial functions have improved accuracy:

Cumulative interest paid on a loan

CUMIPMT

Cumulative principal paid on a loan

CUMPRINC

Interest payment for an investment

IPMT

Internal rate of return for a series of cash flows

IRR

Payment for a loan

PMT

Payment on principal for an investment

PPMT

Internal rate of return for a schedule of cash flows

XIRR

The accuracy of these additional functions has been improved:

Hyperbolic arcsine

ASINH

Ceiling function

CEILING

Convert function

CONVERT

Error function

ERF

Complementary error function

ERFC

Floor function

FLOOR

Natural logarithm of the gamma function

GAMMALN

Geometric mean

GEOMEAN

MOD function

MOD

Random number function

RAND

Sample standard deviation

STDEVS

Sample variation

VARS

As part of the accuracy improvements, we will also accept a larger range of input values and as a result will be returning a wider range of results for certain functions. For example, the ERF and ERFC functions will now take in negative input values, and the MOD function will be able to take larger input values.

In the next post, I will talk about the changes we have made in Excel 2010 to improve the consistency of the function library.

Posted: Thursday, September 10, 2009 9:00 AM by Joseph Chirilov

Comments

lhm said:

Good to see improvements in stats functions, IRR and MOD finally addressed! LINEST also had some problems, I don't know if these were already fixed. It would also be nice to have more functionality for linear and non-linear estimation/interpolation.

Another function that needs attention is PROB. The function can error on a uniform distibution of just 27 values for example. The checksum fails due to the 15-digit precision tolerance as can be checked since =SUM(ROW(1:27)*0+1/27)=1 returns false (array-entered). This function can be very useful for stats on frequency data, something that's not really feasible without UDFs otherwise. eg for ascending data (x) and frequencies (f), LOOKUP(0.5,PROB(x,f/SUM(f),,x),x) returns the (lower) median.

# September 10, 2009 2:02 PM

anon said:

Since you have added over 50 functions whose name is very close to the existing ones (GAMMA.DIST() instead of GAMMADIST()), that will result in a mess with the user unable to choose wisely.

What I would have seen instead is a flag in Excel options (per workbook, or for the application), off by default, where the user can require better accuracy.

That would avoid both breaking changes and a useless row of new functions.

I'd have rather see the Excel functions team work on adding guts to some of the most powerful functions.

I'm curious what is your say on this.

# September 11, 2009 2:17 AM

Colin Banfield said:

I'm interested to know how the RAND function has been improved (repetition period, cryptographically secure or not). In Excel 2003, I found it baffling that the old Wichman-Hill algorithm was used - an algorithm so old that it had to be converted from Fortran. The Mersenne twister algorithm, for example, has been freely available in multiple languages for quite some time, and is vastly superior to Wichman-Hill.  

# September 11, 2009 3:00 AM

Colin Banfield said:

"The Mersenne twister algorithm, for example, has been freely available in multiple languages for quite some time, and is vastly superior to Wichman-Hill."

Not to mention Microsoft's own CryptGenRandom function.

# September 11, 2009 3:24 AM

Chris said:

I would love to see the rand command within VBA given a facelift. It's the same old numbers, time after time.

# September 11, 2009 6:05 AM

Harlan Grove said:

With respect to the inverse probability distribution functions, the main use would be simulation, no? If so, the formulas calling these functions would change upon every recalc. I'm not sure I see a compelling need for backward compatibility in terms of function return values/[in]accuracy.

Nice to see MOD and GAMMALN on the list. Presumably this means MOD is now calling the hardware FPU remainder operation, so can now handle full IEEE 754 double precision. Also, presumably GAMMALN(1) and GAMMALN(2) both now correctly return 0.

The fix to GEOMEAN presumably means it's now implemented as =EXP(AVERAGE(LN(data))) rather than as =PRODUCT(data)^(1/COUNT(data)).

# September 11, 2009 12:41 PM

Rafael Nicolas Fermin Cota said:

You should take a look at all the stats, matrix, and optimization functions I created for Excel & Access.

Nico

# September 12, 2009 7:19 PM

Simon Murphy said:

Anon

I prefer the additional function route rather the workbook flag approach you prefer.

I think few people check that sort of setting, so its better to be explicit in the face of the worksheet.

I agree it adds complexity, but that is the trade off for power and flexibility.

Having a worksheet/book flag would lead to lots of head scratching when two apparently identical functions in 2 worksheets/books give inconsistent results.

# September 14, 2009 5:49 AM

Mike Staunton said:

Good news, but long overdue and, without knowing which improved algorithms have been used, it's impossible to be really happy - for instance, the best application for NormSInv that I know is from Feb 2009

And will the last of the former ATP functions, that for Fourier transforms, finally get a proper built-in function?

# September 14, 2009 6:10 AM

Jessica Liu said:

anon & Simon: I will be describing the UI changes we have made to help users distinguish and choose between functions that are similarly named in a future blog post. So stay tuned for that!

Colin: We are using the Mersenne Twister for RAND.

# September 14, 2009 1:49 PM

Colin Banfield said:

Jessica, that's good news. Would there be a similar improvement in the VBA Rnd function?

# September 14, 2009 2:28 PM

Jessica Liu said:

Colin: We did not change VBA's RND function since that is part of VBA's math library. All of the other worksheet functions we've added are available through VBA.

# September 16, 2009 12:42 PM

Colin Banfield said:

Well, at least we can use the updated RAND function in VBA through [RAND()]

# September 16, 2009 1:28 PM

Jason said:

I think it would be great if you could add a function to do a simple linear interpolation, such as in the above link.  

# September 17, 2009 3:46 PM

Doug Jenkins said:

"We did not change VBA's RND function since that is part of VBA's math library. All of the other worksheet functions we've added are available through VBA."

Having the worksheet functions available to VBA is no substitute for implementing native VBA functions to the same standard.

Well not unless you have also provided a dramatic reduction in the time penalty of using a worksheetfunction call from VBA.

# September 30, 2009 2:32 AM

AdamV said:

Really please to see these improvements, and the whitepaper does add some great detail.

http://wp.me/p2I5L-44

# October 9, 2009 5:16 AM

joeu2004 said:

[Forgive me if this is another duplicate.  It will be my last try for now.]

Has the following Excel 2003 problem been fixed in 2010 (or 2007)?

INT(123456789 - 0.0000004) returns 123456789 instead of 123456788.

This causes a problem in formulas like the following:  if A1 is =123456789-0.0000004, =A1-INT(A1) is negative unexpectedly, about -4.917E-07 when formatted as General.

In contrast, INT(123456789 - 0.0000005) returns 123456788 as expected, and myInt(123456789 - 0.0000004) returns 123456788, where myInt() is the following UDF:

Function myInt(x as Double) as Double

myInt = Int(x)

End Function

Note that 123456789 - 0.0000004 is represented internally as about 123456788.999999,598, whereas 123456789 - 0.0000005 is about 123456788.999999,493 internally.  (The comma demarcates the first 15 significant digits.)

So I suspect that the Excel INT algorithm is effectively, albeit perhaps unintentionally, rounding its argument to 15 significant digits before truncating to an integer.  It shouldn't.

Indeed, the largest expression involving 123456789 that returns an incorrect INT value is 123456789 - 33*2^-26, which is represented internally as about 123456788.999999,508, whereas 123456789 - 34*2^-26 is about 123456789.999999,493 internally.

As you might imagine, the problem is not limited to 123456789 and 0.0000004.  And the problem will not appear with some combinations that you might think are similar, e.g. 100000000 and 0.0000004.

You need to look at the exact conversion of the internal binary representation -- that is, beyond the first 15 significant digits -- to determine whether or not to expect a problem.  I have VBA code that will help with that, if you need it.

# October 10, 2009 12:21 AM
New Comments to this post are disabled
Page view tracker