Designing a new general-purpose language

Another motivating example for load-time configuration

I was thinking the other day about certain computationally intensive numerical problems, particularly matrix multiplication, finding the eigenvalues of a matrix, and the Fast Fourier Transform, and how choosing the right algorithm depends on such intricasies as multiprocessor architecture, cache performance, etc.  Actually, "choosing the right algorithm" is more a more difficult task than it sounds, as I will explain.  For each of these problems, there exists a divide and conquer algorithm with an asymptotically better performance than the straightforward approach.  However, the divide and conquer algorithm performs worse for problem sizes below a certain threshold.  Therefore, it is common practice to use the divide and conquer algorithm down to a certain size, and then switch to the straightforward algorithm.  To give a more familiar example for those who haven't done much numerical programming, think of quicksort.  Usually it makes sense to use quicksort until the problem size is small enough and then use bubble sort.

There are important differences between quicksort and the numerical problems I mentioned above:

1. These numerical problems are highly parallelizable, making multiprocessor architecture particularly relevant in choosing the right algorithm.

2. They generally deal with huge amounts of data, making cache performance (and those other variables that affect it, such as cache size, available memory, memory latency, etc.) relevant.

3. They are very computationally intensive, making even small improvements in performance welcome.

Let's pick one of these problems and look at it in more detail: matrix multiplication.  The straightforward algorithm for matrix multiplication multiplies two n×n matrices in O(n3) time.  There is another algorithm, the Strassen algorithm, that does it in O(n2.807) time, but for small matrices it is generally much slower (Strassen's algorithm is also less numerically stable than the straightforward algorithm, but sometimes this is acceptable, and for some classes of matrices, it may be known in advance that this numerical instability has a much lesser effect than the general case).  Strassen's algorithm is divide and conquer, so it is possible to use it down to a certain size and then switch over.  What is that size?  This is highly dependent on the factors mentioned above.  Generally, to get the best possible performance, it will be necessary to do some profiling and analyze the results to find the optimum cutoff size.

There are a few possibilities of how to get the data from profiling into the program or library.  In languages where it is feasible to do so, you could generate some code (probably something pretty simple) that captures this data, and include it into your program.  For example, in C++, you could generate an include file containing:

#define STRASSEN_CUTOFF 10000

and whatever other code is necessary to make it work.  The problem with this approach is that if you wish to port the solution to another computer with the same architecture but a different hardware configuration, or even upgrade the same computer, for example adding more memory, you need to recompile the library to get the best performance.

Another possibility is to get the cutoff value from a configuration file, which is read at run time.  The problem with this is that you lose some potential for certain optimizations.  For example, there may be optimizations possible for the straightforward algorithm that depend on knowing the maximum size of the input.

With load-time configuration information, combined with just-in-time compiling, you would get the best of both worlds.  For example, the provider of the libarary could provide a large number of configuration files that are optimized for certain hardware configurations, and select one with a tool based on the particular machine's configuration, or it could even provide a tool that generates the configuration file from scratch.  To get these benefits, however, I believe you really do need the language to provide such a configuration feature natively.

Published Thursday, May 04, 2006 10:37 PM by hartwil

Comments

No Comments
Anonymous comments are disabled

© 2009 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker