All of this stuff about high-dimensional geometry has been known for quite a while. The novel bit that this paper is all about is how to actually build a fast index for searching a high-dimensional space where the query has a strong likelihood of being

Brad Dodson

This is an interesting idea. Personally, I've written a nearest-neighbor search system based on an approximate k-d tree search called "Best Bin First", from a paper by David Lowe. It'd be interesting to know how they compare. It may actually be worthwhile for me to implement this algorithm (or perhaps LSH) for the system I was working on (it compared scale-invariant features in images), since junk queries were definitely a problem for me: once the nearest neighbor is far enough away, I don't really care anymore, even though those are the queries that take the longest most of the time.
Thanks for the great article!
http://www.princeton.edu/artofscience/gallery/view.php?id=40.html
Eric W. Bachtal

Thanks for the series. It was mostly a head-scratching exercise for me, but enjoyable nonetheless. In the spirit of searching millions of data points, I thought you might enjoy this:

"This graphic comes from a dynamic asset allocation problem in railroads. The system, which is now in production at Norfolk Southern railroad, is the first production implementation of a stochastic, dynamic programming model in freight transportation. The model is based on the Ph.D. dissertation of Huseyin Topaloglu (now a professor at Cornell University) for stochastic, integer multicommodity flow problems. The original research was modified to handle multidimensional attribute problems, with millions of asset types."

foxyshadis

Scripting question: When does JScript get hyperdimensional math language constructs? ;)