CS Colloquium, April 3, 3:30pm.
Computer Science Dept Room 105 (Small Auditorium).
High-Dimensional Computational Geometry
Piotr Indyk
Stanford University
Computing with massive and high-dimensional data is critical to a
large and diverse set of applications, including multimedia and hyperlinked databases (the
World Wide Web being the prime example), data mining, machine learning, computational
statistics, and vector quantization/ compression. Improving performance in the above
applications has been an important research goal in a variety of fields, including
Computational Geometry and Databases. Unfortunately, the running times of the algorithms
discovered so far depend exponentially on the dimension, which usually makes them
inefficient in the aforementioned applications. In my talk, I will show that it is possible to overcome this ``curse of dimensionality'' and obtain algorithms that are efficient in theory and practice, as long as one is satisfied with approximate answers. The algorithms employ novel techniques which are very different from the ones used in standard (low-dimensional) Computational Geometry. In particular, I will describe/address the following: - approximate nearest-neighbor algorithms for normed spaces with running time *polynomial* in dimension and logarithmic or sublinear in the data size - algorithmic reductions to the above problem from a large set of Computational Geometry problems, including closest pair, diameter, minimum spanning tree, facility location and other forms of clustering - extension of the above algorithms to spaces which are not normed via embeddings - efficient algorithms in general metric spaces, and - applications of the above techniques to problems outside of Computational Geometry As a ``proof-of-concept'', I will present initial results of an ongoing project on clustering the Web. |