piotr indyk talk

CS Colloquium, April 3, 3:30pm.

Computer Science Dept Room 105 (Small Auditorium).

High-Dimensional Computational Geometry

Piotr Indyk

Stanford University

Computing with massive and high-dimensional data is critical to a large and diverse set of applications, including multimedia and hyperlinked databases (the World Wide Web being the prime example), data mining, machine learning, computational statistics, and vector quantization/ compression. Improving performance in the above applications has been an important research goal in a variety of fields, including Computational Geometry and Databases. Unfortunately, the running times of the algorithms discovered so far depend exponentially on the dimension, which usually makes them inefficient in the aforementioned applications.

In my talk, I will show that it is possible to overcome this ``curse of dimensionality'' and obtain algorithms that are efficient in theory and practice, as long as one is satisfied with approximate answers. The algorithms employ novel techniques which are very different from the ones used in standard (low-dimensional) Computational Geometry.

In particular, I will describe/address the following:

- approximate nearest-neighbor algorithms for normed spaces with running time *polynomial* in dimension and logarithmic or sublinear in the data size

- algorithmic reductions to the above problem from a large set of Computational Geometry problems, including closest pair, diameter, minimum spanning tree, facility location and other forms of clustering

- extension of the above algorithms to spaces which are not normed via embeddings

- efficient algorithms in general metric spaces, and

- applications of the above techniques to problems outside of Computational Geometry

As a ``proof-of-concept'', I will present initial results of an ongoing project on clustering the Web.