Navigating nets: Simple algorithms for proximity search
Robi Krauthgamer, IBM Almaden
Nearest-neighbor search is the following problem: Given a set $S$ of $n$ points in a metric space $X$, preprocess $S$ so as to efficiently find the point in $S$ that is closest to a query point $q \in X$. Recently there has been significant interest in the case of general metrics, because in many practical applications the underlying metric is not Euclidean (or any other $l_p$-metric).
We devise for this problem a simple and efficient dynamic data structure (i.e., one supporting insertions and deletions in $S$) that answers $(1+\epsilon)$-approximate nearest neighbor queries for any $\epsilon > 0$. Our algorithms are deterministic and provably correct, with no assumptions on the metric. Their time and space complexity depends on the dimensionality of the data set $S$, which we measure using a very natural notion borrowed from functional analysis. For example, if $S$ has constant dimension, the above operations take logarithmic time (for fixed $\epsilon > 0$). Our data structure outperforms the metric skip list of Karger and Ruhl [STOC, 2002] in several aspects and is conceptually simpler.
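As a point of reference for what the data structure improves upon, the query operation can always be realized by a brute-force linear scan, which is correct in any metric but costs $O(n)$ distance evaluations per query. A minimal sketch (not the navigating-nets structure itself), assuming only a user-supplied distance function:

```python
# Brute-force exact nearest-neighbor query in a general metric space.
# dist may be any metric; no Euclidean structure is assumed.
def nearest_neighbor(S, q, dist):
    """Return the point of S closest to q under dist (O(|S|) per query)."""
    return min(S, key=lambda p: dist(p, q))

# Example with the l_1 (Manhattan) metric on tuples.
def l1(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

S = [(0, 0), (3, 4), (1, 1), (5, 2)]
print(nearest_neighbor(S, (2, 2), l1))  # -> (1, 1)
```

The data structure from the talk replaces this linear scan by a hierarchy of nets, bringing the per-query cost down to logarithmic for constant-dimensional $S$.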
[Joint work with James R. Lee.]
|
Measured descent: New embeddings for finite metrics
James Lee, UC Berkeley
We give a new proof of Bourgain's theorem based on metric decomposition and a technique we call "measured descent." This provides a unified framework for the two primary methods of constructing Frechet-type embeddings, those based on the work of Bourgain and of Rao. Our technique yields some interesting refinements of Bourgain's theorem in terms of important parameters of the metric space; for instance, we show that every n-point metric space embeds into Euclidean space with distortion O(sqrt{dim(X) log n}), where dim(X) is the "metric dimension" of X (which is always at most log n).
We use this technique to answer a number of open questions. For instance, our embeddings are volume-respecting for subsets of arbitrary size; this answers positively an open question of Feige. Another application achieves the optimal dimension necessary to embed planar metrics into l_infty, answering a question of Rabinovich.
[This is joint work with R. Krauthgamer, M. Mendel, and A. Naor.]
|
On Conflict-Free Colorings
Shakhar Smorodinsky, MSRI
Motivated by frequency-assignment problems in cellular networks, we study coloring problems of the following flavor: What is the minimum number $f(n)$ such that one can assign colors to any set $P$ of $n$ points in the plane, using a total of at most $f(n)$ colors, so that the coloring has the following property (which we refer to as Conflict-Free coloring): for any disc $d$ in the plane with nonempty intersection with $P$, there is at least one point of $P$ inside $d$ whose color is unique among the points inside $d$.
We show that $f(n) = O(\log n)$. This is asymptotically tight in the worst case.
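As a simpler illustration of the definition (this is the one-dimensional analogue with intervals as ranges, not the planar disc algorithm from the talk), coloring each point by its depth in a balanced binary search tree over the sorted order uses $O(\log n)$ colors, and every interval then contains a unique point of minimum color:

```python
# Conflict-free coloring of points on a line with respect to intervals:
# color each point by its depth in a balanced BST over the sorted order.
def cf_color_line(points):
    order = sorted(points)
    color = {}

    def assign(lo, hi, depth):
        if lo > hi:
            return
        mid = (lo + hi) // 2
        color[order[mid]] = depth  # the median gets the current depth
        assign(lo, mid - 1, depth + 1)
        assign(mid + 1, hi, depth + 1)

    assign(0, len(order) - 1, 0)
    return color

pts = [3, 1, 4, 1.5, 9, 2.6, 5]
coloring = cf_color_line(pts)
# Check the conflict-free property on every contiguous run of sorted points:
srt = sorted(pts)
for i in range(len(srt)):
    for j in range(i, len(srt)):
        colors = [coloring[p] for p in srt[i:j + 1]]
        assert colors.count(min(colors)) == 1
print(max(coloring.values()) + 1)  # number of colors used -> 3
```

Uniqueness of the minimum color holds because two points of equal depth in the tree have a shallower common ancestor lying between them in the sorted order, hence inside the same interval.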
We extend this result to many other classes of ranges (other than discs), and also to the dual type of problem, where we want to color a given set of ranges so that for each point $p$ there is a range with a unique color among the ranges that contain $p$.
Some parts of this talk are joint work with Guy Even, Zvika Lotker, and Dana Ron (Tel-Aviv University), and other parts are joint work with Sariel Har-Peled (UIUC).
|
A Brief History of Generative Models for Power Law and Lognormal Distributions, and An Application to File Size Distributions
Michael Mitzenmacher, Harvard
We examine the long history of power law and lognormal distributions, showing that although these distributions are relatively new to the computer science community, they have a long and interesting history in other communities. We also describe how a new generative model from economics leads us to a new model for file sizes, explaining why they might have a power law distribution.
|
Scheduling High Speed Data in Wireless Networks
Matthew Andrews, Lucent
Wireless networks for transmitting high speed data are becoming increasingly common. Such networks lead to new and interesting scheduling problems, in large part because the quality of a wireless channel constantly changes over time. It is important to schedule in an opportunistic fashion, i.e., we want to transmit data between two users during the times when the associated wireless channel has good quality.
A number of models have been proposed for studying such systems. These differ according to the assumptions made on the arrival process, the assumptions made on the channel conditions, and the metrics that are to be optimized. In this talk we shall survey some of these models and present scheduling results that arise in each model. We shall also describe how these models relate to wireless data networks that are in use today.
|
LP Decoding Corrects a Constant Fraction of Errors
Jon Feldman, Columbia University
We show that using low-density parity-check (LDPC) codes with sufficient expansion, the ``Linear Programming (LP) Decoder'' of Feldman, Karger, and Wainwright (Allerton, 2003) corrects a constant fraction of errors in the channel. A random code will have sufficient expansion with high probability, and recent work by Capalbo et al. (FOCS, 2003) shows that such codes can be constructed efficiently.
This result puts LP decoding into the small group of decoding methods known to correct a constant fraction of errors in polynomial time. (Forney's celebrated GMD algorithm, used along with concatenated codes, was the first member of this group.)
We prove our bounds by demonstrating a zero-valued dual solution to the decoding LP, under the condition that fewer than $\alpha n$ errors occur among the $n$ transmitted bits. The existence of such a solution implies that the LP succeeds in recovering the original codeword.
[This is joint work with Tal Malkin, Cliff Stein, Rocco Servedio, and Martin Wainwright.]
|
Opt vs. Load in Dynamic Storage Allocation
Howard Karloff, AT&T
Dynamic Storage Allocation is the problem of packing given axis-aligned rectangles into a horizontal strip of minimum height by sliding the rectangles vertically but not horizontally. Stated differently, Dynamic Storage Allocation is either the problem of allocating memory space for arriving and departing arrays, or SONET channel assignment.
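The "load" of the title is the natural lower bound on the optimum: at any horizontal position, every rectangle spanning it must be stacked vertically, so the optimal strip height is at least the maximum total height over all positions. A small sketch computing this quantity by a sweep, assuming rectangles are given as (x_start, x_end, height) triples:

```python
def load(rects):
    """Max over x of the total height of rectangles whose [x_start, x_end)
    interval contains x -- a lower bound on the optimal strip height."""
    events = []
    for x0, x1, h in rects:
        events.append((x0, h))   # rectangle starts: its height joins the stack
        events.append((x1, -h))  # rectangle ends: its height leaves the stack
    events.sort()                # at equal x, ends (-h) sort before starts (+h)
    best = cur = 0
    for _, dh in events:
        cur += dh
        best = max(best, cur)
    return best

# Three requests; the first two overlap on [2, 4), stacking heights 2 + 3.
print(load([(0, 4, 2), (2, 6, 3), (5, 8, 1)]))  # -> 5
```

The gap between this lower bound and the optimum is exactly the "Opt vs. Load" question of the title.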
I will present some new approximation algorithms for
Dynamic Storage Allocation and pose an intriguing open question.
This is joint work with Adam Buchsbaum, Claire Kenyon, Nick Reingold, and Mikkel Thorup.
|
Low Distortion Maps between Point Sets
Yuval Rabani, Technion and Cornell University
We initiate the study of the {\em minimum distortion} problem:
given as input two $n$-point metric spaces, find a bijection between
them with minimum distortion. This is an abstraction of certain
geometric problems in shape and image matching, and is also a
natural variation and extension of the fundamental problems of graph
isomorphism and bandwidth. Our focus is on algorithms that find an
optimal (or near-optimal) bijection when the distortion is fairly
small. We present a polynomial time algorithm that finds an optimal
bijection between two line metrics, provided the distortion is at
most $2+\sqrt{5}$. We also give a parameterized polynomial time
algorithm that finds an optimal bijection between an arbitrary
unweighted graph metric and a bounded-degree tree metric.
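For concreteness, the distortion of a bijection between two finite metrics is the product of its worst-case expansion and worst-case contraction, and the problem asks to minimize this over all bijections. A brute-force sketch (exponential in $n$, so tiny inputs only; the metrics are assumed given as distance matrices):

```python
from itertools import permutations

def distortion(d1, d2, perm):
    """Distortion of the bijection i -> perm[i] between two n-point
    metrics given as distance matrices: (max expansion) * (max contraction)."""
    expansion = contraction = 0.0
    n = len(d1)
    for i in range(n):
        for j in range(i + 1, n):
            r = d2[perm[i]][perm[j]] / d1[i][j]
            expansion = max(expansion, r)
            contraction = max(contraction, 1.0 / r)
    return expansion * contraction

def min_distortion(d1, d2):
    """Exhaustive search over all n! bijections -- tiny n only."""
    return min(distortion(d1, d2, p) for p in permutations(range(len(d1))))

# Two 3-point line metrics, the second a scaling of the first by 2,
# so some bijection (here the identity) achieves distortion exactly 1.
A = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
B = [[0, 2, 6], [2, 0, 4], [6, 4, 0]]
print(min_distortion(A, B))  # -> 1.0
```

The algorithms in the talk avoid this factorial search whenever the optimal distortion is small.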
This is joint work with Claire Kenyon and Alistair Sinclair.
|
The Hardness of Metric Labeling
Julia Chuzhoy, Technion
The Metric Labeling problem is an elegant and powerful
mathematical model capturing a wide range of classification
problems. The input to the problem consists of a set of labels and a
weighted graph. Additionally, a metric distance function on the
labels is defined, and for each label and each vertex, an assignment
cost is given. The goal is to find a minimum-cost assignment of the
vertices to the labels. The cost of the solution consists of two
parts: the assignment costs of the vertices and the separation costs
of the edges (each edge pays its weight times the distance between
the two labels to which its endpoints are assigned).
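The objective just described can be written down directly. A minimal sketch, assuming the graph is a weighted edge list, `mu` is the metric on labels, and `c[v][l]` is the assignment cost of label `l` at vertex `v` (all names here are illustrative):

```python
def labeling_cost(edges, mu, c, assignment):
    """Total cost of a labeling: assignment costs of the vertices plus,
    for each edge, its weight times the distance between its endpoints'
    labels."""
    assign_cost = sum(c[v][assignment[v]] for v in assignment)
    separation = sum(w * mu[assignment[u]][assignment[v]]
                     for u, v, w in edges)
    return assign_cost + separation

# Two labels {0, 1} with the uniform metric, one edge of weight 2.
mu = [[0, 1], [1, 0]]
c = {'a': [0, 5], 'b': [5, 0]}   # 'a' prefers label 0, 'b' prefers label 1
edges = [('a', 'b', 2)]
print(labeling_cost(edges, mu, c, {'a': 0, 'b': 1}))  # -> 2: cheap labels, one cut edge
print(labeling_cost(edges, mu, c, {'a': 0, 'b': 0}))  # -> 5: no cut, but 'b' pays 5
```

The tension visible even in this two-vertex example, between individually cheap labels and edge separation costs, is what makes the optimization hard.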
Due to the simple structure and variety of the applications, the
problem and its special cases (with various distance functions on
the labels) received a lot of attention recently. Metric Labeling
has a known logarithmic approximation, and it has been an open
question for several years whether a constant approximation exists.
We refute this possibility and show an $\Omega(\sqrt{\log n})$ hardness of approximation.
Joint work with Seffi Naor.
|
Mixing Points in a Line
Peter Winkler, Bell Labs and IAS
Most Markov chain mixing results apply to discrete
chains with state spaces of size exponential in some
parameter n. What if we move to an infinite state
space, of dimension n? It turns out this can make
sense, for instance in considering various ways to
scramble n points on a line (ongoing work with Dana
Randall, Georgia Tech). We give a two-stage coupling
proof in one case, showing mixing in roughly cubic time.
|
Large Conjunctive Clusters
Nina Mishra, HP Labs and Stanford
We propose a new formulation of the clustering problem where the goal is to explicitly output a collection of simple and meaningful conjunctions of attributes that define the clusters. The clusters discovered may overlap and also may not cover all the points. In addition, a point may be assigned to a cluster description even if it only satisfies most, and not necessarily all, of the attributes in the conjunction.
The problem of finding one best conjunctive cluster is defined as identifying the conjunction that maximizes the product of the length of the conjunction with the number of points that satisfy the conjunction.
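On tiny instances this objective can be evaluated exhaustively. The following sketch is illustrative only (the algorithms in the talk are sampling-based and sublinear, not this brute force); each point is represented by the set of attribute indices it satisfies:

```python
from itertools import combinations

def best_conjunctive_cluster(points, num_attrs):
    """Brute-force search (tiny instances only) for the conjunction
    maximizing |conjunction| * #{points satisfying every attribute in it}."""
    best_score, best_conj = 0, frozenset()
    for k in range(1, num_attrs + 1):
        for conj in combinations(range(num_attrs), k):
            support = sum(1 for p in points if set(conj) <= p)
            score = k * support
            if score > best_score:
                best_score, best_conj = score, frozenset(conj)
    return best_score, best_conj

pts = [{0, 1, 2}, {0, 1}, {0, 1, 3}, {2, 3}]
print(best_conjunctive_cluster(pts, 4))  # -> (6, frozenset({0, 1}))
```

Here the conjunction {0, 1} has length 2 and is satisfied by 3 of the 4 points, giving the maximum score 6.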
The problem of finding one conjunctive cluster is equivalent to the problem of finding a subgraph of a bipartite graph where (a) each vertex on one side of the subgraph is adjacent to each vertex on the other side of the subgraph and (b) the number of edges is maximized. This problem is often called the maximum edge biclique problem. Since this problem is NP-hard and there is evidence that it is difficult to approximate, we describe a relaxed version where the objective is to find a large subgraph that is close to being a biclique. We also define the problem of finding a collection of diverse, conjunctive clusters using graph-theoretic terminology.
Simple, randomized, sampling-based algorithms are given that solve the conjunctive clustering problem. A key property of the algorithms is their sublinear behavior. The algorithms require a sample of data points whose size is independent of both the number of data points and the number of attributes. The running time of the algorithm is linear in the number of attributes.
Joint work with Dana Ron and Ram Swaminathan.
|
Approximation Algorithms for Path-Planning Problems
Shuchi Chawla, CMU
Given a weighted graph with values on vertices and a length bound
D, we consider the problem of constructing a rooted path of length
at most D that visits as many vertices in the graph as possible.
This problem is traditionally known as "Orienteering".
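For small graphs the problem can be stated as code directly; the following exhaustive search over rooted paths is a definitional sketch only (exponential time, nothing like the constant-factor approximation of the talk), assuming unit vertex values and an adjacency map with edge lengths:

```python
def orienteering(adj, root, D):
    """Brute-force search (tiny graphs only) for a rooted simple path of
    total length at most D visiting the maximum number of vertices.
    adj[u] maps each neighbor of u to the connecting edge's length."""
    best = 1  # the root alone is always a feasible path

    def dfs(u, used, visited):
        nonlocal best
        best = max(best, len(visited))
        for v, w in adj[u].items():
            if v not in visited and used + w <= D:
                visited.add(v)
                dfs(v, used + w, visited)
                visited.remove(v)

    dfs(root, 0, {root})
    return best

# Path graph a-b-c-d with unit edge lengths; budget 2 reaches 3 vertices.
adj = {'a': {'b': 1}, 'b': {'a': 1, 'c': 1},
       'c': {'b': 1, 'd': 1}, 'd': {'c': 1}}
print(orienteering(adj, 'a', 2))  # -> 3
```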
Path-planning problems such as Orienteering arise in a variety of
fields such as robot navigation, manufacturing and production
planning. Until recently, no approximation algorithm was known for
this problem on general graphs. In this talk, we give a constant
factor approximation for the Orienteering problem. We also consider
a generalization of this problem -- the Time-Window problem, where
every vertex in the graph has a time-window associated with it, and
the goal is to maximize the number of nodes visited within their
time-windows. We give an O(log^2 n)-approximation for the
Time-Window problem.
This is joint work with Nikhil Bansal, Avrim Blum, David Karger,
Adam Meyerson, Maria Minkoff and Terran Lane.
|
Automatizability and PAC Learnability
Toniann Pitassi, U. Toronto and IAS
In this talk we prove new upper and lower bounds on the proper
PAC learnability of decision trees, DNF formulas, and intersections
of halfspaces. Several of our results were obtained by exploring a
new connection between automatizability in proof complexity and
learnability.
After explaining this basic connection, we will prove the
following new results.
(1) We give new unified upper bounds for proper PAC learning of
decision trees and DNF, based on similar known algorithms for
automatizability of Resolution.
(2) We show that it is not possible to PAC learn DNF by DNF in
polynomial-time unless $NP \subseteq BPP$. The same negative result
also holds for proper PAC learning of intersections of halfspaces.
(3) We show that decision trees cannot be proper PAC learned,
under a different (less standard) complexity-theoretic assumption.
This is work in progress with Misha Alekhnovich, Mark Braverman,
Vitaly Feldman, and Adam Klivans.
|
Expander Flows, Geometric Embeddings, and Graph Partitioning
Sanjeev Arora, Princeton University
Partitioning a graph into two (or more) large pieces while minimizing the size of the ``interface'' between them is a fundamental combinatorial problem. Graph partitions or separators are central objects of study in the theory of Markov chains and geometric embeddings, and are a natural algorithmic primitive in numerous settings, including clustering, divide-and-conquer approaches, PRAM emulation, VLSI layout, and packet routing in distributed networks.
Classical algorithms for these problems include eigenvalue approaches (Cheeger 1970, Alon 1985) and multicommodity flows (Leighton and Rao 1988). The latter yields an O(log n)-approximation, and it has been an open problem to improve this ratio. We describe a new O(\sqrt{log n})-approximation algorithm. The algorithm relies on semidefinite relaxations that use the triangle inequality constraints. Analyzing these relaxations had been an important open problem as well.
We also introduce the notion of {\em expander flows}, which constitute an interesting ``certificate'' of a graph's expansion. We use them to prove a surprising new theorem about graph embeddings: for every n-node graph it is possible to embed an n-node expander graph in it with appropriate dilation and congestion. We think this embedding theorem may have other applications.
Our techniques are an interesting mix of graph theory and high-dimensional geometry, including the use of "measure concentration." The talk will be self-contained, and showcases a newer proof that is simpler than the one I presented last term at IAS.
(Joint work with Satish Rao and Umesh Vazirani; to appear at ACM STOC 2004.)
|
Graph Entropy and Quantum Sorting Problems
Andrew C. Yao, Princeton
A central question in quantum computing is: how much speedup can
be achieved by quantum algorithms over classical ones? To find
clues, researchers have studied this question closely in a
fundamental computation model called decision trees. In this
talk we discuss some recent progress in this area. In particular, we
show that the information lower bounds for sorting in the standard
decision trees are still valid asymptotically in the quantum
model. The proof utilizes some intriguing connections, first raised
by Kahn and Kim, between the classical sorting problem, Korner's
graph entropy, and Stanley's polytopes associated with partially
ordered sets.
|
A CS perspective on voting
Ravi Kumar, IBM Almaden
Voting and rank aggregation are fundamental concepts in social choice theory and have been extensively studied for decades. An algorithmic study of rank aggregation is interesting from a computer science perspective. We show that a very simple aggregation method based on median ranks achieves a constant factor approximation to the rank aggregation problem. The median has several merits as an efficient aggregation operator. We will also discuss the approximation guarantees offered by the more popular voting schemes such as Borda, Copeland, and plurality.
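The median-rank method itself is easy to state: each candidate's aggregate position is the median of its positions across the input rankings, and candidates are output in that order. A minimal sketch (tie-breaking is left unspecified here, as it is orthogonal to the idea):

```python
from statistics import median

def median_rank_aggregation(rankings):
    """Aggregate full rankings over the same candidates by sorting
    candidates on the median of their positions in the input lists
    (ties broken arbitrarily by the sort)."""
    candidates = rankings[0]
    pos = {c: median(r.index(c) for r in rankings) for c in candidates}
    return sorted(candidates, key=lambda c: pos[c])

votes = [['a', 'b', 'c', 'd'],
         ['a', 'c', 'b', 'd'],
         ['b', 'a', 'c', 'd']]
print(median_rank_aggregation(votes))  # -> ['a', 'b', 'c', 'd']
```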
Joint work with Cynthia Dwork, Ron Fagin, Mohammad Mahdian, Moni Naor, D. Sivakumar, and Erik Vee.
|
Positive Results and Techniques for Obfuscation
Manoj Prabhakaran, Princeton
Informally, an obfuscator is an efficient, probabilistic ``compiler'' that transforms a program P into a new program with the same functionality as P, but which hides any secrets that may be built into and used by P. Program obfuscation, if possible, would have numerous important cryptographic applications, including: (1) intellectual property protection of secret algorithms and keys in software, (2) solving the long-standing open problem of homomorphic public-key encryption, (3) controlled delegation of authority and access, (4) transforming private-key encryption into public-key encryption, and (5) access control systems. Unfortunately, however, program obfuscators that work on arbitrary programs cannot exist (Barak et al.). No positive results for program obfuscation were known prior to this work.
In this work, we provide the first positive results in program obfuscation. We focus on the goal of access control, and give several provable obfuscations for complex access control functionalities, in the random oracle model. Our results are obtained through non-trivial compositions of obfuscations; we note that general composition of obfuscations is impossible, and so developing techniques for composing obfuscations is an important goal. This work can also be seen as making initial progress toward the goal of obfuscating finite automata or regular expressions, an important general class of machines which is not ruled out by the impossibility results of Barak et al.
This work provides the first formal proof techniques for obfuscation, which we expect to be useful in future work in this area.
Joint work with Benjamin Lynn and Amit Sahai.
|
Graph properties and circular functions: how low can quantum query complexity go?
Shengyu Zhang, Princeton
In decision tree models, considerable attention has been paid to the effect of symmetry on computational complexity. That is, for a permutation group $\Gamma$, how low can the complexity be for any boolean function invariant under $\Gamma$?
In this paper we investigate this question for quantum decision trees for graph properties, directed graph properties, and circular functions. In particular, we prove that the n-vertex Scorpion graph property has quantum query complexity $\tilde\Theta(n^{1/2})$, which implies that the minimum quantum complexity for graph properties is strictly less than that for monotone graph properties (known to be $\Omega(n^{2/3})$). We also find a directed graph property with quantum query complexity $\tilde\Theta(n^{1/2})$ and an N-ary circular function with quantum query complexity $\tilde\Theta(N^{1/4})$. Finally, we show that any N-ary function invariant under $\Gamma$ has quantum query complexity $\Omega(N^{1/4})$ as long as $\Gamma$ is transitive. This implies that our examples are (almost) the best possible.
Joint work with Xiaoming Sun and Andy Yao.
|
Lower Bounds for Linear Degeneracy Testing
Nir Ailon, Princeton
In the late nineties, Erickson proved a remarkable lower bound on the decision tree complexity of one of the central problems of computational geometry: given $n$ numbers, do any $r$ of them add up to $0$? His lower bound of $\Omega(n^{\lceil r/2\rceil})$, for any fixed $r$, is optimal if the polynomials at the nodes are linear and at most $r$-variate.
We generalize his bound to $s$-variate polynomials for $s>r$. Erickson's bound decays quickly as $r$ grows and never rises above pseudo-polynomial; we provide an exponential improvement.
Our arguments are based on three ideas:
(i) a geometrization of Erickson's proof technique;
(ii) the use of error-correcting codes; and
(iii) a tensor product construction for permutation matrices.
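The underlying decision problem is easy to state and to solve naively in $O(n^r)$ time; the content of the lower bounds is that even linear decision trees cannot do substantially better than $n^{\lceil r/2\rceil}$. A brute-force sketch of the problem itself:

```python
from itertools import combinations

def r_sum_zero(nums, r):
    """Decide whether some r of the given numbers sum to 0,
    by exhaustively scanning all C(n, r) subsets."""
    return any(sum(c) == 0 for c in combinations(nums, r))

print(r_sum_zero([-5, 1, 2, 3, 8], 3))  # -> True  (-5 + 2 + 3 = 0)
print(r_sum_zero([1, 2, 4, 8], 2))      # -> False
```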
This is joint work with Bernard Chazelle.
|
Approximation via Cost Sharing (or, How to Build Good Networks by Flipping Coins)
Tim Roughgarden, UC Berkeley and Stanford University
In this talk, I will describe a family of randomized approximation algorithms for several well-studied NP-hard network design problems. These algorithms are extremely simple, and yet improve over the previously best known approximation ratios, in some cases by orders of magnitude. The analysis of these algorithms is driven by a novel connection between approximation algorithms and cost sharing.
Joint work with Anupam Gupta (CMU), Amit Kumar (IIT Delhi), and Martin Pal (Cornell).
|
Online Ascending Auctions for Gradually Expiring Items
Ron Lavi, Hebrew University
In this work we consider online auction mechanisms for the allocation of M items that are identical to each other except that the items have different expiration times, and each item must be allocated before it expires. A computational application is the allocation of time slots in a scheduling problem, and an economic application is the allocation of transportation tickets.
If we ignore incentive issues, then 2-approximation online allocation algorithms are known. We, however, are interested in situations where players act "selfishly" and may misreport values, deadlines, or arrival times. We first design two auction-like mechanisms and prove that a 3-approximation allocation is obtained for a wide class of "semi-myopic" selfish behaviors of the players. The rest of the paper aims at providing a game-theoretic rational justification for acting in such a semi-myopic way.
We first show that the usual notions of incentive compatibility in dominant strategies cannot be used in this case, since any such equilibrium cannot obtain better than an M-approximation. Instead we suggest a new notion of "Set-Nash" equilibria, where we cannot pinpoint a single best-response strategy, but rather only a set of possible best-response strategies. These strategies are all semi-myopic, and thus our mechanisms will perform well on any of them. We believe that this notion is of independent interest.
(Joint work with Noam Nisan.)
|