Princeton Seminars in Theoretical Computer Science, Spring 2004

(All events held at the Computer Science Department, 35 Olden St., unless stated otherwise)
                Schedule 
Jan 16 Robi Krauthgamer, IBM Almaden
Navigating nets: Simple algorithms for proximity search
Jan 21 (Wed) James Lee, UC Berkeley
Measured descent: New embeddings for finite metrics
Jan 23 Shakhar Smorodinsky, MSRI
On Conflict-Free Colorings
Jan 29 (Thurs, 11am-12) Michael Mitzenmacher, Harvard
A Brief History of Generative Models for Power Law and Lognormal Distributions, and An Application to File Size Distributions
Feb 6 Matthew Andrews, Lucent
Scheduling High Speed Data in Wireless Networks
Feb 13 Jon Feldman, Columbia University
LP Decoding Corrects a Constant Fraction of Errors
Feb 20 Howard Karloff, AT&T
Opt vs. Load in Dynamic Storage Allocation
Feb 27 Yuval Rabani, Technion and Cornell University
Low Distortion Maps between Point Sets
Mar 4 (Thurs) Julia Chuzhoy, Technion
The Hardness of Metric Labeling
Mar 5 Peter Winkler, Bell Labs and IAS
Mixing Points in a Line
Mar 12 Nina Mishra, HP Labs and Stanford
Large Conjunctive Clusters
Mar 18 (Thurs) Shuchi Chawla, CMU
Approximation Algorithms for Path-Planning Problems
Mar 26 Toniann Pitassi, U. Toronto and IAS
Automatizability and PAC Learnability
April 2 Sanjeev Arora, Princeton
Expander Flows, Geometric Embeddings, and Graph Partitioning
April 9 Andrew C. Yao, Princeton
Graph Entropy and Quantum Sorting Problems
April 16 Ravi Kumar, IBM Almaden
A CS perspective on voting
April 23 Manoj Prabhakaran, Princeton
Positive Results and Techniques for Obfuscation
April 30 Shengyu Zhang, Princeton
Graph properties and circular functions: how low can quantum query complexity go?
May 7 Nir Ailon, Princeton
Lower Bounds for Linear Degeneracy Testing
May 13 (Thurs) Tim Roughgarden, UC Berkeley and Stanford
Approximation via Cost Sharing (or, How to Build Good Networks by Flipping Coins)
May 14 No Talk (Columbia | NYU | IBM Research Theory Day)
May 20 (Thurs) Ron Lavi, Hebrew University
Online Ascending Auctions for Gradually Expiring Items
May 21 No Talk - CS faculty meeting

 

             Abstracts
Navigating nets: Simple algorithms for proximity search.

Robi Krauthgamer, IBM Almaden

Nearest-neighbor search is the following problem: Given a set $S$ of
$n$ points in a metric space $X$, preprocess $S$ so as to efficiently
find the point in $S$ that is closest to a query point $q\in X$.
Recently, there has been a significant interest in the case of general
metrics, because in many practical applications the underlying metric
is not Euclidean (or another $l_p$-metric).
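
For concreteness, the brute-force version of the query takes a few lines
of Python (an illustration of the problem statement, not code from the
paper; the data structure below replaces this linear scan with an
$O(\log n)$-time $(1+\epsilon)$-approximate search when $S$ has low
dimension):

    # Brute-force nearest-neighbor search in a general metric space.
    # `dist` may be any black-box function satisfying the metric axioms.
    def nearest_neighbor(S, q, dist):
        """Return the point of S closest to the query q (exact, O(n) time)."""
        return min(S, key=lambda p: dist(p, q))

    # Example with the (non-Euclidean) l_1 metric:
    l1 = lambda p, q: sum(abs(a - b) for a, b in zip(p, q))
    S = [(0, 0), (3, 1), (5, 5), (-2, 4)]
    print(nearest_neighbor(S, (4, 2), l1))  # -> (3, 1)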

We devise for this problem a simple and efficient dynamic data
structure (i.e., one that supports insertions and deletions to $S$)
that answers $(1+\epsilon)$-approximate nearest neighbor queries for
any $\epsilon>0$. Our algorithms are deterministic and provably correct
with no assumptions on the metric. Their (time and space) complexity
depends on the dimensionality of the data set $S$, which we measure
using a very natural notion borrowed from functional analysis. For
example, if $S$ has constant dimension, the above operations take
logarithmic time (for fixed $\epsilon>0$). Our data structure
outperforms the metric skip list of Karger and Ruhl [STOC, 2002] in
several aspects and is conceptually simpler.

[Joint work with James R. Lee.]

 

Measured descent: New embeddings for finite metrics

James Lee, UC Berkeley

We give a new proof of Bourgain's theorem based on metric decomposition
and a technique we call "measured descent." This provides a unified
framework for the two primary methods of constructing Frechet-type
embeddings, those based on the work of Bourgain and of Rao. Our
technique yields some interesting refinements of Bourgain's theorem
in terms of important parameters of the metric space--for instance, we
show that every n-point metric embeds into a Euclidean space with
distortion O(sqrt{dim(X) log n}), where dim(X) is the "metric dimension"
of X (which is always at most log n).
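
(For reference: an embedding $f:(X,d_X)\to(Y,d_Y)$ has distortion at most
$D$ if there is a scale $r>0$ such that $r\,d_X(x,y) \le d_Y(f(x),f(y))
\le D\,r\,d_X(x,y)$ for all $x,y\in X$. In this language, Bourgain's
theorem says that every $n$-point metric space embeds into Euclidean
space with distortion $O(\log n)$.)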

We use this technique to answer a number of open questions. For instance,
our embeddings are volume-respecting for subsets of arbitrary size; this
answers positively an open question of Feige. Another application achieves
the optimal dimension necessary to embed planar metrics into l_infty,
answering a question of Rabinovich.

[This is joint work with R. Krauthgamer, M. Mendel, and A. Naor.]

 

On Conflict-Free Colorings

Shakhar Smorodinsky, MSRI

Motivated by frequency assignment problems in cellular networks, we study
some coloring problems of the following flavor:

What is the minimum number $f(n)$ such that one can assign colors to any
set $P$ of $n$ points in the plane, using a total of at most $f(n)$ colors,
such that this coloring has the following property (which we refer to as
Conflict-Free coloring):
For any disc $d$ in the plane with nonempty intersection with $P$, there
is at least one point of $P$ inside $d$ which has a unique color among the
points inside $d$.
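
To make the definition concrete, here is a small Python checker
(illustrative only, not from the paper) that tests a coloring against a
finite list of query discs:

    # Check the conflict-free property for colored points in the plane.
    # Each disc is given as ((cx, cy), r). A coloring is conflict-free if
    # every disc containing at least one point contains a point whose
    # color occurs exactly once inside the disc.
    from collections import Counter

    def is_conflict_free(points, colors, discs):
        for (cx, cy), r in discs:
            inside = [i for i, (x, y) in enumerate(points)
                      if (x - cx) ** 2 + (y - cy) ** 2 <= r * r]
            if not inside:
                continue  # empty intersection: no constraint
            counts = Counter(colors[i] for i in inside)
            if not any(counts[colors[i]] == 1 for i in inside):
                return False  # every color inside the disc repeats
        return True

    # Three collinear points colored (0, 1, 0): conflict-free, since any
    # disc containing both endpoints must also contain the middle point.
    points = [(0, 0), (1, 0), (2, 0)]
    print(is_conflict_free(points, [0, 1, 0],
                           [((1, 0), 1.5), ((0, 0), 0.5)]))  # True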

We show that $f(n) = O(\log n)$. This is asymptotically tight in the worst case.
We extend this result to many other classes of ranges (other than disks),
and also to the dual type of problems, where we want to color a given set
of ranges, so that for each point $p$ there is a range with a unique color
among the ranges that contain $p$.

Some parts of this talk are joint work with Guy Even, Zvika Lotker, and
Dana Ron (Tel-Aviv University); other parts are joint work with Sariel
Har-Peled (UIUC).

 

A Brief History of Generative Models for Power Law and Lognormal
Distributions, and An Application to File Size Distributions

Michael Mitzenmacher, Harvard

We examine the long history of power law and lognormal
distributions, showing that although these distributions are relatively
new to the computer science community, they have a long and interesting
history in other communities. We also describe how a new generative model
from economics leads us to a new model for file sizes, explaining why
they might have a power law distribution.
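
A standard example from this history (a generic multiplicative process,
sketched here under stated assumptions rather than the talk's specific
file-size model): multiplying by i.i.d. random factors for a fixed number
of steps yields a lognormal, while stopping the same process at a random
geometric time produces a much heavier, power-law-like tail.

    # Multiplicative generative process: log(x) is a sum of i.i.d. terms,
    # so at a fixed time x is lognormally distributed; a mixture over a
    # geometric stopping time instead has a power-law-like tail.
    import math, random

    def multiplicative_process(steps, x0=1.0, sigma=0.1):
        x = x0
        for _ in range(steps):
            x *= math.exp(random.gauss(0.0, sigma))  # random factor
        return x

    def geometric(p):
        t = 1
        while random.random() > p:
            t += 1
        return t

    lognormal_samples = [multiplicative_process(1000) for _ in range(1000)]
    heavy_tail_samples = [multiplicative_process(geometric(0.01))
                          for _ in range(1000)]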

 

Scheduling High Speed Data in Wireless Networks

Matthew Andrews, Lucent

Wireless networks for transmitting high speed data are becoming
increasingly common. Such networks lead to new and interesting
scheduling problems, in large part because the quality of a wireless
channel constantly changes over time. It is important to schedule in
an opportunistic fashion, i.e. we want to transmit data between two users
during the times when the associated wireless channel has good
quality.

A number of models have been proposed for studying such systems.
These differ according to the assumptions made on the arrival process,
the assumptions made on the channel conditions, and the metrics that
are to be optimized. In this talk we shall survey some of these
models and present scheduling results that arise in each model. We
shall also describe how these models relate to wireless data networks
that are in use today.

 

LP Decoding Corrects a Constant Fraction of Errors

Jon Feldman, Columbia University

We show that using low-density parity-check (LDPC) codes with sufficient
expansion, the ``Linear Programming (LP) Decoder'' of Feldman, Karger and
Wainwright (Allerton, 2003) corrects a constant fraction of errors in the
channel. A random code will have sufficient expansion with high
probability, and recent work by Capalbo et al. (FOCS, 2003) shows that
such codes can be constructed efficiently.

This result puts LP decoding into the small group of decoding methods
known to correct a constant fraction of errors in polynomial time.
(Forney's celebrated GMD algorithm, used along with concatenated codes,
was the first member of this group.)

We prove our bounds by demonstrating a zero-valued dual solution to the
decoding LP, under the condition that fewer than $\alpha n$ errors occur
among the $n$ transmitted bits. The existence of such a solution implies
that the LP succeeds in recovering the original codeword.
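
To give a feel for the object being analyzed, here is a small Python
sketch of the decoding LP (the standard relaxation, written here purely
for illustration; requires scipy): each parity check contributes one
linear constraint per odd-sized subset of its neighborhood, and decoding
succeeds when the LP optimum is integral and equals the sent codeword.

    # LP decoding of a binary linear code (illustrative sketch).
    # Bits are relaxed to f_i in [0,1]. For each check with neighborhood
    # N and each odd-size subset S of N we add
    #     sum_{i in S} f_i - sum_{i in N\S} f_i <= |S| - 1.
    from itertools import combinations
    from scipy.optimize import linprog

    def lp_decode(checks, received, n):
        c = [1.0 if y == 0 else -1.0 for y in received]  # BSC cost
        A, b = [], []
        for N in checks:
            for k in range(1, len(N) + 1, 2):          # odd subset sizes
                for S in combinations(N, k):
                    row = [0.0] * n
                    for i in N:
                        row[i] = 1.0 if i in S else -1.0
                    A.append(row)
                    b.append(len(S) - 1)
        return linprog(c, A_ub=A, b_ub=b, bounds=[(0, 1)] * n).x

    # Length-3 repetition code (checks x0+x1 and x1+x2): one flipped
    # bit in the received word is corrected back to 000.
    print(lp_decode([(0, 1), (1, 2)], [0, 1, 0], 3))  # ~[0. 0. 0.]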

[This is joint work with Tal Malkin, Cliff Stein, Rocco Servedio and
Martin Wainwright]

 

Opt vs. Load in Dynamic Storage Allocation

Howard Karloff, AT&T

Dynamic Storage Allocation is the problem of packing  
given axis-aligned rectangles into a horizontal strip of minimum 
height by sliding the rectangles vertically but not horizontally.
Stated differently, Dynamic Storage Allocation models both the
allocation of memory space for arriving and departing arrays and
SONET channel assignment.
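
As a concrete baseline (an illustration, not one of the talk's
algorithms), first-fit places each rectangle at the lowest vertical
offset that avoids all previously placed rectangles overlapping it in
time; the resulting strip height can then be compared with the load, the
maximum total size alive at any instant, an obvious lower bound on OPT:

    # First-fit for Dynamic Storage Allocation. Each item (start, end,
    # size) is a rectangle over the time interval [start, end) that may
    # slide vertically but not horizontally.
    def first_fit(items):
        placed = []  # (start, end, offset, size)
        for s, e, h in items:
            blocked = sorted((o, o + hh) for (ss, ee, o, hh) in placed
                             if ss < e and s < ee)  # time-overlapping
            y = 0
            for lo, hi in blocked:
                if y + h <= lo:
                    break       # fits below the next blocked interval
                y = max(y, hi)  # otherwise jump above it
            placed.append((s, e, y, h))
        return placed

    def height(placed):
        return max((o + h for _, _, o, h in placed), default=0)

    def load(items):
        times = {t for s, e, _ in items for t in (s, e)}
        return max(sum(h for s, e, h in items if s <= t < e) for t in times)

    items = [(0, 4, 2), (1, 3, 1), (2, 6, 2)]
    print(height(first_fit(items)), load(items))  # 5 5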

I will present some new approximation algorithms for
Dynamic Storage Allocation and pose an intriguing open question.

This is joint work with Adam Buchsbaum, Claire Kenyon, Nick Reingold, and Mikkel Thorup.

 

Low Distortion Maps between Point Sets

Yuval Rabani, Technion and Cornell University

We initiate the study of the {\em minimum distortion} problem: given as input two $n$-point metric spaces, find a bijection between them with minimum distortion. This is an abstraction of certain geometric problems in shape and image matching, and is also a natural variation and extension of the fundamental problems of graph isomorphism and bandwidth. Our focus is on algorithms that find an optimal (or near-optimal) bijection when the distortion is fairly small. We present a polynomial time algorithm that finds an optimal bijection between two line metrics, provided the distortion is at most $2+\sqrt{5}$. We also give a parameterized polynomial time algorithm that finds an optimal bijection between an arbitrary unweighted graph metric and a bounded-degree tree metric.
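
The quantity being optimized is easy to state in code. The sketch below
(illustrative, taking distance matrices as input and assuming distinct
points) evaluates the distortion of a candidate bijection:

    # Distortion of a bijection sigma between two n-point metric spaces
    # given by distance matrices d1 and d2: (max expansion) times
    # (max contraction), which is invariant under rescaling and equals 1
    # exactly for scaled isometries.
    def distortion(d1, d2, sigma):
        n = len(d1)
        expand = contract = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                a, b = d1[i][j], d2[sigma[i]][sigma[j]]
                expand = max(expand, b / a)
                contract = max(contract, a / b)
        return expand * contract

    # Two 3-point line metrics {0,1,3} and {0,2,6}: the natural map is a
    # scaling, hence distortion 1.
    d1 = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
    d2 = [[0, 2, 6], [2, 0, 4], [6, 4, 0]]
    print(distortion(d1, d2, [0, 1, 2]))  # 1.0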

This is joint work with Claire Kenyon and Alistair Sinclair.

 

The Hardness of Metric Labeling

Julia Chuzhoy, Technion

The Metric Labeling problem is an elegant and powerful mathematical model capturing a wide range of classification problems. The input to the problem consists of a set of labels and a weighted graph. Additionally, a metric distance function on the labels is defined, and for each label and each vertex, an assignment cost is given. The goal is to find a minimum-cost assignment of the vertices to the labels. The cost of the solution consists of two parts: the assignment costs of the vertices and the separation costs of the edges (each edge pays its weight times the distance between the two labels to which its endpoints are assigned). 
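
In code, the objective reads as follows (a direct transcription of the
definition above, with illustrative variable names):

    # Cost of a solution to Metric Labeling.
    #   edges: list of (u, v, weight)
    #   assign_cost[v][a]: cost of assigning label a to vertex v
    #   d[a][b]: metric distance between labels a and b
    #   labeling[v]: the label chosen for vertex v
    def labeling_cost(edges, assign_cost, d, labeling):
        assignment = sum(assign_cost[v][labeling[v]]
                         for v in range(len(assign_cost)))
        separation = sum(w * d[labeling[u]][labeling[v]]
                         for u, v, w in edges)
        return assignment + separation

    # Two vertices joined by an edge of weight 2, two labels at distance
    # 1: labeling both with 0 costs 1+3, splitting the labels costs 1+0+2.
    edges = [(0, 1, 2)]
    assign = [[1, 5], [3, 0]]
    d = [[0, 1], [1, 0]]
    print(labeling_cost(edges, assign, d, [0, 0]),
          labeling_cost(edges, assign, d, [0, 1]))  # 4 3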

Due to the simple structure and variety of the applications, the problem and its special cases (with various distance functions on the labels) received a lot of attention recently. Metric Labeling has a known logarithmic approximation, and it has been an open question for several years whether a constant approximation exists. We refute this possibility and show a $\sqrt{\log n}$ hardness of approximation. 

Joint work with Seffi Naor.

 

Mixing Points in a Line

Peter Winkler, Bell Labs and IAS

Most Markov chain mixing results apply to discrete
chains with state spaces of size exponential in some
parameter n. What if we move to an infinite state
space, of dimension n? It turns out this can make
sense, for instance in considering various ways to
scramble n points on a line (ongoing work with Dana
Randall, Georgia Tech). We give a two-stage coupling
proof in one case, showing mixing in roughly cubic time.

 

Large Conjunctive Clusters

Nina Mishra, HP Labs and Stanford

We propose a new formulation of the clustering problem where the goal
is to explicitly output a collection of simple and meaningful
conjunctions of attributes that define the clusters. The clusters
discovered may overlap and also may not cover all the points. In
addition, a point may be assigned to a cluster description even if it
only satisfies most, and not necessarily all, of the attributes in the
conjunction.

The problem of finding one best conjunctive cluster is defined as
identifying the conjunction that maximizes the product of the length of
the conjunction with the number of points that satisfy the conjunction.
The problem of finding one conjunctive cluster is equivalent to the
problem of finding a subgraph of a bipartite graph where (a) each
vertex on one side of the subgraph is adjacent to each vertex on the
other side of the subgraph and (b) the number of edges is maximized.
This problem is often called the maximum edge biclique problem. Since
this problem is NP-hard and there is evidence that it is difficult to
approximate, we describe a relaxed version where the objective is to
find a large subgraph that is close to being a biclique. We also
define the problem of finding a collection of diverse, conjunctive
clusters using graph-theoretic terminology.
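
For tiny instances the single-cluster objective can be evaluated by brute
force; the sketch below (exponential in the number of attributes, purely
for illustration since the talk's algorithms are sampling-based)
maximizes the product of conjunction length and support:

    # Brute-force best conjunctive cluster: choose the attribute set
    # maximizing len(attrs) * (number of points having all of them).
    from itertools import combinations

    def best_conjunctive_cluster(rows, n_attrs):
        best, best_val = (), 0
        for k in range(1, n_attrs + 1):
            for attrs in combinations(range(n_attrs), k):
                support = sum(all(r[a] for a in attrs) for r in rows)
                if k * support > best_val:
                    best, best_val = attrs, k * support
        return best, best_val

    rows = [(1, 1, 0), (1, 1, 1), (1, 0, 1), (0, 1, 1)]
    print(best_conjunctive_cluster(rows, 3))  # ((0, 1), 4)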

Simple, randomized, sampling-based algorithms are given that solve the
conjunctive clustering problem. A key property of the algorithms is
their sublinear behavior. The algorithms require a sample of data
points whose size is independent of both the number of data points and
the number of attributes. The running time of the algorithm is linear
in the number of attributes.

Joint work with Dana Ron and Ram Swaminathan.

 

Approximation Algorithms for Path-Planning Problems

Shuchi Chawla, CMU

Given a weighted graph with values on vertices and a length bound D, we consider the problem of constructing a rooted path of length at most D that visits as many vertices in the graph as possible. This problem is traditionally known as "Orienteering". Path-planning problems such as Orienteering arise in a variety of fields such as robot navigation, manufacturing and production planning. Until recently, no approximation algorithm was known for this problem on general graphs. In this talk, we give a constant factor approximation for the Orienteering problem. We also consider a generalization of this problem -- the Time-Window problem, where every vertex in the graph has a time-window associated with it, and the goal is to maximize the number of nodes visited within their time-windows. We give an O(log^2 n)-approximation for the Time-Window problem.
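
As a problem statement in code (an exhaustive search, exponential time,
for illustration only; the talk's contribution is the constant-factor
approximation):

    # Exhaustive search for Orienteering: a path from `root` of total
    # length at most D visiting the maximum number of vertices;
    # dist[u][v] are the edge lengths of a complete weighted graph.
    def orienteer(dist, root, D):
        n = len(dist)
        best = [root]
        def extend(path, used, length):
            nonlocal best
            if len(path) > len(best):
                best = path[:]
            for v in range(n):
                step = dist[path[-1]][v]
                if v not in used and length + step <= D:
                    used.add(v)
                    path.append(v)
                    extend(path, used, length + step)
                    path.pop()
                    used.remove(v)
        extend([root], {root}, 0)
        return best

    dist = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
    print(orienteer(dist, 0, 3))  # [0, 1, 2]: length 1 + 2 = 3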

This is joint work with Nikhil Bansal, Avrim Blum, David Karger, Adam Meyerson, Maria Minkoff and Terran Lane.

 

Automatizability and PAC Learnability

Toniann Pitassi, U. Toronto and IAS

In this talk we prove new upper and lower bounds on the proper PAC learnability of decision trees, DNF formulas, and intersections of halfspaces. Several of our results were obtained by exploring a new connection between automatizability in proof complexity and learnability.

After explaining this basic connection, we will prove the following new results.

(1) We give new unified upper bounds for proper PAC learning of decision trees and DNF, based on similar known algorithms for automatizability of Resolution.

(2) We show that it is not possible to PAC learn DNF by DNF in polynomial
time unless $NP \subseteq BPP$. The same negative result also holds for
proper PAC learning of intersections of halfspaces.

(3) We show that decision trees cannot be properly PAC learned, under a
different (less standard) complexity-theoretic assumption.

This is work in progress with Misha Alekhnovich, Mark Braverman, Vitaly Feldman, and Adam Klivans.

 

Expander Flows, Geometric Embeddings, and Graph Partitioning

Sanjeev Arora, Princeton University

Partitioning a graph into two (or more) large pieces while minimizing
the size of the ``interface'' between them is a fundamental combinatorial
problem. Graph partitions or separators are central objects of study
in the theory of Markov chains and geometric embeddings, and are a
natural algorithmic primitive in numerous settings, including
clustering, divide and conquer approaches, PRAM emulation, VLSI layout,
and packet routing in distributed networks.

Classical algorithms for these problems include eigenvalue approaches
(Cheeger 1970, Alon 1985) and multicommodity flows (Leighton and Rao 1988).
The latter yields an O(log n)-approximation, and it has been an open
problem to improve this ratio. We describe a new
O(\sqrt{log n})-approximation algorithm.

The algorithm relies on semidefinite relaxations that use the triangle
inequality constraints. Analysing these relaxations had been an important
open problem as well.
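
(For reference, stated from memory and up to normalization, so not
verbatim from the paper: the relaxation assigns a unit vector $v_i$ to
each vertex and minimizes $\sum_{(i,j)\in E} \|v_i-v_j\|^2$ subject to
$\|v_i\|^2 = 1$ for all $i$, a spreading constraint
$\sum_{i<j} \|v_i-v_j\|^2 \geq c\,n^2$, and the triangle inequality on
squared distances, $\|v_i-v_j\|^2 + \|v_j-v_k\|^2 \geq \|v_i-v_k\|^2$
for all $i,j,k$.)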

We also introduce the notion of {\em expander flows}, which constitute
an interesting ``certificate'' of a graph's expansion. We use them to prove a
surprising new theorem about graph embeddings: for every n-node graph it is possible to embed
an n-node expander graph in it with appropriate dilation and congestion. We think
this embedding theorem may have other applications.

Our techniques are an interesting mix of graph theory and
high-dimensional geometry, including the use of "measure concentration."
 
The talk will be self-contained, and showcases a newer proof that is
simpler than the one I presented last term at IAS.

(Joint work with Satish Rao and Umesh Vazirani; to appear at ACM STOC 2004)

Graph Entropy and Quantum Sorting Problems

Andrew C. Yao, Princeton

A central question in quantum computing is: how much speedup can be achieved by quantum algorithms over classical ones?  To find clues, researchers have studied this question closely in a fundamental computation model called decision trees. In this talk we discuss some recent progress in this area. In particular, we show that the information lower bounds for sorting in the standard decision trees are still valid asymptotically in the quantum model. The proof utilizes some intriguing connections, first raised by Kahn and Kim, between the classical sorting problem, Korner's graph entropy, and Stanley's polytopes associated with partially ordered sets.

 

A CS perspective on voting

Ravi Kumar, IBM Almaden

Voting and rank aggregation are fundamental concepts in social choice theory
and have been extensively studied for decades. An algorithmic study of
rank aggregation is interesting from a computer science perspective.
We show that a very simple aggregation method based on median ranks
achieves a constant factor approximation to the rank aggregation problem.
Median has several merits as an efficient aggregation operator. We will
also discuss the approximation guarantees offered by the more popular
voting schemes such as Borda, Copeland, and plurality.
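
The median-rank rule itself takes only a few lines of Python (an
illustration; rankings are assumed to be full permutations of the same
candidate set, and ties among medians are broken arbitrarily):

    # Aggregation by median ranks: score each candidate by the median of
    # its positions across the input rankings, then sort by that score.
    from statistics import median

    def median_rank_aggregation(rankings):
        candidates = rankings[0]
        pos = {c: [r.index(c) for r in rankings] for c in candidates}
        return sorted(candidates, key=lambda c: median(pos[c]))

    votes = [["a", "b", "c", "d"],
             ["b", "a", "c", "d"],
             ["a", "c", "b", "d"]]
    print(median_rank_aggregation(votes))  # ['a', 'b', 'c', 'd']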

Joint work with Cynthia Dwork, Ron Fagin, Mohammad Mahdian, Moni Naor,
D. Sivakumar, and Erik Vee.

 

Positive Results and Techniques for Obfuscation

Manoj Prabhakaran, Princeton

Informally, an obfuscator is an efficient, probabilistic ``compiler''
that transforms a program P into a new program with the same
functionality as P, but which hides any secrets that may be built into
and used by P. Program obfuscation, if possible, would have numerous
important cryptographic applications, including: (1) Intellectual
property protection of secret algorithms and keys in software, (2)
Solving the long-standing open problem of homomorphic public-key
encryption, (3) Controlled delegation of authority and access, (4)
Transforming Private-Key Encryption into Public-Key Encryption, and
(5) Access Control Systems. Unfortunately, however, program
obfuscators that work on arbitrary programs cannot exist (Barak et
al.). No positive results for program obfuscation were known prior to
this work.

In this work, we provide the first positive results in program
obfuscation. We focus on the goal of access control, and give several
provable obfuscations for complex access control functionalities, in
the random oracle model. Our results are obtained through non-trivial
compositions of obfuscations; we note that general composition of
obfuscations is impossible, and so developing techniques for composing
obfuscations is an important goal. This work can also be seen as
making initial progress toward the goal of obfuscating finite automata
or regular expressions, an important general class of machines which
are not ruled out by the impossibility results of Barak et al.
This work provides the first formal proof techniques for obfuscation,
which we expect to be useful in future work in this area.

Joint work with Benjamin Lynn and Amit Sahai

 

Graph properties and circular functions: how low can quantum query complexity go?

Shengyu Zhang, Princeton

In decision tree models, considerable attention has been paid to the effect of symmetry on computational complexity. That is, for a permutation group $\Gamma$, how low can the complexity be for a boolean function invariant under $\Gamma$?

In this talk we investigate this question for quantum decision trees for graph properties, directed graph properties, and circular functions. In particular, we prove that the n-vertex Scorpion graph property has quantum query complexity $\tilde\Theta(n^{1/2})$, which implies that the minimum quantum complexity for graph properties is strictly less than that for monotone graph properties (known to be $\Omega(n^{2/3})$). We also find a directed graph property with quantum query complexity $\tilde\Theta(n^{1/2})$ and an $N$-ary circular function with quantum query complexity $\tilde\Theta(N^{1/4})$. Finally, we show that any $N$-ary function invariant under $\Gamma$ has quantum query complexity $\Omega(N^{1/4})$ as long as $\Gamma$ is transitive.

This implies that our examples are (almost) the best possible.

Joint work with Xiaoming Sun and Andy Yao

 

Lower Bounds for Linear Degeneracy Testing

Nir Ailon, Princeton

In the late nineties Erickson proved a remarkable lower bound
on the decision tree complexity of
one of the central problems of computational geometry:
given $n$ numbers, do any $r$ of them add up to $0$?
His lower bound of $\Omega(n^{\lceil r/2\rceil})$, for any fixed $r$,
is optimal if the polynomials at the nodes are linear and at most
$r$-variate.
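
The underlying problem ($r$-SUM) takes one line of Python; the lower
bounds concern how much better than this $\binom{n}{r}$-time scan a
linear decision tree can possibly do:

    # Brute-force r-SUM: do any r of the given numbers sum to zero?
    from itertools import combinations

    def r_sum(numbers, r):
        return any(sum(c) == 0 for c in combinations(numbers, r))

    print(r_sum([5, -2, 9, -3, -7], 3))  # True: 5 - 2 - 3 = 0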

We generalize his bound to $s$-variate polynomials for $s>r$.
Erickson's bound decays quickly as $r$ grows
and never reaches above pseudo-polynomial: we provide
an exponential improvement.

Our arguments are based on three ideas:
(i) a geometrization of Erickson's proof technique;
(ii) the use of error-correcting codes; and
(iii) a tensor product construction for permutation matrices.

This is joint work with Bernard Chazelle.

 

Approximation via Cost Sharing (or, How to Build Good Networks by Flipping Coins)

Tim Roughgarden, UC Berkeley and Stanford University

In this talk, I will describe a family of randomized approximation
algorithms for several well-studied NP-hard network design problems.
These algorithms are extremely simple, and yet improve over the
previously best known approximation ratios, in some cases by orders
of magnitude. The analysis of these algorithms is driven by a novel
connection between approximation algorithms and cost sharing.

Joint work with Anupam Gupta (CMU), Amit Kumar (IIT Delhi), and 
Martin Pal (Cornell).

 

Online Ascending Auctions for Gradually Expiring Items

Ron Lavi, Hebrew University

In this work we consider online auction mechanisms for the allocation of
M items that are identical to each other except for the fact that the
items have different expiration times, and each item must be allocated
before it expires. A computational application is the allocation of time
slots in a scheduling problem, and an economic application is the allocation
of transportation tickets.

If we ignore incentive issues, then 2-approximation online allocation
algorithms are known. We, however, are interested in situations where
players act "selfishly" and may misreport values, deadlines, or arrival
times. We first design two auction-like mechanisms and prove that a
3-approximation allocation is obtained for a wide class of "semi-myopic"
selfish behaviors of the players. The rest of the paper aims at providing
a game-theoretic rational justification for acting in such a semi-myopic way.
We first show that the usual notions of incentive compatibility in dominant
strategies cannot be used in this case, since no such equilibrium can
obtain better than an M-approximation.

Instead we suggest a new notion of "Set-Nash" equilibria, where we cannot
pinpoint a single best-response strategy, but rather only a set of possible
best-response strategies. These strategies are all semi-myopic and thus our
mechanisms will perform well on any of them.

We believe that this notion is of independent interest.

(joint work with Noam Nisan)

 
