Welcome to the Princeton Theory Lunch page! The Princeton TCS Seminar, fondly known as Theory Lunch, is a weekly seminar organized by the Princeton TCS group every Friday from 12:15 pm to 1:15 pm, with food usually served at 12:00 pm. It is open to all members of the Princeton community. Note the new time

Talks are also simultaneously broadcasted via Zoom. The Zoom links are shared every week by email. All talks are recorded and uploaded on our YouTube channel, Princeton TCS.

This semester theory lunch is organized by Linda Cai and Barak Nehoran.

If you'd like to be notified about future Theory Lunch talks please subscribe to the theory-read mailing list.

## Spring 2023 Schedule

Faster Walsh-Hadamard Transform from Matrix Non-Rigidity

Josh Alman from IAS

February 10, 2023 | Computer Science 105

The Walsh-Hadamard transform is a simple recursively-defined linear transformation with many applications throughout algorithm design and complexity theory. The folklore "fast Walsh-Hadamard transform" algorithm computes it using only N log N arithmetic operations, which is believed to be optimal up to constant factors. In this talk, I'll give two improved constructions:

- A new algorithm which uses $(23/24) N log N + O(N)$ arithmetic operations. This is, to my knowledge, the first improvement of the leading constant. (Based on joint work with Kevin Rao.)

- A new depth-2 linear circuit of size $O(N^{1.45})$, improving on size $O(N^{1.5})$ which one gets from the fast Walsh-Hadamard transform approach. (Based on joint work with Ashwin Padaki and Yunfeng Guan.)

A key idea behind both constructions is to take advantage of the matrix non-rigidty of the Walsh-Hadamard transform. A matrix is called "rigid" if it's impossible to change a "small" number of its entries to make it have "low" rank. L. Valiant showed that if a matrix is rigid, then this implies lower bounds for computing its linear transformation. However, a recent line of work has shown that many matrices of interest, including the Walsh-Hadamard transform, are actually not rigid. Although a formal converse to Valiant's result is not known, we'll make use of non-rigidity in our new constructions.

A Subpolynomial Approximation Algorithm for Graph Crossing Number in Low-Degree Graphs

Zihan Tan from Rutgers / DIMACS

February 17, 2023 | Computer Science 105

Graph Crossing Number is a fundamental and extensively studied problem with wide ranging applications. In this problem, the goal is to draw an input graph G in the plane so as to minimize the number of crossings between the images of its edges. The problem is notoriously difficult, and despite extensive work, non-trivial approximation algorithms are only known for bounded-degree graphs. Even for this special case, the best current algorithm achieves a O(n)-approximation, while the best current negative results do not rule out constant-factor approximation. All current approximation algorithms for the problem build on the same paradigm, which is also used in practice: compute a set E’of edges (called a planarizing set) such that G \ E’ is planar; compute a planar drawing of G \ E’; then add the drawings of the edges of E to the resulting drawing. Unfortunately, there are examples of graphs G, in which any implementation of this method must incur Ω(OPT2) crossings, where OPT is the value of the optimal solution. This barrier seems to doom the only currently known approach to designing approximation algorithms for the problem, and to prevent it from yielding a better than O(n)-approximation.

We propose a new paradigm that allows us to overcome this barrier. Using the new paradigm, we reduce the Crossing Number problem to Crossing Number with Rotation System – a variant of the Crossing Number problem, in which the ordering of the edges incident to every vertex is fixed as part of input. We then show a randomized algorithm for this new problem, that allows us to obtain a subpolynomial approximation for Graph Crossing Number on low-degree graphs.

This talk is based on joint work with Julia Chuzhoy and Sepideh Mahabadi.

Almost linear time algorithms for all flows

Sushant Sachdeva from University of Toronto

February 24, 2023 | Computer Science 105

Over the last decade or so, combining methods from continuous optimization and analysis with graph theoretic-insights has led to a revolution in algorithms for classic problems on graphs such as maximum flow.

In this talk, I will present some of our key ideas behind our recent work that gives almost-linear time algorithms for solving all convex flow problems on graphs. This implies almost linear time algorithms for max-flow, minimum-cost flow, bipartite matching, optimal transport, matrix scaling, isotonic regression, and several other well-studied problems.

Our algorithm is designed using a new Interior Point Method (IPM) that builds the flow as a sequence of almost-linear number of approximate undirected minimum-ratio cycles, each of which is computed and processed very efficiently using a new dynamic data structure.

Joint work with Li Chen, Rasmus Kyng, Yang Liu, Richard Peng, and Maximilian Probst Gutenberg.

Robust and Equitable Uncertainty Estimation

Aaron Roth from University of Pennsylvania

March 03, 2023 | Friend Center 004

Machine learning provides us with an amazing set of tools to make predictions, but how much should we trust particular predictions? To answer this, we need a way of estimating the confidence we should have in particular predictions of black-box models. Standard tools for doing this give guarantees that are averages over predictions. For instance, in a medical application, such tools might paper over poor performance on one medically relevant demographic group if it is made up for by higher performance on another group. Standard methods also depend on the data distribution being static --- in other words, the future should be like the past.

In this talk, I will describe new techniques to address both these problems: a way to produce prediction sets for arbitrary black-box prediction methods that have correct empirical coverage even when the data distribution might change in arbitrary, unanticipated ways and such that we have correct coverage even when we zoom in to focus on demographic groups that can be arbitrary and intersecting. When we just want correct group-wise coverage and are willing to assume that the future will look like the past, our algorithms are especially simple. Our approach is based on a quantile analogue of (mean) multi-calibration, a recent solution concept arising from the algorithmic fairness literature. We close by asking which distributional properties (beyond means and quantiles) we can achieve multicalibration with respect to, and provide a complete characterization via a connection to property elicitation.

This talk is based on two papers, that are joint work with Osbert Bastani, Varun Gupta, Chris Jung, Georgy Noarov, and Ramya Ramalingam.

Last-Iterate Convergence in Min-Max Optimization: SOS to the Rescue

Yang Cai from Yale University

March 10, 2023 | Computer Science 105

Min-max optimization is a classical problem in optimization and game theory and has recently played a central role in modern machine learning (ML) applications. These applications range from generative adversarial networks, adversarial examples, robust optimization, to multi-agent reinforcement learning. These ML applications present novel challenges to min-max optimization, necessitating that algorithms (i) demonstrate last-iterate convergence, in contrast to the more traditional average-iterate convergence, and (ii) be robust in nonconvex-nonconcave environments.

In this talk, we first focus on the last-iterate convergence rates of two classical and widely-used algorithms for solving convex-concave min-max optimization – the extragradient (EG) method by Korpelevich (1976) and the optimistic gradient (OG) method by Popov (1980). Despite the considerable attention these algorithms have received from both the optimization and machine learning communities over the years, the last-iterate convergence rates for both algorithms remained open. We resolve this open problem by showing both EG and OG exhibit a tight O(1/\sqrt{T}) last-iterate convergence rate under arbitrary convex constraints. In the second part of the talk, we discuss some recent progress on designing optimal first-order methods for structured nonconvex-nonconcave min-max optimization. Our results are obtained via a new sum-of-squares programming based approach, which may be useful in analyzing other iterative methods. The talk is based on joint work with Argyris Oikonomou and Weiqiang Zheng.

1) On the Structural Complexities of Matching Mechanisms; 2) A Computational Separation Between Quantum No-cloning and No-teleportation

Clay Thomas and Barak Nehoran from Princeton University

March 24, 2023 | Computer Science 105

Talk 1: School districts in many cities employ school choice mechanisms, i.e., algorithms matching students to schools on the basis of students’ reported preferences, and the priorities students have for attending each school. Since simplicity and ease of participation are paramount to these mechanisms, an exact understanding of their structural complexities—e.g., how the matching can be described, and how one student’s input can affect the outputs—should be a first-order concern.

In this talk, I will give one example of this type of complexity: that of describing the outcome matching. Prior work has observed that for one popular real-world mechanism (DA), the outcome can be described in terms of a single "priority cutoff" per school, even when there are many more students than there are schools. We prove that, for another popular mechanism (TTC), the outcome cannot be described using less than a bit for each pair of schools. This result clarifies and formalizes one way in which, as often informally argued in both the literature and in practice, TTC may be more complex than DA.

Depending on time, I may also mention connections to verification (in the sense of nondeterministic communication), and how we formalize the question "how sensitive is the mechanism to one student’s input" in this setting.

Based on joint work with Yannai Gonczarowski.

Talk 2: Two of the fundamental no-go theorems of quantum information are the no-cloning theorem (that it is impossible to make copies of general quantum states) and the no-teleportation theorem (the prohibition on sending quantum states over classical channels without pre-shared entanglement). They are known to be equivalent, in the sense that a collection of quantum states is clonable if and only if it is teleportable.

Our main result suggests that this is not the case when computational efficiency is considered. We give a collection of quantum states and oracles relative to which these states are efficiently clonable but not efficiently teleportable. Given that the opposite scenario is impossible (states that can be teleported can always trivially be cloned), this gives the most complete oracle separation possible between these two important no-go properties.

In doing so, we introduce a related quantum no-go property, reconstructibility, which refers to the ability to construct a quantum state from a uniquely identifying classical description. We show the stronger result of a collection of quantum states that are efficiently clonable but not efficiently reconstructible. This novel no-go property only exists in relation to computational efficiency, as it is trivial for unbounded computation. It thus opens up the possibility of further computational no-go properties that have not yet been studied because they do not exist outside the computational context.

Joint work with Mark Zhandry.

### Past Semesters Archive

## Fall 2022 Schedule

Random restrictions on boolean functions with small influences

Pei Wu from IAS

September 16, 2022 | Friend Center Convocation Room | Talk video

In the talk, we discuss the probability of Boolean functions with small max influence to become constant under random restrictions. Let f be a Boolean function such that the variance of f is $\Omega(1)$ and all its individual influences are bounded by $\tau$. We show that when restricting all but a $\tilde{\Omega}((\log1/\tau)^{-1})$ fraction of the coordinates, the restricted function remains nonconstant with overwhelming probability. This bound is essentially optimal, as witnessed by the tribes function $AND_{n/C\log n} \circ OR_{C\log n}$.

We extend it to an anti-concentration result, showing that the restricted function has nontrivial variance with probability $1-o(1)$. This gives a sharp version of the "it ain't over till it's over" theorem due to Mossel, O'Donnell, and Oleszkiewicz.

Negative-Weight Single-Source Shortest Paths in Near-linear Time

Aaron Bernstein from Rutgers University

September 23, 2022 | Friend Center 004

We present a randomized algorithm that computes single-source shortest paths (SSSP) in $O(m\log^8(n)\log W)$ time when edge weights are integral, can be negative, and are $\geq -W$. This essentially resolves the classic negative-weight SSSP problem. The previous bounds are $\~{O}((m+n^{1.5})\log W)$ and $m^{4/3+o(1)}\log W$. Near-linear time algorithms were known previously only for the special case of planar directed graphs. Also, independently of our work, the recent breakthrough on min-cost flow [Chen, Kyng, Liu, Peng, Probst-Gutenberg and Sachdeva] implies an algorithm for negative-weight SSSP with running time $m^{1+o(1)}$.

In contrast to all recent developments that rely on sophisticated continuous optimization methods and dynamic algorithms, our algorithm is simple: it requires only a simple graph decomposition and elementary combinatorial tools. In fact, ours is the first combinatorial algorithm for negative-weight SSSP to break through the classic $O(m\sqrt{n}\log nW)$ bound from over three decades ago [Gabow and Tarjan SICOMP'89].

Kolmogorov Complexity Characterizes Statistical Zero Knowledge

Eric Allender from Rutgers University

September 30, 2022 | Friend Center 006

We show that a decidable promise problem has a non-interactive statistical zero-knowledge proof system if and only if it is randomly reducible to a promise problem for Kolmogorov-random strings, with a superlogarithmic additive approximation term. This extends recent work by Saks and Santhanam (CCC 2022). We build on this to give new characterizations of Statistical Zero Knowledge (SZK), as well as the related classes NISZK_L and SZK_L.

This is joint work with Shuichi Hirahara and Harsha Tirumala. A paper is available as ECCC TR22-127.

New Directions in Derandomization: Non-Black-Box Techniques, Superfast Algorithms

Roei Tell from IAS

October 07, 2022 | Friend Center 004

This talk will present two new directions in the study of derandomization, which are related to each other. The talk is based on a sequence of recent works with Lijie Chen (some of the works are upcoming).

The first direction is avoiding classical PRGs in favor of *non-black-box techniques*, which extract pseudorandomness for a probabilistic machine from the code of that specific machine. This approach allows constructing derandomization algorithms from weaker hypotheses (compared to classical results), and also constructing very fast derandomization algorithms that cannot be obtained using classical PRGs. The second direction is the fine-grained study of *superfast derandomization*, asking what is the smallest possible overhead that we need to pay when eliminating randomness (and whether we can get away with paying nothing at all).

As examples, we will mention four results:

1. Connecting the BPP = P conjecture to new types of lower bounds.

2. "Free lunch" derandomization with essentially no time overhead.

3. Superfast derandomization of interactive proof systems.

4. Average-case derandomization from weak hardness assumptions. (Joint work with Ron Rothblum.)

Almost Ramanujan Expanders from Arbitrary Expanders via Operator Amplification

Fernando Granha Jeronimo from IAS

October 14, 2022 | Friend Center 004

Expander graphs are fundamental objects in theoretical computer science and mathematics. They have numerous applications in diverse fields such as algorithm design, complexity theory, coding theory, pseudorandomness, group theory, etc.

In this talk, we will describe an efficient algorithm that transforms any bounded degree expander graph into another that achieves almost optimal (namely, near-quadratic, $d \leq 1/\lambda^{2+o(1)}$) trade-off between (any desired) spectral expansion $\lambda$ and degree $d$. The optimal quadratic trade-off is known as the Ramanujan bound, so our construction gives almost Ramanujan expanders from arbitrary expanders.

This transformation preserves structural properties of the original graph, and thus has many consequences. Applied to Cayley graphs, our transformation shows that any expanding finite group has almost Ramanujan expanding generators. Similarly, one can obtain almost optimal explicit constructions of quantum expanders, dimension expanders, monotone expanders, etc., from existing (suboptimal) constructions of such objects.

Our results generalize Ta-Shma's technique in his breakthrough paper [STOC 2017] used to obtain explicit almost optimal binary codes. Specifically, our spectral amplification extends Ta-Shma's analysis of bias amplification from scalars to matrices of arbitrary dimension in a very natural way.

Joint work with: Tushant Mittal, Sourya Roy and Avi Wigderson

Convex Spherical Cubes

Oded Regev from NYU

October 28, 2022 | Friend Center 004

We show that there exists a convex body that tiles $\mathbb{R}^n$ under translations by $\mathbb{Z}^n$ whose surface area is $\tilde{O}(\sqrt{n})$. No asymptotic improvement over the trivial $O(n)$ (achieved by the cube $[0,1]^n$) was previously known. A non-convex body achieving surface area $O(\sqrt{n})$ was constructed by Kindler, O’Donnell, Rao, and Wigderson [FOCS 2008].

For some fun background on this question, see Kindler et al.'s CACM article: https://cacm.acm.org/magazines/2012/10/155546-spherical-cubes/fulltext and Klarreich's broader review: https://www.jstor.org/stable/27857995.

Joint work with Assaf Naor.

Maintaining Privacy of the Minority without Ignoring It: Differentially Private Oversampling for Imbalanced Learning

Rachel Cummings from Columbia University

November 04, 2022 | Friend Center 004

The problem of learning from imbalanced datasets, where the label classes are not equally represented, arises often in practice. A widely used method to combat this problem is preprocessing techniques to amplify the minority class, e.g., by reusing or resampling the existing minority instances. However, when the training data are sensitive and privacy is a concern, this will also amplify privacy leakage from members of the minority class. Therefore this oversampling pre-processing step must be designed with privacy in mind. In this work, we present an algorithm for differentially private synthetic minority oversampling technique (DP-SMOTE) that guarantees formal privacy for the minority class. We show empirically that there is no statistically significant loss in performance relative to existing non-private SMOTE tools. We also show that even when non-private SMOTE is used as a part of a DP learning pipeline as a pre-processing step before downstream private classification, the upstream non-private oversampling increases sensitivity exponentially in the dimensionality of the data, which harms classification accuracy at a corresponding rate. The algorithmic structure also yields insights about the implications for pre-processing before the use of DP algorithms, which may be of independent interest.

(joint work with Yuliia Lut, Eitan Turok, and Marco Avella Medina)

Algorithmic Applications of Hypergraph and Partition Containers

Or Zamir from IAS and Princeton University

November 11, 2022 | Friend Center 004

The relatively new Hypergraph Container Method (Saxton and Thomason 2015, Balogh, Morris and Samotij 2015), generalizing an earlier version for graphs (Sapozhenko 2001), has been used extensively in recent years in extremal combinatroics. In essence, it shows that independent sets in regular-enough uniform hypergraphs must be clustered in some fashion. We show how to use this method algorithmically to convert algorithms for arbitrary inputs into faster algorithms for almost-regular input instances. Informally, an almost-regular input is an input in which the maximum degree is larger than the average degree by at most a constant factor. This vastly generalizes several families of inputs for which we commonly have improved algorithms.

An important component of our work is the generalization of (hyper-)graph containers to Partition Containers. While standard containers apply to a single independent set, partition containers apply to several independent sets at the same time.

Our first main application is a resolution of a major open problem about Graph Coloring algorithms, for almost-regular graphs. The chromatic number of a graph can be computed in $2^n$. Better algorithms are known only for $k<7$. We solve $k$-coloring in $(2-\epsilon)^n$ time for graphs in which the maximum degree is at most $C$ times the average degree, where $\epsilon > 0$ for any $k, C$. This includes, for example, regular graphs of any degree (with $C = 1$), and graphs with at least $\epsilon n^2$ edges for any fixed constant $\epsilon$ (with $C = 1 / 2 \epsilon$).

Our second main application is the first improved $k$-SAT algorithm for dense formulas. The celebrated sparsification lemma (Impagliazzo, Paturi and Zane 2001) implies that solving $k$-SAT can be reduced to solving $k$-SAT on sparse formulas. We show a complementing statement; For every $k, b$ there exists a $C$ such that if $k$-SAT can be solved in $b^n$ time, then $k$-SAT on formulas with at least $C \cdot n$ clauses that are well-spread can be solved in $(b-\epsilon)^n$ time.

We give additional applications including faster algorithms for unweighted and weighted Maximum Independent Set in almost-regular graphs.

Linear Probing Revisited: Tombstones Mark the Demise of Primary Clustering

William Kuszmaul from MIT

November 18, 2022 | Friend Center 004

The linear-probing hash table is one of the oldest and most widely used data structures in computer science. However, linear probing also famously comes with a major drawback: as soon as the hash table reaches a high memory utilization, elements within the hash table begin to cluster together, causing insertions to become slow. This phenomenon, now known as "primary clustering", was first captured by Donald Knuth in 1963; at a load factor of $1 - 1/x$, the expected time per insertion becomes $\Theta(x^2)$, rather than the more desirable $\Theta(x)$.

We show that there is more to the story than the classic analysis would seem to suggest. It turns out that small design decisions in how deletions are implemented have dramatic effects on the asymptotic performance of insertions. If these design decisions are made correctly, then even a hash table that is continuously at a load factor $1 - \Theta(1/x)$ can achieve average insertion time $\tilde{O}(x)$. A key insight is that the tombstones left behind by deletions cause a surprisingly strong "anti-clustering" effect, and that when insertions and deletions are one-for-one, the anti-clustering effects of deletions actually overpower the clustering effects of insertions.

We also present a new variant of linear probing, which we call "graveyard hashing", that completely eliminates primary clustering on any sequence of operations. If, when an operation is performed, the current load factor is $1 - 1/x$ for some $x$, then the expected cost of the operation is $O(x)$.

Based on joint work with Michael A. Bender and Bradley C. Kuszmaul. FOCS '21. Available at https://arxiv.org/abs/2107.01250.

1) On the Hardness of Dominant Strategy Mechanism Design; 2) Memory-Sample Lower Bounds for Learning with Classical-Quantum Hybrid Memory

Shiri Ron and Wei Zhan from Weitzmann Institute and Princeton University

December 02, 2022 | Friend 004

Talk 1: We study the communication complexity of dominant strategy implementations of combinatorial auctions. We start with two domains that are generally considered “easy”: multi-unit auctions with decreasing marginal values and combinatorial auctions with gross substitutes valuations. For both domains we have fast algorithms that find the welfare-maximizing allocation with communication complexity that is poly-logarithmic in the input size. This immediately implies that welfare maximization can be achieved in an ex-post equilibrium with no significant communication cost, by using VCG payments. In contrast, we show that in both domains the communication complexity of any dominant strategy implementation that achieves the optimal welfare is polynomial in the input size. We then study the approximation ratios achievable by dominant strategy mechanisms. For combinatorial auctions with general valuations, we show that no dominant-strategy mechanism achieves an approximation ratio of $m^{1−\eps}$, where m is the number of items. In contrast, a randomized dominant strategy mechanism that achieves an O(√m) approximation. This proves the first gap between computationally efficient deterministic dominant strategy mechanisms and randomized ones. Joint work with Shahar Dobzinski and Jan Vondrak.

Talk 2: In this talk, I will present our study on classical learning tasks in the presence of quantum memory. We prove that any quantum algorithm with both, classical memory and quantum memory, for parity learning on n bits, requires either Ω(n^2) bits of classical memory or $\Omega(n)$ bits of quantum memory or an exponential number of samples. The work by Raz showed that any classical algorithm requires either Ω(n^2) bits of memory or an exponential number of samples, and our results refute the possibility that a small amount of quantum memory significantly reduces the size of classical memory needed for efficient learning. I will briefly review the techniques used in the proof for classical memory-sample lower bound, discuss what prevents us from directly applying these techniques to the quantum case, and give a sketch of the actual proof for our results. This is a joint work with Qipeng Liu and Ran Raz.

Optimal Weak to Strong Learning

Kasper Green Larsen from Aarhus University

December 09, 2022 | CS 105

The classic algorithm AdaBoost allows to convert a weak learner, that is an algorithm that produces a hypothesis which is slightly better than chance, into a strong learner, achieving arbitrarily high accuracy when given enough training data. We present a new algorithm that constructs a strong learner from a weak learner but uses less training data than AdaBoost and all other weak to strong learners to achieve the same generalization bounds. A sample complexity lower bound shows that our new algorithm uses the minimum possible amount of training data and is thus optimal. Hence, this work settles the sample complexity of the classic problem of constructing a strong learner from a weak learner.

This work was accepted at NeurIPS’22 and is joint work with Martin Ritzert from Aarhus University

Pseudorandom rotations: explicit approximate designs for some Lie groups

Pedro Paredes from Princeton University

December 16, 2022 | CS 105

Since the introduction of randomness as an algorithmic tool, researchers have been trying to study how to reduce the amount of randomness that these probabilistic algorithms need. This is achieved through lots of different ways, but a common one is to develop algorithms that produce objects (like sequences of bits) that "look random" enough that they fool probabilistic algorithms. These algorithms that produce random looking objects are known as pseudorandom generators. In addition to the intrinsic motivation of understanding how randomness and computation interact, there is a broader appeal to using these tools to find explicit constructions of objects that are known to exist through probabilistic methods.

One of the many successes of this area is the observation that very often we only need bounded independence. Broadly speaking, sampling a truly uniform random $N$-dimensional object may require $\Omega(N)$ bits of randomness, but for many applications it is sufficient to use objects that only look truly random when inspecting $k \ll N$ dimensions, known as "$k$-wise independent". One basic example of this is $k$-wise independent permutations: a distribution over of permutations on $N$ elements is $k$-wise independent if a permutation sampled from it maps any distinct $k$ elements to any distinct $k$ elements equally likely (i.e. any subset of $k$ elements looks uniform). Thanks to Kassabov [Kas07] and Kaplan–Naor–Reingold [KNR09], it turns out that such a distribution can be created using only $O(k\log N)$ random bits to sample from (with a small error), much less than in the fully independent case.

In this talk, we present a framework that generalizes these works to obtain $k$-wise independent approximate orthogonal and unitary matrices (formally known as approximate orthogonal/unitary designs). Given a classical group of Lie type with certain properties, our framework uses the properties of the group to approximate to "$k$-locally" approximate the Haar (uniform) measure over elements of the group. We do so by first showing how to obtain a pseudorandom generator with a non-trivial yet big error, and then showing how to reduce this error using a generalization of expander random walks to operators.

Joint work with Ryan O'Donnell and Rocco Servedio