Algorithms/Machine Learning Reading Group


What - This is a reading group primarily attended by the research groups of Prof. Sanjeev Arora and Prof. Elad Hazan. We discuss exciting recent advances in algorithm design and theoretical machine learning.
When - Thursdays 12:30 pm
Where - CS 402

Upcoming Talk

Bypassing KLS: Gaussian Cooling and an O*(n^3) Volume Algorithm

Thursday 29th October 12:30 pm
Authors: Ben Cousins, Santosh Vempala
Speaker: Karan Singh
Links: Arxiv Link

Abstract: We present an O*(n^3) randomized algorithm for estimating the volume of a well-rounded convex body given by a membership oracle, improving on the previous best complexity of O*(n^4). The new algorithmic ingredient is an accelerated cooling schedule where the rate of cooling increases with the temperature. Previously, the known approach for potentially achieving such complexity relied on a positive resolution of the KLS hyperplane conjecture, a central open problem in convex geometry.
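As a rough illustration of the multi-phase Gaussian-cooling idea (not the paper's accelerated O*(n^3) algorithm), here is a toy Python sketch. The convex body (a cube), the naive rejection sampler, and the simple (1 + 1/sqrt(n)) schedule are illustrative assumptions; the paper instead uses an efficient membership-oracle random walk and a schedule whose rate grows with the temperature.

```python
import numpy as np

# Toy sketch of multi-phase Gaussian volume estimation, NOT the paper's
# O*(n^3) algorithm: sampling is naive rejection sampling rather than a
# membership-oracle random walk, and the schedule is the standard
# (1 + 1/sqrt(n)) rate rather than the accelerated one.

rng = np.random.default_rng(0)
N = 3  # dimension (kept tiny so rejection sampling stays feasible)

def in_body(x):
    # Membership oracle for an example convex body: the cube [-1, 1]^N.
    return np.all(np.abs(x) <= 1.0)

def sample_gaussian_in_body(sigma, num):
    # Rejection-sample N(0, sigma^2 I) restricted to the body.
    out = []
    while len(out) < num:
        xs = rng.normal(scale=sigma, size=(num, N))
        out.extend(x for x in xs if in_body(x))
    return np.array(out[:num])

def estimate_volume(samples_per_phase=2000):
    # Phase 0: a Gaussian concentrated enough to lie essentially inside the
    # body, so Z_0 = int_K exp(-|x|^2 / (2 s2)) dx ~ (2 pi s2)^(N/2).
    s2 = 0.01
    log_vol = 0.5 * N * np.log(2 * np.pi * s2)
    # "Cool" by raising the variance; each phase estimates Z_{i+1} / Z_i.
    while s2 < 1.0:
        next_s2 = s2 * (1.0 + 1.0 / np.sqrt(N))
        xs = sample_gaussian_in_body(np.sqrt(s2), samples_per_phase)
        r2 = np.sum(xs ** 2, axis=1)
        # E over the restricted Gaussian of the reweighting factor = Z_next / Z_cur.
        log_vol += np.log(np.mean(np.exp(-0.5 * r2 * (1 / next_s2 - 1 / s2))))
        s2 = next_s2
    # Final phase: convert from the (now wide) Gaussian to the uniform measure:
    # vol(K) = Z_m * E over the restricted Gaussian of exp(|x|^2 / (2 s2)).
    xs = sample_gaussian_in_body(np.sqrt(s2), samples_per_phase)
    log_vol += np.log(np.mean(np.exp(0.5 * np.sum(xs ** 2, axis=1) / s2)))
    return np.exp(log_vol)

print(estimate_volume())  # true volume of [-1, 1]^3 is 8
```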

Calendar

Archived Talks - Fall 2015

A survey on the brain's representation of language

Thursday 22nd October 12:30 pm
Speaker: Kiran Vodrahalli

Abstract: I will be speaking about how to model the relationship between brain activity and language. I will first give some background on brain data, and fMRI in particular, quickly covering Mitchell's 2008 paper on predicting brain activity associated with the meaning of nouns (http://www.cs.cmu.edu/~tom/pubs/science2008.pdf). I will then discuss Pereira et al.'s work on generating text from fMRI (http://www.princeton.edu/~fpereira/Papers/fnhum-05-00072.pdf), and follow up with a 2014 paper from Mitchell's group at CMU identifying story features predictive of the fMRI signal as subjects read Harry Potter (http://www.cs.cmu.edu/~fmri/plosone/files/hp_PLOS_ONE.pdf). At the end, if there is still time, I will speak about word vectors derived from joint embeddings of text and fMRI data, based on a paper by Fyshe (a student of Tom Mitchell): http://aclweb.org/anthology/P14-1046.
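A common technical ingredient in these papers is a learned linear map between stimulus features (e.g. word vectors) and voxel activity, evaluated on held-out stimuli. The sketch below illustrates that setup on synthetic data; the ridge penalty, data shapes, and correlation-based evaluation are illustrative assumptions, not any specific paper's pipeline.

```python
import numpy as np

# Minimal sketch of the regression setup common to these papers: learn a
# linear map from stimulus features to fMRI voxel activity, then predict
# held-out stimuli. The synthetic data, ridge penalty, and train/test split
# below are illustrative assumptions.

rng = np.random.default_rng(0)

n_stimuli, n_features, n_voxels = 60, 25, 500
X = rng.normal(size=(n_stimuli, n_features))      # stimulus features (e.g. word vectors)
W_true = rng.normal(size=(n_features, n_voxels))  # unknown feature-to-voxel map
Y = X @ W_true + 0.5 * rng.normal(size=(n_stimuli, n_voxels))  # noisy "fMRI" responses

train, test = slice(0, 50), slice(50, 60)
lam = 1.0  # ridge regularization strength

# Closed-form ridge regression, fit jointly for all voxels:
# W = (X'X + lam I)^-1 X'Y.
Xtr, Ytr = X[train], Y[train]
W = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(n_features), Xtr.T @ Ytr)

# Score by correlating predicted and observed responses on held-out stimuli
# (a simplified stand-in for the leave-two-out evaluations used in this literature).
pred = X[test] @ W
corr = [np.corrcoef(pred[i], Y[test][i])[0, 1] for i in range(pred.shape[0])]
print("mean held-out correlation:", np.mean(corr))
```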

Recent Progress in Sequence-to-Sequence Learning with Recurrent Neural Networks

Thursday 15th October 12:30 pm
Speaker: Tengyu Ma

Abstract: Deep learning approaches have achieved great empirical success in image and speech understanding. Strongly motivated by natural language processing, there has been rapid growth of interest and progress in sequence-to-sequence learning with various forms of recurrent neural networks. In this talk I will survey this empirical progress (in relatively theoretician-friendly language). I plan to cover notions/models including recurrent neural networks (RNNs), long short-term memory (LSTM), memory networks, neural Turing machines, pointer networks, etc., and their applications to NLP tasks such as machine translation, question answering, and image caption generation. No prior knowledge is required.
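For readers who have not met the acronym before, here is a minimal numpy sketch of a single LSTM cell step. The gating equations follow the standard formulation rather than any particular paper covered in the talk, and the toy loop at the end only plays the role of an encoder.

```python
import numpy as np

# One LSTM cell step, written out to pin down the update equations behind the
# acronym. Shapes, initialization, and the example sequence are illustrative.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    # Concatenate previous hidden state and current input, as is conventional.
    z = np.concatenate([h_prev, x])
    i = sigmoid(params["Wi"] @ z + params["bi"])   # input gate
    f = sigmoid(params["Wf"] @ z + params["bf"])   # forget gate
    o = sigmoid(params["Wo"] @ z + params["bo"])   # output gate
    g = np.tanh(params["Wg"] @ z + params["bg"])   # candidate cell update
    c = f * c_prev + i * g                         # new cell state
    h = o * np.tanh(c)                             # new hidden state
    return h, c

rng = np.random.default_rng(0)
d_in, d_hid = 8, 16
params = {name: rng.normal(scale=0.1, size=(d_hid, d_hid + d_in))
          for name in ["Wi", "Wf", "Wo", "Wg"]}
params.update({b: np.zeros(d_hid) for b in ["bi", "bf", "bo", "bg"]})

# Run the cell over a toy input sequence (the "encoder" half of a
# sequence-to-sequence model; a decoder would run a second LSTM
# conditioned on the final h, c).
h, c = np.zeros(d_hid), np.zeros(d_hid)
for x in rng.normal(size=(5, d_in)):
    h, c = lstm_step(x, h, c, params)
print(h[:4])
```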

Convex Bandits

Thursday 8th October 12:30 pm
Speaker: Sebastien Bubeck
Authors: Sebastien Bubeck, Ronen Eldan
Links: Preprint on speaker's webpage

Abstract: I will prove that the minimax regret for bandit convex optimization is O(poly(n) sqrt(T)), thus resolving a decade-old open problem. The proof is based on an information theoretic analysis of Bayesian bandit problems, as well as a new understanding of convex functions. Joint work with Ronen Eldan.
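To make the bandit convex optimization setting concrete, here is a small Python sketch of the problem together with the much older one-point gradient-estimation strategy of Flaxman, Kalai, and McMahan, which only achieves roughly T^(3/4) regret. The paper's information-theoretic algorithm is not reproduced here, and the quadratic loss, domain, and step sizes below are illustrative choices.

```python
import numpy as np

# Sketch of the bandit convex optimization setup with the classic one-point
# gradient estimator (Flaxman-Kalai-McMahan), NOT the paper's algorithm.
# The learner observes only the loss value at the point it plays.

rng = np.random.default_rng(0)
n, T = 5, 20000
delta, eta = 0.1, 0.001        # exploration radius and step size (untuned)
x_star = 0.3 * np.ones(n)      # hidden minimizer inside the unit ball

def loss(x):
    # Unknown convex loss; only its value at the played point is revealed.
    return float(np.sum((x - x_star) ** 2))

def project(x, radius):
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else x * (radius / nrm)

x = np.zeros(n)
total = 0.0
for t in range(T):
    u = rng.normal(size=n)
    u /= np.linalg.norm(u)                # random direction on the unit sphere
    y = x + delta * u                     # play a perturbed point
    val = loss(y)                         # the single bandit observation
    grad_est = (n / delta) * val * u      # one-point estimate of the gradient
    x = project(x - eta * grad_est, 1.0 - delta)  # stay inside the domain
    total += val

print("average loss played:", total / T)
print("loss of best fixed point:", loss(x_star))
```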

Train faster, generalize better: Stability of stochastic gradient descent

Speaker: Zeyuan Allen-Zhu
Authors: Moritz Hardt, Ben Recht, Yoram Singer
Links: Arxiv Version

Abstract: We show that any model trained by a stochastic gradient method with few iterations has vanishing generalization error. We prove this by showing the method is algorithmically stable in the sense of Bousquet and Elisseeff. Our analysis only employs elementary tools from convex and continuous optimization. Our results apply to both convex and non-convex optimization under standard Lipschitz and smoothness assumptions. Applying our results to the convex case, we provide new explanations for why multiple epochs of stochastic gradient descent generalize well in practice. In the nonconvex case, we provide a new interpretation of common practices in neural networks, and provide a formal rationale for stability-promoting mechanisms in training large, deep models. Conceptually, our findings underscore the importance of reducing training time beyond its obvious benefit.
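As a small numerical illustration of the stability notion in play (in the spirit of Bousquet and Elisseeff, not a reproduction of the paper's analysis), the Python sketch below runs SGD with a shared sampling order on two training sets that differ in a single example and tracks how far the two parameter vectors drift apart as the number of passes grows. The least-squares model, learning rate, and data are illustrative choices.

```python
import numpy as np

# Illustration of algorithmic stability: run SGD on two training sets that
# differ in one example, with the same sampling order, and measure the
# distance between the resulting parameter vectors for increasing epoch
# budgets. Model, data, and hyperparameters are illustrative assumptions.

rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Neighboring dataset: replace a single example.
X2, y2 = X.copy(), y.copy()
X2[0] = rng.normal(size=d)
y2[0] = X2[0] @ w_true + 0.1 * rng.normal()

def sgd(X, y, epochs, lr=0.01, seed=1):
    # Plain SGD on squared loss; the sampling order is shared via the seed so
    # the two runs differ only through the single replaced example.
    order_rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in order_rng.permutation(len(y)):
            grad = (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
    return w

for epochs in [1, 5, 20, 80]:
    w_a = sgd(X, y, epochs)
    w_b = sgd(X2, y2, epochs)
    print(epochs, "epochs -> parameter distance", np.linalg.norm(w_a - w_b))
```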