Archived Talks - Fall 2015
A survey on the brain's representation of language
Thursday 22nd October 12.30 pm
Speaker: Kiran Vodrahalli
Abstract: I will be speaking about how to model the relationship between brain activity and language. I will first give some background on brain data and fMRI in particular, quickly covering Mitchell's 2008 paper on predicting brain activity associated with the meaning of nouns (http://www.cs.cmu.edu/~tom/pubs/science2008.pdf). I will then discuss Pereira et al.'s work on generating text from fMRI (http://www.princeton.edu/~fpereira/Papers/fnhum-05-00072.pdf), and follow up with a 2014 paper from Mitchell's group at CMU identifying story features predictive of the fMRI signal as subjects read Harry Potter (http://www.cs.cmu.edu/~fmri/plosone/files/hp_PLOS_ONE.pdf). If time remains at the end, I will discuss word vectors derived from joint embeddings of text and fMRI data, a paper by Fyshe (a student of Tom Mitchell): http://aclweb.org/anthology/P14-1046.
Recent Progress in Sequence-to-Sequence Learning with Recurrent Neural Networks
Thursday 15th October 12.30 pm
Speaker: Tengyu Ma
Abstract: Deep learning approaches have achieved great empirical success in image and speech understanding. Strongly motivated by natural language processing, there has been rapid growth of interest and progress in sequence-to-sequence learning with various forms of recurrent neural networks. In this talk I will survey this empirical progress (in relatively theoretician-friendly language). I plan to cover notions/models including recurrent neural networks (RNNs), long short-term memory (LSTM), memory networks, neural Turing machines, and pointer networks, as well as their applications to NLP tasks, e.g., machine translation, question answering, and image caption generation. No prior knowledge is required.
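As a quick reference for the models named above, the vanilla RNN recurrence can be written in its standard textbook form (my notation, not taken from any particular paper covered in the talk):

```latex
% Vanilla RNN update: the hidden state h_t summarizes the input sequence so far.
h_t = \tanh\!\left(W_{hh}\, h_{t-1} + W_{xh}\, x_t + b_h\right), \qquad
y_t = W_{hy}\, h_t + b_y
% An LSTM replaces this single update with gated cell-state updates,
% which mitigate vanishing gradients over long sequences.
```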
Convex Bandits
Thursday 8th October 12.30 pm
Speaker: Sebastien Bubeck
Authors: Sebastien Bubeck, Ronen Eldan
Links: Preprint on speaker's webpage
Abstract: I will prove that the minimax regret for bandit convex optimization is O(poly(n) sqrt(T)), thus resolving a decade-old open problem. The proof is based on an information theoretic analysis of Bayesian bandit problems, as well as a new understanding of convex functions.
Joint work with Ronen Eldan.
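For readers unfamiliar with the setting, the quantity being bounded can be sketched as follows (standard notation for bandit convex optimization over a convex body K in R^n; this is a paraphrase, not necessarily the paper's exact formulation):

```latex
% Expected regret of a strategy playing x_1, \dots, x_T against convex losses
% f_1, \dots, f_T : K \to [0,1], observing only the scalar values f_t(x_t):
R_T \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T} f_t(x_t)\right]
      \;-\; \min_{x \in K} \sum_{t=1}^{T} f_t(x)
% The result: the minimax value of R_T is O(\mathrm{poly}(n)\,\sqrt{T}).
```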
Train faster, generalize better: Stability of stochastic gradient descent
Speaker: Zeyuan Allen-Zhu
Authors: Moritz Hardt, Ben Recht, Yoram Singer
Links: Arxiv Version
Abstract: We show that any model trained by a stochastic gradient method with few iterations has vanishing generalization error. We prove this by showing the method is algorithmically stable in the sense of Bousquet and Elisseeff. Our analysis only employs elementary tools from convex and continuous optimization. Our results apply to both convex and non-convex optimization under standard Lipschitz and smoothness assumptions.
Applying our results to the convex case, we provide new explanations for why multiple epochs of stochastic gradient descent generalize well in practice. In the nonconvex case, we provide a new interpretation of common practices in neural networks, and provide a formal rationale for stability-promoting mechanisms in training large, deep models. Conceptually, our findings underscore the importance of reducing training time beyond its obvious benefit.
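The stability notion at work here can be illustrated with a toy sketch (my example, not code from the paper): run SGD with the same random seed on two datasets that differ in a single example, and observe that the resulting parameters stay close. Stability in roughly this sense is what the paper connects to generalization.

```python
import random

def sgd(data, lr=0.01, epochs=25, seed=0):
    """SGD on 1-D least squares: minimize the mean of (w*x - y)^2 / 2."""
    rng = random.Random(seed)
    w = 0.0
    idx = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(idx)            # same seed => same sample order on both runs
        for i in idx:
            x, y = data[i]
            grad = (w * x - y) * x  # gradient of (w*x - y)^2 / 2 in w
            w -= lr * grad
    return w

# Two datasets that differ in exactly one example (the last one).
base = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
data_a = base + [(5.0, 10.0)]
data_b = base + [(5.0, 9.0)]   # one perturbed label

w_a = sgd(data_a)
w_b = sgd(data_b)
print(f"w_a={w_a:.4f}  w_b={w_b:.4f}  |gap|={abs(w_a - w_b):.4f}")
```

The small parameter gap is the informal signature of algorithmic stability: changing one training point barely changes the learned model, so the model cannot have memorized that point.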