Monday, January 5 
08:55 – 09:00
Address by Govindan Rangarajan, NMI Convener

09:00 – 10:30
Nicolò Cesa-Bianchi
University of Milan

The Online Approach to Machine Learning
Abstract:
The online paradigm has become a standard tool in machine learning and large-scale data analysis. Online learning can be viewed as a repeated game between an adaptive agent and an ever-changing environment. Within this simple paradigm, one can model a variety of sequential decision tasks simply by specifying the interaction protocol and the resource constraints on the learner. We will start the talk by introducing, through a novel and unified analysis, the main results for two well-known settings: prediction with expert advice and multi-armed bandits. A natural extension of this analysis will take us to the model of online convex optimization. This is a very general online learning framework, whose workhorse is the powerful mirror descent algorithm. In the second half of the talk we will explore the applications of mirror descent in expert-like and bandit-like scenarios, highlighting some of the most recent results and the main open problems.
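The expert-advice setting mentioned in the abstract is commonly illustrated with the exponential-weights (Hedge) forecaster. The sketch below is a generic textbook version, not taken from the talk; the function name `hedge` and the learning rate choice are illustrative.

```python
import math

def hedge(expert_losses, eta):
    """Exponential-weights forecaster. expert_losses is a T x N table of
    per-round losses in [0, 1]; returns the learner's total expected loss."""
    n = len(expert_losses[0])
    weights = [1.0] * n
    total = 0.0
    for losses in expert_losses:
        z = sum(weights)
        probs = [w / z for w in weights]
        # Expected loss of playing an expert drawn from the current weights.
        total += sum(p * l for p, l in zip(probs, losses))
        # Multiplicative update: experts with high loss lose weight fast.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return total

# Toy run: expert 0 is always right, expert 1 always wrong.
T = 100
losses = [[0.0, 1.0] for _ in range(T)]
loss = hedge(losses, eta=math.sqrt(8 * math.log(2) / T))
```

With this standard tuning of the learning rate, the learner's regret against the best expert stays below the familiar \(\sqrt{(T/2)\ln N}\) bound.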

Slides [pdf, 1mb]

10:30 – 11:00
Coffee Break
11:00 – 11:45
Sham Kakade
Microsoft Research New England

Optimization on Noisy Data: Statistical Accuracy vs. Numerical Precision
Abstract:
Many optimization problems that arise in science and engineering are those in which we only have a stochastic approximation to the underlying problem at hand (e.g. linear regression or other problems where our objective function is a sample average). Such problems highlight some of the challenges we face at the interface of computer science and statistics: should we use a highly accurate algorithm (with costly time and space requirements yet returns numerically precise estimates) or a crude stochastic approximation scheme like stochastic gradient descent (which is light on memory and simple to implement, yet has a poor convergence rate)? The tension arises due to the statistical issue of how much accuracy we actually desire of our algorithm. In the absence of computational constraints, the minimizer on a sample average of observed data, the empirical risk minimizer (ERM), is widely regarded as the estimation strategy of choice due to its desirable statistical convergence properties.
This talk will survey results on both these fronts (stochastic approximation algorithms and recent work on linearly convergent algorithms). Furthermore, this work will provide a simple-to-implement procedure (applicable to many estimation problems including linear regression) that, under mild regularity conditions, enjoys the following properties:
- The algorithm can be implemented in linear time with just a single pass of the observed data, using space linear in the size of a single sample.
- The algorithm achieves the same statistical rate of convergence as the empirical risk minimizer on every problem (even considering constant factors).
- The algorithm's performance depends on the initial error at a rate that decreases super-polynomially.
Furthermore, we quantify this rate of convergence, showing that after the size of our dataset exceeds some (problem dependent) threshold, the streaming algorithm rapidly approaches the rate of ERM in expectation.
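The single-pass, constant-memory regime the abstract describes can be illustrated with plain stochastic gradient descent plus iterate averaging on scalar least squares. This is a generic sketch, not the talk's procedure (which matches the ERM rate under its stated conditions); all names and parameters here are illustrative.

```python
import random

def averaged_sgd(stream, lr):
    """One pass of SGD with iterate (Polyak-Ruppert) averaging for the
    scalar least-squares model y ~ w * x; O(1) memory per sample."""
    w, w_bar = 0.0, 0.0
    for t, (x, y) in enumerate(stream, start=1):
        w -= lr * (w * x - y) * x        # gradient of 0.5 * (w*x - y)^2
        w_bar += (w - w_bar) / t         # running average of the iterates
    return w_bar

# Synthetic stream: y = 2x plus Gaussian noise.
rng = random.Random(0)
true_w = 2.0
stream = []
for _ in range(20000):
    x = rng.uniform(-1.0, 1.0)
    stream.append((x, true_w * x + rng.gauss(0.0, 0.1)))
w_hat = averaged_sgd(stream, lr=0.5)
```

Averaging the iterates is what damps the variance of the raw SGD path; without it the final iterate fluctuates at a scale set by the step size.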


11:45 – 12:30
Adam Kalai
Microsoft Research New England

Machine Learning and Crowdsourcing
Abstract:

Slides [ppt, 34mb]

12:30 – 14:00
Lunch (Guest House lawns)
14:00 – 15:30
Robert C. Williamson
Australian National University

The Geometry of Machine Learning Problems
Abstract:
Machine learning researchers love coming up with new and better algorithms. But users only care about having their problems solved. How can this tension be turned into a theoretical research agenda? In the talk I will explain how. I will argue why it is important and valuable to be able to relate machine learning problems to each other. To that end I will consider the definition of a machine learning problem. A crucial element of formalising a machine learning problem is a loss function and so mapping all the problems requires mapping the structure of loss functions. I will summarise the work I have done on loss functions in the last several years, culminating in a new, intrinsic and intriguing representation in terms of convex bodies. Along the way I will present interesting relationships with divergences, surrogate regret bounds, the aggregating algorithm, and different parametrisations of loss functions.

Slides – Part 1 [ppt, 42mb]
Slides – Part 2 [pdf, 6mb]

15:30 – 16:00
Coffee Break
16:00 – 16:45
Sanjoy Dasgupta
University of California, San Diego

Universal Consistency of Nearest Neighbor in Metric Spaces, and Rates of Convergence
Abstract:
In this talk, I will give an overview of the statistical theory of nearest neighbor classification. I will also talk about some recent progress on obtaining universal consistency, and sharp rates of convergence, in metric spaces.
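The setting of the talk, nearest neighbor in a general metric space, requires nothing of the data beyond a distance function. A minimal 1-NN sketch (the function name and toy example are mine, not the speaker's):

```python
def nn_classify(train, query, dist):
    """1-nearest-neighbor rule: train is a list of (point, label) pairs and
    dist can be any metric, so points need not be vectors at all."""
    point, label = min(train, key=lambda pl: dist(pl[0], query))
    return label

# Toy example on the real line with the absolute-value metric.
train = [(0.0, "a"), (1.0, "a"), (5.0, "b"), (6.0, "b")]
pred = nn_classify(train, 4.2, dist=lambda p, q: abs(p - q))
```

Here the query 4.2 is closest to the training point 5.0, so the rule predicts "b".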
Bio:
Sanjoy Dasgupta is a Professor in the Department of Computer Science and Engineering at UC San Diego. He received his PhD from Berkeley in 2000, and spent two years at AT&T Research Labs before joining UCSD. His area of research is algorithmic statistics, with a focus on unsupervised and minimally supervised learning. He is the author of a textbook, "Algorithms" (with Christos Papadimitriou and Umesh Vazirani), that appeared in 2006.

Slides [pdf, 1mb]

16:45 – 17:30
Shivani Agarwal
Indian Institute of Science

Statistical Learning in Complex Prediction Spaces: What Do We Know?
Abstract:
In an increasing number of application domains, supervised learning methods are needed for making predictions in complex spaces: predicting sequences in speech recognition, predicting rankings in information retrieval and recommender systems, predicting assignments in image segmentation, and so on. How should we design statistically consistent learning algorithms for such problems? A popular learning approach is to minimize a convex surrogate loss, but how does one design surrogates that lead to consistent algorithms? While the answer to this question is well understood for simple binary classification tasks and a few other specific learning problems, a general understanding has been missing. In this talk I will describe some of our recent work on developing a unified framework for designing statistically consistent learning algorithms for complex prediction problems defined by an arbitrary loss matrix. I will introduce the notion of convex calibration dimension of a general loss matrix, which measures the smallest dimension of a surrogate space in which one can design a convex calibrated surrogate loss for a given loss matrix, and will describe some ways to design explicit convex calibrated surrogates for any given loss matrix. I will also discuss ways to achieve approximate statistical consistency when exact consistency may be computationally hard to achieve. I will conclude by describing some fundamental open questions.

Slides [pdf, 11mb]


Tuesday, January 6 
09:00 – 10:30
Santosh Vempala
Georgia Institute of Technology

The Complexity of Unsupervised Learning
Abstract:
Unsupervised learning, or making "sense" of unlabeled data, is a leading paradigm in modern machine learning. The idea is either to make structural/distributional assumptions on the data and then recover the structure or parameters of the distribution from samples, or to find the best-fit model for a given data set. In this self-contained talk, we will summarize the main ideas and progress along these lines, with a focus on the following "classical" models:
- Gaussian mixture models
- Independent Component Analysis
- Recovering planted structures (cliques, partitions, assignments)
We will discuss algorithmic techniques, lower bounds and some of the many fascinating open problems.

Slides [ppt, 4mb]

10:30 – 11:00
Coffee Break
11:00 – 11:45
Constantinos Daskalakis
Massachusetts Institute of Technology

Beyond Berry-Esseen: Structure and Learning of Sums of Random Variables
Abstract:
It is well-known that \(\sim1/\epsilon^2\) coin tosses are necessary and sufficient to estimate the bias of a coin. But how many samples are needed to learn the distribution of the sum of \(n\) independent Bernoullis? I will show that the answer is still \(O(1/\epsilon^2)\), independent of \(n\), and will generalize this bound to sums of independent bounded integer random variables (SIIRVs). I will argue that the celebrated Berry-Esseen theorem and its variants fall short of characterizing the general structure of SIIRVs, and offer stronger finitary central limit theorems, which tightly characterize their structure and have the aforementioned implications for learning.
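The \(\sim1/\epsilon^2\) sample bound for a single coin follows from a variance calculation: the empirical mean of \(n\) tosses has standard deviation \(\sqrt{p(1-p)/n} \le 1/(2\sqrt{n})\). A quick simulation of that baseline (names and parameters are mine, for illustration):

```python
import random

def estimate_bias(p, eps, rng):
    """Empirical mean of n = 1/eps^2 tosses of a coin with bias p; its
    standard deviation is sqrt(p(1-p)/n) <= eps/2, so the estimate is
    within O(eps) of p with high probability."""
    n = int(round(1.0 / eps ** 2))
    return sum(rng.random() < p for _ in range(n)) / n

est = estimate_bias(0.3, eps=0.05, rng=random.Random(42))
```

The surprise in the talk is that learning the full distribution of a sum of \(n\) Bernoullis needs no more samples than this, independent of \(n\).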
Bio:
Constantinos Daskalakis is the X-Window Consortium Associate Professor of Computer Science at MIT. He holds a diploma in electrical and computer engineering from the National Technical University of Athens, and a Ph.D. in electrical engineering and computer sciences from UC Berkeley. His research interests lie in theoretical computer science and its interface with economics and probability. Daskalakis has been honored with the 2007 Microsoft Graduate Research Fellowship, the 2008 ACM Doctoral Dissertation Award, the Game Theory and Computer Science Prize from the Game Theory Society, the 2010 Sloan Fellowship in Computer Science, the 2011 SIAM Outstanding Paper Prize, the 2011 Ruth and Joel Spira Award for Distinguished Teaching, and the 2012 Microsoft Research Faculty Fellowship. He is also a recipient of Best Paper awards at the ACM Conference on Economics and Computation in 2006 and in 2013.


11:45 – 12:30
Ankur Moitra
Massachusetts Institute of Technology

Simple, Efficient and Neural Algorithms for Sparse Coding
Abstract:


12:30 – 14:00
Lunch (Guest House lawns)
16:00 – 17:00
Silvio Micali
Massachusetts Institute of Technology

The Quest for Resilient Mechanism Design
Abstract:
Mechanism design aims at engineering games that, when played by selfish players, yield outcomes satisfying a desired property. Such engineered games, however, are typically vulnerable to computational complexity, privacy, and collusion. Developing a theory of mechanism design resilient to such "forces" will require a totally new framework: techniques, solution concepts, and benchmarks. We shall advocate this point using auctions as an example.
Bio:
Professor Silvio Micali received his Laurea in Mathematics from the University of Rome, and his PhD in Computer Science from the University of California at Berkeley. Since 1983, he has been on the MIT faculty, in the Electrical Engineering and Computer Science Department, where he is Ford Professor of Engineering. Silvio's research interests are cryptography, zero knowledge, pseudorandom generation, secure protocols, and mechanism design. Silvio has received the Turing Award (in computer science), the Gödel Prize (in theoretical computer science), and the RSA prize (in cryptography). He is a member of the National Academy of Sciences, the National Academy of Engineering, and the American Academy of Arts and Sciences.


17:30 – 18:30
Poster Session
(Faculty Hall)
Adway Mitra
Indian Institute of Science

Temporally Coherent CRP: A Bayesian Non-Parametric Approach for Clustering Tracklets with Applications to Person Discovery in Videos

Anudhyan Boral
Harvard University

An Economic Interpretation of Edmonds' Blossom Algorithm

Arpit Agarwal
Indian Institute of Science

GEV-Canonical Regression for Accurate Binary Class Probability Estimation when One Class is Rare

Arun Rajkumar
Indian Institute of Science

Ranking from Pairwise Comparisons: The Role of the Pairwise Preference Matrix

Chandrasekhar Lakshmi Narayanan
Indian Institute of Science

A Generalized Reduced Linear Program for Markov Decision Processes

Clément Canonne
Columbia University

Sampling Correctors

Girish Varma
Tata Institute of Fundamental Research

Hardness of Coloring

Harikrishna Narasimhan
Indian Institute of Science

Online and Stochastic Gradient Methods for Non-decomposable Loss Functions

Harish G. Ramaswamy
Indian Institute of Science

Convex Calibrated Surrogates for Low-Rank Loss Matrices with Applications to Subset Ranking Losses

Lavanya Sita Tekumalla
Indian Institute of Science

Mining Block I/O Traces for Cache Preloading with Sparse Temporal Non-parametric Mixture of Multivariate Poisson

Palash Dey
Indian Institute of Science

Sampling Complexity for Winner Determination in Voting

Rohit Vaish
Indian Institute of Science

On the Statistical Consistency of Plug-in Classifiers for Non-decomposable Performance Measures

Satyanath Bhat
Indian Institute of Science

An Optimal Bidimensional Multi-Armed Bandit Auction for Multi-unit Procurement

Shahbaz Khan
Indian Institute of Technology Kanpur

Incremental Algorithm for Maintaining DFS Tree for Undirected Graphs

Shilpa Garg
Max Planck Institut für Informatik

Seed Node Selection for Community Discovery in Social Networks

Shivaram Kalyanakrishnan
Indian Institute of Science

Improved Expected Runtime for MDP Planning

Shweta Jain
Indian Institute of Science

An Incentive Compatible Multi-Armed Bandit Crowdsourcing Mechanism with Quality Assurance

Trapit Bansal
Indian Institute of Science

A Provable SVD-based Algorithm for Learning Topics in Dominant Admixture Corpus



Wednesday, January 7 
09:00 – 09:45
Sandeep Juneja
Tata Institute of Fundamental Research

Efficient Rare Event Simulation Algorithms for Heavy Tailed Processes
Abstract:

Slides [pdf, 2mb]

09:45 – 10:30
Anirban Dasgupta
Indian Institute of Technology Gandhinagar

Aggregating Information from the Crowd
Abstract:

Slides [pdf, 2mb]

10:30 – 11:00
Coffee Break
11:00 – 11:45
Arnab Bhattacharyya
Indian Institute of Science

Algebraic Property Testing
Abstract:

Slides [ppt, 3mb]

12:00 – 13:30
Lunch (Guest House lawns) 

13:55 – 14:00
Address by Spenta Wadia, ICTS Director

14:00 – 14:45
Sanjeev Arora
Princeton University

Overcoming Computational Intractability in Unsupervised Learning
Abstract:
Unsupervised learning, i.e., learning with unlabeled data, is increasingly important given today's data deluge. Most natural problems in this domain (e.g., for models such as mixture models, HMMs, graphical models, topic models, sparse coding/dictionary learning, and deep learning) are NP-hard. Therefore researchers in practice use either heuristics or convex relaxations with no concrete approximation bounds. Several non-convex heuristics work well in practice, which is also a mystery.
The talk will describe a sequence of recent results whereby rigorous approaches leading to polynomial running time are possible for several problems in unsupervised learning. The proof of polynomial running time usually relies upon non-degeneracy assumptions on the data and the model parameters, and often also on stochastic properties of the data (average-case analysis). Some of these new algorithms are very efficient and practical, e.g., for topic modeling.
Bio:
Sanjeev Arora is Professor of Computer Science at Princeton University. He is interested in algorithms for intractable problems, and computational complexity. Recently he has been applying these ideas to machine learning. He has won the Packard Fellowship (1997), Simons Investigator Award (2012), Gödel Prize (2001 and 2010), ACM-Infosys Foundation Award in the Computing Sciences (2012), and the Fulkerson Prize (2012).

Slides [ppt, 2mb]

14:45 – 15:30
Robert Schapire
Microsoft Research & Princeton University

The Contextual Bandits Problem: A New, Fast, and Simple Algorithm
Abstract:
In the contextual bandits learning problem, the learner must repeatedly decide which action to take in response to an observed context, and is then permitted to observe the received reward, but only for the chosen action. The goal is to learn through experience to behave nearly as well as the best policy (or decision rule) in some possibly very large space of candidate policies. We assume that the learner can only access this policy space using an oracle for solving empirical cost-sensitive classification problems; in practice, most off-the-shelf classification algorithms could be used for this purpose. In this very general setting, we present a new, fast, and simple algorithm that achieves a regret guarantee that is statistically optimal. Moreover, this algorithm makes very modest use of the oracle, which it calls far less than once per round, on average. These properties suggest this may be the most practical contextual bandits algorithm among all existing approaches that are provably effective for general policy classes.
This is joint work with Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford and Lihong Li.
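The interaction protocol described above (observe a context, choose an action, see only that action's reward) can be made concrete with a much simpler baseline than the talk's oracle-based algorithm: epsilon-greedy on a toy problem. Everything in this sketch, including the two-context reward structure, is illustrative.

```python
import random

def eps_greedy_contextual(T, eps, rng):
    """Epsilon-greedy on a toy contextual bandit with two contexts and two
    actions, where action == context earns reward 1. Far weaker guarantees
    than the talk's statistically optimal algorithm, but the same protocol."""
    totals = [[0.0, 0.0], [0.0, 0.0]]     # totals[context][action]
    counts = [[1e-9, 1e-9], [1e-9, 1e-9]]
    earned = 0
    for _ in range(T):
        ctx = rng.randrange(2)                        # observe context
        if rng.random() < eps:                        # explore ...
            act = rng.randrange(2)
        else:                                         # ... or exploit
            act = max((0, 1), key=lambda a: totals[ctx][a] / counts[ctx][a])
        reward = 1 if act == ctx else 0               # bandit feedback only
        totals[ctx][act] += reward
        counts[ctx][act] += 1
        earned += reward
    return earned / T

avg_reward = eps_greedy_contextual(4000, eps=0.1, rng=random.Random(1))
```

After a short burn-in the learner identifies the right action per context and earns close to the optimal reward of 1, minus the cost of continued exploration.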
Bio:
Robert Schapire is a Principal Researcher at Microsoft Research in New York City, currently on leave from Princeton University. He received his ScB in math and computer science from Brown University in 1986, and his SM (1988) and PhD (1991) from MIT under the supervision of Ronald Rivest. After a short postdoc at Harvard, he joined the technical staff at AT&T Labs (formerly AT&T Bell Laboratories) in 1991. Since 2002, he has been with the Computer Science Department at Princeton University, and was named the David M. Siegel '83 Professor in Computer Science in 2013. He joined Microsoft Research in 2014. His awards include the 1991 ACM Doctoral Dissertation Award, the 2003 Gödel Prize, and the 2004 Kanellakis Theory and Practice Award (both of the last two with Yoav Freund). He is a fellow of the AAAI, and a member of the National Academy of Engineering. He mainly studies machine learning, especially theoretically well-grounded algorithms, with a particular focus on a constellation of closely related methods and topics that includes boosting, online learning, game theory, and maximum entropy.

Slides [pdf, 1mb]

15:30 – 16:00
Coffee Break
16:00 – 16:45
Ravi Kannan
Microsoft Research India & Indian Institute of Science

Versatility of Singular Value Decomposition
Abstract:
Singular Value Decomposition (SVD) is a basic tool for dealing with matrix data and has traditionally been applied in a variety of fields. In the modern setting, matrix sizes have increased, but improved sampling-based algorithms are still effective. In addition, many new applications of SVD, to Gaussian mixtures, nearest neighbors, topic modeling, etc., have been developed. Combined with the simple device of thresholding, SVD is useful for a new set of problems. The talk will discuss from first principles some recent developments.
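In the spirit of first principles, the top singular pair of a matrix can be computed with nothing more than power iteration on \(A^{\top}A\). A pure-Python sketch on a tiny rank-1 example (the function and example are mine, for illustration):

```python
import math

def top_singular(A, iters=100):
    """Power iteration on A^T A: returns (top singular value, top right
    singular vector) of a small matrix A given as a list of rows."""
    m, n = len(A), len(A[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        w = [sum(A[i][j] * Av[i] for i in range(m)) for j in range(n)]  # A^T A v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # For a unit vector v, ||A v|| equals the corresponding singular value.
    sigma = math.sqrt(sum(sum(A[i][j] * v[j] for j in range(n)) ** 2
                          for i in range(m)))
    return sigma, v

# Rank-1 example: A = outer((2, 1), (1, 0, 1)), so sigma = sqrt(5)*sqrt(2).
A = [[2.0, 0.0, 2.0], [1.0, 0.0, 1.0]]
sigma, v = top_singular(A)
```

Deflating (subtracting the recovered rank-1 piece) and repeating yields the remaining singular pairs, which is the basic route to low-rank approximation.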

Slides [pdf, 1mb]


Thursday, January 8

09:30 – 10:30
Elchanan Mossel
University of California, Berkeley

The Surprising Power of Belief Propagation
Abstract:
Belief propagation is an extremely popular and efficient algorithm in machine learning. The talk will concentrate on recent theoretical efforts to understand why the algorithm is so effective. This research reveals new connections between a type of zeta function from number theory, belief propagation, and novel linear algebra algorithms.

Slides [pdf, 1mb]

10:30 – 11:00
Coffee Break
11:00 – 12:00
Ronitt Rubinfeld
Massachusetts Institute of Technology & Tel Aviv University

Testing and Correction of Distributions
Abstract:

Slides [pdf, 1mb]

12:30 – 14:00
Lunch (Guest House lawns)
14:00 – 15:00
Ashish Goel
Stanford University

Two Random Walks that Surprise
Abstract:

Slides  Part 1 [pdf, 22mb]
Slides  Part 2 [pdf, 4mb]

15:00 – 15:30
Coffee Break
15:30 – 16:30
Rocco Servedio
Columbia University

Learning from Satisfying Assignments
Abstract:
We introduce and study a new type of learning problem for probability distributions over the \(n\)-dimensional Boolean hypercube. A learning problem in our framework is defined by a class \(C\) of Boolean functions over the hypercube; in our model the learning algorithm is given uniform random satisfying assignments of an unknown function \(f \in C\), and its goal is to output a high-accuracy approximation of the uniform distribution over the space of satisfying assignments of \(f\). We discuss connections between the existence of efficient learning algorithms in this framework and evading the "curse of dimensionality" for more traditional density estimation problems.
As our main results, we show that the two most widely studied classes of Boolean functions in computational learning theory, linear threshold functions and DNF formulas, have efficient distribution learning algorithms in our model. Our algorithm for linear threshold functions runs in time \(\text{poly}(n,1/\epsilon)\) and our algorithm for polynomial-size DNF runs in quasi-polynomial time. On the other hand, we also prove complementary hardness results which show that, under cryptographic assumptions, learning monotone 2-CNFs, intersections of 2 halfspaces, and degree-2 PTFs are all hard. This shows that our algorithms are close to the limits of what is efficiently learnable in this model.
Joint work with Anindya De and Ilias Diakonikolas.

Slides [ppt, 3mb]


Friday, January 9
09:00 – 09:45
Deeparnab Chakrabarty
Microsoft Research India

Provable Submodular Minimization via Wolfe's Algorithm
Abstract:
Submodular functions are fundamental objects which arise in *numerous* applications including machine learning, economics, optimization, biology, information theory, etc. They are mathematically very beautiful. In particular, one can minimize them in polynomial time. This is nontrivial since the function is defined on \(2^n\) elements and all we can do is query its values. This was first shown using the celebrated ellipsoid method in the early 80s, and combinatorial algorithms were found later on.
What we study is an algorithm which predates the ellipsoid method. Wolfe's algorithm is a heuristic proposed in 1976 to find the minimum-norm point on a polytope, and in 1984 Fujishige showed how to use it to minimize submodular functions. This Fujishige-Wolfe heuristic, also called the min-norm point heuristic, currently seems to have the best empirical performance. However, no nontrivial theoretical guarantees were known. In this work we give the first convergence analysis of Wolfe's algorithm, which yields a pseudo-polynomial time guarantee for the Fujishige-Wolfe min-norm point algorithm for submodular function minimization.
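The query model mentioned in the abstract (the function lives on \(2^n\) subsets, and all we can do is ask for values) is easy to make concrete via the exponential brute-force baseline that ellipsoid, combinatorial, and Fujishige-Wolfe-style algorithms improve on. The graph-cut example below, a classic submodular function, is mine:

```python
from itertools import chain, combinations

def brute_force_min(f, ground):
    """Minimize a set function with 2^n value queries -- the exponential
    baseline that polynomial-query submodular minimizers improve on."""
    subsets = chain.from_iterable(
        combinations(ground, k) for k in range(len(ground) + 1))
    return min(subsets, key=lambda S: f(frozenset(S)))

# Cut functions of undirected graphs are the textbook submodular example.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
def cut(S):
    # Number of edges with exactly one endpoint inside S.
    return sum((u in S) != (v in S) for u, v in edges)

best = brute_force_min(cut, ["a", "b", "c", "d"])
```

For an unconstrained cut the minimum is trivially attained by the empty (or full) vertex set; the point of the sketch is the query interface, not the particular minimizer.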

Slides [ppt, 1mb]

09:45 – 10:30
Prateek Jain
Microsoft Research India

Non-convex Projection-Based Approaches for High-dimensional Learning
Abstract:


10:30 – 11:00
Coffee Break
11:00 – 11:45
Navin Goyal
Microsoft Research India

Algorithms for Independent Component Analysis
Abstract:

Slides [ppt, 2mb]

11:45 – 12:30
Kavitha Telikepalli
Tata Institute of Fundamental Research

Pairwise Spanners
Abstract:

Slides [pdf, 10mb]

12:30 – 14:00
Lunch (Guest House lawns)
14:00 – 15:30
Manindra Agrawal
Indian Institute of Technology Kanpur

Algebraic Complexity Theory
Abstract:

Slides [pdf, 2mb]

15:30 – 16:00
Coffee Break
16:00 – 16:45
Neeraj Kayal
Microsoft Research India

Arithmetic Circuits: from Lower Bounds to Learning and Back
Abstract:


16:45 – 17:30
Chandan Saha
Indian Institute of Science

Lower Bounds for Small Depth Arithmetic Circuits
Abstract:
An approach to proving a super-polynomial lower bound for arithmetic circuits reduces the problem to proving ‘strong enough’ lower bounds for small depth circuits, in particular (non-homogeneous) depth-3 circuits and (homogeneous) depth-4 circuits. The depth of a circuit is the number of layers of gates in it.
In this talk, we plan to discuss an exponential lower bound for (homogeneous) depth-4 circuits that comes close to being ‘strong enough’. The techniques also yield exponential lower bounds for certain (non-homogeneous) depth-3 circuits, in particular depth-3 circuits with low bottom fan-in, which also answers a question posed by Shpilka and Wigderson.
Based on joint works with Neeraj Kayal, Srikanth Srinivasan and Nutan Limaye.

Slides [ppt, 1mb]
