Seminars in Honour of Prof. Ravi Kannan’s 70th birthday
Towards neural networks robust to distribution shifts
Abstract: Despite their success, the performance of neural networks has been shown to be brittle to mismatch between train and test distributions. Previous works have hypothesized that this brittleness arises because deep networks rely only on simple features of the input (such as the background or texture of images) to make decisions, while completely ignoring complex features. Surprisingly, we find that the features learnt by the network's backbone are sufficient for out-of-distribution generalization; however, the final classifier layer trained using ERM does not use these features optimally. We posit two reasons for this:
1. Dominance of non-robust features
2. Replication of simple features, leading to over-dependence of the max-margin classifier on these features.

We empirically validate these hypotheses on semi-synthetic and real-world datasets. We also draw connections with the line of work studying the simplicity bias of neural networks. We then propose two methods to deal with both of these phenomena, and show gains of up to 1.5% over the state of the art on DomainBed, a standard and large-scale benchmark for domain generalization.
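To make the central claim concrete, the kind of experiment it suggests is a linear probe: freeze the trained backbone and retrain only the final classifier layer, then evaluate on a shifted test split. The sketch below is only an illustration of that idea, not the speakers' method; the backbone, class count, and stand-in data are assumptions.

```python
# Illustrative linear-probe sketch (not the authors' method): test whether
# frozen backbone features suffice when only the final classifier is retrained.
import torch
import torch.nn as nn
import torchvision

num_classes = 10                                          # hypothetical class count
model = torchvision.models.resnet18(weights=None)         # any feature backbone
backbone = nn.Sequential(*list(model.children())[:-1])    # drop the original fc head
for p in backbone.parameters():
    p.requires_grad = False                               # freeze the feature extractor
backbone.eval()

head = nn.Linear(512, num_classes)                        # only this layer is retrained
opt = torch.optim.SGD(head.parameters(), lr=1e-2, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

# Stand-in batch; in practice this would come from the training split.
x_train = torch.randn(32, 3, 224, 224)
y_train = torch.randint(0, num_classes, (32,))

for _ in range(5):                                        # a few probe-training steps
    feats = backbone(x_train).flatten(1)                  # (B, 512) frozen features
    loss = loss_fn(head(feats), y_train)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The retrained head would then be evaluated on an out-of-distribution split.
with torch.no_grad():
    x_ood = torch.randn(32, 3, 224, 224)                  # stand-in for OOD data
    ood_preds = head(backbone(x_ood).flatten(1)).argmax(dim=1)
print(ood_preds.shape)
```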

Based on joint works with Anshul Nasery, Sravanti Addepalli, R. Venkatesh Babu and Praneeth Netrapalli.
Speaker Bio
Prateek Jain leads the Machine Learning Foundations and Optimization team at Google AI, Bangalore, India. Prateek completed his PhD at the University of Texas at Austin under Prof. Inderjit S. Dhillon. Before joining Google, he was a Senior Principal Research Scientist at Microsoft Research India. His research interests include machine learning, non-convex optimization, high-dimensional statistics, and optimization algorithms. He is also interested in applications of machine learning to privacy, computer vision, text mining, and natural language processing.