Understanding modern machine learning models through the lens of high-dimensional statistics

Wednesday, March 22, 2023

Modern machine learning tasks are often high-dimensional, due to the large number of data points, features, and trainable parameters. Mathematical tools such as random matrix theory have been developed to precisely study simple learning models in the high-dimensional regime, and such precise analysis can reveal interesting phenomena that are also empirically observed in deep learning. In this talk I will introduce two examples. First we consider the selection of regularization hyperparameters in the overparameterized regime. We establish a set of equations that rigorously describes the asymptotic generalization error of the ridge regression estimator, which leads to surprising findings, including: (i) the optimal ridge penalty can be negative, and (ii) regularization can suppress “multiple descent” in the risk curve. I will then discuss practical implications such as the implicit bias of first- and second-order optimizers in neural network training.

For the second part, we go beyond linear models and characterize the benefit of gradient-based representation (feature) learning in neural networks. By studying the precise performance of kernel ridge regression on the trained features of a two-layer neural network, we prove that feature learning yields a considerable advantage over the initial random features model; this analysis also highlights the role of learning rate scaling in the initial phase of gradient descent.
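To make the negative-ridge claim concrete, here is a minimal NumPy sketch (not the talk's actual setup; the Gaussian data model, dimensions, and noise level are illustrative assumptions). In the overparameterized regime (d > n), the ridge estimator can be written in its dual (kernel) form, which remains well-defined for mildly negative penalties as long as the Gram matrix stays positive definite after the shift. The sketch evaluates the excess risk of the ridge estimator across several penalties, including a negative one; it does not claim which penalty is optimal for a given random draw.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 200          # overparameterized: more features than samples
sigma = 0.5             # label noise level (illustrative choice)

# Ground-truth linear model with isotropic Gaussian features.
beta = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ beta + sigma * rng.normal(size=n)

def ridge_risk(lam):
    """Excess risk of the ridge estimator via the dual (kernel) form:
    beta_hat = X^T (X X^T + lam I)^{-1} y.
    This is valid for lam > -lambda_min(X X^T), so a mildly negative
    penalty is permissible when d > n and X X^T is well-conditioned.
    (Scaling conventions for lam vary; here lam is unnormalized.)"""
    G = X @ X.T                                   # n x n Gram matrix
    beta_hat = X.T @ np.linalg.solve(G + lam * np.eye(n), y)
    # For isotropic Gaussian test inputs, the excess risk is ||beta_hat - beta||^2.
    return float(np.sum((beta_hat - beta) ** 2))

# Eigenvalues of G concentrate in roughly [d(1 - sqrt(n/d))^2, d(1 + sqrt(n/d))^2],
# so lam = -10 keeps G + lam I positive definite with high probability here.
lams = [-10.0, 0.0, 10.0, 100.0]
risks = {lam: ridge_risk(lam) for lam in lams}
```

The dual form is the key device: the primal matrix X^T X + lam I is singular or indefinite for lam <= 0 when d > n, whereas the n x n Gram matrix is generically full rank, so negative regularization is a legitimate point on the risk curve rather than an ill-posed limit.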


Post Talk Link:  Click Here 
Passcode: @4ZUihGk

Speaker

Denny Wu is a Ph.D. student in computer science at the University of Toronto and the Vector Institute, under the supervision of Jimmy Ba and Murat A. Erdogdu. Before that, he was an undergraduate student at Carnegie Mellon University, advised by Ruslan Salakhutdinov. Denny's research focuses on developing a theoretical understanding (e.g., of optimization and generalization) of modern machine learning systems, especially overparameterized models such as neural networks, using tools from high-dimensional statistics.
