Neural network theory: learning & generalisation

Neural networks are a powerful tool in modern machine learning, driving progress in areas ranging from protein folding to natural language processing. This half of the course will cover theoretical results with a direct bearing on machine learning practice. The course will tackle questions such as:

  • How should I wire up my neural network?
  • What class of functions does my network realise?
  • If my wide 2-layer network can fit any dataset, why try anything else?
  • How far is it safe to perturb my neural network during learning?
  • Why does my network with more parameters than training examples still generalise?
  • Why is VC dimension not a relevant complexity measure for my network?
  • How much information did my neural network extract from the training data?

Health warning: these questions are still the subject of active research. This course will present the instructor’s best understanding of the issues and their resolutions.

Instructor

This part of the course will be taught by Jeremy Bernstein (bernstein@caltech.edu).

Homeworks

#   Date set   Date due   Resources
3   4/22       4/29       hw3.zip
4   4/29       5/06       hw4.pdf

Lectures

#    Date   Subject                             Resources
     Main Lectures
7    4/20   Neural Architecture Design          pdf / vid
8    4/22   Network Function Spaces             pdf / vid / ipynb
9    4/27   Network Optimisation                pdf / vid
10   4/29   Statistical Learning Theory         pdf / vid
11   5/04   PAC-Bayesian Theory                 pdf / vid
12   5/06   Project Ideas                       pdf / vid
     Guest Lectures
14   5/13   Yasaman Bahri                       pdf / vid
18   5/27   Guillermo Valle-Pérez & Ard Louis   pdf / vid
19   6/01   SueYeon Chung                       pdf / vid

Lecture 7 references

Network topologies:

Neural architecture search:

Local network design:

Perturbation theory:

Lecture 8 references

Universal function approximation:

NNGP correspondence:

Lecture 9 references

“Classic” deep learning optimisers:

Optimisation models:

Relative optimisers:

  • LARS, LAMB and Fromage: per-layer relative updates (a minimal sketch follows this list);
  • Madam: per-synapse relative updates;
  • Nero and NFNets: per-neuron relative updates with weight constraints.
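To make "per-layer relative updates" concrete, here is a minimal sketch of the idea behind LARS and Fromage: each layer's step is rescaled so that its norm is a fixed fraction of that layer's weight norm. This is a simplified caricature rather than any of the actual optimisers (which add momentum, trust-ratio clipping, weight decay and other refinements); the function name and the eta/eps parameters below are illustrative choices, not from a library.

    import numpy as np

    def per_layer_relative_update(weights, grads, eta=0.01, eps=1e-12):
        """Toy per-layer relative update (in the spirit of LARS / Fromage).

        weights, grads: lists of arrays, one per layer.
        eta: target relative step size, ||delta W|| ~= eta * ||W||.
        """
        new_weights = []
        for W, g in zip(weights, grads):
            w_norm = np.linalg.norm(W)
            g_norm = np.linalg.norm(g)
            # Rescale the gradient so the step size is eta * ||W||,
            # regardless of the raw gradient scale in this layer.
            scale = eta * w_norm / (g_norm + eps)
            new_weights.append(W - scale * g)
        return new_weights

    # Two layers with wildly different gradient scales still receive
    # updates of the same relative size.
    weights = [np.random.randn(64, 32), np.random.randn(32, 10)]
    grads = [1e3 * np.random.randn(64, 32), 1e-3 * np.random.randn(32, 10)]
    weights = per_layer_relative_update(weights, grads, eta=0.01)

The point of the rescaling is that the raw gradient magnitude can differ by orders of magnitude between layers, whereas a relative step size of around 1% per layer keeps every layer moving at a comparable, controlled rate.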

Lecture 10 references

Function counting and generalisation:

VC theory and neural networks:

Lecture 11 references

Bayesian data analysis and model comparison:

PAC-Bayes derivation:

PAC-Bayes for NNs:

Lecture 12 references

Understanding the distribution of solutions that SGD samples from:

Using PAC-Bayes for architecture design:

Adversarial examples:

Combining NN architectural properties with control: