Citation
Bernstein, Jeremy David (2023) Optimisation & Generalisation in Networks of Neurons. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/1jz8-5t85. https://resolver.caltech.edu/CaltechTHESIS:10132022-000100592
Abstract
The goal of this thesis is to develop the optimisation- and generalisation-theoretic foundations of learning in artificial neural networks. The thesis tackles two central questions. Given training data and a network architecture:
- Which weight setting will generalise best to unseen data, and why?
- What optimiser should be used to recover this weight setting?
On optimisation, an essential feature of neural network training is that the network weights affect the loss function only indirectly through their appearance in the network architecture. This thesis proposes a three-step framework for deriving novel “architecture aware” optimisation algorithms. The first step—termed functional majorisation—is to majorise a series expansion of the loss function in terms of functional perturbations. The second step is to derive architectural perturbation bounds that relate the size of functional perturbations to the size of weight perturbations. The third step is to substitute these architectural perturbation bounds into the functional majorisation of the loss and to obtain an optimisation algorithm via minimisation. This constitutes an application of the majorise-minimise meta-algorithm to neural networks.
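To make the three steps concrete, here is a minimal LaTeX sketch for one simple setting: square loss and a deep linear network. The symbols (L for the loss, W_k for the layer matrices, Δf for the functional perturbation) and the specific form of the bounds are illustrative choices made for this sketch under those assumptions, not formulas quoted from the thesis.

```latex
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}

% Illustrative sketch only: square loss L(w) = (1/2) E ||f_w(x) - y||^2,
% functional perturbation \Delta f := f_{w+\Delta w} - f_w.

% Step 1 (functional majorisation): expand the loss in the functional
% perturbation; for square loss the second-order expansion is exact, so
\[
  L(w + \Delta w) \;\le\; L(w)
  \;+\; \mathbb{E}\,\big\langle f_w(x) - y,\ \Delta f(x) \big\rangle
  \;+\; \tfrac{1}{2}\,\mathbb{E}\,\big\|\Delta f(x)\big\|_2^2 .
\]

% Step 2 (architectural perturbation bound): for a deep linear network
% f_w(x) = W_L \cdots W_1 x, with \|\cdot\| the operator norm and each
% \|W_k\| > 0, expanding the product and applying the triangle inequality gives
\[
  \big\|\Delta f(x)\big\|_2 \;\le\;
  \Big(\textstyle\prod_{k=1}^{L} \|W_k\|\Big)
  \Big[\textstyle\prod_{k=1}^{L}\Big(1 + \tfrac{\|\Delta W_k\|}{\|W_k\|}\Big) - 1\Big]
  \,\|x\|_2 .
\]

% Step 3 (majorise-minimise): substitute the bound of Step 2 into Step 1 and
% minimise the resulting upper bound over \Delta w to obtain an update rule.

\end{document}
```

Minimising the right-hand side over Δw trades a first-order decrease in the loss against a penalty that depends on the layer norms and relative weight perturbations, which is what makes the resulting update "architecture aware" in this sketch.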
On generalisation, a promising recent line of work has applied PAC-Bayes theory to derive non-vacuous generalisation guarantees for neural networks. Since these guarantees control the average risk of ensembles of networks, they do not address which individual network should generalise best. To close this gap, the thesis rekindles an old idea from the kernels literature: the Bayes point machine. A Bayes point machine is a single classifier that approximates the aggregate prediction of an ensemble of classifiers. Since aggregation reduces the variance of ensemble predictions, Bayes point machines tend to generalise better than other ensemble members. The thesis shows that the space of neural networks consistent with a training set concentrates on a Bayes point machine if both the network width and normalised margin are sent to infinity. This motivates the practice of returning a wide network of large normalised margin.
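As a loose illustration of the Bayes point idea (in a linear toy setting, not the wide-network setting analysed in the thesis), the following Python sketch rejection-samples classifiers consistent with a separable training set and checks that a single averaged classifier, an approximate Bayes point, agrees with the ensemble's aggregate vote on most test inputs. The names and parameters here (teacher, the sample counts, the 2-D data) are invented for the sketch.

```python
# Toy Bayes point machine sketch (illustrative; not code from the thesis).
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable toy data: labels given by a hidden "teacher" direction.
teacher = np.array([1.0, 2.0])
X_train = rng.standard_normal((10, 2))
y_train = np.sign(X_train @ teacher)

# Rejection-sample unit-norm weight vectors that classify the training set
# correctly, i.e. members of the version space.
ensemble = []
while len(ensemble) < 300:
    w = rng.standard_normal(2)
    w /= np.linalg.norm(w)
    if np.all(np.sign(X_train @ w) == y_train):
        ensemble.append(w)
ensemble = np.array(ensemble)

# Approximate Bayes point: the centre of mass of the sampled version space.
bayes_point = ensemble.mean(axis=0)

# Compare the ensemble's majority-vote prediction with the Bayes point's
# prediction on fresh test inputs.
X_test = rng.standard_normal((1000, 2))
vote = np.sign(np.sign(X_test @ ensemble.T).mean(axis=1))
single = np.sign(X_test @ bayes_point)
agreement = np.mean(vote == single)
print(f"Bayes point agrees with the ensemble vote on {agreement:.1%} of test points")
```

In the thesis, the analogous single classifier is not an averaged linear model but a wide network of large normalised margin, onto which the space of consistent networks is shown to concentrate.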
Potential applications of these ideas include novel methods for uncertainty quantification, more efficient numerical representations for neural hardware, and optimisers that transfer hyperparameters across learning problems.
| Item Type: | Thesis (Dissertation (Ph.D.)) |
| --- | --- |
| Subject Keywords: | neural networks; kernel methods; Gaussian processes; optimisation; generalisation; majorise-minimise; functional majorisation; architectural perturbation bounds; Bayes point machines; normalised margin; hyperparameter transfer; neural hardware; uncertainty quantification |
| Degree Grantor: | California Institute of Technology |
| Division: | Biology and Biological Engineering |
| Major Option: | Computation and Neural Systems |
| Thesis Availability: | Public (worldwide access) |
| Research Advisor(s): | |
| Thesis Committee: | |
| Defense Date: | 23 September 2022 |
| Record Number: | CaltechTHESIS:10132022-000100592 |
| Persistent URL: | https://resolver.caltech.edu/CaltechTHESIS:10132022-000100592 |
| DOI: | 10.7907/1jz8-5t85 |
| ORCID: | |
| Default Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. |
| ID Code: | 15041 |
| Collection: | CaltechTHESIS |
| Deposited By: | Jeremy Bernstein |
| Deposited On: | 25 Oct 2022 21:44 |
| Last Modified: | 16 Jun 2023 22:48 |
Thesis Files
- PDF (Thesis), Final Version, 2 MB. See Usage Policy.