CaltechTHESIS
  A Caltech Library Service

Optimisation & Generalisation in Networks of Neurons

Citation

Bernstein, Jeremy David (2023) Optimisation & Generalisation in Networks of Neurons. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/1jz8-5t85. https://resolver.caltech.edu/CaltechTHESIS:10132022-000100592

Abstract

The goal of this thesis is to develop the optimisation and generalisation theoretic foundations of learning in artificial neural networks. The thesis tackles two central questions. Given training data and a network architecture:

  1. Which weight setting will generalise best to unseen data, and why?
  2. What optimiser should be used to recover this weight setting?

On optimisation, an essential feature of neural network training is that the network weights affect the loss function only indirectly through their appearance in the network architecture. This thesis proposes a three-step framework for deriving novel “architecture aware” optimisation algorithms. The first step—termed functional majorisation—is to majorise a series expansion of the loss function in terms of functional perturbations. The second step is to derive architectural perturbation bounds that relate the size of functional perturbations to the size of weight perturbations. The third step is to substitute these architectural perturbation bounds into the functional majorisation of the loss and to obtain an optimisation algorithm via minimisation. This constitutes an application of the majorise-minimise meta-algorithm to neural networks.
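
A minimal sketch of how the three steps might fit together, in illustrative notation (the scalar λ and the factor C(w) below are placeholders, not the thesis's exact quantities): writing the loss as a function of the network output f_w, the functional majorisation and the architectural perturbation bound combine as

\[
L(w + \Delta w) \;\le\; L(w) + \langle \nabla_w L(w),\, \Delta w \rangle + \tfrac{\lambda}{2}\,\|\Delta f\|^{2},
\qquad
\|\Delta f\| \;\le\; C(w)\,\|\Delta w\|,
\]

so that the loss is upper-bounded by \(L(w) + \langle \nabla_w L(w),\, \Delta w \rangle + \tfrac{\lambda}{2}\,C(w)^{2}\,\|\Delta w\|^{2}\). Minimising this bound over \(\Delta w\) yields a gradient-style update whose step size depends on the architecture through \(C(w)\); this minimisation is the majorise-minimise step.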

On generalisation, a promising recent line of work has applied PAC-Bayes theory to derive non-vacuous generalisation guarantees for neural networks. Since these guarantees control the average risk of ensembles of networks, they do not address which individual network should generalise best. To close this gap, the thesis rekindles an old idea from the kernels literature: the Bayes point machine. A Bayes point machine is a single classifier that approximates the aggregate prediction of an ensemble of classifiers. Since aggregation reduces the variance of ensemble predictions, Bayes point machines tend to generalise better than other ensemble members. The thesis shows that the space of neural networks consistent with a training set concentrates on a Bayes point machine if both the network width and normalised margin are sent to infinity. This motivates the practice of returning a wide network of large normalised margin.
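
An informal sketch of the Bayes point idea, again in illustrative notation rather than the thesis's: let Q denote the distribution over networks consistent with the training set. A Bayes point machine is a single network \(f_{w^*}\) whose predictions track the ensemble aggregate,

\[
\operatorname{sign}\big( f_{w^*}(x) \big) \;\approx\; \operatorname{sign}\big( \mathbb{E}_{w \sim Q}\,[\, f_w(x) \,] \big),
\]

so a PAC-Bayes bound on the ensemble's average risk becomes informative about that one network. The concentration result stated above says this approximation tightens as the width and normalised margin grow.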

Potential applications of these ideas include novel methods for uncertainty quantification, more efficient numerical representations for neural hardware, and optimisers that transfer hyperparameters across learning problems.

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: neural networks; kernel methods; Gaussian processes; optimisation; generalisation; majorise-minimise; functional majorisation; architectural perturbation bounds; Bayes point machines; normalised margin; hyperparameter transfer; neural hardware; uncertainty quantification
Degree Grantor: California Institute of Technology
Division: Biology and Biological Engineering
Major Option: Computation and Neural Systems
Thesis Availability: Public (worldwide access)
Research Advisor(s):
  • Yue, Yisong
Thesis Committee:
  • Tropp, Joel A. (chair)
  • Liu, Ming-Yu
  • Meister, Markus
  • Thomson, Matthew
  • Yue, Yisong
Defense Date: 23 September 2022
Record Number: CaltechTHESIS:10132022-000100592
Persistent URL: https://resolver.caltech.edu/CaltechTHESIS:10132022-000100592
DOI: 10.7907/1jz8-5t85
ORCID:
  • Bernstein, Jeremy David: 0000-0001-9110-7476
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 15041
Collection: CaltechTHESIS
Deposited By: Jeremy Bernstein
Deposited On: 25 Oct 2022 21:44
Last Modified: 16 Jun 2023 22:48

Thesis Files

PDF (Thesis) - Final Version, 2MB. See Usage Policy.
