CaltechTHESIS
  A Caltech Library Service

Neural Network Models of Learning and Generalization

Citation

Vafeidis, Panteleimon (2025) Neural Network Models of Learning and Generalization. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/cxs1-ss10. https://resolver.caltech.edu/CaltechTHESIS:05212025-232450900

Abstract

Neural networks have emerged as powerful models for understanding both biological and artificial intelligence. This thesis investigates fundamental principles of learning and generalization across four interconnected domains, bridging insights from theoretical neuroscience and machine learning to advance our understanding of intelligent systems.

Chapter I addresses a central question in associative learning: how do neural circuits learn to associate concepts with one another? We combine two cortical inductive biases, mixed selectivity and predictive learning in compartmentalized neurons, to explain how the cortical architecture may confer significant evolutionary advantages, enabling efficient learning and the packing of multiple associations into the same neuronal population. Our model achieves stimulus substitution, whereby neurons come to respond to a conditioned stimulus exactly as they would to the associated unconditioned stimulus, a feat at which traditional Hebbian learning rules fail.
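
As a toy illustration of the difference between the two rule families (not the thesis's actual compartmentalized-neuron model), the following numpy sketch pairs a conditioned stimulus with a target unconditioned-stimulus response; the population sizes and learning rates are arbitrary choices. An error-driven predictive update converges to stimulus substitution, while a plain Hebbian update grows without ever matching the target.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 50, 20
cs = rng.random(n_in)       # conditioned-stimulus input pattern
r_us = rng.random(n_out)    # response evoked by the unconditioned stimulus

# Plain Hebbian rule: weights grow with pre/post coactivity during pairing,
# but nothing ties the CS-evoked response to r_us, and growth is unbounded.
W_hebb = np.zeros((n_out, n_in))
for _ in range(200):
    r = W_hebb @ cs + r_us              # response during CS-US pairing
    W_hebb += 0.01 * np.outer(r, cs)

# Predictive rule: the CS-driven compartment learns to predict the US-evoked
# response, driving the error to zero and achieving stimulus substitution.
W_pred = np.zeros((n_out, n_in))
for _ in range(200):
    err = r_us - W_pred @ cs
    W_pred += 0.01 * np.outer(err, cs)

print("predictive CS response matches US:", np.allclose(W_pred @ cs, r_us, atol=1e-3))  # True
print("Hebbian CS response matches US:   ", np.allclose(W_hebb @ cs, r_us, atol=1e-3))  # False
```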

Chapter II pivots from the static mappings between concepts learned in Chapter I to explore how neural systems develop the precise synaptic connectivity required to establish dynamic mappings for path integration: the ability to maintain an internal sense of location without external cues. Applied to the Drosophila head direction system, our model develops connectivity patterns strikingly similar to those observed experimentally, with continuous attractor network (CAN) dynamics emerging naturally from learning. This offers a novel perspective on how precisely calibrated neural circuits can develop through experience rather than genetic pre-specification, and explains experimental findings in which animals adapt their internal representation when sensory experience changes.
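
For intuition about the dynamics that such learning converges to, here is a minimal hand-wired ring attractor sketch (in the thesis the connectivity is learned from experience; here the cosine kernels are given, and the normalization step is a simplification for stability). A symmetric kernel sustains a bump of activity, and an antisymmetric kernel shifts it under an angular-velocity signal, implementing path integration.

```python
import numpy as np

N = 64
theta = np.linspace(0, 2 * np.pi, N, endpoint=False)

# Symmetric cosine kernel (local excitation, broad inhibition) sustains a
# bump; the antisymmetric kernel shifts it when gated by angular velocity.
W_sym = (np.cos(theta[:, None] - theta[None, :]) - 0.5) / N
W_asym = np.sin(theta[:, None] - theta[None, :]) / N

def step(r, ang_vel):
    r = np.maximum(0.0, (W_sym + ang_vel * W_asym) @ r)  # rectified update
    return r / np.linalg.norm(r)        # normalization keeps the bump stable

r = np.maximum(0.0, np.cos(theta))      # initialize a bump at heading 0
for _ in range(30):
    r = step(r, ang_vel=0.1)            # constant turning signal

heading = np.angle(r @ np.exp(1j * theta))    # population-vector readout
print(f"decoded heading: {heading:.2f} rad")  # bump rotated with the velocity input
```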

In Chapter III, we establish a theoretical framework explaining how disentangled representations, internal models that isolate independent factors of variation in the world, emerge from multi-task learning. We prove that any system competent at multiple related tasks must implicitly represent the underlying latent variables in a linearly decodable form. We experimentally confirm all major theoretical predictions and reveal a fundamental connection between task diversity and representation quality, which helps explain why modern transformer models may develop human-interpretable concepts. Furthermore, our work suggests that the massively parallel cortical architecture may be a key facilitator in the development of representations that enable the impressive zero-shot generalization ability that humans possess.
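
A minimal sketch of the claim, with made-up latents and tasks rather than those of Chapter III: a single network trained on several nonlinear tasks that all depend on two shared latent variables ends up with a hidden layer from which the latents can be read out linearly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_z, d_x, d_h = 1000, 2, 10, 64

# Ground-truth latents and a fixed nonlinear mixing into observations.
Z = rng.uniform(-1, 1, (n, d_z))
A = rng.normal(size=(d_z, d_x))
X = np.tanh(Z @ A)

# Several tasks, each a different nonlinear function of the *shared* latents.
Y = np.stack([Z[:, 0] * Z[:, 1], np.sin(2 * Z[:, 0]),
              Z[:, 1] ** 2, np.abs(Z[:, 0]) - Z[:, 1]], axis=1)

# One hidden layer trained on all tasks at once with plain gradient descent.
W1 = 0.3 * rng.normal(size=(d_x, d_h))
W2 = 0.3 * rng.normal(size=(d_h, Y.shape[1]))
for _ in range(4000):
    H = np.tanh(X @ W1)
    err = H @ W2 - Y                                    # multi-task error
    gW2 = H.T @ err / n
    gW1 = X.T @ ((err @ W2.T) * (1 - H ** 2)) / n       # backprop through tanh
    W2 -= 0.05 * gW2
    W1 -= 0.05 * gW1

# Test the theoretical prediction: the latents should be linearly decodable
# from the hidden layer of the multi-task network.
H = np.tanh(X @ W1)
F = np.c_[H, np.ones(n)]
coef, *_ = np.linalg.lstsq(F, Z, rcond=None)
r2 = 1 - (Z - F @ coef).var(axis=0) / Z.var(axis=0)
print("linear decoding R^2 per latent:", np.round(r2, 3))  # typically near 1
```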

Finally, Chapter IV proposes using Large Language Models (LLMs) as cognitive tools for evaluating latent-factor hypotheses in psychology, building on the theoretical insights of Chapter III. It suggests that the self-consistency of an LLM's responses, conditioned on hypothesized psychological factors, could serve as a metric for evaluating such hypotheses. While preliminary, this approach represents a novel computational methodology that could transform how hypotheses about human cognition are developed and refined.
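
A sketch of one way such a metric could be operationalized; `query_llm` is a hypothetical placeholder for an actual LLM client, and the scoring rule (inverse of rating dispersion) is one illustrative choice among many, not the thesis's definitive procedure.

```python
import statistics

def query_llm(prompt: str) -> float:
    """Hypothetical placeholder: return the LLM's 1-7 Likert rating for prompt."""
    raise NotImplementedError  # wire up an actual LLM client here

def self_consistency(factor: str, items: list[str], n_samples: int = 5) -> float:
    """Score a hypothesized psychological factor by how consistently an LLM,
    conditioned on being high on that factor, rates items the factor should govern."""
    per_item_ratings = []
    for item in items:
        prompt = (f"You are a person who is high on the following trait: {factor}.\n"
                  f"Rate from 1 (strongly disagree) to 7 (strongly agree): {item}")
        per_item_ratings.append([query_llm(prompt) for _ in range(n_samples)])
    # Low dispersion across resampled answers suggests the hypothesized factor
    # coherently constrains behavior; high dispersion suggests it does not.
    spread = statistics.mean(statistics.pstdev(r) for r in per_item_ratings)
    return 1.0 / (1.0 + spread)
```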

Continuous attractors figure prominently in this thesis (Chapters II and III), yet they are often misunderstood, particularly in the experimental literature. Appendix D therefore provides important considerations for the detection and quantification of continuous attractor networks (CANs) in experimental data; taking them into account can prevent conceptual confusion that wastes effort and resources in the experimental neuroscience community.
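
As a flavor of the kind of first-pass check involved (not a substitute for the appendix's full treatment), one can ask whether population activity concentrates on a low-dimensional ring and whether position on that ring tracks the behavioral variable. The simulated tuning curves and noise level below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_neurons, n_samples = 100, 500

# Simulated head-direction data: rectified-cosine tuning curves plus noise.
pref = rng.uniform(0, 2 * np.pi, n_neurons)     # preferred directions
head = rng.uniform(0, 2 * np.pi, n_samples)     # true headings
rates = (np.maximum(0, np.cos(head[:, None] - pref[None, :]))
         + 0.1 * rng.normal(size=(n_samples, n_neurons)))

# Signature of a 1-D ring attractor: population activity should concentrate
# in ~2 principal components and trace out a closed loop in that plane.
Xc = rates - rates.mean(0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
top2_var = (S[:2] ** 2).sum() / (S ** 2).sum()

# Check that the ring coordinate tracks the behavioral variable (heading).
proj = Xc @ Vt[:2].T
phi = np.arctan2(proj[:, 1], proj[:, 0])
align = max(abs(np.mean(np.exp(1j * (phi - head)))),
            abs(np.mean(np.exp(1j * (phi + head)))))  # either ring orientation

print(f"variance in top 2 PCs: {top2_var:.2f}")             # large for a ring
print(f"ring-coordinate / heading alignment: {align:.2f}")  # near 1 if tracked
```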

Together, these investigations reveal complementary aspects of how intelligent systems develop useful, generalizable representations through learning. From biologically plausible learning rules to abstract computational principles, this thesis demonstrates how neural networks can illuminate fundamental mechanisms of intelligence across natural and artificial systems, contributing to a unified science of Neural Computation.

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: Theoretical Neuroscience, Machine Learning, Neural Networks, Representation Learning, Biologically Plausible Learning, Continuous Attractors, Zero-Shot Generalization, Disentanglement, Large Language Models, Predictive Coding, Stimulus Substitution, Path Integration, Psychological Latent Factors, NeuroAI
Degree Grantor: California Institute of Technology
Division: Engineering and Applied Science
Major Option: Computation and Neural Systems
Thesis Availability: Public (worldwide access)
Research Advisor(s):
  • Rangel, Antonio
Thesis Committee:
  • O'Doherty, John P. (chair)
  • Perona, Pietro
  • Yue, Yisong
  • Rangel, Antonio
Defense Date: 8 May 2025
Funders:
  • Onassis Foundation (grant number unspecified)
  • NOMIS Foundation (grant number unspecified)
Record Number: CaltechTHESIS:05212025-232450900
Persistent URL: https://resolver.caltech.edu/CaltechTHESIS:05212025-232450900
DOI: 10.7907/cxs1-ss10
Related URLs:
  • https://arxiv.org/abs/2409.13471 (arXiv): preprint adapted for Chapter I
  • https://doi.org/10.7554/eLife.69841 (DOI): published paper adapted for Chapter II
  • https://openreview.net/forum?id=yVGGtsOgc7 (publisher): published paper adapted for Chapter III
  • https://openreview.net/forum?id=S0YFcYMis7 (publisher): published workshop paper related to Chapter III
  • https://openreview.net/forum?id=LWPoA68TFT (publisher): published workshop paper related to Chapter III
  • https://openreview.net/forum?id=vqD8LEvIq3 (publisher): published workshop paper related to Chapter III
ORCID:
  • Vafeidis, Panteleimon: 0000-0002-9768-0609
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 17258
Collection: CaltechTHESIS
Deposited By: Panteleimon Vafeidis
Deposited On: 27 May 2025 19:24
Last Modified: 17 Jun 2025 17:27

Thesis Files

PDF (Redacted thesis - Appendix D.6 omitted) - Final Version, 19MB. See Usage Policy.
