CaltechTHESIS
  A Caltech Library Service

Neural Network Models of Learning and Generalization

Citation

Vafeidis, Panteleimon (2025) Neural Network Models of Learning and Generalization. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/cxs1-ss10. https://resolver.caltech.edu/CaltechTHESIS:05212025-232450900

Abstract

Neural networks have emerged as powerful models for understanding both biological and artificial intelligence. This thesis investigates fundamental principles of learning and generalization across four interconnected domains, bridging insights from theoretical neuroscience and machine learning to advance our understanding of intelligent systems.

Chapter I addresses a central question in associative learning: how do neural circuits learn to associate concepts with one another? We combine two cortical inductive biases, mixed selectivity and predictive learning in compartmentalized neurons, to explain how the cortical architecture may confer significant evolutionary advantages, enabling efficient learning and the packing of multiple associations into the same neuronal population. Our model achieves stimulus substitution, whereby neurons come to respond to a conditioned stimulus exactly as they would to the associated unconditioned stimulus, a feat at which traditional Hebbian learning rules fail.
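
As a toy illustration of the difference between the two rule families (not the thesis's actual compartmentalized-neuron model), the following numpy sketch pairs a conditioned stimulus with a target unconditioned-stimulus response; the population sizes and learning rates are arbitrary choices. An error-driven predictive update converges to stimulus substitution, while a plain Hebbian update grows without ever matching the target.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 50, 20
cs = rng.random(n_in)       # conditioned-stimulus input pattern
r_us = rng.random(n_out)    # response evoked by the unconditioned stimulus

# Plain Hebbian rule: weights grow with pre/post coactivity during pairing,
# but nothing ties the CS-evoked response to r_us, and growth is unbounded.
W_hebb = np.zeros((n_out, n_in))
for _ in range(200):
    r = W_hebb @ cs + r_us              # response during CS-US pairing
    W_hebb += 0.01 * np.outer(r, cs)

# Predictive rule: the CS-driven compartment learns to predict the US-evoked
# response, driving the error to zero and achieving stimulus substitution.
W_pred = np.zeros((n_out, n_in))
for _ in range(200):
    err = r_us - W_pred @ cs
    W_pred += 0.01 * np.outer(err, cs)

print("predictive CS response matches US:", np.allclose(W_pred @ cs, r_us, atol=1e-3))  # True
print("Hebbian CS response matches US:   ", np.allclose(W_hebb @ cs, r_us, atol=1e-3))  # False
```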

Chapter II pivots from the static mappings between concepts learned in Chapter I to explore how neural systems develop the precise synaptic connectivity required to establish dynamic mappings for path integration: the ability to maintain an internal sense of location without external cues. Applied to the Drosophila head direction system, our model develops connectivity patterns strikingly similar to those observed experimentally, with continuous attractor network (CAN) dynamics emerging naturally from learning. This offers a novel perspective on how precisely calibrated neural circuits can develop through experience rather than genetic pre-specification, and explains experimental findings in which animals adapt their internal representation when sensory experience changes.
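
For intuition about the dynamics that such learning converges to, here is a minimal hand-wired ring attractor sketch (in the thesis the connectivity is learned from experience; here the cosine kernels are given, and the normalization step is a simplification for stability). A symmetric kernel sustains a bump of activity, and an antisymmetric kernel shifts it under an angular-velocity signal, implementing path integration.

```python
import numpy as np

N = 64
theta = np.linspace(0, 2 * np.pi, N, endpoint=False)

# Symmetric cosine kernel (local excitation, broad inhibition) sustains a
# bump; the antisymmetric kernel shifts it when gated by angular velocity.
W_sym = (np.cos(theta[:, None] - theta[None, :]) - 0.5) / N
W_asym = np.sin(theta[:, None] - theta[None, :]) / N

def step(r, ang_vel):
    r = np.maximum(0.0, (W_sym + ang_vel * W_asym) @ r)  # rectified update
    return r / np.linalg.norm(r)        # normalization keeps the bump stable

r = np.maximum(0.0, np.cos(theta))      # initialize a bump at heading 0
for _ in range(30):
    r = step(r, ang_vel=0.1)            # constant turning signal

heading = np.angle(r @ np.exp(1j * theta))    # population-vector readout
print(f"decoded heading: {heading:.2f} rad")  # bump rotated with the velocity input
```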

In Chapter III, we establish a theoretical framework explaining how disentangled representations, internal models that isolate independent factors of variation in the world, emerge from multi-task learning. We prove that any system competent at multiple related tasks must implicitly represent the underlying latent variables in a linearly decodable form. We experimentally confirm all major theoretical predictions and reveal a fundamental connection between task diversity and representation quality, which helps explain why modern transformer models may develop human-interpretable concepts. Furthermore, our work suggests that the massively parallel cortical architecture may be a key facilitator in the development of representations that enable the impressive zero-shot generalization ability that humans possess.
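
A minimal sketch of the claim, with made-up latents and tasks rather than those of Chapter III: a single network trained on several nonlinear tasks that all depend on two shared latent variables ends up with a hidden layer from which the latents can be read out linearly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_z, d_x, d_h = 1000, 2, 10, 64

# Ground-truth latents and a fixed nonlinear mixing into observations.
Z = rng.uniform(-1, 1, (n, d_z))
A = rng.normal(size=(d_z, d_x))
X = np.tanh(Z @ A)

# Several tasks, each a different nonlinear function of the *shared* latents.
Y = np.stack([Z[:, 0] * Z[:, 1], np.sin(2 * Z[:, 0]),
              Z[:, 1] ** 2, np.abs(Z[:, 0]) - Z[:, 1]], axis=1)

# One hidden layer trained on all tasks at once with plain gradient descent.
W1 = 0.3 * rng.normal(size=(d_x, d_h))
W2 = 0.3 * rng.normal(size=(d_h, Y.shape[1]))
for _ in range(4000):
    H = np.tanh(X @ W1)
    err = H @ W2 - Y                                    # multi-task error
    gW2 = H.T @ err / n
    gW1 = X.T @ ((err @ W2.T) * (1 - H ** 2)) / n       # backprop through tanh
    W2 -= 0.05 * gW2
    W1 -= 0.05 * gW1

# Test the theoretical prediction: the latents should be linearly decodable
# from the hidden layer of the multi-task network.
H = np.tanh(X @ W1)
F = np.c_[H, np.ones(n)]
coef, *_ = np.linalg.lstsq(F, Z, rcond=None)
r2 = 1 - (Z - F @ coef).var(axis=0) / Z.var(axis=0)
print("linear decoding R^2 per latent:", np.round(r2, 3))  # typically near 1
```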

Finally, Chapter IV proposes using Large Language Models (LLMs) as cognitive tools for evaluating latent-factor hypotheses in psychology, building on the theoretical insights of Chapter III. It suggests that the self-consistency of an LLM's responses, conditioned on hypothesized psychological factors, could serve as a metric for evaluating such hypotheses. While preliminary, this approach represents a novel computational methodology that could transform how hypotheses about human cognition are developed and refined.
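
A sketch of one way such a metric could be operationalized; `query_llm` is a hypothetical placeholder for an actual LLM client, and the scoring rule (inverse of rating dispersion) is one illustrative choice among many, not the thesis's definitive procedure.

```python
import statistics

def query_llm(prompt: str) -> float:
    """Hypothetical placeholder: return the LLM's 1-7 Likert rating for prompt."""
    raise NotImplementedError  # wire up an actual LLM client here

def self_consistency(factor: str, items: list[str], n_samples: int = 5) -> float:
    """Score a hypothesized psychological factor by how consistently an LLM,
    conditioned on being high on that factor, rates items the factor should govern."""
    per_item_ratings = []
    for item in items:
        prompt = (f"You are a person who is high on the following trait: {factor}.\n"
                  f"Rate from 1 (strongly disagree) to 7 (strongly agree): {item}")
        per_item_ratings.append([query_llm(prompt) for _ in range(n_samples)])
    # Low dispersion across resampled answers suggests the hypothesized factor
    # coherently constrains behavior; high dispersion suggests it does not.
    spread = statistics.mean(statistics.pstdev(r) for r in per_item_ratings)
    return 1.0 / (1.0 + spread)
```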

Continuous attractors figure prominently in this thesis (Chapters II and III), yet they are often misunderstood, particularly in the experimental literature. Appendix D therefore provides important considerations for the detection and quantification of continuous attractor networks (CANs) in experimental data; taking them into account can prevent conceptual confusion that wastes effort and resources in the experimental neuroscience community.
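
As a flavor of the kind of first-pass check involved (not a substitute for the appendix's full treatment), one can ask whether population activity concentrates on a low-dimensional ring and whether position on that ring tracks the behavioral variable. The simulated tuning curves and noise level below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_neurons, n_samples = 100, 500

# Simulated head-direction data: rectified-cosine tuning curves plus noise.
pref = rng.uniform(0, 2 * np.pi, n_neurons)     # preferred directions
head = rng.uniform(0, 2 * np.pi, n_samples)     # true headings
rates = (np.maximum(0, np.cos(head[:, None] - pref[None, :]))
         + 0.1 * rng.normal(size=(n_samples, n_neurons)))

# Signature of a 1-D ring attractor: population activity should concentrate
# in ~2 principal components and trace out a closed loop in that plane.
Xc = rates - rates.mean(0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
top2_var = (S[:2] ** 2).sum() / (S ** 2).sum()

# Check that the ring coordinate tracks the behavioral variable (heading).
proj = Xc @ Vt[:2].T
phi = np.arctan2(proj[:, 1], proj[:, 0])
align = max(abs(np.mean(np.exp(1j * (phi - head)))),
            abs(np.mean(np.exp(1j * (phi + head)))))  # either ring orientation

print(f"variance in top 2 PCs: {top2_var:.2f}")             # large for a ring
print(f"ring-coordinate / heading alignment: {align:.2f}")  # near 1 if tracked
```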

Together, these investigations reveal complementary aspects of how intelligent systems develop useful, generalizable representations through learning. From biologically plausible learning rules to abstract computational principles, this thesis demonstrates how neural networks can illuminate fundamental mechanisms of intelligence across natural and artificial systems, contributing to a unified science of Neural Computation.

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: Theoretical Neuroscience, Machine Learning, Neural Networks, Representation Learning, Biologically Plausible Learning, Continuous Attractors, Zero-Shot Generalization, Disentanglement, Large Language Models, Predictive Coding, Stimulus Substitution, Path Integration, Psychological Latent Factors, NeuroAI
Degree Grantor: California Institute of Technology
Division: Engineering and Applied Science
Major Option: Computation and Neural Systems
Thesis Availability: Public (worldwide access)
Research Advisor(s):
  • Rangel, Antonio
Thesis Committee:
  • O'Doherty, John P. (chair)
  • Perona, Pietro
  • Yue, Yisong
  • Rangel, Antonio
Defense Date: 8 May 2025
Funders:
  • Onassis Foundation (grant number unspecified)
  • NOMIS Foundation (grant number unspecified)
Record Number: CaltechTHESIS:05212025-232450900
Persistent URL: https://resolver.caltech.edu/CaltechTHESIS:05212025-232450900
DOI: 10.7907/cxs1-ss10
Related URLs:
  • https://arxiv.org/abs/2409.13471 (arXiv): preprint adapted for Chapter I
  • https://doi.org/10.7554/eLife.69841 (DOI): published paper adapted for Chapter II
  • https://openreview.net/forum?id=yVGGtsOgc7 (publisher): published paper adapted for Chapter III
  • https://openreview.net/forum?id=S0YFcYMis7 (publisher): published workshop paper related to Chapter III
  • https://openreview.net/forum?id=LWPoA68TFT (publisher): published workshop paper related to Chapter III
  • https://openreview.net/forum?id=vqD8LEvIq3 (publisher): published workshop paper related to Chapter III
ORCID:
  • Vafeidis, Panteleimon: 0000-0002-9768-0609
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 17258
Collection: CaltechTHESIS
Deposited By: Panteleimon Vafeidis
Deposited On: 27 May 2025 19:24
Last Modified: 17 Jun 2025 17:27

Thesis Files

PDF (Redacted thesis - Appendix D.6 omitted) - Final Version, 19MB. See Usage Policy.
