CaltechTHESIS
  A Caltech Library Service

Understanding and Improving Efficiency in Training of Deep Neural Networks

Citation

Zhao, Jiawei (2025) Understanding and Improving Efficiency in Training of Deep Neural Networks. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/jgq8-et91. https://resolver.caltech.edu/CaltechTHESIS:02122025-201948305

Abstract

As deep neural networks (DNNs) continue to drive progress in fields like computer vision and natural language processing, their increasing complexity presents significant challenges for training efficiency, particularly in large language models (LLMs). These challenges include memory limitations, energy consumption, and bandwidth constraints during training.

In this thesis, I address these challenges by analyzing the training dynamics of DNNs and proposing hardware-efficient learning algorithms to enhance training efficiency. First, I focus on mitigating memory limitations in LLM training. Training large models like LLMs requires substantial memory for parameters, gradients, and optimizer states, often exceeding standard hardware capacity. To tackle this, I propose GaLore, a memory-efficient training algorithm that reduces the memory footprint of LLM training by up to 65.5% while preserving performance. Additionally, I introduce InRank, an incremental low-rank learning algorithm that further reduces memory usage by gradually increasing matrix rank.
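
For concreteness, the core idea behind GaLore (as described in the associated arXiv article) is to project gradients onto a low-rank subspace and keep the optimizer state only in that subspace. The NumPy sketch below is an illustrative, simplified version of that idea; the function names, the fixed rank, and the Adam-style moment updates are assumptions for illustration, not the exact procedure from the thesis.

```python
import numpy as np

def refresh_projector(G, rank):
    """Recompute the projection matrix P from an SVD of the current gradient.
    Low-rank projection methods of this kind only refresh P occasionally."""
    U, _, _ = np.linalg.svd(G, full_matrices=False)
    return U[:, :rank]                       # top-r left singular vectors, shape (m, r)

def galore_like_step(W, G, P, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One illustrative update: Adam-style moments live in the projected (r x n)
    space instead of the full (m x n) space, which is where the memory saving comes from."""
    m, v, t = state
    R = P.T @ G                              # project gradient into the rank-r subspace
    t += 1
    m = b1 * m + (1 - b1) * R                # first moment, low-rank
    v = b2 * v + (1 - b2) * R ** 2           # second moment, low-rank
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    W = W - lr * (P @ (m_hat / (np.sqrt(v_hat) + eps)))   # project back and apply
    return W, (m, v, t)

# Toy usage: one weight matrix, a random "gradient", rank-4 projection.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
G = rng.standard_normal((64, 32))
P = refresh_projector(G, rank=4)
state = (np.zeros((4, 32)), np.zeros((4, 32)), 0)
W, state = galore_like_step(W, G, P, state)
```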

Next, I address the issue of high energy consumption during training. Training large models like LLMs demands considerable energy, contributing to environmental impact. To mitigate this, I propose LNS-Madam, a low-precision training algorithm leveraging the logarithmic number system (LNS) to lower energy consumption without compromising accuracy. LNS-Madam achieves up to 90% energy savings compared to a full-precision baseline model.
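
The sketch below illustrates, in simplified form, the two ingredients named above: a logarithmic number representation (values stored as a sign plus a quantized exponent, so multiplications reduce to additions of exponents) and a multiplicative, Madam-style weight update. The base, bit width, and update rule here are illustrative assumptions, not the exact formulation used by LNS-Madam.

```python
import numpy as np

def lns_quantize(x, bits=8, base=2.0 ** 0.25):
    """Snap magnitudes onto a logarithmic grid. In an LNS only the small integer
    exponent is stored, so multiplying two values means adding their exponents."""
    sign = np.sign(x)
    exponent = np.round(np.log(np.abs(x) + 1e-30) / np.log(base))
    limit = 2 ** (bits - 1) - 1              # clip exponents to the representable range
    exponent = np.clip(exponent, -limit, limit)
    return sign * base ** exponent

def multiplicative_update(w, g, lr=0.01):
    """A simplified multiplicative (Madam-style) step: the weight is scaled up or
    down, which in log space amounts to adding or subtracting a fixed increment."""
    return w * np.exp(-lr * np.sign(g) * np.sign(w))

# Toy usage: quantize weights, take one multiplicative step, re-quantize.
rng = np.random.default_rng(0)
w = lns_quantize(rng.standard_normal(8))
g = rng.standard_normal(8)
w = lns_quantize(multiplicative_update(w, g))
print(w)
```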

Finally, I focus on bandwidth limitations in distributed training. Training LLMs often requires distributing computations across multiple devices to accelerate training. However, network bandwidth constraints can cause communication bottlenecks that slow down training. To resolve this, I introduce signSGD with Majority Vote, a communication-efficient training algorithm that reduces the overhead associated with distributed training.
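
In signSGD with Majority Vote, each worker transmits only the sign of its gradient (one bit per coordinate); the server takes an elementwise majority vote and broadcasts the voted sign back. The toy NumPy sketch below simulates this on a single machine; the quadratic objective, worker count, and hyperparameters are illustrative assumptions.

```python
import numpy as np

def majority_vote_step(w, worker_grads, lr=0.05):
    """Each worker sends only sign(gradient) (1 bit per coordinate); the server
    takes an elementwise majority vote and every worker applies the voted sign."""
    signs = [np.sign(g) for g in worker_grads]   # the only thing sent over the network
    vote = np.sign(np.sum(signs, axis=0))        # elementwise majority vote
    return w - lr * vote

# Toy usage: 5 simulated workers minimizing ||w||^2 from noisy gradient copies.
rng = np.random.default_rng(0)
w = rng.standard_normal(4)
for _ in range(200):
    grads = [2 * w + 0.1 * rng.standard_normal(4) for _ in range(5)]
    w = majority_vote_step(w, grads)
print(np.round(w, 3))   # entries should end up near zero
```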

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: deep learning
Degree Grantor: California Institute of Technology
Division: Engineering and Applied Science
Major Option: Computing and Mathematical Sciences
Thesis Availability: Public (worldwide access)
Research Advisor(s):
  • Anandkumar, Anima
Thesis Committee:
  • Wierman, Adam C. (chair)
  • Anandkumar, Anima
  • Mazumdar, Eric V.
  • Chen, Beidi
  • Tian, Yuandong
Defense Date: 3 September 2024
Funders:
  • Bren Chair (grant number unspecified)
  • Schmidt Sciences, LLC (grant number unspecified)
Record Number: CaltechTHESIS:02122025-201948305
Persistent URL: https://resolver.caltech.edu/CaltechTHESIS:02122025-201948305
DOI: 10.7907/jgq8-et91
Related URLs:
  • https://arxiv.org/abs/2403.03507 (arXiv): article adapted for Chapter 2
  • https://arxiv.org/abs/2306.11250 (arXiv): article adapted for Chapter 3
  • https://arxiv.org/abs/2106.13914 (arXiv): article adapted for Chapter 4
  • https://arxiv.org/abs/2110.12661 (arXiv): article adapted for Chapter 5
ORCID:
  • Zhao, Jiawei: 0000-0002-5726-6040
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 16999
Collection: CaltechTHESIS
Deposited By: Jiawei Zhao
Deposited On: 13 Feb 2025 22:22
Last Modified: 19 Feb 2025 16:50

Thesis Files

PDF - Final Version, 11MB. See Usage Policy.
