CaltechTHESIS
  A Caltech Library Service

Understanding and Improving Efficiency in Training of Deep Neural Networks

Citation

Zhao, Jiawei (2025) Understanding and Improving Efficiency in Training of Deep Neural Networks. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/jgq8-et91. https://resolver.caltech.edu/CaltechTHESIS:02122025-201948305

Abstract

As deep neural networks (DNNs) continue to drive progress in fields like computer vision and natural language processing, their increasing complexity presents significant challenges for training efficiency, particularly in large language models (LLMs). These challenges include memory limitations, energy consumption, and bandwidth constraints during training.

In this thesis, I address these challenges by analyzing the training dynamics of DNNs and proposing hardware-efficient learning algorithms to enhance training efficiency. First, I focus on mitigating memory limitations in LLM training. Training large models like LLMs requires substantial memory for parameters, gradients, and optimizer states, often exceeding standard hardware capacity. To tackle this, I propose GaLore, a memory-efficient training algorithm that reduces the memory footprint of LLM training by up to 65.5% while preserving performance. Additionally, I introduce InRank, an incremental low-rank learning algorithm that further reduces memory usage by gradually increasing matrix rank.
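
For concreteness, the core idea behind GaLore (as described in the associated arXiv article) is to project gradients onto a low-rank subspace and keep the optimizer state only in that subspace. The NumPy sketch below is an illustrative, simplified version of that idea; the function names, the fixed rank, and the Adam-style moment updates are assumptions for illustration, not the exact procedure from the thesis.

```python
import numpy as np

def refresh_projector(G, rank):
    """Recompute the projection matrix P from an SVD of the current gradient.
    Low-rank projection methods of this kind only refresh P occasionally."""
    U, _, _ = np.linalg.svd(G, full_matrices=False)
    return U[:, :rank]                       # top-r left singular vectors, shape (m, r)

def galore_like_step(W, G, P, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One illustrative update: Adam-style moments live in the projected (r x n)
    space instead of the full (m x n) space, which is where the memory saving comes from."""
    m, v, t = state
    R = P.T @ G                              # project gradient into the rank-r subspace
    t += 1
    m = b1 * m + (1 - b1) * R                # first moment, low-rank
    v = b2 * v + (1 - b2) * R ** 2           # second moment, low-rank
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    W = W - lr * (P @ (m_hat / (np.sqrt(v_hat) + eps)))   # project back and apply
    return W, (m, v, t)

# Toy usage: one weight matrix, a random "gradient", rank-4 projection.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
G = rng.standard_normal((64, 32))
P = refresh_projector(G, rank=4)
state = (np.zeros((4, 32)), np.zeros((4, 32)), 0)
W, state = galore_like_step(W, G, P, state)
```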

Next, I address the issue of high energy consumption during training. Training large models like LLMs demands considerable energy, contributing to environmental impact. To mitigate this, I propose LNS-Madam, a low-precision training algorithm leveraging the logarithmic number system (LNS) to lower energy consumption without compromising accuracy. LNS-Madam achieves up to 90% energy savings compared to a full-precision baseline model.
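
The sketch below illustrates, in simplified form, the two ingredients named above: a logarithmic number representation (values stored as a sign plus a quantized exponent, so multiplications reduce to additions of exponents) and a multiplicative, Madam-style weight update. The base, bit width, and update rule here are illustrative assumptions, not the exact formulation used by LNS-Madam.

```python
import numpy as np

def lns_quantize(x, bits=8, base=2.0 ** 0.25):
    """Snap magnitudes onto a logarithmic grid. In an LNS only the small integer
    exponent is stored, so multiplying two values means adding their exponents."""
    sign = np.sign(x)
    exponent = np.round(np.log(np.abs(x) + 1e-30) / np.log(base))
    limit = 2 ** (bits - 1) - 1              # clip exponents to the representable range
    exponent = np.clip(exponent, -limit, limit)
    return sign * base ** exponent

def multiplicative_update(w, g, lr=0.01):
    """A simplified multiplicative (Madam-style) step: the weight is scaled up or
    down, which in log space amounts to adding or subtracting a fixed increment."""
    return w * np.exp(-lr * np.sign(g) * np.sign(w))

# Toy usage: quantize weights, take one multiplicative step, re-quantize.
rng = np.random.default_rng(0)
w = lns_quantize(rng.standard_normal(8))
g = rng.standard_normal(8)
w = lns_quantize(multiplicative_update(w, g))
print(w)
```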

Finally, I focus on bandwidth limitations in distributed training. Training LLMs often requires distributing computations across multiple devices to accelerate training. However, network bandwidth constraints can cause communication bottlenecks that slow down training. To resolve this, I introduce signSGD with Majority Vote, a communication-efficient training algorithm that reduces the overhead associated with distributed training.
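
In signSGD with Majority Vote, each worker transmits only the sign of its gradient (one bit per coordinate); the server takes an elementwise majority vote and broadcasts the voted sign back. The toy NumPy sketch below simulates this on a single machine; the quadratic objective, worker count, and hyperparameters are illustrative assumptions.

```python
import numpy as np

def majority_vote_step(w, worker_grads, lr=0.05):
    """Each worker sends only sign(gradient) (1 bit per coordinate); the server
    takes an elementwise majority vote and every worker applies the voted sign."""
    signs = [np.sign(g) for g in worker_grads]   # the only thing sent over the network
    vote = np.sign(np.sum(signs, axis=0))        # elementwise majority vote
    return w - lr * vote

# Toy usage: 5 simulated workers minimizing ||w||^2 from noisy gradient copies.
rng = np.random.default_rng(0)
w = rng.standard_normal(4)
for _ in range(200):
    grads = [2 * w + 0.1 * rng.standard_normal(4) for _ in range(5)]
    w = majority_vote_step(w, grads)
print(np.round(w, 3))   # entries should end up near zero
```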

Item Type: Thesis (Dissertation (Ph.D.))
Subject Keywords: deep learning
Degree Grantor: California Institute of Technology
Division: Engineering and Applied Science
Major Option: Computing and Mathematical Sciences
Thesis Availability: Public (worldwide access)
Research Advisor(s):
  • Anandkumar, Anima
Thesis Committee:
  • Wierman, Adam C. (chair)
  • Anandkumar, Anima
  • Mazumdar, Eric V.
  • Chen, Beidi
  • Tian, Yuandong
Defense Date: 3 September 2024
Funders:
  • Bren Chair (grant number unspecified)
  • Schmidt Sciences, LLC (grant number unspecified)
Record Number: CaltechTHESIS:02122025-201948305
Persistent URL: https://resolver.caltech.edu/CaltechTHESIS:02122025-201948305
DOI: 10.7907/jgq8-et91
Related URLs:
  • https://arxiv.org/abs/2403.03507 (arXiv): article adapted for Chapter 2
  • https://arxiv.org/abs/2306.11250 (arXiv): article adapted for Chapter 3
  • https://arxiv.org/abs/2106.13914 (arXiv): article adapted for Chapter 4
  • https://arxiv.org/abs/2110.12661 (arXiv): article adapted for Chapter 5
ORCID:
  • Zhao, Jiawei: 0000-0002-5726-6040
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 16999
Collection: CaltechTHESIS
Deposited By: Jiawei Zhao
Deposited On: 13 Feb 2025 22:22
Last Modified: 19 Feb 2025 16:50

Thesis Files

PDF - Final Version, 11MB. See Usage Policy.
