Citation
Zhao, Jiawei (2025) Understanding and Improving Efficiency in Training of Deep Neural Networks. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/jgq8-et91. https://resolver.caltech.edu/CaltechTHESIS:02122025-201948305
Abstract
As deep neural networks (DNNs) continue to drive progress in fields like computer vision and natural language processing, their increasing complexity presents significant challenges for training efficiency, particularly in large language models (LLMs). These challenges include memory limitations, energy consumption, and bandwidth constraints during training.
In this thesis, I address these challenges by analyzing the training dynamics of DNNs and proposing hardware-efficient learning algorithms that improve training efficiency. First, I focus on mitigating memory limitations in LLM training. Training large models like LLMs requires substantial memory for parameters, gradients, and optimizer states, often exceeding standard hardware capacity. To tackle this, I propose GaLore (Gradient Low-Rank Projection), a memory-efficient training algorithm that reduces the memory footprint of LLM training by up to 65.5% while preserving performance. Additionally, I introduce InRank, an incremental low-rank learning algorithm that further reduces memory usage by training with low-rank weight factorizations whose rank grows gradually during training.
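To make the low-rank idea concrete, here is a minimal PyTorch-style sketch of a GaLore-like update: the gradient is projected onto a low-rank subspace, Adam-style moments are kept only in that subspace (the source of the memory saving), and the resulting update is projected back to full size. The function name, the per-step projector refresh, and the omitted bias correction and scaling are simplifying assumptions for illustration, not the thesis's actual implementation.

```python
import torch

def galore_style_step(weight, grad, opt_state, rank=4, lr=1e-3,
                      beta1=0.9, beta2=0.999, eps=1e-8):
    """Illustrative low-rank update in the spirit of GaLore (simplified sketch)."""
    # Build a projector from the gradient's top singular directions.
    # (A real implementation refreshes this only every few hundred steps.)
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]                      # (m, r) orthonormal basis

    g_low = P.T @ grad                   # project gradient to shape (r, n)

    # Adam moments live in the low-rank space -- this is where memory is saved.
    m = opt_state.setdefault("m", torch.zeros_like(g_low))
    v = opt_state.setdefault("v", torch.zeros_like(g_low))
    m.mul_(beta1).add_(g_low, alpha=1 - beta1)
    v.mul_(beta2).addcmul_(g_low, g_low, value=1 - beta2)
    update_low = m / (v.sqrt() + eps)

    # Project the update back to the full parameter space and apply it.
    weight -= lr * (P @ update_low)
    return weight
```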
Next, I address the issue of high energy consumption during training. Training large models like LLMs demands considerable energy, contributing to environmental impact. To mitigate this, I propose LNS-Madam, a low-precision training algorithm leveraging the logarithmic number system (LNS) to lower energy consumption without compromising accuracy. LNS-Madam achieves up to 90% energy savings compared to a full-precision baseline model.
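As a rough illustration of why a logarithmic number system pairs well with a multiplicative (Madam-style) weight update, the sketch below quantizes magnitudes to powers of two and updates each weight by scaling it up or down, which in the log domain reduces to adding to an exponent. The bit width, handling of zero, and exact update rule here are assumptions for illustration only; they do not reproduce the LNS-Madam format or hyperparameters.

```python
import torch

def lns_quantize(x, frac_bits=3):
    """Toy log-domain quantizer: represent |x| as 2**(k / 2**frac_bits) for an
    integer k, with the sign stored separately. Illustrative only."""
    sign = torch.sign(x)
    mag = torch.clamp(x.abs(), min=1e-12)
    scale = 2 ** frac_bits
    k = torch.round(torch.log2(mag) * scale)      # quantized exponent
    return sign * torch.exp2(k / scale)

def madam_style_step(weight, grad, lr=0.01):
    """Multiplicative (Madam-style) update sketch: grow or shrink each weight's
    magnitude by a factor 2**(+/- lr) depending on the gradient direction. In a
    log-domain representation this is just an addition to the stored exponent."""
    direction = torch.sign(grad) * torch.sign(weight)   # +1: shrink |w|, -1: grow |w|
    new_weight = weight * torch.exp2(-lr * direction)
    return lns_quantize(new_weight)
```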
Finally, I focus on bandwidth limitations in distributed training. Training LLMs often requires distributing computation across many devices to accelerate training, but network bandwidth constraints can create communication bottlenecks that slow it down. To resolve this, I introduce signSGD with Majority Vote, a communication-efficient training algorithm that compresses each gradient to its sign, sharply reducing the communication overhead of distributed training.
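For intuition, here is a minimal single-process simulation of the majority-vote aggregation in signSGD with Majority Vote: each worker contributes only the sign of its local gradient, the server takes an elementwise majority vote, and the voted sign drives the update. Error feedback, momentum, and the real communication layer are omitted, and the function name is illustrative.

```python
import torch

def majority_vote_step(weight, worker_grads, lr=1e-3):
    """Simulate one signSGD-with-Majority-Vote step across several workers."""
    # Workers -> server: 1 bit per coordinate (the sign of each local gradient).
    worker_signs = [torch.sign(g) for g in worker_grads]
    # Server: elementwise majority vote, broadcast back as another 1-bit message.
    vote = torch.sign(torch.stack(worker_signs).sum(dim=0))
    # Every worker applies the agreed-upon direction.
    weight -= lr * vote
    return weight

# Example with three simulated workers holding noisy gradients of the same loss.
w = torch.zeros(4)
grads = [torch.tensor([0.2, -0.1, 0.3, -0.4]) + 0.05 * torch.randn(4) for _ in range(3)]
w = majority_vote_step(w, grads)
```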
| Field | Value |
| --- | --- |
| Item Type | Thesis (Dissertation (Ph.D.)) |
| Subject Keywords | deep learning |
| Degree Grantor | California Institute of Technology |
| Division | Engineering and Applied Science |
| Major Option | Computing and Mathematical Sciences |
| Thesis Availability | Public (worldwide access) |
| Research Advisor(s) | |
| Thesis Committee | |
| Defense Date | 3 September 2024 |
| Funders | |
| Record Number | CaltechTHESIS:02122025-201948305 |
| Persistent URL | https://resolver.caltech.edu/CaltechTHESIS:02122025-201948305 |
| DOI | 10.7907/jgq8-et91 |
| Related URLs | |
| ORCID | |
| Default Usage Policy | No commercial reproduction, distribution, display or performance rights in this work are provided. |
| ID Code | 16999 |
| Collection | CaltechTHESIS |
| Deposited By | Jiawei Zhao |
| Deposited On | 13 Feb 2025 22:22 |
| Last Modified | 19 Feb 2025 16:50 |
Thesis Files
PDF - Final Version, 11 MB. See Usage Policy.