Error-Correcting Codes for Networks, Storage and Computation

Citation

Halbawi, Wael (2017) Error-Correcting Codes for Networks, Storage and Computation. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/Z9J67F08. https://resolver.caltech.edu/CaltechTHESIS:06092017-013147699

Abstract

The advent of the information age has bestowed upon us three challenges related to the way we deal with data. Firstly, there is an unprecedented demand for transmitting data at high rates. Secondly, the massive amounts of data being collected from various sources needs to be stored across time. Thirdly, there is a need to process the data collected and perform computations on it in order to extract meaningful information out of it. The interconnected nature of modern systems designed to perform these tasks has unraveled new difficulties when it comes to ensuring their resilience against sources of performance degradation. In the context of network communication and distributed data storage, system-level noise and adversarial errors have to be combated with efficient error correction schemes. In the case of distributed computation, the heterogeneous nature of computing clusters can potentially diminish the speedups promised by parallel algorithms, calling for schemes that mitigate the effect of slow machines and communication delay.

This thesis addresses the problem of designing efficient fault tolerance schemes for the three scenarios just described. In the network communication setting, a family of multiple-source multicast networks that employ linear network coding is considered for which capacity-achieving distributed error-correcting codes, based on classical algebraic constructions, are designed. The codes require no coordination between the source nodes and are end to end: except for the source nodes and the destination node, the operation of the network remains unchanged.

In the context of data storage, balanced error-correcting codes are constructed so that the encoding effort required is balanced out across the storage nodes. In particular, it is shown that for a fixed row weight, any cyclic Reed-Solomon code possesses a generator matrix in which the number of nonzeros is the same across the columns. In the balanced and sparsest case, where each row of the generator matrix is a minimum distance codeword, the maximal encoding time over the storage nodes is minimized, a property that is appealing in write-intensive settings. Analogous constructions are presented for a locally recoverable code construction due to Tamo and Barg.

Lastly, the problem of mitigating stragglers in a distributed computation setup is addressed, where a function of some dataset is computed in parallel. Using Reed-Solomon coding techniques, a scheme is proposed that allows for the recovery of the function under consideration from the minimum number of machines possible. The only assumption made on the function is that it is additively separable, which renders the scheme useful in distributed gradient descent implementations. Furthermore, a theoretical model for the run time of the scheme is presented. When the return time of the machines is modeled probabilistically, the model can be used to optimally pick the scheme's parameters so that the expected computation time is minimized. The recovery is performed using an algorithm that runs in quadratic time and linear space, a notable improvement compared to state-of-the-art schemes.

The unifying theme of the three scenarios is the construction of error-correcting codes whose encoding functions adhere to certain constraints. It is shown that in many cases, these constraints can be satisfied by classical constructions. As a result, the schemes presented are deterministic, operate over small finite fields and can be decoded using efficient algorithms.

Item Type:

Thesis (Dissertation (Ph.D.))

Subject Keywords:

Error-Correcting Codes, Algebraic Coding Theory, Distributed Storage Systems, Distributed Computation

Degree Grantor:

California Institute of Technology

Division:

Engineering and Applied Science

Major Option:

Electrical Engineering

Thesis Availability:

Public (worldwide access)

Research Advisor(s):

Hassibi, Babak

Thesis Committee:

Hassibi, Babak (chair)
Bruck, Jehoshua
Divsalar, Dariush
Dimakis, Alexandros G.
Kostina, Victoria

Defense Date:

5 June 2017

Record Number:

CaltechTHESIS:06092017-013147699

Persistent URL:

https://resolver.caltech.edu/CaltechTHESIS:06092017-013147699

DOI:

10.7907/Z9J67F08

ORCID:

Author	ORCID
Halbawi, Wael	0000-0001-5951-7002

Default Usage Policy:

No commercial reproduction, distribution, display or performance rights in this work are provided.

ID Code:

10328

Collection:

CaltechTHESIS

Deposited By:

Wael Halbawi

Deposited On:

09 Jun 2017 23:34

Last Modified:

04 Oct 2019 00:17

Thesis Files

Preview

PDF - Final Version
See Usage Policy.
971kB

Repository Staff Only: item control page