A Caltech Library Service

Compiler Optimization of Data Storage


Gupta, Rajiv (1991) Compiler Optimization of Data Storage. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/E8DD-VG68.


The system efficiency and throughput of most architectures are critically dependent on the ability of the memory subsystem to satisfy data operand accesses. This ability is in turn dependent on the distribution or layout of the data relative to the access of the data by the executing code. Page faults, cache misses, truncated vectors, global communication, for example, are expensive but common symptoms of data and access misalignment.

Compiler optimization, traditionally synonymous with code optimization, has addressed the issue of efficient data access by manipulating the code to better access the data under a fixed, default distribution. This approach is restrictive, and often suboptimal. Data optimization, or data-layout optimization, is presented as an integral part of compiler optimization.

For scalar data, a good compile-time approximation of the "reference string," or sequence of data accesses, is advanced for the purpose of distributing the data. However, the optimal distribution of the scalar data for such, or any, reference string is proved NP-complete. A methodology and a polynomial algorithm for an approximate solution are developed. Experiments with representative, but scaled, scientific programs and execution environments display a reduction in cache misses up to two orders in magnitude.

For array data, compile-time predictions of the patterns in which the data is accessed by programs in scalar and array languages are examined. For arbitrary computations in an array language, the determination of the optimal layout of the data is proved to be NP-complete. Polynomial techniques for the approximate solutions to the optimal layout of arrays in both languages, scalar and array, are outlined.

The general applicability of the techniques, in terms of environments other than hierarchical memories, and in terms of interdependence with code manipulations, is discussed. New code optimizations inspired by the data distribution techniques are motivated. The prudence of compiler- over user-optimized data distribution is argued.

Item Type:Thesis (Dissertation (Ph.D.))
Subject Keywords:Computer Science
Degree Grantor:California Institute of Technology
Division:Engineering and Applied Science
Major Option:Computer Science
Thesis Availability:Public (worldwide access)
Research Advisor(s):
  • Kajiya, James Thomas
Thesis Committee:
  • Kajiya, James Thomas (chair)
  • Barr, Alan H.
  • Chandy, K. Mani
Defense Date:13 July 1990
Record Number:CaltechETD:etd-06272007-081805
Persistent URL:
Default Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:2740
Deposited By: Imported from ETD-db
Deposited On:18 Jul 2007
Last Modified:21 Dec 2019 04:01

Thesis Files

PDF (Gupta_r_1991.pdf) - Final Version
See Usage Policy.


Repository Staff Only: item control page