Citation
Greenfeld, Norton Robert (1972) Computer System Support for Data Analysis. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/XGAM-XB93. https://resolver.caltech.edu/CaltechTHESIS:04112016-081827399
Abstract
This thesis is an investigation into the nature of data analysis and computer software systems which support this activity.
The first chapter develops the notion of data analysis as an experimental science which has two major components: data-gathering and theory-building. The basic role of language in determining the meaningfulness of theory is stressed, and the informativeness of a language and data base pair is studied. The static and dynamic aspects of data analysis are then considered from this conceptual vantage point.
The second chapter surveys the available types of computer systems which may be useful for data analysis. Particular attention is paid to the questions raised in the first chapter about the language restrictions imposed by the computer system and its dynamic properties.
The third chapter discusses the REL data analysis system, which was designed to satisfy the needs of the data analyzer in an operational relational data system. The major limitation on the use of such systems is the amount of access to data stored on a relatively slow secondary memory. This problem of the paging of data is investigated and two classes of data structure representations are found, each of which has desirable paging characteristics for certain types of queries. One representation is used by most of the generalized data base management systems in existence today, but the other is clearly preferred in the data analysis environment, as conceptualized in Chapter I.
This data representation has strong implications for a fundamental process of data analysis -- the quantification of variables. Since quantification is one of the few means of summarizing and abstracting, data analysis systems are under strong pressure to facilitate the process. Two implementations of quantification are studied: one analogous to the form of the lower predicate calculus and another more closely attuned to the data representation. A comparison of these indicates that the use of the "label class" method results in orders of magnitude improvement over the lower predicate calculus technique.
Item Type: | Thesis (Dissertation (Ph.D.)) |
---|---|
Subject Keywords: | (Engineering Science) |
Degree Grantor: | California Institute of Technology |
Division: | Engineering and Applied Science |
Major Option: | Engineering |
Thesis Availability: | Public (worldwide access) |
Research Advisor(s): |
|
Thesis Committee: |
|
Defense Date: | 6 March 1972 |
Record Number: | CaltechTHESIS:04112016-081827399 |
Persistent URL: | https://resolver.caltech.edu/CaltechTHESIS:04112016-081827399 |
DOI: | 10.7907/XGAM-XB93 |
Default Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. |
ID Code: | 9667 |
Collection: | CaltechTHESIS |
Deposited By: | INVALID USER |
Deposited On: | 11 Apr 2016 15:57 |
Last Modified: | 01 Jul 2024 17:28 |
Thesis Files
PDF - Final Version (44MB). See Usage Policy.