Exploiting parallel memory hierarchies for ray casting volumes

Citation

Palmer, Michael Edward (1997) Exploiting parallel memory hierarchies for ray casting volumes. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/mbmg-ne39. https://resolver.caltech.edu/CaltechETD:etd-01162008-080520

Abstract

Previous work in single-processor ray casting methods for volume rendering has concentrated on algorithmic optimizations that reduce computational work. This approach leaves untapped the performance gains available through efficient exploitation of the memory hierarchy.

Previous work in parallel volume rendering has concentrated on parallel partitioning, with the goals of maximizing load balance and minimizing communication between distributed nodes. This implies a simplified view of the memory hierarchy of a parallel machine, ignoring the relationship between parallel partitioning and memory hierarchy effects at all but the top level.

In this thesis, we progressively develop methods to optimize memory hierarchy performance for ray casting: 1) on a uniprocessor, using algorithmic modifications to isolate cache miss costs, specialized hardware to monitor cache misses on the bus, and a software cache simulator; 2) on a shared-memory Power Challenge multiprocessor, examining how algorithmic design decisions about parallel partitioning fundamentally depend on memory hierarchy effects at several levels; and 3) on a distributed array of interconnected Power Challenge multiprocessors, on which we implement a logical global address space for volume blocks and investigate the tradeoff between replication (caching) and communication of data. The methods we develop let us exploit the coherence found in volume rendering to increase memory locality, and thereby memory system performance. This focus on optimally exploiting the entire memory hierarchy, from the processor cache to the interconnection network between distributed nodes, yields faster frame rates for large (357 MB to 1 GB) datasets than previously reported in the literature, and allows us to efficiently render a 7.1 GB dataset, the largest ever rendered.

Our results have implications for the parallel solution of other problems that, like ray casting, require a global gather operation, use an associative operator to combine partial results, and exhibit coherence. We discuss the implications for the design of a parallel architecture suited to this class of problems: specifically, that these algorithms are best served by a deep memory hierarchy.
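The abstract does not name the associative operator. In ray casting, the usual candidate is front-to-back "over" compositing of premultiplied color and opacity, and the sketch below assumes that operator. It shows why associativity matters for parallel partitioning: each contiguous segment of a ray can be composited independently, for example on different processors, and the partial results can then be combined in depth order to give the same pixel value as serial compositing. The names `over`, `composite`, and `composite_partitioned` are hypothetical illustrations, not taken from the thesis.

```python
import math
from functools import reduce

# Hypothetical sketch: a sample is (premultiplied color, opacity), one color channel for brevity.
# Assumes standard front-to-back "over" compositing; the thesis abstract does not specify the operator.
Sample = tuple[float, float]

EMPTY: Sample = (0.0, 0.0)  # identity element: fully transparent, no color


def over(front: Sample, back: Sample) -> Sample:
    """Composite `back` behind `front` (premultiplied alpha)."""
    cf, af = front
    cb, ab = back
    t = 1.0 - af  # transmittance left after the front sample
    return (cf + t * cb, af + t * ab)


def composite(samples: list[Sample]) -> Sample:
    """Serial front-to-back compositing along one ray."""
    return reduce(over, samples, EMPTY)


def composite_partitioned(samples: list[Sample], num_parts: int) -> Sample:
    """Split the ray into contiguous segments, composite each independently
    (as separate processors could), then combine the partial results in depth order."""
    n = len(samples)
    bounds = [n * i // num_parts for i in range(num_parts + 1)]
    partials = [composite(samples[bounds[i]:bounds[i + 1]]) for i in range(num_parts)]
    return composite(partials)


# Made-up example ray: opacities with premultiplied colors (color <= opacity).
ray = [(0.1 * a, a) for a in (0.2, 0.5, 0.1, 0.3, 0.4, 0.05, 0.6)]
serial = composite(ray)
parallel = composite_partitioned(ray, 3)
print(math.isclose(serial[0], parallel[0]) and math.isclose(serial[1], parallel[1]))
```

Because `over` is associative and has `EMPTY` as its identity, the grouping of segments does not change the result. The depth order of the partial results must still be preserved, since `over` is not commutative.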

Item Type:	Thesis (Dissertation (Ph.D.))
Degree Grantor:	California Institute of Technology
Division:	Engineering and Applied Science
Major Option:	Computer Science
Thesis Availability:	Public (worldwide access)
Research Advisor(s):	Taylor, Stephen
Thesis Committee:	Unknown, Unknown
Defense Date:	4 April 1997
Record Number:	CaltechETD:etd-01162008-080520
Persistent URL:	https://resolver.caltech.edu/CaltechETD:etd-01162008-080520
DOI:	10.7907/mbmg-ne39
Default Usage Policy:	No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:	200
Collection:	CaltechTHESIS
Deposited By:	Imported from ETD-db
Deposited On:	13 Feb 2008
Last Modified:	16 Apr 2021 22:10

Thesis Files


PDF (Palmer_me_1997.pdf), Final Version, 23 MB. See Usage Policy.
