JUCS - Journal of Universal Computer Science 6(10): 968-993, doi: 10.3217/jucs-006-10-0968
Compiler Generated Multithreading to Alleviate Memory Latency
Kristof E. Beyls, Erik H. D'Hollander
‡ Dept. of Electronics and Information Systems, University of Ghent, Belgium
Abstract
Since the era of vector and pipelined computing, computational speed has been limited by memory access time. Faster caches and more cache levels are used to bridge the growing gap between memory and processor speeds. With the advent of multithreaded processors, it becomes feasible to concurrently fetch data and compute in two cooperating threads. A technique is presented to generate these threads at compile time, taking into account the characteristics of both the program and the underlying architecture. The results have been evaluated for an explicitly parallel processor. For a number of common programs, the data-fetch thread allows the computation to continue without cache-miss stalls.
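To illustrate the idea of two cooperating threads, a minimal sketch in C is given below. It is not the authors' compiler-generated code: the data-fetch thread, the compute thread, the POSIX threads API, and the PREFETCH_DISTANCE parameter are illustrative assumptions. The fetch thread runs ahead of the compute thread and touches future array elements so they are already cached when the compute thread needs them.

/*
 * Sketch: a compute thread and a cooperating data-fetch thread on one array.
 * The fetch thread runs PREFETCH_DISTANCE elements ahead and prefetches the
 * data the compute thread will consume shortly afterwards.
 * (Assumed names/parameters: fetch_thread, compute_thread, PREFETCH_DISTANCE.)
 */
#include <pthread.h>
#include <stdio.h>

#define N                 (1 << 20)
#define PREFETCH_DISTANCE 64          /* how far the fetch thread runs ahead */

static double a[N];
static double sum;

/* Data-fetch thread: bring future elements into the cache. */
static void *fetch_thread(void *arg)
{
    (void)arg;
    for (long i = PREFETCH_DISTANCE; i < N; i++)
        __builtin_prefetch(&a[i], 0 /* read */, 1 /* low temporal locality */);
    return NULL;
}

/* Compute thread: consumes the data the fetch thread has already touched. */
static void *compute_thread(void *arg)
{
    (void)arg;
    double s = 0.0;
    for (long i = 0; i < N; i++)
        s += a[i] * a[i];
    sum = s;
    return NULL;
}

int main(void)
{
    for (long i = 0; i < N; i++)
        a[i] = (double)i;

    pthread_t fetch, compute;
    pthread_create(&fetch, NULL, fetch_thread, NULL);
    pthread_create(&compute, NULL, compute_thread, NULL);
    pthread_join(fetch, NULL);
    pthread_join(compute, NULL);

    printf("sum = %f\n", sum);
    return 0;
}

In the paper's setting the compiler derives both threads and the look-ahead distance from the program and the target architecture; in this hand-written sketch those choices are fixed constants.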
Keywords
data locality, multithreading, run-time data relocation, compiler optimization, cache optimization, prefetching, tiling