JUCS - Journal of Universal Computer Science 6(10): 968-993, doi: 10.3217/jucs-006-10-0968
Compiler Generated Multithreading to Alleviate Memory Latency
Kristof E. Beyls, Erik H. D'Hollander
‡ Dept. of Electronics and Information Systems, University of Ghent, Belgium
Abstract
Since the era of vector and pipelined computing, computational speed has been limited by memory access time. Faster caches and more cache levels are used to bridge the growing gap between memory and processor speeds. With the advent of multithreaded processors, it becomes feasible to concurrently fetch data and compute in two cooperating threads. A technique is presented to generate these threads at compile time, taking into account the characteristics of both the program and the underlying architecture. The results have been evaluated for an explicitly parallel processor. For a number of common programs, the data-fetch thread allows the computation to continue without cache-miss stalls.
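To illustrate the idea of two cooperating threads, a minimal sketch in C is given below. It is not the authors' compiler-generated code: the data-fetch thread, the compute thread, the POSIX threads API, and the PREFETCH_DISTANCE parameter are illustrative assumptions. The fetch thread runs ahead of the compute thread and touches future array elements so they are already cached when the compute thread needs them.

/*
 * Sketch: a compute thread and a cooperating data-fetch thread on one array.
 * The fetch thread runs PREFETCH_DISTANCE elements ahead and prefetches the
 * data the compute thread will consume shortly afterwards.
 * (Assumed names/parameters: fetch_thread, compute_thread, PREFETCH_DISTANCE.)
 */
#include <pthread.h>
#include <stdio.h>

#define N                 (1 << 20)
#define PREFETCH_DISTANCE 64          /* how far the fetch thread runs ahead */

static double a[N];
static double sum;

/* Data-fetch thread: bring future elements into the cache. */
static void *fetch_thread(void *arg)
{
    (void)arg;
    for (long i = PREFETCH_DISTANCE; i < N; i++)
        __builtin_prefetch(&a[i], 0 /* read */, 1 /* low temporal locality */);
    return NULL;
}

/* Compute thread: consumes the data the fetch thread has already touched. */
static void *compute_thread(void *arg)
{
    (void)arg;
    double s = 0.0;
    for (long i = 0; i < N; i++)
        s += a[i] * a[i];
    sum = s;
    return NULL;
}

int main(void)
{
    for (long i = 0; i < N; i++)
        a[i] = (double)i;

    pthread_t fetch, compute;
    pthread_create(&fetch, NULL, fetch_thread, NULL);
    pthread_create(&compute, NULL, compute_thread, NULL);
    pthread_join(fetch, NULL);
    pthread_join(compute, NULL);

    printf("sum = %f\n", sum);
    return 0;
}

In the paper's setting the compiler derives both threads and the look-ahead distance from the program and the target architecture; in this hand-written sketch those choices are fixed constants.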
Keywords
data locality, multithreading, run-time data relocation, compiler optimization, cache optimization, prefetching, tiling