JUCS - Journal of Universal Computer Science 20(9): 1352-1372, doi: 10.3217/jucs-020-09-1351
Extending an Application-Level Checkpointing Tool to Provide Fault Tolerance Support to OpenMP Applications
Nuria Losada, María J. Martín, Gabriel Rodríguez, Patricia González
‡ University of A Coruña, A Coruña, Spain
Open Access
Abstract
Despite the increasing popularity of shared-memory systems, there is a lack of tools for providing fault tolerance support to shared-memory applications. CPPC (ComPiler for Portable Checkpointing) is an application-level checkpointing tool focused on the insertion of fault tolerance into long-running MPI applications. This paper presents an extension to CPPC that allows the checkpointing of OpenMP applications. The proposed solution maintains the main characteristics of CPPC: portability and reduced checkpoint file size. The performance of the proposal is evaluated using the OpenMP NAS Parallel Benchmarks, showing that most of the applications incur small checkpoint overheads.
Keywords
parallel programming, OpenMP, fault tolerance, checkpointing