JUCS - Journal of Universal Computer Science 31(1): 72-92, doi: 10.3897/jucs.120840
Fault Tolerance Model for Hadoop Distributed System
Soraya Setti Ahmed‡, Yahya Slimani§, Riadh Frefita|
‡ Mustapha Stambouli University, Mascara, Algeria
§ ISAMM, Manouba University, Manouba, Tunisia
| Esprit School, Pôle Technologique, El Ghazala, Tunisia
Open Access
Abstract
Fault tolerance approaches in distributed systems are essentially based on replication and checkpointing, each of which has its advantages and limitations. This paper has two objectives. First, it proposes a fault tolerance approach based on the status of the nodes in a distributed system. For this purpose, it defines three node statuses: safe, faulty, and potentially faulty. In addition to the classical node statuses (safe and faulty), it introduces a new status, called potentially faulty, which makes it possible to enhance the availability of a distributed system. Second, it discusses the efficiency of the proposed model on two types of architectures: a virtual multi-node cluster and a physical multi-node cluster with a Wi-Fi connection. Experiments show that the proposed approach increases the system's throughput and its level of fault tolerance.
Keywords
Distributed Systems, Hadoop, Fault Tolerance, Networks, Node Failures