JUCS - Journal of Universal Computer Science 22(4): 537-551, doi: 10.3217/jucs-022-04-0537

An Empirical Investigation of Security Vulnerabilities within Web Applications

Ibrahim Abunadi^‡, Mamdouh Alenezi^‡

‡ Prince Sultan University, Riyadh, Saudi Arabia

Corresponding author: Ibrahim Abunadi (iabunadi@psu.edu.sa)

This article is freely available under the J.UCS Open Content License.

Citation: Abunadi I, Alenezi M (2016) An Empirical Investigation of Security Vulnerabilities within Web Applications. JUCS - Journal of Universal Computer Science 22(4): 537-551. https://doi.org/10.3217/jucs-022-04-0537

Abstract

Building secure software is challenging, time-consuming, and expensive. Software vulnerability prediction models, which identify vulnerable software components, are commonly used to focus security efforts and thereby reduce the time and effort needed to secure software. Existing models combine process or product metrics with machine learning techniques to identify vulnerable components. Cross-project vulnerability prediction plays a significant role in identifying the components most likely to be vulnerable, particularly for new or inactive projects. However, little effort has been spent on delivering clear guidelines for choosing the training data for project vulnerability prediction. In this work, we present an empirical study that aims to clarify how useful cross-project prediction techniques are in predicting software vulnerabilities. Our study uses the classifications produced by different machine learning techniques to improve the detection of vulnerable components, and we compare in detail the prediction performance of five well-known classifiers. The study is conducted on a publicly available dataset of several open-source PHP web applications in the context of cross-project vulnerability prediction, which represents one of the main challenges in the vulnerability prediction field.

Keywords

cross-project vulnerability prediction, software security, software quality, data mining
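
The cross-project setting described in the abstract (train a classifier on the components of one project, then predict vulnerable components in a different project) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two metric columns (for example, lines of code and a complexity count), the toy values, and the names `PROJECT_A`, `PROJECT_B`, `fit_centroids`, `predict`, and `recall_precision` are hypothetical, and the nearest-centroid rule is a simple stand-in rather than one of the five classifiers compared in the study.

```python
from statistics import mean

# Hypothetical toy data, invented for illustration (NOT the study's dataset).
# Each row is (metric vector, label); label 1 means "vulnerable".
PROJECT_A = [
    ([120.0, 3.0], 0),
    ([150.0, 4.0], 0),
    ([900.0, 25.0], 1),
    ([1100.0, 30.0], 1),
]
PROJECT_B = [
    ([100.0, 2.0], 0),
    ([950.0, 28.0], 1),
    ([200.0, 5.0], 0),
    ([1000.0, 22.0], 1),
]


def fit_centroids(rows):
    """Return one mean metric vector (centroid) per class label."""
    centroids = {}
    for label in {y for _, y in rows}:
        vectors = [x for x, y in rows if y == label]
        centroids[label] = [mean(col) for col in zip(*vectors)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def recall_precision(rows, centroids):
    """Recall and precision for the vulnerable class (label 1)."""
    preds = [(predict(centroids, x), y) for x, y in rows]
    tp = sum(1 for p, y in preds if p == 1 and y == 1)
    fp = sum(1 for p, y in preds if p == 1 and y == 0)
    fn = sum(1 for p, y in preds if p == 0 and y == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


# Cross-project setting: train on project A only, evaluate on project B only.
model = fit_centroids(PROJECT_A)
recall, precision = recall_precision(PROJECT_B, model)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

The point of the sketch is the data split, not the classifier: no component of the target project is seen during training, which is what distinguishes cross-project prediction from within-project prediction.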