Online cross-project prediction of defect-inducing software changes

Tabassum, Sadia ORCID: 0000-0002-5096-7100 (2023). Online cross-project prediction of defect-inducing software changes. University of Birmingham. Ph.D.

Tabassum2023PhD.pdf
Text - Accepted Version
Available under License All rights reserved.

Abstract

Just-In-Time Software Defect Prediction (JIT-SDP) is concerned with predicting whether software changes are defect-inducing or clean, based on machine learning classifiers. JIT-SDP operates in an online scenario where training examples are continuously received over time, necessitating the updating of the JIT-SDP models with the new data. Building JIT-SDP models requires a sufficient amount of training data, which is not available at the beginning of a software project. Cross-Project (CP) JIT-SDP can address this lack of training data at the beginning of projects. However, such approaches have never been investigated in a realistic online learning scenario where the JIT-SDP models need to be updated with both incoming CP and Within-Project (WP) software changes over time. Since there is no online CP JIT-SDP approach available, it is unknown how useful CP data can be in this circumstance. In particular, it is unclear whether CP data are valuable only in a project's early stages, when there are few WP data, or whether they might be beneficial for an extended period of time. Besides, many existing JIT-SDP studies used ensemble learners consisting of either offline or online base models. It is unknown whether adapting offline models to operate in an online scenario can provide advantages over online models and whether such advantages would carry over to CP learning. Most machine learning models are also sensitive to hyper-parameter choice and may underperform if hyper-parameters are not configured properly. Tuning methods used for offline learning are not suitable for online learning because, in online learning, the best hyper-parameter choice may change over time. It is unknown whether the best hyper-parameter configuration changes over time for online JIT-SDP. If it does, then an online hyper-parameter tuning approach for JIT-SDP would be essential to improve its predictive performance.
This thesis aims to address the aforementioned issues in JIT-SDP. We conduct the first investigation of when and to what extent CP data are helpful for JIT-SDP in a realistic online scenario. To this end, we propose three novel online CP JIT-SDP approaches, which can be updated to take into account incoming CP and WP training examples. Additionally, we provide a comprehensive evaluation of the predictive performance and computational costs of online and offline learning models in an online CP JIT-SDP scenario. Furthermore, we provide an online hyper-parameter tuning method for online JIT-SDP that can recommend the best hyper-parameter choices over time.

Type of Work:

Thesis (Doctorates > Ph.D.)

Award Type:

Doctorates > Ph.D.

Supervisor(s):