
The Impact of Mislabelling on the Performance and Interpretation of Defect Prediction Models


dc.contributor.author Tantithamthavorn, Chakkrit en
dc.contributor.author McIntosh, Shane en
dc.contributor.author Hassan, Ahmed E. en
dc.contributor.author Ihara, Akinori en
dc.contributor.author Matsumoto, Kenichi en
dc.date.accessioned 2018-10-30T04:57:14Z en
dc.date.available 2018-10-30T04:57:14Z en
dc.date.issued 2015 en
dc.identifier.issn 0270-5257 en
dc.identifier.uri http://hdl.handle.net/10061/12736 en
dc.description 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering (ICSE 2015), 16-24 May 2015, Florence, Italy en
dc.description.abstract The reliability of a prediction model depends on the quality of the data from which it was trained. Therefore, defect prediction models may be unreliable if they are trained using noisy data. Recent research suggests that randomly-injected noise that changes the classification (label) of software modules from defective to clean (and vice versa) can impact the performance of defect models. Yet, in reality, incorrectly labelled (i.e., mislabelled) issue reports are likely non-random. In this paper, we study whether mislabelling is random, and the impact that realistic mislabelling has on the performance and interpretation of defect models. Through a case study of 3,931 manually-curated issue reports from the Apache Jackrabbit and Lucene systems, we find that: (1) issue report mislabelling is not random; (2) precision is rarely impacted by mislabelled issue reports, suggesting that practitioners can rely on the accuracy of modules labelled as defective by models that are trained using noisy data; (3) however, models trained on noisy data typically achieve 56%-68% of the recall of models trained on clean data; and (4) only the metrics in the top influence rank of our defect models are robust to the noise introduced by mislabelling, suggesting that the less influential metrics of models that are trained on noisy data should not be interpreted or used to make decisions. en
dc.language.iso en en
dc.publisher IEEE en
dc.rights © Copyright IEEE 2015 en
dc.subject software performance evaluation en
dc.subject software reliability en
dc.subject mislabelling impact en
dc.subject defect prediction model performance en
dc.subject defect prediction model interpretation en
dc.subject prediction model reliability en
dc.subject defect prediction models en
dc.subject randomly-injected noise en
dc.subject software modules en
dc.subject Apache Jackrabbit system en
dc.subject Lucene system en
dc.subject Predictive models en
dc.subject Data models en
dc.subject Noise measurement en
dc.subject Noise en
dc.subject Data mining en
dc.subject Software en
dc.subject Software Quality Assurance en
dc.subject Software Defect Prediction en
dc.subject Data Quality en
dc.subject Mislabelling en
dc.title The Impact of Mislabelling on the Performance and Interpretation of Defect Prediction Models en
dc.type.nii Conference Paper en
dc.textversion author en
dc.identifier.spage 812 en
dc.identifier.epage 823 en
dc.relation.doi 10.1109/ICSE.2015.93 en
dc.identifier.NAIST-ID 82040478 en
dc.identifier.NAIST-ID 73292310 en
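
A minimal sketch of the noisy-versus-clean comparison described in the abstract, assuming a synthetic dataset, a random forest classifier, and a 20% label-flip rate purely for illustration (none of these are the paper's actual data, models, or noise level): label noise is injected into the training labels only, and precision and recall are then measured against the clean test labels.

# Illustrative only: synthetic data and a random forest stand in for the
# paper's module metrics and defect models; the 20% flip rate is an assumption.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for module metrics (X) and defect labels (y).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Flip a hypothetical 20% of the training labels to simulate mislabelling.
y_noisy = y_train.copy()
flip = rng.random(len(y_noisy)) < 0.20
y_noisy[flip] = 1 - y_noisy[flip]

for name, labels in [("clean", y_train), ("noisy", y_noisy)]:
    model = RandomForestClassifier(random_state=0).fit(X_train, labels)
    pred = model.predict(X_test)
    print(name,
          "precision=%.2f" % precision_score(y_test, pred),
          "recall=%.2f" % recall_score(y_test, pred))

Injecting noise only into the training labels, while scoring against clean test labels, mirrors the paper's framing of mislabelled issue reports as a training-data quality problem rather than an evaluation problem.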

