DSpace Repository

The Effects of Over and Under Sampling on Fault-prone Module Detection


dc.contributor.author Kamei, Yasutaka en
dc.contributor.author Monden, Akito en
dc.contributor.author Matsumoto, Shinsuke en
dc.contributor.author Kakimoto, Takeshi en
dc.contributor.author Matsumoto, Ken-ichi en
dc.date.accessioned 2018-10-30T04:57:16Z en
dc.date.available 2018-10-30T04:57:16Z en
dc.date.issued 2007 en
dc.identifier.issn 1949-3770 en
dc.identifier.uri http://hdl.handle.net/10061/12759 en
dc.description ESEM 2007 : First International Symposium on Empirical Software Engineering and Measurement, 20-21 Sept. 2007, Madrid, Spain en
dc.description.abstract The goal of this paper is to improve the prediction performance of fault-prone module prediction models (fault-proneness models) by employing over/under sampling methods, which are preprocessing procedures for a fit dataset. The sampling methods are expected to improve prediction performance when the fit dataset is unbalanced, i.e. when there is a large difference between the number of fault-prone modules and not-fault-prone modules. So far, there has been no research reporting the effects of applying sampling methods to fault-proneness models. In this paper, we experimentally evaluated the effects of four sampling methods (random over sampling, synthetic minority over sampling, random under sampling and one-sided selection) applied to four fault-proneness models (linear discriminant analysis, logistic regression analysis, neural network and classification tree) using two module sets of industry legacy software. All four sampling methods improved the prediction performance of the linear and logistic models, while the neural network and classification tree models did not benefit from the sampling methods. The improvements in F1-value for the linear and logistic models were 0.078 at minimum, 0.224 at maximum and 0.121 on average. en
dc.language.iso en en
dc.publisher IEEE en
dc.rights © Copyright IEEE 2007 en
dc.subject program testing en
dc.subject sampling methods en
dc.subject software maintenance en
dc.subject fault-prone module detection en
dc.subject fault-prone module prediction models en
dc.subject linear discriminant analysis en
dc.subject logistic regression analysis en
dc.subject neural network en
dc.subject classification tree en
dc.subject industry legacy software en
dc.subject Sampling methods en
dc.subject Fault detection en
dc.subject Predictive models en
dc.subject Classification tree analysis en
dc.subject Logistics en
dc.subject Neural networks en
dc.subject Linear discriminant analysis en
dc.subject Regression analysis en
dc.subject Accuracy en
dc.subject Software engineering en
dc.title The Effects of Over and Under Sampling on Fault-prone Module Detection en
dc.type.nii Conference Paper en
dc.textversion author en
dc.identifier.spage 196 en
dc.identifier.epage 204 en
dc.relation.doi 10.1109/ESEM.2007.28 en
dc.identifier.NAIST-ID 73292310 en
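The abstract names the sampling methods and reports results as F1-values, but the record gives no formulas. As a minimal sketch, assuming a fit dataset of (feature tuple, label) pairs where label 1 marks a fault-prone module and is the minority class, the Python helpers below (all names hypothetical) illustrate random over sampling, a simplified SMOTE-style synthetic minority over sampling, random under sampling, and the F1-value of the fault-prone class. One-sided selection is omitted, this is not the authors' implementation, and the paper's exact balancing ratios and F1 definition may differ.

```python
import math
import random

# Hypothetical helpers for illustration only; not the paper's implementation.
# A dataset is a list of (features, label) pairs, where features is a tuple of
# floats and label 1 = fault-prone (assumed minority), 0 = not fault-prone.


def split_by_class(data):
    """Return (minority, majority), assuming label 1 is the minority class."""
    minority = [row for row in data if row[1] == 1]
    majority = [row for row in data if row[1] == 0]
    return minority, majority


def random_oversample(data, rng):
    """Random over sampling: duplicate random minority rows until classes are equal."""
    minority, majority = split_by_class(data)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


def random_undersample(data, rng):
    """Random under sampling: keep as many random majority rows as minority rows."""
    minority, majority = split_by_class(data)
    return rng.sample(majority, len(minority)) + minority


def smote(data, rng, k=5):
    """Simplified SMOTE: place synthetic minority rows on the line segment between
    a random minority row and one of its k nearest minority neighbours."""
    minority, majority = split_by_class(data)
    synthetic = []
    for _ in range(len(majority) - len(minority)):
        i = rng.randrange(len(minority))
        x = minority[i][0]
        others = [minority[j][0] for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda o: math.dist(x, o))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append((tuple(a + gap * (b - a) for a, b in zip(x, nb)), 1))
    return majority + minority + synthetic


def f1_value(actual, predicted):
    """F1 = 2PR / (P + R) for the fault-prone class (label 1)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy unbalanced fit dataset: 20 not-fault-prone vs 4 fault-prone modules.
    fit = [((float(i), float(i % 3)), 0) for i in range(20)]
    fit += [((100.0 + i, 1.0), 1) for i in range(4)]
    print(len(random_oversample(fit, rng)))   # 40 rows, 20 per class
    print(len(random_undersample(fit, rng)))  # 8 rows, 4 per class
    print(len(smote(fit, rng, k=2)))          # 40 rows, 16 of them synthetic
    print(f1_value([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

Consistent with the abstract's description of sampling as preprocessing of the fit dataset, helpers like these would be applied only to the training modules; the modules used to compute the F1-value would stay unsampled.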