
A Software Defect Prediction Method Based on Class Imbalance Learning and Genetic Algorithm Wrapped Feature Selection

The technology combines software defect prediction with a genetic algorithm and applies it to wrapper-based feature selection for software defect prediction. It addresses degraded classifier performance, a large search space, and low feature-label correlation, with the stated effects of better test resource allocation and higher prediction performance.

Active Publication Date: 2019-02-19
南京瑞沃软件有限公司
Cites: 4; Cited by: 0

AI Technical Summary

Problems solved by technology

On the one hand, a feature may have little correlation with the class label, yet if it complements other features it can significantly improve the performance of the classification method. A subset chosen by ranking features individually therefore is not optimal.
On the other hand, a feature may correlate strongly with the class label and still be redundant once it is combined with other features, which degrades the performance of the classification method.
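The two cases above can be illustrated with a toy example. This is a minimal sketch, not taken from the patent: the XOR data, the helper names `pearson` and `joint_accuracy`, and the duplicated feature `f3` are all assumptions made for illustration.

```python
from itertools import product

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

def joint_accuracy(f1, f2, labels):
    """Training accuracy of a lookup table keyed on both features."""
    table = {}
    for a, b, y in zip(f1, f2, labels):
        table.setdefault((a, b), []).append(y)
    # Predict the most frequent label seen for each (f1, f2) pair.
    predict = {k: max(set(v), key=v.count) for k, v in table.items()}
    hits = sum(predict[(a, b)] == y for a, b, y in zip(f1, f2, labels))
    return hits / len(labels)

# Hypothetical data: the label is the XOR of two binary features.
pairs = list(product([0, 1], repeat=2))
f1 = [a for a, _ in pairs]
f2 = [b for _, b in pairs]
labels = [a ^ b for a, b in pairs]
f3 = list(f1)  # exact copy of f1, i.e. a fully redundant feature

corr_f1 = pearson(f1, labels)           # individually uninformative
corr_f2 = pearson(f2, labels)           # individually uninformative
acc_joint = joint_accuracy(f1, f2, labels)  # jointly decisive
corr_dup = pearson(f1, f3)              # redundancy between features
```

Here each feature alone has zero correlation with the label, so a per-feature ranking would discard both, while the pair predicts the label perfectly. The copy `f3` is perfectly correlated with `f1` and adds no information. A wrapper method evaluates subsets with the classifier itself, which is why it can see both effects.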
(2) Large search space
[0005] In addition, software defects are unevenly distributed within the project under test: most defects are concentrated in a few program modules.
The collected defect prediction data set therefore shows an obvious class imbalance: the number of defective modules (the minority class) is far smaller than the number of non-defective modules (the majority class).
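Because defective modules are much rarer than non-defective ones, a classifier trained on the raw data tends to favour the non-defective class. The text here does not say which class imbalance learning technique is used, so the following is only a sketch of one common option, random oversampling; the function name `random_oversample` and the toy data are assumptions.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    are as large as the largest one. Returns new (rows, labels) lists."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical data set: 2 defective modules (1) and 8 non-defective (0).
rows = [[float(i)] for i in range(10)]
labels = [1, 1] + [0] * 8
imbalance_ratio = labels.count(0) / labels.count(1)

bal_rows, bal_labels = random_oversample(rows, labels)
balanced_counts = Counter(bal_labels)
```

Oversampling should be applied to the training split only; duplicating rows before splitting would leak copies of training modules into the test set.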

Method used



Examples


Embodiment

[0036] The overall flow of the software defect prediction method of this embodiment, based on class imbalance learning and genetic-algorithm wrapper feature selection, is shown in Figure 1 and includes the following steps:

[0037] (1) Mine the version control system (such as CVS, SVN, or Git) and the defect tracking system (such as Bugzilla, Mantis, or Jira) of the software project, and extract program modules from them. The granularity of a program module can be set to file, package, class, or function, depending on the purpose of defect prediction. Then label each program module as defective or non-defective according to the defect reports in the defect tracking system. Finally, based on an analysis of software code complexity or of the software development process, design metrics (that is, features) that correlate with software defects, and the me...
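The labeling part of step (1) can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the `Module` class, the `build_dataset` function, the metric names `loc` and `cyclomatic`, and the sample file names are all hypothetical, and the mining of the repositories themselves is left out.

```python
from dataclasses import dataclass, field

@dataclass
class Module:
    """A program module (file, package, class, or function) with its metrics."""
    name: str
    metrics: dict = field(default_factory=dict)

def build_dataset(modules, defective_names):
    """Turn modules and the set of module names linked to defect reports
    into (feature_names, rows, labels); label 1 = defective, 0 = non-defective."""
    features = sorted({k for m in modules for k in m.metrics})
    rows = [[m.metrics.get(f, 0.0) for f in features] for m in modules]
    labels = [1 if m.name in defective_names else 0 for m in modules]
    return features, rows, labels

# Hypothetical input: three file-level modules and one defect report hit.
modules = [
    Module("Parser.java", {"loc": 420.0, "cyclomatic": 31.0}),
    Module("Lexer.java", {"loc": 150.0, "cyclomatic": 9.0}),
    Module("Util.java", {"loc": 60.0, "cyclomatic": 4.0}),
]
defective = {"Parser.java"}  # names extracted from the defect tracker

features, rows, labels = build_dataset(modules, defective)
```

Missing metrics default to `0.0` here purely to keep the sketch short; a real data set would need an explicit policy for missing values.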



Abstract

The invention discloses a software defect prediction method based on two-stage wrapper feature selection and belongs to the field of software quality assurance. The method comprises the following steps: (1) mining the version control system and the defect tracking system of a software project, extracting program modules from them, and labeling and measuring each program module to generate a defect prediction data set D; (2) carrying out two-stage wrapper feature selection on the data set D to remove as many redundant and irrelevant features as possible, and selecting an optimal feature subset FS' from the original feature set FS; and (3) preprocessing the data set D on the basis of the optimal feature subset FS' to form a preprocessed data set D', and constructing a defect prediction model with a decision tree classifier. The method can effectively identify and remove redundant and irrelevant features in the defect prediction data set, alleviate the class imbalance problem in that data set, and thereby improve the performance of the defect prediction model.
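The abstract does not spell out the search procedure, so the following is only a sketch of genetic-algorithm wrapper feature selection in general, as named in the title. Everything specific is an assumption: the function names `loo_accuracy` and `ga_select`, the one-point crossover and bit-flip mutation, the population settings, and above all the wrapped classifier, which here is a 1-nearest-neighbour rule with leave-one-out accuracy as fitness, swapped in for the decision tree to keep the example standard-library only.

```python
import random

def loo_accuracy(rows, labels, mask):
    """Leave-one-out accuracy of 1-nearest-neighbour on the features where mask is 1."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(rows):
            if i == j:
                continue
            d = sum((row[k] - other[k]) ** 2 for k in idx)
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)

def ga_select(rows, labels, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Search feature bitmasks with a simple GA; needs at least 2 features."""
    rng = random.Random(seed)
    n = len(rows[0])

    def fitness(mask):
        return loo_accuracy(rows, labels, mask)

    # Start from the full feature set FS plus random subsets.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = [pop[0][:]]  # elitism: the best mask survives unchanged
        while len(nxt) < pop_size:
            a, b = rng.sample(pop[: pop_size // 2], 2)  # parents from top half
            cut = rng.randrange(1, n)                   # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Hypothetical data: feature 0 carries the label, features 1 and 2 are noise.
data_rng = random.Random(1)
labels = [i % 2 for i in range(20)]
rows = [[y + data_rng.uniform(-0.1, 0.1),
         data_rng.uniform(0, 5),
         data_rng.uniform(0, 5)] for y in labels]

best_mask = ga_select(rows, labels)
best_fit = loo_accuracy(rows, labels, best_mask)
full_fit = loo_accuracy(rows, labels, [1, 1, 1])
```

Because the full feature set is seeded into the first generation and the best mask is always carried over, the returned subset is never worse than using all features under this fitness. The sketch does not break ties in favour of smaller subsets; adding a small penalty per selected feature would be one way to do that.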

Description

Technical field

[0001] The invention belongs to the field of software quality assurance, and in particular relates to a software defect prediction method based on class imbalance learning and genetic-algorithm wrapper feature selection.

Background

[0002] Software defect prediction identifies potentially defective program modules in the project under test in advance by analyzing the software's historical repositories and building a defect prediction model. Allocating more test resources to these modules optimizes the allocation of test resources and improves the quality of the software product. However, if many metrics (that is, features) are considered when collecting a defect prediction data set, the data set easily suffers from the curse of dimensionality: it will contain irrelevant and redundant features. A redundant feature is one whose information largely or completely repeats the information contained ...

Claims


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F11/36
CPC: G06F11/366, G06F11/3688
Inventor: 陈翔田丹陆凌姣王莉萍吉人魏世鑫
Owner: 南京瑞沃软件有限公司