Software defect prediction method based on class imbalance learning algorithm

A software defect prediction technology based on a learning algorithm, applied in neural learning methods, ensemble learning, computer components, etc. It addresses problems such as class imbalance and achieves the effects of avoiding subjectivity and reducing costs.

Pending Publication Date: 2021-03-09
HANGZHOU DIANZI UNIV
0 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

In the sample weight adjustment stage, an adaptive cost matrix adjustment strategy assigns different misclassification costs to majority class and minority class samples through the cost matrix, addressing the class imbalance problem at the algorithm level.

Examples

Detailed Description of Embodiments

[0041] The present invention is described in detail below with reference to Figures 2 and 3, using the NASA defect prediction data set and the AEEEM defect prediction data set. The overall process of the invention is shown in Figure 1; the specific steps are as follows:

[0042] Step 1. Use the SWIM oversampling method to synthesize minority class samples, and then combine the generated minority class samples with the original data to obtain a data set with a lower imbalance ratio.
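
Step 1 can be illustrated with a minimal sketch. It assumes that SWIM follows the "sampling with the majority" idea, in which each synthetic minority point is placed at the same Mahalanobis distance from the majority class as a real minority seed. The function name `swim_like_oversample`, the spread parameter `alpha`, and the covariance regularization are hypothetical choices for illustration and are not taken from the patent.

```python
import numpy as np

def swim_like_oversample(X_maj, X_min, n_new, alpha=0.25, seed=0):
    """Simplified SWIM-style oversampling (illustrative assumption, not the patent's code).

    Each synthetic point keeps the Mahalanobis distance of its minority seed
    with respect to the majority class distribution.
    """
    rng = np.random.default_rng(seed)
    mu = X_maj.mean(axis=0)
    # Regularize the majority covariance so it is always invertible.
    cov = np.cov(X_maj, rowvar=False) + 1e-6 * np.eye(X_maj.shape[1])
    evals, evecs = np.linalg.eigh(cov)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T      # whitening (symmetric)
    W_inv = evecs @ np.diag(evals ** 0.5) @ evecs.T   # inverse whitening

    # In whitened space, the Euclidean norm equals the Mahalanobis distance.
    Z_min = (X_min - mu) @ W
    spread = Z_min.std(axis=0)

    seeds = Z_min[rng.integers(0, len(Z_min), size=n_new)]
    cand = seeds + rng.uniform(-alpha, alpha, size=seeds.shape) * spread

    # Rescale each candidate back onto its seed's distance contour.
    seed_norm = np.linalg.norm(seeds, axis=1, keepdims=True)
    cand_norm = np.linalg.norm(cand, axis=1, keepdims=True)
    cand = cand * seed_norm / np.maximum(cand_norm, 1e-12)

    return cand @ W_inv + mu
```

As a usage example, with 90 majority and 10 minority samples the imbalance ratio is 9; adding 40 synthetic minority samples to the original data gives 90 to 50, a ratio of 1.8.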

[0043] Step 2. Use ten-fold cross-validation to divide the data set from Step 1 into a training set and a test set, which are used to train the model and to test its prediction accuracy, respectively. Then use ten-fold cross-validation again to divide the training set into a training set and a validation set, which are used to calculate the most suitable minority class misclassification cost for the current data set.
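
The two-level split in Step 2 can be sketched as nested ten-fold cross-validation over sample indices. This is an illustrative sketch only: the helper names `kfold_indices` and `nested_splits` are hypothetical, and the folds are plain shuffled folds. The source does not say whether the folds are stratified by class, so that is left out here.

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Yield (rest, held_out) index arrays for k shuffled folds over n samples."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        rest = np.concatenate([folds[j] for j in range(k) if j != i])
        yield rest, folds[i]

def nested_splits(n, k=10, seed=0):
    """Yield (train, validation, test) index arrays.

    Outer folds separate the test set; inner folds split the remaining
    training data into a training part and a validation part, which would
    be used to pick the minority class misclassification cost.
    """
    for outer, (train_idx, test_idx) in enumerate(kfold_indices(n, k, seed)):
        inner = kfold_indices(len(train_idx), k, seed + outer + 1)
        for inner_train, inner_val in inner:
            yield train_idx[inner_train], train_idx[inner_val], test_idx
```

For 200 samples and k = 10, each outer fold holds out 20 test samples, and each inner fold splits the remaining 180 into 162 training and 18 validation samples.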

[0044] Step 3. Use the training set obtained from the second division ...

Abstract

The invention relates to a software defect prediction method based on a class imbalance learning algorithm. The method synthesizes minority class samples using the SWIM oversampling method, so that the data set is converted from high imbalance to moderate imbalance. It then calculates the minority class misclassification cost most suitable for the current data set using a proposed adaptive cost matrix adjustment strategy, and trains K weak classifiers on the training set to improve classification accuracy on the data set. During training, the sample weights are continuously adjusted: the weights of wrongly predicted samples are increased and the weights of correctly predicted samples are reduced. Finally, the K weak classifiers are combined into a composite classifier that predicts the category of the sample under test. The method addresses the low prediction accuracy for minority class samples on imbalanced data sets, so that defective modules can be predicted accurately, which helps test managers find software defects and reduces software development cost.
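
The weight adjustment and combination of K weak classifiers described in the abstract resemble boosting. The sketch below is a standard AdaBoost loop over decision stumps in which the minority class cost only scales the initial sample weights. That is a simplifying assumption: the patent's actual adaptive cost matrix update is not given in this excerpt. The names `fit_cost_boost` and `minority_cost` and the label convention (+1 defective, -1 non-defective) are hypothetical.

```python
import numpy as np

def stump_predict(X, j, t, s):
    """Threshold stump on feature j: predicts -s at or below t, s above it."""
    return np.where(X[:, j] <= t, -s, s)

def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                err = w[stump_predict(X, j, t, s) != y].sum()
                if err < best[0]:
                    best = (err, j, t, s)
    return best

def fit_cost_boost(X, y, K=10, minority_cost=3.0):
    """Train K weighted stumps; y is +1 (minority, defective) or -1."""
    # Assumption: costs enter only through the initial weights.
    cost = np.where(y == 1, minority_cost, 1.0)
    w = cost / cost.sum()
    ensemble = []
    for _ in range(K):
        err, j, t, s = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, j, t, s)
        # Misclassified samples gain weight, correct ones lose weight.
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()
        ensemble.append((alpha, j, t, s))
    return ensemble

def predict(ensemble, X):
    """Combine the weak classifiers by a weighted vote."""
    score = sum(a * stump_predict(X, j, t, s) for a, j, t, s in ensemble)
    return np.where(score >= 0, 1, -1)
```

In a pipeline like the one described, `minority_cost` would be the value selected on the validation folds rather than a fixed constant.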

Description

Technical field

[0001] The present invention is a learning method for class-imbalanced data sets. It aims to find defective samples in defect data sets, helping testers locate defects and allocate test resources more effectively and thereby reducing the cost of software testing. Specifically, it relates to a software defect prediction method based on a class imbalance learning algorithm.

Background

[0002] In the field of software defect prediction, data sets have a natural class imbalance problem: in a given data set, the number of instances of the "defective" class is much smaller than the number of instances of the "non-defective" class. However, the defective class is the most important class, and the ultimate goal of the classifier is to correctly predict as many defective samples as possible. Due to under-representation of defect classes, classification techniques give less weight to in...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06N20/20, G06N3/08
CPC: G06N20/20, G06N3/08, G06F18/2453, G06F18/2415, G06F18/214
Inventors: 王兴起, 郑建明, 魏丹, 陈滨
Owner: HANGZHOU DIANZI UNIV