Software defect prediction method based on data imbalance

A software defect prediction technique with data balancing, applicable to software testing and debugging, electrical digital data processing, computer components, and related fields. It addresses model overfitting, insufficient generalization, and loss of important information, with the effect of improving prediction accuracy and producing a more uniform data distribution.

Pending Publication Date: 2019-11-19
DALIAN MARITIME UNIVERSITY
3 Cites, 10 Cited by

AI Technical Summary

Problems solved by technology

[0003] However, current imbalance processing methods have clear shortcomings. Random oversampling increases the number of minority samples simply by copying existing ones, which makes the model prone to overfitting: what the model learns is too specific and does not generalize well.
Random undersampling draws a sample set smaller than the original, so some information is lost: deleting majority-class samples may cause the classifier to miss important information about the majority class.
The SMOTE algorithm synthesizes new samples by linear interpolation, so the new minority-class instances can only lie on the line segments between original minority-class instances, which strictly limits the range over which they are distributed. CSC also has limitations. For example, when there are too few positive instances, it still tends to classify positive instances as negative, even if misclassifying a positive instance as negative carries a high cost.
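The three sampling behaviours described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the method of the invention: the function names, the toy 2-D points, and the fixed choice of neighbour are all hypothetical, and real SMOTE picks the second point from the k nearest minority neighbours rather than taking it as an argument.

```python
import random

def random_oversample(minority, n_new, rng):
    """Random oversampling: new samples are exact copies of existing ones."""
    return [list(rng.choice(minority)) for _ in range(n_new)]

def random_undersample(majority, n_keep, rng):
    """Random undersampling: keep a subset, so the dropped rows are lost."""
    return rng.sample(majority, n_keep)

def smote_like_sample(x_i, x_j, rng):
    """SMOTE-style interpolation between two minority instances.

    Simplification (assumption): the neighbour x_j is passed in directly
    instead of being drawn from the k nearest minority neighbours.
    """
    gap = rng.random()  # uniform in [0, 1)
    return [a + gap * (b - a) for a, b in zip(x_i, x_j)]

def on_segment(p, a, b, tol=1e-9):
    """True if 2-D point p lies on the line segment from a to b."""
    cross = (p[0] - a[0]) * (b[1] - a[1]) - (p[1] - a[1]) * (b[0] - a[0])
    in_box = all(
        min(a[k], b[k]) - tol <= p[k] <= max(a[k], b[k]) + tol for k in range(2)
    )
    return abs(cross) <= tol and in_box

rng = random.Random(0)  # fixed seed so the sketch is repeatable
minority = [[1.0, 2.0], [3.0, 6.0]]                           # toy minority class
majority = [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5], [2.0, 2.0]]   # toy majority class

copies = random_oversample(minority, 5, rng)
kept = random_undersample(majority, 2, rng)
synthetic = [smote_like_sample(minority[0], minority[1], rng) for _ in range(5)]
```

Every entry of `copies` duplicates an original minority point, `kept` discards half of the majority points, and every entry of `synthetic` satisfies `on_segment`, which is the distribution restriction the passage attributes to SMOTE.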



Examples


Embodiment Construction

[0020] To make the technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments:

[0021] As shown in Figure 1, a software defect prediction method based on data imbalance specifically includes the following steps:

[0022] S1: Use error reports with software metric values as the original data set for prediction. That is, divide a program with a known bug distribution into small program segments, and then extract the software metric values from each segment, such as the depth of the inheritance tree, the number of methods in each class, and the number of classes that directly inherit from a class. Then determine the label of each program segment according to whether it contains a bug, and use the software metric values as attributes, and whe...
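As a rough illustration of step S1, the following Python sketch turns program segments into an attribute matrix and a label vector. The class name, field names, and metric values are hypothetical and chosen only to mirror the three metrics named above; the source does not specify a data layout.

```python
from dataclasses import dataclass

@dataclass
class ProgramSegment:
    # Hypothetical record for one program segment; field names are assumptions.
    name: str
    inheritance_depth: int   # depth of the inheritance tree
    methods_per_class: int   # number of methods in the class
    direct_subclasses: int   # number of classes inheriting directly from it
    has_bug: bool            # known from the bug distribution

def to_dataset(segments):
    """Build attributes X (metric values) and labels y (1 = buggy, 0 = clean)."""
    X = [
        [s.inheritance_depth, s.methods_per_class, s.direct_subclasses]
        for s in segments
    ]
    y = [1 if s.has_bug else 0 for s in segments]
    return X, y

# Made-up example values, for illustration only.
segments = [
    ProgramSegment("Parser", 3, 12, 0, True),
    ProgramSegment("Lexer", 1, 4, 2, False),
    ProgramSegment("Token", 2, 7, 1, False),
    ProgramSegment("Util", 1, 3, 0, False),
]

X, y = to_dataset(segments)
imbalance_ratio = y.count(0) / y.count(1)  # clean segments per buggy segment
```

Even in this toy set the clean segments outnumber the buggy one three to one, which is the kind of imbalance the method sets out to handle.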


Abstract

The invention discloses a software defect prediction method based on data imbalance, which comprises the following steps: taking error reports with software metric values from projects with a known bug distribution as the original data set for prediction; performing imbalance processing on the text matrix in the original data set with the RSMOTE imbalance processing strategy to obtain a balanced data set; modeling the balanced data set with naive Bayes, multinomial naive Bayes, k-nearest neighbors, a support vector machine, a classification tree, and AdaBoost to find the classifier with the best prediction performance; and extracting the software metric values of a new project whose bug locations are unknown, inputting them into the classifier for prediction, outputting a prediction of whether each program segment contains a bug, and recording and storing the prediction. Because the RSMOTE imbalance processing strategy is applied to the text matrix in the original data set, minority-class samples are generated more flexibly, and a wider and more reasonable range of samples can be generated.
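The classifier-selection step in the abstract can be sketched as a loop that trains each candidate on the balanced data and keeps the one with the best validation score. This is a minimal pure-Python sketch under assumptions: the two candidates (a majority-class baseline and a 1-nearest-neighbour rule) are stand-ins rather than the six classifiers named above, the F1 score is an assumed selection metric, the toy data are invented, and RSMOTE itself is not implemented here.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 score for the positive (buggy) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def fit_majority(X, y):
    """Stand-in candidate: always predict the most frequent training label."""
    label = max(sorted(set(y)), key=y.count)  # ties resolve to the smaller label
    return lambda x: label

def fit_1nn(X, y):
    """Stand-in candidate: label of the closest training row (squared Euclidean)."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in X]
        return y[dists.index(min(dists))]
    return predict

def select_best(candidates, X_train, y_train, X_val, y_val):
    """Train every candidate and return the best name plus all scores."""
    scores = {}
    for name, fit in candidates.items():
        predict = fit(X_train, y_train)
        scores[name] = f1_score(y_val, [predict(x) for x in X_val])
    return max(scores, key=scores.get), scores

# Toy balanced training set and validation set (invented values).
X_train = [[1, 1], [2, 1], [1, 2], [8, 9], [9, 8], [9, 9]]
y_train = [0, 0, 0, 1, 1, 1]
X_val = [[1.5, 1.5], [8.5, 8.5], [2, 2], [9, 9.5]]
y_val = [0, 1, 0, 1]

candidates = {"majority": fit_majority, "1nn": fit_1nn}
best, scores = select_best(candidates, X_train, y_train, X_val, y_val)
```

Keeping the candidates in a dictionary of fit functions means further classifiers can be added without changing the selection loop.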

Description

Technical field

[0001] The invention relates to the field of software defect prediction, and in particular to a software defect prediction method based on data imbalance.

Background

[0002] As demand for software continues to grow, software development is becoming more and more important. Program debugging is a very important part of the software development process. It mainly consists of fault detection, fault localization, and fault repair, of which fault localization is the most cumbersome. Faults inevitably arise during software development. Some can be found and corrected from compilation information, but most program faults are caused by logic errors. According to statistics, the cost of repairing faults accounts for 50% to 80% of the total cost of software maintenance, and in the repair process, personnel with experience and knowledge of code semantics an...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F11/36, G06K9/62
CPC: G06F11/3608, G06F11/3616, G06F18/24
Inventor: 郭世凯, 董剑, 陈荣, 王佳慧, 李辉, 郭晨, 唐文君
Owner: DALIAN MARITIME UNIVERSITY