
Software defect prediction-oriented unbalanced data generation method

A software defect prediction and data generation technology, applied in software testing/debugging, electrical digital data processing, computer components, and related fields. It addresses unresolved intra-class imbalance, the uneven distribution of defective samples, and the resulting degradation of prediction performance, with the effect of improving prediction accuracy.

Status: Inactive; Publication Date: 2020-12-01
BEIJING UNIV OF CHEM TECH
Cites: 0; Cited by: 1

AI Technical Summary

Problems solved by technology

In real software defect prediction data sets, defective samples are usually far fewer than non-defective samples; this is inter-class imbalance. In addition, the defective samples are often unevenly distributed within the data set; this is intra-class imbalance.
Both inter-class and intra-class imbalance reduce the prediction model's performance on defective samples.
[0003] Existing data generation methods for handling data imbalance generate new samples that follow the original distribution. Although they achieve inter-class balance by increasing the number of defective samples, they do not resolve intra-class imbalance.
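Inter-class imbalance is commonly quantified by the imbalance ratio, the number of non-defective samples divided by the number of defective samples. The sketch below computes it; the ratio is a common convention rather than a metric defined in this patent, and the label encoding (1 = defective, 0 = non-defective) is an assumption.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return (#non-defective / #defective) for binary labels, 1 = defective."""
    counts = Counter(labels)
    defective = counts.get(1, 0)
    non_defective = counts.get(0, 0)
    if defective == 0:
        raise ValueError("the data set contains no defective samples")
    return non_defective / defective


# 90 clean modules and 10 defective ones: the majority class is 9x larger.
labels = [0] * 90 + [1] * 10
print(imbalance_ratio(labels))  # 9.0
```

A ratio of 1.0 means the classes are balanced. Note that this single number says nothing about intra-class imbalance, which depends on where the defective samples lie in the feature space.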



Embodiment Construction

[0012] The present invention is an unbalanced data generation method oriented to software defect prediction. Its purpose is to generate data using different strategies for defective samples with different distributions, so that the data set becomes balanced both between classes and within classes, thereby improving prediction accuracy. The specific implementation process of the present invention can be divided into the following stages:

[0013] In the first stage, the distribution is analyzed. Analyzing the distribution of software defect data sets in the feature space shows that the two classes of samples usually exhibit three distributions: the number of defective samples is greater than the number of non-defective samples; the number of defective samples is smaller than the number of non-defective samples; and the number of defective samples is much smaller than the number of non-defective samples.
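The three distributions can be expressed as a small classification rule over the sample counts of a region. This is a minimal sketch: the patent gives no numeric threshold for "much smaller", so the `rare_ratio` parameter (defective count below 20% of the non-defective count) is a hypothetical choice, and the function and enum names are illustrative only.

```python
from enum import Enum


class Distribution(Enum):
    DEFECTIVE_MAJORITY = "defective > non-defective"
    DEFECTIVE_MINORITY = "defective < non-defective"
    DEFECTIVE_RARE = "defective << non-defective"


def classify_region(n_defective, n_clean, rare_ratio=0.2):
    """Map sample counts to one of the three distributions.

    rare_ratio is a hypothetical threshold for "much smaller"; the patent
    does not specify one. Ties (equal counts) fall into the minority case
    here; the patent does not say how they are handled.
    """
    if n_defective > n_clean:
        return Distribution.DEFECTIVE_MAJORITY
    if n_defective < rare_ratio * n_clean:
        return Distribution.DEFECTIVE_RARE
    return Distribution.DEFECTIVE_MINORITY


print(classify_region(30, 10).name)  # DEFECTIVE_MAJORITY
print(classify_region(15, 40).name)  # DEFECTIVE_MINORITY
print(classify_region(3, 40).name)   # DEFECTIVE_RARE
```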

[0014] In the second stage, the samples are divided...


Abstract

The invention discloses an unbalanced data generation method for software defect prediction, belonging to the field of software testing. Software defect data sets suffer from severe data imbalance, which degrades the performance of prediction models. Common methods for addressing data imbalance achieve inter-class balance by adjusting the number of samples, but the new samples generally follow the original distribution, so intra-class balance is not improved. The method takes the sample distribution of the data set into account: it clusters the original data set into sub-regions and generates defective samples in each sub-region with a strategy chosen according to that sub-region's distribution, thereby achieving both inter-class and intra-class balance of the data set. Distribution-based data generation effectively mitigates data imbalance and significantly improves the accuracy of the software defect prediction model.
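The cluster-then-generate idea can be sketched as follows. The CPC codes G06F18/23213 (k-means-type clustering) and G06F18/24147 (nearest-neighbour techniques) suggest k-means and a nearest-neighbour step, but this excerpt does not spell out the algorithm, so everything here is an assumption: plain k-means with a deterministic farthest-point start, SMOTE-style interpolation between nearby defective samples, and a hypothetical per-cluster policy (no new samples where defects already dominate, otherwise top up to parity). All function names are illustrative, not the patent's.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialisation.

    Returns the cluster index of every point.
    """
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for each point.
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def interpolate(defects, n_new, rng, k_neighbors=3):
    """SMOTE-style generation: each new sample lies on the segment between a
    defective sample and one of its nearest defective neighbours."""
    out = []
    for _ in range(n_new):
        i = rng.randrange(len(defects))
        base = defects[i]
        others = defects[:i] + defects[i + 1:]
        neighbours = sorted(others, key=lambda d: sq_dist(base, d))[:k_neighbors]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        out.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return out


def generate_defects(X, y, k=2, seed=0):
    """Cluster X, then generate synthetic defective samples cluster by cluster.

    y uses 1 = defective, 0 = non-defective (assumed encoding).
    """
    rng = random.Random(seed)
    assign = kmeans(X, k)
    synthetic = []
    for c in range(k):
        defects = [x for x, lab, a in zip(X, y, assign) if a == c and lab == 1]
        n_clean = sum(1 for lab, a in zip(y, assign) if a == c and lab == 0)
        deficit = n_clean - len(defects)
        # Hypothetical policy: clusters where defects already outnumber clean
        # samples get no new samples; other clusters are topped up to parity.
        # Interpolation needs at least two defective samples in the cluster.
        if deficit > 0 and len(defects) >= 2:
            synthetic.extend(interpolate(defects, deficit, rng))
    return synthetic


X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.15, 0.05], [0.05, 0.15], [0.0, 0.1],
     [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]]
y = [0, 0, 0, 0, 1, 1, 0, 1, 1]
print(len(generate_defects(X, y, k=2)))  # 2
```

In this toy data the cluster near the origin (4 non-defective, 2 defective) receives 2 synthetic defects, while the cluster near (5, 5), where defects already dominate, receives none. Generation is therefore driven by each region's local distribution rather than spread in proportion to the original defect distribution, which is the property the abstract attributes to the method.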

Description

Technical Field
[0001] The invention relates to a method for generating unbalanced data oriented to software defect prediction, and belongs to the field of software development and testing.
Background Art
[0002] Software defect prediction identifies defective software modules by analyzing historical software data with models such as classification and ranking models. In real software defect prediction data sets, defective samples are usually far fewer than non-defective samples; this is inter-class imbalance. In addition, the defective samples are often unevenly distributed within the data set; this is intra-class imbalance. Both inter-class and intra-class imbalance degrade the prediction model's performance on defective samples.
[0003] The existing data generation methods dealing with the problem of data imbalance generate new s...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F11/36; G06K9/62
CPC: G06F11/3608; G06F18/23213; G06F18/24147
Inventors: 张星瑶 (Zhang Xingyao); 李征 (Li Zheng)
Owner: BEIJING UNIV OF CHEM TECH