Principal component distribution function based software defect prediction imbalance data processing method

A software defect prediction technology based on principal component distribution, applied in the fields of electrical digital data processing, software testing/debugging, computer parts, etc.

Active Publication Date: 2016-12-07
NANJING UNIV OF AERONAUTICS & ASTRONAUTICS

AI Technical Summary

Problems solved by technology

[0006] The object of the present invention is to address the deficiencies of the background technology described above and to provide a method for processing imbalanced data in software defect prediction based on the principal component distribution function...

Examples

Embodiment Construction

[0022] The technical solution of the invention is described in detail below with reference to the accompanying drawings. As shown in Figure 1, the invention first applies principal component analysis to reduce the dimensionality of the data and avoid the curse of dimensionality in the experiment. The Tomek algorithm is then used to remove the boundary samples and noise samples from the non-defective sample set, which avoids losing part of the information in the non-defective sample set. A fitted distribution function is used to generate random numbers that synthesize a new defective sample set, and the "3 times standard deviation" principle is applied to remove the values at both ends of the data, so that the result is very close to the distribution of the original data. By calculating the Euclidean distance between the samples in the newly synthesized defective sample set and the original sample set, the noise samples in the newly synthesized defective sample se...
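The synthesis step described above can be illustrated with a minimal sketch. The source does not say which distribution family is fitted, so this example assumes an independent normal distribution per principal component. The function name `synthesize_defective`, the `seed` parameter, and the choice to redraw out-of-range values (rather than discard them afterwards) are all hypothetical choices made for illustration only.

```python
import random
import statistics


def synthesize_defective(pc_scores, n_new, seed=0):
    """Generate n_new synthetic defective samples (illustrative only).

    pc_scores: list of rows, each row holding the principal component
    values of one defective sample. A normal distribution is fitted to
    each component separately (an assumption; the source does not name
    the distribution family).
    """
    rng = random.Random(seed)
    columns = list(zip(*pc_scores))  # one tuple of values per component
    fits = [(statistics.mean(c), statistics.stdev(c)) for c in columns]

    synthetic = []
    for _ in range(n_new):
        row = []
        for mu, sigma in fits:
            value = rng.gauss(mu, sigma)
            # "3 times standard deviation" rule: reject values in the
            # tails and redraw so that exactly n_new rows are produced.
            while abs(value - mu) > 3 * sigma:
                value = rng.gauss(mu, sigma)
            row.append(value)
        synthetic.append(row)
    return synthetic, fits


if __name__ == "__main__":
    defective_pcs = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0], [2.5, 11.0]]
    rows, fits = synthesize_defective(defective_pcs, 5, seed=1)
    for r in rows:
        print([round(v, 3) for v in r])
```

Fitting each component separately is a simplification that ignores any correlation between components within the defective class; a joint fit would be needed if that correlation matters.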

Abstract

The invention discloses a method for processing imbalanced data in software defect prediction based on a principal component distribution function, and belongs to the technical field of software engineering applications. The method comprises the following steps: data acquired from a software data set is preprocessed to obtain an original sample set; the dimensionality of the original sample set is reduced with the PCA algorithm to obtain a principal component data set comprising a defect-free sample set and a defective sample set; the defect-free sample set is undersampled, removing its boundary samples and noise samples; distribution fitting is performed on the principal component data of the defective sample set to obtain a new defective sample set; the new defective sample set is screened to obtain a new sample set; and the Euclidean distances between the samples in the new sample set and the original sample set are calculated to remove noise samples from the new sample set. The method can effectively improve the accuracy of software defect prediction.
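Two of the steps above, undersampling the defect-free set and distance-based noise removal, can be sketched as follows. The Tomek-link definition used here is the standard one: two samples of opposite classes that are each other's nearest neighbour. The exact distance criterion the patent uses for synthetic samples is not given in this text, so `filter_synthetic` applies an assumed rule: a synthetic defective sample is kept only if its nearest original sample is itself defective. All function names, and the label convention (0 for defect-free, 1 for defective), are hypothetical.

```python
import math


def nearest_index(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: math.dist(points[i], points[j]))


def remove_tomek_majority(points, labels, majority=0):
    """Drop majority-class samples that take part in a Tomek link."""
    nn = [nearest_index(i, points) for i in range(len(points))]
    drop = set()
    for i, j in enumerate(nn):
        # Mutual nearest neighbours with different labels form a link.
        if nn[j] == i and labels[i] != labels[j]:
            drop.add(i if labels[i] == majority else j)
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


def filter_synthetic(synthetic, points, labels, minority=1):
    """Assumed noise rule: keep a synthetic sample only if its nearest
    original sample (Euclidean distance) belongs to the minority class."""
    kept = []
    for s in synthetic:
        j = min(range(len(points)), key=lambda k: math.dist(s, points[k]))
        if labels[j] == minority:
            kept.append(s)
    return kept


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [3.0, 3.0],
           [3.2, 3.0], [6.0, 6.0], [6.3, 6.0]]
    lbl = [0, 0, 0, 0, 1, 1, 1]
    clean_pts, clean_lbl = remove_tomek_majority(pts, lbl)
    print(clean_pts, clean_lbl)
    print(filter_synthetic([[6.1, 6.0], [0.2, 0.2]], clean_pts, clean_lbl))
```

The brute-force nearest-neighbour search is quadratic in the number of samples, which is acceptable for a small illustration but would likely need a spatial index for larger defect data sets.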

Description

Technical field

[0001] The invention discloses a method for processing imbalanced data in software defect prediction based on a principal component distribution function, and belongs to the technical field of software engineering applications.

Background technology

[0002] With the rapid development of information technology, computer software is applied ever more widely. Efficient and safe software systems depend heavily on software reliability, and software defects that undermine reliability have become the root cause of system errors, failures, crashes, and even disasters. Accurate prediction of software defects can help reduce testing workload and cost. At present, software defect prediction faces a serious and unavoidable problem: data imbalance. Data imbalance means that the classes in a data set are not evenly distributed, so that one class dominates. The problem of data imbalance...

Application Information

IPC(8): G06F11/36, G06K9/62
CPC: G06F11/3668, G06F18/24
Inventor: 张德平, 张晓风
Owner: NANJING UNIV OF AERONAUTICS & ASTRONAUTICS