Cross-project defect prediction method based on semi-supervised clustering data screening

A semi-supervised clustering and data screening technology, applied in the fields of electrical digital data processing, error detection/correction, and software testing/debugging, that addresses the problem of new projects whose historical data is limited and not highly reliable.

Inactive Publication Date: 2017-09-05
WUHAN UNIV

AI Technical Summary

Problems solved by technology

For some new projects, however, the project's own historical data is very limited and not highly reliable, so it is difficult to predict the defects of such a project well.


Embodiment Construction

[0029] The embodiment of the cross-project defect prediction method based on semi-supervised clustering data screening is implemented as follows:

[0030] Step 1. Mine the software history repository of this project and extract the useful software modules from it. The granularity of a software module can be set to file, package, class, or function according to the actual application scenario. Label each of these software modules as defective or non-defective: a module marked as defective gets the class label Y, and a module marked as non-defective gets the class label N.

[0031] Step 2. Extract the existing software modules of this project that are to be predicted. These modules are marked with "?".

[0032] Step 3. Extract the measurement attributes (metrics) of the software modules of this project. 20 measurement attributes are extracted: weighted methods per class (wmc),...
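As a rough illustration of Steps 1 to 3, the labelled and unlabelled modules could be held in a structure like the one below. This is only a sketch under assumptions: the class `SoftwareModule`, the helper `split_modules`, the file names, and the metric values are all hypothetical, and only `wmc` is shown because the remaining measurement attributes are elided in the text above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Class labels used in Steps 1 and 2.
DEFECTIVE, NON_DEFECTIVE, TO_PREDICT = "Y", "N", "?"


@dataclass
class SoftwareModule:
    """One module (file, package, class, or function) with its metrics."""
    name: str                  # hypothetical identifier
    metrics: Dict[str, float]  # measurement attributes, e.g. {"wmc": 31.0}
    label: str = TO_PREDICT    # "Y", "N", or "?" when not yet known


def split_modules(
    modules: List[SoftwareModule],
) -> Tuple[List[SoftwareModule], List[SoftwareModule]]:
    """Separate labelled historical modules from modules to be predicted."""
    historical = [m for m in modules if m.label in (DEFECTIVE, NON_DEFECTIVE)]
    to_predict = [m for m in modules if m.label == TO_PREDICT]
    return historical, to_predict


# Toy data; names and values are made up for illustration.
modules = [
    SoftwareModule("Parser.java", {"wmc": 31.0}, DEFECTIVE),
    SoftwareModule("Logger.java", {"wmc": 4.0}, NON_DEFECTIVE),
    SoftwareModule("Cache.java", {"wmc": 12.0}),  # defaults to "?"
]
historical, to_predict = split_modules(modules)
```

Keeping the "?" modules in the same structure as the labelled ones means the same metric vectors can later be passed to both the clustering and the classification stage.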


Abstract

The invention discloses a cross-project defect prediction method based on semi-supervised clustering data screening. The method comprises: using a semi-supervised clustering algorithm to cluster the software module data and discover subclusters; collecting, from all the generated subclusters, every cross-project historical software module that carries the same class label as the historical software modules of this project, which yields the screened cross-project software module data; and using a naive Bayes classification algorithm to build a cross-project defect prediction model from the screened cross-project software module data together with all the historical software module data of this project, then predicting the software modules of this project that are to be predicted. The method protects the cross-project prediction model from the influence of irrelevant cross-project software module data, makes full use of both the cross-project historical software module information and this project's own historical software module information, and improves the performance of the cross-project prediction model.
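The three stages in the abstract can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the patented procedure: the abstract does not name the semi-supervised clustering algorithm, so seeded k-means (cluster centres initialised from this project's labelled modules) is swapped in; the screening rule is one reading of "same marks" (a cross-project module is kept when its label matches a label of this project's modules in the same subcluster); Gaussian naive Bayes is assumed for the classifier; and all function names and the two-feature toy data are hypothetical.

```python
import math
from collections import defaultdict


def mean(points):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seeded_kmeans(points, seeds, iters=20):
    """Assumed clustering: k-means whose centres start at the seed means."""
    centers = [mean(seeds[label]) for label in sorted(seeds)]
    assign = []
    for _ in range(iters):
        assign = [min(range(len(centers)), key=lambda k: dist2(p, centers[k]))
                  for p in points]
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centers[k] = mean(members)
    return assign


def screen(target_hist, cross_hist):
    """Keep cross-project modules whose label matches a label of this
    project's historical modules in the same subcluster."""
    seeds = defaultdict(list)
    for x, y in target_hist:
        seeds[y].append(x)
    pooled = [x for x, _ in target_hist] + [x for x, _ in cross_hist]
    assign = seeded_kmeans(pooled, seeds)
    n = len(target_hist)
    labels_in = defaultdict(set)
    for (_, y), a in zip(target_hist, assign[:n]):
        labels_in[a].add(y)
    return [(x, y) for (x, y), a in zip(cross_hist, assign[n:])
            if y in labels_in[a]]


def train_nb(data, smoothing=1e-3):
    """Gaussian naive Bayes: per-class log prior, feature means, variances."""
    by_label = defaultdict(list)
    for x, y in data:
        by_label[y].append(x)
    model = {}
    for y, xs in by_label.items():
        mu = mean(xs)
        var = [sum((v - m) ** 2 for v in col) / len(xs) + smoothing
               for col, m in zip(zip(*xs), mu)]
        model[y] = (math.log(len(xs) / len(data)), mu, var)
    return model


def predict_nb(model, x):
    """Return the class label with the highest log posterior."""
    def score(y):
        prior, mu, var = model[y]
        return prior + sum(-0.5 * math.log(2 * math.pi * v)
                           - (xi - m) ** 2 / (2 * v)
                           for xi, m, v in zip(x, mu, var))
    return max(model, key=score)


# Toy data: (features, label); feature values are made up.
target_hist = [([30.0, 8.0], "Y"), ([5.0, 1.0], "N")]
cross_hist = [([28.0, 7.0], "Y"), ([6.0, 2.0], "N"),
              ([29.0, 9.0], "N"), ([4.0, 1.0], "Y")]

screened = screen(target_hist, cross_hist)
model = train_nb(screened + target_hist)
prediction = predict_nb(model, [27.0, 6.0])
```

In this toy run, the two cross-project modules whose labels disagree with this project's modules in their subcluster are screened out before the classifier is trained, which is the effect the abstract attributes to the data screening stage.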

Description

Technical field

[0001] The invention belongs to the technical field of software defect prediction, and in particular relates to a cross-project defect prediction method based on semi-supervised clustering data screening.

Background art

[0002] (1) Software defect prediction technology

[0003] Software has become an important factor affecting the national economy, military affairs, politics, and even social life. Highly reliable and complex software systems depend on the reliability of the software they employ. Software defects are a potential source of system errors, failures, crashes, and even machine crashes. To date, academia and industry have used many related terms and definitions for a defect, such as failure, defect, bug, and error. According to ISO 9000, a defect is the non-fulfilment of a requirement related to an intended or specified use. A defect is a part of the software that already exists and can be eliminated ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F11/36, G06K9/62
CPC: G06F11/3608, G06F18/24155, G06F18/214
Inventor: 余啸, 刘进, 安格格, 崔晓辉, 夏臻, 井溢洋
Owner: WUHAN UNIV