Semi-supervised heterogeneous software defect prediction algorithm based on GitHub

A semi-supervised software defect prediction technology, applied in computing, special data processing applications, instruments, etc., that addresses the scarcity of defect prediction models

Active Publication Date: 2019-07-12
GUANGDONG UNIV OF PETROCHEMICAL TECH

AI Technical Summary

Problems solved by technology

Currently, there is little research on how to exploit the large amount of unlabeled heterogeneous data from open source projects (OSP) to build defect prediction models.



Examples


Embodiment 1

[0063] As shown in Figure 1, the GitHub-based semi-supervised heterogeneous software defect prediction algorithm of this embodiment includes the following steps:

[0064] Step (1), collect data and build a dedicated database: first, collect data from GitHub. Data collection consists of three stages: 1) project selection; 2) feature extraction; 3) data set cleaning. For project selection, we chose 3 language tags (Python, Java, C) as keywords and "most stars" as the sort tag, taking "Top Programming Languages 2017" as a reference. Based on this ranking, we only consider projects written primarily in the most popular programming languages (Python, Java, and C++), and then keep the top 20 projects from the sorted list. Table 1 shows the number of instances for the 3 programming languages.

[0065] Table 1

[0066] Number of instances per programming language

[0067]
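The project-selection stage in [0064] can be sketched as follows. This is only an illustration under stated assumptions, not the patent's own tooling: it assumes the public GitHub REST search endpoint is used, and the helper names `build_search_url` and `select_top_projects` are hypothetical. The first helper builds a query for one language tag sorted by stars; the second keeps the top 20 entries from an already-downloaded result list.

```python
from urllib.parse import urlencode

# Public GitHub REST endpoint for repository search.
GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(language, top_n=20):
    """Build a search URL for one language tag, sorted by star count (descending).

    Hypothetical helper: the patent text does not say how the query was issued.
    """
    params = {
        "q": f"language:{language}",
        "sort": "stars",
        "order": "desc",
        "per_page": top_n,
    }
    return f"{GITHUB_SEARCH_URL}?{urlencode(params)}"


def select_top_projects(repos, top_n=20):
    """Keep the names of the top_n most-starred repositories.

    `repos` is a list of dicts shaped like the items of a GitHub search
    response, which carry `full_name` and `stargazers_count` fields.
    """
    ranked = sorted(repos, key=lambda r: r["stargazers_count"], reverse=True)
    return [r["full_name"] for r in ranked[:top_n]]
```

Calling `build_search_url("python")` yields one URL per language tag; the JSON items returned by that request could then be passed to `select_top_projects`. Feature extraction and data set cleaning are not covered by this sketch.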

[0068] For feature extraction, here we use a commercial tool calle...


Abstract

The invention discloses a GitHub-based semi-supervised heterogeneous software defect prediction algorithm comprising the following steps: first, a data set is collected, a database is built from it, and the collected data is preprocessed; second, heterogeneous data is handled by introducing an enhanced canonical correlation analysis method composed of unified metric representation (UMR) and canonical correlation analysis (CCA); finally, a cost-sensitive kernel semi-supervised discriminant analysis method is added. This realizes a GitHub-based semi-supervised heterogeneous software defect prediction algorithm. Its advantages are that the data heterogeneity problem in software defect prediction is solved, a cost-sensitive kernel semi-supervised discriminant analysis (CKSDA) technique is proposed for the first time, and differing misclassification costs are handled with cost-sensitive learning, so that defects can be predicted effectively.
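Two ideas named in the abstract can be illustrated with a small sketch. Both parts are assumptions rather than the patent's method: the abstract does not define the layout of the UMR, so the zero-padding shown here follows the form commonly described in the heterogeneous defect prediction literature, and the decision rule is a generic minimum-expected-cost rule standing in for cost-sensitive learning, not CKSDA itself. The function names and the metric names (`loc`, `cyclomatic`, `num_methods`) are hypothetical.

```python
def to_unified_representation(rows, own_metrics, all_metrics):
    """Zero-pad each row so it has one value for every metric in all_metrics.

    Metrics a project does not measure are filled with 0.0, so that source
    and target data end up with the same feature layout (assumed UMR form).
    """
    index = {name: i for i, name in enumerate(own_metrics)}
    return [
        [row[index[m]] if m in index else 0.0 for m in all_metrics]
        for row in rows
    ]


def cost_sensitive_label(p_defective, cost_fn, cost_fp):
    """Return 1 (defective) or 0 (clean) by comparing expected costs.

    cost_fn: cost of calling a defective module clean (false negative).
    cost_fp: cost of calling a clean module defective (false positive).
    """
    expected_cost_if_called_clean = p_defective * cost_fn
    expected_cost_if_called_defective = (1.0 - p_defective) * cost_fp
    return 1 if expected_cost_if_called_clean > expected_cost_if_called_defective else 0


# Two projects that share only the "loc" metric (illustrative values).
source_metrics = ["loc", "cyclomatic"]
target_metrics = ["loc", "num_methods"]
all_metrics = ["loc", "cyclomatic", "num_methods"]

source_umr = to_unified_representation([[120, 7]], source_metrics, all_metrics)
target_umr = to_unified_representation([[80, 5]], target_metrics, all_metrics)
```

After padding, both data sets share one feature space, which is the precondition for applying CCA to them. With unequal costs, a module whose estimated defect probability is below 0.5 can still be flagged as defective when a missed defect is assumed to cost more than a false alarm.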

Description

Technical field

[0001] The invention relates to a software defect prediction algorithm, in particular to a GitHub-based semi-supervised heterogeneous software defect prediction algorithm.

Background technique

[0002] Software defect prediction is a research hotspot in the field of software engineering data. Its goal is to identify potentially defective program modules early in project development and to allocate sufficient testing resources to them, so that adequate code review or unit testing can be carried out and software product quality is ultimately improved. At present, most research focuses on defect prediction within the same project: part of a project's data set is selected as the training set to build the model, and the remaining data is used as the test set to evaluate the model's predictive ability. However, in an actual soft...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/50
CPC: G06F30/20
Inventor: 荆晓远, 孙莹, 李娟娟, 黄鹤, 杨永光, 姚永芳, 彭志平
Owner: GUANGDONG UNIV OF PETROCHEMICAL TECH