
Software defect unbalanced data classification method based on improved Adaboost

A data classification and software defect technology, applied in the fields of electrical digital data processing, software testing/debugging, genetic rules, etc. It addresses the problem that unbalanced data classification still requires a lot of research.

Status: Inactive. Publication Date: 2016-06-15
CHINA UNIV OF PETROLEUM (EAST CHINA)
0 Cites, 23 Cited by

AI Technical Summary

Problems solved by technology

Because software contains far fewer defective modules than non-defective modules, software defect prediction is a classification problem on unbalanced data.
Classification techniques for balanced data are relatively mature. The classification of unbalanced data, and of software defect data in particular, still needs a lot of research.



Examples


Embodiment Construction

[0038] The present invention is described in further detail below with reference to Figure 1.

[0039] Step 1: Obtain the software feature set and the software module data, and label the data. The feature set is $F=\{f_1, f_2, \dots, f_m\}$. The software module data set is $\{X, Y\}$, where $X=\{x_1, x_2, \dots, x_n\}$ and $Y=\{y_1, y_2\}=\{+1, -1\}$. If software module $x_i$ has no defects, then $(x_i, y_i)=(x_i, -1)$; otherwise, $(x_i, y_i)=(x_i, +1)$.
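The labeling rule in Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `label_modules` and the metric names `loc` and `cyclomatic` are hypothetical, chosen only to make the example concrete.

```python
from typing import Dict, List, Tuple

DEFECTIVE = +1      # label for a module that contains a defect
NON_DEFECTIVE = -1  # label for a module with no defects

def label_modules(
    modules: List[Dict[str, float]],
    defect_flags: List[bool],
) -> List[Tuple[Dict[str, float], int]]:
    """Pair each module x_i with y_i = +1 (defective) or -1 (no defects)."""
    if len(modules) != len(defect_flags):
        raise ValueError("modules and defect_flags must have the same length")
    return [
        (x, DEFECTIVE if has_defect else NON_DEFECTIVE)
        for x, has_defect in zip(modules, defect_flags)
    ]

# Hypothetical module data: feature names are assumptions, not from the source.
modules = [
    {"loc": 120.0, "cyclomatic": 7.0},
    {"loc": 45.0, "cyclomatic": 2.0},
    {"loc": 300.0, "cyclomatic": 4.0},
]
flags = [True, False, False]

labeled = label_modules(modules, flags)
labels = [y for _, y in labeled]
```

With one defective module out of three, the defective class is the minority class, which is the imbalance the method targets.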

[0040] Step 2: Use the improved genetic algorithm together with the BP neural network to select features of the software data, reducing the dimensionality of the software features and obtaining the optimal feature subset.

[0041] (1) Randomly generate the initial population, with population size P (P<m). Each individual is a binary encoding of the feature set: 0 means the feature is selected, and 1 means the feature is not selected.
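The population initialization described in (1) can be sketched as below. This is an illustrative sketch under stated assumptions: the function names `init_population` and `selected_features` and the `seed` parameter are hypothetical, and the encoding follows the text as written (0 selects a feature, 1 excludes it), which is the reverse of the more common convention.

```python
import random
from typing import List

def init_population(m: int, p: int, seed: int = 0) -> List[List[int]]:
    """Create p random chromosomes, each with one bit per feature (m bits)."""
    if not 0 < p < m:
        raise ValueError("population size P must satisfy 0 < P < m")
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    return [[rng.randint(0, 1) for _ in range(m)] for _ in range(p)]

def selected_features(chromosome: List[int], features: List[str]) -> List[str]:
    """Decode a chromosome: per the text, bit 0 = selected, bit 1 = not selected."""
    return [f for bit, f in zip(chromosome, features) if bit == 0]

population = init_population(m=5, p=3)
subset = selected_features([0, 1, 0], ["f1", "f2", "f3"])  # keeps f1 and f3
```

Each chromosome decodes to a candidate feature subset, which the later steps would score before applying the genetic operators.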

[0042] (2) Train the BP neural network on the data set, and adjust the weights and thresholds of the network according to the predi...


Abstract

The invention discloses a software defect unbalanced data classification method based on improved Adaboost, which mainly solves the problem that existing software defect data classification methods classify minority classes poorly. The method comprises the following steps: 1, software data is acquired from a software data set and preprocessed, the software module data is divided into a training set and a testing set for training and testing, and cross validation is performed ten times; 2, features of the software data are selected by combining an improved genetic algorithm with a BP neural network to obtain an optimal feature subset, thereby reducing the dimensionality of the software features; 3, taking the imbalance of the software defect data fully into account, an improved Adaboost classifier is trained to classify the software modules. The method improves the classification precision of the minority classes and better detects defective software modules.
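Step 3 of the abstract trains an Adaboost-based classifier, but the visible text does not say how the improvement works. The sketch below therefore shows only one round of the standard (textbook) Adaboost weight update, not the patented variant. The function name `adaboost_round` and the toy data are assumptions for illustration; labels use the +1 (defective) and -1 (non-defective) coding from the description.

```python
import math
from typing import List, Tuple

def adaboost_round(
    weights: List[float], y: List[int], h: List[int]
) -> Tuple[float, List[float]]:
    """One standard Adaboost round.

    weights: current sample weights, assumed to sum to 1
    y:       true labels in {+1, -1}
    h:       weak-learner predictions in {+1, -1}
    Returns the learner weight alpha and the renormalized sample weights.
    """
    # Weighted error: total weight of the misclassified samples.
    err = sum(w for w, yi, hi in zip(weights, y, h) if yi != hi)
    if not 0.0 < err < 0.5:
        raise ValueError("weak learner must have weighted error in (0, 0.5)")
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified samples (yi * hi == -1) gain weight; correct ones lose it.
    raw = [w * math.exp(-alpha * yi * hi) for w, yi, hi in zip(weights, y, h)]
    total = sum(raw)
    return alpha, [w / total for w in raw]

# Toy unbalanced data: one defective module, four non-defective modules.
y = [+1, -1, -1, -1, -1]
h = [-1, -1, -1, -1, -1]  # a weak learner that always predicts the majority class
w0 = [0.2] * 5

alpha, w1 = adaboost_round(w0, y, h)
```

In this toy case the missed minority sample ends the round with half of the total weight, so the next weak learner is pushed to get it right. How the improved method treats the minority class beyond this standard behavior is not stated in the excerpt.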

Description

Technical field

[0001] The invention belongs to the field of software engineering applications, and in particular relates to a software defect unbalanced data classification method based on improved Adaboost.

Background

[0002] With the rapid development of information technology, software systems have been applied to national defense, to every sector of the national economy, and to many fields of human activity. The role played by software systems keeps growing, and their size grows accordingly. For example, American Telecom needs a system with more than 100 million lines of code; the space shuttle's onboard system has nearly 500,000 lines of code, and the ground control and processing systems have about 350,000 lines of code. Even after massive reduction, nearly a million lines of code are still needed to operate the entire space system. High stability is extremely important for these equi...


Application Information

IPC(8): G06F11/36, G06K9/62, G06N3/12
CPC: G06F11/3608, G06N3/126, G06F18/2148, G06F18/24
Inventors: 李克文, 邹晶杰
Owner CHINA UNIV OF PETROLEUM (EAST CHINA)