Unbalanced data distribution-based multi-heterogeneous base classifier fusion classification method

A technique for unbalanced data and base classifiers, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc., that achieves the effect of avoiding over-adaptation (overfitting).

Inactive Publication Date: 2013-02-27
翟云 +1
2 cites, 25 cited by

AI Technical Summary

Problems solved by technology

However, these methods are suitable only for normal (balanced) data sets, not for unbalanced ones. According to current research progress, base classifier fusion methods suited to unbalanced data distribution environments are still rare. They still face bottlenecks that are hard to break through in algorithm diversity and classification accuracy, especially in improving the accuracy on minority-class samples.



Examples


Embodiment Construction

[0026] The present invention will be further described below in conjunction with the accompanying drawings.

[0027] The implementation process of the fusion classification method of the present invention, based on heterogeneous base classifiers under an unbalanced data distribution, is shown in the accompanying drawing. It specifically includes the following steps:

[0028] Step 1 preprocesses the samples with a resampling algorithm based on a differential sampling rate. The preprocessing comprises two processes, oversampling and undersampling, so that different samples to be classified are allocated to different base classifiers. Taking the oversampling process as an example, the specific procedure is:

[0029] A. Calculate the number of positive samples minsize and the number of negative samples maxsize;

[0030] B. Calculate the difference subsize between maxsize and minsize;

[0031] C. Calculate the sampling factor samfactor=subsize / n, where n is the number of base classifiers;

[0032] D. Calculate th...
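
Steps A to C above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `sampling_factor`, the `positive_label` parameter, and the assumption of a binary label list are hypothetical. Step D is truncated in the source and is deliberately not reproduced.

```python
from collections import Counter


def sampling_factor(labels, n_classifiers, positive_label=1):
    """Compute minsize, maxsize, subsize and samfactor (steps A to C).

    ASSUMPTION: labels is a binary label sequence in which
    positive_label marks the positive (minority) class.
    """
    if n_classifiers <= 0:
        raise ValueError("n_classifiers must be a positive integer")

    counts = Counter(labels)
    # Step A: number of positive samples and number of negative samples.
    minsize = counts[positive_label]
    maxsize = len(labels) - minsize
    # Step B: difference between the two class sizes.
    subsize = maxsize - minsize
    # Step C: sampling factor, where n is the number of base classifiers.
    samfactor = subsize / n_classifiers
    return minsize, maxsize, subsize, samfactor


if __name__ == "__main__":
    # 10 positive and 90 negative samples, 4 base classifiers.
    labels = [1] * 10 + [0] * 90
    print(sampling_factor(labels, n_classifiers=4))
```

With 10 positive samples, 90 negative samples and 4 base classifiers, subsize is 80 and samfactor is 20.0.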


Abstract

The invention discloses a multi-heterogeneous base classifier fusion classification method based on unbalanced data distribution, and relates to unbalanced data classification technology in the field of data mining. The method comprises the following steps: preprocessing the samples with a resampling algorithm based on a differential sampling rate, including an oversampling process and an undersampling process, thereby allocating different samples to be classified to different base classifiers; calculating the classification error rate of each base classifier and, from it, the corresponding weight; having an oversampling expert and an undersampling expert each tally their results; and fusing the final prediction result according to a classification strategy function to obtain the category of the sample. With this method, important characteristics of minority-class samples can be found in massive data and the accuracy on minority-class samples can be effectively improved, thereby improving the overall classification accuracy on the data set.
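
The weighting and fusion steps named in the abstract might look roughly like the sketch below. The text available here does not give the weight formula or the classification strategy function, so both are assumptions: the weight uses an AdaBoost-style log-odds of the error rate, and the strategy function is the sign of the weighted vote sum. The names `classifier_weight` and `fuse` are hypothetical.

```python
import math


def classifier_weight(error_rate, eps=1e-10):
    """Map a base classifier's error rate to a voting weight.

    ASSUMPTION: AdaBoost-style log-odds weight; the source does not
    state the formula it actually uses.
    """
    # Clamp to avoid division by zero or log(0) at error rates of 0 or 1.
    e = min(max(error_rate, eps), 1 - eps)
    return math.log((1 - e) / e)


def fuse(votes, error_rates):
    """Fuse +1/-1 votes from base classifiers into one prediction.

    ASSUMPTION: the strategy function is the sign of the weighted sum,
    with ties resolved to the negative class (-1).
    """
    if len(votes) != len(error_rates):
        raise ValueError("votes and error_rates must have equal length")
    score = sum(classifier_weight(e) * v for v, e in zip(votes, error_rates))
    return 1 if score > 0 else -1


if __name__ == "__main__":
    # One accurate classifier votes +1, two weaker ones vote -1.
    print(fuse([1, -1, -1], [0.1, 0.4, 0.4]))
```

Under these assumptions a single low-error classifier can outvote several weaker ones, which is the usual motivation for error-based weighting.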

Description

Technical field

[0001] The invention relates to the technical field of data mining, in particular to a multi-heterogeneous base classifier fusion classification method based on unbalanced data distribution.

Background

[0002] In recent years, as data mining research has deepened and its applications have kept expanding, more and more researchers have come to feel that in some complex data environments traditional data mining technology struggles to adapt to constantly changing conditions. Among these problems, data mining for unbalanced data distribution environments has gradually become a hot issue in the field. Since Nathalie Japkowicz first comprehensively framed the problem of learning from unbalanced data sets, data classification in unbalanced data distribution environments has become a dedicated research topic and one of the focuses of future research. The traditional classification method focuses on i...


Application Information

IPC(8): G06F17/30
Inventor: name withheld (不公告发明人)
Owner: 翟云