Ensemble learning method based on heuristic sampling

An ensemble learning and heuristic sampling technique, applied in ensemble learning, character and pattern recognition, instruments, etc. It addresses imbalanced pre-sampled data, low sampling quality on data sets, and the resulting drop in classification performance, thereby improving classification performance, maintaining good operating efficiency, and increasing the degree of discrimination.

Inactive Publication Date: 2020-06-12
TONGJI UNIV

AI Technical Summary

Problems solved by technology

[0004] In view of the shortcomings of existing ensemble methods when dealing with imbalanced data sets, the purpose of the present invention is to provide an ensemble learning method based on heuristic sampling. It addresses two problems: the sampling quality of existing ensemble learning methods on imbalanced data sets is not high, and the classification performance of these methods on the data set is reduced because the pre-sampled data are imbalanced.

Examples

Embodiment Construction

[0028] Embodiments of the present invention are described below through specific examples, and those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other different specific implementation modes, and various modifications or changes can be made to the details in this specification based on different viewpoints and applications without departing from the spirit of the present invention. It should be noted that, in the case of no conflict, the following embodiments and features in the embodiments can be combined with each other.

[0029] It should be noted that the diagrams provided in the following embodiments only schematically illustrate the basic ideas of the present invention, and only the components related to the present invention are shown in the diagrams rather than the number, shape and size of the compo...

Abstract

An ensemble learning method based on heuristic sampling, suitable for classifying imbalanced data sets, comprises the following steps: dividing the samples of the data set into secondary categories according to the distribution characteristics of all samples in the feature space; setting a different hardness weight for each sample according to its secondary category, and calculating the selection probability of each sample by combining the hardness weight with the imbalance weight; and resampling the data set according to the selection probability of each sample, then performing ensemble training on the resampled data set to obtain the final classification result. Because the method resamples with emphasis based on the intrinsic characteristics of the samples, it improves the sampling quality on imbalanced data sets and the classification performance of existing ensemble learning methods on such data sets.
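The three steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the abstract does not say how samples are categorised, what the hardness weights are, or which base learner is used. The neighbourhood rule in `secondary_category`, the values in `HARDNESS`, the inverse-frequency imbalance weight, the product used to combine the two weights, and the nearest-centroid base learner are all hypothetical choices made only so the sketch runs. Points are assumed to be tuples of floats.

```python
import math
import random
from collections import Counter

# Assumed hardness weights per category; the source gives no values.
HARDNESS = {"safe": 1.0, "borderline": 2.0, "rare": 1.5}

def secondary_category(i, X, y, k=5):
    """Hypothetical rule: categorise sample i by the share of its k
    nearest neighbours (Euclidean) that carry the same class label."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    ratio = sum(1 for _, j in dists[:k] if y[j] == y[i]) / k
    if ratio >= 0.8:
        return "safe"
    if ratio >= 0.4:
        return "borderline"
    return "rare"

def selection_probabilities(X, y, k=5):
    """Combine a hardness weight with an imbalance weight per sample,
    then normalise so the values sum to 1."""
    counts = Counter(y)
    n = len(y)
    raw = []
    for i in range(n):
        # Assumed imbalance weight: inverse class frequency.
        imbalance_w = n / (len(counts) * counts[y[i]])
        hardness_w = HARDNESS[secondary_category(i, X, y, k)]
        raw.append(imbalance_w * hardness_w)  # assumed combination: product
    total = sum(raw)
    return [w / total for w in raw]

def resample(X, y, probs, size, rng):
    """Draw `size` samples with replacement according to `probs`."""
    idx = rng.choices(range(len(X)), weights=probs, k=size)
    return [X[i] for i in idx], [y[i] for i in idx]

def fit_centroids(X, y):
    """Stand-in base learner: one mean point per class."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: tuple(sum(col) / len(pts) for col in zip(*pts))
            for label, pts in groups.items()}

def train_ensemble(X, y, n_models=5, size=None, k=5, seed=0):
    """Train each base learner on its own weighted resample."""
    rng = random.Random(seed)
    probs = selection_probabilities(X, y, k)
    size = size or len(X)
    return [fit_centroids(*resample(X, y, probs, size, rng))
            for _ in range(n_models)]

def predict(ensemble, x):
    """Majority vote over the base learners' nearest-centroid labels."""
    votes = Counter(min(c, key=lambda label: math.dist(x, c[label]))
                    for c in ensemble)
    return votes.most_common(1)[0][0]
```

With these assumed weights, minority-class samples receive a larger selection probability than well-separated majority-class samples, so each resampled training set is less skewed than the original data. Any other categorisation rule or base learner could be substituted without changing the overall structure.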

Description

Technical field

[0001] The invention relates to the technical fields of data mining and machine learning, in particular to an ensemble learning method based on heuristic sampling.

Background

[0002] In recent years, data mining and machine learning models have been widely used in many fields. Classification is one of the important tasks in machine learning. Traditional classification models are usually built on the assumption that the number of samples in each category of the data set is balanced. However, in many real-world applications this assumption does not hold; that is, the data set exhibits class imbalance. Examples include transaction fraud detection, network intrusion detection, biological gene detection, and spam filtering, where the data are imbalanced. And when this unbalanced phenomenon occurs, the classifier's recognition accuracy for a small number of c...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N20/20, G06K9/62
CPC: G06N20/20, G06F18/2415, G06F18/214
Inventor: 蒋昌俊, 闫春钢, 丁志军, 刘关俊, 张亚英, 广明鉴
Owner: TONGJI UNIV