High-dimensional and imbalance data classification-oriented integration

A data classification and balancing technology in the field of data processing that addresses problems such as slightly poor classification performance.

Status: Inactive. Publication date: 2017-10-20
SHANGHAI FENGBAO INFORMATION TECH CO LTD
Cites: 0. Cited by: 21.

AI Technical Summary

Problems solved by technology

Due to the limitations of the search method, the performance of the IEFS algorithm is slightly worse than that of the CSRF algorithm.

Embodiment Construction

[0047] To make the technical means, creative features, objectives, and effects of the present invention easy to understand, the invention is further elaborated below with reference to the drawings and specific embodiments.

[0048] For integration oriented to the classification of high-dimensional, unbalanced data, the preprocessing strategies are reduced to two types according to the order in which dimensionality reduction and sampling are applied. Following the principle that experimental conclusions should be reproducible, several standard data mining and machine learning data sets are selected as experimental data. Among the preprocessing methods, a wrapper feature selection method and an oversampling method are added. The influence of the preprocessing methods on the classification performance of high-dimensional unbalanced data is studied from two aspects: the number of attributes and the degree of imbalance.
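The two preprocessing orders (reduce dimensionality then sample, or sample then reduce dimensionality) can be sketched in stdlib-only Python. This is a minimal illustration under assumptions the patent does not state: binary 0/1 labels, random oversampling standing in for the oversampling method, and greedy forward selection scored by a nearest-centroid classifier's balanced training accuracy standing in for the wrapper feature selection method. All function names are hypothetical.

```python
import random

# Hypothetical stand-ins: the patent names a wrapper feature selection method and an
# oversampling method but does not specify which; these are illustrative choices only.

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:  # nothing to duplicate
        return [list(row) for row in X], list(y)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [list(X[i]) for i in idx], [y[i] for i in idx]


def centroid_score(X, y, features):
    """Balanced training accuracy of a nearest-centroid classifier using only `features`."""
    def centroid(label):
        rows = [[x[f] for f in features] for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    c0, c1 = centroid(0), centroid(1)
    correct, total = {0: 0, 1: 0}, {0: 0, 1: 0}
    for x, t in zip(X, y):
        v = [x[f] for f in features]
        pred = 0 if sq_dist(v, c0) <= sq_dist(v, c1) else 1
        total[t] += 1
        correct[t] += int(pred == t)
    return (correct[0] / total[0] + correct[1] / total[1]) / 2


def wrapper_forward_select(X, y, k, score=centroid_score):
    """Greedy wrapper selection: repeatedly add the feature that most improves `score`."""
    selected, remaining = [], list(range(len(X[0])))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


def project(X, features):
    """Keep only the chosen feature columns."""
    return [[x[f] for f in features] for x in X]


def reduce_then_balance(X, y, k, seed=0):
    """Strategy 1: select features on the original data, then oversample."""
    feats = wrapper_forward_select(X, y, k)
    Xb, yb = random_oversample(project(X, feats), y, seed)
    return Xb, yb, feats


def balance_then_reduce(X, y, k, seed=0):
    """Strategy 2: oversample first, then select features on the balanced data."""
    Xb, yb = random_oversample(X, y, seed)
    feats = wrapper_forward_select(Xb, yb, k)
    return project(Xb, feats), yb, feats
```

In this sketch the two strategies differ only in whether the wrapper's scoring classifier sees the original or the oversampled class distribution. A more realistic wrapper would score candidate subsets on held-out folds rather than on training accuracy.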

[0049] There are two solutions to the classification of high-dimen...

Abstract

The invention provides integration oriented to the classification of high-dimensional, imbalanced data. Preprocessing strategies are reduced to two types according to the order in which dimensionality reduction and sampling are applied. Following the principle that experimental conclusions should be reproducible, several standard data mining and machine learning data sets are selected as experimental data. Among the preprocessing methods, a wrapper feature selection method and an oversampling method are added. The influence of the preprocessing methods on imbalanced data classification performance is studied from two aspects, the number of attributes and the degree of imbalance, using a more complete preprocessing experiment strategy, and different conclusions are obtained: before classifying high-dimensional imbalanced data, reducing the features first and then balancing the data yields better average AUC performance and a high level of automation; and different combinations of preprocessing strategies can be adopted to mitigate the influence of high dimensionality and imbalance on classification.
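The abstract compares strategies by average AUC and studies the degree of imbalance. A minimal sketch of both measures follows, under assumptions the patent does not state: binary 0/1 labels with 1 as the positive class, "degree of imbalance" taken as the majority-to-minority count ratio, and AUC averaged over evaluation folds. The function names are hypothetical.

```python
def imbalance_ratio(y):
    """Majority-class count divided by minority-class count, for binary 0/1 labels."""
    n_pos = sum(1 for t in y if t == 1)
    n_neg = len(y) - n_pos
    return max(n_pos, n_neg) / min(n_pos, n_neg)


def auc(scores, y):
    """ROC AUC via the Mann-Whitney statistic: the probability that a randomly chosen
    positive is scored above a randomly chosen negative, counting ties as 1/2."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_auc(folds):
    """Mean AUC over (scores, labels) pairs, e.g. one pair per cross-validation fold."""
    return sum(auc(s, t) for s, t in folds) / len(folds)


print(imbalance_ratio([0, 0, 0, 0, 1]))         # 4.0
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is quadratic in the number of samples; a rank-based computation gives the same value more efficiently on large data sets.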

Description

Technical field

[0001] The invention relates to the field of data processing, and in particular to integration oriented to the classification of high-dimensional and unbalanced data.

Background art

[0002] Data mining research faces challenges from a variety of data problems, and data with different characteristics increase the complexity of algorithm research. Among these, the classification of data that are both high-dimensional and unbalanced has been a research focus in recent years. Existing methods consider only one of these characteristics, high dimensionality or imbalance, yet a large amount of real-world data exhibits both at the same time. Classification algorithms designed for high-dimensional data alone or for unbalanced data alone hit performance bottlenecks on data with both characteristics. How to classify high-dimensional, unbalanced data effectively is therefore an urgent problem in applied research. There are two approaches to classify high-di...

Application Information

Patent type and authority: Applications (China)
IPC (8): G06F17/30
CPC: G06F16/2465
Inventor: 李臻 (Li Zhen)
Owner: SHANGHAI FENGBAO INFORMATION TECH CO LTD