Characteristic selection method and device

A feature selection method and device, applied in special-purpose data processing, instruments, electronic digital data processing, and related fields.

Active Publication Date: 2016-01-20
HUAWEI TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] The embodiment of the present invention provides a feature selection method and device, which solves the problem of how to select a more accurate opt



Examples


Embodiment 1

[0064] Figure 2 is a flowchart of a feature selection method provided by an embodiment of the present invention. As shown in Figure 2, the method may include the following steps:

[0065] 201. Calculate the correlation among the feature variables in the original data set, and the correlation between each feature variable in the original feature subset and the predicted target feature variable.

[0066] Here, a feature variable is a description of a certain characteristic of an entity such as a process, event, or state. The predicted target feature variable is a preset "certain phenomenon" that is to be described by a combination of multiple feature variables; it is itself one specific feature variable.

[0067] The original data set includes N-dimensional feature variables and M groups of data, where N and M are positive integers; the N-dimensional feature variables include the N-1 dimension feature variables and the predicted ...
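Step 201 can be illustrated with a minimal sketch. The excerpt does not say which correlation measure the patent uses, so the Pearson coefficient below is an assumption, as are the function names `pearson` and `correlation_tables`, the sample data, and the choice of the last column as the predicted target feature variable.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no linear correlation
    return cov / math.sqrt(var_x * var_y)

def correlation_tables(rows, target_index):
    """Compute feature-feature and feature-target correlations.

    rows: M records, each holding N values (N-1 features plus the target).
    target_index: column index of the predicted target feature variable.
    """
    columns = list(zip(*rows))  # transpose to N columns of M values
    features = [i for i in range(len(columns)) if i != target_index]
    feature_target = {
        i: pearson(columns[i], columns[target_index]) for i in features
    }
    feature_feature = {
        (i, j): pearson(columns[i], columns[j])
        for i in features for j in features if i < j
    }
    return feature_feature, feature_target

# Hypothetical data set: M = 4 groups of data, N = 4 variables,
# with column 3 taken as the predicted target feature variable.
rows = [
    (1.0, 2.0, 5.0, 1.1),
    (2.0, 4.0, 3.0, 2.0),
    (3.0, 6.0, 6.0, 2.9),
    (4.0, 8.0, 2.0, 4.2),
]
ff, ft = correlation_tables(rows, target_index=3)
for pair, r in ff.items():
    print("features", pair, round(r, 3))
for i, r in ft.items():
    print("feature", i, "vs target", round(r, 3))
```

Storing the pairwise values in a dictionary keyed by `(i, j)` with `i < j` avoids computing each symmetric pair twice; any other symmetric correlation measure could be substituted for `pearson`.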

Embodiment 2

[0110] Figure 5 is a structural diagram of a feature selection device 50 provided by an embodiment of the present invention. As shown in Figure 5, the device may include:

[0111] Calculation module 501, configured to calculate the correlation among the feature variables in the original data set, and the correlation between each feature variable in the original feature subset and the predicted target feature variable.

[0112] Here, a feature variable is a description of a certain characteristic of an entity such as a process, event, or state. The predicted target feature variable is a preset "certain phenomenon" that is to be described by a combination of multiple feature variables; it is itself one specific feature variable.

[0113] The original data set includes N-dimensional feature variables and M groups of data, where N and M are positive integers; the N-dimensional feature variables include the N-1 dimension feature variables and the predicted...

Embodiment 3

[0144] Figure 6 is a structural diagram of a feature selection device 60 provided by an embodiment of the present invention. As shown in Figure 6, the device may include a processor 601, a memory 602, a communication unit 603, and at least one communication bus 604, which is used to connect these components and enable communication among them.

[0145] The processor 601 may be a central processing unit (CPU).

[0146] The memory 602 may be a volatile memory, such as a random-access memory (RAM); or a non-volatile memory, such as a read-only memory (ROM), flash memory, hard disk drive (HDD), or solid-state drive (SSD); or a combination of the above-mentioned types of memories, a...


Abstract

The invention provides a feature selection method and device, and relates to the technical field of data mining. The method and device determine an optimal feature subset on the basis of the correlation among feature variables, improving the effectiveness and operating efficiency of feature selection on high-dimensional data. The feature selection method comprises the following steps: calculating the correlation among the feature variables in an original data set, and the correlation between the feature variables in the original data set and a predicted target feature variable; obtaining a strongly correlated feature subset and a weakly correlated feature subset according to those correlations; and determining, as the optimal feature subset of the predicted target feature variable, the set consisting of all feature variables in the strongly correlated feature subset together with those feature variables in the weakly correlated feature subset that are directly correlated with feature variables in the strongly correlated feature subset.
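The selection rule in the abstract can be sketched as follows. The excerpt does not define how "strong", "weak", or "directly correlated" are decided, so the three thresholds, the function name `select_optimal_subset`, and the sample correlation values are all assumptions made for illustration only.

```python
def select_optimal_subset(feature_target, feature_feature,
                          strong_threshold=0.7,   # assumed cutoff
                          weak_threshold=0.3,     # assumed cutoff
                          link_threshold=0.7):    # assumed cutoff
    """Return (optimal, strong, weak) feature sets.

    feature_target: {feature: correlation with the target variable}
    feature_feature: {(feature_a, feature_b): pairwise correlation}
    """
    strong = {f for f, r in feature_target.items()
              if abs(r) >= strong_threshold}
    weak = {f for f, r in feature_target.items()
            if weak_threshold <= abs(r) < strong_threshold}

    def linked(a, b):
        # Pairwise correlations are symmetric; accept either key order.
        key = (a, b) if (a, b) in feature_feature else (b, a)
        return abs(feature_feature.get(key, 0.0)) >= link_threshold

    # Keep only weak features tied directly to at least one strong feature.
    linked_weak = {w for w in weak if any(linked(w, s) for s in strong)}
    return strong | linked_weak, strong, weak

# Hypothetical correlation values for four features "a" to "d".
feature_target = {"a": 0.9, "b": 0.5, "c": 0.4, "d": 0.05}
feature_feature = {
    ("a", "b"): 0.8, ("a", "c"): 0.2, ("a", "d"): 0.9,
    ("b", "c"): 0.85, ("b", "d"): 0.1, ("c", "d"): 0.0,
}
optimal, strong, weak = select_optimal_subset(feature_target, feature_feature)
print(sorted(optimal))  # prints ['a', 'b']
```

In this example, "c" is weakly correlated with the target but is linked only to another weak feature, so it is dropped; "d" is linked to the strong feature "a" but falls below the weak cutoff, so it never enters either subset.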

Description

Technical field

[0001] The invention relates to the technical field of data mining, and in particular to a feature selection method and device.

Background

[0002] For high-dimensional data, such as aerospace remote sensing data, biological data, network data, and financial market transaction data, both the volume and the dimensionality of the data are growing exponentially. This brings the "blessing of dimensionality": the rich information contained in high-dimensional data opens up new possibilities for solving problems. It also brings the "curse of dimensionality": the Euclidean distances between points in a high-dimensional space become almost the same, which makes pattern recognition and rule discovery in high-dimensional data very difficult. Therefore, to avoid the curse of dimensionality, feature selection must be performed on high-dimensional data. ...
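The statement that Euclidean distances become almost the same in high dimensions can be checked with a small experiment. This is an illustrative sketch and not part of the patent: it assumes points drawn uniformly from the unit hypercube, and the function name `relative_contrast`, the sample size, and the seed are arbitrary choices.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of distances from one query point to random points.

    Values near zero mean the nearest and farthest points are almost
    equally far away, which is the distance concentration effect.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)

for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

The printed contrast shrinks as the dimension grows, which is why distance-based pattern recognition loses discriminating power on raw high-dimensional data.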


Application Information

IPC(8): G06F17/30
Inventor: 张世明, 袁明轩, 曾嘉
Owner: HUAWEI TECH CO LTD