
Feature selection method based on exponential synergy measurement

Feature selection and exponential-measure technology, applied in special data processing applications, unstructured text data retrieval, text database clustering/classification, etc.

Pending Publication Date: 2020-09-25
XIAN UNIV OF TECH
Cites: 3 | Cited by: 1

AI Technical Summary

Problems solved by technology

[0007] The purpose of the present invention is to provide a feature selection method based on exponential synergy measurement. It addresses the problem that most algorithms ignore the relative sizes of the categories when reducing feature dimensionality, which lowers their accuracy.



Examples


Embodiment Construction

[0045] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0046] As shown in Figure 1, a feature selection method based on exponential synergy measurement is implemented in the following steps:

[0047] Step 1. Obtain the data set and perform preprocessing;

[0048] Step 2. Set the size of the optimal feature subset to C, and reduce the dimensionality of the acquired data set;

[0049] Step 3. Use 5-fold cross-validation to divide the dimension-reduced data set into a training set and a test set, and classify the test set;

[0050] Step 4. Use the Macro-F1 and Micro-F1 evaluation criteria to evaluate the classifier's classification results.
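
The mechanics of Steps 2 to 4 can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function names `select_top_c`, `k_fold_indices`, and `macro_micro_f1` are hypothetical, the per-term scores are supplied by the caller because the ECM scoring formula is not given in this excerpt, and the folds are taken as contiguous, unshuffled blocks for simplicity.

```python
from collections import defaultdict


def select_top_c(scores, c):
    """Step 2 (sketch): keep the c highest-scoring terms.

    `scores` maps term -> score. The score itself would come from the
    feature selection measure (ECM in the patent); it is NOT computed here.
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:c]]


def k_fold_indices(n_samples, k=5):
    """Step 3 (sketch): yield (train_idx, test_idx) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def macro_micro_f1(y_true, y_pred):
    """Step 4 (sketch): Macro-F1 and Micro-F1 for single-label predictions."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro-F1: unweighted mean of per-class F1 (small classes count equally).
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro-F1: F1 over pooled counts (large classes dominate).
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro
```

Reporting both scores is relevant to the imbalance problem the method targets: Macro-F1 weights every category equally regardless of size, while Micro-F1 is dominated by the larger categories.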

[0051] The data sets in Step 1 include RE0, RE1, R52, R8, and 20 Newsgroups.

[0052] The preprocessing in Step 1 specifically deletes terms that occur three or fewer times in the documents and ex...
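
The rare-term filter described above might look like the sketch below. It covers only the part of the preprocessing that is stated in this excerpt (the rest of the sentence is truncated), and it rests on one assumption: "occurrence times" is read as document frequency, that is, the number of documents containing the term. The function name `filter_rare_terms` and the `max_rare_count` parameter are hypothetical.

```python
from collections import Counter


def filter_rare_terms(tokenized_docs, max_rare_count=3):
    """Drop terms whose document frequency is <= max_rare_count.

    ASSUMPTION: the threshold is applied to document frequency; the
    excerpt could also be read as total occurrence count.
    """
    doc_freq = Counter()
    for doc in tokenized_docs:
        doc_freq.update(set(doc))  # count each term once per document
    keep = {term for term, n in doc_freq.items() if n > max_rare_count}
    return [[term for term in doc if term in keep] for doc in tokenized_docs]
```

If the intended count is total occurrences rather than document frequency, replacing `set(doc)` with `doc` would switch the interpretation.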


Abstract

The invention discloses a feature selection method based on exponential synergy measurement. The method is implemented in the following steps: (1) acquire a data set and preprocess it; (2) set the size of the optimal feature subset to C and reduce the dimensionality of the acquired data set; (3) use five-fold cross-validation to divide the dimension-reduced data set into a training set and a test set, and classify the test set; (4) evaluate the classifier's classification results using the Macro-F1 and Micro-F1 evaluation criteria. The method addresses the imbalance in category sizes that data sets exhibit in practical applications, and emphasizes the importance of the document frequency of a term t_i in the positive category. The ECM algorithm makes up for the shortcomings of MMR and improves the accuracy of feature selection and the efficiency of classification.

Description

Technical field

[0001] The invention is applied in the technical field of text classification in data mining, and relates to a feature selection method based on exponential synergy measurement.

Background

[0002] With the continued spread of mobile networks and information technology, the amount of data generated worldwide is growing exponentially. Unlike in the past, unstructured data now makes up an increasing share of that data, and most of it is text. Classifying text can greatly increase the speed at which computers retrieve information. Text classification is a typical problem in natural language processing, and it is widely used in sentiment analysis, public opinion analysis, and email filtering.

[0003] Text classification refers to the task of assigning documents to one or more predefined categories, such as in email detection, classifying emails as spam and non-spam; on social media, ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06F16/35
CPC: G06F16/353, G06F18/213, G06F18/24155, G06F18/2411
Inventors: 周红芳, 马一鸣, 李想
Owner: XIAN UNIV OF TECH