A Feature Selection Method for Text Classification

A feature selection method for text classification that can be used in text database clustering and classification, unstructured text data retrieval, and specialized data processing applications, and that addresses the problems of high feature dimension and low classification accuracy.

Active Publication Date: 2019-06-28
UNIV OF SCI & TECH BEIJING

AI Technical Summary

Problems solved by technology

[0004] The technical problem to be solved by the present invention is to provide a text classification feature selection method that addresses the problems of high feature dimension or low classification accuracy in the prior art.



Examples

Detailed Description of the Embodiments

[0053] To make the technical problems to be solved, the technical solutions, and the advantages of the present invention clearer, a detailed description follows with reference to the drawings and specific embodiments.

[0054] To address the existing problems of high feature dimension or low classification accuracy, the present invention provides a text classification feature selection method.

[0055] As shown in Figure 1, the text classification feature selection method provided by the embodiment of the present invention includes:

[0056] Step 1: Obtain the feature set S and the target category C, calculate the degree of relevance Rc(x(i)) between each feature x(i) in the feature set S and the target category C, and sort the feature set S in descending order of Rc(x(i));

[0057] Step 2: Calculate the redundancy Rx and the degree of synergy Sx between every two features in the feature set S, combined with the correlation d...
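Step 1 and the pairwise redundancy of Step 2 can be sketched as follows. The excerpt does not state how the degree of relevance or the redundancy is computed, so this minimal sketch assumes mutual information between discrete variables for both. The function names `mutual_information`, `rank_by_relevance` and `pairwise_redundancy` and the toy term-presence data are hypothetical and not taken from the patent. The degree of synergy is left out because its definition is not available here.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def rank_by_relevance(features, target):
    """ASSUMPTION: relevance of a feature = mutual information with the class.

    Returns (name, relevance) pairs sorted by relevance, descending.
    """
    scores = {name: mutual_information(col, target) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def pairwise_redundancy(features):
    """ASSUMPTION: redundancy of a feature pair = their mutual information."""
    return {
        (a, b): mutual_information(features[a], features[b])
        for a, b in combinations(features, 2)
    }


# Hypothetical toy data: binary term-presence features for six documents.
features = {
    "goal":  [1, 1, 1, 0, 0, 0],
    "the":   [1, 1, 1, 1, 1, 1],
    "match": [1, 1, 0, 1, 0, 0],
}
target = ["sport", "sport", "sport", "tech", "tech", "tech"]

ranking = rank_by_relevance(features, target)
redundancy = pairwise_redundancy(features)
```

In this toy data, "goal" separates the two classes perfectly and "the" occurs in every document, so they land at the top and bottom of the ranking respectively. Any other relevance or redundancy measure could be substituted without changing the sorting step.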

Abstract

The invention provides a text classification feature selection method that reduces the feature dimension and the classification complexity and improves classification accuracy. The method comprises the following steps. A feature set S and a target class C are obtained, the relevancy Rc(x(i)) between each feature x(i) in the feature set S and the target class C is calculated, and the feature set S is sorted in descending order of relevancy Rc(x(i)). The redundancy Rx and the synergy degree Sx between every two features in the feature set S are calculated; the sensitivity Sen of each feature is calculated by combining these with the relevancy Rc(x(i)) between each feature and the target class; the sensitivities Sen are compared with a preset threshold th; and, according to the threshold th and the descending-order result, the feature set S is divided into a candidate set Ssel and an excluded set Sexc. The sensitivities Sen between the features in the candidate set Ssel and the features in the excluded set Sexc are then calculated and compared with the preset threshold th, and the candidate set Ssel and the excluded set Sexc are adjusted according to the threshold th. The method is suitable for the field of machine learning text classification.
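The split-then-adjust control flow described in the abstract can be sketched as below. The abstract does not give the formula for the sensitivity Sen, the direction of the comparison with th, or the exact adjustment rule, so all three are assumptions here: the sensitivity is passed in as a function, a feature is kept when its sensitivity is at least th, and the adjustment re-scores each feature in Sexc once against the final Ssel. The names `partition_features` and `toy_sensitivity` and all numeric values are hypothetical.

```python
def partition_features(ranked, sensitivity, th):
    """Split ranked features into (s_sel, s_exc), then re-check exclusions once.

    ranked      -- feature names already sorted by relevance, descending
    sensitivity -- callable(feature, selected_list) -> float (ASSUMED interface)
    th          -- preset threshold (ASSUMED rule: keep when sensitivity >= th)
    """
    s_sel, s_exc = [], []
    # First pass: walk the ranking and split by the threshold.
    for f in ranked:
        (s_sel if sensitivity(f, s_sel) >= th else s_exc).append(f)
    # Adjustment pass: re-score excluded features against the final candidate set.
    for f in list(s_exc):
        if sensitivity(f, s_sel) >= th:
            s_exc.remove(f)
            s_sel.append(f)
    return s_sel, s_exc


# Hypothetical scores for four features; pairs not listed default to 0.
relevance = {"a": 0.9, "b": 0.5, "c": 0.45, "d": 0.2}
redundancy = {frozenset(("a", "b")): 0.4}
synergy = {frozenset(("b", "c")): 0.3}


def toy_sensitivity(f, selected):
    """Stand-in only: relevance - max redundancy + max synergy with selected."""
    red = max((redundancy.get(frozenset((f, g)), 0.0) for g in selected), default=0.0)
    syn = max((synergy.get(frozenset((f, g)), 0.0) for g in selected), default=0.0)
    return relevance[f] - red + syn


s_sel, s_exc = partition_features(["a", "b", "c", "d"], toy_sensitivity, th=0.3)
```

With these made-up numbers, "b" is excluded in the first pass because it is redundant with "a", then moved back during adjustment because it is synergistic with "c", which was selected later. This is the kind of case a second comparison between Ssel and Sexc would catch.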

Description

Technical field

[0001] The invention relates to the field of machine learning text classification, and in particular to a text classification feature selection method.

Background

[0002] As the Internet continues to grow, the information resources gathered on it keep increasing. To manage and use these resources effectively, content-based information retrieval and data mining have received sustained attention. Text classification is an important foundation for information retrieval and text data mining. Its main task is to assign words and documents of unknown category to one or more predetermined categories. However, the large number of training samples and the high vector dimension make text classification a machine learning problem with high computational time and space complexity. Therefore, we need to perform feature selection to reduce the ...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F16/35
CPC: G06F16/353
Inventor: 张晓彤, 余伟伟, 刘喆, 王璇
Owner: UNIV OF SCI & TECH BEIJING