Text classification method and system based on class-aware feature selection framework

A text classification and feature selection technology, applied in character and pattern recognition, instruments, computing, and related fields. It addresses three problems: feature extraction that ignores the inter-class discrimination ability of feature words, sparse features for small categories, and the resulting limits on classification accuracy. It thereby overcomes the one-sidedness of existing methods and achieves better text classification results.

Active Publication Date: 2020-05-19
GUANGDONG UNIVERSITY OF FOREIGN STUDIES

AI Technical Summary

Problems solved by technology

However, this approach, which extracts feature words from a global perspective, performs poorly on imbalanced datasets.
This is because, when a dataset has many categories and is imbalanced, traditional feature extraction keeps only the features with the highest global class discrimination. The features extracted for some small-sample category clusters are therefore sparse, which reduces classification accuracy for those clusters.
In addition, the feature extraction methods that existing text classification methods rely on consider only the class inclination of feature words, not their inter-class discrimination ability. This one-sidedness limits the classification accuracy of existing text classification methods.



Examples


Embodiment Construction

[0033] This embodiment includes a text classification method. Referring to Figure 1, the method includes the following steps:

[0034] S1. Preprocess multiple category clusters to obtain a feature word set; each category cluster includes multiple words of the same category, the category clusters together form a training set, and the training set is used to train the classifier;

[0035] S2. Calculate the class correlation score and the class discrimination score between each feature word in the feature word set and each category cluster respectively;

[0036] S3. Assign each feature word in the feature word set to the category cluster for which it has the highest class correlation score;

[0037] S4. Reorder the words in each category cluster according to the class discrimination score between that category cluster and the feature words assigned to it;

[0038] S5. Select feature subsets from the reordered category clusters; all the selected f...
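Steps S1 to S5 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the text available here does not give the formulas for the class correlation score or the class discrimination score, so the two scoring functions below are assumed stand-ins (the share of a word's documents that fall in a class, and the word's document rate in a class minus its highest rate in any other class). The function name `class_aware_select` and the parameter `k_per_class` are likewise hypothetical.

```python
from collections import defaultdict


def class_aware_select(docs, labels, k_per_class):
    """docs: list of token lists; labels: parallel list of class names."""
    classes = sorted(set(labels))
    n_docs_in = {c: labels.count(c) for c in classes}

    # S1: the feature word set is every distinct token;
    # df[w][c] counts the documents of class c that contain w.
    df = defaultdict(lambda: defaultdict(int))
    for tokens, c in zip(docs, labels):
        for w in set(tokens):
            df[w][c] += 1

    def correlation(w, c):
        # ASSUMED stand-in for the class correlation score:
        # fraction of the documents containing w that belong to c.
        total = sum(df[w].values())
        return df[w][c] / total

    def discrimination(w, c):
        # ASSUMED stand-in for the class discrimination score:
        # document rate of w in c minus its highest rate elsewhere.
        def rate(x):
            return df[w][x] / n_docs_in[x]
        others = [rate(o) for o in classes if o != c]
        return rate(c) - (max(others) if others else 0.0)

    # S2 + S3: score each word against each class and assign it to the
    # class with the highest correlation score.
    clusters = {c: [] for c in classes}
    for w in list(df):
        best = max(classes, key=lambda c: correlation(w, c))
        clusters[best].append((discrimination(w, best), w))

    # S4 + S5: rank each cluster by discrimination score, keep the top
    # k_per_class words per cluster, then reorder the merged set.
    selected = []
    for c in classes:
        ranked = sorted(clusters[c], key=lambda t: (-t[0], t[1]))
        selected.extend(ranked[:k_per_class])
    selected.sort(key=lambda t: (-t[0], t[1]))
    return [w for _, w in selected]


docs = [
    ["goal", "match", "team"],
    ["goal", "team", "score"],
    ["match", "team", "win"],
    ["vote", "party", "team"],
]
labels = ["sport", "sport", "sport", "politics"]
features = class_aware_select(docs, labels, k_per_class=2)
print(features)
```

Because the selection quota is applied per category cluster, the single-document "politics" class still contributes features, which is the behaviour the method aims for on imbalanced data.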


Abstract

The invention discloses a text classification method. The method includes the steps of: preprocessing a plurality of category clusters to obtain a feature word set; calculating the class correlation score and the class discrimination score between each feature word and each category cluster; assigning each feature word to the category cluster for which it has the highest class correlation score; reordering the words in each category cluster; selecting a feature subset from each category cluster; reordering the feature subsets within the total feature set to obtain the final feature set; and inputting the vector representation of the text to be classified into the classifier and outputting the classification result. The data processed by the classifier in the method of the present invention also includes information such as the respective properties of the different category clusters and the degree of intra-class correlation and inter-class distinction of the feature words, which overcomes the one-sidedness of the prior art and achieves a better text classification effect. The invention is widely applicable in the technical field of text classification.

Description

Technical field

[0001] The invention relates to the technical field of text classification, in particular to a text classification method and system based on a class-aware feature selection framework.

Background art

[0002] Text classification technology is widely used in practical application scenarios such as information retrieval, text mining, public opinion analysis, and spam identification. Most text classification technologies are implemented with classifiers, and the training set used to train a classifier contains hundreds of thousands of feature words, so feature extraction is an important part of text classification technology.

[0003] The purpose of feature extraction is to extract feature words that can better identify cluster categories. Existing feature extraction methods mostly extract the feature words that best identify cluster categories from a global perspective. Taking information gain as an example, its principle is to calculate the informati...
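The description of information gain is cut off in the text above, so the following is a sketch of the standard textbook definition rather than the patent's own wording: the class entropy minus the expected class entropy after splitting documents by whether they contain the term. The function names `entropy` and `information_gain` are illustrative.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (base 2) of a collection of class counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)


def information_gain(docs, labels, term):
    """IG(term) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)]."""
    n = len(labels)
    with_t = Counter(c for d, c in zip(docs, labels) if term in d)
    without_t = Counter(c for d, c in zip(docs, labels) if term not in d)
    n_with = sum(with_t.values())
    n_without = n - n_with
    conditional = (
        (n_with / n) * entropy(with_t.values())
        + (n_without / n) * entropy(without_t.values())
    )
    return entropy(Counter(labels).values()) - conditional


docs = [["goal", "team"], ["goal", "win"], ["vote", "team"], ["vote", "party"]]
labels = ["sport", "sport", "politics", "politics"]
print(information_gain(docs, labels, "goal"))
print(information_gain(docs, labels, "team"))
```

A score of this kind is global: it is one number per term across all classes, which is why ranking by it alone can leave small classes with few selected features.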


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/62
CPC: G06F18/24155, G06F18/24, G06F18/214
Inventor: 李霞, 刘汉锋
Owner: GUANGDONG UNIVERSITY OF FOREIGN STUDIES