Text classification feature selection approach with importance weighting

A feature selection method for text classification, applicable to text database clustering/classification, unstructured text data retrieval, special data processing applications, etc.

Active Publication Date: 2017-05-03
上海利连信息科技有限公司
4 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

When computing statistics that measure a feature's class-discriminating ability, existing methods do not distinguish how well each feature represents the sample (i.e., its importance).

Embodiment Construction

[0043] When calculating statistics for feature selection, existing methods usually ignore differences in the importance of each candidate feature within a text and treat all occurrences equally. This inevitably introduces noise and impairs accurate determination of the class-discriminating ability of the candidate features. In response to this problem, the present invention proposes an importance-weighted feature selection strategy for text classification. Experiments on multiple text classification problems show that, compared with previous methods that do not consider feature importance, the strategy of the present invention improves how well various statistics determine the class-discriminating ability of features, and thereby improves the effectiveness of feature selection.

[0044] The principles and preferred embodiments of the present invention will be described in detail below.

[0045] To calculate a feature t for a certain cat...

Abstract

The invention discloses an importance-weighted feature selection method for text classification. The method comprises the following steps: first, counting the occurrence data of each candidate feature in each class, taking into account during counting the degree to which the candidate feature represents the semantics of the text, namely its importance; second, using the data obtained in the first step and a relevance statistic formula to compute the discriminating ability of each candidate feature for each class; and third, aggregating each candidate feature's discriminating ability over all classes, ranking all candidate features by this total discriminating ability, and outputting the ranked feature list.
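The three steps in the abstract can be illustrated with a minimal Python sketch. Several choices here are assumptions and are not taken from the patent text shown on this page: importance is approximated by a term's relative frequency within a document, the relevance statistic is chi-square, and the per-class scores are aggregated by summation. The function names `weighted_counts`, `chi_square`, and `rank_features` are hypothetical.

```python
from collections import Counter, defaultdict

def weighted_counts(docs, labels):
    """Step 1: per class, accumulate the importance-weighted occurrence of each term.

    ASSUMPTION: importance = relative frequency of the term in the document,
    so each document contributes a weight in (0, 1] instead of a flat 1.
    """
    class_weight = defaultdict(lambda: defaultdict(float))  # class -> term -> weight
    class_size = Counter(labels)                            # class -> number of docs
    for tokens, label in zip(docs, labels):
        for term, count in Counter(tokens).items():
            class_weight[label][term] += count / len(tokens)
    return class_weight, class_size

def chi_square(a, b, c, d):
    """Step 2 (ASSUMED statistic): chi-square of a 2x2 table with fractional counts."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def rank_features(docs, labels):
    """Step 3: sum each term's per-class score (ASSUMED aggregation) and rank."""
    class_weight, class_size = weighted_counts(docs, labels)
    n_docs = len(docs)
    vocab = {t for per_class in class_weight.values() for t in per_class}
    scores = {}
    for term in vocab:
        total = sum(cw.get(term, 0.0) for cw in class_weight.values())
        score = 0.0
        for cls, size in class_size.items():
            a = class_weight[cls].get(term, 0.0)  # weighted presence inside the class
            b = total - a                         # weighted presence outside the class
            c = size - a                          # weighted absence inside the class
            d = (n_docs - size) - b               # weighted absence outside the class
            score += chi_square(a, b, c, d)
        scores[term] = score
    return sorted(scores, key=scores.get, reverse=True)

docs = [
    ["goal", "match", "team"],
    ["goal", "team", "win"],
    ["stock", "market", "team"],
    ["stock", "price", "market"],
]
labels = ["sport", "sport", "finance", "finance"]
ranked = rank_features(docs, labels)
print(ranked)
```

In this toy corpus, "team" appears in both classes and ends up last in the ranking, while terms confined to one class score higher. Replacing `count / len(tokens)` with a constant `1.0` gives the unweighted document-count baseline that the description contrasts against.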

Description

Technical field

[0001] The invention relates to the technical field of text mining and machine learning, and in particular to an importance-weighted feature selection method for text classification.

Background

[0002] Text classification is a special class of machine learning problem. The usual practice is to use the vector space model to represent each text as a point in a multi-dimensional feature space, and then apply machine learning algorithms for learning and discrimination. In a text classification problem there are usually thousands of candidate features available to define this semantic space. However, features differ greatly in their ability to distinguish categories. To obtain good classification accuracy and high processing efficiency, feature selection is usually needed to pick a smaller, more effective subset of features from the candidate feature set. ...
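As a rough illustration of the vector space model mentioned in the background, the Python sketch below maps tokenized texts to term-frequency vectors and then keeps only a chosen subset of feature columns. The helper names `build_vocabulary`, `to_vector`, and `project` are hypothetical, and raw term frequency is only one possible choice of vector entry.

```python
from collections import Counter

def build_vocabulary(docs):
    """Map each distinct term to a column index (sorted for reproducibility)."""
    terms = sorted({term for tokens in docs for term in tokens})
    return {term: index for index, term in enumerate(terms)}

def to_vector(tokens, vocab):
    """Represent one document as a term-frequency vector over the vocabulary."""
    counts = Counter(tokens)
    return [counts.get(term, 0) for term in vocab]

def project(vector, vocab, selected):
    """Keep only the columns of the selected features (the reduced feature space)."""
    return [vector[vocab[term]] for term in selected]

docs = [["goal", "team", "goal"], ["stock", "market", "team"]]
vocab = build_vocabulary(docs)
vectors = [to_vector(tokens, vocab) for tokens in docs]
reduced = [project(v, vocab, ["goal", "stock"]) for v in vectors]
print(vectors)
print(reduced)
```

Feature selection in this picture amounts to choosing which columns survive the `project` step; a real vocabulary would have thousands of columns rather than four.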

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/35
Inventor: 李保利
Owner: 上海利连信息科技有限公司