Two-stage text feature selection method under unbalanced data set
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- SHANDONG UNIV OF SCI & TECH
- Publication Date
- 2020-05-12
Abstract
Description
Technical field
[0001] The invention belongs to the field of text feature selection in natural language processing, and in particular relates to a two-stage text feature selection method for unbalanced data sets.
Background art
[0002] Text classification is the process by which a computer automatically assigns a given text to one or more predefined categories. It consists of five main steps: obtaining the training set, text preprocessing, feature extraction, document representation, and classification. A typical data set can yield tens of thousands of features after preprocessing, and a large data set can yield millions. High-dimensional feature spaces not only increase computation time but can also reduce classification accuracy. Effective feature extraction reduces the feature dimension and improves classification accuracy, so feature extraction is o...
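The preprocessing, feature extraction, and document representation steps named above can be illustrated with a minimal sketch. The toy corpus, the function names, and the document-frequency ranking used to pick features are all assumptions made for illustration; this is a generic baseline, not the two-stage method of the invention.

```python
from collections import Counter

def preprocess(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]

def select_features(docs, k):
    """Keep the k terms with the highest document frequency.

    Document frequency is a placeholder scoring rule (an assumption);
    ties are broken alphabetically so the result is deterministic.
    """
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]

def vectorize(tokens, features):
    """Represent one document as term counts over the selected features."""
    counts = Counter(tokens)
    return [counts[f] for f in features]

# Hypothetical four-document corpus, used only for illustration.
corpus = [
    "The match ended in a draw.",
    "The team won the match.",
    "Stocks fell as the market closed.",
    "The market rallied and stocks rose.",
]

docs = [preprocess(text) for text in corpus]
vocabulary = sorted({tok for tokens in docs for tok in tokens})
features = select_features(docs, k=4)
vectors = [vectorize(tokens, features) for tokens in docs]

print(len(vocabulary), "terms before selection,", len(features), "after")
print(features)
print(vectors)
```

In this toy run the stop word "the" ranks first because it appears in every document, which suggests why a plain frequency criterion is a weak selector and why more discriminative scoring is usually preferred.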