Method and device for generating text characteristic vectors based on TF-IGM, method and device for classifying texts
Patent Information
- Authority / Receiving Office
- CN · China
- Current Assignee / Owner
- CENT SOUTH UNIV
- Publication Date
- 2015-07-01
Description
Technical Field
[0001] The invention belongs to the technical field of text mining and machine learning, and in particular relates to a TF-IGM-based text feature vector generation method and device, and a text classification method and device.
Background Art
[0002] With the wide application of computers and the continuous growth of the Internet, the number of electronic text documents has increased dramatically, making it increasingly important to organize, retrieve, and mine massive volumes of text data effectively. Automatic text classification is one of the most widely used techniques for this purpose. It typically represents text with the vector space model (VSM) and then classifies it with a supervised machine learning method. The VSM extracts a certain number of feature words from the text, calculates a weight for each, and represents the text as a vector of these weight values, called a feature vector. When generating text...
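The VSM representation described in [0002] can be illustrated with a minimal sketch. It assumes the text is already tokenized (Chinese text would first need word segmentation) and uses raw term frequency with L2 normalization as the weight. This is only an illustrative weighting; it is not the TF-IGM scheme the invention proposes, and the function name `feature_vector` is hypothetical.

```python
from collections import Counter
import math


def feature_vector(tokens, feature_words):
    """Map a tokenized text to a VSM feature vector.

    Assumption (illustrative only): each weight is the raw term frequency
    of a feature word, and the vector is L2-normalized. The patent's own
    TF-IGM weighting is not reproduced here.
    """
    counts = Counter(tokens)
    # One dimension per feature word, in the fixed order of feature_words.
    weights = [float(counts[w]) for w in feature_words]
    norm = math.sqrt(sum(x * x for x in weights))
    # A text containing none of the feature words stays an all-zero vector.
    return [x / norm for x in weights] if norm > 0 else weights


features = ["text", "mining", "vector"]
doc = "text mining turns text into a vector".split()
print(feature_vector(doc, features))
```

Here "text" occurs twice and "mining" and "vector" once each, so the first component is the largest. Fixing the order of `feature_words` ensures that every document maps into the same vector space, which a supervised classifier requires.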