Method and device for generating text characteristic vectors based on TF-IGM, method and device for classifying texts

This feature-vector and text-classification technique, applied in fields such as special data processing applications, instruments, and electronic digital data processing, addresses the problem that existing methods ignore the detailed distribution of feature words and therefore produce biased weights. It makes weight calculation more reasonable and effective, improves classification performance, and avoids large deviations in weight calculation.
CN104750844A · Active · Publication Date: 2015-07-01 · CENT SOUTH UNIV

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
CENT SOUTH UNIV
Publication Date
2015-07-01

Abstract

The invention discloses a TF-IGM-based method and device for generating text feature vectors, as well as a method and device for classifying texts. An inverse gravity moment (IGM) model is established to measure how concentrated each feature word's distribution is across the text classes, and the weight of the feature word is calculated on that basis. The resulting weights reflect the importance of feature words for text classification more realistically and therefore improve the performance of text classifiers. The TF-IGM-based feature vector generation device offers several options that can be tuned according to the results of text classification performance tests, so that it adapts to text data sets with different characteristics. Experiments on public English and Chinese corpora show that the TF-IGM method is superior to existing methods such as TF-IDF and TF-RF, and that it is particularly suited to multi-class text classification with more than two classes.
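The excerpt does not state the IGM formula, so the following is only a minimal sketch under an assumption: it uses the form commonly given in published descriptions of TF-IGM, where a term's class frequencies are sorted in descending order as f_1 >= f_2 >= ... >= f_m, igm = f_1 / (1·f_1 + 2·f_2 + ... + m·f_m), and the weight is tf · (1 + λ · igm). The function names `igm` and `tf_igm`, the parameter `lam`, and its default of 7.0 are illustrative choices, not taken from the patent text.

```python
def igm(class_freqs):
    """Concentration of a term across classes (assumed IGM form).

    class_freqs: the term's frequency in each class, in any order.
    Returns 1.0 when the term occurs in a single class only, smaller
    values as the term spreads over more classes, and 0.0 if the term
    never occurs.
    """
    ranked = sorted(class_freqs, reverse=True)  # rank 1 = most frequent class
    moment = sum(rank * f for rank, f in enumerate(ranked, start=1))
    if moment == 0:
        return 0.0
    return ranked[0] / moment


def tf_igm(tf, class_freqs, lam=7.0):
    """Assumed TF-IGM weight: term frequency scaled by (1 + lam * igm).

    lam is an adjustable coefficient; 7.0 is only an illustrative default.
    """
    return tf * (1.0 + lam * igm(class_freqs))


if __name__ == "__main__":
    # A term confined to one class versus a term spread evenly over three.
    print(tf_igm(2, [10, 0, 0]))
    print(tf_igm(2, [4, 4, 4]))
```

Under this assumed form, a word concentrated in one class receives a much larger weight than an evenly distributed word with the same term frequency, which matches the abstract's description of weighting by concentration.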

Description

Technical field

[0001] The invention belongs to the technical field of text mining and machine learning, and in particular relates to a TF-IGM-based text feature vector generation method and device, and a text classification method and device.

Background art

[0002] With the widespread use of computers and the continued growth of the Internet, the number of electronic text documents has increased dramatically, so effectively organizing, retrieving, and mining massive text data is becoming more and more important. Automatic text classification is one widely used technique. It typically represents text with the vector space model (VSM) and then classifies it with a supervised machine learning method. By extracting a certain number of feature words from the text and calculating their weights, the VSM represents the text as a vector of the feature words' weight values, called a feature vector. When generating text...
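The vector space representation described in paragraph [0002] can be illustrated with a short sketch. It assumes the text is already tokenized, that a fixed list of feature words has been selected, and that each feature word has a precomputed weight factor. The function name `feature_vector` and its parameters are hypothetical and do not come from the patent.

```python
from collections import Counter


def feature_vector(tokens, feature_words, term_weight):
    """Represent one text as a vector of feature-word weights (VSM sketch).

    tokens:        the words of the text, already tokenized (assumption)
    feature_words: ordered list of selected feature words
    term_weight:   dict mapping a feature word to its weight factor
    Each component is the word's count in the text times its weight factor;
    feature words absent from the text get 0.0.
    """
    counts = Counter(tokens)
    return [counts[w] * term_weight.get(w, 0.0) for w in feature_words]


if __name__ == "__main__":
    vec = feature_vector(
        ["cat", "dog", "cat"],
        ["cat", "fish"],
        {"cat": 2.0, "fish": 3.0},
    )
    print(vec)
```

Words outside the feature list (here "dog") are ignored, so every text maps to a vector of the same length, which is what a supervised classifier expects as input.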
