Overlapped-between-clusters-oriented method for classifying two types of texts

This text classification and classifier technology, applied in special-purpose data processing applications, instruments, and electronic digital data processing, addresses problems such as failing to carry over the useful information in the training samples and recognition errors.
CN101876987A · Inactive · Publication Date: 2010-11-03 · THE PLA INFORMATION ENG UNIV

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
THE PLA INFORMATION ENG UNIV
Publication Date
2010-11-03
Estimated Expiration
Not applicable · inactive patent

Figures (images not reproduced)

  • Figure 1
  • Figure 2
  • Figure 3

Abstract

The invention discloses a method, oriented to overlap between clusters, for classifying two types of texts. The method comprises the following steps: forming training sample vectors and identifying the training samples that lie in the overlap area between clusters; re-dividing the training sample vector sets and constructing a first-layer classifier on the newly divided sets; within each class of training samples in the overlap area, extracting as features the binary word strings (word bigrams) formed by adjacent words that are verbs or nouns, and constructing a second-layer classifier from these features; and finally, classifying test samples with the first-layer classifier, passing them to the second-layer classifier when the specified conditions are met, and merging the results of the two layers to obtain the final classification. The method applies to classifying texts with a high degree of overlap between clusters, to information filtering, and to information monitoring, and can ensure accurate classification of such texts.
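The two-layer scheme in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not say how overlap samples are detected, how the sets are re-divided, which classifiers are used, or which conditions trigger the second layer. Here, hypothetically, a training sample counts as overlapping if any of its `k` nearest neighbours (cosine similarity) has the other label; the training set is re-divided into clean positive, clean negative and overlap groups; both layers are nearest-centroid classifiers; and a test text goes to the second layer only when the first layer assigns it to the overlap group. Texts are assumed to arrive already part-of-speech tagged, with tags beginning with `N` (noun) or `V` (verb). All function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical input format: each text is pre-tagged as (word, tag) pairs,
# with tags starting "N" for nouns and "V" for verbs (assumed convention).
Doc = list[tuple[str, str]]


def unigram_vector(doc: Doc) -> Counter:
    """First-layer features: word counts (assumed representation)."""
    return Counter(word for word, _ in doc)


def verb_noun_bigrams(doc: Doc) -> Counter:
    """Second-layer features: bigrams of adjacent words that are both verbs or nouns."""
    feats: Counter = Counter()
    for (w1, t1), (w2, t2) in zip(doc, doc[1:]):
        if t1[:1] in ("N", "V") and t2[:1] in ("N", "V"):
            feats[f"{w1}_{w2}"] += 1
    return feats


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[k] for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: list[Counter]) -> Counter:
    total: Counter = Counter()
    for v in vectors:
        total.update(v)
    return Counter({k: c / len(vectors) for k, c in total.items()})


def in_overlap(i: int, vecs: list[Counter], labels: list[str], k: int) -> bool:
    """Assumed criterion: sample i is in the overlap area if any of its k
    nearest neighbours (by cosine similarity) carries the other label."""
    sims = sorted(
        ((cosine(vecs[i], vecs[j]), labels[j]) for j in range(len(vecs)) if j != i),
        reverse=True,
    )
    return any(lab != labels[i] for _, lab in sims[:k])


def train(docs: list[Doc], labels: list[str], k: int = 3):
    """labels must be "pos" or "neg"; returns (layer1, layer2) centroid tables."""
    vecs = [unigram_vector(d) for d in docs]
    overlap = [in_overlap(i, vecs, labels, k) for i in range(len(docs))]
    # Assumed re-division: clean positives, clean negatives, and the overlap area.
    groups: dict[str, list[Counter]] = {"pos": [], "neg": [], "overlap": []}
    for v, lab, ov in zip(vecs, labels, overlap):
        groups["overlap" if ov else lab].append(v)
    layer1 = {g: centroid(vs) for g, vs in groups.items() if vs}
    # Second layer: per-class bigram centroids built only from overlap-area samples.
    layer2 = {}
    for lab in ("pos", "neg"):
        bigrams = [verb_noun_bigrams(d) for d, l, ov in zip(docs, labels, overlap) if ov and l == lab]
        if bigrams:
            layer2[lab] = centroid(bigrams)
    return layer1, layer2


def classify(doc: Doc, layer1: dict, layer2: dict) -> str:
    v = unigram_vector(doc)
    first = max(layer1, key=lambda g: cosine(v, layer1[g]))
    if first != "overlap":
        return first  # a first-layer decision outside the overlap area is kept
    if layer2:
        # Assumed merge rule: the second layer decides texts routed to the overlap area.
        b = verb_noun_bigrams(doc)
        return max(layer2, key=lambda lab: cosine(b, layer2[lab]))
    # Fallback when no overlap samples were available for layer 2: best clean class.
    return max((g for g in layer1 if g != "overlap"), key=lambda g: cosine(v, layer1[g]))
```

Typical use is `layer1, layer2 = train(tagged_docs, labels, k=3)` followed by `classify(tagged_text, layer1, layer2)`. Nearest-centroid classifiers were chosen only to keep the sketch short and dependency-free; any two-class learner could stand in for either layer.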

Description

Technical Field

[0001] The invention relates to the technical field of text information analysis and processing, and in particular to a two-class text classification method oriented to class overlap.

Background Art

[0002] With the popularization and rapid development of the Internet, large amounts of text data, the main form of network data, have emerged, and text classification has become an effective way to organize and manage such massive data. Text classification establishes a mapping between the set of samples to be classified and a pre-specified set of categories. Depending on the number of pre-specified categories, it is divided into two-class and multi-class classification. Two-class classification separates positive from negative classes and usually requires a manually labeled training set containing both positive and negative samples. On this basis, the classifier learns, adjusts its parameters, and establishes a ...
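To make the training-set requirement concrete: one common base learner for such a two-class task is multinomial naive Bayes with add-one (Laplace) smoothing, which learns per-class word frequencies from the labeled positive and negative samples. The sketch below is purely illustrative; the excerpt does not name the classifier the invention uses, and the class name `TwoClassNB` is hypothetical.

```python
import math
from collections import Counter


class TwoClassNB:
    """Hypothetical multinomial naive Bayes for a positive/negative text task."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "TwoClassNB":
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in d}
        self.prior: dict[str, float] = {}
        self.counts: dict[str, Counter] = {}
        self.totals: dict[str, int] = {}
        for c in self.classes:
            class_docs = [d for d, l in zip(docs, labels) if l == c]
            # Log prior: share of labeled training samples in class c.
            self.prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(w for d in class_docs for w in d)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def log_score(self, doc: list[str], c: str) -> float:
        v = len(self.vocab)
        score = self.prior[c]
        for w in doc:
            if w in self.vocab:  # words never seen in training are ignored
                # Add-one smoothing keeps unseen (word, class) pairs from zeroing the score.
                score += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
        return score

    def predict(self, doc: list[str]) -> str:
        return max(self.classes, key=lambda c: self.log_score(doc, c))
```

For example, `TwoClassNB().fit(token_lists, labels).predict(tokens)` returns whichever label has the higher log posterior score.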

Claims
