Overlapped-between-clusters-oriented method for classifying two types of texts

A text classification technology, applied in special data processing applications, instruments, electronic digital data processing, etc., that addresses problems such as failing to inherit the effective information of training samples and recognition errors.

Inactive Publication Date: 2010-11-03
THE PLA INFORMATION ENG UNIV
0 Cites, 22 Cited by

AI Technical Summary

Problems solved by technology

To improve classifier performance when classes overlap, current methods mainly remove the "noise" samples from the training sample set. These ...

Examples

Embodiment Construction

[0019] From the perspective of information granularity, samples in a coarse-grained world differ little from one another, so the objects they describe are understood ambiguously. In text classification, this means that a training sample set in the coarse-grained world provides insufficient prior knowledge for classification, so the constructed classification decisions are ambiguous and the classification results contain errors. If a test sample lies in the overlapping area between classes, that is, its class is not obvious, it is difficult to identify it accurately without prior knowledge. In the present invention, the training sample set is re-divided and converted to a fine-grained world, which increases the difference between samples and the prior knowledge available for classification, reduces the ambiguity of the classification decisions, and improves the accuracy of the classifier. ...
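The re-division described in [0019] can be illustrated with a minimal Python sketch. The source text is truncated and does not state how overlap samples are identified, so the criterion below is an assumption made only for illustration: a sample is treated as lying in the overlap area when its cosine similarities to the two class centroids differ by less than a hypothetical `margin`. The function names, the label strings, and the `_overlap` suffix are likewise hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def redivide(samples, margin=0.1):
    """Relabel two-class samples into up to four fine-grained classes.

    samples: list of (vector, label) pairs, label in {"pos", "neg"}.
    ASSUMPTION (not from the patent text): a sample is in the overlap
    area when it is almost equally similar to both class centroids.
    """
    pos_c = centroid([v for v, y in samples if y == "pos"])
    neg_c = centroid([v for v, y in samples if y == "neg"])
    fine = []
    for v, y in samples:
        gap = abs(cosine(v, pos_c) - cosine(v, neg_c))
        fine.append(f"{y}_overlap" if gap < margin else y)
    return fine
```

With this relabeling, a coarse two-class training set becomes a finer-grained set in which the ambiguous samples form their own classes rather than being discarded as noise.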

Abstract

The invention discloses a two-class text classification method oriented to overlap between classes. The method comprises the following steps: forming training sample vectors and identifying, for each training sample, whether it lies in the overlapping area between classes; re-dividing the training sample vector set and constructing a first-layer classifier on the newly divided set; within each class of training samples in the overlapping area, extracting as features the two-word strings (bigrams) formed by adjacent words that are verbs or nouns, and constructing a second-layer classifier on them; and finally, performing first-layer classification on each test sample, passing it to the second-layer classifier if the conditions are met, and merging the results of the two layers to obtain the final classification result. The method applies to the classification of texts with a high degree of overlap between classes, to information filtering, and to information monitoring, and can ensure accurate classification of such texts.
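The feature extraction and the two-layer decision in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the abstract does not say which conditions send a test sample to the second layer, so the sketch assumes the second layer is used when the first layer predicts an overlap class; the part-of-speech tags `"n"` and `"v"`, the `_overlap` label suffix, and the callables `layer1` and `layer2` are hypothetical stand-ins for trained classifiers.

```python
def vn_bigrams(tagged):
    """Return adjacent word pairs in which both words are nouns or verbs.

    tagged: list of (word, pos_tag) pairs; the tags "n" (noun) and
    "v" (verb) are an assumed tagging convention.
    """
    keep = {"n", "v"}
    return [
        (w1, w2)
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
        if t1 in keep and t2 in keep
    ]


def classify(tagged, layer1, layer2):
    """Two-layer classification of one POS-tagged test sample.

    layer1(tagged) -> fine-grained label such as "pos" or "neg_overlap".
    layer2(bigrams) -> "pos" or "neg".
    ASSUMPTION: the second layer is consulted only when the first
    layer places the sample in an overlap class.
    """
    fine = layer1(tagged)
    if fine.endswith("_overlap"):
        return layer2(vn_bigrams(tagged))
    return fine
```

Passing the classifiers in as callables keeps the sketch independent of any particular learning algorithm, which the visible text does not specify.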

Description

Technical field

[0001] The invention relates to the technical field of text information analysis and processing, in particular to a two-class text classification method oriented to class overlap.

Background

[0002] With the popularization and rapid development of the Internet, a large amount of text data, the main form of network data, has emerged, and text classification has become an effective way to organize and manage massive data. Text classification establishes a mapping between the set of samples to be classified and a pre-specified set of categories. According to the number of pre-specified categories, it is divided into two-class classification and multi-class classification. Two-class classification distinguishes a positive class from a negative class and usually requires a manually labeled training set containing positive and negative samples. On this basis, the classifier learns, adjusts parameters, and establishes a ...

Application Information

IPC(8): G06F17/30
Inventor: 李弼程, 林琛, 陈刚, 席耀一, 郭志刚
Owner: THE PLA INFORMATION ENG UNIV