Text classification method based on block partition and position weight

A text classification and block partition technique, applied in special data processing applications, instruments, and electrical digital data processing. It addresses the problems that existing text representations rely on a single mode of expression, cannot retain text information to the greatest extent, and make it difficult to find an effective way to adjust feature word weights.

Status: Inactive
Publication Date: 2011-04-27
BEIJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

The former uses a single mode of expression, so it cannot retain the text information to the greatest extent. The latter carries richer information, but texts usually differ in size and sentence lengths vary widely, so sentence-segmentation analysis adapts poorly and consumes large amounts of storage. In addition, it is not easy to find an effective way to adjust feature word weights.



Embodiment Construction

[0040] The technical scheme of the present invention will be further explained below with reference to the drawings and embodiments.

[0041] In manual classification, when people judge which category a text belongs to, or work out the information it conveys, they often do not need to read the full text to determine its subject and reach a correct judgment. Instead, they extract characteristic words that reflect the text's category from specific locations in the text. Moreover, the importance of a feature word is not determined by its frequency in the text alone. A text that clearly belongs to one or more categories usually follows the normative style of expression of its field, so the same feature word carries different amounts of information depending on its position in the text. In other words, the amount of information conveyed by a text should at least include the information carried by the characteristic words th...
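The idea that the same feature word carries more or less information depending on where it appears can be illustrated with a position-weighted term count. The following is a minimal sketch, not the patent's actual weighting scheme: the function name `position_weighted_counts`, the example blocks, and the weight values are all hypothetical, chosen only to show how a per-block weight changes a word's contribution.

```python
from collections import Counter


def position_weighted_counts(blocks, block_weights):
    """Sum each token's occurrences, scaled by the weight of its block.

    blocks        -- list of token lists, one per text block, in reading order
    block_weights -- one weight per block (hypothetical values, not from the patent)
    """
    if len(blocks) != len(block_weights):
        raise ValueError("need exactly one weight per block")
    scores = Counter()
    for tokens, weight in zip(blocks, block_weights):
        for token in tokens:
            # An occurrence in a heavily weighted block counts for more.
            scores[token] += weight
    return dict(scores)


# Assumed example: the opening and closing blocks are weighted above the middle.
blocks = [
    ["stock", "market", "falls"],
    ["analysts", "discuss", "weather", "stock"],
    ["market", "outlook"],
]
weights = [2.0, 1.0, 1.5]
scores = position_weighted_counts(blocks, weights)
```

With these assumed weights, "stock" and "market" each occur twice, yet "market" scores higher because both of its occurrences fall in the more heavily weighted blocks. A plain frequency count would treat the two words identically.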


Abstract

The invention discloses a text classification method based on block partition and position weight. The method comprises the following steps: after basic preprocessing of an input training or test text, extracting paragraph information from the text; taking one paragraph as the basic text block and performing statistical analysis on the block information; partitioning the text content into blocks again according to the block size distribution or a predefined block ratio, which includes operations such as merging text blocks; extracting feature words and their quantized weights, obtaining the posterior probability of each feature word with respect to the classes, analyzing the distribution of those feature words whose highest-posterior class matches the text's class label, and generating a text vector; and completing classification model training or text classification with a classifier. The method can be applied in the text representation stage of a text classification system. Representing the text content in a rich yet conventional form when the text vector is built from the feature words improves the classification effect.
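The repartitioning step described in the abstract (paragraphs as basic blocks, then merged according to a predefined block ratio) can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: block size is measured in characters, a paragraph is assigned to the block that contains its midpoint, and the names `split_paragraphs` and `partition_by_ratio` are hypothetical.

```python
def split_paragraphs(text):
    """Treat each non-empty line as one paragraph (the basic text block)."""
    return [line.strip() for line in text.split("\n") if line.strip()]


def partition_by_ratio(paragraphs, ratios):
    """Merge consecutive paragraphs into len(ratios) blocks.

    Each block's target share of the total character count follows `ratios`.
    A paragraph goes to the block containing its midpoint (an assumed rule).
    """
    total = sum(len(p) for p in paragraphs)
    ratio_sum = sum(ratios)

    # Upper character boundary of every block, e.g. 1:3:1 over 20 chars -> 4, 16, 20.
    bounds = []
    cumulative = 0.0
    for ratio in ratios:
        cumulative += ratio / ratio_sum
        bounds.append(cumulative * total)

    blocks = [[] for _ in ratios]
    seen = 0   # characters consumed so far
    index = 0  # current block
    for paragraph in paragraphs:
        midpoint = seen + len(paragraph) / 2
        # Advance to the block whose range contains this paragraph's midpoint.
        while index < len(ratios) - 1 and midpoint > bounds[index]:
            index += 1
        blocks[index].append(paragraph)
        seen += len(paragraph)
    return blocks


text = "aaaa\nbbbb\n\ncccc\ndddd\neeee"
blocks = partition_by_ratio(split_paragraphs(text), [1, 3, 1])
```

With the assumed 1:3:1 ratio, the five equal-length paragraphs are merged into a short head block, a long body block of three paragraphs, and a short tail block. Each block could then receive its own position weight before feature words are counted.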

Description

Technical field

[0001] The invention relates to a text classification method based on block partition and position weight, and belongs to the field of electronic text organization and classification.

Background art

[0002] Text classification methods are usually evaluated from two perspectives: classification performance, such as accuracy, recall, and F1 value, and implementation performance, such as time efficiency and storage efficiency. The process of text classification includes the steps of text preprocessing, text representation, classifier training, and classification application. Text representation in turn covers feature selection, feature weight requantization, and feature dimensionality reduction.

[0003] Conventional text classification methods mostly focus on the selection and improvement of classifiers, mainly based on support vector machines, K nearest neighbors, Bayesian ...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 周亚建, 平源, 杨义先, 彭维平, 刘念
Owner: BEIJING UNIV OF POSTS & TELECOMM