Chinese text classification method

A text classification and word segmentation technique, applied in semantic analysis, special-purpose data processing, instruments, and similar fields. It addresses problems such as interference in classification results, declining accuracy, and failure to meet practical requirements, and achieves good classification results.

Inactive Publication Date: 2018-09-07
SUZHOU CHUNQING INTELLIGENT TECH CO LTD

AI Technical Summary

Problems solved by technology

However, in public security alarm information the gap between categories is very small, and the subject of a document may be expressed by only one or two keywords, so interference from other noise words has a pronounced effect on the classification results.
In particular, with the improvement of the fineness o...


Examples


Detailed Description of the Embodiments

[0026] The present invention will be further described below in conjunction with the accompanying drawings.

[0027] As shown in Figure 1, a Chinese text classification method includes the following steps:

[0028] (1) Text preprocessing, including corpus selection, text word segmentation, word frequency statistics and text representation;

[0029] (2) Feature representation and feature extraction

[0030] The feature representation of a text is the model of that text. The vector space model is used to reduce each text to a vector whose components are the weights of its feature items;
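The vector representation described in [0030] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source does not state which weighting scheme is used, so relative term frequency is an assumption, and the function names `build_vocabulary` and `to_vector` and the sample tokens are hypothetical. The input is assumed to be already segmented into words; English tokens stand in for segmented Chinese words.

```python
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Map each distinct term to a fixed vector index (sorted for determinism)."""
    terms = sorted({term for doc in tokenized_docs for term in doc})
    return {term: index for index, term in enumerate(terms)}


def to_vector(tokens, vocabulary):
    """Represent one document as a vector of relative term frequencies.

    Assumption: the weight of a feature item is its count divided by the
    number of in-vocabulary tokens in the document.
    """
    counts = Counter(t for t in tokens if t in vocabulary)
    total = sum(counts.values())
    vector = [0.0] * len(vocabulary)
    if total == 0:
        return vector  # empty or fully out-of-vocabulary document
    for term, count in counts.items():
        vector[vocabulary[term]] = count / total
    return vector


# Hypothetical, already-segmented documents.
docs = [
    ["wallet", "stolen", "wallet"],
    ["vehicle", "stolen"],
]
vocab = build_vocabulary(docs)
vectors = [to_vector(doc, vocab) for doc in docs]
```

Each document becomes a fixed-length vector indexed by vocabulary term, which is the form a classifier can consume directly.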

[0031] Feature extraction removes words that carry no useful information, which improves classification efficiency and reduces computational complexity. This method uses information gain, a concept from information theory. It indicates, for a feature that appears or does not appear in a text, its contribution to determining the type of the text: the size of the amount o...
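As a rough illustration of information-gain feature selection, the sketch below uses the standard textbook definition: the entropy of the class labels minus the expected entropy after splitting documents on whether a term is present. The source text is truncated, so whether the patent uses exactly this formula is an assumption; the names `entropy`, `information_gain`, and `select_features` and the sample data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Class entropy minus the entropy left after splitting on term presence.

    Assumes `docs` is non-empty and aligned with `labels`.
    """
    present = [lab for doc, lab in zip(docs, labels) if term in doc]
    absent = [lab for doc, lab in zip(docs, labels) if term not in doc]
    n = len(labels)
    conditional = ((len(present) / n) * entropy(present)
                   + (len(absent) / n) * entropy(absent))
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Keep the k terms with the highest information gain."""
    vocabulary = sorted({term for doc in docs for term in doc})
    ranked = sorted(vocabulary,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


# Hypothetical segmented documents with two categories.
docs = [["wallet", "stolen"], ["phone", "stolen"],
        ["car", "crash"], ["truck", "crash"]]
labels = ["theft", "theft", "traffic", "traffic"]
top_terms = select_features(docs, labels, k=2)
```

In this toy data, "stolen" and "crash" each separate the two categories perfectly, so they score higher than terms that occur in a single document.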


Abstract

The invention provides a Chinese text classification method comprising the following steps: (1) text preprocessing, (2) feature representation and feature extraction, (3) classifier design, and (4) performance indicators. The method employs a new RBF neural network algorithm: K-means is used to derive the center points and widths of the hidden units, which use Gaussian radial basis functions, and the outputs of the hidden layer are combined to obtain the classification result. The algorithm's accuracy rate, recall rate, and F-measure are very high, and the classification effect is good.
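The general shape of an RBF network whose hidden units are placed by K-means can be sketched as below. This is a generic textbook construction, not the patent's "new" algorithm, whose details are not given in this text. Several choices are assumptions made for illustration: farthest-point initialization of K-means, a single shared width derived from the largest distance between centers, and an output layer trained with a simple delta rule. All function names and the toy data are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain K-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points,
                       key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def hidden_layer(x, centers, width):
    """Gaussian radial basis activations, one per center."""
    return [math.exp(-sq_dist(x, c) / (2 * width ** 2)) for c in centers]


def train_rbf(points, labels, n_classes, k, epochs=200, lr=0.3):
    """Fit centers with K-means, then train linear output weights."""
    centers = kmeans(points, k)
    # Assumed heuristic: shared width = max center distance / sqrt(2k).
    d_max = max(math.sqrt(sq_dist(a, b)) for a in centers for b in centers)
    width = d_max / math.sqrt(2 * k) or 1.0
    weights = [[0.0] * (k + 1) for _ in range(n_classes)]  # last entry is bias
    for _ in range(epochs):
        for x, y in zip(points, labels):
            h = hidden_layer(x, centers, width) + [1.0]
            for c in range(n_classes):
                out = sum(w * v for w, v in zip(weights[c], h))
                err = (1.0 if c == y else 0.0) - out  # one-hot target
                weights[c] = [w + lr * err * v for w, v in zip(weights[c], h)]
    return centers, width, weights


def predict(x, model):
    """Combine hidden-layer outputs linearly and return the best class index."""
    centers, width, weights = model
    h = hidden_layer(x, centers, width) + [1.0]
    scores = [sum(w * v for w, v in zip(row, h)) for row in weights]
    return scores.index(max(scores))


# Toy two-class data standing in for document feature vectors.
points = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
          [1.0, 0.9], [0.9, 1.0], [1.0, 1.0]]
labels = [0, 0, 0, 1, 1, 1]
model = train_rbf(points, labels, n_classes=2, k=2)
```

Here `predict([0.05, 0.05], model)` falls near the first cluster and `predict([0.95, 0.95], model)` near the second. In practice the inputs would be the selected feature weights of each document rather than two-dimensional toy points.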

Description

Technical field

[0001] The invention relates to the technical field of data collection, in particular to a Chinese text classification method.

Background technique

[0002] Text classification generally includes the processes of text representation, classifier selection and training, and evaluation of and feedback on classification results. Text representation can be further divided into text preprocessing, indexing and statistics, feature extraction, and other steps.

[0003] Traditional text classification methods usually classify long documents with obvious differences between categories, such as web content classification (sports, news, finance, military, etc.). However, in some specific fields that involve short documents, such as automatic classification of police information for public security and sentiment analysis of Weibo posts, the gap between categories is very small. The higher the requirement for the fineness of text categories, the more accurate the classification becomes Lo...


Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F40/30
Inventor: 姚国平
Owner: SUZHOU CHUNQING INTELLIGENT TECH CO LTD