Text training method and text classifying method

A text training method and a text classification method, applied in the field of text training and classification, address the long training and classification times, unsatisfactory accuracy, and difficulty in meeting the needs of large-scale text classification, so as to achieve short training and classification times and improved classification precision.

Inactive Publication Date: 2010-06-09
INST OF COMPUTING TECHNOLOGY - CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

However, their training and classification times are very long, which makes it difficult to meet the needs of large-scale text classification.
By contrast, the center classification method, Bayesian, Rocchio, and Winnow are typical linear classifiers, whose training and classification times are linearly related to the sc


Example Embodiment

[0025] In order to make the objectives, technical solutions, and advantages of the present invention clearer, the text classification method according to an embodiment of the present invention will be further described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only used to explain the present invention, but not to limit the present invention.

[0026] The present invention aims to improve classification accuracy by modifying (that is, optimizing) the center vectors of the categories while keeping the time complexity of the center classification method unchanged.

[0027] The idea behind the center classification method is simple. A center vector representing each category is generated as the arithmetic average of that category's text set. When a new text arrives, the vector of the new text is determined, and the vector between the vec...
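As an illustration of the center classification idea in [0027], the following is a minimal Python sketch, not the patent's implementation. The paragraph is truncated before it names the comparison between the new text's vector and the center vectors, so the use of cosine similarity here is an assumption. The function names (`center_vector`, `train_centers`, `classify`) and the plain-list vector representation are hypothetical.

```python
import math
from collections import defaultdict


def center_vector(vectors):
    """Arithmetic average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity (assumed measure; the source text is truncated here)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def train_centers(samples):
    """Build one center vector per category from (vector, label) pairs."""
    by_label = defaultdict(list)
    for vec, label in samples:
        by_label[label].append(vec)
    return {label: center_vector(vecs) for label, vecs in by_label.items()}


def classify(vec, centers):
    """Assign the category whose center vector is most similar to vec."""
    return max(centers, key=lambda label: cosine(vec, centers[label]))
```

Training touches each sample once and classification compares a text against one vector per category, which is consistent with the linear cost the document attributes to this family of classifiers.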


Abstract

The invention provides a text training method comprising the following steps: 1) computing a center vector for each category of the training sample set; 2) classifying the samples in the training sample set according to the center vectors; and 3) for each incorrectly classified sample, correcting the center vector of category A, to which the sample actually belongs, and/or the center vector of category B, to which the sample was wrongly assigned, according to a set drag weight and a set push weight. Text can then be classified quickly and with high precision using the center vectors obtained by this training method.
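The three steps of the abstract can be sketched in Python as follows. This is one possible reading under stated assumptions, not the claimed method: the abstract does not give the correction formula, so the additive update (add `drag_weight` times the sample to the center of category A, subtract `push_weight` times the sample from the center of category B) is an assumption, as are the cosine similarity, the batch-style rounds, the `max_rounds` stopping rule, and the default weight values. All function and parameter names are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity (assumed; the abstract names no measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(vec, centers):
    """Assign the category whose center vector is most similar to vec."""
    return max(centers, key=lambda label: cosine(vec, centers[label]))


def initial_centers(samples):
    """Step 1: arithmetic-average center vector per category."""
    by_label = defaultdict(list)
    for vec, label in samples:
        by_label[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in by_label.items()
    }


def drag_push_train(samples, drag_weight=0.5, push_weight=0.5, max_rounds=10):
    """Steps 1-3, repeated until no training sample is misclassified
    or max_rounds is reached (the stopping rule is an assumption)."""
    centers = initial_centers(samples)
    for _ in range(max_rounds):
        # Step 2: classify every training sample with the current centers.
        mistakes = []
        for vec, true_label in samples:
            predicted = classify(vec, centers)
            if predicted != true_label:
                mistakes.append((vec, true_label, predicted))
        if not mistakes:
            break
        # Step 3 (assumed update form): drag the center of the true
        # category A toward the sample and push the center of the wrongly
        # chosen category B away from it.
        for vec, a, b in mistakes:
            centers[a] = [c + drag_weight * x for c, x in zip(centers[a], vec)]
            centers[b] = [c - push_weight * x for c, x in zip(centers[b], vec)]
    return centers
```

Each round costs one pass over the training samples, so a fixed number of rounds keeps training linear in the number of samples, and classification still compares a text against one center vector per category.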

Description

Technical field

[0001] The invention relates to the field of pattern recognition, and in particular to a text training method and a text classification method.

Background

[0002] With the rapid development of Internet technology, the volume of online text has grown exponentially. Faced with this vast amount of text data, people urgently need an efficient tool that can automatically process and organize large-scale text. Automatic text classification sorts large amounts of text by content, helping people process and organize text data effectively. The research and development of high-performance text classification methods has therefore become a hot topic in the field of information retrieval. Many classic machine learning methods have been introduced into text classification. The more commonly used methods are the center classification method, Rocchio, k-nearest neighbor (KNN), Winnow, naive Bayes (NB), support vector machine...


Application Information

IPC(8): G06F17/30, G06K9/62
Inventor 谭松波, 许洪波, 程学旗
Owner INST OF COMPUTING TECHNOLOGY - CHINESE ACAD OF SCI