
Text training method and text classifying method

A text training method and a text classification method, applied in the field of text training and classification. They address the long training and classification times, unsatisfactory accuracy, and difficulty in meeting the needs of large-scale text classification found in existing methods, achieving short training and classification times and improved classification accuracy.

Status: Inactive
Publication Date: 2010-06-09
INST OF COMPUTING TECHNOLOGY - CHINESE ACAD OF SCI
Cites: 0; Cited by: 5

AI Technical Summary

Problems solved by technology

Some existing classifiers have very long training and classification times, which makes it difficult for them to meet the needs of large-scale text classification.
By contrast, the center classification method, Bayesian, Rocchio, and Winnow are typical linear classifiers whose training and classification times are linearly related to the scale of the problem, so they can meet the needs of large-scale text classification in a timely manner, but their accuracy is often less than ideal.
Overall, therefore, current classification methods still struggle to meet the requirements of high-performance text classification.


Embodiment Construction

[0025] In order to make the purpose, technical solution and advantages of the present invention clearer, the text classification method according to an embodiment of the present invention will be further described in detail below in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0026] The present invention attempts to improve the classification accuracy of the center classification method by modifying (or optimizing) the center vectors of the categories while keeping the method's time complexity unchanged.

[0027] The idea behind the center classification method is simple. For each class, a center vector representing the class is generated as the arithmetic mean of that class's text set. When a new text arrives, the vector of the new text is determined, and the relationship between the vector and each type of center...
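The center classification idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the source text is truncated before it states how a new text vector is compared with each center, so cosine similarity is assumed here, and the function names (`centroid`, `cosine`, `train_centroids`, `classify`) and the toy two-dimensional vectors are hypothetical.

```python
import math


def centroid(vectors):
    """Arithmetic mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity (an assumed measure; the source does not name one)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def train_centroids(samples):
    """samples: list of (vector, label) pairs. Returns {label: center vector}."""
    by_label = {}
    for vec, label in samples:
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def classify(centroids, vec):
    """Assign the label whose center vector is most similar to vec."""
    return max(centroids, key=lambda label: cosine(centroids[label], vec))


# Toy usage with made-up 2-dimensional term vectors.
samples = [
    ([1.0, 0.0], "a"),
    ([0.8, 0.2], "a"),
    ([0.0, 1.0], "b"),
    ([0.1, 0.9], "b"),
]
centers = train_centroids(samples)
label = classify(centers, [0.7, 0.3])
```

Training makes one pass over the samples and classification compares the new vector with one center per class, which is consistent with the linear time behaviour the document attributes to this family of classifiers.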


Abstract

The invention provides a text training method. The method comprises the following steps: 1) computing a center vector for each class of the training sample set; 2) classifying the samples in the training sample set according to the center vectors; and 3) for each incorrectly classified sample, correcting the center vector of the class A to which the sample actually belongs and/or the center vector of the class B into which the sample was wrongly classified, according to a set drag weight and a set push weight. Text can then be classified with high precision and speed using the center vectors obtained by the training method.
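The three steps in the abstract can be sketched as follows. This is a hedged illustration under stated assumptions, not the claimed method: the abstract does not give the update formula, so the sketch assumes that the center of the true class A is dragged toward a misclassified sample by adding `drag_weight` times the sample, and the center of the wrongly chosen class B is pushed away by subtracting `push_weight` times the sample. The dot-product similarity, the immediate per-sample update, the repeated passes (`max_rounds`), the weight values, and all function names are likewise hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def predict(centroids, vec):
    """Pick the class whose center has the largest dot product with vec
    (an assumed similarity measure)."""
    return max(centroids, key=lambda label: dot(centroids[label], vec))


def mean_centroids(samples):
    """Step 1: arithmetic-mean center vector for each class."""
    sums, counts = {}, {}
    for vec, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def drag_push_train(samples, drag_weight=0.5, push_weight=0.5, max_rounds=10):
    """Steps 2 and 3, repeated until no training sample is misclassified
    or max_rounds is reached (the repetition is an assumption)."""
    centroids = mean_centroids(samples)
    for _ in range(max_rounds):
        errors = 0
        for vec, true_label in samples:
            guess = predict(centroids, vec)  # step 2: classify the sample
            if guess == true_label:
                continue
            errors += 1
            # Step 3 (assumed form): drag class A toward the sample ...
            centroids[true_label] = [
                c + drag_weight * x for c, x in zip(centroids[true_label], vec)
            ]
            # ... and push class B away from it.
            centroids[guess] = [
                c - push_weight * x for c, x in zip(centroids[guess], vec)
            ]
        if errors == 0:
            break
    return centroids


# Toy data: the third sample belongs to "a" but sits closer to the "b" center.
samples = [
    ([1.0, 0.0], "a"),
    ([1.0, 0.0], "a"),
    ([0.4, 0.6], "a"),
    ([0.0, 1.0], "b"),
    ([0.0, 1.0], "b"),
]
before = predict(mean_centroids(samples), [0.4, 0.6])
trained = drag_push_train(samples)
after = predict(trained, [0.4, 0.6])
```

On this toy data the plain mean centers misclassify the third sample, and one correction is enough to classify every training sample correctly. Each pass still visits every sample once, so the per-pass cost stays linear in the size of the training set.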

Description

Technical field

[0001] The invention relates to the field of pattern recognition, in particular to a text training method and a text classification method.

Background

[0002] With the rapid development of Internet technology, the volume of online text has grown exponentially. Faced with this vast amount of text data, people urgently need an efficient tool that can automatically process and organize large-scale text. Automatic text classification assigns large volumes of text to categories according to their content, helping people process and organize text data effectively. The research and development of high-performance text classification methods has therefore increasingly become a research hotspot in the field of information retrieval. Many classic machine learning methods have been introduced into text classification. The more commonly used methods are the center classification method, Rocchio, nearest neighbor (KNN), Winnow, Bayesian (NB), support vector machine...


Application Information

IPC(8): G06F17/30, G06K9/62
Inventors: 谭松波 (Tan Songbo), 许洪波 (Xu Hongbo), 程学旗 (Cheng Xueqi)
Owner INST OF COMPUTING TECHNOLOGY - CHINESE ACAD OF SCI