
Method and device for classifying text

A text classification technology, applied in special data processing applications, instruments, electronic digital data processing, and other fields, that addresses two problems: labeled training corpora are difficult to obtain, and classifiers exhibit category bias.

Active Publication Date: 2013-03-27
BEIJING BAIDU NETCOM SCI & TECH CO LTD
4 Cites, 17 Cited by

AI Technical Summary

Problems solved by technology

In existing methods, it is first of all difficult to obtain a large labeled training corpus, and large-scale manual labeling is inefficient. [...] marked as entertainment), which will cause the classifier to be class-biased as well, ultimately reducing classification accuracy.
In addition, existing technology also uses clustering to divide texts into several categories. However, because the resulting categories cannot be controlled during clustering, using clustering alone may fail to produce the categories that are actually needed.

Examples

Embodiment Construction

[0036] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0037] Figure 1 is a schematic flowchart of Embodiment 1 of the text classification method of the present invention. As shown in Figure 1, this embodiment includes:

[0038] Step S101: Obtain the initial clustering result of the first text set as the current clustering result, and acquire the initial classification result of the first text set as the current classification result.

[0039] Step S102: Find the intersection of each category in the current classification result of the first text set with each category in the current clustering result of the first text set, and extract from each intersection the texts of the category corresponding to that intersection, to obtain a first text subset.

[0040] Ste...
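The intersection in Step S102 can be illustrated with plain set operations. This is a minimal sketch, not the patent's implementation: the function name `first_text_subset`, the use of integer text ids, and the assumption that clusters carry the same labels as the classification categories are all hypothetical. The sketch also keeps every text in each intersection, whereas the source only says texts are "extracted" from it.

```python
from typing import Dict, Set


def first_text_subset(classes: Dict[str, Set[int]],
                      clusters: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """Keep, per category, the text ids that the classification result and the
    clustering result both place in that category (assumes shared labels)."""
    subset: Dict[str, Set[int]] = {}
    for label, classified_ids in classes.items():
        agreed = classified_ids & clusters.get(label, set())
        if agreed:  # skip categories with an empty intersection
            subset[label] = agreed
    return subset


# Hypothetical toy data: text ids grouped by category.
classes = {"sports": {0, 1, 2}, "finance": {3, 4, 5}}
clusters = {"sports": {0, 1, 5}, "finance": {2, 3, 4}}
subset = first_text_subset(classes, clusters)
print(subset)
```

Texts 2 and 5 are dropped because the two results disagree about them, so only texts that both views agree on remain in the subset.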


Abstract

The invention provides a method and device for classifying text. The method comprises the following steps: A, acquiring an initial clustering result of a first text set as the current clustering result, and acquiring an initial classification result of the first text set as the current classification result; B, obtaining a first text subset by using the current clustering result and the current classification result; and C, obtaining a first classifier by using the first text subset and classifying the first text set with it to obtain the current classification result, clustering the first text set with the first text subset as clustering centers to obtain the current clustering result, and judging whether a preset condition is satisfied; if so, outputting the current classification result of the first text set, and otherwise returning to step B. In this way, text classification accuracy is improved.
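The loop over steps A to C might look like the following sketch. Several choices here are assumptions rather than details from the source: texts are bag-of-words `Counter` vectors, the "first classifier" is a nearest-centroid classifier with cosine similarity, the clustering is a single seeded center update, clusters share labels with the categories, and the preset condition is "classification unchanged or `max_rounds` reached". All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)  # sum term counts
    return total


def assign(vectors, centers):
    # Label of the most similar center for each text (ties: first label in sorted order).
    return [max(sorted(centers), key=lambda lab: cosine(v, centers[lab]))
            for v in vectors]


def refine(vectors, labels_cls, labels_clu, max_rounds=10):
    for _ in range(max_rounds):
        # Step B: texts on which classification and clustering agree.
        agreed = {}
        for vec, a, b in zip(vectors, labels_cls, labels_clu):
            if a == b:
                agreed.setdefault(a, []).append(vec)
        if not agreed:
            break
        seeds = {lab: centroid(vs) for lab, vs in agreed.items()}
        # Step C (classification): nearest-centroid classifier built from the subset.
        new_cls = assign(vectors, seeds)
        # Step C (clustering): subset centroids as initial centers, one update pass.
        centers = {lab: centroid(v for v, l in zip(vectors, new_cls) if l == lab)
                   for lab in seeds}
        new_clu = assign(vectors, centers)
        converged = new_cls == labels_cls  # assumed preset condition
        labels_cls, labels_clu = new_cls, new_clu
        if converged:
            break
    return labels_cls


docs = ["goal match team score", "team coach match win",
        "stock market price fall", "market bank stock rate",
        "team score win goal", "bank rate price market"]
vectors = [Counter(d.split()) for d in docs]
init_cls = ["sports", "sports", "finance", "finance", "finance", "sports"]
init_clu = ["sports", "sports", "finance", "finance", "sports", "finance"]
result = refine(vectors, init_cls, init_clu)
print(result)
```

In this toy run the initial classification mislabels the last two texts. Because they fall outside the agreed subset, they do not influence the seeds and are relabeled in the first round.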

Description

【Technical field】

[0001] The invention relates to text data mining technology, in particular to a text classification method and device.

【Background technique】

[0002] Text classification technology has applications in many fields. For example, texts are classified, and the classified texts are then used to guide the training of translation models in machine translation. The accuracy of text classification is therefore very important. Classified texts with high accuracy can be used in other applications. However, if the accuracy of text classification is insufficient, it has a negative impact on the applications that use these classified texts.

[0003] In existing text classification methods, a training corpus is usually used to train a classifier, and the trained classifier is then used to classify text. With this approach, first of all, it is difficult to obtain a large labeled training corpus. If large-scale manual labeling is used, the efficiency is low. M...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 杨振东, 吴华, 王海峰, 柴春光
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD