Method and equipment for constructing text classifier by referencing external knowledge

A text classification and external knowledge technology, applied in the fields of instruments, special data processing applications, electrical digital data processing, etc. It addresses the problem that classifiers depend on the data distribution of the labeled text and therefore generalize poorly and lack robustness, and it achieves the effects of improving generalization ability and robustness, improving category representativeness, and increasing diversity.

Inactive Publication Date: 2011-04-20
NEC (CHINA) CO LTD
6 Cites, 30 Cited by

AI Technical Summary

Problems solved by technology

The final text classifier is therefore shaped entirely by the given labeled text, which results in poor generalization and robustness.
[0012] Although the prior art includes other training text selection methods, they mainly rely on the internal knowledge of the given labeled text set; that is, the features and weights they use are determined entirely by the data distribution of that labeled text set, so the selected training texts are strongly biased.
This bias propagates to the classification behavior of the final classifier, greatly reducing its generalization ability and robustness and ultimately making its performance unsatisfactory.
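The dependence on data distribution described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: inverse document frequency is used here as a stand-in weighting scheme (the excerpt does not say which weights the prior art uses), and the two labeled sets and the function name `idf_weights` are hypothetical.

```python
import math
from collections import Counter

def idf_weights(docs):
    """Inverse document frequency computed only from the given documents.

    Assumption: IDF stands in for 'weights derived from the labeled set';
    the patent excerpt does not name a specific weighting scheme.
    """
    n = len(docs)
    # Count, for each term, how many documents contain it.
    df = Counter(term for doc in docs for term in set(doc.split()))
    return {term: math.log(n / count) for term, count in df.items()}

# Two hypothetical labeled sets for the same "sports" category.
set_a = ["football match score", "football league table", "football transfer news"]
set_b = ["football match score", "tennis open final", "marathon race result"]

w_a = idf_weights(set_a)
w_b = idf_weights(set_b)

# The same term receives a different weight depending on which labeled set
# happened to be supplied: log(3/3) for set_a versus log(3/1) for set_b.
print(w_a["football"], w_b["football"])
```

Because the weight of "football" is fixed entirely by the supplied texts, any selection step built on such weights inherits whatever skew the labeled set has.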



Examples


Embodiment Construction

[0033] For convenience of description, some technical terms used in the present invention are first briefly explained:

[0034]

[0035] Figure 3 is a structural block diagram of a text classifier construction device 300 according to an embodiment of the present invention. As shown in Figure 3, in this embodiment the text classifier construction device 300 includes an input device 301, a text vectorization device 302, an external feature construction device 303, a training text selection device 304 based on a hybrid method, and a classifier learning device 305. Compared with the prior-art text classifier construction device 100 shown in Figure 1, the input device 301, the text vectorization device 302 and the classifier learning device 305 of the text classifier construction device 300 have functions and structures similar to those of the prior art. Therefore, the uniqueness of the present inventio...

Abstract

The invention provides a method and equipment for constructing a text classifier by referencing external knowledge. The method comprises the steps of: inputting a labeled text set; extracting internal features of the labeled text set; constructing external features of the labeled text set by referencing an external knowledge source (such as a dictionary); selecting training texts from the labeled text set by jointly considering its internal and external features; and learning a text classifier from the selected training texts. According to the invention, the sample distribution bias introduced by the labeled text set can be adjusted by the external features automatically generated from the external knowledge source, so that the finally trained classifier has better generalization capability and robustness.
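The five steps in the abstract can be sketched roughly as follows. This is a minimal sketch, not the patented method: the excerpt does not specify how features are built or combined, so the relative-frequency internal features, the plain dictionary used as the external knowledge source, the mixing weight `alpha`, the `per_class` cutoff, the term-count classifier, and all function names are assumptions made for illustration. Texts are assumed to be non-empty.

```python
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def internal_features(labeled):
    """Per-category relative term frequencies, from the labeled set only."""
    counts = defaultdict(Counter)
    for text, label in labeled:
        counts[label].update(tokenize(text))
    feats = {}
    for label, c in counts.items():
        total = sum(c.values())
        feats[label] = {t: n / total for t, n in c.items()}
    return feats

def external_features(labels, knowledge):
    """Per-category term sets looked up in an external source (a dict here)."""
    return {label: set(knowledge.get(label, ())) for label in labels}

def select_training_texts(labeled, internal, external, alpha=0.5, per_class=2):
    """Score each text by a weighted mix of internal and external evidence
    (an assumed combination rule) and keep the top texts per category."""
    scored = defaultdict(list)
    for text, label in labeled:
        terms = tokenize(text)
        s_int = sum(internal[label].get(t, 0.0) for t in terms) / len(terms)
        s_ext = sum(t in external[label] for t in terms) / len(terms)
        scored[label].append((alpha * s_int + (1 - alpha) * s_ext, text))
    selected = []
    for label, items in scored.items():
        items.sort(key=lambda pair: pair[0], reverse=True)
        selected += [(text, label) for _, text in items[:per_class]]
    return selected

def train(selected):
    """Toy classifier: per-category term counts from the selected texts."""
    model = defaultdict(Counter)
    for text, label in selected:
        model[label].update(tokenize(text))
    return model

def predict(model, text):
    terms = tokenize(text)
    return max(model, key=lambda label: sum(model[label][t] for t in terms))

# Hypothetical labeled text set and external knowledge source.
labeled = [
    ("football match score", "sports"),
    ("football league goal", "sports"),
    ("football fans buy shares", "sports"),
    ("stock market shares", "finance"),
    ("bank interest rate", "finance"),
    ("market football sponsorship", "finance"),
]
knowledge = {
    "sports": ["football", "match", "league", "goal", "score"],
    "finance": ["stock", "market", "bank", "shares", "interest", "rate"],
}

internal = internal_features(labeled)
external = external_features({label for _, label in labeled}, knowledge)
selected = select_training_texts(labeled, internal, external)
model = train(selected)
print(predict(model, "league match tonight"))
```

In this toy data the two texts that mix vocabulary from both categories score low on the external term and are left out of the training set, which is the kind of adjustment the abstract attributes to external features.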

Description

Technical Field

[0001] The present invention relates generally to information retrieval and text classification. More specifically, the present invention relates to methods and devices for constructing text classifiers with reference to external knowledge.

Background Art

[0002] With the rapid development of office automation and the Internet, the amount of electronic text information has exploded, and large-scale automatic information processing has become both a necessary means and a challenge for making better use of this information.

[0003] Information retrieval refers to the process and technology of organizing information in a certain way and finding relevant information according to the needs of information users. Automatic text classification is one of the main supporting technologies for information retrieval. Its basic purpose is to divide text into predefined categories, which is an effective means to help people search, query, filter and ut...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 李建强, 赵彧, 刘博
Owner: NEC (CHINA) CO LTD