
Corpus processing method and apparatus

A corpus processing method and apparatus, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses the problems that classifier coverage is reduced, that the labeled corpus is not reliable enough, and that labeling consumes a great deal of time and energy. The stated effects are improved utilization, accuracy, and coverage of the classification corpus, and an improved user experience.

Active Publication Date: 2016-10-05
LENOVO (BEIJING) LTD

AI Technical Summary

Problems solved by technology

This leads to two problems. First, many ways of phrasing an evaluation cannot be covered. Manually labeling a training corpus usually takes a great deal of time and effort, and it is generally difficult to cover all possible cases, so every labeling result is valuable. If corpora that do not reach the threshold are simply removed, the labeling work is wasted, the coverage of the final classifier is reduced, and the final classification quality cannot be guaranteed.
Second, even when a given way of phrasing an evaluation does appear in the training corpus, the number of corpora corresponding to it is relatively small. Because manual annotation is subject to chance and error, the labels of these corpora are not reliable enough. If corpora with unreliable labels are fed into the classifier, they may degrade the classification results.
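
The first problem can be illustrated with a minimal sketch. It assumes a plain frequency threshold over "evaluation methods" (read here as distinct ways of phrasing an evaluation); the text only mentions that a threshold exists and does not specify it, so `filter_by_threshold`, `min_count`, and the sample data are hypothetical names chosen for illustration.

```python
from collections import Counter

def filter_by_threshold(labeled, min_count):
    """Drop samples whose evaluation phrase occurs fewer than min_count times.

    labeled is a list of (phrase, label) pairs. Returns the kept samples and
    the fraction of distinct phrases that survive (a rough coverage measure).
    Both the threshold rule and the coverage measure are assumptions.
    """
    counts = Counter(phrase for phrase, _ in labeled)
    kept = [(phrase, label) for phrase, label in labeled
            if counts[phrase] >= min_count]
    surviving = {phrase for phrase, _ in kept}
    coverage = len(surviving) / len(counts) if counts else 0.0
    return kept, coverage

# Hypothetical hand-labeled samples: one common phrase, two rare ones.
labeled = [
    ("very fast", "positive"),
    ("very fast", "positive"),
    ("very fast", "positive"),
    ("lightning quick", "positive"),
    ("sluggish", "negative"),
]

kept, coverage = filter_by_threshold(labeled, min_count=2)
print(len(kept), round(coverage, 2))
```

With this data, two of the three distinct phrases are discarded along with their labels, which is the loss of labeling effort and coverage described above.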




Embodiment Construction

[0019] Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. Note that in this specification and the drawings, substantially the same steps and elements are denoted by the same reference numerals, and repeated explanation of these steps and elements will be omitted.

[0020] Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one of the described embodiments. Thus, appearances of the phrase "in one embodiment" or "in an embodiment" in the specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments.

[0021] FIG. 1 is a flow chart of a corpus processing method 100 according to an embodiment of the pres...



Abstract

The invention provides a corpus processing method and apparatus. The corpus processing method comprises: obtaining a first corpus set to be classified; determining a second corpus set from the first corpus set, wherein the evaluation object of every second corpus in the second corpus set is a first evaluation object, and the evaluation content of each second corpus on the first evaluation object is labeled as a positive evaluation; determining a third corpus set from the first corpus set, wherein the evaluation object of every third corpus in the third corpus set is the first evaluation object, and the evaluation content of each third corpus on the first evaluation object is labeled as a negative evaluation; judging whether the evaluation content, on the first evaluation object, of a second corpus in the second corpus set and that of any third corpus in the third corpus set are synonyms or near-synonyms; and processing the corpus sets. The method can increase the utilization and coverage of the classification corpus and improve its accuracy.
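
The steps in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the `Corpus` record, the `SYNONYM_GROUPS` table, and the functions `are_synonyms` and `find_conflicts` are hypothetical, and a real system would presumably use a thesaurus or a similarity model for the synonym judgment. The abstract does not say how the corpus sets are processed after the judgment, so the sketch stops at returning the positive/negative pairs whose evaluation content is synonymous.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Corpus:
    text: str
    evaluation_object: str   # what is being evaluated, e.g. "delivery"
    evaluation_content: str  # the evaluating word or phrase, e.g. "fast"
    label: str               # "positive" or "negative"

# Hypothetical near-synonym groups, standing in for a real lexical resource.
SYNONYM_GROUPS = [
    {"fast", "quick", "speedy"},
    {"slow", "sluggish"},
]

def are_synonyms(a, b):
    """Return True if a and b are identical or share a synonym group."""
    a, b = a.lower(), b.lower()
    return a == b or any(a in g and b in g for g in SYNONYM_GROUPS)

def find_conflicts(first_set, evaluation_object):
    """Pair positive and negative corpora on one evaluation object whose
    evaluation content is synonymous or near-synonymous."""
    # Second corpus set: same evaluation object, labeled positive.
    second_set = [c for c in first_set
                  if c.evaluation_object == evaluation_object
                  and c.label == "positive"]
    # Third corpus set: same evaluation object, labeled negative.
    third_set = [c for c in first_set
                 if c.evaluation_object == evaluation_object
                 and c.label == "negative"]
    return [(pos, neg) for pos, neg in product(second_set, third_set)
            if are_synonyms(pos.evaluation_content, neg.evaluation_content)]

# Hypothetical first corpus set; the second entry looks mislabeled.
first_set = [
    Corpus("The delivery was fast", "delivery", "fast", "positive"),
    Corpus("Delivery was quick", "delivery", "quick", "negative"),
    Corpus("Delivery was slow", "delivery", "slow", "negative"),
    Corpus("The screen is bright", "screen", "bright", "positive"),
]

conflicts = find_conflicts(first_set, "delivery")
for pos, neg in conflicts:
    print(pos.evaluation_content, "vs", neg.evaluation_content)
```

A pair returned here signals that near-identical evaluation content carries opposite labels for the same object, which is one plausible indicator of a labeling error that the subsequent processing step could act on.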

Description

Technical field

[0001] The present invention relates to a corpus processing method and device, and more specifically to a corpus processing method and device for sentiment classification.

Background

[0002] At present, sentiment analysis of product reviews mainly uses classification methods to build sentiment analysis models. Most of the objects to be classified are user comments on e-commerce websites. These comments generally describe the users' own shopping experience in colloquial language. They follow no specific evaluation scope or evaluation rules and may involve any aspect of the product. Even when describing the same aspect of a product, different users phrase it differently. This makes it difficult to construct a training corpus for classification. Only when the training corpus achieves a certain coverage, representativeness and accuracy will the trained classification model have a better classific...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
Inventors: 卓雷, 赵凯, 葛安生
Owner: LENOVO (BEIJING) LTD