Methods, devices and equipment for correcting categorizer and constructing categorizing corpus and medium

A classifier and corpus technology, applied in fields such as semantic tool creation, instrumentation, and unstructured text data retrieval, that addresses problems such as underutilization of correct samples, overutilization of wrong samples, and an increased text classification error rate.

Active Publication Date: 2018-07-24
TIANWEN DIGITAL MEDIA TECH BEIJING

AI Technical Summary

Problems solved by technology

[0004] At present, many classification algorithms are available. For text classification with the center vector method, each text usually has the same influence on the center vector of its text category; that is, correct samples and wrong samples influence the center vector equally. As a result, correct samples may be underutilized and wrong samples overutilized, which increases the error rate of text classification.

Examples

Embodiment 1

[0041] This embodiment provides a method for correcting a text classifier, which is applicable to the situation where the category center vectors of the text classifier are corrected to improve its classification accuracy. The method can be performed by a text classifier correction device, which can be implemented in software and/or hardware and can generally be integrated in a processor.

[0042] As shown in Figure 1, the method of this embodiment specifically includes:

[0043] S110. Acquire category center vectors respectively corresponding to at least two text categories of the classifier, where the category center vectors are calculated based on at least two category texts corresponding to the text categories.

[0044] A text classifier includes several text categories, and each text category corresponds to a category center vector. When a text classifier is used to classify text, it judges the distance between each text an...
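The idea of a category center vector can be illustrated with a minimal sketch. It assumes that the center vector is the arithmetic mean of the feature vectors of a category's texts and that closeness is measured by cosine similarity; the excerpt does not state either choice, and the function names and toy vectors are hypothetical.

```python
import numpy as np

def category_centers(vectors_by_category):
    """Return one center vector per category (assumed here: the mean vector)."""
    return {cat: np.mean(np.asarray(vecs, dtype=float), axis=0)
            for cat, vecs in vectors_by_category.items()}

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def classify(vec, centers):
    """Assign the category whose center vector is most similar to vec."""
    vec = np.asarray(vec, dtype=float)
    return max(centers, key=lambda cat: cosine(vec, centers[cat]))

# Toy feature vectors, at least two category texts per category (as in S110).
training = {
    "sports": [[3, 0, 1], [2, 0, 0]],
    "economy": [[0, 4, 1], [0, 2, 1]],
}
centers = category_centers(training)
predicted = classify([1, 0, 0], centers)
print(predicted)
```

In this sketch every text contributes equally to its category's center, which is the behavior paragraph [0004] identifies as the weakness of the plain center vector method.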

Embodiment 2

[0100] This embodiment provides a method for constructing a classification corpus, which is applicable to the situation of automatically constructing and purifying a classification corpus based on a small amount of seed vocabulary for the text categories. The method can be implemented in software and/or hardware and can generally be integrated in a processor. As shown in Figure 2, the method of this embodiment specifically includes:

[0101] S210. Pre-classify at least two texts according to the seed vocabulary corresponding to at least two text categories in a pre-specified set domain, and construct an initial classification corpus.

[0102] For each text category in the set domain, a number of seed words are manually designated, and these seed words are used to pre-classify the texts in the set domain, thereby constructing the initial classification corpus.

[0103] Specifically, the set domain may be a news domain, and correspondingly, the classified co...
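Step S210 can be sketched as follows. This is only an illustration under stated assumptions: a text is assigned to the category whose seed words occur most often in it, and texts with no seed hit or with a tie are left out of the initial corpus. The excerpt does not specify the matching rule, and the seed words, texts, and the name `pre_classify` are hypothetical.

```python
def pre_classify(texts, seed_words):
    """Build an initial classification corpus from manually chosen seed words."""
    corpus = {cat: [] for cat in seed_words}
    for text in texts:
        lowered = text.lower()
        # Count seed-word occurrences per category (crude substring matching).
        hits = {cat: sum(lowered.count(word) for word in words)
                for cat, words in seed_words.items()}
        best = max(hits, key=hits.get)
        top = hits[best]
        # Assumption: keep only texts with a unique, non-zero best category.
        if top > 0 and list(hits.values()).count(top) == 1:
            corpus[best].append(text)
    return corpus

seed_words = {
    "sports": ["match", "league"],
    "economy": ["market", "inflation"],
}
texts = [
    "The league match ended in a draw",
    "Inflation rattles the bond market",
    "Weather will be mild tomorrow",
]
corpus = pre_classify(texts, seed_words)
print(corpus)
```

Substring counting is used only to keep the sketch short; a real system for Chinese news text would need proper word segmentation before matching.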

Embodiment 3

[0123] This embodiment provides a text classifier correction device, which is applicable to the situation where the category center vectors of the text classifier are corrected to improve its classification accuracy. The device can be implemented in software and/or hardware and can generally be integrated in a processor. As shown in Figure 3, the device includes a category center vector acquisition module 310, a modified text acquisition module 320, a classifier modification module 330 and a loop operation module 340, wherein:

[0124] A category center vector acquisition module 310, configured to acquire category center vectors respectively corresponding to at least two text categories of the classifier, the category center vectors being calculated according to at least two category texts corresponding to the text category;

[0125] A modified text obtaining module 320, configured to obtain a modified text of a set text catego...


Abstract

The invention discloses methods, devices and equipment for correcting a categorizer and constructing a categorizing corpus, and a medium. The method for correcting the categorizer includes: obtaining category center vectors corresponding to two or more text categories of the categorizer; obtaining a correction text of a set text category and the text feature vector of the correction text; correcting the category center vectors corresponding to the text categories in the categorizer according to the similarity between the text feature vector and the current category center vector of each text category, and according to the text category of the correction text; and repeating the operation of obtaining a correction text of the set text category and its text feature vector until a correction-ending condition is met, so that the corrected categorizer is obtained. With this method, a wrongly categorized text has a larger influence on the category center vectors, and the error rate of text categorization is decreased.
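The correction loop described in the abstract can be sketched roughly as below. The update rule is an assumption made for illustration, not the formula claimed in the patent: a correctly categorized correction text pulls its own center slightly toward itself, while a wrongly categorized one pulls its true center with a larger step and pushes the wrongly chosen center away. The parameters `step`, `boost`, and `max_rounds`, the ending condition, and the use of a fixed list of correction texts are all hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def nearest(vec, centers):
    """Category whose center vector is most similar to vec."""
    vec = np.asarray(vec, dtype=float)
    return max(centers,
               key=lambda c: cosine(vec, np.asarray(centers[c], dtype=float)))

def correct_centers(centers, correction_texts, step=0.1, boost=3.0, max_rounds=10):
    """Iteratively correct center vectors (hypothetical update rule)."""
    centers = {c: np.asarray(v, dtype=float).copy() for c, v in centers.items()}
    for _ in range(max_rounds):
        errors = 0
        for vec, label in correction_texts:
            vec = np.asarray(vec, dtype=float)
            predicted = nearest(vec, centers)
            if predicted == label:
                # Correct text: small pull of its own center toward the text.
                centers[label] += step * (vec - centers[label])
            else:
                # Wrong text: larger pull on the true center, push on the wrong one.
                errors += 1
                centers[label] += boost * step * (vec - centers[label])
                centers[predicted] -= boost * step * (vec - centers[predicted])
        if errors == 0:  # assumed correction-ending condition
            break
    return centers

initial = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
sample = [0.6, 0.5]  # labelled "b", but closer to the initial center of "a"
corrected = correct_centers(initial, [(sample, "b")])
print(nearest(sample, initial), nearest(sample, corrected))
```

The larger step for misclassified texts mirrors the abstract's statement that a wrongly categorized text has more influence on the center vectors; the specific factor of 3 is arbitrary.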

Description

Technical field

[0001] The embodiments of the present invention relate to the field of text classification, and in particular to a method, device, equipment and medium for correcting a classifier and constructing a classification corpus.

Background

[0002] With the development of electronic technology and the popularization of the Internet, people's reading habits have quietly changed, and traditional reading based mainly on paper media has gradually given way to digital reading. As a result, electronic news occupies an increasingly important position in the field of news.

[0003] Automatic text classification of electronic news, that is, dividing electronic news into categories such as current politics, economy, military, entertainment, and sports according to news topics, can help us filter news of interest. At the same time, the automatic text classification of electronic news has important practical significance for news topic selecti...


Application Information

IPC(8): G06F17/30
CPC: G06F16/355, G06F16/36
Inventor: 张忠辉, 鲁彬, 李堪兵
Owner: TIANWEN DIGITAL MEDIA TECH BEIJING