Methods, devices, equipment and media for correcting a classifier and constructing a classification corpus

A classifier and corpus technology, applied in semantic tool creation, instrumentation, unstructured text data retrieval, etc. It addresses problems such as underuse of correctly classified samples, overuse of wrongly classified samples, and an increased text classification error rate.
CN108319682A · Active · Publication Date: 2018-07-24 · TIANWEN DIGITAL MEDIA TECH BEIJING

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
TIANWEN DIGITAL MEDIA TECH BEIJING
Publication Date
2018-07-24

Figures

  • Figure 1
  • Figure 2
  • Figure 3

Abstract

The invention discloses methods, devices, equipment and a medium for correcting a classifier and constructing a classification corpus. The method for correcting the classifier includes the following steps: obtaining the category center vectors corresponding to two or more text categories of the classifier; obtaining a correction text of a set text category and the text feature vectors of the correction text; correcting the category center vectors of all text categories in the classifier according to the similarity between the text feature vectors and the category center vectors of all current text categories of the classifier, and according to the text category of the correction text; and repeating the operation of obtaining a correction text of a set text category and its text feature vectors until a correction-ending condition is met, thereby obtaining the corrected classifier. With this method, wrongly classified texts have a larger influence on the category center vectors, which decreases the text classification error rate.
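The abstract does not state the similarity measure, the update rule or the correction-ending condition, so the following is only a minimal sketch under stated assumptions. It assumes cosine similarity, unit-normalized center vectors, and a swapped-in update in the spirit of "drag-pushing" centroid refinement: a misclassified correction text pulls its true category's center toward itself and pushes the wrongly predicted center away with a larger step (`lr_wrong`) than the step used for a correctly classified text (`lr_right`). All function names, step sizes and the `max_rounds` / no-errors stopping rule are hypothetical placeholders, not the patent's method.

```python
import math


def normalize(v):
    """Scale v to unit L2 norm; a zero vector is returned unchanged (as a new list)."""
    n = math.sqrt(sum(a * a for a in v))
    return list(v) if n == 0 else [a / n for a in v]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def classify(centers, x):
    """Return the category whose center vector is most similar to x."""
    return max(centers, key=lambda c: cosine(x, centers[c]))


def shifted(center, x, step):
    """Return normalize(center + step * x); a negative step pushes away from x."""
    return normalize([c + step * xi for c, xi in zip(center, x)])


def correct_centers(centers, samples, lr_right=0.1, lr_wrong=0.5, max_rounds=10):
    """Hypothetical error-weighted correction of category center vectors.

    centers: dict mapping category -> center vector
    samples: list of (text_feature_vector, set_category) correction texts
    lr_wrong > lr_right (assumed), so misclassified texts move the centers more.
    Stops after max_rounds passes or after a pass with no errors
    (an assumed correction-ending condition).
    """
    centers = {c: normalize(v) for c, v in centers.items()}
    for _ in range(max_rounds):
        errors = 0
        for x, label in samples:
            x = normalize(x)
            predicted = classify(centers, x)
            if predicted == label:
                # Correct: nudge the true center slightly toward the text.
                centers[label] = shifted(centers[label], x, lr_right)
            else:
                errors += 1
                # Wrong: pull the true center strongly toward the text and
                # push the wrongly predicted center away from it.
                centers[label] = shifted(centers[label], x, lr_wrong)
                centers[predicted] = shifted(centers[predicted], x, -lr_wrong)
        if errors == 0:
            break
    return centers


if __name__ == "__main__":
    initial = {"sports": [1.0, 0.0], "economy": [0.0, 1.0]}
    corrections = [([0.4, 0.6], "sports")]
    print(classify(initial, [0.4, 0.6]))    # economy
    corrected = correct_centers(initial, corrections)
    print(classify(corrected, [0.4, 0.6]))  # sports
```

In this toy run the single correction text is first misclassified, so the large step moves both centers enough to flip the decision; on the next pass it is classified correctly, only the small step is applied, and the loop ends.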

Description

Technical Field

[0001] The embodiments of the present invention relate to the field of text classification, and in particular to a method, device, equipment and medium for correcting a classifier and constructing a classification corpus.

Background Art

[0002] With the development of electronic technology and the spread of the Internet, reading habits have quietly changed: traditional reading, based mainly on paper media, has gradually given way to digital reading. As a result, electronic news occupies an increasingly important position in the news field.

[0003] Automatic text classification of electronic news, that is, dividing electronic news into categories such as current politics, economy, military, entertainment, and sports according to news topics, can help us filter news of interest. At the same time, the automatic text classification of electronic news has important practical significance for news topic selecti...
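The background does not say how news texts are represented. As an illustration only, a nearest-centroid classifier of the kind the abstract builds on (one category center vector per topic, with each text assigned to the most similar center) can be sketched as follows. It assumes whitespace tokenization and raw term-frequency vectors; Chinese news would in practice need word segmentation, and TF-IDF weighting is a common alternative. The training sentences and helper names are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector; assumes whitespace tokenization and lowercasing."""
    return Counter(text.lower().split())


def category_centers(labeled_texts):
    """Category center vector = average term-frequency vector of the category's texts."""
    sums, counts = {}, Counter()
    for text, category in labeled_texts:
        sums.setdefault(category, Counter()).update(tf_vector(text))
        counts[category] += 1
    return {c: {t: w / counts[c] for t, w in s.items()} for c, s in sums.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def classify(centers, text):
    """Assign text to the category whose center vector is most similar."""
    x = tf_vector(text)
    return max(centers, key=lambda c: cosine(x, centers[c]))


if __name__ == "__main__":
    train = [
        ("team wins the football match", "sports"),
        ("striker scores in the final match", "sports"),
        ("stock market rises as inflation slows", "economy"),
        ("central bank holds interest rates", "economy"),
    ]
    centers = category_centers(train)
    print(classify(centers, "football team match tonight"))  # sports
    print(classify(centers, "bank raises interest rates"))   # economy
```

Sparse dictionaries keep the sketch dependency-free; a real news classifier would typically use a vectorizer library and far more training text.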

Claims
