Methods, devices, equipment, and media for correcting a classifier and constructing a classification corpus
Classifier and corpus technology, applied in semantic tool creation, instrumentation, and unstructured text data retrieval, that addresses problems such as underutilization of correct samples, overutilization of wrong samples, and an increased text classification error rate.
Examples
Embodiment 1
[0041] This embodiment provides a method for correcting a text classifier. It applies where the category center vectors of the text classifier are corrected to improve its classification accuracy. The method can be carried out by a text classifier correction device, which can be implemented in software and/or hardware and can generally be integrated in a processor.
[0042] As shown in Figure 1, the method of this embodiment specifically includes:
[0043] S110. Acquire category center vectors respectively corresponding to at least two text categories of the classifier, where the category center vectors are calculated based on at least two category texts corresponding to the text categories.
[0044] A text classifier includes several text categories, and each text category corresponds to a category center vector. Classifying text with a text classifier means judging the distance between each text an...
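The idea in S110 can be sketched as a nearest-center classifier. This is a minimal illustration, not the patent's formula: the excerpt does not say how the category center vector is computed or which distance is used, so the arithmetic mean and Euclidean distance below are assumptions, and the function names, category names, and two-dimensional feature vectors are hypothetical.

```python
import math


def center_vector(vectors):
    """Assumed definition: element-wise mean of a category's text vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    """Assumed distance measure between a text vector and a center vector."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(text_vector, centers):
    """Return the category whose center vector is closest to the text."""
    return min(centers, key=lambda c: euclidean(text_vector, centers[c]))


# Hypothetical toy data: at least two category texts per text category.
category_texts = {
    "sports": [[1.0, 0.0], [0.8, 0.2]],
    "finance": [[0.0, 1.0], [0.2, 0.8]],
}

# One category center vector per text category.
centers = {c: center_vector(v) for c, v in category_texts.items()}

label = classify([0.7, 0.1], centers)
print(label)
```

Because every category is summarized by a single center vector, a few mislabeled category texts can shift that vector and change the result for many inputs, which is why correcting the center vectors affects classification accuracy.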
Embodiment 2
[0100] This embodiment provides a method for constructing a classification corpus. It applies where a classification corpus is automatically constructed and purified from a small amount of seed vocabulary for the text categories. The method can be implemented in software and/or hardware and can generally be integrated in a processor. As shown in Figure 2, the method of this embodiment specifically includes:
[0101] S210. Pre-classify at least two texts according to the seed vocabulary corresponding to at least two text categories in a pre-specified set domain, and construct an initial classification corpus.
[0102] For each text category in the set domain, a number of seed words are manually designated. These seed words are used to pre-classify the texts in the set domain, thereby constructing the initial classification corpus.
[0103] Specifically, the set domain may be a news domain, and correspondingly, the classified co...
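The pre-classification in S210 could look roughly like the sketch below. The excerpt does not state the matching rule, so counting seed-word occurrences and assigning a text to the category with the most hits is an assumption, as are the whitespace tokenization, the skipping of tied or unmatched texts, and all names and sample data.

```python
def pre_classify(texts, seed_words):
    """Build an initial classification corpus from manually chosen seed words.

    Assumed rule: a text goes to the category whose seed words occur most
    often in it; texts with no hits or with a tie are left out.
    """
    corpus = {category: [] for category in seed_words}
    for text in texts:
        tokens = text.lower().split()  # assumed whitespace tokenization
        hits = {
            category: sum(tokens.count(word) for word in words)
            for category, words in seed_words.items()
        }
        ranked = sorted(hits.items(), key=lambda item: item[1], reverse=True)
        best_category, best_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == best_count
        if best_count > 0 and not tied:
            corpus[best_category].append(text)
    return corpus


# Hypothetical seed words for two categories in a news domain.
seed_words = {
    "sports": ["match", "team"],
    "finance": ["stock", "bank"],
}
texts = [
    "The team won the match",
    "The bank raised its stock forecast",
    "Weather was mild today",
    "The team visited the bank",
]

corpus = pre_classify(texts, seed_words)
print(corpus)
```

Leaving out ambiguous texts keeps the initial corpus smaller but cleaner, which suits a corpus that is meant to be purified in later steps.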
Embodiment 3
[0123] This embodiment provides a text classifier correction device. It applies where the category center vectors of the text classifier are corrected to improve its classification accuracy. The device can be implemented in software and/or hardware and can generally be integrated in a processor. As shown in Figure 3, the device includes a category center vector acquisition module 310, a modified text acquisition module 320, a classifier modification module 330, and a loop operation module 340, wherein:
[0124] A category center vector acquisition module 310, configured to acquire category center vectors respectively corresponding to at least two text categories of the classifier, the category center vectors being calculated according to at least two category texts corresponding to the text category;
[0125] A modified text obtaining module 320, configured to obtain a modified text of a set text catego...