Text classification method, device, medium and equipment

A text classification technology, applied in semantic analysis, special data processing applications, instruments, etc. It addresses the problems of low weights for distinctive feature words, an insufficient number of sample feature words, and inaccurately determined text categories, with the effect of increasing those weights and improving classification accuracy.

Active Publication Date: 2018-12-07
TENCENT TECH (BEIJING) CO LTD
Cites: 2; Cited by: 1

AI Technical Summary

Problems solved by technology

[0005] The inventors found that the text to be classified contains some distinctive feature words that play a key role in its classification. With existing text classification methods, however, an insufficient number of sample feature words can cause these distinctive feature words to receive low weights, so the text category determined for the text is not accurate enough.

Examples

Embodiment 1

[0060] An embodiment of the present invention provides a text classification method which, as shown in Figure 1, includes:

[0061] Step 101: for each key feature word in the text to be classified, determine the word category corresponding to the key feature word according to the sample feature words corresponding to each word category in the sample library.

[0062] In a specific implementation, a plurality of word categories representing the categories of feature words are defined in advance, the sample feature words in the sample texts are assigned to their corresponding word categories, and the correspondence between word categories and sample feature words is stored in the sample library. Specifically, multiple word categories can be obtained by dividing the feature words according to their semantics, so that feature words with the same or similar semantics are placed in the same word category. Multiple word categories can also be obtained accord...
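The lookup in Step 101 can be illustrated with a minimal sketch. It assumes the sample library is a plain mapping from word category to a set of sample feature words and that a key feature word is matched by exact membership; the category names, the words, and the function name `word_category_of` are hypothetical and do not come from the patent text, which may match words in some other way.

```python
from typing import Dict, Optional, Set

# Hypothetical sample library: word category -> sample feature words.
# Words with the same or similar semantics share one word category.
WORD_CATEGORIES: Dict[str, Set[str]] = {
    "sports_team": {"lakers", "warriors", "celtics"},
    "finance_term": {"stock", "bond", "dividend"},
}


def word_category_of(key_word: str, library: Dict[str, Set[str]]) -> Optional[str]:
    """Return the word category whose sample feature words contain key_word.

    Exact membership is an assumption made for this sketch; None is returned
    when no word category contains the key feature word.
    """
    for category, sample_words in library.items():
        if key_word in sample_words:
            return category
    return None
```

For example, `word_category_of("stock", WORD_CATEGORIES)` yields `"finance_term"`, while a word that is absent from the library yields `None`.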

Embodiment 2

[0119] An embodiment of the present invention provides a text classification device which, as shown in Figure 5, includes:

[0120] The first determination module 501, configured to determine, for each key feature word in the text to be classified, the word category corresponding to the key feature word according to the sample feature words corresponding to each word category in the sample library;

[0121] The second determination module 502, configured to determine, according to the correspondence between text categories and sample texts in the sample library, the text categories corresponding to the sample texts that contain the key feature word, and to take the determined text categories as the text categories corresponding to the key feature word, wherein the sample text corresponding to each text category includes a plurality of sample feature words;

[0122] The third determination module 503, configured to determine the weight of the key feature word under each corresponding text category, wherein...

Embodiment 3

[0140] An embodiment of the present invention provides a non-volatile computer storage medium that stores an executable program; when executed by a processor, the program implements the steps of any of the text classification methods in Embodiment 1.

Abstract

The invention provides a text classification method, device, medium and equipment. The method comprises the following steps: for each key feature word in a text to be classified, determining the word category corresponding to the key feature word according to the sample feature words corresponding to each word category in a sample library; determining, according to the correspondence between text categories and sample texts in the sample library, the text categories corresponding to the sample texts that contain the key feature word, and taking the determined text categories as the text categories corresponding to the key feature word; determining the weight of the key feature word in each corresponding text category, wherein, for any given text category, that weight is the sum of the weights in that text category of the sample feature words that belong to the same word category as the key feature word; and determining the text category to which the text to be classified belongs according to the weight of each key feature word in each corresponding text category. The method can improve the accuracy of determining the category to which the text to be classified belongs.
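The steps in the abstract can be sketched end to end as follows. This is an illustration under stated assumptions, not the patented implementation: the data layout (`word_categories`, `sample_texts`, `sample_weights`) and the function names are hypothetical, word categories are matched by exact membership, and the final decision rule (add up the key-word weights per text category and pick the largest total) is an assumption, because the abstract does not say how the per-word weights are combined.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

WordCategories = Dict[str, Set[str]]         # word category -> sample feature words
SampleTexts = List[Tuple[str, Set[str]]]     # (text category, words in one sample text)
SampleWeights = Dict[Tuple[str, str], float]  # (text category, sample word) -> weight


def category_scores(key_words: Iterable[str],
                    word_categories: WordCategories,
                    sample_texts: SampleTexts,
                    sample_weights: SampleWeights) -> Dict[str, float]:
    """Accumulate key-word weights per text category (summation is assumed)."""
    scores: Dict[str, float] = defaultdict(float)
    for key_word in key_words:
        # Step 1: word category of the key feature word (exact match assumed).
        word_cat = next((c for c, ws in word_categories.items() if key_word in ws), None)
        if word_cat is None:
            continue
        # Step 2: text categories of the sample texts containing the key word.
        text_cats = {cat for cat, words in sample_texts if key_word in words}
        # Step 3: weight under a text category = sum of the weights of the
        # sample feature words that share the key word's word category.
        for cat in text_cats:
            scores[cat] += sum(sample_weights.get((cat, w), 0.0)
                               for w in word_categories[word_cat])
    return dict(scores)


def classify(key_words, word_categories, sample_texts, sample_weights) -> Optional[str]:
    """Step 4 (assumed rule): return the text category with the highest total."""
    scores = category_scores(key_words, word_categories, sample_texts, sample_weights)
    return max(scores, key=scores.get) if scores else None


# Hypothetical toy sample library.
word_categories = {"team": {"lakers", "warriors"}, "money": {"stock", "bond"}}
sample_texts = [("sports", {"lakers", "game"}),
                ("sports", {"warriors", "stock"}),
                ("finance", {"stock", "bond"})]
sample_weights = {("sports", "lakers"): 0.4, ("sports", "warriors"): 0.3,
                  ("sports", "stock"): 0.1,
                  ("finance", "stock"): 0.5, ("finance", "bond"): 0.4}
```

In this toy library the key word "lakers" receives weight 0.4 + 0.3 under "sports", because it inherits the weight of "warriors" from the same word category. This is the weight-boosting effect the abstract describes for a key feature word that would otherwise be supported by too few sample feature words.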

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, in particular to a text classification method, device, medium and equipment.

Background

[0002] This section is intended to provide a background or context for implementations of the invention that are recited in the claims. The descriptions herein are not admitted to be prior art by inclusion in this section.

[0003] Commonly used text classification methods currently proceed as follows:

[0004] Use the chi-square test algorithm to extract the feature words in the text to be classified; for each feature word, find the text category corresponding to the feature word from the correspondence between text categories and sample feature words in the sample library; take the probability that the feature word occurs in a given text category as the weight of the feature word under that text category; according to the weight of each feature word in ...
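The chi-square feature extraction mentioned in the background can be sketched with the standard 2x2 contingency-table statistic. The source does not spell out the variant it uses, so the formula, the toy documents, and the helper names `chi_square` and `top_feature_words` are assumptions made for illustration.

```python
from typing import List, Set, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for one term and one text category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # degenerate table: the term or category never varies
    return n * (a * d - b * c) ** 2 / denominator


def top_feature_words(docs: List[Tuple[str, Set[str]]], category: str, k: int) -> List[str]:
    """Rank vocabulary terms by chi-square against `category`; keep the top k."""
    vocabulary = set().union(*(words for _, words in docs))
    scored = []
    for term in vocabulary:
        a = sum(1 for label, words in docs if label == category and term in words)
        b = sum(1 for label, words in docs if label != category and term in words)
        c = sum(1 for label, words in docs if label == category and term not in words)
        d = sum(1 for label, words in docs if label != category and term not in words)
        scored.append((chi_square(a, b, c, d), term))
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]


# Hypothetical labelled documents.
docs = [("sports", {"game", "team"}), ("sports", {"game", "score"}),
        ("finance", {"stock", "market"}), ("finance", {"stock", "bond"})]
```

The statistic is symmetric: a term that never appears in the category scores as high as one that always does, so a practical extractor would usually also check the sign of `a * d - b * c` before treating a term as a feature word for that category.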

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30
CPC: G06F40/216, G06F40/289, G06F40/30
Inventor: 李探温旭张智敏常卓王树伟花少勇张伟闫清岭
Owner: TENCENT TECH (BEIJING) CO LTD