A text classification method and device

A text classification technology, applied in the field of artificial intelligence, that addresses problems such as low classification accuracy, hindered downstream text processing, and failure to meet classification requirements, and that achieves high efficiency and high classification accuracy.

Active Publication Date: 2021-06-08

NEW H3C BIG DATA TECH CO LTD

Cites: 6. Cited by: 0.

Problems solved by technology

Although the TF-IDF method weakens the influence of word frequency on the classification results, its classification accuracy is low and cannot meet higher-accuracy classification requirements, which hinders further processing of the text.

Examples

Embodiment 1

[0037] Figure 1 is a flow chart of the text classification method provided in Embodiment 1 of the present application. The method includes steps S101 to S104, wherein:

[0038] S101: Obtain the text to be classified, and determine the number of times each sample word in the sample vocabulary set appears in the text to be classified; the sample words in the sample vocabulary set are sample words used for text classification based on the text classification sub-model.

[0039] S102: Divide the sample words into multiple groups according to the sample words respectively used by the multiple text classification sub-models; each group corresponds to one text classification sub-model, and the sample words in different groups are not completely the same.

[0040] S103: Input the number of occurrences of the sample words in each group in the text to be classified into the text classification sub-model corresponding to each group, and obtain the sub-...
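The steps above can be illustrated with a minimal Python sketch. It is not the implementation in the patent: the whitespace tokenization, the keyword-based toy sub-models (`make_keyword_model`), the example groups and labels, and the majority vote used to combine sub-classification results are all assumptions made for illustration, since the source text does not spell out the sub-models or the combination rule here.

```python
from collections import Counter
from typing import Callable, Dict, List

# A sub-model maps {sample word: count} for its own group to a label.
SubModel = Callable[[Dict[str, int]], str]


def count_sample_words(text: str, sample_words: List[str]) -> Dict[str, int]:
    """S101: count how often each sample word occurs in the text."""
    tokens = Counter(text.lower().split())  # assumption: whitespace tokenization
    return {word: tokens[word] for word in sample_words}


def make_keyword_model(label_by_word: Dict[str, str], default: str) -> SubModel:
    """Hypothetical toy sub-model: return the label of the most frequent word."""
    def model(counts: Dict[str, int]) -> str:
        best = max(counts, key=counts.get, default=None)
        if best is None or counts[best] == 0:
            return default
        return label_by_word.get(best, default)
    return model


def classify(text: str, groups: List[List[str]], sub_models: List[SubModel]) -> str:
    """S101 to S104 with an assumed majority vote as the final step."""
    all_words = sorted({word for group in groups for word in group})
    counts = count_sample_words(text, all_words)               # S101
    sub_results = [
        model({word: counts[word] for word in group})          # S103
        for group, model in zip(groups, sub_models)            # S102: one group per sub-model
    ]
    return Counter(sub_results).most_common(1)[0][0]           # S104 (assumed rule)


# Groups overlap but are not identical, as S102 requires.
groups = [["alarm", "topology"], ["alarm", "database"], ["topology", "upgrade"]]
sub_models = [
    make_keyword_model({"alarm": "alarm management",
                        "topology": "topology management"}, "other"),
    make_keyword_model({"alarm": "alarm management",
                        "database": "operating system and database"}, "other"),
    make_keyword_model({"topology": "topology management",
                        "upgrade": "installation, deployment and upgrade"}, "other"),
]
text = "alarm raised twice after alarm threshold changed during upgrade"
result = classify(text, groups, sub_models)
print(result)
```

In this toy run two of the three sub-models see the word "alarm" and vote for the same label, so the vote settles on that label even though the third sub-model, which only sees "topology" and "upgrade", disagrees.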

Embodiment 2

[0161] This embodiment of the application provides a method for processing problem work orders, including:

[0162] (1) Collect 4,100 problem work orders generated in 2017, each manually labeled with its actual classification result. The text content of a problem work order includes a title, a brief description, a solution, etc.

[0163] There are 42 actual classification results, including "resource management", "dual-machine hot standby", "operating system and database", "installation, deployment and upgrade", "DBMAN", "alarm management", "topology management", etc.

[0164] (2) Merge the text content of each problem work order into one character string and perform word segmentation on the merged string, obtaining a total of 4,601 sample words: a_1, a_2, ..., a_4601.

[0165] (3) Calculate the importance score of each sample word by building a random forest model.

[0166] The importance score calculation process o...
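Steps (1) and (2) of this embodiment, merging the fields of each work order and collecting the distinct sample words, might look roughly like the sketch below. The field keys (`title`, `brief_description`, `solution`), the two example work orders, and the whitespace-based `segment` function are assumptions for illustration; the source does not name the word segmentation tool, and real Chinese text would need a proper segmenter.

```python
from typing import Dict, List


def merge_ticket(ticket: Dict[str, str]) -> str:
    """Concatenate the text fields of one work order into a single string."""
    fields = ("title", "brief_description", "solution")  # hypothetical keys
    return " ".join(ticket.get(field, "") for field in fields)


def segment(text: str) -> List[str]:
    """Placeholder segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def build_vocabulary(tickets: List[Dict[str, str]]) -> List[str]:
    """Collect distinct sample words in order of first appearance."""
    vocab: List[str] = []
    seen = set()
    for ticket in tickets:
        for word in segment(merge_ticket(ticket)):
            if word not in seen:
                seen.add(word)
                vocab.append(word)
    return vocab


# Two made-up work orders standing in for the 4,100 labeled ones.
tickets = [
    {"title": "Alarm not cleared",
     "brief_description": "alarm stays active",
     "solution": "restart service"},
    {"title": "Topology missing",
     "brief_description": "device not shown",
     "solution": "restart service"},
]
vocab = build_vocabulary(tickets)
print(len(vocab), vocab)
```

The resulting list plays the role of the sample words a_1, a_2, ..., a_n; the importance scoring in step (3) would then operate on the per-work-order counts of these words.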

Embodiment 3

[0180] Figure 8 is a schematic diagram of the text classification device provided in Embodiment 3 of the present application. The device includes an acquisition module 81, a grouping module 82, and a classification module 83, wherein:

[0181] The acquisition module 81 is used to obtain the text to be classified and to determine the number of times each sample word in the sample vocabulary set appears in the text to be classified; the sample words in the sample vocabulary set are the sample words used for text classification by the text classification sub-models;

[0182] The grouping module 82 is used to divide the sample words into multiple groups according to the sample words used by the multiple text classification sub-models; each group corresponds to one text classification sub-model, and the sample words in different groups are not completely the same;

[0183] The classification module 83 is used to input the number ...

Abstract

The present application provides a text classification method and device. The method includes: obtaining the text to be classified and determining the number of times each sample word in the sample vocabulary set appears in the text to be classified; dividing the sample words into multiple groups according to the sample words respectively used by multiple text classification sub-models, wherein each group corresponds to one text classification sub-model and the sample words in different groups are not exactly the same; inputting the number of times the sample words in each group appear in the text to be classified into the text classification sub-model corresponding to that group, and obtaining the sub-classification result corresponding to each group; and determining the classification result of the text to be classified based on the sub-classification results corresponding to the groups. The embodiments of the present application achieve higher accuracy when classifying texts and satisfy higher-accuracy classification requirements, so that subsequent processing based on the classification results is more efficient.

Description

Technical field

[0001] The present application relates to the technical field of artificial intelligence, and in particular to a text classification method and device.

Background

[0002] Text classification has important applications in many fields. For example, classifying news texts distinguishes the texts belonging to different types of news, which helps with extracting and quickly organizing them; classifying the problem work order texts generated during software product testing helps to quickly identify the problem behind each work order and respond in time.

[0003] There are currently two main methods of text classification: the frequency method and the term frequency-inverse document frequency (Term Frequency–Inverse Document Frequency, TF-IDF) method. Both are feature extraction methods.

[0004] Among them, the classification results ...
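To make the two feature extraction methods named in the background concrete, here is a small Python sketch. It assumes one common TF-IDF variant (relative term frequency multiplied by log(N / document frequency)); the patent text shown here does not state which variant it has in mind, and the three toy documents are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def term_frequency(doc: List[str]) -> Dict[str, float]:
    """Frequency method: share of the document taken up by each word."""
    counts = Counter(doc)
    return {word: count / len(doc) for word, count in counts.items()}


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF (assumed variant): tf * log(N / df) for each word in each document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    return [
        {word: tf * math.log(n_docs / doc_freq[word])
         for word, tf in term_frequency(doc).items()}
        for doc in docs
    ]


docs = [
    ["alarm", "alarm", "service"],
    ["topology", "service"],
    ["upgrade", "service"],
]
weights = tf_idf(docs)
print(term_frequency(docs[0]))
print(weights[0])
```

A word such as "service" that appears in every document gets a TF-IDF weight of zero under this variant, while "alarm", which is frequent in one document and absent from the others, keeps a positive weight. This is the sense in which TF-IDF weakens the influence of raw word frequency.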

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G06F16/35

Inventor: 王李鹏

Owner: NEW H3C BIG DATA TECH CO LTD
