A text classification method and device

A text classification technology applied in the field of artificial intelligence. It addresses problems such as low classification accuracy, which hinders further text processing and fails to meet classification requirements, and it achieves high classification accuracy and high efficiency.

Active Publication Date: 2021-06-08
NEW H3C BIG DATA TECH CO LTD

AI Technical Summary

Problems solved by technology

Although the TF-IDF method weakens the impact of word frequency on the classification results, its classification accuracy is low and cannot meet higher-accuracy classification requirements, which hinders further processing of the text.



Examples


Embodiment 1

[0037] Figure 1 is a flow chart of the text classification method provided in Embodiment 1 of the present application. The method includes steps S101 to S104, wherein:

[0038] S101: Obtain the text to be classified, and determine the number of times each sample word in the sample vocabulary set appears in the text to be classified; the sample words in the sample vocabulary set are the words that the text classification sub-models use for text classification.

[0039] S102: Divide the sample words into multiple groups according to the sample words used by each of the text classification sub-models; each group corresponds to one text classification sub-model, and the sample words in different groups are not completely the same.

[0040] S103: For each group, input the number of times the sample words in that group occur in the text to be classified into the text classification sub-model corresponding to that group, and obtain the sub-...
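The steps above can be sketched in Python as follows. This is a minimal illustration, not the method as claimed: the function names, the toy vocabulary, the three rule-based sub-models, and the majority-vote combination step are all assumptions, since the excerpt does not show how the sub-classification results are combined.

```python
from collections import Counter


def count_occurrences(text_tokens, sample_vocabulary):
    """S101: number of times each sample word appears in the text."""
    counts = Counter(text_tokens)
    return {word: counts[word] for word in sample_vocabulary}


def group_counts(occurrences, groups):
    """S102: build one feature dict per sub-model from its word group."""
    return [{word: occurrences[word] for word in group} for group in groups]


def classify(text_tokens, sample_vocabulary, groups, sub_models):
    """S103, followed by a hypothetical combination step."""
    occurrences = count_occurrences(text_tokens, sample_vocabulary)
    grouped = group_counts(occurrences, groups)
    sub_results = [model(features) for model, features in zip(sub_models, grouped)]
    # Assumption: combine sub-results by majority vote.
    label = Counter(sub_results).most_common(1)[0][0]
    return label, sub_results


# Hypothetical vocabulary, groups, and sub-models, for illustration only.
vocabulary = ["alarm", "topology", "upgrade", "install"]
groups = [["alarm", "topology"], ["upgrade", "install"], ["alarm", "upgrade"]]
sub_models = [
    lambda f: "alarm management" if f["alarm"] >= f["topology"] else "topology management",
    lambda f: "installation, deployment and upgrade"
    if f["upgrade"] + f["install"] > 0
    else "alarm management",
    lambda f: "alarm management"
    if f["alarm"] > f["upgrade"]
    else "installation, deployment and upgrade",
]

tokens = "alarm raised twice alarm cleared after upgrade".split()
label, sub_results = classify(tokens, vocabulary, groups, sub_models)
print(label, sub_results)
```

The groups overlap but are not identical, which matches the requirement that the sample words in different groups are not completely the same. In practice each sub-model would be a trained classifier rather than a hand-written rule.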

Embodiment 2

[0161] This embodiment of the present application provides a method for processing problem work orders, including:

[0162] (1) Collect 4,100 problem work orders generated in 2017 and manually labeled with their actual classification results. The text content of each problem work order includes: title, brief description, solution, etc.

[0163] There are 42 corresponding actual classification results, including: "resource management", "dual machine hot standby", "operating system and database", "installation, deployment and upgrade", "DBMAN", "alarm management", "topology management", etc.

[0164] (2) Merge the text content of each problem work order into one character string, and perform word segmentation on the merged string, obtaining a total of 4601 sample words: a_1, a_2, ..., a_4601.
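Step (2) can be sketched as below. This is only an illustration under stated assumptions: the field names `title`, `description`, and `solution` are hypothetical keys, the two tickets are made up, and the regular-expression tokenizer is a stand-in for whatever word segmentation tool the embodiment actually uses.

```python
import re


def merge_ticket(ticket):
    """Join a ticket's text fields into a single string."""
    fields = ("title", "description", "solution")  # hypothetical field names
    return " ".join(ticket.get(field, "") for field in fields)


def segment(text):
    """Stand-in word segmentation: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(tickets):
    """Collect the distinct words across all tickets, in first-seen order."""
    vocabulary = {}
    for ticket in tickets:
        for word in segment(merge_ticket(ticket)):
            vocabulary.setdefault(word, len(vocabulary))
    return list(vocabulary)


# Made-up tickets, for illustration only.
tickets = [
    {"title": "Alarm not cleared", "description": "Alarm stays active", "solution": "Restart service"},
    {"title": "Upgrade failed", "description": "Service stops after upgrade", "solution": "Reinstall"},
]
vocabulary = build_vocabulary(tickets)
print(len(vocabulary), vocabulary)
```

Each distinct word plays the role of one sample word a_i; on the real data set the same procedure would yield the 4601 sample words mentioned above.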

[0165] (3) Calculate the importance score of each sample word by building a random forest model.

[0166] The importance score calculation process o...

Embodiment 3

[0180] Figure 8 is a schematic diagram of a text classification device provided in Embodiment 3 of the present application. The device includes: an acquisition module 81, a grouping module 82, and a classification module 83; wherein:

[0181] The acquisition module 81 is used to obtain the text to be classified, and determine the number of times each sample word in the sample vocabulary set appears in the text to be classified; the sample words in the sample vocabulary set are the words that the text classification sub-models use for text classification;

[0182] The grouping module 82 is used to divide the sample words into multiple groups according to the sample words used by each of the text classification sub-models; each group corresponds to one text classification sub-model, and the sample words in different groups are not completely the same;

[0183] The classification module 83 is used to input the number ...


Abstract

The present application provides a text classification method and device. The method includes: obtaining the text to be classified, and determining the number of times each sample word in the sample vocabulary set appears in the text to be classified; dividing the sample words into multiple groups according to the sample words used by each of multiple text classification sub-models, wherein each group corresponds to one text classification sub-model and the sample words in different groups are not exactly the same; inputting the number of times the sample words in each group appear in the text to be classified into the text classification sub-model corresponding to that group, and obtaining the sub-classification result corresponding to each group; and determining the classification result of the text to be classified based on the sub-classification results corresponding to the groups. The embodiments of the present application achieve higher classification accuracy when classifying texts, meet higher-accuracy classification requirements, and allow subsequent processing based on the classification results to be performed more efficiently.

Description

Technical field

[0001] The present application relates to the technical field of artificial intelligence, and in particular to a text classification method and device.

Background

[0002] Text classification has important applications in many fields. For example, classifying news texts distinguishes the texts that correspond to different types of news, which helps with extracting and quickly organizing news texts; classifying the problem work order texts generated during software product testing helps to quickly identify the problems the work orders describe and respond in a timely manner.

[0003] There are currently two main methods of text classification: the frequency method and the term frequency-inverse document frequency (TF-IDF) method. Both the frequency method and the TF-IDF method are feature extraction methods.

[0004] Among them, the classification results ...
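As background, the difference between the frequency method and TF-IDF can be illustrated with a small sketch. It uses one common TF-IDF variant, tf × log(N / df); the exact formula the application has in mind is not shown in this excerpt, and the three-document corpus is made up.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """Frequency method: relative frequency of each word in one document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def inverse_document_frequencies(corpus):
    """log(N / df), where df is the number of documents containing the word."""
    n_docs = len(corpus)
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def tf_idf(tokens, idf):
    """Weight each word's term frequency by its inverse document frequency."""
    return {word: tf * idf[word] for word, tf in term_frequencies(tokens).items()}


# Made-up corpus, for illustration only.
corpus = [
    "the alarm is active".split(),
    "the upgrade is done".split(),
    "the alarm and the topology".split(),
]
idf = inverse_document_frequencies(corpus)
weights = tf_idf(corpus[2], idf)
print(term_frequencies(corpus[2]))
print(weights)
```

In the third document "the" has the highest raw frequency, yet its TF-IDF weight is zero because it occurs in every document, while the rarer "topology" outweighs "alarm". This is the sense in which TF-IDF weakens the influence of raw word frequency.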


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F16/35
Inventor: 王李鹏
Owner: NEW H3C BIG DATA TECH CO LTD