Hierarchical text classification method and system

A hierarchical text classification technology, applied in text database clustering/classification, neural learning methods, text database indexing, etc. It addresses the problem of error stacking, improving accuracy and reducing the number of prediction errors.

Active Publication Date: 2019-12-06
JINAN UNIVERSITY

AI Technical Summary

Problems solved by technology

[0008] The purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art and to provide a hierarchical text classification method and system that can solve the problem of error stacking in hierarchical text classification. Following the tree structure of the text class labels, the construction of the classification model fully considers the relationships between each node and its sibling and parent nodes, as well as the influence of classification results between the upper and lower layers, to obtain classification results with higher accuracy.



Examples


Embodiment 1

[0051] In this embodiment, the hierarchical text classification method is based on the hierarchical structure shown in Figure 1. A classifier is trained on each non-leaf node, and the relationships between text class labels are then used to introduce the concept of an "adjusted probability matrix" between class labels. The adjusted probability matrix obtained through training is used to revise the text class labels globally, and a global hierarchical text classification model is constructed to obtain more accurate text class labels. Figure 2 shows the flow of the training phase of the method of this embodiment, and Figure 3 shows the flow of the actual classification phase. The two phases are described in detail below with reference to the accompanying drawings.
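
As a rough illustration of the global revision step in [0051], the sketch below applies an adjusted probability matrix to one preliminary probability vector. The text shown here does not give the revision formula, so everything in the sketch is an assumption: the names `revise` and `best_label` are hypothetical, the matrix is taken to be row-stochastic (entry `[i][j]` is the probability that label `i` should be adjusted to label `j`), and the revision is taken to be a vector-matrix product.

```python
from typing import List


def revise(prelim: List[float], adjust: List[List[float]]) -> List[float]:
    """Assumed revision rule: revised[j] = sum_i prelim[i] * adjust[i][j]."""
    n = len(prelim)
    if len(adjust) != n or any(len(row) != n for row in adjust):
        raise ValueError("adjust must be an n x n matrix matching prelim")
    return [sum(prelim[i] * adjust[i][j] for i in range(n)) for j in range(n)]


def best_label(probs: List[float]) -> int:
    """Index of the most probable class label (the first one on ties)."""
    return max(range(len(probs)), key=lambda j: probs[j])


# Hypothetical two-label example: the preliminary classifier favours
# label 0, but the (made-up) matrix says half of such predictions
# should really be label 1.
prelim = [0.6, 0.4]
adjust = [[0.5, 0.5],
          [0.0, 1.0]]
revised = revise(prelim, adjust)
```

With these made-up numbers the preliminary vector favours label 0, while the revised vector favours label 1, which is the kind of correction the adjusted probability matrix is meant to provide.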

[0052] The training phase of this embodiment mainly includes several steps such as obtaining the training s...

Embodiment 2

[0084] This embodiment provides a hierarchical text classification system, including:

[0085] A text acquisition module, used to obtain the training set text and the text to be classified;

[0086] A text preprocessing module, configured to preprocess each text obtained;

[0087] A text vectorization module, used to vectorize the preprocessed text, representing the words in the text in vector form;

[0088] A per-level classifier training module. The classifiers obtained are used to preliminarily predict the text class label probability vector, in which each element represents the probability that the text is assigned to the corresponding class label. Each level's classifier is constructed as follows: according to the tree-type hierarchical structure of the text class labels, number the class label nodes in the text class label hierarchical tree; taking the training text vector set and the text subsets corresponding to the categories of each layer as input, using the neural netwo...
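
The node-numbering step in [0088] can be sketched as follows. This is only an illustration under stated assumptions: the source does not say in which order nodes are numbered, so breadth-first order starting from 0 at the root is assumed, and the names `number_nodes`, `non_leaf_nodes`, and the example labels are hypothetical.

```python
from collections import deque
from typing import Dict, List


def number_nodes(children: Dict[str, List[str]], root: str) -> Dict[str, int]:
    """Number class label nodes breadth-first (assumed order), root = 0."""
    numbering: Dict[str, int] = {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        numbering[node] = len(numbering)
        # Leaves have no entry (or an empty list) in `children`.
        queue.extend(children.get(node, []))
    return numbering


def non_leaf_nodes(children: Dict[str, List[str]], root: str) -> List[str]:
    """Nodes with at least one child, i.e. where a classifier would be trained."""
    return [n for n in number_nodes(children, root) if children.get(n)]


# Hypothetical class label tree, given as parent -> list of children.
tree = {
    "root": ["science", "sports"],
    "science": ["physics", "biology"],
}
numbering = number_nodes(tree, "root")
trainable = non_leaf_nodes(tree, "root")
```

In this example one classifier would be trained at `root` and one at `science`, since `sports`, `physics`, and `biology` are leaves.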


Abstract

The invention discloses a hierarchical text classification method and system. The method comprises the following steps: numbering the class label nodes in a text class label hierarchical tree according to the tree-type hierarchical structure of the text class labels; training a classifier on each non-leaf node, where the classifier outputs a preliminary predicted class label probability vector whose elements represent the probability that the text is assigned to each class label; training an adjusted probability matrix using the relationships between text class labels, where each element of the matrix is the probability that one class label is adjusted into another class label; and performing a global correction of the text class labels through the trained adjusted probability matrix, thereby constructing a global hierarchical text classification model. Because the correction crosses the boundaries between layers of class labels, the text class prediction error rate can be reduced in each layer, and the accuracy of hierarchical text classification is improved.
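
The abstract does not say how the adjusted probability matrix is trained. Purely as a stand-in, and not the method of the patent, the sketch below estimates such a matrix from row-normalized counts of preliminary predictions against true labels on a training set. The function name `estimate_adjust_matrix`, the integer label encoding, and the identity-row fallback for labels that are never predicted are all assumptions.

```python
from typing import List


def estimate_adjust_matrix(
    predicted: List[int], actual: List[int], n_labels: int
) -> List[List[float]]:
    """Stand-in estimate: entry [i][j] is the fraction of training texts
    preliminarily labelled i whose true label is j."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    counts = [[0] * n_labels for _ in range(n_labels)]
    for p, a in zip(predicted, actual):
        counts[p][a] += 1
    matrix: List[List[float]] = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Label i was never predicted: assume no adjustment is needed.
            matrix.append([1.0 if j == i else 0.0 for j in range(n_labels)])
        else:
            matrix.append([c / total for c in row])
    return matrix


# Hypothetical training results for three labels (label 2 never predicted).
matrix = estimate_adjust_matrix([0, 0, 1, 1], [0, 1, 1, 1], 3)
```

Each row of the result sums to 1, so it can be read as a distribution over the labels that a given preliminary label should be adjusted into.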

Description

Technical field

[0001] The invention relates to the research field of computer natural language processing and text classification, in particular to a hierarchical text classification method and system.

Background

[0002] With the rapid development of Internet technology, hundreds of millions of pieces of text data are generated every day. How to manage this text data has become a very difficult problem, and text classification is one of the best ways to solve it. Research on text classification methods has a long history, and good results have been achieved when the number of class labels is relatively small and each text has a single class label. However, as the number of texts increases, the number of text class labels also increases rapidly, and each text may be assigned to categories of different granularities at the same time. In this case, direct classification algorithms struggle to meet the needs of users. When the amount of text data is in...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F16/31, G06F17/27, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/35, G06F16/322, G06N3/08, G06N3/045, G06F18/2415, G06F18/241
Inventors: 刘波, 李洋洋
Owner: JINAN UNIVERSITY