Text multi-label classification method based on semantic unit information

Semantic-unit and classification technology, applied to text database clustering/classification, semantic tool creation, unstructured text data retrieval, and related areas. It addresses the problem that existing attention mechanisms are easily affected by noise and contribute too little to classification, and it improves accuracy with only a small increase in parameters and little additional parameter and time cost.

Active Publication Date: 2019-04-05
PEKING UNIV
Cites: 2; Cited by: 46

AI Technical Summary

Problems solved by technology

[0005] To overcome the deficiencies of the prior art described above, the present invention provides a text multi-label classification method based on semantic unit information and establishes a semantic unit multi-label classification model (Semantic Unit for Multi-label Text Classification, abbreviated SU4MLC). The model takes the attention-based sequence-to-sequence model as a baseline and improves on it by improving the content that the attention mechanism attends to, that is, the repr



Examples


Embodiment Construction

[0038] The present invention is further described below through embodiments, with reference to the accompanying drawings, without limiting the scope of the invention in any way.

[0039] The present invention provides a text multi-label classification method based on semantic unit information. It takes the attention-based sequence-to-sequence model as the baseline model and improves the content that the attention mechanism attends to: by improving the source-side representation used by the attention mechanism, it increases the contribution of the attention-based sequence-to-sequence model to text multi-label classification.

[0040] The baseline model of the proposed SU4MLC model is an attention-based recurrent neural network sequence-to-sequence model, in which the recurrent neural network is an LSTM. Figure 1 is a schematic flow chart of the method of the present invention. The specific implementation steps are as fol...
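As an illustration of the attention step in such a baseline, the following is a minimal sketch in plain Python, not the patent's implementation. It assumes dot-product scoring between the decoder state and the encoder (source-side) hidden states; the scoring function actually used is not given in the text shown here, and the names `softmax` and `attention_context` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step.

    Assumption: scores are plain dot products between the decoder state
    and each encoder hidden state. Real models often use a learned
    scoring function instead.
    """
    scores = [
        sum(d * h for d, h in zip(decoder_state, hidden))
        for hidden in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder hidden states.
    context = [
        sum(w * hidden[i] for w, hidden in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example: two source positions with 2-dimensional hidden states.
weights, context = attention_context([10.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy example the decoder state aligns with the first source position, so nearly all of the attention weight falls on it. The method described here changes what these source-side states represent, rather than the weighting step itself.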



Abstract

The invention discloses a text multi-label classification method based on semantic unit information, which comprises the following steps: establishing a semantic unit multi-label classification model, SU4MLC, which takes an attention-based recurrent neural network sequence-to-sequence model as the baseline model and improves the representation used by the attention mechanism by improving the source side; extracting semantic-unit-related information from the source-side context representation of the baseline model by using dilated convolution from deep learning, to obtain semantic unit information; combining the semantic unit information with the word-level information by using a multi-layer hybrid attention mechanism, and providing the combined information to a decoder; and decoding the label sequence with the decoder, thereby realizing text multi-label classification based on semantic unit information. The method addresses the problems that existing attention mechanisms are easily affected by noise and contribute too little to classification; it increases the contribution of the attention mechanism to text classification and solves the text multi-label classification problem more efficiently.
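The dilated (hole) convolution step can be illustrated with a minimal sketch in plain Python. It is an illustration under stated assumptions, not the patent's implementation: it works on a single scalar channel rather than on full hidden-state vectors, uses zero padding to keep the sequence length, and the kernel values and the dilation rates 1, 2, 3 are made up for the example. The function names `dilated_conv1d` and `receptive_field` are hypothetical.

```python
def dilated_conv1d(seq, kernel, dilation):
    """Same-length 1D dilated convolution over a list of floats.

    Implemented as cross-correlation (no kernel flip), as is usual in
    deep learning. Positions outside the sequence count as zero.
    """
    center = len(kernel) // 2
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Kernel taps are spaced `dilation` positions apart.
            idx = t + (j - center) * dilation
            if 0 <= idx < len(seq):
                acc += w * seq[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input positions seen by one output of stacked layers."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


# One scalar feature per source word (illustrative values).
word_level = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 1.0, 1.0]

# Stack three layers with growing dilation to cover wider spans of text.
units = word_level
for rate in (1, 2, 3):
    units = dilated_conv1d(units, kernel, rate)
```

With a kernel of size 3 and dilations 1, 2, 3, each output position covers 13 input positions, which is how stacked dilated layers can summarize units larger than single words without a matching growth in parameters.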

Description

Technical field

[0001] The invention relates to natural language processing technology, in particular to a text multi-label classification method based on semantic unit information.

Background

[0002] Text multi-label classification is a natural language processing technology that assigns labels to an input text, which is equivalent to placing the text into multiple label categories. Work in this field has strong application value, such as labeling and classifying news texts in the news domain, or classifying user information to construct user profiles.

[0003] In the past, multi-label text classification was generally treated as a multi-class classification problem. Traditional methods combined linguistic knowledge with statistical methods to perform classification. After the rise of machine learning, many methods based on machine learning algorithms, for example rank-SVM (based on SVM) and ML-KNN (based on KNN), have made great progress compare...


Application Information

IPC(8): G06F16/35, G06F16/36, G06F17/27, G06N3/04
CPC: G06F40/247, G06F40/289, G06N3/045
Inventors: 林俊旸, 苏祺, 孙栩
Owner: PEKING UNIV