Text classification method combining title and text attention mechanism

A text classification technique based on an attention mechanism, applicable to text database clustering/classification, unstructured text data retrieval, instrumentation, and similar areas. It addresses the low classification accuracy that results from ignoring the importance of title content, with the effect of eliminating ambiguity and improving accuracy.

Pending Publication Date: 2019-05-14
ANHUI UNIVERSITY

AI Technical Summary

Problems solved by technology

[0008] The purpose of the present invention is to provide a text classification method that combines the title and text attention mechanism, which solves the problem that the existing title-containing text clas



Embodiment Construction

[0056] To enable those skilled in the art to better understand the solutions of this application, the technical solutions in its embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments in this application, without creative work, fall within the protection scope of this application.

[0057] It should be noted that the terms "including" and "having" in the specification, the claims, and the above-mentioned drawings of this application, and any variations of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to those clearly listed st...


Abstract

The invention discloses a text classification method combining a title and body-text attention mechanism. The method comprises the following steps. First, word segmentation preprocessing is carried out on the title and the body of each document to obtain a title word set and a body word set. Word vectors are trained with a word2vec CBOW model, and a bidirectional recurrent neural network learns a representation of each word that incorporates its context semantics; the latent semantic vector of a word is obtained by concatenating its word vector with the representations of its left and right contexts. Max pooling is then applied separately to the latent semantic vectors of the words in the title word set and in the body word set to obtain a title vector and a body vector. An attention vector is obtained with the title and body-text attention mechanism. Finally, after the vector representation of the whole document is calculated, a softmax function outputs the class probabilities used to predict the category of the text. The method addresses the low accuracy of existing classification of titled texts, which either treat the title as just another part of the body or ignore the title information altogether, and thereby overlook the importance of the title content.
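The forward pass outlined in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not give the recurrence, the attention scoring function, or how the document vector is assembled, so the simple tanh recurrence, the dot-product attention (title vector scoring each body word), and the concatenation `[title; body; attention]` are all choices made here for illustration. Random vectors stand in for trained word2vec CBOW embeddings, and every name (`context_states`, `latent_vectors`, `classify`, the parameter keys) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, K, N_CLASSES = 8, 6, 5, 3  # embedding, context, latent sizes; class count

def init_params():
    """Random weights; a real model would learn these by training."""
    shapes = {
        "W_l": (H, H), "W_sl": (H, D),   # left-context recurrence
        "W_r": (H, H), "W_sr": (H, D),   # right-context recurrence
        "W_y": (K, 2 * H + D), "b_y": (K,),
        "W_o": (N_CLASSES, 3 * K), "b_o": (N_CLASSES,),
    }
    return {name: rng.normal(scale=0.1, size=s) for name, s in shapes.items()}

def context_states(E, W_ctx, W_word, reverse=False):
    """Recurrent pass giving one context vector per word.

    Forward order yields left contexts; reverse=True yields right contexts.
    ASSUMPTION: a plain tanh recurrence over the neighbouring word.
    """
    n = E.shape[0]
    C = np.zeros((n, W_ctx.shape[0]))
    prev_c, prev_e = np.zeros(W_ctx.shape[0]), np.zeros(E.shape[1])
    for i in (range(n - 1, -1, -1) if reverse else range(n)):
        C[i] = np.tanh(W_ctx @ prev_c + W_word @ prev_e)
        prev_c, prev_e = C[i], E[i]
    return C

def latent_vectors(E, p):
    """Concatenate [left context; word vector; right context], then project."""
    left = context_states(E, p["W_l"], p["W_sl"])
    right = context_states(E, p["W_r"], p["W_sr"], reverse=True)
    X = np.concatenate([left, E, right], axis=1)
    return np.tanh(X @ p["W_y"].T + p["b_y"])  # shape (n_words, K)

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def classify(title_E, body_E, p):
    Y_title, Y_body = latent_vectors(title_E, p), latent_vectors(body_E, p)
    title_vec = Y_title.max(axis=0)  # max pooling over title words
    body_vec = Y_body.max(axis=0)    # max pooling over body words
    # ASSUMPTION: dot-product attention, title vector as the query.
    weights = softmax(Y_body @ title_vec)
    attn_vec = weights @ Y_body
    # ASSUMPTION: document vector is the concatenation of the three vectors.
    doc_vec = np.concatenate([title_vec, body_vec, attn_vec])
    return softmax(p["W_o"] @ doc_vec + p["b_o"])

params = init_params()
title_E = rng.normal(size=(4, D))   # stand-in for 4 title word vectors
body_E = rng.normal(size=(20, D))   # stand-in for 20 body word vectors
probs = classify(title_E, body_E, params)
print(probs, "predicted class:", int(probs.argmax()))
```

Sharing one set of recurrent weights between title and body keeps the sketch short; whether the method shares them is not stated in the text above.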

Description

Technical field

[0001] The invention belongs to the field of natural language processing, and in particular relates to a text classification method that combines a title and body-text attention mechanism.

Background

[0002] With the rapid development of information platforms across all industries in China, the volume of online text data has become immense. This mass of text data holds significant value, but how to organize and use it efficiently has become a major problem. Text classification technology in natural language processing is an effective solution.

[0003] Text classification is the process of constructing a classifier model from existing text data and then classifying new text according to that model. Traditional text classification methods focus on two major issues: feature engineering and classifier selection. They suffer from problems such as high dimensionality and high sparseness of text representation, weak feature exp...


Application Information

IPC(8): G06F16/35, G06F17/27
Inventor 王涛
Owner ANHUI UNIVERSITY