
Text classification method and system based on Attention graph attention network

A text classification and attention technology, applied to text database clustering/classification, biological neural network models, unstructured text data retrieval, and related fields. It addresses the obscurity and imprecision of unstructured text and the difficulty and inaccuracy of acquiring and classifying large amounts of data.

Pending Publication Date: 2021-06-08
NORTHEAST FORESTRY UNIVERSITY
View PDF4 Cites 9 Cited by

AI Technical Summary

Problems solved by technology

[0004] The unstructured text contained in geographic texts is obscure and imprecise, and existing techniques have difficulty acquiring and accurately classifying large amounts of such data. To solve these problems, the present invention proposes a text classification method and system based on an Attention-based graph attention network. The scheme is as follows:



Examples


Specific Embodiment 1

[0048] Embodiment 1: A text classification system based on an Attention-based graph attention network. The system comprises a text collection module, a data preprocessing module, a text construction module, a feature node module, and a text classification module, connected in sequence so that each module feeds the next.

[0049] First, the text collection module collects, labels, and splits the data. Second, the data preprocessing module preprocesses the data obtained by the text collection module. Third, the text construction module takes the sentences in the text and the characters or words in the data set as nodes, establishes edges to form a graph, and introduces an attention mechanism. Next, the feature node module extracts and updates the feature vectors of adjacent nodes. Finally, the text classification module performs geographic text classification according to the existing tag da...

Specific Embodiment 2

[0050] Embodiment 2: A text classification method based on the Attention-based graph attention network. Introducing the attention mechanism improves on the ordinary graph convolution formula: each geographic text can aggregate the features of its context, which makes the geographic information in the text easier to discern.
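The difference between ordinary graph convolution and attention-based aggregation can be illustrated with a small sketch. This is not the patent's formula, which is not shown in this excerpt. It follows the commonly published graph attention formulation, omits the learned linear projection for brevity, and uses hypothetical names (`a_src`, `a_dst`, `features`, `neighbors`) and hand-picked toy values.

```python
import math

def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def leaky_relu(x, slope=0.2):
    # Slope 0.2 is the usual choice in graph attention networks (assumption).
    return x if x > 0 else slope * x

def mean_aggregate(features, neighbors, node):
    """Uniform normalization: every neighbor contributes with weight 1/|N(node)|."""
    nbrs = neighbors[node]
    dim = len(features[node])
    return [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]

def attention_aggregate(features, neighbors, node, a_src, a_dst):
    """Attention: score each neighbor, softmax the scores, take the weighted sum."""
    nbrs = neighbors[node]
    h_i = features[node]
    scores = []
    for j in nbrs:
        h_j = features[j]
        # Equivalent to a^T [h_i || h_j] with a = [a_src || a_dst].
        e = sum(a * x for a, x in zip(a_src, h_i)) + sum(a * x for a, x in zip(a_dst, h_j))
        scores.append(leaky_relu(e))
    alphas = softmax(scores)
    dim = len(h_i)
    out = [sum(alpha * features[j][d] for alpha, j in zip(alphas, nbrs)) for d in range(dim)]
    return out, alphas

# Toy graph: one text node connected to itself and two word nodes (made-up values).
features = {"doc": [1.0, 0.0], "river": [0.0, 1.0], "the": [0.5, 0.5]}
neighbors = {"doc": ["doc", "river", "the"]}

uniform = mean_aggregate(features, neighbors, "doc")
weighted, alphas = attention_aggregate(features, neighbors, "doc", [0.0, 0.0], [0.0, 2.0])
```

With these toy attention parameters the "river" neighbor receives the largest weight, so the aggregated feature leans toward it, whereas the uniform version treats all three neighbors equally.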

[0051] The overall steps of this embodiment are shown in Figure 2 and are realized through the following method steps:

[0052] S101: Collect text, label part of the data, and split the data into training data and test data;

[0053] S102: Perform word segmentation on the data and remove stop words and hard-to-recognize special characters to complete data preprocessing;

[0054] S103: Construct the text as graph-structured data, using each sentence and the characters or words in the data set as nodes, and establish edges according to the relationships between words;

[0055] S104: Construct a graph ...
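Steps S102 and S103 can be sketched roughly as follows. Several details here are assumptions rather than the patent's method: whitespace tokenization stands in for a real word segmenter, the stop-word list is a made-up placeholder, and word-to-word edges are drawn by co-occurrence within a sliding window because this excerpt does not specify the relationship used.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "in", "is", "and"}

def preprocess(text):
    """Lowercase, replace special characters with spaces, tokenize, drop stop words."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]

def build_text_graph(sentences, window=2):
    """Return (nodes, edges) with one node per sentence and one per distinct word.

    Edges connect each sentence to the words it contains, and words that
    co-occur within `window` consecutive tokens (an assumed relationship).
    """
    nodes, edges = [], set()

    def add_node(name):
        if name not in nodes:
            nodes.append(name)

    for idx, sentence in enumerate(sentences):
        tokens = preprocess(sentence)
        sent_node = f"sent:{idx}"
        add_node(sent_node)
        for i, tok in enumerate(tokens):
            word_node = f"word:{tok}"
            add_node(word_node)
            # Undirected edges are stored as sorted tuples.
            edges.add(tuple(sorted((sent_node, word_node))))
            for other in tokens[i + 1 : i + window]:
                if other != tok:
                    edges.add(tuple(sorted((word_node, f"word:{other}"))))
    return nodes, edges

nodes, edges = build_text_graph(["The Yangtze river flows east.", "The river is long!"])
```

Because both example sentences contain "river", the two sentence nodes end up connected through a shared word node, which is what lets a graph network pass context between texts.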

Specific Embodiment 3

[0089] In addition to the system and method steps described in Embodiments 1 and 2, this embodiment is realized as shown in Figure 3, in the following way:

[0090] Collect text data circulating on the network, select part of the total data for labeling, and then select 80% as the training set and the remaining 20% as the test set.
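A minimal sketch of an 80/20 split is shown below. It assumes the labeled samples fit in memory as a list of `(text, label)` pairs. The function name `split_labeled` and the fixed seed are illustrative choices, not part of the patent.

```python
import random

def split_labeled(samples, train_ratio=0.8, seed=0):
    """Shuffle labeled samples reproducibly and split them by ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Made-up labeled samples: label 1 = geographic text, 0 = other.
samples = [(f"text {i}", i % 2) for i in range(10)]
train_set, held_out = split_labeled(samples)
```

Shuffling before the cut avoids a split that simply follows collection order. A fixed seed keeps the split reproducible between runs.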

[0091] The graph data construction module S201 converts the preprocessed, serialized text data into graph data with a topological structure.

[0092] The graph attention network module S202 trains and tests on the entire graph data set, so that the initial features of each text are updated by aggregating the features of adjacent nodes.

[0093] The classification module S203 uses a fully connected layer and the softmax function to classify the updated feature vectors. There are two methods for text segmentation: character-level segmentation and word-level segmentation. Therefore, when a text sequence is converte...
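The classification step in S203 (a fully connected layer followed by softmax) can be sketched as below. The weights, bias, and feature vector are made-up values for illustration. In practice they would be learned during training, and the layer sizes used by the patent are not given in this excerpt.

```python
import math

def softmax(logits):
    """Convert logits into class probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(feature, weights, bias):
    """Fully connected layer (one weight row per class) followed by softmax."""
    logits = [sum(w * x for w, x in zip(row, feature)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    return probs.index(max(probs)), probs

# Hypothetical two-class setup: class 0 = not geographic, class 1 = geographic.
updated_feature = [1.0, 2.0]
weights = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
label, probs = classify(updated_feature, weights, bias)
```

Here the logits are 1.0 and 2.0, so the second class receives the higher probability and is returned as the predicted label.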



Abstract

The invention provides a text classification method based on an Attention graph attention network and belongs to the field of natural language processing. It aims to solve two problems in the prior art: the unstructured text contained in geographic texts is obscure and imprecise, and large amounts of data are difficult to acquire and classify. The method introduces an attention mechanism into the text graph convolution network, so that the common normalization step in the convolution operation is assigned different weights, and each node (text) to be classified learns features weighted by how important its context is to that node. Feature aggregation is carried out on a self-built geographic text data set according to contextual relations, and, guided by the labeled data, data with unknown labels is classified as geographic text or not. The method can accurately extract texts containing geographic information from a large amount of text, and so provides reliable data for downstream tasks.

Description

Technical field

[0001] The invention is a text classification method and system based on an Attention-based graph attention network. It relates in particular to the application of Attention-based graph attention in neural network text classification, and belongs to the field of natural language processing.

Background technique

[0002] A large number of texts are generated on the Internet every day, and these texts come from many fields. Most texts contain information from multiple fields. Compared with image data, the information contained in text is more obscure and more voluminous. Data in web text falls into three types: structured data, semi-structured data, and unstructured data. Structured data, in industry usage, means relational model data, that is, data managed in the form of relational database tables. Semi-structured data refers to non-relational model data with a basically fixed structure, such as log files, XML documents, JSO...


Application Information

IPC(8): G06F16/35, G06F40/289, G06N3/04
CPC: G06F16/35, G06F40/289, G06N3/045
Inventors: 景维鹏, 陈广胜, 宋先阳, 刘鹏
Owner: NORTHEAST FORESTRY UNIVERSITY