A Text Classification Method Based on GCN

A text classification and sample technology, applied in text database clustering/classification, neural learning methods, unstructured text data retrieval, and related fields. It addresses problems in existing models such as ignoring word order information and being unable to model longer sequence information.

Active Publication Date: 2021-11-05
BEIJING UNIV OF TECH
AI Technical Summary

Problems solved by technology

[0004] Current text classification models each have shortcomings. For example, the fastText model does not consider word order information in its network structure, and the TextCNN model, although it considers word order, cannot model longer sequence information.
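
The word order limitation can be illustrated with a small sketch. It assumes a simplified bag-of-words model that averages word vectors, which is only a stand-in for the order-insensitive part of a fastText-style classifier; the function name `average_embedding` and the toy vectors are hypothetical and do not come from the source.

```python
def average_embedding(tokens, vectors):
    """Average the word vectors of a sentence (order-insensitive by construction)."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for tok in tokens:
        for i, value in enumerate(vectors[tok]):
            total[i] += value
    return [t / len(tokens) for t in total]


# Toy 2-dimensional vectors, invented for illustration only.
TOY_VECTORS = {"dog": [1.0, 0.0], "bites": [0.0, 1.0], "man": [0.5, 0.5]}

forward = average_embedding(["dog", "bites", "man"], TOY_VECTORS)
reverse = average_embedding(["man", "bites", "dog"], TOY_VECTORS)

# Both sentences map to the same vector, so a classifier built on this
# representation cannot tell them apart.
print(forward == reverse)
```

Sequence models such as an LSTM avoid this because their output depends on the order in which tokens are fed in.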

Examples

Embodiment

[0056] The present invention provides a GCN-based text classification method, comprising:

[0057] Step 1. Write a Python script that uses the Beautiful Soup library (an HTML and XML parsing library for Python) to extract data from CSDN blog pages, including the title, body text, publication time, and article category (if any; the category is the one assigned by the author). Crawling is distributed across multiple servers that fetch pages simultaneously, which speeds up collection. In short, crawler technology is used to collect posts mainly in the java, python, front-end, database, and other categories from the CSDN blog and to build a text classification corpus. The corpus contains N samples in total, and each sample contains a title and a paragraph of text.
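
The extraction part of Step 1 might look roughly like the sketch below. The source names Beautiful Soup; to stay dependency-free, this sketch swaps in the standard-library `html.parser` module instead. The class names `post-title`, `post-body`, `post-time`, and `post-category`, the sample HTML, and the helper `parse_post` are all hypothetical, since the source does not describe the actual CSDN page markup. Fetching pages and distributing the crawl are left out.

```python
from html.parser import HTMLParser

# Hypothetical CSS classes mapped to record fields; real page markup will differ.
FIELD_CLASSES = {
    "post-title": "title",
    "post-body": "text",
    "post-time": "published",
    "post-category": "category",
}
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class BlogPostParser(HTMLParser):
    """Collect the text inside elements whose class maps to a known field."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._field = None  # field currently being captured, if any
        self._depth = 0     # nesting depth inside the captured element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get a matching end tag
        if self._field is not None:
            self._depth += 1
            return
        for cls in (dict(attrs).get("class") or "").split():
            if cls in FIELD_CLASSES:
                self._field = FIELD_CLASSES[cls]
                self._depth = 1
                self.record.setdefault(self._field, [])
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._field is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self.record[self._field].append(data)


def parse_post(html):
    """Return one corpus sample as a dict of whitespace-normalized strings."""
    parser = BlogPostParser()
    parser.feed(html)
    parser.close()
    return {k: " ".join(" ".join(v).split()) for k, v in parser.record.items()}


SAMPLE_HTML = """
<article>
  <h1 class="post-title">Python list comprehensions</h1>
  <span class="post-time">2021-03-01</span>
  <span class="post-category">python</span>
  <div class="post-body"><p>A comprehension builds a list</p><p>in one expression.</p></div>
</article>
"""
post = parse_post(SAMPLE_HTML)
print(post["title"], "|", post["category"])
```

Each returned dict corresponds to one sample of the corpus; the author-assigned category, when present, can serve as the label.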

[0058] Step 2. Preprocess the corpus from Step 1. Preprocessing: load the dictionary through the jieba word segmentation component, and perform wo...

Abstract

The invention discloses a GCN-based text classification method, comprising: obtaining a text classification corpus, wherein the corpus includes a plurality of samples and each sample includes a title and a body text; preprocessing the corpus and dividing the preprocessed corpus into a training set, a validation set, and a test set; processing the text with spaCy and extracting the graph relationship between words; according to the graph relationship, embedding each word in a low-dimensional real-valued vector space; according to the vector representation of the words, constructing a bidirectional LSTM and obtaining a sentence representation; reconstructing the sentence representation with a self-attention mechanism, inputting it into the GCN, and training a semantic classification model; inputting the text word vectors of the validation set into the model and saving the model parameters that perform best on the validation set; and using that best model to classify the test set to obtain the classification result. By combining LSTM and GCN with an attention mechanism, the invention obtains more accurate classification results.
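
For readers unfamiliar with graph convolution, the following is a minimal sketch of a single GCN propagation step over a word graph. The source does not give its layer definition, so this assumes the commonly used formulation ReLU(D^-1/2 (A + I) D^-1/2 H W); the function names, the 3-word chain graph, and the feature and weight values are invented for illustration. In the described method the node features would come from the bidirectional LSTM and self-attention stages rather than being fixed by hand.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weights):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    a_norm = normalize_adjacency(adj)
    hidden = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


# Toy graph: three words connected in a chain (word 0 - word 1 - word 2).
ADJ = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
FEATURES = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one 2-d vector per word
WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]              # hypothetical learned weights

for row in gcn_layer(ADJ, FEATURES, WEIGHTS):
    print(row)
```

Each output row mixes a word's own features with those of its graph neighbors, which is how relations extracted from the parse can influence the final representation.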

Description

Technical field

[0001] The present invention relates to the technical field of text classification, in particular to a text classification method based on GCN (graph convolutional network).

Background art

[0002] In the past few years, with the rapid development of science and technology, and especially of the Internet and social networks, all kinds of information have flooded the Internet. The CSDN blog in particular has grown rapidly, providing a platform where Internet technicians can develop and communicate, share experience and solutions to the problems they encounter, learn from one another, and keep a record of their own growth. As the platform has developed, the number of users has increased and more and more articles have been published. People can obtain a large amount of information through the platform, but how to find the law from these information, discover curren...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/279, G06N3/04, G06N3/08
CPC: G06F16/35, G06N3/049, G06N3/08, G06N3/045
Inventors: 张丽, 郑鑫
Owner: BEIJING UNIV OF TECH