GCN-based text classification method

A text classification technology, applied in text database clustering/classification, neural learning methods, unstructured text data retrieval, etc., that addresses the inability of existing models to capture sequence information, such as ignoring word order.

Active Publication Date: 2020-06-12
BEIJING UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0004] The current text classification models have their own problems. For example, the fastText model does not consider the word order infor

Examples

Embodiment

[0056] The present invention provides a GCN-based text classification method, comprising:

[0057] Step 1. Write a Python script that uses the Beautiful Soup framework (an HTML/XML parsing library for Python) to extract data from CSDN blog pages, including the title, body text, publication time, article category (if any; the category is the one assigned by the author), and other content. Crawling is distributed across multiple servers that fetch site data simultaneously, which speeds up collection. In short, "crawler" technology is used to collect content mainly in the java, python, front-end, database, and other categories from CSDN blogs and to build a text classification corpus. The corpus contains N samples in total, and each sample contains a title and a passage of body text.

[0058] Step 2. Preprocess the corpus built in Step 1. The preprocessing is: load the dictionary through the jieba word segmentation component, and perform wo...
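The field extraction described in Step 1 might look roughly like the sketch below. It parses an inline HTML string rather than a live page, and the tag names and CSS classes (`h1.title`, `div.content`, `span.time`, `a.category`) as well as the helper `parse_post` are hypothetical: the source does not give the actual CSDN page markup or the script itself.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real CSDN page structure is not given in the source.
SAMPLE_HTML = """
<html><body>
  <h1 class="title">Getting started with Python decorators</h1>
  <span class="time">2019-11-02 10:15:00</span>
  <a class="category">python</a>
  <div class="content"><p>Decorators wrap a function.</p><p>They return a new callable.</p></div>
</body></html>
"""


def _text(node):
    """Return the node's whitespace-normalized text, or None if the node is missing."""
    return node.get_text(" ", strip=True) if node is not None else None


def parse_post(html):
    """Extract title, body, publication time and category from one blog page."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "title": _text(soup.find("h1", class_="title")),
        "body": _text(soup.find("div", class_="content")),
        "published": _text(soup.find("span", class_="time")),
        # The category is optional ("if any" in Step 1), so it may be None.
        "category": _text(soup.find("a", class_="category")),
    }


sample = parse_post(SAMPLE_HTML)
print(sample)
```

Returning `None` for a missing field keeps samples without an author-assigned category in the corpus instead of raising an error during crawling.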

Abstract

The invention discloses a GCN-based text classification method. The method comprises: obtaining a text classification corpus, where the corpus comprises a plurality of samples and each sample comprises a title and a chapter (body text); preprocessing the corpus and dividing it into a training set, a validation set, and a test set; processing the chapters through space and extracting a graph relationship between words; embedding each word into a low-dimensional real-valued vector space according to the graph relationship; constructing a bidirectional LSTM over the word vectors to obtain sentence representations; reconstructing the sentence representations with a self-attention mechanism, feeding them into a GCN, and training a semantic classification model; feeding the text word vectors of the validation set into the model, and recording and saving the model parameters when performance on the validation set is best; and evaluating the test set with the best model selected on the validation set to obtain the classification result. By combining LSTM and GCN with an attention mechanism, the method ultimately obtains more accurate classification results.
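The abstract names a GCN but this excerpt does not state its propagation rule. As a minimal sketch, assuming the commonly used symmetric-normalized form ReLU(D^{-1/2} (A + I) D^{-1/2} H W), one graph-convolution layer over a word graph could be written as follows. The 3-node chain graph, the feature sizes, and the random weights are illustrative assumptions, not values from the patent.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix A."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops
    deg = a_hat.sum(axis=1)              # node degrees including self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, features, weight):
    """One graph-convolution layer: ReLU(A_norm @ H @ W)."""
    return np.maximum(a_norm @ features @ weight, 0.0)


# Toy graph (assumption): three words connected in a chain 0 - 1 - 2.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)

rng = np.random.default_rng(0)
features = rng.normal(size=(3, 4))   # stand-in for per-node input vectors
weight = rng.normal(size=(4, 2))     # stand-in for a learned weight matrix

a_norm = normalize_adjacency(adj)
hidden = gcn_layer(a_norm, features, weight)
print(hidden.shape)
```

In the described pipeline the node features would come from the bidirectional LSTM and self-attention stages rather than from random numbers, and the weight matrix would be learned during training.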

Description

Technical field

[0001] The present invention relates to the technical field of text classification, in particular to a text classification method based on GCN (graph convolutional network).

Background

[0002] In the past few years, with the rapid development of science and technology, and especially of the Internet and social networks, all kinds of information have flooded the Internet. Among these platforms, the CSDN blog has grown rapidly, giving Internet technicians a place to exchange experience, share solutions to the problems they encounter, learn from one another, and keep a record of their own growth. As the platform has developed, the number of users has increased and more and more articles have been published. People can obtain a large amount of information through the platform, but how to find the law from these information, discover curren...

Application Information

IPC(8): G06F16/35; G06F40/279; G06N3/04; G06N3/08
CPC: G06F16/35; G06N3/049; G06N3/08; G06N3/045
Inventors: 张丽, 郑鑫
Owner: BEIJING UNIV OF TECH