Text classification method based on graph kernel and convolutional neural network

A text classification technique based on a convolutional neural network, applied in the fields of data mining and information retrieval. It addresses the loss of semantic structure information in conventional text representations and simplifies the complex, tedious processing involved in text classification.

Status: Inactive. Publication date: 2018-08-10
BEIJING INSTITUTE OF TECHNOLOGY
Cites 3; cited by 26

AI Technical Summary

Problems solved by technology

[0007] For text classification, existing techniques mainly represent a text as a vector space model, which loses the semantic structure information of the text. The present invention proposes a text classification method based on a graph kernel and a convolutional neural network, which effectively preserves the semantic structure of the text and improves classification accuracy.
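A small illustration of why a vector space model loses structure. It uses English tokens for readability, and the directed word-to-next-word edges are only an assumed stand-in for a text graph, not the graph construction of this invention: two sentences with opposite meanings receive identical bag-of-words vectors but different edge sets.

```python
from collections import Counter

def bag_of_words(tokens):
    """Vector-space view: word counts only, word order is discarded."""
    return Counter(tokens)

def adjacency_edges(tokens):
    """Illustrative graph view (assumed): a directed edge from each word to
    the next one, so word order survives."""
    return set(zip(tokens, tokens[1:]))

a = ["dog", "bites", "man"]
b = ["man", "bites", "dog"]
print(bag_of_words(a) == bag_of_words(b))        # True: same vector
print(adjacency_edges(a) == adjacency_edges(b))  # False: different structure
```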

Examples

Embodiment

[0051] As shown in Figure 1, this embodiment consists of five steps, as follows:

[0052] Step A: convert the text into a graph structure, as shown in Figure 2.

[0053] A.1 First, perform word segmentation on the text. Chinese text is written without spaces between words, unlike Western text, where words are naturally separated, so a Chinese article must first be divided into a sequence of words. Mainstream Chinese word segmentation algorithms include forward maximum matching, reverse maximum matching, best matching, character-by-character traversal, and the optimal path method. The algorithm used here is maximum string matching, a statistics-based segmentation method: when the adjacent co-occurrence probability of two characters exceeds a threshold, the character pair is considered likely to form a word.
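A minimal sketch of statistics-based segmentation in the spirit of A.1, assuming the "adjacent co-occurrence probability" is the conditional probability P(b | a) of character b immediately following character a, estimated from an unsegmented corpus. The function names, the greedy left-to-right joining rule, and the threshold value are hypothetical; this is not the patent's exact maximum string matching procedure.

```python
from collections import Counter

def cooccurrence_probs(corpus):
    """Estimate P(b | a): how often character b immediately follows a,
    from raw (unsegmented) sentences. Assumed definition."""
    left, pairs = Counter(), Counter()
    for text in corpus:
        left.update(text[:-1])             # characters that have a right neighbour
        pairs.update(zip(text, text[1:]))  # adjacent character pairs
    return {(a, b): n / left[a] for (a, b), n in pairs.items()}

def segment(text, probs, threshold=0.5):
    """Hypothetical greedy rule: join a character to the current word when
    its co-occurrence probability with the previous character reaches
    `threshold`, otherwise start a new word."""
    if not text:
        return []
    words = [text[0]]
    for prev, ch in zip(text, text[1:]):
        if probs.get((prev, ch), 0.0) >= threshold:
            words[-1] += ch
        else:
            words.append(ch)
    return words

probs = cooccurrence_probs(["我爱北京", "北京欢迎你", "我爱你"])
print(segment("我爱北京", probs, threshold=0.6))  # ['我爱', '北京']
```

On a toy corpus, a pair seen only once after its first character gets a probability of 1.0, so a practical version would need a large corpus and a minimum pair count before trusting the estimate.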

[0054] A.2 Remove stop words, punctuation, and numbers in the text, ...
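Step A.2, together with the graph construction that Step A leads to, can be sketched as follows. The stop-word list, the punctuation and number tests, and the sliding co-occurrence window that links words into an undirected weighted graph are all assumptions for illustration; the source text is truncated before it specifies how edges are built.

```python
import re
import unicodedata
from collections import defaultdict

STOP_WORDS = {"的", "了", "是", "在"}  # hypothetical stop-word list

def is_punct(token):
    """True if every character is Unicode punctuation (categories P*)."""
    return all(unicodedata.category(ch).startswith("P") for ch in token)

def is_number(token):
    """True for digit strings such as '2018' or '3.5%' (assumed rule)."""
    return re.fullmatch(r"[\d.,%]+", token) is not None

def clean(tokens, stop_words=STOP_WORDS):
    """A.2: drop empty tokens, stop words, punctuation and numbers."""
    return [t for t in tokens
            if t.strip() and t not in stop_words
            and not is_punct(t) and not is_number(t)]

def build_word_graph(tokens, window=2):
    """Assumed construction: words are nodes; two distinct words within
    `window` consecutive tokens are linked, weighted by co-occurrence count."""
    graph = {w: defaultdict(int) for w in tokens}  # every remaining word is a node
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            u = tokens[j]
            if u != w:  # skip self-loops from repeated words
                graph[w][u] += 1
                graph[u][w] += 1
    return {w: dict(nbrs) for w, nbrs in graph.items()}

tokens = ["北京", "是", "中国", "的", "首都", "。", "2018"]
words = clean(tokens)
print(words)  # ['北京', '中国', '首都']
print(build_word_graph(words))
```

A larger `window` links words that are further apart and makes the graph denser; `window=2` links only words that are adjacent after filtering.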

Abstract

The invention relates to a text classification method based on a graph kernel and a convolutional neural network, and belongs to the technical field of data mining and information retrieval. The core concept is as follows. First, the text is preprocessed into a graph-structured representation in which nodes correspond to words in the text. Next, node weights are calculated from the graph structure, the graph is decomposed into several subgraphs using a community discovery algorithm, and a graph kernel maps the graph into a high-dimensional space to obtain a tensor representation of the graph. Finally, the tensor representation is fed into the convolutional neural network, which mines the graph features in depth and outputs the category of the text. Compared with the prior art, the method makes full use of the internal structure and contextual semantics of the text, so that the text content is fully expressed and node information is represented more reasonably, and it effectively simplifies the complex, tedious processing involved in text classification.
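The middle stages of the abstract (node weighting, community-based decomposition, and graph-kernel mapping to a tensor) can be sketched in plain Python. Every concrete choice below is an assumption, not the patent's method: weighted degree as the node weight, greedy modularity agglomeration as the community discovery algorithm, and Weisfeiler-Lehman subtree features with one refinement step as an explicit graph-kernel feature map. The CNN stage is omitted.

```python
from collections import Counter

def node_weights(graph):
    """Assumed weighting: weighted degree, normalised so weights sum to 1."""
    deg = {v: sum(nbrs.values()) for v, nbrs in graph.items()}
    total = sum(deg.values()) or 1
    return {v: d / total for v, d in deg.items()}

def greedy_modularity(graph):
    """Stand-in community discovery: repeatedly merge the pair of connected
    communities with the largest modularity gain until no merge helps."""
    m = sum(w for nbrs in graph.values() for w in nbrs.values()) / 2
    comms = [{v} for v in sorted(graph)]
    if m == 0:
        return comms
    deg = {v: sum(nbrs.values()) for v, nbrs in graph.items()}
    while True:
        best, pair = 0.0, None
        for i in range(len(comms)):
            for j in range(i + 1, len(comms)):
                w_ij = sum(graph[u].get(v, 0) for u in comms[i] for v in comms[j])
                if w_ij == 0:
                    continue
                a_i = sum(deg[u] for u in comms[i]) / (2 * m)
                a_j = sum(deg[u] for u in comms[j]) / (2 * m)
                gain = w_ij / m - 2 * a_i * a_j  # modularity change if merged
                if gain > best:
                    best, pair = gain, (i, j)
        if pair is None:
            return comms
        i, j = pair
        merged = comms.pop(j)  # j > i, so index i stays valid
        comms[i] |= merged

def wl_features(graph, nodes, weights, iterations=1):
    """Weisfeiler-Lehman subtree features of the subgraph induced by `nodes`;
    each label occurrence contributes its node's weight (assumed scheme)."""
    labels = {v: v for v in nodes}
    feats = Counter()
    for step in range(iterations + 1):
        if step:  # relabel: own label plus sorted in-subgraph neighbour labels
            labels = {
                v: labels[v] + "(" + ",".join(sorted(labels[u] for u in graph[v] if u in nodes)) + ")"
                for v in nodes
            }
        for v in nodes:
            feats[labels[v]] += weights[v]
    return feats

def graph_tensor(graph, max_subgraphs=4, iterations=1, vocab=None):
    """One row per subgraph (padded or truncated to max_subgraphs), one
    column per WL label: the matrix that would be passed to a CNN."""
    weights = node_weights(graph)
    comms = sorted(greedy_modularity(graph),
                   key=lambda c: (-sum(weights[v] for v in c), sorted(c)))[:max_subgraphs]
    rows = [wl_features(graph, c, weights, iterations) for c in comms]
    if vocab is None:  # per-document vocabulary; fix it corpus-wide in practice
        vocab = sorted(set().union(*rows))
    matrix = [[row.get(f, 0.0) for f in vocab] for row in rows]
    matrix += [[0.0] * len(vocab) for _ in range(max_subgraphs - len(matrix))]
    return vocab, matrix

# Two word triangles joined by one edge (c-d).
g = {
    "a": {"b": 1, "c": 1}, "b": {"a": 1, "c": 1}, "c": {"a": 1, "b": 1, "d": 1},
    "d": {"c": 1, "e": 1, "f": 1}, "e": {"d": 1, "f": 1}, "f": {"d": 1, "e": 1},
}
print([sorted(c) for c in greedy_modularity(g)])  # [['a', 'b', 'c'], ['d', 'e', 'f']]
vocab, matrix = graph_tensor(g)
print(len(matrix), len(vocab))  # 4 12
```

For a CNN, every document must map to the same columns, so the `vocab` argument would be built once over the training corpus (or replaced by label hashing) rather than per document as in this default.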

Description

Technical field

[0001] The invention relates to a text classification method, in particular to a text classification method based on a graph kernel and a convolutional neural network, and belongs to the technical fields of data mining and information retrieval.

Background art

[0002] With the advent of the big data era, the amount of information has exploded, and information processing has gradually shifted from traditional manual processing to automated processing. As an important information processing task, text classification aims to automatically assign unlabeled documents to a predefined set of categories, which largely resolves information clutter and enables efficient management of massive amounts of information. Text classification technology has been widely used in information filtering, information retrieval, topic detection and tracking, and other fields.

[0003] There are three main types of text clas...

Application Information

IPC(8): G06F17/30; G06F17/27
CPC: G06F16/353; G06F16/355; G06F16/36; G06F16/9024; G06F40/289
Inventors: 郭平 (Guo Ping), 张璐璐 (Zhang Lulu), 辛欣 (Xin Xin)
Owner: BEIJING INSTITUTE OF TECHNOLOGY