Scientific and technological paper clustering analysis method based on variational graph autoencoder and K-Means

An autoencoder and cluster analysis technique, applied in the fields of network science and machine learning, that addresses the difficulty of classifying and managing papers and their low classification accuracy. Its effects are a lower computational cost of analysis and improved accuracy.

Pending Publication Date: 2020-12-15
ZHEJIANG UNIV OF TECH
Cites: 6; Cited by: 9

AI Technical Summary

Problems solved by technology

[0005] To address the problem that, as citation networks grow ever larger, classifying and managing papers becomes more difficult and ...



Examples

Embodiment Construction

[0018] The present invention will be further described below in conjunction with the accompanying drawings.

[0019] Referring to Figures 1 to 3, a method for clustering analysis of scientific papers based on a variational graph autoencoder and K-Means includes the following steps:

[0020] Step 1: Represent the data of the scientific papers to be analyzed as a citation network $G=(V, E, F)$, where $V=\{v_1, v_2, \ldots, v_n\}$ is the set of nodes. Each node represents one scientific paper, so the number of nodes equals the total number of papers, $n=|V|$. $E$ is the set of edges: if there is a citation relationship between two papers, there is an edge between their corresponding nodes, and the edges of all papers form an $n \times n$ adjacency matrix $A$. The keyword attributes of the papers are $F=\{f_1, f_2, \ldots, f_m\}$, with $m=|F|$ attributes, and the attributes of all papers are expressed as an $n \times m$ attribute information characteristic matri...
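Step 1 can be sketched in Python with NumPy as follows. The function name `build_citation_network`, the input format (citations as pairs of paper indices, one keyword set per paper, an ordered keyword vocabulary) and the binary keyword encoding of the attribute matrix are illustrative assumptions; the step above is truncated before the feature matrix is fully defined, so the patent's actual encoding may differ. Citations are treated as undirected edges, since the step only says an edge exists when two papers have a citation relationship.

```python
import numpy as np


def build_citation_network(n_papers, citations, keywords, vocabulary):
    """Build the n x n adjacency matrix A and the n x m attribute matrix F.

    Hypothetical input format (an assumption, not taken from the patent):
      citations  -- list of (i, j) index pairs meaning paper i cites paper j
      keywords   -- list of keyword sets, one per paper
      vocabulary -- ordered list of the m distinct keywords (the attribute set F)
    """
    A = np.zeros((n_papers, n_papers), dtype=np.float32)
    for i, j in citations:
        # A citation relationship in either direction creates one undirected edge.
        A[i, j] = 1.0
        A[j, i] = 1.0

    index = {kw: k for k, kw in enumerate(vocabulary)}
    F = np.zeros((n_papers, len(vocabulary)), dtype=np.float32)
    for i, kws in enumerate(keywords):
        for kw in kws:
            F[i, index[kw]] = 1.0  # assumed binary keyword indicator
    return A, F


if __name__ == "__main__":
    A, F = build_citation_network(
        4,
        [(0, 1), (1, 2), (3, 2)],
        [{"graph"}, {"graph", "clustering"}, {"clustering"}, {"vae"}],
        ["graph", "clustering", "vae"],
    )
    print(A.sum(), F.shape)  # 6.0 (4, 3)
```

Dense matrices keep the sketch short; for a large citation network a sparse representation (for example `scipy.sparse`) would likely be preferable.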



Abstract

The invention discloses a scientific and technological paper clustering analysis method based on a variational graph autoencoder and K-Means, comprising the following steps. A citation network G=(V, E, F) is constructed from existing scientific and technological paper data. A variational graph autoencoder, consisting of an encoder and a decoder, is then built from the adjacency matrix A of citation relationships between papers and the feature matrix F of paper keyword attributes. It is trained in an unsupervised manner, with the objectives of minimizing the distance measure between the reconstructed adjacency matrix and the original adjacency matrix A and the divergence between the distribution of node representation vectors and the normal distribution, to obtain a multi-dimensional Gaussian distribution; sampling from this distribution yields a low-dimensional embedding vector z for each node. Finally, the low-dimensional embedding vectors z are clustered with the K-Means algorithm to obtain a partition of the scientific and technological papers, which is displayed visually in two dimensions after dimensionality reduction with the t-SNE algorithm. The method improves the accuracy of scientific and technological paper clustering analysis and reduces the computational cost of analysis.
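The pipeline in the abstract can be sketched with NumPy as follows. This is a hypothetical illustration, not the patent's implementation: it assumes the two-layer GCN encoder and inner-product decoder of the standard variational graph autoencoder (Kipf and Welling), binary cross-entropy as the reconstruction distance, and the KL divergence to N(0, I) as the divergence term. Training is omitted because it needs an automatic-differentiation framework, so the weights below are random; `kmeans` is a plain Lloyd's-algorithm implementation written here for illustration, and all function names are illustrative.

```python
import numpy as np


def normalize_adjacency(A):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2 (assumed, as in standard VGAE)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def vgae_forward(A, F, W0, W_mu, W_logstd, rng):
    """One forward pass: GCN encoder -> sampled embeddings z -> inner-product decoder."""
    A_norm = normalize_adjacency(A)
    H = np.maximum(A_norm @ F @ W0, 0.0)        # shared first GCN layer with ReLU
    mu = A_norm @ H @ W_mu                       # mean of the Gaussian q(z | A, F)
    log_std = A_norm @ H @ W_logstd              # log standard deviation
    # Reparameterization: z = mu + sigma * eps with eps ~ N(0, I).
    z = mu + np.exp(log_std) * rng.standard_normal(mu.shape)
    A_rec = 1.0 / (1.0 + np.exp(-(z @ z.T)))     # reconstructed adjacency sigmoid(z z^T)
    return z, mu, log_std, A_rec


def vgae_loss(A, A_rec, mu, log_std, eps=1e-9):
    """Reconstruction BCE against A plus KL(q || N(0, I)), averaged over nodes."""
    bce = -np.mean(A * np.log(A_rec + eps) + (1.0 - A) * np.log(1.0 - A_rec + eps))
    kl = -0.5 * np.mean(np.sum(1.0 + 2.0 * log_std - mu**2 - np.exp(2.0 * log_std), axis=1))
    return bce + kl


def kmeans(z, k, rng, n_iter=100):
    """Plain Lloyd's K-Means on the node embeddings z."""
    centers = z[rng.choice(len(z), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each node to its nearest center (squared Euclidean distance).
        dist = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its members; keep empty clusters in place.
        new_centers = np.array([
            z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy citation network: two triangles of mutually citing papers.
    A = np.zeros((6, 6))
    for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
        A[i, j] = A[j, i] = 1.0
    F = rng.random((6, 4))                       # toy keyword features (n=6, m=4)
    W0 = rng.normal(scale=0.1, size=(4, 8))      # untrained, random weights
    W_mu = rng.normal(scale=0.1, size=(8, 2))
    W_logstd = rng.normal(scale=0.1, size=(8, 2))
    z, mu, log_std, A_rec = vgae_forward(A, F, W0, W_mu, W_logstd, rng)
    print("loss:", vgae_loss(A, A_rec, mu, log_std))
    labels, _ = kmeans(z, k=2, rng=rng)
    print("cluster labels:", labels)
```

In practice the weights `W0`, `W_mu` and `W_logstd` would be learned by minimizing `vgae_loss`, and `k` would presumably be set to the number of paper groups sought. The two-dimensional t-SNE view could be produced with, for example, scikit-learn's `TSNE(n_components=2).fit_transform(z)`.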

Description

Technical field

[0001] The invention relates to the fields of network science and machine learning, and in particular to a method for clustering and analyzing scientific papers based on a variational graph autoencoder and K-Means.

Background art

[0002] Academic papers have a history of more than 350 years and have formed a complex, very-large-scale citation network through which knowledge flows and information spreads. The citation network implicitly contains research groups made up of the papers' authors, who share similar or related research directions, and community detection algorithms for complex networks can divide a citation network into these research groups. Besides author clustering, cluster analysis of citation networks also includes journal clustering and article clustering. The citation network is a growing scientific network whose scale keeps increasing over time. As a result, the cluster analys...


Application Information

IPC (8): G06F16/35; G06K9/62; G06N3/04
CPC: G06F16/35; G06N3/045; G06F18/23213
Inventor: 徐新黎, 刘锐, 肖云月, 杨旭华, 许营坤
Owner: ZHEJIANG UNIV OF TECH