Graph neural network data sampling method and device, equipment and storage medium

A data sampling and neural network technology, applied in biological neural network models, processor architecture/configuration, instruments, etc. It addresses problems such as increased time complexity, poor data locality, and long sampling and computation times, with the effect of improving data locality and sampling efficiency.

Pending Publication Date: 2022-02-15
NAT UNIV OF DEFENSE TECH
Cites: 0; Cited by: 3

AI Technical Summary

Problems solved by technology

Real-world graph data has a complex, irregular structure. Sampling and traversing it involves irregular memory access, which makes graph data access effectively random and gives poor data locality. Poor locality causes frequent global data access and significantly increases memory access time. As a result, sampling a batch on the CPU takes longer than the model computation on the GPU, so the pipeline stages of the CPU-GPU architecture carry unbalanced workloads, which hurts pipelined training performance.
A second problem with batch sampling is that neighborhood expansion introduces significant computational overhead.
For example, training an L-layer graph neural network model requires sampling the 1st- through L-th-order neighborhoods of the target vertices in turn, so time complexity grows exponentially with the depth of the network.
This neighborhood explosion involves a large amount of random memory access, resulting in poor data locality, and the sampled data may even exceed GPU memory.
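
To make the exponential growth concrete, the following is a minimal sketch that computes an upper bound on the number of vertices touched when every vertex samples a fixed number of neighbors at each layer. The function name, the fixed `fanout`, and the assumption that no sampled vertices overlap are illustrative choices, not details taken from the patent.

```python
def max_sampled_vertices(batch_size: int, fanout: int, num_layers: int) -> int:
    """Upper bound on vertices touched by layer-wise neighbor sampling.

    Assumes (hypothetically) a fixed fanout per vertex and no overlap
    between sampled neighborhoods, so the frontier grows by a factor of
    `fanout` at every layer.
    """
    frontier = batch_size  # vertices whose neighbors are sampled next
    total = batch_size     # target vertices themselves
    for _ in range(num_layers):
        frontier *= fanout  # each frontier vertex contributes `fanout` neighbors
        total += frontier
    return total


if __name__ == "__main__":
    for layers in (1, 2, 3):
        print(layers, max_sampled_vertices(batch_size=1024, fanout=10, num_layers=layers))
```

In practice neighborhoods overlap, so the real count is lower, but the bound shows why each extra layer multiplies the sampling work.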



Embodiment Construction

[0041] At present, the existing subgraph sampling methods for accelerating Graph Neural Network (GNN) training mainly include the pipeline overlapping method and the multi-process sampling method. In the pipeline overlapping method, subgraph sampling is performed on the CPU and graph neural network model computation is performed on the GPU, and the two run in a pipelined manner, which hides part of the subgraph sampling time. The disadvantage of this method is that, because of random memory access and the exponentially expanding neighborhood during sampling, subgraph sampling takes much longer than the model computation, so the pipeline stages are very unbalanced and pipeline efficiency suffers. In the multi-process sampling method, some graph neural network training frameworks use multi-process sampling on top of the CPU-GPU pipeline training ...
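
The pipeline overlapping idea described above can be sketched as a producer-consumer loop: one thread samples subgraphs while the main thread consumes them. This is only an illustration under stated assumptions; `run_pipeline`, `sample_fn`, `compute_fn`, and `prefetch` are hypothetical names, and the real GPU computation is stood in for by an ordinary Python callable.

```python
import queue
import threading


def run_pipeline(batches, sample_fn, compute_fn, prefetch=2):
    """Overlap sampling (producer thread) with model computation (consumer).

    `sample_fn` stands in for CPU subgraph sampling and `compute_fn` for
    the GPU model step; both are placeholders for illustration.
    """
    buffer = queue.Queue(maxsize=prefetch)  # bounded buffer between stages
    done = object()                         # sentinel marking end of input

    def producer():
        try:
            for batch in batches:
                buffer.put(sample_fn(batch))
        finally:
            buffer.put(done)  # always unblock the consumer

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    results = []
    while True:
        item = buffer.get()
        if item is done:
            break
        results.append(compute_fn(item))
    worker.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3], lambda b: b * 2, lambda s: s + 1))
```

If `sample_fn` is much slower than `compute_fn`, the consumer spends most of its time blocked on `buffer.get()`, which is the imbalance the paragraph describes.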


Abstract

The invention discloses a graph neural network data sampling method and device, equipment and a storage medium. In this scheme, the vertices of the original graph data set are clustered, and the training vertices are sorted by cluster category, so that each batch of training vertices is drawn from the same cluster during sampling, which improves the data locality of sampling. Moreover, training vertices in the same cluster generally have similar attributes and are closely connected, while connections between different clusters are few. The neighborhood vertices expanded from a batch are therefore concentrated in the same cluster, and vertices in the same cluster are stored close together, so sampling data locality is improved, the range of neighborhood expansion is limited, and the efficiency of subgraph sampling is improved.
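
The batching step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the cluster labels are assumed to be given (the excerpt does not say which clustering algorithm is used), and `cluster_sorted_batches` and its parameters are hypothetical names.

```python
from typing import Dict, List, Sequence


def cluster_sorted_batches(
    train_vertices: Sequence[int],
    cluster_of: Dict[int, int],
    batch_size: int,
) -> List[List[int]]:
    """Order training vertices by cluster label, then cut fixed-size batches.

    `cluster_of` maps vertex id -> cluster id and is assumed to come from
    some prior clustering step that is not shown here.
    """
    # Sort by cluster first, vertex id second, so each cluster is contiguous.
    ordered = sorted(train_vertices, key=lambda v: (cluster_of[v], v))
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


if __name__ == "__main__":
    labels = {0: 1, 1: 0, 2: 1, 3: 0, 4: 2, 5: 2}
    print(cluster_sorted_batches([0, 1, 2, 3, 4, 5], labels, batch_size=2))
```

With this simple slicing, a batch can straddle two clusters when a cluster's size is not a multiple of `batch_size`; a stricter variant would start a new batch at every cluster boundary.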

Description

Technical field

[0001] The present invention relates to the technical field of graph data sampling, and more specifically, to a graph neural network data sampling method, device, equipment and storage medium.

Background technique

[0002] At present, graph data, as a kind of unstructured data, is widely used in recommender systems, social networks, knowledge graphs and other fields. Graph neural networks have emerged as powerful tools for processing graph data. Unlike image recognition and sentence processing, the existing graph neural network training process needs to sample each batch of training samples on the CPU (central processing unit) before data loading, and then perform the model computation on the GPU (graphics processing unit). However, real-world graph data has a complex, irregular structure. The sampling and traversal of graph data involves irregular memory access, which lea...


Application Information

IPC(8): G06K9/62, G06V10/774, G06N3/063, G06T1/20
CPC: G06N3/063, G06T1/20, G06F18/23, G06F18/214, Y02D10/00
Inventor: 李东升, 张立志, 赖志权, 刘锋, 黄震, 乔林波, 梅松竹, 牛新
Owner NAT UNIV OF DEFENSE TECH