A heterogeneous graph attribute completion method based on efficient meta-path context-aware learning
A heterogeneous graph attribute completion method based on meta-paths and context awareness, combined with an improved graph attention neural network, addresses the problem of missing attributes in heterogeneous graphs and improves both computational efficiency and the accuracy of downstream tasks.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NORTHWESTERN POLYTECHNICAL UNIV
- Filing Date
- 2023-09-22
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies ignore meta-path information and neighboring node information when completing missing attributes in heterogeneous graphs, resulting in low computational efficiency and information loss, making it difficult to effectively perform downstream tasks.
Node encoding is performed using a meta-path-based random walk and Skip-gram model, attribute completion is performed using context-aware attention, and an improved graph attention neural network is used for downstream tasks. Edge type information and residual connections are introduced to improve computational efficiency.
It effectively improves the computational efficiency of heterogeneous graph attribute completion and the accuracy of downstream tasks, captures information of non-neighboring nodes, and enhances the performance of node classification.
Figure CN117422106B_ABST
Description
Technical Field
[0001] This invention belongs to the field of graph neural network technology and specifically relates to a method for attribute completion of heterogeneous graphs.
Background Technology
[0002] With the rapid development of information technologies such as the Internet and the Internet of Things, data is becoming increasingly massive and complex, and graph data has become a research hotspot. Based on the number of node types and edge types, graphs can be divided into homogeneous graphs and heterogeneous graphs. Because heterogeneous graphs contain more comprehensive information and richer semantics, they model complex real-world systems better than traditional homogeneous graphs and are receiving increasing attention. However, in most real-world applications graph data is incomplete; typically, only a small subset of samples has complete attribute information. For example, in Amazon's co-buying graph, consumers tend to provide feedback only on specific items because of privacy concerns. During data collection, some features may be lost through human error or through mechanical or electronic malfunctions. In social networks, many users are unwilling to provide information such as address, nationality, and age in order to protect their privacy. Furthermore, real-world graphs are inherently dynamic, so newly added nodes often carry very little information. Graph representation learning on graphs with incomplete attributes is therefore becoming an increasingly important task. The problem is more serious in heterogeneous graphs because of the large number of node types involved: it is usually impossible to obtain all attributes for all types of nodes. Attribute completion for heterogeneous graphs with missing attributes is therefore all the more urgent, in order to improve performance in downstream tasks such as node classification and graph classification.
[0003] Recent work on attribute completion for heterogeneous graphs has achieved promising results and falls mainly into two categories: completion based on traditional machine learning methods, and end-to-end completion of missing information based on graph neural networks. Although graph neural network-based methods have yielded good results, they still face the following limitations: 1) Previous research relied primarily on random walks for embedding learning and neglected the role of meta-paths. This is convenient and effective for homogeneous graphs, but for heterogeneous graphs with various node types, incorporating meta-path information provides richer prior information. 2) In previous studies, attribute completion relied heavily on information from adjacent nodes, so much important information was lost. For example, on the path "movie-actor-movie", if an actor appears in two movies of the same type, the two movies can provide complementary information, which previous methods often failed to capture. 3) After attribute completion, previous methods used heterogeneous graph neural networks for downstream tasks, but computation is relatively slow because traditional heterogeneous graph neural networks require meta-paths to be included.
Summary of the Invention
[0004] To overcome the shortcomings of existing technologies, this invention provides a heterogeneous graph attribute completion method based on efficient meta-path context-aware learning. First, nodes are encoded using graph structure information and meta-path information. Next, context-aware attention is used to fill in missing attributes. Finally, the completed attributes are input into a heterogeneous graph neural network for downstream tasks. This invention effectively improves computational efficiency.
[0005] The technical solution adopted by this invention to solve its technical problem includes the following steps:
[0006] Step 1: Given a graph $G = (V, E, A, X)$, where $V = \{v_0, v_1, \ldots, v_N\}$ represents the set of $N$ nodes, $E$ represents the set of edges, $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix of the graph, and $X \in \mathbb{R}^{N \times d}$ is the node feature matrix, where $d$ represents the node feature dimension; the node set $V$ is divided into $V_{cp}$ and $V_{ms}$, where $V_{cp}$ is the set of nodes with complete attributes, $V_{ms}$ is the set of nodes with missing attributes, and the feature matrix corresponding to $V_{cp}$ is $X_{cp}$;
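[0007] Graph nodes are encoded using meta-path-based random walks; given a meta-path $P: T_1 \rightarrow T_2 \rightarrow \cdots \rightarrow T_i \rightarrow \cdots \rightarrow T_l$, where $T_i$ is the type of node $v_i$, the sampling probability at step $i$ is defined as follows:
[0008] $$p(v_i \mid v_{i-1}, P) = \begin{cases} \dfrac{1}{|N_{T_i}(v_{i-1})|}, & A_{i-1,i} = 1 \text{ and } t(v_i) = T_i \\ 0, & \text{otherwise} \end{cases} \quad (1)$$
[0009] where $|N_{T_i}(v_{i-1})|$ represents the number of nodes in the neighborhood of node $v_{i-1}$ whose type equals $T_i$; $t(v_i) = T_i$ is a mapping function that maps node $v_i$ to its corresponding type; and $A_{i-1,i} = 1$ indicates that there is an edge between node $v_{i-1}$ and node $v_i$;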
[0010] The Skip-gram model is used for graph node encoding; the sampled node sequence is treated as sentences, and each node represents a word in the sentence; given a center word, the Skip-gram model selects some context words within a window around it and then predicts the probability distribution of these context words; the training objective of the Skip-gram model is to maximize the probability of predicting context words while minimizing the probability of negatively sampled words; the objective function is expressed as:
[0011] $$\max_{\theta} \sum_{v \in V} \sum_{F \in \beta} \sum_{u \in N_F(v)} \log p(u \mid v; \theta) \quad (2)$$
[0012] where $\beta$ is the set of node types and $N_F(v)$ is the set of nodes of type $F$ among the neighbors of node $v$, obtained through random walk sampling; a softmax function $p(u \mid v; \theta)$ is used to calculate the probability that $u$ and $v$ are neighbors; finally, the embedding obtained for each node $v_i$ is denoted $E_i$, with $E_{cp}$ denoting the embeddings of nodes with complete attributes and $E_{ms}$ denoting the embeddings of nodes with missing attributes;
[0013] Step 2: After the node encodings are obtained, context-aware attention is used to complete the missing attributes;
[0014] The query is defined as the encoding matrix $E_{ms}$ of the nodes with missing attributes, the key as the encoding matrix $E_{cp}$ of the nodes with complete attributes, and the value as the feature matrix $X_{cp}$ of the nodes with complete attributes. Attention weights are calculated using the similarity between the query and the key, and these weights are then applied to the value matrix to obtain the completed attribute information; the formulas are as follows:
[0015] $X_{ac} = \mathrm{MultiHead}(E_{ms}, E_{cp}, X_{cp})$ (3)
[0016] $X_{new} = [X_{cp} \mid X_{ac}]$ (4)
[0017] Step 3: After the completed attributes $X_{new}$ are obtained, the heterogeneous graph attention neural network (GAT) is used for the downstream task of node classification.
[0018] Step 3-1: Calculate the attention weights between node pairs in the neighborhood of each node, and then use these weights to form a weighted sum of the features of neighboring nodes as the new node representation;
[0019] For node $i$ and its neighbor node $j$, GAT calculates the attention weight $e_{ij}$ between them and normalizes it to obtain $w_{ij}$. For node $i$, GAT performs a weighted summation of the feature vectors $h_j$ of all its neighboring nodes using $w_{ij}$ to obtain the new feature vector $h_i$ of node $i$. More complex graph neural networks can be built by stacking multiple GAT layers; the specific formula is as follows:
[0020]
[0021] Step 3-2: Extend the original graph attention mechanism by incorporating edge type information into the attention calculation; specifically, at each layer, assign a $d$-dimensional embedding to each edge type and use the edge type embedding and the node embeddings to calculate the attention score; the updated attention weight $w_{ij}$ is calculated as follows:
[0022]
[0023] Here, $W_r$ is a learnable transformation matrix, and the edge embedding represents the features of the edge between node $i$ and node $j$ in the $l$-th layer;
[0024] Step 3-3: Add a residual connection to the node representations between different layers; the node aggregation formula for layer $l+1$ is shown below:
[0025]
[0026] where the transformation matrix in the residual term is used when the node representation dimensions of layer $l$ and layer $l+1$ differ;
[0027] Step 3-4: After the original attention score is obtained, add a residual connection to the attention score:
[0028]
[0029] where the hyperparameter $\beta \in [0,1]$ is a scaling factor that relates the raw attention score to the new attention score;
[0030] Step 3-5: Apply L2 regularization to the output embedding:
[0031]
[0032] Preferably, the Adam optimizer is used during model training.
[0033] The beneficial effects of this invention are as follows:
[0034] This invention proposes a novel method for attribute completion in heterogeneous graphs. Unlike previous methods, this invention presents an efficient meta-path context-aware learning framework to improve the completion of missing attributes in heterogeneous graphs. Within this framework, a meta-path-driven node embedding scheme is introduced, incorporating valuable meta-path prior knowledge during node sampling. Furthermore, a context-aware attention mechanism is used to complete missing node attributes, capturing information from non-neighboring nodes. Finally, an improved graph attention network is employed for downstream tasks, effectively improving computational efficiency. Experiments demonstrate that the proposed method achieves state-of-the-art performance on multiple benchmarks.
Attached Figure Description
[0035] Figure 1 is a schematic diagram of the method of the present invention.
[0036] Figure 2 is a schematic diagram of the node classification results in an embodiment of the present invention.
Detailed Implementation
[0037] The present invention will be further described below with reference to the accompanying drawings and embodiments.
[0038] To address the aforementioned issues, this invention proposes a heterogeneous graph attribute completion method. This method can complete the attributes of heterogeneous graph nodes with missing attributes and utilizes a context-aware attention mechanism to uncover correlations between different nodes. This strategy effectively improves the accuracy of heterogeneous graphs with missing attributes in downstream tasks.
[0039] The main modules of the technical solution of this invention are as follows: the first part is a node embedding module, the second part is an attribute completion module, and the third part is a heterogeneous graph neural network module. In the first part, nodes are encoded using graph structure information and meta-path information. In the second part, missing attributes are filled in using context-aware attention. In the third part, the completed attributes are input into the heterogeneous graph neural network for downstream tasks. Each module will be described in detail in the following sections.
[0040] As shown in Figure 1, this heterogeneous graph attribute completion method includes the following main steps:
[0041] Step 1: Given a graph $G = (V, E, A, X)$, where $V = \{v_0, v_1, \ldots, v_N\}$ represents the set of $N$ nodes, $E$ represents the set of edges, $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix of the graph, and $X \in \mathbb{R}^{N \times d}$ is the node feature matrix, where $d$ represents the node feature dimension. Since the graph's attributes are incomplete, the nodes can be divided into $V_{cp}$ and $V_{ms}$, where $V_{cp}$ is the set of nodes with complete attributes, $V_{ms}$ is the set of nodes with missing attributes, and the feature matrix corresponding to $V_{cp}$ is $X_{cp}$. The goal is to complete the missing attributes of each node $v \in V_{ms}$ by combining known node attributes and graph structure features. First, graph nodes are encoded using a meta-path-based random walk. Given a meta-path $P: T_1 \rightarrow T_2 \rightarrow \cdots \rightarrow T_l$, where $T_i$ is the type of node $v_i$, the sampling probability at step $i$ is defined as follows:
[0042] $$p(v_i \mid v_{i-1}, P) = \begin{cases} \dfrac{1}{|N_{T_i}(v_{i-1})|}, & A_{i-1,i} = 1 \text{ and } t(v_i) = T_i \\ 0, & \text{otherwise} \end{cases} \quad (1)$$
[0043] where $|N_{T_i}(v_{i-1})|$ represents the number of nodes in the neighborhood of node $v_{i-1}$ whose type equals $T_i$, and $t(v_i) = T_i$ is a mapping function that maps node $v_i$ to its corresponding type.
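As a non-limiting illustration, the meta-path-guided sampling of Step 1 can be sketched in Python as follows. The function name `metapath_walk`, the dictionary-based graph representation, and the toy author/paper graph are assumptions made for this sketch only; the description does not prescribe a data structure. The sketch assumes the first and last types of the meta-path coincide so that the pattern can repeat along the walk.

```python
import random

def metapath_walk(adj, node_type, start, metapath, walk_length, rng):
    """Sample one walk from `start` that follows `metapath` cyclically.

    adj:       dict mapping a node to the list of its neighbors
    node_type: dict mapping a node to its type label
    metapath:  list of types such as ["A", "P", "A"]; first and last types match
    """
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the first meta-path type")
    period = len(metapath) - 1
    walk = [start]
    for step in range(1, walk_length):
        wanted = metapath[step % period]
        # Admissible next nodes: neighbors of the current node whose type is the
        # next type on the meta-path. Each is drawn with probability
        # 1 / len(candidates); every other node has probability 0.
        candidates = [u for u in adj[walk[-1]] if node_type[u] == wanted]
        if not candidates:
            break  # no admissible neighbor, so the walk stops early
        walk.append(rng.choice(candidates))
    return walk

# Toy graph (hypothetical): two authors, two papers.
adj = {"a1": ["p1"], "a2": ["p1", "p2"], "p1": ["a1", "a2"], "p2": ["a2"]}
node_type = {"a1": "A", "a2": "A", "p1": "P", "p2": "P"}
walk = metapath_walk(adj, node_type, "a1", ["A", "P", "A"], 5, random.Random(0))
```

Each sampled walk alternates author and paper nodes, which is the property the author-paper-author meta-path is meant to enforce.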
[0044] Then, the Skip-gram algorithm is used for graph node encoding. The Skip-gram algorithm is a word vector representation learning algorithm for natural language processing. Its goal is to learn the vector representation of each word so that words with similar semantics are close together in the vector space. The sampled node sequence can be viewed as sentences, with each node being a word in the sentence. Its core idea is to learn the vector representation of a word by predicting the surrounding context words. Specifically, for a given center word, the Skip-gram algorithm selects some context words within a window around it and then attempts to predict the probability distribution of these context words. The training objective of the model is to maximize the probability of predicting context words while minimizing the probability of negatively sampled words. The objective function can be expressed as:
[0045] $$\max_{\theta} \sum_{v \in V} \sum_{F \in \beta} \sum_{u \in N_F(v)} \log p(u \mid v; \theta) \quad (2)$$
[0046] Here, $V$ is the set of nodes, $\beta$ is the set of node types, and $N_F(v)$ is the set of nodes of type $F$ among the neighbors of node $v$, obtained through random walk sampling. A softmax function $p(u \mid v; \theta)$ is used to calculate the probability that $u$ and $v$ are neighbors. Finally, the embedding obtained for each node $v_i$ is denoted $E_i$, with $E_{cp}$ denoting the embeddings of nodes with complete attributes and $E_{ms}$ denoting the embeddings of nodes with missing attributes.
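As a non-limiting illustration, the way sampled walks become Skip-gram training data, and the softmax that scores a candidate neighbor, can be sketched as follows. The names `skipgram_pairs` and `neighbor_probability` are hypothetical. The sketch uses a single embedding matrix for both center and context nodes, which is a simplification; Skip-gram implementations commonly keep two matrices and approximate the softmax with negative sampling.

```python
import numpy as np

def skipgram_pairs(walks, window):
    """Return (center, context) pairs, treating each walk as a sentence."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo = max(0, i - window)
            hi = min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, walk[j]))
    return pairs

def neighbor_probability(emb, v):
    """Softmax over dot products: p(u | v) for every node u, with emb of shape (N, d)."""
    scores = emb @ emb[v]
    scores = scores - scores.max()  # subtract the maximum for numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()

pairs = skipgram_pairs([["a1", "p1", "a2"]], window=1)
```

With a window of 1, the walk `a1, p1, a2` yields the four pairs (a1, p1), (p1, a1), (p1, a2), and (a2, p1); training then raises the probability of these pairs relative to negatively sampled ones.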
[0047] Step 2: After the node encodings are obtained, context-aware attention is used to complete the missing attributes. The query is the encoding matrix $E_{ms}$ of the nodes with missing attributes, the key is the encoding matrix $E_{cp}$ of the nodes with complete attributes, and the value is the feature matrix $X_{cp}$ of the nodes with complete attributes. During attribute completion, a multi-head attention mechanism is used to establish relationships between nodes. Specifically, the similarity between the query and the key is used to calculate attention weights, which are then applied to the value matrix to obtain the completed attribute information. The formulas are as follows:
[0048] $X_{ac} = \mathrm{MultiHead}(E_{ms}, E_{cp}, X_{cp})$ (3)
[0049] $X_{new} = [X_{cp} \mid X_{ac}]$ (4)
[0050] By utilizing multi-head attention mechanisms, the encoding and feature information of known nodes can be used to complete the features of nodes with missing attributes in a relational manner, thereby improving the overall attribute completion effect.
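As a non-limiting illustration, formulas (3) and (4) can be sketched with NumPy as follows. The function name `complete_attributes`, the randomly initialised projection matrices standing in for learned parameters, the averaging of heads, and the row-wise stacking of $X_{cp}$ and $X_{ac}$ are all assumptions of this sketch; the description specifies only the query, key, and value roles.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def complete_attributes(E_ms, E_cp, X_cp, num_heads, rng):
    """X_ac = MultiHead(E_ms, E_cp, X_cp): query, key, value as in formula (3)."""
    d_e = E_ms.shape[1]
    heads = []
    for _ in range(num_heads):
        # Random projections stand in for learned query/key weights (assumption).
        W_q = rng.standard_normal((d_e, d_e)) / np.sqrt(d_e)
        W_k = rng.standard_normal((d_e, d_e)) / np.sqrt(d_e)
        scores = (E_ms @ W_q) @ (E_cp @ W_k).T / np.sqrt(d_e)
        weights = softmax(scores, axis=1)  # each row sums to 1 over complete nodes
        heads.append(weights @ X_cp)       # weighted mix of known attributes
    return np.mean(heads, axis=0)          # heads are averaged here (assumption)

rng = np.random.default_rng(0)
E_ms = rng.standard_normal((3, 8))   # embeddings of 3 nodes with missing attributes
E_cp = rng.standard_normal((5, 8))   # embeddings of 5 nodes with complete attributes
X_cp = rng.standard_normal((5, 4))   # known attributes, d = 4
X_ac = complete_attributes(E_ms, E_cp, X_cp, num_heads=2, rng=rng)
X_new = np.vstack([X_cp, X_ac])      # formula (4), read as row-wise stacking
```

Because every attention row is a probability distribution over all complete-attribute nodes, a completed attribute vector can draw on nodes that are not graph neighbors of the node being completed.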
[0051] Step 3: After the completed attributes $X_{new}$ are obtained, a heterogeneous graph neural network is used for the downstream task of node classification. Because the meta-path information of the heterogeneous graph has already been incorporated during node embedding, the improved graph attention neural network is used directly for node classification, which effectively improves the model's speed. It is worth noting that node embedding is performed offline and separately, so the embedding step that incorporates meta-path information does not increase the time complexity. This will be discussed in detail later.
[0052] The core idea of the graph attention neural network (GAT) is to compute attention weights between node pairs in the neighborhood of each node, and then use these weights to form a weighted sum of the features of neighboring nodes as the new node representation. Specifically, for node $i$ and its neighbor node $j$, GAT computes the attention weight $e_{ij}$ between them and normalizes it to obtain $w_{ij}$. Then, for node $i$, GAT performs a weighted summation of the feature vectors $h_j$ of all its neighboring nodes using $w_{ij}$ to obtain the new feature vector $h_i$ of node $i$. More complex graph neural networks can be built by stacking multiple GAT layers. The specific formula is as follows:
[0053]
[0054] Several techniques are added to the graph attention neural network to handle heterogeneous graphs. First, the original graph attention mechanism is extended by incorporating edge type information into the attention computation. Specifically, at each layer, a $d$-dimensional embedding is assigned to each edge type, and the attention score is computed using the edge type embedding and the node embeddings. The updated attention weight $w_{ij}$ is calculated as follows:
[0055]
[0056] Here, $W_r$ is a learnable transformation matrix, and the edge embedding represents the features of the edge between node $i$ and node $j$ in the $l$-th layer.
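As a non-limiting illustration, edge-type-aware attention weights for a single node can be sketched as follows. The function name `edge_aware_attention`, the attention vector `a`, the node projection `W`, the concatenation of the three projected vectors, and the LeakyReLU slope of 0.2 are assumptions borrowed from common graph attention practice; the description states only that edge type embeddings and node embeddings enter the attention score through a learnable matrix $W_r$.

```python
import numpy as np

def edge_aware_attention(h, i, neighbors, edge_types, W, W_r, r, a):
    """Normalised attention weights w_ij of node i over its neighbors.

    h:   (N, d_in) node features        W:   (d_in, d_out) node projection
    r:   (num_edge_types, d_edge) edge-type embeddings
    W_r: (d_edge, d_out) edge projection
    a:   (3 * d_out,) attention vector
    edge_types[k] is the type of the edge between i and neighbors[k].
    """
    scores = []
    for j, t in zip(neighbors, edge_types):
        z = np.concatenate([h[i] @ W, h[j] @ W, r[t] @ W_r])
        s = float(a @ z)
        scores.append(s if s > 0 else 0.2 * s)  # LeakyReLU, slope 0.2 (assumption)
    scores = np.array(scores)
    e = np.exp(scores - scores.max())           # softmax over the neighborhood
    return e / e.sum()

rng = np.random.default_rng(0)
h = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 5))
r = rng.standard_normal((2, 2))      # two edge types, 2-dimensional embeddings
W_r = rng.standard_normal((2, 5))
a = rng.standard_normal(15)
w = edge_aware_attention(h, 0, [1, 2, 3], [0, 1, 0], W, W_r, r, a)
```

Two neighbors with identical features but different edge types can receive different weights, which is how a single attention layer distinguishes relations without per-meta-path aggregation.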
[0057] Secondly, to address the oversmoothing problem in graph neural networks, a residual connection is added between the node representations of different layers. The node aggregation formula for layer l+1 is shown below:
[0058]
[0059] Here, the transformation matrix in the residual term is used when the node representation dimensions of layer $l$ and layer $l+1$ differ.
[0060] At the same time, after obtaining the original attention score, residual connections are also added to the attention score:
[0061]
[0062] Here, the hyperparameter $\beta \in [0,1]$ is a scaling factor that relates the raw attention score to the new attention score.
[0063] In addition, L2 regularization is applied to the output embedding.
[0064]
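As a non-limiting illustration, one set of formulas consistent with the prose of Step 3 is sketched below. It is not taken from the original formulas. The attention vector $a$, the node projection $W$, the edge embedding $r_{ij}$, the residual projection $W_{\mathrm{res}}$, the activation $\sigma$, the choice of LeakyReLU, and the output embedding $o_i$ are assumed notation. Two further assumptions are made: the attention residual is taken with respect to the previous layer's attention, and "L2 regularization of the output embedding" is read as normalizing each output vector to unit L2 norm.

```latex
\begin{align*}
e_{ij} &= \mathrm{LeakyReLU}\!\left(a^{\top}\,[\,W h_i \,\|\, W h_j \,\|\, W_r r_{ij}\,]\right), &
w_{ij} &= \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(e_{ik})} \\
h_i^{(l+1)} &= \sigma\!\Big(\sum_{j \in \mathcal{N}(i)} w_{ij}^{(l+1)} W^{(l+1)} h_j^{(l)} + W_{\mathrm{res}}^{(l+1)} h_i^{(l)}\Big) \\
\tilde{w}_{ij}^{(l+1)} &= (1-\beta)\, w_{ij}^{(l+1)} + \beta\, \tilde{w}_{ij}^{(l)} \\
o_i &= \frac{o_i}{\lVert o_i \rVert_2}
\end{align*}
```

Dropping the $W_r r_{ij}$ term from the first line gives the plain graph attention score, and setting $\beta = 0$ in the third line removes the attention residual.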
[0065] Example:
[0066] This invention provides a novel method for attribute completion in heterogeneous graphs. It uses context-aware attention to complete missing attributes, thereby improving the performance of graphs with missing attributes in node classification. The specific process is as follows:
[0067] 1. Random walk sampling based on meta-paths
[0068] Given a citation graph containing four types of nodes: author, paper, term, and conference, with a total of 26,128 nodes and 239,566 edges, the target node to be classified is the author, which falls into four categories. However, attribute information for author nodes is missing. To address this missing attribute issue, three meta-paths are identified: author-paper-author, author-paper-term-paper-author, and author-paper-conference-paper-author. Then, sampling is performed using a probabilistic sampling formula based on the given meta-paths to obtain the sampling sequence.
[0069] 2. Graph node encoding
[0070] Next, the Skip-gram algorithm is used to encode the sampled nodes. By training the Skip-gram model, a 128-dimensional encoding vector $E_i$ is obtained for each node and is used to represent the characteristics of that node.
[0071] 3. Complete missing node attributes
[0072] After the node encodings are obtained, an attribute completion module is used to handle missing attributes. The encoding matrix of the nodes with missing attributes, the encoding matrix of the nodes with complete attributes, and the node feature matrix are input into the attribute completion module, which uses context-aware attention to complete the missing attributes. Specifically, for all nodes in the graph, a feature matrix $X_{new}$ of dimension $N \times d$ is generated, where $N$ is the number of nodes and $d$ is the feature dimension, here 64, which completes the missing attributes. $X_{new}$ is then fed into the subsequent graph neural network for downstream tasks.
[0073] 4. Downstream tasks
[0074] After attribute completion, the completed feature matrix $X_{new}$ is fed into the improved graph attention neural network. Through the message propagation process in the graph attention neural network, a 64-dimensional feature vector is obtained for the target node. This feature vector is then mapped to four categories through a multilayer perceptron for node classification.
[0075] 5. Model Training
[0076] The entire training process was end-to-end. Experiments were conducted primarily on three different heterogeneous graph datasets: DBLP, ACM, and IMDB. For ACM and DBLP, the Adam optimizer was used with a learning rate of 1e-3, a weight_decay of 1e-4, and 500 training epochs. The graph attention neural network had 2 layers, and the multi-head attention mechanism had 8 heads. For IMDB, the Adam optimizer was used with a learning rate of 1e-3, a weight_decay of 2e-4, and 300 training epochs. The graph attention neural network had 5 layers, and the multi-head attention mechanism had 8 heads.
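As a non-limiting illustration, the training settings listed above can be collected in a small configuration table. The class name `TrainConfig`, its field names, and the dictionary `CONFIGS` are hypothetical; only the numeric values come from the description.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    optimizer: str
    learning_rate: float
    weight_decay: float
    epochs: int
    gat_layers: int
    attention_heads: int

# Values as stated in the embodiment; one entry per dataset.
CONFIGS = {
    "DBLP": TrainConfig("adam", 1e-3, 1e-4, 500, 2, 8),
    "ACM": TrainConfig("adam", 1e-3, 1e-4, 500, 2, 8),
    "IMDB": TrainConfig("adam", 1e-3, 2e-4, 300, 5, 8),
}
```

Keeping the per-dataset values in one place makes the two differences for IMDB visible at a glance: a larger weight decay with fewer epochs, and a deeper attention network.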
[0077] 6. Model Application
[0078] After the above training process, multiple models are obtained. The optimal model (the one with the best performance on the test set) is selected for application, using a heterogeneous graph as input. The parameters of the entire network model remain fixed; only the heterogeneous graph data needs to be input and propagated forward. The node encoding matrix is obtained first, and the missing attributes are then completed. The completed graph is fed into the heterogeneous graph neural network for node classification, directly yielding the classification results. As shown in Figure 2, this method can efficiently complete heterogeneous graphs with missing attributes and achieve good classification results.
Claims
1. A heterogeneous graph attribute completion method based on efficient meta-path context-aware learning, characterized in that the method is applied to paper citation graphs, which include four types of nodes (author, paper, term, and conference), and comprises the following steps:
Step 1: Given a paper citation graph $G = (V, E, A, X)$, where $V = \{v_0, v_1, \ldots, v_N\}$ represents the set of $N$ nodes, $E$ represents the set of edges, $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix of the graph, and $X \in \mathbb{R}^{N \times d}$ is the node feature matrix, with $d$ the node feature dimension; the node set $V$ is divided into $V_{cp}$ and $V_{ms}$, where $V_{cp}$ is the set of nodes with complete attributes, $V_{ms}$ is the set of nodes with missing attributes, and the feature matrix corresponding to $V_{cp}$ is $X_{cp}$; the set $V_{cp}$ of nodes with complete attributes includes papers, terms, and conferences, and the set $V_{ms}$ of nodes with missing attributes includes authors;
Graph nodes are encoded using meta-path-based random walks; given a meta-path $P: T_1 \rightarrow T_2 \rightarrow \cdots \rightarrow T_i \rightarrow \cdots \rightarrow T_l$, where $T_i$ is the type of node $v_i$; the meta-paths given in the paper citation graph include: author-paper-author, author-paper-term-paper-author, and author-paper-conference-paper-author; the sampling probability at step $i$ is defined as: $p(v_i \mid v_{i-1}, P) = \begin{cases} \dfrac{1}{|N_{T_i}(v_{i-1})|}, & A_{i-1,i} = 1 \text{ and } t(v_i) = T_i \\ 0, & \text{otherwise} \end{cases}$; where $|N_{T_i}(v_{i-1})|$ represents the number of nodes in the neighborhood of node $v_{i-1}$ whose type equals $T_i$, $t(v_i) = T_i$ is a mapping function that maps node $v_i$ to its corresponding type, and $A_{i-1,i} = 1$ indicates that there is an edge between node $v_{i-1}$ and node $v_i$;
The Skip-gram model is used for graph node encoding; the sampled node sequence is treated as sentences, and each node represents a word in the sentence; given a center word, the Skip-gram model selects some context words within a window around it and then predicts the probability distribution of these context words; the training objective of the Skip-gram model is to maximize the probability of predicting context words while minimizing the probability of negatively sampled words; the objective function is expressed as: $\max_{\theta} \sum_{v \in V} \sum_{F \in \beta} \sum_{u \in N_F(v)} \log p(u \mid v; \theta)$; where $\beta$ is the set of node types and $N_F(v)$ is the set of nodes of type $F$ among the neighbors of node $v$, obtained through random walk sampling; a softmax function $p(u \mid v; \theta)$ is used to calculate the probability that $u$ and $v$ are neighbors; finally, the embedding obtained for each node $v_i$ is denoted $E_i$, with $E_{cp}$ denoting the embeddings of nodes with complete attributes and $E_{ms}$ denoting the embeddings of nodes with missing attributes;
Step 2: After the node encodings are obtained, context-aware attention is used to complete the missing attributes; the query is defined as the encoding matrix $E_{ms}$ of the nodes with missing attributes, the key as the encoding matrix $E_{cp}$ of the nodes with complete attributes, and the value as the feature matrix $X_{cp}$ of the nodes with complete attributes; attention weights are calculated using the similarity between the query and the key, and these weights are then applied to the value matrix to obtain the completed attribute information; the formulas are as follows: $X_{ac} = \mathrm{MultiHead}(E_{ms}, E_{cp}, X_{cp})$; $X_{new} = [X_{cp} \mid X_{ac}]$;
Step 3: After the completed attributes $X_{new}$ are obtained, the heterogeneous graph attention neural network (GAT) is used for the downstream task of classifying author nodes in the paper citation graph; in the downstream task, the message propagation process of the GAT generates feature vectors for the author nodes in the paper citation graph, and these feature vectors are mapped to preset categories through a multilayer perceptron to complete the node classification; the message propagation process by which the GAT generates feature vectors for the author nodes is as follows:
Step 3-1: Calculate the attention weights between node pairs in the neighborhood of each node, and then use these weights to form a weighted sum of the features of neighboring nodes as the new node representation; for node $i$ and its neighbor node $j$, GAT calculates the attention weight $e_{ij}$ between them and normalizes it to obtain $w_{ij}$; for node $i$, GAT performs a weighted summation of the feature vectors $h_j$ of all its neighboring nodes using $w_{ij}$ to obtain the new feature vector $h_i$ of node $i$; more complex graph neural networks can be built by stacking multiple GAT layers; the specific formula is as follows: [formula not reproduced];
Step 3-2: Extend the original graph attention mechanism by incorporating edge type information into the attention calculation; specifically, at each layer, assign a $d$-dimensional embedding to each edge type and compute the attention score using the edge type embedding and the node embeddings; the updated attention weight $w_{ij}$ is calculated as follows: [formula not reproduced]; here, $W_r$ is a learnable transformation matrix, and the edge embedding represents the features of the edge between node $i$ and node $j$ in the $l$-th layer;
Step 3-3: Add a residual connection to the node representations between different layers; the node aggregation formula for layer $l+1$ is shown below: [formula not reproduced]; where the transformation matrix in the residual term is used when the node representation dimensions of layer $l$ and layer $l+1$ differ;
Step 3-4: After the original attention score is obtained, add a residual connection to the attention score: [formula not reproduced]; where the hyperparameter $\beta \in [0,1]$ is a scaling factor that relates the original attention score to the new attention score;
Step 3-5: Apply L2 regularization to the output embedding: [formula not reproduced].
2. The heterogeneous graph attribute completion method based on efficient meta-path context-aware learning according to claim 1, characterized in that the model is trained using the Adam optimizer.