A motif-based temporal academic hypergraph network reasoning method

By constructing a temporal academic hypergraph network and utilizing motif features and self-attention mechanisms, the problem that traditional models cannot represent higher-order relationships is solved, thereby improving the learning and inference accuracy of the academic network.

CN117350385B · Active · Publication Date: 2026-06-16 · CENT SOUTH UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CENT SOUTH UNIV
Filing Date
2023-10-19
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Traditional temporal network models cannot effectively reflect the higher-order relationships in academic networks, resulting in insufficient learning of the network and poor inference and prediction results.

Method used

A temporal academic hypergraph network is constructed using a Motif mining algorithm. By embedding and aggregating Motif features, combined with hypergraph convolution and self-attention mechanisms, temporal features are extracted and used for academic network inference.

Benefits of technology

The hypergraph representation method was optimized, the structural information of the academic network was preserved, the accuracy and information content of the academic network reasoning were improved, and the learning ability of the model was enhanced.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a Motif-based temporal academic hypergraph network reasoning method comprising the following steps. Step S1: construct the hypergraph of the temporal academic network graph. Step S2: embed and aggregate the Motif features of the temporal academic network graph. Step S3: extract the temporal features of the nodes. Step S4: perform academic network reasoning: a feature decoder decodes the temporal features output by the temporal feature extraction layer and outputs the final probability matrix, and reasoning over the academic network is realized based on the probability matrix and the academic reasoning loss function. The application introduces the Motif structure into temporal hypergraph link prediction, which optimizes the hypergraph structure representation and adds Motif information to improve the prediction effect. In addition, the application integrates Motif embedding into the hypergraph convolution, so that information propagation is accompanied by Motif features, which increases the amount of information in the convolution process and makes the aggregated information more complete.

Description

Technical Field

[0001] This invention relates to the field of academic network data processing technology, specifically to a temporal academic hypergraph network inference method based on Motif.

Background Technology

[0002] Academic network data plays a crucial role in current scientific research. Academic networks encompass collaborations, citation relationships, and research directions among scholars. This information not only helps us gain a deeper understanding of disciplinary development trends but also provides strong support for the management of academic resources and research collaboration. For example, research institutions can use academic data for analysis and reasoning in areas such as strategic planning, discipline development, and talent selection, uncovering hidden academic collaborations and research trends to facilitate strategic planning and discipline development. Furthermore, research institutions can conduct comprehensive evaluations and rankings of scholars' research achievements, thereby enabling targeted selection and training of research talent. Therefore, how to efficiently and accurately utilize academic network data to extract multi-dimensional characteristics is a research topic that urgently needs to be addressed.

[0003] Academic networks typically exhibit temporal characteristics, and temporal sequence network models can effectively learn these features, thereby understanding the network's evolution rules. Traditional temporal sequence network models often rely on ordinary graphs for modeling. However, the node relationships in ordinary graphs cannot reflect higher-order relationships. For example, co-authors in a collaborator network cannot be represented in an ordinary graph based on binary relationships. Furthermore, traditional methods do not fully utilize network structural information. These issues result in practical network models lacking certain features, leading to insufficient learning and poor inference and prediction results.

[0004] In summary, there is an urgent need for a Motif-based temporal academic hypergraph network inference method to address the problems existing in current technologies.

Summary of the Invention

[0005] The purpose of this invention is to provide a temporal academic hypergraph network inference method based on Motif, and the specific technical solution is as follows:

[0006] A temporal academic hypergraph network inference method based on Motif includes the following steps:

[0007] Step S1: Construct the hypergraph of the temporal academic network graph:

[0008] A discrete snapshot graph sequence is used to represent the time-series academic network graph. For each academic snapshot network, a Motif mining algorithm counts, for each node, the number of Motifs of each type in the network. Each hyperedge is formed from a node together with the other nodes of the Motifs that have that node as a vertex, and the snapshot is thereby transformed into a hypergraph.

[0009] Step S2: Motif feature embedding and aggregation:

[0010] Motif features are obtained from the academic snapshot network. The motif features are input into the neural network to capture the nonlinear relationship between motif features and the number of motifs. A soft attention mechanism is used to process the motifs of all nodes, learn the embedding weights of each motif, and aggregate nodes, motifs and hyperedges based on the hypergraph convolution method to obtain node information.

[0011] Step S3: Temporal Feature Extraction:

[0012] The feature matrix of a node at a given time is obtained from the set of nodes in the hyperedges to which that node belongs. This feature matrix is input into the temporal feature extraction layer to obtain the final temporal feature representation of the node at that time. The final temporal feature representation is then aggregated with the embeddings of the Motifs surrounding the node to obtain the output of the temporal feature extraction layer.

[0013] Step S4: Academic Network Reasoning and Deduction:

[0014] A feature decoder is used to decode the output of the temporal feature extraction layer, and the final probability matrix is output. The inference and deduction of the academic network are realized based on the probability matrix and the academic inference loss function.

[0015] Preferably, in step S1, the expression for the Motif storage structure is as follows:

[0016]

[0017] Where D_k represents the numbers of the various Motif types for node v_k in the network; || denotes concatenation; the concatenated term denotes, for any node v_k in graph G, the number of Motifs of type m_x; and n is the number of Motif types in the network graph.

[0018] Preferably, in step S2, the Motif feature expression is as follows:

[0019]

[0020] Where the output of the expression represents the Motif feature for edge e_i; E_i is the feature of edge e_i in the Motif; F_i1 and F_i2 are the features of node v_i1 and node v_i2 connected by edge e_i, respectively; and α_i indicates whether edge e_i exists: its value is 1 if the edge exists and 0 otherwise.

[0021] Preferably, in step S2, the non-linear relationship between Motif features and the number of Motifs is expressed by the following formula:

[0022]

[0023] Where the output of the expression represents the non-linear relationship between Motif features and the number of Motifs; ReLU is the ReLU activation function, f(x) = max(0, x); and w_0, b_0, w_1, and b_1 are learnable parameters.

[0024] Preferably, in step S2, the expression for the Motif embedding weights is as follows:

[0025]

[0026] Where the output of the expression represents the Motif embedding weights of node v_k; σ is the activation function; the two inputs are the average embedding of all Motifs of node v_k and the full Motif embedding representation of node v_k; and f, w_1, w_2, and b are learnable parameters.

[0027] Preferably, in step S2, the expression for the node information is as follows:

[0028]

[0029] Where X^(l) is the node information of the l-th hypergraph convolutional layer; σ is the activation function; D_v is the node degree matrix; H is the hyperedge matrix; D_M is the Motif count matrix; M is the number of nodes required for each node's Motif embedding; and Θ is a learnable parameter.

[0030] Preferably, in step S3, the specific process for obtaining the final expression of the temporal features is as follows:

[0031] The feature matrix of the node at that time is passed through three projection matrices W_Q, W_K, and W_V to obtain Q(t_i), K(t_i), and V(t_i), where the input is the feature matrix of the node at that time, and Q(t_i), K(t_i), and V(t_i) are the outputs of the projection matrices W_Q, W_K, and W_V, respectively;

[0032] A self-attention mechanism is used to generate temporal features at each snapshot time. The self-attention results are summed row-wise to obtain the temporal feature matrix, as shown in the following expression:

[0033]

[0034] Where the output of the expression represents the temporal feature matrix; attn denotes the self-attention mechanism, computed with the softmax function; T_i denotes a row vector of the resulting matrix; and d is the scaling factor.

[0035] The temporal feature matrix is input into the neural network to obtain the final representation of the temporal features at that moment, as shown in the following expression:

[0036]

[0037] Preferably, in step S3, the expression output by the temporal feature extraction layer is as follows:

[0038]

[0039] Where I_j(t_i) denotes the output of the temporal feature extraction layer; one term denotes the final temporal feature representation of the N-th layer; and the other term denotes the embeddings of the Motifs surrounding the node.

[0040] Preferably, in step S4, the final probability matrix expression is as follows:

[0041] P_t = tanh(ReLU(I(t)w + b));

[0042] Where I(t) is the final temporal representation of all nodes at time t; w and b are both learnable parameters; P_t is the probability matrix, and P_t(i, j) is the link probability between node i and node j at time t.

[0043] Preferably, in step S4, the expression for the academic inference loss function is as follows:

[0044]

[0045] Where Loss is the academic reasoning loss; A_t is the adjacency matrix of snapshot G_t at time t; and ||·||_2 denotes the 2-norm.

[0046] The application of the technical solution of the present invention has the following beneficial effects:

[0047] (1) This invention introduces the Motif structure into hypergraph construction, which addresses the loss of structural information in previous hypergraph construction methods, optimizes the hypergraph representation, and allows the hypergraph's modeling of higher-order relationships to include the structural features and interaction-pattern features contained in the Motifs.

[0048] (2) This invention integrates Motif embedding into hypergraph convolution, improving the representation of Motifs and preserving information such as the interaction patterns of intersecting nodes. It also modifies the original hypergraph convolution so that information propagation is accompanied by Motif features, which addresses the insufficient learning of hidden structural information, increases the amount of information in the convolution process, and improves inference accuracy.

[0049] In addition to the objectives, features, and advantages described above, the present invention has other objectives, features, and advantages. The invention will now be described in further detail with reference to the figures.

Attached Figure Description

[0050] The accompanying drawings, which form part of this application, are used to provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an undue limitation of the invention. In the drawings:

[0051] Figure 1 is a flowchart of the steps of the temporal academic hypergraph network inference method in a preferred embodiment of the present invention;

[0052] Figure 2 is a schematic diagram of the Motif types in the Cora dataset in a preferred embodiment of the present invention;

[0053] Figure 3 shows schematic diagrams of the structures of the undirected graph Motifs (A) and the directed graph Motifs (B);

[0054] Figure 4 is a schematic diagram of heterogeneous and homogeneous network structures;

[0055] Figure 5 is a schematic diagram showing the differences between Motif m1 in homogeneous networks (A) and heterogeneous networks (B);

[0056] Figure 6 is a diagram illustrating the types of Motifs in a citation network;

[0057] Figure 7 is a partial schematic diagram of the citation network.

Detailed Implementation

[0058] Traditional temporal academic network modeling is usually based on binary relations in ordinary graphs. However, in reality, relations in academic networks are often not limited to binary relationships, making it difficult for ordinary graphs to represent the complex relations in academic networks. Using hypergraphs to model academic networks and leveraging hypergraph neural networks for aggregation and propagation learning of temporal and node information can effectively solve the problems of complex relation representation and temporal network link prediction. However, hypergraphs generally treat nodes inside the constructed hyperedges as connected, meaning the structure of nodes inside the hyperedges is not well represented. This leads to a lack of network structure information after solving the relation modeling problem.

[0059] This invention addresses the time-varying nature and high-order interaction modeling requirements of academic networks by proposing a motif-based hypergraph construction method. This method constructs a complete temporal academic network hypergraph by taking snapshots of the academic network at each time point. While the hypergraph itself contains high-order information, the node structure within its hyperedges is often neglected. Using motifs for hyperedge construction ensures the preservation of node structural information during hypergraph construction. This transforms the complexity of the temporal changes in academic networks into the simplicity of the hypergraph structure without losing structural information, preparing for subsequent feature extraction and hypergraph learning.

[0060] The embodiments of the present invention will be described in detail below with reference to the accompanying drawings. However, the present invention can be implemented in many different ways as defined and covered by the claims.

[0061] Example:

[0062] Referring to Figure 1, this embodiment discloses a temporal academic hypergraph network inference method based on Motif, including the following steps:

[0063] Step S1: Construct the hypergraph of the temporal academic network graph:

[0064] This embodiment performs inference and deduction on an undirected citation network dataset (specifically the Cora dataset, whose Motif types are shown in Figure 2). A discrete snapshot graph sequence is used to represent the time-series academic network graph. For each academic snapshot network, a Motif mining algorithm (this embodiment uses a prior-art Motif mining algorithm, which is not a subject of protection in this embodiment and is not described in detail here) counts, for each node, the number of Motifs of each type in the network. Each hyperedge is formed from a node together with the other nodes of the Motifs that have that node as a vertex, and the snapshot is thereby transformed into a hypergraph.

[0065] Specifically, the expression for the Motif storage structure is as follows:

[0066]

[0067] Where D_k represents the numbers of the various Motif types for node v_k in the network; || denotes concatenation; the concatenated term denotes, for any node v_k in graph G, the number of Motifs of a given type; and n is the number of Motif types in the network graph.
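As an illustration of step S1, the following is a minimal Python sketch rather than the patented mining algorithm, which the text leaves to the prior art. It assumes only two undirected 3-node Motif types ("wedge", an open path centred on the node, and "triangle"), and it assumes one hyperedge per node made of the node plus every other node in its counted Motif instances. The function name `motif_counts_and_hyperedges` and the toy graph are hypothetical.

```python
from itertools import combinations

def motif_counts_and_hyperedges(nodes, edges):
    """Count 3-node Motifs per node and build one hyperedge per node.

    Assumptions (not taken from the patent text): two undirected Motif
    types, "wedge" and "triangle"; the hyperedge of a node is the node
    plus every other node in a Motif instance counted for it.
    """
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    counts = {}      # node -> [n_wedge, n_triangle], the concatenated D_k
    hyperedges = {}  # node -> set of nodes forming its hyperedge
    for v in nodes:
        wedge = triangle = 0
        members = {v}
        # Every pair of neighbours of v forms a 3-node Motif with v.
        for a, b in combinations(sorted(adj[v]), 2):
            if b in adj[a]:
                triangle += 1
            else:
                wedge += 1
            members.update((a, b))
        counts[v] = [wedge, triangle]
        hyperedges[v] = members
    return counts, hyperedges

# One toy snapshot: a triangle a-b-c with a pendant node d attached to c.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
counts, hyperedges = motif_counts_and_hyperedges(nodes, edges)
```

Running the same function on each snapshot of the sequence would give one hypergraph per time step; whether snapshots are cumulative is not stated in the text and is left open here.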

[0068] Step S2: Motif feature embedding and aggregation:

[0069] The expression for obtaining motif features from the Academic Snapshot Network is as follows:

[0070]

[0071] Where the output of the expression represents the Motif feature for edge e_i; E_i is the feature of edge e_i in the Motif; F_i1 and F_i2 are the features of node v_i1 and node v_i2 connected by edge e_i, respectively; and α_i indicates whether edge e_i exists: its value is 1 if the edge exists and 0 otherwise.

[0072] Motif features are input into a neural network to capture the non-linear relationship between motif features and the number of motifs, as shown in the following expression:

[0073]

[0074] Where the output of the expression represents the non-linear relationship between Motif features and the number of Motifs; ReLU is the ReLU activation function, f(x) = max(0, x); and w_0, b_0, w_1, and b_1 are learnable parameters.
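The expression itself is not reproduced in this text. Given that it uses ReLU and the four parameters w_0, b_0, w_1, and b_1, one plausible reading is a two-layer perceptron. The NumPy sketch below assumes that form; the function name `motif_count_mlp`, the layer sizes, and the input values are hypothetical.

```python
import numpy as np

def relu(x):
    """ReLU activation, f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def motif_count_mlp(x, w0, b0, w1, b1):
    """Assumed form: a linear layer, ReLU, then a second linear layer."""
    return relu(x @ w0 + b0) @ w1 + b1

rng = np.random.default_rng(0)
x = np.array([[2.0, 1.0], [0.0, 0.0]])      # hypothetical per-node Motif inputs
w0, b0 = rng.normal(size=(2, 4)), np.zeros(4)
w1, b1 = rng.normal(size=(4, 3)), np.zeros(3)
out = motif_count_mlp(x, w0, b0, w1, b1)    # shape (2, 3)
```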

[0075] A soft attention mechanism is used to process the motifs of all nodes, learning the embedding weights of each motif, as shown in the following expression:

[0076]

[0077] Where the output of the expression represents the Motif embedding weights of node v_k; σ is the activation function; the two inputs are the average embedding of all Motifs of node v_k and the full Motif embedding representation of node v_k; and f, w_1, w_2, and b are learnable parameters.
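The weight expression is likewise missing from this text, so the following is only a sketch of a common soft-attention pattern that uses the same ingredients: an average embedding, the individual Motif embeddings, an activation σ, and parameters f, w_1, w_2, and b. The scoring form, the choice of tanh for σ, and the softmax normalisation are assumptions, and `soft_attention_weights` is a hypothetical name.

```python
import numpy as np

def soft_attention_weights(motif_emb, f, w1, w2, b):
    """Return one weight per Motif of a node; the weights sum to 1.

    motif_emb: (n_motifs, d) array with the embeddings of one node's Motifs.
    Assumed scoring form: f . sigma(mean_emb @ w1 + emb_i @ w2 + b).
    """
    mean_emb = motif_emb.mean(axis=0)                     # average embedding
    hidden = np.tanh(mean_emb @ w1 + motif_emb @ w2 + b)  # sigma assumed to be tanh
    scores = hidden @ f
    scores = scores - scores.max()                        # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 4))                 # 5 Motifs, embedding size 4
w1, w2 = rng.normal(size=(4, 6)), rng.normal(size=(4, 6))
b, f = np.zeros(6), rng.normal(size=6)
weights = soft_attention_weights(emb, f, w1, w2, b)
```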

[0078] The node information is obtained by aggregating nodes, motifs, and hyperedges using a hypergraph convolution method, as shown in the following expression:

[0079]

[0080] Where X^(l) is the node information of the l-th hypergraph convolutional layer; σ is the activation function; D_v is the node degree matrix, where D_v(n, n) is the degree of node n; H is the hyperedge matrix, where H(n, m) = 1 indicates that node n is inside hyperedge m and H(n, m) = 0 indicates that it is not; D_M is the Motif count matrix, where D_M(n, m) is the number of Motifs of the m-th type for node n; M is the matrix of node counts required for each node's Motif embedding, where each row holds the node counts of all Motif instances surrounding that node; and Θ is a learnable parameter.
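For orientation, the sketch below implements the widely used baseline hypergraph convolution, X' = σ(D_v^(-1/2) H D_e^(-1) H^T D_v^(-1/2) X Θ), with ReLU as σ and unit hyperedge weights. It is not the expression of this embodiment: the Motif terms D_M and M are omitted because their placement cannot be recovered from this text, and the hyperedge degree matrix D_e is an assumption taken from the baseline formulation.

```python
import numpy as np

def hypergraph_conv(X, H, Theta):
    """One baseline hypergraph convolution layer (no Motif terms)."""
    H = np.asarray(H, dtype=float)
    dv = H.sum(axis=1)                 # node degrees, the diagonal of D_v
    de = H.sum(axis=0)                 # hyperedge degrees, the diagonal of D_e
    dv_inv_sqrt = np.zeros_like(dv)
    dv_inv_sqrt[dv > 0] = dv[dv > 0] ** -0.5
    de_inv = np.zeros_like(de)
    de_inv[de > 0] = 1.0 / de[de > 0]
    left = dv_inv_sqrt[:, None] * H * de_inv[None, :]   # D_v^-1/2 H D_e^-1
    right = H.T * dv_inv_sqrt[None, :]                  # H^T D_v^-1/2
    return np.maximum(0.0, left @ right @ X @ Theta)    # ReLU as sigma

# Three nodes, two hyperedges: {0, 1} and {1, 2}.
H = np.array([[1, 0], [1, 1], [0, 1]])
X = np.eye(3)
Theta = np.eye(3)
out = hypergraph_conv(X, H, Theta)
```

With identity features and parameters, `out` is the propagation matrix itself, which makes it easy to see that nodes 0 and 2 exchange no information directly because they share no hyperedge.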

[0081] Step S3: Temporal Feature Extraction:

[0082] The feature matrix of a node at a given time is obtained from the set of hyperedge points of that node. The feature matrix of that node is then input into the temporal feature extraction layer to obtain the final temporal feature representation of that node.

[0083] The specific process of obtaining the final representation of temporal features is as follows:

[0084] The feature matrix of the node at that time is passed through three projection matrices W_Q, W_K, and W_V to obtain Q(t_i), K(t_i), and V(t_i), where the input is the feature matrix of the node at that time, and Q(t_i), K(t_i), and V(t_i) are the outputs of the projection matrices W_Q, W_K, and W_V, respectively;

[0085] It should be noted that the feature matrix of the node at this time point is represented as follows:

[0086]

[0087] Where X_i^(l) is the feature of node v_i output by the l-th hypergraph convolutional layer; for node v_j at a given time t_i, N(v_j; t_i) = {v_1, …, v_n} is defined as the set of nodes in the hyperedges to which node v_j belongs at time t_i.

[0088] A self-attention mechanism is used to generate temporal features at each snapshot time. The self-attention results are summed row-wise to obtain the temporal feature matrix, as shown in the following expression:

[0089]

[0090] Where the output of the expression represents the temporal feature matrix; attn denotes the self-attention mechanism, computed with the softmax function; T_i denotes a row vector of the resulting matrix; and d is the scaling factor.
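The projection and attention steps can be sketched with standard scaled dot-product attention. This is an interpretation, since the expression is not reproduced here: scaling by the square root of the key dimension and reading "summed row-wise" as adding the row vectors T_i into one vector are both assumptions, and `self_attention_summary` is a hypothetical name.

```python
import numpy as np

def self_attention_summary(X, W_Q, W_K, W_V):
    """Scaled dot-product self-attention followed by a sum over rows.

    X: (n, d_in) feature matrix of a node at one snapshot time.
    """
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    d = K.shape[1]                                  # scaling factor d
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)         # softmax over each row
    T = attn @ V                                    # row i is the vector T_i
    return T.sum(axis=0)                            # add the rows T_i together

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
W_Q, W_K, W_V = (rng.normal(size=(3, 5)) for _ in range(3))
summary = self_attention_summary(X, W_Q, W_K, W_V)  # shape (5,)
```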

[0091] The temporal feature matrix is input into the neural network to obtain the final representation of the temporal features at that moment, as shown in the following expression:

[0092]

[0093] The final representation of the temporal features is aggregated with the surrounding motifs of the node to obtain the output of the temporal feature extraction layer, as shown in the following expression:

[0094]

[0095] Where I_j(t_i) denotes the output of the temporal feature extraction layer; one term denotes the final temporal feature representation of the N-th layer; and the other term denotes the embeddings of the Motifs surrounding the node.

[0096] Step S4: Academic Network Reasoning and Deduction:

[0097] A feature decoder is used to decode the output of the temporal feature extraction layer, and the final probability matrix is output. The inference and deduction of the academic network are realized based on the probability matrix and the academic inference loss function.

[0098] The final probability matrix expression is as follows:

[0099] P_t = tanh(ReLU(I(t)w + b));

[0100] Where I(t) is the final temporal representation of all nodes at time t; w and b are both learnable parameters; P_t is the probability matrix, and P_t(i, j) is the link probability between node i and node j at time t.

[0101] The expression for the academic inference loss function is as follows:

[0102]

[0103] Where Loss is the academic reasoning loss; A_t is the adjacency matrix of snapshot G_t at time t; and ||·||_2 denotes the 2-norm.
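Step S4 can be sketched as follows. The decoder follows the stated expression P_t = tanh(ReLU(I(t)w + b)). The loss expression is not reproduced in this text, so the form ||P_t − A_t||_2 is an assumption, and reading the 2-norm as the entrywise (Frobenius) norm is a further assumption. The function names, shapes, and data are hypothetical.

```python
import numpy as np

def decode_links(I_t, w, b):
    """P_t = tanh(ReLU(I(t) w + b)); every entry lies between 0 and 1."""
    return np.tanh(np.maximum(0.0, I_t @ w + b))

def reasoning_loss(P_t, A_t):
    """Assumed loss: entrywise 2-norm of the difference P_t - A_t."""
    return float(np.linalg.norm(P_t - A_t))

rng = np.random.default_rng(0)
n, d = 4, 8                                   # 4 nodes, feature size 8
I_t = rng.normal(size=(n, d))                 # stand-in for the layer output
w, b = rng.normal(size=(d, n)), np.zeros(n)   # w maps features to n link scores
A_t = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)   # adjacency of a 4-node path
P_t = decode_links(I_t, w, b)
loss = reasoning_loss(P_t, A_t)
```

Because ReLU clips negatives to zero and tanh maps non-negative inputs into the unit interval, P_t(i, j) can be read as a link score without further normalisation.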

[0104] It should be noted that each motif is an independent substructure for a specific node, and each node has N different or identical motifs.

[0105] Specifically, the Motif structure includes:

[0106] (1) General Motif structure:

[0107] The construction of the academic hypergraph depends on the Motif set M = {m1, m2, ..., mk}. In this embodiment, the specific Motifs used are 3-node Motifs (3-motifs), which are divided into undirected graph Motifs and directed graph Motifs for different graph structures, as shown in Figure 3:

[0108] The total number of Motif types in a directed graph is not fixed for a specific network. Specifically: ① in a citation network, if there are no cycles, m4 does not exist; ② if there are no bidirectional relationships in the network, only m1, m2, m3, m4, and m5 exist; conversely, if there are only bidirectional relationships, only m8 and m13 exist. The above discussion is based on conventional directed network graphs. For network graphs generated with artificial constraints, such as a constraint that all nodes have a degree of 1, it is sufficient to delete the Motifs that do not conform to the rules.
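The counts behind this discussion can be checked by brute force. The sketch below enumerates all connected directed graphs on three nodes up to relabeling, then filters those with no bidirectional relationships and those with only bidirectional relationships. It reproduces only the counts; the numbering m1, m2, and so on belongs to Figure 3 and is not derived by the code.

```python
from itertools import permutations, product

def connected_3node_motif_classes():
    """Return one canonical arc tuple per connected directed 3-node Motif type."""
    pairs = [(i, j) for i in range(3) for j in range(3) if i != j]  # 6 possible arcs
    classes = set()
    for mask in product([0, 1], repeat=6):
        arcs = [p for p, keep in zip(pairs, mask) if keep]
        # Three nodes are weakly connected when at least two node pairs are linked.
        if len({frozenset(a) for a in arcs}) < 2:
            continue
        # Canonical form: the smallest sorted arc list over all relabelings.
        canon = min(
            tuple(sorted((perm[i], perm[j]) for i, j in arcs))
            for perm in permutations(range(3))
        )
        classes.add(canon)
    return classes

classes = connected_3node_motif_classes()
one_way_only = [c for c in classes if not any((j, i) in c for i, j in c)]
two_way_only = [c for c in classes if all((j, i) in c for i, j in c)]
print(len(classes), len(one_way_only), len(two_way_only))  # prints: 13 5 2
```

The five one-way types are the three orientations of an open path plus the cyclic and the transitive triangle, which matches the statement that only five Motif types remain when the network has no bidirectional relationships.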

[0109] (2) Heterogeneous and homogeneous network structures:

[0110] As shown in Figure 4, academic social networks can be either homogeneous or heterogeneous. Collaborator networks and citation networks are both homogeneous networks, each containing only one type of relationship and node. However, these two types of networks combine to form a heterogeneous network, which contains two or more types of nodes and relationships.

[0111] (3) Heterogeneous and homogeneous network Motif structures:

[0112] The Motifs of both types of networks follow the definitions in Figure 1, but there are differences. Specifically, with three nodes fixed, homogeneous network Motifs have no derived Motifs because their edges and nodes are homogeneous, whereas heterogeneous network Motifs have derived Motifs because the relationships between nodes may differ, as shown in Figure 5. In a homogeneous network, Motif m1 only requires that the nodes to the left and right of paper p stand in citing and cited relationships to it, respectively. In a heterogeneous network, the relationship pointing to paper p in Motif m1 is a writing relationship and the relationship pointing out from paper p is a citation relationship, while its derived Motifs instead require citation and recommendation relationships, respectively.

[0113] This embodiment mainly uses citation networks as an example for detailed explanation.

[0114] Citation networks contain numerous citation relationships between authors and typically exhibit recurring Motifs. Because citation networks are homogeneous directed graphs, and because papers are published in chronological order, circular and bidirectional citations are absent. Therefore, the Motif structures within citation networks are as shown in Figure 6, where the arrows point to the paper nodes that cite the given paper.

[0115] Figure 7 is a partial graph of the citation network, containing some motif annotations. Motifs with node B1 as the main node include motif m5 formed by A1, B1, and C1, and motif m5 formed by C1, B1, and C2. Analysis of motif m5 shows that node B1 is the basis for both papers C1 and D1. D1's citation of C1 indicates that C1 and D1 share the same research field. A high number of motif m5s indicates that B1 appears frequently in the citation network as an important foundational paper in a particular research field. Different nodes in the network are surrounded by different types and numbers of motifs. Different motif structures reflect different characteristics, and the same motif reflects different information depending on its quantity. Traditional methods often directly utilize node and edge information in the network, neglecting the local structural information carried by motifs. This hidden information is necessary for reasoning about network evolution.

[0116] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

Claims

1. A Motif-based temporal academic hypergraph network reasoning method, characterized in that it includes the following steps:

Step S1: Construct the hypergraph of the temporal academic network graph: A discrete snapshot graph sequence is used to represent the time-series academic network graph. For each academic snapshot network, a Motif mining algorithm counts, for each node, the number of Motifs of each type in the network. Each hyperedge is formed from a node together with the other nodes of the Motifs that have that node as a vertex, and the snapshot is thereby transformed into a hypergraph.

Step S2: Motif feature embedding and aggregation: Motif features are obtained from the academic snapshot network. The Motif features are input into the neural network to capture the nonlinear relationship between Motif features and the number of Motifs. A soft attention mechanism is used to process the Motifs of all nodes and learn the embedding weights of each Motif, and nodes, Motifs, and hyperedges are aggregated based on the hypergraph convolution method to obtain node information.

Step S3: Temporal feature extraction: The feature matrix of a node at a given time is obtained from the set of nodes in the hyperedges to which that node belongs. This feature matrix is input into the temporal feature extraction layer to obtain the final temporal feature representation of the node at that time. The final temporal feature representation is then aggregated with the embeddings of the Motifs surrounding the node to obtain the output of the temporal feature extraction layer.

Step S4: Academic network reasoning and deduction: A feature decoder is used to decode the output of the temporal feature extraction layer, and the final probability matrix is output. The reasoning and deduction of the academic network is realized based on the probability matrix and the academic reasoning loss function.

In step S2, the Motif feature expression is as follows: [formula omitted in source]; where the output of the expression represents the Motif feature for edge e_i; E_i is the feature of edge e_i in the Motif; F_i1 and F_i2 are the features of node v_i1 and node v_i2 connected by edge e_i, respectively; α_i indicates whether edge e_i exists, with value 1 if it exists and 0 otherwise; and a further term denotes the Motif quantity.

In step S2, the expression for the Motif embedding weights is as follows: [formula omitted in source]; where the output of the expression represents the Motif embedding weights of node v_k; σ is the activation function; the two inputs are the average embedding of all Motifs of node v_k and the full Motif embedding representation of node v_k; and f, w_1, w_2, and b are learnable parameters.

In step S3, the specific process of obtaining the final representation of the temporal features is as follows: The feature matrix of the node at that time is passed through three projection matrices W_Q, W_K, and W_V to obtain Q(t_i), K(t_i), and V(t_i), where the input is the feature matrix of the node at that time, and Q(t_i), K(t_i), and V(t_i) are the outputs of the projection matrices W_Q, W_K, and W_V, respectively. A self-attention mechanism is used to generate temporal features at each snapshot time. The self-attention results are summed row-wise to obtain the temporal feature matrix, as shown in the following expression: [formula omitted in source]; where the output of the expression represents the temporal feature matrix; attn denotes the self-attention mechanism, computed with the softmax function; T_i denotes a row vector of the resulting matrix; and d is the scaling factor. The temporal feature matrix is input into the neural network to obtain the final temporal feature representation at that moment, as shown in the following expression: [formula omitted in source]; where w_0, b_0, w_1, and b_1 are learnable parameters, and the superscript denotes the index of the temporal feature extraction layer.

2. The temporal academic hypergraph network reasoning method according to claim 1, characterized in that, in step S1, the expression for the Motif mining algorithm is as follows: [formula omitted in source]; where D_k represents the numbers of the various Motif types for node v_k in the network; || denotes concatenation; the concatenated term denotes, for any node v_k in graph G, the number of Motifs of type m_x; and n is the number of Motif types in the network graph.

3. The temporal academic hypergraph network reasoning method according to claim 2, characterized in that, in step S2, the non-linear relationship between Motif features and the number of Motifs is expressed by the following formula: [formula omitted in source]; where the output of the expression represents the non-linear relationship between Motif features and the number of Motifs; ReLU is the ReLU activation function, f(x) = max(0, x); and w_0, b_0, w_1, and b_1 are learnable parameters.

4. The temporal academic hypergraph network reasoning method according to claim 3, characterized in that, in step S2, the expression for the node information is as follows: [formula omitted in source]; where X^(l) is the node information of the l-th hypergraph convolutional layer; σ is the activation function; D_v is the node degree matrix; H is the hyperedge matrix; D_M is the Motif count matrix; M is the number of nodes required for each node's Motif embedding; and Θ is a learnable parameter.

5. The temporal academic hypergraph network reasoning method according to claim 4, characterized in that, in step S3, the expression output by the temporal feature extraction layer is as follows: [formula omitted in source]; where I_j(t_i) denotes the output of the temporal feature extraction layer; one term denotes the final temporal feature representation of the N-th layer; and the other term denotes the embeddings of the Motifs surrounding the node.

6. The temporal academic hypergraph network inference method according to claim 5, characterized in that, in step S4, the final probability matrix expression is as follows: P_t = tanh(ReLU(I(t)w + b)); where I(t) is the final temporal representation of all nodes at time t; w and b are both learnable parameters; P_t is the probability matrix, and P_t(i, j) is the link probability between node i and node j at time t.

7. The temporal academic hypergraph network reasoning method according to claim 6, characterized in that, in step S4, the expression for the academic inference loss function is as follows: [formula omitted in source]; where Loss is the academic reasoning loss; A_t is the adjacency matrix of snapshot G_t at time t; and ||·||_2 denotes the 2-norm.