Heterogeneous network node representation learning method based on meta-path

A meta-path-based method for learning node representations in heterogeneous networks. It addresses the insufficient ability of existing methods to process complex heterogeneous network graphs and achieves high classification accuracy.

Pending Publication Date: 2019-12-10
EAST CHINA NORMAL UNIVERSITY
6 Cites 22 Cited by

AI-Extracted Technical Summary

Problems solved by technology

[0006] The technical problem to be solved by the present invention is how to store the rich semantic information and structural information contained in the heterogeneous network graph in the vecto...

Abstract

The invention provides a meta-path-based method for learning node representations in a heterogeneous network. The method takes into account the various node types and relationship types contained in a heterogeneous network graph, and stores the rich semantic and structural information of the graph in a plurality of meta-paths by extracting meta-paths. Within each meta-path, the feature information of the nodes is preserved by learning vector representations of the nodes; the meta-paths are then integrated and trained jointly, so that the semantic and structural information of the whole heterogeneous network is preserved in the node vector representations. The method achieves higher classification accuracy, stores the feature information of nodes in the heterogeneous network more faithfully in their vector representations, and is more flexible because the meta-paths can be freely selected according to the specific target task.

Application Domain

Technology Topic

Heterogeneous network, Learning methods

Examples

  • Experimental program(1)

Example Embodiment

[0073] Example
[0074] The above method is described in more detail through an embodiment below.
[0075] As shown in Figure 1, the meta-path-based heterogeneous network node representation learning method of the present invention includes the following steps:
[0076] A: Construct a network architecture based on the heterogeneous information network graph; then obtain multiple different types of meta-paths from the heterogeneous information network according to the network architecture; then mathematically quantify the meta-paths to obtain the matrix representation corresponding to each meta-path.
[0077] Specifically, step A is implemented by performing the following steps:
[0078] A1: Construct a heterogeneous information network data set: crawl movie description information, user comment data, and the movie style classification system from existing movie review websites. Integrate the crawled data into a heterogeneous information network graph about movies, which contains multiple types of nodes and multiple types of relationships (edges). The associations between nodes represent both the network structure of the heterogeneous network graph and the semantic information between nodes.
[0079] A2: According to the data set, extract all node types and relationship types to construct a network architecture, as shown in Figure 2. Each node in the network architecture represents a node type, and each edge represents a relationship type. The movie data set contains 5 types of nodes: user, movie, year, style, and keyword. The node types are interconnected to form a variety of relationship types, such as the comment relationship between user and movie, the membership relationship between movie and style, the description relationship between movie and keyword, and the release relationship between movie and year.
[0080] A3: Extract meta-paths from the heterogeneous network graph according to the network architecture, and mathematically quantify the extracted meta-paths to obtain a matrix representation of each meta-path.
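Steps A2 and A3 can be illustrated with a minimal sketch. The one-letter codes, the constant `SCHEMA_EDGES`, and both helper functions are hypothetical names chosen for illustration; U, M, G and K follow the meta-path labels used later in this embodiment, while Y for year is an assumed label.

```python
# Hypothetical encoding of the network architecture: one letter per node type.
# U = user, M = movie, G = style, K = keyword, Y = year (Y is an assumed label).
SCHEMA_EDGES = {
    frozenset("UM"),  # user comments on movie
    frozenset("MG"),  # movie belongs to style
    frozenset("MK"),  # movie is described by keyword
    frozenset("MY"),  # movie is released in year
}


def is_valid_meta_path(path):
    """Return True if every consecutive pair of node types is a schema edge."""
    if len(path) < 2:
        return False
    return all(frozenset(pair) in SCHEMA_EDGES for pair in zip(path, path[1:]))


def one_hop_segments(path):
    """Split a meta-path such as 'UMGMU' into its one-hop meta-paths."""
    return [a + b for a, b in zip(path, path[1:])]
```

Under these assumptions a candidate meta-path is accepted only if each hop corresponds to a relationship type present in the network architecture.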
[0081] Specifically, based on the direct comment relationship between users and movies, a one-hop meta-path L(MU) is extracted, in which each comment by a user on a movie is regarded as one path instance. The connections between the nodes of the node types in the meta-path define the adjacency matrix M_L(MU). If an element of the matrix is 0, the corresponding user has not commented on the corresponding movie; if an element is nonzero, the user has rated the movie. Table 1 shows part of the adjacency matrix of the one-hop meta-path L(MU) in this embodiment.
[0082] Table 1. Adjacency matrix M_L(MU) of the meta-path L(MU)
[0083]
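A minimal sketch of how such an adjacency matrix could be built from comment records. The record format, the identifiers, and the choice of using the number of comments as the matrix entry are illustrative assumptions, not details given in the source.

```python
def build_adjacency(records, row_ids, col_ids):
    """Build a len(row_ids) x len(col_ids) matrix from (row, col) records.

    An entry of 0 means the pair never occurs; a nonzero entry counts
    how many records connect the pair (an assumed weighting).
    """
    row_index = {name: i for i, name in enumerate(row_ids)}
    col_index = {name: j for j, name in enumerate(col_ids)}
    matrix = [[0] * len(col_ids) for _ in row_ids]
    for row, col in records:
        matrix[row_index[row]][col_index[col]] += 1
    return matrix


# Hypothetical toy data: (movie, user) pairs, one per comment.
comments = [("m1", "u1"), ("m1", "u2"), ("m2", "u2")]
M_MU = build_adjacency(comments, ["m1", "m2"], ["u1", "u2", "u3"])
```

Here rows correspond to movies and columns to users, matching the orientation of L(MU).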
[0084] Specifically, a K-hop meta-path such as L(UMGMU), formed by the indirect connection between users who like movies of the same style, can be obtained by concatenating one-hop meta-paths end to end in the order of the node types in the meta-path, that is, L(UMGMU) = L(UM) L(MG) L(GM) L(MU); its corresponding adjacency matrix is M_L(UMGMU) = M_L(UM) × M_L(MG) × M_L(GM) × M_L(MU).
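The matrix product above can be checked on a small example. The sketch below uses plain nested lists and hypothetical toy data (three users, three movies, two styles); the reverse one-hop matrices M_L(UM) and M_L(GM) are assumed to be the transposes of M_L(MU) and M_L(MG).

```python
from functools import reduce


def transpose(matrix):
    """Return the transpose of a nested-list matrix."""
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    """Multiply two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


# Toy data: rows are movies m1..m3, columns are users u1..u3.
M_MU = [[1, 1, 0],
        [0, 1, 0],
        [0, 0, 1]]
# Rows are movies m1..m3, columns are styles g1, g2.
M_MG = [[1, 0],
        [0, 1],
        [1, 0]]

M_UM = transpose(M_MU)
M_GM = transpose(M_MG)

# Adjacency matrix of L(UMGMU): entry (i, j) counts the path instances
# that connect user i to user j through a shared movie style.
M_UMGMU = reduce(matmul, [M_UM, M_MG, M_GM, M_MU])
```

Each entry of `M_UMGMU` is the number of U-M-G-M-U path instances between the two users, so the composed matrix can serve as the weighted adjacency matrix of the multi-hop meta-path.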
[0085] B: Based on the node information contained in a meta-path, obtain the conditional probability distribution expressed by the vector of the target node according to the connections between the target node and its neighbor nodes in the meta-path; then obtain the empirical probability distribution of the target node from the matrix representation of the meta-path; by minimizing the distance between the two distributions, the node information in the meta-path is stored in the vector representation of the target node.
[0086] Specifically, step B is implemented by performing the following steps:
[0087] B1: In a meta-path L(MU), the nodes of the path's start node type M are defined as target nodes, and the nodes of the path's end node type U are defined as neighbor nodes of the target nodes. The conditional probability that a target node v_i connects to one of its neighbor nodes v_j through the meta-path L(MU) is:
[0088]
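The formula itself is not reproduced in this text. Since step C2 refers to a softmax calculation for this probability, one plausible form, assumed here, is p(v_j | v_i; L) = exp(u'_j · u_i) / Σ_k exp(u'_k · u_i), where u_i is the vector of the target node and u'_k are context vectors of the neighbor-type nodes. The symbol names and the function below are illustrative, not taken from the source.

```python
import math


def conditional_probability(target_vec, context_vecs):
    """Softmax over inner products between one target vector and all
    candidate neighbor (context) vectors. Returns one probability per
    candidate neighbor."""
    scores = [sum(a * b for a, b in zip(target_vec, ctx))
              for ctx in context_vecs]
    # Subtract the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The denominator runs over every node of the end node type, which is why step C2 later replaces this computation with negative sampling.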
[0089] B2: According to the matrix representation M_L(MU) of the meta-path L(MU), the empirical probability that the target node v_i connects to one of its neighbor nodes v_j through the meta-path L(MU) is:
[0090]
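The formula is missing here as well. A natural reading, assumed in this sketch, is that the empirical probability is the corresponding row of the meta-path matrix normalized by its row sum. The function name is hypothetical.

```python
def empirical_probability(matrix, i):
    """Normalize row i of a meta-path adjacency matrix so it sums to 1.

    A row of all zeros (a target node with no path instances) yields
    all-zero probabilities instead of dividing by zero.
    """
    row = matrix[i]
    total = sum(row)
    if total == 0:
        return [0.0] * len(row)
    return [weight / total for weight in row]
```

With this reading, neighbors reached by more path instances receive proportionally more probability mass.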
[0091] B3: Use the KL divergence (relative entropy) to measure the distance between the two probability distributions. Minimizing the KL divergence brings the two distributions as close as possible, so that the semantic and structural information between nodes in the meta-path L(MU) is stored in the vector representation of the target node v_i:
[0092]
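For reference, the KL divergence between an empirical distribution and a model distribution over the same neighbors can be computed as in the following sketch. It shows the standard definition only; how the source weights the individual target nodes is not recoverable from this text.

```python
import math


def kl_divergence(p_hat, p):
    """KL(p_hat || p) for two discrete distributions of equal length.

    Terms with p_hat == 0 contribute nothing. p is assumed to be
    strictly positive wherever p_hat is positive, which holds for
    softmax outputs.
    """
    return sum(ph * math.log(ph / q) for ph, q in zip(p_hat, p) if ph > 0)
```

The divergence is zero exactly when the two distributions agree on every neighbor that has nonzero empirical probability, and positive otherwise.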
[0093] C: Select multiple meta-paths from the movie heterogeneous network graph to form a meta-path set, and integrate all meta-paths in the set for joint training to obtain the vector representations of the nodes. During model training, edge sampling and negative sampling are used together with the stochastic gradient descent algorithm to optimize the model.
[0094] Specifically, step C is implemented by performing the following steps:
[0095] C1: Select multiple meta-paths containing the target nodes from the movie heterogeneous network to construct a meta-path set Γ(L); for example, select multiple meta-paths with movie (M) as the target node type to construct the meta-path set Γ(L) = {L(MU), L(MK), L(MUM), L(MKM)}. Apply step B to each meta-path in the set to obtain the vector representation function of the target node under that meta-path. Then integrate all meta-paths in the set for joint training: the vector representation of the target node in the entire heterogeneous network is obtained by jointly minimizing the vector representation functions of all meta-paths:
[0096]
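The joint objective is not reproduced in this text. One form consistent with steps B and C1 is sketched below. It assumes that w^L_ij denotes the entry of the meta-path matrix M_L for target node v_i and neighbor v_j, and that each target node is weighted by its row sum; that weighting is a common choice in edge-based embedding objectives and is not stated in the source.

```latex
\begin{aligned}
\hat{p}(v_j \mid v_i; L) &= \frac{w^{L}_{ij}}{\sum_{k} w^{L}_{ik}}, \\
O_{L} &= \sum_{i} \Big( \sum_{k} w^{L}_{ik} \Big)\,
         \mathrm{KL}\big( \hat{p}(\cdot \mid v_i; L) \,\big\|\, p(\cdot \mid v_i; L) \big) \\
      &= -\sum_{i,j} w^{L}_{ij} \log p(v_j \mid v_i; L) + \text{const}, \\
O &= \sum_{L' \in \Gamma(L)} O_{L'} .
\end{aligned}
```

The constant collects the terms that depend only on the empirical distribution, so under these assumptions minimizing O amounts to maximizing the weighted log-likelihood of the observed path instances over all meta-paths in the set.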
[0097] C2: For the calculation of the conditional probability p(v_j | v_i; L), negative sampling is used instead of the softmax calculation: multiple negative path instances are sampled according to the noise distribution of each meta-path to reduce the computational overhead, so that log p(v_j | v_i; L) is approximately equal to:
[0098]
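The approximation is missing from this text. The standard negative-sampling form, assumed here, replaces log p(v_j | v_i; L) with log σ(u'_j · u_i) + Σ_n log σ(−u'_n · u_i), where σ is the sigmoid function and the K negative nodes v_n are drawn from the noise distribution of the meta-path. All names in the sketch are illustrative.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def negative_sampling_log_prob(target_vec, positive_vec, negative_vecs):
    """Approximate log p(v_j | v_i; L) from one observed neighbor vector
    and K sampled negative vectors (assumed standard form)."""
    value = math.log(sigmoid(dot(positive_vec, target_vec)))
    for negative_vec in negative_vecs:
        value += math.log(sigmoid(-dot(negative_vec, target_vec)))
    return value
```

The cost per training example is proportional to K + 1 inner products rather than to the number of nodes of the end node type.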
[0099] C3: During model initialization, a negative sampling table of the corresponding type is constructed for each meta-path in the set Γ(L) according to the end node type of that meta-path. During model training, edge sampling is performed first: all path instances contained in the meta-path set Γ(L) are regarded as edges, and an alias table is used to sample edges according to their weights. Next, the negative sampling table corresponding to the meta-path type of the sampled path instance is selected. Finally, K negative path instances are drawn from that negative sampling table to take part in the stochastic gradient descent step that updates the model parameters.
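The sampling procedure in C3 can be sketched as follows. The alias table is built with Vose's method, a standard construction that may differ from the one used in the source. The edge tuple layout, the per-meta-path dictionary of negative sampling tables, and all function names are hypothetical.

```python
import random


def build_alias_table(weights):
    """Build an alias table so that index i is drawn with probability
    weights[i] / sum(weights) in constant time per draw."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]
        alias[s] = l
        # Move the mass that bucket s borrowed out of bucket l.
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in large + small:
        prob[i] = 1.0
    return prob, alias


def sample_index(prob, alias, rng):
    """Draw one index from the alias table."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


def sample_training_example(edges, prob, alias, negative_tables, k, rng):
    """Sample one path instance (edge) by weight, then draw k negatives
    from the table that matches the edge's meta-path type.

    Negatives are not filtered against the positive neighbor here.
    """
    path_type, source, target, _weight = edges[sample_index(prob, alias, rng)]
    negatives = [rng.choice(negative_tables[path_type]) for _ in range(k)]
    return path_type, source, target, negatives
```

Each sampled tuple would then feed one stochastic gradient descent update of the node vectors involved.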
[0100] D: The jointly trained multi-meta-path model yields a node representation matrix U covering all nodes of the heterogeneous network graph; each row vector of the matrix represents the corresponding node in the network graph. Associate each learned movie node vector with its style category, then divide the movie nodes into two parts: one part is used as the training set to train a classifier model, and the other part is used as the test set. The classifier model predicts the category of each test node, and the prediction is compared with the node's existing category to judge whether the learned node representation vectors preserve the structural features and semantic information of the nodes in the original network graph. The existing style category of each movie node is stored in the original movie heterogeneous information network data set.
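The evaluation in step D can be sketched as below. The source does not name the classifier; a nearest-centroid classifier is used here purely as a stand-in, and the function names and data layout (dictionaries keyed by node id) are assumptions.

```python
import random


def split_nodes(node_ids, train_fraction, rng):
    """Shuffle node ids and split them into training and test lists."""
    shuffled = list(node_ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_centroids(vectors, labels, train_ids):
    """Average the learned vectors of the training nodes per category."""
    sums, counts = {}, {}
    for node in train_ids:
        label, vec = labels[node], vectors[node]
        acc = sums.setdefault(label, [0.0] * len(vec))
        for d, x in enumerate(vec):
            acc[d] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc]
            for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the category whose centroid is closest to vec."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(vec, center))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def accuracy(centroids, vectors, labels, test_ids):
    """Fraction of test nodes whose predicted category matches the
    category stored in the data set."""
    hits = sum(predict(centroids, vectors[n]) == labels[n] for n in test_ids)
    return hits / len(test_ids)
```

A higher test accuracy suggests that the learned vectors retain more of the style information present in the original network.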
[0101] The method of the present invention takes into account that the heterogeneous network graph contains multiple node types and relationship types. By extracting meta-paths that contain different types of nodes and relationships, it stores the rich semantic and structural information of the network graph in multiple meta-paths. Within each meta-path, the invention learns vector representations of the nodes to preserve their feature information in that meta-path, and then integrates the multiple meta-paths for joint training, so that the semantic and structural information of the entire heterogeneous network is stored in the node vector representations. Compared with existing network representation learning methods, the method of the present invention achieves higher classification accuracy on the node classification task, which indicates that it stores the feature information of the nodes in the heterogeneous network more faithfully in their vector representations. In addition, since the meta-paths can be freely selected according to the specific target task, the method is more flexible.