A method and system for updating entities in a knowledge graph

By calculating the attribute and relation similarity matrix between the old and new knowledge graphs and combining it with an improved graph attention network model, accurate alignment of entities in the fault diagnosis knowledge graph was achieved, improving update efficiency and accuracy.

CN115809340B · Active · Publication Date: 2026-06-30 · NARI INFORMATION & COMM TECH +3

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
NARI INFORMATION & COMM TECH
Filing Date
2022-08-29
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

In existing technologies, fault diagnosis knowledge graphs cannot accurately align entities during the update process, especially since the attribute information of entities is limited to their own nodes and cannot interactively learn with the domain structure, resulting in inaccurate entity alignment.

Method used

By calculating the name attribute similarity matrix and entity relationship similarity matrix between the new knowledge graph and the original knowledge graph, and combining the character embedding matrix and the improved graph attention network model, the entity relationship structure similarity is fused to achieve accurate entity alignment.

Benefits of technology

It improves the accuracy of entity alignment and the efficiency of knowledge graph updates, effectively solving the problem of inaccurate entity alignment.



Abstract

This invention discloses a method and system for updating entities in a knowledge graph. The method involves acquiring a new knowledge graph and an original knowledge graph; calculating a name attribute similarity matrix based on the name attribute of the new and original knowledge graphs; calculating an entity relation similarity matrix based on the entity relation structure triples of the new and original knowledge graphs; fusing the name attribute similarity matrix and the entity relation similarity matrix to obtain the entities corresponding to the new knowledge graph; and updating the original knowledge graph with the entities corresponding to the new knowledge graph. Advantages: Based on research on multi-attention entity alignment, this invention proposes a method for entity alignment in the fault diagnosis domain knowledge graph that combines the name attribute of long texts with relation structure similarity calculation. A knowledge graph update tool was developed based on this method. Through case testing and practical use, the accuracy of entity alignment and the efficiency of knowledge graph updates have been effectively improved.

Description

Technical Field

[0001] This invention relates to a method and system for updating entities in a knowledge graph, belonging to the field of cloud data center diagnostic technology.

Background Technology

[0002] With the development of knowledge graph technology in the field of intelligent operation and maintenance, technologies such as intelligent diagnosis, reasoning, and knowledge recommendation based on knowledge graphs have attracted much attention from researchers. Since the knowledge in the fault diagnosis knowledge graph needs to be updated according to the actual cloud data center topology and situation, an automatic knowledge graph update tool is needed to achieve the fusion of new knowledge with the original knowledge graph.

[0003] The cloud data center fault diagnosis knowledge graph is a domain knowledge graph. Fault diagnosis knowledge graphs are characterized by clear relational structures and a large amount of information in the entity NAME attribute. During knowledge updates, relying solely on NAME attribute similarity cannot accurately align entities. Current entity alignment work focuses on entity structure and attribute information, but the attribute information of an entity is limited to its own node and cannot be learned interactively with the entity's domain structure.

Summary of the Invention

[0004] The technical problem to be solved by the present invention is to overcome the defects of the prior art and provide a method and system for updating entities in a knowledge graph.

[0005] To address the aforementioned technical problems, this invention provides a method for updating entities in a knowledge graph, comprising:

[0006] Obtain the new knowledge graph and the original knowledge graph;

[0007] Calculate the name attribute similarity matrix based on the name attribute of the new knowledge graph and the original knowledge graph;

[0008] Calculate the entity relationship similarity matrix based on the entity relationship structure triples of the new knowledge graph and the original knowledge graph;

[0009] The name attribute similarity matrix and the entity relationship similarity matrix are fused to obtain the entities corresponding to the new knowledge graph, and the entities corresponding to the new knowledge graph are updated in the original knowledge graph.

[0010] Furthermore, the calculation of the name attribute similarity matrix based on the name attribute of the new knowledge graph and the original knowledge graph includes:

[0011] Obtain the word vector $s_{Ai}$ and the character vector $x_{Ai}$ of sentence $S_{Ai}$ in the new knowledge graph by looking up the character embedding matrix;

[0012] Obtain the word vector $s_{Bi}$ and the character vector $x_{Bi}$ of sentence $S_{Bi}$ in the original knowledge graph by looking up the character embedding matrix;

[0013] Calculate the word vector similarity between sentence $S_{Ai}$ and sentence $S_{Bi}$ from the word vector $s_{Ai}$ of sentence $S_{Ai}$ and the word vector $s_{Bi}$ of sentence $S_{Bi}$;

[0014] Calculate the character vector similarity between sentence $S_{Ai}$ and sentence $S_{Bi}$ from the character vector $x_{Ai}$ of sentence $S_{Ai}$ and the character vector $x_{Bi}$ of sentence $S_{Bi}$;

[0015] Take the average of the word vector similarity and the character vector similarity between sentence $S_{Ai}$ and sentence $S_{Bi}$ to obtain the name attribute similarity $S_{name_i}$ of sentence $S_{Ai}$ with respect to sentence $S_{Bi}$;

[0016] Obtain the name attribute similarity between each sentence in the new knowledge graph and each sentence in the original knowledge graph, and obtain the name attribute similarity matrix.

[0017] Furthermore, the calculation of the entity relation similarity matrix based on the entity relation structure triples of the new knowledge graph and the original knowledge graph includes:

[0018] Obtain the entity relation structure triplet for each sentence in the new knowledge graph, input it into the entity relation structure similarity model pre-trained based on the relation structure triplet of the original knowledge graph, obtain the entity structure similarity for each sentence, and obtain the entity structure similarity matrix based on the entity structure similarity for each sentence.

[0019] Furthermore, the training process of the entity relation structure similarity model obtained by training the entity relation structure triples based on the original knowledge graph includes:

[0020] Construct the entity relationship structure similarity model to be trained, represented as:

[0021]

[0022]

[0023]

[0024] where the entity-vector symbols in the formulas denote the entity vectors input to and output from the l-th domain attention layer, respectively, the input covering entity $e_i$ and all of its neighbors; $\sigma$ denotes the sigmoid activation function; $N_i$ denotes the set of entities connected to entity $e_i$; $e_j$ ranges over entity $e_i$ and all of its neighbors, and $e_k$ ranges over all neighbors of entity $e_i$; the attention symbol denotes the normalized entity domain attention coefficient at layer $l$; two further symbols denote the results of fusing the information of entity $e_i$ with neighbor $j$ and with neighbor $k$, respectively; exp() denotes the exponential function with the natural constant $e$ as its base; LeakyReLU() denotes the activation function; $u \in \mathbb{R}^{2d^{(l+1)} \times 1}$ and $W^{(l)} \in \mathbb{R}^{d^{(l+1)} \times d^{(l)}}$ are learnable parameter matrices; $d^{(l)}$ denotes the network embedding dimension of the l-th layer; $d^{(l+1)}$ denotes the network embedding dimension of the (l+1)-th layer; and the superscript T denotes matrix transpose;
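
The formulas of the three preceding paragraphs are not reproduced in this text. Assuming the model follows the standard graph attention form that the symbol definitions describe, one possible reconstruction is sketched below. The symbol names $h_i^{(l)}$ (entity vector at layer $l$), $\alpha_{ij}^{(l)}$ (normalized attention coefficient) and $c_{ij}^{(l)}$ (information fusion result) are chosen here for illustration and are not taken from the original; including $e_i$ itself in both sums is likewise an assumption.

```latex
h_i^{(l+1)} = \sigma\Big( \sum_{e_j \in N_i \cup \{e_i\}} \alpha_{ij}^{(l)} \, W^{(l)} h_j^{(l)} \Big)

\alpha_{ij}^{(l)} = \frac{\exp\big(c_{ij}^{(l)}\big)}{\sum_{e_k \in N_i \cup \{e_i\}} \exp\big(c_{ik}^{(l)}\big)}

c_{ij}^{(l)} = \mathrm{LeakyReLU}\Big( u^{T} \big[ W^{(l)} h_i^{(l)} \,\Vert\, W^{(l)} h_j^{(l)} \big] \Big)
```

With $W^{(l)} \in \mathbb{R}^{d^{(l+1)} \times d^{(l)}}$, the concatenation has $2d^{(l+1)}$ rows, which matches the stated shape of $u$.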

[0025] Construct a pre-aligned set of entity seeds and positive / negative instance triples;

[0026] Construct the entity alignment loss function $L_A$ used to train the entity relationship structure similarity model, expressed as:

[0027] $L_A = L_0 + L_a$

[0028]

[0029] where $L_a$ denotes the entity alignment loss function of the entity relationship structure similarity model and $L_0$ denotes the orthogonalization loss function of the parameter matrix $W$; the nearest neighbor sampling method $NS(e)$ is used to construct the negative sample set $e_-$ of entity $e$ and the negative sample set $e'_-$ of entity $e'$, a neighboring entity of entity $e$; $d(\cdot,\cdot) = 1 - \cos(\cdot,\cdot)$ denotes the cosine distance between entities; $[\cdot]_+ = \max\{\cdot, 0\}$; $\gamma$ is a hyperparameter;

[0030]

[0031] where $W^{(l)}$ denotes the parameter matrix of the l-th layer; $m$ is the number of embedding layers in the attention network; and $\|\cdot\|_2^2$ denotes taking the 2-norm of a matrix and squaring it;

[0032] Construct the relation structure loss function $L_R$ used to train the entity relation structure similarity model, expressed as:

[0033]

[0034] where $f(h,r,t) = \|h + r - t\|_2$ is the scoring function for the relation triple $(h,r,t)$, used to calculate the confidence of the triple; $h$ and $t$ are the head and tail entity vectors from the global structural embedding layer; $r$ is the relation vector to be modeled and learned; $\gamma'$ is a hyperparameter; $T_1$ denotes the set of positive triples; $T_1'(h,r,t) = \{(h',r,t) \mid h' \in E\} \cup \{(h,r,t') \mid t' \in E\}$ denotes the set of negative example triples; $h'$ denotes the head entity of a negative example from the global structure embedding layer; $t'$ denotes the tail entity of a negative example from the global structure embedding layer; and $E$ denotes the set containing all negative example entities;

[0035] Using the pre-aligned entity seed set and the positive and negative example triples, train the entity alignment loss function $L_A$ and the relation structure loss function $L_R$ respectively to determine the final model parameters of the entity relationship structure similarity model, and update the model with these parameters to obtain the trained entity relationship structure similarity model.

[0036] Furthermore, the fusion of the name attribute similarity matrix and the entity relationship similarity matrix includes:

[0037] Standardize the name attribute similarity matrix and the entity relationship similarity matrix respectively;

[0038] The final entity similarity is obtained by summing the standardized name attribute similarity matrix and the standardized entity relationship similarity matrix.

[0039] A knowledge graph entity update system, comprising:

[0040] The acquisition module is used to acquire both the new knowledge graph and the original knowledge graph.

[0041] The first calculation module is used to calculate the name attribute similarity matrix based on the name attribute of the new knowledge graph and the original knowledge graph;

[0042] The second calculation module is used to calculate the entity relationship similarity matrix based on the entity relationship structure triples of the new knowledge graph and the original knowledge graph.

[0043] The update module is used to fuse the name attribute similarity matrix and the entity relationship similarity matrix to obtain the entity corresponding to the new knowledge graph, and update the entity corresponding to the new knowledge graph into the original knowledge graph.

[0044] Furthermore, the first calculation module is used to:

[0045] Obtain the word vector $s_{Ai}$ and the character vector $x_{Ai}$ of sentence $S_{Ai}$ in the new knowledge graph by looking up the character embedding matrix;

[0046] Obtain the word vector $s_{Bi}$ and the character vector $x_{Bi}$ of sentence $S_{Bi}$ in the original knowledge graph by looking up the character embedding matrix;

[0047] Calculate the word vector similarity between sentence $S_{Ai}$ and sentence $S_{Bi}$ from the word vector $s_{Ai}$ of sentence $S_{Ai}$ and the word vector $s_{Bi}$ of sentence $S_{Bi}$;

[0048] Calculate the character vector similarity between sentence $S_{Ai}$ and sentence $S_{Bi}$ from the character vector $x_{Ai}$ of sentence $S_{Ai}$ and the character vector $x_{Bi}$ of sentence $S_{Bi}$;

[0049] Take the average of the word vector similarity and the character vector similarity between sentence $S_{Ai}$ and sentence $S_{Bi}$ to obtain the name attribute similarity $S_{name_i}$ of sentence $S_{Ai}$ with respect to sentence $S_{Bi}$;

[0050] Obtain the name attribute similarity between each sentence in the new knowledge graph and each sentence in the original knowledge graph, and obtain the name attribute similarity matrix.

[0051] Furthermore, the second computing module is used for

[0052] Obtain the entity relation structure triplet for each sentence in the new knowledge graph, input it into the entity relation structure similarity model pre-trained based on the relation structure triplet of the original knowledge graph, obtain the entity structure similarity for each sentence, and obtain the entity structure similarity matrix based on the entity structure similarity for each sentence.

[0053] Furthermore, the second calculation module is also used to:

[0054] Construct the entity relationship structure similarity model to be trained, represented as:

[0055]

[0056]

[0057]

[0058] where the entity-vector symbols in the formulas denote the entity vectors input to and output from the l-th domain attention layer, respectively, the input covering entity $e_i$ and all of its neighbors; $\sigma$ denotes the sigmoid activation function; $N_i$ denotes the set of entities connected to entity $e_i$; $e_j$ ranges over entity $e_i$ and all of its neighbors, and $e_k$ ranges over all neighbors of entity $e_i$; the attention symbol denotes the normalized entity domain attention coefficient at layer $l$; two further symbols denote the results of fusing the information of entity $e_i$ with neighbor $j$ and with neighbor $k$, respectively; exp() denotes the exponential function with the natural constant $e$ as its base; LeakyReLU() denotes the activation function; $u \in \mathbb{R}^{2d^{(l+1)} \times 1}$ and $W^{(l)} \in \mathbb{R}^{d^{(l+1)} \times d^{(l)}}$ are learnable parameter matrices; $d^{(l)}$ denotes the network embedding dimension of the l-th layer; $d^{(l+1)}$ denotes the network embedding dimension of the (l+1)-th layer; and the superscript T denotes matrix transpose;

[0059] Construct a pre-aligned set of entity seeds and positive / negative instance triples;

[0060] Construct the entity alignment loss function $L_A$ used to train the entity relationship structure similarity model, expressed as:

[0061] $L_A = L_0 + L_a$

[0062]

[0063] where $L_a$ denotes the entity alignment loss function of the entity relationship structure similarity model and $L_0$ denotes the orthogonalization loss function of the parameter matrix $W$; the nearest neighbor sampling method $NS(e)$ is used to construct the negative sample set $e_-$ of entity $e$ and the negative sample set $e'_-$ of entity $e'$, a neighboring entity of entity $e$; $d(\cdot,\cdot) = 1 - \cos(\cdot,\cdot)$ denotes the cosine distance between entities; $[\cdot]_+ = \max\{\cdot, 0\}$; $\gamma$ is a hyperparameter;

[0064]

[0065] where $W^{(l)}$ denotes the parameter matrix of the l-th layer; $m$ is the number of embedding layers in the attention network; and $\|\cdot\|_2^2$ denotes taking the 2-norm of a matrix and squaring it;

[0066] Construct the relation structure loss function $L_R$ used to train the entity relation structure similarity model, expressed as:

[0067]

[0068] where $f(h,r,t) = \|h + r - t\|_2$ is the scoring function for the relation triple $(h,r,t)$, used to calculate the confidence of the triple; $h$ and $t$ are the head and tail entity vectors from the global structural embedding layer; $r$ is the relation vector to be modeled and learned; $\gamma'$ is a hyperparameter; $T_1$ denotes the set of positive triples; $T_1'(h,r,t) = \{(h',r,t) \mid h' \in E\} \cup \{(h,r,t') \mid t' \in E\}$ denotes the set of negative example triples; $h'$ denotes the head entity of a negative example from the global structure embedding layer; $t'$ denotes the tail entity of a negative example from the global structure embedding layer; and $E$ denotes the set containing all negative example entities;

[0069] Using the pre-aligned entity seed set and the positive and negative example triples, train the entity alignment loss function $L_A$ and the relation structure loss function $L_R$ respectively to determine the final model parameters of the entity relationship structure similarity model, and update the model with these parameters to obtain the trained entity relationship structure similarity model.

[0070] Furthermore, the update module is used for

[0071] Standardize the name attribute similarity matrix and the entity relationship similarity matrix respectively;

[0072] The final entity similarity is obtained by summing the standardized name attribute similarity matrix and the standardized entity relationship similarity matrix.

[0073] A computer-readable storage medium storing one or more programs, said one or more programs including instructions that, when executed by a computing device, cause the computing device to perform any of the methods described.

[0074] A computing device, comprising,

[0075] One or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing any of the methods described.

[0076] The beneficial effects achieved by this invention are as follows:

[0077] Building on research on multi-attention entity alignment, this invention proposes an entity alignment method for knowledge graphs in the fault diagnosis domain that combines similarity calculation on the long-text name attribute with relational structure similarity calculation. A knowledge graph update tool was developed based on this method. Case testing and practical application show that it effectively improves the accuracy of entity alignment and the efficiency of knowledge graph updates.

Attached Figure Description

[0078] Figure 1 is a flowchart of the method of the present invention;

[0079] Figure 2 is a schematic diagram of the overall framework of the entity update method for the knowledge graph of this invention.

Detailed Implementation

[0080] The present invention will be further described below with reference to the accompanying drawings. The following embodiments are only used to more clearly illustrate the technical solution of the present invention, and should not be used to limit the scope of protection of the present invention.

[0081] As shown in Figure 1, this invention discloses a method for updating entities in a knowledge graph, comprising:

[0082] Obtain the new knowledge graph and the original knowledge graph;

[0083] Calculate the name attribute similarity matrix based on the name attribute of the new knowledge graph and the original knowledge graph;

[0084] Calculate the entity relationship similarity matrix based on the entity relationship structure triples of the new knowledge graph and the original knowledge graph;

[0085] The name attribute similarity matrix and the entity relationship similarity matrix are fused to obtain the entities corresponding to the new knowledge graph, and the entities corresponding to the new knowledge graph are updated in the original knowledge graph.

[0086] Figure 2 shows the framework of the entity update method for knowledge graphs. The structural channel transforms the structural features contained in the relation triples of entities into graph entity feature vectors, and then obtains the entity similarity matrix through similarity calculation. The attribute channel uses cosine similarity on the `name` attribute to calculate the entity attribute similarity matrix. Finally, structural and attribute similarity are fused to obtain the entity corresponding to the new knowledge. After manual review, the new knowledge is updated into the original graph by modifying or adding attributes and relational content.

[0087] The method of the present invention specifically includes the following processes:

[0088] The first step is vector generation. The model proposed in this invention fuses word segmentation feature vectors with character vectors. The model's input is a sentence and all of its self-matching words, where a character's self-matching words are the words that contain that character. Let $s = \{c_1, c_2, \ldots, c_n\}$ denote the sentence, where $c_i$ is the i-th character in the sentence. Each character is represented as a vector $x_i$ by looking up the character embedding matrix:

[0089] $x_i = e^c(c_i)$

[0090] where $e^c$ is the character embedding lookup table and $c_i$ is the i-th character in the sentence.

[0091] The model uses a word segmentation tool to segment sentences and labels the data in the training set to construct word segmentation features, resulting in character vector representations containing word boundary information:

[0092] $c_i = x_i \oplus s_i$

[0093] where $x_i$ denotes the character vector corresponding to the character, $s_i$ denotes the word segmentation feature vector corresponding to the character, $\oplus$ denotes vector concatenation, and $c_i$ denotes the fused character vector representation;

[0094] The second step is the generation of self-matching word vectors. To capture the semantic information of words, vector representations of the self-matching words are obtained. The words that can be matched in the model's input sentence are denoted $l = \{z_1, z_2, \ldots, z_m\}$. By looking up the pre-trained word embedding matrix, each word is represented as a semantic vector $z_i$:

[0095] $z_i = e^w(l_i)$

[0096] Finally, the character vectors and word vectors are concatenated to obtain the final output representation of the embedding layer:

[0097] $Node_f = [v_1, v_2, \ldots, v_n] = [c_1, c_2, \ldots, z_1, z_2, \ldots, z_m]$

[0098] where $v_i$ is the final vector representation, $c_i$ is the character vector representation, and $z_i$ is the self-matching word vector representation;
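
The first two steps can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the toy vocabularies, the random embedding tables standing in for $e^c$ and $e^w$, the vector sizes, and the use of B/M/E/S boundary tags as the word segmentation feature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy lookup tables; a real system would load pre-trained embeddings.
char_vocab = {"d": 0, "i": 1, "s": 2, "k": 3}
word_vocab = {"disk": 0}
SEG_TAGS = {"B": 0, "M": 1, "E": 2, "S": 3}         # assumed word-boundary tags

char_emb = rng.normal(size=(len(char_vocab), 4))    # stands in for e^c
seg_emb = rng.normal(size=(len(SEG_TAGS), 2))       # word segmentation features s_i
word_emb = rng.normal(size=(len(word_vocab), 6))    # stands in for e^w


def embed_name(chars, seg_tags, matched_words):
    """Return [c_1..c_n, z_1..z_m] for one name attribute."""
    # c_i: character vector x_i concatenated with its segmentation feature s_i
    fused_chars = [
        np.concatenate([char_emb[char_vocab[c]], seg_emb[SEG_TAGS[t]]])
        for c, t in zip(chars, seg_tags)
    ]
    # z_i: semantic vector of each self-matching word
    word_vecs = [word_emb[word_vocab[w]] for w in matched_words]
    return fused_chars + word_vecs


node_f = embed_name(list("disk"), ["B", "M", "M", "E"], ["disk"])
```

The sizes are chosen so that fused character vectors and word vectors share one dimension, which keeps later pooling or stacking simple.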

[0099] The third step is attribute similarity calculation. After converting the entity's `name` attribute into character vectors and word vectors, the similarity between two `name` attributes is calculated based on cosine similarity. The closer the value is to 1, the closer the angle between the two vectors is to 0, meaning the two vectors are more similar. The calculation formula is as follows:

[0100]

[0101] where $Node_A$ and $Node_B$ denote the vectors of the `name` attribute of the two entities to be matched. Using this formula, the attribute similarity between the entity to be updated and the name of each entity in the original graph can be obtained.

[0102] To avoid the influence of numerous repetitive technical terms on the similarity calculation, Jaccard similarity is used to calculate the similarity between two name attributes. The calculation formula is as follows:

[0103]

[0104] where $NAME_A$ and $NAME_B$ denote the two NAME attributes to be matched, the per-character symbols in the formula denote the i-th character of $NAME_A$ and of $NAME_B$, #|{…}∩{…}| denotes the number of elements in the intersection of two sets, and #|{…}∪{…}| denotes the number of elements in the union of two sets.

[0105] The similarity of the NAME attribute is defined as the mean of the cosine similarity and the Jaccard similarity:

[0106]
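
The third step can be sketched in Python as follows. This is an illustration under assumptions, not the patented implementation: each name is assumed to have already been pooled into a single vector, the Jaccard similarity is taken over the sets of characters in the two names, and all function names are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two name vectors (0.0 if either is zero)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def jaccard_similarity(name_a, name_b):
    """Size of the shared character set divided by the size of the union."""
    set_a, set_b = set(name_a), set(name_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def name_similarity(vec_a, vec_b, name_a, name_b):
    """Mean of cosine similarity and Jaccard similarity."""
    return 0.5 * (cosine_similarity(vec_a, vec_b) + jaccard_similarity(name_a, name_b))


def name_similarity_matrix(new_entities, orig_entities):
    """Rows: entities of the new graph; columns: entities of the original graph.

    Each entity is a (name, vector) pair.
    """
    return np.array([
        [name_similarity(va, vb, na, nb) for nb, vb in orig_entities]
        for na, va in new_entities
    ])
```

Averaging the two scores lets character overlap temper the vector score when many names share the same technical terms.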

[0107] The fourth step is to improve GAT. Given a sequence of relation triples for input entities, following the GAT approach, the attention coefficient between each entity and its neighboring entities is calculated, and this coefficient serves as the weight for aggregating the features of neighboring entities. The computation of entity embeddings learned using relation structures employs GAT:

[0108]

[0109]

[0110]

[0111] where the two entity-vector symbols in the formulas denote the input and output of the l-th domain attention layer, respectively; $N_i$ denotes the set of entities connected to entity $e_i$; the attention symbol denotes the normalized entity domain attention coefficient at layer $l$; $u \in \mathbb{R}^{2d^{(l+1)} \times 1}$ and $W^{(l)} \in \mathbb{R}^{d^{(l+1)} \times d^{(l)}}$ are learnable parameter matrices; and $d^{(l)}$ denotes the embedding dimension of the l-th layer of the network. Inspired by the relationship between words, the traditional GAT model is improved by orthogonalizing the transformation matrix $W$ and learning an orthogonalization loss on $W$. The aim is to preserve the relative distribution of entities across the embedding layer and the graph attention transformation, and thus retain more faithful entity structure information. The formula for the orthogonalization loss of the parameter matrix $W$ is:

[0112]

[0113] where $W^{(l)}$ denotes the parameter matrix of the l-th layer; $m$ is the number of embedding layers in the attention network; and $\|\cdot\|_2^2$ denotes taking the 2-norm of a matrix and squaring it;
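
A minimal NumPy sketch of one attention layer and the orthogonalization loss is given below. It assumes the standard GAT formulation with a self-loop on every entity, and it assumes the loss penalizes the distance of $W^T W$ from the identity; neither formula is reproduced in this text, so both are illustrative guesses. Function and variable names are hypothetical.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gat_layer(H, adj, W, u):
    """One attention layer.

    H: (n, d_in) entity vectors; adj: (n, n) 0/1 adjacency matrix;
    W: (d_out, d_in) transformation matrix; u: (2 * d_out,) attention vector.
    """
    n = H.shape[0]
    Z = H @ W.T                                   # transformed entities, (n, d_out)
    d_out = Z.shape[1]
    # u^T [z_i || z_j] splits into u_left . z_i + u_right . z_j
    left = Z @ u[:d_out]
    right = Z @ u[d_out:]
    scores = leaky_relu(left[:, None] + right[None, :])
    mask = (adj + np.eye(n)) > 0                  # neighbors plus the entity itself (assumed)
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    alpha = weights / weights.sum(axis=1, keepdims=True)
    return sigmoid(alpha @ Z), alpha


def orthogonal_loss(weight_matrices):
    """Sum over layers of the squared 2-norm of (W^T W - I); form assumed."""
    return sum(
        np.linalg.norm(W.T @ W - np.eye(W.shape[1]), ord=2) ** 2
        for W in weight_matrices
    )


rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
W = rng.normal(size=(5, 4))
u = rng.normal(size=10)
out, alpha = gat_layer(H, adj, W, u)
```

Here `ord=2` is the spectral norm, following the wording "2-norm of a matrix"; the Frobenius norm is a common alternative in practice.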

[0114] The fifth step is the alignment loss. To this end, a loss function is trained on the pre-aligned entity seed set (the entity seed set is a collection containing all entities in the knowledge graph; a positive triple contains an entity, its neighboring entity, and their relationship; for each positive triple, a negative triple is generated by replacing the head or tail entity while keeping the same relationship). Entities from New_KG and Origin_KG are embedded into the same vector space. During training, the model parameters in the attribute attention layer and the domain attention layer are updated to obtain the updated entity feature vectors. From the entity feature vectors of New_KG and Origin_KG, the entity similarity matrix S under the attribute channel is obtained. The loss function for training entity alignment using the attribute channel is:

[0115] $L_A = L_0 + L_a$

[0116]

[0117] where $NS(e)$ denotes the negative sample set of entity $e$, which this invention constructs with the nearest neighbor sampling method; $d(\cdot,\cdot) = 1 - \cos(\cdot,\cdot)$ denotes the cosine distance between entities; $[\cdot]_+ = \max\{\cdot, 0\}$; and $\gamma$ is a hyperparameter;
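
The alignment term $L_a$ is not reproduced in this text. The sketch below assumes a common margin-based form: for each seed pair, the cosine distance of the pair should be smaller by at least $\gamma$ than the distance to each negative sample drawn from the nearest neighbors. The function names, the seed format, and the choice of `k` negatives are illustrative assumptions, and embeddings are assumed to be nonzero.

```python
import numpy as np


def cosine_distance_matrix(A, B):
    """d(a, b) = 1 - cos(a, b) for every row of A against every row of B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return 1.0 - A @ B.T


def nearest_neighbor_negatives(emb, idx, k):
    """Indices of the k entities closest to entity idx in the same graph."""
    dist = cosine_distance_matrix(emb[idx:idx + 1], emb)[0]
    dist[idx] = np.inf                      # never pick the entity itself
    return np.argsort(dist)[:k]


def alignment_loss(new_emb, orig_emb, seeds, gamma=1.0, k=2):
    """Margin loss over pre-aligned seed pairs (i in new graph, j in original)."""
    total = 0.0
    for i, j in seeds:
        pos = cosine_distance_matrix(new_emb[i:i + 1], orig_emb[j:j + 1])[0, 0]
        # negatives that replace the new-graph entity
        for neg in nearest_neighbor_negatives(new_emb, i, k):
            neg_d = cosine_distance_matrix(new_emb[neg:neg + 1], orig_emb[j:j + 1])[0, 0]
            total += max(pos - neg_d + gamma, 0.0)
        # negatives that replace the original-graph entity
        for neg in nearest_neighbor_negatives(orig_emb, j, k):
            neg_d = cosine_distance_matrix(new_emb[i:i + 1], orig_emb[neg:neg + 1])[0, 0]
            total += max(pos - neg_d + gamma, 0.0)
    return total


new_emb = np.eye(3)
orig_emb = np.eye(3)
loss = alignment_loss(new_emb, orig_emb, seeds=[(0, 0)], gamma=1.5, k=2)
```

Nearest neighbors make hard negatives: they are the entities most easily confused with the seed, so separating them sharpens the alignment.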

[0118] The sixth step is relation structure modeling. This invention employs the TransE model to calculate entity relation structure similarity. Given a relation triple (h, r, t), the model is trained based on the entity vectors after global structural embedding, ensuring h + r ≈ t. This model further constrains the embedding representations of the head and tail entities while modeling the relation structure.

[0119] The formula for calculating the loss function of the training relation structure part is:

[0120]

[0121] where $f(h,r,t) = \|h + r - t\|_2$ is the scoring function for the triple, used to calculate the confidence of the triple; $h$ and $t$ are the head and tail entity vectors from the global structural embedding layer; $r$ is the relation vector to be modeled and learned; $\gamma'$ is a hyperparameter; $T_1$ denotes the set of positive triples; and $T_1'(h,r,t) = \{(h',r,t) \mid h' \in E\} \cup \{(h,r,t') \mid t' \in E\}$ denotes the set of negative example triples, constructed by randomly replacing the head or tail entity while keeping the same relation type. During training, positive triples receive lower scores and negative triples higher scores, and the two are separated by a maximum margin. Because entities absorb the relational structure during embedding, the model becomes better at distinguishing entities. Using the pre-aligned entity seed set and the positive and negative example triples, the alignment loss and the relation structure modeling loss are trained simultaneously to update the model parameters in the global structure embedding layer and the local semantic optimization layer, so that the influence of the topological structure and the relational structure on entity embedding is learned from the entity alignment task. The updated entity feature vectors are then obtained, giving the entity similarity matrix S under the structure channel. The loss function for training entity alignment in the structure channel is:

[0122] $L_s = L_A + L_R$

[0123] Through the above relational structure similarity modeling, entity structure similarity can be calculated, yielding the structure similarity matrix $S_{relation}$.
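
The relation structure loss can be sketched with the TransE score $\|h + r - t\|_2$ and a margin over corrupted triples. The exact loss formula is not reproduced in this text, so the hinge form below is an assumption, as are the function names, the dictionary-based embedding tables, and the sampling of one negative per positive triple.

```python
import numpy as np


def transe_score(h, r, t):
    """Distance-style score: lower means the triple is more plausible."""
    return float(np.linalg.norm(h + r - t))


def corrupt(triple, entities, rng):
    """Replace the head or the tail with a random entity, keeping the relation.

    A production sampler would also reject corruptions that are known positives.
    """
    h, r, t = triple
    replacement = entities[rng.integers(len(entities))]
    return (replacement, r, t) if rng.random() < 0.5 else (h, r, replacement)


def relation_loss(pos_triples, ent_emb, rel_emb, gamma=1.0, seed=0):
    """Sum of [f(positive) - f(negative) + gamma]_+ over the positive triples."""
    rng = np.random.default_rng(seed)
    entities = list(ent_emb)
    total = 0.0
    for h, r, t in pos_triples:
        h2, _, t2 = corrupt((h, r, t), entities, rng)
        pos = transe_score(ent_emb[h], rel_emb[r], ent_emb[t])
        neg = transe_score(ent_emb[h2], rel_emb[r], ent_emb[t2])
        total += max(pos - neg + gamma, 0.0)
    return total


# Toy embeddings (hypothetical values) where h + r = t holds exactly.
ent_emb = {
    "C_new1": np.array([1.0, 0.0]),
    "application bug": np.array([1.0, 1.0]),
    "communication interruption": np.array([-2.0, 3.0]),
}
rel_emb = {"cause": np.array([0.0, 1.0])}
loss = relation_loss([("C_new1", "cause", "application bug")], ent_emb, rel_emb)
```

Minimizing this loss pulls $h + r$ toward $t$ for observed triples and pushes corrupted triples at least $\gamma'$ further away.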

[0124] The seventh step is the fusion layer. The name attribute similarity matrix and the structure relation similarity matrix $S_{relation}$ are obtained from the attribute channel and the structure channel, respectively. First, the two matrices are normalized to eliminate the influence of the different feature dimensions of entities. The two matrices are weighted equally during fusion, so the final entity similarity is their mean. The mean normalization and fusion formulas are as follows:

[0125]

[0126]

[0127]
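
The fusion step can be sketched as follows. The normalization formulas are not reproduced in this text, so the sketch assumes mean normalization of the form (S - mean) / (max - min); the equal weighting follows the description above, and the function names are hypothetical.

```python
import numpy as np


def mean_normalize(S):
    """Assumed normalization: center on the mean and scale by the value range."""
    span = S.max() - S.min()
    return (S - S.mean()) / span if span else np.zeros_like(S, dtype=float)


def fuse(S_name, S_relation):
    """Equal-weight mean of the two normalized similarity matrices."""
    return 0.5 * (mean_normalize(S_name) + mean_normalize(S_relation))


def best_matches(S_fused):
    """For each new-graph entity (row), index of the most similar original entity."""
    return S_fused.argmax(axis=1)


S_name = np.array([[0.9, 0.1], [0.2, 0.8]])
S_relation = np.array([[0.7, 0.3], [0.4, 0.6]])
S_fused = fuse(S_name, S_relation)
matches = best_matches(S_fused)
```

In the workflow described for Figure 2, candidates selected this way would still go through manual review before the original graph is updated.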

[0128] The following embodiments are only used to more clearly illustrate the technical solutions of the present invention and cannot be used to limit the protection scope of the present invention.

[0129] Based on the method of the present invention, when both structure similarity and name attribute similarity are considered, substituting the structural relation <C_new1, cause, application bug> of the target entity gives a similarity of 0.75, which greatly improves the recognition of similar entities. Substituting the relational structure <C_new2, cause, communication interruption> of C_new2 gives 0.51. The gap between the similarity of the target entity and that of other, unrelated cause entities is thus significantly widened.

[0130] Accordingly, the present invention also provides an entity update system for a knowledge graph, comprising:

[0131] The acquisition module is used to acquire both the new knowledge graph and the original knowledge graph.

[0132] The first calculation module is used to calculate the name attribute similarity matrix based on the name attribute of the new knowledge graph and the original knowledge graph;

[0133] The second calculation module is used to calculate the entity relationship similarity matrix based on the entity relationship structure triples of the new knowledge graph and the original knowledge graph.

[0134] The update module is used to fuse the name attribute similarity matrix and the entity relationship similarity matrix to obtain the entity corresponding to the new knowledge graph, and update the entity corresponding to the new knowledge graph into the original knowledge graph.

[0135] Furthermore, the first calculation module is used to:

[0136] Obtain the word vector $s_{Ai}$ and the character vector $x_{Ai}$ of sentence $S_{Ai}$ in the new knowledge graph by looking up the character embedding matrix;

[0137] Obtain the word vector $s_{Bi}$ and the character vector $x_{Bi}$ of sentence $S_{Bi}$ in the original knowledge graph by looking up the character embedding matrix;

[0138] Calculate the word vector similarity between sentence $S_{Ai}$ and sentence $S_{Bi}$ from the word vector $s_{Ai}$ of sentence $S_{Ai}$ and the word vector $s_{Bi}$ of sentence $S_{Bi}$;

[0139] Calculate the character vector similarity between sentence $S_{Ai}$ and sentence $S_{Bi}$ from the character vector $x_{Ai}$ of sentence $S_{Ai}$ and the character vector $x_{Bi}$ of sentence $S_{Bi}$;

[0140] Take the average of the word vector similarity and the character vector similarity between sentence $S_{Ai}$ and sentence $S_{Bi}$ to obtain the name attribute similarity $S_{name_i}$ of sentence $S_{Ai}$ with respect to sentence $S_{Bi}$;

[0141] Obtain the name attribute similarity between each sentence in the new knowledge graph and each sentence in the original knowledge graph, and obtain the name attribute similarity matrix.
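The averaging step above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each sentence has already been reduced to two fixed-length vectors (s and x), and it assumes cosine similarity as the vector similarity measure, which the text does not specify. The function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def name_similarity(s_a, x_a, s_b, x_b):
    """Average of the s-vector similarity and the x-vector similarity."""
    return (cosine(s_a, s_b) + cosine(x_a, x_b)) / 2.0

def name_similarity_matrix(new_sents, old_sents):
    """new_sents / old_sents: lists of (s, x) vector pairs, one pair per sentence.

    Returns a len(new_sents) x len(old_sents) matrix of name attribute similarities.
    """
    return [[name_similarity(sa, xa, sb, xb) for (sb, xb) in old_sents]
            for (sa, xa) in new_sents]

# Toy vectors (assumed data): one new sentence compared against two original ones.
new_sents = [([1.0, 0.0], [0.0, 1.0])]
old_sents = [([1.0, 0.0], [0.0, 1.0]), ([0.0, 1.0], [0.0, 1.0])]
matrix = name_similarity_matrix(new_sents, old_sents)
```

Rows index sentences of the new knowledge graph and columns index sentences of the original one, so a row can later be scanned for its best match.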

[0142] Furthermore, the second calculation module is used to:

[0143] Obtain the entity relation structure triple of each sentence in the new knowledge graph, input it into the entity relation structure similarity model pre-trained on the relation structure triples of the original knowledge graph, obtain the entity structure similarity of each sentence, and build the entity structure similarity matrix from the entity structure similarity of each sentence.

[0144] Furthermore, the second calculation module is used to:

[0145] Construct the entity relationship structure similarity model to be trained, represented as:

[0146]

[0147]

[0148]

[0149] Here, the model relates the entity vector input to the l-th domain attention layer to the entity vector output by that layer; the entity vectors input to the l-th layer cover entity e_i and all its neighbors; σ denotes the sigmoid activation function; N_i denotes the set of entities connected to entity e_i; e_j ranges over entity e_i and all its neighbors; e_k ranges over all neighbors of entity e_i; the entity domain attention coefficient at layer l is normalized, and the formulas use the information fusion result of entity e_i with neighbor j and the information fusion result of entity e_i with neighbor k; exp() denotes the exponential function with the natural constant e as its base; LeakyReLU() denotes the activation function; u ∈ R^(2d(l+1)×1) and W^(l) ∈ R^(d(l+1)×d(l)) are learnable parameter matrices; d(l) denotes the network embedding dimension of the l-th layer; d(l+1) denotes the network embedding dimension of the (l+1)-th layer; and the superscript T denotes matrix transpose;
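The three model formulas themselves do not survive in this text. One form consistent with the definitions in this paragraph, offered only as an assumption, is a standard graph attention layer. The symbols h_i^{(l)} (entity vector of e_i at layer l), c_{ij}^{(l)} (information fusion result) and \alpha_{ij}^{(l)} (normalized attention coefficient) are introduced here for illustration and are not taken from the original; using N_i as the index set of both sums is likewise an assumption.

```latex
\begin{aligned}
c_{ij}^{(l)} &= \mathrm{LeakyReLU}\!\left(u^{T}\left[\,W^{(l)} h_i^{(l)} \,\Vert\, W^{(l)} h_j^{(l)}\right]\right) \\
\alpha_{ij}^{(l)} &= \frac{\exp\!\left(c_{ij}^{(l)}\right)}{\sum_{k \in N_i} \exp\!\left(c_{ik}^{(l)}\right)} \\
h_i^{(l+1)} &= \sigma\!\left(\sum_{j \in N_i} \alpha_{ij}^{(l)}\, W^{(l)} h_j^{(l)}\right)
\end{aligned}
```

The dimensions agree with the stated parameter shapes: W^{(l)} maps a d(l)-dimensional vector to d(l+1) dimensions, the concatenation has 2d(l+1) entries, and u^{T} reduces it to a scalar.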

[0150] Construct a pre-aligned set of entity seeds and positive / negative instance triples;

[0151] Construct the entity alignment loss function L_A used to train the entity relationship structure similarity model, expressed as:

[0152] L_A = L_0 + L_a

[0153]

[0154] where L_a denotes the entity alignment loss function of the entity relationship structure similarity model, and L_0 denotes the orthogonalization loss function of the parameter matrix W. The nearest neighbor sampling method NS(e) is used to construct the negative sample set e_ of entity e and the negative sample set e'_ of entity e', a neighboring entity of entity e; d(·,·) = 1 - cos(·,·) denotes the cosine distance between entities; [·]_+ = max{·, 0}; γ is a hyperparameter;
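The expression for L_a is not legible in this text, so the following is only a sketch of a margin-based loss that uses the pieces defined above: the cosine distance d, the hinge [·]_+, the margin γ, and negative samples for both entities of a seed pair. The exact summation in the patent may differ. The callable `neg_of` is a hypothetical stand-in for the nearest neighbor sampler NS(e).

```python
import math

def cos_distance(u, v):
    """d(u, v) = 1 - cos(u, v), as defined in the text."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - (dot / norm if norm else 0.0)

def alignment_loss(seed_pairs, neg_of, gamma):
    """Assumed margin loss over seed pairs (e, e') with sampled negatives.

    neg_of(vector) returns the negative sample vectors for that entity
    (a placeholder for NS(e); the sampling itself is not shown here).
    """
    total = 0.0
    for e, e_prime in seed_pairs:
        pos = cos_distance(e, e_prime)
        for e_neg in neg_of(e):                # negatives replacing e
            total += max(pos - cos_distance(e_neg, e_prime) + gamma, 0.0)
        for e_prime_neg in neg_of(e_prime):    # negatives replacing e'
            total += max(pos - cos_distance(e, e_prime_neg) + gamma, 0.0)
    return total

# Toy data (assumed): one perfectly aligned seed pair and one orthogonal negative each.
pairs = [((1.0, 0.0), (1.0, 0.0))]
sampler = lambda vec: [(0.0, 1.0)]
loss_small_margin = alignment_loss(pairs, sampler, 0.5)
loss_large_margin = alignment_loss(pairs, sampler, 1.5)
```

With a margin of 0.5 the negatives are already far enough away and the loss is zero; raising the margin to 1.5 makes both hinge terms active.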

[0155]

[0156] where W^(l) denotes the parameter matrix of the l-th layer; m is the number of embedding layers in the attention network; and ||·||_2^2 denotes taking the 2-norm of a matrix and squaring it;
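The orthogonalization formula is not legible in this text. A common choice that matches the description (a squared norm summed over the m layer matrices) penalizes the deviation of W^T W from the identity. The sketch below assumes that form and reads the squared norm as the squared Frobenius norm; both are assumptions, and the function name is hypothetical.

```python
def orthogonality_loss(weights):
    """Sum over layers of the squared Frobenius norm of (W^T W - I).

    weights: list of matrices, each a list of rows with shape (d_out, d_in).
    """
    total = 0.0
    for W in weights:
        d_in = len(W[0])
        for i in range(d_in):
            for j in range(d_in):
                gram = sum(row[i] * row[j] for row in W)   # entry (i, j) of W^T W
                target = 1.0 if i == j else 0.0            # entry (i, j) of I
                total += (gram - target) ** 2
    return total

# Toy layers (assumed data): an orthogonal matrix and a stretched one.
identity_layer = [[1.0, 0.0], [0.0, 1.0]]
stretched_layer = [[2.0, 0.0], [0.0, 1.0]]
loss_orthogonal = orthogonality_loss([identity_layer])
loss_stretched = orthogonality_loss([stretched_layer])
```

The penalty vanishes for an orthogonal W and grows as its columns lose unit length or become correlated.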

[0157] Construct the relation structure loss function L_R used to train the entity relation structure similarity model, expressed as:

[0158]

[0159] where f(h,r,t) = ||h + r - t||_2 denotes the scoring function of the relation triple (h,r,t), used to calculate the confidence of the relation triple; h and t are the head and tail entity vectors from the global structure embedding layer; r is the relation vector to be modeled and learned; γ' is a hyperparameter; T1 denotes the set of positive triples; T1'(h,r,t) = {(h',r,t) | h'∈E} ∪ {(h,r,t') | t'∈E} denotes the set of negative example triples; h' denotes the head entity of a negative example in the global structure embedding layer; t' denotes the tail entity of a negative example in the global structure embedding layer; and E denotes the set containing all negative example entities;
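The scoring function and the negative triple set described above can be sketched as follows. The scoring function is read here as the translation-style distance ||h + r - t||_2. The loss formula itself is not legible in this text, so the hinge form with margin γ' below is an assumption, as is excluding the original head or tail when building negatives. All function names are hypothetical.

```python
import math

def score(h, r, t):
    """f(h, r, t) = ||h + r - t||_2; a lower score means a more plausible triple."""
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h, r, t)))

def corrupt(h, r, t, entities):
    """Negative triples: replace the head or the tail with another entity from E."""
    heads = [(e, r, t) for e in entities if e != h]
    tails = [(h, r, e) for e in entities if e != t]
    return heads + tails

def relation_loss(positives, entities, margin):
    """Assumed hinge loss: positives should score lower than negatives by the margin."""
    total = 0.0
    for h, r, t in positives:
        pos = score(h, r, t)
        for h2, r2, t2 in corrupt(h, r, t, entities):
            total += max(pos - score(h2, r2, t2) + margin, 0.0)
    return total

# Toy embeddings (assumed data): the positive triple satisfies h + r = t exactly.
entities = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
positives = [((0.0, 0.0), (1.0, 0.0), (1.0, 0.0))]
loss = relation_loss(positives, entities, 1.5)
```

In this toy case two of the four negatives score within the margin of the positive triple, so only those two contribute to the loss.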

[0160] Train with the entity alignment loss function L_A and the relation structure loss function L_R, using the pre-aligned entity seed set and the positive/negative example triples respectively, to determine the final model parameters of the entity relationship structure similarity model; then update the entity relationship structure similarity model with the final model parameters to obtain the trained entity relationship structure similarity model.

[0161] Furthermore, the update module is used to:

[0162] Standardize the name attribute similarity matrix and the entity relationship similarity matrix separately;

[0163] Obtain the final entity similarity by summing the standardized name attribute similarity matrix and the standardized entity relationship similarity matrix.
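The fusion step can be sketched as below. The text does not say which standardization is used, so min-max scaling over each whole matrix is assumed here. Picking the highest fused score in each row as the matching entity is also an assumption about how the fused result is consumed. All names are hypothetical.

```python
def standardize(matrix):
    """Min-max scale all entries of a matrix into [0, 1] (assumed standardization)."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    return [[(v - lo) / span if span else 0.0 for v in row] for row in matrix]

def fuse(name_sim, struct_sim):
    """Element-wise sum of the two standardized similarity matrices."""
    a, b = standardize(name_sim), standardize(struct_sim)
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def best_matches(fused):
    """For each new entity (row), the index of the highest-scoring original entity."""
    return [max(range(len(row)), key=row.__getitem__) for row in fused]

# Toy matrices (assumed data): rows are new entities, columns are original entities.
name_sim = [[0.2, 0.8], [0.6, 0.4]]
struct_sim = [[0.1, 0.9], [0.7, 0.3]]
fused = fuse(name_sim, struct_sim)
matches = best_matches(fused)
```

Standardizing before summing keeps either similarity source from dominating simply because its raw scores lie on a wider scale.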

[0164] Accordingly, the present invention also provides a computer-readable storage medium for storing one or more programs, said one or more programs including instructions that, when executed by a computing device, cause the computing device to perform any of the methods described.

[0165] Accordingly, the present invention also provides a computing device, comprising,

[0166] One or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing any of the methods described.

[0167] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0168] This application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of this application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a means for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.

[0169] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.

[0170] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.

[0171] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the technical principles of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.

Claims

1. A method for updating entities in a knowledge graph, characterized by comprising:
obtaining a new knowledge graph and an original knowledge graph;
calculating a name attribute similarity matrix based on the name attributes of the new knowledge graph and the original knowledge graph;
calculating an entity relation similarity matrix based on the entity relation structure triples of the new knowledge graph and the original knowledge graph, which includes: obtaining the entity relation structure triple of each sentence in the new knowledge graph, inputting it into an entity relation structure similarity model pre-trained on the relation structure triples of the original knowledge graph, obtaining the entity structure similarity of each sentence, and obtaining the entity structure similarity matrix from the entity structure similarity of each sentence;
wherein the entity alignment loss function L_A used to train the entity relation structure similarity model is expressed as: L_A = L_0 + L_a; [formula]; where L_a denotes the entity alignment loss function of the entity relation structure similarity model; L_0 denotes the orthogonalization loss function of the parameter matrix W; the nearest neighbor sampling method NS(e) is used to construct the negative sample set e_ of entity e and the negative sample set e'_ of entity e', a neighboring entity of entity e; d(·,·) = 1 - cos(·,·) denotes the cosine distance between entities; [·]_+ = max{·, 0}; γ is a hyperparameter; and seed denotes the entity seed set;
[formula]; where W^(l) denotes the parameter matrix of the l-th layer; m is the number of embedding layers in the attention network; and ||·||_2^2 denotes taking the 2-norm of a matrix and squaring it;
the relation structure loss function L_R used to train the entity relation structure similarity model is expressed as: [formula]; where f(h,r,t) = ||h + r - t||_2 denotes the scoring function of the relation triple (h,r,t), used to calculate the confidence of the relation triple; h and t are the head and tail entity vectors from the global structure embedding layer; r is the relation vector to be modeled and learned; γ' is a hyperparameter; T1 denotes the set of positive triples; T1'(h,r,t) = {(h',r,t) | h'∈E} ∪ {(h,r,t') | t'∈E} denotes the set of negative triples; h' denotes the head entity of a negative example in the global structure embedding layer; t' denotes the tail entity of a negative example in the global structure embedding layer; and E denotes the set containing all negative example entities;
fusing the name attribute similarity matrix and the entity relation similarity matrix to obtain the entities corresponding to the new knowledge graph, and updating the entities corresponding to the new knowledge graph into the original knowledge graph.

2. The entity update method for a knowledge graph according to claim 1, characterized in that calculating the name attribute similarity matrix based on the name attributes of the new knowledge graph and the original knowledge graph comprises:
extracting the character vector s_Ai and the word vector x_Ai of sentence S_Ai in the new knowledge graph by looking up the character embedding matrix;
extracting the character vector s_Bi and the word vector x_Bi of sentence S_Bi in the original knowledge graph by looking up the character embedding matrix;
calculating the character vector similarity between sentence S_Ai and sentence S_Bi from the character vectors s_Ai and s_Bi;
calculating the word vector similarity between sentence S_Ai and sentence S_Bi from the word vectors x_Ai and x_Bi;
taking the average of the character vector similarity and the word vector similarity of sentence S_Ai and sentence S_Bi to obtain the name attribute similarity S_namei between sentence S_Ai and sentence S_Bi;
obtaining the name attribute similarity between each sentence in the new knowledge graph and each sentence in the original knowledge graph to form the name attribute similarity matrix.

3. The entity update method for a knowledge graph according to claim 1, characterized in that the training process of the entity relation structure similarity model, trained on the entity relation structure triples of the original knowledge graph, comprises:
constructing the entity relationship structure similarity model to be trained, represented as: [formula]; [formula]; [formula]; where the model relates the entity vector input to the l-th domain attention layer to the entity vector output by that layer; the entity vectors input to the l-th layer cover entity e_i and all its neighbors; σ denotes the sigmoid activation function; N_i denotes the set of entities connected to entity e_i; e_j ranges over entity e_i and all its neighbors; e_k ranges over all neighbors of entity e_i; the entity domain attention coefficient at layer l is normalized, and the formulas use the information fusion result of entity e_i with neighbor j and the information fusion result of entity e_i with neighbor k; exp() denotes the exponential function with the natural constant e as its base; LeakyReLU() denotes the activation function; u ∈ R^(2d(l+1)×1) and W^(l) ∈ R^(d(l+1)×d(l)) are learnable parameter matrices; d(l) denotes the network embedding dimension of the l-th layer; d(l+1) denotes the network embedding dimension of the (l+1)-th layer; and the superscript T denotes matrix transpose;
constructing a pre-aligned entity seed set and positive/negative example triples;
training with the entity alignment loss function L_A and the relation structure loss function L_R, using the pre-aligned entity seed set and the positive/negative example triples respectively, to determine the final model parameters of the entity relationship structure similarity model, and updating the entity relationship structure similarity model with the final model parameters to obtain the trained entity relationship structure similarity model.

4. The entity update method for a knowledge graph according to claim 3, characterized in that fusing the name attribute similarity matrix and the entity relationship similarity matrix comprises:
standardizing the name attribute similarity matrix and the entity relationship similarity matrix separately;
obtaining the final entity similarity by summing the standardized name attribute similarity matrix and the standardized entity relationship similarity matrix.

5. A knowledge graph entity update system, characterized by comprising:
an acquisition module, used to acquire a new knowledge graph and an original knowledge graph;
a first calculation module, used to calculate a name attribute similarity matrix based on the name attributes of the new knowledge graph and the original knowledge graph;
a second calculation module, used to calculate an entity relation similarity matrix based on the entity relation structure triples of the new knowledge graph and the original knowledge graph, which includes: obtaining the entity relation structure triple of each sentence in the new knowledge graph, inputting it into an entity relation structure similarity model pre-trained on the relation structure triples of the original knowledge graph, obtaining the entity structure similarity of each sentence, and obtaining the entity structure similarity matrix from the entity structure similarity of each sentence;
wherein the entity alignment loss function L_A used to train the entity relation structure similarity model is expressed as: L_A = L_0 + L_a; [formula]; where L_a denotes the entity alignment loss function of the entity relation structure similarity model; L_0 denotes the orthogonalization loss function of the parameter matrix W; the nearest neighbor sampling method NS(e) is used to construct the negative sample set e_ of entity e and the negative sample set e'_ of entity e', a neighboring entity of entity e; d(·,·) = 1 - cos(·,·) denotes the cosine distance between entities; [·]_+ = max{·, 0}; γ is a hyperparameter; and seed denotes the entity seed set;
[formula]; where W^(l) denotes the parameter matrix of the l-th layer; m is the number of embedding layers in the attention network; and ||·||_2^2 denotes taking the 2-norm of a matrix and squaring it;
the relation structure loss function L_R used to train the entity relation structure similarity model is expressed as: [formula]; where f(h,r,t) = ||h + r - t||_2 denotes the scoring function of the relation triple (h,r,t), used to calculate the confidence of the relation triple; h and t are the head and tail entity vectors from the global structure embedding layer; r is the relation vector to be modeled and learned; γ' is a hyperparameter; T1 denotes the set of positive triples; T1'(h,r,t) = {(h',r,t) | h'∈E} ∪ {(h,r,t') | t'∈E} denotes the set of negative triples; h' denotes the head entity of a negative example in the global structure embedding layer; t' denotes the tail entity of a negative example in the global structure embedding layer; and E denotes the set containing all negative example entities;
an update module, used to fuse the name attribute similarity matrix and the entity relation similarity matrix to obtain the entities corresponding to the new knowledge graph, and to update the entities corresponding to the new knowledge graph into the original knowledge graph.

6. The entity update system for knowledge graphs according to claim 5, characterized in that the first calculation module is used for:
extracting the character vector s_Ai and the word vector x_Ai of sentence S_Ai in the new knowledge graph by looking up the character embedding matrix;
extracting the character vector s_Bi and the word vector x_Bi of sentence S_Bi in the original knowledge graph by looking up the character embedding matrix;
calculating the character vector similarity between sentence S_Ai and sentence S_Bi from the character vectors s_Ai and s_Bi;
calculating the word vector similarity between sentence S_Ai and sentence S_Bi from the word vectors x_Ai and x_Bi;
taking the average of the character vector similarity and the word vector similarity of sentence S_Ai and sentence S_Bi to obtain the name attribute similarity S_namei between sentence S_Ai and sentence S_Bi;
obtaining the name attribute similarity between each sentence in the new knowledge graph and each sentence in the original knowledge graph to form the name attribute similarity matrix.

7. The entity update system for knowledge graphs according to claim 5, characterized in that the second calculation module is used for:
constructing the entity relationship structure similarity model to be trained, represented as: [formula]; [formula]; [formula]; where the model relates the entity vector input to the l-th domain attention layer to the entity vector output by that layer; the entity vectors input to the l-th layer cover entity e_i and all its neighbors; σ denotes the sigmoid activation function; N_i denotes the set of entities connected to entity e_i; e_j ranges over entity e_i and all its neighbors; e_k ranges over all neighbors of entity e_i; the entity domain attention coefficient at layer l is normalized, and the formulas use the information fusion result of entity e_i with neighbor j and the information fusion result of entity e_i with neighbor k; exp() denotes the exponential function with the natural constant e as its base; LeakyReLU() denotes the activation function; u ∈ R^(2d(l+1)×1) and W^(l) ∈ R^(d(l+1)×d(l)) are learnable parameter matrices; d(l) denotes the network embedding dimension of the l-th layer; d(l+1) denotes the network embedding dimension of the (l+1)-th layer; and the superscript T denotes matrix transpose;
training with the entity alignment loss function L_A and the relation structure loss function L_R, using the pre-aligned entity seed set and the positive/negative example triples respectively, to determine the final model parameters of the entity relationship structure similarity model, and updating the entity relationship structure similarity model with the final model parameters to obtain the trained entity relationship structure similarity model.

8. The entity update system for knowledge graphs according to claim 7, characterized in that the update module is used for:
standardizing the name attribute similarity matrix and the entity relationship similarity matrix separately;
obtaining the final entity similarity by summing the standardized name attribute similarity matrix and the standardized entity relationship similarity matrix.

9. A computer-readable storage medium for storing one or more programs, characterized in that the one or more programs include instructions that, when executed by a computing device, cause the computing device to perform any of the methods according to claims 1 to 4.

10. A computing device, characterized by comprising: one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods according to claims 1 to 4.