A method and system for constructing multidimensional knowledge graphs for substation engineering

By identifying and merging redundant nodes in the field of substation engineering, and utilizing semantic feature analysis and similarity calculation, the problem of knowledge graph redundancy caused by multidimensional data heterogeneity is solved, thereby improving the accuracy and reliability of operation and maintenance decisions.

CN122311366A · Pending · Publication Date: 2026-06-30 · STATE GRID JIANGXI ELECTRIC POWER CO LTD ECONOMIC & TECH RES INST +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
STATE GRID JIANGXI ELECTRIC POWER CO LTD ECONOMIC & TECH RES INST
Filing Date
2026-04-02
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

The heterogeneity of multidimensional data in the field of substation engineering leads to too many redundant nodes in the knowledge graph construction process, which reduces the accuracy and reliability of operation and maintenance decisions.

Method used

By identifying and merging redundant nodes in the knowledge graph, and utilizing semantic feature analysis and similarity calculation, a multidimensional knowledge graph is constructed, including triple extraction, node feature value evaluation, and redundant node merging processing.

Benefits of technology

It improves the usability of multidimensional knowledge graphs and enhances the accuracy and credibility of operation and maintenance decisions.

✦ Generated by Eureka AI based on patent content.

Abstract

This application relates to the field of knowledge graph construction technology, specifically to a method and system for constructing a multidimensional knowledge graph for substation engineering. The method includes: extracting triples from multidimensional data in the substation engineering field to construct graph structures with entities in the triples as nodes; for a single graph structure network, using the node with the largest number of connections as the starting point to construct an initial node set, and recording the remaining nodes as nodes to be processed; obtaining the first and second redundancy feature values of each node to be processed; evaluating the redundancy of each node to be processed; and then merging the redundant nodes. This application aims to improve the usability of the constructed multidimensional knowledge graph in the substation engineering field by effectively identifying and merging redundant nodes in the knowledge graph, thereby improving the accuracy and reliability of operation and maintenance decisions.

Description

Technical Field

[0001] This application relates to the field of knowledge graph construction technology, specifically to a method and system for constructing a multidimensional knowledge graph for substation engineering.

Background Technology

[0002] With the continuous technological development and expansion of construction scale in the field of substation engineering, substation engineering, as a complex engineering field, involves a wide variety of data types. Knowledge graphs can use relation extraction methods to construct rule bases from different data, promoting digital transformation throughout the entire lifecycle of substations. Especially in the operation, maintenance, and fault diagnosis of substation engineering, by integrating multi-dimensional data such as equipment, fault types, real-time data, fault location, and maintenance methods, a multi-dimensional knowledge graph can be constructed, providing a more professional engineering knowledge foundation for the operation and maintenance of substation engineering.

[0003] However, power data in substation projects comes from diverse sources, has different data types, and is recorded in significantly different ways. For example, the same entity is described differently in unstructured fault texts and in equipment operation records, so constructing a knowledge graph from heterogeneous multidimensional data produces a large amount of redundancy. This reduces the usability of the knowledge graph and consequently weakens the accuracy and credibility of operation and maintenance decisions.

Summary of the Invention

[0004] In view of the above, it is necessary to provide a method and system for constructing a multidimensional knowledge graph for substation engineering. Compared with traditional methods for constructing multidimensional knowledge graphs for substation engineering, this method improves the usability of the constructed multidimensional knowledge graph in the field of substation engineering by effectively identifying and merging redundant nodes in the knowledge graph, thereby improving the accuracy and reliability of operation and maintenance decisions.

In a first aspect, embodiments of this application provide a method for constructing a multidimensional knowledge graph for substation engineering, the method comprising the following steps:

Triples are extracted from multidimensional data in the field of substation engineering to construct graph-structured networks with entities in the triples as nodes.

For a single graph structure network, the node with the largest number of connections is used as the starting point to construct an initial node set, and the remaining nodes are denoted as nodes to be processed.

Each node to be processed is introduced into the initial node set one by one. By comparing the distribution of semantic features of all nodes in the initial node set before and after the introduction of each node to be processed, the first redundant feature value of each node to be processed is obtained.

For each edge containing a node to be processed, the sentence features of the two nodes on that edge are measured and compared with the sentence features of the other node on that edge and its directly connected node. The semantic features of the node to be processed are also compared with those of the directly connected nodes of the other node on that edge, and a second redundant feature value of the node to be processed is obtained.

The second redundant feature value is then fused with the first redundant feature value to evaluate the redundancy of the node to be processed, and the redundant nodes are then merged.

[0005] In one embodiment, the process of obtaining the first redundant feature value is as follows: Calculate the posterior probability value of each node in the initial node set before and after each node to be processed is introduced into the initial node set, and calculate the information entropy of the posterior probability value of all nodes in the initial node set. By analyzing the changes in information entropy before and after introducing each node to be processed into the initial node set, the first redundant feature value of each node to be processed is obtained.

[0006] In one embodiment, the first redundant feature value is positively correlated with the difference in information entropy before and after each node to be processed is introduced into the initial node set.

[0007] In one embodiment, the process of obtaining the second redundant feature value is as follows: The statement-level similarity of each node to be processed on its respective edge is obtained from the similarity between the sentence features of the two nodes on that edge and the sentence features of the other node on that edge and its directly connected node. The entity-level similarity of each node to be processed on its respective edge is obtained by using the similarity of semantic features between each node to be processed and the directly connected node of another node on its edge. The redundancy of each node to be processed on its respective edge is obtained by using the statement-level similarity and the entity-level similarity. The second redundancy feature value is the maximum value of the redundancy of each node to be processed among all the edges it is on.

[0008] In one embodiment, the process of obtaining the statement-level similarity is as follows: The semantic feature vectors of the two nodes on the edge of each node to be processed are concatenated to obtain the first statement-level feature vector; The semantic feature vectors of the other node on the edge of each node to be processed and its directly connected nodes are concatenated to obtain the second statement-level feature vectors. The similarity between the first statement-level feature vector and the second statement-level feature vector is denoted as the first similarity. The statement-level similarity is positively correlated with the first similarity.

[0009] In one embodiment, the process of obtaining the entity-level similarity is as follows: The similarity of semantic feature vectors between each node to be processed and each directly connected node of another node on its edge is denoted as the second similarity. The entity-level similarity is obtained by combining the second similarity between each node to be processed and all directly connected nodes of another node on its edge, and the entity-level similarity is positively correlated with the second similarity.

[0010] In one embodiment, the process of obtaining the redundancy is as follows: Calculate the sum and difference of the entity-level similarity and the statement-level similarity, respectively; The redundancy is positively correlated with the sum and negatively correlated with the difference.

[0011] In one embodiment, the redundancy assessment process for each node to be processed is as follows: The redundancy weights of each node to be processed are obtained by using the first redundancy feature value and the second redundancy feature value, and the redundancy weights are positively correlated with the first redundancy feature value and the second redundancy feature value, respectively. The redundancy threshold is obtained by comparing the average level and dispersion of the redundancy weights of all nodes to be processed in a single graph structure network. Then, it is determined whether each node to be processed is a redundant node by comparing the redundancy weights with the redundancy threshold.
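The assessment in this paragraph can be sketched as follows. The text fixes only the directions of correlation and says that the threshold comes from the average level and dispersion of the weights, so the concrete choices here are assumptions made purely for illustration: the product as the fusion rule, and mean plus population standard deviation as the threshold. The function names are hypothetical.

```python
import statistics

def redundancy_weights(first, second):
    """Fuse the two feature values per node.
    ASSUMPTION: a product is used; the text only requires a positive
    correlation with each of the two values."""
    return {node: first[node] * second[node] for node in first}

def redundancy_threshold(weights):
    """ASSUMPTION: threshold = mean + population standard deviation.
    The text only says it depends on average level and dispersion."""
    values = list(weights.values())
    return statistics.mean(values) + statistics.pstdev(values)

def find_redundant_nodes(weights):
    """Nodes whose redundancy weight exceeds the threshold."""
    threshold = redundancy_threshold(weights)
    return {node for node, weight in weights.items() if weight > threshold}
```

Any other monotone fusion rule or dispersion measure would fit the description equally well; the sketch only shows how the pieces connect.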

[0012] In one embodiment, the process of merging redundant nodes is as follows: For a single redundant node, each node that shares a common directly connected node with the single redundant node is denoted as a candidate merging node; If there is a node among all the candidate merge nodes of a single redundant node that is not identified as a redundant node, select the node that is not identified as a redundant node and has the highest second similarity with the single redundant node from all the candidate merge nodes of the single redundant node, and use it as the target merge node; otherwise, select the node with the smallest redundancy weight from all the candidate merge nodes of the single redundant node, and use it as the target merge node. The topology connection of a single redundant node is migrated to the target merge node, and the single redundant node is removed from the single graph structure network.
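A minimal sketch of this merging step is given below, under stated assumptions: the graph is an adjacency dictionary of sets, the second similarity is taken to be cosine similarity between semantic feature vectors (the measure named later in the description), and the function and parameter names are hypothetical. Returning `None` when no candidate exists is also an assumption, since the text does not cover that case.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def merge_redundant_node(node, adjacency, vectors, weights, redundant):
    """Merge one redundant node into a target node and return the target.
    adjacency: dict node -> set of directly connected nodes (modified in place)
    weights:   dict node -> redundancy weight
    redundant: set of nodes already identified as redundant"""
    # Candidates share at least one directly connected node with `node`.
    candidates = {c for n in adjacency[node] for c in adjacency[n] if c != node}
    if not candidates:
        return None  # ASSUMPTION: leave the node untouched in this case
    kept = [c for c in candidates if c not in redundant]
    if kept:
        # Prefer the non-redundant candidate most similar to the redundant node.
        target = max(kept, key=lambda c: cosine(vectors[node], vectors[c]))
    else:
        # Otherwise fall back to the candidate with the smallest weight.
        target = min(candidates, key=lambda c: weights[c])
    # Migrate the topology connections, then drop the redundant node.
    for n in adjacency.pop(node):
        adjacency[n].discard(node)
        if n != target:  # avoid creating a self-loop on the target
            adjacency[n].add(target)
            adjacency[target].add(n)
    return target
```

The dictionaries are modified in place so that several redundant nodes can be merged one after another on the same graph.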

[0013] Secondly, embodiments of this application also provide a multidimensional knowledge graph construction system for substation engineering, including a memory, a processor, and a computer program stored in the memory and running on the processor. When the processor executes the computer program, it implements the steps of any of the above-described methods for constructing a multidimensional knowledge graph for substation engineering.

[0014] This application has at least the following beneficial effects: This application constructs an initial node set by using the node with the largest number of connections as the starting point. It analyzes the discrete distribution of semantic features within the initial node set before and after the introduction of each node to be processed, calculating a first redundancy feature value. This helps capture the similarity between newly introduced nodes and existing nodes in the global semantic space. By analyzing the statement-level and entity-level similarity of the edges containing each node to be processed, a second redundancy feature value is calculated. This fully utilizes the differences in event expression and entity differences of the same fault type in different equipment locations, helping to improve the accuracy of determining whether each node to be processed is a redundant node. By fusing the first and second redundancy feature values, a comprehensive analysis of the possibility that each node to be processed is a redundant node can significantly reduce the risk of misjudgment, effectively identify redundant nodes in the knowledge graph, and then merge redundant nodes, improving the usability of the constructed multidimensional knowledge graph in the field of substation engineering, thereby improving the accuracy and reliability of operation and maintenance decisions.

Attached Figure Description

[0015] To more clearly illustrate the technical solutions and advantages in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0016] Figure 1 is a flowchart illustrating the steps of a method for constructing a multidimensional knowledge graph for substation engineering, as provided in one embodiment of this application; Figure 2 is a schematic diagram illustrating the process of obtaining redundancy weights; Figure 3 is a schematic diagram of the redundancy assessment process.

Detailed Implementation

[0017] In the description of the embodiments in this application, the words "exemplary," "or," and "for example" are used to indicate examples, illustrations, or descriptions. Any embodiment or design scheme described as "exemplary" or "for example" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or design schemes. Specifically, the use of the words "exemplary," "or," and "for example" is intended to present the relevant concepts in a specific manner.

[0018] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. It should be understood that, unless otherwise stated, " / " in this application means "or".

[0019] It should also be noted that the terms "first" and "second" in this application are used to distinguish similar objects, rather than to describe a specific order or sequence.

[0020] The following description, in conjunction with the accompanying drawings, details the specific scheme of the multi-dimensional knowledge graph construction method and system for substation engineering provided in this application.

[0021] Figure 1 shows a flowchart of a method for constructing a multidimensional knowledge graph for substation engineering, according to an embodiment of this application. The method includes the following steps:

Step 1: Extract triples from multidimensional data in the field of substation engineering to construct a graph structure network with entities in the triples as nodes.

[0022] Acquire multidimensional data in the field of substation engineering, including ledger data, operation data, fault log data, and maintenance log data of substation equipment.

[0023] The knowledge extraction tool is used to extract knowledge from the multidimensional data in the field of substation engineering, obtain triples, and obtain the semantic feature vector of each entity in each triple during the knowledge extraction process.

[0024] In this embodiment, knowledge extraction is performed using DeepDive, which is a well-known technology and will not be described in detail here. As other implementation methods, based on the ability to extract knowledge from multi-source data, implementers may adopt other existing feasible technologies, and this application does not impose any special restrictions.

[0025] In this embodiment, the BERT (Bidirectional Encoder Representations from Transformers) language model is used to obtain the semantic feature vector of each entity in the triple. The BERT language model is a well-known technology and will not be described in detail here. As other implementation methods, based on the ability to obtain the semantic feature vector of each entity in the triple, the implementer may use other existing feasible technologies. This application does not impose any special restrictions.

[0026] Based on the string similarity between entities, the entities in the obtained triples are merged to obtain various graph structure networks. Specifically, entities with identical strings in different triples are merged into a single entity, resulting in multiple independent graph structure networks, thus achieving preliminary deduplication of triples. For example, if two triples share the entity "transformer" but differ in their other entity, both triples describe events concerning the transformer and are joined through the shared "transformer" node. When both entities are the same, the two triples are merged. For example, if both triples contain the entities "transformer" and "overload fault", both describe the event "transformer overload fault". A single graph structure network represents a series of events surrounding a type of power equipment, such as the basic parameters of the power equipment, operating data when a fault occurs, fault type, fault location, and fault repair process.

[0027] When merging entities, the semantic feature vector of the merged entity is the average of the semantic feature vectors of the multiple entities being merged.
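The string-based merging and vector averaging described in Step 1 can be sketched as follows. This is an illustrative sketch, not the applicant's implementation: the input layout (one pair of entity vectors per triple), the function names, and the use of plain Python lists as vectors are all assumptions.

```python
from collections import defaultdict

def build_graph(triples, vectors):
    """triples: list of (head, relation, tail) strings.
    vectors: list of (head_vector, tail_vector), aligned with `triples`.
    Entities with identical strings collapse into one node, and the merged
    node's vector is the element-wise average of the collected vectors."""
    adjacency = defaultdict(set)
    collected = defaultdict(list)
    for (head, _relation, tail), (head_vec, tail_vec) in zip(triples, vectors):
        adjacency[head].add(tail)
        adjacency[tail].add(head)
        collected[head].append(head_vec)
        collected[tail].append(tail_vec)
    merged = {
        name: [sum(column) / len(vecs) for column in zip(*vecs)]
        for name, vecs in collected.items()
    }
    return dict(adjacency), merged

def graph_structure_networks(adjacency):
    """Split the merged graph into independent networks (connected components)."""
    seen, networks = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        networks.append(component)
    return networks
```

Each returned component corresponds to one graph structure network that the later steps process independently.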

[0028] Step 2: For a single graph structure network, obtain the first redundancy feature value and the second redundancy feature value of each node to be processed in order to evaluate the redundancy of each node.

[0029] In Step 1, duplicates are initially removed from the graph structure network through string matching. However, since the event records from different business dimensions are unstructured log text, the descriptions of entities differ across business dimensions, leading to inconsistencies in the triple entities describing the same event. For example, both "transformer overload fault" and "transformer continuously operating under overload" describe abnormally high loads caused by abnormal line current or cooling system failures. The large number of triples generated by knowledge extraction from heterogeneous multidimensional data contains a great deal of similar information; eliminating only the triples with identical entities would still leave the knowledge graph redundant.

[0030] To address the aforementioned issues, the redundancy of nodes in each graph structure network is analyzed, and redundant nodes are merged to improve the usability of the constructed multidimensional knowledge graph in the field of substation engineering.

[0031] In a graph-structured network, nodes represent entities, and two directly connected nodes represent a triple. Therefore, in a graph-structured network, each node is connected to at least one other node. For example, if node A is connected to node B, and both are connected to multiple nodes, then due to the differences in the descriptions of the same entity in multidimensional data, there may be cases where the entity names are different but the semantics are the same among these connected nodes, which directly leads to redundancy in knowledge graph data.

[0032] Further analysis will be conducted on a single graph-structured network.

[0033] Step 2.1: For a single graph structure network, the node with the largest number of connections is used as the starting benchmark to construct an initial node set, and the remaining nodes are recorded as nodes to be processed. Each node to be processed is introduced into the initial node set one by one. By comparing the distribution and dispersion of semantic features of all nodes in the initial node set before and after the introduction of each node to be processed, the first redundant feature value of each node to be processed is obtained.

[0034] In graph networks, the more directly connected nodes a single node has, the richer the events surrounding that single node, and the higher the probability that the single node will become a central node. For example, if a power equipment entity is the central node, the nodes directly connected to it can cover a variety of event information such as basic parameters, operating data, fault types, fault locations, and maintenance methods.

[0035] Based on the above analysis, the node with the largest number of connections in a single graph structure network is taken as the starting reference. The starting reference is the core entity of the graph structure network and does not participate in the calculation of redundant features, and is retained by default.
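Selecting the starting reference amounts to picking the node of highest degree. A minimal sketch, assuming an adjacency dictionary of sets; the function name is hypothetical, and the tie-breaking rule (first node encountered) is an assumption, because the text does not say how ties are resolved.

```python
def pick_starting_reference(adjacency):
    """Return the node with the largest number of direct connections.
    ASSUMPTION: on a tie, the first such node in iteration order wins."""
    return max(adjacency, key=lambda node: len(adjacency[node]))

def nodes_to_be_processed(adjacency, start):
    """All nodes of the network except the starting reference."""
    return [node for node in adjacency if node != start]
```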

[0036] In a single graph structure network, all nodes except the initial reference are denoted as nodes to be processed. An initial node set is constructed using the node with the largest number of connections in the single graph structure network as the initial reference. By comparing the distribution and dispersion of semantic features of all nodes in the initial node set before and after introducing the nodes to be processed, the first redundant feature value of each node to be processed is obtained. The specific process is as follows: Calculate the posterior probability value of each node in the initial node set before and after each node to be processed is introduced into the initial node set, and calculate the information entropy of the posterior probability value of all nodes in the initial node set. The first redundant feature value of each node to be processed is positively correlated with the difference in information entropy before and after introducing each node into the initial node set. The calculation of information entropy is a well-known technique and will not be elaborated upon in this application.

[0037] It should be noted that positive correlation means that the variables change in the same direction; when one variable increases, the other variable also increases, and when one variable decreases, the other variable also decreases.

[0038] In this embodiment, a Gaussian mixture model is used to calculate the posterior probability value of each node in the initial node set based on the semantic feature vectors of all nodes in the initial node set. The Gaussian mixture model is a well-known technique and will not be described in detail in this application.

[0039] In this embodiment, the expression for the first redundant feature value of each node to be processed is:

$$R^{(1)}_u = \frac{1}{1 + \exp\left(-\left(H_u^{\mathrm{before}} - H_u^{\mathrm{after}}\right)\right)}$$

In the formula, $R^{(1)}_u$ represents the first redundant feature value of the u-th node to be processed; exp() represents the exponential function with the natural constant as the base, used to map the data to positive numbers; $H_u^{\mathrm{before}}$ and $H_u^{\mathrm{after}}$ represent the information entropy before and after introducing the u-th node to be processed into the initial node set, respectively. The formula for calculating the first redundant feature value is the Sigmoid function, which is used to smoothly map the difference in information entropy to the (0,1) interval.
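The Sigmoid mapping of the entropy difference can be sketched in a few lines. The sign convention (entropy before minus entropy after) follows the description's statement that a drop in entropy indicates a more redundant node. How the posterior probability values are turned into a single entropy is not spelled out in the text, so normalising them into a distribution before applying Shannon's formula is an assumption, as are the function names.

```python
import math

def shannon_entropy(values):
    """Shannon entropy (natural log) of non-negative values.
    ASSUMPTION: the values are normalised to sum to 1 first."""
    total = sum(values)
    return -sum((v / total) * math.log(v / total) for v in values if v > 0)

def first_redundancy_feature(entropy_before, entropy_after):
    """Sigmoid of (entropy before - entropy after), a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(entropy_before - entropy_after)))
```

With equal entropies the value is 0.5; an entropy drop after introducing the node pushes it above 0.5, and an entropy rise pushes it below.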

[0040] It should be added that the first redundancy feature value of the first node to be processed is set to 0. In the process of calculating the first redundancy feature value, all nodes with a node interval of 0 from the starting reference are first introduced into the initial node set in sequence, and then all nodes with a node interval of 1 from the starting reference are introduced into the initial node set in sequence. This process is repeated until all nodes in a single graph structure network are traversed.
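The layer-by-layer introduction order resembles a breadth-first traversal from the starting reference. The sketch below assumes that a "node interval of 0" means a node directly connected to the starting reference (no node in between), which is an interpretation rather than something the text states outright. Sorting neighbours alphabetically is only there to make the order deterministic, and the function name is hypothetical.

```python
from collections import deque

def introduction_order(adjacency, start):
    """Order in which nodes to be processed join the initial node set:
    direct neighbours of `start` first, then nodes one step further, etc."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in sorted(adjacency[node]):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order
```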

[0041] It should be noted that in statistical data analysis, information entropy reflects the degree of dispersion of a data distribution; the more dispersed the distribution, the greater the information entropy. In this embodiment, information entropy is applied to evaluate the distribution of node semantic feature vectors: when the semantic features of a newly added node differ significantly from those of all nodes in the initial node set, a new event type has emerged, and the information entropy increases; when the semantic features of a newly added node are similar to those of existing nodes, their posterior probability values are close, the Gaussian distribution in the Gaussian mixture model is more concentrated, no new semantic features are introduced, and the information entropy decreases. When the entity represented by the newly added node is a completely new entity, the calculated first redundant feature value is smaller, and the newly added node is less likely to be merged as a node containing redundant information; conversely, the more similar the semantic features of the newly added node are to those of existing nodes in the initial node set, the larger the calculated first redundant feature value, and the more likely the newly added node is to be merged as a node containing redundant information.

[0042] Step 2.2: For each edge containing a node to be processed, measure the sentence features of the two nodes on the edge containing the node to be processed, compare them with the sentence features of the other node on the edge containing the node to be processed and its directly connected node, and compare the semantic features of the node to be processed with the directly connected node of the other node on the edge containing the node to be processed, and then obtain the second redundant feature value of each node to be processed.

[0043] The first redundancy feature value characterizes the semantic features of the entity represented by each node to be processed, and its redundancy probability in the entire graph structure network. However, the structure of power equipment is complex, and different parts of the same equipment may experience the same type of fault. This can lead to nodes describing different events being evaluated as containing similar information, which may cause the first redundancy feature value of each node to be processed to increase abnormally, resulting in redundancy misjudgment. For example, "main transformer winding overheating fault" and "high-voltage bushing overheating fault" both belong to the "component-overheating fault" pattern, which are represented in the graph structure network as "node U (main transformer winding) - node V (overheating fault)" and "node X (high-voltage bushing) - node Y (overheating fault)". In this case, the semantic feature vectors between nodes V and Y are very close, which leads to a significant increase in the first redundancy feature value of at least one node in nodes V and Y, making it easy to be misjudged as redundant information.

[0044] Based on the above analysis, for each edge containing a node to be processed, the similarity between the sentence features of the two nodes on that edge is measured and compared to the similarity between the sentence features of the other node on that edge and its directly connected node. Furthermore, the semantic features of the node to be processed are compared with those of the directly connected nodes of the other node on that edge. This process yields the second redundant feature value for each node to be processed. By measuring the sentence features of the two nodes on the edge of each node to be processed, and comparing them with the sentence features of the other node on the edge of each node to be processed and its directly connected node, the sentence-level similarity of each node to be processed on the edge is obtained. The entity-level similarity of each node to be processed on its respective edge is obtained by using the similarity of semantic features between each node to be processed and the directly connected node of another node on its edge. The redundancy of each node to be processed on its respective edge is obtained by using the statement-level similarity and entity-level similarity of each node to be processed on its respective edge. The second redundancy feature value of each node to be processed is the maximum value of the redundancy of each node among all the edges it is on.

[0045] The calculation process for the statement-level similarity of each node to be processed on its respective edge is as follows: The semantic feature vectors of the two nodes on the edge of each node to be processed are concatenated to obtain the first statement-level feature vector; The semantic feature vectors of the other node on the edge of each node to be processed and its directly connected nodes are concatenated to obtain the second statement-level feature vectors. The similarity between the first statement-level feature vector and the second statement-level feature vector is denoted as the first similarity. The statement-level similarity of each node to be processed on its respective edge is positively correlated with the first similarity.

[0046] The calculation process for the entity-level similarity of each node to be processed on its respective edge is as follows: The similarity of semantic feature vectors between each node to be processed and each directly connected node of another node on the same edge is denoted as the second similarity. The entity-level similarity is obtained by combining the second similarity between each node to be processed and all directly connected nodes of another node on the same edge, and the entity-level similarity of each node to be processed on the same edge is positively correlated with the second similarity.

[0047] The calculation process for the redundancy of each node to be processed on its respective edge is as follows: Calculate the sum and difference of entity-level similarity and statement-level similarity for each node to be processed on its respective edge; The redundancy of each node to be processed on its respective edge is positively correlated with the sum value and negatively correlated with the difference value.

[0048] It should be noted that negative correlation means that the variables change in opposite directions; when one variable increases, the other decreases, and vice versa.

[0049] It should be noted that when concatenating the first statement-level feature vector and the second statement-level feature vector, at least one position in the first statement-level feature vector and the second statement-level feature vector must be the semantic feature vector of the same entity.

[0050] In this embodiment, the similarity between semantic feature vectors and the similarity between the first statement-level feature vector and the second statement-level feature vector are both cosine similarity. The calculation of cosine similarity is a well-known technique and will not be described in detail here. As other implementation methods, based on the ability to measure the similarity between semantic feature vectors and the similarity between the first statement-level feature vector and the second statement-level feature vector, implementers may adopt other existing feasible techniques, and this application does not impose any special restrictions.

[0051] In this embodiment, the difference between entity-level similarity and statement-level similarity is the absolute value of the difference. As other implementation methods, based on the ability to measure the degree of difference between entity-level similarity and statement-level similarity, the implementer may use other calculation methods, such as the square of the difference, the ratio, etc. This application does not impose any special restrictions.

[0052] In this embodiment, the statement-level similarity of each node to be processed on its respective edge is calculated as follows: the other node on the edge is concatenated with each of its directly connected nodes to obtain the second statement-level feature vectors, and the average of the first similarities between these second statement-level feature vectors and the first statement-level feature vector is taken as the statement-level similarity of each node to be processed on its respective edge.

[0053] In this embodiment, the entity-level similarity of each node to be processed on its respective edge is calculated as follows: the average of the second similarity between each node to be processed and all directly connected nodes of another node on its respective edge is taken as the entity-level similarity of each node to be processed on its respective edge.
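The two averages described above can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation: the names `cosine`, `edge_similarities`, `vectors` and `neighbors` are hypothetical, the semantic feature vectors are assumed to be plain lists of floats, and the node to be processed is assumed to be excluded from the neighbors of the other node. The shared entity (the other node on the edge) is kept in the same position of both concatenations.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def edge_similarities(p, q, vectors, neighbors):
    """Return (statement_level, entity_level) similarity of node p on edge (p, q).

    vectors   : dict mapping node -> semantic feature vector (hypothetical layout)
    neighbors : dict mapping node -> iterable of directly connected nodes
    """
    # Assumption: p itself is not counted among the neighbors of q.
    others = [n for n in neighbors.get(q, []) if n != p]
    if not others:
        # No other directly connected nodes: both similarities are set to 0.
        return 0.0, 0.0
    # First statement-level feature vector: the two nodes on the edge.
    first = list(vectors[p]) + list(vectors[q])
    # Second statement-level feature vectors: q with each of its other neighbors,
    # with q kept in the same position as in the first vector.
    first_sims = [cosine(first, list(vectors[n]) + list(vectors[q])) for n in others]
    # Second similarities: p against each other neighbor of q.
    second_sims = [cosine(vectors[p], vectors[n]) for n in others]
    return sum(first_sims) / len(first_sims), sum(second_sims) / len(second_sims)
```

Any other similarity measure could be substituted for `cosine` without changing the structure of the sketch.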

[0054] In this embodiment, the redundancy of each node to be processed on its corresponding edge is the ratio of the sum to the difference. It should be noted that during the calculation of the ratio, if there is a denominator of 0, the denominator is first mapped to a positive number before subsequent calculations. There are many methods for mapping data to positive numbers, and implementers can choose existing feasible methods according to their actual situation. In this embodiment, the purpose of mapping the data to a positive number is achieved by calculating the sum of the data and a preset value greater than 0. The value of the preset value greater than 0 is preset by the user, and implementers can set it according to their actual situation. This application does not impose any special restrictions. In this embodiment, the preset value greater than 0 is 0.01.
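As a small worked sketch of this ratio, the following Python uses the absolute difference from paragraph [0051] and the preset value 0.01 of this embodiment. The function names and the `eps` parameter are illustrative, and returning 0 for a node with no edges is an assumption not stated in the source. Taking the maximum over all edges follows the definition of the second redundancy feature value in claim 4.

```python
def edge_redundancy(entity_sim, statement_sim, eps=0.01):
    """Redundancy on one edge: sum of the two similarities over their absolute difference."""
    total = entity_sim + statement_sim
    diff = abs(entity_sim - statement_sim)
    if diff == 0:
        # A zero denominator is mapped to a positive number by adding the preset value.
        diff += eps
    return total / diff

def second_redundancy_feature(edge_values):
    """Second redundancy feature value: the maximum redundancy over all edges of a node."""
    # Assumption: a node with no edges gets 0.0.
    return max(edge_values, default=0.0)
```

With equal similarities of 0.8 the denominator falls back to 0.01 and the redundancy is about 160, while similarities of 0.9 and 0.6 give about 5, so agreement between the two levels dominates the score.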

[0055] It should be added that if the other node on the edge of a node to be processed has no other directly connected nodes, the statement-level similarity and the entity-level similarity of that node on that edge are both set to 0.

[0056] It should be noted that, in data analysis, the similarity of semantic features between different words can be measured. In this embodiment, the entity-level and statement-level semantic similarities of each node to be processed within its nearest-neighbor range are combined to evaluate whether semantically similar nodes exist within that range, thereby reflecting whether the node to be processed contains redundant information. The larger the calculated statement-level similarity and entity-level similarity both are, the more likely the node to be processed is a redundant node on its edge. The larger the calculated second redundancy feature value, the more closely some node within the nearest-neighbor range matches the node to be processed in entity words and event expressions, and the higher the probability that the node to be processed is a redundant node.

[0057] Step 2.3: Fuse the first redundancy feature value and the second redundancy feature value of each node to be processed to evaluate the redundancy of each node to be processed.

[0058] The first and second redundancy feature values of each node to be processed are combined to obtain the redundancy weight of each node to be processed. Specifically, the redundancy weight of each node to be processed is positively correlated with both its first redundancy feature value and its second redundancy feature value. A schematic diagram of the redundancy weight acquisition process is shown in Figure 2.

[0059] In this embodiment, the sum of the normalized second redundancy feature value and the first redundancy feature value of each node to be processed is used as the redundancy weight of that node. The normalized second redundancy feature value is obtained using the Min-Max normalization method, which is a well-known technique and will not be described further in this application.
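The fusion in this embodiment can be illustrated with a short Python sketch. The function names are hypothetical, the two input lists are assumed to be aligned by node, and mapping a constant input to 0 during normalization is an assumption made only to avoid division by zero.

```python
def min_max(values):
    """Min-Max normalization to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: when all values are equal, every normalized value is 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def redundancy_weights(first_values, second_values):
    """Redundancy weight = first feature value + Min-Max normalized second feature value."""
    normalized = min_max(second_values)
    return [f + s for f, s in zip(first_values, normalized)]
```

Only the second redundancy feature value is normalized here, as stated in the embodiment; because it is an unbounded ratio, normalization keeps it from swamping the first redundancy feature value in the sum.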

[0060] The redundancy threshold is obtained by considering the average level and dispersion of the redundancy weights of all nodes to be processed in a single graph structure network.

[0061] In this embodiment, the average and the standard deviation of the redundancy weights of all nodes to be processed in a single graph structure network are calculated. Since a larger redundancy weight indicates a greater probability that the node to be processed is a redundant node, the sum of the average and a multiple of the standard deviation is used as the redundancy threshold. The range of values of the multiplier is [...], and its value can be set by the implementer according to the actual scenario. This application does not impose any special restrictions. In this embodiment, the value of the multiplier is 1.

[0062] Furthermore, by comparing the redundancy weight of each node to be processed with the redundancy threshold, it is determined whether each node is a redundant node. Specifically, nodes with a redundancy weight greater than the redundancy threshold are classified as redundant nodes, while nodes with a redundancy weight less than or equal to the redundancy threshold are classified as non-redundant nodes. A schematic diagram of the redundancy assessment process is shown in Figure 3.
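A minimal Python sketch of the threshold and the classification follows. The names `redundancy_threshold`, `redundant_nodes` and the parameter `k` (the multiplier, 1 in this embodiment) are illustrative. The source does not say whether the population or the sample standard deviation is meant; the population form is assumed here.

```python
import statistics

def redundancy_threshold(weights, k=1.0):
    """Threshold = mean + k * standard deviation of the redundancy weights."""
    # Assumption: population standard deviation (statistics.pstdev).
    return statistics.mean(weights) + k * statistics.pstdev(weights)

def redundant_nodes(weights_by_node, k=1.0):
    """Return the set of nodes whose redundancy weight exceeds the threshold."""
    threshold = redundancy_threshold(list(weights_by_node.values()), k)
    return {node for node, w in weights_by_node.items() if w > threshold}
```

Because the comparison is strict, a node whose weight equals the threshold is treated as non-redundant, which matches the rule stated above.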

[0063] Step 3: Merge redundant nodes.

[0064] The specific process for merging redundant nodes is as follows. For a single redundant node, each node that shares a common directly connected node with the single redundant node is denoted as a candidate merge node. If any of the candidate merge nodes of the single redundant node is not identified as a redundant node, the candidate merge node that is not identified as a redundant node and has the highest second similarity with the single redundant node is selected as the target merge node; otherwise, the candidate merge node with the smallest redundancy weight is selected as the target merge node. The topological connections of the single redundant node are migrated to the target merge node, and the single redundant node is removed from the single graph structure network.
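The merging rule can be sketched in Python on an undirected graph stored as a dictionary of neighbor sets. All names are hypothetical, `similarity` stands in for the second similarity between two nodes, and two behaviors are assumptions not covered by the source: a redundant node with no candidate merge node is left in place, and ties are broken by sorted node name.

```python
def candidate_merge_nodes(graph, node):
    """Nodes other than `node` that share at least one direct neighbor with it."""
    candidates = set()
    for neighbor in graph[node]:
        candidates |= graph[neighbor]
    candidates.discard(node)
    return candidates

def merge_redundant_node(graph, node, redundant, weights, similarity):
    """Merge one redundant node into its target merge node; return the target (or None).

    graph      : dict mapping node -> set of directly connected nodes (modified in place)
    redundant  : set of nodes identified as redundant
    weights    : dict mapping node -> redundancy weight
    similarity : callable (node, node) -> second similarity
    """
    candidates = sorted(candidate_merge_nodes(graph, node))
    if not candidates:
        return None  # Assumption: nothing to merge into, keep the node.
    kept = [c for c in candidates if c not in redundant]
    if kept:
        # Prefer the non-redundant candidate most similar to the redundant node.
        target = max(kept, key=lambda c: similarity(node, c))
    else:
        # All candidates are redundant: take the one with the smallest weight.
        target = min(candidates, key=lambda c: weights[c])
    # Migrate the connections of the redundant node to the target, then remove it.
    for neighbor in graph.pop(node):
        graph[neighbor].discard(node)
        if neighbor != target:
            graph[neighbor].add(target)
            graph[target].add(neighbor)
    return target
```

Candidates are recomputed from the current graph on each call, so nodes removed by earlier merges are never selected as targets.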

[0065] Furthermore, Neo4j is used to visualize all graph structure networks, resulting in a multidimensional knowledge graph for substation engineering. Neo4j is a well-known technology and will not be described further in this application.

[0066] Based on the same inventive concept as the above method, this application embodiment also provides a multidimensional knowledge graph construction system for substation engineering, including a memory, a processor, and a computer program stored in the memory and running on the processor. When the processor executes the computer program, it implements the steps of any one of the above-described methods for constructing a multidimensional knowledge graph for substation engineering.

[0067] In summary, this application constructs an initial node set by using the node with the largest number of connections as the starting point. It analyzes the discrete distribution of semantic features within the initial node set before and after the introduction of each node to be processed, calculating a first redundancy feature value. This helps capture the similarity between newly introduced nodes and existing nodes in the global semantic space. By analyzing the statement-level and entity-level similarity of the edges of each node to be processed, a second redundancy feature value is calculated. This fully utilizes the differences in event expression and entity differences of the same fault type in different equipment locations, helping to improve the accuracy of determining whether each node to be processed is a redundant node. By fusing the first and second redundancy feature values, a comprehensive analysis of the possibility that each node to be processed is a redundant node can significantly reduce the risk of misjudgment, effectively identify redundant nodes in the knowledge graph, and then merge redundant nodes, improving the usability of the constructed multidimensional knowledge graph in the field of substation engineering, thereby improving the accuracy and reliability of operation and maintenance decisions.

[0068] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions marked in the blocks may occur in a different order than that shown in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. In the descriptions corresponding to the flowcharts and block diagrams in the accompanying drawings, the operations or steps corresponding to different blocks may also occur in a different order than disclosed in the description, and sometimes there is no specific order between different operations or steps. For example, two consecutive operations or steps may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. Each block in a block diagram and / or flowchart, and combinations of blocks in a block diagram and / or flowchart, can be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.

[0069] It will be apparent to those skilled in the art that this application is not limited to the details of the exemplary embodiments described above, and that this application can be implemented in other specific forms without departing from its essential characteristics. Therefore, the embodiments described above should be considered exemplary and non-limiting in all respects.

Claims

1. A method for constructing a multidimensional knowledge graph for substation engineering, characterized in that, The method includes the following steps: Triples are extracted from multidimensional data in the field of substation engineering to construct graph-structured networks with entities in the triples as nodes. For a single graph structure network, the node with the largest number of connections is used as the starting point to construct an initial node set, and the remaining nodes are denoted as nodes to be processed. Each node to be processed is introduced into the initial node set one by one. By comparing the distribution of semantic features of all nodes in the initial node set before and after the introduction of each node to be processed, the first redundant feature value of each node to be processed is obtained. For each edge containing a node to be processed, the sentence features of the two nodes on the edge containing the node to be processed are measured and compared with the sentence features of the other node on the edge containing the node to be processed and its directly connected node. The semantic features of the node to be processed and its directly connected node are also compared, and a second redundant feature value of the node to be processed is obtained. This second redundant feature value is then fused with the first redundant feature value to evaluate the redundancy of the node to be processed, and the redundant nodes are then merged.

2. The method for constructing a multi-dimensional knowledge graph for substation engineering as described in claim 1, characterized in that, The process of obtaining the first redundant feature value is as follows: Calculate the posterior probability value of each node in the initial node set before and after each node to be processed is introduced into the initial node set, and calculate the information entropy of the posterior probability value of all nodes in the initial node set. By analyzing the changes in information entropy before and after introducing each node to be processed into the initial node set, the first redundant feature value of each node to be processed is obtained.

3. The method for constructing a multidimensional knowledge graph for substation engineering as described in claim 2, characterized in that, The first redundant feature value is positively correlated with the difference in information entropy before and after each node to be processed is introduced into the initial node set.

4. The method for constructing a multidimensional knowledge graph for substation engineering as described in claim 1, characterized in that, The process of obtaining the second redundant feature value is as follows: Based on the similarity, the statement-level similarity of each node to be processed on its respective edge is obtained; The entity-level similarity of each node to be processed on its respective edge is obtained by using the similarity of semantic features between each node to be processed and the directly connected node of another node on its edge. The redundancy of each node to be processed on its respective edge is obtained by using the statement-level similarity and the entity-level similarity. The second redundancy feature value is the maximum value of the redundancy of each node to be processed among all the edges it is on.

5. The method for constructing a multidimensional knowledge graph for substation engineering as described in claim 4, characterized in that, The process of obtaining the statement-level similarity is as follows: The semantic feature vectors of the two nodes on the edge of each node to be processed are concatenated to obtain the first statement-level feature vector; The semantic feature vectors of the other node on the edge of each node to be processed and its directly connected nodes are concatenated to obtain the second statement-level feature vectors. The similarity between the first statement-level feature vector and the second statement-level feature vector is denoted as the first similarity. The statement-level similarity is positively correlated with the first similarity.

6. The method for constructing a multidimensional knowledge graph for substation engineering as described in claim 4, characterized in that, The process of obtaining the entity-level similarity is as follows: The similarity of semantic feature vectors between each node to be processed and each directly connected node of another node on its edge is denoted as the second similarity. The entity-level similarity is obtained by combining the second similarity between each node to be processed and all directly connected nodes of another node on its edge, and the entity-level similarity is positively correlated with the second similarity.

7. The method for constructing a multidimensional knowledge graph for substation engineering as described in claim 4, characterized in that, The process of obtaining the redundancy is as follows: Calculate the sum and difference of the entity-level similarity and the statement-level similarity, respectively; The redundancy is positively correlated with the sum and negatively correlated with the difference.

8. The method for constructing a multidimensional knowledge graph for substation engineering as described in claim 1, characterized in that, The evaluation process for the redundancy of each node to be processed is as follows: The redundancy weights of each node to be processed are obtained by using the first redundancy feature value and the second redundancy feature value, and the redundancy weights are positively correlated with the first redundancy feature value and the second redundancy feature value, respectively. The redundancy threshold is obtained by comparing the average level and dispersion of the redundancy weights of all nodes to be processed in a single graph structure network. Then, it is determined whether each node to be processed is a redundant node by comparing the redundancy weights with the redundancy threshold.

9. The method for constructing a multidimensional knowledge graph for substation engineering as described in claim 6, characterized in that, The process of merging redundant nodes is as follows: For a single redundant node, each node that shares a common directly connected node with the single redundant node is denoted as a candidate merge node; If there is a node that is not identified as a redundant node among all the candidate merge nodes of the single redundant node, the node that is not identified as a redundant node and has the highest second similarity with the single redundant node is selected from all the candidate merge nodes of the single redundant node as the target merge node; Otherwise, the node with the smallest redundancy weight is selected from all the candidate merge nodes of the single redundant node as the target merge node; The topological connections of the single redundant node are migrated to the target merge node, and the single redundant node is removed from the single graph structure network.

10. A multi-dimensional knowledge graph construction system for substation engineering, comprising a memory, a processor, and a computer program stored in the memory and running on the processor, characterized in that, When the processor executes the computer program, it implements the steps of the method for constructing a multidimensional knowledge graph for substation engineering as described in any one of claims 1-9.