A virtual power plant user sensitive data identification method and device based on graph learning

CN115758172B · Active · Publication Date: 2026-06-23 · GLOBAL ENERGY INTERCONNECTION RES INST CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
GLOBAL ENERGY INTERCONNECTION RES INST CO LTD
Filing Date
2022-11-14
Publication Date
2026-06-23

Figure CN115758172B_ABST

Abstract

The application discloses a virtual power plant user sensitive data identification method and device based on graph learning, and the method comprises the following steps: obtaining to-be-identified user data; generating a to-be-identified graph network according to the relationship of keywords in the to-be-identified user data; inputting the to-be-identified graph network into a pre-trained graph convolutional neural network model to obtain a minimum connected dominating set corresponding to the to-be-identified graph network; generating a simplified graph of the to-be-identified graph network based on the minimum connected dominating set corresponding to the to-be-identified graph network; and performing matching degree calculation on the simplified graph of the to-be-identified graph network and a simplified graph corresponding to standard sensitive level data to obtain a sensitive identification result of the to-be-identified user data. Through the implementation of the application, the minimum connected dominating set is generated by using the graph learning method, the complexity of subsequent graph matching is reduced, the matching speed is improved, and the application has higher efficiency compared with a traditional sensitive identification method.

Description

Technical Field

[0001] This invention relates to the field of power data security technology, specifically to a method and apparatus for identifying sensitive user data in a virtual power plant based on graph learning.

Background Technology

[0002] In the context of the booming development of the energy internet, a large number of producers and consumers have emerged in the energy system. The randomness and volatility of a large number of distributed resources have increased the complexity and management difficulty of the power grid, significantly impacting its safety, reliability, and economic operation. Complex virtual power plant technology utilizes advanced sensing and control technologies to effectively aggregate and dispatch distributed resources such as renewable energy generation and energy storage. While participating in the ancillary services market to obtain revenue, it provides flexibility to the power grid, improves the grid security level, and reduces grid operating and investment costs. It is an important module in multi-energy flow integrated energy management systems.

[0003] Therefore, the secure protection of user data in complex virtual power plants has become a crucial foundation for the State Grid's data security efforts. With the promulgation of various data protection laws, data identification requirements have been clearly defined. In recent years, the power grid company has actively conducted a series of preliminary explorations regarding data identification, initially establishing identification methods and a foundation for practical operations. However, at present, identification work mainly relies on manual methods, which suffer from low efficiency, poor accuracy, and difficulty in implementation.

[0004] Complex virtual power plant user data is characterized by its large volume, diverse types, and high level of confidentiality, containing a large amount of commercially sensitive and personal data that could impact national, social, and corporate interests. To better clarify important data, identify data risks, and strengthen data security protection, the power industry has made a series of preliminary explorations in recent years regarding the identification of sensitive data from complex virtual power plant users. Various documents have been issued, outlining overall requirements for power data identification. Based on data tables and field descriptions, a shared negative list has been published through manual review to promote internal sharing and integration of power data.

[0005] Therefore, how to meet the need for rapid identification of sensitive user data in complex virtual power plants is a key issue that current identification solutions urgently need to address.

Summary of the Invention

[0006] In view of this, embodiments of the present invention provide a method and apparatus for identifying sensitive user data of virtual power plants based on graph learning, in order to solve the technical problem in the prior art of how to meet the requirement for rapid identification of sensitive user data of complex virtual power plants.

[0007] The technical solution proposed in this invention is as follows:

[0008] The first aspect of this invention provides a method for identifying sensitive user data in a virtual power plant based on graph learning, comprising: acquiring user data to be identified; generating a graph network to be identified based on the relationship between keywords in the user data; inputting the graph network to be identified into a pre-trained graph convolutional neural network model to obtain the minimum connected dominating set corresponding to the graph network to be identified; generating a simplified graph of the graph network to be identified based on the minimum connected dominating set corresponding to the graph network to be identified; and calculating the matching degree between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data to obtain the sensitivity identification result of the user data to be identified.

[0009] Optionally, generating a graph network to be identified based on the relationships between keywords in the user data to be identified includes: performing word segmentation on the user data to be identified based on word frequency statistics to generate a keyword list corresponding to the user data to be identified; traversing the user data to be identified, extracting words that match the keyword list and the relationships between the words, and generating a graph network to be identified, the graph network to be identified including nodes and edges.

[0010] Optionally, the graph convolutional neural network model is trained in the following manner: multiple graph networks are obtained, each graph network including nodes and edges; the minimum spanning tree Prim algorithm is used to generate the minimum connected dominating set of each graph network; the graph convolutional neural network is trained based on the multiple graph networks and the corresponding minimum connected dominating sets to obtain a pre-trained graph convolutional neural network model.

[0011] Optionally, training a graph convolutional neural network based on multiple graph networks and their corresponding minimum connected dominating sets includes: using the degree of each node in the multiple graph networks as a node feature matrix; labeling nodes in the graph networks that belong to the minimum connected dominating set as 1 and nodes that do not belong as 0 to obtain node label values; training the graph convolutional neural network based on the node feature matrix and the node label values, wherein the graph convolutional neural network uses ReLU as the activation function, the convolutional layers include dropout, and the log-likelihood function is used as the loss function.

[0012] Optionally, a simplified graph of the graph to be identified is generated based on the minimum connected dominating set corresponding to the graph network to be identified, including: obtaining any two nodes in the minimum connected dominating set corresponding to the graph network to be identified; determining whether there is an edge between the two corresponding nodes in the graph network to be identified; if so, adding an edge between the two nodes; repeating the above steps until all nodes in the minimum connected dominating set have been traversed to obtain the simplified graph of the graph network to be identified.

[0013] Optionally, calculating the matching degree between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data to obtain the sensitivity identification result of the user data to be identified includes: calculating the maximum common subgraph between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data; calculating the number of nodes in the simplified graph of the graph network to be identified and in the simplified graph corresponding to the standard sensitivity level data; calculating the matching degree based on the maximum common subgraph and the maximum of the two node counts; and classifying the user data to be identified under the sensitivity result corresponding to the standard sensitivity level data with the highest matching degree.

[0014] Optionally, calculating the maximum common subgraph between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data includes: calculating the common nodes between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data; calculating the common edges between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data based on the common nodes; and determining the maximum common subgraph based on the common nodes and common edges.

[0015] A second aspect of this invention provides a virtual power plant user sensitive data identification device based on graph learning, comprising: a data acquisition module for acquiring user data to be identified; a graph network generation module for generating a graph network to be identified based on the relationships of keywords in the user data; a dominating set generation module for inputting the graph network to be identified into a pre-trained graph convolutional neural network model to obtain the minimum connected dominating set corresponding to the graph network to be identified; a simplified graph generation module for generating a simplified graph of the graph network to be identified based on the minimum connected dominating set corresponding to the graph network to be identified; and a classification and grading module for calculating the matching degree between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data to obtain the sensitivity identification result of the user data to be identified.

[0016] Optionally, the graph network generation module is specifically used for: performing word segmentation processing on the user data to be identified based on word frequency statistics to generate a keyword list corresponding to the user data to be identified; traversing the user data to be identified, extracting words that match the keyword list and the relationships between words in the user data to be identified, and generating a graph network to be identified, the graph network to be identified including nodes and edges.

[0017] Optionally, the graph convolutional neural network model is trained in the following manner: multiple graph networks are obtained, each graph network including nodes and edges; the minimum spanning tree Prim algorithm is used to generate the minimum connected dominating set of each graph network; the graph convolutional neural network is trained based on the multiple graph networks and the corresponding minimum connected dominating sets to obtain a pre-trained graph convolutional neural network model.

[0018] Optionally, training a graph convolutional neural network based on multiple graph networks and their corresponding minimum connected dominating sets includes: using the degree of each node in the multiple graph networks as a node feature matrix; labeling nodes in the graph networks that belong to the minimum connected dominating set as 1 and nodes that do not belong as 0 to obtain node label values; training the graph convolutional neural network based on the node feature matrix and the node label values, wherein the graph convolutional neural network uses ReLU as the activation function, the convolutional layers include dropout, and the log-likelihood function is used as the loss function.

[0019] Optionally, the simplified graph generation module is specifically used to: obtain any two nodes in the minimum connected dominating set corresponding to the graph network to be identified; determine whether there is an edge between the corresponding two nodes in the graph network to be identified; if there is, add the edge between the two nodes; repeat the above steps until all nodes in the minimum connected dominating set have been traversed to obtain the simplified graph of the graph network to be identified.

[0020] Optionally, the classification and grading module is specifically used to: calculate the maximum common subgraph between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data; calculate the number of nodes in the simplified graph of the graph network to be identified and in the simplified graph corresponding to the standard sensitivity level data; calculate the matching degree based on the maximum common subgraph and the maximum of the two node counts; and classify the user data to be identified under the sensitivity result corresponding to the standard sensitivity level data with the highest matching degree.

[0021] Optionally, calculating the maximum common subgraph between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data includes: calculating the common nodes between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data; calculating the common edges between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data based on the common nodes; and determining the maximum common subgraph based on the common nodes and common edges.

[0022] A third aspect of the present invention provides a computer-readable storage medium storing computer instructions for causing a computer to perform the graph learning-based method for identifying sensitive user data of a virtual power plant as described in the first aspect of the present invention or in any optional implementation of the first aspect.

[0023] A fourth aspect of the present invention provides an electronic device, including: a memory and a processor, wherein the memory and the processor are communicatively connected to each other, the memory stores computer instructions, and the processor executes the computer instructions to perform the graph learning-based virtual power plant user sensitive data identification method as described in the first aspect of the present invention or in any optional implementation of the first aspect.

[0024] The technical solution provided by this invention has the following effects:

[0025] The present invention provides a method, apparatus, and storage medium for identifying sensitive user data in virtual power plants based on graph learning. The method involves: acquiring user data to be identified; generating a graph network to be identified based on the relationships between keywords in the user data; inputting the graph network to be identified into a pre-trained graph convolutional neural network model to obtain the minimum connected dominating set corresponding to the graph network; generating a simplified graph of the graph network to be identified based on the minimum connected dominating set; and calculating the matching degree between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data to obtain the sensitivity identification result of the user data. This achieves the generation of the minimum connected dominating set through graph learning, reducing the complexity of subsequent graph matching, improving matching speed, and exhibiting higher efficiency compared to traditional sensitivity identification methods.

Attached Figure Description

[0026] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0027] Figure 1 is a flowchart of a graph learning-based method for identifying sensitive user data in a virtual power plant according to an embodiment of the present invention;

[0028] Figure 2 is a structural block diagram of a graph learning-based virtual power plant user sensitive data identification device according to an embodiment of the present invention;

[0029] Figure 3 is a schematic diagram of the structure of a computer-readable storage medium provided according to an embodiment of the present invention;

[0030] Figure 4 is a schematic diagram of the structure of an electronic device provided according to an embodiment of the present invention.

Detailed Implementation

[0031] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0032] The terms "first," "second," "third," "fourth," etc., used in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0033] According to an embodiment of the present invention, a method for identifying sensitive user data of a virtual power plant based on graph learning is provided. It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system such as a set of computer-executable instructions. Furthermore, although a logical order is shown in the flowchart, in some cases, the steps shown or described may be executed in a different order than that shown here.

[0034] This embodiment provides a graph learning-based method for identifying sensitive user data in virtual power plants, which can be used in electronic devices such as computers, mobile phones, and tablets. Figure 1 is a flowchart of a method for identifying sensitive user data in a virtual power plant based on graph learning, according to an embodiment of the present invention. As shown in Figure 1, the method includes the following steps:

[0035] Step S101: Obtain user data to be identified. The user data to be identified may be user data requiring sensitive identification within a complex virtual power plant. This embodiment of the invention does not specifically limit the method and source of obtaining the user data to be identified.

[0036] Sensitive identification primarily involves assessing the sensitivity of data. Based on the importance of user power data and the impact and harm that a leak would cause to national security, social order, business operations, and public interests, the data is categorized into four sensitivity levels: severe, high, medium, and low.

[0037] Step S102: Generate a graph network to be identified based on the relationships between keywords in the user data to be identified. Specifically, the user data to be identified may include a large amount of data. In order to perform sensitive identification of the user data to be identified more quickly, the keywords in the user data to be identified are first obtained, and then the keywords are used as nodes and the contextual relationships between keywords are used as edges to generate a graph network to be identified.

[0038] Step S103: Input the graph network to be identified into a pre-trained graph convolutional neural network model to obtain the minimum connected dominating set corresponding to the graph network to be identified. Specifically, the minimum connected dominating set of the graph network to be identified can be quickly determined through the pre-trained graph convolutional neural network model. Here, the dominating set is the set of nodes that act as backbone nodes in the graph network. If the nodes in the dominating set are connected, then the dominating set is called a connected dominating set. Among all connected dominating sets, the connected dominating set with the fewest nodes is called the minimum connected dominating set.
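The definitions in step S103 can be checked mechanically. The following is a minimal sketch, assuming the standard graph-theoretic meaning of domination (every node is either in the set or adjacent to a member of it), which the text describes only informally. The function name and the adjacency-dictionary representation are illustrative choices, not part of the described method.

```python
from collections import deque

def is_connected_dominating_set(adj, candidate):
    """adj maps each node to the set of its neighbours (undirected graph)."""
    cand = set(candidate)
    if not cand:
        return False
    # Domination: every node outside the set needs a neighbour inside it.
    for node, nbrs in adj.items():
        if node not in cand and not (nbrs & cand):
            return False
    # Connectivity: breadth-first search restricted to the candidate nodes.
    start = next(iter(cand))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u] & cand:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == cand

# Path graph a-b-c-d-e.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
        "d": {"c", "e"}, "e": {"d"}}
print(is_connected_dominating_set(path, {"b", "c", "d"}))  # dominating and connected
print(is_connected_dominating_set(path, {"b", "d"}))       # dominating, not connected
```

On the path graph, the three interior nodes form a connected dominating set, while {b, d} dominates every node but is not connected.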

[0039] Step S104: Generate a simplified graph of the graph network to be identified based on the minimum connected dominating set corresponding to the graph network to be identified. By comparing the graph network to be identified with the minimum connected dominating set and retaining the edges between nodes in the minimum connected dominating set, a simplified graph structure representation of the data can be generated, i.e., a simplified graph of the graph network to be identified, which facilitates subsequent graph matching.

[0040] Step S105: Calculate the matching degree between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data to obtain the sensitivity identification result of the user data to be identified. Specifically, the standard sensitivity level data includes data for each level determined according to the sensitivity level in step S101. The method for determining the simplified graph of the standard sensitivity level data is the same as the method for determining the simplified graph of the graph network to be identified, and will not be repeated here.

[0041] When calculating the matching degree, the matching degree between the simplified graph of the graph network to be identified and the simplified graph of the standard sensitivity level data is calculated for each level, and the maximum among the calculated matching degrees is determined. The user data to be identified is then assigned the same level as the standard sensitivity level data corresponding to that maximum, thereby completing the sensitivity identification of the user data to be identified.
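The selection of the level with the highest matching degree can be sketched as follows. This is an illustration under stated assumptions: the matching degree is taken to be the number of nodes in the common subgraph divided by the larger of the two node counts, which is one common reading of a maximum-common-subgraph similarity and is not confirmed by the text. The graph representation, function names, keywords, and level contents are all hypothetical.

```python
def make_graph(edge_list):
    """Build a graph as {'nodes': set, 'edges': set of frozenset pairs}."""
    edges = {frozenset(e) for e in edge_list}
    nodes = set().union(*edges) if edges else set()
    return {"nodes": nodes, "edges": edges}

def common_subgraph(g1, g2):
    # Common nodes first, then the common edges whose endpoints are both common.
    nodes = g1["nodes"] & g2["nodes"]
    edges = {e for e in g1["edges"] & g2["edges"] if e <= nodes}
    return {"nodes": nodes, "edges": edges}

def matching_degree(g1, g2):
    # Assumed formula: |common nodes| / max(|V1|, |V2|).
    denom = max(len(g1["nodes"]), len(g2["nodes"]))
    if denom == 0:
        return 0.0
    return len(common_subgraph(g1, g2)["nodes"]) / denom

def classify(query, standards):
    """Return the level whose standard graph has the highest matching degree."""
    return max(standards, key=lambda level: matching_degree(query, standards[level]))

# Hypothetical simplified graphs.
query = make_graph([("account", "id_number"), ("id_number", "address")])
standards = {
    "high": make_graph([("account", "id_number"), ("id_number", "phone")]),
    "low": make_graph([("tariff", "region")]),
}
level = classify(query, standards)
print(level, matching_degree(query, standards[level]))
```

Because the keywords serve as node labels, the common subgraph here reduces to set intersections rather than a general subgraph isomorphism search.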

[0042] The graph learning-based method for identifying sensitive user data in virtual power plants provided in this invention involves: acquiring user data to be identified; generating a graph network to be identified based on the relationships between keywords in the user data; inputting the graph network to be identified into a pre-trained graph convolutional neural network model to obtain the minimum connected dominating set corresponding to the graph network; generating a simplified graph of the graph network to be identified based on the minimum connected dominating set; and calculating the matching degree between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data to obtain the sensitivity identification result of the user data to be identified. This method achieves the generation of the minimum connected dominating set through graph learning, reducing the complexity of subsequent graph matching, improving matching speed, and exhibiting higher efficiency compared to traditional sensitivity identification methods.

[0043] In one embodiment, generating a graph network to be identified based on the relationships between keywords in the user data to be identified includes the following steps:

[0044] Step S201: Based on word frequency statistics, segment the user data to be identified to generate a keyword list corresponding to the user data. Specifically, words are the basic information units of text, but unlike English words, Chinese words are not clearly separated, so word segmentation is required to select words. Words are stable combinations of characters; the more frequently a certain combination of characters appears in the context, the more likely it is to be a word.

[0045] To represent the frequency of a certain combination, we need to calculate the likelihood of a string W, expressed by the following formula:

[0046]

[0047] In the formula, W_i represents each character in the string, and n represents the number of characters in the word; in the trigram model, n = 3. P(W) represents the probability of occurrence of the string W.

[0048] To calculate the probability of occurrence of a string W, assume that the probability of occurrence of a character W_i depends only on the two characters preceding it, which gives the trigram model:

[0049]

[0050] Where P(W3|W1W2) represents the probability that the next character is W3 given that the combination of characters W1W2 has occurred, and is calculated using the following formula:

[0051]

[0052] In the formula, count(...) represents the number of times a specific word sequence appears in the text.
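The formulas referenced in [0046], [0049], and [0051] are not reproduced in this text. Under the definitions given (characters W_i, string probability P(W), dependence on the two preceding characters, and count(...)), they would plausibly take the standard chain-rule and trigram forms below. This is a reconstruction from context, not the original notation.

```latex
% [0046] Chain-rule factorization of the string probability (assumed form)
P(W) = P(W_1 W_2 \cdots W_n)
     = P(W_1)\, P(W_2 \mid W_1) \cdots P(W_n \mid W_1 W_2 \cdots W_{n-1})

% [0049] Trigram approximation: each character depends on the two before it (assumed form)
P(W) \approx P(W_1)\, P(W_2 \mid W_1) \prod_{i=3}^{n} P(W_i \mid W_{i-2} W_{i-1})

% [0051] Conditional probability estimated from counts
P(W_3 \mid W_1 W_2) = \frac{\operatorname{count}(W_1 W_2 W_3)}{\operatorname{count}(W_1 W_2)}
```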

[0053] Word segmentation of the user data to be identified is thus achieved through the aforementioned word frequency statistics method. After word segmentation, a keyword list is constructed by selecting the words whose frequency is greater than a preset threshold. This preset threshold can be determined in advance based on actual circumstances.
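The count-based estimate and the threshold-based keyword selection can be sketched as follows. This is a minimal sketch: the function names are hypothetical, the sample strings are placeholders rather than power-system data, and the segmentation step itself (deciding word boundaries from these probabilities) is not shown.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count every run of n consecutive characters in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def trigram_probability(text, w1, w2, w3):
    """Estimate P(w3 | w1 w2) as count(w1 w2 w3) / count(w1 w2)."""
    denom = ngram_counts(text, 2)[w1 + w2]
    if denom == 0:
        return 0.0
    return ngram_counts(text, 3)[w1 + w2 + w3] / denom

def keyword_list(tokens, threshold):
    """Keep the segmented words whose frequency exceeds the preset threshold."""
    counts = Counter(tokens)
    return sorted(word for word, c in counts.items() if c > threshold)

sample = "abcabcabd"
print(trigram_probability(sample, "a", "b", "c"))
print(keyword_list(["meter", "user", "meter", "tariff", "user", "meter"], 1))
```

In the sample string, "ab" occurs three times and is followed by "c" twice, so the estimate for P(c | ab) is 2/3.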

[0054] Step S202: Traverse the user data to be identified, extract words that match the keyword list and the relationships between words, and generate a graph network to be identified. The graph network includes nodes and edges. Vertices represent words, and edges represent the proximity relationships between words. The specific generation process of the graph is as follows: Based on the keyword list obtained in the previous step, the user data to be identified is filtered, and words that match the keyword list are read and extracted and placed into the graph network. The nodes of the graph network are composed of keywords matched in the user data to be identified. Nodes corresponding to two adjacent keywords are directly connected by an edge, that is, the edges between nodes represent the contextual relationship of the keywords. Finally, the graph form of the user data to be identified is obtained, i.e., the graph network to be identified.
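The graph construction in step S202 can be sketched as follows. The sketch assumes that "adjacent" means consecutive in the sequence of matched keywords after non-keywords are filtered out, that edges are undirected, and that repeated keywords map to a single node. The function name and sample tokens are hypothetical.

```python
def build_keyword_graph(tokens, keywords):
    """Return (nodes, edges) for the keyword graph of a token sequence."""
    keywords = set(keywords)
    # Keep only the tokens that match the keyword list, in document order.
    matched = [t for t in tokens if t in keywords]
    nodes = set(matched)
    edges = set()
    # Connect each pair of consecutive matched keywords (no self-loops).
    for a, b in zip(matched, matched[1:]):
        if a != b:
            edges.add(frozenset((a, b)))
    return nodes, edges

tokens = ["user", "the", "account", "and", "address", "of", "user"]
nodes, edges = build_keyword_graph(tokens, ["user", "account", "address"])
print(sorted(nodes))
print(sorted(sorted(e) for e in edges))
```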

[0055] In one embodiment, the graph convolutional neural network model is trained as follows: multiple graph networks are obtained, each graph network including nodes and edges; the minimum spanning tree Prim algorithm is used to generate the minimum connected dominating set of each graph network; the graph convolutional neural network is trained based on the multiple graph networks and the corresponding minimum connected dominating sets to obtain a pre-trained graph convolutional neural network model.

[0056] Specifically, the minimum connected dominating set is determined using the Prim spanning tree algorithm. The algorithm takes the sum of the degrees of the two endpoint nodes of each edge in the graph network as the weight of that edge and generates the maximum spanning tree, i.e., the spanning tree with the largest sum of weights. Here, a spanning tree is a subgraph that contains all vertices of the connected graph and in which there is exactly one path between any two vertices.

[0057] The specific calculation process of the Prim algorithm is as follows: calculate the number of edges connected to each node in the graph network as the degree of each node; calculate the sum of the degrees of the two nodes connected by each edge in the graph network as the weight of that edge; starting from a pre-selected node in the graph network, select the edge with the largest weight among the edges connected to that node, and add the other node connected to that edge to the selected node set; find the edge with the largest weight in the graph network for which one node has been selected and the other has not, and add the unselected node connected to that edge to the selected node set; continue until all nodes have been selected, obtaining the selected node set as the minimum connected dominating set. The pre-selected node can be any randomly selected node in the graph network.
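The procedure above can be sketched with a priority queue. The tree construction follows the described steps (degree-sum edge weights, always taking the heaviest edge that reaches an unselected node). The final step is an assumption: since the selected set eventually contains every node, the sketch also reports the non-leaf nodes of the resulting tree, which form a connected dominating set of any connected graph with at least three nodes. Function names and the sample graph are hypothetical.

```python
import heapq
from collections import Counter

def max_spanning_tree_by_degree(adj, start):
    """Prim-style maximum spanning tree; edge weight = degree(u) + degree(v)."""
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    selected = {start}
    tree_edges = []
    # heapq is a min-heap, so weights are negated to pop the heaviest edge.
    heap = [(-(degree[start] + degree[v]), start, v) for v in adj[start]]
    heapq.heapify(heap)
    while heap and len(selected) < len(adj):
        _, u, v = heapq.heappop(heap)
        if v in selected:
            continue
        selected.add(v)
        tree_edges.append((u, v))
        for w in adj[v]:
            if w not in selected:
                heapq.heappush(heap, (-(degree[v] + degree[w]), v, w))
    return tree_edges

def internal_nodes(tree_edges):
    """Assumption: use the non-leaf nodes of the tree as the dominating set."""
    tree_degree = Counter()
    for u, v in tree_edges:
        tree_degree[u] += 1
        tree_degree[v] += 1
    return {u for u, d in tree_degree.items() if d > 1}

graph = {"h": {"a", "b", "c"}, "a": {"h", "b"}, "b": {"h", "a"},
         "c": {"h", "d"}, "d": {"c"}}
tree = max_spanning_tree_by_degree(graph, "d")
print(tree)
print(sorted(internal_nodes(tree)))
```

Node names must be mutually comparable (for example, all strings) so that ties in the heap can be broken.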

[0058] In one embodiment, training a graph convolutional neural network based on multiple graph networks and their corresponding minimum connected dominating sets includes the following steps:

[0059] Step S301: Use the degree of each node in the multiple graph networks as the node feature matrix. Specifically, use a one-dimensional feature matrix instead of a diagonal matrix that places the node degrees on the diagonal. For example, a graph with 100 nodes corresponds to a 100x1 feature matrix.

[0060] Step S302: Label nodes in the graph network that belong to the minimum connected dominating set as 1 and those that do not as 0, thus obtaining the node label values. With these label values, the graph convolutional neural network can be trained as a binary classifier over nodes. Since graph convolutional neural networks are not suitable for inductive learning, the training graphs need to be pieced together, jigsaw-style, into a single training graph.
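Steps S301 and S302 can be sketched as follows. The sketch assumes that piecing the graphs together means forming their disjoint union, with each node renamed to a (graph index, node) pair so that names from different graphs cannot collide. The helper names and the two small graphs are hypothetical.

```python
def disjoint_union(graphs):
    """Merge several adjacency dicts into one graph with no cross-graph edges."""
    merged = {}
    for gi, adj in enumerate(graphs):
        for u, nbrs in adj.items():
            merged[(gi, u)] = {(gi, v) for v in nbrs}
    return merged

def degree_features(adj, order):
    """N x 1 feature matrix holding the degree of each node."""
    return [[float(len(adj[u]))] for u in order]

def mcds_labels(order, mcds):
    """1 for nodes in the minimum connected dominating set, 0 otherwise."""
    return [1 if u in mcds else 0 for u in order]

g0 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}   # path a-b-c
g1 = {"x": {"y"}, "y": {"x"}}                    # single edge x-y
merged = disjoint_union([g0, g1])
order = sorted(merged)
features = degree_features(merged, order)
labels = mcds_labels(order, {(0, "b"), (1, "x")})
print(features)
print(labels)
```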

[0061] Step S303: Train the graph convolutional neural network based on the node feature matrix and the node label values. The graph convolutional neural network uses ReLU as the activation function, includes dropout in the convolutional layers, and uses the log-likelihood function as the loss function. ReLU is chosen as the activation function to avoid the vanishing gradient problem caused by an increase in neural network connections and uneven distribution of training data. Dropout is introduced in the convolutional layers to avoid overfitting and reduce the complexity of the graph convolutional neural network. The log-likelihood function is used as the loss function to introduce weights to the loss, causing the model to incur a larger loss when predicting a 1 as 0, and a smaller loss when predicting a 0 as 1, thus balancing the relatively small number of 0 samples in the training samples.
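The effect of weighting the log-likelihood loss can be sketched as follows. The class weights used here are hypothetical, since the text does not give their values, and the loss is averaged over nodes without normalizing by the weights so that the asymmetry stays visible.

```python
import math

def weighted_nll(log_probs, labels, class_weights):
    """Mean of -w[y] * log p(y) over nodes.

    log_probs: per-node [log p(class 0), log p(class 1)].
    class_weights: [weight for label 0, weight for label 1] (assumed values).
    """
    total = 0.0
    for lp, y in zip(log_probs, labels):
        total += -class_weights[y] * lp[y]
    return total / len(labels)

weights = [1.0, 3.0]  # hypothetical: errors on label-1 nodes cost three times more
confident_zero = [math.log(0.9), math.log(0.1)]  # model favours class 0
confident_one = [math.log(0.1), math.log(0.9)]   # model favours class 1

loss_one_as_zero = weighted_nll([confident_zero], [1], weights)
loss_zero_as_one = weighted_nll([confident_one], [0], weights)
print(loss_one_as_zero, loss_zero_as_one)
```

With these weights, predicting a true 1 as 0 incurs three times the loss of predicting a true 0 as 1 at the same confidence.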

[0062] Specifically, the multiple graph networks can be 1000 randomly generated graphs of 100 nodes each, whose minimum connected dominating set nodes are known. This information is used as the node labels for training the graph convolutional neural network: the label value is 1 for nodes in the minimum connected dominating set and 0 for the rest. These graph networks constitute the training set. The test set is a collection of graphs whose minimum connected dominating sets are to be determined; the size and number of these graphs are arbitrary. The adjacency matrices of the test set and the training set, along with the one-dimensional node feature matrices, are concatenated and used as the input to the first convolutional layer of the model. This input is multiplied by a randomly initialized 1x8 weight matrix W1 and then passed through a ReLU activation function, resulting in an 8-dimensional feature representation for each node. This 8-dimensional feature representation and the adjacency matrix are used as the input to the second convolutional layer, multiplied by an 8x2 weight matrix W2, and passed through an activation function, resulting in a 2-dimensional output feature for each node. The training node label matrix is expanded with a new dimension whose values are obtained by subtracting the values of the original dimension from 1, resulting in a two-dimensional label matrix. The cross-entropy loss is calculated between the two-dimensional label matrix and the feature matrix output by the two-layer model for the training samples, and the weight matrices W1 and W2 are updated by backpropagation with a learning rate of 0.01 and a dropout rate of 0.2 until the loss value converges.
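The forward pass of the two-layer model described above can be sketched in plain Python. This is an illustration under several assumptions: the graphs are joined as a block-diagonal adjacency matrix, the raw adjacency matrix is used without normalization or self-loops, the second activation is taken to be a log-softmax, and dropout and the backpropagation update are omitted. All function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    return [[max(0.0, x) for x in row] for row in m]


def log_softmax(m):
    out = []
    for row in m:
        mx = max(row)
        lse = mx + math.log(sum(math.exp(x - mx) for x in row))
        out.append([x - lse for x in row])
    return out


def block_diagonal(adjs):
    """Join several adjacency matrices into one graph with no cross edges."""
    n = sum(len(a) for a in adjs)
    big = [[0.0] * n for _ in range(n)]
    offset = 0
    for a in adjs:
        for i, row in enumerate(a):
            for j, x in enumerate(row):
                big[offset + i][offset + j] = float(x)
        offset += len(a)
    return big


def gcn_forward(adj, x, w1, w2):
    h = relu(matmul(matmul(adj, x), w1))  # N x 8 hidden features
    z = matmul(matmul(adj, h), w2)        # N x 2 output features
    return log_softmax(z)                 # assumption: log-softmax output


def one_hot(labels):
    """Expand 0/1 labels to two columns: [1 - y, y]."""
    return [[1 - y, y] for y in labels]


rng = random.Random(0)
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
single_edge = [[0, 1], [1, 0]]
adj = block_diagonal([triangle, single_edge])
x = [[2.0], [2.0], [2.0], [1.0], [1.0]]  # node degrees as N x 1 features
w1 = [[rng.uniform(-1, 1) for _ in range(8)]]
w2 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(8)]
out = gcn_forward(adj, x, w1, w2)
```

Each row of `out` holds the log-probabilities of the two classes for one node, which can be compared against the rows of `one_hot(labels)` when computing the loss.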

[0063] In one embodiment, generating a simplified graph of the graph network to be identified based on the minimum connected dominating set corresponding to the graph network to be identified includes: obtaining any two nodes from the minimum connected dominating set corresponding to the graph network to be identified; determining whether there is an edge between these two nodes in the graph network to be identified; if so, adding an edge between the two nodes; and repeating the above steps until all nodes in the minimum connected dominating set have been traversed, to obtain the simplified graph of the graph network to be identified. Specifically, the simplified graph is generated by traversing the nodes $v_i$ and $v_j$ of the minimum connected dominating set: if there is an edge between $v_i$ and $v_j$ in the graph network to be identified, an edge is added between $v_i$ and $v_j$.
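The construction above amounts to taking the subgraph induced by the dominating set. A minimal sketch follows; the function name `simplified_graph` and the representation of a graph as a node set plus a set of `frozenset` edges are assumptions for illustration.

```python
from itertools import combinations


def simplified_graph(edges, dominating_set):
    """Keep the dominating-set nodes and every original edge between them."""
    original = {frozenset(e) for e in edges}
    nodes = set(dominating_set)
    kept = set()
    # Traverse every pair (v_i, v_j) of dominating-set nodes.
    for vi, vj in combinations(list(nodes), 2):
        if frozenset((vi, vj)) in original:
            kept.add(frozenset((vi, vj)))
    return nodes, kept


edges = [(0, 1), (0, 2), (0, 3), (3, 4), (3, 5)]
simple_nodes, simple_edges = simplified_graph(edges, {0, 3})
```

The pairwise traversal mirrors the wording of the paragraph; for large graphs, filtering the original edge list by membership in the dominating set gives the same result with less work.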

[0064] In one embodiment, the matching degree is calculated between the simplified graph of the network to be identified and the simplified graph corresponding to the standard sensitivity level data to obtain the sensitivity identification result of the user data to be identified, including the following steps:

[0065] Step S401: Calculate the maximum common subgraph between the simplified graph of the network to be identified and the simplified graph corresponding to the standard sensitivity level data; specifically, the maximum common subgraph can be considered as the intersection of the two simplified graphs. The specific calculation method for the maximum common subgraph is as follows:

[0066] Calculate the common nodes between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data. To determine the common nodes, traverse all nodes of the simplified graph g of the graph network to be identified and of the simplified graph g' corresponding to the standard sensitivity level data, respectively. If $v \in V_g$ and $v \in V_{g'}$, then v is a common node of graph g and graph g'.

[0067] Based on the common nodes, calculate the shared edges between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data. To determine the shared edges, traverse any two nodes $v_i$ and $v_j$ among the common nodes. If there is an edge between $v_i$ and $v_j$ in both graph g and graph g', then this edge is a shared edge of graph g and graph g'.

[0068] The maximum common subgraph is determined based on the shared edges. After determining the shared edges, the nodes corresponding to the shared edges constitute the maximum common subgraph.
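The three sub-steps above can be sketched with set operations. The sketch assumes that nodes are keyword labels shared across graphs and that a graph is a pair of a node set and a set of `frozenset` edges; the function name `common_subgraph` is hypothetical. It returns the common nodes, the shared edges and the nodes touched by shared edges, since the last of these is what the paragraph above takes as the maximum common subgraph.

```python
def common_subgraph(g, g_prime):
    """Return (common nodes, shared edges, nodes covered by shared edges)."""
    nodes_g, edges_g = g
    nodes_p, edges_p = g_prime
    common_nodes = nodes_g & nodes_p
    # An edge is shared when it exists in both graphs between common nodes.
    shared_edges = {e for e in edges_g & edges_p if e <= common_nodes}
    covered = set().union(*shared_edges)
    return common_nodes, shared_edges, covered


def make_edges(*pairs):
    return {frozenset(p) for p in pairs}


g = ({"name", "address", "load"}, make_edges(("name", "address"), ("address", "load")))
g_prime = (
    {"name", "address", "meter"},
    make_edges(("name", "address"), ("address", "meter")),
)
common_nodes, shared_edges, covered = common_subgraph(g, g_prime)
```

Set intersection gives the same result as traversing every pair of common nodes and checking both graphs for an edge.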

[0069] Step S402: Calculate the number of nodes in the simplified graph of the network to be identified and the simplified graph corresponding to the standard sensitivity level data.

[0070] Step S403: Calculate the matching degree from the maximum common subgraph and the larger of the two node counts, and classify the user data to be identified as the sensitive result corresponding to the standard sensitive level data with the highest matching degree.

[0071] The matching degree is calculated using the following formula:

[0072]

[0073] Where mcs(g,g′) is the maximum common subgraph of graph g and graph g′, and |g|,|g′| represent the number of nodes in graph g and graph g′, respectively.

[0074] Specifically, in order to perform sensitive identification on the user data to be identified, the matching degree calculation formula mentioned above is used to calculate the matching degree between the simplified graph of the graph network to be identified and the simplified graph of the standard sensitive level data corresponding to each level, resulting in multiple matching degree values. Then, the maximum value among the multiple matching degree values is selected, and the user data to be identified belongs to the level corresponding to the maximum value, thereby completing the sensitive identification of the user data to be identified.
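The matching and classification step can be sketched as follows. The formula itself is not reproduced in this text, so the sketch assumes the matching degree is $|mcs(g,g')| / \max(|g|,|g'|)$, with $|mcs(g,g')|$ counted as the number of nodes touched by shared edges; both points are assumptions inferred from step S403 and the symbol definitions. The names `matching_degree` and `classify`, the graph representation and the level names are hypothetical.

```python
def matching_degree(g, g_prime):
    """Assumed form: |mcs(g, g')| / max(|g|, |g'|), counted in nodes."""
    nodes_g, edges_g = g
    nodes_p, edges_p = g_prime
    shared_edges = edges_g & edges_p
    mcs_nodes = set().union(*shared_edges)  # nodes touched by shared edges
    larger = max(len(nodes_g), len(nodes_p))
    return len(mcs_nodes) / larger if larger else 0.0


def classify(g, standards):
    """Return the level whose standard graph matches g best, plus all scores."""
    scores = {level: matching_degree(g, ref) for level, ref in standards.items()}
    return max(scores, key=scores.get), scores


def make_edges(*pairs):
    return {frozenset(p) for p in pairs}


g = ({"a", "b", "c"}, make_edges(("a", "b"), ("b", "c")))
standards = {
    "high": ({"a", "b", "c", "d"}, make_edges(("a", "b"), ("b", "c"), ("c", "d"))),
    "low": ({"a", "x"}, make_edges(("a", "x"))),
}
level, scores = classify(g, standards)
```

Here the graph shares three of its nodes with the four-node "high" standard graph and no edges with the "low" one, so it is assigned the "high" level.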

[0075] This invention also provides a graph learning-based device for identifying sensitive user data in virtual power plants. As shown in Figure 2, the device includes:

[0076] The data acquisition module is used to acquire user data to be identified; for details, please refer to the corresponding part of the above method embodiment, which will not be repeated here.

[0077] The graph network generation module is used to generate a graph network to be identified based on the relationship between keywords in the user data to be identified; for details, please refer to the corresponding part of the above method embodiment, which will not be repeated here.

[0078] The dominating set generation module is used to input the graph network to be identified into a pre-trained graph convolutional neural network model to obtain the minimum connected dominating set corresponding to the graph network to be identified; for details, please refer to the corresponding part of the above method embodiment, which will not be repeated here.

[0079] The simplified graph generation module is used to generate a simplified graph of the graph network to be identified based on the minimum connected dominating set corresponding to the graph network to be identified; for details, please refer to the corresponding part of the above method embodiment, which will not be repeated here.

[0080] The classification and grading module is used to calculate the matching degree between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data, so as to obtain the sensitivity identification result of the user data to be identified. For details, please refer to the corresponding section of the above method embodiment, which will not be repeated here.

[0081] The graph learning-based virtual power plant user sensitive data identification device provided in this invention acquires user data to be identified; generates a graph network to be identified based on the relationships between keywords in the user data; inputs the graph network to be identified into a pre-trained graph convolutional neural network model to obtain the minimum connected dominating set corresponding to the graph network; generates a simplified graph of the graph network to be identified based on the minimum connected dominating set; and calculates the matching degree between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data to obtain the sensitivity identification result of the user data to be identified. This achieves the generation of the minimum connected dominating set through graph learning, reducing the complexity of subsequent graph matching, improving matching speed, and exhibiting higher efficiency compared to traditional classification and grading methods.

[0082] Optionally, the graph network generation module is specifically used for: performing word segmentation processing on the user data to be identified based on word frequency statistics to generate a keyword list corresponding to the user data to be identified; traversing the user data to be identified, extracting words that match the keyword list and the relationships between words in the user data to be identified, and generating a graph network to be identified, the graph network to be identified including nodes and edges.
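The graph construction performed by this module can be sketched roughly as follows. Several simplifications are assumed: tokens are split with a regular expression rather than a real word segmenter, keywords are the `top_k` most frequent tokens, and two keywords are related when they occur in the same record. The function name `build_keyword_graph` and the sample records are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def build_keyword_graph(records, top_k=5):
    """Build (nodes, edges) from keyword co-occurrence within each record."""
    tokenized = [re.findall(r"\w+", record.lower()) for record in records]
    # Word frequency statistics over all records give the keyword list.
    freq = Counter(token for tokens in tokenized for token in tokens)
    keywords = {word for word, _ in freq.most_common(top_k)}

    nodes, edges = set(), set()
    for tokens in tokenized:
        present = sorted(set(tokens) & keywords)
        nodes.update(present)
        # Assumption: keywords in the same record are related to each other.
        edges.update(frozenset(pair) for pair in combinations(present, 2))
    return nodes, edges


records = ["user address phone", "user address", "user meter"]
kw_nodes, kw_edges = build_keyword_graph(records, top_k=2)
```

A production system would replace the regular expression with a segmenter suited to the language of the user data and a relation definition taken from the method embodiment.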

[0083] Optionally, the graph convolutional neural network model is trained in the following manner: multiple graph networks are obtained, each graph network including nodes and edges; the minimum spanning tree Prim algorithm is used to generate the minimum connected dominating set of each graph network; the graph convolutional neural network is trained based on the multiple graph networks and the corresponding minimum connected dominating sets to obtain a pre-trained graph convolutional neural network model.

[0084] Optionally, training a graph convolutional neural network based on multiple graph networks and their corresponding minimum connected dominating sets includes: using the degree of each node in the multiple graph networks as a node feature matrix; labeling nodes in the graph networks that belong to the minimum connected dominating set as 1 and nodes that do not belong as 0 to obtain node label values; training the graph convolutional neural network based on the node feature matrix and the node label values, wherein the graph convolutional neural network uses ReLU as the activation function, the convolutional layers include dropout, and the log-likelihood function is used as the loss function.

[0085] Optionally, the simplified graph generation module is specifically used to: obtain any two nodes in the minimum connected dominating set corresponding to the graph network to be identified; determine whether there is an edge between the corresponding two nodes in the graph network to be identified; if there is, add the edge between the two nodes; repeat the above steps until all nodes in the minimum connected dominating set have been traversed to obtain the simplified graph of the graph network to be identified.

[0086] Optionally, the classification and grading module is specifically used to: calculate the maximum common subgraph between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data; calculate the number of nodes in the simplified graph of the graph network to be identified and in the simplified graph corresponding to the standard sensitivity level data; calculate the matching degree from the maximum common subgraph and the larger of the two node counts, and classify the user data to be identified as the sensitive result corresponding to the standard sensitivity level data with the highest matching degree.

[0087] Optionally, calculating the maximum common subgraph between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data includes: calculating the common nodes between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data; calculating the common edges between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data based on the common nodes; and determining the maximum common subgraph based on the common nodes and common edges.

[0088] For a detailed description of the function of the graph learning-based virtual power plant user sensitive data identification device provided in this embodiment of the invention, please refer to the graph learning-based virtual power plant user sensitive data identification method description in the above embodiments.

[0089] This invention also provides a storage medium, as shown in Figure 3, on which a computer program 601 is stored. When executed by a processor, this program implements the steps of the graph learning-based virtual power plant user sensitive data identification method described in the above embodiments. The storage medium also stores audio and video stream data, feature frame data, interactive request signaling, encrypted data, and preset data sizes. The storage medium can be a magnetic disk, optical disk, read-only memory (ROM), random access memory (RAM), flash memory, hard disk drive (HDD), or solid-state drive (SSD), etc.; the storage medium may also include combinations of the above types of memory.

[0090] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, optical disk, read-only memory (ROM), random access memory (RAM), flash memory, hard disk drive (HDD), or solid-state drive (SSD), etc.; the storage medium can also include combinations of the above types of memory.

[0091] This invention also provides an electronic device. As shown in Figure 4, the electronic device may include a processor 51 and a memory 52, wherein the processor 51 and the memory 52 may be connected via a bus or other means. Figure 4 takes connection via a bus as an example.

[0092] Processor 51 can be a central processing unit (CPU). Processor 51 can also be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or combinations of the above types of chips.

[0093] The memory 52, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the corresponding program instructions / modules in the embodiments of the present invention. The processor 51 executes various functional applications and data processing by running the non-transitory software programs, instructions, and modules stored in the memory 52, thereby implementing the graph learning-based virtual power plant user sensitive data identification method in the above method embodiments.

[0094] The memory 52 may include a program storage area and a data storage area. The program storage area may store applications required for operating the device and at least one function; the data storage area may store data created by the processor 51, etc. Furthermore, the memory 52 may include high-speed random access memory and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 52 may optionally include memory remotely located relative to the processor 51, and these remote memories may be connected to the processor 51 via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.

[0095] The one or more modules are stored in the memory 52, and when executed by the processor 51, they perform the graph learning-based method for identifying sensitive user data in a virtual power plant in the embodiment shown in Figure 1.

[0096] The specific details of the above electronic device can be understood by referring to the corresponding descriptions and effects in the embodiment shown in Figure 1, and will not be repeated here.

[0097] Although embodiments of the invention have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations all fall within the scope defined by the appended claims.

Claims

1. A method for identifying sensitive user data in a virtual power plant based on graph learning, characterized in that it comprises: obtaining user data to be identified; generating a graph network to be identified based on the relationships between keywords in the user data to be identified; inputting the graph network to be identified into a pre-trained graph convolutional neural network model to obtain the minimum connected dominating set corresponding to the graph network to be identified; generating a simplified graph of the graph network to be identified based on the minimum connected dominating set corresponding to the graph network to be identified; and calculating the matching degree between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data to obtain the sensitivity identification result of the user data to be identified; wherein generating the simplified graph of the graph network to be identified based on the minimum connected dominating set corresponding to the graph network to be identified comprises: obtaining any two nodes from the minimum connected dominating set corresponding to the graph network to be identified; determining whether there is an edge between the corresponding two nodes in the graph network to be identified; if there is, adding an edge between the two nodes; and repeating the above steps until all nodes in the minimum connected dominating set have been traversed, to obtain the simplified graph of the graph network to be identified.

2. The method for identifying sensitive user data in a virtual power plant based on graph learning according to claim 1, characterized in that generating a graph network to be identified based on the relationships between keywords in the user data to be identified comprises: performing word segmentation on the user data to be identified based on word frequency statistics to generate a keyword list corresponding to the user data to be identified; and traversing the user data to be identified, extracting the words that match the keyword list and the relationships between the words, and generating the graph network to be identified, which includes nodes and edges.

3. The method for identifying sensitive user data in a virtual power plant based on graph learning according to claim 1, characterized in that the graph convolutional neural network model is trained in the following way: obtaining multiple graph networks, each graph network including nodes and edges; using the minimum spanning tree Prim algorithm to generate the minimum connected dominating set of each graph network; and training the graph convolutional neural network based on the multiple graph networks and their corresponding minimum connected dominating sets to obtain the pre-trained graph convolutional neural network model.

4. The method for identifying sensitive user data in a virtual power plant based on graph learning according to claim 3, characterized in that training the graph convolutional neural network based on the multiple graph networks and their corresponding minimum connected dominating sets comprises: using the degree of each node in the multiple graph networks as the node feature matrix; marking nodes in the graph networks that belong to the minimum connected dominating set as 1 and nodes that do not belong as 0 to obtain the node label values; and training the graph convolutional neural network based on the node feature matrix and the node label values, wherein the graph convolutional neural network uses ReLU as the activation function, includes dropout in the convolutional layers, and uses the log-likelihood function as the loss function.

5. The method for identifying sensitive user data in a virtual power plant based on graph learning according to claim 1, characterized in that calculating the matching degree between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data to obtain the sensitivity identification result of the user data to be identified comprises: calculating the maximum common subgraph between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data; calculating the number of nodes in the simplified graph of the graph network to be identified and in the simplified graph corresponding to the standard sensitivity level data; and calculating the matching degree based on the maximum common subgraph and the maximum of the two node counts, and classifying the user data to be identified as the sensitive result corresponding to the standard sensitive level data with the highest matching degree.

6. The method for identifying sensitive user data in a virtual power plant based on graph learning according to claim 5, characterized in that calculating the maximum common subgraph between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data comprises: calculating the common nodes of the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data; calculating, based on the common nodes, the shared edges of the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data; and determining the maximum common subgraph based on the common nodes and the shared edges.

7. A virtual power plant user sensitive data identification device based on graph learning, characterized in that it comprises: a data acquisition module, used to acquire user data to be identified; a graph network generation module, used to generate a graph network to be identified based on the relationships between keywords in the user data to be identified; a dominating set generation module, used to input the graph network to be identified into a pre-trained graph convolutional neural network model to obtain the minimum connected dominating set corresponding to the graph network to be identified; a simplified graph generation module, used to generate a simplified graph of the graph network to be identified based on the minimum connected dominating set corresponding to the graph network to be identified; and a classification and grading module, used to calculate the matching degree between the simplified graph of the graph network to be identified and the simplified graph corresponding to the standard sensitivity level data, so as to obtain the sensitivity identification result of the user data to be identified; wherein generating the simplified graph of the graph network to be identified based on the minimum connected dominating set corresponding to the graph network to be identified comprises: obtaining any two nodes from the minimum connected dominating set corresponding to the graph network to be identified; determining whether there is an edge between the corresponding two nodes in the graph network to be identified; if there is, adding an edge between the two nodes; and repeating the above steps until all nodes in the minimum connected dominating set have been traversed, to obtain the simplified graph of the graph network to be identified.

8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions for causing a computer to perform the graph learning-based method for identifying sensitive user data in a virtual power plant as described in any one of claims 1-6.

9. An electronic device, characterized in that it comprises: a memory and a processor that are communicatively connected to each other, wherein the memory stores computer instructions, and the processor executes the computer instructions to perform the graph learning-based method for identifying sensitive user data in a virtual power plant as described in any one of claims 1-6.