A method and system for constructing a large power data model

By combining graph databases and the PageRank algorithm with an intelligent document analysis platform, the problem of traditional power data analysis methods being unable to integrate power system data and document information has been solved. This has enabled efficient analysis of power system data and accurate model construction, thereby improving the system's operational efficiency and security.

CN117520599B · Active · Publication Date: 2026-06-30 · ELECTRIC POWER RES INST CHINA SOUTHERN POWER GRID CO LTD +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: ELECTRIC POWER RES INST CHINA SOUTHERN POWER GRID CO LTD
Filing Date: 2023-09-27
Publication Date: 2026-06-30

IPC: G06F16/901; G06F16/25; G06F40/279; G06Q10/067; G06Q50/06; H02J3/00; H02J103/30; H02J103/35

AI Tagging

Technology Topics

Document analysis; Theoretical computer science

AI Technical Summary

Technical Problem

Traditional power data analysis methods are ill-suited to the diversity and dynamism of power system data, and cannot effectively integrate operational data with relevant documentation to construct a comprehensive data model that reflects the state of the power system.

Method used

Power system data is collected and preprocessed, topology analysis is performed using graph databases and the PageRank algorithm, and text information is extracted using an intelligent document analysis platform to construct a large power data model.

Benefits of technology

It enables efficient integration and analysis of power system data, provides more accurate technical support, and improves the operating efficiency and security of the power system.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention discloses a method and system for constructing a large-scale power data model, belonging to the field of power system data analysis and model construction technology. It includes: collecting power system operation data and performing data preprocessing to construct a power system data model; performing topological analysis on data in a graph database based on the power system's topology and operating status, calculating the topological parameters of each node and edge; using an intelligent document analysis platform to parse the content of power-related documents and extract textual information related to the power system's operation data; and associating the extracted textual information with data in the graph database to construct a large-scale power data model. This invention ensures the quality of power system data through data preprocessing technology and accurately captures the structure and operating status of the power system through topological analysis. It also utilizes intelligent document parsing to mine key textual information related to power system operation, providing more precise technical support for power system decision-making.

Description

Technical Field

[0001] This invention relates to the field of power system data analysis and model building technology, specifically to a method and system for building large power data models.

Background Technology

[0002] With the increasing complexity of power systems, accurate and efficient analysis and prediction of power system operational data have become crucial. Traditional power data analysis methods often rely on fixed models and algorithms, which makes it difficult for them to adapt to the diversity and dynamism of power system data. Furthermore, power systems generate not only a large amount of operational data but also a wealth of related documentation, such as operation manuals, maintenance records, and fault reports. Effectively integrating this data and documentation to construct a comprehensive large-scale data model that fully reflects the state of the power system is a significant challenge currently facing the field of power data analysis. Therefore, developing a new technology for constructing large power data models, capable of comprehensively processing and analyzing power system operational data and related documentation, is of great importance for improving the operational efficiency and safety of power systems.

Summary of the Invention

[0003] Therefore, the technical problem solved by this invention is: how to comprehensively process and analyze the operating data and related document information of the power system in order to construct a large data model that can comprehensively and accurately reflect the state of the power system.

[0004] To address the aforementioned technical problems, this invention provides the following technical solution: Collecting operational data from the power system and performing data preprocessing to construct a power system data model; performing topological analysis on the data in the graph database based on the power system's topology and operating status, calculating the topological parameters of each node and edge; using an intelligent document analysis platform to parse the content of power-related documents and extract textual information related to the power system's operational data; and associating the extracted textual information with the data in the graph database to construct a large-scale power data model.

[0005] As a preferred embodiment of the method for constructing a large power data model described in this invention: the collected power system operation data includes voltage, current, power, and frequency. The data preprocessing involves denoising the collected periodic state data; removing missing values, outliers, and invalid data with incorrect formats; adjusting the brightness and contrast of images; converting the original-format data into the format required for analysis; and normalizing the data. The preprocessed data is then stored in a graph database. The nodes and edges of the graph database represent the components and connections of the power system, thereby constructing a power system data model based on a graph structure.
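The numeric part of the preprocessing described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `preprocess`, the median-absolute-deviation outlier rule, the cutoff `k`, and the min-max normalization are all assumptions chosen for brevity, and denoising and image adjustment are left out.

```python
import math
from statistics import median

def preprocess(readings, k=5.0):
    """Clean and normalize one channel of raw readings (for example voltage).

    Hypothetical helper: the outlier rule and the cutoff k are assumptions.
    """
    # Step 1: drop missing values and entries with an invalid format.
    values = []
    for r in readings:
        try:
            x = float(r)
        except (TypeError, ValueError):
            continue
        if math.isfinite(x):
            values.append(x)
    if not values:
        return []

    # Step 2: drop outliers, here any point farther than k median absolute
    # deviations (MAD) from the median.
    med = median(values)
    mad = median(abs(x - med) for x in values)
    if mad > 0:
        values = [x for x in values if abs(x - med) <= k * mad]

    # Step 3: min-max normalize the remaining values to [0, 1].
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]

# Made-up voltage samples containing a missing value, a malformed entry,
# and one obvious outlier.
raw = [220.1, 219.8, None, "n/a", 221.0, 9999.0, 220.4]
clean = preprocess(raw)
```

In this example the missing value, the malformed string, and the 9999.0 reading are discarded, and the four remaining samples are rescaled to the unit interval.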

[0006] As a preferred embodiment of the power data large model construction method described in this invention, the collected power system operation data is imported into a graph database, and corresponding nodes and edges are created. Based on the topology and operating status of the power system, the PageRank algorithm is used to perform topological analysis on the data in the graph database.

[0007] The PageRank algorithm expression is as follows:

[0008]

[0009] Where PR′(u) represents the optimized PageRank value of node u, d represents the damping coefficient, N represents the total number of nodes in the graph, P(c) represents the probability that any node is visited, B_u represents the set of nodes pointing to node u, W(e) represents the weight of the edge e connecting two nodes, W(e′) represents the weight of edge e′, E_v represents the set of edges originating from node v, PR′(v) represents the optimized PageRank value of node v, and T(v) represents the threshold function.
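The expression in [0008] is not reproduced in this text, so the following sketch shows only the standard edge-weighted PageRank iteration, written with the symbols that are defined above (d, N, B_u, W(e), E_v). The terms P(c) and T(v) are omitted because their exact role cannot be determined from the definitions alone. The function name `weighted_pagerank`, the edge dictionary layout, and the node names are assumptions.

```python
def weighted_pagerank(edges, d=0.85, iters=100, tol=1e-10):
    """Standard weighted PageRank (not necessarily the patent's expression).

    edges: dict mapping (v, u) -> non-negative weight W(e) of edge v -> u.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)

    # Total outgoing weight of each node: sum of W(e') over e' in E_v.
    out_weight = {v: 0.0 for v in nodes}
    for (v, _u), w in edges.items():
        out_weight[v] += w

    pr = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Rank held by nodes without outgoing weight is spread uniformly.
        dangling = sum(pr[v] for v in nodes if out_weight[v] == 0.0)
        new = {u: (1.0 - d) / n + d * dangling / n for u in nodes}
        # Each node v passes rank to u in proportion to W(e) / sum of W(e').
        for (v, u), w in edges.items():
            if w > 0:
                new[u] += d * pr[v] * w / out_weight[v]
        delta = sum(abs(new[u] - pr[u]) for u in nodes)
        pr = new
        if delta < tol:
            break
    return pr

# Hypothetical three-node network; weights could stand for line capacity.
edges = {
    ("plant", "sub_a"): 2.0,
    ("plant", "sub_b"): 1.0,
    ("sub_a", "sub_b"): 1.0,
    ("sub_b", "plant"): 1.0,
}
ranks = weighted_pagerank(edges)
```

The scores sum to 1, and `sub_b` ends up ranked above `sub_a` because it receives rank from both other nodes.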

[0010] As a preferred embodiment of the power data large-scale model construction method described in this invention, the following steps are taken: Cross-validation is used to divide the historical data of the power system into a training set and a validation set. The damping coefficient is evaluated using the validation set. When the accuracy is greater than a set threshold, the current PageRank algorithm continues the topology analysis of the power system. When the accuracy is less than or equal to the threshold, the importance of nodes and their status in the power system are incorporated into the PageRank algorithm for optimization. The expression is:

[0011]

[0012] Where I(u) represents the importance of node u, S(u) represents the operating state of the power system at node u, and λ and μ are the importance weight coefficient of node u and the importance weight coefficient of the operating state of node u, respectively.

[0013] As a preferred embodiment of the method for constructing a large power data model according to the present invention, the calculation of the topological parameters of each node and edge is based on the node numerical output of the iterated PageRank algorithm, and the expression is:

[0014] TP(u) = PR′(u) × T(u)

[0015] TP(e)=W(e)×PR′(u)×PR′(v)

[0016] Where TP(u) represents the topology parameters of node u, TP(e) represents the topology parameters of edge e, and T(u) represents the voltage or frequency magnitude of each node.
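The two expressions above translate directly into code. In this sketch the function name `topology_parameters` and the sample values are assumptions, and u and v in TP(e) are taken to be the two endpoints of edge e, which the text implies but does not state.

```python
def topology_parameters(pr, t, edge_weights):
    """Compute TP(u) = PR'(u) * T(u) and TP(e) = W(e) * PR'(u) * PR'(v).

    pr: node -> PageRank value PR'(u)
    t: node -> voltage or frequency magnitude T(u)
    edge_weights: (u, v) -> edge weight W(e), with u and v the endpoints of e
    """
    tp_node = {u: pr[u] * t[u] for u in pr}
    tp_edge = {(u, v): w * pr[u] * pr[v] for (u, v), w in edge_weights.items()}
    return tp_node, tp_edge

# Made-up values: PageRank scores and per-unit voltage magnitudes.
pr = {"a": 0.5, "b": 0.25, "c": 0.25}
voltage = {"a": 1.0, "b": 0.98, "c": 1.02}
weights = {("a", "b"): 2.0, ("b", "c"): 4.0}

tp_node, tp_edge = topology_parameters(pr, voltage, weights)
```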

[0017] As a preferred embodiment of the method for constructing a large power data model described in this invention, power-related documents are parsed using an intelligent document analysis platform as follows. The collected and organized power-related documents are transmitted to the platform, and machine learning is used to extract key information from them as a reference dataset. The type of each power-related document is determined from its format, structure, and content characteristics, and key information is further extracted from the current documents and analyzed. The various fault conditions and their corresponding maintenance times are analyzed statistically, and association rule mining algorithms are used to analyze the relationships between different devices, operations, and faults. The topological parameters of each node and edge are calculated using the PageRank algorithm, the output results are transmitted to the intelligent document analysis platform, and the reference dataset is automatically matched using the topological parameters to construct the large power data model.

[0018] As a preferred embodiment of the power data large model construction method described in this invention, the key information includes the operating status of the power system, fault conditions, and maintenance records, and the power-related documents include operation manuals, maintenance records, fault reports, and system logs.

[0019] The intelligent document analysis platform automatically matches the reference dataset with topology parameters and outputs the matching results as predicted values. It receives feedback information in real time, labels and records it, and judges the mean square error and absolute error between the predicted results and the actual values to quantify the accuracy of the prediction. When the error value exceeds a preset value, the platform analyzes the duration and frequency of the prediction error. If the prediction error is instantaneous or the frequency is less than the preset value, it is classified as an occasional error. If the prediction error is continuous or the frequency is greater than or equal to the preset value, it is determined to be a systematic error. The impact on the power system is further assessed. By comparing the prediction results with historical data and the reference dataset, the platform determines the degree of impact on system stability, assigns a comprehensive score, and formulates corresponding operation and maintenance decisions. During the analysis process in the intelligent document analysis platform, the platform screens unsafe operations in the current data implementation steps based on the reference dataset to trace the causes of problems, updates the reference dataset, and optimizes the power data model.
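The error check in this paragraph can be sketched as follows. The text leaves the exact preset values and the boundary between "instantaneous" and "continuous" open, so the parameters `tolerance`, `freq_limit`, and `run_limit`, as well as the function names, are assumptions: an error counts as systematic here when the number of out-of-tolerance points reaches `freq_limit` or when they occur in a consecutive run of at least `run_limit` points.

```python
def mean_squared_error(pred, actual):
    """Mean square error between predicted and actual values."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred)

def classify_errors(pred, actual, tolerance, freq_limit, run_limit):
    """Return 'within tolerance', 'occasional', or 'systematic'."""
    # Flag every point whose absolute error exceeds the tolerance.
    exceed = [abs(p - a) > tolerance for p, a in zip(pred, actual)]
    count = sum(exceed)

    # Longest consecutive run of flagged points (duration of the error).
    longest = run = 0
    for flag in exceed:
        run = run + 1 if flag else 0
        longest = max(longest, run)

    if count == 0:
        return "within tolerance"
    if count >= freq_limit or longest >= run_limit:
        return "systematic"
    return "occasional"

# Made-up predicted and measured values.
pred = [1.0, 1.0, 1.0, 1.0, 1.0]
spike = [1.0, 1.5, 1.0, 1.0, 1.0]   # one isolated deviation
drift = [1.0, 1.5, 1.6, 1.7, 1.0]   # sustained deviation

spike_label = classify_errors(pred, spike, tolerance=0.2, freq_limit=3, run_limit=2)
drift_label = classify_errors(pred, drift, tolerance=0.2, freq_limit=3, run_limit=2)
```

With these thresholds the isolated spike is labeled occasional and the sustained drift is labeled systematic.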

[0020] Another objective of this invention is to provide a power data large-scale model construction system that can automatically construct and optimize power data large-scale models by integrating data acquisition, preprocessing, topology analysis, and intelligent document parsing technologies. This solves the problem that traditional methods struggle to comprehensively process and analyze power system operational data and related document information, providing strong technical support for power system decision-making.

[0021] To solve the above-mentioned technical problems, the present invention provides the following technical solution: a power data large model construction system, including a data processing and storage module, a topology analysis and optimization module, an intelligent document parsing module, an association and model construction module, and a feedback and adaptive update module.

[0022] The data processing and storage module is responsible for collecting the operating data of the power system, performing data preprocessing, and storing the processed data in the graph database.

[0023] The topology analysis and optimization module performs topology analysis on the data in the graph database based on the topology and operating status of the power system, calculates the topology parameters of each node and edge, and optimizes the model when the accuracy of the model does not meet the preset threshold.

[0024] The intelligent document parsing module parses the content of power-related documents and extracts key text information related to power system operation data.

[0025] The association and model building module associates the extracted text information with data in the graph database, uses machine learning methods for data analysis and prediction, and forms a large power data model containing multi-dimensional information.

[0026] The feedback and adaptive update module receives the model's prediction results in real time and performs adaptive updates and optimizations of the model based on the feedback information.

[0027] A computer device includes a memory and a processor, the memory storing a computer program, characterized in that the processor executes the computer program to implement the steps of the method described above.

[0028] A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method described above.

[0029] The beneficial effects of this invention are as follows: By employing efficient data preprocessing technology, this invention ensures the quality and integrity of power system data, and accurately captures the structure and operating status of the power system through topology analysis. Intelligent document parsing is used to deeply mine key textual information related to power system operation. The combination of these three methods overcomes the challenges of traditional methods in processing complex and dynamic power data, solves the pain points of data integration and real-time analysis, and thus realizes a more detailed, accurate, and responsive large-scale power data model, providing more precise technical support for power system decision-making.

Attached Figure Description

[0030] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort. Wherein:

[0031] Figure 1 is an overall flowchart of a method for constructing a large power data model according to an embodiment of the present invention;

[0032] Figure 2 is a structural diagram of a power data large model construction system provided in the second embodiment of the present invention.

Detailed Implementation

[0033] To make the above-mentioned objects, features, and advantages of the present invention more apparent and understandable, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the protection scope of the present invention.

[0034] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the invention. Therefore, the invention is not limited to the specific embodiments disclosed below.

[0035] Secondly, the term "one embodiment" or "embodiment" as used herein refers to a specific feature, structure, or characteristic that may be included in at least one implementation of the present invention. The phrase "in one embodiment" appearing in different places in this specification does not necessarily refer to the same embodiment, nor is it a single or selective embodiment that is mutually exclusive with other embodiments.

[0036] This invention is described in detail with reference to the schematic diagrams. When detailing the embodiments of this invention, for ease of explanation, the cross-sectional views illustrating the device structure may be partially enlarged, not adhering to the usual scale. Furthermore, the schematic diagrams are merely examples and should not be construed as limiting the scope of protection of this invention. In actual fabrication, the three-dimensional spatial dimensions of length, width, and depth should be included.

[0037] Furthermore, in the description of this invention, it should be noted that the terms "upper," "lower," "inner," and "outer," etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings. These terms are used solely for the convenience of describing the invention and for simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, they should not be construed as limitations on the invention. In addition, the terms "first," "second," or "third" are used for descriptive purposes only and should not be construed as indicating or implying relative importance.

[0038] Unless otherwise explicitly specified and limited, the terms "installation," "connection," and "joining" in this invention should be interpreted broadly. For example, they can refer to fixed connections, detachable connections, or integral connections; similarly, they can refer to mechanical connections, electrical connections, or direct connections, or indirect connections through an intermediate medium, or internal connections between two components. Those skilled in the art can understand the specific meaning of the above terms in this invention based on the specific circumstances.

[0039] Example 1

[0040] Referring to Figure 1, as an embodiment of the present invention, a method for constructing a large power data model is provided, comprising: collecting power system operation data and performing data preprocessing to construct a power system data model; performing topological analysis on the data in the graph database according to the topology and operation status of the power system, and calculating the topological parameters of each node and edge; using an intelligent document analysis platform to parse the content of power-related documents and extract text information related to the power system operation data; and associating the extracted text information with the data in the graph database to construct a large power data model.

[0041] The power system operation data collected includes voltage, current, power, and frequency. Data preprocessing involves denoising the collected periodic state data, removing missing values, outliers, and invalid data with incorrect formats, adjusting image brightness and contrast, converting the original format data into the format required for demand analysis, normalizing the data to complete the preprocessing, and storing it in a graph database. The nodes and edges of the graph database are used to represent the components and connections of the power system, thus constructing a power system data model based on a graph structure.

[0042] The collected power system operation data is imported into a graph database, and corresponding nodes and edges are created. Based on the power system's topology and operating status, the PageRank algorithm is used to perform topology analysis on the data in the graph database.
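The node-and-edge representation can be illustrated with a small in-memory stand-in. The source does not name a specific graph database, so the `PowerGraph` class, its methods, and the sample component names below are assumptions used only to show how components become nodes and connections become edges.

```python
from dataclasses import dataclass, field

@dataclass
class PowerGraph:
    """In-memory stand-in for a graph database (hypothetical)."""
    nodes: dict = field(default_factory=dict)  # node id -> attribute dict
    edges: dict = field(default_factory=dict)  # (source, target) -> attribute dict

    def add_node(self, node_id, **attrs):
        # Create the node if needed, then merge in the new attributes.
        self.nodes.setdefault(node_id, {}).update(attrs)

    def add_edge(self, source, target, **attrs):
        # Make sure both endpoints exist before storing the connection.
        for node_id in (source, target):
            self.nodes.setdefault(node_id, {})
        self.edges.setdefault((source, target), {}).update(attrs)

    def in_neighbors(self, node_id):
        # Nodes pointing to node_id, i.e. the set B_u used by PageRank.
        return sorted(s for (s, t) in self.edges if t == node_id)

graph = PowerGraph()
# Made-up preprocessed readings attached to components.
graph.add_node("plant", kind="power plant", voltage=1.00, frequency=50.0)
graph.add_node("sub_a", kind="substation", voltage=0.98, frequency=50.0)
graph.add_node("sub_b", kind="substation", voltage=1.02, frequency=49.9)
graph.add_edge("plant", "sub_a", weight=2.0)
graph.add_edge("plant", "sub_b", weight=1.0)
graph.add_edge("sub_a", "sub_b", weight=1.0)

feeders_of_sub_b = graph.in_neighbors("sub_b")
```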

[0043] The PageRank algorithm expression is as follows:

[0044]

[0045] Where PR′(u) represents the optimized PageRank value of node u, d represents the damping coefficient, N represents the total number of nodes in the graph, P(c) represents the probability that any node is visited, B_u represents the set of nodes pointing to node u, W(e) represents the weight of the edge e connecting two nodes, W(e′) represents the weight of edge e′, E_v represents the set of edges originating from node v, PR′(v) represents the optimized PageRank value of node v, and T(v) represents the threshold function.

[0046] In modern power systems, due to their complexity and high stability requirements, relying solely on traditional network topology analysis methods is no longer sufficient to meet practical needs. Each node in a power system, such as a substation or power plant, not only has its place in the network structure but also possesses varying importance based on its function and scope of influence. A failure in a major substation can affect the power supply to millions of users.

[0047] In addition, the actual operating conditions of the power system, such as voltage, current and frequency, also have a direct impact on the stability and security of the system. Therefore, two new parameters, the importance of nodes and the operating conditions of the power system at the nodes, are introduced, and their weights in the overall score are adjusted by weighting coefficients.

[0048] Such optimization not only makes the algorithm more targeted, but also provides greater flexibility, enabling it to better adapt to different power system scenarios.

[0049] In a power system, not all nodes are equally important. Some nodes may be critical power supply nodes, while others may be secondary distribution nodes. By introducing a parameter that represents the importance of a node, we can ensure that critical nodes receive higher weights in the PageRank calculation. If a node frequently fails or its power supply capacity is limited, its PageRank value should be reduced accordingly. By introducing a parameter that represents the operating status of a node, we can ensure that the actual operating status of the system is reflected in the PageRank calculation.

[0050] Some nodes may be the main power source for large industrial areas or important facilities. Failure of these nodes may lead to widespread power outages, resulting in huge economic losses. Some nodes may be located on major transportation routes, such as high-speed rail stations and airports. The stable operation of these nodes is crucial to ensuring the normal operation of transportation. When a node is identified as a new critical power supply node or transportation hub, its importance weight is increased; if a node is no longer a critical node, its importance weight is decreased.

[0051] If a node frequently fails, it indicates a potential problem with that node, and its weight should be reduced to reflect its actual importance in the system. Similarly, if some nodes may be temporarily offline due to scheduled maintenance or upgrades, their operational status weight should be temporarily reduced.

[0052] When a node fails, its operational status weight is reduced based on the severity and frequency of the failure. When the node completes maintenance or upgrades and is put back into operation, its operational status weight is restored.

[0053] After the PageRank algorithm outputs its results, cross-validation is used to divide the historical data of the power system into a training set and a validation set. The validation set is used to evaluate the damping coefficient. When the accuracy is greater than a set threshold, the current PageRank algorithm continues the topology analysis of the power system. When the accuracy is less than or equal to the threshold, the importance of nodes and their status in the power system are incorporated into the PageRank algorithm for optimization. The expression is as follows:

[0054]

[0055] Where I(u) represents the importance of node u, S(u) represents the operating state of the power system at node u, and λ and μ are the importance weight coefficient of node u and the importance weight coefficient of the operating state of node u, respectively.
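The expression in [0054] is not reproduced in this text, so the exact way I(u) and S(u) enter the score is unknown. The sketch below shows only the decision logic from [0053] together with one possible adjustment, a weighted additive combination followed by renormalization. That combination, the default values of `lam` (λ) and `mu` (μ), and the function names are all assumptions.

```python
def adjust_scores(pr, importance, state, lam=0.3, mu=0.2):
    """One possible adjustment (assumed form, not the patent's expression):
    add lam * I(u) + mu * S(u) to each score, then renormalize to sum to 1."""
    raw = {u: pr[u] + lam * importance[u] + mu * state[u] for u in pr}
    total = sum(raw.values())
    return {u: s / total for u, s in raw.items()}

def select_scores(pr, accuracy, threshold, importance, state):
    """Keep the current scores if validation accuracy exceeds the threshold,
    otherwise fall back to the importance- and state-adjusted scores."""
    if accuracy > threshold:
        return pr
    return adjust_scores(pr, importance, state)

# Made-up inputs: node "a" is a critical supply node, both nodes are healthy.
pr = {"a": 0.5, "b": 0.5}
importance = {"a": 1.0, "b": 0.0}
state = {"a": 1.0, "b": 1.0}

kept = select_scores(pr, accuracy=0.95, threshold=0.9,
                     importance=importance, state=state)
adjusted = select_scores(pr, accuracy=0.80, threshold=0.9,
                         importance=importance, state=state)
```

When the accuracy is above the threshold the original scores are returned unchanged; otherwise the critical node `a` is ranked above `b`.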

[0056] The topological parameters of each node and edge calculated using the PageRank algorithm are expressed as follows:

[0057] TP(u) = PR′(u) × T(u)

[0058] TP(e)=W(e)×PR′(u)×PR′(v)

[0059] Where TP(u) represents the topology parameters of node u, TP(e) represents the topology parameters of edge e, and T(u) represents the voltage or frequency magnitude of each node.

[0060] The intelligent document analysis platform is used to parse the content of power-related documents. This involves transmitting collected and organized power-related documents to the platform, using machine learning to extract key information as a reference dataset, determining the document type based on its format, structure, and content characteristics, further extracting and analyzing key information from the current document, statistically analyzing various fault scenarios and their corresponding maintenance times, using association rule mining algorithms to analyze the relationships between different devices, operations, and faults, and calculating the topological parameters of each node and edge using the PageRank algorithm. The output results are then transmitted to the intelligent document analysis platform, where the topological parameters are automatically matched to the reference dataset to construct a large-scale power data model.
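The association rule mining step can be illustrated with a pairwise support and confidence calculation. This is a simplified sketch rather than a full Apriori implementation, and the function name `pair_rules`, the thresholds, and the sample records are assumptions.

```python
from itertools import combinations

def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Find rules x -> y between single items.

    support(x, y) = share of records containing both x and y
    confidence(x -> y) = support(x, y) / support(x)
    Returns a list of (x, y, support, confidence) tuples.
    """
    n = len(transactions)
    sets = [set(t) for t in transactions]
    items = sorted(set().union(*sets))
    count = {i: sum(i in s for s in sets) for i in items}

    rules = []
    for a, b in combinations(items, 2):
        both = sum(a in s and b in s for s in sets)
        if both / n < min_support:
            continue
        # Test the rule in both directions.
        for x, y in ((a, b), (b, a)):
            confidence = both / count[x]
            if confidence >= min_confidence:
                rules.append((x, y, both / n, confidence))
    return rules

# Made-up records, each combining a device, an operation, and a fault.
records = [
    {"transformer_T1", "overload", "overheating"},
    {"transformer_T1", "overload", "overheating"},
    {"breaker_B2", "switching", "trip"},
    {"transformer_T1", "overload"},
    {"breaker_B2", "switching", "trip"},
]
rules = pair_rules(records)
```

Here "overheating implies overload" passes the confidence threshold, while the reverse rule does not, because one overload record has no overheating.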

[0061] Power-related documents are converted into a unified format, and irrelevant information is removed using text cleaning techniques. Natural language processing techniques are then used for word segmentation, part-of-speech tagging, and named entity recognition to conduct in-depth analysis of the document content. Documents are classified and filtered based on factors such as type, source, and time. Screening conditions are set, such as document update frequency, keyword frequency, and document author, and the processing steps for different situations are further refined.

[0062] Keyword extraction technology is used to mine key information related to power system operation in documents, and text clustering technology is used to classify similar information, such as fault conditions and maintenance records.
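A minimal frequency-based keyword extractor illustrates the idea. The source does not say which keyword extraction technique is used, so term frequency, the small stopword list, the regular-expression tokenizer (standing in for the word segmentation step), and the sample report are all assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumed, not from the source).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "at",
             "was", "is", "after", "for"}

def extract_keywords(text, top_n=3):
    """Return the top_n most frequent non-stopword terms in the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(top_n)]

# Made-up fault report excerpt.
report = ("Transformer T1 tripped after overload. "
          "The transformer was inspected and the overload relay was reset.")
keywords = extract_keywords(report, top_n=2)
```

For this excerpt the two most frequent terms are "transformer" and "overload".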

[0063] The extracted key information is then linked with data in the graph database.

[0064] Key information includes the operating status of the power system, fault conditions, and maintenance records. The power-related documents include operation manuals, maintenance records, fault reports, and system logs.

[0065] The intelligent document analysis platform automatically matches the reference dataset with topological parameters and outputs the matching results as the predicted values. It also receives feedback information in real time and records it. The accuracy of the prediction is quantified by judging the mean square error and absolute error between the predicted results and the actual values. When the error value exceeds the preset tolerance range, the next step of analysis is performed.

[0066] After confirming that the prediction error exceeds the tolerance range, the duration and frequency of the prediction error are further analyzed. If the prediction error is found to be instantaneous or has a low frequency, it is classified as an occasional error; if the prediction error has a long duration or a high frequency, it is considered a systematic error and requires further analysis.

[0067] For predictions categorized as systematic errors, their impact on the power system is further assessed. By comparing the predictions with historical data and reference datasets, the potential impact on system stability is determined. If the impact is significant or may lead to system instability, a lower overall score is assigned to the prediction; conversely, a higher overall score is assigned, and corresponding operation and maintenance decisions are made. During the analysis process in the intelligent document analysis platform, the causes of unsafe operations in the current data implementation steps are investigated based on the reference dataset, and the reference dataset is updated to optimize the large power data model.

[0068] Example 2

[0069] Referring to Figure 2, as an embodiment of the present invention, a power data large model construction system is provided, including a data processing and storage module, a topology analysis and optimization module, an intelligent document parsing module, an association and model construction module, and a feedback and adaptive update module.

[0070] The data processing and storage module is responsible for collecting power system operation data, performing data preprocessing, and storing the processed data in the graph database.

[0071] The topology analysis and optimization module performs topology analysis on data in the graph database based on the topology and operating status of the power system, calculates the topology parameters of each node and edge, and optimizes the model when the accuracy of the model does not meet the preset threshold.

[0072] The intelligent document parsing module parses the content of power-related documents and extracts key text information related to power system operation data.

[0073] The association and model building module associates the extracted text information with data in the graph database, uses machine learning methods for data analysis and prediction, and forms a large power data model containing multi-dimensional information.

[0074] The feedback and adaptive update module receives real-time feedback on the model's prediction results and performs adaptive updates and optimizations of the model based on the feedback information.
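One way to picture how the five modules fit together is the sketch below. It is illustrative only: the class `PowerModelPipeline`, its field names, and the toy stage functions are hypothetical and are not taken from the patent, which does not specify interfaces between modules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Graph = Dict[str, List[str]]  # adjacency list: equipment id -> connected equipment ids


@dataclass
class PowerModelPipeline:
    # Each field stands in for one module of the system (hypothetical interfaces).
    preprocess: Callable[[List[dict]], Graph]                      # data processing and storage
    analyze_topology: Callable[[Graph], Dict[str, float]]          # topology analysis and optimization
    parse_documents: Callable[[List[str]], Dict[str, List[str]]]   # intelligent document parsing
    build_model: Callable[[Dict[str, float], Dict[str, List[str]]], dict]  # association and model building
    feedback_log: List[Tuple[float, float]] = field(default_factory=list)  # feedback and adaptive update

    def run(self, records: List[dict], documents: List[str]) -> dict:
        """Chain the modules in the order the embodiment lists them."""
        graph = self.preprocess(records)
        params = self.analyze_topology(graph)
        doc_info = self.parse_documents(documents)
        return self.build_model(params, doc_info)

    def record_feedback(self, predicted: float, actual: float) -> float:
        """Store one (predicted, actual) pair and return its absolute error."""
        self.feedback_log.append((predicted, actual))
        return abs(predicted - actual)


if __name__ == "__main__":
    pipe = PowerModelPipeline(
        preprocess=lambda records: {r["id"]: r["neighbors"] for r in records},
        analyze_topology=lambda g: {u: float(len(vs)) for u, vs in g.items()},
        parse_documents=lambda docs: {"T1": [d for d in docs if "T1" in d]},
        build_model=lambda params, info: {
            u: {"param": p, "docs": info.get(u, [])} for u, p in params.items()
        },
    )
    print(pipe.run([{"id": "T1", "neighbors": ["B1"]}, {"id": "B1", "neighbors": []}],
                   ["T1 fault report"]))
```

Passing the stages in as callables keeps each module replaceable, which matches the embodiment's description of the modules as separate functional units.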

[0075] One embodiment of the present invention differs from the previous two embodiments in that: if the function is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0076] The logic and/or steps represented in the flowchart or otherwise described herein can, for example, be considered a sequenced list of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transmit programs for use by, or in conjunction with, an instruction execution system, apparatus, or device.

[0077] More specific examples (a non-exhaustive list) of computer-readable media include the following: electrical connections having one or more wires (electronic devices), portable computer disk cases (magnetic devices), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optic devices, and portable compact disc read-only memory (CD-ROM).

[0078] Alternatively, the computer-readable medium can even be paper or other suitable media on which the program can be printed, as the program can be obtained electronically, for example, by optically scanning the paper or other medium, followed by editing, interpreting, or otherwise processing as necessary, and then stored in computer memory.

[0079] It should be understood that various parts of the present invention can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.

[0080] Example 3

[0081] In this embodiment, in order to verify the beneficial effects of the present invention, scientific demonstration is carried out through economic benefit calculation and simulation experiments.

[0082] Select 1000 data points from the power system, including voltage, current, power, and frequency. Use a median filter to remove noise and interpolation to fill in missing values. Transform all data to the 0-1 range using the min-max normalization method. Based on the power system's connectivity, construct a graph model where nodes represent power equipment and edges represent connections.
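The preprocessing steps in this paragraph can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the function names are hypothetical, missing values are marked with `None`, interpolation is linear and runs before filtering (a median filter cannot operate on missing samples), and the filter uses a window of 3 with edge values repeated as padding. The source does not specify any of these choices.

```python
from statistics import median
from typing import List, Optional


def interpolate_missing(values: List[Optional[float]]) -> List[float]:
    """Fill None entries by linear interpolation; edges take the nearest known value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no known values to interpolate from")
    filled = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(float(values[right]))
        elif right is None:
            filled.append(float(values[left]))
        else:
            frac = (i - left) / (right - left)
            filled.append(values[left] + frac * (values[right] - values[left]))
    return filled


def median_filter(values: List[float], window: int = 3) -> List[float]:
    """Replace each sample by the median of its window (edge samples repeated as padding)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [median(padded[i:i + window]) for i in range(len(values))]


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale values linearly into the 0-1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # constant series: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    voltage = [220.0, None, 222.0, 400.0, 221.0]  # toy samples with one gap and one spike
    cleaned = median_filter(interpolate_missing(voltage))
    print(min_max_normalize(cleaned))  # prints [0.0, 0.5, 1.0, 1.0, 0.5]
```

In the toy series the 400.0 spike is replaced by the window median 222.0 before normalization, so a single outlier does not compress the remaining samples toward zero.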

[0083] The PageRank algorithm is used to calculate a weight value for each node, representing its importance in the power system.
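For orientation, the sketch below computes the standard (unoptimized) PageRank by power iteration on a small directed graph. It is not the optimized expression of this invention, which additionally involves edge weights and a threshold function; the function name `pagerank`, the default damping value 0.85, and the toy equipment names are assumptions made for illustration.

```python
from typing import Dict, List


def pagerank(out_links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Standard PageRank by power iteration; ranks sum to 1."""
    nodes = sorted(set(out_links) | {v for targets in out_links.values() for v in targets})
    n = len(nodes)
    if n == 0:
        return {}
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[u] for u in nodes if not out_links.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            targets = out_links.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        converged = sum(abs(new_rank[u] - rank[u]) for u in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


if __name__ == "__main__":
    # Toy network: one substation connected in both directions to two feeders.
    grid = {
        "substation": ["feeder_1", "feeder_2"],
        "feeder_1": ["substation"],
        "feeder_2": ["substation"],
    }
    for node, value in pagerank(grid).items():
        print(node, round(value, 4))
```

In this toy network the substation receives the highest rank because both feeders link to it, which is the sense in which the weight value reflects a node's importance.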

[0084]

[0085] TP(u) = PR′(u) × T(u)

[0086] TP(e) = W(e) × PR′(u) × PR′(v)
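The two topology parameter expressions above translate directly into code. In this sketch the function names and all numeric values are hypothetical; the PageRank values PR′ are supplied as given inputs rather than computed, and T(u) is taken to be a normalized node measurement such as voltage.

```python
from typing import Dict, Tuple


def node_topology_params(pr: Dict[str, float], t: Dict[str, float]) -> Dict[str, float]:
    """TP(u) = PR'(u) * T(u) for every node u."""
    return {u: pr[u] * t[u] for u in pr}


def edge_topology_params(weights: Dict[Tuple[str, str], float],
                         pr: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """TP(e) = W(e) * PR'(u) * PR'(v) for every edge e = (u, v)."""
    return {(u, v): w * pr[u] * pr[v] for (u, v), w in weights.items()}


if __name__ == "__main__":
    pr = {"a": 0.5, "b": 0.25, "c": 0.25}   # illustrative PR' values
    t = {"a": 1.0, "b": 0.8, "c": 0.4}      # illustrative normalized measurements T(u)
    w = {("a", "b"): 2.0, ("a", "c"): 1.0}  # illustrative edge weights W(e)
    print(node_topology_params(pr, t))
    print(edge_topology_params(w, pr))
```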

[0087] Select 10 documents related to the power system (operation manuals and fault reports), extract keywords and phrases related to power system operation from the documents, and classify the documents into different types such as operation, fault, and maintenance based on the extracted key information.
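A simple stand-in for the extraction and classification step is sketched below. The invention uses machine learning for this; the sketch swaps in a plain keyword count so that it stays self-contained. The keyword lists, the function names, and the "unknown" fallback are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Set

# Hypothetical keyword lists per document type; a real system would learn these.
CATEGORY_KEYWORDS: Dict[str, Set[str]] = {
    "operation": {"switching", "startup", "load", "dispatch"},
    "fault": {"fault", "trip", "outage", "short-circuit"},
    "maintenance": {"inspection", "repair", "replacement", "maintenance"},
}


def extract_keywords(text: str) -> Counter:
    """Count occurrences of known keywords in the text (case-insensitive)."""
    vocabulary = set().union(*CATEGORY_KEYWORDS.values())
    tokens = re.findall(r"[a-z][a-z\-]*", text.lower())
    return Counter(tok for tok in tokens if tok in vocabulary)


def classify_document(text: str) -> str:
    """Return the category with the most keyword hits, or "unknown" if none match."""
    counts = extract_keywords(text)
    scores = {cat: sum(counts[k] for k in kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


if __name__ == "__main__":
    print(classify_document("Transformer T1 trip after short-circuit fault on feeder 3"))
    print(classify_document("Scheduled inspection and repair of breaker"))
```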

[0088] The extracted text information is correlated with power system data; for example, fault reports are associated with corresponding equipment nodes. A portion of the data is used as the training set, and the remainder as the test set to verify the model's accuracy.
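The association and data split described above might look like the following sketch. The record fields `equipment_id` and `summary`, the function names, the 80/20 split, and the fixed random seed are all assumptions; the source states only that part of the data is used for training and the remainder for testing.

```python
import random
from typing import Dict, Iterable, List, Sequence, Tuple


def associate_reports(reports: Iterable[dict], node_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Attach each report summary to the equipment node it names; unmatched reports are skipped."""
    linked: Dict[str, List[str]] = {node: [] for node in node_ids}
    for report in reports:
        node = report["equipment_id"]
        if node in linked:
            linked[node].append(report["summary"])
    return linked


def train_test_split(samples: Sequence, test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[list, list]:
    """Shuffle reproducibly and split into (training set, test set)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    reports = [
        {"equipment_id": "T1", "summary": "winding overheating"},
        {"equipment_id": "B7", "summary": "breaker failed to open"},
        {"equipment_id": "X9", "summary": "unknown device"},  # no matching node
    ]
    print(associate_reports(reports, ["T1", "B7"]))
    train, test = train_test_split(range(10))
    print(len(train), len(test))  # prints 8 2
```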

[0089] Table 1 Data Comparison Table

[0090]

[0091] As shown in Table 1, the method of this invention outperforms the traditional methods on every evaluation metric. Specifically, the accuracy, recall, and F1 score of the method of this invention reached 95%, 93%, and 94%, respectively, while the traditional methods achieved 88%, 85%, and 86%, respectively, a gain of 7, 8, and 8 percentage points.

[0092] Furthermore, the area under the ROC curve increased from 0.90 for the traditional method to 0.97. In terms of topology parameter accuracy and document matching accuracy, the method of this invention achieved 92% and 90%, respectively, while the traditional method achieved 85% and 82%, a gain of 7 and 8 percentage points. This indicates that the method of this invention is more accurate and efficient in processing power system data and document information.
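For readers unfamiliar with the metrics compared above, the sketch below shows how accuracy, recall, and F1 score follow from binary confusion counts. The function name and the counts in the usage example are hypothetical; the source does not report the underlying confusion counts.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from binary confusion counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one sample is required")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Illustrative counts only, not the experiment's data.
    print(classification_metrics(tp=8, fp=2, fn=2, tn=8))
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which is consistent with the reported F1 scores falling close to the reported recall values.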

[0093] It should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all such modifications or substitutions should be covered within the scope of the claims of the present invention.

Claims

1. A method for constructing a large power data model, characterized in that it comprises: collecting operational data from the power system, performing data preprocessing, and constructing a power system data model; performing topology analysis on the data in the graph database, based on the topology and operating status of the power system, to calculate the topology parameters of each node and edge; using the intelligent document analysis platform to parse the content of power-related documents and extract text information related to power system operation data; and associating the extracted text information with data in the graph database to construct a large power data model; wherein the collected power system operation data is imported into a graph database, and corresponding nodes and edges are created; based on the power system's topology and operating status, the PageRank algorithm is used to perform topology analysis on the data in the graph database; the PageRank algorithm expression is as follows: [expression missing from the source text], in which the terms denote, in order: the optimized PageRank value of a node; the damping coefficient; the total number of nodes in the graph; the probability that any node is visited; the set of nodes pointing to the node; the weight of the edge connecting two nodes; the weight of a node relative to an edge; the set of edges starting from a node; the optimized PageRank value of another node; and the threshold function; and wherein the process of using the intelligent document analysis platform to parse power-related documents involves transmitting collected and organized power-related documents to the platform, using machine learning to extract key information from the documents as a reference dataset, determining the type of power-related documents based on their format, structure, and content characteristics, further extracting and analyzing key information from the current documents, statistically analyzing various fault scenarios and their corresponding maintenance times, using association rule mining algorithms to analyze the relationships between different devices, operations, and faults, calculating the topology parameters of each node and edge using the PageRank algorithm, transmitting the output results to the intelligent document analysis platform, and automatically matching the reference dataset using the topology parameters to construct a large power data model.

2. The method for constructing a large power data model as described in claim 1, characterized in that: The collected power system operating data includes voltage, current, power, and frequency. The data preprocessing involves denoising the collected periodic state data, removing missing values, outliers, and invalid data with incorrect formats, adjusting the brightness and contrast of the images, converting the original format data into the format required for demand analysis, normalizing the data to complete the data preprocessing, and storing it in a graph database. The nodes and edges of the graph database are used to represent the components and connections of the power system, thus constructing a power system data model based on a graph structure.

3. The method for constructing a large power data model as described in claim 2, characterized in that: Cross-validation is used to divide historical power system data into training and validation sets. The validation set is used to evaluate the damping coefficient. When the accuracy is greater than a set threshold, the current PageRank algorithm continues the power system topology analysis. When the accuracy is less than or equal to the threshold, the importance of nodes and their states in the power system are incorporated into the PageRank algorithm for optimization. The expression is: [expression missing from the source text], in which the terms denote, in order: the importance of a node; the operating status of the power system at that node; and the weight coefficients for the node's importance and for the node's operating status, respectively.

4. The method for constructing a large power data model as described in claim 3, characterized in that: The calculation of the topology parameters of each node and edge is based on the node numerical output of the iterated PageRank algorithm, expressed as: TP(u) = PR′(u) × T(u) and TP(e) = W(e) × PR′(u) × PR′(v), where TP(u) represents the topology parameter of node u, TP(e) represents the topology parameter of edge e, and T(u) represents the voltage or frequency at node u.

5. The method for constructing a large power data model as described in claim 4, characterized in that: The key information includes the operating status of the power system, fault conditions, and maintenance records. The power-related documents include operation manuals, maintenance records, fault reports, and system logs. The intelligent document analysis platform automatically matches the reference dataset with topology parameters and outputs the matching results as predicted values. It receives feedback information in real time, labels and records it, and judges the mean square error and absolute error between the predicted results and the actual values to quantify the accuracy of the prediction. When the error value exceeds a preset value, the platform analyzes the duration and frequency of the prediction error. If the prediction error is instantaneous or the frequency is less than the preset value, it is classified as an occasional error. If the prediction error is continuous or the frequency is greater than or equal to the preset value, it is determined to be a systematic error. The impact on the power system is further assessed. By comparing the prediction results with historical data and the reference dataset, the platform determines the degree of impact on system stability, assigns a comprehensive score, and formulates corresponding operation and maintenance decisions. During the analysis process in the intelligent document analysis platform, the platform screens unsafe operations in the current data implementation steps based on the reference dataset to trace the causes of problems, updates the reference dataset, and optimizes the power data model.

6. A system employing the method for constructing a large power data model as described in any one of claims 1 to 5, characterized in that: It includes a data processing and storage module, a topology analysis and optimization module, an intelligent document parsing module, an association and model building module, and a feedback and adaptive update module; The data processing and storage module is responsible for collecting the power system's operating data, performing data preprocessing, and storing the processed data in the graph database. The topology analysis and optimization module performs topology analysis on the data in the graph database based on the topology and operating status of the power system, calculates the topology parameters of each node and edge, and optimizes the model when the accuracy of the model does not meet the preset threshold. The intelligent document parsing module parses the content of power-related documents and extracts key text information related to power system operation data. The association and model building module associates the extracted text information with the data in the graph database, uses machine learning methods to perform data analysis and prediction, and forms a large power data model containing multi-dimensional information. The feedback and adaptive update module receives feedback on the model's prediction results in real time and performs adaptive updates and optimizations of the model based on the feedback information.

7. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, When the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 5.

8. A computer-readable storage medium having a computer program stored thereon, characterized in that, When the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 5.