Method, device, medium and equipment for querying safety supervision data of power grid
By constructing a metadata graph, applying intent recognition, and optimizing and validating the generated SQL statements, the method addresses the low accuracy of SQL statements in power grid safety supervision data queries and enables high-accuracy multi-hop queries.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- JIANGXI KECHEN HONGXING INFORMATION TECH CO LTD
- Filing Date
- 2026-06-02
- Publication Date
- 2026-06-30
AI Technical Summary
Existing methods for querying power grid safety supervision data suffer from low accuracy in generating SQL statements, failing to effectively express foreign key relationships and business semantic relationships between tables, resulting in low accuracy for multi-hop queries.
A metadata graph is constructed by acquiring metadata from safety supervision business concepts and the power grid safety supervision database, performing intent recognition and feature extraction, generating candidate SQL statements, and optimizing and verifying them through a statement transformation model, ultimately generating accurate SQL query statements.
It improves the accuracy of multi-hop queries and SQL statements, enabling precise understanding of safety supervision business concepts and cross-system data relationships, and meeting the compliance requirements of strong power grid supervision.
Figure CN122309542A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of safety monitoring data technology, and in particular to a method, device, medium and equipment for querying safety monitoring data of power grids.
Background Technology
[0002] As power grid companies deepen their digital transformation, power grid safety supervision (safety monitoring) operations face the challenge of managing massive amounts of heterogeneous data. Power grid safety monitoring involves multiple systems, including dispatching, operation and maintenance, and marketing, with complex and highly specialized data structures covering key aspects such as operational risk control and hazard investigation and management. Under the business requirements of "strong supervision and high compliance," safety monitoring personnel need to frequently query data across systems to assess risk levels and statistically analyze hazard trends.
[0003] Currently, the industry mainly uses traditional Text2SQL (Natural Language to SQL, Structured Query Language) systems to query safety supervision data. The safety supervision data is stored in the schema (database). After receiving the natural language query, the system converts the natural language query into an SQL statement based on the elements in the schema. The SQL statement then queries the database for matching data.
[0004] Text2SQL query technology has made some progress in general domain benchmark tests, but in the power grid scenario, it mainly relies on basic schema information matching. The schema has static limitations and cannot express multi-dimensional relationships such as foreign key associations between tables and business semantic associations, resulting in low accuracy of multi-hop queries and inaccurate generated SQL statements.
Summary of the Invention
[0005] In view of this, the present invention provides a method, apparatus, medium and equipment for querying power grid safety monitoring data, the main purpose of which is to solve the problem of low accuracy of the SQL statements generated by current methods for querying safety monitoring data.
[0006] According to one aspect of this application, a method for querying power grid safety monitoring data is provided, the method comprising:
obtaining safety supervision business concepts and the metadata of the power grid safety supervision database, and constructing a metadata graph based on the safety supervision business concepts and the metadata of the power grid safety supervision database;
obtaining the user's natural query language, performing intent-recognition-based feature extraction on the natural query language to obtain structured semantic features, and obtaining relevant data from the metadata graph according to the recognized intent;
obtaining constraints, inputting the structured semantic features, the relevant data and the constraints into a preset statement transformation model to obtain multiple candidate SQL statements, and optimizing the multiple candidate SQL statements to obtain an optimized SQL statement; and
validating the optimized SQL statement, using the validated SQL statement as the final query statement, and querying the safety supervision data based on the final query statement.
[0007] Optionally, the metadata in the power grid safety supervision database includes multiple data tables. Constructing the metadata graph based on the safety supervision business concepts and the metadata in the power grid safety supervision database includes: constructing a data structure graph using the table names of the data tables as nodes and the preset business relationships and foreign key relationships between data tables as edges; constructing a business concept graph using the safety supervision business concepts as nodes and the semantic relationships between the safety supervision business concepts as edges; connecting, based on a preset mapping relationship, the nodes in the data structure graph and the business concept graph to obtain a heterogeneous graph; embedding features into the fields corresponding to the nodes in the heterogeneous graph using a hybrid expert field embedding method to obtain the initial field vectors of the nodes; and encoding the initial field vectors of the nodes in the heterogeneous graph to obtain the metadata graph.
[0008] Optionally, the step of performing intent-based feature extraction on the natural query language to obtain structured semantic features includes: The intent of the natural query language is identified based on a preset intent recognition model, wherein the preset intent recognition model includes a semantic feature extractor and a Bayesian classification head, the semantic feature extractor extracts semantic features from the natural query language, and the Bayesian classification head determines the intent based on the semantic features; Based on the determined intent, entities and entity relationships are extracted from the natural query language, and the identified intent, extracted entities, and entity relationships are used as structured semantic features.
[0009] Optionally, optimizing the plurality of candidate SQL statements to obtain an optimized SQL statement includes: Each candidate SQL statement is split into segments, and the split statement segments are treated as multiple nodes on a path in order from front to back. Each path is then used as a branch of the search tree to construct the search tree. The Monte Carlo tree search algorithm is used to perform node searches based on the search tree to determine the optimal SQL statement.
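The search-tree construction and Monte Carlo tree search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: merging shared prefixes of candidates, the `END` sentinel, the UCB1 selection rule, the exploration constant, and the `toy_reward` scoring function are all hypothetical choices, since the text does not specify how segments are scored.

```python
import math
import random

END = "<end>"  # hypothetical sentinel so a candidate that prefixes another still ends at a leaf

class Node:
    def __init__(self, segment=None, parent=None):
        self.segment = segment
        self.parent = parent
        self.children = {}
        self.visits = 0
        self.value = 0.0

def build_search_tree(candidates):
    """Each candidate (a list of segments) becomes one root-to-leaf path."""
    root = Node()
    for segments in candidates:
        node = root
        for seg in list(segments) + [END]:
            if seg not in node.children:
                node.children[seg] = Node(seg, node)
            node = node.children[seg]
    return root

def path_of(node):
    """Segments from the root down to this node, without the sentinel."""
    segments = []
    while node.parent is not None:
        if node.segment != END:
            segments.append(node.segment)
        node = node.parent
    return segments[::-1]

def ucb1(child, parent_visits, c):
    exploit = child.value / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore

def mcts(root, reward, iterations=300, c=1.4, seed=0):
    rng = random.Random(seed)
    for _ in range(iterations):
        node = root
        # Selection: try unvisited children first, otherwise follow UCB1.
        while node.children:
            children = list(node.children.values())
            unvisited = [ch for ch in children if ch.visits == 0]
            if unvisited:
                node = rng.choice(unvisited)
                break
            parent_visits = node.visits
            node = max(children, key=lambda ch: ucb1(ch, parent_visits, c))
        # Simulation: random walk down to a leaf (a complete candidate).
        while node.children:
            node = rng.choice(list(node.children.values()))
        r = reward(path_of(node))
        # Backpropagation: update every node on the walked path.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    # Final choice: follow the most-visited child at each level.
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
    return path_of(node)

def toy_reward(segments):
    """Hypothetical score: prefers the expected table and a WHERE filter."""
    sql = " ".join(segments)
    return 0.5 * ("FROM violation" in sql) + 0.5 * ("WHERE" in sql)

candidates = [
    ["SELECT COUNT(*)", "FROM violation", "WHERE viol_date >= '2024-01-01'"],
    ["SELECT COUNT(*)", "FROM violation"],
    ["SELECT COUNT(*)", "FROM hazard", "WHERE viol_date >= '2024-01-01'"],
]
best = mcts(build_search_tree(candidates), toy_reward)
```

In practice the reward would presumably come from the metadata graph and the verification steps rather than from string matching; the table and column names here are placeholders.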
[0010] Optionally, the verification of the optimized SQL statement includes: The optimized SQL statement is parsed, and the parsed statement is subjected to syntax verification. After the syntax verification is passed, the table names, join operations and aggregation calculation methods in the optimized SQL statement are semantically verified based on the metadata graph. After semantic verification is passed, the optimized SQL statement is subjected to rule compliance verification. After rule compliance verification is passed, the optimized SQL statement is executed in a sandbox environment. During the execution process, anomaly detection, empty result detection, execution performance testing, and counterfactual verification are performed.
[0011] Optionally, after validating the optimized SQL statement and before using the validated SQL statement as the query statement, the method for querying the power grid safety supervision data further includes: Obtain the preference optimization dataset, and train the initial SQL statement fine-tuning model based on the preference optimization dataset to obtain the trained SQL statement fine-tuning model; The validated SQL statement is input into the trained SQL statement fine-tuning model to obtain the fine-tuned SQL statement.
[0012] Optionally, after using the validated SQL statement as the final query statement, the method for querying the power grid safety supervision data further includes: When a new round of natural query statements is received, the first dialogue state and the final query statement corresponding to the natural query language of the previous round are obtained. Based on the second dialogue state of the new round of natural query statements, the first dialogue state, and the final query statement, a Bayesian filtering framework is used to update the second dialogue state. The KL divergence between the first dialogue state and the updated second dialogue state is then calculated. When the KL divergence is greater than a preset first divergence threshold, intent change confirmation information is output. After feedback on the intent change confirmation information is received, the context information in the updated second dialogue state is reset, the first candidate entities in the new round of natural query statements are obtained, the relevance between the referring expression in the new round of natural query statements and each first candidate entity is calculated, a first candidate entity whose relevance is greater than a preset relevance threshold is taken as the reference entity, and the reset second dialogue state is updated based on the reference entity.
When the KL divergence is less than the preset first divergence threshold and greater than a preset second divergence threshold, the context and related information in the updated second dialogue state are adjusted, the second candidate entities in the dialogue history are obtained, the relevance between the referring expression in the new round of natural query statements and each second candidate entity is calculated, a second candidate entity whose relevance is greater than the preset relevance threshold is taken as the reference entity, and the adjusted second dialogue state is updated based on the reference entity.
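The two-threshold decision on the KL divergence can be illustrated with a short sketch. It assumes that a dialogue state can be summarized as a discrete probability distribution over intents and that the divergence is taken from the first state to the updated second state; the threshold values, the intent labels, and the `keep_context` branch for divergences below the second threshold are assumptions, since the text does not state them.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as {label: probability}."""
    total = 0.0
    for label, p_val in p.items():
        if p_val > 0:
            total += p_val * math.log((p_val + eps) / (q.get(label, 0.0) + eps))
    return total

def dialogue_action(first_state, second_state,
                    first_threshold=1.0, second_threshold=0.1):
    """Map the divergence between dialogue states to the handling branch."""
    divergence = kl_divergence(first_state, second_state)
    if divergence > first_threshold:
        return "confirm_intent_change"   # ask the user, then reset the context
    if divergence > second_threshold:
        return "adjust_context"          # resolve references against the dialogue history
    return "keep_context"                # assumed default, not stated in the text

previous = {"statistics": 0.9, "detail": 0.1}
print(dialogue_action(previous, {"statistics": 0.1, "detail": 0.9}))
print(dialogue_action(previous, {"statistics": 0.6, "detail": 0.4}))
```

The first call corresponds to a sharp intent change and the second to a moderate drift, which is the case where the existing context is adjusted rather than reset.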
[0013] According to another aspect of this application, a device for querying power grid safety monitoring data is provided, comprising: The metadata graph construction module is used to obtain metadata from the safety supervision business concept and the power grid safety supervision database, and to construct a metadata graph based on the safety supervision business concept and the metadata in the power grid safety supervision database; The intent recognition module is used to acquire the user's natural query language, perform intent recognition-based feature extraction on the natural query language to obtain structured semantic features, and acquire relevant data in the metadata graph according to the recognized intent. The SQL statement generation module is used to obtain constraints, input the structured semantic features, the relevant data and the constraints into a preset statement conversion model to obtain multiple candidate SQL statements, and optimize the multiple candidate SQL statements to obtain optimized SQL statements. The SQL statement verification module is used to verify the optimized SQL statement, and the verified SQL statement is used as the final query statement to query the safety supervision data.
[0014] Optionally, the metadata graph construction module is further used for: constructing a data structure graph using the table names of the data tables as nodes and the preset business relationships and foreign key relationships between data tables as edges; constructing a business concept graph using the safety supervision business concepts as nodes and the semantic relationships between the safety supervision business concepts as edges; connecting, based on a preset mapping relationship, the nodes in the data structure graph and the business concept graph to obtain a heterogeneous graph; embedding features into the fields corresponding to the nodes in the heterogeneous graph using a hybrid expert field embedding method to obtain the initial field vectors of the nodes; and encoding the initial field vectors of the nodes in the heterogeneous graph to obtain the metadata graph.
[0015] Optionally, the intent recognition module is further configured to: The intent of the natural query language is identified based on a preset intent recognition model, wherein the preset intent recognition model includes a semantic feature extractor and a Bayesian classification head, the semantic feature extractor extracts semantic features from the natural query language, and the Bayesian classification head determines the intent based on the semantic features; Based on the determined intent, entities and entity relationships are extracted from the natural query language, and the identified intent, extracted entities, and entity relationships are used as structured semantic features.
[0016] Optionally, the SQL statement generation module is further configured to: Each candidate SQL statement is split into segments, and the split statement segments are treated as multiple nodes on a path in order from front to back. Each path is then used as a branch of the search tree to construct the search tree. The Monte Carlo tree search algorithm is used to perform node searches based on the search tree to determine the optimal SQL statement.
[0017] Optionally, the SQL statement verification module is further used for: The optimized SQL statement is parsed, and the parsed statement is subjected to syntax verification. After the syntax verification is passed, the table names, join operations and aggregation calculation methods in the optimized SQL statement are semantically verified based on the metadata graph. After semantic verification is passed, the optimized SQL statement is subjected to rule compliance verification. After rule compliance verification is passed, the optimized SQL statement is executed in a sandbox environment. During the execution process, anomaly detection, empty result detection, execution performance testing, and counterfactual verification are performed.
[0018] Optionally, the device for querying power grid safety monitoring data also includes: The fine-tuning module is used to acquire the preference optimization dataset, train the initial SQL statement fine-tuning model based on the preference optimization dataset, and obtain the trained SQL statement fine-tuning model; the validated SQL statement is input into the trained SQL statement fine-tuning model to obtain the fine-tuned SQL statement.
[0019] Optionally, the device for querying power grid safety monitoring data also includes: The semantic processing module is used to obtain, when a new round of natural query statements is received, the first dialogue state and the final query statement corresponding to the natural query language of the previous round. Based on the second dialogue state of the new round of natural query statements, the first dialogue state, and the final query statement, the second dialogue state is updated using a Bayesian filtering framework. The KL divergence between the first dialogue state and the updated second dialogue state is then calculated. When the KL divergence is greater than a preset first divergence threshold, intent change confirmation information is output. After feedback on the intent change confirmation information is received, the context information in the updated second dialogue state is reset, the first candidate entities in the new round of natural query statements are obtained, the relevance between the referring expression in the new round of natural query statements and each first candidate entity is calculated, a first candidate entity whose relevance is greater than a preset relevance threshold is taken as the reference entity, and the reset second dialogue state is updated based on the reference entity. When the KL divergence is less than the preset first divergence threshold and greater than a preset second divergence threshold, the context and related information in the updated second dialogue state are adjusted, the second candidate entities in the dialogue history are obtained, the relevance between the referring expression in the new round of natural query statements and each second candidate entity is calculated, a second candidate entity whose relevance is greater than the preset relevance threshold is taken as the reference entity, and the adjusted second dialogue state is updated based on the reference entity.
[0020] According to another aspect of this application, a storage medium is provided that stores at least one executable instruction, which causes a processor to perform an operation corresponding to the above-described method for querying power grid safety monitoring data.
[0021] According to another aspect of this application, a computer device is provided, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with each other via the communication bus; The memory is used to store at least one executable instruction, which causes the processor to perform the operation corresponding to the above-mentioned power grid safety monitoring data query method.
[0022] By employing the above-described technical solutions, the technical solutions provided by the embodiments of the present invention have at least the following advantages: This application provides a method, apparatus, equipment, and medium for querying power grid safety supervision data. Based on safety supervision business concepts and metadata in the power grid safety supervision database, a metadata graph is constructed. Natural query language is subjected to intent-based feature extraction to obtain structured semantic features. Relevant data is retrieved from the metadata graph according to the identified intent. The structured semantic features, relevant data, and constraints are input into a preset statement conversion model to obtain multiple candidate SQL statements. These candidate SQL statements are optimized to obtain optimized SQL statements. The optimized SQL statements are then verified, and the verified SQL statements are used as the final query statements. Because the metadata graph can reflect the relationship between safety supervision business concepts and metadata in the power grid safety supervision database, it can accurately understand the professional terminology in the safety supervision field and the relationship between cross-system data, improving the accuracy of multi-hop queries, i.e., improving the accuracy of SQL statements. The addition of conditional constraints during SQL statement conversion and the verification of the optimized SQL statements also improve the accuracy of the SQL statements.
[0023] The above description is merely an overview of the technical solution of the present invention. In order to better understand the technical means of the present invention and to implement it in accordance with the contents of the specification, and in order to make the above and other objects, features and advantages of the present invention more apparent and understandable, specific embodiments of the present invention are described below.
Description of the Drawings
[0024] Various other advantages and benefits will become apparent to those skilled in the art upon reading the following detailed description of preferred embodiments. The accompanying drawings are for illustrative purposes only and are not intended to limit the invention. Furthermore, the same reference numerals denote the same parts throughout the drawings. In the drawings:
Figure 1 shows a flowchart of a method for querying power grid safety monitoring data provided in an embodiment of this application;
Figure 2 shows another flowchart of a method for querying power grid safety monitoring data provided in an embodiment of this application;
Figure 3 shows another flowchart of a method for querying power grid safety monitoring data provided in an embodiment of this application;
Figure 4 shows a block diagram of a device for querying power grid safety monitoring data provided in an embodiment of this application;
Figure 5 shows a schematic structural diagram of a computer device provided in an embodiment of the present invention.
[0025] In Figure 4: 402 - Metadata Graph Construction Module; 404 - Intent Recognition Module; 406 - SQL Statement Generation Module; 408 - SQL Statement Verification Module. In Figure 5: 502 - Processor; 504 - Communication interface; 506 - Memory; 508 - Communication bus; 510 - Program.
Detailed Implementation
[0026] The present invention will be described in detail below with reference to the accompanying drawings and embodiments. It should be noted that, unless otherwise specified, the embodiments and features described in the embodiments of the present invention can be combined with each other.
[0027] To further illustrate the technical means and effects adopted by the present invention to achieve the intended purpose, the specific embodiments, structures, features, and effects according to the present invention will be described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different occurrences of "an embodiment" or "one embodiment" do not necessarily refer to the same embodiment. Furthermore, specific features, structures, or characteristics in one or more embodiments can be combined in any suitable form.
[0028] To address the low accuracy of the SQL statements generated by current methods for querying safety monitoring data, this application provides a method for querying power grid safety monitoring data. As shown in Figure 1, the method includes:
102: Obtain safety supervision business concepts and the metadata of the power grid safety supervision database, and construct a metadata graph based on the safety supervision business concepts and the metadata of the power grid safety supervision database;
104: Obtain the user's natural query language, perform intent-based feature extraction on the natural query language to obtain structured semantic features, and obtain relevant data from the metadata graph based on the identified intent;
106: Obtain constraints, input the structured semantic features, relevant data and constraints into the preset statement transformation model to obtain multiple candidate SQL statements, and optimize the multiple candidate SQL statements to obtain the optimized SQL statement;
108: Validate the optimized SQL statement, use the validated SQL statement as the final query statement, and query the safety supervision data based on the final query statement.
[0029] Specifically, safety supervision business concepts are obtained through manual input by safety supervision business experts and parsing of business documents. These concepts include: concept name, business definition, calculation formula, list of associated technical fields, and data source system identifier. Metadata includes database-level metadata, table-level metadata, field-level metadata, constraint-level metadata, and index-level metadata. Based on the safety supervision business concepts and the metadata in the power grid safety supervision database, a metadata graph is constructed. This metadata graph reflects the mapping and association between safety supervision business concepts and technical metadata.
[0030] The system acquires a natural language query request and performs intent recognition and task decomposition on the query. Based on the identified intent, it performs classification, entity recognition, and relationship extraction to obtain structured semantic features, and it retrieves relevant data, business terminology mappings, and historical cases from the metadata graph. The structured semantic features, relevant data, and constraints are input into a preset statement transformation model, which generates multiple candidate SQL statements under preset rule constraints. These candidate SQL statements may contain syntax errors and semantic deviations, so the system optimizes them to obtain an optimized SQL statement, systematically addressing the hallucination problem of large language models in vertical domains from two dimensions: the generation source and the optimization search. The optimized SQL statement undergoes three levels of verification and correction: syntax, semantics, and result. The verified SQL statement is used as the final query statement, and safety supervision data is queried based on it.
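The three verification levels (syntax, semantics, result) can be sketched against an in-memory SQLite database standing in for the sandbox. This is an illustrative assumption only: the text does not name a database engine, and the table `violation`, its columns, and the way failures are classified from SQLite's error messages are hypothetical.

```python
import sqlite3

def verify_sql(sql, sandbox):
    """Run syntax, semantic, and result checks on one SQL statement."""
    try:
        # EXPLAIN compiles the statement without returning its data rows.
        sandbox.execute("EXPLAIN " + sql)
    except sqlite3.Error as exc:
        # SQLite reports unknown tables or columns as "no such ...";
        # anything else at compile time is treated here as a syntax problem.
        stage = "semantic" if "no such" in str(exc) else "syntax"
        return {"ok": False, "stage": stage, "detail": str(exc)}
    rows = sandbox.execute(sql).fetchall()
    if not rows:
        return {"ok": False, "stage": "result", "detail": "empty result"}
    return {"ok": True, "stage": "passed", "detail": rows}

# Hypothetical sandbox schema with one sample record.
sandbox = sqlite3.connect(":memory:")
sandbox.execute("CREATE TABLE violation (viol_id INTEGER PRIMARY KEY, viol_date TEXT)")
sandbox.execute("INSERT INTO violation VALUES (1, '2024-03-05')")

checks = {
    "valid": verify_sql("SELECT viol_id FROM violation", sandbox),
    "typo": verify_sql("SELCT viol_id FROM violation", sandbox),
    "unknown_table": verify_sql("SELECT viol_id FROM hazard", sandbox),
    "empty": verify_sql(
        "SELECT viol_id FROM violation WHERE viol_date > '2999-01-01'", sandbox),
}
```

A fuller version would check table names, joins, and aggregations against the metadata graph rather than relying on the database engine's own name resolution.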
[0031] This application provides a method for querying power grid safety supervision data. Compared with existing technologies, it constructs a metadata graph based on safety supervision business concepts and metadata in the power grid safety supervision database. It then performs intent-based feature extraction on natural query language to obtain structured semantic features. Based on the identified intent, it retrieves relevant data from the metadata graph. The structured semantic features, relevant data, and constraints are input into a preset statement transformation model to obtain multiple candidate SQL statements. These candidate SQL statements are then optimized to obtain optimized SQL statements. The optimized SQL statements are then verified, and the verified SQL statements are used as the final query statements. Because the metadata graph can reflect the relationship between safety supervision business concepts and metadata in the power grid safety supervision database, it can accurately understand the professional terminology in the safety supervision field and the relationships between cross-system data, improving the accuracy of multi-hop queries, i.e., improving the accuracy of SQL statements. The addition of conditional constraints during SQL statement transformation and the verification of the optimized SQL statements also improve the accuracy of the SQL statements.
[0032] In one embodiment of the present invention, as shown in Figure 2, the metadata in the power grid safety supervision database includes multiple data tables, and constructing the metadata graph based on the safety supervision business concepts and the metadata in the power grid safety supervision database includes:
202: Construct a data structure graph using the table names as nodes and the preset business relationships and foreign key relationships between data tables as edges;
204: Construct a business concept graph using safety supervision business concepts as nodes and semantic relationships between safety supervision business concepts as edges;
206: Based on the preset mapping relationship, connect the nodes in the data structure graph and the business concept graph to obtain a heterogeneous graph;
208: Use a hybrid expert field embedding method to embed features of the fields corresponding to the nodes in the heterogeneous graph, obtaining the initial field vectors of the nodes;
210: Encode the initial field vectors of the nodes in the heterogeneous graph to obtain the metadata graph.
[0033] In this embodiment, the construction of the metadata graph is divided into four stages: metadata collection, metadata cleaning and annotation, data structure graph construction, and graph structure encoding.
[0034] Metadata Collection: Connecting to the power grid safety monitoring business database via JDBC (Java Database Connectivity) / ODBC (Open Database Connectivity) protocols to obtain database connection information (database address, port, instance name, authentication credentials, etc., provided by the power grid information technology operation and maintenance department and stored in the system configuration repository). Metadata includes: (1) Database-level metadata: database name, instance information, character set, version number, etc.; (2) Table-level metadata: table name, table comments, database / schema, row count estimate, partition information, creation time, last update time, etc.; (3) Field-level metadata: field name, field comment, data type, whether nullable is allowed, default value, whether it is a primary key / foreign key, etc.; (4) Constraint-level metadata: primary key constraints, foreign key constraints, unique constraints, check constraints, and reference tables and reference fields associated with foreign keys; (5) Index-level metadata: index name, index type, index fields, whether it is a unique index, etc.
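As a rough illustration of the collection step, the sketch below reads table-level, field-level, and foreign-key metadata from a database catalog. It uses Python's built-in `sqlite3` module and SQLite's `PRAGMA` statements purely as a stand-in for the JDBC / ODBC connection described above; the example tables `work_plan` and `violation` are hypothetical.

```python
import sqlite3

def collect_metadata(conn):
    """Collect field-level and foreign-key metadata for every table."""
    metadata = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        fields = [
            {"name": r[1], "type": r[2], "nullable": not r[3],
             "default": r[4], "primary_key": bool(r[5])}
            for r in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        foreign_keys = [
            {"field": r[3], "ref_table": r[2], "ref_field": r[4]}
            for r in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        metadata[table] = {"fields": fields, "foreign_keys": foreign_keys}
    return metadata

# Hypothetical schema used only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE work_plan (
        plan_id INTEGER PRIMARY KEY,
        plan_name TEXT NOT NULL
    );
    CREATE TABLE violation (
        viol_id INTEGER PRIMARY KEY,
        plan_id INTEGER REFERENCES work_plan(plan_id),
        viol_date TEXT
    );
""")
metadata = collect_metadata(conn)
```

With a production database, the same information would typically come from the driver's catalog interface or the `information_schema` views instead.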
[0035] The collected raw metadata has quality issues and needs to be cleaned and processed one by one: (1) Descriptive processing of field names: A large number of fields in the power grid safety supervision database are named using abbreviations. The system first performs semantic restoration of the field names. The specific processing steps are as follows: Based on the pre-built abbreviation dictionary in the field of safety supervision (compiled and maintained by safety supervision business experts, containing about 2,000 common abbreviations-full name mappings), the abbreviation field names are automatically restored to Chinese full name descriptions. For abbreviations not included in the dictionary, the system uses a large language model combined with the field context (the table name to which it belongs, the names of adjacent fields) to infer and complete the abbreviations. The inference results are then manually reviewed and included in the dictionary.
[0036] (2) Field annotation completion: For fields with missing annotations, the system adopts the following strategy to complete them: First, based on the field name and the table name to which it belongs, it searches for semantically similar labeled fields in the metadata graph and refers to their annotation templates; second, it uses the large language model to infer the meaning of the field based on the field name, data type and sample value, and generates annotation candidates; finally, it is manually reviewed and confirmed.
[0037] (3) Data quality inspection and cleaning: Perform data quality inspection on each field, including null value rate statistics (calculate the number of null records / total number of records), data type consistency verification (detect whether there are outliers with mismatched types), value range rationality check (such as whether the date field has future dates or illegal dates), and duplicate data detection. Fields with a null value rate of more than 80% are marked as "low quality fields" and their selection weight is reduced when generating SQL.
[0038] (4) Sample value collection and statistical feature extraction: Collect the 50 most frequent values (top-50) for each field, and calculate the cardinality, null value rate, and data distribution characteristics of the field (the mean, standard deviation, and quantiles for numerical fields; the time span and time distribution density for date fields; the average length and high-frequency words for text fields). These statistical features are stored as instance data in the metadata graph.
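Cleaning steps (1), (3), and (4) above can be sketched in a few lines. The abbreviation dictionary entries, the underscore-based tokenization, and the English expansions are hypothetical placeholders (the text describes restoring names to Chinese full-name descriptions from a dictionary of about 2,000 entries); the 80% null-rate threshold and the top-50 cutoff come from the text.

```python
from collections import Counter
import statistics

# Hypothetical excerpt of the safety supervision abbreviation dictionary.
ABBREVIATIONS = {"viol": "violation", "cnt": "count", "dt": "date"}
LOW_QUALITY_NULL_RATE = 0.8  # fields above this null rate are marked low quality

def restore_field_name(field_name, dictionary=ABBREVIATIONS):
    """Expand known abbreviations; return the description and unknown tokens."""
    tokens = field_name.lower().split("_")
    unknown = [t for t in tokens if t not in dictionary]
    description = " ".join(dictionary.get(t, t) for t in tokens)
    return description, unknown  # unknown tokens would go to LLM inference and review

def profile_field(values, top_k=50):
    """Null rate, quality flag, cardinality, top values, and numeric statistics."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    null_rate = (total - len(non_null)) / total if total else 0.0
    profile = {
        "null_rate": null_rate,
        "low_quality": null_rate > LOW_QUALITY_NULL_RATE,
        "cardinality": len(set(non_null)),
        "top_values": Counter(non_null).most_common(top_k),
    }
    numeric = bool(non_null) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null)
    if numeric and len(non_null) > 1:
        profile["mean"] = statistics.mean(non_null)
        profile["stdev"] = statistics.stdev(non_null)
        profile["quartiles"] = statistics.quantiles(non_null, n=4)
    return profile

name_result = restore_field_name("VIOL_CNT")
numeric_profile = profile_field([1, 2, 2, 3, None])
sparse_profile = profile_field([None] * 9 + ["x"])
```

Date and text fields would need their own branches (time span, average length, and so on), which are omitted here for brevity.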
[0039] Based on the cleaned metadata, a data structure diagram is constructed to achieve full-dimensional coverage of technical metadata and safety supervision business semantics. The data structure diagram includes: Business concepts: Defined by safety supervision business experts, these concepts encompass safety supervision KPIs (Key Performance Indicators) such as "monthly violation rate," "hazard rectification rate," and "work plan execution rate," risk warning levels (Level 1 / 2 / 3 / 4), hazard levels (major / minor), and the "Four Controls" indicator system (controlling plans, teams, personnel, and on-site operations). Each business concept includes: concept name, business definition, calculation formula, a list of associated technical fields, and a data source system identifier. Business concepts are obtained through both manual input by safety supervision business experts and parsing of business documents.
[0040] Semantic association: Establishing a mapping of terminology synonyms (e.g., "outsourced team" = "subcontracting team" = "construction team"), indicator lineage (e.g., "monthly violation rate" is calculated from "number of violations" and "total number of operations"), and business rule mapping (e.g., "risk level" is comprehensively determined by three dimensions: "operation type" + "operation environment" + "control measures"). This data was obtained through the parsing of safety supervision business rule documents and the input of expert knowledge.
[0041] Constraints: Define primary and foreign key association constraints (obtained directly from the collected constraint-level metadata), data quality rules, and safety supervision business constraints (such as "time range conditions must be included when querying risk warning data" and "queries involving personnel information must be anonymized," etc., defined by safety supervision business experts).
[0042] Structure: Stores the underlying structure information of database tables / fields, which is directly mapped from the technical metadata collected in Phase 1, including table name, field name, data type, constraint relationship, etc.
[0043] Instance: Records the sample values, high-frequency values, and data distribution characteristics of each field, obtained from the sample value collection and statistical feature extraction results in Phase Two.
[0044] A data structure graph is constructed using database tables as nodes and the foreign key relationships and business associations between tables as edges. The initial feature of each table node is the mean pooling of the embeddings of its fields. A business concept graph is constructed using safety supervision business concepts as nodes and the semantic associations between concepts as edges. Based on predefined mapping relationships, the nodes of the data structure graph and the business concept graph are connected to obtain a heterogeneous graph.
[0045] The initial feature of each concept node is an attention-weighted combination of the embeddings of its associated technical fields.
[0046] The data structure graph and the business concept graph are merged into a unified heterogeneous graph, and a relational graph convolutional network is used to encode the fields corresponding to the nodes in the heterogeneous graph. For node i in the graph, the hidden state update from layer l to layer l+1 is:
$$ h_i^{(l+1)} = \sigma\left( \sum_{r \in R} \sum_{j \in N_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)} \right) $$
[0047] In the formula: $h_i^{(l+1)}$ is the hidden state of node i at layer l+1; $h_j^{(l)}$ is the hidden state of node j at layer l; $R$ is the set of relation types, which includes power grid safety supervision-specific relation types such as technical associations (primary and foreign keys, inter-table joins), business aliases (synonym mappings), and computational dependencies (indicator lineages); $N_i^r$ is the set of nodes adjacent to node i under relation type r; $c_{i,r}$ is the normalization constant; $W_r^{(l)}$ is the learnable weight matrix of relation type r at layer l; $W_0^{(l)}$ is the self-connection weight matrix; $\sigma$ is the ReLU (Rectified Linear Unit) activation function. This encoding captures the multiple relation types in the metadata, an advantage over traditional flat encoding methods.
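One layer of the relational graph convolution update can be sketched in plain Python as below. The sketch assumes the normalization constant equals the number of neighbors under each relation type, which the text does not specify; the node names, relation names, and weight matrices are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rgcn_layer(h, edges, W_rel, W_self):
    """One relational graph convolution update.

    h:      {node: hidden vector at layer l}
    edges:  {relation: [(source j, target i), ...]}
    W_rel:  {relation: weight matrix for that relation}
    W_self: self-connection weight matrix
    """
    out = {}
    for i, h_i in h.items():
        acc = matvec(W_self, h_i)  # self-connection term
        for r, pairs in edges.items():
            neighbors = [j for (j, target) in pairs if target == i]
            if not neighbors:
                continue
            c = len(neighbors)  # assumed normalization: neighbor count under r
            for j in neighbors:
                msg = matvec(W_rel[r], h[j])
                acc = [a + m / c for a, m in zip(acc, msg)]
        out[i] = [max(0.0, a) for a in acc]  # ReLU
    return out


identity = [[1.0, 0.0], [0.0, 1.0]]
h0 = {"t_risk": [1.0, 0.0], "t_job": [0.0, 2.0], "c_rate": [1.0, 1.0]}
edges = {
    "foreign_key": [("t_job", "t_risk")],     # table-to-table technical association
    "business_alias": [("c_rate", "t_risk")],  # concept-to-table synonym mapping
}
W_rel = {"foreign_key": identity, "business_alias": [[0.5, 0.0], [0.0, 0.5]]}
h1 = rgcn_layer(h0, edges, W_rel, identity)
```

Because each relation type has its own weight matrix, a foreign key edge and a synonym edge into the same table node contribute differently to its updated state.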
[0048] Field features are collaboratively encoded using multi-expert networks. The embedding vector for each field is encoded separately by four expert networks and then weighted and fused using a gating network. The calculation formula is as follows:
[0049]
$$ F = \sum_{k=1}^{4} g_k \, E_k(x), \qquad g = \mathrm{softmax}(W_g x + b_g) $$
[0050] In the formula: $F$ represents the field features produced by the collaborative encoding of the multi-expert network; $x$ is the embedding vector of the field; $k$ indexes the expert networks; $E_1(x)$ to $E_4(x)$ are the field naming features, data type features, annotation features, and value distribution features (embedded representations of the statistical features) encoded by the four expert networks. The field naming features are BERT (Bidirectional Encoder Representations from Transformers) embeddings of field names, the data type features are one-hot encodings and semantic embeddings of data types, and the annotation features are BERT embeddings of field annotations. The weight vector $g$ is the output of the gating network; $W_g$ and $b_g$ are the learnable parameters of the gating network; $g_k$ is the weight that the gating network assigns to the output of the k-th expert network.
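The gated fusion of the four expert outputs can be illustrated with a small sketch. It assumes the gating network is a single linear layer followed by a softmax, which is a common choice but is not confirmed by the text; the expert outputs and parameters below are made-up values.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def gate_weights(x, W_g, b_g):
    """Gating network: one weight per expert from the field embedding x."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W_g, b_g)]
    return softmax(logits)


def fuse(expert_outputs, g):
    """Weighted sum of the expert output vectors."""
    dim = len(expert_outputs[0])
    return [sum(g_k * out[d] for g_k, out in zip(g, expert_outputs)) for d in range(dim)]


x = [0.5, -1.0]              # hypothetical field embedding
W_g = [[0.0, 0.0]] * 4       # zero gate parameters give uniform weights
b_g = [0.0] * 4
g = gate_weights(x, W_g, b_g)
# naming, data type, annotation, and value distribution expert outputs (made up)
experts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
fused = fuse(experts, g)
```

With trained gate parameters the weights would differ per field, for example favoring the annotation expert when a field name is cryptic but its comment is informative.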
[0051] In one embodiment of the present invention, as shown in Figure 3, performing feature extraction based on intent recognition on the natural query language to obtain structured semantic features includes: 302: Perform intent recognition on the natural query language based on a preset intent recognition model, wherein the preset intent recognition model includes a semantic feature extractor and a Bayesian classification head, the semantic feature extractor extracts semantic features from the natural query language, and the Bayesian classification head determines the intent based on the semantic features; 304: Extract the entities and entity relationships in the natural query language based on the determined intent, and use the identified intent, extracted entities, and entity relationships as the structured semantic features.
[0052] Specifically, the intent recognition model uses the DeBERTa-v3-large model (Decoding-enhanced BERT with Disentangled Attention, version 3) to extract deep semantic feature vectors. A Bayesian classification head is added to the model's output layer to achieve probability output and uncertainty quantification through variational inference. The Bayesian classification head uses a Monte Carlo Dropout sampling strategy: during the inference phase it performs T stochastic forward passes (default T=20) and computes statistics over the predicted distribution of each intent category.
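How the T sampled predictions might be aggregated is sketched below. The text does not say which statistics are computed, so the mean probability and the predictive entropy used here are assumptions; the sample vectors stand in for real model outputs.

```python
import math


def summarize_mc_samples(samples):
    """Aggregate T probability vectors from T stochastic forward passes.

    Returns the predicted class index, the mean class probabilities,
    and the predictive entropy (higher means more uncertain).
    """
    t = len(samples)
    n_classes = len(samples[0])
    mean = [sum(s[c] for s in samples) / t for c in range(n_classes)]
    entropy = -sum(p * math.log(p) for p in mean if p > 0)
    predicted = max(range(n_classes), key=lambda c: mean[c])
    return predicted, mean, entropy


# two hypothetical passes over a two-intent problem (T=20 in the text)
predicted, mean_probs, entropy = summarize_mc_samples([[0.8, 0.2], [0.6, 0.4]])
```

A high entropy could be used to route the query to a clarification step instead of committing to a single intent.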
[0053] The loss function employs a hybrid loss design, jointly optimizing classification accuracy, regularization constraints, and uncertainty estimation:
$$ L_{total} = L_{CE} + \lambda_1 L_{KL} + \lambda_2 L_{unc} $$
[0054] where: $L_{total}$ is the total loss; $L_{CE}$ is the standard cross-entropy classification loss; $L_{KL}$ is the KL divergence regularization term, which constrains the posterior distribution to approximate the prior distribution; $L_{unc}$ is the uncertainty loss, which penalizes overconfident predictions under high uncertainty; $\lambda_1$ and $\lambda_2$ are hyperparameters, with default values of 0.1 and 0.05 respectively.
[0055] The intent category system covers eight core intents in power grid safety supervision scenarios: data query (single table query, multi-table association query, statistical aggregation query), risk analysis (risk level distribution, hidden danger trend analysis, early warning data query), operation supervision (operation plan execution rate, "four controls" indicator query), comparative analysis (year-on-year and month-on-month comparison, regional comparison), ranking statistics (unit ranking, personnel ranking), detailed export (condition filtering, field specification), trend prediction (time series trend, anomaly prediction), and comprehensive report (multi-dimensional summary analysis).
[0056] Entity recognition is based on GlobalPointer (global pointer network) combined with RoFormer (Rotary Position Embedding Transformer). GlobalPointer serves as the unified framework for entity recognition, and RoFormer is the underlying encoder. Compared with the traditional BERT position encoding, RoFormer's rotary position embedding better captures long-distance dependencies and models safety supervision query statements containing multiple entities and complex modification relationships more effectively.
[0057] GlobalPointer addresses the entity recognition problem by calculating a score for any two token positions (d, f) in the query that constitute an entity boundary.
[0058] where: the score is the entity-boundary score of the span (d, f); the head representation vector is taken at position d and the tail representation vector at position f; A is the learnable relative position rotation transformation matrix; and ReLU (Rectified Linear Unit) is the rectified linear activation function. When the score exceeds the threshold for an entity type, the span (d, f) is considered to constitute an entity of that type.
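The thresholding step that turns span scores into entities can be sketched as below. The score matrices are hypothetical inputs (how they are computed is left to the encoder), and the entity type names and thresholds are illustrative only.

```python
def decode_spans(scores, thresholds):
    """Return (entity_type, d, f) for every span whose score exceeds its type threshold.

    scores[entity_type][d][f] is the boundary score of the span from token d
    to token f; only spans with d <= f are valid.
    """
    spans = []
    for entity_type, matrix in scores.items():
        for d, row in enumerate(matrix):
            for f in range(d, len(row)):
                if row[f] > thresholds[entity_type]:
                    spans.append((entity_type, d, f))
    return spans


# made-up scores for a three-token query
scores = {
    "time": [[-1, 2, -1], [-1, -1, -1], [-1, -1, -1]],
    "org": [[-1, -1, -1], [-1, -1, -1], [5, -1, 3]],  # the 5 sits at f < d and is ignored
}
spans = decode_spans(scores, {"time": 0.0, "org": 0.0})
```

Scoring every (d, f) pair independently is what lets this style of decoder return nested or overlapping entities without a separate tagging scheme.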
[0059] The entity type system covers 12 types of entities in the field of safety supervision: personnel entities (name, employee number, position), organizational entities (unit name, department name, work group name), time entities (absolute time, relative time, time period), spatial entities (substation name, line name, area name), operation entities (operation type, operation number, operation status), equipment entities (equipment name, equipment number, equipment type), indicator entities (KPI name, statistical caliber), numerical entities (quantity, proportion, amount), hidden danger entities (hidden danger level, hidden danger type, hidden danger number), risk entities (risk level, warning type), legal entities (regulation clauses, standard number), and other entities (fuzzy references, negative terms).
[0060] This invention employs the HyperGraphTransformer architecture to extract relationships among multiple entities. Traditional relationship extraction methods handle only binary relationships (i.e., relationships between two entities), while safety supervision queries often involve multi-dimensional relationships (such as "risk level of a certain type of operation in a certain month for a certain unit," which involves a relationship among four entities: unit, time, operation type, and risk level). Relationship extraction is therefore modeled as a hypergraph construction problem.
[0061] Specifically, a semantic dependency hypergraph is first constructed, where a hyperedge represents a relation tuple, and the nodes in the hyperedge are the set of entities participating in that relation. For the k-th candidate hyperedge $e_k$, its scoring function is:
$$ \alpha_k = \frac{\exp\left(s(e_k)\right)}{\sum_{e' \in H} \exp\left(s(e')\right)} $$
[0062] where: $\alpha_k$ is the normalized attention weight of hyperedge $e_k$; $H$ is the set of all candidate hyperedges; $s(e_k)$ is the relationship score of hyperedge $e_k$, calculated by the HyperGraphTransformer encoder; $s(e')$ is the relationship score of any candidate hyperedge $e'$. The weight $\alpha_k$ is used to select relation tuples with high confidence.
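Selecting high-confidence relation tuples from candidate hyperedge scores can be sketched as follows, assuming the normalization is a softmax over all candidates (an assumption, since the formula itself is not legible in the text). The raw scores and the cut-off `min_weight` are hypothetical.

```python
import math


def select_hyperedges(scores, min_weight):
    """Normalize hyperedge scores with a softmax and keep the confident ones.

    scores: {hyperedge (tuple of entity names): raw relationship score}
    """
    m = max(scores.values())
    exps = {edge: math.exp(s - m) for edge, s in scores.items()}
    total = sum(exps.values())
    weights = {edge: v / total for edge, v in exps.items()}
    kept = [edge for edge, w in weights.items() if w >= min_weight]
    return weights, kept


candidates = {
    ("unit", "month", "operation_type", "risk_level"): 3.0,  # four-entity relation
    ("unit", "risk_level"): 1.0,
    ("month", "operation_type"): 0.0,
}
weights, kept = select_hyperedges(candidates, min_weight=0.5)
```

Here the four-entity hyperedge dominates, so the query is interpreted as one joint relation rather than several independent binary ones.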
[0063] The HyperGraphTransformer encoder employs a hypergraph attention mechanism to aggregate information from entity nodes within each hyperedge, while simultaneously capturing dependencies between relationships through message passing between hyperedges. The core computations of the encoder include: entity feature aggregation within hyperedges (based on a multi-head attention mechanism), relationship feature propagation between hyperedges (based on a graph convolutional network), and query-aware relationship scoring (injecting the semantic representation of query Q as a conditional vector into the scoring function).
[0064] The relationship type system covers a variety of relationships in the field of safety supervision: attribution relationship (personnel-unit, equipment-site), time relationship (event-time, indicator-cycle), spatial relationship (operation-area, hazard-site), causal relationship (violation-risk, hazard-accident), statistical relationship (indicator-calculation formula, data-source table), condition relationship (early warning-triggering condition, level-judgment rule), parallel relationship (parallel relationship between entities of the same type), etc.
[0065] In one embodiment of the present invention, multiple candidate SQL statements are optimized to obtain an optimized SQL statement, including: Each candidate SQL statement is split into segments, and the split statement segments are treated as multiple nodes on a path in order from front to back. Each path is then used as a branch of the search tree to construct the search tree. The Monte Carlo tree search algorithm is used to search for nodes based on the search tree in order to determine the optimal SQL statement.
[0066] Specifically, for candidate SQL statements, Monte Carlo Tree Search (MCTS) is used for iterative optimization. MCTS models SQL optimization as a sequential decision problem, where each decision corresponds to a modification operation in the SQL (such as replacing table names, modifying JOIN conditions, adjusting aggregate functions, etc.).
[0067] The upper confidence bound (UCB) of each node is calculated. In each branch, the selected node at each level is determined from the UCB values in top-down order. When the selected node is an unexpanded node, a new statement fragment is generated from the statement fragment of that node and the foreign key relationships in the metadata graph, and is added as an expanded node. When both the selected node and the expanded node cannot be expanded further, the SQL query statement corresponding to each branch is generated, its execution is simulated, and its reward value is calculated from the execution. Based on the reward value, the visit count and cumulative reward value of the nodes corresponding to the SQL query statement are updated. After the update, the UCB of each node is recalculated and a new SQL query statement is generated for each branch. This repeats until the iteration condition is met. The visit count of each branch is then calculated from the visit counts of its nodes, and the SQL query statement corresponding to the branch with the highest visit count is used as the optimized SQL statement.
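The selection and update steps of the search can be sketched with the standard UCB1 rule. The text does not give its exact confidence formula, so UCB1 and the exploration constant `c` are assumptions; nodes are plain dictionaries for brevity.

```python
import math


def ucb1(total_reward, visits, parent_visits, c=1.414):
    """Upper confidence bound of a node; unvisited nodes are tried first."""
    if visits == 0:
        return float("inf")
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


def select_child(children, parent_visits):
    """Pick the child with the highest upper confidence bound."""
    return max(children, key=lambda ch: ucb1(ch["reward"], ch["visits"], parent_visits))


def backpropagate(path, reward):
    """Update visit count and cumulative reward along the selected path."""
    for node in path:
        node["visits"] += 1
        node["reward"] += reward


root = {"sql": "SELECT ...", "reward": 4.0, "visits": 5}
tried = {"sql": "variant A", "reward": 4.0, "visits": 5}
untried = {"sql": "variant B", "reward": 0.0, "visits": 0}

chosen = select_child([tried, untried], parent_visits=root["visits"])
backpropagate([root, chosen], reward=0.9)
```

After the final iteration, the branch with the highest visit count (not the highest single reward) is returned, which makes the choice robust to one lucky simulation.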
[0068] For each SQL node in the search tree, the reward value combines four components: SQLGlot parsing and verification, evaluation by the business rule engine based on the metadata graph, evaluation of sandbox execution results, and compliance with preset rules:
$$ R = w_1 S_{syn} + w_2 S_{sem} + w_3 S_{exec} + w_4 S_{rule} $$
[0069] where: $R$ is the reward value; $S_{syn}$ is the syntax correctness score (0 or 1, verified by SQLGlot parsing); $S_{sem}$ is the semantic correctness score (a continuous value of 0-1, evaluated by the business rule engine based on the metadata graph); $S_{exec}$ is the execution effectiveness score (a continuous value of 0-1, evaluated from sandbox execution results); $S_{rule}$ is the preset rule compliance score (a continuous value of 0-1, based on checking each of the 6 constraint rules); $w_1$, $w_2$, $w_3$, $w_4$ are the weighting coefficients, with default values of 0.15, 0.35, 0.30, and 0.20.
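The weighted reward can be computed as in the sketch below. The weights are the defaults from the text; treating the rule compliance score as the fraction of the 6 constraint rules that pass is an assumption.

```python
WEIGHTS = {"syntax": 0.15, "semantic": 0.35, "execution": 0.30, "rules": 0.20}


def reward(syntax_ok, semantic, execution, rules_passed, rules_total=6):
    """Weighted sum of the four component scores, each in [0, 1]."""
    scores = {
        "syntax": 1.0 if syntax_ok else 0.0,   # binary: parses or not
        "semantic": semantic,
        "execution": execution,
        "rules": rules_passed / rules_total,   # assumed: fraction of rules passed
    }
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


perfect = reward(True, 1.0, 1.0, 6)
broken = reward(False, 0.5, 0.0, 3)
```

Since the weights sum to 1, the reward stays in [0, 1]; semantic correctness carries the largest weight, so a syntactically valid but semantically wrong statement still scores poorly.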
[0070] Search parameter configuration: the maximum search depth is 5 (i.e., at most 5 rounds of SQL modification), the maximum number of child nodes per node is 3 (at most 3 modification candidates are generated per round), and the maximum number of simulations is 50 (i.e., at most 50 SQL variants are evaluated). The search terminates when an SQL node whose reward value meets the preset termination threshold is found, or when the maximum number of simulations is reached.
[0071] SQL modification operations include: table name replacement (based on metadata graph synonym mapping), field name replacement (based on metadata graph field semantic association), JOIN condition adjustment (based on metadata graph table association path), WHERE (filtering) condition optimization (based on query intent to supplement filter conditions), aggregate function adjustment (based on metadata graph indicator calculation caliber), and subquery expansion / collapse (based on execution plan optimization).
[0072] In one embodiment of the present invention, the optimization of the SQL statement is validated, including: The optimized SQL statement is parsed, and the parsed statement is subjected to syntax verification. After the syntax verification is passed, the semantic verification of table names, join operations and aggregation calculation methods in the optimized SQL statement is performed based on the metadata graph. After semantic validation passes, the optimized SQL statement undergoes rule compliance validation. Once the rule compliance validation passes, the optimized SQL statement is executed in a sandbox environment. During execution, anomaly detection, empty result detection, execution performance testing, and counterfactual verification are performed.
[0073] Specifically, to improve the accuracy of SQL statements and meet the stringent regulatory requirements of the power grid, optimized SQL statements undergo sequential syntax verification, semantic verification, and result verification. Each level of verification triggers a corresponding correction strategy upon identifying an issue, and the corrected SQL statement re-enters the verification process until all three levels of verification are passed or the maximum number of correction rounds is reached.
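The verify-correct-retry loop can be sketched as follows. The validators and the corrector are toy stand-ins (in the described system the correction comes from the large language model and the checks from SQLGlot, the rule engine, and the sandbox); the table name `risk_warning` and the default `max_rounds=3` are hypothetical.

```python
def first_failure(sql, validators):
    """Run the validators in order and return (level, error) for the first failure."""
    for level, check in validators:
        error = check(sql)
        if error is not None:
            return level, error
    return None


def validate_with_correction(sql, validators, correct, max_rounds=3):
    """Re-validate from the first level after every correction."""
    rounds = 0
    while True:
        failure = first_failure(sql, validators)
        if failure is None:
            return sql, True
        if rounds == max_rounds:
            return sql, False
        level, error = failure
        sql = correct(sql, level, error)
        rounds += 1


KNOWN_TABLES = {"risk_warning"}  # hypothetical registered table


def check_syntax(sql):
    return None if sql.upper().startswith("SELECT ") else "statement must start with SELECT"


def check_tables(sql):
    table = sql.split(" FROM ")[1].split()[0]
    return None if table in KNOWN_TABLES else f"unknown table {table}"


def toy_correct(sql, level, error):
    """Stand-in for the model-driven correction step."""
    if level == "syntax":
        return "SELECT " + sql.split(" ", 1)[1]
    return sql.replace("risk_warn ", "risk_warning ")


VALIDATORS = [("syntax", check_syntax), ("semantic", check_tables)]
fixed, ok = validate_with_correction(
    "SELCT id FROM risk_warn WHERE level = 1", VALIDATORS, toy_correct
)
```

Restarting from the first level after each correction matters because a semantic fix can reintroduce a syntax error.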
[0074] Syntax verification method: The SQLGlot library (an SQL parsing and transpilation library) is used to parse the optimized SQL into an abstract syntax tree and verify its syntactic validity. SQLGlot supports syntax parsing for multiple database dialects and can accurately identify and locate syntax errors.
[0075] The verification content covers: keyword spelling correctness, bracket matching correctness, data type compatibility, expression syntax correctness, subquery nesting legality, JOIN syntax correctness, aggregate function usage standardization (grouping consistency), ORDER BY (sorting) field validity, LIMIT / OFFSET (limit / offset) syntax correctness, etc.
[0076] Correction strategy: When a syntax error is detected, the system feeds back the error information (error type, error location, error description) to the large language model, requiring it to correct the syntax error while maintaining semantic integrity. The corrected SQL then re-enters the syntax validation process.
[0077] Semantic validation method: Based on successful syntax validation, the semantic correctness of the SQL is further verified. This application constructs a business rule engine based on metadata graphs, performing semantic validation from the following four dimensions: (1) Table existence verification: Check whether all table names referenced in the SQL are registered in the data structure graph of the metadata graph. For unregistered table names, perform fuzzy matching based on the synonym mapping in the semantic association layer of the metadata graph to recommend the most likely correct table name.
[0078] (2) JOIN (join) validity verification: Check whether the join conditions of all JOIN operations in the SQL are consistent with the table relationships defined in AMG. Specific verification content includes: whether the joined fields do indeed have primary-foreign key relationships or business relationships, whether the join direction is correct (e.g., a child table joins a parent table instead of the other way around), and whether the join path is the shortest (avoiding redundant intermediate table JOINs). For invalid JOINs, the correct JOIN path is recommended based on the graph structure of AMG.
[0079] (3) Business Scope Consistency Verification: Check whether the aggregation calculation logic in SQL is consistent with the calculation scope defined in the business concept layer of the metadata graph. Specific verification content includes: whether the aggregation formula matches the predefined scope (e.g., whether the calculation formula for "violation rate" is "number of violations / total number of operations"), whether the time granularity is correct (e.g., whether monthly indicators are grouped by month), and whether the filtering conditions omit key constraints (e.g., whether specific operation types are excluded when calculating the violation rate).
[0080] (4) Rule compliance verification: Check each SQL statement to see if it conforms to the preset constraint rules, including table / field authenticity, partition pruning, aggregation compliance, JOIN validity, syntax compliance, and data security.
[0081] If verification fails, output a detailed list of violations and suggested corrections.
[0082] Sandbox result verification method: After semantic verification passes, the SQL is executed in a sandbox environment, and the execution results are verified from multiple dimensions. The sandbox environment has the same logical structure as the production database, but the data has undergone anonymization to ensure the security of the verification process.
[0083] Result verification covers the following four detection dimensions: (1) Statistical anomaly detection: Perform statistical feature analysis on the query results to detect whether there are outliers. Specific methods include: range check of numerical results (whether they exceed a reasonable range), null value rate check (whether the proportion of null values is abnormally high), and result set size check (whether the number of returned rows is consistent with the expected magnitude). For example, when querying a unit's monthly violation records, if the returned result is empty or the number of rows is abnormally large (exceeding the historical mean by more than 3 standard deviations), it is marked as an anomaly.
[0084] (2) Counterfactual verification: Counterfactually replace the key conditions in the SQL to verify the reasonableness of the results. The specific method is to invert or replace a certain condition in the WHERE clause with a comparison value, re-execute the SQL, and compare whether the difference between the two results meets business expectations. For example, after querying "A-level risk operation", replace the condition with "C-level risk operation". If the difference in the number of the two results does not conform to common business sense (such as the number of A-level risk operations is more than that of C-level), then mark it as an anomaly.
[0085] (3) Empty Result Detection: When SQL execution returns an empty result set, the system performs root cause analysis to determine whether the data does not exist or is caused by an SQL logic error. The analysis methods include: gradually relaxing the WHERE conditions to check for the existence of data, checking whether the JOIN conditions are over-filtered, and checking whether the time range is reasonable. If it is determined to be an SQL logic error, the correction process is triggered.
[0086] (4) Execution performance detection: Monitor the execution time of SQL in the sandbox. For SQL whose execution time exceeds the threshold (default 30 seconds), analyze its execution plan, identify performance bottlenecks (such as full table scan, nested loop join, missing index, etc.), and generate optimization suggestions.
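The result set size check from dimension (1) can be sketched as below. Reading the criterion as "more than 3 standard deviations above the historical mean" is an interpretation of the text, and the history values are made up.

```python
import statistics


def row_count_anomaly(row_count, history, k=3.0):
    """Flag an empty result or a row count far above the historical mean.

    history: row counts returned by comparable past queries.
    Returns "empty", "too_many_rows", or None when the size looks normal.
    """
    if row_count == 0:
        return "empty"  # handed to the empty-result root cause analysis
    mean = statistics.mean(history)
    std = statistics.pstdev(history)
    if row_count > mean + k * std:
        return "too_many_rows"
    return None


history = [100, 110, 90, 105, 95]  # hypothetical monthly violation record counts
```

An "empty" flag would then trigger the condition-relaxation analysis of dimension (3), while "too_many_rows" often points to a missing filter or an over-broad JOIN.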
[0087] In one embodiment of the present invention, after verifying the optimized SQL statement and before using the verified SQL statement as the query statement, the method for querying power grid safety supervision data further includes: Obtain the preference optimization dataset, and train the initial SQL statement fine-tuning model based on the preference optimization dataset to obtain the trained SQL statement fine-tuning model; The validated SQL statement is input into the trained SQL statement fine-tuning model to obtain the fine-tuned SQL statement.
[0088] In one embodiment, to continuously optimize the quality of generated SQL and make it closer to the actual query habits and business preferences of safety inspectors, a Direct Preference Optimization (DPO) algorithm is introduced to construct a human feedback alignment mechanism.
[0089] DPO loss function: given a query Q and a pair of SQL statements $(y_w, y_l)$, where $y_w$ is the SQL preferred by humans and $y_l$ is the rejected SQL, the DPO loss function is:
$$ L_{DPO} = -\mathbb{E}_{(Q, y_w, y_l)} \left[ \log \sigma\left( \beta \log \frac{\pi_\theta(y_w \mid Q)}{\pi_{ref}(y_w \mid Q)} - \beta \log \frac{\pi_\theta(y_l \mid Q)}{\pi_{ref}(y_l \mid Q)} \right) \right] $$
[0090] where: $L_{DPO}$ is the loss function of the DPO algorithm; $\mathbb{E}$ is the expectation; $\pi_\theta$ is the current policy model (i.e., the SQL generation model to be optimized); $\pi_{ref}$ is the reference model (i.e., the baseline model before optimization); $\beta$ is a temperature parameter that controls the strength of the preference signal, with a default value of 0.1; $\sigma$ is the Sigmoid function. This loss function is optimized directly in the policy space without explicitly training a reward model, making it more stable and efficient than the traditional RLHF (Reinforcement Learning from Human Feedback) method.
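For a single preference pair, the standard DPO loss reduces to a few lines. The sketch below takes sequence log-probabilities as inputs; the numeric values in the usage lines are hypothetical, and the temperature default of 0.1 is the one stated in the text.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one (preferred, rejected) SQL pair.

    logp_*     : log-probability of the SQL under the policy being trained
    ref_logp_* : log-probability of the same SQL under the frozen reference model
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) written in a numerically stable form
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


neutral = dpo_loss(0.0, 0.0, 0.0, 0.0)        # policy equals reference
aligned = dpo_loss(-1.0, -3.0, -2.0, -2.0)    # policy already favors the preferred SQL
misaligned = dpo_loss(-3.0, -1.0, -2.0, -2.0)
```

The loss falls below log 2 only when the policy raises the preferred SQL relative to the reference more than it raises the rejected one, which is the signal the preference pairs provide.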
[0091] Preference data collection mechanism: During daily operation, the system automatically records the following three types of preference data: (1) Implicit preference: when a user manually modifies the SQL generated by the system and then executes it, the modified SQL is used as the preferred sample and the SQL originally generated by the system is used as the rejected sample; (2) Explicit preference: users evaluate the quality of SQL by clicking the like / dislike button on the results display page; (3) Expert annotation: safety supervision business experts regularly review the quality of SQL generated by the system and annotate high-quality and low-quality samples.
[0092] Training Strategy: Incremental training is employed, with model fine-tuning performed weekly. The initial training dataset consists of 500 preference pairs, with approximately 50-100 new preference pairs added weekly thereafter. The training utilizes a LoRA (Low-Rank Adaptation) efficient fine-tuning strategy, updating only a small number of parameters (rank r=16), injecting preference knowledge from the safety supervision domain while maintaining the generalizability of the basic model. After each training iteration, the model improvement is evaluated on a newly collected test set. If significant improvement is observed, the online model is updated; otherwise, the current version is retained.
[0093] The validated SQL statements are input into a SQL generation model aligned with human preferences for fine-tuning. The generated SQL is superior to the baseline model in terms of syntactic correctness, semantic accuracy, and user satisfaction, resulting in SQL statements that continuously approach the quality of human-written SQL.
[0094] In one embodiment of the present invention, after using the verified SQL statement as the final query statement, the method for querying power grid safety supervision data further includes: When a new round of natural query statements is received, the first dialogue state and the final query statement corresponding to the natural query language of the previous round are obtained. Based on the second dialogue state of the new round of natural query statements, the first dialogue state corresponding to the natural query language of the previous round, and the final query statement, the Bayesian filtering framework is used to update the second dialogue state. Calculate the KL divergence between the first dialogue state and the updated second dialogue state. When the KL divergence is greater than the preset first divergence threshold, output the intent change confirmation information. After receiving the feedback information of the intent change confirmation information, reset the context information in the updated second dialogue state, obtain the first candidate entity in the new round of natural query statements, calculate the relevance between the reference in the new round of natural query statements and the first candidate entity, take the first candidate entity with a relevance greater than the preset relevance threshold as the reference entity, and update the reset second dialogue state based on the reference entity. 
When the KL divergence is less than the preset first divergence threshold and greater than the preset second divergence threshold, the context and related information in the updated second dialogue state are adjusted, the second candidate entity in the dialogue history is obtained, the relevance between the reference in the new round of natural query statements and the second candidate entity is calculated, the second candidate entity with a relevance greater than the preset relevance threshold is used as the reference entity, and the adjusted second dialogue state is updated based on the reference entity.
[0095] Specifically, during each round of natural query language dialogue, a dialogue state corresponding to that round of natural query language is generated. It includes the intent distribution of the current dialogue, entity slot filling status, schema (data structure diagram) context, and query history.
[0096] When a new round of natural queries is received, a dialogue state is generated. As the dialogue content increases and the intent becomes clearer, the dialogue state is updated. The state update uses a Bayesian filtering framework. In the t-th round of dialogue, the updated dialogue state is calculated from the observation $o_t$ (i.e., the user's input in round t) and the dialogue state $s_{t-1}$ of the previous round:
$$ b_t(s) = \eta \, P(o_t \mid s) \, P(s \mid s_{t-1}, a_t), \qquad s \in \mathcal{S} $$
[0097] where: $b_t$ represents the current dialogue state; $\mathcal{S}$ is the dialogue state space; $s$ is a candidate dialogue state in $\mathcal{S}$; $s_{t-1}$ is the state of the previous round of dialogue; $a_t$ is the action taken by the system in round t (such as executing SQL, requesting clarification, etc.); $P(o_t \mid s)$ is the observation likelihood, i.e., the probability of generating the observation $o_t$ in candidate dialogue state s; $P(s \mid s_{t-1}, a_t)$ is the state transition probability, i.e., the probability of transitioning from state $s_{t-1}$ to state s under action $a_t$; $\eta$ is the normalization constant.
[0098] The observation likelihood $P(o_t \mid s)$ is calculated from semantic similarity: the observation $o_t$ of the user's current input is semantically matched against the expected input of candidate state s, and the higher the matching degree, the greater the likelihood. The state transition probability $P(s \mid s_{t-1}, a_t)$ is estimated from a combination of historical statistics of dialogue actions and a dialogue strategy model.
[0099] The updated dialogue state is obtained through the above calculations. It contains the latest intent distribution, entity slots, and contextual information.
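A discrete version of the Bayesian filtering update can be sketched as below: each candidate state is weighted by the observation likelihood times the transition probability and the result is normalized. The state names and all probabilities are hypothetical, and lookup tables stand in for the semantic matching and strategy models.

```python
def bayes_update(prev_state, action, observation, states, likelihood, transition):
    """Posterior over candidate dialogue states after one round.

    likelihood[(observation, s)]        : probability of the observation in state s
    transition[(prev_state, action, s)] : probability of moving to s
    """
    unnormalized = {
        s: likelihood[(observation, s)] * transition[(prev_state, action, s)]
        for s in states
    }
    z = sum(unnormalized.values())
    if z == 0:
        raise ValueError("observation is impossible under every candidate state")
    return {s: v / z for s, v in unnormalized.items()}


states = ["query_violations", "analyze_hazards"]
likelihood = {
    ("what hazards does it have", "query_violations"): 0.2,
    ("what hazards does it have", "analyze_hazards"): 0.6,
}
transition = {
    ("query_violations", "execute_sql", "query_violations"): 0.7,
    ("query_violations", "execute_sql", "analyze_hazards"): 0.3,
}
posterior = bayes_update(
    "query_violations", "execute_sql", "what hazards does it have",
    states, likelihood, transition,
)
```

Even though staying in the previous state is more likely a priori, the observation shifts most of the probability mass to the hazard-analysis state.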
[0100] During multiple rounds of dialogue, a user's query intent may shift (e.g., from "inquiring about violations by a certain unit" to "analyzing potential safety hazards in that unit"). It is necessary to detect significant shifts in intent in each round of dialogue in order to adjust retrieval and generation strategies accordingly.
[0101] Intent drift detection uses the Kullback-Leibler (KL) divergence to measure the difference in intent distribution between two adjacent rounds of dialogue:
$$ D_{KL}(P_t \,\|\, P_{t-1}) = \sum_{c \in C} P_t(c) \log \frac{P_t(c)}{P_{t-1}(c)} $$
[0102] where: $D_{KL}(P_t \,\|\, P_{t-1})$ is the KL divergence between the intent of the t-th round of dialogue and the intent of the (t-1)-th round; $P_t$ is the probability distribution of intent in the t-th round; $P_{t-1}$ is the probability distribution of intent in the (t-1)-th round; $C$ is the set of intent categories; $P_t(c)$ is the probability of intent category c in the t-th round; $P_{t-1}(c)$ is the probability of intent category c in the (t-1)-th round. The larger the KL divergence, the greater the difference in intent between the two rounds of dialogue.
[0103] Drift detection strategy: When the KL divergence exceeds the preset first divergence threshold (default 0.5), the intent is determined to have drifted significantly, triggering the following processing flow: (1) send intent change confirmation information to the user to avoid misjudgment; (2) after the user's feedback on the intent change confirmation is received, reset the Schema context in the dialogue state and re-retrieve relevant data based on the new intent; (3) clear the few-shot retrieval cache and re-retrieve similar cases based on the new intent. When the KL divergence lies between the preset second divergence threshold and the preset first divergence threshold (0.2-0.5), the change is determined to be a gradual drift: part of the context is retained and Schema information related to the new intent is added.
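The drift decision can be sketched as a KL divergence followed by two threshold comparisons. The thresholds 0.5 and 0.2 are the defaults from the text; the direction of the divergence (current round relative to previous round), the `eps` guard against zero probabilities, and the label names are assumptions.

```python
import math


def kl_divergence(p_t, p_prev, eps=1e-12):
    """KL divergence of the current intent distribution from the previous one."""
    return sum(
        p * math.log(p / max(q, eps))
        for p, q in zip(p_t, p_prev)
        if p > 0
    )


def classify_drift(kl, first_threshold=0.5, second_threshold=0.2):
    if kl > first_threshold:
        return "significant"  # confirm with the user, then reset the Schema context
    if kl > second_threshold:
        return "gradual"      # keep part of the context, add new Schema information
    return "stable"


previous = [0.9, 0.1]  # hypothetical two-intent distributions
flipped = classify_drift(kl_divergence([0.1, 0.9], previous))
shifted = classify_drift(kl_divergence([0.6, 0.4], previous))
same = classify_drift(kl_divergence(previous, previous))
```

In practice the distributions would cover all eight intent categories rather than two.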
[0104] In multi-turn conversations, users often use pronouns or ellipses to refer to entities mentioned earlier (such as "the unit," "that hidden danger," "last month's," etc.). It is necessary to accurately identify these referents and parse them into specific entity references to improve the accuracy of intent recognition.
[0105] When the KL divergence is greater than the preset first divergence threshold, candidate entities are taken only from the new round of natural query statements (the first candidate entities) in order to avoid incorrect references to stale context, and the referent is resolved against these first candidate entities. When the KL divergence is less than the preset first divergence threshold but greater than the preset second divergence threshold, all identified entities are extracted from the dialogue history as second candidate entities, and the referent is resolved against these second candidate entities.
[0106] For each candidate entity, its semantic relevance to the current referent (pronoun or abbreviated expression) is calculated. Using the synonym mapping, indicator lineage, and business rule mapping in the metadata graph, a multi-dimensional relevance between the referent and the candidate entity is calculated. The relevance calculation jointly considers semantic similarity, contextual distance (a decay based on the distance between the referent and the candidate entity in the dialogue history), and relevance strength. The candidate entity with the highest relevance is selected as the reference resolution result. When the difference between the highest and second-highest relevance is less than a threshold (default 0.1), the reference is determined to be ambiguous, which triggers a clarification dialogue mechanism. Once resolved, the referent is replaced with the specific entity reference and the entity slots in the dialogue state are updated, thus improving the accuracy of the dialogue state.
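The relevance ranking and the ambiguity check can be sketched as follows. The weighted-sum form, the weights, the exponential decay factor, and the entity names are all illustrative assumptions; the embodiment only names the three factors and the default margin of 0.1.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    semantic_similarity: float  # assumed to lie in [0, 1]
    turns_ago: int              # distance in the dialogue history
    relation_strength: float    # assumed to lie in [0, 1]


def relevance(candidate, decay=0.8, weights=(0.5, 0.3, 0.2)):
    """Combine the three factors; the weights and decay are hypothetical."""
    w_sem, w_ctx, w_rel = weights
    context_score = decay ** candidate.turns_ago  # older mentions count less
    return (w_sem * candidate.semantic_similarity
            + w_ctx * context_score
            + w_rel * candidate.relation_strength)


def resolve_reference(candidates, ambiguity_margin=0.1):
    """Return (entity_name, status); status is 'resolved', 'clarify' or 'no_candidate'."""
    if not candidates:
        return None, "no_candidate"
    ranked = sorted(candidates, key=relevance, reverse=True)
    if len(ranked) > 1 and relevance(ranked[0]) - relevance(ranked[1]) < ambiguity_margin:
        return None, "clarify"  # top two are too close: ask the user
    return ranked[0].name, "resolved"


unit_a = Candidate("Unit A", semantic_similarity=0.9, turns_ago=1, relation_strength=0.8)
unit_b = Candidate("Unit B", semantic_similarity=0.4, turns_ago=3, relation_strength=0.3)
unit_c = Candidate("Unit C", semantic_similarity=0.85, turns_ago=1, relation_strength=0.8)

clear_case = resolve_reference([unit_a, unit_b])
ambiguous_case = resolve_reference([unit_a, unit_c])
```

With these made-up scores, "Unit A" clearly outranks "Unit B", whereas "Unit A" and "Unit C" fall within the margin and would trigger the clarification dialogue.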
[0107] Furthermore, as an implementation of the method shown in Figure 1, an embodiment of the invention provides a device for querying power grid safety monitoring data. As shown in Figure 4, the device includes: a metadata graph construction module 402, used to obtain the safety supervision business concepts and the metadata in the power grid safety supervision database, and to construct a metadata graph based on the safety supervision business concepts and the metadata in the power grid safety supervision database; an intent recognition module 404, used to obtain the user's natural query language, perform feature extraction based on intent recognition on the natural query language to obtain structured semantic features, and obtain relevant data in the metadata graph according to the recognized intent; an SQL statement generation module 406, used to obtain constraints, input the structured semantic features, the relevant data, and the constraints into a preset statement conversion model to obtain multiple candidate SQL statements, and optimize the multiple candidate SQL statements to obtain an optimized SQL statement; and an SQL statement verification module 408, used to verify the optimized SQL statement and to use the verified SQL statement as the final query statement for querying the safety supervision data.
[0108] This application provides a query device for power grid safety supervision data. Compared with existing technologies, it constructs a metadata graph based on safety supervision business concepts and metadata in the power grid safety supervision database. It then performs intent-based feature extraction on natural query language to obtain structured semantic features. According to the identified intent, it retrieves relevant data from the metadata graph. The structured semantic features, relevant data, and constraints are input into a preset statement conversion model to obtain multiple candidate SQL statements. These candidate SQL statements are then optimized to obtain an optimized SQL statement. The optimized SQL statement is then verified, and the verified SQL statement is used as the final query statement. Because the metadata graph reflects the relationship between safety supervision business concepts and metadata in the power grid safety supervision database, it improves the accuracy of multi-hop queries, i.e., improves the accuracy of the SQL statement. The addition of conditional constraints during SQL statement conversion and the verification of the optimized SQL statement also improves the accuracy of the SQL statement.
[0109] In one embodiment, the metadata graph construction module is further used to: construct a data structure graph using the table names as nodes and the preset business relationships and foreign key relationships between data tables as edges; construct a business concept graph using the safety supervision business concepts as nodes and the semantic relationships between these concepts as edges; connect the nodes in the data structure graph and the business concept graph based on a preset mapping relationship to obtain a heterogeneous graph; use a hybrid expert field embedding method to embed features of the fields corresponding to the nodes in the heterogeneous graph, obtaining the initial field vectors of the nodes; and encode the initial field vectors of the nodes in the heterogeneous graph to obtain the metadata graph.
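The structural part of this construction (before field embedding and encoding, which are omitted here) can be sketched as a small heterogeneous graph. The table names, concept names, and edge labels below are hypothetical, and the breadth-first search is only one assumed way of deriving a multi-hop join path from such a graph.

```python
from collections import defaultdict, deque


class HeteroGraph:
    """Table nodes and business concept nodes joined by typed, undirected edges."""

    def __init__(self):
        self.node_kind = {}            # node name -> "table" or "concept"
        self.adj = defaultdict(list)   # node name -> [(neighbor, edge_type)]

    def add_node(self, name, kind):
        self.node_kind[name] = kind

    def add_edge(self, a, b, edge_type):
        self.adj[a].append((b, edge_type))
        self.adj[b].append((a, edge_type))

    def tables_for(self, concept):
        """Tables linked to a concept through the preset mapping relationship."""
        return [n for n, t in self.adj[concept] if t == "maps_to"]

    def join_path(self, start, goal):
        """Shortest path between two tables that passes through table nodes only."""
        queue, seen = deque([[start]]), {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for neighbor, _ in self.adj[path[-1]]:
                if neighbor not in seen and self.node_kind[neighbor] == "table":
                    seen.add(neighbor)
                    queue.append(path + [neighbor])
        return None


g = HeteroGraph()
for table in ("unit", "work_plan", "violation_record"):
    g.add_node(table, "table")
for concept in ("responsible unit", "violation"):
    g.add_node(concept, "concept")

g.add_edge("work_plan", "unit", "foreign_key")              # data structure graph
g.add_edge("violation_record", "work_plan", "foreign_key")
g.add_edge("violation", "responsible unit", "semantic")     # business concept graph
g.add_edge("violation", "violation_record", "maps_to")      # preset mapping
g.add_edge("responsible unit", "unit", "maps_to")

source = g.tables_for("violation")[0]
target = g.tables_for("responsible unit")[0]
path = g.join_path(source, target)
```

Here two business concepts resolve to tables that are not directly related, and the graph supplies the intermediate table needed for a two-hop join.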
[0110] In one embodiment, the intent recognition module is further configured to: perform intent recognition on the natural query language based on a preset intent recognition model, which includes a semantic feature extractor and a Bayesian classification head, where the semantic feature extractor extracts semantic features from the natural query language and the Bayesian classification head determines the intent based on the semantic features; and, based on the determined intent, extract entities and entity relationships from the natural query language, and use the identified intent, the extracted entities, and the entity relationships as the structured semantic features.
[0111] In one embodiment, the SQL statement generation module is further used to: split each candidate SQL statement into segments and treat the resulting statement segments, in order from front to back, as the nodes of one path; use each path as a branch of a search tree to construct the search tree; and use the Monte Carlo tree search algorithm to search the nodes of the search tree in order to determine the optimal SQL statement.
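A minimal sketch of this optimization step is given below, under several assumptions: the candidate statements, their clause-level segmentation, and the per-statement scores in `SCORES` are made up for illustration; UCB1 is used as the selection rule; and no candidate is assumed to be a prefix of another. The actual reward used by the embodiment is not described in this passage.

```python
import math
import random


class Node:
    def __init__(self, segment=None, parent=None):
        self.segment, self.parent = segment, parent
        self.children = {}
        self.visits, self.total_reward = 0, 0.0


def build_search_tree(candidates):
    """Each candidate is one root-to-leaf path; shared prefixes share nodes."""
    root = Node()
    for segments in candidates:
        node = root
        for segment in segments:
            if segment not in node.children:
                node.children[segment] = Node(segment, node)
            node = node.children[segment]
    return root


def path_of(node):
    segments = []
    while node.parent is not None:
        segments.append(node.segment)
        node = node.parent
    return tuple(reversed(segments))


def ucb1(child, parent_visits, c=1.4):
    if child.visits == 0:
        return math.inf  # always try unvisited children first
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts_best_statement(candidates, reward_fn, iterations=300, seed=0):
    rng = random.Random(seed)
    root = build_search_tree(candidates)
    for _ in range(iterations):
        node = root
        # Selection: descend by UCB1 while the current node was already visited.
        while node.children and node.visits > 0:
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb1(ch, parent.visits))
        # Simulation: random descent from the selected node to a complete statement.
        leaf = node
        while leaf.children:
            leaf = rng.choice(list(leaf.children.values()))
        reward = reward_fn(path_of(leaf))
        # Backpropagation: update the selected node and all of its ancestors.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    node, best = root, []
    while node.children:  # read out the most-visited branch
        node = max(node.children.values(), key=lambda ch: ch.visits)
        best.append(node.segment)
    return " ".join(best)


CANDIDATES = [
    ("SELECT u.name, COUNT(*)",
     "FROM violation_record v JOIN work_plan w ON v.plan_id = w.id JOIN unit u ON w.unit_id = u.id",
     "GROUP BY u.name"),
    ("SELECT u.name, COUNT(*)",
     "FROM violation_record v JOIN unit u ON v.unit_id = u.id",
     "GROUP BY u.name"),
    ("SELECT COUNT(*)", "FROM violation_record v"),
]
SCORES = {CANDIDATES[0]: 1.0, CANDIDATES[1]: 0.4, CANDIDATES[2]: 0.1}  # hypothetical rewards

best_sql = mcts_best_statement(CANDIDATES, SCORES.__getitem__)
```

Because the first two candidates share a SELECT segment, they share a tree node, so evidence gathered for one also informs how often the other's branch is explored.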
[0112] In one embodiment, the SQL statement verification module is further used to: parse the optimized SQL statement and perform syntax verification on the parsed statement; after the syntax verification passes, perform semantic verification of the table names, join operations, and aggregation calculation methods in the optimized SQL statement based on the metadata graph; after the semantic verification passes, perform rule compliance verification on the optimized SQL statement; and after the rule compliance verification passes, execute the optimized SQL statement in a sandbox environment, performing anomaly detection, empty result detection, execution performance testing, and counterfactual verification during execution.
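The staged, short-circuiting structure of this verification can be sketched with an in-memory SQLite database standing in for the sandbox. Everything specific here is an assumption made for illustration: the schema, the set of known tables (a stand-in for the metadata graph), the read-only rule, the regular expression used to find table names, and the time limit. Join, aggregation, and counterfactual checks are omitted.

```python
import re
import sqlite3
import time


def make_sandbox():
    """Create a throwaway in-memory database with a tiny hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE unit (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE violation_record (id INTEGER PRIMARY KEY, unit_id INTEGER);
        INSERT INTO unit VALUES (1, 'Unit A');
        INSERT INTO violation_record VALUES (10, 1), (11, 1);
    """)
    return conn


def validate(sql, conn, known_tables, max_seconds=1.0):
    """Return (stage, detail); stage is 'passed' or the name of the failing stage."""
    # Stage 1: syntax. EXPLAIN compiles the statement without running it.
    try:
        conn.execute("EXPLAIN " + sql)
    except sqlite3.OperationalError as exc:
        if "syntax error" in str(exc) or "incomplete input" in str(exc):
            return "syntax", str(exc)
    # Stage 2: semantics. Every referenced table must exist in the metadata.
    tables = re.findall(r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)", sql, flags=re.IGNORECASE)
    unknown = [t for t in tables if t.lower() not in known_tables]
    if unknown:
        return "semantic", f"unknown tables: {unknown}"
    # Stage 3: rule compliance. Assumed rule: read-only statements only.
    if not sql.lstrip().upper().startswith("SELECT"):
        return "rule_compliance", "only SELECT statements are allowed"
    # Stage 4: sandbox execution with anomaly, timing, and empty-result checks.
    start = time.perf_counter()
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return "sandbox", f"execution error: {exc}"
    if time.perf_counter() - start > max_seconds:
        return "sandbox", "execution too slow"
    if not rows:
        return "sandbox", "empty result"
    return "passed", rows


KNOWN_TABLES = {"unit", "violation_record"}
sandbox = make_sandbox()
good = validate(
    "SELECT u.name, COUNT(*) FROM violation_record v "
    "JOIN unit u ON v.unit_id = u.id GROUP BY u.name",
    sandbox, KNOWN_TABLES)
bad_syntax = validate("SELEC name FROM unit", sandbox, KNOWN_TABLES)
bad_table = validate("SELECT * FROM ghost", sandbox, KNOWN_TABLES)
bad_rule = validate("DELETE FROM unit", sandbox, KNOWN_TABLES)
empty = validate("SELECT name FROM unit WHERE id = 99", sandbox, KNOWN_TABLES)
```

Ordering the stages from cheapest to most expensive means a statement that fails early never reaches the sandbox.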
[0113] In one embodiment, the device for querying power grid safety monitoring data further includes a fine-tuning module, used to obtain a preference optimization dataset, train an initial SQL statement fine-tuning model on the preference optimization dataset to obtain a trained SQL statement fine-tuning model, and input the validated SQL statement into the trained SQL statement fine-tuning model to obtain a fine-tuned SQL statement.
[0114] In one embodiment, the device for querying power grid safety monitoring data further includes: The semantic processing module is used to obtain the first dialogue state and the final query statement corresponding to the natural query language of the previous round when a new round of natural query statement is received. Based on the second dialogue state of the new round of natural query statement, the first dialogue state corresponding to the natural query language of the previous round, and the final query statement, the second dialogue state is updated using a Bayesian filtering framework. Calculate the KL divergence between the first dialogue state and the updated second dialogue state. When the KL divergence is greater than the preset first divergence threshold, output the intent change confirmation information. After receiving the feedback information of the intent change confirmation information, reset the context information in the updated second dialogue state, obtain the first candidate entity in the new round of natural query statements, calculate the relevance between the reference in the new round of natural query statements and the first candidate entity, take the first candidate entity with a relevance greater than the preset relevance threshold as the reference entity, and update the reset second dialogue state based on the reference entity. When the KL divergence is less than the preset first divergence threshold and greater than the preset second divergence threshold, the context and related information in the updated second dialogue state are adjusted, the second candidate entity in the dialogue history is obtained, the relevance between the reference in the new round of natural query statements and the second candidate entity is calculated, the second candidate entity with a relevance greater than the preset relevance threshold is used as the reference entity, and the adjusted second dialogue state is updated based on the reference entity.
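As one possible reading of the Bayesian filtering update, the sketch below applies a discrete Bayes filter to a belief over intent categories only. This is an assumption: the dialogue state in the embodiment may hold more than an intent distribution, and the transition model (a fixed probability `stay_prob` of keeping the same intent), the likelihood values, and the intent names are all hypothetical.

```python
def bayes_filter_update(prev_belief, observation_likelihood, stay_prob=0.7):
    """One predict-then-update step over a discrete set of intent categories.

    prev_belief: intent -> probability from the previous round (first state).
    observation_likelihood: intent -> likelihood of the new query under that intent.
    stay_prob: assumed probability that the user keeps the same intent.
    """
    intents = list(prev_belief)
    if len(intents) == 1:
        return dict(prev_belief)  # nothing to redistribute
    switch_prob = (1.0 - stay_prob) / (len(intents) - 1)

    # Predict: propagate the previous belief through the transition model.
    predicted = {
        cur: sum(
            prev_belief[prev] * (stay_prob if prev == cur else switch_prob)
            for prev in intents
        )
        for cur in intents
    }

    # Update: weight by the evidence from the new round, then normalize.
    unnormalized = {i: predicted[i] * observation_likelihood.get(i, 0.0) for i in intents}
    total = sum(unnormalized.values())
    if total == 0.0:
        return predicted  # no usable evidence: fall back to the prediction
    return {i: value / total for i, value in unnormalized.items()}


first_state = {"violation_query": 0.8, "hazard_analysis": 0.2}
new_round_likelihood = {"violation_query": 0.1, "hazard_analysis": 0.9}
second_state = bayes_filter_update(first_state, new_round_likelihood)
```

The updated distribution can then be compared with the previous one using the KL divergence to decide between the significant-drift and gradual-drift branches.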
[0115] According to one embodiment of the present invention, a storage medium is provided, the storage medium storing at least one executable instruction, which can execute the power grid safety monitoring data query method in any of the above method embodiments.
[0116] Figure 5 illustrates a structural schematic of a computer device according to an embodiment of the present invention. The specific embodiments of the present invention do not limit the specific implementation of the computer device.
[0117] As shown in Figure 5, the computer device may include: a processor 502, a communication interface 504, a memory 506, and a communication bus 508.
[0118] The processor 502, communication interface 504, and memory 506 communicate with each other via communication bus 508.
[0119] Communication interface 504 is used to communicate with other network elements such as clients or other servers.
[0120] The processor 502 is used to execute program 510, specifically the relevant steps in the above embodiment of the method for querying power grid safety supervision data.
[0121] Specifically, program 510 may include program code that includes computer operation instructions.
[0122] Processor 502 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The computer device includes one or more processors, which may be processors of the same type, such as one or more CPUs; or processors of different types, such as one or more CPUs and one or more ASICs.
[0123] Memory 506 is used to store program 510. Memory 506 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk storage device.
[0124] Specifically, program 510 can be used to cause processor 502 to perform the following operations: Obtain metadata from the safety supervision business concept and the power grid safety supervision database, and construct a metadata graph based on the metadata from the safety supervision business concept and the power grid safety supervision database; The system acquires the user's natural query language, performs intent-based feature extraction on the natural query language to obtain structured semantic features, and retrieves relevant data from the metadata graph based on the identified intent. Obtain constraints, input structured semantic features, relevant data and constraints into a preset statement transformation model to obtain multiple candidate SQL statements, optimize multiple candidate SQL statements to obtain optimized SQL statements; The optimized SQL statement is validated, and the validated SQL statement is used as the final query statement. The safety supervision data is then queried based on the final query statement.
[0125] It is obvious to those skilled in the art that the modules or steps of the present invention described above can be implemented using general-purpose computing devices. They can be centralized on a single computing device or distributed across a network of multiple computing devices. In one embodiment, they can be implemented using device-executable program code, thereby allowing them to be stored in a storage device for execution by a computing device. In some cases, the steps shown or described can be performed in a different order than those presented herein, or they can be fabricated as separate integrated circuit modules, or multiple modules or steps can be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any particular hardware and software combination.
[0126] The above embodiments are merely exemplary embodiments of this application and are not intended to limit this application. The scope of protection of this application is defined by the claims. Those skilled in the art can make various modifications or equivalent substitutions to this application within its substance and scope of protection, and such modifications or equivalent substitutions should also be considered to fall within the scope of protection of this application.
Claims
1. A method for querying power grid safety monitoring data, characterized in that the method comprises: obtaining the safety supervision business concepts and the metadata in the power grid safety supervision database, and constructing a metadata graph based on the safety supervision business concepts and the metadata in the power grid safety supervision database; obtaining the user's natural query language, performing feature extraction based on intent recognition on the natural query language to obtain structured semantic features, and obtaining relevant data from the metadata graph according to the recognized intent; obtaining constraints, inputting the structured semantic features, the relevant data, and the constraints into a preset statement transformation model to obtain multiple candidate SQL statements, and optimizing the multiple candidate SQL statements to obtain an optimized SQL statement; and validating the optimized SQL statement, using the validated SQL statement as the final query statement, and querying the safety supervision data based on the final query statement.
2. The method for querying power grid safety supervision data as described in claim 1, characterized in that the metadata in the power grid safety supervision database includes multiple data tables, and the construction of a metadata graph based on the safety supervision business concepts and the metadata in the power grid safety supervision database includes: constructing a data structure graph using the table names of the data tables as nodes and the preset business relationships and foreign key relationships between the data tables as edges; constructing a business concept graph using the safety supervision business concepts as nodes and the semantic relationships between the safety supervision business concepts as edges; connecting the nodes in the data structure graph and the business concept graph based on a preset mapping relationship to obtain a heterogeneous graph; using a hybrid expert field embedding method to embed features of the fields corresponding to the nodes in the heterogeneous graph to obtain the initial field vectors of the nodes; and encoding the initial field vectors of the nodes in the heterogeneous graph to obtain the metadata graph.
3. The method for querying power grid safety supervision data as described in claim 1, characterized in that, The process of extracting structured semantic features from the natural query language based on intent recognition includes: The intent of the natural query language is identified based on a preset intent recognition model, wherein the preset intent recognition model includes a semantic feature extractor and a Bayesian classification head, the semantic feature extractor extracts semantic features from the natural query language, and the Bayesian classification head determines the intent based on the semantic features; Based on the determined intent, entities and entity relationships are extracted from the natural query language, and the identified intent, extracted entities, and entity relationships are used as structured semantic features.
4. The method for querying power grid safety supervision data as described in claim 1, characterized in that, The optimization of the plurality of candidate SQL statements to obtain the optimized SQL statement includes: Each candidate SQL statement is split into segments, and the split statement segments are treated as multiple nodes on a path in order from front to back. Each path is then used as a branch of the search tree to construct the search tree. The Monte Carlo tree search algorithm is used to perform node searches based on the search tree to determine the optimal SQL statement.
5. The method for querying power grid safety supervision data as described in claim 1, characterized in that, The verification of the optimized SQL statement includes: The optimized SQL statement is parsed, and the parsed statement is subjected to syntax verification. After the syntax verification is passed, the table names, join operations and aggregation calculation methods in the optimized SQL statement are semantically verified based on the metadata graph. After semantic verification is passed, the optimized SQL statement is subjected to rule compliance verification. After rule compliance verification is passed, the optimized SQL statement is executed in a sandbox environment. During the execution process, anomaly detection, empty result detection, execution performance testing, and counterfactual verification are performed.
6. The method for querying power grid safety supervision data as described in claim 1, characterized in that, After validating the optimized SQL statement and before using the validated SQL statement as the query statement, the method for querying the power grid safety supervision data further includes: Obtain the preference optimization dataset, and train the initial SQL statement fine-tuning model based on the preference optimization dataset to obtain the trained SQL statement fine-tuning model; The validated SQL statement is input into the trained SQL statement fine-tuning model to obtain the fine-tuned SQL statement.
7. The method for querying power grid safety supervision data as described in any one of claims 1-6, characterized in that, After using the validated SQL statement as the final query statement, the method for querying the power grid safety supervision data further includes: When a new round of natural query statements is received, the first dialogue state and the final query statement corresponding to the natural query language of the previous round are obtained. Based on the second dialogue state of the new round of natural query statements, the first dialogue state corresponding to the natural query language of the previous round, and the final query statement, the Bayesian filtering framework is used to update the second dialogue state. Calculate the KL divergence between the first dialogue state and the updated second dialogue state. When the KL divergence is greater than a preset first divergence threshold, output intent change confirmation information. After receiving the feedback information of the intent change confirmation information, reset the context information in the updated second dialogue state, obtain the first candidate entity in the new round of natural query statements, calculate the relevance between the referent in the new round of natural query statements and the first candidate entity, take the first candidate entity with a relevance greater than a preset relevance threshold as the referent reference entity, and update the reset second dialogue state based on the referent reference entity. 
When the KL divergence is less than a preset first divergence threshold and greater than a preset second divergence threshold, the context and related information in the updated second dialogue state are adjusted, the second candidate entity in the dialogue history is obtained, the relevance between the reference in the new round of natural query statements and the second candidate entity is calculated, the second candidate entity with a relevance greater than a preset relevance threshold is used as the reference entity, and the adjusted second dialogue state is updated based on the reference entity.
8. A device for querying power grid safety monitoring data, characterized in that the device comprises: a metadata graph construction module, used to obtain the safety supervision business concepts and the metadata in the power grid safety supervision database, and to construct a metadata graph based on the safety supervision business concepts and the metadata in the power grid safety supervision database; an intent recognition module, used to acquire the user's natural query language, perform intent recognition-based feature extraction on the natural query language to obtain structured semantic features, and acquire relevant data in the metadata graph according to the recognized intent; an SQL statement generation module, used to obtain constraints, input the structured semantic features, the relevant data, and the constraints into a preset statement conversion model to obtain multiple candidate SQL statements, and optimize the multiple candidate SQL statements to obtain an optimized SQL statement; and an SQL statement verification module, used to verify the optimized SQL statement and to use the verified SQL statement as the final query statement for querying the safety supervision data.
9. A storage medium storing at least one executable instruction, characterized in that, The executable instructions cause the processor to perform the operation corresponding to the power grid safety monitoring data query method as described in any one of claims 1-7.
10. A computer device, comprising a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with each other via the communication bus, and the memory is used to store at least one executable instruction, characterized in that the executable instruction causes the processor to perform the operation corresponding to the power grid safety supervision data query method as described in any one of claims 1-7.