Method, system and device for interactive query of multi-dimensional data of distributed energy and medium

CN122240758A · Pending · Publication Date: 2026-06-19 · DONGA POWER SUPPLY CO STATE GRID SHANDONG ELECTRIC POWER CO

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
DONGA POWER SUPPLY CO STATE GRID SHANDONG ELECTRIC POWER CO
Filing Date
2026-01-27
Publication Date
2026-06-19

Abstract

This invention relates to the field of data processing technology, specifically providing an interactive query method, system, device, and medium for multidimensional data of distributed energy resources. The method includes: receiving question text and generating a semantic vector based on the question text; querying historical question-and-answer data matching the semantic vector from a long short-term memory pool; if the historical question-and-answer data is found, generating search results based on the historical question-and-answer data; if the historical question-and-answer data is not found, generating a query statement based on the question text using a large model, and obtaining search results by sending the query statement to a database; generating a chart based on the search results, and outputting the chart to a visualization platform. This invention enables interactive querying of multidimensional data of distributed energy resources, eliminating the need for redeveloping pages or manual statistics, saving time and costs, and facilitating decision-making for production and management personnel.

Description

Technical Field

[0001] This invention belongs to the field of data processing technology, and specifically relates to an interactive query method, system, device, and medium for multidimensional data of distributed energy.

Background Technology

[0002] Against the backdrop of digital transformation in the power industry and the construction of new power systems, the large-scale integration of distributed energy resources has led to explosive growth and high heterogeneity of grid data. Distributed resources such as photovoltaics, wind power, energy storage, and electric vehicle charging facilities generate massive amounts of multi-dimensional, high-frequency time-series, status, and environmental data during operation. This data is scattered across Supervisory Control and Data Acquisition (SCADA) systems, Distribution Management Systems (DMS), Energy Management Systems (EMS), electricity consumption information collection systems, meteorological monitoring platforms, and various distributed energy monitoring platforms. Currently, many power companies are building energy data platforms to access, integrate, and manage multi-source heterogeneous data, aiming to create multi-dimensional data assets for the operation, management, and service of distributed energy resources.

[0003] Currently, the display and application of distributed energy data assets mainly rely on traditional business intelligence (BI) tools. These tools visualize operational information such as power generation output, load curves, and equipment status through pre-configured, fixed-dimension dashboards and reports to assist operators and managers in analysis and decision-making. However, this static display method cannot respond flexibly to the need to explore multi-dimensional data in real time, and cannot support dynamic, combined query scenarios such as "comparing the output characteristics of photovoltaic clusters in different weather conditions" or "analyzing the relationship between the charging and discharging behavior of energy storage in a certain area and electricity prices." When operational strategies are adjusted, analysis objectives change, or new regulatory requirements arise, existing reports often cannot be directly reused. Either resources must be reorganized for page development, or data must be manually extracted from multiple systems and compiled into statistics. This process is slow and labor-intensive, making it difficult to adapt to the ad hoc and interactive nature of distributed energy management needs, thus restricting the agile release of data value and the timeliness of business decisions.

Summary of the Invention

[0004] In view of the above-mentioned shortcomings of the prior art, the present invention provides an interactive query method, system, device and medium for multidimensional data of distributed energy, so as to solve the above-mentioned technical problems.

[0005] In a first aspect, the present invention provides an interactive query method for multidimensional data of distributed energy resources, comprising: receiving question text and generating a semantic vector based on the question text; querying historical question-and-answer data matching the semantic vector from a long short-term memory pool; if the historical question-and-answer data is found, generating search results based on the historical question-and-answer data; if the historical question-and-answer data is not found, generating a query statement based on the question text using a large model, and obtaining search results by sending the query statement to a database; and generating a chart based on the search results and outputting the chart to a visualization platform.

[0006] In an optional implementation, receiving question text and generating a semantic vector based on the question text includes: Keywords in user questions are extracted using named entity recognition and semantic role labeling. These keywords include table names, fields, and conditions. The question text is converted into a semantic vector based on the keywords.

[0007] In an optional implementation, the method further includes: storing historical question-and-answer data from within a set time period in the short-term memory pool; and storing historical question-and-answer data that has been marked as valid in the long-term memory pool.

[0008] In an optional implementation, the method further includes: monitoring the access frequency of historical question-and-answer data in the long-term memory pool and using the access frequency as the weight of that data; if the number of historical question-and-answer records in the long-term memory pool reaches a set threshold, deleting the records with the lowest weight; and monitoring the access frequency of historical question-and-answer data in the short-term memory pool, and moving the records whose access frequency reaches a set frequency threshold into the long-term memory pool.

[0009] In an optional implementation, generating a query statement based on the question text using a large model includes: decomposing the complex question into five-dimensional elements, namely time, place, people, task, and progress, and semantically encoding the five-dimensional elements through vectorization; constructing a knowledge graph based on metadata and generating vectorized rules using the large model; constructing cross-table query links using the rules and the knowledge graph; and generating a query statement based on the semantically encoded five-dimensional elements and the matching cross-table query links.

[0010] In one alternative implementation, a knowledge graph is constructed based on metadata, and vectorized rules are generated using a large model, including: Scan the database metadata to extract the database name, table name, field name, and comments; Construct triples based on the library name, table name, field name, and comments, and store the triples in the graph database; A prompt word template for generating rules is pre-configured, and metadata is populated into the prompt word template to obtain prompt words. The metadata includes table structure and field descriptions. The prompt words are input into the large model to obtain rule description text, which defines the association between the question keywords and the metadata; The rule description text is encoded into a rule vector, and the rule vector is saved to the rule vector library.

[0011] In an optional implementation, constructing cross-table query links using the rules and the knowledge graph includes: calculating the cosine similarity between the user's question vector and the vectors in the rule vector library, matching the highest-scoring rule, and determining the main table based on that rule; starting from the main table, finding secondary tables with foreign key relationships by traversing the knowledge graph; extracting the entities from the question and aligning the fields in the secondary tables based on those entities; and constructing a cross-table query link based on the main table, the secondary tables, and the aligned fields.

[0012] In a second aspect, the present invention provides an interactive query system for multidimensional data of distributed energy resources, comprising: a receiving module, used to receive question text and generate a semantic vector based on the question text; a matching module, used to query historical question-and-answer data matching the semantic vector from the long short-term memory pool; a first processing module, used to generate search results based on the historical question-and-answer data if the historical question-and-answer data is found; a second processing module, used to generate a query statement based on the question text using a large model if the historical question-and-answer data is not found, and to obtain the search results by sending the query statement to the database; and a visualization module, used to generate charts based on the search results and output the charts to the visualization platform.

[0013] In a third aspect, a device is provided, comprising: a memory, used to store an interactive query program for multidimensional data of distributed energy; and a processor, configured to implement the steps of the interactive query method for multidimensional data of distributed energy provided in the first aspect when executing the interactive query program.

[0014] In a fourth aspect, a computer-readable medium is provided, on which an interactive query program for multidimensional data of distributed energy is stored, wherein when the interactive query program is executed by a processor, the steps of the interactive query method for multidimensional data of distributed energy provided in the first aspect are implemented.

[0015] The beneficial effects of this invention are as follows: The interactive query method, system, device, and medium for multidimensional distributed energy data provided by this invention, through the technical feature of receiving question text and generating semantic vectors, realize the need for flexible data processing based on user questions, breaking free from the constraints of fixed data dimensions in traditional BI tools. The technical feature of querying matching historical question-and-answer data from the long short-term memory pool enables rapid result retrieval and improves response speed. The technical feature of generating query statements using large models to obtain retrieval results allows for efficient acquisition of required information even without matching historical data. The technical feature of generating charts based on retrieval results and outputting them to a visualization platform enables interactive querying of multidimensional distributed energy data, eliminating the need for redeveloping pages or manual statistics, saving time and costs, and facilitating decision-making for production management personnel.

[0016] Furthermore, the design principle of this invention is reliable, the structure is simple, and it has a very wide range of application prospects.

Description of the Drawings

[0017] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0018] Figure 1 is a schematic flowchart of a method according to an embodiment of the present invention.

[0019] Figure 2 is a schematic block diagram of a system according to an embodiment of the present invention.

[0020] Figure 3 is a schematic diagram of the structure of a device provided in an embodiment of the present invention.

Detailed Implementation

[0021] To enable those skilled in the art to better understand the technical solutions of this invention, the technical solutions of the embodiments of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this invention, and not all embodiments. Based on the embodiments of this invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of this invention.

[0022] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.

[0023] The interactive query method for multidimensional data of distributed energy provided in this embodiment of the invention is executed by a computer device, and correspondingly, the interactive query system for multidimensional data of distributed energy runs on the computer device.

[0024] Figure 1 is a schematic flowchart illustrating a method according to an embodiment of the present invention. The entity executing the method of Figure 1 can be an interactive query system for multidimensional data of distributed energy resources. Depending on different needs, the order of the steps in this flowchart can be changed, and some steps can be omitted.

[0025] As shown in Figure 1, the method includes: S1. Receive the question text and generate a semantic vector based on the question text; S2. Query historical question-and-answer data matching the semantic vector from the long short-term memory pool; S3. If the historical question-and-answer data is found, generate search results based on the historical question-and-answer data; S4. If the historical question-and-answer data is not found, generate a query statement based on the question text using the large model, and obtain the search results by sending the query statement to the database; S5. Generate a chart based on the search results and output the chart to a visualization platform.
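
The control flow of steps S1 to S5 can be sketched as follows. This is only an illustrative outline: the five callables passed in (embed, lookup_memory, generate_query, run_query, render_chart) are hypothetical stand-ins for the components described later in this embodiment and are not named in the source.

```python
from typing import Callable, Optional, Sequence


def answer_question(
    question: str,
    embed: Callable[[str], Sequence[float]],
    lookup_memory: Callable[[Sequence[float]], Optional[list]],
    generate_query: Callable[[str], str],
    run_query: Callable[[str], list],
    render_chart: Callable[[list], dict],
) -> dict:
    """Run one question through the S1-S5 flow and return a chart description."""
    vector = embed(question)              # S1: question text -> semantic vector
    rows = lookup_memory(vector)          # S2: search the long short-term memory pool
    if rows is None:                      # S4: no match, fall back to the large model
        rows = run_query(generate_query(question))
    # S3: on a match, the rows returned by lookup_memory are used directly
    return render_chart(rows)             # S5: build the chart for the platform
```

Passing the components in as parameters keeps the cache-hit path (S3) and the generation path (S4) easy to exercise separately.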

[0026] In one embodiment of the present invention, based on step S1, the following will provide a possible embodiment and describe its specific implementation in a non-limiting manner.

[0027] S101. Extract keywords from user questions using named entity recognition and semantic role labeling. The keywords include table names, fields, and conditions.

[0028] 1. Named Entity Recognition (NER) Implementation.

[0029] Model selection: Use pre-trained BERT series models (such as ERNIE and RoBERTa) to identify table names, field names, and condition values. Data preprocessing: Construct a domain-specific training corpus, which includes database table structure documents and typical query use cases.

[0030] Model optimization: A CRF layer is introduced to handle entity boundary recognition, a multi-task learning framework is constructed, and NER and part-of-speech tagging tasks are trained together.

[0031] 2. Enhanced Semantic Role Labeling (SRL).

[0032] Predicate identification: Define a database operation predicate dictionary: {query, statistics, calculation, update, delete}; use dependency parsing to determine the core position of the predicates.

[0033] Argument structure analysis: Define semantic roles such as ARG0 (agent), ARG1 (patient), and ARG2 (condition).

[0034] Constraints: Field entities must be associated with specific table entities; condition entities must contain two sub-entities: operator and value.

[0035] 3. Keyword integration strategy.

[0036] Entity disambiguation: Resolving field name conflicts based on database schema and context-aware table alias resolution.

[0037] Relationship construction: Construct a triple structure: (table name, field name, condition expression); Example: (user, age, "greater than 18") → (user, age, ">18").
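
As an illustration of the triple construction above, the following sketch normalizes a natural-language condition into an operator-value expression. The phrase table OPERATOR_PHRASES and the function names are assumptions made for this example; the source only gives the "greater than 18" → ">18" case.

```python
import re

# Hypothetical phrase-to-operator table; only "greater than" comes from the source.
OPERATOR_PHRASES = {
    "greater than or equal to": ">=",
    "less than or equal to": "<=",
    "greater than": ">",
    "less than": "<",
    "not equal to": "!=",
    "equal to": "=",
}


def normalize_condition(phrase: str) -> str:
    """Turn e.g. 'greater than 18' into '>18'."""
    text = phrase.strip()
    # Try longer phrases first so "greater than or equal to" wins over "greater than".
    for words in sorted(OPERATOR_PHRASES, key=len, reverse=True):
        match = re.fullmatch(rf"{re.escape(words)}\s+(.+)", text, flags=re.IGNORECASE)
        if match:
            return f"{OPERATOR_PHRASES[words]}{match.group(1)}"
    raise ValueError(f"unrecognized condition: {phrase!r}")


def build_triple(table: str, field: str, condition_phrase: str) -> tuple:
    """Build the (table name, field name, condition expression) triple."""
    return (table, field, normalize_condition(condition_phrase))
```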

[0038] S102. Convert the question text into a semantic vector based on the keywords.

[0039] 1. Vector representation model design.

[0040] Hybrid embedding architecture: Semantic vector = Structure-aware embedding ⊕ Schema knowledge embedding ⊕ Context embedding; Structure-aware embedding: graph embedding based on the keyword relationship graph; Schema knowledge embedding: pre-trained representation of the database schema; Context embedding: BERT-encoded representation of the question text.

[0041] 2. Vector space construction.

[0042] Schema-aware vector space: Table name vectors are mapped to the database schema space; field vectors retain data type and constraint information; condition vectors encode operators and comparison values.

[0043] 3. Vector generation process.

[0044] Keyword vectorization:

    def keyword_to_vector(keyword, entity_type):
        if entity_type == "TABLE":
            # Table name vector = text embedding + schema embedding
            return concat(bert_embed(keyword), schema_embed(keyword))
        elif entity_type == "FIELD":
            # Field vector = text embedding + type embedding + associated table embedding
            field_type = get_field_type(keyword)
            table = get_associated_table(keyword)
            return concat(
                bert_embed(keyword),
                type_embed(field_type),
                table_embed(table)
            )
        elif entity_type == "CONDITION":
            # Condition vector = operator embedding ⊕ value embedding
            op, value = parse_condition(keyword)
            return op_embed(op) + value_embed(value)

[0045] 4. Relationship modeling and vector fusion.

[0046] Graph attention mechanism:

    # Relation-aware vector fusion
    def relation_aware_fusion(keyword_vectors, relation_graph):
        # Build adjacency matrix
        adj_matrix = build_adjacency_matrix(relation_graph)
        # Graph attention layer
        for _ in range(num_layers):
            keyword_vectors = graph_attention_layer(keyword_vectors, adj_matrix)
        # Global pooling to get the final vector
        return global_pooling(keyword_vectors)

[0047] In one embodiment of the present invention, based on step S2, the following will provide a possible embodiment and describe its specific implementation in a non-limiting manner.

[0048] First, construct a long short-term memory pool: (1) Store historical question-and-answer data from within the set time period in the short-term memory pool. Configure a sliding time window (e.g., 7 days) as the storage period, implemented using Redis Sorted Sets with timestamps as scores.

[0049] Data structure:

    STMP = {
        "question_id": {
            "question_text": "original question text",
            "answer_text": "corresponding answer",
            "timestamp": "timestamp",
            "access_count": 0,
            "last_access_time": "last access time",
            "semantic_vector": "question semantic vector",
            "metadata": {
                "user_id": "questioning user ID",
                "source": "data source",
                "confidence_score": 0.85
            }
        }
    }

[0050] Storage engine: Uses Redis cluster to achieve high-speed read and write, and sets a key expiration policy to automatically evict old data.

[0051] (2) Store the historical question-and-answer data that has been marked as valid in the long-term memory pool.

[0052] Validity determination rules: Manual review mark (reviewer confirms the correctness of the answer); Automatic verification mechanism (consistency check between answer and knowledge base); Confidence threshold filtering (e.g., confidence_score ≥ 0.8).

[0053] Update the long short-term memory pool: Long-Term Memory Pool (LTMP) update: Access frequency monitoring: The access frequency weight is calculated using a time decay model: access_frequency = Σ (1/2)^((current time - access time) / half-life), summed over all recorded accesses of the record. The half-life is configurable, for example 7 days. A scheduled task (every hour) recalculates the access frequency for all records.
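
A minimal sketch of this time decay weight, assuming each record keeps a list of its access timestamps (the source does not say how accesses are stored):

```python
from datetime import datetime, timedelta


def access_frequency(access_times, now, half_life=timedelta(days=7)):
    """Sum of (1/2)^(age / half_life) over all recorded accesses."""
    return sum(0.5 ** ((now - t) / half_life) for t in access_times)


now = datetime(2026, 1, 27)
history = [now, now - timedelta(days=7), now - timedelta(days=14)]
weight = access_frequency(history, now)  # 1 + 1/2 + 1/4
```

An access made exactly one half-life ago contributes half as much as one made now, so records that were popular long ago gradually lose weight.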

[0054] Data eviction strategy: When the amount of LTMP data reaches a threshold (e.g., 1 million records), the eviction mechanism is triggered. A min-heap is constructed based on the access frequency weights, and the 10% of records with the lowest weights are deleted. The deletion is a soft deletion, retaining historical versions for auditing purposes.

[0055] Short-Term Memory Pool (STMP) update: Frequency threshold monitoring: The system records the number of accesses for each record in real time. When the number of accesses is greater than or equal to a threshold (e.g., 5 times), the validity verification process is initiated: the answer verification module is called to check the correctness of the answer; the confidence score is calculated; if the score is greater than or equal to the validity threshold, the record is migrated to LTMP.

[0056] Data migration process: Copy the record completely from STMP to LTMP, mark the record as "migrated" in STMP, and automatically delete the migrated record the next time the window slides.
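
The eviction step might look like the sketch below. It assumes the pool is an in-memory dict whose records carry a precomputed "weight", and it represents soft deletion with a "deleted" flag; both are illustrative choices. heapq.nsmallest performs the heap-based selection of the lowest-weight records.

```python
import heapq


def evict_lowest_weight(pool, capacity, percent=10):
    """Soft-delete the lowest-weight `percent` of active records once the
    number of active records reaches `capacity`. Returns the evicted IDs."""
    active = [qid for qid, rec in pool.items() if not rec.get("deleted")]
    if len(active) < capacity:
        return []
    n_evict = max(1, len(active) * percent // 100)
    victims = heapq.nsmallest(n_evict, active, key=lambda qid: pool[qid]["weight"])
    for qid in victims:
        pool[qid]["deleted"] = True  # soft delete: the record is kept for auditing
    return victims
```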
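
The promotion from STMP to LTMP can be sketched as follows, with both pools modeled as plain dicts. The verify_answer callback is a hypothetical hook standing in for the answer verification module and returns a confidence score.

```python
import copy


def record_access(stmp, ltmp, qid, verify_answer,
                  access_threshold=5, validity_threshold=0.8):
    """Count one access to an STMP record; promote it to LTMP if it qualifies.
    Returns True only when the record is migrated by this call."""
    record = stmp[qid]
    record["access_count"] += 1
    if record.get("migrated") or record["access_count"] < access_threshold:
        return False
    score = verify_answer(record)        # hypothetical verification hook
    if score < validity_threshold:
        return False
    promoted = copy.deepcopy(record)     # complete copy into the long-term pool
    promoted.setdefault("metadata", {})["confidence_score"] = score
    ltmp[qid] = promoted
    record["migrated"] = True            # removed later when the window slides
    return True
```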

[0057] Retrieve historical question-and-answer data that matches the semantic vector from the long short-term memory pool: Vector similarity calculation: Cosine similarity and Euclidean distance are calculated together: similarity = 0.7 * cosine_similarity + 0.3 * (1 / (1 + euclidean_distance)). A vector dimension weighting mechanism is introduced to assign higher weights to keyword dimensions such as table names and field names.
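
The combined similarity can be computed as in this sketch, which uses plain Python lists as vectors. The per-dimension weighting mentioned above is omitted because the source does not specify the weights.

```python
import math


def hybrid_similarity(a, b, w_cos=0.7, w_euc=0.3):
    """0.7 * cosine similarity + 0.3 * 1 / (1 + Euclidean distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
    distance = math.dist(a, b)
    return w_cos * cosine + w_euc / (1.0 + distance)
```

Identical vectors score close to 1.0 (cosine 1, distance 0), while orthogonal unit vectors keep only the small distance term.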

[0058] Search process: Prioritize approximate vector retrieval in STMP (using the HNSW algorithm); If there are fewer than 3 results with a similarity of ≥0.8 in STMP, the search continues in LTMP; The two results are merged and sorted in descending order of similarity.
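
The two-stage search order can be illustrated as below. For brevity the sketch scans each pool exhaustively with cosine similarity instead of using an HNSW index, so it shows only the fallback and merge logic, not the approximate retrieval itself. The pools are assumed to be dicts whose records hold a "semantic_vector" list.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_pool(pool, query, min_sim):
    """Return (similarity, question_id) pairs at or above min_sim."""
    hits = ((cosine(query, rec["semantic_vector"]), qid) for qid, rec in pool.items())
    return [hit for hit in hits if hit[0] >= min_sim]


def tiered_search(stmp, ltmp, query, min_sim=0.8, min_hits=3):
    """Search STMP first; consult LTMP only if STMP yields fewer than min_hits."""
    results = search_pool(stmp, query, min_sim)
    if len(results) < min_hits:
        results += search_pool(ltmp, query, min_sim)
    return sorted(results, reverse=True)  # descending similarity
```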

[0059] In one embodiment of the present invention, based on step S3, the following will provide a possible embodiment and describe its specific implementation in a non-limiting manner.

[0060] If the historical question-and-answer data is found, search results are generated based on the historical question-and-answer data: 1. Result aggregation and ranking optimization. Multi-source result fusion strategy: The search results from the Short-Term Memory Pool (STMP) and the Long-Term Memory Pool (LTMP) are merged and deduplicated as follows: if the same question ID exists in both pools, the record with the higher similarity is retained; if the similarity difference is less than 5%, the record from LTMP is selected (because its validity has already been verified). A hybrid ranking model is constructed, taking into account: semantic similarity (50%), historical access frequency (20%), time freshness (20%), and answer validity score (10%).
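
A sketch of the deduplication and hybrid ranking described above. It assumes that each hit is a dict whose four features are already normalized to the range 0 to 1, and that the "5%" rule means an absolute similarity difference of 0.05; the source does not fix either point.

```python
WEIGHTS = {"similarity": 0.5, "frequency": 0.2, "freshness": 0.2, "validity": 0.1}


def merge_hits(stmp_hits, ltmp_hits, tie_margin=0.05):
    """Merge hits keyed by question ID. LTMP wins unless STMP is more similar
    by at least tie_margin."""
    merged = {qid: {**hit, "source": "STMP"} for qid, hit in stmp_hits.items()}
    for qid, hit in ltmp_hits.items():
        current = merged.get(qid)
        if current is None or current["similarity"] - hit["similarity"] < tie_margin:
            merged[qid] = {**hit, "source": "LTMP"}
    return merged


def hybrid_score(hit):
    """Weighted sum of the four ranking features."""
    return sum(weight * hit[name] for name, weight in WEIGHTS.items())


def rank_hits(merged):
    """Question IDs ordered by descending hybrid score."""
    return sorted(merged, key=lambda qid: hybrid_score(merged[qid]), reverse=True)
```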

[0061] Results are displayed in tiers: Exact match (similarity ≥ 0.85): Returns the answer directly and marks it as "confirmed match"; Highly similar (0.7 ≤ similarity < 0.85): Returns the answer with a "possible match" hint and displays the similarity score; Partially similar (0.5 ≤ similarity < 0.7): Returns a list of related questions, allowing users to make a second selection; Low similarity (similarity < 0.5): Returns "No exact match found", but displays the top 3 most relevant questions.
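
The tier boundaries above translate directly into a small classifier. The returned label strings are illustrative names for the four tiers.

```python
def match_tier(similarity: float) -> str:
    """Map a similarity score to one of the four display tiers."""
    if similarity >= 0.85:
        return "confirmed match"      # answer returned directly
    if similarity >= 0.7:
        return "possible match"       # answer plus similarity score
    if similarity >= 0.5:
        return "related questions"    # list for the user to choose from
    return "no exact match"           # top 3 most relevant questions shown
```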

[0062] 2. Answer content enhancement.

[0063] Knowledge updating and integration: Automatically extract the latest information related to the answer from the knowledge base; replace time-sensitive content (such as prices and dates) in real time; append the following note to the end of the answer: "This information was updated on [date]. For the latest updates, please refer to...".

[0064] Handling multiple versions of answers: When there are multiple valid answers to the same question: the latest version of the answer is displayed by default; a collapsible "View historical versions" option is provided; version differences are compared automatically and updated content is highlighted; the history of the answer's evolution is preserved so that the basis for decisions can be traced.

[0065] In one embodiment of the present invention, based on step S4, the following will provide a possible embodiment and describe its specific implementation in a non-limiting manner.

[0066] S401. Decompose the complex problem into five-dimensional elements, which include time, place, people, tasks, and progress, and semantically encode the five-dimensional elements through vectorization.

[0067] (1) Element extraction: Named Entity Recognition (NER): Using a pre-trained Transformer model (such as BERT) through a toolkit such as spaCy or FLAIR, time, place, and people are identified after fine-tuning on domain data.

[0068] Task and progress extraction: Dependency parsing: Extract verb phrases from the question (e.g., “statistics sales” → “statistics”) as the task.

[0069] Semantic role labeling: The "progress" status is determined by the PropBank / SRL model (e.g., "data from the last three months" → time + progress).

[0070] Multi-label classification: Construct a five-dimensional element classifier to map extracted entities to predefined labels (time / location, etc.).

[0071] (2) Vectorization of user questions: Semantic embedding: Use Sentence-BERT (SBERT) or SimCSE to generate vector representations of the question sentences (such as 384-dimensional vectors).

[0072] Element fusion coding: The labels of the five elements (such as time: 2023) are converted into feature vectors and concatenated with the question sentence vector.

[0073] The final semantic vector (e.g., 256-dimensional) is generated by dimensionality reduction through a fully connected layer.

[0074] S402. Construct a knowledge graph based on metadata and generate vectorized rules using a large model.

[0075] Scan the database metadata to extract the database name, table name, field name, and comments; Construct triples based on the database name, table name, field name, and comments, and store the triples in the graph database; for example, a triple is table A, association_field, table B, fieldX.

[0076] A prompt word template for generating rules is pre-configured, and metadata is populated into the prompt word template to obtain prompt words. The metadata includes table structure and field descriptions. The prompt words are input into the large model to obtain rule description text. This rule description text defines the association between the question keywords and metadata. For example, "If the question contains 'business volume' and involves 'time', prioritize associating the 'sale_amount' and 'date' fields of the 'sales_fact' table." The rule description text is encoded into a rule vector, and the rule vector is saved to the rule vector library.

[0077] S403. Construct cross-table query links through rules and knowledge graphs.

[0078] (1) Calculate the cosine similarity between the user question vector and the rule vector library, match the highest score rule, and determine the main table based on the highest score rule; Example: Question "Business volume growth" matches the rule "Related sales_fact table".
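
Rule matching by cosine similarity can be sketched as follows. The rule library layout (a list of dicts with "vector" and "main_table" keys) and the two-dimensional toy vectors are assumptions for illustration; real rule vectors would come from the encoder described in S402.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_rule(question_vector, rule_library):
    """Return the rule whose vector is most similar to the question vector."""
    return max(rule_library, key=lambda rule: cosine(question_vector, rule["vector"]))


rule_library = [
    {"text": "Related sales_fact table", "main_table": "sales_fact", "vector": [1.0, 0.0]},
    {"text": "Related region_dim table", "main_table": "region_dim", "vector": [0.0, 1.0]},
]
main_table = match_rule([0.9, 0.2], rule_library)["main_table"]
```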

[0079] (2) Starting from the main table, find the secondary tables with foreign key relationships by traversing the knowledge graph. (3) Extract the entities from the question and align the fields in the secondary tables based on the entities: Synonym matching: Use a predefined field thesaurus (e.g., "Business Volume = Revenue = Transaction Volume") to align the descriptions in user questions with the physical field names.

[0080] Example: The user description "customer address" matches the shipping_address field in the user_info table.
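
Synonym matching against a field thesaurus can be sketched as a simple lookup. The thesaurus contents below are hypothetical and only mirror the examples in the text.

```python
# Hypothetical thesaurus: physical field -> user-facing synonyms (lower case).
FIELD_THESAURUS = {
    "sales_fact.sale_amount": {"business volume", "revenue", "transaction volume"},
    "user_info.shipping_address": {"customer address"},
}


def align_field(description: str):
    """Return the physical field matching a user description, or None."""
    key = description.strip().lower()
    for field, synonyms in FIELD_THESAURUS.items():
        if key in synonyms:
            return field
    return None
```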

[0081] Type casting: Automatic handling when the types of related fields are inconsistent: Numeric compatibility: Convert string foreign keys (e.g., region_code: 'BJ') to integer primary keys (e.g., region_dim.id: 101) using a mapping table.

[0082] Time formatting: unify the precision of sale_date(DATE) and log_time(TIMESTAMP) (e.g., truncate to the day).

[0083] (4) Construct cross-table query links based on the main table, the secondary table, and the aligned fields.

[0084] Derivation of association conditions: Generate ON clauses based on foreign key relationships in the knowledge graph. For example, the main table sales_fact is associated with region_dim through: ON sales_fact.region_id = region_dim.id.

[0085] Multi-table join optimization: Avoid Cartesian products: Only join tables that are actually needed for the user's problem. For example, if there is no need for "department information", do not join the department table.

[0086] Join order selection: join smaller tables first (e.g., read region_dim first and then filter sales_fact) to reduce the size of intermediate result sets.
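
One way to derive the JOIN clauses is a breadth-first traversal over the foreign key edges, keeping only the tables on a path to something the question needs. In this sketch the FOREIGN_KEYS list stands in for the knowledge graph; the department_dim table and department_id column are hypothetical additions, and size-based join ordering is not implemented.

```python
from collections import deque

# (table, column, referenced_table, referenced_column)
FOREIGN_KEYS = [
    ("sales_fact", "region_id", "region_dim", "id"),
    ("sales_fact", "sales_id", "employee_dim", "id"),
    ("employee_dim", "department_id", "department_dim", "id"),  # hypothetical
]


def join_clauses(main_table, needed_tables):
    """Return JOIN clauses linking main_table to each needed table only."""
    adjacency = {}
    for table, col, ref_table, ref_col in FOREIGN_KEYS:
        condition = f"{table}.{col} = {ref_table}.{ref_col}"
        adjacency.setdefault(table, []).append((ref_table, condition))
        adjacency.setdefault(ref_table, []).append((table, condition))
    parent = {main_table: None}          # BFS tree rooted at the main table
    queue = deque([main_table])
    while queue:
        table = queue.popleft()
        for neighbor, condition in adjacency.get(table, []):
            if neighbor not in parent:
                parent[neighbor] = (table, condition)
                queue.append(neighbor)
    joins = {}                           # joined table -> ON condition, in order
    for target in needed_tables:
        if target not in parent:
            raise ValueError(f"no foreign key path to {target}")
        path, node = [], target
        while parent[node] is not None:
            previous, condition = parent[node]
            path.append((node, condition))
            node = previous
        for table, condition in reversed(path):
            joins.setdefault(table, condition)
    return [f"JOIN {table} ON {condition}" for table, condition in joins.items()]
```

Because only tables on a path to a needed table are emitted, a question that never mentions departments produces no join to the department table.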

[0087] S404. Generate a query statement based on the semantically encoded five-dimensional elements and the matching cross-table query links.

[0088] 1. Mapping five-dimensional elements to query clauses.

[0089] Time element processing: Automatically generate time filter conditions: "Last three months" → WHERE date_field >= DATEADD(MONTH, -3, GETDATE()); "Q1 2023" → WHERE date_field BETWEEN '2023-01-01' AND '2023-03-31'.

[0090] Supports time granularity conversion: "Weekly statistics" → GROUP BY DATEPART(WEEK, date_field); "Monthly Trend" → GROUP BY DATEFORMAT(date_field, 'YYYY-MM').

[0091] Location element processing: Automatically associate geographic dimension tables: "Beijing region" → JOIN region_dim ON sales_fact.region_id = region_dim.id AND region_dim.name = 'Beijing'.

[0092] Supports geographic-level queries: "All provinces in East China" → JOIN region_dim ON ... WHERE region_dim.parent_id = (SELECT id FROM region_dim WHERE name ='East China').

[0093] People element processing: Generate permission filtering conditions: "My Customers" → WHERE customer.owner_id = CURRENT_USER_ID; Supports role-related queries: "Sales Manager" → JOIN employee_dim ON sales_fact.sales_id = employee_dim.id AND employee_dim.position = 'Sales Manager'.

[0094] Task element processing: Mapping business operations to SQL verbs: "Statistics" → SELECT COUNT / SUM / AVG; "Compare" → SELECT ... GROUP BY ... HAVING; "Filter" → WHERE; "Sort" → ORDER BY. Handling complex business calculations: "Growth rate" → (CURRENT_VALUE - PREVIOUS_VALUE) / PREVIOUS_VALUE * 100; "Market share" → CURRENT_VALUE / SUM(CURRENT_VALUE) OVER () * 100.

[0095] Progress element processing: Generate state filtering conditions: "Order completed" → WHERE order_status = 'COMPLETED'; "Overdue items" → WHERE end_date < CURRENT_DATE AND status != 'FINISHED'.

[0096] Supports state evolution query: "Status Change History" → SELECT status, change_time FROM order_history WHERE order_id = ORDER BY change_time.

[0097] 2. Query statement assembly and optimization.

[0098] Smart projection generation: Automatically select the projection field based on the question: "Transaction volume" → SELECT sale_amount; "Number of customers" → SELECT COUNT(DISTINCT customer_id); Supports field alias mapping: SELECT sale_amount AS "Business Volume", customer_count AS "number of customers".

[0099] Aggregation and grouping processing: Automatically identify aggregation requirements: "Business volume in each region" → SELECT region, SUM(sale_amount) GROUP BY region; "Monthly Trend" → SELECT MONTH(sale_date), SUM(sale_amount) GROUP BY MONTH(sale_date).

[0100] Handling multi-level aggregation: "Calculate average salary by department and position" → SELECT department, position, AVG(salary) GROUP BY department, position.

[0101] Query performance optimization: Add index hints: for frequently queried fields, use FORCE INDEX hints; for range queries, creating a composite index is recommended.

[0102] Implement a query caching strategy: High-frequency queries → Automatic result caching; Queries with high real-time requirements are marked as not to be cached.
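One possible shape for such a caching strategy is sketched below. The `QueryCache` class, its time-to-live parameter, and the `realtime` flag are assumptions for illustration; the text does not specify how expiry or the no-cache marking is implemented.

```python
import time


class QueryCache:
    """Tiny result cache keyed by SQL text, with a time-to-live in seconds."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # sql -> (stored_at, result)

    def get_or_run(self, sql, run, realtime=False):
        """Return a cached result, or call run(sql) and cache it unless realtime."""
        if realtime:
            return run(sql)  # real-time queries bypass the cache entirely
        now = self.clock()
        entry = self._entries.get(sql)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]
        result = run(sql)
        self._entries[sql] = (now, result)
        return result
```

The clock is injectable so that expiry can be exercised without waiting; in normal use the default monotonic clock is enough.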

[0103] In one embodiment of the present invention, a possible implementation of step S5 is given below and described in a non-limiting manner.

[0104] S501. Data Preprocessing.

[0105] Data cleaning: Clean the retrieved data to handle duplicate, missing, and outlier values. For example, in sales data, if sales volume values are missing, fill them in using the mean, median, or another appropriate method, depending on the characteristics of the data; for abnormal sales quantities (such as values that significantly exceed the normal range), verify and process them.

[0106] Data transformation: Converting data into a format suitable for chart display. For example, standardizing the date format of time series data, encoding text data, and normalizing or standardizing numerical data to facilitate comparison and analysis in charts.

[0107] Data aggregation: Aggregate data based on the type of chart and the requirements. For example, for a bar chart, if you want to display the total sales for each month, you need to summarize the daily sales data by month; for a line chart, if you want to display the annual sales trend of a product, you need to aggregate the monthly sales data into annual data.
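The preprocessing steps of S501 can be sketched with the standard library alone. The `preprocess` function and the sample rows are hypothetical; median filling is one of the options the text mentions, and outlier verification is left out of this sketch because the text does not say how it is done.

```python
from collections import defaultdict
from statistics import median


def preprocess(rows):
    """rows: (iso_date, sales_or_None) tuples. Returns total sales per 'YYYY-MM'."""
    # 1. Remove exact duplicates while keeping the original order.
    unique_rows = list(dict.fromkeys(rows))
    # 2. Fill missing values with the median of the observed values.
    observed = [value for _, value in unique_rows if value is not None]
    fill = median(observed) if observed else 0
    cleaned = [(day, fill if value is None else value) for day, value in unique_rows]
    # 3. Aggregate daily values into monthly totals for a bar chart.
    monthly = defaultdict(float)
    for day, value in cleaned:
        monthly[day[:7]] += value
    return dict(monthly)


daily = [
    ("2025-01-03", 10.0),
    ("2025-01-03", 10.0),  # duplicate record
    ("2025-01-20", None),  # missing value
    ("2025-02-01", 30.0),
]
monthly_totals = preprocess(daily)
```

Here the duplicate row is dropped, the missing January value is filled with the median of 10.0 and 30.0, and the daily rows are summed per month.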

[0108] S502. Chart type selection.

[0109] Choose based on the characteristics of the data: Select the chart type according to the type of data and the purpose of the analysis. For example, choose a bar chart to compare the size of different categories of data; choose a line chart to show the trend of data over time; and choose a pie chart to show the proportion of data.

[0110] Choose based on user needs: Consider user needs and preferences, and select chart types that are easy to understand and interpret. For example, for non-technical users, simple and intuitive bar charts or line charts may be more suitable; for professional users, complex chart types such as stacked bar charts or multi-series line charts may better meet their analytical needs.
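A rule table is one simple way to express the selection in S502. The purpose labels and the `choose_chart_type` helper below are assumptions for illustration; a real system might also weigh user preferences as described above.

```python
# Hypothetical purpose labels; the text names comparison, trend, and proportion.
CHART_BY_PURPOSE = {
    "comparison": "bar",
    "trend": "line",
    "proportion": "pie",
}


def choose_chart_type(purpose, default="bar"):
    """Pick a chart type for an analysis purpose, falling back to a simple default."""
    return CHART_BY_PURPOSE.get(purpose, default)
```

Unknown purposes fall back to a bar chart, in line with the suggestion that simple, intuitive charts suit non-technical users.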

[0111] S503. Chart Style Settings.

[0112] Color Selection: Choose an appropriate color scheme to make charts more visually appealing and easier to distinguish. For example, in a bar chart, set different colors for different categories of bars; in a line chart, set different colors for different series of lines, and use the shade of the color to represent the magnitude of the data.

[0113] Font settings: Set appropriate fonts and sizes to ensure that the text in the charts is clearly readable. For example, use a larger font for the chart title, a medium font for the axis labels and legend, and a smaller font for the data labels.

[0114] Chart Layout: Arrange the chart layout appropriately, including the position and size of elements such as the chart title, axis labels, legend, and data labels. For example, place the chart title at the top of the chart, the axis labels next to the axes, the legend in a suitable position on the chart, and the data labels directly on or next to the data points.

[0115] S504. Data visualization implementation.

[0116] Using chart libraries: Quickly create charts using existing chart libraries, such as Matplotlib, Seaborn, and Plotly in Python, and Highcharts and ECharts in JavaScript. These libraries offer a rich variety of chart types and style options, making it easy to customize them to your needs.

[0117] Add interactive features: Add interactive features to charts, such as displaying data details when the mouse hovers over a data point, or allowing users to filter or drill down on chart elements, to enhance their ability to explore and understand the data. For example, in a line chart, hovering the mouse over a data point on the line displays the specific value and related information for that data point; in a bar chart, clicking on a bar displays detailed data for that category.
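As an illustration of how the style and interaction settings might be handed to a front end, the sketch below builds an ECharts-style option object as a Python dictionary and serializes it to JSON. The `build_bar_option` helper and the sample data are assumptions; only commonly used option keys (`title`, `tooltip`, `legend`, `xAxis`, `yAxis`, `series`) appear.

```python
import json


def build_bar_option(title, categories, values, series_name):
    """Build an ECharts-style option dict for a single-series bar chart."""
    if len(categories) != len(values):
        raise ValueError("categories and values must have the same length")
    return {
        "title": {"text": title},
        "tooltip": {"trigger": "axis"},  # show values when hovering over the chart
        "legend": {"data": [series_name]},
        "xAxis": {"type": "category", "data": list(categories)},
        "yAxis": {"type": "value"},
        "series": [{"name": series_name, "type": "bar", "data": list(values)}],
    }


option = build_bar_option("Monthly sales", ["Jan", "Feb"], [30.0, 30.0], "Sales")
option_json = json.dumps(option)  # what a front end could pass to setOption
```

Generating the option on the server keeps the chart description independent of the rendering page, so the same result set can be shown as a different chart type by changing the `type` field.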

[0118] Chart Export: Provides chart export functionality, allowing users to export charts to common file formats such as PNG, PDF, and Excel, making it convenient for users to use charts in other documents or reports.

[0119] S505. Data Interpretation and Explanation.

[0120] Add chart titles and annotations: Add clear titles to your charts, briefly explaining the topic and purpose of the chart. Also, add annotations to the chart to explain the meaning of the data and the analysis results. For example, clearly state the categories and indicators being compared in the title of a bar chart, and add annotations below the chart explaining the source of the data and the statistical scope.

[0121] Provide data interpretation suggestions: Based on the chart data, provide relevant data interpretation suggestions and analytical conclusions. For example, in a line chart, analyze the data trend, point out outliers and potential problems; in a bar chart, compare the differences between different categories of data and propose improvement suggestions.

[0122] Recording the question-and-answer process and results forms a question-and-answer pair memory. By accumulating historical question-and-answer data, the system's knowledge base is continuously enriched, enabling self-learning, iteration, and optimization, and improving the ability to handle similar questions in the future.

[0123] This method mainly includes modules such as long short-term memory pool, problem decomposition, configuration, rule reasoning and retrieval, SQL reconstruction and retrieval, chart generation, question-answer pair memory, and meta-database.

[0124] When a user submits a question, the system first passes the question to the long short-term memory pool, which retrieves historical information and knowledge related to the question. If the current question matches a historical question, the system jumps directly to the SQL reconstruction and retrieval module to search the database immediately and obtain the target data.
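The memory pool lookup can be sketched as a nearest-neighbor search over stored question vectors. Cosine similarity, the 0.9 threshold, and the entry layout (`vector`, `sql`) are assumptions made for this example; this paragraph does not state which similarity measure or threshold the memory pool uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def lookup(memory, query_vector, threshold=0.9):
    """Return the stored entry most similar to query_vector, or None below threshold."""
    best_entry, best_score = None, threshold
    for entry in memory:
        score = cosine_similarity(entry["vector"], query_vector)
        if score >= best_score:
            best_entry, best_score = entry, score
    return best_entry


memory = [
    {"vector": [1.0, 0.0], "sql": "SELECT 1"},
    {"vector": [0.0, 1.0], "sql": "SELECT 2"},
]
```

A return value of `None` corresponds to the low-match case, in which the question goes on to decomposition by the large model.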

[0125] If the current question matches historical questions poorly, the multimodal large model analyzes the user's question in depth and decomposes it, breaking a complex question into multiple easily handled sub-questions. Based on this decomposition, the rule reasoning and retrieval module performs logical reasoning on the sub-questions according to predefined rules and retrieves relevant knowledge and data. To improve database retrieval speed, the configuration module restructures the overall database into a meta-database. For the database tables to be retrieved, the large model is called to generate candidate retrieval rules from the database names, table names, and field names. After expert annotation, the retrieval rules are imported into the database. The annotated retrieval rules plan and schedule the entire processing flow so that each stage proceeds in order.

[0126] Based on the preceding reasoning, the SQL reconstruction and retrieval module packages the user's question, the five-dimensional features, and basic database information, and feeds them into the large model. The large model then generates SQL commands to retrieve the data records that meet the user's needs from the database. The retrieved data is used in two ways: first, the chart generation module presents the data to the user in an intuitive, easy-to-understand graphical form; second, the question-and-answer process and results are stored in the question-and-answer pair memory module, providing reference and guidance for handling similar questions in the future.
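How the question, the five elements, and the database information might be packaged for the large model can be sketched as a prompt builder. The prompt wording, the `build_sql_prompt` name, and the dictionary layouts are all assumptions; the text does not disclose the actual template.

```python
def build_sql_prompt(question, elements, schema):
    """Pack the question, five elements, and table schema into one prompt string."""
    element_lines = [f"- {name}: {value}" for name, value in elements.items()]
    schema_lines = [
        f"- {table}({', '.join(columns)})" for table, columns in schema.items()
    ]
    return "\n".join(
        ["Question:", question, "Elements:", *element_lines,
         "Tables:", *schema_lines, "Return one SQL SELECT statement."]
    )


prompt = build_sql_prompt(
    "Total sales by region",
    {"time": "2025", "place": "East China"},
    {"sales_fact": ["region", "sale_amount"]},
)
```

Listing only the tables relevant to the question keeps the prompt short, which is one reason the meta-database and retrieval rules are useful before the model is called.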

[0127] In some embodiments, the interactive query system for distributed energy multidimensional data may include multiple functional modules composed of computer program segments. The computer program of each program segment in the system may be stored in the memory of a computer device and executed by at least one processor to perform the functions of interactive querying of multidimensional data on distributed energy resources (see the description of Figure 1 for details).

[0128] In this embodiment, the interactive query system for distributed energy multidimensional data can be divided into multiple functional modules based on its functions, as shown in Figure 2. A module, as referred to in this invention, is a series of computer program segments that is stored in memory, can be executed by at least one processor, and performs a fixed function. In this embodiment, the functions of each module will be described in detail in subsequent embodiments.

[0129] A receiving module is used to receive question text and generate a semantic vector based on the question text; The matching module is used to query historical question-and-answer data that match the semantic vector from the long short-term memory pool; The first processing module is used to generate search results based on the historical question and answer data if the historical question and answer data is found. The second processing module is used to generate a query statement based on the question text using a large model if the historical question and answer data is not found, and to obtain the retrieval results by sending the query statement to the database. The visualization module is used to generate charts based on the search results and output the charts to the visualization platform.
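The control flow among the five modules can be sketched as a single function with every module passed in as a callable. The function name, the callables, and the choice to reuse a stored SQL string on a memory hit are assumptions for illustration, not the disclosed implementation.

```python
def answer_question(question, embed, find_in_memory, generate_sql, run_sql, make_chart):
    """Route a question through the five modules; every callable is injected."""
    vector = embed(question)               # receiving module
    history = find_in_memory(vector)       # matching module
    if history is not None:
        results = run_sql(history["sql"])  # first processing module (memory hit)
    else:
        sql = generate_sql(question)       # second processing module (memory miss)
        results = run_sql(sql)
    return make_chart(results)             # visualization module
```

Injecting the modules keeps the routing logic testable without a real embedding model, large model, or database.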

[0130] Referring to Figure 3, the interactive query method for multidimensional distributed energy data provided in this application embodiment can be applied to a device. The device 300 may include a processor 310, a memory 320, and a communication unit 330. These components communicate via one or more buses. Those skilled in the art will understand that the server structure shown in the figures does not constitute a limitation of the invention; it can be a bus topology or a star topology, and may include more or fewer components than shown, combine certain components, or have different component arrangements.

[0131] The present invention also provides a computer medium, wherein the computer medium may store a program, which, when executed, may include some or all of the steps provided in the embodiments of the present invention. The medium may be a magnetic disk, an optical disk, read-only memory (ROM), or random access memory (RAM), etc.

Claims

1. A method for interactive querying of multidimensional data on distributed energy resources, characterized in that the method comprises: receiving question text and generating a semantic vector based on the question text; retrieving historical question-and-answer data that matches the semantic vector from the long short-term memory pool; if the historical question-and-answer data is found, generating search results based on the historical question-and-answer data; if the historical question-and-answer data is not found, generating a query statement based on the question text using a large model, and obtaining the search results by sending the query statement to a database; and generating a chart based on the search results and outputting the chart to a visualization platform.

2. The method according to claim 1, characterized in that receiving the question text and generating a semantic vector based on the question text comprises: extracting keywords from the user's question using named entity recognition and semantic role labeling, the keywords including table names, fields, and conditions; and converting the question text into a semantic vector based on the keywords.

3. The method according to claim 1, characterized in that the method further comprises: storing historical question-and-answer data within a set time period in the short-term memory pool; and storing historical question-and-answer data marked as valid in the long-term memory pool.

4. The method according to claim 3, characterized in that the method further comprises: monitoring the access frequency of historical question-and-answer data in the long-term memory pool and setting the access frequency as the weight of that historical question-and-answer data; if the number of historical question-and-answer data items in the long-term memory pool reaches a set threshold, deleting the historical question-and-answer data with the lowest weight; and monitoring the access frequency of historical question-and-answer data in the short-term memory pool, and updating the historical question-and-answer data whose access frequency reaches a set frequency threshold to the long-term memory pool.

5. The method according to claim 1, characterized in that generating query statements for the semantic vectors using a large model comprises: decomposing the complex question into five elements, namely time, place, people, task, and progress, and semantically encoding the five elements through vectorization; building a knowledge graph based on metadata and generating vectorized rules using the large model; constructing cross-table query links using the rules and the knowledge graph; and generating a query statement based on the semantically encoded five elements and the matching cross-table query links.

6. The method according to claim 5, characterized in that building a knowledge graph based on metadata and generating vectorized rules using the large model comprises: scanning the database metadata to extract the database name, table names, field names, and comments; constructing triples based on the database name, table names, field names, and comments, and storing the triples in a graph database; pre-configuring a prompt template for generating rules, and populating the prompt template with metadata to obtain a prompt, the metadata including table structures and field descriptions; inputting the prompt into the large model to obtain rule description text, which defines the association between question keywords and the metadata; and encoding the rule description text into a rule vector and saving the rule vector to a rule vector library.

7. The method according to claim 5, characterized in that constructing cross-table query links using the rules and the knowledge graph comprises: calculating the cosine similarity between the user's question vector and the vectors in the rule vector library, matching the highest-scoring rule, and determining the main table based on the highest-scoring rule; starting from the main table, searching for secondary tables with foreign key relationships by traversing the knowledge graph; extracting the entities from the question and aligning the fields in the secondary tables based on those entities; and constructing a cross-table query link based on the main table, the secondary tables, and the aligned fields.

8. An interactive query system for multidimensional data of distributed energy resources, characterized in that the system comprises: a receiving module, used to receive question text and generate a semantic vector based on the question text; a matching module, used to query historical question-and-answer data that matches the semantic vector from the long short-term memory pool; a first processing module, used to generate search results based on the historical question-and-answer data if the historical question-and-answer data is found; a second processing module, used to generate a query statement based on the question text using a large model if the historical question-and-answer data is not found, and to obtain the search results by sending the query statement to a database; and a visualization module, used to generate a chart based on the search results and output the chart to a visualization platform.

9. A device, characterized in that the device comprises: a memory, used to store an interactive query program for multidimensional data on distributed energy resources; and a processor, configured to implement the steps of the interactive query method for distributed energy multidimensional data as described in any one of claims 1-7 when executing the interactive query program.

10. A computer-readable medium storing a computer program, characterized in that the readable medium stores an interactive query program for distributed energy multidimensional data, and when a processor executes the interactive query program, the steps of the interactive query method for distributed energy multidimensional data as described in any one of claims 1-7 are implemented.