Method and system for semantic understanding and application of a large model for energy business
By combining a large language model with an energy business knowledge graph, the invention solves the problems of insufficient coverage of professional knowledge and poor system scalability that existing semantic understanding technologies exhibit in energy business. It enables high-precision semantic understanding and automated analysis, and improves the intelligent service capabilities of the Green State Grid platform.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- STATE GRID GREEN ENERGY CO LTD
- Filing Date
- 2026-03-23
- Publication Date
- 2026-06-23
AI Technical Summary
Existing technologies face several problems when applying large models to semantic understanding of energy business: insufficient coverage of professional knowledge, limited model generalization, poor system scalability, and a lack of interpretability and compliance verification of results. These problems make large-scale deployment difficult on comprehensive energy service platforms such as Green State Grid.
We construct a semantic understanding architecture based on a large language model, combining an energy business knowledge graph with a multimodal data processing mechanism to achieve accurate semantic parsing of unstructured text such as user queries, work order descriptions, and operation reports. Through a configurable task routing and execution engine, we complete automated intelligent analysis and decision suggestion generation.
It achieves high-precision semantic understanding of complex natural language requests, improves the accuracy of intent recognition, supports automated responses to multiple typical business scenarios, enhances the platform's intelligence level and user experience, and meets the requirements of high-concurrency online services.

Figure CN122263884A_ABST
Description
Technical Field
[0001] This invention belongs to the interdisciplinary field of artificial intelligence and energy information technology, and specifically relates to a method and system for semantic understanding and application of a large model for energy business.
Background Technology
[0002] As the digital transformation of energy deepens, intelligent energy business has become a core driving force for improving the operational efficiency and service level of energy systems. With its powerful semantic understanding and knowledge reasoning capabilities, large model technology is gradually showing application potential in scenarios such as energy management, demand response, and energy efficiency optimization, providing a new technical path for building intelligent energy systems with cognitive and decision-making capabilities.
[0003] Among these, large-model semantic understanding technology for energy business focuses on transforming unstructured text such as natural language instructions, business documents, and user inquiries into executable business logic and data operations, with the aim of efficient human-machine collaboration and business process automation. Its core lies in deep modeling of energy-domain terminology, business rules, and contextual relationships through pre-trained language models, thereby supporting accurate implementation of functions such as intelligent question answering, work order generation, and strategy recommendation.
[0004] Existing technologies still face multiple challenges in applying large models to semantic understanding of energy business. First, general-purpose large models lack in-depth coverage of professional knowledge in the energy field, making it difficult to accurately interpret professional expressions involving electricity pricing mechanisms, load characteristics, equipment parameters, and the like. Second, the fragmentation of business scenarios limits the generalization ability of models: different functional modules must be trained and deployed independently, which wastes resources and increases maintenance complexity. Third, semantic understanding is tightly coupled to back-end data systems, and the lack of a unified service middleware layer leads to poor system scalability and high response latency. Finally, in actual business operations, model outputs lack interpretability and compliance verification mechanisms, making it difficult to meet the stringent safety and reliability requirements of the power system. These problems seriously restrict the large-scale deployment and value realization of large models on integrated energy service platforms such as Green State Grid.
Summary of the Invention
[0005] The purpose of this invention is to overcome the shortcomings of existing technologies by providing a method and system for semantic understanding and application of a large model for energy business, which can effectively solve the problems described in the background. At present, energy digitalization platforms generally face structural problems when handling complex business semantic understanding tasks, such as weak natural language interaction capabilities, inefficient fusion of multi-source heterogeneous data, and slow intelligent response in typical business scenarios. Traditional systems rely on rule engines or shallow models and struggle to analyze user intent in depth or to support cross-scenario knowledge transfer and dynamic decision support, resulting in low service response accuracy, frequent manual intervention, and poor system adaptability. This invention constructs a semantic understanding architecture based on a large language model, combined with an energy business knowledge graph and a multimodal data processing mechanism, to achieve accurate semantic analysis of unstructured text such as user queries, work order descriptions, and operation reports on the Green State Grid platform. Through a configurable task routing and execution engine, it completes automated intelligent analysis and decision suggestion generation in multiple typical business scenarios such as energy efficiency diagnosis, load forecasting, and equipment alarm analysis, significantly improving the platform's intelligence level and user experience.
[0006] To achieve the above objectives, the present invention provides the following technical solution. In one aspect, a system for semantic understanding and application of a large model for energy business comprises the following components: a front-end interface module for receiving natural language requests input by users of the Green State Grid platform; a text preprocessing module for performing preliminary cleaning and format standardization of the natural language requests; a semantic parsing engine module for loading and calling the DeepSeek large language model to perform deep semantic understanding and intent recognition; an energy business knowledge graph database module for storing and managing power equipment topology relationships, energy efficiency indicator systems, historical operation and maintenance records, and industry standard provisions; a task routing and scheduling module for distributing requests to the corresponding functional subsystems according to the identified business intent; multiple scenario-based intelligent analysis component modules for performing specific business logic analysis and generating structured results; a response synthesis and output module for converting analysis results into natural language feedback and returning it to the user; and a system operation status monitoring module for monitoring model inference performance, call latency, and semantic accuracy.
Preferably, the text preprocessing module performs character encoding unification, special symbol filtering, stop word removal, and preliminary named entity annotation on the received natural language requests. The named entity annotation covers energy-specific terms such as substation name, voltage level, equipment type, time range, and electricity unit, and recognition is performed with a BiLSTM-CRF sequence labeling model.
The model's training corpus comes from 500,000 real work orders and consultation records accumulated by the Green State Grid platform over the past three years, and the entity recognition accuracy exceeds 92.7%. Furthermore, the semantic parsing engine module integrates the DeepSeek-V2 large language model via an API. This model has 145 billion parameters, has been optimized through instruction tuning and reinforcement learning from human feedback (RLHF), and achieves an average F1 score of 86.4% on energy-related question-answering tasks. Its input is a preprocessed text sequence, and its output is a structured semantic representation vector containing intent labels, key parameter slot filling results, and confidence scores. The intent labels cover 18 high-frequency business categories, such as energy efficiency assessment, fault diagnosis, electricity consumption suggestions, and report generation; the slot parameters include quantifiable condition fields such as equipment number, start and end time, area range, and threshold conditions. In addition, the energy business knowledge graph database module is built on the Neo4j graph database. Its 67 node (entity) types include power users, distribution transformers, metering points, energy efficiency benchmark values, and typical load curves, and its 42 edge relationship types define semantic associations such as "connected to", "belongs to", "affects", "conforms to standard", and "historical anomaly pattern". The knowledge graph synchronizes incremental data from the SCADA system, the marketing business system, and the electricity consumption information collection system every 15 minutes, so that the graph content stays consistent with the real-time operating status of the power grid.
Preferably, the task routing and scheduling module makes path decisions based on the intent labels and the parameter completeness output by the semantic parsing engine.
When a clear business intent is identified and the parameters are complete, the module routes the request directly to the corresponding intelligent analysis component. When parameters are missing, a questioning strategy is triggered to generate supplementary questions, and interactive follow-up questions are issued through the front-end interface, with at most 3 rounds of follow-up to complete the input conditions. When the intent is ambiguous or its confidence is below the set threshold of 0.75, the request is automatically transferred to a human agent, and the case is recorded for subsequent iterative model training. Furthermore, the scenario-based intelligent analysis component modules include at least five independently deployed functional units: the first is an energy efficiency diagnosis and analysis unit, which compares the user-side historical electricity consumption data with industry energy efficiency benchmarks and automatically generates an analysis report on weak links in energy efficiency; the second is a load trend prediction unit, which uses a sliding window to extract hourly load data for the past 90 days, combines it with weather forecasts and holiday information, and uses an ensemble learning model to output hourly load forecast curves for the next 7 days; the third is an abnormal electricity consumption detection unit, which builds a multi-dimensional criterion model from 12 features such as current harmonic content, three-phase imbalance, and peak-to-valley ratio deviation to detect potential electricity theft or equipment aging risks; the fourth is an equipment health assessment unit, which integrates multi-source sensor data such as infrared thermography, partial discharge monitoring, and oil chromatography analysis to estimate the remaining service life of power distribution equipment and propose maintenance priorities; the fifth is an energy-saving measure recommendation unit, which matches suitable energy storage configuration schemes or demand response participation strategies based on the user's industry attributes, electricity price package, and load characteristics. In addition, the response synthesis and output module converts the structured data returned by each intelligent analysis component into natural language paragraphs that conform to Chinese expression conventions. It organizes the content by combining template filling with sentence ordering, so that the output is fluent, the terminology is standardized, and the key points are highlighted, and it adds links to visual charts that users can click to view detailed data. All response content is filtered by a security review module before being pushed to the front-end interface, and the end-to-end response time is kept within 2.3 seconds. Furthermore, the system operation status monitoring module continuously collects the number of calls, average response time, semantic recognition accuracy, task completion rate, and user satisfaction score of each component; user satisfaction is collected through a "Was your problem solved?" option embedded at the end of each response. All monitoring data is written to an Elasticsearch cluster, from which a real-time dashboard is generated. When the average latency of any component exceeds 1.5 seconds for 5 consecutive minutes or its error rate rises above 5%, an alarm is automatically triggered to notify the operation and maintenance team to investigate.
In another aspect, a method for semantic understanding and application of a large model for energy business comprises the following steps:
S110: receive service requests in natural language form from users of the Green State Grid platform through the front-end interface module; the service requests include text messages, speech-to-text content, or text extracted from uploaded files.
S120: standardize the service request with the text preprocessing module, including character set conversion, noise character removal, word segmentation, and coarse-grained annotation of energy-domain named entities.
S130: input the preprocessed text into the semantic parsing engine module and call the DeepSeek large language model to perform intent classification and key parameter slot extraction, obtaining a structured semantic understanding result.
S140: determine whether the intent label in the semantic understanding result belongs to the preset set of valid business intents; if so, proceed to S150; otherwise, return a prompt message and end the process.
S150: check whether all required parameter slots associated with the intent have been filled; if so, proceed to S160; if any slot is missing, execute S151 to generate follow-up questions and ask the user to supply the information through the front-end interface.
S151: generate natural language follow-up questions suited to the energy business context to guide the user to provide the missing key parameters, and repeat S120 to S130 until the parameters are complete or the maximum of 3 follow-up rounds is reached.
S160: based on the finally determined main intent label and the complete parameter set, the task routing and scheduling module forwards the request to the corresponding scenario-based intelligent analysis component module.
S170: the scenario-based intelligent analysis component module retrieves relevant information from the energy business knowledge graph database, combines it with external data sources to execute the specific business analysis logic, and generates structured analysis results.
S180: the structured analysis results are input into the response synthesis and output module, which converts them into natural language descriptions and adds data chart references to form the complete response content.
S190: push the response content back to the front-end interface of the Green State Grid platform, and record the full process log of the interaction for subsequent system optimization and model retraining.
Preferably, the DeepSeek large language model in S130 has undergone domain-adaptive training for the power industry before deployment. The training data includes the full text of 23 professional documents, such as the "Energy Efficiency Management Guidelines of State Grid Corporation of China" and the "Power System Operation Regulations", as well as 86,000 FAQ items from the internal knowledge base of the Green State Grid platform. During training, LoRA (low-rank adaptation) is used for parameter-efficient fine-tuning: only 0.8% of the model parameters are updated, yielding a 19.3 percentage point increase in intent recognition accuracy.
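The S140 to S151 decision flow, together with the 0.75 confidence threshold and human-handoff rule given for the task routing module, can be sketched as a small control loop. This is a minimal Python sketch, not the patented implementation: the intent and slot names in REQUIRED_SLOTS, the ask_user callback (standing in for one follow-up round that re-runs S120 to S130 on the user's answer), the order of the checks, and handing off to a human once the 3 follow-up rounds are exhausted are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical subset of the valid-intent set, each with its required slots.
REQUIRED_SLOTS = {
    "energy_efficiency_assessment": ("device_id", "time_window"),
    "load_forecasting": ("device_id", "forecast_horizon"),
}
CONFIDENCE_THRESHOLD = 0.75  # value from the description
MAX_FOLLOW_UPS = 3           # value from the description


@dataclass
class ParseResult:
    intent: str
    confidence: float
    slots: dict = field(default_factory=dict)


def decide(parse: ParseResult,
           ask_user: Callable[[ParseResult, list], ParseResult]):
    """Return (action, final parse result, follow-up rounds used)."""
    rounds = 0
    while True:
        # Ambiguous / low-confidence intent: hand off to a human agent.
        if parse.confidence < CONFIDENCE_THRESHOLD:
            return "handoff_to_human", parse, rounds
        # S140: intent outside the valid set -> prompt the user and end.
        if parse.intent not in REQUIRED_SLOTS:
            return "prompt_and_end", parse, rounds
        # S150: are all required slots filled?
        missing = [s for s in REQUIRED_SLOTS[parse.intent] if s not in parse.slots]
        if not missing:
            return "dispatch", parse, rounds  # continue with S160
        # S151: ask a follow-up question; after 3 rounds, hand off (assumed).
        if rounds >= MAX_FOLLOW_UPS:
            return "handoff_to_human", parse, rounds
        rounds += 1
        parse = ask_user(parse, missing)
```

Keeping the thresholds as module-level constants mirrors the description's configurable values; a production router would load them from configuration.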
Furthermore, when performing analysis, the energy efficiency diagnosis and analysis unit in S170 first obtains the target user's daily electricity consumption data for the past 12 months from the electricity consumption information collection system and calculates nine basic energy efficiency indicators such as load rate, simultaneity rate, and power factor; it then retrieves from the energy business knowledge graph the average values of users of the same industry and scale as reference benchmarks. If an indicator deviates from its benchmark value by more than ±15%, it is marked as an abnormal item and improvement suggestions are generated.
Furthermore, the load trend prediction unit in S170 uses a hybrid XGBoost and Transformer architecture. XGBoost handles discrete features such as weather conditions and workday type, while the Transformer captures long-term dependencies in the load sequence; the outputs of the two are combined by weighted fusion to obtain the final prediction. Every morning the model is incrementally updated with the latest 24 hours of actual load data, keeping the prediction accuracy stable at MAPE ≤ 6.8%.
Furthermore, the abnormal power consumption detection unit in S170 applies a dynamic threshold mechanism. The alarm threshold of each criterion is not fixed but is derived from the mean and standard deviation of the equipment's historical operating data: when a real-time monitoring value falls outside the mean ± 2.5 standard deviations, an early warning is triggered, and if the same equipment shows the same type of anomaly for 3 consecutive days, the warning is upgraded to a formal alarm and pushed to the maintenance personnel's mobile terminal.
In addition, the response synthesis process in S180 adopts a multi-template selection strategy.
The system has 43 built-in response templates for different business types, and a template is selected based on the main intent label and user identity attributes: for example, industrial users receive more technical descriptions, while residential users receive plainer explanations and behavioral guidance. All templates have undergone legal compliance review to avoid making promises or misleading statements.
Compared with the prior art, the present invention has the following beneficial effects:
By introducing the DeepSeek large language model and deeply integrating it with the energy business knowledge graph, high-precision semantic understanding of complex natural language requests is achieved, and intent recognition accuracy reaches 89.2%, 37.5 percentage points higher than the traditional keyword matching method.
A scenario-based intelligent analysis component system covering core businesses such as energy efficiency diagnosis, load forecasting, and anomaly detection is built, supporting automated responses to no fewer than 18 typical application scenarios and enabling the Green State Grid platform to handle more than 78% of routine consultation and analysis requests without manual intervention.
The task routing and multi-round interaction mechanism effectively solves service interruptions caused by incomplete parameters, raising the first-contact resolution rate of user problems from 54% to 83.6%.
The system's average end-to-end response time is 2.1 seconds, meeting the requirements of high-concurrency online services; in actual testing, it processes 420 concurrent requests per second while resource utilization remains stable within a reasonable range.
A closed-loop feedback mechanism automatically collects all interaction data for continuous model optimization, forming a positive use-feedback-evolution cycle that keeps improving the system's long-term service capabilities.
Attached Figure Description
[0007] Figure 1 is a schematic diagram of the overall technical architecture of the method and system for semantic understanding and application of a large model for energy business proposed in this invention; Figure 2 is a schematic diagram of the core principle framework of semantic understanding and task routing based on the fusion of a large language model and a knowledge graph in this invention.
Detailed Implementation
[0008] Please refer to Figure 1 and Figure 2. To further illustrate the technical means adopted by the present invention to achieve its intended purpose and their effects, the specific implementation methods, structures, features, and effects of the present invention are described in detail below in conjunction with the accompanying drawings and preferred embodiments.
[0009] Example 1. This embodiment takes an energy efficiency optimization consultation request submitted by an industrial user on the Green State Grid platform as the application scenario and describes the complete technical process by which the system for semantic understanding and application of a large model for energy business processes complex natural language input, links data across systems, and generates automated analysis suggestions. The scenario involves key technical aspects such as multimodal input parsing, deep semantic recognition, knowledge graph association queries, task routing decisions, collaborative execution of intelligent components, and natural language response synthesis, and comprehensively demonstrates the systematic innovation of this invention in enhancing the platform's intelligent service capabilities.
[0010] The front-end interface module continuously monitors user interaction channels from the Green State Grid web portal and mobile app, receiving service requests that contain text messages, speech-to-text content, or text extracted from uploaded documents. When a steel manufacturing company submits a natural language request on the platform asking to analyze last month's electricity consumption in Workshop 3 to see whether there is any energy-saving potential, the request is immediately captured and passed to the text preprocessing module. The text preprocessing module first standardizes the character encoding of the raw string, converting mixed encodings such as UTF-8 and GBK into standard UTF-8 so that no garbled characters appear in subsequent processing. It then filters noise characters, removing invisible control characters, repeated spaces, residual HTML tags, and typos introduced by OCR, for example restoring a misrecognized form of the term "electricity consumption information" to the correct term. After cleaning, the system calls a power-specific word segmentation engine based on jieba. This engine has a built-in power industry terminology dictionary covering 12,476 professional terms such as reactive power compensation, peak-flat-valley tariff, and demand charge, which prevents errors such as a general-purpose segmentation tool splitting the compound term "distribution transformer" into the separate words "distribution" and "transformer".
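The encoding normalization and noise filtering steps can be illustrated with Python's standard library alone. A minimal sketch, assuming raw requests arrive as bytes in either UTF-8 or GBK, that HTML residue matches a simple tag pattern, and that OCR typo fixes come from a caller-supplied correction table (the corrections argument and all function names are hypothetical); jieba segmentation with the power-industry dictionary is not shown.

```python
import html
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")   # residual HTML tags
MULTISPACE_RE = re.compile(r" {2,}")


def decode_request(raw: bytes) -> str:
    """Decode raw bytes as UTF-8, falling back to GBK (assumed order)."""
    for encoding in ("utf-8", "gbk"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Neither encoding fits: continue with replacement characters.
    return raw.decode("utf-8", errors="replace")


def clean_text(text: str) -> str:
    """Strip HTML residue, invisible control/format characters, and repeated spaces."""
    text = html.unescape(TAG_RE.sub(" ", text))
    chars = []
    for ch in text:
        if ch == "\n":
            chars.append(ch)                 # keep line breaks
        elif ch.isspace():
            chars.append(" ")                # tabs, ideographic spaces -> plain space
        elif not unicodedata.category(ch).startswith("C"):
            chars.append(ch)                 # drop control/format chars (e.g. U+200B)
    return MULTISPACE_RE.sub(" ", "".join(chars)).strip()


def apply_corrections(text: str, corrections: dict) -> str:
    """Apply a hypothetical OCR typo table as plain substitutions."""
    for wrong, right in corrections.items():
        text = text.replace(wrong, right)
    return text
```

Trying UTF-8 first is a deliberate choice: valid GBK text with high bytes is almost never valid UTF-8, so the fallback rarely misfires.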
[0011] Next, the system initiates a coarse-grained named entity annotation process, employing a BiLSTM-CRF sequence annotation model pre-deployed on a local GPU server to perform entity recognition on the word segmentation results. The model's input layer consists of a 300-dimensional Word2Vec word vector concatenated with a 100-dimensional positional embedding vector. The hidden layer contains two layers of bidirectional LSTM networks, each with 256 neurons. The output layer decodes the optimal label path using a Conditional Random Field (CRF). The model training corpus originates from 500,000 real work orders and customer service records accumulated over the past three years by the Green State Grid platform. These records are manually annotated to form an annotated corpus containing five categories: substation name, voltage level, equipment type, time range, and electricity unit. For example, Workshop No. 3 is identified as the equipment type, "last month" is parsed as the time range, and electricity consumption is associated with the implicit field of electricity unit. After entity recognition, the system generates a labeled intermediate representation: [Equipment Type: Workshop No. 3] [Time Range: Last Month] [Electricity Unit: Electricity Consumption]. This structured intermediate product serves as the input basis for the next stage.
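The CRF output layer selects the highest-scoring label sequence as a whole rather than the best label for each token separately; the standard way to do this for a linear-chain CRF is Viterbi decoding. The sketch below assumes the emission scores (normally produced by the BiLSTM layers) and a label-transition score matrix are already available as plain lists; the BIO-style label names are hypothetical, since the description names only the five entity categories.

```python
def viterbi_decode(emissions, transitions, labels):
    """Best label path for a linear-chain CRF.

    emissions[t][j]: score of label j at token t (e.g. from a BiLSTM).
    transitions[i][j]: score of moving from label i to label j.
    Returns (best label sequence, its total score).
    """
    n = len(labels)
    scores = list(emissions[0])  # best score of any path ending in each label
    backpointers = []
    for emit in emissions[1:]:
        step_scores, step_bp = [], []
        for j in range(n):
            # Best previous label i for arriving at label j.
            best_i = max(range(n), key=lambda i: scores[i] + transitions[i][j])
            step_scores.append(scores[best_i] + transitions[best_i][j] + emit[j])
            step_bp.append(best_i)
        scores = step_scores
        backpointers.append(step_bp)
    # Trace the best path backwards from the best final label.
    best_last = max(range(n), key=lambda j: scores[j])
    path = [best_last]
    for step_bp in reversed(backpointers):
        path.append(step_bp[path[-1]])
    path.reverse()
    return [labels[i] for i in path], scores[best_last]
```

A large negative transition score (for example from O directly to an I- tag) is how the CRF rules out invalid label sequences that independent per-token classification could produce.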
[0012] The processed text sequence is encapsulated into a JSON data packet with four core fields: the original text, the cleaned text, the word segmentation result, and the entity annotation list. The packet is transmitted to the semantic parsing engine module over HTTPS. This module integrates the DeepSeek-V2 large language model via an API. The model has 145 billion parameters, and its underlying architecture is a stack of Transformer decoders with 96 decoding layers, 96 attention heads, and a feedforward network dimension of 36,864. The model weights are optimized through both instruction tuning and reinforcement learning from human feedback (RLHF), and domain-adaptive training for the power industry was completed before deployment. During training, the full texts of 23 professional documents, including the "State Grid Corporation Energy Efficiency Management Guidelines" and the "Power System Operation Regulations", as well as 86,000 FAQ entries from the platform's internal knowledge base, were used as supervision signals. LoRA low-rank adaptation was used for parameter-efficient tuning, achieving a 19.3 percentage point improvement in intent recognition accuracy with only 0.8% of the model parameters updated.
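To make the "0.8% of parameters updated" figure concrete: LoRA keeps a pretrained weight W frozen and learns a low-rank update B·A, so an adapted layer computes h = Wx + (α/r)·BAx. The pure-Python sketch below shows that forward pass and the fraction of a single d × k matrix that becomes trainable at rank r. The rank, scaling α, and matrix sizes used here are illustrative assumptions; the description does not state which layers or ranks were adapted.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w, a, b, x, alpha):
    """h = W x + (alpha / r) * B (A x).

    w: frozen d x k weight; a: trainable r x k; b: trainable d x r.
    """
    r = len(a)
    base = matvec(w, x)               # frozen pretrained path
    update = matvec(b, matvec(a, x))  # low-rank trainable path
    return [h + (alpha / r) * u for h, u in zip(base, update)]


def lora_trainable_fraction(d, k, r):
    """Trainable share of one d x k matrix when adapted with rank r (illustrative)."""
    return r * (d + k) / (d * k)
```

For example, rank 16 on a 4096 × 4096 projection trains r(d + k)/(dk) ≈ 0.78% of that matrix's weights, which shows how sub-1% figures arise; it is not a reconstruction of the configuration behind the patent's number. Because LoRA conventionally initializes B to zeros, the adapted layer starts out reproducing Wx exactly.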
[0013] After receiving the input, the semantic parsing engine starts an inference session. The model maps the user request into a high-dimensional semantic space and performs joint intent classification and slot filling. The output is a structured semantic representation vector with three key components. The first is the main intent label, which the model matches from 18 predefined high-frequency business categories; in this example it is energy efficiency assessment. The second is the key parameter slot filling result, including the device number (Workshop 3), the start and end time (automatically inferred as the 1st through the last day of the previous month), the area range (defaulting to the power supply zone to which the user belongs), and the threshold conditions (unspecified, so the standard comparison mode is used by default). The third is the confidence score of each recognition result: 0.91 for the main intent energy efficiency assessment, 0.88 for the device number, and 0.93 for the time range parsing. All are above the set threshold of 0.75 and are judged to be high-confidence results.
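The start and end time slot in this example is inferred from the phrase "last month". A minimal sketch of that resolution step, assuming the reference date is the request date and that "last month" means the full previous calendar month; the function name is hypothetical.

```python
import calendar
from datetime import date


def previous_month_range(today: date) -> tuple[date, date]:
    """Resolve "last month" to (first day, last day) of the previous calendar month."""
    if today.month == 1:
        year, month = today.year - 1, 12
    else:
        year, month = today.year, today.month - 1
    last_day = calendar.monthrange(year, month)[1]  # (weekday of day 1, days in month)
    return date(year, month, 1), date(year, month, last_day)
```

For a request made on 2026-03-23 this yields 2026-02-01 to 2026-02-28; the January case rolls back to December of the previous year.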
[0014] The system then proceeds to the S140 judgment process to verify whether the energy efficiency assessment belongs to the preset set of valid business intents. This set is stored in the memory cache as a hash table and contains 18 enumerated values, including energy efficiency assessment, fault diagnosis, electricity consumption suggestions, report generation, load forecasting, anomaly detection, equipment health assessment, and energy-saving measure recommendations. The comparison result confirms the intent's validity, and the process continues to the S150 parameter integrity verification stage. The system checks whether the necessary slots are complete based on the parameter template corresponding to the energy efficiency assessment intent. The template definition requires two core parameters: the target device number and the time window. Both have been successfully filled, and the parameter integrity condition is met. Without triggering a follow-up mechanism, the system directly proceeds to the S160 task routing stage.
[0015] After receiving the structured semantic result, the task routing and scheduling module looks up the routing mapping table with the main intent label energy efficiency assessment and determines that the request should be forwarded to the energy efficiency diagnosis and analysis unit. The mapping table is configured in YAML and stored in the distributed configuration center Nacos, which supports hot updates without a service restart. After the routing decision, the system constructs a new task request object containing fields such as user ID, device number, analysis time period, original request snapshot, and context identifier, and delivers it asynchronously to the target component through a Kafka message queue, achieving loosely coupled communication and peak shaving.
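The routing lookup and task construction can be sketched as follows. In the embodiment the mapping table is YAML held in Nacos and the task travels over Kafka; here, as an assumption for illustration, the already-parsed table is a Python dict and the function returns the target topic name plus a JSON payload, leaving the Kafka producer out. The intent keys, topic names, and field names are hypothetical.

```python
import json
import uuid
from dataclasses import asdict, dataclass

# Parsed form of a hypothetical routing table (kept as YAML in Nacos in the embodiment).
ROUTING_TABLE = {
    "energy_efficiency_assessment": "energy-efficiency-diagnosis",
    "load_forecasting": "load-trend-prediction",
    "anomaly_detection": "abnormal-consumption-detection",
}


@dataclass(frozen=True)
class TaskRequest:
    user_id: str
    device_id: str
    period_start: str
    period_end: str
    raw_request: str  # snapshot of the original request
    context_id: str   # correlates follow-ups and logs for this interaction


def build_task(intent: str, user_id: str, slots: dict, raw_request: str) -> tuple[str, str]:
    """Resolve the target component and serialize the task for a message queue."""
    topic = ROUTING_TABLE[intent]  # raises KeyError for an unrouted intent
    task = TaskRequest(
        user_id=user_id,
        device_id=slots["device_id"],
        period_start=slots["start"],
        period_end=slots["end"],
        raw_request=raw_request,
        context_id=str(uuid.uuid4()),
    )
    return topic, json.dumps(asdict(task), ensure_ascii=False)
```

Keeping the table as data rather than code is what lets a configuration-center update change routing without redeploying the service.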
[0016] Upon receiving the task request, the energy efficiency diagnosis and analysis unit immediately starts its analysis logic. First, it issues a cross-system query through the unified data access agent to retrieve the daily electricity consumption data of the target equipment, Workshop 3, over the past 12 months from the electricity consumption information collection system. The sampling frequency is one record per day, and the data fields include date, active power, reactive power, maximum demand, power factor, and three-phase current balance. The system then uses a sliding window to extract the dataset for the most recent complete month, that is, the daily electricity consumption sequence of the previous month, with one record for each day of that month. From this, nine basic energy efficiency indicators are calculated:
Load factor = actual monthly electricity consumption / (rated capacity × operating hours);
Simultaneity rate = maximum demand / sum of the demands of the sub-circuits;
Power factor = active power / apparent power;
Peak-valley ratio = average load during peak hours / average load during off-peak hours;
Unit product power consumption = total electricity consumption / output in the same period (if the user provides production data);
Harmonic distortion rate = (sum of the squares of the RMS values of all harmonics)^(1/2) / RMS value of the fundamental;
Three-phase imbalance = (maximum phase current - minimum phase current) / maximum phase current × 100%;
Daytime fluctuation coefficient = standard deviation / mean;
Nighttime standby power consumption ratio = electricity consumption during non-production periods / total electricity consumption.
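The nine indicator formulas translate directly into small functions. A minimal Python sketch; the function names and units are assumptions, rated capacity is used exactly as in the load-factor formula given above, and the fluctuation coefficient uses the population standard deviation because the description does not say which standard deviation is meant.

```python
import math
from statistics import mean, pstdev


def load_factor(monthly_kwh, rated_capacity, operating_hours):
    return monthly_kwh / (rated_capacity * operating_hours)


def simultaneity_rate(max_demand, sub_circuit_demands):
    return max_demand / sum(sub_circuit_demands)


def power_factor(active_power, apparent_power):
    return active_power / apparent_power


def peak_valley_ratio(peak_loads, valley_loads):
    return mean(peak_loads) / mean(valley_loads)


def unit_product_consumption(total_kwh, output):
    return total_kwh / output  # only when production data is provided


def harmonic_distortion(harmonic_rms, fundamental_rms):
    # sqrt of the sum of squared harmonic RMS values, relative to the fundamental
    return math.sqrt(sum(h * h for h in harmonic_rms)) / fundamental_rms


def three_phase_imbalance_pct(phase_currents):
    hi, lo = max(phase_currents), min(phase_currents)
    return (hi - lo) * 100 / hi


def fluctuation_coefficient(daytime_loads):
    return pstdev(daytime_loads) / mean(daytime_loads)  # population std (assumed)


def standby_ratio(non_production_kwh, total_kwh):
    return non_production_kwh / total_kwh
```

With the embodiment's figures, a standby ratio of 23 kWh out of 100 kWh gives 0.23, matching the 23% reported for Workshop 3.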
[0017] After all indicators are calculated, the system queries the energy business knowledge graph database module for related data. This database is built on the Neo4j graph database, with 67 node (entity) types such as electricity users, distribution transformers, metering points, energy efficiency benchmark values, and typical load curves, and 42 edge relationship types defining semantic associations such as "connected to", "belongs to", "affects", "conforms to standard", and "historical anomaly pattern". The system constructs a Cypher query to find other users that belong to the same steel smelting industry as Workshop 3, have annual electricity consumption within ±20% of its value, and are connected at the same voltage level, and extracts their historical average energy efficiency indicators as reference benchmarks. The knowledge graph synchronizes incremental data from the SCADA system, the marketing business system, and the electricity consumption information collection system every 15 minutes, ensuring that the graph content is consistent with the real-time operating status of the power grid.
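The peer-group filter in this Cypher query can be illustrated without a database. The sketch below applies the same three conditions (same industry, same voltage level, annual consumption within ±20% of the target's) to in-memory records and averages the requested indicators. The record field names are hypothetical, and in the described system this filtering would be done by the Cypher query against Neo4j rather than in application code.

```python
from statistics import mean


def peer_benchmark(target: dict, users: list, indicators: list, band: float = 0.20) -> dict:
    """Average the given indicators over comparable users: same industry,
    same voltage level, annual consumption within +/- band of the target.
    Field names ("industry", "voltage_kv", "annual_kwh") are hypothetical."""
    low = target["annual_kwh"] * (1 - band)
    high = target["annual_kwh"] * (1 + band)
    peers = [
        u for u in users
        if u["id"] != target["id"]
        and u["industry"] == target["industry"]
        and u["voltage_kv"] == target["voltage_kv"]
        and low <= u["annual_kwh"] <= high
    ]
    if not peers:
        return {}  # no comparable users, so no benchmark
    return {name: mean(u[name] for u in peers) for name in indicators}
```

Returning an empty result when no peers match lets the caller fall back to a static industry benchmark instead of comparing against a meaningless average.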
[0018] The comparative analysis showed that Workshop 3 had a load factor of 58%, below the 67% average of similar enterprises; a power factor of 0.82, below the industry benchmark of 0.92; and a nighttime standby consumption ratio of 23%, far above the 12% average. On this basis, the system identified three anomalies and generated an improvement suggestion for each: for the low load factor, optimizing production scheduling to improve equipment utilization; for the low power factor, installing a dynamic reactive power compensation device, which is expected to reduce power factor adjustment charges by approximately 150,000 yuan per year; and for the high standby consumption, establishing an equipment shutdown management system and installing smart sockets for remote power-off control. All analysis logs were recorded in an independent audit stream for subsequent traceability and compliance review.
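The comparison step amounts to a small rule table: each indicator has a direction (higher or lower is better) and an attached suggestion. The sketch below is one hypothetical way to express it; the rule entries, the relative-gap severity measure, and the `tolerance` parameter are assumptions, while the test values are the figures from this paragraph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    indicator: str
    higher_is_better: bool
    suggestion: str


# Hypothetical rule table mirroring the three findings of this embodiment.
RULES = (
    Rule("load_factor", True, "Optimize production scheduling to improve equipment utilization."),
    Rule("power_factor", True, "Install a dynamic reactive power compensation device."),
    Rule("standby_ratio", False, "Set up equipment shutdown management and remote power-off via smart sockets."),
)


def find_anomalies(measured: dict, benchmark: dict, rules=RULES, tolerance: float = 0.0) -> list:
    """Flag indicators worse than the peer benchmark, most severe first."""
    anomalies = []
    for rule in rules:
        value, ref = measured[rule.indicator], benchmark[rule.indicator]
        # Relative shortfall against the benchmark; positive means "worse".
        gap = (ref - value) / ref if rule.higher_is_better else (value - ref) / ref
        if gap > tolerance:
            anomalies.append({
                "indicator": rule.indicator,
                "value": value,
                "benchmark": ref,
                "relative_gap": round(gap, 3),
                "suggestion": rule.suggestion,
            })
    return sorted(anomalies, key=lambda a: a["relative_gap"], reverse=True)
```

With the Workshop 3 figures, all three indicators are flagged, and the standby ratio (23% against 12%) ranks as the most severe.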
[0019] The analysis results are returned to the main control flow in structured JSON format, containing four main parts: raw indicator data, a benchmark comparison table, a list of anomalies, and a list of improvement suggestions. This result is then sent to the response synthesis and output module, initiating the natural language generation process. The system has 43 built-in response templates, indexed by both business type and user attributes. Since this request originated from an industrial user and its subject was energy efficiency analysis, the system selected template number TPL-EFF-IND-02. Its language style leans towards technical expressions, using formal terms such as "your organization," "suggestion," and "can be considered," while avoiding imperative expressions like "should" and "must" to prevent legal disputes.
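Template selection by business type and user attributes can be sketched as a dictionary lookup with a fallback, and the four-part result as a JSON document. Only the ID TPL-EFF-IND-02 comes from this embodiment; the other template IDs, the key names, and the JSON field names are hypothetical.

```python
import json

# Hypothetical index: (business type, user category) -> template ID.
# Only TPL-EFF-IND-02 is named in the embodiment; the others are placeholders.
TEMPLATE_INDEX = {
    ("energy_efficiency", "industrial"): "TPL-EFF-IND-02",
    ("energy_efficiency", "residential"): "TPL-EFF-RES-01",
}
DEFAULT_TEMPLATE = "TPL-GENERIC-01"  # hypothetical fallback


def select_template(business_type: str, user_category: str) -> str:
    """Pick a template by business type and user attributes, with a generic fallback."""
    return TEMPLATE_INDEX.get((business_type, user_category), DEFAULT_TEMPLATE)


def build_result(indicators: dict, benchmarks: dict, anomalies: list, suggestions: list) -> str:
    """Serialize the four-part structured result handed to response synthesis."""
    return json.dumps({
        "indicators": indicators,
        "benchmark_comparison": [
            {"indicator": k, "value": v, "benchmark": benchmarks.get(k)}
            for k, v in indicators.items()
        ],
        "anomalies": anomalies,
        "suggestions": suggestions,
    }, ensure_ascii=False)
```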
[0020] Template filling follows a rule-based ordering mechanism. First, the structured data is mapped to preset placeholders; for example, {{device_name}} is replaced with "Workshop 3", {{analysis_month}} with "last month", and a summary of the comparison table is inserted for {{benchmark_comparison}}. Next, sentence blocks are ordered by anomaly severity, so that the suggestions with the greatest energy-saving potential come first. Finally, guiding statements are appended, such as: "Below is the energy efficiency weakness analysis report generated for you: [link to conclusion]. Click the link below to view detailed charts and historical trend comparisons." The chart links point to fixed report URLs in the platform's BI system, with a timestamp and a signature token appended to ensure secure access.
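The three steps above (placeholder substitution, severity ordering, signed links) can be sketched as follows. The `{{name}}` placeholder syntax is taken from this paragraph. The HMAC-SHA256 signing scheme, the `report`/`ts`/`sig` query parameter names, and the shared secret are assumptions about how "timestamps and signature tokens" might be implemented. The BI server would recompute the same HMAC and reject stale timestamps; that check is not shown.

```python
import hashlib
import hmac
import re
import time
from typing import Optional
from urllib.parse import urlencode

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_template(template: str, values: dict) -> str:
    """Replace {{name}} placeholders; unknown placeholders are left as-is."""
    return PLACEHOLDER.sub(lambda m: str(values.get(m.group(1), m.group(0))), template)


def order_blocks(blocks: list) -> list:
    """Order sentence blocks so the largest energy-saving potential comes first."""
    return [b["text"] for b in sorted(blocks, key=lambda b: b["saving_potential"], reverse=True)]


def signed_report_url(base_url: str, report_id: str, secret: bytes, now: Optional[int] = None) -> str:
    """Append a timestamp and an HMAC-SHA256 token (hypothetical scheme) to a report URL."""
    ts = int(time.time()) if now is None else now
    token = hmac.new(secret, f"{report_id}:{ts}".encode(), hashlib.sha256).hexdigest()
    return f"{base_url}?{urlencode({'report': report_id, 'ts': ts, 'sig': token})}"
```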
[0021] The complete synthesized response is filtered by a security review module, which combines regular expressions with a keyword blacklist to block sensitive words, promissory statements (such as guaranteeing savings of XX million yuan), and content that risks leaking personal privacy information. After passing review, the response is pushed to the user's front-end interface over a persistent WebSocket connection. The measured end-to-end response time was 2.08 seconds, meeting the design specifications. Meanwhile, the system operation status monitoring module continuously collects performance data for each component: the front-end interface call count increased by 1, the average latency of the semantic parsing engine was 620 milliseconds, the processing time of the energy efficiency diagnosis unit was 890 milliseconds, and the overall task completion rate was 100%. User satisfaction is collected through a pop-up "Does this solve your problem?" button; in this example, the user clicked "Yes," and the rating was included in the statistics.
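The dual regex-plus-blacklist review can be sketched as below. The embodiment names the blocked categories (sensitive words, promises, privacy leaks) but not the actual patterns. The patterns and keywords here are therefore hypothetical examples: a mainland China mobile number format, an 18-character resident ID format, and a crude "guaranteed savings" detector.

```python
import re

# Hypothetical patterns and keywords; the embodiment names only the categories.
PATTERNS = {
    "promise": re.compile(r"\bguarantee[sd]?\b.*\bsav", re.IGNORECASE),
    "mobile_number": re.compile(r"\b1[3-9]\d{9}\b"),
    "id_card_number": re.compile(r"\b\d{17}[\dXx]\b"),
}
BLACKLIST = ("risk-free", "100% safe")


def review(text: str):
    """Return (passed, reasons); any regex or keyword hit blocks the response."""
    reasons = [name for name, pattern in PATTERNS.items() if pattern.search(text)]
    lowered = text.lower()
    reasons += [f"keyword:{word}" for word in BLACKLIST if word in lowered]
    return (not reasons, reasons)
```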
[0022] All interaction logs are persistently stored in the ELK stack and cover the end-to-end data of each interaction: the original request, preprocessing trace, semantic parsing details, routing path, analysis parameters, response content snapshot, user feedback, and so on. These logs are written to the Elasticsearch cluster under a unique transaction ID for subsequent model retraining and system optimization. The monitoring module also tracks resource usage in real time: if any component's average latency exceeds 1.5 seconds for five consecutive minutes, or its error rate rises above 5%, an alarm is automatically triggered to notify the operations team to investigate, ensuring system stability.
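The alarm rule can be sketched as a per-component monitor fed with one-minute aggregates. It assumes that "average latency exceeding 1.5 seconds for five consecutive minutes" means each of the last five per-minute averages is above the limit. It also assumes the error rate is evaluated per minute. Both interpretations and the class name are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ComponentMonitor:
    """Hypothetical per-component alarm rule over one-minute aggregates."""
    latency_limit_s: float = 1.5
    window_minutes: int = 5
    error_rate_limit: float = 0.05
    recent: deque = field(default_factory=deque)

    def record_minute(self, avg_latency_s: float, requests: int, errors: int) -> list:
        """Feed one minute of metrics; return the alarm reasons (empty list = no alarm)."""
        self.recent.append(avg_latency_s)
        if len(self.recent) > self.window_minutes:
            self.recent.popleft()
        reasons = []
        window_full = len(self.recent) == self.window_minutes
        if window_full and all(lat > self.latency_limit_s for lat in self.recent):
            reasons.append("latency")
        if requests and errors / requests > self.error_rate_limit:
            reasons.append("error_rate")
        return reasons
```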
[0023] The above process fully demonstrates the technical implementation path of this invention in industrial energy efficiency diagnosis scenarios, showcasing the advantages of deep integration of large language models and professional knowledge. To further expand the technical boundaries and verify the adaptability of this invention under different business logics, another embodiment with substantial differences is proposed.
[0024] Embodiment 2. This embodiment focuses on a complex query submitted by a residential user through the Green State Grid App: "My electricity bill is suddenly very high this month, is there a problem with the meter? Also, I'd like to check if it will be hot next week, and whether I need to turn on the air conditioner." The key difference between this scenario and Embodiment 1 is that the user's intent is compound and potentially conflicting: it includes a query about abnormal electricity usage (troubleshooting), a prediction of future load behavior (load forecasting), and a common-sense judgment (weather and air conditioner use). Traditional systems typically identify only the dominant intent or split the request forcibly, whereas this invention, by constructing a multi-intent parallel parsing and conflict resolution mechanism, accurately decomposes such compound requests and answers them collaboratively, which constitutes a fundamental difference from Embodiment 1 in the core technical approach.
[0025] After receiving the compound request, the text preprocessing module still performs the basic operations of character standardization, noise removal, word segmentation, and entity annotation. The word segmentation result is: my home / this month / electricity bill / suddenly / very / high / , / is it / the meter / has / a problem / ? / also / check / next week / weather / hot / or not / , / need / turn on / air conditioner / ?. Entity annotation identifies "this month" as a time range, "electricity bill" as an electricity billing entity, "meter" as a device type, "next week" as a future time window, "weather" as an environmental variable, and "air conditioner" as a household appliance type. Notably, the degree expression "very high" is given special semantic weight, which triggers the subsequent anomaly detection logic.
[0026] After loading the DeepSeek-V2 model, the semantic parsing engine module no longer uses a single-intent classification strategy but activates a multi-label joint prediction mode. This mode modifies the model's output layer, replacing the original single-label softmax with a multi-task output head made up of multiple sigmoid units, each giving an independent probability estimate for one intent. The model is specifically trained to recognize intent co-occurrence patterns, using 23,000 manually constructed multi-intent training samples that cover common combinations such as cost inquiry + equipment status, electricity usage advice + weather impact, and repair request + energy-saving consultation. Inference shows that the request activates two high-confidence intents simultaneously: abnormal electricity usage detection (confidence 0.87) and load forecasting (confidence 0.84). Both exceed the 0.75 threshold, so the system classifies the request as a valid multi-intent request.
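The difference between a softmax head and per-intent sigmoid outputs can be shown in a few lines. A softmax makes the intents compete, because the probabilities must sum to 1, so two intents can never both exceed 0.75. Independent sigmoids have no such constraint. The logit values below are hypothetical, chosen so that the sigmoid outputs round to the 0.87 and 0.84 quoted above; the intent labels are illustrative.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(logits: list) -> list:
    """Single-label baseline: probabilities compete and sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def active_intents(logits: dict, threshold: float = 0.75) -> dict:
    """Multi-label head: one independent sigmoid per intent, keep those above threshold."""
    probs = {intent: sigmoid(z) for intent, z in logits.items()}
    return {intent: round(p, 2) for intent, p in probs.items() if p > threshold}
```

For example, `active_intents({"abnormal_usage_detection": 1.9, "load_forecast": 1.66, "repair_request": -2.0})` keeps the first two intents. A softmax over the same three logits gives no intent above 0.75.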
[0027] Upon entering the S140 judgment phase, the system traverses the preset intent set, confirming that both are legitimate business types, and the process proceeds to the S150 parameter verification stage. The system checks the necessary parameters for each intent: for abnormal electricity consumption detection, the metering point number and comparison period must be clearly defined; currently, only the "my home" designation is implied, lacking a specific device ID; for load forecasting, the forecast time range and geographical location must be determined; "next week" has been identified, but the specific power supply area has not been located. Missing parameters trigger the S151 follow-up questioning mechanism, but this embodiment uses a differentiated concurrent follow-up questioning strategy, rather than the serial single-round questioning in Embodiment 1.
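The S140 intent-validity check and the S150 parameter check can be sketched against a slot schema. The intent labels and slot names below are hypothetical; the required parameters follow this paragraph (metering point and comparison period for anomaly detection, forecast window and supply area for load forecasting).

```python
# Hypothetical slot schema: valid intents and their required parameter slots.
REQUIRED_SLOTS = {
    "abnormal_usage_detection": ("metering_point_id", "comparison_period"),
    "load_forecast": ("forecast_window", "supply_area"),
}


def valid_intents(intents: list) -> list:
    """S140 sketch: keep only intents that belong to the preset business intent set."""
    return [i for i in intents if i in REQUIRED_SLOTS]


def missing_slots(intents: list, slots: dict) -> dict:
    """S150 sketch: map each intent to its unfilled required slots (empty dict = complete)."""
    gaps = {}
    for intent in valid_intents(intents):
        missing = [s for s in REQUIRED_SLOTS[intent] if not slots.get(s)]
        if missing:
            gaps[intent] = missing
    return gaps
```

A non-empty result is what triggers the S151 follow-up questions, one per missing slot.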
[0028] The task routing and scheduling module generates two independent follow-up questions. The first, "Which address's meter do you mean by 'my home'? Please provide the account number or installation location for verification.", completes the identity information required for anomaly detection. The second, "Which city are you in? This will help us provide more accurate weather and electricity consumption forecasts.", obtains the geographic parameter for load forecasting. The two questions are displayed side by side as cards on the front-end interface, so the user can provide both pieces of information at once, which greatly improves interaction efficiency. If the user answers only one question, the system keeps the other in a pending-confirmation state and prompts again in the next interaction. A maximum of 3 interaction rounds is allowed for the session as a whole (rather than 3 rounds per question thread), respecting the user's own input pace.
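The concurrent follow-up behaviour (ask all open questions at once, keep unanswered ones pending, stop after a session-wide round limit) can be sketched as a small session object. The class, its field names, and the reading of the 3-round limit as session-wide are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FollowUpSession:
    """Hypothetical tracker for concurrent follow-up questions across turns."""
    pending: dict  # slot name -> question text
    max_rounds: int = 3  # session-wide limit, shared by all open questions
    rounds: int = 0
    answers: dict = field(default_factory=dict)

    def next_cards(self) -> list:
        """Questions to render side by side this turn; empty when done or out of rounds."""
        if not self.pending or self.rounds >= self.max_rounds:
            return []
        self.rounds += 1
        return list(self.pending.values())

    def receive(self, filled: dict) -> None:
        """Record whatever slots the user answered; the rest stay pending."""
        for slot, value in filled.items():
            if slot in self.pending and value:
                self.answers[slot] = value
                del self.pending[slot]

    @property
    def exhausted(self) -> bool:
        """True when questions remain open but no rounds are left (e.g. hand off to an agent)."""
        return bool(self.pending) and self.rounds >= self.max_rounds
```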
[0029] Suppose the user replies: "Account number is HN123456789, I am in Yuelu District, Changsha City." The system re-executes processes S120 to S130 to complete entity recognition and semantic parsing of the new information, and completes the metering point number and geographic location parameters. At this point, the parameters of both intents are complete, and the task routing module no longer performs single forwarding but initiates a parallel task distribution mechanism: the abnormal power consumption detection request is sent to the abnormal power consumption detection unit, and the load forecasting request is sent to the load trend forecasting unit. The two analysis tasks are executed concurrently in independent containers without blocking each other.
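Parallel distribution of the two completed intents can be sketched with `asyncio.gather`, which runs the analysis calls concurrently. With `return_exceptions=True`, a failure in one unit is returned as a value instead of cancelling the other. The handler functions are hypothetical stand-ins for calls to the containerized analysis units.

```python
import asyncio


async def detect_abnormal_usage(params: dict) -> dict:
    await asyncio.sleep(0.01)  # stand-in for a call to the abnormal usage detection unit
    return {"unit": "abnormal_usage", "metering_point": params["metering_point_id"]}


async def forecast_load(params: dict) -> dict:
    await asyncio.sleep(0.01)  # stand-in for a call to the load trend forecasting unit
    return {"unit": "load_forecast", "area": params["supply_area"]}


# Hypothetical routing table: intent label -> analysis coroutine.
HANDLERS = {"abnormal_usage_detection": detect_abnormal_usage, "load_forecast": forecast_load}


async def dispatch_all(tasks: dict) -> dict:
    """Run one analysis per intent concurrently; one failure does not cancel the others."""
    intents = list(tasks)
    results = await asyncio.gather(
        *(HANDLERS[i](tasks[i]) for i in intents), return_exceptions=True
    )
    return dict(zip(intents, results))
```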
[0030] After receiving the task, the abnormal power consumption detection unit retrieves the target user's daily power consumption data for the past 12 months, calculates the month-to-date cumulative consumption, and compares it with the same period last year and with the average of the past 6 months. The system uses a dynamic threshold mechanism: the alarm thresholds of the various criteria are not fixed but are calculated from the mean and standard deviation of the equipment's historical operating data. Suppose the average monthly consumption over the past six months is μ = 320 kWh with standard deviation σ = 45 kWh; the normal fluctuation range is then μ ± 1.5σ, i.e., [252.5, 387.5] kWh. If this month's consumption reaches 410 kWh, exceeding the upper limit, a level-1 warning is triggered. The system then analyzes the consumption pattern: the hourly load profile obtained from the AMI system shows that the base load between 23:00 and 06:00 the next morning has risen from the usual 1.2 kW to 2.8 kW, which raises an initial suspicion of equipment leakage current or electricity theft. The system generates a preliminary conclusion: recent consumption is significantly higher than historical levels, with an especially abnormal increase in nighttime standby load, and on-site meter verification and line inspection are recommended.
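The dynamic threshold follows directly from the figures above: with μ = 320 kWh, σ = 45 kWh and k = 1.5, the band is [252.5, 387.5] kWh, and 410 kWh lies above it. The sketch below reproduces that check. The night-base-load rule (flag when the night base exceeds 1.5 times its usual level) and the use of the population standard deviation in `stats_from_history` are hypothetical additions, since the embodiment states neither.

```python
import statistics


def stats_from_history(monthly_kwh: list):
    """Mean and (population) standard deviation of historical monthly consumption."""
    return statistics.fmean(monthly_kwh), statistics.pstdev(monthly_kwh)


def band(mu: float, sigma: float, k: float = 1.5):
    """Normal fluctuation range mu +/- k*sigma."""
    return mu - k * sigma, mu + k * sigma


def assess_month(month_kwh: float, mu: float, sigma: float,
                 night_base_kw: float, usual_night_base_kw: float,
                 k: float = 1.5, night_factor: float = 1.5) -> list:
    findings = []
    _, upper = band(mu, sigma, k)
    if month_kwh > upper:  # the embodiment alarms on exceeding the upper bound
        findings.append("level_1_warning")
    # Hypothetical rule: night base load above night_factor x its usual level.
    if night_base_kw > night_factor * usual_night_base_kw:
        findings.append("night_base_load_increase")
    return findings
```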
[0031] Meanwhile, the load trend forecasting unit starts its analysis. It uses a hybrid XGBoost and Transformer architecture: XGBoost handles discrete features such as weather conditions (sunny / rainy / cloudy), day type (weekday / weekend / holiday), and seasonal factors (summer / winter), while the Transformer captures long-term dependencies in the load sequence, taking as input a time-series matrix of hourly load data from the past 90 days. External data are accessed through the meteorological bureau's API to obtain hourly temperature forecasts for Changsha for the next 7 days, with a predicted high of 37℃, which meets the conditions for air conditioner use. The model is incrementally updated every day at midnight with the latest 24 hours of actual load data, keeping the forecast error stable at MAPE ≤ 6.8%. The forecast indicates that air conditioning load will increase by approximately 45% next Sunday; the system suggests that the user adjust the on/off schedule of appliances during peak hours and consider participating in demand response programs to obtain subsidies.
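The hybrid model itself is not reproduced here. Instead, the following minimal sketch covers the data preparation and the accuracy metric named in this paragraph. It assumes a 90 × 24 day-by-hour matrix as the sequence input, one-hot encoding for the discrete features (the embodiment does not specify the encoding), and the standard MAPE definition, skipping zero actuals. All function names are illustrative.

```python
def hourly_matrix(hourly_load: list, days: int = 90) -> list:
    """Reshape the last `days` x 24 hourly readings into a day-by-hour matrix."""
    needed = days * 24
    if len(hourly_load) < needed:
        raise ValueError(f"need {needed} hourly points, got {len(hourly_load)}")
    recent = hourly_load[-needed:]
    return [recent[d * 24:(d + 1) * 24] for d in range(days)]


# Discrete feature vocabularies taken from the embodiment.
CATEGORIES = {
    "weather": ("sunny", "rainy", "cloudy"),
    "day_type": ("weekday", "weekend", "holiday"),
    "season": ("summer", "winter"),
}


def one_hot(features: dict) -> list:
    """Encode the discrete features for the gradient-boosted tree part (assumed encoding)."""
    vec = []
    for name, values in CATEGORIES.items():
        vec.extend(1 if features[name] == v else 0 for v in values)
    return vec


def mape(actual: list, predicted: list) -> float:
    """Mean absolute percentage error in percent; points with zero actual load are skipped."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)
```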
[0032] After both analysis results are returned, the response synthesis and output module executes a multi-source information fusion generation strategy. Unlike the single-template filling of Embodiment 1, this embodiment uses paragraph-level splicing with logical connector injection: first, the anomaly detection result is output according to business priority, opening with the user's reported high electricity bill; a transitional sentence is then inserted; next, the response moves naturally into the load forecast section, drawing on the weather forecast analysis; finally, comprehensive suggestions are given, recommending that the user first investigate the devices consuming abnormal power at night and use the air conditioner sensibly during high-temperature periods to contain the rise in electricity costs. The complete response includes two independent data chart links, pointing to a historical electricity consumption comparison chart and a future load forecast curve, which the user can open separately.
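Paragraph-level splicing with connector injection can be sketched as ordering the per-intent paragraphs by business priority and inserting a transition sentence between known pairs. The priority values and the connector wording are hypothetical; the embodiment specifies only the order (anomaly result first, then load forecast) and that a transition is inserted.

```python
# Hypothetical business priority (lower = earlier) and connector sentences.
PRIORITY = {"abnormal_usage_detection": 0, "load_forecast": 1}
CONNECTORS = {
    ("abnormal_usage_detection", "load_forecast"):
        "Besides the bill itself, next week's weather will also affect your usage.",
}


def fuse(sections: dict, closing: str) -> str:
    """Splice per-intent paragraphs in priority order, injecting transitions between them."""
    ordered = sorted(sections, key=lambda intent: PRIORITY.get(intent, len(PRIORITY)))
    parts = []
    previous = None
    for intent in ordered:
        transition = CONNECTORS.get((previous, intent))
        if transition:
            parts.append(transition)
        parts.append(sections[intent])
        previous = intent
    parts.append(closing)  # comprehensive suggestions come last
    return "\n\n".join(parts)
```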
[0033] This embodiment demonstrates the distinctive technical capabilities of the present invention in processing multi-intent compound requests. Its core differences are: 1) multi-label intent recognition instead of single-intent classification; 2) concurrent follow-up questions when parameters are missing; 3) parallel execution of multiple intelligent analysis components; and 4) fusion of multi-source results into a single generated response.
[0034] Embodiment 3. This embodiment addresses an emergency alarm handling request submitted by equipment maintenance personnel via voice input during an inspection: "Substation A main transformer oil temperature alarm, infrared thermography shows C-phase bushing temperature reaches 89 degrees Celsius, please immediately assess health status!" The technical difference in this scenario is that the input modality is pure voice in a highly specialized context, which requires the system to tolerate speech transcription errors robustly, recognize technical terms accurately, and fuse and analyze multi-source sensor data; this is fundamentally different from the text-based interaction of the previous two embodiments.
[0035] After the front-end interface module receives the voice request, it first calls the ASR (Automatic Speech Recognition) engine integrated into the platform for transcription. This engine is based on the Conformer architecture and has been specifically optimized with 12,000 hours of power-industry speech data (including complex conditions such as substation background noise, electromagnetic interference on intercoms, and overlapping speech from multiple speakers). At a signal-to-noise ratio ≥ 20 dB, its word accuracy reaches 95.6%. The raw audio passes through noise reduction, endpoint detection, acoustic model inference, and language model rescoring, and the output text is: "The main transformer oil temperature of Substation A alarms. Infrared temperature measurement shows that the temperature of the Phase C bushing reaches eighty-nine degrees. Please immediately evaluate the health status!" Because the spoken number 89 is transcribed as the Chinese numeral for "eighty-nine", numerical normalization must be performed in the preprocessing stage.
[0036] Once the text preprocessing module starts, in addition to routine cleaning it focuses on correcting power-industry terms and mapping them to standard forms. The system maintains a lookup table of easily confused terms; for example, it automatically corrects the near-homophone errors 主便 to 主变 (abbreviation for main transformer), 油问 to 油温 (oil temperature), and 套观 to 套管 (bushing). This mapping is decided by combining edit distance with context probability. At the same time, the module converts the Chinese numeral for eighty-nine into the Arabic number 89 and, based on context, resolves the unit 度 ("degree") to ℃. The entity annotation model has been specifically enhanced to recognize device hierarchies: it resolves the three-level topology Substation A → Main Transformer → Phase C bushing and annotates the temperature value as a key parameter.
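A minimal sketch of the two normalization steps follows: confusion-table correction guarded by edit distance, and Chinese-numeral-to-℃ conversion. The confusion table entries come from this paragraph. The context-probability half of the decision is omitted, and the "only convert 度 when the text mentions 温 (temperature)" rule is a crude, hypothetical stand-in for context resolution. The numeral parser handles only values from 1 to 99.

```python
import re

# Confusion table from the embodiment (near-homophone ASR errors -> power terms).
CONFUSION = {"主便": "主变", "油问": "油温", "套观": "套管"}
DIGITS = {"一": 1, "二": 2, "两": 2, "三": 3, "四": 4, "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
# Numerals 1-99 immediately followed by 度 ("degree").
NUM_DEGREE = re.compile(r"([一二三四五六七八九]?十[一二三四五六七八九]?|[一二三四五六七八九])度")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def correct_terms(text: str, max_distance: int = 1) -> str:
    """Apply confusion-table corrections that are genuine near-misses."""
    for wrong, right in CONFUSION.items():
        if edit_distance(wrong, right) <= max_distance:
            text = text.replace(wrong, right)
    return text


def cn_to_int(s: str) -> int:
    """Chinese numerals 1-99, e.g. 八十九 -> 89, 十五 -> 15, 九 -> 9."""
    if "十" in s:
        tens, _, ones = s.partition("十")
        return (DIGITS[tens] if tens else 1) * 10 + (DIGITS[ones] if ones else 0)
    return DIGITS[s]


def normalize_temperature(text: str) -> str:
    """Rewrite <numeral>度 as <number>℃, but only in a temperature context (hypothetical rule)."""
    if "温" not in text:
        return text  # 度 may mean kWh colloquially, so leave it untouched
    return NUM_DEGREE.sub(lambda m: f"{cn_to_int(m.group(1))}℃", text)
```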
[0037] When the text is fed into the semantic parsing engine, the system automatically enables the high-priority inference channel because the request involves device fault diagnosis. This channel allocates more GPU resources and shortens queuing time to ensure a rapid response to critical tasks. The DeepSeek-V2 model outputs the main intent as device health assessment, with a confidence of 0.93. The filled parameter slots include the device path (Substation A / Main Transformer / Phase C bushing), the monitoring type (infrared temperature measurement), the measured temperature (89℃), and the alarm level (defaulting to emergency). The system skips routine parameter verification and proceeds directly to the task routing stage.
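The high-priority channel can be approximated, at the scheduling level, by a priority queue in which fault-related requests jump ahead while equal-priority requests keep FIFO order. GPU resource allocation is not modelled. The `device_fault` tag and the two-level priority are hypothetical; the embodiment does not say how the request is recognized as fault-related before inference.

```python
import heapq
import itertools
from dataclasses import dataclass, field

HIGH_PRIORITY_TAGS = {"device_fault"}  # hypothetical preprocessing tag


@dataclass(order=True)
class Job:
    priority: int  # 0 = high-priority channel, 1 = normal
    seq: int  # submission order, keeps FIFO within a priority level
    request: dict = field(compare=False)


class InferenceQueue:
    def __init__(self) -> None:
        self._heap = []
        self._seq = itertools.count()

    def submit(self, request: dict) -> None:
        prio = 0 if HIGH_PRIORITY_TAGS & set(request.get("tags", ())) else 1
        heapq.heappush(self._heap, Job(prio, next(self._seq), request))

    def next_request(self):
        """Pop the most urgent request, or None when the queue is empty."""
        return heapq.heappop(self._heap).request if self._heap else None
```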
[0038] After recognizing the device health assessment intent, the task routing and scheduling module not only forwards the request to the device health assessment unit but also triggers a multi-source collaborative data retrieval mechanism. This mechanism concurrently issues three asynchronous queries through a unified data middleware: 1) the real-time oil temperature, winding temperature, and load current of the main transformer from the SCADA system; 2) the infrared thermal image sequence of the Phase C bushing over the past 72 hours from the online monitoring platform; 3) the concentrations of characteristic gases such as hydrogen, methane, and acetylene in the most recent oil sample from the DGA (Dissolved Gas Analysis) system. All data are aligned by timestamp to form a multi-dimensional observation matrix.
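The timestamp alignment can be sketched as an "as-of" join. For each time on a common grid, every source contributes its latest sample at or before that time, or `None` if it has none yet. Treating alignment as an as-of join is an assumption; the embodiment only states that the data are aligned by timestamp. The source names and values in the test are illustrative.

```python
from bisect import bisect_right


def asof(series: list, t: float):
    """Value of the latest sample at or before t (series sorted by timestamp), else None."""
    i = bisect_right([ts for ts, _ in series], t)
    return series[i - 1][1] if i else None


def align(sources: dict, grid: list):
    """Observation matrix: one row per grid timestamp, one column per source (sorted by name)."""
    names = sorted(sources)
    return names, [[asof(sources[name], t) for name in names] for t in grid]
```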
[0039] After receiving the structured input and the multi-source sensor data, the equipment health assessment unit starts a fusion analysis process. It uses a data fusion algorithm based on Dempster-Shafer (D-S) evidence theory to merge health indicators from different sources with confidence weighting. Specifically, assume three sources of evidence:
E1: infrared thermometry, temperature 89℃ > threshold 85℃, membership value μ1 = 0.8;
E2: SCADA oil temperature reading, currently 82℃ < alarm limit 90℃, μ2 = 0.3;
E3: oil chromatography (DGA) data showing an acetylene concentration of 5.2 ppm, between the warning value of 5 ppm and the danger value of 20 ppm, μ3 = 0.6.
The joint belief is calculated with Dempster's rule of combination, m = m1 ⊕ m2 ⊕ m3, where m1, m2 and m3 are the basic probability assignments derived from E1 (high), E2 (normal) and E3 (warning), and Bel(Fault) is read from the combined mass function m. After iterative combination, the overall belief in the fault hypothesis reaches 0.74, indicating a risk of local overheating that requires close attention.
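Dempster's rule of combination can be sketched over mass functions keyed by frozensets of hypotheses. The rule multiplies masses, keeps non-empty intersections, and renormalizes by 1 - K, where K is the total conflicting mass. The embodiment gives only the membership values μ1 to μ3, not how they become mass functions. The `from_membership` mapping below (μ to {fault}, 1 - μ to {normal}) is therefore a hypothetical choice. With it, the three sources combine to Bel(fault) = 0.72, which need not match the 0.74 reported above.

```python
from functools import reduce
from itertools import product

FAULT, NORMAL = frozenset({"fault"}), frozenset({"normal"})
THETA = FAULT | NORMAL  # frame of discernment (total ignorance)


def combine(m1: dict, m2: dict) -> dict:
    """Dempster's rule: conjunctive combination normalized by 1 - K (K = conflicting mass)."""
    fused, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            fused[inter] = fused.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in fused.items()}


def belief(m: dict, hypothesis: frozenset) -> float:
    """Bel(A): total mass of all focal sets contained in A."""
    return sum(w for s, w in m.items() if s <= hypothesis)


def from_membership(mu: float) -> dict:
    """Hypothetical mapping of a membership value to a mass function."""
    return {FAULT: mu, NORMAL: 1.0 - mu}
```

Usage: `reduce(combine, [from_membership(mu) for mu in (0.8, 0.3, 0.6)])` yields the fused mass function. A source that is unsure can instead put part of its mass on `THETA`.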
[0040] The system further accessed the knowledge graph database to query the equipment's historical maintenance records. It found that a sealing ring had been replaced six months prior due to a similar issue. Combined with the current three-phase temperature imbalance of 18% (standard <5%), the final assessment conclusion was: the C-phase bushing exhibits a persistent heating defect, suspected to be caused by poor sealing leading to increased eddy current losses. A power outage inspection was recommended within 72 hours, with a high priority. This recommendation, along with links to thermal imaging data, historical trend comparisons, and references to maintenance procedures, was returned.
[0041] The response synthesis module uses template TPL-DIAG-EMG-01 in an emergency notification format: "Equipment health assessment results indicate an abnormal temperature in the Phase C bushing of the Substation A main transformer. A comprehensive assessment suggests a risk of localized overheating; a power outage inspection is recommended within 72 hours. See the attachment for detailed analysis." The message is delivered to the relevant personnel via SMS, App push notification, and a dispatch console pop-up, so that the alarm is distributed within seconds.
[0042] This embodiment presents a completely different technical implementation path from the previous two embodiments in terms of input modality, data fusion method, and response priority mechanism. In particular, it introduces voice error-tolerant processing, multi-source sensor data DS fusion algorithm and emergency channel scheduling strategy, which constitute an independent and effective technical solution variant and meet the requirements of equivalent embodiments in the sense of patent law.
[0043] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Although the present invention has been disclosed above with reference to preferred embodiments, it is not intended to limit the present invention. Any person skilled in the art can make some modifications or alterations to the above-disclosed technical content to create equivalent embodiments without departing from the scope of the present invention. Any simple modifications, equivalent changes and alterations made to the above embodiments based on the technical essence of the present invention without departing from the scope of the present invention shall still fall within the scope of the present invention.
Claims
1. A method for large-model-based semantic understanding and application in energy business, characterized by comprising: The front-end interface module receives service requests in natural language from users of the energy digitization platform. The service request is preprocessed using a text preprocessing module, including character set conversion, noise character removal, word segmentation, and coarse-grained annotation of named entities in the energy field. The preprocessed text is input into the semantic parsing engine module, which calls a large language model adapted for the energy domain to perform intent classification and key parameter slot extraction, thereby obtaining structured semantic understanding results. Determine whether the intent label in the semantic understanding result exists in the preset set of valid business intents. If so, proceed to the parameter completeness verification step. Check whether all necessary parameter slots associated with the intent have been successfully filled. If any are missing, generate natural language follow-up questions that conform to the energy business context and request the user to supplement information through the front-end interface. Repeat the aforementioned preprocessing and semantic parsing steps until the parameters are complete or the maximum number of follow-up questions is reached. Based on the finally determined main intent label and complete parameter set, the task routing and scheduling module forwards the service request to the corresponding scenario-based intelligent analysis component module. The scenario-based intelligent analysis component module retrieves relevant information from the energy business knowledge graph database, combines it with external data sources to execute specific business analysis logic, and generates structured analysis results.
The structured analysis results are input into the response synthesis and output module, where they are transformed into natural language descriptions and accompanied by data charts and references to form a complete response. The response content is pushed back to the front-end interface of the energy digitization platform, and the entire interaction process is logged for subsequent system optimization and model retraining.
2. The method for large-model-based semantic understanding and application in energy business according to claim 1, characterized in that the large language model adaptively trained for the energy field uses full-text professional literature of the power industry and FAQ entries from the platform's internal knowledge base as training data before deployment, and employs low-rank adaptation (LoRA) for parameter-efficient fine-tuning.
3. The method for large-model-based semantic understanding and application in energy business according to claim 1, characterized in that the energy business knowledge graph database is constructed using a graph database. Its node types include power users, distribution transformers, metering points, energy efficiency benchmark values, and typical load curves. Its edge relationships include connected to, belongs to, affects, conforms to standard, and historical anomaly pattern. The knowledge graph also periodically synchronizes incremental data from the SCADA system, the marketing business system, and the electricity consumption information collection system.
4. The method for large-model-based semantic understanding and application in energy business according to claim 1, characterized in that the scenario-based intelligent analysis component module includes an energy efficiency diagnosis and analysis unit, a load trend prediction unit, an abnormal electricity consumption detection unit, an equipment health assessment unit, and an energy-saving measure recommendation unit. Specifically, the energy efficiency diagnosis and analysis unit generates an energy efficiency weakness analysis report by comparing the user's historical electricity consumption data with industry energy efficiency benchmarks; the load trend prediction unit outputs future load prediction curves by combining weather forecasts and holiday information; the abnormal electricity consumption detection unit constructs a criterion model based on multi-dimensional features to detect potential electricity theft or equipment aging risks; the equipment health assessment unit integrates multi-source sensor data to determine the remaining lifespan of power distribution equipment; and the energy-saving measure recommendation unit matches suitable energy storage configuration schemes or demand response participation strategies based on the user's industry attributes, electricity pricing package, and load characteristics.
5. The method for large-model-based semantic understanding and application in energy business according to claim 4, characterized in that the abnormal power consumption detection unit introduces a dynamic threshold mechanism during analysis: the alarm thresholds of the various criteria are determined from the mean and standard deviation of the equipment's historical operating data, and an early warning is triggered when the real-time monitored value falls outside the mean plus or minus a set multiple of the standard deviation.
6. The method for large-model-based semantic understanding and application in energy business according to claim 1, characterized in that the response synthesis process adopts a multi-template selection strategy: the system has built-in response templates for different business types, the template is selected based on the main intent label and user identity attributes, and all templates have been reviewed for legal compliance.
7. The method for large-model-based semantic understanding and application in energy business according to claim 1, characterized in that, in the parameter completeness verification step, if the main intent is ambiguous or its confidence is lower than the set threshold, the request is automatically transferred to a human agent for assistance, and the case is recorded for subsequent iterative model training.
8. A system for large-model-based semantic understanding and application in energy business, characterized by comprising: The front-end interface module is used to receive natural language requests from users of the energy digitization platform; The text preprocessing module is used to perform preliminary cleaning and format standardization on the natural language request; The semantic parsing engine module is used to load and call a large language model that has been adaptively trained in the energy field for deep semantic understanding and intent recognition; The energy business knowledge graph database module is used to store and manage the topology relationships of power equipment, the energy efficiency index system, historical operation and maintenance records, and industry standard provisions; The task routing and scheduling module is used to distribute requests to the corresponding functional subsystems based on the identified business intent; The scenario-based intelligent analysis component module is used to perform specific business logic analysis and generate structured results; The response synthesis and output module is used to convert the analysis results into natural language feedback and return them to the user; The system operation status monitoring module is used to monitor model inference performance, call latency, and semantic accuracy.
9. The system for large-model-based semantic understanding and application in energy business according to claim 8, characterized in that the text preprocessing module performs character encoding standardization, special symbol filtering, stop word removal, and preliminary named entity annotation on the received natural language requests. The named entity annotation covers energy-related proprietary terms such as substation name, voltage level, equipment type, time range, and electricity unit.
10. The system for large-model-based semantic understanding and application in energy business according to claim 8, characterized in that the task routing and scheduling module makes path decisions based on the intent labels and parameter completeness output by the semantic parsing engine. When a clear business intent is identified and the parameters are complete, it routes directly to the corresponding intelligent analysis component. When parameters are missing, a follow-up questioning strategy is triggered to generate supplementary questions and initiate interactive follow-up questions through the front-end interface. When the intent is ambiguous or the confidence level is lower than the set threshold, the request is automatically transferred to a human agent for assistance.