Operational specification form template implementation method considering artificial intelligence utilization
By constructing a structured data warehouse and knowledge graph, and combining it with artificial intelligence models to generate operational standard forms, the problems of rigid design of operational standard forms and insufficient data value mining have been solved. This has enabled dynamic adaptation of operational standard forms and scientific decision-making, thereby improving operational efficiency and cost control.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA THREE GORGES CORPORATION
- Filing Date
- 2026-03-20
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies suffer from rigid operational standard forms, insufficient data value mining, and a disconnect between artificial intelligence models and daily operational management processes. This results in the inability to achieve two-way dynamic collaborative iteration between forms and artificial intelligence, making it difficult to support scientific and intelligent decision-making under complex working conditions.
We construct a structured data warehouse and knowledge graph for the field of urban wastewater treatment, combine artificial intelligence models to generate operational standard form templates, and achieve multi-dimensional analysis of operational data and closed-loop upgrades of models through compliance verification and intelligent interaction.
It enables dynamic adaptation of operational standard forms, improves operational efficiency and cost control, ensures bidirectional iteration between operational standard forms and artificial intelligence, and supports scientific decision-making under complex working conditions.
Smart Images

Figure CN122242473A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of urban wastewater treatment operation, and in particular to an implementation method for an operation specification form template that also takes into account the use of artificial intelligence. Background Technology
[0002] As a crucial component of urban infrastructure, the level of urban wastewater treatment management directly impacts the economic benefits of enterprises and the social environment. In traditional operation and management, standardized operational forms are often designed and used with traditional electronic form software (such as Excel and WPS Forms), relying heavily on manual operation. Staff must manually fill out, organize, and analyze form data, which is not only time-consuming and labor-intensive but also prone to human error. Furthermore, traditional methods struggle to efficiently mine and utilize massive amounts of operational data, failing to provide timely and accurate support for decision-making. While some existing form software possesses basic data processing capabilities, these functions fall short when facing complex operational scenarios and intelligent requirements, failing to achieve deep data insights and automated process handling. This rigid "one-time design, no change" model cannot achieve intelligent process handling and dynamic adaptation when facing wastewater treatment process upgrades, equipment iterations, or abnormal operating conditions. This leads to frequent deviations from standard practices in daily execution, failing to meet the needs of refined operations and scientific decision-making for enterprises.
[0003] To address the challenges of collecting and analyzing pipeline network or water affairs data, existing technology CN111178020A proposes a smart pipeline network analysis system based on big data. This solution acquires dynamic pipeline network data through a data acquisition module and generates blank and rule-based forms based on pre-stored static data. Dynamic data is then filled into the blank forms, and invalid entries are flagged as invalid, resulting in a filled form. Entries without invalid flags are matched against the rule-based forms, and alarms are issued for mismatches, resulting in a result form. However, this existing technology essentially remains at the level of passive static rule checking. Its form structure and matching rules are fixed and preset, lacking the ability for deep learning and dynamic evolution. As wastewater treatment scenarios become increasingly complex, this form, relying on fixed thresholds, cannot adaptively adjust to real-time water quality fluctuations or historical long-term patterns, nor can it achieve iterative optimization of the form's structure.
[0004] With the development of artificial intelligence (AI) technology, the industry has also attempted to introduce machine learning into the water sector for energy consumption or operational condition prediction. For example, existing technology CN119337085A proposes a multi-level feature-based machine learning prediction method for electricity consumption in wastewater treatment plants. This scheme integrates regional development characteristics, precipitation characteristics, and input variables such as wastewater treatment volume and wastewater treatment technology, and uses machine learning models (such as XGBoost, LightGBM, etc.) to predict the electricity demand of wastewater treatment plants. Although this existing technology introduces AI algorithms to solve the prediction problem of specific indicators, its model, as an isolated prediction tool, is completely detached from the operational standard forms frequently used by frontline personnel in wastewater treatment plants. This disconnect makes it difficult for AI prediction results to be automatically converted into standard operating guidelines and compliance verification rules on forms, and the underlying operational data cannot smoothly form a closed loop that continuously feeds back to the AI system. Furthermore, a single prediction model cannot cope with the coordinated linkage of multiple scenarios such as process control, equipment health, and compliance management. Summary of the Invention
[0005] The main objective of this invention is to provide an implementation method for operational standard form templates that takes into account the use of artificial intelligence. This addresses the technical problems in existing technologies, such as rigid operational form design, insufficient data value mining, and the disconnect between artificial intelligence models and daily operational management processes. These issues prevent the achievement of bidirectional dynamic collaborative iteration between forms and artificial intelligence, and make it difficult to support scientific and intelligent decision-making under complex working conditions.
[0006] To solve the above-mentioned technical problems, the technical solution adopted by the present invention is: a method for implementing an operational specification form template that also takes into account the use of artificial intelligence, comprising the following steps: S1. Construct a structured data warehouse and knowledge graph for the field of urban wastewater treatment, and train artificial intelligence models for corresponding urban wastewater treatment business scenarios based on the structured data warehouse; S2. Analyze the urban sewage treatment operation requirements input by natural language, and generate operation specification form templates by combining knowledge graphs and artificial intelligence models; S3. Collect on-site operational data through the generated operational specification form template, use artificial intelligence models for compliance verification and intelligent interaction during the execution process, and generate auxiliary decision-making suggestions based on the on-site operational data; S4. The collected on-site operational data and the execution effect data of auxiliary decision-making suggestions are fed back to the artificial intelligence system for multi-dimensional analysis. Based on the analysis results, the structure and content of the operational standard form template are iteratively optimized. At the same time, the fed-back data is used to upgrade the artificial intelligence model in a closed loop.
[0007] In the preferred scheme, the methods for constructing a structured data warehouse and knowledge graph in the field of urban wastewater treatment in S1 include: Collect historical operational data from urban wastewater treatment plants, covering dimensions such as water quality monitoring, equipment operation, chemical dosing, process control, and operation and maintenance management. Clean and deduplicate the data, unify the data format, and classify and store it to establish a structured data warehouse. Define nodes and edges in the knowledge graph. Nodes include water quality indicators, equipment, reagents, processes, and personnel. Edges represent the logical relationships between water quality indicators, equipment, reagents, processes, and personnel. Natural language processing technology was used to extract feature entities from the operating procedures and industry standards of urban sewage treatment plants to fill a knowledge graph.
[0008] In the preferred scheme, the artificial intelligence model trained in S1 for the corresponding urban wastewater treatment business scenario includes: The template generation optimization model is trained by combining a natural language understanding model based on BERT bidirectional encoder representation with a sequence labeling algorithm. The compliance verification model is trained using a knowledge graph-based graph neural network algorithm. The cost-efficiency optimization model was trained, and the cost-efficiency optimization model adopted a multivariate regression prediction model based on LightGBM lightweight gradient booster. The equipment health prediction model is trained using a Transformer time series prediction model based on time series data.
[0009] In the preferred solution, the method used in S2 to generate the operation specification form template by combining knowledge graphs and artificial intelligence models is as follows: The similarity between the parsed operational requirements and the basic templates in the historical template library is calculated using a collaborative filtering algorithm. When the similarity is greater than the preset matching threshold, the corresponding basic template is invoked. When the similarity is lower than the preset matching threshold, the required fields corresponding to the business process are automatically matched by the graph traversal algorithm of the knowledge graph, and the initial template framework is automatically generated. The preset matching threshold is 75%-85%.
[0010] In the preferred solution, after automatically generating the initial template framework, the template is used to generate an optimized model that inputs the parsed operational requirement keywords and business nodes and relationships in the knowledge graph, and outputs and automatically sets the field set, field association rules and compliance verification rules of the operational specification form template. The template generation optimization model incorporates a knowledge graph forced attention mechanism into the output layer of the BERT bidirectional encoder representation model. When generating fields, it prioritizes matching the mandatory compliance fields for the wastewater treatment industry and business association rules defined in the knowledge graph, ensuring that the generated templates comply with industry standards and water plant business processes.
[0011] In the preferred solution, S3 utilizes an artificial intelligence model for compliance verification and intelligent interaction during execution as follows: The compliance verification model takes on-site operational data as input in real time and uses graph neural networks to calculate the matching degree between on-site operational data and preset standard thresholds in the knowledge graph, outputting the data compliance judgment result and the level of abnormal data. The graph neural network algorithm of the compliance verification model adds the temporal feature dimension to simultaneously verify the threshold compliance of a single data point and the consistency of the process logic of operational data within a continuous time period, and identifies hidden operational anomalies that cannot be covered by conventional threshold verification. When on-site operational data triggers preset control thresholds, the subgraph matching algorithm of the knowledge graph identifies operational execution deviations and automatically pushes corresponding disposal suggestions, historical solutions, and compliance correction guidelines for the urban sewage treatment business scenario.
[0012] In the preferred scheme, when the water quality monitoring data in the on-site operation data exceeds the standard, process adjustment suggestions are automatically pushed, including pop-up prompts in forms, system message pushes, mobile reminders, etc., and the historical treatment plans corresponding to the abnormal water quality data are associated through the entity linking algorithm of knowledge graph.
[0013] In the preferred scheme, the methods for generating auxiliary decision-making suggestions based on on-site operational data in S3 include: The cost-efficiency optimization model takes into account real-time influent water quality data, influent water volume data, equipment operating parameters, and effluent water quality standards. With the constraint of continuous effluent water quality compliance and the optimization objective of minimizing chemical and power consumption, it outputs the optimal chemical dosage and equipment operating parameters, including aeration equipment operating frequency and sludge return ratio. The cost-efficiency optimization model incorporates a dynamic weight adjustment mechanism into the LightGBM lightweight gradient booster model, assigning higher weights to recent real-time operational data and inputting seasonal and rainfall characteristics as auxiliary variables to adapt to fluctuating influent water quality and quantity scenarios and improve parameter optimization accuracy. The equipment health prediction model takes into account real-time operating parameters, historical fault data and maintenance records, and outputs equipment health score and prediction of potential fault occurrence time. When the predicted fault occurrence time is within the next 24 to 48 hours, it automatically triggers a fault warning and pushes the corresponding maintenance plan. The Transformer time series prediction model of the equipment health prediction model incorporates a multi-scale feature extraction module to simultaneously capture the short-term mutation characteristics and long-term degradation trends of equipment operation data, thereby improving the accuracy of early potential fault identification and the advance warning.
[0014] In the preferred scheme, the natural language understanding model based on BERT bidirectional encoder representation combined with sequence labeling algorithm achieves the optimal drug dosage and equipment operating parameter output in the following way: Using the BERT bidirectional encoder representation model, semantic features are extracted from textual data of process specifications, historical process adjustment plans, and water quality anomaly handling records in the urban wastewater treatment industry to generate process rule feature vectors. Using a sequence labeling algorithm, entity annotation is performed on the feature vector of process rules to identify key water quality factors, equipment operating conditions, and process constraint factors affecting reagent dosage, aeration equipment operating frequency, and sludge return ratio. The labeled key factors are then transformed into structured constraints. A comprehensive operating cost function is constructed by combining the optimization objective of minimizing reagent and power consumption. This function is then solved within the constraint space of achieving effluent quality standards to output the optimal parameters. The calculation formula for the comprehensive operating cost function is as follows: ; In the formula, This represents the overall operating cost index; Indicates the total number of drug types; This indicates the total number of high-energy-consuming devices; Indicates the first The unit dose cost coefficient of the drug; Indicates the first Dosage of the agent; Indicates the first Energy consumption coefficient per unit operating frequency of each device; Indicates the first The operating frequency of each device; This represents the chemical consumption weighting adjustment factor that is dynamically allocated based on real-time water quality status. This represents the power consumption weighting adjustment factor dynamically allocated based on real-time water quality status.
[0015] In the preferred scheme, the multivariate regression prediction model based on the LightGBM lightweight gradient lift machine is used to adapt to the scenario of fluctuating influent water quality and quantity and improve the accuracy of parameter optimization. The method is as follows: feature engineering is performed on the collected historical operation data to extract multi-dimensional features of influent water quality, quantity, equipment operation, season, and rainfall, and the training set and validation set are divided according to the time series. A dynamic weight adjustment mechanism incorporating water quality fluctuations is introduced into the objective function of the LightGBM lightweight gradient booster to calculate training weights for each sample in the training set. The formula for calculating the dynamic training weights is as follows: ; In the formula, Indicates the first Training weights for each historical operational data sample; Represents the basic weight constant; Indicates the timestamp of the current prediction time; Indicates the first The timestamp of when each historical operational data sample was collected; This represents the historical attenuation coefficient. Indicates the first The comprehensive fluctuation gradient of influent water quality and quantity corresponding to each historical operational data point; This represents the amplification factor of abnormal fluctuations extracted based on rainfall and seasonal characteristics. The model uses this weight as a basis to complete pre-training with a histogram algorithm, and after being put into use, it is updated online incrementally at fixed time windows.
[0016] In the preferred scheme, the specific execution method of the equipment health prediction model is as follows: after completing and normalizing the input real-time operating parameters, historical fault data and maintenance records of the equipment, the time series data segments are divided according to a fixed time step and input into the Transformer time series prediction model. The model outputs the device health score at the current moment and, in parallel, outputs the predicted health trajectory for multiple future time steps; the calculation formula for the device health score is:
[0017] In the formula, This represents the device health score, with a value range of [value missing]. Up to 100 points; This indicates the total number of dimensions of equipment operating parameters included in the health assessment; Indicates the first Real-time measured actual values of each parameter dimension; This indicates that the Transformer time series forecasting model is based on the output of normal operating conditions. Theoretical predicted values for each parameter dimension; and They represent the first The upper and lower limits of safe physical values for each parameter dimension; Indicates the first Fault sensitivity weights across multiple parameter dimensions; This represents the time-series penalty function for mechanical losses that increases with the equipment's operating time; when a certain time step in the predicted trajectory... When the health level falls below the preset health threshold, a fault warning will be automatically triggered within the next 24 to 48 hours, and a corresponding maintenance plan will be pushed out.
[0018] In the preferred embodiment, the multi-scale feature extraction module is executed as follows: a multi-scale feature extraction module containing one-dimensional convolutional branches and dilated convolutional branches is set at the encoder input of the Transformer time series prediction model. By using different convolutional kernels to extract and fused features from short-term abrupt changes and long-term degradations in device operation data, a multi-scale fusion feature matrix is generated and input into the encoder. The calculation formula for the multi-scale fusion feature matrix is as follows: ; In the formula, This represents the output multi-scale fused feature matrix; This represents the time series matrix of the input device operating parameters; This represents a small-sized one-dimensional convolutional kernel used to capture short-term, high-frequency abrupt changes in device parameters; This represents a standard one-dimensional convolution operation; This refers to a large-size dilated convolutional kernel used to capture long-term, low-frequency degradation trends in device performance. Indicates the expansion rate The dilated convolution operation; and These represent the attention weights for short-term mutation features and long-term degradation features, which are dynamically allocated through a self-attention mechanism, respectively.
[0019] In the preferred solution, S4 optimizes the structure and content of the operational guidelines form template based on the analysis results as follows: Based on the preset urban wastewater treatment standards, anomaly identification and trend prediction analysis are performed on the collected on-site operation data. Based on the results of anomaly identification and trend prediction analysis, the structure of key monitoring items, related data content, and data collection frequency in the existing operation specification form template are optimized in reverse. A new generation of operation specification form template is generated and put into collection and verification. Unlike the traditional fixed operation form template, the operation specification form template can be dynamically adapted to the process upgrades, equipment iterations, and policy and standard updates of urban wastewater treatment plants.
[0020] In the preferred scheme, the method of using the feedback data to perform closed-loop upgrades on the artificial intelligence model in S4 is as follows: the artificial intelligence model is fine-tuned using an incremental learning algorithm; The incremental learning algorithm incorporates a catastrophic forgetting suppression mechanism with regularization constraints to prevent the model from losing its ability to fit existing routine conditions after learning new operating conditions. The total loss function in the incremental learning stage is calculated as follows: ; In the formula, This represents the total loss function during the incremental fine-tuning phase. This represents the current prediction error loss calculated by the model based on the newly added training set, such as on-site operational data and process optimization results. This represents the total number of parameter nodes in the model's neural network; Indicates the first step in the fine-tuning process The current iteration value of each model weight; This indicates the first element in the original model before incremental fine-tuning. A fixed value for each weight; Indicates the first The diagonal elements of the sensitivity evaluation matrix for each weight to historical routine operating condition characteristics; This represents the regularization strength coefficient that balances the learning of new knowledge with the retention of old knowledge; at the same time, a model performance evaluation mechanism is set, which triggers full retraining when any indicator falls below the preset upgrade threshold.
[0021] In the preferred solution, multiple artificial intelligence models achieve cross-scenario linkage. The fault warning information output by the equipment health prediction model is automatically synchronized to the template generation optimization model, and the corresponding fault-specific inspection items are automatically added to the equipment maintenance and operation specification form template. The prediction results of the water quality trend prediction model are automatically synchronized to the compliance verification model, and the early warning thresholds of the form data are updated in advance, realizing intelligent collaboration throughout the entire process.
[0022] This invention provides an implementation method for an operational standard form template that takes into account the use of artificial intelligence. By deeply integrating artificial intelligence technology with the operational standard forms of urban sewage treatment plants, this application achieves a significant leap in operational efficiency and lean cost control.
[0023] Firstly, in terms of form generation and execution interaction, it breaks through the limitations of traditional manual form design, which is time-consuming and of inconsistent quality. Based on knowledge graphs and natural language understanding models, it can quickly generate form frameworks that conform to industry standards and business logic. During the execution process, it uses graph neural networks to perform deep temporal compliance verification, transforming exception handling from paper regulations to proactive pop-up guidance and association with historical solutions, which greatly enhances the enforceability of standards and reduces the risk of human error.
[0024] Secondly, regarding cost reduction, efficiency improvement, and scientific decision-making, this application utilizes a multivariate regression prediction model with dynamic weight adjustment and a multi-scale time series prediction model to precisely control the dosage of chemicals and equipment operating parameters, and provides early warnings of potential equipment failures. This effectively mitigates energy waste caused by fluctuations in influent water quality and quantity, and avoids production losses due to sudden equipment shutdowns. Most importantly, this application constructs a full-process data closed loop and bidirectional iterative mechanism: massive amounts of high-quality form data continuously drive the upgrade of the artificial intelligence model through an incremental learning mechanism, while the increasingly powerful artificial intelligence model in turn guides the dynamic optimization of form structure, key monitoring items, and data collection frequency. This makes the operation forms a truly intelligent management and control platform that can continuously and dynamically adapt to water plant process upgrades and standard updates. Attached Figure Description
[0025] The present invention will be further described below with reference to the accompanying drawings and embodiments: Figure 1 This is a flowchart illustrating the implementation method of the operational specification form template of this invention. Detailed Implementation
[0026] Example 1 like Figure 1 As shown, an implementation method for an operational standard form template that also takes into account the use of artificial intelligence includes the following steps: S1. Construct a structured data warehouse and knowledge graph for the field of urban wastewater treatment, and train artificial intelligence models for corresponding urban wastewater treatment business scenarios based on the structured data warehouse; S2. Analyze the urban sewage treatment operation requirements input by natural language, and generate operation specification form templates by combining knowledge graphs and artificial intelligence models; S3. Collect on-site operational data through the generated operational specification form template, use artificial intelligence models for compliance verification and intelligent interaction during the execution process, and generate auxiliary decision-making suggestions based on the on-site operational data; S4. The collected on-site operational data and the execution effect data of auxiliary decision-making suggestions are fed back to the artificial intelligence system for multi-dimensional analysis. Based on the analysis results, the structure and content of the operational standard form template are iteratively optimized. At the same time, the fed-back data is used to upgrade the artificial intelligence model in a closed loop.
[0027] In the preferred scheme, the methods for constructing a structured data warehouse and knowledge graph in the field of urban wastewater treatment in S1 include: Collect historical operational data from urban wastewater treatment plants, covering dimensions such as water quality monitoring, equipment operation, chemical dosing, process control, and operation and maintenance management. Clean and deduplicate the data, unify the data format, and classify and store it to establish a structured data warehouse. Define nodes and edges in the knowledge graph. Nodes include water quality indicators, equipment, reagents, processes, and personnel. Edges represent the logical relationships between water quality indicators, equipment, reagents, processes, and personnel. Natural language processing technology was used to extract feature entities from the operating procedures and industry standards of urban sewage treatment plants to fill a knowledge graph.
[0028] In the specific implementation process, constructing a structured data warehouse and knowledge graph in the field of urban wastewater treatment is the underlying data foundation and logical support for the entire system to achieve intelligent analysis and dynamic adaptation. First, the system comprehensively collects historical operational data from urban wastewater treatment plants across various dimensions through a distributed data acquisition interface. These dimensions cover indicators such as chemical oxygen demand (COD) and ammonia nitrogen in water quality monitoring, current and frequency parameters in equipment operation, chemical consumption such as polyacrylamide in chemical dosing, aeration adjustment records in process control, and fault and maintenance logs in operation and maintenance management. The massive amounts of raw data collected often have problems such as inconsistent formats, missing values, and abnormal noise, therefore, they must undergo rigorous cleaning, deduplication, and format unification. The system uses time series alignment and outlier smoothing algorithms to remove erroneous data and transforms unstructured logs into structured relational data tables, which are then categorized and stored in the structured data warehouse. To eliminate the differences in dimensions between different dimensions of data, the system uses a standardized processing model to normalize the dimensions of the cleaned numerical real-time sensing data when building the data warehouse. The standardized mapping formula is: ; In the formula, This represents the standard feature value stored in the structured data warehouse after normalization. This represents the original historical values collected from various underlying devices or sensors to complete the basic cleaning process; This represents the historical mean of this type of feature data within a specific time sliding window; This represents the historical standard deviation of the feature data within the sliding window of time; This represents the feature dimension scaling mapping constant set to adapt to the input requirements of subsequent artificial intelligence models. Through the above processing, a high-quality and dimensionally consistent clean data source can be provided for subsequent model training.
[0029] Building upon the structured data warehouse, the system further defines nodes and edges in the knowledge graph to construct a network topology reflecting the business logic of urban wastewater treatment. The knowledge graph's nodes encompass all key elements in the wastewater treatment scenario, specifically including water quality indicator nodes, equipment nodes, reagent nodes, process flow nodes, and personnel nodes. The knowledge graph's edges are used to rigorously quantify the objective logical relationships between water quality indicators, equipment, reagents, processes, and personnel. For example, a control influence edge exists between a specific aeration equipment node and a dissolved oxygen water quality indicator node, and an operation and maintenance responsibility edge exists between a personnel node and a specific process flow node. To enable these logical relationships to be effectively identified and calculated by the machine learning model, the system establishes dynamic association weights between nodes based on historical operational data. The formula for calculating the edge relationship weight is as follows: ; In the formula, Representing node entities in a knowledge graph With node entities The comprehensive relational weight of the logically related edges between them; This represents the total number of historical time periods used to calculate the correlation; and Representing nodes respectively With nodes In the The characteristic quantized state value of each time period; and These represent the average value of the feature quantization state of the corresponding node throughout the entire computation cycle; This represents the prior mechanism penalty coefficient assigned based on the mechanistic model in the field of wastewater treatment. This formula integrates pure statistical correlation with the physicochemical mechanisms of wastewater treatment, ensuring that the generated edge weights not only reflect data patterns but also conform to common scientific knowledge in the industry.
[0030] After defining the basic knowledge graph framework, the system utilizes natural language processing (NLP) technology to extract feature entities from the operational procedures of urban wastewater treatment plants and national industry standards documents to enrich and populate the knowledge graph's content. The system employs a bidirectional encoder representation model to scan and extract semantic features from unstructured text regulations, and uses sequence labeling algorithms to identify various node entities hidden within the regulations and their corresponding compliance constraints. The extracted feature entities and their relationships are mapped as fact triples to the corresponding nodes and edges in the knowledge graph. To ensure the high accuracy and reliability of the entity relationships extracted from the regulations and populated into the knowledge graph, the system calculates a confidence score for each extracted fact triple. The confidence score calculation formula is as follows: ; In the formula, This represents the overall confidence score when unstructured text content is transformed into graph triples and populated into a knowledge graph; This represents the predicted probability that the natural language processing algorithm has identified the boundary of the target feature entity. This represents the probability value of the algorithm determining a relationship between two target feature entities based on a specific logical connection edge. and These represent the weight adjustment parameters dynamically allocated by the system for the entity recognition stage and the relation extraction stage, respectively. This represents a topology conflict attenuation term applied when the extracted triple rule logically contradicts the existing physical topology in the knowledge graph. Only when the confidence score exceeds a preset threshold will the extracted content be formally written into the knowledge graph.
[0031] The above implementation methods have yielded significant beneficial effects. First, by constructing a multi-dimensional structured data warehouse, the problems of severe data silos and low data quality among various business systems in traditional wastewater treatment plants have been completely resolved. Standardized and clean data greatly improves the training convergence speed and prediction accuracy of subsequent artificial intelligence models. Second, by constructing a knowledge graph based on mechanistic prior coefficients and using natural language processing technology to transform paper-based operating procedures and industry standards into machine-readable graph entities and logical edges, artificial intelligence is no longer a black-box model that simply relies on data, but is subject to the dual constraints of objective physicochemical laws and safety compliance regulations in wastewater treatment. This underlying construction method, driven by both data and knowledge graphs, not only expands the coverage of business scenarios but also ensures that the subsequently generated operational form templates possess high professionalism, interpretability, and fault tolerance, laying an irreplaceable foundation for achieving intelligent assisted decision-making and dynamic system adaptation.
[0032] In the preferred scheme, the artificial intelligence model trained in S1 for the corresponding urban wastewater treatment business scenario includes: The template generation optimization model is trained by combining a natural language understanding model based on BERT bidirectional encoder representation with a sequence labeling algorithm. The compliance verification model is trained using a knowledge graph-based graph neural network algorithm. The cost-efficiency optimization model was trained, and the cost-efficiency optimization model adopted a multivariate regression prediction model based on LightGBM lightweight gradient booster. The equipment health prediction model is trained using a Transformer time series prediction model based on time series data.
[0033] After constructing and improving the structured data warehouse and knowledge graph, the system needs to train a series of dedicated artificial intelligence models for specific business scenarios in urban sewage treatment to achieve intelligent generation and closed-loop management of operational standard form templates. First, the system trains a template generation optimization model. This model uses a natural language understanding model based on bidirectional encoder representation combined with a sequence labeling algorithm. In practical applications, the system converts the natural language requirement text input by business personnel into a sequence of word vectors, uses the bidirectional encoder representation model to extract deep semantic features, and then uses the sequence labeling algorithm to classify the feature vectors into entities, accurately extracting the form field requirements and filling rules contained in the text. To improve the accuracy of sequence labeling, the system uses joint conditional probability to calculate and optimize the output sequence; the target probability calculation formula is: ; In the formula, This represents the predicted conditional probability of a specific field label sequence corresponding to a natural language demand text; This represents the total length of the valid word sequence in the input request text. The bidirectional encoder represents the output of the model. Each word is mapped to a specific tag. The launch score; This indicates the sequence labeling algorithm from the previous label. Move to current tag The transition score; Y' represents all possible combinations of label sequences; and Let represent the hypothetical labels for the current position and the previous position in any possible label sequence, respectively. This formula, through global normalization, ensures that the model, when generating the template field set, not only relies on local word meanings but also strictly adheres to the contextual logical coherence of wastewater treatment business rules.
[0034] Secondly, the system trains a compliance verification model. This compliance verification model employs a graph neural network algorithm based on a knowledge graph. Since wastewater treatment involves complex coupling of multiple factors such as water quality indicators, reagents, and equipment, the system maps real-time operational data collected on-site to dynamic node features in a knowledge graph. The graph neural network learns and extracts the compliance dependencies between various business nodes by performing message passing and neighbor node feature aggregation on the graph topology. When real-time data is input, the model performs multi-layer feature aggregation through graph convolution operations. The update formula for its node hidden layer features is: ; In the formula, Represents the target entity node in the knowledge graph In the Updated feature vectors after aggregation by layered graph neural networks; Represents the target entity node In the The current hidden feature vector of the layer; Indicates the relationship with the target entity node A set of neighboring nodes that have direct business logic relationships; Representing neighboring nodes In the Hidden feature vectors of the layer; and Representing the target entity nodes respectively with neighboring nodes The degree of connectivity in the topology of a knowledge graph; and These represent the model at the 1st and 2nd. The self-feature transformation weight matrix and the neighbor feature transformation weight matrix learned through backpropagation training in a layered graph convolutional network; This represents the activation function that introduces nonlinear mapping capabilities. This model and its calculation process enable the system to penetrate the surface appearance of individual data and accurately verify the compliance of operational actions and data from a global process coordination perspective.
[0035] Subsequently, the system trains a cost-efficiency optimization model. This model employs a multivariate regression prediction model based on a lightweight gradient booster. Facing the challenge of controlling reagent and energy consumption due to frequent fluctuations in influent water quality and quantity at wastewater treatment plants, the model uses massive amounts of multidimensional data, including historical influent water quality and quantity, seasonal environmental parameters, and equipment operating status, as input variables. The lightweight gradient booster uses a histogram algorithm to discretize continuous features and constructs a multi-decision tree ensemble network through a leaf growth strategy with depth constraints. To solve for the optimal reagent dosage and equipment operating frequency while ensuring effluent water quality meets standards, the formula for calculating the weight of a single regression tree leaf node during training is as follows: ; In the formula, Indicating the first in a lightweight gradient lift machine The optimal prediction weight output value for each leaf node; Indicates being divided into the first A collection of historical multidimensional operational data samples within each leaf node; In the sample set, the first... The first derivative gradient value of the objective cost function for each sample, calculated based on real data at the current iteration step; Indicates the corresponding number The second derivative Hessian matrix value of the objective cost function for each sample at the current iteration step, calculated based on real data; This represents the leaf node weight penalty regularization constant used to prevent overfitting of the model under complex operating conditions. Through the above regression prediction model, the system can quickly adapt to the nonlinear characteristics of water quality fluctuations, significantly improving the accuracy and computational efficiency of energy efficiency optimization parameter output.
[0036] Finally, the system trains an equipment health prediction model. This model employs a Transformer time series prediction model based on time series data. The deterioration of core wastewater treatment equipment, such as large blowers and sludge return pumps, is a complex process with continuous and implicit time-series dependencies. The system integrates the equipment's real-time operating current, vibration frequency, and historical maintenance records into a time series matrix input to the model, arranged by time step. The Transformer time series prediction model abandons the recursive structure of traditional recurrent neural networks, instead utilizing a self-attention mechanism to process global time series data in parallel, capturing the long-term and short-term dependencies of equipment operating parameters across different time spans. Its core self-attention score calculation formula is: ; In the formula, This represents the device runtime feature matrix, which contains global temporal dependency information, calculated using the self-attention mechanism. This represents the query feature matrix generated by linear mapping from the original matrix of device timing input; This represents the transpose of the key feature matrix generated by linear mapping from the original matrix of device timing input; The square root of the key feature matrix represents the feature dimension, and its square root is used to prevent the gradient vanishing problem caused by an excessively large inner product result. This represents the transformation of the mapped dot product result into a normalized exponential function of the weighted probability distribution; This represents the value feature matrix generated by linearly mapping the original time-series input matrix of the equipment. This time series prediction model can keenly capture subtle changes and long-term decay trends in equipment operating data, thereby achieving high-precision prediction and early warning of potential failure times.
[0037] The implementation method employing the above-mentioned combination of multiple artificial intelligence models brings comprehensive and significant benefits to this invention. On the one hand, the specialized division of labor among multiple models breaks through the performance bottleneck of traditional single algorithms when dealing with complex urban sewage treatment scenarios. The natural language understanding model solves the efficiency problem of human-computer interaction and rule extraction; the graph neural network algorithm overcomes the limitations of independent data threshold verification, realizing compliant control of global process topology relationships; and the gradient booster and time series prediction models provide precise quantitative guidance on the two core requirements of process cost reduction and equipment life extension, respectively. On the other hand, the computational logic of these models is deeply integrated with business mechanisms. By using the probability constraint of sequence coherence to constrain the form generation logic, utilizing graph node aggregation to map the water affairs relationship network, and introducing regularization to prevent overfitting, it ensures that the strategies and form templates output by the system not only converge in the mathematical feature space, but are also safe, accurate, and usable in the actual physical environment of sewage treatment. This multi-model collaborative processing mechanism transforms the operational specification form from a static tool that passively records data into a core operational hub with deep perception, intelligent analysis, early warning, and dynamic optimization capabilities.
[0038] In the preferred solution, the method used in S2 to generate the operation specification form template by combining knowledge graphs and artificial intelligence models is as follows: The similarity between the parsed operational requirements and the basic templates in the historical template library is calculated using a collaborative filtering algorithm. When the similarity is greater than the preset matching threshold, the corresponding basic template is invoked. When the similarity is lower than the preset matching threshold, the required fields corresponding to the business process are automatically matched by the graph traversal algorithm of the knowledge graph, and the initial template framework is automatically generated. The preset matching threshold is 75%-85%.
[0039] In the preferred solution, after automatically generating the initial template framework, the template is used to generate an optimized model that inputs the parsed operational requirement keywords and business nodes and relationships in the knowledge graph, and outputs and automatically sets the field set, field association rules and compliance verification rules of the operational specification form template. The template generation optimization model incorporates a knowledge graph forced attention mechanism into the output layer of the BERT bidirectional encoder representation model. When generating fields, it prioritizes matching the mandatory compliance fields for the wastewater treatment industry and business association rules defined in the knowledge graph, ensuring that the generated templates comply with industry standards and water plant business processes.
[0040] In the specific implementation process of generating operational specification form templates by combining knowledge graphs and artificial intelligence models, the system first uses a collaborative filtering algorithm to perform deep similarity matching between the parsed user operational needs and the basic templates in the historical template library. The system transforms the operational need keywords extracted after natural language processing into multi-dimensional need feature vectors, and simultaneously extracts the structure and business tags of each basic template in the historical template library into template feature vectors. To accurately measure the semantic fit and business overlap between the two, the system adopts a cosine similarity calculation model that integrates business feature weights. The formula for calculating the similarity score is: ; In the formula, This represents the overall similarity score between the parsed operational requirements and a specific historical template. This represents the total number of unified embedding dimensions between the demand feature vector and the template feature vector; The user operation demand feature vector represents the first... Feature values in each dimension; The historical basic template feature vector is represented in the th... Feature values in each dimension; The knowledge graph is represented as the first... The business importance weights are dynamically assigned to each feature dimension to amplify the similarity ratio of core business dimensions such as accounting or key water quality indicators during calculation.
[0041] When the system calculates a similarity greater than the preset matching threshold using the above formula, it directly calls the corresponding high-scoring basic template to maximize the reuse of historical excellent form assets and significantly improve template generation efficiency. This implementation strictly limits the preset matching threshold to between 75% and 85%. The technical consideration for establishing this range is that if the threshold is set too high, a large number of similar templates with reference and modification value will be rejected by the system, causing the system to frequently start the underlying generation algorithm, thus increasing ineffective computing power consumption; if the threshold is set too low, the called template will deviate too much from the actual operational needs, and the cost of subsequent manual fine-tuning or machine correction will increase dramatically. When the highest similarity is still lower than the preset matching threshold, the system determines that the current input is a significantly different customized new requirement. At this time, the system switches to a knowledge graph traversal algorithm, starting with the graph nodes mapped to the requirement keywords, to perform a directed walk, automatically finding and matching the indispensable required fields in the business process, thereby automatically generating an initial template framework. The formula for calculating the recommendation score of the candidate field nodes in the graph traversal is: ; In the formula, This represents the overall recommendation score of a candidate field node being included in the initial template framework during the knowledge graph traversal process; This represents the walk decay control coefficient in the graph traversal algorithm; This represents the probability value of a topology transition from the current critical requirement node to the candidate field node along the business logic edge; This indicates the global prior importance score of the candidate field node, which is defined as a mandatory field in the entire urban wastewater treatment industry standard.
[0042] After automatically generating the initial template framework, the system further refines and enriches it using a template generation optimization model. The system inputs the parsed operational requirement keywords and business nodes and relationships extracted from the knowledge graph through traversal into the optimization model, outputting and automatically setting the field set, field association rules, and compliance verification rules for subsequent data entry in the final operational specification form template. This template generation optimization model uses a bidirectional encoder representation model as its underlying natural language understanding foundation and innovatively incorporates a knowledge graph-based forced attention mechanism into its output layer. Conventional natural language models tend to generate free divergence in form field generation that does not conform to the rigorous logic of specific industry sectors. To address this inherent deficiency, the system introduces a constraint bias matrix based on graph business association rules in the self-attention calculation and scoring stage. The calculation formula for its forced attention feature fusion is as follows: ; In the formula, This represents the attention feature matrix of the form fields output after being subject to mandatory constraints by the knowledge graph. This represents the query feature matrix generated by linear mapping from the initial template frame text; This represents the transpose of the key feature matrix generated by linearly mapping demand keywords and business nodes; Indicates the dimension of the hidden layer vector; This represents the value feature matrix generated by mapping the above input content; This represents the constraint bias matrix generated based on the knowledge graph. In the matrix, elements that conform to the mandatory fields for compliance in the urban wastewater treatment industry and the rules for business association are assigned a very large positive bias value, while elements that violate common sense about water treatment processes are assigned a penalty bias value that approaches negative infinity.
[0043] The aforementioned implementation method, combining collaborative filtering, graph traversal, and forced attention mechanisms, yields significant and irreplaceable benefits. This multi-level, distributed generation mechanism balances efficient form template generation with deep customization capabilities, leveraging existing, high-quality form assets while possessing the ability to build customized forms for complex water plant scenarios from scratch. Scientifically determined matching threshold ranges precisely balance computational power consumption and modification costs. The most crucial breakthrough lies in the introduction of a knowledge graph forced attention mechanism containing constraint bias matrices, completely resolving industry-specific technical bottlenecks such as indicator omissions or business logic illusions that often occur in general AI generation models within industrial sub-sectors. Forced intervention at the underlying mathematical formula level ensures that every automatically generated template field set and its associated compliance verification rules are 100% strictly aligned with national urban wastewater treatment standards and the objective physical processes of water plants, fundamentally guaranteeing the professional rigor of intelligently generated templates and the legality and compliance of on-site business data collection.
[0044] Example 2 Further explanation in conjunction with Example 1, such as Figure 1 The structure shown illustrates how S3 utilizes an artificial intelligence model for compliance verification and intelligent interaction during execution: The compliance verification model takes on-site operational data as input in real time and uses graph neural networks to calculate the matching degree between on-site operational data and preset standard thresholds in the knowledge graph, outputting the data compliance judgment result and the level of abnormal data. The graph neural network algorithm of the compliance verification model adds the temporal feature dimension to simultaneously verify the threshold compliance of a single data point and the consistency of the process logic of operational data within a continuous time period, and identifies hidden operational anomalies that cannot be covered by conventional threshold verification. When on-site operational data triggers preset control thresholds, the subgraph matching algorithm of the knowledge graph identifies operational execution deviations and automatically pushes corresponding disposal suggestions, historical solutions, and compliance correction guidelines for the urban sewage treatment business scenario.
[0045] In the preferred scheme, when the water quality monitoring data in the on-site operation data exceeds the standard, process adjustment suggestions are automatically pushed, including pop-up prompts in forms, system message pushes, mobile reminders, etc., and the historical treatment plans corresponding to the abnormal water quality data are associated through the entity linking algorithm of knowledge graph.
[0046] During the on-site implementation of the operational compliance form template, the system utilizes an artificial intelligence model for in-depth compliance verification and intelligent interaction. This is a core element in achieving the transition from static recording to dynamic prevention in wastewater treatment management. Specifically, the system uses the compliance verification model to input on-site operational data in real time, employs a graph neural network to calculate the matching degree between the on-site operational data and preset standard thresholds in the knowledge graph, and simultaneously outputs the data compliance judgment results and abnormal data levels. To overcome the limitations of traditional single-point threshold alarms, the graph neural network algorithm of the compliance verification model incorporates a temporal feature dimension. This algorithm not only verifies the absolute value compliance of a single data point at the current moment in the spatial dimension but also simultaneously verifies the consistency of the process logic of operational data within a continuous time period in the temporal dimension. The formula for calculating the spatiotemporal compliance matching degree is as follows: ; In the formula, The spatiotemporal compliance matching score represents the degree of integration of temporal and spatial characteristics; This represents the operational data value input in real time at the current moment; This indicates the preset standard threshold for the business node obtained directly from the knowledge graph; This indicates the allowable variance based on standard thresholds set according to industry specifications; This represents the absolute threshold matching weight of the space allocated by the system; Represents the sequential logical consistency weights allocated by the system; This represents the maximum time-series sliding window step size used to trace historical data; Indicates the number of times since the current time. Historical data decay penalty factor at each time step; This represents the discrete gradient of operational data over a continuous time period; and These represent the exponential activation function and the hyperbolic tangent activation function, respectively. By introducing the calculation of discrete gradient changes, the system can accurately identify implicit operational anomalies where the absolute value of the data is still within a safe range, but the rate of change has seriously violated the logic of the upstream and downstream water treatment processes.
[0047] When the on-site operational data, after the above calculations, causes a sharp drop in the matching degree and triggers a preset control threshold, the system immediately activates a knowledge graph-based subgraph matching algorithm to locate and identify operational execution deviations. This algorithm extracts a real-time operational subgraph containing the current abnormal data node and its associated process nodes from the vast global knowledge graph, and performs a dual comparison of its topology and node attributes with the safety compliance standard operational subgraph defined in the knowledge graph. The formula for calculating the operational execution deviation score is as follows: ; In the formula, This represents the quantified operational performance deviation score; This represents the topology divergence weights set by the system. This represents the Kourbcclebler divergence function used to measure the difference between two probability distributions; This represents the probability distribution characteristics of the relationships between entity nodes and edges in the real-time operation subgraph; This represents the baseline probability distribution characteristics of the standard operational subgraph in the knowledge graph; Indicates the distance weight of node attribute features; This represents the node feature state matrix extracted from the real-time operation subgraph; The safety feature state matrix represents the standard operation subgraph; This represents the squared Frobenius norm used to calculate the distance between the feature matrices. After calculating the deviation score, the system will accurately pinpoint the specific process node that caused the deviation and automatically push corresponding treatment suggestions, historical solutions, and compliance correction guidelines to frontline personnel for the corresponding urban wastewater treatment business scenario.
[0048] In the core optimization scheme for urban wastewater treatment, when water quality monitoring data in on-site operation data exceeds standards, the system will trigger the most frequent and critical interactive action: automatically pushing targeted process adjustment suggestions. The automatic push is presented through various interactive methods, including built-in pop-up prompts in form interfaces, cross-level system message pushes, and mobile reminders for maintenance personnel. To ensure that the pushed historical treatment solutions have high reference value and emergency repair guidance, the system synchronously runs a knowledge graph entity linking algorithm in the background. This algorithm uses the current abnormal water quality data and its contextual process environment as the query source, searches for and associates the most matching historical treatment solutions in the knowledge graph's treatment experience base, and calculates the entity link recommendation score using the following formula: ; In the formula, This represents the entity link recommendation score used for sorting and filtering related schemes; The similarity weights are assigned to semantic feature vectors. This represents the context feature vector of the current abnormal water quality event, generated through encoding by a deep learning model. This represents the contextual feature vector of historical handling solutions stored in the knowledge graph experience base; This represents the cosine similarity function used to calculate the angle between two high-dimensional feature vector spaces. This indicates the reward weight that the system assigns to the frequency of successes in historical experience; This indicates the actual number of times the specific historical response plan has been successfully adopted and effectively mitigated risks in similar past anomalies. Using this formula, the system can prioritize linking the semantically most matching and historically most successfully validated response plan to the current form interface for direct use by operations personnel.
[0049] The aforementioned technical implementation scheme has extremely significant beneficial effects. Traditional form compliance verification is often limited to post-event comparison of single indicator thresholds, which is severely lagging behind changes in on-site processes. This implementation scheme innovatively introduces the temporal feature dimension into graph neural networks, endowing forms with the ability to perceive latent anomalies "from the smallest details," successfully achieving a leap from passive alarm for exceeding standards to early trend intervention. Furthermore, through the dual empowerment of subgraph matching algorithms and entity linking algorithms, when an anomaly occurs, the form is no longer just a rigid interface recording erroneous data, but instantly transforms into an intelligent decision-making hub integrating global process wisdom. It can not only immediately notify relevant responsible persons through multiple channels, but also directly present the best handling solution based on high-dimensional cosine similarity and historical success rate precise matching on the form filling interface. This design seamlessly embeds expert-level fault diagnosis and correction capabilities into the daily filling operations of grassroots employees, greatly reducing the emergency response time in cases of water quality exceeding standards or process deviations, reducing excessive reliance on manual on-site experience, and fundamentally ensuring the continuity, safety, and standardization of wastewater treatment plant operation and management.
[0050] In the preferred scheme, the methods for generating auxiliary decision-making suggestions based on on-site operational data in S3 include: The cost-efficiency optimization model takes into account real-time influent water quality data, influent water volume data, equipment operating parameters, and effluent water quality standards. With the constraint of continuous effluent water quality compliance and the optimization objective of minimizing chemical and power consumption, it outputs the optimal chemical dosage and equipment operating parameters, including aeration equipment operating frequency and sludge return ratio. The cost-efficiency optimization model incorporates a dynamic weight adjustment mechanism into the LightGBM lightweight gradient booster model, assigning higher weights to recent real-time operational data and inputting seasonal and rainfall characteristics as auxiliary variables to adapt to fluctuating influent water quality and quantity scenarios and improve parameter optimization accuracy. The equipment health prediction model takes into account real-time operating parameters, historical fault data and maintenance records, and outputs equipment health score and prediction of potential fault occurrence time. When the predicted fault occurrence time is within the next 24 to 48 hours, it automatically triggers a fault warning and pushes the corresponding maintenance plan. The Transformer time series prediction model of the equipment health prediction model incorporates a multi-scale feature extraction module to simultaneously capture the short-term mutation characteristics and long-term degradation trends of equipment operation data, thereby improving the accuracy of early potential fault identification and the advance warning.
[0051] During execution, the system generates auxiliary decision-making suggestions based on on-site operational data. Its core lies in achieving precise control and preventative maintenance of the entire wastewater treatment process through advanced artificial intelligence algorithms. The system first utilizes a cost-efficiency optimization model to dynamically optimize process parameters. This model receives real-time influent water quality data, influent water volume data, equipment operating parameters, and effluent water quality standards as input variables. Within the algorithm's computational space, the model sets continuous effluent water quality compliance as an insurmountable hard constraint, while simultaneously setting the minimum total chemical and electrical consumption of the entire process as the objective function. Through iterative solving in the high-dimensional feature space, the model can output the optimal chemical dosage and specific equipment operating parameters under the current complex operating conditions. These parameters specifically include the aeration equipment operating frequency and sludge return ratio. To accurately adapt to scenarios with drastic fluctuations in influent water quality and volume under different external environments, the cost-efficiency optimization model innovatively incorporates a dynamic weight adjustment mechanism based on the lightweight gradient booster algorithm, and simultaneously inputs seasonal and rainfall characteristics as auxiliary analysis variables. This dynamic weight adjustment mechanism assigns higher weights to real-time operational data that is more recent during model training and online prediction. The formula for calculating the dynamic training weights of the samples is as follows: ; In the formula, Indicates the first Dynamic training weights for each real-time operational data sample; This represents the basic weighting coefficient set by the system; Indicates the system timestamp at which the current prediction is located; Indicates the first The timestamp of when each historical operational data sample was collected by the system; This represents the time decay penalty factor, used to control the decay rate of historical data weights; Indicates the first The comprehensive inflow fluctuation gradient corresponding to each historical operational data point; This represents the amplified environmental fluctuation parameters constructed by integrating seasonal and rainfall characteristics. Through this computational mechanism, the model can quickly capture and adapt to sudden water quality fluctuations caused by heavy rainfall or seasonal changes, significantly improving the accuracy and robustness of the parameter optimization output.
[0052] At the equipment operation and maintenance level, the system utilizes an equipment health prediction model to provide proactive protection for core equipment. This model receives real-time operating parameters, historical fault data, and maintenance records as input time series. Through deep network calculations, it outputs a continuous health score for the current moment and extrapolates the occurrence time of potential faults along the time axis. If the model predicts a fault occurrence within a critical window of 24 to 48 hours, the system automatically triggers the highest-level fault warning and simultaneously pushes corresponding professional maintenance solutions to the operation and maintenance terminals. To overcome the limitations of traditional prediction models in simultaneously capturing early subtle anomalies and slow aging trends, the equipment health prediction model incorporates a multi-scale feature extraction module at the front end of its core time series prediction model encoder. This module employs a parallel network architecture to simultaneously capture short-term abrupt changes and long-term degradation trends in equipment operating data. The extraction and calculation formula for its multi-scale fusion feature matrix is as follows: ; In the formula, This represents the fused feature matrix output by the multi-scale feature extraction module; This represents the initial matrix of the time series of input device operating parameters; This represents a small-sized one-dimensional convolutional kernel used to capture short-term abrupt changes in device parameters; This represents the standard one-dimensional convolution operator; This refers to a large-size dilated convolutional kernel used to capture long-term degradation trends in device performance. This represents an expanded convolution operator with a specific dilation rate parameter, which increases the receptive field to obtain long-period degradation information; and These represent the adaptive fusion weights for short-term abrupt changes and long-term degradation trends learned by the network through the self-attention mechanism, respectively. This computational logic ensures that the feature matrix input to the prediction model includes both instantaneous mechanical impact information and cross-cycle component wear trend information.
[0053] The aforementioned method of generating auxiliary decision-making suggestions has brought groundbreaking and beneficial effects to the on-site operation of urban wastewater treatment plants. On the one hand, the cost-efficiency optimization model, through a dynamic weighting mechanism and the introduction of environmental characteristic variables, breaks through the predicament of traditional water process control relying heavily on lagging human experience. Within the red line of ensuring absolute compliance of effluent quality, it explores the ultimate balance point between reagent dosage and power consumption, effectively resolving the problems of excessive reagent dosage and over-aeration caused by sudden increases in rainfall or seasonal changes, and achieving a significant reduction in the overall operating cost of the entire plant. On the other hand, the equipment health prediction model and its innovative multi-scale feature fusion mechanism endow the operation and management system with a keen fault perception capability, completely transforming reactive emergency repairs into proactive preventive maintenance. The 24-48 hour lead time provides ample buffer time for spare parts procurement, personnel scheduling, and water treatment process bypass switching. This not only greatly improves the accuracy of identifying potential minor faults in the early stages, but also avoids the crisis of plant-wide shutdown and high equipment replacement costs caused by sudden paralysis of core units, thus comprehensively ensuring the continuity of production, safety, and the health of equipment assets throughout their entire life cycle.
[0054] In the preferred scheme, the natural language understanding model based on BERT bidirectional encoder representation combined with sequence labeling algorithm achieves the optimal drug dosage and equipment operating parameter output in the following way: Using the BERT bidirectional encoder representation model, semantic features are extracted from textual data of process specifications, historical process adjustment plans, and water quality anomaly handling records in the urban wastewater treatment industry to generate process rule feature vectors. Using a sequence labeling algorithm, entity annotation is performed on the feature vector of process rules to identify key water quality factors, equipment operating conditions, and process constraint factors affecting reagent dosage, aeration equipment operating frequency, and sludge return ratio. The labeled key factors are then transformed into structured constraints. A comprehensive operating cost function is constructed by combining the optimization objective of minimizing reagent and power consumption. This function is then solved within the constraint space of achieving effluent quality standards to output the optimal parameters. The calculation formula for the comprehensive operating cost function is as follows: ; In the formula, This represents the overall operating cost index; Indicates the total number of drug types; This indicates the total number of high-energy-consuming devices; Indicates the first The unit dose cost coefficient of the drug; Indicates the first Dosage of the agent; Indicates the first Energy consumption coefficient per unit operating frequency of each device; Indicates the first The operating frequency of each device; This represents the chemical consumption weighting adjustment factor that is dynamically allocated based on real-time water quality status. This represents the power consumption weighting adjustment factor dynamically allocated based on real-time water quality status.
[0055] In practical implementation, the natural language understanding model based on bidirectional encoder representation, combined with sequence labeling algorithms, achieves optimal reagent dosage and equipment operating parameter output. This is a key technical path for transforming the experience of grassroots experts in wastewater treatment into quantitative mathematical optimization. First, the system utilizes the bidirectional encoder representation model to perform deep semantic feature extraction on unstructured text data from the urban wastewater treatment industry. This text data mainly covers national and local promulgated process specifications, historical process adjustment plans accumulated by water plants over many years, and records of handling water quality anomalies in response to emergencies. Through its internal multi-head attention mechanism, the bidirectional encoder representation model can accurately vectorize each professional term in the text while simultaneously considering the context. By scanning and semantically encoding the massive amount of text sentence by sentence, the system transforms the process guidelines and boundary conditions, which originally relied on manual reading comprehension, into a high-dimensional continuous spatial feature matrix that can be processed by a computer, i.e., generating process rule feature vectors.
[0056] Subsequently, the system uses a sequence labeling algorithm to perform entity labeling and classification extraction on the generated process rule feature vectors. In this step, the model assigns corresponding labels to the feature vector sequences, thereby accurately identifying key factors affecting core process actions such as reagent dosage, aeration equipment operating frequency, and sludge return ratio. These factors are specifically divided into three categories: first, key water quality factors, such as the upper limit of influent chemical oxygen demand concentration or ammonia nitrogen mutation threshold; second, equipment operating condition factors, such as the safe operating frequency range of blowers or the maximum head limit of centrifugal pumps; and third, process constraint factors, such as the minimum dissolved oxygen concentration that the biological treatment tank must maintain or the safe boundary of sludge concentration. The system transforms these discrete key factors extracted by the sequence labeling algorithm into structured constraints that can participate in mathematical calculations. For example, the phrase "dissolved oxygen must not be lower than two milligrams per liter" in the text is transformed into specific inequality constraints.
[0057] After establishing the structured constraints generated by automatic text parsing, the system uses the continuous achievement of effluent quality standards as the absolute constraint space, and combines this with the optimization objective of minimizing chemical and power consumption to construct a comprehensive operating cost function. The system uses an optimization algorithm to find the minimum value of the comprehensive operating cost function within this constraint space, thereby safely and accurately outputting the optimal parameters. The formula for calculating the comprehensive operating cost function is as follows: ; In the formula, This represents the comprehensive operating cost index after systematic quantitative calculation. This index is a core indicator for evaluating the economic benefits of the current process control strategy. The smaller the value, the better the overall operating cost. This indicates the total number of types of chemicals actually used in the current wastewater treatment process, such as phosphorus removal chemicals or carbon source chemicals. This indicates the total number of high-energy-consuming devices that are directly controlled by the system in the entire process section, such as large aeration blowers or sludge return pumps. Indicates the first The unit dosage cost coefficient of a certain drug is determined by the actual purchase price and the conversion ratio; The system output represents the first... The optimal dosage of the agent; Indicates the first The unit operating frequency energy consumption coefficient of a device reflects the power consumption characteristics of the device when operating at a unit frequency; The system output represents the first... The optimal operating frequency of each device; This represents the chemical consumption weighting adjustment factor that is dynamically allocated based on real-time water quality status. This represents the power consumption weight adjustment factor dynamically allocated based on real-time water quality status. The chemical consumption weight adjustment factor and the power consumption weight adjustment factor will be dynamically and mutually exclusively adjusted according to the fluctuation of influent water quality. For example, when the carbon-nitrogen ratio of the influent is severely imbalanced and a large amount of external carbon source needs to be added, the system will automatically reduce the chemical consumption weight adjustment factor to give the chemical dosing greater tolerance and ensure that the process requirements for water quality compliance are met first.
[0058] The above-described technical implementation scheme is highly innovative and has significant beneficial effects. Traditional optimal parameter solutions often rely solely on pure mathematical regression or empirical settings, which can easily exceed the physical safety limits of equipment or violate industry standards when pursuing extremely low energy or chemical consumption. This implementation scheme innovatively combines natural language processing technology with mathematical operations research optimization. Through bidirectional encoder representation models and sequence labeling algorithms, the system "understands" and "remembers" the safety boundaries in water plant operating procedures and historical experience, transforming them into strict structured mathematical constraints. This approach not only significantly reduces the global search blind spot of the optimization algorithm and improves the solution speed and accuracy, but also fundamentally ensures that the output chemical dosage and equipment operating frequency are not only optimal in terms of economic cost, but also absolutely comply with industry compliance requirements and equipment safe operation logic, achieving a perfect balance between cost reduction and efficiency improvement in water plants and safe process operation.
[0059] In the preferred scheme, the multivariate regression prediction model based on the LightGBM lightweight gradient lift machine is used to adapt to the scenario of fluctuating influent water quality and quantity and improve the accuracy of parameter optimization. The method is as follows: feature engineering is performed on the collected historical operation data to extract multi-dimensional features of influent water quality, quantity, equipment operation, season, and rainfall, and the training set and validation set are divided according to the time series. A dynamic weight adjustment mechanism incorporating water quality fluctuations is introduced into the objective function of the LightGBM lightweight gradient booster to calculate training weights for each sample in the training set. The formula for calculating the dynamic training weights is as follows: ; In the formula, Indicates the first Training weights for each historical operational data sample; Represents the basic weight constant; Indicates the timestamp of the current prediction time; Indicates the first The timestamp of when each historical operational data sample was collected; This represents the historical attenuation coefficient. Indicates the first The comprehensive fluctuation gradient of influent water quality and quantity corresponding to each historical operational data point; This represents the amplification factor of abnormal fluctuations extracted based on rainfall and seasonal characteristics. The model uses this weight as a basis to complete pre-training with a histogram algorithm, and after being put into use, it is updated online incrementally at fixed time windows.
[0060] In practical implementation, a multivariate regression prediction model based on a lightweight gradient lift machine is the core underlying technology for the system to cope with drastic fluctuations in influent water quality and quantity and improve the accuracy of process parameter optimization. First, the system performs in-depth feature engineering processing on the massive historical operational data collected. The system extracts water quality characteristics such as influent chemical oxygen demand (COD) and ammonia nitrogen concentration, instantaneous influent flow rate and cumulative water volume, and operational characteristics such as the start-up and shutdown status and operating frequency of electromechanical equipment from the raw sensing data. Simultaneously, it incorporates seasonal change characteristics and rainfall characteristics from external meteorological data, thereby constructing a multi-dimensional, high-quality feature matrix covering both the internal mechanisms of the water plant and the external environment. After feature extraction, the system strictly divides the multi-dimensional data into training and validation sets according to the chronological order of data collection. This time-series segmentation effectively preserves the objective temporal causal relationships and state evolution patterns in the wastewater treatment process, avoiding the problems of data time-series traversal and premature leakage of future information caused by traditional random segmentation, ensuring a high degree of consistency between the model training environment and the real physical environment.
[0061] To enhance the model's adaptability and resilience to complex and extreme conditions such as torrential rains or seasonal transitions, the system innovatively incorporates a dynamic weight adjustment mechanism that incorporates water quality fluctuations into the objective function of the lightweight gradient booster. Conventional machine learning models typically assign the same weights to all training samples. This often leads to the model prioritizing stable period data over less frequent but crucial data from periods of significant fluctuation, resulting in severely inaccurate predictions during sudden changes in water quality. This implementation overcomes this deficiency by independently calculating dynamic training weights for each sample in the training set. This allows the model to be guided to focus more on recently occurring, critical samples with significant operational fluctuations during gradient descent optimization and splitting. The specific formula for calculating these dynamic training weights is as follows: ; In the formula, This indicates the first result obtained through systematic quantization calculation. The dynamic training weights of the historical operational data samples are directly multiplied into the first gradient and second Hessian matrix of the objective function of the lightweight gradient booster to intervene in the growth direction of the decision tree. This represents the basic weight constant set by the system based on the global convergence requirements of the underlying algorithm, serving as the benchmark for weight scaling; This represents the absolute system timestamp of the current execution of the prediction task; Indicates the first The absolute timestamp of each historical operational data sample when it is collected by the underlying IoT devices and stored in the database; This represents the historical experience decay coefficient. This parameter determines that the larger the time difference, the more exponentially the sample weight decays, thus ensuring that the model prioritizes learning and fitting the recent water plant equipment status and process rules. Indicates the first The comprehensive fluctuation gradient of influent water quality and quantity corresponding to each historical operational data point objectively reflects the degree of instantaneous change in the load of the water treatment system at the time of sample collection. This represents an amplification factor for abnormal fluctuations extracted based on rainfall and seasonal characteristics. When the system identifies that it is currently in the flood season or that there are sudden heavy rainfall meteorological conditions, the value of this amplification factor will be significantly increased to multiply the training weight of the corresponding abnormal samples.
[0062] After establishing the dynamic training weight sequence for all samples, the model uses this weight distribution as a basis and employs a histogram algorithm to discretize continuous high-dimensional features into bins. The histogram algorithm maps continuous floating-point feature values to discrete integer bins, significantly reducing memory consumption during model training and greatly accelerating the computational efficiency of finding optimal feature split points in the decision tree, thus efficiently completing offline pre-training of the multivariate regression prediction model. Once the model is officially put into field operation, the mechanical wear and tear of equipment and the characteristics of the climate environment in urban wastewater treatment plants will undergo continuous and regular drift over time. Therefore, the system sets a fixed time window to automatically collect newly added actual on-site operational data and corresponding process execution feedback results periodically. The system uses the same calculation formula mentioned above to assign extremely high time weights to these new data, performing online incremental updates to the existing artificial intelligence model. This incremental learning mode eliminates the need for consuming massive computing power to retrieve all historical data for retraining each time, greatly saving system computing resources while maintaining the model's cutting-edge vitality.
[0063] The above-described implementation methods have brought significant benefits to the refined and intelligent operation of urban wastewater treatment plants. The dynamic weight adjustment mechanism fundamentally solves the technical bottlenecks of general regression models, which tend to exhibit oversmoothing and passivation of abnormal conditions when the majority of samples are stable operating conditions. Through a dual multiplicative superposition logic of time exponential decay and fluctuation gradient amplification, the model is systematically forced to focus on more decision-making-oriented recent equipment states and surge load conditions. This results in the optimal reagent dosage and equipment operating parameters having extremely high predictive accuracy and control robustness when facing sudden changes in influent water quality or quantity. Combined with the histogram algorithm and a fixed-time-window online incremental update mechanism, the system not only achieves a high degree of adaptability to fluctuations in complex external environments but also ensures that the cost-efficiency optimization strategy closely follows the evolution of the actual process state of the water plant without increasing the burden of additional hardware computing power. This maximizes the economic potential for energy saving and consumption reduction while strictly adhering to the environmental bottom line of ensuring safe effluent water quality.
[0064] In the preferred scheme, the specific execution method of the equipment health prediction model is as follows: after completing and normalizing the input real-time operating parameters, historical fault data and maintenance records of the equipment, the time series data segments are divided according to a fixed time step and input into the Transformer time series prediction model. The model outputs the device health score at the current moment and, in parallel, outputs the predicted health trajectory for multiple future time steps; the calculation formula for the device health score is: ; In the formula, This represents the device health score, with a value range of [value missing]. Up to 100 points; This indicates the total number of dimensions of equipment operating parameters included in the health assessment; Indicates the first Real-time measured actual values of each parameter dimension; This indicates that the Transformer time series forecasting model is based on the output of normal operating conditions. Theoretical predicted values for each parameter dimension; and They represent the first The upper and lower limits of safe physical values for each parameter dimension; Indicates the first Fault sensitivity weights across multiple parameter dimensions; This represents the time-series penalty function for mechanical losses that increases with the equipment's operating time; when a certain time step in the predicted trajectory... When the health level falls below the preset health threshold, a fault warning will be automatically triggered within the next 24 to 48 hours, and a corresponding maintenance plan will be pushed out.
[0065] In practical implementation, the accurate execution of the equipment health prediction model is a crucial line of defense for ensuring the continuous and stable operation of the core units of urban wastewater treatment plants. The system first performs in-depth preprocessing on the multi-dimensional equipment data collected from on-site IoT nodes. This data includes not only dynamic parameters such as real-time motor operating current, bearing vibration frequency, and coil temperature, but also historical fault occurrence characteristics and records of major overhauls and routine lubrication maintenance. Since industrial field sensor data inevitably suffers from network transmission packet loss or sampling signal blind spots, the system uses a time-series interpolation algorithm to accurately complete missing values, and then uses a max-min scaling method to eliminate numerical differences between different physical dimensions. After completing basic data cleaning and normalization preprocessing, the system truncates continuous long-term operating data into equal-length time-series data segments at fixed time steps, using these segments as standard high-dimensional input tensors fed into the underlying time-series prediction model based on a self-attention mechanism.
[0066] This time series prediction model overcomes the limitations of traditional equipment monitoring methods, which can only reflect the present moment. It not only outputs a quantitative assessment of the equipment's health status at the current time but also extrapolates the predicted trajectory of health performance degradation over multiple consecutive time steps along the time axis. When calculating the comprehensive equipment health score at the current moment or a future time node, the system employs a deduction-based quantitative evaluation function that integrates multi-dimensional parameter dynamic residuals and long-term mechanical aging patterns. The formula for calculating the equipment health score is as follows: ; In the formula, This represents the equipment health score obtained after systematic quantitative calculation, and its value range is strictly defined within... Scores range from 100 to 100, with a perfect score representing the equipment in a brand-new, wear-free operating condition. This indicates the total number of operating parameter dimensions included in the comprehensive health assessment system for the equipment, covering various independent monitoring characteristics such as temperature, current, and vibration. Indicates the first The actual real-time measured values of each parameter dimension are acquired and input from the on-site sensors. This represents the first output of the time series forecasting model, derived from a large amount of historical, fault-free, normal operating data. Theoretical predicted values for each parameter dimension; and These represent the system's settings based on the equipment's nameplate and industrial physical limits. The upper and lower limits of safety physical values for each parameter dimension, and the range of their difference, are used to normalize the scale penalty for the absolute deviation between the actual measured value and the theoretical prediction value. Indicates the first The system assigns extremely high weights to core feature dimensions such as sudden vibrations that are highly likely to cause catastrophic shutdowns, in order to amplify their abnormal signals. This represents the time-series penalty function for mechanical losses that increases with the equipment's operating time, used to objectively reflect the irreversible physical aging process of equipment metal fatigue and component wear.
[0067] The underlying calculation logic of this formula lies in setting the initial optimal state of the equipment as full score, and dynamically deducting health scores by accumulating the normalized weighted residuals between the actual state and the theoretical normal state of various operating parameters. In each calculation cycle, the model generates a health score trajectory curve extending into the future. When the equipment health score in one or more future time steps falls below the system's preset health safety threshold for that type of equipment, the system determines that the equipment is about to experience substantial mechanical damage or electrical failure. At this point, the system will break from its usual silent monitoring state and automatically trigger early warning signals for potential faults within the next 24 to 48 hours. Simultaneously, it will cross system boundaries and call upon the maintenance experience network in the knowledge graph to accurately push standard maintenance plans and parts replacement guidelines that perfectly match the type of potential fault to the terminal equipment of on-site engineers and managers.
[0068] The aforementioned technical implementation scheme has brought significant benefits to the management and daily operation and maintenance of heavy-asset equipment in wastewater treatment plants. Traditional equipment status monitoring often relies on simple absolute threshold over-limit alarms. This delayed, reactive alarm mechanism usually detects anomalies when the equipment is already severely damaged, leading to unplanned shutdowns of the entire water treatment process line. This implementation scheme innovatively combines time series prediction algorithms with a multi-dimensional deduction-based health calculation formula. It can not only accurately measure the current sub-health state of the equipment but also predict the future trajectory of performance degradation. The scientifically introduced fault sensitivity weight and mechanical wear time-series penalty function in the formula make the health scoring mechanism extremely sensitive to instantaneous abnormal parameter changes while fully considering the objective law of natural physical performance degradation caused by long-term high-load service, thus eliminating false alarms and missed alarms. More importantly, the deterministic early warning lead time of up to 24 to 48 hours completely transforms the previously chaotic emergency repairs into calm and orderly predictive maintenance. This provides water plants with ample time to allocate spare parts, arrange construction personnel, and switch process bypasses, greatly reducing the unplanned downtime rate of core equipment and the overall maintenance cost. It also provides solid support for the efficient and compliant operation of urban wastewater treatment from the perspective of the safety and reliability of the underlying hardware.
[0069] In the preferred embodiment, the multi-scale feature extraction module is executed as follows: a multi-scale feature extraction module containing one-dimensional convolutional branches and dilated convolutional branches is set at the encoder input of the Transformer time series prediction model. By using different convolutional kernels to extract and fused features from short-term abrupt changes and long-term degradations in device operation data, a multi-scale fusion feature matrix is generated and input into the encoder. The calculation formula for the multi-scale fusion feature matrix is as follows: ; In the formula, This represents the output multi-scale fused feature matrix; This represents the time series matrix of the input device operating parameters; This represents a small-sized one-dimensional convolutional kernel used to capture short-term, high-frequency abrupt changes in device parameters; This represents a standard one-dimensional convolution operation; This refers to a large-size dilated convolutional kernel used to capture long-term, low-frequency degradation trends in device performance. Indicates the expansion rate The dilated convolution operation; and These represent the attention weights for short-term mutation features and long-term degradation features, which are dynamically allocated through a self-attention mechanism, respectively.
[0070] In the specific implementation of equipment health prediction, the execution method of the multi-scale feature extraction module is the core architectural design for improving the sensitivity and accuracy of the entire prediction system. Before a failure occurs, the underlying operating parameters of core equipment in urban wastewater treatment plants often exhibit two distinct physical characteristics simultaneously: short-term high-frequency parameter mutations and long-term low-frequency performance degradation. To simultaneously capture these two different dimensions of features, the system innovatively incorporates a multi-scale feature extraction module with one-dimensional convolutional and dilated convolutional branches at the encoder input of the Transformer time series prediction model.
[0071] This module utilizes a parallel computing mechanism to synchronously process the input device operation data using convolutional kernels of different sizes and receptive fields. Specifically, the one-dimensional convolutional branch uses a smaller kernel to scan the input data sequence with extremely high temporal resolution, accurately capturing short-term abrupt changes such as instantaneous motor current overload or brief high-frequency bearing vibration. The dilated convolutional branch, by introducing a dilation rate parameter, multiplies the receptive field of the convolutional kernel without increasing the number of network physical parameters, thus enabling the extraction of long-term degradation trends in device performance across an extremely wide time span. The system then adaptively weights and merges these two data streams, representing short-term and long-term features respectively, to generate a multi-scale fused feature matrix, which is input into the subsequent encoder. The quantization formula for the multi-scale fused feature matrix is as follows: ; In the formula, This represents the multi-scale fusion feature matrix that is finally output after parallel computation and weighted fusion of two convolutional branches. This matrix completely preserves the multi-band abnormal signals during device operation in a single feature space. This represents the input device operating parameter time series matrix, which contains the raw sensing sequences such as temperature and amplitude continuously and stably collected from the underlying sensors; This refers to a small-sized one-dimensional convolutional kernel configured within a one-dimensional convolutional branch to capture short-term, high-frequency abrupt changes in device parameters. This represents the standard discrete one-dimensional convolution mathematical operation; This refers to a large-size dilated convolution kernel configured within the dilated convolution branch to capture the long-term, low-frequency degradation trend of device performance. This indicates that a specific inflation rate has been introduced. The dilated convolution operation enables the extraction of cross-step features from long-period historical data without losing global topological information. and These represent the attention weights for short-term mutation features and long-term degradation features, which are dynamically calculated and allocated by the model through its built-in self-attention network structure. These two dynamic weight factors can automatically adjust the proportions of short-term mutation signals and long-term gradual change signals in the final feature matrix according to the different mechanical wear or electrical aging stages of the current device.
[0072] The implementation method described above, which includes a multi-scale feature extraction module, has brought significant benefits to equipment health prediction and operation and maintenance decisions in urban wastewater treatment plants. Traditional time series prediction models are often limited by a single receptive field size. While successfully capturing macroscopic aging trends, they often filter out transient, destructive micro-mutations. Furthermore, when focusing on transient anomalies, they are easily lost in local noise, neglecting long-term performance drift. This solution cleverly integrates standard one-dimensional convolution and dilated convolution in the feature encoding input stage, perfectly balancing high-frequency transient sensitivity with a low-frequency global macroscopic perspective. The dynamic attention weight allocation mechanism introduced in the formula further endows the model with the ability to adaptively adjust the feature focus based on the current actual operating conditions of the equipment. This innovative feature extraction at the algorithmic level greatly improves the model's accuracy in identifying early, weak, and complex fault signals hidden in massive amounts of normal data, significantly extending the lead time for the system to issue fault warnings. This provides the on-site maintenance team with more time for hazard mitigation and equipment deployment, fundamentally reducing the unexpected damage rate of heavy-asset equipment.
[0073] Example 3 Further explanation in conjunction with Example 1, such as Figure 1 As shown in the diagram, the method for iteratively optimizing the structure and content of the operational guidelines form template in S4 based on the analysis results is as follows: Based on the preset urban wastewater treatment standards, anomaly identification and trend prediction analysis are performed on the collected on-site operation data. Based on the results of anomaly identification and trend prediction analysis, the structure of key monitoring items, related data content, and data collection frequency in the existing operation specification form template are optimized in reverse. A new generation of operation specification form template is generated and put into collection and verification. Unlike the traditional fixed operation form template, the operation specification form template can be dynamically adapted to the process upgrades, equipment iterations, and policy and standard updates of urban wastewater treatment plants.
[0074] In S4, the method of using backhaul data to perform closed-loop upgrades of the artificial intelligence model is as follows: incremental learning algorithms are used to fine-tune the artificial intelligence model. The incremental learning algorithm incorporates a catastrophic forgetting suppression mechanism with regularization constraints to prevent the model from losing its ability to fit existing routine conditions after learning new operating conditions. The total loss function in the incremental learning stage is calculated as follows: ; In the formula, This represents the total loss function during the incremental fine-tuning phase. This represents the current prediction error loss calculated by the model based on the newly added training set, such as on-site operational data and process optimization results. This represents the total number of parameter nodes in the model's neural network; Indicates the first step in the fine-tuning process The current iteration value of each model weight; This indicates the first element in the original model before incremental fine-tuning. A fixed value for each weight; Indicates the first The diagonal elements of the sensitivity evaluation matrix for each weight to historical routine operating condition characteristics; This represents the regularization strength coefficient that balances the learning of new knowledge with the retention of old knowledge; at the same time, a model performance evaluation mechanism is set, which triggers full retraining when any indicator falls below the preset upgrade threshold.
[0075] In the preferred solution, multiple artificial intelligence models achieve cross-scenario linkage. The fault warning information output by the equipment health prediction model is automatically synchronized to the template generation optimization model, and the corresponding fault-specific inspection items are automatically added to the equipment maintenance and operation specification form template. The prediction results of the water quality trend prediction model are automatically synchronized to the compliance verification model, and the early warning thresholds of the form data are updated in advance, realizing intelligent collaboration throughout the entire process.
[0076] In the closed-loop phase of the system's entire process, iteratively optimizing the structure and content of the operational specification form template based on multi-dimensional analysis results is a crucial step in achieving the dynamic growth of management tools. The system performs in-depth anomaly identification and trend prediction analysis on the massive amounts of continuously collected on-site operational data, based on preset urban wastewater treatment standards. Based on the results of this anomaly identification and trend prediction analysis, the system reverse-engineers the structure of key monitoring items, related data content, and data collection frequency in the existing operational specification form template. Specifically, when data trend prediction indicates a high-frequency anomaly risk in a certain process step in the near future, the system automatically adds key monitoring items for that risky step to the next-generation template and increases the corresponding data collection frequency. This deep integration of data feedback generates a new generation of operational specification form templates, which are then deployed for on-site data collection and verification. This completely distinguishes them from traditional, fixed operational form templates, truly achieving dynamic and seamless adaptation of the operational specification form templates to urban wastewater treatment plant process upgrades, equipment iterations, and policy standard updates.
[0077] Simultaneously, the system utilizes a large amount of feedback data to perform closed-loop upgrades on the underlying artificial intelligence model. Considering that the evolution of the operating environment and climate conditions of urban wastewater treatment plants is a long-term and gradual process, the system employs an incremental learning algorithm to fine-tune the artificial intelligence model, absorbing the latest field operating experience with the most efficient computing power. To prevent the model from forgetting previously mastered routine operating rules after learning new operating conditions, a catastrophic forgetting suppression mechanism with regularization constraints is specifically incorporated into the incremental learning algorithm. This mechanism effectively prevents the model from losing its ability to fit existing routine operating conditions after learning new ones. During the incremental learning phase, the total loss function used for optimization calculations is calculated as follows: ; In the formula, This represents the total loss function during the incremental fine-tuning phase, used to guide the gradient descent direction of the overall model weights; This represents the current prediction error loss calculated by the model based on the newly added training set, such as on-site operational data and process optimization results. It represents the degree to which the model has learned new knowledge and the fitting error. This represents the total number of parameter nodes in the model's neural network; Indicates the first step in the fine-tuning process The current iteration value of each model weight; This indicates the first element in the original model before incremental fine-tuning. A fixed value for each weight, i.e., the historical experience anchor points that are saved; Indicates the first The diagonal elements of the sensitivity evaluation matrix for the characteristics of historical routine operating conditions are weighted. The larger the value, the more important the corresponding weight parameter is for maintaining the memory of historical routine operating conditions. This represents the regularization strength coefficient that balances the learning of new knowledge with the retention of old knowledge. Using this formula, when an AI model attempts to significantly modify key weights that are extremely sensitive to historical operating conditions during fine-tuning, the regularization term in the latter half of the formula incurs a substantial penalty loss. This forces the model to adapt to new process conditions by utilizing relatively redundant parameters in the network, while retaining core historical experience.
[0078] While performing incremental fine-tuning, the system rigorously establishes a model performance evaluation index mechanism to quantitatively monitor the actual operational health of each model. When any model performance evaluation index falls below a preset upgrade threshold, the system determines that simple incremental fine-tuning is insufficient to eliminate the negative impact of drastic shifts in the underlying data distribution. In this case, it automatically triggers a full retraining process for the corresponding AI model to ensure the absolute prediction accuracy of the algorithm. Furthermore, in the preferred technical implementation, the system breaks through the limitations of single-point intelligence, achieving deep cross-scenario linkage among multiple AI models. Fault warning information output by the equipment health prediction model is automatically synchronized to the template generation optimization model, enabling the system to automatically add specific inspection items for potential faults when automatically generating and issuing equipment maintenance and operation specification form templates. Simultaneously, the prediction results of the water quality trend prediction model are also automatically synchronized to the compliance verification model, thereby updating the warning thresholds of the form data in advance and achieving intelligent collaborative management throughout the entire process.
[0079] The above-described implementation methods have brought decisive and beneficial effects to the intelligent transformation of urban wastewater treatment plants. On the one hand, the dynamic iteration of form templates and the incremental learning of artificial intelligence models complement each other, enabling the control system to have a continuously evolving vitality and completely solving the technical pain points of traditional water management software becoming outdated as soon as it goes online and being unable to adapt to on-site process upgrades and standard changes. The combination of a catastrophic forgetting suppression mechanism and a specific loss function formula ensures a smooth transition between old and new operating conditions from the algorithmic level, avoiding control logic oscillations caused by model upgrades and greatly improving the system's availability and stability. On the other hand, multi-model cross-scenario linkage breaks down the information silos between traditional single algorithm tools, seamlessly integrating equipment early warning, dynamic water quality prediction, and form standard execution. This design allows expert-level fault diagnosis and water quality intervention strategies to be directly translated into specific operational instructions on the forms held by frontline employees, achieving true full-process intelligent collaboration. This not only significantly improves the proactive management efficiency of wastewater treatment plants in dealing with complex and abnormal operating conditions but also provides a highly valuable implementation path for form-driven intelligentization in similar complex industrial scenarios.
[0080] Example 4 To further illustrate with Example 1, when actually deploying and implementing this standardized form system for urban wastewater treatment operations that incorporates artificial intelligence, the overall architecture can unfold along four standard steps: underlying data, AI hub, front-end interaction, and closed-loop iteration. It can also be flexibly built using containerized microservices. During the deployment phase of the underlying data warehouse and knowledge graph, it is recommended to use Docker containerization technology for environment isolation and rapid deployment. Specifically, PostgreSQL can be used as the core relational structured data warehouse to store time-series characteristic data such as water quality, equipment, and chemical consumption after cleaning; the knowledge graph storage and graph traversal computation can be deployed using the Neo4j graph database. For the overall infrastructure, elastic computing resources can be configured relying on cloud service providers such as Tencent Cloud. For data nodes with extremely high real-time requirements or those requiring local storage, lightweight edge deployment on NAS devices at the wastewater treatment plant can also be chosen.
[0081] In the deployment of AI models and natural language interaction modules, the core template generation optimization model can be rapidly orchestrated and deployed using the Dify platform. Through the Dify platform, the BERT bidirectional encoder representation model and sequence labeling algorithm can be efficiently integrated to achieve accurate parsing of natural language operational needs and directly connect to the knowledge graph to complete mandatory attention matching for required fields. For compliance verification GNN graph neural networks on the process side, cost and energy efficiency optimization LightGBM models, and equipment health prediction Transformer models, these can be uniformly developed as independent microservice API modules based on the PyTorch deep learning framework in the Python environment. These modules will continuously run in the background as AI inference engines, receiving real-time field data and outputting prediction parameters.
[0082] In the front-end form generation and intelligent interaction execution phases, the system's presentation layer can utilize Vue.js or React frameworks combined with an open-source low-code form engine to dynamically render the initial template framework generated by AI and provide a data entry interface. To ensure efficient communication between the front-end and back-end interfaces and high-concurrency access from multiple terminals on-site, it is recommended to configure Nginx as a high-performance reverse proxy and load balancer server for unified routing and distribution. Furthermore, for historical handling solutions, compliance guidelines, and various operating procedure texts associated with the knowledge graph, a dedicated cloud-based knowledge base system can be built using Wiki.js. When the compliance verification model detects excessive water quality or equipment warnings, the form front-end can directly call the API to seamlessly and accurately display the corresponding regulatory guidance content from Wiki.js in a pop-up window, completing the intelligent interaction.
[0083] During the implementation phase of end-to-end data closure and incremental learning, the system needs to build an automated incremental data feedback pipeline. On-site collected form business data, underlying device sensor data, and the actual execution effects of decision support can be transmitted back to the cloud in real time via message queue middleware. An additional automated scheduling module needs to be deployed in the backend. When the model performance evaluation metrics trigger the preset 90%-95% upgrade threshold, the scheduling script automatically pulls the latest incremental dataset from PostgreSQL, triggering incremental training tasks for LightGBM and Transformer with catastrophic forgetting suppression mechanisms. After model fine-tuning and weight updates are completed, the scheduling system smoothly restarts the corresponding Docker container service through automated operation and maintenance tools, thereby achieving seamless bidirectional dynamic iteration of operational standard form structures and AI model parameters without interrupting on-site operations.
[0084] The above embodiments are merely preferred technical solutions of the present invention and should not be considered as limitations on the present invention. The scope of protection of the present invention should be limited to the technical solutions described in the claims, including equivalent substitutions of the technical features described in the claims. That is, equivalent substitutions and improvements within this scope are also within the scope of protection of the present invention.
Claims
1. A method for implementing an operational standard form template that also takes into account the use of artificial intelligence, characterized by: Includes the following steps: S1. Construct a structured data warehouse and knowledge graph for the field of urban wastewater treatment, and train artificial intelligence models for corresponding urban wastewater treatment business scenarios based on the structured data warehouse; S2. Analyze the urban sewage treatment operation requirements input by natural language, and generate operation specification form templates by combining knowledge graphs and artificial intelligence models; S3. Collect on-site operational data through the generated operational specification form template, use artificial intelligence models for compliance verification and intelligent interaction during the execution process, and generate auxiliary decision-making suggestions based on the on-site operational data; S4. The collected on-site operational data and the execution effect data of auxiliary decision-making suggestions are fed back to the artificial intelligence system for multi-dimensional analysis. Based on the analysis results, the structure and content of the operational standard form template are iteratively optimized. At the same time, the fed-back data is used to upgrade the artificial intelligence model in a closed loop.
2. The method for implementing an operational standard form template that takes into account the use of artificial intelligence as described in claim 1, characterized in that: The methods for constructing a structured data warehouse and knowledge graph in the field of urban wastewater treatment in S1 include: Collect historical operational data from urban wastewater treatment plants, covering dimensions such as water quality monitoring, equipment operation, chemical dosing, process control, and operation and maintenance management. Clean and deduplicate the data, unify the data format, and classify and store it to establish a structured data warehouse. Define nodes and edges in the knowledge graph. Nodes include water quality indicators, equipment, reagents, processes, and personnel. Edges represent the logical relationships between water quality indicators, equipment, reagents, processes, and personnel. Natural language processing technology was used to extract feature entities from the operating procedures and industry standards of urban sewage treatment plants to fill a knowledge graph.
3. The method for implementing an operational standard form template that also takes into account the use of artificial intelligence, as described in claim 2, is characterized in that: The artificial intelligence models trained in S1 for urban wastewater treatment business scenarios include: The template generation optimization model is trained by combining a natural language understanding model based on BERT bidirectional encoder representation with a sequence labeling algorithm. The compliance verification model is trained using a knowledge graph-based graph neural network algorithm. The cost-efficiency optimization model was trained, and the cost-efficiency optimization model adopted a multivariate regression prediction model based on LightGBM lightweight gradient booster. The equipment health prediction model is trained using a Transformer time series prediction model based on time series data.
4. The method for implementing an operational standard form template that takes into account the use of artificial intelligence as described in claim 1, characterized in that: S2 The method used in China to generate operational specification form templates by combining knowledge graphs and artificial intelligence models is as follows: The similarity between the parsed operational requirements and the basic templates in the historical template library is calculated using a collaborative filtering algorithm. When the similarity is greater than the preset matching threshold, the corresponding basic template is invoked. When the similarity is lower than the preset matching threshold, the required fields corresponding to the business process are automatically matched by the graph traversal algorithm of the knowledge graph, and the initial template framework is automatically generated. The preset matching threshold is 75%-85%.
5. The method for implementing an operational standard form template that takes into account the use of artificial intelligence as described in claim 4, characterized in that: After automatically generating the initial template framework, the template is used to generate an optimized model that takes the parsed operational requirement keywords and business nodes and relationships in the knowledge graph as input. The model outputs and automatically sets the field set, field association rules and compliance verification rules of the operational specification form template. The template generation optimization model incorporates a knowledge graph forced attention mechanism into the output layer of the BERT bidirectional encoder representation model. When generating fields, it prioritizes matching the mandatory compliance fields for the wastewater treatment industry and business association rules defined in the knowledge graph, ensuring that the generated templates comply with industry standards and water plant business processes.
6. The method for implementing an operational standard form template that also takes into account the use of artificial intelligence, as described in claim 1, is characterized in that: In S3, the method of using artificial intelligence models for compliance verification and intelligent interaction during execution is as follows: The compliance verification model takes on-site operational data as input in real time, uses graph neural networks to calculate the matching degree between on-site operational data and preset standard thresholds in the knowledge graph, and outputs the data compliance judgment result and abnormal data level. The graph neural network algorithm of the compliance verification model incorporates the temporal feature dimension to simultaneously verify the threshold compliance of a single data point and the consistency of the process logic of operational data within a continuous time period, thereby identifying hidden operational anomalies that cannot be covered by conventional threshold verification. When on-site operational data triggers preset control thresholds, the subgraph matching algorithm of the knowledge graph identifies operational execution deviations and automatically pushes corresponding disposal suggestions, historical solutions, and compliance correction guidelines for the urban sewage treatment business scenario.
7. The method for implementing an operational standard form template that also takes into account the use of artificial intelligence, as described in claim 6, is characterized in that: When water quality monitoring data in on-site operation data exceeds the standard, process adjustment suggestions are automatically pushed, including pop-up prompts in forms, system message pushes, mobile reminders, etc., and the historical treatment plans corresponding to the abnormal water quality data are associated through entity linking algorithms of knowledge graph.
8. The method for implementing an operational standard form template that takes into account the use of artificial intelligence as described in claim 1, characterized in that: The methods for generating auxiliary decision-making suggestions based on on-site operational data in S3 include: The cost-efficiency optimization model takes into account real-time influent water quality data, influent water volume data, equipment operating parameters, and effluent water quality standards. With the constraint of continuous effluent water quality compliance and the optimization objective of minimizing chemical and power consumption, it outputs the optimal chemical dosage and equipment operating parameters, including aeration equipment operating frequency and sludge return ratio. The cost-efficiency optimization model incorporates a dynamic weight adjustment mechanism into the LightGBM lightweight gradient booster model, assigning higher weights to recent real-time operational data and inputting seasonal and rainfall characteristics as auxiliary variables to adapt to fluctuating influent water quality and quantity scenarios and improve parameter optimization accuracy. The equipment health prediction model takes into account real-time operating parameters, historical fault data and maintenance records, and outputs equipment health score and prediction of potential fault occurrence time. When the predicted fault occurrence time is within the next 24 to 48 hours, it automatically triggers a fault warning and pushes the corresponding maintenance plan. The Transformer time series prediction model of the equipment health prediction model incorporates a multi-scale feature extraction module to simultaneously capture the short-term mutation characteristics and long-term degradation trends of equipment operation data, thereby improving the accuracy of early potential fault identification and the advance warning.
9. The method for implementing an operational standard form template that also takes into account the use of artificial intelligence, as described in claim 8, is characterized in that: The method for achieving optimal drug dosage and equipment operating parameter output based on a natural language understanding model using BERT bidirectional encoder representation combined with sequence labeling algorithms is as follows: Using the BERT bidirectional encoder representation model, semantic features are extracted from textual data of process specifications, historical process adjustment plans, and water quality anomaly handling records in the urban wastewater treatment industry to generate process rule feature vectors. By using sequence labeling algorithms, entity labeling is performed on the feature vectors of process rules to identify key water quality factors, equipment operating conditions and process constraints that affect the dosage of reagents, the operating frequency of aeration equipment and the sludge return ratio. The key factors obtained from the annotation are transformed into structured constraints. A comprehensive operating cost function is constructed by combining the optimization objective of minimizing chemical and electrical consumption. This function is then solved within the constraint space of ensuring effluent quality meets standards to output the optimal parameters. The formula for calculating the comprehensive operating cost function is as follows: ; In the formula, This represents the overall operating cost index; Indicates the total number of drug types; This indicates the total number of high-energy-consuming devices; Indicates the first The unit dose cost coefficient of the drug; Indicates the first Dosage of the agent; Indicates the first Energy consumption coefficient per unit operating frequency of each device; Indicates the first The operating frequency of each device; This represents the chemical consumption weighting adjustment factor that is dynamically allocated based on real-time water quality status. This represents the power consumption weighting adjustment factor dynamically allocated based on real-time water quality status.
10. The method for implementing an operational specification form template that takes into account the use of artificial intelligence as described in claim 8, characterized in that: The multivariate regression prediction model based on the LightGBM lightweight gradient booster is adapted to scenarios of fluctuating influent water quality and quantity and improves the accuracy of parameter optimization by performing feature engineering on the collected historical operation data, extracting multi-dimensional features of influent water quality, quantity, equipment operation, season, and rainfall, and dividing the training set and validation set according to the time series. A dynamic weight adjustment mechanism incorporating water quality fluctuations is introduced into the objective function of the LightGBM lightweight gradient booster to calculate training weights for each sample in the training set. The formula for calculating the dynamic training weights is as follows: ; In the formula, Indicates the first Training weights for each historical operational data sample; Represents the basic weight constant; Indicates the timestamp of the current prediction time; Indicates the first The timestamp of when each historical operational data sample was collected; This represents the historical attenuation coefficient. Indicates the first The comprehensive fluctuation gradient of influent water quality and quantity corresponding to each historical operational data point; This represents the amplification factor of abnormal fluctuations extracted based on rainfall and seasonal characteristics. The model uses this weight as a basis to complete pre-training with a histogram algorithm, and after being put into use, it is updated online incrementally in a fixed time window.
11. The method for implementing an operational specification form template that takes into account the use of artificial intelligence as described in claim 8, characterized in that: The specific execution method of the equipment health prediction model is as follows: after completing and normalizing the input real-time operating parameters, historical fault data and maintenance records of the equipment, the time series data segments are divided according to a fixed time step and input into the Transformer time series prediction model. The model outputs the device health score at the current moment and, in parallel, outputs the predicted health trajectory for multiple future time steps; the calculation formula for the device health score is: In the formula, This represents the device health score, with a value range of [value missing]. Up to 100 points; This indicates the total number of dimensions of equipment operating parameters included in the health assessment; Indicates the first Real-time measured actual values of each parameter dimension; This indicates that the Transformer time series forecasting model is based on the output of normal operating conditions. Theoretical predicted values for each parameter dimension; and They represent the first The upper and lower limits of safe physical values for each parameter dimension; Indicates the first Fault sensitivity weights across multiple parameter dimensions; This represents the time-series penalty function for mechanical losses that increases with the equipment's operating time; when a certain time step in the predicted trajectory... When the health level falls below the preset health threshold, a fault warning will be automatically triggered within the next 24 to 48 hours, and a corresponding maintenance plan will be pushed out.
12. The method for implementing an operational specification form template that takes into account the use of artificial intelligence as described in claim 11, characterized in that: The specific execution method of the multi-scale feature extraction module is as follows: a multi-scale feature extraction module containing one-dimensional convolutional branches and dilated convolutional branches is set at the encoder input of the Transformer time series prediction model; By using different convolutional kernels to extract and fused features from short-term abrupt changes and long-term degradations in device operation data, a multi-scale fusion feature matrix is generated and input into the encoder. The calculation formula for the multi-scale fusion feature matrix is as follows: ; In the formula, This represents the output multi-scale fused feature matrix; This represents the time series matrix of the input device operating parameters; This represents a small-sized one-dimensional convolutional kernel used to capture short-term, high-frequency abrupt changes in device parameters; This represents a standard one-dimensional convolution operation; This refers to a large-size dilated convolutional kernel used to capture long-term, low-frequency degradation trends in device performance. Indicates the expansion rate The dilated convolution operation; and These represent the attention weights for short-term mutation features and long-term degradation features, which are dynamically allocated through a self-attention mechanism, respectively.
13. The method for implementing an operational standard form template that takes into account the use of artificial intelligence as described in claim 1, characterized in that: In S4, the method for iteratively optimizing the structure and content of the operational guidelines form template based on the analysis results is as follows: Based on the preset urban wastewater treatment standards, anomaly identification and trend prediction analysis are performed on the collected on-site operation data. Based on the results of anomaly identification and trend prediction analysis, the structure of key monitoring items, related data content, and data collection frequency in the existing operation specification form template are optimized in reverse. A new generation of operation specification form template is generated and put into collection and verification. Unlike the traditional fixed operation form template, the operation specification form template can be dynamically adapted to the process upgrades, equipment iterations, and policy and standard updates of urban wastewater treatment plants.
14. The method for implementing an operational specification form template that takes into account the use of artificial intelligence as described in claim 13, characterized in that: In S4, the method of using backhaul data to perform closed-loop upgrades of the artificial intelligence model is as follows: incremental learning algorithms are used to fine-tune the artificial intelligence model. The incremental learning algorithm incorporates a catastrophic forgetting suppression mechanism with regularization constraints to prevent the model from losing its ability to fit existing routine conditions after learning new operating conditions. The total loss function in the incremental learning stage is calculated as follows: ; In the formula, This represents the total loss function during the incremental fine-tuning phase. This represents the current prediction error loss calculated by the model based on the newly added training set, such as on-site operational data and process optimization results. This represents the total number of parameter nodes in the model's neural network; Indicates the first step in the fine-tuning process The current iteration value of each model weight; This indicates the first element in the original model before incremental fine-tuning. A fixed value for each weight; Indicates the first The diagonal elements of the sensitivity evaluation matrix for each weight to historical routine operating condition characteristics; This represents the regularization strength coefficient that balances the learning of new knowledge with the retention of old knowledge; at the same time, a model performance evaluation index mechanism is set, which triggers full retraining when any index falls below the preset upgrade threshold.
15. The method for implementing an operational standard form template that takes into account the use of artificial intelligence as described in claim 1, characterized in that: Multiple artificial intelligence models can achieve cross-scenario linkage. The fault warning information output by the equipment health prediction model is automatically synchronized to the template generation optimization model, and the corresponding fault-specific inspection items are automatically added to the equipment maintenance and operation specification form template. The prediction results of the water quality trend prediction model are automatically synchronized to the compliance verification model, and the early warning thresholds of the form data are updated in advance, realizing intelligent collaboration throughout the entire process.