A safety monitoring method for integrated hydrogen-electric-oil-gas energy stations based on lightweight deep learning
A lightweight LSTM encoder-decoder model is constructed and pruned to address multi-source data fusion in integrated energy stations, enabling real-time safety monitoring and rapid fault detection and improving the safe and stable operation of the stations.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHONGQING UNIV
- Filing Date
- 2026-03-13
- Publication Date
- 2026-06-12
AI Technical Summary
Integrated energy stations produce large and complex multi-sensor data. Traditional methods struggle to fuse this multi-source data for safety status assessment, and deep learning models demand more computational resources than front-end devices provide, which makes real-time monitoring difficult.
An LSTM encoder-decoder model is constructed, the time-series data is preprocessed, and the trained model is pruned into lightweight models of different sizes. These models are deployed on front-end devices and a back-end server, and a model is selected dynamically for real-time safety diagnostics.
The method enables intelligent monitoring of integrated hydrogen, electricity, oil and gas energy stations, quickly detects hydrogen leaks and equipment failures, improves early warning response speed and reliability, and reduces model complexity while maintaining monitoring accuracy.
Figure CN122196344A_ABST
Description
Technical Field
[0001] This invention relates to the field of integrated energy station safety technology, and in particular to a safety monitoring method for integrated hydrogen-electric-oil-gas energy stations based on lightweight deep learning.
Background Technology
[0002] With the development of the new energy industry, integrated energy stations combining hydrogen, electricity, oil, and gas supply are gradually emerging. These multi-energy integrated stations improve the convenience and efficiency of energy supply, but their safe operation faces significant challenges. For example, hydrogen leaks during hydrogen storage and refueling may cause explosions, while oil and gas fuel leaks or equipment malfunctions may lead to fires. Therefore, how to conduct safety monitoring of integrated energy stations and promptly detect anomalies such as hydrogen leaks, electrical faults, and fuel leaks has become an urgent technical problem to be solved.
[0003] Furthermore, integrated energy stations generate large and complex multi-sensor data, and the energy subsystems are interconnected, so traditional methods struggle to fuse multi-source data and determine the overall safety status of the system. In recent years, artificial intelligence has shown advantages in anomaly detection. Deep learning models such as Long Short-Term Memory (LSTM) neural networks can learn the temporal patterns of normal operation from historical operating data, which enables the detection of subtle anomalies. However, deep learning models typically have many parameters and high computational cost, so deploying complex models directly on front-end equipment in energy stations for real-time monitoring is impractical. The models therefore need to be optimized and made lightweight to reduce computational overhead and storage consumption, so that they can run efficiently on edge devices or industrial controllers.
Summary of the Invention
[0004] This invention discloses a safety monitoring method for integrated hydrogen-electric-oil-gas energy stations based on lightweight deep learning. The method is as follows: Collect multi-source sensor time-series data at the integrated energy station and preprocess it; Construct an LSTM encoder-decoder model and optimize it with the goal of minimizing the reconstruction error between the input time-series data and the reconstructed time-series data; After training converges, prune the LSTM encoder-decoder model with a custom pruning strategy to construct three lightweight models of different sizes; Deploy the two smallest lightweight models on the front-end equipment of the integrated energy station and the largest lightweight model on the back-end server; Based on a custom invocation strategy, dynamically select and invoke lightweight models of different levels for real-time safety diagnostics.
[0005] Furthermore, multi-source sensor time-series data includes, but is not limited to: Hydrogen energy system data: hydrogen pressure and concentration, oil and gas pipeline pressure and flow rate; Oil and gas system data: pressure, valve opening, oil and gas flow; Power system data: voltage, current, temperature, equipment status.
[0006] Furthermore, the time-series data is preprocessed as follows: Each sensor records readings over time, which are aggregated to form multidimensional time-series data $x_t = [x_t^{(1)}, x_t^{(2)}, \dots, x_t^{(n)}]$, where $x_t$ denotes the readings of the $n$ sensors collected at time $t$; Time-series data with different dimensions are normalized: the data sequence of the $i$-th sensor is normalized using min-max linear scaling, $\tilde{x}_t^{(i)} = \frac{x_t^{(i)} - x_{\min}^{(i)}}{x_{\max}^{(i)} - x_{\min}^{(i)}}$, where $x_{\min}^{(i)}$ and $x_{\max}^{(i)}$ are the minimum and maximum values of sensor $i$ in the training data; If a sensor records no valid reading at time $t$, linear interpolation from adjacent time points is used to estimate the single missing point, $x_t^{(i)} = \frac{x_{t-1}^{(i)} + x_{t+1}^{(i)}}{2}$, that is, the missing value at time $t$ is taken as the average of the values at the neighboring time points; A clean and uniformly scaled time series is thereby obtained.
[0007] Furthermore, the LSTM encoder-decoder model includes an encoder and a decoder; The encoder receives an input sequence of length $T$, updates its hidden state at each time step, and finally compresses the information of the entire sequence into a hidden vector of fixed dimension. The decoder uses the encoder's hidden vector as its initial condition and generates an output sequence of the same length as the input in order to reconstruct the original sequence.
[0008] Furthermore, the LSTM encoder-decoder model is trained as follows: Define the input sequence of a training sample as $X = \{x_1, x_2, \dots, x_T\}$, where $T$ is the sequence length and $x_t$ is the input feature vector at the $t$-th time step; The encoder uses an LSTM to encode the input sequence step by step and obtain the hidden state sequence, with the state update $h_t^{e} = f_e(h_{t-1}^{e}, x_t; \theta_e)$, where $h_t^{e}$ is the hidden state of the encoder at step $t$, $f_e$ is the state update function of the encoder LSTM, and $\theta_e$ is the encoder parameter set; after $T$ encoding steps the final hidden representation $c = h_T^{e}$ is obtained; The decoder takes the context vector $c$ as its initial state and uses an LSTM to generate the output sequence $\hat{X} = \{\hat{x}_1, \dots, \hat{x}_T\}$ step by step; the decoder hidden state recursion is $h_t^{d} = f_d(h_{t-1}^{d}, \hat{x}_{t-1}; \theta_d)$, where $h_t^{d}$ is the hidden state of the decoder at step $t$, $f_d$ is the state update function of the decoder LSTM, and $\theta_d$ is its parameter set; the decoder's initial hidden state is set to $h_0^{d} = c$, generated by the encoder, and at $t = 1$ a preset initial input $\hat{x}_0$ is used; The output mapping function maps the hidden state to the reconstructed output of the decoder, $\hat{x}_t = g(h_t^{d})$, where $g$ is the decoder's output mapping function, which converts the current hidden state $h_t^{d}$ into the output $\hat{x}_t$; the set of all trainable parameters of the model is denoted $\theta$; The training objective function is defined as the reconstruction error, and the mean squared error is used to measure the reconstruction error of the output sequence $\hat{X}$ relative to the input sequence $X$: $L_{rec} = \frac{1}{T}\sum_{t=1}^{T} \|x_t - \hat{x}_t\|^2$, where $\|x_t - \hat{x}_t\|^2$ is the squared error between input and output at the $t$-th time step; for a training set containing $N$ sequence samples, this error is computed for each sample and averaged to evaluate the overall reconstruction error; An L2 regularization term is introduced into the loss function to penalize the sum of squares of the model parameters, giving the overall loss $L = L_{rec} + \lambda \|\theta\|^2$, where the first term $L_{rec}$ is the reconstruction error, the second term is the L2 regularization term, $\|\theta\|^2$ is the sum of squares of all parameters in the model, and $\lambda$ is the regularization coefficient; The parameters are updated iteratively by gradient descent along the negative gradient direction until convergence or until the preset number of iterations is reached: $\theta^{(k+1)} = \theta^{(k)} - \eta\left(\frac{\partial L_{rec}}{\partial \theta} + 2\lambda\theta^{(k)}\right)$, where $\theta$ denotes any trainable parameter in the model, $\theta^{(k)}$ is the value of this parameter at the $k$-th iteration, and $\eta$ is the learning rate; because the loss function $L$ includes the L2 regularization term, the gradient with respect to $\theta$ adds $2\lambda\theta$ to the reconstruction error gradient, so the term in parentheses is the gradient of the loss function $L$ with respect to $\theta$.
[0009] Furthermore, the smallest lightweight model M1 is constructed as follows: Collect normal operating condition data to form a training set $D_1$; normalize each variable and fill in missing values in the same way as in online operation; generate sample sequences with a sliding window and divide them into training and validation subsets; Based on the LSTM encoder-decoder model, minimize the reconstruction error on the training samples and train until the validation-set error converges or a preset number of iterations is reached, obtaining the parameters and the reconstruction error statistics; Continue training from these parameters with L1 and L2 regularization introduced: $L_{total} = L_{rec} + \lambda_1 \sum_{w \in \theta} |w| + \lambda_2 \sum_{w \in \theta} w^2$, where $L_{total}$ is the total loss after adding regularization, $\theta$ is the set of model parameters, $w$ is any weight parameter, $\lambda_1$ is the L1 regularization coefficient, and $\lambda_2$ is the L2 regularization coefficient; an incremental strategy that starts with a small regularization strength and gradually increases it can be used to avoid instability in the early stages of training; For the parameter matrix $W$ to be pruned, compute the importance $s_{ij} = |w_{ij}|$, where $W$ is the weight matrix of the layer to be pruned, $w_{ij}$ is the weight in the $i$-th row and $j$-th column, and $s_{ij}$ is the importance measure of that weight; a threshold $\tau$ determines the pruning intensity: connections with $s_{ij} < \tau$ are pruned directly; Construct the mask matrix $m_{ij} = \mathbb{1}(s_{ij} \ge \tau)$ and perform pruning $W' = M \odot W$, where $M = [m_{ij}]$ is the binary mask, $\mathbb{1}(\cdot)$ is the indicator function, $\tau$ is the pruning threshold, $\odot$ denotes element-wise multiplication, and $W'$ is the weight matrix after pruning; the pruned weights are fixed to 0 to form a sparse connection structure. Optionally, node-level pruning further reduces the number of neurons: for hidden unit $k$, a node importance score $I_k$ is computed from the weights flowing into the neuron and the weights flowing out of it; when $I_k < \tau_n$, the hidden unit and its incoming and outgoing connections are deleted and the dimensions of the related matrices are updated accordingly, where $I_k$ is the importance score of the $k$-th neuron and $\tau_n$ is the neuron pruning threshold; Fine-tuning is performed with the pruned parameters as initial values, with the retention indicator $m_{ij}$ controlling the updates: $w_{ij}' = w_{ij} - \eta\, m_{ij} \frac{\partial L}{\partial w_{ij}}$, where $w_{ij}$ and $w_{ij}'$ are the weights before and after the update, $\eta$ is the learning rate, $\frac{\partial L}{\partial w_{ij}}$ is the gradient of the loss with respect to the weight, and $m_{ij}$ indicates whether the connection is retained; fine-tuning can stop when the validation-set reconstruction error recovers to within a preset proportion of the error before pruning, or when it shows no significant decrease for several consecutive rounds; Compute the online error of M1 on normal data, $e_t^{(1)} = \|x_t - \hat{x}_t^{(1)}\|^2$, where $e_t^{(1)}$ is the reconstruction error of M1 at time $t$ and $\delta_1$ is the anomaly threshold of M1, determined from the normal error distribution; Export M1 and $\delta_1$.
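The mask construction and mask-controlled fine-tuning update described for M1 can be sketched as follows. This is a minimal illustration in plain Python, assuming the weight magnitude $|w_{ij}|$ as the importance measure; the function names and the example values are hypothetical and not taken from the patent.

```python
def magnitude_mask(W, tau):
    """Binary mask: 1 keeps a connection, 0 prunes it (|w| below tau)."""
    return [[1 if abs(w) >= tau else 0 for w in row] for row in W]

def apply_mask(W, M):
    """Element-wise product W' = M * W; pruned weights become zero."""
    return [[w * m for w, m in zip(w_row, m_row)] for w_row, m_row in zip(W, M)]

def masked_update(W, grad, M, lr):
    """Fine-tuning step in which the mask blocks updates to pruned weights."""
    return [
        [w - lr * m * g for w, g, m in zip(w_row, g_row, m_row)]
        for w_row, g_row, m_row in zip(W, grad, M)
    ]

# Hypothetical 2x2 weight matrix and pruning threshold.
W = [[0.8, -0.05], [0.02, -0.6]]
M = magnitude_mask(W, tau=0.1)
W_pruned = apply_mask(W, M)

# One fine-tuning step with an illustrative gradient; pruned entries stay at zero.
grad = [[0.1, 0.1], [0.1, 0.1]]
W_finetuned = masked_update(W_pruned, grad, M, lr=0.5)
```

Because the mask multiplies the gradient, a pruned connection cannot regrow during fine-tuning, which keeps the sparse structure fixed.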
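[0010] Furthermore, the medium-sized lightweight model M2 is constructed as follows: Select a normal dataset $D_2$ that covers a wider range of operating conditions and generate samples with a sliding window; Introduce a scaling vector $\gamma$ at the hidden state output: $\tilde{h}_t = \gamma \odot h_t$ with $\gamma = [\gamma_1, \dots, \gamma_H]$, where $h_t$ is the hidden state vector of the LSTM at time $t$, $\tilde{h}_t$ is the scaled hidden state, $\gamma$ is the vector of scaling factors, $\gamma_k$ is the importance parameter of the $k$-th hidden unit, and $H$ is the hidden state dimension; Train with an objective function that drives some $\gamma_k$ toward 0: $L_{prune} = L_{rec} + \lambda_w \|\theta\|^2 + \lambda_\gamma \sum_{k=1}^{H} |\gamma_k|$, where $L_{prune}$ is the pruning training objective, $\lambda_w$ is the weight decay coefficient, and $\lambda_\gamma$ is the sparsity strength applied to the scaling factors; the closer $\gamma_k$ is to 0, the smaller the contribution of the corresponding hidden unit; training starts with a small $\lambda_\gamma$ to keep the reconstruction stable, and $\lambda_\gamma$ is then increased to promote sparsity; To determine the pruning set, collect all $|\gamma_k|$, sort them, and set a threshold $\tau_\gamma$ to obtain $S_{keep} = \{k : |\gamma_k| \ge \tau_\gamma\}$ and $S_{prune} = \{k : |\gamma_k| < \tau_\gamma\}$, where $S_{keep}$ is the set of hidden units to preserve, $S_{prune}$ is the set of hidden units to remove, and $\tau_\gamma$ is the pruning threshold; Perform structured pruning and reconstruction: for all $k \in S_{prune}$, delete the corresponding hidden dimension, so that the hidden dimension becomes $H' = |S_{keep}|$; The gate computations are $i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$, $f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$, $o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$, $g_t = \tanh(W_g x_t + U_g h_{t-1} + b_g)$, where $i_t$, $f_t$, $o_t$, and $g_t$ are the input gate, forget gate, output gate, and candidate gate, $x_t$ is the input, $h_{t-1}$ is the hidden state at the previous time step, and $b$ is the bias; after pruning, the parameters of each gate are updated to $W' \in \mathbb{R}^{H' \times d}$, $U' \in \mathbb{R}^{H' \times H'}$, $b' \in \mathbb{R}^{H'}$, where $d$ is the input dimension, $H'$ is the hidden state dimension after pruning, and $W'$, $U'$ are the weight matrices after pruning; Fine-tune the pruned model until the validation-set error stabilizes, then compute on normal data $e_t^{(2)} = \|x_t - \hat{x}_t^{(2)}\|^2$, where $e_t^{(2)}$ is the reconstruction error of M2; the anomaly threshold $\delta_2$ of M2 is determined from the error distribution; Export M2 and $\delta_2$.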
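The structured pruning step for M2 amounts to selecting the hidden units whose scaling factor survives the threshold and slicing every gate's parameters down to those units. A minimal sketch in plain Python, under the assumption that each gate stores an input matrix `W` (H x d), a recurrent matrix `U` (H x H), and a bias `b` (length H); the helper names and numbers are hypothetical.

```python
def keep_indices(gamma, tau):
    """Indices of hidden units whose scaling factor magnitude reaches tau."""
    return [k for k, g in enumerate(gamma) if abs(g) >= tau]

def prune_gate(W, U, b, keep):
    """Slice one gate's parameters down to the retained hidden units.

    Rows of W, rows and columns of U, and entries of b that belong to
    removed units are dropped, so the hidden size becomes len(keep).
    """
    W_kept = [W[k] for k in keep]
    U_kept = [[U[k][j] for j in keep] for k in keep]
    b_kept = [b[k] for k in keep]
    return W_kept, U_kept, b_kept

# Hypothetical gate with H = 3 hidden units and d = 2 inputs.
gamma = [0.9, 0.001, 0.4]
W = [[1, 2], [3, 4], [5, 6]]
U = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [0.1, 0.2, 0.3]

keep = keep_indices(gamma, tau=0.01)
W_kept, U_kept, b_kept = prune_gate(W, U, b, keep)
```

The same `keep` list would be applied to all four gates (input, forget, output, candidate), since they share the hidden dimension.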
[0011] Furthermore, the largest lightweight model M3 is constructed as follows: Build a more comprehensive dataset $D_3$ covering different working conditions; Train the teacher model until it converges to its optimum on the validation set, and obtain the teacher's reconstructed output $\hat{x}_t^{tea}$; Design a student model M3 whose structure is smaller than the teacher model but larger than M1 and M2; for the same input $x_t$, compute the student reconstruction $\hat{x}_t^{stu}$ and read the teacher's output $\hat{x}_t^{tea}$; Construct the distillation objective and train the student model. The student reconstruction loss is $L_S = \frac{1}{T}\sum_{t=1}^{T} \|x_t - \hat{x}_t^{stu}\|^2$, where $L_S$ is the loss of the student reconstructing the real input and $\hat{x}_t^{stu}$ is the student's reconstructed output; To bring the student model closer to the teacher model, the distillation loss is computed as $L_{KD} = \frac{1}{T}\sum_{t=1}^{T} \|\hat{x}_t^{stu} - \hat{x}_t^{tea}\|^2$, where $L_{KD}$ is the distillation loss and $\hat{x}_t^{tea}$ is the teacher's reconstructed output; The comprehensive objective function $L_{total}$ for training the student model is then a weighted combination of $L_S$ and $L_{KD}$ with weighting factor $\alpha$, plus an L2 regularization term $\lambda \sum_{w \in \theta_S} w^2$, where $L_{total}$ is the overall student training objective, $\theta_S$ is the student parameter set, $\lambda$ is the regularization coefficient, and $w$ is any weight in the student network; the validation set can be used to adjust $\alpha$ and balance fitting the input against approximating the teacher model. If the discrimination process involves unnormalized outputs, temperature and KL terms can be introduced: $p^{tea} = \mathrm{softmax}(z^{tea} / T_{kd})$, $p^{stu} = \mathrm{softmax}(z^{stu} / T_{kd})$, where $z^{tea}$ and $z^{stu}$ are the raw outputs of the teacher model and the student model, $T_{kd}$ is the distillation temperature, and $p^{tea}$, $p^{stu}$ are soft probability distributions; $L_{KL} = \sum_{c} p_c^{tea} \log \frac{p_c^{tea}}{p_c^{stu}}$, where $c$ indexes the categories; in this case $L_{KL}$ can be added to $L_{total}$ as an additional term; After distillation converges, the student may be fine-tuned using the reconstruction loss $L_S$ alone, and the reconstruction error is then computed as $e_t^{(3)} = \|x_t - \hat{x}_t^{(3)}\|^2$, where $e_t^{(3)}$ is the reconstruction error of M3 and $\delta_3$ is the anomaly threshold of M3; Determine $\delta_3$ and export M3 and $\delta_3$.
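To make the student training objective concrete, the sketch below computes the student reconstruction loss, the distillation loss against the teacher output, and a combined objective. The convex combination `alpha * L_S + (1 - alpha) * L_KD` is an assumption chosen for illustration; the text above only states that a weighting factor balances the two terms. All names and numbers are hypothetical.

```python
def seq_mse(a, b):
    """Mean over time steps of the squared distance between two sequences."""
    total = 0.0
    for a_t, b_t in zip(a, b):
        total += sum((u - v) ** 2 for u, v in zip(a_t, b_t))
    return total / len(a)

def student_objective(x, student_out, teacher_out, weights, alpha, lam):
    """Assumed form: alpha * L_S + (1 - alpha) * L_KD + lam * sum(w^2)."""
    loss_rec = seq_mse(x, student_out)           # student vs. real input
    loss_kd = seq_mse(student_out, teacher_out)  # student vs. teacher
    l2 = sum(w * w for w in weights)
    return alpha * loss_rec + (1.0 - alpha) * loss_kd + lam * l2

# One-step sequence with two sensors; values are illustrative only.
x = [[1.0, 0.0]]
student_out = [[0.5, 0.0]]
teacher_out = [[0.75, 0.0]]
student_weights = [1.0, 2.0]

loss = student_objective(x, student_out, teacher_out, student_weights,
                         alpha=0.5, lam=0.01)
```

Raising `alpha` makes the student fit the raw input more closely, while lowering it pulls the student toward the teacher's reconstruction.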
[0012] Furthermore, the custom invocation strategy is defined as follows: Compute the margin $d_t$ of the reconstruction error $e_t$ relative to the threshold $\delta_i$ of the current model Mi, then map the margin to a confidence level $c_t$ through a mapping with slope coefficient $\kappa$, where $d_t$ is the error margin, $c_t$ is the confidence of a single detection, and $\kappa$ is the mapping slope coefficient; the smaller $d_t$ is, the closer the error is to the threshold and the more uncertain the judgment: when $e_t$ is near $\delta_i$, $d_t$ is small and the confidence $c_t$ is relatively low; when $e_t$ is far from the threshold, $d_t$ is larger and the confidence is higher. When an alarm or suspected fault is later confirmed through maintenance and its true label $y$ becomes available, the model output $\hat{y}_i$ for that sample is compared with the true label and the recent accuracy metric is updated with an exponential moving average: $A_i \leftarrow (1 - \rho) A_i + \rho\, \mathbb{1}(\hat{y}_i = y)$, where $A_i$ is the recent accuracy estimate of Mi, $\rho$ is the moving average update coefficient, $\mathbb{1}(\cdot)$ is the indicator function, $y$ is the true label, and $\hat{y}_i$ is the output of Mi; The overall cost of each candidate invocation action, from the current level $i$ to a level $j \in \{\text{M1}, \text{M2}, \text{M3}\}$, is quantified and the optimal invocation path is selected on that basis: $J_t(i \to j) = w_1 C_{net} + w_2 C_{time} + w_3 C_{risk} + w_4 C_{comp} + w_5 C_{elec}$, where $J_t(i \to j)$ is the comprehensive evaluation index at time $t$ for invoking level $j$ from the current level $i$; $w_1, \dots, w_5$ are the weights of the cost items; and $C_{net}$, $C_{time}$, $C_{risk}$, $C_{comp}$, $C_{elec}$ are the network usage cost, time cost, risk cost, computing power cost, and electricity usage cost. The conditions for switching from M1 to M2 include: the current sample is judged to be uncertain, that is, $c_t < c_{th}$, where $c_{th}$ is the confidence threshold; the deviation between the reconstruction error and the threshold fluctuates strongly within a preset time period; or the misjudgment rate of M1 increases, its accuracy decreases, or the error distribution drifts within the preset time window; The final decision on whether to upgrade to model M2 is made with the comprehensive evaluation index: $J_t(1 \to 2) < J_t(1 \to 1)$, where $J_t(1 \to 2)$ is the comprehensive evaluation index of invoking M2 from M1 and $J_t(1 \to 1)$ is that of continuing to use M1; the transition from M1 to M2 is triggered when this condition is met; The conditions for switching from M1 to M3 include: the network conditions are excellent, that is, the back end is available; the latency is too high, that is, front-end review cannot meet the timeliness requirements; or the front-end equipment is operating at high risk, that is, continuing to monitor at the front end would increase the probability of equipment failure or the consequences of failure; In these situations, the final decision on whether to call the back end directly is made with the comprehensive evaluation index: $J_t(1 \to 3) < J_t(1 \to 1)$, where $J_t(1 \to 3)$ is the comprehensive evaluation index of invoking M3 from M1 and $J_t(1 \to 1)$ is that of continuing to use M1; the transition from M1 to M3, that is, from the front end to the back end, is triggered when this condition is met. The conditions for switching from M2 to M3 include: the confidence is still insufficient after review, that is, $c_t < c_{th}$; the accuracy still decreases after review; the operating risk of model M2 deployed at the front end increases, that is, continued front-end monitoring would lengthen device usage time and thereby increase the probability of device failure or its consequences; the computing power cost of M2 is too high; the network conditions are excellent; or the front-end energy consumption cost exceeds the local energy budget; When these situations occur, the final decision on whether to call the M3 model on the back end is made with the comprehensive evaluation index: $J_t(2 \to 3) < J_t(2 \to 2)$, where $J_t(2 \to 3)$ is the comprehensive evaluation index of invoking M3 from M2 and $J_t(2 \to 2)$ is that of continuing to use M2; when the condition is met, the transition M2→M3, from the front end to the back end, is triggered.
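The two bookkeeping pieces of the invocation strategy, the moving-average accuracy update and the weighted-cost comparison between candidate levels, can be sketched as follows. This is an illustrative sketch only: the update convention (the coefficient multiplies the new observation), the dictionary layout, and every number are assumptions rather than values from the patent.

```python
def ema_accuracy(acc, predicted, actual, rho):
    """Exponential moving average of correctness; rho weights the new sample."""
    hit = 1.0 if predicted == actual else 0.0
    return (1.0 - rho) * acc + rho * hit

def total_cost(costs, weights):
    """Weighted sum of the five cost items for one candidate action."""
    return sum(weights[name] * costs[name] for name in weights)

def choose_level(candidates, weights):
    """Pick the candidate level with the lowest comprehensive cost."""
    return min(candidates, key=lambda level: total_cost(candidates[level], weights))

# A confirmed misjudgment lowers the recent accuracy estimate of the model.
acc_m1 = ema_accuracy(0.9, predicted="normal", actual="fault", rho=0.1)

# Hypothetical weights and per-level cost estimates at one time step.
weights = {"net": 1.0, "time": 1.0, "risk": 2.0, "compute": 1.0, "energy": 1.0}
candidates = {
    "M1": {"net": 0.0, "time": 0.1, "risk": 0.5, "compute": 0.1, "energy": 0.1},
    "M2": {"net": 0.0, "time": 0.2, "risk": 0.2, "compute": 0.2, "energy": 0.2},
    "M3": {"net": 0.5, "time": 0.6, "risk": 0.05, "compute": 0.0, "energy": 0.3},
}
best = choose_level(candidates, weights)
```

With these illustrative numbers the risk term dominates for M1 and the network and latency terms dominate for M3, so the medium model M2 has the lowest overall cost.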
[0013] Furthermore, the network occupancy cost $C_{net}$ is computed as $C_{net} = a_{up} D_{up} + a_{down} D_{down}$, where $D_{up}$ is the amount of data to be sent, $D_{down}$ is the amount of data to be returned, and $a_{up}$, $a_{down}$ are the link occupancy factors; for cross-layer calls $a_{up}$ and $a_{down}$ usually increase significantly, and for local front-end calls they are approximately zero or can be ignored; The time cost $C_{time}$ is computed as $C_{time} = T_{inf}^{(k)} + T_{sched} + \mathbb{1}(\text{cross-layer})\, T_{rtt}$, where $T_{inf}^{(k)}$ is the inference latency of calling model Mk, $T_{sched}$ is the scheduling delay, $T_{rtt}$ is the network round-trip latency, and $\mathbb{1}(\cdot)$ is the indicator function; The usage time $T_{use}$ is introduced as the time elapsed from initiating an action to obtaining its result, and the time cost $C_{time}$ serves as its estimate; The computing power cost $C_{comp}$ is computed as $C_{comp} = b_{cpu} U_{cpu}^{(k)} + b_{gpu} U_{gpu}^{(k)} + b_{acc} U_{acc}^{(k)}$, where $U_{cpu}^{(k)}$, $U_{gpu}^{(k)}$, $U_{acc}^{(k)}$ are the CPU, GPU, and accelerator usage of the called model Mk and $b_{cpu}$, $b_{gpu}$, $b_{acc}$ are the resource conversion factors; The electricity usage cost $C_{elec}$ is computed as $C_{elec} = E_{inf}^{(k)} + \mathbb{1}(\text{cross-layer})\, E_{tx}$, where $E_{inf}^{(k)}$ is the inference energy consumption of Mk, $E_{tx}$ is the energy consumption of data transfer during cross-layer calls, and $\mathbb{1}(\cdot)$ is the indicator function; The operating risk cost $C_{risk}$ is computed from the following quantities: the status of the front-end device or operating condition, the consequence and risk level of a front-end equipment malfunction, the probability of a front-end failure occurring within the specified time period, a weight on the consequences of misjudgment, and the estimated probability of misjudgment of model Mk within the window; The network availability metric is computed from the available bandwidth $B$, the round-trip time $RTT$, and the packet loss rate $p_{loss}$; the back-end availability indicator is $A_{net} = \mathbb{1}(B \ge B_{\min}) \cdot \mathbb{1}(RTT \le RTT_{\max}) \cdot \mathbb{1}(p_{loss} \le p_{\max})$, where $A_{net}$ indicates whether back-end calls are available, $\mathbb{1}(\cdot)$ is the indicator function, $B_{\min}$ is the lower limit of bandwidth, $RTT_{\max}$ is the upper limit of latency, and $p_{\max}$ is the maximum packet loss rate.
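The cost items that depend on whether a call crosses from the front end to the back end can be sketched with a few small functions. The additive forms, the conjunction of three limits for availability, and all units and numbers below are assumptions for illustration; the patent text names the quantities but its formulas did not survive extraction.

```python
def backend_available(bandwidth, rtt, loss, b_min, rtt_max, loss_max):
    """1 if bandwidth, round-trip time, and packet loss all satisfy their limits."""
    ok = bandwidth >= b_min and rtt <= rtt_max and loss <= loss_max
    return 1 if ok else 0

def network_cost(d_up, d_down, a_up, a_down):
    """Link occupancy: data sent and data returned, each with its own factor."""
    return a_up * d_up + a_down * d_down

def time_cost(t_infer, t_sched, t_rtt, cross_layer):
    """Inference plus scheduling delay; round-trip time only for cross-layer calls."""
    return t_infer + t_sched + (t_rtt if cross_layer else 0.0)

def energy_cost(e_infer, e_transfer, cross_layer):
    """Inference energy; transfer energy only for cross-layer calls."""
    return e_infer + (e_transfer if cross_layer else 0.0)

# Illustrative link measurements (Mbit/s, ms, fraction) against assumed limits.
good_link = backend_available(50.0, 20.0, 0.001, b_min=10.0, rtt_max=100.0, loss_max=0.01)
poor_link = backend_available(5.0, 20.0, 0.001, b_min=10.0, rtt_max=100.0, loss_max=0.01)

# Local M1 call versus a cross-layer M3 call, in seconds.
local_time = time_cost(0.004, 0.001, 0.030, cross_layer=False)
remote_time = time_cost(0.010, 0.001, 0.030, cross_layer=True)
```

A scheduler would only consider the back-end model as a candidate when the availability indicator is 1, and would then feed these cost items into the weighted comparison.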
[0014] Due to the adoption of the above technical solutions, this application has the following beneficial effects: This application utilizes an LSTM encoder-decoder model to model the time-series data of multiple sensors in an integrated energy station, identifying anomalies through reconstruction errors. Simultaneously, it introduces various lightweight techniques such as model pruning to reduce model complexity and achieve optimal performance while ensuring detection accuracy. Using this method, intelligent monitoring of the operating status of integrated hydrogen-electricity-oil-gas energy stations can be achieved, quickly detecting potential safety hazards such as hydrogen leaks and equipment failures, improving early warning response speed and reliability, ensuring the safe and stable operation of the energy station, and enabling efficient operation by lightweighting the model while maintaining monitoring accuracy.
[0015] Other advantages, objectives, and features of the invention will be set forth in part in the description which follows, and in part will be apparent to those skilled in the art from the following examination, or may be learned from practice of the invention. The objectives and other advantages of the invention can be realized and obtained through the following description.
Attached Figure Description
[0016] The accompanying drawings of this invention are described below.
[0017] Figure 1 is a schematic flow diagram of the method of the present invention.
Detailed Implementation
[0018] The present invention will be further described below with reference to the accompanying drawings and embodiments.
[0019] A safety monitoring method for integrated hydrogen-electric-oil-gas energy stations based on lightweight deep learning, as shown in Figure 1, comprises the following specific steps:
S1. Data Acquisition and Preprocessing
Operational data is acquired from the subsystems of the integrated hydrogen-electricity-oil-gas energy station, including hydrogen pressure and concentration, oil and gas pipeline pressure and flow, power system voltage and current, and sensor signals such as temperature and equipment status. The collected multi-source time-series data is cleaned, the sensor data is normalized, and missing data is filled in.
[0020] The specific steps are as follows: Multiple types of sensors are deployed in the integrated hydrogen-electricity-oil-gas energy station to acquire key parameters during operation. For example, hydrogen concentration and pressure sensors are deployed in the hydrogen energy module, and pipeline pressure and valve opening sensors are installed in the oil and gas supply module to monitor oil and gas flow. Each sensor records readings over time, and the data is aggregated to form multidimensional time-series data. For example, the vector $x_t = [x_t^{(1)}, x_t^{(2)}, \dots, x_t^{(n)}]$ denotes the readings of the $n$ sensors collected at time $t$. The data acquisition system should ensure data synchronization and integrity to provide a reliable basis for subsequent analysis.
[0021] The acquired raw multi-source data sequences are preprocessed to improve data quality and consistency. First, sensor data of different dimensions are normalized. The data sequence of the $i$-th sensor can be normalized using min-max linear scaling: $\tilde{x}_t^{(i)} = \frac{x_t^{(i)} - x_{\min}^{(i)}}{x_{\max}^{(i)} - x_{\min}^{(i)}}$, where $x_{\min}^{(i)}$ and $x_{\max}^{(i)}$ are the minimum and maximum values of sensor $i$ in the training data. After this scaling, the numerical ranges of the different sensors are unified, which prevents parameters with large magnitudes from dominating model training.
[0022] Second, potential missing values and outliers in the dataset are handled. For example, if a sensor records no valid reading at time $t$, linear interpolation from adjacent time points can be used to estimate the missing value: $x_t^{(i)} = \frac{x_{t-1}^{(i)} + x_{t+1}^{(i)}}{2}$, that is, the missing value at time $t$ is taken as the average of the values at the neighboring sampling times. When values are missing for several consecutive sampling periods, methods such as forward filling or mean filling can be used. Furthermore, outliers that significantly exceed the normal range can be removed or replaced with neighboring values to reduce the interference of extreme anomalies on model training.
[0023] After the above processing, a clean and uniformly scaled time series is obtained and used as input for model training.
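The two preprocessing rules above, single-point gap filling by the neighbor average and min-max scaling with training-set bounds, can be sketched in a few lines of plain Python. The function names and sample readings are hypothetical; consecutive gaps are deliberately left untouched here, since the text handles them with other methods such as forward filling.

```python
from typing import List, Optional

def fill_single_gaps(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace an isolated missing reading with the mean of its two neighbors."""
    filled = list(values)
    for t in range(1, len(values) - 1):
        prev_v, next_v = values[t - 1], values[t + 1]
        if values[t] is None and prev_v is not None and next_v is not None:
            filled[t] = (prev_v + next_v) / 2
    return filled

def min_max_scale(values: List[float], lo: float, hi: float) -> List[float]:
    """Scale readings to [0, 1] using the training-set minimum and maximum."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant sensor: avoid division by zero
    return [(v - lo) / span for v in values]

# Hypothetical pressure readings with one missing sample.
raw = [2.0, None, 4.0, 6.0]
filled = fill_single_gaps(raw)
scaled = min_max_scale(filled, lo=2.0, hi=6.0)
```

The bounds `lo` and `hi` come from the training data, so online readings outside that range map outside [0, 1], which is itself a useful hint of unusual operation.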
[0024] S2. Model Building and Training
An LSTM encoder-decoder model is established to model and learn the preprocessed multivariate time-series data. Training uses only historical data from normal operating conditions, so that the model learns the time-series patterns of the integrated energy station under normal conditions. Anomalies are then detected through the reconstruction error. The model parameters are trained by minimizing the difference between the input and the output.
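Detection by reconstruction error reduces to two steps: measure how far a reconstructed window is from the observed one, and compare that error with a threshold derived from errors seen on normal data. The sketch below assumes a nearest-rank quantile as the threshold rule, which is one common choice and not something the text specifies; the names and numbers are hypothetical.

```python
import math

def reconstruction_error(x, x_hat):
    """Mean over time steps of the squared distance between input and output."""
    total = 0.0
    for x_t, xh_t in zip(x, x_hat):
        total += sum((a - b) ** 2 for a, b in zip(x_t, xh_t))
    return total / len(x)

def quantile_threshold(normal_errors, q=0.99):
    """Nearest-rank empirical quantile of errors observed on normal data."""
    ordered = sorted(normal_errors)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

# Errors collected from normal operation (illustrative values).
normal_errors = [0.01, 0.02, 0.015, 0.03, 0.012]
threshold = quantile_threshold(normal_errors, q=0.8)

# A two-step window with two sensors and a poor reconstruction of step 2.
window = [[0.5, 0.5], [0.6, 0.4]]
reconstructed = [[0.5, 0.5], [0.2, 0.8]]
error = reconstruction_error(window, reconstructed)
is_anomaly = error > threshold
```

Because the model is trained on normal data only, a window it cannot reconstruct well is treated as a candidate anomaly.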
[0025] The specific steps are as follows: A Long Short-Term Memory (LSTM) neural network is used to construct an encoder-decoder model that learns the normal patterns of multi-sensor time-series data from an integrated energy station. The model consists of two parts: an encoder LSTM and a decoder LSTM. The encoder receives an input sequence of length $T$, updates its hidden state step by step, and finally compresses the information of the entire sequence into a hidden vector of fixed dimension. The decoder uses the encoder's hidden vector as its initial condition and generates an output sequence of the same length as the input, with the aim of reconstructing the original sequence. Compared with traditional autoencoders, LSTM encoder-decoders can handle variable-length sequences and use the gating mechanism of the LSTM to capture long-term dependencies, which makes them suitable for analyzing the time-series operating data of energy stations.
[0026] The encoder-decoder model constructed above is applied to the training data, and the model parameters are adjusted so that the decoder output sequence reconstructs the original input sequence as accurately as possible. Specifically, the input sequence of a training sample is first defined as $X = \{x_1, x_2, \dots, x_T\}$, where $T$ is the sequence length and $x_t$ is the input feature vector at the $t$-th time step. The encoder uses an LSTM to encode the input sequence step by step and obtain the hidden state sequence, with the state update $h_t^{e} = f_e(h_{t-1}^{e}, x_t; \theta_e)$, where $h_t^{e}$ is the hidden state of the encoder at step $t$, $f_e$ is the state update function of the encoder LSTM, and $\theta_e$ is the encoder parameter set. After $T$ encoding steps, the final hidden representation $c = h_T^{e}$ is obtained. The decoder takes the context vector $c$ as its initial state and uses an LSTM to generate the output sequence $\hat{X} = \{\hat{x}_1, \dots, \hat{x}_T\}$ step by step. The recursion of the decoder's hidden states is $h_t^{d} = f_d(h_{t-1}^{d}, \hat{x}_{t-1}; \theta_d)$, where $h_t^{d}$ is the hidden state of the decoder at step $t$, $f_d$ is the state update function of the decoder LSTM, and $\theta_d$ is its parameter set; the decoder's initial hidden state is set to $h_0^{d} = c$, generated by the encoder, and at $t = 1$ a preset initial input $\hat{x}_0$ is used. The output mapping function maps the hidden state to the reconstructed output of the decoder, $\hat{x}_t = g(h_t^{d})$, where $g$ is the decoder's output mapping function, which converts the current hidden state $h_t^{d}$ into the output $\hat{x}_t$. The set of all trainable parameters of the model is denoted $\theta$. During training, the parameter set $\theta$ must be determined so as to minimize the difference between the decoder output sequence and the original input sequence. The training objective function is therefore defined as the reconstruction error. Mean squared error is used to measure the reconstruction error of the output sequence $\hat{X}$ relative to the input sequence $X$: $L_{rec} = \frac{1}{T}\sum_{t=1}^{T} \|x_t - \hat{x}_t\|^2$, where $\|x_t - \hat{x}_t\|^2$ is the squared error between the input and output at the $t$-th time step. For a training set containing $N$ sequence samples, this error is computed for each sample and averaged to evaluate the overall reconstruction error. To improve the model's generalization ability and prevent overfitting, an L2 regularization term is introduced into the loss function to penalize the sum of squares of the model parameters. The overall loss function after introducing the regularization term is $L = L_{rec} + \lambda \|\theta\|^2$, where the first term $L_{rec}$ is the reconstruction error and the second term is the L2 regularization term ($\|\theta\|^2$ is the sum of squares of all parameters in the model). $\lambda$ is the regularization coefficient, which controls the influence of the regularization term. In summary, the loss function $L$ measures both the error of the reconstructed sequence and a penalty on model complexity.
[0027] Next, the model parameters are optimized by minimizing this loss function. Gradient descent is used to update the parameters iteratively along the negative gradient direction until convergence or until the preset number of iterations is reached. Specifically, the gradient descent update rule for the model parameters is $\theta^{(k+1)} = \theta^{(k)} - \eta\left(\frac{\partial L_{rec}}{\partial \theta} + 2\lambda\theta^{(k)}\right)$, where $\theta$ denotes any trainable parameter in the model (for example, a weight coefficient in the encoder or decoder), $\theta^{(k)}$ is the value of this parameter at the $k$-th iteration, and $\eta$ is the learning rate. Because the loss function $L$ includes the L2 regularization term, the gradient with respect to $\theta$ adds $2\lambda\theta$ to the reconstruction error gradient, so the term in parentheses is the gradient of the loss function $L$ with respect to $\theta$. All parameters are updated iteratively until the loss function drops to a predetermined threshold or training reaches a set number of epochs, at which point parameter training is complete.
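The update rule can be checked on a deliberately tiny stand-in for the encoder-decoder: a single scalar parameter `w` that reconstructs each reading as `w * x`. This toy model is an assumption made purely so the gradient can be written by hand; it is not the LSTM model of the patent, but the update has exactly the form "reconstruction gradient plus $2\lambda\theta$".

```python
def fit_scale(xs, lr=0.1, lam=0.01, max_iters=1000, tol=1e-10):
    """Gradient descent on L(w) = mean((x - w*x)^2) + lam * w^2."""
    w = 0.0
    mean_sq = sum(x * x for x in xs) / len(xs)
    for _ in range(max_iters):
        grad_rec = -2.0 * (1.0 - w) * mean_sq   # d L_rec / d w
        step = lr * (grad_rec + 2.0 * lam * w)  # reconstruction + L2 gradient
        w -= step
        if abs(step) < tol:                     # convergence check
            break
    return w

# Illustrative normalized readings.
readings = [0.2, 0.5, 0.9]
w_fit = fit_scale(readings)

# Closed-form minimizer of the same loss, for comparison.
mean_sq = sum(x * x for x in readings) / len(readings)
w_closed_form = mean_sq / (mean_sq + 0.01)
```

The fitted value lands slightly below 1 because the L2 penalty shrinks the parameter, which is the complexity-versus-error trade-off the loss function encodes.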
[0028] This invention performs pruning and lightweighting after model training convergence to reduce the number of model parameters and computational cost, and constructs three pruning models of different scales, corresponding to different pruning strategies and different numbers of neurons retained: Model M1 is the smallest model, employing L1 regularization pruning with the largest pruning ratio, and is primarily used for routine online diagnosis. Model M2 is a medium-sized model, employing Network Slimming pruning with a moderate pruning ratio, and is used for secondary judgment accuracy. Model M3 is a larger model, using a student model obtained through knowledge distillation, and is used for further verification of high-risk samples.
[0029] S3. Lightweight pruning and model library construction. After the initial training of the model converges, this invention balances detection accuracy and computational efficiency by constructing three lightweight models of different sizes using three differentiated pruning strategies. Based on a preset importance metric, the impact of each weight parameter or hidden unit on model performance is calculated and the less important components are pruned, yielding multiple lightweight models with different compression ratios. After pruning, each model is fine-tuned so that computational overhead and storage usage are reduced while detection accuracy is maintained, ultimately producing the lightweight model library.
[0030] The specific steps are as follows: S31. Minimal model M1 based on the L1-regularized pruning strategy. Collect normal operating condition data to form the training set for M1. Normalize each variable and impute missing values in the same way as in online operation; use a sliding window to generate the sample sequences; then divide them into training and validation subsets, which are used to determine whether performance before and after pruning meets the requirements.
[0031] The LSTM encoder-decoder is trained by minimizing the reconstruction error on the training samples while the error is monitored on the validation set. Once convergence or a preset number of iterations is reached, the parameters and reconstruction error statistics are obtained.
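[0032] Training then continues from these model parameters with L1 and L2 regularization introduced: $L_{\mathrm{total}} = L_{\mathrm{rec}} + \lambda_1 \sum_{w \in \Theta} \lvert w \rvert + \lambda_2 \sum_{w \in \Theta} w^2$; where $L_{\mathrm{total}}$ is the total loss after adding regularization, $L_{\mathrm{rec}}$ is the reconstruction error, $\Theta$ is the set of model parameters, $w$ is any weight parameter, $\lambda_1$ is the L1 regularization coefficient, and $\lambda_2$ is the L2 regularization coefficient. The L1 coefficient $\lambda_1$ can follow an incremental schedule, starting small and gradually increasing, to avoid instability in the early stages of training.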
[0033] For the parameter matrix $W$ to be pruned, calculate the importance: $I_{ij} = \lvert W_{ij} \rvert$; where $W$ is the weight matrix of the layer to be pruned, $W_{ij}$ is the weight in the $i$-th row and $j$-th column, and $I_{ij}$ is the importance measure of that weight. A threshold $\tau$ determines the pruning intensity: connections with $I_{ij} < \tau$ are pruned directly.
[0034] Construct the mask matrix: $M_{ij} = \mathbb{1}(\lvert W_{ij} \rvert \ge \tau)$; and perform pruning: $\tilde{W} = M \odot W$; where $M$ is a binary mask, $\mathbb{1}(\cdot)$ is the indicator function (it takes a value of 1 if the condition is true, and 0 otherwise), $\tau$ is the pruning threshold, $\odot$ indicates element-wise multiplication, and $\tilde{W}$ is the weight matrix after pruning. The pruned weights are fixed to 0 to form a sparse connection structure.
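As an illustrative sketch only, the magnitude-based mask construction and element-wise pruning described above may be written as follows. The function name `prune_by_magnitude` is hypothetical, and treating a weight whose magnitude equals the threshold as retained is an assumption of this sketch.

```python
def prune_by_magnitude(weights, threshold):
    """Build a binary mask from weight magnitudes and apply it element-wise.

    weights   : weight matrix as a list of rows
    threshold : pruning threshold; weights with magnitude below it are zeroed
    Returns (mask, pruned_weights).
    """
    mask = [[1 if abs(w) >= threshold else 0 for w in row] for row in weights]
    pruned = [[w if m else 0.0 for w, m in zip(w_row, m_row)]
              for w_row, m_row in zip(weights, mask)]
    return mask, pruned
```

The returned mask can be stored alongside the pruned weights so that later fine-tuning keeps the removed connections at zero.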
[0035] Optionally, node-level pruning further reduces the number of neurons. The node importance of hidden unit $k$ is calculated as: $S_k = \sum_i \lvert W^{\mathrm{in}}_{ik} \rvert + \sum_j \lvert W^{\mathrm{out}}_{kj} \rvert$; when $S_k < \tau_{\mathrm{n}}$ is satisfied, the hidden unit and its incoming and outgoing connections are deleted, and the dimensions of the related matrices are updated accordingly. Here $S_k$ is the importance score of the $k$-th neuron, $W^{\mathrm{in}}$ denotes the weights flowing into this neuron, $W^{\mathrm{out}}$ denotes the weights flowing out of this neuron, and $\tau_{\mathrm{n}}$ is the neuron pruning threshold.
[0036] Pruning inevitably affects the model's reconstruction ability, so the model is fine-tuned after pruning to restore accuracy. Fine-tuning starts from the pruned parameters, and a retention indicator $m$ controls the updates: $w' = w - \eta \, m \, \frac{\partial L}{\partial w}$; where $w$ and $w'$ are the weights before and after the update, respectively, $\eta$ is the learning rate, $\frac{\partial L}{\partial w}$ is the gradient of the loss with respect to the weight, and $m \in \{0, 1\}$ indicates whether the connection is retained (1) or removed (0). Fine-tuning can stop when the validation set reconstruction error recovers to within a preset percentage of the error before pruning, or when it shows no significant decrease for several consecutive rounds.
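A minimal sketch of the mask-controlled fine-tuning update is given below for illustration. The function name `masked_update` and the matrix-as-list-of-rows layout are hypothetical; the gradient values are assumed to be supplied by the training framework.

```python
def masked_update(weights, grads, mask, lr):
    """Gradient step in which pruned connections (mask value 0) receive no update.

    weights, grads, mask : matrices of identical shape, as lists of rows
    lr                   : learning rate
    """
    return [[w - lr * m * g for w, g, m in zip(w_row, g_row, m_row)]
            for w_row, g_row, m_row in zip(weights, grads, mask)]
```

Because a pruned weight starts at 0 and its update is multiplied by 0, it remains 0 throughout fine-tuning, which preserves the sparse structure.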
[0037] Calculate the online reconstruction error of M1 on normal data, where $e_1(t)$ is the reconstruction error of M1 at time $t$ and $\delta_1$ is the anomaly threshold for M1, determined from the distribution of the error on normal data. Export M1 and $\delta_1$.
[0038] S32. Pruned model M2 based on Network Slimming. Select a normal dataset that covers a wider range of operation, for example data under different loads, start-ups, shutdowns, and environmental fluctuations. Generate samples with a sliding window, and prepare a validation set for comparison before and after pruning.
[0039] Introduce a scaling vector $\gamma$ at the hidden state output: $\tilde{h}_t = \gamma \odot h_t$, $\gamma = (\gamma_1, \dots, \gamma_H)$; where $h_t$ is the hidden state vector of the LSTM at time $t$; $\tilde{h}_t$ is the scaled hidden state; $\gamma$ is the vector of scaling factors; $\gamma_k$ is the importance parameter of the $k$-th hidden unit; and $H$ is the hidden state dimension.
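[0040] Train with an objective function that drives some $\gamma_k$ toward 0: $L_{\mathrm{slim}} = L_{\mathrm{rec}} + \lambda_w \sum_{w \in \Theta} w^2 + \lambda_s \sum_{k=1}^{H} \lvert \gamma_k \rvert$; where $L_{\mathrm{slim}}$ is the pruning training objective; $L_{\mathrm{rec}}$ is the reconstruction error; $\lambda_w$ is the weight decay coefficient; and $\lambda_s$ is the sparsity strength applied to the scaling factors. The closer $\gamma_k$ is to 0, the smaller the contribution of the corresponding hidden unit. Training can proceed in two stages: first use a small $\lambda_s$ to keep the reconstruction stable, then increase $\lambda_s$ to promote sparsity.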
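[0041] To determine the pruning set, collect all $\lvert \gamma_k \rvert$, sort them, and set a threshold $\tau_\gamma$ to obtain: $\mathcal{K} = \{ k : \lvert \gamma_k \rvert \ge \tau_\gamma \}$, $\mathcal{R} = \{ k : \lvert \gamma_k \rvert < \tau_\gamma \}$; where $\mathcal{K}$ is the set of hidden units to keep, $\mathcal{R}$ is the set of hidden units to remove, and $\tau_\gamma$ is the pruning threshold.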
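For illustration, the scaling of the hidden state and the threshold-based split of hidden units into kept and removed sets may be sketched as follows. The function names `scale_hidden_state` and `select_hidden_units` are hypothetical, and keeping a unit whose scaling factor magnitude equals the threshold is an assumption of this sketch.

```python
def scale_hidden_state(hidden, gammas):
    """Element-wise product of the hidden state with the scaling factors."""
    return [g * h for g, h in zip(gammas, hidden)]


def select_hidden_units(gammas, threshold):
    """Split hidden-unit indices by the magnitude of their scaling factor.

    Returns (keep, remove): indices with |gamma| >= threshold are kept,
    the rest are marked for structured removal.
    """
    keep = [k for k, g in enumerate(gammas) if abs(g) >= threshold]
    remove = [k for k, g in enumerate(gammas) if abs(g) < threshold]
    return keep, remove
```

The length of `keep` gives the hidden dimension of the pruned model, so the threshold directly sets the size of M2.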
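[0042] Perform structured pruning and reconstruction: for every $k \in \mathcal{R}$, delete the corresponding hidden dimension, so that the hidden dimension becomes $H' = \lvert \mathcal{K} \rvert$. The gating calculations are: $i_t = \sigma(W_i [x_t; h_{t-1}] + b_i)$, $f_t = \sigma(W_f [x_t; h_{t-1}] + b_f)$, $o_t = \sigma(W_o [x_t; h_{t-1}] + b_o)$, $g_t = \tanh(W_g [x_t; h_{t-1}] + b_g)$; where $i_t$, $f_t$, $o_t$, and $g_t$ are the input gate, forget gate, output gate, and candidate gate, respectively, $x_t$ is the input, $h_{t-1}$ is the hidden state from the previous time step, and $b$ is the bias. After pruning, the dimensions are updated to: $W'_i, W'_f, W'_o, W'_g \in \mathbb{R}^{H' \times (d + H')}$; where $d$ is the input dimension, $H'$ is the hidden state dimension after pruning, and $W'$ denotes a pruned weight matrix. The dimensions of the encoder, decoder, and output mapping layer are checked for consistency to ensure correct forward computation.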
[0043] Fine-tune the pruned model until the validation set error stabilizes, then compute the reconstruction error statistics on normal data, where $e_2(t)$ is the reconstruction error of M2 and $\delta_2$ is the anomaly threshold for M2, determined from the error distribution. Export M2 and $\delta_2$.
[0044] S33. Model M3 based on knowledge distillation. Build a more comprehensive dataset covering different working conditions, apply the same preprocessing as in online operation, and divide it into training and validation sets used to judge distillation convergence.
[0045] Train the teacher model until it converges to its best performance on the validation set, and obtain the teacher's reconstructed output $\hat{X}^{\mathrm{tea}}$.
[0046] Design the structure of the student model M3, typically smaller than the teacher model but larger than M1 and M2. For the same input $X$, compute the student reconstruction $\hat{X}^{\mathrm{stu}}$ and read the teacher's output $\hat{X}^{\mathrm{tea}}$.
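[0047] The distillation objective is constructed and the student model is trained. The student reconstruction loss is: $L_{\mathrm{stu}} = \frac{1}{T} \sum_{t=1}^{T} \lVert x_t - \hat{x}^{\mathrm{stu}}_t \rVert^2$; where $L_{\mathrm{stu}}$ is the loss of the student reconstructing the real input and $\hat{x}^{\mathrm{stu}}_t$ is the student's reconstructed output.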
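[0048] To bring the student model closer to the teacher model, the distillation loss is calculated: $L_{\mathrm{KD}} = \frac{1}{T} \sum_{t=1}^{T} \lVert \hat{x}^{\mathrm{tea}}_t - \hat{x}^{\mathrm{stu}}_t \rVert^2$; where $L_{\mathrm{KD}}$ is the distillation loss and $\hat{x}^{\mathrm{tea}}_t$ is the teacher's reconstructed output.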
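[0049] The comprehensive objective function for training the student model is then: $L_{\mathrm{M3}} = \alpha L_{\mathrm{stu}} + (1 - \alpha) L_{\mathrm{KD}} + \lambda \sum_{w \in \Theta_{\mathrm{stu}}} w^2$; where $L_{\mathrm{M3}}$ is the overall objective of student training, $L_{\mathrm{stu}}$ is the student reconstruction loss, $L_{\mathrm{KD}}$ is the distillation loss, $\alpha$ is the weighting factor, $\Theta_{\mathrm{stu}}$ is the student parameter set, $\lambda$ is the regularization coefficient, and $w$ is any weight in the student network. The validation set can be used to tune $\alpha$, which balances fitting the input against fitting the teacher model.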
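The combination of student reconstruction loss, distillation loss, and weight regularization may be sketched as below. This is an illustration under stated assumptions: the convex weighting `alpha` and `1 - alpha`, the mean-squared form of both losses, and the names `sequence_mse` and `student_objective` are hypothetical and are not confirmed by the original disclosure.

```python
def sequence_mse(seq_a, seq_b):
    """Mean over time steps of the squared distance between two vector sequences."""
    total = sum((a - b) ** 2
                for vec_a, vec_b in zip(seq_a, seq_b)
                for a, b in zip(vec_a, vec_b))
    return total / len(seq_a)


def student_objective(x, x_student, x_teacher, student_weights, alpha, lam):
    """Weighted sum of input-fitting loss, teacher-fitting loss, and L2 penalty.

    alpha weights the student's error against the real input, and
    (1 - alpha) weights its error against the teacher's reconstruction.
    """
    rec = sequence_mse(x, x_student)
    distill = sequence_mse(x_teacher, x_student)
    l2 = sum(w * w for w in student_weights)
    return alpha * rec + (1.0 - alpha) * distill + lam * l2
```

Setting `alpha` close to 1 makes the student fit the raw input, while a value close to 0 makes it imitate the teacher.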
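[0050] If the discrimination process has unnormalized outputs, a temperature and a KL term can be introduced: $p^{\mathrm{tea}} = \mathrm{softmax}(z^{\mathrm{tea}} / T_{\mathrm{d}})$, $p^{\mathrm{stu}} = \mathrm{softmax}(z^{\mathrm{stu}} / T_{\mathrm{d}})$; where $z^{\mathrm{tea}}$ and $z^{\mathrm{stu}}$ are the raw outputs of the teacher model and the student model, respectively, $T_{\mathrm{d}}$ is the distillation temperature, and $p^{\mathrm{tea}}$, $p^{\mathrm{stu}}$ are soft probability distributions.
[0051] $L_{\mathrm{KL}} = \sum_{c} p^{\mathrm{tea}}_c \log \frac{p^{\mathrm{tea}}_c}{p^{\mathrm{stu}}_c}$; where $c$ is the category index. In this case, $L_{\mathrm{KL}}$ can be added to the objective as an additional term.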
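As a hedged illustration of the temperature and KL terms, the following sketch softens raw outputs with a temperature and measures the divergence between the two resulting distributions. The names `softmax_with_temperature` and `kl_divergence`, and the small constant `eps` used for numerical safety, are assumptions of this sketch.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax of logits divided by the temperature (max-shifted for stability)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL divergence of q from p; terms with p == 0 contribute nothing."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)
```

A higher temperature flattens both distributions, which exposes more of the teacher's relative preferences among categories to the student.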
[0052] After distillation converges, the student model may be fine-tuned briefly using only the reconstruction loss, and the reconstruction error is then calculated, where $e_3(t)$ is the reconstruction error of M3 and $\delta_3$ is the anomaly threshold for M3, determined from the error distribution. Export M3 and $\delta_3$.
[0053] S4. Hierarchical model invocation and anomaly detection. The lightweight models are deployed in the integrated energy station monitoring system, and a hierarchical dynamic invocation mechanism is used for online monitoring. The smallest and medium-sized models, which have the lowest computational cost, are deployed at the front end for real-time inference, while the larger and more accurate knowledge distillation model is deployed at the back end. The system identifies the current scenario and, based on the comprehensive evaluation index for that scenario, decides whether to invoke a different model level or to upload data from the front end to the back-end server for real-time monitoring. The invoked model then determines whether the system shows a potential safety hazard such as hydrogen leakage or equipment failure and, if so, triggers an early warning. This reduces average computational energy consumption while maintaining monitoring reliability. Compared with existing technologies, this invention uses an LSTM sequence model, which exploits the temporal dependence and multi-sensor correlation of historical data to learn the normal behavior patterns of the integrated energy station, improving the anomaly detection rate. Lightweight techniques such as model pruning reduce the parameter count and computational cost of the deep model, making front-end deployment in energy stations feasible. The method applies a unified monitoring strategy to systems with multiple energy forms, including hydrogen, electricity, oil, and gas, providing timely warnings of safety risks under complex operating conditions and improving the inherent safety of integrated energy station operation.
[0054] The specific steps are as follows: S41. Model deployment and construction of related indicators. A two-tier deployment architecture is adopted, with the lightweight model M1 and the medium-sized model M2 deployed on the front-end equipment of the integrated energy station and the larger model M3 deployed on the back-end server. The front-end equipment has direct access to sensor and monitoring system data and can perform low-latency local inference and operate offline; the back-end server can be deployed in the station-level data center or a cloud resource pool, with higher computing power and storage capacity, which suits high-complexity inference and cross-device analysis.
[0055] In addition, the two-tier deployment facilitates model operation and maintenance and upgrades. The front-end M1 and M2 can be updated in batches according to device capabilities, ensuring real-time performance and controllable resource consumption. The back-end M3 can be centrally deployed and uniformly updated, and the verification results can be used to calibrate the thresholds and parameters of the front-end model, forming a collaborative partnership between the front-end and back-end.
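[0056] To determine whether the current model's discrimination is reliable, first calculate the margin of the reconstruction error relative to the threshold: $m_i(t) = \lvert e_i(t) - \delta_i \rvert$, where $e_i(t)$ is the reconstruction error of model Mi and $\delta_i$ is its anomaly threshold; then map the margin to a confidence level $c_i(t)$ through a monotonically increasing mapping with slope coefficient $\kappa$. Here $m_i(t)$ is the error margin, $c_i(t)$ is the single-test credibility, and $\kappa$ is the mapping slope coefficient. The smaller $m_i(t)$ is, the closer the error is to the threshold and the more uncertain the judgment.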
[0057] When the reconstruction error $e_i(t)$ is near the threshold $\delta_i$, the margin $m_i(t)$ is small and the credibility $c_i(t)$ is relatively low; when $e_i(t)$ is far from the threshold, $c_i(t)$ is higher, indicating higher reliability. This credibility serves as the basis for deciding whether to perform model verification and can be calculated and output in real time during each diagnostic cycle.
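One possible margin-to-credibility mapping is sketched below for illustration. The original mapping formula is not recoverable from the text, so the relative margin and the saturating exponential form used here are assumptions, as are the names `margin` and `confidence` and the default slope value.

```python
import math


def margin(error, threshold):
    """Relative distance of the reconstruction error from the anomaly threshold."""
    return abs(error - threshold) / threshold


def confidence(error, threshold, slope=5.0):
    """Map the margin to [0, 1): 0 at the threshold, approaching 1 far from it.

    The form 1 - exp(-slope * margin) is an assumed example of a
    monotonically increasing mapping with a slope coefficient.
    """
    return 1.0 - math.exp(-slope * margin(error, threshold))
```

With this choice, an error exactly at the threshold yields a credibility of 0, and the slope controls how quickly credibility recovers as the error moves away from the threshold in either direction.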
[0058] To track the recent monitoring accuracy of the current model, when an alarm or suspected fault obtains a true label $y$ after subsequent maintenance, the model output $\hat{y}_i$ for this sample is compared with the true label and the recent accuracy metric is updated using an exponential moving average: $A_i \leftarrow (1 - \beta) A_i + \beta \, \mathbb{1}(\hat{y}_i = y)$; where $A_i$ is the recent accuracy estimate of Mi, $\beta$ is the moving average update coefficient, $\mathbb{1}(\cdot)$ is the indicator function, $y$ is the true label, and $\hat{y}_i$ is the output of Mi.
[0059] This metric reflects the actual effectiveness of Mi at the current site and within the current time period, and is used to address model performance changes caused by factors such as operating condition drift and sensor offset. Since verified results are not available for every sample, $A_i$ is updated only when a verified label becomes available, and its value is cached in the system. When no update occurs for a long time, the system keeps the most recent estimate for use in the triggering criteria.
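The exponential moving average update of recent accuracy may be illustrated as follows. The function name `update_recent_accuracy` and the default coefficient value are hypothetical; the update is meant to be called only when a verified label is available.

```python
def update_recent_accuracy(accuracy, predicted_label, true_label, beta=0.1):
    """Blend the previous accuracy estimate with the latest verified outcome.

    accuracy        : current estimate in [0, 1]
    predicted_label : model output for the verified sample
    true_label      : label confirmed by maintenance or manual review
    beta            : update coefficient; larger values react faster
    """
    hit = 1.0 if predicted_label == true_label else 0.0
    return (1.0 - beta) * accuracy + beta * hit
```

Between verified samples the estimate is simply left unchanged, which matches the caching behavior described above.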
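[0060] To balance monitoring reliability against resource consumption, a tiered call evaluation index is constructed to quantify the comprehensive cost of each candidate call action and to select the optimal call path accordingly, for a switch from the current level $i$ to a candidate level $k$.
[0061] $J_{i \to k}(t) = w_{\mathrm{net}} C_{\mathrm{net}} + w_{\mathrm{time}} C_{\mathrm{time}} + w_{\mathrm{risk}} C_{\mathrm{risk}} + w_{\mathrm{comp}} C_{\mathrm{comp}} + w_{\mathrm{elec}} C_{\mathrm{elec}}$; where $J_{i \to k}(t)$ is the comprehensive evaluation index at time $t$ for calling level $k$ from the current level $i$; $w_{\mathrm{net}}$, $w_{\mathrm{time}}$, $w_{\mathrm{risk}}$, $w_{\mathrm{comp}}$, $w_{\mathrm{elec}}$ are the weights of the cost items; and $C_{\mathrm{net}}$, $C_{\mathrm{time}}$, $C_{\mathrm{risk}}$, $C_{\mathrm{comp}}$, $C_{\mathrm{elec}}$ are the network usage cost, time cost, risk cost, computing power cost, and electricity cost, respectively.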
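[0062] Network occupancy cost: $C_{\mathrm{net}} = \phi_{\mathrm{up}} D_{\mathrm{up}} + \phi_{\mathrm{down}} D_{\mathrm{down}}$; where $D_{\mathrm{up}}$ is the amount of data sent; $D_{\mathrm{down}}$ is the amount of data returned; and $\phi_{\mathrm{up}}$, $\phi_{\mathrm{down}}$ are the link occupancy factors. For a back-end call ($k = 3$), $D_{\mathrm{up}}$ and $D_{\mathrm{down}}$ usually increase significantly; for a local front-end call, $D_{\mathrm{up}}$ and $D_{\mathrm{down}}$ can be taken as approximately 0 or ignored.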
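[0063] Time cost: $C_{\mathrm{time}} = t_{\mathrm{inf}}(k) + t_{\mathrm{sch}} + \mathbb{1}(k = 3) \, t_{\mathrm{rtt}}$; where $t_{\mathrm{inf}}(k)$ is the inference latency of calling model Mk; $t_{\mathrm{sch}}$ is the scheduling delay; $t_{\mathrm{rtt}}$ is the network round-trip latency; and $\mathbb{1}(k = 3)$ is the indicator function (1 when the back end is called, 0 otherwise).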
[0064] Introduce the occupied duration: $T_{\mathrm{occ}} \approx C_{\mathrm{time}}$; where $T_{\mathrm{occ}}$ is the time elapsed from initiating the call action to obtaining the result, with the time cost $C_{\mathrm{time}}$ used as its estimate.
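[0065] Computing cost: $C_{\mathrm{comp}} = \rho_{\mathrm{cpu}} U_{\mathrm{cpu}}(k) + \rho_{\mathrm{gpu}} U_{\mathrm{gpu}}(k) + \rho_{\mathrm{acc}} U_{\mathrm{acc}}(k)$; where $U_{\mathrm{cpu}}(k)$, $U_{\mathrm{gpu}}(k)$, $U_{\mathrm{acc}}(k)$ are the CPU, GPU, and accelerator usage of calling model Mk, respectively, and $\rho_{\mathrm{cpu}}$, $\rho_{\mathrm{gpu}}$, $\rho_{\mathrm{acc}}$ are the resource conversion factors.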
[0066] Electricity usage cost: $C_{\mathrm{elec}} = E_{\mathrm{inf}}(k) + \mathbb{1}(k = 3) \, E_{\mathrm{tx}}$; where $E_{\mathrm{inf}}(k)$ is the inference energy consumption of Mk, $E_{\mathrm{tx}}$ is the energy consumption of data transfer during cross-layer calls, and $\mathbb{1}(k = 3)$ is the indicator function (1 if the back end is called, 0 otherwise).
[0067] Operating risk cost: $C_{\mathrm{risk}} = S(s) \, P_{\mathrm{fail}}(s, T_{\mathrm{occ}}) + w_{\mathrm{mis}} \, P_{\mathrm{mis}}(k)$; where $s$ is the status of the front-end device or operating conditions (device type, health status, ambient temperature, load, etc.); $S(s)$ is the consequence and risk level of a front-end equipment malfunction; $P_{\mathrm{fail}}(s, T_{\mathrm{occ}})$ is the probability of a front-end failure occurring within the occupied duration; $w_{\mathrm{mis}}$ is the weight of the consequences of misjudgment; and $P_{\mathrm{mis}}(k)$ is the estimated probability of misjudgment of model Mk within the recent window.
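[0068] The probability of front-end failure can be represented by an exponential failure model: $P_{\mathrm{fail}} = 1 - e^{-\lambda_{\mathrm{f}}(s) \, T_{\mathrm{occ}}}$; where $\lambda_{\mathrm{f}}(s)$ is the failure rate, which depends on equipment health, environment, and load, and $T_{\mathrm{occ}}$ is the occupied duration. Therefore, the larger $T_{\mathrm{occ}}$ is, the larger $P_{\mathrm{fail}}$ is, and the higher the operating risk cost.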
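The exponential failure model and its use in the operating risk cost may be sketched as follows. The function names `failure_probability` and `risk_cost` are hypothetical, and the additive combination of a failure term and a misjudgment term is an assumed reading of the risk cost described above.

```python
import math


def failure_probability(failure_rate, occupied_time):
    """Probability of at least one failure within occupied_time at a constant rate."""
    return 1.0 - math.exp(-failure_rate * occupied_time)


def risk_cost(severity, failure_rate, occupied_time, misjudge_weight, misjudge_prob):
    """Failure consequence weighted by its probability, plus weighted misjudgment risk."""
    failure_term = severity * failure_probability(failure_rate, occupied_time)
    return failure_term + misjudge_weight * misjudge_prob
```

Because the failure probability grows with the occupied time, a slower front-end review raises the risk cost even when the misjudgment probability is unchanged.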
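[0069] Build the network availability metrics from the measured link state $(B, \mathrm{RTT}, p_{\mathrm{loss}})$; where $B$ is the available bandwidth; $\mathrm{RTT}$ is the round-trip time; and $p_{\mathrm{loss}}$ is the packet loss rate.
[0070] $a_{\mathrm{net}}(t) = \mathbb{1}(B \ge B_{\min}) \cdot \mathbb{1}(\mathrm{RTT} \le \mathrm{RTT}_{\max}) \cdot \mathbb{1}(p_{\mathrm{loss}} \le p_{\max})$; where $a_{\mathrm{net}}(t)$ indicates whether back-end calls are available; $\mathbb{1}(\cdot)$ is the indicator function; $B_{\min}$ is the lower limit of bandwidth; $\mathrm{RTT}_{\max}$ is the upper limit of latency; and $p_{\max}$ is the upper limit of the packet loss rate.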
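A minimal sketch of the back-end availability indicator is shown below. The function name `backend_available` and the parameter names are hypothetical, and treating values exactly at a limit as acceptable is an assumption of this sketch.

```python
def backend_available(bandwidth, rtt, loss_rate, min_bandwidth, max_rtt, max_loss):
    """Return 1 if bandwidth, round-trip time, and loss rate all meet their limits, else 0."""
    ok = (bandwidth >= min_bandwidth
          and rtt <= max_rtt
          and loss_rate <= max_loss)
    return 1 if ok else 0
```

A result of 0 means M3 is excluded from the candidate set, so the front end falls back to caching or deferred upload.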
[0071] S42. Invocation scenarios and execution judgment. S421. Identification of the M1→M2 call scenarios. The specific scenarios are as follows: Insufficient credibility: when the credibility of M1 is below the credibility threshold, $c_1(t) < c_{\mathrm{th}}$, the current sample is considered uncertain and the window output is unstable. Using only M1 is then prone to false alarms or missed detections, so front-end verification is needed to enhance reliability, and the system determines whether to perform M2 verification. $c_{\mathrm{th}}$ is the credibility threshold.
[0072] Recent accuracy decline: when the recent accuracy of M1 falls below the accuracy threshold, $A_1 < A_{\mathrm{th}}$, the overall accuracy of M1 has recently declined, where $A_{\mathrm{th}}$ is the accuracy threshold. M1 may produce erroneous detection results under the current conditions, and the system determines whether to raise the review priority.
[0073] Operating condition switching: during an operating condition switch, the monitoring data may change abruptly or pass through a transition phase, so the input of model M1 deviates from its training distribution. The reconstruction error then fluctuates sharply and drifts relative to the threshold, making false alarms or missed alarms more likely.
[0074] Feedback from random sampling: When the system obtains the true results of some samples through random sampling or manual verification, and finds that the misjudgment rate of M1 increases, the accuracy decreases, or the error distribution drifts in the recent window, it indicates that the model may have a risk of phased mismatch due to factors such as working condition migration, equipment aging, and changes in sensor characteristics.
[0075] When any of the above situations occurs, the final decision on whether to upgrade to model M2 is made using the comprehensive evaluation index: $J_{1 \to 2}(t) < J_{1 \to 1}(t)$; where $J_{1 \to 2}(t)$ is the comprehensive evaluation index for calling M2 from M1 and $J_{1 \to 1}(t)$ is the comprehensive evaluation index for continuing to use M1. M1→M2 is triggered when this condition is met.
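The comparison of comprehensive evaluation indices may be illustrated with the following sketch. The dictionary keys, the names `composite_index` and `should_switch`, and the numeric values in the usage note are hypothetical; only the weighted sum of the five cost items and the "switch when the candidate index is lower" rule follow the description.

```python
COST_KEYS = ("network", "time", "risk", "compute", "energy")


def composite_index(costs, weights):
    """Weighted sum of the five cost items for one candidate call action."""
    return sum(weights[key] * costs[key] for key in COST_KEYS)


def should_switch(costs_stay, costs_switch, weights):
    """True when calling the candidate level has a lower index than staying."""
    return composite_index(costs_switch, weights) < composite_index(costs_stay, weights)
```

For example, with unit weights, staying on M1 with costs (0, 1, 5, 1, 1) gives an index of 8, while calling M2 with costs (0, 2, 1, 2, 2) gives 7, so the switch is triggered because the reduced risk cost outweighs the added time, computing, and energy costs.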
[0076] S422. The M1→M3 trigger scenario is a skip-level (cross-layer) transition: model M2 is not called, and model M3 deployed at the back end is used directly for real-time monitoring. The specific scenarios are as follows: Excellent network conditions, that is, $a_{\mathrm{net}}(t) = 1$: when network bandwidth, round-trip latency, and packet loss rate meet the back-end call thresholds, the system allows M3 into the candidate set and determines whether to execute a cross-level call. When the network is unavailable, the system does not execute cross-level calls and instead caches the window for later review, sends only features, or delays sending, to ensure continuity.
[0077] Excessive latency with a good network, that is, the time cost of the front-end review exceeds its limit and $a_{\mathrm{net}}(t) = 1$: when the front-end review cannot meet the timeliness requirement but the network is good and the back end is idle, so that uploading is faster, the system uploads directly to the back end for monitoring, thereby ensuring a timely and controlled response.
[0078] High operating risk of the front-end device, that is, the front-end risk cost exceeds its limit: when continuing to monitor at the front end would increase the probability or the consequences of equipment failure, the system determines whether to prioritize a skip-level upload, which shortens front-end occupation and thereby reduces operational risk and cost.
[0079] In the above situations, the final decision on whether to call the back end directly is made using the comprehensive evaluation index: $J_{1 \to 3}(t) < J_{1 \to 1}(t)$; where $J_{1 \to 3}(t)$ is the comprehensive evaluation index for calling M3 from M1 and $J_{1 \to 1}(t)$ is the comprehensive evaluation index for continuing to use M1. When the condition is met, M1→M3 is triggered, which is the transition from the front end to the back end.
[0080] S423. M2→M3 trigger scenarios. The specific scenarios are as follows: Credibility still insufficient after review, that is, $c_2(t) < c_{\mathrm{th}}$: if the M2 review still cannot provide a stable conclusion, the system determines that the front-end review is insufficient for normal monitoring and decides whether to switch to the back-end M3 for real-time monitoring, thereby improving the consistency of judgment and reducing the risk of misjudgment.
[0081] Accuracy still reduced after review, that is, $A_2 < A_{\mathrm{th}}$: when the recent accuracy of M2 is still below the threshold, the front-end review capability has also deteriorated. The system determines whether to prioritize sending the data to the back end to avoid misjudgment at the front end.
[0082] Increased operational risk of model M2 deployed at the front end: when continuing to monitor through the front end would lengthen the occupied time and thereby increase the probability or the consequences of equipment failure, the system determines whether to prioritize sending the data to the back end, shortening front-end occupation and reducing operational risk and cost.
[0083] Excessive computing power cost: when the computation cost of M2 is too high, the system skips the front-end review and determines whether to send the data to the back end for monitoring, so that highly complex calculations are handled by the back end, reducing front-end computing power usage and keeping real-time tasks stable.
[0084] Excellent network conditions, that is, $a_{\mathrm{net}}(t) = 1$: when network bandwidth, round-trip latency, and packet loss rate meet the back-end call thresholds, the system allows M3 into the candidate set and determines whether to execute a cross-level call. When the network is unavailable, the system does not execute cross-level calls and instead caches the window for later review, sends only features, or delays sending, to ensure continuity.
[0085] Excessive front-end energy consumption cost: when local energy consumption exceeds the budget, the system determines whether to prioritize sending the data to the back end for monitoring, avoiding prolonged excessive energy consumption at the front end, which could lead to thermal risks or excessive electricity costs, and thereby reducing the energy burden at the front end.
[0086] As above, excellent network conditions must hold at the same time.
[0087] When any of the above situations occurs, the final decision on whether to call the M3 model at the back end is made using the comprehensive evaluation index: $J_{2 \to 3}(t) < J_{2 \to 2}(t)$; where $J_{2 \to 3}(t)$ is the comprehensive evaluation index for calling M3 from M2 and $J_{2 \to 2}(t)$ is the comprehensive evaluation index for continuing to use M2. When the condition is met, M2→M3 is triggered, which is the transition from the front end to the back end.
[0088] When the network is unavailable, that is, $a_{\mathrm{net}}(t) = 0$, the system can upload only compressed features, or cache the window to be reviewed and record it in a queue. Back-end review is triggered after the network recovers, which ensures continuity and keeps bandwidth usage under control during network fluctuations.
[0089] Based on the above scenarios and call execution judgments, a two-level deployment of front-end and back-end is implemented, enabling the invocation and real-time monitoring of three different scale models.
[0090] This invention introduces pruning and compression after model training to form models of different sizes. L1-regularized pruning reduces the parameter count and shortens the effective computation path by deleting less important connections and, optionally, hidden units that contribute little to the computation. This reduces the storage and computational burden on edge devices, lowering the deployment threshold and improving online running efficiency. Network Slimming is a structured pruning method that removes unimportant hidden channels directly from the network structure and reduces the dimensions of the related weight matrices at the same time, which makes deterministic inference acceleration easier to achieve on general-purpose hardware. Knowledge distillation is used to train the student model so that, despite its reduced structure, it still learns and approximates the output behavior and representation ability of the teacher model. For the same small model size, this makes it easier to maintain high diagnostic ability and raises the achievable accuracy of the verification model.
[0091] Furthermore, to mitigate the potential decline in diagnostic accuracy of a single lightweight model under complex operating conditions or boundary samples, this invention does not rely solely on a single pruning model. Instead, it deploys multiple strategy models in both the front-end and back-end, using them collaboratively within the same online process, employing a tiered diagnostic and invocation mechanism. Simultaneously, this invention allows different models to be trained using data with varying coverage, ensuring that the default smaller model closely reflects everyday operating conditions, the medium-sized model covers a wider range of boundary fluctuations, and the larger model inherits the stronger model's ability to handle complex samples, thus creating a complementary effect through tiered invocation.
[0092] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the specific implementation of the present invention. Any modifications or equivalent substitutions that do not depart from the spirit and scope of the present invention should be covered within the scope of protection of the claims of the present invention.
Claims
1. A safety monitoring method for a hydrogen-electric-oil-gas integrated energy station based on lightweight deep learning, characterized in that, The specific method is as follows: Collect time-series data from multiple sources of sensors at the integrated energy station and preprocess the time-series data; Construct an LSTM encoder-decoder model with the goal of minimizing the reconstruction error between the input time series data and the reconstructed time series data, and optimize the LSTM encoder-decoder model. The LSTM encoder-decoder model after training convergence is pruned and lightweighted using a custom pruning strategy, and three lightweight models of different sizes are constructed. Deploy at least two lightweight models with the smallest scale in the front-end equipment of the integrated energy station, and build the largest lightweight model on the back-end server; Based on a custom invocation strategy, dynamically select and invoke different levels of lightweight models for real-time security diagnostics.
2. The safety monitoring method for a hydrogen-electric-oil-gas integrated energy station based on lightweight deep learning as described in claim 1, characterized in that, Multi-source sensor time-series data includes, but is not limited to: Hydrogen energy system data: hydrogen pressure and concentration, oil and gas pipeline pressure and flow rate; Oil and gas system data: pressure, valve opening, oil and gas flow; Power system data: voltage, current, temperature, equipment status.
3. The safety monitoring method for a hydrogen-electric-oil-gas integrated energy station based on lightweight deep learning as described in claim 2, characterized in that, The time series data is preprocessed using the following methods: Each sensor records readings over time, which are then aggregated to form multidimensional time-series data $X = \{x_t\}$, $x_t = (x_t^{1}, \dots, x_t^{n})$, where $x_t$ denotes the readings of the $n$ sensors collected at time $t$; The time series data of different dimensions are normalized: for the data sequence $\{x_t^{i}\}$ of the $i$-th sensor, normalization is performed using min-max linear scaling, with the transformation formula: $\tilde{x}_t^{i} = \frac{x_t^{i} - x_{\min}^{i}}{x_{\max}^{i} - x_{\min}^{i}}$; where $x_{\min}^{i}$ and $x_{\max}^{i}$ are the minimum and maximum values of sensor $i$ in the training data; If a sensor records no valid reading at time $t$, linear interpolation using data from the adjacent time points is used to estimate the single missing point: $x_t^{i} = \frac{x_{t-1}^{i} + x_{t+1}^{i}}{2}$; That is, the missing value at time $t$ is assumed to be the average of the values at the adjacent time points; A clean and scale-uniform time series is thereby obtained.
4. The safety monitoring method for a hydrogen-electric-oil-gas integrated energy station based on lightweight deep learning as described in claim 1, characterized in that, The LSTM encoder-decoder model includes an encoder and a decoder; The encoder receives an input sequence of length $T$, updates its hidden state at each time step, and finally compresses the information of the entire sequence into a hidden vector of fixed dimension; The decoder uses the encoder's hidden vector as its initial condition to generate an output sequence of the same length as the input, in order to reconstruct the original sequence.
5. The safety monitoring method for a hydrogen-electric-oil-gas integrated energy station based on lightweight deep learning as described in claim 4, characterized in that, The specific method for training the LSTM encoder-decoder model is as follows: Define the input sequence of a training sample as $X = (x_1, x_2, \dots, x_T)$, where $T$ is the sequence length and $x_t$ denotes the input features at the $t$-th time step; The encoder uses an LSTM to encode the input sequence step by step to obtain the hidden state sequence, with the state update relationship: $h_t = f_{\mathrm{enc}}(x_t, h_{t-1}; \theta_{\mathrm{enc}})$; where $h_t$ is the hidden state of the encoder at step $t$, $f_{\mathrm{enc}}$ is the state update function of the encoder LSTM, and $\theta_{\mathrm{enc}}$ is the parameter set of the encoder; after $T$ encoding steps, the final hidden representation $c = h_T$ is obtained; The decoder uses the context vector $c$ as its initial state and generates the output sequence $\hat{X} = (\hat{x}_1, \dots, \hat{x}_T)$ step by step using an LSTM; The hidden state recursion of the decoder is: $s_t = f_{\mathrm{dec}}(\hat{x}_{t-1}, s_{t-1}; \theta_{\mathrm{dec}})$; where $s_t$ is the hidden state of the decoder at step $t$ and $f_{\mathrm{dec}}$ is the state update function of the decoder LSTM, with parameter set $\theta_{\mathrm{dec}}$; the initial hidden state of the decoder is set to $s_0 = c$, generated by the encoder, and at $t = 1$ a preset initial input $\hat{x}_0$ is used; The output mapping function maps the hidden state to the reconstructed output of the decoder: $\hat{x}_t = g(s_t)$; where $g$ is the decoder's output mapping function, which converts the current hidden state $s_t$ into the output $\hat{x}_t$; the full set of trainable parameters of the model is denoted $\theta$; The training objective function is defined as the reconstruction error, and the mean squared error is used to measure the reconstruction error of the output sequence $\hat{X}$ relative to the input sequence $X$: $L_{\mathrm{rec}}(X, \hat{X}) = \frac{1}{T} \sum_{t=1}^{T} \lVert x_t - \hat{x}_t \rVert^2$; where $\lVert x_t - \hat{x}_t \rVert^2$ denotes the squared error between input and output at the $t$-th time step; for a training set containing $N$ sequence samples, this error is calculated for each sample and averaged to evaluate the overall reconstruction error; By introducing an L2 regularization term into the loss function, a penalty constraint is imposed on the sum of squares of the model parameters. The overall loss function after introducing the regularization term is: $L(\theta) = \frac{1}{N} \sum_{n=1}^{N} L_{\mathrm{rec}}(X^{(n)}, \hat{X}^{(n)}) + \lambda \lVert \theta \rVert_2^2$; where the first term is the reconstruction error, the second term is the L2 regularization term, $\lVert \theta \rVert_2^2$ represents the sum of squares of all parameters in the model, and $\lambda$ is the regularization coefficient; The parameters are iteratively updated using gradient descent along the negative gradient direction until convergence or a preset number of iterations is reached, with the update rule: $\theta^{(k+1)} = \theta^{(k)} - \eta \left( \frac{\partial L_{\mathrm{rec}}}{\partial \theta} + 2 \lambda \theta^{(k)} \right)$; where $\theta$ denotes any trainable parameter in the model, $\theta^{(k)}$ denotes the value of this parameter at the $k$-th iteration, and $\eta$ is the learning rate; because the loss function $L$ includes the L2 regularization term, the gradient with respect to the parameter $\theta$ gains an additional $2 \lambda \theta$ on top of the reconstruction error gradient, so the term in parentheses is the gradient of the loss function $L$ with respect to $\theta$.
6. The safety monitoring method for a hydrogen-electric-oil-gas integrated energy station based on lightweight deep learning as described in claim 5, characterized in that, The smallest lightweight model M1 is constructed using the following method: Collect normal operating condition data to form a training set, normalize each variable, and fill in missing values in the same way as in online operation; Use a sliding window to generate sample sequences and divide them into training and validation subsets; Based on the LSTM encoder-decoder model, minimize the reconstruction error on the training samples while monitoring the error on the validation set; once convergence or a preset number of iterations is reached, the parameters and reconstruction error statistics are obtained; Continuing training from the LSTM encoder-decoder model parameters, L1 and L2 regularization are introduced to obtain: $L_{\mathrm{total}} = L_{\mathrm{rec}} + \lambda_1 \sum_{w \in \Theta} \lvert w \rvert + \lambda_2 \sum_{w \in \Theta} w^2$; where $L_{\mathrm{total}}$ is the total loss after adding regularization, $L_{\mathrm{rec}}$ is the reconstruction error, $\Theta$ is the set of model parameters, $w$ is any weight parameter, $\lambda_1$ is the L1 regularization coefficient, and $\lambda_2$ is the L2 regularization coefficient; $\lambda_1$ can follow an incremental schedule, starting small and gradually increasing, to avoid instability in the early stages of training; For the parameter matrix $W$ to be pruned, calculate the importance: $I_{ij} = \lvert W_{ij} \rvert$; where $W$ is the weight matrix of the layer to be pruned, $W_{ij}$ is the weight in the $i$-th row and $j$-th column, and $I_{ij}$ is the importance measure of that weight; A threshold $\tau$ determines the pruning intensity: connections with $I_{ij} < \tau$ are pruned directly; Construct the mask matrix: $M_{ij} = \mathbb{1}(\lvert W_{ij} \rvert \ge \tau)$; Perform pruning: $\tilde{W} = M \odot W$; where $M$ is a binary mask, $\mathbb{1}(\cdot)$ is the indicator function, $\tau$ is the pruning threshold, $\odot$ indicates element-wise multiplication, and $\tilde{W}$ is the weight matrix after pruning; the pruned weights are fixed to 0 to form a sparse connection structure;
Optional node-level pruning further reduces the number of neurons, and the importance of nodes is calculated for hidden unit k: ; When satisfied Delete the hidden unit and its associated ingress and egress connections, and simultaneously update the dimensions of the relevant matrix. For the first The importance score of each neuron The weights flowing into this neuron, The weights that flow out of this neuron, The neuron pruning threshold; Fine-tuning training is performed using the pruned parameters as initial values, with indicators retained. Control updates: ; in, and These are the weights before and after the update, respectively. For learning rate, To calculate the gradient of the loss with respect to the weights, Indicates whether the connection should be retained; the fine-tuning stopping condition can be set to the preset proportion range of the validation set reconstruction error recovering to the error before pruning, or no significant decrease for several consecutive rounds; Calculate the online error on M1 using normal data: ; in, For M1 at time Reconstruction error, The abnormal threshold for M1 is determined based on the normal error distribution. And export M1 and .
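The connection-level pruning and masked fine-tuning steps of this claim can be sketched as follows. This is a minimal illustration on nested lists, not the claimed implementation: the function names are hypothetical, the absolute value is assumed as the weight importance, and the node importance is assumed to be a sum of absolute incoming and outgoing weights.

```python
from typing import List

Matrix = List[List[float]]


def magnitude_mask(W: Matrix, tau: float) -> List[List[int]]:
    """Binary mask: 1 keeps a connection, 0 prunes it (|w| below tau)."""
    return [[1 if abs(w) >= tau else 0 for w in row] for row in W]


def apply_mask(W: Matrix, M: List[List[int]]) -> Matrix:
    """Element-wise product of the mask and the weight matrix."""
    return [[w * m for w, m in zip(w_row, m_row)] for w_row, m_row in zip(W, M)]


def masked_update(W: Matrix, grad: Matrix, M: List[List[int]], lr: float) -> Matrix:
    """Fine-tuning step in which pruned connections receive no update."""
    return [[w - lr * m * g for w, g, m in zip(w_row, g_row, m_row)]
            for w_row, g_row, m_row in zip(W, grad, M)]


def unit_importance(w_in: List[float], w_out: List[float]) -> float:
    """Assumed node score: sum of absolute incoming and outgoing weights."""
    return sum(abs(w) for w in w_in) + sum(abs(w) for w in w_out)


# Hypothetical 2x2 layer and threshold.
W = [[0.5, -0.01], [0.02, -0.8]]
mask = magnitude_mask(W, tau=0.05)
W_pruned = apply_mask(W, mask)
W_tuned = masked_update(W_pruned, [[1.0, 1.0], [1.0, 1.0]], mask, lr=0.1)
```

Multiplying the gradient by the mask is what keeps the pruned weights fixed at zero during fine-tuning, so the sparse structure survives the later training rounds.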
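7. The safety monitoring method for a hydrogen-electric-oil-gas integrated energy station based on lightweight deep learning as described in claim 5, characterized in that the medium-sized lightweight model M2 is constructed using the following method:
Select a normal dataset $D_2$ that covers a wider range of operating conditions, and generate samples with a sliding window;
Introduce a scaling vector $\gamma$ at the hidden state output: $\tilde{h}_t = \gamma \odot h_t$, with $\gamma = (\gamma_1, \dots, \gamma_H)$; where $h_t$ is the hidden state vector of the LSTM at time $t$; $\tilde{h}_t$ is the scaled hidden state; $\gamma$ is the vector of scaling factors; $\gamma_k$ is the importance parameter of the $k$-th hidden unit; and $H$ is the hidden state dimension;
Train with an objective function that drives some $\gamma_k$ to converge to near 0: $L_{prune} = L_{rec} + \lambda_w \sum_{w \in \Theta} w^2 + \lambda_\gamma \sum_{k=1}^{H} |\gamma_k|$; where $L_{prune}$ is the pruning training objective; $\lambda_w$ is the weight decay coefficient; $\lambda_\gamma$ is the sparsity strength applied to the scaling factors; the closer $\gamma_k$ is to 0, the smaller the contribution of the corresponding hidden unit; training starts with a small $\lambda_\gamma$ to keep the reconstruction stable, and $\lambda_\gamma$ is then increased to promote sparsity;
To determine the pruning set, collect all $|\gamma_k|$, sort them, and set a threshold $\tau_\gamma$ to obtain: $K = \{k : |\gamma_k| \ge \tau_\gamma\}$, $R = \{k : |\gamma_k| < \tau_\gamma\}$; where $K$ is the set of retained hidden units, $R$ is the set of removed hidden units, and $\tau_\gamma$ is the pruning threshold;
Perform structured pruning and reconstruction: for all $k \in R$, delete the corresponding hidden dimension, so that the hidden dimension becomes $H' = |K|$;
The gate calculations are: $i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$, $f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$, $o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$, $g_t = \tanh(W_g x_t + U_g h_{t-1} + b_g)$; where $i_t$, $f_t$, $o_t$, $g_t$ are the input gate, forget gate, output gate, and candidate gate, respectively; $x_t$ is the input; $h_{t-1}$ is the hidden state at the previous time step; and $b$ is the bias;
After pruning, the dimensions are updated to: $W'_{\ast} \in \mathbb{R}^{H' \times d}$, $U'_{\ast} \in \mathbb{R}^{H' \times H'}$, $b'_{\ast} \in \mathbb{R}^{H'}$ for $\ast \in \{i, f, o, g\}$; where $d$ is the input dimension, $H'$ is the hidden state dimension after pruning, and $W'_{\ast}$, $U'_{\ast}$ are the weight matrices after pruning;
Fine-tune the pruned model until the validation set error stabilizes, then compute statistics using normal data: $e^{(2)}_t = \lVert x_t - \hat{x}^{(2)}_t \rVert_2^2$; where $e^{(2)}_t$ is the reconstruction error of M2; determine the anomaly threshold $\delta_2$ of M2 based on the error distribution, and export M2 and $\delta_2$.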
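The structured pruning step in this claim, in which hidden units with small scaling factors are removed and the gate matrices shrink accordingly, can be sketched as below. The names `split_units` and `prune_gate`, the row and column layout of the matrices, and the sample values are assumptions for illustration; a real LSTM would apply the same slicing to each of the four gates.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split_units(gamma: List[float], tau: float) -> Tuple[List[int], List[int]]:
    """Indices of retained and removed hidden units by |gamma_k| versus tau."""
    keep = [k for k, g in enumerate(gamma) if abs(g) >= tau]
    drop = [k for k, g in enumerate(gamma) if abs(g) < tau]
    return keep, drop


def prune_gate(W: Matrix, U: Matrix, b: List[float],
               keep: List[int]) -> Tuple[Matrix, Matrix, List[float]]:
    """Slice one gate: W is H x d, U is H x H, b has length H (assumed layout)."""
    W_p = [W[k] for k in keep]                       # keep rows of W
    U_p = [[U[k][j] for j in keep] for k in keep]    # keep rows and columns of U
    b_p = [b[k] for k in keep]
    return W_p, U_p, b_p


# Hypothetical gate with H = 4 hidden units and d = 2 inputs.
gamma = [0.9, 0.001, -0.4, 0.02]
W = [[float(10 * k + c) for c in range(2)] for k in range(4)]
U = [[float(10 * k + j) for j in range(4)] for k in range(4)]
b = [0.0, 1.0, 2.0, 3.0]

keep, drop = split_units(gamma, tau=0.05)
W_p, U_p, b_p = prune_gate(W, U, b, keep)
```

Unlike connection-level masking, this removes whole rows and columns, so the pruned model is genuinely smaller and needs no sparse-matrix support on the front-end device.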
8. The safety monitoring method for a hydrogen-electric-oil-gas integrated energy station based on lightweight deep learning as described in claim 5, characterized in that the largest lightweight model M3 is constructed using the following method:
Build a more comprehensive dataset $D_3$ covering different working conditions; train the teacher model until it converges to its optimum on the validation set, and obtain the teacher's reconstructed output $\hat{x}^{tea}$;
Design a student model M3 whose structure is smaller than the teacher model but larger than M1 and M2; for the same input $x$, compute the student reconstruction $\hat{x}^{stu}$ and read the teacher's output $\hat{x}^{tea}$;
Construct the distillation objective and train the student model. The student reconstruction loss is: $L^{stu}_{rec} = \lVert x - \hat{x}^{stu} \rVert_2^2$; where $L^{stu}_{rec}$ is the loss of the student reconstructing the real input, and $\hat{x}^{stu}$ is the student's reconstructed output;
To bring the student model closer to the teacher model, the distillation loss is calculated: $L_{KD} = \lVert \hat{x}^{stu} - \hat{x}^{tea} \rVert_2^2$; where $L_{KD}$ is the distillation loss and $\hat{x}^{tea}$ is the teacher's reconstructed output;
The comprehensive objective function for training the student model is then: $L_{stu} = \alpha L^{stu}_{rec} + (1 - \alpha) L_{KD} + \lambda \sum_{w \in \Theta_{stu}} w^2$; where $L_{stu}$ is the overall objective of student training, $\alpha$ is the weighting factor, $\Theta_{stu}$ is the student parameter set, $\lambda$ is the regularization coefficient, and $w$ is any weight in the student network; the validation set can be used to adjust the balance between fitting the input and approximating the teacher model;
If the discrimination process involves unnormalized outputs, temperature and KL terms can be introduced: $p^{tea} = \mathrm{softmax}(z^{tea} / T)$, $p^{stu} = \mathrm{softmax}(z^{stu} / T)$; where $z^{tea}$ and $z^{stu}$ are the raw outputs of the teacher model and the student model, respectively, $T$ is the distillation temperature, and $p^{tea}$, $p^{stu}$ are soft probability distributions; $L_{KL} = \sum_{c} p^{tea}_c \log \frac{p^{tea}_c}{p^{stu}_c}$; where $c$ is the category index; in this case $L_{KL}$ can be added to $L_{stu}$ as an additional term;
After distillation converges, fine-tuning may be performed using only the reconstruction loss, and the reconstruction error is then calculated: $e^{(3)}_t = \lVert x_t - \hat{x}^{(3)}_t \rVert_2^2$; where $e^{(3)}_t$ is the reconstruction error of M3 and $\delta_3$ is the anomaly threshold of M3; determine $\delta_3$ and export M3 and $\delta_3$.
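The distillation objective described in this claim combines a term for fitting the real input with a term for imitating the teacher. The sketch below is one possible reading: the `alpha` and `1 - alpha` weighting, the squared-error form of both terms, and all function names are assumptions, and the temperature-softened KL term is shown only for the optional case of unnormalized outputs.

```python
import math
from typing import List, Sequence


def sq_err(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def student_objective(x, x_student, x_teacher, weights, alpha: float, lam: float) -> float:
    """alpha * reconstruction + (1 - alpha) * distillation + L2 penalty (assumed form)."""
    rec = sq_err(x, x_student)          # student versus real input
    kd = sq_err(x_student, x_teacher)   # student versus teacher output
    reg = lam * sum(w * w for w in weights)
    return alpha * rec + (1.0 - alpha) * kd + reg


def soft_probs(logits: Sequence[float], temperature: float) -> List[float]:
    """Softmax of logits / temperature, shifted by the max for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_term(teacher_logits, student_logits, temperature: float) -> float:
    """KL(teacher || student) over temperature-softened distributions."""
    p = soft_probs(teacher_logits, temperature)
    q = soft_probs(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical single-step values.
loss = student_objective([1.0, 2.0], [1.0, 1.0], [1.0, 1.5], [], alpha=0.5, lam=0.0)
same = kl_term([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], temperature=2.0)
diff = kl_term([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], temperature=2.0)
```

A larger `alpha` favours fidelity to the measured data, while a smaller one pulls the student toward the teacher; the claim leaves this balance to be tuned on the validation set.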
9. The safety monitoring method for a hydrogen-electric-oil-gas integrated energy station based on lightweight deep learning as described in claim 1, characterized in that the invocation strategy is customized as follows:
Calculate the margin of the reconstruction error relative to the threshold: $m_t = |e^{(i)}_t - \delta_i|$; then map the margin to a confidence level $c_t$ through a mapping that increases with $m_t$ and has slope coefficient $k$; where $m_t$ is the error margin, $c_t$ is the single-detection credibility, and $k$ is the mapping slope coefficient; the smaller $m_t$ is, the closer the error is to the threshold and the more uncertain the result: when $e^{(i)}_t$ is near $\delta_i$, $m_t$ is small and the credibility $c_t$ is relatively low; when $e^{(i)}_t$ is far from the threshold, $m_t$ is large and the credibility is high;
When an alarm or suspected fault is later confirmed through subsequent maintenance, the model output for that sample is compared with the true label, and the recent accuracy metric is updated using an exponential moving average: $A_i \leftarrow (1 - \rho) A_i + \rho\, \mathbb{1}(\hat{y}_i = y)$; where $A_i$ is the recent accuracy estimate of Mi, $\rho$ is the moving average update coefficient, $\mathbb{1}(\cdot)$ is the indicator function, $y$ is the true label, and $\hat{y}_i$ is the output of Mi;
The overall cost of each candidate invocation action is quantified, and the optimal invocation path is selected on that basis, switching from the current level $i$ to a level $j \in \{1, 2, 3\}$: $J_{i \to j}(t) = w_1 C_{net} + w_2 C_{time} + w_3 C_{risk} + w_4 C_{comp} + w_5 C_{energy}$; where $J_{i \to j}(t)$ is the comprehensive evaluation index at time $t$ for calling level $j$ from the current level $i$; $w_1$, $w_2$, $w_3$, $w_4$, $w_5$ are the weights assigned to each cost item; and $C_{net}$, $C_{time}$, $C_{risk}$, $C_{comp}$, $C_{energy}$ are the network usage cost, time usage cost, risk cost, computing power cost, and electricity usage cost, respectively.
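The margin, credibility, and recent-accuracy bookkeeping described above can be sketched as follows. The source text does not preserve the exact mapping from margin to credibility, so the saturating form `1 - exp(-k * m)` used here is purely an assumption chosen because it increases with the margin; the function names and sample values are likewise hypothetical.

```python
import math


def margin(error: float, threshold: float) -> float:
    """Distance of the reconstruction error from the anomaly threshold."""
    return abs(error - threshold)


def confidence(m: float, k: float) -> float:
    """Assumed mapping: 0 at the threshold, approaching 1 for large margins."""
    return 1.0 - math.exp(-k * m)


def update_accuracy(acc: float, correct: bool, rho: float) -> float:
    """Exponential moving average of the 0/1 correctness indicator."""
    return (1.0 - rho) * acc + rho * (1.0 if correct else 0.0)


# Hypothetical readings for one model with anomaly threshold 0.5.
near = confidence(margin(0.52, 0.5), k=4.0)   # error close to the threshold
far = confidence(margin(1.50, 0.5), k=4.0)    # error far from the threshold
acc_after_hit = update_accuracy(0.9, True, rho=0.1)
acc_after_miss = update_accuracy(0.9, False, rho=0.1)
```

Any other monotone mapping, for example a scaled sigmoid, would serve the same purpose; what matters for the invocation strategy is that samples near the threshold receive low credibility and can be escalated.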
The conditions for switching from M1 to M2 include: the current sample is judged to be an uncertain sample, that is, $c_t < c_{th}$, where $c_{th}$ is the credibility threshold; the deviation between the reconstruction error and the threshold fluctuates more strongly within a preset time period; or the misjudgment rate of M1 is found to increase, its accuracy to decrease, or its error distribution to drift within the preset time window. The final decision on whether to upgrade to model M2 is made based on the comprehensive evaluation index: $J_{1 \to 2}(t) < J_{1 \to 1}(t)$; where $J_{1 \to 2}(t)$ is the comprehensive evaluation index of calling M2 from M1 and $J_{1 \to 1}(t)$ is the comprehensive evaluation index of continuing to use M1; the transition from M1 to M2 is triggered when this condition is met;
The conditions for switching from M1 to M3 include: the network conditions are excellent; the latency is too high, that is, front-end review cannot meet the timeliness requirements; or the front-end equipment is operating at high risk, that is, continuing to monitor at the front end would increase the probability of equipment failure or the consequences of failure. In these situations, the final decision on whether to call the back end directly is made based on the comprehensive evaluation index: $J_{1 \to 3}(t) < J_{1 \to 1}(t)$; where $J_{1 \to 3}(t)$ is the comprehensive evaluation index of calling M3 from M1 and $J_{1 \to 1}(t)$ is the comprehensive evaluation index of continuing to use M1; M1→M3, the transition from the front end to the back end, is triggered when this condition is met;
The conditions for switching from M2 to M3 include: the credibility is still insufficient after review; the accuracy still decreases after review; the operational risk of model M2 deployed at the front end increases, that is, continued front-end monitoring would lengthen the time the device is in use and thereby increase the probability of device failure or its consequences; the computing power cost is too high, that is, the computational cost of M2 is too high; the network conditions are excellent; or the front-end energy consumption cost is too high, that is, the local energy consumption exceeds the budget. When these situations occur, the final decision on whether to call the M3 model at the back end is made based on the comprehensive evaluation index: $J_{2 \to 3}(t) < J_{2 \to 2}(t)$; where $J_{2 \to 3}(t)$ is the comprehensive evaluation index of calling M3 from M2 and $J_{2 \to 2}(t)$ is the comprehensive evaluation index of continuing to use M2; M2→M3, the transition from the front end to the back end, is triggered when this condition is met.
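The final switching decision in this claim compares the comprehensive evaluation index of each candidate level with that of staying at the current level. A minimal sketch follows, assuming the index is a weighted sum of the five cost items and that a switch happens only when a candidate is strictly cheaper; the dictionary keys, function names, and numbers are hypothetical.

```python
from typing import Dict, Tuple

COST_KEYS = ("net", "time", "risk", "comp", "energy")


def overall_cost(costs: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the five cost items for one candidate action."""
    return sum(weights[k] * costs[k] for k in COST_KEYS)


def choose_level(current: int,
                 candidates: Dict[int, Dict[str, float]],
                 weights: Dict[str, float]) -> Tuple[int, Dict[int, float]]:
    """Return the level with the lowest index; ties keep the current level."""
    scores = {lvl: overall_cost(c, weights) for lvl, c in candidates.items()}
    best = current
    for lvl in sorted(scores):
        if scores[lvl] < scores[best]:
            best = lvl
    return best, scores


# Hypothetical costs seen from level 1 (M1) at one time step.
weights = {k: 1.0 for k in COST_KEYS}
candidates = {
    1: {"net": 0.0, "time": 1.0, "risk": 5.0, "comp": 1.0, "energy": 1.0},  # stay on M1
    2: {"net": 0.0, "time": 2.0, "risk": 1.0, "comp": 2.0, "energy": 2.0},  # review on M2
    3: {"net": 3.0, "time": 4.0, "risk": 0.5, "comp": 0.5, "energy": 0.5},  # back-end M3
}
level, scores = choose_level(1, candidates, weights)
```

In this example the high risk cost of staying on M1 makes the M2 review the cheapest action, so the sketch selects level 2. In the claimed method the trigger conditions act as a pre-filter, and the index comparison is applied only once one of them holds.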
10. The safety monitoring method for a hydrogen-electric-oil-gas integrated energy station based on lightweight deep learning as described in claim 1, characterized in that:
The network occupancy cost $C_{net}$ is calculated as: $C_{net} = a_{up} D_{up} + a_{down} D_{down}$; where $D_{up}$ is the amount of data sent; $D_{down}$ is the amount of data returned; and $a_{up}$, $a_{down}$ are the link occupancy factors; the network occupancy usually increases significantly for a cross-layer call to the back end, and can be treated as approximately zero or ignored when the call is made locally at the front end;
The time cost $C_{time}$ is calculated as: $C_{time} = T^{(k)}_{inf} + T_{sched} + \mathbb{1}(\text{cross-layer})\, T_{rtt}$; where $T^{(k)}_{inf}$ is the inference latency of calling model Mk; $T_{sched}$ is the scheduling delay; $T_{rtt}$ is the network round-trip latency; and $\mathbb{1}(\cdot)$ is the indicator function;
The usage time $T_{use}$ is introduced as: $T_{use} \approx C_{time}$; where $T_{use}$ represents the time elapsed from initiating an action to completing the switch, and the time cost $C_{time}$ is used as its estimate;
The computing power cost $C_{comp}$ is calculated as: $C_{comp} = b_{cpu} U^{(k)}_{cpu} + b_{gpu} U^{(k)}_{gpu} + b_{acc} U^{(k)}_{acc}$; where $U^{(k)}_{cpu}$, $U^{(k)}_{gpu}$, $U^{(k)}_{acc}$ are the CPU, GPU, and accelerator usage of the called model Mk, respectively, and $b_{cpu}$, $b_{gpu}$, $b_{acc}$ are the resource conversion factors;
The electricity usage cost $C_{energy}$ is calculated as: $C_{energy} = E^{(k)}_{inf} + \mathbb{1}(\text{cross-layer})\, E_{tx}$; where $E^{(k)}_{inf}$ is the inference energy consumption of Mk, $E_{tx}$ is the energy consumption of data transfer during cross-layer calls, and $\mathbb{1}(\cdot)$ is the indicator function;
The operating risk cost $C_{risk}$ is determined from the following quantities: the status of the front-end device or operating condition; the consequence and risk level of a front-end equipment malfunction; the probability of a front-end failure occurring within the usage time; the weight of the consequences of misjudgment; and the estimated probability of misjudgment of model Mk within the time window;
The network availability metric is computed from the available bandwidth $B$, the round-trip time $RTT$, and the packet loss rate $p_{loss}$; the availability of back-end calls is: $A_{bg} = \mathbb{1}(B \ge B_{min} \wedge RTT \le RTT_{max} \wedge p_{loss} \le p_{max})$; where $A_{bg}$ indicates whether back-end calls are available; $\mathbb{1}(\cdot)$ is the indicator function; $B_{min}$ is the lower limit of bandwidth; $RTT_{max}$ is the upper limit of latency; and $p_{max}$ is the upper limit of the packet loss rate.
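The individual cost items of this claim can be sketched as small functions. The linear forms, the treatment of the round-trip and transfer terms as applying only to cross-layer calls, and every name and number below are assumptions for illustration; the operating risk cost is omitted because its formula is not recoverable from the text.

```python
def network_cost(d_up: float, d_down: float, a_up: float, a_down: float) -> float:
    """Link occupancy: weighted amount of data sent and returned."""
    return a_up * d_up + a_down * d_down


def time_cost(t_inf: float, t_sched: float, t_rtt: float, cross_layer: bool) -> float:
    """Inference plus scheduling delay, plus round trip for cross-layer calls."""
    return t_inf + t_sched + (t_rtt if cross_layer else 0.0)


def compute_cost(u_cpu: float, u_gpu: float, u_acc: float,
                 b_cpu: float, b_gpu: float, b_acc: float) -> float:
    """CPU, GPU, and accelerator usage converted to a common cost unit."""
    return b_cpu * u_cpu + b_gpu * u_gpu + b_acc * u_acc


def energy_cost(e_inf: float, e_tx: float, cross_layer: bool) -> float:
    """Inference energy, plus transfer energy for cross-layer calls."""
    return e_inf + (e_tx if cross_layer else 0.0)


def backend_available(bandwidth: float, rtt: float, loss: float,
                      b_min: float, rtt_max: float, loss_max: float) -> int:
    """1 if bandwidth, latency, and packet loss all satisfy their limits."""
    return int(bandwidth >= b_min and rtt <= rtt_max and loss <= loss_max)


# Hypothetical local call (M2) versus back-end call (M3).
local_time = time_cost(t_inf=2.0, t_sched=0.5, t_rtt=8.0, cross_layer=False)
remote_time = time_cost(t_inf=1.0, t_sched=0.5, t_rtt=8.0, cross_layer=True)
ok = backend_available(50.0, 20.0, 0.01, b_min=10.0, rtt_max=100.0, loss_max=0.05)
```

Keeping each cost item as a separate function makes it straightforward to plug the results into the weighted comprehensive evaluation index and to recalibrate a single item without touching the others.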