Deep learning prediction method and apparatus for voltage extremum difference of battery system

By using the BiGRU-Transformer-LSTM model to predict the voltage range of the battery system through deep learning, the problem of unpredictable voltage range trends in the battery system is solved, thereby improving the safety and reliability of the battery system and extending the service life of the energy storage system.

WO2026138527A1 · PCT designated stage · Publication Date: 2026-07-02 · CHINA ELECTRIC POWER RESEARCH INSTITUTE CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: CHINA ELECTRIC POWER RESEARCH INSTITUTE CO LTD
Filing Date: 2025-12-11
Publication Date: 2026-07-02

Application Information

Patent Timeline

11 Dec 2025: Application

02 Jul 2026: Publication (WO2026138527A1)

IPC: G06F30/27

AI Tagging

Technology Topics

Electrical battery · Battery system

AI Technical Summary

Technical Problem

Existing technologies cannot accurately predict future trends in battery system voltage differences, leading to increased inconsistency in battery packs and impacting system safety and lifespan.

Method used

A BiGRU-Transformer-LSTM model is used to perform deep learning prediction on battery operation data. By collecting battery operation data of battery clusters, the BiGRU layer is used to capture bidirectional dependency information, the Transformer layer is used to perform global feature modeling, and the LSTM layer is used to extract time dependency relationship, and finally the voltage range prediction result is generated.

Benefits of technology

It enables accurate prediction of the voltage difference of the battery system, improves the safety and reliability of the battery system, and extends the service life of the energy storage system.

✦ Generated by Eureka AI based on patent content.

Abstract

Provided are a deep learning prediction method and apparatus for a voltage extremum difference of a battery system. The method comprises: collecting battery operation data of a battery cluster, the battery operation data comprising battery cell extremum difference data, battery cluster charging and discharging power data, and battery cluster total voltage data; and inputting the battery operation data into a pre-trained deep learning prediction model for prediction, so as to obtain a voltage extremum difference prediction result of the battery cluster within a preset future time, wherein the pre-trained deep learning prediction model is a BiGRU-Transformer-LSTM model.

Description

Deep learning prediction method and device for battery system voltage range

[0001] This disclosure claims priority to Chinese patent application No. 202411902766.9, filed on December 23, 2024, the entire contents of which are incorporated herein by reference.

Technical Field

[0002] This disclosure relates to the field of energy storage battery technology, and in particular to a deep learning prediction method and apparatus for battery system voltage differences.

Background Technology

[0003] Currently, battery systems are widely used in electric vehicles, energy storage power stations, and other systems, and their safe and reliable operation is of great concern. During actual charging and discharging, the individual cells that make up a battery system will exhibit voltage differences due to individual variations in manufacturing processes, ambient temperature, and other factors. The difference between the maximum and minimum voltages of all individual cells in a battery pack at a given moment is called the voltage range.
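The definition of voltage range in [0003] can be illustrated with a minimal sketch. The function name and the sample voltages are hypothetical and chosen only for illustration; they do not come from the disclosure.

```python
def voltage_range(cell_voltages):
    """Return max minus min of the cell voltages sampled at one moment."""
    if not cell_voltages:
        raise ValueError("at least one cell voltage is required")
    return max(cell_voltages) - min(cell_voltages)


# Hypothetical snapshot of four cell voltages, in volts.
snapshot = [3.312, 3.305, 3.298, 3.320]
spread = voltage_range(snapshot)
```

Computed at every sampling instant, this quantity forms the time series that the method treats as its prediction target.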

[0004] Technically, abnormal fluctuations in voltage range exacerbate battery pack inconsistencies, impacting system safety and lifespan. However, the future trend of battery system voltage range changes cannot be obtained from the Battery Management System (BMS). Therefore, a method is urgently needed to accurately predict the trend of battery system voltage range changes, enabling early assessment and risk warning of the battery system's operating status.

Summary of the Invention

[0005] This disclosure provides a deep learning method and apparatus for predicting voltage range in a battery system.

[0006] In a first aspect, a deep learning prediction method for the voltage range of a battery system is provided, the method comprising the following steps.

[0007] Collect battery operation data of the battery cluster; the battery operation data includes individual cell range data, battery cluster charge and discharge power data, and battery cluster total voltage data.

[0008] The battery operating data is input into a pre-trained deep learning prediction model to predict the voltage range of the battery cluster within a preset time period.

[0009] The pre-trained deep learning prediction model is a BiGRU-Transformer-LSTM model.

[0010] The BiGRU-Transformer-LSTM model comprises a BiGRU layer, a Transformer layer, an LSTM layer, a fully connected layer, and an activation function connected in sequence.

[0011] The BiGRU layer is used to encode the input time series data in both forward and reverse directions, capture bidirectional dependency information, and output feature sequences.

[0012] The Transformer layer receives the output of the BiGRU layer, uses a multi-head self-attention mechanism to perform global feature modeling on all positions of the time series, and outputs the feature sequence after global feature modeling.

[0013] The LSTM layer is a two-layer LSTM layer, consisting of a first LSTM layer and a second LSTM layer. The first LSTM layer is used to further extract the temporal dependencies of the feature sequence after global feature modeling and output the time step features after processing by the first LSTM layer. The second LSTM layer is used to receive the output of the first LSTM layer and combine it with the long-term dependency features to generate the final output features.

[0014] The fully connected layer and the activation function are used to map the final output features of the LSTM layer to the prediction result.

[0015] In some embodiments, the pre-trained deep learning prediction model is obtained through the following steps.

[0016] Obtain historical datasets of battery clusters in energy storage power stations. The historical datasets include: individual cell range data, battery cluster charge and discharge power data, and battery cluster total voltage data. Divide the historical datasets into N parts according to time order to obtain N subsets of historical data.

[0017] Use N-1 subsets of the N historical data sets as the training set and the remaining 1 subset as the validation and test set; establish an identical initial deep learning prediction model for each of the N-1 historical data subsets in the training set.

[0018] For each subset of historical data in the training set, an initial deep learning prediction model is trained iteratively to obtain N-1 trained initial deep learning prediction models.

[0019] Each network parameter in the N-1 trained initial deep learning prediction models is weighted and averaged according to the performance on the validation set. The weights are determined by the inverse ratio of the prediction error of each trained initial deep learning prediction model on the validation set. The weighted average parameter values are then assigned to the deep learning prediction model to generate a pre-trained deep learning prediction model.
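One possible reading of the inverse-error weighting in [0019] is written out below. The symbols are assumptions introduced here for illustration: e_i > 0 is the validation error of the i-th trained model, θ_i is its parameter vector, and w_i is its normalized weight. The disclosure does not state the normalization explicitly.

```latex
w_i = \frac{1 / e_i}{\sum_{j=1}^{N-1} 1 / e_j},
\qquad
\theta = \sum_{i=1}^{N-1} w_i \, \theta_i,
\qquad
\sum_{i=1}^{N-1} w_i = 1
```

Under this reading, a model with half the validation error of another receives twice its weight in the averaged parameters.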

[0020] In some embodiments, both the first LSTM layer and the second LSTM layer include a gating mechanism.

[0021] In some embodiments, the BiGRU-Transformer-LSTM model further includes a dynamic smoothing layer, which is used to add dynamic smoothing processing after the fully connected layer and activation function to further eliminate noise in the prediction results and optimize the stability of the prediction curve to obtain the final prediction results.

[0022] In some embodiments, the activation function uses the LeakyReLU activation function.

[0023] In some embodiments, the BiGRU layer is used to receive input time series data, perform forward and reverse encoding, capture the bidirectional dependency information of the time series, and output a feature sequence $H_{\mathrm{BiGRU}}$. The input time series data is $X = \{x_1, x_2, \ldots, x_T\}$, where $T$ represents the number of time steps and $x_t$ is the input feature vector of the t-th time step.

[0024] $H_{\mathrm{BiGRU}} = [h_1, h_2, \ldots, h_T]$

[0025] where $h_t$ is the bidirectional feature representation generated at time step t;

[0026] The Transformer layer receives the feature sequence $H_{\mathrm{BiGRU}}$ output from the BiGRU layer, uses a multi-head self-attention mechanism to perform global feature modeling on all positions of the time series, and outputs the feature sequence $H_{\mathrm{Transformer}}$ after global feature modeling;

[0027] where each element of $H_{\mathrm{Transformer}}$ is the global feature representation at time step t after processing by the Transformer layer;

[0028] The first LSTM layer is used to receive the feature sequence $H_{\mathrm{Transformer}}$ from the Transformer layer, further extract the time dependencies of the feature sequence after global feature modeling, and output the time step features $L_1$ after the first LSTM layer: $L_1 = \mathrm{LSTM}_{L1}(H_{\mathrm{Transformer}})$;

[0029] The second LSTM layer receives the output $L_1$ of the first LSTM layer and combines it with long-term dependency features to generate the final output feature $L_2$: $L_2 = \mathrm{LSTM}_{L2}(L_1)$.

[0030] In some embodiments, the prediction results are generated by a fully connected layer and the LeakyReLU activation function: $Y = \mathrm{LeakyReLU}(\mathrm{FC}(L_2))$.
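The layer sequence in [0023] to [0030] could be sketched in PyTorch roughly as follows. This is an illustrative sketch, not the applicant's implementation: the class name, hidden sizes, number of attention heads, prediction horizon, the use of a single encoder layer, and the choice to predict from the last time step are all assumptions made here.

```python
import torch
from torch import nn


class BiGRUTransformerLSTM(nn.Module):
    """Illustrative stack: BiGRU -> Transformer encoder -> two LSTMs -> FC -> LeakyReLU."""

    def __init__(self, n_features=3, hidden_gru=32, n_heads=4,
                 hidden_lstm=64, horizon=5):
        super().__init__()
        # Bidirectional GRU: output width is 2 * hidden_gru (forward + backward).
        self.bigru = nn.GRU(n_features, hidden_gru, batch_first=True,
                            bidirectional=True)
        d_model = 2 * hidden_gru
        # One encoder layer with multi-head self-attention over all time steps.
        self.transformer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            batch_first=True)
        self.lstm1 = nn.LSTM(d_model, hidden_lstm, batch_first=True)
        self.lstm2 = nn.LSTM(hidden_lstm, hidden_lstm, batch_first=True)
        self.fc = nn.Linear(hidden_lstm, horizon)
        self.act = nn.LeakyReLU()

    def forward(self, x):
        h_bigru, _ = self.bigru(x)            # (batch, T, 2 * hidden_gru)
        h_trans = self.transformer(h_bigru)   # (batch, T, 2 * hidden_gru)
        l1, _ = self.lstm1(h_trans)           # (batch, T, hidden_lstm)
        l2, _ = self.lstm2(l1)                # (batch, T, hidden_lstm)
        # Assumption: predict from the features of the last time step.
        return self.act(self.fc(l2[:, -1, :]))  # (batch, horizon)


model = BiGRUTransformerLSTM()
x = torch.randn(2, 24, 3)   # 2 windows, 24 time steps, 3 input features
y = model(x)
```

The three input features stand for the cell voltage range, cluster charge and discharge power, and cluster total voltage named in the disclosure; `y` holds one predicted value per future step for each window.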

[0031] In a second aspect, a deep learning prediction device for battery system voltage range is provided, the device including a data acquisition module and a prediction module.

[0032] The data acquisition module is configured to acquire battery operating data of the battery cluster; the battery operating data includes individual cell range data, battery cluster charge and discharge power data, and battery cluster total voltage data.

[0033] The prediction module is configured to input the battery operating data into a pre-trained deep learning prediction model to predict the voltage range of the battery cluster within a preset time period.

[0034] The pre-trained deep learning prediction model is a BiGRU-Transformer-LSTM model.

[0035] The BiGRU-Transformer-LSTM model comprises a BiGRU layer, a Transformer layer, an LSTM layer, a fully connected layer, and an activation function connected in sequence.

[0036] The BiGRU layer is used to encode the input time series data in both forward and reverse directions, capture bidirectional dependency information, and output feature sequences.

[0037] The Transformer layer receives the output of the BiGRU layer, uses a multi-head self-attention mechanism to perform global feature modeling on all positions of the time series, and outputs the feature sequence after global feature modeling.

[0038] The LSTM layer is a two-layer LSTM layer, consisting of a first LSTM layer and a second LSTM layer. The first LSTM layer is used to further extract the temporal dependencies of the feature sequence after global feature modeling and output the time step features after processing by the first LSTM layer. The second LSTM layer is used to receive the output of the first LSTM layer and combine it with the long-term dependency features to generate the final output features.

[0039] The fully connected layer and the activation function are used to map the final output features of the LSTM layer to the prediction result.

[0040] In some embodiments, the pre-trained deep learning prediction model is obtained through the following steps.

[0041] Obtain historical datasets of battery clusters in energy storage power stations. The historical datasets include: individual cell range data, battery cluster charge and discharge power data, and battery cluster total voltage data. Divide the historical datasets into N parts according to time order to obtain N subsets of historical data.

[0042] Use N-1 subsets of the N historical data sets as the training set and the remaining 1 subset as the validation and test set; establish an identical initial deep learning prediction model for each of the N-1 historical data subsets in the training set.

[0043] For each subset of historical data in the training set, an initial deep learning prediction model is trained iteratively to obtain N-1 trained initial deep learning prediction models.

[0044] Each network parameter in the N-1 trained initial deep learning prediction models is weighted and averaged according to the performance on the validation set. The weights are determined by the inverse ratio of the prediction error of each trained initial deep learning prediction model on the validation set. The weighted average parameter values are then assigned to the deep learning prediction model to generate a pre-trained deep learning prediction model.

[0045] In some embodiments, both the first-layer LSTM and the second-layer LSTM include a gating mechanism.

[0046] In some embodiments, the BiGRU-Transformer-LSTM model further includes a dynamic smoothing layer, which is used to add dynamic smoothing processing after the fully connected layer and activation function to further eliminate noise in the prediction results and optimize the stability of the prediction curve to obtain the final prediction results.

[0047] In some embodiments, the activation function uses the LeakyReLU activation function.

[0048] In some embodiments, the BiGRU layer is used to receive input time series data, perform forward and reverse encoding, capture the bidirectional dependency information of the time series, and output a feature sequence $H_{\mathrm{BiGRU}}$. The input time series data is $X = \{x_1, x_2, \ldots, x_T\}$, where $T$ represents the number of time steps and $x_t$ is the input feature vector of the t-th time step; $H_{\mathrm{BiGRU}} = [h_1, h_2, \ldots, h_T]$

[0049] where $h_t$ is the bidirectional feature representation generated at time step t;

[0050] The Transformer layer receives the feature sequence $H_{\mathrm{BiGRU}}$ output from the BiGRU layer, uses a multi-head self-attention mechanism to perform global feature modeling on all positions of the time series, and outputs the feature sequence $H_{\mathrm{Transformer}}$ after global feature modeling;

[0051] where each element of $H_{\mathrm{Transformer}}$ is the global feature representation at time step t after processing by the Transformer layer;

[0052] The first LSTM layer is used to receive the feature sequence $H_{\mathrm{Transformer}}$ from the Transformer layer, further extract the time dependencies of the feature sequence after global feature modeling, and output the time step features $L_1$ after the first LSTM layer: $L_1 = \mathrm{LSTM}_{L1}(H_{\mathrm{Transformer}})$;

[0053] The second LSTM layer receives the output $L_1$ of the first LSTM layer and combines it with long-term dependency features to generate the final output feature $L_2$: $L_2 = \mathrm{LSTM}_{L2}(L_1)$.

[0054] In some embodiments, the prediction results are generated by a fully connected layer and the LeakyReLU activation function: $Y = \mathrm{LeakyReLU}(\mathrm{FC}(L_2))$.

[0055] In a third aspect, an electronic device is provided, the electronic device including a processor and a memory, the processor being configured to execute a computer program stored in the memory to implement the deep learning prediction method for the voltage range of the battery system.

[0056] In a fourth aspect, a computer-readable storage medium is provided, the computer-readable storage medium storing at least one instruction which, when executed by a processor, implements the deep learning prediction method for the voltage range of the battery system.

Description of the Drawings

[0057] To more clearly illustrate the technical solutions in this disclosure, the accompanying drawings used in some embodiments of this disclosure will be briefly described below. Obviously, the drawings described below are merely drawings of some embodiments of this disclosure, and those skilled in the art can obtain other drawings based on these drawings.

[0058] Figure 1 is a flowchart of a deep learning prediction method for battery system voltage range according to an embodiment of the present disclosure.

[0059] Figure 2 is a diagram illustrating the training process of a BiGRU-Transformer-LSTM model according to an embodiment of the present disclosure.

[0060] Figure 3 is a flowchart of another deep learning prediction method for battery system voltage range according to an embodiment of the present disclosure.

[0061] Figure 4 is a comparison of prediction results of different neural network models according to the embodiments of this disclosure.

[0062] Figure 5 is a block diagram of a deep learning prediction device for battery system voltage range according to an embodiment of the present disclosure.

[0063] Figure 6 is a structural diagram of an electronic device according to the present disclosure.

Detailed Implementation

[0064] To enable those skilled in the art to better understand the technical solutions of the embodiments of this disclosure, the technical solutions of this disclosure will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this disclosure, and not all embodiments. Based on the embodiments of this disclosure, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this disclosure.

[0065] In the description of this disclosure, unless otherwise stated, the words "first," "second," etc. do not limit quantity or order of execution, and do not necessarily indicate that the items they modify are different.

[0066] The slash " / " means "or". For example, A / B can mean either A or B. In this disclosure, "and / or" simply describes the relationship between related objects and covers three cases. For example, A and / or B can mean: only A, only B, or both A and B.

[0067] The present disclosure will now be described in detail with reference to the accompanying drawings and embodiments. It should be noted that, unless otherwise specified, the embodiments and features described herein can be combined with each other.

[0068] The following detailed description is exemplary and intended to provide further detailed explanation of this disclosure. Unless otherwise specified, all technical terms used in this disclosure have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. The terminology used in this disclosure is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to this disclosure.

[0069] In some technologies, when the number of individual battery cells is large, obtaining the voltage difference relies heavily on the hardware and software system and is inevitably affected by noise and interference signals. In such cases, if balancing and protection actions are performed only after the measured battery system voltage difference exceeds the limit, the system protection requirements will be far from met. Furthermore, in practice the BMS voltage balancing function (including active and passive balancing strategies) is not executed frequently; it is generally performed at specific stages, such as the final stage of charging. Frequent and excessive fluctuations in the battery voltage difference, and frequent limit violations, threaten the safe and reliable operation of the battery system. Therefore, predicting and assessing voltage difference fluctuations in the battery system, so that risks can be anticipated and controlled in advance, can effectively ensure the safe operation of the battery system.

[0070] Predicting voltage range is inseparable from predicting the voltage of individual cells in the battery system. However, numerous factors influence the voltage variation of individual cells, including charge / discharge power, total voltage, temperature, impedance, State of Charge (SOC), and State of Health (SOH). Therefore, predicting these voltage ranges using traditional modeling methods is extremely difficult.

[0071] Data-driven prediction techniques have become the preferred method for voltage range prediction. The most common data-driven technique is deep learning (DL). Deep learning's powerful learning capabilities enable it to better process time-series data and understand battery dynamics, and it has been successfully applied to the field of energy storage battery state prediction, including estimations of SOH, Remaining Useful Life (RUL), and SOC. However, deep learning algorithms still have some challenges in battery state prediction, such as prediction stability, generalization ability, and transferability, making online applications difficult.

[0072] To address the issue of voltage range variation in battery systems, this disclosure provides a deep learning method for predicting battery system voltage range variation. Battery system operating data is preprocessed, and features are engineered with the voltage range variation as the prediction target. The data features are divided into N parts, with N-1 parts used for model training and the remaining part used for validation and testing. In each training round, a deep learning prediction model is constructed, and the optimal parameters of the model are obtained through an optimization algorithm. A corresponding library of voltage range variation prediction models is thereby trained. By intelligently selecting and calling model files, online multi-step prediction is achieved, and a comprehensive evaluation of the battery system voltage consistency status can be performed.

[0073] This disclosure presents a corresponding prediction device, which includes modules for data acquisition and transmission, data preprocessing and storage, and data analysis and status assessment. This disclosure provides high accuracy and generalization ability in predicting voltage range differences, helping to extend the service life of various energy storage systems and improve their safety and reliability.

[0074] This disclosure provides a deep learning prediction method for battery system voltage range, belonging to the field of energy storage battery technology. The method includes the following steps.

[0075] 1. Obtain historical data of the battery pack (cluster); the historical data includes at least: charge and discharge power, total voltage, and voltage range. Preprocess the historical data of the battery pack (cluster), engineer features around the target value, and establish a deep learning prediction model with the charge and discharge power and total voltage of the battery pack (cluster) as input and the voltage range as the prediction target.

[0076] 2. Based on the characteristics of the operational data (including historical charge / discharge power, total voltage, and time cycle characteristics), the historical data of the battery pack (cluster) is divided into N parts. N-1 parts are used for training the deep learning prediction model, and the remaining part is used as a validation and test set. A total of N rounds of training are conducted to obtain N model files. Simultaneously, during the N rounds of model training, the network model parameters are continuously adjusted to achieve the optimal prediction effect, and the corresponding models are saved.

[0077] 3. Target prediction is performed using N models that comprehensively reflect different data characteristics. Based on the current measured voltage range information, the model input and other adjustable parameters are continuously adjusted to achieve rolling optimization of the prediction model parameters, ultimately enabling the prediction of voltage range over a future period. On this basis, a comprehensive scoring mechanism is constructed by combining historical voltage range data, standard deviation, and target predicted values to achieve voltage consistency assessment of battery clusters.

[0078] This method has good prediction accuracy and generalization ability, and can quickly and accurately predict voltage range for decision-making reference, which helps to extend the service life of energy storage systems and improve their safety and reliability.

[0079] In some embodiments, as shown in Figures 1 and 2, this disclosure provides a deep learning prediction method for battery system voltage range, the method comprising steps 1 to 4.

[0080] In step 1: Historical data is preprocessed and divided into N parts. A historical dataset of the energy storage power station's battery clusters is obtained, including: individual cell range data, battery cluster charge / discharge power data, and battery cluster total voltage data. The obtained data on individual cell range, battery cluster charge / discharge power, and battery cluster total voltage are preprocessed to obtain a preprocessed historical dataset.

[0081] In some embodiments, preprocessing may be implemented by removing duplicate data, handling missing values, and detecting and handling outliers. However, those skilled in the art will understand that other methods can also be used to preprocess the raw data.

[0082] In some embodiments, deduplication can be achieved by identifying duplicate records using timestamps and historical data values, and then deleting the duplicate data.

[0083] In some embodiments, missing values can be handled by using linear interpolation to fill in missing data points.

[0084] In some embodiments, outlier detection and handling can be achieved by applying the Z-score method to identify numerical outliers and then adjusting or deleting them, which reduces noise and the risk of misleading analysis.
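The three preprocessing operations in [0082] to [0084] can be sketched with the Python standard library. The function names, the (timestamp, value) record layout, the use of `None` for missing points, and the Z-score threshold of 3.0 are assumptions made for illustration; the disclosure does not specify them.

```python
from statistics import mean, pstdev


def deduplicate(records):
    """Drop records whose (timestamp, value) pair was already seen."""
    seen, unique = set(), []
    for ts, value in records:
        if (ts, value) not in seen:
            seen.add((ts, value))
            unique.append((ts, value))
    return unique


def interpolate_missing(values):
    """Fill None entries by linear interpolation between known neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled  # leading or trailing gaps are left as None


def zscore_outliers(values, threshold=3.0):
    """Return indices whose Z-score magnitude exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


records = [(0, 10.0), (0, 10.0), (1, None), (2, 14.0)]
unique = deduplicate(records)
series = interpolate_missing([v for _, v in unique])
```

Whether a flagged outlier is adjusted or deleted is left to the caller here, since the disclosure allows either.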

[0085] In some embodiments, considering the time-series characteristics of the feature quantity, the preprocessed historical data can be divided into N parts according to time order to ensure that each part of the data is continuous in time.

[0086] In step 2: Prediction model construction. N-1 subsets of the N historical data are used as the training set for training the prediction model, and the remaining 1 subset is used as the validation and test set. An initial deep learning prediction model is built for each of the N-1 historical data subsets in the training set, resulting in N-1 identical initial deep learning prediction models.
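The time-ordered split from [0085] and the training and hold-out roles from step 2 could look like the sketch below. Holding out the last part is an assumption; the disclosure only says that the remaining subset serves for validation and testing. The function name is hypothetical.

```python
def split_by_time(samples, n_parts):
    """Split time-ordered samples into n_parts contiguous chunks."""
    if not 1 <= n_parts <= len(samples):
        raise ValueError("n_parts must be between 1 and len(samples)")
    base, extra = divmod(len(samples), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # Spread any remainder over the first `extra` chunks.
        size = base + (1 if i < extra else 0)
        parts.append(samples[start:start + size])
        start += size
    return parts


samples = list(range(10))          # stand-in for time-ordered records
parts = split_by_time(samples, 4)
train_parts, holdout = parts[:-1], parts[-1]
```

Slicing without shuffling keeps every part continuous in time, which is the property [0085] asks for.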

[0087] In step 3: Model training and validation. An initial deep learning prediction model is iteratively trained based on each subset of historical data in the training set until the preset maximum number of training iterations or the loss function converges, completing the training. After successful validation using validation and test sets, a trained initial deep learning prediction model is obtained. N-1 trained initial deep learning prediction models are obtained by training using N-1 subsets of historical data.

[0088] In step 4: Model transfer learning and model parameter self-update. Each network parameter in the N-1 trained initial deep learning prediction models is weighted and averaged based on validation set performance. The weights are inversely proportional to the prediction error of each model on the validation set. The weighted average parameter values are then assigned to the deep learning prediction model to generate the pre-trained deep learning prediction model.
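A minimal sketch of the parameter averaging in step 4 follows, assuming each trained model is represented as a dictionary of flattened parameter lists with identical keys and lengths. The function names, the sample parameters, and the error values are made up for illustration, and normalizing the inverse errors to sum to 1 is one interpretation of "inversely proportional".

```python
def inverse_error_weights(errors):
    """Weights proportional to 1/error, normalized to sum to 1."""
    if any(e <= 0 for e in errors):
        raise ValueError("validation errors must be positive")
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def average_parameters(models, errors):
    """Weighted average of same-shaped parameter lists, keyed by name."""
    weights = inverse_error_weights(errors)
    merged = {}
    for name in models[0]:
        # Group the i-th entry of this parameter across all models.
        columns = zip(*(m[name] for m in models))
        merged[name] = [sum(w * p for w, p in zip(weights, col))
                        for col in columns]
    return merged


# Two hypothetical trained models with flattened parameters.
models = [{"fc.weight": [1.0, 2.0]}, {"fc.weight": [3.0, 6.0]}]
errors = [0.1, 0.3]                 # validation errors (made-up values)
merged = average_parameters(models, errors)
```

With these numbers the first model, whose error is one third of the second's, receives weight 0.75 and the second 0.25.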

[0089] Utilizing the latest battery operation data (individual cell voltage range data, battery cluster charge / discharge power data, and battery cluster total voltage data) from the energy storage power station's battery clusters, a transfer learning mechanism is used to optimize the parameters of a pre-trained deep learning prediction model. The final prediction model's structure and weights are dynamically adjusted to adapt to the temporal characteristics of voltage range fluctuations. Based on this, and combining operational data features (including historical charge / discharge power, total voltage, and time period characteristics), multi-step predictions of future voltage range values are completed, ultimately achieving rolling model updates and dynamic evaluation of the battery system's voltage consistency.
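Multi-step prediction of this kind is commonly set up by pairing a window of past observations with the next several target values. The sketch below shows one such construction; the function name, the window lengths, and the sample rows are hypothetical, and the disclosure does not state how its samples are built.

```python
def make_windows(features, target, lookback, horizon):
    """Pair each lookback-step input window with the next horizon targets."""
    inputs, outputs = [], []
    last_start = len(features) - lookback - horizon
    for start in range(last_start + 1):
        end = start + lookback
        inputs.append(features[start:end])
        outputs.append(target[end:end + horizon])
    return inputs, outputs


# Hypothetical rows: (cell voltage range, cluster power, cluster total voltage)
features = [(0.02, 50.0, 760.0), (0.03, 52.0, 761.0), (0.03, 49.0, 759.0),
            (0.04, 51.0, 762.0), (0.05, 53.0, 763.0)]
target = [row[0] for row in features]
X, Y = make_windows(features, target, lookback=3, horizon=2)
```

In a rolling update, the newest measurements would be appended to `features` and the windows rebuilt before the model parameters are fine-tuned.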

[0090] The embodiments disclosed herein improve the generalization ability of the model through weighted averaging. The relationship between the time-series features and the prediction target changes with complex operating conditions, so models trained on different time-ordered subsets capture different conditions. Combining them through the N-fold partitioning and weighted averaging described above therefore improves the model's generalization ability.

[0091] The deep learning prediction model in some embodiments of this disclosure can be implemented using a BiGRU-Transformer-LSTM model. This BiGRU-Transformer-LSTM model is not a single, standardized deep learning model, but rather a hybrid model proposed in this disclosure to address technical problems. It combines three different architectures: a Bidirectional Gated Recurrent Unit (BiGRU), a Transformer model, and a Long Short-Term Memory (LSTM) network. This hybrid model leverages the strengths of each component to solve specific tasks. The following is an introduction to these three components and how they can be combined.

[0092] BiGRU is a variant of GRU (Gated Recurrent Unit) that introduces a bidirectional structure to better capture the bidirectional dependencies in sequence data. It consists of two independent GRU units: one processes the data in the forward direction of the time series, and the other processes it in the reverse direction. Through this bidirectional structure, BiGRU can simultaneously capture both forward and backward information from the sequence data, thereby enhancing its ability to understand and predict patterns within the sequence.

[0093] The Transformer is a model based on a self-attention mechanism. It abandons the recurrent structure of Recurrent Neural Networks (RNNs) and relies entirely on self-attention to process sequential data. The Transformer can simultaneously focus on all positions in a sequence, thus capturing global information and effectively capturing long-term dependencies. Furthermore, the Transformer has strong parallelization capabilities, which can significantly improve the training speed of the model. This has enabled the Transformer to achieve remarkable results in tasks such as natural language processing (e.g., machine translation).

[0094] LSTM is a special type of recurrent neural network (RNN) designed to address the vanishing and exploding gradient problems encountered by traditional RNNs when processing long sequences. LSTM uses gating mechanisms (including input gates, forget gates, and output gates) to control the flow of information, thereby effectively capturing long-term dependencies. This makes LSTM perform well when processing long sequences of data.

[0095] The deep learning prediction model in some embodiments of this disclosure combines three architectures—BiGRU, Transformer, and LSTM—to construct a hierarchical time series prediction model. First, the BiGRU layer extracts local information from the sequence through forward and backward encoding, capturing bidirectional dependencies. Next, the Transformer layer utilizes a multi-head self-attention mechanism to generate global feature representations, enabling parallel processing of all positions in the sequence and capturing long-range dependencies. Then, a two-layer LSTM layer further processes the global features, capturing long-term dependencies and enhancing the model's memory capacity and robustness. Next, a fully connected layer maps the output of the LSTM layer to the target prediction space, while simultaneously enhancing nonlinear expressiveness through the LeakyReLU activation function. Finally, a dynamic smoothing layer (using a moving average method) smooths the output to eliminate short-term fluctuations in the prediction, providing more stable and accurate prediction results. This multi-layered, comprehensive architecture enables the model to better capture multidimensional information in time series data, improving prediction accuracy and robustness.
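The moving-average smoothing mentioned for the final layer can be sketched as a trailing average over the predicted sequence. The fixed window of 3 and the handling of the first few points are assumptions; the disclosure calls the layer "dynamic" without specifying how the window is chosen.

```python
def moving_average(values, window=3):
    """Trailing moving average; early points average what is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


raw = [1.0, 2.0, 6.0, 2.0, 1.0]    # made-up prediction sequence
smooth = moving_average(raw, window=3)
```

A larger window suppresses more short-term fluctuation at the cost of responding more slowly to genuine changes in the voltage range.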

[0096] In some embodiments, the BiGRU-Transformer-LSTM model includes the following layers connected in sequence.

[0097] 1) BiGRU Layer: Used to encode the input time series data in both forward and backward directions to capture bidirectional dependency information and output a feature sequence. The output feature sequence contains both past and future relationship features of the time series, improving the ability to understand sequence patterns. The feature sequence output by the BiGRU layer encodes the local temporal dependencies of the time series data, providing contextual information of the time series data for the Transformer and LSTM. As a bidirectional recurrent neural network (RNN), the BiGRU layer helps capture past and future dependencies through forward and backward processing; while the Transformer and LSTM are responsible for capturing global features and long-term dependencies, respectively. The entire process forms a hierarchical and progressive feature extraction and processing flow, thereby enhancing the model's predictive ability for time series data.

[0098] 2) Transformer Layer: This layer receives the output of the BiGRU layer and uses a multi-head self-attention mechanism to model global features across all positions in the time series, outputting the Transformer global features. The self-attention mechanism captures long-range dependencies and stabilizes these features through residual connections and normalization. Powerful parallel computing capabilities significantly improve training efficiency and enhance the ability to represent features in complex data. The output of the Transformer layer is the global feature representation after modeling features across all positions in the time series using the multi-head self-attention mechanism. This ultimately generates a tensor of (batch_size, sequence_length, hidden_size_gru*2) dimensions. This output incorporates global contextual information, enabling the model to capture long-term dependencies and global relationships between different time steps in the time series within the features of each time step. This process allows the model to more comprehensively understand complex patterns in time series data and provides richer input features for subsequent LSTM layers.

[0099] 3) LSTM Layer: The LSTM layer is a two-layer LSTM layer, consisting of a first LSTM layer and a second LSTM layer. The first LSTM layer is used to further extract the temporal dependencies of the global features of the Transformer. The second LSTM layer is used to combine long-term dependency features to generate smooth and memory-capable output features. Each LSTM layer includes a gating mechanism to enhance robustness to noise and complex patterns. The role of the first LSTM layer is to further extract and model the local temporal dependencies of the time series from the global features output by the Transformer. It not only focuses on short-term changes and dynamics in the sequence but also captures the gradually evolving trends, providing more detailed and ordered feature information for the subsequent second LSTM layer, ultimately helping to generate more accurate output features. Long-term dependency features refer to patterns and trends that the model needs to understand and remember over a longer time span. Usually, this dependency goes beyond the scope of local information and requires the model to recall and integrate information from long-term series. In this model, the formation of long-term dependency features mainly depends on the following aspects.

[0100] Global features of Transformer: Although Transformer primarily focuses on modeling global features, its multi-head self-attention mechanism can capture long-range dependencies between different time steps, i.e., relationships spanning long periods of time. These global features contain patterns and regularities in time series that may span long periods of time, such as seasonal variations and long-term trends.

[0101] The first layer of LSTM handles local dependencies: The first layer of LSTM captures the gradual evolution of information by processing the local dependencies of the time series (i.e., short-term and medium-term dynamic changes), but it does not overemphasize long-term trends. The feature sequence it outputs is a gradual transition from local dependencies to medium-term dependencies, and these features themselves may contain some inherent relationships from short-term to medium-term.

[0102] The long-term memory capability of the second-layer LSTM: The second-layer LSTM receives the output features from the first-layer LSTM and continues to process these features, focusing on dependencies over longer time spans. LSTMs possess strong long-term memory capabilities, particularly in their cellular and hidden states, retaining and updating information accumulated over time. This information can include long-term periodic trends, long-term change patterns, and slowly evolving fluctuations in time series.

[0103] 4) Fully connected layers and activation functions: These are used to map the final output of the LSTM layer to the prediction result. The LeakyReLU activation function is used to handle nonlinear relationships and improve the model's adaptability to diverse inputs.

[0104] 5) Dynamic smoothing layer: Used to add dynamic smoothing processing after the prediction output layer (fully connected layer and activation function) to further eliminate noise in the prediction results and optimize the smoothness of the prediction curve to obtain the final prediction result.
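The moving-average smoothing applied after the prediction output layer can be sketched as follows. This is a minimal sketch assuming a trailing window of fixed length; the window length, the handling of the first few points, and the function name are assumptions, since the disclosure does not specify them.

```python
def moving_average(values, window=3):
    """Trailing moving average; early points average over the samples seen so far."""
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the sample that left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed

# Hypothetical raw predictions of the voltage range (arbitrary units).
raw = [10.0, 14.0, 9.0, 15.0, 11.0]
smooth = moving_average(raw, window=3)
```

A trailing window only uses past and current predictions, so it can be applied online; a centered window would smooth more symmetrically but needs future values.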

[0105] In some embodiments, the outputs of the layers included in the BiGRU-Transformer-LSTM model are as follows.

[0106] 1. Output of the BiGRU layer: The BiGRU layer receives the input time series data and performs forward and reverse encoding to capture the bidirectional dependency information of the time series. Assume the input data is a time series $X = \{x_1, x_2, \ldots, x_T\}$, where $T$ is the number of time steps and $x_t$ is the input feature vector at time step $t$. The BiGRU layer processes $x_t$ at each time step and outputs a feature representation $h_t$. After forward and reverse processing, the feature sequence output by the BiGRU layer is $H_{\text{BiGRU}} = [h_1, h_2, \ldots, h_T]$.

[0107] Here, $h_t$ is the bidirectional feature representation generated at time step $t$, which may be a vector of dimension $d_{\text{BiGRU}}$; it represents the bidirectional dependency captured by the BiGRU layer at that time step.

[0108] 2. Output of the Transformer layer: The Transformer layer receives the feature sequence $H_{\text{BiGRU}}$ output by the BiGRU layer and uses a multi-head self-attention mechanism to model global features at all positions in the time series, generating global feature representations. Through self-attention, the Transformer layer can capture the relationship between each time step and every other time step, generating new feature representations. Assume the output of the Transformer layer is $H_{\text{Transformer}}$, the feature sequence after global modeling.

[0109] Each element of $H_{\text{Transformer}}$ is the global feature representation of time step $t$ after processing by the Transformer layer; it has dimension $d_{\text{Transformer}}$ and represents the global context information of that time step.

[0110] 3. Output of the first LSTM layer: The first LSTM layer receives the output $H_{\text{Transformer}}$ from the Transformer layer, further extracts the temporal dependencies of the feature sequence after global feature modeling, and outputs the time-step features. Assume the output of the first LSTM layer is $L1$, the time-step feature representation after processing by the first LSTM layer: $L1 = \text{LSTM}_{L1}(H_{\text{Transformer}})$.

[0111] For example, the first LSTM layer utilizes historical information from each time step to capture more complex temporal dependencies. Therefore, the output $L1$ of the first LSTM layer is still a time-series feature matrix with shape $(T, d_{\text{LSTM}})$, where $d_{\text{LSTM}}$ is the output dimension of the LSTM unit.

[0112] 4. Output of the second LSTM layer: The second LSTM layer receives the output $L1$ from the first LSTM layer and combines it with long-term dependency features to generate the final output features. The output of the second LSTM layer is $L2$, which further processes the features from the first LSTM layer and has a stronger capacity for long-term dependencies: $L2 = \text{LSTM}_{L2}(L1)$.

[0113] Similarly, the output $L2$ of the second LSTM layer is also a time-series feature matrix with shape $(T, d_{\text{LSTM}})$, representing the feature representation after two layers of LSTM processing.

[0114] 5. Final Output (after Fully Connected Layer): After processing by the LSTM layer, a fully connected layer is typically used to map the features to the final predicted value. The output is generated by the fully connected layer and the LeakyReLU activation function, $Y = \text{LeakyReLU}(\text{FC}(L2))$, and is further optimized by a moving average smoothing layer.

[0115] Here, Y represents the final output characteristic, which is usually a scalar value representing the prediction result of voltage range fluctuation.
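The mapping from LSTM features to a scalar prediction can be sketched as a fully connected map followed by LeakyReLU. This is a minimal sketch under stated assumptions: it uses the feature vector of the last time step of the second LSTM layer, a negative slope of 0.01, and hypothetical weights; the disclosure does not specify how the $(T, d_{\text{LSTM}})$ matrix is reduced to a scalar or which slope is used.

```python
def leaky_relu(x, negative_slope=0.01):
    """LeakyReLU: identity for x >= 0, a small linear slope for x < 0."""
    return x if x >= 0 else negative_slope * x

def output_head(features, weights, bias, negative_slope=0.01):
    """Fully connected map from a feature vector to one scalar, then LeakyReLU."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return leaky_relu(z, negative_slope)

# Hypothetical last-time-step features of the second LSTM layer (d_LSTM = 3).
last_step = [0.2, -0.5, 0.1]
y_pos = output_head(last_step, [1.0, 0.5, 2.0], 0.05)  # pre-activation is positive
y_neg = output_head(last_step, [1.0, 2.0, 0.0], 0.0)   # pre-activation is negative
```

Unlike plain ReLU, the negative branch keeps a small non-zero output, so a unit with a negative pre-activation still passes a gradient during training.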

[0116] In summary:

[0117] 1. Output of the BiGRU layer: $H_{\text{BiGRU}} = [h_1, h_2, \ldots, h_T]$, which represents the bidirectional feature of each time step.

[0118] 2. Output of the Transformer layer: $H_{\text{Transformer}}$, which captures global contextual information and enhances long-term modeling.

[0119] 3. Output of the first LSTM layer: $L1 = \text{LSTM}_{L1}(H_{\text{Transformer}})$, which extracts the local temporal dependency features of the time series.

[0120] 4. Output of the second LSTM layer: $L2 = \text{LSTM}_{L2}(L1)$, which combines information from a long period of time and further enhances the feature representation.

[0121] 5. Final output: $Y = \text{LeakyReLU}(\text{FC}(L2))$, which is the final prediction result of the model.

[0122] The output of each layer is an enhancement of the processing result of the previous layer, helping the model to better capture and model local and global features and long-term dependencies in time series.
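As a reading aid, the tensor shapes implied by this layer sequence can be traced with a small helper. This is a sketch, not the disclosed implementation: it assumes a batch-first layout, a BiGRU that concatenates forward and backward states (consistent with the `hidden_size_gru*2` width stated for the Transformer output), and one scalar prediction per sample. All parameter names and the example sizes are hypothetical.

```python
def trace_shapes(batch_size, seq_len, n_features, hidden_gru, hidden_lstm):
    """Return (stage name, output shape) for each stage, batch dimension first."""
    return [
        ("input X", (batch_size, seq_len, n_features)),
        ("BiGRU", (batch_size, seq_len, 2 * hidden_gru)),        # forward + backward states
        ("Transformer", (batch_size, seq_len, 2 * hidden_gru)),  # attention keeps the width
        ("LSTM layer 1", (batch_size, seq_len, hidden_lstm)),
        ("LSTM layer 2", (batch_size, seq_len, hidden_lstm)),
        ("FC + LeakyReLU", (batch_size, 1)),                     # one scalar per sample
    ]

# Hypothetical sizes: 3 input signals, 24 time steps.
shapes = trace_shapes(batch_size=32, seq_len=24, n_features=3,
                      hidden_gru=64, hidden_lstm=48)
for name, shape in shapes:
    print(f"{name:>15}: {shape}")
```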

[0123] As shown in Figure 3, this disclosure provides a deep learning prediction method for battery system voltage range, which includes S1 and S2.

[0124] S1. Collect battery operation data of the battery cluster; the battery operation data includes individual cell range data, battery cluster charge and discharge power data and battery cluster total voltage data.

[0125] S2. Input the battery operation data into a pre-trained deep learning prediction model to predict the voltage range prediction result of the battery cluster within a preset time period.

[0126] The pre-trained deep learning prediction model is a BiGRU-Transformer-LSTM model.
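Before step S2, the three collected signals have to be arranged into fixed-length input sequences. The disclosure does not describe this step, so the following is only a sketch under assumptions: a sliding window over time-aligned series, with the future value of the cell range series as the prediction target. The function name, window length, and horizon are hypothetical.

```python
def make_windows(cell_range, power, total_voltage, window, horizon=1):
    """Build (input window, target) pairs from three time-aligned series.

    Each input window has shape (window, 3); the target is the cell range
    value `horizon` steps after the end of the window.
    """
    if not (len(cell_range) == len(power) == len(total_voltage)):
        raise ValueError("all series must have the same length")
    samples = []
    last_start = len(cell_range) - window - horizon
    for start in range(last_start + 1):
        stop = start + window
        x = [[cell_range[t], power[t], total_voltage[t]] for t in range(start, stop)]
        y = cell_range[stop + horizon - 1]
        samples.append((x, y))
    return samples

# Hypothetical readings: range in mV, power in kW, total voltage in V.
cell_range = [20.0, 22.0, 21.0, 25.0, 24.0, 27.0]
power = [50.0, 55.0, 52.0, 60.0, 58.0, 61.0]
total_voltage = [720.0, 721.0, 719.0, 723.0, 722.0, 724.0]
samples = make_windows(cell_range, power, total_voltage, window=3)
```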

[0127] In some embodiments, the pre-trained deep learning prediction model is trained through the following steps: Obtaining a historical dataset of battery clusters in an energy storage power station, the historical dataset including: individual cell range data, battery cluster charge / discharge power data, and battery cluster total voltage data; dividing the historical dataset into N parts according to time sequence to obtain N subsets of historical data.

[0128] Use N-1 subsets of the N historical data sets as the training set, and the remaining 1 subset as the validation and test set; build an identical initial deep learning prediction model for each of the N-1 historical data subsets in the training set.
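The time-ordered division can be sketched as a split into N contiguous chunks. This is a minimal sketch assuming the records are already sorted by time and that the most recent subset is the one held out for validation and testing; the disclosure does not say which subset is held out. The requirement N ≥ 3 follows claim 2.

```python
def split_time_ordered(records, n):
    """Split time-sorted records into n contiguous subsets of near-equal size."""
    if n < 3:
        raise ValueError("n must be >= 3")
    if len(records) < n:
        raise ValueError("need at least one record per subset")
    base, extra = divmod(len(records), n)
    subsets, start = [], 0
    for k in range(n):
        size = base + (1 if k < extra else 0)  # spread the remainder over early subsets
        subsets.append(records[start:start + size])
        start += size
    return subsets

# Hypothetical dataset of 10 time-ordered records, N = 4.
records = list(range(10))
subsets = split_time_ordered(records, 4)
train_subsets, holdout = subsets[:-1], subsets[-1]  # assumption: hold out the latest subset
```

Keeping each subset contiguous preserves temporal order inside every training subset, which a random shuffle would destroy.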

[0129] For each subset of historical data in the training set, an initial deep learning prediction model is iteratively trained to obtain a well-trained initial deep learning prediction model; N-1 well-trained initial deep learning prediction models are obtained by training N-1 subsets of historical data.

[0130] Each network parameter in the N-1 trained initial deep learning prediction models is weighted and averaged according to the performance on the validation set. The weights are determined by the inverse ratio of the prediction errors of each trained initial deep learning prediction model on the validation set. The weighted average parameter values are then assigned to the deep learning prediction model to generate a pre-trained deep learning prediction model.

[0131] Weighted average: Instead of a simple arithmetic average, a weighted average is calculated based on the performance of each model on the validation set (such as validation loss).

[0132] Here, $\theta_{\text{final}}$ is the weighted-average parameter value, $\omega_i$ is the weight assigned to the network parameters of the $i$-th initial deep learning prediction model, $\theta_i$ denotes the network parameters of the $i$-th initial deep learning prediction model, and the prediction error that determines $\omega_i$ is the error of the $i$-th initial deep learning prediction model on the validation set.
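One plausible reading of the inverse-ratio weighting is $\omega_i = (1/e_i) / \sum_j (1/e_j)$ and $\theta_{\text{final}} = \sum_i \omega_i \theta_i$, where $e_i$ is a symbol introduced here for the validation error of the $i$-th model. The sketch below implements that reading; the exact formula, the parameter names, and the flat-list parameter layout are assumptions.

```python
def inverse_error_weights(errors):
    """Weights proportional to 1 / error, normalized to sum to 1."""
    if any(e <= 0 for e in errors):
        raise ValueError("validation errors must be positive")
    inv = [1.0 / e for e in errors]
    total = sum(inv)
    return [v / total for v in inv]

def average_parameters(param_sets, errors):
    """Weighted average of identically shaped parameter sets.

    param_sets: list of dicts mapping a parameter name to a flat list of floats.
    """
    weights = inverse_error_weights(errors)
    merged = {}
    for name in param_sets[0]:
        length = len(param_sets[0][name])
        merged[name] = [
            sum(w * params[name][k] for w, params in zip(weights, param_sets))
            for k in range(length)
        ]
    return merged

# Two hypothetical trained models with their validation errors.
models = [
    {"fc.weight": [1.0, 2.0], "fc.bias": [0.0]},
    {"fc.weight": [3.0, 6.0], "fc.bias": [1.0]},
]
val_errors = [0.1, 0.3]
weights = inverse_error_weights(val_errors)
theta_final = average_parameters(models, val_errors)
```

In this example the model with one third of the error receives three times the weight (0.75 versus 0.25).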

[0133] In some embodiments, the BiGRU-Transformer-LSTM model includes a BiGRU layer, a Transformer layer, an LSTM layer, a fully connected layer, and an activation function connected in sequence.

[0134] In some embodiments, the BiGRU layer is used to encode the input time series data in both forward and reverse directions, capture bidirectional dependency information, and output a feature sequence.

[0135] The Transformer layer receives the output of the BiGRU layer, uses a multi-head self-attention mechanism to perform global feature modeling on all positions of the time series, and outputs the feature sequence after global feature modeling.

[0136] The LSTM layer is a two-layer LSTM layer, consisting of a first LSTM layer and a second LSTM layer. The first LSTM layer is used to further extract the temporal dependencies of the feature sequence after global feature modeling and output the time step features after processing by the first LSTM layer. The second LSTM layer is used to receive the output of the first LSTM layer and combine it with the long-term dependency features to generate the final output features.

[0137] Fully connected layers and activation functions: used to map the final output features of the LSTM layer to the prediction results.

[0138] In some embodiments, both the first-layer LSTM and the second-layer LSTM include a gating mechanism.

[0139] In some embodiments, the BiGRU-Transformer-LSTM model further includes a dynamic smoothing layer, which is used to add dynamic smoothing processing after the fully connected layer and activation function to further eliminate noise in the prediction results and optimize the stability of the prediction curve to obtain the final prediction results.

[0140] In some embodiments, the activation function uses the LeakyReLU activation function.

[0141] This disclosure proposes a BiGRU-Transformer-LSTM model. While BiGRU, Transformer, and LSTM each have unique advantages, using only one model may not fully utilize all the information in the data when facing complex tasks. Therefore, this disclosure innovatively combines these three models to construct a hybrid model to fully leverage their respective strengths.

[0142] In simulation, the BiGRU-Transformer-LSTM model of the embodiments of this disclosure was compared with the LSTM, BiGRU-Transformer, and CNN-LSTM-AM models of some existing technologies in terms of their performance in predicting future voltage range values. See Table 1 for a detailed comparison.

[0143] Table 1 Evaluation of Model Prediction Performance

[0144] As can be seen from Table 1, the BiGRU-Transformer-LSTM model in this embodiment has the smallest root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), and the largest coefficient of determination ($R^2$).

[0145] Figure 4 shows a comparison of the predicted voltage range under different methods. As shown in Figure 4, a comprehensive comparison shows that the BiGRU-Transformer-LSTM model proposed in this embodiment has better prediction performance.

[0146] The deep learning prediction method for battery system voltage range provided in this embodiment further includes step 5.

[0147] In step 5: Voltage consistency assessment. Based on the final prediction results, the current battery voltage consistency status is assessed, and the charge / discharge strategies for each battery cluster are optimized.

[0148] In some embodiments, voltage consistency assessment includes: voltage extreme value offset rate β: the voltage range of all individual cells at each sampling point divided by the average voltage, which can reflect problems such as battery aging and insufficient capacity.

[0149] In the formula, $k$ is the sampling time; $u$ is the voltage of a single cell; $\bar{u}$ is the average voltage; $\Delta u_k$ is the difference between the maximum and minimum single-cell voltages in the battery cluster at time $k$ (the cells involved are not fixed; the difference is calculated from the current maximum and minimum voltage values); and $u_{k\_max}$ and $u_{k\_min}$ are the maximum and minimum voltage values of individual cells in the battery cluster at time $k$, respectively.

[0150] Overall voltage offset δ: The standard deviation of the voltage of all individual cells at each sampling point, which reflects the overall performance of the battery cluster.

[0151] In the formula, $i$ is the cell number, $n$ is the number of individual cells in the battery cluster, $u_i$ is the voltage of the $i$-th single cell, and $\bar{u}$ is the average voltage of the single cells.
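The two indicators can be computed for one sampling point as sketched below. The offset rate follows the verbal definition (voltage range divided by average voltage). For the overall offset, the population standard deviation (division by n) is an assumption, since the text does not say whether n or n-1 is used; the function names and sample voltages are hypothetical.

```python
import statistics

def extreme_value_offset_rate(cell_voltages):
    """beta: (max - min) cell voltage divided by the mean cell voltage."""
    mean_u = statistics.fmean(cell_voltages)
    return (max(cell_voltages) - min(cell_voltages)) / mean_u

def overall_voltage_offset(cell_voltages):
    """delta: standard deviation of the cell voltages (population form assumed)."""
    return statistics.pstdev(cell_voltages)

# Hypothetical cell voltages (V) of one cluster at a single sampling time.
cells = [3.30, 3.32, 3.28, 3.30]
beta = extreme_value_offset_rate(cells)
delta = overall_voltage_offset(cells)
print(f"beta = {beta:.4%}, delta = {delta * 1000:.2f} mV")
```

The offset rate reacts to the single worst pair of cells, while the standard deviation summarizes the dispersion of the whole cluster, which is why the two are scored separately.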

[0152] Comprehensive scoring system: Based on the individual scores of β and δ (refer to Table 2), a comprehensive score is given in a 1:1 ratio, corresponding to four levels: "Good (≥80), Average (≥70 and <80), Pass (≥60 and <70), and Unsatisfactory (<60)".

[0153] Table 2. Reference Standards for Indicator Scoring

[0154] Note: The scoring criteria in Table 2 are reference values and can be adjusted according to the actual system. The weights of the two indicators can be adjusted appropriately; for example, the weight of β can be increased to focus on voltage limit exceedance issues. This embodiment addresses the voltage range problem. The voltage range refers to the difference between the maximum and minimum values of the individual cell voltage at a certain moment. Therefore, the voltage range is a key indicator in voltage consistency analysis.
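The comprehensive scoring rule can be sketched as a weighted combination of the two indicator scores followed by a threshold lookup. The individual scores are taken as inputs because they come from Table 2; the `weight_beta` parameter models the adjustable weighting mentioned in the note, with 0.5 giving the 1:1 ratio. The function name is hypothetical.

```python
def grade(score_beta, score_delta, weight_beta=0.5):
    """Combine the two indicator scores and map the result to a level."""
    if not 0.0 <= weight_beta <= 1.0:
        raise ValueError("weight_beta must be between 0 and 1")
    total = weight_beta * score_beta + (1.0 - weight_beta) * score_delta
    if total >= 80:
        level = "Good"
    elif total >= 70:
        level = "Average"
    elif total >= 60:
        level = "Pass"
    else:
        level = "Unsatisfactory"
    return total, level

print(grade(90, 70))                    # 1:1 weighting
print(grade(90, 60, weight_beta=0.75))  # emphasis on the offset rate
```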

[0155] By analyzing the cumulative voltage range of individual battery cells over a certain period of time and comparing it with a large amount of data, the frequency of abnormal voltage states of individual battery cells can be determined, and it can be determined whether the individual battery cell has developed an abnormal working state due to long-term operation.

[0156] The inter-cell range voltage $\Delta U_t$, the cumulative range voltage of an individual battery cell $\Delta U_{id}$, and the voltage standard deviation $\delta_{id}$ are used as voltage consistency indices. The calculation method is as follows:

$$\Delta U_t = U_{t\_max} - U_{t\_min}$$

$$\Delta U_{id} = U_{id\_max} - U_{id\_min}$$

[0157] In the formula: $U_{t\_max}$ and $U_{t\_min}$ are the highest and lowest voltage values among all cells in the sample data at time $t$; $U_{id\_max}$ and $U_{id\_min}$ are the maximum and minimum voltage values of the single battery cell numbered $id$ within the statistical period $T$; $U_{id\_t}$ is the voltage of the single battery cell numbered $id$ at time $t$; $\bar{U}_{id}$ is the average voltage value of the single battery cell numbered $id$ within the statistical period $T$; and $n$ is the number of samples at time $t$.

[0158] Range voltage characterizes the maximum voltage difference between different battery cells at the same time, reflecting the voltage consistency between battery cells; cumulative range voltage characterizes the voltage fluctuation of a single battery cell over a period of time, reflecting the voltage consistency of battery cells after a period of operation; voltage standard deviation reflects the voltage dispersion between battery cells.
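The difference between the two range quantities can be shown on a small table of voltages: the range voltage is taken across cells at one time, and the cumulative range voltage is taken across time for one cell. This is a minimal sketch; the data layout (a dict from cell number to a time-aligned list of voltages) and the function names are assumptions.

```python
def range_voltage(voltages_by_cell, t):
    """Spread between the highest and lowest cell voltage at time index t."""
    column = [series[t] for series in voltages_by_cell.values()]
    return max(column) - min(column)

def cumulative_range_voltage(voltages_by_cell, cell_id):
    """Spread between the highest and lowest voltage of one cell over the period."""
    series = voltages_by_cell[cell_id]
    return max(series) - min(series)

# Hypothetical voltages (V) for three cells over three sampling times.
voltages_by_cell = {
    "cell_1": [3.30, 3.31, 3.33],
    "cell_2": [3.29, 3.30, 3.30],
    "cell_3": [3.31, 3.35, 3.32],
}
du_t1 = range_voltage(voltages_by_cell, 1)
du_cell3 = cumulative_range_voltage(voltages_by_cell, "cell_3")
```

Here the widest inter-cell spread occurs at the second sampling time, driven by `cell_3`, which also shows the largest fluctuation over the period.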

[0159] This disclosure provides a deep learning prediction method for battery system voltage range. This method can predict the voltage range fluctuations of multiple battery clusters in an energy storage power station and perform voltage consistency assessment. It can minimize the adverse effects on the safety and lifespan of battery clusters caused by voltage inconsistency issues such as voltage exceeding limits, thereby improving the safety and reliability of the energy storage power station.

[0160] This method combines multiple cross-validation and model transfer learning to improve the model's generalization ability and prediction accuracy, enabling the model to make rapid predictions. Taking into account the directly measurable voltage difference, this method adjusts model parameters and continuously optimizes the model based on feedback between the current voltage difference and the predicted difference.

[0161] As shown in Figure 5, this embodiment of the present disclosure provides a deep learning prediction device 100 for battery system voltage range, which includes a data acquisition module 110 and a prediction module 120.

[0162] The data acquisition module 110 is configured to acquire battery operating data of the battery cluster; the battery operating data includes individual cell range data, battery cluster charge and discharge power data, and battery cluster total voltage data.

[0163] The prediction module 120 is configured to input the battery operating data into a pre-trained deep learning prediction model to predict the voltage range of the battery cluster within a preset time period.

[0164] The pre-trained deep learning prediction model is a BiGRU-Transformer-LSTM model.

[0165] In some embodiments, the pre-trained deep learning prediction model is trained through the following steps: obtaining a historical dataset of battery clusters in an energy storage power station, the historical dataset including individual cell range data, battery cluster charge and discharge power data, and battery cluster total voltage data; and dividing the historical dataset into N parts according to time order to obtain N subsets of historical data.

[0166] Use N-1 subsets of the N historical data sets as the training set, and the remaining 1 subset as the validation and test set; build an identical initial deep learning prediction model for each of the N-1 historical data subsets in the training set.

[0167] For each subset of historical data in the training set, an initial deep learning prediction model is iteratively trained to obtain a well-trained initial deep learning prediction model; N-1 well-trained initial deep learning prediction models are obtained by training N-1 subsets of historical data.

[0168] Each network parameter in the N-1 trained initial deep learning prediction models is weighted and averaged according to the performance on the validation set. The weights are determined by the inverse ratio of the prediction errors of each trained initial deep learning prediction model on the validation set. The weighted average parameter values are then assigned to the deep learning prediction model to generate a pre-trained deep learning prediction model.

[0169] In some embodiments, the BiGRU-Transformer-LSTM model includes a BiGRU layer, a Transformer layer, an LSTM layer, a fully connected layer, and an activation function connected in sequence.

[0170] In some embodiments, the BiGRU layer is used to encode the input time series data in both forward and reverse directions, capture bidirectional dependency information, and output a feature sequence.

[0171] The Transformer layer receives the output of the BiGRU layer, uses a multi-head self-attention mechanism to perform global feature modeling on all positions of the time series, and outputs the feature sequence after global feature modeling.

[0172] The LSTM layer is a two-layer LSTM layer, consisting of a first LSTM layer and a second LSTM layer. The first LSTM layer is used to further extract the temporal dependencies of the feature sequence after global feature modeling and output the time step features after processing by the first LSTM layer. The second LSTM layer is used to receive the output of the first LSTM layer and combine it with the long-term dependency features to generate the final output features.

[0173] Fully connected layers and activation functions are used to map the final output features of the LSTM layer to the prediction results.

[0174] In some embodiments, both the first-layer LSTM and the second-layer LSTM include a gating mechanism.

[0175] In some embodiments, the BiGRU-Transformer-LSTM model further includes a dynamic smoothing layer, which is used to add dynamic smoothing processing after the fully connected layer and activation function to further eliminate noise in the prediction results and optimize the stability of the prediction curve to obtain the final prediction results.

[0176] In some embodiments, the activation function uses the LeakyReLU activation function.

[0177] As shown in Figure 6, this disclosure provides an electronic device 100 for implementing a deep learning prediction method for battery system voltage differences. The electronic device 100 includes a memory 101, at least one processor 102, a computer program 103 stored in the memory 101 and executable on the at least one processor 102, and at least one communication bus 104.

[0178] The memory 101 can be used to store the computer program 103. The processor 102 implements the steps of the deep learning prediction method for battery system voltage difference described in the above embodiments by running or executing the computer program stored in the memory 101 and calling the data stored in the memory 101. The memory 101 may mainly include a program storage area and a data storage area. The program storage area may store the operating system, at least one application program required for a function (such as sound playback function, image playback function, etc.), etc.; the data storage area may store data created according to the use of the electronic device 100 (such as audio data), etc. In addition, the memory 101 may include non-volatile memory, such as hard disk, memory, plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, at least one disk storage device, flash memory device, or other non-volatile solid-state storage device.

[0179] At least one processor 102 may be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. Processor 102 may be a microprocessor or any other processor. Processor 102 is the control center of electronic device 100, connecting various parts of electronic device 100 via various interfaces and lines.

[0180] The memory 101 in the electronic device 100 stores multiple instructions to implement the deep learning prediction method for the voltage range of the battery system described above. The processor 102 can execute the multiple instructions to achieve: collecting battery operating data of the battery cluster; the battery operating data includes individual cell range data, battery cluster charge and discharge power data, and battery cluster total voltage data.

[0181] The battery operating data is input into a pre-trained deep learning prediction model to predict the voltage range of the battery cluster within a preset time period.

[0182] The pre-trained deep learning prediction model is a BiGRU-Transformer-LSTM model.

[0183] If the modules or units integrated in the electronic device 100 are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium (including non-transitory computer-readable storage media). Based on this understanding, all or part of the processes in the methods of the above embodiments can also be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the various method embodiments described above. The computer program includes computer program code, which can be in the form of source code, object code, executable files, or certain intermediate forms. The computer-readable medium can include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a portable hard drive, a magnetic disk, an optical disk, a computer memory, and a read-only memory (ROM).

[0184] Those skilled in the art will understand that embodiments of this disclosure can be provided as methods, systems, or computer program products. Therefore, this disclosure can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this disclosure can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0185] This disclosure is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this disclosure. It will be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flowchart illustrations and / or one or more block diagrams.

[0186] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flowcharts and / or one or more block diagrams.

[0187] These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions, which execute on the computer or other programmable apparatus, provide steps for implementing the functions specified in one or more flowcharts and / or one or more block diagrams.

[0188] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this disclosure and not to limit them. Although this disclosure has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the implementation methods of this disclosure. Any modifications or equivalent substitutions that do not depart from the spirit and scope of this disclosure should be covered within the protection scope of the claims of this disclosure.

Claims

1. A deep learning prediction method for battery system voltage range, comprising: Collect battery operation data from the battery cluster; The battery operating data includes individual cell range data, battery cluster charge / discharge power, and battery cluster total voltage data. The battery operating data is input into a pre-trained deep learning prediction model to predict the voltage range of the battery cluster within a preset time period. The pre-trained deep learning prediction model is a BiGRU-Transformer-LSTM model. The BiGRU-Transformer-LSTM model includes a BiGRU layer, a Transformer layer, an LSTM layer, a fully connected layer, and an activation function connected in sequence. The BiGRU layer is used to encode the input time series data in both forward and reverse directions, capture bidirectional dependency information, and output feature sequences. The Transformer layer is used to receive the output of the BiGRU layer, and uses a multi-head self-attention mechanism to perform global feature modeling on all positions of the time series, and outputs the feature sequence after global feature modeling. The LSTM layer is a two-layer LSTM layer, including a first LSTM layer and a second LSTM layer; wherein, the first LSTM layer is used to further extract the time dependency relationship of the feature sequence after global feature modeling, and output the time step features after processing by the first LSTM layer; the second LSTM layer is used to receive the output of the first LSTM layer and combine it with the long-term dependency features to generate the final output features. The fully connected layer and the activation function are used to map the final output features of the LSTM layer to the prediction result.

2. The deep learning prediction method for battery system voltage range according to claim 1, wherein the pre-trained deep learning prediction model is obtained through the following steps:
obtaining historical datasets of battery clusters in an energy storage power station, wherein the historical datasets include individual cell range data, battery cluster charge/discharge power data, and battery cluster total voltage data;
dividing the historical datasets into N parts in chronological order to obtain N historical data subsets, wherein N is a positive integer and N≥3;
using N-1 of the N historical data subsets as the training set and the remaining one subset as the validation and test set;
establishing an identical initial deep learning prediction model for each of the N-1 historical data subsets in the training set;
iteratively training the corresponding initial deep learning prediction model on each historical data subset in the training set to obtain N-1 trained initial deep learning prediction models;
computing, for each network parameter, a weighted average across the N-1 trained initial deep learning prediction models according to their performance on the validation set, wherein the weights are determined in inverse proportion to the prediction error of each trained initial deep learning prediction model on the validation set; and
assigning the weighted-average parameter values to the deep learning prediction model to generate the pre-trained deep learning prediction model.
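The data split and the inverse-error parameter averaging described in claim 2 can be sketched as below. This is a minimal illustration, not the applicant's implementation: the helper names are hypothetical, scalar values stand in for network parameter tensors, and the trained parameters and validation errors are made-up placeholders.

```python
def split_chronologically(records, n_parts):
    """Split time-ordered records into n_parts contiguous subsets."""
    if n_parts < 3:
        raise ValueError("the claim requires N >= 3")
    size, extra = divmod(len(records), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(records[start:end])
        start = end
    return parts


def inverse_error_weights(val_errors):
    """Weights proportional to 1 / validation error, normalised to sum to 1."""
    inverses = [1.0 / e for e in val_errors]
    total = sum(inverses)
    return [v / total for v in inverses]


def average_parameters(param_sets, weights):
    """Weighted average of each named parameter across the trained models."""
    return {
        name: sum(w * params[name] for params, w in zip(param_sets, weights))
        for name in param_sets[0]
    }


records = list(range(10))                    # stand-in for time-ordered samples
subsets = split_chronologically(records, 4)  # N = 4
train_subsets, holdout = subsets[:-1], subsets[-1]

# Hypothetical result of training one identical model per training subset.
trained = [{"w": 1.0, "b": 0.0}, {"w": 2.0, "b": 1.0}, {"w": 4.0, "b": 2.0}]
val_errors = [0.1, 0.2, 0.4]                 # placeholder errors on the held-out subset

weights = inverse_error_weights(val_errors)  # approximately [4/7, 2/7, 1/7]
merged = average_parameters(trained, weights)
```

With real networks the same arithmetic would be applied to each parameter tensor, which is only meaningful because all N-1 models share an identical architecture.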

3. The deep learning prediction method for battery system voltage range according to claim 1 or 2, wherein both the first LSTM layer and the second LSTM layer include gating mechanisms.
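Claim 3 does not name the gates. Assuming the standard LSTM formulation (forget, input, and output gates plus a candidate cell update), a single scalar cell step can be written as follows; the parameter layout `p` is a hypothetical convenience for this sketch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p maps gate names to (w_x, w_h, bias)."""

    def pre(gate):
        w_x, w_h, b = p[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much new information to admit
    g = math.tanh(pre("candidate"))   # candidate cell content
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    c = f * c_prev + i * g            # updated long-term (cell) state
    h = o * math.tanh(c)              # updated hidden state
    return h, c


# All-zero parameters make every gate equal 0.5 and the candidate equal 0.
params = {name: (0.0, 0.0, 0.0) for name in ("forget", "input", "candidate", "output")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, p=params)
```

The cell state `c` is the path that carries long-term dependencies between time steps, which is why the gates matter for the "long-term dependency features" mentioned in claim 1.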

4. The deep learning prediction method for battery system voltage range according to any one of claims 1 to 3, wherein the BiGRU-Transformer-LSTM model further comprises a dynamic smoothing layer configured to apply dynamic smoothing after the fully connected layer and the activation function, so as to further remove noise from the prediction results and improve the smoothness of the prediction curve, thereby obtaining the final prediction result.
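The claim does not specify the smoothing algorithm. As a stand-in, the sketch below uses a plain exponential moving average, which is an assumption of this example and not necessarily the "dynamic smoothing" the applicant intends; the function name and the `alpha` parameter are likewise hypothetical.

```python
def exponential_smooth(predictions, alpha=0.5):
    """Exponential moving average; smaller alpha gives a smoother curve."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for value in predictions:
        if not smoothed:
            smoothed.append(value)  # the first point passes through unchanged
        else:
            smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


raw = [10.0, 14.0, 10.0, 14.0]             # placeholder model outputs
smooth = exponential_smooth(raw, alpha=0.5)
# smooth == [10.0, 12.0, 11.0, 12.5]
```

The oscillation between 10 and 14 in the raw series shrinks to a swing between 11 and 12.5, at the cost of some lag behind genuine changes in the signal.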

5. The deep learning prediction method for battery system voltage range according to any one of claims 1 to 4, wherein the activation function is the LeakyReLU activation function.
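For reference, LeakyReLU passes non-negative inputs through unchanged and scales negative inputs by a small slope. The claim does not state the slope; 0.01 below is an assumed value (it is the default in PyTorch's `nn.LeakyReLU`).

```python
def leaky_relu(x, negative_slope=0.01):
    """Return x for x >= 0, otherwise negative_slope * x."""
    return x if x >= 0 else negative_slope * x


outputs = [leaky_relu(v) for v in (-2.0, 0.0, 3.0)]
```

Unlike plain ReLU, the negative branch keeps a non-zero gradient, so units that receive negative inputs can still be updated during training.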

6. A deep learning prediction device for battery system voltage range, comprising:
an acquisition module configured to collect battery operation data from a battery cluster, wherein the battery operation data includes individual cell range data, battery cluster charge/discharge power, and battery cluster total voltage data; and
a prediction module configured to input the battery operation data into a pre-trained deep learning prediction model to predict the voltage range of the battery cluster within a preset time period;
wherein the pre-trained deep learning prediction model is a BiGRU-Transformer-LSTM model comprising a BiGRU layer, a Transformer layer, an LSTM layer, a fully connected layer, and an activation function connected in sequence;
the BiGRU layer is configured to encode the input time series data in both the forward and reverse directions, capture bidirectional dependency information, and output a feature sequence;
the Transformer layer is configured to receive the output of the BiGRU layer, apply a multi-head self-attention mechanism to perform global feature modeling over all positions of the time series, and output the globally modeled feature sequence;
the LSTM layer is a two-layer LSTM comprising a first LSTM layer and a second LSTM layer, wherein the first LSTM layer is configured to further extract the temporal dependencies of the globally modeled feature sequence and output the resulting time-step features, and the second LSTM layer is configured to receive the output of the first LSTM layer and combine it with long-term dependency features to generate final output features; and
the fully connected layer and the activation function are configured to map the final output features of the LSTM layer to the prediction result.
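One way to picture the two modules of claim 6 in software is shown below. The class names, the `BatterySample` record, the stub data source, and the placeholder model are all hypothetical and serve only to show how an acquisition module might feed a prediction module; the numeric values are arbitrary.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class BatterySample:
    cell_range: float     # individual cell range data
    cluster_power: float  # battery cluster charge/discharge power
    total_voltage: float  # battery cluster total voltage


class AcquisitionModule:
    """Collects battery operation data and arranges it as feature rows."""

    def __init__(self, source: Callable[[], Sequence[BatterySample]]):
        self._source = source

    def collect(self) -> List[List[float]]:
        return [
            [s.cell_range, s.cluster_power, s.total_voltage]
            for s in self._source()
        ]


class PredictionModule:
    """Passes collected data to a pre-trained model and returns its forecast."""

    def __init__(self, model: Callable[[List[List[float]]], float]):
        self._model = model

    def predict(self, window: List[List[float]]) -> float:
        return self._model(window)


def stub_source():
    # Arbitrary placeholder readings, not measured data.
    return [BatterySample(0.02, -50.0, 760.0), BatterySample(0.03, -48.0, 758.0)]


def stub_model(window):
    # Placeholder for the trained network: repeats the last observed cell range.
    return window[-1][0]


acquisition = AcquisitionModule(stub_source)
prediction_module = PredictionModule(stub_model)
forecast = prediction_module.predict(acquisition.collect())
```

Keeping the model behind a plain callable means the trained BiGRU-Transformer-LSTM network can replace `stub_model` without changing either module.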

7. The deep learning prediction device for battery system voltage range according to claim 6, wherein the pre-trained deep learning prediction model is obtained through the following steps:
obtaining historical datasets of battery clusters in an energy storage power station, wherein the historical datasets include individual cell range data, battery cluster charge/discharge power data, and battery cluster total voltage data;
dividing the historical datasets into N parts in chronological order to obtain N historical data subsets;
using N-1 of the N historical data subsets as the training set and the remaining one subset as the validation and test set;
establishing an identical initial deep learning prediction model for each of the N-1 historical data subsets in the training set;
iteratively training the corresponding initial deep learning prediction model on each historical data subset in the training set to obtain N-1 trained initial deep learning prediction models;
computing, for each network parameter, a weighted average across the N-1 trained initial deep learning prediction models according to their performance on the validation set, wherein the weights are determined in inverse proportion to the prediction error of each trained initial deep learning prediction model on the validation set; and
assigning the weighted-average parameter values to the deep learning prediction model to generate the pre-trained deep learning prediction model.

8. The deep learning prediction device for battery system voltage range according to claim 6 or 7, wherein both the first LSTM layer and the second LSTM layer include gating mechanisms.

9. The deep learning prediction device for battery system voltage range according to any one of claims 6 to 8, wherein the BiGRU-Transformer-LSTM model further comprises a dynamic smoothing layer configured to apply dynamic smoothing after the fully connected layer and the activation function, so as to further remove noise from the prediction results and improve the smoothness of the prediction curve, thereby obtaining the final prediction result.

10. The deep learning prediction device for battery system voltage range according to any one of claims 6 to 9, wherein the activation function is the LeakyReLU activation function.

11. An electronic device comprising a processor and a memory, the processor being configured to execute a computer program stored in the memory to implement the deep learning prediction method for battery system voltage range according to any one of claims 1 to 5.

12. A computer-readable storage medium, wherein the computer-readable storage medium stores at least one instruction which, when executed by a processor, implements the deep learning prediction method for battery system voltage range according to any one of claims 1 to 5.