Fuel cell-supercapacitor hybrid power distribution system and apparatus

By decomposing the parameters of the hybrid power system and integrating a discriminator mechanism into the hybrid controller, precise power distribution of the fuel cell-supercapacitor hybrid power system under extreme conditions was achieved, solving the problems of dynamic response lag and improper energy distribution, and improving the robustness and stability of the system.

CN122232610A · Pending · Publication Date: 2026-06-19 · SHANDONG UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: SHANDONG UNIV
Filing Date: 2026-05-21
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing hybrid power systems struggle to achieve precise dynamic power allocation under complex road conditions, leading to issues such as lag in dynamic response, reduced fuel cell lifespan, and improper energy distribution.

Method used

A power distribution system for a fuel cell-supercapacitor hybrid power system is adopted. The operating parameters are acquired by the acquisition module, and the demand power time series is decomposed into low-frequency and high-frequency intrinsic mode components. The reference power of the fuel cell is generated by model predictive control. A hybrid controller is constructed based on a dual-delay deep deterministic policy gradient algorithm with an integrated discriminator mechanism to obtain the final compensation power of the supercapacitor. Finally, the two are combined to obtain the final command power of the fuel cell.

Benefits of technology

It improves the system's robustness in extreme scenarios, achieves reasonable allocation of required power, enhances the system's risk perception and stability under extreme operating conditions, and extends the equipment's service life.



Abstract

This invention belongs to the field of power allocation technology. To address the poor robustness of existing power allocation methods, a power allocation system and device for a fuel cell-supercapacitor hybrid power system are proposed. The system decomposes the demand power time series into low-frequency intrinsic mode components and high-frequency intrinsic mode components. Based on the low-frequency intrinsic mode components, a fuel cell reference power is generated through model predictive control. A hybrid controller is constructed and trained based on a dual-delay deep deterministic policy gradient algorithm with an integrated discriminator mechanism. Based on the high-frequency intrinsic mode components, the trained hybrid controller is used to obtain the final compensation power of the supercapacitor. Finally, the fuel cell reference power and the final compensation power of the supercapacitor are combined to obtain the final command power of the fuel cell, achieving reasonable allocation of demand power and improving the robustness of the system under extreme scenarios.

Description

Technical Field

[0001] This invention belongs to the field of power distribution technology, and particularly relates to a power distribution system and equipment for a fuel cell-supercapacitor hybrid power system.

Background Technology

[0002] The statements in this section are merely background information related to the present invention and do not necessarily constitute prior art.

[0003] In the practical application of new energy vehicles, the randomness and volatility of power demand under complex road conditions lead to problems such as dynamic response lag, fuel cell lifespan degradation, and improper energy allocation. Therefore, designing a highly robust energy management strategy to achieve precise dynamic power allocation is crucial for improving system efficiency and extending equipment lifespan.

[0004] Currently, energy management strategies for hybrid power systems mainly fall into three categories: rule-based methods, model predictive control (MPC)-based methods, and reinforcement learning-based methods. Rule-based methods trigger power allocation rules by setting preset thresholds. While computationally simple, they struggle to adapt to varying road conditions and are prone to fuel cell response lag or supercapacitor overload under transient power fluctuations. MPC-based methods perform rolling optimization by constructing a system state-space model. Although this improves steady-state control accuracy, the predictive performance depends heavily on accurate operating condition models, the computational complexity is high, and high-frequency power components under extreme conditions are not handled effectively. Reinforcement learning-based methods learn optimal power allocation strategies using deep neural networks. While they possess strong nonlinear fitting capabilities, training requires a large amount of high-quality data covering extreme operating conditions, and the scarcity of high-risk operating condition samples leads to insufficient strategy generalization.

Summary of the Invention

[0005] To overcome the shortcomings of the prior art, the present invention provides a power distribution system and equipment for a fuel cell-supercapacitor hybrid power system, which realizes reasonable distribution of demand power and improves the robustness of the system under extreme scenarios.

[0006] To achieve the above objectives, the present invention adopts the following technical solution. In a first aspect, the present invention provides a power distribution system for a fuel cell-supercapacitor hybrid power system, comprising: an acquisition module configured to acquire the operating parameters of the hybrid power system, wherein the operating parameters include the demand power time series and the supercapacitor state of charge; a decomposition module configured to decompose the demand power time series into low-frequency intrinsic mode components and high-frequency intrinsic mode components; a model prediction module configured to use model predictive control to obtain the fuel cell reference power from the low-frequency intrinsic mode components; a reinforcement learning module configured to construct a hybrid controller based on a dual-delay deep deterministic policy gradient algorithm with an integrated discriminator mechanism, and to obtain the final compensation power of the supercapacitor using the trained hybrid controller according to the high-frequency intrinsic mode components; and an allocation module configured to combine the fuel cell reference power and the supercapacitor final compensation power to obtain the fuel cell final command power.

[0007] In a second aspect, the present invention provides an electronic device, including a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the following steps: obtaining the operating parameters of the hybrid power system, wherein the operating parameters include the demand power time series and the supercapacitor state of charge; decomposing the demand power time series into low-frequency intrinsic mode components and high-frequency intrinsic mode components; obtaining the fuel cell reference power from the low-frequency intrinsic mode components through model predictive control; constructing a hybrid controller based on a dual-delay deep deterministic policy gradient algorithm with an integrated discriminator mechanism, and obtaining the final compensation power of the supercapacitor using the trained hybrid controller according to the high-frequency intrinsic mode components; and combining the fuel cell reference power and the final compensation power of the supercapacitor to obtain the final command power of the fuel cell.

[0008] In a third aspect, the present invention provides a computer-readable storage medium for storing computer instructions, which, when executed by a processor, perform the following steps: obtaining the operating parameters of the hybrid power system, wherein the operating parameters include the demand power time series and the supercapacitor state of charge; decomposing the demand power time series into low-frequency intrinsic mode components and high-frequency intrinsic mode components; obtaining the fuel cell reference power from the low-frequency intrinsic mode components through model predictive control; constructing a hybrid controller based on a dual-delay deep deterministic policy gradient algorithm with an integrated discriminator mechanism, and obtaining the final compensation power of the supercapacitor using the trained hybrid controller according to the high-frequency intrinsic mode components; and combining the fuel cell reference power and the final compensation power of the supercapacitor to obtain the final command power of the fuel cell.

[0009] In a fourth aspect, the present invention provides a computer program product, comprising a computer program, which, when executed by a processor, performs the following steps: obtaining the operating parameters of the hybrid power system, wherein the operating parameters include the demand power time series and the supercapacitor state of charge; decomposing the demand power time series into low-frequency intrinsic mode components and high-frequency intrinsic mode components; obtaining the fuel cell reference power from the low-frequency intrinsic mode components through model predictive control; constructing a hybrid controller based on a dual-delay deep deterministic policy gradient algorithm with an integrated discriminator mechanism, and obtaining the final compensation power of the supercapacitor using the trained hybrid controller according to the high-frequency intrinsic mode components; and combining the fuel cell reference power and the final compensation power of the supercapacitor to obtain the final command power of the fuel cell.

[0010] The above one or more technical solutions have the following beneficial effects: In this invention, the demand power time series is decomposed into low-frequency intrinsic mode components and high-frequency intrinsic mode components. Based on the low-frequency intrinsic mode components, the fuel cell reference power is generated through model predictive control. A hybrid controller is constructed and trained based on a dual-delay deep deterministic policy gradient algorithm with an integrated discriminator mechanism. Based on the high-frequency intrinsic mode components, the trained hybrid controller is used to obtain the final compensation power of the supercapacitor. Then, the fuel cell reference power and the final compensation power of the supercapacitor are combined to obtain the final command power of the fuel cell, thereby achieving reasonable allocation of demand power and improving the robustness of the system under extreme scenarios.

[0011] Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Description of the Drawings

[0012] The accompanying drawings, which form part of this invention, are used to provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an improper limitation of the invention.

[0013] Figure 1 is a flowchart of the power distribution strategy for the hybrid power system in an embodiment of the present invention; Figure 2 is a topology diagram of the Actor network structure in an embodiment of the present invention; Figure 3 is a network topology diagram of the GAN-TD3 hybrid controller in an embodiment of the present invention.

Detailed Implementation

[0014] It should be noted that the following detailed descriptions are exemplary and intended to provide further illustration of the invention. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.

[0015] It should be noted that the terminology used herein is for the purpose of describing particular implementations only and is not intended to limit the exemplary implementations of the present invention.

[0016] Where there is no conflict, the embodiments and features in the embodiments of the present invention can be combined with each other.

[0017] Embodiment 1. This embodiment discloses a power distribution system for a fuel cell-supercapacitor hybrid power system, including: an acquisition module configured to acquire the operating parameters of the hybrid power system, wherein the operating parameters include the demand power time series and the supercapacitor state of charge; a decomposition module configured to decompose the demand power time series into low-frequency intrinsic mode components and high-frequency intrinsic mode components; a model prediction module configured to use model predictive control to obtain the fuel cell reference power from the low-frequency intrinsic mode components; a reinforcement learning module configured to construct a hybrid controller based on a dual-delay deep deterministic policy gradient algorithm with an integrated discriminator mechanism, and to obtain the final compensation power of the supercapacitor using the trained hybrid controller based on the high-frequency intrinsic mode components; and a distribution module configured to combine the fuel cell reference power and the supercapacitor final compensation power to obtain the fuel cell final command power.

[0018] In this embodiment, the demand power time series is decomposed into low-frequency intrinsic mode components and high-frequency intrinsic mode components. Based on the low-frequency intrinsic mode components, the fuel cell reference power is generated through model predictive control. A hybrid controller is constructed and trained based on a dual-delay deep deterministic policy gradient algorithm with an integrated discriminator mechanism. Based on the high-frequency intrinsic mode components, the trained hybrid controller is used to obtain the final compensation power of the supercapacitor. Then, the allocation factor is dynamically calculated based on the supercapacitor's state of charge. The power is fused through the allocation factor to achieve reasonable allocation of demand power and improve the robustness of the system under extreme scenarios.

[0019] The power distribution system for a fuel cell-supercapacitor hybrid power system of this embodiment is described in detail below with reference to Figure 1. The acquisition module is configured to acquire the operating parameters of the hybrid power system, including the demand power time series and the supercapacitor state of charge.

[0020] In the acquisition module of this embodiment, a hybrid power system is built by selecting a fuel cell, a supercapacitor, and a bidirectional DC/DC converter. The system is then subjected to a simulated operation test under selected road operating conditions to collect operating data of the hybrid power system.

[0021] The models of the fuel cell, supercapacitor, and bidirectional DC/DC converter, as well as the road cycle conditions, are selected based on the actual application. Road cycle conditions include the accelerator pedal opening gradient, braking deceleration intensity, ambient temperature, and altitude change rate.

[0022] The hybrid power system operating data include the demand power time series, the fuel cell power, the DC bus voltage, and the supercapacitor state of charge. The DC bus voltage is equal to the supercapacitor terminal voltage.

[0023] The decomposition module is configured to decompose the demand power time series into low-frequency intrinsic mode components and high-frequency intrinsic mode components.

[0024] In the decomposition module of this embodiment, the empirical mode decomposition (EMD) algorithm is used to decompose the collected demand power time series into low-frequency intrinsic mode components and high-frequency intrinsic mode components according to an energy entropy threshold.

[0025] The demand power time series is decomposed as follows. First, the intrinsic mode function (IMF) components are extracted iteratively through a sifting process, and the residual signal is defined as:

$$r_k(t) = r_{k-1}(t) - c_k(t)$$

[0026] where the initial value $r_0(t)$ is the demand power time series, $c_k(t)$ is the k-th intrinsic mode component, and t denotes the discrete-time sampling point. Sifting stops when the standard deviation (SD) value falls below 0.3.

[0027] Then, the energy entropy of each IMF component is calculated. The specific formula is as follows:

[0028] Here the L2 norm is used to normalize the components; the formula also involves the total number of IMF components obtained from the decomposition, and t denotes the discrete-time sampling point.

[0029] Finally, the IMF components are divided into high- and low-frequency groups based on the energy entropy threshold: components on one side of the threshold are classified as low-frequency components, and those on the other side as high-frequency components. The energy entropy threshold is determined based on actual working conditions.
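The bookkeeping around the decomposition can be sketched in plain Python. This is a minimal illustration, not the patented procedure: the sifting itself (envelope fitting) is omitted, the SD criterion is assumed to be the conventional sum of squared relative differences between successive sifting iterates, the energy entropy is assumed to be the per-component term -p ln p of the energy share p, and which side of the threshold counts as low-frequency is left as a flag because the text does not preserve it. All function names are hypothetical.

```python
import math

def residual_update(r_prev, imf):
    """Remove one extracted IMF from the running residual: r_k = r_{k-1} - c_k."""
    return [r - c for r, c in zip(r_prev, imf)]

def sifting_sd(h_prev, h_curr, eps=1e-12):
    """Assumed SD criterion between two successive sifting iterates."""
    return sum((a - b) ** 2 / (a * a + eps) for a, b in zip(h_prev, h_curr))

def energy_entropy(imfs):
    """Assumed entropy term -p_k * ln(p_k), where p_k is the energy share of IMF k."""
    energies = [sum(x * x for x in comp) for comp in imfs]
    total = sum(energies)
    if total == 0.0:
        return [0.0] * len(imfs)
    shares = [e / total for e in energies]
    return [-p * math.log(p) if p > 0.0 else 0.0 for p in shares]

def split_by_threshold(imfs, threshold, low_if_above=True):
    """Split IMFs into (low, high) groups; the direction is an assumption."""
    low, high = [], []
    for comp, h in zip(imfs, energy_entropy(imfs)):
        is_low = (h >= threshold) if low_if_above else (h < threshold)
        (low if is_low else high).append(comp)
    return low, high

SD_STOP = 0.3  # stopping value stated in the text
fast = [2.0, -2.0, 2.0, -2.0]   # toy oscillating component
slow = [1.0, 1.0, 1.0, 1.0]     # toy trend component
low, high = split_by_threshold([fast, slow], threshold=0.25)
```

In practice the threshold and the split direction would be tuned to the operating conditions, as the text notes.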

[0030] The model prediction module is configured to use model predictive control to obtain the fuel cell reference power for low-frequency intrinsic mode components.

[0031] In the model prediction module of this embodiment, the state-space equation of the hybrid power system is first established from the data acquired by the acquisition module; the specific algorithm and system parameters are determined according to the actual situation. An optimization objective function is then constructed. The fuel cell reference power is generated by solving the objective function and mapping the resulting optimal prediction sequence.

[0032] The specific expression for the objective function to be optimized is as follows:

[0033] where the terms are: the predicted fuel cell output power over the prediction horizon, obtained through iterative calculation using the state-space equation; f, the efficiency function of the fuel cell; the reference state of charge of the supercapacitor; the length of the prediction horizon; the weighting coefficients; the predicted state of charge of the supercapacitor over the prediction horizon; i, the index of the i-th control cycle predicted from the current moment; and the low-frequency component.
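Since the objective function itself is not preserved in this text, the following is only a sketch of how a cost built from the listed terms might be evaluated. It assumes a quadratic tracking term against the low-frequency component, a quadratic state-of-charge deviation term, and an efficiency penalty; the state-of-charge update, the capacity parameter, the weights, and the candidate-enumeration solver are all hypothetical stand-ins for the real state-space model and optimizer.

```python
def mpc_cost(p_fc_seq, p_low_seq, soc0, soc_ref, efficiency,
             w_track=1.0, w_soc=1.0, w_eff=1.0, dt=1.0, capacity_j=1.0e5):
    """Assumed horizon cost: tracking + SOC deviation + efficiency penalty."""
    soc, cost = soc0, 0.0
    for p_fc, p_low in zip(p_fc_seq, p_low_seq):
        # Assumed toy model: the supercapacitor covers the tracking shortfall.
        soc -= (p_low - p_fc) * dt / capacity_j
        cost += (w_track * (p_fc - p_low) ** 2
                 + w_soc * (soc - soc_ref) ** 2
                 + w_eff * (1.0 - efficiency(p_fc)))
    return cost

def pick_reference(candidates, p_low_seq, soc0, soc_ref, efficiency):
    """Receding horizon: keep only the first move of the cheapest candidate."""
    best = min(candidates,
               key=lambda seq: mpc_cost(seq, p_low_seq, soc0, soc_ref, efficiency))
    return best[0]

def flat_efficiency(p_fc):
    """Placeholder efficiency curve (constant); a real curve would be measured."""
    return 0.5

p_low = [1000.0, 1000.0]                      # low-frequency demand, W
candidates = [[1000.0, 1000.0], [0.0, 0.0]]   # hypothetical candidate sequences
p_fc_ref = pick_reference(candidates, p_low, 0.6, 0.6, flat_efficiency)
```

A real implementation would replace the candidate enumeration with a constrained solver over the full prediction horizon.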

[0034] The reinforcement learning module is configured to: construct a hybrid controller based on a dual-delay deep deterministic policy gradient algorithm with an integrated discriminator mechanism; and obtain the final compensation power of the supercapacitor using the trained hybrid controller based on the high-frequency intrinsic mode components.

[0035] In the reinforcement learning module of this embodiment, as shown in Figure 3, the discriminator mechanism of a generative adversarial network (GAN) is integrated into the Critic network of the dual-delay (twin delayed) deep deterministic policy gradient algorithm (TD3) to construct a GAN-TD3 hybrid controller. The controller model is trained with the high-frequency components, the supercapacitor state of charge, the DC bus voltage, and the fuel cell output power as inputs.

[0036] First, the structure of the GAN-TD3 hybrid controller, which combines a generative adversarial network with the dual-delay deep deterministic policy gradient algorithm, is explained. The TD3 network structure consists of two parts: an Actor network and a Critic network. The Actor network generates action commands, namely the power compensation strategy for the supercapacitor, from the current state, which comprises the high-frequency components, the supercapacitor state of charge, the DC bus voltage, the fuel cell power, and historical power allocation records.

[0037] As shown in Figure 2, the Actor network structure includes a feature extraction channel, a feature fusion module, and several nonlinear transformation layers. To address the multi-scale dynamic coupling problem in hybrid systems, this embodiment divides the feature extraction channel into two channels: a spatial feature channel and a temporal feature channel. The spatial feature channel consists of N one-dimensional (1D) convolutional layers and processes the input features that reflect the instantaneous dynamics of the system, namely the DC bus voltage and the fuel cell power, to extract instantaneous dynamic features. The temporal feature channel is a Long Short-Term Memory (LSTM) network, including an input gate, a forget gate, an output gate, and cell states. It processes the time-dependent input features, namely the high-frequency components and the supercapacitor state of charge, to extract long-term dependency features.

[0038] The feature fusion module combines the instantaneous dynamic features and the long-term dependency features into a joint feature, using an adaptive weighting mechanism to determine dynamically how much each contributes to the fusion. Specifically, the two feature vectors are concatenated into a combined vector and fed into a fully connected layer of size M to generate raw weight scores. The softmax function then converts these scores into two weight coefficients that sum to 1, and the fused feature is obtained by weighting the two feature vectors with these coefficients.
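The adaptive weighting step can be illustrated without a deep learning framework. The sketch below assumes the scoring layer reduces the concatenated vector to exactly two scores and that the fused feature is the weighted sum of the two equal-length feature vectors; the layer weights, the function names, and the weighted-sum form are assumptions, since the original fusion formula is not preserved.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def fuse(f_inst, f_long, weight_rows, bias):
    """Concatenate, score with one linear layer, softmax, then blend."""
    joint = f_inst + f_long                      # list concatenation
    scores = [sum(w * x for w, x in zip(row, joint)) + b_k
              for row, b_k in zip(weight_rows, bias)]
    alpha, beta = softmax(scores)                # alpha + beta == 1
    fused = [alpha * x + beta * y for x, y in zip(f_inst, f_long)]
    return fused, (alpha, beta)

# Toy features and an all-zero scoring layer, which yields equal weights.
f_inst = [1.0, 0.0]
f_long = [0.0, 1.0]
zero_rows = [[0.0] * 4, [0.0] * 4]
fused, (alpha, beta) = fuse(f_inst, f_long, zero_rows, [0.0, 0.0])
```

With trained layer weights, the softmax would shift the balance toward whichever channel is more informative for the current state.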

[0039] Finally, the fused feature passes through several nonlinear transformation layers to generate normalized action commands. These layers are fully connected: the first receives the fused feature and the last generates the action command. The last layer uses the Tanh activation function, while the other fully connected layers use the ReLU activation function.

[0040] The Critic network evaluates the value of the actions generated by the Actor network. It consists of two structurally identical, independent Q-value evaluation networks. Each Q-network comprises an input layer and a feature processing layer. The input layer concatenates the current true state vector with the action vector generated by the Actor network. The feature processing layer consists of several fully connected layers that process the concatenated joint input, each followed by a ReLU activation. Each Q-network outputs a state-action value estimate, i.e., a Q-value, and the smaller of the two Q-values is used to avoid overestimation.
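The role of the two Q-networks can be shown with the standard TD3 clipped double-Q target, in which the smaller of the two target-network estimates bootstraps the update. This sketch covers only that standard part; how the discriminator output enters the target in this embodiment is not recoverable from the text and is not modeled. The function name and the `done` flag are illustrative.

```python
def td3_target(reward, gamma, q1_next, q2_next, done=False):
    """Clipped double-Q target: bootstrap from the smaller next-state Q-value."""
    if done:
        return reward
    return reward + gamma * min(q1_next, q2_next)

# The pessimistic estimate (8.0) is used, not the optimistic one (10.0).
y = td3_target(reward=1.0, gamma=0.9, q1_next=10.0, q2_next=8.0)
```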

[0041] To improve the robustness of the hybrid power system under extreme conditions, a GAN discriminator is added to the existing Actor and Critic network structures. The GAN discriminator is an independent network module with a multi-layer fully connected architecture. Its input layer receives the true state vector and the action vector, with the same dimensions as the Critic network input. Features are then extracted through fully connected layers with nonlinear activation functions to learn the distribution characteristics of valid state-action pairs. Finally, the output layer uses the Sigmoid activation function to generate the probability that a sample is real: a value close to 1 indicates a real working-condition sample, and a value close to 0 indicates a generated sample.

[0042] The GAN discriminator operates in parallel with the Critic network without altering its original dual-Q network structure. By identifying the risk characteristics of extreme operating conditions through output signals, it injects risk signal perception into the system evaluation framework, ultimately achieving the technical effect of improving robustness under extreme operating conditions.

[0043] The network structure constructed above is then trained. The specific training steps are as follows. First, extreme working-condition samples are constructed, including but not limited to: 150% amplification of the high-frequency components, the supercapacitor state of charge mapped to the critical values {0.3, 0.85}, and the DC bus voltage raised to 90% of the safety threshold.
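One way to read the sample-construction rules is sketched below. Several choices are assumptions: "150% amplification" is taken to mean scaling by a factor of 1.5, the state of charge is snapped to whichever critical value is nearer, and the safety threshold is passed in as a hypothetical parameter `v_safe_max`.

```python
def make_extreme_sample(p_high, soc, v_safe_max, soc_critical=(0.3, 0.85)):
    """Build one extreme working-condition sample from a normal one."""
    lo, hi = soc_critical
    # Assumption: snap the state of charge to the nearer critical value.
    soc_ext = lo if abs(soc - lo) <= abs(soc - hi) else hi
    return {
        "p_high": 1.5 * p_high,      # assumed reading of 150% amplification
        "soc": soc_ext,
        "v_dc": 0.9 * v_safe_max,    # 90% of the safety threshold
    }

sample = make_extreme_sample(p_high=100.0, soc=0.4, v_safe_max=400.0)
```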

[0044] The GAN discriminator is pre-trained using only the true state vectors and action commands, which initializes its discriminative capability.

[0045] Then, the extreme working-condition samples are mixed with the true state vectors and action commands and fed into the entire network model for training. In each training cycle, the Critic network parameters are first updated using the loss function of the Q-network; the Actor network parameters are then updated every d steps based on the policy gradient evaluated by the Critic. Finally, the GAN discriminator parameters are updated based on the binary classification loss between real samples and adversarial samples.
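The binary classification loss for the discriminator can be sketched as an ordinary binary cross-entropy, with real samples labeled 1 and adversarial samples labeled 0. This is the conventional form and is assumed here; the text does not give the exact expression. The inputs are pre-Sigmoid scores (logits), and the function names are illustrative.

```python
import math

def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def discriminator_loss(real_logits, fake_logits, eps=1e-12):
    """Mean binary cross-entropy: real samples target 1, adversarial target 0."""
    real_terms = [-math.log(sigmoid(z) + eps) for z in real_logits]
    fake_terms = [-math.log(1.0 - sigmoid(z) + eps) for z in fake_logits]
    return (sum(real_terms) + sum(fake_terms)) / (len(real_terms) + len(fake_terms))

undecided = discriminator_loss([0.0], [0.0])    # discriminator outputs 0.5 everywhere
confident = discriminator_loss([5.0], [-5.0])   # discriminator separates the two classes
```

A discriminator that cannot tell the classes apart sits at ln 2 per sample; the loss falls as its outputs approach 1 for real samples and 0 for adversarial ones.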

[0046] The specific expression of the loss function is:

[0047] where the terms are: the temporal-difference target; the discount factor of the TD3 algorithm; θ_i′ and the corresponding Actor target parameters, which denote the parameters of the Critic target network and the Actor target network, respectively, and are obtained from the online network parameters (θ_i for the Critic) using a soft update strategy; the state-action value function of the Critic network; the authenticity discrimination function of the GAN discriminator; the policy function of the Actor network; and the extreme working-condition sample. The three functions denote the processing of their respective inputs by the Critic network, the GAN discriminator, and the Actor network in this embodiment, and their values are the outputs of these three networks. The remaining symbols are the mathematical expectation, the true state vector at time t, and the true state vector at time t+1.
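The soft update of the target networks and the delayed Actor update can be sketched as follows. The form shown is the usual Polyak averaging used with TD3; the rate `tau` and the default delay `d=2` are illustrative values, not taken from the text, and parameters are represented as flat lists for simplicity.

```python
def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w_t, w in zip(target_params, online_params)]

def should_update_actor(step, d=2):
    """Delayed policy update: refresh the Actor only every d Critic steps."""
    return step % d == 0

new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
```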

[0048] The distribution module is configured to combine the fuel cell reference power and the supercapacitor's final compensation power to obtain the fuel cell's final command power.

[0049] In the allocation module of this embodiment, the output of the Actor network in the TD3 algorithm is processed, the risk value of adversarial examples is evaluated, and the final compensation power of the supercapacitor is synthesized according to the risk level.

[0050] The specific steps for processing the Actor network output and synthesizing the final compensation power of the supercapacitor are as follows. First, the Actor network output is converted into physical power. The specific expression is:

[0051] where the bounds are the maximum discharge power of the supercapacitor and the lower limit of its instantaneous power in charging mode, i.e., the maximum feedback power, and the input is the action vector.
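The conversion from the normalized Actor output to physical power is not preserved here, so the mapping below is an assumption: positive actions scale toward the maximum discharge power, and negative actions scale toward the (negative) charging limit. The parameter names `p_dis_max` and `p_chg_min` are hypothetical.

```python
def action_to_power(action, p_dis_max, p_chg_min):
    """Map a Tanh-range action in [-1, 1] to supercapacitor power in watts."""
    a = max(-1.0, min(1.0, action))          # guard against out-of-range input
    if a >= 0.0:
        return a * p_dis_max                 # discharge side
    return -a * p_chg_min                    # charge side; p_chg_min is negative

p_half_discharge = action_to_power(0.5, p_dis_max=2000.0, p_chg_min=-1000.0)
p_full_charge = action_to_power(-1.0, p_dis_max=2000.0, p_chg_min=-1000.0)
```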

[0052] Next, the risk coefficient is calculated based on the Critic output. The specific expression is:

[0053] where the terms are: the DC bus voltage under the current adversarial condition; the reference voltage; the normal voltage fluctuation range; the supercapacitor state of charge under the adversarial condition; the Q-value output by the i-th Critic network; the reference state of charge of the supercapacitor; and the extreme working-condition sample.

[0054] The final compensation power of the supercapacitor is then synthesized based on the value of the risk coefficient.

[0055] In the low-risk zone, the TD3 optimization result is adopted in full. In the transitional zone (risk coefficient between 0.3 and 0.7), the weight β of the TD3 strategy is reduced exponentially while the weight (1-β) of the MPC safety strategy is increased. In the high-risk zone, control switches to MPC, and the compensation power is limited to within 70% of the fuel cell base power.

[0056] The specific formula is as follows:

[0057]

[0058] where the terms are: the fusion weighting coefficient, used to dynamically adjust the proportion of the two power allocation strategies; the rated power of the fuel cell; the physical power; and the fuel cell reference power.
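The three risk zones can be sketched as a piecewise weight on the TD3 result. The zone boundaries 0.3 and 0.7 and the 70% cap come from the text; the decay rate, the use of the clipped physical power as the safe fallback, and the exact blend are assumptions, because the fusion formula itself is not preserved.

```python
import math

def fusion_weight(risk, decay=5.0):
    """Weight beta on the TD3 result: 1 in the low zone, 0 in the high zone."""
    if risk <= 0.3:
        return 1.0
    if risk >= 0.7:
        return 0.0
    # Transitional zone: assumed exponential decay starting from 1 at risk = 0.3.
    return math.exp(-decay * (risk - 0.3))

def final_compensation(p_phys, p_fc_ref, risk):
    """Blend the TD3 power with a safe value capped at 70% of the reference."""
    beta = fusion_weight(risk)
    limit = 0.7 * abs(p_fc_ref)
    p_safe = max(-limit, min(limit, p_phys))   # assumed safe fallback
    return beta * p_phys + (1.0 - beta) * p_safe

p_low_risk = final_compensation(1000.0, 1000.0, risk=0.1)
p_mid_risk = final_compensation(1000.0, 1000.0, risk=0.5)
p_high_risk = final_compensation(1000.0, 1000.0, risk=0.9)
```

Note that this assumed decay leaves a jump at the upper zone boundary; a different schedule could be chosen to make the hand-over continuous.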

[0059] The fuel cell reference power and the final compensation power of the supercapacitor are fused using a dynamic allocation factor to obtain the final command power of the fuel cell, and the constraints are verified, achieving a reasonable allocation of the required power.

[0060] To further optimize fuel cell power distribution and make full use of the compensation function of the supercapacitor, the allocation factor is calculated dynamically from the real-time state of the supercapacitor, i.e., its state of charge, and the final command power of the fuel cell is generated as:

[0061] And satisfy the following constraints:

[0062] where the terms are: the fuel cell power; the supercapacitor state of charge; the DC bus voltage, with V denoting volts and s denoting the time unit, i.e., seconds; the physical power; and the fuel cell reference power.
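Because neither the allocation-factor formula nor the constraint set is preserved, the final fusion step can only be sketched under assumptions. Here the factor is assumed to rise linearly with the state of charge between two hypothetical limits, the fuel cell is assumed to pick up the share of the compensation power that the supercapacitor is not trusted to deliver, and the only constraint checked is a simple power bound.

```python
def allocation_factor(soc, soc_min=0.3, soc_max=0.85):
    """Assumed linear factor: 0 at the lower SOC limit, 1 at the upper limit."""
    x = (soc - soc_min) / (soc_max - soc_min)
    return max(0.0, min(1.0, x))

def fuel_cell_command(p_fc_ref, p_sc_final, soc, p_fc_min, p_fc_max):
    """Assumed fusion: the fuel cell covers the part the supercapacitor cannot."""
    lam = allocation_factor(soc)
    p_cmd = p_fc_ref + (1.0 - lam) * p_sc_final
    return max(p_fc_min, min(p_fc_max, p_cmd))   # enforce the power bound

p_full_sc = fuel_cell_command(1000.0, 500.0, soc=0.85, p_fc_min=0.0, p_fc_max=5000.0)
p_empty_sc = fuel_cell_command(1000.0, 500.0, soc=0.3, p_fc_min=0.0, p_fc_max=5000.0)
```

A complete implementation would also verify the state-of-charge, bus-voltage, and rate constraints mentioned in the text before issuing the command.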

[0063] This embodiment constructs a hybrid power system and collects data. The required power is decomposed using an empirical mode decomposition algorithm. A fuel cell baseline power is generated through model predictive control. A GAN-TD3 hybrid controller with an integrated GAN discriminator is constructed and trained. The Actor output is processed to synthesize supercapacitor compensation power. Power is then fused through dynamic allocation factors to achieve reasonable allocation of required power, improving the system's robustness in extreme scenarios. Specific advantages include: This embodiment proposes a network integration of a GAN discriminator and the TD3 algorithm. By learning the state-action distribution characteristics through the discriminator mechanism and combining it with adversarial example training, the system's risk perception capability and robustness under extreme conditions are significantly improved, avoiding the failure of traditional methods due to the lack of high-risk samples.

[0064] This embodiment employs spatial-temporal dual-channel feature fusion in the Actor network, utilizing convolutional layers and LSTM networks to process instantaneous dynamic and temporally dependent features respectively, and then fusing them through an adaptive weight mechanism. This improves the multi-scale dynamic coupling problem of hybrid systems and enhances the accuracy of action commands.

[0065] This embodiment sets up a dynamic constraint and safety fuse mechanism, implements protection measures based on the real-time state of charge and voltage deviation of the supercapacitor, and forces a switch to a safe mode in high-risk areas to ensure that power distribution is reliably executed in the engineering environment, which is conducive to extending the equipment life and improving the overall stability of the system.

[0066] Embodiment 2. The purpose of this embodiment is to provide a power distribution method for a fuel cell-supercapacitor hybrid power system, including: obtaining the operating parameters of the hybrid power system, wherein the operating parameters include the demand power time series and the supercapacitor state of charge; decomposing the demand power time series into low-frequency intrinsic mode components and high-frequency intrinsic mode components; using model predictive control to obtain the fuel cell reference power from the low-frequency intrinsic mode components; constructing a hybrid controller based on a dual-delay deep deterministic policy gradient algorithm with an integrated discriminator mechanism, and obtaining the final compensation power of the supercapacitor using the trained hybrid controller based on the high-frequency intrinsic mode components; and combining the fuel cell reference power and the supercapacitor final compensation power to obtain the final command power of the fuel cell.

[0067] Each step in Embodiment 2 corresponds to that in Embodiment 1, and for the sake of brevity, they will not be repeated here.

[0068] In further embodiments, the following is also provided: An electronic device includes a memory and a processor, as well as computer instructions stored in the memory and running on the processor. When executed by the processor, the computer instructions perform the method described in Embodiment 2. For brevity, further details are omitted here.

[0069] It should be understood that in this embodiment, the processor can be a central processing unit (CPU), or it can be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor can be a microprocessor or any conventional processor.

[0070] Memory may include read-only memory and random access memory, and provides instructions and data to the processor. A portion of memory may also include non-volatile random access memory. For example, memory may also store information about the device type.

[0071] A computer-readable storage medium for storing computer instructions, which, when executed by a processor, perform the method described in Embodiment 2.

[0072] The method in Embodiment 2 can be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules within the processor. The software modules can reside in storage media that are well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. This storage medium is located in memory; the processor reads information from the memory and, in conjunction with its hardware, completes the steps of the above method. To avoid repetition, a detailed description is not provided here.

[0073] A computer program product includes a computer program that, when executed by a processor, implements the method described in Embodiment 2.

[0074] The present invention also provides at least one computer program product tangibly stored on a non-transitory computer-readable storage medium. The computer program product includes computer-executable instructions, such as instructions included in program modules, which execute in a device on a target real or virtual processor to perform the processes / methods described above. Typically, program modules include routines, programs, libraries, objects, classes, components, data structures, etc., that perform specific tasks or implement specific abstract data types. In various embodiments, the functionality of program modules can be combined or divided among program modules as needed. The machine-executable instructions for the program modules can execute within a local or distributed device. In a distributed device, the program modules can reside in both local and remote storage media.

[0075] The computer program code used to implement the methods of the present invention may be written in one or more programming languages. This computer program code may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device, such that when executed by the computer or other programmable data processing device, the program code causes the functions / operations specified in the flowcharts and / or block diagrams to be implemented. The program code may be executed entirely on a computer, partially on a computer, as a stand-alone software package, partially on a computer and partially on a remote computer, or entirely on a remote computer or server.

[0076] In the context of this invention, computer program code or related data may be carried by any suitable carrier to enable a device, apparatus, or processor to perform the various processes and operations described above. Examples of carriers include signals, computer-readable media, and the like. Examples of signals may include electrical, optical, radio, sound, or other forms of propagation signals, such as carrier waves, infrared signals, etc.

[0077] Those skilled in the art will recognize that the units and algorithm steps described in conjunction with the embodiments herein can be implemented in electronic hardware or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0078] While the specific embodiments of the present invention have been described above in conjunction with the accompanying drawings, this is not intended to limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications or variations that can be made by those skilled in the art without creative effort based on the technical solutions of the present invention are still within the scope of protection of the present invention.

Claims

1. A power distribution system for a fuel cell-supercapacitor hybrid power system, characterized by comprising:
an acquisition module configured to acquire the operating parameters of the hybrid power system, wherein the operating parameters include the demand power time series and the supercapacitor state of charge;
a decomposition module configured to decompose the demand power time series into low-frequency intrinsic mode components and high-frequency intrinsic mode components;
a model prediction module configured to use model predictive control to obtain the fuel cell reference power from the low-frequency intrinsic mode components;
a reinforcement learning module configured to construct a hybrid controller based on a twin delayed deep deterministic policy gradient (TD3) algorithm with an integrated discriminator mechanism, and to obtain the final compensation power of the supercapacitor from the high-frequency intrinsic mode components using the trained hybrid controller; and
an allocation module configured to combine the fuel cell reference power and the supercapacitor final compensation power to obtain the fuel cell final command power.

2. The power distribution system for a fuel cell-supercapacitor hybrid power system as described in claim 1, characterized in that a model predictive controller is constructed based on the low-frequency intrinsic mode components, and an optimization objective function is established for the model predictive controller. The optimal prediction sequence is obtained by solving the optimization objective function and is mapped to the fuel cell reference power. The optimization objective function [formula not reproduced in this text] is defined in terms of the following quantities: the predicted output power of the fuel cell over the prediction horizon; the efficiency function f of the fuel cell; the reference state of charge of the supercapacitor; the length of the prediction horizon; the weighting coefficients; the state of charge of the supercapacitor; the index i, denoting the i-th control cycle predicted from the current moment; and the low-frequency component.
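Since the objective function itself is not reproduced in this text, the sketch below uses an assumed quadratic form built only from the quantities listed in the claim: a tracking term against the low-frequency component, an efficiency penalty, and a penalty on the deviation of the supercapacitor state of charge from its reference. The efficiency curve, the state-of-charge model, the weights, and the restriction to a constant fuel cell power chosen from a candidate list are all hypothetical simplifications, not the claimed formulation.

```python
def efficiency(p_fc, p_rated=50_000.0):
    """Hypothetical efficiency curve f, peaking at 40 % of rated power."""
    x = p_fc / p_rated
    return max(0.0, 0.55 - 0.5 * (x - 0.4) ** 2)


def mpc_cost(p_fc, p_low, soc0, soc_ref, weights, dt=1.0, e_cap=200_000.0):
    """Assumed cost over the prediction horizon for a constant fuel cell power.

    p_low holds the low-frequency demand for each predicted control cycle.
    The supercapacitor is assumed to cover the difference p_low[i] - p_fc.
    """
    w_track, w_eff, w_soc = weights
    soc, cost = soc0, 0.0
    for p_ref in p_low:
        soc -= (p_ref - p_fc) * dt / e_cap           # simple energy balance
        cost += w_track * ((p_fc - p_ref) / 1000.0) ** 2   # tracking, in kW
        cost += w_eff * (1.0 - efficiency(p_fc))            # efficiency penalty
        cost += w_soc * (soc - soc_ref) ** 2                # SOC deviation
    return cost


def mpc_reference_power(p_low, soc0, soc_ref, candidates,
                        weights=(1.0, 1.0, 100.0)):
    """Pick the candidate power with the lowest assumed cost (grid search)."""
    return min(candidates,
               key=lambda p: mpc_cost(p, p_low, soc0, soc_ref, weights))


p_ref = mpc_reference_power([20_000.0] * 5, soc0=0.6, soc_ref=0.6,
                            candidates=[10_000.0, 20_000.0, 30_000.0])
```

A practical controller would optimise a full power sequence with a numerical solver and apply only its first element; the grid search here only makes the structure of the cost visible.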

3. The power distribution system for a fuel cell-supercapacitor hybrid power system as described in claim 1, characterized in that the twin delayed deep deterministic policy gradient (TD3) algorithm includes an Actor network and a Critic network. The Actor network includes a feature extraction channel, a feature fusion module, and a nonlinear transformation processing layer. The feature extraction channel includes a spatial feature extraction channel and a temporal feature extraction channel arranged in parallel: the spatial channel extracts instantaneous dynamic features of the hybrid power system, and the temporal channel extracts long-term dependency features of the hybrid power system. The feature fusion module fuses the extracted instantaneous dynamic features and long-term dependency features to obtain fused features. The nonlinear transformation processing layer normalizes the fused features to obtain action commands.

4. The power distribution system for a fuel cell-supercapacitor hybrid power system as described in claim 3, characterized in that the Critic network includes two independent Q-value evaluation networks. Each Q-value evaluation network generates a state-action value estimate based on the current state vector and the action vector output by the Actor network. The state vector includes the high-frequency intrinsic mode components, the supercapacitor state of charge, the DC bus voltage, the fuel cell power, and historical power allocation records. The action vector is the power compensation of the supercapacitor.
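In the standard TD3 algorithm, two critics are used so that the learning target can take the smaller of their two estimates, which counteracts value overestimation. The sketch below shows only that target computation and the squared-error critic loss, using plain floats in place of network outputs. Target policy smoothing and the delayed Actor update, which TD3 also uses, are omitted, and whether the claimed controller follows the standard form exactly is an assumption.

```python
def td3_target(reward, q1_next, q2_next, done, gamma=0.99):
    """Clipped double-Q target: y = r + gamma * (1 - done) * min(Q1', Q2')."""
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * min(q1_next, q2_next)


def critic_loss(q_pred, target):
    """Squared error between one critic's estimate and the shared target."""
    return (q_pred - target) ** 2


# Both critics regress toward the same target value.
y = td3_target(reward=1.0, q1_next=2.0, q2_next=3.0, done=False, gamma=0.5)
loss_q1 = critic_loss(1.5, y)
loss_q2 = critic_loss(2.5, y)
```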

5. The power distribution system for a fuel cell-supercapacitor hybrid power system as described in claim 1, characterized in that the training of the hybrid controller specifically involves:
pre-training the generative adversarial network (GAN) discriminator using real state vectors and action vectors of the hybrid power system;
training the hybrid controller on a mixture of extreme operating condition samples and real state vectors of the hybrid power system, combined with action commands; and
during the training loop, updating the parameters of the Critic network based on the Q-value evaluation loss function, updating the parameters of the Actor network based on the policy gradient evaluated by the Critic network, and updating the parameters of the GAN discriminator based on the binary classification loss between real state vectors and adversarial examples.
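The binary classification loss used to update the discriminator can be illustrated as a standard binary cross-entropy, with real samples labelled 1 and adversarial examples labelled 0. The sketch assumes the discriminator has already produced a probability for each sample; the function names and the labelling convention are assumptions, since the text only names the loss type.

```python
import math


def bce(p, label, eps=1e-12):
    """Binary cross-entropy for one predicted probability p and a 0/1 label."""
    p = min(max(p, eps), 1.0 - eps)   # keep log() away from zero
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def discriminator_loss(real_scores, adversarial_scores):
    """Mean cross-entropy with real samples as class 1, adversarial as class 0."""
    losses = [bce(p, 1.0) for p in real_scores]
    losses += [bce(p, 0.0) for p in adversarial_scores]
    return sum(losses) / len(losses)


# An undecided discriminator scores log(2); a sharper one scores lower.
undecided = discriminator_loss([0.5], [0.5])
sharper = discriminator_loss([0.99], [0.01])
```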

6. The power distribution system for a fuel cell-supercapacitor hybrid power system as described in any one of claims 1-5, characterized in that obtaining the final compensation power of the supercapacitor from the high-frequency intrinsic mode components using the trained hybrid controller specifically involves:
obtaining the physical power of the supercapacitor from the action vector output by the Actor network of the trained hybrid controller;
calculating a risk coefficient from the state-action value estimate output by the Critic network of the trained hybrid controller, combined with the DC bus voltage under adversarial conditions, the state of charge of the supercapacitor under adversarial conditions, the reference voltage, and the normal voltage fluctuation range; and
obtaining the final compensation power of the supercapacitor based on the physical power of the supercapacitor and the risk coefficient.
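The claim names the inputs of the risk coefficient but not its formula. The sketch below is one hypothetical way to combine them: each input yields a term in [0, 1], the largest term is taken as the risk coefficient, and the physical power is scaled by one minus that coefficient. The functional form, the thresholds `q_low`, `soc_min`, and `soc_max`, and the scaling rule are all assumptions for illustration.

```python
def clip01(x):
    """Clamp a value to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def risk_coefficient(q_value, v_bus, soc, v_ref, v_band,
                     q_low=-5.0, soc_min=0.2, soc_max=0.9):
    """Hypothetical risk score in [0, 1]; 0 means no detected risk."""
    # Voltage term: excess deviation beyond the normal fluctuation band.
    v_term = clip01((abs(v_bus - v_ref) - v_band) / v_band)
    # State-of-charge term: distance outside the allowed window.
    soc_term = clip01(max(soc_min - soc, soc - soc_max) / 0.1)
    # Critic term: strongly negative value estimates are treated as risky.
    q_term = clip01(q_value / q_low)
    return max(v_term, soc_term, q_term)


def final_compensation(action, p_max, risk):
    """Scale the physical power (action * p_max) down as risk rises."""
    return (1.0 - risk) * action * p_max


risk = risk_coefficient(q_value=1.0, v_bus=400.0, soc=0.5,
                        v_ref=400.0, v_band=20.0)
p_final = final_compensation(action=0.5, p_max=30_000.0, risk=risk)
```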

7. The power distribution system for a fuel cell-supercapacitor hybrid power system as described in claim 1, characterized in that the final command power of the fuel cell is obtained by dynamically calculating an allocation factor based on the state of charge of the supercapacitor and using it to combine the fuel cell reference power and the final compensation power of the supercapacitor.
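How the allocation factor enters the combination is not spelled out in the claim. As a sketch under stated assumptions, the factor below ramps linearly from 0 to 1 with the supercapacitor state of charge, and the fuel cell takes over the share of the compensation power that the supercapacitor is not allowed to deliver. Only the discharge direction is considered, and the ramp limits and the combination rule are hypothetical.

```python
def allocation_factor(soc, soc_min=0.2, soc_max=0.9):
    """Hypothetical factor: 0 at soc_min, 1 at soc_max, linear in between."""
    return max(0.0, min(1.0, (soc - soc_min) / (soc_max - soc_min)))


def fuel_cell_command(p_fc_ref, p_sc_final, soc):
    """Assumed combination: the fuel cell covers what the supercapacitor cannot."""
    alpha = allocation_factor(soc)
    return p_fc_ref + (1.0 - alpha) * p_sc_final


# A full supercapacitor leaves the fuel cell at its reference power;
# an empty one shifts the whole compensation to the fuel cell.
full = fuel_cell_command(20_000.0, 5_000.0, soc=0.9)
empty = fuel_cell_command(20_000.0, 5_000.0, soc=0.2)
```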

8. An electronic device, characterized by comprising a memory, a processor, and computer instructions stored in the memory and run on the processor, wherein the computer instructions, when executed by the processor, perform the following steps:
obtaining the operating parameters of the hybrid power system, wherein the operating parameters include the demand power time series and the state of charge of the supercapacitor;
decomposing the demand power time series into low-frequency intrinsic mode components and high-frequency intrinsic mode components;
obtaining the fuel cell reference power from the low-frequency intrinsic mode components through model predictive control;
constructing a hybrid controller based on a twin delayed deep deterministic policy gradient (TD3) algorithm with an integrated discriminator mechanism, and obtaining the final compensation power of the supercapacitor from the high-frequency intrinsic mode components using the trained hybrid controller; and
combining the fuel cell reference power and the final compensation power of the supercapacitor to obtain the final command power of the fuel cell.

9. A computer-readable storage medium, characterized in that it stores computer instructions which, when executed by a processor, perform the following steps:
obtaining the operating parameters of the hybrid power system, wherein the operating parameters include the demand power time series and the state of charge of the supercapacitor;
decomposing the demand power time series into low-frequency intrinsic mode components and high-frequency intrinsic mode components;
obtaining the fuel cell reference power from the low-frequency intrinsic mode components through model predictive control;
constructing a hybrid controller based on a twin delayed deep deterministic policy gradient (TD3) algorithm with an integrated discriminator mechanism, and obtaining the final compensation power of the supercapacitor from the high-frequency intrinsic mode components using the trained hybrid controller; and
combining the fuel cell reference power and the final compensation power of the supercapacitor to obtain the final command power of the fuel cell.

10. A computer program product, characterized in that it includes a computer program which, when executed by a processor, performs the following steps:
obtaining the operating parameters of the hybrid power system, wherein the operating parameters include the demand power time series and the state of charge of the supercapacitor;
decomposing the demand power time series into low-frequency intrinsic mode components and high-frequency intrinsic mode components;
obtaining the fuel cell reference power from the low-frequency intrinsic mode components through model predictive control;
constructing a hybrid controller based on a twin delayed deep deterministic policy gradient (TD3) algorithm with an integrated discriminator mechanism, and obtaining the final compensation power of the supercapacitor from the high-frequency intrinsic mode components using the trained hybrid controller; and
combining the fuel cell reference power and the final compensation power of the supercapacitor to obtain the final command power of the fuel cell.