A method, device, terminal equipment, and storage medium for controlling a photovoltaic self-powered nitrogen-controlled grain storage silo.

By constructing an energy-atmosphere coupling decision model for a photovoltaic self-powered nitrogen-controlled grain storage facility and combining reinforcement learning with real-time data, the invention addresses energy waste and unstable atmosphere control in photovoltaic self-powered scenarios, achieving efficient energy utilization and precise atmosphere control.

CN122308480A · Pending · Publication Date: 2026-06-30 · SWEET FAYAN IND LTD CO OF GUANGDONG PROVINCE

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SWEET FAYAN IND LTD CO OF GUANGDONG PROVINCE
Filing Date
2026-04-01
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing control methods for photovoltaic self-powered nitrogen-controlled atmosphere grain storage fail to coordinate atmosphere control with photovoltaic energy supply, resulting in energy waste and unstable atmosphere control.

Method used

The real-time photovoltaic output power, the energy storage battery SOC, and environmental data are acquired, and a current state space is constructed from them. Nitrogen regulation is then performed by an energy-atmosphere control coupled decision model based on reinforcement learning. The model's reward function combines a controlled-atmosphere compliance reward, an energy utilization reward, an equipment loss penalty, and an anomaly penalty, from which the action decisions are generated.

Benefits of technology

It achieves efficient utilization of photovoltaic energy, reduces equipment wear, improves the accuracy of controlled atmosphere control and the reliability of system operation, and avoids energy waste and abnormal conditions.


[Figure: CN122308480A_ABST]

Abstract

This invention discloses a method, device, terminal equipment, and storage medium for controlling a photovoltaic self-powered nitrogen-controlled atmosphere grain storage silo. The method includes: acquiring real-time photovoltaic output power of the photovoltaic system, real-time state of charge (SOC) of the energy storage battery, and real-time environmental data within the grain storage silo; comparing the real-time environmental data with the environmental parameter boundaries corresponding to each atmosphere control stage to determine the current atmosphere control stage of the grain storage silo; determining the leakage index and nitrogen concentration decay coefficient based on the current atmosphere control stage; constructing a current state space based on the real-time SOC of the energy storage battery, real-time photovoltaic output power, real-time environmental data, leakage index, and nitrogen concentration decay coefficient; inputting the current state space into a trained energy-atmosphere control coupled decision model to generate a current action decision; and regulating the nitrogen generator within the grain storage silo based on the current action decision. Implementing this invention can improve the accuracy and reliability of nitrogen atmosphere control.

Description

Technical Field

[0001] This invention relates to the field of grain storage technology, and in particular to a photovoltaic self-powered nitrogen-controlled grain storage control method, device, terminal equipment, and storage medium.

Background Technology

[0002] Nitrogen-controlled atmosphere storage technology has become one of the core technologies of modern grain storage due to its effectiveness in inhibiting mold growth in grain piles, killing storage pests, and delaying the decline in grain quality. This technology creates an oxygen-deficient environment by filling sealed storage silos with high-purity nitrogen to preserve grain. However, the stable operation of core equipment such as nitrogen generators relies on a continuous energy supply. In response to the call for clean energy substitution, photovoltaic self-powered systems are increasingly being applied to nitrogen-controlled atmosphere grain storage silos. These systems generate electricity through photovoltaic modules and store energy in batteries, enabling the storage silo's controlled atmosphere system to be self-powered. This reduces the energy costs of traditional grid power supply and aligns with the development needs of low-carbon grain storage, becoming an important direction for the current development of grain storage technology.

[0003] Currently, control methods for photovoltaic self-powered nitrogen-controlled atmosphere grain storage still have significant technical defects. The core problem is that existing control strategies mostly adopt a single control logic, using only gas content parameters such as the nitrogen and oxygen concentrations in the silo as the sole basis for regulation, and therefore fail to coordinate atmosphere control with photovoltaic energy supply. The output power of the photovoltaic system fluctuates significantly and is uncertain because of natural factors such as light intensity and ambient temperature. The state of charge (SOC) of the energy storage battery directly determines the power supply stability of the system, so the battery's charging and discharging state needs to be adjusted dynamically according to the photovoltaic output to avoid battery damage and power outages caused by overcharging and over-discharging.

[0004] However, existing control methods completely ignore the prediction information of photovoltaic output power and the real-time SOC status of energy storage batteries. They only trigger the start-up and shutdown of the nitrogen generator and adjust its output power based on the real-time gas content in the silo. On the one hand, when the nitrogen concentration in the silo is lower than the target value, the nitrogen generator starts running at full load regardless of whether the photovoltaic output is sufficient or whether the energy storage battery is within a reasonable SOC range. This can easily lead to waste of photovoltaic energy or over-discharge of the energy storage battery, shortening the lifespan of the energy storage equipment. On the other hand, when the photovoltaic output is insufficient or the stored energy is depleted, the nitrogen generator shuts down, making it impossible to maintain a stable controlled atmosphere and affecting the safety of grain storage. Therefore, existing control methods are not suitable for photovoltaic self-powered scenarios and easily cause energy waste and unstable controlled atmosphere effects.

Summary of the Invention

[0005] This invention provides a photovoltaic self-powered nitrogen-controlled atmosphere grain storage control method and device to solve the problems that existing nitrogen-controlled atmosphere grain storage control methods are not applicable to photovoltaic self-powered scenarios, have unreasonable control schemes, and are prone to energy waste and unstable atmosphere control effects.

[0006] To achieve the above objectives, a first aspect of this application provides a photovoltaic self-powered nitrogen-controlled grain storage control method, comprising:

Acquiring the real-time photovoltaic output power of the photovoltaic system supplying energy to the grain storage silo, the real-time SOC of the energy storage battery, and real-time environmental data inside the grain storage silo;

Comparing the real-time environmental data with the environmental parameter boundaries corresponding to each controlled atmosphere stage to determine the current controlled atmosphere stage of the grain storage silo, and determining the corresponding leakage index and nitrogen concentration decay coefficient based on the current controlled atmosphere stage;

Constructing the current state space based on the real-time SOC of the energy storage battery, the real-time photovoltaic output power, the real-time environmental data, the leakage index, and the nitrogen concentration decay coefficient;

Inputting the current state space into the trained energy-atmosphere control coupled decision model, so that the model generates the current action decision based on the current state space. The energy-atmosphere control coupled decision model is obtained by training a reinforcement learning model. The action space of the reinforcement learning model includes the nitrogen output mode of the nitrogen generator and the valve opening. The reward function of the reinforcement learning model includes a controlled-atmosphere compliance reward, an energy utilization reward, an equipment loss penalty, and an anomaly penalty. The controlled-atmosphere compliance reward quantifies the difference between the gas concentration parameters in the grain storage silo and the target gas concentration parameters for the corresponding controlled atmosphere stage. The energy utilization reward quantifies the photovoltaic energy utilization rate of the photovoltaic system. The equipment loss penalty quantifies the operating losses of the equipment. The anomaly penalty quantifies the severity of abnormal conditions in the grain storage silo;

Adjusting the nitrogen generator in the grain storage silo based on the current action decision.

[0007] Furthermore, inputting the current state space into the trained energy-atmosphere control coupled decision model, so that the model generates the current action decision based on the current state space, further includes:

Obtaining the historical average load power corresponding to the grain storage silo;

Determining the usable battery power based on the battery power release coefficient corresponding to the real-time SOC of the energy storage battery and the battery's rated capacity;

Determining the energy supply based on the sum of the real-time photovoltaic output power and the usable battery power;

Determining the energy margin based on the deviation between the energy supply and the historical average load power;

Determining the energy state of the photovoltaic system based on the energy margin and the real-time SOC of the energy storage battery;

Screening the action space set with the energy state of the photovoltaic system as a hard constraint, eliminating the actions that cannot be executed under the corresponding energy state to obtain a screened action space set, so that the energy-atmosphere control coupled decision model, after receiving the current state space, generates the current action decision using the screened action space set as its selectable range.
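The energy supply and energy margin calculation in this paragraph can be illustrated with a minimal sketch. The text does not give the release coefficients or units, so everything numeric here is an assumption: the SOC breakpoints and coefficient values in `release_coefficient` are hypothetical, and the coefficient is assumed to have units of 1/h so that multiplying it by a rated capacity in kWh yields a usable power in kW.

```python
def release_coefficient(soc: float) -> float:
    """Hypothetical lookup: battery power release coefficient (1/h) for a given SOC.

    The breakpoints and values are illustrative assumptions, not from the source.
    """
    if soc >= 0.8:
        return 0.5
    if soc >= 0.4:
        return 0.3
    if soc >= 0.2:
        return 0.1
    return 0.0  # assumed: no discharge allowed at very low SOC


def energy_margin(pv_power_kw: float, soc: float,
                  rated_capacity_kwh: float, avg_load_kw: float) -> float:
    """Energy margin = (PV power + usable battery power) - historical average load."""
    usable_battery_kw = release_coefficient(soc) * rated_capacity_kwh
    energy_supply_kw = pv_power_kw + usable_battery_kw
    return energy_supply_kw - avg_load_kw
```

A positive margin means the photovoltaic output plus the usable battery power exceeds the historical average load; a negative margin means supply falls short of it.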

[0008] Furthermore, determining the energy state of the photovoltaic system based on the energy margin and the real-time SOC of the energy storage battery includes:

When the real-time SOC of the energy storage battery is greater than the first preset SOC, or the energy margin is greater than the first preset power value, the energy state of the photovoltaic system is determined to be a high-energy state;

When the real-time SOC of the energy storage battery is between the first preset SOC and the second preset SOC, or the energy margin is between the first preset power value and the second preset power value, the energy state of the photovoltaic system is determined to be a medium-energy state, wherein the first preset SOC is greater than the second preset SOC and the first preset power value is greater than the second preset power value;

When the real-time SOC of the energy storage battery is less than the second preset SOC, or the energy margin is less than the second preset power value, the energy state of the photovoltaic system is determined to be a low-energy state.

Screening the action space set with the energy state of the photovoltaic system as a hard constraint, and eliminating the actions that cannot be executed under the corresponding energy state to obtain a screened action space set, includes:

When the energy state is the medium-energy state, the continuous high-flow nitrogen charging mode and the valve-fully-open action are eliminated from the action space set to obtain the screened action space set;

When the energy state is the low-energy state, the continuous high-flow nitrogen charging mode, the continuous medium-flow nitrogen charging mode, and the valve-fully-open action are eliminated from the action space set to obtain the screened action space set.
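The energy-state classification and hard-constraint screening can be sketched as follows. Several things are assumptions for illustration. The threshold defaults are hypothetical. The `standby` mode and the three-level valve discretization are hypothetical (the continuous high-flow and medium-flow modes and the fully open valve come from this paragraph; pulse injection is mentioned elsewhere in the description). Because the stated conditions are joined by "or" and can overlap, the sketch resolves conflicts conservatively by checking the low-energy condition first, which is also an assumption.

```python
from itertools import product

# Nitrogen output modes; "standby" is a hypothetical addition for illustration.
MODES = ("continuous_high", "continuous_medium", "pulse", "standby")
# Hypothetical discretization of valve opening (1.0 = fully open).
VALVE_OPENINGS = (0.0, 0.5, 1.0)
ACTIONS = tuple(product(MODES, VALVE_OPENINGS))


def classify_energy_state(soc, margin_kw,
                          soc_first=0.7, soc_second=0.3,
                          p_first_kw=2.0, p_second_kw=0.0):
    """Return 'high', 'medium', or 'low'. Thresholds are illustrative assumptions."""
    # Assumed precedence: any low-energy signal wins, to protect the battery.
    if soc < soc_second or margin_kw < p_second_kw:
        return "low"
    if soc > soc_first or margin_kw > p_first_kw:
        return "high"
    return "medium"


def screen_actions(energy_state, actions=ACTIONS):
    """Drop actions that are not executable under the given energy state."""
    banned_modes = {
        "high": set(),
        "medium": {"continuous_high"},
        "low": {"continuous_high", "continuous_medium"},
    }[energy_state]
    ban_full_open = energy_state != "high"
    return [(mode, valve) for mode, valve in actions
            if mode not in banned_modes
            and not (ban_full_open and valve == 1.0)]
```

The screened list would then serve as the set of actions the decision model may choose from, for example by masking the excluded actions before selecting the highest-valued one.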

[0009] Furthermore, the energy-atmosphere control coupled decision model also includes an emergency decision branch. Before the nitrogen generator in the grain storage silo is adjusted based on the current action decision, the method further includes: if a sudden abnormal state is detected, matching, through the emergency decision branch, the corresponding emergency action in the preset emergency action space according to the type of the sudden abnormal state, and regulating the nitrogen generator in the grain storage silo based on the emergency action.

[0010] Furthermore, the controlled atmosphere stages include a gas pre-replacement stage, a rapid target attainment stage, and a maintenance stage. Determining the corresponding leakage index and nitrogen concentration decay coefficient based on the current controlled atmosphere stage includes:

When the controlled atmosphere stage is the maintenance stage, a linear fit is performed on the continuous time series of nitrogen concentration in the grain storage silo, and the nitrogen concentration decay coefficient is determined from the fitting result; the average pressure difference fall rate between the inside and outside of the grain storage silo is determined, and the leakage index is determined based on the average pressure difference fall rate;

When the controlled atmosphere stage is either the gas pre-replacement stage or the rapid target attainment stage, the nitrogen concentration decay coefficient and the leakage index are set to zero.

[0011] Furthermore, the reward function combines four weighted terms: the controlled-atmosphere compliance reward term, the energy utilization reward term, the equipment loss penalty term, and the anomaly penalty term, each with its own weighting coefficient. The formulas and their symbols are not reproduced in this text; the quantities they define are:

  • Overall reward: the reward function; the controlled-atmosphere compliance reward term; the energy utilization reward term; the equipment loss penalty term; the anomaly penalty term; and the weighting coefficient of each of these four terms.
  • Controlled-atmosphere compliance reward term: the index weighting coefficient of the monitoring index corresponding to the oxygen concentration; the target oxygen concentration for the current controlled atmosphere stage; the real-time oxygen concentration; the real-time nitrogen concentration; and the target nitrogen concentration for the current controlled atmosphere stage.
  • Energy utilization reward term: the energy utilization weighting coefficient; the real-time photovoltaic output power; the real-time discharge power of the energy storage battery under action decision A; the real-time SOC of the energy storage battery; the maximum SOC of the energy storage battery; and the minimum SOC of the energy storage battery.
  • Equipment loss penalty term: the loss weighting coefficient; the actual number of equipment start-stop cycles per unit time under action decision A; the maximum allowable number of start-stop cycles; the actual full-load operating time of the equipment under action decision A; and the maximum allowable full-load operating time.
  • Anomaly penalty term: m, the number of types of abnormal situations; i, the index of the i-th type of abnormal situation; the weight value of the i-th type of anomaly; and the anomaly indicator function under action decision A, which equals 1 if action decision A triggers an anomaly of type i and 0 otherwise.
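The exact expressions of this reward function are not recoverable from the text. The block below sketches one form that is consistent with the listed quantities. Every symbol name, the sign convention for the penalties, and the shapes of the compliance, energy, and loss terms are assumptions for illustration, not the patent's own formulas. Only the structure of the anomaly penalty (a weighted sum of indicator functions over the m anomaly types) follows directly from the definitions.

```latex
\begin{aligned}
R &= w_1 R_{\mathrm{gas}} + w_2 R_{\mathrm{en}} - w_3 P_{\mathrm{loss}} - w_4 P_{\mathrm{abn}} \\
R_{\mathrm{gas}} &= -\alpha \left| C_{O_2} - C_{O_2}^{*} \right|
                    - (1-\alpha) \left| C_{N_2} - C_{N_2}^{*} \right| \\
R_{\mathrm{en}} &= \beta \, \frac{P_{\mathrm{pv}}}{P_{\mathrm{pv}} + P_{\mathrm{dis}}(A)}
                   \cdot \frac{SOC - SOC_{\min}}{SOC_{\max} - SOC_{\min}} \\
P_{\mathrm{loss}} &= \gamma \left( \frac{N(A)}{N_{\max}} + \frac{T(A)}{T_{\max}} \right) \\
P_{\mathrm{abn}} &= \sum_{i=1}^{m} \lambda_i \, I_i(A)
\end{aligned}
```

In this sketch, $w_1,\dots,w_4$ are the four term weights, $\alpha$ is the oxygen index weight, $C^{*}$ denotes a stage target concentration, $\beta$ and $\gamma$ are the energy utilization and loss weights, $N(A)$ and $T(A)$ are the start-stop count and full-load time under action decision $A$, and $\lambda_i$ and $I_i(A)$ are the weight and indicator for anomaly type $i$.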

[0012] Furthermore, determining the leakage index based on the average pressure difference fall rate includes calculating the leakage index with a formula (not reproduced in this text) that relates the following quantities: the leakage index; the preset silo capacity correction coefficient; the preset silo sealing correction coefficient; and the average pressure difference fall rate.

[0013] Based on the above method embodiments, the present invention provides corresponding device embodiments. One embodiment of the present invention provides a photovoltaic self-powered nitrogen-controlled grain storage control device, comprising a basic data acquisition module, a derived data determination module, a current state space construction module, an action decision generation module, and an equipment control module.

The basic data acquisition module is used to acquire the real-time photovoltaic output power of the photovoltaic system that powers the grain storage silo, the real-time SOC of the energy storage battery, and the real-time environmental data inside the grain storage silo.

The derived data determination module is used to compare the real-time environmental data with the environmental parameter boundaries corresponding to each controlled atmosphere stage, determine the current controlled atmosphere stage of the grain storage silo, and determine the corresponding leakage index and nitrogen concentration decay coefficient based on the current controlled atmosphere stage.

The current state space construction module is used to construct the current state space based on the real-time SOC of the energy storage battery, the real-time photovoltaic output power, the real-time environmental data, the leakage index, and the nitrogen concentration decay coefficient.

The action decision generation module is used to input the current state space into the trained energy-atmosphere control coupled decision model, so that the model generates the current action decision based on the current state space. The energy-atmosphere control coupled decision model is obtained by training a reinforcement learning model. The action space of the reinforcement learning model includes the nitrogen output mode of the nitrogen generator and the valve opening. The reward function of the reinforcement learning model includes a controlled-atmosphere compliance reward, an energy utilization reward, an equipment loss penalty, and an anomaly penalty. The controlled-atmosphere compliance reward quantifies the difference between the gas concentration parameters in the grain storage silo and the target gas concentration parameters for the corresponding controlled atmosphere stage. The energy utilization reward quantifies the photovoltaic energy utilization rate of the photovoltaic system. The equipment loss penalty quantifies the operating losses of the equipment. The anomaly penalty quantifies the severity of abnormal states in the grain storage silo.

The equipment control module is used to adjust the nitrogen generator in the grain storage silo based on the current action decision.

[0014] Based on the above-mentioned method, the present invention provides corresponding embodiments for terminal devices.

[0015] An embodiment of the present invention provides a terminal device, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor. When the processor executes the computer program, it implements the photovoltaic self-powered nitrogen-controlled grain storage control method according to any one of the above embodiments of the present invention.

[0016] Based on the above-described method, the present invention provides corresponding embodiments for storage media.

[0017] One embodiment of the present invention provides a storage medium that includes a stored computer program, wherein, when the computer program runs, it controls the device where the storage medium is located to execute the photovoltaic self-powered nitrogen-controlled grain storage control method according to any one of the above embodiments of the present invention.

[0018] The following benefits can be obtained by implementing the present invention: This invention provides a method, device, terminal equipment, and storage medium for controlling a photovoltaic self-powered nitrogen-controlled grain storage silo. The method acquires real-time environmental data within the silo and compares it with the boundary parameters of each controlled atmosphere stage to determine the current controlled atmosphere stage. It also obtains the leakage index and nitrogen concentration decay coefficient for that stage, which makes the controlled atmosphere regulation more targeted. A current state space is then constructed from the real-time SOC of the energy storage battery, the real-time photovoltaic output power, the real-time environmental data, the leakage index, and the nitrogen concentration decay coefficient, and the nitrogen control decision is made by an energy-atmosphere control coupled decision model obtained by training a reinforcement learning model.

Compared with existing technologies, this method incorporates the real-time photovoltaic output power and the real-time SOC of the energy storage battery into the control considerations, combining them with the real-time environmental data, leakage index, and nitrogen concentration decay coefficient in the current state space, so that controlled atmosphere decisions are matched to the photovoltaic energy supply capacity. The controlled-atmosphere compliance reward term of the energy-atmosphere control coupled decision model ensures that nitrogen regulation meets the basic requirements. The energy utilization reward term encourages full use of excess photovoltaic power and prevents photovoltaic energy waste. The equipment loss penalty term reduces the mechanical wear caused by frequent start-ups and shutdowns and improper operation of the nitrogen generator, extending equipment lifespan and lowering maintenance costs. The anomaly penalty term helps identify and mitigate abnormal states within the silo promptly, preventing further damage to the controlled atmosphere system and improving its overall operational reliability.

Attached Figure Description

[0019] Figure 1 is a schematic flowchart of a photovoltaic self-powered nitrogen-controlled grain storage control method according to an embodiment of the present invention; Figure 2 is a schematic diagram of the structure of a photovoltaic self-powered nitrogen-controlled grain storage control device according to an embodiment of the present invention.

Detailed Implementation

[0020] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0021] As shown in Figure 1, an embodiment of the present invention provides a method for controlling a photovoltaic self-powered nitrogen-controlled grain storage facility, comprising the following steps:

S1. Acquire the real-time photovoltaic output power of the photovoltaic system supplying energy to the grain storage silo, the real-time SOC of the energy storage battery, and real-time environmental data inside the grain storage silo.

[0022] Specifically, the environmental data within the grain storage silo in this invention includes: oxygen concentration, nitrogen concentration, average temperature, relative humidity, dew point temperature, humidity gradient, and silo pressure. Oxygen and nitrogen concentrations are core indicators for determining the effectiveness of controlled atmosphere storage. Average temperature reflects the basic environment of grain storage, relative humidity affects grain water activity and the growth of insects and mold, dew point temperature reflects the risk of condensation within the silo, and the humidity gradient is the rate of change of humidity in the vertical / horizontal directions within the silo; an excessively large gradient can lead to localized dampness in the grain. Average temperature, relative humidity, dew point temperature, and humidity gradient can reflect the risks associated with grain storage. Silo pressure reflects the sealing condition of the silo. The aforementioned real-time environmental data can be obtained by setting up various sensors to collect the environmental data in real time.

[0023] S2. Compare the real-time environmental data with the environmental parameter boundaries corresponding to each controlled atmosphere stage to determine the current controlled atmosphere stage of the grain storage silo, and determine the corresponding leakage index and nitrogen concentration decay coefficient based on the current controlled atmosphere stage.

Preferably, in this invention, the controlled atmosphere stages include a gas pre-replacement stage, a rapid target attainment stage, and a maintenance stage. The gas pre-replacement stage is the initial operating stage of nitrogen controlled atmosphere: after the controlled atmosphere task starts, nitrogen is injected at a high flow rate to quickly displace the original air in the grain storage silo, significantly reducing the oxygen content from its natural ambient value and initially raising the nitrogen concentration. The rapid target attainment stage is the core process stage: after pre-replacement, nitrogen continues to be injected at a high flow rate with precise control of the injection rate, so that the oxygen content and nitrogen concentration in the silo are brought quickly and accurately to the target gas content for the stored grain variety. The maintenance stage begins once all indicators have reached their targets in the rapid target attainment stage: a low-energy, flexible control method (such as pulse nitrogen injection) monitors and compensates in real time for the indicator fluctuations caused by natural nitrogen decay and slight leakage in the silo, so that core indicators such as oxygen content and nitrogen concentration stay within the target threshold range over the long term. This stage is key to long-term controlled atmosphere preservation of grain.

[0024] The boundary values of oxygen and nitrogen concentration differ between controlled atmosphere stages. In this invention, the oxygen and nitrogen concentration boundary values corresponding to each controlled atmosphere stage serve as the environmental parameter boundaries described above. The oxygen and nitrogen concentrations at the current moment are extracted from the real-time environmental data and compared with the boundary values for each controlled atmosphere stage to determine the current controlled atmosphere stage of the grain storage silo.

In a preferred embodiment, determining the corresponding leakage index and nitrogen concentration decay coefficient based on the current controlled atmosphere stage includes:

When the controlled atmosphere stage is the maintenance stage, a linear fit is performed on the continuous time series of nitrogen concentration in the grain storage silo, and the nitrogen concentration decay coefficient is determined from the fitting result; the average pressure difference fall rate between the inside and outside of the grain storage silo is determined, and the leakage index is determined based on the average pressure difference fall rate;

When the controlled atmosphere stage is either the gas pre-replacement stage or the rapid target attainment stage, the nitrogen concentration decay coefficient and the leakage index are set to zero.
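The stage comparison described in this paragraph can be sketched as a lookup against per-stage concentration bounds. The text does not give the boundary values, so the percentages in `STAGES` are hypothetical placeholders, and the names `StageBounds` and `determine_stage` are illustrative. The sketch checks the tightest stage first and returns `None` when the oxygen and nitrogen readings do not fit any stage together, which is an assumed way of handling inconsistent sensor data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StageBounds:
    name: str
    o2_min: float  # oxygen concentration bounds, percent by volume
    o2_max: float
    n2_min: float  # nitrogen concentration bounds, percent by volume
    n2_max: float


# Hypothetical boundary values, ordered from the tightest stage to the loosest.
STAGES = (
    StageBounds("maintenance", 0.0, 2.0, 98.0, 100.0),
    StageBounds("rapid_target_attainment", 2.0, 12.0, 88.0, 98.0),
    StageBounds("gas_pre_replacement", 12.0, 21.0, 78.0, 88.0),
)


def determine_stage(o2_pct: float, n2_pct: float) -> Optional[str]:
    """Return the first stage whose O2 and N2 bounds both contain the readings."""
    for stage in STAGES:
        if (stage.o2_min <= o2_pct <= stage.o2_max
                and stage.n2_min <= n2_pct <= stage.n2_max):
            return stage.name
    return None  # readings are mutually inconsistent or out of range
```

With these placeholder bounds, ambient-air readings fall in the gas pre-replacement stage and readings near the target fall in the maintenance stage.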

[0025] Specifically, during the maintenance stage, the nitrogen concentration data within the grain storage silo over the most recent cycle up to the current moment are collected to obtain a continuous time series of nitrogen concentration. A univariate linear fit is then performed on this series, with time as the independent variable and nitrogen concentration as the dependent variable, and the absolute value of the fitted slope is defined as the nitrogen concentration decay coefficient.

For the leakage index, the pressure difference between the inside and outside of the grain storage silo at the start of the most recent cycle is taken as the initial pressure difference, and the pressure difference at the current moment is taken as the real-time pressure difference. The total decrease in pressure difference is calculated from the initial and real-time pressure differences, and the average pressure difference fall rate is calculated from the total decrease and the total cycle duration. The leakage index is then calculated with a formula (not reproduced in this text) that relates the following quantities: the leakage index; the preset silo capacity correction coefficient, which reflects the impact of silo volume on pressure drop (the larger the volume, the slower the pressure drop for the same leakage amount, and the smaller the coefficient); the preset silo sealing correction coefficient, which reflects the inherent leakage characteristics of the silo's basic sealing level; and the average pressure difference fall rate. The specific values of the silo sealing correction coefficient and the silo capacity correction coefficient can be set according to the actual situation of the grain storage silo.

The nitrogen concentration decay coefficient and the leakage index are the core criteria for judging whether the maintenance stage is abnormal. Two situations arise in the maintenance stage: normal nitrogen concentration decay (for example, grain respiration consuming a small amount of nitrogen, or micro-permeability of the silo materials) and abnormal decay (for example, silo leakage or an unsealed silo door). Combining the nitrogen concentration decay coefficient with the leakage index makes it possible to identify the root cause of nitrogen decay and thereby determine whether the silo has an abnormal leak. When the nitrogen concentration decay coefficient exceeds its threshold and the leakage index rises at the same time, the silo is judged to be leaking.
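The two maintenance-stage indicators can be sketched in plain Python. The decay coefficient follows the description directly (absolute slope of a least-squares line through the nitrogen concentration series). The leakage index formula is not available in the text, so the product form used in `leakage_index` is an assumption, as are the function names, the units, and the use of the previous cycle's index to detect a rise in `is_leaking`.

```python
def nitrogen_decay_coefficient(times_h, n2_pct):
    """Absolute slope of the least-squares line n2 = a*t + b (percent per hour)."""
    n = len(times_h)
    t_mean = sum(times_h) / n
    c_mean = sum(n2_pct) / n
    sxy = sum((t - t_mean) * (c - c_mean) for t, c in zip(times_h, n2_pct))
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    return abs(sxy / sxx)


def avg_pressure_fall_rate(dp_initial_pa, dp_now_pa, cycle_h):
    """Total decrease in inside/outside pressure difference divided by cycle length."""
    return (dp_initial_pa - dp_now_pa) / cycle_h


def leakage_index(fall_rate, k_capacity, k_seal):
    """ASSUMED product form; the source formula is not reproduced in the text."""
    return k_capacity * k_seal * fall_rate


def is_leaking(decay_coeff, decay_threshold, leak_now, leak_previous):
    """Leak is flagged when decay exceeds its threshold and the index has risen."""
    return decay_coeff > decay_threshold and leak_now > leak_previous
```

For example, a nitrogen series that drops by 0.1 percentage points per hour gives a decay coefficient of about 0.1, and a pressure difference that falls from 300 Pa to 240 Pa over a 3 h cycle gives an average fall rate of 20 Pa/h.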

[0026] When the controlled atmosphere stage is either the gas pre-replacement stage or the rapid target attainment stage, the silo is undergoing dynamic pressure building and concentration adjustment through active nitrogen charging, which does not meet the core premise for calculating the two indices. The indices therefore have no practical reference value in these stages and are set directly to zero to avoid invalid calculations.

[0027] S3. Construct the current state space based on the real-time SOC of the energy storage battery, the real-time photovoltaic output power, the real-time environmental data, the leakage index, and the nitrogen concentration decay coefficient.

Specifically, the real-time SOC of the energy storage battery, the real-time photovoltaic output power, the real-time environmental data, the leakage index, and the nitrogen concentration decay coefficient together form the state space. These data are then input into the energy-atmosphere control coupled decision model, which is obtained by training a reinforcement learning model, to select an action from the action space and thus obtain the specific decision action.
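A minimal sketch of assembling the state described in S3 follows. The field list combines the battery SOC, the photovoltaic power, the seven environmental quantities named in the description, and the two derived indices. The class name `SiloState`, the field names, the units, and the ordering of the vector are assumptions for illustration; the text does not specify any normalization or encoding.

```python
from dataclasses import dataclass, astuple
from typing import List


@dataclass(frozen=True)
class SiloState:
    soc: float                 # real-time battery state of charge, 0..1
    pv_power_kw: float         # real-time photovoltaic output power
    o2_pct: float              # oxygen concentration
    n2_pct: float              # nitrogen concentration
    avg_temp_c: float          # average temperature in the silo
    rel_humidity_pct: float    # relative humidity
    dew_point_c: float         # dew point temperature
    humidity_gradient: float   # rate of change of humidity across the silo
    silo_pressure_pa: float    # silo pressure
    leakage_index: float       # zero outside the maintenance stage
    n2_decay_coeff: float      # zero outside the maintenance stage

    def as_vector(self) -> List[float]:
        """Flatten the state into the ordered feature list fed to the model."""
        return [float(x) for x in astuple(self)]
```

A fixed field order matters here because a trained model expects each feature at the same position in every observation.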

[0028] Constructing the current state space of the grain storage silo from the current SOC of the energy storage battery, real-time photovoltaic output power, real-time environmental data, leakage index, and nitrogen concentration decay coefficient is the core design for deeply coupling and intelligently coordinating the photovoltaic self-powered system and the nitrogen controlled atmosphere process. It integrates scattered single-dimensional monitoring data into a multi-dimensional, strongly correlated, and quantifiable representation of the overall system state, breaking away from traditional single-indicator control. In the traditional control mode, the photovoltaic power supply system (SOC, photovoltaic power) and the controlled atmosphere process (leakage, decay) are monitored and decided independently, which easily leads to a mismatch between energy supply and controlled atmosphere demand (such as energy-intensive nitrogen charging when photovoltaic power is insufficient, or delayed nitrogen replenishment when photovoltaic power is sufficient). With a unified state space, energy dispatch and controlled atmosphere process decisions can be made more precisely.

[0029] In other alternative embodiments, the real-time photovoltaic output power can be replaced with the predicted average photovoltaic output power over a preset future time period; the current state space is then constructed based on the real-time SOC of the energy storage battery, the predicted average photovoltaic output power, real-time environmental data, leakage index, and nitrogen concentration decay coefficient. In a preferred embodiment, the predicted average photovoltaic output power of the photovoltaic system powering the grain storage silo can be obtained as follows: the solar irradiance and ambient temperature at each moment within the preset time period, the layout of the photovoltaic array in the photovoltaic system, and the reflectivity of the grain storage silo surface material are obtained; these are then input into the trained photovoltaic output power prediction model, which determines the predicted average photovoltaic output power of the photovoltaic system.

[0030] Specifically, the aforementioned preset time period can be 5 minutes (the specific duration can be set according to actual needs), in which case the average photovoltaic output power over the next 5 minutes is predicted and used as the predicted average photovoltaic output power. The solar irradiance and ambient temperature at each moment within the next 5 minutes are obtained from weather forecast data for that period and combined with the photovoltaic array layout and the reflectivity of the grain storage silo surface material to form the input data, from which the trained photovoltaic output power prediction model predicts the average photovoltaic output power over the next 5 minutes. The photovoltaic output power prediction model can be obtained by training an existing neural network model on a set of training samples. Each training sample includes a solar irradiance sample, an ambient temperature sample, a photovoltaic array layout sample, and a sample of the reflectivity of the grain storage silo surface material, and each training sample corresponds to an actual average photovoltaic output power label. With the training samples as input and the predicted average photovoltaic output power as output, the neural network model is trained against the corresponding labels to obtain the aforementioned photovoltaic output power prediction model. Photovoltaic power generation is intermittent, random, and time-varying, while nitrogen atmosphere regulation in grain storage is a process with large inertia and large lag. In this embodiment, the predicted average photovoltaic output power is used instead of the real-time photovoltaic output power to dispatch energy in advance and to avoid frequent changes in control decisions and frequent start-stop of actuators such as nitrogen generators and valves.
However, if the photovoltaic output power is taken over too long a time interval, the decision can easily fail to adapt to the fluctuation characteristics of photovoltaic output. Therefore, this application selects a suitable time scale, namely the average photovoltaic power over the next 5 minutes, as the control benchmark. This avoids frequent changes in decisions while still adapting to the fluctuation characteristics of photovoltaic output, ensuring the stable operation of the photovoltaic self-powered system and the accuracy of atmosphere regulation.

[0031] Unlike conventional photovoltaic (PV) output power prediction models, this invention takes the characteristics of grain silos into account and improves the model's input data. In real-world scenarios, grain storage silos come in various types, such as vertical silos and shallow circular silos. Vertical silos are tall (25-40 m) and cylindrical, with PV modules mostly installed on the lower part of the side walls or on the roofs of adjacent flat-roofed annexes. The silos themselves cause significant shading (side wall modules are easily blocked by adjacent silos), and sunlight is concentrated during peak hours. Shallow circular silos are cylindrical with a moderate height (15-25 m), and PV modules are mostly installed on the roof (the circular top surface), with slight shading in some areas. Different types of grain storage silos affect the layout of the PV array in the photovoltaic system and thus the PV output power. Furthermore, the reflectivity of the grain storage silo surface material affects the intensity of reflected light, which further affects the PV output power. Including these two parameters as model inputs improves the accuracy of PV power prediction.

[0032] S4: Input the current state space into the trained energy-atmosphere regulation coupled decision model, so that the model generates the current action decision based on the current state space; wherein the energy-atmosphere regulation coupled decision model is obtained by training a reinforcement learning model; the action space of the reinforcement learning model includes the nitrogen output mode of the nitrogen generator and the valve opening; the reward function of the reinforcement learning model includes a controlled atmosphere compliance reward item, an energy utilization reward item, an equipment loss penalty item, and an anomaly penalty item; the controlled atmosphere compliance reward item is used to quantify the difference between the gas concentration parameters in the grain storage silo and the target gas concentration parameters of the corresponding controlled atmosphere stage; the energy utilization reward item is used to quantify the photovoltaic energy utilization of the photovoltaic system; the equipment loss penalty item is used to quantify the operating loss of the equipment; the anomaly penalty item is used to quantify the severity of the abnormal state in the grain storage silo. In a preferred embodiment, the reward function combines these four items, each with its own weighting coefficient: the weighting coefficient of the controlled atmosphere compliance reward item, the weighting coefficient of the energy utilization reward item, the weighting coefficient of the equipment loss penalty item, and the weighting coefficient of the anomaly penalty item. The quantities entering each item are as follows.
Controlled atmosphere compliance reward item: the index weighting coefficient of the monitoring index corresponding to the oxygen concentration; the target oxygen concentration for the current controlled atmosphere stage; the real-time oxygen concentration; the real-time nitrogen concentration; and the target nitrogen concentration for the current controlled atmosphere stage. Energy utilization reward item: the energy utilization weighting coefficient; the real-time photovoltaic output power; the real-time discharge power of the energy storage battery under action decision A; the real-time SOC of the energy storage battery; the maximum SOC of the energy storage battery; and the minimum SOC of the energy storage battery. Equipment loss penalty item: the loss weighting coefficient; the actual number of equipment start-stops per unit time under action decision A; the maximum number of start-stop cycles allowed for the equipment; the actual full-load operating time of the equipment under action decision A; and the maximum allowable full-load operating time of the equipment. Anomaly penalty item: m is the number of types of abnormal situation; i denotes the i-th type of abnormal situation; each type i has a weight value; and an abnormality indicator function based on action decision A takes the value 1 if action decision A triggers an abnormality of type i and 0 otherwise.

[0033] It should be noted that, when the current state space is constructed using the real-time SOC of the energy storage battery, the predicted average photovoltaic output power, real-time environmental data, leakage index, and nitrogen concentration decay coefficient, the photovoltaic output power used in the reward function above is the predicted average photovoltaic output power. Specifically, in this invention, a reinforcement learning model is used as the basic model. The state space of the reinforcement learning model is set as follows: the current SOC of the energy storage battery, the real-time photovoltaic output power (or the predicted average photovoltaic output power), real-time environmental data, leakage index, and nitrogen concentration decay coefficient. The action space of the reinforcement learning model is set as follows: the nitrogen output mode of the nitrogen generator and the valve opening. A corresponding reward function is set, and corresponding samples are then collected for iterative training to finally generate the above-mentioned energy-atmosphere regulation coupled decision model. The energy-atmosphere regulation coupled decision model can then generate the nitrogen output mode and valve opening of the nitrogen generator from the actually collected real-time SOC of the energy storage battery, real-time photovoltaic output power, real-time environmental data, leakage index, and nitrogen concentration decay coefficient, thereby obtaining the above-mentioned current action decision. In this invention, the nitrogen output modes include: continuous high-flow nitrogen charging mode (e.g., 80-120 m³/h), continuous medium-flow nitrogen charging mode (e.g., 30-60 m³/h), and intermittent nitrogen charging mode, etc.; the valve opening includes: fully open, half open, and closed, etc. (in other optional embodiments, finer opening levels can be set); the combinations of nitrogen output modes and valve openings constitute the action space set of the reinforcement learning model. The setting of the reward function is the key to the reinforcement learning model, and the reward and penalty terms involved in the reward function of this invention are explained in detail below. The controlled atmosphere compliance reward item quantifies how closely the environmental indicators inside the silo approach the targets of the corresponding controlled atmosphere stage. The closer an indicator is to its target, the higher the reward value, which is directly related to the nitrogen charging strategy and valve opening determined by the model. In this invention, nitrogen and oxygen concentrations are used as the measurement standards: the controlled atmosphere compliance reward is calculated by comparing the nitrogen and oxygen concentrations (percentages) in the grain storage silo with the target nitrogen and oxygen concentrations for the corresponding controlled atmosphere stage (the targets differ between stages). The index weighting coefficient of the monitoring index corresponding to the oxygen concentration can be 0.6. The energy utilization reward item quantifies the utilization efficiency of photovoltaic energy and the rationality of battery energy consumption, and prioritizes the use of photovoltaic power. The energy utilization weighting coefficient can be 0.7, so as to prioritize local use of photovoltaic power and reduce dependence on battery discharge. The equipment loss penalty item quantifies the start-stop and operating losses of the nitrogen generator, so as to avoid frequent start-stop or prolonged full-load operation of the nitrogen generator. The loss weighting coefficient can be 0.8: the start-stop losses of the nitrogen generator are much greater than its continuous operation losses, so frequent start-stops are penalized first. The maximum number of start-stop cycles allowed for the nitrogen generator and the maximum allowable full-load operating time of the nitrogen generator are both determined by the equipment's factory specifications.
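
The three terms described above can be sketched in code. The exact formulas are not available in this text, so every functional form below (weighted absolute deviation, photovoltaic share plus normalized SOC, normalized start-stop count plus normalized full-load time) is an assumption chosen to be consistent with the listed quantities and the example coefficients 0.6, 0.7, and 0.8; the function names are hypothetical.

```python
def compliance_reward(o2: float, o2_target: float, n2: float, n2_target: float,
                      alpha: float = 0.6) -> float:
    """ASSUMED form: negative weighted distance to the stage targets.

    Values closer to the targets give a higher (less negative) reward.
    """
    return -(alpha * abs(o2 - o2_target) + (1.0 - alpha) * abs(n2 - n2_target))


def energy_reward(pv_power: float, discharge_power: float, soc: float,
                  soc_min: float, soc_max: float, beta: float = 0.7) -> float:
    """ASSUMED form: PV share of supplied power plus normalized battery SOC."""
    total = pv_power + discharge_power
    pv_share = pv_power / total if total > 0 else 0.0
    soc_level = (soc - soc_min) / (soc_max - soc_min)
    return beta * pv_share + (1.0 - beta) * soc_level


def loss_penalty(starts: int, max_starts: int, full_load_time: float,
                 max_full_load_time: float, gamma: float = 0.8) -> float:
    """ASSUMED form: start-stop cycles weighted more heavily than full-load time."""
    return (gamma * starts / max_starts
            + (1.0 - gamma) * full_load_time / max_full_load_time)


r_gas = compliance_reward(o2=2.5, o2_target=2.0, n2=97.5, n2_target=98.0)
r_energy = energy_reward(pv_power=3.0, discharge_power=1.0, soc=0.6,
                         soc_min=0.2, soc_max=1.0)
p_loss = loss_penalty(starts=2, max_starts=4, full_load_time=1.0,
                      max_full_load_time=4.0)
```

With `gamma = 0.8`, one extra start-stop cycle costs more than the same relative increase in full-load time, which mirrors the stated priority of penalizing frequent start-stops first.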

[0034] The anomaly penalty item quantifies the severity of abnormal states in the grain storage silo. After the model's action decision A is executed, the severity is judged from parameters such as the average temperature, relative humidity, dew point temperature, humidity gradient, and internal air pressure. The specific abnormal situations are: abnormal average temperature, abnormal relative humidity, abnormal dew point temperature, abnormal humidity gradient, abnormal internal air pressure, and leakage (which can be determined from the aforementioned leakage index and nitrogen concentration decay coefficient), six items in total, so that, for example, m can be 6. If, after the model's action decision A is executed, the value of any of the above indicators exceeds the threshold for the corresponding controlled atmosphere stage, an abnormality is determined. For instance, in the gas pre-replacement stage, the normal average temperature is bounded by the first preset temperature threshold; if the average temperature inside the grain storage silo exceeds the first preset temperature threshold after action decision A is executed, an average temperature abnormality is triggered and a penalty is applied. Each type of abnormality has its own weighting coefficient, and the threshold for the same indicator differs between controlled atmosphere stages: in the maintenance stage, for example, the normal average temperature is bounded by the second preset temperature threshold, which differs from the first preset temperature threshold. With the anomaly penalty item, the severity of the abnormalities caused by action decision A is assessed and actions that push indicators beyond their limits are penalized, thereby reducing the risk to the stored grain.
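
The anomaly penalty can be sketched as a weighted sum of 0/1 indicators with stage-dependent thresholds. Only two of the six abnormality types plus leakage are shown; the threshold values, the weights, and all names are hypothetical placeholders, and treating each threshold as an upper limit is an assumption.

```python
from typing import Dict

# Hypothetical upper limits per controlled atmosphere stage.
STAGE_LIMITS: Dict[str, Dict[str, float]] = {
    "pre_replacement": {"avg_temperature": 25.0, "relative_humidity": 75.0},
    "maintenance": {"avg_temperature": 20.0, "relative_humidity": 70.0},
}

# Hypothetical weight for each abnormality type.
WEIGHTS: Dict[str, float] = {
    "avg_temperature": 0.3,
    "relative_humidity": 0.2,
    "leakage": 0.5,
}


def anomaly_penalty(stage: str, readings: Dict[str, float],
                    leakage_detected: bool) -> float:
    """Sum of weight_i * indicator_i over the abnormality types."""
    limits = STAGE_LIMITS[stage]
    # Indicator is 1 when the reading after the action exceeds the stage limit.
    flags = {name: int(readings[name] > limit) for name, limit in limits.items()}
    flags["leakage"] = int(leakage_detected)
    return sum(WEIGHTS[name] * flag for name, flag in flags.items())


readings = {"avg_temperature": 22.0, "relative_humidity": 60.0}
p_pre = anomaly_penalty("pre_replacement", readings, leakage_detected=False)
p_maint = anomaly_penalty("maintenance", readings, leakage_detected=False)
```

The same 22.0 reading is normal in the pre-replacement stage but abnormal in the maintenance stage, which is the stage-dependent threshold behaviour described above.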

[0035] Furthermore, in this invention the weighting coefficients of the controlled atmosphere compliance reward item, energy utilization reward item, equipment loss penalty item, and anomaly penalty item in the reward function are configured dynamically for the different controlled atmosphere stages, as illustrated below. In the pre-replacement stage, the four weighting coefficients (compliance reward, energy utilization reward, equipment loss penalty, anomaly penalty, in that order) can be set to 0.5, 0.2, 0.1, and 0.2. This configuration allows the decision actions generated in the pre-replacement stage to reduce the oxygen content quickly and ensure basic grain storage safety. In the rapid standard attainment stage, the four weighting coefficients can be set, in the same order, to 0.6, 0.4, 0.1, and 0.2. This configuration enables the generated decision actions to reach the controlled atmosphere standard more quickly.

[0036] In the maintenance stage, the weighting coefficients of the compliance reward, energy utilization reward, equipment loss penalty, and anomaly penalty can be set to 0.3, 0.4, 0.2, and 0.1 respectively. This configuration enables the generated decision actions to maintain the standard state with low energy consumption and to improve the utilization rate of photovoltaic power.
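
A minimal sketch of the stage-dependent weighting follows. It assumes the four values in each stage are listed in the order compliance reward, energy utilization reward, equipment loss penalty, anomaly penalty, and that rewards are added while penalties are subtracted; neither the order nor the sign convention is stated explicitly in this text, and the names are hypothetical.

```python
from typing import Dict, Tuple

# (compliance, energy, equipment loss, anomaly): values as given per stage.
STAGE_WEIGHTS: Dict[str, Tuple[float, float, float, float]] = {
    "pre_replacement": (0.5, 0.2, 0.1, 0.2),
    "rapid_attainment": (0.6, 0.4, 0.1, 0.2),
    "maintenance": (0.3, 0.4, 0.2, 0.1),
}


def total_reward(stage: str, r_gas: float, r_energy: float,
                 p_loss: float, p_anomaly: float) -> float:
    """ASSUMED sign convention: reward terms add, penalty terms subtract."""
    w_gas, w_energy, w_loss, w_anomaly = STAGE_WEIGHTS[stage]
    return (w_gas * r_gas + w_energy * r_energy
            - w_loss * p_loss - w_anomaly * p_anomaly)


reward = total_reward("maintenance", r_gas=1.0, r_energy=0.5,
                      p_loss=0.5, p_anomaly=1.0)
```

Keeping the weights in a per-stage table means the same trained reward code can be reused across stages by switching only the key.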

[0037] Based on the above action space, state space, and reward function, after the reinforcement learning model is configured, corresponding samples are collected and the reinforcement learning model is iteratively trained according to existing training methods to generate the energy-atmosphere regulation coupled decision model. The current state space is then input into the trained energy-atmosphere regulation coupled decision model to obtain the current action decision, that is, the current nitrogen output mode of the nitrogen generator and the current valve opening. S5. Adjust the nitrogen generator in the grain storage silo based on the current action decision.

[0038] Specifically, based on the current nitrogen output mode and current valve opening obtained from the above steps, the nitrogen generator is controlled to perform nitrogen filling operation with the corresponding output mode and valve opening.

[0039] In other preferred embodiments, inputting the current state space into the trained energy-atmosphere regulation coupled decision model, so that the model generates the current action decision based on the current state space, further comprises: obtaining the historical average load power corresponding to the grain storage silo; determining the available battery power based on the battery power release coefficient corresponding to the real-time SOC of the energy storage battery and the battery's rated capacity; determining the energy supply based on the sum of the real-time photovoltaic output power and the available battery power; determining the energy margin based on the deviation between the energy supply and the historical average load power; determining the energy state of the photovoltaic system based on the energy margin and the real-time SOC of the energy storage battery; and, using the energy state of the photovoltaic system as a hard constraint, preliminarily screening the action space set to eliminate the actions that cannot be executed under the corresponding energy state, resulting in a screened action space set. After receiving the current state space, the energy-atmosphere regulation coupled decision model then uses the screened action space set as its range of selectable actions when generating the current action decision.

[0040] Preferably, when the real-time SOC of the energy storage battery is greater than the first preset SOC, or the energy margin is greater than the first preset power value, the energy state of the photovoltaic system is determined to be a high-energy state. When the real-time SOC of the energy storage battery is between the first preset SOC and the second preset SOC, or when the energy margin is between the first preset power value and the second preset power value, the energy state of the photovoltaic system is determined to be a medium-energy state; wherein the first preset SOC is greater than the second preset SOC, and the first preset power value is greater than the second preset power value. When the real-time SOC of the energy storage battery is less than the second preset SOC, or the energy margin is less than the second preset power value, the energy state of the photovoltaic system is determined to be a low-energy state. Preliminarily screening the action space set using the energy state of the photovoltaic system as a hard constraint, so as to eliminate the actions that cannot be executed under the corresponding energy state and obtain the screened action space set, includes: when the energy state is the medium-energy state, eliminating from the action space set the actions that use the continuous high-flow nitrogen charging mode and the actions that use a fully open valve; and when the energy state is the low-energy state, eliminating from the action space set the actions that use the continuous high-flow nitrogen charging mode, the continuous medium-flow nitrogen charging mode, or a fully open valve.

[0041] In this embodiment, the energy state of the photovoltaic system is determined. Using the energy state as a hard constraint, actions that cannot be executed under the corresponding energy state are eliminated from the set of available actions in the model's action space. This reduces the search space when the model generates the current decision action, thereby improving the efficiency and accuracy of decision generation.

[0042] Specifically, in this embodiment, the available battery power is calculated using the following formula: $P_{\mathrm{bat}}(t) = k\big(SOC(t)\big) \cdot C_{\mathrm{rated}}$; where $P_{\mathrm{bat}}(t)$ is the available battery power at time t; $SOC(t)$ is the battery SOC at time t; $k(\cdot)$ is the battery power release coefficient, which is calibrated from the battery's factory specifications and can be obtained by table lookup (illustratively, when $SOC(t) \ge 80\%$, $k = 1$); and $C_{\mathrm{rated}}$ is the battery's rated capacity.

[0043] The energy supply is determined by summing the real-time photovoltaic output power and the available battery power. The historical average load power of the grain storage silo is then subtracted from the energy supply, and the difference is used as the energy margin. If the energy margin is greater than zero, the energy is sufficient; if it is less than zero, the energy is insufficient; if it is equal to zero, the energy is balanced. The energy state is finally determined by combining two threshold tests, one on the energy margin and one on the real-time SOC of the energy storage battery. It can be understood that, when the current state space is constructed based on the real-time SOC of the energy storage battery, the predicted average photovoltaic output power, real-time environmental data, leakage index, and nitrogen concentration decay coefficient, the real-time photovoltaic output power can be replaced with the predicted average photovoltaic output power, and the energy margin can be calculated from the predicted average photovoltaic output power. Preferably, the first preset SOC can be 80%; the second preset SOC can be 50%; the first preset power value can be 50% of the historical average load power; and the second preset power value can be 0. The energy state of the photovoltaic system can be determined with the above thresholds.
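
The energy margin and energy state can be sketched as below, using the preferred thresholds (80%, 50%, 50% of the average load, and 0). Several points are assumptions: the available battery power is taken as the release coefficient multiplied by the rated capacity; only the "SOC at or above 80% gives coefficient 1" entry of the lookup table comes from the text, and the other entries are placeholders; and because the "or" conditions for the three states can overlap, this sketch checks the low-energy condition first as a conservative tie-break.

```python
def release_coefficient(soc: float) -> float:
    """Battery power release coefficient looked up from the SOC (0..1)."""
    if soc >= 0.8:
        return 1.0   # given as an illustrative value in the text
    if soc >= 0.5:
        return 0.6   # placeholder, not from the source
    return 0.2       # placeholder, not from the source


def energy_margin(pv_power: float, soc: float, rated_capacity: float,
                  avg_load: float) -> float:
    """Energy supply (PV + available battery power) minus average load."""
    available_battery = release_coefficient(soc) * rated_capacity
    return pv_power + available_battery - avg_load


def energy_state(soc: float, margin: float, avg_load: float,
                 soc_high: float = 0.8, soc_low: float = 0.5) -> str:
    power_high = 0.5 * avg_load  # first preset power value
    power_low = 0.0              # second preset power value
    # ASSUMED precedence: low is checked first, then high, otherwise medium.
    if soc < soc_low or margin < power_low:
        return "low"
    if soc > soc_high or margin > power_high:
        return "high"
    return "medium"


margin = energy_margin(pv_power=2.0, soc=0.9, rated_capacity=10.0, avg_load=8.0)
state = energy_state(soc=0.9, margin=margin, avg_load=8.0)
```

A different precedence (for example, requiring both conditions for the high-energy state) would be equally compatible with the wording and would only change the order of the checks.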

[0044] When the energy state is the high-energy state, all combinations of nitrogen charging mode and valve opening are retained, that is, the above action space set is kept unchanged. When the energy state is the medium-energy state, every action combination that includes either the continuous high-flow nitrogen charging mode or a fully open valve is removed from the action space set to obtain the screened action space set. When the energy state is the low-energy state, every action combination that includes the continuous high-flow nitrogen charging mode, the continuous medium-flow nitrogen charging mode, or a fully open valve is removed from the action space set to obtain the screened action space set.
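
The screening rule above can be sketched as a mask over the discrete action set. The three modes and three valve openings follow the examples given for the action space; the string labels and the `filter_actions` helper are hypothetical.

```python
from itertools import product
from typing import List, Tuple

MODES = ("high_flow", "medium_flow", "intermittent")
VALVES = ("full", "half", "closed")

# Every (nitrogen output mode, valve opening) combination.
ACTIONS: List[Tuple[str, str]] = list(product(MODES, VALVES))

# For each energy state: (blocked modes, blocked valve openings).
BLOCKED = {
    "high": (set(), set()),
    "medium": ({"high_flow"}, {"full"}),
    "low": ({"high_flow", "medium_flow"}, {"full"}),
}


def filter_actions(energy_state: str) -> List[Tuple[str, str]]:
    """Drop every combination that contains a blocked mode or valve opening."""
    blocked_modes, blocked_valves = BLOCKED[energy_state]
    return [(mode, valve) for mode, valve in ACTIONS
            if mode not in blocked_modes and valve not in blocked_valves]


allowed_low = filter_actions("low")
```

With these labels, the high-energy state keeps all nine combinations, the medium-energy state keeps four, and the low-energy state keeps only the intermittent mode with a half-open or closed valve. The decision model would then choose only among the returned combinations.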

[0045] In an optional embodiment, the energy-atmosphere regulation coupled decision model further includes an emergency decision branch. Before the nitrogen generator in the grain storage silo is adjusted based on the current action decision, the method further includes: if a sudden abnormal state is detected, matching, through the emergency decision branch, the corresponding emergency action in the preset emergency action space according to the type of the sudden abnormal state, and regulating the nitrogen generator in the grain storage silo based on the emergency action.

[0046] In this embodiment, an emergency decision branch is set up to deal with sudden abnormal states, including photovoltaic system failures and manual equipment maintenance. When these sudden abnormal states are detected, the corresponding emergency actions in the preset emergency action space are invoked, and the nitrogen generator in the grain storage is regulated based on the emergency actions. The various emergency actions in the emergency action space are preset by the user. For example, when a photovoltaic system failure is detected, the corresponding emergency action can be: control the nitrogen generator to stop working, close the valve, and generate a photovoltaic system failure warning.
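
The emergency branch can be sketched as a lookup that takes priority over the model's decision. The photovoltaic failure entry mirrors the example above; the `Action` type, the abnormal-state label, and the fallback to the model's action when no emergency action is registered for a type are assumptions for illustration.

```python
from typing import Dict, NamedTuple, Optional


class Action(NamedTuple):
    generator_mode: str
    valve: str
    warning: Optional[str] = None


# Preset emergency action space, configured by the user.
EMERGENCY_ACTIONS: Dict[str, Action] = {
    "pv_failure": Action("stopped", "closed", "photovoltaic system failure"),
}


def select_action(model_action: Action,
                  abnormal_type: Optional[str] = None) -> Action:
    """Return the matching emergency action if a sudden abnormal state is detected."""
    if abnormal_type is None:
        return model_action
    # ASSUMPTION: unregistered types fall back to the model's decision.
    return EMERGENCY_ACTIONS.get(abnormal_type, model_action)


normal = Action("medium_flow", "half")
chosen = select_action(normal, abnormal_type="pv_failure")
```

Keeping the emergency actions in a plain table, outside the learned policy, means they remain predictable regardless of how the model was trained.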

[0047] By setting up emergency decision-making branches to handle unexpected events, the accuracy of regulation can be further improved and ineffective regulation can be avoided.

[0048] Based on the above method embodiments, the present invention provides corresponding apparatus embodiments. As shown in Figure 2, the present invention provides a photovoltaic self-powered nitrogen-controlled grain storage control device, comprising: a basic data acquisition module, a derived data determination module, a current state space construction module, an action decision generation module, and an equipment control module. The basic data acquisition module is used to acquire the real-time photovoltaic output power of the photovoltaic system that powers the grain storage silo, the real-time SOC of the energy storage battery, and the real-time environmental data inside the grain storage silo. The derived data determination module is used to compare the real-time environmental data with the environmental parameter boundaries corresponding to each controlled atmosphere stage, determine the current controlled atmosphere stage of the grain storage silo, and determine the corresponding leakage index and nitrogen concentration decay coefficient based on the controlled atmosphere stage. The current state space construction module is used to construct the current state space based on the real-time SOC of the energy storage battery, the real-time photovoltaic output power, the real-time environmental data, the leakage index, and the nitrogen concentration decay coefficient. The action decision generation module is used to input the current state space into the trained energy-atmosphere regulation coupled decision model, so that the model generates the current action decision based on the current state space. The energy-atmosphere regulation coupled decision model is obtained by training a reinforcement learning model. The action space of the reinforcement learning model includes the nitrogen output mode of the nitrogen generator and the valve opening.
The reward function of the reinforcement learning model includes a controlled atmosphere compliance reward item, an energy utilization reward item, an equipment loss penalty item, and an anomaly penalty item. The controlled atmosphere compliance reward item is used to quantify the difference between the gas concentration parameters in the grain storage silo and the target gas concentration parameters for the corresponding controlled atmosphere stage. The energy utilization reward item is used to quantify the photovoltaic energy utilization of the photovoltaic system. The equipment loss penalty item is used to quantify the operating losses of the equipment. The anomaly penalty item is used to quantify the severity of abnormal states in the grain storage silo. The equipment control module is used to adjust the nitrogen generator in the grain storage silo based on the current action decision.

[0049] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working process of the device described above can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.

[0050] One embodiment of this application provides a terminal device, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor. When the processor executes the computer program, it implements the photovoltaic self-powered nitrogen-controlled atmosphere grain storage control method described above.

[0051] One embodiment of this application provides a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the photovoltaic self-powered nitrogen-controlled grain storage control method described above.

[0052] The computer device may be a smartphone, tablet, desktop computer, or cloud server, among other computing devices. This computer device may include, but is not limited to, a processor and memory. Those skilled in the art will understand that the figures are merely examples of computer devices and do not constitute a limitation on the computer device. It may include more or fewer components than illustrated, or a combination of certain components, or different components, such as input / output devices, network access devices, etc.

[0053] The processor referred to can be a Central Processing Unit (CPU), but it can also be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor can be a microprocessor or any conventional processor.

[0054] In some embodiments, the memory may be an internal storage unit of the computer device, such as a hard drive or RAM. In other embodiments, the memory may be an external storage device of the computer device, such as a plug-in hard drive, Smart Media Card (SMC), Secure Digital (SD) card, or Flash Card. Furthermore, the memory may include both internal and external storage units of the computer device. The memory is used to store the operating system, applications, bootloader, data, and other programs, such as the program code of the computer program. The memory can also be used to temporarily store data that has been output or will be output.

[0055] This application provides a computer program product that, when run on a computer device, enables the computer device to execute the steps described in the various method embodiments above.

[0056] In the several embodiments provided in this application, it will be understood that each block in the flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions marked in the blocks may occur in a different order than those shown in the figures. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved.

[0057] If the aforementioned functions are implemented as software functional modules and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0058] The above description represents the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principles of the present invention, and these improvements and modifications are also considered to be within the scope of protection of the present invention.

Claims

1. A method for controlling a photovoltaic self-powered nitrogen-controlled grain storage silo, characterized in that it comprises: acquiring the real-time photovoltaic output power of the photovoltaic system supplying energy to the grain storage silo, the real-time SOC of the energy storage battery, and real-time environmental data inside the grain storage silo; determining the current controlled atmosphere stage of the grain storage silo by comparing the real-time environmental data with the environmental parameter boundaries corresponding to each controlled atmosphere stage, and determining the corresponding leakage index and nitrogen concentration decay coefficient based on the controlled atmosphere stage; constructing the current state space based on the real-time SOC of the energy storage battery, the real-time photovoltaic output power, the real-time environmental data, the leakage index, and the nitrogen concentration decay coefficient; inputting the current state space into the trained energy-controlled atmosphere coupled decision model, so that the model generates the current action decision based on the current state space; wherein the energy-controlled atmosphere coupled decision model is obtained by training a reinforcement learning model; the action space of the reinforcement learning model includes the nitrogen output mode of the nitrogen generator and the valve opening; the reward function of the reinforcement learning model includes a controlled atmosphere compliance reward, an energy utilization reward, an equipment loss penalty, and an anomaly penalty; the controlled atmosphere compliance reward quantifies the difference between the gas concentration parameters in the grain storage silo and the target gas concentration parameters of the corresponding controlled atmosphere stage; the energy utilization reward quantifies the photovoltaic energy utilization rate of the photovoltaic system; the equipment loss penalty quantifies the operating losses of the equipment; and the anomaly penalty quantifies the severity of abnormal conditions in the grain storage silo; and adjusting the nitrogen generator in the grain storage silo based on the current action decision.
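As an illustration of the state construction step in claim 1, the following minimal sketch assembles one observation into a flat vector. The claim does not list which environmental quantities are measured, so the oxygen, nitrogen, and temperature fields, along with every name and unit here, are assumptions made only for the example.

```python
from dataclasses import dataclass, astuple
from typing import Tuple


@dataclass(frozen=True)
class SiloState:
    """One observation of the silo; all field names are illustrative."""
    battery_soc: float      # real-time SOC of the energy storage battery (0..1)
    pv_power_kw: float      # real-time photovoltaic output power
    o2_pct: float           # assumed environmental reading: oxygen concentration
    n2_pct: float           # assumed environmental reading: nitrogen concentration
    temperature_c: float    # assumed environmental reading: temperature
    leakage_index: float    # derived from the controlled atmosphere stage
    n2_decay_coeff: float   # derived from the controlled atmosphere stage


def to_vector(state: SiloState) -> Tuple[float, ...]:
    """Flatten the observation in field order for input to a decision model."""
    return astuple(state)
```

A fixed field order matters in practice because a trained model expects each input position to carry the same quantity at inference time as during training.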

2. The photovoltaic self-powered nitrogen-controlled grain storage control method as described in claim 1, characterized in that the step of inputting the current state space into the trained energy-controlled atmosphere coupled decision model, so that the model generates the current action decision based on the current state space, further comprises: obtaining the historical average load power corresponding to the grain storage silo; determining the available battery power based on the battery power release coefficient corresponding to the real-time SOC of the energy storage battery and on the rated capacity of the battery; determining the energy supply as the sum of the real-time photovoltaic output power and the available battery power; determining the energy margin based on the deviation between the energy supply and the historical average load power; determining the energy state of the photovoltaic system based on the energy margin and the real-time SOC of the energy storage battery; and, using the energy state of the photovoltaic system as a hard constraint, preliminarily screening the action space set by eliminating the actions that cannot be executed under the corresponding energy state, thereby obtaining a screened action space set, so that the energy-controlled atmosphere coupled decision model, after receiving the current state space, uses the screened action space set as its selectable range when generating the current action decision.
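The energy margin calculation in claim 2 can be sketched as follows. The claim does not give the mapping from SOC to the battery power release coefficient, so the lookup table below is purely hypothetical, as are the function names and units; the coefficient is assumed to carry units of 1/h so that multiplying it by a rated capacity in kWh yields a power in kW.

```python
def release_coefficient(soc: float) -> float:
    """Hypothetical SOC-to-release-coefficient table (values not from the source)."""
    if soc >= 0.8:
        return 0.5
    if soc >= 0.4:
        return 0.3
    return 0.1


def energy_margin(pv_power_kw: float, soc: float,
                  rated_capacity_kwh: float, avg_load_kw: float) -> float:
    """Energy supply minus historical average load, following claim 2."""
    # Available battery power from the SOC-dependent release coefficient.
    available_battery_kw = release_coefficient(soc) * rated_capacity_kwh
    # Energy supply is the sum of PV output and available battery power.
    supply_kw = pv_power_kw + available_battery_kw
    # The margin is taken here as a signed difference (an assumption).
    return supply_kw - avg_load_kw
```

A positive margin means the combined supply exceeds the typical load; a negative margin signals that only low-power actions are likely to be sustainable.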

3. The photovoltaic self-powered nitrogen-controlled grain storage control method as described in claim 2, characterized in that determining the energy state of the photovoltaic system based on the energy margin and the real-time SOC of the energy storage battery comprises: when the real-time SOC of the energy storage battery is greater than a first preset SOC, or the energy margin is greater than a first preset power value, determining that the energy state of the photovoltaic system is a high-energy state; when the real-time SOC of the energy storage battery is between the first preset SOC and a second preset SOC, or the energy margin is between the first preset power value and a second preset power value, determining that the energy state of the photovoltaic system is a medium-energy state, wherein the first preset SOC is greater than the second preset SOC and the first preset power value is greater than the second preset power value; and when the real-time SOC of the energy storage battery is less than the second preset SOC, or the energy margin is less than the second preset power value, determining that the energy state of the photovoltaic system is a low-energy state; and in that preliminarily screening the action space set using the energy state of the photovoltaic system as a hard constraint, eliminating the actions that cannot be executed under the corresponding energy state, and obtaining the screened action space set comprises: when the energy state is the medium-energy state, eliminating the continuous high-flow nitrogen charging mode and the valve-fully-open action from the action space set to obtain the screened action space set; and when the energy state is the low-energy state, eliminating the continuous high-flow nitrogen charging mode, the continuous medium-flow nitrogen charging mode, and the valve-fully-open action from the action space set to obtain the screened action space set.
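The classification and screening rules of claim 3 can be illustrated with a short sketch. Because each condition in the claim is joined by "or", the three cases can overlap; the sketch assumes they are tested in the order high, medium, low, which is one possible reading and not something the claim states. The mode and valve labels other than the continuous high-flow mode, the continuous medium-flow mode, and the fully open valve are hypothetical.

```python
from enum import Enum
from itertools import product


class EnergyState(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def classify_energy_state(soc, margin_kw, soc_hi, soc_lo, p_hi_kw, p_lo_kw):
    """Assumed precedence: high is tested first, then medium, else low."""
    if soc > soc_hi or margin_kw > p_hi_kw:
        return EnergyState.HIGH
    # Not high here, so each value is at or below its upper threshold.
    if soc >= soc_lo or margin_kw >= p_lo_kw:
        return EnergyState.MEDIUM
    return EnergyState.LOW


# Hypothetical discretised action space: (nitrogen output mode, valve opening).
MODES = ["continuous_high", "continuous_medium", "intermittent_low", "off"]
VALVES = ["full", "half", "closed"]
ALL_ACTIONS = list(product(MODES, VALVES))

BLOCKED_MODES = {
    EnergyState.HIGH: set(),
    EnergyState.MEDIUM: {"continuous_high"},
    EnergyState.LOW: {"continuous_high", "continuous_medium"},
}
# The fully open valve is removed in both the medium and low states.
BLOCKS_FULL_VALVE = {EnergyState.MEDIUM, EnergyState.LOW}


def screen_actions(actions, state):
    """Drop actions that the given energy state rules out (hard constraint)."""
    blocked = BLOCKED_MODES[state]
    return [
        (mode, valve)
        for mode, valve in actions
        if mode not in blocked
        and not (valve == "full" and state in BLOCKS_FULL_VALVE)
    ]
```

Applying the mask before the model selects an action means an infeasible action can never be chosen, regardless of what the learned policy would otherwise prefer.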

4. The photovoltaic self-powered nitrogen-controlled grain storage control method as described in claim 3, characterized in that the energy-controlled atmosphere coupled decision model further includes an emergency decision branch; and before adjusting the nitrogen generator in the grain storage silo based on the current action decision, the method further comprises: if a sudden abnormal state is detected, matching, by the emergency decision branch, the corresponding emergency action in a preset emergency action space according to the type of the sudden abnormal state, and regulating the nitrogen generator in the grain storage silo based on the emergency action.
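The emergency decision branch of claim 4 amounts to a lookup that overrides the model's output when a sudden abnormal state is detected. The sketch below assumes a dictionary as the preset emergency action space; the anomaly type names, the actions mapped to them, and the fallback for unrecognised types are all hypothetical placeholders, since the claim does not enumerate them.

```python
from typing import Dict, Optional, Tuple

Action = Tuple[str, str]  # (nitrogen output mode, valve opening)

# Hypothetical preset emergency action space keyed by anomaly type.
EMERGENCY_ACTIONS: Dict[str, Action] = {
    "oxygen_spike": ("continuous_high", "full"),
    "generator_fault": ("off", "closed"),
    "battery_undervoltage": ("off", "closed"),
}

# Assumed safe default for anomaly types missing from the table.
FALLBACK_ACTION: Action = ("off", "closed")


def select_action(model_action: Action, anomaly_type: Optional[str]) -> Action:
    """Return the emergency action if an anomaly is present, else the model's."""
    if anomaly_type is None:
        return model_action
    return EMERGENCY_ACTIONS.get(anomaly_type, FALLBACK_ACTION)
```

Keeping the emergency mapping outside the learned policy makes the response to a detected anomaly deterministic and auditable.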

5. The photovoltaic self-powered nitrogen-controlled grain storage control method as described in claim 4, characterized in that the controlled atmosphere stages include a gas pre-replacement stage, a rapid attainment stage, and a maintenance stage; and determining the corresponding leakage index and nitrogen concentration decay coefficient based on the current controlled atmosphere stage comprises: when the controlled atmosphere stage is the maintenance stage, performing a linear fit on the continuous time series of the nitrogen concentration in the grain storage silo and determining the nitrogen concentration decay coefficient based on the fitting result, and determining the average differential pressure drop rate between the inside and the outside of the grain storage silo and determining the leakage index based on the average differential pressure drop rate; and when the controlled atmosphere stage is the gas pre-replacement stage or the rapid attainment stage, setting the nitrogen concentration decay coefficient and the leakage index to zero.
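The linear fit described in claim 5 can be sketched with an ordinary least-squares slope. The claim says only that the decay coefficient is determined "based on the fitting results"; taking it as the negative of the fitted slope, so that a falling concentration gives a positive coefficient, is an assumption, as are the stage labels and function names.

```python
from typing import Sequence


def linear_slope(times: Sequence[float], values: Sequence[float]) -> float:
    """Least-squares slope of values against times."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time samples must not all be identical")
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return sxy / sxx


def n2_decay_coefficient(stage: str, times_h: Sequence[float],
                         n2_pct: Sequence[float]) -> float:
    """Decay coefficient per claim 5: fitted only in the maintenance stage."""
    if stage != "maintenance":
        # Pre-replacement and rapid attainment stages use zero.
        return 0.0
    # Assumption: coefficient is the negated slope of the fitted line.
    return -linear_slope(times_h, n2_pct)
```

For example, a concentration falling from 98.0 % to 96.5 % over three hourly steps gives a slope of -0.5 percentage points per hour and, under the sign assumption above, a coefficient of 0.5.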

6. The photovoltaic self-powered nitrogen-controlled grain storage control method as described in claim 5, characterized in that the reward function is expressed by an overall formula and four component formulas [formulas not reproduced in the source text], whose terms are, in order: the reward function; the controlled atmosphere compliance reward term; the energy utilization reward term; the equipment loss penalty term; the anomaly penalty term; the weighting coefficient of the controlled atmosphere compliance reward term; the weighting coefficient of the energy utilization reward term; the weighting coefficient of the equipment loss penalty term; the weighting coefficient of the anomaly penalty term; the weighting coefficient of the monitoring index corresponding to the oxygen concentration; the target oxygen concentration for the current controlled atmosphere stage; the real-time oxygen concentration; the real-time nitrogen concentration; the target nitrogen concentration for the current controlled atmosphere stage; the energy utilization weighting coefficient; the real-time photovoltaic output power; the real-time discharge power of the energy storage battery under action decision A; the real-time SOC of the energy storage battery; the maximum SOC of the energy storage battery; the minimum SOC of the energy storage battery; the loss weighting coefficient; the actual number of equipment start-stop cycles per unit time under action decision A; the maximum allowable number of equipment start-stop cycles; the actual full-load operating time of the equipment under action decision A; the maximum allowable full-load operating time of the equipment; m, the number of types of abnormal situation; i, the index of the i-th type of abnormal situation; the weight value for the i-th type of anomaly; and the anomaly indicator function under action decision A, which equals 1 if action decision A triggers an anomaly of type i and 0 otherwise.
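Since the formulas of claim 6 did not survive in this text, the following is only a rough sketch of how a four-term reward of this kind might be combined. Every functional form here is an assumption: penalties are subtracted from rewards, the equipment loss penalty is taken as the loss weight times the sum of two usage ratios, and the anomaly penalty as a weighted sum of indicator values. The compliance and energy utilization rewards are passed in as precomputed numbers because their formulas cannot be recovered from the definitions alone.

```python
from typing import Sequence, Tuple


def equipment_loss_penalty(starts: float, max_starts: float,
                           full_load_h: float, max_full_load_h: float,
                           loss_weight: float) -> float:
    """Assumed form: weighted sum of start-stop and full-load usage ratios."""
    return loss_weight * (starts / max_starts + full_load_h / max_full_load_h)


def anomaly_penalty(weights: Sequence[float],
                    triggered: Sequence[bool]) -> float:
    """Weighted sum over anomaly types of a 0/1 indicator."""
    return sum(w * (1 if hit else 0) for w, hit in zip(weights, triggered))


def total_reward(r_atmosphere: float, r_energy: float,
                 p_loss: float, p_anomaly: float,
                 w: Tuple[float, float, float, float]) -> float:
    """Assumed combination: weighted rewards minus weighted penalties."""
    w1, w2, w3, w4 = w
    return w1 * r_atmosphere + w2 * r_energy - w3 * p_loss - w4 * p_anomaly
```

The four outer weights set the trade-off between atmosphere accuracy, energy use, equipment wear, and safety; their values are not given in this text.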

7. The photovoltaic self-powered nitrogen-controlled grain storage control method as described in claim 6, characterized in that determining the leakage index based on the average differential pressure drop rate comprises calculating the leakage index by a formula [formula not reproduced in the source text] whose terms are: the leakage index; the preset silo capacity correction factor; the preset silo sealing correction factor; and the average differential pressure drop rate.
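The formula of claim 7 is not available in this text, so the sketch below is an assumed form only: it takes the average differential pressure drop rate as the total drop divided by the elapsed time, and the leakage index as the product of that rate with the two preset correction factors. The product form, the units, and all names are hypothetical.

```python
from typing import Sequence


def average_pressure_drop_rate(times_s: Sequence[float],
                               dp_pa: Sequence[float]) -> float:
    """Average rate at which the inside-outside pressure difference falls."""
    if len(times_s) != len(dp_pa) or len(times_s) < 2:
        raise ValueError("need at least two paired samples")
    elapsed = times_s[-1] - times_s[0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    # Positive when the pressure difference is decaying.
    return (dp_pa[0] - dp_pa[-1]) / elapsed


def leakage_index(drop_rate: float, k_capacity: float,
                  k_sealing: float) -> float:
    """Assumed form: both correction factors scale the drop rate."""
    return k_capacity * k_sealing * drop_rate
```

A differential pressure falling from 500 Pa to 260 Pa over 120 s gives an average drop rate of 2.0 Pa/s under this definition.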

8. A photovoltaic self-powered nitrogen-controlled grain storage control device, characterized in that it comprises: a basic data acquisition module, a derived data determination module, a current state space construction module, an action decision generation module, and an equipment control module; the basic data acquisition module is used to acquire the real-time photovoltaic output power of the photovoltaic system supplying energy to the grain storage silo, the real-time SOC of the energy storage battery, and the real-time environmental data inside the grain storage silo; the derived data determination module is used to compare the real-time environmental data with the environmental parameter boundaries corresponding to each controlled atmosphere stage, determine the current controlled atmosphere stage of the grain storage silo, and determine the corresponding leakage index and nitrogen concentration decay coefficient based on the controlled atmosphere stage; the current state space construction module is used to construct the current state space based on the real-time SOC of the energy storage battery, the real-time photovoltaic output power, the real-time environmental data, the leakage index, and the nitrogen concentration decay coefficient; the action decision generation module is used to input the current state space into the trained energy-controlled atmosphere coupled decision model, so that the model generates the current action decision based on the current state space; wherein the energy-controlled atmosphere coupled decision model is obtained by training a reinforcement learning model; the action space of the reinforcement learning model includes the nitrogen output mode of the nitrogen generator and the valve opening; the reward function of the reinforcement learning model includes a controlled atmosphere compliance reward, an energy utilization reward, an equipment loss penalty, and an anomaly penalty; the controlled atmosphere compliance reward is used to quantify the difference between the gas concentration parameters in the grain storage silo and the target gas concentration parameters of the corresponding controlled atmosphere stage; the energy utilization reward is used to quantify the photovoltaic energy utilization rate of the photovoltaic system; the equipment loss penalty is used to quantify the operating losses of the equipment; the anomaly penalty is used to quantify the severity of abnormal conditions in the grain storage silo; and the equipment control module is used to adjust the nitrogen generator in the grain storage silo based on the current action decision.

9. A terminal device, characterized in that it comprises a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the photovoltaic self-powered nitrogen-controlled grain storage control method as described in any one of claims 1 to 7.

10. A storage medium, characterized in that the storage medium includes a stored computer program, wherein the computer program, when running, controls the device in which the storage medium is located to execute the photovoltaic self-powered nitrogen-controlled grain storage control method as described in any one of claims 1 to 7.