A method and system for regulating the temperature of a duck breeding environment by fusing quantile regression prediction and reinforcement learning
By combining quantile regression prediction and reinforcement learning methods, the status of fans and wet curtain equipment is dynamically adjusted, which solves the problems of insufficient predictability and energy consumption in temperature control of duck farming environment, and realizes smooth and efficient temperature control and energy consumption optimization.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHANDONG ACADEMY OF AGRICULTURAL SCIENCES
- Filing Date
- 2025-07-18
- Publication Date
- 2026-06-26
AI Technical Summary
Existing temperature control technologies for duck farming environments suffer from insufficient predictability, delayed control, and a lack of multi-objective optimization, making it difficult to achieve a balance between optimizing indoor temperature and reducing energy consumption.
By employing a method that integrates quantile regression prediction and reinforcement learning, the LightGBM quantile regression model is used to predict future temperature distribution, and the Soft Actor-Critic model is used to dynamically adjust the operating status of the fan and evaporative cooling pad equipment. Combined with an energy consumption penalty mechanism, a closed-loop intelligent control system is formed.
It enables effective prediction and dynamic control of future temperature fluctuations, improves the robustness and energy efficiency of the system, ensures that the temperature inside the building is within a suitable range, and reduces unnecessary operating costs of the equipment.
Figure CN120849953B_ABST
Description
Technical Field
[0001] This invention belongs to the technical field of intelligent environmental control for livestock and poultry breeding and, more specifically, relates to a method and system for regulating the temperature of a meat duck breeding environment that integrates quantile regression prediction and reinforcement learning.
Background Technology
[0002] Faced with the triple challenges of consumption upgrading, resource scarcity, and structural labor shortages, factory farming has gradually become the dominant model in poultry farming. Under this model, high stocking densities and fluctuations in environmental factors such as temperature and humidity can easily cause stress responses in poultry, leading to decreased farming profits. Therefore, the development of precise environmental control technology is particularly important. Currently, temperature control in duck farming mainly employs three different methods: rule / empirical models, traditional predictive models, and artificial intelligence-based methods. Existing control methods still suffer from insufficient quantification of uncertainties in future environmental variables and fail to effectively balance the multi-objective optimization relationship between control precision and energy consumption.
[0003] Chinese patent document CN120008690A discloses a method and system for monitoring the environment of livestock farms, comprising: collecting raw thermal image data, raw audio signals, and raw gas data to generate a spatiotemporally aligned multimodal dataset; performing spatiotemporal gridding processing on the multimodal dataset to extract temperature gradient, voiceprint MFCC features, and spatial distribution of gas concentration, and constructing a temperature-voiceprint-gas three-dimensional correlation matrix; based on the temperature-voiceprint-gas three-dimensional correlation matrix, dynamically compensating for animal surface temperature by combining temperature and humidity indices, and inputting gas concentration as a weighting factor into an LSTM time-series prediction model to output stress risk level and pollution risk level; and generating and executing corresponding environmental control strategies based on the stress risk level and pollution risk level.
[0004] In summary, temperature control technology in duck farming still faces challenges, including insufficient predictability and lag in regulation, static control strategies, and a lack of multi-objective optimization. Therefore, there is an urgent need to develop a predictive model with uncertainty quantification capabilities and an intelligent control decision-making architecture with autonomous learning, dynamic optimization, and multi-objective balancing capabilities, so as to balance in-house temperature optimization against energy consumption and thereby improve the scientific soundness and economic efficiency of duck farming environment control.
Summary of the Invention
[0005] The present invention aims to overcome at least one of the defects of the prior art and provides a method for regulating the temperature of a duck farming environment by integrating quantile regression prediction and reinforcement learning, so as to address insufficient predictability, regulation lag, static control strategies, and the lack of multi-objective optimization in existing temperature regulation technology for duck farming.
[0006] The detailed technical solution of this invention is as follows:
[0007] A method for temperature control in duck farming environments that integrates quantile regression prediction and reinforcement learning, the method comprising:
[0008] S1. Collect multidimensional data on the breeding environment and the operating status of the environmental control equipment, construct features to obtain feature vectors, and form a multi-source dataset from multiple records in feature-vector format;
[0009] S2. Based on the multi-source dataset, predict the in-house temperature using quantile regression to obtain predicted temperatures at different quantiles;
[0010] S3. Using a reinforcement learning model, model the temperature regulation problem of the breeding environment as a Markov decision process based on the current environmental state and the quantile regression predictions; dynamically adjust the fan power and the status of the wet curtain equipment while penalizing energy consumption; and obtain, through training, a reinforcement learning decision model that integrates quantile regression information;
[0011] S4. Set a step-count threshold for online model updates, deploy the trained reinforcement learning decision model that integrates quantile regression information, and save the interaction data between the decision model and the actual operating environment; when the accumulated data reaches the set threshold, train the model on the acquired data and replace the parameters of the running model with the optimal parameters obtained from training.
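The threshold-triggered online update in S4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `OnlineUpdater`, the `retrain` callback, and the choice to clear the buffer after each update are all hypothetical.

```python
from typing import Callable, List, Tuple

# (state, action, reward, next_state); the tuple layout is an assumption
Transition = Tuple[list, list, float, list]


class OnlineUpdater:
    """Collects deployment transitions and triggers retraining at a threshold."""

    def __init__(self, threshold: int, retrain: Callable[[List[Transition]], None]):
        self.threshold = threshold      # online-update threshold from S4
        self.retrain = retrain          # hypothetical callback that retrains the model
        self.buffer: List[Transition] = []
        self.updates = 0

    def record(self, transition: Transition) -> bool:
        """Store one interaction; return True if a retraining was triggered."""
        self.buffer.append(transition)
        if len(self.buffer) < self.threshold:
            return False
        self.retrain(list(self.buffer))
        self.buffer.clear()             # assumption: start a fresh batch after each update
        self.updates += 1
        return True
```

In a deployment along these lines, the edge side would call `record` once per control step, and the callback would hand the batch to whatever process performs the retraining.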
[0012] Furthermore, collecting the multidimensional data on the breeding environment and the operating status of the environmental control equipment specifically includes:
[0013] The multidimensional physical parameters of the breeding environment and the operating status of the environmental control equipment are collected in real time through an Internet of Things (IoT) sensor network. The physical parameters include, but are not limited to, in-house temperature and humidity, outdoor temperature and humidity, wind speed, and wind direction. The operating status of the environmental control equipment includes, but is not limited to, the fan operating power and the status of the evaporative cooling pad equipment. The year, month, day, hour, and minute of each acquisition are recorded with the data, and the data is saved to obtain the breeding environment dataset.
[0014] Furthermore, feature vectors are constructed from the collected data through data cleaning and feature calculation, specifically as follows:
[0015] The data cleaning includes: missing value handling, outlier detection and correction, and time alignment;
[0016] The feature calculation includes: calculation of basic environmental change features and equipment-environment coupling features;
[0017] The feature vector includes: in-house environmental information at time t, outdoor environment forecast information at time t+1, equipment operating status information from time t-1 to time t, basic environmental change characteristics, time characteristics, and equipment-environment coupling characteristics;
[0018] The in-house environmental information at time t includes: the in-house temperature and the in-house humidity at time t;
[0019] The outdoor environment forecast information at time t+1 is: the local outdoor temperature and outdoor humidity at time t+1 obtained from the weather forecast;
[0020] The equipment operating status information from time t-1 to time t is: the fan operating power and the status of the evaporative cooling pad equipment;
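[0021] The basic environmental change characteristics include: the in-house temperature change $\Delta T_{in,t}$, the in-house humidity change $\Delta H_{in,t}$, the outdoor temperature change $\Delta T_{out,t}$, the outdoor humidity change $\Delta H_{out,t}$, and the difference $\Delta T_{diff,t}$ between the outdoor temperature at time t+1 and the in-house temperature at time t. They are calculated as follows:
[0022] $\Delta T_{in,t} = T_{in,t} - T_{in,t-1}$ (1);
[0023] $\Delta H_{in,t} = H_{in,t} - H_{in,t-1}$ (2);
[0024] $\Delta T_{out,t} = T_{out,t} - T_{out,t-1}$ (3);
[0025] $\Delta H_{out,t} = H_{out,t} - H_{out,t-1}$ (4);
[0026] $\Delta T_{diff,t} = T_{out,t+1} - T_{in,t}$ (5);
[0027] In formulas (1)-(5), $T_{in,t-1}$ is the in-house temperature at time t-1; $H_{in,t-1}$ is the in-house humidity at time t-1; $T_{out,t}$ is the outdoor temperature at time t; $T_{out,t-1}$ is the outdoor temperature at time t-1; $H_{out,t}$ is the outdoor humidity at time t; $H_{out,t-1}$ is the outdoor humidity at time t-1;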
[0028] The equipment-environment coupling characteristics include: the fan-temperature-difference interaction characteristic, the evaporative cooling pad-temperature interaction characteristic, and the evaporative cooling pad-humidity interaction characteristic. They are calculated as follows:
[0029] (6);
[0030] (7);
[0031] (8);
[0032] In formulas (6)-(8), the equipment terms are the fan operating power and the status of the evaporative cooling pad equipment;
[0033] The complete representation of the feature vector is as follows: ;
[0034] A multi-source dataset consists of data in multiple feature vector formats.
[0035] Furthermore, the time characteristics are derived from the timestamp and include the hour in which the sampling time t falls (0-23) and the month (January to December).
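The change features and time features described above can be sketched in a few lines. This is an illustrative sketch only: the `Sample` container, the dictionary keys, and the sign convention (value at time t minus value at time t-1, and forecast outdoor temperature minus in-house temperature) are assumptions. The equipment-environment coupling features are left out because their formulas are not given in the text above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Sample:
    t_in: float    # in-house temperature, deg C
    h_in: float    # in-house relative humidity, %
    t_out: float   # outdoor temperature, deg C
    h_out: float   # outdoor relative humidity, %


def change_features(prev: Sample, curr: Sample, t_out_forecast: float) -> dict:
    """Change features from samples at t-1 and t plus the t+1 outdoor forecast."""
    return {
        "d_t_in": curr.t_in - prev.t_in,         # in-house temperature change
        "d_h_in": curr.h_in - prev.h_in,         # in-house humidity change
        "d_t_out": curr.t_out - prev.t_out,      # outdoor temperature change
        "d_h_out": curr.h_out - prev.h_out,      # outdoor humidity change
        "d_t_diff": t_out_forecast - curr.t_in,  # outdoor(t+1) minus in-house(t)
    }


def time_features(timestamp: datetime) -> dict:
    """Hour of day (0-23) and month (1-12) of the sampling time."""
    return {"hour": timestamp.hour, "month": timestamp.month}
```

Keeping the features in plain dictionaries makes it easy to concatenate them with the raw readings into one feature vector per sampling time.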
[0036] Furthermore, the in-house temperature prediction based on quantile regression specifically involves:
[0037] Quantile prediction is achieved using the LightGBM quantile regression model, which comprises three independent LightGBM quantile regression models corresponding to the 0.25, 0.5, and 0.75 quantiles, respectively.
[0038] The multi-source dataset is divided into a training set and a test set at a ratio of 4:1. The three independent LightGBM quantile regression models are trained on the training set and evaluated on the test set, with the quantile loss function used as both the training objective and the evaluation metric. The quantile loss function is defined as follows:
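[0039] $L_\tau(y_{t+1}, \hat{y}_{t+1}) = \max\big(\tau\,(y_{t+1} - \hat{y}_{t+1}),\ (\tau - 1)\,(y_{t+1} - \hat{y}_{t+1})\big)$ (9);
[0040] In formula (9), $y_{t+1}$ is the actual in-house temperature at time t+1; $\hat{y}_{t+1}$ is the predicted in-house temperature at time t+1; $\tau$ is the quantile, taking the values 0.25, 0.5, and 0.75.
[0041] The outputs of the LightGBM quantile regression models are the predicted in-house temperatures at time t+1 for the three quantiles: $\hat{y}^{0.25}_{t+1}$, $\hat{y}^{0.5}_{t+1}$, $\hat{y}^{0.75}_{t+1}$.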
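The quantile loss used for training and evaluation is the standard pinball loss. A minimal sketch follows; the function names are illustrative and not taken from the patent.

```python
def pinball_loss(y_true: float, y_pred: float, tau: float) -> float:
    """Quantile (pinball) loss for one observation at quantile tau."""
    diff = y_true - y_pred
    # Under-prediction (diff > 0) is weighted by tau, over-prediction by 1 - tau.
    return max(tau * diff, (tau - 1.0) * diff)


def mean_pinball_loss(y_true, y_pred, tau: float) -> float:
    """Average pinball loss over paired sequences of equal length."""
    losses = [pinball_loss(y, p, tau) for y, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)
```

For the 0.75 quantile, under-predicting the temperature by 2 degrees costs 1.5 while over-predicting by 2 degrees costs 0.5, which is what pushes that model toward the upper part of the temperature distribution.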
[0042] Furthermore, using a reinforcement learning model to model the temperature regulation problem of the breeding environment as a Markov decision process, based on the current environmental state and the quantile regression predictions, specifically includes:
[0043] First, a thermodynamic model of the poultry house was built using Gym to simulate the internal and external environment of the house and the operating status of the fans and wet curtain equipment.
[0044] Then, a Markov decision process model was used to model the temperature control problem in duck farming. The Markov decision process includes a state space, an action space, and a reward function.
[0045] The state space is defined as: current internal and external environmental information, predicted external environmental information at time t+1, equipment operating status information at time t-1, time characteristics, and quantile regression model prediction results;
[0046] The state space is represented as follows: ;
[0047] The action space is defined as: the fan operating power and the operating power of the wet curtain;
[0048] The reward function is defined as: comfort reward and energy consumption penalty;
[0049] The comfort reward grants a bonus when the in-house temperature stays within the target range and a penalty when it deviates from that range. The target range for the in-house temperature is $[T_{min}, T_{max}]$, where $T_{min}$ is the minimum in-house temperature and $T_{max}$ is the maximum in-house temperature. The comfort reward is:
[0050] (10);
[0051] In formula (10), the deviation term represents the degree to which the in-house temperature deviates from the target range.
[0052] The energy consumption penalty is:
[0053] (11);
[0054] In formula (11), the terms are the penalty coefficient for equipment energy consumption, the sampling step size, and the electricity price;
[0055] The reward function based on reinforcement learning is:
[0056] (12);
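The structure of the reward (a comfort term combined with an energy penalty) can be illustrated with a short sketch. The exact expressions of formulas (10)-(12) are not reproduced here, so every shape below is an assumption: a fixed bonus inside the target range, a linear penalty equal to the deviation outside it, an energy cost proportional to power times step length times electricity price, and a total reward equal to comfort minus energy.

```python
def comfort_reward(t_in: float, t_min: float, t_max: float, bonus: float = 1.0) -> float:
    """Assumed shape: fixed bonus in range, negative linear deviation outside."""
    if t_min <= t_in <= t_max:
        return bonus
    deviation = (t_min - t_in) if t_in < t_min else (t_in - t_max)
    return -deviation


def energy_penalty(p_fan_kw: float, p_pad_kw: float, dt_hours: float,
                   price_per_kwh: float, coeff: float = 1.0) -> float:
    """Assumed shape: coefficient x energy used in one step x electricity price."""
    return coeff * (p_fan_kw + p_pad_kw) * dt_hours * price_per_kwh


def total_reward(t_in, t_min, t_max, p_fan_kw, p_pad_kw, dt_hours, price_per_kwh):
    """Assumed combination: comfort reward minus energy penalty."""
    return (comfort_reward(t_in, t_min, t_max)
            - energy_penalty(p_fan_kw, p_pad_kw, dt_hours, price_per_kwh))
```

The penalty coefficient sets the trade-off: a larger value makes the agent tolerate small temperature deviations to save electricity, while a smaller value prioritizes staying inside the target range.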
[0057] Then, a reinforcement learning method, namely the reinforcement learning model Soft Actor-Critic, is used to interact with a Markov decision process.
[0058] The Soft Actor-Critic model includes one Actor network, two Critic networks, and their corresponding target Critic networks:
[0059] The Actor network receives the state defined by the state space, outputs the Gaussian distribution parameters of the fan operating power and the wet curtain operating status, and samples the fan operating power and the wet curtain operating status from that distribution using the reparameterization trick.
[0060] The Critic network is used to evaluate the expected cumulative reward for the operating power of the fan and evaporative cooling pad equipment under a given state.
[0061] The target Critic network synchronizes its parameters from the main Critic network via soft updates to stabilize the learning process;
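Two mechanics named above, reparameterized action sampling and soft target updates, can be sketched without a deep learning framework. This is a scalar illustration under assumptions: the tanh squashing and rescaling to an actuator range are common Soft Actor-Critic practice but are not stated in the text, and the parameter `tau=0.005` is a typical default rather than a value from the patent.

```python
import math
import random


def sample_action(mean: float, log_std: float, low: float, high: float,
                  rng: random.Random) -> float:
    """Reparameterized sample: the noise is drawn independently of the parameters."""
    eps = rng.gauss(0.0, 1.0)                 # eps ~ N(0, 1)
    raw = mean + math.exp(log_std) * eps      # differentiable in mean and log_std
    squashed = math.tanh(raw)                 # assumption: squash to (-1, 1)
    return low + 0.5 * (squashed + 1.0) * (high - low)  # rescale to [low, high]


def soft_update(target: list, main: list, tau: float = 0.005) -> list:
    """Polyak averaging: target <- tau * main + (1 - tau) * target."""
    return [tau * m + (1.0 - tau) * t for m, t in zip(main, target)]
```

A small `tau` makes the target Critic trail the main Critic slowly, which is what stabilizes the bootstrapped value targets.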
[0062] The reinforcement learning-based temperature control model for duck breeding houses is trained as follows: the Soft Actor-Critic model interacts with the simulated poultry house environment to obtain data, the interaction data is stored using experience replay, and the parameters of the Critic networks, the Actor network, and the target networks are trained and updated on randomly sampled mini-batches, yielding a reinforcement learning decision model that integrates quantile regression information.
[0063] Specifically, the training process of the reinforcement learning-based temperature control model for duck farming houses includes:
[0064] (1) Build the simulation environment: use methods including, but not limited to, CFD or Gym to build a thermodynamic or fluid-dynamic model of the poultry house that simulates changes in the environmental state inside and outside the house; at the same time, model the temperature regulation problem of the meat duck breeding environment as a Markov decision process.
[0065] (2) Set suitable training parameters for the Soft Actor-Critic model and the total number of interaction steps between the model and the environment, and initialize the interaction step counter to 0;
[0066] (3) The model interacts with the simulation environment to obtain interaction data and stores the interaction data in the replay buffer;
[0067] Specifically, the data is obtained through interaction between the Soft Actor-Critic model and the simulated poultry house environment. Each interaction yields one data point, which is then stored in the replay buffer.
[0068] (4) The number of steps for model-environment interaction is increased by 1;
[0069] (5) Determine if the number of interaction steps has reached the initial training step count:
[0070] If so, randomly sample a mini-batch from the data stored in the replay buffer, train and update the Actor and Critic network parameters according to the Soft Actor-Critic update rules, update the target networks, and increase the number of interaction steps between the model and the environment by 1;
[0071] Otherwise, return to (2);
[0072] (6) Determine if the model check and save steps have been reached:
[0073] If yes, then evaluate the model performance; if no, then jump to (8).
[0074] (7) Evaluate model performance, i.e. set the number of model evaluation steps n, use the currently trained model to interact with the environment n times, calculate the average reward value of n times, compare the current average reward value with the highest average reward value during the previous model evaluation, if the current value is the largest, then save the current model as the best model; otherwise, store it as the stage model.
[0075] Once the model evaluation is complete, proceed to (8);
[0076] (8) Determine whether the number of interaction steps between the model and the environment has reached the total number of training steps. If yes, end the training; otherwise, return to (3).
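The control flow of steps (2)-(8) can be sketched as a plain loop. This is a simplified illustration, not the patented procedure: `env_step`, `agent_update`, and `evaluate` are hypothetical callables standing in for the simulator, the Soft Actor-Critic update, and the n-episode evaluation, and the sketch counts each interaction once and loops back to the interaction step.

```python
import random


def train(env_step, agent_update, evaluate, total_steps, warmup_steps,
          eval_every, batch_size, seed=0):
    """Simplified control flow of training steps (2)-(8)."""
    rng = random.Random(seed)
    replay = []                                  # replay buffer
    best_reward, best_step = float("-inf"), None
    step = 0                                     # (2) counter starts at 0
    while step < total_steps:                    # (8) stop at the step budget
        replay.append(env_step())                # (3) one transition per interaction
        step += 1                                # (4)
        if step >= warmup_steps and len(replay) >= batch_size:   # (5)
            agent_update(rng.sample(replay, batch_size))         # mini-batch update
        if step % eval_every == 0:               # (6) checkpoint step reached
            reward = evaluate()                  # (7) average reward of n rollouts
            if reward > best_reward:             # keep the best model so far
                best_reward, best_step = reward, step
    return best_reward, best_step
```

A real implementation would also persist the best and stage models to disk at step (7); the sketch only tracks which checkpoint scored highest.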
[0077] In another aspect, the present invention also provides a temperature control system for a duck farming environment that integrates quantile regression prediction and reinforcement learning, the system comprising a breeding house section and an outdoor section;
[0078] The breeding house includes: the main body of the duck breeding house 1, the wet curtain equipment 2, the fan equipment 3, the breeding cage frame 4, and the indoor temperature and humidity sensor 5, the edge controller 10 and the cloud server 11 installed on the breeding cage frame 4;
[0079] The wet curtain equipment 2 is installed on one of the long sides of the main body 1 of the breeding house, and the fan equipment 3 is installed on one of the short sides; the status of the wet curtain equipment 2 and the operating power of the fan equipment 3 can be read directly or calculated indirectly. The edge controller 10 is installed on either side of the main body 1 of the duck breeding house;
[0080] Three in-house temperature and humidity sensors 5 are provided, each installed on a different breeding cage frame 4. Horizontally, each sensor is located midway between the wet curtain equipment 2 and the fan equipment 3; vertically, it is located at half the height of the breeding cage frame 4.
[0081] Furthermore, the in-house temperature and humidity sensors 5 collect data every five minutes, and the average value is used as the collected in-house temperature and humidity data;
[0082] The outdoor portion includes: an outdoor mounting bracket 6, a wind speed sensor 7, a wind direction sensor 8, and an outdoor temperature and humidity sensor 9;
[0083] The outdoor mounting bracket 6 is installed above the main body 1 of the duck breeding house, at a position far away from the wet curtain equipment 2 and the fan equipment 3;
[0084] The wind speed sensor 7, wind direction sensor 8, and outdoor temperature and humidity sensor 9 are mounted on the outdoor mounting bracket 6 and collect wind speed, wind direction, outdoor temperature, and outdoor humidity data;
[0085] Furthermore, the wind speed sensor 7, wind direction sensor 8, and outdoor temperature and humidity sensor 9 collect data once every five minutes;
[0086] The edge controller 10 is connected to the wet curtain device 2, the fan device 3, the indoor temperature and humidity sensor 5, the wind speed sensor 7, the wind direction sensor 8, the outdoor temperature and humidity sensor 9, and the cloud server 11 via wired or wireless means.
[0087] The edge controller 10 is used to collect the data required for the indoor temperature control model, deploy the indoor temperature control model, and communicate with the cloud server 11. It sends the data required for the indoor temperature control model to the cloud server 11 and receives the updated optimal indoor temperature control model from the cloud server 11.
[0088] The cloud server 11 is used for building the simulation environment, training the indoor temperature control model, storing the data uploaded by the edge controller 10, updating the indoor temperature control model online, and distributing the updated optimal model to the edge controller 10 for deployment.
[0089] In another aspect of the invention, a computer-readable storage medium is also provided, which stores executable instructions that, when executed, cause the machine to perform the above-described method for regulating the ambient temperature of duck farming environment by fusing quantile regression prediction and reinforcement learning.
[0090] Compared with the prior art, the beneficial effects of the present invention are as follows:
[0091] (1) The present invention provides a method and system for regulating the ambient temperature of duck farming environment by integrating quantile regression prediction and reinforcement learning. It uses LightGBM quantile regression to predict three key quantiles of future temperature, namely 0.25, 0.5, and 0.75, which provides quantitative prior information about the potential fluctuation range and uncertainty of future temperature for the regulation model, enabling it to make more effective forward regulation when making action decisions.
[0092] (2) The present invention provides a method and system for temperature control in duck farming environment that integrates quantile regression prediction and reinforcement learning. The input features are comprehensive, covering key historical states, current states, future external environment predictions, time features, and important equipment-environment interaction features, making it more adaptable to dynamic changes in the poultry house environment. Soft Actor-Critic uses comprehensive features and prediction uncertainty as input, enabling the control system to take preventive measures in advance, improve the robustness of the system to cope with environmental disturbances, and avoid drastic temperature fluctuations.
[0093] (3) The present invention provides a method and system for temperature control in duck farming environment that integrates quantile regression prediction and reinforcement learning. The core objective of Soft Actor-Critic is to find and execute the lowest energy consumption operation scheme under the premise of ensuring that the temperature inside the house is within a suitable range, thereby reducing unnecessary start-ups or excessive operation of equipment such as fans and wet curtains. At the same time, it uses future environmental prediction information for pre-adjustment to make the system operation smoother and more efficient. It deeply integrates prediction and control to realize a closed loop of intelligent decision-making. LightGBM provides intelligent prediction of future temperature changes and their uncertainties. Soft Actor-Critic makes end-to-end decisions based on this uncertainty information. The two are closely coupled to form a closed-loop intelligent control system of "perception-prediction-decision-execution".
Attached Figure Description
[0094] Figure 1 is a schematic diagram of the process for regulating the ambient temperature in duck farming using the method that integrates quantile regression prediction and reinforcement learning, as described in this invention.
[0095] Figure 2 is a schematic diagram of the training process for in-house temperature prediction based on quantile regression in Embodiment 1 of the present invention.
[0096] Figure 3 is a schematic diagram of the training process of the temperature control model for duck farming houses based on reinforcement learning in Embodiment 1 of the present invention.
[0097] Figure 4 is a schematic diagram of the structure of a meat duck breeding house and the layout of sensors inside the house in Embodiment 1 of the present invention.
[0098] Figure 5 is a schematic diagram of the layout of sensors outside the meat duck breeding house in Embodiment 1 of the present invention.
[0099] Explanation of the reference numerals is as follows: 1. Meat duck breeding house; 2. Wet curtain equipment; 3. Fan equipment; 4. Breeding cage rack; 5. Temperature and humidity sensor inside the house; 6. Sensor support outside the house; 7. Wind speed sensor; 8. Wind direction sensor; 9. Temperature and humidity sensor outside the house; 10. Edge controller; 11. Cloud server.
Detailed Implementation Manners
[0100] The present invention will be further described below with reference to the drawings and embodiments.
[0101] It should be noted that the following detailed description is exemplary and intended to provide further explanation of the present invention. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by those of ordinary skill in the technical field to which the present invention belongs.
[0102] It should be noted that the terms used herein are only for describing specific implementation manners and are not intended to limit the exemplary embodiments according to the present invention. As used herein, unless the context clearly indicates otherwise, the singular form is also intended to include the plural form. In addition, it should be understood that when the terms "comprise" and / or "include" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and / or combinations thereof.
[0103] In the case of no conflict, the embodiments in the present invention and the features in the embodiments can be combined with each other.
[0104] Embodiment 1
[0105] Referring to Figure 1, this embodiment provides a method for regulating the temperature of the meat duck breeding environment by integrating quantile regression prediction and reinforcement learning, including:
[0106] S1. Multi-source data collection and feature construction; collect multi-dimensional data of the breeding environment and the operation status information of the environmental control equipment, perform feature construction to obtain feature vectors, and form a multi-source data set from data in the format of multiple feature vectors.
[0107] Specifically, collecting the multi-dimensional data on the breeding environment and the operating status of the environmental control equipment includes:
[0108] The multi-dimensional physical parameters of the breeding environment and the operating status of the environmental control equipment are collected in real time through an Internet of Things (IoT) sensor network. The physical parameters include, but are not limited to, in-house temperature and humidity, outdoor temperature and humidity, wind speed, and wind direction. The operating status of the environmental control equipment includes, but is not limited to, the fan operating power and the status of the evaporative cooling pad equipment. The year, month, day, hour, and minute of each acquisition are recorded with the data, and the data is saved to obtain the breeding environment dataset.
[0109] Preferably, feature vectors are constructed from the collected data through data cleaning and feature calculation, specifically as follows:
[0110] The data cleaning includes: missing value handling, outlier detection and correction, and time alignment;
[0111] The feature calculation includes: calculation of basic environmental change features and equipment-environment coupling features;
[0112] The feature vector includes: indoor environmental information at time t, outdoor environmental prediction information at time t+1, equipment operating status information from time t-1 to time t, basic environmental change characteristics, time characteristics, and equipment-environment coupling characteristics.
[0113] Specifically, the in-house environmental information at time t includes: the in-house temperature and the in-house humidity;
[0114] The outdoor environment prediction information at time t+1 includes: the local outdoor temperature and outdoor humidity at time t+1 obtained from the weather forecast;
[0115] The equipment operating status information from time t-1 to time t includes: the fan operating power and the status of the evaporative cooling pad equipment;
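[0116] The basic environmental change characteristics include: the in-house temperature change $\Delta T_{in,t}$, the in-house humidity change $\Delta H_{in,t}$, the outdoor temperature change $\Delta T_{out,t}$, the outdoor humidity change $\Delta H_{out,t}$, and the difference $\Delta T_{diff,t}$ between the outdoor temperature at time t+1 and the in-house temperature at time t. The calculation formulas are as follows:
[0117] $\Delta T_{in,t} = T_{in,t} - T_{in,t-1}$ (1);
[0118] $\Delta H_{in,t} = H_{in,t} - H_{in,t-1}$ (2);
[0119] $\Delta T_{out,t} = T_{out,t} - T_{out,t-1}$ (3);
[0120] $\Delta H_{out,t} = H_{out,t} - H_{out,t-1}$ (4);
[0121] $\Delta T_{diff,t} = T_{out,t+1} - T_{in,t}$ (5);
[0122] In formulas (1)-(5), $T_{in,t-1}$ is the in-house temperature at time t-1; $H_{in,t-1}$ is the in-house humidity at time t-1; $T_{out,t}$ is the outdoor temperature at time t; $T_{out,t-1}$ is the outdoor temperature at time t-1; $H_{out,t}$ is the outdoor humidity at time t; $H_{out,t-1}$ is the outdoor humidity at time t-1;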
[0123] The time characteristics are derived from the timestamp and include: the hour in which the sampling time t falls (0-23) and the month (January to December);
[0124] The equipment-environment coupling characteristics include: the fan-temperature-difference interaction characteristic, the evaporative cooling pad-temperature interaction characteristic, and the evaporative cooling pad-humidity interaction characteristic. The calculation formulas are as follows:
[0125] (6);
[0126] (7);
[0127] (8);
[0128] In formulas (6)-(8), the equipment terms are the fan operating power and the status of the evaporative cooling pad equipment;
[0129] The feature vector is fully represented as: .
[0130] S2. In-house temperature prediction based on quantile regression: using the multi-source dataset, the in-house temperature is predicted by quantile regression to obtain predicted temperatures at different quantiles.
[0131] Specifically, the in-house temperature prediction based on quantile regression is shown in Figure 2 and proceeds as follows:
[0132] Data was collected in actual breeding environments or simulated in simulation environments, accumulating at least 30 days of data in the format of the feature vectors described above as a dataset, and then divided into training and testing sets at a ratio of 4:1.
[0133] The quantile prediction model is the LightGBM model. Training parameters are configured for the quantile prediction model with quantile regression mode enabled, and the target quantiles are set to 0.25, 0.5, and 0.75, respectively. Preferably, the structural parameters of the quantile prediction model are configured as follows: the maximum number of trees is 1000, the maximum number of leaf nodes per tree is 31, and the learning rate (gradient descent step size) is 0.05. The regularization and robustness parameters are configured as follows: the L1 regularization coefficient is 0.1, the L2 regularization coefficient is 0.1, 80% of the features are randomly selected for each tree, and 80% of the data is randomly sampled for each tree.
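The configuration above can be written as one parameter dictionary per quantile. The sketch below uses the keyword names of LightGBM's scikit-learn interface; reading the count of 1000 as `n_estimators` is an interpretation of the text, and `subsample_freq=1` is an added assumption, since LightGBM applies row subsampling only when that frequency is positive.

```python
QUANTILES = (0.25, 0.5, 0.75)


def quantile_params(alpha: float) -> dict:
    """Keyword arguments for one quantile model (LightGBM scikit-learn names)."""
    return {
        "objective": "quantile",   # quantile regression mode
        "alpha": alpha,            # target quantile
        "n_estimators": 1000,      # maximum number of trees (interpretation)
        "num_leaves": 31,          # maximum leaf nodes per tree
        "learning_rate": 0.05,
        "reg_alpha": 0.1,          # L1 regularization coefficient
        "reg_lambda": 0.1,         # L2 regularization coefficient
        "colsample_bytree": 0.8,   # 80% of features per tree
        "subsample": 0.8,          # 80% of rows per tree
        "subsample_freq": 1,       # assumption: needed for row subsampling to apply
    }


# One independent configuration per target quantile.
PARAMS_BY_QUANTILE = {q: quantile_params(q) for q in QUANTILES}
```

Each dictionary could then be unpacked into `lightgbm.LGBMRegressor(**params)` and fitted on the training set, giving three independent models whose predictions form the lower, median, and upper temperature estimates.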
[0134] The quantile prediction model uses a quantile loss function as both the model training objective and evaluation metric to optimize and validate the accuracy of quantile predictions. The quantile loss function is defined as follows:
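[0135] $L_\tau(y_{t+1}, \hat{y}_{t+1}) = \max\big(\tau\,(y_{t+1} - \hat{y}_{t+1}),\ (\tau - 1)\,(y_{t+1} - \hat{y}_{t+1})\big)$ (9);
[0136] In formula (9), $y_{t+1}$ is the actual in-house temperature at time t+1; $\hat{y}_{t+1}$ is the predicted in-house temperature at time t+1; $\tau$ is the quantile, taking the values 0.25, 0.5, and 0.75.
[0137] The outputs of the LightGBM quantile regression models are the predicted in-house temperatures at time t+1 for the three quantiles: $\hat{y}^{0.25}_{t+1}$, $\hat{y}^{0.5}_{t+1}$, $\hat{y}^{0.75}_{t+1}$.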
[0138] S3. Reinforcement learning decision generation that integrates quantile regression information: using a reinforcement learning model, the temperature control problem of the breeding environment is modeled as a Markov decision process based on the current environmental state and the quantile regression prediction results. The fan power and the status of the wet curtain equipment are dynamically adjusted, while energy consumption is penalized. Finally, a reinforcement learning decision model that integrates quantile regression information is obtained through training.
[0139] Furthermore, the reinforcement learning model adopts the Soft Actor-Critic model, which is suitable for continuous action spaces. Using this model, the fan power and the status of the evaporative cooling pad are dynamically adjusted based on the current environmental state and the prediction results of the quantile regression model, while an energy consumption penalty is applied so that energy costs are reduced while the temperature stays in a suitable range. Specifically, this includes:
[0140] First, a thermodynamic model of the poultry house was built using Gym to simulate the internal and external environment of the house and the operating status of the fans and wet curtain equipment.
[0141] Then, a Markov decision process model was used to model the temperature control problem in duck farming. The Markov decision process includes a state space, an action space, and a reward function.
[0142] The state space is defined as: current internal and external environmental information, predicted external environmental information at time t+1, equipment operating status information at time t-1, time characteristics, and quantile regression model prediction results;
[0143] The state space is represented as follows: ;
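The state components listed in the definition above and in claim 1 can be collected into a single observation vector. The following is a minimal sketch; the field names, their order, and the `HouseState` class are assumptions for illustration and are not taken from the original formula.

```python
from dataclasses import dataclass, astuple
from typing import List


@dataclass(frozen=True)
class HouseState:
    """Hypothetical container for one observation at time t."""
    temp_in: float            # in-house temperature at time t
    hum_in: float             # in-house humidity at time t
    temp_out: float           # outdoor temperature at time t
    hum_out: float            # outdoor humidity at time t
    temp_out_forecast: float  # forecast outdoor temperature at time t+1
    hum_out_forecast: float   # forecast outdoor humidity at time t+1
    fan_power_prev: float     # fan operating power at time t-1
    pad_status_prev: float    # evaporative cooling pad status at time t-1
    hour: int                 # 0-23
    month: int                # 1-12
    temp_q25: float           # predicted in-house temperature, 0.25 quantile
    temp_q50: float           # predicted in-house temperature, 0.5 quantile
    temp_q75: float           # predicted in-house temperature, 0.75 quantile

    def to_vector(self) -> List[float]:
        """Flatten the fields, in declaration order, into a list of floats."""
        return [float(v) for v in astuple(self)]
```

Appending the three quantile predictions to the observation is what lets the policy see the predicted spread of the next in-house temperature rather than only a point estimate.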
[0144] The action space is defined as: the fan operating power and the operating power of the wet curtain;
[0145] The reward function is defined as: comfort reward and energy consumption penalty;
[0146] The comfort reward refers to a bonus if the indoor temperature remains within the target range, and a penalty if it deviates from it; the target range for the indoor temperature is $[T_{\min}, T_{\max}]$, where $T_{\min}$ is the minimum indoor temperature and $T_{\max}$ is the maximum indoor temperature. The comfort reward is:
[0147] (10);
[0148] In formula (10), the deviation term represents the degree to which the indoor temperature deviates from the target range;
[0149] The energy consumption penalty is:
[0150] (11);
[0151] In formula (11), the parameters are the penalty coefficient for equipment energy consumption, the sampling step size, and the electricity price;
[0152] The reward function based on reinforcement learning is:
[0153] (12);
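The structure of the reward can be sketched as follows. Formulas (10)-(12) are not reproduced in this text, so every functional form here is an assumption: a fixed bonus inside the target range, a linear penalty on the deviation outside it, an energy penalty proportional to power, step length, and electricity price, and a total reward equal to comfort minus energy. All names are hypothetical.

```python
def comfort_reward(temp_in: float, t_min: float, t_max: float) -> float:
    """Assumed form: +1 inside the target range, minus the deviation outside it."""
    if t_min <= temp_in <= t_max:
        return 1.0
    deviation = (t_min - temp_in) if temp_in < t_min else (temp_in - t_max)
    return -deviation


def energy_penalty(fan_kw: float, pad_kw: float, step_hours: float,
                   price_per_kwh: float, coeff: float) -> float:
    """Assumed form: coefficient x total power x step length x electricity price."""
    return coeff * (fan_kw + pad_kw) * step_hours * price_per_kwh


def total_reward(temp_in: float, t_min: float, t_max: float,
                 fan_kw: float, pad_kw: float, step_hours: float,
                 price_per_kwh: float, coeff: float) -> float:
    """Assumed combination: comfort reward minus energy penalty."""
    return (comfort_reward(temp_in, t_min, t_max)
            - energy_penalty(fan_kw, pad_kw, step_hours, price_per_kwh, coeff))
```

With this shape, running the equipment harder than needed lowers the reward even when the temperature is in range, which is the trade-off the energy penalty is meant to express.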
[0154] Then, a reinforcement learning method, namely the Soft Actor-Critic model, is used to interact with the Markov decision process. The method selects an action based on the current state of the MDP environment, executes the action, and obtains a reward; executing the action in turn changes the environment. This cycle constitutes the interaction process.
[0155] The reinforcement learning method is the Soft Actor-Critic model, which is suitable for continuous action spaces:
[0156] The Soft Actor-Critic model dynamically adjusts the operating power of the fan and the evaporative cooling pad equipment based on the current environmental conditions and the prediction results of the quantile regression model, while also imposing an energy consumption penalty so that energy costs are reduced while the temperature stays in a suitable range.
[0157] The Soft Actor-Critic model includes an Actor network, two Critic networks, and a corresponding target Critic network.
[0158] The Actor network receives the state defined by the state space, outputs the Gaussian distribution parameters of the fan operating power and the evaporative cooling pad operating state, and samples the fan operating power and the evaporative cooling pad operating state from the distribution using the reparameterization trick; the Critic networks evaluate the expected cumulative reward of a given fan and evaporative cooling pad operating power in a given state; the target Critic networks synchronize their parameters from the main Critic networks through soft updates to stabilize the learning process.
[0159] The training method for the reinforcement-learning-based temperature control model of the duck house is as follows: data is obtained by letting the Soft Actor-Critic model interact with the simulated poultry house environment, the interaction data is stored using experience replay, and the parameters of the Critic networks, the Actor network, and the target networks are trained and updated on randomly sampled mini-batches, yielding a reinforcement learning decision model that integrates quantile regression information, i.e., the reinforcement-learning-based temperature control model of the duck house.
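The soft update of the target Critic network can be sketched as a Polyak average of parameters. This is an illustrative sketch on plain lists of floats rather than real network weights; the function name is hypothetical, and the default coefficient of 0.005 is the soft update coefficient given in the preferred parameter settings.

```python
from typing import List


def soft_update(target: List[float], main: List[float], tau: float = 0.005) -> List[float]:
    """Move each target parameter a fraction tau toward the main parameter."""
    if len(target) != len(main):
        raise ValueError("parameter lists must have the same length")
    return [tau * m + (1.0 - tau) * t for t, m in zip(target, main)]
```

Because tau is small, the target network changes slowly, so the value targets used to train the Critic networks do not shift abruptly from one update to the next.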
[0160] Specifically, the training process of the reinforcement-learning-based temperature control model for duck farming houses is shown in Figure 3 and includes:
[0161] (1) Build the simulation environment using CFD or Gym: use methods including but not limited to CFD or Gym to build thermodynamic or fluid-dynamic models of the poultry house that simulate the changes in the environmental state inside and outside the house; at the same time, model the temperature regulation problem of the meat duck farming environment as a Markov decision process.
[0162] (2) Set appropriate training parameters for the Soft Actor-Critic model and the total number of interaction steps between the model and the environment, and initialize the number of interaction steps between the model and the environment to 0;
[0163] (3) The model interacts with the simulation environment to obtain interaction data and stores the interaction data in the replay buffer;
[0164] Specifically, the data is obtained through interaction between the Soft Actor-Critic model and the simulated poultry house environment. Each interaction yields one data point, which is then stored in the replay buffer.
[0165] (4) The number of steps for model-environment interaction is increased by 1;
[0166] (5) Determine if the number of interaction steps has reached the initial training step count:
[0167] If so, randomly sample a small batch of data from the data stored in the replay buffer, train and update the Actor and Critic network parameters according to the Soft Actor-Critic model rules, update the target network, and increase the number of interaction steps between the model and the environment by 1 step;
[0168] Otherwise, return to (2);
[0169] (6) Determine if the model check and save steps have been reached:
[0170] If yes, then evaluate the model performance; if no, then jump to (8).
[0171] (7) Evaluate model performance: set the number of model evaluation steps n, let the currently trained model interact with the environment n times, and calculate the average reward over the n interactions. Compare the current average reward with the highest average reward from previous model evaluations; if the current value is the highest, save the current model as the best model; otherwise, store it as a stage model.
[0172] Once the model evaluation is complete, proceed to (8);
[0173] (8) Determine whether the number of interaction steps between the model and the environment has reached the total number of training steps. If yes, end the training; otherwise, return to (3).
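The control flow of steps (2)-(8) can be sketched as a plain loop. This is a minimal sketch under stated assumptions: `collect`, `update`, and `evaluate` are hypothetical callables standing in for the simulated environment and the Soft Actor-Critic update; "return to (2)" in step (5) is read as continuing data collection without resetting the counter; and the step counter is incremented once per interaction.

```python
import random
from collections import deque
from typing import Any, Callable, List, Tuple


def train(total_steps: int, learning_starts: int, eval_every: int, batch_size: int,
          collect: Callable[[], Any], update: Callable[[List[Any]], None],
          evaluate: Callable[[], float],
          buffer_size: int = 10_000) -> Tuple[int, List[int], float]:
    """Run the interaction/update/evaluation schedule and report what happened."""
    replay = deque(maxlen=buffer_size)   # replay buffer
    step = 0                             # (2) interaction step counter
    best_reward = float("-inf")
    n_updates = 0
    best_saved_at: List[int] = []
    while True:
        replay.append(collect())         # (3) one interaction, one data point
        step += 1                        # (4)
        if step >= learning_starts:      # (5) start updating after warm-up
            batch = random.sample(list(replay), min(batch_size, len(replay)))
            update(batch)
            n_updates += 1
        if step % eval_every == 0:       # (6) check-and-save point reached
            avg = evaluate()             # (7) average reward over n evaluations
            if avg > best_reward:
                best_reward = avg
                best_saved_at.append(step)   # would save as the best model
        if step >= total_steps:          # (8)
            break
    return n_updates, best_saved_at, best_reward
```

The function only returns counters, which makes it easy to check the schedule itself (when updates start and when evaluations fire) before plugging in a real agent and simulator.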
[0174] Preferably, the Soft Actor-Critic model parameter settings include two parts: the model initialization parameters, and the training process and callback function parameters. The model initialization parameters include: a policy network of the multilayer perceptron type, an experience replay buffer size of 10,000 data points, 5,000 experience steps collected before learning begins, a mini-batch size of 256 samples per update, one data point generated per step (so the number of steps and the number of data points are equivalent), a soft update coefficient of 0.005, a discount factor of 0.99, and an entropy coefficient set to automatic adjustment.
[0175] The training process parameters and callback function parameters include a total training step count of 200,000 and a model check and save step count of 10,000, meaning that the system performs a model evaluation and save every 10,000 steps.
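The preferred settings above can be gathered into two configuration dictionaries. This is a sketch only: the text does not name a library, and the key names below are borrowed from the Stable-Baselines3 `SAC` constructor, `learn`, and `EvalCallback` arguments as an assumption. Mapping "one data point generated per step" to `train_freq` is likewise an assumption.

```python
# Model initialization parameters (key names assumed, values from the text).
SAC_INIT = {
    "policy": "MlpPolicy",      # multilayer perceptron policy network
    "buffer_size": 10_000,      # replay buffer capacity
    "learning_starts": 5_000,   # steps collected before learning begins
    "batch_size": 256,          # mini-batch size per update
    "train_freq": 1,            # assumption: one update per environment step
    "tau": 0.005,               # soft update coefficient
    "gamma": 0.99,              # discount factor
    "ent_coef": "auto",         # entropy coefficient adjusted automatically
}

# Training process and callback parameters.
TRAINING = {
    "total_timesteps": 200_000,  # total training steps
    "eval_freq": 10_000,         # evaluate and save every 10,000 steps
}


def n_evaluations(training: dict) -> int:
    """Number of evaluate-and-save points implied by the schedule."""
    return training["total_timesteps"] // training["eval_freq"]
```

With these values the schedule implies 20 evaluation points, and the first 5,000 steps only fill the replay buffer.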
[0176] S4. Online Model Update: Set an online model update step size threshold, deploy and apply the reinforcement learning decision model that integrates quantile regression information obtained from training, and save the interaction data between the decision model and the actual operating environment; when the current model accumulates a set threshold of data, use the acquired data to train the model, and update the parameters of the currently running model with the optimal model parameters after training is completed.
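The threshold-triggered update in S4 can be sketched as a small accumulator. The class name and the `retrain` callback are hypothetical, and clearing the accumulated data after each retraining is an assumption; the text only states that training is triggered once the set threshold of data has accumulated.

```python
from typing import Any, Callable, List


class OnlineUpdater:
    """Accumulate deployment data and trigger retraining at a threshold."""

    def __init__(self, threshold: int, retrain: Callable[[List[Any]], None]) -> None:
        if threshold <= 0:
            raise ValueError("threshold must be positive")
        self.threshold = threshold
        self.retrain = retrain
        self.buffer: List[Any] = []
        self.n_updates = 0

    def record(self, transition: Any) -> bool:
        """Store one interaction; return True if a retraining was triggered."""
        self.buffer.append(transition)
        if len(self.buffer) < self.threshold:
            return False
        self.retrain(list(self.buffer))  # train on the accumulated data
        self.buffer.clear()              # assumption: start a fresh window
        self.n_updates += 1
        return True
```

In the system described in Example 2, the edge controller would be the natural place to call `record`, while the retraining itself runs on the cloud server.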
[0177] Example 2
[0178] This embodiment provides a temperature control system for duck farming environment that integrates quantile regression prediction and reinforcement learning. The system includes: a farming shed section and an outdoor section.
[0179] The breeding shed section, as shown in Figure 4, includes: the main body of the duck breeding house 1, the wet curtain equipment 2, the fan equipment 3, the breeding cage frame 4, the indoor temperature and humidity sensor 5 installed on the breeding cage frame 4, the edge controller 10, and the cloud server 11.
[0180] The evaporative cooling pad 2 is installed on one of the long sides of the main body 1 of the breeding house; the fan 3 is installed on one of the short sides of the breeding house; the status of the evaporative cooling pad 2 and the operating power of the fan 3 can be read directly or calculated indirectly. The edge controller 10 is installed on either side of the main body 1 of the duck breeding house.
[0181] Three sets of the indoor temperature and humidity sensor 5 are provided, each installed on a different breeding cage frame 4. Horizontally, each sensor is located at the midpoint of the distance between the wet curtain device 2 and the fan device 3; vertically, it is located at half the height of the breeding cage frame 4.
[0182] Preferably, the indoor temperature and humidity sensor 5 collects data every five minutes, and the average value is used as the collected indoor temperature and humidity data.
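The averaging of the three indoor sensor sets can be sketched as below. Skipping readings that are missing (represented as `None`) is an assumption added for illustration; the function name is hypothetical.

```python
from statistics import fmean
from typing import Optional, Sequence


def house_average(readings: Sequence[Optional[float]]) -> float:
    """Average the valid readings from the indoor sensor sets."""
    valid = [r for r in readings if r is not None]
    if not valid:
        raise ValueError("no valid sensor readings")
    return fmean(valid)
```

The same function can be applied separately to the temperature readings and the humidity readings of one five-minute sample.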
[0183] The outdoor section, as shown in Figure 5, includes: an outdoor mounting bracket 6, a wind speed sensor 7, a wind direction sensor 8, and an outdoor temperature and humidity sensor 9;
[0184] The outdoor mounting bracket 6 is installed above the main body 1 of the duck breeding house, as far away as possible from the wet curtain equipment 2 and the fan equipment 3 to avoid interference; preferably, the outdoor mounting bracket 6 is located at the midpoint of the distance between the wet curtain equipment 2 and the fan equipment 3.
[0185] The wind speed sensor 7, wind direction sensor 8, and outdoor temperature and humidity sensor 9 are mounted on the outdoor mounting bracket 6 and collect wind speed, wind direction, outdoor temperature, and outdoor humidity data. The wind speed sensor 7, wind direction sensor 8, and outdoor temperature and humidity sensor 9 collect data every five minutes.
[0186] The edge controller 10 is connected to the wet curtain device 2, the fan device 3, the indoor temperature and humidity sensor 5, the wind speed sensor 7, the wind direction sensor 8, the outdoor temperature and humidity sensor 9, and the cloud server 11 via wired or wireless means.
[0187] The edge controller 10 is used to collect the data required for the indoor temperature control model, deploy the indoor temperature control model, and communicate with the cloud server 11. It sends the data required for the indoor temperature control model to the cloud server 11 and receives the updated optimal indoor temperature control model from the cloud server 11.
[0188] The cloud server 11 is used for building the simulation environment, training the indoor temperature control model, storing the data uploaded by the edge controller 10, updating the indoor temperature control model online, and distributing the updated optimal model to the edge controller 10 for deployment.
[0189] Example 3
[0190] This embodiment also provides a computer-readable storage medium storing executable instructions, which, when executed, cause the machine to perform the above-described method for regulating the ambient temperature of duck farming environment by fusing quantile regression prediction and reinforcement learning.
[0191] Specifically, a system or apparatus equipped with a readable storage medium may be provided, on which software program code implementing the functions of any of the embodiments described above is stored, and the computer or processor of the system or apparatus can read and execute the instructions stored in the readable storage medium.
[0192] In this case, the program code itself, which can be read from the readable medium, can perform the functions of any of the above embodiments, and therefore the computer-readable code and the readable storage medium storing the computer-readable code constitute a part of this specification.
[0193] Examples of readable storage media include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), magnetic tapes, non-volatile memory cards, and ROMs. Alternatively, program code can be downloaded from a server computer or the cloud via a communication network.
[0194] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0195] Obviously, the above embodiments of the present invention are merely examples for clearly illustrating the technical solutions of the present invention, and are not intended to limit the specific implementation of the present invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the claims of the present invention should be included within the protection scope of the claims of the present invention.
Claims
1. A method for temperature control in duck farming environments that integrates quantile regression prediction and reinforcement learning, characterized in that the method includes:
S1. Collecting multidimensional data of the breeding environment and the operating status information of the environmental control equipment, constructing features to obtain feature vectors, and forming a multi-source dataset from multiple feature vectors;
S2. Predicting the in-house temperature by quantile regression based on the multi-source dataset to obtain the predicted temperatures at different quantiles; the in-house temperature prediction based on quantile regression specifically refers to: quantile prediction is achieved using the LightGBM quantile regression model, which includes three independent LightGBM quantile regression models corresponding to the 0.25, 0.5, and 0.75 quantiles, respectively; the multi-source dataset is divided into training and testing sets at a ratio of 4:1; the three independent LightGBM quantile regression models are trained respectively, and the quantile loss function is used as the training objective and evaluation metric; the quantile loss function is defined as follows: $L_{\tau}(y_{t+1}, \hat{y}_{t+1}^{\tau}) = \max\big(\tau\,(y_{t+1} - \hat{y}_{t+1}^{\tau}),\ (\tau - 1)\,(y_{t+1} - \hat{y}_{t+1}^{\tau})\big)$ (9); in formula (9), $y_{t+1}$ is the actual in-house temperature at time t+1; $\hat{y}_{t+1}^{\tau}$ is the predicted in-house temperature at time t+1; $\tau$ is the quantile, with values of 0.25, 0.5, and 0.75; the LightGBM quantile regression model outputs the predicted in-house temperatures at time t+1 for the 0.25, 0.5, and 0.75 quantiles;
S3. Using a reinforcement learning model, modeling the temperature control problem of the breeding environment as a Markov decision process based on the current environmental state and the quantile regression prediction results, dynamically adjusting the fan power and the status of the wet curtain equipment while penalizing energy consumption, and finally obtaining, through training, a reinforcement learning decision model that integrates quantile regression information; the modeling of the temperature control problem of the breeding environment as a Markov decision process includes: first, a thermodynamic model of the poultry house is built using Gym to simulate the internal and external environment of the house and the operating status of the fans and wet curtain equipment; then, a Markov decision process model is used to model the temperature control problem in duck farming, the Markov decision process including a state space, an action space, and a reward function;
the state space is defined as: current internal and external environmental information, predicted external environmental information at time t+1, equipment operating status information at time t-1, time characteristics, and quantile regression model prediction results; the state space comprises: the in-house temperature at time t, the in-house humidity at time t, the outdoor temperature at time t, the outdoor humidity at time t, the local outdoor temperature at time t+1 obtained from the weather forecast, the local outdoor humidity at time t+1 obtained from the weather forecast, the fan operating power, the status of the evaporative cooling pad equipment, the hour, the month, and the quantile regression model prediction results;
the action space is defined as: the fan operating power and the operating power of the wet curtain;
the reward function is defined as: comfort reward and energy consumption penalty; the comfort reward refers to a bonus if the indoor temperature remains within the target range, and a penalty if it deviates from it; the target range for the indoor temperature is $[T_{\min}, T_{\max}]$, where $T_{\min}$ is the minimum indoor temperature and $T_{\max}$ is the maximum indoor temperature; the comfort reward is: (10); in formula (10), the deviation term represents the degree to which the indoor temperature deviates from the target range; the energy consumption penalty is: (11); in formula (11), the parameters are the penalty coefficient for equipment energy consumption, the sampling step size, and the electricity price; the reward function based on reinforcement learning is: (12);
then, a reinforcement learning method, namely the Soft Actor-Critic model, is used to interact with the Markov decision process; the Soft Actor-Critic model includes one Actor network, two Critic networks, and corresponding target Critic networks: the Actor network receives the state defined by the state space, outputs the Gaussian distribution parameters of the fan operating power and the wet curtain operating status, and samples the fan operating power and the wet curtain operating status from the distribution through the reparameterization trick; the Critic networks evaluate the expected cumulative reward of a given operating power of the fan and evaporative cooling pad equipment in a given state; the target Critic networks synchronize their parameters from the main Critic networks via soft updates to stabilize the learning process;
the training method for the reinforcement-learning-based temperature control model of the duck breeding house is as follows: data is obtained by letting the Soft Actor-Critic model interact with the simulated poultry house environment, the interaction data is stored using experience replay, and the parameters of the Critic networks, the Actor network, and the target networks are trained and updated on randomly sampled mini-batches to obtain a reinforcement learning decision model that integrates quantile regression information;
S4. Setting an online model update step size threshold, deploying and applying the reinforcement learning decision model that integrates quantile regression information obtained from training, and saving the interaction data between the decision model and the actual operating environment; when the current model has accumulated the set threshold of data, using the acquired data to train the model, and updating the parameters of the currently running model with the optimal model parameters after training.
2. The method for temperature control in duck farming environment by integrating quantile regression prediction and reinforcement learning as described in claim 1, characterized in that the collection of multidimensional data of the breeding environment and the operating status information of the environmental control equipment specifically includes: collecting, in real time through an Internet of Things (IoT) sensor network, multi-dimensional physical parameters of the breeding environment and operating status information of the environmental control equipment; the multi-dimensional physical parameters include: in-house temperature, in-house humidity, outdoor temperature, outdoor humidity, wind speed, and wind direction; the operating status of the environmental control equipment includes: the fan operating power and the status of the evaporative cooling pad equipment; time information such as year, month, day, hour, and minute is recorded during data acquisition, and the data is saved to obtain the breeding environment dataset.
3. The method for temperature control in duck farming environment by integrating quantile regression prediction and reinforcement learning according to claim 2, characterized in that the collected data is cleaned and feature vectors are constructed through feature calculation, specifically as follows:
the data cleaning includes: missing value handling, outlier detection and correction, and time alignment;
the feature calculation includes: calculation of basic environmental change features and equipment-environment coupling features;
the feature vector includes: in-house environmental information at time t, predicted outdoor environmental information at time t+1, equipment operating status information from time t-1 to time t, basic environmental change features, time features, and equipment-environment coupling features;
the in-house environmental information at time t includes: the in-house temperature at time t and the in-house humidity at time t;
the predicted outdoor environmental information at time t+1 is: the local outdoor temperature and outdoor humidity at time t+1 obtained from the weather forecast;
the equipment operating status information from time t-1 to time t is: the fan operating power and the status of the evaporative cooling pad equipment;
the basic environmental change features include: the in-house temperature change, the in-house humidity change, the outdoor temperature change, the outdoor humidity change, and the temperature difference between the outdoor temperature at time t+1 and the in-house temperature at time t, calculated as follows: (1); (2); (3); (4); (5); in formulas (1)-(5), the terms are the in-house temperature at time t-1, the in-house humidity at time t-1, the outdoor temperature at time t, the outdoor temperature at time t-1, the outdoor humidity at time t, and the outdoor humidity at time t-1;
the equipment-environment coupling features include: the fan-temperature difference interaction feature, the evaporative cooling pad-temperature interaction feature, and the evaporative cooling pad-humidity interaction feature, calculated as follows: (6); (7); (8); in formulas (6)-(8), the equipment terms are the operating power of the fan and the status of the evaporative cooling pad equipment;
the complete representation of the feature vector is as follows: ;
the multi-source dataset consists of multiple records in the feature-vector format.
4. The method for temperature control in duck farming environment by integrating quantile regression prediction and reinforcement learning as described in claim 3, characterized in that the training process of the reinforcement-learning-based temperature control model for duck farming houses includes:
(1) Build a simulation environment using CFD or Gym: thermodynamic or fluid-dynamic models of the poultry house are constructed using methods including but not limited to CFD and Gym to simulate changes in the environmental conditions inside and outside the house; at the same time, the temperature regulation problem in duck farming is modeled as a Markov decision process;
(2) Set appropriate training parameters for the Soft Actor-Critic model and the total number of interaction steps between the model and the environment, and initialize the number of interaction steps between the model and the environment to 0;
(3) The model interacts with the simulation environment to obtain interaction data and stores the interaction data in the replay buffer; specifically, the data is obtained through interaction between the Soft Actor-Critic model and the simulated poultry house environment, and each interaction yields one data point, which is then stored in the replay buffer;
(4) The number of steps for model-environment interaction is increased by 1;
(5) Determine if the number of interaction steps has reached the initial training step count: if so, randomly sample a small batch of data from the data stored in the replay buffer, train and update the Actor and Critic network parameters according to the Soft Actor-Critic model rules, update the target network, and increase the number of interaction steps between the model and the environment by 1 step; otherwise, return to (2);
(6) Determine if the model check and save steps have been reached: if yes, then evaluate the model performance; if no, then jump to (8);
(7) Evaluate model performance: set the number of model evaluation steps n, let the currently trained model interact with the environment n times, calculate the average reward over the n interactions, and compare the current average reward with the highest average reward from previous model evaluations; if the current value is the highest, save the current model as the best model; otherwise, store it as a stage model; once the model evaluation is complete, proceed to (8);
(8) Determine whether the number of interaction steps between the model and the environment has reached the total number of training steps; if yes, end the training; otherwise, return to (3).
5. The method for regulating environmental temperature in duck farming by integrating quantile regression prediction and reinforcement learning according to claim 2, characterized in that the time characteristics are derived from the time information and include: the hour in which the sampling time t falls, i.e., 0-23 h, and the month, i.e., January to December.
6. A temperature control system for duck farming environment integrating quantile regression prediction and reinforcement learning, used to execute the method as described in any one of claims 1-5, characterized in that the system includes: a breeding shed section and an outdoor section;
the breeding shed section includes: the main body of the duck breeding house 1, the wet curtain equipment 2, the fan equipment 3, the breeding cage frame 4, the indoor temperature and humidity sensor 5 installed on the breeding cage frame 4, the edge controller 10, and the cloud server 11;
the evaporative cooling pad 2 is installed on one of the long sides of the main body 1 of the breeding house; the fan 3 is installed on one of the short sides of the breeding house; the status of the evaporative cooling pad 2 and the operating power of the fan 3 can be read directly or calculated indirectly; the edge controller 10 is installed on either side of the main body 1 of the duck breeding house;
three sets of the indoor temperature and humidity sensor 5 are provided, each installed on a different breeding cage frame 4; horizontally, each sensor is located at the midpoint of the distance between the wet curtain equipment 2 and the fan equipment 3; vertically, it is located at half the height of the breeding cage frame 4;
the outdoor section includes: an outdoor mounting bracket 6, a wind speed sensor 7, a wind direction sensor 8, and an outdoor temperature and humidity sensor 9; the outdoor mounting bracket 6 is installed above the main body 1 of the duck breeding house, far away from the wet curtain equipment 2 and the fan equipment 3; the wind speed sensor 7, wind direction sensor 8, and outdoor temperature and humidity sensor 9 are mounted on the outdoor mounting bracket 6 and collect wind speed, wind direction, outdoor temperature, and outdoor humidity data;
the edge controller 10 is connected to the wet curtain device 2, the fan device 3, the indoor temperature and humidity sensor 5, the wind speed sensor 7, the wind direction sensor 8, the outdoor temperature and humidity sensor 9, and the cloud server 11 via wired or wireless means; the edge controller 10 is used to collect the data required for the indoor temperature control model, deploy the indoor temperature control model, and communicate with the cloud server 11; it sends the data required for the indoor temperature control model to the cloud server 11 and receives the updated optimal indoor temperature control model from the cloud server 11;
the cloud server 11 is used for building the simulation environment, training the indoor temperature control model, storing the data uploaded by the edge controller 10, updating the indoor temperature control model online, and distributing the updated optimal model to the edge controller 10 for deployment.
7. The temperature control system for duck farming environment integrating quantile regression prediction and reinforcement learning according to claim 6, characterized in that, The indoor temperature and humidity sensor 5 collects data once every five minutes, and the average value is used as the collected indoor temperature and humidity. The wind speed sensor 7, wind direction sensor 8, and outdoor temperature and humidity sensor 9 collect data once every five minutes.
8. A computer-readable storage medium storing a computer program, characterized in that, When the computer program is executed by a processor, it implements the steps of the method as described in any one of claims 1 to 5.