Dynamic control method, apparatus, device, and medium for comfort adjustment equipment
By combining a deep prediction model and an optimization model with a convolutional extended long short-term memory network and a feasible repair probability discrete particle swarm optimization algorithm, the control actions of comfort adjustment equipment are predicted and optimized, solving the problem of equipment control lag and improving the equipment's response speed and energy efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHENZHEN UNIV
- Filing Date
- 2026-01-30
- Publication Date
- 2026-06-30
AI Technical Summary
Existing comfort control methods suffer from lag, resulting in delayed environmental improvements and an inability to respond promptly to environmental changes.
By acquiring current environmental and operating data and applying a deep prediction model together with an optimization model, the control actions of comfort adjustment devices at subsequent moments are predicted and optimized. A convolutional extended long short-term memory network and a feasible-repair probabilistic discrete particle swarm optimization algorithm are used to iteratively optimize the control actions and reduce lag.
This enables comfort adjustment equipment to respond to environmental changes in advance, reduces the lag in environmental improvement, and improves the response speed and energy efficiency of the equipment.
Figure CN121657482B_ABST
Description
Technical Field
[0001] This invention relates to the field of intelligent control technology, specifically to a dynamic control method, device, equipment, and medium for comfort adjustment equipment.
Background Technology
[0002] Comfort control devices are used to adjust environmental parameters to improve user comfort. For example, air conditioners can regulate indoor temperature, humidity, and fan speed to enhance user comfort. Existing control methods for comfort control devices include timed control and thermostat-based start-stop control. This means the user sets a fixed temperature value, and the comfort control device starts / stops or adjusts its frequency based on the deviation between the indoor temperature and the set temperature. However, this control method only adjusts the comfort control device based on the current indoor temperature, resulting in a lag in the device's improvement of the indoor environment. In other words, the device only starts adjusting when the indoor temperature deviates from the set temperature, and it takes a period of time for the device to return the room temperature to the set level.
[0003] In summary, existing control methods for comfort adjustment devices suffer from lag.
[0004] Therefore, existing technologies still need to be improved and enhanced.
Summary of the Invention
[0005] To address the aforementioned technical problems, this invention provides a dynamic control method, device, equipment, and medium for comfort adjustment equipment, which solves the problem of lag in existing control methods for comfort adjustment equipment.
[0006] To achieve the above objectives, the present invention adopts the following technical solution:
[0007] In a first aspect, the present invention provides a dynamic control method for a comfort adjustment device, comprising:
[0008] Obtain the current environmental data of the working environment of the comfort adjustment device at the current moment, and obtain the current operating data of the comfort adjustment device at the current moment;
[0009] The preset control actions of the comfort adjustment device at subsequent times are obtained;
[0010] A predictive control model is applied to the current environmental data, the current operating data, and the preset control actions to predict the optimized control actions of the comfort adjustment device at subsequent times. The predictive control model includes a deep prediction model and an optimization model. The deep prediction model is used to predict environmental data and device energy consumption based on the preset control actions, the current environmental data, and the current operating data. The optimization model is used to optimize and adjust the control actions based on the predicted environmental parameters and device energy consumption.
[0011] In one implementation, a predictive control model is applied to the current environmental data, the current operating data, and the preset control actions to predict the optimized control actions of the comfort adjustment device at subsequent times, including:
[0012] Determine the current ambient temperature, current ambient humidity, current ambient wind speed, and current number of users from the current environmental data;
[0013] Determine the current wind speed, current temperature, and current energy consumption of the device from the current operating data;
[0014] Determine the preset device temperature and preset device wind speed from the preset control actions;
[0015] A predictive control model is applied to the current ambient temperature, current ambient humidity, current ambient wind speed, current number of users, current device wind speed, current device temperature, current device energy consumption, preset device temperature, and preset device wind speed to predict the device wind speed and device temperature of the comfort adjustment device at the subsequent time, and the predicted device wind speed and device temperature are used as optimized control actions.
[0016] In one implementation, a predictive control model is applied to the current environmental data, the current operating data, and the preset control actions to predict the optimized control actions of the comfort adjustment device at subsequent times, including:
[0017] The deep prediction model is applied to the current environmental data, the current operating data, and the preset control actions to predict the environmental data and equipment energy consumption at subsequent times, thereby obtaining environmental prediction data and equipment prediction energy consumption.
[0018] The objective function is obtained by using the optimization model based on the environmental prediction data and the energy consumption prediction of the equipment;
[0019] The optimization model iteratively optimizes the preset control action based on the value of the objective function, determines the value of the objective function corresponding to the preset control action after each iteration, and continues until the value of the objective function satisfies the preset iteration termination condition, thus obtaining the optimized control action.
[0020] In one implementation, the objective function is obtained through the optimization model based on the environmental prediction data and the device's predicted energy consumption, including:
[0021] Based on the environmental prediction data, the optimization model obtains the predicted user comfort level.
[0022] The optimization model calculates the equipment loss data caused by the preset control action to the comfort adjustment device, and the equipment loss data includes the equipment loss caused by the device start-up and / or fluctuations in the device's control action.
[0023] The objective function is obtained by using the optimization model based on the predicted user comfort level, the device's predicted energy consumption, and the equipment loss data.
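By way of a non-limiting illustration, the three ingredients above could be combined as a weighted sum. The weighted-sum form, the weight values, the unit costs, and the names `Action`, `wear_cost`, and `objective` are all assumptions made for this sketch; the text does not state how the terms are combined.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Action:
    """One control action: set temperature (deg C), fan level, on/off state."""
    temp: float
    fan: int
    on: bool = True


def wear_cost(actions: Sequence[Action], was_on: bool) -> float:
    """Equipment loss: count start-ups and add step-to-step fluctuations."""
    cost, prev_on, prev = 0.0, was_on, None
    for a in actions:
        if a.on and not prev_on:
            cost += 1.0  # assumed unit cost per start-up
        if prev is not None:
            # assumed cost per degree / per fan level of change
            cost += 0.1 * abs(a.temp - prev.temp) + 0.1 * abs(a.fan - prev.fan)
        prev_on, prev = a.on, a
    return cost


def objective(comfort_dev: Sequence[float], energy: Sequence[float],
              actions: Sequence[Action], was_on: bool,
              w: Tuple[float, float, float] = (1.0, 0.5, 0.2)) -> float:
    """Assumed weighted sum of comfort deviation, energy, and equipment loss."""
    return (w[0] * sum(abs(d) for d in comfort_dev)
            + w[1] * sum(energy)
            + w[2] * wear_cost(actions, was_on))
```

A lower value is better under this convention: a plan that keeps comfort close to its target, uses little energy, and avoids restarts and abrupt changes scores lowest.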
[0024] In one implementation, the preset control action is iteratively optimized based on the value of the objective function using the optimization model, including:
[0025] Obtain the control action's own constraints, and obtain the constraints that comfort places on the control action;
[0026] Based on the value of the objective function, the optimization model iteratively optimizes the preset control action subject to the control action's own constraints and the comfort constraints on the control action.
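As a hedged sketch of what the two kinds of constraint might look like in practice, the code below projects a candidate action onto assumed device limits (bounds, set-point resolution, rate of change, allowed fan levels) and checks an assumed comfort band. Every numeric limit and the names `repair_action` and `comfort_ok` are hypothetical; the text only states that such constraints exist.

```python
def repair_action(temp, fan, prev_temp, *, t_min=18.0, t_max=28.0,
                  t_step=0.5, max_delta=2.0, fan_levels=(1, 2, 3, 4, 5)):
    """Project a candidate action onto the device's own constraints.

    Assumes prev_temp lies on the set-point grid and max_delta is a
    multiple of t_step, so snapping cannot break the rate limit.
    """
    # Rate limit: move the set temperature by at most max_delta per step.
    temp = min(max(temp, prev_temp - max_delta), prev_temp + max_delta)
    # Absolute bounds, then snap to the set-point resolution.
    temp = min(max(temp, t_min), t_max)
    temp = round(temp / t_step) * t_step
    # Nearest allowed fan level.
    fan = min(fan_levels, key=lambda lv: abs(lv - fan))
    return temp, fan


def comfort_ok(predicted_comfort, limit=0.5):
    """Comfort constraint: the predicted comfort index stays within +/- limit."""
    return abs(predicted_comfort) <= limit
```

An optimizer could call `repair_action` on every candidate it generates and discard or penalize candidates for which `comfort_ok` fails.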
[0027] In one implementation, the deep prediction model is a convolutional extended long short-term memory network, which includes, in cascade, a one-dimensional convolution, an extended long short-term memory network, a fully connected layer, and a rectified linear unit (ReLU) activation layer.
[0028] In one implementation, the extended long short-term memory network consists of a scalar long short-term memory network module and a multidimensional long short-term memory network module.
[0029] In a second aspect, embodiments of the present invention also provide a dynamic control device for a comfort adjustment device, wherein the device comprises the following components:
[0030] The data acquisition module is used to acquire the current environmental data of the working environment of the comfort adjustment device at the current moment, and to acquire the current operating data of the comfort adjustment device at the current moment;
[0031] The preset module is used to preset the control actions of the comfort adjustment device at subsequent times, thereby obtaining the preset control actions;
[0032] An optimization control module is used to apply a predictive control model to the current environmental data, the current operating data, and the preset control actions to predict the optimized control actions of the comfort adjustment device at subsequent times. The predictive control model includes a deep prediction model and an optimization model. The deep prediction model is used to predict environmental data and device energy consumption based on the preset control actions, the current environmental data, and the current operating data. The optimization model is used to optimize and adjust the control actions based on the predicted environmental parameters and device energy consumption.
[0033] In a third aspect, embodiments of the present invention also provide a terminal device, wherein the terminal device includes a memory, a processor, and a dynamic control program for a comfort adjustment device stored in the memory and executable on the processor, wherein when the processor executes the dynamic control program for the comfort adjustment device, it implements the steps of the dynamic control method for the comfort adjustment device described above.
[0034] In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium storing a dynamic control program for a comfort adjustment device. When the dynamic control program for a comfort adjustment device is executed by a processor, it implements the steps of the dynamic control method for a comfort adjustment device described above.
[0035] Beneficial Effects: This invention first presets the control actions of the device for future or subsequent times, that is, presets the device actions required to improve the environment at subsequent times. Then, this invention applies a predictive control model to the current environmental data, the current operating data of the device, and the aforementioned preset control actions. The predictive control model iteratively optimizes the preset control actions for subsequent times to obtain optimized control actions, which are then used to control the device's actions at subsequent times. In summary, this invention predicts the device's control actions required to meet subsequent environmental changes based on current environmental data, allowing the invention to input the predicted optimized control actions for subsequent times into the device in advance. This enables the device to respond promptly to environmental changes at subsequent times, thereby reducing the lag in the device's environmental improvement.
Attached Figure Description
[0036] Figure 1 is an overall flowchart of the present invention;
[0037] Figure 2 is a framework diagram of the technical solution in an embodiment of the present invention;
[0038] Figure 3 is a structural diagram of the deep prediction model in an embodiment of the present invention;
[0039] Figure 4 is a schematic diagram illustrating the training of the predictive control model in an embodiment of the present invention;
[0040] Figure 5 is a structural diagram of the dynamic control device for comfort adjustment equipment provided by the present invention;
[0041] Figure 6 is a block diagram illustrating the internal structure of a terminal device provided in an embodiment of the present invention.
Detailed Implementation
[0042] The technical solutions of the present invention will be clearly and completely described below with reference to the embodiments and accompanying drawings. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort are within the scope of protection of the present invention.
[0043] Research has found that comfort control devices are used to adjust environmental parameters to improve user comfort. For example, air conditioners can regulate indoor temperature, humidity, and fan speed to enhance user comfort. Existing control methods for comfort control devices include timed control and thermostat-based start-stop control. This means the user sets a fixed temperature value, and the comfort control device starts / stops or adjusts its frequency based on the deviation between the indoor temperature and the set temperature. However, this control method only adjusts the comfort control device based on the current indoor temperature, resulting in a lag in the device's improvement of the indoor environment. In other words, the device only adjusts when the indoor temperature deviates from the set temperature, and it takes a period of time for the adjusted device to bring the indoor temperature back to the set temperature.
[0044] To address the aforementioned technical problems, this invention provides a dynamic control method, device, equipment, and medium for comfort adjustment equipment, resolving the lag in existing control methods for comfort adjustment equipment. Specifically, when the comfort adjustment equipment is an air conditioner, the current environmental data and the air conditioner operating data are first collected. The current environmental data includes the current indoor temperature, black globe temperature, relative humidity, indoor wind speed, number of people indoors, outdoor temperature, and outdoor humidity. The current air conditioner operating data includes the air conditioner set temperature, air conditioner set wind speed, and air conditioner energy consumption. The system presets, or estimates, the air conditioner output temperature and wind speed required to cope with the environmental data that may arise at subsequent moments; these preset output temperatures and wind speeds serve as the preset control actions. The indoor temperature, black globe temperature, relative humidity, indoor wind speed, number of people indoors, outdoor temperature, outdoor humidity, air conditioner set temperature, air conditioner set wind speed, air conditioner energy consumption, and preset control actions are input into the predictive control model. The deep prediction model of the predictive control model predicts the possible future indoor temperature, indoor relative humidity, black globe temperature, indoor wind speed, and air conditioner energy consumption from the input data. The optimization model of the predictive control model then optimizes the preset air conditioner actions based on these predicted quantities. Each optimized preset air conditioner action corresponds to one set of predicted indoor temperature, indoor relative humidity, black globe temperature, indoor wind speed, and air conditioner energy consumption, and optimization continues until the objective function value corresponding to the optimized preset action satisfies the iteration termination condition. At that point, the optimized preset air conditioner action is the final optimized control action. The objective function is composed of the indoor temperature, indoor relative humidity, black globe temperature, indoor wind speed, and air conditioner energy consumption.
[0045] The dynamic control method for comfort adjustment devices in this embodiment can be applied to terminal devices, which can be terminal products with control functions, such as computers. In this embodiment, as shown in Figure 1, the dynamic control method for the comfort adjustment device specifically includes the following steps:
[0046] S100, acquire the current environmental data of the working environment of the comfort adjustment device at the current moment, and acquire the current operating data of the comfort adjustment device at the current moment;
[0047] S200, preset the control action of the comfort adjustment device at a subsequent time to obtain the preset control action;
[0048] S300, apply a predictive control model to the current environmental data, the current operating data, and the preset control actions to predict the optimized control actions of the comfort adjustment device at the subsequent time.
[0049] The predictive control model in this embodiment is the one shown in Figure 2. It is an MPC model, where MPC stands for Model Predictive Control.
[0050] In this embodiment, the predictive control model includes a deep prediction model and an optimization model. The deep prediction model is a convolutional extended long short-term memory network, namely Conv-xLSTM, which stands for Conv-Extended Long Short-Term Memory.
[0051] The structure of the convolutional extended long short-term memory network is shown in Figure 3. The network comprises, in cascade, a one-dimensional convolution, an extended long short-term memory (xLSTM) network, a fully connected layer, and a ReLU activation layer. The one-dimensional convolution comprises, in cascade, an input layer, a forgetting layer, and a convolutional layer. In this embodiment, the input layer uses Token Embedding, the forgetting layer is a Dropout layer, and the convolutional kernels are 1x1. The extended LSTM network is composed of multiple stacked modules, each consisting of a scalar LSTM module and a multidimensional LSTM module. In this embodiment, the input of the scalar LSTM module is connected to the output of the one-dimensional convolution, the input of the multidimensional LSTM module is connected to the output of the scalar LSTM module, and the output of the multidimensional LSTM module is connected to the input of the fully connected layer.
[0052] The fully connected layer in this embodiment includes several parallel fully connected units, each performing regression prediction for a different physical quantity (covering environmental data and equipment energy consumption). The units directly output the indoor temperature, relative humidity, black globe temperature, indoor wind speed, and air conditioning energy consumption over the future prediction horizon. Technical effect: this decoupled design effectively avoids the gradient interference that target variables with different dimensions and distribution characteristics would otherwise cause during backpropagation.
[0053] This embodiment can also replace the one-dimensional convolutional layer with an attention mechanism to extract the correlations between multiple variables. A graph neural network (GNN) can likewise replace the one-dimensional convolutional layer, especially when the number of sensor nodes increases, in order to model the topological relationships between sensors.
[0054] The aforementioned input embedding layer has the following technical effects:
[0055] A token embedding mechanism, or sine and cosine encoding, is introduced at the input. A learnable embedding matrix maps each token to a high-dimensional continuous vector. Embedding maps temporal information from a low-dimensional linear space to a high-dimensional semantic space, enabling the model to capture more accurately the differences in thermal performance hidden at different time points and significantly improving its adaptability to periodic environmental changes.
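For illustration only, a sine and cosine encoding of the time of day might look like the sketch below. The 24-hour period, the number of harmonics, and the function name `encode_time_of_day` are assumptions; the text does not specify the encoding's parameters.

```python
import math
from typing import List


def encode_time_of_day(hour: float, n_harmonics: int = 2) -> List[float]:
    """Map an hour of the day to sine/cosine features with a 24 h period."""
    angle = 2.0 * math.pi * (hour % 24.0) / 24.0
    feats: List[float] = []
    for k in range(1, n_harmonics + 1):
        feats.append(math.sin(k * angle))  # k-th harmonic, sine part
        feats.append(math.cos(k * angle))  # k-th harmonic, cosine part
    return feats
```

Because the features are periodic, 23:30 and 00:30 land close together in feature space, which a raw hour value would not achieve.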
[0056] The above convolutional layer has the following technical effects:
[0057] The time-series data, after embedding enhancement, first enters the convolutional layer. The convolutional kernels of the convolutional layer perform sliding calculations over the time step, aiming to extract and fuse deep local features from the multidimensional input data. This layer can effectively identify and extract the variation patterns and key features of different environmental parameters (such as temperature, humidity, radiation, etc.) within a local time window, playing a role in smoothing noise, reducing data redundancy, and enhancing feature representation capabilities, transforming the original high-dimensional sensor readings into higher-order abstract feature vectors.
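The sliding calculation over time steps can be sketched in plain Python as follows. This is a generic "valid" one-dimensional convolution written for clarity rather than speed; the function name `conv1d_time` and the tensor layout are assumptions, and a kernel with a single time tap (K = 1) corresponds to the 1x1 kernels mentioned for this embodiment.

```python
from typing import List


def conv1d_time(x: List[List[float]],
                kernel: List[List[List[float]]]) -> List[List[float]]:
    """Valid 1-D convolution (cross-correlation) along the time axis.

    x:      [T][C_in]         multivariate sensor sequence
    kernel: [K][C_in][C_out]  K time taps mixing C_in channels into C_out
    returns [T - K + 1][C_out]
    """
    k_taps, c_out = len(kernel), len(kernel[0][0])
    out = []
    for t in range(len(x) - k_taps + 1):
        row = [0.0] * c_out
        for k in range(k_taps):
            for ci, xv in enumerate(x[t + k]):
                for co in range(c_out):
                    row[co] += xv * kernel[k][ci][co]
        out.append(row)
    return out
```

With K = 1 the layer mixes the sensor channels at each time step; with K > 1 it also averages or differences neighbouring time steps, which is where the noise-smoothing effect comes from.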
[0058] As shown in Figure 3, the scalar long short-term memory network module in this embodiment consists of a normalization layer, a scalar long short-term memory network, a group normalization layer, a dimensionality increase and gating stage, and a dimensionality reduction layer. The output of the normalization layer is convolved and then input into the scalar long short-term memory network; the output of the normalization layer is also input directly into the scalar long short-term memory network. The output of the scalar long short-term memory network passes through the group normalization layer and enters the dimensionality increase and gating stage. The dimensionality-increased data is added to the gating vector and then fed into the dimensionality reduction layer. The scalar long short-term memory network is also known as sLSTM, or Scalar LSTM.
[0059] The scalar long short-term memory network (sLSTM) mitigates the vanishing gradient problem and improves the stability of long-sequence modeling by introducing exponential gating and state normalization.
[0060] Gating in the scalar long short-term memory network includes the input gate $i_t$ and the forget gate $f_t$: $i_t = \exp(\tilde{i}_t)$, and $f_t = \sigma(\tilde{f}_t)$ or $f_t = \exp(\tilde{f}_t)$. Here $\tilde{f}_t$ is the raw activation value of the forget gate, $\sigma$ is the sigmoid activation function, which maps its input to the range (0,1), and $\tilde{i}_t$ is the raw activation value of the input gate. With $\exp(\tilde{i}_t)$ and $\exp(\tilde{f}_t)$, the gate value is no longer limited to the range (0,1), which effectively alleviates the vanishing gradient problem in deep training and ensures stable gradient propagation even in extremely long sequences. The scalar long short-term memory network uses a normalization factor $n_t$ and the cell state $c_t$ to manage its hidden state output $h_t$: $h_t = o_t \odot (c_t / n_t)$, where $o_t$ is the output gate and $\odot$ denotes element-wise multiplication. The cell state tracks changes in ambient temperature, humidity fluctuations, and device energy consumption to help the scalar long short-term memory network generate control strategies for the device. The hidden state represents the current prediction output by the scalar long short-term memory network. Together, the cell state and hidden state ensure the numerical stability of the network during training.
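A minimal single-unit sketch of this gating is given below. The state updates for the cell state and the normalization factor follow the publicly described xLSTM formulation rather than anything stated in this text, and the additional stabilizer state used in practice to prevent overflow of the exponential is omitted; the function name `slstm_step` is hypothetical.

```python
import math


def slstm_step(z, i_raw, f_raw, o_raw, c_prev, n_prev):
    """One scalar sLSTM update with an exponential input gate.

    z is the candidate cell input; *_raw are gate pre-activations.
    Returns the hidden state, cell state, and normalization factor.
    """
    i = math.exp(i_raw)                   # exponential input gate, may exceed 1
    f = 1.0 / (1.0 + math.exp(-f_raw))    # sigmoid forget gate in (0, 1)
    o = 1.0 / (1.0 + math.exp(-o_raw))    # sigmoid output gate
    c = f * c_prev + i * z                # cell state (assumed update rule)
    n = f * n_prev + i                    # normalization factor (assumed rule)
    h = o * (c / n)                       # normalized hidden state
    return h, c, n
```

Because the same gates weight both `c` and `n`, the ratio `c / n` is a weighted average of past inputs, so the hidden state stays bounded even when the exponential gate is large.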
[0061] As can be seen from Figure 3, the multidimensional long short-term memory network module in this embodiment consists of a normalization layer, a dimensionality increase and gating stage, a multidimensional long short-term memory network, a group normalization layer, and a dimensionality reduction layer. The multidimensional long short-term memory network is the Matrix LSTM (mLSTM). Group normalization is a mechanism for stabilizing the training of deep neural networks: it divides the multidimensional feature channels into several groups and computes the mean and variance independently within each group for normalization. It is the core component that ensures numerical stability during large-scale parallel computing.
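The per-group statistics described above can be sketched as follows for a single feature vector. Learnable scale and shift parameters are left out, and the function name `group_norm` and the `eps` value are assumptions of this sketch.

```python
import math
from typing import List


def group_norm(x: List[float], num_groups: int, eps: float = 1e-5) -> List[float]:
    """Normalize a channel vector group by group (no learnable scale/shift)."""
    if len(x) % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    size = len(x) // num_groups
    out: List[float] = []
    for g in range(num_groups):
        grp = x[g * size:(g + 1) * size]
        mean = sum(grp) / size
        var = sum((v - mean) ** 2 for v in grp) / size
        # Each group is centred and scaled with its own statistics.
        out.extend((v - mean) / math.sqrt(var + eps) for v in grp)
    return out
```

Channels with very different magnitudes, for example a temperature group and an energy group, end up on a comparable scale because each group uses only its own mean and variance.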
[0062] The multidimensional long short-term memory network (mLSTM) employs a matrix cell state $C_t$ instead of a scalar state, so it can encode and store complex information in time series more richly. It applies a covariance update rule to the key vector $k_t$ and the value vector $v_t$. The cell state update is achieved through an outer product operation, as follows: $C_t = f_t C_{t-1} + i_t v_t k_t^{\top}$, where $C_{t-1}$ is the cell state at the previous time step. This mechanism allows the mLSTM to explicitly learn and remember the dynamic relationships and covariance structure between different feature dimensions of the input. The mLSTM removes the recurrent dependencies between hidden states across time steps; the computation of its current state depends only on the current input and the cell state at the previous time step. This parallelizable structure enables the mLSTM computation at all time steps to be performed in parallel, thereby improving training and inference efficiency, which is especially suitable for long-sequence processing.
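The outer-product update can be illustrated with a small pure-Python sketch. The read-out by a query vector is borrowed from the publicly described xLSTM formulation and is not mentioned in this text; the names `mlstm_update` and `mlstm_read` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def mlstm_update(c_prev: Matrix, k: List[float], v: List[float],
                 i_gate: float, f_gate: float) -> Matrix:
    """C_t = f * C_{t-1} + i * v k^T (outer-product covariance update)."""
    return [[f_gate * c_prev[r][c] + i_gate * v[r] * k[c]
             for c in range(len(k))] for r in range(len(v))]


def mlstm_read(c_mat: Matrix, q: List[float]) -> List[float]:
    """Retrieve a stored value by multiplying the matrix memory with a query."""
    return [sum(row[c] * q[c] for c in range(len(q))) for row in c_mat]
```

Storing two values under orthogonal keys and querying with either key returns the matching value, which is the sense in which the matrix state remembers relationships between feature dimensions.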
[0063] The optimization model in this embodiment is FA-PDPSO, where FA denotes the perturbation mechanism of the firefly algorithm and PDPSO denotes Feasible-Repair Probabilistic Discrete PSO, that is, the feasible-repair probabilistic discrete particle swarm optimization algorithm.
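The text does not give the update rules of FA-PDPSO, so the following is only a loose sketch of the general idea: each particle is a discrete (set temperature, fan level) pair, each coordinate is probabilistically copied from the personal best or the global best, and every new position is repaired back into the feasible grid. A random one-step move stands in for the firefly perturbation, and `toy_cost` stands in for the objective built from the deep prediction model's outputs. The grids, probabilities, and all function names are assumptions.

```python
import random

TEMPS = [18 + 0.5 * k for k in range(21)]  # 18.0 .. 28.0 deg C (assumed grid)
FANS = [1, 2, 3, 4, 5]                     # assumed fan levels


def toy_cost(ti, fi, target=24.5):
    """Stand-in objective: comfort deviation plus a crude energy term."""
    comfort = abs(TEMPS[ti] - target)
    energy = 0.05 * (28.0 - TEMPS[ti]) + 0.1 * FANS[fi]
    return comfort + energy


def repair(ti, fi):
    """Feasibility repair: clamp indices back into the allowed grid."""
    return (min(max(ti, 0), len(TEMPS) - 1),
            min(max(fi, 0), len(FANS) - 1))


def pd_pso(n_particles=12, iters=40, p_pbest=0.4, p_gbest=0.4, seed=0):
    rng = random.Random(seed)
    swarm = [(rng.randrange(len(TEMPS)), rng.randrange(len(FANS)))
             for _ in range(n_particles)]
    pbest = list(swarm)
    gbest = min(swarm, key=lambda p: toy_cost(*p))
    for _ in range(iters):
        for n, pos in enumerate(swarm):
            new = []
            for d in range(2):
                u = rng.random()
                if u < p_pbest:
                    new.append(pbest[n][d])                   # follow personal best
                elif u < p_pbest + p_gbest:
                    new.append(gbest[d])                      # follow global best
                else:
                    new.append(pos[d] + rng.choice((-1, 1)))  # random perturbation
            swarm[n] = repair(*new)
            if toy_cost(*swarm[n]) < toy_cost(*pbest[n]):
                pbest[n] = swarm[n]
            if toy_cost(*swarm[n]) < toy_cost(*gbest):
                gbest = swarm[n]
    return TEMPS[gbest[0]], FANS[gbest[1]], toy_cost(*gbest)
```

Calling `pd_pso()` returns a set temperature, a fan level, and the associated cost; in a real controller the cost would come from the predicted environment and energy consumption instead of `toy_cost`.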
[0064] This embodiment can also use genetic algorithms, differential evolution, artificial bee colony, sparrow search algorithm, or gray wolf optimization to replace FA-PDPSO.
[0065] The specific steps for training the predictive control model in this embodiment are as follows:
[0066] Step 1: As shown in Figure 4, a state vector is collected by environmental state perception sensors. The state vector includes indoor temperature, black globe temperature, relative humidity, indoor wind speed, number of people indoors, outdoor temperature, outdoor humidity, air conditioner set temperature, air conditioner set wind speed, air conditioner energy consumption, and preset control actions of the air conditioner.
[0067] Step 2: Input the state vector corresponding to the selected historical moment into the policy network, which is the predictive control model mentioned above. The policy network predicts the air conditioning control action corresponding to the next historical moment after the selected historical moment.
[0068] Step 3: Based on the predicted control action from Step 2, determine the indoor temperature, black globe temperature (obtained using existing technology), relative humidity, indoor wind speed, and air conditioning energy consumption at the next historical moment that result from the predicted air conditioning control action. Calculate the mean radiant temperature from the indoor temperature, black globe temperature, relative humidity, and indoor wind speed, and calculate the comfort level from the mean radiant temperature.
[0069] Step 4: Determine the energy consumption of the air conditioner resulting from the predicted control action in Step 2.
[0070] Step 5: Calculate the value of the reward function based on the predicted comfort level from Step 3 and the energy consumption from Step 4.
[0071] ;
[0072] The reward function yields the reward value at time step t. The comfort deviation value at time t is the difference between the comfort level in step three and the baseline comfort level; the predictive control model is trained so that this value approaches zero. The energy term relates the energy consumption of the air conditioner at time t to the baseline energy consumption. The over-limit penalty is imposed if the indoor temperature in step three exceeds the set range, and it takes a negative value to force the predictive control model to learn to avoid outputting extreme control actions: when the indoor temperature in step three is greater than 28 degrees Celsius or less than 18 degrees Celsius, the penalty is assigned a negative value. The reward function has three hyperparameters.
[0073] Step 6: Update the weight coefficients of the predictive control model based on the value of the reward function obtained in Step 5 until the optimal combination of weight coefficients is obtained. The weight coefficients respectively affect the energy consumption, comfort, and stability of the air conditioning system.
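Since the reward equation itself is not reproduced in this text, the sketch below shows only one plausible shape consistent with the description in Steps 3 to 5: a negative weighted sum of comfort deviation, energy relative to a baseline, and an over-limit penalty. The functional form, the weight values, and the name `reward` are assumptions; only the 18 and 28 degree Celsius limits come from the text.

```python
def reward(comfort, energy, indoor_temp, *, comfort_ref=0.0, energy_ref=1.0,
           w_comfort=1.0, w_energy=0.5, w_penalty=10.0,
           t_low=18.0, t_high=28.0):
    """Assumed reward: negative weighted sum of three terms.

    comfort_ref and energy_ref are the baseline comfort level and baseline
    energy consumption; the three w_* values play the role of the three
    hyperparameters.
    """
    comfort_dev = abs(comfort - comfort_ref)
    energy_ratio = energy / energy_ref
    over_limit = 1.0 if (indoor_temp > t_high or indoor_temp < t_low) else 0.0
    return -(w_comfort * comfort_dev
             + w_energy * energy_ratio
             + w_penalty * over_limit)
```

Under this convention the reward is always non-positive, approaches zero as comfort deviation and energy use shrink, and drops sharply when the indoor temperature leaves the 18 to 28 degree range.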
[0074] To avoid the predictive control model randomly exploring the weight coefficients in step six during the initial stage, this embodiment first constructs a virtual simulation environment based on physical information. Using building energy consumption simulation software, a simulation model is established based on data such as the geometric dimensions of the target office and the thermal parameters of the building envelope. The predictive control model undergoes large-scale accelerated training in the simulation environment, initially mastering basic weighting logic (e.g., prioritizing comfort during hot weather and energy conservation during transitional seasons). The pre-trained model parameters will serve as initial values for online operation. After the air conditioning is put into actual operation, the predictive control model continues to utilize real environmental data stored in the experience replay pool to periodically update the parameters of the policy network and value network, thereby adapting to real changes in room thermal characteristics and user habit drift.
[0075] The trained predictive control model optimizes the preset control actions of the comfort adjustment device based on the current environmental data and the current operating data of the comfort adjustment device, so as to obtain the optimized control action. The operation of the comfort adjustment device is controlled by the final optimized control action, so that the comfort adjustment device can provide a good living environment under the influence of the optimized control action.
[0076] The comfort adjustment device in step S100 can be an air conditioner, a fan, a heating device, a fan coil unit, a variable air volume system, a ground source heat pump, or a floor radiant heating system. The subsequent time in step S200 can be the next time after the current time, or several times after the current time. When the comfort adjustment device is an air conditioner, the preset control action in step S200 is to preset the temperature and fan speed output by the air conditioner in the subsequent time. The purpose of step S300 is to iteratively optimize the temperature and fan speed output by the air conditioner in the subsequent time, using the optimized temperature and fan speed to control the operation of the air conditioner in the subsequent time.
[0077] Step S300 involves applying a predictive control model to the current environmental data, the current operating data, and the preset control actions to predict the optimized control actions of the comfort adjustment device at subsequent times. This includes the following specific steps: S301, S302, S303, S304, S305, S306, and S307:
[0078] S301, apply the deep prediction model to the current environmental data, the current operating data, and the preset control action to predict the environmental data and equipment energy consumption at subsequent times, and obtain environmental prediction data and equipment prediction energy consumption.
[0079] Current environmental data includes the current ambient temperature, current ambient humidity, current ambient wind speed, and current number of users at the current moment. The current ambient temperature includes the outdoor temperature and the indoor temperature where the comfort control equipment is located. The current ambient humidity includes the outdoor humidity and the indoor humidity where the comfort control equipment is located. The current number of users is the number of people indoors who need the comfort control equipment to provide temperature and wind speed.
[0080] In this embodiment, a non-contact millimeter-wave radar is deployed at a height of approximately 2.2m on the wall. The non-contact millimeter-wave radar collects the number of people in the room in real time.
[0081] In this embodiment, outdoor temperature and humidity are collected at the shaded area of the building's exterior wall as external disturbance boundary conditions.
[0082] Current operating data includes the temperature (the current temperature of the device) and wind speed (the current wind speed of the device) set for the comfort control device at the current moment, as well as the energy consumption required by the comfort control device at the set temperature and wind speed (the current energy consumption of the device).
[0083] A thermal anemometer is deployed at the air outlet of a comfort control device (which may be an air conditioner) to collect the air outlet speed and temperature of the device, and real-time power and cumulative energy consumption are collected through a smart socket or communication interface.
[0084] Preset control actions include pre-setting the temperature and fan speed of the comfort adjustment equipment.
[0085] The current environmental data, current operating data, and preset control actions are input into the deep prediction model, which is a convolutional extended long short-term memory network (Conv-xLSTM). The deep prediction model predicts and outputs the environmental prediction data and the device's predicted energy consumption at subsequent time points. The environmental prediction data includes the indoor temperature, the indoor and outdoor relative humidity, the black globe temperature (black ball temperature), and the indoor wind speed at subsequent time points.
[0086] S302, based on the environmental prediction data, the user's predicted comfort level is obtained through the optimization model.
[0087] This embodiment calculates the user's predicted comfort from the mean radiant temperature and the indoor and outdoor relative humidity at subsequent times. Calculating user comfort from these quantities is existing technology.
[0088] ;
[0089] In the formula, the variables are: the indoor temperature at subsequent times; the black globe temperature at subsequent times; the indoor wind speed at subsequent times; the surface emissivity of the black globe, which can take the value 0.95; and the diameter of the black globe.
[0090] In this embodiment, a high-precision sensor is placed at the geometric center of the work area, 1.10 meters above the ground, to collect the aforementioned indoor temperature, black globe temperature, indoor wind speed, and relative humidity.
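The quantities listed above (air temperature, globe temperature, air speed, globe emissivity, and globe diameter) are the inputs of the standard globe-thermometer relation for mean radiant temperature. The following is a minimal sketch, assuming the ISO 7726 forced-convection form; the embodiment's exact expression may differ, and the function name and the default globe diameter of 0.15 m are hypothetical.

```python
def mean_radiant_temperature(t_air, t_globe, wind_speed,
                             emissivity=0.95, diameter=0.15):
    """Mean radiant temperature in deg C from globe-thermometer readings.

    Assumed form (ISO 7726, forced convection):
    T_mrt = [(T_g + 273)^4 + 1.1e8 * v^0.6 / (eps * D^0.4) * (T_g - T_a)]^(1/4) - 273
    Temperatures are in deg C, wind_speed in m/s, diameter in m.
    """
    convective = 1.1e8 * wind_speed ** 0.6 / (emissivity * diameter ** 0.4)
    fourth_power = (t_globe + 273.0) ** 4 + convective * (t_globe - t_air)
    return fourth_power ** 0.25 - 273.0
```

When the globe and air temperatures are equal, the correction term vanishes and the function returns the globe temperature; a globe warmer than the air yields a mean radiant temperature above the globe reading.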
[0091] S303, calculate the equipment loss data of the preset control action to the comfort adjustment device through the optimization model, the equipment loss data includes the equipment loss caused by the start-up and / or control action fluctuation of the device.
[0092] The equipment loss data represents equipment wear. When the control action sets the temperature and fan speed of the equipment (here an air conditioner), the equipment loss data includes penalties for changes in the set temperature and airflow, to prevent frequent control oscillations. In other words, the more frequently the preset control actions change the equipment temperature and airflow speed, the larger the value of the equipment loss data. For example, if the preset control action raises the equipment temperature and airflow speed at the first moment, lowers them at the second moment, and raises them again at the third moment, then the preset control action changes frequently. Frequent changes in the control action increase equipment wear, so the value of the equipment loss data increases.
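The penalty on control fluctuation described above can be illustrated with a short sketch. It assumes the penalty is a weighted sum of absolute step-to-step changes in set temperature and fan speed; the function name, the weights, and the omission of a separate start-up term are all assumptions, not the embodiment's definition.

```python
def wear_penalty(temps, speeds, prev_temp, prev_speed,
                 w_temp=1.0, w_speed=1.0):
    """Sum of absolute changes between consecutive set points.

    temps, speeds: planned set temperatures and fan speed levels.
    prev_temp, prev_speed: the set points currently applied to the device.
    w_temp, w_speed: hypothetical weights for the two kinds of change.
    """
    penalty = 0.0
    last_t, last_s = prev_temp, prev_speed
    for t, s in zip(temps, speeds):
        penalty += w_temp * abs(t - last_t) + w_speed * abs(s - last_s)
        last_t, last_s = t, s
    return penalty
```

An up-down-up plan such as `wear_penalty([26, 24, 26], [2, 1, 2], 24, 1)` scores higher than a plan that holds the current set points, which scores zero.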
[0093] S304, the objective function value is obtained through the optimization model based on the user's predicted comfort, the device's predicted energy consumption, and the device's loss data.
[0094] ;
[0095] In the formula, the symbols denote: the value of the objective function; the average difference in comfort over the moments, where the difference in comfort at each moment is the difference between the baseline comfort and the user's predicted comfort at that moment under the preset control action; the predicted device energy consumption at each moment; the total number of moments; and the constraint violation penalty data. Whenever the user's predicted comfort falls outside the range of comfort values, the penalty data takes a value corresponding to the deviation of the predicted comfort from that range. The remaining two symbols are weights applied to terms of the objective function.
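A sketch of how the terms named above could be combined is given below. The way the two weights are attached, the use of absolute differences, the averaging of energy over the moments, and the comfort range of (-0.5, 0.5) are all assumptions made for illustration; only the list of terms comes from the text.

```python
def objective(comfort_pred, energy_pred, wear, comfort_ref=0.0,
              comfort_range=(-0.5, 0.5), w_comfort=1.0, w_energy=1.0):
    """Weighted cost: comfort gap + energy + violation penalty + wear.

    comfort_pred: predicted comfort value at each future moment.
    energy_pred: predicted device energy consumption at each moment.
    wear: equipment loss data for the same plan.
    """
    n = len(comfort_pred)
    mean_gap = sum(abs(comfort_ref - c) for c in comfort_pred) / n
    mean_energy = sum(energy_pred) / n
    lo, hi = comfort_range
    # Penalize only the part of each prediction that leaves the range.
    violation = sum(max(lo - c, 0.0, c - hi) for c in comfort_pred)
    return w_comfort * mean_gap + w_energy * mean_energy + violation + wear
```

A lower value is better: a plan that keeps comfort at the baseline is charged only for its energy and wear, while a plan that drifts outside the comfort range also pays the violation term.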
[0096] S305, obtain the self-constraints of the control action, and obtain the constraints of comfort on the control action.
[0097] The self-constraints of the control action include a temperature constraint range and a wind speed constraint range, that is, the ranges of values that the set temperature and the set wind speed may take. This embodiment specifies numerical bounds for both ranges.
[0098] The comfort value is likewise restricted to a range. Comfort depends on the two control actions of the comfort adjustment device, temperature and airflow speed, so the range of comfort values can be converted into a constraint of comfort on the control actions.
[0099] S306, the preset control action is iteratively optimized by the optimization model based on the value of the objective function, subject to the self-constraints and the constraints of comfort on the control action.
[0100] S307, determine the value of the objective function corresponding to the preset control action after each iteration, until the value of the objective function satisfies the preset iteration termination condition, and obtain the optimized control action.
[0101] In this embodiment, the iteration termination condition is that the objective function reaches its minimum value. At that point the optimized control actions cover three future time points, and only the control action for the first future time point is sent to the comfort adjustment device through the IoT interface.
[0102] The unexecuted commands for the later time points are temporarily stored as "pending settings" and serve as a reference for warm-starting the optimization in the next control cycle, further ensuring the continuity of control.
[0103] After one control cycle (5 minutes), the sensor collects new measured states and updates the historical window; then the reinforcement learning module outputs new weights. In this embodiment, the closed-loop rolling process of "prediction-optimization-execution" is repeated to suppress the impact of model mismatch and external random disturbances.
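The rolling "prediction-optimization-execution" step can be sketched as follows. The function name is hypothetical, and padding the warm start by repeating the last pending command is an assumption; the text only states that the unexecuted commands are kept as a reference for the next cycle.

```python
def roll_plan(plan):
    """Split an optimized plan into the command to send and a warm start.

    plan: list of (set_temperature, fan_speed) tuples, one per future moment.
    Returns the first command and a same-length initial guess for the
    next control cycle.
    """
    execute_now = plan[0]
    pending = plan[1:]
    if pending:
        # Repeat the last pending step so the guess spans the full horizon.
        warm_start = pending + [pending[-1]]
    else:
        warm_start = [execute_now]
    return execute_now, warm_start
```

For a three-step plan, only the first tuple is sent to the device, and the remaining two steps (plus one repeated step) seed the optimizer when the next 5-minute cycle begins.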
[0104] In this embodiment, the process of optimizing control actions through FA-PDPSO includes the following steps 1 to 4. Optimizing control actions means optimizing the temperature and airflow speed of the comfort adjustment device:
[0105] Step 1: Mixed state encoding and initialization.
[0106] That is, the position vector of each particle is defined as a sequence whose dimension is determined by the length of the prediction time domain, where the prediction time domain is the time domain consisting of the times after the current time.
[0107] ;
[0108] In the formula, three components represent the temperature of the device at the first, second, and third moments within the prediction time domain, and three components represent the airflow speed of the device at the first, second, and third moments within the prediction time domain. The temperature settings are integers, and the wind speed settings are integers. The particle velocity is kept as a continuous real-valued vector and is used to guide the search direction during the update process.
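The mixed encoding described in Step 1 (integer positions, real-valued velocities) might be initialized as in the sketch below. The legal temperature and fan speed sets, the ordering of the components, and the velocity range are hypothetical; the text fixes only the three-moment horizon and the integer/real split.

```python
import random

TEMPS = list(range(18, 31))   # hypothetical legal set temperatures
SPEEDS = list(range(1, 6))    # hypothetical fan speed levels
HORIZON = 3                   # three future moments, as in the embodiment

def init_particle(rng):
    """Return (position, velocity) for one particle.

    position: HORIZON integer temperatures followed by HORIZON integer
    fan speed levels (ordering is an assumption).
    velocity: continuous real values, one per position component.
    """
    position = ([rng.choice(TEMPS) for _ in range(HORIZON)]
                + [rng.choice(SPEEDS) for _ in range(HORIZON)])
    velocity = [rng.uniform(-1.0, 1.0) for _ in range(2 * HORIZON)]
    return position, velocity
```

Passing a seeded `random.Random` instance keeps the initialization reproducible across runs.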
[0109] Step 2: Distance-based adaptive Softmax discrete sampling strategy.
[0110] First, calculate the temporary continuous position $\tilde{x}_i$ of the particle using the following formula:
[0111] $\tilde{x}_i = x_i + v_i$;
[0112] where $x_i$ represents the position of particle $i$ and $v_i$ represents the rate of change of the position of particle $i$ (its velocity).
[0113] Calculate the Euclidean distance between the temporary continuous position and every legal discrete value (for example, each temperature value within the temperature set):
[0114] ;
[0115] The Softmax function is used to convert distance into selection probability, with the probability increasing the closer the distance.
[0116] ;
[0117] The formula uses the adaptive annealing parameter at the current iteration step. The initial value of the adaptive parameter determines the intensity of exploration in the early stages of the iteration, and its maximum set value determines the convergence accuracy in the later stages of the iteration. The schedule also depends on the total number of iterations.
[0118] At the beginning of the iteration, the adaptive annealing parameter equals its initial value, and it grows toward its maximum set value as the iteration proceeds. In the early stages of the iteration, the parameter is small and the probability distribution is flat, which encourages particles to explore different discrete states; in the later stages, the parameter is large, the probability distribution sharpens, and particles tend to choose the nearest integer, which enhances local exploitation.
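Step 2 can be illustrated with the following sketch, which turns distances into selection probabilities with a softmax over negative scaled distance. The linear annealing schedule, the parameter values, and the function names are assumptions; the text states only that the parameter starts at an initial value and ends at a maximum value.

```python
import math
import random

def annealing_beta(step, total_steps, beta_start=0.5, beta_max=10.0):
    """Assumed linear ramp from beta_start to beta_max over the run."""
    frac = step / max(total_steps - 1, 1)
    return beta_start + (beta_max - beta_start) * frac

def selection_probs(x_temp, legal_values, beta):
    """Softmax over -beta * distance: closer values get higher probability."""
    dists = [abs(x_temp - s) for s in legal_values]
    d_min = min(dists)  # shift by the minimum for numerical stability
    weights = [math.exp(-beta * (d - d_min)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]

def sample_discrete(x_temp, legal_values, beta, rng):
    """Draw one legal value for a single dimension of the particle."""
    probs = selection_probs(x_temp, legal_values, beta)
    return rng.choices(legal_values, weights=probs, k=1)[0]
```

With a small parameter the three candidates around a temporary position of 24.3 receive comparable probabilities; with a large parameter almost all of the probability mass falls on 24, the nearest integer.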
[0119] Step 3: Heuristic constraint repair mechanism.
[0120] The heuristic constraint repair mechanism uses the Conv-xLSTM prediction model to estimate the comfort value at the position of the current particle. If the estimated value falls outside the set comfort zone, the repair logic is triggered: if the estimate indicates overheating, lower the set temperature or increase the fan speed; if the estimate indicates that it is too cold, increase the set temperature or decrease the fan speed.
[0121] The repair process is executed iteratively until the constraints are met or the iteration limit is reached.
[0122] The heuristic constraint repair mechanism utilizes prior knowledge in the HVAC field to guide particles back to the feasible region quickly, avoiding invalid searches.
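The repair rule in Step 3 can be sketched for a single moment as below. The comfort predictor is passed in as a callable and stands in for the Conv-xLSTM model; `toy_comfort`, the comfort range, the bounds, and the choice to adjust temperature before fan speed are all hypothetical.

```python
def repair(temp, speed, predict_comfort, comfort_range=(-0.5, 0.5),
           temp_bounds=(18, 30), speed_bounds=(1, 5), max_iters=20):
    """Nudge (temp, speed) until predicted comfort is in range or we give up."""
    lo, hi = comfort_range
    for _ in range(max_iters):
        comfort = predict_comfort(temp, speed)
        if lo <= comfort <= hi:
            break
        if comfort > hi:                      # predicted too warm
            if temp > temp_bounds[0]:
                temp -= 1
            elif speed < speed_bounds[1]:
                speed += 1
            else:
                break                         # nothing left to adjust
        else:                                 # predicted too cold
            if temp < temp_bounds[1]:
                temp += 1
            elif speed > speed_bounds[0]:
                speed -= 1
            else:
                break
    return temp, speed

def toy_comfort(temp, speed):
    """Hypothetical linear stand-in for the learned comfort prediction."""
    return 0.3 * (temp - 24) - 0.1 * (speed - 2)
```

Starting from an overheated setting such as `repair(29, 1, toy_comfort)`, the loop lowers the set temperature one degree at a time until the toy prediction re-enters the range.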
[0123] Step 4: Neighborhood search mechanism based on firefly perturbation.
[0124] After each iteration, the fitness variance of the population is calculated. A fitness variance below a preset threshold indicates that the particle swarm is converging, and the mechanism is then activated to start a local search around the current globally optimal particle.
[0125] In each dimension of the globally optimal particle, a neighborhood solution set is generated within a set radius. The radius is specified separately for temperature and for wind speed; the wind speed radius can be set to 1 level.
[0126] The algorithm iterates through the neighborhood solution set and evaluates the fitness; if a solution better than the current globally optimal particle is found, the globally optimal particle is immediately replaced by it. This step significantly enhances the algorithm's fine-grained optimization capability in discrete space, ensuring that the output control commands are true local extrema.
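Step 4 can be sketched as a variance check followed by an exhaustive scan of integer neighbours. Enumerating the full Cartesian product of per-dimension offsets, the threshold value, and the function names are assumptions; the firefly-style perturbation itself is not detailed in the text and is not modelled here.

```python
from itertools import product
from statistics import pvariance

def swarm_converged(fitness_values, threshold=1e-3):
    """True when the population fitness variance is below the threshold."""
    return pvariance(fitness_values) < threshold

def neighborhood_search(best, fitness, radii, bounds):
    """Scan integer neighbours of `best` and keep the lowest-fitness one.

    radii: per-dimension search radius (hypothetical values).
    bounds: per-dimension (low, high) limits on legal settings.
    """
    best_pos, best_fit = list(best), fitness(best)
    offsets = [range(-r, r + 1) for r in radii]
    for delta in product(*offsets):
        cand = [b + d for b, d in zip(best, delta)]
        if any(not (lo <= c <= hi) for c, (lo, hi) in zip(cand, bounds)):
            continue  # skip candidates outside the legal ranges
        f = fitness(cand)
        if f < best_fit:
            best_pos, best_fit = cand, f
    return best_pos, best_fit
```

For a six-dimensional particle with radius 1 in every dimension the scan covers at most 729 candidates, so it stays cheap relative to a full swarm iteration as long as each fitness evaluation is inexpensive.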
[0127] This embodiment also provides a dynamic control device for comfort adjustment equipment. As shown in Figure 5, the device comprises the following components:
[0128] The data acquisition module 01 is used to acquire the current environmental data of the working environment of the comfort adjustment device at the current moment, and to acquire the current operating data of the comfort adjustment device at the current moment;
[0129] The preset module 02 is used to preset the control actions of the comfort adjustment device at subsequent times to obtain the preset control actions;
[0130] The optimization control module 03 is used to apply a predictive control model to the current environmental data, the current operating data, and the preset control actions to predict the optimized control actions of the comfort adjustment device at subsequent times. The predictive control model includes a deep prediction model and an optimization model. The deep prediction model is used to predict environmental data and device energy consumption based on the preset control actions, the current environmental data, and the current operating data. The optimization model is used to optimize and adjust the control actions based on the predicted environmental parameters and device energy consumption.
[0131] Based on the above embodiments, the present invention also provides a terminal device, whose principle block diagram can be as shown in Figure 6. The terminal device includes a processor, memory, network interface, and display screen connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The network interface is used to communicate with external terminals via a network connection. When the computer program is executed by the processor, it implements a dynamic control method for comfort adjustment devices. The display screen of the terminal device can be a liquid crystal display (LCD) or an e-ink display.
[0132] Those skilled in the art will understand that the schematic diagram shown in Figure 6 is only a partial structural diagram related to the present invention and does not constitute a limitation on the terminal device to which the present invention is applied. The specific terminal device may include more or fewer components than shown in the figure, or combine certain components, or have different component arrangements.
[0133] In one embodiment, a terminal device is provided, the terminal device including a memory, a processor, and a dynamic control program for a comfort adjustment device stored in the memory and executable on the processor. When the processor executes the dynamic control program for the comfort adjustment device, it implements the following operation instructions:
[0134] Obtain the current environmental data of the working environment of the comfort adjustment device at the current moment, and obtain the current operating data of the comfort adjustment device at the current moment;
[0135] Preset the control actions of the comfort adjustment device at subsequent times to obtain the preset control actions;
[0136] A predictive control model is applied to the current environmental data, the current operating data, and the preset control actions to predict the optimized control actions of the comfort adjustment device at subsequent times. The predictive control model includes a deep prediction model and an optimization model. The deep prediction model is used to predict environmental data and device energy consumption based on the preset control actions, the current environmental data, and the current operating data. The optimization model is used to optimize and adjust the control actions based on the predicted environmental parameters and device energy consumption.
[0137] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments of the above methods. Any references to memory, storage, databases, or other media used in the embodiments provided by this invention can include non-volatile and / or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
[0138] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A dynamic control method for a comfort adjustment device, characterized in that the method includes: Obtain the current environmental data of the working environment of the comfort adjustment device at the current moment, and obtain the current operating data of the comfort adjustment device at the current moment; Preset the control actions of the comfort adjustment device at subsequent times to obtain the preset control actions; A predictive control model is applied to the current environmental data, the current operating data, and the preset control actions to predict the optimized control actions of the comfort adjustment device at subsequent times. These subsequent times include several time points. The predictive control model includes a deep prediction model and an optimization model. The deep prediction model is a Convolutional Extended Long Short-Term Memory (Conv-xLSTM) network, which includes a cascaded one-dimensional convolution, an extended long short-term memory network, a fully connected layer, and a rectified linear unit (ReLU) activation layer. The deep prediction model is used to predict environmental data and device energy consumption based on the preset control actions, the current environmental data, and the current operating data. The optimization model is used to optimize and adjust the control actions based on the predicted environmental parameters and device energy consumption. Applying the predictive control model to the current environmental data, the current operating data, and the preset control actions to predict the optimized control actions of the comfort adjustment device at subsequent times includes: The deep prediction model is applied to the current environmental data, the current operating data, and the preset control actions to predict the environmental data and equipment energy consumption at subsequent times, thereby obtaining environmental prediction data and equipment prediction energy consumption.
The objective function is obtained by using the optimization model based on the environmental prediction data and the equipment's predicted energy consumption, where the symbols of the objective function denote: the value of the objective function; the average difference in comfort over the moments; the predicted device energy consumption at each moment; the total number of moments; the constraint violation penalty data; two weights applied to terms of the objective function; and the equipment wear data; The optimization model iteratively optimizes the preset control action based on the value of the objective function, determines the value of the objective function corresponding to the preset control action after each iteration, until the value of the objective function satisfies the preset iteration termination condition, and obtains the optimized control action. The optimization model is FA-PDPSO, in which FA represents the firefly algorithm perturbation mechanism and PDPSO (Feasible-Repair Probabilistic Discrete PSO) represents the feasible repair probability discrete particle swarm optimization algorithm. Optimizing the control actions based on FA-PDPSO includes: hybrid state encoding and initialization, a distance-based adaptive softmax discrete sampling strategy, a heuristic constraint repair mechanism, and a neighborhood search mechanism based on firefly perturbation.
2. The dynamic control method for comfort adjustment equipment as described in claim 1, characterized in that, Applying a predictive control model to the current environmental data, the current operating data, and the preset control actions, the system predicts the optimal control actions of the comfort adjustment device at subsequent times, including: Determine the current ambient temperature, current ambient humidity, current ambient wind speed, and current number of users from the current environmental data; Determine the current wind speed, current temperature, and current energy consumption of the device from the current operating data; Determine the preset temperature and preset airflow of the equipment in the preset control actions; A predictive control model is applied to the current ambient temperature, current ambient humidity, current ambient wind speed, current number of users, current device wind speed, current device temperature, current device energy consumption, preset device temperature, and preset device wind speed to predict the device wind speed and device temperature of the comfort adjustment device at the subsequent time, and the predicted device wind speed and device temperature are used as optimized control actions.
3. The dynamic control method for comfort adjustment equipment as described in claim 1, characterized in that, The objective function is obtained through the optimization model based on the environmental prediction data and the equipment's predicted energy consumption, including: Based on the environmental prediction data, the optimization model obtains the user's predicted comfort level. The optimization model calculates the equipment loss data caused by the preset control action to the comfort adjustment device, and the equipment loss data includes the equipment loss caused by the device start-up and / or fluctuations in the device's control action. The objective function is obtained by using the optimization model based on the user's predicted comfort level, the device's predicted energy consumption, and the equipment loss data.
4. The dynamic control method for comfort adjustment equipment as described in claim 1, characterized in that, The preset control action is iteratively optimized based on the value of the objective function using the optimization model, including: Obtain the self-constraints of the control action, and obtain the constraints of comfort on the control action; Based on the value of the objective function, the optimization model iteratively optimizes the preset control action subject to the self-constraints and the constraints of comfort on the control action.
5. The dynamic control method for comfort adjustment equipment as described in claim 1, characterized in that, The extended long short-term memory network consists of a scalar long short-term memory network module and a multidimensional long short-term memory network module.
6. A dynamic control device for comfort adjustment equipment, characterized in that, The device comprises the following components: The data acquisition module is used to acquire the current environmental data of the working environment of the comfort adjustment device at the current moment, and to acquire the current operating data of the comfort adjustment device at the current moment; The preset module is used to preset the control actions of the comfort adjustment device at subsequent times, thereby obtaining the preset control actions; An optimization control module is used to apply a predictive control model to the current environmental data, the current operating data, and the preset control actions to predict the optimized control actions of the comfort adjustment device at subsequent times. These subsequent times include several time points. The predictive control model includes a deep prediction model and an optimization model. The deep prediction model is a Convolutional Extended Long Short-Term Memory (Conv-xLSTM) network, which includes a cascaded one-dimensional convolution, an extended long short-term memory network, a fully connected layer, and a rectified linear unit (ReLU) activation layer. The deep prediction model is used to predict environmental data and device energy consumption based on the preset control actions, the current environmental data, and the current operating data. The optimization model is used to optimize and adjust the control actions based on the predicted environmental parameters and device energy consumption.
Applying the predictive control model to the current environmental data, the current operating data, and the preset control actions to predict the optimized control actions of the comfort adjustment device at subsequent times includes: The deep prediction model is applied to the current environmental data, the current operating data, and the preset control actions to predict the environmental data and equipment energy consumption at subsequent times, thereby obtaining environmental prediction data and equipment prediction energy consumption. The objective function is obtained by using the optimization model based on the environmental prediction data and the equipment's predicted energy consumption, where the symbols of the objective function denote: the value of the objective function; the average difference in comfort over the moments; the predicted device energy consumption at each moment; the total number of moments; the constraint violation penalty data; two weights applied to terms of the objective function; and the equipment wear data; The optimization model iteratively optimizes the preset control action based on the value of the objective function, determines the value of the objective function corresponding to the preset control action after each iteration, until the value of the objective function satisfies the preset iteration termination condition, and obtains the optimized control action. The optimization model is FA-PDPSO, in which FA represents the firefly algorithm perturbation mechanism and PDPSO (Feasible-Repair Probabilistic Discrete PSO) represents the feasible repair probability discrete particle swarm optimization algorithm. Optimizing the control actions based on FA-PDPSO includes: hybrid state encoding and initialization, a distance-based adaptive softmax discrete sampling strategy, a heuristic constraint repair mechanism, and a neighborhood search mechanism based on firefly perturbation.
7. A terminal device, characterized in that, The terminal device includes a memory, a processor, and a dynamic control program for comfort adjustment devices stored in the memory and executable on the processor. When the processor executes the dynamic control program for comfort adjustment devices, it implements the steps of the dynamic control method for comfort adjustment devices as described in any one of claims 1-5.
8. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a dynamic control program for a comfort adjustment device, which, when executed by a processor, implements the steps of the dynamic control method for a comfort adjustment device as described in any one of claims 1-5.