A method for adjusting illumination based on reinforcement learning

A reinforcement learning-based method automatically adjusts the control parameters of a visible light source array so that multiple illuminance test points meet their illuminance requirements, automating the adjustment and broadening the application scenarios of visible light source arrays.

CN116867135B · Active · Publication Date: 2026-06-16 · XIAN INST OF OPTICS & PRECISION MECHANICS CHINESE ACAD OF SCI

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
XIAN INST OF OPTICS & PRECISION MECHANICS CHINESE ACAD OF SCI
Filing Date
2023-05-15
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing technologies cannot effectively adjust each adjustable light source in a visible light source array to ensure that multiple illuminance test points simultaneously meet the illuminance requirements. This is especially true in large-scale visible light source arrays, where manual adjustment is difficult and impractical.

Method used

By employing a reinforcement learning-based approach, an evaluation neural network and an action neural network are constructed, and a deep deterministic policy gradient algorithm is used to automatically adjust the control parameters in the visible light source array, thereby achieving automated adjustment of each illuminance test point.

Benefits of technology

It realizes the automated adjustment of visible light source arrays, meets the illuminance requirements of each illuminance test point, solves the illuminance control problem of large-scale visible light source arrays, and enriches its application scenarios.

✦ Generated by Eureka AI based on patent content.

Figure CN116867135B_ABST

Abstract

The application discloses an illuminance adjustment method based on reinforcement learning, which solves the problem that the prior art cannot simultaneously adjust each adjustable light source in a visible light source array so that multiple illuminance test points meet their illuminance requirements at the same time. The method collects the feedback values of the illuminance meters at each illuminance test point, uses reinforcement learning to automatically adjust the control parameter value of each adjustable visible light source in the array and, through continued iteration of the algorithm, achieves automatic adjustment that meets the illuminance requirement of each test point. The control method effectively meets the illuminance control requirements of the visible light source array at each irradiation point, and improves and broadens the application scenarios of visible light source arrays.

Description

Technical Field

[0001] This invention relates to illuminance control methods, specifically to an illuminance adjustment method based on reinforcement learning.

Background Technology

[0002] Visible light sources are a common type of lighting equipment, typically including LED lights, xenon lamps, and incandescent lamps. When a visible light source is lit, its illuminance value can be measured using a lux meter to characterize the light intensity at a given distance. This illuminance value is usually measured by taking a reading from the lux meter at a certain distance from the visible light source, and the reading is typically expressed in lux (lx). Adjusting the illuminance value of a visible light source is usually achieved by changing the control parameters (voltage, current, etc.) in the visible light source controller, thus monotonically and continuously changing the illuminance value. Therefore, there is a monotonic mapping relationship between the illuminance value at a certain distance from the visible light source and the control parameters controlling the illuminance, denoted as: E = F(A), where E represents the illuminance value, A represents the control parameter controlling the illuminance value, and E increases as A increases. Of course, the mapping function F will be different depending on the distance between the illuminance meter and the visible light source, but the monotonically increasing relationship of F mapping A to E will not change, where A and E are both continuous values.
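Because F is continuous and monotonically increasing, a single source can in principle be tuned to a target illuminance by bisection on A. The following is a minimal sketch of that idea; the response function `example_response`, the parameter bounds, and the tolerance are hypothetical values chosen for illustration and do not come from the patent.

```python
from typing import Callable


def example_response(a: float) -> float:
    """Hypothetical monotonically increasing mapping E = F(A), in lux."""
    return 120.0 * a ** 1.5


def tune_single_source(
    f: Callable[[float], float],
    target_lux: float,
    a_min: float,
    a_max: float,
    tol_lux: float = 0.01,
    max_iter: int = 100,
) -> float:
    """Find A in [a_min, a_max] with F(A) close to target_lux by bisection.

    Relies only on F being continuous and monotonically increasing.
    """
    if not f(a_min) <= target_lux <= f(a_max):
        raise ValueError("target illuminance is outside the reachable range")
    lo, hi = a_min, a_max
    mid = 0.5 * (lo + hi)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        e = f(mid)
        if abs(e - target_lux) <= tol_lux:
            break
        if e < target_lux:
            lo = mid  # too dim: raise the control parameter
        else:
            hi = mid  # too bright: lower the control parameter
    return mid


a_star = tune_single_source(example_response, target_lux=500.0, a_min=0.0, a_max=10.0)
```

This works for one source and one meter only. With several sources and several meters, every meter reading depends on every source, so a one-dimensional search of this kind no longer applies.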

[0003] When using visible light sources for illumination, one type of application involves placing an illuminance meter at a fixed point within the irradiation range of the visible light source, depending on the specific usage scenario. By changing the control parameters of the visible light source, the illuminance value displayed by the illuminance meter is adjusted to a range that roughly meets the application requirements. This type of operation can be performed manually because it only involves adjusting the illuminance value of a single visible light source, making it relatively simple. However, when a single visible light source is extended to a visible light source array, and there are specific control requirements for the illuminance values at multiple points within the array's irradiation range, it becomes necessary to adjust multiple adjustable units within the array simultaneously. This workload is enormous and often cannot be performed manually.

[0004] In practical applications, the use and control requirements of visible light source arrays are very common, for example xenon lamp arrays for testing battery arrays and solar panels, and LED light source arrays for illumination. These arrays consist of multiple individual visible light sources, each or several of which can have their illuminance adjusted by regulating control parameters. As mentioned earlier, these applications require the illuminance values at several points within the irradiation range of the array to be controlled to meet specific irradiation requirements. Illuminance meters are placed at the test points of interest within the irradiation range, and the control parameters of each adjustable light source are adjusted so that the values displayed by these meters simultaneously fall within the required ranges. Since the illuminance value at each point within the irradiation range is a complex superposition of all light sources, it is very difficult and impractical to manually adjust each adjustable light source so that every test point meets its illuminance requirement at the same time. Furthermore, because the individual light sources in the array are inconsistent and the illuminance requirements are diverse, the requirements cannot be met by a simple linear superposition of illuminance computed from the light source control parameters. The control requirement is a typical multi-objective, multi-variable non-convex optimization problem, which is difficult to solve using conventional mathematical methods, and manual adjustment of each adjustable light source is even less able to achieve effective control.

Summary of the Invention

[0005] The purpose of this invention is to provide a reinforcement learning-based illuminance adjustment method to solve the technical problem that existing technologies cannot simultaneously adjust each tunable light source in a visible light source array so that multiple illuminance test points can simultaneously meet the illuminance requirements.

[0006] To achieve the above objectives, this invention provides a reinforcement learning-based illuminance adjustment method for visible light source arrays, characterized by the following steps:

[0007] Step 1: The visible light source array includes $n$ visible light sources, of which $m$ have illuminance that is adjustable through control parameters; these are represented by the set $\Phi = \{L_1, L_2, \ldots, L_m\}$, with $n \ge 2$ and $n \ge m$. For the $k$-th visible light source ($1 \le k \le m$, $L_k \in \Phi$), the illuminance control parameter value is [symbol missing] and the adjustable range of the parameter is [range missing]. Within the irradiation range of the visible light source array there are $p$ illuminance meters for monitoring and real-time feedback of illuminance values, represented by the set $\Gamma = \{Z_1, Z_2, \ldots, Z_p\}$. The final adjustment value [symbol missing] of the $r$-th illuminance meter ($1 \le r \le p$, $Z_r \in \Gamma$) must satisfy [range condition missing];

[0008] Step 2: Construct two evaluation neural networks with identical structures, $Q(a_t, s_t \mid \omega_t^{Q})$ and $Q'(a_t, s_t \mid \omega_t^{Q'})$, and two action neural networks with identical structures, $\mu(s_t \mid \omega_t^{\mu})$ and $\mu'(s_t \mid \omega_t^{\mu'})$, where $\omega_t^{Q}$, $\omega_t^{Q'}$, $\omega_t^{\mu}$, and $\omega_t^{\mu'}$ are the parameters of the neural networks at stage $t$. Allocate an experience storage pool of dimension $M \times (m+2p+1)$. The pool can store a total of $M$ experiences, and each stored tuple $(a_t, s_t, s_{t+1}, R_t)$ constitutes a single experience of size $(m+2p+1)$. $N$ single experiences can be extracted from the pool for calculation, where $N \le M$. The pool is continuously updated, and the total number of experiences is always less than or equal to $M$. Here $a_t$ is the action vector for stage $t$, $s_t$ is the state vector for stage $t$, and $R_t$ is the reward function for stage $t$ [definitions missing]; the terms in these definitions denote stage-$t$ quantities, one of which is the stage-$t$ value of $V_{sum}$. $V_{sum}$ is the sum of all [symbol missing] values for $1 \le r \le p$, each of which is calculated from the value of the $r$-th illuminance meter at stage $t$. Stage $t$ represents any adjustment process in which all adjustable visible light sources in the visible light source array change the illuminance within the irradiation range according to the illuminance control parameter values calculated by the neural network.

[0009] Step 3: Calculate the illuminance control parameter values based on the neural network:

[0010] 3.1) Set the stage counter variable $t = 0$, and assign the stage-0 state vector $s_t$ to a vector of all 1s;

[0011] 3.2) Obtain the stage-$t$ action vector $a_t = \mu(s_t \mid \omega_t^{\mu}) + \xi_t$, where $\xi_t$ is a random sample of $m$-dimensional standard Gaussian noise;

[0012] 3.3) Transform the $k$-th element of the stage-$t$ action vector $a_t$ using the following formula:

[0013]

[0014] 3.4) Use the value of the $k$-th element of $a_t$ as the illuminance control parameter value of visible light source $L_k$, giving $m$ parameter values in total. Assign these parameter values to all adjustable visible light sources in real time to adjust the illuminance of the visible light source array.

[0015] 3.5) After the illuminance of the visible light source array is adjusted, collect the values of all illuminance meters, obtain [quantities missing] according to the following formula, and thereby obtain the state vector $s_{t+1}$ of stage $t+1$.

[0016]

[0017] 3.6) If the values of all illuminance meters satisfy [condition missing], proceed to step 3.8; otherwise, proceed to step 3.7.

[0018] 3.7) Store $(a_t, s_t, s_{t+1}, R_t)$ in the experience storage pool; extract $h$ experiences from the experience storage pool to update the networks; update the evaluation neural network $Q$ using the method for updating the critic network in the deep deterministic policy gradient algorithm; update the action neural network $\mu$ using the method for updating the actor network in the deep deterministic policy gradient algorithm; update the evaluation neural network $Q'$ and the action neural network $\mu'$ respectively using the method for updating the target network in the deep deterministic policy gradient algorithm; increment the stage counter variable $t$ by 1, return to step 3.2, and assign the $s_{t+1}$ obtained in step 3.5 to the $s_t$ used in step 3.2;

[0019] 3.8) End illuminance adjustment.
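The control flow of steps 3.1 to 3.8 can be sketched as follows. This is an illustrative skeleton under several assumptions, not the patented implementation: the linear `read_meters` simulator stands in for real hardware, the transform in `to_control_params` (clip, then scale into the adjustable range) replaces the formula of step 3.3 that is missing from this text, the state encoding and reward are guesses, the target ranges and `max_steps` are arbitrary, and `ddpg_update` is an empty placeholder for the network updates.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 4, 3                       # adjustable sources, illuminance meters
a_min, a_max = np.zeros(m), np.full(m, 10.0)       # assumed parameter ranges
e_lo, e_hi = np.full(p, 400.0), np.full(p, 600.0)  # assumed target ranges (lx)
gain = rng.uniform(10.0, 40.0, size=(p, m))        # toy stand-in for the real optics


def read_meters(params: np.ndarray) -> np.ndarray:
    """Toy simulator: each meter sees a weighted sum of source outputs."""
    return gain @ params


def actor(state: np.ndarray, w1, b1, w2, b2) -> np.ndarray:
    """FC -> ReLU -> FC -> tanh, as described for the action network."""
    hidden = np.maximum(0.0, w1 @ state + b1)
    return np.tanh(w2 @ hidden + b2)


def to_control_params(raw: np.ndarray) -> np.ndarray:
    """Assumed transform: clip to [-1, 1], then scale into [a_min, a_max]."""
    clipped = np.clip(raw, -1.0, 1.0)
    return a_min + 0.5 * (clipped + 1.0) * (a_max - a_min)


def violation(readings: np.ndarray) -> np.ndarray:
    """Assumed per-meter distance outside the target range (0 when inside)."""
    return np.maximum(e_lo - readings, 0.0) + np.maximum(readings - e_hi, 0.0)


def ddpg_update(batch) -> None:
    """Placeholder for the critic, actor and target-network updates."""


hidden_units = 16
weights = (rng.normal(0, 0.1, (hidden_units, p)), np.zeros(hidden_units),
           rng.normal(0, 0.1, (m, hidden_units)), np.zeros(m))

pool = []                                   # experience storage pool (unbounded here)
state = np.ones(p)                          # 3.1: t = 0, state of all 1s
max_steps, done = 50, False
for t in range(max_steps):
    raw = actor(state, *weights) + rng.standard_normal(m)   # 3.2: policy + noise
    action = to_control_params(raw)                          # 3.3: transform
    readings = read_meters(action)                           # 3.4 and 3.5
    v = violation(readings)
    next_state = readings / e_hi            # assumed state encoding
    reward = -float(v.sum())                # assumed reward: negative total violation
    if np.all(v == 0.0):                    # 3.6: every meter is within range
        done = True
        break                               # 3.8: end adjustment
    pool.append((action, state, next_state, reward))         # 3.7: store
    ddpg_update(pool)                                        # 3.7: update networks
    state = next_state                                       # s_{t+1} becomes s_t
```

Because the update step is a placeholder, the policy here never improves; the sketch only shows how the stage counter, the state hand-over, and the termination test fit together.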

[0020] Furthermore, the action neural network described in step 2 comprises a fully connected layer I, an activation function I, a fully connected layer II, and an activation function II connected in sequence, with $s_t$ as its input and $a_t$ as its output.

[0021] Furthermore, activation function I is a rectified linear unit (ReLU), and activation function II is a hyperbolic tangent function.

[0022] Furthermore, the evaluation neural network described in step 2 comprises a fully connected layer III, an activation function III, a fully connected layer IV, and an activation function IV connected in sequence, with $s_t$ and $a_t$ as its input and the evaluation value as its output.

[0023] Furthermore, activation functions III and IV are both rectified linear units (ReLU).
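The two network structures described above can be sketched as plain NumPy forward passes. The layer widths, the weight initialization, the state dimension `p` (inferred from the experience size m+2p+1), and the concatenation of state and action at the evaluation network's input are assumptions made for illustration; the text specifies only the sequence of layers and activations.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(0.0, x)


def init_layer(rng: np.random.Generator, n_in: int, n_out: int):
    """Small random weights and zero biases for one fully connected layer."""
    return rng.normal(0.0, 0.1, size=(n_out, n_in)), np.zeros(n_out)


def action_network(s, layer1, layer2):
    """FC I -> ReLU (activation I) -> FC II -> tanh (activation II)."""
    w1, b1 = layer1
    w2, b2 = layer2
    return np.tanh(w2 @ relu(w1 @ s + b1) + b2)


def evaluation_network(s, a, layer3, layer4):
    """FC III -> ReLU (activation III) -> FC IV -> ReLU (activation IV).

    The state and action are concatenated into one input vector, which is
    an assumption; the text only says both are inputs.
    """
    w3, b3 = layer3
    w4, b4 = layer4
    x = np.concatenate([s, a])
    return relu(w4 @ relu(w3 @ x + b3) + b4)[0]


rng = np.random.default_rng(0)
m, p, hidden = 4, 3, 16          # hypothetical sizes
actor_layers = init_layer(rng, p, hidden), init_layer(rng, hidden, m)
critic_layers = init_layer(rng, p + m, hidden), init_layer(rng, hidden, 1)

s_t = np.ones(p)                 # stage-0 state: a vector of all 1s
a_t = action_network(s_t, *actor_layers)
q_t = evaluation_network(s_t, a_t, *critic_layers)
```

Two consequences of the stated activations are worth noting: the tanh output bounds every element of `a_t` to [-1, 1] before any noise is added, and the final ReLU means the evaluation value is never negative.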

[0024] The beneficial effects of this invention are:

[0025] 1. This invention simultaneously collects feedback values from illuminance meters at the illuminance test points and uses reinforcement learning to automatically adjust the control parameters of each adjustable visible light source in a visible light source array. Through continued iteration of the algorithm, it achieves automated adjustment that meets the illuminance requirement of each illuminance test point. The control method provided by this invention effectively meets the illuminance control requirements of visible light source arrays at each irradiation point, thereby improving and broadening the application scenarios of visible light source arrays.

[0026] 2. Using reinforcement learning, this invention solves the problem that the illuminance of visible light source arrays (especially large-scale arrays) cannot be adjusted manually to meet illuminance control requirements.

[0027] 3. This invention constructs a modeling scheme for a visible light source array with illuminance control requirements, constructs the reinforcement learning components in a form that can be implemented directly, and realizes a reinforcement learning algorithm for illuminance adjustment of a visible light source array.

Description of the Drawings

[0028] Figure 1 is a schematic diagram of the structure of the action neural network in an embodiment of the present invention;

[0029] Figure 2 is a schematic diagram of the structure of the evaluation neural network in an embodiment of the present invention.

Detailed Implementation

[0030] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0031] The characteristics and control requirements of the visible light source array to which this invention is applicable can be described as follows:

[0032] (a) The visible light source array consists of $n$ ($n \ge 2$) visible light sources, of which $m$ are visible light sources whose illuminance can be adjusted by control parameters, with $n \ge m$. The adjustable visible light sources can be numbered $L_1, L_2, \ldots, L_m$, so they can be represented by the set $\Phi = \{L_1, L_2, \ldots, L_m\}$. For the $k$-th visible light source ($1 \le k \le m$, $L_k \in \Phi$), the adjustable range of the parameter controlling its illuminance is expressed as [range missing], and its illuminance control parameter value is [symbol missing].

[0033] (b) Place $p$ ($p \ge 1$) illuminance meters within the irradiation range of the visible light source array to monitor and provide real-time feedback of illuminance values. The illuminance meters can be numbered $Z_1, Z_2, \ldots, Z_p$, so they can be represented by the set $\Gamma = \{Z_1, Z_2, \ldots, Z_p\}$. The reading of the $r$-th illuminance meter ($1 \le r \le p$, $Z_r \in \Gamma$) is expressed as [symbol missing].

[0034] (c) The illuminance control requirements for the visible light source array are characterized by the values of the illuminance meters described in (b); that is, the values of all illuminance meters in $\Gamma = \{Z_1, Z_2, \ldots, Z_p\}$ need to be adjusted into certain ranges. The final adjustment value [symbol missing] of the $r$-th illuminance meter ($1 \le r \le p$, $Z_r \in \Gamma$) should satisfy [range condition missing], where the specific values of the range bounds are determined by the application scenario for which the visible light source array is intended.

[0035] By analyzing the characteristics and control requirements of the visible light source arrays listed in (a) to (c) above, this invention constructs these characteristics and control requirements into the following non-convex optimization problem:

[0036]

[0037] where $V_{sum}$ is the sum of all [symbol missing] values, each of which is calculated from the value of the corresponding illuminance meter as follows:

[0038]

[0039] This invention effectively solves the problems listed in formulas (1) and (2) using a reinforcement learning-based algorithm. During the reinforcement learning training process, it can automatically adjust the values of each illuminance control parameter in the visible light source array, ultimately meeting the illuminance control requirements for the array.

[0040] The reinforcement learning algorithm described in this invention is constructed based on the Deep Deterministic Policy Gradient Algorithm (DDPG). Before training the algorithm, the following components required by the algorithm are pre-designed:

[0041] (1) Construct the action vector for stage $t$ as $a_t$ [definition missing]; construct the state vector for stage $t$ as [definition missing]; construct the reward function for stage $t$ as [definition missing], in which the terms denote stage-$t$ quantities, one of which is the stage-$t$ value of $V_{sum}$. Stage $t$ represents the following process: all illuminance values sampled by the illuminance meters are returned to the reinforcement learning algorithm. The reinforcement learning algorithm calculates the illuminance control parameter values for all adjustable light sources in the array based on the sampled illuminance values. The calculated parameter values are then sent to the adjustable light sources in the array and applied. After these parameter values are applied, the illuminance within the irradiation range of the light source array changes.

[0042] (2) The experience storage pool [symbol missing] can store a total of $M$ experiences. Each stored tuple $(a_t, s_t, s_{t+1}, R_t)$ is a single experience of size $(m+2p+1)$. $N$ ($N \le M$) single experiences can be extracted from the pool for computation. Storage starts at the first slot. When the $(M+1)$-th experience needs to be stored, storage wraps around to the first slot again and overwrites the original experience, and so on, so that the pool is continually updated and the total number of experiences never exceeds $M$.
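The wrap-around storage described in (2) is a ring buffer. The sketch below is one possible realization: the class name `ExperiencePool`, its method names, and uniform random sampling without replacement are assumptions, and the row layout (m entries for the action, p each for the two states, one for the reward) is inferred from the stated size m+2p+1.

```python
import numpy as np


class ExperiencePool:
    """Fixed-size pool of shape (M, m + 2p + 1) with wrap-around overwrite."""

    def __init__(self, capacity: int, m: int, p: int) -> None:
        self.m, self.p = m, p
        self.data = np.zeros((capacity, m + 2 * p + 1))
        self.capacity = capacity
        self.next_slot = 0      # where the next experience will be written
        self.count = 0          # number of valid experiences, never above M

    def store(self, a, s, s_next, reward: float) -> None:
        row = np.concatenate([a, s, s_next, [reward]])
        self.data[self.next_slot] = row
        self.next_slot = (self.next_slot + 1) % self.capacity  # wrap to slot 0
        self.count = min(self.count + 1, self.capacity)

    def sample(self, n: int, rng: np.random.Generator):
        """Draw n stored experiences uniformly at random (assumed policy)."""
        if n > self.count:
            raise ValueError("not enough stored experiences")
        idx = rng.choice(self.count, size=n, replace=False)
        batch = self.data[idx]
        m, p = self.m, self.p
        a, s, s_next, r = np.split(batch, [m, m + p, m + 2 * p], axis=1)
        return a, s, s_next, r[:, 0]


pool = ExperiencePool(capacity=3, m=2, p=1)
for i in range(4):  # the 4th store overwrites the 1st
    pool.store(a=[i, i], s=[i], s_next=[i + 1], reward=-float(i))
```

After the four calls above, the pool still holds three experiences and the first slot contains the fourth one, which matches the overwrite behaviour the paragraph describes.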

[0043] (3) Construct two evaluation neural networks, $Q(a_t, s_t \mid \omega_t^{Q})$ and $Q'(a_t, s_t \mid \omega_t^{Q'})$, and two action neural networks, $\mu(s_t \mid \omega_t^{\mu})$ and $\mu'(s_t \mid \omega_t^{\mu'})$, where $\omega_t^{Q}$, $\omega_t^{Q'}$, $\omega_t^{\mu}$, and $\omega_t^{\mu'}$ are the parameters of the neural networks at stage $t$. The two action neural networks have identical structures. As shown in Figure 1, the action neural network consists of a fully connected layer I, an activation function I, a fully connected layer II, and an activation function II connected in sequence, with $s_t$ as its input and $a_t$ as its output. The two evaluation neural networks also have identical structures. As shown in Figure 2, the evaluation neural network consists of a fully connected layer III, an activation function III, a fully connected layer IV, and an activation function IV connected in sequence, with $s_t$ and $a_t$ as its input and the evaluation value as its output. In Figure 1, activation function I is a rectified linear unit (ReLU) and activation function II is a hyperbolic tangent function. In Figure 2, activation functions III and IV are both rectified linear units.

[0044] By constructing the components described in the algorithm above, the specific implementation steps can be described as follows:

[0045] Step 1: Set the stage counter variable $t = 0$, and assign the stage-0 state vector $s_t$ to a vector of all 1s.

[0046] Step 2: Obtain the stage-$t$ action vector $a_t = \mu(s_t \mid \omega_t^{\mu}) + \xi_t$, where $\xi_t$ is a random sample of $m$-dimensional standard Gaussian noise.

[0047] Step 3: Transform the $k$-th ($1 \le k \le m$) element of $a_t$ as follows:

[0048]

[0049] Step 4: Use the value of the $k$-th element of $a_t$ as the illuminance control parameter value of visible light source $L_k$, giving $m$ parameter values in total. Assign these values in real time to all adjustable visible light sources to adjust the illuminance of the visible light source array.

[0050] Step 5: After step 4 is completed, collect the values of all illuminance meters, obtain [quantities missing] according to formula (2), and thereby obtain the state vector $s_{t+1}$ of stage $t+1$.

[0051] Step 6: If the values obtained in step 5 satisfy [condition missing], the illuminance of the entire visible light source array meets the requirements; proceed to step 8. If the requirements are not met, proceed to step 7.

[0052] Step 7: Store $(a_t, s_t, s_{t+1}, R_t)$ in the experience storage pool, then extract $h$ experiences from the experience storage pool to update the networks. Update the evaluation neural network $Q$ using the method for updating the critic network in the deep deterministic policy gradient (DDPG) algorithm, and update the action neural network $\mu$ using the method for updating the actor network in DDPG. Update the neural networks $Q'$ and $\mu'$ respectively using the method for updating the target network in DDPG. Increment the stage counter variable $t$ by 1, return to step 2, and assign the $s_{t+1}$ obtained in step 5 to $s_t$; that is, this value is the $s_t$ used in step 2.

[0053] Step 8: End illuminance adjustment.
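Step 7 refers to the DDPG update rules without stating them. For reference, the standard DDPG updates over a minibatch can be written as follows in the notation used here, with the stage subscript on the parameters dropped for readability. The discount factor $\gamma$, the soft-update rate $\tau$, and the batch index $i$ are not given in the text and are introduced only for this illustration; the batch size is written $N$, although step 7 calls it $h$.

```latex
% Standard DDPG updates over a minibatch of N experiences (a_i, s_i, s_{i+1}, R_i).
% \gamma (discount factor) and \tau (soft-update rate) are hyperparameters
% that the text above does not specify.
\begin{align}
  y_i &= R_i + \gamma\, Q'\!\left(\mu'(s_{i+1} \mid \omega^{\mu'}),\, s_{i+1} \mid \omega^{Q'}\right) \\
  L(\omega^{Q}) &= \frac{1}{N} \sum_{i=1}^{N} \left( y_i - Q(a_i, s_i \mid \omega^{Q}) \right)^2 \\
  \nabla_{\omega^{\mu}} J &\approx \frac{1}{N} \sum_{i=1}^{N}
      \nabla_{a} Q(a, s_i \mid \omega^{Q}) \big|_{a = \mu(s_i \mid \omega^{\mu})}\,
      \nabla_{\omega^{\mu}} \mu(s_i \mid \omega^{\mu}) \\
  \omega^{Q'} &\leftarrow \tau\, \omega^{Q} + (1 - \tau)\, \omega^{Q'}, \qquad
  \omega^{\mu'} \leftarrow \tau\, \omega^{\mu} + (1 - \tau)\, \omega^{\mu'}
\end{align}
```

The first two lines are the critic update (minimize the squared error against the target $y_i$), the third is the actor update (ascend the deterministic policy gradient), and the last line is the target-network update applied to $Q'$ and $\mu'$.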

[0054] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any changes or substitutions within the technical scope disclosed in the present invention should be covered within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A reinforcement learning-based illuminance adjustment method for visible light source arrays, characterized in that it includes the following steps:

Step 1: The visible light source array includes $n$ visible light sources, of which $m$ have illuminance that is adjustable through control parameters; these are represented by the set $\Phi = \{L_1, L_2, \ldots, L_m\}$, with $n \ge 2$ and $n \ge m$. For the $k$-th visible light source ($1 \le k \le m$, $L_k \in \Phi$), the illuminance control parameter value is [symbol missing] and the adjustable range of the parameter is [range missing]. Within the irradiation range of the visible light source array there are $p$ illuminance meters used to monitor and provide real-time feedback of illuminance values, represented by the set $\Gamma = \{Z_1, Z_2, \ldots, Z_p\}$. The final adjustment value [symbol missing] of the $r$-th illuminance meter ($1 \le r \le p$, $Z_r \in \Gamma$) must satisfy [range condition missing];

Step 2: Construct two evaluation neural networks with identical structures, $Q(a_t, s_t \mid \omega_t^{Q})$ and $Q'(a_t, s_t \mid \omega_t^{Q'})$, and two action neural networks with identical structures, $\mu(s_t \mid \omega_t^{\mu})$ and $\mu'(s_t \mid \omega_t^{\mu'})$, where $\omega_t^{Q}$, $\omega_t^{Q'}$, $\omega_t^{\mu}$, and $\omega_t^{\mu'}$ are the parameters of the neural networks at stage $t$. Allocate an experience storage pool of dimension $M \times (m+2p+1)$, which can store a total of $M$ experiences; each stored tuple $(a_t, s_t, s_{t+1}, R_t)$ constitutes a single experience of size $(m+2p+1)$. $N$ single experiences can be extracted from the pool for calculation, $N \le M$, and the pool is continuously updated while the total number of experiences remains less than or equal to $M$. $a_t$ is the stage-$t$ action vector, $s_t$ is the stage-$t$ state vector, and $R_t$ is the stage-$t$ reward function [definitions missing]; $V_{sum}$ is the sum of all [symbol missing] values for $1 \le r \le p$, each calculated from the value of the $r$-th illuminance meter at stage $t$. A stage represents any adjustment process in which all adjustable visible light sources in the visible light source array change the illuminance within the irradiation range according to the illuminance control parameter values calculated by the neural network;

Step 3: Calculate the illuminance control parameter values based on the neural network:

3.1) Set the stage counter variable $t = 0$, and assign the stage-0 state vector $s_t$ to a vector of all 1s;

3.2) Obtain the stage-$t$ action vector $a_t = \mu(s_t \mid \omega_t^{\mu}) + \xi_t$, where $\xi_t$ is a random sample of $m$-dimensional standard Gaussian noise;

3.3) Transform the $k$-th element of the stage-$t$ action vector $a_t$ using the following formula: [formula missing];

3.4) Use the value of the $k$-th element of $a_t$ as the illuminance control parameter value of visible light source $L_k$, giving $m$ parameter values in total; assign these parameter values to all adjustable visible light sources in real time to adjust the illuminance of the visible light source array;

3.5) After the illuminance of the visible light source array is adjusted, collect the values of all illuminance meters, obtain [quantities missing], and thereby obtain the stage-$(t+1)$ state vector $s_{t+1}$;

3.6) If the values of all illuminance meters satisfy [condition missing], proceed to step 3.8; otherwise, proceed to step 3.7;

3.7) Store $(a_t, s_t, s_{t+1}, R_t)$ in the experience storage pool; extract $h$ experiences from the experience storage pool to update the networks; update the evaluation neural network $Q$ using the method for updating the critic network in the deep deterministic policy gradient algorithm; update the action neural network $\mu$ using the method for updating the actor network in the deep deterministic policy gradient algorithm; update the evaluation neural network $Q'$ and the action neural network $\mu'$ respectively using the method for updating the target network in the deep deterministic policy gradient algorithm; increment the stage counter variable $t$ by 1, return to step 3.2, and assign the $s_{t+1}$ obtained in step 3.5 to the $s_t$ used in step 3.2;

3.8) End illuminance adjustment.

2. The reinforcement learning-based illuminance adjustment method according to claim 1, characterized in that: the action neural network described in step 2 includes a fully connected layer I, an activation function I, a fully connected layer II, and an activation function II connected in sequence, with $s_t$ as its input and $a_t$ as its output.

3. The reinforcement learning-based illuminance adjustment method according to claim 2, characterized in that: activation function I is a rectified linear unit (ReLU), and activation function II is a hyperbolic tangent function.

4. The reinforcement learning-based illuminance adjustment method according to any one of claims 1-3, characterized in that: the evaluation neural network described in step 2 includes a fully connected layer III, an activation function III, a fully connected layer IV, and an activation function IV connected in sequence, with $s_t$ and $a_t$ as its input and the evaluation value as its output.

5. The reinforcement learning-based illuminance adjustment method according to claim 4, characterized in that: activation functions III and IV are both rectified linear units (ReLU).