A monitoring telescope array control method based on reinforcement learning
This reinforcement learning-based control method for monitoring telescope arrays addresses two problems in space debris monitoring networks: detection equipment failing to acquire effective data, and the network accumulating redundant data. It thereby enables efficient space debris monitoring and better use of monitoring resources.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TAIYUAN UNIVERSITY OF TECHNOLOGY
- Filing Date
- 2022-07-22
- Publication Date
- 2026-06-19
AI Technical Summary
In existing space debris monitoring networks, the detection equipment cannot obtain effective data and there is a large amount of redundant data.
A reinforcement learning-based control method for the monitoring telescope array is adopted. By constructing a digital universe model, the Monte Carlo method is used to generate orbital data of space debris. Combined with a deep reinforcement learning-based intelligent scheduling strategy, the task scheduling of the monitoring equipment is optimized.
The method achieves efficient data acquisition for space debris detection networks, reduces redundant data, and improves monitoring capability and resource utilization efficiency.
Abstract
Description
Technical Field
[0001] This invention relates to the field of space debris detection.
Background Technology
[0002] All non-functional man-made objects generated by human space activities are collectively referred to as space debris. Statistics show that since the beginning of space exploration, over 39,000 trackable pieces of space debris have been created by various missions. Currently, over 16,000 pieces are still in orbit, posing a significant threat to all types of space activities. To ensure the safety of space activities, space debris monitoring is indispensable.
[0003] After years of development, China has established a comprehensive scientific observation system for space debris monitoring, including optoelectronic imaging systems, radar detection systems, and laser telemetry systems. Through long-term operation, this system has accumulated a large amount of space debris monitoring data and achieved routine target cataloging. In recent years, however, the scale and frequency of official and civilian space missions worldwide have grown rapidly, leading to a sharp increase in the amount of space debris and a continuously rising demand for space debris monitoring. Maximizing the effectiveness of the observation system with existing equipment has become a hot topic in space debris monitoring research. Since space debris does not emit light, current monitoring systems either actively emit electromagnetic waves towards the debris and receive the echoes (radar detection systems or laser telemetry systems), or passively receive light reflected from the debris (optoelectronic imaging systems), and obtain information by processing the echoes or the reflected light. Because of limitations in monitoring conditions and equipment performance, each of these information acquisition methods has its own limitations, so multiple methods must be combined to characterize the properties of space debris comprehensively. In addition, orbit determination and cataloging require monitoring data for multiple arc segments of the same space debris target. Joint monitoring by multiple devices in different geographical locations is therefore an inevitable trend in the field of space debris monitoring.
[0004] In summary, for space debris, establishing a space debris monitoring network by integrating various devices is an inevitable development direction for space debris monitoring research. From a practical perspective, a space debris monitoring network should not only routinely monitor existing space debris to maintain the catalog, but also promptly detect new targets or monitor key targets as needed. When the number of space debris is small and only larger or brighter targets are being monitored, traditional automatic time-series task scheduling algorithms can achieve task scheduling for the space debris monitoring network. However, the rapidly increasing number of space debris and the ever-increasing requirements for the safety of space missions have led to new directions for improving the monitoring capabilities of space debris networks: space debris monitoring is beginning to develop towards the monitoring of massive numbers of small targets and the rapid monitoring of critical targets. When the monitoring targets are smaller-scale space debris in higher orbits, the number of targets that need to be monitored will increase significantly; at the same time, the differences in monitoring capabilities among different devices will become more significant (for example, with photoelectric imaging systems, small targets can only be monitored by large-aperture devices or devices located at better sites, while small-aperture devices located at other sites will not be able to obtain effective data on these targets).
[0005] Traditional scheduling methods do not consider the actual observation capabilities and conditions of the monitoring equipment; they treat each device only as a static unit with a certain theoretical visibility. They also fail to account for the redundancy of different types of space debris data. Together, these two shortcomings lead to two problems: first, scheduled equipment may fail to acquire valid target data, reducing the capability of the space debris monitoring network; second, the monitoring network may acquire a large amount of unnecessary redundant data, wasting monitoring resources. A novel scheduling method for space debris monitoring networks is therefore urgently needed, one based on research into the redundancy of space debris data and the data acquisition capabilities of the monitoring equipment.
Summary of the Invention
[0006] The technical problem to be solved by this invention is that detection equipment may be unable to obtain valid target data and that the monitoring network may acquire a large amount of unnecessary redundant data.
[0007] The technical solution adopted in this invention is: a reinforcement learning-based control method for a monitoring telescope array, which involves modeling space debris and the telescope monitoring array system, performing accuracy measurement and analysis of the target orbit data of space debris, and constructing an intelligent scheduling strategy based on deep reinforcement learning. Specifically, the method is carried out according to the following steps.
[0008] Step 1: Generate a digital universe for space debris monitoring based on target orbit monitoring simulation data using the Monte Carlo method. Building upon existing TLE orbital reporting data for space debris, the sgp4 library in Python is used to return the trajectory of space debris in the simulated environment. The monitoring equipment for space debris is modeled as a simulated telescope, and the ephem library in Python is used to return the parameters of the simulated telescope observing the target space debris at the target detection time. Together, the trajectories returned by sgp4 and the simulated telescope parameters returned by ephem constitute the digital universe.
[0009] Step 2: Determine the monitoring accuracy of different space debris on different monitoring devices. The target space debris is monitored with a simulated telescope, and the trajectory returned at the target detection time is compared with the latest TLE orbit report data for that debris. The monitoring accuracy of the same space debris under different monitoring devices is recorded. The aperture c and field of view F of the monitoring device are used as inputs and the monitoring accuracy as the label, and a regression algorithm from the sklearn library in Python is used to fit the correspondence between the monitoring device parameters and the measurement accuracy of space debris.
Step 3: Combining the digital universe constructed in Step 1 with the monitoring accuracy of different space debris on different monitoring devices obtained in Step 2, an intelligent scheduling strategy based on deep reinforcement learning is constructed. The digital universe from Step 1 is used as the training environment. The number N of explored space debris is used as the system's exploration effect. The monitoring time interval for a single monitorable space target is M_i, and the monitoring time interval for all monitorable space targets is used as the system's monitoring effect. The current monitoring and exploration effects of the simulated telescopes form the state space, and the combinations of different simulated telescopes performing monitoring or exploration tasks form the action space. The monitoring and exploration effects are multiplied by different coefficients, according to the emphasis placed on monitoring or exploration, to serve as the reward R for reinforcement learning, i.e., R = [expression missing in the source text], where α and β represent the proportions of exploration and monitoring in the task. A multilayer perceptron is used as the policy network for reinforcement learning to fit the Q-values of state-action pairs for policy evaluation. The multilayer perceptron consists of three fully connected layers, with the dimension of the state space as input and the dimension of the action space as output. At certain time intervals, the current policy is improved using a second, identical policy network, which then selects the action. Through continuous iterative learning, a reinforcement learning-based control method for the space debris monitoring array is obtained. Based on the current exploration and monitoring effects of the entire monitoring array, it autonomously chooses whether to perform monitoring or exploration tasks in subsequent time periods. With the reward coefficients for monitoring and exploration set in advance, the strategy favors monitoring tasks, so that the monitoring requirements for known space targets are met within a unit of time while more space debris is explored.
[0010] In step one, using the sgp4 library in Python to return the trajectory of space debris in the simulated environment refers to using the sgp4 library to establish, as functions of time, the position of the space debris in the celestial coordinate system, its brightness, and its distance from the Earth. This correspondence is used as the model of the distribution of space debris, and at the target detection time it returns the position, brightness, and distance from the Earth of the target space debris.
[0011] In step one, returning the parameters of the simulated telescope observing the target space debris at the target detection time through the ephem library in Python refers to establishing, as functions of time, the azimuth angle θ, elevation angle Ф, aperture c, and field of view F of the simulated telescope pointing at the target space debris in the celestial coordinate system. Here az and alt denote the azimuth and altitude of the target space debris. The target space debris can be monitored by the simulated telescope when three conditions hold: the deviation between az and θ is within the simulated telescope's field of view F, the deviation between alt and Ф is within the field of view F, and the product of the brightness value bright and the simulated telescope's aperture c is less than a set threshold, where bright = scale / h.
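The three monitoring conditions can be sketched as a single predicate. This is a minimal illustration, not the patented implementation: the function name `can_monitor`, the use of degrees, the treatment of F as a maximum allowed deviation, and the wrap-around handling of azimuth are all assumptions that the text does not state.

```python
def can_monitor(az, alt, theta, phi, fov, aperture, scale, h, threshold):
    """Return True when the three monitoring conditions hold.

    az, alt    : azimuth / altitude of the debris seen from the site (degrees)
    theta, phi : azimuth / elevation the simulated telescope points at (degrees)
    fov        : field of view F (degrees), used here as the allowed deviation
    aperture   : telescope aperture c
    scale, h   : the quantities in bright = scale / h
    """
    # Assumption: azimuth wraps at 360 degrees, so measure the shorter arc.
    d_az = abs(az - theta) % 360.0
    d_az = min(d_az, 360.0 - d_az)
    d_alt = abs(alt - phi)
    bright = scale / h
    # All three conditions from the paragraph above must be satisfied.
    return d_az <= fov and d_alt <= fov and bright * aperture < threshold
```

For example, with a 5 degree field of view, a target at azimuth 359 degrees passes the pointing test for a telescope pointed at azimuth 1 degree, because the shorter arc between them is 2 degrees.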
[0012] The trajectory of the space debris returned by the simulated telescope at the target detection time is compared with the latest TLE orbital data of the space debris as follows: 1. While the monitoring equipment is performing its monitoring task, acquire the TLE orbital data of all nearby space debris, use this data to perform a cone search for each piece of space debris, and record the elevation angle and azimuth angle measured at each time step while the monitoring equipment observes the space debris. 2. Review the previously published TLEs of the target space debris and use the most recent one as the initial guess. By examining the historical evolution of the target's TLE orbital data, generate boundary conditions for the monitoring of each piece of space debris. Input the boundary conditions and the measured elevation and azimuth angles, fit the orbit, and output the error between the fitted trajectory of the identified space debris, as returned by the simulated telescope at the target detection time, and the latest TLE orbital data of the space debris.
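The cone search in item 1 can be illustrated as follows. This is a sketch under stated assumptions: the function names, the dictionary of predicted directions, and the choice of azimuth and altitude in degrees are hypothetical; only the great-circle separation formula is standard spherical trigonometry.

```python
import math

def angular_separation_deg(az1, alt1, az2, alt2):
    """Great-circle angle between two (azimuth, altitude) directions, in degrees."""
    a1, h1, a2, h2 = map(math.radians, (az1, alt1, az2, alt2))
    cos_sep = (math.sin(h1) * math.sin(h2)
               + math.cos(h1) * math.cos(h2) * math.cos(a1 - a2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sep))))

def cone_search(center, radius_deg, predicted):
    """Return ids of objects whose predicted direction lies inside the cone.

    center    : (az, alt) of the cone axis in degrees
    predicted : hypothetical mapping of object id -> (az, alt) from its TLE
    """
    az0, alt0 = center
    return [obj_id for obj_id, (az, alt) in predicted.items()
            if angular_separation_deg(az0, alt0, az, alt) <= radius_deg]
```

Objects returned by `cone_search` are the candidates whose azimuth and elevation measurements would be recorded at that time step.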
[0013] The space debris includes all celestial bodies within the solar system and all celestial bodies observed outside the solar system.
[0014] The beneficial effects of this invention are as follows. The invention constructs a digital universe for the space debris detection network and uses the Monte Carlo sampling method to generate space debris data that simulates the real-world distribution. On this basis, it establishes a method for precision measurement and analysis of space orbit data and studies the influence of monitoring equipment parameters on the measurement accuracy of space debris. Finally, it uses reinforcement learning to establish an intelligent control method for the space debris monitoring network, achieving rapid perception of the space debris situation.
Detailed Implementation
[0015] A reinforcement learning-based control method for a monitoring telescope array is proposed. This method involves modeling space debris and the telescope monitoring array system, performing accuracy measurement and analysis of target orbit data for space debris, and constructing an intelligent scheduling strategy based on deep reinforcement learning. The specific steps are as follows:
[0016] Step 1: Generate a digital universe for space debris monitoring based on target orbit monitoring simulation data using the Monte Carlo method. Building upon existing TLE orbital reporting data for space debris, the sgp4 library in Python is used to return the trajectory of space debris in the simulated environment. The monitoring equipment for space debris is modeled as a simulated telescope, and the ephem library in Python is used to return the parameters of the simulated telescope observing the target space debris at the target detection time. Together, the trajectories returned by sgp4 and the simulated telescope parameters returned by ephem constitute the digital universe.
[0017] Registered users can download all published TLE orbital data from space-track.org. We use the Monte Carlo sampling method to select 500 objects from the tens of thousands available as the space debris in the simulation environment. The Python sgp4 library returns the trajectory of the space debris in the simulation environment. With the Python ephem library, the azimuth and elevation angles of a target relative to an observer can be obtained by inputting the observer's latitude and longitude. We set up five observation sites, in Beijing, Boston, London, Sydney, and Cape Town, South Africa, and added parameters such as telescope aperture and field of view to limit each telescope's observation range and to set its minimum brightness requirement. An algorithm ensures that an observation can be performed only if the space target lies within the telescope's limited range and its brightness meets the telescope's minimum brightness requirement. Users can add space targets to the environment, add or remove telescopes, move telescope positions, and so on, according to their needs.
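The sampling step can be sketched as below. The text does not specify the sampling distribution, so a uniform draw without replacement is assumed here; the function names and the `seed` parameter are hypothetical. Orbit propagation with sgp4 and pointing with ephem are deliberately left out so that the sketch needs only the standard library.

```python
import random

def parse_tle_catalog(text):
    """Split a catalog in two-line element format into (line1, line2) pairs."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    pairs = []
    i = 0
    while i < len(lines) - 1:
        # A TLE record is a line starting with "1 " followed by one starting with "2 ".
        if lines[i].startswith("1 ") and lines[i + 1].startswith("2 "):
            pairs.append((lines[i], lines[i + 1]))
            i += 2
        else:
            i += 1  # skip object-name lines or malformed rows
    return pairs

def sample_debris(pairs, k, seed=None):
    """Draw k distinct objects uniformly at random (assumed sampling scheme)."""
    rng = random.Random(seed)
    return rng.sample(pairs, min(k, len(pairs)))
```

Calling `sample_debris(parse_tle_catalog(catalog_text), 500)` on a downloaded catalog would give the 500 objects used to populate the simulated environment; fixing `seed` makes the selection repeatable.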
[0018] Step 2: Monitoring accuracy of different space debris on different monitoring devices. By using a simulated telescope to monitor the target space debris, the trajectory of the space debris returned at the target detection time is compared with the latest TLE orbit report data of the space debris. The monitoring accuracy of the same space debris under different monitoring devices is recorded. The aperture c and field of view F parameters of the monitoring device are used as inputs, and the monitoring accuracy is used as the label. The regression algorithm in the sklearn library of Python language is used to fit the correspondence between the monitoring device parameters and the measurement accuracy of space debris.
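The fitting step can be illustrated with a dependency-free stand-in. The method itself uses a regression algorithm from sklearn without naming which one; the sketch below swaps in plain ordinary least squares on a linear model, solved through the normal equations, purely as an assumption. The function names and the linear form accuracy ≈ w0 + w1·c + w2·F are hypothetical.

```python
def fit_linear(X, y):
    """Ordinary least squares for y ~ w0 + w1*c + w2*F via the normal equations.

    X : list of (aperture c, field of view F) pairs
    y : list of monitoring accuracy labels
    """
    rows = [[1.0, c, f] for c, f in X]
    n = 3
    # Build A = R^T R and b = R^T y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= factor * A[col][k]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][k] * w[k] for k in range(i + 1, n))) / A[i][i]
    return w

def predict(w, c, f):
    """Predicted monitoring accuracy for aperture c and field of view f."""
    return w[0] + w[1] * c + w[2] * f
```

Given recorded (c, F) pairs and their accuracy labels, `fit_linear` returns the three coefficients and `predict` estimates the accuracy for a device that was not simulated. A nonlinear regressor may fit the real relationship better; the linear form is only the simplest choice.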
[0019] The current research mainly focuses on the impact of telescope field of view and aperture on monitoring effectiveness. Four field of view settings are used: 5 degrees, 10 degrees, 15 degrees, and 20 degrees. The aperture ranges from 1 meter to 5 meters. The monitoring accuracy of the same space debris under different monitoring devices is recorded. The aperture c and field of view F of the monitoring device are used as inputs and the monitoring accuracy as the label, and a regression algorithm from the sklearn library in Python is used to fit the impact of the monitoring device parameters on the measurement accuracy of space debris.
Step three: Combining the digital universe constructed in step one with the monitoring accuracy of different space debris under different monitoring devices obtained in step two, an intelligent scheduling strategy based on deep reinforcement learning is constructed. The digital universe from step one is used as the training environment. The number N of explored space debris is used as the system's exploration effect. The monitoring time interval for a single monitorable space target is M_i, and the monitoring time interval for all monitorable space targets is used as the system's monitoring effect. The current monitoring and exploration effects of the simulated telescopes form the state space, and the combinations of different simulated telescopes performing monitoring or exploration tasks form the action space. The monitoring and exploration effects are multiplied by different coefficients, according to the emphasis placed on monitoring or exploration, to serve as the reward R for reinforcement learning, i.e., R = [expression missing in the source text], where α and β represent the proportions of exploration and monitoring in the task. A multilayer perceptron is used as the policy network for reinforcement learning to fit the Q-values of state-action pairs for policy evaluation. The multilayer perceptron consists of three fully connected layers, with the dimension of the state space as input and the dimension of the action space as output. At certain time intervals, the current policy is improved using a second, identical policy network, which then selects the action. Through continuous iterative learning, a reinforcement learning-based control method for the space debris monitoring array is obtained. Based on the current exploration and monitoring effects of the entire monitoring array, it autonomously chooses whether to perform monitoring or exploration tasks in subsequent time periods. With the reward coefficients for monitoring and exploration set in advance, the strategy favors monitoring tasks, so that the monitoring requirements for known space targets are met within a unit of time while more space debris is explored.
[0020] The digital universe constructed in step 1 is used as the training environment. The number N of explored space targets is used as the system's exploration effect; if there are 500 space targets in the simulated environment, N can reach a maximum of 500. The monitoring time interval for a single monitorable space target is M_i. The size of M_i is related to the duration of the environmental simulation: the longer the simulation, the larger M_i will be, and the larger M_i is, the longer the monitoring time interval for all monitorable space targets, which is used as the system's monitoring effect. The telescopes' current monitoring and exploration effects form the state space for reinforcement learning (the dimension of the state space is missing from the source text). The action space for reinforcement learning is defined by the combinations of different telescopes performing monitoring or exploration tasks and has dimension 2^n, where n is the number of telescopes. Depending on the emphasis placed on monitoring or exploration, the monitoring and exploration effects are multiplied by different coefficients to serve as the reward for reinforcement learning, i.e., R = [expression missing in the source text], where α and β represent the proportions of exploration and monitoring in the task and can be adjusted as needed. We set α = 20 and β = 1. A multilayer perceptron is used as the policy network for reinforcement learning to fit the action values of state-action pairs for policy evaluation. The multilayer perceptron consists of three fully connected layers, with the dimension of the state space as the input and the dimension of the action space as the output. At certain time intervals, the current policy is improved using a second, identical policy network, which then selects the action.
Through continuous iterative learning, the agent learns to select which task to execute based on the current exploration and monitoring effects. In a three-day simulation with 500 space targets and 5 telescopes, the agent obtains an average reward of 2500, exceeding the reward obtained when tasks are selected by human experience, and it completes the task fully automatically, realizing the Sitian Intelligent Brain scheduling algorithm.
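The action space and the two identical networks can be sketched without a deep learning framework. Everything below is illustrative: the encoding of an action as one bit per telescope (1 for monitoring, 0 for exploration), the state dimension of 2, the hidden width of 16, the ReLU activations, and the epsilon-greedy selection rule are assumptions not stated in the text, and the training update itself is omitted.

```python
import copy
import itertools
import random

def build_action_space(n_telescopes):
    """All 2^n assignments; bit 1 = monitoring task, bit 0 = exploration task."""
    return list(itertools.product((0, 1), repeat=n_telescopes))

def init_mlp(sizes, rng):
    """Random weights and zero biases for fully connected layers of the given sizes."""
    return [([[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def q_values(mlp, state):
    """Forward pass: ReLU on hidden layers, linear output (one Q-value per action)."""
    x = list(state)
    for idx, (weights, biases) in enumerate(mlp):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if idx < len(mlp) - 1:
            x = [max(0.0, v) for v in x]
    return x

def select_action(mlp, state, actions, epsilon, rng):
    """Epsilon-greedy choice over the enumerated action space (assumed rule)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    q = q_values(mlp, state)
    return actions[max(range(len(actions)), key=q.__getitem__)]

rng = random.Random(0)
actions = build_action_space(5)          # 5 telescopes -> 2^5 = 32 joint actions
state_dim, hidden = 2, 16                # hypothetical sizes
online = init_mlp([state_dim, hidden, hidden, len(actions)], rng)  # three layers
target = copy.deepcopy(online)           # second, identical network
action = select_action(online, [0.2, 0.5], actions, epsilon=0.1, rng=rng)
```

In a full training loop, `target` would be refreshed from `online` with `copy.deepcopy` at fixed intervals, which is one common way to realize the periodic use of a second identical network described above.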
[0021] The above description is merely a preferred embodiment of the present invention. The scope of protection of the present invention is not limited to the above embodiments. All technical solutions falling within the scope of the present invention's concept are within the scope of protection of the present invention. It should be noted that for those skilled in the art, any improvements and modifications made without departing from the principles of the present invention should also be considered within the scope of protection of the present invention.
Claims
1. A monitoring telescope array control method based on reinforcement learning, characterized in that: modeling of space debris and of the telescope monitoring array system, accuracy measurement and analysis of the target orbit data of space debris, and construction of an intelligent scheduling strategy based on deep reinforcement learning are carried out in the following steps.
Step 1: Generate a digital universe for space debris monitoring based on target orbit monitoring simulation data using the Monte Carlo method. Building upon existing TLE orbital reporting data for space debris, the sgp4 library in Python is used to return the trajectory of space debris in the simulated environment. The monitoring equipment for space debris is used as a simulated telescope. The ephem library in Python is used to return the parameters of the simulated telescope observing the target space debris at the target detection time. The trajectory of space debris returned by the sgp4 library and the simulated telescope parameters returned by the ephem library together constitute the digital universe.
Step 2: Determine the monitoring accuracy of different space debris on different monitoring devices. The target space debris is monitored with a simulated telescope, and the trajectory returned at the target detection time is compared with the latest TLE orbit report data of the space debris. The monitoring accuracy of the same space debris under different monitoring devices is recorded. The aperture c and field of view F of the monitoring device are used as inputs, and the monitoring accuracy is used as the label. A regression algorithm from the sklearn library in Python is used to fit the correspondence between the monitoring device parameters and the measurement accuracy of space debris.
Step 3: Combining the digital universe constructed in Step 1 and the monitoring accuracy of different space debris on different monitoring devices described in Step 2, an intelligent scheduling strategy based on deep reinforcement learning is constructed. The digital universe from Step 1 is used as the training environment. The number N of explored space debris is used as the system's exploration effect. The monitoring time interval for a single monitorable space target is M_i, and the monitoring time interval for all monitorable space targets is used as the system's monitoring effect. The current monitoring and exploration effects of the simulated telescopes are used as the state space; the combinations of different simulated telescopes performing monitoring or exploration tasks are used as the action space; and the monitoring and exploration effects are multiplied by different coefficients, according to the emphasis of the monitoring or exploration task, to serve as the reward R for reinforcement learning, i.e., R = [expression missing in the source text], where α and β represent the proportions of exploration and monitoring in the task. A multilayer perceptron is used as the policy network for reinforcement learning to fit the action values Q of the state-action pairs for policy evaluation. The multilayer perceptron consists of three fully connected layers, with the dimension of the state space as the input and the dimension of the action space as the output. At certain time intervals, the current policy is improved using a second, identical policy network, which then selects the action. Through continuous iterative learning, a reinforcement learning-based control method for the space debris monitoring array is obtained. Based on the current exploration and monitoring effects of the entire monitoring array, it autonomously chooses whether to perform monitoring or exploration tasks in subsequent time periods. With the monitoring and exploration reward coefficients set in advance so that task selection favors monitoring, more space debris is explored while the monitoring requirements for known space targets are met within a unit of time.
2. The monitoring telescope array control method based on reinforcement learning according to claim 1, characterized in that: In step one, using the sgp4 library in Python to return the trajectory of space debris in the simulated environment refers to using the sgp4 library to establish, as functions of time, the position of the space debris in the celestial coordinate system, its brightness, and its distance from the Earth. This correspondence is used as the model of the distribution of space debris, and at the target detection time it returns the position, brightness, and distance from the Earth of the target space debris.
3. The monitoring telescope array control method based on reinforcement learning according to claim 1, characterized in that: In step one, returning the parameters of the simulated telescope observing the target space debris at the target detection time through the ephem library in Python refers to establishing, as functions of time, the azimuth angle θ, elevation angle Ф, aperture c, and field of view F of the simulated telescope pointing at the target space debris in the celestial coordinate system. Here az and alt denote the azimuth and altitude of the target space debris. The target space debris can be monitored by the simulated telescope when the deviation between az and θ is within the simulated telescope's field of view F, the deviation between alt and Ф is within the field of view F, and the product of the brightness value bright and the simulated telescope's aperture c is less than a set threshold, where bright = scale / h.
4. The monitoring telescope array control method based on reinforcement learning according to claim 1, characterized in that: The trajectory of the space debris returned by the simulated telescope at the target detection time is compared with the latest TLE orbital data of the space debris in the following steps:
1. While the monitoring equipment is performing monitoring tasks, acquire the TLE orbital data of all nearby space debris, use the acquired TLE orbital data to perform a cone search for each piece of space debris, and record the elevation angle and azimuth angle measured at each time step while the monitoring equipment monitors the space debris; 2. Review the previously published TLEs of the object and use the most recent one as the initial guess. By examining the historical evolution of the TLE orbit reports of the target space debris, boundary conditions are generated for the monitoring of each piece of space debris. The boundary conditions and the measured values of elevation angle and azimuth angle are input, the orbit is fitted, and the error between the fitted trajectory of the identified space debris, as returned by the simulated telescope at the target detection time, and the latest TLE orbit report data of the space debris is output.
5. The monitoring telescope array control method based on reinforcement learning according to claim 1, characterized in that: The space debris includes all celestial bodies within the solar system and all celestial bodies observed outside the solar system.
Citation Information
Patent Citations
Method for autonomously capturing space debris by super-redundant mechanical arm based on reinforcement learning algorithm
CN112809675A
Space object maneuver detection
US20210261276A1