A multi-task concurrent remote sensing satellite network elastic resource reconstruction method and system
By performing gridded processing and dynamic scheduling of the remote sensing satellite network, combined with opportunity cost incentives, the problem of insufficient emergency response capability in remote sensing satellite mission planning has been solved. This enables second-level dynamic emergency response and resource optimization for sudden missions, thereby improving the robustness and collaborative efficiency of the system.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- XIDIAN UNIV
- Filing Date
- 2026-04-23
- Publication Date
- 2026-06-23
AI Technical Summary
Existing remote sensing satellite mission planning methods are ill-suited to sudden, time-sensitive emergency missions. They suffer from high communication overhead, short-sighted local decision-making, resource contention, and a lack of joint optimization of response latency and resource elasticity. As a result, emergency targets cannot be imaged in time and system robustness is poor.
A multi-task concurrent remote sensing satellite network elastic resource reconfiguration method is adopted. By gridding the target area, the observation, active idle or passive idle actions are generated by the pre-trained target policy network. Combined with the opportunity cost incentive mechanism, dynamic emergency response and resource reservation are realized, avoiding high-frequency inter-satellite communication.
It enables second-level dynamic emergency response to sudden tasks, improves the system's sustainable response capability and resource coordination effect in continuous emergency scenarios, reduces communication complexity, and is suitable for rapid response and emergency task handling of low-Earth orbit constellations.
Figure CN122093271B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of satellite remote sensing communication technology, and specifically relates to a method and system for elastic resource reconfiguration of remote sensing satellite networks with multiple concurrent tasks.
Background Technology
[0002] With the rapid growth in demand for Earth observation, low-Earth orbit constellations composed of various types of remote sensing satellites, including optical and synthetic aperture radar, have become a crucial infrastructure supporting land monitoring, environmental assessment, and emergency disaster relief. However, existing remote sensing satellite mission planning methods still face significant challenges in responding to sudden and time-sensitive emergency missions.
[0003] First, traditional task scheduling models often employ static or batch processing methods, treating tasks as a fixed set for one-time allocation. This makes them ill-suited to the dynamic insertion characteristics of unforeseen events. When unexpected tasks occur randomly, existing scheduling schemes often lack rapid reconfiguration capabilities, resulting in delayed imaging of emergency targets and missed critical response windows. Second, multi-satellite collaborative scheduling commonly suffers from a contradiction between high communication overhead and short-sighted local decision-making. While centralized scheduling can achieve global optimization, it relies on high-frequency inter-satellite communication, leading to low reliability in emergency scenarios. Fully distributed methods, on the other hand, struggle to coordinate multi-satellite behavior, easily resulting in resource contention or insufficient emergency task coverage. Especially in scenarios where unexpected tasks have absolute priority, guiding satellites to autonomously reserve resources and collaboratively ensure high-priority targets without explicit communication remains an unresolved technical challenge. Finally, existing reward or objective function designs often focus only on the number of tasks completed or total revenue, lacking a joint optimization mechanism for response latency and resource elasticity. This leads scheduling strategies to prioritize executing all visible tasks as early as possible, exhausting imaging attempts and failing to respond to subsequent unexpected demands, resulting in poor system robustness. 
Therefore, there is an urgent need for a remote sensing satellite network resource reconfiguration method that integrates dynamic task perception, decentralized collaborative decision-making, and elastic resource reservation, so as to achieve high-quality observation of emergency areas in complex environments with multiple concurrent tasks and limited resources.
Summary of the Invention
[0004] To address the aforementioned problems in the existing technology, this invention provides a method and system for resilient resource reconfiguration in remote sensing satellite networks with multiple concurrent tasks.
[0005] The technical problem to be solved by this invention is achieved through the following technical solution:
[0006] In a first aspect, the present invention provides a method for resilient resource reconfiguration in a multi-task concurrent remote sensing satellite network, comprising:
[0007] The target area is gridded to obtain multiple grid targets;
[0008] Determine the set of visible time windows for the satellite to be planned for multiple grid targets;
[0009] Based on the pre-acquired attribute data of the satellite to be planned, the current mission status data, and the set of visible time windows, determine the current local observation status data;
[0010] The current local observation status data is input into the pre-trained target policy network, which outputs the next target action of the satellite to be planned. The next target action is an observation action, an active idle action, or a passive idle action. An active idle action means that an idle action is performed to wait for handling sudden tasks when there is an executable visible time window.
[0011] In a second aspect, the present invention provides a multi-task concurrent remote sensing satellite network elastic resource reconfiguration system, comprising:
[0012] The region gridding module is used to process the target region into grids, resulting in multiple grid targets;
[0013] The observation window construction module is used to determine the set of visible time windows for multiple grid targets of the satellite to be planned;
[0014] The local observation status determination module is used to determine the current local observation status data based on the pre-acquired attribute data of the satellite to be planned, the current mission status data, and the set of visible time windows;
[0015] The strategy network prediction module is used to input the current local observation state data into the pre-trained target strategy network and output the next target action of the satellite to be planned. The next target action is an observation action, an active idle action, or a passive idle action. An active idle action means that an idle action is performed to wait for handling sudden tasks when there is an executable visible time window.
[0016] This invention provides a method and system for resilient resource reconfiguration in multi-task concurrent remote sensing satellite networks. It overcomes the limitations of existing methods that rely on the assumptions of "batch prediction" or "static insertion" of sudden tasks, achieving online incremental scheduling capabilities for continuous, random, and streaming sudden events. This invention models sudden tasks as runtime random events, allowing any grid to transition from a regular to a sudden event at any time. A pre-trained policy network enables second-level action generation and lightweight conflict verification without restarting the optimization process. This mechanism gives the system true "dynamic emergency" response capabilities, filling the technological gap in the evolution from "static emergency scheduling" to "continuous online reconfiguration."
[0017] The present invention will now be described in further detail with reference to the accompanying drawings.
Attached Figure Description
[0018] Figure 1 is a flowchart of a method for resilient resource reconfiguration in a multi-task concurrent remote sensing satellite network provided by an embodiment of the present invention;
[0019] Figure 2 is a schematic diagram of the target area gridding provided in an embodiment of the present invention;
[0020] Figure 3 is a schematic diagram of the training process of the policy network provided in an embodiment of the present invention;
[0021] Figure 4 is a structural block diagram of a multi-task concurrent remote sensing satellite network elastic resource reconfiguration system provided in an embodiment of the present invention;
[0022] Figure 5 is a schematic diagram of another exemplary multi-task concurrent elastic resource reconfiguration system for remote sensing satellite networks provided in an embodiment of the present invention.
Detailed Implementation
[0023] The present invention will be further described in detail below with reference to specific embodiments, but the implementation of the present invention is not limited thereto.
[0024] This invention provides a method for resilient resource reconfiguration in multi-task concurrent remote sensing satellite networks. Referring to Figure 1, the method includes the following steps:
[0025] S10. The target area is gridded to obtain multiple grid targets.
[0026] For example, the target region set is represented as $\mathcal{A} = \{A_1, A_2, \dots, A_Q\}$, where $A_q$ denotes the $q$-th target area. Each target area $A_q$ is defined by its geographic boundary, which is represented as an ordered sequence of vertices $P_q = (v_{q,1}, v_{q,2}, \dots, v_{q,n_q})$; the geographic coordinates of any vertex $v_{q,p}$ are represented as $(\lambda_{q,p}, \varphi_{q,p})$, i.e., its longitude and latitude.
[0027] As shown in Figure 2, the Gauss-Krüger projection is used. The projection zone is determined automatically from the central longitude of the target area, and the projection is performed about the corresponding central meridian. The geographic coordinates $(\lambda_{q,p}, \varphi_{q,p})$ of each vertex are thereby mapped to plane Cartesian coordinates $(x_{q,p}, y_{q,p})$.
[0028] For the target area $A_q$, construct its minimum bounding rectangle and denote the four extreme coordinates of this rectangle as the minimum abscissa $x_q^{\min}$, the maximum abscissa $x_q^{\max}$, the minimum ordinate $y_q^{\min}$, and the maximum ordinate $y_q^{\max}$. Divide the bounding rectangle evenly into $M_q$ units along the horizontal axis and $N_q$ units along the vertical axis, generating a set of contiguous, adjacent regular grid targets, denoted as:
[0029] $$G_q = \{\, g_{q,m,n} \mid m = 1, \dots, M_q;\ n = 1, \dots, N_q \,\}$$
[0030] where $g_{q,m,n}$ denotes the grid target located in the $m$-th column and $n$-th row of target area $A_q$. The coordinates of the top-right vertex of this grid target in the Gauss plane are:
[0031] $$\left( x_q^{\min} + m\,\frac{x_q^{\max} - x_q^{\min}}{M_q},\ \ y_q^{\min} + n\,\frac{y_q^{\max} - y_q^{\min}}{N_q} \right)$$
[0032] The total number of grid targets corresponding to target area $A_q$ is $M_q N_q$.
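The gridding of step S10 can be illustrated with a minimal Python sketch. It assumes the boundary vertices have already been projected to plane coordinates (the Gauss-Krüger projection itself is not implemented here), and the names `GridCell` and `grid_bounding_rectangle` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridCell:
    col: int      # 1-based column index along the horizontal axis
    row: int      # 1-based row index along the vertical axis
    x_min: float
    y_min: float
    x_max: float  # x of the top-right vertex
    y_max: float  # y of the top-right vertex


def grid_bounding_rectangle(vertices, n_cols, n_rows):
    """Split the minimum bounding rectangle of projected vertices into
    n_cols x n_rows equally sized cells."""
    if n_cols < 1 or n_rows < 1:
        raise ValueError("n_cols and n_rows must be positive")
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    dx = (x_hi - x_lo) / n_cols
    dy = (y_hi - y_lo) / n_rows
    cells = []
    for m in range(1, n_cols + 1):
        for n in range(1, n_rows + 1):
            cells.append(GridCell(
                col=m, row=n,
                x_min=x_lo + (m - 1) * dx, y_min=y_lo + (n - 1) * dy,
                x_max=x_lo + m * dx, y_max=y_lo + n * dy,
            ))
    return cells


# Usage: a triangular region whose bounding rectangle is 4 x 6 units.
cells = grid_bounding_rectangle([(0.0, 0.0), (4.0, 1.0), (2.0, 6.0)], n_cols=2, n_rows=3)
```

Note that this sketch keeps every cell of the bounding rectangle, including cells that fall outside the polygon itself; whether such cells are discarded is not stated in the text.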
[0033] S20. Determine the set of visible time windows for multiple grid targets for the satellite to be planned.
[0034] For example, the satellite to be planned can be any satellite in any satellite network.
[0035] Optionally, step S20 may specifically include:
[0036] S201. Construct the initial set of visible time windows for multiple grid targets for the satellite to be planned.
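[0037] For example, the initial set of visible time windows of the satellite to be planned for the multiple grid targets, $TW_i$, is calculated as:
[0038] $$TW_i = \{\, TW_{i,j} \mid j = 1, \dots, J \,\}$$
[0039] $$TW_{i,j} = \{\, tw_{i,j}^{k} = [\,ts_{i,j}^{k},\ te_{i,j}^{k}\,] \mid k = 1, \dots, K_{i,j} \,\}$$
[0040] where $tw_{i,j}^{k}$ denotes the $k$-th visible time window of the satellite to be planned $i$ for grid target $j$, with $k = 1, \dots, K_{i,j}$, and $K_{i,j}$ is the total number of time windows; $ts_{i,j}^{k}$ denotes the start time of the $k$-th visible time window of $i$ for $j$; $te_{i,j}^{k}$ denotes the end time of the $k$-th visible time window of $i$ for $j$; $i$ is the index of the satellite to be planned (corresponding to the satellites in the training phase); and $j$ is the index of the grid target (corresponding to the grid samples in the training phase), with $J$ the maximum grid target index.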
[0041] S202. Based on the imaging type, resolution, start and end times of each initial visible time window in the initial visible time window set of the satellite to be planned, and preset filtering conditions, the initial visible time window set is filtered to obtain the visible time window set of the satellite to be planned for multiple grid targets.
[0042] For example, the preset filtering conditions include: if the imaging type of satellite $i$ does not belong to the imaging types required by grid target $j$, remove all visible time windows of $i$ for $j$; if the resolution of satellite $i$ does not satisfy the resolution range required by grid target $j$, remove all visible time windows of $i$ for $j$; and if the end time of a visible time window is later than the latest acceptable time of $j$, or its start time is earlier than the earliest acceptable time of $j$, remove that visible time window.
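The three filtering conditions of step S202 can be sketched as follows. This is an illustrative sketch only: the `Window` and `GridRequirement` records and the function `filter_windows` are hypothetical names, and the resolution requirement is assumed to be a closed numeric interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    grid_id: int
    start: float
    end: float


@dataclass(frozen=True)
class GridRequirement:
    imaging_types: frozenset  # imaging types accepted by the grid target
    res_min: float            # assumed closed interval for required resolution
    res_max: float
    earliest: float           # earliest acceptable time
    latest: float             # latest acceptable time


def filter_windows(sat_imaging_type, sat_resolution, windows, requirements):
    """Keep only the windows that pass the type, resolution and time checks."""
    kept = []
    for w in windows:
        req = requirements[w.grid_id]
        if sat_imaging_type not in req.imaging_types:
            continue  # wrong sensor type: drop every window for this grid
        if not (req.res_min <= sat_resolution <= req.res_max):
            continue  # resolution outside the required range
        if w.end > req.latest or w.start < req.earliest:
            continue  # window falls outside the acceptable time span
        kept.append(w)
    return kept


# Usage with made-up values.
requirements = {
    1: GridRequirement(frozenset({"optical"}), 0.5, 2.0, 0.0, 100.0),
    2: GridRequirement(frozenset({"sar"}), 0.5, 2.0, 0.0, 100.0),
}
windows = [Window(1, 10.0, 20.0), Window(1, 90.0, 110.0), Window(2, 10.0, 20.0)]
kept = filter_windows("optical", 1.0, windows, requirements)
```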
[0043] S30. Based on the pre-acquired attribute data of the satellite to be planned, the current mission status data, and the set of visible time windows, determine the current local observation status data.
[0044] Optionally, step S30 may specifically include:
[0045] S301. Obtain the attribute data and current mission status data of the satellite to be planned.
[0046] The attribute data includes satellite remote sensor type, ground resolution, and maximum daily imaging limit, while the current mission status data includes the number of imaging operations performed by the satellite to be planned and the latest mission end time.
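[0047] For example, the number of imaging operations performed by the $i$-th satellite to be planned is represented as follows:
[0048] $$n_i = \sum_{j=1}^{J} \sum_{k=1}^{K_{i,j}} x_{i,j}^{k}$$
[0049] where $J$ is the maximum index value of the grid targets in the target area, $K_{i,j}$ is the number of visible time windows of satellite $i$ for grid target $j$, and $x_{i,j}^{k}$ is the scheduling decision variable: $x_{i,j}^{k} = 1$ if satellite $i$ observes grid target $j$ within its $k$-th visible time window, and $x_{i,j}^{k} = 0$ otherwise.
[0050] The latest task end time is expressed as:
[0051] $$t_i^{\mathrm{last}} = \max_{j,\,k\,:\,x_{i,j}^{k} = 1} te_{i,j}^{k}$$ where $te_{i,j}^{k}$ is the end time of the $k$-th visible time window of satellite $i$ for grid target $j$.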
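As a small worked example of the two mission status quantities, the sketch below derives the imaging count and the latest task end time from a table of binary decision variables. The function name `mission_status` and the dictionary layout keyed by `(grid index, window index)` are assumptions made for illustration.

```python
def mission_status(decisions, end_times):
    """Return (imaging count, latest end time) for one satellite.

    decisions: {(grid_index, window_index): 0 or 1}, the scheduling decision variables
    end_times: {(grid_index, window_index): end time of that visible window}
    """
    count = sum(decisions.values())
    chosen_ends = [end_times[key] for key, x in decisions.items() if x == 1]
    latest_end = max(chosen_ends) if chosen_ends else None  # None: nothing scheduled yet
    return count, latest_end


# Usage with made-up values: two of three windows are scheduled.
decisions = {(1, 1): 1, (1, 2): 0, (2, 1): 1}
end_times = {(1, 1): 120.0, (1, 2): 300.0, (2, 1): 95.0}
status = mission_status(decisions, end_times)
```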
[0052] S302. Construct current local observation status data based on the set of visible time windows, the attribute data of the satellite to be planned, and the current mission status data.
[0053] S40. Input the current local observation status data into the pre-trained target policy network and output the next target action of the satellite to be planned.
[0054] The next target action can be an observation action, an active idle action, or a passive idle action. An active idle action means that an idle action is performed to wait for a sudden task to be handled when there is an executable visible time window.
[0055] For example, an active idle action means that there is at least one executable observation task (i.e., a visible time window) at the current moment, but the satellite to be planned chooses not to perform any observation action in order to reserve imaging opportunities or time resources for subsequent potential emergencies; a passive idle action means that there is no executable observation task at the current moment, including situations where there is no visible time window, all visible time windows have been allocated, there is a time conflict, or the imaging opportunity has reached its limit.
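The distinction between active and passive idling can be made concrete with a short sketch. The executability test used here (window not yet allocated, still open, starting no earlier than the end of the satellite's latest task, and imaging quota not exhausted) is a simplified assumption, and the names `Candidate`, `executable` and `label_idle` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    grid_id: int
    start: float
    end: float


def executable(candidates, now, busy_until, images_taken, max_images, allocated_grids):
    """Return the candidate windows the satellite could still execute."""
    if images_taken >= max_images:
        return []  # imaging quota reached
    return [
        c for c in candidates
        if c.grid_id not in allocated_grids  # not already allocated
        and c.end > now                      # window has not closed yet
        and c.start >= busy_until            # no overlap with the latest task (assumed rule)
    ]


def label_idle(candidates, now, busy_until, images_taken, max_images, allocated_grids):
    """Classify an idle step: 'active_idle' if something was executable, else 'passive_idle'."""
    options = executable(candidates, now, busy_until, images_taken, max_images, allocated_grids)
    return "active_idle" if options else "passive_idle"


# Usage with made-up values: one open window and spare imaging quota.
label = label_idle([Candidate(1, 10.0, 20.0)], now=5.0, busy_until=0.0,
                   images_taken=0, max_images=3, allocated_grids=set())
```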
[0056] Optionally, referring to Figure 3, the training process of the target policy network includes:
[0057] A1. Based on the pre-acquired attribute data samples, mission status data samples, and visible time window set samples of multiple grid samples in the regional samples of each satellite, determine the local observation status data samples of each satellite.
[0058] Specifically, multiple satellites form a satellite network. The specific steps for determining the local observation state data samples for each satellite during the training phase can be found in the detailed process of obtaining local observation state data during the inference phase, and will not be elaborated upon here.
[0059] A2. Determine the global state data sample based on the pre-determined task data sample of each grid sample and the current imaging count sample of each satellite.
[0060] Optionally, the mission data sample includes coverage variables for each grid sample, mission type representation, trigger time of sudden missions, and actual imaging time of each imaged grid sample.
[0061] For example, the coverage variable $c_j$ of grid sample $j$ characterizes whether that grid sample has been effectively covered: a value of 1 indicates coverage, and a value of 0 indicates otherwise. Each grid sample carries a task type identifier whose value indicates either a routine task or an emergency task. The trigger time of each emergency task and the actual imaging time of each imaged grid sample are recorded, as is the current imaging count sample of each satellite $i$. Here $i$ is the index of the satellite during the training phase and $j$ is the index of the grid sample during the training phase.
[0062] A3. Based on local observation state data samples, global state data samples, and pre-constructed action space, train the policy network and value network of each satellite until the policy network of each satellite completes parameter updates, and obtain the target policy network of each satellite.
[0063] Optionally, step A3 may specifically include:
[0064] A31. Based on local observation state data samples, the strategy network of each satellite samples action samples from the pre-constructed action space; and calculates the reward samples corresponding to the action samples based on the preset reward function to generate trajectory data.
[0065] Among them, the visible time window set sample in the local observation state data sample includes both sudden tasks and regular tasks.
[0066] For example, the action space of each satellite is pre-constructed. For any remote sensing satellite, an observation action is represented as a pair $(j, k)$, where $j \in \{1, \dots, J\}$ is the index of the grid sample to be observed, $J$ is the maximum index value of the grid samples to be observed, $k \in \{1, \dots, K_j\}$ is the index of the satellite's visible time window for that grid sample, and $K_j$ is the maximum index value of the visible time windows for that grid sample. The active idle action, denoted $a^{\mathrm{AI}}$, indicates that at the current decision step there is at least one executable observation task, but the satellite chooses not to perform any observation in order to conserve imaging opportunities or time resources for potential subsequent emergency tasks. The passive idle action, denoted $a^{\mathrm{PI}}$, indicates that there are no executable observation tasks at the current decision step, including situations where there is no visible time window, all visible time windows have been allocated, there is a time conflict, or the maximum number of imaging attempts has been reached.
[0067] The policy network and value network of each satellite are pre-constructed as fully connected neural networks. Specifically: 1) the input layer of the policy network receives local observation state data samples; 2) the hidden layers of the policy network each contain 128 neurons and use the ReLU activation function for nonlinear transformation; 3) the output layer of the policy network uses the softmax activation function to generate a probability distribution over discrete actions; 4) the input layer of the value network receives global state data samples; 5) the hidden layers of the value network each contain 128 neurons and use the ReLU activation function for nonlinear transformation; 6) the output layer of the value network uses a linear activation function and outputs a scalar value representing the expected discounted return starting from the current global state.
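The forward pass of the two networks can be sketched in plain Python without any deep learning library. The text does not state the number of hidden layers, the input sizes, or the initialization scheme, so the two hidden layers, the layer sizes and the uniform initialization below are assumptions; all function names are hypothetical.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """One dense layer: a weight matrix (n_out x n_in) and a zero bias vector."""
    scale = 1.0 / math.sqrt(n_in)  # assumed initialization range
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_mlp(sizes, seed=0):
    rng = random.Random(seed)
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def dense(x, layer):
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def policy_forward(layers, local_obs):
    """ReLU hidden layers followed by a softmax over discrete actions."""
    h = local_obs
    for layer in layers[:-1]:
        h = relu(dense(h, layer))
    return softmax(dense(h, layers[-1]))


def value_forward(layers, global_state):
    """ReLU hidden layers followed by a linear scalar output."""
    h = global_state
    for layer in layers[:-1]:
        h = relu(dense(h, layer))
    return dense(h, layers[-1])[0]


# Usage: 6 observation features and 5 discrete actions (both assumed sizes).
policy = build_mlp([6, 128, 128, 5], seed=1)
critic = build_mlp([10, 128, 128, 1], seed=2)
probs = policy_forward(policy, [0.1] * 6)
value = value_forward(critic, [0.2] * 10)
```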
[0068] During the training phase, local observation state data samples are input into the policy network for action sampling and online multidimensional constraint verification. The policy networks of the satellites sample actions in parallel; the central coordinator then checks for conflicts grid sample by grid sample and gives emergency tasks absolute priority, so that high-priority targets are allocated resources first.
[0069] In addition, this embodiment designs a resource reservation incentive mechanism based on opportunity cost to drive elastic resource reservation behavior. During the state initialization phase of each training round $e$, a portion of the tasks in the task set (i.e., some of the visible time window samples in the visible time window set samples) are randomly converted into a burst task set, and the remaining tasks constitute the regular task set. The total number of burst tasks in this training round is then calculated; this total is used only for reward normalization and is not provided to the policy network as observation information.
[0070] Optionally, a preset reward function is defined as follows:
[0071]
[0072] Here, the preset reward function at the $t$-th decision step combines four terms: a burst coverage reward, granted for each burst task that achieves its first successful observation; a timeliness reward, granted for each emergency task that achieves its first successful observation; a positive reward for each first successful observation of a routine task; and a penalty applied when at least one executable routine task exists but the satellite chooses an idle action.
[0073] Optionally, the burst coverage reward is represented as:
[0074]
[0075] Here, the terms of this formula are: an indicator of whether the grid sample corresponding to each burst task is effectively observed for the first time at decision step $t$; the coverage variable of that grid sample, i.e., whether it has been effectively covered; the set of burst tasks in training round $e$; and the total number of burst tasks in that round.
[0076] Specifically,
[0077]
[0078]
[0079] Optionally, the timeliness reward for emergency response is represented as:
[0080]
[0081] Here, the terms of this formula are: the maximum possible response latency of the grid sample; the time at which the grid sample is observed; the trigger time of the burst task; and a small constant that prevents division by zero.
[0082] Optionally, the positive reward is represented as:
[0083]
[0084] Here, the terms of this formula are a weighting factor and the number of routine tasks newly completed at decision step $t$.
[0085] Specifically,
[0086]
[0087] Optionally, the penalty is represented as:
[0088]
[0089] Here, the term of this formula is the number of executable regular tasks that were not selected for execution at decision step $t$ because the satellite chose to idle.
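The four reward terms can be combined in a short sketch. The exact expressions are not reproduced in the text above, so every formula in this block is an assumption chosen to match the verbal description: coverage and timeliness are normalized by the total number of burst tasks, timeliness decreases linearly with response latency, and the routine reward and the idle penalty share one weight of equal magnitude. The function name `step_reward` and its parameters are hypothetical.

```python
def step_reward(new_burst_latencies, max_latency, total_burst,
                new_regular, skipped_regular, weight=0.1, eps=1e-6):
    """Assumed per-step reward with four terms.

    new_burst_latencies: response latency (observation time minus trigger time)
        of each burst task observed for the first time at this step
    max_latency: maximum possible response latency
    total_burst: total number of burst tasks in the round (normalization only)
    new_regular: routine tasks newly completed at this step
    skipped_regular: executable routine tasks skipped because the satellite idled
    """
    if total_burst > 0:
        coverage = len(new_burst_latencies) / total_burst
        timeliness = sum(
            1.0 - lat / (max_latency + eps) for lat in new_burst_latencies
        ) / total_burst
    else:
        coverage = 0.0
        timeliness = 0.0
    regular = weight * new_regular            # small positive reward per routine task
    idle_penalty = -weight * skipped_regular  # equal-magnitude penalty per skipped task
    return {
        "coverage": coverage,
        "timeliness": timeliness,
        "regular": regular,
        "idle_penalty": idle_penalty,
        "total": coverage + timeliness + regular + idle_penalty,
    }


# Usage: one burst task observed 10 s after its trigger, two routine tasks completed.
reward = step_reward([10.0], max_latency=100.0, total_burst=4,
                     new_regular=2, skipped_regular=0)
```

Under these assumed weights, a single first-time burst observation is worth more than several routine observations, which is the opportunity cost that makes reserving an imaging attempt worthwhile.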
[0090] At every decision step $t$, the local observation state data sample of each satellite, the action sample output by its policy network, the reward, and the global state data sample at the next time step are stored in the experience buffer to obtain trajectory data; a batch of trajectory segments is then randomly sampled from the experience buffer for network updates.
[0091] A32. Input the trajectory data and global state samples into the value network corresponding to each satellite to obtain the value estimate; calculate the advantage estimate based on the value estimate and the advantage function.
[0092] For example, the TD residual $\delta_t$ is calculated from the trajectory segments:
[0093] $$\delta_t = r_t + \gamma\, V_\phi(s_{t+1}) - V_\phi(s_t)$$
[0094] where $r_t$ is the reward at decision step $t$; $\gamma$ is the discount factor; $T$ is the termination time of the current episode, and the residual is used to monitor the accuracy of the value function; $V_\phi(s_t)$ is the value estimate output by the value network for the global state $s_t$; and $\phi$ denotes the network parameters of the value network.
[0095] From the TD residuals $\delta_t$, the generalized advantage estimate is calculated as follows:
[0096] $$\hat{A}_t = \sum_{l=0}^{T-t-1} (\gamma\lambda)^{l}\, \delta_{t+l}$$
[0097] where $\lambda$ is the decay parameter of generalized advantage estimation (GAE), and $\delta_{t+l}$ is the temporal difference residual (TD residual) at time step $t+l$.
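The TD residual and generalized advantage estimate of step A32 can be computed with a backward recursion. This is a minimal sketch of the standard GAE computation, not the patent's implementation; the function names are hypothetical, and the value after the final step is assumed to be supplied by the caller (0.0 when the episode terminates).

```python
def td_residuals(rewards, values, gamma):
    """TD residual per step: r_t + gamma * V(s_{t+1}) - V(s_t).

    values must hold len(rewards) + 1 entries; the last entry is the value
    after the final step (use 0.0 if the episode has terminated).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values needs one more entry than rewards")
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]


def generalized_advantage(deltas, gamma, lam):
    """Backward recursion A_t = delta_t + gamma * lam * A_{t+1}."""
    advantages = [0.0] * len(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages


# Usage with made-up numbers for a two-step episode.
deltas = td_residuals([1.0, 1.0], [0.5, 0.5, 0.0], gamma=0.5)
advantages = generalized_advantage(deltas, gamma=0.5, lam=0.5)
```

The recursion is equivalent to the discounted sum of future TD residuals with factor equal to the product of the discount factor and the decay parameter, but runs in a single pass over the trajectory.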
[0098] A33. Based on the advantage estimation and the preset policy network loss function, update the parameters of the policy network, substitute the value estimate and the target value into the preset value loss function, calculate the value loss value, and update the parameters of the value network through backpropagation; until the policy network and the value network complete the iterative update, the target policy network of each satellite is obtained.
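[0099] For example, from the advantage estimate $\hat{A}_t$, construct the clipped probability-ratio policy loss function, i.e., the preset policy network loss function $L^{\mathrm{CLIP}}(\theta)$:
[0100] $$L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[ \min\!\left( \rho_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\!\left(\rho_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right) \hat{A}_t \right) \right]$$
[0101] where $\mathbb{E}_t[\cdot]$ denotes the mean operation; $\operatorname{clip}(\cdot)$ denotes the clipping function; $\epsilon$ is the clipping threshold, which restricts the probability ratio to the interval $[1-\epsilon,\, 1+\epsilon]$ and thereby implements a trust-region constraint; and $\rho_t(\theta)$ is the ratio of the action probabilities under the new and old policies:
[0102] $$\rho_t(\theta) = \frac{\pi_\theta(a_t \mid o_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid o_t)}$$
[0103] where $\pi_\theta(a_t \mid o_t)$ is the prediction result output by the current policy network, $\pi_{\theta_{\mathrm{old}}}(a_t \mid o_t)$ is the historical prediction result output by the policy network, $a_t$ is the action sample, and $o_t$ is the local observation state data sample.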
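The clipped probability-ratio term can be sketched in a few lines. This follows the standard clipped surrogate form; whether the patent's loss carries an additional sign or extra terms is not visible in the text, so the sketch returns the surrogate value itself. The function name and the use of log-probabilities as inputs are assumptions.

```python
import math


def clipped_surrogate(new_logps, old_logps, advantages, eps=0.2):
    """Mean over samples of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    terms = []
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)  # probability ratio of new to old policy
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)


# Usage: the new policy doubles the probability of an action with positive advantage,
# so the gain is capped at (1 + eps) times the advantage.
value = clipped_surrogate([math.log(0.6)], [math.log(0.3)], [1.0], eps=0.2)
```

With a negative advantage the unclipped term is the smaller one and is kept, so the clipping limits how much credit the update can take for moving the ratio in the favorable direction while leaving unfavorable moves fully visible.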
[0104] Based on the calculated policy network loss, the network parameters of the policy network for each satellite are updated.
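[0105] Next, the mean squared error loss of the value network, i.e., the preset value loss function, is calculated from the value estimate $V_\phi(s_t)$ and the target value $V_t^{\mathrm{target}}$:
[0106] $$L^{V}(\phi) = \mathbb{E}_t\!\left[ \left( V_\phi(s_t) - V_t^{\mathrm{target}} \right)^{2} \right]$$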
[0107] Based on the calculated value loss, the network parameters of the value network for each satellite are updated.
[0108] Finally, online elastic task scheduling is performed based on the trained policy network.
[0109] The present invention provides a method for resilient resource reconfiguration in multi-task concurrent remote sensing satellite networks, which has the following advantages compared with the prior art:
[0110] 1. This invention designs a resource reservation incentive mechanism based on opportunity cost. When a satellite successfully executes a routine task, it receives a small positive reward; if at least one executable routine task exists but the satellite chooses to remain idle, a slight penalty of equal magnitude is imposed. Under this reward structure, a policy network that over-executes routine tasks early in the schedule exhausts its imaging opportunities, so when burst tasks subsequently arrive it cannot allocate resources to them and forfeits emergency rewards far exceeding the earlier routine gains. During reinforcement learning training, the policy network therefore spontaneously develops, without manually specified rules, a behavior pattern of moderate restraint in the early stages and proactive preservation of imaging capacity. This provides proactive resilience against future uncertainty and significantly improves the sustainable response capability in scenarios with continuous burst tasks.
[0111] 2. This invention models each satellite as an independent intelligent agent, whose policy network generates observation decisions based solely on local observations, without needing to exchange states with other satellites, thus avoiding the problem of inter-satellite communication delays. Simultaneously, by introducing a shared value network, using the global state as input, it outputs a long-term system reward estimate, which is then used to calculate the generalized advantage function to guide the updates of each policy network. This design enables each agent to perceive the impact of other satellite behaviors on the overall objective during the training phase, achieving a high degree of coordination even in the online execution phase without real-time communication. This results in superior multi-satellite resource coordination with lower communication complexity, making it particularly suitable for engineering applications requiring high-frequency revisits and rapid responses in low-Earth orbit constellations.
[0112] 3. This invention models sudden tasks as external random events, meaning that each grid point can dynamically transform from a regular task to a sudden task at any simulation moment, immediately triggering a reconfiguration of the scheduling strategy. Because the trained policy network possesses generalization reasoning capabilities, there is no need to restart the entire optimization process. Each satellite can generate new actions within seconds based solely on current local observations, and outputs feasible solutions after lightweight central verification. This mechanism enables this invention to adapt to real emergency command processes, maintaining high responsiveness in environments with continuous task injection, filling the gap in the evolution of existing technologies from "static emergency response" to "dynamic emergency response."
[0113] Furthermore, this invention has the following application prospects: It can be directly applied to low-Earth orbit constellation mission planning systems composed of multiple heterogeneous remote sensing satellites such as optical and SAR satellites, and is particularly suitable for operational scenarios that require simultaneous handling of routine observation plans and high-priority emergency requests. For example, in daily operations, satellites perform routine tasks such as land patrols and crop monitoring according to plan; when sudden wildfires, floods, or earthquakes occur, the ground command center can mark the disaster area as an "emergency task." The system does not need to interrupt the current scheduling process or restart the global optimization algorithm. Instead, each satellite, based on its pre-trained policy network and current local state (such as remaining imaging attempts and visibility window), autonomously decides whether to reserve resources or respond immediately. The central coordinator only needs to perform lightweight conflict verification (such as prioritizing allocation to the earliest requester when multiple satellites simultaneously apply for the same grid point, and ensuring absolute priority for emergency tasks) to quickly output a feasible scheduling scheme. This "online incremental reconstruction" capability is significantly superior to existing systems that require replanning at the minute or even hour level.
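The lightweight conflict verification performed by the central coordinator can be sketched as a greedy pass over the satellites' requests. The text states only two rules (emergency tasks have absolute priority, and the earliest requester wins a contested grid point); the one-grid-per-satellite limit, the tie-break by satellite identifier, and the name `resolve_conflicts` are assumptions added for illustration.

```python
def resolve_conflicts(requests, emergency_grids):
    """Assign each grid to at most one satellite and each satellite to at most one grid.

    requests: list of (satellite_id, grid_id, request_time)
    emergency_grids: set of grid ids currently marked as emergency tasks
    Returns {grid_id: satellite_id}.
    """
    # Emergency grids sort first (False < True), then earlier requests, then satellite id.
    ordered = sorted(
        requests,
        key=lambda req: (req[1] not in emergency_grids, req[2], req[0]),
    )
    assignment = {}
    busy_satellites = set()
    for satellite_id, grid_id, _ in ordered:
        if grid_id in assignment or satellite_id in busy_satellites:
            continue  # grid already granted, or satellite already committed
        assignment[grid_id] = satellite_id
        busy_satellites.add(satellite_id)
    return assignment


# Usage: s2 asked for routine grid g9 first, but emergency grid g1 takes precedence.
requests = [("s1", "g1", 5.0), ("s2", "g1", 3.0), ("s2", "g9", 1.0), ("s3", "g2", 4.0)]
plan = resolve_conflicts(requests, emergency_grids={"g1"})
```

Because the pass is a single sort followed by one scan, its cost grows as n log n in the number of requests, which is consistent with the "lightweight" verification described above.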
[0114] Furthermore, the decentralized architecture and resource reservation mechanism of this invention make it highly suitable for deployment in onboard intelligent mission management systems or edge ground stations. Since each satellite makes decisions based solely on its local observation status, without relying on high-frequency inter-satellite communication or continuous full-status data transmission to the ground center, the load on the satellite-to-ground link can be significantly reduced, improving the system's robustness in situations of communication constraints or central node failure. Simultaneously, through a reinforcement learning-embedded "opportunity cost" incentive mechanism, the system can automatically balance routine mission throughput and emergency response capabilities during long-term operation, avoiding the problem of insufficient resources for urgent missions due to excessive execution of low-priority tasks. This characteristic is particularly important for remote sensing satellites with limited daily imaging attempts and short orbital periods.
[0115] Corresponding to the above-described method for resilient resource reconfiguration of multi-task concurrent remote sensing satellite networks, this invention also provides a system for resilient resource reconfiguration of multi-task concurrent remote sensing satellite networks. As shown in Figure 4, the system may include:
[0116] The region gridding module 401 is used to perform gridding processing on the target region to obtain multiple grid targets;
[0117] The observation window construction module 402 is used to determine the set of visible time windows for multiple grid targets of the satellite to be planned;
[0118] The local observation status determination module 403 is used to determine the current local observation status data based on the attribute data of the satellite to be planned, the current mission status data, and the set of visible time windows that have been acquired in advance.
[0119] The strategy network prediction module 404 is used to input the current local observation state data into the pre-trained target strategy network and output the next target action of the satellite to be planned. The next target action is an observation action, an active idle action, or a passive idle action. The active idle action means that an idle action is performed to wait for the handling of sudden tasks when there is an executable visible time window.
[0120] For details on the system, please refer to the steps of the first aspect of the method for resilient resource reconfiguration of multi-task concurrent remote sensing satellite networks, which will not be repeated here.
[0121] Furthermore, during the training phase, this embodiment uses the training system described above, as shown in Figure 5. The training system includes a regional target gridding module, a satellite network and observation window construction module, an agent state and action modeling module, and a decentralized reinforcement learning scheduling and decision-making module.
[0122] The regional target gridding module includes a regional target boundary parsing submodule, a coordinate projection transformation submodule, and a regular grid partitioning submodule. The regional target boundary parsing submodule parses the boundaries of the user-specified target region (or region sample) and converts them into a unified standard format. The coordinate projection transformation submodule converts geographic information from different coordinate systems into the same standard coordinate system, ensuring the consistency and compatibility of all data and facilitating satellite observation and scheduling. The regular grid partitioning submodule divides the target region (or region sample) into a series of regular grid targets (or grid samples). Each grid target (or grid sample) represents a potential observation task point, which supports efficient task allocation and resource management.
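As a rough illustration of the regular grid partitioning step, the sketch below splits a longitude/latitude bounding box into square cells. It assumes the boundary has already been parsed and projected into a common coordinate system expressed in degrees; `GridCell`, `partition_bbox`, and the `cell_deg` parameter are hypothetical names, not taken from the patent.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GridCell:
    row: int
    col: int
    lon_min: float
    lat_min: float
    lon_max: float
    lat_max: float


def partition_bbox(lon_min: float, lat_min: float,
                   lon_max: float, lat_max: float,
                   cell_deg: float) -> List[GridCell]:
    """Split a bounding box into regular cells of side cell_deg (degrees).

    Cells on the east and north edges are clipped to the box boundary.
    """
    if cell_deg <= 0:
        raise ValueError("cell_deg must be positive")
    if lon_max <= lon_min or lat_max <= lat_min:
        raise ValueError("bounding box must have positive extent")
    n_cols = math.ceil((lon_max - lon_min) / cell_deg)
    n_rows = math.ceil((lat_max - lat_min) / cell_deg)
    cells = []
    for r in range(n_rows):
        for c in range(n_cols):
            west = lon_min + c * cell_deg
            south = lat_min + r * cell_deg
            cells.append(GridCell(r, c, west, south,
                                  min(west + cell_deg, lon_max),
                                  min(south + cell_deg, lat_max)))
    return cells
```

Each returned cell would then correspond to one grid target (or grid sample). An irregular region boundary would additionally require discarding cells that fall outside the polygon, which this sketch does not attempt.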
[0123] The satellite network and observation window construction module includes a satellite parameter acquisition submodule and a visible time window calculation submodule. The satellite parameter acquisition submodule is used to collect and update key parameter information of each satellite, including but not limited to imaging resolution, sensor type, orbital altitude, etc., to provide basic data support for mission planning. The visible time window calculation submodule is used to calculate the visible time window of each grid target (or grid sample) within a specific time period, that is, the set of time periods during which the satellite can effectively observe the area.
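The following sketch shows one way a set of candidate visible time windows might be screened against a planning period and satellite parameters. The `VisibleWindow` record, the `filter_windows` function, and all threshold parameters are assumptions made for illustration; the actual window computation from orbital geometry is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class VisibleWindow:
    grid_id: int
    start_s: float       # window start, seconds from the planning epoch
    end_s: float         # window end, seconds from the planning epoch
    sensor_type: str
    resolution_m: float  # ground resolution in metres (smaller is finer)


def filter_windows(windows: Iterable[VisibleWindow],
                   horizon_start_s: float,
                   horizon_end_s: float,
                   allowed_sensors: Set[str],
                   max_resolution_m: float,
                   min_duration_s: float = 0.0) -> List[VisibleWindow]:
    """Keep windows that lie inside the planning period and meet the
    (assumed) sensor, resolution, and duration conditions."""
    kept = []
    for w in windows:
        inside = w.start_s >= horizon_start_s and w.end_s <= horizon_end_s
        long_enough = (w.end_s - w.start_s) >= min_duration_s
        sensor_ok = w.sensor_type in allowed_sensors
        resolution_ok = w.resolution_m <= max_resolution_m
        if inside and long_enough and sensor_ok and resolution_ok:
            kept.append(w)
    # Sort by start time so downstream code can scan windows chronologically.
    return sorted(kept, key=lambda w: w.start_s)
```

Sorting by start time is a convenience choice here, not something the source specifies.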
[0124] The agent state and action modeling module includes a state space submodule and an action modeling submodule. The state space submodule comprises a local observation state acquisition unit and a global state acquisition unit. The local observation state acquisition unit generates local observation state data, providing immediate feedback for decision-making. The global state acquisition unit integrates information from the entire satellite network, helping the agent understand the overall task execution status and resource utilization efficiency. The action modeling submodule includes an observation action execution unit and an active/passive idle unit. The observation action execution unit guides the satellite on when and where to perform imaging operations. When the satellite does not perform an observation, the active/passive idle unit determines whether the idle state is active waiting (an executable visible time window exists, but resources are reserved for sudden tasks) or passive waiting (no task is available), so as to optimize resource utilization.
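A local observation state could, for example, be flattened into a fixed-length feature vector as sketched below. The field names follow the attribute and mission status items listed in claim 5 (sensor type, ground resolution, maximum daily imaging limit, imaging count, latest mission end time); the normalization, the padding scheme, and the `build_local_observation` function itself are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SatelliteAttributes:
    sensor_type_id: int
    ground_resolution_m: float
    max_daily_images: int


@dataclass(frozen=True)
class MissionStatus:
    images_taken: int
    last_task_end_s: float


def build_local_observation(attrs: SatelliteAttributes,
                            status: MissionStatus,
                            windows: Sequence[Tuple[float, float, int]],
                            horizon_s: float,
                            max_windows: int) -> List[float]:
    """Encode one satellite's local state as a flat vector.

    windows: (start_s, end_s, is_sudden) tuples for this satellite.
    Only windows starting after the latest task end are kept, earliest
    first, truncated or zero-padded to max_windows entries.
    """
    obs = [
        float(attrs.sensor_type_id),
        attrs.ground_resolution_m,
        status.images_taken / attrs.max_daily_images,  # fraction of quota used
        status.last_task_end_s / horizon_s,
    ]
    upcoming = sorted(w for w in windows if w[0] >= status.last_task_end_s)[:max_windows]
    for start_s, end_s, is_sudden in upcoming:
        obs.extend([start_s / horizon_s, end_s / horizon_s, float(is_sudden)])
    obs.extend([0.0] * (3 * (max_windows - len(upcoming))))
    return obs
```

A fixed vector length keeps the input compatible with a standard feed-forward policy network, which is one plausible reason to pad rather than pass a variable-length window list.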
[0125] The decentralized reinforcement learning scheduling and decision-making module includes a policy network submodule, a shared value network submodule, an opportunity cost-based elastic resource reservation incentive submodule, and a central scheduling and decision-making submodule. The policy network submodule improves task completion rate and resource utilization through the trained policy network; the shared value network submodule establishes a shared value assessment system; the opportunity cost-based elastic resource reservation incentive submodule encourages agents to rationally reserve resources according to the opportunity cost principle to cope with sudden high-priority tasks; and the central scheduling and decision-making submodule is responsible for final coordination to ensure that critical tasks receive timely responses and processing.
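The source states only that an advantage estimate is computed from the value estimate and an advantage function. As one common choice, and purely as an assumption, the sketch below uses generalized advantage estimation (GAE); `values` stands for the shared value network's outputs on the global state at each step, and `gamma` and `lam` are hypothetical hyperparameters.

```python
from typing import List, Sequence


def gae_advantages(rewards: Sequence[float],
                   values: Sequence[float],
                   last_value: float,
                   gamma: float = 0.99,
                   lam: float = 0.95) -> List[float]:
    """Generalized advantage estimation over one trajectory.

    rewards[t] and values[t] belong to step t; last_value is the value
    estimate of the state after the final step (0.0 if terminal).
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def value_targets(advantages: Sequence[float],
                  values: Sequence[float]) -> List[float]:
    """Regression targets for the value network: advantage plus baseline."""
    return [a + v for a, v in zip(advantages, values)]
```

In such a setup the advantages would feed each satellite's policy loss, while the targets would feed the value loss; at deployment only the policy network is needed, which is consistent with the decentralized decision-making described above.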
[0126] The present invention provides a multi-task concurrent elastic resource reconfiguration system for remote sensing satellite networks, which has the following advantages:
[0127] First, this invention breaks through the limitation of existing methods that rely on the assumption of "batch prediction" or "static insertion" of sudden tasks, and achieves online incremental scheduling for continuous, random, streaming sudden events. This invention models sudden tasks as runtime random events, allowing any grid to change from a routine task to a sudden task at any time. It also achieves second-level action generation and lightweight conflict verification through a pre-trained policy network, without restarting the optimization process. This mechanism gives the system a true "dynamic emergency" response capability, filling the technological gap in the evolution from "static emergency scheduling" to "continuous online reconfiguration."
[0128] Second, this invention effectively solves the multi-satellite coordination problem of fully distributed methods while avoiding high communication overhead, overcoming the dual defects of poor scalability and weak plan coordination. This invention adopts a hybrid architecture of "decentralized decision-making + shared value guidance": each satellite uses only its local state to select actions, avoiding real-time communication; at the same time, a shared value network transmits global coordination signals during the training phase, enabling agents to implicitly learn complementary behaviors. This design achieves a communication complexity close to that of fully distributed methods while maintaining coordination effects close to those of centralized methods, making it particularly suitable for the engineering deployment of low-Earth-orbit remote sensing constellations with hundreds of satellites.
[0129] Third, this invention introduces for the first time a flexible resource reservation incentive mechanism based on opportunity cost, fundamentally solving the problems of "short-sighted execution and resource depletion" and significantly improving the long-term response performance of the system under continuous perturbation. Through its designed reward structure, this invention gives high-weight coverage and timeliness rewards to sudden tasks, provides only minor positive incentives to routine tasks, and imposes a controllable penalty for "actively remaining idle when there are tasks available." This allows the agent to spontaneously weigh "current benefits" against "future opportunity costs" during reinforcement learning, thereby forming a forward-looking resource reservation behavior.
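The reward structure described above can be sketched as a per-step sum of four terms. The exact formulas are not reproduced in this text, so the functional forms, the default weights, and the names `RewardWeights`, `timeliness`, and `step_reward` below are all assumptions chosen only to match the verbal description: a coverage reward and a timeliness reward for sudden tasks, a small positive reward for routine tasks, and a penalty for idling while routine tasks are executable.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class RewardWeights:
    sudden_coverage: float = 10.0   # assumed: large weight for sudden tasks
    sudden_timeliness: float = 5.0  # assumed: bonus for fast response
    routine: float = 0.1            # assumed: minor incentive per routine task
    idle_penalty: float = 0.05      # assumed: small cost per skipped routine task


def timeliness(trigger_s: float, observed_s: float,
               max_latency_s: float, eps: float = 1e-6) -> float:
    """Score that decreases from about 1 to 0 as response latency grows."""
    latency = max(0.0, observed_s - trigger_s)
    return max(0.0, (max_latency_s - latency) / (max_latency_s + eps))


def step_reward(first_sudden_obs: Sequence[Tuple[float, float, float]],
                total_sudden: int,
                routine_done: int,
                routine_skipped_by_idle: int,
                w: RewardWeights = RewardWeights(),
                eps: float = 1e-6) -> float:
    """Reward for one decision step.

    first_sudden_obs: (trigger_s, observed_s, max_latency_s) for each sudden
        task first observed in this step.
    routine_skipped_by_idle: executable routine tasks left unselected because
        the satellite idled (0 if the satellite did not idle).
    """
    coverage = w.sudden_coverage * len(first_sudden_obs) / (total_sudden + eps)
    timely = w.sudden_timeliness * sum(
        timeliness(*o, eps=eps) for o in first_sudden_obs)
    routine = w.routine * routine_done
    penalty = w.idle_penalty * routine_skipped_by_idle
    return coverage + timely + routine - penalty
```

With weights of this shape, active idling costs little in the current step but can pay off later if it preserves imaging quota for a sudden task, which is the opportunity cost trade-off the agent is expected to learn.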
[0130] It should be noted that, for the system, since it is basically similar to the method embodiment, the description is relatively simple, and relevant parts can be found in the description of the method embodiment.
[0131] It should be noted that the terms "first," "second," etc., are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the invention.
[0132] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Furthermore, those skilled in the art can combine and integrate the different embodiments or examples described in this specification.
[0133] Although the invention has been described herein in conjunction with various embodiments, those skilled in the art will understand and implement other variations of the disclosed embodiments by reviewing the accompanying drawings and the disclosure in carrying out the claimed invention. In the description of the invention, the word "comprising" does not exclude other components or steps, "a" or "an" does not exclude a plurality, and "a plurality" means two or more, unless otherwise explicitly specified. Furthermore, while different embodiments may describe certain measures, this does not mean that these measures cannot be combined to produce good results.
[0134] The above description, in conjunction with specific preferred embodiments, provides a further detailed explanation of the present invention. It should not be construed that the specific implementation of the present invention is limited to these descriptions. For those skilled in the art, various simple deductions or substitutions can be made without departing from the concept of the present invention, and all such modifications and substitutions should be considered within the scope of protection of the present invention.
Claims
1. A method for elastic resource reconfiguration of a multi-task concurrent remote sensing satellite network, characterized in that the method includes:
gridding a target area to obtain multiple grid targets;
constructing an initial set of visible time windows of a satellite to be planned for the multiple grid targets;
filtering the initial set of visible time windows, based on the imaging type, resolution, start time, and end time of each initial visible time window in the initial set and on preset filtering conditions, to obtain the set of visible time windows of the satellite to be planned for the multiple grid targets;
determining current local observation state data based on pre-acquired attribute data of the satellite to be planned, current mission status data, and the set of visible time windows;
inputting the current local observation state data into a pre-trained target policy network, which outputs the next target action of the satellite to be planned, wherein the next target action is an observation action, an active idle action, or a passive idle action, and the active idle action means that, when an executable visible time window exists, an idle action is performed in order to wait for sudden tasks;
wherein the training process of the target policy network includes:
determining local observation state data samples of each satellite based on pre-acquired attribute data samples, mission status data samples, and visible time window set samples of each satellite for multiple grid samples in a regional sample;
determining global state data samples based on pre-determined task data samples of each grid sample and current imaging count samples of each satellite;
sampling, by the policy network of each satellite and based on the local observation state data samples, action samples from a pre-constructed action space, and calculating reward samples corresponding to the action samples based on a preset reward function to generate trajectory data, wherein the visible time window set samples in the local observation state data samples include sudden tasks and routine tasks;
inputting the trajectory data and the global state data samples into the value network of each satellite to obtain a value estimate, and calculating an advantage estimate based on the value estimate and an advantage function;
updating the parameters of the policy network based on the advantage estimate and a preset policy network loss function; substituting the value estimate and a target value into a preset value loss function to calculate a value loss, and updating the parameters of the value network through backpropagation; and repeating until the iterative updates of the policy network and the value network are complete, thereby obtaining the target policy network of each satellite.
2. The method for elastic resource reconfiguration of a multi-task concurrent remote sensing satellite network according to claim 1, characterized in that the preset reward function is expressed as: [equation image not reproduced] where the symbols denote, respectively: the preset reward function; the sudden coverage reward, granted for each sudden task that achieves its first successful observation; the sudden timeliness reward, granted for each sudden task that achieves its first successful observation; the positive reward for each routine task that achieves its first successful observation; the penalty applied when at least one executable routine task exists but the satellite chooses an idle action; and the index of the decision step.
3. The method for elastic resource reconfiguration of a multi-task concurrent remote sensing satellite network according to claim 2, characterized in that:
the sudden coverage reward is expressed as: [equation image not reproduced] where the symbols denote, respectively: an indicator of whether the grid sample corresponding to each sudden task is effectively observed for the first time in the decision step; the coverage variable of that grid sample, i.e., whether it has been effectively covered; the set of sudden tasks in the training episode; and the total number of sudden tasks in the training episode;
the sudden timeliness reward is expressed as: [equation image not reproduced] where the symbols denote, respectively: the maximum possible response latency of the grid sample; the time at which the grid sample is observed; the trigger time of the sudden task of the grid sample; and a constant that prevents division by zero;
the positive reward is expressed as: [equation image not reproduced] where the symbols denote, respectively: a weighting factor; and the number of routine tasks newly completed in the decision step;
the penalty is expressed as: [equation image not reproduced] where the symbol denotes the number of executable routine tasks that were not selected for execution in the decision step because of the idle action.
4. The method for elastic resource reconfiguration of a multi-task concurrent remote sensing satellite network according to claim 3, characterized in that the task data sample includes the coverage variable of each grid sample, the task type indicator, the trigger time of the sudden task, and the actual imaging time of each imaged grid sample.
5. The method for elastic resource reconfiguration of a multi-task concurrent remote sensing satellite network according to claim 4, characterized in that the step of determining the current local observation state data based on the pre-acquired attribute data of the satellite to be planned, the current mission status data, and the set of visible time windows includes:
acquiring the attribute data and the current mission status data of the satellite to be planned, wherein the attribute data includes the satellite remote sensor type, ground resolution, and maximum daily imaging limit, and the current mission status data includes the number of imaging operations already performed by the satellite to be planned and the latest mission end time; and
constructing the current local observation state data based on the set of visible time windows, the attribute data of the satellite to be planned, and the current mission status data.
6. A multi-task concurrent remote sensing satellite network elastic resource reconfiguration system, characterized in that the system includes:
a region gridding module, used to grid a target region to obtain multiple grid targets;
an observation window construction module, used to construct an initial set of visible time windows of a satellite to be planned for the multiple grid targets, and to filter the initial set of visible time windows, based on the imaging type, resolution, start time, and end time of each initial visible time window in the initial set and on preset filtering conditions, to obtain the set of visible time windows of the satellite to be planned for the multiple grid targets;
a local observation state determination module, used to determine current local observation state data based on pre-acquired attribute data of the satellite to be planned, current mission status data, and the set of visible time windows;
a policy network prediction module, used to input the current local observation state data into a pre-trained target policy network and output the next target action of the satellite to be planned, wherein the next target action is an observation action, an active idle action, or a passive idle action, and the active idle action means that, when an executable visible time window exists, an idle action is performed in order to wait for sudden tasks;
wherein the training process of the target policy network includes:
determining local observation state data samples of each satellite based on pre-acquired attribute data samples, mission status data samples, and visible time window set samples of each satellite for multiple grid samples in a regional sample;
determining global state data samples based on pre-determined task data samples of each grid sample and current imaging count samples of each satellite;
sampling, by the policy network of each satellite and based on the local observation state data samples, action samples from a pre-constructed action space, and calculating reward samples corresponding to the action samples based on a preset reward function to generate trajectory data, wherein the visible time window set samples in the local observation state data samples include sudden tasks and routine tasks;
inputting the trajectory data and the global state data samples into the value network of each satellite to obtain a value estimate, and calculating an advantage estimate based on the value estimate and an advantage function;
updating the parameters of the policy network based on the advantage estimate and a preset policy network loss function; substituting the value estimate and a target value into a preset value loss function to calculate a value loss, and updating the parameters of the value network through backpropagation; and repeating until the iterative updates of the policy network and the value network are complete, thereby obtaining the target policy network of each satellite.