Low earth orbit satellite resource scheduling system and method

The low-Earth orbit satellite resource scheduling system, which combines channel prediction with multi-agent collaboration, addresses the highly dynamic nature of low-Earth orbit satellite networks and the insufficient accuracy of existing channel prediction, achieves global optimization of resource allocation and task scheduling, and improves the system's adaptability and robustness.

CN122247493A · Pending · Publication Date: 2026-06-19 · HARBIN INST OF TECH +2

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: HARBIN INST OF TECH
Filing Date: 2026-05-15
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing low-Earth orbit satellite network resource scheduling technologies are poorly suited to the networks' highly dynamic characteristics, lack channel prediction accuracy, have limited multi-satellite collaborative optimization capabilities, and are not robust enough in extreme scenarios; as a result, they fail to achieve global optimization of resource allocation and task scheduling.

Method used

By combining channel prediction with multi-agent collaboration, a UE-HAP-LEO three-layer network environment is constructed, together with an FMA channel prediction module and a MADDPG multi-agent collaborative scheduling framework. A modular system structure and a time-slotted workflow achieve joint optimization scheduling of task offloading, spectrum resources, and transmit power.

Benefits of technology

It improves the scheduling accuracy and adaptability of the system in environments with frequent topology switching and severe channel fluctuations, realizes joint optimization and balance of multi-objective performance, improves transmission efficiency and load balancing, and enhances the robustness and scalability of the system.

✦ Generated by Eureka AI based on patent content.

Figure CN122247493A_ABST

Abstract

This invention discloses a low-Earth orbit (LEO) satellite resource scheduling system and method, belonging to the interdisciplinary field of LEO satellite communication and mobile edge computing. The system includes a network state acquisition and environment construction module, a channel state prediction module, a collaborative scheduling decision module, a scheduling execution and performance evaluation module, and a model training and parameter management module. It constructs a three-layer dynamic network environment (UE-HAP-LEO), utilizes an FMA module integrating Mamba and attention mechanisms to predict future channel states, and uses the predicted information as input to an enhanced observation multi-agent decision network. This network collaboratively generates joint strategies for task offloading, resource allocation, and power control through centralized training and distributed execution. This invention achieves global optimization under multiple objective constraints such as throughput, latency, energy efficiency, and fairness through a closed-loop "perception-prediction-decision-execution" process, significantly improving system resource utilization efficiency and collaborative scheduling performance.

Description

Technical Field

[0001] This invention belongs to the interdisciplinary field of low-Earth orbit satellite communication and mobile edge computing (MEC), specifically relating to a low-Earth orbit satellite resource scheduling system and method. It can be applied to scenarios such as 6G non-terrestrial networks (NTN), seamless global communication, disaster emergency response, and network coverage in remote areas.

Background Technology

[0002] With the rapid development of the Internet of Things (IoT), mobile internet, and artificial intelligence (AI) applications, the demand for low latency, high bandwidth, and computing power in emerging applications such as augmented reality / virtual reality (AR / VR), real-time autonomous driving perception, and precise control of industrial IoT is growing exponentially. Traditional terrestrial networks are limited by physical deployment and geographical environment, resulting in significant coverage blind spots in complex terrains (mountains, canyons), remote areas (deserts, forests), vast oceans, and high-altitude environments. They are also vulnerable to damage from natural disasters (earthquakes, floods) and extreme weather (hurricanes, blizzards), making it difficult to guarantee reliable network access in critical areas.

[0003] Low Earth Orbit (LEO) satellite networks, with their advantages of low altitude, wide coverage, and relatively controllable latency, have become a core solution for achieving seamless global coverage. Deeply integrating LEO satellite networks with Mobile Edge Computing (MEC) to construct a collaborative three-layer computing framework of "terminal-satellite-cloud" can achieve an optimal balance between global performance and resource utilization efficiency through flexible task offloading paths (local computing → LEO satellite edge computing → terrestrial cloud server). However, the highly dynamic nature of LEO satellite networks (rapid satellite movement, frequent topology changes, and drastic fluctuations in channel conditions), the optimization challenges of multi-objective coupling, and the complexity of multi-satellite collaboration make existing resource scheduling technologies insufficient to meet practical application needs. Therefore, there is an urgent need to develop intelligent scheduling technologies with dynamic adaptability, accurate prediction, and collaborative optimization capabilities.

[0004] Existing technologies for resource allocation and task scheduling in low-Earth orbit (LEO) satellite constellations fall into several main categories: resource allocation methods, multiple access technologies, SMEC architectures, multi-satellite cooperation strategies, and channel prediction and scheduling algorithms. Resource allocation can be fixed or dynamic. Fixed allocation tends to waste resources because service demand is unevenly distributed in space and time. Dynamic allocation focuses on flexible scheduling of multi-domain resources but rarely considers core characteristics such as inter-beam interference, multi-beam joint scheduling, and LEO satellite mobility. Among multiple access technologies, NOMA lets multiple users share time-frequency resources through superposition coding and successive interference cancellation, but existing schemes give insufficient consideration to channel gain differences when grouping users, use power allocation algorithms with high computational complexity, or fail to account for service and channel differences. SMEC architectures are mostly three-layer "terminal-satellite-cloud" collaborative systems, but satellite and ground-cloud computing power are insufficiently integrated, and adaptation to dynamic environments relies on static models or convex optimization theory, making it difficult to handle time-varying, non-convex joint optimization problems.

[0005] Multi-satellite collaboration often employs distributed game theory or federated learning, modeling inter-satellite task allocation as a non-cooperative game or achieving collaboration through model parameter aggregation. However, dynamic collaborative modeling for large-scale constellation clusters remains immature and lacks robustness in extreme scenarios such as satellite failures and sudden traffic surges. Channel prediction often relies on traditional models such as LSTM and TCN, which capture long-term dependencies poorly and respond slowly to sudden local disturbances. Traditional multi-agent reinforcement learning methods, in turn, suffer from low collaboration efficiency, reward inflation, and decision lag, making them unsuitable for the highly dynamic nature of low-Earth orbit satellite networks. Overall, existing technologies struggle to accurately characterize the dynamics of satellite networks, suffer from insufficient channel prediction accuracy that leads to scheduling failures, lack multi-satellite collaborative optimization capabilities, and are insufficiently robust in extreme scenarios, and therefore fail to achieve global optimization of resource allocation and task scheduling.

Summary of the Invention

[0006] The purpose of this invention is to address the shortcomings of the existing technologies by proposing a low-Earth orbit (LEO) satellite resource scheduling system and method. For a UE-HAP-LEO three-layer non-terrestrial network structure, this invention organically combines network environment modeling, channel prediction, and multi-agent reinforcement learning. Through a modular system structure and a time-series methodology, it achieves joint optimization scheduling of task offloading, spectrum resources, and transmit power.

[0007] To achieve the above-mentioned technical effects, the technical solution adopted by the present invention is as follows: A low-Earth orbit satellite resource scheduling system based on channel prediction and multi-agent collaboration includes a network state acquisition and environment construction module, a channel state prediction module, a collaborative scheduling decision module, a scheduling execution and performance evaluation module, and a model training and parameter management module. The channel state prediction module is preferably implemented using the FMA channel prediction module, and the cooperative scheduling decision module is preferably implemented using the MADDPG multi-agent cooperative scheduling framework.

[0008] The network state acquisition and environment construction module constructs a UE-HAP-LEO three-layer integrated air-space-ground network environment; it provides the channel state prediction module with a multi-time-slot historical sequence of instantaneous channel state information, and provides the collaborative scheduling decision module with the current global state and the local observation of each agent. The channel state prediction module predicts the instantaneous channel state information of key links over several future time slots in a rapidly changing, partially observable environment, and provides the collaborative scheduling decision module with a predicted instantaneous channel state information sequence organized by link and time slot; this sequence is added to the local state of each agent in the collaborative scheduling decision module as enhanced observation information. The collaborative scheduling decision module generates multi-node collaborative task offloading, resource allocation, and power control strategies based on the prediction-enhanced state information, so as to jointly optimize the system's multiple performance objectives. The scheduling execution and performance evaluation module maps the joint decision produced by the collaborative scheduling decision module to concrete network behaviors and provides performance evaluation feedback on the execution results. The model training and parameter management module maintains the experience replay pool and the model parameters during training.

[0009] Furthermore, the network state acquisition and environment construction module is implemented as follows: given the basic parameters of the low-Earth orbit satellite constellation orbits, the deployment altitude and geographical location of the high-altitude platforms, the spatial distribution and service arrival process of ground users, and the radio-frequency carrier frequency and total system bandwidth, the module establishes the visibility relationships and topology evolution model of the three link types (UE-HAP, UE-LEO, and HAP-LEO) to obtain a time-varying network topology. For each link type, a channel model covering free-space path loss, shadow fading, small-scale fading, and Doppler shift is established, and the instantaneous channel state information, received signal-to-noise ratio, achievable rate, and bit error rate of each link in the current time slot are calculated. Based on the current link conditions and node resources, the module calculates the instantaneous rate and end-to-end transmission delay of each UE-HAP link, and collects statistics on the cumulative throughput of each user and the Jain fairness index across users, the queue length and remaining computing and spectrum resources of each HAP / LEO node, and the number of HAP-LEO cooperative events. The topology, channel, and service information are encoded into the global network state of the current time slot and the local state observable by each node; at the same time, the instantaneous channel state information sequence of several historical time slots is cached as input for the channel state prediction module.
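The per-link quantities listed in [0009] can be illustrated with a textbook link budget. The sketch below is a minimal example, assuming free-space path loss, caller-supplied shadowing and small-scale fading draws, thermal noise at 290 K, and the Shannon bound as the achievable rate; the function names and parameters are hypothetical, and the filing does not specify its exact channel equations.

```python
import math

C = 299_792_458.0      # speed of light, m/s
K_B = 1.380649e-23     # Boltzmann constant, J/K


def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss 20*log10(4*pi*d*f/c), in dB."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / C)


def doppler_shift_hz(radial_velocity_mps: float, freq_hz: float) -> float:
    """First-order Doppler shift f_d = v_r * f_c / c (positive when the nodes approach)."""
    return radial_velocity_mps * freq_hz / C


def snr_db(tx_power_dbm, antenna_gains_db, distance_m, freq_hz, bandwidth_hz,
           shadowing_db=0.0, fading_power_gain=1.0, noise_temp_k=290.0):
    """Received SNR from a simple link budget.

    shadowing_db: shadowing loss drawn by the caller (dB).
    fading_power_gain: small-scale fading power gain |h|^2 drawn by the caller (linear).
    """
    rx_dbm = (tx_power_dbm + antenna_gains_db - fspl_db(distance_m, freq_hz)
              - shadowing_db + 10.0 * math.log10(fading_power_gain))
    noise_dbm = 10.0 * math.log10(K_B * noise_temp_k * bandwidth_hz) + 30.0  # kTB in dBm
    return rx_dbm - noise_dbm


def achievable_rate_bps(snr_db_value: float, bandwidth_hz: float) -> float:
    """Shannon bound B * log2(1 + SNR)."""
    return bandwidth_hz * math.log2(1.0 + 10.0 ** (snr_db_value / 10.0))


def jain_index(throughputs) -> float:
    """Jain fairness index (sum x)^2 / (n * sum x^2); 1.0 means perfectly equal shares."""
    total = sum(throughputs)
    squares = sum(x * x for x in throughputs)
    return total * total / (len(throughputs) * squares) if squares > 0 else 1.0
```

For example, a Rayleigh fading power gain can be drawn with `random.expovariate(1.0)` and log-normal shadowing with `random.gauss(0.0, sigma_db)`; `jain_index([1, 1, 1, 1])` returns 1.0, while `jain_index([1, 0, 0, 0])` returns 0.25.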

[0010] Furthermore, the channel state prediction module is preferably implemented using an FMA channel prediction module, as follows: the module receives, from the network state acquisition and environment construction module, the instantaneous channel state information sequence of the target link over the past L time slots together with link-related dynamic features. The dynamic features are normalized and encoded to form a unified time-series input representation. Channel prediction is performed by a time-series modeling structure that integrates the Mamba state-space model with a multi-head self-attention mechanism. The Mamba state-space model uses an input-dependent state transition mechanism to capture the long-time-scale evolution of the instantaneous channel state information with approximately linear computational complexity. The multi-head self-attention mechanism then applies weighted modeling to the hidden sequence output by Mamba, focusing on sudden interference and rapid changes in link quality over short time scales. The two temporal features are fused through residual connections, normalization, and a feed-forward network to obtain a stable high-dimensional temporal representation, and temporal pooling followed by regression in a fully connected output layer yields the predicted instantaneous channel state information of the target link over several future time slots. The collaborative scheduling decision module is preferably implemented using the MADDPG multi-agent collaborative scheduling framework. 
The UE, HAP, and LEO nodes are modeled as multiple agents, and a multi-agent deep deterministic policy gradient framework with centralized training and distributed execution is adopted. During the training phase, a centralized Critic network receives the global network state and the joint actions of all agents and evaluates their value; during the execution phase, the Actor network of each agent outputs actions independently, based only on the local observations of its own node and the corresponding predicted instantaneous channel state information, which keeps the system scalable in large-scale non-terrestrial network scenarios. State and action spaces are designed for each type of agent: the state includes the node's service queue length, its remaining computing and spectrum resources, the current and predicted instantaneous channel state information of the relevant links, and the load status of neighboring nodes, among others; the actions include task offloading decisions, the allocation ratio of local or on-board computing resources among tasks, transmit power control on the UE-HAP and HAP-LEO links, and task prioritization under resource constraints. A multi-objective constrained reward function combines the system's total throughput, average latency, energy consumption, task completion rate, fairness, and number of HAP-LEO collaborative events into a single scalar reward and introduces anti-idle and anti-free-rider penalty terms: when the average latency exceeds its threshold or the Jain fairness index falls below its target value, the corresponding penalty term is amplified, and additional penalties are applied when the system remains in a low-cooperation state for a long time or when individual agents do not contribute resources for a long time. 
Using experience replay and soft target-network updates, the parameters of the Critic network and of each Actor network are updated iteratively until the cumulative reward converges, yielding a stable multi-agent scheduling strategy. The output of the collaborative scheduling decision module is a joint scheduling decision for all UE / HAP / LEO nodes, including the execution location and transmission path of each task, the bandwidth and power allocation scheme of each link, and the scheduling order of the task queues within each node. This decision is then carried out in the network environment by the scheduling execution and performance evaluation module.
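The centralized-training, distributed-execution split in [0010] can be sketched as follows. This is a minimal NumPy illustration, assuming one-layer actors with tanh outputs, a small ReLU critic, and Polyak (soft) target updates with a hypothetical rate `tau`; the class names, dimensions, and agent mix are illustrative only, and the full MADDPG actor and critic losses are omitted.

```python
import numpy as np


class Actor:
    """Decentralized policy mu_i: enhanced local observation -> continuous action in [-1, 1]."""

    def __init__(self, obs_dim, act_dim, rng):
        self.params = {"W": rng.normal(0.0, 0.1, (obs_dim, act_dim))}

    def act(self, obs):
        return np.tanh(obs @ self.params["W"])


class CentralCritic:
    """Centralized Q(s, a_1, ..., a_N): sees the global state and all agents' actions."""

    def __init__(self, state_dim, joint_act_dim, rng, hidden=32):
        self.params = {
            "W1": rng.normal(0.0, 0.1, (state_dim + joint_act_dim, hidden)),
            "w2": rng.normal(0.0, 0.1, hidden),
        }

    def q(self, state, joint_action):
        z = np.concatenate([state, joint_action])
        return float(np.maximum(z @ self.params["W1"], 0.0) @ self.params["w2"])


def soft_update(target, online, tau=0.01):
    """Polyak averaging of a target network: theta' <- tau * theta + (1 - tau) * theta'."""
    for name, value in online.params.items():
        target.params[name] = tau * value + (1.0 - tau) * target.params[name]


rng = np.random.default_rng(0)
obs_dims, act_dim, state_dim = [10, 10, 12], 3, 24   # e.g. two HAP agents and one LEO agent
actors = [Actor(d, act_dim, rng) for d in obs_dims]
critic = CentralCritic(state_dim, act_dim * len(actors), rng)
target_critic = CentralCritic(state_dim, act_dim * len(actors), rng)

# Distributed execution: each agent acts on its own (enhanced) local observation only.
actions = [actor.act(rng.normal(size=d)) for actor, d in zip(actors, obs_dims)]
# Centralized training: the critic scores the global state together with the joint action.
q_value = critic.q(rng.normal(size=state_dim), np.concatenate(actions))
soft_update(target_critic, critic, tau=0.01)
```

Only `Actor.act` is needed at deployment time; the critic and its target copy exist solely for training, which is what allows each node to act on its own observation.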

[0011] Furthermore, the scheduling execution and performance evaluation module is implemented as follows: according to the scheduling decisions, it adjusts the UE-HAP association, the OFDM subcarrier allocation, and the HAP-LEO NOMA superposition power, and triggers local execution or uplink offloading via HAP / LEO for the selected tasks. It updates the queue length, link occupancy, and energy consumption of each node and, at the end of the current time slot, calculates performance indicators including total system throughput, average latency, fairness index, number of HAP-LEO cooperative events, and overall resource utilization efficiency. The updated network state and performance indicators are returned to the network state acquisition and environment construction module as the initial state of the next time slot; meanwhile, the state, actions, and reward of this time slot are sent to the model training and parameter management module.
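The HAP-LEO NOMA superposition power adjustment in [0011] relies on successive interference cancellation (SIC). The sketch below computes achievable rates for the textbook two-user, downlink-style power-domain NOMA case on one resource block, assuming Shannon rates and perfect SIC; the function and parameter names are hypothetical. The filing does not state the link direction or the number of superposed users, and for uplink NOMA with SIC at the LEO receiver the rate expressions differ.

```python
import math


def noma_two_user_rates(p_total_w, alpha_weak, g_strong, g_weak, noise_w, bandwidth_hz):
    """Achievable (strong, weak) rates for two-user power-domain NOMA with SIC.

    alpha_weak: share of p_total_w given to the weaker user (usually > 0.5).
    g_strong, g_weak: channel power gains |h|^2, with g_strong >= g_weak.
    bandwidth_hz: e.g. number of allocated OFDM subcarriers * subcarrier spacing.
    """
    if g_strong < g_weak:
        raise ValueError("SIC decoding order assumes g_strong >= g_weak")
    p_weak = alpha_weak * p_total_w
    p_strong = p_total_w - p_weak
    # Weak user decodes its own layer, treating the strong user's layer as interference.
    sinr_weak = p_weak * g_weak / (p_strong * g_weak + noise_w)
    # Strong user first decodes and cancels the weak user's layer (SIC), then decodes its own.
    sinr_strong = p_strong * g_strong / noise_w
    return (bandwidth_hz * math.log2(1.0 + sinr_strong),
            bandwidth_hz * math.log2(1.0 + sinr_weak))
```

Because `g_strong >= g_weak`, the strong user can always decode the weak user's layer at least as reliably as the weak user itself, which is why the decoding order is fixed by channel gain.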

[0012] Furthermore, the model training and parameter management module receives the state-action-reward-next-state quadruples from the environment, prediction, and scheduling modules, along with the corresponding instantaneous channel state information and predicted instantaneous channel state information sequences, and stores and prioritizes them according to a given strategy. During the training phase, samples are periodically drawn from the experience pool to compute the prediction error loss of the channel state prediction module and the temporal-difference loss and policy gradient of MADDPG, and the prediction model and the multi-agent policy networks are then updated jointly or in stages. After training converges, the module distributes the updated model parameters to each execution node or inference platform; during online operation, only forward inference is performed and the parameters are not updated.
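[0012] leaves the storage and prioritization strategy open. One common choice is proportional prioritized experience replay, sketched below as an assumption rather than the filing's method: new samples enter with the current maximum priority, the sampling probability is proportional to priority raised to `alpha`, and importance-sampling weights with exponent `beta` correct the resulting bias. The class name and defaults are hypothetical, and sampling is O(n) for clarity instead of using a sum tree.

```python
import numpy as np


class PrioritizedReplay:
    """Proportional prioritized replay over a fixed-capacity ring buffer."""

    def __init__(self, capacity, alpha=0.6, seed=0):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.prio = [], []
        self.pos = 0                                   # next slot to overwrite once full
        self.rng = np.random.default_rng(seed)

    def add(self, transition):
        p = max(self.prio, default=1.0)               # new samples get max priority
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.prio.append(p)
        else:
            self.data[self.pos] = transition
            self.prio[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        scaled = np.asarray(self.prio) ** self.alpha
        probs = scaled / scaled.sum()
        idx = self.rng.choice(len(self.data), size=batch_size, p=probs)
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()                       # normalized importance-sampling weights
        return idx, [self.data[i] for i in idx], weights

    def update(self, idx, td_errors, eps=1e-6):
        for i, err in zip(idx, td_errors):
            self.prio[i] = abs(err) + eps              # keep every priority strictly positive
```

Each stored transition can be a tuple holding the state, joint action, reward, next state, current CSI, and predicted CSI; after a training step, `update` is called with the new TD errors of the sampled indices.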

[0013] A method for scheduling low-Earth orbit (LEO) satellite resources based on channel prediction and multi-agent collaboration is implemented by the LEO satellite resource scheduling system based on channel prediction and multi-agent collaboration and includes the following steps: Step 1: Based on the preset LEO constellation orbit parameters, HAP deployment scheme, ground user distribution and service arrival model, and physical-layer bandwidth and power constraints, construct the UE-HAP-LEO three-layer network; the network state acquisition and environment construction module generates the initial network topology, channel state, and node resource state. Step 2: In each scheduling time slot, the network state acquisition and environment construction module collects the current location of each node, link visibility, instantaneous channel state information measurements, queue lengths, and resource occupancy; updates the channel state of the UE-HAP and HAP-LEO links; calculates key indicators such as instantaneous rate, latency, fairness, and number of cooperative events; forms the current global network state and the local observation of each agent; and caches the historical sequence of instantaneous channel state information over the most recent time slots. Step 3: Input the historical instantaneous channel state information sequence of the target link and the dynamic features of related nodes into the channel state prediction module to predict the link's instantaneous channel state information over the next few time slots; preferably, the channel state prediction module is implemented using an FMA channel prediction module. 
The predicted instantaneous channel state information is added as enhanced observation information to the local state of the corresponding agent. Step 4: Each agent inputs its enhanced local observation, containing the current state and the predicted instantaneous channel state information, into the collaborative scheduling decision module, which outputs the task offloading, bandwidth allocation, power control, and task priority decisions for the current time slot; preferably, the collaborative scheduling decision module is implemented using the MADDPG multi-agent reinforcement learning framework. The actions output by all agents together form the joint scheduling scheme for this time slot. Step 5: The scheduling execution and performance evaluation module carries out task offloading, resource allocation, and power control in the network environment according to the joint scheduling scheme, driving the actual data transmission and computation; at the end of the time slot, it updates the network topology, channel, and resource state and, based on actual service completion, calculates the total system throughput, average latency, energy consumption, fairness index, and number of HAP-LEO cooperative events, forming the performance feedback for this time slot. Step 6: Based on the performance feedback and preset multi-objective constraints, throughput, latency, energy efficiency, fairness, task completion rate, and number of collaborative events are weighted and combined into a single reward signal, and penalties are applied for latency violations, decreased fairness, insufficient collaboration, and long-term agent idleness, yielding the global reward for the current time slot. The model training and parameter management module stores the current time slot's state, joint action, and reward, together with the next time slot's state, as an experience sample in the experience replay pool. 
During the training phase, samples are periodically extracted from the experience pool to update the model parameters of the channel state prediction module and the MADDPG policy network parameters. During the online deployment phase, the model parameters are fixed, and steps 2 to 5 are executed cyclically to achieve real-time intelligent scheduling.
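The control flow of Steps 1 to 6 can be summarized as a per-slot loop. The sketch below is illustrative only: `ToyEnv`, `persistence_predictor`, `constant_agent`, and `run_scheduler` are hypothetical stand-ins (a random scalar CSI process, a last-value predictor in place of the FMA module, and a placeholder policy), and the periodic training update is left out. Passing `replay=None` corresponds to the online deployment phase, in which Steps 2 to 5 run with fixed parameters and nothing is stored.

```python
import random
from collections import deque


class ToyEnv:
    """Hypothetical stand-in for the environment module: two agents, one scalar CSI per slot."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def _observe(self):
        csi = [self.rng.gauss(1.0, 0.1)]
        return [0.0], [[0.0], [0.0]], csi            # global state, local obs per agent, CSI

    def reset(self):
        return self._observe()

    def step(self, joint_action):
        reward = -sum(abs(a) for a in joint_action)  # placeholder for the Step 6 weighted reward
        return (*self._observe(), reward)


def persistence_predictor(history, horizon=2):
    """Placeholder for the prediction module: repeats the latest CSI sample over the horizon."""
    return list(history[-1]) * horizon


def run_scheduler(env, predictor, agents, num_slots, history_len=8, replay=None):
    history = deque(maxlen=history_len)              # Step 2: cache of recent CSI
    state, local_obs, csi = env.reset()              # Step 1: initial topology/channel/resources
    for _ in range(num_slots):
        history.append(csi)
        csi_pred = predictor(list(history))          # Step 3: predict upcoming CSI
        enhanced = [obs + csi_pred for obs in local_obs]                    # enhanced observation
        joint_action = [agent(o) for agent, o in zip(agents, enhanced)]     # Step 4
        next_state, next_obs, next_csi, reward = env.step(joint_action)     # Steps 5 and 6
        if replay is not None:                       # training phase only
            replay.append((state, joint_action, reward, next_state))
        state, local_obs, csi = next_state, next_obs, next_csi
    return replay


def constant_agent(obs):
    return 0.1 * sum(obs)                            # hypothetical policy


samples = run_scheduler(ToyEnv(), persistence_predictor, [constant_agent] * 2,
                        num_slots=5, replay=[])
```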

[0014] Compared with the prior art, the present invention achieves the following beneficial effects: 1. By introducing a channel state prediction module, the system can predict the channel state of key links over several future time slots and feed the prediction results into the collaborative scheduling decision module as enhanced observations, making the scheduling strategy forward-looking. This alleviates the resource allocation mismatch caused by rapid channel variation and decision lag, and improves the system's adaptability and scheduling accuracy in highly dynamic LEO network environments with frequent topology switching and drastic channel fluctuations.

[0015] 2. This invention adopts a multi-agent reinforcement learning collaborative scheduling framework (preferably MADDPG) with centralized training and distributed execution, and constructs a multi-objective constrained reward function. It combines indicators such as total system throughput, average latency, energy consumption, task completion rate, fairness, and cross-layer collaboration into a single reward signal according to weights, thereby achieving joint optimization and balance of multi-objective performance at the system level.

[0016] 3. During the distributed execution phase, each agent makes decisions independently based solely on its local augmented observations, which reduces reliance on real-time aggregation of global information and keeps the system scalable in large-scale UE / HAP / LEO scenarios. In addition, the model training and parameter management module supports continuous learning and parameter updates, enabling the system to adapt to changes in network parameters, shifts in service patterns, and partial node failures, thereby improving robustness and long-term operational stability.

[0017] 4. By combining channel state prediction with multi-node collaborative decision-making, the system can more intelligently select task offloading paths, allocate spectrum resources, and control transmit power, thereby improving overall transmission efficiency and load balancing under the constraints of limited bandwidth and computing resources.

Attached Figure Description

[0018] Figure 1 is a flowchart illustrating the scheduling method of the present invention.

[0019] Figure 2 is a schematic diagram of the scheduling system architecture of the present invention.

Detailed Implementation

[0020] This embodiment of a low-Earth orbit satellite resource scheduling system based on channel prediction and multi-agent collaboration includes: a network state acquisition and environment construction module, a channel state prediction module, a collaborative scheduling decision module, a scheduling execution and performance evaluation module, and a model training and parameter management module.

[0021] The channel state prediction module is preferably implemented using the FMA channel prediction module, and the cooperative scheduling decision module is preferably implemented using the MADDPG multi-agent cooperative scheduling framework.

[0022] The network state acquisition and environment construction module serves as the system's data foundation and constructs a digital simulation environment for a three-layer network consisting of ground users (UE), high-altitude platforms (HAP), and low-Earth orbit (LEO) satellites. Based on preset LEO constellation orbital parameters (such as orbital altitude, inclination, and constellation configuration), HAP deployment locations, and the UE distribution, this module predicts changes in the network topology dynamically and in real time; it periodically collects the operating status of each node, including task queue length, remaining available bandwidth, and transmit power margin; and it monitors link quality, including the fading status of the UE-HAP link, the Doppler shift of the HAP-LEO link, and the received signal-to-noise ratio. To support subsequent prediction, the module maintains a sliding time window that extracts continuous channel state data over the most recent several time slots up to the current moment, constructs a standardized historical channel state sequence, and transmits it to the channel state prediction module.
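The topology prediction in [0022] follows from deterministic orbital motion. The sketch below assumes a spherical Earth, two-body circular orbits, Earth rotation ignored over the horizon of interest, and a hypothetical 10° minimum elevation angle for link visibility; all function names are illustrative, and a deployed system would use a proper orbit propagator.

```python
import math

R_E = 6_371e3            # mean Earth radius, m
MU = 3.986004418e14      # Earth's gravitational parameter, m^3/s^2


def leo_position(t_s, altitude_m, inclination_deg, raan_deg=0.0, phase_deg=0.0):
    """Inertial position of a satellite on a circular orbit at time t_s (perturbations ignored)."""
    r = R_E + altitude_m
    n = math.sqrt(MU / r ** 3)                        # mean motion, rad/s
    u = math.radians(phase_deg) + n * t_s             # argument of latitude
    i, raan = math.radians(inclination_deg), math.radians(raan_deg)
    return (r * (math.cos(raan) * math.cos(u) - math.sin(raan) * math.sin(u) * math.cos(i)),
            r * (math.sin(raan) * math.cos(u) + math.cos(raan) * math.sin(u) * math.cos(i)),
            r * math.sin(u) * math.sin(i))


def geodetic_to_xyz(lat_deg, lon_deg, altitude_m):
    """Position of a UE or HAP on a spherical Earth."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    r = R_E + altitude_m
    return (r * math.cos(lat) * math.cos(lon), r * math.cos(lat) * math.sin(lon), r * math.sin(lat))


def elevation_deg(node_xyz, sat_xyz):
    """Elevation of the satellite above the node's local horizon."""
    los = [s - p for s, p in zip(sat_xyz, node_xyz)]
    dist = math.sqrt(sum(c * c for c in los))
    r_node = math.sqrt(sum(c * c for c in node_xyz))
    sin_el = sum(a * b for a, b in zip(los, node_xyz)) / (dist * r_node)
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_el))))   # clamp rounding error


def visible(node_xyz, sat_xyz, min_elev_deg=10.0):
    return elevation_deg(node_xyz, sat_xyz) >= min_elev_deg
```

Evaluating `visible` for every HAP-LEO and UE-LEO pair at each slot time yields the time-varying adjacency that the module turns into its topology snapshot.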

[0023] The channel state prediction module is used to predict channel state information for several future time slots in a rapidly changing environment, thereby reducing the risk of scheduling mismatch due to observation lag. This module receives historical channel state sequences and extracts features through a parallel dual-branch structure: the first branch is a Mamba state space branch, used to extract long-term, slowly changing channel features caused by the periodic motion of satellite orbits; the second branch is a multi-head self-attention mechanism branch, used to capture short-term, rapidly changing channel features caused by rain attenuation, blockage, or sudden interference. The two feature paths are fused through residual connection and layer normalization, and then mapped through a time pooling layer and a fully connected layer to output a channel prediction vector for several future time slots. This channel prediction vector is input as prediction enhancement information to the cooperative scheduling decision module.
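The parallel dual-branch structure of [0023] can be sketched in NumPy. This is a structural illustration under stated assumptions, not the FMA module itself: the Mamba branch is reduced to a diagonal selective state-space scan (input-dependent step size and B/C projections, without Mamba's gating, convolution, or hardware-aware scan), the weights are random and untrained, and all names and sizes are hypothetical. [0010] describes a serial variant in which attention operates on the Mamba hidden sequence and a feed-forward block follows the fusion; that variant would pass the `selective_scan` output into `multi_head_attention` instead.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)


def selective_scan(x, p):
    """Mamba-style branch reduced to a diagonal selective SSM over x of shape (L, D).

    The step size delta_t and the B_t / C_t projections depend on x_t, and the
    recurrence makes a single pass over the L slots (linear in sequence length).
    """
    L, D = x.shape
    h = np.zeros((D, p["A"].shape[1]))                    # per-channel state (D, N)
    out = np.empty_like(x)
    for t in range(L):
        delta = np.logaddexp(0.0, x[t] @ p["W_delta"])    # softplus -> positive step (D,)
        b, c = x[t] @ p["W_B"], x[t] @ p["W_C"]            # input-dependent (N,) vectors
        a_bar = np.exp(delta[:, None] * p["A"])            # ZOH decay; A < 0 keeps it in (0, 1)
        h = a_bar * h + (delta[:, None] * b[None, :]) * x[t][:, None]
        out[t] = h @ c
    return out


def multi_head_attention(x, p, heads):
    L, D = x.shape

    def split(w):                                          # (L, D) -> (heads, L, D / heads)
        return (x @ w).reshape(L, heads, D // heads).transpose(1, 0, 2)

    q, k, v = split(p["Wq"]), split(p["Wk"]), split(p["Wv"])
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(D // heads)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # row-wise softmax
    return (weights @ v).transpose(1, 0, 2).reshape(L, D) @ p["Wo"]


def fma_predict(history, p, heads=2):
    """Normalized CSI window (L, F) -> predicted CSI (horizon, F)."""
    x = history @ p["W_in"]                                 # embed to d_model channels
    fused = layer_norm(x + selective_scan(x, p) + multi_head_attention(x, p, heads))
    pooled = fused.mean(axis=0)                             # temporal pooling
    return (pooled @ p["W_out"]).reshape(p["horizon"], -1)  # fully connected regression head


def init_params(n_features, d_model=16, d_state=8, horizon=4, seed=0):
    rng = np.random.default_rng(seed)

    def g(rows, cols):
        return rng.normal(0.0, 1.0 / np.sqrt(rows), (rows, cols))

    p = {name: g(d_model, d_model) for name in ("W_delta", "Wq", "Wk", "Wv", "Wo")}
    p.update(W_in=g(n_features, d_model), W_B=g(d_model, d_state), W_C=g(d_model, d_state),
             W_out=g(d_model, horizon * n_features), horizon=horizon,
             A=-np.tile(np.arange(1.0, d_state + 1.0), (d_model, 1)))
    return p


params = init_params(n_features=4)
window = np.random.default_rng(1).normal(size=(32, 4))
pred = fma_predict(window, params)    # shape (4, 4); weights are untrained
```

Training would fit the weights by minimizing a prediction error loss (for example, mean squared error) between `fma_predict` outputs and the CSI actually observed in the following slots.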

[0024] The collaborative scheduling decision module is used to generate resource scheduling policies. This module adopts a centralized training and decentralized execution architecture, preferably modeling HAP nodes and LEO nodes as independent agents, with the UE participating in environmental interaction as a task request entity (or optionally, the UE is further modeled as an agent). Each agent receives an enhanced observation vector as input, which is obtained by concatenating local real-time observation data with the prediction data output by the channel state prediction module, allowing the agent to represent both the current state and future trends. Each agent outputs a joint scheduling action vector through the policy network. This joint scheduling action vector includes at least: user association decision (determining the HAP the UE accesses), bandwidth resource allocation (e.g., the number or proportion of subcarriers allocated), and power control (e.g., the transmit power level for HAP-LEO backhaul).
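The enhanced observation and the action vector of [0024] can be illustrated as follows. The action layout (one logit per candidate HAP, one bandwidth logit, one power logit), the sigmoid and tanh squashing, and all names are assumptions. Because MADDPG produces continuous outputs, the sketch takes an argmax over the association logits at execution time; training with discrete choices commonly uses a relaxation such as Gumbel-softmax.

```python
import numpy as np


def enhanced_observation(local_obs, csi_pred):
    """Concatenate the agent's real-time local observation with its flattened predicted-CSI window."""
    return np.concatenate([np.ravel(local_obs), np.ravel(csi_pred)])


def decode_action(raw, n_haps, n_subcarriers, p_max_w):
    """Map a raw actor output to (HAP index, subcarrier count, HAP-LEO transmit power).

    Assumed layout: raw[:n_haps] association logits, raw[n_haps] bandwidth logit,
    raw[n_haps + 1] power logit.
    """
    hap = int(np.argmax(raw[:n_haps]))                          # which HAP the UE accesses
    share = 1.0 / (1.0 + np.exp(-raw[n_haps]))                   # sigmoid -> subcarrier fraction
    n_sub = int(round(share * n_subcarriers))
    power_w = p_max_w * 0.5 * (np.tanh(raw[n_haps + 1]) + 1.0)  # tanh -> [0, p_max_w]
    return hap, n_sub, float(power_w)
```

For instance, with three candidate HAPs, 64 subcarriers, and a 10 W cap, the raw vector `[0.1, 2.0, -1.0, 0.0, 0.0]` decodes to HAP 1, 32 subcarriers, and 5 W.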

[0025] The scheduling execution and performance evaluation module is used to execute joint scheduling actions and form a closed-loop feedback. This module maps the joint scheduling action vector into network control signaling and sends it to each node, driving the equipment to adjust transmission parameters and establish communication links; and calculates reward signals during execution to guide model optimization. Preferably, the reward function is a constraint-enhanced reward function, including at least: a cooperation incentive term (incentivizing successful HAP-LEO connection establishment), a latency penalty term (imposing a penalty when task queuing or end-to-end latency exceeds a threshold), a fairness constraint term (suppressing resource monopolization based on a fairness index), and a throughput gain term (increasing the overall system transmission rate); it may also optionally include an energy consumption penalty term or a task completion rate gain term to meet multi-objective constraint requirements.
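A constraint-enhanced reward combining the terms named in [0025] with the amplification and anti-idle rules of [0010] might look like the sketch below. Every weight, threshold, and the exact functional form are hypothetical placeholders; the filing names the terms but does not give their formulas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    """All weights and thresholds are hypothetical placeholders, not values from the filing."""
    w_throughput: float = 1.0
    w_coop: float = 0.1
    w_latency: float = 1.0
    w_fairness: float = 1.0
    w_energy: float = 0.0          # optional term, off by default
    w_completion: float = 0.0      # optional term, off by default
    latency_threshold_s: float = 0.1
    fairness_target: float = 0.8
    amplify: float = 2.0           # multiplier applied when a constraint is violated
    idle_penalty: float = 0.5      # per agent flagged as idle or free-riding


def constrained_reward(throughput_norm, coop_events, avg_latency_s, jain,
                       energy_norm=0.0, completion_rate=0.0, idle_agents=0,
                       cfg=RewardConfig()):
    reward = (cfg.w_throughput * throughput_norm          # throughput gain
              + cfg.w_coop * coop_events                   # HAP-LEO cooperation incentive
              + cfg.w_completion * completion_rate
              - cfg.w_energy * energy_norm)
    latency_pen = cfg.w_latency * avg_latency_s / cfg.latency_threshold_s
    if avg_latency_s > cfg.latency_threshold_s:
        latency_pen *= cfg.amplify                          # amplified past the latency threshold
    fairness_pen = cfg.w_fairness * (1.0 - jain)
    if jain < cfg.fairness_target:
        fairness_pen *= cfg.amplify                         # amplified below the Jain target
    return reward - latency_pen - fairness_pen - cfg.idle_penalty * idle_agents
```

With the defaults, a slot with normalized throughput 1.0, 0.05 s latency, and perfect fairness scores 0.5; raising the latency to 0.2 s lifts the latency ratio from 0.5 to 2.0 and triggers the amplifier, so the reward drops to -3.0.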

[0026] The model training and parameter management module maintains the experience replay pool and updates the model parameters. It receives and stores experience samples consisting of the current state, joint action, reward, and next-time-step state. During the training phase, it periodically draws minibatches from the experience replay pool and uses optimization methods such as gradient descent to update the network parameters of the channel state prediction module and the cooperative scheduling decision module, so that the scheduling strategy converges to a stable, approximately optimal strategy.
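The minibatch update of [0026] for the centralized critic follows the MADDPG temporal-difference target y = r + gamma * Q'(s', a'_1, ..., a'_N), where each a'_i = mu'_i(o'_i) comes from agent i's target actor. The sketch below performs one gradient-descent step on a linear critic; the linear critic, the batch layout, `gamma`, and the learning rate are assumptions, and the actor (policy-gradient) update is not shown.

```python
import numpy as np


def critic_td_step(w, w_target, target_actors, batch, gamma=0.95, lr=0.01):
    """One TD step for a linear centralized critic Q(s, a_1..a_N) = [s, a_1..a_N] @ w.

    batch keys: "s" (B, S), "a" (B, N*A) joint actions, "r" (B,), "s2" (B, S),
    "o2" list of N next local observations (B, O_i), "done" (B,).
    Returns the updated weights, the pre-update loss, and the TD errors.
    """
    # a'_i = mu'_i(o'_i): each target actor acts on its own next observation
    a2 = np.concatenate([mu(o) for mu, o in zip(target_actors, batch["o2"])], axis=1)
    q_next = np.concatenate([batch["s2"], a2], axis=1) @ w_target
    y = batch["r"] + gamma * (1.0 - batch["done"]) * q_next      # TD target
    phi = np.concatenate([batch["s"], batch["a"]], axis=1)
    td_error = phi @ w - y
    loss = float(np.mean(td_error ** 2))
    grad = phi.T @ td_error / len(td_error)                      # gradient of 0.5 * mean(td^2)
    return w - lr * grad, loss, td_error
```

With a fixed target critic, repeated calls drive the loss toward zero on a batch whose rewards are linear in the features; in MADDPG the target weights themselves trail the online weights through soft updates, and the returned TD errors can refresh replay priorities.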

[0027] The specific workflow and resource scheduling methods of this system are as follows: Step 1: Construct a dynamic integrated air-space-ground network environment.

[0028] Step 101: Initialize the UE-HAP-LEO three-layer network topology, configure the low-Earth orbit satellite constellation parameters, HAP deployment scheme, and UE distribution model.

[0029] Step 102: Initialize the network parameters of the channel state prediction module and the cooperative scheduling decision module, and set the hyperparameters required for training.

[0030] Step 2: Form the global state.

[0031] Step 201: In each scheduling time slot, the network state acquisition and environment construction module collects node operating status and link quality information, including at least: the task queue status, remaining available bandwidth and transmit power margin of each HAP node, as well as the instantaneous channel state information, received signal-to-noise ratio and Doppler shift of the UE-HAP and HAP-LEO links.

[0032] Step 202: Maintain a sliding time window, extract the channel state data of several consecutive past time slots, and normalize it to construct a standardized historical channel state sequence, which serves as the input for subsequent prediction and decision-making.
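The sliding window in Step 202 can be sketched as a fixed-length buffer of per-slot feature vectors. Per-feature z-score normalization over the window is an assumption (the text says only that the data are normalized), and the per-slot feature layout, for example channel gain in dB, received SNR and Doppler shift, is hypothetical.

```python
from collections import deque

import numpy as np


class ChannelHistoryWindow:
    """Keeps the last `window_len` slots of per-link channel features."""

    def __init__(self, window_len: int, num_features: int) -> None:
        self.window_len = window_len
        self.num_features = num_features
        self.slots: deque = deque(maxlen=window_len)  # oldest slot drops out

    def push(self, slot_features) -> None:
        x = np.asarray(slot_features, dtype=float)
        if x.shape != (self.num_features,):
            raise ValueError(f"expected shape ({self.num_features},), got {x.shape}")
        self.slots.append(x)

    def ready(self) -> bool:
        return len(self.slots) == self.window_len

    def normalized_sequence(self, eps: float = 1e-8) -> np.ndarray:
        """Return an (L, F) array z-scored per feature over the window."""
        if not self.ready():
            raise RuntimeError("window not yet full")
        seq = np.stack(self.slots)                    # (L, F)
        mean = seq.mean(axis=0, keepdims=True)
        std = seq.std(axis=0, keepdims=True)
        return (seq - mean) / (std + eps)             # eps guards constant features
```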

[0033] Step 3: Construct the FMA channel prediction model.

[0034] Step 301: Input the historical channel state sequence obtained in Step 2 into the channel state prediction module to extract the temporal variation characteristics of the channel.

[0035] Step 302: The channel state prediction module adopts a parallel dual-branch structure: the historical channel state sequence is fed into the Mamba state space branch to extract long-term, slowly varying channel features, and at the same time into the multi-head self-attention branch to extract short-term, rapidly varying channel features.

[0036] Step 303: Fuse the two sets of features and map the result to channel prediction vectors for several future time slots, which characterize the future channel trend.
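The data flow of Steps 301 to 303 can be sketched in NumPy with random, untrained weights. The state space branch below is a simplified selective scan (diagonal decay matrix, input-dependent step size and B, C projections), not the full Mamba block, which also has gating, a causal convolution and a hardware-aware parallel scan. All class names, dimensions and the fusion by residual sum are illustrative assumptions.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Parameter-free layer normalization over the feature axis.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)


class SelectiveSSMBranch:
    """Simplified Mamba-style selective scan (long-term, slowly varying features)."""

    def __init__(self, d_model, d_state, rng):
        self.A = -np.exp(rng.normal(size=(d_model, d_state)))  # negative: decaying state
        self.W_dt = rng.normal(scale=0.1, size=(d_model, d_model))
        self.W_B = rng.normal(scale=0.1, size=(d_model, d_state))
        self.W_C = rng.normal(scale=0.1, size=(d_model, d_state))

    def __call__(self, x):  # x: (L, d_model)
        h = np.zeros_like(self.A)  # one state vector per channel
        ys = []
        for x_t in x:  # single pass over time: cost linear in L
            dt = softplus(x_t @ self.W_dt)[:, None]  # (d_model, 1), input-dependent
            B, C = x_t @ self.W_B, x_t @ self.W_C  # (d_state,) each
            h = np.exp(dt * self.A) * h + dt * B[None, :] * x_t[:, None]
            ys.append(h @ C)
        return np.stack(ys)  # (L, d_model)


class MultiHeadSelfAttention:
    """Standard scaled dot-product multi-head self-attention (short-term features)."""

    def __init__(self, d_model, n_heads, rng):
        assert d_model % n_heads == 0
        self.n_heads, self.d_k = n_heads, d_model // n_heads
        self.Wq, self.Wk, self.Wv, self.Wo = (
            rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4))

    def __call__(self, x):  # x: (L, d_model)
        L, d = x.shape

        def heads(m):  # (L, d) -> (n_heads, L, d_k)
            return m.reshape(L, self.n_heads, self.d_k).transpose(1, 0, 2)

        q, k, v = heads(x @ self.Wq), heads(x @ self.Wk), heads(x @ self.Wv)
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(self.d_k)  # (n_heads, L, L)
        scores -= scores.max(axis=-1, keepdims=True)
        attn = np.exp(scores)
        attn /= attn.sum(axis=-1, keepdims=True)  # softmax over key positions
        out = (attn @ v).transpose(1, 0, 2).reshape(L, d)
        return out @ self.Wo


class FMAPredictor:
    """Untrained dual-branch predictor: random weights, illustrating data flow only."""

    def __init__(self, n_feat, d_model, d_state, n_heads, horizon, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(scale=0.1, size=(n_feat, d_model))
        self.mamba = SelectiveSSMBranch(d_model, d_state, rng)
        self.attn = MultiHeadSelfAttention(d_model, n_heads, rng)
        self.W_ff = rng.normal(scale=0.1, size=(d_model, d_model))
        self.W_out = rng.normal(scale=0.1, size=(d_model, horizon * n_feat))
        self.horizon, self.n_feat = horizon, n_feat

    def predict(self, seq):  # seq: (L, n_feat) normalized history
        x = seq @ self.W_in
        # Parallel branches fused by a residual sum and normalization.
        fused = layer_norm(x + self.mamba(x) + self.attn(x))
        fused = layer_norm(fused + np.maximum(fused @ self.W_ff, 0.0))  # ReLU feed-forward
        pooled = fused.mean(axis=0)  # temporal pooling
        return (pooled @ self.W_out).reshape(self.horizon, self.n_feat)
```

This follows the parallel arrangement of Step 302. Claim 3 instead applies attention to the hidden sequence produced by the Mamba branch; feeding the Mamba output into the attention branch would give that serial variant.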

[0037] Step 4: Construct the MADDPG multi-agent cooperative scheduling module.

[0038] Step 401: Model the HAP nodes and LEO nodes in the network as agents (UEs participate in the interaction as task-requesting entities, or can optionally also be modeled as agents), and establish a centralized training, decentralized execution framework.

[0039] Step 402: Construct an enhanced observation vector: concatenate each agent's local real-time observation data with the channel prediction vector output in Step 3, and use the result as that agent's input.

[0040] Step 403: Each agent outputs a joint scheduling action vector based on the enhanced observation vector. The joint scheduling action vector includes at least: user association decision, bandwidth resource allocation, and power control.
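Steps 402 and 403 can be sketched as a concatenation followed by an actor whose raw output is decoded into the three action components. The raw-output layout, the single-layer `LinearActor` standing in for a trained Actor network, and the softmax and sigmoid decodings are all assumptions made for illustration.

```python
import numpy as np


def enhanced_observation(local_obs, predicted_csi):
    """Step 402: local observation concatenated with the flattened CSI prediction."""
    return np.concatenate([np.ravel(local_obs), np.ravel(predicted_csi)])


def decode_action(raw, n_candidates, n_subbands, p_max_w):
    """Split a raw actor output into association, bandwidth shares and transmit power.

    Hypothetical layout: [association logits | bandwidth logits | one power logit].
    """
    assoc_logits = raw[:n_candidates]
    bw_logits = raw[n_candidates:n_candidates + n_subbands]
    power_logit = raw[n_candidates + n_subbands]
    association = int(np.argmax(assoc_logits))  # index of the chosen serving node
    e = np.exp(bw_logits - bw_logits.max())
    bandwidth_share = e / e.sum()  # fractions that sum to 1
    power_w = p_max_w / (1.0 + np.exp(-power_logit))  # sigmoid scaled to [0, p_max]
    return association, bandwidth_share, power_w


class LinearActor:
    """Hypothetical stand-in for a trained Actor network: one tanh layer."""

    def __init__(self, obs_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(obs_dim, act_dim))

    def __call__(self, obs):
        return np.tanh(obs @ self.W)
```

The `argmax` association is an execution-time choice; during training, a continuous relaxation such as Gumbel-softmax is commonly used so that gradients can flow through discrete decisions.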

[0041] Step 5: Model training and optimization.

[0042] Step 501: The scheduling execution and performance evaluation module issues control signaling based on the joint scheduling action vector. The network devices adjust power, allocate bandwidth resources, and establish links according to the signaling to execute data transmission.

[0043] Step 502: Monitor system performance metrics during transmission, including at least total system throughput, end-to-end latency, fairness metrics, and the number of HAP-LEO collaboration events.

[0044] Step 503: Calculate the reward signal for this time slot from the monitoring results: compute the weighted sum of the throughput gain, latency penalty, fairness constraint and cooperation incentive terms to obtain the global reward value.

[0045] Step 504: The model training and parameter management module stores the (state, action, reward, next state) tuple as an experience sample in the experience replay pool; during the training phase, it samples from the experience replay pool to update the parameters of the collaborative scheduling decision module and, optionally, jointly updates the parameters of the channel state prediction module based on the prediction error.
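The critic-side arithmetic of a MADDPG update can be sketched as follows: each agent's target actor acts on its own next observation, the centralized target critic scores the next state with the resulting joint action, and target networks track the online networks by Polyak (soft) averaging. The linear critic and all function names are hypothetical simplifications.

```python
import numpy as np


def td_target(reward, next_q, done, gamma=0.99):
    """y = r + gamma * (1 - done) * Q'(s', a'_1..a'_N)."""
    return reward + gamma * (1.0 - done) * next_q


def centralized_q(critic_w, state, joint_action):
    """Hypothetical linear centralized critic: Q = w . [s; a_1; ...; a_N]."""
    return float(critic_w @ np.concatenate([state] + list(joint_action)))


def maddpg_target(reward, done, next_state, next_obs, target_actors, target_critic_w, gamma=0.99):
    # Each agent's target actor acts only on its own next local (enhanced) observation.
    next_actions = [mu(o) for mu, o in zip(target_actors, next_obs)]
    return td_target(reward, centralized_q(target_critic_w, next_state, next_actions), done, gamma)


def critic_loss(q_pred, y):
    """Mean-squared temporal-difference error minimized by the centralized critic."""
    return float(np.mean((q_pred - y) ** 2))


def soft_update(target_params, online_params, tau=0.01):
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    return {k: tau * online_params[k] + (1.0 - tau) * target_params[k] for k in target_params}
```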

[0046] Step 505: Repeat Steps 2 to 5 until the average cumulative reward stabilizes, yielding a stable, approximately optimal scheduling strategy.
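"Until the average cumulative reward stabilizes" needs a concrete stopping rule in practice. One possible rule, sketched below as an assumption rather than the patent's criterion, declares convergence when the moving average of episode returns changes by less than a relative tolerance for several consecutive checks. The window, tolerance and patience values are illustrative.

```python
from collections import deque


class RewardConvergenceMonitor:
    """Stops training once the moving-average return stops changing (assumed rule)."""

    def __init__(self, window: int = 50, tol: float = 0.01, patience: int = 5) -> None:
        self.returns: deque = deque(maxlen=window)
        self.tol, self.patience = tol, patience
        self.prev_avg = None
        self.stable_count = 0

    def update(self, episode_return: float) -> bool:
        self.returns.append(episode_return)
        if len(self.returns) < self.returns.maxlen:
            return False  # not enough episodes for a full window yet
        avg = sum(self.returns) / len(self.returns)
        if self.prev_avg is not None:
            rel_change = abs(avg - self.prev_avg) / max(abs(self.prev_avg), 1e-8)
            # Count consecutive checks with a small relative change; reset otherwise.
            self.stable_count = self.stable_count + 1 if rel_change < self.tol else 0
        self.prev_avg = avg
        return self.stable_count >= self.patience
```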

[0047] Step 6: Real-time scheduling and inference.

[0048] Step 601: During actual operation, the system executes Step 2 in each scheduling time slot to form the global state, and executes Step 3 to generate the channel prediction vector.

[0049] Step 602: Input the enhanced observation vector into the trained collaborative scheduling decision module and output the joint scheduling action vector.

[0050] Step 603: The scheduling execution and performance evaluation module issues control signaling and performs link establishment and resource scheduling, completing the real-time inference loop; runtime data can also be fed back to the model training and parameter management module to support online or periodic updates.

Claims

1. A low-orbit satellite resource scheduling system, characterized in that it comprises a network state acquisition and environment construction module, a channel state prediction module, a cooperative scheduling decision module, a scheduling execution and performance evaluation module, and a model training and parameter management module; the network state acquisition and environment construction module is used to construct a UE-HAP-LEO three-layer integrated air-space-ground network environment, to provide the channel state prediction module with a historical sequence of instantaneous channel state information over multiple time slots, and to provide the cooperative scheduling decision module with the current global state and the local observation of each agent; the channel state prediction module is used to predict the instantaneous channel state information of key links over several future time slots in a rapidly changing, partially observable environment, and to provide the cooperative scheduling decision module with a sequence of predicted instantaneous channel state information organized by link and time slot, this predicted sequence being added to the local state of each agent in the cooperative scheduling decision module as enhanced observation information; the cooperative scheduling decision module is used to generate multi-node collaborative task offloading, resource allocation and power control strategies based on the enhanced state information, so as to jointly optimize the multi-objective performance of the system; the scheduling execution and performance evaluation module is used to map the joint decision produced by the cooperative scheduling decision module to specific network behaviors and to provide performance evaluation feedback on the execution results; and
the model training and parameter management module is used to maintain the experience replay pool and the model parameters during training.

2. The low-orbit satellite resource scheduling system according to claim 1, characterized in that the network state acquisition and environment construction module is specifically implemented as follows: given the basic parameters of the low-Earth orbit satellite constellation orbits, the deployment altitude and geographical location of the high-altitude platforms, the spatial distribution of ground users, the service arrival process, the radio frequency carrier and the total system bandwidth, a visibility and topology evolution model is established for the three link types UE-HAP, UE-LEO and HAP-LEO to obtain the time-varying network topology; for each link, a channel model including free-space path loss, shadow fading, small-scale fading and Doppler shift is established to calculate the instantaneous channel state information, received signal-to-noise ratio, achievable rate and bit error rate of the link in the current time slot; based on the current link conditions and node resources, the instantaneous rate and end-to-end transmission delay of each UE-HAP link are calculated, and the cumulative throughput of each user, the Jain fairness index across users, the queue length and remaining computing and spectrum resources of each HAP and LEO node, and the number of HAP-LEO cooperative events are collected as performance parameters; the topology, channel and service information is encoded into the global network state of the current time slot and the local state observable by each node; and the instantaneous channel state information sequences of several historical time slots are cached as input to the channel state prediction module.
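A per-link budget of the kind claim 2 describes can be sketched with textbook models: free-space path loss, log-normal shadowing, Rayleigh small-scale fading, thermal noise at -174 dBm/Hz, a Shannon-bound achievable rate, the first-order Doppler shift, and the Jain fairness index. The choice of log-normal and Rayleigh models, the default shadowing spread and noise figure, and all function names are assumptions; the patent does not fix them.

```python
import math
import random

C_LIGHT = 299_792_458.0  # speed of light, m/s


def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss: 20 log10(4 pi d f / c)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / C_LIGHT)


def doppler_shift_hz(radial_speed_mps: float, freq_hz: float) -> float:
    """First-order Doppler shift; positive when the nodes approach each other."""
    return radial_speed_mps * freq_hz / C_LIGHT


def received_snr_db(ptx_dbm, gains_db, distance_m, freq_hz, bandwidth_hz,
                    shadow_sigma_db=4.0, noise_figure_db=5.0, rng=None):
    """Link budget with log-normal shadowing and Rayleigh fading (assumed models)."""
    rng = rng or random.Random()
    shadow_db = rng.gauss(0.0, shadow_sigma_db)
    # Rayleigh fading: channel power gain |h|^2 ~ Exp(1); clamp to avoid log10(0).
    fading_db = 10.0 * math.log10(max(rng.expovariate(1.0), 1e-12))
    noise_dbm = -174.0 + 10.0 * math.log10(bandwidth_hz) + noise_figure_db
    prx_dbm = ptx_dbm + gains_db - fspl_db(distance_m, freq_hz) - shadow_db + fading_db
    return prx_dbm - noise_dbm


def shannon_rate_bps(snr_db: float, bandwidth_hz: float) -> float:
    """Achievable-rate upper bound B log2(1 + SNR)."""
    return bandwidth_hz * math.log2(1.0 + 10.0 ** (snr_db / 10.0))


def jain_fairness(throughputs) -> float:
    """J = (sum x)^2 / (n * sum x^2); equals 1 when all users get equal throughput."""
    xs = list(throughputs)
    den = len(xs) * sum(x * x for x in xs)
    return sum(xs) ** 2 / den if den > 0 else 1.0
```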

3. The low-orbit satellite resource scheduling system according to claim 1, characterized in that the channel state prediction module is specifically implemented as follows: it receives, from the network state acquisition and environment construction module, the instantaneous channel state information sequence of the target link over the past L time slots together with link-related dynamic features, and normalizes and encodes them into a unified time-series input representation; channel prediction is performed by a time-series modeling structure that integrates the Mamba state space model with a multi-head self-attention mechanism, in which the Mamba state space model uses an input-dependent state transition mechanism to capture the evolution of instantaneous channel state information over long time scales with approximately linear computational complexity, and the multi-head self-attention mechanism performs weighted modeling of the hidden sequence output by Mamba, focusing on sudden interference and rapid changes in link quality over short time scales; the two sets of temporal features are fused through residual connections, normalization and a feedforward network to obtain a stable high-dimensional temporal representation; and the predicted instantaneous channel state information of the target link for several future time slots is obtained through temporal pooling followed by regression in a fully connected output layer.

4. The low-orbit satellite resource scheduling system according to claim 1, characterized in that the collaborative scheduling decision module is specifically implemented as follows: the UE, HAP and LEO nodes are modeled as multiple agents, and a multi-agent deep deterministic policy gradient (MADDPG) framework with centralized training and decentralized execution is adopted: during the training phase, a centralized Critic network receives the global network state and the joint actions of all agents to estimate their value; during the execution phase, the Actor network of each agent outputs actions independently, based only on the local observations of its own node and the corresponding predicted instantaneous channel state information, which ensures the scalability of the system in large-scale non-terrestrial network scenarios. State and action spaces are designed for each type of agent: the state includes the service queue length of the node, the remaining computing and spectrum resources, the current and predicted instantaneous channel state information of the relevant links, and the load state of neighboring nodes; the actions include task offloading decisions, the allocation ratio of local or on-board computing resources among different tasks, the transmit power control of the UE-HAP and HAP-LEO links, and the prioritization of different tasks under resource constraints. A multi-objective constrained reward function is designed that combines the system's total throughput, average latency, energy consumption, task completion rate, fairness and HAP-LEO cooperative event count into a single scalar reward, and introduces anti-idle and anti-free-rider penalty terms.
When the average latency exceeds its threshold or the Jain fairness index falls below its target value, the corresponding penalty term is amplified; when the agents remain in a low-cooperation state for a long time, or individual agents do not contribute resources for a long time, additional penalties are applied. Using experience replay and soft target updates, the parameters of the Critic network and each Actor network are iteratively updated until the cumulative reward converges, yielding a stable multi-agent scheduling strategy. The output of the collaborative scheduling decision module is a joint scheduling decision for all UE, HAP and LEO nodes, including the execution location and transmission path of each task, the bandwidth and power allocation of each link, and the scheduling order of the task queue within each node; this decision is then implemented in the network environment by the scheduling execution and performance evaluation module.
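The amplified and anti-idle penalties of claim 4 can be sketched as piecewise-linear costs whose slope steepens once a constraint is violated, plus fixed penalties for prolonged low cooperation and for agents that stop contributing resources. The functional forms, counters and all constants below are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PenaltyConfig:
    """All values are illustrative assumptions, not taken from the patent."""
    latency_threshold_ms: float = 50.0
    fairness_target: float = 0.8
    w_latency: float = 0.5
    w_fairness: float = 0.5
    amplification: float = 3.0  # slope multiplier once a constraint is violated
    low_coop_limit: int = 20    # consecutive slots without HAP-LEO cooperation
    idle_limit: int = 20        # slots since an agent last contributed resources
    coop_penalty: float = 1.0
    idle_penalty: float = 1.0


def constraint_penalty(avg_latency_ms: float, jain_index: float, low_coop_slots: int,
                       idle_slots_per_agent: Sequence[int], cfg: PenaltyConfig) -> float:
    thr = cfg.latency_threshold_ms
    # Latency cost: normal slope below the threshold, amplified slope above it.
    lat_cost = cfg.w_latency * (min(avg_latency_ms, thr)
                                + cfg.amplification * max(0.0, avg_latency_ms - thr)) / thr
    # Fairness cost: (1 - J) above the target, amplified slope for any shortfall below it.
    tgt = cfg.fairness_target
    fair_cost = cfg.w_fairness * ((1.0 - max(jain_index, tgt))
                                  + cfg.amplification * max(0.0, tgt - jain_index))
    # Low-cooperation penalty for the system, anti-idle/free-rider penalty per agent.
    coop_cost = cfg.coop_penalty if low_coop_slots >= cfg.low_coop_limit else 0.0
    n_idle = sum(1 for s in idle_slots_per_agent if s >= cfg.idle_limit)
    return lat_cost + fair_cost + coop_cost + cfg.idle_penalty * n_idle
```

The result would be subtracted from the weighted performance terms to form the scalar reward.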

5. The low-orbit satellite resource scheduling system according to claim 1, characterized in that the scheduling execution and performance evaluation module is specifically implemented as follows: according to the scheduling decision, it adjusts the UE-HAP association, the OFDM subcarrier allocation and the HAP-LEO NOMA superposition power, and triggers local execution or uplink offloading via HAP or LEO for the selected tasks; it updates the queue length, link occupancy and energy consumption of each node and, after the current time slot ends, calculates the total system throughput, average latency, fairness index, number of HAP-LEO cooperative events and overall resource utilization efficiency. On the one hand, the updated network state and performance indicators output by the module are returned to the network state acquisition and environment construction module as the initial state for the next time slot; on the other hand, the state, actions and rewards of this time slot are sent to the model training and parameter management module.
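The effect of the NOMA superposition power split can be sketched with the standard two-user power-domain NOMA rates under successive interference cancellation (SIC). This assumes a downlink case with one transmitter and two receivers of different channel gain (for example one LEO serving two HAPs); the link direction, the two-user restriction and the parameter names are assumptions, since the claim does not specify them.

```python
import math


def noma_two_user_rates(p_total_w, alpha_far, g_near, g_far, noise_w, bandwidth_hz):
    """Two-user downlink power-domain NOMA with SIC; returns (rate_near, rate_far) in bit/s.

    alpha_far: fraction of the total power given to the weaker (far) receiver.
    g_near, g_far: channel power gains |h|^2, with g_near >= g_far.
    """
    if not (0.0 < alpha_far < 1.0) or g_near < g_far:
        raise ValueError("need 0 < alpha_far < 1 and g_near >= g_far")
    p_far = alpha_far * p_total_w
    p_near = (1.0 - alpha_far) * p_total_w
    # Far receiver decodes its own signal, treating the near receiver's signal as interference.
    sinr_far = p_far * g_far / (p_near * g_far + noise_w)
    # Near receiver first decodes and cancels the far receiver's signal, then decodes its own.
    sinr_near = p_near * g_near / noise_w
    return (bandwidth_hz * math.log2(1.0 + sinr_near),
            bandwidth_hz * math.log2(1.0 + sinr_far))
```

Sweeping `alpha_far` shows the throughput-fairness trade-off that the power-control action has to balance.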

6. The low-orbit satellite resource scheduling system according to claim 1, characterized in that the model training and parameter management module receives the state-action-reward-next state quadruples, together with the corresponding instantaneous channel state information and predicted instantaneous channel state information sequences, from the environment, prediction and scheduling modules, and stores and prioritizes them according to a given strategy; during the training phase, samples are periodically drawn from the experience pool, the prediction error loss of the channel state prediction module and the temporal difference loss and policy gradient of MADDPG are calculated, and the prediction model and the multi-agent policy networks are updated jointly or in stages; after training converges, the module distributes the updated model parameters to each execution node or inference platform, and during online operation only forward inference is performed, without parameter updates.
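Claim 6 leaves the prioritization strategy open. One widely used option is proportional prioritized experience replay, sketched below under that assumption: sampling probability proportional to a power of the absolute TD error, new samples inserted at the current maximum priority, and importance-sampling weights to correct the resulting bias. The hyperparameters and names are illustrative.

```python
import numpy as np


class PrioritizedReplay:
    """Proportional prioritized replay: P(i) = p_i^alpha / sum_k p_k^alpha."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6, seed=0):
        self.capacity, self.alpha, self.beta, self.eps = capacity, alpha, beta, eps
        self.data, self.priorities = [], np.zeros(capacity)
        self.pos = 0
        self.rng = np.random.default_rng(seed)

    def push(self, sample):
        # New samples get the current maximum priority so they are replayed at least once.
        max_p = self.priorities[:len(self.data)].max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(sample)
        else:
            self.data[self.pos] = sample  # ring buffer: overwrite the oldest entry
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        p = self.priorities[:len(self.data)] ** self.alpha
        probs = p / p.sum()
        idx = self.rng.choice(len(self.data), size=batch_size, p=probs)
        # Importance-sampling weights correct the sampling bias, normalized by their max.
        weights = (len(self.data) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        # Priority p_i = |delta_i| + eps keeps every sample reachable.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```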

7. A low-orbit satellite resource scheduling method, characterized in that it is implemented by the low-Earth orbit satellite resource scheduling system according to any one of claims 1 to 6 and comprises the following steps: Step 1: based on preset LEO constellation orbit parameters, the HAP deployment scheme, the ground user distribution and service arrival model, and physical-layer bandwidth and power constraints, construct a UE-HAP-LEO three-layer network, and generate the initial network topology, channel state and node resource state through the network state acquisition and environment construction module; Step 2: in each scheduling time slot, the network state acquisition and environment construction module collects the current position of each node, link visibility, instantaneous channel state information measurements, queue lengths and resource occupancy, updates the channel state of the UE-HAP and HAP-LEO links, calculates key indicators such as instantaneous rate, latency, fairness and number of cooperative events, forms the current global network state and the local observation of each agent, and caches the historical sequence of instantaneous channel state information over the most recent time slots; Step 3: input the historical instantaneous channel state information sequence of the target link and the dynamic features of the related nodes into the channel state prediction module to predict the instantaneous channel state information of the link over the next few time slots, and add the predicted values as enhanced observation information to the local state of the corresponding agent; Step 4: each agent inputs its enhanced local observation, comprising the current state and the predicted instantaneous channel state information, into the collaborative scheduling decision module, which outputs the task offloading, bandwidth allocation, power control and task priority decisions for the current time slot; the combined actions of all agents form the joint scheduling scheme for the current time slot; Step 5: the scheduling execution and performance evaluation module implements task offloading, resource allocation and power control in the network environment according to the joint scheduling scheme, driving the actual data transmission and computation; at the end of the time slot, it updates the network topology, channel and resource states and, based on actual service completion, calculates the total system throughput, average latency, energy consumption, fairness index and number of HAP-LEO cooperative events, forming the performance feedback for this time slot; Step 6: based on the performance feedback and preset multi-objective constraints, throughput, latency, energy efficiency, fairness, task completion rate and number of cooperative events are weighted and combined into a single reward signal, with penalties applied for latency violations, decreased fairness, insufficient collaboration and long-term agent idleness, to obtain the global reward value for the current time slot; the model training and parameter management module stores the current time slot's state, joint action and reward, together with the next time slot's state, as an experience sample in the experience replay pool; during the training phase, samples are periodically drawn from the experience pool to update the model parameters of the channel state prediction module and the MADDPG policy network parameters; and during the online deployment phase, the model parameters are fixed and Steps 2 to 5 are executed cyclically to achieve real-time intelligent scheduling.