A cognitive satellite-terrestrial ISAC secure beamforming method based on federated reinforcement learning

By combining an active reconfigurable smart surface with a federated reinforcement learning-based secure beamforming method for cognitive satellite-terrestrial integrated sensing and communication (ISAC), the double-fading and data privacy problems in satellite-terrestrial networks are addressed, achieving joint optimization of communication and sensing performance and improved signal strength.

CN122247486A · Pending · Publication Date: 2026-06-19 · XIAN UNIV OF POSTS & TELECOMM

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
XIAN UNIV OF POSTS & TELECOMM
Filing Date
2026-03-31
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing beamforming schemes in cognitive satellite-terrestrial networks suffer from the double-fading effect, thermal noise amplification, and data privacy leakage, which degrade communication and sensing performance.

Method used

A secure beamforming method for cognitive satellite-terrestrial integrated sensing and communication (ISAC) based on federated reinforcement learning is adopted. Using an active reconfigurable smart surface and a federated multi-agent deep reinforcement learning framework, the method optimizes satellite transmit beamforming, ground base station transmit precoding, and reflection coefficients. It constructs a joint optimization problem and solves for the optimal secure beamforming strategy in a distributed, iterative manner, protecting data privacy and improving signal strength.

Benefits of technology

It effectively compensates for long-distance link losses, improves signal strength, reduces signaling overhead, protects user data privacy, and achieves synergistic optimization of communication and sensing performance.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122247486A_ABST

Abstract

This invention discloses a secure beamforming method for cognitive satellite-terrestrial integrated sensing and communication (ISAC) based on federated reinforcement learning. The method includes: constructing a joint optimization problem based on an established communication model and radar sensing model, with the objective of maximizing the radar output signal-to-noise ratio (SNR) subject to primary user quality of service (QoS) constraints, secondary user QoS constraints, eavesdropper SNR constraints, satellite power constraints, base station power constraints, a total power constraint on the active reconfigurable smart surface, and reflective element amplitude constraints; and transforming the joint optimization problem into a multi-agent Markov decision process, defining the state space, action space, and shared reward function of each satellite agent and base station agent so as to maximize the radar output SNR. The invention effectively overcomes the double-fading effect in the satellite-terrestrial link and achieves secure beamforming for the integrated sensing and communication system while protecting data privacy.

Description

Technical Field

[0001] This invention relates to the field of communications, and more specifically to a cognitive satellite-terrestrial ISAC secure beamforming method based on federated reinforcement learning.

Background Technology

[0002] Cognitive satellite-terrestrial networks allow secondary terrestrial networks to share spectrum resources with the primary satellite network, alleviating spectrum scarcity. Integrated sensing and communication technology unifies radar sensing and data transmission on a single hardware platform, enabling the network to monitor the surrounding environment while communicating. Reconfigurable smart surfaces, which are two-dimensional planes with numerous low-cost reflective elements, can reshape the wireless propagation environment by adjusting the phase and amplitude of incident signals, providing a new technical means of improving communication quality and sensing accuracy.

[0003] However, existing beamforming schemes in cognitive satellite-terrestrial networks have the following technical drawbacks. First, traditional passive reconfigurable smart surfaces suffer a severe "double fading" effect in long-distance satellite-terrestrial links: the reflected signal experiences the product of the path losses of the transmitter-to-surface and surface-to-receiver segments, so its strength falls far below the thermal noise level and it cannot effectively compensate for cross-layer interference from ground base stations to satellite primary users. Second, most existing schemes assume perfect channel state information and ignore the thermal noise amplification introduced by active reconfigurable smart surfaces, which significantly degrades the performance of the resulting beamforming designs in practical systems. Third, centralized optimization schemes require global channel state information to be aggregated at a central node for joint processing, which generates heavy signaling overhead and risks leaking user data.

[0004] Therefore, to address the aforementioned technical issues, it is necessary to provide a cognitive satellite-terrestrial ISAC secure beamforming method based on federated reinforcement learning.

Summary of the Invention

[0005] In view of this, this application provides a cognitive satellite-terrestrial ISAC secure beamforming method based on federated reinforcement learning, which jointly optimizes communication and sensing performance while protecting data privacy.

[0006] To achieve the above objective, the following solution is proposed. A cognitive satellite-terrestrial ISAC secure beamforming method based on federated reinforcement learning includes:

constructing, based on the cognitive satellite-terrestrial heterogeneous network, an integrated sensing and communication system model that includes an active reconfigurable smart surface, a dual-function satellite base station, a terrestrial base station, a primary user, a secondary user, and an eavesdropping target;

establishing, based on the channel state information of the satellite link, the ground link, and the cascaded link via the active reconfigurable smart surface, a channel and signal reception model that contains the thermal noise amplification effect of the active reconfigurable smart surface;

jointly designing, with the goal of maximizing the radar output signal-to-noise ratio, the satellite transmit beamforming, the ground base station transmit precoding, the reflection coefficients of the active reconfigurable smart surface, and the radar receive filter, and constructing a joint optimization problem that satisfies constraints on secrecy rate, quality of service, satellite power, ground base station power, total power of the active reconfigurable smart surface, and amplitude of the reflective elements;

adopting a federated multi-agent deep reinforcement learning framework to transform the optimization problem into a multi-agent Markov decision process, and obtaining the optimal secure beamforming strategy by distributed iterative solution with a federated deep deterministic policy gradient algorithm.

Furthermore, the channel and signal reception model includes a direct channel from the satellite to the user, a direct channel from the ground base station to the user, a cascaded channel from the ground base station to the user via the active reconfigurable smart surface, and a radar target echo channel; the signal reflected by the active reconfigurable smart surface contains an amplified thermal noise term, and the reflection coefficient has an adjustable amplitude and an adjustable phase.

[0007] Furthermore, the joint optimization problem is specifically: to maximize the radar output signal-to-noise ratio subject to the minimum signal-to-interference-plus-noise ratio thresholds of the primary and secondary users, the maximum signal-to-interference-plus-noise ratio threshold of the eavesdropper, the maximum transmit power of the satellite and the ground base station, the total power consumption limit of the active reconfigurable smart surface, and the maximum amplification factor of each reflective element of the active reconfigurable smart surface.

[0008] Furthermore, in the federated multi-agent deep reinforcement learning framework, the satellite base station and the ground base station are each set as an independent agent that maintains its own local state space, action space, and experience replay pool. The state space includes local channel state information, the actions of the previous round, and the historical radar signal-to-noise ratio. The action space includes continuous-valued beamforming vectors, the amplitudes and phases of the active reconfigurable smart surface, and the radar receive filter. The reward function is a weighted sum of a logarithmic radar signal-to-noise ratio term and seven constraint penalty terms.

[0009] Furthermore, the federated learning process exchanges only model parameters between each agent and the central gateway, without transmitting raw channel data. Model aggregation uses a weighted average based on the number of users. After the global model is updated, it is distributed to each agent to achieve distributed collaborative decision-making.

[0010] Furthermore, in the federated deep deterministic policy gradient algorithm, each agent independently trains an actor network and a critic network, updating the critic network using the Bellman equation and the actor network using the policy gradient. Soft updates of the target networks are used to keep training stable. Model upload, global aggregation, and parameter broadcast are performed at fixed aggregation rounds until convergence.

[0011] Furthermore, the active reconfigurable smart surface performs three functions: suppressing cross-layer interference from ground base stations to satellite primary users; generating directional artificial noise to degrade the quality of the eavesdropping channel; and compensating for the double-fading loss in long-distance satellite-terrestrial links.

[0012] Furthermore, the non-convex constraints of the optimization problem are embedded into the reward function through Lagrange penalty terms, so that the joint constraints on secrecy rate, quality of service, and power budget are satisfied.

[0013] Furthermore, the federated aggregation frequency is adaptively set according to the characteristics of slow fading in satellite channels and fast fading in terrestrial channels, achieving an optimal balance between local exploration and global synchronization.

[0014] Furthermore, after the federated deep deterministic policy gradient algorithm is trained offline, online inference only performs forward propagation of the neural network to meet the low latency requirements of the satellite-ground system.

[0015] Compared with the prior art, the beneficial effects of the present invention are as follows:

1. An active reconfigurable smart surface replaces the traditional passive reconfigurable smart surface. The power amplifier integrated into each reflective element actively amplifies the reflected signal, effectively compensating for the product of the path losses from transmitter to surface and from surface to receiver and significantly improving the strength of the reflected signal. This enables the reconfigurable smart surface to play a substantial role in interference suppression and signal enhancement in high-attenuation satellite scenarios.

2. In the joint optimization problem, the thermal noise power term introduced by the active reconfigurable smart surface is explicitly modeled and incorporated into the signal-to-interference-plus-noise ratio expression of the received signal and into the total power constraint. This brings the beamforming design closer to the actual physical system and avoids the performance estimation error and reduced reliability in deployment that simplified modeling would cause.

3. A federated multi-agent deep reinforcement learning framework is adopted, in which the satellite and the ground base station are constructed as independent agents. Each agent maintains its own state information locally and trains a deep neural network independently. It only periodically uploads model parameters to the gateway for federated aggregation, without exchanging raw channel data. This significantly reduces system signaling overhead while effectively protecting user data privacy.

4. By sharing the Lagrange penalty terms in the reward function, the multi-constraint optimization problem is transformed into an unconstrained optimization problem, enabling the system to maximize radar sensing performance while meeting communication security requirements. This effectively resolves the mutual constraints between the communication and sensing functions in the cognitive satellite-terrestrial integrated sensing and communication system.

Brief Description of the Drawings

[0016] Various other advantages and benefits will become apparent to those skilled in the art upon reading the following detailed description of preferred embodiments. The accompanying drawings are for illustrative purposes only and are not intended to limit the invention. The same reference numerals denote the same parts throughout the drawings. In the drawings, each provided in an embodiment of the present invention:

Figure 1 is a flowchart of the cognitive satellite-terrestrial ISAC secure beamforming method based on federated reinforcement learning;
Figure 2 is a schematic diagram of the cognitive satellite-terrestrial ISAC system architecture;
Figure 3 is a schematic diagram comparing the convergence performance of different algorithms;
Figure 4 is a schematic diagram of the radar signal-to-noise ratio convergence performance at different federated aggregation frequencies;
Figure 5 is a schematic diagram of the relationship between the radar signal-to-noise ratio and the number of reflective elements of the reconfigurable smart surface;
Figure 6 is a schematic diagram of the relationship between the radar signal-to-noise ratio and the number of base station antennas;
Figure 7 is a schematic diagram of the relationship between the radar signal-to-noise ratio and the power budget of the active reconfigurable smart surface;
Figure 8 is a schematic diagram of the relationship between the radar signal-to-noise ratio and the horizontal position of the reconfigurable smart surface;
Figure 9 is a schematic diagram of the relationship between the radar signal-to-noise ratio and the primary user quality of service threshold;
Figure 10 is a schematic diagram comparing the sensing beam patterns under different maximum amplification factors.

Detailed Implementation

[0017] Exemplary embodiments of the present disclosure will now be described in more detail with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided to enable a more thorough understanding of the present disclosure and to fully convey the scope of the disclosure to those skilled in the art. It should be noted that, unless otherwise specified, embodiments and features in the embodiments of the present invention can be combined with each other. The present invention will now be described in detail with reference to the accompanying drawings and embodiments.

[0018] Embodiment 1: Referring to Figures 1 and 2, this embodiment provides a cognitive satellite-terrestrial ISAC secure beamforming method based on federated reinforcement learning. Referring to Figure 1, the method may include the following steps. Step S100: Based on the cognitive satellite-terrestrial heterogeneous network, construct an integrated sensing and communication system model containing an active RIS, a dual-function satellite base station, a ground base station, a primary user, a secondary user, and an eavesdropping target.

[0019] Specifically, as shown in Figure 2, this embodiment considers a cognitive satellite-terrestrial integrated sensing and communication network assisted by an active reconfigurable smart surface. The system contains K single-antenna primary users, one single-antenna secondary user, and one single-antenna eavesdropper that also serves as the sensing target. The satellite is equipped with N transceiver antennas and performs communication transmission and radar sensing simultaneously. The ground base station is equipped with N_t antennas and serves the secondary user. The active reconfigurable smart surface is deployed near the users and contains M reflective elements, each equipped with an integrated power amplifier that actively amplifies the reflected signal.

[0020] Step S110: Based on the channel state information of the satellite link, ground link, and active reconfigurable smart surface cascade link, establish a channel and signal reception model containing the thermal noise amplification effect of the active reconfigurable smart surface.

[0021] Specifically, the channels between the satellite and the users are modeled with the shadowed-Rician fading model, giving channel vectors from the satellite to the k-th primary user, the secondary user, and the eavesdropper. The terrestrial channels between the ground base station and each user comprise direct-link channels and cascaded-link channels via the active reconfigurable smart surface. These are the direct-link channel vectors from the base station to the k-th primary user, the secondary user, and the eavesdropper; the channel matrix from the base station to the active reconfigurable smart surface; and the channel vectors from the active reconfigurable smart surface to the k-th primary user, the secondary user, and the eavesdropper.

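[0022] The reflection coefficient matrix of the active reconfigurable smart surface is a diagonal matrix $\boldsymbol{\Theta} = \mathrm{diag}(\tau_1 e^{j\theta_1}, \ldots, \tau_M e^{j\theta_M})$, where $\tau_m$ is the amplitude amplification factor of the m-th reflective element and $\theta_m$ is the phase shift of the m-th reflective element, satisfying $\theta_m \in [0, 2\pi)$ and $\tau_m \le \tau_{\max}$. Here $\tau_{\max}$ is the preset maximum amplitude threshold.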

[0023] Because the reflective elements of the active reconfigurable smart surface amplify the signal, the thermal noise they introduce must be considered. Let $\mathbf{n}_I$ denote the dynamic noise at the active reconfigurable smart surface, whose distribution is parameterized by the thermal noise power. The noise at each user is modeled as additive white Gaussian noise.

[0024] The joint radar-communication signal transmitted by the satellite consists of a communication part and a radar part. The communication part carries the information symbols of the K primary users, weighted by the corresponding communication beamforming matrix. The radar part contains N independent radar waveforms, weighted by the radar beamforming matrix. Together, the two beamforming matrices form the overall transmit beamforming matrix of the satellite, $\mathbf{W}$.

[0025] The base station sends the desired signal to the secondary user, precoded with a transmit beamforming vector.

[0026] A composite channel is defined between the base station and each of the k-th primary user, the secondary user, and the eavesdropper.

[0027] Based on the above model, the received signals at the k-th primary user, the secondary user, and the eavesdropper are each expressed in terms of the overall symbol vector.

[0028] Accordingly, the signal-to-interference-plus-noise ratio (SINR) of the k-th primary user is calculated from the received signal model, where $\mathbf{w}_k$ is the k-th column of $\mathbf{W}$.

[0029] The SINR at the eavesdropper for the signal of the k-th primary user and the SINR of the secondary user are defined in the same way. For radar sensing, the radar echo signal received by the satellite is processed by a receive filter, and the radar output signal-to-noise ratio is computed from the filtered echo and depends on the expected value of the target's radar cross section.

[0030] Step S120: With maximizing the radar output signal-to-noise ratio as the optimization objective, jointly design the satellite transmit beamforming, the ground base station transmit precoding, the active RIS reflection coefficients, and the radar receive filter, and construct a joint optimization problem that satisfies the constraints on secrecy rate, quality of service, satellite power, ground base station power, active RIS total power, and reflective element amplitude.

[0031] Specifically, the objective of this embodiment is to maximize the radar output signal-to-noise ratio $\gamma_r$ while satisfying all system constraints. Taking into account the quality of service requirements of the primary and secondary users, the upper limit on the signal-to-interference-plus-noise ratio (SINR) at the eavesdropper, the power budgets of the satellite and the base station, the total power budget of the active reconfigurable smart surface, and the amplitude constraints of the reflective elements, the joint optimization problem P1 is constructed. In this problem, $\eta_k$ and $\eta_s$ are the preset SINR thresholds for the primary and secondary users, $\eta_{e,k}$ is the preset upper limit on the SINR at the eavesdropper, $P_{BS}$ and $P_{SAT}$ are the preset power budgets of the base station and the satellite, respectively, $P_{RIS}$ is the preset power budget of the active reconfigurable smart surface, and $\tau_{\max}$ is the preset maximum threshold for the amplitude of a reflective element.
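The display equation for problem P1 is not reproduced in this text. A schematic form consistent with the constraints listed in the paragraph above might look as follows. Only $\gamma_r$, $\eta_k$, $\eta_s$, $\eta_{e,k}$, $P_{BS}$, $P_{SAT}$, $P_{RIS}$, and $\tau_{\max}$ come from the text; the variable names $\mathbf{W}$, $\mathbf{v}$, $\boldsymbol{\Theta}$, $\mathbf{u}$, the SINR symbols $\gamma_k$, $\gamma_s$, $\gamma_{e,k}$, and the power functions $P_{\mathrm{sat}}(\cdot)$, $P_{\mathrm{bs}}(\cdot)$, $P_{\mathrm{ris}}(\cdot)$ are assumed placeholders, not the patent's own notation.

```latex
\begin{aligned}
\text{(P1)}\quad \max_{\mathbf{W},\,\mathbf{v},\,\boldsymbol{\Theta},\,\mathbf{u}} \quad & \gamma_r \\
\text{s.t.}\quad & \gamma_k \ge \eta_k, \quad k = 1,\dots,K, \\
& \gamma_s \ge \eta_s, \\
& \gamma_{e,k} \le \eta_{e,k}, \quad k = 1,\dots,K, \\
& P_{\mathrm{sat}}(\mathbf{W}) \le P_{SAT}, \\
& P_{\mathrm{bs}}(\mathbf{v}) \le P_{BS}, \\
& P_{\mathrm{ris}}(\boldsymbol{\Theta}, \mathbf{v}) \le P_{RIS}, \\
& \tau_m \le \tau_{\max}, \quad m = 1,\dots,M.
\end{aligned}
```

Counting the rows gives the seven constraint families that the reward function later penalizes. The exact power expressions, in particular how the amplified thermal noise enters the surface's total power, are left abstract here.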

[0032] The aforementioned optimization problem exhibits non-convexity and high coupling between variables, making it difficult to solve effectively using traditional optimization methods. Therefore, this embodiment transforms problem P1 into a multi-agent Markov decision process and employs a federated deep deterministic policy gradient algorithm for solution.

[0033] Step S130: Using a federated multi-agent deep reinforcement learning framework, the optimization problem is transformed into a multi-agent Markov decision process. The optimal secure beamforming strategy is obtained by distributed iterative solution with a federated deep deterministic policy gradient algorithm.

[0034] To solve the aforementioned non-convex optimization problem, this embodiment transforms it into a multi-agent Markov decision process. Given the distributed nature of the system and the need to protect channel privacy, a dual-agent framework consisting of a satellite agent and a ground base station agent is defined. The process is described by a tuple and evolves over discrete time steps.

[0035] 1) State space design: The state space provides each agent with the environmental information it needs for decision-making. This embodiment divides the state space into the observation state of the satellite agent and the observation state of the base station agent.

[0036] For the satellite agent, the state at time step t includes the satellite-related channel state information and the action taken at the previous time step. For the base station agent, the state includes the channel state information of the direct terrestrial links and the cascaded links, as well as the local configuration of the previous time slot.

2) Action space design: The action space corresponds to the decision variables in the optimization problem. Since the deep deterministic policy gradient algorithm can handle continuous action spaces, this embodiment decomposes each complex-valued beamforming matrix into its real and imaginary parts for output.

[0037] The action of the satellite agent is the satellite transmit beamforming matrix. The action of the base station agent comprises the base station transmit beamforming vector, the reflection coefficient matrix of the active reconfigurable smart surface, and the radar receive filter. To handle the coupled constraints on the reflection coefficients, the elements of the action vector of the active reconfigurable smart surface are generated by an actor network with a Tanh activation function, and the network outputs are mapped to the physical amplitude and phase parameters by a linear transformation. This linear transformation ensures that the generated actions always satisfy the amplitude and phase range constraints.
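The mapping formula itself is not reproduced in this text. A minimal sketch of one linear mapping that has the stated property is given below, assuming the actor emits 2M Tanh outputs in [-1, 1], with the first M driving amplitudes and the last M driving phases. The function names, the output layout, and the specific affine form are illustrative assumptions, not the patent's own definition.

```python
import cmath
import math


def map_ris_action(raw, tau_max):
    """Map raw Tanh outputs in [-1, 1] to (amplitudes, phases).

    Assumed layout: raw[:M] drives amplitudes, raw[M:] drives phases.
    """
    if len(raw) % 2 != 0:
        raise ValueError("expected 2*M outputs (M amplitudes + M phases)")
    m = len(raw) // 2
    # Affine map [-1, 1] -> [0, tau_max], so the amplitude cap always holds.
    amplitudes = [0.5 * (a + 1.0) * tau_max for a in raw[:m]]
    # Affine map [-1, 1] -> [0, 2*pi], so the phase range always holds.
    phases = [(a + 1.0) * math.pi for a in raw[m:]]
    return amplitudes, phases


def reflection_coefficients(amplitudes, phases):
    """Build the diagonal entries tau_m * exp(j * theta_m)."""
    return [cmath.rect(t, p) for t, p in zip(amplitudes, phases)]
```

Because the constraint is enforced by construction, the amplitude penalty term in the reward would only ever be triggered if a different parameterization were used.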

[0038] 3) Reward function design: This embodiment constructs the reward function by combining the original optimization objective with the seven system constraints using the Lagrange multiplier method. At time step t, the total system reward is a weighted combination of the objective term and the penalty terms, where the weights are hyperparameters that balance the orders of magnitude of the optimization objective and the penalty terms. The penalty terms corresponding to the constraints are:

(1) penalty for violating the primary user quality of service;
(2) penalty for violating the secondary user quality of service;
(3) penalty for exceeding the signal-to-interference-plus-noise ratio limit at the eavesdropper;
(4) penalty for exceeding the base station power limit;
(5) penalty for exceeding the satellite power limit;
(6) penalty for exceeding the total power limit of the active reconfigurable smart surface;
(7) penalty for exceeding the amplitude limit of the reflective elements.

The goal of the learning process is to find the optimal policy that maximizes the expected cumulative reward starting from any state s, where the cumulative reward is the sum of the discounted rewards.

4) Federated deep deterministic policy gradient algorithm: This embodiment employs a federated deep deterministic policy gradient algorithm to solve the aforementioned Markov decision process. Each agent maintains a policy network and a value network. The federated learning framework enables collaborative learning by leveraging data distributed across the satellite and the base station. Each agent processes and retains its state information locally, transmitting only periodic model updates to the network gateway.
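The reward in part 3) above can be sketched as an objective term minus seven penalty terms. The exact expressions are not reproduced in this text, so the sketch below assumes a log2(1 + SNR) objective and hinge-style penalties that are zero when a constraint holds. The function names, the `limits` dictionary keys, and the two weights are hypothetical, and all quantities are taken in linear units.

```python
import math


def hinge(violation):
    """Return the positive part: zero when the constraint is satisfied."""
    return max(0.0, violation)


def shared_reward(radar_snr, sinr_pu, sinr_su, sinr_eve,
                  p_bs, p_sat, p_ris, amplitudes, limits,
                  w_obj=1.0, w_pen=1.0):
    """Assumed form: w_obj * log2(1 + radar SNR) - w_pen * sum of penalties."""
    penalties = [
        sum(hinge(limits["eta_k"] - s) for s in sinr_pu),       # (1) primary QoS
        hinge(limits["eta_s"] - sinr_su),                       # (2) secondary QoS
        sum(hinge(s - limits["eta_e"]) for s in sinr_eve),      # (3) eavesdropper cap
        hinge(p_bs - limits["p_bs"]),                           # (4) base station power
        hinge(p_sat - limits["p_sat"]),                         # (5) satellite power
        hinge(p_ris - limits["p_ris"]),                         # (6) surface total power
        sum(hinge(t - limits["tau_max"]) for t in amplitudes),  # (7) element amplitude
    ]
    return w_obj * math.log2(1.0 + radar_snr) - w_pen * sum(penalties)
```

With this form, any feasible action receives the pure objective value, while each violated constraint lowers the reward in proportion to the size of the violation. In practice the penalty weights would need tuning so that no single term dominates.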

[0039] The global objective function is defined as a weighted combination of the agents' local objectives. The aggregation weight of region j is calculated from the number of users in region j, with separate user counts for the satellite region and the ground base station region.

[0040] The value network is updated by minimizing the Bellman error, with a loss function that measures the gap between the value estimate and a target value. The target network parameters are updated via soft update, governed by the soft update coefficient.
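The loss, target value, and soft update formulas are not reproduced in this text. The sketch below shows the standard deep deterministic policy gradient forms, which are assumed here to be what the embodiment uses: a one-step bootstrapped target, a mean squared Bellman error, and a Polyak-style soft update. Parameters are represented as plain lists of floats for illustration, and the default values 0.9 and 0.01 are the discount factor and soft update parameter reported in the simulation settings.

```python
def td_target(reward, next_q, discount=0.9, done=False):
    """One-step target: r + discount * Q_target(s', mu_target(s'))."""
    return reward + (0.0 if done else discount * next_q)


def critic_loss(q_values, targets):
    """Mean squared Bellman error over a mini-batch."""
    return sum((y - q) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


def soft_update(target_params, online_params, zeta=0.01):
    """Move each target parameter a fraction zeta toward the online one."""
    return [zeta * w + (1.0 - zeta) * wt
            for w, wt in zip(online_params, target_params)]
```

A small zeta makes the target networks trail the online networks slowly, which is what keeps the bootstrapped targets from chasing themselves during training.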

[0041] The actor network is updated using the deterministic policy gradient to maximize the expected cumulative reward. The gradient guides the neural network to adjust the optimization variables, maximizing the radar output signal-to-noise ratio while satisfying the constraints.

[0042] Every preset aggregation period $T_{agg}$, each agent uploads its local model parameters to the gateway. The gateway then performs a weighted average of the parameters based on the aggregation weights to generate the global model parameters. The gateway broadcasts the global model parameters to each agent, and each agent uses them to update its local model, thus achieving distributed collaborative learning.
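The upload, aggregate, and broadcast cycle described above can be sketched as follows. This is a minimal illustration that assumes each agent's model is flattened into an equal-length list of floats and that the aggregation weight of an agent is its user count divided by the total. The agent names, the user counts in the usage note, and the step-based trigger are assumptions rather than details taken from the source.

```python
def aggregation_weights(user_counts):
    """Weight each agent by its share of the total number of users."""
    total = sum(user_counts.values())
    return {agent: n / total for agent, n in user_counts.items()}


def federated_average(local_params, weights):
    """Weighted average of equal-length parameter lists, one per agent."""
    agents = list(local_params)
    length = len(local_params[agents[0]])
    if any(len(local_params[a]) != length for a in agents):
        raise ValueError("all agents must share the same parameter layout")
    return [sum(weights[a] * local_params[a][i] for a in agents)
            for i in range(length)]


def should_aggregate(step, t_agg=100):
    """True on every t_agg-th training step (assumed step-based period)."""
    return step > 0 and step % t_agg == 0
```

For example, with three users in the satellite region and one in the base station region, `aggregation_weights({"satellite": 3, "base_station": 1})` gives weights 0.75 and 0.25, and the gateway would broadcast the result of `federated_average` back to both agents.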

[0043] Embodiment 2: Building on Embodiment 1 described above, this embodiment further defines the functions of the active RIS. In this embodiment, the active RIS performs three functions. 1) Suppressing cross-layer interference from ground base stations to satellite primary users. The active RIS generates destructive interference that cancels the impact of the strong direct-path signals from the ground base station on the primary users, thereby ensuring satellite reception quality.

[0044] 2) Generating directional artificial noise to degrade the eavesdropping channel. The active RIS converts terrestrial communication signals into beneficial "green artificial noise" directed at the eavesdropper, degrading the eavesdropping channel and enhancing physical layer security without consuming additional jamming power.

[0045] 3) Compensating for the double-fading loss in long-distance satellite-terrestrial links. By actively amplifying the reflected signal, the active RIS compensates for the product of the path losses of the transmitter-to-RIS and RIS-to-receiver segments, ensuring reliable connectivity and sensing performance.
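The "product of path losses" in function 3) becomes a sum in decibels, which a short calculation makes concrete. The sketch below uses the free-space path loss formula for each segment and ignores array gain from the M elements, antenna gains, and fading. The distances in the usage note are hypothetical and the function names are illustrative.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def fspl_db(distance_m, freq_hz):
    """Free-space path loss in dB for a single segment."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


def cascaded_loss_db(d1_m, d2_m, freq_hz, amplitude_gain=1.0):
    """Loss over transmitter-to-surface plus surface-to-receiver segments.

    The linear losses multiply, so the dB losses add. An active element
    with amplitude gain tau recovers 20*log10(tau) dB of that loss.
    """
    return (fspl_db(d1_m, freq_hz) + fspl_db(d2_m, freq_hz)
            - 20.0 * math.log10(amplitude_gain))
```

As an illustration, at the 28 GHz carrier used in the simulations, `cascaded_loss_db(100.0, 50.0, 28e9)` is far larger than `fspl_db(150.0, 28e9)` for a direct path of the same total length, and setting `amplitude_gain=10.0` reduces the cascaded loss by exactly 20 dB. The amplifier also amplifies thermal noise, which this loss-only calculation does not capture.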

[0046] Embodiment 3: Building on Embodiment 1 above, this embodiment further specifies the penalty terms in the reward function. The non-convex constraints of the optimization problem are embedded into the reward function through Lagrange penalty terms to satisfy the joint constraints on secrecy rate, quality of service, and power budget. Specifically, the penalty terms cover primary user quality of service violations, secondary user quality of service violations, the eavesdropper's signal-to-interference-plus-noise ratio (SINR) exceeding its limit, base station power exceeding its limit, satellite power exceeding its limit, active RIS total power exceeding its limit, and reflective element amplitude exceeding its limit. These are weighted and summed to form the total penalty term.

[0047] Embodiment 4: Building on Embodiment 1 above, this embodiment sets the federated aggregation frequency adaptively, according to the slow fading of satellite channels and the fast fading of terrestrial channels. Specifically, since satellite channels change slowly while terrestrial channels change rapidly, this embodiment uses a longer aggregation interval (e.g., ρ=200) so that each agent can adapt to its own local channel dynamics before its parameters are averaged, thereby balancing local exploration against global synchronization.

[0048] Embodiment 5: Building on Embodiment 1 above, this embodiment explains the efficiency of online inference. After the federated deep deterministic policy gradient algorithm is trained offline, online inference performs only the forward propagation of the neural network, with a computational complexity of only [missing information]. Forward propagation involves only simple matrix-vector multiplications, which is much simpler than traditional iterative optimization algorithms and meets the low-latency requirements of satellite-terrestrial systems.

[0049] To verify the effectiveness of the proposed federated multi-agent deep deterministic policy gradient algorithm in this embodiment, simulation experiments were conducted. The simulation was implemented using Python 3.8 and PyTorch, and the Adam optimizer was used to update the neural network parameters.

[0050] Regarding the federated deep deterministic policy gradient architecture, the satellite agent and the base station agent use the same structure for their actor and critic networks. Each network consists of an input layer, two hidden layers with 256 neurons each, and an output layer corresponding to the action space dimension. The hidden layers use the ReLU activation function, and the output layer of the actor network uses the hyperbolic tangent function to restrict actions to feasible ranges. The learning rate of both the actor and critic networks is set to $10^{-3}$, the discount factor to $\lambda = 0.9$, and the soft update parameter to $\zeta = 0.01$. During training, the batch size is $B = 64$, the aggregation period is $T_{agg} = 100$, and the replay buffer size is $D = 10^4$. Training runs for a total of $E_{max} = 2000$ episodes, each consisting of $T = 10$ time slots.
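The training settings above can be collected in one place, which also makes a few derived quantities easy to check. The sketch below is only a convenience container: the field names are illustrative, the assumption that the aggregation period is counted in time slots is not stated in the source, and `mlp_macs` counts multiply-accumulate operations for one forward pass through a network with two 256-unit hidden layers, ignoring biases and activations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    learning_rate: float = 1e-3     # actor and critic
    discount: float = 0.9           # lambda
    soft_update: float = 0.01       # zeta
    batch_size: int = 64            # B
    aggregation_period: int = 100   # T_agg (assumed to be in time slots)
    replay_size: int = 10_000       # D
    episodes: int = 2000            # E_max
    slots_per_episode: int = 10     # T
    hidden_units: int = 256

    @property
    def total_steps(self):
        return self.episodes * self.slots_per_episode

    def aggregation_rounds(self):
        """Number of gateway aggregations over the whole run."""
        return self.total_steps // self.aggregation_period


def mlp_macs(in_dim, out_dim, hidden=256):
    """Multiply-accumulates for one forward pass: in -> hidden -> hidden -> out."""
    return in_dim * hidden + hidden * hidden + hidden * out_dim
```

Under these assumptions a full run has 20,000 time slots and 200 aggregation rounds. The hidden-to-hidden layer contributes 65,536 multiply-accumulates per forward pass regardless of the input and output sizes, which is consistent with online inference being cheap.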

[0051] Regarding the wireless environment, the system operates in the Ka band with a carrier frequency of 28 GHz and a system bandwidth of 500 MHz. The geostationary satellite is located at an altitude of 35,786 km and is equipped with a uniform planar array of N=4 antennas. The number of primary users is set to K=3. The ground base station is equipped with N=4 antennas, serves one secondary user, and detects potential targets. The active reconfigurable smart surface is equipped with M=30 reflective elements, and the thermal noise power introduced by its active components is set to -70 dBm. The noise power at all receiving nodes is set to -110 dBm. The base station power budget is P_BS = 30 dBm, the satellite power budget is P_SAT = 40 dBm, and the active reconfigurable smart surface power budget is P_RIS = 20 dBm. The signal-to-interference-plus-noise ratio (SINR) requirement for the primary and secondary users is set to 4 dB, and the signal-to-noise ratio threshold for the eavesdropper is set to 2 dB.
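As a non-limiting illustration, the logarithmic quantities in this paragraph can be converted to the linear units that a simulator would typically work with. The helper names and dictionary keys are hypothetical; the conversions themselves are the usual dBm-to-watt and dB-to-ratio definitions.

```python
def dbm_to_watt(power_dbm: float) -> float:
    """Convert power in dBm to watts: P[W] = 10^((P[dBm] - 30) / 10)."""
    return 10.0 ** ((power_dbm - 30.0) / 10.0)


def db_to_linear(ratio_db: float) -> float:
    """Convert a ratio in dB to a linear ratio."""
    return 10.0 ** (ratio_db / 10.0)


# Power budgets from the simulation setup (keys are assumed labels).
budgets_w = {
    "P_SAT": dbm_to_watt(40.0),
    "P_BS": dbm_to_watt(30.0),
    "P_RIS": dbm_to_watt(20.0),
}
receiver_noise_w = dbm_to_watt(-110.0)      # noise power at receiving nodes
ris_thermal_noise_w = dbm_to_watt(-70.0)    # thermal noise of active elements

user_threshold = db_to_linear(4.0)          # 4 dB threshold as a linear ratio
eavesdropper_threshold = db_to_linear(2.0)  # 2 dB threshold as a linear ratio

# Carrier wavelength at 28 GHz, using the speed of light in vacuum.
wavelength_m = 299_792_458.0 / 28e9
```

In linear units the budgets are 10 W for the satellite, 1 W for the base station, and 0.1 W for the active surface, and the carrier wavelength is roughly 10.7 mm.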

[0052] To verify the effectiveness and superiority of the federated deep deterministic policy gradient scheme proposed in this embodiment, the following benchmark schemes are included for comparative analysis: (1) Twin delayed deep deterministic policy gradient scheme: uses clipped double Q-learning and a delayed policy update mechanism to reduce the overestimation bias inherent in the traditional deep deterministic policy gradient.

[0053] (2) Traditional deep deterministic policy gradient scheme: the standard single-agent deep deterministic policy gradient method is used to jointly optimize the beamforming and the reflection coefficients.

[0054] (3) Passive reconfigurable smart surface scheme: the active reconfigurable smart surface is replaced by a traditional passive reconfigurable smart surface, and the amplitude of each reflective element is strictly limited to 1, which eliminates the thermal noise introduced by the active components.

[0055] (4) Scheme without a reconfigurable smart surface: the reconfigurable smart surface is removed, and the satellite and base station use only direct links to serve users and sense targets.
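As a non-limiting illustration, the difference between the active surface and benchmark schemes (3) and (4) can be expressed as different constraints on the complex reflection coefficients. The sketch assumes a per-element amplitude cap τ_max for the active surface, unit amplitude for the passive surface, and zero reflection when no surface is present. The function names and sample coefficients are hypothetical, and the total power constraint of the active surface is not modeled here.

```python
import cmath
from typing import List


def project_active(coeffs: List[complex], tau_max: float) -> List[complex]:
    """Cap each element's amplitude at tau_max while keeping its phase."""
    projected = []
    for c in coeffs:
        if abs(c) <= tau_max:
            projected.append(c)
        else:
            projected.append(cmath.rect(tau_max, cmath.phase(c)))
    return projected


def project_passive(coeffs: List[complex]) -> List[complex]:
    """Passive surface: amplitude fixed to 1, only the phase is adjustable."""
    return [cmath.rect(1.0, cmath.phase(c)) for c in coeffs]


def no_surface(num_elements: int) -> List[complex]:
    """No surface: the reflected path contributes nothing."""
    return [0j] * num_elements


raw = [3 + 4j, 0.5j]  # assumed raw actor outputs mapped to complex values
active = project_active(raw, tau_max=2.0)
passive = project_passive(raw)
absent = no_surface(len(raw))
```

Only the active projection permits amplitudes above 1, which is why that scheme also introduces the amplified thermal noise that the passive benchmark avoids.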

[0056] Figures 3 to 10 show the simulation results. Figure 3 shows the convergence behavior of the radar output signal-to-noise ratio (SNR) over the training epochs. The federated deep deterministic policy gradient scheme with an active reconfigurable smart surface converges to a significantly higher SNR than both the passive reconfigurable smart surface scheme and the baseline scheme without a reconfigurable smart surface. The federated deep deterministic policy gradient proposed in this embodiment stabilizes rapidly at approximately 800 epochs, while the traditional deep deterministic policy gradient exhibits the slowest convergence and persistent oscillations throughout the training process.

[0057] Figure 4 shows the impact of the federated aggregation frequency on learning efficiency. A lower aggregation frequency (aggregation interval ρ=200) produces the best performance, exhibiting the fastest convergence and stabilizing at a peak signal-to-noise ratio of approximately 36 dB as early as the 400th round.

[0058] Figure 5 shows the impact of the number M of reconfigurable smart surface reflective elements and the maximum active amplification factor τ_max on sensing performance. The active reconfigurable smart surface architecture delivers a significant performance improvement by enhancing the channel gain to achieve a high-quality virtual line-of-sight link.

[0059] Figure 6 shows the radar output signal-to-noise ratio as a function of the number of base station antennas N_t. The active reconfigurable smart surface scheme proposed in this embodiment performs significantly better than the passive reconfigurable smart surface benchmark and the benchmark without a reconfigurable smart surface.

[0060] Figure 7 shows the relationship between the radar output signal-to-noise ratio and the total power budget. The active reconfigurable smart surface architecture exhibits significantly better power utilization efficiency than the passive reconfigurable smart surface baseline.

[0061] Figure 8 shows the relationship between the radar output signal-to-noise ratio and the horizontal distance between the reconfigurable smart surface and the sensed target. The active reconfigurable smart surface scheme exhibits a significantly more stable trend.

[0062] Figure 9 shows the radar output signal-to-noise ratio as a function of the primary user quality of service threshold η_k. The active reconfigurable smart surface scheme proposed in this embodiment is significantly superior to the passive reconfigurable smart surface scheme.

[0063] Figure 10 compares the sensing beam gain under different maximum amplification factors. The active reconfigurable smart surface-assisted architecture exhibits a significantly higher peak gain than the passive reconfigurable smart surface benchmark.

[0064] The simulation results above demonstrate that the cognitive satellite-ground integrated sensing and communication security beamforming method based on federated reinforcement learning proposed in this embodiment can effectively overcome the dual fading effect in the satellite-ground link, and achieve secure beamforming of the integrated communication and sensing system while protecting data privacy.

[0065] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0066] This application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0067] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0068] These computer program instructions can also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0069] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the specific implementation of the present invention. Any modifications or equivalent substitutions that do not depart from the spirit and scope of the present invention should be covered within the scope of protection of the claims of the present invention.

Claims

1. A cognitive satellite-to-ground ISAC security beamforming method based on federated reinforcement learning, characterized in that it comprises: constructing, based on a cognitive satellite-terrestrial heterogeneous network, an integrated sensing and communication system model that includes an active reconfigurable smart surface, a dual-function satellite base station, a terrestrial base station, a primary user, a secondary user, and an eavesdropping target; establishing, based on the channel state information of the satellite link, the ground link, and the active reconfigurable smart surface cascaded link, a channel and signal reception model that includes the thermal noise amplification effect of the active reconfigurable smart surface; jointly designing, with the goal of maximizing the radar output signal-to-noise ratio, the satellite transmit beamforming, the ground base station transmit precoding, the active reconfigurable smart surface reflection coefficients, and the radar receive filter, and constructing a joint optimization problem subject to constraints on secrecy rate, quality of service, satellite power, ground base station power, total power of the active reconfigurable smart surface, and amplitude of the reflective elements; and adopting a federated multi-agent deep reinforcement learning framework to transform the optimization problem into a multi-agent Markov decision process, and obtaining the optimal secure beamforming strategy by distributed iterative solution through a federated deep deterministic policy gradient algorithm.

2. The cognitive satellite-to-ground ISAC security beamforming method based on federated reinforcement learning according to claim 1, characterized in that the channel and signal reception model includes a direct channel from the satellite to the user, a direct channel from the ground base station to the user, a cascaded channel from the ground base station to the user via the active reconfigurable smart surface, and a radar target echo channel; wherein the signal reflected by the active reconfigurable smart surface contains an amplified thermal noise term, and the reflection coefficient includes an adjustable amplitude and an adjustable phase.

3. The cognitive satellite-to-ground ISAC security beamforming method based on federated reinforcement learning according to claim 1, characterized in that the joint optimization problem is specifically: maximizing the radar output signal-to-noise ratio (SNR) while satisfying the minimum SNR thresholds for the primary and secondary users, the maximum SNR threshold for eavesdroppers, the maximum transmit power of the satellite and the ground base station, the total power consumption limit of the active reconfigurable smart surface, and the maximum amplification of each active reconfigurable smart surface unit.

4. The cognitive satellite-to-ground ISAC security beamforming method based on federated reinforcement learning according to claim 1, characterized in that the federated multi-agent deep reinforcement learning framework includes: setting the satellite base station and the ground base station as independent agents, each of which maintains its own local state space, action space, and experience replay pool; wherein the state space includes local channel state information, the actions of the previous round, and the historical radar signal-to-noise ratio; the action space includes continuous-valued beamforming vectors, the amplitude and phase of the active reconfigurable smart surface, and the radar receive filter; and the reward function is a weighted sum of a logarithmic term of the radar signal-to-noise ratio and seven constraint penalty terms.

5. The cognitive satellite-to-ground ISAC security beamforming method based on federated reinforcement learning according to claim 4, characterized in that the federated learning process exchanges only model parameters between the agents and the central gateway, without transmitting raw channel data; model aggregation adopts a weighted average method based on the number of users; and after the global model is updated, it is distributed to each agent to achieve distributed collaborative decision-making.

6. The cognitive satellite-to-ground ISAC security beamforming method based on federated reinforcement learning according to claim 1, characterized in that the federated deep deterministic policy gradient algorithm includes: each agent independently training its actor network and critic network, updating the critic network using the Bellman equation and updating the actor network using the policy gradient; using target network soft updates to ensure training stability; and performing model upload, global aggregation, and parameter broadcast at fixed aggregation rounds until convergence.

7. The cognitive satellite-to-ground ISAC security beamforming method based on federated reinforcement learning according to claim 1, characterized in that the active reconfigurable smart surface is used to perform three functions: suppressing cross-layer interference from the ground base station to the satellite primary users; generating directional artificial noise to reduce the quality of the eavesdropping channel; and compensating for the dual fading loss in the long-distance satellite-to-ground link.

8. The cognitive satellite-to-ground ISAC security beamforming method based on federated reinforcement learning according to claim 1, characterized in that the non-convex constraints in the optimization problem are embedded into the reward function through Lagrange penalty terms to satisfy the joint constraints of secrecy rate, quality of service, and power budget.

9. The cognitive satellite-to-ground ISAC security beamforming method based on federated reinforcement learning according to claim 6, characterized in that the federated aggregation frequency is adaptively set according to the characteristics of slow fading in satellite channels and fast fading in terrestrial channels, achieving an optimal balance between local exploration and global synchronization.

10. The cognitive satellite-to-ground ISAC security beamforming method based on federated reinforcement learning according to claim 1, characterized in that, after the federated deep deterministic policy gradient algorithm is trained offline, online inference performs only the forward propagation of the neural network to meet the low-latency requirements of the satellite-ground system.