Interference resource allocation method for resource-constrained underwater frequency-hopping communication
The method constructs an underwater adaptive interference environment and optimizes the interference strategy with Actor and Critic networks. This addresses how to perceive channel characteristics and adjust interference under underwater resource constraints, thereby improving interference effectiveness and strategy stability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- QINGDAO UNIV OF SCI & TECH
- Filing Date
- 2026-05-09
- Publication Date
- 2026-06-19
AI Technical Summary
In underwater acoustic communication countermeasure environments, existing technologies struggle to perceive channel characteristics and to dynamically adjust interference frequency, bandwidth, and power under resource constraints, which limits interference effectiveness.
An underwater adaptive interference environment is constructed, and interference strategies are optimized using Actor and Critic networks. Spectral features are extracted through probability-threshold binarization mapping and a sliding-window operator. By combining the proximal policy optimization (PPO) algorithm with a circuit-breaker mechanism, the method dynamically adjusts interference frequency, bandwidth, and power, and a composite reward function is constructed for training.
It improves the success rate of interference tasks, reduces energy consumption, and enhances the model's generalization ability and policy stability under resource-constrained conditions.
Figure CN122247455A_ABST
Abstract
Description
Technical Field
[0001] This invention discloses an interference resource allocation method for resource-constrained underwater frequency-hopping communication, belonging to the field of communication countermeasures and underwater acoustic communication technology.
Background Technology
[0002] In underwater acoustic communication countermeasures, frequency-selective fading causes signal propagation loss to vary significantly across frequency bands. Traditional jamming methods typically rely on accurately predicting the communicating party's frequency-hopping time series. In complex and variable underwater acoustic environments, however, jammers often face non-stationary observations, limited energy resources, and long feedback delays. Under resource constraints, therefore, sensing channel characteristics and dynamically adjusting jamming frequency, bandwidth, and power is crucial for jamming effectiveness.
Summary of the Invention
[0003] The purpose of this invention is to provide an interference resource allocation method for underwater resource-constrained frequency hopping communication, so as to solve the problem in the prior art of how to sense channel characteristics and dynamically adjust interference frequency, bandwidth and power under resource-constrained conditions.
[0004] The interference resource allocation method for resource-constrained underwater frequency-hopping communication comprises the following steps:
S1. Construct an underwater adaptive interference environment. The power spectral density (PSD) transmitted by the interfering party is set to two levels, a low power level and a high power level. An interference success criterion is constructed from the PSD at the communicating party's receiver and an effective interference threshold. Probability-threshold binarization mapping converts the per-frequency-point fading probability into spectral features, and the spectral features are concatenated with the constraint indicators to form the original state vector.
S2. From the original state vector, a sliding-window operator extracts the local density features for each PSD level. The high-power action margin and the dynamic bandwidth pressure index are calculated from the current resource boundary, yielding an enhanced feature space.
S3. The Actor and Critic networks of the proximal policy optimization (PPO) algorithm are introduced, and the enhanced feature space is fed into them. The Actor network outputs a probability distribution over interference actions, and the Critic network outputs a value estimate of the current state. A three-dimensional action, consisting of the frequency index, the bandwidth, and the PSD level, is sampled from the Actor's distribution.
S4. A circuit-breaker mechanism is set. If the circuit breaker is triggered, the interference task fails. Otherwise, the hit frequency points are determined from the spectral features and combined with the updated constraint indicators to form the original state vector of the next state, after which step S2 is executed to obtain the enhanced feature space of the next state.
S5. A composite reward function is constructed as a linear weighting of three parts: hit incentive, resource consumption, and task evaluation. For each time step, the state sequence, action sequence, reward sequence, and next-state sequence are constructed.
S6. A PPO clipped objective function is introduced to train the Actor and Critic networks. After training, the optimal interference policy model is obtained.
[0005] S1 includes S1.1: the power spectral density (PSD) transmitted by the interfering party takes one of two levels, a low-power level and a high-power level. The distance between the interfering party and the communicating party's receiver is set to [value missing]. At the interference frequency, the propagation loss is [formula missing], where the absorption coefficient of seawater increases significantly with frequency. The interfering party's PSD at the communicating party's receiver is [formula missing]. Given the effective interference threshold, the interference success criterion is [formula missing].
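The source omits the propagation-loss and received-PSD formulas. As a hedged illustration only, the sketch below assumes a common textbook model: spreading loss k·10·log10(d) plus Thorp's empirical absorption coefficient, which grows with frequency as the text describes. The received jammer PSD is the transmitted PSD minus this loss (in dB), and the success check compares it against an effective threshold. The function names, the spreading factor k = 2, and all numbers are hypothetical, not the patent's.

```python
import math


def thorp_absorption_db_per_km(f_khz: float) -> float:
    """Thorp's empirical seawater absorption coefficient in dB/km (f in kHz)."""
    f2 = f_khz * f_khz
    return (0.11 * f2 / (1.0 + f2)
            + 44.0 * f2 / (4100.0 + f2)
            + 2.75e-4 * f2
            + 0.003)


def transmission_loss_db(distance_m: float, f_khz: float, k: float = 2.0) -> float:
    """Assumed propagation loss: k*10*log10(d) spreading (d in m, 1 m reference)
    plus Thorp absorption accumulated over the path (converted from dB/km)."""
    if distance_m < 1.0:
        raise ValueError("distance must be at least the 1 m reference distance")
    spreading = k * 10.0 * math.log10(distance_m)
    absorption = thorp_absorption_db_per_km(f_khz) * distance_m / 1000.0
    return spreading + absorption


def received_psd_db(source_psd_db: float, distance_m: float, f_khz: float) -> float:
    """Jammer PSD arriving at the communicating party's receiver, in dB."""
    return source_psd_db - transmission_loss_db(distance_m, f_khz)


def jamming_succeeds(source_psd_db: float, distance_m: float,
                     f_khz: float, threshold_db: float) -> bool:
    """Hypothetical success check: received jammer PSD must exceed the threshold."""
    return received_psd_db(source_psd_db, distance_m, f_khz) > threshold_db


if __name__ == "__main__":
    # Illustrative values only: a low and a high PSD level 10 dB apart.
    for level_db in (110.0, 120.0):
        print(level_db, jamming_succeeds(level_db, 1000.0, 25.0, 50.0))
```

A 10 dB gap corresponds to the 10x power ratio used in the embodiment; in this example it is what lets the high-power level clear the threshold at a frequency where the low-power level falls short.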
[0006] S1 includes S1.2: two preset success-probability thresholds are set, one for each PSD level, and they satisfy an ordering relation [formula missing]. Each PSD level is also assigned its own probabilistic fading (decay) parameter, and the two parameters satisfy [formula missing]. For each frequency point in the communicating party's frequency-hopping set, indexed from 1 up to the total number of frequency points, two binary spectral features are defined: the low-power feature equals 1 if low-power interference at that frequency point meets the expected interference and 0 otherwise, and the high-power feature equals 1 if high-power interference meets the expected interference and 0 otherwise. Each feature is set to 1 when the corresponding success probability meets its threshold and to 0 when it does not [formula missing]. The low-power and high-power features are stacked row by row into a two-dimensional spectral feature array with two rows and one column per frequency point. Task constraint metrics are collected in real time: the number of remaining interference steps, the remaining target margin still to be jammed, the total number of frequency points successfully jammed so far in the current task, and the normalized remaining bandwidth resource. The original state vector is constructed from these as [formula missing].
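A minimal sketch of S1.2, assuming the per-frequency success probabilities for each PSD level are already available (the patent derives them from probabilistic fading parameters whose formula is missing) and that "meets the threshold" means greater than or equal. The function names, the threshold values, and the flat layout of the state vector (spectral rows first, then the four constraint metrics listed above) are assumptions.

```python
from typing import List, Sequence, Tuple


def binarize(probs: Sequence[float], threshold: float) -> List[int]:
    """Probability-threshold binarization: 1 if the success probability meets
    the threshold (expected interference met), else 0."""
    return [1 if p >= threshold else 0 for p in probs]


def build_original_state(p_low: Sequence[float], p_high: Sequence[float],
                         th_low: float, th_high: float,
                         remaining_steps: int, remaining_targets: int,
                         jammed_count: int, bandwidth_left: float
                         ) -> Tuple[List[List[int]], List[float]]:
    """Return the 2 x N spectral feature array and the flat original state."""
    if len(p_low) != len(p_high):
        raise ValueError("both PSD levels must cover the same frequency points")
    x_low = binarize(p_low, th_low)
    x_high = binarize(p_high, th_high)
    spectral = [x_low, x_high]  # row 0: low-power level, row 1: high-power level
    constraints = [float(remaining_steps), float(remaining_targets),
                   float(jammed_count), float(bandwidth_left)]
    state = [float(v) for v in x_low + x_high] + constraints
    return spectral, state
```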
[0007] S2 includes using the sliding-window operator to extract, along the frequency axis, the density feature at each PSD level: [formula missing], where the operator runs over the spectral feature array of that PSD level, the window covers a set of frequency indices, and each term is the element value at one covered frequency index. This yields one density feature for the low-power level and one for the high-power level. The high-power action margin is calculated from the real-time resource boundary as [formula missing], whose inputs are the energy consumption of a single high-power attack and the current resource boundary. The dynamic bandwidth pressure index is calculated as [formula missing], and the enhanced feature space is constructed as [formula missing].
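The S2 formulas are missing from the source, so the sketch below fills them with labeled assumptions: the density feature is taken as the mean of the binary features inside a centered window (width 3, as in the embodiment), truncated at the band edges; the high-power action margin is assumed to be the number of additional high-power attacks the remaining resource boundary can afford; and the dynamic bandwidth pressure index borrows the embodiment's task-urgency definition (remaining targets divided by remaining steps). All three function names are hypothetical.

```python
from typing import List, Sequence


def window_density(row: Sequence[int], width: int = 3) -> List[float]:
    """Local density of one spectral-feature row: the mean over a centered
    window of `width` frequency points, truncated at the band edges."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    half = width // 2
    n = len(row)
    density = []
    for i in range(n):
        window = row[max(0, i - half):min(n, i + half + 1)]
        density.append(sum(window) / len(window))
    return density


def high_power_margin(resource_left: float, high_power_cost: float) -> int:
    """Assumed form: how many more high-power attacks the budget still allows."""
    if high_power_cost <= 0:
        raise ValueError("high_power_cost must be positive")
    return max(0, int(resource_left // high_power_cost))


def bandwidth_pressure(remaining_targets: int, remaining_steps: int) -> float:
    """Stand-in: remaining targets per remaining step (the 'task urgency')."""
    return remaining_targets / max(remaining_steps, 1)
```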
[0008] S3 includes obtaining the three-dimensional action by random sampling from the probability distribution over interference actions; the action consists of the frequency index (the starting frequency point), the bandwidth, and the PSD level. At each time step, the underwater adaptive interference environment receives the action and calculates the current cumulative resource loss as [formula missing], whose terms are the bandwidth used at each elapsed step and the energy weighting coefficient of the power level used at that step.
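The patent states only that the Actor outputs a probability distribution over the three-dimensional action. The sketch below assumes one common factorization, three independent categorical heads whose logits come from the Actor, and reads the cumulative resource loss as the running sum of each step's bandwidth times the energy weight of its PSD level. The function names, logits, bandwidth options, and weights (1 for low power, 10 for high power, mirroring the embodiment's 10x ratio) are illustrative.

```python
import math
import random
from typing import Dict, Sequence, Tuple


def sample_categorical(logits: Sequence[float], rng: random.Random) -> int:
    """Draw an index from softmax(logits); subtracting the max keeps exp() stable."""
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def sample_action(freq_logits: Sequence[float], bw_logits: Sequence[float],
                  power_logits: Sequence[float], bandwidth_options: Sequence[int],
                  rng: random.Random) -> Tuple[int, int, int]:
    """Sample (start frequency index, bandwidth, PSD level) from three
    independent categorical heads; level 0 = low power, 1 = high power."""
    k = sample_categorical(freq_logits, rng)
    b = bandwidth_options[sample_categorical(bw_logits, rng)]
    p = sample_categorical(power_logits, rng)
    return k, b, p


def cumulative_loss(history: Sequence[Tuple[int, int]],
                    energy_weight: Dict[int, float]) -> float:
    """Sum over elapsed steps of bandwidth x energy weight of that step's level."""
    return sum(b * energy_weight[p] for b, p in history)
```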
[0009] S4 includes setting a circuit-breaker mechanism with a resource threshold. If the cumulative resource loss crosses the threshold [formula missing], the circuit breaker is triggered and the interference mission is judged to have failed. Otherwise [formula missing], the circuit breaker is not triggered and the following operations are performed. If the action selects the low-power level, the low-power spectral features are used; if it selects the high-power level, the high-power spectral features are used. The action covers a contiguous run of frequency points, as many as the selected bandwidth, starting at the selected frequency index; these form the set of action coverage locations. The spectral feature at each covered location is checked: a value of 1 marks the frequency point as a hit, and a value of 0 marks it as a miss. The number of hits among the covered frequency points is counted, and the features of the hit frequency points are cleared to zero in both the low-power and high-power arrays. The remaining steps, the remaining target margin, and the jammed count are then updated, the latest spectral feature array is merged into the internal state, and the next state is calculated as [formula missing]. Step S2 is then executed on the next state to obtain its enhanced feature space.
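The S4 transition can be sketched as follows, assuming the breaker trips when the cumulative loss strictly exceeds the threshold, that coverage is clipped at the upper band edge, and that each hit reduces the remaining target margin by one. The `StepResult` type and `env_step` function are hypothetical; building the next state vector would reuse the S1.2 and S2 steps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepResult:
    breaker_tripped: bool
    hits: int
    remaining_steps: int
    remaining_targets: int
    jammed: int


def env_step(x_low: List[int], x_high: List[int], k: int, b: int, p: int,
             cost_so_far: float, cost_limit: float, remaining_steps: int,
             remaining_targets: int, jammed: int) -> StepResult:
    """One S4 transition. Mutates x_low and x_high in place (hit points cleared)."""
    if cost_so_far > cost_limit:
        # Circuit breaker: the interference mission fails immediately.
        return StepResult(True, 0, remaining_steps, remaining_targets, jammed)
    row = x_high if p == 1 else x_low            # features of the chosen PSD level
    covered = range(k, min(k + b, len(row)))     # b consecutive points from k
    hit_points = [i for i in covered if row[i] == 1]
    for i in hit_points:                         # a jammed point cannot be hit twice
        x_low[i] = 0
        x_high[i] = 0
    hits = len(hit_points)
    return StepResult(False, hits, remaining_steps - 1,
                      max(remaining_targets - hits, 0), jammed + hits)
```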
[0010] S5 includes S5.1: a target coverage threshold and a task completion reward are set, and the current coverage is calculated from the number of successfully jammed frequency points [formula missing]. When the coverage reaches the target threshold, the task evaluation term equals the task completion reward [formula missing]. When the remaining steps reach 0 and the coverage is still below the threshold, the task evaluation term is a penalty [formula missing], composed of an additional failure penalty and a penalty coefficient applied to each remaining frequency point. The composite reward function is constructed as a linear weighting of three parts, the hit incentive, the resource consumption, and the task evaluation: [formula missing], where the weights are a gain coefficient, a penalty weight, and the energy weighting coefficient of the power level.
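Because the S5.1 formulas are missing, this sketch only illustrates the described structure: a completion reward once coverage reaches the target threshold, a failure penalty (a fixed part plus a per-remaining-target part) when the steps run out first, and a reward that adds a gain-weighted hit count, subtracts a cost-weighted bandwidth times power-level energy weight, and adds the task term. The exact functional form, all names, and the example values are assumptions; the 0.4 coverage target echoes the 40% round-success criterion of the embodiment.

```python
def task_evaluation(jammed: int, total_points: int, remaining_steps: int,
                    remaining_targets: int, coverage_target: float,
                    completion_reward: float, failure_penalty: float,
                    per_target_penalty: float) -> float:
    """Task-evaluation term of the reward (assumed form)."""
    coverage = jammed / total_points
    if coverage >= coverage_target:
        return completion_reward
    if remaining_steps == 0:
        # Failure: fixed extra penalty plus a penalty per point still unjammed.
        return -(failure_penalty + per_target_penalty * remaining_targets)
    return 0.0


def composite_reward(hits: int, bandwidth: int, power_weight: float,
                     task_term: float, gain: float, cost_weight: float) -> float:
    """Linear weighting: hit incentive - resource consumption + task evaluation."""
    return gain * hits - cost_weight * bandwidth * power_weight + task_term
```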
[0011] S5 includes S5.2: for each time step, up to the preset number of sampling steps, the state sequence, the action sequence, the reward sequence, and the next-state sequence are constructed [formula missing].
[0012] S6 includes S6.1: a proximal policy optimization (PPO) clipped objective function is introduced to update the parameters of the Actor and Critic networks. First, a probability-ratio constraint is introduced. Let $\pi_{\theta_{\text{old}}}$ be the old Actor network with fixed parameters and $\pi_\theta$ the Actor network currently being updated. For the state $s_t$ and action $a_t$ at the $t$-th time step, the probability ratio is defined as $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\text{old}}}(a_t \mid s_t)$, where $\theta$ denotes the parameters of the Actor network. A clipping threshold $\epsilon$ is set, and $r_t(\theta)$ is limited to the range $[1-\epsilon, 1+\epsilon]$ by $\operatorname{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)$, which equals $1-\epsilon$ when $r_t(\theta) < 1-\epsilon$, equals $1+\epsilon$ when $r_t(\theta) > 1+\epsilon$, and equals $r_t(\theta)$ otherwise. Using the state, action, reward, and next-state sequences, the advantage $\hat{A}_t$ is calculated for each time step: $\delta_t = R_t + \gamma V(s_{t+1}) - V(s_t)$ and $\hat{A}_t = \sum_{l \ge 0} (\gamma\lambda)^l \delta_{t+l}$, with the sum truncated at the end of the sampled sequence, where $R_t$ is the reward at step $t$, $V(s_t)$ is the value estimate of the current state, $\gamma$ is the discount factor, and $\lambda$ is the generalized advantage estimation (GAE) parameter. When $\hat{A}_t > 0$, the probability of selecting $a_t$ is increased; when $\hat{A}_t < 0$, it is decreased; and when $\hat{A}_t = 0$, it remains unchanged.
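A minimal standard-library sketch of the two S6.1 ingredients: GAE computed backward through a rollout, and the probability ratio with its clipped version. The default γ = 0.99 and λ = 0.95 are common choices, not the patent's values, which the source does not give; zeroing the bootstrap value at terminal steps is also an assumption.

```python
import math
from typing import List, Sequence, Tuple


def gae(rewards: Sequence[float], values: Sequence[float],
        next_values: Sequence[float], dones: Sequence[bool],
        gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized advantage estimation, computed backward over one rollout.

    delta_t = R_t + gamma * V(s_{t+1}) - V(s_t), with V(s_{t+1}) zeroed at
    terminal steps, and A_t = delta_t + gamma * lam * A_{t+1}.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        nonterminal = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_values[t] * nonterminal - values[t]
        running = delta + gamma * lam * nonterminal * running
        advantages[t] = running
    return advantages


def clipped_ratio(logp_new: float, logp_old: float,
                  eps: float = 0.2) -> Tuple[float, float]:
    """Return r_t(theta) = pi_new / pi_old and its clip to [1 - eps, 1 + eps]."""
    ratio = math.exp(logp_new - logp_old)
    return ratio, min(max(ratio, 1.0 - eps), 1.0 + eps)
```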
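[0013] S6 includes S6.2: from $r_t(\theta)$ and $\hat{A}_t$, the PPO clipped objective function is constructed as $L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \operatorname{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)\hat{A}_t\right)\right]$, where $\hat{\mathbb{E}}_t$ denotes the expectation over time steps. A value function loss $L^{\text{VF}}(\phi)$ and a policy entropy term $S[\pi_\theta]$ are introduced to construct the total loss function $L(\theta, \phi) = -L^{\text{CLIP}}(\theta) + c_1 L^{\text{VF}}(\phi) - c_2 S[\pi_\theta]$, where $\phi$ denotes the parameters of the Critic network, $c_1$ is the value loss coefficient, and $c_2$ is the entropy regularization coefficient. $\theta$ and $\phi$ are updated simultaneously by gradient descent. A training step threshold is set; when the number of training steps reaches it, training stops and the trained Actor network is output as the optimal interference policy model.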
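Continuing the sketch, the clipped surrogate and the total loss of S6.2 can be evaluated over a minibatch as below. Minimizing this value by gradient descent corresponds to maximizing the clipped objective and the entropy bonus while fitting the Critic. The coefficients c1 = 0.5 and c2 = 0.01 are placeholder values, the mean-squared-error form of the value loss is an assumption, and in practice an autodiff framework would compute the gradients with respect to θ and φ.

```python
from typing import Sequence


def ppo_total_loss(ratios: Sequence[float], advantages: Sequence[float],
                   values: Sequence[float], returns: Sequence[float],
                   entropies: Sequence[float], eps: float = 0.2,
                   c1: float = 0.5, c2: float = 0.01) -> float:
    """Total loss -L_CLIP + c1 * L_VF - c2 * entropy, averaged over a minibatch.

    L_VF is taken as the mean squared error between V(s_t) and its return
    target (an assumption; the source does not give its form).
    """
    n = len(ratios)
    surrogate = sum(
        min(r * a, min(max(r, 1.0 - eps), 1.0 + eps) * a)
        for r, a in zip(ratios, advantages)) / n
    value_loss = sum((v - g) ** 2 for v, g in zip(values, returns)) / n
    entropy = sum(entropies) / n
    return -surrogate + c1 * value_loss - c2 * entropy
```

With a positive advantage the clip caps how much a large ratio can improve the objective; with a negative advantage the `min` keeps the more pessimistic, clipped term.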
[0014] Compared with the prior art, this invention offers the following advantages. It addresses the low allocation efficiency and unstable strategy evolution of underwater interference resources under the dual constraints of bandwidth and power. By using a state guidance layer to extract spectral density maps and physical resource margin signals, it perceives the local coherence of the underwater acoustic channel and the task boundary more accurately. It also introduces the PPO clipping mechanism and a composite reward function, which capture the details of the interference strategy more precisely and support the adaptive generation of interference strategies. By injecting multi-dimensional guidance signals, the agent adapts its decisions to different resource-pressure states, raising the interference task success rate while reducing energy consumption and improving the model's generalization ability.
Attached Figure Description
[0015] Figure 1 is an overall flowchart of one embodiment of the present invention; Figure 2 is a schematic diagram of the interference task of the present invention; Figure 3 is a structural diagram of the agent of the present invention; Figure 4 compares the power consumption of the method of the present invention with that of other methods; Figure 5 compares the reward function results of the method of the present invention with those of other methods; Figure 6 compares the interference success rate of the method of the present invention with that of other methods.
Detailed Implementation
[0016] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention are described clearly and completely below. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0017] The interference resource allocation method for resource-constrained underwater frequency-hopping communication comprises the following steps:
S1. Construct an underwater adaptive interference environment. The power spectral density (PSD) transmitted by the interfering party is set to two levels, a low power level and a high power level. An interference success criterion is constructed from the PSD at the communicating party's receiver and an effective interference threshold. Probability-threshold binarization mapping converts the per-frequency-point fading probability into spectral features, and the spectral features are concatenated with the constraint indicators to form the original state vector.
S2. From the original state vector, a sliding-window operator extracts the local density features for each PSD level. The high-power action margin and the dynamic bandwidth pressure index are calculated from the current resource boundary, yielding an enhanced feature space.
S3. The Actor and Critic networks of the proximal policy optimization (PPO) algorithm are introduced, and the enhanced feature space is fed into them. The Actor network outputs a probability distribution over interference actions, and the Critic network outputs a value estimate of the current state. A three-dimensional action, consisting of the frequency index, the bandwidth, and the PSD level, is sampled from the Actor's distribution.
S4. A circuit-breaker mechanism is set. If the circuit breaker is triggered, the interference task fails. Otherwise, the hit frequency points are determined from the spectral features and combined with the updated constraint indicators to form the original state vector of the next state, after which step S2 is executed to obtain the enhanced feature space of the next state.
S5. A composite reward function is constructed as a linear weighting of three parts: hit incentive, resource consumption, and task evaluation. For each time step, the state sequence, action sequence, reward sequence, and next-state sequence are constructed.
S6. A PPO clipped objective function is introduced to train the Actor and Critic networks. After training, the optimal interference policy model is obtained.
[0018] S1 includes S1.1: the power spectral density (PSD) transmitted by the interfering party takes one of two levels, a low-power level and a high-power level. The distance between the interfering party and the communicating party's receiver is set to [value missing]. At the interference frequency, the propagation loss is [formula missing], where the absorption coefficient of seawater increases significantly with frequency. The interfering party's PSD at the communicating party's receiver is [formula missing]. Given the effective interference threshold, the interference success criterion is [formula missing].
[0019] S1 includes S1.2: two preset success-probability thresholds are set, one for each PSD level, and they satisfy an ordering relation [formula missing]. Each PSD level is also assigned its own probabilistic fading (decay) parameter, and the two parameters satisfy [formula missing]. For each frequency point in the communicating party's frequency-hopping set, indexed from 1 up to the total number of frequency points, two binary spectral features are defined: the low-power feature equals 1 if low-power interference at that frequency point meets the expected interference and 0 otherwise, and the high-power feature equals 1 if high-power interference meets the expected interference and 0 otherwise. Each feature is set to 1 when the corresponding success probability meets its threshold and to 0 when it does not [formula missing]. The low-power and high-power features are stacked row by row into a two-dimensional spectral feature array with two rows and one column per frequency point. Task constraint metrics are collected in real time: the number of remaining interference steps, the remaining target margin still to be jammed, the total number of frequency points successfully jammed so far in the current task, and the normalized remaining bandwidth resource. The original state vector is constructed from these as [formula missing].
[0020] S2 includes using the sliding-window operator to extract, along the frequency axis, the density feature at each PSD level: [formula missing], where the operator runs over the spectral feature array of that PSD level, the window covers a set of frequency indices, and each term is the element value at one covered frequency index. This yields one density feature for the low-power level and one for the high-power level. The high-power action margin is calculated from the real-time resource boundary as [formula missing], whose inputs are the energy consumption of a single high-power attack and the current resource boundary. The dynamic bandwidth pressure index is calculated as [formula missing], and the enhanced feature space is constructed as [formula missing].
[0021] S3 includes obtaining the three-dimensional action by random sampling from the probability distribution over interference actions; the action consists of the frequency index (the starting frequency point), the bandwidth, and the PSD level. At each time step, the underwater adaptive interference environment receives the action and calculates the current cumulative resource loss as [formula missing], whose terms are the bandwidth used at each elapsed step and the energy weighting coefficient of the power level used at that step.
[0022] S4 includes setting a circuit-breaker mechanism with a resource threshold. If the cumulative resource loss crosses the threshold [formula missing], the circuit breaker is triggered and the interference mission is judged to have failed. Otherwise [formula missing], the circuit breaker is not triggered and the following operations are performed. If the action selects the low-power level, the low-power spectral features are used; if it selects the high-power level, the high-power spectral features are used. The action covers a contiguous run of frequency points, as many as the selected bandwidth, starting at the selected frequency index; these form the set of action coverage locations. The spectral feature at each covered location is checked: a value of 1 marks the frequency point as a hit, and a value of 0 marks it as a miss. The number of hits among the covered frequency points is counted, and the features of the hit frequency points are cleared to zero in both the low-power and high-power arrays. The remaining steps, the remaining target margin, and the jammed count are then updated, the latest spectral feature array is merged into the internal state, and the next state is calculated as [formula missing]. Step S2 is then executed on the next state to obtain its enhanced feature space.
[0023] S5 includes S5.1: a target coverage threshold and a task completion reward are set, and the current coverage is calculated from the number of successfully jammed frequency points [formula missing]. When the coverage reaches the target threshold, the task evaluation term equals the task completion reward [formula missing]. When the remaining steps reach 0 and the coverage is still below the threshold, the task evaluation term is a penalty [formula missing], composed of an additional failure penalty and a penalty coefficient applied to each remaining frequency point. The composite reward function is constructed as a linear weighting of three parts, the hit incentive, the resource consumption, and the task evaluation: [formula missing], where the weights are a gain coefficient, a penalty weight, and the energy weighting coefficient of the power level.
[0024] S5 includes S5.2: for each time step, up to the preset number of sampling steps, the state sequence, the action sequence, the reward sequence, and the next-state sequence are constructed [formula missing].
[0025] S6 includes S6.1: a proximal policy optimization (PPO) clipped objective function is introduced to update the parameters of the Actor and Critic networks. First, a probability-ratio constraint is introduced. Let $\pi_{\theta_{\text{old}}}$ be the old Actor network with fixed parameters and $\pi_\theta$ the Actor network currently being updated. For the state $s_t$ and action $a_t$ at the $t$-th time step, the probability ratio is defined as $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\text{old}}}(a_t \mid s_t)$, where $\theta$ denotes the parameters of the Actor network. A clipping threshold $\epsilon$ is set, and $r_t(\theta)$ is limited to the range $[1-\epsilon, 1+\epsilon]$ by $\operatorname{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)$, which equals $1-\epsilon$ when $r_t(\theta) < 1-\epsilon$, equals $1+\epsilon$ when $r_t(\theta) > 1+\epsilon$, and equals $r_t(\theta)$ otherwise. Using the state, action, reward, and next-state sequences, the advantage $\hat{A}_t$ is calculated for each time step: $\delta_t = R_t + \gamma V(s_{t+1}) - V(s_t)$ and $\hat{A}_t = \sum_{l \ge 0} (\gamma\lambda)^l \delta_{t+l}$, with the sum truncated at the end of the sampled sequence, where $R_t$ is the reward at step $t$, $V(s_t)$ is the value estimate of the current state, $\gamma$ is the discount factor, and $\lambda$ is the generalized advantage estimation (GAE) parameter. When $\hat{A}_t > 0$, the probability of selecting $a_t$ is increased; when $\hat{A}_t < 0$, it is decreased; and when $\hat{A}_t = 0$, it remains unchanged.
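[0026] S6 includes S6.2: from $r_t(\theta)$ and $\hat{A}_t$, the PPO clipped objective function is constructed as $L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \operatorname{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)\hat{A}_t\right)\right]$, where $\hat{\mathbb{E}}_t$ denotes the expectation over time steps. A value function loss $L^{\text{VF}}(\phi)$ and a policy entropy term $S[\pi_\theta]$ are introduced to construct the total loss function $L(\theta, \phi) = -L^{\text{CLIP}}(\theta) + c_1 L^{\text{VF}}(\phi) - c_2 S[\pi_\theta]$, where $\phi$ denotes the parameters of the Critic network, $c_1$ is the value loss coefficient, and $c_2$ is the entropy regularization coefficient. $\theta$ and $\phi$ are updated simultaneously by gradient descent. A training step threshold is set; when the number of training steps reaches it, training stops and the trained Actor network is output as the optimal interference policy model.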
[0027] The interference success criterion of this invention is derived as follows. The communicating party's signal PSD at the receiver and an interference threshold are defined; interference is deemed successful if the signal-to-interference-plus-noise ratio (SINR) at the target location is below this threshold: [formula missing], where the noise term is the underwater noise PSD [formula missing]. For a specific frequency, the effective interference threshold is [formula missing]. A frequency is considered successfully jammed if and only if the strength of the interfering signal at the receiver exceeds the sum of the communication signal and the background noise, which can be expressed as [formula missing].
[0028] In this expression, the left-hand side is the observed interference success at each frequency point. The criterion for successful interference is therefore [formula missing].
[0029] When the two PSD levels are set in practice, the low-power PSD should satisfy the above threshold with some margin, and the high-power PSD should be greater than the low-power PSD in order to counter fading.
[0030] The process of this invention is further illustrated below with reference to the accompanying drawings and embodiments. As shown in Figure 1, an underwater acoustic MDP environment is first constructed to realize PSD-level state mapping and to initialize the task constraint indicators. Feature enhancement is then performed: local densities are calculated and the bandwidth pressure and margin signals are appended. The PPO agent generates the policy and computes gradients, with the Actor-Critic networks configured accordingly. Actions are then executed and the environment transitions to the next state; the resource circuit breaker is checked and the target state is updated. After the update, the composite reward signal is calculated. Finally, the network parameters are iteratively updated with the PPO clipped objective function until the policy converges, and the optimal interference policy model is output.
[0031] To simulate a realistic underwater acoustic warfare environment, an adaptive interference environment based on the physical characteristics of sound propagation and the ambient noise floor is first constructed. The interference task of this invention is shown in Figure 2: the communicating party transmits its communication signals using the frequency-hopping array shown on the right of Figure 2, while on the left the dashed lines represent the interfering party's low-PSD interference signals and the solid lines represent high-PSD interference signals. If a dashed or solid interference band covers a frequency-hopping frequency point, interference at that frequency point succeeds.
[0032] This invention establishes an optimal interference criterion that dynamically balances task urgency against resource constraints. Figure 3 shows the overall structure of the model and its interaction with the environment, illustrating the closed-loop decision process for underwater frequency-hopping jamming based on deep reinforcement learning. The agent's decision center consists of two neural networks: the upper policy network outputs the action distribution [symbol missing] from the enhanced state, and the lower value network evaluates the current state. Each decision has three components: the interference bandwidth, the interference PSD, and the start frequency of the interference band. These actions act directly on the underwater frequency-hopping communication environment, which processes the original frequency-hopping set in real time and, by combining frequency-selective fading probabilities with physical factors such as underwater transmission loss and background noise, simulates how interference signals actually perform in complex acoustic channels. After an action is executed, the environment returns a reward. In the specific experiments, the reward parameters are set to [values missing], with separate settings for when the coverage task is completed and when it is not. This reward is used to update the network weights. The environment also outputs the original state information, which passes through the state guidance layer for feature enhancement: the layer computes the density distribution with a sliding window, monitors the margin for high-power actions, and evaluates the dynamic bandwidth pressure index, producing the enhanced state. The enhanced state is fed back into the neural networks, completing one iteration from perception to decision.
[0033] The settings of this embodiment are as follows. This embodiment uses the reinforcement learning algorithm PPO to optimize the strategy for underwater frequency-hopping communication interference tasks. The experimental environment simulates underwater conditions in a two-level-power interference scenario. The frequency-hopping set contains 64 frequency points, and each frequency point carries two types of observation: a low-power visibility layer and a high-power visibility layer. The environment's original state has 131 dimensions: 128 dimensions of spectrum occupancy information plus state variables representing task progress, such as the remaining steps and the remaining number of targets. In each round, the environment dynamically generates the communication frequency-point distribution and the channel state information. This experiment only requires that both low-PSD and high-PSD interference be feasible; the specific communication power can be chosen subject to this requirement, and only the ratio between the actual powers of the two PSD levels needs to be fixed. In this example the ratio is 10, that is, at the same bandwidth, high-PSD interference consumes 10 times the power of low-PSD interference. The parameters [symbols missing] are generated uniformly at random within their specified ranges, with thresholds of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 ...
[0034] A guidance layer is then introduced to enhance the state space on top of the original observations. Local spectral density features with a window width of 3 aggregate the neighborhood of each frequency point, producing a low-power density map and a high-power density map of 64 dimensions each, which characterize the potential interference gain near each frequency point. In parallel, two key decision signals are extracted from a global task perspective: the coverage capability ratio, which represents the theoretical capability of the current low-power mode to cover the remaining targets, and the task urgency, defined as the number of remaining targets divided by the number of remaining steps, which reflects task execution pressure. Finally, the 131-dimensional original state, the 128-dimensional density features, and the 2-dimensional guidance signal are concatenated into an enhanced state vector of 131 + 128 + 2 = 261 dimensions, which serves as the input to the reinforcement learning model of this invention.
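The dimension bookkeeping of the embodiment can be checked with a short sketch: 64 frequency points give 2 × 64 = 128 spectral entries, the remaining 131 − 128 = 3 entries of the original state are task-progress variables, and the two 64-dimensional density maps plus the two guidance signals bring the total to 131 + 128 + 2 = 261. The function and variable names are hypothetical, and zeros stand in for real features.

```python
from typing import List, Sequence


def assemble_enhanced_state(original: Sequence[float],
                            low_density: Sequence[float],
                            high_density: Sequence[float],
                            guidance: Sequence[float]) -> List[float]:
    """Concatenate original state, both density maps, and the guidance signals."""
    if len(low_density) != len(high_density):
        raise ValueError("density maps must have one entry per frequency point")
    return list(original) + list(low_density) + list(high_density) + list(guidance)


N_POINTS = 64                               # frequency-hopping set size
original = [0.0] * (2 * N_POINTS + 3)       # 128 spectral + 3 task variables = 131
low_density = [0.0] * N_POINTS
high_density = [0.0] * N_POINTS
guidance = [0.0, 0.0]                       # coverage capability ratio, task urgency
state = assemble_enhanced_state(original, low_density, high_density, guidance)
print(len(state))
```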
[0035] For model construction, the PPO algorithm is used for policy learning, and the network structure and size are kept unchanged so that the performance gain from the state augmentation method can be isolated. The discount factor is [value missing] and the generalized advantage estimation (GAE) parameter is [value missing]. The learning rate is 3×10⁻⁴. Each policy update uses 512 sampling steps, a batch size of 128, 10 training epochs, and a PPO clipping parameter of 0.2. Both the policy network and the value network use two fully connected layers of 256 neurons each, with ReLU activation.
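The stated hyperparameters imply some simple training arithmetic: 512 samples per update split into batches of 128 give 4 minibatches per epoch, so 10 epochs perform 40 gradient steps per rollout, and 500,000 total steps correspond to 500,000 / 512 ≈ 976.6 rollouts, or 977 if the final partial rollout is counted. The configuration object below is a hypothetical container for these values; the discount factor and GAE parameter are left out because their values are missing from the source.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PPOConfig:
    learning_rate: float = 3e-4
    rollout_steps: int = 512                    # sampling steps per policy update
    batch_size: int = 128
    epochs: int = 10
    clip_eps: float = 0.2
    hidden_sizes: Tuple[int, ...] = (256, 256)  # ReLU MLP for actor and critic
    total_steps: int = 500_000

    def minibatches_per_epoch(self) -> int:
        return self.rollout_steps // self.batch_size

    def updates_per_rollout(self) -> int:
        return self.epochs * self.minibatches_per_epoch()

    def num_rollouts(self) -> int:
        # Assumes a final partial rollout is still collected and trained on.
        return math.ceil(self.total_steps / self.rollout_steps)
```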
[0036] For training, this embodiment sets the total number of training steps to 500,000 (i.e., ... The system tracks the performance of the most recent 1000 rounds with a sliding window, continuously recording key indicators such as the average reward, the interference success rate, the average bandwidth consumption, and the number of high-power uses. Interference success is defined at two levels. At the single-step level (each action decision), interference at a frequency point succeeds when the interference PSD at the target location meets the interference requirement. At the round level (each interference task), a round succeeds when at least 40% of the frequency points in the frequency-hopping set have been successfully jammed; it fails if resources are exhausted or the maximum number of steps is reached first.
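With a 64-point hopping set, the 40% round-level criterion means a round succeeds once 26 frequency points are jammed (0.4 × 64 = 25.6, rounded up). The sketch below expresses that criterion and a 1000-round sliding-window success rate; the class and function names are hypothetical, and a bounded deque is one simple way to implement the window, not necessarily the one used in the experiments.

```python
from collections import deque


def round_succeeded(jammed: int, total_points: int = 64,
                    coverage_target: float = 0.4) -> bool:
    """Round-level criterion: at least 40% of the hopping set jammed."""
    return jammed / total_points >= coverage_target


class RollingSuccessRate:
    """Success rate over the most recent `window` rounds (1000 in the embodiment)."""

    def __init__(self, window: int = 1000) -> None:
        self._outcomes = deque(maxlen=window)   # oldest rounds drop out automatically

    def record(self, success: bool) -> None:
        self._outcomes.append(success)

    def rate(self) -> float:
        if not self._outcomes:
            return 0.0
        return sum(self._outcomes) / len(self._outcomes)
```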
[0037] Experimental results show that, in complex underwater channel environments, the proposed method reaches an interference success rate of approximately 95% after 500,000 training steps. It also outperforms traditional methods in reward convergence speed and resource utilization efficiency. At the same interference success rate, it significantly reduces bandwidth consumption and the frequency of high-power use, which demonstrates better policy stability and convergence efficiency. In comparisons with DQN, A2C, and standard PPO, enhancing the state space through the guiding layer lets the agent identify high-value interference frequencies more effectively. The method thus achieves global optimization of resource allocation while maintaining interference effectiveness, and it outperforms the other methods in this scenario.

Table 1 lists the average metrics over 1000 interference task tests. The proposed method has the highest success rate and outperforms PPO and DQN on all metrics. Against A2C, its success rate is 5% higher, at the cost of a slight increase in power consumption.
Table 1. Comparison of results between the method of the present invention and traditional methods.
[0038] Figure 4 shows the power consumption during training. The horizontal axis is the number of training steps, where each step corresponds to the model selecting one interference frequency band. The vertical axis is the power consumption, expressed as the equivalent low-power spectral bandwidth. Figure 5 shows the reward, with training steps on the horizontal axis and the reward value on the vertical axis. Figure 6 shows the success rate, with training steps on the horizontal axis and the success rate on the vertical axis. In each figure, curves of the same color appear in two shades: the lighter shade is the raw (unsmoothed) data and the darker shade is the smoothed data.
[0039] The above embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention.
Claims
1. A method for allocating interference resources for underwater resource-constrained frequency-hopping communication, characterized in that it comprises the following steps:
S1. Constructing an underwater adaptive interference environment. The power spectral density transmitted by the interfering party is set to two power levels, a low power level and a high power level. The interference success criterion is constructed from the power spectral density at the communicating party's receiving end and the effective interference threshold. Probability threshold binarization mapping converts the fading probability of each frequency point into spectral features. The spectral features and the constraint indicators are concatenated into the original state vector.
S2. Based on the original state vector, extracting the local density features at each power spectral density level with a sliding window operator. The high-power level action margin and the dynamic bandwidth pressure index are calculated from the current resource boundary, and together these form an enhanced feature space.
S3. Introducing the Actor network and the Critic network of the proximal policy optimization (PPO) algorithm and feeding the enhanced feature space into them. The Actor network outputs the probability distribution of the interference action, and the Critic network outputs the value estimate of the current state. A three-dimensional action is sampled from the probability distribution of the interference action; it comprises the frequency index, the bandwidth, and the power spectral density level.
S4. Setting up a circuit breaker mechanism. If the circuit breaker is triggered, the interference task fails. If it is not triggered, the hit frequency points are determined from the spectral features and combined with the updated constraint indicators to form the original state vector of the next state. Step S2 is then executed to obtain the enhanced feature space of the next state.
S5. Constructing a composite reward function as a linear weighting of three parts: hit incentive, resource consumption, and task evaluation. A state sequence, an action sequence, a reward sequence, and a next-state sequence are constructed by time step.
S6. Introducing the PPO clipped objective function to train the Actor network and the Critic network. The optimal interference policy model is obtained after training.
2. The interference resource allocation method for underwater resource-constrained frequency hopping communication according to claim 1, characterized in that S1 includes S1.1:
- The power spectral density transmitted by the interfering party takes one of two settings, a low-power setting and a high-power setting.
- The distance between the interfering party and the receiving end of the communicating party is set to [value missing].
- The propagation loss at the interference frequency is [formula missing], where the absorption coefficient of seawater increases significantly with frequency.
- The power spectral density of the interfering party at the receiving end of the communicating party is [formula missing].
- An effective interference threshold is set. Interference is deemed successful when this received power spectral density meets the effective interference threshold [formula missing].

3. The interference resource allocation method for underwater resource-constrained frequency hopping communication according to claim 2, characterized in that S1 includes S1.2:
- Preset success probability thresholds are set for the low power level and the high power level [relation between the thresholds missing]. A probabilistic decay parameter is set for each threshold, and the two parameters satisfy [condition missing].
- For each frequency point in the communicating party's frequency hopping set, indexed from 1 to N, where N is the total number of frequency points, two spectral features are defined. The low-power spectral feature of the i-th frequency point is 1 if the expected interference is met at the low power level and 0 otherwise. The high-power spectral feature is 1 if the expected interference is met at the high power level and 0 otherwise.
- Each feature is set to 1 or 0 by comparing the success probability of the frequency point with the corresponding threshold [exact conditions missing].
- The low-power and high-power features are stacked row by row to form a 2 × N two-dimensional spectral feature array.
- Task constraint indicators are collected in real time: the number of remaining interference steps, the target margin still to be jammed, the total number of frequencies successfully jammed in the current jamming mission, and the normalized remaining bandwidth resources.
- The original state vector is constructed from the spectral feature array and these indicators [formula missing].

4. The interference resource allocation method for underwater resource-constrained frequency hopping communication according to claim 3, characterized in that S2 includes:
- A sliding window operator extracts the density feature at each power spectral density level along the frequency axis [formula missing]. The density at frequency point i is obtained from the elements of the corresponding row of the spectral feature array over the set of frequency indices covered by the window. The density features of the low-power row and the high-power row are denoted separately.
- The high-power level action margin is calculated from the real-time resource boundary, using the energy consumption of a single high-power attack and the current resource boundary [formula missing].
- The dynamic bandwidth pressure index is calculated [formula missing].
- The enhanced feature space is constructed from these quantities [formula missing].

5. The interference resource allocation method for underwater resource-constrained frequency hopping communication according to claim 4, characterized in that S3 includes:
- The three-dimensional action, consisting of the frequency point index, the bandwidth, and the power spectral density level, is obtained by random sampling from the probability distribution of interference actions.
- At each time step t, the underwater adaptive interference environment receives the action and calculates the current cumulative resource loss $C_t=\sum_{k=1}^{t} w_k b_k$, where $b_k$ is the bandwidth used at step $k$ and $w_k$ is the energy weighting coefficient of the power level used at step $k$.
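The binarization mapping of claim 3 and the cumulative resource loss of claim 5 can be sketched as follows. This is a sketch under stated assumptions, not the claimed implementation. The source does not show how the decay parameters produce the per-point success probabilities, so the probabilities are taken as given inputs. The `>=` comparison is also assumed, because the exact inequalities are missing. The indicator list is passed in generically. Paragraph [0034] implies 131 − 2 × 64 = 3 indicator values for N = 64, but their exact composition cannot be recovered from the claim text. All names and numbers below are hypothetical.

```python
from typing import List, Sequence, Tuple


def binarize(success_prob: Sequence[float], threshold: float) -> List[int]:
    """Probability threshold binarization: 1 = expected interference met, 0 = not met."""
    # Assumption: ">=" comparison; the source's exact inequality is missing.
    return [1 if p >= threshold else 0 for p in success_prob]


def spectral_feature_array(p_low: Sequence[float], p_high: Sequence[float],
                           tau_low: float, tau_high: float) -> List[List[int]]:
    """Stack the low-power and high-power rows into a 2 x N array."""
    return [binarize(p_low, tau_low), binarize(p_high, tau_high)]


def original_state(features: List[List[int]], indicators: Sequence[float]) -> List[float]:
    """Flatten the 2 x N array row by row and append the task constraint indicators."""
    flat = [float(v) for row in features for v in row]
    return flat + [float(x) for x in indicators]


def cumulative_resource_loss(history: Sequence[Tuple[float, float]]) -> float:
    """C_t = sum over steps of (bandwidth b_k) * (energy weight w_k)."""
    return sum(b * w for b, w in history)


features = spectral_feature_array([0.9, 0.4, 0.7], [0.95, 0.8, 0.6], tau_low=0.6, tau_high=0.7)
print(features)                                        # [[1, 0, 1], [1, 1, 0]]
print(cumulative_resource_loss([(2, 1.0), (3, 4.0)]))  # 14.0
```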
6. The interference resource allocation method for underwater resource-constrained frequency hopping communication according to claim 5, characterized in that S4 includes setting up the circuit breaker mechanism with a resource threshold:
- If the cumulative resource loss reaches the resource threshold, the circuit breaker is triggered and the interference mission is determined to have failed.
- If the circuit breaker is not triggered, the following operations are performed:
  - The low-power spectral features are used if the low power level is selected, and the high-power spectral features are used if the high power level is selected.
  - The action covers the block of consecutive frequency points that starts at the selected frequency index and is as wide as the selected bandwidth; this block is the action coverage set.
  - The spectral feature at each position in the coverage set is checked. A value of 1 defines a hit and a value of 0 defines a miss.
  - The hit frequency points among the covered points are counted, and their spectral features are cleared to zero.
  - The remaining steps, the target margin, and the count of jammed frequencies are updated. The latest spectral features are integrated into the internal state, and the next state is calculated [formula missing].
  - Step S2 is then executed on the next state to obtain its enhanced feature space.
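One environment step under S4 can be sketched as follows. The sketch relies on several assumptions that are not in the source:
- power levels are encoded as 0 (low) and 1 (high);
- the cumulative loss is updated before the breaker check, following claim 5;
- the breaker uses a `>=` comparison;
- the coverage block is clipped at the upper band edge;
- a hit point is cleared in both feature rows, so it cannot be hit again.

`JamState` and `step` are hypothetical names. The next-state formula is not reproduced in the source, so the sketch only updates the indicators that the claim names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

LOW, HIGH = 0, 1  # hypothetical encoding of the two power levels


@dataclass
class JamState:
    features: List[List[int]]  # 2 x N spectral feature array (row LOW, row HIGH)
    remaining_steps: int
    target_margin: int         # targets still to be jammed
    jammed_total: int = 0
    resource_loss: float = 0.0  # cumulative resource loss C_t


def step(state: JamState, freq_idx: int, bandwidth: int, power: int,
         energy_weight: Dict[int, float], resource_threshold: float) -> Tuple[int, bool]:
    """Apply one action in place; return (number of hits, mission failed)."""
    state.resource_loss += energy_weight[power] * bandwidth
    if state.resource_loss >= resource_threshold:  # circuit breaker (assumed >=)
        return 0, True
    row = state.features[power]  # spectral features of the selected power level
    covered = range(freq_idx, min(freq_idx + bandwidth, len(row)))  # clipped at band edge
    hit_idx = [j for j in covered if row[j] == 1]
    for j in hit_idx:  # clear hit points in both rows
        state.features[LOW][j] = 0
        state.features[HIGH][j] = 0
    hits = len(hit_idx)
    state.remaining_steps -= 1
    state.target_margin = max(state.target_margin - hits, 0)
    state.jammed_total += hits
    return hits, False


s = JamState(features=[[1, 0, 1, 1], [1, 1, 1, 1]], remaining_steps=5, target_margin=3)
print(step(s, 0, 3, LOW, {LOW: 1.0, HIGH: 4.0}, resource_threshold=100.0))  # (2, False)
```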
7. The interference resource allocation method for underwater resource-constrained frequency hopping communication according to claim 6, characterized in that S5 includes S5.1:
- A target coverage threshold and a task completion reward are set, and the current coverage is calculated [formula missing].
- When the current coverage reaches the target coverage threshold, the task evaluation term equals the task completion reward.
- When the number of remaining steps is 0 and the coverage is still below the threshold, the task evaluation term is a penalty. The penalty consists of an additional failure penalty plus a per-frequency-point penalty coefficient applied to each remaining frequency point [formula missing].
- The composite reward function is a linear weighting of the hit incentive, the resource consumption, and the task evaluation [formula missing]. The weights comprise a gain coefficient, a penalty weight, and the energy weighting coefficient of the power level.
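The composite reward of S5.1 can be sketched as follows. The exact formulas are missing from the source, so every form here is an assumption:
- coverage is the number of jammed points divided by the size of the hopping set;
- the failure penalty is the additional penalty plus the per-point coefficient times the remaining targets;
- the reward is the gain times the hits, minus the penalty weight times the energy weight times the bandwidth, plus the task term.

The function names are hypothetical.

```python
def coverage(jammed_total: int, n_points: int) -> float:
    # Assumption: coverage = jammed frequency points / size of the hopping set.
    return jammed_total / n_points


def task_term(cov: float, target_cov: float, remaining_steps: int, remaining_targets: int,
              completion_reward: float, fail_penalty: float, per_point_penalty: float) -> float:
    """Task evaluation part of the reward (assumed form)."""
    if cov >= target_cov:
        return completion_reward
    if remaining_steps == 0:
        # Additional failure penalty plus a penalty for each remaining frequency point.
        return -(fail_penalty + per_point_penalty * remaining_targets)
    return 0.0


def composite_reward(hits: int, bandwidth: int, energy_weight: float, task: float,
                     gain: float = 1.0, penalty_weight: float = 1.0) -> float:
    # Assumed linear form: hit incentive - resource consumption + task evaluation.
    return gain * hits - penalty_weight * energy_weight * bandwidth + task


print(composite_reward(2, 3, 4.0, 0.0, gain=1.0, penalty_weight=0.5))  # -4.0
```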
8. The interference resource allocation method for underwater resource-constrained frequency hopping communication according to claim 7, characterized in that S5 includes S5.2: a state sequence, an action sequence, a reward sequence, and a next-state sequence are constructed over time steps $t=1,\dots,T$, where $T$ is the preset number of sampling steps.
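The four sequences of S5.2 amount to an on-policy rollout buffer that is filled for T steps and cleared after each update. This is a minimal sketch; `RolloutBuffer` is a hypothetical name. In the embodiment, T = 512 according to [0035]. Episode-termination flags are not part of the claimed sequences and are omitted here.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class RolloutBuffer:
    """Per-time-step sequences of S5.2, filled until `capacity` steps (T)."""
    capacity: int
    states: List[Any] = field(default_factory=list)
    actions: List[Any] = field(default_factory=list)
    rewards: List[float] = field(default_factory=list)
    next_states: List[Any] = field(default_factory=list)

    def add(self, state: Any, action: Any, reward: float, next_state: Any) -> bool:
        """Store one transition; return True once T transitions are stored."""
        if len(self.rewards) >= self.capacity:
            raise RuntimeError("buffer full: run the policy update, then clear()")
        self.states.append(state)
        self.actions.append(action)
        self.rewards.append(reward)
        self.next_states.append(next_state)
        return len(self.rewards) == self.capacity

    def clear(self) -> None:
        for seq in (self.states, self.actions, self.rewards, self.next_states):
            seq.clear()


buffer = RolloutBuffer(capacity=512)
```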
9. The interference resource allocation method for underwater resource-constrained frequency hopping communication according to claim 7, characterized in that S6 includes S6.1: the PPO clipped objective function is introduced to update the network parameters of the Actor and Critic networks, including a probability ratio constraint.
- Let $\theta_{\mathrm{old}}$ denote the fixed parameters of the old Actor network and $\theta$ the parameters of the Actor network currently being updated. For the state $s_t$ and action $a_t$ at time step $t$, the probability ratio is $\rho_t(\theta)=\frac{\pi_\theta(a_t\mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t\mid s_t)}$.
- A clipping threshold $\epsilon$ is set, and the ratio is limited to $[1-\epsilon,\,1+\epsilon]$ by $\mathrm{clip}(\rho_t(\theta),1-\epsilon,1+\epsilon)$. Values below $1-\epsilon$ become $1-\epsilon$, and values above $1+\epsilon$ become $1+\epsilon$.
- Using the state, action, reward, and next-state sequences, the advantage is calculated recursively over the time steps: $\delta_t=r_t+\gamma V(s_{t+1})-V(s_t)$ and $\hat{A}_t=\delta_t+\gamma\lambda\hat{A}_{t+1}$. Here $V(\cdot)$ is the value estimate of a state, $\gamma$ is the discount factor, and $\lambda$ is the generalized advantage estimation parameter.
- When $\hat{A}_t>0$, the selection probability of $a_t$ is increased. When $\hat{A}_t<0$, it is decreased. When $\hat{A}_t=0$, it is unchanged.
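The probability ratio, the clipping, and the backward advantage recursion of S6.1 can be sketched in plain Python, without a deep-learning framework. The ratio is computed from log-probabilities, which is the usual numerically stable choice. The recursion follows the formula in the claim and does not reset the advantage at episode boundaries, because the claim shows no termination mask. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def probability_ratio(logp_new: float, logp_old: float) -> float:
    """rho_t = pi_theta(a|s) / pi_theta_old(a|s), from log-probabilities."""
    return math.exp(logp_new - logp_old)


def clip_ratio(rho: float, eps: float = 0.2) -> float:
    """Limit the ratio to [1 - eps, 1 + eps]."""
    return min(max(rho, 1.0 - eps), 1.0 + eps)


def gae(rewards: Sequence[float], values: Sequence[float],
        next_values: Sequence[float], gamma: float, lam: float) -> List[float]:
    """Generalized advantage estimation, computed backwards over the rollout."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_values[t] - values[t]  # TD error
        running = delta + gamma * lam * running                  # A_t = delta_t + gamma*lam*A_{t+1}
        advantages[t] = running
    return advantages


print(gae([1.0, 1.0], [0.0, 0.0], [0.0, 0.0], gamma=0.5, lam=1.0))  # [1.5, 1.0]
```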
10. The interference resource allocation method for underwater resource-constrained frequency hopping communication according to claim 9, characterized in that S6 includes S6.2:
- The PPO clipped objective is constructed from the probability ratio $\rho_t(\theta)$ and the advantage $\hat{A}_t$: $L^{CLIP}(\theta)=\hat{\mathbb{E}}_t\left[\min\left(\rho_t(\theta)\hat{A}_t,\ \mathrm{clip}(\rho_t(\theta),1-\epsilon,1+\epsilon)\hat{A}_t\right)\right]$, where $\hat{\mathbb{E}}_t$ denotes the expectation over time steps.
- A value-function loss $L^{VF}(\phi)$ and a policy entropy term $S[\pi_\theta]$ are introduced to form the total loss $L(\theta,\phi)=-L^{CLIP}(\theta)+c_1L^{VF}(\phi)-c_2S[\pi_\theta]$. Here $\phi$ denotes the network parameters of the Critic network, $c_1$ is the value loss coefficient, and $c_2$ is the entropy regularization coefficient.
- $\theta$ and $\phi$ are updated simultaneously by gradient descent.
- A training step threshold is set. When the number of training steps reaches it, training stops, and the trained Actor network is output as the optimal interference policy model.
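The clipped objective and total loss of S6.2 can be evaluated numerically with the sketch below. It computes only the loss value; in practice an automatic-differentiation framework such as PyTorch would minimize the same expression over tensors to update θ and φ. Three choices here are assumptions rather than source content:
- the value loss is taken as the mean squared error against return targets;
- the coefficients `c1 = 0.5` and `c2 = 0.01` are placeholder defaults;
- the entropy is the Shannon entropy of a categorical action distribution.

```python
import math
from typing import Sequence


def clipped_surrogate(ratios: Sequence[float], advantages: Sequence[float],
                      eps: float = 0.2) -> float:
    """Sample mean of min(rho * A, clip(rho, 1 - eps, 1 + eps) * A)."""
    terms = [min(r * a, min(max(r, 1.0 - eps), 1.0 + eps) * a)
             for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)


def categorical_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one categorical action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def total_loss(ratios: Sequence[float], advantages: Sequence[float],
               values: Sequence[float], returns: Sequence[float],
               entropies: Sequence[float], eps: float = 0.2,
               c1: float = 0.5, c2: float = 0.01) -> float:
    """L = -L_clip + c1 * L_vf - c2 * entropy (placeholder coefficients)."""
    l_clip = clipped_surrogate(ratios, advantages, eps)
    l_vf = sum((v - g) ** 2 for v, g in zip(values, returns)) / len(values)  # assumed MSE
    entropy = sum(entropies) / len(entropies)
    return -l_clip + c1 * l_vf - c2 * entropy


print(total_loss([1.0], [1.0], [1.0], [1.0], [0.0]))  # -1.0
```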