An interference waveform design method based on an autoencoder and a DDPG algorithm

By optimizing the jamming waveform design through autoencoders and DDPG algorithms, the problem of the single form of traditional jamming waveforms is solved, achieving effective jamming of cognitive radar and improving the combat effectiveness of electronic jamming.

CN116736239B · Active · Publication Date: 2026-06-12 · HARBIN ENG UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: HARBIN ENG UNIV
Filing Date: 2023-06-15
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

Traditional interference waveform design methods result in a single interference waveform form, which cannot be optimized in a timely manner according to changes in the actual environment, leading to poor interference effect on intelligent cognitive radar.

Method used

An autoencoder and DDPG algorithm are used to design the interference waveform. An interference signal generation filter is constructed, and Gaussian white noise and radar signal are used to generate the interference signal. Combined with constant false alarm rate detection and power assessment, reinforcement learning is used to optimize the interference waveform and realize adaptive interference waveform design.

Benefits of technology

It improves the environmental adaptability of the interference waveform, can adaptively generate interference waveforms according to actual conditions, enhances the interference effect on cognitive radar, and achieves high degree of freedom optimization of the interference waveform in the time and frequency domain.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a jamming waveform design method based on an autoencoder and the DDPG algorithm, and particularly relates to a cognitive jamming waveform design method for the electronic warfare jamming radar detection stage based on an autoencoder and the deep deterministic policy gradient (DDPG) algorithm. It addresses the problem that jamming waveforms obtained by traditional jamming waveform design reduce the combat effect of electronic jamming and cannot effectively jam an intelligent cognitive radar. The process of generating a jamming signal from Gaussian white noise and a radar signal is represented by a jamming signal generation filter, whose impulse response is obtained by deconvolving the jamming signal with the radar signal. The impulse response is then regulated through the autoencoder and the DDPG algorithm to optimize the effective jamming interval and the power ratio. A cognitive jamming waveform is generated based on real-time feedback from constant false alarm rate (CFAR) detection, so the jamming waveform can be optimized at any time as the actual environment changes. The application belongs to the field of radar jamming.

Description

Technical Field

[0001] This invention relates to a method for designing jamming waveforms, specifically a cognitive jamming waveform design method for the electronic warfare jamming radar detection stage based on an autoencoder and the Deep Deterministic Policy Gradient (DDPG) algorithm, belonging to the field of radar jamming technology.

Background Technology

[0002] With the rapid development of artificial intelligence technology, the competition in the field of electronic warfare on modern battlefields is becoming increasingly fierce. The emergence of cognitive radar has brought enormous challenges to jamming forces, and consequently, cognitive jamming technology has received increasing attention from experts and scholars. Cognitive jamming technology can adaptively perceive the surrounding situation in complex electromagnetic environments. During the perception process, it uses machine learning to determine the threat level, make optimal jamming decisions, automatically generate the best jamming signal, and evaluate the current jamming effect. The effective jamming waveform generated by cognitive jamming technology can interfere with the signal processing stage of the radar receiver, suppressing the radar's signal processing capabilities and even inhibiting the radar's detection of reconnaissance targets.

[0003] Traditional cognitive jamming technology designs jamming waveforms based on theoretical formulas, which pre-modulate radar signals in the time and frequency domains to obtain jamming waveforms. However, the jamming waveforms are controlled by modulation parameters, resulting in problems such as a single waveform form and inability to be optimized in a timely manner according to changes in the actual environment. This greatly reduces the combat effectiveness of electronic jamming, especially against increasingly intelligent cognitive radars.

Summary of the Invention

[0004] To address the problems of traditional jamming waveform design, which is subject to modulation parameters and suffers from a lack of variety in waveform form and inability to adapt to changes in the actual environment, thus greatly reducing the effectiveness of electronic jamming, especially against increasingly intelligent cognitive radars, this invention proposes a jamming waveform design method based on an autoencoder and the DDPG algorithm.

[0005] The technical solution adopted in this invention is:

[0006] It includes the following steps:

[0007] S1. Construct an interference signal generation filter, taking Gaussian white noise and radar signal as inputs, and outputting an interference signal. The specific process is as follows:

[0008] Gaussian white noise is acquired using a jammer, and then processed to obtain processed Gaussian white noise. At the same time, the jammer intercepts a segment of radar signal. Based on the processed Gaussian white noise and the radar signal, a smart noise forwarding jamming method is used to obtain the jamming signal.

[0009] S2. Using different modulation methods of the jammer, the jamming signal obtained in S1 is used to generate different jamming signals, and all the generated jamming signals are combined into an interference waveform sample set.

[0010] S3. Deconvolve each interference signal in the interference waveform sample set with the radar signal to obtain the impulse response of the corresponding interference signal generation filter, and thus the filter itself. Combine all the interference signal generation filters into an interference signal generation filter sample set.

[0011] S4. Construct an autoencoder, which consists of an input layer, a hidden layer, and an output layer. The input layer to the hidden layer is used as the encoder, and the hidden layer to the output layer is used as the decoder.

[0012] Each impulse response in the sample set of the interference signal generation filter is input into the encoder of the autoencoder to obtain the high-dimensional features of the corresponding impulse response. The high-dimensional features are input into the decoder to output the corresponding recovered impulse response, i.e., the new interference signal generation filter. The autoencoder is trained based on the above process until the MSE error no longer decreases, and the trained autoencoder, as well as the high-dimensional features of each impulse response and the recovered impulse response are obtained.

[0013] S5. Establish an interference effect evaluation system comprising constant false alarm rate (CFAR) detection evaluation and power evaluation. CFAR detection evaluation covers the distribution area of false alarm targets (i.e., the effective interference range), whether real targets are missed, and the peak energy characteristics of false alarm targets. Power evaluation covers the power ratio of the interference signal to the radar signal.

[0014] The radar signal intercepted in S1 is convolved with each recovered impulse response obtained in S4 to obtain multiple interference signals. The interference effect evaluation system is used to evaluate each interference signal to obtain the corresponding evaluation results.

[0015] S6. Based on the evaluation results obtained in S5, find the evaluation results corresponding to the high-dimensional features of each impulse response obtained in S4. Perform Markov modeling based on all high-dimensional features and their corresponding evaluation results to obtain the waveform design agent.

[0016] S7. Train the waveform optimization agent using the DDPG algorithm to obtain the final waveform optimization agent;

[0017] S8. Obtain a radar signal and an interference signal, and derive the impulse response of the interference signal generation filter from them. Input this impulse response into the trained autoencoder obtained in S4 to obtain the high-dimensional features of the impulse response and interference signal generation filter 1.

[0018] The radar signal is passed through interference signal generation filter 1 to output a new interference signal, which is evaluated using the interference effect evaluation system of S5 to obtain the evaluation result.

[0019] The high-dimensional features of the impulse response and the evaluation result are input as the state into the final waveform optimization agent obtained in S7, which outputs its action, i.e., the updated high-dimensional features.

[0020] The updated high-dimensional features are input into the decoder of the trained autoencoder obtained in S4, which outputs interference signal generation filter 2.

[0021] The radar signal is passed through the interference signal generation filter 2 to output interference signal I. The interference waveform is obtained based on interference signal I, and the interference waveform design is completed.
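The online design step S8 chains the components built in S1-S7. The following is a minimal sketch of that data flow only: every callable passed in (`deconvolve`, `encoder`, `decoder`, `evaluate`, `agent`) is a hypothetical placeholder for the corresponding trained or configured component, and the state is assumed to be the features concatenated with the evaluation result.

```python
def linear_convolve(s, h):
    """Plain time-domain convolution of two real sequences."""
    out = [0.0] * (len(s) + len(h) - 1)
    for i, si in enumerate(s):
        for k, hk in enumerate(h):
            out[i + k] += si * hk
    return out


def design_jamming_waveform(radar, jam, deconvolve, encoder, decoder,
                            evaluate, agent):
    """Sketch of S8; all callables are hypothetical stand-ins."""
    h0 = deconvolve(jam, radar)       # impulse response of the generation filter
    features = encoder(h0)            # high-dimensional features
    h1 = decoder(features)            # interference signal generation filter 1
    result = evaluate(linear_convolve(radar, h1))   # evaluation of the new signal
    # Assumption: state = features followed by the evaluation result.
    new_features = agent(list(features) + list(result))
    h2 = decoder(new_features)        # interference signal generation filter 2
    return linear_convolve(radar, h2)  # interference signal I
```

Passing the components in as arguments keeps the sketch independent of how the autoencoder and the agent are actually implemented.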

[0022] Furthermore, the specific process of S1 is as follows:

[0023] S11. Obtain Gaussian white noise $n_w$ using the jammer:

[0024] $n_w = \mathrm{wgn}(1, N_w, E_w)$

[0025] where $N_w$ is the number of sampling points, i.e., the length of the Gaussian white noise, and $E_w$ is the power of the noise;

[0026] S12. The jammer intercepts a segment of radar signal $s_t$ and determines its intermediate frequency $f_0$, bandwidth $bw$, and pulse width $pw$. A Butterworth bandpass filter is designed from these three parameters.

[0027] S13. Pass the Gaussian white noise through the Butterworth filter to obtain $n_{wb}$:

[0028] $n_{wb} = n_w * \mathrm{buttord}[w_p, w_s, R_p, R_s]$

[0029] where $w_s = [f_0 - bw/2 - f_m,\ f_0 + bw/2 + f_m]$ is the stopband of the Butterworth bandpass filter and $f_m$ is the transition bandwidth between the passband and the stopband; $w_p = [f_0 - bw/2,\ f_0 + bw/2]$ is the passband of the Butterworth bandpass filter; $R_p$ is a parameter describing the passband ripple; $R_s$ is a parameter describing the stopband attenuation; buttord[·] represents the setting function for the Butterworth filter;

[0030] S14. Multiply $n_{wb}$ in the time domain by a normally distributed amplitude $k$ to obtain the modulated Gaussian white noise $n_t$:

[0031] $n_t = k \cdot n_{wb}$

[0032] where $k$ follows the normal distribution $N(0, E_c)$ and has the same length $N_w$ as the Gaussian white noise;

[0033] S15. Convolve $n_t$ with $s_t$ in the time domain to obtain the interference signal $j_p$:

[0034] $j_p = n_t * s_t$

[0035] where $*$ denotes the convolution operation;

[0036] The final interference signal $j$ is generated by superimposing cyclically shifted copies of the interference signal $j_p$:

[0037]

[0038] where $m_1$ is the shift interval of the interference signal, $m_2$ is the number of shifts of the interference signal, and $\mathrm{cyclicshift}(j_p, m_1)$ is a cyclic shift function that shifts the data in the interference signal $j_p$ to the right by $m_1$ points.
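The cyclic shift and superposition in S15 can be illustrated with a short sketch. The exact summation is not reproduced in this text, so the sketch assumes that $m_2$ copies of $j_p$ are summed, with copy $i$ (for $i = 0, \dots, m_2 - 1$) shifted right by $i \cdot m_1$ points; the function names are illustrative.

```python
def cyclic_shift(x, m):
    """Shift the samples of x to the right by m points, wrapping around."""
    if not x:
        return []
    m %= len(x)
    return x[-m:] + x[:-m] if m else list(x)


def superimpose_shifts(j_p, m1, m2):
    """Sum m2 cyclically shifted copies of j_p spaced m1 samples apart.

    Assumption: copy i (i = 0 .. m2-1) is shifted right by i*m1 points;
    the source's exact summation limits are not shown in the text.
    """
    j = [0.0] * len(j_p)
    for i in range(m2):
        shifted = cyclic_shift(j_p, i * m1)
        j = [a + b for a, b in zip(j, shifted)]
    return j


# A single unit sample makes the effect of m1 and m2 easy to see.
j_p = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
j = superimpose_shifts(j_p, m1=2, m2=3)
print(j)  # [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
```

With a unit sample as input, the copies land $m_1$ samples apart and there are $m_2$ of them, which is how the two parameters set the spacing and number of false-target peaks.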

[0039] Furthermore, the specific process of S2 is as follows:

[0040] Based on S14-S15, use n sets of modulation parameters $[E_c, m_1, m_2]$ to generate n interference signals with different interference effects. Each interference signal is an interference waveform sample. Combine all interference waveform samples into the interference waveform sample set $j_{1\times n}$.

[0041] Furthermore, the specific process of S3 is as follows:

[0042] Perform a Fourier transform on the radar signal $s_t$ and on each interference signal $j(t)$ in the interference waveform sample set to obtain the radar signal spectrum $R(\omega)$ and the corresponding interference signal spectrum $J(\omega)$. According to the time-domain convolution theorem, divide each $J(\omega)$ by $R(\omega)$ in turn to obtain the frequency response $H(\omega)$ of the corresponding interference signal generation filter. Perform an inverse Fourier transform on each $H(\omega)$ to obtain the corresponding impulse response $h(t)$, and thus the corresponding interference signal generation filter. Combine all interference signal generation filters into the interference signal generation filter sample set $h_{1\times n}$.
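The frequency-domain deconvolution of S3 can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: it uses a naive $O(N^2)$ DFT from the standard library in place of an FFT, treats the convolution as circular (zero-padding would be needed for linear convolution), and adds a guard against near-zero spectral bins that the text does not mention.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; the inverse divides by N."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def circular_convolve(s, h):
    """Circular convolution, the time-domain counterpart of J = R * H."""
    n = len(s)
    return [sum(s[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]


def deconvolve(j, s, eps=1e-12):
    """Recover h from j = s (*) h via H(w) = J(w) / R(w)."""
    J, R = dft(j), dft(s)
    H = []
    for Jk, Rk in zip(J, R):
        if abs(Rk) < eps:  # assumption: refuse to divide by an empty bin
            raise ValueError("radar spectrum has a near-zero bin")
        H.append(Jk / Rk)
    return [v.real for v in dft(H, inverse=True)]


# Toy data (illustrative values, not from the source).
s = [1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
h_true = [0.0, 1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
h_est = deconvolve(circular_convolve(s, h_true), s)
```

In practice the division is sensitive to noise wherever $R(\omega)$ is small, so a regularized division may be preferable to the hard guard used here.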

[0043] Furthermore, when training the autoencoder in S4, a mean squared error function is used to minimize the error between the input and output data of the autoencoder:

[0044] $L(w,b) = L_{MSE}(x, \hat{x})$

[0045] where $L(w,b)$ represents the error between the input and output data, $L_{MSE}(\cdot)$ represents the MSE function, $x$ represents the input data of the autoencoder, and $\hat{x}$ represents the output data of the autoencoder;

[0046] At the same time, a KL divergence constraint on the neurons or weights in the hidden layer is added to the autoencoder's loss function. For a given input $x_1$, $h_j(x_1)$ represents the activation value of neuron $j$ in the hidden layer:

[0047] $\hat{\rho}_j = \frac{1}{n}\sum_{i=1}^{n} h_j(x_i)$

[0048] where $\hat{\rho}_j$ represents the average activation value of neuron $j$ in the hidden layer across all input data, and $n$ is the number of input data.

[0049] The sparsity penalty term based on the KL divergence function is:

[0050] $KL(\rho \,\|\, \hat{\rho}_j) = \rho \log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}$

[0051] where $\rho$ is the sparsity parameter;

[0052] The loss function of the autoencoder with the sparsity penalty added is:

[0053] $L_{sparse}(w,b) = L(w,b) + \beta \sum_{j=1}^{m} KL(\rho \,\|\, \hat{\rho}_j)$

[0054] where $\beta$ is the weight of the sparsity penalty term and $m$ is the number of neurons in the hidden layer.
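The sparse loss described above (MSE reconstruction error plus a $\beta$-weighted KL sparsity penalty summed over the $m$ hidden neurons) can be sketched as below. The sketch assumes hidden activations lie strictly between 0 and 1 (for example sigmoid outputs) and uses the standard Bernoulli form of the KL divergence; the default values of `rho` and `beta` are illustrative, not taken from the source.

```python
import math


def kl_sparsity(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))


def sparse_autoencoder_loss(inputs, outputs, hidden, rho=0.05, beta=3.0):
    """MSE reconstruction error plus beta-weighted KL sparsity penalty.

    inputs, outputs: lists of n samples (each a list of floats).
    hidden: list of n hidden activation vectors, each of length m,
            assumed to lie strictly in (0, 1).
    """
    n = len(inputs)
    count = sum(len(x) for x in inputs)
    mse = sum((xi - yi) ** 2
              for x, y in zip(inputs, outputs)
              for xi, yi in zip(x, y)) / count
    m = len(hidden[0])
    # Average activation of each hidden neuron over all n inputs.
    rho_hat = [sum(h[j] for h in hidden) / n for j in range(m)]
    penalty = sum(kl_sparsity(rho, r) for r in rho_hat)
    return mse + beta * penalty
```

The penalty vanishes when every neuron's average activation equals `rho` and grows as neurons become more active on average, which pushes the hidden code toward sparsity.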

[0055] Furthermore, in step S5, the radar signals intercepted in S1 are convolved with each recovered impulse response obtained in S4 to obtain multiple interference signals. Each interference signal is then evaluated using an interference effect evaluation system to obtain the corresponding evaluation result. The specific process is as follows:

[0056] The radar signal intercepted by the jammer in S1 is convolved with each recovered impulse response obtained in S4 to obtain multiple jamming signals. Each jamming signal is then evaluated using the jamming effectiveness evaluation system to obtain the corresponding evaluation result $E = \{F/T, R_u, \eta, JSR\}$, where F indicates that the real target is missed, T indicates that the real target is detected, $R_u$ represents the effective interference range, $\eta$ represents the ratio of the maximum peak value of the pulse-compressed echo signal to the peak value of the target signal, and JSR represents the power ratio of the echo signal. The echo signal consists of the echoes of the interference signal and the radar signal.

[0057] Discretize the evaluation results:

[0058] (1) Detection results of real targets

[0059]

[0060] When the real target is missed, F / T = 0; otherwise, F / T = -1.

[0061] (2) Effective Interference Range

[0062]

[0063] Here the position of the first false target is compared with $X_{Real}$, the position of the real target. If the first false target is located in front of the real target, then $R_u = 0$; otherwise $R_u = -1$;

[0064] (3) The ratio of the maximum peak value after pulse compression of the echo signal to the peak value of the target signal.

[0065]

[0066] Here the maximum peak value after pulse compression of the echo signal is compared with $Y_{Real}$, the peak value of the target signal, and $\eta_0$ is a preset maximum peak value ratio threshold. If the maximum peak value ratio exceeds the threshold, then $\eta = 0$; otherwise, $\eta = -1$. The maximum peak value ratio is the maximum of the ratio between the maximum peak value after pulse compression of the echo signal and the peak value of the target signal.

[0067] (4) Echo signal power ratio

[0068]

[0069] where $P_{Jam}$ represents the power of the interference signal, $P_{Radar}$ represents the radar signal power, and $JSR_0$ is a preset power ratio threshold. If the power ratio is less than the threshold, JSR = 0; otherwise, JSR = -1.

[0070] Based on the above processing, the interference effect of the current interference signal can be represented by a 1×4 one-dimensional array; that is, the final evaluation result is the discretized evaluation result.
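The four discretization rules can be collected into one helper that returns the 1×4 array. This is a minimal sketch: the function and argument names are hypothetical, and "in front of the real target" is assumed to mean a smaller position value than the real target.

```python
def discretize_evaluation(target_missed, first_false_pos, real_pos,
                          peak_ratio, eta0, power_ratio, jsr0):
    """Return the discretized evaluation [F/T, R_u, eta, JSR].

    Each entry is 0 when the desired interference effect is achieved
    and -1 otherwise, following the four rules in the text.
    """
    ft = 0 if target_missed else -1
    # Assumption: "in front of" means a smaller position (range) value.
    ru = 0 if first_false_pos < real_pos else -1
    eta = 0 if peak_ratio > eta0 else -1
    jsr = 0 if power_ratio < jsr0 else -1
    return [ft, ru, eta, jsr]


# Illustrative values only.
print(discretize_evaluation(True, 80, 100, 1.5, 1.2, 8.0, 10.0))  # [0, 0, 0, 0]
```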

[0071] Furthermore, the specific process of S6 is as follows:

[0072] Based on the evaluation results obtained in S5, find the evaluation results corresponding to the high-dimensional features of each impulse response obtained in S4. Take the high-dimensional features of each impulse response in the previous moment and the corresponding evaluation results as the state of the waveform design agent, take updating the high-dimensional features in the current moment as the action of the waveform design agent, and take the evaluation results corresponding to the current high-dimensional features as the reward of the waveform design agent. Thus, the initialized waveform design agent can be obtained.

[0073] Furthermore, the reward for the waveform design agent is determined based on the desired interference effect:

[0074] $R_t = [F/T + R_u + (\eta - \eta_0) + (JSR_0 - JSR)] \times 10$

[0075] When the evaluation result $E = \{0,0,0,0\}$, $R_t > 0$, and the waveform optimization agent pursues a higher maximum peak value ratio and a lower jamming-to-signal ratio, learning a better coding strategy in the direction of increasing reward value.
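The reward formula can be evaluated directly. The sketch below assumes that $F/T$ and $R_u$ enter as their discretized values (0 or -1) while $\eta$ and $JSR$ enter as the measured ratios, which is one reading that makes $R_t > 0$ whenever all four criteria are met; the numeric values are illustrative.

```python
def reward(ft, ru, eta, eta0, jsr, jsr0):
    """R_t = [F/T + R_u + (eta - eta0) + (JSR0 - JSR)] * 10.

    Assumption: ft and ru are the discretized flags (0 or -1), while
    eta and jsr are the measured ratios compared with their thresholds.
    """
    return (ft + ru + (eta - eta0) + (jsr0 - jsr)) * 10


# All criteria met: peak ratio above eta0, power ratio below JSR0.
print(reward(ft=0, ru=0, eta=3.0, eta0=2.0, jsr=1.0, jsr0=2.0))  # 20.0
```

Under this reading, a higher peak ratio and a lower power ratio both raise the reward, which matches the stated optimization direction.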

[0076] Furthermore, the specific process of S7 is as follows:

[0077] S71. Set the learning parameters of the DDPG algorithm, including the experience replay pool D, the reward discount factor γ, and the learning rate r;

[0078] S72. Initialize the experience replay pool $D$ with capacity $D_{max}$;

[0079] S73. Establish the policy neural network $\mu(\cdot)$, the value neural network $Q(s,a|\theta)$, the target network $\mu'(\cdot)$ of the policy neural network, and the target network $Q'(s',a'|\theta')$ of the value neural network;

[0080] Initialize the parameters $\theta$ of the policy neural network and the value neural network, and the parameters $\theta'$ of the target network of the policy neural network and the target network of the value neural network, so that $\theta' \leftarrow \theta$;

[0081] S74. Input the state $S_t$ of the waveform design agent from S6 at a given time step into the policy neural network. With the initialized parameters $\theta$ of the policy neural network, obtain the exploration policy, through which the waveform design agent outputs its current action $A_t$. Then initialize the noise distribution $N$, where the noise is a random number, and add noise to each action $A_t$, so that the current action becomes $A_t = \mu_\theta(s) + N_t$, where $N_t$ represents the noise added to the current action. The waveform design agent executes the current action $A_t$ and receives the current reward value $R_t$ and the state $S_{t+1}$ at the next time step;

[0082] S75. Store $(S_t, A_t, R_t, S_{t+1})$ in the experience replay pool $D$ of S72, and at the same time use $(S_t, A_t, R_t, S_{t+1})$ to train the policy neural network, the value neural network, the target network of the policy neural network, and the target network of the value neural network. During training, the state-action pair $(S_t, A_t)$ is input into the value neural network to obtain the state-action value $Q_\theta$, and the parameters of the policy neural network are then updated from $Q_\theta$ by gradient ascent:

[0083]

[0084] In the formula: $\nabla$ represents the gradient operator, $J$ represents the loss function of the policy neural network, $M$ represents the random training batch, $Q(\cdot)$ and $\mu(\cdot)$ represent the value neural network and the policy neural network, respectively, and $\theta$ represents the parameters of the policy neural network;

[0085] The parameters of the value neural network are updated by minimizing the mean squared Bellman error loss function using a stochastic gradient descent strategy:

[0086]

[0087] In the formula: R represents the reward value, and Q'(·) and μ'(·) represent the target network of the value neural network and the target network of the policy neural network, respectively;

[0088] The parameter optimization of both target networks mentioned above is achieved through delayed updates of the online network:

[0089] θ'=ρθ'+(1-ρ)θ

[0090]

[0091] In the formula: $\theta$ represents the parameters of the policy neural network and the value neural network, $\theta'$ represents the parameters of the target network of the policy neural network and the target network of the value neural network, and $\rho$ is a constant factor that weights the delayed update (it is not the reward discount factor $\gamma$);

[0092] S76. Obtain the updated high-dimensional features from the current action $A_t$ obtained in S74, and input them into the decoder of the autoencoder trained in S4 to output the impulse response of the corresponding interference signal generation filter. Convolve this impulse response with the radar signal intercepted in S1 to obtain the interference signal, and evaluate it using the interference effect evaluation system. If the real target is missed and the interference interval leads the real target, end the training and obtain the final waveform optimization agent; otherwise, continue to execute S74-S76.
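The bookkeeping parts of S72-S75 (the replay pool, the Bellman target used by the value network, and the delayed target-network update $\theta' = \rho\theta' + (1-\rho)\theta$) can be sketched without a deep learning framework. This is an illustration only: the networks are stood in for by plain Python callables, no gradient step is shown, and all class and function names are hypothetical.

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-capacity experience replay pool D (S72, S75)."""

    def __init__(self, capacity):
        # Once full, the oldest transitions are dropped first.
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size, rng=random):
        """Draw a random training batch of at most batch_size transitions."""
        size = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), size)


def bellman_targets(batch, gamma, target_actor, target_critic):
    """y = R + gamma * Q'(s', mu'(s')) for each stored transition."""
    return [r + gamma * target_critic(s2, target_actor(s2))
            for _, _, r, s2 in batch]


def critic_loss(batch, gamma, critic, target_actor, target_critic):
    """Mean squared Bellman error over the batch."""
    ys = bellman_targets(batch, gamma, target_actor, target_critic)
    errors = [(y - critic(s, a)) ** 2 for y, (s, a, _, _) in zip(ys, batch)]
    return sum(errors) / len(batch)


def soft_update(target_params, online_params, rho):
    """theta' <- rho * theta' + (1 - rho) * theta, element by element."""
    return [rho * t + (1.0 - rho) * o
            for t, o in zip(target_params, online_params)]
```

With `rho` close to 1 the target parameters move only slightly toward the online parameters at each step, which is what keeps the Bellman targets stable during training.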

[0093] Beneficial effects:

[0094] This invention targets jamming of the signal detection stage. An interference signal is generated from Gaussian white noise and a radar signal, and the generation process is represented by an interference signal generation filter. The interference signal is deconvolved with the radar signal to obtain the impulse response of the filter, and an autoencoder is trained on these impulse responses. The encoder takes an impulse response as input and outputs its high-dimensional features; the decoder takes the high-dimensional features as input and outputs the recovered impulse response, i.e., a new interference signal generation filter. Extracting high-dimensional features with the autoencoder reduces computational complexity. An interference effect evaluation system is established, comprising constant false alarm rate (CFAR) detection evaluation and power evaluation. CFAR detection evaluation covers the distribution area of false alarm targets (i.e., the effective interference range), whether real targets are missed, and the peak energy characteristics of false alarm targets; power evaluation covers the power ratio of the interference signal to the radar signal. The radar signal is convolved with each recovered impulse response to obtain an interference signal, which the evaluation system scores to yield the corresponding evaluation result. Markov modeling based on the high-dimensional features and their evaluation results constructs a reinforcement learning environment, and the environmental feedback improves environmental adaptability. The high-dimensional features of each impulse response at the previous time step, together with the corresponding evaluation results, serve as the state of the waveform design agent; updating the high-dimensional features at the current time step serves as its action; and the evaluation result corresponding to the current high-dimensional features serves as its reward, which yields the initialized waveform design agent. The DDPG algorithm then trains this agent into the final waveform optimization agent, which produces the updated high-dimensional features. The decoder of the autoencoder outputs a new interference signal generation filter from these features, and a segment of radar signal passed through that filter yields the interference signal and hence the corresponding interference waveform. With the DDPG algorithm as the waveform design tool, the neural networks learn the design experience of traditional interference waveforms, and reinforcement learning guided by the actual interference effect and by feedback from the battlefield environment optimizes the features learned from those waveforms. This improves the environmental adaptability of the interference waveform, so that interference waveforms can be generated adaptively according to real-world conditions.

[0095] This invention addresses the electronic warfare jamming radar detection stage by using an autoencoder and the DDPG algorithm to regulate the impulse response (coefficients) of the jamming signal generation filter. This optimizes the effective jamming range and power ratio. Real-time feedback based on constant false alarm rate (CFAR) detection generates a cognitive jamming waveform, allowing it to be continuously optimized in response to changes in the actual environment. The resulting jamming waveform exhibits high degrees of freedom in the time and frequency domains, demonstrating stronger environmental adaptability and effectively jamming cognitive radar. Furthermore, it achieves adaptive jamming waveform design, enhancing the operational effectiveness of electronic jamming.

Attached Figure Description

[0096] Figure 1 is the design flowchart for the interference signal generation filter;

[0097] Figure 2 is a flowchart of interference signal generation based on an autoencoder;

[0098] Figure 3 is a diagram illustrating the learning principle of the DDPG algorithm;

[0099] Figure 4 is a flowchart of the decoder network output control in conjunction with DDPG;

[0100] Figure 5 is a schematic diagram of the interference waveform output by the interference signal generation filter;

[0101] Figure 6 is the pulse compression result of the interference waveform after passing through the matched filter;

[0102] Figure 7 is a graph showing the CFAR signal detection results;

[0103] Figure 8 is the waveform optimization benefit curve.

Detailed Implementation

[0104] Embodiment 1: With reference to Figures 1-8, this embodiment describes an interference waveform design method based on an autoencoder and the DDPG algorithm, which includes the following steps:

[0105] S1. Gaussian white noise is acquired using a jammer, processed to obtain processed Gaussian white noise. Simultaneously, the jammer intercepts a segment of radar signal. Based on the radar signal and the processed Gaussian white noise, a smart noise-forwarding jamming method is used to obtain jamming signals. Different modulation methods of the jammer are used to generate jamming signals with different jamming effects. Each generated jamming signal can cause varying degrees of false alarms and missed detections to the radar. All generated jamming signals are combined into an interference waveform sample set. The specific process is as follows:

[0106] First, the jammer uses the Gaussian white noise generation function wgn(·) built into MATLAB to generate Gaussian white noise $n_w$ of dimension $1 \times N_w$:

[0107] $n_w = \mathrm{wgn}(1, N_w, E_w)$

[0108] where $N_w$ represents the number of sampling points, which is also the length of the Gaussian white noise, and $E_w$ represents the power of the noise.

[0109] The jammer intercepts a segment of radar signal $s_t$ and determines its intermediate frequency $f_0$, bandwidth $bw$, pulse width $pw$, and sampling rate $f_s$. A Butterworth bandpass filter is designed from these four parameters, and the Gaussian white noise is passed through the Butterworth filter to obtain $n_{wb}$:

[0110] $n_{wb} = n_w * \mathrm{buttord}[w_p, w_s, R_p, R_s]$

[0111] where $w_s = [f_0 - bw/2 - f_m,\ f_0 + bw/2 + f_m]$ is the stopband of the Butterworth bandpass filter and $f_m$ is the transition bandwidth between the passband and the stopband; $w_p = [f_0 - bw/2,\ f_0 + bw/2]$ is the passband of the Butterworth bandpass filter; $R_p$ is a parameter describing the passband ripple (degree of fluctuation), set to 3; $R_s$ is a parameter describing the stopband attenuation, set to 40; buttord[·] represents the setting function for the Butterworth filter.
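The band edges follow directly from the intercepted signal's parameters. As a minimal sketch, the hypothetical helper below computes $w_p$ and $w_s$ and normalizes them by the Nyquist frequency $f_s/2$; the normalization is an assumption (it is the convention MATLAB's buttord uses for digital filters, and would explain why the sampling rate is listed as a design parameter), and the numeric values are illustrative rather than taken from the source.

```python
def butterworth_band_edges(f0, bw, fm, fs):
    """Return (wp, ws), the passband and stopband edges of the bandpass filter.

    Assumption: edges are normalized by the Nyquist frequency fs / 2,
    so valid values lie strictly between 0 and 1.
    """
    nyquist = fs / 2.0
    wp = [(f0 - bw / 2.0) / nyquist, (f0 + bw / 2.0) / nyquist]
    ws = [(f0 - bw / 2.0 - fm) / nyquist, (f0 + bw / 2.0 + fm) / nyquist]
    # The stopband must enclose the passband and stay below Nyquist.
    if not 0.0 < ws[0] < wp[0] < wp[1] < ws[1] < 1.0:
        raise ValueError("band edges must satisfy 0 < ws1 < wp1 < wp2 < ws2 < 1")
    return wp, ws


# Illustrative values: 10 MHz centre, 2 MHz bandwidth, 0.5 MHz transition.
wp, ws = butterworth_band_edges(f0=10e6, bw=2e6, fm=0.5e6, fs=40e6)
```

The resulting `wp` and `ws`, together with $R_p = 3$ and $R_s = 40$, are the four arguments the text passes to buttord.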

[0112] To achieve better energy convergence when the processed Gaussian white noise $n_{wb}$ is convolved with the radar signal, $n_{wb}$ is multiplied in the time domain by a normally distributed amplitude $k$ to obtain the modulated Gaussian white noise $n_t$:

[0113] $n_t = k \cdot n_{wb}$

[0114] where $k$ follows the normal distribution $N(0, E_c)$ and has the same length $N_w$ as the Gaussian white noise.

[0115] In summary, over the interval $[0, N_w]$ the value of $k$ varies according to a normal distribution with mean 0 and variance $E_c$, so $E_c$ adjusts the energy convergence of the modulated Gaussian white noise. It thereby controls the peak difference between the false target region and the real target region after pulse compression, as well as the size of an individual false-target interference region.

[0116] $n_t$ is convolved with $s_t$ in the time domain to obtain the interference signal $j_p$:

[0117] $j_p = n_t * s_t$

[0118] where $*$ denotes the convolution operation.

[0119] The final interference signal $j$ is generated by superimposing cyclically shifted copies of the interference signal $j_p$:

[0120]

[0121] where $m_1$ is the shift interval of the interference signal, $m_2$ is the number of shifts of the interference signal, and $\mathrm{cyclicshift}(j_p, m_1)$ is a cyclic shift function that shifts the data in the interference signal $j_p$ to the right by $m_1$ points.

[0122] The echo signals of the jamming signal and the radar signal are input into the radar receiver. Based on the strong correlation between the jamming signal and the radar signal, the radar receiver can detect the jamming signal. After pulse compression processing by the radar receiver, the jamming signal will form an energy peak that can cause a false alarm in a certain area or at a certain point. The location of the energy peak formed by pulse compression is controlled by m1, and the number of energy peaks is determined by the size of m2. Therefore, by controlling the sizes of m1 and m2, the location and number of false targets can be controlled, thereby generating a false target area in the CFAR results.

[0123] Based on the above interference signal generation process, n sets of modulation parameters $[E_c, m_1, m_2]$ generate n interference signals with different interference effects. Each interference signal has its own interference waveform, so each interference signal is an interference waveform sample. All samples form the interference waveform sample set $j_{1\times n}$, which is used for the subsequent design of interference signal generation filters.

[0124] S2. Represent the process of obtaining the interference signal from the radar signal in S1 as an interference signal generation filter. Assuming the impulse response of the interference signal generation filter is $h(t)$, the interference signal $j(t)$ is obtained by convolving the radar signal $s_t(t)$ with $h(t)$, i.e.

[0125] $j(t) = s_t(t) * h(t)$

[0126] Since the radar signal $s_t(t)$ and the interference signal $j(t)$ have already been obtained in S1, the impulse response of the interference signal generation filter can be obtained by deconvolving $j(t)$ with respect to $s_t(t)$. The specific process is as follows:

[0127] Perform a Fourier transform (FFT) on the radar signal $s_t(t)$ and on each interference signal $j(t)$ in the interference waveform sample set, converting them to the frequency domain to obtain the radar signal spectrum $R(\omega)$ and the corresponding interference signal spectrum $J(\omega)$. According to the time-domain convolution theorem, divide each interference signal spectrum $J(\omega)$ by the radar signal spectrum $R(\omega)$ in turn to obtain the frequency response $H(\omega)$ of the corresponding interference signal generation filter. Performing an inverse Fourier transform on each $H(\omega)$ yields the impulse response of the corresponding interference signal generation filter, and thus the filter itself.

[0128] $h(t) = \mathcal{F}^{-1}[J(\omega) / R(\omega)]$

[0129] The impulse response $h(t)$ of the interference signal generation filter can therefore be regarded as the coefficients of the interference signal generation filter in the traditional interference design method. Based on the above operations, the interference waveform sample set $j_{1\times n}$ is converted into the corresponding interference signal generation filter sample set $h_{1\times n}$.
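The frequency-domain deconvolution of S2 can be illustrated as follows. This is a hedged sketch under simplifying assumptions: the signals are noise-free, the interference signal is the full linear convolution of the radar signal with the filter, and the small constant `eps` is an added numerical safeguard that does not appear in the patent text. The function name `estimate_impulse_response` is hypothetical.

```python
import numpy as np

def estimate_impulse_response(s_t, j, eps=1e-12):
    """Estimate h such that j = s_t * h (linear convolution) by spectral division."""
    n_fft = len(j)                       # long enough to hold the full convolution
    R = np.fft.fft(s_t, n_fft)           # radar signal spectrum R(w)
    J = np.fft.fft(j, n_fft)             # interference signal spectrum J(w)
    H = J / (R + eps)                    # eps guards against near-zero bins (assumption)
    return np.real(np.fft.ifft(H))       # h(t) = F^-1[J(w) / R(w)]

rng = np.random.default_rng(1)
s_t = rng.normal(size=32)                # toy broadband stand-in for the radar signal
h_true = rng.normal(size=8)              # filter used to synthesize the test signal
j = np.convolve(s_t, h_true)             # j(t) = s_t(t) * h(t)
h_est = estimate_impulse_response(s_t, j)[: len(h_true)]
```

For a band-limited radar signal, many bins of $R(\omega)$ are close to zero and plain division becomes ill-conditioned, so a practical implementation would likely need some form of regularization. The text does not say how this is handled.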

[0130] S3. Construct an autoencoder and train it on the interference signal generation filter sample set $h_{1\times n}$. The autoencoder takes the impulse response of an interference signal generation filter as input and outputs the recovered impulse response. Training continues until the MSE error no longer decreases, which yields the trained autoencoder. The specific process is as follows:

[0131] An autoencoder consists of an input layer, a hidden layer, and an output layer. It is divided into an encoder and a decoder. The input layer to the hidden layer is the encoder, and the hidden layer to the output layer is the decoder.

[0132] $h = f_\theta(x) = f_1(w x + b_h)$

[0133] where $h$ is the output data of the hidden layer, $f_\theta(\cdot)$ is the mapping function from the input layer to the hidden layer, $x$ is the input data, $w$ is the connection weight between the input layer and the hidden layer, $b_h$ is the bias between the input layer and the hidden layer, and $f_1(\cdot)$ is the activation function of the encoder.

[0134]

[0135] where $\hat{x}$ denotes the output data, $g_{\theta'}(h)$ is the mapping from the hidden layer to the output layer, the weight term is the connection weight between the hidden layer and the output layer, $b_v$ is the bias between the hidden layer and the output layer, and $f_2(\cdot)$ is the activation function of the decoder.

[0136] Based on the two formulas above, each impulse response in the interference signal generation filter sample set $h_{1\times n}$ is input into the encoder of the autoencoder to obtain the high-dimensional features of that impulse response. These high-dimensional features are then input into the decoder, which outputs the corresponding recovered impulse response, i.e. the new interference signal generation filter. The autoencoder is trained in this way until the MSE error no longer decreases, yielding the trained autoencoder together with the high-dimensional features and the recovered impulse response for each input impulse response.
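The encoder and decoder mappings can be sketched as a single forward pass. This is an illustrative sketch only: the sigmoid and identity activations, the random untrained weights, and the names `w_dec` and `b_v` are assumptions, and the layer sizes reuse the 3000-to-10 example given in [0198].

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, w, b_h):
    """h = f1(w x + b_h); sigmoid is an assumed choice for f1."""
    return sigmoid(w @ x + b_h)

def decode(h, w_dec, b_v):
    """x_hat = f2(w_dec h + b_v); identity is an assumed choice for f2."""
    return w_dec @ h + b_v

rng = np.random.default_rng(2)
n_in, n_hidden = 3000, 10                        # sizes from the example in [0198]
w = rng.normal(scale=0.01, size=(n_hidden, n_in))
b_h = np.zeros(n_hidden)
w_dec = rng.normal(scale=0.01, size=(n_in, n_hidden))
b_v = np.zeros(n_in)

x = rng.normal(size=n_in)                        # stand-in for one impulse response
features = encode(x, w, b_h)                     # code later adjusted by the agent
x_hat = decode(features, w_dec, b_v)             # recovered impulse response
mse = np.mean((x - x_hat) ** 2)                  # reconstruction error to be minimized
```

With untrained weights the reconstruction error is large. Training would adjust `w`, `b_h`, `w_dec` and `b_v` until this error stops decreasing.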

[0137] During training, the defining property of an autoencoder is that its output is as similar as possible to its input, i.e. the error between the input $x$ and the output $\hat{x}$ is minimized. This invention therefore uses the mean squared error (MSE) function to minimize the error between the input (impulse response) and the output (recovered impulse response):

[0138] $L(w, b) = L_{\mathrm{MSE}}(x, \hat{x})$

[0139] where $L(w, b)$ represents the error between the input and the output, and $L_{\mathrm{MSE}}(\cdot)$ represents the MSE function.

[0140] At the same time, a KL divergence constraint on the hidden-layer neurons is added to the autoencoder's loss function. For a given input (impulse response) $x_1$, $h_j(x_1)$ denotes the activation value of neuron j in the hidden layer.

[0141] $\hat{\rho}_j = \frac{1}{n} \sum_{i=1}^{n} h_j(x_i)$

[0142] where $\hat{\rho}_j$ is the average activation value of neuron j in the hidden layer across all input data, $x_i$ is the i-th input, and n is the number of input data.

[0143] The sparse penalty term based on the KL divergence function can be expressed as:

[0144] $KL(\rho \,\|\, \hat{\rho}_j) = \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j}$

[0145] where ρ is the sparsity parameter.

[0146] The loss function of the autoencoder with added sparsity is as follows:

[0147] $L_{\mathrm{sparse}}(w, b) = L(w, b) + \beta \sum_{j=1}^{m} KL(\rho \,\|\, \hat{\rho}_j)$

[0148] where β is the weight of the sparsity penalty factor and m is the number of neurons in the hidden layer.
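The sparsity-regularized loss can be written out as a short NumPy sketch. It assumes the standard sparse-autoencoder form, i.e. an MSE reconstruction term plus β times the sum over hidden neurons of the KL divergence between ρ and the average activation. The default values of `rho` and `beta`, the clipping guard, and the function names are illustrative assumptions, not values from the patent.

```python
import numpy as np

def kl_sparsity_penalty(rho, rho_hat):
    """Sum over hidden neurons of KL(rho || rho_hat_j)."""
    rho_hat = np.clip(rho_hat, 1e-8, 1 - 1e-8)   # numerical guard (assumption)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

def sparse_autoencoder_loss(x, x_hat, hidden, rho=0.05, beta=0.1):
    """MSE reconstruction error plus beta times the KL sparsity penalty.

    x, x_hat: arrays of shape (n, d); hidden: activations of shape (n, m) in (0, 1).
    """
    mse = np.mean((x - x_hat) ** 2)
    rho_hat = hidden.mean(axis=0)                # average activation of each neuron j
    return mse + beta * kl_sparsity_penalty(rho, rho_hat)

x = np.ones((4, 3))
loss_sparse = sparse_autoencoder_loss(x, x, np.full((4, 2), 0.05))  # activations match rho
loss_dense = sparse_autoencoder_loss(x, x, np.full((4, 2), 0.5))    # activations too high
```

The penalty vanishes when the average activations equal ρ and grows as they move away from it, which pushes most hidden neurons toward low activation.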

[0149] Autoencoders are unsupervised machine learning algorithms that leverage the nonlinear feature extraction capabilities of deep neural networks, exhibiting excellent feature dimensionality reduction abilities. The vectors obtained by the encoder contain some information from the original input signal, which can then be extracted for downstream tasks. High-dimensional features of traditional interference signals can be extracted using autoencoders, and interference waveforms with varying interference effects can be generated based on these features.

[0150] S4. Establish an interference effect evaluation system, which includes constant false alarm rate (CFAR) evaluation and power evaluation. CFAR evaluation mainly covers the distribution area of false alarm targets (false targets), i.e. the effective interference interval, whether the real target is missed, and the energy peak characteristics of the false alarm targets. Power evaluation covers the power ratio of the interference signal to the radar signal.

[0151] The radar signal intercepted by the jammer in S1 is convolved with each recovered impulse response obtained in S3 to obtain multiple jamming signals. Each jamming signal is then evaluated with the interference effect evaluation system to obtain the corresponding evaluation result $E = \{F/T, R_u, \eta, JSR\}$, where F indicates that the real target is missed, T indicates that the real target is detected, $R_u$ denotes the effective interference interval, and η denotes the ratio of the maximum peak value of the pulse-compressed echo signal to the peak value of the target signal (the energy peak characteristic of the false alarm targets). The echo signal is the echo of the interference signal and the radar signal, and JSR denotes the echo signal power ratio.

[0152] To facilitate subsequent calculations, this invention presents the evaluation results in a numerical format, that is, discretizes the evaluation results:

[0153] (1) Detection results of real targets

[0154]

[0155] When the true target is missed, F / T = 0; otherwise, F / T = -1.

[0156] (2) Effective interference interval (distribution area of false alarm targets)

[0157]

[0158] where the formula compares the position of the first false target with $X_{Real}$, the position of the real target. If the first false target is located ahead of the real target, then $R_u = 0$; otherwise $R_u = -1$.

[0159] (3) The ratio of the maximum peak value after pulse compression of the echo signal to the peak value of the target signal.

[0160]

[0161] where the numerator is the maximum peak value of the echo signal after pulse compression, $Y_{Real}$ is the peak value of the target signal, and η0 is a preset maximum peak value ratio threshold. If the maximum peak value ratio exceeds the threshold, then η = 0; otherwise, η = -1. The maximum peak value ratio is the ratio of the maximum peak value of the pulse-compressed echo signal to the peak value of the target signal.

[0162] (4) Echo signal power ratio

[0163]

[0164] where $P_{Jam}$ is the power of the interference signal, $P_{Radar}$ is the power of the radar signal, and JSR0 is a preset power ratio threshold. If the power ratio is less than the threshold, JSR = 0; otherwise, JSR = -1.

[0165] Based on the above processing, the interference effect of the current interference signal can be represented by a 1×4 one-dimensional array, i.e. the discretized evaluation result is taken as the final evaluation result. In practical applications, the interference signal is pulse-compressed by the radar receiver and then passed to the constant false alarm rate detector. If the real target is missed, the effective interference interval is ahead of the real target, the maximum peak value ratio is greater than the threshold η0, and the echo signal power ratio is less than the threshold JSR0, then the evaluation result E = {0,0,0,0} is obtained.
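The four discretization rules of S4 can be collected into one helper. This is a minimal sketch: the function and argument names are hypothetical, the numeric inputs are made-up examples, and "ahead of the real target" is taken to mean a smaller range-cell index, which is an assumption about the convention.

```python
def discretize_evaluation(target_missed, first_false_pos, real_pos,
                          peak_ratio, eta0, power_ratio, jsr0):
    """Return the 1x4 discretized result [F/T, R_u, eta, JSR] (0 = desired, -1 = not)."""
    ft = 0 if target_missed else -1
    # "ahead of" is interpreted as a smaller range-cell index (assumption)
    r_u = 0 if first_false_pos < real_pos else -1
    eta = 0 if peak_ratio > eta0 else -1
    jsr = 0 if power_ratio < jsr0 else -1
    return [ft, r_u, eta, jsr]

# All four conditions of [0165] satisfied
E_good = discretize_evaluation(target_missed=True, first_false_pos=120, real_pos=200,
                               peak_ratio=1.8, eta0=1.5, power_ratio=3.0, jsr0=5.0)

# Real target still detected and jamming power too high
E_bad = discretize_evaluation(target_missed=False, first_false_pos=120, real_pos=200,
                              peak_ratio=1.8, eta0=1.5, power_ratio=6.0, jsr0=5.0)
```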

[0166] S5. Based on the evaluation results obtained in S4, find the evaluation results corresponding to the high-dimensional features of each impulse response obtained in S3. Perform Markov modeling based on all high-dimensional features and their corresponding evaluation results to obtain the waveform design agent. The specific process is as follows:

[0167] The waveform design agent is initialized by taking the high-dimensional features of each impulse response in the previous time step and the corresponding evaluation results as the state of the waveform design agent, taking the updating of the high-dimensional features in the current time step as the action of the waveform design agent, and taking the evaluation results corresponding to the current high-dimensional features as the reward of the waveform design agent.

[0168] The action of the waveform design agent is to update the high-dimensional features of the impulse response of the interference signal generation filter at the current moment. The action is represented by the updated value of the high-dimensional features, which is the code input to the decoder at the current time step.

[0169] The goal of waveform optimization is to maximize the interference effect against CFAR detection, which includes raising the detection threshold so that the real target is missed and optimizing the interference interval so that the interference leads the real target. The current interference signal is therefore evaluated with the interference effect evaluation system of S4 to obtain the evaluation result E, which is used as the reward $R_t$. The reward $R_t$ is determined by the desired interference effect:

[0170] $R_t = [F/T + R_u + (\eta - \eta_0) + (JSR_0 - JSR)] \times 10$

[0171] When the evaluation result E = {0,0,0,0}, $R_t > 0$. The waveform optimization agent then aims for a higher maximum peak value ratio and a lower interference-to-signal ratio, and learns a better coding strategy in the direction of increasing reward.
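The reward of [0170] can be evaluated directly. The sketch below assumes that F/T and $R_u$ enter in their discretized form (0 or -1), while η and JSR enter as measured values compared against their thresholds. This reading is an interpretation of the formula, and the numeric inputs are invented for illustration.

```python
def reward(ft, r_u, eta, eta0, jsr, jsr0):
    """R_t = [F/T + R_u + (eta - eta0) + (JSR0 - JSR)] * 10."""
    return (ft + r_u + (eta - eta0) + (jsr0 - jsr)) * 10

# Desired outcome: target missed, leading interference, high peak ratio, low JSR
r_good = reward(ft=0, r_u=0, eta=1.8, eta0=1.5, jsr=3.0, jsr0=5.0)

# Undesired outcome: every term is penalized
r_bad = reward(ft=-1, r_u=-1, eta=1.2, eta0=1.5, jsr=6.0, jsr0=5.0)
```

Under this reading, the reward keeps growing as η rises above η0 and JSR falls below JSR0, which matches the stated aim of a higher peak ratio at lower jamming power.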

[0172] Markov processes are constructed based on a set of interactive objects: a waveform design agent and a waveform evaluation environment. The elements include state, action, policy, and reward. In the simulation of a Markov process, the agent perceives the current system state, performs actions on the environment according to the policy, thereby changing the state of the environment and receiving a reward.

[0173] S6. Train the waveform optimization agent using the DDPG algorithm to obtain the final waveform optimization agent. The specific process is as follows:

[0174] S61. Set the learning parameters for the DDPG algorithm, including the experience replay pool D, the reward discount factor γ, and the learning rate r.

[0175] S62. Initialize the experience replay pool D with capacity $D_{max}$.

[0176] S63. Establish the policy neural network μ, the value neural network Q(s,a|θ), the target network μ' of the policy neural network, and the target network Q'(s',a'|θ') of the value neural network. The policy neural network selects the current action a based on the current state s and interacts with the environment to generate the next state s' and the current reward value R. The target network of the policy neural network selects the optimal next action a' based on the next state s' sampled from the experience replay pool, and its parameters are periodically copied from those of the policy neural network. The value neural network Q(s,a|θ) calculates the current Q value Q(s,a). The target network Q'(s',a'|θ') of the value neural network calculates the Q'(s',a') part of the target Q value, and its parameters θ' are periodically copied from θ.

[0177] Initialize the parameters θ of the policy neural network and the value neural network, and the parameters θ' of the target network of the policy neural network and the target network of the value neural network, so that each target network starts from the parameters of its online network (θ' ← θ).

[0178] S64. The state $S_t$ of the waveform design agent at a given moment from S5 is input into the policy neural network. With the initialized parameters θ of the policy neural network, the exploration strategy outputs the current action $A_t$ of the waveform design agent from its state. A noise distribution N is then initialized. The noise is a random number added to each action $A_t$, which introduces randomness into the actions and strengthens the agent's exploration. The current action thus becomes $A_t = \mu_\theta(s) + N_t$, where $N_t$ is the noise added to the current action. The waveform design agent executes the current action $A_t$ and receives the current reward value $R_t$ and the state $S_{t+1}$ at the next moment.

[0179] S65. Store $(S_t, A_t, R_t, S_{t+1})$ in the experience replay pool D of S62, and use $(S_t, A_t, R_t, S_{t+1})$ to train the policy neural network, the value neural network, the target network of the policy neural network, and the target network of the value neural network. During training, the state-action pair $(S_t, A_t)$ is input into the value neural network to obtain the state-action value $Q_\theta$, and the parameters of the policy neural network are then updated from $Q_\theta$ by gradient ascent:

[0180]

[0181] In the formula, ∇ denotes the gradient operator, J denotes the loss function of the policy neural network, M denotes the randomly sampled training batch, Q(·) and μ(·) denote the value neural network and the policy neural network respectively, and the remaining symbol denotes the parameters of the policy neural network.

[0182] The parameters of the value neural network are updated by minimizing the mean-squared Bellman error (MSBE) loss function through a stochastic gradient descent strategy.

[0183]

[0184] In the formula: R represents the reward value, and Q'(·) and μ'(·) represent the target network of the value neural network and the target network of the policy neural network, respectively.

[0185] The parameter optimization for both target networks described above is achieved through delayed updates of the online network, as shown below:

[0186] θ'=ρθ'+(1-ρ)θ

[0187]

[0188] In the formula, θ denotes the parameters of the policy neural network and the value neural network, θ′ denotes the parameters of the target network of the policy neural network and the target network of the value neural network, and ρ denotes the constant discount factor of the delayed update, which is distinct from the reward discount factor γ.
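The delayed update of [0186] amounts to a weighted average of target and online parameters. The sketch below applies it to lists of NumPy arrays. The function name, the default value of `rho`, and the toy parameter shapes are illustrative assumptions.

```python
import numpy as np

def soft_update(theta_target, theta_online, rho=0.995):
    """theta' <- rho * theta' + (1 - rho) * theta, applied to each parameter array."""
    return [rho * tp + (1.0 - rho) * t for tp, t in zip(theta_target, theta_online)]

theta = [np.ones((2, 2)), np.ones(2)]            # online network parameters
theta_prime = [np.zeros((2, 2)), np.zeros(2)]    # target network parameters
theta_prime = soft_update(theta_prime, theta, rho=0.9)
```

With ρ close to 1 the target networks change slowly, which keeps the target Q value stable while the online networks are being trained.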

[0189] S66. Obtain the high-dimensional feature update value from the current action $A_t$ obtained in S64, input it into the decoder of the autoencoder trained in S3, and output the impulse response of the corresponding interference signal generation filter. Convolve this impulse response with the radar signal intercepted in S1 to obtain the interference signal, and evaluate the interference signal with the interference effect evaluation system. If the real target is missed and the interference interval leads the real target, end the training and obtain the final waveform optimization agent, together with the final policy neural network, value neural network, target network of the policy neural network, and target network of the value neural network. Otherwise, continue to execute S64-S66.

[0190] Because $(S_t, A_t, R_t, S_{t+1})$ changes over time, the four networks above are updated continuously. Training stops when the interference signal obtained in S66 causes the real target to be missed and the interference interval leads the real target. At this point, the training of the waveform optimization agent with the DDPG algorithm is complete, and the neural network parameters within the DDPG algorithm have been trained. Building on the autoencoder, the DDPG algorithm optimizes the high-dimensional features in close coupling with the actual environment to enhance the interference effect. The final waveform optimization agent selects the optimal high-dimensional feature encoding according to a statistical characteristic (probability distribution), i.e. it selects the high-probability encoding. It may also select other encodings, but these are not shown explicitly.
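The exploration step of S64, $A_t = \mu_\theta(s) + N_t$, can be sketched as follows. Gaussian noise, the noise scale, the 14-element state (10 features plus 4 evaluation values), and the fixed linear map standing in for the policy network are all assumptions made for illustration.

```python
import numpy as np

def explore_action(mu_theta, state, rng, noise_std=0.1):
    """Return A_t = mu_theta(s) + N_t with Gaussian exploration noise N_t."""
    action = np.asarray(mu_theta(state), dtype=float)
    noise = rng.normal(scale=noise_std, size=action.shape)
    return action + noise

# Hypothetical stand-in for the policy network: a fixed map from a
# 14-element state to a 10-element action (updated feature code).
rng = np.random.default_rng(3)
W = rng.normal(scale=0.1, size=(10, 14))

def policy(s):
    return np.tanh(W @ s)

state = rng.normal(size=14)
a_t = explore_action(policy, state, rng)                        # noisy action for training
a_greedy = explore_action(policy, state, rng, noise_std=0.0)    # noise-free action
```

In the method described above, the resulting action would be passed to the decoder as the updated high-dimensional feature code.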

[0191] S7. In practical applications, radar signals and interference signals are obtained. The impulse response of the interference signal generation filter is obtained based on the radar signals and interference signals. The impulse response of the interference signal generation filter is input into the trained autoencoder obtained in S3 to obtain the high-dimensional features of the impulse response and interference signal generation filter 1.

[0192] The radar signal is passed through the interference signal generation filter 1 to output a new interference signal. The interference signal is evaluated using the interference effect evaluation system of S4 to obtain the evaluation result.

[0193] The high-dimensional features of the impulse response and the evaluation result are input as the state into the final waveform design agent obtained in S6, which outputs its action, i.e. the updated high-dimensional features.

[0194] The updated high-dimensional features are input into the decoder of the trained autoencoder obtained in S3, which outputs interference signal generation filter 2.

[0195] The radar signal is passed through the interference signal generation filter 2 to output interference signal I. The interference waveform is obtained based on interference signal I, and the interference waveform design is completed.

[0196] The design results of the interference waveform generated by the autoencoder with the DDPG algorithm are shown in Figures 3-5. Figure 3 shows the interference waveform generated by the autoencoder. Figure 4 shows the pulse compression result of the echo signal after the matched filter. Figure 5 shows the detection result of the CFAR detector. Figure 6 shows how the waveform optimization gain changes with the number of training iterations.

[0197] This invention focuses on verifying the effectiveness of interference to radar receivers. For the electronic warfare interference radar detection stage, it generates cognitive interference waveforms based on the real-time feedback of a constant false alarm rate detector. The waveforms obtained by this method have a high degree of freedom in the time and frequency domains and are more adaptable to the environment.

[0198] This invention targets jamming of the signal detection stage and represents the design process of the interference waveform as an interference signal generation filter. The filter coefficients are adjusted with an autoencoder network and the DDPG algorithm, optimizing the effective interference interval and the interference-to-signal ratio (JSR) evaluation metrics to achieve adaptive interference waveform design. To improve the environmental adaptability of the interference waveform, the traditional interference waveform is optimized according to the actual interference effect: reinforcement learning introduces environmental feedback, and the autoencoder extracts high-dimensional features, which reduces computational complexity. For example, if 10-element high-dimensional features are extracted from 3000-element interference signal generation filter coefficient data, reinforcement learning only needs to control 10 values, whereas controlling all 3000 would significantly increase the computational difficulty.

Claims

1. A method for designing interference waveforms based on an autoencoder and the DDPG algorithm, characterized in that it includes the following steps:

S1. Construct an interference signal generation filter that takes Gaussian white noise and a radar signal as inputs and outputs an interference signal. The specific process is as follows: Gaussian white noise is acquired using a jammer and then processed to obtain processed Gaussian white noise. At the same time, the jammer intercepts a segment of radar signal. Based on the processed Gaussian white noise and the radar signal, a smart noise forwarding jamming method is used to obtain the jamming signal.

S2. Using different modulation methods of the jammer, the jamming signal obtained in S1 is used to generate different jamming signals, and all the generated jamming signals are combined into an interference waveform sample set.

S3. Obtain the impulse response of the corresponding interference signal generation filter by performing a deconvolution operation on the radar signal and each interference signal in the interference waveform sample set, thus obtaining the corresponding interference signal generation filter. Combine all the interference signal generation filters into an interference signal generation filter sample set.

S4. Construct an autoencoder, which consists of an input layer, a hidden layer, and an output layer. The input layer to the hidden layer is used as the encoder, and the hidden layer to the output layer is used as the decoder. Each impulse response in the interference signal generation filter sample set is input into the encoder of the autoencoder to obtain the high-dimensional features of the corresponding impulse response. The high-dimensional features are input into the decoder to output the corresponding recovered impulse response, i.e. the new interference signal generation filter. The autoencoder is trained based on the above process until the MSE error no longer decreases, and the trained autoencoder, as well as the high-dimensional features of each impulse response and the recovered impulse response, are obtained.

S5. Establish an interference effect evaluation system, which includes constant false alarm rate (CFAR) detection evaluation and power evaluation. CFAR detection evaluation includes the distribution area of false alarm targets, i.e. the effective interference interval, whether the real target is missed, and the peak energy characteristics of the false alarm targets. Power evaluation includes the power ratio of the interference signal to the radar signal. The radar signal intercepted in S1 is convolved with each recovered impulse response obtained in S4 to obtain multiple interference signals. The interference effect evaluation system is used to evaluate each interference signal to obtain the corresponding evaluation results.

S6. Based on the evaluation results obtained in S5, find the evaluation results corresponding to the high-dimensional features of each impulse response obtained in S4. Perform Markov modeling based on all high-dimensional features and their corresponding evaluation results to obtain the waveform design agent.

S7. Train the waveform optimization agent using the DDPG algorithm to obtain the final waveform optimization agent.

S8. Obtain the radar signal and the interference signal, obtain the impulse response of the interference signal generation filter based on the radar signal and the interference signal, and input this impulse response into the trained autoencoder obtained in S4 to obtain the high-dimensional features of the impulse response and interference signal generation filter 1. The radar signal is passed through interference signal generation filter 1 to output a new interference signal. The interference signal is evaluated using the interference effect evaluation system of S5 to obtain the evaluation result. The high-dimensional features of the impulse response and the evaluation result are input as the state into the final waveform design agent obtained in S7, which outputs its action, i.e. the updated high-dimensional features. The updated high-dimensional features are input into the decoder of the trained autoencoder obtained in S4, which outputs interference signal generation filter 2. The radar signal is passed through interference signal generation filter 2 to output interference signal I. The interference waveform is obtained based on interference signal I, and the interference waveform design is completed.

2. The interference waveform design method based on autoencoder and DDPG algorithm as described in claim 1, characterized in that the specific process of S1 is as follows:

S11. Obtain Gaussian white noise using the jammer, characterized by its number of sampling points, which equals the length of the Gaussian white noise, and by the power of the noise;

S12. The jammer intercepts a segment of radar signal and determines its intermediate frequency, bandwidth, and pulse width. Based on these three parameters, a Butterworth bandpass filter is designed;

S13. Process the Gaussian white noise through the Butterworth bandpass filter to obtain the processed Gaussian white noise, where the filter setting function takes the stopband frequency band of the Butterworth bandpass filter, the transition bandwidth between the passband and the stopband, the passband frequency band of the Butterworth bandpass filter, a parameter describing the passband ripple, and a parameter describing the stopband attenuation;

S14. Multiply the processed Gaussian white noise in the time domain by a normally distributed amplitude to obtain the modulated Gaussian white noise, where the amplitude follows a normal distribution, its length is the length of the Gaussian white noise, and $E_c$ denotes the variance;

S15. Convolve $n_t$ and $s_t$ in the time domain to obtain the interference signal $j_p$, then use a cyclic shift operation to superimpose the interference signal $j_p$ and generate the final interference signal $j$, where m1 is the shift interval of the interference signal, m2 is the number of shifts of the interference signal, and the cyclic shift function shifts the data in the interference signal $j_p$ cyclically to the right by m1 points.

3. The interference waveform design method based on autoencoder and DDPG algorithm as described in claim 2, characterized in that the specific process of S2 is as follows: based on S14-S15, n sets of modulation parameters $[E_c, m1, m2]$ are used to generate n interference signals with different interference effects. Each interference signal is an interference waveform sample. All interference waveform samples are combined into the interference waveform sample set $j_{1\times n}$.

4. The interference waveform design method based on autoencoder and DDPG algorithm as described in claim 3, characterized in that the specific process of S3 is as follows: perform a Fourier transform on the radar signal $s_t(t)$ and on each interference signal $j(t)$ in the interference waveform sample set to obtain the radar signal spectrum $R(\omega)$ and the corresponding interference signal spectrum $J(\omega)$. According to the time-domain convolution theorem, divide each interference signal spectrum $J(\omega)$ by the radar signal spectrum $R(\omega)$ in turn to obtain the frequency response $H(\omega)$ of the corresponding interference signal generation filter. Performing an inverse Fourier transform on each $H(\omega)$ yields the impulse response $h(t)$ of the corresponding interference signal generation filter, and thus the filter itself. All interference signal generation filters are combined into the interference signal generation filter sample set $h_{1\times n}$.

5. The interference waveform design method based on autoencoder and DDPG algorithm as described in claim 4, characterized in that: when training the autoencoder, S4 uses the mean squared error function to minimize the error between the input and output data of the autoencoder, where $L(w, b)$ denotes the error between the input and output data, $w$ denotes the weights of the autoencoder, $L_{MSE}(\cdot)$ denotes the MSE function, $x$ denotes the input data of the autoencoder, and $\hat{x}$ denotes the output data of the autoencoder. At the same time, a KL divergence constraint on the hidden-layer neurons is added to the autoencoder's loss function: for given input data $x_1$, $h_j(x_1)$ denotes the activation value of neuron j in the hidden layer, the average activation value of neuron j in the hidden layer is taken across all input data, and n is the number of input data. The sparse penalty term is based on the KL divergence function, where ρ is the sparsity parameter. In the loss function of the autoencoder with added sparsity, β is the weight of the sparsity penalty factor and m is the number of neurons in the hidden layer.

6. The interference waveform design method based on autoencoder and DDPG algorithm as described in claim 5, characterized in that: the specific process of step S5 is as follows: the radar signal intercepted by the jammer in S1 is convolved with each recovered impulse response obtained in S4 to obtain multiple interference signals, and each interference signal is evaluated using the interference effect evaluation system to obtain the corresponding evaluation result. The evaluation result comprises: the detection result of the real target (missed or detected); the effective interference interval; the ratio of the maximum peak value of the pulse-compressed echo signal to the peak value of the target signal, where the echo signal can be either an interference signal or a radar signal; and the echo signal power ratio.
The evaluation results are discretized as follows:
(1) Detection result of the real target: the indicator takes one value when the real target is missed and the opposite value when it is detected;
(2) Effective interference interval: determined from the position of the first false target and the position of the real target; the indicator takes one value if the first false target is located in front of the real target, and the opposite value otherwise;
(3) Maximum peak value ratio: the maximum peak value of the pulse-compressed echo signal divided by the peak value of the target signal, compared against a preset maximum peak value ratio threshold; the indicator takes one value if the maximum peak value ratio is greater than or equal to the threshold, and the opposite value otherwise, where the maximum peak value ratio is the maximum value of the ratio between the maximum peak value of the pulse-compressed echo signal and the peak value of the target signal;
(4) Echo signal power ratio: determined from the interference signal power and the radar signal power and compared against a preset power ratio threshold; the indicator takes one value if the power ratio is less than or equal to the threshold, and the opposite value otherwise.
Based on the above processing, the interference effect of the current interference signal is represented by a one-dimensional array composed of the discretized indicators, and this array is taken as the final, discrete evaluation result.
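The generation and discretization steps of claim 6 could be sketched as follows. This is a hedged illustration only: the binary encoding (1 for the outcome favourable to the jammer, 0 otherwise), the convention that "in front of" means a smaller range index, and all function and parameter names are assumptions, since the claim text does not preserve the actual indicator values.

```python
import numpy as np

def generate_interference_signals(radar_signal, impulse_responses):
    """Convolve the intercepted radar signal with each recovered impulse response."""
    radar_signal = np.asarray(radar_signal, dtype=float)
    return [np.convolve(radar_signal, np.asarray(h, dtype=float))
            for h in impulse_responses]

def discretize_evaluation(target_detected, first_false_pos, target_pos,
                          peak_ratio, power_ratio,
                          peak_ratio_threshold, power_ratio_threshold):
    """Map the four raw evaluation quantities to a one-dimensional 0/1 array.

    Assumed encoding: 1 means the outcome favours the jammer, 0 otherwise.
    """
    missed = 0 if target_detected else 1                      # (1) real target missed
    leads = 1 if first_false_pos < target_pos else 0          # (2) false target in front
    peak_ok = 1 if peak_ratio >= peak_ratio_threshold else 0  # (3) peak ratio reaches threshold
    power_ok = 1 if power_ratio <= power_ratio_threshold else 0  # (4) power ratio within threshold
    return [missed, leads, peak_ok, power_ok]
```

Under this assumed encoding, an interference signal that hides the real target, places its first false target ahead of it, reaches the peak-ratio threshold, and stays within the power-ratio threshold is scored `[1, 1, 1, 1]`.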

7. The interference waveform design method based on autoencoder and DDPG algorithm as described in claim 6, characterized in that: the specific process of S6 is as follows: based on the evaluation results obtained in S5, find the evaluation result corresponding to the high-dimensional features of each impulse response obtained in S4; take the high-dimensional features of each impulse response at the previous moment, together with the corresponding evaluation results, as the state of the waveform design agent; take the update of the high-dimensional features at the current moment as the action of the waveform design agent; and take the evaluation result corresponding to the current high-dimensional features as the reward of the waveform design agent, thereby obtaining the initialized waveform design agent.
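One way to picture the state and action of claim 7 is the short sketch below. It assumes the state is the concatenation of the previous high-dimensional features with their discrete evaluation array, and that the action is an additive update to those features; both choices, and the helper names `make_state` and `apply_action`, are hypothetical and not specified by the claim.

```python
import numpy as np

def make_state(prev_features, prev_evaluation):
    """State = previous high-dimensional features followed by their evaluation result."""
    return np.concatenate([np.ravel(prev_features),
                           np.ravel(prev_evaluation)]).astype(float)

def apply_action(prev_features, action):
    """Action = update of the features (assumed here to be additive)."""
    return np.asarray(prev_features, dtype=float) + np.asarray(action, dtype=float)
```

The updated features returned by `apply_action` would then be decoded and evaluated, and that evaluation would serve as the basis of the reward.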

8. The interference waveform design method based on autoencoder and DDPG algorithm as described in claim 7, characterized in that: the reward of the waveform design agent is determined according to the desired interference effect, the reward value being assigned according to the evaluation result. The waveform optimization agent aims for a higher maximum peak value ratio and a lower interference-to-signal ratio, and learns a better coding strategy in the direction of increasing reward value.

9. The interference waveform design method based on autoencoder and DDPG algorithm as described in claim 8, characterized in that: the specific process of S7 is as follows:
S71. Set the learning parameters of the DDPG algorithm, including the experience replay pool $R$, the reward discount factor $\gamma$, and the learning rate $\alpha$;
S72. Initialize the experience replay pool $R$ with a set capacity;
S73. Establish the policy neural network $\mu$, the value neural network $Q$, the target network $\mu'$ of the policy neural network, and the target network $Q'$ of the value neural network; initialize the parameters $\theta^{\mu}$ and $\theta^{Q}$ of the policy neural network and the value neural network, and the parameters $\theta^{\mu'}$ and $\theta^{Q'}$ of their target networks, setting $\theta^{\mu'} = \theta^{\mu}$ and $\theta^{Q'} = \theta^{Q}$;
S74. Input the state $s_t$ of the waveform design agent at a given moment, obtained in S6, into the policy neural network, which outputs the current action of the waveform design agent according to its initialized parameters $\theta^{\mu}$ and the exploration strategy; then initialize the noise distribution $N$, where the noise is a random number for each action; adding the noise changes the current action to $a_t = \mu(s_t \mid \theta^{\mu}) + N_t$, where $N_t$ is the noise added to the current action; the waveform design agent executes the current action $a_t$ and receives the current reward value $r_t$ and the state $s_{t+1}$ at the next moment;
S75. Store $(s_t, a_t, r_t, s_{t+1})$ in the experience replay pool $R$ initialized in S72, and at the same time use the stored samples to train the policy neural network, the value neural network, the target network of the policy neural network, and the target network of the value neural network. During training, the state-action pair $(s, a)$ is input into the value neural network to obtain the state-action value $Q(s, a \mid \theta^{Q})$. The parameters $\theta^{\mu}$ of the policy neural network are updated by gradient ascent:

$$\nabla_{\theta^{\mu}} J = \frac{1}{|B|}\sum_{s \in B} \nabla_{\theta^{\mu}} Q\big(s, \mu(s \mid \theta^{\mu}) \mid \theta^{Q}\big)$$

where $\nabla$ is the gradient operator, $J$ is the loss function of the policy neural network, $B$ is a random training batch, $Q$ and $\mu$ are the value neural network and the policy neural network respectively, and $\theta^{\mu}$ denotes the parameters of the policy neural network.
The parameters of the value neural network are updated by minimizing the mean squared Bellman error loss function using stochastic gradient descent:

$$L(\theta^{Q}) = \frac{1}{|B|}\sum_{(s, a, r, s') \in B}\Big(Q(s, a \mid \theta^{Q}) - \big(r + \gamma\, Q'(s', \mu'(s' \mid \theta^{\mu'}) \mid \theta^{Q'})\big)\Big)^{2}$$

where $r$ is the reward value, and $Q'$ and $\mu'$ are the target networks of the value neural network and the policy neural network, respectively.
The parameters of both target networks are optimized through delayed updates from the online networks:

$$\theta^{\mu'} \leftarrow \tau\,\theta^{\mu} + (1-\tau)\,\theta^{\mu'}, \qquad \theta^{Q'} \leftarrow \tau\,\theta^{Q} + (1-\tau)\,\theta^{Q'}$$

where $\theta^{\mu}$ and $\theta^{Q}$ are the parameters of the policy neural network and the value neural network, $\theta^{\mu'}$ and $\theta^{Q'}$ are the parameters of their target networks, and $\tau$ is the discount factor applied in the delayed update, which is a constant;
S76. Obtain the updated high-dimensional features from the current action obtained in S74; input the updated high-dimensional features into the decoder of the autoencoder trained in S4 to output the impulse response of the corresponding interference signal generation filter; convolve the impulse response with the radar signal intercepted in S1 to obtain the interference signal; evaluate the interference signal using the interference effect evaluation system; if the real target is missed and the interference interval leads the real target, end the training and obtain the final waveform optimization agent; otherwise, continue to execute S74 to S76.
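The numerical core of steps S74 and S75 can be sketched without any deep-learning framework. The snippet below is a simplified illustration under stated assumptions: exploration noise is taken to be zero-mean Gaussian (the claim only says "noise distribution N"), network outputs are passed in as plain arrays rather than computed by real networks, and the names `explore`, `bellman_targets`, `critic_loss`, and `soft_update` are hypothetical.

```python
import numpy as np

def explore(policy_action, noise_std, rng):
    """S74: perturb the policy output with exploration noise (Gaussian is an assumption)."""
    policy_action = np.asarray(policy_action, dtype=float)
    return policy_action + rng.normal(0.0, noise_std, size=policy_action.shape)

def bellman_targets(rewards, next_q_target, gamma):
    """S75: y = r + gamma * Q'(s', mu'(s')), with Q' values supplied by the target networks."""
    return np.asarray(rewards, dtype=float) + gamma * np.asarray(next_q_target, dtype=float)

def critic_loss(q_values, targets):
    """S75: mean squared Bellman error over a training batch."""
    diff = np.asarray(q_values, dtype=float) - np.asarray(targets, dtype=float)
    return float(np.mean(diff ** 2))

def soft_update(target_params, online_params, tau):
    """S75: delayed update theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * np.asarray(w, dtype=float) + (1.0 - tau) * np.asarray(w_t, dtype=float)
            for w_t, w in zip(target_params, online_params)]
```

In a full implementation these pieces would sit inside the S74 to S76 loop, with the policy and value networks providing the actions and Q-values, and with training stopping once the evaluation shows the real target missed and the interference interval leading it.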