A real-time radar signal processing system based on high-speed acquisition and heterogeneous parallelism
By combining a high-speed data acquisition and parsing module with a CPU and GPU heterogeneous parallel processing module, the system addresses the insufficient real-time performance of existing radar signal processing systems under high data volume and high frame rate conditions, achieving efficient, real-time radar signal processing.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SABAT INTELLIGENT TECH (SHANDONG) CO LTD
- Filing Date
- 2026-02-11
- Publication Date
- 2026-06-23
AI Technical Summary
Existing radar signal processing systems suffer from insufficient real-time performance and poor versatility when faced with high data volume and high frame rate requirements. Furthermore, they suffer from high network communication latency, making it difficult to stably carry 10-gigabit-level raw radar data streams. Serial signal processing algorithms cannot meet the requirements for high-resolution real-time processing.
Employing a high-speed data acquisition and parsing module and a CPU and GPU heterogeneous parallel processing module, the system bypasses the operating system protocol stack through a 10 Gigabit fiber optic network card and a WinPcap parsing unit. Combined with the clear division of labor between the CPU and GPU, it achieves high-bandwidth, low-latency data acquisition and parallel computing. It introduces a channel data segmentation and rearrangement strategy and a parallel reduction integral graph acceleration algorithm to optimize FFT/IFFT calculation and constant false alarm rate detection.
It achieves high-bandwidth, low-latency acquisition of radar data, improves the parallel efficiency and computation speed of signal processing, solves the computational bottleneck of CFAR detection, and meets the requirements of multi-channel high-bandwidth data transmission and real-time processing.
Figure CN121679489B_ABST
Description
Technical Field
[0001] This invention relates to a real-time radar signal processing system based on high-speed acquisition and heterogeneous parallel processing, and belongs to the field of real-time radar signal processing technology.
Background Technology
[0002] Radar data processing here refers to the calculations a radar performs on target measurement data after the data acquisition unit has supplied the target's position and motion parameters (such as radial distance, radial velocity, and azimuth), including interconnection (data association), tracking, and filtering. For example, Chinese Patent Announcement No. CN102288941B discloses an FPGA- and DSP-based real-time processing system for intermediate-frequency LFM-PD radar signals and its implementation method. The system consists of an intermediate-frequency sampling module, a digital down-conversion module, a pulse compression module, a coherent accumulation module, a motion compensation module, and a constant false alarm detection module, and it can meet real-time processing requirements. Another example is Chinese Patent Announcement No. CN108802697B, which discloses a hybrid parallel processing method for pulse Doppler radar signals with the following steps: (1) initializing the processing threads of the multi-core CPU; (2) starting a radar signal processing thread on the multi-core CPU; (3) collecting and storing data with the radar system; (4) parsing the data; (5) processing the multi-channel data; (6) processing the multi-channel data synchronously in parallel. However, as radar technology is widely applied in national defense, aerospace, meteorology, and other fields, the number of radar channels and the bandwidth keep increasing. The amount of data at the signal processing end grows exponentially as a result, which places extremely high demands on real-time processing capability. Existing radar signal processing systems mostly adopt an FPGA+DSP architecture, which suffers from insufficient real-time performance, poor versatility, and difficult upgrading and maintenance. A traditional CPU working alone processes high-volume signal data inefficiently and slowly. Network communication based on the operating system protocol stack has high latency and insufficient throughput, making it difficult to carry a 10-gigabit raw radar data stream stably. At the same time, serial signal processing algorithms cannot meet the real-time processing requirements of high-frame-rate, high-resolution radar.
Summary of the Invention
[0003] To address these issues, this invention proposes a real-time radar signal processing system based on high-speed acquisition and heterogeneous parallelism. Through the coordinated operation of a high-speed data acquisition and parsing front end and multi-level parallel signal processing, the system meets real-time processing requirements from data acquisition through target detection.
[0004] The real-time radar signal processing system based on high-speed acquisition and heterogeneous parallel processing of the present invention includes:
[0005] The high-speed data acquisition and parsing module includes a 10 Gigabit fiber optic network card and a WinPcap parsing unit. The 10 Gigabit fiber optic network card connects to the radar digital transceiver board via a QSFP interface to acquire raw IQ data from multiple radar channels. The WinPcap parsing unit bypasses the operating system protocol stack and interacts directly with the driver of the 10 Gigabit fiber optic network card through the WinPcap core driver. It receives the raw, unprocessed binary data stream, performs data fragment capture, data fragment filtering, and data fragment sorting, and outputs complete azimuth frame data. Together, the 10 Gigabit fiber optic network card (QSFP interface) and WinPcap form a high-speed data channel that reaches directly to the network card driver layer. This avoids the performance overhead and instability of the operating system protocol stack and supports stable acquisition of massive raw radar data streams with high bandwidth, low latency, and a low packet loss rate.
[0006] The CPU and GPU heterogeneous parallel processing module is connected via a PCIe bus and communicates with the high-speed data acquisition and parsing module. The CPU module is responsible for system control, task scheduling, and complex logical tasks, while the GPU module handles computationally intensive parallel data operations. The GPU module includes a pulse compression parallel module and a constant false alarm rate (CFAR) detection parallel module. The pulse compression parallel module employs a channel data segmentation and rearrangement strategy: it zero-pads and rearranges the multi-channel data and reference signals, then performs FFT, complex point-wise multiplication, and IFFT operations in parallel across multiple streams to output the pulse compression result. The CFAR detection parallel module uses an integral graph acceleration algorithm based on parallel reduction to construct a prefix sum matrix, sums each reference window through constant-time queries, dynamically calculates the detection threshold, and makes the target decision.
[0007] When the CPU and GPU heterogeneous parallel processing modules process the pulse compression stage, a channel data segmentation and rearrangement strategy is used to tap the parallel potential of single-channel data on the basis of multi-channel parallelism, optimize the data access mode of FFT / IFFT calculation, and greatly improve the parallel efficiency of frequency domain pulse compression. In the constant false alarm rate (CFAR) detection process, an integral graph acceleration algorithm based on parallel reduction is introduced. This algorithm significantly reduces the computational complexity of traditional sliding window summation and realizes constant-time lookup for summation of arbitrary rectangular reference windows. When the data matrix and reference window are large, hundreds or thousands of times of performance acceleration can be achieved, solving the computational bottleneck of CFAR detection.
[0008] The specific processing is as follows. First, the system acquires multi-channel raw radar IQ data at high speed through a 10 Gigabit fiber optic interface and uses WinPcap to parse and reassemble the data stably and with low latency. The processing flow then enters the core multi-level parallel computing stage: on the one hand, a channel data segmentation and rearrangement strategy is adopted for pulse compression to maximize parallel efficiency; on the other hand, an integral graph acceleration algorithm based on parallel reduction is introduced for constant false alarm detection, which turns massive window summation into constant-time queries. Together these steps realize real-time, efficient processing of radar echo signals from acquisition and parsing through pulse compression and constant false alarm detection.
[0009] Furthermore, the WinPcap parsing unit works as follows. First, it captures data fragments: the WinPcap core driver interacts directly with the network card driver interface and copies all data packets flowing through the 10 Gigabit fiber optic network card to a buffer. Then, it filters the data fragments based on the Ethernet frame header, IP header, and TCP header to obtain the valid information. Finally, it sorts the data fragments by data offset and parses out the complete azimuth frame based on the IQ frame header and frame tail.
[0010] Furthermore, the pulse compression parallel module operates as follows. First, CPU memory is allocated to store the radar's multi-channel data and reference signal data; the multi-channel data includes sum channel, azimuth difference channel, and elevation difference channel data. Next, GPU memory is allocated with cudaMalloc(), and the multi-channel data and reference signal data are copied to the GPU over the PCIe bus with cudaMemcpy(). Each beam's data in each single channel is zero-padded to a power-of-2 length. The data of each channel and the reference signal data are then rearranged horizontally in the range dimension. Multiple streams are created with cudaStreamCreate(), and cufftExecZ2Z() is called to perform the FFT operations. Multiple thread blocks are launched in the x and y directions of the thread grid to perform complex point-wise multiplication on the FFT results, and cufftExecZ2Z() is called again to perform the IFFT operations. The streams are then synchronized with cudaStreamSynchronize(), and the pulse compression result of each channel is output. Finally, the GPU memory is released with cudaFree().
[0011] Furthermore, the FFT operation process is as follows:
[0012] Frequency domain pulse compression performs FFT operations on the echo signal and the reference signal separately:
[0013] $$X(k)=\sum_{n=0}^{N-1}x(n)\,e^{-j2\pi nk/N},\qquad H(k)=\sum_{n=0}^{N-1}h(n)\,e^{-j2\pi nk/N}\tag{1}$$
[0014] where $X(k)$ is the FFT result of the echo signal $x(n)$, $H(k)$ is the FFT result of the reference signal $h(n)$, n is the discrete time point, N is the signal length, and k is the frequency point.
[0015] Next, the two are multiplied in the frequency domain and then subjected to IFFT to obtain the pulse compression result. The specific calculation is as follows:
[0016] $$y(n)=\mathrm{IFFT}\left[X(k)\odot H(k)\right]\tag{2}$$
[0017] where $y(n)$ is the pulse compression result and $\odot$ denotes complex point-wise multiplication.
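As a CPU-side reference for the frequency-domain pulse compression described by formulas (1) and (2), the following sketch uses only the Python standard library. It is not the cuFFT implementation named in this document; the function names `fft`, `ifft`, and `pulse_compress` and the toy signals are assumptions made for illustration. The reference spectrum is multiplied in as given, so any matched-filter conjugation is assumed to be folded into the reference signal already.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def ifft(spectrum):
    """Inverse FFT with the 1/N scaling applied once at the end."""
    n = len(spectrum)
    return [v / n for v in fft(spectrum, inverse=True)]

def pulse_compress(echo, ref):
    """FFT both signals, multiply point-wise, then IFFT the product."""
    echo_f = fft(echo)
    ref_f = fft(ref)
    return ifft([a * b for a, b in zip(echo_f, ref_f)])

# Toy input (hypothetical): a unit echo at range cell 3 and a two-tap reference.
echo = [0j] * 8
echo[3] = 1 + 0j
ref = [1 + 0j, 2 + 0j] + [0j] * 6
y = pulse_compress(echo, ref)
```

Because the product of two DFTs corresponds to circular convolution, the toy output reproduces the reference taps starting at cell 3. This is also why the signals are zero-padded before the FFT in the processing chain.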
[0018] Furthermore, the CPU and GPU heterogeneous parallel processing module also includes a moving target indication (MTI) unit and a moving target detection (MTD) unit. The MTI unit receives the pulse compression result and performs clutter suppression. The MTD unit performs Doppler filtering on the clutter-suppressed signal and outputs it to the constant false alarm detection parallel module.
[0019] Furthermore, the constant false alarm rate (CFAR) detection parallel module operates as follows. First, the clutter estimate is calculated:
[0020] $$Z=f(x_1,x_2,\dots,x_N)\tag{3}$$
[0021] where Z is the clutter estimate, $f(\cdot)$ is the statistical estimation function, $\{x_1,\dots,x_N\}$ is the set of reference cell power values, and N is the total number of reference cells. When the cell-averaging CFAR algorithm is used, the clutter estimate satisfies the formula:
[0022] $$Z_{CA}=\frac{1}{N}\sum_{i=1}^{N}x_i\tag{4}$$
[0023] where $Z_{CA}$ is the cell-averaged clutter estimate; the threshold factor satisfies the formula:
[0024] $$\alpha=N\left(P_{fa}^{-1/N}-1\right)\tag{5}$$
[0025] where $\alpha$ is the threshold factor and $P_{fa}$ is the given false alarm probability; the adaptive threshold satisfies the formula:
[0026] $$T=\alpha Z\tag{6}$$
[0027] where T is the adaptive threshold; the binary decision satisfies the formula:
[0028] $$D=\begin{cases}1, & Y>T\\ 0, & Y\le T\end{cases}\tag{7}$$
[0029] where D is the decision result and Y is the power value of the cell under test.
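The chain from clutter estimate to binary decision in formulas (3) to (7) can be sketched in one dimension as follows. This is a serial illustration only, not the GPU module. The threshold factor uses the textbook closed form for cell-averaging CFAR with a mean-level estimate under exponentially distributed noise power, which is an assumption here; the function names, guard and reference counts, and test data are likewise hypothetical.

```python
def ca_threshold_factor(n_ref, p_fa):
    """Assumed textbook CA-CFAR threshold factor for a mean-level estimate."""
    return n_ref * (p_fa ** (-1.0 / n_ref) - 1.0)

def ca_cfar(power, guard, ref, p_fa):
    """Return (index, threshold) for each cell whose power exceeds its threshold."""
    hits = []
    n_ref = 2 * ref                      # reference cells on both sides
    alpha = ca_threshold_factor(n_ref, p_fa)
    for i in range(guard + ref, len(power) - guard - ref):
        left = power[i - guard - ref : i - guard]
        right = power[i + guard + 1 : i + guard + ref + 1]
        z = (sum(left) + sum(right)) / n_ref   # cell-averaged clutter estimate
        t = alpha * z                          # adaptive threshold
        if power[i] > t:                       # binary decision
            hits.append((i, t))
    return hits

# Hypothetical data: flat noise floor with one strong cell at index 16.
power = [1.0] * 32
power[16] = 50.0
hits = ca_cfar(power, guard=1, ref=4, p_fa=1e-3)
```

Cells next to the strong return see it inside their reference window, which raises their threshold; this is the masking effect that the guard cells are meant to limit.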
[0030] Furthermore, the construction of the prefix sum matrix is as follows:
[0031] $$I(i,j)=\sum_{p\le i}\;\sum_{q\le j}A(p,q)\tag{8}$$
[0032] where $I(i,j)$ is the value of the prefix sum matrix at coordinates $(i,j)$, $A(p,q)$ is the value of the original data matrix at coordinates $(p,q)$, and i, j, p, and q are matrix coordinate indices; the summation result of any rectangular reference window satisfies the formula:
[0033] $$S=I(i_2,j_2)-I(i_1-1,j_2)-I(i_2,j_1-1)+I(i_1-1,j_1-1)\tag{9}$$
[0034] where S is the summation result of the reference window, $(i_1,j_1)$ are the coordinates of the top-left corner of the reference window, and $(i_2,j_2)$ are the coordinates of the bottom-right corner; any term whose index falls outside the matrix is taken as zero.
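The prefix sum matrix and the constant-time window query of formulas (8) and (9) can be illustrated with a small serial sketch. The structure is commonly known as an integral image or summed-area table. The code below builds it sequentially on the CPU, whereas the document builds it in parallel on the GPU; the function names and the extra zero row and column are choices made for this illustration.

```python
def prefix_sum_matrix(data):
    """Build a (rows+1) x (cols+1) prefix sum matrix with a zero border."""
    rows, cols = len(data), len(data[0])
    ps = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        row_acc = 0
        for j in range(cols):
            row_acc += data[i][j]                 # running sum of the current row
            ps[i + 1][j + 1] = ps[i][j + 1] + row_acc
    return ps

def window_sum(ps, top, left, bottom, right):
    """Sum of data[top..bottom][left..right] (inclusive) from four lookups."""
    return (ps[bottom + 1][right + 1] - ps[top][right + 1]
            - ps[bottom + 1][left] + ps[top][left])

# Hypothetical 4x4 matrix holding 0..15 in row-major order.
data = [[4 * i + j for j in range(4)] for i in range(4)]
ps = prefix_sum_matrix(data)
inner = window_sum(ps, 1, 1, 2, 2)   # cells 5, 6, 9, 10
total = window_sum(ps, 0, 0, 3, 3)
```

The zero border removes the boundary checks that would otherwise be needed when a window touches the first row or column.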
[0035] Furthermore, when the constant false alarm detection parallel module is working, 128×512 thread blocks are opened in the thread grid unit of the GPU module, and 7×7 threads are opened in each thread block; 8 protection units are set around the detection unit, and 40 reference units are set around the protection units.
[0036] Compared with existing technologies, the real-time radar signal processing system based on high-speed acquisition and heterogeneous parallel processing of the present invention has the following advantages:
[0037] 1. The high-speed data acquisition and analysis module adopts a 10 Gigabit fiber optic network card (QSFP interface) and WinPcap, bypassing the operating system protocol stack, to acquire and parse raw radar data with low latency, high bandwidth, and a low packet loss rate. The theoretical transmission rate reaches 1139.32Mbps, which meets the requirements of multi-channel high-bandwidth data transmission;
[0038] 2. It adopts a heterogeneous parallel architecture of CPU and GPU, with clear division of labor between CPU and GPU, giving full play to the logic control advantages of CPU and the parallel computing capabilities of GPU, and greatly improving signal processing efficiency;
[0039] 3. The pulse compression parallel module adopts a channel data segmentation and rearrangement strategy to optimize the data access pattern and improve parallel computing efficiency; the constant false alarm rate (CFAR) detection parallel module introduces an integral graph acceleration algorithm based on parallel reduction, clearly defines the symbols and physical meanings of formulas such as clutter estimation and threshold calculation, and reduces each reference window summation from a cost that grows with the window size, as in traditional sliding window summation, to a constant-time query, achieving speedups of hundreds or even thousands of times.
Attached Figure Description
[0040] Figure 1 is a flowchart of the multi-channel radar signal processing of the real-time radar signal processing system of the present invention.
[0041] Figure 2 is a schematic diagram of data acquisition for the real-time radar signal processing system of the present invention.
[0042] Figure 3 is a schematic diagram of data parsing for the real-time radar signal processing system of the present invention.
[0043] Figure 4 is a schematic diagram of the heterogeneous division of labor between the CPU and GPU in the real-time radar signal processing system of the present invention.
[0044] Figure 5 is a schematic diagram of the signal processing flow of the real-time radar signal processing system of the present invention.
[0045] Figure 6 is a schematic diagram of the parallel pulse compression processing of the real-time radar signal processing system of the present invention.
[0046] Figure 7 is a schematic diagram of the constant false alarm rate (CFAR) detection principle of the real-time radar signal processing system of the present invention.
[0047] Figure 8 is a schematic diagram of the GPU thread strategy of the real-time radar signal processing system of the present invention.
[0048] Figure 9 is a schematic diagram of the prefix sum matrix of the real-time radar signal processing system of the present invention.
[0049] Figure 10 is a schematic diagram of the rectangular data of the real-time radar signal processing system of the present invention.
Detailed Implementation
[0050] Example 1:
[0051] As shown in Figures 1 to 5, the real-time radar signal processing system based on high-speed acquisition and heterogeneous parallel processing includes:
[0052] The high-speed data acquisition and parsing module includes a 10 Gigabit fiber optic network card and a WinPcap parsing unit. The 10 Gigabit fiber optic network card connects to the radar digital transceiver board via a QSFP interface to acquire raw IQ data from multiple radar channels. The WinPcap parsing unit bypasses the operating system protocol stack and interacts directly with the driver of the 10 Gigabit fiber optic network card through the WinPcap core driver. It receives the raw, unprocessed binary data stream, performs data fragment capture, data fragment filtering, and data fragment sorting, and outputs complete azimuth frame data. Together, the 10 Gigabit fiber optic network card (QSFP interface) and WinPcap form a high-speed data channel that reaches directly to the network card driver layer. This avoids the performance overhead and instability of the operating system protocol stack and supports stable acquisition of massive raw radar data streams with high bandwidth, low latency, and a low packet loss rate.
[0053] The CPU and GPU heterogeneous parallel processing module is connected via a PCIe bus and communicates with the high-speed data acquisition and parsing module. The CPU module is responsible for system control, task scheduling, and complex logical tasks, while the GPU module handles computationally intensive parallel data operations. The GPU module includes a pulse compression parallel module and a constant false alarm rate (CFAR) detection parallel module. The pulse compression parallel module employs a channel data segmentation and rearrangement strategy: it zero-pads and rearranges the multi-channel data and reference signals, then performs FFT, complex point-wise multiplication, and IFFT operations in parallel across multiple streams to output the pulse compression result. The CFAR detection parallel module uses an integral graph acceleration algorithm based on parallel reduction to construct a prefix sum matrix, sums each reference window through constant-time queries, dynamically calculates the detection threshold, and makes the target decision.
[0054] When the CPU and GPU heterogeneous parallel processing modules process the pulse compression stage, a channel data segmentation and rearrangement strategy is used to tap the parallel potential of single-channel data on the basis of multi-channel parallelism, optimize the data access mode of FFT / IFFT calculation, and greatly improve the parallel efficiency of frequency domain pulse compression. In the constant false alarm rate (CFAR) detection process, an integral graph acceleration algorithm based on parallel reduction is introduced. This algorithm significantly reduces the computational complexity of traditional sliding window summation and realizes constant-time lookup for summation of arbitrary rectangular reference windows. When the data matrix and reference window are large, hundreds or thousands of times of performance acceleration can be achieved, solving the computational bottleneck of CFAR detection.
[0055] As shown in Figures 1 and 2, the system processes data as follows. First, the system acquires multi-channel raw radar IQ data at high speed through a 10 Gigabit fiber optic interface and uses WinPcap to parse and reassemble the data stably and with low latency. The processing flow then enters the core multi-level parallel computing stage: on the one hand, a channel data segmentation and rearrangement strategy is adopted for pulse compression to maximize parallel efficiency; on the other hand, an integral graph acceleration algorithm based on parallel reduction is introduced for constant false alarm rate (CFAR) detection, which turns massive window summation into constant-time queries. Together these steps realize real-time, efficient processing of radar echo signals from acquisition and parsing through pulse compression and CFAR detection.
[0056] As shown in Figure 3, the WinPcap parsing unit works as follows. First, it captures data fragments: the WinPcap core driver interacts directly with the network card driver interface and copies all data packets flowing through the 10 Gigabit fiber optic network card to a buffer. Then, it filters the data fragments based on the Ethernet frame header, IP header, and TCP header to obtain the valid information. Finally, it sorts the data fragments by data offset and parses out the complete azimuth frame based on the IQ frame header and frame tail.
[0057] As shown in Figures 4 and 5, the pulse compression parallel module operates as follows. First, CPU memory is allocated to store the radar's multi-channel data and reference signal data; the multi-channel data includes sum channel, azimuth difference channel, and elevation difference channel data. Next, GPU memory is allocated with cudaMalloc(), and the multi-channel data and reference signal data are copied to the GPU over the PCIe bus with cudaMemcpy(). Each beam's data in each single channel is zero-padded to a power-of-2 length. The data of each channel and the reference signal data are then rearranged horizontally in the range dimension. Multiple streams are created with cudaStreamCreate(), and cufftExecZ2Z() is called to perform the FFT operations. Multiple thread blocks are launched in the x and y directions of the thread grid to perform complex point-wise multiplication on the FFT results, and cufftExecZ2Z() is called again to perform the IFFT operations. The streams are then synchronized with cudaStreamSynchronize(), and the pulse compression result of each channel is output. Finally, the GPU memory is released with cudaFree().
[0058] The FFT operation process is as follows:
[0059] Frequency domain pulse compression performs FFT operations on the echo signal and the reference signal separately:
[0060] $$X(k)=\sum_{n=0}^{N-1}x(n)\,e^{-j2\pi nk/N},\qquad H(k)=\sum_{n=0}^{N-1}h(n)\,e^{-j2\pi nk/N}\tag{1}$$
[0061] where $X(k)$ is the FFT result of the echo signal $x(n)$, $H(k)$ is the FFT result of the reference signal $h(n)$, n is the discrete time point, N is the signal length, and k is the frequency point.
[0062] Next, the two are multiplied in the frequency domain and then subjected to IFFT to obtain the pulse compression result. The specific calculation is as follows:
[0063] $$y(n)=\mathrm{IFFT}\left[X(k)\odot H(k)\right]\tag{2}$$
[0064] where $y(n)$ is the pulse compression result and $\odot$ denotes complex point-wise multiplication.
[0065] The CPU and GPU heterogeneous parallel processing module also includes a moving target indication (MTI) unit and a moving target detection (MTD) unit. The MTI unit receives the pulse compression result and performs clutter suppression. The MTD unit performs Doppler filtering on the clutter-suppressed signal and outputs it to the constant false alarm detection parallel module.
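To make the clutter suppression and Doppler filtering stages concrete, here is a minimal sketch. This document does not specify its suppression filter at this point, so the code swaps in a classic two-pulse canceller, followed by a plain DFT across pulses as the Doppler filter bank. Both choices, the function names, and the synthetic data are assumptions for illustration only.

```python
import cmath

def two_pulse_canceller(pulses):
    """pulses[m][r] is the complex sample of pulse m at range cell r.
    Subtracting consecutive pulses cancels echoes that do not change pulse to pulse."""
    return [[cur_s - prev_s for prev_s, cur_s in zip(prev, cur)]
            for prev, cur in zip(pulses, pulses[1:])]

def doppler_spectrum(pulses, r):
    """DFT across pulses at range cell r; each output bin is one Doppler filter."""
    m = len(pulses)
    col = [p[r] for p in pulses]
    return [sum(col[n] * cmath.exp(-2j * cmath.pi * n * k / m) for n in range(m))
            for k in range(m)]

# Synthetic data (hypothetical): 9 pulses, 2 range cells.
# Cell 0 holds only static clutter; cell 1 adds a target rotating 2 cycles per 8 pulses.
pulses = [[5 + 0j, 5 + cmath.exp(2j * cmath.pi * 2 * n / 8)] for n in range(9)]
suppressed = two_pulse_canceller(pulses)          # 8 difference pulses remain
spectrum = [abs(v) for v in doppler_spectrum(suppressed, 1)]
peak_bin = spectrum.index(max(spectrum))
```

The static clutter cancels exactly in both cells, while the moving target survives the subtraction and concentrates in a single Doppler bin.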
[0066] The constant false alarm rate (CFAR) detection parallel module works as follows. First, the clutter estimate is calculated:
[0067] $$Z=f(x_1,x_2,\dots,x_N)\tag{3}$$
[0068] where Z is the clutter estimate, $f(\cdot)$ is the statistical estimation function, $\{x_1,\dots,x_N\}$ is the set of reference cell power values, and N is the total number of reference cells. When the cell-averaging CFAR algorithm is used, the clutter estimate satisfies the formula:
[0069] $$Z_{CA}=\frac{1}{N}\sum_{i=1}^{N}x_i\tag{4}$$
[0070] where $Z_{CA}$ is the cell-averaged clutter estimate; the threshold factor satisfies the formula:
[0071] $$\alpha=N\left(P_{fa}^{-1/N}-1\right)\tag{5}$$
[0072] where $\alpha$ is the threshold factor and $P_{fa}$ is the given false alarm probability; the adaptive threshold satisfies the formula:
[0073] $$T=\alpha Z\tag{6}$$
[0074] where T is the adaptive threshold; the binary decision satisfies the formula:
[0075] $$D=\begin{cases}1, & Y>T\\ 0, & Y\le T\end{cases}\tag{7}$$
[0076] where D is the decision result and Y is the power value of the cell under test.
[0077] The specific steps for constructing the prefix sum matrix are as follows:
[0078] $$I(i,j)=\sum_{p\le i}\;\sum_{q\le j}A(p,q)\tag{8}$$
[0079] where $I(i,j)$ is the value of the prefix sum matrix at coordinates $(i,j)$, $A(p,q)$ is the value of the original data matrix at coordinates $(p,q)$, and i, j, p, and q are matrix coordinate indices; the summation result of any rectangular reference window satisfies the formula:
[0080] $$S=I(i_2,j_2)-I(i_1-1,j_2)-I(i_2,j_1-1)+I(i_1-1,j_1-1)\tag{9}$$
[0081] where S is the summation result of the reference window, $(i_1,j_1)$ are the coordinates of the top-left corner of the reference window, and $(i_2,j_2)$ are the coordinates of the bottom-right corner; any term whose index falls outside the matrix is taken as zero.
[0082] When the constant false alarm detection parallel module is working, 128×512 thread blocks are opened in the thread grid unit of the GPU module, and 7×7 threads are opened in each thread block; 8 protection units are set around the detection unit, and 40 reference units are set around the protection units.
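The window geometry stated above can be checked with a short sketch: a 7×7 window has 49 cells, of which the central 3×3 block holds the detection unit and its 8 protection units, leaving 40 reference units. The reference sum then follows from two prefix sum queries, the outer 7×7 window minus the inner 3×3 block. The helper names are hypothetical, the code is serial, and it handles only detection units far enough from the matrix edge for the full window to fit.

```python
def build_prefix(data):
    """Prefix sum matrix with a zero border row and column."""
    rows, cols = len(data), len(data[0])
    ps = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            ps[i + 1][j + 1] = data[i][j] + ps[i][j + 1] + ps[i + 1][j] - ps[i][j]
    return ps

def rect(ps, top, left, bottom, right):
    """Inclusive rectangle sum from four lookups."""
    return (ps[bottom + 1][right + 1] - ps[top][right + 1]
            - ps[bottom + 1][left] + ps[top][left])

def window_counts(guard=1, ref=2):
    """Return (window side, protection unit count, reference unit count)."""
    side = 2 * (guard + ref) + 1
    inner = (2 * guard + 1) ** 2
    return side, inner - 1, side * side - inner

def reference_sum(ps, r, c, guard=1, ref=2):
    """Sum over the reference ring around cell (r, c): outer window minus inner block."""
    half = guard + ref
    outer = rect(ps, r - half, c - half, r + half, c + half)
    inner = rect(ps, r - guard, c - guard, r + guard, c + guard)
    return outer - inner

# Hypothetical 9x9 power map of ones with a strong detection unit in the centre.
grid = [[1] * 9 for _ in range(9)]
grid[4][4] = 100
ps = build_prefix(grid)
ring = reference_sum(ps, 4, 4)
```

Because the inner block is subtracted, neither the detection unit nor the protection units contribute to the clutter estimate.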
[0083] Example 2:
[0084] Figures 6 to 10 show the real-time radar signal processing system based on high-speed acquisition and heterogeneous parallel processing.
[0085] It includes a high-speed data acquisition and analysis module and a CPU and GPU heterogeneous parallel processing module, which are connected via a PCIe bus.
[0086] The high-speed data acquisition and analysis module is implemented as follows: A 10 Gigabit fiber optic network card is connected to the radar digital transceiver board (radar signal processor) using a QSFP interface to acquire 3 channels (sum channel, azimuth difference channel, and elevation difference channel) of raw radar IQ data. The radar parameters are strictly selected according to the values in Table 1 below:
[0087] Table 1: Radar Parameter Table
[0088]
[0089] The radar signal processor calculates the number of range cells in the incoming IQ data per azimuth frame, where R represents the radar power and R_cell the range resolution cell. According to the parameters in Table 1, the resources occupied by a single-channel azimuth frame are calculated as follows:
[0090]
[0091] Based on radar data, the theoretical transmission rate is:
[0092]
[0093] The high-speed data acquisition and analysis module transmits data from the acquisition module to CPU memory via a driver program. The theoretical transmission rate is calculated using the formula above. This invention employs a 10 Gigabit fiber optic network card, connected to the data acquisition board via a high-speed QSFP fiber optic interface, as shown in Figure 2.
[0094] The WinPcap parsing unit is implemented as follows:
[0095] 1. Data packet capture: The WinPcap core driver interacts directly with the network card driver to copy all data packets flowing through the network card to the buffer, avoiding intervention from the operating system protocol stack;
[0096] 2. Data Slice Filtering: Filter out valid radar data slices based on the target MAC address in the Ethernet frame header, the target IP address in the IP frame header, and the port number in the TCP frame header;
[0097] 3. Data slice sorting: The data slices are sorted according to the offset field. Combined with the synchronization word in the IQ frame header and the check word in the frame tail, the complete azimuth frame data is obtained. The single-channel azimuth frame resource is calculated to be 4.56MB according to the formula.
[0098] ;
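The filtering and sorting steps of the parsing unit can be sketched as follows. The capture step itself (copying packets from the driver into a buffer) is outside this sketch. The `Fragment` fields, the sync word `SYNC`, the tail word `TAIL`, and all addresses are hypothetical placeholders, since the actual frame layout is not given in this document.

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    dst_mac: str
    dst_ip: str
    dst_port: int
    offset: int       # byte offset of this payload within the azimuth frame
    payload: bytes

SYNC = b"\xAA\x55"    # hypothetical IQ frame-header synchronization word
TAIL = b"\x0D\x0A"    # hypothetical frame-tail check word

def reassemble(fragments, mac, ip, port):
    """Filter by header fields, sort by offset, and validate the frame ends."""
    valid = [f for f in fragments
             if (f.dst_mac, f.dst_ip, f.dst_port) == (mac, ip, port)]
    valid.sort(key=lambda f: f.offset)
    expected = 0
    for f in valid:
        if f.offset != expected:          # gap or overlap: a fragment is missing
            return None
        expected += len(f.payload)
    frame = b"".join(f.payload for f in valid)
    if frame.startswith(SYNC) and frame.endswith(TAIL):
        return frame
    return None                           # incomplete or corrupted azimuth frame

# Out-of-order radar fragments plus one packet addressed elsewhere.
radar = ("00:11:22:33:44:55", "192.168.1.10", 5000)
fragments = [
    Fragment(*radar, offset=6, payload=b"\x0D\x0A"),
    Fragment("ff:ff:ff:ff:ff:ff", "192.168.1.99", 80, offset=0, payload=b"noise"),
    Fragment(*radar, offset=0, payload=b"\xAA\x55I"),
    Fragment(*radar, offset=3, payload=b"QIQ"),
]
frame = reassemble(fragments, *radar)
```

Returning `None` on a gap keeps a partially received azimuth frame from reaching the signal processing stages.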
[0099] The implementation details of the CPU and GPU heterogeneous parallel processing module are as follows:
[0100] The CPU is an Intel Core i9 series processor, responsible for system initialization, task allocation, data scheduling, and exception handling; the GPU is an NVIDIA Tesla V100 graphics card, containing 5120 CUDA cores, responsible for parallel data processing; the specific processing is as follows:
[0101] I. As shown in Figure 6, parallel pulse compression is performed first:
[0102] 1. Allocate 2GB of CPU memory to store the 3 channels of data and the reference signal data. Allocate 2GB of GPU memory with cudaMalloc(), then transfer the data in CPU memory to GPU memory with cudaMemcpy() over the PCIe 4.0 bus, with a transfer bandwidth of 32GB/s;
[0103] 2. Zero-pad the 10 beams of data and the reference signal of each channel to 256 points (satisfying the power-of-2 requirement);
[0104] 3. After zero-padding, rearrange the data horizontally along the range dimension, arranging the data of consecutive range cells according to the thread block partitioning rules;
[0105] 4. Create 4 streams with cudaStreamCreate(), and call cufftExecZ2Z() to perform FFT operations on the 3-channel data and the reference signal respectively. The thread grid allocates 128 thread blocks in the x-direction and 512 thread blocks in the y-direction, with each thread block containing 256 threads;
[0106] 5. Perform complex point-wise multiplication on the FFT results according to formula (2) to obtain the frequency-domain product;
[0107] 6. Call cufftExecZ2Z() to perform an IFFT operation on the frequency-domain product according to formula (2), obtaining the time-domain pulse compression result;
[0108] 7. Call cudaStreamSynchronize() to wait for the four streams to finish, transfer the pulse compression result to CPU memory, and release the GPU memory with cudaFree().
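The zero-padding and rearrangement steps above can be illustrated on the host side. The sketch below pads every beam to a power-of-2 length and packs all beams of all channels back to back in one flat buffer, which is the kind of contiguous layout a batched FFT expects. Reading "rearrange horizontally along the range dimension" as contiguous packing is an assumption, as are the function names and the 200-sample beam length used in the example.

```python
def next_pow2(n):
    """Smallest power of two that is greater than or equal to n."""
    p = 1
    while p < n:
        p *= 2
    return p

def pad_and_pack(channels, length=None):
    """channels[c][b] is the list of complex range samples of beam b in channel c.
    Returns (flat, length) with each beam zero-padded to `length` and stored contiguously."""
    longest = max(len(beam) for ch in channels for beam in ch)
    length = length or next_pow2(longest)
    flat = []
    for ch in channels:
        for beam in ch:
            flat.extend(beam)
            flat.extend([0j] * (length - len(beam)))   # zero padding
    return flat, length

# Hypothetical input: 3 channels x 10 beams x 200 range samples.
channels = [[[complex(c, b)] * 200 for b in range(10)] for c in range(3)]
flat, length = pad_and_pack(channels)
```

With this layout, beam b of channel c starts at index `(c * 10 + b) * length`, so each transform in a batch reads one contiguous block.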
[0109] II. Moving target indication and moving target detection:
[0110] The moving target indication (MTI) unit uses a two-dimensional adaptive clutter suppression algorithm to filter the pulse compression result and remove fixed clutter such as ground objects and weather. The moving target detection (MTD) unit uses a Doppler filtering algorithm to perform spectral analysis on the clutter-suppressed signal, separating the moving target signal from the residual clutter. In radar signal detection, the power of background noise and clutter (such as ground, weather, and sea surface echoes) is not constant but changes dynamically with distance, azimuth, and time, so a fixed threshold is unsuitable. The fundamental goal of constant false alarm rate (CFAR) detection is to maintain a constant false alarm probability in an unknown, time-varying noise and clutter environment: for each range-Doppler cell under test, a dedicated detection threshold is calculated adaptively from the statistical characteristics of the surrounding cells.
[0111] III. Parallel constant false alarm rate (CFAR) detection:
[0112] 1. Thread configuration: 128×512 thread blocks (corresponding to 128 Doppler channels and 512 distance units) are opened within the GPU thread grid unit. Each thread block opens 7×7 threads, covering the detection unit and the surrounding 8 protection units and 40 reference units.
[0113] 2. Prefix sum matrix construction: Construct the prefix sum matrix using GPU threads in parallel computation, according to the following formula;
[0114] ;
[0115] 3. Reference window summation: for the reference window of any cell under test, the summed power of the reference cells is calculated according to the following formula;
[0116] ;
[0117] 4. Threshold calculation: calculate the clutter estimate and the threshold factor according to the following formula;
[0118] ;
[0119] Next, calculate the adaptive threshold as the product of the threshold factor and the clutter estimate;
[0120] 5. Target decision: compare the power of the cell under test with the adaptive threshold; when it exceeds the threshold, a target is declared, its range and azimuth information is recorded, and the point data are finally output.
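Steps 2 to 5 can be traced end to end on a small power map. The following is a serial Python sketch under stated assumptions, not the GPU kernel: the function names are hypothetical, the threshold factor uses the standard cell-averaging expression for a given false alarm probability (assumed here, since the formula itself is not reproduced in the text), and cells too close to the border to hold a full 7×7 window are skipped. The two nested loops stand in for the 128×512 grid of thread blocks.

```python
def integral_image(p):
    """Prefix sum matrix with one extra zero row and column:
    s[i + 1][j + 1] is the sum of p[0..i][0..j]."""
    rows, cols = len(p), len(p[0])
    s = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            s[i + 1][j + 1] = p[i][j] + s[i][j + 1] + s[i + 1][j] - s[i][j]
    return s

def box_sum(s, r0, c0, r1, c1):
    """Sum of p[r0..r1][c0..c1] inclusive: four lookups, three add/subtracts."""
    return s[r1 + 1][c1 + 1] - s[r0][c1 + 1] - s[r1 + 1][c0] + s[r0][c0]

def ca_cfar(p, pfa=1e-3, guard=1, ref=2):
    """Cell-averaging CFAR over a power map p; returns detected (row, col) cells."""
    half = guard + ref                                     # 3, i.e. a 7x7 window
    n_ref = (2 * half + 1) ** 2 - (2 * guard + 1) ** 2     # 49 - 9 = 40
    alpha = n_ref * (pfa ** (-1.0 / n_ref) - 1.0)          # assumed CA-CFAR factor
    s = integral_image(p)
    hits = []
    for i in range(half, len(p) - half):                   # border cells skipped
        for j in range(half, len(p[0]) - half):
            outer = box_sum(s, i - half, j - half, i + half, j + half)
            inner = box_sum(s, i - guard, j - guard, i + guard, j + guard)
            z = (outer - inner) / n_ref                    # clutter estimate
            if p[i][j] > alpha * z:                        # adaptive threshold test
                hits.append((i, j))
    return hits

# Hypothetical 16x16 map: uniform noise power 1 with one strong cell.
power = [[1.0] * 16 for _ in range(16)]
power[8][8] = 100.0
detections = ca_cfar(power)
```

Subtracting the inner 3×3 box from the outer 7×7 box leaves exactly the 40 reference cells, so each cell under test costs two constant-time window queries regardless of window size.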
[0121] As shown in Figure 7, the principle of constant false alarm rate (CFAR) detection is as follows: M guard cells are set on each side of the cell under test, and N reference cells are set on each side beyond them. A weighted average is calculated over the N reference cells on each side. Depending on the detection type, the clutter estimate Z is obtained by averaging the results of the two sides, or by selecting their minimum or maximum. The decision threshold is obtained by multiplying the clutter estimate Z by the threshold factor. If the cell under test exceeds the threshold, it is declared a target; otherwise it is not. As this principle shows, a detection threshold must be determined for every cell under test, and the reference data of each cell must be summed. Moreover, the reference windows of adjacent pixels largely overlap, so summing them sequentially means re-adding the overlapping area for every window, which takes a very long time. The threshold calculation is therefore completed by exploiting the massive parallelism of the GPU in combination with a parallel reduction summation algorithm.
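The three ways of forming the clutter estimate described above can be written compactly. This is a sketch in assumed notation: the symbols $x_i^{\text{lead}}$, $x_i^{\text{lag}}$, $\alpha$, $X$ and $T$ are introduced here for illustration and are not taken from the source, and equal weights are assumed in the averages.

```latex
\begin{aligned}
Z_{\text{lead}} &= \frac{1}{N}\sum_{i=1}^{N} x_i^{\text{lead}}, \qquad
Z_{\text{lag}} = \frac{1}{N}\sum_{i=1}^{N} x_i^{\text{lag}} \\
Z_{\text{CA}} &= \tfrac{1}{2}\left(Z_{\text{lead}} + Z_{\text{lag}}\right)
  && \text{(cell averaging)} \\
Z_{\text{GO}} &= \max\left(Z_{\text{lead}}, Z_{\text{lag}}\right)
  && \text{(greatest of)} \\
Z_{\text{SO}} &= \min\left(Z_{\text{lead}}, Z_{\text{lag}}\right)
  && \text{(smallest of)} \\
T &= \alpha Z, \qquad \text{target declared if } X > T
\end{aligned}
```

Here $X$ is the power of the cell under test and $\alpha$ is the threshold factor chosen for the desired false alarm probability.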
[0122] As shown in Figure 8, taking single-channel data as an example, assume there are 128 Doppler channels, each with 512 range cells. With a rectangular window, the one-cell ring around the cell under test serves as guard cells and the two-cell ring outside the guard cells serves as reference cells, giving 8 guard cells and 40 reference cells in total. The GPU threading strategy creates 128×512 thread blocks in the thread grid and 7×7 threads in each thread block. During parallel processing, a large task is decomposed into several smaller subtasks that run concurrently, and their results are merged layer by layer to obtain the final sum. Specifically:
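The layer-by-layer merging can be modeled in a few lines. The sketch below is a serial Python model of a pairwise (tree) reduction, assuming each pass of the loop corresponds to one parallel step in which all additions happen simultaneously; the function name and the sample values are hypothetical.

```python
def tree_reduce(values):
    """Pairwise reduction. Each pass models one parallel step in which
    element i is added to element i + half. Returns (sum, number of passes)."""
    data = list(values)
    passes = 0
    while len(data) > 1:
        if len(data) % 2:          # pad odd lengths with the additive identity
            data.append(0)
        half = len(data) // 2
        data = [data[i] + data[i + half] for i in range(half)]
        passes += 1
    return data[0], passes

# 49 values standing in for the 7x7 threads of one thread block.
window = list(range(1, 50))
total, passes = tree_reduce(window)
```

For the 49 values of one 7×7 block the sum completes in 6 passes, whereas a serial loop needs 48 dependent additions.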
[0123] Construct the prefix sum matrix:
[0124] Once the prefix sum matrix is obtained, the sum over any rectangular region of the matrix is given by the following formula:
[0125]
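For reference, the standard form of a prefix sum matrix (integral image) and its rectangular-region query is sketched below. The symbols are assumptions introduced for illustration: $P$ is the original power matrix, $S$ the prefix sum matrix, $(x_1, y_1)$ and $(x_2, y_2)$ the top-left and bottom-right corners of the region, and $S$ is taken as zero whenever an index is negative.

```latex
\begin{aligned}
S(x, y) &= \sum_{i=0}^{x} \sum_{j=0}^{y} P(i, j) \\
        &= P(x, y) + S(x-1, y) + S(x, y-1) - S(x-1, y-1) \\[4pt]
W &= S(x_2, y_2) - S(x_1 - 1, y_2) - S(x_2, y_1 - 1) + S(x_1 - 1, y_1 - 1)
\end{aligned}
```

The query for $W$ reads four entries of $S$ and combines them with three additions or subtractions, independent of the size of the region.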
[0126] As shown in Figures 9 and 10, taking the data of a single thread block as an example, the computational cost of serial accumulation and of parallel reduction on the prefix sum matrix is compared. Taking the gray area of the matrix data in Figure 10 as an example, serial accumulation requires 9 addition operations. With parallel reduction, the corner coordinates of the gray area are substituted into the region-sum formula of the prefix sum matrix;
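The comparison can be reproduced on a small matrix. The sketch below uses a hypothetical 5×5 matrix and a 3×3 region standing in for the gray area of Figure 10 (the actual values in the figure are not available), and counts the additions performed by serial accumulation. Function names are illustrative.

```python
def prefix_sums(m):
    """Prefix sum matrix padded with a zero row and column."""
    rows, cols = len(m), len(m[0])
    s = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            s[i + 1][j + 1] = m[i][j] + s[i][j + 1] + s[i + 1][j] - s[i][j]
    return s

def serial_sum(m, r0, c0, r1, c1):
    """Accumulate every element of the region; returns (sum, additions)."""
    total, adds = 0, 0
    for i in range(r0, r1 + 1):
        for j in range(c0, c1 + 1):
            total += m[i][j]
            adds += 1
    return total, adds

def lookup_sum(s, r0, c0, r1, c1):
    """Same region sum from the prefix sums: 4 lookups, 3 add/subtracts."""
    return s[r1 + 1][c1 + 1] - s[r0][c1 + 1] - s[r1 + 1][c0] + s[r0][c0]

# Hypothetical data: values 0..24 laid out row by row.
m = [[r * 5 + c for c in range(5)] for r in range(5)]
s = prefix_sums(m)

serial_total, serial_adds = serial_sum(m, 1, 1, 3, 3)   # 3x3 region
fast_total = lookup_sum(s, 1, 1, 3, 3)
```

Both routes give the same sum; the serial route performs 9 accumulations for the 3×3 region, while the lookup route always performs 3 additions or subtractions.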
[0127] With the parallel computing approach, completing the sum requires only 4 lookups and 3 additions/subtractions. Consider radar data with a data matrix of size N×N and a CFAR reference window of size R×R, where N is the side length of the radar data matrix (e.g., the range-Doppler data matrix) and R is the side length of the reference window in CFAR detection (the window composed of reference cells). In the traditional accumulation method, the reference windows of adjacent pixels mostly overlap, yet the entire window must be traversed for each pixel, so the computational complexity is $O(N^2 R^2)$. Here O(·) denotes the order of magnitude: it is not the exact number of operations, but reflects the trend of the algorithm's running time as the data scale grows. In the first stage of the parallel approach, only one row-by-row and column-by-column cumulative scan of the entire image is required, with complexity $O(N^2)$. In the second stage, computing the reference-window sum for each pixel no longer needs a loop; only four memory reads and three additions/subtractions are required, so the total cost is $O(N^2)$. By building the integral image once up front at relatively low cost, the massive, repetitive rectangular-region summation requests that follow are all turned into a few fixed-cost table lookups, additions, and subtractions. The computation saved grows in proportion to the window area $R^2$, so when the number of cells and the window size are large, the speedup can reach hundreds or thousands of times.
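The cost comparison can be summarized as follows. This is a rough operation-count sketch, not a measured result: $c_1$ and $c_2$ are assumed small constants (about three additions or subtractions per cell to build the prefix sum matrix, and a few more per window query).

```latex
\begin{aligned}
C_{\text{direct}} &\approx N^2 \cdot R^2 \\
C_{\text{integral}} &\approx \underbrace{c_1 N^2}_{\text{build prefix sums}}
                     + \underbrace{c_2 N^2}_{\text{window queries}} \\
\frac{C_{\text{direct}}}{C_{\text{integral}}} &\approx \frac{R^2}{c_1 + c_2}
\end{aligned}
```

The ratio does not depend on $N$: the gain comes entirely from the window area, so small windows benefit modestly and large windows benefit most.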
[0128] The above embodiments are merely preferred embodiments of the present invention. Therefore, all equivalent changes or modifications made to the structure, features and principles described in the claims of the present invention are included within the scope of the present invention.
Claims
1. A real-time radar signal processing system based on high-speed acquisition and heterogeneous parallel processing, characterized in that it comprises: a high-speed data acquisition and parsing module, comprising a 10 Gigabit fiber optic network card and a WinPcap parsing unit, wherein the 10 Gigabit fiber optic network card is connected to the radar digital transceiver board via a QSFP interface and is used to acquire raw IQ data from the multi-channel radar; the WinPcap parsing unit bypasses the operating system protocol stack and interacts directly with the driver of the 10 Gigabit fiber optic network card, receives the raw, unprocessed binary data stream, performs data fragment capture, data fragment filtering, and data fragment sorting, and outputs complete azimuth frame data; and a CPU and GPU heterogeneous parallel processing module, connected via a PCIe bus, with which the high-speed data acquisition and parsing module communicates; the CPU module is responsible for system control, task scheduling, and complex logical tasks, while the GPU module is responsible for computationally intensive parallel data operations; the GPU module includes a pulse compression parallel module and a constant false alarm rate (CFAR) detection parallel module; the pulse compression parallel module employs a channel data segmentation and rearrangement strategy: after zero-padding and horizontal rearrangement of the multi-channel data and reference signals, it performs FFT, complex point-wise multiplication, and IFFT operations in parallel across multiple streams and outputs the pulse compression result;
the CFAR detection parallel module adopts an integral image acceleration algorithm based on parallel reduction to construct a prefix sum matrix, obtains the reference window sum through constant-time queries, dynamically calculates the detection threshold, and completes the target decision. In operation, the system first acquires high-speed multi-channel raw radar IQ data through the 10 Gigabit fiber optic interface and parses and reconstructs the data using WinPcap. The processing flow then enters the core multi-level parallel computing stage: a channel data segmentation and rearrangement strategy is adopted for pulse compression, and an integral image acceleration algorithm based on parallel reduction is introduced for CFAR detection, transforming massive window summation into constant-time queries, thereby realizing the entire radar echo signal processing chain from acquisition and parsing to pulse compression and CFAR detection.
2. The real-time radar signal processing system based on high-speed acquisition and heterogeneous parallel processing according to claim 1, characterized in that the WinPcap parsing unit works as follows: first, it captures data fragments by interacting directly with the network card driver interface through the WinPcap core driver, copying all data packets passing through the 10 Gigabit fiber optic network card into a buffer; then, it filters the data fragments based on the Ethernet frame header, IP header, and TCP header to obtain the valid information; finally, it sorts the data fragments by data offset and parses out the complete azimuth frame based on the IQ frame header and frame trailer.
3. The real-time radar signal processing system based on high-speed acquisition and heterogeneous parallel processing according to claim 1, characterized in that the pulse compression parallel module operates as follows: first, CPU memory is allocated to store the radar's multi-channel data and reference signal data, the multi-channel data including sum channel, azimuth difference channel, and elevation difference channel data; next, GPU memory is allocated through the cudaMalloc() function, and the multi-channel data and reference signal data are transferred to the GPU via the PCIe bus through the cudaMemcpy() function; each beam's data in each channel is zero-padded to a power-of-two length; then, the data of each channel and the reference signal data are rearranged horizontally in the range dimension; multiple streams are created through the cudaStreamCreate() function, and the cufftExecZ2Z() function is called to perform FFT operations; multiple thread blocks are launched in the x and y directions of the thread grid, and complex point-wise multiplication is performed on the FFT results; the cufftExecZ2Z() function is called again to perform IFFT operations; then, the cudaStreamSynchronize() function is used to complete stream synchronization, and the pulse compression results of each channel are output; finally, the GPU memory is released through the cudaFree() function.
4. The real-time radar signal processing system based on high-speed acquisition and heterogeneous parallel processing according to claim 3, characterized in that the FFT operation process is as follows: frequency-domain pulse compression performs FFT operations on the echo signal and the reference signal separately: $S(k)=\sum_{n=0}^{N-1}s(n)\,e^{-j2\pi kn/N}$, $H(k)=\sum_{n=0}^{N-1}h(n)\,e^{-j2\pi kn/N}$ (1); where $S(k)$ is the FFT result of the echo signal $s(n)$, $H(k)$ is the FFT result of the reference signal $h(n)$, $n$ is the discrete time index, $N$ is the signal length, and $k$ is the frequency index; next, the two are multiplied in the frequency domain and then subjected to IFFT to obtain the pulse compression result, calculated as follows: $y(n)=\mathrm{IFFT}\left[S(k)\odot H(k)\right]$ (2); where $y(n)$ is the pulse compression result and $\odot$ denotes point-wise multiplication of complex numbers.
5. The real-time radar signal processing system based on high-speed acquisition and heterogeneous parallel processing according to claim 1, characterized in that the CPU and GPU heterogeneous parallel processing module also includes a moving target indication (MTI) unit and a moving target detection (MTD) unit; the MTI unit receives the pulse compression result and performs clutter suppression; the MTD unit performs Doppler filtering on the clutter-suppressed signal and outputs the result to the CFAR detection parallel module.
6. The real-time radar signal processing system based on high-speed acquisition and heterogeneous parallel processing according to claim 5, characterized in that the constant false alarm rate (CFAR) detection parallel module works as follows: first, it calculates the clutter estimate: $Z=f(x_1,x_2,\ldots,x_N)$ (3); where $Z$ is the clutter estimate, $f(\cdot)$ is the statistical estimation function, $\{x_1,\ldots,x_N\}$ is the set of reference cell power values, and $N$ is the total number of reference cells; when the cell-averaging CFAR algorithm is used, the clutter estimate satisfies the formula: $Z_{CA}=\frac{1}{N}\sum_{i=1}^{N}x_i$ (4); where $Z_{CA}$ is the cell-averaging clutter estimate; the threshold factor satisfies the formula: $\alpha=N\left(P_{fa}^{-1/N}-1\right)$ (5); where $\alpha$ is the threshold factor and $P_{fa}$ is the given false alarm probability; the adaptive threshold satisfies the formula: $T=\alpha Z$ (6); where $T$ is the adaptive threshold; the binary decision satisfies the formula: $D=1$ if $X>T$, otherwise $D=0$ (7); where $D$ is the decision result and $X$ is the power value of the cell under test.
7. The real-time radar signal processing system based on high-speed acquisition and heterogeneous parallel processing according to claim 1, characterized in that the prefix sum matrix is constructed as follows: $S(x,y)=\sum_{i\le x}\sum_{j\le y}P(i,j)$ (8); where $S(x,y)$ is the value of the prefix sum matrix at coordinates $(x,y)$, $P(i,j)$ is the value of the original data matrix at coordinates $(i,j)$, and $x$, $y$, $i$, $j$ are all matrix coordinate indices; the sum over any rectangular reference window satisfies the formula: $W=S(x_2,y_2)-S(x_1-1,y_2)-S(x_2,y_1-1)+S(x_1-1,y_1-1)$ (9); where $W$ is the reference window summation result, $(x_1,y_1)$ are the coordinates of the top-left corner of the reference window, and $(x_2,y_2)$ are the coordinates of its bottom-right corner.
8. The real-time radar signal processing system based on high-speed acquisition and heterogeneous parallel processing according to claim 1, characterized in that, when the CFAR detection parallel module is working, 128×512 thread blocks are launched in the thread grid of the GPU module and 7×7 threads are launched in each thread block; 8 guard cells are set around the cell under test, and 40 reference cells are set around the guard cells.