Laser sweep frequency ranging method and system based on FPGA parallel acceleration and NUFFT local refinement
By integrating signal normalization and NUFFT local refinement algorithms within the FPGA, the complexity of frequency domain calculations and cross-device delay issues in FMCW laser ranging systems are resolved, achieving high-precision, low-latency, and low-cost real-time ranging, suitable for compact ranging modules and embedded systems.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HARBIN INSTITUTE OF TECHNOLOGY SUZHOU RESEARCH INSTITUTE
- Filing Date
- 2026-03-05
- Publication Date
- 2026-06-16
AI Technical Summary
Existing FMCW laser ranging systems that must deliver both high precision and real-time performance face complex frequency domain calculations, low peak refinement efficiency, large cross-device communication latency, high system cost, and insufficient integration. Efficient processing is particularly difficult in high-frame-rate, high-data-volume scenarios.
The invention adopts an FPGA-accelerated laser frequency-sweep ranging method with NUFFT local refinement. Signal normalization, NUFFT calculation, and the local inverse Fourier transform refinement algorithm are all integrated inside the FPGA, so the spectrum is refined locally without cross-device data transmission, which reduces hardware resource consumption and power consumption.
It achieves real-time high-precision spectrum processing with low latency, reduces system cost and physical size, is suitable for compact ranging modules and embedded systems, and improves system integration and processing efficiency.
Figure CN122218652A_ABST
Description
Technical Field
[0001] This invention belongs to the field of lidar signal processing technology, and specifically relates to a NUFFT local refinement laser frequency-sweep ranging method and system based on FPGA parallel acceleration.
Background Technology
[0002] Frequency-modulated continuous wave (FMCW) laser ranging has become an important ranging method in high-end manufacturing fields such as 3D measurement, equipment inspection, and aerospace, owing to its long measurement range, high accuracy, and lack of special requirements for target reflection. The technology is based on the principle of frequency-modulated laser beat interferometry: by analyzing the frequency content of the measurement interference signal and the reference signal, the peak of the beat frequency spectrum is obtained and the target distance is then calculated. The accuracy and real-time performance of the ranging system are therefore directly determined by the accuracy and processing speed of the beat signal's spectrum calculation.
[0003] In typical FMCW laser ranging systems, frequency domain analysis of the beat frequency signal is a crucial step. Traditional methods mostly rely on the FFT algorithm to obtain the spectrum, but the resolution of the FFT is limited by the number of sampling points and often fails to meet peak refinement requirements in high-speed frequency sweeping or high-precision ranging scenarios. To improve frequency estimation accuracy, researchers in China and abroad have proposed various spectrum refinement methods, including the Chirp-Z Transform (CZT), windowed interpolation, the Non-Uniform Discrete Fourier Transform (NUDFT), and the Non-Uniform Fast Fourier Transform (NUFFT). Among these, the NUFFT family of algorithms is widely used in high-precision ranging and imaging because it can directly process non-uniformly sampled signals with accuracy close to that of the NUDFT. However, the NUFFT and NUDFT algorithms are computationally intensive and in practical engineering usually rely on CPUs or GPUs. For high-frame-rate, high-data-volume FMCW LiDAR, the limited sequential processing capability of CPUs cannot meet the real-time processing requirements of high-frequency beat signals. GPUs offer strong parallel processing capability, but they must exchange data with front-end FPGAs or high-speed ADCs, and frequent cross-device data transmission causes significant latency that becomes a major bottleneck for ranging efficiency. For example, in high-frame-rate ranging scenarios the ranging link must frequently move data between the FPGA and the GPU, so the overall processing time is dominated by communication latency and the system cannot sustain continuous high-speed output. Furthermore, GPU solutions are power-hungry, bulky, and costly, with limited system integration, which makes them difficult to deploy in compact ranging modules, automotive LiDAR, or embedded measurement systems.
Therefore, achieving a low-power, highly integrated, and low-latency real-time processing architecture while ensuring high accuracy has become a key requirement for the development of FMCW laser ranging technology.
[0004] Regarding spectrum refinement, while CZT can improve resolution by stretching the frequency domain range, its computational complexity is high. When it needs to be called continuously in a large number of ranging loops, it will significantly increase the system load, especially under high sampling rates or high data point counts, making it difficult to meet real-time requirements. In addition, CZT has limited frequency domain flexibility, making it difficult to quickly construct a suitable local high-resolution frequency domain region based on the actual peak position. While the NUFFT+CZT algorithm has higher theoretical accuracy, its traditional implementation requires a large number of complex exponential operations and interpolation operations, which places high demands on hardware resources, making direct implementation on FPGAs difficult.
[0005] Current FMCW ranging systems generally face problems such as complex frequency domain calculations, low peak refinement efficiency, large cross-device communication delays, high system costs, and insufficient integration. The industry urgently needs a ranging method that can complete high-precision spectrum processing, peak search, and local refinement within an FPGA to reduce cross-device communication and achieve true end-to-end real-time operation.
Summary of the Invention
[0006] The purpose of this invention is to address the aforementioned problems by providing a NUFFT local refinement laser frequency-sweep ranging method and system based on FPGA parallel acceleration. The method can refine the local frequency domain as needed, further improve distance estimation accuracy, and adapt to rapid spectrum changes under high-speed frequency sweeping. It is of great significance for improving the real-time performance of FMCW laser ranging systems, simplifying system architecture, and reducing costs.
[0007] The technical solution of this invention is:
[0008] The NUFFT-based local refinement laser sweeping ranging method with FPGA parallel acceleration includes the following steps:
[0009] S1. The swept laser generates a measurement interference signal and a reference interference signal through the measurement interferometer and the reference interferometer, respectively;
[0010] S2. The measurement interference signal and the reference interference signal undergo photoelectric conversion and are then input to the FPGA for processing;
[0011] S3. Inside the FPGA, perform the following operations:
[0012] S3-1. Perform phase calculation and normalization on the reference interference signal to obtain a non-uniform time-domain signal that reflects the sweep frequency nonlinearity of the laser.
[0013] S3-2. Based on the non-uniform time-domain signal, the spectrum of the measured interference signal is calculated using the non-uniform fast Fourier transform (NUFFT) algorithm to compensate for the spectral broadening caused by the nonlinearity of laser sweep frequency.
[0014] S3-3. Search for peaks in the calculated NUFFT spectrum, and based on the searched peak index, apply the Local Inverse Fourier Transform Refinement (I-ZFFT) algorithm to refine the spectrum in the local frequency range to extract the peak frequency with high precision.
[0015] S3-4. Based on the mapping relationship between the peak frequency and the target distance, calculate the final target distance value.
[0016] Preferably, the step of spectrum calculation in S3-2 specifically includes:
[0017] The normalized non-uniform time-domain signal is interpolated by gridding using a Gaussian kernel function, which maps the non-uniform sampling points onto uniform grid points.
[0018] A parallel FFT structure based on Cooley-Tukey decomposition is used to Fourier transform the large-scale uniform sequence obtained after gridding. The parallel FFT structure breaks the large-scale FFT calculation down into multiple small-scale FFTs that FPGA IP cores can process in parallel, and splices the resulting sub-spectra together using rotation (twiddle) factors to obtain the complete NUFFT spectrum.
[0019] Preferably, the parallel FFT structure based on Cooley-Tukey decomposition specifically refers to:
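[0020] After gridding, the sequence of length $N$ is split by parity index into two subsequences of length $N/2$;
[0021] The two subsequences are respectively input into two FFT IP cores for parallel transformation to obtain two sub-spectra $X_{\mathrm{even}}(k)$ and $X_{\mathrm{odd}}(k)$;
[0022] Using the pre-stored rotation factor $W_N^k$, the two sub-spectra are weighted and concatenated to obtain the complete spectrum $X(k)$, where:
[0023] $X(k) = X_{\mathrm{even}}(k) + W_N^k X_{\mathrm{odd}}(k)$ and $X(k + N/2) = X_{\mathrm{even}}(k) - W_N^k X_{\mathrm{odd}}(k)$, with $W_N^k = e^{-j 2\pi k / N}$ and $k = 0, 1, \dots, N/2 - 1$.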
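The odd-even recombination described in this claim can be checked numerically. The following is a minimal Python sketch, not the patented hardware: the function names `dft` and `split_fft` are hypothetical, and a direct O(n^2) DFT stands in for each fixed-size FFT IP core. It only illustrates that two half-length transforms plus rotation factors reproduce the full-length spectrum.

```python
import cmath

def dft(x):
    """Direct O(n^2) DFT, standing in for one fixed-size FFT IP core."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def split_fft(x):
    """Length-n spectrum built from two length-n/2 transforms (radix-2 split)."""
    n = len(x)
    assert n % 2 == 0, "sequence length must be even"
    half = n // 2
    even_spec = dft(x[0::2])  # first core: even-indexed samples
    odd_spec = dft(x[1::2])   # second core: odd-indexed samples
    spectrum = [0j] * n
    for k in range(half):
        w = cmath.exp(-2j * cmath.pi * k / n)  # rotation (twiddle) factor W_n^k
        spectrum[k] = even_spec[k] + w * odd_spec[k]
        spectrum[k + half] = even_spec[k] - w * odd_spec[k]
    return spectrum

# Arbitrary test sequence; compare against a single full-length transform.
samples = [complex((3 * t) % 7, t % 5) for t in range(16)]
max_error = max(abs(a - b) for a, b in zip(split_fft(samples), dft(samples)))
```

In this sketch `max_error` stays at floating-point rounding level, which is the numerical counterpart of the claim that the split structure is mathematically equivalent to one large FFT.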
[0024] Preferably, the Local Inverse Fourier Transform Refinement (I-ZFFT) algorithm described in S3-3 specifically includes:
[0025] Centered on the peak index idx found in the initial search, a local refined frequency interval of length L is selected;
[0026] Perform an inverse discrete Fourier transform (IDFT) of length L on the local refined frequency interval to obtain the local time-domain sequence;
[0027] The local time-domain sequence is padded with zeros to extend the sequence length to Z, resulting in a zero-padded sequence.
[0028] The zero-padded sequence is subjected to a Fast Fourier Transform (FFT) of length Z to obtain a refined high-resolution spectrum.
[0029] Search for the precise peak position in the refined high-resolution spectrum, and calculate the accurate peak frequency index idx_z according to the formula [formula missing].
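The I-ZFFT steps listed in this claim (select L bins, IDFT of length L, zero-pad to Z, FFT of length Z, peak search) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the function names, the test tone, the values N = 128, L = 16, Z = 256, and in particular the index mapping `start + p * L / Z`, since the patent's own formula for idx_z is not reproduced in this text. Direct DFT sums replace the FFT cores for brevity.

```python
import cmath

def dft(x, n_out=None, sign=-1):
    """Direct DFT of x on n_out bins; n_out > len(x) acts as zero-padding."""
    n_out = n_out or len(x)
    return [sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n_out)
                for t, v in enumerate(x))
            for k in range(n_out)]

def izfft_refine(spectrum, idx, L, Z):
    """Refine a coarse peak index by IDFT, zero-padding, and a longer FFT."""
    start = idx - L // 2                        # window of L bins centered on idx
    local = spectrum[start:start + L]
    local_time = [v / L for v in dft(local, sign=+1)]  # IDFT of length L
    refined = dft(local_time, n_out=Z)          # zero-padded transform, length Z
    p = max(range(Z), key=lambda k: abs(refined[k]))
    return start + p * L / Z                    # assumed mapping to a fractional bin

# Hypothetical test tone whose true frequency lies between bins 20 and 21.
N, TRUE_BIN = 128, 20.3
tone = [cmath.exp(2j * cmath.pi * TRUE_BIN * t / N) for t in range(N)]
coarse_spectrum = dft(tone)
coarse_idx = max(range(N), key=lambda k: abs(coarse_spectrum[k]))
refined_idx = izfft_refine(coarse_spectrum, coarse_idx, L=16, Z=256)
```

With these hypothetical values the refined grid spacing is L / Z = 1/16 of a coarse bin, so the refined estimate lands much closer to the true fractional bin than the integer index found by the coarse search.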
[0030] Preferably, the internal processing flow of the FPGA is implemented through highly parallel hardware logic, specifically including the following hardware modules that are executed sequentially or in parallel:
[0031] The normalization module is used to align, filter, perform Hilbert transform, phase calculation, and normalize the input measurement interference signal and reference interference signal.
[0032] The gridding module receives the non-uniform time-domain signal and measurement interference signal output by the normalization module, and converts them into data on uniform grid points through Gaussian kernel interpolation in parallel computation.
[0033] The FFT module is used to perform parallel FFT calculations based on Cooley-Tukey decomposition on the uniform grid data output by the gridding module to obtain the NUFFT spectrum.
[0034] The peak finding and refinement module is used to search for peaks in the spectrum output by the FFT module and call the I-ZFFT algorithm logic to refine the local spectrum near the peak, and finally output a high-precision peak frequency.
[0035] Preferably, the gridding module adopts a pipelined and cross-clock domain design, specifically including:
[0036] Parallel computing units are used to calculate the distance between each non-uniform sampling point and its neighboring uniform grid points in parallel, and to calculate the corresponding Gaussian kernel weights in parallel.
[0037] The weight accumulation buffer adopts a register structure with an output area, a buffer area and an accumulation area, and is used to pipeline the accumulation of weights of neighboring non-uniform points within the overlapping Gaussian kernel range.
[0038] An asynchronous FIFO array is used to receive and buffer the interpolated data output from the weight accumulation buffer, and output the interpolated uniform data to the FFT module in two consecutive paths according to the parity index at a frequency higher than the input clock.
[0039] Preferably, the target distance R_m is solved from the refined peak frequency index idx_z using the following formula:
[0040] [formula missing];
[0041] where B is the optical frequency difference of the auxiliary interferometer within the sampling range of N points, [symbol missing] is the arm length difference of the auxiliary interferometer, and R_m is measured from the optical fiber output end face of the measuring interferometer.
[0042] A laser frequency sweeping ranging system is provided to implement the above-mentioned NUFFT-based locally refined laser frequency sweeping ranging method with parallel acceleration based on FPGA. The system includes:
[0043] Frequency sweep light source;
[0044] A measuring interferometer, whose optical path is connected to the sweep frequency light source, is used to generate a measuring interferometric signal containing target distance information;
[0045] A reference interferometer, whose optical path is connected to the sweep frequency light source, is used to generate a reference interferometric signal that reflects the instantaneous frequency of the laser sweep frequency;
[0046] A photoelectric conversion module is used to convert the measured interference signal and the reference interference signal into electrical signals;
[0047] The FPGA processing module, whose signal input terminal is connected to the photoelectric conversion module, is configured to execute all digital signal processing steps of the above method within a single chip and output the target distance value.
[0048] Preferably, the frequency-swept light source is any one of a DFB semiconductor laser, an external cavity laser (ECL), a Fourier domain mode-locked (FDML) laser, or a vertical-cavity surface-emitting laser (VCSEL).
[0049] Preferably, the FPGA processing module completes the calculation of the target distance within a single frequency sweep cycle through its internally integrated parallel hardware logic, with a processing latency of less than 1 millisecond, and does not require data exchange with the GPU or CPU outside the FPGA.
[0050] The NUFFT-based local refinement laser sweeping ranging method and system based on FPGA parallel acceleration provided by this invention has the following significant advantages compared with the prior art:
[0051] 1. This invention integrates a complete ranging algorithm, including signal normalization, NUFFT calculation, peak search, and local refinement, into a single FPGA chip. Through highly parallel and pipelined hardware logic design, frequent cross-device data transfer between the FPGA and GPU or CPU is avoided, fundamentally eliminating the communication latency bottleneck.
[0052] 2. The Local Inverse Fourier Transform Refinement (I-ZFFT) algorithm proposed in this invention performs local refinement processing on the NUFFT spectral peaks. This algorithm has a regular structure, consisting only of IFFT, zero-padding, and FFT operations. It has low computational complexity and low hardware resource consumption, making it very suitable for pipelined and parallel optimization on FPGAs.
[0053] 3. This invention integrates the entire high-precision signal processing chain onto an FPGA, eliminating the reliance on high-power, bulky GPUs and resulting in a more compact system structure and higher integration. This effectively reduces hardware costs, system power consumption, and physical size, making this method easier to deploy and apply in compact ranging modules that are sensitive to power consumption, size, and cost, such as automotive LiDAR and embedded measurement systems, thus promoting the practical application and commercialization of FMCW LiDAR technology.
[0054] 4. This invention adopts a parallel FFT structure that combines odd-even sequence splitting based on Cooley-Tukey decomposition with a dual-channel FFT, addressing the point-count limit of FPGA FFT IP cores. Through rotation-factor weighted concatenation, the structure is mathematically equivalent to a single large-scale FFT calculation, yet in hardware it requires only two small-scale FFT IP cores. This overcomes the IP core size limitation, significantly improves computational parallelism and throughput, and reduces logic resource consumption, providing a feasible engineering solution for real-time large-scale NUFFT processing on FPGAs.
Attached Figure Description
[0055] The present invention will be further described below with reference to the accompanying drawings and embodiments:
[0056] Figure 1 is a schematic diagram of the laser frequency-sweep ranging system;
[0057] Figure 2 is a flowchart of the I-ZFFT algorithm;
[0058] Figure 3 is a timing diagram of the I-ZFFT algorithm;
[0059] Figure 4 is a comparison of the ranging accuracy of different algorithms;
[0060] Figure 5 is a comparison of the ranging time of different algorithms.
Detailed Implementation
[0061] This invention proposes a NUFFT local refinement laser frequency-sweep ranging system based on FPGA parallel acceleration. The structure of the laser frequency-sweep ranging system is shown in Figure 1. It includes a frequency-swept light source, five optical couplers, a measurement interferometer, an auxiliary interferometer, two photodetectors, a data acquisition system, and an FPGA core board.
[0062] The measuring interferometer consists of a second optical coupler 2, a circulator, a focusing system, a third optical coupler 3, and a first photodetector 1. The auxiliary interferometer consists of a fourth optical coupler 4, a fifth optical coupler 5, and a second photodetector 2. Each interferometer has one long-arm optical fiber and one short-arm optical fiber. In this embodiment, the frequency-swept light source is a DFB semiconductor laser with a working wavelength of 1550 nm and a bandwidth of approximately 1.5 nm. Its output optical frequency f follows a triangular waveform with a modulation period of 1 ms, i.e., one up sweep and one down sweep of 0.5 ms each. This invention can also use frequency-swept light sources with other center wavelengths, bandwidths, and periods, including but not limited to external cavity lasers (ECL), Fourier domain mode-locked (FDML) lasers, and vertical-cavity surface-emitting lasers (VCSEL).
[0063] The frequency-swept light source outputs a laser signal that is split by the first optical coupler 1 (99:1 ratio). 99% of the light enters the measurement interferometer and is split again by the second optical coupler 2 (99:1 ratio). Of this, 99% enters port 1 of the circulator, exits through port 2, and is focused onto the target surface by the focusing system. The light reflected from the target re-enters through port 2 and exits through port 3, and then interferes at the third optical coupler 3 (50:50 ratio) with the 1% reference light from the second optical coupler 2. The first photodetector 1 and the data acquisition system then perform photoelectric conversion and signal acquisition to obtain the measurement interferometer signal. Similarly, the remaining 1% of the light from the first optical coupler 1 enters the auxiliary interferometer, where it passes through the fourth optical coupler 4 (50:50 ratio) and the fifth optical coupler 5 (50:50 ratio) to produce an interferometric beat signal. The second photodetector 2 and the data acquisition system then perform photoelectric conversion and signal acquisition to obtain the auxiliary interferometer signal.
[0064] This invention relates to a NUFFT-based locally refined laser frequency sweep ranging method with parallel acceleration using FPGA, comprising the following steps.
[0065] As a frequency-swept laser source, the output optical signal of DFB can be expressed as:
[0066] (1)
[0067] where [symbol missing] denotes the amplitude of the output optical signal and [symbol missing] denotes the real-time phase of the output optical signal of the frequency-swept laser source. The frequency of the optical signal can be further expressed as:
[0068] (2)
[0069] The interference signals of the DFB swept-frequency laser source, through an auxiliary interferometer and a measuring interferometer, are expressed as follows:
[0070] (3)
[0071] (4)
[0072] where A1 and A2 are the amplitudes of the auxiliary interferometer and measuring interferometer signals, respectively; [symbols missing] are the phases of the auxiliary interferometer and the measuring interferometer, respectively; and τa and τm are the group delays of the signal after passing through the short and long arms of the auxiliary and measuring interferometers, respectively. The Taylor expansions of the two phases can be expressed as:
[0073] (5)
[0074] (6)
[0075] In essence, target distance calculation in a laser frequency-sweep interferometric ranging system is a cross-correlation between the beat frequency signals of the measuring interferometer and the auxiliary interferometer. The delay τa of the auxiliary interferometer is therefore calibrated first, the ratio of the phase information of the auxiliary and measuring interferometers is then obtained through Fourier transform, and the measured distance follows from that ratio. If the laser frequency is modulated linearly in time, the interference signal is a standard cosine in time whose frequency can be determined directly by Fourier transform. In practice, however, it is difficult to ensure that the frequency modulation of the laser source is perfectly linear, so a high-precision target distance extraction algorithm is needed to compensate for the tuning nonlinearity of the laser.
[0076] The spectral broadening caused by laser nonlinearity can be treated as a non-uniform sampling problem. Applying the Fast Fourier Transform (FFT) directly is insufficient to estimate the target peak position; a Non-Uniform Fast Fourier Transform (NUFFT) algorithm is therefore needed to obtain a spectrum in which the broadening is compensated, at a computational complexity of N log N. The beat frequency signal of the measuring interferometer is sampled at N discrete points, and the sampled sequence is expressed with the Dirac function as:
[0077] (7)
[0078] Because the phase of the auxiliary interferometer beat frequency signal at the same instant reflects the real-time frequency modulation rate seen by the measuring interferometer, the auxiliary interferometer beat frequency signal is first subjected to phase extraction and normalization. The resulting auxiliary interferometer phase serves as the non-uniform time-domain signal in the NUFFT and is represented discretely as:
[0079] (8)
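The idea of treating the normalized auxiliary phase as the non-uniform sampling coordinate can be illustrated with a short Python sketch. It is only a toy model under stated assumptions: the quadratic sweep nonlinearity `EPS`, the beat order `KAPPA`, and the helper `spectrum` are all hypothetical, and a direct non-uniform DFT sum is used in place of the NUFFT so that the effect is visible without gridding.

```python
import cmath
import math

N = 256
EPS = 0.3     # hypothetical strength of the sweep nonlinearity
KAPPA = 30    # hypothetical beat order: measurement phase = 2*pi*KAPPA*x

t = [n / N for n in range(N)]                    # uniform sampling instants
x = [(u + EPS * u * u) / (1 + EPS) for u in t]   # normalized auxiliary phase
y = [math.cos(2 * math.pi * KAPPA * xj) for xj in x]  # measurement beat signal

def spectrum(values, positions, bins):
    """Magnitude of sum_j values[j] * exp(-2j*pi*k*positions[j]) for each bin k."""
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * p)
                    for v, p in zip(values, positions)))
            for k in range(bins)]

uniform = spectrum(y, t, 64)       # ordinary DFT in time: peak is smeared
nonuniform = spectrum(y, x, 64)    # DFT on the phase coordinate: peak is sharp
nudft_peak_bin = max(range(64), key=nonuniform.__getitem__)
```

In this toy model the transform evaluated on the phase coordinate concentrates the energy in a single bin at `KAPPA`, while the ordinary DFT in time spreads it over the range of instantaneous beat frequencies, which is the broadening the NUFFT step is meant to remove.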
[0080] A Gaussian interpolation kernel function is applied to the normalized non-uniform auxiliary interferometer phase signal. Uniform gridding is achieved by convolutional interpolation, which generates [number missing] uniform points:
[0081] (9)
[0082] S3-2. Based on the non-uniform time-domain signal, the spectrum of the measured interference signal is calculated using the non-uniform fast Fourier transform (NUFFT) algorithm to compensate for the spectral broadening caused by the nonlinearity of laser sweep frequency.
[0083] The normalized non-uniform auxiliary interferometer phase signal is gridded, using convolutional interpolation to generate uniform grid points. The number of uniform grid points is the product of the interpolation factor R and the number of non-uniform points N; [symbol missing] is the grid spacing, and [symbol missing] indexes the points on the uniform grid. For each point on the non-uniform grid, the index of the nearest uniform point is then found, and within the specified Gaussian kernel radius weights are assigned according to the distance between the uniform and non-uniform points. Equation (9) can then be expressed as a standard FFT discretization:
[0084] (10)
[0085] Then, performing a Fourier transform on each of the two terms [symbols missing] yields:
[0086] (11)
[0087] where the Gaussian kernel coefficient is determined by the sampling factor R and the Gaussian kernel radius:
[0088] (12)
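The Gaussian gridding step described above can be sketched in a few lines of Python. This is an illustrative spreading routine only: the function name `gaussian_grid`, the parameters `ratio`, `radius`, and `tau`, and the kernel form exp(-d^2 / (4 tau)) are assumptions, since Equations (9) to (12) are not reproduced in this text. The later kernel-compensation step of a full NUFFT is omitted.

```python
import math

def gaussian_grid(x, y, ratio, radius, tau):
    """Spread N non-uniform samples (x in [0, 1)) onto ratio * N uniform points."""
    m = ratio * len(x)
    grid = [0.0] * m
    for xj, yj in zip(x, y):
        nearest = round(xj * m)               # index of the closest uniform point
        for i in range(nearest - radius, nearest + radius + 1):
            if 0 <= i < m:                    # taps outside the grid contribute 0
                d = xj - i / m                # distance to the uniform point
                grid[i] += yj * math.exp(-d * d / (4 * tau))  # accumulate weight
    return grid

# Hypothetical demo: four non-uniform points, oversampling ratio 4, radius 2.
demo_x = [0.10, 0.37, 0.52, 0.90]
demo_y = [1.0, -0.5, 0.8, 0.3]
demo_grid = gaussian_grid(demo_x, demo_y, ratio=4, radius=2, tau=0.002)
```

Each uniform point accumulates contributions from every non-uniform point whose kernel overlaps it, which is the same overlap that the weight accumulation buffer handles in the hardware description.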
[0089] It is worth noting that the above interpolation process essentially oversamples the N non-uniform points by a factor of R. However, the FFT IP core in the FPGA hardware system cannot provide an FFT of arbitrary length. To address this issue, this invention uses the Cooley-Tukey FFT decomposition formula to break the original FFT down into several sub-FFT operations. Weighting the sub-FFT outputs with rotation factors reconstructs a spectrum that is completely consistent with that of a single-core complete FFT. In this embodiment, the FPGA chip selected is the K7325T, with N = 32768 non-uniform points and an interpolation factor of R = 4. After gridding, the sequence contains R × N = 131072 uniform points, but the Xilinx FFT IP core supports FFTs of at most 65536 points. Therefore, this invention adopts a structure of parity sequence splitting with a dual-channel parallel FFT: the even and odd subsequences are input into two FFT IP cores for transformation, and the two spectrum streams are then weighted by rotation factors and concatenated for reconstruction. Mathematically, this structure is equivalent to a single 131,072-point FFT calculation, but in hardware it can be implemented by calling only two 65,536-point FFT IP cores. This satisfies the IP core point limit, significantly improves the parallelism and throughput of spectrum calculation, reduces logic resource consumption, and is better suited to large-scale real-time NUFFT processing on an FPGA.
[0090] First, the original interpolated sequence is split into two odd-even subsequences according to the odd-even index:
[0091] (13)
[0092] According to the Cooley-Tukey FFT decomposition principle, the FFT is divided into two paths, even and odd, and the complete spectrum is synthesized using a rotation factor:
[0093] (14)
[0094] The target peak index idx is found by peak search in the NUFFT spectrum. Traditional NUDFT and CZT refinement are time-consuming and computationally intensive, which significantly strains the resources and real-time performance of an FPGA real-time data processing system. The centroid method is faster, but its accuracy is poor at low signal-to-noise ratio and when the target peak is broadened, which significantly affects the distance calculation. Therefore, this invention proposes a Local Inverse Fourier Transform Refinement (I-ZFFT) algorithm, which selects a local refinement frequency range centered on the target peak index idx, where L is the width of the local refinement interval. An inverse discrete Fourier transform of length L is applied to the selected local spectrum, yielding the local time-domain sequence:
[0095] (15)
[0096] Let the ZFFT length be Z; the ZFFT refinement factor is then [missing information]. The local time-domain signal is zero-padded:
[0097] (16)
[0098] Perform an FFT of length Z on the zero-padded time-domain sequence:
[0099] (17)
[0100] The refined spectral resolution is:
[0101] (18)
[0102] Peaks were then searched in the refined spectrum of I-ZFFT. The target peak index obtained by I-ZFFT is:
[0103] (19)
[0104] In FMCW laser ranging, distance and frequency satisfy the following:
[0105] (20)
[0106] where B is the optical frequency difference of the auxiliary interferometer within the sampling range of N points, [symbol missing] is the arm length difference of the auxiliary interferometer, and the measured distance is referenced to the optical fiber output end face of the measuring interferometer.
[0107] The complexity of the full-frequency-domain NUDFT algorithm is [missing information]. The NUFFT + CZT local refinement algorithm has a time complexity of O(n), but CZT additionally involves exponential operations, chirp sequence generation, and large-scale complex multiplication and convolution, which makes its hardware implementation extremely complex. The I-ZFFT refinement algorithm proposed in this invention has a complexity of [missing information]. It consists only of IFFT, zero-padding, and FFT operations, has a regular structure and low hardware resource consumption, is easy to pipeline and parallelize, and is better suited to an FPGA real-time processing architecture.
[0108] As shown in Figure 2, the invention is implemented in the FPGA as four sub-modules: the normalization module, the gridding module, the FFT module, and the peak finding module (including centroid peak finding and I-ZFFT peak finding). All sub-modules are developed in the Verilog hardware description language.
[0109] 1. Normalization module.
[0110] First, the normalization submodule acquires the beat frequency signals da and db from the measurement interferometer and the auxiliary interferometer, respectively. The db signal is passed through a low-pass FIR filter to remove most high-frequency noise, yielding db_lowpass_real. Taking db_lowpass_real as the real part, a Hilbert transform FIR filter produces the imaginary part db_lowpass_imag. The real and imaginary parts are then input into an arctangent CORDIC IP core to obtain the sawtooth phase db_phase with a period of 2π. Unwrapping db_phase yields the auxiliary interferometer phase db_unwrap, which increases over time. Because of the laser's nonlinearity, however, db_unwrap is a non-uniformly increasing sequence. To facilitate the subsequent gridding algorithm, db_unwrap is normalized using a divider to obtain x_in. At the same time, the measurement interferometer signal da is shifted so that it is aligned in time with x_in, giving y_in.
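The arctangent, unwrapping, and normalization stages of this module can be modeled in software as follows. This is a floating-point Python sketch, not the Verilog design: the function names are hypothetical, the FIR low-pass and Hilbert filters are assumed to have already produced the real and imaginary parts, and dividing by the total phase span is an assumed normalization, since the text only states that a divider is used.

```python
import math

def unwrap(wrapped):
    """Remove 2*pi jumps from a sawtooth phase whose true steps are below pi."""
    out = [wrapped[0]]
    offset = 0.0
    for prev, cur in zip(wrapped, wrapped[1:]):
        step = cur - prev
        if step > math.pi:
            offset -= 2 * math.pi
        elif step < -math.pi:
            offset += 2 * math.pi
        out.append(cur + offset)
    return out

def normalized_phase(real, imag):
    """atan2 phase (the CORDIC stage), unwrapping, then scaling to [0, 1]."""
    wrapped = [math.atan2(q, i) for i, q in zip(real, imag)]
    phase = unwrap(wrapped)
    span = phase[-1] - phase[0]   # assumed normalization: divide by total span
    return [(p - phase[0]) / span for p in phase]

# Hypothetical auxiliary phase with a slight quadratic (nonlinear) term.
true_phase = [0.4 * n + 0.002 * n * n for n in range(200)]
x_in = normalized_phase([math.cos(p) for p in true_phase],
                        [math.sin(p) for p in true_phase])
```

The resulting `x_in` is a monotonically increasing, non-uniformly spaced sequence, which is the form the gridding stage expects as its sampling coordinate.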
[0111] Specifically, in this embodiment, the sampling frequency of the data acquisition system is 100 MHz and the FMCW waveform period is 1 ms, i.e., a 0.5 ms up sweep followed by a 0.5 ms down sweep. As shown in Figure 3, the data acquisition system collects 50,000 points in one up or down sweep. To limit the edge effects of the frequency-swept light source and minimize nonlinearity, points 10001 to 42768 of each up or down sweep are taken as the input data of the normalization program.
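The sample counts in this paragraph can be cross-checked with a few lines of Python. The variable names are illustrative; the numbers are the ones stated in the embodiment.

```python
SAMPLE_RATE_HZ = 100e6    # data acquisition sampling frequency
SWEEP_TIME_S = 0.5e-3     # duration of one up or down sweep

samples_per_sweep = round(SAMPLE_RATE_HZ * SWEEP_TIME_S)

first_point, last_point = 10001, 42768     # 1-based, inclusive window
window_length = last_point - first_point + 1
is_power_of_two = window_length & (window_length - 1) == 0

discarded_head = first_point - 1                  # points dropped at the sweep start
discarded_tail = samples_per_sweep - last_point   # points dropped at the sweep end
```

The selected window contains 32768 = 2^15 points, which matches the N = 32768 non-uniform points used in the gridding and FFT stages of this embodiment.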
[0112] 2. Gridding module.
[0113] The gridding submodule takes x_in and y_in as input data. To interface with the subsequent odd / even FFT module, this module performs a cross-clock domain conversion from the original clock to twice the clock frequency. In this embodiment, x_in and y_in are input at 100 MHz, and even_data and odd_data are output in parallel at 200 MHz. First, based on the non-uniform increment of x_in, the distances between each non-uniform point and its neighboring uniform points are calculated in parallel as distance[0: ] ([number missing] parallel distance values in this embodiment). The parallel distance values are fed to parallel CORDIC IP cores that calculate the Gaussian kernel exponents exp_coe[0: ], yielding the parallel weights weight[0: ]. When the range pointed to by the Gaussian kernel radius satisfies [condition missing] or [condition missing], the value at the corresponding index in weight is treated as invalid and set to 0. Subsequently, the weights of different non-uniform points x_in that overlap within the Gaussian kernel neighborhood are accumulated in a register array weight_buf[0: ]. From the most significant bit to the least significant bit, weight_buf is divided into an output area, a buffer area, and an accumulation area. The output area holds weights whose accumulation is complete and which can be output to the out_buf output buffer shift register. Together, the output, buffer, and accumulation areas constitute the weight_buf pipeline structure, which facilitates handling special cases such as a single negative phase and consecutive negative phases. In this embodiment, [missing information].
[0114] Meanwhile, the transfer from weight to weight_buf also involves a cross-clock-domain conversion, which in this embodiment is from 100MHz to 200MHz. One non-uniform point weight in the 100MHz clock domain therefore corresponds to two clock cycles in the 200MHz clock domain. ① In the first clock cycle, the weights weight[0: …] are added to the values in the weight_buf accumulation area, which accumulates the weights of adjacent points whose Gaussian kernel ranges overlap. Then, based on the normalized phase interval shift_cnt between the current clock cycle and the next, all entries of weight_buf are shifted toward the high-order end by shift_cnt positions, and the vacated low-order positions are set to 0. ② In the second clock cycle, if weights have overflowed into the weight_buf output area, no further overlapping weights from adjacent non-uniform points within the Gaussian kernel range remain to be accumulated for them, so the weights in the output area can be output to the out_buf output buffer shift register. out_buf is a circular shift register of depth …: entries from weight_buf are written into out_buf in address order, and when the number of weights to be written in a given clock cycle runs past the depth of out_buf, writing continues from address 0. At the same time as the weight_buf output area is written to out_buf, it is cleared to 0 in preparation for the next shift and output.
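The accumulate-then-shift behaviour described above can be modelled in software roughly as follows. This is a simplified sketch, not the register-level design: it assumes the normalized phase x is non-decreasing and expressed in grid units, it merges the output and buffer areas into a plain Python list, and the names stream_grid, radius, and tau are hypothetical.

```python
import numpy as np

def stream_grid(x, y, radius=2, tau=1.0):
    """Streaming Gaussian gridding; returns (first grid index, finished grid values)."""
    width = 2 * radius + 1
    centres = [int(round(v)) for v in x]        # nearest grid point of each sample
    weight_buf = np.zeros(width)                # sliding accumulation window
    out = []                                    # finished values (plays the role of out_buf)
    for i, (xi, yi) in enumerate(zip(x, y)):
        k = centres[i] - radius + np.arange(width)
        # First step: add this point's kernel weights to the window.
        weight_buf += yi * np.exp(-((xi - k) ** 2) / (4.0 * tau))
        # shift_cnt: grid steps to the next point; flush the whole window after the last one.
        shift_cnt = centres[i + 1] - centres[i] if i + 1 < len(x) else width
        # Second step: entries leaving the window receive no more contributions.
        for _ in range(shift_cnt):
            out.append(weight_buf[0])
            weight_buf = np.roll(weight_buf, -1)
            weight_buf[-1] = 0.0                # vacated position is cleared
    return centres[0] - radius, np.array(out)

start, grid = stream_grid([3.2, 3.9, 5.1, 9.7], [1.0, 2.0, 0.5, 1.5])
print(start, len(grid))   # 1 12
```

The point of the model is that, because the phase only moves forward, any grid position that has slid out of the kernel window is final and can be emitted without waiting for the rest of the sweep. Negative phase steps, which the description mentions as special cases, are not handled here.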
[0115] out_buf has … parallel channels and is connected to … parallel FIFOs. Whenever a parallel channel of out_buf is updated to a non-zero value different from its previous value, FIFO_wr is pulled high and the interpolated data is written into the FIFO at the corresponding index. At the same time, FIFO_rd is pulled high for two indices in every 200MHz clock cycle, with the indices incrementing sequentially from 0 to … -1; when the next clock cycle would exceed the maximum FIFO index … -1, output continues from index 0. In this embodiment, …. When the FIFO at the next output index is detected to be empty, the system waits until that FIFO is no longer empty before outputting. In this structure, the gridded interpolated data consists of non-contiguous valid frames, which resolves the conflict between the non-uniformly increasing phase caused by frequency sweep nonlinearity and the continuous output of interpolated data across clock domains. Two interpolation points, even_data and odd_data, are output in parallel in the 200MHz clock domain, representing the even-indexed and odd-indexed interpolation points, respectively. This achieves uniform gridding from N non-uniform points to 4N uniform interpolation points.
[0116] 3. FFT module.
[0117] The gridded odd-even interpolation data, even_data and odd_data, are fed to the FFT IP cores, which perform two 2N-point FFTs in parallel, one on the even-indexed points and one on the odd-indexed points. The odd-even concatenation in equation (14) is then performed using rotation (twiddle) factors pre-stored in ROM to obtain the real part FFT_real and the imaginary part FFT_imag of the complete FFT frequency domain. The sum of squares of FFT_real and FFT_imag is then calculated to obtain the spectrum data of the complete FFT. In particular, for high-speed data processing, the odd-even FFT IP cores are set to output in reverse order, and the FFT spectrum is matched to its indices for subsequent peak finding using the reversed addresses pre-stored in ROM.
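The odd-even split and recombination can be checked numerically with the standard radix-2 Cooley-Tukey identity, as in the Python sketch below. It is assumed here that equation (14) corresponds to this textbook form; the function name odd_even_fft is hypothetical and np.fft.fft stands in for the FFT IP cores.

```python
import numpy as np

def odd_even_fft(grid):
    """Full FFT of an even-length sequence built from two half-length FFTs."""
    n = len(grid)
    even_data, odd_data = grid[0::2], grid[1::2]     # split by parity index
    fft_even = np.fft.fft(even_data)                 # first half-length FFT
    fft_odd = np.fft.fft(odd_data)                   # second half-length FFT
    k = np.arange(n // 2)
    twiddle = np.exp(-2j * np.pi * k / n)            # rotation factors (kept in ROM in hardware)
    full = np.concatenate([fft_even + twiddle * fft_odd,
                           fft_even - twiddle * fft_odd])
    power = full.real ** 2 + full.imag ** 2          # sum of squares of real and imaginary parts
    return full, power

rng = np.random.default_rng(0)
g = rng.standard_normal(64) + 1j * rng.standard_normal(64)
full, power = odd_even_fft(g)
print(np.allclose(full, np.fft.fft(g)))   # True
```

The sketch returns results in natural order for readability; the reverse-order output and ROM address lookup described in the text are an implementation detail it does not model.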
[0118] 4. Peak finding module.
[0119] To compare the centroid method with the I-ZFFT algorithm, Figures 2 and 3 provide the flowcharts and timing diagrams of the two algorithms. For peak finding with the centroid method, the user first defines a spectrum search window, which is obtained by pre-calibrating the target distance. The FFT spectrum, output in reverse order, is matched to its indices using the reversed addresses pre-stored in ROM. Since the FFT spectrum consists of positive and negative frequencies that are equivalent for peak finding, only the positive frequencies are taken, and the first half of the spectrum, abs, is stored in BRAM. Each output sum of squares abs is compared with the value retained from the previous clock cycle, and the larger value and its index are placed in the peak_abs and peak_idx registers, respectively. Once both FFT outputs are complete, the peak finding module has also found the index peak_idx of the maximum value. The centroid range is defined as the 2M points around the peak index; in this embodiment, M=32. The points around the peak are read out of BRAM, and the centroid divisor and dividend, sum_divisor and sum_dividend, are accumulated. After the 2M points have been output, a divider yields the average index of the centroid method, from which the target distance is calculated.
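The centroid calculation described above amounts to an index-weighted average over the 2M bins around the coarse peak. A minimal Python sketch follows, under stated assumptions: the function name centroid_peak, the lo/hi search-window arguments, and the choice of the window peak_idx - M to peak_idx + M - 1 are illustrative, since the text only says "2M points around the peak index".

```python
import numpy as np

def centroid_peak(power, m=32, lo=0, hi=None):
    """Coarse peak search in [lo, hi) followed by a centroid over 2*m bins."""
    hi = len(power) if hi is None else hi
    peak_idx = lo + int(np.argmax(power[lo:hi]))            # running-maximum search
    k = np.arange(max(peak_idx - m, 0), min(peak_idx + m, len(power)))
    sum_dividend = float(np.sum(k * power[k]))              # index-weighted accumulation
    sum_divisor = float(np.sum(power[k]))                   # plain accumulation
    return peak_idx, sum_dividend / sum_divisor             # the divider step

# Synthetic spectrum: a smooth peak centred between bins 200 and 201.
bins = np.arange(1024)
power = np.exp(-((bins - 200.4) ** 2) / (2 * 2.0 ** 2))
peak_idx, centroid = centroid_peak(power)
print(peak_idx, round(centroid, 3))   # 200 200.4
```

Because every bin in the window contributes to the average, clutter or noise inside the window pulls the centroid away from the true peak, which is consistent with the bias reported for this method at poor signal-to-noise ratios.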
[0120] Unlike the centroid method, the I-ZFFT algorithm requires the real and imaginary parts of the FFT output to be stored in BRAM. Likewise using reverse indexing, the 2M points near the peak index are input into the IFFT IP core to restore the time-domain signal ifft_out. To increase the spectral resolution, zeros are appended to the end of the 2M ifft_out points; the ratio of the total number of points after zero padding to 2M is the refinement factor. ifft_out is then passed through another FFT in the ZFFT IP core, whose output is also set to reverse order for high-speed pipelined processing. Peak finding on the ZFFT output gives the ZFFT peak index zfft_peak_idx, and the I-ZFFT target distance is calculated according to equations (17) and (18).
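The I-ZFFT steps (local IFFT, zero padding, second FFT, fine peak search) can be sketched in Python as below. This is an illustration under assumptions rather than the patented implementation: the function name izfft_refine and the refinement factor of 16 are hypothetical, and the final index mapping (each ZFFT bin spans 1/refine of a coarse bin, starting at peak_idx - M) is assumed rather than taken from equations (17) and (18), which are not reproduced here.

```python
import numpy as np

def izfft_refine(spectrum, peak_idx, m=32, refine=16):
    """Refine a coarse peak index using the 2*m complex bins around it."""
    length = 2 * m                                    # local interval length (2M)
    start = peak_idx - m                              # assumes m <= peak_idx <= len(spectrum) - m
    local = spectrum[start:start + length]            # complex bins, not magnitudes
    ifft_out = np.fft.ifft(local)                     # local time-domain sequence
    z = length * refine                               # length after zero padding
    zfft = np.fft.fft(ifft_out, n=z)                  # n=z appends zeros before the FFT
    zfft_peak_idx = int(np.argmax(zfft.real ** 2 + zfft.imag ** 2))
    return start + zfft_peak_idx / refine             # assumed mapping back to coarse-bin units

# Test tone placed at a fractional bin position (100.3) of a 1024-point FFT.
n = 1024
tone = np.exp(2j * np.pi * 100.3 * np.arange(n) / n)
spectrum = np.fft.fft(tone)
coarse = int(np.argmax(np.abs(spectrum[: n // 2])))
print(coarse, izfft_refine(spectrum, coarse))
```

Only 2M points pass through the inverse transform, so the extra resolution costs one short IFFT and one FFT of length 2M times the refinement factor, rather than a zero-padded transform of the full record.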
[0121] This embodiment uses 20 sets of ranging results under different signal-to-noise ratio conditions to compare four algorithms (① NUDFT; ② NUFFT + centroid method; ③ NUFFT + CZT; ④ NUFFT + I-ZFFT) in terms of ranging accuracy and real-time performance on a CPU under MATLAB. As shown in Figures 4 and 5, when the signal-to-noise ratio is poor and spectral clutter is high, the NUFFT + centroid method cannot accurately calculate the target distance and exhibits significant bias, whereas the combined algorithms that refine the NUFFT spectrum with CZT or I-ZFFT reproduce the NUDFT results more closely. Regarding real-time performance, NUFFT has lower complexity than NUDFT, which has the longest computation time at approximately 0.65 seconds. NUFFT + CZT, because of its complex chirp and exponential function operations, takes the second longest at approximately 0.1 seconds. NUFFT + I-ZFFT and the NUFFT + centroid method have the shortest computation times, less than 1 ms. NUFFT + I-ZFFT thus significantly improves the efficiency of target distance calculation while maintaining accuracy comparable to the NUDFT and NUFFT + CZT algorithms.
[0122] In the comparative experiment of the embodiment, the control group used an FPGA+GPU architecture, while the experimental group used an FPGA parallel architecture. Although the GPU has strong parallel processing capabilities, it needs to exchange data with the front-end FPGA or high-speed ADC. Frequent cross-device data transmission will cause significant latency. The comparison of its real-time performance with that of the present invention is as follows:
[0123] FPGA+GPU architecture: the NUFFT + centroid method takes 1 to 3 ms to calculate each distance value;
[0124] FPGA parallel architecture: NUFFT+I-ZFFT algorithm (this invention) takes 0.5ms for each distance value.
[0125] The FPGA-parallel accelerated NUFFT local refinement laser sweep frequency interferometry system proposed in this invention achieves high-speed target distance calculation within 0.5ms, which has high engineering value and practical significance.
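The latency figures above can be put side by side with a few lines of arithmetic. The Python sketch below uses only numbers stated in this description (0.5 ms per sweep, 0.5 ms per distance on the FPGA, 1 to 3 ms on FPGA+GPU); the variable names are illustrative.

```python
# Latency comparison in microseconds (names are illustrative assumptions).
sweep_us = 500                    # one up or down sweep lasts 0.5 ms
fpga_us = 500                     # FPGA parallel architecture, NUFFT + I-ZFFT
gpu_us = (1000, 3000)             # FPGA+GPU architecture, NUFFT + centroid method

rate_hz = 1_000_000 / fpga_us                     # distance values per second
speedup = tuple(t / fpga_us for t in gpu_us)      # relative to the FPGA+GPU range
keeps_up = fpga_us <= sweep_us                    # one result per sweep is sustainable

print(rate_hz, speedup, keeps_up)   # 2000.0 (2.0, 6.0) True
```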
[0126] The above embodiments are only for illustrating the technical concept and features of the present invention, and are intended to enable those skilled in the art to understand the content of the present invention and implement it accordingly. They should not be construed as limiting the scope of protection of the present invention. All modifications made according to the spirit and essence of the main technical solution of the present invention should be covered within the scope of protection of the present invention.
Claims
1. A NUFFT-based locally refined laser sweeping ranging method with parallel acceleration using FPGA, characterized in that it includes the following steps: S1. The swept laser generates a measurement interference signal and a reference interference signal through a measurement interferometer and a reference interferometer; S2. The measurement interference signal and the reference interference signal undergo photoelectric conversion and are then input to the FPGA for processing; S3. Inside the FPGA, the following operations are performed: S3-1. Phase calculation and normalization are performed on the reference interference signal to obtain a non-uniform time-domain signal that reflects the sweep frequency nonlinearity of the laser; S3-2. Based on the non-uniform time-domain signal, the spectrum of the measurement interference signal is calculated using the non-uniform fast Fourier transform (NUFFT) algorithm to compensate for the spectral broadening caused by the nonlinearity of the laser frequency sweep; S3-3. Peaks are searched for in the calculated NUFFT spectrum and, based on the peak index found, the local inverse Fourier transform refinement (I-ZFFT) algorithm is applied to refine the spectrum in the local frequency range and extract the peak frequency with high precision; S3-4. The final target distance value is calculated based on the mapping relationship between the peak frequency and the target distance.
2. The method according to claim 1, characterized in that the specific steps for spectrum calculation in S3-2 include: the normalized non-uniform time-domain signal is interpolated by gridding with a Gaussian kernel function, which maps the non-uniform sampling points onto uniform grid points; a parallel FFT structure based on Cooley-Tukey decomposition is used to perform the Fourier transform on the large-scale uniform sequence obtained after gridding; the parallel FFT structure breaks the large-scale FFT calculation down into multiple small-scale FFT calculations that can be processed in parallel by FPGA IP cores, and splices the sub-spectra using rotation factors to obtain the complete NUFFT spectrum.
3. The method according to claim 2, characterized in that the parallel FFT structure based on Cooley-Tukey decomposition is specifically as follows: after gridding, the sequence of length … is split by parity index into two subsequences of length … / 2; the two subsequences are respectively input into two FFT IP cores for parallel transformation to obtain two sub-spectra … and …; using the pre-stored rotation factor W_N, the two sub-spectra are weighted and concatenated to obtain the complete spectrum X(k), where: ….
4. The method according to claim 3, characterized in that the local inverse Fourier transform refinement (I-ZFFT) algorithm described in S3-3 specifically includes: centered on the peak index idx found in the initial search, a local refinement frequency interval of length L is selected; an inverse discrete Fourier transform (IDFT) of length L is performed on the local refinement frequency interval to obtain a local time-domain sequence; the local time-domain sequence is padded with zeros to extend its length to Z, giving a zero-padded sequence; the zero-padded sequence is subjected to a fast Fourier transform (FFT) of length Z to obtain a refined high-resolution spectrum; the precise peak position … is searched for in the refined high-resolution spectrum, and the accurate peak frequency index idx_z is calculated according to the formula ….
5. The method according to claim 1, characterized in that the internal processing flow of the FPGA is implemented through highly parallel hardware logic, specifically including the following hardware modules, which are executed sequentially or in parallel: a normalization module, used to perform alignment, filtering, Hilbert transform, phase calculation, and normalization on the input measurement interference signal and reference interference signal; a gridding module, which receives the non-uniform time-domain signal and the measurement interference signal output by the normalization module and converts them, through Gaussian kernel interpolation computed in parallel, into data on uniform grid points; an FFT module, used to perform parallel FFT calculations based on Cooley-Tukey decomposition on the uniform grid data output by the gridding module to obtain the NUFFT spectrum; and a peak finding and refinement module, used to search for peaks in the spectrum output by the FFT module, invoke the I-ZFFT algorithm logic to refine the local spectrum near the peak, and finally output a high-precision peak frequency.
6. The method according to claim 5, characterized in that the gridding module adopts a pipelined, cross-clock-domain design, specifically including: parallel computing units, used to calculate in parallel the distances between each non-uniform sampling point and its neighboring uniform grid points and to calculate the corresponding Gaussian kernel weights in parallel; a weight accumulation buffer, which adopts a register structure with an output area, a buffer area, and an accumulation area and is used for pipelined accumulation of the weights of neighboring non-uniform points within overlapping Gaussian kernel ranges; and an asynchronous FIFO array, used to receive and buffer the interpolated data output from the weight accumulation buffer and to output the interpolated uniform data to the FFT module in two continuous paths, split by parity index, at a frequency higher than the input clock.
7. The method according to claim 1, characterized in that the target distance R_m is solved from the refined peak frequency index idx_z by the following formula: …; where B is the optical frequency difference of the auxiliary interferometer within the sampling range of N points, … is the arm length difference of the auxiliary interferometer, and … is the distance from the optical fiber output end face of the measuring interferometer.
8. A laser scanning ranging system for implementing the method according to any one of claims 1-7, characterized in that the system includes: a frequency sweep light source; a measuring interferometer, whose optical path is connected to the frequency sweep light source and which is used to generate a measurement interference signal containing target distance information; a reference interferometer, whose optical path is connected to the frequency sweep light source and which is used to generate a reference interference signal that reflects the instantaneous frequency of the laser frequency sweep; a photoelectric conversion module, used to convert the measurement interference signal and the reference interference signal into electrical signals; and an FPGA processing module, whose signal input terminal is connected to the photoelectric conversion module and which is configured to execute all digital signal processing steps of the method according to any one of claims 1-7 within a single chip and to output the target distance value.
9. The laser scanning ranging system according to claim 8, characterized in that the frequency sweep light source is any one of a DFB semiconductor laser, an external cavity laser (ECL), a Fourier domain mode-locked laser (FDML), or a vertical-cavity surface-emitting laser (VCSEL).
10. The laser scanning ranging system according to claim 8, characterized in that the FPGA processing module completes the calculation of the target distance within a single frequency sweep cycle through its internally integrated parallel hardware logic, with a processing latency of less than 1 millisecond, and does not require data exchange with a GPU or CPU outside the FPGA.