SPAD three-dimensional imaging method and device based on photon behavior tracking
By initializing multiple parallel neural trackers in SPAD 3D imaging, using a pre-trained dual-branch neural network for dynamic control and residual correction, and applying an attention mask to select the optimal depth position, the method addresses the low prediction accuracy and poor noise robustness of existing techniques and achieves high-precision 3D imaging.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- XIDIAN UNIV
- Filing Date
- 2026-04-13
- Publication Date
- 2026-06-16
Description
Technical Field
[0001] This invention relates to the fields of single-photon avalanche diode (SPAD) 3D imaging and computer vision, and in particular to a SPAD 3D imaging method and device based on photon behavior tracking.
Background Technology
[0002] The single-photon avalanche diode (SPAD) has become a key technology in high-precision 3D depth sensing and imaging thanks to its single-photon detection sensitivity and picosecond-level time resolution. By actively emitting laser pulses and measuring the time of flight (ToF) of photons, SPAD sensors can reconstruct 3D scenes under extreme conditions where traditional sensors fail, such as extremely low illumination and strong background light interference. This makes them valuable for autonomous driving, robot navigation, and augmented reality. However, the high sensitivity of SPADs means that, while capturing signal photons, they also pick up substantial environmental background noise (such as sunlight). Especially in outdoor high-light scenarios, signal photons are often overwhelmed by random background noise. Extracting signal photons from ambient light interference remains a core technical challenge.
[0003] Current mainstream SPAD 3D imaging methods typically employ a histogram accumulation strategy, which builds a statistical histogram from photon timestamps collected over multiple detection cycles and obtains the target distance by detecting the peak positions of the histogram. However, this strategy suffers from significant hardware resource bottlenecks: to ensure depth measurement accuracy, extremely fine time bins are required, so on-chip memory consumption grows dramatically with sensor resolution and detection depth. This storage overhead makes the histogram accumulation strategy difficult to integrate on power- and area-constrained edge hardware. To address these bottlenecks, recent research has shifted towards histogram-free neural network processing. These methods use time-series models such as recurrent neural networks or spiking neural networks to process the raw photon timestamp sequence directly and regress depth values, thus avoiding the storage overhead of histogram construction. However, existing neural network methods typically reduce depth estimation to a single black-box regression task and lack an explicit mechanism for distinguishing signal photons from background noise photons. Under strong background light interference or low signal-to-noise ratio, the model is prone to overfitting short-term noise bursts, causing drastic fluctuations in the depth estimate. As a result, existing techniques have low prediction accuracy and poor noise robustness.
Summary of the Invention
[0004] The purpose of this invention is to provide a SPAD three-dimensional imaging method and device based on photon behavior tracking, which solves the problems of low prediction accuracy and poor noise robustness in the prior art.
[0005] To address the aforementioned technical problems, the embodiments of the present invention provide the following technical solutions:
[0006] The first aspect of this invention provides a SPAD three-dimensional imaging method based on photon behavior tracking, comprising:
[0007] Within a preset detection depth range, multiple parallel neural trackers are initialized, each neural tracker being configured with state parameters including depth position and uncertainty width;
[0008] The photon timestamp sequence collected by the single-photon avalanche diode sensor is obtained, and each photon in the photon timestamp sequence is converted into a corresponding distance observation value according to the time-of-flight principle;
[0009] Based on the current state parameters and current distance observations of each neural tracker, a pre-trained dual-branch neural network is used to predict dynamic control parameters and residual correction parameters, respectively. The dynamic control parameters are used to control the macroscopic motion trend of each neural tracker, and the residual correction parameters are used to make microscopic corrections to each neural tracker to smooth trajectory jitter.
[0010] Based on whether the distance observation falls within the uncertainty width of the current neural tracker, the attention mask is calculated, and the state parameters are updated according to the dynamic control parameters, residual correction parameters and attention mask to obtain the updated state parameters of the current neural tracker. The updated state parameters include the final state parameters.
[0011] After processing the photon timestamp sequence, based on the cumulative confidence of each neural tracker during the update process using the attention mask, the depth position in the final state parameter corresponding to the neural tracker with the highest confidence is selected as the depth estimate for constructing 3D imaging.
[0012] A second aspect of the present invention provides a SPAD three-dimensional imaging device based on photon behavior tracking, comprising:
[0013] An initialization module is used to initialize multiple parallel neural trackers within a preset detection depth range. Each neural tracker is configured with state parameters including depth position and uncertainty width.
[0014] The conversion module is used to acquire the photon timestamp sequence collected by the single-photon avalanche diode sensor, and convert each photon in the photon timestamp sequence into the corresponding distance observation value according to the time-of-flight principle;
[0015] The prediction module is used to predict the dynamic control parameters and residual correction parameters respectively based on the current state parameters and current distance observation values of each neural tracker using a pre-trained dual-branch neural network. The dynamic control parameters are used to control the macroscopic motion trend of each neural tracker, and the residual correction parameters are used to make microscopic corrections to each neural tracker to smooth the trajectory jitter.
[0016] The update module is used to calculate the attention mask based on whether the distance observation falls within the uncertainty width of the current neural tracker, and update the state parameters based on the dynamic control parameters, residual correction parameters and attention mask to obtain the updated state parameters of the current neural tracker. The updated state parameters include the final state parameters.
[0017] The selection module is used to select the depth position in the final state parameters of the neural tracker with the highest confidence as the depth estimate after processing the photon timestamp sequence, based on the cumulative confidence of each neural tracker during the update process using the attention mask, so as to construct a 3D image.
[0018] Compared with existing technologies, the SPAD 3D imaging method and device based on photon behavior tracking provided by this invention initialize multiple parallel neural trackers within a preset detection depth range, each neural tracker being configured with state parameters including depth position and uncertainty width. A photon timestamp sequence collected by a single-photon avalanche diode sensor is acquired and, according to the time-of-flight principle, each photon in the sequence is converted into a corresponding distance observation. Based on the current state parameters and current distance observation of each neural tracker, a pre-trained dual-branch neural network predicts dynamic control parameters and residual correction parameters, respectively. The dynamic control parameters control the macroscopic motion trend of each neural tracker, and the residual correction parameters apply microscopic corrections to each neural tracker to smooth trajectory jitter. An attention mask is calculated according to whether the distance observation falls within the uncertainty width of the current neural tracker, and the state parameters are updated according to the dynamic control parameters, residual correction parameters, and attention mask to obtain the updated state parameters of the current neural tracker. After the photon timestamp sequence has been processed, the depth position in the final state parameters of the neural tracker with the highest cumulative confidence, accumulated from the attention mask during the update process, is selected as the depth estimate for constructing the 3D image. Selecting the highest-confidence tracker from multiple parallel neural trackers avoids a single tracker getting stuck in a local extremum or tracking the wrong target. Furthermore, the pre-trained dual-branch neural network combined with the attention mask (i.e., an attention gating mechanism) can effectively filter out background noise photons, giving strong noise resistance. The residual correction parameters apply micro-corrections to each neural tracker to smooth trajectory jitter, giving the network a self-correction capability and yielding high prediction accuracy.
Attached Figure Description
[0019] The above and other objects, features, and advantages of exemplary embodiments of the present invention will become readily apparent upon reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of the invention are illustrated by way of example and not limitation, with the same or corresponding reference numerals denoting the same or corresponding parts, wherein:
[0020] Figure 1 is a schematic flowchart of a SPAD three-dimensional imaging method based on photon behavior tracking.
[0021] Figure 2 is a schematic flowchart of the parameter prediction process of the pre-trained dual-branch neural network.
[0022] Figure 3 is a schematic diagram of the attention mechanism.
[0023] Figure 4 is a schematic diagram of the motion and contraction principle of a neural tracker.
[0024] Figure 5 is a schematic diagram of the SPAD three-dimensional imaging device based on photon behavior tracking.
Detailed Implementation
[0025] Exemplary embodiments of the invention will now be described in more detail with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided to enable a more thorough understanding of the invention and to fully convey the scope of the invention to those skilled in the art.
[0026] It should be noted that, unless otherwise stated, the technical or scientific terms used in this invention should have the ordinary meaning as understood by one of ordinary skill in the art.
[0027] The methods described in the embodiments of the present invention will be explained in detail below.
[0028] Figure 1 is a schematic flowchart of a SPAD three-dimensional imaging method based on photon behavior tracking in an embodiment of the present invention. As shown in Figure 1, this method can include:
[0029] S101. Within the preset detection depth range, initialize multiple parallel neural trackers.
[0030] Each neural tracker is configured with state parameters including depth position and uncertainty width. The state parameters also include velocity momentum, hidden memory state, and residual feedback.
[0031] Specifically, within the preset detection depth range, $N$ parallel neural trackers are uniformly initialized, with their initial position anchor points (i.e., depth positions) set so as to cover the shallow, medium, and deep depth regions. In this embodiment, the preferred number of parallel neural trackers is $N = 3$.
[0032] Configure initial state parameters for each neural tracker:
[0033] $S_k^0 = \{p_k^0,\ v_k^0,\ w_k^0,\ h_k^0,\ r_k^0\}$;
[0034] where $S_k^0$ is the state parameters of the $k$-th neural tracker at the initial time (i.e., time step 0); $p_k^0$ is the depth position; $v_k^0$ is the velocity momentum, with initial value 0; $w_k^0$ is the uncertainty width, initially set to a wide window of width 0.5 so as to capture signals over a wide range in the initial stage; $h_k^0$ is the hidden memory state used to encode temporal features, with initial value 0; and $r_k^0$ is the residual feedback, which holds the position residual and velocity residual output at the previous time step, with initial value 0.
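A minimal sketch of the initialization in S101 follows. The class name `TrackerState`, the function `init_trackers`, the normalized range [0, 1], the scalar hidden memory state, and the choice of placing anchors at the centers of equal sub-intervals are all assumptions made for illustration; the embodiment only states that the anchors cover the shallow, medium, and deep regions.

```python
from dataclasses import dataclass


@dataclass
class TrackerState:
    depth: float                    # depth position (initial anchor point)
    velocity: float = 0.0           # velocity momentum, initial value 0
    width: float = 0.5              # uncertainty width, initial wide window
    hidden: float = 0.0             # hidden memory state (scalar here, an assumption)
    residual: tuple = (0.0, 0.0)    # (position residual, velocity residual) feedback


def init_trackers(d_min, d_max, n=3):
    """Place n anchors at the centers of n equal sub-intervals (assumed layout)."""
    step = (d_max - d_min) / n
    return [TrackerState(depth=d_min + (i + 0.5) * step) for i in range(n)]


trackers = init_trackers(0.0, 1.0)
```

With three trackers on a normalized range, the anchors land near the shallow, middle, and deep thirds of the range.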
[0035] S102. Obtain the photon timestamp sequence collected by the single-photon avalanche diode sensor, and convert each photon in the photon timestamp sequence into the corresponding distance observation value according to the time-of-flight principle.
[0036] Specifically, the $N$ neural trackers run in parallel. Taking the $k$-th neural tracker as an example, suppose the photon timestamp input at the $t$-th time step is $\tau_t$. Based on the time-of-flight principle, each photon in the photon timestamp sequence is converted into a corresponding distance observation. The distance observation at the $t$-th time step is $z_t = c \cdot \tau_t / 2$, where $c$ is the speed of light. At this time, the neural tracker is at depth position $p_t$ with uncertainty width $w_t$, where $p_t$ is the depth position at the $t$-th time step and $w_t$ is the uncertainty width at the $t$-th time step.
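The time-of-flight conversion in S102 can be sketched as follows, assuming the timestamp is the round-trip time in seconds measured from the laser pulse emission and that no normalization to the detection range is applied. The function name `tof_to_distance` is illustrative.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second (exact SI value)


def tof_to_distance(timestamp_s):
    """Convert a round-trip photon time of flight (seconds) to a one-way distance (metres)."""
    return SPEED_OF_LIGHT * timestamp_s / 2.0


# A photon arriving 100 ns after the pulse corresponds to a target about 15 m away.
distance = tof_to_distance(100e-9)
```

The division by two accounts for the pulse travelling to the target and back.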
[0037] S103. Based on the current state parameters and current distance observations of each neural tracker, use a pre-trained dual-branch neural network to predict the dynamic control parameters and residual correction parameters respectively.
[0038] Among them, the dynamic control parameters are used to control the macroscopic motion trend of each neural tracker, and the residual correction parameters are used to make microscopic corrections to each neural tracker to smooth the trajectory jitter.
[0039] The pre-trained dual-branch neural network includes a parallel dynamics branch and a residual correction branch. The nonlinear mapping function of the first neural network in the dynamics branch and the nonlinear mapping function of the second neural network in the residual correction branch both adopt a multilayer perceptron structure.
[0040] For each photon in the photon timestamp sequence, parameter prediction is performed using a pre-trained dual-branch neural network, consisting of a parallel dynamics branch and a residual correction branch.
[0041] Specifically, Figure 2 is a schematic flowchart of the parameter prediction process of the pre-trained dual-branch neural network. As shown in Figure 2, using the pre-trained dual-branch neural network to predict the dynamic control parameters and residual correction parameters based on the current state parameters and current distance observation of each neural tracker includes:
[0042] Step A1: Input the deviation between the current distance observation and the corresponding depth position, together with the corresponding uncertainty width, into the dynamics branch, which outputs the dynamic control parameters.
[0043] The dynamic control parameters include global driving force, Kalman gain coefficient, and width contraction factor.
[0044] Specifically, the dynamics branch captures the macroscopic motion patterns of photons, relying mainly on the positional deviation between photons and the neural tracker to determine the global evolutionary trend of the system. The dynamics branch takes as input the difference between the distance observation $z_t$ at the $t$-th time step (corresponding to photon timestamp $\tau_t$) and the depth position $p_t$ of the neural tracker at the $t$-th time step, together with the uncertainty width $w_t$ at the $t$-th time step.
[0045] The expression for the dynamics branch is:
[0046] $\theta_t^{dyn} = (F_t,\ K_t,\ s_t) = f_{dyn}(z_t - p_t,\ w_t)$;
[0047] where $\theta_t^{dyn}$ is the dynamic control parameters, $F_t$ is the global driving force at the $t$-th time step, $K_t$ is the Kalman gain coefficient at the $t$-th time step, $s_t$ is the width contraction factor at the $t$-th time step, $f_{dyn}$ is the first neural network nonlinear mapping function, belonging to the dynamics branch, $z_t$ is the distance observation at the $t$-th time step, $p_t$ is the depth position at the $t$-th time step, and $w_t$ is the uncertainty width at the $t$-th time step. The first neural network nonlinear mapping function adopts a multilayer perceptron structure, consisting of a first fully connected layer (Linear), a rectified linear activation function (ReLU), a second fully connected layer (Linear), and a hyperbolic tangent activation function (Tanh) cascaded in sequence. Through this multilayer perceptron structure, the pre-trained dual-branch neural network can fit the complex nonlinear relationship between positional deviation and dynamic parameters.
[0048] The first output vector of the first neural network nonlinear mapping function $f_{dyn}$ contains three independent feature channels: global driving force, Kalman gain coefficient, and width contraction factor. These are separated by slicing and mapped to dynamic control parameters with clear physical meaning.
[0049] The global driving force $F_t$ at the $t$-th time step characterizes the macroscopic thrust exerted on the neural tracker by the input data at the $t$-th time step. The sign of $F_t$ represents the direction of the driving force (i.e., whether it propels the tracker forward or backward), and its absolute value represents the strength of the driving force.
[0050] The Kalman gain coefficient $K_t$ at the $t$-th time step is a Kalman-like dynamic gating weight which, in a physical sense, corresponds to the optimal estimation gain in classical Kalman filtering and measures how much the tracker trusts the input data at the $t$-th time step. $K_t$ is obtained by mapping the output of the pre-trained dual-branch neural network through the Sigmoid function, so its value is strictly constrained to the range $(0, 1)$. A high gain indicates a higher probability that the input data is a signal photon, and the global driving force $F_t$ is used to quickly adjust the velocity of the neural tracker; a low gain indicates a higher probability that the input data is a noise photon, and the system suppresses the effect of the global driving force $F_t$ and prioritizes preserving the inertial momentum of the neural tracker from the previous moment.
[0051] The width contraction factor $s_t$ at the $t$-th time step characterizes the contraction rate of the uncertainty width of the neural tracker. It implements a coarse-to-fine adaptive search strategy: the neural tracker keeps a wide tracking window in the initial stage to avoid losing the target, and quickly shrinks the window after locking onto signal photons to suppress background noise photons.
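The dynamics branch described above can be sketched in plain Python as a Linear, ReLU, Linear, Tanh cascade whose three outputs are sliced into separate channels. This is an illustrative stand-in only: the hidden size of 8, the random untrained weights, and the helper names `make_linear`, `linear`, and `dynamics_branch` are assumptions, and a real implementation would load the pre-trained weights.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Random (untrained) weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def dynamics_branch(z, p, w, layers):
    """Linear -> ReLU -> Linear -> Tanh, then slice the output into three channels."""
    hidden = [max(0.0, a) for a in linear([z - p, w], layers[0])]
    out = [math.tanh(a) for a in linear(hidden, layers[1])]
    force = out[0]                           # signed global driving force
    gain = 1.0 / (1.0 + math.exp(-out[1]))   # Sigmoid keeps the gain inside (0, 1)
    shrink = out[2]                          # width contraction factor
    return force, gain, shrink


rng = random.Random(0)
layers = (make_linear(2, 8, rng), make_linear(8, 3, rng))
force, gain, shrink = dynamics_branch(z=0.42, p=0.50, w=0.5, layers=layers)
```

Because the Sigmoid is applied after the Tanh in this sketch, the gain stays well inside (0, 1); how the trained network scales this channel is not stated in the text.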
[0052] Step A2: Input the normalized relative position between the current distance observation and the corresponding depth position, the corresponding uncertainty width, the hidden memory state, and the residual feedback from the previous time step into the residual correction branch, which outputs the residual correction parameters.
[0053] The residual correction parameters include position residual, velocity residual, and memory update characteristics.
[0054] Specifically, the residual correction branch handles microscopic local distortions and smooths trajectory jitter, fine-tuning mainly on the basis of the relative distribution of photons within the confidence window and historical state information. The residual correction branch takes as input the normalized relative position $\tilde{x}_t$ between the distance observation and the depth position, the uncertainty width $w_t$ at the $t$-th time step, the hidden memory state $h_t$ at the $t$-th time step, and the residual feedback from the previous time step, i.e., the residual feedback $r_t$ held at the $t$-th time step.
[0055] The expression for the residual correction branch is:
[0056] $\theta_t^{res} = (\delta p_t,\ \delta v_t,\ m_t) = f_{res}(\tilde{x}_t,\ w_t,\ h_t,\ r_t)$;
[0057] where $\theta_t^{res}$ is the residual correction parameters, $\delta p_t$ is the position residual at the $t$-th time step, $\delta v_t$ is the velocity residual at the $t$-th time step, $m_t$ is the memory update feature at the $t$-th time step, $f_{res}$ is the second neural network nonlinear mapping function, belonging to the residual correction branch, $\tilde{x}_t$ is the normalized relative position between the distance observation and the depth position, $w_t$ is the uncertainty width at the $t$-th time step, $h_t$ is the hidden memory state at the $t$-th time step, and $r_t$ is the residual feedback at the $t$-th time step. The second neural network nonlinear mapping function also adopts a multilayer perceptron structure, consisting of a first fully connected layer (Linear), a rectified linear activation function (ReLU), a second fully connected layer (Linear), and a hyperbolic tangent activation function (Tanh) cascaded in sequence. Because this multilayer perceptron takes historical feedback as input, it forms a local recurrent feedback loop.
[0058] The second output vector of the second neural network nonlinear mapping function $f_{res}$ contains three independent feature channels: position residual, velocity residual, and memory update feature. These are separated by slicing and mapped to residual correction parameters used for fine-tuning.
[0059] The position residual $\delta p_t$ at the $t$-th time step characterizes the micro-correction applied to the estimate of the depth position $p_t$ at the $t$-th time step. It is used to eliminate local positional deviations caused by non-ideal sensor characteristics or signal distortion.
[0060] The velocity residual $\delta v_t$ at the $t$-th time step characterizes the micro-correction applied to the estimate of the velocity momentum $v_t$ at the $t$-th time step. It compensates for transient velocity changes that the global driving force alone cannot cover. At the same time, the position residual and velocity residual are combined into the updated residual feedback $r_{t+1} = (\delta p_t,\ \delta v_t)$ and passed on to the next moment, where $r_{t+1}$ is the updated residual feedback at the $t$-th time step.
[0061] The memory update feature $m_t$ at the $t$-th time step characterizes the temporal update amount of the hidden memory state. It does not participate directly in the depth position calculation at the current time step; instead, it is used to update the hidden memory state $h_t$ through a moving average strategy. By accumulating historical time-series information, the memory update feature gives the residual correction branch long-range memory, which smooths out trajectory jitter caused by instantaneous noise.
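The local recurrent feedback loop of the residual correction branch can be sketched as below. Several details are assumptions rather than statements from the text: the relative position is normalized by dividing by the uncertainty width, the hidden memory state is a scalar, biases are omitted, the weights are random and untrained, and the names `mlp` and `residual_branch` are illustrative.

```python
import math
import random


def mlp(x, w1, w2):
    """Linear -> ReLU -> Linear -> Tanh (biases omitted for brevity)."""
    hidden = [max(0.0, sum(a * b for a, b in zip(row, x))) for row in w1]
    return [math.tanh(sum(a * b for a, b in zip(row, hidden))) for row in w2]


def residual_branch(z, p, w, h, r, w1, w2):
    x_norm = (z - p) / w                      # assumed normalization by the window width
    dp, dv, m = mlp([x_norm, w, h, r[0], r[1]], w1, w2)
    return dp, dv, m                          # position residual, velocity residual, memory feature


rng = random.Random(1)
w1 = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(8)]
w2 = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(3)]

r = (0.0, 0.0)                                # residual feedback starts at 0
outputs = []
for z in (0.47, 0.52):                        # two made-up distance observations
    dp, dv, m = residual_branch(z, p=0.5, w=0.5, h=0.0, r=r, w1=w1, w2=w2)
    r = (dp, dv)                              # fed back as input at the next time step
    outputs.append((dp, dv, m))
```

The loop shows only the feedback path; in the full method the depth position, width, and hidden memory state would also be updated between photons.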
[0062] S104. Calculate the attention mask based on whether the distance observation falls within the uncertainty width of the current neural tracker, and update the state parameters based on the dynamic control parameters, residual correction parameters, and attention mask to obtain the updated state parameters of the current neural tracker.
[0063] The updated state parameters include the final state parameters. They comprise the updated velocity momentum, updated depth position, updated uncertainty width, updated hidden memory state, and updated residual feedback.
[0064] Specifically, once the dynamic control parameters and residual correction parameters have been obtained from the dynamics branch and the residual correction branch, the state parameter update is performed. The core of step S104 is an attention mechanism that dynamically adjusts the magnitude of the state update according to how well the current input data matches the state of the neural tracker, together with a momentum leakage mechanism that maintains the motion trend of the neural tracker in regions with no signal.
[0065] Figure 3 is a schematic diagram of the attention mechanism. As shown in Figure 3, the attention mask indicates whether an input photon falls within the uncertainty width (i.e., the uncertainty window) of the current neural tracker. To keep the system differentiable, the attention mask is expressed as:
[0066] ;
[0067] where $\alpha_t$ is the attention mask at the $t$-th time step, $\sigma(\cdot)$ is the Sigmoid activation function, $z_t$ is the distance observation at the $t$-th time step, $p_t$ is the depth position at the $t$-th time step, and $w_t$ is the uncertainty width at the $t$-th time step. When the input photon falls within the uncertainty window of the current neural tracker, the attention mask $\alpha_t$ approaches 1, indicating that the input photon is valid and can be used to update the neural tracker state; when the input photon falls outside the uncertainty window, $\alpha_t$ approaches 0, indicating that the input photon is invalid for the current neural tracker and is not used to update its state.
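One way to realize a differentiable window test with a Sigmoid is sketched below. The specific form, a Sigmoid of the signed margin between the uncertainty width and the absolute deviation, and the `sharpness` constant of 50 are assumptions chosen for illustration; the text only states that the mask is a Sigmoid function of the observation, the depth position, and the width.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def attention_mask(z, p, w, sharpness=50.0):
    """Soft indicator of |z - p| < w: near 1 inside the window, near 0 outside."""
    return sigmoid(sharpness * (w - abs(z - p)))


inside = attention_mask(z=0.50, p=0.50, w=0.10)    # photon at the tracker centre
outside = attention_mask(z=0.90, p=0.50, w=0.10)   # photon far outside the window
edge = attention_mask(z=0.60, p=0.50, w=0.10)      # photon at the window edge
```

A larger `sharpness` makes the soft window closer to a hard gate while remaining differentiable.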
[0068] Specifically, based on the dynamic control parameters, residual correction parameters, and attention mask, the state parameters are updated to obtain the updated state parameters of the current neural tracker, including:
[0069] Step B1: Based on the Kalman gain coefficient, global driving force, velocity residual, and attention mask, the velocity momentum is weighted and fused to obtain the updated velocity momentum.
[0070] Specifically, the Kalman gain coefficient $K_t$ output by the dynamics branch at the $t$-th time step acts as a dynamic gate that fuses the inertial momentum, the global driving force, and the velocity residual to compute the velocity momentum at the next moment, i.e., the updated velocity momentum $v_{t+1}$ at the $t$-th time step.
[0071] The update expression for velocity momentum is:
[0072] $v_{t+1} = (1 - K_t)\, v_t + K_t\, F_t + \alpha_t\, \delta v_t$;
[0073] where $v_{t+1}$ is the updated velocity momentum at the $t$-th time step, $K_t$ is the Kalman gain coefficient at the $t$-th time step, $v_t$ is the velocity momentum at the $t$-th time step, $F_t$ is the global driving force at the $t$-th time step, $\delta v_t$ is the velocity residual at the $t$-th time step, and $\alpha_t$ is the attention mask at the $t$-th time step. $(1 - K_t)\, v_t$ is the inertial retention term of the neural tracker: when the Kalman gain is low, the velocity momentum $v_t$ is mostly retained and the neural tracker maintains its motion state through inertia. $K_t\, F_t$ is the global driving term: when the Kalman gain is high, it quickly corrects the speed and direction of the neural tracker. $\alpha_t\, \delta v_t$ is the local fine-tuning term: it takes effect when the attention mask $\alpha_t$ is activated, and the velocity residual obtained from the residual correction branch then dominates the correction of small velocity deviations.
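The gated fusion of inertia, global drive, and local fine-tuning can be sketched as a single function. The exact arithmetic form used here, a convex blend of the previous velocity and the driving force plus a mask-weighted residual, is an assumption inferred from the three term descriptions, and the name `update_velocity` is illustrative.

```python
def update_velocity(v, gain, force, dv, mask):
    """Blend inertia and global drive by the Kalman-like gain, then add the gated residual."""
    inertia = (1.0 - gain) * v      # dominates when the gain is low (likely noise photon)
    drive = gain * force            # dominates when the gain is high (likely signal photon)
    fine_tune = mask * dv           # active only when the attention mask is activated
    return inertia + drive + fine_tune


# Low gain, mask off: the tracker keeps its previous velocity.
coasting = update_velocity(v=0.2, gain=0.0, force=1.0, dv=0.05, mask=0.0)
# High gain, mask on: the driving force and the velocity residual take over.
corrected = update_velocity(v=0.2, gain=1.0, force=-0.3, dv=0.05, mask=1.0)
```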
[0074] Step B2: Update the depth position based on the updated velocity momentum, position residual, and preset momentum leakage factor to obtain the updated depth position.
[0075] The preset momentum leakage factor is used to maintain the inertial drift of the neural tracker when the attention mask is not activated.
[0076] The updated depth position expression is:
[0077] ;
[0078] where $p_{t+1}$ is the updated depth position at the $t$-th time step; $p_t$ is the depth position at the $t$-th time step, i.e., the center location of the neural tracker at that time step; $v_{t+1}$ is the updated velocity momentum at the $t$-th time step; $\alpha_t$ is the attention mask at the $t$-th time step; $\delta p_t$ is the position residual at the $t$-th time step, used to eliminate local position errors; and $\lambda$ is the preset momentum leakage factor. In this embodiment, the preferred value of $\lambda$ is 0.1, so that even when the attention mask is not activated ($\alpha_t \approx 0$), the neural tracker still maintains small movements and does not stop updating because no effective photons have entered it for a prolonged period.
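The role of the momentum leakage factor can be illustrated with the sketch below. How the mask and the leakage factor are combined is not recoverable from the text, so this sketch assumes one plausible reading: the velocity is scaled by the larger of the attention mask and the leakage factor, and the position residual is gated by the mask. The name `update_position` is illustrative.

```python
def update_position(p, v_new, dp, mask, leak=0.1):
    """Move by the updated velocity; keep at least a leaked fraction when the mask is off."""
    step_scale = max(mask, leak)    # assumed combination of attention mask and leakage
    return p + step_scale * v_new + mask * dp


# Mask off: only the leaked 10% of the velocity moves the tracker.
drifting = update_position(p=0.5, v_new=0.1, dp=0.02, mask=0.0)
# Mask on: full velocity step plus the position residual.
tracking = update_position(p=0.5, v_new=0.1, dp=0.02, mask=1.0)
```

Under this reading the tracker never freezes completely, which matches the stated purpose of the leakage factor.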
[0079] Step B3: Based on the width shrinkage factor and attention mask, adaptively shrink the uncertainty width to obtain the updated uncertainty width.
[0080] An adaptive shrinkage strategy is used to adaptively shrink the uncertainty width, resulting in the updated uncertainty width:
[0081] ;
[0082] where $w_{t+1}$ is the updated uncertainty width at the $t$-th time step; $w_t$ is the uncertainty width at the $t$-th time step; $\eta$ is the shrinkage rate, set to 0.02 in this embodiment; and $s_t$ is the width contraction factor at the $t$-th time step. Shrinkage is applied only when a valid signal is detected (i.e., the attention mask is activated) and the network prediction calls for shrinkage. When the noise level is relatively high, the window of the current neural tracker narrows, thereby gradually improving noise suppression.
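A hedged sketch of the adaptive shrinkage follows. The multiplicative form, the clipping of the contraction factor at zero to express "the network prediction calls for shrinkage", and the floor `w_min` are assumptions made for illustration; only the shrinkage rate of 0.02 comes from the embodiment.

```python
def update_width(w, shrink, mask, rate=0.02, w_min=0.01):
    """Shrink the window only when the mask is active and the network requests shrinkage."""
    demand = max(0.0, shrink)                  # ignore negative (non-shrinking) predictions
    return max(w_min, w * (1.0 - rate * mask * demand))


locked = update_width(w=0.5, shrink=1.0, mask=1.0)      # valid signal, shrink requested
no_signal = update_width(w=0.5, shrink=1.0, mask=0.0)   # mask off: width unchanged
no_demand = update_width(w=0.5, shrink=-1.0, mask=1.0)  # no shrink requested: unchanged
```

Repeated valid detections narrow the window geometrically until the assumed floor is reached.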
[0083] Figure 4 is a schematic diagram of the motion and contraction principle of a neural tracker. As shown in Figure 4, the horizontal axis represents the time step $t$ and the vertical axis represents the depth position $p_t$, where $t = 1$ is the first time step and $p_1$ is the depth position corresponding to the first time step. At each time step, a five-pointed star marks the depth position of the current time step, and the interval spanning it marks the confidence interval covered by the uncertainty width $w_t$ of that time step. As the time steps progress, the depth position gradually converges to the true target distance, while the uncertainty width adaptively shrinks from an initial wide window to a minimum value. This shows the dynamic evolution of the neural tracker from coarse-grained search to precise localization, which suppresses background noise interference. Figure 4 thus illustrates the joint update of the depth position and the uncertainty width over time.
[0084] Step B4: Based on the memory update characteristics, perform a moving average update on the hidden memory state to obtain the updated hidden memory state.
[0085] A moving average strategy is used to update the hidden memory state by moving average, so as to smoothly preserve historical features and obtain the updated hidden memory state:
[0086] $h_{t+1} = \gamma\, h_t + (1 - \gamma)\, m_t$;
[0087] where $h_{t+1}$ is the updated hidden memory state at the $t$-th time step, $h_t$ is the hidden memory state at the $t$-th time step, $m_t$ is the memory update feature at the $t$-th time step, and $\gamma$ is the memory decay coefficient, whose preferred value in this embodiment is 0.8.
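The moving average update of the hidden memory state can be sketched as a standard exponential moving average. Treating the update as a convex blend weighted by the decay coefficient is an assumption, as is the scalar memory state; the name `update_memory` is illustrative.

```python
def update_memory(h, m, decay=0.8):
    """Exponential moving average: keep `decay` of the old state, blend in the new feature."""
    return decay * h + (1.0 - decay) * m


h = 0.0
for m in (1.0, 1.0, 1.0):       # three identical, made-up memory update features
    h = update_memory(h, m)
```

With a decay of 0.8 the state moves only a fifth of the way towards each new feature, which is what lets it smooth out single-photon noise.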
[0088] Step B5: Combine the position residual and velocity residual to obtain the updated residual feedback.
[0089] S105. After processing the photon timestamp sequence, based on the cumulative confidence of each neural tracker during the update process using the attention mask, the depth position in the final state parameter corresponding to the neural tracker with the highest confidence is selected as the depth estimate for constructing a three-dimensional image.
[0090] Once every photon in the photon timestamp sequence has been processed by the neural trackers, the number of valid signal captures of each neural tracker over the entire update cycle is counted, which turns the local, per-photon uncertainty into a global decision. The sequence length of the photon timestamp sequence is denoted $T$.
[0091] Specifically, after processing the photon timestamp sequence, based on the cumulative confidence of each neural tracker during the update process using the attention mask, the depth position in the final state parameters corresponding to the neural tracker with the highest confidence is selected as the depth estimate for constructing 3D imaging, including:
[0092] Step C1: After processing the photon timestamp sequence, the attention masks obtained by each neural tracker during the update process are accumulated to obtain the cumulative confidence of each neural tracker.
[0093] During the photon-by-photon recursive processing, the cumulative confidence of each neural tracker is updated synchronously:
[0094] $C_k^{(t)} = C_k^{(t-1)} + a_k^{(t)}, \quad C_k^{(0)} = 0$;
[0095] where $C_k^{(t)}$ is the cumulative confidence of the $k$-th neural tracker at the $t$-th time step, $C_k^{(0)}$ is the confidence at the initial time step, $C_k^{(t-1)}$ is the cumulative confidence of the $k$-th neural tracker at the $(t-1)$-th time step, and $a_k^{(t)}$ is the attention mask of the photon at the $t$-th time step with respect to the $k$-th neural tracker. Accumulating the attention mask of each photon approximates the effective photon energy captured by the current neural tracker within its time window. In other words, the cumulative confidence measures the number of signal photons captured by each neural tracker.
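As a non-limiting illustration, the accumulation of attention masks into a per-tracker confidence can be sketched as follows. The sketch assumes the confidence starts at zero and that the masks for each photon are available as one list with an entry per tracker. The function name and data layout are hypothetical.

```python
def accumulate_confidence(masks_per_photon, num_trackers):
    """Sum attention masks photon by photon, one running total per tracker.

    masks_per_photon -- iterable of lists; entry k is the attention mask of
                        that photon with respect to tracker k
    """
    confidence = [0.0] * num_trackers  # assumed zero initial confidence
    for masks in masks_per_photon:
        for k, mask in enumerate(masks):
            confidence[k] += mask
    return confidence
```

Because only a running sum per tracker is kept, the memory cost is independent of the number of photons and of the time-bin resolution.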
[0096] Step C2: Compare the cumulative confidence scores of all parallel neural trackers and select the target neural tracker.
[0097] Among them, the target neural tracker is the neural tracker with the highest cumulative confidence.
[0098] The neural tracker with the highest cumulative confidence is given by:
[0099] $k^{*} = \arg\max_{k} C_k^{(T)}$;
[0100] where $k^{*}$ is the index of the neural tracker with the highest cumulative confidence, that is, the target neural tracker; $C_k^{(T)}$ is the cumulative confidence of the $k$-th neural tracker at the total time step $T$; and $\arg\max$ returns the index at which the maximum value is attained.
[0101] Step C3: Use the depth position in the final state parameters of the target neural tracker as a depth estimate to construct a 3D image.
[0102] The depth estimate is given by:
[0103] $\hat{d} = d_{k^{*}}^{(T)}$;
[0104] where $\hat{d}$ is the depth estimate and $d_{k^{*}}^{(T)}$ is the depth position in the final state parameters, at the total time step $T$, of the neural tracker $k^{*}$ with the highest cumulative confidence.
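As a non-limiting illustration, Steps C2 and C3 amount to an arg-max over the cumulative confidences followed by a lookup of the winning tracker's final depth position. The function name is hypothetical, and ties are resolved in favor of the lowest tracker index, which is an assumption of this sketch.

```python
def select_depth(cumulative_confidence, final_depths):
    """Return (index of the target tracker, its final depth position)."""
    if len(cumulative_confidence) != len(final_depths):
        raise ValueError("one confidence and one depth are needed per tracker")
    # max() returns the first index with the largest confidence.
    best = max(range(len(cumulative_confidence)),
               key=lambda k: cumulative_confidence[k])
    return best, final_depths[best]


best_index, depth_estimate = select_depth([2.0, 5.0, 1.0], [3.1, 7.4, 12.0])
```

In this usage example the second tracker captured the most signal photons, so its final depth position is used as the depth estimate.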
[0105] The prediction accuracy of the SPAD 3D imaging method based on photon behavior tracking of this invention is quantitatively evaluated using root mean square error (RMSE) and mean absolute percentage error (MAPE), and is compared with that of existing methods: direct histogram accumulation, long short-term memory (LSTM), recurrent spiking neural network (RS-SNN), and single-gate spiking recurrent neural network (SG-SRNN).
[0106] RMSE measures the absolute magnitude of the deviation between the predicted depth value and the actual depth value, expressed in centimeters (cm). A smaller RMSE value indicates higher measurement accuracy. MAPE measures the relative error of the predicted value, expressed as a percentage (%). A smaller MAPE value indicates higher prediction accuracy of the model.
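As a non-limiting illustration, the two metrics can be computed as follows using their standard definitions. The function names are hypothetical. RMSE is returned in the unit of its inputs, so depths must be given in centimeters to obtain centimeters. MAPE is undefined for a true depth of zero, so such samples are assumed to be excluded beforehand.

```python
import math


def rmse(predicted, actual):
    """Root mean square error, in the same unit as the inputs."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def mape(predicted, actual):
    """Mean absolute percentage error, in percent (actual values non-zero)."""
    ratios = [abs(p - a) / abs(a) for p, a in zip(predicted, actual)]
    return 100.0 * sum(ratios) / len(ratios)
```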
[0107] The experiment uses simulation data with distances ranging from 0 to 15 m, generated at different signal-to-noise ratios, with different combinations of signal and noise intensities for the same signal-to-noise ratio.
[0108] Table 1 Comparison of Quantitative Parameters of Test Results
[0109]
[0110] Table 1 compares the quantitative parameters of the test results. Referring to Table 1, the SPAD 3D imaging method based on photon behavior tracking of this invention achieves the lowest RMSE and MAPE values under both low and high signal-to-noise ratio (SNR) conditions, which means that it outperforms the compared techniques in both the absolute and the relative accuracy of depth estimation. The advantage over the traditional histogram accumulation method is most evident at low SNR: under strong noise interference with SNR = 0.5, the RMSE decreases from 0.78 cm to 0.62 cm and the MAPE decreases from 15.93% to 11.27%. This indicates that the neural Kalman tracking mechanism extracts signal photons from strong background noise and overcomes the failure of traditional statistical methods at low photon counts. This invention also maintains its accuracy advantage over the neural-network-based methods LSTM, RS-SNN, and SG-SRNN.
[0111] This invention, based on a SPAD sensor, centers on modeling depth estimation as a multi-hypothesis parallel tracking process. It utilizes a neural network to predict Kalman filter parameters, enabling dynamic locking of weak signal trajectories. The method employs a pre-trained dual-branch neural network with an attention gating mechanism, effectively filtering background noise photons and exhibiting strong noise robustness. The residual correction branch introduces a local recurrent feedback loop, feeding back the correction from the previous time step to the current input. This closed-loop control mechanism endows the pre-trained dual-branch neural network with self-correction capabilities, significantly improving depth prediction accuracy. Furthermore, the multi-hypothesis parallel tracking strategy effectively addresses the problem of single-point initialization easily getting trapped in local extrema or tracking incorrect targets by initializing multiple parallel trackers at different depths and ultimately selecting the optimal one.
[0112] As can be seen from the implementation shown in Figure 1 above, this embodiment of the invention initializes multiple parallel neural trackers within a preset detection depth range, each neural tracker being configured with state parameters including depth position and uncertainty width. It acquires a photon timestamp sequence collected by a single-photon avalanche diode sensor and, based on the time-of-flight principle, converts each photon in the photon timestamp sequence into a corresponding distance observation value. Based on the current state parameters and current distance observation values of each neural tracker, a pre-trained dual-branch neural network is used to predict dynamic control parameters and residual correction parameters, respectively. The dynamic control parameters are used to control the macroscopic motion trend of each neural tracker, while the residual correction parameters are used to perform microscopic corrections on each neural tracker to smooth trajectory jitter. An attention mask is calculated based on whether the distance observation value falls within the uncertainty width of the current neural tracker, and the state parameters are updated based on the dynamic control parameters, residual correction parameters, and attention mask to obtain the updated state parameters of the current neural tracker. After the photon timestamp sequence has been processed, based on the cumulative confidence obtained by each neural tracker from the attention mask during the update process, the depth position in the final state parameters of the neural tracker with the highest confidence is selected as the depth estimate for constructing a three-dimensional image. In this way, the neural tracker with the highest confidence is selected from multiple parallel neural trackers, which avoids a single tracker getting stuck in a local extremum or tracking the wrong target.
Furthermore, the use of a pre-trained dual-branch neural network, combined with the calculation of an attention mask, i.e., an attention gating mechanism, can effectively filter background noise photons, resulting in strong noise resistance. The residual correction parameters in the pre-trained dual-branch neural network are used to make micro-corrections to each neural tracker to smooth trajectory jitter, giving the pre-trained dual-branch neural network self-correction capabilities and resulting in high prediction accuracy.
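As a non-limiting illustration, the first two operations summarized above (tracker initialization and time-of-flight conversion) can be sketched as follows. The sketch assumes that timestamps are round-trip times in seconds measured from pulse emission, that trackers are spread uniformly over the depth range, and that each initial uncertainty width equals the spacing between trackers. The function names and the dictionary layout are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def timestamps_to_distances(timestamps_s):
    """Convert round-trip photon times (seconds) to distances (metres)."""
    # The light travels to the target and back, hence the division by two.
    return [SPEED_OF_LIGHT * t / 2.0 for t in timestamps_s]


def init_trackers(num_trackers, depth_min, depth_max):
    """Place trackers at the centres of equal cells of the depth range."""
    cell = (depth_max - depth_min) / num_trackers
    return [{"depth": depth_min + (k + 0.5) * cell, "width": cell}
            for k in range(num_trackers)]


trackers = init_trackers(3, 0.0, 15.0)
distances = timestamps_to_distances([100e-9])
```

In this usage example a photon arriving 100 ns after emission maps to a distance of about 14.99 m, which falls inside the cell of the third tracker.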
[0113] Based on the same inventive concept, and as an implementation of the above SPAD three-dimensional imaging method based on photon behavior tracking, this embodiment of the invention also provides a SPAD three-dimensional imaging device based on photon behavior tracking. Figure 5 is a structural diagram of the SPAD three-dimensional imaging device based on photon behavior tracking in an embodiment of the present invention. As shown in Figure 5, the SPAD 3D imaging device based on photon behavior tracking may include:
[0114] The initialization module 501 is used to initialize multiple parallel neural trackers within a preset detection depth range. Each neural tracker is configured with state parameters including depth position and uncertainty width.
[0115] The conversion module 502 is used to acquire the photon timestamp sequence collected by the single-photon avalanche diode sensor, and convert each photon in the photon timestamp sequence into the corresponding distance observation value according to the time-of-flight principle;
[0116] The prediction module 503 is used to predict the dynamic control parameters and residual correction parameters respectively based on the current state parameters and current distance observation values of each neural tracker using a pre-trained dual-branch neural network. The dynamic control parameters are used to control the macroscopic motion trend of each neural tracker, and the residual correction parameters are used to make microscopic corrections to each neural tracker to smooth the trajectory jitter.
[0117] The update module 504 is used to calculate the attention mask based on whether the distance observation falls within the uncertainty width of the current neural tracker, and update the state parameters based on the dynamic control parameters, residual correction parameters and attention mask to obtain the updated state parameters of the current neural tracker. The updated state parameters include the final state parameters.
[0118] The selection module 505 is used to select the depth position in the final state parameters of the neural tracker with the highest confidence as the depth estimate after processing the photon timestamp sequence, based on the cumulative confidence of each neural tracker during the update process using the attention mask, so as to construct a three-dimensional image.
[0119] The prediction module 503 is specifically used to input the deviation between the current distance observation and the corresponding depth position, as well as the corresponding uncertainty width, into the dynamics branch for processing and outputting dynamic control parameters, including the global driving force, Kalman gain coefficient, and width contraction factor; and to input the normalized relative position between the current distance observation and the corresponding depth position, the corresponding uncertainty width, the hidden memory state, and the residual feedback from the previous time step into the residual correction branch for processing and outputting residual correction parameters, including the position residual, velocity residual, and memory update features; the pre-trained dual-branch neural network includes a parallel dynamics branch and a residual correction branch, and the first neural network nonlinear mapping function of the dynamics branch and the second neural network nonlinear mapping function of the residual correction branch both adopt a multilayer perceptron structure.
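As a non-limiting illustration, the inputs and outputs of the two branches can be sketched with two small multilayer perceptrons. The layer sizes, the tanh activation, the sigmoid applied to the gain and contraction outputs, the division by the width as the normalization, the hidden-state size, and the random (untrained) weights are all assumptions of this sketch, and every name in it is hypothetical.

```python
import math
import random

HIDDEN_DIM = 4    # assumed size of the hidden memory state
FEEDBACK_DIM = 2  # assumed feedback: (position residual, velocity residual)


def make_mlp(sizes, rng):
    """Build an untrained MLP as a list of (weights, biases) layers."""
    return [([[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])]


def mlp_forward(layers, x):
    """Forward pass with tanh on every layer except the last."""
    for index, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def predict(dynamics, residual, z, depth, width, hidden, feedback):
    """Return (dynamic control parameters, residual correction parameters)."""
    deviation = z - depth
    # Dynamics branch: deviation and width in, three control values out.
    drive, gain_raw, contraction_raw = mlp_forward(dynamics, [deviation, width])
    # Residual branch: normalized position, width, memory and feedback in.
    out = mlp_forward(residual, [deviation / width, width, *hidden, *feedback])
    control = {"drive": drive, "gain": sigmoid(gain_raw),
               "contraction": sigmoid(contraction_raw)}
    correction = {"position_residual": out[0], "velocity_residual": out[1],
                  "memory_update": out[2:]}
    return control, correction


rng = random.Random(0)
dynamics_branch = make_mlp([2, 8, 3], rng)
residual_branch = make_mlp([2 + HIDDEN_DIM + FEEDBACK_DIM, 8, 2 + HIDDEN_DIM], rng)
control, correction = predict(dynamics_branch, residual_branch, z=5.2, depth=5.0,
                              width=1.0, hidden=[0.0] * HIDDEN_DIM,
                              feedback=[0.0] * FEEDBACK_DIM)
```

The sketch only demonstrates the data flow between the branches; in the described method the weights come from pre-training rather than random initialization.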
[0120] In update module 504, the updated state parameters include updated velocity momentum, updated depth position, updated uncertainty width, updated hidden memory state, and updated residual feedback. Based on dynamic control parameters, residual correction parameters, and attention mask, the state parameters are updated to obtain the updated state parameters of the current neural tracker. This includes: weighted fusion of velocity momentum based on Kalman gain coefficient, global driving force, velocity residual, and attention mask to obtain updated velocity momentum; updating the depth position based on updated velocity momentum, position residual, and preset momentum leakage factor to obtain updated depth position; the preset momentum leakage factor is used to maintain the neural tracker's inertial drift when the attention mask is not activated; adaptively shrinking the uncertainty width based on width contraction factor and attention mask to obtain updated uncertainty width; updating the hidden memory state by moving average based on memory update characteristics to obtain updated hidden memory state; and combining the position residual and velocity residual to obtain updated residual feedback.
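As a non-limiting illustration, one possible reading of the gated velocity and depth update is sketched below. The functional forms are assumptions chosen to match the verbal description: a sigmoid mask that is close to 1 when the observation lies within the uncertainty width, a gain-weighted fusion of the old momentum with the gated driving force and velocity residual, and a leakage factor that keeps a reduced inertial drift when the mask is inactive. They are not the expressions of the claims, and the `sharpness` and `leak` values are hypothetical.

```python
import math


def attention_mask(z, depth, width, sharpness=10.0):
    """Soft gate: near 1 if |z - depth| < width, near 0 otherwise (assumed)."""
    x = sharpness * (width - abs(z - depth))
    x = max(-60.0, min(60.0, x))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-x))


def update_motion(depth, velocity, z, width, gain, drive,
                  velocity_residual, position_residual, leak=0.1):
    """Return (new depth, new velocity, mask) for one photon (assumed forms)."""
    a = attention_mask(z, depth, width)
    # Keep (1 - gain) of the old momentum; blend in the gated correction.
    new_velocity = (1.0 - gain) * velocity + a * gain * (drive + velocity_residual)
    # Full step when the gate is open, leaked inertial drift when it is closed.
    step_scale = a + (1.0 - a) * leak
    new_depth = depth + step_scale * new_velocity + a * position_residual
    return new_depth, new_velocity, a
```

Under these assumptions, an in-window photon pulls the tracker toward the observation, while an out-of-window photon leaves only a small drift driven by the existing momentum.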
[0121] The selection module 505 is specifically used to accumulate the attention masks obtained by each neural tracker during the update process after processing the photon timestamp sequence, so as to obtain the cumulative confidence of each neural tracker; compare the cumulative confidence of all parallel neural trackers, select the target neural tracker, which is the neural tracker with the highest cumulative confidence; and use the depth position in the final state parameters of the target neural tracker as the depth estimate for constructing a three-dimensional image.
[0122] It should be noted that the above description of the SPAD 3D imaging device embodiment based on photon behavior tracking is similar to the description of the SPAD 3D imaging method embodiment based on photon behavior tracking, and has similar beneficial effects. For any technical details not disclosed in the embodiments of the SPAD 3D imaging device based on photon behavior tracking of this invention, please refer to the description of the SPAD 3D imaging method embodiment based on photon behavior tracking of this invention for understanding.
[0123] The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A SPAD three-dimensional imaging method based on photon behavior tracking, characterized in that, include: Within a preset detection depth range, multiple parallel neural trackers are initialized, each neural tracker being configured with state parameters including depth position and uncertainty width; The photon timestamp sequence collected by the single-photon avalanche diode sensor is obtained, and each photon in the photon timestamp sequence is converted into a corresponding distance observation value according to the time-of-flight principle; Based on the current state parameters and current distance observations of each neural tracker, a pre-trained dual-branch neural network is used to predict dynamic control parameters and residual correction parameters, respectively. The dynamic control parameters are used to control the macroscopic motion trend of each neural tracker, and the residual correction parameters are used to make microscopic corrections to each neural tracker to smooth trajectory jitter. Based on whether the distance observation falls within the uncertainty width of the current neural tracker, an attention mask is calculated, and the state parameters are updated according to the dynamic control parameters, the residual correction parameters, and the attention mask to obtain the updated state parameters of the current neural tracker. The updated state parameters include the final state parameters. After processing the photon timestamp sequence, based on the cumulative confidence of each neural tracker during the update process using the attention mask, the depth position in the final state parameter corresponding to the neural tracker with the highest confidence is selected as the depth estimate for constructing a three-dimensional image.
2. The SPAD three-dimensional imaging method based on photon behavior tracking according to claim 1, characterized in that, The state parameters also include velocity momentum, hidden memory state, and residual feedback.
3. The SPAD three-dimensional imaging method based on photon behavior tracking according to claim 2, characterized in that, The pre-trained dual-branch neural network includes a parallel dynamics branch and a residual correction branch. The first neural network nonlinear mapping function of the dynamics branch and the second neural network nonlinear mapping function of the residual correction branch both adopt a multilayer perceptron structure. The step of predicting dynamic control parameters and residual correction parameters based on the current state parameters and current distance observations of each neural tracker using a pre-trained dual-branch neural network includes: The deviation between the current distance observation and the corresponding depth position, as well as the corresponding uncertainty width, are input to the dynamic motion branch for processing. The dynamic motion branch outputs the dynamic control parameters, which include global driving force, Kalman gain coefficient, and width contraction factor. The normalized relative position between the current distance observation and the corresponding depth position, the corresponding uncertainty width, the hidden memory state, and the residual feedback amount at the previous moment are input to the residual correction branch for processing, and the residual correction parameters are output. The residual correction parameters include position residual, velocity residual, and memory update features.
4. The SPAD three-dimensional imaging method based on photon behavior tracking according to claim 1, characterized in that, The expression for the attention mask is: [formula not reproduced]; where the symbols denote, in order: the attention mask at the t-th time step, the Sigmoid activation function, the distance observation at the t-th time step, the depth position at the t-th time step, and the uncertainty width at the t-th time step.
5. The SPAD three-dimensional imaging method based on photon behavior tracking according to claim 3, characterized in that, The updated state parameters include the updated velocity momentum, updated depth position, updated uncertainty width, updated hidden memory state, and updated residual feedback. The step of updating the state parameters based on the dynamic control parameters, the residual correction parameters, and the attention mask to obtain the updated state parameters of the current neural tracker includes: The velocity momentum is weighted and fused based on the Kalman gain coefficient, the global driving force, the velocity residual, and the attention mask to obtain the updated velocity momentum. The depth position is updated based on the updated velocity momentum, the position residual, and the preset momentum leakage factor to obtain the updated depth position. The preset momentum leakage factor is used to maintain the inertial drift of the neural tracker when the attention mask is not activated. The uncertainty width is adaptively shrunk according to the width shrinkage factor and the attention mask to obtain the updated uncertainty width; Based on the memory update characteristics, the hidden memory state is updated by a moving average to obtain the updated hidden memory state; The position residual and the velocity residual are combined to obtain the updated residual feedback quantity.
6. The SPAD three-dimensional imaging method based on photon behavior tracking according to claim 5, characterized in that, The expression for the updated velocity momentum is: [formula not reproduced]; where the symbols denote, in order: the updated velocity momentum at the t-th time step, the Kalman gain coefficient at the t-th time step, the velocity momentum at the t-th time step, the global driving force at the t-th time step, the velocity residual at the t-th time step, and the attention mask at the t-th time step.
7. The SPAD three-dimensional imaging method based on photon behavior tracking according to claim 5, characterized in that, The expression for the updated depth position is: [formula not reproduced]; where the symbols denote, in order: the updated depth position at the t-th time step, the depth position at the t-th time step, the updated velocity momentum at the t-th time step, the attention mask at the t-th time step, the preset momentum leakage factor, and the position residual at the t-th time step.
8. The SPAD three-dimensional imaging method based on photon behavior tracking according to claim 1, characterized in that, After processing the photon timestamp sequence, based on the cumulative confidence of each neural tracker during the update process using the attention mask, the depth position in the final state parameters corresponding to the neural tracker with the highest confidence is selected as the depth estimate for constructing a 3D image, including: After processing the photon timestamp sequence, the attention mask obtained by each neural tracker during the update process is accumulated to obtain the cumulative confidence of each neural tracker. Compare the cumulative confidence scores of all parallel neural trackers and select the target neural tracker, which is the neural tracker with the highest cumulative confidence score. The depth position in the final state parameters of the target neural tracker is used as the depth estimate to construct the three-dimensional image.
9. The SPAD three-dimensional imaging method based on photon behavior tracking according to claim 3, characterized in that, The expression for the dynamic motion branch is: [formula not reproduced]; where the symbols denote, in order: the dynamic control parameters, the global driving force at the t-th time step, the Kalman gain coefficient at the t-th time step, the width contraction factor at the t-th time step, the first neural network nonlinear mapping function of the dynamic motion branch, the distance observation at the t-th time step, the depth position at the t-th time step, and the uncertainty width at the t-th time step; The expression for the residual correction branch is: [formula not reproduced]; where the symbols denote, in order: the residual correction parameters, the position residual at the t-th time step, the velocity residual at the t-th time step, the memory update features at the t-th time step, the second neural network nonlinear mapping function of the residual correction branch, the hidden memory state at the t-th time step, and the residual feedback amount fed back from the previous time step.
10. A SPAD three-dimensional imaging device based on photon behavior tracking, characterized in that, include: An initialization module is used to initialize multiple parallel neural trackers within a preset detection depth range. Each neural tracker is configured with state parameters including depth position and uncertainty width. The conversion module is used to acquire the photon timestamp sequence collected by the single-photon avalanche diode sensor, and convert each photon in the photon timestamp sequence into a corresponding distance observation value according to the time-of-flight principle; The prediction module is used to predict dynamic control parameters and residual correction parameters respectively based on the current state parameters and current distance observation values of each neural tracker using a pre-trained dual-branch neural network. The dynamic control parameters are used to control the macroscopic motion trend of each neural tracker, and the residual correction parameters are used to make microscopic corrections to each neural tracker to smooth trajectory jitter. The update module is used to calculate the attention mask based on whether the distance observation value falls within the uncertainty width of the current neural tracker, and update the state parameters based on the dynamic control parameters, the residual correction parameters and the attention mask to obtain the updated state parameters of the current neural tracker. The updated state parameters include the final state parameters. The selection module is used to select the depth position in the final state parameters of the neural tracker with the highest confidence as the depth estimate after processing the photon timestamp sequence, based on the cumulative confidence of each neural tracker during the update process according to the attention mask, so as to construct a three-dimensional image.