A lidar all-digital processing system-on-chip architecture and data processing method
By integrating the photosensitive layer and the processing layer into a fully digital on-chip architecture for lidar, the invention addresses the large size, high power consumption, and high cost of traditional lidar systems, and achieves efficient, robust data processing suited to the consumer market and complex environments.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- HUACHEN XINGUANG (WUXI) SEMICONDUCTOR CO LTD
- Filing Date
- 2026-03-16
- Publication Date
- 2026-06-12

Figure CN122194101A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of lidar integrated circuit technology, and specifically to a fully digital processing on-chip system architecture for lidar and a corresponding data processing method.
Background Technology
[0002] As a core sensor in fields such as autonomous driving, robot navigation, and industrial measurement, the performance of LiDAR directly affects the system's perception capabilities and decision-making reliability. Traditional LiDAR systems generally employ a discrete device architecture, distributing the photon detection array, timing measurement circuit, signal processing unit, and communication interface across multiple independent chips interconnected via printed circuit boards. This architecture results in a bulky overall system size, making it difficult to meet the integration requirements of space-constrained platforms such as vehicles and drones. Long-distance signal transmission between multiple chips not only increases power consumption but also introduces additional timing uncertainties, limiting further improvements in ranging accuracy. Furthermore, the complex procurement, testing, and assembly processes for discrete components lead to high overall system costs, hindering the widespread adoption of LiDAR in the consumer market.
[0003] Existing lidar systems mostly employ analog signal processing architectures. The weak current signal output from the photodetector must undergo multiple stages of analog circuitry processing, including transimpedance amplification, filtering and shaping, and comparison and discrimination, before time measurement can be performed. Analog circuits are highly sensitive to power supply noise, temperature drift, and electromagnetic interference, making it difficult to guarantee signal integrity in complex electromagnetic environments such as automotive environments. Furthermore, analog circuits suffer from poor manufacturing consistency, with significant response differences between different pixels. This necessitates complex calibration processes to ensure array uniformity, increasing production costs and testing cycles.
[0004] In terms of data processing, traditional architectures typically upload all raw data from the detector array to a back-end processor for centralized processing. As the array size and frame rate increase, the raw data volume grows rapidly, in proportion to both pixel count and frame rate, placing stringent demands on on-chip bus bandwidth and processor computing power. Transmitting large amounts of invalid noise data mixed with valid signal data wastes bandwidth and increases the load and power consumption of back-end processing. Existing systems cannot filter data intelligently at the source, so the data throughput bottleneck remains fundamentally unsolved.
[0005] In recent years, direct time-of-flight ranging technology based on single-photon avalanche diodes has attracted widespread attention due to its high sensitivity and digital output characteristics. However, existing implementation schemes still have significant shortcomings in terms of system integration, real-time processing capabilities, and robustness in extreme environments.
Summary of the Invention
[0006] To overcome the shortcomings of existing technologies, this invention proposes a fully digital processing on-chip system architecture for lidar, comprising a photosensitive layer, a processing layer, and a communication layer. The layers are stacked three-dimensionally through a through-silicon via (TSV) interconnect array to form an integrated sensing-storage-computing data processing link. The photosensitive layer includes a two-dimensional addressable single-photon avalanche diode array, in which each pixel unit integrates: a single-photon avalanche diode with a planar PN junction covered by a passivation layer; a hybrid quenching circuit connected to the output terminal of the single-photon avalanche diode, comprising a resistive delayed quenching branch and an active quenching branch in parallel, plus a MOSFET active reset unit connected to the outputs of both branches; a time-to-digital converter connected to the output of the hybrid quenching circuit, combining a coarse timing module with a vernier caliper fine timing module; and a local preprocessing circuit connected to the output of the time-to-digital converter. The local preprocessing circuit includes a counting threshold register and a multi-cycle correlation state machine, and is configured to perform continuous multi-cycle correlation judgment on the timestamps output by the time-to-digital converter. Only when the photon count in the same time unit exceeds the preset threshold in the counting threshold register within N consecutive laser cycles is the timestamp marked as a valid event, whereupon a simplified data packet containing the timestamp, photon count, and pixel coordinates is generated.
The processing layer receives the simplified data packets from the photosensitive layer through the through-silicon via interconnect array, and constructs a sparse histogram and outputs point cloud data based on the simplified data packets, thereby reducing data throughput compared to the original photon events.
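The paragraph above names the three fields of the simplified data packet but not their widths or order. A minimal Python sketch of packing them into one word, assuming hypothetical field widths (16-bit timestamp, 8-bit photon count, 12-bit row and column) chosen only for illustration:

```python
from dataclasses import dataclass

# Hypothetical field widths: the source does not specify the packet layout.
TS_BITS, COUNT_BITS, ROW_BITS, COL_BITS = 16, 8, 12, 12


@dataclass(frozen=True)
class EventPacket:
    timestamp: int  # TDC code of the valid time unit
    count: int      # cumulative photon count over the N correlated cycles
    row: int        # pixel row coordinate
    col: int        # pixel column coordinate


def _check(value: int, bits: int, name: str) -> None:
    """Reject values that would overflow their assumed field width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")


def pack(p: EventPacket) -> int:
    """Pack the fields into one integer, timestamp in the most significant bits."""
    _check(p.timestamp, TS_BITS, "timestamp")
    _check(p.count, COUNT_BITS, "count")
    _check(p.row, ROW_BITS, "row")
    _check(p.col, COL_BITS, "col")
    word = p.timestamp
    word = (word << COUNT_BITS) | p.count
    word = (word << ROW_BITS) | p.row
    word = (word << COL_BITS) | p.col
    return word


def unpack(word: int) -> EventPacket:
    """Inverse of pack(): peel fields off from the least significant end."""
    col = word & ((1 << COL_BITS) - 1)
    word >>= COL_BITS
    row = word & ((1 << ROW_BITS) - 1)
    word >>= ROW_BITS
    count = word & ((1 << COUNT_BITS) - 1)
    word >>= COUNT_BITS
    return EventPacket(timestamp=word, count=count, row=row, col=col)
```

Under these assumed widths each valid event costs 48 bits, however many raw photon timestamps were accumulated to produce it.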
[0007] Furthermore, the single-photon avalanche diode employs the following structural co-design: A planarized PN junction is surrounded by a deep well isolation structure to suppress edge breakdown; A silicon dioxide passivation layer with a thickness of 40nm-60nm is applied over the photosensitive surface to reduce the dark count caused by surface states; A microlens array disposed above the passivation layer is used to improve the optical fill factor; The electric field distribution structure is optimized to ensure a uniform distribution of electric field intensity in the avalanche region; The deep well isolation structure, passivation layer, and electric field optimization distribution structure work together to reduce the dark count rate of a single pixel to less than 100 cps.
[0008] Furthermore, in the hybrid quenching circuit: The resistor-delayed quenching branch includes a polycrystalline silicon quenching resistor with a resistance of 20kΩ-50kΩ; The active quenching branch includes quenching control logic and a controlled switch. After detecting the avalanche current, the quenching control logic triggers the controlled switch to turn on within a preset delay. The MOSFET active reset unit uses an NMOSFET with a width-to-length ratio greater than 100:1, which quickly pulls the cathode voltage of the single-photon avalanche diode back to the over-biased state after quenching. The resistive delayed quenching branch provides initial fast quenching, the active quenching branch ensures complete quenching to suppress after-pulse, and the MOSFET active reset unit achieves a reset time of less than 3 nanoseconds. Together, these three components enable the maximum pixel count rate to reach over 10 Mcps.
[0009] Furthermore, the time-to-digital converter includes: a coarse timing module that uses an 8-phase clock sampling architecture to coarsely quantize the time interval between the laser synchronization signal and the photon arrival signal; a vernier caliper fine timing module that includes a 32-stage ring oscillator for finely interpolating the quantization residual left by the coarse timing module; and a time interpolation calculation unit that computes the final timestamp from the outputs of the coarse timing module and the vernier caliper fine timing module. Together, the coarse and fine timing modules achieve a single-shot measurement accuracy better than 55 ps.
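One way to read the coarse/fine split: the coarse counter and 8-phase sampler locate the hit to one phase step, and the 32-stage fine module subdivides that step. The sketch below assumes a hypothetical 400 MHz reference clock (2500 ps period) and uniform fine stages, and combines the three fields by positional weighting; the source's actual weighting algorithm is not specified.

```python
# Hypothetical parameters: the source fixes 8 phases and 32 fine stages,
# but not the reference clock frequency.
T_CLK_PS = 2500.0                       # assumed 400 MHz reference clock
N_PHASES = 8                            # 8-phase coarse sampling (from the source)
N_FINE = 32                             # 32-stage ring oscillator (from the source)
PHASE_STEP_PS = T_CLK_PS / N_PHASES     # 312.5 ps per phase step
FINE_LSB_PS = PHASE_STEP_PS / N_FINE    # ~9.77 ps, assuming uniform stages


def timestamp_ps(cycles: int, phase: int, fine: int) -> float:
    """Combine coarse cycle count, phase index and fine stage into one time value."""
    if not 0 <= phase < N_PHASES:
        raise ValueError("phase index out of range")
    if not 0 <= fine < N_FINE:
        raise ValueError("fine stage index out of range")
    return cycles * T_CLK_PS + phase * PHASE_STEP_PS + fine * FINE_LSB_PS


def encode_ps(t_ps: float) -> tuple[int, int, int]:
    """Ideal quantizer, the inverse of timestamp_ps(); useful for testing."""
    cycles, rem = divmod(t_ps, T_CLK_PS)
    phase, rem = divmod(rem, PHASE_STEP_PS)
    fine = int(rem // FINE_LSB_PS)
    return int(cycles), int(phase), fine
```

For example, `timestamp_ps(4, 3, 6)` gives 10000 + 937.5 + 58.59375 = 10996.09375 ps. With these assumed parameters the fine LSB is about 9.8 ps.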
[0010] Furthermore, the time-to-digital converter also includes a temperature compensation circuit, a nonlinear calibration module, and a dynamic resolution switching module. The temperature compensation circuit comprises a temperature sensor integrated into the pixel area and a compensation calculation unit that executes a piecewise linear fitting algorithm; the compensation calculation unit interpolates the stored calibration coefficients according to the real-time temperature and dynamically corrects the measured value, so that the temperature drift coefficient is less than 0.1 ps/°C. The nonlinear calibration module measures and stores the propagation delay of each stage of the 32-stage ring oscillator against a reference clock during power-on, and applies nonlinear compensation to timestamps based on the stored values during operation. The dynamic resolution switching module adaptively switches between a 12-bit high-resolution mode and a 9-bit low-resolution mode based on ambient light intensity and real-time signal-to-noise ratio, with a switching delay of less than 100 nanoseconds. Together, these three modules keep the time-to-digital converter's measurement accuracy stable across the full temperature range of -40°C to 125°C while optimizing system power consumption.
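The piecewise linear fitting step can be sketched as interpolation in a small calibration table. The table values below are hypothetical, and the correction is modeled as an additive timing offset; the source does not say whether its stored coefficients are offsets, gains, or both.

```python
from bisect import bisect_right

# Hypothetical calibration table: (temperature in degC, timing offset in ps).
# Real coefficients would come from per-chip calibration over -40..125 degC.
CAL_TEMPS_C = [-40.0, 0.0, 25.0, 85.0, 125.0]
CAL_OFFSETS_PS = [-6.0, -2.0, 0.0, 4.5, 8.0]


def offset_at(temp_c: float) -> float:
    """Piecewise-linear interpolation of the stored offset, clamped at both ends."""
    if temp_c <= CAL_TEMPS_C[0]:
        return CAL_OFFSETS_PS[0]
    if temp_c >= CAL_TEMPS_C[-1]:
        return CAL_OFFSETS_PS[-1]
    i = bisect_right(CAL_TEMPS_C, temp_c)  # CAL_TEMPS_C[i-1] <= temp_c < CAL_TEMPS_C[i]
    t0, t1 = CAL_TEMPS_C[i - 1], CAL_TEMPS_C[i]
    o0, o1 = CAL_OFFSETS_PS[i - 1], CAL_OFFSETS_PS[i]
    return o0 + (o1 - o0) * (temp_c - t0) / (t1 - t0)


def compensate(raw_ps: float, temp_c: float) -> float:
    """Subtract the temperature-dependent offset from a raw TDC measurement."""
    return raw_ps - offset_at(temp_c)
```

At 55 °C, for instance, the offset is interpolated halfway between the 25 °C and 85 °C entries (2.25 ps), so a raw 1000 ps reading is corrected to 997.75 ps.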
[0011] Furthermore, the local preprocessing circuit includes: the counting threshold register, which stores a configurable photon counting threshold in the range 4-16; the multi-cycle correlation state machine, which counts photon events falling into the same time unit within N consecutive laser cycles, where N ranges from 2 to 8; event determination logic, which outputs a valid event flag when the count value of the multi-cycle correlation state machine exceeds the threshold in the counting threshold register; and a data packaging unit, which, in response to the valid event flag, packages the corresponding timestamp, cumulative photon count, and coordinates of the pixel into a simplified data packet and transmits it to the processing layer via DMA. By performing spatiotemporal correlation screening at the pixel level, the local preprocessing circuit filters out random noise photon events at the source, reducing the amount of data uploaded to the processing layer compared with the original photon events.
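A behavioural Python model of the multi-cycle correlation filter for a single pixel follows. It assumes non-overlapping windows of N cycles (the source does not say whether the window slides), represents each time unit by an integer TDC code, and uses a strict `>` comparison because the text says the count must exceed the threshold. It is a software sketch, not the circuit's RTL.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def correlate(cycles: Iterable[Iterable[int]], n_cycles: int,
              threshold: int) -> List[Tuple[int, int]]:
    """Return (time_unit, count) for every time unit that passes the filter.

    cycles: for each laser cycle, the time-unit indices of detected photons.
    Counts are accumulated per time unit over each block of n_cycles
    consecutive cycles; units whose count exceeds threshold are reported,
    everything else is discarded. A trailing incomplete block is dropped.
    """
    if not 2 <= n_cycles <= 8:
        raise ValueError("N must be in 2..8")
    if not 4 <= threshold <= 16:
        raise ValueError("threshold must be in 4..16")
    events: List[Tuple[int, int]] = []
    window: Counter = Counter()
    seen = 0
    for hits in cycles:
        window.update(hits)
        seen += 1
        if seen == n_cycles:  # window complete: evaluate, then reset
            events.extend((unit, c) for unit, c in sorted(window.items())
                          if c > threshold)
            window.clear()
            seen = 0
    return events
```

With `cycles = [[100, 37], [100, 512], [100, 101, 8], [100, 100, 900]]`, N = 4 and threshold 4, only time unit 100 (count 5) survives; the isolated noise hits are dropped.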
[0012] Furthermore, it also includes an array control module disposed on the photosensitive layer, the array control module comprising: The bias voltage control unit provides an adjustable overbias voltage to the single-photon avalanche diode array and dynamically adjusts it according to temperature feedback to maintain gain stability. An addressable controller, including row gating circuitry and column readout circuitry, enables independent addressing of any pixel or pixel partition in the array; The dynamic partition management unit divides the array into multiple independently enableable partitions according to scenario requirements. The partition granularity is configured through registers, with a configuration range of 32×24 to 128×96. The bias voltage control unit, addressable controller, and dynamic partition management unit work together to enable the system to selectively activate pixels in the target area, while the bias voltage of pixels in the non-activated area is turned off to achieve pixel-level power consumption control.
[0013] Furthermore, the processing layer includes a direct time-of-flight algorithm accelerator with a heterogeneous parallel architecture, comprising four kernels. The histogram construction kernel receives simplified data packets from the photosensitive layer, accumulates valid events into the corresponding time units according to their timestamps, and generates a sparse histogram; it uses an SRAM with a depth of 1024 bins and a dynamically adjustable bin width of 55 ps-200 ps, and contains a built-in hardware scene analyzer that adaptively adjusts the time unit width and the number of integration frames according to the peak distribution and signal-to-noise ratio of the initial histogram. The multi-echo detection kernel integrates three-level peak search logic to perform multi-level peak search on the sparse histogram, extracts the flight times of up to 3 echoes, and calculates a confidence score for each echo based on peak signal-to-noise ratio, pulse width, and consistency with adjacent pixels. The point cloud filtering kernel implements motion blur compensation and rain/snow noise filtering in hardware; based on the confidence scores and the spatiotemporal consistency of adjacent pixels, it filters out noise points and false echoes, reducing point cloud error to ±3 cm in dynamic scenes. The data scheduling kernel uses ring buffer management and a zero-copy transmission mechanism to output the filtered point cloud data to the communication layer. The four kernels operate as a pipeline, keeping the end-to-end delay of point cloud output below 10 microseconds.
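The histogram and multi-echo stages can be modeled in a few lines: a dictionary stands in for the sparse 1024-bin SRAM, and echoes are local maxima ranked by a simple SNR score, converted to distance as c·t/2. That score, the assumed per-bin noise level, and the use of the bin centre as the time of flight are illustrative simplifications; the source's confidence score also uses pulse width and neighbour consistency, which are not modeled here.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

C_M_PER_S = 299_792_458.0   # speed of light
N_BINS = 1024               # histogram depth from the source


def build_histogram(packets: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Accumulate (bin_index, photon_count) packets; only touched bins are stored."""
    hist: Dict[int, int] = defaultdict(int)
    for b, count in packets:
        if 0 <= b < N_BINS:
            hist[b] += count
    return dict(hist)


def find_echoes(hist: Dict[int, int], bin_ps: float, noise: float,
                max_echoes: int = 3) -> List[Tuple[float, int, float]]:
    """Return up to max_echoes peaks as (distance_m, peak_count, snr), strongest first.

    A peak is a bin above the assumed noise level that is higher than its left
    neighbour and not lower than its right one (a flat top keeps its left-most bin).
    The SNR score (peak - noise) / sqrt(noise + 1) is a hypothetical simplification.
    """
    peaks = []
    for b, v in hist.items():
        left, right = hist.get(b - 1, 0), hist.get(b + 1, 0)
        if v > noise and v > left and v >= right:
            peaks.append(((v - noise) / math.sqrt(noise + 1.0), b, v))
    peaks.sort(reverse=True)
    echoes = []
    for snr, b, v in peaks[:max_echoes]:
        tof_s = (b + 0.5) * bin_ps * 1e-12                 # bin centre as time of flight
        echoes.append((tof_s * C_M_PER_S / 2.0, v, snr))   # round trip -> one way
    return echoes
```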
[0014] Furthermore, the processing layer also includes a Gaussian fitting acceleration unit based on a RISC-V instruction set extension, comprising: a VGFIT instruction, a dedicated instruction encoded in the RISC-V custom-0 opcode space, whose function code specifies the Gaussian fitting iteration type, whose source register rs1 holds the sampled data vector and rs2 the current fitting parameters, and whose destination register rd receives the Jacobian matrix contribution term; a dedicated functional unit containing a parallel multiplier array and an exponential function module implemented with lookup tables and linear interpolation, which executes the VGFIT instruction in a 5-stage pipeline so that a single instruction computes one data point's contribution to the iterative equation within 5 clock cycles; and a 128-bit vector register that supports single-instruction multiple-data operation on four 32-bit single-precision floating-point values at once. While a VGFIT instruction executes, the processor core's standard pipeline continues to execute subsequent instructions that do not depend on its result, giving parallelism between the custom instruction and general-purpose computation and a speedup of more than 20 times over a software implementation.
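The paragraph above says one VGFIT instruction produces a single data point's contribution to the iterative equation. Assuming the pulse model f(t) = a·exp(-(t-μ)²/(2σ²)) and a Gauss-Newton solver (neither is stated in the source), that contribution is the gradient J_i of f and the residual r_i, which are accumulated into JᵀJ and Jᵀr. A pure-Python sketch, using `math.exp` where the hardware would use its lookup-table exponential:

```python
import math
from typing import List, Sequence, Tuple


def point_contribution(t: float, y: float, a: float, mu: float, sigma: float):
    """Per-sample Gauss-Newton terms for f(t) = a * exp(-(t - mu)^2 / (2 sigma^2)).

    Returns (J, r): the gradient of f with respect to (a, mu, sigma) and the
    residual y - f(t). This is the per-point quantity attributed to VGFIT.
    """
    d = t - mu
    e = math.exp(-d * d / (2.0 * sigma * sigma))
    f = a * e
    jac = (e, f * d / sigma ** 2, f * d * d / sigma ** 3)
    return jac, y - f


def solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [v[i]] for i, row in enumerate(m)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, 3):
            k = m[r][c] / m[c][c]
            for j in range(c, 4):
                m[r][j] -= k * m[c][j]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][j] * x[j] for j in range(r + 1, 3))) / m[r][r]
    return x


def fit_gaussian(ts: Sequence[float], ys: Sequence[float], a: float, mu: float,
                 sigma: float, iters: int = 20) -> Tuple[float, float, float]:
    """Gauss-Newton iterations; the inner loop accumulates per-point contributions."""
    for _ in range(iters):
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        for t, y in zip(ts, ys):
            jac, r = point_contribution(t, y, a, mu, sigma)
            for i in range(3):
                jtr[i] += jac[i] * r
                for j in range(3):
                    jtj[i][j] += jac[i] * jac[j]
        da, dmu, dsigma = solve3(jtj, jtr)   # normal equations: JtJ * delta = Jt r
        a, mu, sigma = a + da, mu + dmu, sigma + dsigma
    return a, mu, sigma
```

In this sketch `point_contribution` is the piece one would map onto VGFIT; the accumulation and the small 3×3 solve are ordinary scalar work that could run on the standard pipeline in parallel, as the paragraph describes.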
[0015] Furthermore, the three-dimensional stacking adopts the following inter-layer collaborative design. The density of the through-silicon via (TSV) interconnect array is 10,000/mm². Timestamp signals are carried over dedicated differential TSV channels at a transmission rate of 5 Gbps; the configuration bus uses a 32-bit parallel architecture; and signal, power, and ground TSVs are allocated in a 16:1:1 ratio to ensure signal integrity. A 2 μm thick tantalum-based shielding layer with a shielding effectiveness of more than 40 dB is placed between the photosensitive layer and the processing layer. Combined with deep N-well isolation, it separates the power domains of the photosensitive layer's analog circuits and the processing layer's digital circuits, reducing the coupling of digital switching noise into the dark counts of the single-photon avalanche diodes. A phase-locked loop clock alignment circuit and a TSV delay correction circuit are provided between the layers. The TSV delay correction circuit measures and stores the propagation delay of each TSV at power-on and, during operation, dynamically compensates for process deviations through a delay-locked loop, keeping the interlayer clock skew below 15 ps. This inter-layer design lets valid events output by the photosensitive layer reach the processing layer with a deterministic delay, preserving the system-level accuracy of time-of-flight measurements.
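The TSV delay correction can be sketched as choosing a delay-line setting per channel so that every TSV lines up with the slowest one. The tap step and tap count below are hypothetical DLL parameters, and the PLL clock alignment is not modeled; with rounding to the nearest tap, the residual skew is bounded by one tap step.

```python
from typing import List, Sequence, Tuple


def deskew(delays_ps: Sequence[float], step_ps: float,
           max_taps: int) -> Tuple[List[int], float]:
    """Pick per-TSV delay-line taps that align every channel to the slowest one.

    delays_ps: propagation delays measured at power-on (one per TSV).
    step_ps, max_taps: hypothetical delay-line resolution and range.
    Returns (taps, residual_skew_ps).
    """
    target = max(delays_ps)
    taps = []
    for d in delays_ps:
        n = round((target - d) / step_ps)   # nearest tap: error within +/- step/2
        if n > max_taps:
            raise ValueError("delay spread exceeds the delay line's range")
        taps.append(n)
    aligned = [d + n * step_ps for d, n in zip(delays_ps, taps)]
    return taps, max(aligned) - min(aligned)
```

With measured delays of 102.0, 118.5, 95.0 and 110.0 ps and a 5 ps step, the taps are [3, 0, 5, 2] and the residual skew is 3.0 ps.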
[0016] A data processing method for a fully digital processing on-chip architecture of lidar, applicable to the aforementioned fully digital processing on-chip architecture of lidar, includes the following steps. Pixel-level preprocessing within each pixel of the photosensitive layer: (a) detect photon arrival events, perform avalanche quenching and fast reset through the hybrid quenching circuit, and output photon arrival timestamps from the time-to-digital converter; (b) the multi-cycle correlation state machine of the local preprocessing circuit determines whether the cumulative photon count of the time unit corresponding to the timestamp exceeds the threshold in the counting threshold register within N consecutive laser cycles; (c) if the threshold is exceeded, the data packaging unit generates a simplified data packet containing the timestamp, photon count, and pixel coordinates and uploads it to the processing layer; otherwise, the timestamp is discarded. Sparse data processing at the processing layer: (d) the histogram construction kernel constructs a sparse histogram solely from the received simplified data packets, and the hardware scene analyzer adaptively adjusts the bin width and integration time according to the histogram characteristics; (e) the multi-echo detection kernel scans the sparse histogram and performs peak detection to extract multiple echoes and calculate the confidence score of each echo; (f) the point cloud filtering kernel filters out noise points based on the confidence scores and spatiotemporal consistency to generate the final point cloud data. Compared with uploading the original photon events, the pixel-level preprocessing reduces the amount of data sent to the processing layer, and the sparse data processing keeps the end-to-end delay of point cloud output below 10 microseconds.
[0017] Furthermore, the hardware scene analyzer in step (d) performs the following adaptive adjustments: an initial fast scan is performed with the default bin width and a small number of integration frames to generate an initial histogram; peak detection is performed on the initial histogram to determine whether the distribution range of the valid peaks is less than 1/4 of the total measurement range, in which case the scene is classified as peak-concentrated; and the signal-to-noise ratio of the strongest signal peak is calculated. If the scene is peak-concentrated and the signal-to-noise ratio is higher than the high-precision mode threshold, the bin width is switched to the 55 ps high-precision mode; otherwise, the 200 ps wide-bin mode is maintained. If the signal-to-noise ratio is lower than the preset threshold, the number of integration frames is dynamically increased to improve it.
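The decision rule above can be written as a small function. The SNR thresholds, the frame cap, and the policy of doubling the frame count are hypothetical; the peak distribution range is measured in bins of the initial histogram.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class HistMode:
    bin_ps: float   # histogram bin width
    frames: int     # number of integration frames


def choose_mode(peak_bins: Sequence[int], total_bins: int, peak_snr: float,
                frames: int, hi_prec_snr: float, min_snr: float,
                max_frames: int) -> HistMode:
    """Scene-analyzer decision rule; all thresholds are caller-supplied assumptions.

    peak_bins: bin indices of the valid peaks found in the initial histogram.
    """
    span = max(peak_bins) - min(peak_bins) if peak_bins else total_bins
    concentrated = span < total_bins / 4            # peaks within 1/4 of the range
    bin_ps = 55.0 if concentrated and peak_snr > hi_prec_snr else 200.0
    if peak_snr < min_snr:
        frames = min(frames * 2, max_frames)        # doubling is an assumed policy
    return HistMode(bin_ps=bin_ps, frames=frames)
```

For peaks at bins 300 and 420 of a 1024-bin histogram with SNR 12 (thresholds 8 and 3), the function selects 55 ps bins and keeps 10 frames.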
[0018] Compared with the prior art, the beneficial effects of the present invention are: 1. This invention provides a fully digital on-chip architecture and data processing method for LiDAR. It employs a three-dimensional stacked on-chip architecture consisting of a photosensitive layer, a processing layer, and a communication layer, integrating multiple chips from traditional discrete solutions into a single chip. High-speed data transmission between layers is achieved through a through-silicon via (TSV) interconnect array. This heterogeneous integration approach, combining sensing, storage, and computing, significantly reduces system footprint, lowers bill of materials costs and assembly complexity, and enables LiDAR to be adapted to space-constrained mobile platforms such as robots and drones, laying the foundation for the widespread adoption of LiDAR in the consumer market.
[0019] 2. This invention provides a fully digital processing on-chip system architecture and data processing method for lidar. It integrates local preprocessing circuitry within each pixel unit and uses a multi-cycle correlation state machine and a counting threshold judgment mechanism to perform spatiotemporal correlation filtering on photon events at the data source, packaging and uploading only valid events to the processing layer. This near-sensor computing architecture fundamentally solves the problem of excessive raw data volume in traditional architectures, greatly alleviating on-chip bus bandwidth pressure and reducing backend processing load and overall system power consumption.
[0020] 3. This invention provides a fully digital on-chip architecture and data processing method for lidar. It employs a single-photon avalanche diode to directly output digital pulses, which, in conjunction with an on-chip time-to-digital converter, completes time-of-flight measurement. The entire signal link is fully digital, avoiding the sensitivity of traditional analog circuits to power supply noise and electromagnetic interference. The digital architecture offers excellent process repeatability, high pixel consistency, simplifies production testing and calibration processes, and improves yield and manufacturability.
[0021] 4. This invention provides a fully digital processing on-chip system architecture and data processing method for lidar. It integrates a temperature compensation circuit, a nonlinear calibration module, and a dynamic resolution switching module into the time-to-digital converter, enabling adaptive adjustment of operating parameters based on real-time temperature and ambient light conditions. The piecewise linear fitting temperature compensation algorithm effectively suppresses temperature drift, and the dynamic resolution switching mechanism optimizes system power consumption while ensuring ranging accuracy, allowing the system to maintain stable measurement performance over a wide temperature range and under complex lighting conditions.
5. This invention provides a fully digital processing on-chip system architecture and data processing method for lidar. It employs a quad-core heterogeneous parallel direct time-of-flight algorithm accelerator to decouple and accelerate histogram construction, multi-echo detection, point cloud filtering, and data scheduling functions at the hardware level. Combined with a Gaussian fitting acceleration unit based on a custom RISC-V instruction set extension, it achieves instruction-level parallelism for the core algorithm. A zero-copy transmission mechanism and ring buffer management ensure efficient data flow scheduling, enabling the system to output high-quality point cloud data with extremely low end-to-end latency and meeting the stringent real-time requirements of applications such as autonomous driving.
Attached Figure Description
[0022] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0023] Figure 1 is a schematic diagram of the system architecture of the present invention; Figure 2 is a schematic diagram of a pixel unit circuit; Figure 3 is a schematic diagram of the data processing flow.
Detailed Implementation
[0024] The technical solution of the present invention will be more clearly and completely explained below with reference to the accompanying drawings and through the description of preferred embodiments of the present invention.
[0025] Terminology: SPAD: single-photon avalanche diode; TDC: time-to-digital converter; dToF: direct time of flight; DMA: direct memory access; RISC-V: an open instruction set architecture, the fifth generation of the reduced instruction set computer (RISC) designs from UC Berkeley; MIPI CSI-2: Mobile Industry Processor Interface Camera Serial Interface 2; IMU: inertial measurement unit; SPI: Serial Peripheral Interface; I2C: Inter-Integrated Circuit bus; MOSFET: metal-oxide-semiconductor field-effect transistor; NMOSFET: N-channel metal-oxide-semiconductor field-effect transistor; PLL: phase-locked loop, a circuit used for frequency synthesis and phase alignment of clock signals; DLL: delay-locked loop, a circuit used to precisely control signal delay; SRAM: static random access memory; AGV: automated guided vehicle; PN junction: a structure formed at the contact between a P-type semiconductor and an N-type semiconductor; bin: a time unit in the histogram.
[0026] The present invention will now be described in further detail with reference to the accompanying drawings and specific embodiments.
[0027] As shown in Figure 1, the fully digital processing on-chip architecture for lidar provided by this invention adopts a three-layer heterogeneous stacked structure consisting, from bottom to top, of a photosensitive layer, a processing layer, and a communication layer. The three layers are interconnected through a through-silicon via interconnect array to form a three-dimensional stack and an integrated sensing-storage-computing data processing link. The architecture performs end-to-end digital processing from photon detection to point cloud output on a single chip, overcoming the size, power consumption, and cost limitations of traditional discrete device architectures.
[0028] As shown in Figure 1 and Figure 2, the photosensitive layer is the front-end sensing unit of the entire system, and its core is a two-dimensional addressable single-photon avalanche diode array. The array uses a back-illuminated structure, and each pixel unit integrates a complete photon detection and preprocessing circuit chain. The single-photon avalanche diode uses a planar PN junction surrounded by a deep-well isolation structure, which suppresses edge breakdown and prevents premature breakdown caused by concentrated edge electric fields. A silicon dioxide passivation layer 40 nm to 60 nm thick covers the photosensitive surface; it significantly reduces dark counts caused by surface states and improves the detector's signal-to-noise ratio. To further improve the optical fill factor, a microlens array placed above the silicon dioxide passivation layer focuses incident light onto the effective photosensitive area. By optimizing the electric field distribution structure, the electric field intensity is distributed uniformly throughout the avalanche region, avoiding the extra noise caused by excessively strong local fields. Together, the deep-well isolation structure, passivation layer, and electric field optimization structure keep the dark count rate of a single pixel below 100 cps, meeting the requirements of high-sensitivity detection.
[0029] As shown in Figure 2, the output of the single-photon avalanche diode in each pixel unit is connected to a hybrid quenching circuit. The hybrid quenching circuit adopts a parallel topology with two branches, a resistive delay quenching branch and an active quenching branch, plus a MOSFET active reset unit connected to the outputs of both. The resistive delay quenching branch uses a polysilicon quenching resistor of 20 kΩ to 50 kΩ, which provides initial rapid quenching within nanoseconds when an avalanche occurs. The active quenching branch includes quenching control logic and a controlled switch: after detecting the avalanche current, the quenching control logic turns on the controlled switch within a preset delay, ensuring complete quenching of the avalanche and effectively suppressing afterpulsing. The MOSFET active reset unit uses an NMOSFET with a width-to-length ratio greater than 100:1; after quenching, it quickly pulls the cathode voltage of the single-photon avalanche diode back to the over-biased state, achieving a reset in less than 3 nanoseconds. Compared with the reset time of more than 100 nanoseconds of traditional passive quenching circuits, this improves the reset speed by more than 30 times, significantly increasing the maximum count rate and dynamic range of the detector.
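The standard non-paralyzable dead-time model, m = n / (1 + n·τ), shows why a shorter reset raises the count-rate ceiling 1/τ. Treating the reset time as the whole dead time τ is an assumption (quench hold-off would add to it in practice), and the 50 Mcps incident rate is an arbitrary example.

```python
def detected_rate(incident_cps: float, dead_time_s: float) -> float:
    """Non-paralyzable dead-time model: m = n / (1 + n * tau)."""
    return incident_cps / (1.0 + incident_cps * dead_time_s)


# Compare a 100 ns passive-quench dead time with the 3 ns active reset.
for tau in (100e-9, 3e-9):
    ceiling = 1.0 / tau                 # count-rate limit as n -> infinity
    m = detected_rate(50e6, tau)        # 50 Mcps incident: arbitrary example
    print(f"tau = {tau * 1e9:.0f} ns: ceiling {ceiling / 1e6:.0f} Mcps, "
          f"50 Mcps incident -> {m / 1e6:.1f} Mcps detected")
```

Under this model the ceiling rises from 10 Mcps at 100 ns to about 333 Mcps at 3 ns.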
[0030] As shown in Figure 2, the output of the hybrid quenching circuit is connected to a time-to-digital converter (TDC), which uses a two-stage architecture combining a coarse timing module and a vernier caliper fine timing module. The coarse timing module uses an 8-phase clock sampling architecture: eight sampling clocks, each offset by 1/8 of the clock period, coarsely quantize the time interval between the laser synchronization signal and the photon arrival signal. The vernier caliper fine timing module includes a 32-stage ring oscillator that finely interpolates the quantization residual of the coarse timing module; a precise time measurement is obtained from the position the oscillation has reached in the ring oscillator when the photon arrives. The time interpolation calculation unit combines the outputs of the coarse and fine timing modules with a weighted algorithm to compute the final timestamp, achieving a single-shot measurement accuracy better than 55 ps.
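The 8-phase coarse stage can be illustrated by modeling the eight sampled clock levels at the photon arrival time and decoding them back into a phase index. The sketch assumes 50 % duty-cycle clocks delayed by k·T/8; under that assumption the samples form a rotating pattern of four ones and four zeros, and the phase index is the position of the 1-to-0 transition.

```python
N_PHASES = 8


def sample_phases(u: float) -> list[int]:
    """Model the 8 sampled clock levels at a hit.

    u is the hit time modulo the clock period, in units of T/8 (0 <= u < 8).
    Clock k is delayed by k*T/8 and assumed to have a 50 % duty cycle,
    so it reads high when (u - k) mod 8 falls in its first half period.
    """
    return [1 if ((u - k) % N_PHASES) < N_PHASES / 2 else 0
            for k in range(N_PHASES)]


def decode_phase(bits: list[int]) -> int:
    """Return floor(u): the phase whose sample is 1 while the next phase's is 0."""
    for k in range(N_PHASES):
        if bits[k] == 1 and bits[(k + 1) % N_PHASES] == 0:
            return k
    raise ValueError("not a valid rotating thermometer code")
```

For example, a hit at u = 5.9 samples as [0, 0, 1, 1, 1, 1, 0, 0], which decodes to phase 5.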
[0031] To ensure the stability of the time-to-digital converter's measurement accuracy over a wide temperature range and under different operating conditions, this invention provides a multi-dimensional dynamic compensation mechanism. The temperature compensation circuit includes a high-precision temperature sensor integrated into the pixel area and a compensation calculation unit that executes a piecewise linear fitting algorithm. During the calibration phase, the system acquires nonlinear error data at multiple temperature calibration points across the entire temperature range of -40°C to 125°C and builds a temperature-error lookup table. During operation, the compensation calculation unit interpolates the stored calibration coefficients based on the real-time temperature, dynamically correcting the measured values so that the temperature drift coefficient is less than 0.1 ps/°C. During system power-on, the nonlinear calibration module uses a stable reference clock to measure and store the actual propagation delay of each stage of the 32-stage ring oscillator, capturing the delay mismatch caused by process deviations. During operation, nonlinear compensation is applied to the timestamps based on the stored delay values to eliminate systematic errors in the ring oscillator. The dynamic resolution switching module adaptively switches between a 12-bit high-resolution mode and a 9-bit low-resolution mode based on ambient light intensity and real-time signal-to-noise ratio. In low-light, high-precision scenarios, the system uses the 12-bit high-resolution mode to achieve optimal measurement accuracy. In strong-light or low-power scenarios, it automatically switches to the 9-bit low-power mode, reducing power consumption by 40%. The mode switching latency is less than 100 nanoseconds, so the system can respond quickly to environmental changes.
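The nonlinear calibration amounts to replacing the ideal uniform stage width with the measured per-stage delays. The sketch below builds a cumulative lookup table from those delays (how they are measured is left abstract) and reports, per stage, the error a decoder assuming equal stages would make. The delay values used in the example are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


def build_fine_lut(stage_delays_ps: Sequence[float]) -> List[float]:
    """Cumulative delay at the start of each stage: lut[m] = sum(delays[:m]).

    Built once at power-on from the measured per-stage delays.
    """
    return [0.0] + list(accumulate(stage_delays_ps))[:-1]


def fine_time_ps(stage: int, lut: Sequence[float],
                 stage_delays_ps: Sequence[float]) -> float:
    """Calibrated fine time: start of the stage plus half its measured width."""
    return lut[stage] + stage_delays_ps[stage] / 2.0


def inl_ps(stage_delays_ps: Sequence[float]) -> List[float]:
    """Error an uncalibrated (uniform-LSB) decoder would make at each stage centre."""
    lsb = sum(stage_delays_ps) / len(stage_delays_ps)
    lut = build_fine_lut(stage_delays_ps)
    return [(m + 0.5) * lsb - fine_time_ps(m, lut, stage_delays_ps)
            for m in range(len(stage_delays_ps))]
```

For hypothetical stage delays of 10, 12, 8 and 10 ps, the uncalibrated error at each stage centre is 0, -1, -1 and 0 ps; a real 32-stage table works the same way.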
[0032] The output of the time-to-digital converter is connected to the local preprocessing circuit, which is the core unit for achieving pixel-level intelligent preprocessing. The local preprocessing circuit includes a count threshold register, a multi-cycle correlation state machine, event determination logic, and a data packaging unit. The count threshold register stores a configurable photon count threshold, ranging from 4 to 16, which can be adjusted according to the application scenario. The multi-cycle correlation state machine counts photon events falling within the same time unit over N consecutive laser cycles, where N ranges from 2 to 8. By setting the multi-cycle correlation mechanism, the system can effectively distinguish between random noise photons and valid signal photons. The event determination logic continuously monitors the count value of the multi-cycle correlation state machine; when the count value exceeds the preset threshold in the count threshold register, it outputs a valid event flag. In response to the valid event flag, the data packaging unit packages the corresponding timestamp, cumulative photon count, and the current pixel coordinates into a simplified data packet, which is then transmitted to the processing layer via DMA. The local preprocessing circuit filters out random noise photon events at the source by performing spatiotemporal correlation screening at the pixel level. Simulations show that this mechanism can reduce the amount of invalid data uploaded by 85% to 95% in typical urban scenarios, greatly alleviating the pressure on the on-chip bus bandwidth.
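Why multi-cycle correlation rejects noise can be quantified with a simple model: if uncorrelated noise (ambient light plus dark counts) arrives as a Poisson process at rate λ, a time unit of width Δt accumulates on average λ·Δt·N noise counts over N cycles, and the chance that noise alone exceeds the threshold is a Poisson tail. The Poisson model and the example rates are assumptions; this does not reproduce the source's 85% to 95% simulation figure.

```python
import math


def p_exceeds(mean: float, threshold: int) -> float:
    """P(X > threshold) for X ~ Poisson(mean)."""
    if mean <= 0.0:
        return 0.0
    if mean > threshold:
        # The tail is not small here, so 1 - CDF has no cancellation problem.
        term = math.exp(-mean)
        cdf = term
        for k in range(1, threshold + 1):
            term *= mean / k
            cdf += term
        return 1.0 - cdf
    # Small tail: sum it directly, starting from P(X = threshold + 1) in log space.
    k = threshold + 1
    term = math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))
    total = 0.0
    while term > total * 1e-17:
        total += term
        k += 1
        term *= mean / k
    return total


def noise_pass_probability(noise_cps: float, unit_s: float,
                           n_cycles: int, threshold: int) -> float:
    """Chance that uncorrelated noise alone pushes one time unit over the threshold."""
    return p_exceeds(noise_cps * unit_s * n_cycles, threshold)
```

For example, with a hypothetical noise rate of 10 Mcps, 200 ps time units, N = 4 and threshold 4, the mean is 0.008 counts and the tail probability is on the order of 1e-13 per time unit.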
[0033] The photosensitive layer also includes an array control module for unified management and control of the entire single-photon avalanche diode array. A bias voltage control unit supplies an adjustable overbias voltage to the single-photon avalanche diode array and adjusts it dynamically based on temperature feedback to keep the gain stable. The addressable controller includes row gating circuits and column readout circuits, allowing any pixel or pixel partition in the array to be addressed independently through the row and column decoders. A dynamic partition management unit divides the array into multiple independently enableable partitions according to scenario requirements. The partition granularity is configured through registers, from 32×24 to 128×96. The system can activate only the areas an application needs, achieving fine-grained power management and flexible detection strategies.
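A partition scheme like this reduces to integer division of pixel coordinates by the partition size. The sketch below assumes that the configured granularity is the partition size in pixels and that it tiles the array exactly; `partition_grid`, `pixel_enabled`, and the "middle half" rule in `central_partitions` are hypothetical helpers for illustration.

```python
from itertools import product


def partition_grid(array_w: int, array_h: int,
                   part_w: int, part_h: int) -> tuple[int, int]:
    """Return (columns, rows) of partitions; assumes the partition tiles the array."""
    if array_w % part_w or array_h % part_h:
        raise ValueError("partition size must divide the array size")
    return array_w // part_w, array_h // part_h


def pixel_enabled(x: int, y: int, part_w: int, part_h: int,
                  enabled: set[tuple[int, int]]) -> bool:
    """A pixel is powered only if the partition containing it is enabled."""
    return (x // part_w, y // part_h) in enabled


def central_partitions(cols: int, rows: int) -> set[tuple[int, int]]:
    """Hypothetical policy: enable the middle half of the grid in each direction."""
    c0, r0 = cols // 4, rows // 4
    return set(product(range(c0, cols - c0), range(r0, rows - r0)))
```

For example, a 256×192 array split into 64×48 partitions gives a 4×4 grid; enabling the four central partitions powers 4 × 64 × 48 = 12,288 of 49,152 pixels, one quarter of the array.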
[0034] The processing layer is located above the photosensitive layer and achieves high-speed data transmission with the photosensitive layer through a through-silicon via interconnect array. The core of the processing layer is a direct time-of-flight algorithm accelerator with a heterogeneous parallel architecture. This accelerator includes four dedicated computing cores: a histogram construction core, a multi-echo detection core, a point cloud filtering core, and a data scheduling core. Each core is deeply optimized for the lidar data processing pipeline.
[0035] The histogram construction core receives simplified data packets from the photosensitive layer and accumulates valid events into the corresponding time units according to their timestamps, generating a sparse histogram. This core uses a 1024-bin SRAM architecture with a dynamically adjustable bin width of 55 ps to 200 ps. The histogram construction core incorporates a hardware scene analyzer consisting of a lightweight state machine and dedicated computing units. After system power-on or a scene reset, a rapid scan is first performed with a default wide bin width (e.g., 200 ps) and a limited number of integration frames (e.g., 10 frames) to generate an initial histogram. The scene analyzer then performs hardware-accelerated analysis of the initial histogram: it detects valid echo peaks and calculates their distribution range; if the distribution range is less than 1/4 of the total measurement range, the scene is classified as a peak-concentrated scene; at the same time, it calculates the signal-to-noise ratio of the strongest signal peak. Based on these results, the hardware logic switches modes automatically: if the peaks are concentrated and the signal-to-noise ratio (SNR) is above the high-precision mode threshold, the bin width is switched to the 55 ps high-precision mode; if the SNR is below the preset threshold, the number of integration frames is increased dynamically to improve the SNR. All mode switching is performed automatically by the scene analyzer hardware without processor intervention.
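Accumulating packets into a sparse histogram can be sketched with a dictionary that stores only non-empty bins. This is a minimal illustration, assuming each packet reduces to a `(timestamp_ps, photon_count)` pair and that events falling past the 1024-bin window are dropped; `build_sparse_histogram` is a hypothetical name.

```python
from collections import defaultdict

N_BINS = 1024  # histogram depth from the description


def build_sparse_histogram(events, bin_width_ps: int) -> dict[int, int]:
    """events: iterable of (timestamp_ps, photon_count) from valid-event packets.
    Returns {bin_index: accumulated count}, storing only non-empty bins."""
    if not 55 <= bin_width_ps <= 200:
        raise ValueError("bin width must be within 55..200 ps")
    hist = defaultdict(int)
    for ts_ps, count in events:
        b = ts_ps // bin_width_ps
        if 0 <= b < N_BINS:          # events beyond the 1024-bin window are dropped
            hist[b] += count
    return dict(hist)
```

Because the pixel layer has already discarded uncorrelated photons, only a handful of bins per pixel are ever touched, which is what makes a sparse representation worthwhile.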
[0036] The multi-echo detection kernel integrates three-level peak lookup logic that performs multi-level peak detection on the sparse histogram and extracts the times of flight of up to three echoes. For complex echo situations caused by highly reflective and transparent targets, the multi-echo detection kernel calculates a confidence score for each detected echo. The score is a weighted combination of several factors, such as peak signal-to-noise ratio, pulse width, and consistency with neighboring pixels, and provides a reliable basis for subsequent point cloud filtering.
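The description does not define the three-level peak lookup or the scoring weights, so the following is only a stand-in: a plain local-maximum search that keeps the three strongest peaks above a noise floor, and a weighted score on a 0 to 100 scale. The normalisations, the weights `(0.5, 0.2, 0.3)`, and the function names are all hypothetical.

```python
def find_echoes(hist: dict[int, int], noise_floor: float,
                max_echoes: int = 3) -> list[int]:
    """Bins of local maxima above noise_floor, strongest first, at most max_echoes."""
    peaks = [b for b, c in hist.items()
             if c > noise_floor
             and c >= hist.get(b - 1, 0)   # on a plateau, the right-most bin wins
             and c > hist.get(b + 1, 0)]
    peaks.sort(key=lambda b: hist[b], reverse=True)
    return peaks[:max_echoes]


def confidence(snr_db: float, width_bins: float, expected_width_bins: float,
               neighbor_agreement: float,
               weights: tuple[float, float, float] = (0.5, 0.2, 0.3)) -> int:
    """Weighted 0..100 score from SNR, pulse-width match and neighbour agreement.
    The normalisations and weights are hypothetical placeholders."""
    snr_term = min(max(snr_db / 20.0, 0.0), 1.0)          # saturates at 20 dB
    width_term = max(0.0, 1.0 - abs(width_bins - expected_width_bins)
                     / expected_width_bins)
    neighbor_term = min(max(neighbor_agreement, 0.0), 1.0)
    w_snr, w_width, w_nb = weights
    return round(100 * (w_snr * snr_term + w_width * width_term
                        + w_nb * neighbor_term))
```

A weighted sum keeps each factor's influence explicit and easy to tune in registers, which is one plausible reason for the multi-dimensional weighting the description mentions.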
[0037] The point cloud filtering kernel implements motion blur compensation and rain/snow noise filtering in hardware. Using the confidence scores output by the multi-echo detection kernel and the spatiotemporal consistency of adjacent pixels, this kernel removes noise points and spurious echoes and generates high-quality point cloud data. In dynamic scenes, the point cloud error can be kept within ±3 cm.
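A simple software analogue of confidence-plus-neighbourhood filtering is shown below. It assumes a confidence cut of 30 (the value used in embodiment [0047]) and a hypothetical spatial rule: an echo survives only if at least two echoes in its 8-neighbourhood lie within 0.3 m of it. The rule, its parameters, and `filter_points` are illustrative assumptions; motion blur compensation is not modelled.

```python
def filter_points(points: dict[tuple[int, int], list[tuple[float, int]]],
                  min_conf: int = 30, max_dr_m: float = 0.3,
                  min_support: int = 2) -> dict[tuple[int, int], list[tuple[float, int]]]:
    """points: {(x, y): [(range_m, confidence), ...]}, up to 3 echoes per pixel.
    Keep echoes that pass the confidence cut and agree with enough neighbours."""
    out = {}
    for (x, y), echoes in points.items():
        kept = []
        for r, conf in echoes:
            if conf < min_conf:
                continue
            # Count neighbouring echoes (any confidence) at a similar range.
            support = sum(
                1
                for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)
                for rn, _ in points.get((x + dx, y + dy), [])
                if abs(rn - r) <= max_dr_m)
            if support >= min_support:
                kept.append((r, conf))
        if kept:
            out[(x, y)] = kept
    return out
```

An isolated return, or a second echo at a range no neighbour confirms, is removed even when its own confidence is high; this is how spatial consistency complements the per-echo score.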
[0038] The data scheduling core adopts a ring buffer management and zero-copy transmission mechanism to efficiently output the filtered point cloud data to the communication layer, ensuring that the end-to-end latency of the point cloud output is less than 10 microseconds.
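Ring buffering with zero-copy hand-off can be illustrated in Python with a preallocated `bytearray` and `memoryview` slices, which give producer and consumer direct access to the same storage without copying. This is a minimal single-threaded sketch; the slot layout and the `RingBuffer` API are hypothetical and say nothing about the chip's actual DMA descriptors.

```python
from typing import Optional


class RingBuffer:
    """Fixed ring of equal-size slots over one preallocated buffer. Writers and
    readers receive memoryview slices of that buffer, so no bytes are copied."""

    def __init__(self, n_slots: int, slot_bytes: int):
        self._buf = bytearray(n_slots * slot_bytes)
        self._n = n_slots
        self._size = slot_bytes
        self._head = 0    # next slot to fill
        self._tail = 0    # next slot to drain
        self._count = 0   # filled slots not yet released

    def _slot(self, index: int) -> memoryview:
        off = index * self._size
        return memoryview(self._buf)[off:off + self._size]

    def acquire_write(self) -> Optional[memoryview]:
        """Slot for the producer to fill in place, or None if the ring is full."""
        return None if self._count == self._n else self._slot(self._head)

    def commit_write(self) -> None:
        if self._count == self._n:
            raise RuntimeError("commit without a free slot")
        self._head = (self._head + 1) % self._n
        self._count += 1

    def acquire_read(self) -> Optional[memoryview]:
        """Oldest filled slot for the consumer, or None if the ring is empty."""
        return None if self._count == 0 else self._slot(self._tail)

    def release_read(self) -> None:
        if self._count == 0:
            raise RuntimeError("release without a filled slot")
        self._tail = (self._tail + 1) % self._n
        self._count -= 1
```

Returning `None` when the ring is full gives the producer explicit back-pressure instead of silently overwriting point cloud data that has not been sent yet.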
[0039] As shown in Figure 1, the processing layer also includes a Gaussian fitting acceleration unit based on a RISC-V instruction set extension, designed specifically to accelerate the Gaussian fitting algorithm at the core of direct time-of-flight processing. This acceleration unit defines a dedicated VGFIT instruction encoded in the RISC-V custom-0 opcode space. The function code specifies the Gaussian fitting iteration type, source register rs1 holds the sampled data vector, rs2 holds the current fitting parameters, and destination register rd receives the Jacobian matrix contribution term. The dedicated functional unit includes a parallel multiplier array and an exponential function module implemented with lookup tables and linear interpolation. The VGFIT instruction executes in a 5-stage pipeline, so a single instruction computes the contribution of one data point to the iterative equation within 5 clock cycles. To increase data throughput, the extension supports 128-bit vector registers, each holding four 32-bit single-precision floating-point data points, enabling single-instruction multiple-data operation. While a VGFIT instruction is executing, the processor core's standard pipeline continues to execute subsequent instructions that do not depend on its result, achieving efficient parallelism between custom instructions and general-purpose computation.
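The per-point work that VGFIT offloads can be illustrated with a Gauss-Newton fit of the model A·exp(-(t-μ)²/(2σ²)), where each sample contributes J·Jᵀ and J·r to the normal equations. The sketch assumes Gauss-Newton as the iteration scheme, and its lookup-table exponential (step 1/32, linear interpolation) only mimics the hardware unit's approach; table size and all function names are hypothetical. It processes one sample per call in double precision, whereas the described unit handles four single-precision points per instruction.

```python
import math

# Stand-in for the hardware exponential unit: exp(-x) for 0 <= x < 16 from a
# lookup table with step 1/32, refined by linear interpolation (hypothetical sizing).
_STEP = 1.0 / 32
_LUT = [math.exp(-k * _STEP) for k in range(16 * 32 + 1)]


def exp_neg(x: float) -> float:
    if x >= 16.0:
        return 0.0
    pos = x / _STEP
    k = int(pos)
    return _LUT[k] + (_LUT[k + 1] - _LUT[k]) * (pos - k)


def vgfit_contribution(t, y, params):
    """One sample's contribution (J J^T, J r) to the Gauss-Newton normal
    equations for the model A * exp(-(t - mu)^2 / (2 * sigma^2))."""
    a, mu, sigma = params
    u = (t - mu) / sigma
    g = exp_neg(0.5 * u * u)
    jac = (g, a * g * u / sigma, a * g * u * u / sigma)   # d/dA, d/dmu, d/dsigma
    r = y - a * g
    return ([[jac[i] * jac[k] for k in range(3)] for i in range(3)],
            [jac[i] * r for i in range(3)])


def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, b):
    """Cramer's rule for the 3x3 normal equations."""
    d = _det3(m)
    return [_det3([[b[r] if k == c else m[r][k] for k in range(3)]
                   for r in range(3)]) / d for c in range(3)]


def gauss_fit(samples, params, iterations=10):
    """Gauss-Newton fit; the inner loop issues one VGFIT-style step per sample."""
    for _ in range(iterations):
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        for t, y in samples:
            cj, cr = vgfit_contribution(t, y, params)
            for i in range(3):
                jtr[i] += cr[i]
                for k in range(3):
                    jtj[i][k] += cj[i][k]
        delta = _solve3(jtj, jtr)
        params = tuple(p + d for p, d in zip(params, delta))
    return params
```

Only the accumulation inside the sample loop is per-point work; the small 3×3 solve happens once per iteration, which is why it makes sense to put the per-sample step in a custom instruction and leave the rest to the general-purpose pipeline.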
[0040] The 3D stacked architecture employs a carefully designed inter-layer coordination scheme. The through-silicon via (TSV) interconnect array has a density of 10,000 per mm², and its signal layout is heterogeneous, tailored to the characteristics of the LiDAR data stream. Timestamp signals are allocated to dedicated differential TSV channels with a transmission rate of 5 Gbps, ensuring reliable transmission of high-precision timing information. The configuration bus uses a 32-bit parallel architecture to guarantee real-time SPAD partition addressing. Signal, power, and ground TSVs are allocated in a 16:1:1 ratio to ensure signal integrity and power supply stability.
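As a rough illustration of the 16:1:1 allocation, assuming the ratio holds uniformly across the array, the density of 10,000 TSVs per mm² splits approximately as follows:

```latex
N_{\mathrm{sig}} = \frac{16}{16+1+1} \times 10\,000\ \mathrm{mm^{-2}} \approx 8\,889\ \mathrm{mm^{-2}},
\qquad
N_{\mathrm{pwr}} = N_{\mathrm{gnd}} = \frac{1}{18} \times 10\,000\ \mathrm{mm^{-2}} \approx 556\ \mathrm{mm^{-2}}
```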
[0041] A 2μm thick tantalum-based shielding layer is placed between the photosensitive layer and the processing layer, providing a shielding effectiveness greater than 40dB, effectively isolating electromagnetic interference between analog and digital circuits. Combined with deep N-well isolation technology, the power domains of the photosensitive layer's analog circuitry and the processing layer's digital circuitry are completely separated, reducing the coupling effect of digital noise on the SPAD from 15% to 3%.
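Shielding effectiveness in decibels is conventionally defined from the ratio of field amplitude without and with the shield. Applying that standard definition to the 40 dB figure gives the factors below. The choice of field quantity is an assumption, since the description does not say how the value was measured.

```latex
\mathrm{SE} = 20\log_{10}\frac{E_{\mathrm{unshielded}}}{E_{\mathrm{shielded}}}\ \mathrm{dB}
\quad\Rightarrow\quad
40\ \mathrm{dB} \;\Leftrightarrow\; \frac{E_{\mathrm{unshielded}}}{E_{\mathrm{shielded}}} = 10^{40/20} = 100,
\qquad
\frac{P_{\mathrm{unshielded}}}{P_{\mathrm{shielded}}} = 10^{40/10} = 10^{4}
```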
[0042] Interlayer PLL clock alignment circuits and through-silicon via (TSV) delay correction circuits are implemented to ensure that timing constraints are met throughout the entire path. The PLL clock alignment circuit controls clock skew to within 15 ps. The TSV delay correction circuit measures the propagation delay of each TSV upon system power-up and stores it in on-chip memory. During operation, it dynamically compensates for delay differences caused by process variations using a delay-locked loop, keeping timing deviations on critical TDC paths to within 5 ps.
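One way to picture the TSV delay correction is to pad every path toward the slowest one using integer delay-line taps. The sketch assumes this pad-to-slowest strategy and a fixed delay step per tap. Both the strategy and the names `dll_taps` and `residual_skew_ps` are hypothetical, and the actual circuit uses a delay-locked loop whose control law is not described.

```python
def dll_taps(delays_ps: list[float], step_ps: float) -> list[int]:
    """Integer delay taps that pad every TSV path toward the slowest one."""
    target = max(delays_ps)
    return [round((target - d) / step_ps) for d in delays_ps]


def residual_skew_ps(delays_ps: list[float], taps: list[int],
                     step_ps: float) -> float:
    """Spread of path delays after compensation (at most step_ps here)."""
    aligned = [d + n * step_ps for d, n in zip(delays_ps, taps)]
    return max(aligned) - min(aligned)
```

For measured delays of 100, 112, 95, and 108 ps and an assumed 2 ps step, the taps are 6, 0, 8, and 2, and the residual skew is 1 ps. This is comfortably inside the 5 ps budget quoted for critical TDC paths.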
[0043] The communication layer features a high-speed bus and direct memory access controller for low-latency data transmission. This layer integrates a multi-sensor fusion hardware interface unit, including a global timestamp counter with nanosecond-level precision, providing a unified time reference for all input data. A MIPI CSI-2 interface is integrated into the hardware for connecting to an onboard camera, enabling precise synchronization of laser emission and camera exposure via hardware trigger signals. A dedicated SPI / I2C interface is also provided for connecting to the IMU, supporting hardware-level fusion of point cloud data and inertial navigation data.
[0044] As a specific embodiment, the fully digital lidar processing architecture of the present invention is applied to the forward-facing main lidar of an autonomous vehicle. The photosensitive layer uses a single-photon avalanche diode array of 256×192 pixels. The single-photon avalanche diode of each pixel has a 40 nm thick silicon dioxide passivation layer, and its dark count rate is held at 80 cps. In the hybrid quenching circuit, the polysilicon quenching resistor is 30 kΩ, the NMOSFET of the MOSFET active reset unit has a width-to-length ratio of 120:1, and the reset time is approximately 2.5 nanoseconds.
[0045] The coarse timing module of the time-to-digital converter uses a 200 MHz 8-phase clock with a coarse quantization resolution of 625 ps. The vernier fine timing module uses a 32-stage ring oscillator operating at approximately 570 MHz, giving a fine interpolation resolution of approximately 55 ps. The temperature compensation circuit uses 8-segment linear fitting over -40°C to 125°C, with a measured temperature drift coefficient of 0.08 ps/°C. The dynamic resolution switching module uses the 12-bit high-resolution mode when the signal-to-noise ratio is greater than 15 dB and switches to the 9-bit low-power mode when it is less than 8 dB, with a switching delay of approximately 80 nanoseconds.
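The embodiment's numbers can be checked directly: a 200 MHz clock split into 8 phases gives 625 ps steps, and one period of a roughly 570 MHz ring divided over 32 stages is about 54.8 ps per stage. The sketch below also shows two assumptions about how these parts might be used. First, the timestamp adds the coarse count to the calibrated delays of the ring stages traversed, which is how the power-on nonlinear calibration would enter. Second, between 8 dB and 15 dB the current resolution mode is kept, giving hysteresis. The function names are hypothetical.

```python
COARSE_LSB_PS = 1e12 / (200e6 * 8)       # 200 MHz clock, 8 phases -> 625 ps
FINE_NOMINAL_PS = 1e12 / (570e6 * 32)    # ~570 MHz, 32 stages -> ~54.8 ps


def timestamp_ps(coarse_code: int, fine_stages: int,
                 stage_delays_ps: list[float]) -> float:
    """Coarse phase count plus the calibrated delays of the ring stages traversed."""
    return coarse_code * COARSE_LSB_PS + sum(stage_delays_ps[:fine_stages])


def next_mode(snr_db: float, current: str) -> str:
    """12-bit above 15 dB, 9-bit below 8 dB, otherwise keep the current mode."""
    if snr_db > 15.0:
        return "12-bit"
    if snr_db < 8.0:
        return "9-bit"
    return current
```

Using measured per-stage delays instead of the nominal 54.8 ps is what removes the ring oscillator's systematic nonlinearity. The gap between the two thresholds prevents the mode from toggling when the SNR hovers near a single boundary.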
[0046] The count threshold register of the local preprocessing circuit is set to 8, and the number of cycles N of the multi-cycle correlation state machine is set to 4. A timestamp is therefore marked as a valid event only when the cumulative photon count in the same time unit over four consecutive laser cycles exceeds 8. In typical urban driving scenarios, this configuration reduces invalid data uploads by approximately 92%.
[0047] The histogram construction kernel of the processing layer is configured with a bin depth of 1024 and an initial bin width of 200 ps. When the hardware scene analyzer detects a scene with concentrated peaks and a signal-to-noise ratio greater than 12 dB, it automatically switches the bin width to 55 ps. The multi-echo detection kernel can extract up to 3 echoes in a single frame, with each echo having a confidence score ranging from 0 to 100. Echoes with a confidence score below 30 are filtered by the point cloud filtering kernel.
[0048] The 3D stack employs a through-silicon via (TSV) interconnect array with a density of 12,000 TSVs / mm², a tantalum-based shielding layer thickness of 2.2 μm, and a measured shielding effectiveness of 45 dB. The VGFIT instruction throughput of the Gaussian fitting acceleration unit is approximately 200 million iterations per second.
[0049] The overall system performance is as follows. At a distance of 100 m in clear weather with ambient light of 50,000 lux, the single-point ranging accuracy is ±1.5 cm (1σ), the point cloud frame rate is 20 Hz, and typical power consumption is 3.5 W. In a close-range indoor scene at 5 m, the ranging accuracy reaches ±5 mm, the point cloud frame rate is 30 Hz, and power consumption is 1.8 W. At a long range of 100 m in rain or snow, the ranging accuracy is ±3 cm, the point cloud frame rate is 10 Hz, and power consumption is 4.2 W. The end-to-end latency of the point cloud output is consistently less than 8 microseconds.
[0050] In another specific embodiment, the present invention is applied to industrial AGV navigation. Given the weak indoor ambient light and short working distances, the array partitions are configured at a medium granularity of 64×48, and only the pixels in the central area of the array are activated to reduce power consumption. The count threshold register is set to 6 and the number of cycles N is set to 3 to improve detection sensitivity for weak signals at close range. The time-to-digital converter operates in 12-bit high-resolution mode with a fixed bin width of 55 ps. System power consumption is approximately 1.2 W, and the ranging accuracy is better than ±5 mm over a working distance of 3 m to 15 m, meeting the requirements for precise AGV positioning.
[0051] In the third specific embodiment, the present invention is applied to an obstacle avoidance system for unmanned aerial vehicles (UAVs). To meet the stringent requirements of UAVs regarding size and power consumption, a relatively small array with a resolution of 128×96 pixels is used, with the array partitioned into fine-grained 32×24 partitions. The dynamic partition management unit automatically activates pixels in the forward region based on the UAV's flight direction, while pixels in the lateral and rearward regions remain dormant. The time-to-digital converter primarily operates in a 9-bit low-power mode, with a system power consumption of approximately 0.8W. The measurement distance ranges from 0.5m to 50m, meeting the real-time requirements for close-range obstacle avoidance by UAVs, and the point cloud frame rate can reach 30 Hz.
[0052] As shown in Figure 3, the lidar data processing method based on the architecture of this invention proceeds as follows. Pixel-level preprocessing is performed within each pixel of the photosensitive layer. First, photon arrival events are detected: when an avalanche occurs in the single-photon avalanche diode, the hybrid quenching circuit quenches the avalanche and performs a fast reset, and the time-to-digital converter outputs a photon arrival timestamp. Next, the multi-cycle correlation state machine of the local preprocessing circuit determines whether the cumulative photon count of the time unit corresponding to the timestamp over N consecutive laser cycles exceeds the threshold in the count threshold register. If it does, the event is judged valid, and the data packaging unit generates a simplified data packet containing the timestamp, photon count, and pixel coordinates and uploads it to the processing layer. If it does not, the event is judged to be noise and the timestamp is discarded.
[0053] As shown in Figure 3, sparse data processing is performed at the processing layer: the histogram construction kernel constructs a sparse histogram based solely on the received simplified data packets, and the hardware scene analyzer adaptively adjusts the bin width and integration time according to the histogram characteristics; the multi-echo detection kernel performs peak detection on the sparse histogram, extracts multiple echoes using a three-level peak search logic, and calculates the confidence score of each echo; the point cloud filtering kernel filters out noise points and spurious echoes based on the confidence score and spatiotemporal consistency, generating the final high-quality point cloud data; the data scheduling kernel outputs the point cloud data to the communication layer through a zero-copy transmission mechanism, completing the entire data processing link.
[0054] The adaptive adjustment process of the hardware scene analyzer is as follows: An initial fast scan is performed with a default wide bin width and a relatively small number of integration frames to generate an initial histogram; peak detection is performed on the initial histogram to determine if the distribution range of effective peaks is less than 1 / 4 of the total measurement range. If so, it is determined to be a peak concentration scene; the signal-to-noise ratio (SNR) of the strongest signal peak is calculated; if the peak concentration is satisfied and the SNR is higher than the high-precision mode threshold, the bin width is switched to 55ps for high-precision mode; otherwise, the 200ps wide bin width mode is maintained; if the SNR is lower than the preset threshold, the number of integration frames is dynamically increased to improve the SNR. This adaptive adjustment process is entirely executed automatically by the hardware, achieving a dynamic optimal balance between processing accuracy, measurement range, and processing speed.
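The decision logic of the scene analyzer can be summarised as a small pure function. In this sketch, the peak spread is measured in bins of the initial histogram against the total bin count, and a 12 dB high-precision threshold is taken from embodiment [0047]. The "preset" low-SNR threshold, the frame increment, and the frame cap are placeholders, and `AnalyzerConfig` and `analyze` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalyzerConfig:
    wide_bin_ps: int = 200
    fine_bin_ps: int = 55
    high_precision_snr_db: float = 12.0   # value used in embodiment [0047]
    min_snr_db: float = 6.0               # hypothetical "preset threshold"
    frame_step: int = 10                  # hypothetical frame increment
    max_frames: int = 80                  # hypothetical upper bound


def analyze(peak_bins: list[int], strongest_snr_db: float, total_bins: int,
            frames: int, cfg: AnalyzerConfig = AnalyzerConfig()) -> tuple[int, int]:
    """Return (bin_width_ps, integration_frames) for the next acquisition."""
    spread = max(peak_bins) - min(peak_bins) if peak_bins else total_bins
    concentrated = spread < total_bins / 4
    if concentrated and strongest_snr_db > cfg.high_precision_snr_db:
        width = cfg.fine_bin_ps       # 55 ps high-precision mode
    else:
        width = cfg.wide_bin_ps       # stay in 200 ps wide-bin mode
    if strongest_snr_db < cfg.min_snr_db:
        frames = min(frames + cfg.frame_step, cfg.max_frames)
    return width, frames
```

Capping the frame count is an added safeguard in this sketch so that a persistently weak scene cannot stretch integration time without bound.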
[0055] The above-described specific embodiments are merely preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Various modifications, substitutions, and improvements made by those skilled in the art to the technical solutions of the present invention based on the provided textual description and drawings, without departing from the design concept and spirit of the present invention, should all fall within the scope of protection of the present invention. The scope of protection of the present invention is determined by the claims.
Claims
1. A fully digital processing on-chip system architecture for lidar, characterized in that it includes a photosensitive layer, a processing layer, and a communication layer; The layers are stacked in three dimensions through a through-silicon via interconnect array, forming an integrated sensing, storage, and computing data processing link; The photosensitive layer includes a two-dimensional addressable single-photon avalanche diode array, with each pixel unit integrating: A single-photon avalanche diode employing a planar PN junction design and covered with a passivation layer; The output terminal of the single-photon avalanche diode is connected to a hybrid quenching circuit, which includes a resistive delayed quenching branch and an active quenching branch connected in parallel, as well as a MOSFET active reset unit connected to the output terminals of both. The output of the hybrid quenching circuit is connected to a time-to-digital converter, which adopts an architecture that combines a coarse timing module and a vernier fine timing module. A local preprocessing circuit is connected to the output of the time-to-digital converter; The local preprocessing circuit includes a counting threshold register and a multi-cycle correlation state machine, configured to: perform continuous multi-cycle correlation judgment on the timestamps output by the time-to-digital converter; and only when the photon count in the same time unit within N consecutive laser cycles exceeds the preset threshold in the counting threshold register, mark the timestamp as a valid event and generate a simplified data packet containing the timestamp, photon count, and pixel coordinates. The processing layer receives the simplified data packets from the photosensitive layer through the through-silicon via interconnect array, constructs a sparse histogram based on the simplified data packets, and outputs point cloud data.
2. The fully digital processing on-chip system architecture for lidar according to claim 1, characterized in that, The single-photon avalanche diode employs the following structural synergistic design: A planarized PN junction is surrounded by a deep well isolation structure to suppress edge breakdown; A silicon dioxide passivation layer with a thickness of 40nm-60nm is applied over the photosensitive surface to reduce the dark count caused by surface states; A microlens array is disposed above the silicon dioxide passivation layer to improve the optical fill factor; The electric field distribution structure is optimized to ensure a uniform distribution of electric field intensity in the avalanche region; The deep well isolation structure, passivation layer, and electric field optimized distribution structure work together to reduce the dark count rate of a single pixel to less than 100 cps. The three-dimensional stacking adopts the following inter-layer collaborative design: The density of the through-silicon via (TSV) interconnect array is 10,000 per mm²; The timestamp signal is allocated through a dedicated differential through-silicon via channel with a transmission rate of 5 Gbps. The configuration bus adopts a 32-bit parallel architecture, and the signal, power, and ground silicon vias are configured in a 16:1:1 ratio to ensure signal integrity. A tantalum-based shielding layer with a thickness of 2μm is set between the photosensitive layer and the processing layer, with a shielding effectiveness greater than 40dB. Combined with deep N-well isolation, the power domain separation of the analog circuit of the photosensitive layer and the digital circuit of the processing layer is achieved.
A phase-locked loop clock alignment circuit and a through-silicon via (TSV) delay correction circuit are set between layers; the TSV delay correction circuit measures and stores the propagation delay of each TSV when powered on, and dynamically compensates for process deviations through a delay-locked loop during operation.
3. The fully digital processing on-chip system architecture for lidar according to claim 1, characterized in that, In the hybrid quenching circuit: The resistance-delayed quenching branch includes polysilicon quenching resistors with resistance values of 20kΩ-50kΩ; The active quenching branch includes quenching control logic and a controlled switch. After detecting the avalanche current, the quenching control logic triggers the controlled switch to turn on within a preset delay. The MOSFET active reset unit uses an NMOSFET with a width-to-length ratio greater than 100:1, which quickly pulls the cathode voltage of the single-photon avalanche diode back to the over-biased state after quenching. The resistance-delayed quenching branch provides initial rapid quenching, while the active quenching branch ensures complete quenching to suppress afterpulsing.
4. The fully digital processing on-chip system architecture for lidar according to claim 1, characterized in that, The time-to-digital converter includes: The coarse timing module adopts an 8-phase clock sampling architecture to coarsely quantize the time interval between the laser synchronization signal and the photon arrival signal. The vernier fine timing module includes a 32-stage ring oscillator for fine interpolation of the quantization residue of the coarse timing module; The time interpolation calculation unit calculates the final timestamp based on the outputs of the coarse timing module and the vernier fine timing module.
5. The fully digital processing on-chip system architecture for lidar according to claim 4, characterized in that, The time-to-digital converter also includes: The temperature compensation circuit includes a temperature sensor integrated into the pixel area and a compensation calculation unit that performs a piecewise linear fitting algorithm. The compensation calculation unit interpolates the stored calibration coefficients based on the real-time temperature to dynamically correct the measured values, so that the temperature drift coefficient is less than 0.1 ps / °C. The nonlinear calibration module measures and stores the transmission delay of each stage of the 32-stage ring oscillator based on a reference clock during the power-on phase, and performs nonlinear compensation on the timestamp based on the stored value during the operation phase. The dynamic resolution switching module adaptively switches between 12-bit high-resolution mode and 9-bit low-resolution mode based on ambient light intensity and real-time signal-to-noise ratio, with a switching delay of less than 100 nanoseconds. The temperature compensation circuit, nonlinear calibration module, and dynamic resolution switching module work together to ensure the stability of the time-to-digital converter's measurement accuracy across the entire temperature range from -40°C to 125°C.
6. The fully digital processing on-chip system architecture for lidar according to claim 1, characterized in that, The local preprocessing circuit includes: The counting threshold register stores a configurable photon counting threshold, with a configuration range of 4-16. A multi-cycle correlated state machine counts photon events that fall into the same time unit within N consecutive laser cycles, where N ranges from 2 to 8. The event determination logic outputs a valid event flag when the count value of the multi-cycle correlated state machine exceeds the threshold in the count threshold register. The data packaging unit, in response to the valid event flag, packages the corresponding timestamp, cumulative photon count and the coordinates of this pixel into a simplified data packet, and transmits it to the processing layer via DMA; The local preprocessing circuit filters out random noise photon events at the source by performing spatiotemporal correlation screening at the pixel level.
7. The fully digital processing on-chip system architecture for lidar according to claim 1, characterized in that, It also includes an array control module disposed on the photosensitive layer, the array control module comprising: The bias voltage control unit provides an adjustable overbias voltage to the single-photon avalanche diode array and dynamically adjusts it according to temperature feedback to maintain gain stability. An addressable controller, including row gating circuitry and column readout circuitry, enables independent addressing of any pixel or pixel partition in the array; The dynamic partition management unit divides the array into multiple independently enableable partitions according to scenario requirements. The partition granularity is configured through registers, with a configuration range of 32×24 to 128×96.
8. The all-digital processing on-chip system architecture for lidar according to claim 1, characterized in that, The processing layer includes a direct time-of-flight algorithm accelerator employing a heterogeneous parallel architecture, the accelerator comprising: The histogram construction kernel receives simplified data packets from the photosensitive layer, accumulates valid events into the corresponding time units according to the timestamp, and generates a sparse histogram. The histogram construction kernel has an SRAM architecture with a depth of 1024 bins and a dynamically adjustable bin width of 55ps-200ps, and a built-in hardware scene analyzer that adaptively adjusts the time unit width and the number of integration frames based on the peak distribution and signal-to-noise ratio of the initial histogram. The multi-echo detection kernel integrates a three-level peak lookup logic to perform multi-level peak lookup on the sparse histogram, extract the flight time corresponding to up to 3 echoes, and calculate a confidence score for each echo based on peak signal-to-noise ratio, pulse width, and consistency between adjacent pixels. The point cloud filtering kernel implements motion blur compensation and rain / snow noise filtering in hardware. Based on the confidence score and the spatiotemporal consistency of adjacent pixels, it filters out noise points and false echoes. The data scheduling core employs a ring buffer management and zero-copy transmission mechanism to output the filtered point cloud data to the communication layer; The processing layer also includes a Gaussian fitting acceleration unit based on a RISC-V instruction set extension, the Gaussian fitting acceleration unit comprising: The VGFIT instruction is a dedicated instruction encoded in the custom-0 opcode space of RISC-V. The function code specifies the Gaussian fitting iteration type. 
The source register rs1 stores the sampled data vector, rs2 stores the current fitting parameters, and the target register rd outputs the Jacobian matrix contribution term. The dedicated functional unit includes a parallel multiplier array and an exponential function calculation module implemented using lookup tables and linear interpolation. It uses a 5-stage pipeline design to execute the VGFIT instruction, and a single instruction completes the contribution calculation of a data point to the iterative equation within 5 clock cycles. The 128-bit vector register supports single instruction multiple data operations and can process four 32-bit single-precision floating-point data simultaneously. During the execution of the VGFIT instruction, the processor core's standard pipeline continues to execute subsequent instructions independent of its result, enabling parallelism between custom instructions and general-purpose computing.
9. A data processing method for a fully digital processing on-chip architecture of a lidar system, applicable to the fully digital processing on-chip architecture of a lidar system as described in any one of claims 1-8, characterized in that it includes the following steps: Perform pixel-level preprocessing within each pixel of the photosensitive layer: (a) Detect photon arrival events, perform avalanche quenching and fast reset through a hybrid quenching circuit, and output photon arrival timestamps by a time-to-digital converter; (b) The multi-cycle correlation state machine of the local preprocessing circuit determines whether the cumulative photon count of the time unit corresponding to the timestamp exceeds the threshold in the counting threshold register within N consecutive laser cycles; (c) If the threshold is exceeded, the data packaging unit generates a simplified data packet containing a timestamp, photon count, and pixel coordinates and uploads it to the processing layer; if the threshold is not exceeded, the timestamp is discarded. Perform sparse data processing at the processing layer: (d) The histogram construction kernel constructs a sparse histogram based solely on the received simplified data packets, and the hardware scene analyzer adaptively adjusts the bin width and integration time according to the histogram characteristics; (e) The multi-echo detection kernel performs peak detection on the sparse histogram to extract multiple echoes and calculates the confidence score of each echo; (f) The point cloud filtering kernel filters out noise points based on the confidence score and spatiotemporal consistency to generate the final point cloud data.
10. The data processing method for a fully digital processing on-chip system architecture of lidar according to claim 9, characterized in that, The hardware scene analyzer in step (d) performs the following adaptive adjustments: An initial fast scan is performed with the default bin width and a small number of integration frames to generate an initial histogram; Peak detection is performed on the initial histogram to determine whether the distribution range of the effective peaks is less than 1/4 of the total measurement range. If so, it is determined to be a peak concentration scenario. Calculate the signal-to-noise ratio of the strongest signal peak; If the peak concentration is satisfied and the signal-to-noise ratio is higher than the high-precision mode threshold, then switch the bin width to the 55ps high-precision mode; otherwise, maintain the 200ps wide bin mode. If the signal-to-noise ratio is lower than the preset threshold, the number of integration frames will be dynamically increased to improve the signal-to-noise ratio.