FPGA hardware acceleration method based on cell tracking

By using FPGA hardware modules to replace selected operations of the YOLOv7 and U-Net networks, and combining them with an ant colony algorithm and a Kalman filter model, the method addresses the large parameter counts and low efficiency of cell tracking algorithms and enables efficient, real-time cell image processing and analysis.

CN117669672B · Active · Publication Date: 2026-06-19 · CHANGCHUN UNIV OF SCI & TECH


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: CHANGCHUN UNIV OF SCI & TECH
Filing Date: 2023-12-11
Publication Date: 2026-06-19


IPC: G06N3/063; G06N3/045; G06N3/0464; G06T7/13; G06T7/11; G06T5/70; G06T7/246; G16H50/70; G06T5/40

AI Tagging

Application Domain

Image enhancement; Medical data mining


AI Technical Summary

Technical Problem

Existing deep learning-based cell tracking algorithms have many parameters, low efficiency, and are difficult to apply in real-world scenarios. Furthermore, conventional equipment cannot support massive data analysis, and the real-time performance of functional modules cannot be guaranteed.

Method used

Employing FPGA hardware acceleration technology, cell image sequence preprocessing is performed using FPGA chips, replacing the repetitive convolution operations in YOLOv7 and the downsampling part in the U-Net network. Channel and spatial attention mechanisms are designed, and combined with ant colony algorithm and Kalman filter model, cell detection, segmentation and tracking are realized.

Benefits of technology

It significantly improves the efficiency and accuracy of cell tracking algorithms, enabling efficient and real-time cell image processing and analysis. It can run on conventional equipment and is suitable for complex and ever-changing real-world scenarios.


Abstract

This invention belongs to the field of algorithm acceleration technology, specifically proposing an FPGA hardware acceleration method based on cell tracking. This invention accurately detects and tracks multiple cells using a cell tracking algorithm, and replaces the algorithm structure in the deep learning module with an FPGA-based hardware structure, aiming to improve the algorithm's running speed and reduce the burden on deep learning devices. By implementing key parts of the tracking algorithm on the FPGA, such as downsampling and repeated convolution, the algorithm's running speed and parallel processing capabilities can be significantly improved. Simultaneously, FPGA hardware acceleration can also reduce power consumption and system latency, improving overall system performance. The tracking algorithm and FPGA hardware acceleration method proposed in this invention can achieve rapid and accurate tracking of large-scale cell populations, providing strong support for biomedical research. Furthermore, this method can also be applied to other fields requiring efficient processing of large-scale data, such as image processing and video surveillance.


Description

Technical Field

[0001] This invention relates to the field of algorithm acceleration technology, specifically to an FPGA hardware acceleration method based on cell tracking.

Background Technology

[0002] Biomedicine is receiving increasing research attention, because normal tissue development and pathological changes both depend on cell division and movement. Accurate analysis of cell behavior is fundamental to understanding complex cellular pathological changes. With the rapid development of computer vision, machine vision-based multi-cell tracking algorithms have emerged. However, these algorithms face numerous challenges: cells differ little in shape, background interference is significant, and cells divide. The most significant challenge is the excessive number of parameters in deep learning-based multi-cell tracking algorithms, which places a heavy burden on equipment. Even when such algorithms can run, conventional equipment struggles to support massive data analysis, the real-time performance of functional modules cannot be guaranteed, and the algorithm's efficiency is low, making it difficult to apply to complex and variable real-world scenarios.

[0003] FPGA (Field-Programmable Gate Array), as a hardware module, can offload a large number of data operations in a network model to hardware, thus being widely used to accelerate algorithms. Unlike CPUs and GPUs, the internal structure of an FPGA can be customized according to different application scenarios, enabling it to perform intensive computations in convolution and sampling processes in a highly parallel manner with low power consumption. It possesses high flexibility, low power consumption, and parallel computing characteristics, making this technology very suitable for application in the biomedical field.

Summary of the Invention

[0004] (I) Technical problems to be solved

[0005] To address the shortcomings of existing technologies, this invention provides an FPGA hardware acceleration technology method based on cell tracking, which solves the problems of deep learning-based cell tracking algorithms having many parameters, low efficiency, and difficulty in being applied in real-world scenarios.

[0006] (II) Technical Solution

[0007] To achieve the aforementioned objectives, this invention proposes an FPGA hardware acceleration technique for cell tracking. This method utilizes an FPGA chip for cell image sequence preprocessing, replaces the repetitive convolution operations in YOLOv7 and the downsampling portion in the U-Net network with FPGA hardware modules, and allocates independent feature spaces for each population using memory. This effectively reduces the computational burden in deep learning methods, solves real-time issues in practical applications, enables deep learning-based cell tracking algorithms to run on conventional devices, and significantly improves algorithm efficiency.

[0008] The technical approach to achieving this invention is to combine deep learning, FPGA hardware acceleration, and cell tracking algorithms to build a complete FPGA hardware-accelerated cell tracking framework, ultimately realizing feasible cell tracking in practical applications. The specific implementation steps include the following:

[0009] S1. Transmit the cell image sequence to the FPGA chip for preprocessing. This step includes operations such as denoising, enhancement, and edge detection to improve the accuracy and stability of subsequent algorithms.

[0010] S2. The repetitive convolution operations in the YOLOv7 cell detection unit are replaced with hardware modules from an FPGA, and channel attention and spatial attention mechanisms are designed within the FPGA to improve the traditional YOLOv7 network. This accelerates the cell detection process and improves detection accuracy and robustness.

[0011] S3. Replace the downsampling part in the U-Net cell segmentation unit with an FPGA hardware module, and design a 3D attention mechanism in the FPGA to further improve the segmentation ability of the network. By implementing these operations in hardware, the speed of cell segmentation can be accelerated and the accuracy of segmentation can be improved.

[0012] S4. The detection and segmentation results output by the improved YOLOv7 network and U-Net network are transmitted to the ant colony algorithm module. The ant colony algorithm module uses the multi-ant colony algorithm to track each cell and performs trajectory reconstruction and abnormal cell detection based on the tracking results.

[0013] S5. Utilizing the parallel computing capabilities of FPGA hardware can improve the processing efficiency of the ant colony algorithm. Furthermore, by recording the position, velocity, and state information of each cell in the FPGA's built-in memory, faster and more accurate cell tracking can be achieved.

[0014] S6. Design a Kalman filter model in the FPGA to assist in judging cell tracking results and avoid tracking failure due to cell occlusion.

[0015] S7. The tracking results are transmitted back to a computer or other device for further processing, such as display, analysis and saving. Through hardware acceleration of the entire process, efficient and real-time cell image processing and analysis can be achieved.

[0016] Further, in step S1, preprocessing is performed on the FPGA chip, including:

[0017] The Canny operator is used for cell image preprocessing, which includes Gaussian smoothing filtering, calculation of gradient magnitude and direction, double threshold segmentation, and non-maximum suppression.

[0018] Gaussian smoothing filtering is used to remove the effects of noise. Specifically, a Gaussian low-pass filter is used, and the formula is shown below:

[0019] $$g(x,y)=h(x,y,\sigma)*f(x,y) \tag{1}$$

[0020] Where g(x, y) represents the smoothed image, f(x, y) represents the original image, and h(x, y, σ) represents the Gaussian filter function. The specific formula for the Gaussian filter function is as follows:

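[0021] $$h(x,y,\sigma)=\frac{1}{2\pi\sigma^{2}}\exp\left(-\frac{x^{2}+y^{2}}{2\sigma^{2}}\right) \tag{2}$$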

[0022] Furthermore, in step S2, the repetitive convolution operation in the YOLOv7 cell detection unit is replaced by a hardware module of the FPGA, specifically including:

[0023] A synchronous pipeline design is used to accelerate the hardware of repeated convolutions, and the three multiplications of the 3×1 convolution kernel are computed in parallel.

[0024] Originally, YOLOv7 required three steps for data retrieval, computation, and storage. The pipeline optimization design means that the second data is retrieved while the first data is being processed, and the first data is stored and the third data is retrieved in parallel while the second data is being processed. This modular logic based on FPGA hardware improves operating efficiency without consuming device performance.

[0025] Furthermore, FPGA hardware modules can improve operating speed by sacrificing area and space. In this invention, three DSP cores are used to participate in the operation, thereby achieving parallel computation of three multiplications in one operation cycle, which further improves the efficiency of the algorithm.

[0026] Furthermore, in step S3, an FPGA hardware module is used to replace the downsampling portion in the cell segmentation unit U-Net network, specifically including:

[0027] This invention uses maximum-value sampling within a 2×2 downsampling window and designs a periodic operation over odd and even rows of data. For each pair of odd and even rows, the larger of the two vertically adjacent values is selected first, and the two resulting values are then compared to obtain the maximum of the window. The entire 2×2 downsampling operation is implemented in an FPGA hardware module. This periodically alternating pattern suits the way an FPGA operates.

[0028] Furthermore, step S5 involves recording information such as the position, velocity, and state of each cell in the FPGA's built-in memory, specifically including:

[0029] In the original algorithm, the position, morphological information, and motion state of each cell are stored using matrices, involving some matrix affine transformations and inverse transformations. This invention uses the built-in memory of an FPGA to allocate separate storage space for each cell population, eliminating the need for affine transformations and significantly reducing the computational steps required in the algorithm.

[0030] (III) Beneficial Effects

[0031] Compared with existing technologies, this invention provides an FPGA hardware acceleration technology method based on cell tracking, which has the following beneficial effects:

[0032] 1. This invention utilizes FPGA for preprocessing of cell image sequences, which can improve the accuracy and stability of subsequent algorithms.

[0033] 2. This invention uses FPGA hardware modules to replace the convolution operations in the traditional YOLOv7 network, and adds channel attention mechanism and spatial attention mechanism, which can accelerate the cell detection process and improve the detection accuracy and robustness.

[0034] 3. This invention uses an FPGA hardware module to replace the downsampling part in the U-Net network and designs a 3D attention mechanism, which can improve the segmentation ability of the network. Hardware implementation can speed up cell segmentation and improve segmentation accuracy.

[0035] 4. In this invention, the detection and segmentation results output by the improved YOLOv7 network and U-Net network are transmitted to the ant colony algorithm module. The ant colony algorithm module uses a multi-ant colony algorithm to track each cell and performs trajectory reconstruction and abnormal cell detection based on the tracking results.

[0036] 5. This invention utilizes the parallel computing capabilities of FPGA hardware to improve the processing efficiency of the ant colony algorithm. Simultaneously, recording the position, velocity, and state of each cell in the FPGA's built-in memory enables faster and more accurate cell tracking.

[0037] 6. This invention designs a Kalman filter model in an FPGA to assist in judging cell tracking results, avoiding tracking failures caused by cell occlusion.

[0038] 7. This invention transmits the tracking results back to a computer or other device for subsequent processing, such as display, analysis, and storage. Through hardware acceleration of the entire process, efficient and real-time cell image processing and analysis can be achieved.

Attached Figure Description

[0039] Figure 1 is a schematic diagram of the process structure of the present invention;

[0040] Figure 2 is a schematic diagram of the YOLOv7 neural network structure of the present invention;

[0041] Figure 3 is a schematic diagram of the YOLOv7 attention mechanism structure of the present invention;

[0042] Figure 4 is a schematic diagram of the U-Net attention mechanism structure of the present invention;

[0043] Figure 5 is a schematic diagram of the improved U-Net network structure of the present invention;

[0044] Figure 6 is a schematic diagram of the structure of the multi-ant colony algorithm;

[0045] Figure 7 is a schematic diagram of the interface implementation for FPGA hardware algorithm acceleration based on cell tracking.

Detailed Implementation

[0046] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0047] Example

[0048] As shown in Figures 1-7, an embodiment of the present invention proposes an FPGA hardware acceleration method based on cell tracking, which includes the following steps:

[0049] S1. The cell image sequence is transmitted to the FPGA chip for preprocessing. This step includes operations such as denoising, enhancement, and edge detection to improve the accuracy and stability of subsequent algorithms.

[0050] S11. As a programmable logic device, the FPGA chip has parallel processing capabilities and low latency, enabling it to efficiently process large amounts of image data. After the image is transmitted to the FPGA chip, a denoising operation is first performed. Mean filtering is used to eliminate noise in the image, thereby reducing interference in subsequent processing and improving image quality.
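As an illustration of the denoising operation in S11, the following Python sketch applies a mean filter in software. It is not the FPGA implementation: the function name `mean_filter_3x3`, the 3×3 window, the untouched border pixels, and the integer division are all assumptions made for this example.

```python
def mean_filter_3x3(image):
    """Return a 3x3 mean-filtered copy of a 2-D list of ints.

    Border pixels are copied unchanged (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window_sum = sum(
                image[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            )
            # Integer division, as a fixed-point datapath might use.
            out[y][x] = window_sum // 9
    return out


# A single bright noise pixel surrounded by a flat background.
noisy = [
    [10, 10, 10, 10],
    [10, 100, 10, 10],
    [10, 10, 10, 10],
]
smoothed = mean_filter_3x3(noisy)
```

The isolated value 100 is averaged with its eight neighbours, so the spike is strongly attenuated while the flat background is left almost untouched.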

[0051] S12. Perform image enhancement operations. This method uses a histogram equalization algorithm to highlight the details and features of the cell image.

[0052] S13. In this method, the histogram equalization algorithm is performed using FPGA. First, the upstream module outputs histogram statistics, and then the cumulative sum of histogram gray levels is calculated.

[0053] S14. The FPGA's inherent weakness in multiplication and division is addressed by calling the multiplier and divider modules provided by Xilinx.
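The sequence described in S12 to S14 (histogram statistics, cumulative sum of gray levels, then a multiply and a divide per gray level) can be sketched in Python as follows. This is a software illustration only; the function name `equalize_histogram`, the flat pixel list, and the particular integer mapping formula are assumptions, since the text does not give the exact mapping.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a flat list of integer gray levels in [0, levels)."""
    # Step 1: histogram statistics.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Step 2: cumulative sum of the histogram.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Every pixel has the same gray level; nothing to stretch.
        return list(pixels)

    # Step 3: one multiplication and one division per gray level. This is
    # the part that would rely on multiplier and divider blocks in hardware.
    lut = [
        max(0, (c - cdf_min) * (levels - 1) // (total - cdf_min))
        for c in cdf
    ]
    return [lut[p] for p in pixels]


# Four pixels squeezed into gray levels 50..52 are spread over 0..255.
equalized = equalize_histogram([50, 50, 51, 52])
```

Because the lookup table has only one entry per gray level, the multiply and divide run `levels` times per image rather than once per pixel.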

[0054] S15. The cell images are preprocessed with the Canny operator, which specifically includes Gaussian smoothing filtering, calculation of gradient magnitude and direction, double threshold segmentation, and non-maximum suppression.

[0055] S16. Gaussian smoothing filtering, specifically Gaussian low-pass filtering, is used to remove the influence of noise, as shown in the formula below:

[0056] $$g(x,y)=h(x,y,\sigma)*f(x,y) \tag{1}$$

[0057] S17. Here g(x, y) represents the smoothed image, f(x, y) represents the original image, and h(x, y, σ) represents the Gaussian filter function. The specific formula for the Gaussian filter function is as follows:

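[0058] $$h(x,y,\sigma)=\frac{1}{2\pi\sigma^{2}}\exp\left(-\frac{x^{2}+y^{2}}{2\sigma^{2}}\right) \tag{2}$$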
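To make the smoothing step g = h * f concrete, the following Python sketch samples a two-dimensional Gaussian on a small grid and convolves it with an image. It assumes the standard Gaussian form for h(x, y, σ); the function names, the 5×5 kernel size, the renormalisation of the sampled kernel, and the unchanged border pixels are choices made for this example, not details taken from the patent.

```python
import math


def gaussian_kernel(size, sigma):
    """Sample the 2-D Gaussian h(x, y, sigma) on a size x size grid."""
    half = size // 2
    kernel = [
        [
            math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            / (2 * math.pi * sigma * sigma)
            for x in range(-half, half + 1)
        ]
        for y in range(-half, half + 1)
    ]
    # Renormalise so the sampled weights sum to exactly 1.
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]


def gaussian_smooth(image, kernel):
    """Compute g = h * f on the interior; border pixels are left unchanged."""
    half = len(kernel) // 2
    height, width = len(image), len(image[0])
    out = [[float(v) for v in row] for row in image]
    for y in range(half, height - half):
        for x in range(half, width - half):
            acc = 0.0
            for j in range(-half, half + 1):
                for i in range(-half, half + 1):
                    # Discrete convolution: h(i, j) * f(x - i, y - j).
                    acc += kernel[j + half][i + half] * image[y - j][x - i]
            out[y][x] = acc
    return out


kernel = gaussian_kernel(5, 1.0)
flat = [[7] * 7 for _ in range(7)]
smoothed_flat = gaussian_smooth(flat, kernel)
```

A constant image is left unchanged by the filter, which is a quick check that the kernel weights sum to one.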

[0059] S18. Calculate the magnitude and direction of the gradient vector. The formulas for the magnitude and direction are as follows:

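[0060] $$M(x,y)=\sqrt{G_x^{2}+G_y^{2}} \tag{3}$$

[0061] $$\theta(x,y)=\arctan\left(\frac{G_y}{G_x}\right) \tag{4}$$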

[0062] S19. Here $G_x$ is the partial derivative of the original image with respect to x, and $G_y$ is the partial derivative of the original image with respect to y.
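A minimal Python sketch of the gradient step follows. The text does not say which operator approximates the partial derivatives, so the Sobel kernels used here are an assumption, as are the function name `gradient_at` and the use of `atan2` (which also handles a zero horizontal derivative) in place of a plain arctangent of the ratio.

```python
import math

# Sobel kernels: an assumed choice of derivative operator.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_at(image, y, x):
    """Return (gx, gy, magnitude, direction) at an interior pixel (y, x)."""
    gx = sum(
        SOBEL_X[j][i] * image[y + j - 1][x + i - 1]
        for j in range(3)
        for i in range(3)
    )
    gy = sum(
        SOBEL_Y[j][i] * image[y + j - 1][x + i - 1]
        for j in range(3)
        for i in range(3)
    )
    magnitude = math.hypot(gx, gy)  # sqrt(gx^2 + gy^2)
    direction = math.atan2(gy, gx)  # angle in radians
    return gx, gy, magnitude, direction


# A vertical step edge: dark on the left, bright on the right.
step_edge = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
gx, gy, magnitude, direction = gradient_at(step_edge, 1, 1)
```

For a vertical edge the vertical derivative vanishes, so the gradient points along the x axis and its magnitude equals the horizontal derivative.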

[0063] S110. Comparing the image preprocessing results based on FPGA and those based on conventional deep learning, it was found that FPGA-based image preprocessing achieved faster processing speed while maintaining similar processing results. The algorithm's average tracking accuracy reached 99.51%, and its efficiency was three times that of conventional deep learning algorithms.

[0064] S2. The repetitive convolution operations in the YOLOv7 cell detection unit are replaced with hardware modules from an FPGA, and channel attention and spatial attention mechanisms are designed within the FPGA to improve the traditional YOLOv7 network. This accelerates the cell detection process and improves detection accuracy and robustness.

[0065] S21. A synchronous pipeline design is adopted to accelerate the hardware of repeated convolutions, and the three multiplications of the 3×1 convolution kernel are calculated in parallel to further improve the efficiency of the algorithm.

[0066] S22. Originally, YOLOv7 required three steps for data retrieval, computation, and storage. The pipeline optimization design means that the second data is retrieved while the first data is being processed, and the first data is stored and the third data is retrieved in parallel while the second data is being processed. This modular logic based on FPGA hardware improves the operating efficiency without consuming device performance.

[0067] S23. In addition, the FPGA hardware module can improve the running speed by sacrificing area and space. In this invention, three DSP cores are used to participate in the operation, so that three multiplications can be performed in parallel in one cycle, thereby further improving the efficiency of the algorithm.
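The two ideas in S21 to S23 can be modelled in a few lines of Python. The first function shows why the three products of a 3-tap kernel can be computed in the same cycle: they do not depend on one another. The other two give an idealised cycle count for running retrieval, computation, and storage one after another versus overlapping them in a three-stage pipeline. The function names and the one-cycle-per-stage model are assumptions of this sketch, not measurements from the patent.

```python
def conv3_taps(samples, kernel):
    """Sliding 3-tap multiply-accumulate over a 1-D sequence (valid region)."""
    k0, k1, k2 = kernel
    outputs = []
    for n in range(len(samples) - 2):
        # The three products are independent of each other, so three
        # DSP units could compute them in the same clock cycle.
        p0 = k0 * samples[n]
        p1 = k1 * samples[n + 1]
        p2 = k2 * samples[n + 2]
        outputs.append(p0 + p1 + p2)
    return outputs


def sequential_cycles(n_items, stages=3):
    """Each item passes through every stage before the next item starts."""
    return n_items * stages


def pipelined_cycles(n_items, stages=3):
    """The first item needs `stages` cycles; then one item finishes per cycle."""
    return stages + (n_items - 1)


edges = conv3_taps([1, 2, 3, 4], (1, 0, -1))
cycles_before = sequential_cycles(1024)
cycles_after = pipelined_cycles(1024)
```

Under this idealised model, pipelining the three steps alone approaches a threefold reduction in cycles for long sequences; parallel multipliers reduce the work inside the computation stage itself.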

[0068] S24. The spatial attention and channel attention mechanisms in this method consist of three operational parts: dot product calculation, softmax calculation, and weighted average calculation.

[0069] S25. The formula for calculating the channel attention $M_c$ is:

[0070] $$M_c=\mathrm{sigmoid}\big(\mathrm{MLP}(\mathrm{AvgPool}(F))+\mathrm{MLP}(\mathrm{MaxPool}(F))\big) \tag{5}$$

[0071] S26. The formula for calculating the spatial attention $M_s$ is:

[0072] $$M_s=\mathrm{sigmoid}\big(\mathrm{Conv}([\mathrm{AvgPool}(F);\ \mathrm{MaxPool}(F)])\big) \tag{6}$$
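The two attention formulas can be illustrated with a small pure-Python sketch. It follows the usual channel-then-spatial attention layout, in which the spatial branch convolves the channel-wise average and maximum maps together; that reading of formula (6) is an assumption. The function names, the tiny hand-picked weights, the one-hidden-unit MLP, and the 1×1 convolution (real networks typically use a larger kernel) are all hypothetical choices for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature, w1, w2):
    """One weight per channel from a shared MLP over avg- and max-pooled values."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]
    mx = [max(map(max, ch)) for ch in feature]

    def mlp(v):
        # Hidden layer with ReLU, then a linear output layer.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]


def spatial_attention(feature, w_avg, w_max, bias=0.0):
    """One weight per pixel from channel-wise avg and max maps (1x1 conv)."""
    channels = len(feature)
    height, width = len(feature[0]), len(feature[0][0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            vals = [feature[c][y][x] for c in range(channels)]
            avg_val = sum(vals) / channels
            max_val = max(vals)
            row.append(sigmoid(w_avg * avg_val + w_max * max_val + bias))
        out.append(row)
    return out


# Feature map F with 2 channels of size 2x2 (hypothetical values).
feature = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 4.0]],
]
w1 = [[1.0, 1.0]]    # 2 channels -> 1 hidden unit
w2 = [[1.0], [0.0]]  # 1 hidden unit -> 2 channels
mc = channel_attention(feature, w1, w2)
ms = spatial_attention(feature, w_avg=1.0, w_max=1.0)
```

Both outputs lie strictly between 0 and 1 and are meant to be multiplied with the feature map, so strongly responding channels and pixels are emphasised.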

[0073] S27. The dot product is obtained by summing the elements of the output vector of the fully connected layer and the feature map. Similarly, we use a pipelined approach to perform parallel operations to improve the efficiency of the dot product calculation.

[0074] S28. The Softmax function is computed in the FPGA using a Taylor series expansion. The weighted average calculation combines segmented data transmission with pipelined calculation to improve computational efficiency.
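The idea of evaluating Softmax through a Taylor series can be sketched as follows. The exponential is replaced by a truncated series that needs only multiplications and additions once the reciprocal constants are precomputed. The number of terms, the function names, and the subtraction of the maximum logit (which keeps the series argument small and non-positive) are assumptions of this sketch; the patent does not specify them.

```python
import math


def exp_taylor(x, terms=12):
    """Truncated series e^x ~ sum of x^k / k! for k = 0 .. terms - 1."""
    total = 1.0
    term = 1.0
    for k in range(1, terms):
        # In hardware, 1/k would be a precomputed constant.
        term *= x / k
        total += term
    return total


def softmax_taylor(logits, terms=12):
    """Softmax with the exponential replaced by its truncated Taylor series."""
    shift = max(logits)
    exps = [exp_taylor(v - shift, terms) for v in logits]
    denom = sum(exps)
    return [e / denom for e in exps]


logits = [1.0, 2.0, 3.0]
probs = softmax_taylor(logits)

# Reference values from the library exponential, for comparison only.
ref_exps = [math.exp(v - max(logits)) for v in logits]
reference = [e / sum(ref_exps) for e in ref_exps]
```

The truncated series is only accurate for arguments of small magnitude, so a wider range of logits would need more terms or some form of range reduction.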

[0075] S29. In the original unoptimized YOLOv7 deep learning model, a single-channel sequence of length 1024 requires 1024×3×3 clock cycles for each downsampling process. After using the FPGA hardware acceleration method of this method, the computing speed is increased by about nine times.
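The figures in S29 can be checked with a short calculation. A ninefold speed-up applied to the stated cycle count gives:

```latex
\begin{align*}
N_{\text{original}} &= 1024 \times 3 \times 3 = 9216 \ \text{clock cycles},\\
N_{\text{accelerated}} &\approx \frac{9216}{9} = 1024 \ \text{clock cycles}.
\end{align*}
```

That is roughly one sample per clock cycle. One reading consistent with these numbers, though the text does not state it, is that one factor of 3 comes from computing the three multiplications in parallel and the other from overlapping the three steps of retrieval, computation, and storage in the pipeline.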

[0076] S210. Experimental results show that the original YOLOv7 algorithm model achieved an mAP of 67.52% in object detection. After using FPGA-based hardware acceleration and adding the attention mechanism hardware module, the mAP increased to 70.62%, an improvement of 3.1 percentage points.

[0077] S3. Replace the downsampling part of the U-Net cell segmentation unit with an FPGA hardware module, and design a 3D attention mechanism in the FPGA to further improve the network's segmentation ability. By implementing these operations in hardware, the speed of cell segmentation can be accelerated and the accuracy of segmentation can be improved.

[0078] S31. This invention uses maximum-value sampling within a 2×2 downsampling window and designs a periodic operation over odd and even rows of data. For each pair of odd and even rows, the larger of the two vertically adjacent values is selected first, and the two resulting values are then compared to obtain the maximum of the window. The entire 2×2 downsampling operation is implemented in an FPGA hardware module. This periodically alternating pattern suits the way an FPGA operates.
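The two-stage comparison described in S31 can be sketched in Python as follows: each odd/even row pair is first reduced column by column, and adjacent column results are then compared. The function name `max_pool_2x2` and the handling of odd-sized inputs (a trailing row or column is dropped) are assumptions of this example.

```python
def max_pool_2x2(image):
    """2x2 max pooling, stride 2, done as two comparison stages."""
    pooled = []
    # Take rows in odd/even pairs: (row 0, row 1), (row 2, row 3), ...
    for top, bottom in zip(image[0::2], image[1::2]):
        # Stage 1: compare the two rows column by column.
        column_max = [max(a, b) for a, b in zip(top, bottom)]
        # Stage 2: compare horizontally adjacent stage-1 results.
        pooled.append([
            max(column_max[i], column_max[i + 1])
            for i in range(0, len(column_max) - 1, 2)
        ])
    return pooled


feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 1, 2, 3],
    [4, 5, 6, 7],
]
pooled = max_pool_2x2(feature_map)
```

Splitting the maximum of four values into a vertical and a horizontal comparison means that a streaming design only needs to buffer one row while the next one arrives.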

[0079] S32. A nonlinear-function and normalization acceleration unit is used so that the FPGA can accelerate the 3D attention mechanism, and a general-purpose vision Transformer accelerator is implemented on the FPGA based on a general computation mapping method.

[0080] S33. Based on the parallel operation characteristics of convolution operation, the parallel efficiency of convolution operation is improved by using FPGA module. This method adopts a 16-channel parallel operation strategy, which is significantly better than the traditional three-channel parallel model.

[0081] S34. Utilizing the symmetric structure of the U-Net network, residual connections are introduced, and a "retain or skip" connection strategy is adopted. Connections that are improved by adding residual blocks are retained, while connections that cause performance degradation are skipped directly.

[0082] S35. The average Dice similarity coefficient (ADSC) was used to evaluate the image segmentation results. The original U-Net network without the hardware acceleration module scored 77.63%. After adding the hardware-based downsampling operation and the residual network, the score rose to 85.24%, an improvement of 7.61 percentage points.
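Assuming the coefficient in S35 is the Dice similarity coefficient, the metric can be computed as in the following Python sketch. The function names, the flat 0/1 mask representation, and the convention that two empty masks score 1.0 are choices made for this example; the masks are made-up values, not data from the patent.

```python
def dice_coefficient(predicted, reference):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for flat binary masks."""
    intersection = sum(p * r for p, r in zip(predicted, reference))
    size = sum(predicted) + sum(reference)
    if size == 0:
        # Both masks empty: treated here as a perfect match.
        return 1.0
    return 2.0 * intersection / size


def average_dice(mask_pairs):
    """Mean Dice coefficient over (predicted, reference) mask pairs."""
    scores = [dice_coefficient(p, r) for p, r in mask_pairs]
    return sum(scores) / len(scores)


half_overlap = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
full_overlap = dice_coefficient([1, 1, 0, 0], [1, 1, 0, 0])
mean_score = average_dice([
    ([1, 1, 0, 0], [1, 0, 1, 0]),
    ([1, 1, 0, 0], [1, 1, 0, 0]),
])
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they do not overlap at all.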

[0083] S4. The detection and segmentation results output by the improved YOLOv7 network and U-Net network are transmitted to the ant colony algorithm module. The ant colony algorithm module uses a multi-ant colony algorithm to track each cell and performs trajectory reconstruction and abnormal cell detection based on the tracking results;

[0084] S41. Abnormal events in cell tracking, such as cell division and apoptosis, are handled by the anomaly-handling model of the original cell tracking algorithm. Because that model is heuristic and does not involve a large amount of data computation, the anomaly-handling module is implemented in software and is not accelerated by FPGA hardware.

[0085] S42. The operation of reconstructing the trajectory based on the trajectory results occurs after each iteration of the ant colony algorithm. It only involves some trajectory splicing and detection operations and does not involve complex calculation processes. This part is handled by the original software algorithm.

[0086] S43. Experimental results show that, in terms of tracking accuracy, detection accuracy, and running efficiency, the FPGA hardware acceleration method based on cell tracking is significantly better than the algorithm model without hardware acceleration. In essence, it trades the area and resources of the FPGA hardware module for algorithm efficiency.

[0087] S44. Performance comparisons were performed on datasets including Fluo-N2DL-HeLa-01, Fluo-N2DL-HeLa-02, DIC-C2DH-HeLa-01, DIC-C2DH-HeLa-02, DIC-C2DH-HeLa-03, DIC-C2DH-HeLa-04, PhC-C2DL-PSC-01, and PhC-C2DL-PSC-02. The false detection rate, accuracy, precision, and F1 score obtained by this method are superior to other mainstream algorithms in a comprehensive comparison.

[0088] S5. Utilizing the parallel computing capabilities of FPGA hardware can improve the processing efficiency of the ant colony algorithm. Furthermore, by recording the position, velocity, and state of each cell in the FPGA's built-in memory, faster and more accurate cell tracking can be achieved.

[0089] S51. In the original algorithm, the position, morphological information, and motion state of each cell are stored using matrices, involving some matrix affine transformations and inverse transformations. This invention uses the built-in memory of the FPGA to allocate separate storage space for each cell population, eliminating the need for affine transformations and significantly reducing the computational steps required in the algorithm.

[0090] S52. The feature matching acceleration module in cell tracking consists of a control unit, a Winograd accelerator core, a data transmission unit, and DDR memory. It communicates with the DDR memory over the AXI bus and uses FIFO buffers for data caching, thereby maximizing bandwidth utilization and reducing the algorithm complexity to less than half of the original.
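The text names a Winograd accelerator core without saying which variant it uses. As an illustration of the underlying idea, the following Python sketch implements the smallest one-dimensional case, usually written F(2,3): two outputs of a 3-tap filter are produced with 4 multiplications instead of the 6 a direct computation needs. The choice of this variant and all function names are assumptions.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap filter over four inputs, using 4 multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter-side terms; these can be precomputed once per filter.
    u_sum = (g0 + g1 + g2) / 2
    u_diff = (g0 - g1 + g2) / 2
    # The four multiplications.
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * u_sum
    m3 = (d2 - d1) * u_diff
    m4 = (d1 - d3) * g2
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """The same two outputs computed directly with 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


fast = winograd_f23([1, 2, 3, 4], [1, 2, 3])
slow = direct_f23([1, 2, 3, 4], [1, 2, 3])
```

Nesting the same construction in two dimensions produces a 2×2 output tile of a 3×3 filter with 16 multiplications instead of 36, which is below half of the direct count.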

[0091] S6. Design a Kalman filter model in the FPGA to assist in judging cell tracking results and avoid tracking failure due to cell occlusion.

[0092] S61. The Kalman filter model consists purely of mathematical formulas and involves a large number of matrix operations. This invention first stores the state transition matrix and the observation matrix of the Kalman filter in the FPGA hardware module.

[0093] S62. Initialize the filter, and then use the state transition matrix to predict the state based on the system model and the current state estimate. The state prediction is obtained by multiplying the state estimate from the previous time step with the system model.

[0094] S63. Use the observation matrix to convert the state prediction values into prediction values in the measurement space, and calculate the measurement residuals and the covariance matrix of the measurement residuals.

[0095] S64. Calculate the Kalman gain and update the state estimate. Continuously perform state prediction and measurement updates based on changes in the system model and measurements. Each iteration updates the state estimate and covariance matrix to obtain the optimal state estimate result.

[0096] S65. When cells occlude one another, a cell detection module based solely on observation misdetects the overlapping cells as a single cell, which degrades subsequent tracking. With Kalman filtering, in contrast, the historical data of each cell can be used to predict its path through the occlusion.
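The predict and update cycle of S62 to S64, together with the occlusion behaviour described in S65, can be sketched for a single coordinate of one cell. The constant-velocity state model, the noise values `q` and `r`, the function names, and the use of `None` to mark an occluded frame are all assumptions of this example; the patent does not specify the model.

```python
def kalman_predict(x, P, dt=1.0, q=1e-3):
    """Predict with F = [[1, dt], [0, 1]]: x <- F x, P <- F P F^T + Q."""
    pos, vel = x
    x_pred = [pos + dt * vel, vel]
    p00, p01, p10, p11 = P[0][0], P[0][1], P[1][0], P[1][1]
    P_pred = [
        [p00 + dt * (p01 + p10) + dt * dt * p11 + q, p01 + dt * p11],
        [p10 + dt * p11, p11 + q],
    ]
    return x_pred, P_pred


def kalman_update(x, P, z, r=0.25):
    """Update with observation matrix H = [1, 0] (position is measured)."""
    residual = z - x[0]              # measurement residual
    s = P[0][0] + r                  # residual covariance
    k0, k1 = P[0][0] / s, P[1][0] / s  # Kalman gain
    x_new = [x[0] + k0 * residual, x[1] + k1 * residual]
    # P <- (I - K H) P
    P_new = [
        [(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
        [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]],
    ]
    return x_new, P_new


def track(measurements):
    """Return the position estimate for each frame."""
    x, P = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
    estimates = []
    for z in measurements:
        x, P = kalman_predict(x, P)
        if z is not None:
            # An occluded frame has no measurement: keep the prediction.
            x, P = kalman_update(x, P, z)
        estimates.append(x[0])
    return estimates


# A cell moving one unit per frame, occluded in frames 4 and 5.
estimates = track([0.0, 1.0, 2.0, 3.0, None, None, 6.0])
```

During the two occluded frames the estimate keeps advancing along the learned velocity, so the track can be picked up again when the cell reappears.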

[0097] S7. The tracking results are transmitted back to a computer or other device for further processing, such as display, analysis, and storage. Hardware acceleration throughout the entire process enables efficient, real-time cell image processing and analysis.

[0098] Finally, it should be noted that the above descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. An FPGA hardware acceleration method based on cell tracking, characterized in that it includes the following steps:

S1. Transmit the cell image sequence to the FPGA chip for preprocessing, including denoising, enhancement, and edge detection operations, to improve the accuracy and stability of subsequent algorithms.

S2. Replace the repetitive convolution operations in the YOLOv7 cell detection unit with a hardware module of the FPGA, and improve the YOLOv7 network by designing a channel attention mechanism and a spatial attention mechanism in the FPGA. This accelerates the cell detection process and improves detection accuracy and robustness.

The spatial attention and channel attention mechanisms consist of three operational parts: dot product calculation, softmax calculation, and weighted average calculation. The formula for calculating the channel attention $M_c$ is:

$$M_c=\mathrm{sigmoid}\big(\mathrm{MLP}(\mathrm{AvgPool}(F))+\mathrm{MLP}(\mathrm{MaxPool}(F))\big)$$

The formula for calculating the spatial attention $M_s$ is:

$$M_s=\mathrm{sigmoid}\big(\mathrm{Conv}([\mathrm{AvgPool}(F);\ \mathrm{MaxPool}(F)])\big)$$

The dot product is calculated by summing the elements of the output vector of the fully connected layer and the feature map. Similarly, a pipelined approach is used to perform parallel operations to improve the efficiency of the dot product calculation. The Softmax function is computed using a Taylor series expansion in the FPGA, and the weighted average calculation adopts a cooperative approach of segmented data transmission and pipelined calculation to improve computational efficiency.

In step S2, using an FPGA hardware module to replace the repetitive convolution operations in the YOLOv7 cell detection unit specifically includes: a synchronous pipeline design is used to accelerate the repeated convolutions in hardware, and the three multiplications of the 3×1 convolution kernel are computed in parallel. Originally, YOLOv7 required three steps for data retrieval, computation, and storage. The pipeline optimization design means that the second data is retrieved while the first data is being processed, and the first data is stored and the third data is retrieved in parallel while the second data is being processed. This modular logic based on FPGA hardware improves the efficiency of operation without consuming device performance. In addition, the FPGA hardware module improves the running speed by sacrificing area and space, and uses three DSP cores to participate in the operation, thereby realizing three parallel calculation operations in one cycle, which further improves the efficiency of the algorithm.

S3. Replace the downsampling part in the U-Net cell segmentation unit with an FPGA hardware module, and design a 3D attention mechanism in the FPGA to further improve the segmentation ability of the network. By implementing these operations in hardware, the speed of cell segmentation is accelerated and the accuracy of segmentation is improved.

In step S3, using an FPGA hardware module to replace the downsampling part in the U-Net cell segmentation unit specifically includes: using maximum-value sampling within a 2×2 downsampling window, a periodic operation over odd and even rows of data is designed. In each pair of odd and even rows, the larger value is selected, and the larger values are then compared to obtain the maximum value. The entire 2×2 downsampling operation is implemented using FPGA hardware modules. This periodic alternating mode is suitable for the working state of the FPGA.

S4. The detection and segmentation results output by the improved YOLOv7 network and U-Net network are transmitted to the ant colony algorithm module. The ant colony algorithm module uses the multi-ant colony algorithm to track each cell and performs trajectory reconstruction and abnormal cell detection based on the tracking results.

S5. By utilizing the parallel computing capabilities of FPGA hardware, the processing efficiency of the ant colony algorithm can be improved. At the same time, by recording the position, velocity, and state information of each cell in the FPGA's built-in memory, faster and more accurate cell tracking can be achieved.

S6. Design a Kalman filter model in the FPGA to assist in judging cell tracking results and avoid tracking failure due to cell occlusion.

S7. Transmit the tracking results back to a computer or other device for further processing, including display, analysis and storage. Through hardware acceleration throughout the process, efficient and real-time cell image processing and analysis are achieved.

2. The FPGA hardware acceleration method based on cell tracking of claim 1, wherein the preprocessing performed in the FPGA chip in S1 includes: the Canny operator is used for cell image preprocessing, which includes Gaussian smoothing filtering, calculation of gradient magnitude and direction, double threshold segmentation, and non-maximum suppression. Gaussian smoothing filtering is used to remove the effects of noise. Specifically, a Gaussian low-pass filter is used, and the formula is shown below:

$$g(x,y)=h(x,y,\sigma)*f(x,y)$$

wherein g(x, y) denotes the smoothed image, f(x, y) denotes the original image, and h(x, y, σ) denotes a Gaussian filter function, the specific expression of which is as follows:

$$h(x,y,\sigma)=\frac{1}{2\pi\sigma^{2}}\exp\left(-\frac{x^{2}+y^{2}}{2\sigma^{2}}\right)$$

3. The FPGA hardware acceleration method based on cell tracking of claim 1, wherein step S5, recording the position, velocity, and state information of each cell in the FPGA's built-in memory, specifically includes: in the original algorithm, the position, morphological information, and motion state of each cell are stored using matrices, which involves some matrix affine transformations and inverse transformations. By using the FPGA's built-in memory to allocate separate storage space for each cell population, the affine transformation operation is eliminated, reducing the computational steps required in the algorithm.