A deep learning-based method and system for statistical analysis of fluorescence ratio information in microfluidic droplet arrays
By using the DropletSegNet model to segment droplets and statistically analyze the positive rate in droplet digital PCR detection technology, the problems of inaccurate droplet segmentation due to adhesion and insufficient imaging resolution in existing technologies are solved, thereby improving the accuracy and stability of nucleic acid concentration calculation.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: WENZHOU QINGFENG BIOMEDICAL TECHNOLOGY CO LTD
- Filing Date: 2026-05-15
- Publication Date: 2026-06-19
AI Technical Summary
Existing droplet digital PCR detection technologies suffer from inaccurate segmentation of adhered droplets, poor stability caused by dependence on intensity thresholds, incomplete removal of interference signals, insufficient geometric consistency, and insufficient imaging resolution, leading to inaccurate nucleic acid concentration calculations and positive ratio statistics.
The deep learning-based DropletSegNet model combines three-dimensional fluorescence volume image preprocessing, multi-scale feature extraction, center-boundary collaborative attention fusion, and a graph convolutional network to achieve accurate droplet segmentation and positive ratio statistics.
The method improves droplet segmentation and statistical accuracy, enhances robustness to changes in imaging conditions and noise, ensures the stability and geometric consistency of results, and enables more flexible fluorescence ratio statistics.
[Figure CN122244012A_ABST: abstract drawing]
Description
Technical Field
[0001] This invention relates to the field of ddPCR detection technology, specifically to a method and system for statistical analysis of fluorescence ratio information of microfluidic droplet arrays based on deep learning.
Background Technology
[0002] Droplet digital polymerase chain reaction (ddPCR) is a high-precision absolute quantitative detection method for nucleic acids based on microfluidic technology. This technology disperses the sample into a large number of uniform, independent microdroplets using a microfluidic chip, with each microdroplet acting as an independent miniature reactor. Target nucleic acid molecules are diluted to the single-molecule level and amplified independently within these microdroplets, significantly improving detection sensitivity and quantitative accuracy.
[0003] During the reaction, the sample and oil phase mix in a microchannel to form droplets, which are then enclosed within the oil phase to prevent cross-contamination. After amplification, droplets containing the target nucleic acid show positive fluorescence in the fluorescence detection system, while droplets without the target molecule show negative fluorescence. By capturing the fluorescence signal of the droplets and performing intensity analysis, the system can distinguish between positive and negative droplets and calculate their proportion. Based on the Poisson distribution model, the absolute copy number of the target nucleic acid molecule in the sample can be further calculated.
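The Poisson step can be made concrete with a short sketch. Assuming target molecules are distributed randomly among droplets of equal volume, a positive ratio p corresponds to a mean of λ = −ln(1 − p) copies per droplet. The function name and the droplet volume of 0.85 nL are illustrative assumptions, not values taken from this document.

```python
import math

def copies_per_microliter(n_positive, n_total, droplet_volume_nl):
    """Estimate target concentration from droplet counts via Poisson statistics."""
    if n_total <= 0 or not 0 <= n_positive < n_total:
        # p == 1 would give an infinite estimate (every droplet is saturated)
        raise ValueError("need 0 <= n_positive < n_total")
    p = n_positive / n_total                 # positive ratio
    lam = -math.log(1.0 - p)                 # mean copies per droplet
    return lam / (droplet_volume_nl * 1e-3)  # convert nL to uL

# Hypothetical counts: 2000 positive droplets out of 20000
conc = copies_per_microliter(n_positive=2000, n_total=20000, droplet_volume_nl=0.85)
print(f"{conc:.1f} copies/uL")
```

Because the estimate depends only on the positive ratio and the droplet volume, any droplet that is miscounted or mis-segmented shifts the result directly, which is why segmentation accuracy matters for quantification.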
[0004] Compared to traditional quantitative PCR (qPCR), ddPCR does not rely on a standard curve for relative quantification; instead, it achieves absolute quantification based on the droplet positivity rate. Therefore, it has significant advantages in scenarios such as gene mutation detection, copy number variation analysis, rare sequence identification, and micro-sample detection. Existing droplet digital PCR detection and classification methods have the following shortcomings: First, the droplets adhere and are not accurately separated.
[0005] For example, Chinese patent CN106596489A proposes a method for processing fluorescence intensity data in fluorescent droplet detection, used for classifying positive and negative droplets in microdroplet digital PCR. By collecting and preprocessing fluorescent droplet data, it uses maximum likelihood estimation to fit the fluorescence intensity distribution to a normal distribution, calculates the classification threshold, and identifies positive droplets. While it proposes a classification method based on fluorescence intensity data and maximum likelihood estimation, it does not consider differences in droplet size and morphology, easily including excessively large or agglomerated abnormal droplets in the statistics, leading to discrepancies between the calculated nucleic acid concentration and the statistically significant positive proportion.
[0006] Chinese patent CN117274131A proposes a method and system for quantifying droplet digital PCR. This method generates droplets using a droplet generator and arranges them in a single layer within a flow channel plate. It uses an optical module to acquire bright-field and fluorescence images, identifying droplet positions and sizes in the bright-field image and performing droplet segmentation and masking in the fluorescence image. The grayscale value of each droplet is then counted and classified to quantify the number of positive and negative droplets. Although this patent incorporates a combination of bright-field and fluorescence image processing, in scenarios involving droplet accumulation or adhesion, its watershed segmentation is still prone to oversegmentation or undersegmentation, making it difficult to ensure segmentation accuracy.
[0007] Secondly, existing technologies are mostly based on preset intensity thresholds for classification. For example, Chinese patent CN106596489A relies on a fixed formula to calculate the threshold, which can easily cause false positives or false negatives under different experimental conditions. On the other hand, Chinese patent CN117274131A requires manually setting grayscale ranges to distinguish between positive, negative and uncertain droplets, which has poor stability across experiments and makes it difficult to obtain consistent fluorescence ratio statistical results.
[0008] Third, existing technologies suffer from insufficient interference signal removal. For example, Chinese patent CN113789259A designs a microdroplet digital nucleic acid detection device based on machine learning, combining microfluidic chip technology to achieve high-throughput droplet generation, nucleic acid amplification, and fluorescence detection. Machine learning algorithms are used to automatically classify and process fluorescence images, enabling absolute quantitative detection of viral nucleic acids. However, artifacts such as background signals, chip edges, flow guides, and fluorescence contamination points exist in the fluorescence images, which are difficult to effectively eliminate using traditional threshold segmentation, easily leading to classification confusion. Similarly, while Chinese patent CN117274131A proposes methods for removing light spots and scratches, it still struggles to completely remove small satellite droplets and strong light contamination; residual artifacts may be counted as valid droplets, affecting classification accuracy.
[0009] Fourth, existing technologies suffer from insufficient geometric consistency and morphological constraints. For example, Chinese patent CN109657731A proposes an anti-interference classification method for a droplet digital PCR instrument. It utilizes image processing technology to convert fluorescence intensity data into binary images, combining support vector machines and optimized classification models. By introducing artificial false negative data and adjusting the classification threshold, it improves the anti-interference classification ability of fluorescence data. Although an anti-interference classification method is proposed, it still relies solely on fluorescence intensity data and does not fully utilize the morphological features of droplets, resulting in insufficient geometric consistency in droplet segmentation results and a tendency to misclassify in atypical droplet cases. Furthermore, Chinese patent CN113789259A relies on machine learning but does not introduce droplet geometric priors, making it difficult to guarantee the coordination between droplet boundaries and center features.
[0010] Fifth, existing technologies are not robust enough to noise and insufficient imaging resolution.
[0011] Existing methods generally rely on a single image feature (such as fluorescence intensity alone) and lack comprehensive modeling of multidimensional features. When the Z-axis resolution of microscopic imaging is insufficient or experimental noise is significant, the classification results of CN106596489A and CN109657731A are easily affected by interference; although CN117274131A adds vignetting correction, it still has difficulty maintaining statistical accuracy and stability in complex noise environments.
Summary of the Invention
[0012] To address the shortcomings of existing technologies, the present invention aims to provide a method and system for statistical analysis of fluorescence ratio information in microfluidic droplet arrays based on deep learning.
[0013] To achieve the above objectives, the present invention provides the following technical solution: a deep learning-based statistical method for fluorescence ratio information of microfluidic droplet arrays, comprising the following steps:
S1: acquire a three-dimensional fluorescence volume image of the droplets and preprocess the three-dimensional fluorescence volume image;
S2: construct the DropletSegNet model;
S3: pre-train the DropletSegNet model constructed in S2;
S4: use the three-dimensional fluorescence volume image of the target as input to the DropletSegNet model, and transform the raw output of the DropletSegNet model into droplet analysis results with physical meaning and biological interpretability.
The construction of the DropletSegNet model includes the following steps:
S21: extract multi-scale features from the input three-dimensional fluorescence volume image through an encoder;
S22: input the bottleneck features output by the deepest layer of the encoder into the decoder, and gradually restore the spatial resolution through cascaded asymmetric upsampling modules;
S23: after the decoder has fully restored the feature map to the same spatial resolution as the input image, feed the high-resolution features in parallel into two independent 3D convolutional branches, the center branch and the boundary branch, and then perform center-boundary collaborative attention fusion to obtain the final fused full-resolution features $F_{fused}$. The center branch emphasizes reconstruction of the droplet body region, and the boundary branch emphasizes reconstruction of the droplet boundary and the contact gap region between adjacent droplets;
S24: feed the fused full-resolution features $F_{fused}$ into a multi-task prediction head, which simultaneously outputs a droplet foreground probability map, a sphericity prediction map, and a radius field prediction map. The foreground probability map, activated by a sigmoid function, represents the confidence that each voxel belongs to the droplet foreground. The sphericity prediction map, activated by a sigmoid function, quantifies the geometric consistency of each voxel with a spherical droplet in space. The radius field prediction map, activated under a non-negative constraint, represents the estimated distance from each voxel to the centroid of its droplet;
S25: construct a graph convolutional network based on the foreground probability map and, optionally, the radius field prediction map and / or the sphericity prediction map, and perform graph convolutional edge discrimination.
[0014] Step S1 includes:
S11 Image acquisition: three-dimensional fluorescence volume images are acquired from the microfluidic droplet array experimental platform or reconstructed from two-dimensional / pseudo-three-dimensional slice sequences. The anisotropic pixel spacings Δx, Δy, and Δz are recorded, where Δz > Δx = Δy;
S12 Normalization and background suppression: Z-score intensity normalization is performed on the input 3D fluorescence volume image, and flat field correction or background subtraction is used to suppress non-uniform illumination;
S13 Voxel anisotropy processing: the original image resolution and spatial proportions are maintained, without interpolating or scaling the Z-axis;
S14 Data augmentation: random geometric and light intensity transformations are applied to the training data, including one or more of random flipping, rotation, intensity jittering, and Gaussian noise perturbation, wherein the transformation amplitude in the Z-axis direction is smaller than that in the XY plane.
[0015] In S21, multi-scale features are extracted through a series of sequentially connected coding units. Each coding unit first uses a three-dimensional anisotropic convolutional kernel of size (1, 3, 3) to extract features, with downsampling performed only in the XY plane while the resolution in the Z direction is maintained. Anisotropic attention fusion is then performed: Z-axis features are extracted using a depthwise separable convolution with kernel size (3, 1, 1), XY-plane features are extracted using a depthwise separable convolution with kernel size (1, 3, 3), and the outputs of the two branches are adaptively fused through a global channel gating factor. The fused features are then projected by a 1×1×1 convolution and added to the input features as a residual, and the result is used as the output of the coding unit.
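[0016] The global channel gating factor $g$ is a weight vector set per channel, and the weighted fusion is performed according to $\tilde{F} = g \odot F_z + (1 - g) \odot F_{xy}$, where $F_z$ and $F_{xy}$ are the outputs of the Z-axis branch and the XY branch, and $\odot$ indicates element-wise multiplication broadcast by channel.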
[0017] Each asymmetric upsampling module employs 3D linear interpolation, scaling up by a factor of 2 only in the XY plane while maintaining the same resolution in the Z-axis. After interpolation, a 1×3×3 3D convolutional block is cascaded. After each upsampling stage, the upsampled features of that stage are concatenated with the native features of the same resolution in the corresponding layer of the encoder along the channel dimension. The concatenated composite features are then subjected to 3D convolution for dimensionality reduction and fed into another anisotropic attention module for recalibration and refinement of the Z-axis and XY-axis features, serving as the input for the next upsampling stage.
[0018] The center-boundary collaborative attention fusion proceeds as follows: the features $F_c$ output by the center branch are used to generate the query Q, and the features $F_b$ output by the boundary branch are used to generate the key K and the value V. Q, K, and V are all obtained through 1×1×1 three-dimensional convolutional linear mappings. The attention relevance matrix is calculated from the query Q and the key K as $\mathrm{Attn} = \mathrm{softmax}(QK^{\top} / \sqrt{C})$, where C represents the channel dimension or embedding dimension. The interaction feature is then calculated as $H = \mathrm{Attn} \times V$. Finally, H is added to $F_c$ as a residual to obtain the fused full-resolution features $F_{fused} = F_c + H$.
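The fusion step can be illustrated with a small NumPy sketch. It assumes scaled dot-product attention with a softmax over the keys, treats each 1×1×1 convolution as a per-voxel linear map, and flattens the volume into N voxels with C channels; the function names, weight shapes, and random inputs are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def center_boundary_fusion(f_c, f_b, w_q, w_k, w_v):
    """f_c, f_b: (N, C) voxel features from the center / boundary branches."""
    q = f_c @ w_q                     # query from center features
    k = f_b @ w_k                     # key from boundary features
    v = f_b @ w_v                     # value from boundary features
    c = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(c))   # (N, N) attention relevance matrix
    h = attn @ v                      # interaction features
    return f_c + h                    # residual sum with the center features

rng = np.random.default_rng(0)
n, c = 16, 8
f_c, f_b = rng.normal(size=(n, c)), rng.normal(size=(n, c))
w_q, w_k, w_v = (rng.normal(size=(c, c)) * 0.1 for _ in range(3))
f_fused = center_boundary_fusion(f_c, f_b, w_q, w_k, w_v)
print(f_fused.shape)
```

A dense N × N matrix is only practical for small N; on a full-resolution volume an implementation would likely need windowed or otherwise restricted attention, which the text does not specify.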
[0019] In S25, the graph convolutional network operates as follows. The foreground probability map is binarized according to a preset threshold to obtain a foreground voxel mask, and three-dimensional connected component analysis is performed on the mask to obtain candidate regions. Each candidate region, or each candidate fragment obtained by further oversegmenting a candidate region, is used as a node in the graph. For each node, a feature vector is extracted containing the node volume, average confidence, average radius, and optionally average sphericity; the node centroid coordinates are concatenated with this feature vector to form the node input matrix X0. The adjacency structure of the graph is constructed from the Euclidean distances between node centroids. The initial node representation H0 is obtained by a linear mapping of the node input matrix X0. Several graph convolutional layers are then stacked on the adjacency matrix, with self-loops added and symmetric normalization applied, to aggregate neighborhood features and produce the enhanced node representations. For each pair of nodes connected by an edge, edge features are constructed and input into the edge classifier, which outputs a binary decision on whether the edge should be kept or cut. Finally, a disjoint-set (union-find) merge is performed over the retained edge set to aggregate connected nodes and update the instance labels, yielding the optimized droplet instance segmentation result.
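The final aggregation step can be sketched with a plain union-find structure. The sketch assumes the edge classifier has already produced a list of retained edges as pairs of node indices; the function name and the example edges are hypothetical.

```python
def merge_fragments(num_nodes, kept_edges):
    """Union-find merge: fragments joined by a retained edge share one droplet label."""
    parent = list(range(num_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving keeps trees shallow
            i = parent[i]
        return i

    for a, b in kept_edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra                 # merge the two components

    roots = {}
    # Relabel component roots to consecutive instance ids
    return [roots.setdefault(find(i), len(roots)) for i in range(num_nodes)]

# Five fragments; the classifier kept edges 0-1 and 3-4 and cut everything else
labels = merge_fragments(5, kept_edges=[(0, 1), (3, 4)])
print(labels)   # [0, 0, 1, 2, 2]
```

Fragments 0 and 1 become one droplet, fragment 2 stays on its own, and fragments 3 and 4 become a third droplet, so a cut edge is what separates two adhered droplets.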
[0020] In S3, S31 constructs a multi-task supervision signal, which includes: foreground segmentation supervision signal, sphericity supervision signal, radius field supervision signal, and topology graph edge supervision signal; S32 constructs a multi-task loss function, which includes a loss function for foreground segmentation supervision, a loss function for sphericity prediction graph supervision, a loss function for radius field prediction, and a loss function for edge binary classification supervision of the graph decoupling module, and weights and combines the loss functions to form the total loss function; S33 uses the total loss function to train the DropletSegNet model.
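The weighted combination in S32 can be sketched as follows. The text does not name the individual loss functions or their weights; binary cross-entropy for the foreground and edge terms, mean squared error for the sphericity and radius terms, and the weight values are all assumptions chosen for illustration.

```python
import math

def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)      # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)

def mse(preds, targets):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def total_loss(out, gt, weights=(1.0, 0.5, 0.5, 1.0)):
    """Weighted sum of the four supervision terms (weights are hypothetical)."""
    w_fg, w_sph, w_rad, w_edge = weights
    return (w_fg * bce(out["fg"], gt["fg"])
            + w_sph * mse(out["sph"], gt["sph"])
            + w_rad * mse(out["rad"], gt["rad"])
            + w_edge * bce(out["edge"], gt["edge"]))

gt = {"fg": [1, 0, 1], "sph": [0.9, 0.0, 0.8], "rad": [2.0, 0.0, 1.5], "edge": [1, 0]}
out = {"fg": [0.8, 0.1, 0.7], "sph": [0.7, 0.1, 0.8], "rad": [1.8, 0.2, 1.4], "edge": [0.9, 0.3]}
print(total_loss(out, gt))
```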
[0021] S4 includes the following: S41 acquires a 3D fluorescence volume image and outputs a foreground probability map, a sphericity prediction map, and a radius field prediction map after inference using the DropletSegNet model. Threshold segmentation is applied to the foreground probability map to obtain a binary droplet mask. 3D connected component analysis is performed on the binary droplet mask to extract candidate droplet regions. The sphericity prediction map is used to constrain the sphericity of each candidate region, filtering or marking abnormal candidate regions. The radial geometric information provided by the radius field prediction map is used to correct or refine the boundaries of the candidate regions. S42 For regions with adhesion, high-density stacking, or abnormal morphology, the graph structure post-processing module is activated to optimize instance labels; the graph structure post-processing module includes: generating candidate nodes based on the droplet candidate region, constructing an adjacency graph, inputting the graph structure into a graph neural network and performing binary classification on the edges in the graph, performing node aggregation or separation based on the classification results, and updating instance labels; S43 extracts fluorescence intensity features from each droplet region in the final instance label image, performs positive / negative classification based on the fluorescence intensity features, counts the number of positive droplets and the total number of droplets, and calculates the positive ratio.
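Step S43 can be illustrated with a minimal sketch. The text does not state which automatic classifier is used; the example below substitutes a two-cluster one-dimensional k-means on per-droplet mean intensities, which is an assumption. The function name and the intensity values are hypothetical.

```python
def positive_ratio(droplet_means, iters=50):
    """Cluster per-droplet mean intensities into two groups; return (ratio, threshold)."""
    lo, hi = min(droplet_means), max(droplet_means)
    if lo == hi:
        raise ValueError("all intensities are identical; nothing to separate")
    for _ in range(iters):
        thr = (lo + hi) / 2.0                      # midpoint between cluster centers
        neg = [v for v in droplet_means if v <= thr]
        pos = [v for v in droplet_means if v > thr]
        new_lo, new_hi = sum(neg) / len(neg), sum(pos) / len(pos)
        if (new_lo, new_hi) == (lo, hi):
            break                                  # centers stopped moving
        lo, hi = new_lo, new_hi
    return len(pos) / len(droplet_means), thr

# One mean fluorescence intensity per droplet instance (hypothetical values)
means = [10.0, 12.0, 11.0, 95.0, 9.0, 100.0]
ratio, threshold = positive_ratio(means)
print(ratio, threshold)
```

This split always produces two groups, so a sample without any positive droplets would need a separate check, for example against a negative control.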
[0022] A deep learning-based statistical system for fluorescence ratio information of microfluidic droplet arrays, comprising: The image acquisition module is used to acquire three-dimensional fluorescence volume images of the droplets and to preprocess the three-dimensional fluorescence volume images. The DropletSegNet model includes: The encoder consists of multiple levels of sequentially connected coding units, each of which contains a three-dimensional anisotropic convolutional feature block and an anisotropic attention module. The decoder is connected to the deepest output of the encoder and contains multiple cascaded asymmetric upsampling modules. A full-resolution dual-branch structure is set at the output of the decoder to feed the feature map restored to the original input resolution into a central branch and a boundary branch in parallel. The central branch and the boundary branch are independent three-dimensional convolutional branches, which are used to reconstruct the main body region of the droplet and the boundary and contact gap region of the droplet, respectively. The center-boundary collaborative attention module generates queries using the output features of the center branch and generates keys and values using the output features of the boundary branches. It calculates interaction features through an attention mechanism and adds the interaction features to the residual of the output features of the center branch to obtain the fused full-resolution features. The multi-task prediction head receives the fused full-resolution features and outputs in parallel a droplet foreground probability map, a sphericity map, and a radius field prediction map, wherein the foreground probability map and the sphericity map are activated by Sigmoid, and the radius field prediction map is activated by non-negative constraints. 
The droplet graph construction and graph convolution edge discrimination module takes the foreground probability graph and optional radius field prediction graph and / or sphericity graph as input. It generates candidate nodes by binarizing the foreground probability graph and performing three-dimensional connected component analysis. For each node, it extracts a feature vector including volume, average confidence, average radius, and optional average sphericity. Based on the Euclidean distance between node centroids, it constructs the graph's adjacency structure. In the graph convolutional layer, it performs neighborhood aggregation on the node features to obtain an enhanced node representation. Then, for each edge, it constructs an edge feature including the representations of the two endpoint nodes, the difference vector, and the centroid distance. The edge classifier determines whether the edge should be retained or pruned, and the instance label is updated based on the set of retained edges. The output module is used to transform the raw output of the DropletSegNet model into droplet analysis results that are physically meaningful and biologically interpretable.
[0023] The beneficial effects of this invention are: 1. It solves the problem of inaccurate segmentation due to droplet adhesion, and improves the accuracy of segmentation and statistics.
[0024] By introducing droplet geometric prior constraints, including sphericity prediction branch and radius field prediction branch, abnormal shapes or adhesion regions are quantified and filtered.
[0025] A droplet stacking topology decoupling module is designed, which models adhesion decoupling as an edge classification problem on a graph structure and uses a graph convolutional network (GCN) for structured post-processing. This achieves accurate decoupling and segmentation even in cases of tight droplet stacking or adhesion, improving the accuracy of nucleic acid concentration calculation and positive rate statistics.
[0026] 2. It enhances robustness to changes in imaging conditions and noise, ensuring the stability of the results.
[0027] An anisotropic attention module was constructed, which structurally compensates for the problem that the z-axis resolution in microdroplet fluorescence images is significantly lower than that in the XY axis by modeling and adaptively fusing features in the z and XY axes respectively, thus significantly improving the robustness of droplet detection and segmentation under low-resolution imaging conditions.
[0028] 3. Achieved more flexible and adaptable fluorescence ratio statistics.
[0029] A statistical method for fluorescence signals based on droplet instance-level fluorescence intensity extraction and automatic classification is proposed, replacing the classification methods in existing methods that rely on fixed thresholds or manually set grayscale ranges. This solves the problem of poor stability of traditional methods under different experimental conditions, and obtains more consistent and accurate statistical results for the positive / negative droplet ratio.
[0030] 4. Improved the geometric consistency of segmentation results and avoided misjudgment.
[0031] The designed center-boundary collaborative attention module applies geometric consistency constraints to the droplet segmentation results through the interaction of center features (droplet core localization) and boundary features (contour details). This avoids the misjudgment or morphological distortion that tends to occur in atypical droplet cases when relying solely on intensity features or when no geometric priors are introduced.
Brief Description of the Drawings
[0032] Figure 1 is a flowchart of the present invention.
[0033] Figure 2 is a schematic diagram of the DropletSegNet model of the present invention.
[0034] Figure 3 is a schematic diagram of the graph convolutional network of the present invention.
Detailed Description of the Embodiments
[0035] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0036] It should be noted that all directional indications (such as up, down, left, right, front, back, etc.) in the embodiments of the present invention are only used to explain a specific posture (as shown in the figure).
[0037] As shown in Figure 1, this invention discloses a deep learning-based method for statistical analysis of fluorescence ratio information in microfluidic droplet arrays, comprising the following steps: S1, acquiring a three-dimensional fluorescence volume image of the droplets and preprocessing it; S2, constructing the DropletSegNet model; S3, pre-training the DropletSegNet model constructed in S2; S4, using the three-dimensional fluorescence volume image of the target as input to the DropletSegNet model and transforming the raw output of the model into droplet analysis results with physical meaning and biological interpretability.
[0038] S1: data acquisition and preprocessing (optional, to be enabled according to application requirements).
S1.1 Image Acquisition: Three-dimensional fluorescence volume images are acquired from a microfluidic droplet array experimental platform, or volume data are reconstructed from two-dimensional / pseudo-three-dimensional slice sequences. The imaging system records the anisotropic pixel spacings Δx, Δy, and Δz, typically satisfying Δz > Δx = Δy, which reflects the physical characteristic that the microscope has lower resolution along the optical axis (Z axis).
[0039] S1.2 Normalization and Background Suppression: Z-score intensity normalization is performed on the input volume to eliminate intensity differences between batches or imaging conditions, and flat field correction / background subtraction is used to suppress non-uniform illumination and ensure stable contrast between the droplet fluorescence signal and the background.
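A minimal NumPy sketch of this step is given below. It assumes a (Z, Y, X) array layout and, when no background image is supplied, uses the per-slice median as a stand-in background estimate; both choices and the function name are assumptions for illustration.

```python
import numpy as np

def preprocess_volume(vol, background=None, eps=1e-8):
    """Background-subtract, then Z-score normalize a (Z, Y, X) fluorescence volume."""
    vol = vol.astype(np.float32)
    if background is None:
        # Assumption: the per-slice median approximates the background level
        background = np.median(vol, axis=(1, 2), keepdims=True)
    vol = vol - background
    return (vol - vol.mean()) / (vol.std() + eps)   # zero mean, unit variance

rng = np.random.default_rng(0)
raw = rng.normal(100.0, 5.0, size=(8, 64, 64))      # synthetic stand-in volume
norm = preprocess_volume(raw)
print(norm.shape, float(norm.mean()), float(norm.std()))
```

A measured flat-field or dark image can be passed through the `background` argument instead of the median estimate.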
[0040] S1.3 Voxel Anisotropy Treatment: The original image resolution and spatial proportions are maintained without Z-axis interpolation or scaling. The Z-axis and XY-plane features are subsequently modeled separately in the anisotropic convolution and attention mechanism of the deep network, so that imaging anisotropy is compensated at the level of the model structure.
[0041] S1.4 Data Augmentation: To enhance the model's generalization ability, random geometric and light intensity transformations are applied to the training data, including random flipping, slight rotation, intensity jitter, and Gaussian noise perturbation. The transformation amplitude in the Z-axis direction is smaller than that in the XY plane to maintain the near-spherical physical morphology of the droplets.
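The augmentation can be sketched as follows, assuming a (Z, Y, X) layout. To keep the example short it uses in-plane rotations by multiples of 90° in place of the slight rotations described above and leaves the Z axis geometrically untouched; the probabilities, jitter range, and noise level are hypothetical.

```python
import numpy as np

def augment(vol, rng):
    """Randomly augment a (Z, Y, X) volume; geometric changes stay in the XY plane."""
    if rng.random() < 0.5:
        vol = vol[:, :, ::-1]                          # flip along X
    if rng.random() < 0.5:
        vol = vol[:, ::-1, :]                          # flip along Y
    k = int(rng.integers(0, 4))
    vol = np.rot90(vol, k=k, axes=(1, 2))              # in-plane rotation by k * 90 degrees
    vol = vol * rng.uniform(0.9, 1.1)                  # global intensity jitter
    vol = vol + rng.normal(0.0, 0.01, size=vol.shape)  # additive Gaussian noise
    return np.ascontiguousarray(vol)

rng = np.random.default_rng(0)
vol = np.ones((4, 16, 16), dtype=np.float32)
aug = augment(vol, rng)
print(aug.shape)
```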
[0042] In S2, an encoder-decoder 3D network, DropletSegNet, is used, and a multi-task prediction head and a graph neural network post-processing layer are coupled at the decoding end to form an end-to-end droplet segmentation and adhesion decoupling process. As shown in Figure 2, the specific steps for building the DropletSegNet model include: S21, extracting multi-scale features from the input three-dimensional fluorescence volume image through an encoder.
A1. Three-dimensional anisotropic convolutional feature block (ConvBlock3D): Features are extracted from the input volume data using a three-dimensional anisotropic convolutional operator with a kernel size of (1,3,3). In a preferred implementation, the ConvBlock3D downsamples only in the XY plane while maintaining resolution in the Z direction. This downsampling can be achieved by setting the convolution stride or by a pooling operator. The ConvBlock3D may include a Conv3d, a three-dimensional normalization operator, and a nonlinear activation function connected in sequence. The three-dimensional normalization operator can be any one or a combination of Batch Normalization, Instance Normalization, or Group Normalization. The activation function can be ReLU or another equivalent nonlinear function, and the block is used to extract low-level voxel features.
[0043] In another embodiment, the 3D convolutional kernel (1, 3, 3) can be replaced with a standard isotropic 3D convolutional kernel (3, 3, 3). Interpolation upsampling (e.g., bilinear or cubic interpolation) can be performed directly on the z-axis during preprocessing to match the XY-axis resolution, thereby transforming the anisotropic problem into an isotropic one, which can then be processed using a standard 3D U-Net structure.
[0044] A2. Anisotropic Attention Module: Depthwise separable 3D convolutions are used to model feature information along the Z axis and in the XY plane separately. Specifically, the Z-axis branch uses a depthwise separable convolution with a kernel size of (3,1,1) to extract structural continuity features across layers, while the XY branch uses a depthwise separable convolution with a kernel size of (1,3,3) to extract in-plane edge and texture detail features. A global channel gating factor $g$ is introduced to adaptively fuse the outputs of the two branches, where $g$ can be a weight vector set per channel, used to adjust the contribution ratio of Z-axis features to XY-plane features. In a preferred implementation, let the input feature be $F$, the Z-axis branch output be $F_z$, and the XY branch output be $F_{xy}$. Then the fused feature $\tilde{F}$ can be represented as:
$$\tilde{F} = g \odot F_z + (1 - g) \odot F_{xy}$$
[0045] where $\odot$ indicates element-wise multiplication (broadcast by channel).
[0046] After fusion, a 1×1×1 convolution projection yields $F_{proj} = \mathrm{Conv}_{1 \times 1 \times 1}(\tilde{F})$. The module output $F_{out}$ is obtained by adding $F_{proj}$ to the input feature $F$ as a residual, for example:
$$F_{out} = F + F_{proj}$$
[0047] Through the above-mentioned structure of "separation modeling - gated weighted fusion - projection and residual", the network can adaptively adjust the dependence on Z-axis and XY-axis information according to the input features without changing the anisotropic sampling relationship of the original body data. This achieves structural compensation for the incomplete axial information caused by insufficient Z-axis resolution in ddPCR imaging.
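The "gated weighted fusion, projection and residual" part of this structure can be sketched in NumPy. The sketch assumes the gate is a per-channel sigmoid and that the two branches are blended as a convex combination, which is one plausible reading of "contribution ratio" rather than a confirmed formula. The two depthwise separable branches are taken as given inputs, and all names are hypothetical.

```python
import numpy as np

def gated_fusion(f_in, f_z, f_xy, gate_logits, w_proj):
    """Features are (C, Z, Y, X); gate_logits is (C,); w_proj is (C, C)."""
    g = 1.0 / (1.0 + np.exp(-gate_logits))            # per-channel gate in (0, 1)
    g = g[:, None, None, None]                        # broadcast over Z, Y, X
    fused = g * f_z + (1.0 - g) * f_xy                # assumed convex combination
    proj = np.einsum("oc,czyx->ozyx", w_proj, fused)  # 1x1x1 conv as channel mixing
    return f_in + proj                                # residual connection

rng = np.random.default_rng(0)
c, z, y, x = 4, 3, 8, 8
f_in, f_z, f_xy = (rng.normal(size=(c, z, y, x)) for _ in range(3))
out = gated_fusion(f_in, f_z, f_xy, gate_logits=np.zeros(c), w_proj=np.eye(c))
print(out.shape)
```

With zero gate logits each channel weights the two branches equally; a large positive logit makes that channel rely almost entirely on the Z-axis branch.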
[0048] A3. Multi-layer coding: The encoder consists of multiple levels of coding units connected in series to achieve multi-scale feature representation. A3.1 Layer stacking method: The encoder is composed of N levels of coding units connected in series, where N≥2; each level of coding unit includes ConvBlock3D and anisotropic attention module in sequence, which is used to first extract local structural features, and then perform anisotropic feature recalibration and fusion.
[0049] A3.2 Spatial Scale Transformation Strategy: Spatial scale transformation of each coding unit is preferably implemented by an asymmetric three-dimensional max pooling layer (MaxPool3d), with its pooling kernel and stride set to (1,2,2), so that it performs downsampling by 2 times only in the XY plane direction while maintaining resolution without downsampling in the Z direction.
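The effect of a (1,2,2) pooling kernel and stride on the tensor shape can be checked with a small NumPy sketch, assuming a single-channel (Z, Y, X) volume; the function name is hypothetical.

```python
import numpy as np

def max_pool_xy(vol):
    """Max-pool a (Z, Y, X) volume with kernel and stride (1, 2, 2)."""
    z, y, x = vol.shape
    vol = vol[:, : y - y % 2, : x - x % 2]     # drop an odd trailing row / column
    # Group each 2x2 XY block into its own axes, then take the block maximum
    return vol.reshape(z, y // 2, 2, x // 2, 2).max(axis=(2, 4))

vol = np.arange(4 * 6 * 8, dtype=np.float32).reshape(4, 6, 8)
pooled = max_pool_xy(vol)
print(vol.shape, "->", pooled.shape)   # (4, 6, 8) -> (4, 3, 4)
```

The Z dimension keeps all 4 slices while Y and X are halved. In PyTorch the equivalent layer would be `torch.nn.MaxPool3d(kernel_size=(1, 2, 2), stride=(1, 2, 2))`.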
[0050] A3.3 Channel Number Configuration Strategy: The number of feature channels can be gradually increased with the coding depth to enhance the high-level semantic expression capability; channel growth can be achieved by a fixed multiple, a fixed increment, or other equivalent strategies.
[0051] A3.4 Attention Activation Strategy: The anisotropic attention module can be activated in each level of coding unit; in another optional implementation, the anisotropic attention module is only activated in some coding layers to reduce computational complexity while ensuring segmentation accuracy.
[0052] A3.5 Bottleneck layer setting: A bottleneck layer is set at the deepest layer of the encoder. The contextual information is further aggregated through a deeper ConvBlock3D to obtain a high semantic feature representation. The "deeper" can be achieved by increasing the number of convolutional layers, increasing the number of channels, introducing dilated convolutions to expand the receptive field, or using other equivalent methods. The output features of the bottleneck layer are used as input for subsequent decoding and reconstruction.
[0053] S22: The bottleneck features output by the deepest layer of the encoder are input into the decoder, and the spatial resolution is restored step by step through cascaded asymmetric upsampling modules. S23: After the decoder has restored the spatial resolution of the feature map to the full resolution of the input image, the high-resolution features are fed in parallel into two independent 3D convolutional branches, the center branch and the boundary branch, and center-boundary collaborative attention fusion is then performed to obtain the final fused full-resolution feature $F_{fused}$. The center branch emphasizes the reconstruction of the droplet body region, and the boundary branch emphasizes the reconstruction of the droplet boundary and the contact gap region between adjacent droplets. The center-boundary collaborative attention module can also be replaced with a standard skip connection structure, which directly concatenates or adds the features of the corresponding encoder layer to the features upsampled by the decoder to achieve a simple fusion of center features and boundary features.
[0054] Specifically, the decoder first performs asymmetric scale recovery and then performs dual-branch collaborative fusion at full resolution to ensure accurate, high-resolution reconstruction of droplet boundaries and adhesion regions. This includes: B1. Asymmetric Scale Recovery and Skip Connection: After being output from the deepest bottleneck layer of the encoder, the features enter the decoder backbone. The decoding process uses cascaded asymmetric upsampling modules (UpConv3D) to restore spatial resolution step by step.
[0055] Specifically, the upsampling operation employs three-dimensional linear interpolation (trilinear interpolation) and strictly limits the scale factor to (1,2,2), that is, the XY plane is magnified by a factor of 2 while the Z-axis resolution remains unchanged. This design replaces the traditional deconvolution, avoiding the checkerboard artifacts common in 3D image reconstruction and maintaining the translation invariance of the anisotropic microfluidic data along the Z-axis. A 1×3×3 3D convolutional block is cascaded after interpolation to suppress aliasing effects caused by interpolation.
[0056] Subsequently, the network introduces a skip connection, which concatenates the current upsampled features with the corresponding native features of the same resolution from the encoder along the channel dimension, forming a composite feature that combines shallow high-frequency spatial details with deep global semantic information. The concatenated composite feature is further dimensionality-reduced through a 3D convolutional block and then input into an anisotropic attention module for recalibration and refinement of Z-axis and XY-axis features, serving as the input to the next decoding layer.
[0057] B2. Full-Resolution Dual-Branch Feature Extraction After completing scale recovery for all decoding levels, ensuring the feature map spatial resolution is fully restored to match the initial input image (i.e., full resolution), the network feeds these high-resolution features in parallel into two independent 3D convolutional branches for decoupled representation: Center Branch: Used to emphasize the reconstruction of the droplet body region, enhance the stable expression of the droplet core features, and improve the robustness of the localization of the droplet body; Boundary Branch: Used to emphasize the reconstruction of droplet boundaries and contact gaps between adjacent droplets, enhancing sensitivity to boundary details and separation cues to reduce the risk of misconnection in densely packed areas.
[0058] B3. Center-Boundary Co-Attention Module: To constrain the separation response of the boundary branch using the droplet body localization information provided by the center branch, and in turn to reinforce the regional consistency representation of the center branch using the boundary / gap information provided by the boundary branch, the network adopts a center-boundary collaborative attention mechanism for the final interactive fusion of features at full resolution.
[0059] Let the output feature of the center branch be $F_c$ and the output feature of the boundary branch be $F_b$. The query $Q$ is generated from the center branch feature, and the key $K$ and value $V$ are generated from the boundary branch feature. $Q$, $K$, and $V$ are all obtained by 1×1×1 three-dimensional convolutional linear mapping.
[0060] The attention relevance matrix $A$ is calculated from the query and key: $A = \mathrm{softmax}\left(Q K^{\top} / \sqrt{C}\right)$
[0061] where $C$ represents the channel dimension or embedding dimension used in the attention calculation.
[0062] The boundary values $V$ are then aggregated with the attention matrix $A$ as weights to obtain the interaction feature $F_{int}$: $F_{int} = A V$
[0063] Because the boundary values $V$ have already been mapped to the target channel space by convolution, the interaction feature $F_{int}$ is reshaped and added directly to the original center branch feature $F_c$ as a residual, giving the final fused full-resolution feature $F_{fused}$: $F_{fused} = F_c + \mathrm{Reshape}(F_{int})$
[0064] Through this high-resolution, center-guided injection of boundary information, the network can impose geometric consistency constraints on adherent regions. The fused feature $F_{fused}$ can be used directly as the unified input to the multi-task prediction heads (segmentation head, sphericity head, radius head).
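The center-boundary co-attention can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the patented implementation: voxels are flattened into a sequence, the 1×1×1 convolutions are written as per-voxel channel matrices (`w_q`, `w_k`, `w_v` are hypothetical names), and the attention uses the standard scaled dot-product form with a $\sqrt{C}$ scale.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def center_boundary_coattention(f_c, f_b, w_q, w_k, w_v):
    """f_c, f_b: (C, D, H, W); w_q, w_k, w_v: (C, C) pointwise maps."""
    c, d, h, w = f_c.shape
    n = d * h * w
    # A 1x1x1 convolution is a per-voxel linear map over channels.
    q = (w_q @ f_c.reshape(c, n)).T    # (N, C) queries from the center branch
    k = (w_k @ f_b.reshape(c, n)).T    # (N, C) keys from the boundary branch
    v = (w_v @ f_b.reshape(c, n)).T    # (N, C) values from the boundary branch
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)   # (N, N) relevance matrix
    f_int = attn @ v                                # (N, C) interaction feature
    return f_c + f_int.T.reshape(c, d, h, w)        # residual sum with F_c
```

The dense N×N attention matrix in this sketch is only practical for small patches; how the attention is made tractable at full resolution is not specified in the description.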
[0065] S24: The full-resolution feature $F_{fused}$ is input into a multi-task prediction head, which simultaneously outputs a droplet foreground probability map, a sphericity prediction map, and a radius field prediction map. The foreground probability map, activated by a sigmoid function, represents the confidence that each voxel belongs to the droplet foreground. The sphericity prediction map, activated by a sigmoid function, quantifies the geometric consistency of each voxel with a spherical droplet in space. The radius field prediction map, activated under a non-negative constraint, represents the estimated distance from each voxel to the centroid of its droplet. The multi-task prediction head includes: C1. Segmentation Head: This branch performs channel compression using a 1×1×1 3D convolution kernel followed by a sigmoid activation function, outputting a droplet foreground probability map with values in [0,1]. This probability map represents the confidence that each voxel belongs to the droplet foreground, and a binary segmentation mask can be obtained by thresholding; the threshold can be fixed or adaptively determined from the statistical characteristics of the image.
[0066] C2. Sphericity Prediction Head: This branch is structured like the segmentation head, using a 1×1×1 3D convolution with sigmoid activation to output a sphericity map with values in [0,1]. This output quantifies the degree of geometric consistency between each voxel and a spherical droplet in space, reflecting whether the droplet shape deviates from the near-spherical prior. The sphericity map can subsequently be used to filter out anomalous shapes, identify adhesion regions, or serve as a geometric consistency constraint signal to improve segmentation stability. In a preferred embodiment, the supervision signal of the sphericity map can be constructed from the distance transform of the instance mask, the spherical fitting error, morphological features, or other equivalent methods.
[0067] C3. Radius Field Head: This branch uses a 1×1×1 3D convolution followed by a non-negative constraint activation to output a predicted droplet radius field map with values ≥ 0. The non-negative constraint activation function can be ReLU, Softplus, or other equivalent functions. The "radius field" characterizes the radial distance distribution with the droplet centroid as a reference, that is, it estimates the distance between each voxel and the droplet centroid, thereby constructing a continuous geometric guidance map for each instance. The radius field can be used to assist boundary reconstruction, compensate for information loss in low-resolution directions, and can be combined with center / boundary features for the separation and correction of adhesion regions.
[0068] S25 constructs a graph convolutional network based on the foreground probability map and optional radius field prediction map and / or sphericity prediction map, and performs graph convolutional edge discrimination.
[0069] As shown in Figure 3, this step addresses over-segmentation or under-segmentation in scenarios such as droplet adhesion, accumulation, and abnormal morphology.
[0070] Specifically, it includes: D1. Input Acquisition: Receives a droplet foreground probability map and an optional geometric auxiliary map output by a voxel-level segmentation network. The foreground probability map is the sigmoid activation output of the segmentation head, representing the confidence level that each voxel belongs to the droplet foreground. Optionally, a radius field prediction map and / or a sphericity map may also be received, where the radius field prediction map is the output of the radius field prediction head after non-negative constraint activation (e.g., Softplus), used to characterize the estimated distance from the voxel to the local centroid.
[0071] D2. Candidate Region Generation and Node Construction: The foreground probability map is binarized with a preset threshold to obtain a foreground voxel mask. Three-dimensional connected component analysis (with optional 6 / 18 / 26 adjacency) is then performed on this mask to extract candidate regions. In one preferred embodiment, each candidate region is directly treated as a node in the graph. In another preferred embodiment, to better handle under-segmented scenes, optional candidate over-segmentation can be performed on candidate regions with low sphericity and where the radius field / center response indicates the presence of multiple centers. For example, multiple seeds can be generated from the radius field or center response and used for marker-based segmentation (watershed, region growing, or other equivalent methods), decomposing a candidate region into multiple candidate segments that serve as nodes in the graph. The centroid position (voxel coordinates, or physical coordinates mapped by spacing) of each node is calculated, forming the node set.
[0072] D3. Node Feature Extraction: For each node, further extract geometric and statistical attributes to form a node feature vector, including but not limited to: node volume (number of connected voxels); average confidence (mean probability of foreground within the node); average radius (if a radius field is provided, then the mean radius value within the node); and optional average sphericity (if a sphericity map is provided, then the mean sphericity within the node). Concatenate the node centroid coordinates with the above feature vectors to form the node input matrix X0.
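Steps D2 and D3 can be sketched as follows for the simplest case, where each connected candidate region becomes one node. The sketch assumes 6-adjacency, a fixed threshold, and only the volume, mean confidence and centroid features; the function name and dictionary keys are illustrative, and the optional radius and sphericity averages are omitted.

```python
from collections import deque
import numpy as np

def extract_nodes(prob, threshold=0.5, spacing=(1.0, 1.0, 1.0)):
    """Threshold a foreground probability map, label 6-connected regions,
    and return the label map plus one feature dict per candidate region."""
    mask = prob > threshold
    labels = np.zeros(prob.shape, dtype=int)
    nodes = []
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue                       # voxel already belongs to a region
        label = len(nodes) + 1
        labels[seed] = label
        queue, voxels = deque([seed]), []
        while queue:                       # breadth-first flood fill
            z, y, x = queue.popleft()
            voxels.append((z, y, x))
            for dz, dy, dx in offsets:
                nb = (z + dz, y + dy, x + dx)
                inside = all(0 <= nb[i] < prob.shape[i] for i in range(3))
                if inside and mask[nb] and not labels[nb]:
                    labels[nb] = label
                    queue.append(nb)
        coords = np.array(voxels, dtype=float)
        nodes.append({
            # centroid mapped to physical coordinates via the voxel spacing
            "centroid": tuple(coords.mean(axis=0) * np.asarray(spacing)),
            "volume": len(voxels),
            "mean_prob": float(np.mean([prob[v] for v in voxels])),
        })
    return labels, nodes
```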
[0073] D4. Graph Construction (Adjacency Relationships): Construct the adjacency structure of the graph based on the Euclidean distance between the centroids of nodes. Adjacency strategies include: Radius Graph based on physical distance: two nodes are considered adjacent when the distance between them is less than a set threshold r; KNN Graph based on K nearest neighbors: each node is connected to its k nearest neighbors in geometric space.
[0074] The adjacency matrix can be a symmetric matrix, and by default, it does not contain self-loops during the graph construction stage. When performing graph convolution aggregation, self-loops can be further added to the adjacency matrix to perform stable neighborhood aggregation.
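The two adjacency strategies and the self-loop normalization used before aggregation can be sketched with NumPy. The symmetric normalization is written in the common form $D^{-1/2}(A + I)D^{-1/2}$, which is an assumption about the exact variant; function names are illustrative, and `k` is assumed to be smaller than the number of nodes.

```python
import numpy as np

def pairwise_distances(centroids):
    pts = np.asarray(centroids, dtype=float)
    return np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

def radius_graph(centroids, r):
    """Symmetric adjacency without self-loops: nodes closer than r are adjacent."""
    adj = (pairwise_distances(centroids) < r).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj

def knn_graph(centroids, k):
    """Connect each node to its k nearest neighbors, then symmetrize."""
    dist = pairwise_distances(centroids)
    np.fill_diagonal(dist, np.inf)               # a node is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros(dist.shape)
    rows = np.repeat(np.arange(len(dist)), k)
    adj[rows, idx.ravel()] = 1.0
    return np.maximum(adj, adj.T)

def normalize_with_self_loops(adj):
    """Add self-loops and apply symmetric degree normalization."""
    a_hat = adj + np.eye(len(adj))
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
```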
[0075] D5. Graph Convolutional Network (GCN) Node Augmentation: A linear mapping is applied to the node input X0 to obtain an initial node representation H0. Several graph convolutional layers are then stacked on the adjacency relationships, with added self-loops and symmetric normalization, to aggregate neighborhood features and obtain the augmented node representation.
[0076] D6. Edge Discrimination and Instance Label Update: For each pair of nodes $(i, j)$ connected by an edge in the graph, edge features are constructed and input into the edge classifier. Edge features may include: the node representations $h_i$ and $h_j$ at both ends of the edge, the difference vector $|h_i - h_j|$ (element-wise absolute difference), the Euclidean distance between centroids, and optional edge embeddings (e.g., edge type embeddings derived from the radius graph / KNN graph construction or array proximity encoding). The edge features are processed by a multilayer perceptron that outputs binary logits, used to determine whether an edge should be retained (merged) or pruned (not merged). During the inference phase, a softmax function is applied to the logits, and a threshold determines the set of edges to be retained. Based on the retained edge set, a union-find operation aggregates connected nodes and updates instance labels, yielding the structurally optimized droplet instance segmentation result.
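The inference-time part of D6 (softmax over edge logits, thresholding, and union-find merging) can be sketched in plain Python. The logit order `(cut, merge)` and the 0.5 threshold are assumptions made for the example, and the edge logits are taken as given rather than produced by a trained classifier.

```python
import math

def merge_probability(logit_cut, logit_merge):
    """Softmax probability of the 'merge' class from a pair of binary logits."""
    m = max(logit_cut, logit_merge)
    e_cut, e_merge = math.exp(logit_cut - m), math.exp(logit_merge - m)
    return e_merge / (e_cut + e_merge)

def merge_nodes(num_nodes, edges, edge_logits, threshold=0.5):
    """Return a compact instance label per node after merging retained edges."""
    parent = list(range(num_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for (i, j), (l_cut, l_merge) in zip(edges, edge_logits):
        if merge_probability(l_cut, l_merge) > threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri             # union the two components
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(num_nodes)]

# Four candidate fragments in a chain; the middle edge is cut.
labels = merge_nodes(4, [(0, 1), (1, 2), (2, 3)], [(0.0, 3.0), (2.0, -1.0), (-1.0, 4.0)])
print(labels)  # [0, 0, 1, 1]
```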
[0077] The network structure of this application adopts an end-to-end three-dimensional encoding-decoding architecture, and the data flow and functions between the internal modules are as follows.
[0078] E1. Input and Encoding: The input fluorescence volumetric image first enters the encoder, where low-level voxel features are extracted via anisotropic 3D convolutional feature blocks (ConvBlock3D), and anisotropic attention (Z / XY gated fusion) is used to structurally compensate for the lack of axial information caused by anisotropic sampling. Subsequently, contextual information is further aggregated through the bottleneck layer to obtain high-level semantic representation.
[0079] E2. Decoding and Scale Restoration: The bottleneck layer output enters the decoder, where spatial resolution is restored step by step through cascaded asymmetric 3D upsampling modules (magnifying only the XY plane while keeping the Z-axis resolution unchanged). After each upsampling stage, the features are spliced with the same-resolution features from the encoder's symmetric layers (Skip Connection), and then processed by anisotropic convolution and attention modules until the full resolution of the original input is restored.
[0080] E3. High-Resolution Center-Boundary Co-Attention Fusion: Decoded features restored to full resolution are fed in parallel into the center and boundary branches. The center branch focuses on the localization and enhancement of the droplet's main body region, while the boundary branch emphasizes contours, contacts, and edge details. The two branches interact and fuse at a high-resolution scale through center-boundary co-attention. The fused features are then uniformly output for subsequent predictions.
[0081] E4. Multi-task prediction output: The fused decoded features are input to the multi-task prediction head to obtain three single-channel outputs (logits / probability maps): Foreground segmentation output: The foreground probability map is obtained through Sigmoid and characterizes the confidence that the voxel belongs to the droplet foreground; Sphericity output: The sphericity map is obtained through Sigmoid and characterizes how close the droplet shape is to spherical, or describes the presence of abnormal shape / adhesion risk; Radius field output: The radius field prediction map is obtained through non-negative constraint activation (such as Softplus or other equivalent functions) and characterizes the estimated radial distance from the voxel to the local centroid, providing continuous geometric guidance and assisting boundary reconstruction.
[0082] E5. Optional post-processing for topology graph decoupling (Droplet Graph Builder & GCN): When droplet adhesion or high-density stacking in the sample causes instance labels to be unstable, the topology graph decoupling module can optionally be enabled as a post-processing step to improve the structural consistency of instance labels and correct over-segmentation or under-segmentation of candidate regions. Specifically, connected component analysis is performed on the 3D foreground mask obtained by thresholding the segmentation output to obtain candidate regions. In a preferred embodiment, each candidate region is directly used as a graph node. In another preferred embodiment, to better handle under-segmented scenes, optional candidate over-segmentation can be performed on candidate regions with low sphericity or where the radius field indicates the presence of multiple centers (e.g., generating multiple seeds based on the radius field or center response and performing marker-based segmentation), decomposing a candidate region into multiple candidate fragments and using them as graph nodes. Centroid coordinates (which can be mapped to the physical coordinate system) are calculated for each node, and its volume, average segmentation probability, and (optionally) average radius and average sphericity are aggregated to form node features. Adjacency relationships are constructed based on a Euclidean distance threshold between centroids or a K-nearest-neighbor strategy to form the graph structure input.
[0083] The graph structure input is fed into a graph neural network (GCN): first, the node input is linearly embedded; then, graph convolutional layers are stacked on adjacency relationships with added self-loops and symmetric normalization to achieve neighborhood feature aggregation; subsequently, edge features are constructed using the node representations at both ends of the edge, their difference vectors (element-wise absolute difference), and geometric distance (which can be combined with edge type embedding), and a multilayer perceptron outputs binary classification logits (merged / not merged) for each edge. During the inference phase, a softmax function is applied to the edge logits, and a threshold is used to determine whether to retain the edge. A disjoint-set data structure is used to aggregate nodes and update instance labels, thereby achieving structured optimization of droplet instance segmentation results in scenarios of adhesion and accumulation.
[0084] In S3, the DropletSegNet model constructed in S2 is pre-trained, including: S31: Multi-task supervision signals are constructed, comprising a foreground segmentation supervision signal, a sphericity supervision signal, a radius field supervision signal, and a topology graph edge supervision signal. A supervision target is constructed for each output task, specifically: Foreground segmentation supervision: The labeled droplet instance masks are merged into a foreground / background binary map, which serves as the supervision target for the foreground probability map and supervises the learning of voxel foreground confidence.
[0085] Sphericity supervision: To supervise the sphericity output, a sphericity supervision signal is constructed. In a preferred embodiment, a distance transform is calculated and normalized within each instance based on the instance mask to obtain a voxel-level sphericity target map; in another preferred embodiment, an instance-level sphericity scalar $\Psi$ is calculated for each droplet instance and assigned (broadcast) to the voxels within the instance to form the supervision target map. The sphericity scalar can be calculated, for example, using the following formula: $\Psi = \pi^{1/3} (6V)^{2/3} / A$
[0086] where $V$ is the volume of the instance and $A$ is the surface area of the instance; other equivalent sphericity or morphological consistency measures can also be used to construct the supervision signal.
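A standard sphericity measure built only from volume and surface area is the Wadell sphericity, $\pi^{1/3}(6V)^{2/3}/A$, which equals 1 for a perfect sphere and decreases for less compact shapes. Assuming this is the intended measure, a minimal sketch is:

```python
import math

def sphericity(volume, surface_area):
    """Wadell sphericity: pi^(1/3) * (6V)^(2/3) / A, equal to 1 for a sphere."""
    if volume <= 0 or surface_area <= 0:
        raise ValueError("volume and surface area must be positive")
    return math.pi ** (1.0 / 3.0) * (6.0 * volume) ** (2.0 / 3.0) / surface_area

# A sphere of radius 2 and a unit cube.
sphere = sphericity(4.0 / 3.0 * math.pi * 2.0 ** 3, 4.0 * math.pi * 2.0 ** 2)
cube = sphericity(1.0, 6.0)
```

For voxelized instances the surface area estimate depends on the chosen surface extraction method, so values for real droplets would typically fall somewhat below 1 even for near-spherical shapes.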
[0087] Radius field supervision: To supervise the radius field prediction output, for each droplet instance, its geometric center / centroid is taken as a reference point, and the Euclidean distance from each voxel inside the instance mask to this reference point is calculated, generating the ground-truth radius field map. In a preferred embodiment, the Euclidean distance is calculated in the physical coordinate system, that is, the voxel spacing of the volume data (Δx, Δy, Δz) is taken into account to adapt to anisotropic sampling; other equivalent methods can also be used to generate the radius field supervision signal.
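The construction of the radius field target can be sketched with NumPy as below. The sketch assumes an integer instance label map with 0 as background and a spacing tuple given in the same axis order as the array (here Z, Y, X); the function name is illustrative.

```python
import numpy as np

def radius_field_target(instance_labels, spacing=(1.0, 1.0, 1.0)):
    """Per-voxel physical distance to the centroid of the voxel's own instance."""
    spacing = np.asarray(spacing, dtype=float)
    target = np.zeros(instance_labels.shape, dtype=float)
    for label in np.unique(instance_labels):
        if label == 0:                                  # 0 is background
            continue
        idx = np.argwhere(instance_labels == label)     # (n, 3) voxel indices
        phys = idx * spacing                            # physical coordinates
        centroid = phys.mean(axis=0)
        target[tuple(idx.T)] = np.linalg.norm(phys - centroid, axis=1)
    return target

# Three voxels stacked along Z with a 2.0 spacing in Z.
labels = np.ones((3, 1, 1), dtype=int)
field = radius_field_target(labels, spacing=(2.0, 1.0, 1.0))
print(field[:, 0, 0])  # [2. 0. 2.]
```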
[0088] Topological graph supervision (edge supervision): After constructing the droplet candidate node graph, binary classification supervision labels are generated for adjacent node pairs (i,j) in the graph, which are used to train the edge discriminator of the graph decoupling module. The supervision label can be defined as follows: when the candidate regions (or candidate oversegments) corresponding to node i and node j belong to the same real droplet instance, it is labeled as "merge / retain edge"; otherwise, it is labeled as "do not merge / cut edge". This forms a binary classification label of "merge / do not merge" to supervise the learning of edge logits.
[0089] S32: A multi-task loss function is constructed, comprising a loss for foreground segmentation supervision, a loss for sphericity prediction map supervision, a loss for radius field prediction, and a loss for edge binary classification supervision of the graph decoupling module; these losses are weighted and combined to form the total loss function: $L_{total} = \lambda_{seg} L_{seg} + \lambda_{sph} L_{sph} + \lambda_{rad} L_{rad} + \lambda_{graph} L_{graph}$
[0090] where: $L_{seg}$ is used for foreground segmentation supervision and can be composed of Dice loss and binary cross-entropy (BCE) loss to balance foreground-background imbalance and region overlap quality; $L_{sph}$ is used for supervision of the sphericity map: since the sphericity prediction head output is sigmoid-activated, BCE loss or a weighted variant constrains the consistency between the sphericity prediction and the target distribution; $L_{rad}$ is used for radius field prediction, where an L1 or L2 loss or another equivalent distance regression loss constrains the difference between the predicted and true radius fields. In a preferred embodiment, $L_{sph}$ and / or $L_{rad}$ can be computed within the foreground mask or instance mask to reduce the interference of the background region on the geometric regression tasks; $L_{graph}$ is used for edge binary classification supervision of the graph decoupling module, for which cross-entropy loss can be used; in class-imbalanced scenarios, weighted cross-entropy, focal loss, or other equivalent classification losses can also be used. The graph loss is optional: when the graph decoupling module is not trained or enabled, $\lambda_{graph}$ can be set to 0, or this term can be omitted.
[0091] Each loss weight $\lambda$ can be adjusted through hyperparameter search, cross-validation, or empirical settings; in one exemplary setting, $\lambda_{seg}$, $\lambda_{sph}$, $\lambda_{rad}$, and $\lambda_{graph}$ could be, for example, 1.0 / 0.5 / 0.5 / 0.2, but are not limited to these values.
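The weighted combination of sub-task losses can be sketched as follows. The exemplary weights 1.0 / 0.5 / 0.5 / 0.2 are assumed here to map to the segmentation, sphericity, radius field and graph losses in that order; the Dice and BCE helpers are generic textbook forms, and all function and key names are illustrative.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def bce_loss(pred, target, eps=1e-7):
    pred = np.clip(pred, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))))

def total_loss(losses, weights=None, use_graph=True):
    """Weighted sum of sub-task losses; expected keys: seg, sph, rad, graph."""
    weights = weights or {"seg": 1.0, "sph": 0.5, "rad": 0.5, "graph": 0.2}
    total = 0.0
    for name, value in losses.items():
        if name == "graph" and not use_graph:
            continue                        # same effect as a zero graph weight
        total += weights[name] * value
    return total

example = {"seg": 1.0, "sph": 2.0, "rad": 4.0, "graph": 10.0}
print(total_loss(example))                   # 6.0
print(total_loss(example, use_graph=False))  # 4.0
```

Here the segmentation term would itself be `dice_loss(...) + bce_loss(...)` on the foreground probability map, following the composition described above.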
[0092] S33 uses the total loss function to train the DropletSegNet model.
[0093] To achieve stable convergence of the multi-task voxel prediction and optional graph decoupling module, the present invention can adopt the following optimization strategy: Optimizer selection: Adam or AdamW or other equivalent adaptive optimizers are preferred; in a preferred embodiment, weight decay may be set to enhance generalization ability.
[0094] Learning rate and scheduling: An initial learning rate is set, though it is not limited to any single value; learning rate scheduling strategies may include cosine annealing, plateau scheduling based on validation metrics (ReduceLROnPlateau), step decay, or other equivalent strategies; optionally, learning rate warmup may be used in the early stages of training to improve training stability.
[0095] Batch and Input Size: Input 3D volume data can be used as a whole image or as 3D patches for training. In one exemplary setup, the patch size can be, for example, 64×128×128, but other sizes can be used depending on GPU memory and data resolution. Batch size can be determined based on GPU memory and training stability. When the batch size is small, a normalization method more suitable for small-batch training (such as Instance Normalization, Group Normalization, or other equivalent normalization methods) can be selected.
[0096] Training phase settings (consistent with the optional graph module): In a preferred embodiment, the voxel-level encoder-decoder network and the multi-task prediction head are first trained to obtain stable foreground segmentation, sphericity, and radius field predictions; in another preferred embodiment, the voxel network parameters can be fixed or partially frozen after convergence, and the graph decoupling module (GCN and edge classifier) can be further trained; in yet another optional embodiment, end-to-end joint training or alternating training can be performed: when the graph decoupling module is enabled, a graph edge supervision loss is introduced, and its contribution to the total loss is controlled by the graph loss weight ($\lambda_{graph}$); when the graph module is not enabled, the graph loss weight can be set to 0.
[0097] Optional stabilization measures: Gradient clipping, mixed precision training (AMP), or other equivalent strategies can be used to reduce the risk of numerical instability and improve training efficiency.
[0098] S4: The inference phase supports end-to-end voxel-level output and combines structural geometric priors with graph structure analysis for refined instance decoupling and attribute extraction. The specific process is as follows: S41: The input fluorescence volume image is passed through the trained network model, which outputs the foreground probability map, the sphericity prediction map, and the radius field prediction map.
[0099] The following processing is then performed: Threshold segmentation: A preliminary binary droplet mask is obtained by applying a threshold to the foreground probability map; the threshold can be fixed (e.g., 0.5) or adaptively determined from the statistical characteristics of the image. Connectivity analysis: Three-dimensional connectivity analysis (with optional 6 / 18 / 26 adjacency) is performed on the binary mask to extract droplet candidate regions and obtain preliminary instance candidate labels.
[0100] Sphericity constraint: The sphericity prediction map is used to assess and constrain the quality of candidate regions, for example by filtering out abnormal candidate regions with significantly low sphericity, or by marking suspected adhesion / abnormal morphology regions for subsequent fine processing.
[0101] Radius field boundary refinement: The radial geometry information provided by the radius field prediction map is used to correct or refine boundaries, for example by constraining boundary consistency and suppressing false boundary responses based on the geometric relationship between the radius field and the centroid within the candidate region; it can also be used to generate seeds for subsequent candidate over-segmentation to reduce boundary misjudgment and adhesion errors.
[0102] In some embodiments, instead of using deep learning for prediction, classical morphological feature analysis (such as based on Hu moment or compactness indices) is performed on the preliminary segmentation results during post-processing to identify and filter out non-spherical adhesions or anomalous droplets. Alternatively, only one of the radius field prediction head or the sphericity prediction head may be used, rather than both, to simplify the model structure, or the radius field prediction may be replaced with distance transformation prediction (directly predicting the distance from the voxel to the background).
[0103] S42: For areas with adhesion, high-density stacking, or abnormal morphology, the graph structure post-processing module can optionally be enabled to optimize instance labels. Specifically, this includes: Candidate node generation: Based on the connected candidate regions obtained in S41, the centroid of each candidate region is calculated (and can be mapped to the physical coordinate system), and node features such as volume, average segmentation confidence, and (optionally) average radius / average sphericity are extracted; in a preferred embodiment, each candidate region is directly used as a graph node.
[0104] Optional candidate oversegmentation (covering undersegmentation): For candidate regions with low sphericity or where the radius field / center response indicates the existence of multiple centers, candidate oversegmentation can be performed first (e.g., generating multiple seeds based on the radius field or center response and marking them for segmentation). This decomposes a candidate region into multiple candidate segments and uses them as graph nodes to facilitate subsequent structural decoupling.
[0105] Adjacency graph construction: Adjacency relationships are constructed based on a Euclidean distance threshold between centroids or a K-nearest-neighbor strategy to form potentially interacting node pairs.
[0106] GCN Edge Discrimination and Instance Update: The graph structure is input into the graph neural network for node augmentation, and edges in the graph are classified into two categories (merge / not merge). During the inference phase, a Softmax function is applied to the edge logits, and a set of edges to be retained is determined based on a threshold. Based on the retained edge set, a Union-Find algorithm is executed to aggregate nodes and update instance labels. For segments obtained from candidate oversegmentation, if they are determined to be "not merged," they are kept separate, thus achieving structured splitting of undersegmented scenes; oversegmented segments are restored to the same instance through "merging."
[0107] The graph structure decoupling module can replace or supplement the traditional watershed method to improve instance consistency and reduce the risk of over-segmentation / under-segmentation in scenarios of densely packed droplets.
[0108] S43: The following statistics are computed for each droplet region in the final instance label map: fluorescence intensity features are extracted (such as the voxel mean, volume-weighted average, or median); droplets are classified as positive / negative using a fixed threshold method or a distribution-based mixture model; the number of positive droplets $N_{pos}$ and the total number of droplets $N_{total}$ are counted; and the positive rate $p = N_{pos} / N_{total}$ is calculated. Optionally, the concentration of the target molecule can be derived from the Poisson statistical model to support digital PCR analysis.
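The positive rate and the Poisson-based concentration estimate can be sketched as below. The sketch uses the standard digital PCR relation in which the mean number of target copies per droplet is $-\ln(1 - p)$; the droplet volume parameter and the function names are assumptions introduced for illustration.

```python
import math

def positive_rate(n_pos, n_total):
    if n_total <= 0 or not 0 <= n_pos <= n_total:
        raise ValueError("require 0 <= n_pos <= n_total and n_total > 0")
    return n_pos / n_total

def copies_per_droplet(n_pos, n_total):
    """Mean target copies per droplet under Poisson loading: -ln(1 - p)."""
    p = positive_rate(n_pos, n_total)
    if p >= 1.0:
        raise ValueError("all droplets positive: above the quantifiable range")
    return -math.log(1.0 - p)

def concentration(n_pos, n_total, droplet_volume):
    """Copies per unit volume, in the same volume unit as droplet_volume."""
    return copies_per_droplet(n_pos, n_total) / droplet_volume

# 500 positive droplets out of 1000, with a hypothetical droplet volume of 0.001 uL.
print(positive_rate(500, 1000))         # 0.5
print(concentration(500, 1000, 0.001))  # about 693.1 copies per uL
```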
[0109] This invention, by introducing deep learning models, multi-task prediction heads, and graph neural networks, achieves the following beneficial effects compared to existing technologies: 1. It solves the problem of inaccurate segmentation due to droplet adhesion, and improves the accuracy of segmentation and statistics.
[0110] By introducing droplet geometric prior constraints, including sphericity prediction branch and radius field prediction branch, abnormal shapes or adhesion regions are quantified and filtered, avoiding the inclusion of excessively large or adhesion-related abnormal droplets in the statistics.
[0111] 2. A droplet stacking topology decoupling module was designed, modeling the adhesion decoupling as an edge classification problem on a graph structure, and using a graph neural network (GCN) for structured post-processing. This effectively replaces the traditional watershed method, which is prone to over-segmentation or under-segmentation, achieving accurate decoupling and segmentation even in cases of tight droplet stacking or adhesion, thus improving the accuracy of nucleic acid concentration calculation and positive ratio statistics.
[0112] 3. It enhances robustness to changes in imaging conditions and noise, ensuring the stability of the results.
[0113] An anisotropic attention module was constructed, which structurally compensates for the problem that the z-axis resolution in microdroplet fluorescence images is significantly lower than that in the XY axis by modeling and adaptively fusing features in the z and XY axes respectively, thus significantly improving the robustness of droplet detection and segmentation under low-resolution imaging conditions.
[0114] 4. It achieves more flexible and adaptable fluorescence ratio statistics.
[0115] A statistical method for fluorescence signals based on droplet instance-level fluorescence intensity extraction and automatic classification is proposed, replacing the classification methods in existing methods that rely on fixed thresholds or manually set grayscale ranges. This solves the problem of poor stability of traditional methods under different experimental conditions, and obtains more consistent and accurate statistical results for the positive / negative droplet ratio.
[0116] The fluorescence signal statistical method can be replaced by a statistical method based on Gaussian mixture model (GMM) or clustering algorithm (such as K-means) to fit and classify the fluorescence intensity histograms extracted from all droplet instances, so as to achieve automatic differentiation and proportional statistics of positive / negative droplets.
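As an illustration of the clustering alternative, the following sketch runs a two-cluster K-means on one intensity value per droplet instance. It is a simplified stand-in: a GMM would additionally fit per-cluster variances and weights, and the function name and the example intensities are invented for demonstration only.

```python
def kmeans_two_clusters(values, iterations=100):
    """1D two-cluster K-means; returns (is_positive flags, decision threshold)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all intensities are identical; nothing to separate")
    for _ in range(iterations):
        threshold = (lo + hi) / 2  # midpoint between the two cluster centres
        low = [v for v in values if v <= threshold]
        high = [v for v in values if v > threshold]
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centres stopped moving
        lo, hi = new_lo, new_hi
    threshold = (lo + hi) / 2
    return [v > threshold for v in values], threshold

# Hypothetical per-droplet mean intensities (arbitrary units).
intensities = [10.0, 12.0, 11.0, 95.0, 100.0, 98.0]
flags, threshold = kmeans_two_clusters(intensities)
print(flags, threshold, sum(flags) / len(flags))
```

Because the threshold is derived from the data of each run, it adapts to changes in exposure or dye brightness that would invalidate a fixed grayscale cutoff.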
[0117] 5. It improves the geometric consistency of segmentation results and avoids misjudgment.
[0118] A center-boundary collaborative attention module is designed, which applies geometric consistency constraints to the droplet segmentation results through the interaction of center features (droplet core localization) and boundary features (contour details). This avoids the misjudgment or morphological distortion that tends to occur for atypical droplets when relying solely on intensity features or when no geometric priors are introduced.
[0119] This invention also discloses a deep learning-based system for statistical analysis of fluorescence ratio information of microfluidic droplet arrays, comprising:
An image acquisition module, used to acquire three-dimensional fluorescence volume images of the droplets and to preprocess the three-dimensional fluorescence volume images.
The DropletSegNet model, which includes:
An encoder consisting of multiple levels of sequentially connected coding units, each of which contains a three-dimensional anisotropic convolutional feature block and an anisotropic attention module.
A decoder connected to the deepest output of the encoder and containing multiple cascaded asymmetric upsampling modules.
A full-resolution dual-branch structure, set at the output of the decoder, which feeds the feature map restored to the original input resolution into a center branch and a boundary branch in parallel. The center branch and the boundary branch are independent three-dimensional convolutional branches, used to reconstruct the main body region of the droplet and the boundary and contact gap region of the droplet, respectively.
A center-boundary collaborative attention module, which generates queries from the output features of the center branch and generates keys and values from the output features of the boundary branch. It calculates interaction features through an attention mechanism and adds them residually to the output features of the center branch to obtain the fused full-resolution features.
A multi-task prediction head, which receives the fused full-resolution features and outputs in parallel a droplet foreground probability map, a sphericity map, and a radius field prediction map, wherein the foreground probability map and the sphericity map are activated by Sigmoid, and the radius field prediction map is activated under a non-negative constraint.
A droplet graph construction and graph convolution edge discrimination module, which takes the foreground probability map and, optionally, the radius field prediction map and / or the sphericity map as input. It generates candidate nodes by binarizing the foreground probability map and performing three-dimensional connected component analysis. For each node, it extracts a feature vector including volume, average confidence, average radius, and optionally average sphericity. Based on the Euclidean distance between node centroids, it constructs the adjacency structure of the graph. In the graph convolutional layers, it performs neighborhood aggregation on the node features to obtain an enhanced node representation. Then, for each edge, it constructs an edge feature including the representations of the two endpoint nodes, their difference vector, and the centroid distance. The edge classifier determines whether the edge should be retained or pruned, and the instance labels are updated based on the set of retained edges.
An output module, used to transform the raw output of the DropletSegNet model into droplet analysis results that are physically meaningful and biologically interpretable.
[0120] The embodiments should not be regarded as limitations on the present invention, and any improvements made based on the spirit of the present invention fall within the protection scope of the present invention.
Claims
1. A deep learning-based method for statistical analysis of fluorescence ratio information of a microfluidic droplet array, characterized in that it includes the following steps: S1 acquires a three-dimensional fluorescence volume image of the droplets and preprocesses the three-dimensional fluorescence volume image; S2 constructs the DropletSegNet model; S3 pre-trains the DropletSegNet model constructed in S2; S4 uses the three-dimensional fluorescence volume image of the target as input to the DropletSegNet model, and transforms the raw output of the DropletSegNet model into droplet analysis results with physical meaning and biological interpretability. The construction steps of the DropletSegNet model include: S21 extracts multi-scale features from the input three-dimensional fluorescence volume image through an encoder; S22 inputs the bottleneck features output by the deepest layer of the encoder into the decoder, and gradually restores the spatial resolution through cascaded asymmetric upsampling modules; S23, after the decoder has restored the spatial resolution of the feature map to the full resolution of the input image, feeds the high-resolution features in parallel into two independent 3D convolutional branches, the center branch and the boundary branch, and then performs center-boundary collaborative attention fusion to obtain the final fused full-resolution features $F_{\text{fused}}$, wherein the center branch emphasizes the reconstruction of the droplet body region, and the boundary branch emphasizes the reconstruction of the droplet boundary and the contact gap region between adjacent droplets; S24 inputs the fused full-resolution features $F_{\text{fused}}$ into a multi-task prediction head, which simultaneously outputs a droplet foreground probability map, a sphericity prediction map, and a radius field prediction map. The foreground probability map, activated by a sigmoid function, represents the confidence that each voxel belongs to the droplet foreground. The sphericity prediction map, activated by a sigmoid function, quantifies the geometric consistency of each voxel with a spherical droplet in space. The radius field prediction map, activated under a non-negative constraint, represents the estimated distance from each voxel to the centroid of its droplet. S25 constructs a graph convolutional network based on the foreground probability map and, optionally, the radius field prediction map and / or the sphericity prediction map, and performs graph convolutional edge discrimination.
2. The method for statistical analysis of fluorescence ratio information in microfluidic droplet arrays based on deep learning according to claim 1, characterized in that step S1 includes: S11 image acquisition: three-dimensional fluorescence volume images are acquired from the microfluidic droplet array experimental platform or reconstructed from two-dimensional / pseudo-three-dimensional slice sequences, and the anisotropic pixel spacings Δx, Δy, and Δz are recorded, where Δz... Δx = Δy; S12 normalization and background suppression: Z-score intensity normalization is performed on the input 3D fluorescence volume image, and flat field correction or background subtraction is used to suppress non-uniform illumination; S13 voxel anisotropy processing: the original image resolution and spatial proportions are maintained, without interpolating or scaling the Z-axis; S14 data augmentation: random geometric transformations and light intensity transformations are applied to the training data, including one or more of random flipping, rotation, intensity jittering, and Gaussian noise perturbation, wherein the transformation amplitude in the Z-axis direction is smaller than that in the XY plane.
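The normalization and background suppression of S12 can be sketched on a small volume stored as nested lists indexed `volume[z][y][x]`. The helper names, the use of a single 2D background estimate shared by all z-slices, the clamping at zero, and the order of the two operations are assumptions made for illustration; the source does not specify them.

```python
import statistics

def subtract_background(volume, background):
    """Subtract a 2D background estimate from every z-slice, clamping at zero."""
    return [
        [[max(0.0, v - background[y][x]) for x, v in enumerate(row)]
         for y, row in enumerate(plane)]
        for plane in volume
    ]

def zscore_normalize(volume):
    """Z-score intensity normalization over the whole volume."""
    flat = [v for plane in volume for row in plane for v in row]
    mean = statistics.fmean(flat)
    std = statistics.pstdev(flat)
    if std == 0:
        raise ValueError("constant volume cannot be z-score normalized")
    return [[[(v - mean) / std for v in row] for row in plane] for plane in volume]

volume = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
background = [[0.5, 0.5], [0.5, 0.5]]
normalized = zscore_normalize(subtract_background(volume, background))
print(normalized[0][0])
```

Note that neither helper resamples the volume, which is consistent with S13 keeping the original Z-axis sampling untouched.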
3. The method for statistical analysis of fluorescence ratio information in microfluidic droplet arrays based on deep learning according to claim 1, characterized in that: in S21, multi-scale features are extracted through a series of sequentially connected coding units. Each coding unit first extracts features using a three-dimensional anisotropic convolutional kernel of size (1, 3, 3), and downsampling is performed only in the XY plane while the resolution in the Z direction is maintained. Anisotropic attention fusion is then performed: Z-axis features are extracted using a depthwise separable convolution with kernel size (3, 1, 1), XY-plane features are extracted using a depthwise separable convolution with kernel size (1, 3, 3), and the outputs of the two branches are adaptively fused through a global channel gating factor. The fused features are then projected by a 1×1×1 convolution and added to the input features as a residual, and the result is used as the output of the coding unit.
4. The method for statistical analysis of fluorescence ratio information of microfluidic droplet arrays based on deep learning according to claim 3, characterized in that: the global channel gating factor is given as a weight vector set per channel, and the weighted fusion is performed according to..., where ⊙ indicates element-wise multiplication broadcast by channel.
5. The method for statistical analysis of fluorescence ratio information in microfluidic droplet arrays based on deep learning according to claim 1, characterized in that: each asymmetric upsampling module employs 3D linear interpolation, scaling up by a factor of 2 only in the XY plane while keeping the resolution in the Z-axis unchanged. The interpolation is followed by a 1×3×3 3D convolutional block. After each upsampling stage, the upsampled features of that stage are concatenated along the channel dimension with the native features of the same resolution from the corresponding encoder layer. The concatenated features are then reduced in dimension by a 3D convolution and fed into another anisotropic attention module, which recalibrates and refines the Z-axis and XY-plane features, and the result serves as the input to the next upsampling stage.
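The effect of XY-only resampling on tensor shapes can be traced with a few lines of bookkeeping. The sketch below tracks only (Z, Y, X) sizes and performs no interpolation or convolution; the encoder downsampling factor of 2 and the function names are assumptions, since the source states the factor only for the upsampling modules.

```python
def downsample_xy(shape, factor=2):
    """Encoder step: shrink Y and X, keep Z (factor is an assumption)."""
    z, y, x = shape
    return (z, y // factor, x // factor)

def upsample_xy(shape, factor=2):
    """Decoder step: enlarge Y and X by 2, keep Z."""
    z, y, x = shape
    return (z, y * factor, x * factor)

def trace_shapes(input_shape, levels):
    """Return the (Z, Y, X) shapes seen along the encoder and the decoder."""
    encoder = [input_shape]
    for _ in range(levels):
        encoder.append(downsample_xy(encoder[-1]))
    decoder, current = [], encoder[-1]
    for skip in reversed(encoder[:-1]):
        current = upsample_xy(current)
        # Channel-wise concatenation with the skip feature needs equal spatial size.
        assert current == skip, "skip connection needs matching shapes"
        decoder.append(current)
    return encoder, decoder

encoder, decoder = trace_shapes((16, 128, 128), levels=3)
print(encoder)
print(decoder)
```

The Z size stays at 16 throughout, and the check on the skip connection shows why the XY sizes should be divisible by 2 raised to the number of levels.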
6. The method for statistical analysis of fluorescence ratio information of microfluidic droplet arrays based on deep learning according to claim 1 or 5, characterized in that the center-boundary collaborative attention fusion step is as follows: the features $F_c$ output by the center branch are used to generate the query Q, and the features $F_b$ output by the boundary branch are used to generate the key K and the value V. Q, K, and V are all obtained through 1×1×1 three-dimensional convolutional linear mapping. The attention relevance matrix Attn is calculated from the query Q and the key K, where C represents the channel dimension or embedding dimension. The interaction feature H = Attn × V is then calculated using the attention relevance matrix. Finally, H is added residually to $F_c$ to obtain the fused full-resolution features $F_{\text{fused}}$.
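The fusion can be illustrated on flattened features, where each row is one voxel and each column one channel, so that a 1×1×1 convolution reduces to a per-voxel matrix product. The sketch assumes the common scaled dot-product form Attn = softmax(Q Kᵀ / √C); the claim text only states that Attn is computed from Q and K with C as the channel dimension, so this exact form, the weight matrices `w_q`, `w_k`, `w_v`, and the function names are assumptions.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def center_boundary_attention(f_c, f_b, w_q, w_k, w_v):
    """f_c, f_b: N x C voxel features; w_*: C x C maps standing in for 1x1x1 convs."""
    q = matmul(f_c, w_q)             # queries from the center branch
    k = matmul(f_b, w_k)             # keys from the boundary branch
    v = matmul(f_b, w_v)             # values from the boundary branch
    c = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)          # N x N similarity between voxels
    attn = [softmax([s / math.sqrt(c) for s in row]) for row in scores]
    h = matmul(attn, v)              # interaction feature H = Attn x V
    return [[a + b for a, b in zip(fc_row, h_row)] for fc_row, h_row in zip(f_c, h)]

identity = [[1.0, 0.0], [0.0, 1.0]]
f_c = [[1.0, 0.0], [0.0, 1.0]]
f_b = [[1.0, 0.0], [0.0, 1.0]]
print(center_boundary_attention(f_c, f_b, identity, identity, identity))
```

A full N × N attention matrix over every voxel of a 3D volume is expensive in memory, so a practical implementation would likely restrict or downsample the attention; the source does not say how this is handled.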
7. The method for statistical analysis of fluorescence ratio information of microfluidic droplet arrays based on deep learning according to claim 1, characterized in that, in S25, the graph convolutional network processing includes: the foreground probability map is binarized according to a preset threshold to obtain a foreground voxel mask; three-dimensional connected component analysis is performed on the foreground voxel mask to obtain candidate regions; each candidate region, or each candidate fragment obtained after further over-segmentation of a candidate region, is used as a node in the graph; for each node, a feature vector containing node volume, average confidence, average radius, and optionally average sphericity is extracted, and the node centroid coordinates are concatenated with this feature vector to form the node input matrix X0; the adjacency structure of the graph is constructed based on the Euclidean distance between the node centroids; the initial node representation H0 is obtained by linearly mapping the features of the node input matrix X0; several graph convolutional layers are then stacked on the adjacency relationships, with added self-loops and symmetric normalization, to aggregate neighborhood features and obtain the enhanced node representation; the node pairs connected by edges in the graph are traversed, edge features are constructed and input into the edge classifier, and the edge classifier outputs a binary decision on whether each edge should be kept or cut; finally, a disjoint-set (union-find) procedure is performed on the retained edge set to aggregate connected nodes and update instance labels, resulting in the optimized droplet instance segmentation result.
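The aggregation over "adjacency relationships with added self-loops and symmetric normalization" corresponds to the widely used normalized adjacency D^(-1/2) (A + I) D^(-1/2). The sketch below builds that matrix and applies one neighborhood aggregation step; the learned weight matrix and nonlinearity of a full graph convolutional layer are omitted, and the function names are illustrative.

```python
import math

def normalized_adjacency(num_nodes, edges):
    """Return D^(-1/2) (A + I) D^(-1/2) as a dense list-of-lists matrix."""
    a = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        a[i][i] = 1.0                      # self-loop
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0            # undirected edge
    deg = [sum(row) for row in a]
    return [
        [a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]

def aggregate(a_hat, h):
    """One propagation step: each node mixes its own and its neighbours' features."""
    n, c = len(h), len(h[0])
    return [[sum(a_hat[i][k] * h[k][col] for k in range(n)) for col in range(c)]
            for i in range(n)]

a_hat = normalized_adjacency(2, [(0, 1)])
print(a_hat)
print(aggregate(a_hat, [[2.0], [4.0]]))
```

With two connected nodes every entry of the normalized matrix is 0.5, so one step replaces each node feature by the average of the pair.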
8. The method for statistical analysis of fluorescence ratio information of microfluidic droplet arrays based on deep learning according to claim 1, characterized in that step S3 includes: S31 constructs multi-task supervision signals, which include a foreground segmentation supervision signal, a sphericity supervision signal, a radius field supervision signal, and a topology graph edge supervision signal; S32 constructs a multi-task loss function, which includes a loss function for foreground segmentation supervision, a loss function for sphericity prediction map supervision, a loss function for radius field prediction, and a loss function for edge binary classification supervision of the graph decoupling module, and combines these loss functions by weighting to form the total loss function; S33 uses the total loss function to train the DropletSegNet model.
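The weighted combination in S32 amounts to a weighted sum of the four task losses. In the sketch below the task names, the loss values, and the weights are placeholders chosen for illustration; the source does not give the individual loss definitions or the weight values.

```python
def total_loss(losses, weights):
    """Weighted sum of per-task losses; both dicts must use the same task names."""
    missing = set(weights) - set(losses)
    if missing:
        raise KeyError(f"no loss value for tasks: {sorted(missing)}")
    return sum(weights[name] * losses[name] for name in weights)

# Hypothetical per-batch loss values and weights.
losses = {"foreground": 0.5, "sphericity": 0.2, "radius": 0.1, "edge": 0.4}
weights = {"foreground": 1.0, "sphericity": 0.5, "radius": 0.5, "edge": 1.0}
print(total_loss(losses, weights))
```

Setting a weight to zero switches off the corresponding supervision signal, which is a convenient way to ablate one prediction branch during training.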
9. The method for statistical analysis of fluorescence ratio information in a microfluidic droplet array based on deep learning according to claim 1, characterized in that: S4 includes the following: S41 acquires a 3D fluorescence volume image and outputs a foreground probability map, a sphericity prediction map, and a radius field prediction map after inference using the DropletSegNet model. Threshold segmentation is applied to the foreground probability map to obtain a binary droplet mask. 3D connected component analysis is performed on the binary droplet mask to extract candidate droplet regions. The sphericity prediction map is used to constrain the sphericity of each candidate region, filtering or marking abnormal candidate regions. The radial geometric information provided by the radius field prediction map is used to correct or refine the boundaries of the candidate regions. S42 enables the graph structure post-processing module to optimize instance labels for areas with adhesion, high-density stacking, or abnormal morphology. The graph structure post-processing module includes: generating candidate nodes based on the droplet candidate region, constructing an adjacency graph, inputting the graph structure into a graph neural network and performing binary classification on the edges in the graph, performing node aggregation or separation based on the classification results, and updating instance labels; S43 extracts fluorescence intensity features from each droplet region in the final instance label image, performs positive / negative classification based on the fluorescence intensity features, counts the number of positive droplets and the total number of droplets, and calculates the positive ratio.
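The thresholding and three-dimensional connected component analysis of S41 can be sketched in plain Python on a probability volume indexed `prob[z][y][x]`. The 0.5 threshold and the use of 6-connectivity (face neighbors only) are assumptions; the source does not state which connectivity is used, and the function names are illustrative.

```python
from collections import deque

# Face neighbours only (6-connectivity); this choice is an assumption.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def binarize(prob, threshold=0.5):
    """Foreground mask from a probability volume."""
    return [[[v >= threshold for v in row] for row in plane] for plane in prob]

def label_components(mask):
    """Breadth-first labelling of 3D connected components; returns (labels, count)."""
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    labels = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    count = 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if not mask[z][y][x] or labels[z][y][x]:
                    continue
                count += 1                      # new candidate droplet region
                labels[z][y][x] = count
                queue = deque([(z, y, x)])
                while queue:
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in NEIGHBORS:
                        pz, py, px = cz + dz, cy + dy, cx + dx
                        if (0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                                and mask[pz][py][px] and not labels[pz][py][px]):
                            labels[pz][py][px] = count
                            queue.append((pz, py, px))
    return labels, count

prob = [[[0.9, 0.8, 0.1, 0.7, 0.95]]]
labels, count = label_components(binarize(prob))
print(labels, count)
```

Two droplets that touch in the mask would come out as a single component here, which is exactly the adhesion case that the sphericity check and the graph post-processing of S42 are meant to resolve.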
10. A deep learning-based system for statistical analysis of fluorescence ratio information of microfluidic droplet arrays, characterized in that it includes:
An image acquisition module, used to acquire three-dimensional fluorescence volume images of the droplets and to preprocess the three-dimensional fluorescence volume images.
The DropletSegNet model, which includes:
An encoder consisting of multiple levels of sequentially connected coding units, each of which contains a three-dimensional anisotropic convolutional feature block and an anisotropic attention module.
A decoder connected to the deepest output of the encoder and containing multiple cascaded asymmetric upsampling modules.
A full-resolution dual-branch structure, set at the output of the decoder, which feeds the feature map restored to the original input resolution into a center branch and a boundary branch in parallel. The center branch and the boundary branch are independent three-dimensional convolutional branches, used to reconstruct the main body region of the droplet and the boundary and contact gap region of the droplet, respectively.
A center-boundary collaborative attention module, which generates queries from the output features of the center branch and generates keys and values from the output features of the boundary branch. It calculates interaction features through an attention mechanism and adds them residually to the output features of the center branch to obtain the fused full-resolution features.
A multi-task prediction head, which receives the fused full-resolution features and outputs in parallel a droplet foreground probability map, a sphericity map, and a radius field prediction map, wherein the foreground probability map and the sphericity map are activated by Sigmoid, and the radius field prediction map is activated under a non-negative constraint.
A droplet graph construction and graph convolution edge discrimination module, which takes the foreground probability map and, optionally, the radius field prediction map and / or the sphericity map as input. It generates candidate nodes by binarizing the foreground probability map and performing three-dimensional connected component analysis. For each node, it extracts a feature vector including volume, average confidence, average radius, and optionally average sphericity. Based on the Euclidean distance between node centroids, it constructs the adjacency structure of the graph. In the graph convolutional layers, it performs neighborhood aggregation on the node features to obtain an enhanced node representation. Then, for each edge, it constructs an edge feature including the representations of the two endpoint nodes, their difference vector, and the centroid distance. The edge classifier determines whether the edge should be retained or pruned, and the instance labels are updated based on the set of retained edges.
An output module, used to transform the raw output of the DropletSegNet model into droplet analysis results that are physically meaningful and biologically interpretable.