An immune cell flow cytometry gating method and system based on a point cloud algorithm
By using a point cloud algorithm-based gating method for immune cell flow cytometry, flow cytometry data is processed automatically, solving the problems of high manual dependence, difficulty in handling complex boundaries, and insufficient utilization of high-dimensional information in traditional methods. This achieves high-accuracy cell classification and efficient high-throughput analysis.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GYUNO (SHANGHAI) GENE TECH CO LTD
- Filing Date
- 2026-03-10
- Publication Date
- 2026-06-19
AI Technical Summary
Traditional flow cytometry gating methods rely on human experience, making it difficult to handle cell populations with complex boundaries, underutilizing high-dimensional information, and exhibiting low efficiency and poor consistency in high-throughput analysis scenarios.
An immune cell flow cytometry gating method based on point cloud algorithms is adopted. Through hierarchical gating configuration, data preprocessing and point cloud construction, point cloud geometric convolutional network training and layer-by-layer cumulative prediction, flow cytometry data is automatically processed, and high-dimensional fluorescence information is used to generate a visual gating map and report.
It achieves highly accurate automated cell classification, can handle complex cell population boundaries, makes full use of high-dimensional information, is suitable for high-throughput analysis, supports batch gating and parallel output, and generates traceable analysis reports.
Figure CN122245455A_ABST
Description
Technical Field
[0001] This invention relates to the field of flow cytometry data analysis technology, specifically to an immune cell flow cytometry gating method and system based on a point cloud algorithm.
Background Technology
[0002] With the widespread application of flow cytometry in clinical diagnosis and basic research, the accurate identification and quantitative analysis of immune cell subsets has become an important technological requirement. Flow cytometry can rapidly analyze the physical and chemical properties of large numbers of cells by detecting multi-parameter fluorescence signals.
[0003] Traditional flow cytometry gating methods rely heavily on expert experience for manual gating, which has the following problems: First, it is highly dependent on manual intervention. Traditional methods require technicians to manually set the thresholds for scattered light and fluorescence signals, and the processing time for a single sample usually exceeds 30 minutes. Furthermore, the classification results can vary by up to ±15% between different operators.
[0004] Second, it struggles to handle complex boundaries. When the boundaries of cell populations are irregular or overlapping, traditional methods struggle to identify them accurately, leading to a decrease in classification accuracy.
[0005] Third, the high-dimensional information is not fully utilized. Traditional methods mainly rely on visual analysis of 2D scatter plots, failing to make full use of the high-dimensional features of multi-channel fluorescence information.
[0006] Fourth, repetitive tasks are inefficient. In high-throughput sample processing scenarios, traditional methods are time-consuming and inconsistent, and are prone to significant fluctuations during batch analysis.
[0007] Therefore, there is a need for a flow cytometry gating method that can automatically process flow cytometry data, accurately identify the boundaries of complex cell populations, make full use of high-dimensional fluorescence information, and is suitable for high-throughput analysis scenarios.
Summary of the Invention
[0008] In view of the above-mentioned deficiencies of the prior art, the first aspect of the present invention provides an immune cell flow cytometry gating method based on a point cloud algorithm, comprising the following steps:
Step S1: Hierarchical gating configuration, wherein, according to a predefined gating hierarchy structure file, the cell classification task is decomposed into multi-class classification problems at multiple levels.
Step S2: Data preprocessing and point cloud construction, wherein data including at least the forward scatter (FSC) signal, the side scatter (SSC) signal, and the multi-channel fluorescence signals are extracted from the flow cytometry detection file, i.e., the FCS file; spectral spillover of the fluorescence signals is corrected through the compensation matrix; a linear or logarithmic coordinate transformation is applied according to channel type to obtain compensated, standardized flow cytometry data; and, according to the gating hierarchy defined in step S1, the corresponding two-dimensional coordinate data are extracted for each cell point, and a point cloud graph structure including adjacency relationships is constructed based on the spatial distribution of the points.
Step S3: Point cloud geometric convolutional network training, wherein, for each level defined in step S1, a corresponding geometric graph convolutional network model is constructed; the point cloud graph structure constructed in step S2 is used as input; the geometric convolutional layers of the model aggregate the spatial neighborhood information of the cell points to extract features; and the parameters of each level's model are optimized using the classification loss function.
Step S4: Layer-by-layer cumulative prediction, wherein prediction proceeds from top to bottom, level by level: for the current level, the subset of cell points predicted as a specific category in its parent level is selected and input into the geometric graph convolutional network model of that level for multi-class prediction; the prediction results of each level are accumulated level by level, and a unified result file containing the complete hierarchical classification information is finally generated.
Step S5: Gating plot drawing and report output, wherein the proportion of each cell type at each level is calculated, a two-dimensional scatter plot is generated for each level, and a report file is output that includes at least: user information, sample number, population name, gating parameters, cell proportions, and graphical results. The report is exported in PDF or Excel format so that the gating results are traceable and can be reviewed manually.
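The top-down selection in step S4 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `(level name, parent population, classifier)` tuples and the threshold-based toy classifiers are assumptions standing in for the per-level geometric graph convolutional network models.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types: a classifier maps cell feature vectors to population labels.
Classifier = Callable[[List[Sequence[float]]], List[str]]
Level = Tuple[str, str, Classifier]  # (level name, parent population, classifier)

def cumulative_predict(cells: List[Sequence[float]], levels: List[Level]) -> List[Dict[str, str]]:
    """Predict top-down; each level only sees cells assigned to its parent population."""
    # One result row per cell; every cell belongs to ROOT.
    results: List[Dict[str, str]] = [{"ROOT": "ROOT"} for _ in cells]
    for name, parent, classify in levels:
        # Select the cells that an earlier level assigned to this level's parent.
        idx = [i for i, row in enumerate(results) if parent in row.values()]
        if not idx:
            continue
        labels = classify([cells[i] for i in idx])
        for i, label in zip(idx, labels):
            results[i][name] = label  # accumulate into the unified result table
    return results

# Toy stand-ins for trained models (assumptions for illustration only).
toy_levels: List[Level] = [
    ("P1", "ROOT", lambda xs: ["P1" if x[0] > 0.5 else "background" for x in xs]),
    ("T/B", "P1", lambda xs: ["T" if x[1] > 0.5 else "B" for x in xs]),
]
toy_cells = [(0.9, 0.1), (0.2, 0.8), (0.8, 0.7)]
toy_results = cumulative_predict(toy_cells, toy_levels)
```

Cells that fall into the background at one level receive no entry for the child levels, which mirrors the "subset of the parent category" selection described in step S4.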
[0009] In the immune cell flow cytometry gating method based on a point cloud algorithm described above, step S1 may optionally include the following: The gating hierarchy structure file is prepared according to the gating task. The file is in CSV format and includes the cell populations to be gated and their parent populations, the features corresponding to the coordinate axes used for gating, and whether each gating level includes background. Background refers to cells at a given level that, during gating, are not assigned to any subpopulation under the parent gate. A gating level is defined as all subpopulations to be gated that share the same parent gate and use coordinate axes defined by the same features. At the first level, all cells are used as the gating input; because these cells are not a subpopulation of any cell population, the parent gate of this level is defined as ROOT.
[0010] In the immune cell flow cytometry gating method based on a point cloud algorithm described above, step S2 may optionally include the following steps:
Read the user-uploaded FCS file, parse the HEADER information, and obtain the compensation matrix.
Expand the compensation matrix to a complete matrix containing all channels, and compute the compensated flow cytometry data FL from the raw signals using the compensation matrix that comes with the FCS file.
Select the coordinate transformation according to channel type: a linear scale for FSC or SSC signals and a logarithmic transformation for fluorescence signals.
Read the compensated flow cytometry data and the hierarchical gating configuration defined in step S1, and extract the X-axis and Y-axis coordinate data for each level; the coordinates are values normalized to the range 0-1.
Calculate the distance from each cell point to the origin, $r = \sqrt{x^2 + y^2}$, and the angle from each cell point to the origin, $\theta = \operatorname{atan2}(y, x)$, which ranges from $-\pi$ to $\pi$, where x and y are the X-axis and Y-axis coordinates of the cell point, respectively.
Calculate the local density of each cell point based on the k-nearest neighbor distance, $\rho = 1 / (d_k + \varepsilon)$, where $d_k$ is the distance to the k-th nearest neighbor and $\varepsilon$ is a minimum value used to avoid division by zero, and normalize the local density to the range 0-1.
Extract all fluorescence channel data other than the current level's X and Y axes as high-dimensional features, and concatenate the coordinates, distance, angle, density, and high-dimensional features to form a multi-dimensional point cloud feature representation.
[0011] In the immune cell flow cytometry gating method based on point cloud algorithm described above, optionally, step S2 is executed through a nested loop structure based on the number of samples and the hierarchical gating configuration defined in step S1. The loop structure includes a sample loop and a subtype loop processing flow. The sample loop is the outer loop that traverses all samples to be analyzed and is used to read the raw detection data of each sample in sequence. The outer loop ends when all sample data has been processed. The subtype loop is the inner loop that traverses all subtype data to be analyzed for each sample and is used to read the detection data of each subtype for each sample. The inner loop ends when all subtype data has been analyzed.
[0012] In the point cloud-based flow cytometry gating method for immune cells described above, optionally, the k-nearest neighbor graph structure can be constructed in the following ways: The k-nearest neighbor graph structure is calculated based on the two-dimensional coordinates of point cloud, where the value of k ranges from 10 to 100. The calculation is accelerated by using the kd-tree algorithm to obtain the indices and distances of the k nearest neighbors of each point. In the subsequent training of the point cloud geometric convolutional network, the calculated k-nearest neighbor graph structure is reused without recalculating the k-nearest neighbor relationships, thus maintaining the consistency of the spatial neighborhood and reducing computational overhead. The k-nearest neighbor graph structure is pre-computed and cached during the model training phase and directly reused during the prediction phase; for programs that directly predict, the k-nearest neighbor graph structure is constructed according to the same data preprocessing steps as the training process.
[0013] In the immune cell flow cytometry gating method based on a point cloud algorithm described above, optionally, the calculation logic of the geometric convolutional layer in step S3 is configured as follows:
For each center point in the point cloud, its k neighboring points are obtained based on the k-nearest neighbor graph structure.
Calculate the coordinate difference between the center point and each neighboring point: $\Delta p_{ij} = p_j - p_i = (\Delta x_{ij}, \Delta y_{ij})$, where $p_i$ is the coordinates of the center point and $p_j$ is the coordinates of the neighboring point.
Calculate the Euclidean distance between the center point and its neighboring points: $d_{ij} = \lVert \Delta p_{ij} \rVert_2$.
Calculate the angle between the center point and its neighboring points: $\theta_{ij} = \operatorname{atan2}(\Delta y_{ij}, \Delta x_{ij})$.
Calculate the cosine similarity between the feature vector of the center point and the feature vectors of its neighboring points: $s_{ij} = \dfrac{f_i \cdot f_j}{\lVert f_i \rVert \, \lVert f_j \rVert}$, where $f_i$ and $f_j$ are the feature vectors of the center point and the neighboring point, respectively.
The coordinate difference, Euclidean distance, angle, and cosine similarity are concatenated into an edge feature vector. The edge features are then transformed and aggregated through convolution operations to extract local geometric features. By stacking multiple geometric convolutional layers, spatial geometric features from local to global are extracted layer by layer.
[0014] In the immune cell flow cytometry gating method based on a point cloud algorithm described above, optionally, the geometric graph convolutional network model in step S3 is constructed as follows:
It contains n sequentially cascaded geometric convolutional layers, where n is an integer between 2 and 5; each geometric convolutional layer is configured to receive 128-dimensional input point features, aggregate the features of the center point and its k nearest neighbors, and output 128-dimensional features.
Each geometric convolutional layer is followed in sequence by a residual adder, a multi-pooling module, and a feature-restoration multilayer perceptron. The residual adder adds the input features and output features of the geometric convolutional layer; the multi-pooling module applies max pooling and average pooling to the summed features simultaneously; and the feature-restoration multilayer perceptron maps the pooled features back to 128 dimensions, to be used either as the input features of the next geometric convolutional layer or as the final features.
The model also includes a feature fusion module and a classification head. The feature fusion module concatenates the 128-dimensional feature maps output by all n geometric convolutional layers to obtain local features, performs global pooling on the features of all layers to obtain global features, and then concatenates the local features with the global features. The classification head receives the concatenated fused features and maps them, through at least one fully connected layer, to the probability distribution of each cell point over the categories.
[0015] In the immune cell flow cytometry gating method based on a point cloud algorithm described above, optionally, in step S3, the model parameters are optimized through multiple loss functions, including:
Focal loss, the classification loss, which handles class imbalance and focuses on hard-to-classify samples: $L_{focal} = -(1 - p_t)^{\gamma} \log(p_t)$, where $p_t$ is the predicted probability of the true class and $\gamma$ is the focusing parameter, with a value range of 1.0 to 3.0.
Boundary loss, which constrains the spatial location of predicted points and improves the quality of classification boundaries. It calculates the distance from each predicted point to the boundary based on the convex hull boundary of the true class points and penalizes predicted points that fall outside the boundary. Boundary loss adopts a warmup strategy in the initial stage of training, with its weight gradually increased from 0.1 to 0.5.
Smooth loss, also known as smoothing loss, which ensures consistency of neighborhood labels and reduces label fragmentation. It constructs a graph Laplacian regularization term based on the k-nearest neighbor graph structure to constrain the predicted probabilities of points within a neighborhood to remain consistent.
The total loss function is $L = L_{focal} + \lambda_1 L_{boundary} + \lambda_2 L_{smooth}$, where $\lambda_1$ has a value range of 0.1-0.5 and $\lambda_2$ has a value range of 0.1-0.3.
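As a rough illustration of how these terms combine, the sketch below implements the focal loss for one cell, a boundary-weight warmup, a neighborhood smoothness penalty, and the weighted sum. Several details are assumptions not stated in the text: the warmup is taken to be linear, the smoothness term is written as the mean squared difference of predicted probabilities across k-nearest neighbor edges (an unweighted graph Laplacian form), and all function names are hypothetical. The convex-hull boundary loss itself is not implemented here.

```python
import math

def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one cell: -(1 - p_t)^gamma * log(p_t)."""
    p_t = max(probs[target], 1e-12)  # clamp to avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def boundary_weight(epoch, warmup_epochs, start=0.1, end=0.5):
    """Warmup of the boundary-loss weight from 0.1 to 0.5 (linear schedule assumed)."""
    if warmup_epochs <= 0:
        return end
    t = min(max(epoch / warmup_epochs, 0.0), 1.0)
    return start + t * (end - start)

def smooth_loss(probs, neighbors):
    """Mean squared difference of class probabilities over kNN edges (assumed form)."""
    total, count = 0.0, 0
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            total += sum((a - b) ** 2 for a, b in zip(probs[i], probs[j]))
            count += 1
    return total / count if count else 0.0

def total_loss(l_focal, l_boundary, l_smooth, lam_boundary, lam_smooth=0.2):
    """Weighted sum L = L_focal + lam_boundary * L_boundary + lam_smooth * L_smooth."""
    return l_focal + lam_boundary * l_boundary + lam_smooth * l_smooth
```

With a focusing parameter of 2.0, a confidently correct cell (p_t = 0.9) contributes far less loss than an uncertain one (p_t = 0.6), which is the down-weighting of easy samples that the focal loss is meant to provide.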
[0016] In the point cloud-based immune cell flow cytometry gating method described above, optionally, it also includes high-dimensional feature fusion: Identify the names of the X and Y axes used in the current level; Extract all fluorescence channel data other than the current X-axis and Y-axis from the flow cytometry data as high-dimensional features. The dimension of the high-dimensional features depends on the number of detection channels of the flow cytometer; for multicolor flow cytometry data, the dimension of the high-dimensional features is 10-30. Normalize the high-dimensional features to ensure that the data from different channels are within the same numerical range; High-dimensional features are combined and fused with geometric features calculated based on coordinate data to form a comprehensive feature representation for point cloud structures. The feature dimensions of each cell point are: coordinate dimension (2) + distance feature dimension (1) + angle feature dimension (1) + local density feature dimension (1) + number of high-dimensional fluorescence channels N.
[0017] To achieve the above objectives, a second aspect of this application provides an immune cell flow cytometry gating system based on a point cloud algorithm, which implements the immune cell flow cytometry gating method based on a point cloud algorithm as described in any embodiment of the first aspect and includes:
Hierarchical gating configuration module: used to decompose the cell classification task into multi-class classification problems at multiple levels according to a predefined gating hierarchy structure file.
Data preprocessing module: used to extract cell data from flow cytometry detection files, extract X-axis and Y-axis coordinates, construct the graph structure using k-nearest neighbors and calculate local density features, calculate geometric features and high-dimensional features, and construct the point cloud data representation.
Point cloud geometric convolutional network module: used to capture the spatial geometric relationships between cells using the k-nearest neighbor graph structure and geometric convolutional layers, to extend local features to global features through multi-level geometric convolutional networks, and to optimize model parameters by combining multiple loss functions.
Layer-by-layer cumulative prediction module: used to perform multi-class predictions sequentially from top to bottom and accumulate the prediction results into the same file.
Plotting and report output module: used to generate visualization results, output prediction result files and prediction reports, and enable traceability and manual review of the gating results.
[0018] The immune cell flow cytometry gating method and system based on point cloud algorithms provided by this invention effectively solves the problems of traditional flow cytometry gating, such as reliance on human experience, difficulty in handling complex boundaries, and insufficient utilization of high-dimensional information, through point cloud segmentation methods, K-nearest neighbor graph structures, geometric convolutional networks, and hierarchical prediction strategies. It boasts advantages such as high accuracy, end-to-end automation, the ability to handle complex cell population boundaries, and full utilization of high-dimensional fluorescence information. The entire process involves end-to-end gating according to a preset hierarchical table, combined with multi-channel biomarker data and density clustering to ensure gating accuracy. It is suitable for large-scale flow cytometry data samples, supports batch gating and parallel output, and can automatically generate visualized gating diagrams and analysis reports, facilitating manual review and reproducibility.
[0019] The following will further explain the concept, specific structure, and technical effects of the present invention in conjunction with the accompanying drawings, so as to fully understand the purpose, features, and effects of the present invention.
Brief Description of the Drawings
[0020] Figure 1 is a flowchart of an embodiment of the immune cell flow cytometry gating method based on a point cloud algorithm provided by the present invention;
Figure 2 is a schematic diagram of the module architecture of an embodiment of the immune cell flow cytometry gating system based on a point cloud algorithm provided by the present invention;
Figure 3 is a comparison chart showing the gating result of the method of Figure 1 on the P1 level of a sample, where the left panel is a scatter plot of the true labels and the right panel is a scatter plot of the model's predicted labels.
Detailed Implementation
[0021] To make the technical means, inventive features, objectives, and effects of the invention readily understandable, the invention is further illustrated below with reference to specific figures. However, the invention is not limited to the embodiments described below.
[0022] It should be noted that the structures, proportions, sizes, etc., illustrated in the accompanying drawings of this specification are only used to complement the content disclosed in the specification for those skilled in the art to understand and read, and are not intended to limit the conditions under which the present invention can be implemented. Therefore, they have no substantial technical significance. Any modifications to the structure, changes in the proportions, or adjustments to the size, without affecting the effects and objectives that the present invention can produce, should still fall within the scope of the technical content disclosed in the present invention.
[0023] Terms such as “comprising” and “including” indicate that, in addition to the components that are directly and explicitly stated in the specification and claims, the technical solution of the present invention does not exclude the presence of other components that are not directly or explicitly stated.
[0024] Furthermore, the term "and / or" in this article is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. Additionally, the character " / " in this article generally indicates that the preceding and following related objects have an "or" relationship.
[0025] The present invention will be explained below with reference to embodiments. Those skilled in the art will understand that the following embodiments are for illustrative purposes only and should not be considered as limiting the scope of the invention. Where specific techniques or conditions are not specified in the embodiments, they are performed according to the techniques or conditions described in the literature in the art, or according to the product instructions or the conditions recommended by the manufacturer. Reagents or instruments used, unless otherwise specified, are all commercially available conventional products.
[0026] Example 1: Construction and application of an immune cell flow cytometry gating system based on a point cloud algorithm for an immune assessment project.
As a preferred embodiment, the immune assessment project requires simultaneous analysis of different cell levels and their subtypes in peripheral blood samples from 28 patients. Traditional methods require technicians to manually adjust, for each sample, the CD45 / SSC-A debris-exclusion gate, the FSC-A / FSC-H doublet-exclusion gate, the CD14 / SSC-A lymphocyte / monocyte gate, the CD3 / CD19 quadrant gate, the CD3 / CD56 natural killer cell gate, the CD56 / SSC-A natural killer cell subset gate, and the CD16 / SSC-A monocyte subset gate, so that a single batch analysis takes more than three working days. When processing samples with ambiguous boundaries, manually set thresholds cannot accurately distinguish cell populations, leading to cell proportion errors in some samples that exceed clinically acceptable ranges.
[0027] As a preferred embodiment, the sample used is human peripheral blood, whose main components are leukocytes, erythrocytes, platelets, and plasma. Leukocytes are human immune cells, which are mainly divided into monocytes, T lymphocytes, B lymphocytes, natural killer cells, eosinophils, basophils, and neutrophils according to different cell characteristics. Among them, monocytes and lymphocytes and their subsets are the main objects of analysis.
[0028] As shown in Figure 1, this application provides an immune cell flow cytometry gating method based on a point cloud algorithm, which may specifically include the following steps:
Step S1: Hierarchical gating configuration.
[0029] In step S1, the existing gating workflow hierarchy is analyzed, and the gating hierarchy structure file for the cell populations and subpopulations to be gated is defined manually. The file is in CSV format and contains the cell populations to be gated and their parent populations, the features corresponding to the coordinate axes used for gating, and whether each gating level includes background.
[0030] In this embodiment, the first level takes all cells as its gating input. Because these cells are not a subpopulation of any cell population, the parent gate of this level is defined as ROOT.
[0031] A gating level is defined as all subpopulations to be gated that share the same parent gate and use coordinate axes defined by the same features. For example, the T / B / P2 gates share the same parent gate (Lym) and the same coordinate axes (CD3 vs CD19), so they belong to the same level and can be processed together by a single multi-class model.
[0032] Background refers to cells at a given level that, during gating, are not assigned to any subpopulation under the parent gate. For example, a T / B / P2 level with background means that the cells within the parent gate include not only those belonging to the T, B, or P2 gates but also cells belonging to none of them.
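To make the grouping rule concrete, the following sketch parses a hierarchy file and groups populations into levels by (parent gate, X-axis feature, Y-axis feature). The column names (`population`, `parent`, `x_axis`, `y_axis`, `has_background`) and the example rows are hypothetical; the text only specifies which kinds of information the CSV file contains, not its exact layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical file layout and rows, for illustration only.
EXAMPLE_CSV = """population,parent,x_axis,y_axis,has_background
P1,ROOT,CD45,SSC-A,yes
T,Lym,CD3,CD19,yes
B,Lym,CD3,CD19,yes
P2,Lym,CD3,CD19,yes
"""

def load_levels(text):
    """Group populations into levels keyed by (parent, x_axis, y_axis)."""
    levels = defaultdict(lambda: {"classes": [], "has_background": False})
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["parent"], row["x_axis"], row["y_axis"])
        level = levels[key]
        level["classes"].append(row["population"])
        if row["has_background"].strip().lower() == "yes":
            level["has_background"] = True
    # A level with background gets one extra class for unassigned cells.
    for level in levels.values():
        if level["has_background"]:
            level["classes"].append("background")
    return dict(levels)

example_levels = load_levels(EXAMPLE_CSV)
```

Each resulting level corresponds to one multi-class problem: here the T, B, and P2 populations plus a background class form a single four-class task under the Lym parent gate.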
[0033] The final structure file content is shown in Table 1 below:
Table 1. Hierarchical Gating Configuration File
Step S2: Data preprocessing and point cloud construction.
[0034] Specifically, this may include the following steps: Read the user-uploaded FCS file, parse the HEADER information, and obtain the compensation matrix.
[0035] The compensation matrix is expanded to a complete matrix containing all channels, and the compensated flow cytometry data FL is computed from the raw signals using this compensation matrix, which is the one included in the FCS file. The coordinate transformation is then selected according to channel type: a linear scale for FSC or SSC signals and a logarithmic transformation for fluorescence signals.
[0036] Read the compensated flow cytometry data and the hierarchical gating configuration defined in step S1. Extract the X-axis and Y-axis coordinate data for each layer, with the coordinates normalized to the range of 0-1. For example, for the P1 layer (CD45 vs SSC-A), extract the CD45 channel data as the X-axis coordinate and the SSC-A channel data as the Y-axis coordinate from the data file. These coordinate data have been normalized to the range of 0-1 to eliminate the influence of different channel dimensions.
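The channel-dependent transformation and 0-1 normalization described above might look like the sketch below. It makes several assumptions that the text does not specify: scatter channels are recognized by an `FSC`/`SSC` name prefix, the logarithm is base 10 with values clamped at a floor of 1.0 (compensated values can be zero or negative), and normalization is min-max per channel. Compensation itself is assumed to have been applied already.

```python
import math

SCATTER_PREFIXES = ("FSC", "SSC")  # assumed naming convention for linear-scale channels

def transform_channel(name, values, floor=1.0):
    """Linear scale for scatter channels, log10 for fluorescence channels."""
    if name.upper().startswith(SCATTER_PREFIXES):
        return list(values)
    # Clamp at `floor` so that the logarithm is defined (an assumption).
    return [math.log10(max(v, floor)) for v in values]

def min_max_normalize(values):
    """Rescale values to the range 0-1; a constant channel maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def level_coordinates(events, x_name, y_name):
    """Return normalized (x, y) coordinates for one gating level."""
    xs = min_max_normalize(transform_channel(x_name, events[x_name]))
    ys = min_max_normalize(transform_channel(y_name, events[y_name]))
    return list(zip(xs, ys))

# Toy compensated data for the P1 level (CD45 vs SSC-A).
toy_events = {"CD45": [1.0, 10.0, 100.0], "SSC-A": [0.0, 50.0, 100.0]}
toy_coords = level_coordinates(toy_events, "CD45", "SSC-A")
```

In practice, flow cytometry software often uses biexponential or arcsinh transforms instead of a clamped logarithm; the text only states "logarithmic transformation", so the simplest form is shown.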
[0037] After the coordinates are extracted, each cell point can be represented by two-dimensional coordinates $p_i = (x_i, y_i)$, where $x_i$ is the X-axis coordinate of the i-th cell point and $y_i$ is its Y-axis coordinate, with i ranging from 1 to N and N being the total number of cells.
[0038] In an optional embodiment, the value of N ranges from $10^4$ to $10^6$.
[0039] After extracting the coordinate data, it is necessary to calculate the geometric features of each cell point, including the distance to the origin, the angle to the origin, and the local density.
[0040] Calculate the distance from each cell point to the origin, $r = \sqrt{x^2 + y^2}$, where x and y are the X-axis and Y-axis coordinates of the cell point, respectively.
[0041] Calculate the angle from each cell point to the origin, $\theta = \operatorname{atan2}(y, x)$, where $\operatorname{atan2}$ is the arctangent function whose return value ranges from $-\pi$ to $\pi$, and x and y are the X-axis and Y-axis coordinates of the cell point, respectively.
[0042] Calculate the local density of each cell point based on the k-nearest neighbor distance, $\rho = 1 / (d_k + \varepsilon)$, where k is the number of nearest neighbors, $d_k$ is the distance to the k-th nearest neighbor, and $\varepsilon$ is a minimum value used to avoid division by zero. The local density is then normalized to the range 0-1.
[0043] Local density features reflect the spatial density of cells around them; cells located in high-density regions have higher local density values, while those in low-density regions have lower local density values. This feature helps distinguish between cells located in the core and peripheral regions.
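The three geometric features can be computed as in the sketch below. It is a minimal illustration under stated assumptions: neighbor distances are found by brute force rather than with a kd-tree, the density is min-max normalized, and the function name and toy points are hypothetical. At least two points are required.

```python
import math

def geometric_features(points, k=3, eps=1e-8):
    """Return [r, theta, normalized density] for each 2D point (needs >= 2 points)."""
    feats, raw_density = [], []
    for i, (x, y) in enumerate(points):
        r = math.hypot(x, y)      # distance to the origin
        theta = math.atan2(y, x)  # angle to the origin, in (-pi, pi]
        # Brute-force distances to all other points (stand-in for a kd-tree query).
        dists = sorted(math.dist((x, y), q) for j, q in enumerate(points) if j != i)
        d_k = dists[min(k, len(dists)) - 1]  # distance to the k-th nearest neighbor
        raw_density.append(1.0 / (d_k + eps))
        feats.append([r, theta])
    lo, hi = min(raw_density), max(raw_density)
    span = hi - lo
    for f, rho in zip(feats, raw_density):
        f.append((rho - lo) / span if span > 0 else 0.0)  # min-max to 0-1 (assumed)
    return feats

# Four tightly clustered cells and one isolated cell.
toy_points = [(0.1, 0.1), (0.1, 0.12), (0.12, 0.1), (0.12, 0.12), (0.9, 0.9)]
toy_feats = geometric_features(toy_points, k=2)
```

In this toy input the isolated point receives the lowest normalized density and the clustered points receive values close to 1, matching the core versus periphery distinction described above.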
[0044] In this embodiment, the k-nearest neighbor graph structure is calculated based on the two-dimensional coordinates of the point cloud, where k is 100. The calculation is accelerated by the kd-tree algorithm to obtain the indexes and distances of the k nearest neighbors of each point. In the subsequent point cloud geometric convolutional network training process, the calculated k-nearest neighbor graph structure is reused without recalculating the k-nearest neighbor relationships, thus maintaining the consistency of the spatial neighborhood and reducing computational overhead. The k-nearest neighbor graph structure is pre-calculated and cached during the model training phase and directly reused during the prediction phase. For programs that directly predict, the k-nearest neighbor graph structure is constructed according to the same data preprocessing steps as the training process.
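The compute-once, reuse-later pattern for the neighbor graph can be sketched as follows. This is an illustration only: the search is brute force (the text uses a kd-tree for acceleration, which a library such as SciPy provides), and the in-memory dictionary cache and its `(sample_id, level, k)` key are assumptions about how the cached graph might be stored.

```python
import math

_KNN_CACHE = {}  # hypothetical in-memory cache keyed by (sample_id, level, k)

def knn_graph(points, k):
    """Return (indices, distances) of the k nearest neighbors of every point."""
    indices, distances = [], []
    for i, p in enumerate(points):
        # Brute-force ranking; a kd-tree would replace this for large point clouds.
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        indices.append([j for _, j in ranked])
        distances.append([d for d, _ in ranked])
    return indices, distances

def cached_knn_graph(sample_id, level, points, k):
    """Compute the graph once per (sample, level, k) and reuse it afterwards."""
    key = (sample_id, level, k)
    if key not in _KNN_CACHE:
        _KNN_CACHE[key] = knn_graph(points, k)
    return _KNN_CACHE[key]

toy_pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
toy_idx, toy_dist = cached_knn_graph("S01", "P1", toy_pts, k=2)
```

Because both training and prediction read the same cached object, the neighborhoods seen by the geometric convolutional layers are identical across phases, which is the consistency property the paragraph above describes.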
[0045] Extract all channel data other than the current level's X and Y axes as high-dimensional features. For example, the data in the current embodiment include the following detection channels: CD45, CD14, CD3, CD19, CD56, FSC-A, FSC-H, SSC-A, and CD16. If the current level uses CD45 as the X-axis and SSC-A as the Y-axis, then the high-dimensional features comprise the data from the CD14, CD3, CD19, CD56, FSC-A, FSC-H, and CD16 channels.
[0046] The coordinates, distance, angle, density, and high-dimensional features are concatenated so that each cell point is represented by a high-dimensional feature vector $f_i = [x_i, y_i, r_i, \theta_i, \rho_i, h_{i,1}, \dots, h_{i,N}]$, where $f_i$ is the feature vector of the i-th cell point; $x_i$ and $y_i$ are the X-axis and Y-axis coordinates, respectively; $r_i$ is the distance to the origin; $\theta_i$ is the angle to the origin; $\rho_i$ is the normalized local density; and $h_{i,1}, \dots, h_{i,N}$ are the values of the N high-dimensional fluorescence channels. The total dimension of the feature vector is $D = 5 + N$. In this embodiment, N is 7, so the total dimension of the feature vector is 12. The feature vectors of all cell points form a point cloud data matrix of shape $(N_{cells}, D)$, where $N_{cells}$ is the number of cell points and D is the feature dimension.
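The dimension count can be checked with a short sketch. The channel list is the one given for this embodiment; the helper names and the numeric feature values are hypothetical placeholders.

```python
# Channels listed for this embodiment.
ALL_CHANNELS = ["CD45", "CD14", "CD3", "CD19", "CD56", "FSC-A", "FSC-H", "SSC-A", "CD16"]

def high_dim_channels(all_channels, x_name, y_name):
    """Channels left over after removing the current level's X and Y axes."""
    return [c for c in all_channels if c not in (x_name, y_name)]

def build_feature_vector(x, y, r, theta, rho, high_dim):
    """Concatenate coordinates, distance, angle, density, and high-dimensional features."""
    return [x, y, r, theta, rho, *high_dim]

# P1 level: CD45 on the X-axis, SSC-A on the Y-axis.
p1_extra = high_dim_channels(ALL_CHANNELS, "CD45", "SSC-A")
# Placeholder values for one cell; only the vector length matters here.
p1_vector = build_feature_vector(0.2, 0.4, 0.447, 1.107, 0.8, [0.1] * len(p1_extra))
```

Removing the two axis channels from the nine available leaves seven, so the vector has 2 + 1 + 1 + 1 + 7 = 12 entries.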
[0047] In the data preprocessing and point cloud construction stage described above, step S2 is executed through a nested loop structure based on the number of samples and the hierarchical gating configuration defined in step S1. The loop structure can include the processing flow of sample loop and subtype loop. The sample loop is the outer loop that traverses all samples to be analyzed and is used to read the raw detection data of each sample in sequence. The outer loop ends when all sample data has been processed. The subtype loop is the inner loop that traverses all subtype data to be analyzed for each sample and is used to read the detection data of each subtype for each sample. The inner loop ends when all subtype data has been analyzed.
[0048] Specifically, as a preferred embodiment, the sample loop will process all 28 samples sequentially, and the subtype loop will process all cell levels and their subtypes sequentially, including P1, Single Cell, Mon / Lym, T / B / P2, NK, CD56high NK / CD56dim NK, c-Mon / i-Mon / nc-Mon, ultimately obtaining a high-dimensional representation of each cell at each level as the input to the subsequent point cloud geometric convolutional network.
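The nested loop can be sketched as below. The level names follow this embodiment; the function name, the dictionary layout of `samples`, and the `build_point_cloud` callback are assumptions standing in for the actual preprocessing routine.

```python
# Level names as listed for this embodiment.
LEVELS = [
    "P1", "Single Cell", "Mon / Lym", "T / B / P2", "NK",
    "CD56high NK / CD56dim NK", "c-Mon / i-Mon / nc-Mon",
]

def preprocess_all(samples, levels, build_point_cloud):
    """Outer loop over samples, inner loop over gating levels."""
    clouds = {}
    for sample_id, events in samples.items():   # sample loop (outer)
        for level in levels:                    # subtype loop (inner)
            clouds[(sample_id, level)] = build_point_cloud(events, level)
    return clouds

# Toy run: 28 empty samples and a placeholder builder (hypothetical).
toy_samples = {f"S{i:02d}": [] for i in range(1, 29)}
toy_clouds = preprocess_all(toy_samples, LEVELS, lambda events, level: (level, len(events)))
```

With 28 samples and 7 levels, this yields 196 point clouds, one per (sample, level) pair.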
[0049] Step S3: Training the point cloud geometric convolutional network.
[0050] In a preferred embodiment, based on the hierarchical configuration file obtained in step S1 and the processed input and k-nearest neighbor graph structure obtained in step S2, the geometric convolution module extracts the geometric relationships between each cell and its k nearest neighbors, and repeated extraction expands the local features into global features. Finally, a classification layer converts the extracted features into class probabilities, completing the gating of the cell types.
[0051] The core of a geometric convolutional network is the geometric convolutional layer. In a preferred embodiment, the geometric convolutional layer extracts geometric features through the following steps: First, for each center point in the point cloud, its k neighboring points are obtained based on the k-nearest neighbor graph structure. Next, the edge features between the center point and each of its k neighboring points are calculated, specifically as follows: calculate the coordinate difference between the center point and each neighboring point: $\Delta p_{ij} = p_j - p_i$, where $p_i$ is the coordinates of the center point and $p_j$ is the coordinates of the neighboring point; calculate the Euclidean distance between the center point and the neighboring point: $d_{ij} = \lVert \Delta p_{ij} \rVert_2$; calculate the angle between the center point and the neighboring point: $\theta_{ij} = \operatorname{atan2}(\Delta y_{ij}, \Delta x_{ij})$; calculate the cosine similarity between the feature vector of the center point and that of the neighboring point: $s_{ij} = \frac{f_i \cdot f_j}{\lVert f_i \rVert \, \lVert f_j \rVert}$, where $f_i$ and $f_j$ are the feature vectors of the center point and the neighboring point, respectively.
[0052] These edge features are concatenated into an edge feature vector $e_{ij} = [\Delta p_{ij}, d_{ij}, \theta_{ij}, s_{ij}]$. Then, the edge features are transformed through a convolution operation: $e'_{ij} = \mathrm{Conv}(e_{ij})$, where Conv represents the convolution operation and $e'_{ij}$ is the transformed edge feature.
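As an illustration of the edge-feature step, the following sketch computes the four quantities for a single center/neighbor pair. The function name `edge_features` is hypothetical, and the direction of the coordinate difference (neighbor minus center) and the use of `atan2` for the angle are assumptions; the convolution that follows is not shown.

```python
import math

def edge_features(p_i, p_j, f_i, f_j):
    """Edge features between a center point i and one neighboring point j."""
    # Coordinate difference; the sign convention (neighbor minus center) is assumed.
    dx, dy = p_j[0] - p_i[0], p_j[1] - p_i[1]
    dist = math.hypot(dx, dy)    # Euclidean distance between the two points
    angle = math.atan2(dy, dx)   # direction from the center to the neighbor
    dot = sum(a * b for a, b in zip(f_i, f_j))
    norm = math.sqrt(sum(a * a for a in f_i)) * math.sqrt(sum(b * b for b in f_j))
    cos_sim = dot / norm if norm > 0 else 0.0   # guard against zero vectors
    return [dx, dy, dist, angle, cos_sim]

# Hypothetical pair: neighbor at (3, 4) relative to a center at the origin.
e = edge_features((0.0, 0.0), (3.0, 4.0), [1.0, 0.0], [1.0, 0.0])
```

Each center point would produce k such vectors, one per neighbor, before they are passed to the convolution.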
[0053] By stacking multiple layers of geometric convolution, spatial geometric features from local to global are extracted layer by layer.
[0054] As a preferred embodiment, the network contains four geometric convolutional layers, which extract features layer by layer from local to global scale.
[0055] The first geometric convolutional layer takes the original point cloud features as input and outputs 128-dimensional local features, capturing cell aggregation patterns within a small area; The second geometric convolutional layer takes the features from the previous layer as input and outputs 128-dimensional features, capturing cell distribution patterns in a medium range. The third geometric convolutional layer takes the features from the previous layer as input and outputs 128-dimensional features, capturing a wide range of cellular and tissue structures. The fourth geometric convolutional layer takes the features from the previous layer as input and outputs 128-dimensional features, capturing the global cell population distribution features.
[0056] To prevent gradient vanishing and improve feature representation, a residual connection is added after each geometric convolutional layer. The residual connection adds the input and output of the geometric convolutional layer to obtain the final output of that layer.
[0057] Meanwhile, each geometric convolutional layer is followed by a multi-pooling module, which combines max pooling and average pooling to pool the edge features, resulting in pooled features. Then, the pooled features are restored to their original dimensions using an MLP (Multilayer Perceptron, specifically two layers with dimensions of 256-128) as input for the next layer.
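A minimal sketch of the residual connection and the multi-pooling step follows. The function names are hypothetical, and the sketch assumes that the max-pooled and average-pooled vectors are concatenated, which would turn 128 channels into the 256-dimensional input expected by the 256-128 MLP. The MLP itself is omitted.

```python
def residual_add(layer_input, layer_output):
    """Residual connection: element-wise sum of a layer's input and output."""
    return [a + b for a, b in zip(layer_input, layer_output)]

def multi_pool(edge_feats):
    """Max-pool and average-pool k edge-feature vectors, then concatenate."""
    k = len(edge_feats)
    channels = list(zip(*edge_feats))            # regroup values per channel
    max_pooled = [max(c) for c in channels]
    avg_pooled = [sum(c) / k for c in channels]
    return max_pooled + avg_pooled               # length is 2 x channel count

# Two hypothetical neighbors with two channels each.
edges = [[1.0, 4.0], [3.0, 2.0]]
pooled = multi_pool(edges)
summed = residual_add([1.0, 2.0], [0.5, 0.5])
```

Combining both pooling types keeps the strongest response per channel as well as the neighborhood average.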
[0058] After four layers of geometric convolution, four feature maps at four scales are obtained, each with a dimension of 128. These feature maps are then concatenated to obtain a 512-dimensional local feature map. The pooling features from the four geometric convolutional layers are subjected to global max pooling and global average pooling to obtain the global feature vector. In this embodiment, the global feature dimension is 1024.
[0059] The local features (512 dimensions) and global features (1024 dimensions) are concatenated to obtain a 1536-dimensional fused feature. Then, the fused feature is mapped to the class space through a classification head.
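The dimension bookkeeping of the fusion step can be checked with a short sketch. It assumes that the single global vector is appended to the local features of every point; the function name `fuse` and the zero-filled placeholder values are illustrative only.

```python
NUM_LAYERS = 4      # geometric convolutional layers in this embodiment
LAYER_DIM = 128     # output dimension of each layer
GLOBAL_DIM = 1024   # global feature dimension in this embodiment

def fuse(local_feats, global_feat):
    """Append the shared global vector to every point's local features."""
    return [row + global_feat for row in local_feats]

# Three hypothetical points with placeholder feature values.
local_feats = [[0.0] * (NUM_LAYERS * LAYER_DIM) for _ in range(3)]
global_feat = [0.0] * GLOBAL_DIM
fused = fuse(local_feats, global_feat)
fused_dim = len(fused[0])   # 4 * 128 + 1024
```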
[0060] The classification head consists of two layers: the first layer maps 1536-dimensional features to 256-dimensional features, using GroupNorm normalization, GELU activation, and Dropout regularization; the second layer maps 256-dimensional features to C-dimensional features, where C is the number of categories.
[0061] The network outputs the logits values for each point belonging to each category, with a shape of (N×C), where N is the number of points and C is the number of categories. The logits are then converted into a probability distribution using the softmax function.
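The conversion from logits to probabilities can be sketched as follows. The logit values are made up for illustration, and the maximum is subtracted before exponentiation as a common numerical-stability measure that the source does not mention.

```python
import math

def softmax(logits):
    """Convert one point's C logits into a probability distribution."""
    m = max(logits)                          # subtract the max for stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits with shape (N x C) = (2 x 3).
logits = [[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]
probs = [softmax(row) for row in logits]

# The predicted category of each point is the one with the highest probability.
labels = [row.index(max(row)) for row in probs]
```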
[0062] In this embodiment, the dataset is first divided into a training set, a validation set, and a test set. The division ratio is 70:15:15, meaning 20 samples are used for training, 4 samples for validation, and 4 samples for testing. The dataset has been pre-divided and stored in different directories, and the original input of the cells in each sample obtained in step S2 is indexed according to the sample name.
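For 28 samples, a 70:15:15 ratio gives 19.6, 4.2 and 4.2, which rounds to the 20/4/4 split stated above. The sketch below shows one way such a sample-level split could be produced. The shuffling approach, the function name `split_samples` and the sample names are assumptions; the source only states that the dataset is pre-divided into directories.

```python
import random

def split_samples(names, n_train=20, n_val=4, seed=42):
    """Shuffle sample names with a fixed seed and cut them into three sets."""
    rng = random.Random(seed)   # fixed seed keeps the split reproducible
    shuffled = list(names)
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    held_out = shuffled[n_train + n_val:]
    return train, val, held_out

# Hypothetical sample names for the 28 samples of this embodiment.
samples = [f"sample_{i:02d}" for i in range(1, 29)]
train_set, val_set, test_set = split_samples(samples)
```

Splitting by sample rather than by cell keeps all cells of one sample in the same set.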
[0063] The key parameters of the model are set as follows: k (number of nearest neighbors): 100, ranging from 10 to 100; emb_dims (embedding dimension): 1024; dropout ratio: 0.1, which can be adjusted within the range of 0 to 0.5; learning_rate: 0.001, ranging from 0.0001 to 0.01; batch_size: 8, ranging from 4 to 16; epochs (number of training epochs): 500, ranging from 100 to 1000; random seed is set to 42, which can be arbitrarily selected but must remain fixed after selection.
[0064] The training cycle includes a hierarchical cycle and a sample cycle. Specifically, the training proceeds layer by layer from shallow to deep according to the hierarchical configuration file obtained in step S1. Each training cycle consists of 500 training rounds, and each training round completes a sample cycle of 20 training samples.
[0065] For each epoch, perform the following steps: First, randomly sample batch_size samples from the training set; second, input the samples into the model, calculate the forward propagation, and obtain the prediction results; third, calculate the loss based on the prediction results and the true labels; fourth, calculate the gradient and update the model parameters; fifth, every 5 epochs, evaluate the model performance on the validation set and record the validation set F1 score.
[0066] The loss functions mentioned above include: Focal Loss: used to handle class imbalance and focus on hard-to-classify samples. Its calculation formula is: $L_{focal} = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$, where $p_t$ is the model's predicted probability for the correct category; $\gamma$ is the focusing parameter, with a value range of 1.0 to 3.0; and $\alpha_t$ is the category weight, for which inverse-frequency weights are automatically calculated from the distribution of each label in the current batch. Boundary Loss: used to constrain the spatial location of predicted points and improve the quality of classification boundaries. It calculates the distance from each predicted point to the convex hull boundary of the true class points and applies a penalty to predicted points outside the boundary. Boundary Loss adopts a warmup strategy in the initial stage of training, with its weight gradually increasing from 0.1 to 0.5. Smooth Loss: used to ensure consistency of neighborhood labels and reduce label fragmentation. It constructs a graph Laplacian regularization term based on the k-nearest neighbor graph structure to constrain the predicted probabilities of points within a neighborhood to remain consistent. The formula for calculating Smooth Loss is: $L_{smooth} = \sum_{(i,j) \in E} w_{ij} \lVert p_i - p_j \rVert^2$, where E is the set of edges, $w_{ij}$ are Gaussian weights based on spatial distance, and $p_i$ and $p_j$ are the predicted probability vectors of neighboring points. The total loss function is: $L = \lambda_1 L_{focal} + \lambda_2 L_{boundary} + \lambda_3 L_{smooth}$, where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the weights of the three loss functions, respectively. In an optional embodiment, the value range of $\lambda_2$ is 0.1-0.5 and the value range of $\lambda_3$ is 0.1-0.3. Preferably, in this embodiment, $\lambda_1$ is 1, $\lambda_2$ is 0.1, and $\lambda_3$ is 0.2.
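A minimal sketch of how the loss terms might be computed and combined is given below. It assumes the standard focal-loss form, an unnormalized sum for the smoothness term, and one common normalization for the inverse-frequency weights. The function names and the default focusing parameter of 2.0 (inside the stated 1.0 to 3.0 range) are illustrative choices, and the boundary loss is passed in as a precomputed number because its convex-hull computation is not shown.

```python
import math

def inverse_frequency_weights(labels):
    """Per-class weights inversely proportional to batch frequency (assumed form)."""
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    n = len(labels)
    return {lab: n / (len(counts) * c) for lab, c in counts.items()}

def focal_loss(p_t, gamma=2.0, alpha_t=1.0):
    """Focal loss for one point; p_t is the probability of the correct class."""
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def smooth_loss(edges, weights, probs):
    """Weighted squared difference of predicted probabilities across kNN edges."""
    return sum(
        w * sum((a - b) ** 2 for a, b in zip(probs[i], probs[j]))
        for (i, j), w in zip(edges, weights)
    )

def total_loss(l_focal, l_boundary, l_smooth, lam=(1.0, 0.1, 0.2)):
    """Weighted sum using the preferred weights of this embodiment."""
    return lam[0] * l_focal + lam[1] * l_boundary + lam[2] * l_smooth

class_weights = inverse_frequency_weights(["T", "T", "T", "NK"])
easy, hard = focal_loss(0.9), focal_loss(0.5)
```

The focusing term makes a confidently correct point (`easy`) contribute far less than an uncertain one (`hard`), and the rarer class receives the larger weight.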
[0067] The model with the highest F1 score on the validation set is selected as the optimal model.
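The periodic validation and best-model selection can be sketched as a simple loop. The callables `train_epoch` and `validate_f1` are hypothetical stand-ins for the real training and evaluation routines, and the stub functions below exist only to make the sketch runnable.

```python
def train_with_selection(train_epoch, validate_f1, epochs=500, eval_every=5):
    """Train for a fixed number of epochs and remember the best validation F1."""
    best_f1, best_epoch = -1.0, None
    for epoch in range(1, epochs + 1):
        train_epoch(epoch)
        if epoch % eval_every == 0:      # validate every 5 epochs
            f1 = validate_f1(epoch)
            if f1 > best_f1:             # keep the highest-scoring checkpoint
                best_f1, best_epoch = f1, epoch
    return best_epoch, best_f1

# Stub routines: the fake F1 score peaks at epoch 50.
trained = []
best_epoch, best_f1 = train_with_selection(
    train_epoch=trained.append,
    validate_f1=lambda e: 1.0 - abs(e - 50) / 100,
    epochs=100,
)
```

In practice the model parameters would be saved whenever `best_f1` improves, so that the optimal model can be reloaded for prediction.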
[0068] Step S4: Layer-by-layer cumulative prediction.
[0069] In a preferred embodiment, after the model training is completed, the trained model is used to predict the test set data. The prediction process is performed layer by layer from the top layer to the bottom layer.
[0070] For each level, perform the following steps: First, load the trained model parameters for that level; Second, filter cell points belonging to the parent gate from the original data or the prediction results of the previous level, with the filtering condition being that the predicted label of the parent gate is 1; Third, construct a point cloud for the filtered cell points and calculate geometric and high-dimensional features; Fourth, input the point cloud into the model and perform forward propagation to obtain the probability of each point belonging to each category; Fifth, determine the category label of each point based on the probability value and select the category with the highest probability as the predicted category; Sixth, add the prediction results to the data file as the basis for filtering the parent gate of the next level.
[0071] The prediction results from each level are accumulated into the same data file. The prediction results include the category label of each cell and the probability of belonging to each category.
[0072] After all hierarchical predictions are completed, a final prediction result file is generated, containing the complete hierarchical label for each cell. For example, a cell may be labeled as: P1=1, single cells=1, Lym=1, T=1, indicating that the cell passed the four gatings of P1, single cells, Lym, and T, and was ultimately identified as a T cell.
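The top-down accumulation can be illustrated with a simplified sketch. It reduces each level to a single binary gate, whereas the described method performs multi-class prediction per level, and it replaces the trained networks with hypothetical threshold functions. The gate names follow the example above; the field names `fsc` and `ssc` and all numeric values are invented for illustration.

```python
def predict_hierarchy(cells, levels, models):
    """Predict gates from top to bottom, accumulating labels on each cell.

    levels: ordered list of (gate_name, parent_name) pairs.
    models: gate_name -> callable taking a cell dict and returning 0 or 1.
    """
    for gate, parent in levels:
        for cell in cells:
            # Only cells whose parent-gate label is 1 are passed to this level.
            in_parent = parent == "ROOT" or cell.get(parent) == 1
            cell[gate] = models[gate](cell) if in_parent else 0
    return cells

# Hypothetical stand-ins for the trained per-level models.
models = {
    "P1": lambda c: int(c["fsc"] > 0.1),
    "Lym": lambda c: int(c["ssc"] < 0.5),
}
levels = [("P1", "ROOT"), ("Lym", "P1")]
cells = [{"fsc": 0.6, "ssc": 0.2}, {"fsc": 0.05, "ssc": 0.2}]
result = predict_hierarchy(cells, levels, models)
```

The second cell fails the P1 gate, so it is never evaluated at the Lym level and its Lym label stays 0, mirroring the parent-gate filtering condition.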
[0073] Step S5: Gating image drawing and report output.
[0074] As a preferred embodiment, based on the prediction results in step S4, the proportion of cell subtypes at each level is statistically analyzed, a two-dimensional scatter plot is generated, and a report file including user information, sample number, population name, gate parameters, cell proportion and graphic results is output. The report can be exported in PDF or Excel format to display the clustering results, gate positions and statistical information, so as to realize the traceability and manual review of the gate results.
[0075] In a preferred embodiment, labeled cells are predicted first, and the final output shows a comparison between the true labels and the predicted labels. Figure 3 shows the output image at level P1 for one sample: the left side shows the plot drawn with the true labels, and the right side shows the plot drawn with the predicted labels. This image can be used as a reference for manual review. For prediction results with true labels, the Accuracy, Recall, Precision, and F1-Score metrics were calculated for each level, and the results are shown in Table 2 below.
Table 2. Hierarchical Gating Prediction Results Indicators
As can be seen from Table 2, the immune cell flow cytometry gating method and system based on the point cloud algorithm achieve an accuracy of 0.99 at all gating levels, with an F1-Score greater than 0.95 at the main levels and around 0.8 at the rare cell levels, which demonstrates the reliability of the method.
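The four metrics can be computed per gate as in the following sketch, which uses the usual definitions based on true/false positives and negatives. The function name and the label vectors are illustrative and do not reproduce the data behind Table 2.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for one gate with 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical labels for five cells at one gate.
metrics = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 0, 1])
```

For a rare population, the many true negatives keep accuracy high even when a few positives are missed, so accuracy can stay near 0.99 while the F1-Score is noticeably lower.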
[0076] To achieve the above objectives, the present invention also provides an immune cell flow cytometry gating system based on a point cloud algorithm, wherein the system is constructed using the immune cell flow cytometry gating method based on a point cloud algorithm described in any of the foregoing embodiments. As shown in Figure 2, it may include a hierarchical gating configuration module, a data preprocessing and point cloud construction module, a point cloud geometric convolutional network module, a layer-by-layer cumulative prediction module, and a plotting and report output module.
[0077] In an optional embodiment, the system is developed using Python, has a user-friendly interface, and supports operation on both Windows and Linux platforms.
[0078] The hierarchical gating configuration module is used to decompose the cell classification task into multi-classification problems at multiple levels according to the predefined gating hierarchy structure file. The data preprocessing and point cloud construction module is used to extract cell data from the flow cytometry detection file, extract the X-axis and Y-axis coordinates, construct a graph structure using K-nearest neighbors and calculate local density features, calculate geometric features and high-dimensional features, and construct the point cloud data representation. The point cloud geometric convolutional network module is used to capture the spatial geometric relationships between cells using the k-nearest neighbor graph structure and geometric convolutional layers, to extend regional features to global features through a multi-level geometric convolutional network, and to optimize the model parameters by combining multiple loss functions. The layer-by-layer cumulative prediction module is used to perform multi-class predictions sequentially from the top level to the bottom level and to accumulate the prediction results into the same file. The plotting and report output module is used to generate visualization results, output prediction result files and prediction reports, and enable traceability and manual review of the gating results.
[0079] The specific implementation method has been described in detail above, and will not be repeated here.
[0080] To achieve the above objectives, the present invention also provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when running the program, implements the immune cell flow cytometry gating method based on the point cloud algorithm as described in any of the foregoing embodiments. The processor and memory can be configured separately or integrated together, for example, integrated on a system-on-chip (SOC) of the terminal device.
[0081] To achieve the above objectives, the present invention also provides a computer-readable storage medium storing computer-executable instructions or a computer program, which, when processed and executed, implement the immune cell flow cytometry gating method based on point cloud algorithm as described above.
[0082] The computer-readable storage medium is, for example, memory. Memory can be volatile or non-volatile, or it can include both volatile and non-volatile memory. Non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. Volatile memory can be random access memory (RAM), which serves as an external cache. By way of example, but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDRSDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous linked dynamic random access memory (SLDRAM), and direct rambus RAM (DRRAM).
[0083] If the integrated units in the above embodiments are implemented as software functional units and sold or used as independent products, they can be stored in the aforementioned computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause one or more computer devices (which may be personal computers, servers, or network devices, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention.
[0084] The preferred embodiments of the present invention have been described in detail above. It should be understood that those skilled in the art can make numerous modifications and variations based on the concept of the present invention without creative effort. Therefore, all technical solutions that can be obtained by those skilled in the art based on the concept of the present invention through logical analysis, reasoning, or limited experimentation on the basis of existing technology should be within the scope of protection defined by the claims.
Claims
1. A flow cytometry gating method for immune cells based on a point cloud algorithm, characterized in that it includes the following steps: Step S1: Hierarchical gating configuration; wherein, according to a predefined gating hierarchy structure file, the cell classification task is decomposed into multi-classification problems at multiple levels; Step S2: Data preprocessing and point cloud construction; wherein data including at least the forward scatter (FSC) signal, the side scatter (SSC) signal, and the multi-channel fluorescence signals are extracted from the flow cytometry detection file, i.e., the FCS file; spectral spillover correction of the fluorescence signals is performed using a compensation matrix; a linear or logarithmic coordinate transformation is performed according to the channel type to obtain compensated, standardized flow cytometry data; and, according to the gating hierarchy structure defined in step S1, the corresponding two-dimensional coordinate data are extracted for each cell point, and a point cloud structure including adjacency relationships is constructed based on their spatial distribution; Step S3: Point cloud geometric convolutional network training; wherein, for each level defined in step S1, a corresponding geometric graph convolutional network model is constructed, the point cloud graph structure constructed in step S2 is used as input, the spatial neighborhood information of the cell points is aggregated by the geometric convolutional layers of the model to extract features, and the parameters of each level's model are optimized in combination with the classification loss function; Step S4: Layer-by-layer cumulative prediction; wherein prediction proceeds level by level from top to bottom: for the current level, the subset of cell points predicted as a specific category in its parent level is selected and input into the geometric graph convolutional network model corresponding to that level for multi-class prediction; the prediction results of each level are accumulated level by level, and a unified result file containing the complete hierarchical classification information is finally generated; Step S5: Gating image drawing and report output; wherein the proportion of each cell type at each level is statistically analyzed, a two-dimensional scatter plot is generated for each level, and a report file is output that includes at least: user information, sample number, population name, gating parameters, cell proportions, and graphic results; the report is exported in PDF or Excel format to achieve traceability and manual review of the gating results.
2. The immune cell flow cytometry gating method based on point cloud algorithm according to claim 1, characterized in that step S1 includes the following steps: the gating hierarchy structure file is completed according to the gating task; the file is in CSV format and includes: the cell populations that need to be gated and their parent populations, the features corresponding to the coordinate axes used for gating, and whether the gating level includes background. The background refers to cells at the same level that are not defined as any subpopulation under the parent gate during the gating process. The same gating level is defined as all subpopulations to be gated that share the same parent gate and use the same features to define the coordinate axes. For a level in which all cells are used as the gating input, the cells being gated do not belong to a subpopulation of any cell population, so the parent gate of that level is defined as ROOT.
3. The immune cell flow cytometry gating method based on point cloud algorithm according to claim 1, characterized in that step S2 includes the following steps: read the user-uploaded FCS file, parse the HEADER information, and obtain the compensation matrix; expand the compensation matrix to a complete matrix containing all channels, and calculate the compensated flow cytometry data from the raw data using the compensation matrix that comes with the FCS file; select the coordinate transformation method according to the channel type: a linear scale is used for FSC or SSC signals, and a logarithmic transformation is used for fluorescence signals; read the compensated flow cytometry data and the hierarchical gating configuration defined in step S1, and extract the X-axis and Y-axis coordinate data for each level, the coordinates being values normalized to the range of 0-1; calculate the distance from each cell point to the origin: $r = \sqrt{x^2 + y^2}$, where x and y are the X-axis and Y-axis coordinates of the cell point, respectively; calculate the angle from each cell point to the origin: $\theta = \operatorname{arctan2}(y, x)$, where arctan2 is the arctangent function, whose return value ranges from -π to π, and x and y are the X-axis and Y-axis coordinates of the cell point, respectively; calculate the local density of each cell point based on the k-nearest neighbor distance, where k is the number of nearest neighbors, $d_k$ is the distance to the k-th nearest neighbor, and a minimum value $\epsilon$ is used to avoid division by zero, and normalize the local density to the range of 0-1; extract all fluorescence channel data except for the current level's X and Y axes as high-dimensional features, and concatenate the coordinates, distance, angle, density, and high-dimensional features to form a multi-dimensional point cloud feature representation.
4. The immune cell flow cytometry gating method based on point cloud algorithm according to claim 1 or 3, characterized in that, Step S2 is executed through a nested loop structure based on the number of samples and the hierarchical gating configuration defined in step S1. The loop structure includes the processing flow of sample loop and subtype loop. The sample loop is the outer loop that traverses all samples to be analyzed and reads the raw detection data of each sample in sequence. The outer loop ends when all sample data has been processed. The subtype loop is the inner loop that traverses all subtype data to be analyzed for each sample and reads the detection data of each subtype for each sample. The inner loop ends when all subtype data has been analyzed.
5. The immune cell flow cytometry gating method based on point cloud algorithm according to claim 3, characterized in that, In step S2, the construction method of the k-nearest neighbor graph structure includes: The k-nearest neighbor graph structure is calculated based on the two-dimensional coordinates of point cloud, where the value of k ranges from 10 to 100. The calculation is accelerated by using the kd-tree algorithm to obtain the indices and distances of the k nearest neighbors of each point. In the subsequent training process of the point cloud geometric convolutional network, the calculated k-nearest neighbor graph structure is reused without recalculating the k-nearest neighbor relationships, thus maintaining the consistency of the spatial neighborhood and reducing computational overhead. The k-nearest neighbor graph structure is pre-computed and cached during the model training phase and directly reused during the prediction phase; for programs that directly predict, the k-nearest neighbor graph structure is constructed according to the same data preprocessing steps as the training process.
6. The immune cell flow cytometry gating method based on point cloud algorithm according to claim 1, characterized in that the computation logic of the geometric convolutional layer in step S3 is configured as follows: for each center point in the point cloud, its k neighboring points are obtained based on the k-nearest neighbor graph structure; calculate the coordinate difference between the center point and each neighboring point: $\Delta p_{ij} = p_j - p_i$, where $p_i$ is the coordinates of the center point and $p_j$ is the coordinates of the neighboring point; calculate the Euclidean distance between the center point and the neighboring point: $d_{ij} = \lVert \Delta p_{ij} \rVert_2$; calculate the angle between the center point and the neighboring point: $\theta_{ij} = \operatorname{atan2}(\Delta y_{ij}, \Delta x_{ij})$; calculate the cosine similarity between the feature vector of the center point and that of the neighboring point: $s_{ij} = \frac{f_i \cdot f_j}{\lVert f_i \rVert \, \lVert f_j \rVert}$, where $f_i$ and $f_j$ are the feature vectors of the center point and the neighboring point, respectively. The coordinate difference, Euclidean distance, angle, and cosine similarity are concatenated into an edge feature vector. The edge features are then transformed and aggregated through convolution operations to extract local geometric features. By stacking multiple layers of geometric convolution, spatial geometric features from local to global are extracted layer by layer.
7. The immune cell flow cytometry gating method based on point cloud algorithm according to claim 1, characterized in that the geometric graph convolutional network model in step S3 is constructed as follows: it contains n sequentially cascaded geometric convolutional layers, where n is an integer between 2 and 5; each of the geometric convolutional layers is configured to receive 128-dimensional input point features, aggregate the features of the center point and its k nearest neighbors, and output 128-dimensional features; each of the geometric convolutional layers is sequentially connected to a residual adder, a multi-pooling module, and a feature-restoration multilayer perceptron; wherein the residual adder is used to add the input features and output features of the geometric convolutional layer; the multi-pooling module is used to perform max pooling and average pooling on the summed features simultaneously; and the feature-restoration multilayer perceptron is used to map the pooled features back to 128 dimensions, to be used as the input features of the next geometric convolutional layer or as the final features. The model also includes a feature fusion module and a classification head; the feature fusion module is used to concatenate the 128-dimensional feature maps output by all n geometric convolutional layers to obtain local features, to perform global pooling on the features of all layers to obtain global features, and then to concatenate the local features with the global features; the classification head is used to receive the concatenated fused features and to map them to the probability distribution of each cell point belonging to each category through at least one fully connected layer.
8. The immune cell flow cytometry gating method based on point cloud algorithm according to claim 1, characterized in that, in step S3, the model parameters are optimized using multiple loss functions, including: Focal Loss, i.e., the classification loss, used to handle class imbalance and focus on hard-to-classify samples: $L_{focal} = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$, where $p_t$ is the model's predicted probability for the correct category; $\gamma$ is the focusing parameter, with a value range of 1.0 to 3.0; and $\alpha_t$ is the category weight, for which inverse-frequency weights are automatically calculated from the distribution of each label in the current batch; Boundary Loss, used to constrain the spatial location of predicted points and improve the quality of classification boundaries; it calculates the distance from each predicted point to the convex hull boundary of the true class points and applies a penalty to predicted points outside the boundary, and adopts a warmup strategy in the initial stage of training, with its weight gradually increasing from 0.1 to 0.5; Smooth Loss, i.e., the smoothing loss, used to ensure consistency of neighborhood labels and reduce label fragmentation; it constructs a graph Laplacian regularization term based on the k-nearest neighbor graph structure to constrain the predicted probabilities of points within a neighborhood to remain consistent: $L_{smooth} = \sum_{(i,j) \in E} w_{ij} \lVert p_i - p_j \rVert^2$, where E is the set of edges, $w_{ij}$ are Gaussian weights based on spatial distance, and $p_i$ and $p_j$ are the predicted probability vectors of neighboring points; the total loss function is: $L = \lambda_1 L_{focal} + \lambda_2 L_{boundary} + \lambda_3 L_{smooth}$, where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the weights of the three loss functions; the value range of $\lambda_1$ is 0.8-1.2, the value range of $\lambda_2$ is 0.1-0.5, and the value range of $\lambda_3$ is 0.1-0.3.
9. The immune cell flow cytometry gating method based on point cloud algorithm according to claim 1, characterized in that, It also includes the fusion of high-dimensional features: Identify the names of the X and Y axes used in the current level; Extract all fluorescence channel data other than the current X-axis and Y-axis from the flow cytometry data as high-dimensional features. The dimension of the high-dimensional features depends on the number of detection channels of the flow cytometer; for multicolor flow cytometry data, the dimension of the high-dimensional features is 10-30. Normalize the high-dimensional features to ensure that the data from different channels are within the same numerical range; High-dimensional features are combined and fused with geometric features calculated based on coordinate data to form a comprehensive feature representation for point cloud structures. The feature dimensions of each cell point are: coordinate dimension (2) + distance feature dimension (1) + angle feature dimension (1) + local density feature dimension (1) + number of high-dimensional fluorescence channels N.
10. A flow cytometry gating system for immune cells based on a point cloud algorithm, characterized in that the system is constructed using the immune cell flow cytometry gating method based on point cloud algorithm as described in any one of claims 1 to 9 and includes: a hierarchical gating configuration module, used to decompose the cell classification task into multi-classification problems at multiple levels according to a predefined gating hierarchy structure file; a data preprocessing module, used to extract cell data from flow cytometry detection files, extract X-axis and Y-axis coordinates, construct a graph structure using K-nearest neighbors and calculate local density features, calculate geometric features and high-dimensional features, and construct the point cloud data representation; a point cloud geometric convolutional network module, used to capture the spatial geometric relationships between cells using the k-nearest neighbor graph structure and geometric convolutional layers, to extend regional features to global features through a multi-level geometric convolutional network, and to optimize the model parameters by combining multiple loss functions; a layer-by-layer cumulative prediction module, used to perform multi-class predictions sequentially from top to bottom and to accumulate the prediction results into the same file; and a plotting and report output module, used to generate visualization results, output prediction result files and prediction reports, and enable traceability and manual review of the gating results.