Comprehensive and standardized system and method for immune system phenotyping and automated cell classification
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- MELIO HEALTHCARE LTD
- Filing Date
- 2023-07-14
- Publication Date
- 2026-07-02
AI Technical Summary
Existing immune profiling methods are laborious, time-consuming, and lack standardization, making it difficult to compare results across experiments and scale to large sample sizes, thus hindering efficient healthcare outcomes.
A method combining full-spectrum flow cytometry with an ensemble machine learning approach to classify immune cells into unique subpopulations, using a standardized immunophenotyping panel and automated data processing, enabling rapid generation of comprehensive immune profiles.
This approach allows for standardized and scalable immune profiling, producing clinically relevant results in a timely manner, facilitating improved diagnostic and treatment monitoring of immune-related diseases.
Description
[Technical Field]
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of priority to U.S. Provisional Patent Application No. 63/386,476, filed December 7, 2022, the contents of which are incorporated by reference in their entirety.
Field
[0002] The present disclosure relates to methods and systems for generating a standardized and comprehensive immune profile for a mammalian (e.g., human) subject.
[Background Technology]
Background
[0003] Immune profiling, i.e., analysis of a subject's immune health at a given time point at the serological or cellular level, can aid in the diagnosis of immune-related diseases and disorders, e.g., allergies, hyper-reactive immune responses (as in asthma and Crohn's disease (inflammatory bowel disease)), or autoimmune diseases (such as polyglandular syndrome and some facets of diabetes); see, e.g., "Genes and Disease" [Internet], 1998-, "Diseases of the Immune System", National Center for Biotechnology Information (US), Bethesda, MD (Non-Patent Document 1). Immune profiling can be used, for example, to identify an individual's specific response to an infectious disease (e.g., viral, bacterial, fungal, or parasitic infection), to monitor the response to treatment of a patient, e.g., a cancer patient (see, e.g., Lyons, et al. (2017), "Immune Cell Profiling in Cancer: Molecular Approaches to Cell-Specific Identification", Precision Oncology 1, 26 (Non-Patent Document 2)), and, in some cases, to predict healthcare outcomes.
[0004] Conventional methods for generating immune profiles include, for example, enzyme-linked immunosorbent assays (ELISAs), immunoblotting techniques, and flow cytometry-based techniques, including the use of panels of fluorescently labeled antibodies directed against various cell surface receptors and manual gating of flow cytometry data. However, these techniques are often laborious and time-consuming, and are not easily scalable to levels that allow for the processing of hundreds or thousands of samples.
[0005] More recently, automation techniques have been applied to flow cytometry-based methods. Several programmatic approaches exist for the discovery of higher-order clusters of labeled immune cells and their identification following manual expert guidance. Off-the-shelf clustering algorithms, including tSNE, FlowSOM, and UMAP, are often used for this purpose. Other dedicated algorithms, such as flowType and Phenograph, exist for defining labeled cell clusters in higher-order immune space. Accordingly, companies such as Cytapex and Dotmatics OMIQ produce automated cell gating systems using similar methods. However, despite the existence of established immune profiling platforms and approaches, the immunophenotyping panels used in various experiments tend to differ, requiring users to configure existing automated gating methods for each experiment. As a result, direct comparability between experiments is reduced or eliminated. Therefore, there is a need for improved methods and systems that can provide standardized and comprehensive immune profiles for subjects (e.g., patients) of interest, facilitating biomedical investigations and enabling improved healthcare outcomes.
[Prior art documents]
[Non-patent literature]
[0006]
[Non-Patent Document 1] "Genes and Disease" [Internet], 1998-, "Diseases of the Immune System", National Center for Biotechnology Information (US), Bethesda (MD)
[Non-Patent Document 2] Lyons, et al. (2017), "Immune Cell Profiling in Cancer: Molecular Approaches to Cell-Specific Identification", Precision Oncology 1, 26
Summary of the Invention
Quick Overview
[0007] Disclosed herein are methods and systems for processing samples, such as blood samples, and generating a standardized and comprehensive immune profile for a subject. The disclosed methods and systems combine full-spectrum flow cytometry (FSFC)-based analysis of cells within a sample with an ensemble machine learning-based approach for filtering and classifying individual immune cells into multiple unique immune cell subpopulations. A key advantage of the disclosed methods and systems is the standardization of the immune profiling platform (including the immunophenotyping panel used to fluorescently label cells) and the implementation of automated data processing. This allows highly complex FSFC data to be processed and immune profiles to be generated within clinically relevant time frames.
[0008] Disclosed herein is a method of generating an immune profile for a subject, the method comprising: contacting at least a first aliquot of a sample from the subject with at least a first immunophenotyping panel to fluorescently label cells contained in the sample; processing the fluorescently labeled cells using a full-spectrum flow cytometer to generate fluorescence intensity data or derived data for a plurality of fluorescently labeled cells from the sample; providing at least a subset of the fluorescence intensity data or derived data for the plurality of fluorescently labeled cells as input to an ensemble machine learning model configured to process the fluorescence intensity data or derived data and to classify individual cells of the plurality of fluorescently labeled cells as belonging to one of a plurality of unique immune cell subpopulations; and outputting a total cell count or cell frequency for each of the plurality of unique immune cell subpopulations in the sample as part of the immune profile for the subject.
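The final output step of the method above can be illustrated with a minimal sketch. It assumes that each cell has already been assigned a single subpopulation label and that frequency is computed relative to all classified cells; the disclosure does not fix the denominator, and the function name `immune_profile` is hypothetical.

```python
from collections import Counter

def immune_profile(cell_labels):
    """Summarize per-cell class labels as counts and frequencies.

    Assumption: frequency = count / number of classified cells.
    """
    counts = Counter(cell_labels)
    total = sum(counts.values())
    return {
        name: {"count": n, "frequency": n / total}
        for name, n in counts.items()
    }

# Hypothetical per-cell predictions for six cells.
labels = ["T cells", "B cells", "T cells", "NK cells", "T cells", "B cells"]
profile = immune_profile(labels)
print(profile["T cells"])  # {'count': 3, 'frequency': 0.5}
```

An empty input yields an empty profile, so no division by zero occurs.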
[0009] In some embodiments, the ensemble machine learning model is organized in a cascading hierarchical tree structure including multiple nodes, each node including an individual machine learning model. In some embodiments, each individual machine learning model includes one input dataset and 1 to 8 output datasets corresponding to branches of the cascading hierarchical tree structure. In some embodiments, each individual machine learning model includes a neural network model. In some embodiments, each individual machine learning model includes a gradient-boosted tree model. In some embodiments, the multiple nodes include at least 1000, 1200, 1400, 1600, 1800, 2000, 2200, or 2400 nodes. In some embodiments, the number of individual machine learning models in the ensemble machine learning model is equal to the number of unique immune cell subpopulations in the plurality of unique immune cell subpopulations. In some embodiments, the design of the cascading hierarchical tree structure is based, at least in part, on expert analysis of manually gated fluorescence intensity data or data derived therefrom for one or more control samples.
[0010] In some embodiments, an individual cell is classified independently of all other cells in the plurality of fluorescently labeled cells. In some embodiments, an individual cell is recursively classified with all other cells in the plurality of fluorescently labeled cells.
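The cascading structure described above can be sketched as a tree whose nodes each hold a model that routes a cell to one of its branches. This is an illustrative sketch only: the `Node` class, the `classify` function, and the threshold lambdas (stand-ins for trained neural network or tree models) are hypothetical and not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Cell = Dict[str, float]  # marker name -> (transformed) intensity

@dataclass
class Node:
    """One node of the cascade: a model that routes a cell to a branch."""
    name: str
    model: Callable[[Cell], str]  # returns a branch label
    children: Dict[str, "Node"] = field(default_factory=dict)

def classify(cell: Cell, node: Node) -> List[str]:
    """Walk the tree from `node`, returning the path of branch labels."""
    path = []
    while node is not None:
        branch = node.model(cell)
        path.append(branch)
        node = node.children.get(branch)  # None when the branch is a leaf
    return path

# Toy threshold rules standing in for trained per-node models.
t_node = Node("T cells", lambda c: "CD4+" if c["CD4"] > c["CD8"] else "CD8+")
root = Node("WBC", lambda c: "T cells" if c["CD3"] > 0.5 else "non-T",
            children={"T cells": t_node})

print(classify({"CD3": 0.9, "CD4": 0.8, "CD8": 0.1}, root))  # ['T cells', 'CD4+']
print(classify({"CD3": 0.1, "CD4": 0.0, "CD8": 0.0}, root))  # ['non-T']
```

Each cell is routed on its own features only, which corresponds to the embodiment in which cells are classified independently of one another.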
[0011] In some embodiments, the ensemble machine learning model is trained using one or more labeled training datasets generated by an expert by manually gating fluorescence intensity data for one or more control samples or data derived therefrom. In some embodiments, the individual machine learning models in the ensemble machine learning model are trained individually using one or more labeled training datasets. In some embodiments, during training, predictions from the individual models are used to validate the individual models but are not propagated through the ensemble machine learning model, thereby eliminating error propagation during training. In some embodiments, the individual machine learning models in the ensemble machine learning model are trained together using a recursive training method. In some embodiments, the training of the ensemble machine learning model is controlled by one or more hyperparameter values that are the same for each node in the cascaded hierarchical tree structure. In some embodiments, the training of the ensemble machine learning model is controlled by one or more hyperparameter values that are different for a subset of nodes in the cascaded hierarchical tree structure. In some embodiments, the training of the ensemble machine learning model is controlled by one or more hyperparameter values that are determined by performing a random grid search of value ranges for one or more hyperparameters.
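One way to read the individually trained embodiment is that each node's training set is built from the expert's gating labels rather than from the parent model's predictions, so a parent's mistakes never reach a child during training. The sketch below illustrates that partitioning under stated assumptions: expert labels are given as paths through the hierarchy, node names are unique, and the function name `per_node_training_sets` is hypothetical.

```python
from collections import defaultdict

def per_node_training_sets(cells, expert_paths):
    """Group cells by the node they reach under *expert* gating.

    cells        : list of feature vectors (any objects)
    expert_paths : list of label paths, e.g. ["WBC", "T cells", "CD4+"]
    Returns {node_name: [(features, child_label), ...]}.
    """
    sets = defaultdict(list)
    for features, path in zip(cells, expert_paths):
        # Each (parent, child) pair along the expert path is one training
        # example for the parent node's model; no model prediction is used.
        for parent, child in zip(path, path[1:]):
            sets[parent].append((features, child))
    return dict(sets)

# Hypothetical two-feature cells with expert-assigned gating paths.
cells = [[0.9, 0.8], [0.9, 0.1], [0.1, 0.0]]
paths = [
    ["WBC", "T cells", "CD4+"],
    ["WBC", "T cells", "CD8+"],
    ["WBC", "non-T"],
]
training = per_node_training_sets(cells, paths)
print(len(training["WBC"]), len(training["T cells"]))  # 3 2
```

Predictions from each trained node could then be compared against the same expert labels for validation, without being fed forward to child nodes.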
[0012] In some embodiments, the method further comprises performing a mathematical transformation of the fluorescence intensity data or data derived therefrom prior to using the transformed fluorescence intensity data as input to the ensemble machine learning model.
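The disclosure does not name a specific transformation at this point. As one common choice in flow cytometry analysis, an inverse hyperbolic sine (arcsinh) transform compresses the wide dynamic range of fluorescence intensities while remaining defined for zero and negative values. The sketch below is an assumption-laden illustration; the function name and the cofactor value of 150 are placeholders, not values from the source.

```python
import math

def arcsinh_transform(intensities, cofactor=150.0):
    """Apply asinh(x / cofactor) to each intensity value.

    The cofactor sets where the scale changes from roughly linear
    (|x| << cofactor) to roughly logarithmic (|x| >> cofactor).
    The default of 150 is an illustrative placeholder.
    """
    return [math.asinh(x / cofactor) for x in intensities]

raw = [-50.0, 0.0, 150.0, 15000.0]
print(arcsinh_transform(raw))
```

Unlike a plain logarithm, this transform accepts the negative values that can appear after compensation or spectral unmixing.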
[0013] In some embodiments, the fluorescence intensity data or data derived therefrom comprises fluorescence intensity data for at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, or 40 fluorescence detection channels. In some embodiments, the fluorescence intensity data or data derived therefrom further comprises forward scatter height data, forward scatter area data, side scatter height data, side scatter area data, autofluorescence data, or any combination thereof.
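The input described above can be pictured as one fixed-order feature vector per cell, combining fluorescence channels with scatter parameters. The sketch below assumes each event is available as a mapping from field name to value; the field names `FSC-H`, `FSC-A`, `SSC-H`, and `SSC-A` are conventional abbreviations used here as assumptions, and `feature_vector` is a hypothetical helper.

```python
SCATTER_FIELDS = ("FSC-H", "FSC-A", "SSC-H", "SSC-A")

def feature_vector(event, fluor_channels, extra_fields=SCATTER_FIELDS):
    """Return the event's values in a fixed column order.

    A stable order matters because every model in the ensemble must
    see the same column in the same position for every cell.
    """
    columns = list(fluor_channels) + list(extra_fields)
    missing = [c for c in columns if c not in event]
    if missing:
        raise KeyError(f"event is missing fields: {missing}")
    return [float(event[c]) for c in columns]

# Hypothetical event with two fluorescence channels plus scatter data.
event = {
    "CD3": 1200.0, "CD4": 300.0,
    "FSC-H": 50000.0, "FSC-A": 61000.0,
    "SSC-H": 20000.0, "SSC-A": 23000.0,
}
print(feature_vector(event, ["CD3", "CD4"]))
```

Autofluorescence or additional channels could be appended by passing a longer `extra_fields` tuple.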
[0014] In some embodiments, the sample comprises a blood sample, a buffy coat sample, or a cell suspension.
[0015] In some embodiments, at least one immunophenotyping panel comprises a panel of fluorescently labeled antibodies directed against cell surface proteins associated with antigen-presenting cells (APCs). In some embodiments, the panel of fluorescently labeled antibodies comprises fluorescently labeled antibodies directed against IGM, CD5, CD62L, CD294, CD69, CD38, PD1, CD11C, CD3, CD8, HLADR, CD24, CD337, CD123, CD141, CD1C, CD4, TACI, CD319, CD335, PDL1, CD10, CD45, CD16, IGD, CD40, CD19_TCRGD, CD43, CD14, CD138, CD15, CD56, CD86, CD303, CD27, or any combination thereof. In some embodiments, the panel of fluorescently labeled antibodies further comprises fluorescently labeled antibodies directed against cell surface markers indicative of live cells, dead cells, or both.
[0016] In some embodiments, at least one immunophenotyping panel comprises a panel of fluorescently labeled antibodies directed to cell surface proteins associated with T cells. In some embodiments, the panel of fluorescently labeled antibodies comprises fluorescently labeled antibodies directed to TIGIT, CD5, CD28, CXCR5, CD39, TIM3, CD38, PD1, TCRVA7_2_TCRVD1, CD95, CD3, CD8, HLADR, CD31, CCR4, CCR6, CCR7, CD57, ICOS, CD4, KLRG1, TCRVA24_JA18, CD122, CD103, CXCR3, TCRVD2, CD45, CCR10, CD16, CD25, CD161, CD19_TCRGD, LAG3, CD14, CD45RO, CD56, CD127, CD45RA, CD27, or any combination thereof.
[0017] In some embodiments, the plurality of unique immune cell subpopulations comprises at least 1000, 1200, 1400, 1600, 1800, 2000, 2200, or 2400 unique immune cell subpopulations. In some embodiments, the plurality of unique immune cell subpopulations comprises white blood cells (WBCs), eosinophils, eosinophils / CD5+, neutrophils, neutrophils / large, neutrophils / CD5+, neutrophils / small, B cells, B cells / CD5- CD27-, monocytes / CD56+, monocytes / CD56-, NK cells, dendritic cells (DCs), T cells, iNKT cells, gamma delta T cells (total GD), Vd1 cells, Vd2 cells, Vdx cells, mucosal-associated invariant T (MAIT) cells, TEMRA cells, CD4 naive cells, T helper cells, CD4 effector memory cells, Treg cells, or any combination thereof.
[0018] In some embodiments, the total cell count or cell frequency for each of a plurality of unique immune cell subpopulations in a sample is output as part of an immune profile in less than 24 hours, less than 12 hours, less than 8 hours, less than 6 hours, or less than 4 hours.
[0019] In some embodiments, the immune profile is used to diagnose an immune-related disease or disorder, monitor the progression of an immune-related disease or disorder, or monitor the response to treatment of an immune-related disease or disorder in a subject.
[0020] Disclosed herein is a computer-implemented method for generating an immune profile for a subject, the method comprising: receiving fluorescence intensity data or data derived therefrom generated using a full-spectrum flow cytometer to process a fluorescently labeled cell sample collected from the subject; providing at least a subset of the fluorescence intensity data or data derived therefrom for a plurality of fluorescently labeled cells as input to an ensemble machine learning model configured to process the fluorescence intensity data or data derived therefrom and to classify individual cells of the plurality of fluorescently labeled cells as belonging to one of a plurality of unique immune cell subpopulations; and outputting a total cell number or cell frequency for each of the plurality of unique immune cell subpopulations in the sample as part of the immune profile for the subject.
[0021] In some embodiments, the ensemble machine learning model is organized in a cascading hierarchical tree structure including multiple nodes, each node including an individual machine learning model. In some embodiments, each individual machine learning model includes one input dataset and 1 to 8 output datasets corresponding to branches of the cascading hierarchical tree structure. In some embodiments, each individual machine learning model includes a neural network model. In some embodiments, each individual machine learning model includes a gradient-boosted tree model. In some embodiments, the multiple nodes include at least 1000, 1200, 1400, 1600, 1800, 2000, 2200, or 2400 nodes. In some embodiments, the number of individual machine learning models in the ensemble machine learning model is equal to the number of unique immune cell subpopulations in the plurality of unique immune cell subpopulations. In some embodiments, the design of the cascading hierarchical tree structure is based, at least in part, on expert analysis of manually gated fluorescence intensity data or data derived therefrom for one or more control samples.
[0022] In some embodiments, an individual cell is classified independently of all other cells in the plurality of fluorescently labeled cells. In some embodiments, an individual cell is recursively classified with all other cells in the plurality of fluorescently labeled cells.
[0023] In some embodiments, the ensemble machine learning model is trained using one or more labeled training datasets generated by an expert by manually gating fluorescence intensity data for one or more control samples or data derived therefrom. In some embodiments, the individual machine learning models in the ensemble machine learning model are trained individually using one or more labeled training datasets. In some embodiments, during training, predictions from the individual models are used to validate the individual models but are not propagated through the ensemble machine learning model, thereby eliminating error propagation during training. In some embodiments, the individual machine learning models in the ensemble machine learning model are trained together using a recursive training method. In some embodiments, the training of the ensemble machine learning model is controlled by one or more hyperparameter values that are the same for each node in the cascaded hierarchical tree structure. In some embodiments, the training of the ensemble machine learning model is controlled by one or more hyperparameter values that are different for a subset of nodes in the cascaded hierarchical tree structure. In some embodiments, the training of the ensemble machine learning model is controlled by one or more hyperparameter values that are determined by performing a random grid search of value ranges for one or more hyperparameters.
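The random search over hyperparameter value ranges mentioned above can be sketched as repeated random sampling from candidate values, keeping the best-scoring combination. This is a generic illustration, not the disclosed procedure: the hyperparameter names `learning_rate` and `max_depth`, the candidate values, and the toy scoring function (a stand-in for a validation metric) are all assumptions.

```python
import random

def random_search(param_ranges, score_fn, n_trials=20, seed=0):
    """Sample hyperparameter combinations at random and keep the best.

    param_ranges : {name: list of candidate values}
    score_fn     : callable(params) -> float, higher is better
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {k: rng.choice(v) for k, v in param_ranges.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

grid = {"learning_rate": [0.001, 0.01, 0.1], "max_depth": [2, 4, 6]}

def toy_score(params):
    # Stand-in for a validation metric; peaks at learning_rate=0.01, max_depth=4.
    return -abs(params["learning_rate"] - 0.01) - abs(params["max_depth"] - 4)

best, score = random_search(grid, toy_score, n_trials=30, seed=1)
print(best, score)
```

The same search could be run once for the whole tree (shared hyperparameters) or separately for a subset of nodes, matching the two embodiments described above.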
[0024] In some embodiments, the method further comprises performing a mathematical transformation of the fluorescence intensity data or data derived therefrom prior to using the transformed fluorescence intensity data as input to the ensemble machine learning model.
[0025] In some embodiments, the fluorescence intensity data or data derived therefrom comprises fluorescence intensity data for at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, or 40 fluorescence detection channels. In some embodiments, the fluorescence intensity data or data derived therefrom further comprises forward scatter height data, forward scatter area data, side scatter height data, side scatter area data, autofluorescence data, or any combination thereof.
[0026] In some embodiments, the plurality of unique immune cell subpopulations comprises at least 1000, 1200, 1400, 1600, 1800, 2000, 2200, or 2400 unique immune cell subpopulations. In some embodiments, the plurality of unique immune cell subpopulations comprises white blood cells (WBCs), eosinophils, eosinophils / CD5+, neutrophils, neutrophils / large, neutrophils / CD5+, neutrophils / small, B cells, B cells / CD5- CD27-, monocytes / CD56+, monocytes / CD56-, NK cells, dendritic cells (DCs), T cells, iNKT cells, gamma delta T cells (total GD), Vd1 cells, Vd2 cells, Vdx cells, mucosal-associated invariant T (MAIT) cells, TEMRA cells, CD4 naive cells, T helper cells, CD4 effector memory cells, Treg cells, or any combination thereof.
[0027] In some embodiments, the total cell number or cell frequency for each of multiple unique immune cell subpopulations in a sample is output as part of an immune profile in less than 24 hours, less than 12 hours, less than 8 hours, less than 6 hours, or less than 4 hours.
[0028] In some embodiments, the immune profile is used to diagnose an immune-related disease or disorder, monitor the progression of an immune-related disease or disorder, or monitor the response to treatment of an immune-related disease or disorder in a subject.
[0029] Disclosed herein is a system comprising: one or more processors; and memory communicatively coupled to the one or more processors and configured to store instructions that, when executed by the one or more processors, cause the system to: receive fluorescence intensity data or data derived therefrom generated using a full-spectrum flow cytometer to process a fluorescently labeled cell sample collected from a subject; provide at least a subset of the fluorescence intensity data or data derived therefrom for a plurality of fluorescently labeled cells as input to an ensemble machine learning model configured to process the fluorescence intensity data or data derived therefrom and to classify individual cells of the plurality of fluorescently labeled cells as belonging to one of a plurality of unique immune cell subpopulations; and output a total cell number or cell frequency for each of the plurality of unique immune cell subpopulations in the sample as part of an immune profile for the subject. In some embodiments, the system further comprises a full-spectrum flow cytometer.
[0030] Disclosed herein is a non-transitory computer-readable storage medium storing one or more programs comprising instructions that, when executed by one or more processors of a system, cause the system to: receive fluorescence intensity data or data derived therefrom generated using a full-spectrum flow cytometer to process a fluorescently labeled cell sample collected from a subject; provide at least a subset of the fluorescence intensity data or data derived therefrom for a plurality of fluorescently labeled cells as input to an ensemble machine learning model configured to process the fluorescence intensity data or data derived therefrom and to classify individual cells of the plurality of fluorescently labeled cells as belonging to one of a plurality of unique immune cell subpopulations; and output a total cell number or cell frequency for each of the plurality of unique immune cell subpopulations in the sample as part of an immune profile for the subject.
Incorporation by Reference
[0031] All publications, patents, and patent applications mentioned in this specification are incorporated herein by reference in their entirety to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated herein by reference in its entirety. In the event of a conflict between a term in this specification and a term in an incorporated reference, the term in this specification shall control.
[Brief explanation of the drawings]
[0032] Various aspects of the disclosed methods, apparatus, and systems are set forth with particularity in the appended claims. A better understanding of the features and advantages of the disclosed methods, apparatus, and systems will be obtained by reference to the following detailed description of exemplary embodiments and the accompanying drawings, in which:
[0033]
[Figure 1] FIG. 1 shows a non-limiting example of a process flow diagram for a method of processing a blood sample and generating an immune profile according to one embodiment described herein.
[Figure 2] FIG. 2 illustrates a non-limiting example process flow diagram for a method for training an ensemble machine learning (ML) model, according to one embodiment described herein.
[Figure 3] FIG. 3 shows an exemplary illustration of machine learning model-based prediction and immune profile generation in accordance with the methods and systems described herein.
[Figure 4] FIG. 4 illustrates an exemplary computing system in accordance with some embodiments of the methods and systems described herein.
[Figure 5] FIG. 5 shows a non-limiting schematic diagram of the manual gating process for processing full-spectrum flow cytometry data.
[Figure 6] FIG. 6 shows a non-limiting example of a simplified gating hierarchy used to build a neural network for immune cell classification.
[Figure 7] FIG. 7 shows a simplified schematic diagram of a neural network.
[Figure 8] FIG. 8 shows a non-limiting example of test data generated by a trained neural network classifier.
[Figure 9] FIG. 9 shows another non-limiting example of test data generated by a trained neural network classifier.
[Figure 10] FIG. 10 shows a non-limiting example of validation data generated by a trained neural network classifier.
[Figure 11] FIG. 11 shows another non-limiting example of validation data generated by a trained neural network classifier.
[Figure 12] FIG. 12 illustrates a non-limiting example of data flowing through a cascade prediction from node to node, i.e., from parent ML model output to child ML model input, in an ensemble machine learning model.
[Figure 13A] FIG. 13A shows non-limiting example data for T helper 17 (Th17) cell frequency and vitamin D levels for different age groups, considering gender as a covariate.
[Figure 13B] FIG. 13B shows non-limiting example data for T helper 2 (Th2) cell frequency and vitamin D levels for different age groups, considering gender as a covariate.
DETAILED DESCRIPTION OF THE INVENTION
Detailed Description
[0034] Disclosed herein are methods and systems for processing samples, such as blood samples, and generating a standardized and comprehensive immune profile for a subject. The disclosed methods and systems combine full-spectrum flow cytometry (FSFC)-based analysis of cells within a sample with an ensemble machine learning-based approach for filtering and classifying individual immune cells into multiple unique immune cell subpopulations. A key advantage of the disclosed methods and systems is the standardization of the immune profiling platform (including the immunophenotyping panel used to fluorescently label cells) and the implementation of automated data processing. This allows highly complex FSFC data to be processed and immune profiles to be generated within clinically relevant time frames.
[0035] In some cases, for example, methods are described for generating an immune profile for a subject, the method including: contacting at least a first aliquot of a sample from the subject with a first immunophenotyping panel to fluorescently label cells contained in the sample; processing the fluorescently labeled cells using a full-spectrum flow cytometer to generate fluorescence intensity data or derived data for a plurality of fluorescently labeled cells from the sample; providing at least a subset of the fluorescence intensity data or derived data for the plurality of fluorescently labeled cells as input to an ensemble machine learning model configured to process the fluorescence intensity data or derived data and to classify individual cells of the plurality of fluorescently labeled cells as belonging to one of a plurality of unique immune cell subpopulations; and outputting a total cell number or cell frequency for each of the plurality of unique immune cell subpopulations in the sample as part of the immune profile for the subject.
[0036] Also described is a system comprising: one or more processors; and a memory communicatively coupled to the one or more processors and configured to store instructions that, when executed by the one or more processors, cause the system to: receive fluorescence intensity data or data derived therefrom acquired using a full-spectrum flow cytometer to process a fluorescently labeled cell sample collected from a subject; provide at least a subset of the fluorescence intensity data or data derived therefrom for a plurality of fluorescently labeled cells as input to an ensemble machine learning model configured to process the fluorescence intensity data or data derived therefrom and to classify individual cells of the plurality of fluorescently labeled cells as belonging to one of a plurality of unique immune cell subpopulations; and output a total cell number or cell frequency for each of the plurality of unique immune cell subpopulations in the sample as part of an immune profile for the subject.
[0037] Also described is a non-transitory computer-readable storage medium storing one or more programs comprising instructions that, when executed by one or more processors of a system, cause the system to: receive fluorescence intensity data or data derived therefrom acquired using a full-spectrum flow cytometer to process a fluorescently labeled cell sample collected from a subject; provide at least a subset of the fluorescence intensity data or data derived therefrom for a plurality of fluorescently labeled cells as input to an ensemble machine learning model configured to process the fluorescence intensity data or data derived therefrom and to classify individual cells of the plurality of fluorescently labeled cells as belonging to one of a plurality of unique immune cell subpopulations; and output a total cell number or cell frequency for each of the plurality of unique immune cell subpopulations in the sample as part of an immune profile for the subject.
Definitions
[0038] Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
[0039] As used in this specification and the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly indicates otherwise. Any reference to "or" herein is intended to include "and / or" unless clearly stated otherwise and includes any and all possible combinations of one or more of the associated listed items.
[0040] As used herein, the terms "includes," "including," "comprises," and / or "comprising" specify the presence of stated features, integers, steps, operations, elements, components, and / or units, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, units, and / or groups thereof.
[0041] Throughout this application, various parameter values may be presented in a range format. Descriptions in range format should be understood merely for convenience and simplicity and should not be construed as an inflexible limitation on the scope of the present disclosure. Thus, a description of a range should be construed as including all possible subranges specifically disclosed and individual numerical values within that range, regardless of whether a specific numerical value or subrange is explicitly recited. For example, a description of a range such as 1 to 6 should be construed as including specifically disclosed subranges such as 1 to 3, 1 to 4, 1 to 5, 2 to 4, 2 to 6, 3 to 6, etc., as well as individual numbers within that range, e.g., 1, 1.4, 2, 3, 3.6, 4, 5, 5.8, and 6. This applies regardless of the breadth of the range.
[0042] Numbers can be expressed herein as being "about" a particular value. Similarly, ranges can be expressed herein as from "about" one particular value and / or to "about" another particular value. The terms "about" and "approximately" generally refer to an acceptable degree of error or variation for a given value or range of values, for example, within 20 percent (%), within 15%, within 10%, or within 5% of the given value or range of values.
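The tolerance reading of "about" given above amounts to a simple relative-error check. The sketch below is illustrative only; the function name `is_about` and the default tolerance of 10% are assumptions chosen from the example percentages listed in the paragraph.

```python
def is_about(value, target, tolerance=0.10):
    """True if `value` lies within +/- tolerance (a fraction) of `target`."""
    return abs(value - target) <= tolerance * abs(target)

print(is_about(108, 100))        # within 10% of 100
print(is_about(112, 100))        # outside 10% of 100
print(is_about(112, 100, 0.20))  # within 20% of 100
```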
[0043] It should be understood that the use of sequential terms such as "first" and "second" in describing the methods and systems disclosed herein does not in itself imply any priority, an order in which one system component is more important than another, or a chronological order of actions in which a method is performed, but is merely used as a marker to distinguish one system component having a particular name from another system component having the same name but for the use of sequential terms, for example, to distinguish between two system components.
[0044] Furthermore, various implementations of the methods and systems described herein may be described in terms of exemplary block diagrams, process flow charts, and other illustrations. As will be apparent to one of ordinary skill in the art after reading this specification, the various implementations described herein can be implemented without being bound to the illustrated examples. For example, block diagrams and their accompanying descriptions should not be construed as mandating a particular architecture or configuration. Similarly, in the exemplary process flow charts, some blocks are optionally combined, the order of some blocks is optionally changed, and some blocks are optionally excluded. In some implementations, additional steps can be performed in combination with the example processes. Accordingly, the methods and systems described and illustrated in further detail below are exemplary in nature and, as such, should not be considered limiting.
[0045] As used herein, the terms "full spectrum flow cytometry" and "full spectrum flow cytometer" refer to a technique and an instrument, respectively, that perform flow cytometry in which the instrument is configured to capture the entire emission spectrum of fluorescent molecules using an array of highly sensitive photodetectors, thereby enabling the capture of highly multiplexed emission spectral data sets.
[0046] As used herein, the term "immunophenotyping panel" refers to a panel of antibodies (e.g., fluorescently labeled antibodies) used to identify cells based on the types of antigens or markers (e.g., cell surface receptor proteins) present on their surface.
[0047] The section headings used herein are for organizational purposes only and are not to be construed as limitations on the subject matter described. The description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a patent application and its requirements.
[0048] Methods for immune system phenotyping and automated cell classification As mentioned above, there are certain programmatic approaches for discovering higher-order clusters of labeled immune cells and dedicated algorithms for defining higher-order clusters in immune space. However, these approaches and algorithms have many drawbacks, as described in several FlowCAP challenge papers (e.g., Aghaeepour, et al. (2013), "Critical Assessment of Automated Flow Cytometry Data Analysis Techniques," Nature Methods 10(3):228-239).
[0049] Furthermore, as mentioned above, certain automated cell gating approaches for flow cytometry applications are known. However, these previous approaches rely on clustering-first strategies that identify patterns in the distribution of aggregate flow cytometry events, and they do not use an ensemble of machine learning models to classify individual cell identities.
[0050] Overall, current immunophenotyping platforms (including high-throughput approaches) have several limitations, two of which are particularly significant: (1) phenotyping platforms are purpose-built and vary sufficiently in design to confound data comparisons between different experiments and across different immunophenotyping panels, and (2) the manual work required to establish counts and / or frequencies of unique immune cell subpopulations identified by immunophenotyping panels prevents analyses from scaling beyond thousands of samples. For existing platforms and approaches, immunophenotyping panels used in different experiments are not comparable, and existing automated gating methods must be configured separately for each experiment and apply only to that experiment. Results from immune profiling experiments are therefore not generalizable.
[0051] Therefore, there is a need for improved immunophenotyping platforms and methods.
[0052] Disclosed herein are immunophenotyping platforms (also referred to as immune profiling platforms) and methods that address one or more of the above-mentioned deficiencies and needs. The immunophenotyping methods disclosed herein involve performing high-throughput, full-spectrum flow cytometry in the context of developing an immunophenotyping process, including several stages: sample processing, data generation, raw data analysis, and results analysis. The disclosed immunophenotyping platforms and methods, combined with the novel application of machine learning algorithms disclosed herein, can provide a scalable, high-throughput, automated method that addresses the above-mentioned deficiencies and needs.
[0053] The immunophenotyping platform disclosed herein provides the ability to process a blood sample from an individual (or any other single cell suspension of immune cells extracted from a tissue sample) and generate an immune signature for the individual using full-spectrum flow cytometry (FSFC). Although described above in the context of immune profiling, the disclosed immunophenotyping platform and methods can also be used more generally to generate cell type profiles based on FSFC analysis of any blood sample (or other single cell suspension) for which a suitable panel of fluorescently labeled antibodies directed against an appropriate set of distinguishing cell surface antigens can be assembled.
[0054] In some cases, a whole blood sample can be processed using FSFC to generate a fluorescence profile for each cell (e.g., each immune cell) within the sample. In some cases, the blood sample can be processed to extract immune cells, which can then be processed using FSFC to generate a fluorescence profile for each immune cell extracted from the sample. These characteristics can then be used to generate an immune profile for the individual. This can be a highly standardized process that produces directly comparable results for analyses performed in different laboratories or at different times. In some cases, these results are used to train a machine learning model that is used to predict the immune cell category (i.e., unique immune cell subpopulation) to which each detected cell belongs. Together, the disclosed immunophenotyping platform and machine learning training and prediction framework enable the generation of clinically relevant reference ranges for immune cell subtypes, which support diagnostic decisions by utilizing machine learning-based automation of sample processing and flow cytometry data processing in a scalable and less biased approach.
[0055] The immunophenotyping platform and method disclosed herein differ from existing platforms and methods in that other platforms do not provide comprehensive immune system-level cellular profiles at high sample processing throughput. Existing platforms focus on specific solutions to specific scientific questions. The methods disclosed herein enable an integrated, automated, and standardized approach to generate high-resolution immune profiles from blood samples at unprecedented speed and scale. The standardized and high-sample-throughput nature of the platform (e.g., up to 100, 150, 200, or 250 samples processed per FSFC instrument in 8 hours / day) enables the establishment of biological reference ranges for cell types or subtypes (e.g., immune cell subtypes) for human populations, resulting in better patient satisfaction and improved clinical diagnostic applications.
[0056] The platform and method disclosed herein utilize full-spectrum flow cytometry (FSFC) coupled with machine learning algorithms to analyze cells labeled with a standardized set of immune system status antibody panels. This represents the first comprehensive and standardized immunophenotyping platform with automated cell classification. First, all biological samples processed by the FSFC instrument are processed using highly standardized and rigorously controlled procedures that are subject to quality control (QC) specifications. Second, the training and application of custom machine learning models to analyze the resulting flow cytometry data enables the generation of immune profiles for blood samples on a timeline measured in hours.
[0057] Previous attempts at classifying the identity of immune cells have been based on clustering approaches similar to those used in manual gating. The ML-based data analysis pipeline disclosed herein differs from previous approaches in that it examines the fluorescent signature of each cell and classifies the signature based on an ensemble of gradient-boosting machine learning models. Thus, the identification of immune cell clusters and their hierarchical positioning is a by-product of individual cell identification and classification by applying a trained ensemble model containing thousands of individual machine learning models, and is not based on cluster-first identification, followed by measuring the concordance of individual cells with the identified clusters.
[0058] FIG. 1 shows a non-limiting example process flow diagram for the immune profiling method 100 disclosed herein. As shown, a blood sample (or, in some cases, a buffy coat sample, a peripheral blood mononuclear cell (PBMC) sample, or a cell suspension, etc.) is received at a laboratory facility, the sample is prepared (102) (e.g., including performing one or more of the following steps: dilution, centrifugation, staining (using one or more fluorescently labeled antibody panels), and / or washing) and analyzed (104) on a full spectrum flow cytometer (FSFC), and a Flow Cytometry Standard (FCS) file containing the flow cytometry data is generated for the sample (106) and stored in a database. An FCS file can include, for example, fluorescence intensity data for one or more fluorescence detection channels (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, or more than 50 fluorescence detection channels), as well as data derived therefrom (e.g., forward scatter height data, forward scatter area data, side scatter height data, side scatter area data, autofluorescence data, or any combination thereof). In some cases, the number of available fluorescence detection channels can be determined by, for example, a combination of detection hardware available as part of a flow cytometry instrument (e.g., with 5, 10, 20, 25, 50, 75, 100, 125, 150, 175, 200, or more than 200 detectors) and spectrally distinct fluorophores (e.g., 5, 10, 20, 25, 30, 35, 40, 45, 50, 60, or more than 60 spectrally distinct fluorophores).
[0059] During the training phase, data for several FCS files can be manually gated (e.g., by an immunologist or other expert) to generate labeled training datasets for one or more samples (e.g., control samples), and machine learning models (e.g., individual models in an ensemble machine learning model) are trained using the one or more labeled training datasets (108). These training sets are complete instances of the gating hierarchy implemented in the FSFC output for a sample (e.g., a blood sample). The output of the gating process is a series of industry-standard FlowJo workspace files that describe the immunophenotyping classification hierarchy associated with each sample. The gating data encoded in these files is then extracted and fed through a cascading ensemble machine learning hierarchy that follows the same gating procedure. Within the hierarchy, each node represents an execution pipeline that trains and maintains a single ML model that predicts events specific to the corresponding biological cell type for that position in the hierarchy. These models are trained using specified fluorescence channel data and gate position parameters provided by a domain expert.
[0060] In some cases, the disclosed immunophenotyping and automated data analysis platform can classify all cells in a cell population using a single machine learning model. While this can work well for some cell populations, for populations where the true number of positive classes (cell subtypes) is small (e.g., less than about 100), class imbalance can overwhelm the ability of a single ML model to reliably classify cell detection events. Weighting detection events of smaller classes can produce extreme cases of misclassification, which can cause model performance degradation (especially when class imbalance occurs). In some cases, gradient-boosting machine learning approaches (e.g., gradient-boosting ensemble models) can provide superior performance.
[0061] The trained machine learning model 110 is then used to generate predictions of cell types or subtypes (e.g., immune cell subpopulations) for individual cell detection events, and measure cell counts (or frequencies) for each of multiple unique cell types or subtypes, which are then combined into an immune profile 112.
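As a minimal sketch of the final step described above, and not the platform's actual implementation, the following Python snippet tallies per-event predicted labels into counts and frequencies. The function name `build_immune_profile`, the label strings, and the use of total events as the frequency denominator are illustrative assumptions.

```python
from collections import Counter


def build_immune_profile(predicted_labels):
    """Tally per-event predictions into counts and frequencies.

    `predicted_labels` holds one predicted subpopulation name per cell
    detection event. Frequencies here are fractions of all events, which
    is an assumption; a parent-gate denominator is equally plausible.
    """
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    return {
        name: {"count": n, "frequency": n / total if total else 0.0}
        for name, n in counts.items()
    }


# Hypothetical predictions for six cell detection events.
events = ["neutrophil", "neutrophil", "B cell", "T cell", "T cell", "T cell"]
profile = build_immune_profile(events)
```

In this sketch, each entry of `profile` pairs a subpopulation name with its count and frequency, which corresponds to the per-subpopulation measurements combined into the immune profile 112.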
[0062] As described above, processing 102 a sample (e.g., a blood sample) can include one or more steps, including a staining step that involves contacting cells in at least a first aliquot of the sample with at least a first panel of fluorescently labeled antibodies (i.e., an immunophenotyping panel or FSFC panel) directed to a set of specific cell surface antigens (e.g., cell surface proteins) that collectively enable differentiation among cell types or subtypes of interest. Sample processing can also include immunophenotyping panel design. The sample processing platform can include contacting each of one or more sample aliquots (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11 or more sample aliquots) with one or more FSFC panels (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11 or more FSFC panels).
[0063] For example, in some cases, two sample aliquots can each be stained with a different FSFC panel, with one sample focusing on the antigen-presenting cell (APC) population of the immune system (A panel) and containing antibodies directed to 36 different cell surface proteins, and the other sample focusing on a matched population of the immune system (T panel) and containing antibodies directed to 41 different cell surface proteins. In some cases, the panel can also include a cell viability stain to distinguish between live and dead cells. In some cases, the panel can also include autofluorescence measurements as a "marker." Specific examples of cell surface proteins and additional markers that can be included in these panels are listed in Table 1.
[0064] [Table 1]
[0065] These panels are custom tailored to the immunophenotyping platform disclosed herein and are designed to provide the most comprehensive view of the sample donor's immune system status. The panels include markers for measuring immune cell type, immune system activation, lineage (e.g., key marker(s) commonly used to define a particular cell population prior to further cell type subsetting; examples include, but are not limited to, CD3 to define total T cells and CD56 and CD16 to define natural killer cells), and exhaustion (cells expressing markers associated with "cell exhaustion" (e.g., PD-1, TIGIT) can no longer proliferate and may lose function as a result of prolonged, chronic stimulation / activation of the immune response). The immunophenotyping platform processes the fluorescence profile for each detected cell and also defines a general hierarchy (known as a gating tree) used to determine which cells, and how many of them, belong to each measured population (e.g., immune cell subpopulation) within the hierarchy. This includes 200+ gates for the APC panel and 2000+ gates for the T cell panel for the current configuration of the platform.
[0066] As part of panel design, it may be advantageous to define the maximum number of different fluorochromes that can be used within a single panel (related to the number of different fluorescence detection channels available on the FSFC instrument) to maximize coverage of the available spectrum while also providing clearly distinguishable signals to distinguish cell surface markers from one another. Multiple replicates can be used to determine panel design parameters, and biological constraints can be used to multiplex dye usage (e.g., because the γδ (GD) T cell receptor (TCR GD) cannot be expressed on B cells, the same fluorophore, conjugated to both anti-CD19 and anti-TCR GD antibodies, can be used to identify B cells and GD T cells, respectively).
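The dye-multiplexing constraint described above can be illustrated with a small consistency check. The sketch below is an assumption-laden illustration: the function name `shared_dye_conflicts`, the dye placeholders, and the idea of declaring mutually exclusive marker pairs in a list are all hypothetical and are not taken from the disclosed panel design process.

```python
from collections import defaultdict
from itertools import combinations


def shared_dye_conflicts(assignment, exclusive_pairs):
    """Return marker pairs that share a fluorophore without being
    declared mutually exclusive (i.e., pairs that could be confused)."""
    exclusive = {frozenset(p) for p in exclusive_pairs}
    by_dye = defaultdict(list)
    for marker, dye in assignment.items():
        by_dye[dye].append(marker)
    conflicts = []
    for dye, markers in by_dye.items():
        for a, b in combinations(sorted(markers), 2):
            if frozenset((a, b)) not in exclusive:
                conflicts.append((dye, a, b))
    return conflicts


# Dye names are placeholders, not the panel's actual fluorophores.
exclusive_pairs = [("CD19", "TCR GD")]  # per the text: never co-expressed
ok_panel = {"CD19": "dye_1", "TCR GD": "dye_1", "CD3": "dye_2"}
bad_panel = {"CD3": "dye_1", "CD56": "dye_1"}
```

Here `ok_panel` reuses one dye for CD19 and TCR GD, which the check accepts because the pair is declared mutually exclusive, whereas `bad_panel` is flagged because no such constraint was declared for CD3 and CD56.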
[0067] Additionally, for panel design, it may be advantageous to minimize the sample (e.g., blood) volume required for sample processing. In some cases, the processes described herein use whole blood directly instead of isolating peripheral blood mononuclear cells. This allows for the measurement of granulocyte count and frequency.
[0068] Referring back to FIG. 1, data generation can refer to a largely automated data generation pipeline that includes performing full-spectrum flow cytometry on a prepared blood sample to generate raw immunofluorescence intensities for each detected cell (e.g., immune cell) (104) and registering the output in a database (106). For example, in some cases where two immunophenotyping panels are used to stain aliquots of the blood sample (e.g., the A panel and T panel described above), this process generates two datasets, one for each of the two immunophenotyping panels. In some cases, automated data transformation (e.g., mathematical transformation of fluorescence intensity data or data derived therefrom to minimize the impact of outlier data points) can be triggered by storing the FCS file on the FSFC instrument's hard drive. The transformed data can then be used as input for a machine learning model configured to classify individual cells according to cell type or subtype.
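The text does not name the mathematical transformation that is applied. As one possibility only, the sketch below uses the inverse hyperbolic sine (arcsinh), a transformation commonly applied to cytometry intensities; the choice of arcsinh, the function name `arcsinh_transform`, and the cofactor value of 150 are all assumptions.

```python
import math


def arcsinh_transform(intensities, cofactor=150.0):
    """Compress large values roughly logarithmically while staying
    near-linear around zero and defined for negative intensities."""
    if cofactor <= 0:
        raise ValueError("cofactor must be positive")
    return [math.asinh(x / cofactor) for x in intensities]


raw = [-50.0, 0.0, 150.0, 1_000_000.0]  # made-up channel intensities
transformed = arcsinh_transform(raw)
```

A transformation of this kind limits the influence of very bright outlier events, since an intensity of one million maps to a value below 10, while preserving the ordering of the raw values.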
[0069] Any of a variety of supervised machine learning models can be used to implement the methods and systems described herein, including, but not limited to, neural networks (e.g., deep neural networks), decision trees, and random forests.
[0070] In some cases, the disclosed methods and systems can be implemented using an ensemble machine learning model that includes multiple individual machine learning models. For example, in some cases, the disclosed methods and systems can be implemented using an ensemble machine learning model that is configured to process fluorescence intensity data or data derived therefrom and classify individual cells of a plurality of fluorescently labeled cells as belonging to one of a plurality of unique immune cell subpopulations.
[0071] In some cases, the ensemble machine learning model can be organized, for example, in a cascading hierarchical tree structure including multiple nodes, with each node including an individual machine learning model. In some cases, each individual machine learning model includes one input dataset and eight or fewer output datasets (e.g., 1, 2, 3, 4, 5, 6, 7, or 8 output datasets) corresponding to branches of the cascading hierarchical tree structure. In some cases, each individual machine learning model can include a neural network model. In some cases, each individual machine learning model can include a gradient-boosted tree model. In some cases, the plurality of nodes (i.e., the number of individual machine learning models in the ensemble) includes at least 1000, 1200, 1400, 1600, 1800, 2000, 2200, or 2400 nodes. In some cases, the number of individual machine learning models in the ensemble machine learning model is equal to the number of unique immune cell subpopulations in the plurality of unique immune cell subpopulations.
[0072] In some cases, the design of the cascading hierarchical tree structure can be based, at least in part, on expert analysis of manually gated fluorescence intensity data or data derived therefrom for one or more samples, such as control samples described elsewhere herein. In some cases, individual cells can be classified by a machine learning model (e.g., an ensemble machine learning model) independent of all other cells in the plurality of fluorescently labeled cells contained in the sample. In some cases, individual cells can be recursively classified with all other cells in the plurality of fluorescently labeled cells in the sample.
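The cascading hierarchical tree structure can be sketched as a simple data structure. The following is an illustrative sketch only: the class name `GateNode`, its fields, and the example gate names are hypothetical, and the limit of eight branches per node reflects just one of the cases described above.

```python
from dataclasses import dataclass, field
from typing import List

MAX_BRANCHES = 8  # "eight or fewer output datasets" per node, per the text


@dataclass
class GateNode:
    """One node of the cascade; `model` would hold a trained classifier."""
    name: str
    children: List["GateNode"] = field(default_factory=list)
    model: object = None

    def add_child(self, child: "GateNode") -> "GateNode":
        if len(self.children) >= MAX_BRANCHES:
            raise ValueError(f"{self.name} already has {MAX_BRANCHES} branches")
        self.children.append(child)
        return child

    def count_nodes(self) -> int:
        # Counts this node plus every node beneath it.
        return 1 + sum(c.count_nodes() for c in self.children)


# A tiny, hypothetical slice of a gating tree.
root = GateNode("WBC")
ssc_hi = root.add_child(GateNode("SSChi"))
root.add_child(GateNode("SSClo"))
ssc_hi.add_child(GateNode("CD15+"))
```

In the case where each node holds one individual model, `root.count_nodes()` gives the number of models in the ensemble, which for a full platform configuration would be in the thousands rather than the four nodes shown here.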
[0073] As described above, machine learning models (e.g., individual models in an ensemble machine learning model) are trained during a training phase using a labeled training dataset generated using one or more FCS files for one or more samples (e.g., control samples) that have been manually gated, e.g., by an immunologist or other expert (108). In some cases, the one or more samples can include a whole blood sample. In some cases, the one or more control samples can include a cell suspension, e.g., containing purified or partially purified cells of a single or only a few cell types or subtypes. Training and predictions can be periodically updated as the model architecture evolves, better hyperparameters are identified, or new data becomes available.
[0074] In some cases, individual machine learning models in an ensemble machine learning model can be trained individually using one or more labeled training datasets. For example, to minimize or eliminate error propagation during the training process, predictions of the individual models can be used to validate the individual models during training, but are not propagated through the ensemble machine learning model.
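One way to read the individual training described above is that each node is trained on the events an expert placed in its parent gate, rather than on events selected by an upstream model's predictions. The sketch below illustrates that reading under loud assumptions: the threshold "stump" stands in for a real model such as a gradient-boosted tree, and the names `fit_stump` and `train_node` and the event layout are hypothetical.

```python
from statistics import mean


def fit_stump(values, labels):
    """Stand-in for a real model: threshold halfway between class means."""
    pos = [v for v, y in zip(values, labels) if y]
    neg = [v for v, y in zip(values, labels) if not y]
    return (mean(pos) + mean(neg)) / 2.0


def train_node(events, parent_gate, child_gate, channel):
    """Train one node on events the EXPERT placed in the parent gate.

    Using expert labels (not an upstream model's predictions) to select
    the training events keeps upstream errors out of this node's data.
    """
    in_parent = [e for e in events if parent_gate in e["gates"]]
    values = [e[channel] for e in in_parent]
    labels = [child_gate in e["gates"] for e in in_parent]
    return fit_stump(values, labels)


# Hypothetical manually gated events; "gates" lists the expert's gates.
events = [
    {"SSC-A": 9.0, "gates": {"WBC", "SSChi"}},
    {"SSC-A": 8.0, "gates": {"WBC", "SSChi"}},
    {"SSC-A": 2.0, "gates": {"WBC"}},
    {"SSC-A": 3.0, "gates": {"WBC"}},
    {"SSC-A": 50.0, "gates": set()},  # debris: excluded from this node
]
threshold = train_node(events, "WBC", "SSChi", "SSC-A")
```

Because the debris event was never placed in the WBC gate by the expert, it does not influence the learned threshold, regardless of how an upstream WBC model might have classified it.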
[0075] In some cases, the individual machine learning models in the ensemble machine learning model can be trained together, for example, using an iterative training method.
[0076] In some cases, training of an ensemble machine learning model (or individual models included therein) can be controlled by one or more hyperparameter values that are the same for each node (individual model) in a cascading hierarchical tree structure. In some cases, training of an ensemble machine learning model can be controlled by one or more hyperparameter values that are different for each node (individual model) in a cascading hierarchical tree structure, or different for a subset of nodes. In some cases, training of an ensemble machine learning model can be controlled by one or more hyperparameter values that are determined by performing a random grid search of value ranges for one or more hyperparameters.
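A randomized search over hyperparameter value ranges can be sketched as follows. This is a generic illustration, not the disclosed training infrastructure: the function `random_search`, the hyperparameter names, the ranges, and the toy objective are assumptions, and a real objective would train a node's model and score it on validation data.

```python
import random


def random_search(score_fn, space, n_trials=20, seed=0):
    """Sample hyperparameters uniformly from `space` and keep the best.

    `space` maps a name to (low, high) for floats or to a list of choices.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {}
        for name, spec in space.items():
            if isinstance(spec, tuple):
                params[name] = rng.uniform(*spec)
            else:
                params[name] = rng.choice(spec)
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space and a toy objective standing in for a
# validation score; a real objective would train and validate a node.
space = {"learning_rate": (0.01, 0.3), "max_depth": [2, 4, 6, 8]}


def toy_score(p):
    return -abs(p["learning_rate"] - 0.1) - abs(p["max_depth"] - 4) / 10


best, score = random_search(toy_score, space, n_trials=50, seed=42)
```

The same routine could be run once for the whole ensemble (shared hyperparameter values) or once per node or subset of nodes (node-specific values), matching the alternatives described above.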
[0077] Referring back to FIG. 1, data analysis can refer to the automated application (110) of trained machine learning models (e.g., an ensemble machine learning model including 2200+ individual machine learning models) to raw data files generated by the FSFC instrument and stored in a database (106). Cascading layers of the ensemble machine learning model process all or a portion of the fluorescence intensity readings (or data derived therefrom) from each cell to predict the cell's identity (i.e., the cell type or subtype (also referred to herein as a cell subpopulation) to which it belongs), as defined by a gating hierarchy determined by the immunophenotyping panel design.
[0078] The inputs for each ML model can be the same set of cell surface markers used by the expert to analyze data at that level of the hierarchy. For example, to identify events as neutrophils, the expert might examine a two-dimensional plot with side scatter area on one axis and the fluorescence intensity of the CD16 marker on the second axis. The machine learning model at that node in the hierarchy can use the same two inputs. These inputs represent coded values for proxies of cell size and of the quantity of the marker (measured as intensity). These input channels are deliberately selected to mimic as closely as possible what the expert would use, without including noise from other channels that could overwhelm the provided signal. Biological expertise guides the selection of the channel, or channels, for each immune population subset in the ML ensemble. Alternatively, the input for each ML model can be the complete set of fluorescence channels available in the panel.
[0079] Ensemble ML models interact with each other to produce accurate predictions of cell types for the entire gating hierarchy. During prediction, ML models are programmatically arranged, with the top-level model forwarding predictions to the next layer of models. For example, a white blood cell (WBC) model uses two adjacent size parameters (side scatter area vs. side scatter height) to determine whether a cell detection event is likely to be a WBC. All events classified as WBC events can then be further predicted using one or more ML models to categorize them into side scatter high (SSChi) and side scatter low (SSClo) event types using the forward scatter (FSC) and side scatter (SSC) channels. Cell detection events predicted as SSChi are then further characterized to determine whether they are positive for expression of the CD15 cell surface marker. Each step in this process is performed by a specialized ML model trained for this specific purpose. This process is followed for each node in the gating hierarchy.
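The parent-to-child forwarding described above can be sketched with simple rule-based stand-ins for the trained models. Everything specific in this sketch is an assumption made for illustration: the function `predict_cascade`, the dictionary layout, the numeric thresholds, and the simplification of each node's inputs are hypothetical, and a real node would call a trained ML model instead of a fixed rule.

```python
def predict_cascade(event, node):
    """Walk the hierarchy from parent to child, collecting gate names.

    Each node holds a `model` callable returning the name of the child
    branch for this event, or None when the event leaves the gate.
    """
    path = [node["name"]]
    while node.get("model") is not None:
        branch = node["model"](event)
        child = node["children"].get(branch)
        if child is None:
            break
        path.append(child["name"])
        node = child
    return path


# Fixed thresholds below are placeholders for trained models.
tree = {
    "name": "all events",
    "model": lambda e: "WBC" if 0.8 <= e["SSC-A"] / e["SSC-H"] <= 1.2 else None,
    "children": {
        "WBC": {
            "name": "WBC",
            "model": lambda e: "SSChi" if e["SSC-A"] > 5.0 else "SSClo",
            "children": {
                "SSChi": {
                    "name": "SSChi",
                    "model": lambda e: "CD15+" if e["CD15"] > 1.0 else "CD15-",
                    "children": {
                        "CD15+": {"name": "CD15+"},
                        "CD15-": {"name": "CD15-"},
                    },
                },
                "SSClo": {"name": "SSClo"},
            },
        }
    },
}

granulocyte_like = {"SSC-A": 8.0, "SSC-H": 8.0, "CD15": 3.0}
path = predict_cascade(granulocyte_like, tree)
```

For the example event, `path` records the sequence of gates from the root through WBC and SSChi to CD15+, mirroring the order in which the specialized models are applied.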
[0080] The machine learning platform 110 uses one or more machine learning models, which may have the same or different architectures and act together in an ensemble, to independently generate cell type or subtype predictions at each leaf (i.e., a terminal node with no child nodes) in a defined gating hierarchy. Predictions from these models can either be cascaded recursively through the hierarchy from more general to more specific (e.g., from parent to child), or each cell can be classified independently of all other cells and all other gating hierarchy decisions.
[0081] The trained machine learning model(s) can be configured to classify individual cells as belonging to one of a plurality of unique cell subpopulations, e.g., immune cell subpopulations. In some cases, the plurality of unique cell subpopulations (e.g., unique immune cell subpopulations) can include at least 1000, 1200, 1400, 1600, 1800, 2000, 2200, or 2400 unique immune cell subpopulations.
[0082] For example, non-limiting examples of immune cell subpopulations (or subtypes) that can be identified using the A-panel and T-panel sets of fluorescently labeled antibodies described above are shown in Table 2.
[0083] [Table 2]
[0084] FIG. 2 shows a non-limiting example flowchart of a machine learning training process 200. A domain expert can generate and store manually labeled training data 206 from selected FCS files. The machine learning expert can define the machine learning architecture, training algorithm, and hyperparameters 202 to generate an ML script. The ML script can then be executed to train machine learning models 204 on the selected manually labeled data 206, and the trained models can then be stored in a database 208. The output models can be validated using a holdout dataset 212 that is also manually labeled by the expert. The validation results can be manually confirmed by a data science expert and / or a biologist 210. Models that fail validation can be tuned and retrained.
[0085] The training data used to train the machine learning model can be manually generated using an expert-gated subset of exemplary immunophenotyping output. Using these manually gated cells, the machine learning model can be trained to identify similar cells based on the fluorescence profile, as characterized by a spectral flow cytometer, for a given cell type or subtype.
[0086] In some cases, using training data generated by multiple experts can reduce bias in the gating process, because manual gating performed by a single expert may produce different (and potentially erroneous) results.
[0087] As described above, machine learning model predictions can be validated against a holdout set of manually gated files generated by a domain expert, using one or more prediction criteria that compare the predictions with the manual gating results. Examples of prediction criteria include, but are not limited to: requiring the mean of the cell population distribution for the predicted set to be within 5%, 10%, 20%, or 30% of the mean of the cell population distribution for the holdout set; requiring the standard deviation of the cell population distribution for the predicted set to be within 5%, 10%, or 20% of the standard deviation of the cell population distribution for the holdout set; and / or requiring the correlation between the cell population ratio for the predicted set and the cell population ratio for the holdout set to exceed 85%, 90%, 95%, or 98%. A 100% correlation means that the ML model predicted exactly the same cell population ratio as measured by manual gating.
[0088] In some cases, the threshold for measuring the validity of the ML model predictions can be adjusted by thresholding the standard deviation difference, for example, using a standard deviation difference threshold of less than 5%, less than 10%, less than 15%, or less than 20%.
[0089] In some cases, the threshold for measuring the validity of the ML model predictions can be adjusted by thresholding the correlation, for example, using a correlation threshold of greater than 80%, 85%, 90%, 95%, or 98%.
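The three example criteria above can be combined into a single check, sketched below. The text does not specify how the criteria are computed; in this sketch the inputs are per-sample frequencies of one subpopulation, the correlation is a Pearson correlation, and the function names and default thresholds (10%, 10%, and 95%) are assumptions chosen from the ranges listed.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def passes_validation(predicted, manual, mean_tol=0.10, sd_tol=0.10,
                      min_corr=0.95):
    """Compare predicted vs. manually gated frequencies across samples."""
    mean_ok = abs(mean(predicted) - mean(manual)) <= mean_tol * abs(mean(manual))
    sd_ok = abs(stdev(predicted) - stdev(manual)) <= sd_tol * stdev(manual)
    corr_ok = pearson(predicted, manual) > min_corr
    return mean_ok and sd_ok and corr_ok


# Made-up frequencies of one subpopulation in five holdout samples.
manual = [0.10, 0.12, 0.08, 0.15, 0.11]
good = [0.101, 0.119, 0.082, 0.148, 0.112]
bad = [0.15, 0.08, 0.12, 0.10, 0.11]  # same values, wrong samples
```

The `bad` predictions have the same mean and standard deviation as the manual results but assign the values to the wrong samples, so only the correlation criterion rejects them; this illustrates why the criteria may be used in combination.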
[0090] Models that pass validation can be recognized as good predictors and can be stored in database 208. The remaining models can be passed through one or more additional rounds of training by data science experts and / or biologists 210, with each round using more training data, algorithm changes, or hyperparameter tuning, until they pass validation.
[0091] In some cases, the machine learning model architecture can include, for example: a gradient-boosted tree machine learning algorithm with 1000 or fewer trees; a deep neural network, typically containing 2-4 hidden layers and no more than 250 nodes per layer; a convolutional neural network, e.g., with seven or fewer layers; and / or an autoencoder with seven or fewer layers (e.g., 5 convolutional layers and 2 pooling layers).
[0092] Model hyperparameters can be adjusted for each model training for a specific application.
[0093] The trained and validated model can be used to predict immune cell subpopulation numbers and frequencies in all appropriate samples. The predictions for each cellular subpopulation within a given sample can be compiled to generate a systems view of the immune status of the individual from whom the sample was collected, which can form all or part of an immune profile.
[0094] FIG. 3 shows an exemplary diagram of a process 300 for machine learning model-based prediction and immune profile generation. Machine learning models (stored in database 302) that pass validation are applied to raw FCS files 306 to generate predictions of immune cell population numbers 304. These predictions are collected into immune profiles (stored in database 308) and made available for further analysis 310, including, for example, queries of aggregated immune profiles (e.g., aggregated according to individual or patient demographic or clinical data, including, but not limited to, age, sex, race, family history, disease diagnosis, etc.) stored in database 312.
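A query over aggregated immune profiles might look like the following sketch, which averages subpopulation frequencies within groups defined by a metadata field. The record layout, the function name `aggregate_profiles`, and all values are hypothetical; the actual database schema and query interface are not described in the text.

```python
from collections import defaultdict
from statistics import mean


def aggregate_profiles(profiles, group_key):
    """Average subpopulation frequencies within each metadata group."""
    grouped = defaultdict(lambda: defaultdict(list))
    for p in profiles:
        group = p["metadata"][group_key]
        for subpop, freq in p["frequencies"].items():
            grouped[group][subpop].append(freq)
    return {
        group: {subpop: mean(vals) for subpop, vals in subpops.items()}
        for group, subpops in grouped.items()
    }


# Hypothetical stored immune profiles with made-up values.
profiles = [
    {"metadata": {"sex": "F"}, "frequencies": {"B cell": 0.10, "T cell": 0.50}},
    {"metadata": {"sex": "F"}, "frequencies": {"B cell": 0.20, "T cell": 0.40}},
    {"metadata": {"sex": "M"}, "frequencies": {"B cell": 0.12, "T cell": 0.44}},
]
by_sex = aggregate_profiles(profiles, "sex")
```

The same function could group by any other stored field, such as an age band or a diagnosis code, provided that field is present in each profile's metadata.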
[0095] Identification of the most appropriate machine learning algorithms and metadata parameters can be advantageous in helping to generate high-quality predictions on immune data. The system can be configured to use the most appropriate model architecture, which can continue to iterate. The training of the model(s) can also be updated periodically or continuously as new training data becomes available.
[0096] In some cases, an ensemble of machine learning models comprising over 2000 individual machine learning models can be used to perform automated analysis on a sample. Training and management of such large numbers of machine learning models can be achieved through a programmatic infrastructure. The platforms and methods disclosed herein can use a software platform to manage the application of trained machine learning models to raw flow cytometry data to generate cell count predictions.
[0097] The advantages of the disclosed platforms and methods can be provided by standardizing immunophenotyping platforms and enabling machine learning-based data processing automation, allowing the disclosed platforms and methods to translate highly complex FSFC data into immune profiles for use in clinically relevant time frames (e.g., hours rather than days). In some cases, immune profiles for samples can be generated in less than 24 hours, less than 12 hours, less than 10 hours, less than 8 hours, less than 6 hours, or less than 4 hours.
[0098] The output of the prediction process can include an immune profile. This profile can include immune cell counts and / or subpopulation frequencies for all measured cell subpopulations. These immune profiles represent a snapshot of the donor's immune system at the time of sample collection, and several longitudinal samples can be compiled to show immune trajectories. Additionally, immune profiles from several different donors can be compiled to discover statistically significant population-level immune signatures. How these various use cases are presented may depend on specific project requirements.
[0099] The methodologies described herein can have advantages over approaches that use different panels, different algorithms, and / or different automated gating approaches. Advantages can include more accurate and effective cell clustering, cell type or subtype prediction, and immune profile generation. The accuracy of the disclosed machine learning models can be evaluated based on, for example, precision (e.g., the percentage of cells classified as a given cell type that actually are of that type), recall (e.g., the percentage of cells actually of a given cell type that are correctly classified as that type), and F1 score (i.e., the harmonic mean of a model's precision and recall, which combines the two into a single metric).
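For concreteness, the three metrics can be computed per cell type as in the sketch below, which uses the standard one-vs-rest definitions: precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is the harmonic mean of the two. The function name and the example labels are illustrative assumptions.

```python
def precision_recall_f1(true_labels, predicted_labels, cell_type):
    """One-vs-rest precision, recall, and F1 for a single cell type."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == cell_type and p == cell_type)
    fp = sum(1 for t, p in pairs if t != cell_type and p == cell_type)
    fn = sum(1 for t, p in pairs if t == cell_type and p != cell_type)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up expert labels and model predictions for six events.
truth = ["T", "T", "T", "B", "B", "NK"]
preds = ["T", "T", "B", "B", "T", "NK"]
p, r, f1 = precision_recall_f1(truth, preds, "T")
```

In this example the T cell class has two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal two thirds.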
[0100] Purpose The disclosed methods and systems for generating immune profiles can be used in a variety of biomedical research and clinical diagnostic applications, including, but not limited to, diagnosing immune-related diseases and disorders, diagnosing autoimmune diseases and cancer, diagnosing and / or identifying individual-specific responses to infectious diseases, predicting response to treatment either before or during treatment, and donor and product characterization in cell therapy manufacturing.
[0101] Systems for immune system phenotyping and automated cell classification Also disclosed herein is a system designed to implement any of the disclosed methods for generating an immune profile for a sample from a subject. The system may include, for example, one or more processors and a memory unit communicatively coupled to the one or more processors and configured to store instructions that, when executed by the one or more processors, cause the system to: receive fluorescence intensity data or data derived therefrom acquired using a full-spectrum flow cytometer to process a fluorescently labeled cell sample collected from the subject; provide at least a subset of the fluorescence intensity data or data derived therefrom for a plurality of fluorescently labeled cells as input to an ensemble machine learning model configured to process the fluorescence intensity data or data derived therefrom and classify individual cells of the plurality of fluorescently labeled cells as belonging to one of a plurality of unique immune cell subpopulations; and output a total cell number or cell frequency for each of the plurality of unique immune cell subpopulations in the sample as part of the immune profile for the subject. In some cases, the system may further include a full-spectrum flow cytometer (FSFC) instrument.
[0102] Also disclosed is a non-transitory computer-readable storage medium that can include instructions for operating a system configured to perform any of the disclosed methods for generating an immune profile for a sample from a subject. For example, a non-transitory computer-readable storage medium is described that stores one or more programs which, when executed by one or more processors of a system, cause the system to: receive fluorescence intensity data or data derived therefrom acquired using a full-spectrum flow cytometer to process a fluorescently labeled cell sample collected from the subject; provide at least a subset of the fluorescence intensity data or data derived therefrom for a plurality of fluorescently labeled cells as input to an ensemble machine learning model configured to process the fluorescence intensity data or data derived therefrom and classify individual cells of the plurality of fluorescently labeled cells as belonging to one of a plurality of unique immune cell subpopulations; and output a total cell number or cell frequency for each of the plurality of unique immune cell subpopulations in the sample as part of the immune profile for the subject.
[0103] Computer Processors and Computing Systems Figure 4 shows an exemplary computing system, according to some embodiments. The computing system 400 can be a component of a system for generating an immune profile for a sample from a subject.
[0104] The computing system 400 may comprise a host computer connected to a network. The computing system 400 may be a client computer or a server. As shown in Figure 4, the computing system 400 may comprise any suitable type of microprocessor-based device, such as a personal computer; a workstation; a server; or a handheld computing device, such as a phone or tablet. The computer may include, for example, one or more of a processor 410, an input device 420, an output device 430, memory storage 440, and a communication device 460.
[0105] The input device 420 can be any suitable device that provides input, such as a touchscreen or monitor, a keyboard, a mouse, or a voice recognition device. The output device 430 can be any suitable device that provides output, such as a touchscreen, a monitor, a printer, a disk drive, or a speaker.
[0106] Memory storage 440 can be any suitable device providing storage, such as electrical, magnetic, or optical memory, including RAM, cache, a hard drive, a CD-ROM drive, a tape drive, or a removable storage disk. Communication device 460 can include any suitable device capable of sending and receiving signals over a network, such as a network interface chip or card. Computer components can be connected in any suitable manner, such as via a physical bus or wirelessly. Memory storage 440 can be a non-transitory computer-readable storage medium containing one or more programs that, when executed by one or more processors, such as processor 410, cause the one or more processors to perform any one of the methods described herein.
[0107] Software 450, which may be stored in memory storage 440 and executed by processor 410, may include, for example, programming that embodies functionality of the present disclosure (e.g., functionality incorporated in the methods, systems, computers, servers, and / or devices described above). In some embodiments, software 450 may be implemented and executed on a combination of servers, such as an application server and a database server.
[0108] The software 450 may also be stored on and / or transported to any computer-readable storage medium used by or connected to an instruction execution system, apparatus, or device such as those described above, from which instructions associated with the software may be retrieved and executed. In the context of the present disclosure, a computer-readable storage medium may be any medium that contains or can store programming, such as storage 440, used by or connected to an instruction execution system, apparatus, or device.
[0109] The software 450 may also be propagated within any carrier medium used by or connected to an instruction execution system, apparatus, or device, such as those described above, from which instructions associated with the software may be retrieved and executed. In the context of this disclosure, a carrier medium may be any medium capable of communicating, propagating, or carrying programming for use by or in connection with an instruction execution system, apparatus, or device. Carrier media include, but are not limited to, electronic, magnetic, optical, electromagnetic, or infrared wired or wireless propagation media.
[0110] Computing system 400 may be connected to a network, which may be any suitable type of interconnected communication system. The network may implement any suitable communication protocol and may be protected by any suitable security protocol. The network may include any suitable configuration of network links capable of implementing the transmission and reception of network signals, such as a wireless network connection, a T1 or T3 line, a cable network, DSL, or a telephone line.
[0111] Computing system 400 may implement any operating system suitable for operating on a network. Software 450 may be written in any suitable programming language, such as C, C++, Java, or Python. In various embodiments, application software embodying functionality of the present disclosure may be deployed in different configurations, such as, for example, in a client / server configuration or as a web-based application or web service via a web browser. [Example]
[0112] The following examples are included for illustrative purposes only and are not intended to limit the scope of the disclosure.
[0113] Example 1 - Manual gating of full spectrum flow cytometry data One of the fundamental principles of flow cytometry data analysis is "gating," which is the sequential identification and refinement of cell populations of interest based on the labeling of a panel of cell-type-specific molecules (also known as markers), for example, using fluorescently labeled antibodies that are detected by fluorescence (Verschoor, et al. (2015), "An Introduction to Automated Flow Cytometry Gating Tools and Their Implementation," Frontiers in Immunology Vol. 6, Article 380). Figure 5 shows a non-limiting schematic diagram of the manual gating process for processing full-spectrum flow cytometry data. Fluorescence intensity data or data derived therefrom (e.g., forward scatter data, side scatter data, live / dead cell staining, autofluorescence, etc.) are generated by the FSFC instrument and reviewed by an expert practitioner. Cell types or subtypes are identified by selecting sections or subsections of plotted fluorescence data (e.g., fluorescence intensity data plotted as a heatmap in the individual panels of Figure 5) and further refining them using multiple additional criteria (each additional criterion is applied as a "gate" to an analytical component). The number of cell detection events enclosed within each gate equals the number of cells identified for that cell population or subpopulation. As shown in Figure 5, exemplary gating criteria include SSC-A (side scatter-area), FSC-A (forward scatter-area), SSC-H (side scatter-height), FSC-H (forward scatter-height), CD45 (fluorescent signal generated from a fluorescently labeled anti-CD45 monoclonal antibody), and the like. In some of the panels, the numbers associated with a particular cell type indicate the percentage of cells from the previous gate that meet the current gating criteria.
[0114] Example 2 - Training a neural network model for immune cell type classification This example provides an illustration for training a machine learning model (e.g., a neural network) that accurately predicts immune cell types using predefined gates from experts. Initial assumptions underlying the development of the model included: (i) FCS files contain data for hundreds of thousands of cell detection events, (ii) each cell detection event represents a training opportunity, and (iii) cell types, subtypes, and states are uniquely identifiable by inclusion in a series of labeled gates.
[0115] Figure 6 shows a non-limiting example of a simplified gating hierarchy used to build a neural network for immune cell classification. Each node in the gating hierarchy represents a different immune cell type, subtype, or cell state. Starting from the root, side scatter high (SSChi) and side scatter low (SSClow) signals can be used to distinguish, for example, granulocytes (eosinophils and neutrophils) from other immune cells. Then, fluorescent signals associated with appropriate cell surface marker labeling can be used to distinguish between eosinophils and neutrophils. As described in the gating hierarchy shown in Figure 6, detection of fluorescent signals associated with labeling of other cell surface markers (e.g., CD19, CD3T, GDT, CD4, CD8, etc.) can be used to classify immune cells into multiple unique immune cell subtypes.
[0116] Table 3 shows a non-limiting example of a panel of fluorescently labeled monoclonal antibodies that can be used to detect cell surface markers (adapted from Mahnke, et al. (2012), "OMIP-013: Differentiation of Human T-Cells", Cytometry Part A 81A: 935-936).
[0117] [Table 3]
[0118] FIG. 7 shows a simplified schematic diagram of a neural network including an input layer containing two or more nodes (or "perceptrons"), at least one hidden layer (containing four or more nodes), and an output layer (containing a single node in this non-limiting example). In general, a neural network can include any total number of layers and any number of hidden layers, where the hidden layers function as trainable feature extractors that enable mapping of a set of input data to a desired output value or set of output values. Each layer of a neural network includes multiple nodes (or perceptrons). A node receives input either directly from input data (e.g., flow cytometry data or data derived therefrom) or from the output of a node in a previous layer and performs a specific operation, e.g., a summation operation. In some cases, the connection from an input to a node is associated with a weight (or weight coefficient). In some cases, a node may sum the products of all pairs of inputs from previous layers and their associated weights, for example. In some cases, the weighted sum is offset by a bias b. In some cases, the output of a node can be gated using a threshold or activation function f, which can be a linear or nonlinear function. The activation function can be, for example, a rectified linear unit (ReLU) activation function or another function such as a saturating hyperbolic tangent, identity, binary step, logistic, arcTan, softsign, parametric rectified linear unit, exponential linear unit, softPlus, bent identity, softExponential, sinusoid, sinc, Gaussian, or sigmoid function, or any combination thereof.
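The node operation described above (a weighted sum of inputs, offset by a bias b and passed through an activation function f) can be sketched as follows. This is a minimal illustration, assuming a ReLU activation; the function names and numeric values are hypothetical and do not come from the disclosure.

```python
def relu(z):
    """Rectified linear unit: passes positive values, clamps negatives to 0."""
    return max(0.0, z)


def node_output(inputs, weights, bias, activation=relu):
    """Compute f(sum_i(w_i * x_i) + b) for a single node (perceptron)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)


# Two hypothetical input values (e.g., two scaled fluorescence channels).
active = node_output([0.5, 2.0], [1.0, 0.5], bias=0.25)     # 0.5 + 1.0 + 0.25
inactive = node_output([0.5, 2.0], [1.0, -0.5], bias=0.25)  # 0.5 - 1.0 + 0.25 < 0
```

With the first set of weights the pre-activation value is positive and passes through unchanged; with the second it is negative, so the ReLU output is zero.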
[0119] The neural network's weight coefficients, bias values, and thresholds, or other operational parameters, can be "taught" or "learned" in a training phase using one or more sets of training data. For example, the parameters can be trained using the input data from a training dataset and a gradient descent or backpropagation method so that the output values (e.g., immune cell classifications) predicted by the neural network match the examples contained in the training dataset. The adjustable parameters of the model can be obtained, for example, using a backpropagation neural network training process, which may or may not be implemented using the same hardware used to perform the immune cell classification.
[0120] Neural networks, such as those shown schematically in Figure 7, were trained using labeled training data divided into a training (typically 80% of the total) data set and a test (typically 20% of the total) data set. For each cell detection event in the training data set, the data was matched to the appropriate input node, an initial classification was performed in the neural network model, the output was compared with the actual (known) output for the detection event, the internal model weights were adjusted, and the cycle was repeated. The trained model was then tested by processing reserved test data and comparing the model's predictions for cell type with the actual (known) cell type. Model performance was evaluated using metrics such as precision and recall.
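The train/compare/adjust cycle and the 80% / 20% split described above can be sketched with a deliberately simplified stand-in: a single logistic node trained by stochastic gradient descent on synthetic two-marker events. The network architecture, hyperparameters, and data used in the disclosure are not specified here, so every name and value below is an assumption for illustration.

```python
import math
import random


def split_events(events, train_fraction=0.8, seed=0):
    """Shuffle labeled events and split them into training and test sets."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def predict_probability(weights, bias, features):
    """Logistic node: sigmoid of the weighted sum plus bias."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(train_set, epochs=50, learning_rate=0.1):
    """Repeat: predict, compare with the known label, adjust the weights."""
    weights = [0.0] * len(train_set[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in train_set:
            error = predict_probability(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Synthetic events: label 1 = inside a hypothetical gate, 0 = outside.
rng = random.Random(1)
events = [([rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)], 1) for _ in range(100)]
events += [([rng.gauss(-2.0, 0.5), rng.gauss(-2.0, 0.5)], 0) for _ in range(100)]

train_set, test_set = split_events(events)
weights, bias = train(train_set)

# Evaluate on the reserved test events only.
correct = sum(
    1 for features, label in test_set
    if (predict_probability(weights, bias, features) >= 0.5) == bool(label)
)
accuracy = correct / len(test_set)
```

Keeping the test events out of the training loop is what makes the resulting accuracy an estimate of performance on unseen data rather than a measure of memorization.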
[0121] Figure 8 shows a non-limiting example of test data generated by a trained neural network classifier. Support, shown in the right-most column, is the number of cells classified by manual gating as belonging to the indicated cell type. Model performance was evaluated for 16 immune cell types by calculating precision, recall, and F1 scores, as described elsewhere herein. Micro average: a metric calculated globally by counting the total number of true positives, false negatives, and false positives. Macro average: the unweighted mean of the metric calculated for each label (cell type); this does not take label imbalance into account. Weighted average: the mean of the metric calculated for each label, weighted by support (the number of true examples for each label). The weighted average alters the "macro" calculation to account for label imbalance and can result in an F-score that is not between precision and recall. Samples average: the mean of the metric calculated for each example (this calculation is only meaningful for multi-label classification, where it differs from the accuracy score). As can be seen, the weighted averages for precision and recall were 0.94 and 0.92, respectively, in this study.
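To make the difference between these averaging schemes concrete, the following minimal sketch computes micro, macro, and support-weighted averages of per-label precision. The helper name and the toy label lists are assumptions for illustration; the values are not the data shown in Figure 8.

```python
from collections import Counter


def averaged_precision(true_labels, predicted_labels):
    """Return micro, macro, and support-weighted precision averages."""
    tp, fp, support = Counter(), Counter(), Counter()
    for truth, predicted in zip(true_labels, predicted_labels):
        support[truth] += 1          # number of true examples per label
        if truth == predicted:
            tp[truth] += 1           # correct prediction for this label
        else:
            fp[predicted] += 1       # wrong prediction counts against `predicted`

    labels = sorted(support)
    per_label = {
        label: tp[label] / (tp[label] + fp[label]) if (tp[label] + fp[label]) else 0.0
        for label in labels
    }
    total_tp, total_fp = sum(tp.values()), sum(fp.values())
    total_support = sum(support.values())
    return {
        # Micro: pool all true and false positives before dividing.
        "micro": total_tp / (total_tp + total_fp),
        # Macro: unweighted mean of per-label precision.
        "macro": sum(per_label.values()) / len(labels),
        # Weighted: per-label precision weighted by support.
        "weighted": sum(per_label[l] * support[l] for l in labels) / total_support,
    }


# Toy, imbalanced labels: 6 T cells, 3 B cells, 1 NK cell.
true_labels = ["T"] * 6 + ["B"] * 3 + ["NK"]
predicted_labels = ["T"] * 5 + ["B"] + ["B"] * 2 + ["T"] + ["NK"]

averages = averaged_precision(true_labels, predicted_labels)
```

Here the per-label precisions are 5/6 (T), 2/3 (B), and 1 (NK), so the macro average (about 0.833) is pulled up by the single NK cell, while the micro and weighted averages are both 0.8.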
[0122] Figure 9 shows a non-limiting example of test data generated by the trained neural network classifier after training the model on a much larger training dataset. As can be seen, the weighted averages for precision and recall were 0.98 and 0.98, respectively, in this study.
[0123] Figure 10 shows a non-limiting example of validation data (i.e., model-predicted data for input data not previously provided to the model) for a trained neural network classifier. The weighted averages for precision and recall were 0.99 and 0.98, respectively, in this study.
[0124] Figure 11 shows another non-limiting example of validation data generated by a trained neural network classifier. In this validation run, the weighted averages for precision and recall were 0.98 and 0.91, respectively.
[0125] Example 3 - Application of ensemble machine learning models to automated cell counting An FCS file consists of data about the fluorophores and associated markers used when performing a flow cytometry experiment. The FCS file also contains data corresponding to each fluorescent channel for each event detected by the flow cytometer. A traditional approach to extracting the number and frequency of immune cell populations from a sample from an FCS file (e.g., measuring the percentage of identified neutrophils among all detected white blood cells) involves using specialized software (e.g., FlowJo, BD Biosciences, Ashland, OR). This manual method requires opening the file in the software, visually inspecting the events across multiple bivariate plots, and identifying cell clusters that correspond to well-established biological phenotypes. This manual method also requires utilizing the association of fluorophores with immune cell surface proteins defined by the immunophenotyping panel design. Then, following background immunological expertise that confirms cell surface marker distribution, an enclosing polygon is manually created to define immune cell clusters in two-dimensional space. When all immune cell subsets of interest are identified using the present method (2000+, in the case of IMU (IMU Biosciences, London, UK) phenotyping panel), a signature of cell population proportions can be extracted.
[0126] To increase the speed and accuracy of this cell counting and classification process, automated cell classification can be applied. Given a standard gating hierarchy, a set of FCS files, and a set of reference configurations for each gate stored in a manually gated workspace file, an ensemble machine learning model can be trained to measure cell population signatures. The ensemble machine learning model includes a set of individual ML models, each trained to predict whether an event falls inside or outside the configured gates. Once the set of models is trained, a signature can be constructed without human intervention as follows:
[0127] For each event in the FCS file, a machine learning ensemble is invoked. Each node in the ensemble can consist of one machine learning model configured to receive input from, for example, each fluorescence channel. The machine learning model has an internal state learned during the training phase and outputs the set of events predicted to be enclosed by the defined gate represented in the ML model. These ML-filtered events are then cascaded to one or more predicted outputs corresponding to the number of child gates at the next level of the gating hierarchy. The set of machine learning models in the ensemble is organized into a directed acyclic graph structure, with the root node being the first filter in the gating hierarchy. To begin the immune signature prediction process, the complete set of events in the FCS file is passed as input to the root machine learning model. Events that fit the gate definition encoded in the ML model are retained and provided as input to the next model(s) in the hierarchy. Each node in the hierarchy represents an immune cell type or state, and when this process is complete, the counts and frequencies for all nodes can be extracted to form the complete immune signature for the data file.
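The cascade described above can be sketched as a recursive walk over a gating hierarchy, with each node filtering the events it receives and passing the retained events to its children. This is a minimal illustration only: fixed thresholds stand in for trained ML models, a tree is used as a simple special case of a directed acyclic graph, and the class name, marker thresholds, and events are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Event = Dict[str, float]  # channel name -> (scaled) fluorescence intensity


@dataclass
class GateNode:
    name: str
    in_gate: Callable[[Event], bool]      # stand-in for a trained ML model
    children: List["GateNode"] = field(default_factory=list)


def cascade_counts(node: GateNode, events: List[Event],
                   counts: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Filter events at this node, record the count, and recurse to children."""
    if counts is None:
        counts = {}
    retained = [event for event in events if node.in_gate(event)]
    counts[node.name] = len(retained)
    for child in node.children:
        cascade_counts(child, retained, counts)
    return counts


# Hypothetical three-node hierarchy: WBC -> {T cells, B cells}.
hierarchy = GateNode("WBC", lambda e: e["CD45"] > 0.5, [
    GateNode("T cells", lambda e: e["CD3"] > 0.5),
    GateNode("B cells", lambda e: e["CD19"] > 0.5),
])

events = [
    {"CD45": 0.9, "CD3": 0.8, "CD19": 0.1},
    {"CD45": 0.9, "CD3": 0.7, "CD19": 0.2},
    {"CD45": 0.8, "CD3": 0.1, "CD19": 0.9},
    {"CD45": 0.1, "CD3": 0.9, "CD19": 0.1},  # rejected at the root gate
    {"CD45": 0.7, "CD3": 0.2, "CD19": 0.1},  # WBC, but neither T nor B
]

counts = cascade_counts(hierarchy, events)
# Frequencies are expressed here relative to the root population.
frequencies = {name: n / counts["WBC"] for name, n in counts.items()}
```

Because each child only sees events retained by its parent, an event rejected at the root (the fourth event above) never reaches the T cell gate even though its CD3 value is high.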
[0128] Figure 12 shows a non-limiting example of data flowing through a cascade prediction from node to node, i.e., from parent ML model output to child ML model input. In this example, fluorescence intensity is input to the root node (white blood cell (WBC)) model, which classifies individual detection events into two output categories (WBC subtypes). Fluorescence intensity data for each category is then input to the next set of models in the gating hierarchy, e.g., either the eosinophil ML model and the neutrophil ML model, or the B cell ML model and the T cell ML model. As shown, fluorescence intensity data for a subset of T cell gating events can then be input, for example, to the iNKT cell ML model or the γδ T cell ML model.
[0129] Example 4 - Application of immune signatures to highlight biologically meaningful insights. To demonstrate the efficacy of the disclosed immunophenotypic profiling method for identifying and correcting for changes in overall health and lifestyle factors in donor samples, analyses were performed to measure the extent to which vitamin D fluctuations affect immune system parameters. A dataset of 609 donors (58% female, age 38.7±12.5 years) was assembled, and fresh blood was collected and analyzed using the FSFC approach defined herein. Additionally, comprehensive overall health blood testing (including lipid profiles, vitamin levels, and approximately 50 other biomarkers) was performed on the same set of donors at the same experimental time points. An automated cell classification method generated immune signatures for all donors. These signatures were then evaluated using linear regression models to detect significant changes in immune cell populations based on vitamin D levels, with sex and age group (<30 years: young, 30-60 years: middle-aged, 60+ years: elderly) as covariates. After applying multiple testing corrections, a total of 60 immune populations were identified that spanned both sexes and all age groups and were affected by variations in vitamin D levels.
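The text above does not state which multiple testing correction was applied. As one commonly used possibility, and purely as an assumption for illustration, the Benjamini-Hochberg step-up procedure for controlling the false discovery rate can be sketched as follows. The p-values are invented toy numbers, not results from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k such that p_(k) <= (k / m) * alpha,
    then reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    largest_k = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            largest_k = rank
    rejected = [False] * m
    for index in order[:largest_k]:
        rejected[index] = True
    return rejected


# Toy p-values, one per hypothetical immune population (illustrative only).
toy_p_values = [0.001, 0.03, 0.035, 0.5]
significant = benjamini_hochberg(toy_p_values)
```

In this toy case the second p-value (0.03) exceeds its own threshold (0.025), but it is still rejected because the third p-value (0.035) falls under its threshold (0.0375); this is the defining behavior of a step-up procedure.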
[0130] Vitamin D's significant interaction and trajectory with Th2 cells, implicated in allergic responses (Georas, et al. (2005), "T-helper cell type-2 regulation in allergic disease", Eur Respir J. 26(6):1119-37), and Th17 cells, implicated in autoimmune and infectious disease response pathways (Zambrano-Zaragoza, et al. (2014), "Th17 cells in autoimmune and infectious diseases", Int J Inflam. 2014:651503), provides reference ranges for the baseline phenotypes of these two critically important populations in healthy cohorts, which can be used as monitoring and diagnostic reference points for patients undergoing treatment for either of these conditions.
[0131] Figures 13A-B show non-limiting example plots showing the trajectories of T helper 2 (FIG. 13B) and T helper 17 (FIG. 13A) cell populations decreasing with increasing vitamin D levels in 609 healthy individuals across age groups, considering sex as a covariate.
[0132] Illustrative Embodiments Exemplary methods, systems, and computer-readable storage media are described in the following series.
[0133] Embodiment 1. A method for generating an immune profile for a subject, comprising: contacting at least a first aliquot of a sample from the subject with at least a first immunophenotyping panel to fluorescently label cells contained in the sample; processing the fluorescently labeled cells using a full spectrum flow cytometer to generate fluorescence intensity data or data derived therefrom for a plurality of fluorescently labeled cells from the sample; providing at least a subset of the fluorescence intensity data or data derived therefrom for the plurality of fluorescently labeled cells as input to an ensemble machine learning model configured to process the fluorescence intensity data or data derived therefrom and to classify individual cells of the plurality of fluorescently labeled cells as belonging to one of a plurality of unique immune cell subpopulations; and outputting the total cell number or cell frequency for each of the plurality of unique immune cell subpopulations in the sample as part of an immune profile for the subject.
[0134] Embodiment 2. The method of embodiment 1, wherein the ensemble machine learning model is organized in a cascaded hierarchical tree structure including multiple nodes, and each node includes an individual machine learning model.
[0135] Embodiment 3. The method of embodiment 2, wherein each individual machine learning model includes one input dataset and 1 to 8 output datasets corresponding to branches of the cascaded hierarchical tree structure.
[0136] Embodiment 4. The method of embodiment 2 or embodiment 3, wherein each individual machine learning model comprises a neural network model.
[0137] Embodiment 5. The method of embodiment 2 or embodiment 3, wherein each individual machine learning model comprises a gradient boosted tree model.
[0138] Embodiment 6. The method of any one of embodiments 2 to 5, wherein the plurality of nodes includes at least 1000, 1200, 1400, 1600, 1800, 2000, 2200, or 2400 nodes.
[0139] Embodiment 7. The method of any one of embodiments 2 to 6, wherein the number of individual machine learning models in the ensemble machine learning model is equal to the number of unique immune cell subpopulations in the plurality of unique immune cell subpopulations.
[0140] Embodiment 8. The method of any one of embodiments 2 to 7, wherein the design of the cascaded hierarchical tree structure is based, at least in part, on expert analysis of manually gated fluorescence intensity data, or data derived therefrom, for one or more control samples.
[0141] Embodiment 9. The method of any one of embodiments 1 to 8, wherein individual cells are classified independently of all other cells in the plurality of fluorescently labeled cells.
[0142] Embodiment 10. The method of any one of embodiments 1 to 8, wherein individual cells are recursively classified with every other cell of said plurality of fluorescently labeled cells.
[0143] Embodiment 11. The method of any one of embodiments 1 to 10, wherein the ensemble machine learning model is trained using one or more labeled training datasets generated by an expert by manual gating of fluorescence intensity data, or data derived therefrom, for one or more control samples.
[0144] Embodiment 12. The method of embodiment 11, wherein each individual machine learning model in the ensemble machine learning model is trained individually using the one or more labeled training datasets.
[0145] Embodiment 13. The method of embodiment 11 or embodiment 12, wherein during training, predictions of individual models are used to validate the individual models but are not propagated through the ensemble machine learning model, thereby eliminating error propagation during training.
[0146] Embodiment 14. The method of embodiment 11, wherein the individual machine learning models in the ensemble machine learning model are trained together using a recursive training method.
[0147] Embodiment 15. A method according to any one of embodiments 11 to 14, wherein the training of the ensemble machine learning model is controlled by one or more hyperparameter values that are the same for each node in the cascaded hierarchical tree structure.
[0148] Embodiment 16. A method according to any one of embodiments 11 to 14, wherein the training of the ensemble machine learning model is controlled by one or more hyperparameter values that differ for a subset of nodes in the cascaded hierarchical tree structure.
[0149] Embodiment 17. The method of any one of embodiments 11 to 16, wherein the training of the ensemble machine learning model is controlled by one or more hyperparameter values determined by performing a random grid search of value ranges for the one or more hyperparameters.
[0150] Embodiment 18. The method of any one of embodiments 1 to 17, further comprising performing a mathematical transformation of the fluorescence intensity data or data derived therefrom before using the transformed fluorescence intensity data as input to the ensemble machine learning model.
[0151] Embodiment 19. The method of any one of embodiments 1 to 18, wherein the fluorescence intensity data or data derived therefrom comprises fluorescence intensity data for at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, or 40 fluorescence detection channels.
[0152] Embodiment 20. The method of embodiment 19, wherein the fluorescence intensity data or data derived therefrom further comprises forward scatter height data, forward scatter area data, side scatter height data, side scatter area data, autofluorescence data, or any combination thereof.
[0153] Embodiment 21. The method of any one of embodiments 1 to 20, wherein the sample comprises a blood sample, a buffy coat sample, or a cell suspension.
[0154] Embodiment 22. The method of any one of embodiments 1 to 21, wherein the at least one immunophenotyping panel comprises a panel of fluorescently labeled antibodies directed against cell surface proteins associated with antigen-presenting cells (APCs).
[0155] Embodiment 23. The method of embodiment 22, wherein said panel of fluorescently labeled antibodies comprises fluorescently labeled antibodies directed against IGM, CD5, CD62L, CD294, CD69, CD38, PD1, CD11C, CD3, CD8, HLADR, CD24, CD337, CD123, CD141, CD1C, CD4, TACI, CD319, CD335, PDL1, CD10, CD45, CD16, IGD, CD40, CD19_TCRGD, CD43, CD14, CD138, CD15, CD56, CD86, CD303, CD27, or any combination thereof.
[0156] Embodiment 24. The method of embodiment 23, wherein said panel of fluorescently labeled antibodies further comprises fluorescently labeled antibodies directed against cell surface markers indicative of live cells, dead cells, or both.
[0157] Embodiment 25. The method of any one of embodiments 1 to 24, wherein said at least one immunophenotyping panel comprises a panel of fluorescently labeled antibodies directed against cell surface proteins associated with T cells.
[0158] Embodiment 26. The method of embodiment 25, wherein said panel of fluorescently labeled antibodies comprises fluorescently labeled antibodies directed to TIGIT, CD5, CD28, CXCR5, CD39, TIM3, CD38, PD1, TCRVA7_2_TCRVD1, CD95, CD3, CD8, HLADR, CD31, CCR4, CCR6, CCR7, CD57, ICOS, CD4, KLRG1, TCRVA24_JA18, CD122, CD103, CXCR3, TCRVD2, CD45, CCR10, CD16, CD25, CD161, CD19_TCRGD, LAG3, CD14, CD45RO, CD56, CD127, CD45RA, CD27, or any combination thereof.
[0159] Embodiment 27. The method of any one of embodiments 1 to 26, wherein the plurality of unique immune cell subpopulations comprises at least 1000, 1200, 1400, 1600, 1800, 2000, 2200, or 2400 unique immune cell subpopulations.
[0160] Embodiment 28. The method of any one of embodiments 1 to 27, wherein the plurality of distinct immune cell subpopulations comprises white blood cells (WBCs), eosinophils, eosinophils / CD5+, neutrophils, neutrophils / large, neutrophils / CD5+, neutrophils / small, B cells, B cells / CD5- CD27-, monocytes / CD56+, monocytes / CD56-, NK cells, dendritic cells (DCs), T cells, iNKT cells, gamma delta T cells (total GD), Vd1 cells, Vd2 cells, Vdx cells, mucosal-associated invariant T (MAIT) cells, TEMRA cells, CD4 naive cells, T helper cells, CD4 effector memory cells, Treg cells, or any combination thereof.
[0161] Embodiment 29. The method of any one of embodiments 1 to 28, wherein the total cell count or cell frequency for each of the plurality of unique immune cell subpopulations in the sample is output as part of an immune profile in less than 24 hours, less than 12 hours, less than 8 hours, less than 6 hours, or less than 4 hours.
[0162] Embodiment 30. The method of any one of embodiments 1-29, wherein the immune profile is used to diagnose an immune-related disease or disorder, monitor the progression of an immune-related disease or disorder, or monitor the response to a treatment for an immune-related disease or disorder in the subject.
[0163] Embodiment 31. A computer-implemented method for generating an immune profile for a subject, comprising: receiving fluorescence intensity data generated using or derived from a full-spectrum flow cytometer and processing a sample of fluorescently labeled cells collected from said subject; providing at least a subset of the fluorescence intensity data or data derived therefrom for a plurality of fluorescently labeled cells as input to an ensemble machine learning model configured to process the fluorescence intensity data or data derived therefrom and to classify individual cells of the plurality of fluorescently labeled cells as belonging to one of a plurality of unique immune cell subpopulations; outputting the total cell number or cell frequency for each of the plurality of unique immune cell subpopulations in the sample as part of an immune profile for the subject; The method comprising:
[0164] Embodiment 32. A system comprising: one or more processors; and a memory communicatively coupled to the one or more processors and configured to store instructions that, when executed by the one or more processors, cause the system to: receive fluorescence intensity data, or data derived therefrom, generated using a full-spectrum flow cytometer to process a fluorescently labeled cell sample collected from a subject; provide at least a subset of the fluorescence intensity data, or data derived therefrom, for a plurality of fluorescently labeled cells as input to an ensemble machine learning model configured to process the fluorescence intensity data or data derived therefrom and to classify individual cells of the plurality of fluorescently labeled cells as belonging to one of a plurality of unique immune cell subpopulations; and output the total cell number or cell frequency for each of the plurality of unique immune cell subpopulations in the sample as part of an immune profile for the subject.
[0165] Embodiment 33. The system of embodiment 32, further comprising a full-spectrum flow cytometer.
[0166] Embodiment 34. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions that, when executed by one or more processors of a system, cause the system to: receive fluorescence intensity data, or data derived therefrom, generated using a full-spectrum flow cytometer to process a fluorescently labeled cell sample collected from a subject; provide at least a subset of the fluorescence intensity data, or data derived therefrom, for a plurality of fluorescently labeled cells as input to an ensemble machine learning model configured to process the fluorescence intensity data or data derived therefrom and to classify individual cells of the plurality of fluorescently labeled cells as belonging to one of a plurality of unique immune cell subpopulations; and output the total cell number or cell frequency for each of the plurality of unique immune cell subpopulations in the sample as part of an immune profile for the subject.
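The output step recited in embodiments 31 to 34 can be illustrated with a minimal sketch, written here in Python. It assumes that a classifier has already assigned one subpopulation label to each cell, and it computes cell frequency as a fraction of all classified cells. That denominator is an assumption made for illustration, and the function name build_immune_profile is hypothetical; neither is taken from the disclosure.

```python
from collections import Counter


def build_immune_profile(cell_labels):
    """Summarize per-cell subpopulation labels into counts and frequencies.

    Assumption: frequency is the fraction of all classified cells. A real
    profile might instead report frequency relative to a parent population.
    """
    total = len(cell_labels)
    if total == 0:
        return {}  # no cells classified, so there is nothing to report
    counts = Counter(cell_labels)
    return {
        name: {"count": n, "frequency": n / total}
        for name, n in counts.items()
    }


# Hypothetical per-cell labels, standing in for the output of the ensemble model.
labels = ["T cells", "B cells", "T cells", "NK cells"]
profile = build_immune_profile(labels)
```

In this toy input, two of the four cells carry the label "T cells", so that entry has a count of 2 and a frequency of 0.5.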
[0167] It should be understood from the foregoing that, while specific implementations of the disclosed methods and systems have been described and illustrated, various modifications can be made and are contemplated herein. The present invention is not intended to be limited by the specific examples provided herein. While the present invention has been described with reference to the above specification, the descriptions and drawings of the preferred embodiments herein are not meant to be construed in a limiting sense. Furthermore, it should be understood that all aspects of the present invention are not limited to the specific depictions, configurations, or relative proportions set forth herein, which depend upon a variety of conditions and variables. Various modifications in form and detail of the embodiments of the present invention will be apparent to those skilled in the art. It is therefore contemplated that the present invention shall also cover any such modifications, variations, and equivalents.
Claims
1. A method for generating an immune profile for a subject, the method comprising: contacting at least a first aliquot of a sample derived from the subject with at least a first immunophenotypic measurement panel to fluorescently label cells contained in the sample; processing the fluorescently labeled cells using a full-spectrum flow cytometer to generate fluorescence intensity data, or data derived therefrom, for a plurality of fluorescently labeled cells from the sample; providing at least a subset of the fluorescence intensity data, or data derived therefrom, for the plurality of fluorescently labeled cells as input to an ensemble machine learning model configured to process the fluorescence intensity data or data derived therefrom and to classify individual cells of the plurality of fluorescently labeled cells as belonging to one of a plurality of unique immune cell subpopulations; and outputting the total number of cells or cell frequency for each of the plurality of unique immune cell subpopulations in the sample as part of the immune profile for the subject.
2. The method according to claim 1, wherein the ensemble machine learning model is organized in a cascaded hierarchical tree structure including a plurality of nodes, and each node includes an individual machine learning model.
3. The method according to claim 2, wherein each individual machine learning model includes one input dataset and one to eight output datasets corresponding to the branching of the cascaded hierarchical tree structure.
4. The method according to claim 2, wherein each individual machine learning model includes a neural network model or a gradient descent boosted tree model.
5. The method according to claim 2, wherein the plurality of nodes includes at least 1000, 1200, 1400, 1600, 1800, 2000, 2200, or 2400 nodes.
6. The method according to claim 2, wherein the number of individual machine learning models in the ensemble machine learning model is equal to the number of unique immune cell subpopulations in the plurality of unique immune cell subpopulations.
7. The method according to claim 2, wherein the design of the cascaded hierarchical tree structure is based at least in part on expert analysis of manually gated fluorescence intensity data or data derived therefrom for one or more control samples.
8. The method according to claim 1, wherein individual cells are classified independently of all other cells in the plurality of fluorescently labeled cells.
9. The method according to claim 1, wherein individual cells are recursively classified from all other cells of the plurality of fluorescently labeled cells.
10. The method according to claim 1, wherein the ensemble machine learning model is trained using one or more labeled training datasets generated by an expert by manual gating of fluorescence intensity data or data derived therefrom for one or more control samples.
11. The method according to claim 10, wherein each individual machine learning model in the ensemble machine learning model is trained individually using the one or more labeled training datasets.
12. The method according to claim 10, wherein, during training, the predictions of the individual models are used to validate the individual models, but the errors are not propagated through the ensemble machine learning model, thereby eliminating the propagation of errors during training.
13. The method according to claim 10, wherein the individual machine learning models in the ensemble machine learning model are trained together using a recursive training method.
14. The method according to claim 10, wherein the training of the ensemble machine learning model is controlled by one or more hyperparameter values that are the same for each node in the cascaded hierarchical tree structure.
15. The method according to claim 10, wherein the training of the ensemble machine learning model is controlled by one or more different hyperparameter values for a subset of nodes in the cascaded hierarchical tree structure.
16. The method according to claim 10, wherein the training of the ensemble machine learning model is controlled by one or more hyperparameter values, which are determined by performing a random grid search of value ranges for one or more hyperparameters.
17. The method according to claim 1, further comprising performing a mathematical transformation on the fluorescence intensity data or data derived therefrom before using the transformed fluorescence intensity data as input to the ensemble machine learning model.
18. The method according to claim 1, wherein the fluorescence intensity data or data derived therefrom includes fluorescence intensity data for at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, or 40 fluorescence detection channels.
19. The method according to claim 18, wherein the fluorescence intensity data or data derived therefrom further includes forward scatter height data, forward scatter area data, side scatter height data, side scatter area data, autofluorescence data, or any combination thereof.
20. The method according to claim 1, wherein the sample comprises a blood sample, a leukapheresis sample, or a cell suspension.
21. The method according to claim 1, wherein the at least one immunophenotypic measurement panel comprises a panel of fluorescently labeled antibodies directed to cell surface proteins associated with antigen-presenting cells (APCs).
22. The method according to claim 21, wherein the panel of fluorescently labeled antibodies further comprises fluorescently labeled antibodies directed to cell surface markers indicating living cells, dead cells, or both.
23. The method according to claim 1, wherein the at least one immunophenotypic measurement panel comprises a panel of fluorescently labeled antibodies directed to cell surface proteins associated with T cells.
24. The method according to claim 1, wherein the plurality of unique immune cell subpopulations comprises at least 1,000, 1,200, 1,400, 1,600, 1,800, 2,000, 2,200, or 2,400 unique immune cell subpopulations.
25. The method according to claim 1, wherein the plurality of unique immune cell subpopulations comprises white blood cells (WBCs), eosinophils, eosinophils / CD5+, neutrophils, neutrophils / large, neutrophils / CD5+, neutrophils / small, B cells, B cells / CD5- CD27-, monocytes / CD56+, monocytes / CD56-, NK cells, dendritic cells (DCs), T cells, iNKT cells, gamma delta T cells (total GD), Vd1 cells, Vd2 cells, Vdx cells, mucosal-associated invariant T (MAIT) cells, TEMRA cells, CD4 naive cells, T helper cells, CD4 effector memory cells, Treg cells, or any combination thereof.
26. The method according to claim 1, wherein the total number of cells or cell frequency for each of the plurality of unique immune cell subpopulations in the sample is output as part of an immune profile in less than 24 hours, less than 12 hours, less than 8 hours, less than 6 hours, or less than 4 hours.
27. The method according to claim 1, wherein the immune profile is used to diagnose an immune-related disease or disorder, monitor the progression of an immune-related disease or disorder, or monitor the response to a treatment for an immune-related disease or disorder in the subject.
28. A computer-implemented method for generating an immune profile for a subject, the method comprising: receiving fluorescence intensity data, or data derived therefrom, generated using a full-spectrum flow cytometer to process a fluorescently labeled cell sample collected from the subject; providing at least a subset of the fluorescence intensity data, or data derived therefrom, for a plurality of fluorescently labeled cells as input to an ensemble machine learning model configured to process the fluorescence intensity data or data derived therefrom and to classify individual cells of the plurality of fluorescently labeled cells as belonging to one of a plurality of unique immune cell subpopulations; and outputting the total number of cells or cell frequency for each of the plurality of unique immune cell subpopulations in the sample as part of the immune profile for the subject.
29. A system comprising: one or more processors; and a memory communicatively coupled to the one or more processors and configured to store instructions that, when executed by the one or more processors, cause the system to: receive fluorescence intensity data, or data derived therefrom, generated using a full-spectrum flow cytometer to process a fluorescently labeled cell sample collected from a subject; provide at least a subset of the fluorescence intensity data, or data derived therefrom, for a plurality of fluorescently labeled cells as input to an ensemble machine learning model configured to process the fluorescence intensity data or data derived therefrom and to classify individual cells of the plurality of fluorescently labeled cells as belonging to one of a plurality of unique immune cell subpopulations; and output the total number of cells or cell frequency for each of the plurality of unique immune cell subpopulations in the sample as part of an immune profile for the subject.
30. The system according to claim 29, further comprising a full-spectrum flow cytometer.
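
By way of illustration only, and without limiting the claims, the cascaded hierarchical tree structure of claims 2 and 8 and the mathematical transformation of claim 17 can be sketched in Python as follows. Everything specific in the sketch is an assumption: the arcsinh transformation and its cofactor of 150, the two-channel feature layout, the threshold rules that stand in for trained neural network or boosted tree models, and the node names other than "WBCs" and "T cells" are hypothetical and are not taken from the disclosure.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence


def arcsinh_transform(values: Sequence[float], cofactor: float = 150.0) -> List[float]:
    """Apply an arcsinh transformation to raw intensities.

    Assumption: arcsinh with cofactor 150 is one common choice for cytometry
    data; the claims do not specify which transformation is used.
    """
    return [math.asinh(v / cofactor) for v in values]


@dataclass
class Node:
    """One node of the cascade; non-leaf nodes hold an individual model."""
    name: str
    # The model maps a feature vector to the name of one child node.
    model: Optional[Callable[[Sequence[float]], str]] = None
    children: Dict[str, "Node"] = field(default_factory=dict)


def classify_cell(root: Node, features: Sequence[float]) -> List[str]:
    """Route a single cell down the tree, independently of all other cells.

    Returns every population on the path so that parent populations can be
    counted as well as the final leaf.
    """
    path = [root.name]
    node = root
    while node.model is not None and node.children:
        node = node.children[node.model(features)]
        path.append(node.name)
    return path


# Hypothetical toy tree. features[0] and features[1] stand in for two
# transformed marker channels; the threshold rules replace trained models.
t_cells = Node(
    name="T cells",
    model=lambda f: "CD4 T cells" if f[1] > 1.0 else "other T cells",
    children={
        "CD4 T cells": Node("CD4 T cells"),
        "other T cells": Node("other T cells"),
    },
)
root = Node(
    name="WBCs",
    model=lambda f: "T cells" if f[0] > 1.0 else "non-T cells",
    children={"T cells": t_cells, "non-T cells": Node("non-T cells")},
)

path_a = classify_cell(root, arcsinh_transform([3000.0, 50.0]))
path_b = classify_cell(root, arcsinh_transform([10.0, 5000.0]))
```

With these toy rules, the first cell is routed from "WBCs" to "T cells" to "other T cells", and the second stops at "non-T cells". Because each node only decides among its own children, each stand-in rule could be replaced by a separately trained model without changing the routing logic.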