Bioparticle analysis system, information processing apparatus, and information processing method
By using data compression and machine learning models in a biological particle analysis system, and by setting thresholds based on confidence levels, the challenge of rapidly identifying particle sorting targets in flow cytometry has been addressed, thereby improving sorting purity and efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SONY GROUP CORP
- Filing Date
- 2024-12-13
- Publication Date
- 2026-06-19
AI Technical Summary
Existing flow cytometers struggle to determine rapidly and in real time whether particles are sorting targets, especially as the analysis grows more complex with larger numbers of fluorescent substances.
A biological particle analysis system is provided, including an acquisition unit, a compression unit, a gating unit, a training unit, and a threshold setting unit. The system uses data compression and machine learning models to quickly determine whether a particle is a sorting target, and uses confidence levels to set thresholds for sorting.
It enables faster and real-time determination of whether a particle is a sorting target, improves sorting purity and efficiency, and simplifies complex measurement data analysis.
[Figure: abstract drawing, CN122249703A]
Abstract
Description
Technical Field
[0001] This disclosure relates to a biological particle analysis system, an information processing device, and an information processing method.
Background Technology
[0002] In fields such as medicine and biochemistry, flow cytometry is commonly used to rapidly measure the characteristics of large numbers of particles. A flow cytometer is a device that measures the properties of each particle by illuminating a stream of particles, such as cells or beads, with a light beam and detecting the fluorescence emitted from the particles.
[0003] In addition, devices have been developed, each controlling the destination of particles based on fluorescence information detected by flow cytometry and thus sorting particles that emit specific fluorescence from the measured sample. Such sorting devices are also known as cell sorters.
[0004] In recent years, studies have been conducted to increase the number of fluorescent substances that can be measured in a single flow cytometer, thus enabling more detailed particle analysis. However, increasing the number of fluorescent substances increases the size of the measurement data, and therefore complicates the analysis in flow cytometry.
[0005] Therefore, various methods for analyzing measurement data in flow cytometers have been studied. For example, Patent Document 1 below discloses a technique for estimating shape information of a biologically derived target based on the peak position of a pulse waveform detected when the target is irradiated with a light beam.
[0006] Reference List
[0007] Patent documents
[0008] Patent Document 1: JP 2017-58361 A
Summary of the Invention
[0009] Technical issues
[0010] Sorting devices such as cell sorters are required to perform the following process within a limited time after the particles flow through the device: measuring and analyzing the flowing particles, and determining whether to sort each particle based on the measurement and analysis results.
[0011] Therefore, sorting devices such as cell sorters need to determine whether particles are sorting targets more quickly and in real time.
[0012] Solution to the problem
[0013] According to the first disclosure, a biological particle analysis system is provided, comprising: an acquisition unit for acquiring measurement data from biologically derived particles included in a sample; a compression unit for performing data compression processing on the measurement data acquired by the acquisition unit; a gating unit for gating the measurement data compressed by the compression unit into training measurement data and validation measurement data, and then adding labels to the training measurement data; a training unit for establishing a learning model using the training measurement data and the labels; an estimation unit for inputting validation measurement data into the learning model and outputting a confidence level of the validation measurement data; and a threshold setting unit for setting a threshold used when sorting samples based on the confidence level.
Attached Figure Description
[0014] Figure 1 is a block diagram illustrating a configuration example of a biological particle analysis system according to an embodiment.
[0015] Figure 2 is an explanatory diagram illustrating the filter-based detection mechanism of the measurement unit.
[0016] Figure 3 is an explanatory diagram illustrating the spectrum-based detection mechanism of the measurement unit.
[0017] Figure 4 is a block diagram illustrating a configuration example of the information processing apparatus according to this embodiment.
[0018] Figure 5 is a table illustrating an example of information relating to the fluorescence of bio-derived particles obtained from a sorting device.
[0019] Figure 6 is an explanatory diagram showing the results of the clustering process.
[0020] Figure 7 is an explanatory diagram showing the results of the clustering process.
[0021] Figure 8 is an explanatory diagram showing the result of reducing information related to the expression levels of fluorescent substances in bio-derived particles to two dimensions using the t-SNE algorithm.
[0022] Figure 9 is a diagram showing the data used for verification according to the first embodiment.
[0023] Figure 10 is a diagram showing the purity and efficiency of the dimensionality-reduced measurement data according to the first embodiment.
[0024] Figure 11 is a diagram showing the classes and confidence levels of the dimensionality-reduced measurement data according to the first embodiment.
[0025] Figure 12 is a diagram showing the relationship between the indication mode, purity, and efficiency according to the first embodiment.
[0026] Figure 13 is a diagram illustrating the case where a threshold is set for each measurement data in the first embodiment.
[0027] Figure 14 is a diagram illustrating the case where a threshold is set using an ROC curve of the measurement data for verification according to the first embodiment.
[0028] Figure 15 is a diagram illustrating a display example of measurement data after dimensionality reduction according to the first embodiment.
[0029] Figure 16 is a diagram illustrating a display example according to the first embodiment, in which measurement data of the cells to be measured before fluorescence compensation is displayed in colors distinguished by similarity.
[0030] Figure 17 is a diagram illustrating a display example according to the first embodiment, in which measurement data of the cells to be measured after fluorescence compensation is displayed in colors distinguished by similarity.
[0031] Figure 18 is a functional block diagram of the information processing apparatus according to the first embodiment for sorting measurement data using deep learning.
[0032] Figure 19 is a flowchart describing the sorting of measurement data using deep learning in the information processing apparatus according to the first embodiment.
[0033] Figure 20 is a block diagram illustrating a configuration example of a biological particle analysis system according to a variation of the first embodiment.
[0034] Figure 21 is a functional block diagram illustrating a variation of the information processing system according to the first embodiment.
[0035] Figure 22 is a functional block diagram illustrating a variation of the information processing system according to the first embodiment.
[0036] Figure 23 is a diagram describing the concept of a threshold for cluster-based sorting according to the second embodiment.
[0037] Figure 24 is a diagram illustrating the concept of the range when the threshold in the cluster-based sorting according to the second embodiment is set to 50%.
[0038] Figure 25 is a functional block diagram of the information processing apparatus according to the second embodiment for cluster-based sorting.
[0039] Figure 26 is a diagram illustrating a first example of the FlowSOM circuit according to the second embodiment.
[0040] Figure 27 is a diagram illustrating a second example of the FlowSOM circuit according to the second embodiment.
[0041] Figure 28 is a diagram illustrating a third example of the FlowSOM circuit according to the second embodiment.
[0042] Figure 29 is a flowchart describing the cluster-based sorting of the information processing apparatus according to the second embodiment.
[0043] Figure 30 is a functional block diagram of an information processing system according to a variation of the second embodiment.
[0044] Figure 31 is a flowchart illustrating the operation of the first example of the FlowSOM circuit in the second embodiment.
[0045] Figure 32 is a flowchart illustrating the operation of the third example of the FlowSOM circuit in the second embodiment.
[0046] Figure 33 is a functional block diagram of the information processing apparatus according to the third embodiment for IFCM-based sorting.
[0047] Figure 34 is a flowchart describing the IFCM-based sorting of the information processing apparatus according to the third embodiment.
[0048] Figure 35 is a functional block diagram of an information processing system according to a variation of the third embodiment.
[0049] Figure 36 is a hardware configuration diagram illustrating an example of a computer that implements the information processing apparatus according to this embodiment.
Detailed Implementation
[0050] Preferred embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are indicated by the same reference numerals, and redundant descriptions are omitted. The description will be given in the following order.
[0051] 0. Basic Concepts
[0052] 0.1. Configuration of the bioparticle analysis system
[0053] 0.2. Configuration of Information Processing Device
[0054] 1. First Implementation Method
[0055] 1.1. Confidence-based sorting
[0056] 1.2. Use of confidence level
[0057] 1.3. Threshold Setting
[0058] 1.3.1. Threshold setting method 1
[0059] 1.3.2. Threshold setting method 2
[0060] 1.3.3. Threshold setting method 3
[0061] 1.4. Visualization using similarity or confidence levels
[0062] 1.5. Functional block diagram of information processing device 300
[0063] 1.6. Description of the operation
[0064] 1.7. Variations
[0065] 2. Second Implementation Method
[0066] 2.1. Confidence-based sorting (clustering)
[0067] 2.2. Thresholds for Cluster-Based Sorting
[0068] 2.2.1. Determining the threshold for each parameter
[0069] 2.2.2. Determination of the threshold by averaging all parameters
[0070] 2.3. Functional block diagram of information processing device 400
[0071] 2.4. FlowSOM Circuit
[0072] 2.5. Description of the operation
[0073] 2.6. Variations
[0074] 2.7. Flowchart of sorting based on FlowSOM
[0075] 3. Third Implementation Method
[0076] 3.1. Confidence-based sorting (imaging flow cytometry)
[0077] 3.2. Functional block diagram of information processing device 600
[0078] 3.3. Description of the operation
[0079] 3.4. Variations
[0080] 4. Hardware Configuration
[0081] 0. Basic Concepts
[0082] In recent years, a sorting method called machine learning-based sorting has been developed, in which target cells are sorted from a sample containing cells, etc., by using machine learning based on measurement data (including, for example, the intensity of fluorescence emitted from labeled cells). The basic concepts of machine learning-based sorting are disclosed in Reference 2, and reference to Reference 2 may be made as appropriate in this disclosure.
[0083] Reference 2: JP 2020-193877 A
[0084] 0.1. Configuration of the bioparticle analysis system
[0085] A biological particle analysis system will be described.
[0086] First, the configuration of the biological particle analysis system 1 according to the embodiment is described with reference to Figure 1. Figure 1 is a block diagram illustrating a configuration example of the biological particle analysis system 1 according to an embodiment.
[0087] As shown in Figure 1, the biological particle analysis system 1 according to this embodiment includes: a sorting device 10, which acquires measurement data from a sample S and sorts particles to be sorted based on the determination of an information processing device 20; and an information processing device 20, which analyzes the measurement data acquired by the sorting device 10 and determines whether a particle is a sorting target. The biological particle analysis system 1 can be used as, for example, a so-called cell sorter.
[0088] For example, sample S consists of bio-derived particles, such as cells, microorganisms, or biologically related particles, and includes multiple populations of bio-derived particles. The sorting device 10 can analyze measurement data of sample S to classify the bio-derived particles into multiple populations that are internally cohesive and externally separated, and sort a specific classified population. For example, sample S can be: cells, such as animal cells (e.g., blood cells) or plant cells; microorganisms, such as bacteria (e.g., Escherichia coli), viruses (e.g., tobacco mosaic virus), or fungi (e.g., yeast); biologically related particles that constitute cells, such as chromosomes, liposomes, mitochondria, or various organelles; or biologically related polymers, such as nucleic acids, proteins, lipids, polysaccharides, or complexes thereof.
[0089] Sample S may also include synthetic particles, such as latex particles, gel particles, and industrial particles. Industrial particles can be, for example, organic or inorganic polymer materials and metals. Examples of organic polymer materials include polystyrene, styrene-divinylbenzene, and polymethyl methacrylate. Examples of inorganic polymer materials include glass, silica, and magnetic materials. Examples of metals include gold colloids and aluminum. Each particle can be spherical or non-spherical. The particles can include cavities and can be configured to trap bio-derived particles within the cavities. The size and mass of each particle can be suitably selected by those skilled in the art and are not particularly limited.
[0090] Here, sample S is labeled (stained) with one or more fluorescent dyes. Sample S can be labeled with fluorescent dyes by known methods. For example, when sample S includes cells, the cells to be measured are mixed with fluorescently labeled antibodies that selectively bind to antigens present on the cell surface. The antibodies bind to these antigens, and the cells to be measured are thereby labeled with the fluorescent dyes.
[0091] Fluorescently labeled antibodies are antibodies to which a fluorescent dye is bound as a label. Specifically, a fluorescently labeled antibody can be obtained by binding an avidin-conjugated fluorescent dye to a biotinylated antibody via the avidin-biotin reaction. Alternatively, a fluorescently labeled antibody can be an antibody to which a fluorescent dye is bound directly. Note that the antibody can be a polyclonal antibody or a monoclonal antibody. Furthermore, there are no particular limitations on the fluorescent dye used to label cells, and one or more known dyes used for staining cells, etc., can be used.
[0092] The sorting device 10 includes a measuring unit and a sorting unit. The sorting device 10 can be a so-called flow cell type sorting device or a microfluidic chip type sorting device.
[0093] The measurement unit measures the fluorescence emitted from the sample S by irradiating it with a beam of light, such as a laser. Specifically, the measurement unit aligns the particles of sample S in one direction by forming the sheath fluid in which sample S is dispersed into a laminar flow. The measurement unit then irradiates the aligned sample S with a laser having a wavelength capable of exciting the fluorescent dye labeling the sample S, and the fluorescence generated from the laser-irradiated sample S is photoelectrically converted by a known photoelectric conversion element, such as a charge-coupled device (CCD), a complementary metal-oxide-semiconductor (CMOS) sensor, a photodiode, or a photomultiplier tube (PMT). This allows the measurement unit to acquire the fluorescence from the sample S.
[0094] The detection mechanism for fluorescence from sample S in the measurement unit can be filter-based or spectrum-based. Both detection mechanisms are described here with reference to Figure 2 and Figure 3. Figure 2 is an explanatory diagram illustrating a filter-based detection mechanism. Figure 3 is an explanatory diagram illustrating a spectrum-based detection mechanism.
[0095] As shown in Figure 2, in the filter-based detection mechanism, the fluorescence obtained by irradiating the sample S flowing through the flow path 13 with a light beam from the light source 11 is spectrally separated using dichroic mirrors 15A, 15B, and 15C. The filter-based detection mechanism can therefore obtain the fluorescence intensity of each predetermined wavelength band at photodetectors 17A, 17B, and 17C.
[0096] Specifically, each of the dichroic mirrors 15A, 15B, and 15C is a mirror that reflects light within a specific wavelength band and transmits light within other wavelength bands. This allows the measurement unit to separate the fluorescence spectrum into individual wavelength bands by placing dichroic mirrors 15A, 15B, and 15C, which reflect light in different wavelength bands, along the optical path of the fluorescence from sample S. For example, by sequentially placing dichroic mirror 15A (reflecting light in the red band), dichroic mirror 15B (reflecting light in the green band), and dichroic mirror 15C (reflecting light in the blue band) from the side where the fluorescence from sample S is incident, the measurement unit can separate the fluorescence spectrum from sample S into individual wavelength bands.
[0097] As shown in Figure 3, in the spectrum-based detection mechanism, the fluorescence obtained by irradiating the sample S passing through the flow path 13 with a light beam from the light source 11 is spectrally separated by prism 16. This allows the spectrum-based detection mechanism to acquire a continuous fluorescence spectrum at the photodetector array 18.
[0098] Specifically, prism 16 is an optical component that disperses the incident light. This allows the measurement unit to detect the continuous spectrum of fluorescence at a photodetector array 18, which consists of multiple photoelectric conversion elements arranged in an array, by dispersing the fluorescence from the sample S through prism 16.
[0099] The sorting unit sorts the portion of the sample S that is to be sorted. Specifically, the sorting unit first generates droplets of sample S and charges the droplets that contain the sample S to be sorted. The sorting unit then passes the generated droplets through the electric field generated by the deflection plates. At this time, the charged droplets are attracted toward the deflection plate, and thus their direction of movement changes. This allows the sorting unit to separate the droplets of sample S to be sorted from the droplets of sample S not to be sorted, thereby enabling the sorting of the biologically derived particles to be sorted. Note that the sorting method of the sorting unit can be a jet-in-air method or a cuvette flow cell method. In addition, sample S can be sorted by being ejected to the outside of the flow cell or the microfluidic chip, or it can be sorted inside the microfluidic chip. Whether to sort sample S can be determined by logic circuitry (e.g., field-programmable gate array (FPGA) circuitry) included in the sorting device 10, or by instructions from the information processing device 20.
[0100] The information processing device 20 analyzes the measurement data of sample S acquired by the measurement unit and presents the analyzed data to the user. The user can specify the population of bio-derived particles to be sorted by examining the data analyzed by the information processing device 20.
[0101] 0.2. Configuration of Information Processing Device
[0102] Next, a more specific configuration of the information processing device 20 in the biological particle analysis system 1 according to this embodiment is described with reference to Figure 4. Figure 4 is a block diagram illustrating a configuration example of the information processing device 20 in this embodiment.
[0103] As shown in Figure 4, the information processing device 20 includes an acquisition unit 201, an analysis unit 203, a reference spectrum storage unit 205, a data compression processing unit 207, an interface unit 209, a training unit 211, a learning model storage unit 213, and a determination unit 215.
[0104] The acquisition unit 201 acquires information related to the fluorescence of the bio-derived particles from the sorting device 10. Specifically, the sorting device 10 detects the light of the bio-derived particles using a spectrum-based detection mechanism, and the acquisition unit 201 acquires information related to the spectrum of the light from the bio-derived particles. The light from the bio-derived particles can be scattered light or fluorescence from bio-derived particles irradiated with a laser, or both. The acquisition unit 201 can acquire the information related to the light of the bio-derived particles from the sorting device 10 via a network, or via a wired or wireless local area network (LAN) or wired cable.
[0105] For example, the light-related information about the bio-derived particles acquired by the acquisition unit 201 can be the information shown in Figure 5. Figure 5 is a table illustrating an example of light-related information about bio-derived particles obtained from the sorting device 10.
[0106] As shown in Figure 5, for each cell (i.e., bio-derived particle) identification number, the light-related information of the bio-derived particle can be represented by the gains detected by the N photomultiplier tubes (PMTs), designated "PMT1" to "PMTN", of a photodetector array. The N photomultiplier tubes are arranged in a row along the direction in which the prism disperses the light. The spectrum of light emitted by a cell can therefore be obtained by arranging the gains of the N photomultiplier tubes in order as a histogram. Figure 5 shows the results of measuring the gains of the N photomultiplier tubes for each of the N cells.
[0107] The analysis unit 203 derives information related to the characteristics of the bio-derived particles by analyzing the light-related information of the bio-derived particles measured by the sorting device 10. Specifically, the analysis unit 203 separates the individual fluorescent components included in the fluorescence spectrum measured by the sorting device 10, thereby determining the expression level of the fluorescent substance corresponding to each fluorescent component in the bio-derived particles.
[0108] The bio-derived particles to be measured are labeled with multiple fluorescent substances that emit fluorescence with overlapping wavelength distributions. The analysis unit 203 therefore determines the expression level of each fluorescent substance by weighting the wavelength distribution of the fluorescence emitted from each fluorescent substance and fitting the weighted distributions to the fluorescence spectrum measured by the sorting device 10.
[0109] More specifically, first, the analysis unit 203 obtains a reference spectrum from the reference spectrum storage unit 205, indicating the wavelength distribution of fluorescence emitted by each fluorescent substance from the labeled bio-derived particles. Next, the analysis unit 203 estimates the expression level of each fluorescent substance by superimposing the reference spectra of each fluorescent substance and fitting the superimposed reference spectra to the fluorescence spectrum measured by the sorting device 10 using a weighted least squares method.
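The fitting step described above can be sketched as follows. This is a minimal illustration, not the method of the disclosure: the function names `solve_linear_system` and `unmix_wls`, the four-channel reference spectra, and the choice of per-channel weights (the inverse of the measured value, floored at 1) are all assumptions made for the example. The code solves the weighted normal equations with the Python standard library only.

```python
from typing import List, Sequence


def solve_linear_system(matrix: Sequence[Sequence[float]],
                        rhs: Sequence[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix, so row operations update both sides together.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("reference spectra are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def unmix_wls(measured: Sequence[float],
              references: Sequence[Sequence[float]],
              weights: Sequence[float]) -> List[float]:
    """Estimate one abundance per dye by weighted least squares.

    measured:   detector values for one particle (one value per channel)
    references: one reference spectrum per dye, same channel order
    weights:    one non-negative weight per channel
    """
    # Weighted normal equations: (R W R^T) a = R W y
    lhs = [[sum(w * ri[c] * rj[c] for c, w in enumerate(weights))
            for rj in references] for ri in references]
    rhs = [sum(w * ri[c] * measured[c] for c, w in enumerate(weights))
           for ri in references]
    return solve_linear_system(lhs, rhs)


# Hypothetical reference spectra for two dyes over four detector channels.
ref_a = [1.0, 0.5, 0.1, 0.0]
ref_b = [0.0, 0.2, 0.8, 1.0]
# Synthetic, noise-free measurement: 3 parts dye A plus 2 parts dye B.
measured = [3.0 * a + 2.0 * b for a, b in zip(ref_a, ref_b)]
# Assumed weighting: down-weight bright channels (floor avoids division by 0).
weights = [1.0 / max(value, 1.0) for value in measured]

abundances = unmix_wls(measured, [ref_a, ref_b], weights)
print(abundances)  # close to [3.0, 2.0] for this noise-free input
```

With real data the measurement contains noise, so the recovered abundances are estimates rather than exact values, and the weighting scheme would need to match the detector's noise characteristics.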
[0110] The reference spectrum storage unit 205 stores a reference spectrum indicating the wavelength distribution of fluorescence emitted by each fluorescent substance capable of labeling biologically derived particles. The reference spectrum storage unit 205 may be located in the information processing device 20 or the sorting device 10, or it may be located in another information processing device or information processing server that can communicate via a network.
[0111] The data compression processing unit 207 performs data compression processing on the optical information of the bio-derived particles analyzed by the analysis unit 203.
[0112] Data compression processing includes both nonlinear and linear processing. For example, nonlinear processing may include dimensionality reduction, clustering, or grouping. Linear processing may include generating fluorescence information for each fluorescent dye from the spectral information of light from the biologically derived particles by performing fluorescence separation.
[0113] Note that any algorithm of supervised, unsupervised, or weakly supervised machine learning can be used for nonlinear processing. However, the machine learning algorithm expected to be used for nonlinear processing is different from the machine learning algorithm used in training unit 211, which will be described below.
[0114] Specifically, the data compression processing unit 207 can perform clustering processing on information related to the expression levels of the respective fluorescent substances in the bio-derived particles. This allows the data compression processing unit 207 to classify the bio-derived particles into multiple groups that are externally separated and internally cohesive.
[0115] The clustering algorithm is not particularly restricted and can use known clustering algorithms. For example, the data compression processing unit 207 can perform clustering by using an algorithm that can specify the number of clusters (such as the k-means algorithm), or by using an algorithm that automatically determines the number of clusters (such as the FlowSOM algorithm).
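As an illustration of clustering with a specified number of clusters, the following is a minimal k-means sketch using only the Python standard library. It is not the implementation of the disclosure: the function names, the two-dimensional sample data, and the deterministic farthest-first initialization are assumptions chosen to keep the example reproducible.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centroids(points: Sequence[Point], k: int) -> List[List[float]]:
    """Deterministic initialization: start at the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def k_means(points: Sequence[Point], k: int,
            max_iterations: int = 100) -> Tuple[List[int], List[List[float]]]:
    centroids = farthest_first_centroids(points, k)
    labels: List[int] = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical expression levels of two fluorescent substances for six cells.
cells = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = k_means(cells, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

An algorithm such as FlowSOM would replace the fixed `k` with a self-organizing map followed by a further clustering step; that is not shown here.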
[0116] The clustering results of the data compression processing unit 207 can be presented to the user in the formats shown in Figure 6 and Figure 7. Figure 6 and Figure 7 are explanatory diagrams showing the results of the clustering process.
[0117] For example, the clustering results of the data compression processing unit 207 can be presented to the user in the table format shown in Figure 6.
[0118] In Figure 6, a population of 1000 cells (i.e., bio-derived particles) is divided into N clusters, and the cells belonging to each cluster are identified by the identification numbers assigned to each cluster and each cell. Specifically, in the table of Figure 6, cells with identifiers "1", "2", "3", and "10" belong to the cluster with identifier "1"; cells with identifiers "11", "12", "22", and "31" belong to the cluster with identifier "2"; cells with identifiers "4" to "6", "14", and "15" belong to the cluster with identifier "3"; and the cell with identifier "1000" belongs to the cluster with identifier "N". Presenting the data in this table format allows users to easily identify the cells belonging to each cluster.
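A table of this kind can be built from per-cell cluster labels by grouping cell identifiers under each cluster identifier. The sketch below is illustrative only; the function name `cluster_membership_table` is hypothetical, and the identifiers reproduce the three example clusters given in the text.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def cluster_membership_table(cell_ids: Sequence[int],
                             cluster_ids: Sequence[int]) -> Dict[int, List[int]]:
    """Group cell identifiers by the cluster each cell was assigned to."""
    table: Dict[int, List[int]] = defaultdict(list)
    for cell_id, cluster_id in zip(cell_ids, cluster_ids):
        table[cluster_id].append(cell_id)
    # Sort clusters and cells so the table reads in identifier order.
    return {cluster: sorted(cells) for cluster, cells in sorted(table.items())}


# Cell and cluster identifiers taken from the example in the text.
cell_ids = [1, 2, 3, 4, 5, 6, 10, 11, 12, 14, 15, 22, 31]
cluster_ids = [1, 1, 1, 3, 3, 3, 1, 2, 2, 3, 3, 2, 2]

table = cluster_membership_table(cell_ids, cluster_ids)
for cluster, cells in table.items():
    print(f"cluster {cluster}: cells {cells}")
```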
[0119] Alternatively, the clustering results of the data compression processing unit 207 can be presented to the user in the minimum spanning tree format shown in Figure 7.
[0120] In Figure 7, radar charts distinguished by multiple colors (shown in Figure 7 as different types of shading) are arranged in a tree structure in which the radar charts are interconnected. Each radar chart represents a corresponding cell (i.e., a bio-derived particle). Specifically, the distribution and size of each radar chart represent a vector corresponding to the expression levels of the respective fluorescent substances in the cell. The regions colored with different colors represent the clusters to which the cells belong. For example, cells shown in radar charts colored with the same color (i.e., the same type of shading) belong to the same cluster.
[0121] Furthermore, the distance between radar charts in Figure 7 corresponds to the similarity between the cells they represent. That is, in Figure 7, cells whose radar charts are close to each other are similar, while cells whose radar charts are far apart are dissimilar. Presenting the data to the user in this minimum spanning tree format thus makes it possible to show similarity relationships between cells in addition to cluster membership.
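To make the tree structure concrete, the following sketch builds a minimum spanning tree over a handful of expression vectors using Prim's algorithm with Euclidean distance. It is an assumption-laden illustration: the disclosure does not state which algorithm or distance produces the tree in Figure 7, and the node coordinates here are made up.

```python
import math
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, float]


def minimum_spanning_tree(nodes: Sequence[Sequence[float]]) -> List[Edge]:
    """Return the edges (i, j, distance) of a minimum spanning tree over the
    nodes, built with Prim's algorithm on Euclidean distances."""
    if not nodes:
        return []
    in_tree = {0}  # grow the tree from the first node
    edges: List[Edge] = []
    while len(in_tree) < len(nodes):
        # Cheapest edge that connects a tree node to a node outside the tree.
        best = min(
            ((i, j, math.dist(nodes[i], nodes[j]))
             for i in in_tree
             for j in range(len(nodes)) if j not in in_tree),
            key=lambda edge: edge[2],
        )
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Hypothetical two-dimensional expression vectors for four nodes.
nodes = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (1.0, 1.0)]
tree = minimum_spanning_tree(nodes)
for i, j, distance in tree:
    print(f"node {i} -- node {j}: {distance:.2f}")
```

Nodes joined by short edges are similar, which matches the reading of Figure 7 given above; drawing a radar chart at each node is a separate rendering step that is not shown.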
[0122] Optionally, the data compression processing unit 207 can perform dimensionality reduction processing on information related to the expression levels of corresponding fluorescent substances in the bio-derived particles. Thus, by reducing the dimensionality of high-dimensional data containing the expression levels of multiple fluorescent substances, the data compression processing unit 207 can visualize the relationships between high-dimensional data on a low-dimensional map in an easily understandable manner. Therefore, users can more easily classify bio-derived particles into multiple groups by examining the low-dimensional information after dimensionality reduction processing rather than examining the high-dimensional information before dimensionality reduction processing. The data compression processing unit 207 can perform dimensionality reduction processing to reduce the dimensionality by at least one or more dimensions, but for example, reducing the dimensionality of information related to the expression levels of corresponding fluorescent substances in the bio-derived particles to three or fewer dimensions allows for clearer visualization of the relationships between the high-dimensional data.
[0123] The dimensionality reduction algorithm is not particularly limited, and known dimensionality reduction algorithms can be used. For example, the data compression processing unit 207 can perform dimensionality reduction by using algorithms such as PCA, t-SNE, or UMAP.
[0124] The result of the dimensionality reduction processing performed by the data compression processing unit 207 can be presented to the user in the format shown in Figure 8. Figure 8 is an explanatory diagram showing the result of reducing information related to the expression levels of the respective fluorescent substances in bio-derived particles to two dimensions using the t-SNE algorithm.
[0125] For example, in Figure 8, the Euclidean distances between high-dimensional data points, i.e., the expression levels of the fluorescent substances in the cells, are converted into probabilities using the Student's t-distribution and plotted on a two-dimensional coordinate system. This allows users to compare the similarity of the expression levels of the respective fluorescent substances in cells in a simplified way, rather than comparing the expression levels themselves. In Figure 8, for example, the populations to which the cells belong are distinguished by different colors. Figure 8 shows that, through the dimensionality reduction process, cells belonging to the same population are grouped with appropriate external separation and internal cohesion.
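For reference, the standard formulation of t-SNE is summarized below; it may differ in detail from the processing intended by the disclosure. In the standard algorithm, similarities between high-dimensional points x_i are modeled with a Gaussian kernel, similarities between the low-dimensional map points y_i are modeled with a Student's t-distribution with one degree of freedom, and the map is chosen to minimize the Kullback-Leibler divergence between the two. Here n is the number of points and σ_i is a per-point bandwidth set from a user-chosen perplexity.

```latex
p_{j \mid i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
                    {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j \mid i} + p_{i \mid j}}{2n}

q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

\mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```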
[0126] Interface unit 209 includes output and input devices; it outputs information to the user and receives input from the user. Specifically, interface unit 209 can present information processed nonlinearly by data compression processing unit 207 to the user via a cathode ray tube (CRT) display device, liquid crystal display device, organic light-emitting diode (OLED) display device, etc. Interface unit 209 can also receive input from the user specifying the bio-derived particles to be sorted using input devices such as a touch panel, keyboard, mouse, button, microphone, switch, or lever.
[0127] By examining the information after data compression processing output from interface unit 209, users can more easily specify the population of bio-derived particles to be sorted. For example, users can specify the cluster of bio-derived particles to be sorted by examining the information after clustering processing. Alternatively, users can specify the population range of bio-derived particles to be sorted by examining the information after dimensionality reduction processing.
[0128] The following describes the establishment of the learning model performed by training unit 211.
[0129] For example, the established learning model can be stored in the learning model storage unit 213 included in the information processing device 20. This allows the sorting device 10 to sort the biologically derived particles to be sorted according to the sorting control of the information processing device 20. Alternatively, the established learning model can be implemented in logic circuitry, such as the FPGA circuitry provided in the sorting device 10. For example, the sorting device 10 can be provided with a determination unit 215, and the FPGA circuitry provided in the sorting device 10 can be implemented using logic designed based on the type of determination unit 215 to execute the established learning model. The training unit 211 can be designed to execute the logic of the established learning model.
[0130] The machine learning algorithm executed by training unit 211 is supervised learning, where information related to the fluorescence spectrum of the bio-derived particles designated as sorting targets is used as the teacher. For example, training unit 211 can build a learning model using machine learning algorithms such as random forest, support vector machine, or deep learning.
[0131] The biological particle analysis system 1 according to this embodiment uses various unstandardized information as teachers, and therefore, a random forest machine learning algorithm that does not require standardization can be appropriately used. Furthermore, the random forest machine learning algorithm can be appropriately used in the biological particle analysis system 1 according to this embodiment, which essentially requires the rapid determination of whether biologically derived particles are sorting targets, because its learning model is easily implemented in hardware.
[0132] Note that the training unit 211 can determine whether a learning model capable of sufficiently determining the sorting target has been established and notify the user of the determination result. For example, when the number of information items related to the biologically derived particles already used for training, or the ratio of information items used for training to all information, exceeds a threshold, the training unit 211 can notify the user that a learning model capable of sufficiently determining the sorting target has been established.
[0133] Furthermore, when the accuracy of the learning model exceeds a threshold, the training unit 211 can notify the user that a learning model capable of sufficiently determining the sorting target has been established. The accuracy of the learning model can be determined, for example, through N-fold cross-validation. Specifically, the entire information used as the teacher is divided into N segments, and a learning model is built by performing learning using the information included in N-1 of the segments. The learning model then makes determinations on the information included in the remaining segment, and comparing those determinations with the teacher information allows the accuracy of the established learning model to be determined.
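The N-fold procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the training unit 211: the nearest-centroid classifier stands in for the learning model, and the function names, the toy data, and the interleaved way of forming segments are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Predictor = Callable[[Vector], int]


def fit_nearest_centroid(xs: List[Vector], ys: List[int]) -> Predictor:
    """Toy stand-in for the learning model: classify by nearest class centroid."""
    grouped: Dict[int, List[Vector]] = {}
    for x, y in zip(xs, ys):
        grouped.setdefault(y, []).append(x)
    centroids = {
        y: [sum(col) / len(rows) for col in zip(*rows)]
        for y, rows in grouped.items()
    }

    def predict(x: Vector) -> int:
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def n_fold_accuracy(features, labels, n_folds: int, fit) -> float:
    """Fraction of held-out items classified correctly, pooled over all folds."""
    indices = list(range(len(features)))
    # Segment k holds every n_folds-th index starting at k (an assumed split).
    folds = [indices[k::n_folds] for k in range(n_folds)]
    correct = 0
    for k, held_out in enumerate(folds):
        # Train on the other N-1 segments, then judge the remaining one.
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        predict = fit([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(features[i]) == labels[i] for i in held_out)
    return correct / len(indices)


# Hypothetical two-channel intensities for two well-separated populations.
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
            [10.0, 10.0], [11.0, 10.0], [10.0, 11.0]]
labels = [0, 0, 0, 1, 1, 1]
accuracy = n_fold_accuracy(features, labels, n_folds=3, fit=fit_nearest_centroid)
```

The resulting accuracy can then be compared with the notification threshold mentioned above.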
[0134] The learning model storage unit 213 stores the learning model established by the training unit 211. The learning model storage unit 213 can use field-programmable gate array (FPGA) circuits or similar devices to store the learning model as hardware. This enables faster determination of whether a bio-derived particle is a sorting target.
[0135] The determining unit 215 determines, based on the learning model stored in the learning model storage unit 213, whether a bio-derived particle emitting fluorescence measured by the sorting device 10 is a sorting target. When the bio-derived particle is determined to be a sorting target, the determining unit 215 instructs the sorting device 10 to sort the bio-derived particle.
[0136] Note that the learning model storage unit 213 and the determination unit 215 can be located in the sorting device 10.
[0137] Furthermore, when the sorting device 10 can separately sort multiple populations of bio-derived particles, the determination unit 215 can indicate not only whether the bio-derived particles are the sorting target, but also in which collection unit the bio-derived particles are to be collected. The training unit 211 performs machine learning using, as teacher data, information related to the fluorescence spectra of bio-derived particles for which the collection unit to be used after sorting has additionally been specified. This allows the determination unit 215 to output instructions to the sorting device 10 to sort multiple populations of bio-derived particles individually.
[0138] As described above, in machine learning-based sorting, biologically derived particles are sorted according to the determination of determination unit 215. Because the machine learning-based sorting outputs the determination result with the highest confidence, sorting of target particles can be performed even when the confidence is low, if the confidence is higher than other determination results. This is not preferred when higher purity measurement data (confidence relative to the true value) is required.
[0139] The biological particle analysis system according to this embodiment inputs cell information into a machine learning model, performs sorting determination, and then further performs sorting determination on the particles determined to be sortable based on a threshold. An embodiment of the biological particle analysis system will be described below.
[0140] 1. First Implementation Method
[0141] 1.1. Confidence-based sorting
[0142] In machine learning-based sorting, whether to sort is determined from past trends (training data), so the determination is often ambiguous.
[0143] Furthermore, when the Softmax function is applied to the output layer in deep learning, a calculation is performed to ensure that the sum of the confidence scores for each category is 100%.
[0144] When sorting into the highest-probability category is performed without setting a threshold, an event with category 0 = 20%, category 1 = 40%, category 2 = 30%, and category 3 = 10% is still sorted into category 1, the highest-probability category. However, when the user wants to improve purity (confidence), such an event needs to be treated as a non-sorting target because the probability of category 1 is as low as 40%. In this case, sorting efficiency decreases. Here, "category" refers to the classification or grouping of data.
[0145] Therefore, this implementation provides a threshold to sort only events with high confidence. Note that the threshold can be set variably and adjusted according to the user's intent.
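The difference between sorting on the highest confidence alone and sorting with a threshold can be sketched in Python as follows. This is a minimal illustration: the function names are hypothetical, and the choice of "greater than or equal to" for the threshold comparison is an assumption, since the text does not specify whether the comparison is inclusive.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw output-layer values into confidences that sum to 1 (100%)."""
    peak = max(logits)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sort_decision(confidences: Sequence[float], threshold: float) -> Optional[int]:
    """Return the category to sort into, or None for a non-sorting target."""
    best = max(range(len(confidences)), key=lambda c: confidences[c])
    # Assumed convention: sort only when the top confidence reaches the threshold.
    return best if confidences[best] >= threshold else None


# The event from the text: category 0 = 20%, 1 = 40%, 2 = 30%, 3 = 10%.
event = [0.20, 0.40, 0.30, 0.10]
without_threshold = sort_decision(event, threshold=0.0)  # category 1
with_threshold = sort_decision(event, threshold=0.5)     # None (not sorted)
```

With no threshold the event is sorted into category 1 even though its confidence is only 40%; with a threshold of 50% it is treated as a non-sorting target.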
[0146] 1.2. Use of confidence level
[0147] The confidence level from deep learning in the first embodiment is used in the following operations. Here, "confidence level" is the probability that the estimation result in deep learning is correct.
[0148] Step 1: First, flow a portion of the sample and perform dimensionality reduction.
[0149] Step 2: Specify (gate) the population range of biological particles to be sorted from the dimensionality reduction results.
[0150] Step 3: Divide the dimensionality-reduced sample into data for training and data for validation.
[0151] Step 4: Run the trained model on the validation data, vary the threshold, and check how purity and efficiency change.
[0152] In the above operations, both the training and validation data are subjected to dimensionality reduction and gating. Alternatively, as long as the dimensionality reduction algorithm remains reproducible for newly added data, dimensionality reduction and gating can be performed only on the training data, and the validation data can then be added to the dimensionality reduction result along with ground truth labels. A "label" indicates the category to which each data point belongs.
[0153] Figure 9 is a diagram illustrating the data used for validation according to the first embodiment. As shown in Figure 9, the validation data contains events for cells 1, 2, 3, 4, and so on, and a "true value," an "estimated value," and a "confidence level" are associated with each event. The "true value" data is appended in the gating process described below, and the "estimated value" and "confidence level" data are appended in the inference process using the validation data. Here, the "true value" indicates the category to which a cell actually belongs, and the "estimated value" represents the category estimated by machine learning.
[0154] The event in "Cell 1" is associated with a "true value" of category "1", an "estimated value" of category "2", and a confidence level of 55%. The event in "Cell 2" is associated with a "true value" of category "3", an "estimated value" of category "3", and a confidence level of 80%. The event in "Cell 3" is associated with a "true value" of category "5", an "estimated value" of category "5", and a confidence level of 98%. The event in "Cell 4" is associated with a "true value" of category "2", an "estimated value" of category "4", and a confidence level of 40%.
[0155] For example, when the threshold is set to 60% in Figure 9, the events in "Cell 1" and "Cell 4" are not sorted, thus improving purity. However, when the threshold is set too high (such as 90%), even correctly estimated events are more likely to be treated as non-sorting targets, thus reducing the efficiency of event acquisition.
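The effect of the threshold on the four events of Figure 9 can be reproduced with a short Python sketch. The definitions used here are assumptions made for illustration: purity is taken as the fraction of sorted events whose estimated category matches the true value, and efficiency as the fraction of correctly estimated events that are still sorted. The names `Event` and `purity_and_efficiency` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    name: str
    true_category: int
    estimated_category: int
    confidence: float


# The four events listed for Figure 9.
EVENTS = [
    Event("Cell 1", 1, 2, 0.55),
    Event("Cell 2", 3, 3, 0.80),
    Event("Cell 3", 5, 5, 0.98),
    Event("Cell 4", 2, 4, 0.40),
]


def purity_and_efficiency(events: List[Event], threshold: float) -> Tuple[float, float]:
    """Assumed definitions: purity = correct / sorted, efficiency = sorted correct / all correct."""
    kept = [e for e in events if e.confidence >= threshold]
    all_correct = [e for e in events if e.true_category == e.estimated_category]
    kept_correct = [e for e in kept if e.true_category == e.estimated_category]
    # Returning 0.0 when nothing is sorted is an arbitrary convention of this sketch.
    purity = len(kept_correct) / len(kept) if kept else 0.0
    efficiency = len(kept_correct) / len(all_correct) if all_correct else 0.0
    return purity, efficiency


no_threshold = purity_and_efficiency(EVENTS, 0.0)     # (0.5, 1.0)
threshold_60 = purity_and_efficiency(EVENTS, 0.60)    # (1.0, 1.0)
threshold_90 = purity_and_efficiency(EVENTS, 0.90)    # (1.0, 0.5)
```

At 60% the two misestimated events are excluded and purity rises without losing any correct event; at 90% the correctly estimated "Cell 2" is also excluded, so efficiency falls.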
[0156] Figure 10 is a diagram showing the purity and efficiency (yield) of the dimensionality-reduced measurement data in the first embodiment. For example, as shown in Figure 10, the screen displays the purity and efficiency corresponding to the threshold, and also indicates which measurement data are sorted. In Figure 10, the X-axis represents the value of the first dimension of the measurement data subjected to dimensionality reduction, and the Y-axis represents the value of the second dimension. Although Figure 10 shows an example of two-dimensional measurement data, the measurement data can also be represented in three dimensions.
[0157] Here, "purity" is the percentage of correctly labeled measurement data, and "efficiency" is the percentage of correctly labeled measurement data included in the labeled measurement data.
[0158] Black stars, x-shapes, black squares, black triangles, and black circles represent the measurement data subjected to dimensionality reduction, and the portions enclosed by solid-line rectangles in Figure 10 indicate the regions to which marks are assigned. Assume that the measurement data with black stars in mark 101 is correct, the measurement data with black squares in mark 102 is correct, the measurement data with black triangles in mark 103 is correct, and the measurement data with black circles in mark 104 is correct.
[0159] For example, when the threshold is 0% (left diagram in Figure 10), the purity and efficiency are 100% and 100%, respectively, when sorting is performed within the range of mark 101; 70% and 70% within the range of mark 102; 80% and 100% within the range of mark 103; and 100% and 70% within the range of mark 104.
[0160] When the threshold is 70% (middle diagram in Figure 10), the purity and efficiency are 100% and 100%, respectively, when sorting is performed within the range of mark 101; 75% and 60% within the range of mark 102; 88.9% and 100% within the range of mark 103; and 100% and 60% within the range of mark 104.
[0161] When the threshold is 90% (right diagram in Figure 10), the purity and efficiency are 98% and 84%, respectively, when sorting is performed within the range of mark 101; 85.7% and 60% within the range of mark 102; 100% and 87.5% within the range of mark 103; and 100% and 60% within the range of mark 104.
[0162] A display screen such as that shown in Figure 10 allows users to set the threshold while checking how the threshold changes the purity and efficiency quantitatively and how it changes the plot of measurement data identified as sorting targets qualitatively.
[0163] Figure 11 is a diagram illustrating the categories and confidence levels of the dimensionality-reduced measurement data according to the first embodiment.
[0164] For example, a user clicks on a single measurement data point in Figure 11, or selects multiple events by gating or the like. When only one measurement data point is selected, the confidence level for each category of the selected point is displayed. Users can thus check the confidence level for each category of the selected measurement data point.
[0165] When multiple measurement data points are selected, the system displays, for each category, a statistic such as the mean or median of the confidence levels of the selected points. Users can check the confidence level for each category of the selected measurement data.
[0166] In Figure 11, Table 105, displayed on the left, indicates the categories and confidence levels for multiple selected measurement data points. Table 106, displayed on the right, indicates the category and confidence level for the selected measurement data.
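The per-category summary shown when multiple measurement data points are selected can be sketched in Python as follows. This is an illustration only: the function name, the dictionary layout of the result, and the confidence values are hypothetical, and the mean and median are simply the two statistics named in the text.

```python
from statistics import mean, median
from typing import Dict, List, Sequence


def summarize_confidence(selected: Sequence[Sequence[float]]) -> List[Dict[str, float]]:
    """For each category, summarize the confidence levels of the selected events.

    `selected` holds one confidence vector per event, with one entry per category.
    """
    n_categories = len(selected[0])
    return [
        {
            "category": c,
            "mean": mean(event[c] for event in selected),
            "median": median(event[c] for event in selected),
        }
        for c in range(n_categories)
    ]


# Three hypothetical selected events with confidences for two categories.
selection = [[0.2, 0.8], [0.4, 0.6], [0.9, 0.1]]
summary = summarize_confidence(selection)
```

For a single selected event the same function returns that event's own confidence levels, since the mean and median of one value are the value itself.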
[0167] 1.3. Threshold Setting
[0168] 1.3.1. Threshold setting method 1
[0169] In the threshold settings, a predefined threshold can be set for each mode (threshold setting method 1). For example, set purity mode = 95%, normal mode = 75%, yield mode = 0%, etc. The user selects a mode, and then, in response to the user's selection in method 1, the threshold is set.
[0170] Figure 12 is a diagram showing the relationship between the modes and the purity and efficiency according to the first embodiment. Figure 12 shows the purity and efficiency in the yield mode, normal mode, and purity mode, and also shows which measurement data are sorted. Users can refer to the screen shown in Figure 12 to decide which mode to select.
[0171] This method of setting a threshold provides an algorithm in which a mode is selected and a threshold is set according to the selected mode. Thus, users who have difficulty setting thresholds can easily do so.
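Threshold setting method 1 amounts to a lookup from the selected mode to a predefined threshold, which can be sketched in Python as follows. The threshold values are the example values given above (purity mode = 95%, normal mode = 75%, yield mode = 0%); the dictionary keys and the function name are hypothetical.

```python
# Predefined thresholds per mode, using the example values from the text.
MODE_THRESHOLDS = {
    "purity": 0.95,
    "normal": 0.75,
    "yield": 0.0,
}


def threshold_for_mode(mode: str) -> float:
    """Return the predefined confidence threshold for the mode the user selected."""
    try:
        return MODE_THRESHOLDS[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None


selected_threshold = threshold_for_mode("normal")
```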
[0172] 1.3.2. Threshold setting method 2
[0173] Users can enter a freely chosen threshold value in the graphical user interface (GUI) during threshold settings (threshold setting method 2). Thresholds can be entered directly, or by using a slider, etc.
[0174] 1.3.3. Threshold setting method 3
[0175] In threshold setting method 1, the threshold is predetermined for each mode based on past data. In threshold setting method 3, an appropriate threshold is automatically calculated for each mode based on each measurement data.
[0176] Figure 13 is a diagram illustrating the case where a threshold is set for each set of measurement data in the first embodiment. In Figure 13, the thick line represents the purity of the measurement data used for verification, the thick dashed line represents the three-point moving average of the purity of the measurement data used for verification, the thin line represents the efficiency of the measurement data used for verification, and the thin dashed line represents the three-point moving average of the efficiency of the measurement data used for verification.
[0177] Focusing on the purity in Figure 13, the normal mode can be set at the point where the slope becomes gentler, giving a threshold of 62% to 63%. Alternatively, the purity mode can be set at the point where the slope becomes steep again after being gentle, giving a threshold of 87% to 88%. Whether the slope is gentle or steep can be determined, for example, as follows: the slope is gentle when the difference in slope between a given segment of the confidence threshold and the segments before and after it is equal to or less than a predetermined difference, and steep when that difference is equal to or greater than the predetermined difference. Alternatively, the purity mode can be set at the point where the slope becomes gentler again, at a confidence level of approximately 99%. That is, the threshold can be set at a point that has a particular characteristic with respect to the slope of purity or the like.
[0178] It should be noted that the slope of purity used to set the mode and threshold can be replaced by the slope of efficiency, the slope of a combination of purity and efficiency, the slope of a moving average of purity, the slope of a moving average of efficiency, etc. Furthermore, the threshold can be calculated without using a slope.
[0179] Receiver operating characteristic (ROC) curves can be used as a method for automatically determining thresholds. Figure 14 is a diagram illustrating the case where a threshold is set using the ROC curve of the validation measurement data according to the first embodiment.
[0180] In Figure 14, the true positive rate (TPR) is the ratio of the number of actual positive cases correctly identified as positive to the total number of actual positive cases. The false positive rate (FPR) is the ratio of the number of actual negative cases incorrectly identified as positive to the total number of actual negative cases.
[0181] The threshold for balancing purity and efficiency is the threshold that is closest to the top left (0, 1) when plotting the ROC curve, and therefore, this value can be used as a threshold.
[0182] To calculate the threshold closest to (0, 1), a search can be performed using Euclidean distance or other methods to obtain the threshold.
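The search for the threshold closest to (0, 1) can be sketched in Python as follows. This is a minimal illustration under assumptions: each validation event is reduced to a true/false flag for membership in the sorting target and a confidence level for that target, an event counts as identified positive when its confidence is at least the threshold, and the candidate thresholds and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def roc_points(
    is_target: Sequence[bool],
    confidence: Sequence[float],
    thresholds: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Return (threshold, FPR, TPR) for each candidate threshold."""
    positives = sum(1 for t in is_target if t)
    negatives = len(is_target) - positives
    points = []
    for th in thresholds:
        tp = sum(1 for t, c in zip(is_target, confidence) if t and c >= th)
        fp = sum(1 for t, c in zip(is_target, confidence) if not t and c >= th)
        points.append((th, fp / negatives, tp / positives))
    return points


def closest_to_top_left(points: Sequence[Tuple[float, float, float]]) -> float:
    """Pick the threshold whose (FPR, TPR) has the smallest Euclidean distance to (0, 1)."""
    best = min(points, key=lambda p: math.hypot(p[1] - 0.0, p[2] - 1.0))
    return best[0]


# Hypothetical validation data: three target events and four non-target events.
is_target = [True, True, True, False, False, False, False]
confidence = [0.9, 0.8, 0.6, 0.7, 0.3, 0.2, 0.1]
candidates = [0.05, 0.25, 0.5, 0.65, 0.75, 0.95]

points = roc_points(is_target, confidence, candidates)
balanced_threshold = closest_to_top_left(points)
```

For this toy data the threshold of 0.5 keeps all three target events while admitting one of the four non-target events, which places it nearest to the top-left corner.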
[0183] 1.4. Visualization using similarity or confidence levels
[0184] Dimensionality reduction reduces multidimensional information to low-dimensional information; therefore, it is impossible to fully express the relationships in multidimensional space by using low-dimensional space.
[0185] This can result in even similar cell types (such as CD4+ T cells and CD8+ T cells) being distributed in distant locations, which degrades the efficiency of analyzing such similar cell populations.
[0186] In the analytical method disclosed herein, cell groups are selected by gating or other means in dimensionality reduction, the similarity or confidence level between the selected cell group and the cell to be measured is calculated using each measurement data, and the cell is displayed in a color corresponding to the calculated similarity or confidence level.
[0187] Measurement data can be visualized by changing the shading of multiple measurement data points, or by changing the color of multiple measurement data points based on similarity, as an example of visualization methods.
[0188] Similarity can be obtained through distance-based calculations such as Euclidean distance, Manhattan distance, or Chebyshev distance; similarity-based calculations such as cosine similarity, Jaccard coefficient, or Dice coefficient; or other calculations.
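Several of the measures listed above can be sketched in Python as follows, together with one possible way of turning a distance into a display shade. The `shade` mapping (darker for smaller distance, clipped at a maximum distance) and the example vectors are assumptions made for illustration; the text does not specify how a similarity value is converted to a color.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a: Sequence[float], b: Sequence[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def shade(distance: float, max_distance: float) -> float:
    """Assumed mapping: 1.0 (darkest) at zero distance, 0.0 at or beyond max_distance."""
    return 1.0 - min(distance, max_distance) / max_distance


# Hypothetical two-channel values: a selected cell group and a cell to be measured.
selected_group = [0.0, 0.0]
measured_cell = [3.0, 4.0]
d = euclidean(selected_group, measured_cell)
darkness = shade(d, max_distance=10.0)
```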
[0189] This visualization can be performed for analytical purposes, or it can be performed on the measurement data after sorting.
[0190] Figure 15 is a diagram illustrating a display example of measurement data after dimensionality reduction according to the first embodiment. In Figure 15, the dimensionality-reduced measurement data are displayed according to their similarity to the measurement data 111 of the selected cell group. In Figure 15, measurement data are shown in a darker color the higher their similarity to the measurement data 111 of the selected cell group.
[0191] Furthermore, visualizations using similarity or confidence can be applied not only to plots that have undergone dimensionality reduction, but also to other types of plots, such as the data before fluorescence compensation shown in Figure 16 and the data after fluorescence compensation shown in Figure 17.
[0192] Figure 16 is a diagram illustrating a display example according to the first embodiment, in which multiple measurement data of the cells to be measured before fluorescence compensation are displayed in different colors corresponding to similarity. For example, as shown in Figure 16, the measurement data are displayed in different colors based on their similarity to the measurement data 111 of the selected cell group.
[0193] When displaying the measurement data of a cell before fluorescence compensation, the value of each channel (ch) of each photoreceiving system can be shown. Alternatively, the horizontal axis can represent the fluorescence intensity of each fluorescent dye, and the vertical axis can represent the ch value corresponding to each photoreceiving system. Figure 17 is a diagram illustrating a display example according to the first embodiment, in which multiple measurement data of the cells measured after fluorescence compensation are displayed in different colors according to similarity. Here, the X and Y axes in Figure 17 represent the fluorescence intensity after fluorescence compensation for each fluorescent dye (color) included in the measurement data.
[0194] 1.5. Functional block diagram of information processing device 300
[0195] Figure 18 is a functional block diagram of the information processing device 300 according to the first embodiment, which performs the sorting of measurement data in deep learning.
[0196] As shown in Figure 18, the measuring device 311 is connected to the information processing device 300. The measuring device 311 measures a sample (e.g., cells, etc.), adds necessary data (e.g., the color of the cell's fluorescence, the intensity of the fluorescence, etc.) to the measurement data, and outputs the measurement data with the necessary data added to the information processing device 300. The measurement includes events where at least one measurement data is measured (e.g., cell 1).
[0197] The information processing device 300 includes an acquisition unit 312, a preprocessing unit 313, a dimensionality reduction unit 314, a gating unit 315, a partitioning unit 316, a training unit 317, an estimation unit 318, a threshold setting unit 319, a display unit 320, and a sorting unit 321.
[0198] The acquisition unit 312 acquires multiple measurement data from the measuring device 311 outside the information processing device 300. The preprocessing unit 313 performs downsampling, target size reduction, and other operations on the measurement data acquired by the acquisition unit 312.
[0199] The dimensionality reduction unit 314 performs dimensionality reduction on the measurement data preprocessed by the preprocessing unit 313. "Dimensionality reduction" means finding the common features among multiple data points in a multidimensional space and representing the multiple data points with a low dimension that preserves as much of the data distribution relationship in the multidimensional space as possible.
[0200] The dimensionality reduction unit 314 determines the sorting target range after dimensionality reduction of the measurement data block. The measurement data for dimensionality reduction by the dimensionality reduction unit 314 includes measurement data for validation and measurement data for training.
[0201] The original values (e.g., the spectrum) before fluorescence compensation can be used as explanatory variables for the measurement data, or the data after fluorescence compensation can be used. Additionally, during fluorescence compensation, the inverse matrix is calculated, and the solution can be obtained using the Gauss-Jordan method. Furthermore, to suppress batch effects, algorithms such as normalization can be used as preprocessing for clustering.
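The Gauss-Jordan step of fluorescence compensation can be sketched in Python as follows. This is a minimal illustration under assumptions: the 2x2 spillover matrix, the detector readings, and the function name `gauss_jordan_solve` are hypothetical, and the sketch solves the linear system directly instead of forming the inverse matrix explicitly, which gives the same compensated values.

```python
from typing import List, Sequence


def gauss_jordan_solve(matrix: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Use the row with the largest entry in this column as the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row (above and below the pivot).
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


# Hypothetical spillover matrix: column j is the response of the two detectors to dye j.
spillover = [[1.0, 0.2],
             [0.1, 1.0]]
# Hypothetical detector readings produced by dye amounts of 100 and 50.
measured = [110.0, 60.0]
compensated = gauss_jordan_solve(spillover, measured)
```

Here `compensated` recovers the per-dye intensities (approximately 100 and 50) from the mixed detector readings.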
[0202] The gating unit 315 gates the measurement data (including measurement data for validation and measurement data for training) that has been dimensionality-reduced by the dimensionality reduction unit 314. Further, the gating unit 315 adds labels to the training measurement data among the dimensionality-reduced measurement data. The partitioning unit 316 partitions the multiple measurement data points that have been dimensionality-reduced and then gated by the gating unit 315 into measurement data for training and measurement data for validation.
[0203] Training unit 317 performs machine learning to build a learning model using measurement data (measurement data before or after fluorescence compensation) partitioned by partitioning unit 316 and labels added to the training measurement data by gating unit 315. The learning model estimates the measurement data used to determine whether a bio-derived particle is a sorting target and estimates the confidence level.
[0204] The estimation unit 318 feeds at least one or some of the input data from multiple measurement data blocks (measurement data for verification) into the learning model created by the training unit 317, and infers whether to sort the input measurement data blocks.
[0205] The estimation unit 318 performs estimation of the multiple measurement data for verification using the ground truth and confidence levels of the estimates for the multiple measurement data acquired by the acquisition unit 312. Specifically, the estimation unit 318 estimates the multiple measurement data for verification and estimates the confidence levels by using a learning model generated by the training unit 317.
[0206] The estimation unit 318 includes a confidence calculation unit that calculates the confidence of the estimation result based on multiple measurement data used for inference by the estimation unit 318 and information obtained through data compression processing.
[0207] The threshold setting unit 319 sets a threshold for sorting multiple measurement data obtained by the acquisition unit 312 for the measurement data corresponding to the confidence level estimated by the estimation unit 318.
[0208] Display unit 320 displays on the screen the measurement data used for verification, thresholds, classifications (categories), patterns, purity, and efficiency of the measurement data used for verification. Display unit 320 can also display the estimation results of estimation unit 318.
[0209] The sorting unit 321 sorts the measurement data to be sorted from the multiple measurement data obtained by the acquisition unit 312 according to the threshold set by the threshold setting unit 319. Specifically, the sorting unit 321 classifies the remaining measurement data other than the measurement data estimated by the estimation unit 318 with confidence level and the valid measurement data, and sorts the included classified measurement data into each category by using the set threshold.
[0210] The remaining measurement data are the measurement data of samples other than those used for training and validation. After the information processing device 300 issues an instruction to the measuring device 311, the sample for the remaining measurement data is allowed to flow into the measuring device 311. Then, the measuring device 311 sorts the flowing sample and outputs the measurement data of the sorted sample to the acquisition unit 312 of the information processing device 300. Instructions from the information processing device 300 to the measuring device 311 are executed, for example, after a threshold is set in the threshold setting unit 319.
[0211] 1.6. Description of the operation
[0212] Figure 19 is a flowchart describing the sorting of measurement data in deep learning performed by the information processing device 300 according to the first embodiment.
[0213] First, one or more samples from a plurality of samples are fed into the measuring device 311, and one or more samples from the plurality of samples are measured (step S1). Next, one or a portion of the measured samples are preprocessed, such as downsampling and target group reduction (step S2).
[0214] Next, dimensionality reduction is performed on one or a portion of the preprocessed measurement data (step S3), and gating is then performed on one or a portion of the dimensionality-reduced measurement data (step S4). Here, the original values before fluorescence compensation (e.g., spectra) can be used as data undergoing dimensionality reduction and explanatory variables during learning, or the data after fluorescence compensation can be used. Additionally, during fluorescence compensation, the inverse matrix is calculated, and the solution can be obtained using the Gauss-Jordan method. Furthermore, to suppress batch effects, algorithms such as normalization can be used as preprocessing for dimensionality reduction.
[0215] Next, one or a portion of the multiple measurement data that have been dimensionality-reduced by the dimensionality reduction unit 314 and gated by the gating unit 315 are divided into multiple measurement data for training and multiple measurement data for verification (step S5).
[0216] Next, a learning model is generated by training multiple training measurement data sets (step S6). The generated learning model is then used to perform estimation on multiple measurement data sets for validation using ground truth values, and then the confidence level of the estimation is estimated on the multiple measurement data sets for validation (step S7).
[0217] Then, a threshold is set for the estimated confidence level (step S8). The threshold can be set by a user instruction or it can be set automatically. Next, the user checks the values of purity and efficiency, the status of the curves of multiple measurement data displayed on the display unit 320, etc. (step S9), and if the threshold setting is not appropriate (NG in step S9), the process returns to step S8, and the threshold is set again.
[0218] On the other hand, if the threshold setting is appropriate (OK in step S9), the remaining samples are allowed to flow (step S10), the remaining measurement data for the remaining samples are sorted (step S11), and the measurement data to be sorted is determined by the set threshold based on the confidence level of the selected measurement data (step S12).
[0219] 1.7. Variations
[0220] Furthermore, in the first embodiment described above, the information processing device 300 sorts the remaining measurement data. However, since sorting the remaining measurement data takes time, it can also be processed using the measuring device 311.
[0221] Figure 20 is a block diagram illustrating a configuration example of the biological particle analysis system 1 according to a variation of the first embodiment. In the following description, components identical to those in Figure 4 are given the same reference numerals.
[0222] The biological particle analysis system 1 according to the variation is an example in which some functions of the information processing device 20 shown in Figure 4 are separated out and provided in the sorting device 10 connected via a network.
[0223] Specifically, in the biological particle analysis system according to the variation shown in Figure 20, the sorting device 10 includes, for example, an analysis unit 203, a reference spectrum storage unit 205, a data compression processing unit 207, and a training unit 211. The sorting device 10 acquires measurement data from the sample S and sorts the particles to be sorted based on the determination made by the information processing device 20. The information processing device 20 includes an acquisition unit 201, a learning model storage unit 213, and a determination unit 215.
[0224] Note that the information processing device 20 and the sorting device 10 are connected in a communicable manner via public line networks such as the Internet, telephone line networks, and satellite communication networks, as well as various local area networks (LANs) including Ethernet (registered trademark) and wide area networks (WANs).
[0225] In the biological particle analysis system according to the variation, the sorting device 10 can perform the functions with a large computational load (e.g., the analysis unit 203, the data compression processing unit 207, and the training unit 211). On the other hand, the functions of the determination unit 215 and the learning model storage unit 213, for which network-induced delay should be avoided so that determination is rapid and whose computational load is not large, can be assigned to the information processing device 20 directly connected to the sorting device 10.
[0226] Figure 21 is a functional block diagram of an information processing system according to a variation of the first embodiment. In the following description, components identical to those in Figure 18 are given the same reference numerals. As shown in Figure 21, the sorting unit 321 provided in the information processing device 300 can instead be provided in the measuring device 311.
[0227] Alternatively, a preprocessing unit 313 and a threshold setting unit 319 may be provided in the measuring device 311.
[0228] As shown in Figure 21, the information processing device 300 outputs, to the measuring device 311, the threshold set by the threshold setting unit 319 and the remaining measurement data not used by the estimation unit 318 for verification or learning.
[0229] The sorting unit 321 of the measuring device 311 receives the threshold and the measurement data other than those used for estimation, both output from the information processing device 300, and sorts the measurement data using the received threshold.
[0230] The sorting unit (determination unit) of the measuring device 311, which is a bio-derived particle sorting apparatus, inputs optical information measured from the bio-derived particles to be sorted into the learning model created by the training unit 317 and infers whether those particles are sorting targets; when they are inferred to be sorting targets, it performs the sorting determination based on the threshold set by the threshold setting unit 119. The bio-derived particle sorting apparatus sorts the target particles based on the sorting determination made by the determination unit. The bio-derived particles to be sorted are included in the sample. Next, a variation of the information processing system according to the first embodiment will be described. Figure 22 is a functional block diagram illustrating a variation of the information processing system according to the first embodiment.
[0231] In this variation of the information processing system of the first embodiment, as shown in Figure 22, computationally intensive functions (e.g., preprocessing unit 313, dimensionality reduction unit 314, gating unit 315, partitioning unit 316, training unit 317, estimation unit 318, and threshold setting unit 319) can be assigned to a device with high computational power (in the example of Figure 22, the information processing server 301).
[0232] The information processing device 300 may be a cloud-based computer connected to the measurement device 311 via a network. In this case, the cloud-based computer may perform one or more functions of the information processing device 300, such as the dimensionality reduction unit 314, the training unit 317 for machine learning, and the threshold setting unit 319.
[0233] On the other hand, the information processing device 300, which is directly connected to the measuring device 311, can perform the functions for which network-induced delay should be avoided so that determinations are made quickly, as well as the functions with a relatively low computational load.
[0234] Similar to the information processing apparatus 300 according to the first embodiment, the information processing system according to a variation of the first embodiment is capable of appropriately classifying measurement data.
[0235] 2. Second Embodiment
[0236] 2.1. Confidence-based sorting (clustering)
[0237] In the second embodiment, a threshold is set for cluster-based sorting. When sorting is performed using a clustering algorithm, each measurement data is appropriately classified into the cluster with the highest relative similarity. However, it is unclear whether the cluster into which the data is classified is also close in an absolute sense.
[0238] When the absolute distance from a measurement data point to the cluster into which it is classified is long, that measurement data point can be treated as a non-sorting target for users prioritizing purity. In the second embodiment, a threshold is set such that measurement data are sorted only when the distance is less than a certain distance.
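The distinction between relative and absolute closeness can be sketched as follows. The cluster names, coordinates, the use of Euclidean distance, and the helper names `nearest_cluster` and `is_sorting_target` are assumptions for illustration only.

```python
import math
from typing import Dict, List, Tuple


def nearest_cluster(event: List[float],
                    centers: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the relatively closest cluster and the absolute distance to it."""
    best_name, best_dist = "", math.inf
    for name, center in centers.items():
        dist = math.sqrt(sum((e - c) ** 2 for e, c in zip(event, center)))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist


def is_sorting_target(event: List[float], centers: Dict[str, List[float]],
                      target: str, max_distance: float) -> bool:
    """Sort only if the event falls in the target cluster AND is close enough."""
    name, dist = nearest_cluster(event, centers)
    return name == target and dist < max_distance


# Hypothetical representative values of two clusters in a 2-parameter space.
centers = {"A": [100.0, 20.0], "B": [10.0, 80.0]}
near_event = [98.0, 22.0]   # close to A in both the relative and absolute sense
far_event = [300.0, 20.0]   # A is still the nearest cluster, but it is far away
```

Both events are assigned to cluster A, but with `max_distance=10.0` only `near_event` is treated as a sorting target, which is the behavior a purity-oriented user would want.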
[0239] 2.2. Thresholds for Cluster-Based Sorting
[0240] Figure 23 is a diagram used to describe the concept of a threshold for cluster-based sorting according to the second embodiment.
[0241] In Figure 23, the parameters on the horizontal axis indicate, for example, the type of antibody, antigen marker, or CD classification for fluorescent dyes, and the vertical axis indicates the fluorescence intensity of events (e.g., cells) in the data. Solid lines represent representative values of clusters, while dashed lines represent target events (measured values of the remaining data).
[0242] For example, as shown in Figure 23, when a threshold of 50% is set for the representative value of the cluster corresponding to the leftmost parameter, the range of 25% to 75% of the representative value of the cluster is set as the threshold range, as shown in Figure 24. Figure 24 is a simplified diagram used to describe the concept of the range when the threshold in the cluster-based sorting according to the second embodiment is set to 50%. The leftmost target event is determined to be a sorting target when its measured parameter value (fluorescence intensity) falls within 25% to 75% of the representative value of that cluster. In the case of Figure 23, the measured parameter value of the leftmost event is outside the threshold range, and therefore the event is not a sorting target.
[0243] Similarly, when a threshold of 50% is set for the representative value of the cluster corresponding to the second parameter from the left, the range of 25% to 75% of the representative value of the cluster is set as the threshold range, as shown in Figure 24. The second target event from the left is determined to be a sorting target when its measured parameter value falls within the range of 25% to 75% of the representative value of the cluster. In the case of Figure 23, the second measured value from the left does not fall within the threshold range, and therefore the event is not a sorting target.
[0244] Likewise, as shown in Figure 24, when a threshold of 50% is set for the representative value of the cluster corresponding to the third parameter from the left, the range of 25% to 75% of the representative value of that cluster is set as the threshold range. The third measured value from the left is determined to be a sorting target when it falls within the range of 25% to 75% of the representative value of that cluster. In the case of Figure 24, the third measured value from the left falls within the threshold range, and therefore it is determined to be a sorting target.
[0245] In the second embodiment, the threshold for cluster-based sorting can be determined as follows.
[0246] 2.2.1. Determining the threshold for each parameter
A threshold is input as an absolute value; the measured values are sorted if all parameters are within the range of the cluster's representative value ± the threshold.
[0247] Alternatively, a threshold is input as a ratio; the measured values are sorted if all parameters are within the range of the cluster's representative value ± (representative value × threshold).
[0248] In each cluster, sorting is performed if all parameters are within thresholds set by the user using inputs such as the frequency distribution for each parameter.
[0249] 2.2.2. Determination of the threshold by averaging all parameters
A threshold is input as an absolute value; sorting is performed if the average of |measured value - representative value| over all parameters is within the threshold.
[0250] Alternatively, a threshold is input as a ratio; sorting is performed if the average of |measured value - representative value| over all parameters is within (average of the representative values) × threshold.
[0251] Here, "average" refers to the mean value. When random forest is used as the algorithm, the number of trees, or the proportion of trees, can be set as the threshold for the majority vote among the decision trees. The threshold can be determined automatically based on measurement data or can be determined by the user.
[0252] The threshold can be determined not only using the average value but also using the median of multiple measurements included in the cluster. Alternatively, the threshold can be determined using a representative value determined by training unit 317.
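The four threshold rules in 2.2.1 and 2.2.2 can be written compactly as below. This is a sketch under assumptions: the function names are hypothetical, "within the range" is taken to be inclusive, and the ratio rules use the absolute value of the representative value so that negative representative values do not invert the range.

```python
from statistics import mean
from typing import Sequence


def within_absolute(measured: Sequence[float], rep: Sequence[float],
                    threshold: float) -> bool:
    """2.2.1, absolute: every parameter within representative value +/- threshold."""
    return all(abs(m - r) <= threshold for m, r in zip(measured, rep))


def within_ratio(measured: Sequence[float], rep: Sequence[float],
                 ratio: float) -> bool:
    """2.2.1, ratio: every parameter within rep +/- (rep x ratio)."""
    return all(abs(m - r) <= abs(r) * ratio for m, r in zip(measured, rep))


def mean_within_absolute(measured: Sequence[float], rep: Sequence[float],
                         threshold: float) -> bool:
    """2.2.2, absolute: mean of |measured - rep| within the threshold."""
    return mean(abs(m - r) for m, r in zip(measured, rep)) <= threshold


def mean_within_ratio(measured: Sequence[float], rep: Sequence[float],
                      ratio: float) -> bool:
    """2.2.2, ratio: mean of |measured - rep| within mean(rep) x ratio."""
    return mean(abs(m - r) for m, r in zip(measured, rep)) <= abs(mean(rep)) * ratio


# Hypothetical cluster representative values and one event, three parameters.
rep = [100.0, 50.0, 200.0]
event = [110.0, 45.0, 260.0]   # deviations: 10, 5, 60 (mean deviation 25)
```

With these values the per-parameter absolute rule with threshold 20 rejects the event because the third deviation is 60, while the per-parameter ratio rule with 0.5 accepts it. The averaged rules are more tolerant of a single outlying parameter, which is the practical difference between 2.2.1 and 2.2.2.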
[0253] 2.3. Functional block diagram of information processing device 400
[0254] Figure 25 is a functional block diagram for cluster-based sorting in the information processing apparatus 400 according to the second embodiment.
[0255] As shown in Figure 25, the measuring device 411 is connected to the information processing device 400. The measuring device 411 measures a sample (e.g., cells, etc.), adds necessary data (e.g., the color of the cell's fluorescence, the intensity of the fluorescence, etc.) to the measurement data, and outputs the measurement data with the necessary data added to the information processing device 400. The measurement data includes at least one event (e.g., one cell).
[0256] The information processing device 400 includes an acquisition unit 412, a preprocessing unit 413, a category formation and clustering unit 414, a cluster selection unit 415, a display unit 416, a threshold setting unit 417, and a sorting unit 418.
[0257] The acquisition unit 412 acquires multiple measurement data from the measuring device 411 outside the information processing device 400. The preprocessing unit 413 performs downsampling, target population reduction, and other operations on the measurement data acquired by the acquisition unit 412.
[0258] The category forming and clustering unit 414 classifies the multiple measurement data acquired by the acquisition unit 412 into categories. The category forming and clustering unit 414 also classifies the multiple measurement data acquired by the acquisition unit 412 into clusters.
[0259] The cluster selection unit 415 selects the clusters to be sorted from the clusters formed by the category formation and clustering unit 414. The display unit 416 displays a screen showing information on the classified measurement data (e.g., the measurement data, categories, thresholds, patterns, purity, efficiency, and the clusters into which the measurement data are classified). The threshold setting unit 417 sets a threshold for the average value of the multiple measurement data contained in the cluster selected by the cluster selection unit 415, i.e., the representative value of the cluster.
[0260] The median of multiple measurement data points included in the cluster can be used as a threshold. Alternatively, the threshold can be determined using a representative value determined by training unit 317.
[0261] The sorting unit 418 sorts the measurement data to be sorted from the measurement data included in the clusters classified by the category forming and clustering unit 414 based on the threshold set by the threshold setting unit 417.
[0262] Specifically, when all the measurements of multiple measurement data included in the clusters classified by category formation and clustering unit 414 fall within the range of representative value ± threshold, the sorting unit 418 sorts the sampled data included in the clusters classified by category formation and clustering unit 414 as sorting targets.
[0263] Alternatively, when all measurements of multiple measurement data included in a cluster classified by the category formation and clustering unit 414 fall within the range of representative value ± (representative value × threshold), the sorting unit 418 can sort the sampled data included in that cluster as sorting targets.
[0264] 2.4. FlowSOM Circuit
[0265] Figure 26 is a block diagram illustrating a first embodiment of the FlowSOM circuit according to the second embodiment. FlowSOM is a known clustering algorithm. Event data a (with d dimensions) and data b of node (cluster) 1 containing d-dimensional representative values are input into the subtraction unit 551, and the difference (a - b) is calculated, as shown in Figure 26.
[0266] Squaring unit 552 calculates the square (a - b)² of the difference (a - b) calculated by subtraction unit 551 and outputs the result to the summing unit 553. The summing unit 553 calculates the sum Σ(a - b)² of the squares calculated by the squaring unit 552 and outputs the result to the comparison unit 554.
[0267] Comparison unit 554 compares the minimum distance currently held in minimum distance holding unit 555 with the sum Σ(a - b)² output from summing unit 553, and the smaller distance is kept in the minimum distance holding unit 555 as the minimum distance.
[0268] Specifically, when the sum Σ(a - b)² for event data a and data b indicates a closer Euclidean distance, comparison unit 554 replaces the minimum distance held in minimum distance holding unit 555 with that sum. That is, the comparison unit 554 performs comparisons to search for the node with the minimum error. The data is thereby classified into the node (cluster) having the minimum distance held in the minimum distance holding unit 555.
[0269] The data b of nodes 1, 2, ..., and N are input sequentially into subtraction unit 551, but they can also be processed in parallel.
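The data path of Figure 26 (subtract, square, sum, compare, hold) can be mirrored in software as follows. This is a sketch, not the circuit itself: the function names are hypothetical, nodes are indexed from 0 rather than 1, and the held value is the squared Euclidean distance, which has the same minimizer as the Euclidean distance.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Subtraction (551), squaring (552) and summation (553): sum of (a - b)^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_node(event: Sequence[float],
                 nodes: List[Sequence[float]]) -> Tuple[int, float]:
    """Feed node vectors one by one and keep the smallest distance seen so far."""
    min_index, min_dist = -1, float("inf")   # contents of the holding unit (555)
    for index, b in enumerate(nodes):        # nodes are input sequentially
        dist = squared_distance(event, b)
        if dist < min_dist:                  # comparison unit (554)
            min_index, min_dist = index, dist
    return min_index, min_dist


# Hypothetical 2-dimensional event and three node representative vectors.
event = [1.0, 2.0]
nodes = [[0.0, 0.0], [1.0, 2.5], [3.0, 3.0]]
index, dist = nearest_node(event, nodes)
```

Because only the running minimum is stored, the memory needed is independent of the number of nodes, which is presumably what makes this structure attractive for a hardware implementation.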
[0270] Figure 27 is a diagram illustrating a second embodiment of the FlowSOM circuit according to the second embodiment. As shown in Figure 27, the input consists of 100 data points corresponding to nodes 1 to 100, and the number of parallel operations is 10.
[0271] Specifically, each data point b containing d-dimensional representative values from nodes 1, 2, 3, ..., and 10 is input in parallel into subtraction units 551_1 to 551_10. Event data a (with d dimensions) is also input into subtraction units 551_1 to 551_10.
[0272] The data b of nodes 1, 11, 21, ..., and 91 are input sequentially into subtraction unit 551_1. The data b of nodes 2, 12, 22, ..., and 92 are input sequentially into subtraction unit 551_2. The data b of nodes 3, 13, 23, ..., and 93 are input sequentially into subtraction unit 551_3. Likewise, the data b of nodes 10, 20, 30, ..., and 100 are input sequentially into subtraction unit 551_10.
[0273] Event data a (with d dimensions) and the data b of nodes (clusters) 1, 2, ..., N, each containing d-dimensional representative values, are input into the corresponding subtraction units 551_1 to 551_10, and the difference (a - b) is calculated.
[0274] Squaring units 552_1 to 552_10 calculate the square (a - b)² of the difference (a - b) calculated by subtraction units 551_1 to 551_10, respectively, and output the results to summing units 553_1 to 553_10, respectively. Summing units 553_1 to 553_10 calculate the sum Σ(a - b)² of the squares calculated by squaring units 552_1 to 552_10, respectively, and output the results to comparison units 554_1 to 554_10, respectively.
[0275] Comparison units 554_1 to 554_10 compare the minimum distances currently held in minimum distance holding units 555_1 to 555_10 with the sums Σ(a - b)² output from summing units 553_1 to 553_10, respectively, and store the smaller distance as the minimum distance in minimum distance holding units 555_1 to 555_10.
[0276] The node (cluster) with the shortest distance among nodes 1, 11, 21, ..., and 91, the node (cluster) with the shortest distance among nodes 2, 12, ..., and 92, ..., and the node (cluster) with the shortest distance among nodes 10, 20, 30, ..., and 100 are held in minimum distance holding units 555_1 to 555_10, respectively.
[0277] Comparison unit 556 compares the minimum distances held in minimum distance holding units 555_1 to 555_10 and holds the smallest of them as the minimum distance in minimum distance holding unit 257. As a result, the node (cluster) with the minimum distance among nodes 1 to 100 is held in minimum distance holding unit 257.
[0278] Note that although the number of nodes is 100 and the number of parallel operations is 10 in Figure 27, these numbers can be chosen freely according to the available circuit resources. In addition, although Figure 27 shows a configuration with only one comparison unit 556, multiple comparison units 556 may be used to perform the comparison in parallel.
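The node-to-lane assignment of Figure 27 (lane 1 receives nodes 1, 11, 21, ..., lane 2 receives nodes 2, 12, 22, ..., and so on) corresponds to a strided partition followed by a final reduction. The sketch below emulates the lanes one after another in software; in the circuit they would operate concurrently. The function names, the 0-based indexing, and the test data are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared per-dimension differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lane_minimum(event: Sequence[float], nodes: List[Sequence[float]],
                 lane: int, lanes: int) -> Tuple[float, int]:
    """One lane: handles nodes lane, lane + lanes, lane + 2 * lanes, ..."""
    best = (float("inf"), -1)                 # per-lane holding unit (555_k)
    for index in range(lane, len(nodes), lanes):
        dist = squared_distance(event, nodes[index])
        if dist < best[0]:
            best = (dist, index)
    return best


def nearest_node_parallel(event: Sequence[float], nodes: List[Sequence[float]],
                          lanes: int = 10) -> Tuple[int, float]:
    """Reduce the per-lane minima to a single overall minimum (unit 556)."""
    per_lane = [lane_minimum(event, nodes, k, lanes) for k in range(lanes)]
    dist, index = min(per_lane)
    return index, dist


# Hypothetical map of 100 nodes laid out along one axis.
nodes = [[float(i), 0.0] for i in range(100)]
event = [42.3, 0.0]
index, dist = nearest_node_parallel(event, nodes, lanes=10)
```

With 100 nodes and 10 lanes, each lane evaluates 10 distances, so the sequential depth drops from 100 evaluations to 10 plus the final comparison, at the cost of ten copies of the arithmetic units.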
[0279] Figure 28 is a diagram illustrating a third embodiment of the FlowSOM circuit according to the second embodiment. In Figure 28, the symbol ◇ represents a meta-cluster, and the symbol □ represents a node associated with the meta-cluster for which the minimum value is selected.
[0280] In the example of Figure 28, the number of meta-clusters is 8, and the number of nodes associated with the meta-cluster for which the minimum value is selected is 10, but the numbers of meta-clusters and nodes are not limited to these. In addition, although Figure 28 illustrates the case where nodes 1 through 10 associated with the meta-cluster are computed serially, the computations can also be performed in parallel.
[0281] In the third embodiment of the FlowSOM circuit shown in Figure 28, the processing from subtraction unit 571 through minimum distance holding unit 575 first finds the meta-cluster with the minimum distance among meta-clusters 1 to 8. Thereafter, the node with the minimum distance is finally selected from the 10 nodes associated with that meta-cluster.
[0282] Event data a (with d dimensions) and data b of a node (cluster) that is associated with the selected meta-cluster having the minimum error and that contains d-dimensional representative values are input into subtraction unit 571, and the difference (a - b) is calculated, as shown in Figure 28.
[0283] Squaring unit 572 calculates the square (a - b)² of the difference (a - b) calculated by subtraction unit 571 and outputs the result to the summing unit 573. The summing unit 573 calculates the sum Σ(a - b)² of the squares calculated by the squaring unit 572 and outputs the result to comparison unit 574.
[0284] Comparison unit 574 compares the minimum distance currently held in minimum distance holding unit 575 with the sum Σ(a - b)² output from summing unit 573, and the smaller distance is kept in the minimum distance holding unit 575 as the minimum distance.
[0285] Specifically, when the sum Σ(a - b)² for event data a and data b indicates a closer Euclidean distance, comparison unit 574 replaces the minimum distance held in minimum distance holding unit 575 with that sum. That is, comparison unit 574 performs comparisons to search for the node with the minimum error. Thus, the node (cluster) having the minimum distance held in minimum distance holding unit 575 is selected.
[0286] The data b of node 1, node 2, ..., and node 10 associated with the meta-cluster with the smallest distance error are input sequentially into the subtraction unit 571, but they can also be processed in parallel.
[0287] 2.5. Description of the operation
[0288] Figure 29 is a flowchart describing cluster-based sorting in the information processing apparatus 400 according to the second embodiment.
[0289] First, one or some of a plurality of samples are passed through the measuring device 411 and measured (step S21). Next, the measurement data of those samples are preprocessed, for example by downsampling and target population reduction (step S22).
[0290] Then, clustering is performed to classify the preprocessed measurement data into categories (step S23), and the cluster to be sorted is selected from the clusters thus formed (step S24).
[0291] Next, a threshold is set for the average value, or representative value, of the multiple measurement data included in the selected cluster (step S25). Note that the threshold can be the median value of the multiple measurement data included in the selected cluster. Next, the user checks the efficiency value displayed on the display unit 416 (step S26), and if the efficiency is not 100% (not OK in step S26), the process returns to step S25, and the threshold is set again. Note that the efficiency value can be a freely chosen value determined by the user, instead of 100%.
[0292] On the other hand, if the efficiency is 100% (OK in step S26), the remaining samples are allowed to flow (step S27), and clustering is performed on the remaining multiple measurement data (step S28). Then, from the remaining measurement data contained in the classified clusters, the measurement data to be sorted are determined using the set threshold (step S29).
[0293] Here, as explanatory variables for the data to be clustered, either the original values before fluorescence compensation (e.g., spectra) or the fluorescence-compensated data can be used. Additionally, during fluorescence compensation, the inverse matrix is calculated, and the Gauss-Jordan method can be used to obtain the solution. Furthermore, to suppress batch effects, algorithms such as normalization can be used as preprocessing for clustering.
[0294] 2.6. Variations
[0295] In the second embodiment described above, the information processing device 400 classifies the remaining measurement data. However, since classifying the remaining measurement data takes time, this processing can instead be performed by the measuring device 411.
[0296] Figure 30 is a functional block diagram of an information processing system according to a variation of the second embodiment. In the following description, components identical to those in Figure 25 are given the same reference numerals. As shown in Figure 30, the sorting unit 418 provided in the information processing device 400 can instead be provided in the measuring device 411.
[0297] As shown in Figure 30, the threshold set by the threshold setting unit 417 and the clusters formed by the category forming and clustering unit 414 are output from the information processing device 400 to the measuring device 411.
[0298] The sorting unit 418 of the measuring device 411 receives the threshold and the clusters output from the information processing device 400, and sorts the multiple measurement data contained in the clusters by using the received threshold.
[0299] The information processing system according to the variant of the second embodiment enables appropriate classification of measurement data, similar to the information processing apparatus 400 according to the second embodiment.
[0300] 2.7. Flowchart of sorting based on FlowSOM
[0301] Next, the operation of the first embodiment of the FlowSOM circuit shown in Figure 26 will be described. Figure 31 is a flowchart illustrating the operation of the first embodiment of the FlowSOM circuit in the second embodiment.
[0302] As shown in Figure 31, i is set to 0 (step S31), and it is determined whether i < d (d: number of dimensions) is satisfied (step S32). When i < d is satisfied (Yes in step S32), the difference between the i-th dimension value of the representative vector of each node and the i-th dimension value of the event to be sorted is calculated (step S33).
[0303] Next, the difference calculated in step S33 between the i-th dimension value of the representative vector of each node and the i-th dimension value of the event to be sorted is squared (step S34), and the squared difference is accumulated (step S35). Then, i is set to i+1 (step S36), and the process returns to step S32.
[0304] In step S32, if i < d is not satisfied (No in step S32), the node with the smallest accumulated value of the squared differences is determined (step S37), and the process ends. Thus, the node (cluster) with the smallest distance error is selected.
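The loop of Figure 31 iterates over dimensions rather than over nodes, keeping one accumulator per node. A minimal sketch follows, with step numbers noted in comments; the function name and the example vectors are assumptions, and nodes are indexed from 0.

```python
from typing import List, Sequence


def nearest_node_by_dimension(event: Sequence[float],
                              node_vectors: List[Sequence[float]]) -> int:
    """Follow steps S31 to S37: accumulate squared differences per dimension."""
    d = len(event)
    accumulated = [0.0] * len(node_vectors)   # one running sum per node
    i = 0                                      # S31
    while i < d:                               # S32
        for n, vector in enumerate(node_vectors):
            diff = vector[i] - event[i]        # S33: difference in dimension i
            accumulated[n] += diff * diff      # S34 (square) and S35 (accumulate)
        i += 1                                 # S36
    # S37: the node whose accumulated squared difference is smallest.
    return min(range(len(accumulated)), key=accumulated.__getitem__)


# Hypothetical 2-dimensional event and three node representative vectors.
event = [1.0, 2.0]
node_vectors = [[0.0, 0.0], [1.0, 2.5], [3.0, 3.0]]
selected = nearest_node_by_dimension(event, node_vectors)
```

The result is the same node that a node-by-node search would return; only the loop order differs, which matters when all nodes are evaluated side by side for one dimension at a time.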
[0305] Next, the operation of the third embodiment of the FlowSOM circuit shown in Figure 28 will be described. Figure 32 is a flowchart illustrating the operation of the third embodiment of the FlowSOM circuit in the second embodiment.
[0306] As shown in Figure 32, i is set to 0 (step S41), and it is determined whether i < d (d: number of dimensions) is satisfied (step S42). When i < d is satisfied (Yes in step S42), the difference between the i-th dimension value of the representative vector of each meta-cluster and the i-th dimension value of the event to be sorted is calculated (step S43).
[0307] Next, the difference calculated in step S43 between the i-th dimension value of the representative vector of each meta-cluster and the i-th dimension value of the event to be sorted is squared (step S44), and the squared difference is accumulated (step S45). Then, i is set to i+1 (step S46), and the process returns to step S42.
[0308] If i < d is not satisfied in step S42 (No in step S42), the meta-cluster with the smallest accumulated value of the squared differences is determined (step S47), and j is set to 0 (step S48).
[0309] Next, it is determined whether j < d (d: number of dimensions) is satisfied (step S49). When j < d is satisfied (Yes in step S49), the difference between the j-th dimension value of the representative vector of each node belonging to the meta-cluster determined in step S47 and the j-th dimension value of the event to be sorted is calculated (step S50).
[0310] Next, the difference calculated in step S50 between the j-th dimension value of the representative vector of each node belonging to the meta-cluster and the j-th dimension value of the event to be sorted is squared (step S51), and the squared difference is accumulated (step S52). Next, j is set to j+1 (step S53), and the process returns to step S49.
[0311] In step S49, if j < d is not satisfied (No in step S49), the node with the smallest accumulated value of the squared differences is determined (step S54), and the process ends. Thus, the node (cluster) with the smallest distance error is selected.
[0312] In the third embodiment, the meta-cluster with the shortest Euclidean distance is selected first, and distances are then calculated only to the nodes belonging to that meta-cluster; a reduction in computational resources and a faster processing speed can therefore be expected.
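The saving described above can be made concrete with a two-stage search sketch. The layout of 8 meta-clusters with 10 nodes each follows the example of Figure 28, while the function names, the coordinates, and the evaluation counter are assumptions added for illustration.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared per-dimension differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def argmin_distance(event: Sequence[float],
                    vectors: List[Sequence[float]]) -> int:
    """Index of the vector closest to the event."""
    return min(range(len(vectors)),
               key=lambda k: squared_distance(event, vectors[k]))


def two_stage_nearest(event: Sequence[float],
                      meta_centers: List[Sequence[float]],
                      nodes_by_meta: List[List[Sequence[float]]]
                      ) -> Tuple[int, int, int]:
    """Stage 1 picks a meta-cluster; stage 2 searches only its own nodes."""
    meta = argmin_distance(event, meta_centers)          # steps S41 to S47
    node = argmin_distance(event, nodes_by_meta[meta])   # steps S48 to S54
    evaluations = len(meta_centers) + len(nodes_by_meta[meta])
    return meta, node, evaluations


# Hypothetical layout: 8 meta-clusters spaced 10 apart, each with 10 nodes.
meta_centers = [[10.0 * m, 0.0] for m in range(8)]
nodes_by_meta = [[[10.0 * m - 4.5 + k, 0.0] for k in range(10)]
                 for m in range(8)]
meta, node, evaluations = two_stage_nearest([32.2, 0.0], meta_centers, nodes_by_meta)
```

In this layout the search needs 8 + 10 = 18 distance evaluations instead of the 80 required to compare the event against every node. The node found is the nearest one within the selected meta-cluster, which in general need not be the globally nearest node.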
[0313] 3. Third Embodiment
[0314] 3.1. Confidence-based sorting
[0315] The second embodiment has already been described from the perspective of a conventional flow cytometer (FCM) that primarily uses fluorescence intensity without using images. In the third embodiment, the case of sorting based on confidence level using an imaging flow cytometer (IFCM) will be described.
[0316] In a normal FCM, fluorescence intensity can be measured, whereas in an IFCM, images of individual cells can also be captured. In the third embodiment, the population to be sorted (target variable) is specified by applying processing such as dimensionality reduction or clustering to fluorescence intensity or images as input, and then learning is performed using fluorescence intensity or images as explanatory variables. Then, an appropriate threshold is set, and sorting is performed.
[0317] Here, the fluorescence intensity can be either the data before or after fluorescence compensation, and the image can be the original image data or the image data after preprocessing such as convolution. Furthermore, the method described in "1.3 Threshold Setting" can be used to set the threshold.
[0318] 3.2. Functional block diagram of information processing device 600
[0319] Figure 33 is a functional block diagram for IFCM-based sorting in the information processing apparatus 600 according to the third embodiment.
[0320] As shown in Figure 33, the measuring device 611 is connected to the information processing device 600. The measuring device 611 measures a sample, adds the required data to the measurement data, and outputs the measurement data with the required data added to the information processing device 600. The measurement data includes at least one event (e.g., one cell).
[0321] The information processing device 600 includes an acquisition unit 612, a preprocessing unit 613, a determination unit 614, a dimensionality reduction / clustering unit 615, a population designation unit 616, a partitioning unit 617, a training unit 618, an estimation unit 619, a display unit 620, a threshold setting unit 621, and a sorting unit 622.
[0322] The acquisition unit 612 acquires multiple measurement data from the measurement device 611 outside the information processing device 600. The preprocessing unit 613 performs downsampling, target population reduction, and other operations on the measurement data acquired by the acquisition unit 612.
[0323] The determination unit 614 determines, for the multiple measurement data acquired by the acquisition unit 612, whether fluorescence data or image data is to be used as input. The dimensionality reduction / clustering unit 615 performs dimensionality reduction on the fluorescence data or image data determined by the determination unit 614, or classifies the fluorescence data or image data into clusters.
[0324] The population designation unit 616 designates the population to be sorted based on the fluorescence data or image data reduced in dimensionality by the dimensionality reduction / clustering unit 615, or based on the clusters into which the data have been classified. The partitioning unit 617 partitions the fluorescence data or image data designated by the population designation unit 616 into fluorescence data or image data for training and fluorescence data or image data for validation.
[0325] Training unit 618 performs training using the multiple measurement data for training divided by partitioning unit 617, and generates a learning model. Estimation unit 619 performs estimation on the multiple measurement data for validation included in the population designated by population designation unit 616, and estimates the confidence level of the estimation for the multiple measurement data for validation. Specifically, estimation unit 619 estimates the confidence level of fluorescence data or image data used for validation using the learning model generated by training unit 618.
[0326] In addition to the purity and efficiency of the measurement data used for verification, the display unit 620 can also display, as needed, the measurement data to be verified, classifications (categories), thresholds, modes, etc. on the screen.
[0327] The threshold setting unit 621 sets a threshold for classifying multiple measurement data acquired by the acquisition unit 612 based on the confidence level estimated by the estimation unit 619. The sorting unit 622 sorts the classified fluorescence data or image data or measurement data fragments included in the classified clusters, which have been reduced in size by the dimensionality reduction / clustering unit 615, into measurement data to be sorted, based on the threshold set by the threshold setting unit 621.
[0328] The sorting unit 622 sorts the remaining fluorescence data or image data from the multiple measurement data acquired by the acquisition unit 612, excluding fluorescence data or image data used for verification and fluorescence data or image data used for training, into sorting targets.
[0329] Specifically, when all the measurements of multiple measurement data included in the classified cluster fall within the range of the representative value ± the threshold, the sorting unit 622 sorts the sampled data included in the classified cluster as sorting targets.
[0330] When all measurements of multiple measurement data included in the classified cluster fall within the range of representative value ± representative value × threshold, the sorting unit 622 can sort the sampled data included in the cluster classified by the clustering unit as sorting targets.
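The two range conditions in [0329] and [0330] can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the disclosure: the representative value is taken to be the per-channel median of the cluster, a single scalar threshold is applied to every channel, and the function names are hypothetical.

```python
from statistics import median

def representative(cluster):
    """Per-channel median of a cluster (an assumed choice of representative value)."""
    return [median(channel) for channel in zip(*cluster)]

def within_absolute(event, rep, threshold):
    """True if every channel lies within representative value ± threshold."""
    return all(abs(x - r) <= threshold for x, r in zip(event, rep))

def within_relative(event, rep, threshold):
    """True if every channel lies within representative value ± representative value × threshold."""
    return all(abs(x - r) <= abs(r) * threshold for x, r in zip(event, rep))

# Hypothetical two-channel cluster and one incoming event.
cluster = [(100.0, 50.0), (104.0, 52.0), (96.0, 47.0)]
rep = representative(cluster)  # [100.0, 50.0]
event = (108.0, 53.0)

print(within_absolute(event, rep, threshold=10.0))  # True: both channels within ±10
print(within_relative(event, rep, threshold=0.05))  # False: channel 0 deviates by 8 > 100 × 0.05
```

The relative form scales the accepted window with the signal level, which may suit channels whose spread grows with intensity; the absolute form keeps one fixed window for every channel.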
[0331] 3.3. Description of the operation
[0332] Figure 34 This is a flowchart describing the IFCM-based sorting process in the information processing apparatus 600 according to the third embodiment.
[0333] First, a portion of the sample, which contains multiple particles, is fed into the measuring device 611 and measured (step S131). Next, the measurement data of the measured portion are preprocessed, for example by downsampling and target population reduction (step S132).
[0334] Next, it is determined whether the fluorescence or the images from the preprocessed measurement data will be used as input (step S133). Then, the fluorescence or images determined in step S133 are subjected to dimensionality reduction and clustering (step S134). Next, the population to be sorted is designated from the clusters obtained in step S134 (step S135). Here, a "population" is an island obtained by reducing the dimensionality of the fluorescence or images, and the dimensionality-reduced fluorescence or images that form the island are selected.
[0335] Here, the input data for dimensionality reduction and clustering, as well as the explanatory variables during training, can be either the original values before fluorescence compensation, such as the spectrum, or the fluorescence-compensated data. When using images, the original data can be used, or the image can be preprocessed, such as through convolution. Furthermore, during fluorescence compensation, the inverse matrix is calculated, and the solution can be obtained using the Gauss-Jordan method. Additionally, to suppress batch effects, algorithms such as normalization can be used as preprocessing.
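The fluorescence compensation mentioned above can be illustrated with a small Gauss-Jordan inversion. This is a sketch under assumptions that the text does not state: the observed detector signals are modeled as a square spillover matrix multiplied by the true fluorochrome signals, and the matrix values and function names are invented for illustration.

```python
def gauss_jordan_inverse(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Build the augmented matrix [M | I].
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Choose the row with the largest pivot candidate for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]

def compensate(observed, spillover):
    """Recover the assumed true signals as inverse(spillover) · observed."""
    inv = gauss_jordan_inverse(spillover)
    n = len(observed)
    return [sum(inv[i][j] * observed[j] for j in range(n)) for i in range(n)]

# Hypothetical 2-channel example: 20 % and 10 % cross-talk between channels.
spillover = [[1.0, 0.2],
             [0.1, 1.0]]
true_signal = [100.0, 50.0]
observed = [sum(spillover[i][j] * true_signal[j] for j in range(2)) for i in range(2)]
recovered = compensate(observed, spillover)
print(observed, recovered)
```

In practice the inverse would be computed once per panel and reused for every event, since only the observed vector changes from event to event.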
[0336] Next, the multiple measurement data contained in the population specified in step S135 are divided into multiple measurement data for training and multiple measurement data for validation (step S136).
[0337] Next, the model is trained using the partitioned training measurement data, with the fluorescence or images as explanatory variables, and a learning model is generated (step S137). Then, the generated learning model is used to perform estimation on the validation measurement data, yielding an estimated label and a confidence level for each validation measurement (step S138).
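Steps S136 to S138 can be sketched end to end as follows. The disclosure does not specify the type of learning model, so a nearest-centroid classifier with a softmax over negative distances stands in for it here purely as an assumption; the synthetic data, the 80/20 split, and all function names are likewise hypothetical.

```python
import math
import random

def split(events, labels, train_fraction=0.8, seed=0):
    """Shuffle and partition labelled events into training and validation sets (step S136)."""
    idx = list(range(len(events)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * train_fraction)

    def pick(ids):
        return [events[i] for i in ids], [labels[i] for i in ids]

    return pick(idx[:cut]), pick(idx[cut:])

def train(events, labels):
    """Fit one centroid per label (a stand-in for the learning model of step S137)."""
    sums, counts = {}, {}
    for x, y in zip(events, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def estimate(model, x):
    """Return (estimated label, confidence) using a softmax over negative distances (step S138)."""
    dists = {y: math.dist(x, c) for y, c in model.items()}
    nearest = min(dists.values())
    weights = {y: math.exp(-(d - nearest)) for y, d in dists.items()}
    label = min(dists, key=dists.get)
    return label, weights[label] / sum(weights.values())

# Synthetic two-channel events: a "target" population and an "other" population.
rng = random.Random(1)
events = [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(50)] + \
         [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
labels = ["target"] * 50 + ["other"] * 50

(train_x, train_y), (val_x, val_y) = split(events, labels)
model = train(train_x, train_y)
results = [estimate(model, x) for x in val_x]
print(results[:3])
```

Keeping the validation events out of training is what makes the confidence values usable for choosing a threshold afterwards, since they then reflect how the model behaves on data it has not seen.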
[0338] Then, a threshold is set for the estimated confidence level (step S139). Next, the user checks the purity and efficiency values, the state of the plot of the measurement data, etc. displayed on the display unit 620 (step S140). If the threshold setting is not appropriate (not OK in step S140), the process returns to step S139 and the threshold is set again.
[0339] On the other hand, if the threshold is set appropriately (OK in step S140), the remaining sample is allowed to flow (step S141), and the resulting remaining measurement data are classified into clusters (step S142). Then, based on the set threshold, sorting is performed according to the confidence level of the remaining measurement data contained in the classified clusters (step S143).
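The threshold review loop of steps S139 and S140 can be illustrated by computing purity and efficiency for several candidate thresholds. The definitions used here are assumptions, since this passage does not define them: purity is the fraction of accepted events that are true targets, and efficiency is the fraction of true targets that are accepted. The validation results below are invented.

```python
def purity_and_efficiency(results, truths, threshold, target="target"):
    """results: list of (estimated_label, confidence); truths: list of true labels."""
    # An event is accepted for sorting if it is estimated as the target
    # and its confidence is at or above the threshold.
    accepted = [t for (label, conf), t in zip(results, truths)
                if label == target and conf >= threshold]
    true_pos = sum(1 for t in accepted if t == target)
    n_targets = sum(1 for t in truths if t == target)
    # Both ratios are reported as 0.0 when their denominator is empty.
    purity = true_pos / len(accepted) if accepted else 0.0
    efficiency = true_pos / n_targets if n_targets else 0.0
    return purity, efficiency

# Hypothetical validation output and ground-truth labels.
results = [("target", 0.95), ("target", 0.80), ("target", 0.60),
           ("other", 0.90), ("target", 0.55)]
truths = ["target", "target", "other", "other", "target"]

for threshold in (0.5, 0.7, 0.9):
    purity, efficiency = purity_and_efficiency(results, truths, threshold)
    print(f"threshold={threshold:.1f} purity={purity:.2f} efficiency={efficiency:.2f}")
```

With these invented numbers, raising the threshold from 0.5 to 0.7 raises purity from 0.75 to 1.00 while efficiency falls from 1.00 to 0.67, which is the kind of trade-off the user would weigh when deciding whether the threshold is appropriate.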
[0340] 3.4. Variations
[0341] In the third embodiment, the information processing device 600 is described as classifying the remaining measurement data. However, because this classification takes time, the measuring device 611 can perform the processing instead.
[0342] Figure 35 is a functional block diagram of an information processing system according to a variation of the third embodiment. Note that in the following description, components that are the same as those in Figure 30 are given the same reference numerals. As shown in Figure 35, the sorting unit 622 provided in the information processing device 600 can instead be provided in the measuring device 611.
[0343] As shown in Figure 35, the information processing device 600 outputs the threshold set by the threshold setting unit 621 and the clusters produced by the dimensionality reduction / clustering unit 615 to the measuring device 611.
[0344] The sorting unit 622 of the measuring device 611 receives the threshold and the clusters output from the information processing device 600, and sorts the multiple measurement data contained in the clusters by using the received threshold.
[0345] The information processing system according to the variation of the third embodiment is capable of performing IFCM-based sorting appropriately.
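The division of labor in this variation can be sketched as a small hand-off: the information processing device packages the threshold and the clusters, and the sorting logic runs on the measuring device. Everything below is hypothetical, including the payload layout, the class names, and the decision rule, which assumes each cluster is summarized by a representative value and tested with the representative value ± threshold range.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SortingConfig:
    """Hypothetical payload sent from the information processing device to the measuring device."""
    threshold: float
    target_clusters: dict  # cluster id -> representative value per channel

class MeasuringDeviceSorter:
    """Stand-in for a sorting unit hosted on the measuring device."""

    def __init__(self, config):
        self.config = config

    def decide(self, event):
        """Return the id of the first target cluster whose range contains the event, else None."""
        for cluster_id, rep in self.config.target_clusters.items():
            if all(abs(x - r) <= self.config.threshold for x, r in zip(event, rep)):
                return cluster_id
        return None

# The information processing device would build this once after threshold setting.
config = SortingConfig(threshold=10.0, target_clusters={"A": (100.0, 50.0)})
sorter = MeasuringDeviceSorter(config)

print(sorter.decide((108.0, 53.0)))  # A
print(sorter.decide((130.0, 50.0)))  # None
```

Because the per-event decision needs only the small configuration object, it can run next to the detector without a round trip to the information processing device for each event.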
[0346] 4. Hardware Configuration
[0347] Figure 36 is a hardware configuration diagram illustrating an example of a computer that implements the information processing apparatus 20, 300, 400, or 600 or the measuring device 311, 411, or 611 according to an embodiment.
[0348] Computer 1000 includes CPU 1100, RAM 1200, read-only memory (ROM) 1300, hard disk drive (HDD) 1400, communication interface 1500, and input / output interface 1600. The units of computer 1000 are connected together via bus 1050.
[0349] The CPU 1100 operates based on programs stored in ROM 1300 or HDD 1400 and controls each unit. For example, the CPU 1100 loads programs stored in ROM 1300 or HDD 1400 into RAM 1200 and executes processing operations corresponding to various programs.
[0350] The ROM 1300 stores boot programs (such as the Basic Input / Output System (BIOS) executed by the CPU 1100 when the computer 1000 is activated), programs that depend on the hardware of the computer 1000, etc.
[0351] HDD 1400 is a computer-readable storage medium that non-transitorily stores programs executed by CPU 1100, data used by the programs, etc. Specifically, HDD 1400 is a storage medium that stores an application program according to this disclosure, which is an example of program data 1450.
[0352] Communication interface 1500 is an interface for computer 1000 to connect to external network 1550 (e.g., the Internet). For example, CPU 1100 receives data from another device via communication interface 1500 and sends data generated by CPU 1100 to another device via communication interface 1500.
[0353] Input / output interface 1600 is an interface for connecting input / output device 1650 to computer 1000. For example, CPU 1100 receives data from input devices such as a keyboard and a mouse via input / output interface 1600. CPU 1100 also sends data to output devices such as a display, a speaker, or a printer via input / output interface 1600. Input / output interface 1600 can also be used as a media interface for reading programs stored in a predetermined storage medium. Examples of such media include optical recording media such as a digital versatile disc (DVD) or a phase-change rewritable disc (PD), magneto-optical recording media such as a magneto-optical disc (MO), magnetic tape media, magnetic recording media, and semiconductor memory.
[0354] Note that the CPU 1100 reads program data 1450 from the HDD 1400 and executes the program, but in another embodiment, the CPU 1100 may obtain these programs from another device via an external network 1550.
[0355] While preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings, the scope of the present disclosure is not limited to these embodiments. Obviously, those skilled in the art can make various improvements and modifications without departing from the concept of the present disclosure, and these improvements and modifications should also be considered within the scope of protection of the present disclosure.
[0356] Furthermore, the effects described in this specification are merely illustrative or exemplary and are not intended to be limiting. That is, in addition to or in lieu of the effects described above, other effects that may be apparent to those skilled in the art may arise from the description of this specification based on the technology disclosed herein.
[0357] It should be noted that this technology can also be configured as described below.
[1]
[0359] A biological particle analysis system, comprising: an acquisition unit that acquires measurement data based on biologically derived particles included in a sample; a compression unit that performs data compression processing on the measurement data acquired by the acquisition unit; a gating unit that gates the measurement data compressed by the compression unit into training measurement data and validation measurement data, and adds labels to the training measurement data; a training unit that builds a learning model using the training measurement data and the labels; an estimation unit that inputs the validation measurement data into the learning model and outputs a confidence level for the validation measurement data; and a threshold setting unit that sets, based on the confidence level, a threshold used when sorting the sample.
[2]
[0361] The biological particle analysis system according to [1], further comprising: a display unit that displays the efficiency and yield of the bio-derived particles based on the output confidence level and the threshold.
[3]
[0363] The biological particle analysis system according to [1] or [2], wherein the training measurement data and the validation measurement data included in the measurement data are different from each other.
[4]
[0365] The biological particle analysis system according to any one of [1] to [3], further comprising: a bio-derived particle sorting device including a determining unit that inputs measurement data of bio-derived particles to be sorted into the learning model to infer whether the bio-derived particles to be sorted are sorting targets and, when they are inferred to be sorting targets, makes a sorting decision based on the threshold set by the threshold setting unit.
[5]
[0367] The biological particle analysis system according to [4], wherein the bio-derived particle sorting device includes a sorting unit that sorts target particles based on the sorting decision made by the determining unit.
[6]
[0369] The biological particle analysis system according to [4], wherein the bio-derived particles to be sorted are included in the sample.
[7]
[0371] The biological particle analysis system according to any one of [1] to [6], wherein a predetermined threshold is set as the threshold.
[8]
[0373] The biological particle analysis system according to [7], wherein the predetermined threshold is determined in correspondence with one or more patterns.
[9]
[0375] The biological particle analysis system according to any one of [1] to [6], wherein the threshold is set by a user.
[10]
[0377] The biological particle analysis system according to any one of [1] to [9], wherein the data compression processing is dimensionality reduction, and a range to be sorted is determined after the dimensionality reduction.
[11]
[0379] An information processing apparatus, comprising: a compression unit that performs data compression processing on measurement data based on biologically derived particles included in a sample; a gating unit that gates the measurement data compressed by the compression unit into training measurement data and validation measurement data, and adds labels to the training measurement data; a training unit that builds, using the training measurement data and the labels, a learning model that determines whether the biologically derived particles are sorting targets; an inference unit that inputs the validation measurement data into the learning model built by the training unit and infers whether the validation measurement data is a sorting target; a confidence calculation unit that calculates the confidence level of the validation measurement data used in the inference; and a threshold setting unit that sets a threshold for sorting the sample based on the confidence level calculated by the confidence calculation unit.
[12]
[0381] The information processing apparatus according to [11], wherein the built learning model is output to a particle sorting device.
[13]
[0383] An information processing method, comprising: a compression step of performing data compression processing on measurement data based on biologically derived particles included in a sample; a gating step of gating the compressed measurement data into training measurement data and validation measurement data, and adding labels to the training measurement data; a training step of building, using the training measurement data and the labels, a learning model that determines whether the biologically derived particles are sorting targets; an inference step of inputting the validation measurement data into the learning model built in the training step and inferring whether the validation measurement data is a sorting target; a confidence calculation step of calculating the confidence level of the validation measurement data used in the inference; and a threshold setting step of setting a threshold for sorting the sample based on the confidence level calculated in the confidence calculation step.
[14]
[0385] The information processing method according to [13], further comprising: a step of measuring the measurement data using a particle analysis device.
[15]
[0387] The information processing method according to [14], further comprising: a step of inputting optical information measured from bio-derived particles to be sorted by the particle analysis device into the learning model built in the training step, inferring whether the bio-derived particles to be sorted are sorting targets, and, when they are inferred to be sorting targets, making a sorting decision based on the threshold set in the threshold setting step.
[16]
[0389] The information processing method according to [15], further comprising: a step of sorting target particles based on the sorting decision.
[17]
[0391] An information processing apparatus, comprising: an acquisition unit that acquires multiple measurement data including optical information measured based on biologically derived particles included in a sample; a clustering unit that classifies the multiple measurement data acquired by the acquisition unit into multiple clusters; a cluster selection unit that selects, from the clusters classified by the clustering unit, clusters as sorting targets; and a threshold setting unit that sets a threshold based on the multiple measurement data included in the cluster selected by the cluster selection unit.
[18]
[0393] The information processing apparatus according to [17], wherein the clustering unit classifies the multiple measurement data acquired by the acquisition unit into clusters, and the information processing apparatus further comprises a sorting unit that sorts, based on the threshold set by the threshold setting unit, measurement data to be sorted from among the multiple measurement data included in a cluster classified by the clustering unit.
[19]
[0395] The information processing apparatus according to [17] or [18], wherein the threshold setting unit sets a threshold corresponding to a representative value of the cluster or a threshold corresponding to the median of the multiple measurement data included in the cluster selected by the cluster selection unit.
[0396] Reference number list
[0397] 300, 400, 600 Information processing device
[0398] 311, 411, 611 Measuring device
[0399] 312, 412, 612 Acquisition unit
[0400] 313, 413, 613 Preprocessing unit
[0401] 314 Dimensionality reduction unit
[0402] 315 Gating unit
[0403] 316 Division unit
[0404] 317, 618 Training unit
[0405] 318, 619 Estimation unit
[0406] 319, 417, 621 Threshold setting unit
[0407] 320, 416, 620 Display unit
[0408] 321, 418, 622 Sorting unit
[0409] 414 Category formation / clustering unit
[0410] 415 Cluster selection unit
[0411] 614 Determination unit
[0412] 615 Dimensionality reduction / clustering unit
[0413] 616 Population designation unit
Claims
1. A biological particle analysis system, comprising: an acquisition unit that acquires measurement data based on biologically derived particles included in a sample; a compression unit that performs data compression processing on the measurement data acquired by the acquisition unit; a gating unit that gates the measurement data compressed by the compression unit into training measurement data and validation measurement data, and adds labels to the training measurement data; a training unit that builds a learning model using the training measurement data and the labels; an estimation unit that inputs the validation measurement data into the learning model and outputs a confidence level for the validation measurement data; and a threshold setting unit that sets a threshold for sorting the sample based on the confidence level.
2. The biological particle analysis system according to claim 1, further comprising: a display unit that displays the efficiency and yield of the bio-derived particles based on the output confidence level and the threshold.
3. The biological particle analysis system according to claim 1, wherein the training measurement data and the validation measurement data included in the measurement data are different from each other.
4. The biological particle analysis system according to claim 1, further comprising: a bio-derived particle sorting device including a determining unit that inputs measurement data of bio-derived particles to be sorted into the learning model to infer whether the bio-derived particles to be sorted are sorting targets and, when they are inferred to be sorting targets, makes a sorting decision based on the threshold set by the threshold setting unit.
5. The biological particle analysis system according to claim 4, wherein the bio-derived particle sorting device includes a sorting unit that sorts target particles based on the sorting decision made by the determining unit.
6. The biological particle analysis system according to claim 4, wherein the bio-derived particles to be sorted are included in the sample.
7. The biological particle analysis system according to claim 1, wherein a predetermined threshold is set as the threshold.
8. The biological particle analysis system according to claim 7, wherein the predetermined threshold is determined in correspondence with one or more patterns.
9. The biological particle analysis system according to claim 1, wherein the threshold is set by a user.
10. The biological particle analysis system according to claim 1, wherein the data compression processing is dimensionality reduction, and a range to be sorted is determined after the dimensionality reduction.
11. An information processing apparatus, comprising: a compression unit that performs data compression processing on measurement data based on biologically derived particles included in a sample; a gating unit that gates the measurement data compressed by the compression unit into training measurement data and validation measurement data, and adds labels to the training measurement data; a training unit that builds, using the training measurement data and the labels, a learning model that determines whether the biologically derived particles are sorting targets; an inference unit that inputs the validation measurement data into the learning model built by the training unit and infers whether the validation measurement data is a sorting target; a confidence calculation unit that calculates the confidence level of the validation measurement data used in the inference; and a threshold setting unit that sets a threshold for sorting the sample based on the confidence level calculated by the confidence calculation unit.
12. The information processing apparatus according to claim 11, wherein the built learning model is output to a particle sorting device.
13. An information processing method, comprising: a compression step of performing data compression processing on measurement data based on biologically derived particles included in a sample; a gating step of gating the compressed measurement data into training measurement data and validation measurement data, and adding labels to the training measurement data; a training step of building, using the training measurement data and the labels as training data, a learning model that determines whether the biologically derived particles are sorting targets; an inference step of inputting the validation measurement data into the learning model built in the training step and inferring whether the validation measurement data is a sorting target; a confidence calculation step of calculating the confidence level of the validation measurement data used in the inference; and a threshold setting step of setting a threshold for sorting the sample based on the confidence level calculated in the confidence calculation step.
14. The information processing method according to claim 13, further comprising: a step of measuring the measurement data using a particle analysis device.
15. The information processing method according to claim 14, further comprising: a step of inputting optical information measured from biologically derived particles to be sorted by the particle analysis device into the learning model built in the training step, inferring whether the biologically derived particles to be sorted are sorting targets, and, when they are inferred to be sorting targets, making a sorting decision based on the threshold set in the threshold setting step.
16. The information processing method according to claim 15, further comprising: a step of sorting target particles based on the sorting decision.
17. An information processing apparatus, comprising: an acquisition unit that acquires multiple measurement data including optical information measured based on bio-derived particles included in a sample; a clustering unit that classifies the multiple measurement data acquired by the acquisition unit into multiple clusters; a cluster selection unit that selects, from the clusters classified by the clustering unit, clusters as sorting targets; and a threshold setting unit that sets a threshold based on the multiple measurement data included in the cluster selected by the cluster selection unit.
18. The information processing apparatus according to claim 17, wherein the clustering unit classifies the multiple measurement data acquired by the acquisition unit into clusters, and the information processing apparatus further comprises a sorting unit that sorts, based on the threshold set by the threshold setting unit, measurement data to be sorted from among the multiple measurement data included in a cluster classified by the clustering unit.
19. The information processing apparatus according to claim 17, wherein the threshold setting unit sets a threshold corresponding to a representative value of the cluster or a threshold corresponding to the median of the multiple measurement data included in the cluster selected by the cluster selection unit.