Method for determining uncertainty of detection of content of molecules to be detected in random emulsion and application thereof

By using the cumulative distribution function of Poisson binomial distribution and the skewness-corrected normal distribution method in random emulsion droplet digital PCR technology, the shortcomings of measurement uncertainty assessment are solved, higher detection precision and accuracy are achieved, and experimental costs are reduced.

CN122245456A · Pending · Publication Date: 2026-06-19 · MGI TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
MGI TECH CO LTD
Filing Date
2024-12-18
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

The existing random emulsion droplet digital PCR technology lacks a systematic method for assessing measurement uncertainty, which leads to quantitative bias and affects the accuracy of detection results.

Method used

The cumulative distribution function of the Poisson binomial distribution is used to evaluate the uncertainty of analyte molecules in random emulsions. By determining the total number, volume, and classification of reaction partitions, and combining the skewness-corrected normal distribution method, the calculation is simplified and the accuracy is improved.

Benefits of technology

This technology improves the precision and accuracy of random emulsion digital measurement techniques, reduces the difficulty and cost of experimental testing, and establishes a universally applicable technical standard for measurement uncertainty.

✦ Generated by Eureka AI based on patent content.


Abstract

This application relates to a method and its application for determining the uncertainty of detecting the content of an analyte in a random emulsion. The method includes: S1, determining the total number of reaction zones in the random emulsion, the volume of each reaction zone, and classifying each reaction zone into positive and negative reaction zones; S2, determining the cumulative distribution function of a Poisson binomial distribution based on a predetermined numerical value of the analyte content *m* in the emulsion, the total number of reaction zones in the random emulsion, and the volume of each reaction zone; S3, evaluating the uncertainty of the analyte content *m* in the emulsion based on the cumulative distribution function. This method can effectively improve the precision and accuracy of random emulsion digital determination technology, and reduce the difficulty and cost of experimental testing.

Description

Technical Field

[0001] This application relates to the field of bioinformatics analysis, specifically to a method and application for determining the detection uncertainty of the content of analytes in random emulsions.

Background Technology

[0002] Single-molecule counting is a precise measurement method for determining the concentration of a solution by counting the number of individual molecules in the solution. There are two main forms of this method: one is counting the number of individual molecules detected per unit time, and the other is counting the number of individual molecules per unit volume of solution. In both methods, the number of individual molecules detected in the solution is directly proportional to the solution concentration.

[0003] In recent years, digital assay techniques such as digital PCR (dPCR), digital isothermal amplification, and digital ELISA have become reliable methods for the accurate quantitative detection of biomarkers (such as nucleic acids and proteins). These techniques work by diluting and distributing the test solution into multiple reaction partitions, then performing molecular amplification or signal amplification on each partition, and using signals such as fluorescence, color development, and precipitation to distinguish between positive and negative partitions. A Poisson statistical model is then applied to calculate the total number of target molecules. These techniques have been widely applied in scientific research, medicine, agriculture, food safety, and other fields, demonstrating their effectiveness and reliability in accurate detection.
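As a point of reference for the classical equal-volume case described above, the following minimal sketch applies the usual Poisson zero-term estimate: if a fraction of partitions is negative, the mean number of molecules per partition is the negative logarithm of that fraction. The function name and the example counts are hypothetical and are not taken from this application.

```python
import math


def poisson_dpcr_estimate(n_partitions: int, n_negative: int) -> float:
    """Estimate the total molecule count assuming equal-volume partitions."""
    if not 0 < n_negative <= n_partitions:
        raise ValueError("need at least one negative partition")
    # Poisson zero term: P(partition is negative) = exp(-lam)
    lam = -math.log(n_negative / n_partitions)
    # Total molecules = mean per partition times number of partitions
    return n_partitions * lam


# Hypothetical example: 1000 partitions, half of them negative
estimate = poisson_dpcr_estimate(n_partitions=1000, n_negative=500)
```

This estimate relies on every partition having the same volume, which is the assumption that does not hold for random emulsions.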

[0004] Existing digital PCR technologies typically rely on microfluidic systems or solid-phase microcavity chips, which are technically demanding and expensive. In 2019, BGI Genomics proposed an improved droplet digital PCR method, called random emulsified droplet digital PCR, in its invention patent "Random Emulsification Digital Absolute Quantitative Analysis Method and Device" (application number PCT/CN2019/122068, grant number CN114729397B). This method simplifies droplet partitioning through mechanical oscillation, stirring, or spraying, enabling the rapid and efficient formation of droplets with random volumes and quantities. It incorporates a Poisson binomial probability model to calculate the total number of target molecules in a sample and is applicable to absolute quantitative analysis of samples of any concentration. The technology offers low cost, ease of operation, a small footprint, low read load, wide dynamic range, and high detection throughput, and it removes the dependence of traditional digital PCR on microfluidic systems or solid-phase microcavity chips.

[0005] However, this method does not provide a systematic approach to assessing measurement uncertainty. Existing uncertainty assessments for digital measurement techniques typically assume that each partition has an equal volume, but this premise is not applicable to randomized emulsion droplet digital PCR. Directly applying traditional probability models may lead to significant quantitative biases, thus affecting the accuracy of the results.

Summary of the Invention

[0006] This application aims to at least partially address one of the technical problems in the related art. Therefore, one objective of this application is to provide a method for determining the uncertainty of detecting the content of analytes in random emulsions with high precision and accuracy.

[0007] Specifically, this application proposes the following technical solution:

[0008] In a first aspect, this application proposes a method for determining the detection uncertainty of the content of an analyte molecule in a random emulsion. According to an embodiment of this application, the method includes: S1, determining the total number of reaction zones in the random emulsion and the volume of each reaction zone, and classifying each reaction zone as a positive reaction zone or a negative reaction zone, wherein the positive reaction zone contains the analyte molecule and the negative reaction zone does not contain the analyte molecule; S2, determining the cumulative distribution function (CDF) of the Poisson binomial distribution based on a predetermined numerical value of the content m of the analyte molecule in the emulsion, the total number of reaction zones in the random emulsion, and the volume of each reaction zone, wherein the Poisson binomial distribution is used to describe the number of reaction zones containing a specific number of analyte molecules; and S3, evaluating the uncertainty of the analyte molecule content m in the emulsion based on the cumulative distribution function.

[0009] In some examples of this application, the aforementioned method can be used to evaluate and compare the measurement results of random emulsion digital assay, and can also be used to predict the measurement results. It can effectively improve the precision and accuracy of random emulsion digital assay, reduce the difficulty and cost of experimental testing, and can also be used to form a measurement uncertainty technical standard with universal significance.

[0010] Secondly, this application proposes a system for determining the detection uncertainty of the content of an analyte molecule in a random emulsion. According to an embodiment of this application, the system includes: a measurement parameter determination module, used to determine the total number of reaction zones in the random emulsion and the volume of each reaction zone, and to classify each reaction zone as a positive reaction zone or a negative reaction zone, wherein the positive reaction zone contains the analyte molecule and the negative reaction zone does not contain the analyte molecule; a distribution function construction module, used to determine the cumulative distribution function of the Poisson binomial distribution based on a predetermined numerical value of the analyte molecule content m in the emulsion, the total number of reaction zones in the random emulsion, and the volume of each reaction zone, wherein the Poisson binomial distribution is used to describe the number of reaction zones containing a specific number of analyte molecules; and an uncertainty evaluation module, used to evaluate the uncertainty of the analyte molecule content m in the emulsion based on the cumulative distribution function.

[0011] In some examples of this application, the aforementioned system can be used to evaluate and compare the measurement results of random emulsion digital assays, and also to predict measurement results. This system can effectively improve the precision and accuracy of random emulsion digital assays, reduce the difficulty and cost of experimental testing, and contribute to the formation of universally applicable measurement uncertainty technical standards.

[0012] Thirdly, this application proposes a computer program product. According to an embodiment of this application, the computer program product includes: computer instructions; when some or all of the computer instructions are executed on a computer, the method for determining the detection uncertainty of the content of analytes in a random emulsion as described in the first aspect of this application is performed.

[0013] Fourthly, this application proposes a computing device. According to an embodiment of this application, the computing device includes: a processor and a memory; the memory is used to store a computer program; the processor is used to execute the computer program to implement the method for determining the detection uncertainty of the content of an analyte in a random emulsion as described in the first aspect of this application.

[0014] Fifthly, this application provides a computer-readable storage medium. According to embodiments of this application, the computer-readable storage medium stores computer instructions or programs that, when executed on a computer, cause the method for determining the detection uncertainty of the content of analytes in a random emulsion as described in the first aspect of this application to be performed.

[0015] The aforementioned computer program product, computing device, and computer-readable storage medium automatically execute a method for determining the detection uncertainty of the content of analytes in random emulsions via computer instructions, achieving high efficiency and automation, and improving the efficiency, accuracy, and precision of detection uncertainty assessment. Furthermore, the instruction-based nature of this method ensures high consistency, stability, and reliability across different experimental scenarios.

[0016] Additional aspects and advantages of this application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of this application.

Attached Figure Description

[0017] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0018] Figure 1 is a schematic flowchart of a method for determining the detection uncertainty of the content of analytes in random emulsions, provided for some examples of this application;

[0019] Figure 2 is a schematic diagram of a system for determining the detection uncertainty of the content of analytes in random emulsions, provided for some examples of this application;

[0020] Figure 3 is a schematic diagram of an electronic device provided for some examples of this application;

[0021] Figure 4 shows schematic diagrams comparing digital measurement techniques for homogeneous emulsion systems and random emulsion systems, provided for some examples of this application (all taken on the MGISEQ-2000 platform from BGI Genomics), where (A) shows the droplet radius and light intensity distribution of the homogeneous emulsion, and (B) shows the droplet radius and light intensity distribution of the random emulsion;

[0022] Figure 5 is a schematic diagram of simulation data results for 1024 dispersed droplet volumes, provided for some examples of this application;

[0023] Figure 6 shows schematic diagrams of the coverage intervals (CIs), at a 95% coverage probability, for the number of target molecules calculated using the Poisson binomial distribution NA and RNA methods, provided for some examples of this application: (A) the coverage interval is [79.4, 123.2] when the estimated number of molecules is 100; (B) the approximate cumulative probability distribution when the estimated number of molecules is 100; (C) the coverage interval is [440.6, 567.0] when the estimated number of molecules is 500; (D) the approximate cumulative probability distribution when the estimated number of molecules is 500; (E) the coverage interval is [1806.2, 2225.6] when the estimated number of molecules is 2000; (F) the approximate cumulative probability distribution when the estimated number of molecules is 2000; (G) the coverage interval is [8918.1, 11428.5] when the estimated number of molecules is 10000; (H) the approximate cumulative probability distribution when the estimated number of molecules is 10000;

[0024] Figure 7 is a schematic diagram illustrating the set of probabilities of each droplet being a negative droplet for different values of the content of the analyte molecule, provided for some examples of this application;

[0025] Figure 8 shows schematic diagrams of the coverage intervals (CIs), at a 95% coverage probability, for the number of target molecules calculated using the Poisson binomial distribution DFT-CF method, provided for some examples of this application: (A) the coverage interval is [79.6, 123.2] when the estimated number of molecules is 100; (B) the cumulative probability distribution when the estimated number of molecules is 100; (C) the coverage interval is [457.5, 577.0] when the estimated number of molecules is 500; (D) the cumulative probability distribution when the estimated number of molecules is 500; (E) the coverage interval is [1803.0, 2219.6] when the estimated number of molecules is 2000; (F) the cumulative probability distribution when the estimated number of molecules is 2000; (G) the coverage interval is [8934.5, 11441.2] when the estimated number of molecules is 10000; (H) the cumulative probability distribution when the estimated number of molecules is 10000;

[0026] Figure 9 shows schematic diagrams of the distribution of target molecule count ranges and (optimal) measurement precision for 512- and 1024-droplet systems with a volume variation coefficient of 0 to 10, provided for some examples of this application, where (A) is the distribution of the target molecule count range for 512-droplet systems; (B) is the distribution of (optimal) measurement precision for 512-droplet systems; (C) is the distribution of the target molecule count range for 1024-droplet systems; and (D) is the distribution of (optimal) measurement precision for 1024-droplet systems;

[0027] Figure 10 is a diagram comparing the global measurement uncertainty distribution and measurement dynamic range for 512- and 1024-droplet systems with volume variation coefficients ranging from 0 to 10, provided for some examples of this application;

[0028] Figure 11 shows random emulsion digital PCR images and detailed views of extracted droplets based on the MGISEQ-2000 platform, provided for some examples of this application: (A) original grayscale thumbnail of random emulsion digital PCR; (B) detailed view of the original grayscale droplets; (C) thumbnail of the droplet segmentation; (D) detailed view of the droplet segmentation;

[0029] Figure 12 is a schematic diagram showing the statistical results of the pixel area and fluorescence intensity distribution of extracted droplets based on the MGISEQ-2000 platform, provided for some examples of this application;

[0030] Figure 13 shows schematic diagrams of the coverage intervals (CIs), at a 95% coverage probability, for the target molecule count calculated using the Poisson binomial distribution DFT-CF method for random emulsion digital PCR based on the MGISEQ-2000 platform: (A) when the number of negative droplets is 11645, the coverage interval for the target molecule count m is [9486, 10085.9]; (B) the cumulative probability distribution when the number of negative droplets is 11645.

Detailed Implementation

[0031] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0032] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in sequences other than those illustrated or described herein. In this application, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or server that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or devices. In the description of this application, unless otherwise stated, "a plurality of" means two or more.

[0033] This application addresses the shortcomings of uncertainty assessment in random emulsion digital measurement technology. The inventors, from the perspective of probabilistic statistical model theory, correct the Poisson distribution statistical method used in classical digital measurement technology, establishing a more general and computationally accurate Poisson binomial distribution probability model. Several numerical calculation methods are then employed to assess the measurement uncertainty of random emulsion digital measurement technology. This allows for wider application of random emulsion digital measurement technology. Specifically, this application proposes a method, system, computer program product, computing device, and computer-readable storage medium for determining the detection uncertainty of the content of analytes in random emulsions. These are described below:

[0034] Methods for determining the detection uncertainty of analyte content in random emulsions

[0035] In one aspect, this application proposes a method for determining the detection uncertainty of the content of analyte molecules in random emulsions. Referring to Figure 1, the method includes:

[0036] S1, determine the total number of reaction zones in the random emulsion, the volume of each reaction zone, and classify each reaction zone into a positive reaction zone and a negative reaction zone, wherein the positive reaction zone contains the test molecule, and the negative reaction zone does not contain the test molecule.

[0037] In some examples of this application, the aforementioned random emulsion is at least one of a digital PCR reaction system, a digital isothermal amplification reaction system, and a digital ELISA reaction system.

[0038] In this document, unless otherwise specified, the term "reaction partition" is derived by dispersing the reaction system into multiple small units or regions. Each reaction partition is an independent, tiny reaction region capable of independently undergoing a chemical or biological reaction. In some examples of this application, the reaction partition can be a droplet, micropore, or microcavity, etc. In a specific example of this application, the reaction partition is selected from droplets.

[0039] In some examples of this application, the system to be emulsified in a preset container (such as a flat rectangular capillary, a double-sided glass-sealed sandwich pool, or a glass-monocrystalline silicon-sealed sandwich pool, etc.) is randomly emulsified to obtain several isolated reaction partitions (such as droplets). The system to be emulsified includes a sample to be tested. The total number of reaction partitions is randomly generated, and the total number is a positive integer greater than 1. The reaction partitions are randomly formed, and each volume is randomly generated, and the sum of the volumes is not greater than the volume of the emulsified system. The reaction partitions are amplified. After the amplification process is completed, the reaction partitions are image-acquired to obtain a target image. The image regions corresponding to each reaction partition in the target image are analyzed to determine the total number of reaction partitions in the random emulsion, the volume of each reaction partition, and to classify each reaction partition into a positive reaction partition and a negative reaction partition.

[0040] Understandably, during amplification, some reaction partitions contain the target molecule. In those partitions, specific primers, under the action of nucleic acid polymerase, drive temperature-cycled amplification of the target molecule, thereby amplifying the signal of the analyte molecule and enhancing the signal of the indicator in the corresponding reaction partition. In reaction partitions that do not contain the target molecule, no amplification occurs and the indicator signal is not enhanced. Therefore, the presence or absence of the target molecule in each partition can be determined from the indicator signal enhancement state. The indicator may include, but is not limited to, fluorescent dyes.

[0041] The amplification process is considered complete when the signal intensity based on the indicator no longer changes significantly. The target image is then acquired using an image acquisition device. After acquiring the target image, the image region of each reaction partition within the target image can be determined. Based on the location information of each reaction partition's image region in the target image, the total number of reaction partitions and the volume information of each reaction partition are obtained.

[0042] Furthermore, feature extraction is performed on the image regions corresponding to each reaction partition in the target image to obtain feature information corresponding to each image region. For each image region, the feature information of the image region is matched with preset feature information. If the feature information of the image region does not match the preset feature information, it is determined that the reaction partition corresponding to the image region does not contain the test molecule, and the reaction partition is a negative reaction partition. If the feature information of the image region matches the preset feature information, it is determined that the reaction partition corresponding to the image region contains the test molecule, and the reaction partition is a positive reaction partition.
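The matching step in the paragraph above can be sketched as follows, under the assumption that the extracted feature is a single mean signal intensity per image region and that "matching the preset feature information" means meeting a fixed intensity threshold. The `Partition` class, the `classify` function, and the threshold value are hypothetical illustrations, not part of this application.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    volume: float     # partition volume, arbitrary units
    intensity: float  # mean indicator signal of the image region


def classify(partitions, threshold):
    """Split partitions into (positive, negative) lists by signal threshold."""
    # Assumption: a region "matches" the preset feature when its
    # intensity reaches the threshold.
    positive = [p for p in partitions if p.intensity >= threshold]
    negative = [p for p in partitions if p.intensity < threshold]
    return positive, negative


# Hypothetical droplets: one bright, two dark
droplets = [Partition(1.2, 950.0), Partition(0.4, 120.0), Partition(2.0, 80.0)]
pos, neg = classify(droplets, threshold=500.0)
```

A real pipeline would likely use richer features than one intensity value; the single threshold is chosen here only to keep the sketch short.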

[0043] In some examples of this application, the aforementioned preset feature information may be an optical signal. Referring to Figure 4, in digital measurement techniques for homogeneous and random emulsion systems, the volume information of reaction zones (such as droplets) and the information of positive and negative droplets can be obtained from the target image. Specifically, bright droplet areas indicate positive droplets, while dark droplet areas indicate negative droplets.

[0044] S2, determine the cumulative distribution function of the Poisson binomial distribution based on a predetermined numerical value of the content m of the analyte molecules in the emulsion, the total number of reaction zones in the random emulsion, and the volume of each reaction zone, wherein the Poisson binomial distribution is used to describe the number of reaction zones containing a specific number of analyte molecules.

[0045] The Poisson binomial distribution is a discrete probability distribution that describes the number of successes in a plurality of independent trials, where the probability of success may differ from trial to trial. The cumulative distribution function (CDF) of this distribution gives the probability of observing at most a given number of successes. However, directly calculating the CDF of the Poisson binomial distribution can be very complex, especially when the number of trials is large. Therefore, to simplify the calculation, an approximation method can be used, which approximates the Poisson binomial distribution as a normal distribution with skewness correction. Here, "skewness correction" adjusts the normal approximation for asymmetry so that it is closer to the true shape of the Poisson binomial distribution.
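To make the definition concrete, the exact distribution can be computed for a small number of trials by adding one trial at a time. The sketch below is a generic dynamic-programming illustration with hypothetical function names and example probabilities; it is not the calculation method of this application. Its cost grows quadratically with the number of trials.

```python
def poisson_binomial_pmf(probs):
    """Exact PMF of the number of successes for independent trials."""
    pmf = [1.0]  # with zero trials, zero successes has probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this trial fails
            nxt[k + 1] += mass * p      # this trial succeeds
        pmf = nxt
    return pmf


def poisson_binomial_cdf(probs):
    """Exact CDF: entry k is P(number of successes <= k)."""
    cdf, running = [], 0.0
    for mass in poisson_binomial_pmf(probs):
        running += mass
        cdf.append(running)
    return cdf


# Hypothetical example with three trials of unequal success probability
cdf = poisson_binomial_cdf([0.1, 0.5, 0.9])
```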

[0046] Specifically, the cumulative distribution function of the Poisson binomial distribution is determined using a skewness correction method, including:

[0047] S2-1, based on a predetermined numerical value of the content m of the analyte molecules in the emulsion, the total number of reaction zones in the random emulsion, and the volume of each reaction zone, determine the expectation, variance, and skewness of the Poisson binomial distribution used to describe the number of reaction zones containing a specific number of analyte molecules.

[0048] In some examples of this application, the numerical value of the content m of the analyte molecules in the aforementioned emulsion is determined based on the fact that the number of reaction zones containing a specific number of analyte molecules in the random emulsion follows a Poisson binomial distribution.

[0049] In some examples of this application, the numerical value of the content m of the analyte molecules in the aforementioned emulsion is determined by satisfying the following formula:

[0050]

[0051] where $v_i$ represents the volume of the i-th reaction zone, $v_p$ represents the volume of the p-th positive reaction zone, $v_q$ represents the volume of the q-th negative reaction zone, j represents the number of negative reaction zones, and n represents the number of reaction zones in the emulsion.

[0052] In some examples of this application, the numerical value of the content m of the analyte molecule in the emulsion is determined through the following steps: (a) generating multiple values of m; (b) calculating the left-hand and right-hand values of formula (5) for each value; and (c) selecting the value whose left-hand and right-hand values satisfy the predetermined requirement as the predetermined value of the content m of the analyte molecule in the emulsion. In some examples of this application, the values of m are generated in the range of 0 to 1,000,000 with a step size of 0.1.

[0053] In some examples of this application, in step (c), a value that makes the ratio of the left-side value to the right-side value equal to 1 is selected as the predetermined value of the content m of the analyte in the emulsion.

[0054] In other examples of this application, in step (c), multiple values are selected such that the ratio of the left-side value to the right-side value is between 0.99 and 1.01, and the average of the multiple values is taken as the predetermined value of the content m of the analyte in the emulsion.
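Steps (a) to (c) amount to a grid search over candidate values of m. The sketch below shows the ratio-based selection with averaging. Because formula (5) is not reproduced in this text, the left-hand and right-hand sides are passed in as functions, and the example uses a hypothetical stand-in equation (observed negative count equals the expected negative count when each zone is negative with probability (1 − v_i/Σv)^m). That stand-in, the function name `select_m_hat`, and the reduced 0 to 200 search range are assumptions made purely for illustration.

```python
def select_m_hat(candidates, lhs, rhs, low=0.99, high=1.01):
    """Average the candidate m values whose lhs/rhs ratio lies in [low, high]."""
    kept = []
    for m in candidates:
        right = rhs(m)
        if right == 0:
            continue  # ratio undefined for this candidate
        if low <= lhs(m) / right <= high:
            kept.append(m)
    if not kept:
        raise ValueError("no candidate satisfied the ratio requirement")
    return sum(kept) / len(kept)


# Hypothetical stand-in for formula (5), NOT the formula of this application
volumes = [1.0] * 100            # 100 zones of equal volume for simplicity
total_volume = sum(volumes)
observed_negative = 50


def lhs(m):
    return float(observed_negative)


def rhs(m):
    return sum((1.0 - v / total_volume) ** m for v in volumes)


# (a) candidate values of m from 0 to 200 with a step size of 0.1
candidates = [i * 0.1 for i in range(0, 2001)]
# (b) and (c): evaluate both sides and average the accepted candidates
m_hat = select_m_hat(candidates, lhs, rhs)
```

With the full range of 0 to 1,000,000 at a step size of 0.1 the same loop evaluates ten million candidates, so a narrower range or a root-finding method may be preferable in practice.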

[0055] In some examples of this application, the data obtained in step S1 (including the total number of reaction partitions and the volume of each reaction partition) and the value of m obtained by any of the aforementioned methods are used to determine the expectation, variance, and skewness of the Poisson binomial distribution. The expectation represents the mean of the distribution of reaction partitions containing a specific number of test molecules; the variance represents the dispersion of that distribution; and the skewness measures its asymmetry (the direction and degree of skew).

[0056] In some examples of this application, the expectation, variance, and skewness are determined according to the following formulas:

[0057]

[0058]

[0059]

[0060] where μ represents the expectation, σ² represents the variance, γ represents the skewness, n represents the total number of reaction zones, $v_i$ represents the volume of the i-th reaction zone, m represents the content of the analyte molecule, and $C_0$ represents the number of negative reaction zones.
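For orientation, the standard moments of a Poisson binomial distribution with per-trial probabilities p_i are μ = Σp_i, σ² = Σp_i(1 − p_i), and γ = Σp_i(1 − p_i)(1 − 2p_i)/σ³. The sketch below computes them. How p_i depends on the zone volume and on m is an assumption here: the code takes "success" to mean that zone i is negative, with probability (1 − v_i/Σv)^m, which may differ from the formulas of this application. Function names are hypothetical.

```python
import math


def negative_probabilities(volumes, m_value):
    """Assumed per-zone probability of being negative: (1 - v_i / sum(v))**m."""
    total = sum(volumes)
    return [(1.0 - v / total) ** m_value for v in volumes]


def poisson_binomial_moments(probs):
    """Return (expectation, variance, skewness) of a Poisson binomial."""
    mu = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    if var == 0:
        raise ValueError("variance is zero; skewness is undefined")
    third = sum(p * (1.0 - p) * (1.0 - 2.0 * p) for p in probs)
    gamma = third / var ** 1.5
    return mu, var, gamma


# Hypothetical example: four equal zones and m = 1, so each p_i is 0.75
probs = negative_probabilities([1.0, 1.0, 1.0, 1.0], 1.0)
mu, var, gamma = poisson_binomial_moments(probs)
```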

[0061] S2-2, based on the expectation, variance, and skewness determined in step S2-1, determine a refined normal approximation cumulative distribution function to describe the number of reaction partitions containing a specific number of analyte molecules.

[0062] In some examples of this application, the refined normal approximation cumulative distribution function is obtained by correcting the cumulative distribution function of the standard normal distribution using the skewness. The Poisson binomial distribution is simulated more accurately by adjusting the parameters of the normal distribution (such as expectation and variance).

[0063] In some examples of this application, the refined normal approximation cumulative distribution function is determined by the following steps:

[0064] (i) Generate a series of values in the range of 0 to n, where n is the total number of the reaction partitions;

[0065] (ii) Convert the series of values into Z-scores of the standard normal distribution as x values;

[0066] (iii) Construct a normal distribution curve for x based on the expected value and variance;

[0067] (iv) Based on the normal distribution curve, determine the probability density function φ(x) of the standard normal distribution and the cumulative distribution function Φ(x) of the standard normal distribution; and

[0068] (v) Based on formula (4), determine the refined normal approximation cumulative distribution function.

[0069] in, It is the probability density function of the standard normal distribution determined by the expected value and variance obtained in step S2, and Φ(x) is the cumulative distribution function of the standard normal distribution.

[0070] For example, if n is set to 1000, a series of numerical values x_i (i = 1, 2, …, t; 0 ≤ x_1 ≤ x_t ≤ 1000) is generated in the range of 0 to 1000. The generated values are then converted to Z-scores of a standard normal distribution. The Z-score measures how many standard deviations a value lies from the mean; the formula for calculating the Z-score is:

[0071] Z = (x_i - μ) / σ

[0072] Where x_i is a generated value, μ is the expected value, and σ is the standard deviation. Substituting each x_i into the Z-score formula yields a series of Z-scores, i.e., the x values.

[0073] Based on μ and σ, construct a normal distribution curve for x. Based on the normal distribution curve, determine the probability density function (PDF) and cumulative distribution function (CDF) of the standard normal distribution. The PDF gives the probability density of a specific value, while the CDF gives the probability that the value is less than or equal to a specific value.

[0074] The skewness of the Poisson binomial distribution is used to correct the CDF of the standard normal distribution, making the normal distribution closer to the actual shape of the Poisson binomial distribution. The skewness correction formula is shown in formula (4):

[0075]

[0076] Where φ(x) is the probability density function of the standard normal distribution determined by the expected value and variance obtained in step S2, and Φ(x) is the cumulative distribution function of the standard normal distribution.

[0077] The above method for determining the cumulative distribution function of the Poisson binomial distribution based on the refined normal approximation simplifies complex probability calculations and improves the accuracy and precision of the uncertainty assessment of the detection of the content of analytes in random emulsions.
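Formula (4) is not reproduced in the text above. The sketch below assumes the commonly published form of the refined normal approximation, G(x) = Φ(x) + γ(1 - x²)φ(x)/6, together with the continuity-corrected argument (j + 0.5 - μ)/σ used in formula (X3). Both the form of G(x) and the function names are assumptions for illustration only.

```python
import math

def std_normal_pdf(x):
    """Probability density function of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def refined_normal_cdf(j, mu, sigma, gamma):
    """Skewness-corrected normal approximation of P(C0 <= j), clipped to [0, 1]."""
    x = (j + 0.5 - mu) / sigma                      # continuity-corrected Z-score
    g = std_normal_cdf(x) + gamma * (1.0 - x * x) * std_normal_pdf(x) / 6.0
    return min(1.0, max(0.0, g))                    # G(x) can leave [0, 1]

# Hypothetical moments; evaluate the approximation at a single value of j.
value = refined_normal_cdf(1.5, mu=2.0, sigma=1.0, gamma=0.6)
```

Because the function is evaluated pointwise, a single value of j can be queried without computing the distribution at any smaller j.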

[0078] In step S2, the cumulative distribution function of the Poisson binomial distribution can also be determined by the discrete Fourier transform (DFT) based on the characteristic function of the random variable.

[0079] In some examples of this application, the cumulative distribution function is determined by the following steps: for each reaction partition, the probability that the reaction partition is a negative reaction partition is determined based on the volume of the reaction partition, the value of the analyte molecule content m, and the total volume of the emulsion, resulting in multiple discrete values. A characteristic function is constructed based on these discrete values. A complex sequence of the characteristic function is determined based on the characteristic function, where each element is a complex number whose real and imaginary parts correspond to cosine and sine values at different frequencies, respectively. A fast Fourier transform is performed on the complex sequence to determine the probability mass function of the Poisson binomial distribution. The cumulative distribution function is determined based on the probability mass function.

[0080] The method for determining the aforementioned value of the content m of the analyte molecules is the same as described above; for details, refer to section S2-1, which is not repeated here.

[0081] To facilitate understanding, the following example will be used to describe in detail the process of determining the cumulative distribution function of the Poisson binomial distribution:

[0082] 1. Determine the probability that each reaction partition is a negative reaction partition.

[0083] Given the total emulsion volume, the total number of reaction partitions n, the content of the analyte molecules m, and the volume v_i of the i-th reaction partition, the probability that the i-th reaction partition is a negative partition can be expressed as:

[0084]

[0085] For each reaction partition, calculate the probability that it is a negative partition, obtaining a series of negative-partition probability values (these are discrete numerical values that depend on the volume of each partition).

[0086] 2. Constructing characteristic functions based on multiple discrete numerical values

[0087] Let P_i be the probability that the i-th reaction partition is a negative partition. The characteristic function is generally a complex exponential representation of the probability that each partition is a negative partition, which can be expressed as:

[0088]

[0089] Where t is the frequency parameter, i.e., the input of the characteristic function, and the complex exponential e^(it) is used to construct the complex representation (complex numbers are introduced into the characteristic function so that the Fourier transform can be applied);

[0090] 3. Construct a complex sequence of characteristic functions.

[0091] For each frequency t of the characteristic function, the corresponding complex value can be calculated. Its real part and imaginary part correspond to the cosine and sine values at that frequency, respectively. These complex values form a sequence of complex numbers that represents the probability characteristics of the Poisson binomial distribution.

[0092] 4. Perform Fast Fourier Transform (FFT) on complex sequences.

[0093] After applying the Fast Fourier Transform, the complex sequence of the characteristic function is transformed into a real sequence in which each element represents the probability of a specific number of successes (here, the number of negative reaction partitions); this sequence is the probability mass function (PMF). The resulting PMF contains all probabilities from zero successes (0) to the maximum number of successes (n).

[0094] 5. Determining the cumulative distribution function (CDF) based on the probability mass function.

[0095] The cumulative distribution function (CDF) represents the probability that a random variable takes a value less than or equal to a specific value. Therefore, the CDF is the cumulative form of the PMF; for each possible value k (representing the number of successes) of the Poisson binomial distribution, the cumulative distribution function can be expressed as CDF(k) = PMF(0) + PMF(1) + … + PMF(k). Based on the cumulative distribution function (CDF) of the Poisson binomial distribution, the distribution of the number of negative reaction partitions can be determined.
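The five steps above can be sketched as follows. This is an illustrative sketch, not the application's implementation: the negative-partition probabilities are passed in directly as hypothetical values, the characteristic function is sampled at the frequencies 2πl/(n+1), and a direct discrete Fourier transform is written out with the standard library. An FFT routine would return the same values with less computation.

```python
import cmath
import math

def characteristic_values(probs):
    """Sample the characteristic function prod_i (1 - p_i + p_i e^{it}) at t = 2*pi*l/(n+1)."""
    n = len(probs)
    omega = 2.0 * math.pi / (n + 1)
    values = []
    for l in range(n + 1):
        phi = complex(1.0, 0.0)
        for p in probs:
            phi *= (1.0 - p) + p * cmath.exp(1j * omega * l)
        values.append(phi)
    return values

def pmf_from_characteristic(values):
    """Direct DFT of the sampled characteristic function, divided by the sequence length."""
    size = len(values)
    omega = 2.0 * math.pi / size
    pmf = []
    for k in range(size):
        total = sum(values[l] * cmath.exp(-1j * omega * l * k) for l in range(size))
        pmf.append(max(0.0, (total / size).real))   # drop tiny negative round-off
    return pmf

def cdf_from_pmf(pmf):
    """Running sum of the PMF, capped at 1 to absorb round-off."""
    cdf, running = [], 0.0
    for mass in pmf:
        running += mass
        cdf.append(min(1.0, running))
    return cdf

probs = [0.5, 0.5]                      # hypothetical negative-partition probabilities
pmf = pmf_from_characteristic(characteristic_values(probs))
cdf = cdf_from_pmf(pmf)
```

For two partitions that are each negative with probability 0.5, the PMF over 0, 1, and 2 negative partitions comes out as 0.25, 0.5, and 0.25.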

[0096] In other examples of this application, the cumulative distribution function of the Poisson binomial distribution can also be determined by any of the Poisson approximation algorithm, neural network algorithm, recursive algorithm, convolution algorithm, and Monte Carlo algorithm.

[0097] Determining the cumulative distribution function of the Poisson binomial distribution using the Poisson approximation algorithm includes: 1) assuming that the success probability of each independent partition is small and the number of reaction partitions is large, approximating the Poisson binomial distribution of the partitions by a single Poisson distribution; 2) calculating the sum of the success probabilities of the partitions (the total event rate), i.e., λ = Σ p_i, where p_i is the success probability of the i-th partition; 3) calculating the cumulative distribution function (CDF) of the Poisson distribution, which can be expressed as P{N ≤ k} = Σ_{x=0}^{k} e^(-λ) λ^x / x!, where λ represents the parameter of the Poisson distribution, corresponding to the expected number of successes; 4) outputting the CDF of the Poisson approximation: this CDF is used in place of the cumulative distribution function of the Poisson binomial distribution to evaluate the measurement uncertainty.
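A minimal sketch of the Poisson approximation described above, assuming the per-partition success probabilities are already available. The function names and example probabilities are hypothetical.

```python
import math

def poisson_cdf(k, lam):
    """P(N <= k) for N ~ Poisson(lam), accumulated term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)           # probability of zero successes
    total = term
    for x in range(1, k + 1):
        term *= lam / x             # e^-lam * lam^x / x! from the previous term
        total += term
    return min(1.0, total)

def poisson_approx_cdf(k, probs):
    """Replace the Poisson binomial CDF by a Poisson CDF with lam = sum of p_i."""
    return poisson_cdf(k, sum(probs))

# Hypothetical case: 1000 partitions, each a success with probability 0.001.
approx = poisson_approx_cdf(1, [0.001] * 1000)
```

The approximation is only as good as its premise: it degrades when individual success probabilities are not small, which is the regime where the other algorithms listed are preferable.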

[0098] Determining the cumulative distribution function (CDF) of the Poisson binomial distribution using a neural network algorithm includes: 1) Constructing a training dataset: Generating simulated data, including the volume of each partition, the number of target molecules, and positive / negative labels; 2) Designing a neural network model: Using a regression neural network model (such as a multilayer perceptron MLP) or a convolutional neural network (CNN) to learn the probability distribution of target molecules in the reaction partitions; 3) Training the model: By inputting partition features (such as volume, total number of partitions) and the number of molecules to be tested, training the neural network to output the probability that each partition is a positive partition; 4) Predicting and calculating the CDF: Using the trained model to predict the positive probability of each partition, and then accumulating the results to obtain an approximate CDF value of the Poisson binomial distribution; 5) Using the output of the neural network: Obtaining the cumulative distribution function of the Poisson binomial distribution based on the output of the neural network.

[0099] Determining the cumulative distribution function of the Poisson binomial distribution using a recursive algorithm includes: 1) initializing the partition state: using the probability of the first partition as the initial state; 2) recursively calculating the probability of each partition: assuming the current cumulative probability is F_(k-1), calculating the negative or positive probability p_k of the k-th partition; 3) accumulating the current partition probability: adding the current partition probability p_k to the cumulative probability of the previous partitions to obtain the cumulative probability F_k; 4) continuing to traverse all partitions to obtain the final CDF approximation of the Poisson binomial distribution. The recursive algorithm requires initial conditions and computes the result indirectly, and its computational complexity is O(n²). The recursive formula is therefore inefficient and consumes considerable computing resources. Furthermore, its calculation is numerically unstable, and it should be avoided when n is large (for example, close to or greater than 20).

[0100] The cumulative distribution function of the Poisson binomial distribution is determined using a convolution algorithm, including:

[0101] 1) Initialize the probability distribution of a single partition: For each partition, construct its negative and positive probability distributions; 2) Convolution operation: Convolve the probability distributions of two adjacent partitions to obtain the joint probability distribution of the two partitions; Convolve this distribution with the probability of the third partition, and so on, until the probability distributions of all partitions have been convolved; 3) Obtain the total probability distribution: Obtain the joint probability distribution of all partitions through layer-by-layer convolution; 4) Calculate the CDF: Accumulate the convolution results to obtain the cumulative distribution function of the Poisson binomial distribution.
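The convolution steps above can be sketched as follows, assuming the per-partition success probabilities are given. Each partition contributes the two-point distribution (1 - p, p), and the running distribution is convolved with it in turn. The function names and example probabilities are hypothetical.

```python
def poisson_binomial_pmf(probs):
    """Exact PMF of the number of successes, built by successive convolution."""
    pmf = [1.0]                     # empty sum: zero successes with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)      # this partition is not a success
            nxt[k + 1] += mass * p          # this partition is a success
        pmf = nxt
    return pmf

def cumulative(pmf):
    """Accumulate the PMF into the CDF."""
    cdf, running = [], 0.0
    for mass in pmf:
        running += mass
        cdf.append(min(1.0, running))
    return cdf

pmf = poisson_binomial_pmf([0.2, 0.5])   # hypothetical probabilities
cdf = cumulative(pmf)
```

The cost grows quadratically with the number of partitions, so this exact form is mainly useful as a reference against which the faster approximations can be checked.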

[0102] Determining the cumulative distribution function (CDF) of the Poisson binomial distribution using the Monte Carlo algorithm includes: 1) Initializing simulation parameters: determining the total number of reaction partitions, the volume of each partition, and the number of molecules to be tested; 2) Random simulation experiments: conducting multiple random experiments (e.g., more than 1000), randomly assigning molecules to be tested to each partition based on the partition volume and the total number of molecules in each experiment; counting the number of positive and negative partitions in each experiment; 3) Calculating the success rate distribution: recording the number of positive partitions for each simulation experiment; accumulating all experimental results to obtain the frequency distribution of the success rate of positive partitions; 4) Calculating the CDF: calculating the cumulative distribution function of the success rate based on the frequency distribution, which is used to estimate the CDF of the Poisson binomial distribution.
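A minimal Monte Carlo sketch of the procedure above. It assumes that m is an integer number of molecules and that each molecule lands in a partition with probability proportional to the partition volume. It tallies the number of negative partitions C0 rather than positive ones, so that the result lines up with the C0 distribution used elsewhere in this description. The function names, seed, and example values are hypothetical.

```python
import random

def simulate_negative_counts(m, volumes, trials, seed=0):
    """Histogram of the number of empty partitions over repeated random assignments."""
    rng = random.Random(seed)
    n = len(volumes)
    counts = [0] * (n + 1)
    indices = range(n)
    for _ in range(trials):
        # Each of the m molecules picks a partition with probability ~ volume.
        occupied = set(rng.choices(indices, weights=volumes, k=m))
        counts[n - len(occupied)] += 1
    return counts

def empirical_cdf(counts):
    """Cumulative relative frequency of the simulated counts."""
    trials = sum(counts)
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / trials)
    return cdf

counts = simulate_negative_counts(5, [1.0, 1.0, 2.0], trials=200, seed=1)
cdf = empirical_cdf(counts)
```

The estimate carries sampling noise that shrinks only with the square root of the number of trials, so many more than the illustrative 200 trials would be needed for tail quantiles.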

[0103] S3. Based on the cumulative distribution function, evaluate the uncertainty of the content m of the analyte molecules in the emulsion.

[0104] In measurement, uncertainty describes the coverage interval (CI) around the measurand m, that is, the probability that the actual value falls within a certain range. By setting a coverage probability, such as 95% or 99%, an estimate of the interval can be calculated under that coverage probability, and the uncertainty can be evaluated based on this interval.

[0105] In some examples of this application, the uncertainty of m is evaluated by the following steps: using the cumulative distribution function and a predetermined coverage probability, the coverage interval of the content m is determined so as to obtain the lower and upper interval estimates corresponding to m; the uncertainty of m is then determined based on the lower and upper interval estimates and the value of m.

[0106] Specifically, the coverage interval of the analyte molecule content m is calculated using the cumulative distribution function (CDF) and a predetermined coverage probability (e.g., 95%). First, the lower and upper limits of the coverage interval, i.e., the interval estimates, are determined based on the CDF at the specified coverage probability. These represent the lower and upper limits of the true content m, such that there is a 95% probability that the true value lies between them. Then, based on formula (6), the uncertainty is calculated from the lower and upper limits and the value of m. Specifically, the uncertainty can be expressed as the relative uncertainty, which represents the confidence range around the measured value and quantifies the measurement precision. Furthermore, information such as the distribution of Precision(j, n, v_i) over C0 values in the interval [0, n], the measurement precision, measurement error, upper and lower limits of quantitation, and dynamic range can be determined.

[0107] The aforementioned technical solution, by utilizing the cumulative distribution function and the inclusion interval calculation method, can accurately assess the uncertainty of the content of the analyte, thereby providing a reliable range for the measurement results. This method effectively improves the measurement accuracy of random emulsion digital assays, enabling experimenters to quantify the reliability and deviation range of measured values, helping to reduce experimental errors, improve the repeatability of experimental data, and provide a basis for experimental optimization and cost control. This method also has universal applicability, contributing to the formation of a standardized measurement uncertainty assessment process.

[0108] To facilitate understanding, the following two examples illustrate in detail the method for determining the detection uncertainty of the analyte content in random emulsions:

[0109] Example 1: A method for determining the detection uncertainty of analyte content in random emulsions based on the refined normal approximation

[0110] In a random emulsion system, when the total number of analyte molecules m is an undetermined constant, the number of analyte molecules X_i contained in the i-th droplet in the system (i = 1, 2, 3, …, n) has an expectation that is also a constant, and the probability that the i-th droplet contains any given analyte molecule is also constant. If this probability is small enough, then by definition X_i approximately follows a Poisson distribution whose parameter is that expectation. Further derivation yields the probability that the number C0 of reaction partitions that do not contain the analyte molecule is j:

[0111]

[0112] That is, C0 follows a Poisson binomial distribution. In formula (X1), the combinations are those of selecting, from a total of n reaction partitions, the j reaction partitions that do not contain the analyte molecule; v_p (p = 1, 2, 3, …, n-j) represents the volume of the p-th reaction partition containing the analyte molecule; and v_q (q = 1, 2, 3, …, j) represents the volume of the q-th reaction partition that does not contain the analyte molecule. Formula (X1) is the Poisson binomial probability mass function of the random variable C0. Based on this distribution, the expected value μ, variance σ², and skewness γ of C0 are respectively:

[0113]

[0114]

[0115]

[0116] When the number of molecules to be measured, m, is an undetermined constant, the above three distribution parameters depend only on the total number of droplets, n, and the volume of each droplet, v_i. According to the Central Limit Theorem, when n is very large (i.e., the total number of droplets is large), the Poisson binomial distribution of C0 approximates a normal distribution with expectation μ and standard deviation σ. That is, the cumulative distribution function of the Poisson binomial distribution of C0 has the following relationship:

[0117] CDF(j, n, v_i) ≈ Φ((j + 0.5 - μ) / σ), j = 0, 1, 2, …, n, formula (X2);

[0118] Where Φ(x) is the cumulative distribution function of the standard normal distribution, and the above is the normal approximation (NA) of the Poisson binomial distribution. Since the normal distribution is symmetrical, while the Poisson binomial distribution may be asymmetrical, the refined normal approximation (RNA) can be used to correct the skewness in the Poisson binomial distribution.

[0119] CDF(j, n, v_i) ≈ G((j + 0.5 - μ) / σ), j = 0, 1, 2, …, n, formula (X3);

[0120] Wherein G(x) satisfies formula (4), Φ(x) and φ(x) are the cumulative distribution function and probability density function (PDF) of the standard normal distribution, respectively, and γ is the skewness of the Poisson binomial distribution of C0. The above formula represents the refined normal approximation of the Poisson binomial distribution. Since G(x) does not always lie within the interval [0, 1], values less than 0 need to be corrected to 0, and values greater than 1 need to be corrected to 1. The refined normal approximation algorithm is a direct calculation method and is much faster than the recursive algorithm. For example, for n = 500, the exact recursive algorithm takes approximately 0.16 seconds to calculate the exact PDF, while the RNA algorithm takes only 0.0002 seconds, approximately 800 times faster; for n = 1000, the exact recursive algorithm takes approximately 0.6 seconds, while the RNA algorithm takes only 0.00034 seconds, roughly 1,800 times faster. Furthermore, the RNA algorithm can directly evaluate the CDF and PDF at a specific value of j. In contrast, by definition, the exact recursive algorithm must evaluate all previous values j = 0, 1, 2, …, k to compute the distribution value at j = k. In summary, when the Poisson binomial distribution has many parameters, the refined normal approximation can be used to compute the CDF and PDF. The normal approximation works very well when n ≥ 500 and the mean of the distribution is sufficiently far from the values 0 and n.

[0121] Based on the above derivation, a refined normal approximation CDF curve for C0 can be calculated. Using this curve and the values on it, the two-sided quantiles z_(α/2) and z_(1-α/2) of the refined normal approximation of the Poisson binomial distribution can be calculated using interpolation (linear interpolation, shape-preserving piecewise cubic interpolation, cubic spline interpolation, Akima interpolation, etc.) or fitting (linear fitting, nonlinear fitting, etc.) methods, where z_(α/2) and z_(1-α/2) satisfy P{C0 ≤ z_(α/2)} = α/2 and P{C0 ≤ z_(1-α/2)} = 1 - α/2. This gives the refined normal approximation coverage interval (CI) [z_(α/2), z_(1-α/2)] of C0 with a coverage probability of 1 - α. Based on these two-sided quantiles, taking j_low = z_(α/2) and j_high = z_(1-α/2) and substituting them into formula (5) gives the interval estimates corresponding to the number of target molecules m to be determined:

[0122]

[0123] Where v_i represents the volume of the i-th reaction partition, v_p represents the volume of the p-th positive reaction partition, v_q represents the volume of the q-th negative reaction partition, j represents the number of negative reaction partitions, and n represents the number of reaction partitions in the emulsion.
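The quantile step described above can be sketched with linear interpolation, one of the interpolation options listed. The sketch assumes the CDF of C0 has already been tabulated at j = 0, 1, …, n by any of the methods in this description; it returns only the coverage interval of C0 and does not perform the subsequent substitution into formula (5). The function names and the example CDF values are hypothetical.

```python
def interpolate_quantile(cdf, prob):
    """Linearly interpolate z such that CDF(z) = prob, given CDF values at j = 0..n."""
    if prob <= cdf[0]:
        return 0.0
    for j in range(1, len(cdf)):
        if cdf[j] >= prob:
            lo, hi = cdf[j - 1], cdf[j]          # lo < prob <= hi, so hi > lo
            return (j - 1) + (prob - lo) / (hi - lo)
    return float(len(cdf) - 1)

def coverage_interval(cdf, alpha=0.05):
    """Two-sided quantiles (z_{alpha/2}, z_{1-alpha/2}) of C0."""
    return (interpolate_quantile(cdf, alpha / 2.0),
            interpolate_quantile(cdf, 1.0 - alpha / 2.0))

cdf = [0.1, 0.5, 0.9, 1.0]                        # hypothetical tabulated CDF
j_low, j_high = coverage_interval(cdf, alpha=0.6)
```

A shape-preserving cubic or spline interpolant could be substituted for the linear step without changing the surrounding logic.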

[0124] Therefore, the coverage interval of the number of target molecules m is obtained from these two interval estimates. Substituting the measured value j of C0 into the above formula gives the maximum likelihood point estimate of the number of target molecules m to be determined.

[0125] Therefore, how the width of the coverage interval of the number of target molecules m varies relative to the point estimate of the number of target molecules can be further calculated, and the theoretical number of target molecules m and the corresponding C0 value at which the optimal measurement precision is achieved can be obtained.

[0126]

[0127] Based on the above formula, the value of the point estimate at which Precision(j, n, v_i) reaches its minimum can be determined, as can the distribution of Precision(j, n, v_i) over C0 values in the interval [0, n]. From these, measurement uncertainty information such as measurement precision, measurement error, coverage interval, detection limit, upper and lower limits of quantitation, dynamic range, and measurement resolution is further calculated.

[0128] Example 2: A method for determining the uncertainty of analyte content detection in random emulsions based on Discrete Fourier Transform-characteristic function.

[0129] The characteristic function (CF) is an important concept in probability theory and statistics, primarily used to describe the probability distribution characteristics of a random variable. It is the Fourier transform of the probability distribution of the random variable, provides comprehensive information about the distribution, and fully defines it. For a random variable X, its characteristic function is the expected value of the complex exponential function of the random variable and is defined as follows:

[0130]

[0131] Where E represents the expected value; the bolded i is the imaginary unit, satisfying i² = -1; t is a real parameter, usually representing frequency; and X is a random variable. Let I_i be a random variable representing the state of the i-th droplet, i.e., whether or not it contains any copy of the target molecule: when the droplet contains a copy, I_i takes the value 0, and when it does not, I_i takes the value 1. The sum of the I_i is then the total number of droplets in the system that do not contain any copy of the target molecule; it takes values in the range [0, n] and follows a Poisson binomial distribution:

[0132]

[0133] Let t = ωl, l = 0, 1, 2, …, n, and ω = 2π / (n + 1). Substituting these values into the above equation, we get:

[0134]

[0135] Note that the left side of the above equation is the inverse discrete Fourier transform (IDFT) of the sequence {PMF(0, n, v_i), PMF(1, n, v_i), …, PMF(n, n, v_i)}. Performing a discrete Fourier transform on both sides of the above equation therefore yields, on the left side, the probability mass function sequence of the Poisson binomial distribution, {PMF(0, n, v_i), PMF(1, n, v_i), …, PMF(n, n, v_i)}. Specifically, its probability mass function can be obtained:

[0136]

[0137] Therefore, the cumulative distribution function of the Poisson binomial distribution can be obtained:

[0138]

[0139] Furthermore, the Discrete Fourier Transform-Characteristic Function (DFT-CF) algorithm can be used to calculate the function values of the PMF and CDF. The specific process is as follows:

[0140] First, based on the estimated number of target molecules, the total number of droplets n, and the volume v_i of each droplet, calculate the probability P{X_i = 0} that each droplet does not contain any copy of the target molecule.

[0141] Take z_i(l) = 1 - P{X_i = 0} + P{X_i = 0} cos(ωl) + i P{X_i = 0} sin(ωl), where the leading i of the last term is the imaginary unit and ω = 2π / (n + 1), formula (X10),

[0142] Then the modulus and argument of the complex number z_i(l) are calculated using the following two formulas:

[0143] |z_i(l)| = {[1 - P{X_i = 0} + P{X_i = 0} cos(ωl)]² + [P{X_i = 0} sin(ωl)]²}^(1/2), formula (X11),

[0144] Arg(z_i(l)) = atan2(P{X_i = 0} sin(ωl), 1 - P{X_i = 0} + P{X_i = 0} cos(ωl)), formula (X12), where atan2() is the four-quadrant arctangent function, an extension of arctan() whose range is extended from [-π/2, π/2] to [-π, π].

[0145] In addition, the following can further be obtained:

[0146]

[0147] Let l take the values 1, 2, …, ceil(n / 2), where ceil() is the ceiling function (rounding up), and calculate all values of a_l and b_l,

[0148] Let l take the values ceil(n / 2) + 1, ceil(n / 2) + 2, …, n respectively, and calculate all values of a_l and b_l using the following formula:

[0149] a_l = a_(n+1-l), b_l = -b_(n+1-l), formula (X15)

[0150] Taking the a_l calculated above as the real part and b_l as the imaginary part forms the set of complex numbers {a_1 + ib_1, a_2 + ib_2, …, a_(n+1) + ib_(n+1)}. Divide each term of the complex set by (n + 1), then perform a Fast Fourier Transform (FFT) to obtain the PMF values of the Poisson binomial distribution. Finally, calculate the CDF of the Poisson binomial distribution using the following formula:

[0151]
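The DFT-CF procedure above can be sketched as follows. The formulas for a_l and b_l are not reproduced in the text, so this sketch assumes the usual construction: the product over droplets of z_i(l) is formed from the sum of log-moduli and the sum of arguments, with a_l and b_l its real and imaginary parts and the l = 0 term equal to 1. The transform is written as a direct DFT with the standard library; an FFT routine would give the same values faster. The function name and example probabilities are hypothetical.

```python
import cmath
import math

def dft_cf_pmf(probs):
    """PMF of the number of successes via modulus/argument sums and conjugate symmetry."""
    n = len(probs)
    omega = 2.0 * math.pi / (n + 1)
    half = math.ceil(n / 2)
    x = [complex(1.0, 0.0)] + [0j] * n              # term for l = 0 is 1
    for l in range(1, half + 1):
        log_mod, arg = 0.0, 0.0
        for p in probs:
            re = 1.0 - p + p * math.cos(omega * l)
            im = p * math.sin(omega * l)
            mod = math.hypot(re, im)                # modulus, as in formula (X11)
            log_mod += math.log(mod) if mod > 0.0 else -math.inf
            arg += math.atan2(im, re)               # argument, as in formula (X12)
        d = math.exp(log_mod)
        x[l] = complex(d * math.cos(arg), d * math.sin(arg))
    for l in range(half + 1, n + 1):
        x[l] = x[n + 1 - l].conjugate()             # symmetry, as in formula (X15)
    size = n + 1
    pmf = []
    for k in range(size):
        total = sum(x[l] * cmath.exp(-1j * omega * l * k) for l in range(size))
        pmf.append(max(0.0, total.real / size))     # divide by (n+1), drop round-off
    return pmf

pmf = dft_cf_pmf([0.2, 0.5, 0.9])                   # hypothetical probabilities
```

Summing log-moduli instead of multiplying the complex factors directly avoids underflow when n is large, and the symmetry step halves the number of products that must be evaluated.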

[0152] Based on the above derivation and calculation results, a DFT-CF CDF curve of C0 can be calculated. Using this curve and the values on it, the two-sided quantiles z_(α/2) and z_(1-α/2) of the Poisson binomial distribution can be calculated using interpolation (linear interpolation, shape-preserving piecewise cubic interpolation, cubic spline interpolation, Akima interpolation, etc.) or fitting (linear fitting, nonlinear fitting, etc.) methods, where P{C0 ≤ z_(α/2)} = α/2 and P{C0 ≤ z_(1-α/2)} = 1 - α/2. This gives the coverage interval [z_(α/2), z_(1-α/2)] of C0 with a coverage probability of 1 - α. Based on these two-sided quantiles, taking j_low = z_(α/2) and j_high = z_(1-α/2) and substituting them into the following formula (see application number PCT/CN2019/122068, grant number CN114729397B) gives the interval estimates corresponding to the number of target molecules m:

[0153]

[0154] Where v_i represents the volume of the i-th reaction partition, v_p represents the volume of the p-th positive reaction partition, v_q represents the volume of the q-th negative reaction partition, j represents the number of negative reaction partitions, and n represents the number of reaction partitions in the emulsion.

[0155] Therefore, the coverage interval of the number of target molecules m is obtained from these two interval estimates. Substituting the measured value j of C0 into the above formula gives the maximum likelihood point estimate of the number of target molecules m to be determined.

[0156] Therefore, how the width of the coverage interval of the number of target molecules m varies relative to the point estimate of the number of target molecules can be further calculated, and the theoretical number of target molecules m and the corresponding C0 value at which the optimal measurement precision is achieved can be obtained.

[0157]

[0158] Based on the above formula, the value of the point estimate at which Precision(j, n, v_i) reaches its minimum can be determined, as can the distribution of Precision(j, n, v_i) over C0 values in the interval [0, n]. From these, measurement uncertainty information such as measurement precision, measurement error, coverage interval, detection limit, upper and lower limits of quantitation, dynamic range, and measurement resolution is further calculated.

[0159] System for determining the detection uncertainty of analyte content in random emulsions

[0160] In another aspect, this application proposes a system for determining the detection uncertainty of the content of analyte molecules in random emulsions. Referring to Figure 2, the system includes a measurement parameter determination module 100, a distribution function construction module 200, and an uncertainty evaluation module 300, wherein:

[0161] Module 100 is used to determine the total number of reaction zones in a random emulsion, the volume of each reaction zone, and to classify each reaction zone into a positive reaction zone and a negative reaction zone, wherein the positive reaction zone contains the analyte molecule, and the negative reaction zone does not contain the analyte molecule.

[0162] In some examples of this application, the random emulsion is at least one of a digital PCR reaction system, a digital isothermal amplification reaction system, and a digital ELISA reaction system.

[0163] Module 200 is used to determine the cumulative distribution function of a Poisson binomial distribution based on a predetermined content m of analyte molecules in an emulsion and the total number of reaction zones and the volume of each reaction zone in a random emulsion, wherein the Poisson binomial distribution is used to describe the number of reaction zones containing a specific number of analyte molecules.

[0164] In some examples of this application, the content m of the analyte molecules in the emulsion is determined based on the reaction partitions in the emulsion satisfying a Poisson binomial distribution. Specifically, the content m of the analyte molecules in the emulsion is determined by satisfying the following formula:

[0165]

[0166] Where v_i represents the volume of the i-th reaction zone, v_p represents the volume of the p-th positive reaction zone, v_q represents the volume of the q-th negative reaction zone, j represents the number of negative reaction zones, and n represents the number of reaction zones in the emulsion.

[0167] More specifically, the value of the content m of the analyte molecules in the emulsion is determined through the following steps: (a) generating multiple values of m; (b) calculating the left-hand and right-hand values of formula (5) based on each value; and (c) selecting the value for which the left-hand and right-hand values satisfy the predetermined requirements as the predetermined value of the content m of the analyte molecules in the emulsion.

[0168] In some examples of this application, the plurality of values is generated in the range of 0 to 1,000,000 with a step size of 0.1.

[0169] In some examples of this application, in step (c), a value that makes the ratio of the left-side value to the right-side value equal to 1 is selected as the predetermined value of the content m of the analyte in the emulsion.

[0170] In some examples of this application, in step (c), multiple values are selected such that the ratio of the left-side value to the right-side value is between 0.99 and 1.01, and the average of these values is taken as the predetermined value of the content m of the analyte molecules in the emulsion.
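Steps (a) to (c) can be sketched as a grid search. Formula (5) is not reproduced in the text, so the sketch assumes a maximum-likelihood condition derived from a Poisson occupancy model: the sum over positive zones of v_p / (exp(m·v_p / V) - 1) equals the sum over negative zones of v_q, with V the total volume. That form, the function names, the reduced search range, and the example volumes are assumptions for illustration and may differ from the application's formula.

```python
import math

def lhs_rhs(m, positive_volumes, negative_volumes):
    """Left and right sides of the assumed likelihood condition for a candidate m."""
    total = sum(positive_volumes) + sum(negative_volumes)
    lhs = sum(v / math.expm1(m * v / total) for v in positive_volumes)
    rhs = sum(negative_volumes)
    return lhs, rhs

def grid_search_m(positive_volumes, negative_volumes, m_max, step=0.1, tol=0.01):
    """Average of all grid values of m whose left/right ratio lies within 1 +/- tol."""
    if not positive_volumes or not negative_volumes:
        raise ValueError("need at least one positive and one negative zone")
    hits = []
    for s in range(1, int(round(m_max / step)) + 1):
        m = s * step
        lhs, rhs = lhs_rhs(m, positive_volumes, negative_volumes)
        if abs(lhs / rhs - 1.0) <= tol:
            hits.append(m)
    return sum(hits) / len(hits) if hits else None

# Hypothetical data: 40 equal-volume zones, 20 positive and 20 negative.
m_hat = grid_search_m([1.0] * 20, [1.0] * 20, m_max=100.0)
```

For equal volumes the condition reduces to the familiar m = -n·ln(j / n), which gives a convenient check on the search; a root finder such as bisection would locate the same value with far fewer evaluations than a full grid.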

[0171] In some examples of this application, the cumulative distribution function of the Poisson binomial distribution is determined by approximating the Poisson binomial distribution as a normal distribution with skewness correction.

[0172] In some examples of this application, module 200 is further used to: determine, based on the predetermined value of the content m of the analyte molecules in the emulsion, the total number of reaction zones in the random emulsion, and the volume of each reaction zone, the expectation, variance, and skewness of the Poisson binomial distribution used to describe the number of reaction zones containing a specific number of analyte molecules; and determine, based on the determined expectation, variance, and skewness, a refined normal approximation cumulative distribution function to describe the number of reaction zones containing a specific number of analyte molecules.

[0173] In some examples of this application, the expectation, variance, and skewness are determined according to the following formulas:

[0174]

[0175]

[0176]

[0177] where μ represents the expectation; σ² represents the variance; γ represents the skewness; n represents the total number of reaction zones; v_i represents the volume of the i-th reaction zone; m represents the content of the analyte molecules; and C0 represents the number of negative reaction zones.
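As an illustration of how these three quantities can be computed, the sketch below uses the textbook moments of a Poisson binomial variable. It assumes, rather than quotes, that the probability of the i-th reaction zone being negative is p_i = exp(-m·v_i/V_total) (Poisson loading); the function names are hypothetical.

```python
import math

def negative_probabilities(m, volumes):
    """ASSUMED Poisson loading: zone i is negative with probability exp(-m * v_i / V)."""
    total = sum(volumes)
    return [math.exp(-m * v / total) for v in volumes]

def poisson_binomial_moments(probs):
    """Expectation, variance and skewness of C0 = sum of independent Bernoulli(p_i)."""
    mu = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    third = sum(p * (1.0 - p) * (1.0 - 2.0 * p) for p in probs)
    # Skewness is undefined when the variance is zero; return 0.0 in that case.
    gamma = third / var ** 1.5 if var > 0.0 else 0.0
    return mu, var, gamma

# Hypothetical usage: 1000 molecules spread over four unequal zones.
mu, var, gamma = poisson_binomial_moments(
    negative_probabilities(1000.0, [0.5, 1.0, 2.0, 4.0])
)
```

When all p_i are equal the result reduces to the ordinary binomial moments, which is a convenient check.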

[0178] In some examples of this application, the refined normal approximation cumulative distribution function is obtained by correcting the cumulative distribution function of the standard normal distribution using the skewness.

[0179] In some examples of this application, the refined normal approximation cumulative distribution function is determined by the following steps: (i) generating a series of values in the range of 0 to n, where n is the total number of reaction zones; (ii) converting the series of values into Z-scores of a standard normal distribution as x values; (iii) constructing a normal distribution curve for x according to the expected value and variance; (iv) determining, based on the normal distribution curve, the probability density function φ(x) of the standard normal distribution and the cumulative distribution function Φ(x) of the standard normal distribution; and (v) determining the refined normal approximation cumulative distribution function based on formula (4), where φ(x) is the probability density function of the standard normal distribution determined by the expected value and variance obtained in step S2, and Φ(x) is the cumulative distribution function of the standard normal distribution.
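Formula (4) is not reproduced in this passage. A widely used skewness-corrected form for a Poisson binomial variable is g(x) = Φ(x) + γ(1 - x²)φ(x)/6, clipped to [0, 1], and the sketch below assumes that form. The continuity-corrected Z-score x = (j + 0.5 - μ)/σ is the one given in Step 3 of Example 1; the function name is hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def refined_normal_cdf(j, mu, sigma, gamma):
    """Skewness-corrected normal approximation of P(C0 <= j).

    ASSUMED form of formula (4): g(x) = Phi(x) + gamma * (1 - x^2) * phi(x) / 6.
    """
    x = (j + 0.5 - mu) / sigma          # continuity-corrected Z-score
    g = _STD_NORMAL.cdf(x) + gamma * (1.0 - x * x) * _STD_NORMAL.pdf(x) / 6.0
    return min(1.0, max(0.0, g))        # keep the result a valid probability

# With zero skewness the correction vanishes and the plain normal CDF is recovered.
plain = refined_normal_cdf(1.5, mu=2.0, sigma=1.0, gamma=0.0)
corrected = refined_normal_cdf(1.5, mu=2.0, sigma=1.0, gamma=0.6)
```

Clipping is a design choice made here because the correction term can push the raw value slightly outside [0, 1] in the far tails.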

[0180] In other examples of this application, the cumulative distribution function of the Poisson binomial distribution is determined by the discrete Fourier transform (DFT) based on the characteristic function of the random variable. Specifically, the cumulative distribution function is determined by the following steps: for each reaction partition, the probability that the partition is a negative reaction partition is determined based on the total number of reaction partitions n, the partition volume v_i, the total volume of the emulsion, and the predetermined value of m, yielding multiple discrete values from which a characteristic function is constructed. Based on the characteristic function, a complex sequence is determined, where each element is a complex number whose real and imaginary parts correspond to the cosine and sine values at different frequencies, respectively. A fast Fourier transform is performed on the complex sequence to determine the probability mass function of the Poisson binomial distribution. The cumulative distribution function is then determined based on the probability mass function.
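The following is a minimal sketch of the characteristic-function route, not the applicant's implementation. It samples the characteristic function of C0 at the n + 1 DFT frequencies and inverts it with a direct O(n²) transform so that the example needs only the standard library; in practice an FFT routine would replace the inner sum. The input `probs` holds the per-partition probabilities of being negative, and all function names are hypothetical.

```python
import cmath
import math

def poisson_binomial_pmf_dft(probs):
    """PMF of C0 = sum of independent Bernoulli(p_i) via its characteristic function."""
    size = len(probs) + 1
    omega = 2.0 * math.pi / size
    # Characteristic function at frequency omega * l: prod_i (1 - p_i + p_i * e^{i omega l}).
    chi = []
    for l in range(size):
        z = cmath.exp(1j * omega * l)
        value = 1.0 + 0.0j
        for p in probs:
            value *= (1.0 - p) + p * z
        chi.append(value)
    # Forward DFT of chi divided by (n + 1) recovers the probability masses.
    pmf = []
    for k in range(size):
        acc = sum(chi[l] * cmath.exp(-1j * omega * l * k) for l in range(size))
        pmf.append(max(0.0, acc.real / size))  # drop tiny negative round-off
    return pmf

def cumulative(pmf):
    """Running sum of the PMF, giving P(C0 <= j) for j = 0..n."""
    cdf, running = [], 0.0
    for mass in pmf:
        running += mass
        cdf.append(min(1.0, running))
    return cdf

pmf = poisson_binomial_pmf_dft([0.2, 0.7])
cdf = cumulative(pmf)
```

For two partitions with negative probabilities 0.2 and 0.7, the masses for zero, one, and two negative partitions can be checked by hand as 0.8·0.3, 0.2·0.3 + 0.8·0.7, and 0.2·0.7.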

[0181] In some other examples of this application, the cumulative distribution function of the Poisson binomial distribution can also be determined by any of a Poisson approximation algorithm, a neural network algorithm, a recursive algorithm, a convolution algorithm, or a Monte Carlo algorithm.
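Of the alternatives listed, the convolution (equivalently, recursive) approach is the simplest to illustrate: partitions are added one at a time and the probability mass is updated exactly. The sketch below is a generic textbook version with a hypothetical function name, offered only to show the idea.

```python
def poisson_binomial_pmf_convolution(probs):
    """Exact PMF of C0 by convolving one Bernoulli(p_i) at a time (O(n^2))."""
    pmf = [1.0]  # with zero partitions, C0 = 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this partition is positive
            nxt[k + 1] += mass * p       # this partition is negative
        pmf = nxt
    return pmf

pmf_exact = poisson_binomial_pmf_convolution([0.2, 0.7])
```

Because it is exact up to floating-point error, this routine is also useful as a reference when validating an approximate method on small inputs.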

[0182] Module 300 is used to evaluate the uncertainty of the content m of the analyte molecules in the emulsion based on the cumulative distribution function.

[0183] In some examples of this application, evaluating the uncertainty of the content m of the analyte molecules in the emulsion based on the cumulative distribution function further includes: using the cumulative distribution function to determine, based on a predetermined coverage probability, the coverage interval of the content value m, thereby obtaining the lower and upper interval estimates corresponding to m; and determining the uncertainty of m based on the lower and upper interval estimates and the predetermined value of m. Specifically, the uncertainty of m is determined from these three quantities according to formula (6).
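A minimal sketch of the interval step follows, assuming the cumulative distribution function is available as a list indexed by the number of negative partitions j. It uses the smallest j whose cumulative probability reaches each tail level, which is a simplification of the interpolation described in the examples; the function name and the toy CDF are hypothetical.

```python
def coverage_bounds(cdf, coverage=0.95):
    """Return (j_lower, j_upper): the smallest j with CDF(j) >= alpha/2 and
    the smallest j with CDF(j) >= 1 - alpha/2, where alpha = 1 - coverage."""
    alpha = 1.0 - coverage
    j_lower = next(j for j, c in enumerate(cdf) if c >= alpha / 2.0)
    j_upper = next(j for j, c in enumerate(cdf) if c >= 1.0 - alpha / 2.0)
    return j_lower, j_upper

# Hypothetical CDF over j = 0..5 for illustration only.
toy_cdf = [0.01, 0.05, 0.50, 0.95, 0.99, 1.00]
bounds = coverage_bounds(toy_cdf, coverage=0.95)
```

The two bounds on the number of negative partitions are then mapped to bounds on m through formula (5). Under Poisson loading more negative partitions correspond to a smaller m, so the upper bound on C0 would be expected to yield the lower bound on m.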

[0184] The aforementioned uncertainty-related information includes measurement precision, measurement error, coverage interval, detection limit, upper and lower limits of quantitation, dynamic range, and measurement resolution.

[0185] In some examples of this application, the aforementioned modules are interconnected. This interconnection includes both physical and network connections.

[0186] Those skilled in the art will understand that the features and advantages described above for the method of determining the detection uncertainty of the content of analytes in random emulsions are also applicable to the above system, and will not be repeated here.

[0187] It should be understood that the system embodiments and method embodiments can correspond to each other, and similar descriptions can be found in the method embodiments. To avoid repetition, further details are omitted here. Specifically, the system shown in Figure 2 can perform the above-described method for determining the uncertainty of the content of analytes in random emulsions, and the operations and/or functions performed by each module in the system correspond to those in the method embodiments. For the sake of brevity, they will not be described in detail here.

[0188] The system of this application embodiment has been described above from the perspective of functional modules in conjunction with the accompanying drawings. It should be understood that this functional module can be implemented in hardware, in software instructions, or in a combination of hardware and software modules. Specifically, the steps of the method embodiments in this application can be completed by integrated logic circuits in the processor's hardware and / or by software instructions. The steps of the method disclosed in this application embodiment can be directly embodied as being executed by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. Optionally, the software module can reside in a mature storage medium in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. This storage medium is located in memory, and the processor reads information from the memory and, in conjunction with its hardware, completes the steps in the above method embodiments.

[0189] Computer program products, computing devices and computer-readable storage media

[0190] In another aspect, this application provides a computer program product, a computing device, and a computer-readable storage medium. Based on the aforementioned computer program product, computing device, or computer-readable storage medium, the method described above for determining the detection uncertainty of the content of analytes in a random emulsion is performed.

[0191] Descriptions of computer program products, computing devices, or computer-readable storage media may be referenced interchangeably. This document uses electronic devices as examples for detailed description. The term "electronic device" is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Computing devices may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the present disclosure described and / or claimed herein.

[0192] As shown in Figure 3, the electronic device 500 includes a computing unit 501, which can perform various appropriate actions and processes based on a computer program stored in ROM (Read-Only Memory) 502 or a computer program loaded from storage unit 508 into RAM (Random Access Memory) 503. The RAM 503 can also store various programs and data required for the operation of the device 500. The computing unit 501, ROM 502, and RAM 503 are interconnected via a bus 504. An I/O (Input/Output) interface 505 is also connected to the bus 504.

[0193] Multiple components in device 500 are connected to I / O interface 505, including: input unit 506, such as keyboard, mouse, etc.; output unit 507, such as various types of monitors, speakers, etc.; storage unit 508, such as disk, optical disk, etc.; and communication unit 509, such as network card, modem, wireless transceiver, etc. Communication unit 509 allows device 500 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.

[0194] The computing unit 501 can be various general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, CPUs (Central Processing Units), GPUs (Graphics Processing Units), various special-purpose AI (Artificial Intelligence) computing chips, various computing units running machine learning model algorithms, DSPs (Digital Signal Processors), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the various methods and processes described above, such as methods for determining the detection uncertainty of the content of analytes in a random emulsion. For example, in some embodiments, the method for determining the detection uncertainty of the content of analytes in a random emulsion can be implemented as a computer software program tangibly contained in a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program can be loaded and / or installed on device 500 via ROM 502 and / or communication unit 509. When the computer program is loaded into RAM 503 and executed by the computing unit 501, one or more steps of the methods described above can be performed. Alternatively, in other embodiments, the computing unit 501 may be configured by any other suitable means (e.g., by means of firmware) to perform the aforementioned method for determining the detection uncertainty of the content of analytes in a random emulsion.

[0195] In this application, the logic and / or steps represented in the flowcharts or otherwise described herein, for example, can be considered as a sequenced list of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a processor-included system, or other system that can fetch and execute instructions from, an instruction execution system, apparatus, or device). For the purposes of this specification, "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transmit programs for use by, or in conjunction with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection having one or more wires (electronic device), a portable computer disk drive (magnetic device), random access memory (RAM), read-only memory (ROM), erasable and editable read-only memory (EPROM or flash memory), fiber optic devices, and portable optical disc read-only memory (CDROM). Furthermore, the computer-readable medium can even be paper or other suitable media on which the aforementioned program can be printed, because the aforementioned program can be obtained electronically, for example, by optically scanning the paper or other medium, followed by editing, interpreting, or, if necessary, processing in other suitable ways, and then stored in a computer memory. The various computer-readable storage media described in this invention can represent one or more devices and / or other machine-readable storage media for storing information. The term "machine-readable storage medium" can include, but is not limited to, wireless channels and various other media capable of storing, containing, and / or carrying instructions and / or data.

[0196] It should be understood that various parts of the present invention can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.

[0197] Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be implemented by a program instructing related hardware. The aforementioned program can be stored in a computer-readable storage medium, and when executed, the program includes one or a combination of the steps of the method embodiments.

[0198] Furthermore, the functional units in the various embodiments of the present invention can be integrated into a processing module, or each unit can exist physically separately, or two or more units can be integrated into a module. The integrated module can be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.

[0199] It should be understood that the various forms of processes shown above can be used to rearrange, add, or delete steps. For example, the steps described in this disclosure can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution disclosed in this disclosure can be achieved, and this is not limited herein.

[0200] It should be noted that the features and technical effects described in this article for different aspects can be mutually referenced, and will not be elaborated further here.

[0201] The present application's solution will be explained below with reference to embodiments. Those skilled in the art will understand that the following embodiments are for illustrative purposes only and should not be construed as limiting the scope of the present application.

[0202] Example 1: Refined Normal Approximation Evaluation Method

[0203] This embodiment uses simulated data of dispersed droplets following a log-normal distribution to verify the refined normal approximation method for evaluating the uncertainty in detecting the content of analytes in random emulsions. The specific method steps are as follows:

[0204] Step 1: Set the preset parameters: the total number of droplets n is 1024, the average droplet volume is 4, and the standard deviation of droplet volume is 8 (i.e., the coefficient of variation is 200%). Use a random number generation function to generate 1024 simulated droplet volumes v_i that follow a log-normal distribution, and sort them by value, as shown in Figure 5. In the simulated data, the average droplet volume is 4.13, the standard deviation of volume is 8.12, the minimum volume is 0.0267514598060987, the maximum volume is 149.958178997929, and the median volume is 1.75023610586248.
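One way to carry out this simulation step is sketched below. The conversion from a target mean and standard deviation to the parameters of the underlying normal distribution is the standard log-normal identity; the seed and function names are hypothetical, so the generated values will not match those reported above.

```python
import math
import random

def lognormal_parameters(mean, sd):
    """Parameters (mu, sigma) of the underlying normal distribution that give
    a log-normal variable the requested mean and standard deviation."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)   # equals ln(1 + CV^2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

def simulate_volumes(n, mean, sd, seed=0):
    """Draw n log-normal droplet volumes and return them sorted by value."""
    mu, sigma = lognormal_parameters(mean, sd)
    rng = random.Random(seed)  # hypothetical fixed seed for repeatability
    return sorted(rng.lognormvariate(mu, sigma) for _ in range(n))

volumes = simulate_volumes(1024, mean=4.0, sd=8.0)
```

Because the distribution is heavy-tailed at a coefficient of variation of 200%, the sample mean and standard deviation of 1024 draws can differ noticeably from the targets of 4 and 8, as the reported 4.13 and 8.12 illustrate.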

[0205] Step 2: Assume that the predetermined value of the content m of the analyte molecules in the emulsion is 100, 500, 2000, and 10000, respectively. Using formulas (1), (2), and (3), calculate the expectation μ, variance σ², and skewness γ of C0 corresponding to each predetermined value, as shown in Table 1.

[0206] Table 1. Expectation, variance and skewness of the distribution of negative droplet numbers corresponding to different values of analyte molecule content.

[0207]

[0208] Step 3: The total number of droplets n is 1024, so the valid range of the number of negative droplets C0 is [0, 1024], i.e., j = 0, 1, 2, ..., 1024. Using the formula x = (j + 0.5 - μ) / σ, calculate the values of the standardized variable x used in the (refined) normal approximation, i.e., x = -116.9462, -116.8217, -116.6973, ..., 10.4806.

[0209] Step 4: Based on the cumulative distribution function Φ(x) and the probability density function of the standard normal distribution, together with formula (4), the number of negative droplets C0 approximately follows the refined normal approximation cumulative distribution function: CDF(j, n, v_i) ≈ g((j + 0.5 - μ) / σ), j = 0, 1, 2, …, n. The corresponding refined normal approximation probability mass function can then be calculated by differencing.

[0210] Step 5: Set the coverage probability to 95% and interpolate the refined normal approximation cumulative distribution function to obtain the lower and upper quantiles z_0.025 and z_0.975. Determine the range of the number of negative droplets C0 between these two quantiles, and calculate the corresponding range of the analyte content m according to formula (5), as shown in Table 2.

[0211] Table 2 shows the upper and lower limits of the number of negative droplets and the upper and lower limits of the content of the analyte corresponding to different values of the analyte molecule content.

[0212]

[0213] Based on the above probability mass function, cumulative distribution function, and coverage interval calculations, the probability distributions obtained by the normal approximation and the refined normal approximation for the different predetermined values of m are summarized in Figure 6.

[0214] Example 2: Discrete Fourier Transform - Characteristic Function Evaluation Method

[0215] This embodiment uses simulated data of dispersed droplets following a log-normal distribution to verify the discrete Fourier transform-characteristic function evaluation method for determining the detection uncertainty of the content of analytes in random emulsions. The specific method steps are as follows:

[0216] Step 1: Set the preset parameters: the total number of droplets n is 1024, the average droplet volume is 4, and the standard deviation of droplet volume is 8 (i.e., the coefficient of variation is 200%). Use a random number generation function to generate 1024 simulated droplet volumes v_i that follow a log-normal distribution, and sort them by value, as shown in Figure 5. In the simulated data, the average droplet volume is 4.13, the standard deviation of volume is 8.12, the minimum volume is 0.0267514598060987, the maximum volume is 149.958178997929, and the median volume is 1.75023610586248.

[0217] Step 2: Assume that the predetermined value of the content m of the analyte molecules in the emulsion is 100, 500, 2000, and 10000, respectively. Using formula (X9), calculate for each predetermined value the set of probabilities P_i = P{X_i = 0}, as shown in Table 3. The four probability sets, sorted by value, are shown in Figure 7.

[0218] Table 3 shows the probability set of each droplet being a negative droplet corresponding to different values of the content of the analyte molecule.

[0219]

[0220] Step 3: Following formulas (X10), (X11), (X12), (X13), (X14), and (X15), calculate step by step the real part a_l and the imaginary part b_l of each constructed complex element, forming the complex set {a_1 + ib_1, a_2 + ib_2, ..., a_1025 + ib_1025}, as shown in Table 4. The complex set is divided by 1025 and then subjected to a fast Fourier transform (FFT) to obtain the probability mass function values of the Poisson binomial distribution of the number of negative droplets C0.

[0221] Table 4. Sets of complex elements a_l + ib_l corresponding to different analyte molecule contents.

[0222]

[0223]

[0224] Step 4: Sum the above probability mass function values according to formula (X16) to calculate the cumulative distribution function of the number of negative droplets C0.

[0225] Step 5: Set the coverage probability to 95% and interpolate the cumulative distribution function to obtain the lower and upper quantiles z_0.025 and z_0.975. Determine the range of the number of negative droplets C0 between these two quantiles, and calculate the corresponding range of the analyte content m according to formula (5), as shown in Table 5.

[0226] Table 5 shows the upper and lower limits of the number of negative droplets and the upper and lower limits of the content of the analyte corresponding to different values of the analyte molecule content.

[0227]

[0228] Based on the above probability mass function, cumulative distribution function, and coverage interval calculations, the probability distributions obtained by the discrete Fourier transform-characteristic function method for the different predetermined values of m, together with the related calculation results, are summarized in Figure 8.

[0229] Example 3: Comparison of measurement precision of analyte molecules in random emulsion systems with different volume variation coefficients

[0230] Step 1: Referring to Step 1 in Example 1, the total number of droplets n is set to 512 and 1024 respectively. Based on an average droplet volume of 4 and standard deviations of droplet volume of 0, 4, 8, 20, and 40 (coefficients of variation of 0%, 100%, 200%, 500%, and 1000% respectively), a random number generation function is used to generate 5 sets of simulated droplet volumes (512 and 1024 droplets each) following a log-normal distribution (a total of 10 sets). The statistical results of each set of data are shown in Table 6.

[0231] Table 6. Statistical analysis of simulated droplet volumes for 10 groups.

[0232]

[0233] Step 2: Following the method described in Example 1 or 2, set the coverage probability to 95% and, as the number of negative droplets C0 traverses all possible values j (j = 0, 1, 2, ..., n), calculate the value of the analyte molecule content m and the coverage interval of m, and further calculate the corresponding measurement precision according to formula (6). From the distribution of Precision(j, n, v_i) over C0 in the interval [0, n], determine the theoretical target number of molecules m and the corresponding C0 value at which the optimal measurement precision is achieved. The results are shown in Figure 9. When the total number of droplets is 512 and the droplet volume variation coefficient is 0%, 100%, 200%, 500%, and 1000%, the optimal measurement precision is achieved when the number of negative droplets is 114, 172, 198, 183, and 223, respectively. When the total number of droplets is 1024 and the droplet volume variation coefficient is 0%, 100%, 200%, 500%, and 1000%, the optimal measurement precision is achieved when the number of negative droplets is 218, 333, 398, 402, and 443, respectively.

[0234] Example 4: Comparison of global measurement uncertainty distribution and measurement dynamic range

[0235] Referring to steps 1 and 2 in Example 3, with the coverage probability set to 95%, the value of the analyte content m and the coverage interval of m are calculated as the number of negative droplets C0 traverses all possible values j (j = 0, 1, 2, ..., n). Taking the full range of the number of negative droplets C0 of the 10 data sets as the abscissa, and all the corresponding values of m together with their matched coverage intervals as the ordinate (semi-logarithmic coordinate system), a global measurement uncertainty distribution map is plotted, as shown in Figure 10. The results indicate that, when the total number of droplets is the same, the coefficient of variation in volume affects the measurement dynamic range of the random emulsion system: the larger the coefficient of variation, the wider the measurement dynamic range; the smaller the coefficient of variation, the narrower the measurement dynamic range. When the coefficient of variation in volume is the same, the total number of droplets affects the measurement dynamic range of the random emulsion system: the more droplets, the wider the measurement dynamic range; the fewer droplets, the narrower the measurement dynamic range.

[0236] Based on the above conclusions, the method described in this embodiment can be used to evaluate and predict the results of a single or multiple random emulsion measurement experiments. Furthermore, it can calculate measurement uncertainty-related information such as measurement precision, measurement error, coverage interval, detection limit, upper and lower limits of quantitation, dynamic range, and measurement resolution. This allows for better design of experimental procedures and data analysis methods, improving measurement precision and accuracy while reducing experimental testing difficulty and cost.

[0237] Example 5: Evaluation of the uncertainty of random emulsion digital PCR detection results

[0238] A PCR amplification system containing fluorescent probes or dye indicators was prepared in a PCR tube. The target molecular sample was added as an amplification template and mixed thoroughly. Then, an emulsifier was added, and the tube was vortexed to prepare random emulsion droplets. The PCR tube was then placed in a PCR instrument to complete PCR amplification. The random emulsion droplets were loaded into an MGISEQ-2000 sequencing chip, and images were acquired and stitched to obtain the fluorescence image shown in Figure 11. A specific algorithm was then used to perform digital image processing and extract the random emulsion droplet information shown in Table 7. In this table, DropNum represents the droplet number, PixelArea represents the droplet pixel area, Xcenter represents the X-coordinate of the droplet center, Ycenter represents the Y-coordinate of the droplet center, Intensity represents the droplet fluorescence intensity, Threshold represents the fluorescence intensity threshold, DropCall represents the droplet classification state, DropVolume represents the droplet volume, and Metric represents the droplet volume unit.

[0239] Table 7 shows the information of random emulsified droplets extracted from images from the MGISEQ-2000 platform.

[0240]
DropNum | PixelArea | Xcenter | Ycenter | Intensity | Threshold | DropCall | DropVolume | Metric
1 | 50 | 6.28 | 821.84 | 36.17391 | 108.88 | 0 | 3.7257E-06 | μL
2 | 13 | 5 | 1530 | 35.36364 | 108.88 | 0 | 4.9393E-07 | μL
3 | 61 | 7.229508 | 1816.148 | 38.25455 | 108.88 | 0 | 5.0205E-06 | μL
4 | 205 | 12.53171 | 1836.571 | 41.12973 | 108.88 | 0 | 3.093E-05 | μL
5 | 160 | 9.6125 | 437.9938 | 41.75694 | 108.88 | 0 | 2.1327E-05 | μL
6 | 161 | 9.10559 | 1143.727 | 44.41379 | 108.88 | 0 | 2.1527E-05 | μL
7 | 79 | 8.367089 | 1943.367 | 36.09859 | 108.88 | 0 | 7.3993E-06 | μL
8 | 92 | 9.826087 | 1978.587 | 39.36585 | 108.88 | 0 | 9.2989E-06 | μL
9 | 150 | 11.42667 | 957.4533 | 38.88235 | 108.88 | 0 | 1.9359E-05 | μL
10 | 30 | 7.5 | 1257 | 33.42857 | 108.88 | 0 | 1.7315E-06 | μL
… | … | … | … | … | … | … | … | …
10001 | 1428 | 11406.27 | 803.0084 | 49.99378 | 108.88 | 0 | 0.00040461 | μL
10002 | 2779 | 11414.18 | 577.4923 | 159.569 | 108.88 | 1 | 0.0007874 | μL
10003 | 598 | 11398.82 | 1547.873 | 59.8513 | 108.88 | 0 | 0.00016944 | μL
10004 | 81 | 11392.3 | 1945.469 | 54.89041 | 108.88 | 0 | 7.682E-06 | μL
10005 | 2093 | 11416 | 1405.247 | 176.7074 | 108.88 | 1 | 0.00059303 | μL
10006 | 108 | 11394.61 | 522.3056 | 53.84694 | 108.88 | 0 | 1.1827E-05 | μL
10007 | 1154 | 11410.99 | 2070.735 | 171.4364 | 108.88 | 1 | 0.00032697 | μL
10008 | 10187 | 11450.38 | 147.0806 | 155.4888 | 108.88 | 1 | 0.00288637 | μL
10009 | 13 | 11394 | 1040 | 48.36364 | 108.88 | 0 | 4.9393E-07 | μL
10010 | 84 | 11396.37 | 1098.548 | 42.01316 | 108.88 | 0 | 8.1127E-06 | μL
… | … | … | … | … | … | … | … | …
16901 | 1272 | 20894.19 | 625.2547 | 161.0638 | 108.88 | 1 | 0.00036041 | μL
16902 | 889 | 20892.75 | 126.7435 | 48.13358 | 108.88 | 0 | 0.00025189 | μL
16903 | 26 | 20878.23 | 1082.231 | 54.125 | 108.88 | 0 | 1.397E-06 | μL
16904 | 596 | 20895.47 | 263.4933 | 46.74813 | 108.88 | 0 | 0.00016887 | μL
16905 | 126 | 20893.31 | 1157.262 | 52.83333 | 108.88 | 0 | 1.4904E-05 | μL
16906 | 109 | 20897.41 | 17.87156 | 38.32323 | 108.88 | 0 | 1.1992E-05 | μL
16907 | 109 | 20900.53 | 1527.046 | 59.57576 | 108.88 | 0 | 1.1992E-05 | μL
16908 | 27 | 20898.07 | 1570.519 | 79.52 | 108.88 | 0 | 1.4784E-06 | μL
16909 | 198 | 20904.78 | 99.57071 | 42.71348 | 108.88 | 0 | 2.9359E-05 | μL
16910 | 153 | 20904.52 | 41.80392 | 35.89051 | 108.88 | 0 | 1.9943E-05 | μL
16911 | 30 | 20903.4 | 659.6 | 46.71429 | 108.88 | 0 | 1.7315E-06 | μL

[0241] According to the statistics of the random emulsion droplet information, a total of 16,911 droplets were extracted, with an average volume of 0.000481457 μL, a minimum volume of 4.93928E-07 μL, and a maximum volume of 0.011672423 μL. The volume variation coefficient was 1.395905302, or 139.59%. Based on the fluorescence intensity threshold of 108.88, 5,266 positive droplets and 11,645 negative droplets were identified. The statistical distributions of droplet pixel area and fluorescence intensity are shown in Figure 12. Based on the above information and formula (5), the content m of the analyte is calculated to be 9773.60 copies, and the concentration is 1200.41 copies/μL.
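The reported concentration can be cross-checked from the figures in this paragraph, on the assumption that concentration is the content m divided by the total droplet volume (droplet count times average volume). The variable names below are illustrative.

```python
# Values quoted in the paragraph above.
n_droplets = 16911
n_positive = 5266
n_negative = 11645
mean_volume_ul = 0.000481457
m_copies = 9773.60

# Total emulsion volume analysed, assuming count x average volume.
total_volume_ul = n_droplets * mean_volume_ul

# Concentration as copies per microlitre of analysed emulsion.
concentration = m_copies / total_volume_ul

# Positive and negative droplets should account for every extracted droplet.
counts_consistent = (n_positive + n_negative == n_droplets)
```

This gives a total volume of about 8.14 μL and a concentration of about 1200.4 copies/μL, which agrees with the reported 1200.41 copies/μL to within the rounding of the quoted inputs.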

[0242] Following the method described in Example 1 or 2, with a coverage probability of 95%, the coverage interval corresponding to the analyte content m is calculated for a number of negative droplets C0 of 11645. Using the discrete Fourier transform-characteristic function evaluation method, the lower limit of m is 9486 and the upper limit is 10085.9, and the corresponding measurement precision calculated according to formula (6) is 0.06138, or 6.138%. The relevant calculation results are summarized in Figure 13.
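Formula (6) is not reproduced in this passage. Assuming it expresses precision as the width of the coverage interval relative to the value of m, the figures quoted above can be checked as follows; the variable names are illustrative.

```python
# Values quoted in Example 5.
m_value = 9773.60   # content of the analyte, copies
m_lower = 9486.0    # lower limit of the 95% coverage interval
m_upper = 10085.9   # upper limit of the 95% coverage interval

# ASSUMED form of formula (6): relative width of the coverage interval.
precision = (m_upper - m_lower) / m_value
precision_percent = 100.0 * precision
```

Under this assumption the result is about 0.06138, or 6.138%, matching the reported measurement precision.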

[0243] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this application. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.

[0244] Although embodiments of this application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting this application. Those skilled in the art can make changes, modifications, substitutions and variations to the above embodiments within the scope of this application without departing from the principles and spirit of this application.

Claims

1. A method for determining the detection uncertainty of the content of an analyte in a random emulsion, characterized in that the method comprises: S1, determining the total number of reaction zones in the random emulsion and the volume of each reaction zone, and classifying each reaction zone as a positive reaction zone or a negative reaction zone, wherein a positive reaction zone contains the analyte molecule and a negative reaction zone does not contain the analyte molecule; S2, determining the cumulative distribution function of a Poisson binomial distribution based on a predetermined value of the content m of the analyte molecules in the emulsion, the total number of reaction zones in the random emulsion, and the volume of each reaction zone, wherein the Poisson binomial distribution is used to describe the number of reaction zones containing a specific number of analyte molecules; and S3, evaluating the uncertainty of the content m of the analyte molecules in the emulsion based on the cumulative distribution function.

2. The method according to claim 1, characterized in that the cumulative distribution function of the Poisson binomial distribution is determined by approximating the Poisson binomial distribution as a normal distribution with skewness correction.

3. The method according to claim 2, characterized in that step S2 further comprises: S2-1, determining, based on the predetermined numerical value of the content m of the analyte molecules in the emulsion, the total number of reaction zones in the random emulsion, and the volume of each reaction zone, the expectation, variance, and skewness of the Poisson binomial distribution used to describe the number of reaction zones containing a specific number of analyte molecules; and S2-2, determining, based on the expectation, variance, and skewness determined in step S2-1, a refined normal approximation cumulative distribution function to describe the number of reaction partitions containing a specific number of analyte molecules.

4. The method according to claim 3, characterized in that, in step S2-1, the expectation, variance, and skewness are determined according to the following formulas: [formulas not reproduced in this text] where μ represents the expectation; σ² represents the variance; γ represents the skewness; n represents the total number of reaction partitions; v_i represents the volume of the i-th reaction partition; m represents the content of the analyte molecule; and C0 represents the number of negative reaction partitions.
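For illustration, the three moments named in claim 4 can be sketched as follows. This is not the claimed formula set, which is not reproduced in this text. The sketch assumes that the counted quantity is the number of negative partitions, that partition i is negative with probability exp(-m * v_i / V) (V being the total emulsion volume and m the number of analyte molecules in it), and that the standard Poisson binomial moment expressions apply. The function name and the `total_volume` parameter are hypothetical.

```python
import math


def poisson_binomial_moments(volumes, m, total_volume):
    """Return (mu, variance, skewness) of the number of negative partitions.

    Assumption (not taken from the patent text): partition i is negative
    with probability p_i = exp(-m * v_i / total_volume).
    """
    p = [math.exp(-m * v / total_volume) for v in volumes]
    mu = sum(p)                                    # expectation
    var = sum(pi * (1.0 - pi) for pi in p)         # variance
    third = sum(pi * (1.0 - pi) * (1.0 - 2.0 * pi) for pi in p)
    gamma = third / var ** 1.5                     # skewness
    return mu, var, gamma


# Example with five partitions of unequal volume (arbitrary units).
volumes = [0.8, 1.0, 1.2, 0.9, 1.1]
mu, var, gamma = poisson_binomial_moments(volumes, m=5.0, total_volume=sum(volumes))
```

With equal partition volumes the three values reduce to the familiar binomial moments, which gives a quick sanity check on any implementation.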

5. The method according to claim 4, characterized in that the refined normal approximation cumulative distribution function is obtained by correcting the cumulative distribution function of the standard normal distribution using the skewness.

6. The method according to claim 5, characterized in that the refined normal approximation cumulative distribution function is determined through the following steps: (i) generating a series of values in the range of 0 to n, where n is the total number of the reaction partitions; (ii) converting the series of values into Z-scores of the standard normal distribution as x values; (iii) constructing a normal distribution curve for x based on the expected value and variance; (iv) determining, based on the normal distribution curve, the probability density function φ(x) of the standard normal distribution and the cumulative distribution function Φ(x) of the standard normal distribution; and (v) determining the refined normal approximation cumulative distribution function based on formula (4): [formula not reproduced in this text] where φ(x) is the probability density function of the standard normal distribution determined by the expected value and variance obtained in step S2, and Φ(x) is the cumulative distribution function of the standard normal distribution.
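Steps (i) to (v) can be sketched as below. Because formula (4) is not reproduced in this text, the sketch substitutes the refined normal approximation commonly given in the statistics literature for the Poisson binomial distribution, G(x) = Φ(x) + γ(1 − x²)φ(x)/6 with a continuity correction of 0.5, clipped to [0, 1]. Whether this matches formula (4) term by term is an assumption, and all function names are hypothetical.

```python
import math


def std_normal_pdf(x):
    """Probability density function of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def refined_normal_cdf(k, mu, sigma, gamma):
    """Skewness-corrected normal approximation of P(K <= k)."""
    x = (k + 0.5 - mu) / sigma  # Z-score with continuity correction (assumed)
    g = std_normal_cdf(x) + gamma * (1.0 - x * x) * std_normal_pdf(x) / 6.0
    return min(1.0, max(0.0, g))  # keep the result a valid probability


def refined_normal_cdf_table(n, mu, sigma, gamma):
    """Evaluate the approximation for every count k = 0, ..., n."""
    return [refined_normal_cdf(k, mu, sigma, gamma) for k in range(n + 1)]


# Example: 20 partitions, with moments supplied by step S2-1.
table = refined_normal_cdf_table(n=20, mu=7.3, sigma=2.0, gamma=0.12)
```

Setting `gamma` to zero recovers the plain normal approximation, so the size of the skewness correction can be inspected directly.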

7. The method according to claim 1, characterized in that the cumulative distribution function of the Poisson binomial distribution is determined by the discrete Fourier transform based on the characteristic function of the random variable.

8. The method according to claim 7, characterized in that the cumulative distribution function is determined through the following steps: for each reaction partition, determining the probability that the reaction partition is a negative reaction partition based on the total number of reaction partitions n, the partition volume v_i, the total volume of the emulsion, and the value of m, resulting in multiple discrete values; constructing a characteristic function based on the multiple discrete values; determining, based on the characteristic function, a complex sequence of the characteristic function, where each element is a complex number whose real part and imaginary part correspond to the cosine and sine values at different frequencies, respectively; performing a Fast Fourier Transform on the complex sequence to determine the probability mass function of the Poisson binomial distribution; and determining the cumulative distribution function based on the probability mass function.
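The steps of claim 8 can be sketched as follows. The negative-partition probability exp(-m * v_i / V) is an assumed model, since the claim names the inputs but this text does not reproduce the expression, and the function names are hypothetical. For readability the sketch evaluates the discrete Fourier transform directly in O(n²) with the standard library; an FFT routine would return the same sequence faster.

```python
import cmath
import math


def negative_probabilities(volumes, m, total_volume):
    """Assumed model: partition i is negative with probability exp(-m*v_i/V)."""
    return [math.exp(-m * v / total_volume) for v in volumes]


def poisson_binomial_pmf_dft(p):
    """PMF of the number of negative partitions via the characteristic function."""
    size = len(p) + 1
    omega = 2.0 * math.pi / size
    # Characteristic function sampled at frequencies omega * l, l = 0..n.
    char_seq = []
    for l in range(size):
        z = cmath.exp(1j * omega * l)  # cos(omega*l) + i*sin(omega*l)
        value = complex(1.0, 0.0)
        for pj in p:
            value *= (1.0 - pj) + pj * z
        char_seq.append(value)
    # Discrete Fourier transform of the sequence, normalised by n + 1.
    pmf = []
    for k in range(size):
        total = sum(char_seq[l] * cmath.exp(-1j * omega * l * k) for l in range(size))
        pmf.append(min(1.0, max(0.0, total.real / size)))
    return pmf


def cdf_from_pmf(pmf):
    """Running sum of the PMF, capped at 1 to absorb rounding error."""
    cdf, running = [], 0.0
    for value in pmf:
        running += value
        cdf.append(min(1.0, running))
    return cdf


volumes = [0.8, 1.0, 1.2, 0.9, 1.1]
p_neg = negative_probabilities(volumes, m=5.0, total_volume=sum(volumes))
cdf = cdf_from_pmf(poisson_binomial_pmf_dft(p_neg))
```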

9. The method according to claim 1, characterized in that the numerical value of the content m of the analyte molecules in the emulsion is determined based on the number of reaction zones containing a specific number of analyte molecules in the emulsion following a Poisson binomial distribution.

10. The method according to claim 9, characterized in that the numerical value of the content m of the analyte molecules in the emulsion is determined by satisfying the following formula: [formula not reproduced in this text] where v_i represents the volume of the i-th reaction zone, v_p represents the volume of the p-th positive reaction zone, v_q represents the volume of the q-th negative reaction zone, j represents the number of negative reaction zones, and n represents the number of reaction zones in the emulsion.
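As a hedged illustration of how a value of m might be obtained from classified partitions of unequal volume, the sketch below solves a maximum-likelihood condition by bisection. The formula of claim 10 is not reproduced in this text, so the condition used here is an assumption: with partition i negative with probability exp(-m * v_i / V), the likelihood is maximised when the sum over positive partitions of (v_p/V) / (exp(m * v_p/V) − 1) equals the sum over negative partitions of v_q/V. The function names and the `total_volume` parameter are hypothetical.

```python
import math


def score(m, pos_volumes, neg_volumes, total_volume):
    """Derivative of the assumed log-likelihood with respect to m (m > 0)."""
    pos = sum((v / total_volume) / math.expm1(m * v / total_volume) for v in pos_volumes)
    neg = sum(v / total_volume for v in neg_volumes)
    return pos - neg


def estimate_m(pos_volumes, neg_volumes, total_volume, tol=1e-10):
    """Solve score(m) = 0 by bisection; the score decreases as m grows."""
    if not pos_volumes:
        return 0.0  # no positive partition: the estimate is zero
    if not neg_volumes:
        raise ValueError("no negative partitions: the estimate is unbounded")
    lo, hi = 0.0, 1.0
    while score(hi, pos_volumes, neg_volumes, total_volume) > 0.0:
        lo, hi = hi, hi * 2.0  # expand until the root is bracketed
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if score(mid, pos_volumes, neg_volumes, total_volume) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Example: three positive and two negative partitions (arbitrary units).
pos = [1.2, 0.9, 1.1]
neg = [0.8, 1.0]
m_hat = estimate_m(pos, neg, total_volume=sum(pos) + sum(neg))
```

For partitions of equal volume the root has the closed form m = n ln(n / j), the usual digital PCR estimate, which makes a convenient check.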

11. The method according to claim 1, characterized in that evaluating the uncertainty of the content m of the analyte molecules in the emulsion based on the cumulative distribution function further comprises: using the cumulative distribution function and a predetermined inclusion probability to determine the inclusion interval of the content m of the analyte, so as to determine the lower and upper endpoints of the interval estimate corresponding to m; and determining the uncertainty of m based on the lower and upper endpoints.
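One way to turn a cumulative distribution function into an inclusion interval is to invert it in m, as sketched below. The construction is an assumption rather than the claimed procedure: it takes the observed number of negative partitions `c0`, computes the exact Poisson binomial CDF by convolution, and finds the two values of m at which the observation sits in the lower or upper tail with probability (1 − coverage)/2 each. The negative-partition probability exp(-m * v_i / V) and all names are hypothetical.

```python
import math


def negatives_cdf(k, volumes, m, total_volume):
    """Exact P(K <= k) for K = number of negative partitions (convolution)."""
    if k < 0:
        return 0.0
    pmf = [1.0]
    for v in volumes:
        p = math.exp(-m * v / total_volume)  # assumed negative probability
        nxt = [0.0] * (len(pmf) + 1)
        for i, w in enumerate(pmf):
            nxt[i] += w * (1.0 - p)
            nxt[i + 1] += w * p
        pmf = nxt
    return min(1.0, sum(pmf[: k + 1]))


def solve_increasing(f, target, tol=1e-9):
    """Find m >= 0 with f(m) = target, for f non-decreasing in m."""
    lo, hi = 0.0, 1.0
    while f(hi) < target:
        lo, hi = hi, hi * 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def inclusion_interval(c0, volumes, total_volume, coverage=0.95):
    """Return (lower, upper) endpoints for m given c0 observed negatives."""
    alpha = 1.0 - coverage
    if c0 == len(volumes):
        lower = 0.0  # every partition negative: m = 0 cannot be excluded
    else:
        lower = solve_increasing(
            lambda m: negatives_cdf(c0, volumes, m, total_volume), alpha / 2.0)
    if c0 == 0:
        upper = math.inf  # no negative partition: no finite upper endpoint
    else:
        upper = solve_increasing(
            lambda m: negatives_cdf(c0 - 1, volumes, m, total_volume), 1.0 - alpha / 2.0)
    return lower, upper


volumes = [1.0] * 10
lower, upper = inclusion_interval(4, volumes, total_volume=10.0)
half_width = 0.5 * (upper - lower)  # one possible uncertainty summary (assumed)
```

The half-width is only one way to summarise the interval; because the interval is generally asymmetric about the estimate, reporting both endpoints preserves more information.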

12. A system for determining the detection uncertainty of the content of analyte molecules in a random emulsion, characterized in that it comprises: a measurement parameter determination module, used to determine the total number of reaction zones in the random emulsion and the volume of each reaction zone, and to classify each reaction zone as a positive reaction zone or a negative reaction zone, wherein a positive reaction zone contains the analyte molecule and a negative reaction zone does not contain the analyte molecule; a distribution function construction module, used to determine the cumulative distribution function of a Poisson binomial distribution based on a predetermined numerical value of the content m of the analyte molecules in the emulsion, the total number of reaction zones in the random emulsion, and the volume of each reaction zone, wherein the Poisson binomial distribution is used to describe the number of reaction zones containing a specific number of analyte molecules; and an uncertainty evaluation module, used to evaluate the uncertainty of the content m of the analyte molecules in the emulsion based on the cumulative distribution function.

13. A computing device, characterized in that it comprises: a processor and a memory; the memory is used to store a computer program; and the processor is configured to execute the computer program to implement the method for determining the detection uncertainty of the content of analyte molecules in a random emulsion as described in any one of claims 1 to 15.