Dynamic minimal residual disease detection

Using a dynamic probability score classification method together with targeted cell-free DNA sequencing data and likelihood models, the disclosure addresses the insufficient sensitivity and specificity of existing MRD detection technologies and aims to provide low-cost, efficient MRD detection.

CN122249860A · Pending · Publication Date: 2026-06-19 · THE BROAD INST INC

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: THE BROAD INST INC
Filing Date: 2024-07-26
Publication Date: 2026-06-19

Abstract

This disclosure relates to systems and methods for detecting circulating tumor DNA (ctDNA) and minimal residual disease in patient samples. Aspects of this disclosure involve using a classification module to assess whether a patient sample contains ctDNA and minimal residual disease, based in part on patient-specific input. This disclosure also relates, at least in part, to determining whether a patient has cancer based on the output of the classification module.

Description

[0001] Cross-reference to related applications

[0002] This application claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Application No. 63/515,801, filed July 26, 2023, entitled “Sequencing for enrichment of minor alleles by recognizing oligonucleotides,” the entire disclosure of which is incorporated herein by reference.

[0003] Government licensing rights

[0004] This invention was made with government support under grant number CA221874 awarded by the National Institutes of Health. The government has certain rights in this invention.

Background Technology

[0005] Tracking patient-specific tumor mutations in cell-free DNA (cfDNA) for minimal residual disease (MRD) detection is promising but challenging. Sensitivity, specificity, and cost must be considered when screening patient cohorts. Efforts to improve sensitivity and specificity remain expensive and laborious.

Summary of the Invention

[0006] This disclosure relates to a system comprising: a classification module implemented in a non-transitory computer-readable storage medium and configured to: classify a sample as positive or negative for circulating tumor DNA based on a dynamic probability score relative to a threshold, the dynamic probability score being determined using a specificity factor based on targeted cell-free DNA sequencing data of the sample, the specificity factor including the number of mutant duplexes in the sample that carry mutations identified in the sample and the total number of putative duplexes detected in the sample at the mutation sites, the dynamic probability score being further based on: a first likelihood that the mutant duplexes are entirely derived from a tumor; and a second likelihood that the mutant duplexes are spontaneous errors or mutations of non-cancer origin; and output the classification of the sample.

[0007] This disclosure relates to a method comprising: receiving targeted cell-free DNA sequencing data from a patient sample, the targeted cell-free DNA sequencing data being enriched for DNA duplexes having mutation sites found in a tumor fingerprint; determining a dynamic probability score for the minimal residual disease (MRD) status of the sample based on a specificity factor, the specificity factor including the number of mutant duplexes in the sample and the total number of putative duplexes detected in the sample at the mutation sites, by: outputting, via a first likelihood model that uses the number of mutant duplexes and the total number of putative duplexes, a first likelihood that the mutant duplexes are entirely derived from the tumor; outputting, via a second likelihood model that uses the number of mutant duplexes and the total number of putative duplexes, a second likelihood that the mutant duplexes are spontaneous errors or mutations of non-cancer origin; and determining the dynamic probability score based on the first and second likelihoods; classifying the MRD status of the sample as MRD positive or MRD negative based on the dynamic probability score relative to a threshold; and outputting the classified MRD status.

Attached Figure Description

[0008] The following figures form part of this specification and are included to further illustrate certain aspects of this disclosure, which can be better understood by referring to one or more of these figures in conjunction with the detailed description of the specific embodiments presented herein. For clarity, not every component may be labeled in every figure. It should be understood that the data shown in the figures in no way limits the scope of this disclosure. In the figures:

[0009] Figure 1 provides an overview of MAESTRO-Pool (minor allele enrichment sequencing through recognition oligonucleotides) and dynamic MRD (minimal residual disease) determination. The workflow involves designing patient-specific tumor fingerprints and pooling them to form a single MAESTRO-Pool MRD assay, which is applied to all samples from all patients. Each patient's own plasma samples (i.e., "matched samples") are used to detect and quantify MRD, while plasma samples from other patients (i.e., "unmatched samples") are used to assess the experimental specificity of the tumor fingerprint. MRD is determined using a dynamic model that calculates the probability of MRD detection based on sample-specific properties such as the observed tumor fraction, tumor fingerprint size, and the number of cfDNA molecules.

[0010] Figures 2A-2E demonstrate that dynamic MRD determination achieves high specificity and sensitivity when tracking mutations and cfDNA molecules. Figures 2A-2E show the theoretical specificity and sensitivity for determining MRD using either a fixed mutation threshold or a dynamic, probability-based threshold. The left and middle panels depict the probability of false detection across a range of fingerprint sizes and cfDNA masses at a fixed background error rate of 1 x 10⁻⁷ (Figure 2A: 50 ng cfDNA; Figure 2B: 100 ng cfDNA; Figure 2C: 250 ng cfDNA; Figure 2D: 500 ng cfDNA; Figure 2E: 1000 ng cfDNA). The left panel shows the false detection probability when using a fixed threshold of ≥2 mutations to determine MRD, and the middle panel shows the false detection probability when using a dynamic threshold of P ≥ 0.95. The right panel depicts the limits of detection (i.e., the tumor fraction detectable with 90% power) for the two MRD determination methods within the same range. All panels assume 75 duplexes per site per 1 ng cfDNA.
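As a rough illustration of why a fixed threshold of ≥2 mutations loses specificity as more sites and more cfDNA are tracked, the sketch below computes the chance that background errors alone produce at least two mutant duplexes. It assumes, purely for illustration, that background errors follow a Poisson distribution and that the error rate is 1 in 10 million per duplex; the function names and the 1000-site fingerprint are hypothetical and are not taken from the disclosure.

```python
import math

DUPLEXES_PER_SITE_PER_NG = 75   # stated assumption for Figures 2A-2E
BACKGROUND_ERROR_RATE = 1e-7    # assumed: 1 error in 10 million duplexes


def expected_background_mutants(fingerprint_sites, cfdna_ng,
                                error_rate=BACKGROUND_ERROR_RATE):
    """Expected number of erroneous mutant duplexes across the whole fingerprint."""
    total_duplexes = fingerprint_sites * DUPLEXES_PER_SITE_PER_NG * cfdna_ng
    return total_duplexes * error_rate


def false_detection_probability(fingerprint_sites, cfdna_ng, min_mutants=2):
    """P(at least `min_mutants` background errors), Poisson model (an assumption)."""
    lam = expected_background_mutants(fingerprint_sites, cfdna_ng)
    # Sum the Poisson probabilities of seeing fewer than `min_mutants` errors.
    below = sum(math.exp(-lam) * lam ** k / math.factorial(k)
                for k in range(min_mutants))
    return 1.0 - below


# Hypothetical 1000-site fingerprint at the cfDNA masses of Figures 2A-2E.
for ng in (50, 100, 250, 500, 1000):
    p = false_detection_probability(fingerprint_sites=1000, cfdna_ng=ng)
    print(f"{ng:>5} ng cfDNA, 1000 sites: P(false MRD positive) = {p:.3f}")
```

Under these assumptions the false detection probability rises with cfDNA mass for a fixed fingerprint size, which is the behavior a probability-based threshold is meant to counter.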

[0011] Figures 3A-3B demonstrate that MAESTRO-Pool and dynamic MRD determination achieve analytical sensitivity below 1 ppm while also validating a high experimental specificity of >98%. Figure 3A shows the MRD results obtained by applying MAESTRO-Pool to all samples from all patients and using either fixed MRD determination with a mutation threshold of ≥2 (left) or dynamic MRD determination with a P ≥ 0.95 threshold (right). Patients are sorted by validated fingerprint size, and samples are sorted by cfDNA mass within each patient. Figure 3B quantifies the experimental specificity of each patient's customized MRD test (“patient fingerprint”) using unmatched samples and depicts the range of tumor fractions for samples determined to be MRD positive. The results are again divided into fixed MRD determination (left) and dynamic MRD determination (right).

[0012] Figures 4A-4D show that MAESTRO enrichment reduced sequencing requirements by a median of 33-fold. Figure 4A compares duplex variant allele frequencies (VAFs) between MAESTRO and MRD Tracker at sites in two fingerprints. Figure 4B shows the association between median VAF enrichment and tumor fraction for each MAESTRO sample, which establishes an upper limit for VAF enrichment. Figure 4C shows an example of mutant duplex downsampling used to quantify the number of read pairs required to reach mutant duplex saturation at sites in two fingerprints. Figure 4D demonstrates that MAESTRO enrichment reduces the number of read pairs required to reach mutant duplex saturation.

[0013] Figures 5A-5B illustrate that MAESTRO and dynamic MRD determination provide a sensitive technique for MRD monitoring and are generally superior to imaging and clinical examination. MRD results are overlaid with clinical annotations for (Figure 5A) Pt_966 and (Figure 5B) Pt_1361. Each graph shows MRD results detected using MAESTRO and dynamic MRD as a line graph, the detection limit of most commercially available MRD tests at 100 ppm, and major clinical events (i.e., surgery and progression) along the top axis. Any sample with a dynamic probability score of 0.5 ≤ P < 0.95 is marked as “borderline” and annotated with its dynamic probability and specificity scores. Detailed clinical annotations (including treatment, imaging results, biopsy results, and interpreted physician notes) are shown below the graphs. The patient’s status at the last clinical follow-up is shown at the end of the timeline.

[0014] Figures 6A-6B demonstrate the concordance between fixed and dynamic MRD determinations in the TBCRC030 cohort. Dynamic MRD determination was applied to 46 patients from the TBCRC030 study, which previously used fixed MRD determination. Figure 6A shows the concordance of MRD determinations (inset) and tumor fractions between the two MRD determination methods. Figure 6B compares dynamic probability scores with tumor fractions (the line depicts the P ≥ 0.95 threshold).

[0015] Figures 7A-7B show the validation of tumor fingerprints in tumor DNA and the estimation of detection limits in plasma. Figure 7A illustrates how, for MAESTRO and MRD Tracker, each patient's tumor fingerprint is applied to their matched tumor and normal DNA for validation. Only sites that are fully detected in the matched tumor are considered validated. Figure 7B shows how the validated tumor fingerprint is used to calculate the limit of detection (LOD) for each plasma sample, that is, the lowest tumor fraction detectable with ≥90% power.
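The idea of a limit of detection defined by detection power can be sketched as follows. This is a minimal illustration, not the calculation used in the disclosure: it assumes a Poisson model for the number of tumor-derived mutant duplexes, uses the fixed rule of ≥2 mutant duplexes as the detection criterion for simplicity, and finds the lowest tumor fraction with ≥90% power by bisection. All function names and the example sample size are hypothetical.

```python
import math


def detection_power(tumor_fraction, total_duplexes, min_mutants=2):
    """P(observing >= min_mutants tumor-derived duplexes), Poisson model (an assumption)."""
    lam = tumor_fraction * total_duplexes
    below = sum(math.exp(-lam) * lam ** k / math.factorial(k)
                for k in range(min_mutants))
    return 1.0 - below


def limit_of_detection(total_duplexes, power=0.90, min_mutants=2):
    """Lowest tumor fraction detected with at least `power`, found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if detection_power(mid, total_duplexes, min_mutants) >= power:
            hi = mid   # mid is detectable with enough power; try lower
        else:
            lo = mid   # not enough power; the LOD is higher
    return hi


# Hypothetical sample: 1000 validated sites x 75 duplexes/site/ng x 50 ng cfDNA.
total = 1000 * 75 * 50
lod = limit_of_detection(total_duplexes=total)
print(f"LOD at 90% power = {lod * 1e6:.2f} ppm")
```

Because the expected number of mutant duplexes scales with the number of duplexes examined, the estimated limit of detection in this sketch falls as the validated fingerprint or the cfDNA input grows.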

[0016] Figures 8A to 8B show the sensitivity and specificity of MAESTRO-Pool at different MRD detection thresholds. Figure 8A shows the evaluation of experimental specificity and sensitivity as a function of a fixed mutation threshold. The left graph shows the experimental specificity using >70 unmatched samples, while the right graph shows the number of matched samples detected as MRD positive. Triangles represent specificity or sensitivity at the standard fixed threshold of ≥2 mutations. Figure 8B similarly shows the evaluation of experimental specificity and sensitivity of dynamic MRD detection as a function of the dynamic probability threshold. Triangles represent specificity or sensitivity when using only a fixed threshold of ≥2 mutations, while circles represent values when using a dynamic threshold of P ≥ 0.95. Notably, as shown in the inset, several patients showed a specificity of 1 at all dynamic thresholds.

[0017] Figure 9 shows how experimental specificity scores are generated by comparing dynamic probability scores with the distributions from unmatched samples. The distribution of dynamic probability scores for matched and unmatched samples is shown for each tumor fingerprint. A threshold of P ≥ 0.95 was used to detect MRD-positive samples, while samples with 0.5 ≤ P < 0.95 were considered borderline. An experimental specificity score was calculated for each borderline matched sample by determining the fraction of unmatched samples with lower probability scores.
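The borderline handling described above can be sketched in a few lines. The example below assumes that the experimental specificity score is the fraction of unmatched-sample scores that fall strictly below the matched sample's score, which is one reading of the description; the function names and the score values are made up for illustration.

```python
def classify(score, positive_threshold=0.95, borderline_threshold=0.5):
    """Three-way call using the P >= 0.95 and 0.5 <= P < 0.95 ranges."""
    if score >= positive_threshold:
        return "MRD positive"
    if score >= borderline_threshold:
        return "borderline"
    return "MRD negative"


def experimental_specificity(sample_score, unmatched_scores):
    """Fraction of unmatched-sample scores strictly below the matched sample's score."""
    if not unmatched_scores:
        raise ValueError("at least one unmatched sample is required")
    lower = sum(1 for s in unmatched_scores if s < sample_score)
    return lower / len(unmatched_scores)


# Made-up dynamic probability scores for one tumor fingerprint.
unmatched = [0.01, 0.02, 0.05, 0.10, 0.30, 0.55, 0.02, 0.01, 0.04, 0.08]
matched_score = 0.80

call = classify(matched_score)
if call == "borderline":
    spec = experimental_specificity(matched_score, unmatched)
    print(f"{call}: P = {matched_score:.2f}, experimental specificity = {spec:.2f}")
else:
    print(call)
```

A borderline sample whose score exceeds every unmatched score gets a specificity score of 1, while one that sits inside the unmatched distribution gets a proportionally lower score.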

[0018] Figures 10A to 10B show the concordance between results from MAESTRO and MRD Tracker. Figure 10A shows the concordance of tumor fractions and MRD detection results (inset) when MAESTRO with dynamic MRD detection and MRD Tracker with fixed MRD detection are applied to the same samples. Figure 10B shows the subset of samples with discordant MRD detection results and examines whether the detection limit for the MRD-negative result is higher than the tumor fraction for the MRD-positive result.

[0019] Figures 11A to 11B show a comparison between MAESTRO MRD results and clinical annotations. MRD results are overlaid with clinical annotations for (Figure 11A) Pt_1070 and (Figure 11B) Pt_1478. Each graph shows MRD results detected using MAESTRO and dynamic MRD as a line graph, the detection limit of most commercially available MRD tests at 100 ppm, and major clinical events (i.e., surgery and progression) along the top axis. Any sample with a dynamic probability score of 0.5 ≤ P < 0.95 is marked as “borderline” and annotated with its dynamic probability and specificity scores. Detailed clinical annotations (including treatment, imaging results, biopsy results, and interpreted physician notes) are shown below the graphs. The patient’s status at the last clinical follow-up is shown at the end of the timeline.

[0020] Figures 12A to 12B show a comparison between MAESTRO MRD results and clinical annotations. MRD results are overlaid with clinical annotations for (Figure 12A) Pt_1452 and (Figure 12B) Pt_973. Each graph shows MRD results detected using MAESTRO and dynamic MRD as a line graph, the detection limit of most commercially available MRD tests at 100 ppm, and major clinical events (i.e., surgery and progression) along the top axis. Any sample with a dynamic probability score of 0.5 ≤ P < 0.95 is marked as “borderline” and annotated with its dynamic probability and specificity scores. Detailed clinical annotations (including treatment, imaging results, biopsy results, and interpreted physician notes) are shown below the graphs. The patient’s status at the last clinical follow-up is shown at the end of the timeline.

[0021] Figures 13A to 13B show a comparison between MAESTRO MRD results and clinical annotations. MRD results are overlaid with clinical annotations for (Figure 13A) Pt_1083 and (Figure 13B) Pt_1367. Each graph shows MRD results detected using MAESTRO and dynamic MRD as a line graph, the detection limit of most commercially available MRD tests at 100 ppm, and major clinical events (i.e., surgery and progression) along the top axis. Any sample with a dynamic probability score of 0.5 ≤ P < 0.95 is marked as “borderline” and annotated with its dynamic probability and specificity scores. Detailed clinical annotations (including treatment, imaging results, biopsy results, and interpreted physician notes) are shown below the graphs. The patient’s status at the last clinical follow-up is shown at the end of the timeline.

[0022] Figure 14 shows a comparison of MAESTRO MRD results with clinical annotations. MRD results are overlaid with clinical annotations for Pt_1406. The graph shows MRD results detected using MAESTRO and dynamic MRD as a line graph, the detection limit of most commercially available MRD tests at 100 ppm, and major clinical events (i.e., surgery and progression) along the top axis. Any sample with a dynamic probability score of 0.5 ≤ P < 0.95 is marked as “borderline” and annotated with its dynamic probability and specificity scores. Detailed clinical annotations (including treatment, imaging results, biopsy results, and interpreted physician notes) are shown below the graph. The patient’s status at the last clinical follow-up is shown at the end of the timeline.

[0023] Figures 15A to 15C show the effect of plasma volume on the limit of detection estimated by MAESTRO. Figure 15A shows the cfDNA yield in typical (< 10 mL) and large (≥ 10 mL) plasma volume samples. Figure 15B shows the estimated limit of detection with 95% power (LOD95) for typical and large plasma volume samples. Figure 15C compares the estimated LOD95 of large-volume plasma samples with the simulated LOD95 of their single-tube equivalents (i.e., 4 mL). Each large-volume plasma sample was downsampled 50 times, and the median LOD95 was selected as the predicted LOD95 shown here.

[0024] Figures 16A to 16E show the results of MAESTRO-Pool. Figure 16A shows the MAESTRO-Pool results when a uniform background SNV frequency is assumed. The associated plasma volume and T>C background SNV frequency are shown above, with asterisks indicating samples with unknown plasma volumes. Figure 16B shows the association of false-positive MRD detection results in unmatched patient samples with plasma volume (left) and T>C background SNV frequency (right). P-values were assessed using a one-sided Mann-Whitney U test. Figure 16C shows an improved dynamic MRD detector that uses sample-specific and context-specific background SNV frequencies measured from MAESTRO-Pool data. Figure 16D shows the MAESTRO-Pool results with sample-specific and context-specific background SNV frequency adjustment. Figure 16E shows the MRD-positive detection results using dynamic MRD detectors with a uniform background SNV frequency or with sample-specific and context-specific SNV frequencies.

[0025] Figures 17A to 17C show the effect of larger plasma volumes on MRD detection. Figure 17A shows the estimated tumor fraction relative to plasma volume for MRD-positive samples. Figure 17B compares the power (i.e., the probability of detecting the observed tumor fraction) with the fraction of MRD-positive detections in simulated 4 mL equivalents for large-volume (≥10 mL) MRD-positive samples. Figure 17C shows the estimated tumor fractions and confidence intervals (sorted in descending order of tumor fraction) for large-volume MRD-positive samples and their simulated 4 mL equivalents. The upper panels show metrics derived from these tumor fraction estimates: the confidence interval ratio (upper CI / lower CI) for each sample, and the percentage error of the simulated 4 mL equivalent sample (|TFx_4mL - TFx_Full| / TFx_Full x 100%).
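The two derived metrics named in the Figure 17C description are simple ratios, shown here as a small worked example. The function names and the numeric values are hypothetical; only the two formulas come from the description above.

```python
def ci_ratio(lower_ci, upper_ci):
    """Confidence interval ratio: upper CI divided by lower CI."""
    if lower_ci <= 0:
        raise ValueError("lower CI must be positive to form a ratio")
    return upper_ci / lower_ci


def percent_error(tfx_4ml, tfx_full):
    """|TFx_4mL - TFx_Full| / TFx_Full x 100%."""
    return abs(tfx_4ml - tfx_full) / tfx_full * 100.0


# Made-up tumor fraction estimates (as fractions, not ppm).
tfx_full = 2.0e-6   # estimate from the full large-volume sample
tfx_4ml = 3.0e-6    # estimate from a simulated 4 mL equivalent

print(f"CI ratio (full): {ci_ratio(1.5e-6, 2.7e-6):.2f}")
print(f"Percent error of 4 mL equivalent: {percent_error(tfx_4ml, tfx_full):.1f}%")
```

A wider confidence interval ratio or a larger percent error for the 4 mL equivalent would indicate a less precise tumor fraction estimate at the smaller plasma volume.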

[0026] Figures 18A to 18D show the association between MAESTRO-Pool ctDNA results and clinical response. The time course for each glioblastoma patient is displayed, including plasma ctDNA MRD results (top); MRI and neuropathological assessments (middle); and treatment (bottom). Three patterns of association between ctDNA assay results, radiological assessments, and pathological assessments are identified. Figure 18A shows MRD positivity preceding histological tumor progression while concurrent radiological findings were indeterminate (i.e., true progression versus pseudoprogression). Figure 18B shows that, based on analysis of large plasma volumes, MRD negativity was associated with a lack of tumor progression despite multiple concurrent post-radiotherapy MRI reports of indeterminate progression. Figure 18C shows persistent MRD negativity in a patient with Lynch syndrome who exhibited a durable response to immune checkpoint blockade. Figure 18D shows radiological and/or pathological evidence of tumor progression in which MRD was not detected at preceding time points.

[0027] Figure 19 shows the total number of somatic SNVs detected by whole-genome sequencing, along with the number of SNVs targeted by the MAESTRO and MRD Tracker fingerprints for each patient.

[0028] Figures 20A to 20B show the correlation between the measured duplex depth and plasma volume (Figure 20A) or cfDNA yield (Figure 20B).

[0029] Figures 21A to 21B show a comparison of MRD detection results between the MAESTRO-Pool and MRD Tracker assays. Figure 21A shows the dynamic detector probability score for each assay, with the dashed line representing the MRD detection threshold (i.e., P ≥ 0.95). The table below Figure 21A shows the corresponding MRD detection results for each assay. Of the 70 total plasma samples, 66 were available for this analysis because 4 samples from GBM_6 were inadvertently mislabeled as GBM_5. These sample swaps were identified via MAESTRO-Pool but were not reanalyzed using the correct MRD Tracker fingerprint. Figure 21B shows the concordance of tumor fractions between the MAESTRO-Pool and MRD Tracker assays.

[0030] Figure 22A shows the analytical specificity (top) and sensitivity (bottom) of MAESTRO-Pool using the dynamic MRD detector for each MAESTRO fingerprint when a uniform background SNV frequency of 0.1 ppm is assumed. Figure 22B shows the false positive rate (top) and sensitivity (bottom) of MAESTRO-Pool using the dynamic MRD detector for the plasma samples from each patient.

[0031] Figure 23A shows the overall background SNV frequency measured by MAESTRO-Pool versus MRD Tracker. Figure 23B shows the concordance of the measured background SNV frequencies. Figure 23C compares background SNV frequencies between MAESTRO-Pool and MRD Tracker across time points.

[0032] Figure 24 shows the background SNV frequency estimated using MAESTRO-Pool in 98 plasma samples from 8 melanoma patients.

[0033] Figure 25A shows the context-specific background SNV frequency as measured by MAESTRO-Pool. Figure 25B shows the concordance of context-specific background SNV frequencies between MAESTRO-Pool and MRD Tracker.

[0034] Figure 26A compares dynamic probability scores obtained using a uniform background SNV frequency of 0.1 ppm with those obtained using sample-specific and context-specific SNV frequencies. The panel is divided into patient-matched samples (left) and patient-unmatched samples (right). The dashed line represents the dynamic probability score threshold of 0.95 for detecting MRD. Figure 26B shows the analytical specificity (top) and sensitivity (bottom) of MAESTRO-Pool using sample-specific and context-specific SNV frequencies for each MAESTRO fingerprint. Figure 26C shows the false positive rate (top) and sensitivity (bottom) of MAESTRO-Pool using sample-specific and context-specific SNV frequencies for the plasma samples from each patient.

[0035] Figure 27 shows the MAESTRO-Pool results and clinical outcomes for GBM_32. The time course for patient GBM_32 is presented, including: plasma ctDNA MRD results (top); MRI and neuropathological assessment (middle); and treatment (bottom). Only four plasma time points were available for analysis for GBM_32, and MRI was performed infrequently, which complicates a rigorous comparison of the two approaches. However, both post-radiotherapy plasma time points (day 147 and day 161) were MRD-positive, while the concurrent MRI (day 147) indicated stable disease with a small area of enlarged basal dura mater that likely represented tumor seeding.

[0036] Figure 28 is a flowchart of an illustrative process 2800 for determining the posterior probability that a patient sample is MRD positive, according to some embodiments of the techniques described herein.

[0037] Figure 29 is a flowchart of an illustrative process 2900 for identifying the presence of one or more mutation sites in a patient sample, according to some embodiments of the techniques described herein.

[0038] Figure 30 is a flowchart of an illustrative process 3000 for sequencing tumor genomic DNA from a subject and for processing and filtering the resulting sequencing data, according to some embodiments of the techniques described herein.

[0039] Figure 31 is a flowchart of an illustrative process 3100 for determining whether to associate a mutation site with a patient to create a tumor fingerprint, according to some embodiments of the techniques described herein.

[0040] Figure 32 is a diagram of an environment in an example implementation that is operable to employ dynamic MRD classification.

[0041] Figure 33 is a diagram of an environment in another example implementation that is operable to employ dynamic MRD classification.

[0042] Figure 34 shows an example procedure in which dynamic MRD classification is performed.

[0043] Figure 35 illustrates an example system including various components of an example device that can be implemented as any type of computing device described and/or utilized with reference to Figures 28 to 34 to implement examples of the techniques described herein.

Detailed Implementation

[0044] Minimal residual disease (MRD) refers to the small number of cancer cells that remain in the body during or after cancer treatment. While cancer therapies have improved significantly over the past century, many treatments do not completely eliminate tumor cells from a patient's body, often leading to treatment resistance (i.e., cancer cells adapt to treatments by acquiring molecular changes that allow them to evade treatment) and/or tumor recurrence (i.e., tumors that were previously undetectable recur after treatment). Improved methods for detecting cancer cells are needed to provide patients with early detection of cancer, treatment monitoring to ensure that the cancer does not develop treatment resistance, and monitoring for tumor recurrence.

[0045] Therefore, this disclosure relates to the development of a dynamic minimal residual disease (MRD) assay, a method for determining with statistical significance whether a patient's sample is MRD-positive or MRD-negative. In some embodiments, the dynamic MRD assay is used to evaluate patient samples that have been analyzed using MAESTRO (minor allele enrichment sequencing through recognition oligonucleotides), a method for enriching for cancer-specific sequence mutations in DNA found in the patient's blood. MAESTRO is described in US 2023/0203568 A1, the entire contents of which are incorporated herein by reference.

[0046] Dynamic minimal residual disease (MRD) detection addresses a long-standing but unmet need: the ability to detect cancer in patient samples before tumor development, to detect treatment resistance in cancer cells, and to detect residual cancer cells in patients who have already undergone cancer treatment. MRD detection can also be used in cancer treatment planning. The results of MRD detection can inform physicians about the type of cancer therapy to administer and whether to intensify, de-escalate, change, and/or discontinue treatment. MRD detection has the potential to significantly improve cancer patient outcomes by enabling early detection of cancer, monitoring of treatment efficacy, and monitoring of tumor recurrence.

[0047] Dynamic minimal residual disease (MRD) detection is an improved method for determining whether a patient sample is MRD-positive or MRD-negative. In some embodiments, a patient's sample is MRD-positive if a mutation associated with the patient's tumor is identified in the sample. In some embodiments, a patient's sample is MRD-negative if no mutation associated with the patient's tumor is identified in the sample. Previous methods for determining whether a patient sample is MRD-positive or MRD-negative used a "fixed" threshold for MRD detection: ≥2 mutant DNA duplexes when measuring approximately 1,000 genome-wide mutations from a standard blood volume (e.g., one to three 10 cc tubes). This approach is effective in benchmark experiments but is insufficient when tracking additional mutations and cell-free DNA molecules for each patient. Dynamic MRD detection uses an MRD detection probability based on sample-specific properties relative to an assumed error rate of 1 in 10 million (1 x 10⁻⁷). Dynamic MRD detection becomes more sensitive and specific when patient-specific data (e.g., single nucleotide variants found in sequence reads derived from sequencing of a patient's plasma sample) are used as the background rather than an assumed background. In some embodiments, dynamic minimal residual disease (MRD) detection is performed at low parts per million (ppm) or below. In some embodiments, dynamic MRD detection is performed at at least 0.1, at least 0.2, at least 0.3, at least 0.4, at least 0.5, at least 0.6, at least 0.7, at least 0.8, at least 0.9, at least 1.0, at least 1.1, at least 1.2, at least 1.3, at least 1.4, at least 1.5, at least 2.0, at least 2.5, at least 3.0, at least 3.5, at least 4.0, at least 4.5, at least 5.0, at least 5.5, at least 6.0, at least 6.5, at least 7.0, at least 7.5, at least 8.0, at least 8.5, at least 9.0, at least 9.5, or at least 10.0 ppm.

[0048] Therefore, this disclosure relates to a method for using dynamic minimal residual disease (MRD) detection to determine whether a patient sample is MRD positive or MRD negative.

[0049] Cancer detection using dynamic minimal residual disease detection

[0050] This disclosure relates to methods for determining whether a patient is MRD-positive, which may indicate the presence of cancer cells in the patient's body. As used herein, "cancer" means any malignant and/or invasive growth or tumor caused by abnormal cell growth in the subject's body, including solid tumors, blood cancers, myeloma, or lymphoma.

[0051] Figure 28 is a flowchart of an illustrative process 2800 for determining the posterior probability that a patient sample is MRD positive, using the determined number of mutant duplexes in the patient sample and the determined total number of duplexes in the patient sample.
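One way to picture how a posterior probability can be formed from a mutant duplex count and a total duplex count is the sketch below. It is not the model disclosed here: it assumes Poisson likelihoods, a hypothetical fixed tumor fraction for the "entirely tumor-derived" hypothesis, a background error rate of 1 in 10 million for the "error or non-cancer origin" hypothesis, and an even prior. All function names, parameter names, and default values are illustrative assumptions.

```python
import math


def poisson_log_likelihood(k, lam):
    """log P(K = k) for a Poisson distribution with mean lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def dynamic_probability(mutant_duplexes, total_duplexes,
                        assumed_tumor_fraction=1e-6,   # hypothetical value
                        background_error_rate=1e-7,    # assumed 1 in 10 million
                        prior_positive=0.5):           # hypothetical even prior
    """Posterior probability that the mutant duplexes are tumor-derived."""
    # First likelihood: every mutant duplex comes from the tumor.
    log_l_tumor = poisson_log_likelihood(
        mutant_duplexes, total_duplexes * assumed_tumor_fraction)
    # Second likelihood: every mutant duplex is a background error.
    log_l_error = poisson_log_likelihood(
        mutant_duplexes, total_duplexes * background_error_rate)
    log_odds = (log_l_tumor - log_l_error
                + math.log(prior_positive) - math.log(1.0 - prior_positive))
    # Numerically stable logistic transform of the posterior log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


def classify_mrd(probability, threshold=0.95):
    """Compare the score with the P >= 0.95 threshold."""
    return "MRD positive" if probability >= threshold else "MRD negative"


# Hypothetical sample: 1000 sites x 75 duplexes/site/ng x 50 ng cfDNA.
total = 1000 * 75 * 50
for k in range(5):
    p = dynamic_probability(k, total)
    print(f"{k} mutant duplexes: P = {p:.3f} -> {classify_mrd(p)}")
```

In this toy setup, the same mutant duplex count yields a lower probability as the number of duplexes examined grows, because more background errors are then expected; that sample dependence is what distinguishes a probability threshold from a fixed mutation count.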

[0052] Various (e.g., some or all) actions of process 2800 can be implemented using any suitable computing device. For example, in some embodiments, one or more actions of the illustrative process 2800 can be implemented in a clinical or laboratory setting. For example, one or more actions of process 2800 can be implemented on a computing device located within a clinical or laboratory setting. In some embodiments, the computing device can obtain sequencing data directly from a sequencing apparatus located within a clinical or laboratory setting.

[0053] Alternatively or concurrently, one or more actions of the illustrative process 2800 may be performed in an environment remote from the clinical or laboratory setting. For example, one or more actions of process 2800 may be performed on a computing device located outside the clinical or laboratory setting. In this case, the computing device may indirectly obtain sequencing data generated using sequencing equipment located inside or outside the clinical or laboratory setting. For example, expression data may be provided to the computing device via a communication network, such as the Internet or any other suitable network.

[0054] It should be understood that, in some embodiments, not all actions of the illustrative process 2800 shown in Figure 28 are performed using one or more computing devices. For example, action 2810 of modifying a patient's treatment plan can be performed manually (e.g., by a clinician).

[0055] Process 2800 begins with action 2802, in which sequencing data from the patient is obtained. In some embodiments, sequencing data is obtained from MAESTRO analysis. In some embodiments, sequencing data is obtained by sequencing DNA duplexes in the patient sample. Examples of sequencing data, sources of sequencing data, and formats of sequencing data are described herein, including in the section entitled “Micro-allele enrichment sequencing by oligonucleotides (MAESTRO)”.

[0056] As an illustrative example, in some embodiments, sequencing data may include duplex sequencing data. In some embodiments, as described herein, sequencing data includes data from sequencing duplex DNA molecules captured by allele-specific probes. In some embodiments, sequencing data includes sequencing reads.

[0057] Next, process 2800 proceeds to action 2804, where the sequencing data is analyzed to determine the number of mutant DNA duplexes in the patient sample and the estimated total number of mutation-detected DNA duplexes in the patient sample. In some embodiments, the number of mutant DNA duplexes is determined by identifying, in the sequencing data, mutation sites found in the patient's tumor fingerprint. The process of determining the patient's tumor fingerprint is described with reference to Figure 30 and Figure 31.

[0058] In some embodiments, the total number of mutation-detected DNA duplexes in a patient sample is estimated. In some embodiments, within the MAESTRO workflow, the estimated total number of DNA duplexes is determined using probes that are nonspecific to the mutation sites found in the patient's tumor fingerprint. The MAESTRO workflow depletes wild-type DNA duplexes and enriches mutant DNA duplexes. It is therefore difficult to determine directly the total number of mutation-detected duplexes in a patient sample, and thus difficult to determine the ratio of mutant DNA duplexes to total mutation-detected DNA duplexes (the tumor fraction). To address this problem, the number of DNA duplexes at each locus is measured and combined with the number of mutations in the tumor fingerprint to approximate the total number of mutation-detected DNA duplexes in the sample. In some embodiments, the average total number of DNA duplexes at each locus is multiplied by the number of mutations in the tumor fingerprint to estimate the total number of mutation-detected DNA duplexes in the patient sample.
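
As a minimal sketch of the arithmetic described above (not the implementation of this disclosure), the estimate can be written as the mean per-locus duplex count multiplied by the number of fingerprint mutations, with the tumor fraction taken as the ratio of mutant duplexes to that estimate. The function names and the example counts are hypothetical and chosen only for illustration.

```python
from statistics import mean


def estimate_total_duplexes(duplexes_per_locus, n_fingerprint_mutations):
    """Estimate total mutation-detected duplexes.

    Assumption: the estimate is the mean number of duplexes measured per
    locus multiplied by the number of mutations in the tumor fingerprint.
    """
    if not duplexes_per_locus:
        raise ValueError("at least one per-locus duplex count is required")
    return mean(duplexes_per_locus) * n_fingerprint_mutations


def tumor_fraction(n_mutant_duplexes, estimated_total_duplexes):
    """Ratio of mutant duplexes to total mutation-detected duplexes."""
    if estimated_total_duplexes <= 0:
        raise ValueError("estimated total must be positive")
    return n_mutant_duplexes / estimated_total_duplexes


# Hypothetical example: four measured loci, 500 fingerprint mutations.
depths = [1800, 2100, 2000, 2100]
total = estimate_total_duplexes(depths, 500)
tfx = tumor_fraction(4, total)
```

In this example the mean per-locus count is 2000, so the estimated total is 1,000,000 duplexes, and four mutant duplexes correspond to a tumor fraction of 4 ppm.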

[0059] In some embodiments, probes nonspecific to mutations found in a patient's tumor fingerprint are used to enrich wild-type DNA duplexes and to calculate an estimated total number of DNA duplexes at each locus. In some embodiments, allele-specific probes with low binding affinity to mutation sites found in a patient's tumor fingerprint are used to determine the estimated total number of DNA duplexes at each locus in the patient sample. Using allele-specific probes with low binding affinity to mutation sites found in a patient's tumor fingerprint results in the capture of both mutant and wild-type DNA duplexes, which allows the total number of DNA duplexes at each locus to be estimated. The methods described herein for determining the number of mutant DNA duplexes and the total number of mutation-detected DNA duplexes are used at least in part to determine the tumor fraction (TFx). The tumor fraction is the ratio of mutant DNA duplexes to the total mutation-detected DNA duplexes in the patient sample.

[0060] Next, process 2800 proceeds to action 2806, where data derived from action 2804 is used to determine the posterior probability that the patient sample is MRD-positive. Action 2806 is divided into three sub-actions. In action 2806a, a first statistical model, the number of mutant DNA duplexes, and the estimated total number of mutation-detected DNA duplexes are used to determine a first likelihood that the mutant DNA duplexes among the estimated total number of mutation-detected DNA duplexes are entirely derived from the tumor. In some embodiments, the first statistical model is a binomial model. The term "binomial model" is known in the art. In some embodiments, the first statistical model is a binomial model with the number of mutant duplexes, the estimated total number of mutation-detected duplexes in the patient sample, and the background mutation frequency as inputs. In some embodiments, the background mutation frequency is set to the same value for all mutations. In some embodiments, different background mutation frequencies are set for mutations in different mutation contexts. In some embodiments, the background mutation frequency is set using values measured empirically on a per-sample basis. In some embodiments, when the background mutation frequency is set to a value measured empirically on a per-sample basis, the value is determined empirically either by pooled probe testing as described herein or by targeted duplex sequencing of the patient sample. In some embodiments, the first likelihood is determined as the product of context-specific likelihoods over a set of discrete mutation contexts. In some embodiments, the abundance of single nucleotide variants in the patient sample is determined as an indication of the background mutation frequency. The term “single nucleotide variant” or “SNV” refers to a change in the DNA sequence that occurs when a single nucleotide in the genome is altered. In some embodiments, SNV frequencies are used to calculate the background mutation frequency.

[0061] In action 2806b, a second statistical model, the number of mutant DNA duplexes, and the estimated total number of mutation-detected DNA duplexes are used to determine a second likelihood that the mutant DNA duplexes among the estimated total number of mutation-detected DNA duplexes arose from spontaneous errors.

[0062] After performing actions 2806a and 2806b, process 2800 proceeds to action 2806c, where a posterior probability of the patient sample being MRD-positive is determined using the first likelihood and the second likelihood. In some embodiments, the second statistical model is a binomial model. In some embodiments, the second statistical model takes as input the number of mutant DNA duplexes, the estimated total number of mutation-detected DNA duplexes in the sample, and the background mutation frequency. The method for determining the background mutation frequency is described under action 2806a above. In some embodiments, the second statistical model is a β-binomial model. The term "β-binomial model" is known in the art. In some embodiments, the second likelihood is determined as a product of β-binomial likelihoods determined for each mutation context, using a β-binomial model with a context-specific prior set based on the context-specific mutation frequency. In some embodiments, nucleotide abundance and nucleotide ratio in the sequencing data are used to set the context-specific prior. In some embodiments, the MAESTRO workflow described herein is used to measure sample-specific SNV frequencies. Sample-specific SNV frequencies can be used to determine the likelihood that a mutant DNA duplex is a true mutation originating from a tumor or a spontaneous error. The term "spontaneous error" refers to a technical artifact or a random error that occurs within the DNA of a cell. Spontaneous errors can occur naturally, for example, through replication errors during DNA replication. Spontaneous errors determine background error rates, such as the frequency of background single nucleotide variants (SNVs).
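
The combination of the two likelihoods into a posterior can be sketched as follows. This is an illustration under stated assumptions, not the model of this disclosure: both likelihoods are taken to be binomial; under the tumor hypothesis the per-duplex mutation probability is assumed to be a hypothesized tumor fraction plus the background mutation frequency; under the spontaneous-error hypothesis it is the background frequency alone; and the prior probability of MRD positivity (0.5 here) is a placeholder. All function and parameter names are hypothetical.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log1p(-p)


def mrd_posterior(n_mutant, n_total, background_rate,
                  assumed_tumor_fraction, prior_positive=0.5):
    """Posterior probability of MRD positivity via Bayes' rule.

    Assumptions (not taken from the source): the tumor hypothesis uses
    rate = assumed_tumor_fraction + background_rate, and prior_positive
    is an arbitrary placeholder prior.
    """
    log_l_tumor = log_binom_pmf(
        n_mutant, n_total, assumed_tumor_fraction + background_rate)
    log_l_error = log_binom_pmf(n_mutant, n_total, background_rate)
    a = log_l_tumor + math.log(prior_positive)
    b = log_l_error + math.log(1.0 - prior_positive)
    m = max(a, b)  # subtract the maximum for numerical stability
    return math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))


# Hypothetical inputs: 5 mutant duplexes among 1,000,000 assessed duplexes.
posterior_high = mrd_posterior(5, 1_000_000, 1e-7, 5e-6)
posterior_low = mrd_posterior(0, 1_000_000, 1e-7, 5e-6)
```

Working in log space avoids underflow when the total duplex count is large. A β-binomial second model or a product over mutation contexts, as described above, would replace `log_l_error` with the corresponding sum of context-specific log-likelihoods.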

[0063] Action 2806 provides the posterior probability. Process 2800 then proceeds to action 2808, where the posterior probability is compared to a predetermined threshold. Action 2808 is a decision point. If the posterior probability is greater than the predetermined threshold, process 2800 proceeds to action 2808a, where the patient sample is determined to be MRD-positive. If the posterior probability is less than the predetermined threshold, process 2800 proceeds to action 2808b, where the patient sample is determined to be MRD-negative.
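
The decision point can be sketched in a few lines. The threshold value of 0.95 is a placeholder, since no specific value is given here, and treating a posterior exactly equal to the threshold as negative is an assumption made only so the function is total. The function name is hypothetical.

```python
def classify_mrd(posterior, threshold=0.95):
    """Return the MRD call for a posterior probability.

    Assumptions: the default threshold is a placeholder, and a posterior
    equal to the threshold is reported as MRD-negative.
    """
    if not 0.0 <= posterior <= 1.0:
        raise ValueError("posterior must be a probability")
    if posterior > threshold:
        return "MRD-positive"
    return "MRD-negative"


call_a = classify_mrd(0.999)
call_b = classify_mrd(0.40)
```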

[0064] Next, process 2800 proceeds to action 2810, in which the patient's cancer treatment plan is modified based on whether the patient's sample is MRD positive or MRD negative.

[0065] In some embodiments, if a patient's sample is determined to be MRD-positive or MRD-negative, the patient's cancer treatment plan is modified. In some embodiments, the patient is treated according to an initial treatment plan before the posterior probability that the patient's sample is MRD-positive is determined. In some embodiments, the initial treatment plan is modified in response to determining that the patient's sample is MRD-positive to create an updated treatment plan. In some embodiments, the patient is treated with the updated treatment plan in response to determining that the patient's sample is MRD-positive. In some embodiments, the initial treatment plan includes administering one or more first drugs to the patient at one or more first doses. In some embodiments, modifying the treatment plan includes increasing one or more of the one or more first doses to one or more second doses. In some embodiments, the modified treatment plan includes administering the one or more first drugs at the one or more second doses. In some embodiments, the modified treatment plan includes one or more of the one or more first drugs and one or more second drugs. In some embodiments, when a patient's sample is determined to be MRD-negative, the patient's treatment plan is modified to reduce the dose of drug administered to the patient or to discontinue treatment entirely. In some embodiments, the drug is a chemotherapy drug.

[0066] Figure 29 is a flowchart illustrating an illustrative process 2900 for identifying the presence of one or more mutation sites in a patient sample. Specifically, Figure 29 depicts the main steps of the MAESTRO workflow, which is described in detail in the section entitled "Micro-allele enrichment sequencing by recognizing oligonucleotides (MAESTRO)".

[0067] Process 2900 begins with action 2902, in which a pool of DNA duplexes having, suspected of having, or at risk of having mutation sites found in a tumor fingerprint is obtained. The term "tumor fingerprint" refers to one or more mutations associated with a tumor found in a patient's body. The process of determining a patient's tumor fingerprint is described in detail with reference to Figure 30 and Figure 31.

[0068] Next, process 2900 proceeds to action 2904, where a unique molecular identifier (UMI) is attached to the 5' and 3' ends of the DNA duplexes in the DNA duplex pool to generate labeled duplexes. As described herein, a UMI is a short nucleotide sequence used to uniquely label each molecule in a patient sample. The UMI is described in detail in the section entitled "Micro-allele enrichment sequencing by recognizing oligonucleotides (MAESTRO)".

[0069] Process 2900 proceeds to action 2906, in which the labeled DNA duplexes are amplified by polymerase chain reaction (PCR) to produce amplified DNA duplexes.

[0070] Process 2900 proceeds to action 2908, in which the amplified DNA duplexes are denatured to produce single-stranded amplified DNA.

[0071] Process 2900 proceeds to action 2910, in which single-stranded amplified DNA having one or more mutation sites found in the patient's tumor fingerprint is captured using an allele-specific probe that anneals to the one or more mutation sites found in the tumor fingerprint, producing an enriched sample. Examples of allele-specific probes are described in detail in the section entitled "MAESTRO Probes".

[0072] In some embodiments, allele-specific probes are designed to anneal to mutation sites found in a patient's tumor fingerprint. In some embodiments, the allele-specific probes are patient-specific. In some embodiments, multiple allele-specific probes are used to capture multiple mutation sites found in a patient's tumor fingerprint. In some embodiments, a first set of allele-specific probes is designed to capture one or more mutation sites found in a first patient's tumor fingerprint. In some embodiments, a second set of allele-specific probes is designed to capture one or more mutation sites found in a second patient's tumor fingerprint. In some embodiments, an Nth set of allele-specific probes is designed to capture one or more mutation sites found in an Nth patient's tumor fingerprint. In some embodiments, multiple sets of allele-specific probes are designed, each set designed to capture one or more mutations in the tumor fingerprint of a respective one of multiple patients. In some embodiments, the multiple sets of allele-specific probes are added to a patient sample derived from a single patient. Using the multiple sets of allele-specific probes in a patient sample derived from a single patient increases the specificity and sensitivity of the probe set designed to capture the one or more mutations found in that patient's tumor fingerprint. Sensitivity and specificity are increased because the allele-specific probes not designed for that patient's tumor fingerprint serve as internal controls against which the probes designed for that patient's tumor fingerprint are compared.

[0073] In some embodiments, additional probes with low binding specificity to mutation sites found in a patient’s tumor fingerprint are used to determine an estimated total number of mutation-detected DNA duplexes in the patient’s sample.

[0074] Process 2900 proceeds to action 2912, in which the enriched sample is sequenced. In some embodiments, next-generation sequencing (NGS) is used to sequence the enriched sample.

[0075] Process 2900 proceeds to action 2914, in which the presence of one or more mutation sites is identified if the one or more mutation sites are observed in both strands of a labeled duplex (as identified by analysis of the UMI sequences).

[0076] Figure 30 is a flowchart illustrating an illustrative process 3000 for sequencing, processing, and filtering sequencing data from tumor genomic DNA obtained from a subject.

[0077] Various (e.g., some or all) actions of process 3000 can be implemented using any suitable computing device. For example, in some embodiments, one or more actions of the illustrative process 3000 can be implemented in a clinical or laboratory setting. For example, one or more actions of process 3000 can be implemented on a computing device located within a clinical or laboratory setting. In some embodiments, the computing device can obtain sequencing data directly from a sequencing apparatus located within a clinical or laboratory setting.

[0078] Alternatively or concurrently, one or more actions of the illustrative process 3000 may be performed in an environment remote from the clinical or laboratory setting. For example, one or more actions of process 3000 may be performed on a computing device located outside the clinical or laboratory setting. In this case, the computing device may indirectly obtain sequencing data generated using sequencing equipment located inside or outside the clinical or laboratory setting. For example, the sequencing data may be provided to the computing device via a communication network, such as the Internet or any other suitable network.

[0079] Process 3000 begins with action 3002, in which tumor genomic DNA sequencing data is obtained from a tumor originating from the patient to identify multiple mutation sites in the tumor genomic DNA.

[0080] Process 3000 proceeds to action 3004, where the multiple mutation sites are processed and filtered to identify at least one selected mutation site associated with the patient, thereby creating a tumor fingerprint. The steps of processing and filtering to identify at least one selected mutation site are described in detail with reference to Figure 31.

[0081] Figure 31 is a flowchart illustrating an illustrative process 3100 for sequencing, processing, and filtering sequencing data from tumor genomic DNA obtained from a subject.

[0082] Various (e.g., some or all) actions of process 3100 can be implemented using any suitable computing device. For example, in some embodiments, one or more actions of the illustrative process 3100 can be implemented in a clinical or laboratory setting. For example, one or more actions of process 3100 can be implemented on a computing device located within a clinical or laboratory setting. In some embodiments, the computing device can obtain sequencing data directly from a sequencing apparatus located within a clinical or laboratory setting.

[0083] Alternatively or concurrently, one or more actions of the illustrative process 3100 may be performed in an environment remote from the clinical or laboratory setting. For example, one or more actions of process 3100 may be performed on a computing device located outside the clinical or laboratory setting. In this case, the computing device may indirectly obtain sequencing data generated using sequencing equipment located inside or outside the clinical or laboratory setting. For example, the sequencing data may be provided to the computing device via a communication network, such as the Internet or any other suitable network.

[0084] Process 3100 begins with action 3102, in which at least one of the multiple mutation sites identified in Figure 30 is selected.

[0085] Process 3100 proceeds to action 3104, wherein the at least one selected mutation site is analyzed in matched normal genomic DNA and matched tumor genomic DNA.

[0086] Process 3100 proceeds to action 3106, wherein the following are determined for the at least one mutation site: a first number of mutant duplexes in the matched normal genomic DNA, a second number of mutant duplexes in the matched tumor genomic DNA, and the ratio of mutant duplexes (ALT duplexes) to mutant single-strand consensus molecules (ALT single-strand consensus molecules) in the matched tumor genomic DNA.

[0087] Process 3100 proceeds to action 3108, which is a decision point. If the first number of mutant duplexes in the matched normal genomic DNA is zero, the second number of mutant duplexes in the matched tumor genomic DNA is greater than zero, and the ratio of mutant duplexes to mutant single-strand consensus molecules in the matched tumor genomic DNA is greater than 0.15, then process 3100 proceeds to action 3108a. If the first number of mutant duplexes in the matched normal genomic DNA is not zero, the second number of mutant duplexes in the matched tumor genomic DNA is not greater than zero, or the ratio of mutant duplexes to mutant single-strand consensus molecules in the matched tumor genomic DNA is not greater than 0.15, then process 3100 proceeds to action 3108b.

[0088] If process 3100 proceeds to action 3108a, the mutation site is associated with the patient and becomes part of the patient's tumor fingerprint. If process 3100 proceeds to action 3108b, the mutation site is not associated with the patient and does not become part of the patient's tumor fingerprint.
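
The decision at action 3108 can be sketched as a simple filter over candidate sites. The record type, field names, and example sites below are hypothetical; the three conditions and the 0.15 cutoff are the ones stated above. Treating a site with zero mutant single-strand molecules as failing the ratio test is an assumption, since the ratio is undefined in that case.

```python
from dataclasses import dataclass

MIN_DUPLEX_TO_SINGLE_STRAND_RATIO = 0.15  # cutoff stated in the text


@dataclass
class SiteCounts:
    site: str                      # hypothetical site label
    normal_alt_duplexes: int       # mutant duplexes in matched normal DNA
    tumor_alt_duplexes: int        # mutant duplexes in matched tumor DNA
    tumor_alt_single_strand: int   # mutant single-strand molecules in tumor


def passes_fingerprint_filter(c):
    """Apply the three conditions of action 3108 to one site."""
    if c.normal_alt_duplexes != 0:
        return False
    if c.tumor_alt_duplexes <= 0:
        return False
    if c.tumor_alt_single_strand <= 0:
        return False  # assumption: an undefined ratio fails the test
    ratio = c.tumor_alt_duplexes / c.tumor_alt_single_strand
    return ratio > MIN_DUPLEX_TO_SINGLE_STRAND_RATIO


def build_fingerprint(candidates):
    """Keep the sites that reach action 3108a."""
    return [c.site for c in candidates if passes_fingerprint_filter(c)]


candidates = [
    SiteCounts("site_A", 0, 6, 20),   # ratio 0.30, kept
    SiteCounts("site_B", 1, 9, 20),   # seen in matched normal, dropped
    SiteCounts("site_C", 0, 2, 40),   # ratio 0.05, dropped
    SiteCounts("site_D", 0, 0, 10),   # no mutant duplexes, dropped
]
fingerprint = build_fingerprint(candidates)
```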

[0089] The process described herein is used to determine whether a patient sample is MRD-positive or MRD-negative. The methods described herein were developed following the surprising discovery that using probabilistic analysis to interpret the output of the MAESTRO workflow can increase the sensitivity and specificity of MAESTRO. In some embodiments, the methods described herein are used to detect MRD at a concentration of tumor-derived cell-free DNA between at least 0.10 parts per million (ppm) and at least 10.0 ppm. In some embodiments, MRD is detected at at least 0.10, at least 0.15, at least 0.20, at least 0.25, at least 0.30, at least 0.35, at least 0.40, at least 0.45, at least 0.50, at least 0.55, at least 0.60, at least 0.65, at least 0.70, at least 0.75, at least 0.80, at least 0.85, at least 0.90, at least 0.95, or at least 1.00 ppm. In some embodiments, MRD is detected at 0.78 ppm. The sensitivity of the methods described herein is a significant improvement over other methods, which typically detect MRD only when the concentration is 10.0 ppm or higher. In some embodiments, the methods described herein are used to detect mutation sites derived from a patient's tumor fingerprint with high specificity. In some embodiments, the specificity is at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100%. In some embodiments, the specificity is 98% or higher.

[0090] The methods described herein have several advantages over existing methods for identifying MRD in patient samples.

[0091] Micro-allele enrichment sequencing by recognizing oligonucleotides (MAESTRO)

[0092] This disclosure relates to the use of micro-allele enrichment sequencing by recognizing oligonucleotides (MAESTRO). MAESTRO is described in US 2023 / 0203568A1, the entire contents of which are incorporated herein by reference.

[0093] MAESTRO is a method for identifying the presence of one or more mutation sites or specific mutations in a patient sample. One embodiment of the MAESTRO method includes: (a) obtaining a pool of DNA duplexes having, suspected of having, or at risk of having one or more mutation sites in at least one strand of a DNA duplex, and optionally fragmenting the DNA duplexes; (b) attaching (e.g., ligating) a unique molecular identifier (UMI) (e.g., as part of an adapter molecule) to the 5' and 3' ends of the DNA duplexes to produce labeled duplexes, wherein the UMI is unique for each labeled duplex; (c) amplifying the labeled duplexes by polymerase chain reaction (PCR) to produce amplified duplexes; (d) denaturing the amplified duplexes to produce single-stranded amplified DNA; (e) capturing the single-stranded amplified DNA with the specific mutation using an allele-specific probe annealed to the specific mutation to produce an enriched sample; (f) sequencing the enriched sample; and (g) confirming the presence of the specific mutation if the specific mutation is observed in both strands of the labeled duplexes identified by the UMI.

[0094] In some aspects, the MAESTRO method includes: (a) obtaining a pool of DNA duplexes containing a specific mutation in at least one strand, and attaching (e.g., ligating) a unique molecular identifier (UMI) to the 5' and 3' ends of each strand of the DNA duplexes to generate labeled duplexes, wherein the UMI is unique for each labeled duplex; (b) amplifying the labeled duplexes by polymerase chain reaction (PCR) to generate amplified duplexes, and subsequently denaturing the amplified duplexes to generate single-stranded amplified DNA; (c) capturing the single-stranded amplified DNA with the specific mutation using an allele-specific probe annealed to the specific mutation to generate an enriched sample, and sequencing the enriched sample; and (d) calculating the ratio of duplex consensus (DSC) to single-strand consensus (SSC) molecules (DSC to SSC ratio) using the UMI, and identifying the specific mutation if the DSC to SSC ratio is greater than 0.15.
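
One way to picture the DSC to SSC ratio in step (d) is sketched below. The counting convention is an assumption made for illustration and is not specified here: each distinct (duplex UMI, strand) pair supporting the mutation counts as one single-strand molecule, and each duplex UMI supported on both strands counts as one duplex molecule. The function name, input layout, and example UMIs are hypothetical.

```python
from collections import defaultdict

DSC_TO_SSC_CUTOFF = 0.15  # cutoff stated in the text


def dsc_to_ssc_ratio(mutant_reads):
    """Compute the DSC to SSC ratio for one mutation site.

    mutant_reads: iterable of (duplex_umi, strand) pairs for reads that
    carry the mutation, with strand given as '+' or '-'.
    Assumption: SSC = number of distinct (UMI, strand) pairs and
    DSC = number of UMIs observed on both strands.
    """
    strands_by_duplex = defaultdict(set)
    for umi, strand in mutant_reads:
        if strand not in ("+", "-"):
            raise ValueError("strand must be '+' or '-'")
        strands_by_duplex[umi].add(strand)
    ssc = sum(len(strands) for strands in strands_by_duplex.values())
    dsc = sum(1 for strands in strands_by_duplex.values()
              if strands == {"+", "-"})
    return dsc / ssc if ssc else 0.0


reads = [
    ("AAA", "+"), ("AAA", "-"), ("AAA", "+"),  # one duplex, both strands
    ("CCC", "+"),                              # top strand only
    ("GGG", "-"),                              # bottom strand only
]
ratio = dsc_to_ssc_ratio(reads)
mutation_identified = ratio > DSC_TO_SSC_CUTOFF
```

In this example there are four single-strand molecules and one duplex molecule, giving a ratio of 0.25, which exceeds the 0.15 cutoff.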

[0095] As used herein, the terms "specific mutation" and "mutation site" refer to a change, alteration, or modification of nucleotides in a nucleic acid compared to a wild-type sequence of interest (e.g., unmutated, reference sequence) that is a target of the probes of this disclosure. For example, a specific mutation may be known to be associated with a condition (e.g., disease or condition). Therefore, assessing the presence of a specific mutation in a subject or a sample from a subject (e.g., a DNA duplex pool), or assessing it to identify any such specific mutation, may be used for (but not limited to) the diagnosis, treatment, and / or evaluation of the subject. In some embodiments of this disclosure, the identification and / or presence of a specific mutation is used to indicate the presence of a nucleic acid (e.g., DNA, cfDNA) associated with a condition. In some embodiments, the methods of this disclosure use this determination to indicate and / or evaluate minimal residual disease (MRD) in a subject.

[0096] Without limitation, mutations may include substitutions, insertions, deletions, or any combination thereof. In some embodiments, at least one mutation is present. In some embodiments, more than one mutation is present. In some embodiments, where more than one mutation is present, the mutations are distinct (e.g., not of the same type (e.g., substitution, insertion, deletion)). In some embodiments, where more than one mutation is present, the mutations are identical (e.g., of the same type (e.g., substitution, insertion, deletion)). Furthermore, in some embodiments, the mutation results in a frameshift. In some embodiments, the mutation comprises a single nucleotide polymorphism (SNP). In some embodiments, the mutation is a structural variant. As used herein, a structural variant refers to a change in the structure of a subject's chromosome, which may include multiple changes in the subject's genome. For example, and without limitation, structural variants may include microscopic and submicroscopic alterations such as deletions, duplications, copy number variations, insertions, inversions, and translocations. In some embodiments, the mutation occurs in one strand of a nucleic acid duplex. In some embodiments, the strand is the positive strand (e.g., '+', sense strand). In some embodiments, the strand is the negative strand (e.g., '-', antisense strand). In some embodiments, the mutation occurs in both strands of a nucleic acid duplex (e.g., '+' and '-' strands). In some embodiments, the mutation is a mutation known to be associated with cancer. In some embodiments, the cancer is leukemia. In some embodiments, the mutation is known to be associated with or originate from tumor tissue.

[0097] In some embodiments, specific mutations are selected (e.g., established as targets) based on existing information, such as literature presenting lists of known mutations, databases of known mutations, and / or any other source of known mutations. In some embodiments, specific mutations are selected from existing information about the subject (e.g., from subjects from whom DNA duplex pools and / or enriched samples are obtained). For example, existing information might be the subject's history of a disease or condition, or the subject's history of specific mutations. In some embodiments, specific mutations are selected based on known associations with a disease or condition. In some embodiments, specific mutations are selected based on the fact that the subject has, is suspected of having, or has had a disease associated with or related to the specific mutation. In some embodiments, specific mutations are selected based on existing information or sequencing data from tissue samples (currently or previously obtained) from the subject. In some embodiments, the tissue sample is tumor tissue.

[0098] In some embodiments, the DNA duplexes are obtained from a sample. As used in the methods herein, the sample can be any sample from a subject. In some embodiments, the sample is a patient sample. In some embodiments, the patient sample is a biological sample. A biological sample can be obtained from (but is not limited to) blood, skin, tissue, hair, saliva, bodily fluids, cells, or any other biological component from which one skilled in the art can determine the parameter being evaluated (e.g., the presence of a nucleic acid containing a specific mutation or a duplex containing that mutation) using known and readily available techniques. In some embodiments, the sample is a blood sample. In some embodiments, the blood sample contains cell-free DNA (“cfDNA”). In some embodiments, the sample is a plasma sample. In some embodiments, the plasma sample contains cfDNA.

[0099] In some embodiments, samples are obtained by biopsy. In some embodiments, the biopsy is a liquid biopsy. Liquid biopsies are well known to those skilled in the art. They are generally referred to as liquid or fluid-phase biopsies, in which sampling and analysis are performed on non-solid biological material (e.g., bodily fluids, blood, saliva, etc.) from a subject. The sample from the liquid biopsy is then analyzed for the presence of biomarkers (e.g., specific mutations or nucleic acids and / or duplexes carrying specific mutations or sequences). The composition of the fluid may vary depending on the target to be analyzed, such as circulating tumor cells and / or circulating tumor DNA (ctDNA), circulating endothelial cells, cell-free DNA (cfDNA), and / or cell-free fetal DNA (cffDNA). In some embodiments, the liquid biopsy sample is a blood sample. In some embodiments, the liquid biopsy is performed on the subject's germ cells (e.g., from an egg or sperm). In some embodiments, the methods disclosed herein target cfDNA. However, any suitable liquid biopsy may be used in conjunction with the methods described herein, which can be determined by those skilled in the art without excessive experimentation.

[0100] Once a sample is obtained (e.g., acquired), it is used to analyze the DNA duplex. As used herein, the term "DNA duplex" refers to a single double-stranded nucleic acid molecule. Therefore, the term should be understood to include genomic DNA (gDNA), germline DNA, cell-free DNA, and other forms of DNA, provided that the molecule contains two annealed strands in at least a portion of the nucleic acid molecule. Thus, a DNA duplex can refer to a complete DNA molecule comprising the entire genome, a portion thereof, or a fragment thereof (e.g., after fragmentation or shearing), provided that the molecule retains double strands in at least a portion of the nucleic acid molecule.

[0101] In some embodiments, the DNA duplex is fragmented. This fragmentation breaks the nucleic acid into smaller fragments. In some embodiments, the DNA duplex is fragmented to reduce its size. In some embodiments, the DNA duplex is fragmented to make the DNA duplexes more uniform in size. In some embodiments, the DNA duplex is fragmented to produce fragments of about 50 to about 250 base pairs in length (e.g., fragments of any whole number of base pairs in length from about 50 to about 250, inclusive). In some embodiments, the DNA duplex is fragmented to produce fragments of about 100 to about 200 base pairs in length. In some embodiments, the DNA duplex is fragmented to produce fragments of about 120 to about 180 base pairs in length. In some embodiments, the DNA duplex is fragmented to produce fragments of about 130 to about 170 base pairs in length. In some embodiments, the DNA duplex is fragmented to produce fragments of about 140 to about 160 base pairs in length. In some embodiments, the DNA duplex is fragmented to produce fragments of about 150 base pairs in length. In some embodiments, the DNA duplex is already fragmented, for example, as in cell-free DNA derived from blood plasma.

[0102] Fragmentation can be accomplished by physical means (e.g., sonication or physical force), enzymatic means, or chemical means. However, all forms of fragmentation inherently damage the strands in order to break them into smaller pieces. Fragmentation methods are well known in the art and are readily understood and chosen by those skilled in the art. In some embodiments, prior to step (a), the sample has been: (i) fragmented; or (ii) cut and labeled. In some embodiments, fragmentation is performed by: (a) physical fragmentation; (b) enzymatic fragmentation; and / or (c) chemical fragmentation. In some embodiments, fragmentation is performed by physical fragmentation. In some embodiments, physical fragmentation is performed by nebulization. In some embodiments, physical fragmentation is performed by acoustic shearing. In some embodiments, physical fragmentation is performed by needle shearing. In some embodiments, physical fragmentation is performed by French pressure cell disruption. In some embodiments, physical fragmentation is performed by sonication. In some embodiments, physical fragmentation is performed by hydrodynamic shearing. In some embodiments, fragmentation is performed by enzymatic fragmentation. In some embodiments, enzymatic fragmentation is performed by nucleases or endonucleases. In some embodiments, enzymatic fragmentation is performed by DNase I. In some embodiments, enzymatic fragmentation is performed by restriction endonucleases. In some embodiments, enzymatic fragmentation is performed by transposases. In some embodiments, fragmentation is performed by chemical fragmentation. In some embodiments, chemical fragmentation is performed by heat and divalent metal cation fragmentation.

[0103] Once the DNA duplex is fragmented, a unique molecular identifier (UMI) can be attached as part of a sequencing adapter to one or both ends of the DNA duplex, containing sequences that facilitate primer binding and amplification. This sequencing preparation process is well-established in the field, while other methods exist that attach sequencing adapters containing UMIs. A UMI is a tag (e.g., a specific sequence) that can be used to identify the strand and / or its duplex counterpart (e.g., the complementary strand) during the remainder of the method and during any post-sequencing processing and / or evaluation (e.g., analysis). In some embodiments, the UMI is contained within the sequencing adapter. The use of UMIs is well-known throughout the field. In some embodiments, the UMI is attached to at least the 5' end of at least one strand of the DNA duplex. In some embodiments, the UMI is attached to both 5' ends of the DNA duplex. In some embodiments, the UMI is attached to at least the 3' end of at least one strand of the DNA duplex. In some embodiments, the UMI is attached to both 3' ends of the DNA duplex. In some embodiments, the UMI is attached to at least the 5' end of at least one strand of the DNA duplex and each of the 3' ends of at least one strand of the DNA duplex. In some embodiments, the UMI is attached to all 5' ends and all 3' ends of the DNA duplex. In some embodiments, the UMIs attached to the DNA duplex are identical to each other but unique to the DNA duplex. In some embodiments, the UMIs of the DNA duplex are unique to each other and unique to the DNA duplex. In some embodiments, the UMI is not unique to the DNA duplex, but is unique to the DNA duplex when evaluated in conjunction with start and / or stop sequencing sites. In some embodiments, the length of the UMI is between about 1 nucleotide and about 20 nucleotides. In some embodiments, the length of the UMI is between about 3 nucleotides and about 18 nucleotides. 
In some embodiments, the length of the UMI is between about 5 nucleotides and about 16 nucleotides. In some embodiments, the length of the UMI is between about 6 nucleotides and about 15 nucleotides. In some embodiments, the length of the UMI is between about 8 nucleotides and about 15 nucleotides. In some embodiments, the UMI is attached to the DNA duplex by ligation. One of the benefits and characteristics of duplex sequencing is that the associations between the UMI sequences added to the top and bottom strands are known (e.g., complementary to each other, or providing an indication of which sequence comes from the top and bottom strands), so reads from each strand can be paired back to the same original DNA duplex. This understanding is a key component of duplex sequencing. In some embodiments, the UMI is then unique for each duplex. In other embodiments, DNA duplexes may share the same UMI sequence. However, the probability of two DNA duplexes sharing the same UMI and the same start and end positions in the genome is extremely low. With this in mind, deduplication can be performed on sequencing reads.
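
The grouping and deduplication idea described above can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the `Read` record, its field names, and the assumption that the UMIs of the top and bottom strands have already been normalized to one shared string are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    umi: str     # UMI, assumed already normalized so both strands share it
    chrom: str   # reference sequence name
    start: int   # alignment start position
    end: int     # alignment end position
    strand: str  # '+' (top) or '-' (bottom) strand of the original duplex
    seq: str     # read sequence


def group_reads_by_duplex(reads):
    """Group reads that share UMI and start/end positions into duplex families."""
    families = defaultdict(lambda: {"+": [], "-": []})
    for r in reads:
        key = (r.umi, r.chrom, r.start, r.end)
        families[key][r.strand].append(r.seq)
    return dict(families)


def deduplicate(reads):
    """Keep the first read seen for each (UMI, coordinates, strand) combination."""
    seen = set()
    unique = []
    for r in reads:
        key = (r.umi, r.chrom, r.start, r.end, r.strand)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


reads = [
    Read("ACGT", "chr1", 100, 250, "+", "AAA"),
    Read("ACGT", "chr1", 100, 250, "+", "AAA"),  # PCR duplicate of the read above
    Read("ACGT", "chr1", 100, 250, "-", "TTT"),  # bottom strand of the same duplex
    Read("ACGT", "chr1", 300, 450, "+", "CCC"),  # same UMI, different coordinates
]
families = group_reads_by_duplex(reads)
unique_reads = deduplicate(reads)
```

Using the start and end positions as part of the key is what lets two duplexes that happen to share a UMI sequence remain distinguishable.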

[0104] Following UMI attachment (e.g., adapters containing UMIs), the DNA duplex is amplified to produce amplified duplexes (i.e., sequencing libraries, which can be defined as a collection of DNA fragments to which adapters have been added to facilitate their amplification and sequencing). Any suitable method known to those skilled in the art can be employed, but amplification is generally accomplished via polymerase chain reaction (PCR). PCR has been known in the field for decades and is well-documented, with methods and protocols readily available and readily understood by those skilled in the art. In some embodiments, the DNA duplex is amplified by PCR.

[0105] Once amplified, the amplified DNA duplex (i.e., the sequencing library) needs to be prepared for capture by the allele-specific probes of this disclosure. In some embodiments, the amplified DNA duplex (i.e., the sequencing library) is denatured to separate the strands of the DNA duplex, thereby producing single-stranded amplified DNA. Any method determined by a person skilled in the art to be suitable may be used for denaturation or strand separation, such as (but not limited to), altering the ambient temperature of the DNA duplex (e.g., heating, cooling), treating with sodium hydroxide (NaOH), or placing the DNA duplex in a salt-rich environment. In some embodiments, denaturation (e.g., strand separation) of the DNA duplex is achieved by altering the ambient temperature. In some embodiments, the temperature change is accomplished by heating.

[0106] Once the DNA duplex has been fragmented, attached with UMI, amplified, and denatured, it can be enriched against a target sequence (e.g., a single-stranded amplified DNA carrying (e.g., containing) a specific mutation). The enrichment process can be accomplished using probes. In some embodiments, the probes disclosed herein are any probes as described herein or probes manufactured according to the methods for manufacturing probes disclosed herein. In some embodiments, the probe is an allele-specific probe. Further embodiments of the probe are disclosed below. In some embodiments, the probe comprises a sequence complementary to a portion of the single-stranded amplified DNA (e.g., such that it targets and anneals to that sequence (e.g., binds selectively)), wherein that portion contains a specific mutation, and a means for recovering (e.g., capturing) the probe or separating the probe from foreign matter (e.g., unbound nucleic acids). For example, the probe may target a sequence as described herein and contain biotin. Thus, the property of biotin binding to streptavidin can be utilized to recover the probe. Once the probe binds to single-stranded amplified DNA containing a specific mutation, they are captured, resulting in an enriched sample. Through this process, the sample will contain a higher concentration of single-stranded amplified DNA containing specific mutations than the original sample (e.g., enriched with single-stranded amplified DNA containing specific mutations). This capture (e.g., enrichment) of single-stranded amplified DNA can occur once or multiple times. In the case of multiple captures (e.g., multiple enrichments), capture can be performed on the sample containing single-stranded amplified DNA and / or the enriched sample. In some embodiments, capture is performed at least once. In some embodiments, capture is performed more than once (e.g., 2, 3, 4, 5, 6 or more times). In some embodiments, capture is performed more than 10 times. 
In some embodiments, capture is performed more than 100 times. In some embodiments, capture is performed more than 1,000 times.

[0107] Furthermore, multiple probes can be used to perform capture. In some embodiments, more than one probe is used to capture single-stranded amplified DNA. In some embodiments, the multiple probes are different from each other and target the same specific mutation. In some embodiments, more than one probe is used during capture, and these probes are different from each other and target different specific mutations. By using different probes that target sequences containing different (e.g., unique) specific mutations, the methods disclosed herein can be used to capture (e.g., enrich), concurrently (e.g., simultaneously), DNA duplexes carrying a set of (e.g., a combination of, multiple) mutations. Each probe can target a specific mutation (or more than one mutation) known to be associated with the same or different conditions. In some embodiments, multiple probes are used, each targeting a specific mutation (the same, different, or a combination thereof), wherein all specific mutations are associated with or known to be associated with a single condition (e.g., disease). In some embodiments, multiple probes are used, each targeting a specific mutation, wherein at least one specific mutation is associated with or known to be associated with at least one condition (e.g., disease) that is different from at least one condition known to be associated with at least one other specific mutation.

[0108] In some embodiments, when using more than one probe, each probe targets the same specific mutation that the other probes target. In some embodiments, when using more than one probe, at least one probe targets a specific mutation that is different from the specific mutation targeted by at least one other probe.

[0109] In some embodiments, at least 25 (e.g., 25, 26, 27, 50, 100 or more) different probes are used (e.g., targeting 25 different specific mutations). In some embodiments, at least 50 (e.g., 50 or more) different probes are used (e.g., targeting 50 different specific mutations). In some embodiments, at least 100 (e.g., 100 or more) different probes are used (e.g., targeting 100 different specific mutations). In some embodiments, at least 500 (e.g., 500 or more) different probes are used (e.g., targeting 500 different specific mutations). In some embodiments, at least 1,000 (e.g., 1,000 or more) different probes are used (e.g., targeting 1,000 different specific mutations). In some embodiments, at least 10,000 (e.g., 10,000 or more) different probes are used (e.g., targeting 10,000 different specific mutations). In some embodiments, when more than one probe is used to capture more than one different specific mutation, the specific mutations are located in non-overlapping regions of the genome of the subject from whom the DNA duplex was obtained.

[0110] Once the probe has annealed to the single-stranded amplified DNA, and the probe, along with any bound single-stranded amplified DNA, has been recovered to produce an enriched sample, the sample is prepared for sequencing. In some embodiments, the single-stranded DNA is sequenced using a duplex sequencing method. Duplex sequencing is a type of nucleic acid sequencing that uses information from both strands of a double-stranded DNA molecule to generate a genomic map of the sample or of the subject from whom the sample was obtained. Here, the term "duplex sequencing" is also used to denote any sequencing method that achieves high accuracy by requiring the sequences of both strands of each double-stranded DNA molecule to agree, although any suitable nucleic acid sequencing method can be used. Duplex sequencing inherently provides higher accuracy in nucleic acid sequencing because computational analysis can use known properties of the double-stranded DNA molecule to correct for errors, for example (but not limited to) the fact that nucleobases form canonical base pairs when they are part of a double-stranded DNA molecule. This property of nucleic acids has been known since at least the second half of the last century and is readily understood and appreciated by those skilled in the art. Using this knowledge, a predicted complementary sequence can be inferred from the sequencing of one strand of the double-stranded DNA molecule. This inferred complementary sequence can then be compared with the results of sequencing the second strand of the double-stranded DNA molecule. The comparison either confirms the obtained sequence or highlights differences, thereby identifying potential damage (e.g., damaged bases) or mismatches found on only one strand, sequencing errors, or regions requiring further investigation. These differences may stem from incorrect base insertions, deletions, or mutations (e.g., damaged bases). Furthermore, the results of duplex sequencing can be compared with reference data to gain deeper insight into potential mutations in the sequence. Duplex sequencing therefore provides a high-precision method for resolving nucleic acid sequences, and, because of its accuracy, the impact of differences (e.g., the impact of mutations in genomic data) can be determined more clearly. In some embodiments, enriched samples are sequenced using duplex sequencing.
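
The strand comparison described above can be illustrated with a short sketch. It assumes, purely for illustration, that both reads cover exactly the same interval and are given 5' to 3'; the function names are hypothetical and not part of this disclosure.

```python
# Translation table for canonical base pairing (N is kept as N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the sequence of the complementary strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def strand_discordances(top, bottom):
    """Return 0-based top-strand positions where the two strands disagree.

    The bottom-strand read is reverse-complemented to infer what the top
    strand should look like, then compared with the observed top-strand read.
    """
    inferred_top = reverse_complement(bottom)
    if len(inferred_top) != len(top):
        raise ValueError("reads must cover the same interval in this sketch")
    return [i for i, (a, b) in enumerate(zip(top, inferred_top)) if a != b]


top_read = "ACGTAC"
concordant = strand_discordances(top_read, reverse_complement(top_read))
discordant = strand_discordances(top_read, "GTACGA")  # last base altered
```

An empty result means both strands support the same sequence; a non-empty result marks positions that may reflect damage or error on one strand only.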

[0111] After sequencing, the generated data (e.g., sequencing results) can be queried by the user to identify (e.g., determine, assess, confirm) the presence of sequences containing specific mutations. In some embodiments, if a sequence containing (e.g., including) a specific mutation is present in the sequencing results, the specific mutation is identified. In some embodiments, the sequence containing the specific mutation may be the original top (e.g., sense, '+') strand. In some embodiments, the sequence containing the specific mutation may be the original bottom (e.g., antisense, '-') strand. In some embodiments, the specific mutation is identified if it occurs or is contained in sequences associated with the top strand or the bottom strand of the original DNA duplex. In some embodiments, the specific mutation is identified if it occurs or is contained in sequences associated with both the top strand and the bottom strand of the original DNA duplex. When a specific mutation appears in both strands, the skilled person will understand that the mutation is present with respect to base pairing; the sequences of the two strands will therefore differ (because they are complementary) but will contain the same specific mutation. The top and bottom strands can be assessed to determine which sequences pair with each other by using the UMIs, which are attached to each strand and are unique to the duplex. After the pairs have been separated, the sequences can be aligned using standard tools for nucleic acid alignment (e.g., BWA, BLAST, HPC-BLAST, CS-BLAST, CUDASW++, DIAMOND, FASTA, etc.). Such methods are well known in the field, and software for performing such alignments is readily available free of charge.

[0112] In some embodiments, double-strand consensus (DSC) and single-strand consensus (SSC) sequences are used to form a ratio. Methods for determining consensus sequences are well known in the art and, in the context of nucleic acids, are generally understood to mean determining an accepted sequence from the most frequent nucleotide found at each position when numerous aligned sequences are compared. When establishing a DSC / SSC ratio, a consensus sequence is prepared for each sequence targeted by a given probe. Ideally, there will be one consensus sequence for each set of single-stranded amplified DNA captured by a given probe and, furthermore, one consensus sequence for the complementary strand of the single-stranded amplified DNA captured by that probe. As mentioned elsewhere in this disclosure, the single-stranded amplified DNA contains a UMI, which allows the strand to be traced to its DNA duplex, so that the two strands can be analyzed as a single duplex. By utilizing this property, a consensus sequence for the duplex (e.g., a double-strand consensus sequence (DSC)) can be established. Ideally, there should be only one DSC for each set of SSCs captured by the probe targeting a given specific mutation. Therefore, the optimal DSC to SSC ratio is 0.5 (i.e., 1 DSC to 2 SSCs). However, variations can occur in single-stranded amplified DNA because of imperfect capture, other point mutations, sequencing errors, or sequence errors introduced during PCR. Achieving a DSC to SSC ratio of 0.5 is therefore difficult, if not impossible. Setting a threshold on the DSC to SSC ratio nevertheless creates a filter that eliminates false detections caused by a lack of accuracy and / or the presence of too many variant sequences.
In some embodiments, the DSC to SSC ratio of any method of this disclosure is at least 0.1 (e.g., 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0 or more). In some embodiments, the DSC to SSC ratio of any method of this disclosure is greater than or equal to 0.15. In some embodiments, the DSC to SSC ratio of any method of this disclosure is greater than or equal to 0.2. In some embodiments, the DSC to SSC ratio of any method of this disclosure is greater than or equal to 0.3.
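
As a rough illustration of the ratio and threshold discussed above, the sketch below builds a majority-vote consensus and counts DSCs and SSCs per duplex family. The data layout (a mapping from a duplex identifier to its top-strand and bottom-strand reads) and the simplification that a DSC is counted whenever both strands were observed are assumptions made for this example only.

```python
from collections import Counter


def consensus(seqs):
    """Majority-vote consensus over equal-length, already aligned sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def dsc_ssc_ratio(families):
    """Ratio of duplex consensuses to single-strand consensuses.

    families maps a duplex id to {'+': [reads], '-': [reads]}.
    One SSC is counted per strand with at least one read; one DSC is counted
    per duplex for which both strands were observed (simplifying assumption).
    """
    ssc = sum(1 for f in families.values() for s in ("+", "-") if f[s])
    dsc = sum(1 for f in families.values() if f["+"] and f["-"])
    return dsc / ssc if ssc else 0.0


def passes_filter(families, threshold=0.15):
    """Apply a minimum DSC to SSC ratio, e.g. 0.15 as in one embodiment."""
    return dsc_ssc_ratio(families) >= threshold


families = {
    "duplex_1": {"+": ["ACGT", "ACGA", "ACGT"], "-": ["ACGT"]},
    "duplex_2": {"+": ["ACGT"], "-": []},  # only one strand recovered
}
ratio = dsc_ssc_ratio(families)      # 1 DSC over 3 SSCs
top_consensus = consensus(families["duplex_1"]["+"])
```

With every duplex recovered on both strands the ratio reaches the ideal value of 0.5; each strand recovered without its partner pulls the ratio down.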

[0113] In some embodiments, the methods of this disclosure relate to methods for detecting specific mutations, wherein the specific mutation is a single nucleotide polymorphism. In some embodiments, the methods of this disclosure relate to methods for detecting specific mutations, wherein the specific mutation is a structural variant.

[0114] It has been observed that certain bases and / or base pairs are more prone to error than others (e.g., low noise, high noise). By investigating the presence of low-noise mutations (e.g., those less prone to error), the likelihood that an observed specific mutation is genuine increases. Therefore, when specific mutations are identified using the methods of this disclosure, accuracy increases for specific mutations located at adenine (A) and / or thymine (T) sites in a reference sequence. As used herein, a site in a reference sequence refers to the location of a base pair in a consensus sequence of a given genome (or a fragment thereof). In some embodiments, the method involves tracking low-noise mutations. In other embodiments, the method involves tracking high-noise mutations. In some embodiments, low-noise mutations comprise mutations at reference sites containing A / T base pairs. In some embodiments, high-noise mutations comprise mutations at reference sites containing cytosine.
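
A simple way to apply this distinction when choosing which mutations to track is sketched below. The tuple layout and function names are hypothetical, and treating a reference G as a cytosine-containing (C / G) site is an interpretive assumption of this example.

```python
LOW_NOISE_REF_BASES = {"A", "T"}   # reference sites containing A / T base pairs
HIGH_NOISE_REF_BASES = {"C", "G"}  # assumption: C / G pairs contain cytosine


def noise_class(ref_base):
    """Classify a mutation by the reference base at its site."""
    base = ref_base.upper()
    if base in LOW_NOISE_REF_BASES:
        return "low"
    if base in HIGH_NOISE_REF_BASES:
        return "high"
    raise ValueError(f"unexpected reference base: {ref_base!r}")


def select_low_noise(mutations):
    """Keep mutations whose reference base is A or T.

    mutations is an iterable of (chrom, pos, ref, alt) tuples.
    """
    return [m for m in mutations if noise_class(m[2]) == "low"]


candidates = [
    ("chr1", 1000, "A", "G"),
    ("chr2", 2000, "C", "T"),
    ("chr3", 3000, "T", "C"),
]
tracked = select_low_noise(candidates)
```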

[0115] The methods of this disclosure may also include additional steps. For example (but not limited to), the methods may include a step of introducing a control (e.g., a positive control, a control used to evaluate and / or measure the efficiency of the method and / or probe). In some embodiments, the methods of this disclosure include a control. In some embodiments, the control is a positive control. As used herein, a positive control refers to a set of conditions known to produce a specific result in the method. One example is a synthetic mutant sequence (e.g., a synthetic polynucleotide) containing a probe target sequence (e.g., a sequence that contains a specific mutation and anneals to the probe). In some embodiments, the positive control comprises a polynucleotide containing a specific mutation in a sequence that anneals to a specific probe. In some embodiments, the internal control polynucleotide further comprises an index sequence. In some embodiments, the index sequence is variable. In some embodiments, the internal control polynucleotide is further flanked at the 5' end by a universal forward binding primer and at the 3' end by a universal reverse binding primer. In some embodiments, the internal control polynucleotide is further flanked at the 5' and 3' ends by sequencing adapters. In some embodiments, the internal control polynucleotide is further flanked at the 5' end by a universal forward binding primer and at the 3' end by a universal reverse binding primer, and these binding primers are in turn flanked at the distal ends (e.g., the 5' and 3' ends of the construct) by sequencing adapters. By using such polynucleotides, together with an index and appropriate binding primers and sequencing adapters (collectively referred to as synthetic mutants), controls can be established by including synthetic mutants in DNA duplexes and / or enrichment samples prior to probe capture. If a probe fails to capture the synthetic mutant it targets, this may indicate a problem with the method and / or conditions. If a synthetic mutant is captured but no single-stranded amplified DNA is captured, the positive control validates the method and confirms the absence of such single-stranded amplified DNA. Indexing the synthetic mutants allows multiple synthetic mutants to be tracked against multiple probes (e.g., against multiple target sequences containing specific mutations). In some embodiments, different synthetic mutants are used for each different probe and / or different specific mutation.

[0116] In some embodiments, internal controls comprise a fixed number, but more than one, synthetic mutant for a single probe (e.g., a single specific mutation), wherein each synthetic mutant contains a unique index. By using more than one, but a known number, of synthetic mutants for a given specific mutation (e.g., a target sequence), each with a unique index, the method can evaluate (e.g., assess, quantify) the capture efficiency of the probe. For example, the number of unique synthetic mutants captured can be assessed based on the number of specific mutations (e.g., real mutants) captured by the probe. This property can be used for each specific mutation of the method (e.g., for multiple, more than one). In some embodiments, a set of internal controls is used for each different probe, wherein each set of synthetic mutants is targeted by a probe for the specific mutation, comprises a known fixed number, and contains a unique index.
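
One way to turn the indexed synthetic mutants into a capture-efficiency estimate is sketched below. It assumes, for illustration only, that efficiency is taken as the fraction of spiked-in unique indices that reappear in the sequencing results; the function name and index strings are hypothetical.

```python
def capture_efficiency(spiked_indices, captured_indices):
    """Fraction of spiked-in synthetic mutants (by unique index) that were recovered.

    spiked_indices: indices of the known, fixed set of synthetic mutants
    added for one probe. captured_indices: indices observed after capture
    and sequencing (may contain duplicates or unrelated indices).
    """
    spiked = set(spiked_indices)
    if not spiked:
        raise ValueError("at least one synthetic mutant must be spiked in")
    recovered = spiked & set(captured_indices)
    return len(recovered) / len(spiked)


spiked = ["IDX01", "IDX02", "IDX03", "IDX04"]
observed = ["IDX01", "IDX01", "IDX03", "IDX04", "IDX99"]  # IDX99 was never spiked in
efficiency = capture_efficiency(spiked, observed)
```

Because the number of spiked-in indices is fixed and known, the same calculation can be repeated per probe to compare probes with one another.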

[0117] In some embodiments, the term "internal" is used to describe the property of these controls being placed in DNA duplexes and / or enriched samples and sequenced together with single-stranded amplified DNA (e.g., internal controls). The term "internal control" should be understood to include all of the control types and variants described above.

[0118] In some embodiments, compared to conventional duplex sequencing methods, the method of this disclosure can identify specific mutations or select duplexes with at least 10-fold fewer sequencing reads (e.g., 10^1, 10^2, 10^3, 10^4, 10^5, 10^6). In some embodiments, compared to conventional duplex sequencing methods, the method of this disclosure can identify specific mutations or select duplexes with at least 50-fold fewer sequencing reads. In some embodiments, compared to conventional duplex sequencing methods, the method of this disclosure can identify specific mutations or select duplexes with at least 100-fold fewer sequencing reads. In some embodiments, compared to conventional duplex sequencing methods, the method of this disclosure can identify specific mutations or select duplexes with at least 500-fold fewer sequencing reads. In some embodiments, compared to conventional duplex sequencing methods, the method of this disclosure can identify specific mutations or select duplexes with at least 1,000-fold fewer sequencing reads. In some embodiments, compared to conventional duplex sequencing methods, the method of this disclosure can identify specific mutations or select duplexes with at least 10,000-fold fewer sequencing reads. In some embodiments, the methods disclosed herein can identify specific mutations or select duplexes with at least 100,000 times fewer sequencing reads compared to conventional duplex sequencing methods.

[0119] MAESTRO probe

[0120] The probes associated with this disclosure can help identify specific mutations (and / or low-abundance mutations) in DNA duplexes and / or enriched samples (such as samples from subjects).

[0121] In some embodiments, the probe length of any method of this disclosure is 10 to 60 nucleotides (e.g., 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, or 60 nucleotides). In some embodiments, the probe length of any method of this disclosure is about 15 to about 50 nucleotides. In some embodiments, the probe length of any method of this disclosure is about 20 to about 40 nucleotides. In some embodiments, the probe length of any method of this disclosure is about 12 to about 32 nucleotides. In some embodiments, the probe length of any method of this disclosure is about 28 to about 32 nucleotides. In some embodiments, the probe length of any method of this disclosure is 30 nucleotides.

[0122] The probes disclosed herein can be of any conformation known in the art. For example (but not limited to), the probe may comprise nucleotides of deoxyribose (e.g., DNA) and / or ribose (e.g., RNA). In some embodiments, the probe comprises DNA. In some embodiments, at least one nucleotide of the probe comprises a modification (e.g., a change or alteration to at least one component of the nucleotide (e.g., a nucleobase, sugar, or phosphate group)). In some embodiments, the probe does not contain modified nucleotides.

[0123] In some embodiments, the probe includes an additional moiety. This moiety may be a marker or tag. As used herein, a “marker” or “tag” refers to a molecule (e.g., nucleic acid, protein, etc.) that can be used in vitro and / or in vivo to identify the probe. A marker or tag can be any composition or molecule (e.g., nucleic acid, amino acid, peptide (e.g., glycosylated protein, oxine), fluorescent protein (e.g., green and / or red fluorescent protein), structure (e.g., tetracysteine ring, epitope)), any of which can be detected in vivo, in vitro, ex vivo, visually, or by utilizing the properties of the tag (e.g., fluorescence, magnetism, radioactivity, size, affinity, enzyme activity, etc.). The moiety may also be used to recover or isolate the probe and, by extension, any molecule bound to it. In some embodiments, the moiety is a recovery moiety, that is, a moiety having a property that can be exploited to separate the probe on the basis of that property. For example (but not limited to), the moiety may have magnetic, chemical, physical, or affinity properties that can be used to separate the probe from foreign substances that do not have this property. Examples of such moieties are well known in the art, and any suitable such moiety may be used. For example (but not limited to), the recovery moiety may contain biotin. In some embodiments, the additional moiety is attached to the probe via a 5' nucleotide. In some embodiments, the recovery moiety is attached to the probe via a 5' nucleotide. In some embodiments, the attachment is via a covalent bond.

[0124] In some embodiments, the probe comprises a nucleic acid sequence specific to a target sequence (e.g., a binding target). In some embodiments, the target sequence represents a specific mutation (e.g., a nucleotide sequence equivalent to a reference sequence except that it contains the mutation). In other words, the probe is designed to target a complementary sequence, wherein the complementary sequence contains a specific mutation compared to the reference sequence. In some embodiments, the specific mutation is associated with or related to a disease. Therefore, if the probe binds to the target sequence (e.g., containing the specific mutation), it indicates the presence of nucleic acid data associated with a disease.

[0125] In some embodiments, the portion of the probe sequence that binds to a specific mutation, target sequence, or SNP is located within the middle 50% of the nucleotides constituting the probe; in other words, that portion consists of nucleotides located neither in the first quarter of the probe (e.g., the 5' end quarter) nor in the last quarter of the probe (e.g., the 3' end quarter). In some embodiments, the portion of the probe sequence that binds to a specific mutation, target sequence, or SNP is located within the middle third of the nucleotides constituting the probe; in other words, that portion consists of nucleotides located neither in the first third of the probe (e.g., the 5' end third) nor in the last third of the probe (e.g., the 3' end third).

[0126] In some embodiments, the probe nucleotide that binds to a specific mutation or SNP is located within the middle 50% of the nucleotides constituting the probe; in other words, that nucleotide is located neither within the first quarter of the probe (e.g., the 5' end quarter) nor within the last quarter of the probe (e.g., the 3' end quarter). In some embodiments, the probe nucleotide that binds to a specific mutation or SNP is located within the middle third of the nucleotides constituting the probe; in other words, that nucleotide is located neither within the first third of the probe (e.g., the 5' end third) nor within the last third of the probe (e.g., the 3' end third). In some embodiments, the probe nucleotide that binds to a specific mutation or SNP is located within the middle 6% of the nucleotides constituting the probe; in other words, that nucleotide is located neither within the first 47% of the probe nor within the last 47% of the probe.
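
The positional constraints above can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function name, the 0-based indexing from the 5' end, and the use of exact fractions to avoid rounding at the boundaries are choices made for this example.

```python
from fractions import Fraction


def in_central_fraction(probe_len, mut_index, fraction):
    """Return True if the 0-based position mut_index (counted from the 5' end)
    lies within the central `fraction` of a probe of probe_len nucleotides."""
    fraction = Fraction(fraction)
    flank = probe_len * (1 - fraction) / 2  # nucleotides excluded at each end
    return flank <= mut_index < probe_len - flank


# For a 30-nucleotide probe:
middle_half = [i for i in range(30) if in_central_fraction(30, i, Fraction(1, 2))]
middle_third = [i for i in range(30) if in_central_fraction(30, i, Fraction(1, 3))]
middle_6pct = [i for i in range(30) if in_central_fraction(30, i, Fraction(6, 100))]
```

For a 30-nucleotide probe the middle 6% reduces to a single central position, which corresponds to placing the mutation-binding nucleotide as close to the center of the probe as possible.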

[0127] In some embodiments, allele-specific probes are evaluated and modified to increase / decrease the Gibbs free energy (ΔG) of annealing the allele-specific probe with its complementary sequence. By controlling and / or modifying this property of the probe, the specificity and ability of the probe to more precisely distinguish sequences and amplify single-stranded DNA can be modulated (e.g., increased, decreased). Furthermore, by controlling this property, the stability of the probe binding can also be modulated (e.g., increased, decreased). In some embodiments, the Gibbs free energy (ΔG) of annealing the allele-specific probe with its complementary sequence is at least -25 kcal / mol at Temp = 50 °C, but not exceeding -5 kcal / mol at Temp = 50 °C (e.g., -25, -24, -23, -22, -21, -20, -19, -18, -17, -16, -15, -14, -13, -12, -11, -10, -9, -8, -7, -6, -5, or increments thereof). In some embodiments, the Gibbs free energy (ΔG) of annealing the allele-specific probe to its complementary sequence is at least -23 kcal / mol and not more than -7 kcal / mol at Temp = 50 °C. In some embodiments, the Gibbs free energy (ΔG) of annealing the allele-specific probe to its complementary sequence is at least -21 kcal / mol and not more than -9 kcal / mol at Temp = 50 °C. In some embodiments, the Gibbs free energy (ΔG) of annealing the allele-specific probe to its complementary sequence is at least -20 kcal / mol and not more than -12 kcal / mol at Temp = 50 °C. In some embodiments, the Gibbs free energy (ΔG) of annealing the allele-specific probe to its complementary sequence is at least -19 kcal / mol and not more than -13 kcal / mol at Temp = 50 °C. In some embodiments, the Gibbs free energy (ΔG) of the allele-specific probe annealing to its complementary sequence is at least -18 kcal / mol but not more than -14 kcal / mol at Temp = 50 °C. 
In some embodiments, the Gibbs free energy (ΔG) of the allele-specific probe annealing to its complementary sequence is at least -17 kcal / mol but not more than -15 kcal / mol at Temp = 50 °C. In some embodiments, the Gibbs free energy (ΔG) is modified by adjusting the length of the sequence in which the probe binds to the target sequence (e.g., by incorporating a specific mutation). In some embodiments, the length is increased. In some embodiments, the length is decreased. In some embodiments, the length is repeatedly adjusted until the Gibbs free energy (ΔG) is within a preferred range. In some embodiments, the length is repeatedly adjusted until the Gibbs free energy (ΔG) is within the range described herein.
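
The repeated length adjustment described above can be sketched as a simple loop. Everything in this example is hypothetical: in particular, `toy_delta_g` is a placeholder that assigns a fixed energy per base and is not a real thermodynamic model, so a validated nearest-neighbor calculation at the relevant temperature would have to be substituted in practice.

```python
def toy_delta_g(seq):
    """Placeholder energy model in kcal/mol (NOT a real thermodynamic model):
    G/C bases contribute -0.7 and A/T bases -0.4."""
    return sum(-0.7 if base in "GC" else -0.4 for base in seq)


def tune_probe_length(target, mut_index, dg_min=-18.0, dg_max=-14.0,
                      start_len=30, min_len=10, max_len=60,
                      delta_g=toy_delta_g):
    """Adjust the probe-binding window around mut_index until ΔG is in range.

    Returns (left, right, dg) for the window target[left:right], or None if
    no length between min_len and max_len satisfies the range.
    """
    length = start_len
    for _ in range(max_len - min_len + 1):  # bounded number of adjustments
        if not min_len <= length <= max_len:
            break
        left = max(0, mut_index - length // 2)
        right = min(len(target), left + length)
        dg = delta_g(target[left:right])
        if dg < dg_min:
            length -= 1   # binding too strong: shorten the window
        elif dg > dg_max:
            length += 1   # binding too weak: lengthen the window
        else:
            return left, right, dg
    return None


gc_rich = tune_probe_length("G" * 80, 40)    # shortened from 30 nucleotides
at_rich = tune_probe_length("A" * 80, 40)    # lengthened from 30 nucleotides
mixed = tune_probe_length("ACGT" * 20, 40)   # already within range
```

The window is kept roughly centered on the mutated position while its length changes by one nucleotide per step, which mirrors the idea of adjusting the length repeatedly until ΔG falls within the chosen range.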

[0128] Further evaluation and design considerations for probes constructed according to this disclosure include assessing the probe's potential ability to bind to other parts of the nucleic acid (e.g., other regions, parts, or fragments of the genome). Therefore, once a probe sequence is developed, it can be evaluated to see if it is homologous to any other region of the subject's genome from which DNA duplexes and / or enriched samples have been extracted. Many well-known methods, tools, and software programs are publicly and freely available for performing such searches (e.g., BLAST, etc.). In some embodiments, the target sequence of the allele-specific probe is homologous to fewer than 20 sequences of the subject's reference genome. In some embodiments, the target sequence of the allele-specific probe is homologous to fewer than 15 sequences of the subject's reference genome. In some embodiments, the target sequence of the allele-specific probe is homologous to fewer than 10 sequences of the subject's reference genome. In some embodiments, the target sequence of the allele-specific probe is homologous to fewer than 5 sequences of the subject's reference genome. In some embodiments, the target sequence of the allele-specific probe is 100% homologous to fewer than 20 sequences of the subject's reference genome. In some embodiments, the target sequence of the allele-specific probe is 100% homologous to fewer than 15 sequences of the subject's reference genome. In some embodiments, the target sequence of the allele-specific probe is 100% homologous to fewer than 10 sequences in the subject's reference genome. In some embodiments, the target sequence of the allele-specific probe is 100% homologous to fewer than 5 sequences in the subject's reference genome. If the number of sites homologous to the probe's target sequence (e.g., the sequence it binds to contains a specific mutation) is too large, the probe may be modified (e.g., altered). 
For example (but not limited to), the targeted sequence may be shifted in one direction or the other relative to the position of the mutated nucleotide. This modification may also include changing the length of the probe (while maintaining the Gibbs free energy within an appropriate range), or the length of the probe may remain unchanged during the shift. In some embodiments, the sequence targeted by the allele-specific probe is shifted by 5 nucleotides or less in the 5' direction (e.g., 1, 2, 3, 4, or 5). In some embodiments, the sequence targeted by the allele-specific probe is shifted by 10 nucleotides or less in the 5' direction (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10). In some embodiments, the sequence targeted by the allele-specific probe is shifted by 5 nucleotides or less in the 3' direction (e.g., 1, 2, 3, 4, or 5). In some embodiments, the sequence targeted by the allele-specific probe is shifted by 10 nucleotides or less in the 3' direction (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10).

[0129] In some embodiments, the probes according to one or more methods of this disclosure are designed and / or selected based at least in part on their annealing temperature. For example, but not limited to, in some embodiments, allele-specific probes have an annealing temperature of at least 44 degrees Celsius (°C) but not exceeding 56°C. In some embodiments, allele-specific probes have an annealing temperature of at least 45°C but not exceeding 55°C. In some embodiments, allele-specific probes have an annealing temperature of at least 47°C but not exceeding 54°C. In some embodiments, allele-specific probes have an annealing temperature of at least 48°C but not exceeding 52°C. In some embodiments, allele-specific probes have an annealing temperature of at least 49°C but not exceeding 51°C. In some embodiments, allele-specific probes have an annealing temperature of at least 50°C. In other embodiments, the allele-specific probe has an annealing temperature of at least 40°C, at least 41°C, at least 42°C, at least 43°C, at least 44°C, at least 45°C, at least 46°C, at least 47°C, at least 48°C, at least 49°C, at least 50°C, at least 51°C, at least 52°C, at least 53°C, at least 54°C, at least 55°C, at least 56°C, at least 57°C, at least 58°C, at least 59°C, at least 60°C, at least 61°C, at least 62°C, at least 63°C, at least 64°C, at least 65°C, at least 66°C, at least 67°C, at least 68°C, at least 69°C, at least 70°C, at least 71°C, at least 72°C, at least 73°C, or at least 74°C, but not exceeding 75°C; or an annealing temperature not exceeding 50°C, not exceeding 51°C, not exceeding 52°C, not exceeding 53°C, not exceeding 54°C, not exceeding 55°C, not exceeding 56°C, not exceeding 57°C, not exceeding 58°C, not exceeding 59°C, not exceeding 60°C, not exceeding 61°C, not exceeding 62°C, not exceeding 63°C, not exceeding 64°C, not exceeding 65°C, not exceeding 66°C, not exceeding 67°C, not exceeding 68°C, not exceeding 69°C, or not exceeding 70°C.

[0130] In some embodiments, the recovery portion is attached to the 5' end of the allele-specific probe. In some embodiments, a minor groove binder (MGB) is attached to the 3' end of the allele-specific probe. In some embodiments, the recovery portion is biotin. However, it should be noted that any suitable tag or portion can be used that provides the means or property by which the probe (and any single-stranded amplified DNA bound thereto) can be separated and / or recovered. Suitable such tags and / or portions are well known in the art and are readily identifiable by those skilled in the art. In some embodiments, the allele-specific probe contains biotin. In some embodiments, biotin is recovered (e.g., captured) by utilizing the ability of biotin to preferentially bind to avidin. In some embodiments, biotin is recovered (e.g., captured) by utilizing the ability of biotin to preferentially bind to streptavidin. In some embodiments, biotin is recovered (e.g., captured) by utilizing the ability of biotin to preferentially bind to neutravidin.

[0131] In some embodiments, this disclosure relates to an allele-specific probe that further comprises a minor groove binder (MGB). An MGB is a molecule, typically a crescent-shaped molecule, that selectively binds to the minor groove of a nucleic acid. MGBs typically bind to a specific sequence and may non-covalently bind via directional hydrogen bonding with base pair edges. An MGB ODN (+MGB) exhibits a larger free energy difference (ΔΔG) in the MGB region than an ODN without MGB (-MGB). In some embodiments, the probe may be modified by any known means to increase the ΔΔG between matched and mismatched pairs, e.g., locked nucleic acids; peptide nucleic acids; Super G, C, T, A (e.g., commercially available or tender); XNA nucleotides; etc.

[0132] Furthermore, MGB remains effective in distinguishing and binding target sequences at increasingly smaller dilutions (e.g., 1 copy). Finally, MGB has been shown to increase the melting temperature (Tm) for ODN binding across various configurations (mismatch ±, MGB ±), with the ODN that has no mismatch and carries MGB showing an elevated Tm. Therefore, adding MGB to the probes of this disclosure will improve affinity and specificity, further enhancing the resolution and sensitivity of the methods described herein. In some embodiments, the allele-specific probes contain MGB.

[0133] In some embodiments, this disclosure includes a method for manufacturing allele-specific probes, the method comprising: for each target sequence (e.g., a sequence containing a specific mutation), creating a 30-nucleotide probe centered on the altered base (e.g., a nucleotide targeting the specific mutation, such as a nucleotide complementary to the specific mutation). Depending on the base change, the probe may be designed to target either the positive or negative strand. The length is adjusted until the estimated ΔG of the probe sequence is within an acceptable range (producing probe candidates between 20 and 40 nucleotides in length). Using the same strategy, the center of the probe is shifted by up to 5 bp in either direction to create multiple candidates for each target. A BLAST search is performed and the candidate with the highest specificity to the target is selected. If its probe characteristics (ΔG, length, %GC, melting temperature, BLAST hit count) do not meet pre-specified requirements, the given target can be removed from the design.
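The candidate-generation loop described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the names `Candidate`, `fit_window`, `design_probe`, `toy_delta_g`, and `count_hits` are hypothetical, the ΔG estimator is a toy per-base sum rather than a thermodynamic model, and the homology count stands in for a real BLAST search. The ΔG window of -17 to -15 kcal / mol and the limit of fewer than 10 homologous sites are taken from embodiments described in this disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Candidate:
    start: int       # 0-based start of the probe window in the target sequence
    sequence: str    # sequence targeted by the probe
    delta_g: float   # estimated ΔG (kcal/mol)
    hits: int        # homologous genome sites (e.g. a BLAST hit count)


def fit_window(seq: str, center: int, delta_g: Callable[[str], float],
               dg_range: Tuple[float, float], start_len: int = 30,
               min_len: int = 20, max_len: int = 40) -> Optional[Tuple[int, str]]:
    """Try lengths nearest to start_len first; return the first window in the ΔG range."""
    lo, hi = dg_range
    for length in sorted(range(min_len, max_len + 1), key=lambda n: abs(n - start_len)):
        start = center - length // 2
        end = start + length
        if start < 0 or end > len(seq):
            continue  # window would run off the available sequence
        window = seq[start:end]
        if lo <= delta_g(window) <= hi:
            return start, window
    return None


def design_probe(seq: str, mut_index: int, delta_g: Callable[[str], float],
                 count_hits: Callable[[str], int],
                 dg_range: Tuple[float, float] = (-17.0, -15.0),
                 max_shift: int = 5, max_hits: int = 10) -> Optional[Candidate]:
    """Shift the probe center by up to max_shift bases and keep the most specific candidate."""
    candidates = []
    for shift in range(-max_shift, max_shift + 1):
        fitted = fit_window(seq, mut_index + shift, delta_g, dg_range)
        if fitted is None:
            continue
        start, window = fitted
        candidates.append(Candidate(start, window, delta_g(window), count_hits(window)))
    if not candidates:
        return None  # target is removed from the design
    best = min(candidates, key=lambda c: c.hits)
    return best if best.hits < max_hits else None


def toy_delta_g(s: str) -> float:
    """Toy estimator (assumption): a fixed contribution per G/C and per A/T base."""
    gc = sum(base in "GC" for base in s)
    return -(0.7 * gc + 0.4 * (len(s) - gc))


target = "ACGT" * 25                      # made-up 100-nt sequence
probe = design_probe(target, 50, toy_delta_g, count_hits=lambda s: 1)
```

Scanning lengths in order of distance from 30 nucleotides guarantees termination, whereas a naive lengthen-or-shorten loop could oscillate between two adjacent lengths that straddle the ΔG range.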

[0134] In some aspects, this disclosure includes a method for manufacturing an allele-specific probe, the method comprising: (a) identifying a specific mutation in a nucleic acid sequence of a genome; (b) generating a complementary nucleic acid (CNA) comprising the complementary base of the specific mutation; and (c) attaching a recovery portion to the 5' nucleotide of the allele-specific probe; wherein the complementary base is located in the middle 50% of the nucleotides of the CNA; wherein the CNA comprises at least 12 but not more than 60 nucleotides; wherein the Gibbs free energy of the CNA and the nucleic acid comprising the specific mutation is at least -20 kcal / mol but not more than -12 kcal / mol; wherein the annealing temperature of the allele-specific probe is at least 48 degrees Celsius (°C) but not more than 52°C; and wherein the CNA is 100% homologous to fewer than 10 sequences in the genome.

[0135] Cancer treatment

[0136] This disclosure relates to the treatment of cancer patients. In some embodiments, dynamic minimal residual disease (MRD) detection is used to monitor the efficacy of treatment in patients with cancer. Many types of cancer can evade treatment by undergoing molecular changes, such as mutations or pathway alterations. Dynamic MRD detection can be used to detect with high sensitivity and specificity whether cancer cells in a patient's body are responding to cancer therapy. In some embodiments, dynamic MRD detection is used to determine whether a patient suspected of having cancer actually has cancer. Dynamic MRD detection can detect small amounts of cancer cells early on, which would otherwise not be detected until a tumor has formed. In some embodiments, dynamic MRD detection is used to determine whether a patient at risk of developing cancer actually has cancer. Individuals who may have genetic or other cancer risk factors may develop low levels of cancer cells before a tumor develops. Dynamic MRD detection can detect small amounts of cancer cells in a patient's body before other diagnostic procedures. In some embodiments, dynamic MRD detection is used to monitor tumor recurrence in patients who have previously had cancer. Many cancer therapies do not completely eliminate cancer from a subject's body, and a small number of cancer cells may remain after the tumor is removed from the patient. This small number of cancer cells can rapidly divide and form new tumors after the patient's treatment program ends. Minimal residual disease (MRD) detection provides monitoring for tumor recurrence in patients. MRD detection can be used in many aspects of cancer patient care. In some embodiments, MRD results can inform a physician whether to initiate, intensify, downgrade, modify, and / or discontinue cancer treatment. MRD detection can be used to initiate cancer treatment in a patient. 
In some embodiments, a patient who has not received cancer treatment but whose sample is determined to be MRD positive may subsequently receive cancer treatment, providing early diagnosis. In some embodiments, a patient receiving cancer treatment whose sample is determined to have an elevated MRD value may subsequently receive intensified or modified treatment. In some embodiments, a patient receiving cancer treatment whose sample is determined to be MRD negative may subsequently receive downgraded or discontinued treatment. In some embodiments, a patient receiving cancer treatment by administering one or more chemotherapy drugs whose sample is determined to have an elevated MRD value may subsequently receive treatment with one or more different chemotherapy drugs. In some embodiments, one or more chemotherapy drugs comprise dabrafenib, trametinib, ipilimumab, and / or nivolumab.

[0137] In some embodiments, a patient is treated according to an initial treatment plan, and then, if a sample from the patient is found to be MRD positive, the initial treatment plan is modified to create an updated treatment plan. In some embodiments: the initial treatment plan includes administering a first drug to the patient at a first dose; modifying the initial treatment plan includes changing (increasing or decreasing) the first dose to a second dose; and the updated treatment plan includes administering the first drug to the patient at the second dose. In some embodiments: the initial treatment plan includes administering the first drug to the patient; modifying the initial treatment plan includes changing the first drug to a second drug or providing the first drug in combination with the second drug; and the updated treatment plan includes administering the first drug, the second drug, or a combination thereof to the patient.

[0138] In some embodiments, the patient is treated according to an initial treatment plan, and if a sample from the patient is found to be MRD negative, the initial treatment plan is modified to create an updated treatment plan. In some embodiments, the initial treatment plan includes administering a first drug to the patient at a first dose; modifying the initial treatment plan includes increasing or decreasing the first dose to a second dose; and the updated treatment plan includes administering the first drug to the patient at a second dose. In some embodiments, the initial treatment plan includes administering the first drug to the patient; modifying the initial treatment plan includes removing the first drug from the initial treatment plan; and the updated treatment plan includes treating the patient without administering the first drug or discontinuing treatment of the patient.

[0139] Regardless of the type of cancer a patient has, dynamic minimal residual disease (MRD) testing can be used to determine whether a patient's sample is MRD-positive or MRD-negative. In some embodiments, the cancer is a blood cancer or lymphoma, bone or soft tissue cancer, brain or central nervous system cancer, breast cancer, childhood cancer, digestive system cancer, eye cancer, head and neck cancer, lung cancer, pelvic cancer, skin cancer, or urinary tract cancer. In some embodiments, the cancer is glioblastoma.

[0140] Subjects

[0141] As used herein, the term "subject" refers to any organism that requires treatment or diagnosis using the subject matter of this document. For example (but not limited to), subjects may include mammals and non-mammals. In some embodiments, the subject is a mammal. In some embodiments, the subject is a non-mammal. As used herein, "mammal" means any animal constituting the class Mammalia (e.g., human, mouse, rat, cat, dog, sheep, rabbit, horse, cow, goat, pig, guinea pig, hamster, chicken, turkey, or non-human primates (e.g., marmoset, macaque)). In some embodiments, the mammal is a human. In some embodiments, the subject is under the care and / or guidance of a medical professional (e.g., a patient). In some embodiments, the individual is a patient. In some embodiments, the subject has, is at risk of having, has had, or is suspected of having cancer. In some embodiments, the subject is a subject with a tumor, a subject who has had a tumor in the past, a subject at risk of having a tumor, or a subject suspected of having a tumor. In some embodiments, the tumor is cancerous.

[0142] Reagent test kit

[0143] In one aspect, this disclosure relates to kits for performing one or more methods of this disclosure (e.g., identifying specific mutations and / or low-abundance mutations in DNA duplex samples and / or enriched samples) and / or determining whether a patient sample is MRD positive.

[0144] In some embodiments, the kit comprises materials and / or reagents for performing one or more methods of this disclosure. For example (but not limited to), the kit may comprise components and / or reagents for performing the entire method and / or any part thereof. In some embodiments, the kit provides materials and apparatus for obtaining and / or procuring DNA duplex samples. In some embodiments, the kit comprises apparatus and / or housing (e.g., container) for containing any liquid phase or material of one or more methods of this disclosure.

[0145] In some embodiments, the kit contains any probe as described herein for use with one or more methods of this disclosure.

[0146] In some embodiments, the kit includes materials and / or reagents for performing a method for manufacturing allele-specific probes according to the present disclosure. In some embodiments, the kit includes probes generated by the methods of the present disclosure.

[0147] In some embodiments, the kit includes materials, devices, and / or reagents for performing liquid biopsies to detect one or more mutations.

[0148] Instructions for performing one or more methods of this disclosure may also be included in the kit described herein.

[0149] The kit may contain packaging or containers containing the components described herein.

[0150] In view of the desired applications and uses of one or more methods of this disclosure, other suitable components to be included in such kits will be apparent to those skilled in the art.

[0151] Having described several aspects and embodiments of the technology presented in this disclosure, it should be understood that various changes, modifications, and improvements will readily occur to those skilled in the art. Such changes, modifications, and improvements are intended to fall within the spirit and scope of the technology described herein. For example, those skilled in the art will readily conceive of various other means and / or structures for performing functions and / or obtaining results and / or one or more advantages described herein, and each such variation and / or modification is considered to be within the scope of the embodiments described herein. Those skilled in the art will recognize or be able to determine many equivalents of the specific embodiments of the invention described herein using only conventional experiments. Therefore, it should be understood that the above embodiments are presented as examples only, and that embodiments of the invention may be practiced in ways other than those specifically described within the scope of the appended claims and their equivalents. Furthermore, any combination of two or more features, systems, articles, materials, kits, and / or methods described herein, provided that such features, systems, articles, materials, kits, and / or methods do not contradict each other, is included within the scope of this disclosure.

[0152] Furthermore, as described, some aspects can be implemented as one or more methods. Actions performed as part of a method can be ordered in any suitable manner. Thus, embodiments can be constructed in which actions are performed in a different order than those shown, which may include performing several actions simultaneously, even if the actions are shown as sequential actions in the illustrative embodiments. Unless otherwise defined herein, scientific and technical terms used in connection with this disclosure should have the meanings commonly understood by one of ordinary skill in the art (e.g., a technician). The meaning and scope of terms should be clear; however, in the event of any potential ambiguity, the definitions provided herein take precedence over any dictionary or external definition. Furthermore, unless the context requires otherwise, singular terms should include plurals, and plural terms should include singulars. In this disclosure, unless otherwise stated, the use of “or” means “and / or”. Furthermore, the use of the term “including” and other forms such as “includes” and “included” is non-limiting. Furthermore, unless otherwise specifically stated, terms such as “element” or “component” cover both elements and components comprising one unit and elements and components comprising more than one subunit.

[0153] Generally, the nomenclature and techniques used in conjunction with cell and tissue culture, molecular biology, immunology, microbiology, genetics, and protein and nucleic acid chemistry and hybridization described herein are those well-known and commonly used in the art. Unless otherwise stated, the methods and techniques of this disclosure are generally performed according to conventional methods well-known in the art and as described in the various general and more specific references cited and discussed throughout this disclosure. Enzymatic reactions and purification techniques are performed according to the manufacturer's specifications, as commonly done in the art or as described herein. The nomenclature used in connection with analytical chemistry, synthetic organic chemistry, and pharmaceutical and medicinal chemistry described herein is that well-known and commonly used in the art. Standard techniques are used for chemical synthesis, chemical analysis, drug preparation, formulation and delivery, and treatment of subjects.

[0154] The terms “approximately” or “about” are used interchangeably herein and, when applied to one or more values of interest, refer to a value similar to the reference value. In some embodiments, the terms “approximately” or “about” refer to a value falling within the range of 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, or less in either direction (i.e., greater than or less than the percentage) of the reference value, unless otherwise stated or obvious from the context (e.g., when the number exceeds 100% of the possible value).

[0155] Example

[0156] Example 1. MAESTRO-Pool enables highly parallel and specific mutation enrichment sequencing for the detection of minimal residual disease in cohort studies.

[0157] Background: Tracking patient-specific tumor mutations in cell-free DNA (cfDNA) for the detection of minimal residual disease (MRD) is promising but also challenging. Assaying more mutations and more cfDNA holds promise for improving MRD detection, but highly accurate and efficient sequencing methods and appropriate calibration are required to prevent false detections in custom-designed tests.

[0158] Methods: MAESTRO (minor-allele enriched sequencing through recognition oligonucleotides) uses mutation-specific oligonucleotide probes to enrich cfDNA libraries for tumor mutations, enabling accurate detection with minimal sequencing. A novel approach, MAESTRO-Pool, which pools MAESTRO probes from all patients and applies them to all samples from all patients, was used to screen 22,333 tumor mutations in 98 plasma samples from 9 melanoma patients. This enabled the quantification of MRD detections in patient-matched samples and false detections in non-matched samples from other patients. For MRD detection, a novel dynamic MRD detector was used, which calculates the probability of MRD detection based on the number of mutations and sequenced cfDNA molecules, thus calibrating for the variation in each custom test.

[0159] Results: MAESTRO-Pool enables sensitive detection of MRD as low as 0.78 parts per million (ppm), reflecting a 10 to 100-fold improvement over existing tests. Of the eight MRD-positive samples with ultra-low tumor fractions (<10 ppm), seven showed an upward trend before recurrence or a downward trend consistent with treatment response. Of 784 patient-unmatched tests, only one was found to be MRD-positive (tumor fraction = 2.7 ppm), indicating high specificity.

[0160] Conclusion: MAESTRO-Pool enables large-scale, parallel, tumor-informed MRD testing while benchmarking custom MRD tests. Furthermore, the new MRD detector allows for testing of a wider range of mutations and cfDNA molecules without compromising specificity. These advancements enhance the ability to detect MRD traces in blood.

[0161] Introduction

[0162] Detection of minimal residual disease (MRD) in the blood using circulating tumor DNA (ctDNA) holds promise for better care of cancer patients. It allows treatment to be intensified or switched early, before clinical recurrence, or de-escalated when it is no longer needed, thus avoiding treatment side effects [1–3]. Multiple studies have shown that ctDNA testing has a high predictive value for tumor recurrence [2, 4, 5]. However, the sensitivity of MRD detection is low at key clinical time points (such as within a few months after surgery), typically below 25%, especially in cancer types considered to be "low ctDNA shedders" [6–11]. Therefore, there is a key unmet need to maximize sensitivity. The ability to detect MRD in more patients, with longer lead times before recurrence, could enable ctDNA testing to inform treatment escalation and de-escalation for cancer patients.

[0163] Higher MRD detection sensitivity can be achieved by tracking more tumor mutations in each patient or by sampling more cell-free DNA (cfDNA) molecules from the blood. Given the genetic diversity of tumors in most patients, tracking more tumor mutations requires "tumor-informed" assays [12–14]. The process involves sequencing a patient's tumor to identify a unique tumor mutational fingerprint, and then testing each patient's own plasma DNA for that fingerprint. However, this presents a challenge due to the abundance of normal cfDNA in the blood and the extreme sequencing volumes required to accurately resolve low-abundance mutations. Furthermore, as custom assays develop to detect increasingly lower tumor fractions in cfDNA, mitigating the possibility of false detections becomes crucial. This has largely remained unexplored because custom assays, being individualized, are generally not benchmarked on numerous samples. It was reasoned that plasma samples from other patients could be used as controls for each other, coupled with improved MRD detection methods to characterize and prevent false detections.

[0164] This section introduces MAESTRO-Pool, a modification of the recently developed MAESTRO (minor-allele enriched sequencing through recognition oligonucleotides) method [13] for mutation enrichment sequencing of rare mutations; MAESTRO has been used to improve MRD detection in liquid biopsies [15]. Using MAESTRO-Pool, MAESTRO probes from the patient cohort were synthesized and pooled, and used on all samples from all patients, enabling the detection of MRD in patient-matched samples and the quantification of false detections in patient-unmatched samples. Furthermore, a novel “dynamic” MRD detector was introduced, providing probability scores for MRD detection and allowing more mutations and cfDNA molecules to be sequenced while limiting false detections.

[0165] Materials and methods

[0166] Patients and samples

[0167] All patients provided written informed consent, allowing for the collection of blood and tumor tissue for research purposes and the analysis of clinical and genetic data. Patients with high-risk melanoma were prospectively identified for a tissue analysis and banking cohort approved by the institutional review board of the enrolling institution, and nine of them were selected for analysis. DNA was extracted and prepared into sequencing libraries as previously described [16, 17].

[0168] Designing tumor fingerprints

[0169] For MAESTRO tumor fingerprinting, patient-specific fingerprint design followed the previously described method [15]. Tumor DNA was extracted from fresh frozen tissue and submitted for 30x whole-genome sequencing (WGS) on an Illumina NovaSeq S4. In parallel, paired normal DNA was extracted from the buffy coat and submitted for 15x WGS. Using the paired tumor-normal WGS data, somatic variants were detected through the GATK4 best practice Mutect2 workflow and used as input to the MAESTRO probe design tool [13] to generate a candidate list of tumor fingerprints for each patient. These lists were further filtered by removing target sites overlapping low-complexity regions or common germline variants, to reduce false positives due to poor alignment or contamination, respectively. Then, to create the MAESTRO-Pool assay, all individual fingerprints were combined and subjected to the following filters: (1) if two or more patients had probes targeting the same locus but different alleles, or (2) if two or more patients had probes targeting loci within 200 bp of each other, only the probes of the patient with the smaller fingerprint were retained. However, probes targeting the same locus and allele in multiple patients were allowed. The resulting MAESTRO-Pool probe set was ordered from IDT as an oPools product.
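The pooling filters above can be illustrated with a minimal Python sketch. The `Probe` record, the `pool_probes` function, and the tie-breaking rule (when two fingerprints are the same size, the patient whose identifier sorts first keeps its probe) are assumptions made for illustration and are not specified in this disclosure. Conflicts are resolved pairwise using the fingerprint sizes before pooling.

```python
from collections import Counter
from typing import List, NamedTuple


class Probe(NamedTuple):
    patient: str
    chrom: str
    pos: int
    alt: str


def pool_probes(probes: List[Probe], min_gap: int = 200) -> List[Probe]:
    """Combine per-patient fingerprints and drop probes that conflict across patients."""
    size = Counter(p.patient for p in probes)   # fingerprint size per patient
    dropped = set()
    for i, a in enumerate(probes):
        for b in probes[i + 1:]:
            if a.patient == b.patient or a.chrom != b.chrom:
                continue
            if a.pos == b.pos and a.alt == b.alt:
                continue  # same locus and allele in several patients is allowed
            if abs(a.pos - b.pos) <= min_gap:
                # Same locus with different alleles, or loci within 200 bp:
                # the patient with the smaller fingerprint keeps its probe.
                loser = max((a, b), key=lambda p: (size[p.patient], p.patient))
                dropped.add(loser)
    return [p for p in probes if p not in dropped]


example = [
    Probe("P1", "chr1", 1000, "A"),
    Probe("P1", "chr1", 5000, "T"),
    Probe("P1", "chr2", 300, "G"),
    Probe("P2", "chr1", 1100, "C"),   # within 200 bp of P1's chr1:1000 probe
    Probe("P3", "chr2", 300, "G"),    # same locus and allele as P1's chr2:300 probe
]
pooled = pool_probes(example)
```

In this made-up example, P1 has the larger fingerprint, so its chr1:1000 probe is dropped, while the shared chr2:300 G probe is kept for both P1 and P3. The pairwise scan is quadratic in the number of probes; a production design would sort by position first.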

[0170] In addition, an optimized version of the method of Parsons et al. [16] was used to design personalized MRD Tracker assays that do not utilize MAESTRO enrichment. Because far more sequencing is required than with MAESTRO, each MRD Tracker fingerprint is capped at 1000 probes per patient, prioritized by WGS variant allele frequency. The probe set for each patient is then ordered from IDT as an xGen custom Hyb panel product.

[0171] Sample processing

[0172] cfDNA was extracted from 4 to 7 ml of plasma using the QIAsymphony circulating DNA kit and quantified using the Quant-iT PicoGreen assay on a Hamilton STAR-line liquid handling system, as previously described [16,17]. cfDNA and gDNA libraries were constructed using the Kapa Hyper Prep kit and custom-designed dual-index duplex UMI adapters (IDT). The prepared libraries were then quantified using the Quant-iT PicoGreen assay on a Hamilton STAR-line liquid handling system.

[0173] Hybridization capture and sequencing

[0174] MAESTRO-Pool hybridization capture was performed following the previously published method [13]. In short, hybridization capture was performed using the xGen Hybridization and Wash Kit with xGen Universal Blockers (IDT). Each hybridization capture contained up to 12 samples, each with a library mass equivalent to 50 times the DNA mass used for library construction, and used 4 pmol of the MAESTRO-Pool panel. The hybridization program began at 95°C for 30 seconds, followed by a gradual decrease in temperature from 65°C to 50°C at 1°C every 48 minutes. Finally, the plate was held at 50°C for at least four hours. The heated wash step was performed at 50°C. After the first round of hybridization capture, 16 cycles of PCR were applied. The products were used for a second round of hybridization capture using 2 pmol of the MAESTRO-Pool panel, followed by another 16 cycles of PCR. The final captured products were quantified and pooled for sequencing on an Illumina NovaSeq S4 (151 bp paired-end reads) at a target raw depth of 50 million reads per sample.
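As a worked illustration of the hybridization program timing, the following Python sketch lists the temperature steps and totals their duration. It assumes that each whole-degree step from 65°C down to 51°C is held for 48 minutes before the final hold at 50°C, which is one reading of the ramp described above; the function name `hybridization_program` is hypothetical.

```python
from typing import List, Tuple


def hybridization_program(start_c: int = 65, end_c: int = 50,
                          step_min: float = 48.0,
                          final_hold_min: float = 240.0) -> List[Tuple[int, float]]:
    """Return (temperature in °C, minutes) steps for one capture round."""
    steps = [(95, 0.5)]                        # denaturation: 95 °C for 30 seconds
    for temp in range(start_c, end_c, -1):     # 65, 64, ..., 51 (assumed 48 min each)
        steps.append((temp, step_min))
    steps.append((end_c, final_hold_min))      # hold at 50 °C for at least four hours
    return steps


program = hybridization_program()
ramp_minutes = sum(minutes for temp, minutes in program if 50 < temp <= 65)
total_minutes = sum(minutes for _, minutes in program)
print(f"ramp: {ramp_minutes / 60:.1f} h, minimum total: {total_minutes / 60:.2f} h")
```

Under this assumption the ramp comprises 15 steps of 48 minutes, or 12 hours, so each capture round occupies the thermal cycler for at least about 16 hours including the final four-hour hold.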

[0175] MRD Tracker hybridization capture followed the previously published method [16,17]. The procedure differed from the MAESTRO-Pool procedure described above in several notable respects: 1) the library mass used for hybridization capture was 25 times the DNA mass used for library construction; 2) the hybridization program began at 95°C for 30 seconds, followed by a hold at 65°C for at least four hours; and 3) the heated wash step was performed at 65°C. The final capture products were sequenced on an Illumina NovaSeq S4, targeting a raw depth of 40,000x per site per 10 ng of DNA mass used for library construction.

[0176] Data processing and filtering

[0177] Sequencing and consensus calling followed the same protocol as previously described [13,16]. The resulting consensus data from each MAESTRO-Pool sample were then used as input to Miredas (a set of custom MRD detection scripts) [16]. The scripts applied additional fragment-level filtering and quantified the number of consensus molecules at each site. After generating these site-specific counts for each sample, a validated tumor fingerprint was determined for each patient to ensure robust, tumor-specific mutation tracking. Sites that did not meet all of the following criteria were excluded from all downstream analyses: (1) 0 ALT duplexes in the matched normal gDNA, (2) >0 ALT duplexes in the matched tumor gDNA, and (3) a ratio of ALT duplex consensus molecules to ALT single-strand consensus molecules in the matched tumor gDNA (i.e., the DSC / SSC ratio) >0.15 (Gydush et al. 2022). After defining the validated tumor fingerprint for each patient, matched and unmatched samples were distinguished for the final site-level filtering step. For matched samples, each site was required to meet the following criteria to be considered for MRD detection: (1) tumor validated, and (2) a DSC / SSC ratio >0.15 if >0 ALT duplexes were present. A different set of filters was applied to unmatched samples because these samples were used to estimate false detections rather than true tumor signal. Each site therefore needed to meet the following criteria to ensure that detections were not attributable to the sample's own germline or somatic variants: (1) the target site was tumor validated in the fingerprinted patient, (2) the target site was not in the sample's own matched fingerprint, (3) there were 0 ALT duplexes in the sample's matched normal gDNA, (4) there were 0 ALT duplexes in the sample's matched tumor gDNA, and (5) if there were >0 ALT duplexes in the cfDNA sample, the DSC / SSC ratio was >0.15.
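The site-level filters above can be expressed as simple predicates. The following Python sketch is illustrative only: `SiteCounts` and the function names are hypothetical and are not part of Miredas. Treating a site with ALT duplexes but zero ALT single-strand consensus molecules as failing the ratio filter is an assumption, since the ratio is undefined in that case.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    alt_duplex: int   # ALT duplex consensus (DSC) molecules at the site
    alt_ssc: int      # ALT single-strand consensus (SSC) molecules at the site


def ratio_ok(counts: SiteCounts, min_ratio: float = 0.15) -> bool:
    """DSC / SSC ratio filter, applied only when ALT duplexes are present."""
    if counts.alt_duplex == 0:
        return True
    if counts.alt_ssc == 0:
        return False  # assumption: an undefined ratio fails the filter
    return counts.alt_duplex / counts.alt_ssc > min_ratio


def tumor_validated(normal: SiteCounts, tumor: SiteCounts) -> bool:
    """Criteria (1) to (3) for including a site in the validated tumor fingerprint."""
    return normal.alt_duplex == 0 and tumor.alt_duplex > 0 and ratio_ok(tumor)


def matched_site_passes(validated: bool, cfdna: SiteCounts) -> bool:
    """Filter for a site in a patient-matched plasma sample."""
    return validated and ratio_ok(cfdna)


def unmatched_site_passes(validated_in_source_patient: bool,
                          in_sample_fingerprint: bool,
                          sample_normal: SiteCounts,
                          sample_tumor: SiteCounts,
                          cfdna: SiteCounts) -> bool:
    """Filter for another patient's site tested in a patient-unmatched sample."""
    return (validated_in_source_patient
            and not in_sample_fingerprint
            and sample_normal.alt_duplex == 0
            and sample_tumor.alt_duplex == 0
            and ratio_ok(cfdna))
```

For example, a site with 8 ALT duplexes and 20 ALT single-strand consensus molecules in the tumor gDNA (ratio 0.4) and no ALT duplexes in the normal gDNA would be tumor validated under these predicates.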

[0178] MRD Tracker samples underwent similar processing to MAESTRO-Pool matched samples. They were processed using the same consensus calling workflow and Miredas steps. They also passed tumor validation filtering, but without the DSC / SSC ratio filter specific to MAESTRO data.

[0179] Finally, if 2 to 10 mutations are detected, the MAESTRO and MRD Tracker samples are tagged for semi-automated review. In these cases, each detected mutation is reviewed to determine if it is likely a false positive (e.g., to identify artifacts), in which case it is discarded from the validated tumor fingerprint.

[0180] Tumor score estimation

[0181] Tumor score estimation for MAESTRO-Pool and MRD Tracker samples relies on methods described previously.¹³,¹⁶ However, only sites that pass the required filters are included in the tumor score estimation. For MAESTRO samples, all sites passing the filters are assumed to have a uniform duplex depth, consistent with previous work;¹³ this depth is derived from the 10 least enriched sites, as described.

[0182] Dynamic MRD detection

[0183] The goal of dynamic MRD detection is to design a method capable of measuring sample-specific properties (such as observed tumor fraction, validated fingerprint size, and cfDNA quality) and quantifying the probability that observed data are genuine tumor signals rather than spurious spontaneous errors. Furthermore, the output is designed to be easily interpretable. Therefore, the following model was designed to complement previous requirements for ≥2 mutant duplexes across ≥2 sites:

[0184]

[0185]

[0186] The model's output (see Equation 1) is the probability that a sample is MRD positive given the observed data, which facilitates easy calculation, interpretation, and adjustment via Bayes' theorem. The model's sample-specific inputs are the total mutant duplexes and the total assumed duplexes across all sites passing all filters. As described in the tumor score estimation section, the total assumed duplexes are estimated using the 10 least enriched sites. Notably, this variable indirectly integrates attributes such as fingerprint size and cfDNA quality, allowing the model to adapt its confidence to a specific sample. To estimate the likelihood of the observed data in MRD-positive or MRD-negative samples, it is assumed that sampling mutant duplexes from cfDNA can be modeled as a binomial distribution. The likelihood that the mutant duplexes are entirely tumor-derived is calculated according to Equation 2. For this calculation, a detection floor equal to the background mutation frequency is set so that the model does not treat detection rates below the background mutation frequency as tumor-derived. Similarly, the likelihood that the mutant duplexes are spontaneous errors is calculated (Equation 3). The chosen default error rate is 1 × 10⁻⁷, which stems from the error rates in previous MAESTRO datasets (ranging from 1 × 10⁻⁸ to 1 × 10⁻⁷). Finally, no prior information is assumed about whether a sample is MRD positive or negative, on the grounds that plasma samples collected in an MRD monitoring setting have a considerable chance of being either.
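Equations 1 to 3 themselves are not reproduced in this text. One form consistent with the prose (Bayes' theorem over two binomial likelihoods, with a floor at the background rate) is sketched below. The symbols k (total mutant duplexes), n (total assumed duplexes), ε (assumed error rate), and D (the observed data) are labels introduced here. Using the observed frequency k / n as the tumor-derived rate is an inference from the phrase "entirely tumor-derived", not a statement from the source.

```latex
% k = total mutant duplexes, n = total assumed duplexes,
% \varepsilon = assumed background error rate (default 1e-7), D = observed data (k, n).
% Reconstruction from the prose; the original notation may differ.
\begin{align}
P(\mathrm{MRD}^{+} \mid D) &=
  \frac{P(D \mid \mathrm{MRD}^{+})\, P(\mathrm{MRD}^{+})}
       {P(D \mid \mathrm{MRD}^{+})\, P(\mathrm{MRD}^{+})
        + P(D \mid \mathrm{MRD}^{-})\, P(\mathrm{MRD}^{-})} \tag{1} \\
P(D \mid \mathrm{MRD}^{+}) &=
  \binom{n}{k}\, \hat{p}^{\,k} (1 - \hat{p})^{\,n-k},
  \qquad \hat{p} = \max\!\left(\tfrac{k}{n},\, \varepsilon\right) \tag{2} \\
P(D \mid \mathrm{MRD}^{-}) &=
  \binom{n}{k}\, \varepsilon^{k} (1 - \varepsilon)^{\,n-k} \tag{3}
\end{align}
```

With equal priors of 0.5, Equation 1 reduces to the ratio of the first likelihood to the sum of both likelihoods.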

[0187] Detection limit estimation

[0188] The detection limit represents the tumor fraction with 90% power given the number of duplexes observed at each site. However, previous estimates assumed a fixed MRD detection strategy of detecting ≥2 mutant duplexes at ≥2 sites. This framework was adjusted to find the tumor fraction with 90% power when using dynamic MRD detection. This involves first finding the minimum number of mutant duplexes required for a specific sample to have P ≥ 0.95. Then, the same logic described previously¹⁶ is used to calculate which tumor fraction has 90% power to detect at least that minimum number of mutant duplexes across ≥2 sites. The duplex depth of MAESTRO samples is estimated as in the other analyses, as described in the tumor score estimation section.
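The first step, finding the minimum number of mutant duplexes that reaches P ≥ 0.95 for a given sample, can be sketched as a simple search. This is a hedged illustration that assumes the binomial model with a background floor described in the dynamic MRD detection section. The function names are hypothetical. The second step (90% power across ≥2 sites) follows the cited earlier work and is not reproduced here.

```python
import math


def log_likelihood_ratio(k: int, n: int, error_rate: float) -> float:
    """log[P(D | MRD+) / P(D | MRD-)]; the binomial coefficient cancels in the ratio."""
    p = max(k / n, error_rate)  # floor at the background rate
    llr = k * math.log(p / error_rate) if k > 0 else 0.0
    if n > k:
        llr += (n - k) * (math.log1p(-p) - math.log1p(-error_rate))
    return llr


def posterior_positive(k: int, n: int, error_rate: float = 1e-7, prior: float = 0.5) -> float:
    """Posterior probability of MRD positivity for k mutant duplexes out of n."""
    prior_odds = prior / (1.0 - prior)
    return 1.0 / (1.0 + math.exp(-log_likelihood_ratio(k, n, error_rate)) / prior_odds)


def min_mutant_duplexes(n: int, threshold: float = 0.95,
                        error_rate: float = 1e-7, prior: float = 0.5) -> int:
    """Smallest mutant duplex count whose posterior meets the threshold."""
    for k in range(n + 1):
        if posterior_positive(k, n, error_rate, prior) >= threshold:
            return k
    raise ValueError("threshold cannot be reached at this duplex depth")
```

Under these assumptions the required count grows with depth: about one million assumed duplexes needs 2 mutant duplexes, while ten million needs 5, which is the sample-specific behavior the dynamic detector is meant to capture.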

[0189] Example environment

[0190] Figure 32 is an illustration of an environment 3200 that is operable to employ dynamic MRD classification as described herein. The illustrated environment 3200 includes a service provider system 3202, a client device 3204, and a sequencing data processor 3206, which are communicatively coupled to each other via a network 3208. Although the sequencing data processor 3206 is illustrated separately from the service provider system 3202 and client device 3204, this functionality can be incorporated as part of the service provider system 3202 and / or client device 3204, further divided among other entities, and so on. For example, all or part of the functionality of the sequencing data processor 3206 can be incorporated as part of the service provider system 3202 and / or client device 3204. Additionally or alternatively, all or part of the client device 3204 can be incorporated as part of the service provider system 3202.

[0191] The computing devices used to implement the service provider system 3202, client device 3204, and sequencing data processor 3206 can be configured in various ways. For example, a computing device can be configured as a desktop computer, laptop computer, mobile device (e.g., in a handheld configuration, such as a tablet or mobile phone), etc. Thus, computing devices can range from full-resource devices with abundant memory and processor resources (e.g., personal computers, game consoles) to low-resource devices with limited memory and / or processing resources (e.g., mobile devices). Furthermore, a computing device can represent multiple different devices, such as multiple servers utilized to perform "cloud" operations, as further described in relation to Figure 35.

[0192] The service provider system 3202 is illustrated as including an application manager module 3210, which represents the ability to provide users of client device 3204 with access to sequencing data processor 3206 via network 3208. For example, the application manager module 3210 can expose the content or functionality of sequencing data processor 3206, which can be accessed by application 3212 of client device 3204 via network 3208. Application 3212 can be configured as a network-enabled application, browser, native application, etc., exchanging data with service provider system 3202 via network 3208. This data can be used by application 3212 to enable users of client device 3204 to communicate with service provider system 3202, such as receiving application updates and features when service provider system 3202 provides functionality to manage application 3212.

[0193] In the context of the technology described, application 3212 includes the ability to analyze data generated by at least one sequencing event. In the illustrated example, application 3212 includes interface 3214, which is at least partially implemented in the hardware of client device 3204 to facilitate communication between client device 3204 and sequencing data processor 3206. For example, interface 3214 includes the ability to receive input to sequencing data processor 3206 from client device 3204 (e.g., from a user of client device 3204) and output information, data, etc., from sequencing data processor 3206 to client device 3204, as will be further detailed herein.

[0194] Sequencing events involve determining the sequence of nucleotides (e.g., adenine, thymine or uracil, cytosine, and guanine) in one or more nucleic acid samples (such as those derived from one or more biological samples). The sequence of nucleotides is referred to herein as a “sequence.” Nucleotides are also called “bases.” Sequencing events will be described herein for deoxyribonucleic acid (DNA) sequencing, particularly for cell-free DNA (cfDNA), such as that generated using the MAESTRO or MAESTRO-Pool techniques described herein. For example, see the sections on designing tumor fingerprints, sample processing and hybridization capture, and sequencing described above. Such techniques produce targeted cfDNA sequencing data 3216, which is analyzed by the sequencing data processor 3206 to determine the MRD status of the corresponding sample. For example, the corresponding sample is classified by the sequencing data processor 3206 as MRD positive (e.g., presence of circulating tumor DNA) or MRD negative (e.g., absence of circulating tumor DNA). The MRD status can be output as an MRD classification 3218. In at least one embodiment, the targeted cfDNA sequencing data 3216 comprises a text-based file format, such as a FASTQ file, which stores nucleotide sequence information and the quality scores of the bases in the sequencing reads. In a variant, the targeted cfDNA sequencing data 3216 comprises another type of file format.

[0195] In at least one variant, the corresponding sample is classified as either circulating tumor DNA positive or circulating tumor DNA negative, rather than the MRD classifications described above. Therefore, it should be understood that the technique is suitable for detecting circulating tumor DNA outside the context of MRD.

[0196] In at least one embodiment, the sequencing data processor 3206 receives targeted cfDNA sequencing data 3216 and performs data processing and filtering via a quantification and analysis module 3220. The quantification and analysis module 3220 represents functionality for identifying single nucleotide variants (SNVs) in the targeted cfDNA sequencing data 3216 for determining MRD classification 3218. Therefore, in at least one embodiment, the quantification and analysis module 3220 includes one or more post-processing algorithms 3222 to identify sites (e.g., SNVs) for MRD detection (see, for example, the data processing and filtering section described above). For instance, the targeted cfDNA sequencing data 3216 can be processed by the one or more post-processing algorithms 3222 to identify reads matching the variants targeted by the MAESTRO probes, while filtering out background noise and sequencing errors to enhance the detection of true variants.

[0197] The quantification and analysis module 3220 may also include a specificity factor determination algorithm 3224. The specificity factor determination algorithm 3224 can quantify specificity factors 3226 used to determine MRD classification 3218, as further described herein. In the example of environment 3200, the specificity factors 3226 include total mutant duplexes 3228 and total assumed duplexes 3230. Total mutant duplexes 3228 refer to the count of DNA duplexes containing the specific variants of interest across all sites that passed the filters in the targeted cfDNA sequencing data 3216 of the sample. Total mutant duplexes 3228 are also referred to herein as alternate (“ALT”) duplexes. Total assumed duplexes 3230 refer to the total count of DNA duplexes assayed for mutations (including mutant and wild-type (non-mutant) duplexes) in the sample. In at least one embodiment, the total assumed duplexes 3230 are estimated using a predetermined number (e.g., 10) of least enriched sites, as described in the tumor score estimation section. For example, a subset of MAESTRO probes with minimal enrichment for mutations can be used to estimate total assumed duplexes 3230, because such probes are largely unbiased for mutant alleles relative to wild-type alleles. These sites can therefore be used to estimate the number of DNA molecules at each locus, which, when multiplied by the number of mutations assayed, yields the total number of molecules assayed. Alternatively or additionally, total assumed duplexes 3230 can be estimated by including a set of control probes designed to be unbiased for mutant relative to wild-type DNA; these likewise estimate the number of DNA molecules at each locus, which is multiplied by the number of mutations assayed to yield the total number of molecules assayed. Total mutant duplexes 3228 and total assumed duplexes 3230 can be used together to determine the tumor fraction, for example, as the proportion of reads containing mutations (e.g., total mutant duplexes 3228) out of the total number of reads covering the mutation sites (e.g., total assumed duplexes 3230). Total mutant duplexes 3228 and total assumed duplexes 3230 are thus sample-specific factors.
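The depth estimate and tumor fraction described in this paragraph can be illustrated roughly as follows. The text does not say how enrichment is measured per site or how the reference sites are averaged, so the `site_enrichment` input and the use of the arithmetic mean are assumptions, and the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def estimate_total_assumed_duplexes(
    site_depths: Sequence[float],
    site_enrichment: Sequence[float],
    n_reference: int = 10,
) -> float:
    """Per-locus depth from the least enriched sites, multiplied by the number of sites.

    site_depths: duplex depth observed at each site passing the filters.
    site_enrichment: a per-site enrichment measure (hypothetical input; lower = less enriched).
    """
    if not site_depths or len(site_depths) != len(site_enrichment):
        raise ValueError("inputs must be non-empty and of equal length")
    # Indices of the n_reference least enriched sites.
    order = sorted(range(len(site_depths)), key=lambda i: site_enrichment[i])
    reference = order[:n_reference]
    # Assumption: the mean depth of the reference sites stands in for every locus.
    per_locus_depth = mean(site_depths[i] for i in reference)
    return per_locus_depth * len(site_depths)


def tumor_fraction(total_mutant_duplexes: int, total_assumed_duplexes: float) -> float:
    """Proportion of mutant duplexes among all assumed duplexes."""
    return total_mutant_duplexes / total_assumed_duplexes
```

For instance, 15 sites whose 10 least enriched sites average 1000 duplexes give 15,000 assumed duplexes, so 3 mutant duplexes correspond to a tumor fraction of 2 × 10⁻⁴.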

[0198] In environment 3200, the specificity factors 3226 further include an assumed background mutation frequency 3232. As described herein, the assumed background mutation frequency 3232 can be a fixed value, a context-specific value, or a sample-specific value (e.g., an empirical measurement). The assumed background mutation frequency 3232 can refer to, for example, the background SNV frequency and can be used as an error rate, such as the error rate mentioned in the dynamic MRD detection section above. As a non-limiting example of a fixed value, the assumed background mutation frequency 3232 can be one part per 10 million (e.g., 1 × 10⁻⁷), although other values may be used. It should be understood that the specificity factors 3226 may include other factors that supplement or replace those listed above, including the additional context-specific factors described below with reference to Figure 33.

[0199] According to the techniques described herein, the sequencing data processor 3206 includes an MRD classification module 3234. The MRD classification module 3234 represents functionality for determining whether a given sample is MRD positive or MRD negative (e.g., MRD classification 3218) based on the targeted cfDNA sequencing data 3216 and the specificity factors 3226. In at least one variant, the MRD classification module 3234 is configured to determine whether a given sample is positive or negative for circulating tumor DNA. In one or more embodiments, the MRD classification module 3234 includes a tuned algorithm 3236 that takes into account parameters of the specificity factors 3226, with an MRD positive likelihood model 3238 (e.g., a first likelihood model) configured to output a first likelihood 3240 and an MRD negative likelihood model 3242 (e.g., a second likelihood model) configured to output a second likelihood 3244.

[0200] For example, the MRD positive likelihood model 3238 determines the likelihood of the observed data given that the sample is MRD positive, such as according to Equation 2 given above. The MRD positive likelihood model 3238 can use a binomial distribution with a given probability parameter to model the total mutant duplexes 3228 given the total assumed duplexes 3230, where a floor on the probability parameter ensures that the estimated mutation frequency is not lower than the assumed background mutation frequency 3232. The first likelihood 3240 can be the likelihood that the total mutant duplexes 3228 among the total assumed duplexes 3230 are entirely derived from the tumor.

[0201] For example, the MRD negative likelihood model 3242 determines the likelihood of the observed data given that the sample is MRD negative, such as according to Equation 3 given above. The MRD negative likelihood model 3242 can use a binomial distribution, with the assumed background mutation frequency 3232 as its probability parameter, to model the total mutant duplexes 3228 given the total assumed duplexes 3230. The second likelihood 3244 can be the likelihood that the total mutant duplexes 3228 among the total assumed duplexes 3230 are spontaneous errors.

[0202] The total mutant duplexes 3228 and the total assumed duplexes 3230 are used as sample-specific inputs by the MRD positive likelihood model 3238 and the MRD negative likelihood model 3242, thereby enabling the MRD classification module 3234 to adapt its confidence to the specific sample being evaluated.
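As a rough sketch, the two likelihood models might be written as a pair of binomial log-likelihoods that share the same sample-specific inputs. The function names are hypothetical. Taking the observed mutation frequency, floored at the background rate, as the MRD-positive probability parameter is an assumption drawn from the description rather than a confirmed detail.

```python
import math


def binomial_log_likelihood(k: int, n: int, p: float) -> float:
    """Log binomial probability of k mutant duplexes among n at rate p (p < 1 unless k == n)."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_hits = k * math.log(p) if k > 0 else 0.0
    log_misses = (n - k) * math.log1p(-p) if n > k else 0.0
    return log_coeff + log_hits + log_misses


def mrd_positive_log_likelihood(total_mutant: int, total_assumed: int,
                                background: float = 1e-7) -> float:
    """First likelihood: mutant duplexes treated as entirely tumor-derived."""
    # Floor so that rates below the background are never credited to tumor.
    rate = max(total_mutant / total_assumed, background)
    return binomial_log_likelihood(total_mutant, total_assumed, rate)


def mrd_negative_log_likelihood(total_mutant: int, total_assumed: int,
                                background: float = 1e-7) -> float:
    """Second likelihood: mutant duplexes treated as spontaneous errors."""
    return binomial_log_likelihood(total_mutant, total_assumed, background)
```

Working in log space is a design choice made here for numerical stability at large duplex counts; exponentiating either value recovers the ordinary likelihood.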

[0203] The first likelihood 3240 and the second likelihood 3244 are input into the dynamic probability scoring algorithm 3246, which outputs a dynamic probability score 3248. For example, the dynamic probability scoring algorithm 3246 can use a posterior probability calculation, such as Equation 1 provided above, where P(MRD+ | D) is the dynamic probability score 3248 that the sample is MRD positive given the observed data (D), P(D | MRD+) is the first likelihood 3240, and P(D | MRD−) is the second likelihood 3244. The dynamic probability scoring algorithm 3246 can further utilize priors, such as P(MRD+) and P(MRD−) in Equation 1, which can be fixed values (e.g., 0.5 as a non-limiting example).

[0204] MRD classification module 3234 may further include a threshold comparison algorithm 3250 configured to output MRD classification 3218 based on a dynamic probability score 3248 relative to a threshold. For example, MRD classification 3218 may classify a sample as MRD positive in response to a dynamic probability score 3248 being greater than or equal to the threshold, and MRD classification 3218 may classify a sample as MRD negative in response to a dynamic probability score 3248 being less than the threshold. As a non-limiting example, the threshold is 0.95. In at least one embodiment, the threshold is adjustable, for example, via user input.
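The scoring and threshold comparison steps can be sketched together. This is an illustrative outline with hypothetical function names, assuming the two likelihoods have already been computed and that the priors sum to one.

```python
def dynamic_probability_score(first_likelihood: float, second_likelihood: float,
                              prior_positive: float = 0.5) -> float:
    """Posterior probability of MRD positivity from the two likelihoods (Bayes' theorem)."""
    prior_negative = 1.0 - prior_positive
    numerator = first_likelihood * prior_positive
    denominator = numerator + second_likelihood * prior_negative
    if denominator == 0.0:
        raise ValueError("both likelihoods are zero; the score is undefined")
    return numerator / denominator


def classify_mrd(score: float, threshold: float = 0.95) -> str:
    """MRD positive when the score meets or exceeds the (adjustable) threshold."""
    return "MRD positive" if score >= threshold else "MRD negative"
```

For example, a first likelihood of 0.27 and a second likelihood of 0.0045 give a score of about 0.98 and an MRD-positive classification, whereas a score of 0.94 falls just below the default threshold and is classified MRD negative.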

[0205] The client device 3204 is shown displaying the MRD classification 3218 via the display device 3252. For example, the display device 3252 can display output indicating whether the sample is MRD positive or MRD negative. It should be understood that the MRD classification 3218 is also stored in memory, either in a single data file or across multiple data files, for subsequent access.

[0206] In this way, the MRD classification module 3234 generates MRD classification 3218 to improve the sensitivity of circulating tumor DNA detection and reduce the probability of false detection.

[0207] MAESTRO enrichment and sequencing requirements

[0208] MAESTRO enrichment and sequencing requirements were assessed using a strategy similar to that of previous work.¹³ Variant allele frequency (VAF), the ratio of mutant duplexes at a specific locus to the total number of duplexes, was used to estimate mutation enrichment between MAESTRO and MRD Tracker. For this purpose, sites were limited to those that (1) were present in both the MAESTRO and MRD Tracker fingerprints, (2) passed all filters of both MAESTRO and MRD Tracker (described in Data Processing and Filtering), and (3) had ≥1 mutant duplex detected using both MAESTRO and MRD Tracker. This enabled direct site-level comparisons of MAESTRO and MRD Tracker and allowed straightforward calculation of MAESTRO's VAF fold enrichment (MAESTRO VAF / MRD Tracker VAF). Similarly, to compare sequencing requirements, sites were limited to those that (1) were present in both the MAESTRO and MRD Tracker fingerprints and (2) passed all filters of both MAESTRO and MRD Tracker. Using these sites, duplex families were downsampled stepwise to 0.01% to form a saturation curve of mutant duplexes against total read pairs. The read pairs required to reach mutant duplex saturation were compared by: (1) limiting samples to those in which ≥2 mutant duplexes were detected at two fingerprint sites using both MAESTRO and MRD Tracker; (2) using as the comparison point the lower of the MAESTRO and MRD Tracker saturated mutant duplex counts (the saturation point was defined as the minimum mutant duplex count ≥90% of the total mutant duplex count or, if the sample was not saturated, simply the total mutant duplex count); (3) finding the minimum number of read pairs required to reach the comparison point with each of MAESTRO and MRD Tracker; and (4) calculating the ratio of the read pairs required using MAESTRO and MRD Tracker.
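The site-level fold enrichment comparison can be illustrated as follows. The dictionary layout and site keys are hypothetical, and the inputs are assumed to contain only sites that already passed all filters for the respective assay.

```python
from typing import Dict, Tuple

# Maps a site identifier to (mutant duplexes, total duplexes) at that site.
SiteTable = Dict[str, Tuple[int, int]]


def vaf(mutant_duplexes: int, total_duplexes: int) -> float:
    """Variant allele frequency: mutant duplexes over total duplexes at a site."""
    return mutant_duplexes / total_duplexes


def vaf_fold_enrichment(maestro: SiteTable, tracker: SiteTable) -> Dict[str, float]:
    """MAESTRO VAF / MRD Tracker VAF for sites present in both tables with >= 1 mutant duplex."""
    shared = [
        site for site in maestro
        if site in tracker and maestro[site][0] >= 1 and tracker[site][0] >= 1
    ]
    return {site: vaf(*maestro[site]) / vaf(*tracker[site]) for site in shared}
```

Requiring at least one mutant duplex in both assays keeps the ratio defined, since a Tracker VAF of zero would otherwise cause a division by zero.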

[0209] Results

[0210] MAESTRO-Pool and improved MRD detector

[0211] MAESTRO-Pool enables large-scale, parallel, tumor-informed MRD testing for a patient cohort. It involves performing whole-genome sequencing (WGS) on each patient's tumor, designing MAESTRO probes that target patient-specific tumor mutations, and combining these probes into a single assay applied to all samples from all patients (Figure 1). Mutations were verified in the tumor DNA of each patient, and those found in the patient's own germline DNA were excluded, as previously described.¹³,¹⁵,¹⁶ MRD assays were applied to plasma samples from each patient (i.e., “matched samples”) and to plasma samples from other patients (i.e., “unmatched samples”) to assess the specificity of each custom assay. Mutations shared among tumor biopsies from multiple patients were excluded when calculating the specificity of each MRD test in the unmatched samples.

[0212] To date, a “fixed” threshold has been used for MAESTRO MRD detection when assaying approximately 1000 genome-wide mutations from standard blood volumes (e.g., 1 to 3 × 10 cc tubes). This has shown high specificity in benchmark experiments and in cancer patients.¹³,¹⁶ However, it was hypothesized that this may not be sufficient when tracking more mutations and cfDNA molecules in each patient (Figures 2A to 2E). Therefore, a novel “dynamic” MRD detector was developed that calculates the probability of MRD detection based on sample-specific properties (such as observed tumor fraction, fingerprint size, and cfDNA molecules) relative to an assumed error rate of 1 in 10 million (see Materials and Methods). A threshold of P ≥ 0.95, an adjustable parameter, was set for considering a sample MRD positive. Modeling showed that at P ≥ 0.95 this would limit false detections while maintaining high sensitivity for tumor-derived cfDNA at low parts per million (ppm) levels (Figures 2A to 2E). First, it was confirmed that when using MAESTRO to assay up to 1000 mutations per patient in up to 148 ng cfDNA, the dynamic detector had a negligible effect on previous results (n = 133 / 134 concordant MRD test results; Figures 6A to 6B). Notably, the one sample with a discordant MRD result was borderline negative under dynamic MRD detection, with a probability score of 0.94.

[0213] MAESTRO-Pool test for melanoma patients

[0214] Next, an attempt was made to push the limits of the number of mutations assayed and to experimentally validate the novel MRD detector using MAESTRO-Pool in matched versus unmatched samples. Nine patients who received radical treatment for stage III melanoma were identified, with a median of 10 plasma samples collected from each (range 5 to 22, total 98) and a median clinical follow-up of 2.5 years (range 0.5 to 4.0). Eight of these patients experienced relapse, seven of whom had subsequent samples collected, while one remained disease-free but died of unrelated causes. Whole-genome sequencing (WGS) was performed on tumor and normal DNA for each patient, and a median of 40,348 mutations (range 5932 to 160,598) were identified per patient. A MAESTRO-Pool assay was created for all patients (median 1856 mutations per patient, range 447 to 5571) and applied to tumor and normal DNA for each patient, as well as to the 98 plasma DNA samples from all patients. For comparison, nine personalized MRD Tracker assays were created based on the previously validated method of Parsons et al.,¹⁶ optimized to include a median of 1000 mutations per patient (range 334 to 1000). These assays were capped at 1000 mutations per patient and, given the greater sequencing needs without MAESTRO enrichment, were applied only to each patient's own samples.

[0215] Tumor mutation fingerprints for each patient were first validated by applying the MRD assays to both tumor and normal DNA. With and without MAESTRO enrichment, a median of 78% (range 3% to 89%) and 97% (range 3% to 99%) of mutations were validated, respectively (Figure 7A). As expected, specimens with the lowest validation rates had the lowest median variant allele frequency (VAF) in WGS, which may indicate low tumor purity. Only mutations validated in tumor biopsies and absent in normal DNA were used to detect MRD from plasma (see Materials and Methods). Furthermore, estimated limits of detection at 90% power were calculated for each sample based on duplex depth and validated fingerprint size (see Materials and Methods). Most MAESTRO tests were found capable of detecting low ppm levels, except in three patients whose empirically validated fingerprint sizes were well below 1000 mutations (Figure 7B).

[0216] Using the dynamic MRD detector with P ≥ 0.95, 47 of 98 matched samples were found to be MRD positive (median tumor fraction 1.1 × 10⁻⁴, range 7.8 × 10⁻⁷ to 0.13), whereas only 1 of 784 unmatched samples was MRD positive (tumor fraction 2.7 × 10⁻⁶; Figures 3A to 3B). This resulted in median experimental specificities of 100% (range 100% to 100%) and 100% (range 98% to 100%) for detecting ≥10 ppm and <10 ppm ctDNA, respectively, in a median of 88 (range 76 to 93) unmatched patient samples. Notably, if a fixed threshold of ≥2 mutations had been used, an additional 3 matched and 20 unmatched samples would have been found MRD positive, with most false detections occurring in the largest panels of 4337, 3816, and 3479 mutations, as expected (Figures 3A to 3B). While more stringent thresholds of ≥3 or ≥4 mutations achieve specificity comparable to dynamic MRD detection, they also reduce the number of MRD-positive detections in matched samples (Figure 8A). These results highlight the importance of dynamic MRD detection as more tumor mutations and cfDNA molecules are sequenced for each patient.

[0217] Interestingly, the three samples that were MRD positive with fixed MRD detection but negative with dynamic MRD detection all had two or three mutations detected and tumor fractions below 1 ppm (range 5.7 × 10⁻⁷ to 9.9 × 10⁻⁷), indicating that these were borderline MRD detections to begin with. This is further reflected in their dynamic probability scores, which ranged from 0.69 to 0.92. Although dynamic MRD detection successfully reduced the false detection rate, the P ≥ 0.95 threshold was fairly conservative (e.g., median specificity of 1), so tuning this parameter was explored. It was observed that predicted specificity underestimated measured specificity; therefore, lower probability thresholds (such as P ≥ 0.80) still yielded high specificity, with recall comparable to fixed MRD detection (Figure 8B). Furthermore, dynamic probability scores enable comparison of MRD signals between samples while correcting for sample-specific properties. This can be used to quantify confidence by comparing the probability score of a matched sample with the distribution of probability scores in unmatched samples. For example, a confidence score can be assigned to a matched sample with a borderline negative MRD signal (i.e., a dynamic score of 0.50 ≤ P < 0.95) by quantifying how often its probability score is greater than or equal to the scores of the unmatched samples (Figure 9). This can further indicate whether traces of MRD are present in borderline negative samples.

[0218] As further validation, MAESTRO-Pool results for matched patient samples obtained with the dynamic detector were compared with MRD Tracker results. For MRD Tracker, a cutoff of ≥2 mutations was used.¹⁶ High concordance in MRD detection was observed, with 70 of 77 samples showing concordant MRD results, along with a strong correlation between the corresponding cfDNA tumor fractions from MAESTRO-Pool and MRD Tracker (R² = 0.97; Figure 10A). Of the 7 samples with discordant MRD results, in 5 of 7 the observed tumor fraction of the MRD-positive result was lower than the detection limit of the MRD-negative result (Figure 10B). Furthermore, 4 of the 7 samples came from patient 1406, who had only 22 and 28 tumor-validated mutations measured by MAESTRO-Pool and MRD Tracker, respectively. This suggests that the discordant results were due to insufficient power in one of the two tests. Encouragingly, of the 30 samples that were MRD negative with MAESTRO, 29 were also MRD negative with MRD Tracker.

[0219] Finally, the impact of MAESTRO enrichment on sequencing efficiency was investigated. For mutations tracked using both MAESTRO-Pool and MRD Tracker, the median VAF enrichment using MAESTRO was 115-fold (range 0.44 to 12,056; Figure 4A). As expected, fold enrichment was greatest in samples with low tumor fractions; for example, for samples with <100 ppm tumor DNA, the median was 777 (range 1.3 to 8631; Figure 4B). Next, the reads required by MAESTRO-Pool to uncover mutant DNA duplexes were examined relative to MRD Tracker on a sample-by-sample basis (Figures 4C to 4D). The use of MAESTRO reduced the median number of reads required to uncover 90% of the mutant DNA duplexes in each sample by 33-fold (range 1.5 to 439; Figure 4D). The results using MAESTRO-Pool are consistent with previous observations that MAESTRO enables highly sensitive MRD detection with significantly reduced sequencing.

[0220] Case studies of individual melanoma patients

[0221] Finally, the relationship between MRD test results and individual patient treatment and outcome was examined. Because this was a cohort study rather than a clinical trial, treatment and follow-up varied considerably between patients, so each patient was examined individually (Figures 5A to 5B, Figures 11A to 11B, Figures 12A to 12B, Figures 13A to 13B, and Figure 14). In one exemplary patient, MRD was detected at 2.6 ppm immediately after surgery, prior to adjuvant dabrafenib and trametinib therapy (Figure 5A). MRD then became undetectable at two subsequent time points before rising to 2.1 ppm 138 days before a recurrence was detected in the brain. Following definitive craniotomy and radiation therapy, MRD was undetectable at both time points. Then, just prior to pembrolizumab, MRD was detected at 3.1 ppm and remained at similar levels for the next two time points. The patient continued treatment for 355 days until the MRD level rose to 6612 ppm, and gastric metastasis was detected 442 days after the initial detection. The patient was then given ipilimumab and nivolumab, at which point MRD became undetectable while imaging scans yielded indeterminate results.

[0222] In a second exemplary patient, no samples were available shortly after surgery, but MRD was detected 48 days and 84 days before local and distant recurrence, respectively (Figure 5B). The MRD level subsequently decreased to 0.8 ppm before rising again. Notably, an increase in MRD from 0.8 ppm to 24 ppm was detected while imaging scans showed stable disease, 35 days before scans showed progression. Ipilimumab and nivolumab were then administered, and a significant decrease in MRD levels was observed, to as low as 0.9 ppm, prior to a durable response that persisted at the last follow-up. These cases illustrate the depth of MRD levels detectable using MAESTRO and demonstrate the potential of using these measurements to guide care.

[0223] Interestingly, of the eight MRD-positive samples with extremely low tumor fractions (<10 ppm) detected across all patients in the cohort, seven were found either before recurrence (on an upward trend of increasing tumor fraction) or after a period of treatment response (on a downward trend of decreasing tumor fraction). Notably, four of these samples were collected at time points when patients showed no evidence of disease on imaging scans. These observations, together with the experimental specificity, suggest that deep MRD detection can fill a meaningful gap in monitoring cancer treatment response.

[0224] Discussion

[0225] Liquid biopsy holds great promise for detecting MRD and informing more precise cancer care, but higher sensitivity is needed. Assaying more mutations in more cfDNA using custom tests can improve sensitivity, but this creates opportunities for false detections if not properly accounted for. Here, MAESTRO-Pool is introduced for large-scale, parallel, tumor-informed MRD testing in cohort studies. By pooling MRD tests from multiple patients, MAESTRO-Pool assayed thousands of patient-specific tumor mutations while using unmatched samples as controls to benchmark each patient's custom MRD test. This simplifies MRD testing and enables the detection of low ppm levels of ctDNA in melanoma patients, including when scans show no evidence of disease. Furthermore, a novel dynamic MRD detector accounts for differences in the number of sequenced mutations and cfDNA molecules by calculating a probability score for MRD in each sample. When each patient's test was benchmarked against a median of 88 (range 76 to 93) unmatched samples from other patients, the median experimental specificity for detecting ≥10 ppm and <10 ppm ctDNA was 100% (range 100% to 100%) and 100% (range 98% to 100%), respectively, using the dynamic MRD detector with MAESTRO-Pool. The dynamic detector and the distribution of its probability scores in pooled testing were also shown to identify borderline negative patients who may benefit from subsequent testing.

[0226] The abundance of normal cfDNA in the blood and the large number of sequencing reads required to overcome sequencing errors often preclude routine benchmarking of personalized MRD assays in control samples. This challenge is amplified when more mutations and more cfDNA are measured at critical clinical time points to enhance MRD detection. MAESTRO-Pool addresses these challenges by using mutation enrichment to reduce the amount of sequencing required per sample and by allowing MRD tests from multiple patients to be pooled and applied to many samples at once. A similar approach using standard duplex sequencing would be impractical, as it would be significantly slower and economically infeasible, ultimately burdening both healthcare systems and patients. Furthermore, the dynamic detector accounts for varying numbers of mutations and cfDNA molecules to limit erroneous detection. Notably, the dynamic detector is applicable to both MAESTRO and MAESTRO-Pool MRD assays.

[0227] In summary, MAESTRO-Pool and dynamic MRD detection enable ctDNA detection down to 1 ppm, while using non-matched patient samples as controls for benchmarking of customized MRD testing for each patient. These features simplify MRD testing and enable higher sensitivity and specificity as more mutations and cfDNA are analyzed. In turn, these methods for enhancing MRD detection promise to enable more precise care for cancer patients.

[0228] References

[0229] 1. Etienne, G. et al. Long-Term Follow-Up of the French Stop Imatinib (STIM1) Study in Patients With Chronic Myeloid Leukemia. J. Clin. Oncol. 35, 298–305 (2017).

[0230] 2. Magbanua, M. J. M. et al. Circulating tumor DNA in neoadjuvant-treated breast cancer reflects response and survival. Ann. Oncol. 32, 229–239 (2021).

[0231] 3. Radovich, M. et al. Association of Circulating Tumor DNA and Circulating Tumor Cells After Neoadjuvant Chemotherapy With Disease Recurrence in Patients With Triple-Negative Breast Cancer: Preplanned Secondary Analysis of the BRE12-158 Randomized Clinical Trial. JAMA Oncol. 6, 1410–1415 (2020).

[0232] 4. Garcia-Murillas, I. et al. Assessment of Molecular Relapse Detection in Early-Stage Breast Cancer. JAMA Oncol 5, 1473–1478 (2019).

[0233] 5. Lipsyc-Sharf, M. et al. Circulating Tumor DNA and Late Recurrence in High-Risk Hormone Receptor-Positive, Human Epidermal Growth Factor Receptor 2-Negative Breast Cancer. J. Clin. Oncol. 40, 2408–2419 (2022).

[0234] 6. Azad, T. D. et al. Circulating Tumor DNA Analysis for Detection of Minimal Residual Disease After Chemoradiotherapy for Localized Esophageal Cancer. Gastroenterology 158, 494–505.e6 (2020).

[0235] 7. Bettegowda, C. et al. Detection of circulating tumor DNA in early- and late-stage human malignancies. Sci. Transl. Med. 6, 224ra24 (2014).

[0236] 8. Cohen, J. D. et al. Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science 359, 926–930 (2018).

[0237] 9. Corrò, C. et al. Detecting circulating tumor DNA in renal cancer: An open challenge. Exp. Mol. Pathol. 102, 255–261 (2017).

[0238] 10. Kim, Y.-W. et al. Monitoring circulating tumor DNA by analyzing personalized cancer-specific rearrangements to detect recurrence in gastric cancer. Exp. Mol. Med. 51, 1–10 (2019).

[0239] 11. Yamamoto, Y. et al. Clinical significance of the mutational landscape and fragmentation of circulating tumor DNA in renal cell carcinoma. Cancer Sci. 110, 617–628 (2019).

[0240] 12. Coombes, R. C. et al. Personalized Detection of Circulating Tumor DNA Antedates Breast Cancer Metastatic Recurrence. Clin. Cancer Res. 25, 4255–4263 (2019).

[0241] 13. Gydush, G. et al. Massively parallel enrichment of low-frequency alleles enables duplex sequencing at low depth. Nat Biomed Eng 6, 257–266 (2022).

[0242] 14. Wan, J. C. M. et al. ctDNA monitoring using patient-specific sequencing and integration of variant reads. Sci. Transl. Med. 12, (2020).

[0243] 15. Parsons, H. A. et al. Circulating tumor DNA association with residual cancer burden after neoadjuvant chemotherapy in triple-negative breast cancer in TBCRC 030. medRxiv (2023) doi:10.1101/2023.03.06.23286772

[0244] 16. Parsons, H. A. et al. Sensitive Detection of Minimal Residual Disease in Patients Treated for Early-Stage Breast Cancer. Clin. Cancer Res. 26, 2556–2564 (2020).

[0245] 17. Xiong, K. et al. Duplex-Repair enables highly accurate sequencing, despite DNA damage. Nucleic Acids Res. 50, e1 (2022).

[0246] Example 2. The impact of higher cell-free DNA yield on liquid biopsy assays for glioblastoma.

[0247] Background: Minimally invasive molecular profiling using cell-free DNA (cfDNA) is increasingly important for the management of cancer patients; however, low sensitivity remains a major limitation, especially for patients with brain tumors. Temporarily reducing the clearance of cfDNA in vivo, thus allowing more cfDNA to be sampled, has been proposed to improve the performance of liquid biopsy diagnostics. However, clinical data on the effects of higher cfDNA yields are lacking. Here, we investigated the effect of collecting more cfDNA on the sensitivity of circulating tumor DNA (ctDNA) detection in glioblastoma, a 'low-shedding' cancer type, by analyzing plasma volumes up to ~15 times larger than those routinely collected in the clinic.

[0248] Methods: Seventy plasma samples (median 17.0 mL, range 2.5–66.5 mL) from eight patients with IDH wild-type glioblastoma were tested using an optimized version of the MAESTRO-Pool ctDNA assay. Results were compared with simulated single blood collection tube equivalents of cfDNA. ctDNA results were then compared with MRI and pathological assessments of true progression versus pseudoprogression in glioblastoma patients.

[0249] Results: Larger cfDNA yields resulted in a doubling of ctDNA positivity while achieving 99% median specificity and more precise ctDNA quantification. ctDNA was detected in 7 of 8 patients with glioblastoma (88%), including at multiple time points in 6 / 7. In cases where MRI-determined progression is indeterminate, the data suggest that MAESTRO-Pool testing of large plasma volumes can help differentiate true glioblastoma progression from pseudoprogression.

[0250] Conclusion: These findings provide a proof-of-principle demonstration that most glioblastomas shed ctDNA into the plasma, and that greater ctDNA yield could help improve liquid biopsies for “low-shedding” cancer types such as glioblastoma.

[0251] Importance of the Study

[0252] Liquid biopsy holds promise for improving cancer detection and monitoring, but is limited by the scarcity of circulating tumor DNA (ctDNA) in the blood. IDH wild-type glioblastoma is considered a “low-shedding” cancer, with current technologies rarely detecting its ctDNA. The ultrasensitive MAESTRO liquid biopsy was optimized for large plasma volumes (up to 66.5 mL, ~15 times the typical volume) from glioblastoma patients. The larger cfDNA yield doubled ctDNA positivity compared to typical plasma volumes, while achieving a median specificity of 99%. ctDNA was detected in 7 / 8 glioblastoma patients, including at multiple time points in 6 / 7. Compared to MRI and pathology, ctDNA from larger plasma volumes helps improve the sensitivity and specificity for distinguishing glioblastoma progression from pseudoprogression. These findings provide a proof-of-principle demonstration that most glioblastomas shed ctDNA into the plasma, and that greater ctDNA yield can help improve liquid biopsy for “low-shedding” cancers such as glioblastoma.

[0253] Introduction

[0254] Advances in cell-free DNA (cfDNA) diagnostics have enabled minimally invasive cancer diagnosis, identification of therapeutic targets and drug resistance, and monitoring of minimal / measurable residual disease (MRD), all from a simple blood draw. However, in many cancer contexts where plasma circulating tumor DNA (ctDNA) levels are often below 0.01–0.1% of total cfDNA, higher sensitivity is required. To enhance the sensitivity of ctDNA testing, research has primarily focused on tracking more somatic variants and integrating other features (such as DNA methylation or fragmentation patterns). However, at such low levels of ctDNA, stochastic sampling of the tumor genome in blood limits the ability of such assays to sensitively detect, monitor, and genotype cancer from ctDNA. To help overcome these challenges, temporarily reducing in vivo cfDNA clearance has recently been proposed as a way to recover more cfDNA from a patient's blood and thus improve liquid biopsy testing. Specifically, two complementary approaches have previously been developed: anti-DNA monoclonal antibodies and liposomes, both of which can temporarily reduce cfDNA clearance. While both methods improve the sensitivity of cancer detection in mice, there are limited clinical data on the impact of higher cfDNA yield on liquid biopsy performance, because the blood volume available in clinical settings is typically limited.

[0255] To answer this pressing question, the custom (i.e., tumor-informed) MAESTRO-Pool MRD assay from Example 1 was further optimized to investigate the role of higher cfDNA yields from larger plasma volumes in tumor-informed ctDNA detection. This assay uses the MAESTRO (minor allele enriched sequencing through recognition oligonucleotides) method, as applied in Example 1 and in this example. MAESTRO is well suited to analyzing large volumes of blood because it depletes the large excess of normal cfDNA in the blood, thus significantly reducing sequencing costs. As described herein, MAESTRO-Pool allows MAESTRO assays from multiple patients to be pooled into a single assay that is applied to samples from all patients, enabling simultaneous detection of MRD and the use of unmatched patient samples as controls to assess specificity.

[0256] The cohort of patients with isocitrate dehydrogenase (IDH)-wildtype glioblastoma enrolled in an ongoing cancer vaccine trial (NCT02287428) is a particularly informative setting for this study because of several unique characteristics. First, glioblastoma is an aggressive primary brain tumor that affects 17 people per million annually, with a median survival of only 12 to 17 months. By the time symptoms appear, glioblastoma can already be seen under a microscope to have extensively infiltrated the brain. Consequently, even if surgical resection and radiotherapy / chemotherapy can treat the imaging-visible tumor, the remaining tumor cells inevitably lead to tumor progression. Second, the blood-brain barrier and the unique drainage mechanisms of cerebral interstitial fluid are believed to be the reason why glioblastoma is classified as a "low-shedding" ctDNA cancer type, for which existing plasma-based ctDNA assays have relatively low sensitivity. Third, this cohort participated in a clinical trial that included serial large-volume peripheral blood collection and rapid plasma storage, providing a broad range of cfDNA yields for analysis. Finally, contrast-enhanced magnetic resonance imaging (MRI), currently the gold standard for glioblastoma monitoring and response assessment, is often limited by low clinical sensitivity and specificity. While MRI is crucial for assessing glioblastoma tumor progression, its interpretation is complicated because treatment response, inflammatory response, and radiation necrosis can all present with radiographic features similar to tumor progression (i.e., pseudoprogression). Therefore, patients with glioblastoma urgently need a diagnostic method that can differentiate between radiographic pseudoprogression and true tumor progression. Surgical pathology can definitively distinguish tumor progression from pseudoprogression, but repeated surgeries are often impractical because glioblastoma resides within delicate brain structures, which highlights the need for minimally invasive techniques for longitudinal tumor assessment. To improve the performance of the MAESTRO-Pool MRD assay in this setting, a novel MRD detection algorithm was developed that incorporates sample-specific and context-specific background single nucleotide variant (SNV) frequencies into the dynamic MRD detection algorithm described in Example 1. This study demonstrates that higher cfDNA yield can improve ctDNA testing in low-shedding cancers.

[0257] Methods

[0258] Patients and samples

[0259] Tissue and blood samples were collected from patients as part of a clinical trial (NCT02287428), while ctDNA analysis was performed outside the scope of the trial's objectives. One requirement for participation in this clinical trial was gross resection of the tumor, which generated a uniform baseline across the eight patients, with no gross evidence of tumor involvement. All patients provided written informed consent, and tissue, blood, and clinical data were collected and analyzed using a protocol approved by the Dana-Farber / Harvard Cancer Center Institutional Review Board and in accordance with the Declaration of Helsinki.

[0260] Tumor samples were formalin-fixed and paraffin-embedded (FFPE) after surgery. Peripheral blood samples used as a source of germline genetic information were frozen upon collection. Peripheral blood samples from which plasma was obtained were collected in K2 EDTA or Streck BCT tubes and processed within 4 hours. Plasma was carefully separated from peripheral blood by centrifugation at 1,500 to 1,800 g for 15 minutes at room temperature, followed by a second centrifugation at ≥2,500 g for 10 minutes at room temperature. Plasma was stored at -80 °C until thawed. For one of the eight patients (GBM_7), plasma samples were not available during the first 450 days of tumor treatment. For the remaining seven patients, plasma samples were collected throughout their treatment. As part of the clinical trial protocol, patients underwent MRI approximately every 1 to 3 months, and tumor progression versus pseudoprogression was confirmed by surgical pathology when clinically indicated. The presence of tumor progression on MRI and pathology was determined clinically by neuroradiologists, neuro-oncologists, and neuropathologists using the criteria described in Table 1, without knowledge of the MRD results.

[0261] Whole-genome sequencing (WGS) of tumor and normal samples

[0262] To identify tumor-specific mutations, tumor and normal genomic DNA were extracted from FFPE tissue and peripheral blood, respectively, at the Broad Institute Genomics Platform, and whole-genome sequencing (WGS) was performed on tumor DNA (60x mean coverage) and normal DNA (15x mean coverage) using NovaSeq S4. Somatic variants were detected from the tumor and normal WGS data using the GATK best practice Mutect2 workflow. MAESTRO fingerprints and MRD Tracker fingerprints were designed as described herein.

[0263] Plasma sample processing and liquid biopsy determination

[0264] cfDNA was extracted from plasma using the QIAsymphony Circulating DNA Kit and quantified using the Quant-iT PicoGreen assay on a Hamilton STAR-line liquid handling system. As previously described, cfDNA libraries were constructed using the Kapa HyperPrep Kit and custom-designed dual-indexed duplex UMI adapters (IDT). The prepared libraries were then quantified using the Quant-iT PicoGreen assay on a Hamilton STAR-line liquid handling system. The MAESTRO-Pool and MRD Tracker assays followed the methods described in Example 1. Details are provided in the supplementary methods of Example 2.

[0265] MRD Data Analysis

[0266] The returned FASTQ data were first aligned, deduplicated, and recalibrated following the GATK best practice workflow for "Data pre-processing for variant discovery." Next, UMIs were extracted using fgbio, the raw sequencing reads were converted into duplex consensus molecules, and fragment-level and site-level filtering was performed on the duplex molecules as described herein. Notably, the data were limited to tumor-verified sites with ≥1 mutant duplex in the patient-matched tumor and 0 mutant duplexes in the normal control. One variation of the MAESTRO-Pool processing workflow was the addition of a site-level outlier filter, because several low tumor fraction samples contained a site with an abnormally high number of mutant duplexes. Outlier filtering was applied to low tumor fraction samples (≤10 ppm) because each site is expected to contain only 0 or 1 ctDNA molecule in the low tumor fraction regime. To prevent potential false positives, each site in the low tumor fraction samples was required to pass the following binomial test with a p-value ≥ 0.01 (with multiple hypothesis correction):

[0267] ● $\mathrm{Bin}(\mathrm{ALT}_i,\ D_i,\ \mathrm{TFx}_{\mathrm{LOO}}) \geq 0.01 / n$

[0268] ○ $\mathrm{ALT}_i$: the number of ALT duplexes at site $i$

[0269] ○ $D_i$: estimated duplex depth at site $i$

[0270] ○ $\mathrm{TFx}_{\mathrm{LOO}}$: tumor fraction estimated with site $i$ excluded (leave-one-out method)

[0271] ○ $n$: number of tumor-verified sites in the fingerprint
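
The site-level outlier filter above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the disclosed implementation: the text does not say whether Bin(...) denotes a tail probability or a point probability, so the sketch assumes the upper tail P(X ≥ ALT_i); it assumes the leave-one-out tumor fraction is the pooled mutant duplexes divided by the pooled depth over all other sites; and it treats duplex depths as integers. The function names `binom_upper_tail` and `outlier_sites` are hypothetical.

```python
import math

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for j in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * math.log(p) + (n - j) * math.log1p(-p))
        term = math.exp(log_pmf)
        total += term
        if term < total * 1e-17:  # remaining terms no longer change the sum
            break
    return min(total, 1.0)

def outlier_sites(alt, depth, alpha=0.01):
    """Return indices of sites failing the test, i.e. p-value < alpha / n.

    alt[i]   -- ALT (mutant) duplexes at site i
    depth[i] -- duplex depth at site i (assumed integer here)
    """
    n_sites = len(alt)
    threshold = alpha / n_sites          # multiple-hypothesis correction from the text
    total_alt, total_depth = sum(alt), sum(depth)
    flagged = []
    for i in range(n_sites):
        rest_depth = total_depth - depth[i]
        # Assumed leave-one-out tumor fraction: pooled over every other site.
        tf_loo = (total_alt - alt[i]) / rest_depth if rest_depth > 0 else 0.0
        if binom_upper_tail(alt[i], depth[i], tf_loo) < threshold:
            flagged.append(i)
    return flagged

# Hypothetical sample: 1,000 sites at depth 5,000, three sites with one
# mutant duplex each and one site with six.
alt = [0] * 1000
alt[0] = alt[1] = alt[2] = 1
alt[3] = 6
depth = [5000] * 1000
print(outlier_sites(alt, depth))
```

Under these assumptions a site carrying a single mutant duplex passes, while a site with several mutant duplexes in a sample whose overall tumor fraction is near 1 ppm is flagged. A sample whose only mutant duplex sits at one site gets a leave-one-out fraction of zero and would be flagged by this sketch; how the disclosed workflow treats that case is not stated here.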

[0272] Only duplex molecules that pass through all applicable filters are included in downstream analyses, including limit of detection estimation, MRD detection, and tumor fraction estimation. These methods follow the protocols described in Examples 1 and 2. Detailed information on estimating sample-specific SNV frequencies and dynamic MRD detector noise tuning is provided in the supplementary methods section of Example 2.

[0273] Predicted MRD results for single blood collection tube equivalents

[0274] To assess the impact of large plasma volume collection on MRD results, in silico downsampling was performed on MAESTRO results to simulate the expected MRD results of a typical plasma collection. 4 mL was chosen to represent a typical plasma collection, assuming a typical blood draw volume of ~10 mL and a plasma yield of ~40%. A normalization factor was calculated based on the plasma volume of each sample, and this normalization factor was used to randomly sample, from a binomial distribution, the duplexes observed at each site. The downsampled duplexes were then used to detect MRD and estimate LOD95 and tumor fraction. This process was repeated 50 times for each sample to obtain the distribution of random downsamplings. The predicted LOD95 was the median LOD95 across the random downsamplings, and the predicted tumor fraction (and error bars) was the median tumor fraction (and error bars) across the MRD-positive downsamplings. Finally, this distribution was used to estimate the MRD likelihood by calculating the proportion of downsamplings detected as MRD positive.
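
A minimal Python sketch of this downsampling procedure is given below. It is an illustration under assumptions: the normalization factor is taken to be 4 mL divided by the sample's plasma volume (capped at 1), each duplex is kept independently with that probability, and the MRD detector is passed in as a callable because the dynamic detector itself is not specified in this section. The names `downsample_sites`, `mrd_likelihood`, and `toy_detector` are hypothetical, and `toy_detector` is only a stand-in for the real detector.

```python
import random

TYPICAL_PLASMA_ML = 4.0  # typical plasma collection stated in the text

def binomial_draw(n, p, rng):
    """Number of successes in n independent trials with success probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)

def downsample_sites(sites, plasma_ml, rng):
    """Binomially thin each site's duplexes to a 4 mL equivalent.

    sites -- list of (mutant_duplexes, total_duplexes) per site
    """
    factor = min(1.0, TYPICAL_PLASMA_ML / plasma_ml)  # assumed normalization factor
    thinned = []
    for mutant, total in sites:
        kept_mutant = binomial_draw(mutant, factor, rng)
        kept_wildtype = binomial_draw(total - mutant, factor, rng)
        thinned.append((kept_mutant, kept_mutant + kept_wildtype))
    return thinned

def mrd_likelihood(sites, plasma_ml, is_mrd_positive, n_rep=50, seed=0):
    """Fraction of n_rep random downsamplings that the detector calls MRD positive."""
    rng = random.Random(seed)
    hits = sum(bool(is_mrd_positive(downsample_sites(sites, plasma_ml, rng)))
               for _ in range(n_rep))
    return hits / n_rep

def toy_detector(sites, min_mutants=2):
    """Hypothetical stand-in: positive if at least min_mutants mutant duplexes remain."""
    return sum(mutant for mutant, _ in sites) >= min_mutants

# Hypothetical sample: 23 sites at depth 200, three with one mutant duplex each.
sites = [(1, 200)] * 3 + [(0, 200)] * 20
print(mrd_likelihood(sites, plasma_ml=40.0, is_mrd_positive=toy_detector))
```

Passing the detector as a parameter keeps the thinning step separate from the calling logic, so the same loop could also collect the LOD95 and tumor fraction of each downsampling and report their medians.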

[0275] Results

[0276] The increased cfDNA yield resulting from larger plasma volumes improves the analytical sensitivity of ctDNA MRD assays.

[0277] To investigate the effect of larger plasma volumes, and thus higher cfDNA yields, on the analytical sensitivity of ctDNA MRD assays, 70 plasma samples collected from 8 glioblastoma patients (median 8 per patient, range 4 to 13) were evaluated. Plasma samples of known volume were classified as typical volumes (i.e., <10 mL obtained from 1 to 2 blood collection tubes; n = 30, median 7.35 mL, range 2.5 to 9.7 mL) or large volumes (i.e., ≥10 mL; n = 32, median 34.0 mL, range 14.0 to 66.5 mL). Based on whole-genome sequencing of each patient's tumor and germline, tumor-specific SNVs were identified, from which a custom mutation enrichment MAESTRO assay was designed. MAESTRO assays targeted a median of 1,225 tumor-specific SNVs per patient (range 726 to 5,007; Figure 19). The MAESTRO assays from each of the eight patients were then combined into a pooled assay called MAESTRO-Pool, targeting a total of 13,505 SNVs, as described in Example 1, and MAESTRO-Pool was applied to all plasma samples from all patients. MAESTRO-Pool enabled simultaneous assessment of MRD and of the specificity of each custom assay by evaluating each sample using either its patient-matched set of tumor targets (i.e., fingerprint) or patient-unmatched tumor fingerprints. For orthogonal validation, custom MRD Tracker assays that did not involve mutation-enriched sequencing were used (Figure 19); because of their high sequencing requirements, these were applied only to each patient's own plasma samples.

[0278] A correlation was observed between cfDNA yield and plasma volume (Pearson r = 0.42, p = 7.5 × 10⁻⁴; Figure 15A). Furthermore, the measured duplex depth (i.e., the average number of DNA duplexes recovered per locus) was also correlated with plasma volume (Pearson r = 0.54, p = 5.6 × 10⁻⁶; Figure 20A) and cfDNA input (Pearson r = 0.88, p = 2.3 × 10⁻²³; Figure 20B). Based on the total duplexes evaluated for each matched tumor fingerprint, the limits of detection (LODs) for all samples (i.e., the tumor fraction detectable with 95% power [LOD95]) were estimated, and larger-volume samples (median LOD95 = 1.9 parts per million (ppm), range 0.7–24.1 ppm) were found to have a lower LOD95 than typical-volume samples (median LOD95 = 5.9 ppm, range 1.0–190.3 ppm; Mann-Whitney U p = 2.3 × 10⁻⁶; Figure 15B). In contrast, had only single blood collection tube equivalents (i.e., 4 mL plasma; Methods) been collected, the LOD95 would have been a median of 5.0-fold higher (range 2.4 to 11.9; Figure 15C).
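
To make the dependence of LOD95 on the number of evaluated duplexes concrete, the short Python sketch below uses a deliberately simplified model that is not taken from this disclosure: it assumes MRD is called whenever at least one mutant duplex is observed among N evaluated duplexes, each of which independently carries a tumor mutation with probability equal to the tumor fraction. The LOD95 estimation described in Example 1 may differ; the function name and the duplex counts are hypothetical.

```python
import math

def lod95_one_molecule(total_duplexes, power=0.95):
    """Tumor fraction at which >= 1 mutant duplex is seen with the requested power.

    Solves 1 - (1 - tf) ** N = power for tf; expm1 keeps precision for tiny tf.
    """
    return -math.expm1(math.log(1.0 - power) / total_duplexes)

full = lod95_one_molecule(1_500_000)       # hypothetical total duplexes, full volume
reduced = lod95_one_molecule(300_000)      # same sample with one fifth of the duplexes
print(f"full volume: {full * 1e6:.2f} ppm, reduced volume: {reduced * 1e6:.2f} ppm")
print(f"fold change: {reduced / full:.2f}")
```

In this simplified model the LOD95 is close to 3 / N, so it scales inversely with the number of duplexes evaluated, which matches the direction of the volume effect reported above.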

[0279] Sample-specific and context-specific background noise tuning increases the specificity of MRD detection in MAESTRO assays.

[0280] To detect MRD using MAESTRO-Pool, the empirically validated dynamic MRD detector described in Example 1 was first applied. This detector uses sample-specific properties (such as the observed tumor fraction, the total duplexes assessed, and the fingerprint size) to estimate the likelihood that the observed MRD signal is a genuine tumor signal rather than an artifact of the background SNV frequency observed in cfDNA. As in Example 1, a uniform background SNV frequency of 0.1 ppm was assumed for all samples. In patient-matched samples, this resulted in 20 MRD-positive detections out of 70 samples, with a median tumor fraction of 3.9 ppm (range 0.8 to 254.1 ppm; Figure 16A and Figure 16E). Of the 66 samples also tested with the MRD Tracker, 63 had consistent MRD detection results and tumor fractions (Figures 21A to 21B). However, among the patient-unmatched samples, 20 out of 490 (4.1%) were incorrectly detected as MRD positive, with a median tumor fraction of 0.9 ppm (range 0.5 to 4.0 ppm; Figure 16A, Figure 16E and Figures 22A to 22B). Notably, the MAESTRO fingerprint of patient GBM_4 was the largest (n = 4,781) due to the high mutation rate associated with Lynch syndrome, but it had the lowest specificity (specificity = 0.81; Figure 22A) and accounted for the majority of false-positive MRD detections (n = 11 / 20). Since larger plasma volumes were the primary difference between Example 1 and Example 2, it was first investigated whether the false positives were due to larger plasma volumes. In the 20 MRD-positive unmatched samples, the plasma volume was not significantly larger than in their MRD-negative counterparts (Mann-Whitney U p = 0.27; Figure 16B), indicating that the false positives were not caused by large plasma volumes.
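
The false-positive rate quoted above can be checked with a few lines of Python. The sketch assumes that the 490 unmatched comparisons correspond to each of the 70 samples being evaluated against the 7 fingerprints of the other patients, which is consistent with the numbers in the text but is not stated explicitly; the function name `specificity` is hypothetical.

```python
def specificity(false_positives, negatives_tested):
    """Experimental specificity: fraction of true-negative tests called negative."""
    return 1.0 - false_positives / negatives_tested

n_samples, n_patients = 70, 8
unmatched_tests = n_samples * (n_patients - 1)   # assumed origin of the 490 comparisons
false_positives = 20

fp_rate = false_positives / unmatched_tests
print(f"{unmatched_tests} unmatched tests, false-positive rate {fp_rate:.1%}, "
      f"specificity {specificity(false_positives, unmatched_tests):.1%}")
# prints: 490 unmatched tests, false-positive rate 4.1%, specificity 95.9%
```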

[0281] It was next hypothesized that the background SNV frequency of some cfDNA samples was higher than the assumed 0.1 ppm, causing the dynamic detector to overestimate the probability of MRD. To investigate this, the sample-specific SNV frequency was quantified from MAESTRO-Pool data (Figures 23A to 23C; Methods), which confirmed that the background SNV frequency of multiple samples was higher than 0.1 ppm (Figure 23A). This finding was quite unexpected, as 87 out of 98 samples (89%) from Example 1 showed background SNV frequencies of ~0.1 ppm or lower (Figure 24). Notably, samples from the patients with the highest median SNV frequency (GBM_9 = 0.5 ppm, GBM_7 = 0.4 ppm, GBM_2 = 0.4 ppm) accounted for 18 out of 20 false positives (Figure 22B). To understand the source of the noise, SNV frequencies were broken down by mutation context (i.e., C>G, T>A, T>C, T>G). This revealed that the elevated SNV frequencies in patients GBM_2, GBM_7, and GBM_9 were due in particular to their higher T>C frequencies (Figure 16A and Figures 25A to 25B). MRD-positive unmatched samples showed a significantly higher T>C frequency than MRD-negative unmatched samples (Mann-Whitney U p = 5.1 × 10⁻⁷; Figure 16B). As orthogonal validation, the background SNV frequency was estimated for the same sequencing libraries using the MRD Tracker. The estimated SNV frequencies from the MRD Tracker were highly consistent with those from MAESTRO-Pool, confirming an increased T>C frequency in many samples (Figures 23A to 23C and Figure 25B).

[0282] Notably, examination of their clinical characteristics revealed that these were the only glioblastomas in the cohort with MGMT promoter methylation that had been treated with temozolomide alkylating chemotherapy. In glioblastomas with MGMT promoter methylation, temozolomide can induce acquired mismatch repair deficiency. Given that mismatch repair defects are associated with T>C mutation patterns, these findings may suggest that previous use of temozolomide was a source of the observed T>C error rate, although further investigation is needed. Although temozolomide induces a C>T mutation signature, the estimated background C>T frequency in patients GBM_7, GBM_9, and GBM_2 was not disproportionately increased (Figures 25A to 25B). However, C>T mutations inherently have a higher background SNV frequency (~1 ppm vs. 0.1 ppm) than other mutation contexts, so differences may be more difficult to resolve.

[0283] Based on the above observations of sample-specific and context-specific SNV frequencies, a novel MRD detection strategy was developed in which the assumed SNV frequency of the dynamic detector is tuned using sample-specific and context-specific SNV frequencies (Figure 16C). More specifically, the dynamic detector was adapted to allow independent tuning and weighting for each mutation context (Methods). This approach effectively suppressed false-positive MRD detections in most patient-unmatched samples (16 out of 20; Figures 16D to 16E). Furthermore, it improved the specificity of most MAESTRO panels to nearly 100% (median specificity = 99%, range 94-100%; Figures 26A to 26C). Notably, tuning introduced 3 false positives: in these cases the measured SNV frequency was below 0.1 ppm, which raised the dynamic probability score of the detector above 0.95. Furthermore, tuning the dynamic detector had minimal impact on MRD detection in patient-matched samples. Of the 20 untuned MRD positives, 19 remained after tuning, and one additional MRD positive was detected with tuning (Figures 16D to 16E and Figures 26A to 26C). Notably, for the sample that became MRD positive with tuning (day 147 of GBM_32; Figure 27), the presence of another MRD-positive sample with a similar tumor fraction 14 days later further supports the existence of MRD and highlights the influence of sample-specific and context-specific tuning.
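
The effect of context-specific tuning can be illustrated with a simplified Python sketch. This is not the disclosed dynamic detector, whose probability score is defined in Example 1 and the supplementary methods; it only assumes that background mutant duplexes follow a Poisson distribution whose mean is the sum, over mutation contexts, of the duplexes evaluated in that context multiplied by that context's background SNV frequency. The duplex counts and the tuned T>C frequency are hypothetical values chosen for illustration, and the function names are hypothetical as well.

```python
import math

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** j / math.factorial(j) for j in range(k))
    return max(0.0, 1.0 - cdf)

def background_tail(observed_mutants, duplexes_by_context, freq_by_context,
                    default_freq=0.1e-6):
    """Chance that background noise alone yields >= observed_mutants mutant duplexes.

    Contexts missing from freq_by_context fall back to the uniform 0.1 ppm assumption.
    """
    expected = sum(n * freq_by_context.get(context, default_freq)
                   for context, n in duplexes_by_context.items())
    return poisson_tail(observed_mutants, expected)

# Hypothetical duplex totals per mutation context for one unmatched comparison.
duplexes = {"T>C": 2_000_000, "C>G": 1_000_000, "T>A": 1_000_000, "T>G": 1_000_000}

uniform = background_tail(3, duplexes, {})               # 0.1 ppm for every context
tuned = background_tail(3, duplexes, {"T>C": 1.0e-6})    # hypothetical elevated T>C noise
print(f"uniform background: {uniform:.3f}, T>C-tuned background: {tuned:.3f}")
```

With a uniform 0.1 ppm assumption, three mutant duplexes look unlikely to be noise, whereas with an elevated T>C frequency the same observation is readily explained by background. This is the direction of the correction described above: raising the assumed noise for a noisy context makes the detector less willing to call MRD from the same signal.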

[0284] Larger plasma volume collection improves the sensitivity of MRD detection and the accuracy of tumor fraction estimation.

[0285] Among all MRD-positive samples, lower tumor fractions were detected in larger plasma volumes (median tumor fraction = 2.7 ppm, range 0.8 to 7.6 ppm) than in typical volumes (median tumor fraction = 5.9 ppm, range 3.5 to 254.1 ppm; Mann-Whitney U p = 3.6 × 10⁻³; Figure 17A). To further evaluate the impact of large plasma volume collection on MRD detection, data were simulated for single blood collection tube equivalents (i.e., 4 mL plasma) by downsampling DNA duplexes recovered from MRD-positive samples with >10 mL plasma. The process was simulated 50 times per sample to allow estimation of the likelihood of retaining MRD-positive status. Of the 10 MRD-positive samples with >10 mL plasma, n = 3, n = 1, and n = 6 samples were detected as MRD positive in ≥95%, 50%–95%, and <50% of the simulated tube equivalents, respectively (Figure 17B). Of the six samples that were detected as MRD positive in <50% of the simulations, five had ≥50% power to detect the observed tumor fraction when the whole plasma volume was used; these were described as 'made possible by large-volume collection' and doubled the ctDNA positivity rate (Figure 17B). These samples tended to have larger plasma volumes (median 24.2 mL, range 7.5 to 34.0 mL), greater cfDNA yields (median 118.0 ng, range 33.8 to 243.9 ng), and lower tumor fractions (median 2.2 ppm, range 1.2 to 4.1 ppm), suggesting that higher cfDNA yields (i.e., from larger collected volumes) improve the ability to detect low-ppm levels of ctDNA. The effect of plasma volume on tumor fraction estimation was also examined. As expected, tumor fraction estimates differed between large-volume samples and their tube equivalents (median error for tube equivalents = 114.5%, range 2.0 to 521.3%; Figure 17C), with the greatest difference observed for the lowest tumor fractions. Furthermore, simulated tube equivalents showed wider confidence intervals (the median confidence interval ratio increased 3.0-fold, range 1.8 to 3.8; Figure 17C).

[0286] Association between MRD testing and clinical response in glioblastoma patients

[0287] After determining the performance characteristics of the MAESTRO-Pool MRD assay, a secondary analysis was conducted to evaluate whether MAESTRO-Pool could help differentiate radiographic pseudoprogression from true tumor progression in the clinical context. The performance of the ctDNA-based assay was compared with that of standard-of-care MRI and histopathology for glioblastoma (Table 1).

[0288] Overall, 7 out of 8 glioblastoma patients (87.5%) had detectable MRD at one or more time points, with the 8th patient showing a borderline MRD detection result (0.75 ≤ P ≤ 0.95; Figures 18A to 18D). Of the seven patients whose plasma was collected within three weeks of their initial surgery, plasma MRD was detected in five (71.4%). Three patterns of association were identified among ctDNA testing results, radiological assessments, and pathological assessments in seven of the eight patients, as described below. The lower frequency of MRI in the eighth patient (GBM_32) complicated the comparison between radiological findings and ctDNA testing results (Figure 27).

[0289] In 'Pattern 1', MRD positivity preceded histologically confirmed tumor progression while concurrent radiological findings were indeterminate (i.e., true progression versus pseudoprogression; GBM_2, GBM_6; Figure 18A). In GBM_2, MRI during immunotherapy and chemotherapy (day 191) reported indeterminate progression (i.e., pseudoprogression versus tumor progression); however, MRD was positive at three proximal plasma time points (days 170, 184, and 212; tumor fractions 1.2 to 2.6 ppm; Figure 18A, left). These findings were consistent with subsequent MRI (day 233) and surgical resection (day 241), both of which confirmed tumor progression, indicating that MRD positivity preceded radiographic confirmation of tumor progression by 63 days. Based on a simulation using a 4 mL plasma volume, detection at all three MRD-positive time points was only possible with larger plasma volumes. Conversely, a typical-volume plasma sample collected three days prior to surgery (day 238) showed no detectable MRD, further supporting the hypothesis that larger volumes can circumvent false negatives. Surgery achieved only tumor debulking, and subsequent imaging showed continued tumor progression (days 254 to 336), consistent with MRD detection on day 336 (the last plasma time point before the patient's death). Similarly, GBM_6 was reported to have indeterminate progression on imaging after radiotherapy (days 104 to 165), followed by tumor progression reported on MRI between days 214 and 256; however, MAESTRO-Pool detected MRD (tumor fraction 2.7 to 7.6 ppm) between days 165 and 214 (Figure 18A, right), 49 days before radiographic confirmation of tumor progression. As in GBM_2, MRD detection at the earlier time point (day 165) was made possible by the larger plasma volume. Although interim MRI reported persistent tumor progression, MRD was not detected in GBM_6 at subsequent time points (days 235 and 276). These findings suggest that large-volume MAESTRO-Pool MRD testing, combined with standard contrast-enhanced MRI, may play a role in the sensitive and early detection of true glioblastoma progression.

[0290] In 'Mode 2', despite multiple concurrent post-radiotherapy MRI reports of indeterminate progression, negative MRD based on large plasma volume analysis was associated with a lack of tumor progression (GBM_5, GBM_8; Figure 18B). Both patients subsequently underwent surgical resection because of the indeterminate progression; pathology reports showed <50% viable tumor and no histological evidence of progression, confirming that the MRI findings were most consistent with pseudoprogression. Furthermore, for GBM_5, MRI reported tumor progression 11 days before the pathological finding of pseudoprogression (day 259), whereas the corresponding plasma time point (day 259) was MRD negative. In summary, these results suggest that large-volume MRD testing, combined with standard contrast-enhanced MRI, may play a role in the specific diagnosis of pseudoprogression, potentially sparing symptomatically stable patients from repeat surgery. A similar pattern was observed in GBM_4, where persistently negative postoperative MRD was consistent with the lack of imaging evidence of tumor progression (Figure 18C). However, this patient had the distinctive features of germline mismatch repair deficiency and a durable response to immune checkpoint blockade, with small nodular enhancements around the surgical cavity that slowly resolved over time.

[0291] Finally, in 'Mode 3', there was radiological and / or pathological evidence of tumor progression, but no MRD was detected at the preceding time points (GBM_7, GBM_9; Figure 18D). The negative time points may be partly attributed to the need for greater T>C noise tuning, potentially due to MGMT promoter methylation and / or the effects of temozolomide in these tumors. Furthermore, MAESTRO-Pool was underpowered to detect MRD at the perioperative time points in GBM_9 (LOD95 > 100 ppm) because a typical plasma volume yielded little cfDNA (~3 ng), and the plasma volumes in GBM_7 were not large.

[0292] Discussion

[0293] Liquid biopsy holds promise for improving cancer detection and surveillance but remains limited by the scarcity of ctDNA in blood. Transiently reducing cfDNA clearance has been proposed as a way to improve ctDNA recovery and enhance the performance of liquid biopsy. Indeed, the first intravenous priming agents were recently described; these transiently reduced cfDNA clearance and increased ctDNA recovery by up to 60-fold in mice. Priming agents could therefore enable greater ctDNA recovery without requiring excessive blood volumes. However, priming agents are not yet available for human testing, and because blood volumes drawn in routine clinical practice are limited (i.e., one or two 10 mL tubes), clinical data on assay performance at higher cfDNA yields are scarce. Here, blood volumes up to ~15 times greater than a single-tube blood draw were analyzed, providing preliminary insights into the impact of higher cfDNA yields. MAESTRO-Pool was also optimized and applied to use higher cfDNA input and to prevent false detections due to intrinsic sample variability.

[0294] MAESTRO uses probes that preferentially hybridize to mutant DNA molecules, enriching their abundance after hybridization capture, to target thousands of genome-wide single nucleotide variants (SNVs) in each patient, thereby reducing the cost of duplex sequencing by up to one hundred-fold. Duplex sequencing (in which both strands of a DNA fragment are barcoded and only mutations present on both strands are counted) significantly improves sequencing accuracy. Thus, MAESTRO is well suited for tracking large numbers of mutations and utilizing all available cfDNA. To further improve MAESTRO's specificity, a novel dynamic MRD detector was developed that incorporates sample-specific background SNV frequencies. This helps prevent false detections in "edge cases" with atypical background SNV frequencies, as observed in plasma samples from three patients with MGMT promoter methylation treated with temozolomide, in whom the presumed elevated T>C background SNV frequencies might be treatment-related. Furthermore, having MAESTRO-Pool data from both patient-matched and patient-unmatched fingerprints allows sufficient cfDNA duplexes to be sampled for accurate quantification of low-level (<1 ppm) sample-specific background SNV frequencies.

[0295] IDH wild-type glioblastoma was chosen as an ideal research context because it is typically classified as a "low-shedding" tumor type and because the need for minimally invasive diagnostics that can provide longitudinal and accurate monitoring of disease status remains unmet. The unique biology of the blood-brain barrier and of cerebral interstitial fluid drainage is presumed to limit the presence of glioblastoma ctDNA in plasma. This theory is supported by the high rate of glioblastoma ctDNA detection in cerebrospinal fluid (CSF) studies (for example, one study using mutation-specific droplet digital PCR found detectable ctDNA in the CSF of all nine evaluated glioblastoma patients), whereas detection rates in several plasma studies have been low. In another example, among 11 cases of CSF ctDNA-positive IDH wild-type glioblastoma, only 1 case (11%) also showed detectable ctDNA in plasma. Similarly, early pan-cancer studies using typical one- to two-tube blood draws detected plasma ctDNA in fewer than 30% of glioblastoma patients. Finally, in another study of 20 patients with high-grade gliomas, mutations were detected in preoperative plasma in 55% of cases using the non-personalized, NGS-based Guardant360 ctDNA assay, but none of the mutations were shared with the corresponding tumor tissue sequencing, suggesting that many could represent false positives (e.g., from clonal hematopoiesis).

[0296] In contrast, tumor-specific ctDNA was detected in 71% (5 / 7) of glioblastoma patients within 3 weeks of initial surgery and in 88% (7 / 8) of glioblastoma patients overall, including detection at multiple time points in 75% (6 / 8) of patients. Together, these findings suggest that recovering more cfDNA than is typically collected in routine clinical practice can help improve ctDNA detection. Accounting for the variation in background SNV frequencies between samples also reduced the number and proportion of false-positive patient-unmatched tests without affecting sensitivity. This improved performance characteristic is increasingly important as research in the liquid biopsy field evaluates higher cfDNA yields per sample and seeks more accurate detection of ctDNA at the parts-per-million level.

[0297] As a secondary analysis, the utility of MAESTRO-Pool in distinguishing true tumor progression from pseudoprogression (which are difficult to tell apart using contemporary MRI techniques) was evaluated in eight patients with glioblastoma. In cases where progression is indeterminate on contrast-enhanced MRI, the data suggest that MAESTRO-Pool applied to large plasma volumes may improve the sensitivity of detecting true progression (Mode 1) and the specificity of identifying pseudoprogression (Mode 2). Early detection of tumor progression is crucial for identifying glioblastoma patients who are no longer responding to current therapies and may benefit from participation in new clinical trials, while accurate diagnosis of pseudoprogression can help spare symptomatically stable glioblastoma patients from unnecessary surgery.

[0298] In summary, these findings suggest that increased cfDNA recovery can improve liquid biopsy testing in cancers with low ctDNA shedding, such as glioblastoma. Efforts to recover more cfDNA from the body may help make liquid biopsy available to more patients with various cancer types.


[0300] Table 1: Assessment of radiological and pathological responses.

[0301]

[0302] Additional methods in Example 2

[0303] MAESTRO-Pool and MRD Tracker measurements

[0304] In brief, for MAESTRO-Pool, hybridization capture was performed using the xGen Hybridization and Wash Kit with xGen Universal Blockers (IDT). Each hybridization capture contained up to 12 samples, each with a library mass equivalent to 50 times the DNA mass used for library construction, and used 4 pmol of the MAESTRO-Pool panel. The hybridization program began at 95°C for 30 seconds, followed by a gradual decrease in temperature from 65°C to 50°C in steps of 1°C every 48 minutes. Finally, the plate was held at 50°C for at least four hours. The heated wash steps were performed at 50°C. After the first round of hybridization capture, 16 cycles of PCR were applied. The products were used for a second round of hybridization capture using 2 pmol of the MAESTRO-Pool panel, followed by another 16 cycles of PCR. The final captured products were quantified and pooled for sequencing on Illumina NovaSeq S4 (151 bp paired-end reads) at a target raw depth of 10,000x per locus per 20 ng of DNA mass used for library construction.

[0305] The MRD Tracker hybridization capture differed from the MAESTRO-Pool capture described above in the following respects: 1) the library mass used for hybridization capture was approximately 25 times the DNA mass used for library construction; 2) the hybridization program began at 95°C for 30 seconds, followed by a hold at 65°C for at least four hours; and 3) the heated wash steps were performed at 65°C. The final capture products were sequenced at a target raw depth of 10,000x per site per 10 ng of DNA mass used for library construction.

[0306] Estimating sample-specific SNV frequencies

[0307] To quantify sample-specific SNV frequencies, the regions of captured cfDNA molecules flanking the probe-binding site were used. It was reasoned that captured and sequenced bases overlapping the probes would not represent sample-specific SNV frequencies because of MAESTRO's mutation enrichment. However, a MAESTRO probe is short (~30 bp) relative to the typical size of cfDNA (~167 bp), so most bases in each captured molecule do not interact with the probe and should therefore represent sample-specific background SNV frequencies. MAESTRO data were therefore first restricted to within ±250 bp of the probe region. The same fragment-level and site-level filters used for MRD detection were then applied to remove the effects of known technical artifacts (see MAESTRO-Pool above). This yielded, for each sample, counts of the number of reference bases and the number of SNVs observed, which were used to calculate sample-specific and context-specific background SNV frequencies.
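As a non-limiting illustration of the final step described above, the following Python sketch converts per-context counts of reference bases and SNVs into background SNV frequencies in parts per million. The function name, the dictionary layout, and the example counts are hypothetical and are not taken from the disclosure.

```python
def background_snv_ppm(ref_bases, snv_bases):
    """Background SNV frequency in ppm from reference-base and SNV counts.

    Returns None when no bases were observed for the context.
    """
    total = ref_bases + snv_bases  # wild-type plus mutant bases
    if total == 0:
        return None
    return 1e6 * snv_bases / total


# Hypothetical counts per mutation context: (reference bases, SNVs).
# The numbers are made up for illustration only.
counts = {
    "C>G": (39_999_998, 2),
    "T>A": (49_999_999, 1),
    "T>C": (49_999_980, 20),
    "T>G": (50_000_000, 0),
}

frequencies = {
    ctx: background_snv_ppm(ref, snv) for ctx, (ref, snv) in counts.items()
}
for ctx, ppm in frequencies.items():
    print(f"{ctx}: {ppm:.3f} ppm")
```

In this made-up example the T>C context stands out from the others, which is the kind of sample-specific, context-specific difference the frequency estimate is meant to expose.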

[0308] Dynamic MRD detector with noise tuning

[0309] Example 1 describes a dynamic MRD detector that measures sample-specific properties, such as observed tumor fraction, validated fingerprint size, and cfDNA mass, and quantifies the probability that the observed data represent a genuine tumor signal rather than spontaneous errors. Analysis of nearly 100 negative control samples from previous studies consistently showed a background SNV frequency ≤ 0.1 ppm. This value was therefore taken as the default background SNV frequency for the dynamic detector, and the method was applied to all MAESTRO-Pool and MRD Tracker samples in this example. Furthermore, as described in Example 1, a dynamic probability threshold of P ≥ 0.95 was used to call samples MRD positive.

[0310] Here, the dynamic MRD detector is adapted to enable context-specific noise tuning. This is achieved by assuming independence between mutation contexts, as follows:

[0311] Equation (4)

[0312]

[0313] Equation (5)

[0314]

[0315] Equation (6)

[0316]

[0317]

[0318] Equation (7)

[0319]

[0320] Equation (8)

[0321]

[0322]

[0323] Two key changes are made. First, the likelihood function becomes a product of context-specific likelihoods, one for each mutation context (Equations 5 and 6). Second, the likelihood function for the MRD negative state is changed to a β-binomial model, which allows (1) initializing a prior on the background SNV frequency for each context, and (2) tuning the background SNV frequency for each context with the sample-specific SNV frequency measured by MAESTRO (see Estimating sample-specific SNV frequencies). For (1), establishing a prior is crucial because not every MAESTRO sample can measure context-specific SNV frequencies as low as 0.01 ppm. To account for this, a β prior (mean = 0.05 ppm, standard deviation = 0.05 ppm) is initialized for each mutation context, translating to a conservative overall SNV frequency of 2 ppm (i.e., 0.5 ppm C>G + 0.5 ppm T>A + 0.5 ppm T>C + 0.5 ppm T>G). For (2), the sample-specific SNV frequencies measured by MAESTRO, modeled as binomials (number of observed SNVs / number of observed bases), can be combined with the prior to calculate the posterior, which represents the tuned sample-specific SNV frequency for each mutation context. In summary, these changes enable the dynamic detector to perform sample-specific and context-specific noise tuning when the background noise is known. The tuned detector was used in conjunction with the SNV frequencies measured by MAESTRO and was applied to all MAESTRO-Pool samples.
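As a non-limiting illustration, the following Python sketch shows one way a β prior specified by a mean and standard deviation could be initialized and then tuned with observed background counts. The method-of-moments conversion from mean and standard deviation to shape parameters is an assumption made for this sketch, as are the function names and the example counts; the update itself is the standard conjugate update of a β prior with binomial data.

```python
PPM = 1e-6


def beta_from_mean_sd(mean, sd):
    """Shape parameters of a Beta distribution with the given mean and
    standard deviation (method of moments; an assumed parameterization)."""
    nu = mean * (1.0 - mean) / sd ** 2 - 1.0  # equals alpha + beta
    if nu <= 0:
        raise ValueError("sd is too large for a Beta distribution with this mean")
    return mean * nu, (1.0 - mean) * nu


def tune(alpha0, beta0, snv_count, base_count):
    """Conjugate update of a Beta prior with binomial background counts."""
    return alpha0 + snv_count, beta0 + (base_count - snv_count)


# Prior stated in the text: mean = 0.05 ppm, standard deviation = 0.05 ppm.
alpha0, beta0 = beta_from_mean_sd(0.05 * PPM, 0.05 * PPM)

# Hypothetical background counts for one mutation context (e.g., T>C).
alpha, beta = tune(alpha0, beta0, snv_count=20, base_count=50_000_000)
tuned_ppm = alpha / (alpha + beta) / PPM
print(f"prior: alpha={alpha0:.3f}, beta={beta0:.3e}")
print(f"tuned background frequency: {tuned_ppm:.3f} ppm")
```

Under this assumed parameterization, the prior carries about as much weight as one SNV observed in roughly twenty million bases, so a sample with tens of millions of background bases is dominated by its own measurements while a shallow sample stays close to the prior.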

[0324] Example environment

[0325] Figure 33 is an illustration of environment 3300, which can be operated to apply dynamic MRD classification as described herein. Environment 3300 is an adaptation of environment 3200 of Figure 32 that allows each mutation context to be independently tuned and weighted. Therefore, for the sake of brevity, the following discussion will focus on the differences between environment 3300 and environment 3200 of Figure 32, and the components previously introduced in Figure 32 have the same numbers and functions as described above.

[0326] In environment 3300, specificity factor 3226 further includes context-specific factor 3302. Context-specific factor 3302 is determined for multiple mutation contexts, such as C to G mutations, T to A mutations, T to C mutations, and T to G mutations. Furthermore, in at least one embodiment, specificity factor 3226 further includes sample-specific background mutation frequency 3304, which may be used as a supplement to or alternative to the assumed background mutation frequency 3232. Sample-specific background mutation frequency 3304 is an empirically derived background mutation frequency, for example, derived from a sample. For example, sample-specific background mutation frequency 3304 may represent a sample-specific error rate. Thus, in at least one embodiment, context-specific analysis made possible by environment 3300 is performed using the assumed background mutation frequency 3232, while in at least one other embodiment, context-specific analysis made possible by environment 3300 is performed using the sample-specific background mutation frequency 3304.

[0327] For example, the specificity factor determination algorithm 3224 is adapted to quantify the observed background mutation frequency 3306, which may be the frequency of SNVs observed in the sample (see the section on estimating sample-specific SNV frequencies). The specificity factor determination algorithm 3224 may be further adapted to determine the number of observed background bases 3308. For example, the number of observed background bases 3308 refers to the number of sequencing bases analyzed. The observed background mutation frequency 3306 and the observed number of background bases 3308 may be used together to determine the sample-specific background mutation frequency 3304. In at least one embodiment, the observed background mutation frequency 3306 and the observed number of background bases 3308 are determined for each of a plurality of mutation contexts.

[0328] For example, the observed background mutation frequency 3306 and the observed number of background bases 3308 refer to the outputs of some implementations of the specificity factor determination algorithm 3224 that quantifies the sample-specific background mutation frequency 3304. The sample-specific background mutation frequency 3304 differs from tumor signals and therefore should not be analyzed at sites in the tumor fingerprint. For example, when analyzing MAESTRO data, the sample-specific background mutation frequency 3304 is quantified in regions surrounding and excluding the probe region. Furthermore, the observed background mutation frequency 3306 refers to the count of mutant bases when assessing DNA duplexes in these regions, while the observed number of background bases 3308 refers to the total number of bases (wild-type and mutant) when assessing DNA duplexes in these regions.

[0329] In addition to using the total mutant double strand 3228 and the total assumed double strand 3230, the tuning algorithm 3236 also uses the context-specific factor 3302 to tune the parameters of the MRD positive likelihood model 3310, which is configured to output a first likelihood 3240, and of the MRD negative likelihood model 3312, which is configured to output a second likelihood 3244. Therefore, compared to environment 3200 of Figure 32, the MRD classification module 3234 in environment 3300 can model the first likelihood 3240 and the second likelihood 3244 in different ways.

[0330] For example, the MRD positive likelihood model 3310 determines the likelihood of the observed data when the sample is MRD positive, such as according to Equation 5 given above. The MRD positive likelihood model 3310 differs from the MRD positive likelihood model 3238 of Figure 32 in that it utilizes the context-specific factor 3302. For example, the MRD positive likelihood model 3310 can be a binomial probability calculation (e.g., a binomial model) that is applied to each mutation context. Using the total mutant double strand 3228 and the total assumed double strand 3230, the first likelihood 3240 can be calculated as the product of the individual likelihoods of the mutation contexts. Therefore, the first likelihood 3240 corresponds to the overall likelihood across all mutation contexts.

[0331] For example, the MRD negative likelihood model 3312 determines the likelihood of the observed data in the case of a negative MRD status, such as according to Equation 6 given above. The MRD negative likelihood model 3312 can be a β-binomial probability calculation (e.g., a β-binomial model) that yields a probability for each mutation context. Using the total mutant double strand 3228, the total assumed double strand 3230, and the two shape parameters of the β-binomial distribution, the MRD negative likelihood model 3312 can calculate the second likelihood 3244 as the product of the individual likelihoods of each of the multiple mutation contexts. Therefore, the second likelihood 3244 corresponds to the overall likelihood across all mutation contexts.
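As a non-limiting illustration, the following Python sketch evaluates a per-context binomial likelihood and a per-context β-binomial likelihood and combines each across mutation contexts as a product, computed here as a sum of logarithms for numerical stability. All names and example counts are hypothetical. In particular, the success probability `tumor_fraction` used for the MRD positive state is a placeholder, since the exact parameterization of Equations 5 and 6 is not reproduced in this text.

```python
from math import lgamma, log, log1p, exp


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def log_beta_fn(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def binom_logpmf(k, n, p):
    """log P(K = k) for K ~ Binomial(n, p), with 0 < p < 1."""
    return log_choose(n, k) + k * log(p) + (n - k) * log1p(-p)


def betabinom_logpmf(k, n, a, b):
    """log P(K = k) for K ~ BetaBinomial(n, a, b)."""
    return log_choose(n, k) + log_beta_fn(k + a, n - k + b) - log_beta_fn(a, b)


def positive_loglik(duplexes, tumor_fraction):
    """Sum of per-context binomial log-likelihoods (log of their product).

    tumor_fraction is a placeholder success probability (an assumption).
    """
    return sum(binom_logpmf(k, n, tumor_fraction) for k, n in duplexes.values())


def negative_loglik(duplexes, shapes):
    """Sum of per-context beta-binomial log-likelihoods."""
    return sum(
        betabinom_logpmf(k, n, *shapes[ctx]) for ctx, (k, n) in duplexes.items()
    )


# Hypothetical per-context (mutant duplexes, total duplexes) and shape parameters.
duplexes = {
    "C>G": (1, 200_000),
    "T>A": (0, 150_000),
    "T>C": (2, 300_000),
    "T>G": (0, 100_000),
}
shapes = {ctx: (1.0, 2.0e7) for ctx in duplexes}

ll_pos = positive_loglik(duplexes, tumor_fraction=3 / 750_000)
ll_neg = negative_loglik(duplexes, shapes)
print(f"log-likelihood MRD positive: {ll_pos:.3f}")
print(f"log-likelihood MRD negative: {ll_neg:.3f}")
```

Working with log-likelihoods avoids underflow when many contexts or large duplex counts are multiplied together; the two totals can be exponentiated or compared directly downstream.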

[0332] In at least one implementation, the tuning algorithm 3236 can tune the first shape parameter according to Equation 7 above. For example, the baseline value of the first shape parameter can be calculated based on a prior mean (e.g., the expected mean SNV frequency, in parts per million, which can be set to 0.05 ppm) and a prior standard deviation (e.g., the expected standard deviation of the SNV frequency, in parts per million, which can also be set to 0.05 ppm). The calculated baseline parameters can be empirically derived constants reflecting the expected frequency of variation and its variability. To determine the tuned first shape parameter, the observed background mutation frequency 3306 (e.g., the observed count of single nucleotide variants in the mutation context, as determined based on SNV frequency estimation) is added to the baseline value (see Equation 7).

[0333] Similarly, in at least one implementation, the tuning algorithm 3236 can tune the second shape parameter according to Equation 8 above. For example, the baseline value of the second shape parameter can be calculated based on the same prior mean and prior standard deviation. To determine the tuned second shape parameter, the observed number of background bases 3308 (e.g., the number of sequenced bases in the mutation context) is added to the baseline value, and the observed background mutation frequency 3306 is subtracted (see Equation 8). Therefore, the observed background mutation frequency 3306 and the observed number of background bases 3308 can be used to tune the MRD negative likelihood model 3312 for each mutation context.

[0334] The first likelihood 3240 and the second likelihood 3244 are used by the dynamic probability scoring algorithm 3246 to output a dynamic probability score 3248, as described above with respect to Figure 32. For example, the dynamic probability scoring algorithm 3246 can use a Bayesian probability calculation, such as Equation 4 provided above.
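As a non-limiting illustration of a Bayesian probability calculation of this kind, the following Python sketch combines two log-likelihoods into a posterior probability of the MRD positive state. The function name and the prior probability `prior_pos` (set to 0.5 as a placeholder) are assumptions made for this sketch and are not values taken from the disclosure.

```python
from math import exp, log


def dynamic_probability(loglik_pos, loglik_neg, prior_pos=0.5):
    """Posterior probability of the MRD positive state via Bayes' rule.

    prior_pos is an assumed prior probability of MRD positivity; the value
    0.5 is a placeholder, not a value taken from the disclosure.
    """
    log_odds = (loglik_pos - loglik_neg) + log(prior_pos) - log(1.0 - prior_pos)
    # Evaluate the logistic function without overflowing exp().
    if log_odds >= 0:
        return 1.0 / (1.0 + exp(-log_odds))
    z = exp(log_odds)
    return z / (1.0 + z)


# Hypothetical log-likelihoods for one sample.
score = dynamic_probability(loglik_pos=-3.5, loglik_neg=-9.0)
print(f"dynamic probability score: {score:.4f}")
```

Expressing Bayes' rule in terms of the log-odds keeps the calculation stable even when the two likelihoods differ by many orders of magnitude.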

[0335] In at least one implementation, the dynamic probability scoring algorithm 3246 uses initialization parameters (e.g., the baseline parameters) to calculate a prior probability (also referred to as the "prior"), and the sample-specific background mutation frequency 3304 (e.g., modeled as a binomial of the observed background mutation frequency 3306 and the observed number of background bases 3308) is then combined with the prior to calculate the dynamic probability score 3248.

[0336] The dynamic probability score 3248 enables the comparison of ctDNA signals between samples. For example, when a dynamic model is used for MRD classification of MAESTRO-Pool data, dynamic probability scores can be compared between patient-matched and patient-unmatched samples. One application is the identification of borderline positive samples, which are patient-matched samples whose probability scores are below the detection threshold (e.g., P < 0.95) but higher than those of most patient-unmatched samples.
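As a non-limiting illustration, the following Python sketch flags a borderline positive sample by comparing a patient-matched score with a set of patient-unmatched scores. The function name and the cutoff `min_fraction`, which encodes what counts as "most" unmatched samples, are assumptions made for this sketch.

```python
def is_borderline(matched_score, unmatched_scores, threshold=0.95, min_fraction=0.5):
    """Flag a patient-matched score that is below the detection threshold
    but exceeds more than min_fraction of the patient-unmatched scores.

    min_fraction is an assumed cutoff standing in for "most".
    """
    if matched_score >= threshold or not unmatched_scores:
        return False
    exceeded = sum(matched_score > s for s in unmatched_scores)
    return exceeded / len(unmatched_scores) > min_fraction


# Hypothetical scores for illustration only.
unmatched = [0.02, 0.10, 0.05, 0.90]
print(is_borderline(0.80, unmatched))  # below threshold, above 3 of 4 unmatched
print(is_borderline(0.97, unmatched))  # at or above threshold, so not borderline
```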

[0337] Thus, when the background noise is known, the MRD classification module 3234 is suitable for performing sample-specific and context-specific noise tuning.

[0338] Example procedure

[0339] This section describes an exemplary procedure for performing dynamic MRD classification in one or more embodiments. Aspects of this procedure may be implemented in hardware, firmware, or software, or a combination thereof. The procedure is shown as a set of blocks that specify operations performed by one or more devices and is not necessarily limited to the order shown for performing the operations by the respective blocks. In at least some embodiments, the procedure is performed by a suitably configured device, such as the sequencing data processor 3206 of Figures 32 and 33.

[0340] Figure 34 shows an example procedure 3400 for performing dynamic MRD classification.

[0341] Targeted cell-free DNA sequencing data from patient samples is received (box 3402). For example, targeted cell-free DNA sequencing data 3216 can be generated using the MAESTRO or MAESTRO-Pool techniques described herein. The targeted cell-free DNA sequencing data 3216 is enriched for DNA duplexes with mutation sites found in tumor fingerprints associated with the patient. In some embodiments, the targeted cell-free DNA sequencing data 3216 may be further enriched for DNA duplexes with additional mutation sites found in tumor fingerprints associated with other patients.

[0342] Specificity factors are identified based on the targeted cell-free DNA sequencing data (box 3404). This includes identifying sample-specific specificity factors (box 3406), such as the number of mutant double strands in the sample (e.g., total mutant double strands 3228) and the total number of probable double strands assessed at the mutation sites in the sample (e.g., total probable double strands 3230). Optionally, identifying specificity factors further includes identifying context-specific specificity factors (box 3408), including the observed background mutation frequency and the observed number of background bases for each of the multiple mutation contexts.

[0343] A first likelihood that the number of mutant double strands in the sample is derived entirely from tumor is determined via a first likelihood model (box 3410). In at least one embodiment, the first likelihood model can be the MRD positive likelihood model 3238 of Figure 32. Alternatively, the first likelihood model can be the MRD positive likelihood model 3310 of Figure 33. The first likelihood model can be a binomial model that uses the number of mutant double strands and the total number of assumed double strands assessed at the mutation sites in the sample, as well as the assumed background mutation frequency. In the embodiment of Figure 33, the first likelihood model may additionally or alternatively use sample-specific background mutation frequencies.

[0344] A second likelihood that the number of mutant double strands in the sample arises from spontaneous errors is determined via a second likelihood model (box 3412). In at least one embodiment, the second likelihood model is the MRD negative likelihood model 3242 of Figure 32. In such embodiments, the second likelihood model may be a binomial model that uses the number of mutant double strands and the total number of assumed double strands assessed at the mutation sites in the sample, as well as the assumed background mutation frequency. Alternatively, the second likelihood model may be the MRD negative likelihood model 3312 of Figure 33. In such embodiments, the second likelihood model may be a β-binomial model that uses the number of mutant double strands and the total number of assumed double strands assessed at the mutation sites in the sample, as well as the observed background mutation frequency 3306 and the observed number of background bases 3308 for context-specific noise tuning.

[0345] Based on the first and second likelihoods, the probability that a sample contains circulating tumor DNA is determined via a probability scoring algorithm (box 3414). For example, the dynamic probability scoring algorithm 3246 can use Bayesian probability calculation to output a dynamic probability score 3248.

[0346] Determine whether the probability is greater than or equal to a threshold (box 3416). In at least one embodiment, the threshold is a predetermined, adjustable value used to distinguish samples that include circulating tumor DNA (e.g., MRD-positive samples) from samples that do not include circulating tumor DNA (e.g., MRD-negative samples). As a non-limiting illustrative example, the threshold is 0.95.

[0347] If the probability is greater than or equal to the threshold (the "Yes" branch of box 3416), the sample is classified as positive for circulating tumor DNA (e.g., MRD positive) (box 3418). If the probability is less than the threshold (the "No" branch of box 3416), the sample is classified as negative for circulating tumor DNA (e.g., MRD negative) (box 3420).
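As a non-limiting illustration of the threshold comparison described above, the following Python sketch classifies a sample from its probability score. The function name and the returned labels are hypothetical; the default threshold of 0.95 follows the illustrative value given in the text.

```python
def classify_mrd(probability, threshold=0.95):
    """Classify a sample from its dynamic probability score.

    A score greater than or equal to the threshold is MRD positive;
    a score below the threshold is MRD negative.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return "MRD positive" if probability >= threshold else "MRD negative"


# Hypothetical scores for illustration only.
for p in (0.99, 0.95, 0.80):
    print(p, classify_mrd(p))
```

Because the comparison is inclusive, a score exactly equal to the threshold is classified as MRD positive; the threshold is kept as an argument because the text describes it as adjustable.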

[0348] The classification of the sample (e.g., as MRD positive or MRD negative) is output (box 3422). For example, the MRD classification 3218 can be displayed via the display device 3252 of the client device 3204 or stored in memory for later access.

[0349] In this way, program 3400 generates MRD classifications to improve the sensitivity of circulating tumor DNA detection and reduce the probability of false detections, using sample-specific and / or context-specific factors to adapt its confidence level to the specific sample being evaluated.

[0350] Example systems and devices

[0351] Figure 35 illustrates, generally at 3500, an exemplary system that includes an exemplary computing device 3502, which represents one or more computing systems and / or devices that can implement the various technologies described herein. This is illustrated by the inclusion of a sequencing data processor 3206. The computing device 3502 can be, for example, a server of a service provider, a device associated with a client (e.g., a client device), a system-on-a-chip, and / or any other suitable computing device or computing system.

[0352] The illustrated exemplary computing device 3502 includes a processing system 3504, one or more computer-readable media 3506, and one or more I / O interfaces 3508, which are communicatively coupled to each other. Although not shown, the computing device 3502 may also include a system bus or other data and command transfer system that couples the various components to each other. The system bus may include any one or a combination of different bus architectures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and / or a processor or local bus utilizing any of the various bus architectures. Various other examples, such as control lines and data lines, are also contemplated.

[0353] Processing system 3504 represents functionality to perform one or more operations using hardware. Therefore, processing system 3504 is illustrated as including hardware elements 3510 that can be configured as processors, function blocks, etc. This can include implementations in hardware as application-specific integrated circuits (ASICs) or other logic devices formed using one or more semiconductors. Hardware elements 3510 are not limited by the materials from which they are formed or the processing mechanisms employed therein. For example, a processor can consist of semiconductors and / or transistors (e.g., integrated circuits (ICs)). In this context, processor-executable instructions can be electronically executable instructions.

[0354] Computer-readable media 3506 is illustrated as including memory / storage 3512. Memory / storage 3512 represents a memory / storage capacity associated with one or more computer-readable media. Memory / storage 3512 may include volatile media (such as random access memory (RAM)) and / or non-volatile media (such as read-only memory (ROM), flash memory, optical disk, magnetic disk, etc.). Memory / storage 3512 may include fixed media (e.g., RAM, ROM, fixed hard disk, etc.) and removable media (e.g., flash memory, removable hard disk, optical disk, etc.). Computer-readable media 3506 may be configured in various other ways as further described below.

[0355] Input / output interface 3508 represents the functionality that allows a user to input commands and information into computing device 3502 and allows information to be presented to the user and / or other components or devices using various input / output devices. Examples of input devices include keyboards, cursor control devices (e.g., mice), microphones, scanners, touch functionality (e.g., capacitive or other sensors configured to detect physical touch), cameras (e.g., capable of recognizing motion as non-touch gestures using visible or non-visible light wavelengths such as infrared frequencies), and so on. Examples of output devices include display devices (e.g., monitors or projectors), speakers, printers, network interface cards, haptic-responsive devices, and so on. Therefore, as further described below, computing device 3502 can be configured in various ways to support user interaction.

[0356] Various technologies can be described in this document within the general context of software, hardware components, or program modules. Generally, these modules include routines, programs, objects, elements, components, data structures, etc., that perform specific tasks or implement specific abstract data types. As used herein, the terms “module,” “function,” and “component” generally represent software, firmware, hardware, or a combination thereof. The technical features described herein are platform-independent, meaning that these technologies can be implemented on a variety of commercial computing platforms with various processors.

[0357] For example, the terms "module," "function," and "component" can include hardware and / or software systems that operate to perform one or more functions. For instance, a module, function, or component can include a computer processor, controller, or other logic-based device that performs operations based on instructions stored on a tangible and non-transitory computer-readable storage medium, such as computer memory. Alternatively, a module, function, or component can include a hard-wired device that performs operations based on hard-wired logic. The various modules, systems, and components illustrated in the figures can represent hardware that operates based on software or hard-wired instructions, software that directs the hardware to perform operations, or a combination thereof.

[0358] Implementations of the described modules and technologies may be stored on or transmitted via some form of computer-readable medium. The computer-readable medium may include a variety of media accessible by the computing device 3502. For example (but not limited to), the computer-readable medium may include a "computer-readable storage medium" and a "computer-readable signal medium."

[0359] In contrast to mere signal transmission, carrier waves, or the signal itself, "computer-readable storage medium" can refer to media and / or devices that enable the persistent and / or non-transitory storage of information. Therefore, a computer-readable storage medium is a non-signal-bearing medium. Computer-readable storage media include hardware such as volatile and non-volatile, removable and non-removable media, and / or storage devices implemented in a manner or technique suitable for storing information such as computer-readable instructions, data structures, program modules, logic elements / circuits, or other data. Examples of computer-readable storage media may include, but are not limited to, RAM, ROM, EEPROM, flash memory or other storage technologies, CD-ROM, digital versatile disk (DVD) or other optical storage, hard disks, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other storage devices, tangible media, or articles of manufacture suitable for storing desired information and accessible by a computer.

[0360] "Computer-readable signal medium" can refer to a signal-bearing medium configured to transmit instructions to computing device 3502 hardware (such as via a network). Signal media can typically embody computer-readable instructions, data structures, program modules, or other data in modulated data signals (such as carrier waves, data signals, or other transmission mechanisms). Signal media also includes any information transmission medium. The term "modulated data signal" means a signal having one or more characteristics that are set or altered in a manner that encodes information in the signal. For example (but not limited to), communication media include wired media (such as wired networks or direct wired connections) and wireless media (such as acoustic, radio frequency, infrared, and other wireless media).

[0361] As previously described, hardware element 3510 and computer-readable medium 3506 represent modules, programmable device logic, and / or fixed device logic implemented in hardware, which in some embodiments can be used to implement at least some aspects of the techniques described herein, such as executing one or more instructions. The hardware may include components of integrated circuits or systems-on-a-chip, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), complex programmable logic devices (CPLDs), and other implementations in silicon or other hardware. In this context, the hardware may operate as a processing device that executes program tasks defined by logic embodied by instructions and / or hardware, and utilizes hardware for storing execution instructions, such as the computer-readable storage medium described above.

[0362] The foregoing combinations can also be applied to implement the various techniques described herein. Therefore, software, hardware, or implementable modules can be implemented as one or more instructions and / or logic embodied on some form of computer-readable storage medium and / or embodied by one or more hardware elements 3510. The computing device 3502 can be configured to implement specific instructions and / or functions corresponding to software and / or hardware modules. Therefore, implementation of a module that can be implemented by the computing device 3502 as software can be at least partially achieved in hardware, for example, by using a computer-readable storage medium and / or hardware elements 3510 of the processing system 3504. Instructions and / or functions can be implemented / operated by one or more articles of manufacture (e.g., one or more computing devices 3502 and / or processing systems 3504) to implement the techniques, modules, and examples described herein.

[0363] The techniques described herein can be supported by various configurations of computing device 3502, and are not limited to specific instances of the techniques described herein. As described below, this functionality can also be implemented, in whole or in part, using distributed systems, such as via platform 3516 on “cloud” 3514.

[0364] Cloud 3514 includes and / or represents platform 3516 for resource 3518, shown as including sequencing data processor 3206. Platform 3516 abstracts the underlying functionality of the hardware (e.g., server) and software resources of cloud 3514. Resource 3518 may include applications and / or data that can be utilized when computer processing is performed on a server remote from computing device 3502. Resource 3518 may also include services provided via the Internet and / or via subscriber networks such as cellular or Wi-Fi networks.

[0365] Platform 3516 can abstract resources and functions to connect computing device 3502 to other computing devices. Platform 3516 can also be used to abstract resource scaling to provide a corresponding level of scale for the demands encountered by resource 3518 implemented via platform 3516. Therefore, in interconnected device embodiments, the implementation of the functions described herein can be distributed throughout system 3500. For example, the functions can be implemented partly on computing device 3502 and partly via platform 3516, which abstracts the functions of cloud 3514.

[0366] In addition to the embodiments explicitly described herein, it should be understood that all features disclosed in this disclosure can be combined in any combination (e.g., permutations, combinations). Each element disclosed in this disclosure may be replaced by an alternative feature having the same, equivalent, or similar purpose. Therefore, unless otherwise expressly stated, each disclosed feature is merely an instance of a series of general equivalent or similar features.

[0367] Based on the above description, those skilled in the art can readily identify the essential features of the present invention, and various changes and modifications can be made to adapt it to various uses and conditions without departing from its spirit and scope. Therefore, other embodiments are also within the scope of the claims.

[0368] Equivalents and scope

[0369] Articles such as “a,” “an,” and “the” may mean one or more, unless otherwise stated or obvious from the context. An embodiment or description including “or” among one or more members of the group is considered satisfied if one, more than one, or all members of the group are present, used in, or associated with a given product or process, unless otherwise stated or obvious from the context. This invention includes embodiments in which exactly one member of the group is present, used in, or otherwise associated with a given product or process. This invention also includes embodiments in which more than one or all members of the group are present, used in, or associated with a given product or process.

[0370] Furthermore, this disclosure covers all variations, combinations, and permutations in which one or more limitations, elements, clauses, and descriptive terms from one or more of the listed claims are incorporated into another claim. For example, any claim that depends on another claim may be modified to include one or more limitations found in any other claim that depends on the same base claim. If elements are presented in list form, such as in Markush group format, each subgroup of elements is also disclosed, and any element may be removed from the group. It should be understood that, generally, when the invention or aspects of the invention are referred to as including specific elements and / or features, certain embodiments or aspects of this disclosure consist of, or consist essentially of, these elements and / or features. For simplicity, these embodiments are not listed verbatim herein. It should also be noted that the terms “comprising” and “containing” are intended to be open and allow for the inclusion of additional elements or steps. If a range is given, endpoints are included. Furthermore, unless otherwise stated or apparent from the context and the understanding of one of ordinary skill in the art, values expressed as ranges may take any specific value or subrange within that range in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context expressly specifies otherwise.

[0371] This application relates to various granted patents, published patent applications, journal articles, and other publications, all of which are incorporated herein by reference. In the event of any conflict between any incorporated reference and this specification, this specification shall prevail. Furthermore, any particular embodiment of the invention falling within the prior art may be expressly excluded from any one or more embodiments. Since such embodiments are considered to be known to those skilled in the art, they may be excluded even if not expressly stated herein. Any particular embodiment of the invention may be excluded from any embodiment for any reason, whether or not related to the presence of prior art.

[0372] Using no more than routine experimentation, those skilled in the art will recognize or be able to ascertain many equivalents of the specific embodiments of the invention described herein. The scope of the embodiments described herein is not intended to be limited to the foregoing description, but rather is as set forth in the appended embodiments. Those skilled in the art will understand that various changes and modifications can be made to this description without departing from the spirit or scope of the invention as defined in the following embodiments.

[0373] Example

[0374] Without departing from the scope of this document, the features described above and the features claimed below can be combined in various ways. The following examples illustrate some possible, non-limiting combinations:

[0375] This disclosure also provides support for a system comprising: a classification module implemented in a non-transitory computer-readable storage medium and configured to: classify a sample as positive or negative for circulating tumor DNA based on a dynamic probability score relative to a threshold, the dynamic probability score being determined using a specificity factor based on targeted cell-free DNA sequencing data of the sample, the specificity factor including the number of mutated double strands in the sample carrying a mutation determined in the sample and the total number of putative double strands in the sample that have been detected by mutation, the dynamic probability score being further based on: a first likelihood that the number of mutated double strands is entirely derived from the tumor, and a second likelihood that the number of mutated double strands is a spontaneous error or a mutation of non-cancer origin, and outputting a classification of the sample. In a first instance of the system, targeted cell-free DNA sequencing data is enriched for DNA double strands having mutation sites found in a tumor fingerprint. In a second instance of the system (optionally including the first instance), the tumor fingerprint is associated with the patient from whom the sample was obtained. In a third instance of the system (optionally including one or both of the first and second instances), the targeted cell-free DNA sequencing data is further enriched for mutated DNA duplexes with additional mutation sites found in tumor fingerprints associated with other patients. In a fourth instance of the system (optionally including one or each of the first to third instances), the tumor fingerprint is associated with patients other than the patient from whom the sample was obtained. 
In a fifth instance of the system (optionally including one or each of the first to fourth instances), the targeted cell-free DNA sequencing data is enriched for DNA duplexes using MAESTRO enrichment. In a sixth instance of the system (optionally including one or each of the first to fifth instances), the dynamic probability score is further based on the background mutation frequency. In a seventh instance of the system (optionally including one or each of the first to sixth instances), the background mutation frequency is a fixed value, a context-specific value, or a sample-specific value. In an eighth instance of the system (optionally including one or each of the first to seventh instances), the first likelihood and the second likelihood are binomial likelihoods. In a ninth instance of the system (optionally including one or more of the first to eighth instances), the first likelihood is a binomial likelihood, and the second likelihood is a β-binomial likelihood. In a tenth instance of the system (optionally including one or more of the first to ninth instances), the system further includes a quantization and analysis module implemented in a non-transitory computer-readable storage medium and configured to: determine the number of mutant double strands in the sample carrying the mutations determined in the sample, and estimate the total number of assumed double strands in the sample determined by such mutations. In the eleventh instance of the system (optionally including one or more of the first to tenth instances), the total number of mutation-detected presumed duplexes in the sample is estimated using a predetermined number of the lowest enrichment sites of such mutations. In the twelfth instance of the system (optionally including one or more of the first to eleventh instances), the total number of mutation-detected presumed duplexes in the sample is estimated using a control probe designed to be unbiased against mutant alleles relative to wild-type alleles. 
In the thirteenth instance of the system (optionally including one or more of the first to twelfth instances), the classification module is further configured to: output a first likelihood via a first likelihood model using the number of mutant duplexes in the sample and the total number of mutation-detected presumed duplexes in the sample; and output a second likelihood via a second likelihood model using the number of mutant duplexes in the sample and the total number of mutation-detected presumed duplexes in the sample. In the fourteenth instance of the system (optionally including one or more of the first to thirteenth instances), the first likelihood model comprises a binomial distribution that models the number of mutant double strands in the sample and the total number of assumed double strands detected by mutation in the sample, given a probability parameter based on an assumed background mutation frequency. In the fifteenth instance of the system (optionally including one or more of the first to fourteenth instances), the second likelihood model comprises a binomial distribution that models the number of mutant double strands in the sample and the total number of assumed double strands detected by mutation in the sample, given an assumed background mutation frequency. In the sixteenth instance of the system (optionally including one or more of the first to fifteenth instances), the specificity factor further comprises a mutation context-specific factor, including a background mutation frequency adjusted for multiple mutation contexts. 
In the seventeenth instance of the system (optionally including one or more of the first to sixteenth instances), the classification module is further configured to: output a first likelihood via a first likelihood model using the number of mutant double strands in a sample for a given mutation context and the total number of mutation-detected hypothetical double strands in the sample for that given mutation context, the first likelihood corresponding to the product of the respective first likelihoods of the plurality of mutation contexts; and output a second likelihood via a second likelihood model using the number of mutant double strands in a sample for a given mutation context, the total number of mutation-detected hypothetical double strands in the sample for that given mutation context, the observed background mutation frequency in the sample for that given mutation context, and the observed number of background bases in the sample for that given mutation context, the second likelihood corresponding to the product of the respective second likelihoods of the plurality of mutation contexts. In the eighteenth instance of the system (optionally including one or more of the first to seventeenth instances), the first likelihood model is a binomial model. In the nineteenth instance of the system (optionally including one or more of the first to eighteenth instances), the second likelihood model is a β-binomial model. In the twentieth instance of the system (optionally including one or more of the first to nineteenth instances), the classification module is further configured to output a dynamic probability score via Bayesian probability calculation using the first and second likelihoods. 
In the twenty-first instance of the system (optionally including one or more of the first to twentieth instances), in order to classify a sample as circulating tumor DNA positive or negative based on the dynamic probability score relative to a threshold, the classification module is further configured to classify the sample as circulating tumor DNA positive in response to the dynamic probability score being greater than or equal to the threshold, and to classify the sample as circulating tumor DNA negative in response to the dynamic probability score being less than the threshold.
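
As a non-limiting sketch of the seventeenth instance, the product of per-context likelihoods can be accumulated as a sum of logarithms so that many small factors do not underflow. The function names, the mutation-context labels, the example likelihood values, and the prior of 0.5 are all hypothetical and are used for illustration only.

```python
import math
from typing import Mapping


def log_product(per_context: Mapping[str, float]) -> float:
    """Log of the product of per-context likelihoods.

    Each likelihood must be strictly positive; summing logs is equivalent
    to multiplying the likelihoods but stays numerically stable.
    """
    return sum(math.log(value) for value in per_context.values())


def score_from_logs(log_first: float, log_second: float,
                    prior_positive: float = 0.5) -> float:
    """Bayesian score from log-likelihoods (ASSUMED prior_positive)."""
    log_odds = (log_first - log_second
                + math.log(prior_positive / (1.0 - prior_positive)))
    # Numerically stable logistic function of the posterior log-odds.
    if log_odds >= 0.0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# Hypothetical per-context first and second likelihoods.
first = {"C>T": 0.20, "C>A": 0.35, "T>C": 0.50}
second = {"C>T": 0.02, "C>A": 0.30, "T>C": 0.45}
print(score_from_logs(log_product(first), log_product(second)))
```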

[0376] This disclosure also provides support for a method comprising: receiving targeted cell-free DNA sequencing data from a patient sample, the targeted cell-free DNA sequencing data being enriched for DNA duplexes having mutation sites found in tumor fingerprints; determining a dynamic probability score of minimal residual disease (MRD) status of the sample based on a specificity factor, the specificity factor including the number of mutant duplexes in the sample and the total number of putative duplexes in the sample determined by the mutation sites, by: outputting a first likelihood via a first likelihood model using the number of mutant duplexes in the sample and the total number of putative duplexes determined by the mutation sites, where the first likelihood is the likelihood that the number of mutant duplexes is entirely derived from the tumor; outputting a second likelihood via a second likelihood model using the number of mutant duplexes in the sample and the total number of putative duplexes determined by the mutation sites, where the second likelihood is the likelihood that the number of mutant duplexes is a spontaneous error or a non-cancer-originating mutation; and determining the dynamic probability score based on the first and second likelihoods; classifying the MRD status of the sample as MRD positive or MRD negative according to the dynamic probability score relative to a threshold; and outputting the classified MRD status. In a first example of the method, classifying the MRD status of the sample as MRD positive or MRD negative according to the dynamic probability score relative to a threshold includes: classifying the MRD status of the sample as MRD positive in response to the dynamic probability score being greater than or equal to the threshold, or classifying the MRD status of the sample as MRD negative in response to the dynamic probability score being less than the threshold. In a second example of the method (optionally including the first example), the specificity factor further includes a mutation context-specific factor. In a third example of the method (optionally including one or both of the first and second examples), the mutation context-specific factor comprises, for each of a plurality of mutation contexts of the mutation sites, the background mutation frequency observed in the sample and the number of background bases observed in the sample. In a fourth example of the method (optionally including one or more of the first to third examples), the method further comprises adjusting the parameters of the first and second likelihood models based on the mutation context-specific factor. In a fifth example of the method (optionally including one or more of the first to fourth examples), the first and second likelihood models further utilize the background mutation frequency. In a sixth example of the method (optionally including one or more of the first to fifth examples), the background mutation frequency is a fixed value, a context-specific value, or a sample-specific value. In a seventh example of the method (optionally including one or more of the first to sixth examples), the targeted cell-free DNA sequencing data is further enriched for additional DNA duplexes with additional mutation sites found in additional tumor fingerprints. In an eighth example of the method (optionally including one or more of the first to seventh examples), the first likelihood model and the second likelihood model are binomial models. In a ninth example of the method (optionally including one or more of the first to eighth examples), the first likelihood model is a binomial model, and the second likelihood model is a β-binomial model. In a tenth example of the method (optionally including one or more of the first to ninth examples), determining the dynamic probability score based on the first likelihood and the second likelihood includes a Bayesian probability calculation using the first likelihood and the second likelihood.

[0377] (A1) A method for determining whether a patient sample obtained from a patient is positive for minimal residual disease (MRD) using a targeted sequencing assay targeting mutations in a set of patient-specific tumor mutations, the method comprising:

[0378] using at least one computer hardware processor to perform:

[0379] obtaining sequencing data for the patient, the sequencing data having been previously obtained by sequencing the patient sample using probes designed to enrich for DNA duplexes having mutations in the set of patient-specific tumor mutations;

[0380] using the sequencing data to estimate:

[0381] (i) the number of mutant DNA duplexes in the patient sample, and

[0382] (ii) the total number of DNA duplexes detected for mutation in the patient sample;

[0383] determining a posterior probability that the patient sample is MRD positive using the number of mutant DNA duplexes in the patient sample and the total number of DNA duplexes detected for mutation, the determining comprising:

[0384] determining, using a first statistical model, the number of mutant DNA duplexes, and the total number of DNA duplexes, a first likelihood that the number of mutant DNA duplexes in the total number of DNA duplexes is derived entirely from the tumor;

[0385] determining, using a second statistical model, the number of mutant DNA duplexes, and the total number of DNA duplexes, a second likelihood that the number of mutant DNA duplexes in the total number of DNA duplexes is a spontaneous error or a mutation of non-cancer origin; and

[0386] determining, using the first likelihood and the second likelihood, the posterior probability that the patient sample is MRD positive;

[0387] when the determined posterior probability that the patient sample is MRD positive is lower than a threshold,

[0388] determining that the patient sample is MRD negative; and

[0389] when the determined posterior probability that the patient sample is MRD positive exceeds the threshold,

[0390] determining that the patient sample is MRD positive.

[0391] (A2) A method for determining whether a patient sample obtained from a patient contains circulating tumor DNA (ctDNA) using a targeted sequencing assay targeting mutations in a set of patient-specific tumor mutations, the method comprising:

[0392] using at least one computer hardware processor to perform:

[0393] obtaining sequencing data for the patient, the sequencing data having been previously obtained by sequencing the patient sample using probes designed to enrich for DNA duplexes having mutations in the set of patient-specific tumor mutations;

[0394] using the sequencing data to estimate:

[0395] (i) the number of mutant DNA duplexes in the patient sample, and

[0396] (ii) the total number of DNA duplexes detected for mutation in the patient sample;

[0397] determining a posterior probability that the patient sample contains ctDNA using the number of mutant DNA duplexes in the patient sample and the total number of DNA duplexes detected for mutation, the determining comprising:

[0398] determining, using a first statistical model, the number of mutant DNA duplexes, and the total number of DNA duplexes, a first likelihood of observing the number of mutant DNA duplexes in the total number of DNA duplexes assuming the patient sample contains ctDNA;

[0399] determining, using a second statistical model, the number of mutant DNA duplexes, the total number of DNA duplexes, and an estimated error probability, a second likelihood of observing the number of mutant DNA duplexes in the total number of DNA duplexes assuming the patient sample does not contain ctDNA; and

[0400] determining, using the first likelihood and the second likelihood, the posterior probability that the patient sample contains ctDNA;

[0401] when the posterior probability is lower than a threshold,

[0402] determining that the patient sample does not contain ctDNA; and

[0403] when the posterior probability exceeds the threshold,

[0404] determining that the patient sample contains ctDNA.

[0405] (A3) For methods denoted as (A1) or (A2), wherein the sequencing is performed using the targeted sequencing assay, the sequencing comprising:

[0406] (i) Obtaining DNA duplexes;

[0407] (ii) Attaching a unique molecular identifier (UMI) to the 5' and 3' ends of each DNA duplex to produce labeled duplexes, wherein each UMI is unique to its labeled duplex;

[0408] (iii) Amplifying the labeled duplexes by polymerase chain reaction (PCR) to produce amplified duplexes;

[0409] (iv) Denaturing the amplified labeled duplexes to produce single-stranded amplified DNA;

[0410] (v) Capturing single-stranded amplified DNA that carries a mutation, using an allele-specific probe that anneals to one of the mutations in the set of patient-specific tumor mutations, to produce an enriched sample;

[0411] (vi) Sequencing the enriched sample; and

[0412] (vii) Identifying the presence of a mutant DNA duplex when a mutation from the set of patient-specific tumor mutations is observed in both strands of a labeled duplex, as identified by its UMI.
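
As a non-limiting illustration of step (vii), the following Python sketch counts mutant duplexes from reads grouped by UMI. The record layout `(umi, strand, has_mutation)` and the rule that every read on a strand must show the mutation are assumptions made for this example and are not specified in this disclosure.

```python
from collections import defaultdict


def count_mutant_duplexes(reads):
    """Count duplexes whose two strands both carry the tracked mutation.

    `reads` is an iterable of (umi, strand, has_mutation) tuples, where
    `strand` is "+" or "-". This record layout is hypothetical.
    """
    # Group the per-read mutation calls by UMI and by strand.
    families = defaultdict(lambda: {"+": [], "-": []})
    for umi, strand, has_mutation in reads:
        families[umi][strand].append(has_mutation)

    mutant = 0
    for strands in families.values():
        # Assumed rule: both strands must be observed, and every read on
        # each strand must show the mutation.
        if all(strands[s] and all(strands[s]) for s in ("+", "-")):
            mutant += 1
    return mutant
```

A UMI family seen on only one strand, or with the mutation on only one strand, is not counted, which is what separates a duplex-supported mutation from a single-strand error.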

[0413] (A4) For the method represented as (A3), a plurality of allele-specific probes (each probe being specific to a different mutation site) are annealed with one or more mutation sites from the set of patient-specific tumor mutations to produce the enriched sample.

[0414] (A5) For the method represented as (A4), each of the plurality of allele-specific probes is specific to a mutation site found in a set of patient-specific tumor mutations originating from their respective different patients, or the plurality of allele-specific probes originating from different patients are pooled and applied to a single patient sample.

[0415] (A6) For methods denoted as (A1) or (A2), wherein the total number of DNA duplexes assessed for mutation in the patient sample is estimated using probes with limited selectivity for mutant DNA relative to non-mutant DNA.

[0416] (A7) For methods denoted as (A1) or (A2), wherein the total number of DNA duplexes assessed for mutation in the patient sample is estimated using a threshold number of the lowest-enrichment sites.

[0417] (A8) For the method represented as (A7), wherein the lowest-enrichment sites contain both mutant and non-mutant DNA duplexes captured by the probes.

[0418] (A9) For methods denoted as (A1) or (A2), wherein the total number of DNA duplexes assessed for mutation in the patient sample is estimated using a subset of control probes that are not specific to the patient-specific tumor mutations, the numbers of mutant and wild-type DNA duplexes being determined from sequencing with the subset of control probes.

[0419] (A10) For the method represented as (A9), further including multiplying the number of patient-specific tumor mutations by the mean or median number of mutant and non-mutant DNA duplexes at the sites of the subset of control probes to estimate the total number of DNA duplexes assessed for mutation in the patient sample.
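
As a non-limiting illustration of the estimate in (A10), the sketch below multiplies the number of tracked mutations by the median duplex count observed at control sites. The function and parameter names are hypothetical, and the median is chosen here only as one of the two options named in the text.

```python
from statistics import median


def estimate_total_duplexes(n_tracked_mutations, control_site_duplexes):
    """Estimate the total duplexes assessed for mutation.

    `control_site_duplexes` lists, per control site, the combined count of
    mutant and non-mutant duplexes (hypothetical input layout).
    """
    if not control_site_duplexes:
        raise ValueError("at least one control site is required")
    # Number of tracked sites times the typical per-site duplex depth.
    return n_tracked_mutations * median(control_site_duplexes)
```

For example, 50 tracked mutations with control-site depths of 900, 1000, and 1100 duplexes give an estimate of 50,000 duplexes.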

[0420] (A11) For any of the methods represented by (A1) to (A10), wherein the posterior probability that the patient sample is MRD positive is determined according to the following:

[0421] $$P(M^{+} \mid D) = \frac{P(D \mid M^{+})\,P(M^{+})}{P(D \mid M^{+})\,P(M^{+}) + P(D \mid M^{-})\,P(M^{-})}$$

[0422] Where $P(D \mid M^{+})$ is the first likelihood and $P(D \mid M^{-})$ is the second likelihood,

[0423] Where $M^{+}$ indicates MRD-positive status and $P(M^{+})$ is its prior probability,

[0424] Where $M^{-}$ indicates MRD-negative status and $P(M^{-})$ is its prior probability, and

[0425] Where D represents the sequencing data.
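
As a non-limiting illustration, the posterior of (A11) can be computed from the two likelihoods and a prior by Bayes' rule. The sketch below works with log-likelihoods to avoid numerical underflow; the function names, the default prior of 0.5, and the treatment of a posterior equal to the threshold as positive are assumptions of this example.

```python
import math


def posterior_mrd_positive(log_lik_pos, log_lik_neg, prior_pos=0.5):
    """Return P(M+ | D) given log P(D | M+), log P(D | M-) and P(M+)."""
    if not 0.0 < prior_pos < 1.0:
        raise ValueError("prior_pos must lie strictly between 0 and 1")
    # Log of each numerator term: likelihood times prior.
    a = log_lik_pos + math.log(prior_pos)
    b = log_lik_neg + math.log1p(-prior_pos)
    # Subtract the larger term before exponentiating for stability.
    m = max(a, b)
    num = math.exp(a - m)
    return num / (num + math.exp(b - m))


def is_mrd_positive(posterior, threshold):
    """Assumed tie rule: a posterior equal to the threshold counts as positive."""
    return posterior >= threshold
```

With equal likelihoods the posterior equals the prior, so the prior alone decides the call; the likelihoods move the posterior only to the extent that the data favor one hypothesis.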

[0426] (A12) For a method represented as any one of (A1) to (A10), wherein the first statistical model is a binomial model.

[0427] (A13) For a method represented as any one of (A1) to (A10), wherein the first likelihood is determined as:

[0428] $$P(D \mid M^{+}) = \mathrm{Binomial}\left(\#\text{mutant duplexes};\ \#\text{total duplexes},\ t\right)$$

[0429] Where #mutant duplexes is the number of mutant DNA duplexes in the patient sample,

[0430] Where #total duplexes is the total number of DNA duplexes assessed for mutation in the patient sample,

[0431] Where t = max(#ALT / #total, background mutation frequency), and

[0432] Where M+ indicates MRD-positive status and D indicates the sequencing data.
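
As a non-limiting illustration of the first likelihood in (A13), the sketch below evaluates a binomial log-probability with success probability t. It assumes that #ALT and #total in the definition of t are the mutant and total duplex counts, which the text does not state explicitly; the function names are hypothetical.

```python
import math


def binom_log_pmf(k, n, p):
    """Log of Binomial(k; n, p), using log-gamma for the coefficient."""
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log1p(-p)


def first_log_likelihood(n_mutant, n_total, background_freq):
    """log P(D | M+) with t = max(n_mutant / n_total, background_freq)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    # Assumption: #ALT / #total is the observed mutant duplex fraction.
    t = max(n_mutant / n_total, background_freq)
    return binom_log_pmf(n_mutant, n_total, t)
```

Flooring t at the background mutation frequency keeps the MRD-positive hypothesis from claiming a tumor fraction below what background error alone would produce.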

[0433] (A14) For the method represented as (A13), it further includes setting the background mutation frequency as:

[0434] (a) The same value for all mutations;

[0435] (b) Different background mutation frequencies of mutations in different mutation contexts; or

[0436] (c) Values measured empirically on a per-sample basis, or different values measured empirically for different mutation contexts.
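
As a non-limiting illustration of options (a) to (c), the sketch below resolves the background mutation frequency for a mutation context, preferring a context-specific measured value and falling back to a single shared value. The function name, the context labels such as "C>T", and the fallback order are assumptions of this example.

```python
def background_frequency(context, fixed_value=None, per_context=None):
    """Resolve the background mutation frequency for one mutation context.

    `per_context` (hypothetical) maps a context label such as "C>T" to an
    empirically measured frequency; `fixed_value` is the single shared value.
    """
    # Prefer a context-specific measurement when one is available.
    if per_context is not None and context in per_context:
        return per_context[context]
    # Otherwise fall back to the value shared by all mutations.
    if fixed_value is not None:
        return fixed_value
    raise ValueError(f"no background mutation frequency available for {context!r}")
```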

[0437] (A15) For the method represented as (A14), wherein when the background mutation frequency is set to a value empirically measured on a per-sample basis or a different value empirically measured for different mutation contexts, the method further comprises using pooled probe testing to empirically determine the value or these values, or using targeted duplex sequencing on the patient sample to empirically determine the value or these values.

[0438] (A16) For a method represented as any one of (A1) to (A12), wherein the first likelihood is determined as the product of the context-specific and sample-specific likelihoods of a set of discrete mutation contexts.

[0439] (A17) For the method denoted as (A15), the first likelihood is determined as:

[0440] $$P(D \mid M^{+}) = \prod_{c \in C} \mathrm{Binomial}\left(\#\text{mutant duplexes}_{c};\ \#\text{total duplexes}_{c},\ \mathrm{TFx}\right)$$

[0442] Where C is the set of discrete mutation contexts,

[0443] Where c represents a mutation context in the set of discrete mutation contexts,

[0444] Where #mutant duplexes_c is the number of mutant DNA duplexes in mutation context c in the patient sample,

[0445] Where #total duplexes_c is the total number of DNA duplexes assessed in mutation context c, and

[0446] Where TFx is the estimated tumor fraction of the patient sample.
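
As a non-limiting illustration of a likelihood formed as a product over mutation contexts, the sketch below sums per-context binomial log-likelihoods, which is the logarithm of their product. Using the estimated tumor fraction as the binomial success probability in every context is an assumption of this example, as are the function names and the input layout.

```python
import math


def binom_log_pmf(k, n, p):
    """Log of Binomial(k; n, p), using log-gamma for the coefficient."""
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log1p(-p)


def first_log_likelihood_by_context(counts, tumor_fraction):
    """Sum per-context log-likelihoods (the log of their product).

    `counts` maps a context label to (mutant_duplexes, total_duplexes);
    this layout is hypothetical.
    """
    return sum(binom_log_pmf(k, n, tumor_fraction) for k, n in counts.values())
```

Summing logarithms rather than multiplying probabilities avoids underflow when many contexts each contribute a small probability.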

[0447] (A18) For any of the methods represented by (A1) to (A17), wherein the second statistical model is a binomial model.

[0448] (A19) For the method denoted as (A18), the second likelihood is determined as:

[0449] $$P(D \mid M^{-}) = \mathrm{Binomial}\left(\#\text{mutant duplexes};\ \#\text{total duplexes},\ \text{error rate}\right)$$

[0450] Where #mutant duplexes is the number of mutant DNA duplexes in the patient sample,

[0451] Where #total duplexes is the total number of DNA duplexes assessed for mutation in the patient sample,

[0452] Where the error rate is the background mutation frequency, and

[0453] Where M- indicates MRD-negative status and D indicates the sequencing data.
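
As a non-limiting numerical illustration of a binomial second likelihood, consider assumed values that are not taken from this disclosure: one mutant duplex among 100,000 total duplexes and an error rate of 10^-7.

```latex
\begin{aligned}
P(D \mid M^{-}) &= \binom{10^{5}}{1}\,\bigl(10^{-7}\bigr)^{1}\,\bigl(1-10^{-7}\bigr)^{10^{5}-1} \\
&\approx 10^{5}\cdot 10^{-7}\cdot e^{-0.01} \\
&\approx 9.9\times 10^{-3}
\end{aligned}
```

Under these assumed values, a single mutant duplex is expected from background error alone in roughly 1% of MRD-negative samples.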

[0454] (A20) For the method represented as (A19), it further includes setting the background mutation frequency as:

[0455] (a) The same value for all mutations;

[0456] (b) Different background mutation frequencies of mutations in different mutation contexts; or

[0457] (c) Values measured empirically on a per-sample basis, or different values measured empirically for different mutation contexts.

[0458] (A21) For the method denoted as (A20), wherein when the background mutation frequency is set to a value empirically measured on a per-sample basis or a different value empirically measured for different mutation contexts, the method further comprises using pooled probe testing to empirically determine the value or these values, or using targeted duplex sequencing on the patient sample to empirically determine the value or these values.

[0459] (A22) For a method represented as any one of (A1) to (A12), wherein the second likelihood is determined as a product of context-specific likelihoods of a set of discrete mutation contexts to provide context-specific noise tuning.

[0460] (A23) For methods denoted as (A1) or (A2), the second statistical model is a β-binomial model.

[0461] (A24) For a method denoted as (A22) or (A23), wherein the second likelihood is determined as a product of β-binomial likelihoods, each likelihood being determined for a respective mutation context using a β-binomial model having a context-specific prior based on the context-specific background mutation frequency.

[0462] (A25) For the method denoted as (A24), the context-specific background mutation frequency is determined using the sample-specific SNV frequency measured by the targeted sequencing assay.

[0463] (A26) For a method represented as any one of (A22) to (A24), wherein the second likelihood is determined as:

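[0464] $$P(D \mid M^{-}) = \prod_{c \in C} \beta\mathrm{Binom}\left(\#\text{mutant duplexes}_{c};\ \#\text{total duplexes}_{c},\ \alpha_{c},\ \beta_{c}\right)$$

[0466] Where βBinom represents the β-binomial model,

[0467] Where C is the set of discrete mutation contexts,

[0468] Where c represents a mutation context in the set of discrete mutation contexts,

[0469] Where #mutant duplexes_c is the number of mutant DNA duplexes in mutation context c in the patient sample,

[0470] Where #total duplexes_c is the total number of DNA duplexes assessed in mutation context c, and

[0471] to [0475] (The expressions defining the context-specific parameters of the β-binomial model, written here as α_c and β_c, are not reproduced in this text.)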
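
As a non-limiting illustration of a second likelihood formed as a product of β-binomial likelihoods over mutation contexts, the sketch below evaluates the standard β-binomial probability in log space. The per-context shape parameters are taken as inputs, because the expressions that derive them from the background mutation data are not available here; the function names and input layout are hypothetical.

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binom_log_pmf(k, n, alpha, beta):
    """Log of BetaBinomial(k; n, alpha, beta)."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def second_log_likelihood_by_context(counts, shape_params):
    """Sum per-context beta-binomial log-likelihoods.

    `counts` maps context -> (mutant_duplexes, total_duplexes) and
    `shape_params` maps context -> (alpha, beta); both layouts are assumed.
    """
    return sum(
        beta_binom_log_pmf(k, n, *shape_params[c]) for c, (k, n) in counts.items()
    )
```

Compared with a plain binomial, the β-binomial lets the error rate itself vary around its context-specific mean, which widens the range of mutant counts that background noise can explain.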

[0476] (A27) For any of the methods represented by (A1) to (A26), wherein the set of patient-specific tumor mutations is determined by:

[0477] (i) Sequencing of tumor genomic DNA from a tumor derived from a patient to identify multiple mutation sites in the tumor genomic DNA; and

[0478] (ii) Processing and filtering the plurality of mutation sites to identify at least one selected mutation site associated with the patient.

[0479] (A28) For the method represented as (A27), wherein processing and filtering the plurality of mutation sites comprises:

[0480] Select at least one of the plurality of mutation sites;

[0481] Analyze the at least one selected mutation site in the matched normal genomic DNA;

[0482] Analyze the at least one selected mutation site in the matched tumor genomic DNA;

[0483] For the at least one selected mutation site, determine a first number of mutant double strands in the matched normal genomic DNA;

[0484] For the at least one selected mutation site, determine a second number of mutant double strands in the matched tumor genomic DNA;

[0485] Determine the ratio of mutant double-stranded molecules to mutant single-stranded molecules in the matched tumor genomic DNA; and

[0486] Determine whether the first number of mutant double strands in the matched normal genomic DNA is zero, whether the second number of mutant double strands in the matched tumor genomic DNA is greater than zero, and whether the ratio of mutant double strands to mutant single strands in the matched tumor genomic DNA is greater than 0.15.

[0487] (A29) For the method represented as (A28), wherein the at least one selected mutation site is identified for consideration for minimal residual disease detection if the first number of mutant double strands in the matched normal genomic DNA is zero, the second number of mutant double strands in the matched tumor genomic DNA is greater than zero, and the ratio of mutant double strands to mutant single strands in the matched tumor genomic DNA is greater than 0.15.
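
As a non-limiting illustration of the three conditions in (A28) and (A29), the sketch below tests whether a candidate site qualifies for MRD tracking. The function and parameter names are hypothetical, and the ratio test is written by cross-multiplication so that a site with no mutant single-stranded molecules does not cause a division by zero, which is a choice made for this example.

```python
def passes_mrd_site_filter(
    normal_mutant_duplexes,
    tumor_mutant_duplexes,
    tumor_mutant_single_strands,
    min_ratio=0.15,
):
    """Return True if the site meets all three conditions of (A29)."""
    # Condition 1: no mutant duplexes in the matched normal DNA.
    if normal_mutant_duplexes != 0:
        return False
    # Condition 2: at least one mutant duplex in the matched tumor DNA.
    if tumor_mutant_duplexes <= 0:
        return False
    # Condition 3: duplex-to-single-strand ratio above min_ratio,
    # written as duplexes > min_ratio * single_strands.
    return tumor_mutant_duplexes > min_ratio * tumor_mutant_single_strands
```

For instance, a site with 0 mutant duplexes in normal DNA, 3 in tumor DNA, and 10 mutant single-stranded molecules has a ratio of 0.3 and passes, whereas the same site with only 1 mutant duplex in tumor DNA has a ratio of 0.1 and does not.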

[0488] (A30) For the method represented as (A28), wherein if the first number of mutant double strands in the matched normal genomic DNA is not zero, the second number of mutant double strands in the matched tumor genomic DNA is not greater than zero, and the ratio of mutant double strands to mutant single strands in the matched tumor genomic DNA is not greater than 0.15, then the at least one selected mutation site is not identified for consideration for minimal residual disease detection, and then multiple criteria are examined to determine whether the detection is affected by germline or somatic factors.

[0489] (A31) For the method represented as (A30), the plurality of criteria includes one or more of the following: tumor validation, the at least one selected mutation site is not in a matching set of patient-specific tumor mutations in the patient sample, the first number of mutant double strands in the matched normal genomic DNA is zero, the second number of mutant double strands in the matched tumor genomic DNA is zero, and the ratio of mutant double strands to mutant single strands in the matched tumor genomic DNA is greater than 0.15.

[0490] (A32) For any of the methods represented by (A1) to (A31), wherein the patient sample is a biological sample.

[0491] (A33) For the method represented as (A32), the biological sample is a blood sample.

[0492] (A34) For any of the methods represented by (A1) to (A31), wherein the patient sample is derived from a liquid biopsy.

[0493] (A35) For any of the methods represented by (A1) to (A34), wherein the patient has cancer or has had cancer.

[0494] (A36) For the method represented as (A35), the cancer is blood or lymphatic cancer, bone or soft tissue cancer, brain or central nervous system cancer, breast cancer, childhood cancer, digestive system cancer, eye cancer, head and neck cancer, lung cancer, pelvic cancer, skin cancer, or urinary tract cancer.

[0495] (A37) For the method represented as (A36), the cancer is glioblastoma.

[0496] (A38) For the method represented as (A37), the cancer is melanoma.

[0497] (A39) For any of the methods represented by (A1) to (A38), wherein the sequencing is next-generation sequencing (NGS).

[0498] (A40) For methods denoted as (A1) or (A2), the following is further included:

[0499] Before determining the posterior probability that the patient sample is MRD positive,

[0500] Treating the patient according to an initial treatment plan;

[0501] In response to determining that the patient sample is MRD positive, modifying the initial treatment plan to create an updated treatment plan; and

[0502] Treating the patient according to the updated treatment plan.

[0503] (A41) For the method denoted as (A40), where:

[0504] The initial treatment plan includes administering a first drug to the patient at a first dose.

[0505] Changing the initial treatment plan includes increasing the first dose to a second dose, and

[0506] The updated treatment plan includes administering the first drug to the patient at the second dose.

[0507] (A42) For the method denoted as (A40), where:

[0508] The initial treatment plan includes performing surgery on the patient;

[0509] Changing the treatment plan includes identifying the need for further therapy; and

[0510] The updated treatment plan includes administering the further therapy to the patient.

[0511] (A43) For the method represented as (A42), the further therapy includes adjuvant chemotherapy.

[0512] (A44) For the method denoted as (A40), where:

[0513] The initial treatment plan includes administering a first drug to the patient and / or performing surgery on the patient.

[0514] Changing the initial treatment plan includes determining whether to administer a second drug to the patient, and

[0515] The updated treatment plan includes administering the second drug to the patient.

[0516] (A45) For methods denoted as (A1) or (A2), the following is further included:

[0517] Before determining the posterior probability that the patient sample is MRD negative,

[0518] Treating the patient according to an initial treatment plan;

[0519] In response to determining that the patient sample is MRD negative, modifying the initial treatment plan to create an updated treatment plan; and

[0520] Treating the patient according to the updated treatment plan.

[0521] (A46) For the method denoted as (A45), where:

[0522] The initial treatment plan includes administering a first drug to the patient at a first dose.

[0523] Changing the initial treatment plan includes reducing the first dose to a second dose, and

[0524] The updated treatment plan includes administering the first drug to the patient at the second dose.

[0525] (A47) For the method denoted as (A45), where:

[0526] The initial treatment plan includes administering a first drug to the patient.

[0527] Changing the initial treatment plan includes removing the first drug from the initial treatment plan, and

[0528] The updated treatment plan includes treating the patient without administering the first drug or discontinuing treatment for the patient.

[0529] (A48) For any of the methods represented by (A1) to (A47), wherein MRD is detected between at least 0.10 and at least 10.0 parts per million (ppm) of tumor-derived cell-free DNA.

[0530] (A49) For the method expressed as (A48), wherein MRD is detected at a concentration of at least 0.10, at least 0.15, at least 0.20, at least 0.25, at least 0.30, at least 0.35, at least 0.40, at least 0.45, at least 0.50, at least 0.55, at least 0.60, at least 0.65, at least 0.70, at least 0.75, at least 0.80, at least 0.85, at least 0.90, at least 0.95, or at least 1.00 ppm.

[0531] (A50) For any of the methods represented by (A1) to (A49), wherein MRD is detected with a specificity of at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100%.

[0532] (B1) A system comprising:

[0533] At least one computer hardware processor; and

[0534] At least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform a method represented as any one of (A1) to (A39) or (A48) to (A50).

[0535] (C1) At least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by at least one computer hardware processor, cause the at least one computer hardware processor to perform a method represented as any one of (A1) to (A39) or (A48) to (A50).

Claims

1. A system comprising: a classification module, implemented in a non-transitory computer-readable storage medium and configured to: classify a sample as positive or negative for circulating tumor DNA based on a dynamic probability score relative to a threshold, wherein the dynamic probability score is determined using a specificity factor based on targeted cell-free DNA sequencing data from the sample, the specificity factor including the number of mutant duplexes carrying a mutation detected in the sample and the total number of putative duplexes assessed for the mutation in the sample, and wherein the dynamic probability score is further based on: a first likelihood that the number of mutant duplexes is derived entirely from tumor; and a second likelihood that the number of mutant duplexes arises from spontaneous errors or non-cancer-derived mutations; and output the classification of the sample.

2. The system of claim 1, wherein the targeted cell-free DNA sequencing data is enriched for DNA duplexes with mutation sites found in tumor fingerprints.

3. The system of claim 2, wherein the tumor fingerprint is associated with a patient from whom the sample was obtained.

4. The system of claim 3, wherein the targeted cell-free DNA sequencing data is further enriched for mutated DNA duplexes having additional mutation sites found in tumor fingerprints associated with other patients.

5. The system of claim 2, wherein the tumor fingerprint is associated with a patient other than the patient from whom the sample was obtained.

6. The system of any one of claims 2 to 5, wherein the targeted cell-free DNA sequencing data is enriched for the DNA duplexes using MAESTRO enrichment.

7. The system of claim 1, wherein the dynamic probability score is further based on the background mutation frequency.

8. The system of claim 7, wherein the background mutation frequency is a fixed value, a context-specific value, or a sample-specific value.

9. The system according to claim 1, wherein the first likelihood and the second likelihood are binomial likelihoods.

10. The system of claim 1, wherein the first likelihood is a binomial likelihood and the second likelihood is a β-binomial likelihood.

11. The system of claim 1, further comprising a quantification and analysis module, implemented in the non-transitory computer-readable storage medium and configured to: determine the number of mutant duplexes carrying the mutation detected in the sample; and estimate the total number of putative duplexes assessed for the mutation in the sample.

12. The system of claim 11, wherein the total number of putative duplexes assessed for the mutation in the sample is estimated using a predetermined number of the lowest-enrichment sites for the mutation.

13. The system of claim 11, wherein the total number of putative duplexes assessed for the mutation in the sample is estimated using control probes designed to be unbiased between mutant alleles and wild-type alleles.

14. The system of claim 1, wherein the classification module is further configured to: output the first likelihood via a first likelihood model that uses the number of mutant duplexes in the sample and the total number of putative duplexes assessed for the mutation in the sample; and output the second likelihood via a second likelihood model that uses the number of mutant duplexes in the sample and the total number of putative duplexes assessed for the mutation in the sample.

15. The system of claim 14, wherein the first likelihood model comprises a binomial distribution that models the number of mutant duplexes in the sample and the total number of putative duplexes assessed for the mutation in the sample, given a probability parameter based on an assumed background mutation frequency.

16. The system of claim 14, wherein the second likelihood model comprises a binomial distribution that models the number of mutant duplexes in the sample and the total number of putative duplexes assessed for the mutation in the sample, given an assumed background mutation frequency.

17. The system of claim 1, wherein the specificity factor further comprises a mutation context-specific factor, including background mutation frequencies adjusted for a plurality of mutation contexts.

18. The system of claim 17, wherein the classification module is further configured to: output the first likelihood via a first likelihood model that uses the number of mutant duplexes in the sample in a given mutation context of the plurality of mutation contexts and the total number of putative duplexes assessed for the mutation in the sample in the given mutation context, the first likelihood corresponding to the product of the first likelihoods over the plurality of mutation contexts; and output the second likelihood via a second likelihood model that uses the number of mutant duplexes in the sample in the given mutation context, the total number of putative duplexes assessed for the mutation in the sample in the given mutation context, the background mutation frequency observed in the sample in the given mutation context, and the number of background bases observed in the sample in the given mutation context, the second likelihood corresponding to the product of the second likelihoods over the plurality of mutation contexts.

19. The system of claim 18, wherein the first likelihood model is a binomial model.

20. The system of claim 18, wherein the second likelihood model is a β-binomial model.

21. The system of claim 1, wherein the classification module is further configured to: output the dynamic probability score by calculating a Bayesian probability using the first likelihood and the second likelihood.

22. The system of claim 1, wherein, to classify the sample as positive or negative for circulating tumor DNA based on the dynamic probability score relative to the threshold, the classification module is further configured to: classify the sample as positive for circulating tumor DNA in response to the dynamic probability score being greater than or equal to the threshold; and classify the sample as negative for circulating tumor DNA in response to the dynamic probability score being less than the threshold.

23. A method comprising: receiving targeted cell-free DNA sequencing data from a patient sample, the targeted cell-free DNA sequencing data being enriched for DNA duplexes with mutation sites found in tumor fingerprints; determining a dynamic probability score for the minimal residual disease (MRD) status of the sample based on a specificity factor, the specificity factor including the number of mutant duplexes in the sample and the total number of putative duplexes assessed at the mutation sites in the sample, by: outputting, via a first likelihood model that uses the number of mutant duplexes and the total number of putative duplexes assessed at the mutation sites in the sample, a first likelihood that the number of mutant duplexes is derived entirely from tumor; outputting, via a second likelihood model that uses the number of mutant duplexes and the total number of putative duplexes assessed at the mutation sites in the sample, a second likelihood that the number of mutant duplexes arises from spontaneous errors or non-cancer-derived mutations; and determining the dynamic probability score based on the first likelihood and the second likelihood; classifying the MRD status of the sample as MRD positive or MRD negative based on the dynamic probability score relative to a threshold; and outputting the classified MRD status.

24. The method of claim 23, wherein classifying the MRD status of the sample as MRD positive or MRD negative based on the dynamic probability score relative to the threshold comprises: classifying the MRD status of the sample as MRD positive in response to the dynamic probability score being greater than or equal to the threshold; or classifying the MRD status of the sample as MRD negative in response to the dynamic probability score being less than the threshold.

25. The method of claim 23, wherein the specificity factor further comprises a mutation context-specific factor.

26. The method of claim 25, wherein the mutation context-specific factor comprises the observed background mutation frequency of the sample and the observed number of background bases in each of a plurality of mutation contexts for the mutation sites in the sample.

27. The method of claim 25, further comprising: adjusting parameters of the first likelihood model and the second likelihood model based on the mutation context-specific factor.

28. The method of claim 23, wherein the first likelihood model and the second likelihood model further utilize background mutation frequencies.

29. The method of claim 28, wherein the background mutation frequency is a fixed value, a context-specific value, or a sample-specific value.

30. The method of claim 23, wherein the targeted cell-free DNA sequencing data is further enriched for additional DNA duplexes having additional mutation sites found in additional tumor fingerprints.

31. The method of claim 23, wherein the first likelihood model and the second likelihood model are binomial models.

32. The method of claim 23, wherein the first likelihood model is a binomial model and the second likelihood model is a β-binomial model.

33. The method of claim 23, wherein determining the dynamic probability score based on the first likelihood and the second likelihood includes using the first likelihood and the second likelihood in Bayesian probability calculation.