Methods for detecting nucleic acid variants
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- GUARDANT HEALTH INC
- Filing Date
- 2025-10-10
- Publication Date
- 2026-06-11
Abstract
Description
Atty. Docket No. GH0206WO
METHODS FOR DETECTING NUCLEIC ACID VARIANTS
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of priority of US Provisional Patent Application No. 63/706,404, filed October 11, 2024, which is incorporated by reference herein in its entirety for all purposes.
FIELD OF THE INVENTION
[0002] The present disclosure provides methods related to analyzing DNA, such as cell-free DNA. In some embodiments, the DNA is contacted with adapter-blocking probes and intron-blocking probes, and a plurality of primers that bind V regions or J regions. In some embodiments, at least a portion of primers bound to DNA comprising recombined CDR3 sequences are extended across the recombined CDR3 sequences. In some embodiments, the resulting CDR3-enriched DNA is sequenced. In some embodiments, a plurality of target regions comprising sequence-variable target regions and / or epigenetic target regions is captured. In some embodiments, the DNA is from a subject having or suspected of having cancer, and / or the DNA includes DNA from cancer cells.
SEQUENCE LISTING
[0003] The present application is filed with a Sequence Listing in electronic format. The Sequence Listing is provided as a file entitled “GH0206WO-ST26.xml” created on September 30, 2025, which is 252,555 bytes in size. The information in the electronic format of the sequence listing is incorporated herein by reference in its entirety.
INTRODUCTION AND SUMMARY
[0004] Cancer is responsible for millions of deaths per year worldwide. Early cancer detection may result in improved outcomes because early-stage cancer tends to be more susceptible to treatment.
[0005] Improperly controlled cell growth is a hallmark of cancer. Cancer is usually caused by the accumulation of mutations within an individual's normal cells, at least some of which result in improperly regulated cell division. Such mutations commonly include single nucleotide variations (SNVs), gene fusions, insertions and deletions (indels), transversions, translocations, and inversions. Cancers derived from certain cell types, such as lymphocytes, may also comprise recombined CDR3 sequences that can be useful, e.g., for detecting or identifying cancer cells. Cancers may also exhibit an accumulation of epigenetic changes, including modification of cytosine (e.g., 5-methylcytosine, 5-hydroxymethylcytosine, and other more oxidized forms) and association of DNA with chromatin proteins and transcription factors.
[0006] Thus, cancer can be indicated by non-sequence modifications, such as methylation. Examples of methylation changes in cancer include local gains of DNA methylation, e.g., in the CpG islands at the transcription start sites of genes involved in normal growth control, DNA repair, cell cycle regulation, and / or cell differentiation. Hypermethylation can be associated with an aberrant loss of transcriptional capacity of involved genes and occurs at least as frequently as point mutations and deletions as a cause of altered gene expression. Furthermore, without wishing to be bound by any particular theory, cells in or around a cancer or neoplasm may shed more DNA than cells of the same tissue type in a healthy subject. The DNA from such cells may differ epigenetically from shed DNA in a healthy subject. As such, the distribution of epigenetically modified (e.g., methylated) DNA in certain DNA samples, such as cell-free DNA (cfDNA), may change upon carcinogenesis. Thus, sufficiently sensitive epigenetic (e.g., DNA methylation) profiling can be used to detect aberrant methylation in DNA of a sample.
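The reasoning above, that the share of methylated molecules at a region may shift upon carcinogenesis, can be illustrated with a toy comparison against a healthy baseline. This is a minimal sketch under stated assumptions, not the disclosed method: the `RegionCounts` structure, the region names, the baseline values, and the `min_excess` cutoff are all hypothetical, and how individual molecules are called methylated is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class RegionCounts:
    """Hypothetical per-region tally for one epigenetic target region."""
    name: str
    methylated: int  # molecules called methylated in this region
    total: int       # all molecules observed in this region


def methylated_fraction(region: RegionCounts) -> float:
    """Fraction of observed molecules in the region that are methylated."""
    return region.methylated / region.total if region.total else 0.0


def flag_hypermethylated(regions: Sequence[RegionCounts],
                         baseline: Dict[str, float],
                         min_excess: float = 0.05) -> List[str]:
    """Names of regions whose methylated fraction exceeds the healthy
    baseline for that region by more than `min_excess` (hypothetical cutoff)."""
    flagged = []
    for region in regions:
        excess = methylated_fraction(region) - baseline.get(region.name, 0.0)
        if excess > min_excess:
            flagged.append(region.name)
    return flagged


# Usage with made-up counts: 30/200 = 0.15 methylated vs. a 0.01 baseline.
regions = [RegionCounts("region_A", 30, 200), RegionCounts("region_B", 2, 180)]
baseline = {"region_A": 0.01, "region_B": 0.01}
print(flag_hypermethylated(regions, baseline))  # ['region_A']
```

Comparing each region with its own baseline, rather than applying one global cutoff, allows for regions whose normal methylation levels differ.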
[0007] Biopsies represent a traditional approach for detecting or diagnosing cancer in which cells or tissue are extracted from a possible cancer site and analyzed for relevant phenotypic and / or genotypic features. Biopsies have the drawback of being invasive.
[0008] Cancer detection based on analysis of body fluids (“liquid biopsies”), such as blood, is an intriguing alternative based on the observation that DNA from cancer cells is released into body fluids. A liquid biopsy is noninvasive (sometimes requiring only a blood draw). However, it has been challenging to develop accurate and sensitive methods for analyzing liquid biopsy material because the amount of nucleic acids released into body fluids is low and variable, as is recovery of nucleic acids from such fluids in analyzable form. These sources of variation can obscure the predictive value of mutations (e.g., rearrangements, such as translocations and indels) among samples. Such mutations may include biomarkers that can be used to evaluate whether a subject diagnosed with, or suspected of having signs of, a cancer will benefit from a specific type of cancer therapy, such as Immuno-Oncology (I-O) therapy. Isolating and processing cell-free DNA useful for further analysis in liquid biopsy procedures can be a useful part of these methods. Accordingly, there is a need for improved methods and compositions for analyzing cell-free DNA, e.g., in liquid biopsies.
[0009] To identify recombined CDR3 sequences, simple capture of V, D, or J segments will not necessarily distinguish them from uninformative germline sequences, e.g., from cells of non-lymphocytic lineages. Accordingly, multiplex primer extension using primers and blocking probes to provide CDR3-enriched DNA can be used in some embodiments disclosed herein.
[0010] With regard to identifying structural variations, DNA breakpoints can vary; therefore, effective methods of detecting rearrangements in DNA should capture as many breakpoints as possible. Existing methods may not be selective for rearrangements and may result in significant detection of wild-type DNA and, thus, poor signal-to-noise ratios. In some embodiments, the methods herein comprise multiplex amplification of DNA comprising a rearrangement and detection of the primer-extended products.
[0011] The methods herein can provide combined information about CDR3 sequences and / or DNA rearrangements and other modifications, including but not limited to sequence variations. Existing methods may not provide for capture and analysis of modifications such as sequence variations in sequence-variable target regions and / or epigenetic variations in epigenetic target regions together with analysis of CDR3 sequences and / or DNA rearrangements from a single sample.
[0012] The present disclosure aims to meet the need for improved analysis of DNA (such as cell-free DNA) comprising a rearrangement and / or recombined CDR3 sequences together with other modifications, such as sequence variations in sequence-variable target regions and / or epigenetic variations in epigenetic target regions, provide other benefits, or at least provide the public with a useful choice. Accordingly, the following exemplary embodiments are provided.
[0013] Embodiment 1 is a method of analyzing DNA in an adapted library, the method comprising: a) contacting the DNA with (i) adapter-blocking probes and intron-blocking probes, thereby providing blocked DNA; and (ii) a plurality of primers, wherein the primers bind J regions and the intron-blocking probes bind J-region introns, or the primers bind V regions and the intron-blocking probes bind V-region introns; b) extending at least a portion of the primers, wherein at least a portion of primers bound to DNA comprising recombined CDR3 sequences are extended across the recombined CDR3 sequences, thereby providing CDR3-enriched DNA; and c) sequencing the CDR3-enriched DNA.
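The blocking logic of Embodiment 1 can be pictured with a toy model in which a template is reduced to an ordered list of segment labels and extension stops at the first segment occupied by a blocking probe. This is an illustrative sketch under stated assumptions, not the disclosed protocol: the segment labels (`J`, `J_intron`, `CDR3`, `V`, `D`) and the helper names are hypothetical, `J_intron` stands for the germline sequence lying next to the J segment in the direction of extension, strandedness is ignored, and adapter-blocking probes are not modeled.

```python
from typing import Iterable, List, Sequence


def extend_primer(segments: Sequence[str], blocked: Iterable[str]) -> List[str]:
    """Copy template segments, starting at the primer-binding segment, until a
    segment bound by a blocking probe is reached (toy model)."""
    blocked_set = set(blocked)
    product: List[str] = []
    for segment in segments:
        if segment in blocked_set:
            # A hybridized probe is assumed to halt a non-strand-displacing
            # polymerase that lacks 5' to 3' exonuclease activity.
            break
        product.append(segment)
    return product


def is_cdr3_enriched(product: Sequence[str]) -> bool:
    """A product counts as CDR3-enriched if extension copied the CDR3 segment."""
    return "CDR3" in product


INTRON_BLOCKERS = {"J_intron"}

# Germline J locus: the intron-blocking probe is bound, so extension stops at once.
germline = ["J", "J_intron", "D"]
# Recombined locus: the germline sequence next to J was excised during
# recombination, so extension crosses CDR3 into V.
recombined = ["J", "CDR3", "V"]

print(extend_primer(germline, INTRON_BLOCKERS))    # ['J']
print(extend_primer(recombined, INTRON_BLOCKERS))  # ['J', 'CDR3', 'V']
```

A V-primer version of the same toy model would list segments starting from V and block on a hypothetical `V_intron` label.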
[0014] Embodiment 2 is the method of the immediately preceding embodiment, wherein the primers are extended using a non-strand-displacing polymerase.
[0015] Embodiment 3 is the method of any one of the preceding embodiments, wherein the primers are extended using a polymerase that lacks 5’ to 3’ exonuclease activity.
[0016] Embodiment 4 is the method of any one of embodiments 1-3, comprising contacting the DNA with adapter-blocking probes, V region intron-blocking probes, and a plurality of primers that bind V regions.
[0017] Embodiment 5 is the method of any one of embodiments 1-3, comprising contacting the DNA with adapter-blocking probes, J region intron-blocking probes, and a plurality of primers that bind J regions.
[0018] Embodiment 6 is the method of any one of embodiments 1-3, comprising contacting the DNA with adapter-blocking probes, J region intron-blocking probes, V region intron-blocking probes, a plurality of primers that bind V regions, and a plurality of primers that bind J regions.
[0019] Embodiment 7 is the method of any one of the preceding embodiments, wherein the plurality of primers that bind V regions are oriented to prime extension toward the J exon.
[0020] Embodiment 8 is the method of any one of the preceding embodiments, wherein the plurality of primers that bind J regions are oriented to prime extension toward the V exon.
[0021] Embodiment 9 is the method of any one of the preceding embodiments, wherein the plurality of primers that bind V regions comprises from 5 to 300, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 120 to 140, from 140 to 160, from 160 to 180, from 180 to 200, from 200 to 220, from 220 to 240, from 240 to 260, from 260 to 280, from 280 to 300, from 40 to 300, from 40 to 280, from 100 to 300, from 100 to 280, from 150 to 300, from 100 to 200, from 200 to 300, from 10 to 150, from 15 to 125, from 20 to 100, from 30 to 90, from 40 to 80, or from 50 to 70 primers that bind V regions.
[0022] Embodiment 10 is the method of any one of the preceding embodiments, wherein the plurality of primers that bind J regions comprises from 5 to 120, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 50 to 120, from 80 to 120, from 60 to 100, from 5 to 30, from 10 to 25, or from 10 to 20 primers that bind J regions.
[0023] Embodiment 11 is the method of any one of the preceding embodiments, wherein the DNA contacted with the adapter-blocking probes, intron-blocking probes, and primers is in a first subsample.
[0024] Embodiment 12 is the method of the immediately preceding embodiment, wherein at least a second subsample of the adapted library is retained as a backup.
[0025] Embodiment 13 is the method of any one of the preceding embodiments, further comprising capturing a plurality of target regions from the DNA, thereby providing captured regions.
[0026] Embodiment 14 is the method of the immediately preceding embodiment, wherein the plurality of target regions comprises sequence-variable target regions.
[0027] Embodiment 15 is the method of embodiment 13 or embodiment 14, wherein the plurality of target regions comprises epigenetic target regions.
[0028] Embodiment 16 is the method of any one of embodiments 13-15, wherein the plurality of target regions comprises sequence-variable target regions and epigenetic target regions.
[0029] Embodiment 17 is the method of any one of embodiments 13-16, wherein the plurality of target regions is captured from a first subsample or from a second subsample of the adapted library.
[0030] Embodiment 18 is the method of any one of embodiments 11-17, wherein a third subsample of the adapted library is retained as a backup.
[0031] Embodiment 19 is the method of any one of embodiments 13-18, further comprising sequencing the captured regions.
[0032] Embodiment 20 is the method of any one of embodiments 14-19, further comprising sequencing the captured sequence-variable target regions.
[0033] Embodiment 21 is the method of any one of embodiments 15-20, further comprising sequencing the captured epigenetic target regions.
[0034] Embodiment 22 is the method of any one of embodiments 13-21, wherein the captured regions are amplified prior to sequencing.
[0035] Embodiment 23 is the method of any one of embodiments 13-22, wherein the captured regions and the CDR3-enriched DNA are pooled and sequenced together.
[0036] Embodiment 24 is the method of any one of embodiments 13-22, wherein the captured regions and the CDR3-enriched DNA are sequenced separately.
[0037] Embodiment 25 is the method of any one of embodiments 15-24, wherein the epigenetic target regions comprise hypermethylation variable target regions, hypomethylation variable target regions, methylation control target regions, or fragmentation variable target regions.
[0038] Embodiment 26 is the method of any one of embodiments 14-25, further comprising quantifying a somatic mutation load using a plurality of captured regions comprising the sequence-variable target regions.
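Embodiment 26 does not specify how the somatic mutation load is expressed. One common convention, assumed here rather than taken from the disclosure, divides the number of somatic mutations by the size of the sequence-variable panel and reports mutations per megabase. The `VariantCall` structure and its `somatic` flag are hypothetical; deciding which calls are somatic is outside this sketch.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class VariantCall:
    """Hypothetical variant call within a sequence-variable target region."""
    chrom: str
    pos: int
    ref: str
    alt: str
    somatic: bool  # assumed to be set upstream (e.g., after germline filtering)


def somatic_mutations_per_mb(calls: Iterable[VariantCall], panel_size_bp: int) -> float:
    """Count somatic calls and normalize by panel size in megabases."""
    if panel_size_bp <= 0:
        raise ValueError("panel_size_bp must be positive")
    n_somatic = sum(1 for call in calls if call.somatic)
    return n_somatic / (panel_size_bp / 1_000_000)


calls = [
    VariantCall("chr1", 1000, "C", "T", somatic=True),
    VariantCall("chr2", 2000, "G", "A", somatic=True),
    VariantCall("chr3", 3000, "A", "G", somatic=True),
    VariantCall("chr4", 4000, "T", "C", somatic=False),  # germline, excluded
]
print(somatic_mutations_per_mb(calls, panel_size_bp=500_000))  # 6.0
```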
[0039] Embodiment 27 is the method of any one of the preceding embodiments, wherein at least a portion of the plurality of primers that bind J regions and / or at least a portion of the plurality of primers that bind V regions do not exponentially amplify a target region that does not comprise a CDR3 sequence.
[0040] Embodiment 28 is the method of any one of the preceding embodiments, wherein hybridization of an intron-blocking probe to the DNA at least partially blocks extension of at least a portion of the plurality of primers.
[0041] Embodiment 29 is the method of any one of the preceding embodiments, wherein the plurality of primers that bind V regions and / or the plurality of primers that bind J regions comprise a label.
[0042] Embodiment 30 is the method of the immediately preceding embodiment, wherein the plurality of primers that bind V regions and / or the plurality of primers that bind J regions comprise the same label.
[0043] Embodiment 31 is the method of embodiment 29 or embodiment 30, wherein the plurality of primers that bind V regions comprises a first label and the plurality of primers that bind J regions comprises a second label.
[0044] Embodiment 32 is the method of any one of the preceding embodiments, wherein a label is incorporated into the CDR3-enriched DNA during extension across segments comprising recombined CDR3 sequences.
[0045] Embodiment 33 is the method of any one of embodiments 29-32, wherein the label is biotin, avidin, streptavidin, neutravidin, an oligonucleotide, digoxigenin, a histidine tag, an affinity tag, an immunoglobulin constant domain, a nucleic acid comprising a particular nucleotide sequence, a hapten recognized by an antibody, or a magnetically attractable particle.
[0046] Embodiment 34 is the method of any one of the preceding embodiments, wherein the CDR3 sequence is a part of a T cell receptor (TCR), TCR beta chain, B cell receptor, immunoglobulin, B cell receptor heavy chain, or immunoglobulin heavy chain.
[0047] Embodiment 35 is the method of any one of the preceding embodiments, wherein the CDR3-enriched DNA comprises extended primers that bind V regions, and / or extended primers that bind J regions, wherein extension was not blocked by a J region intron-blocking probe or by a V region intron-blocking probe.
[0048] Embodiment 36 is the method of any one of the preceding embodiments, wherein each of the plurality of primers that bind J regions and / or each of the plurality of primers that bind V regions comprises at least 18, 19, or 20 linked nucleosides.
[0049] Embodiment 37 is the method of any one of the preceding embodiments, wherein each of the plurality of primers that bind J regions and / or each of the plurality of primers that bind V regions consists of 18, 19, or 20 to 60 linked nucleosides.
[0050] Embodiment 38 is the method of any of the preceding embodiments, wherein the plurality of primers that bind J regions and / or the plurality of primers that bind V regions are resistant to 5’ exonucleolysis.
[0051] Embodiment 39 is a method of analyzing DNA in an adapted library, the method comprising: a) contacting the DNA with one or more blocking probes, thereby providing blocked DNA; b) performing multiplex amplification of a plurality of target regions that may comprise a structural variation using a plurality of first primers and a plurality of second primers that anneal to the plurality of target regions, wherein the blocking probes inhibit amplification of wild-type DNA, thereby providing structural variation-enriched DNA; and c) sequencing the structural variation-enriched DNA.
[0052] Embodiment 40 is the method of the immediately preceding embodiment, wherein the multiplex amplification is performed with a non-strand-displacing polymerase that lacks 5’ to 3’ exonuclease activity.
[0053] Embodiment 41 is the method of embodiment 39 or 40, wherein the plurality of first primers comprises from 5 to 300, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 120 to 140, from 140 to 160, from 160 to 180, from 180 to 200, from 200 to 220, from 220 to 240, from 240 to 260, from 260 to 280, from 280 to 300, from 40 to 300, from 40 to 280, from 100 to 300, from 100 to 280, from 150 to 300, from 100 to 200, from 200 to 300, from 10 to 150, from 15 to 125, from 20 to 100, from 30 to 90, from 40 to 80, or from 50 to 70 primers.
[0054] Embodiment 42 is the method of any one of embodiments 39-41, wherein the plurality of second primers comprises from 5 to 300, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 120 to 140, from 140 to 160, from 160 to 180, from 180 to 200, from 200 to 220, from 220 to 240, from 240 to 260, from 260 to 280, from 280 to 300, from 40 to 300, from 40 to 280, from 100 to 300, from 100 to 280, from 150 to 300, from 100 to 200, from 200 to 300, from 10 to 150, from 15 to 125, from 20 to 100, from 30 to 90, from 40 to 80, or from 50 to 70 primers.
[0055] Embodiment 43 is the method of any one of embodiments 39-41, comprising contacting a first subsample of the DNA with the one or more blocking probes.
[0056] Embodiment 44 is the method of the immediately preceding embodiment, wherein a second subsample of the adapted library is retained as a backup.
[0057] Embodiment 45 is the method of any one of embodiments 39-44, further comprising capturing a second plurality of target regions from the DNA, thereby providing captured regions.
[0058] Embodiment 46 is the method of the immediately preceding embodiment, wherein the second plurality of target regions comprises sequence-variable target regions.
[0059] Embodiment 47 is the method of embodiment 45 or embodiment 46, wherein the second plurality of target regions comprises epigenetic target regions.
[0060] Embodiment 48 is the method of any one of embodiments 45-47, wherein the second plurality of target regions comprises sequence-variable target regions and epigenetic target regions.
[0061] Embodiment 49 is the method of any one of embodiments 45-48, wherein the second plurality of target regions is captured from a first subsample or from a second subsample of the adapted library.
[0062] Embodiment 50 is the method of any one of embodiments 43-49, wherein a third subsample of the adapted library is retained as a backup.
[0063] Embodiment 51 is the method of any one of embodiments 45-50, further comprising sequencing the captured regions.
[0064] Embodiment 52 is the method of any one of embodiments 46-51, further comprising sequencing the captured sequence-variable target regions.
[0065] Embodiment 53 is the method of any one of embodiments 47-52, further comprising sequencing the captured epigenetic target regions.
[0066] Embodiment 54 is the method of any one of embodiments 45-53, wherein the captured regions are amplified prior to sequencing.
[0067] Embodiment 55 is the method of any one of embodiments 45-54, wherein the captured regions and the structural variation-enriched DNA are pooled and sequenced together.
[0068] Embodiment 56 is the method of any one of embodiments 45-55, wherein the captured regions and the structural variation-enriched DNA are sequenced separately.
[0069] Embodiment 57 is the method of any one of embodiments 47-56, wherein the epigenetic target regions comprise hypermethylation variable target regions, hypomethylation variable target regions, methylation control target regions, or fragmentation variable target regions.
[0070] Embodiment 58 is the method of any one of embodiments 46-57, further comprising quantifying a somatic mutation load using a plurality of captured regions comprising the sequence-variable target regions.
[0071] Embodiment 59 is the method of any one of embodiments 39-58, wherein at least a portion of the plurality of first primers and / or at least a portion of the plurality of second primers do not exponentially amplify a target region that does not comprise a structural variation.
[0072] Embodiment 60 is the method of any one of embodiments 39-59, wherein hybridization of a blocking probe to a region of the DNA at least partially blocks extension of at least a portion of the plurality of first primers and / or at least a portion of the plurality of second primers.
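The same stop-at-the-probe picture can be applied to the structural-variation embodiments. In this toy model, a template is an ordered list of hypothetical locus labels, a first primer anneals at `A` and extends away from it, and a blocking probe occupies `A_wt`, the wild-type sequence that normally follows `A`. A template is treated as capable of exponential amplification only if the first-primer extension copies the second-primer site `B`, as it would across an `A`-`B` fusion junction. The labels and function names are illustrative assumptions; strandedness and second-primer extension are not modeled.

```python
from typing import Iterable, List, Sequence


def extend_from(template: Sequence[str], primer_site: str,
                blocked: Iterable[str]) -> List[str]:
    """Copy from `primer_site` onward until a probe-bound segment (toy model)."""
    if primer_site not in template:
        return []  # the primer has nothing to anneal to
    blocked_set = set(blocked)
    product: List[str] = []
    for segment in template[template.index(primer_site):]:
        if segment in blocked_set:
            break
        product.append(segment)
    return product


def can_amplify_exponentially(template: Sequence[str], first_site: str,
                              second_site: str, blocked: Iterable[str]) -> bool:
    """True if the first-primer product carries a second-primer binding site."""
    return second_site in extend_from(template, first_site, blocked)


BLOCKERS = {"A_wt"}
wild_type_a = ["A", "A_wt"]   # unrearranged: extension halts at the probe
fusion = ["A", "B"]           # rearranged: extension reads into B
wild_type_b = ["B_wt", "B"]   # no first-primer site at all

for name, template in [("wild_type_a", wild_type_a), ("fusion", fusion),
                       ("wild_type_b", wild_type_b)]:
    print(name, can_amplify_exponentially(template, "A", "B", BLOCKERS))
```

Only the fusion template yields a first-primer product that the second primer can also prime on, which is the selectivity the blocking probes are meant to provide.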
[0073] Embodiment 61 is the method of any one of embodiments 39-60, wherein the structural variation comprises a rearrangement, an insertion, or a deletion.
[0074] Embodiment 62 is the method of the immediately preceding embodiment, wherein the rearrangement comprises translocations, gene fusions, duplications, copy-number variants, or inversions.
[0075] Embodiment 63 is the method of any one of embodiments 39-62, wherein the structural variation-enriched DNA comprises extended first primers, and / or extended second primers, wherein extension was not blocked by a blocking probe.
[0076] Embodiment 64 is the method of any one of embodiments 39-63, wherein the plurality of first primers and / or the plurality of second primers comprise a label.
[0077] Embodiment 65 is the method of the immediately preceding embodiment, wherein the plurality of first primers and the plurality of second primers comprise the same label.
[0078] Embodiment 66 is the method of any one of embodiments 39-65, wherein the plurality of first primers comprises a first label and the plurality of second primers comprises a second label.
[0079] Embodiment 67 is the method of any one of embodiments 39-66, wherein a label is incorporated into the structural variation-enriched DNA during the multiplex amplification of the plurality of target regions that may comprise a structural variation.
[0080] Embodiment 68 is the method of any one of embodiments 64-67, wherein the label is biotin, avidin, streptavidin, neutravidin, an oligonucleotide, digoxigenin, a histidine tag, an affinity tag, an immunoglobulin constant domain, a nucleic acid comprising a particular nucleotide sequence, a hapten recognized by an antibody, or a magnetically attractable particle.
[0081] Embodiment 69 is the method of any one of the preceding embodiments, wherein each of the plurality of the first and second primers comprises at least 20 linked nucleosides.
[0082] Embodiment 70 is the method of any one of the preceding embodiments, wherein each of the plurality of the first and second primers consists of 18, 19, or 20 to 60 linked nucleosides.
[0083] Embodiment 71 is the method of any of the preceding embodiments, wherein the plurality of the first and second primers are resistant to 5’ exonucleolysis.
[0084] Embodiment 72 is the method of any one of the preceding embodiments, wherein the method comprises preparing the adapted library by ligating adapters to DNA, thereby producing adapted DNA.
[0085] Embodiment 73 is the method of embodiment 72, wherein the adapted DNA comprises molecular barcodes.
[0086] Embodiment 74 is the method of any one of the preceding embodiments, wherein the adapted library is prepared from cfDNA.
[0087] Embodiment 75 is the method of any one of the preceding embodiments, wherein the method comprises preparing the adapted library by ligating adaptors to cfDNA, thereby producing adapted cfDNA.
[0088] Embodiment 76 is the method of any one of the preceding embodiments, wherein the adapted library is prepared from DNA from a subject having or suspected of having a cancer.
[0089] Embodiment 77 is the method of any one of the preceding embodiments, wherein the method comprises preparing the adapted library by ligating adaptors to cfDNA from a subject having or suspected of having a cancer, thereby producing adapted cfDNA.
[0090] Embodiment 78 is the method of any one of the preceding embodiments, further comprising determining a likelihood that the subject has a cancer.
[0091] Embodiment 79 is the method of any one of embodiments 76-78, wherein the cancer is a lymphocytic cancer.
[0092] Embodiment 80 is the method of the immediately preceding embodiment, wherein the lymphocytic cancer is a leukemia, a lymphoma, or a myeloma.
[0093] Embodiment 81 is the method of any one of embodiments 76-80, wherein the cancer is a lymphoma.
[0094] Embodiment 82 is the method of embodiment 81, wherein the lymphoma is B-cell lymphoma, non-Hodgkin lymphoma, diffuse large B-cell lymphoma, mantle cell lymphoma, T cell lymphoma, precursor T-lymphoblastic lymphoma / leukemia, or peripheral T cell lymphoma.
[0095] Embodiment 83 is the method of any one of the preceding embodiments, further comprising separating the CDR3-enriched DNA or the structural variation-enriched DNA from non-enriched DNA in the sample.
[0096] Embodiment 84 is the method of the immediately preceding embodiment, wherein the separating uses the label to separate the structural variation-enriched DNA or the CDR3-enriched DNA from non-enriched DNA in the sample.
[0097] Embodiment 85 is the method of embodiment 83 or embodiment 84, wherein the separating comprises precipitating the structural variation-enriched DNA or the CDR3-enriched DNA.
[0098] Embodiment 86 is the method of any one of embodiments 83-85, wherein the separating is performed at a temperature that facilitates (i) separation of extended primers that bind J regions from non-extended or partially-extended primers that bind J regions and / or separation of extended primers that bind V regions from non-extended or partially-extended primers that bind V regions; or (ii) separation of extended first primers and / or extended second primers from non-extended or partially-extended first and / or second primers.
[0099] Embodiment 87 is the method of any one of embodiments 83-86, wherein the separating is performed at a temperature that is (i) higher than the melting temperature of the non-extended and / or the partially-extended primers that bind J regions, and / or (ii) higher than the melting temperature of the non-extended and / or the partially-extended primers that bind V regions.
[0100] Embodiment 88 is the method of embodiment 86 or embodiment 87, wherein the temperature is 1-15, 1-10, 1-5, or 5-15, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 degrees Fahrenheit above the melting temperature of (i) the non-extended and / or the partially-extended primers that bind J regions and / or (ii) the non-extended or partially-extended primers that bind V regions.
[0101] Embodiment 89 is the method of any one of embodiments 83-86, wherein the separating is performed at a temperature that is higher than the melting temperature of the non-extended and / or the partially-extended first and / or second primers.
[0102] Embodiment 90 is the method of embodiment 86 or embodiment 89, wherein the temperature is 1-15, 1-10, 1-5, or 5-15, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 degrees Fahrenheit above the melting temperature of the non-extended and / or the partially-extended first and / or second primers.
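Embodiments 88 and 90 place the separation temperature a fixed number of degrees above the melting temperature (Tm) of the non-extended primers, so that short, unextended primers dissociate while longer extended products, which melt at higher temperatures, stay hybridized. The disclosure does not say how Tm is determined; this sketch assumes the widely used basic formula Tm = 64.9 + 41 × (nG + nC − 16.4) / N (in °C) for oligonucleotides longer than 13 nt, a rough estimate that ignores salt and oligonucleotide concentration. Because the offsets are stated in degrees Fahrenheit, the offset is converted to a Celsius difference (ΔT°C = ΔT°F × 5/9). Function names and the example primers are hypothetical.

```python
from typing import Iterable


def basic_tm_celsius(primer: str) -> float:
    """Rough Tm estimate: 64.9 + 41 * (G+C count - 16.4) / length, in deg C."""
    seq = primer.upper()
    if len(seq) < 14:
        raise ValueError("basic formula is intended for oligos longer than 13 nt")
    gc_count = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc_count - 16.4) / len(seq)


def separation_temperature_celsius(unextended_primers: Iterable[str],
                                   offset_fahrenheit: float) -> float:
    """Temperature exceeding the highest unextended-primer Tm by an offset
    given in degrees Fahrenheit (1-15, per the embodiments above)."""
    if not 1 <= offset_fahrenheit <= 15:
        raise ValueError("offset_fahrenheit should lie between 1 and 15")
    # Use the highest Tm so that every unextended primer in the pool melts off.
    highest_tm = max(basic_tm_celsius(p) for p in unextended_primers)
    # A temperature *difference* converts by 5/9; no 32-degree shift applies.
    return highest_tm + offset_fahrenheit * 5.0 / 9.0


primers = ["ACGTACGTACGTACGTACGT"]  # hypothetical 20-mer with 10 G/C
print(round(basic_tm_celsius(primers[0]), 2))                # 51.78
print(round(separation_temperature_celsius(primers, 9), 2))  # 56.78
```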
[0103] Embodiment 91 is the method of any one of embodiments 83-90, wherein the DNA is rendered single stranded prior to the separating.
[0104] Embodiment 92 is the method of any of the preceding embodiments, comprising differentially tagging and pooling the first subsample and the second subsample.
[0105] Embodiment 93 is the method of embodiment 92, wherein the pool comprises less than or equal to about 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% of the DNA of the second subsample.
[0106] Embodiment 94 is the method of embodiment 92, wherein the pool comprises about 70-90%, about 75-85%, or about 80% of the DNA of the second subsample.
[0107] Embodiment 95 is the method of any one of embodiments 92-94, wherein the pool comprises substantially all of the DNA of the first subsample.
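Embodiments 92 to 95 describe pooling differentially tagged subsamples in unequal proportions. The arithmetic of such a pool is sketched below with hypothetical input masses; the default fractions (all of the first subsample, 20% of the second) are chosen from within a range recited above only for illustration, and the function name and nanogram units are assumptions.

```python
from typing import Dict


def pool_subsamples(first_ng: float, second_ng: float,
                    first_fraction: float = 1.0,
                    second_fraction: float = 0.2) -> Dict[str, float]:
    """Mass each tagged subsample contributes to a single sequencing pool."""
    for fraction in (first_fraction, second_fraction):
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    from_first = first_ng * first_fraction
    from_second = second_ng * second_fraction
    total = from_first + from_second
    return {
        "from_first_ng": from_first,
        "from_second_ng": from_second,
        "total_ng": total,
        "first_share": from_first / total if total else 0.0,
    }


# Hypothetical masses: 10 ng of the first subsample, 50 ng of the second.
print(pool_subsamples(10.0, 50.0))
```

With these inputs each subsample contributes 10 ng, so the first subsample makes up half of a 20 ng pool even though only a fifth of the second subsample was used.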
[0108] Embodiment 96 is the method of any one of the preceding embodiments, further comprising detecting a presence or absence of a DNA molecule that comprises a CDR3 sequence of interest.
[0109] Embodiment 97 is the method of any one of embodiments 39-96, comprising detecting a presence or absence of a DNA molecule that comprises a structural variation.
[0110] Embodiment 98 is the method of embodiment 96 or 97, wherein the detecting comprises generating a plurality of sequence reads; and the method further comprises mapping the plurality of sequence reads to one or more reference sequences to generate mapped sequence reads, and processing the mapped sequence reads to determine the likelihood that the subject has cancer.
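The presence-or-absence test of Embodiments 96 to 98 can be sketched, under simplifying assumptions, by counting reads that contain a sequence of interest (for example, a clonotype CDR3 or a fusion junction) on either strand. This sketch replaces the read-mapping step with exact string matching, ignores sequencing errors, and does not collapse reads that share a molecular barcode; the function names, the example sequence, and the `min_reads` threshold are hypothetical.

```python
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_supporting_reads(reads: Iterable[str], target: str) -> int:
    """Number of reads containing `target` on either strand (exact match)."""
    forward = target.upper()
    reverse = reverse_complement(forward)
    return sum(1 for read in reads
               if forward in read.upper() or reverse in read.upper())


def is_present(reads: Iterable[str], target: str, min_reads: int = 2) -> bool:
    """Call the sequence present when at least `min_reads` reads support it."""
    return count_supporting_reads(reads, target) >= min_reads


target = "TGTGCCAGCAGT"  # hypothetical CDR3 fragment
reads = [
    "AAA" + target + "GG",                      # forward-strand support
    reverse_complement("CC" + target + "TTT"),  # reverse-strand support
    "CCCCCCCCCCCCCCCC",                         # unrelated read
]
print(count_supporting_reads(reads, target))  # 2
print(is_present(reads, target))              # True
```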
[0111] Embodiment 99 is the method of the immediately preceding embodiment, further comprising detecting a presence or absence of DNA originating or derived from a tumor cell using the mapped sequence reads.
[0112] Embodiment 100 is the method of the immediately preceding embodiment, further comprising determining a cancer recurrence score that is indicative of the presence or absence of the DNA originating or derived from the tumor cell for the test subject, optionally further comprising determining a cancer recurrence status based on the cancer recurrence score, wherein the cancer recurrence status of the test subject is determined to be at risk for cancer recurrence when a cancer recurrence score is determined to be at or above a predetermined threshold or the cancer recurrence status of the test subject is determined to be at lower risk for cancer recurrence when the cancer recurrence score is below the predetermined threshold.
[0113] Embodiment 101 is the method of the immediately preceding embodiment, further comprising comparing the cancer recurrence score of the test subject with a predetermined cancer recurrence threshold, wherein the test subject is classified as a candidate for a subsequent cancer treatment when the cancer recurrence score is above the cancer recurrence threshold or not a candidate for a subsequent cancer treatment when the cancer recurrence score is below the cancer recurrence threshold.
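The threshold comparisons in Embodiments 100 and 101 reduce to a simple classification once a recurrence score is available; how the score itself is computed is not specified in these embodiments, so it is taken as an input here. Embodiment 100 assigns a score exactly at the threshold to the at-risk group, and this sketch follows that convention; Embodiment 101 does not address the tie case, so treating candidacy as mirroring the at-risk status is an assumption. The enum and function names are hypothetical.

```python
from enum import Enum


class RecurrenceStatus(Enum):
    AT_RISK = "at risk for cancer recurrence"
    LOWER_RISK = "at lower risk for cancer recurrence"


def classify_recurrence(score: float, threshold: float) -> RecurrenceStatus:
    """At or above the predetermined threshold -> at risk; below -> lower risk."""
    if score >= threshold:
        return RecurrenceStatus.AT_RISK
    return RecurrenceStatus.LOWER_RISK


def is_treatment_candidate(score: float, threshold: float) -> bool:
    """Candidate for subsequent treatment when classified as at risk
    (assumed to mirror the recurrence status, including the tie case)."""
    return classify_recurrence(score, threshold) is RecurrenceStatus.AT_RISK


print(classify_recurrence(0.50, 0.50).name)  # AT_RISK
print(classify_recurrence(0.49, 0.50).name)  # LOWER_RISK
```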
[0114] In some embodiments, the results of the methods disclosed herein are used as an input to generate a report. The report may be in a paper or electronic format. For example, true copy number variation, as obtained by the methods disclosed herein, or information derived therefrom, can be displayed directly in such a report. Alternatively or additionally, diagnostic information or therapeutic recommendations which are at least in part based on the methods disclosed herein can be included in the report.
[0115] The various steps of the methods disclosed herein may be carried out at the same or different times, in the same or different geographical locations, e.g., countries, and / or by the same or different people.
[0116] Additional advantages will be set forth in part in the description which follows or may be learned by practice of the disclosure. The advantages will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0117] FIG. 1A illustrates an exemplary workflow according to certain embodiments of the disclosure, beginning with DNA isolated from a sample and ligated to adapters for sequencing library preparation. Optionally, the adapted library may be amplified to provide an amplified, adapted DNA library. The adapted DNA library may be divided into at least a first subsample and a second subsample, and an optional third subsample, e.g., to be retained as a backup. The DNA is contacted with (i) adapter-blocking probes and intron-blocking probes, thereby providing blocked DNA; and (ii) a plurality of primers, wherein the primers bind J regions and the intron-blocking probes bind J-region introns, or the primers bind V regions and the intron-blocking probes bind V-region introns. At least a portion of the primers are extended (such as by multiplex PCR), wherein at least a portion of primers bound to DNA comprising recombined CDR3 sequences are extended across the recombined CDR3 sequences, thereby providing CDR3-enriched DNA. Sequence-variable target regions and / or epigenetic target regions are optionally captured and amplified from, e.g., a first subsample, a second subsample, or a third subsample of the DNA. Then, the captured regions and the CDR3-enriched DNA are pooled and sequenced together.
[0118] FIG. 1B illustrates an exemplary workflow according to certain embodiments of the disclosure, beginning with DNA isolated from a sample and ligated to adapters for sequencing library preparation. Optionally, the adapted library may be amplified to provide an amplified, adapted DNA library. The adapted DNA library may be divided into at least a first subsample and a second subsample, and an optional third subsample, e.g., to be retained as a backup. The DNA is contacted with (i) adapter-blocking probes and intron-blocking probes, thereby providing blocked DNA; and (ii) a plurality of primers, wherein the primers bind J regions and the intron-blocking probes bind J-region introns, or the primers bind V regions and the intron-blocking probes bind V-region introns. At least a portion of the primers are extended (such as by multiplex PCR), wherein at least a portion of primers bound to DNA comprising recombined CDR3 sequences are extended across the recombined CDR3 sequences, thereby providing CDR3-enriched DNA. Sequence-variable target regions and / or epigenetic target regions are optionally captured and amplified from, e.g., a first subsample, a second subsample, or a third subsample of the DNA. Then, the captured regions and the CDR3-enriched DNA are pooled and sequenced separately.
[0119] FIG. 1C illustrates certain features of blocking probes and primer sets according to certain embodiments of the disclosure.
[0120] FIG. 1D illustrates certain features of blocking probes and primer sets according to certain embodiments of the disclosure.
[0121] FIG. 2 is a schematic diagram of an example of a system suitable for use with some embodiments of the disclosure.
DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS
[0122] Reference will now be made in detail to certain embodiments of the disclosure. While the disclosure will be described in conjunction with such embodiments, it will be understood that they are not intended to limit the disclosure to those embodiments. On the contrary, the disclosure is intended to cover all alternatives, modifications, and equivalents, which may be included within the disclosure as defined by the appended claims.
[0123] Before describing the present teachings in detail, it is to be understood that the disclosure is not limited to specific compositions or process steps, as such may vary. It should be noted that, as used in this specification and the appended claims, the singular forms “a”, “an” and “the” include plural references unless the context clearly dictates otherwise. Thus, for example, reference to “a nucleic acid” includes a plurality of nucleic acids.
[0124] Numeric ranges are inclusive of the numbers defining the range. Measured and measurable values are understood to be approximate, taking into account significant digits and the error associated with the measurement. Also, the use of “comprise”, “comprises”, “comprising”, “contain”, “contains”, “containing”, “include”, “includes”, and “including” is not intended to be limiting. It is to be understood that both the foregoing general description and detailed description are exemplary and explanatory only and are not restrictive of the teachings.
[0125] Unless specifically noted in the above specification, embodiments in the specification that recite “comprising” various components are also contemplated as “consisting of” or “consisting essentially of” the recited components; embodiments in the specification that recite “consisting of” various components are also contemplated as “comprising” or “consisting essentially of” the recited components; and embodiments in the specification that recite “consisting essentially of” various components are also contemplated as “consisting of” or “comprising” the recited components (this interchangeability does not apply to the use of these terms in the claims).
[0126] The section headings used herein are for organizational purposes and are not to be construed as limiting the disclosed subject matter in any way. In the event that any document or other material incorporated by reference contradicts any explicit content of this specification, including definitions, this specification controls.
[0127] All patents, patent applications, websites, other publications or documents and the like cited herein, whether supra or infra, are expressly incorporated by reference in their entirety for all purposes to the same extent as if each individual item were specifically and individually indicated to be so incorporated by reference. If different versions of a publication, website or the like are published at different times, the version most recently published at the effective filing date of the application is meant, unless otherwise indicated.
I. Definitions
[0128] “Cell-free DNA,” “cfDNA molecules,” or simply “cfDNA” include DNA molecules that naturally occur in a subject in extracellular form (e.g., in blood, serum, plasma, or other bodily fluids such as lymph, cerebrospinal fluid, urine, or sputum). While the cfDNA originally existed in a cell or cells in a large complex biological organism, e.g., a mammal, it has undergone release from the cell(s) into a fluid found in the organism, and may be obtained from a sample of the fluid without the need to perform an in vitro cell lysis step. cfDNA molecules may occur as DNA fragments.
[0129] As used herein, “primer-annealed DNA” means DNA to which at least one primer is annealed.
[0130] As used herein, a DNA polymerase that is “5’ to 3’ exonuclease negative” does not have significant 5’ to 3’ exonuclease activity. A DNA polymerase that is “strand displacement negative” does not have significant helicase activity to displace a strand, such as a primer, annealed to DNA. In some embodiments, a 5’ to 3’ exonuclease negative, strand displacement negative DNA polymerase is used for primer extension in order to prevent extension of primers annealed to wild type target regions.
[0131] As used herein, a “primer-extended product,” when referring to primers that anneal to at least one target region, means a nucleic acid strand formed by extension of a primer annealed to a DNA target region. In some embodiments, a primer-extended product is a significant primer-extended product or is formed by significant primer extension, meaning that the resulting nucleic acid strand has sufficient additional length (e.g., at least 10, 15, 20, 30, 40, 50, 60, 75, or 100 nucleotides in addition to the length of the original primer) to be detected and / or identified using methods described herein. In some embodiments, significant primer extension results in primer-extended products comprising a capture moiety present at a low percentage in the deoxynucleoside triphosphate mixture. In some embodiments, primer extension that results in no primer-extended products or only short primer-extended products that do not comprise the capture moiety occurs on target regions comprising a completely tiled primer-annealed target region.
[0132] As used herein, “adjacent” nucleosides or oligonucleotides are nucleosides or oligonucleotides that are next to each other, with no intervening nucleosides. For example, “adjacent” nucleosides may be covalently linked together within a nucleic acid or oligonucleotide, or they may be unlinked but are next to each other because they are annealed to or hybridized to adjacent linked nucleosides of a nucleic acid. “Adjacent” oligonucleotides may likewise be linked together or unlinked to each other but annealed to or hybridized to adjacent, linked portions of a nucleic acid.
[0133] As used herein, “partitioning” refers to physically separating or fractionating a mixture of nucleic acid molecules in a sample based on a characteristic of the nucleic acid molecules. The partitioning can be physical partitioning of molecules. Partitioning can involve separating the nucleic acid molecules into groups or sets based on the level of an epigenetic feature (e.g., methylation). For example, the nucleic acid molecules can be partitioned based on the level of methylation of the nucleic acid molecules. In some embodiments, the methods and systems used for partitioning may be found in PCT Patent Application No. PCT/US2017/068329, which is hereby incorporated by reference in its entirety.
[0134] As used herein, “partitioned set” or “partition” refers to a set of nucleic acid molecules partitioned into a set or group based on the differential binding affinity of the nucleic acid molecules or proteins associated with the nucleic acid molecules to a binding agent. A partitioned set may also be referred to as a subsample. The binding agent binds preferentially to the nucleic acid molecules comprising nucleotides with epigenetic modification. For example, if the epigenetic modification is methylation, the binding agent can be a methyl binding domain (MBD) protein. In some embodiments, a partitioned set can comprise nucleic acid molecules belonging to a particular level or degree of epigenetic feature (e.g., methylation). For example, the nucleic acid molecules can be partitioned into three sets: one set for highly methylated nucleic acid molecules (first subsample, hyper partition, hyper partitioned set or hypermethylated partitioned set), a second set for low methylated nucleic acid molecules (second subsample, hypo partition, hypo partitioned set or hypomethylated partitioned set), and a third set for intermediate methylated nucleic acid molecules (third subsample, intermediate partitioned set, intermediately methylated partitioned set, residual partitioned set, or residual partition). In another example, the nucleic acid molecules can be partitioned based on the number of methylated nucleotides: one partitioned set can have nucleic acid molecules with nine methylated nucleotides, and another partitioned set can have unmethylated nucleic acid molecules (zero methylated nucleotides).
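In the disclosure, partitioning is physical, for example by differential binding to an MBD protein. Purely as an in-silico analogue, the three-set scheme above can be sketched by binning molecules on their count of methylated CpGs (e.g., as read out by a methylation-aware sequencing method). The function name and the cut-offs `hypo_max` and `hyper_min` are hypothetical illustration values, not parameters from the disclosure.

```python
from typing import Dict, List


def partition_by_methylation(
    mcpg_counts: Dict[str, int],
    hypo_max: int = 0,
    hyper_min: int = 5,
) -> Dict[str, List[str]]:
    """Bin molecules into hyper, intermediate, and hypo partitioned sets.

    mcpg_counts maps a molecule ID to its number of methylated CpGs.
    hypo_max and hyper_min are illustrative cut-offs (assumptions):
      count >= hyper_min          -> "hyper" (first subsample)
      count <= hypo_max           -> "hypo" (second subsample)
      anything in between         -> "intermediate" (third subsample)
    """
    if hypo_max >= hyper_min:
        raise ValueError("hypo_max must be below hyper_min")
    sets: Dict[str, List[str]] = {"hyper": [], "intermediate": [], "hypo": []}
    for molecule_id, count in mcpg_counts.items():
        if count >= hyper_min:
            sets["hyper"].append(molecule_id)
        elif count <= hypo_max:
            sets["hypo"].append(molecule_id)
        else:
            sets["intermediate"].append(molecule_id)
    return sets


# Mirrors the example above: nine methylated nucleotides vs. zero.
print(partition_by_methylation({"m1": 9, "m2": 0, "m3": 2}))
```

With the default cut-offs, `m1` lands in the hyper set, `m2` in the hypo set, and `m3` in the intermediate (residual) set.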
[0135] As used herein, a modification or other feature is present in “a greater proportion” in a first sample or population of nucleic acid than in a second sample or population when the fraction of nucleotides with the modification or other feature is higher in the first sample or population than in the second sample or population. For example, if in a first sample, one tenth of the nucleotides are mC, and in a second sample, one twentieth of the nucleotides are mC, then the first sample comprises the cytosine modification of 5-methylation in a greater proportion than the second sample.
[0136] As used herein, the form of the “originally isolated” sample refers to the composition or chemical structure of a sample at the time it was isolated and before undergoing any procedure that changes the chemical structure of the isolated sample. Similarly, a feature that is “originally present” in a molecule refers to a feature present in an “original molecule” or in molecules “originally comprising” the feature before the molecule undergoes any procedure that changes the chemical structure of the molecule.
[0137] As used herein, “without substantially altering base-pairing specificity” of a given nucleobase means that a majority of molecules comprising that nucleobase that can be sequenced do not have alterations of the base pairing specificity of that nucleobase relative to its base pairing specificity as it was in the originally isolated sample. In some embodiments, 75%, 90%, 95%, or 99% of molecules comprising that nucleobase that can be sequenced do not have alterations of the base pairing specificity of that nucleobase relative to its base pairing specificity as it was in the originally isolated sample.
[0138] As used herein, “base pairing specificity” refers to the standard DNA base (A, C, G, or T) with which a given base most preferentially pairs. Thus, for example, unmodified cytosine and 5-methylcytosine have the same base pairing specificity (i.e., specificity for G), whereas uracil and cytosine have different base pairing specificities because uracil has base pairing specificity for A while cytosine has base pairing specificity for G. The ability of uracil to form a wobble pair with G is irrelevant because uracil nonetheless most preferentially pairs with A among the four standard DNA bases.
[0139] As used herein, a “combination” comprising a plurality of members refers to either a single composition comprising the members or a set of compositions in proximity, e.g., in separate containers or compartments within a larger container, such as a multiwell plate, tube rack, refrigerator, freezer, incubator, water bath, ice bucket, machine, or other form of storage.
[0140] The “capture yield” of a collection of probes for a given target region set refers to the amount (e.g., amount relative to another target region set or an absolute amount) of nucleic acid corresponding to the target region set that the collection of probes captures under typical conditions. Exemplary typical capture conditions are an incubation of the sample nucleic acid and probes at 65 °C for 10-18 hours in a small reaction volume (about 20 µL) containing stringent hybridization buffer. The capture yield may be expressed in absolute terms or, for a plurality of collections of probes, relative terms. When capture yields for a plurality of sets of target regions are compared, they are normalized for the footprint size of the target region set (e.g., on a per-kilobase basis). Thus, for example, if the footprint sizes of first and second target region sets are 50 kb and 500 kb, respectively (giving a normalization factor of 0.1), then the DNA corresponding to the first target region set is captured with a higher yield than DNA corresponding to the second target region set when the mass per volume concentration of the captured DNA corresponding to the first target region set is more than 0.1 times the mass per volume concentration of the captured DNA corresponding to the second target region set. As a further example, using the same footprint sizes, if the captured DNA corresponding to the first target region set has a mass per volume concentration of 0.2 times the mass per volume concentration of the captured DNA corresponding to the second target region set, then the DNA corresponding to the first target region set was captured with a two-fold greater capture yield than the DNA corresponding to the second target region set.
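The footprint normalization in the worked example above reduces to dividing the concentration ratio by the footprint-size ratio. A minimal Python sketch follows; the function and parameter names are hypothetical, and it covers only the relative comparison described in this paragraph.

```python
def relative_capture_yield(
    conc_first: float,
    conc_second: float,
    footprint_first_kb: float,
    footprint_second_kb: float,
) -> float:
    """Footprint-normalized capture yield of target region set 1 vs. set 2.

    conc_*: mass-per-volume concentration of captured DNA for each set.
    footprint_*_kb: footprint size of each target region set, in kilobases.
    A result > 1 means set 1 was captured with a higher yield than set 2.
    """
    if min(conc_second, footprint_first_kb, footprint_second_kb) <= 0:
        raise ValueError("conc_second and both footprints must be positive")
    normalization = footprint_first_kb / footprint_second_kb  # 50 / 500 = 0.1
    concentration_ratio = conc_first / conc_second            # e.g., 0.2
    return concentration_ratio / normalization                # 0.2 / 0.1 = 2.0


# The paragraph's example: 50 kb vs. 500 kb footprints, 0.2x concentration.
print(relative_capture_yield(0.2, 1.0, 50, 500))  # 2.0, i.e., two-fold
```

A concentration ratio of exactly 0.1 gives 1.0 (equal normalized yield), which is why the paragraph requires the ratio to be "more than 0.1 times" for a higher yield.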
[0141] As used herein, a “label” is a capture moiety, fluorophore, oligonucleotide, or other moiety that facilitates detection, separation, or isolation of that to which it is attached.
[0142] “Capturing” one or more target nucleic acids refers to preferentially isolating or separating the one or more target nucleic acids from non-target nucleic acids.
[0143] A “captured set” of nucleic acids or “captured” nucleic acids refers to nucleic acids that have undergone capture.
[0144] As used herein, a “capture moiety” is a molecule that allows affinity separation of molecules, such as nucleic acids, linked to the capture moiety from molecules lacking the capture moiety. Exemplary capture moieties include biotin, which allows affinity separation by binding to streptavidin linked or linkable to a solid phase, and an oligonucleotide, which allows affinity separation through binding to a complementary oligonucleotide linked or linkable to a solid phase.
[0145] As used herein, a “tag” is a molecule, such as a nucleic acid, label, fluorophore, or peptide, containing information that indicates a feature of the molecule with which the tag is associated. For example, molecules can bear a sample tag (which distinguishes molecules in one sample from those in a different sample), a molecular tag / molecular barcode / barcode (which distinguishes different molecules from one another, in both unique and non-unique tagging scenarios), a purification tag, and / or a detectable tag or label.
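One common way to use sample and molecular tags downstream of sequencing is to group reads that share both tags and the same mapping position into "families" derived from one original molecule. The sketch below assumes a hypothetical read record layout and is not necessarily how the disclosure processes tags; combining the tag with the mapping position is what lets non-unique tags still separate molecules that map to different positions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Read(NamedTuple):
    # Hypothetical record layout for illustration only.
    sample_tag: str     # distinguishes samples pooled in one run
    molecular_tag: str  # molecular barcode, possibly non-unique
    chrom: str
    start: int          # alignment start of the fragment
    sequence: str


FamilyKey = Tuple[str, str, str, int]


def group_into_families(reads: Iterable[Read]) -> Dict[FamilyKey, List[Read]]:
    """Group reads sharing sample tag, molecular tag, and mapping position.

    Reads in one family are treated as copies of the same original
    molecule; distinct keys are treated as distinct molecules.
    """
    families: Dict[FamilyKey, List[Read]] = defaultdict(list)
    for read in reads:
        key = (read.sample_tag, read.molecular_tag, read.chrom, read.start)
        families[key].append(read)
    return dict(families)


reads = [
    Read("S1", "AAC", "chr1", 100, "ACGT"),
    Read("S1", "AAC", "chr1", 100, "ACGA"),  # same molecule as above
    Read("S1", "AAC", "chr1", 250, "TTGA"),  # same tag, different molecule
    Read("S2", "AAC", "chr1", 100, "ACGT"),  # different sample
]
print(len(group_into_families(reads)))  # 3
```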
[0146] As used herein, a “target-specific probe” means a probe that specifically binds to a target region, such as an epigenetic target region or a sequence-variable target region. In some embodiments, target-specific probes comprise a capture moiety to facilitate capture of the target region to which they specifically bind.
[0147] A “target region set” or “set of target regions” or “target regions” or “target regions of interest” or “regions of interest” or “genomic regions of interest” refers to a plurality of genomic loci or a plurality of genomic regions targeted for capture and / or targeted by a set of probes (e.g., through sequence complementarity).
[0148] As used herein, a DNA “structural variation” is a mutation comprising a DNA sequence not present in the wild-type genome other than a point mutation (e.g., in which at least 5, 10, 20, or 50 contiguous nucleotides are different relative to the wild type sequence at the corresponding locus). Examples of DNA structural variations include rearrangements, such as translocations, insertions, deletions, duplications, copy-number variants, and inversions. As used herein, a DNA “rearrangement” is a structural variation, wherein the DNA sequence comprises two adjacent sequence portions that are not adjacent to each other in the germline genomic DNA. In some embodiments, a rearrangement is a translocation, gene fusion, insertion, deletion, or inversion. Exemplary rearrangements include products of a translocation, gene fusion, and VDJ recombination. In some embodiments, the rearrangement is the product of a translocation comprising fusion of two intronic regions. In some embodiments, the rearrangement is a product of VDJ recombination comprising adjacent I exonic regions. A molecule comprising a structural variation may be referred to as a structural variant. In some embodiments, an insertion is an insertion of at least 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 nucleotides. In some embodiments, a deletion affects sequence spanning the end of the target region, so as to result in a primer being unblocked and undergoing extension in a method described herein.
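The definition of a rearrangement (two portions adjacent in a molecule but not adjacent in the germline genome) can be checked on split-read alignments, in which one read is aligned as two reference segments (for example, as reported through the SA tag in SAM output). The `Segment` type, the function name, and the `max_gap` tolerance below are hypothetical; this is a simplified sketch that ignores alignment uncertainty at the junction.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    # Hypothetical representation of one aligned portion of a read.
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def is_rearrangement_junction(left: Segment, right: Segment, max_gap: int = 0) -> bool:
    """Return True if two portions adjacent in a read are NOT adjacent
    in the reference (germline) genome.

    left and right are consecutive portions in read order. They count as
    germline-adjacent only if they share chromosome and strand, follow
    each other in reference order for that strand, and are separated by
    at most max_gap reference bases.
    """
    if left.chrom != right.chrom or left.strand != right.strand:
        return True  # e.g., translocation or inversion
    if left.strand == "+":
        gap = right.start - left.end
    else:
        # On the minus strand, read order runs toward lower coordinates.
        gap = left.start - right.end
    return not (0 <= gap <= max_gap)


a = Segment("chr14", 100, 150, "+")
print(is_rearrangement_junction(a, Segment("chr14", 150, 200, "+")))  # False
print(is_rearrangement_junction(a, Segment("chr14", 5000, 5050, "+")))  # True
```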
[0149] “Corresponding to a target region set” means that a nucleic acid, such as cfDNA, originated from a locus in the target region set or specifically binds one or more primers or probes for the target region set.
[0150] “Specifically binds” in the context of a probe or other oligonucleotide and a target sequence means that under appropriate hybridization conditions, the oligonucleotide or probe hybridizes to its target sequence, or replicates thereof, to form a stable probe:target hybrid, while at the same time formation of stable probe:non-target hybrids is minimized. Thus, a probe hybridizes to a target sequence or replicate thereof to a sufficiently greater extent than to a nontarget sequence, to enable capture or detection of the target sequence. Appropriate hybridization conditions are well-known in the art, may be predicted based on sequence composition, or can be determined by using routine testing methods (see, e.g., Sambrook et al., Molecular Cloning, A Laboratory Manual, 2nd ed. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989) at §§ 1.90-1.91, 7.37-7.57, 9.47-9.51 and 11.47-11.57, particularly §§ 9.50-9.51, 11.12-11.13, 11.45-11.47 and 11.55-11.57, incorporated by reference herein).
[0151] As used herein, “cellular nucleic acids” means nucleic acids that are located within one or more cells from which the nucleic acids have originated, at least at the point a sample is taken or collected from a subject, even if those nucleic acids are subsequently removed (e.g., via cell lysis) as part of a given analytical process.
[0152] “Sequence-variable target region set” refers to a set of target regions that may exhibit changes in sequence such as nucleotide substitutions (i.e., single nucleotide variations), insertions, deletions, or gene fusions or transpositions in neoplastic cells (e.g., tumor cells and cancer cells).
[0153] “Epigenetic target region set” refers to target regions that may show sequence-independent changes in neoplastic cells (e.g., tumor cells or cancer cells) or that may show sequence-independent changes in cfDNA from subjects having cancer relative to cfDNA from healthy subjects. Examples of sequence-independent changes include, but are not limited to, changes in methylation (increases or decreases), nucleosome distribution, CCCTC-binding factor (“CTCF”) binding, transcription start sites, and regulatory protein binding regions. For present purposes, loci susceptible to neoplasia-, tumor-, or cancer-associated focal amplifications and / or gene fusions may also be included in an epigenetic target region set because detection of a change in copy number by sequencing or a fused sequence that maps to more than one locus in a reference genome tends to be more similar to detection of exemplary epigenetic changes discussed above than detection of nucleotide substitutions, insertions, or deletions, e.g., in that the focal amplifications and / or gene fusions can be detected at a relatively shallow depth of sequencing because their detection does not depend on the accuracy of base calls at one or a few individual positions.
[0154] As used herein, an “epigenetic feature” refers to any feature of DNA or chromatin other than primary sequence (i.e., the sequence of A, C, G, and T bases). Epigenetic features include covalent modifications of bases, such as methylation, and modifications and positioning of histones and other stably DNA-associated proteins.
[0155] As used herein, a “differentially methylated region” (DMR) refers to a region of DNA having a detectably different degree of methylation in at least one cell or tissue type relative to the degree of methylation in the same region of DNA from at least one other cell or tissue type; or having a detectably different degree of methylation in at least one cell or tissue type obtained from a subject having a disease or disorder relative to the degree of methylation in the same region of DNA in the same cell or tissue type obtained from a healthy subject. In some embodiments, a DMR has a detectably higher degree of methylation (e.g., hypermethylated region) in at least one cell or tissue type relative to the degree of methylation in the same region of DNA from at least one other cell or tissue type or from the same cell or tissue type from a healthy subject. In some embodiments, a DMR has a detectably lower degree of methylation (e.g., hypomethylated region) in at least one cell or tissue type relative to the degree of methylation in the same region of DNA from at least one other cell or tissue type or from the same cell or tissue type from a healthy subject.
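As a simplified illustration of the hyper/hypo distinction in the DMR definition, one region can be labelled by comparing mean methylation fractions between two groups (e.g., two tissue types, or diseased vs. healthy subjects). The fixed difference cut-off `min_delta` is a hypothetical stand-in for "detectably different"; practical DMR calling typically uses statistical testing across many samples and CpGs, which this sketch does not attempt.

```python
from statistics import mean
from typing import Sequence


def classify_region(
    fractions_a: Sequence[float],
    fractions_b: Sequence[float],
    min_delta: float = 0.2,
) -> str:
    """Label one region by comparing mean methylation fractions.

    fractions_a / fractions_b: per-sample methylation fraction (0 to 1) of
    the region in group A and group B.
    min_delta: hypothetical minimum difference treated as "detectable".
    """
    if not fractions_a or not fractions_b:
        raise ValueError("both groups need at least one sample")
    delta = mean(fractions_a) - mean(fractions_b)
    if delta >= min_delta:
        return "hypermethylated in A"
    if delta <= -min_delta:
        return "hypomethylated in A"
    return "not differentially methylated"


print(classify_region([0.8, 0.9], [0.1, 0.2]))  # hypermethylated in A
```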
[0156] As used herein, “type-specific” in the context of an epigenetic variation means an epigenetic variation that is present at a detectably different degree in one cell or tissue type, or in a plurality of related cell or tissue types, relative to other cell or tissue types. Similarly, a “type-specific epigenetic target region” is an epigenetic target region that has a detectably different epigenetic characteristic in one cell or tissue type, or in a plurality of related cell or tissue types, relative to other cell or tissue types. Exemplary epigenetic characteristics are discussed in the definition of epigenetic target regions set forth above. For example, a “type-specific differentially methylated region” is a region of DNA that has a detectably different degree of methylation in one cell or tissue type, or in a plurality of related cell or tissue types, relative to other cell or tissue types. Examples of a type-specific differentially methylated region include tissue-specific differentially methylated regions, including those associated with copy-number gain in early cancer. In some embodiments, capturing, identification, and / or detection of type-specific differentially methylated regions facilitates identification of the cell or tissue type from which the DNA originated. The cell or tissue from which a type-specific differentially methylated region originated may be a wild type cell or tissue or a neoplastic cell or tissue. In another example, a “type-specific fragment” of DNA is a DNA fragment arising from a type-specific fragmentation pattern that is present at a detectably different degree in one cell or tissue type, or in a plurality of related cell or tissue types, relative to other cell or tissue types. In some embodiments, a type-specific fragment is only present in the specific cell or tissue type(s). In some embodiments, a type-specific fragment is present to a detectably greater extent in the specific cell or tissue type(s).
[0157] DNA is “derived from cancerous cells” if it originated from a tumor cell. Cell-free DNA derived from cancerous cells includes circulating tumor DNA (ctDNA). Tumor cells are neoplastic cells that originated from a tumor, regardless of whether they remain in the tumor or become separated from the tumor (as in the cases, e.g., of metastatic cancer cells and circulating tumor cells).
[0158] The term “methylation” or “DNA methylation” refers to addition of a methyl group to a nucleotide base in a nucleic acid molecule. In some embodiments, methylation refers to addition of a methyl group to a cytosine at a CpG site (a cytosine-phosphate-guanine site, i.e., a cytosine followed by a guanine in the 5’ to 3’ direction of the nucleic acid sequence). In some embodiments, DNA methylation refers to addition of a methyl group to adenine, such as in N6-methyladenine (6mA). In some embodiments, DNA methylation is 5-methylation (modification of the carbon in the 5th position of the cytosine ring). In some embodiments, 5-methylation refers to addition of a methyl group to the 5C position of the cytosine to create 5-methylcytosine (5mC). In some embodiments, methylation comprises a derivative of 5mC. Derivatives of 5mC include, but are not limited to, 5-hydroxymethylcytosine (5-hmC), 5-formylcytosine (5-fC), and 5-carboxylcytosine (5-caC). In some embodiments, DNA methylation is 3C methylation (modification of the nitrogen in the 3rd position of the cytosine ring). In some embodiments, 3C methylation comprises addition of a methyl group to the N3 position of the cytosine to generate 3-methylcytosine (3mC). Methylation can also occur at non-CpG sites; for example, methylation can occur at a CpA, CpT, or CpC site. DNA methylation can change the activity of a methylated DNA region. For example, when DNA in a promoter region is methylated, transcription of the gene may be repressed. DNA methylation is critical for normal development, and abnormal methylation may disrupt epigenetic regulation. The disruption, e.g., repression, in epigenetic regulation may cause diseases, such as cancer. Promoter methylation in DNA may be indicative of cancer.
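Because the definition distinguishes CpG sites from non-CpG contexts (CpA, CpT, CpC), locating candidate methylation sites in a sequence is a simple scan for a cytosine followed, 5’ to 3’, by a given base. The function names below are hypothetical, and the scan covers only the strand as written.

```python
from typing import Dict, List


def cpg_positions(sequence: str) -> List[int]:
    """Return 0-based positions of the C in each CpG on the given strand.

    A CpG site is a cytosine immediately followed (5' to 3') by a guanine.
    Only the strand as written is scanned; on the complementary strand the
    paired CpG cytosine sits at position i + 1.
    """
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


def cytosine_contexts(sequence: str) -> Dict[int, str]:
    """Map each cytosine (except a 3'-terminal one) to its dinucleotide
    context, e.g., "CpG", "CpA", "CpT", or "CpC"."""
    seq = sequence.upper()
    return {i: f"Cp{seq[i + 1]}" for i in range(len(seq) - 1) if seq[i] == "C"}


print(cpg_positions("ACGTTCGA"))    # [1, 5]
print(cytosine_contexts("CCAGCT"))  # {0: 'CpC', 1: 'CpA', 4: 'CpT'}
```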
[0159] The “modified nucleoside profile of DNA” means the position and identity of the nucleoside and the modification status of the nucleoside, such as methylation, within a DNA sequence. As described above, different modification-sensitive sequencing methods can be used to detect such modifications. These include methods in which conversion followed by sequencing detects one or more different types of modified or unmodified nucleosides. For example, the TAPS method detects, but does not distinguish between, 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC). Hence, a method for analyzing the modified nucleoside profile of DNA in a sample typically means identifying particular modifications or groups of modifications, such as 5mC and / or 5hmC. Modified nucleosides are identified according to the specific method / conversion procedure being used as described above. This generally involves comparing sequence data obtained from DNA that has been subjected to a conversion procedure to a reference sequence. Typically, the method involves (i) comparing the sequence data with (A) one or more pre-determined reference sequences; or (B) sequence data obtained by sequencing a subsample of the DNA that was not subjected to the conversion procedure, for example a subsample that was separated before subjecting a separate subsample to the conversion procedure, for example as described herein; and (ii) identifying point differences between the converted DNA sequences and the reference sequence(s) (A) or non-converted DNA sequences (B) as nucleosides (in the initial sample) having a modification status that permits a change in base pairing specificity on exposure to the conversion procedure.
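Step (ii) can be sketched for option (A), comparison against a reference sequence, as a scan for reference cytosines read as T. This assumes a TAPS-like chemistry, in which 5mC/5hmC are converted and read as T while unmodified C is read as C; under bisulfite-like chemistry the interpretation is inverted. The function name is hypothetical and the read is assumed to be already aligned without gaps. Note that a genuine C>T variant looks identical under option (A), which is one reason option (B) compares against an unconverted subsample.

```python
from typing import List


def converted_positions(reference: str, converted_read: str) -> List[int]:
    """Positions where a reference C is read as T after conversion.

    ASSUMPTIONS: the read is gaplessly aligned to the reference (same length,
    same start), and only C->T differences on the given strand are reported.
    Under a TAPS-like chemistry these are candidate modified cytosines.
    """
    if len(reference) != len(converted_read):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        i
        for i, (ref, obs) in enumerate(zip(reference.upper(), converted_read.upper()))
        if ref == "C" and obs == "T"
    ]


print(converted_positions("ACGTCGA", "ATGTCGA"))  # [1]
```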
[0160] The term “hypermethylation” refers to an increased level or degree of methylation of nucleic acid molecule(s) relative to the other nucleic acid molecules within a population (e.g., sample) of nucleic acid molecules. In some embodiments, hypermethylated DNA can include DNA molecules comprising at least 1 methylated residue, at least 2 methylated residues, at least 3 methylated residues, at least 5 methylated residues, or at least 10 methylated residues.
[0161] As used herein, “type-specific hypermethylation” means an increased level or degree of methylation of nucleic acid molecules in one cell or tissue type, or in a plurality of related cell or tissue types, relative to other cell or tissue types. In some embodiments, capturing, identification, and / or detection of type-specific hypermethylated regions facilitates identification of the cell or tissue type from which the nucleic acid molecules originated. The cell or tissue from which a type-specific hypermethylated region originated may be a wild type cell or tissue or a neoplastic cell or tissue.
[0162] The term “hypomethylation” refers to a decreased level or degree of methylation of nucleic acid molecule(s) relative to the other nucleic acid molecules within a population (e.g., sample) of nucleic acid molecules. In some embodiments, hypomethylated DNA includes unmethylated DNA molecules. In some embodiments, hypomethylated DNA can include DNA molecules comprising 0 methylated residues, at most 1 methylated residue, at most 2 methylated residues, at most 3 methylated residues, at most 4 methylated residues, or at most 5 methylated residues.
[0163] As used herein, “type-specific hypomethylation” means a decreased level or degree of methylation of nucleic acid molecules in at least one cell or tissue type, or in a plurality of related cell or tissue types, relative to other cell or tissue types. In some embodiments, capturing, identification, and / or detection of type-specific hypomethylated regions facilitates identification of the cell or tissue type from which the nucleic acid molecules originated. The cell or tissue from which a type-specific hypomethylated region originated may be a wild type cell or tissue or a neoplastic cell or tissue.
[0164] As used herein, “methylation status” can refer to the presence or absence of a methyl group on a DNA base (e.g., cytosine) at a particular genomic position in a nucleic acid molecule. It can also refer to the degree of methylation in a nucleic acid sequence (e.g., highly methylated, low methylated, intermediately methylated, or unmethylated nucleic acid molecules). The methylation status can also refer to the number of nucleotides methylated in a particular nucleic acid molecule.
[0165] As used herein, “mutation” refers to a variation from a known reference sequence and includes mutations such as, for example, single nucleotide variants (SNVs) and insertions or deletions (indels). A mutation can be a germline or somatic mutation. In some embodiments, a reference sequence for purposes of comparison is a wild-type genomic sequence of the species of the subject providing a test sample, typically the human genome.
[0166] As used herein, the terms “neoplasm” and “tumor” are used interchangeably. They refer to abnormal growth of cells in a subject. A neoplasm or tumor can be benign, potentially malignant, or malignant. A malignant tumor is referred to as a cancer or a cancerous tumor.
[0167] As used herein, “next-generation sequencing” or “NGS” refers to sequencing technologies having increased throughput as compared to traditional Sanger- and capillary electrophoresis-based approaches, for example, with the ability to generate hundreds of thousands of relatively small sequence reads at a time. Some examples of next-generation sequencing techniques include, but are not limited to, sequencing by synthesis, sequencing by ligation, and sequencing by hybridization. In some embodiments, next-generation sequencing includes the use of instruments capable of sequencing single molecules. Examples of commercially available instruments for performing next-generation sequencing include, but are not limited to, NextSeq, HiSeq, NovaSeq, MiSeq, Ion PGM and Ion GeneStudio S5.
[0168] As used herein, “nucleic acid tag” refers to a short nucleic acid (e.g., less than about 500 nucleotides, about 100 nucleotides, about 50 nucleotides, or about 10 nucleotides in length), used to distinguish nucleic acids from different samples (e.g., representing a sample index), distinguish nucleic acids from different partitions (e.g., representing a partition tag) or different nucleic acid molecules in the same sample (e.g., representing a molecular barcode), of different types, or which have undergone different processing. The nucleic acid tag comprises a predetermined, fixed, non-random, random, or semi-random oligonucleotide sequence. Such nucleic acid tags may be used to label different nucleic acid molecules or different nucleic acid samples or sub-samples. Nucleic acid tags can be single-stranded, double-stranded, or at least partially double-stranded. Nucleic acid tags optionally have the same length or varied lengths. Nucleic acid tags can also include double-stranded molecules having one or more blunt ends, include 5’ or 3’ single-stranded regions (e.g., an overhang), and / or include one or more other single-stranded regions at other locations within a given molecule. Nucleic acid tags can be attached to one end or to both ends of the other nucleic acids (e.g., sample nucleic acids to be amplified and / or sequenced). Nucleic acid tags can be decoded to reveal information such as the sample of origin, form, or processing of a given nucleic acid. For example, nucleic acid tags can also be used to enable pooling and / or parallel processing of multiple samples comprising nucleic acids bearing different molecular barcodes and / or sample indexes, in which the nucleic acids are subsequently deconvolved by detecting (e.g., reading) the nucleic acid tags. Nucleic acid tags can also be referred to as identifiers (e.g., molecular identifier, sample identifier). Additionally, or alternatively, nucleic acid tags can be used as molecular identifiers (e.g., to distinguish between different molecules or amplicons of different parent molecules in the same sample or sub-sample). This includes, for example, uniquely tagging different nucleic acid molecules in a given sample, or non-uniquely tagging such molecules. In the case of non-unique tagging applications, a limited number of tags (i.e., molecular barcodes) may be used to tag each nucleic acid molecule such that different molecules can be distinguished based on their endogenous sequence information (for example, start and / or stop positions where they map to a selected reference genome, a sub-sequence of one or both ends of a sequence, and / or length of a sequence) in combination with at least one molecular barcode. Typically, a sufficient number of different molecular barcodes are used such that there is a low probability (e.g., less than about 10%, less than about 5%, less than about 1%, or less than about 0.1%) that any two molecules have the same endogenous sequence information (e.g., start and / or stop positions, subsequences of one or both ends of a sequence, and / or lengths) and also have the same molecular barcode. Terms such as “library adapters having distinct molecular barcodes” encompass library adapters for uniquely or non-uniquely tagging molecules, in that regardless of whether the adapters are for unique or non-unique tagging, distinct barcodes will be present in the population of adapters.
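Non-unique tagging, in which endogenous sequence information is combined with a molecular barcode, can be sketched as grouping reads into families keyed on mapped coordinates plus barcode, together with a birthday-problem estimate of how often molecules that share coordinates would also share a barcode. This is a minimal Python illustration under stated assumptions: reads are already mapped, each carries a single barcode string, and barcodes are assigned uniformly at random from `n_barcodes` equally likely values. The `Read` type and both function names are hypothetical.

```python
from collections import defaultdict
from math import prod
from typing import Dict, List, NamedTuple, Tuple


class Read(NamedTuple):
    name: str
    chrom: str
    start: int    # mapped start position on the reference genome
    end: int      # mapped end position on the reference genome
    barcode: str  # molecular barcode read from the adapter


FamilyKey = Tuple[str, int, int, str]


def group_into_families(reads: List[Read]) -> Dict[FamilyKey, List[str]]:
    """Group reads whose endogenous coordinates and barcode all match."""
    families: Dict[FamilyKey, List[str]] = defaultdict(list)
    for read in reads:
        families[(read.chrom, read.start, read.end, read.barcode)].append(read.name)
    return dict(families)


def collision_probability(n_molecules: int, n_barcodes: int) -> float:
    """Chance that at least two of `n_molecules` sharing the same coordinates
    receive the same barcode, assuming uniform random barcode assignment."""
    if n_molecules > n_barcodes:
        return 1.0
    p_all_distinct = prod((n_barcodes - i) / n_barcodes for i in range(n_molecules))
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    reads = [
        Read("r1", "chr1", 100, 250, "AC"),
        Read("r2", "chr1", 100, 250, "AC"),
        Read("r3", "chr1", 100, 250, "GT"),
    ]
    print(group_into_families(reads))
    print(collision_probability(2, 64))  # 0.015625
```

Reads in the same family are treated as copies of one parent molecule. With 64 equally likely barcodes (an illustrative number), two distinct molecules that share coordinates receive the same barcode with probability 1/64, about 1.6%; adding barcodes lowers this chance.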
[0169] As used herein, DNA that is “not immobilized” or that is “free in solution” refers to DNA that is not bound covalently or non-covalently to a solid support, such as a bead. Such DNA may be free in solution during any step (such as all steps) of the disclosed methods.
[0170] The term “agent that recognizes a modified nucleobase in DNA,” such as an “agent that recognizes a modified cytosine in DNA,” refers to a molecule or reagent that binds to or detects one or more modified nucleobases in DNA, such as methyl cytosine.
[0171] A “modified nucleobase” is a nucleobase that comprises a difference in chemical structure from an unmodified nucleobase. In the case of DNA, an unmodified nucleobase is adenine, cytosine, guanine, or thymine. In some embodiments, a modified nucleobase is a modified cytosine. In some embodiments, a modified nucleobase is a methylated nucleobase. In some embodiments, a modified cytosine is a methyl cytosine, e.g., a 5-methyl cytosine. In such embodiments, the cytosine modification is a methyl. Agents that recognize a methyl cytosine in DNA include but are not limited to “methyl binding reagents,” which refer herein to reagents that bind to a methyl cytosine. Methyl binding reagents include but are not limited to methyl binding domains (MBDs) and methyl binding proteins (MBPs). In some such embodiments, the DNA may be single-stranded or double-stranded. Suitable agents include agents that recognize modified nucleotides in double-stranded DNA, single-stranded DNA, and both double-stranded and single-stranded DNA.
[0172] As used herein, “modified cytosine” refers to a cytosine in which at least one position of the cytosine has been substituted with a chemical moiety, such as a methyl or hydroxymethyl, that is different from the substituent at that position in unmodified cytosine. For the avoidance of doubt, “modified cytosine” does not include unmodified cytosine.
[0173] As used herein, “polynucleotide”, “nucleic acid”, “nucleic acid molecule”, or “oligonucleotide” refers to a linear polymer of nucleosides (including deoxyribonucleosides, ribonucleosides, or analogs thereof) joined by inter-nucleosidic linkages. Typically, a polynucleotide comprises at least three nucleosides. Oligonucleotides often range in size from a few monomeric units, e.g., 3-4, to hundreds of monomeric units. Whenever a polynucleotide is represented by a sequence of letters, such as “ATGCCTG”, the nucleotides are in 5’ - 3’ order from left to right, and in the case of DNA, “A” denotes deoxyadenosine, “C” denotes deoxycytidine, “G” denotes deoxyguanosine, and “T” denotes deoxythymidine, unless otherwise noted. The letters A, C, G, and T may be used to refer to the bases themselves, to nucleosides, or to nucleotides comprising the bases.
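Because sequences are written 5’ to 3’ from left to right, the complementary strand of a given sequence is obtained by complementing each base and then reversing the result, so that it too reads 5’ to 3’. The short Python sketch below illustrates this convention for the unmodified bases A, C, G, and T only; the function name is illustrative.

```python
# Map each DNA base to its Watson-Crick partner (upper- and lower-case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of `seq`, written 5'->3' left to right."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCCTG"))  # CAGGCAT
```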
[0174] As used herein, “processing” refers to a set of steps used to generate a library of nucleic acids that is suitable for sequencing. The set of steps can include, but is not limited to, partitioning, end repairing, addition of sequencing adapters, tagging, and / or PCR amplification of nucleic acids.
[0175] As used herein, “quantitative measure” refers to an absolute or relative measure. A quantitative measure can be, without limitation, a number, a statistical measurement (e.g., frequency, mean, median, standard deviation, or quantile), or a degree or a relative quantity (e.g., high, medium, and low). A quantitative measure can be a ratio of two quantitative measures. A quantitative measure can be a linear combination of quantitative measures. A quantitative measure may be a normalized measure.
[0176] As used herein, “reference sequence” refers to a known sequence used for purposes of comparison with experimentally determined sequences. For example, a known sequence can be an entire genome, a chromosome, or any segment thereof. A reference sequence can align with a single contiguous sequence of a genome or chromosome or chromosome arm or can include noncontiguous segments that align with different regions of a genome or chromosome. Examples of reference sequences include human genomes, such as hg19 and hg38.
[0177] As used herein, “sample” means anything capable of being analyzed by the methods and / or systems disclosed herein.
[0178] As used herein, “sequencing” refers to any of a number of technologies used to determine the sequence (e.g., the identity and order of monomer units) of a biomolecule, e.g., a nucleic acid such as DNA or RNA. Examples of sequencing methods include, but are not limited to, targeted sequencing, single molecule real-time sequencing, exon or exome sequencing, intron sequencing, electron microscopy-based sequencing, panel sequencing, transistor-mediated sequencing, direct sequencing, random shotgun sequencing, Sanger dideoxy termination sequencing, whole-genome sequencing, sequencing by hybridization, pyrosequencing, duplex sequencing, cycle sequencing, single-base extension sequencing, solid-phase sequencing, high-throughput sequencing, massively parallel signature sequencing, emulsion PCR, co-amplification at lower denaturation temperature-PCR (COLD-PCR), multiplex PCR, sequencing by reversible dye terminator, paired-end sequencing, near-term sequencing, exonuclease sequencing, sequencing by ligation, short-read sequencing, single-molecule sequencing, sequencing-by-synthesis, real-time sequencing, reverse-terminator sequencing, nanopore sequencing, 454 sequencing, Solexa Genome Analyzer sequencing, SOLiD™ sequencing, MS-PET sequencing, and a combination thereof. In some embodiments, sequencing can be performed by a gene analyzer such as, for example, gene analyzers commercially available from Illumina, Inc., Pacific Biosciences, Inc., or Applied Biosystems / Thermo Fisher Scientific, among many others.
[0179] As used herein, “sequence information” in the context of a nucleic acid polymer means the order and identity of monomer units (e.g., nucleotides, etc.) in that polymer.
[0180] As used herein, “sequence-variable target region set” refers to a set of target regions that may exhibit changes in sequence such as nucleotide substitutions, insertions, deletions, or gene fusions or transpositions in neoplastic cells (e.g., tumor cells and cancer cells).
[0181] As used herein, the terms “somatic mutation” or “somatic variation” are used interchangeably. They refer to a mutation in the genome that occurs after conception. Somatic mutations can occur in any cell of the body except germ cells and, accordingly, are not passed on to progeny.
[0182] As used herein, “subject” refers to an animal, such as a mammalian species (e.g., human) or avian (e.g., bird) species, or other organism, such as a plant. More specifically, a subject can be a vertebrate, e.g., a mammal such as a mouse, a primate, a simian or a human. Animals include farm animals (e.g., production cattle, dairy cattle, poultry, horses, pigs, and the like), sport animals, and companion animals (e.g., pets or support animals). A subject can be a healthy individual, an individual that has or is suspected of having a disease or a predisposition to the disease, or an individual in need of therapy or suspected of needing therapy. The terms “individual” or “patient” are intended to be interchangeable with “subject”. For example, a subject can be an individual who has been diagnosed with a cancer, is going to receive a cancer therapy, and / or has received at least one cancer therapy. The subject can be in remission of a cancer. As another example, the subject can be an individual who has been diagnosed with an autoimmune disease. As another example, the subject can be a female individual who is pregnant or who is planning on becoming pregnant, and who may have been diagnosed with or be suspected of having a disease, e.g., a cancer or an autoimmune disease.
[0183] As used herein, “tumor fraction” refers to the proportion of cfDNA molecules that originated from tumor cells for a given sample, or sample-region pair.
[0184] As used herein, an “asymmetric adapter” is a double stranded adapter in which the two strands are not completely complementary or are otherwise distinguishable such that synthesis of a complementary sequence of one strand of the adapter results in a sequence that is distinguishable from the sequence of the other strand of the adapter. Examples of asymmetric adapters are Y-shaped adapters and bubble adapters.
[0185] As used herein, a “Y-shaped adapter” refers to an adapter comprising two DNA strands comprising complementary and non-complementary parts, wherein the non-complementary parts form single-stranded arms. The adapter can be attached to a sample or insert DNA molecule, e.g., by ligation, such that the complementary (double-stranded) part of the adapter is proximal to the sample or insert DNA molecule. Prior to attachment, the double-stranded portion of the Y-shaped adapter may have a blunt end or an overhang, e.g., of one to three nucleotides. The single-stranded arms may or may not be of identical length.
[0186] As used herein, a “bubble adapter” refers to an adapter comprising two DNA strands comprising a non-complementary part flanked by complementary parts, such that the adapter has a single-stranded region located between double-stranded regions. The adapter can be attached to a sample or insert DNA molecule, e.g., by ligation, such that one of the complementary (double-stranded) parts of the adapter is proximal to the sample or insert DNA molecule. Prior to attachment, the double-stranded portion of the bubble adapter that would be attached to the insert or sample molecule may have a blunt end or an overhang, e.g., of one to three nucleotides. The single-stranded portions of the two strands may or may not be of identical length.
[0187] The terms “or a combination thereof” and “or combinations thereof” as used herein refer to any and all permutations and combinations of the listed terms preceding the term. For example, “A, B, C, or combinations thereof” is intended to include at least one of: A, B, C, AB, AC, BC, or ABC, and if order is important in a particular context, also BA, CA, CB, ACB, CBA, BCA, BAC, or CAB. Continuing with this example, expressly included are combinations that contain repeats of one or more items or terms, such as BB, AAA, AAB, BBC, AAABCCCC, CBBAAA, CABABB, and so forth. The skilled artisan will understand that typically there is no limit on the number of items or terms in any combination, unless otherwise apparent from the context.
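The repeat-free part of the “A, B, C” example can be reproduced with Python's standard `itertools` module. In this sketch, `combinations` yields the unordered groupings (A, B, C, AB, AC, BC, ABC), and `permutations` yields every ordered arrangement, which adds the eight order-dependent variants listed above for 15 in total. Combinations with repeats (e.g., BB or AAB) are unbounded in length, so enumerating them would require choosing an arbitrary maximum length, for example with `itertools.product`; the sketch does not attempt this. The function names are illustrative.

```python
from itertools import combinations, permutations
from typing import List, Sequence


def unordered_groupings(items: Sequence[str]) -> List[str]:
    """All non-empty combinations of distinct items, ignoring order."""
    return [
        "".join(group)
        for size in range(1, len(items) + 1)
        for group in combinations(items, size)
    ]


def ordered_groupings(items: Sequence[str]) -> List[str]:
    """All non-empty arrangements of distinct items, where order matters."""
    return [
        "".join(group)
        for size in range(1, len(items) + 1)
        for group in permutations(items, size)
    ]


if __name__ == "__main__":
    print(unordered_groupings("ABC"))     # ['A', 'B', 'C', 'AB', 'AC', 'BC', 'ABC']
    print(len(ordered_groupings("ABC")))  # 15
```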
[0188] “Buffy coat” refers to the portion of a blood (such as whole blood) or bone marrow sample that contains all or most of the white blood cells and platelets of the sample. The buffy coat fraction of a sample can be prepared from the sample using centrifugation, which separates sample components by density. For example, following centrifugation of a whole blood sample, the buffy coat fraction is situated between the plasma and erythrocyte (red blood cell) layers. The buffy coat can contain both mononuclear (e.g., T cells, B cells, NK cells, dendritic cells, and monocytes) and polymorphonuclear (e.g., granulocytes such as neutrophils and eosinophils) white blood cells.
[0189] As used herein, “leukapheresis” refers to a procedure in which white blood cells (leukocytes) are isolated from a sample of blood collected from a subject. Leukapheresis may be performed, e.g., to obtain cells for research, diagnostic, prognostic, or monitoring purposes, such as those described herein. Thus, as used herein, a “leukapheresis sample” refers to a sample comprising leukocytes collected from a subject using leukapheresis.
[0190] As used herein, “peripheral blood mononuclear cells” or “PBMCs” refers to immune cells having a single, round nucleus that originate in bone marrow and are found in the peripheral circulation. Such cells include, e.g., lymphocytes (T cells, B cells, and NK cells) as well as monocytes, and are isolated from blood samples (such as from a whole blood sample collected from a subject) using density gradient centrifugation.
[0191] As used herein, “amplify,” “amplifying,” or “amplification” refers to a process by which extra or multiple copies of a particular polynucleotide are formed. Amplification methods can include any suitable methods known in the art. As used herein, a nucleic acid molecule amplified using “methylation-preserving amplification” substantially maintains its methylation status post-amplification.
[0192] The term “methylation-dependent nuclease” refers to a nuclease that preferentially cuts methylated DNA relative to unmethylated DNA. For example, a methylation-dependent nuclease may cut at or near a recognition sequence such as a restriction site in a manner dependent on methylation of at least one of the nucleobases in the recognition sequence, such as a cytosine. In some embodiments, the nucleolytic activity of the methylation-dependent nuclease is at least 10, 20, 50, or 100-fold higher on a methylated recognition site relative to an unmethylated control in a standard nucleolysis assay. Methylation-dependent nucleases include methylation-dependent restriction enzymes.
[0193] The term “methylation-sensitive nuclease” refers to a nuclease that preferentially cuts unmethylated DNA relative to methylated DNA. For example, a methylation-sensitive nuclease may cut at or near a recognition sequence such as a restriction site in a manner dependent on lack of methylation of at least one of the nucleobases in the recognition sequence, such as a cytosine. In some embodiments, the nucleolytic activity of the methylation-sensitive nuclease is at least 10, 20, 50, or 100-fold higher on an unmethylated recognition site relative to a methylated control in a standard nucleolysis assay. Methylation-sensitive nucleases include methylation-sensitive restriction enzymes.
[0194] As used herein, “methylation-dependent restriction enzyme” or “MDRE” refers to a restriction enzyme that is dependent on methylation of the DNA (e.g., cytosine methylation), i.e., the presence or absence of a methyl group in a nucleotide base alters the rate at which the enzyme cleaves the target DNA. In some embodiments, methylation-dependent restriction enzymes do not cleave the DNA if a particular nucleotide base in the recognition sequence is unmethylated. For example, MspJI is a methylation-dependent restriction enzyme with a recognition sequence “mCNNR(N9)”, and it does not cleave DNA in the absence of the methylated cytosine (mC) in the recognition sequence.
[0195] As used herein, “methylation-sensitive restriction enzyme” or “MSRE” refers to a restriction enzyme that is sensitive to the methylation status of the DNA (e.g., cytosine methylation), i.e., the presence or absence of a methyl group in a nucleotide base alters the rate at which the enzyme cleaves the target DNA. In some embodiments, methylation-sensitive restriction enzymes do not cleave the DNA if a particular nucleotide base in the recognition sequence is methylated. For example, HpaII is a methylation-sensitive restriction enzyme with a recognition sequence “CCGG”, and it does not cleave DNA if the second cytosine in the recognition sequence is methylated.
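The HpaII example can be illustrated by predicting which CCGG sites in a sequence would be cleaved given a set of methylated positions. This is a simplified Python sketch under stated assumptions: cleavage is treated as all-or-none (the definition above speaks of altered rates), only the strand shown is considered, and a site is blocked exactly when its second cytosine is methylated. The function name is hypothetical.

```python
from typing import Iterable, List

HPAII_SITE = "CCGG"


def hpaii_cleavable_sites(seq: str, methylated: Iterable[int]) -> List[int]:
    """Return 0-based start positions of CCGG sites that HpaII would cut.

    `methylated` lists 0-based positions of methylated cytosines. A site is
    treated as blocked when its second cytosine (site start + 1) is methylated.
    """
    methylated_positions = set(methylated)
    seq = seq.upper()
    cleavable = []
    start = seq.find(HPAII_SITE)
    while start != -1:
        if start + 1 not in methylated_positions:
            cleavable.append(start)
        start = seq.find(HPAII_SITE, start + 1)
    return cleavable


if __name__ == "__main__":
    sequence = "AACCGGTTCCGGAA"
    print(hpaii_cleavable_sites(sequence, methylated=[]))   # [2, 8]
    print(hpaii_cleavable_sites(sequence, methylated=[9]))  # [2]
```

Methylating the first cytosine of a site (for example, position 2 above) does not block cleavage in this model, mirroring the statement that HpaII is blocked by methylation of the second cytosine.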
[0196] “Solid tissue” as used herein means tissue other than blood, blood components, other fluids such as lymph and interstitial fluid, and includes, e.g., epithelial tissue, connective tissue, muscle tissue, nervous tissue, and tissue of the colon, lung, breast, skin, prostate, stomach, pancreas, bladder, kidney, and liver. Solid tissue may be normal, precancerous, or cancerous (e.g., a malignant solid tumor such as a carcinoma or sarcoma).
[0197] “Solid tissue cells” as used herein means cells in or derived from a solid tissue. Solid tissue cells exclude circulating cell types, such as cells normally present in blood or lymph.
[0198] A “reaction cleanup” refers to the removal of contaminants such as salts, enzymes, unincorporated dNTPs, primers, ethidium bromide, and other impurities that can interfere with downstream analysis. For example, when a reaction cleanup is performed between end repair and an A-tailing reaction, it removes unincorporated dNTPs such that the A-tailing reaction can be performed solely in the presence of dATP (i.e., not dCTP, dGTP, and dTTP, as used in the end repair reaction). Reaction cleanups can be performed using commercially available kits such as MinElute Reaction Cleanup Kit (Qiagen).
[0199] “Regions of the end-repaired DNA that were synthesized during the end repair reaction,” also referred to as “repaired regions” or “synthesized regions,” refer to regions of the DNA that were not present in the DNA prior to the end repair and A-tailing reactions. They are regions that have been synthesized by the polymerases used in the end repair and / or A-tailing reactions, if present. In instances where the A-tailing is performed in the same tube as the end repair reaction, all four types of dNTPs will be present, and thus the polymerases used for A-tailing may generate synthesized regions, e.g., through nick translation. In instances where the A-tailing is performed separately from the end repair reaction, and these steps are separated by a reaction cleanup, only dATP will be present in the A-tailing reaction, and thus the polymerases used for A-tailing will not typically generate synthesized regions because the dNTP components are not all present in the A-tailing reaction mix.
[0200] A “type of dNTP” refers to a dNTP comprising a specific base, i.e., A, T, G, or C. Accordingly, where an end repair reaction is performed with dNTPs in which at least one type of dNTP comprises a modified base, the end repair reaction may be performed, for example, using dCTP comprising 5mC, and dATP, dTTP, and dGTP all comprising non-modified bases.
[0201] “Capable of identifying the base modification in the at least one type of dNTP” refers to the ability of a modification-sensitive sequencing method to detect the presence or absence of the base modification in the at least one type of dNTP comprising a modified base used in the end repair. This detection of the base modification may be direct, such as in nanopore sequencing or single molecule real time sequencing, wherein the sequencing data itself indicates the presence or absence of a base modification. Alternatively, the detection of the base modification may be indirect, for example wherein the method involves a conversion procedure which alters the base pairing specificity dependent on the base modification status. It is these changes in base pairing specificity which can be detected by the sequencing method, e.g., through the comparison of the sequencing data to a reference sequence. Moreover, a modification-sensitive sequencing method is capable of identifying the base modification in the at least one type of dNTP regardless of whether it can distinguish one base modification from all other base modifications. For example, one form of modification-sensitive sequencing is sequencing after bisulfite conversion. This method is capable of distinguishing 5hmC and 5mC from unmethylated cytosine, but cannot distinguish 5hmC from 5mC.
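Why bisulfite sequencing cannot separate 5mC from 5hmC, while still separating both from unmethylated cytosine, can be seen by tabulating the base read at a cytosine position after conversion. The Python sketch below encodes the commonly described simplified readouts for bisulfite conversion (unmodified C reads as T; 5mC and 5hmC read as C) and, for comparison, TAPS (5mC and 5hmC read as T; unmodified C reads as C). The table and function names are illustrative, and incomplete conversion and sequencing errors are ignored.

```python
from typing import Dict

# Simplified base call observed at a cytosine position after each conversion.
READOUT: Dict[str, Dict[str, str]] = {
    "bisulfite": {"C": "T", "5mC": "C", "5hmC": "C"},
    "taps": {"C": "C", "5mC": "T", "5hmC": "T"},
}


def distinguishable(chemistry: str, state_a: str, state_b: str) -> bool:
    """True if the two cytosine states give different base calls."""
    table = READOUT[chemistry]
    return table[state_a] != table[state_b]


if __name__ == "__main__":
    for chem in READOUT:
        print(
            chem,
            "C vs 5mC:", distinguishable(chem, "C", "5mC"),
            "| 5mC vs 5hmC:", distinguishable(chem, "5mC", "5hmC"),
        )
```

In both chemistries the method still counts as capable of identifying the modification in the sense defined above, because modified and unmodified cytosine give different base calls even though 5mC and 5hmC do not.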
[0202] Bases of the “same identity” refer to the same base, regardless of modification status of that base. For example, cytosine is considered to be the “same identity” as 5-methylcytosine (5mC) and / or 5-hydroxymethyl-cytosine (5hmC), despite them having different modification statuses.
[0203] An “X1mmX2 mutation” in a specified polypeptide as used herein, where X1 and X2 are amino acids and mm is a position in an amino acid sequence, refers to a substitution in the polypeptide of amino acid X1 present at position mm of the full-length wild-type polypeptide with amino acid X2. The polypeptide is the human polypeptide unless indicated otherwise. The polypeptide comprising the X1mmX2 mutation may, but does not necessarily, comprise additional differences from the wild-type sequence, including but not limited to truncations and deletions as well as other substitutions. For example, a “T1372S mutation” in TET2 refers to a substitution in a TET2 enzyme of the threonine present at position 1372 of the full-length wild-type human TET2 enzyme with a serine. Position 1372 of wild-type human TET2 aligns to positions 258 and 248, respectively, of the truncated TET2 sequences disclosed as SEQ ID NOs: 23 and 24 of US Patent 10,961,525. Similarly, a “V1900X2 mutation,” where X2 is A, C, G, I, or P, in TET2 refers to a substitution in a TET2 enzyme of the valine present at position 1900 of the full-length wild-type human TET2 enzyme with an alanine, cysteine, glycine, isoleucine, or proline.
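The X1mmX2 notation can be parsed and applied mechanically. This is a minimal Python sketch; the regular expression, the names `parse_substitution`, `expand_substitutions`, and `apply_substitution`, and the toy protein string are hypothetical. Positions are interpreted in 1-based, full-length wild-type numbering, so applying a substitution to a truncated construct, such as those referenced in US Patent 10,961,525, would first require the alignment-based position mapping described above, which this sketch does not attempt.

```python
import re
from typing import List, NamedTuple

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


class Substitution(NamedTuple):
    wild_type: str  # X1, the residue present in the wild-type polypeptide
    position: int   # mm, 1-based position in the full-length wild-type sequence
    mutant: str     # X2, the replacement residue


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'T1372S' into its X1, mm, and X2 parts."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not an X1mmX2 label: {label!r}")
    x1, mm, x2 = match.group(1), int(match.group(2)), match.group(3)
    if x1 not in AMINO_ACIDS or x2 not in AMINO_ACIDS:
        raise ValueError(f"non-standard amino acid code in {label!r}")
    return Substitution(x1, mm, x2)


def expand_substitutions(x1: str, mm: int, allowed_x2: str) -> List[str]:
    """Expand a family such as 'V1900X2, X2 in {A, C, G, I, P}' into labels."""
    return [f"{x1}{mm}{x2}" for x2 in allowed_x2]


def apply_substitution(protein: str, sub: Substitution) -> str:
    """Return `protein` with residue X1 at position mm replaced by X2."""
    index = sub.position - 1  # convert 1-based residue number to 0-based index
    if not 0 <= index < len(protein) or protein[index] != sub.wild_type:
        raise ValueError(f"expected {sub.wild_type} at position {sub.position}")
    return protein[:index] + sub.mutant + protein[index + 1:]


if __name__ == "__main__":
    print(parse_substitution("T1372S"))
    print(expand_substitutions("V", 1900, "ACGIP"))
    print(apply_substitution("MTKV", Substitution("T", 2, "S")))  # MSKV
```

Checking that X1 actually occurs at position mm before substituting guards against applying a full-length label to a sequence numbered differently, such as a truncated construct.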
[0204] “Or” is used in the inclusive sense, i.e., equivalent to “and / or,” unless the context requires otherwise.
II. Exemplary methods
A. Overview
[0205] The present disclosure provides methods and systems for analyzing DNA, such as cell-free DNA (cfDNA), and / or for analyzing epigenetic and / or sequence-variable target regions. Without wishing to be bound by any particular theory, cells in or around a cancer or neoplasm may shed more DNA than cells of the same tissue type in a healthy subject. As such, the tissue-of-origin distribution of certain DNA samples, such as cfDNA, may change upon carcinogenesis. Thus, for example, a variation in the T cell receptor (e.g., a variation in the genetic sequence of the T cell receptor), such as a variation in a CDR3 sequence, in a cell or tissue relative to at least one other cell or tissue type can be an indicator of the presence (or recurrence, depending on the history of the subject) of cancer.
[0206] In some embodiments, for example, as illustrated in FIG. 1A, methods disclosed herein include steps of (a) contacting the DNA with (i) adapter-blocking probes and intron-blocking probes, thereby providing blocked DNA; and (ii) a plurality of primers, wherein the primers bind J regions and the intron-blocking probes bind J-region introns, or the primers bind V regions and the intron-blocking probes bind V-region introns; (b) extending at least a portion of the primers, wherein at least a portion of primers bound to DNA comprising recombined CDR3 sequences are extended across the recombined CDR3 sequences, thereby providing CDR3-enriched DNA; and (c) sequencing the CDR3-enriched DNA. Optionally, sequence-variable target regions and / or epigenetic target regions are captured from the same DNA sample, optionally from a first, second, or third subsample of the DNA sample, thereby providing captured regions. In such embodiments, as illustrated in FIG. 1A, the captured regions and the CDR3-enriched DNA are pooled and sequenced together. In other embodiments, as illustrated in FIG. 1B, the captured regions and the CDR3-enriched DNA are sequenced separately.
[0207] Some embodiments of the disclosed methods comprise (a) contacting the DNA with one or more blocking probes, thereby providing blocked DNA; (b) performing multiplex amplification of a plurality of target regions that may comprise a structural variation using a plurality of first primers and a plurality of second primers that anneal to the plurality of target regions, wherein the blocking probes inhibit amplification of wild-type DNA, thereby providing structural variation-enriched DNA; and (c) sequencing the structural variation-enriched DNA. Optionally, sequence-variable target regions and / or epigenetic target regions are captured from the same DNA sample, optionally from a first, second, or third subsample of the DNA sample, thereby providing captured regions. In some such embodiments, the captured regions and the structural variation-enriched DNA are pooled and sequenced together. In other embodiments, the captured regions and the structural variation-enriched DNA are sequenced separately. In some embodiments, the structural variation is a recombined CDR3 sequence.
[0208] As illustrated in FIGS. 1C and 1D, the plurality of primers, such as the primers that bind V regions (V primers) and the primers that bind J regions (J primers), can be designed, for example, using the method described in Montagne et al., EBioMedicine 59 (2020) 102972. Exemplary V primers and J primers can include those described in Montagne et al., EBioMedicine 59 (2020) 102972 and listed in Tables 1-3. In some embodiments, the plurality of primers can include from 5 to 300, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 120 to 140, from 140 to 160, from 160 to 180, from 180 to 200, from 200 to 220, from 220 to 240, from 240 to 260, from 260 to 280, from 280 to 300, from 40 to 300, from 40 to 280, from 100 to 300, from 100 to 280, from 150 to 300, from 100 to 200, from 200 to 300, from 10 to 150, from 15 to 125, from 20 to 100, from 30 to 90, from 40 to 80, or from 50 to 70 primers that bind V regions. In some embodiments, the plurality of primers can include about 5, about 10, about 20, about 30, about 40, about 45, about 50, about 52, about 54, about 55, about 56, about 57, about 58, about 59, about 60, about 61, about 62, about 63, about 65, about 70, about 75, about 80, about 90, about 100, about 110, about 120, about 130, about 140, about 150, about 152, about 160, about 170, about 180, about 190, about 200, about 210, about 220, about 230, about 240, about 250, about 260, about 266, about 270, about 280, about 290, or about 300 primers that bind V regions.
In some embodiments, the plurality of primers can include from 5 to 100, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 50 to 120, from 80 to 120, from 60 to 100, from 5 to 30, from 10 to 25, or from 10 to 20 primers that bind J regions. In some embodiments, the plurality of second primers can include about 5, about 10, about 11, about 12, about 13, about 14, about 15, about 20, about 30, about 40, about 50, about 60, about 70, about 74, about 80, about 89, about 90, about 100, about 110, or about 120 primers that bind J regions.
Table 1: Human TCRB V primers and adapter sequences
Table 2: Human TCRA, TCRG, and TCRD primers. Adapter sequences can be found in Table 1.
Table 3: Human TCRB J primers. Adapter sequences can be found in Table 1 (e.g., adapter sequences for the constant region primers).
[0209] In some embodiments, a method of analyzing DNA in an adapted library is provided. Methods disclosed herein include an optional step of dividing an adapted DNA library into at least a first subsample and a second subsample. DNA isolated from a sample can be ligated to adapters for sequencing library preparation. Then, the adapted DNA may be amplified (e.g., LP-PCR) to provide an amplified, adapted library. In some embodiments, the adapted library may be divided into at least a first subsample that comprises recombined CDR3 sequences, a second subsample comprising sequence-variable target regions and / or epigenetic target regions, and optionally a third subsample that can be retained as a test backup. In some embodiments, the adapted library may be divided into at least a first subsample that can be used to prepare CDR3-enriched DNA as described herein, a second subsample from which sequence-variable target regions and / or epigenetic target regions can be captured, and optionally a third subsample that can be retained as a test backup. In some embodiments, the adapted library may be divided into at least a first subsample that may comprise a structural variation, a second subsample comprising sequence-variable target regions and / or epigenetic target regions, and an optional third subsample that can be retained as a test backup. In some embodiments, the CDR3-enriched DNA is separated from a first subsample of the DNA, and sequence-variable target regions and / or epigenetic regions are captured from a second subsample of the DNA.
In some embodiments, the CDR3-enriched DNA is separated from a first subsample of the DNA, and sequence-variable target regions and epigenetic regions are captured from a second subsample of the DNA. In some embodiments, the CDR3-enriched DNA is separated from a first subsample of the DNA, sequence-variable target regions are captured from a second subsample of the DNA, and epigenetic target regions are captured from a third subsample of the DNA.
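Dividing an adapted library into subsamples amounts to splitting its volume into named aliquots. The following is a minimal sketch under stated assumptions: the `Subsample` type, the `divide_library` function, and the example fractions are hypothetical and are not specified by this disclosure, and any unassigned volume is treated as the optional backup subsample.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Subsample:
    name: str         # role of the aliquot, e.g., "CDR3 enrichment" (hypothetical label)
    volume_ul: float  # aliquot volume in microliters


def divide_library(total_volume_ul: float, fractions: Dict[str, float]) -> List[Subsample]:
    """Split an adapted library into named aliquots by volume fraction.

    Any volume not assigned by `fractions` is returned as a "backup" aliquot,
    mirroring the optional subsample retained as a test backup.
    """
    if total_volume_ul <= 0:
        raise ValueError("total volume must be positive")
    if any(f <= 0 for f in fractions.values()):
        raise ValueError("each fraction must be positive")
    assigned = sum(fractions.values())
    if assigned > 1.0 + 1e-9:
        raise ValueError("fractions exceed the available library")
    parts = [Subsample(name, total_volume_ul * f) for name, f in fractions.items()]
    remainder = total_volume_ul * (1.0 - assigned)
    if remainder > 1e-9:
        parts.append(Subsample("backup", remainder))
    return parts
```

For example, `divide_library(60.0, {"CDR3 enrichment": 0.25, "target capture": 0.5})` yields 15 µL, 30 µL, and a 15 µL backup aliquot.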
[0210] In some embodiments, the adapted library may be divided into at least a first subsample that can be used to prepare structural variation-enriched DNA as described herein, a second subsample from which sequence-variable target regions and / or epigenetic target regions can be captured, and optionally a third subsample that can be retained as a test backup. In some embodiments, a plurality of target regions from the first subsample that may comprise a structural variation can be amplified, e.g., by multiplex amplification, using a plurality of first primers and a plurality of second primers that anneal to the plurality of target regions, wherein blocking probes inhibit amplification of wild-type DNA, thereby providing structural variation-enriched DNA. A second plurality of target regions comprising sequence-variable target regions and / or epigenetic target regions, e.g., from the second subsample, can be captured, thereby providing captured regions. In some embodiments, the structural variation-enriched DNA is separated from a first subsample of the DNA, sequence-variable target regions are captured from a second subsample of the DNA, and epigenetic target regions are captured from a third subsample of the DNA.
[0211] The captured regions and the CDR3-enriched DNA or the structural variation-enriched DNA can be sequenced. In some embodiments, the captured regions and the CDR3-enriched DNA or the structural variation-enriched DNA can be pooled and sequenced together. In other embodiments, the captured regions and the CDR3-enriched DNA or the structural variation-enriched DNA can be sequenced separately.
[0212] In some embodiments, at least a portion of the plurality of primers that bind J regions and / or at least a portion of the plurality of primers that bind V regions do not exponentially amplify a target region that does not comprise a CDR3 sequence. For example, when a J region intron-blocking probe anneals to a J region intron (and / or when a V region intron-blocking probe anneals to a V-region intron), extension of a primer (such as a primer that binds V regions or a primer that binds J regions) annealed upstream of the intron-blocking probe is blocked by the presence of the blocking probe (see Fig. 1C). However, when a target region comprises a recombined CDR3 region (and thus does not comprise a V intron or a J intron), the primer can be extended in the direction of the CDR3 region because there is no annealed downstream blocking probe in a position to block it. In some embodiments, the blocking probes inhibit amplification of wild-type DNA. Thus, in some embodiments, at least a portion of a plurality of first primers and / or at least a portion of a plurality of second primers do not exponentially amplify a target region that does not comprise a structural variation. For example, when a blocking probe is annealed to a wild-type target region, extension of a first or second primer annealed upstream of the blocking probe is blocked by the downstream blocking probe. However, when a first and / or a second primer is annealed upstream of a target region comprising a structural variation (such as a deletion, inversion, or translocation), the primer can be extended in the direction of the structural variation because there is no annealed downstream blocking probe in a position to block it, e.g., because the structural variation disrupts, removes, or translocates the sequence that otherwise would have been bound by the blocking probe, such that amplification is not inhibited by the blocking probe.
Thus, “extended primers,” as used herein, refers to primers whose extension was not blocked by a blocking probe, such as a J region intron-blocking probe, a V region intron-blocking probe, or a probe that blocks a wild-type sequence.
[0213] Primer extension can result in incorporation of a labeled (e.g., biotinylated) nucleotide, and complexes of such extended primers and the molecules to which they are annealed can then be enriched, e.g., for further analysis such as sequencing. In some embodiments, a majority (e.g., at least 60%, 70%, 80%, 90%, or 95%) of the plurality of primers annealed to V regions or to J regions are blocked by a downstream blocking probe (such as a V intron-blocking probe or a J intron-blocking probe). In some embodiments, a majority (e.g., at least 60%, 70%, 80%, 90%, or 95%) of the plurality of primers annealed to a wild-type target region are blocked by a downstream blocking probe. In some embodiments, a small amount of extension may be permitted depending on the separation between primers and blocking probes, but the V intron, J intron, or wild-type target region cannot be exponentially amplified. The presence of short primer-extended products and unextended primers can be minimized by performing an amplification that depends on the presence of adapter sequences at both ends of the DNA for exponential amplification, e.g., such that unextended primers or short primer-extended products (which resulted from extension that was blocked before reaching an adapter sequence in the template molecule) are not exponentially amplified or are amplified to a lesser extent than primer-extended products that comprise adapter sequences at both ends.
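Why adapter-dependent amplification suppresses blocked products can be illustrated with an idealized PCR model. This is a sketch, not the disclosed protocol: the function name `copies_after_cycles`, the per-cycle `efficiency` parameter, and the assumption that a molecule missing one adapter grows only linearly (copied from a single primer site each cycle) are illustrative simplifications.

```python
def copies_after_cycles(
    starting_copies: float,
    cycles: int,
    adapters_at_both_ends: bool,
    efficiency: float = 1.0,
) -> float:
    """Idealized copy number after PCR with adapter-specific primers.

    Molecules with adapter sequences at both ends are primed on both strands,
    so every product becomes a new template (exponential growth). A molecule
    missing one adapter is modeled as primed from one side only, so only the
    original strands serve as templates (linear growth).
    """
    if starting_copies < 0 or cycles < 0:
        raise ValueError("starting_copies and cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if adapters_at_both_ends:
        return starting_copies * (1.0 + efficiency) ** cycles
    return starting_copies * (1.0 + efficiency * cycles)
```

With 100 starting copies and 10 cycles at full efficiency, this model gives 102,400 copies of a full-length product that carries both adapters, versus 1,100 copies of a blocked product that lacks one.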
[0214] Extension being blocked by a downstream V intron-blocking probe or a downstream J intron-blocking probe when the plurality of primers that bind V regions and / or primers that bind J regions are annealed to a sequence that does not comprise a CDR3 sequence means that the 5’ to 3’ exonuclease-negative, strand-displacement-negative DNA polymerase is unable to continue extending a primer when it encounters the 5’ end of a blocking probe. Similarly, extension being blocked by a downstream blocking probe when the plurality of first primers or the plurality of second primers are annealed to a wild-type sequence means that the 5’ to 3’ exonuclease-negative, strand-displacement-negative DNA polymerase is unable to continue extending a primer when it encounters the 5’ end of a blocking probe. This may occur after a short amount of extension, such as up to 50 nucleotides, e.g., up to 40, 30, 20, 10, 5, 4, 3, 2, or 1 nucleotides. The amount of permitted extension before becoming blocked will depend on the distance between the primer and the probe when annealed to the template sequence. In some embodiments, at least 60%, 70%, 80%, 90%, or 95% of the plurality of primers that bind V regions and the plurality of primers that bind J regions are blocked this way when annealed to a sequence that does not comprise a CDR3 sequence. In some embodiments, at least 60%, 70%, 80%, 90%, or 95% of the plurality of first primers and the plurality of second primers are blocked this way when annealed to a wild-type sequence.
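The dependence of permitted extension on primer-to-probe distance can be sketched with template coordinates. This is a simplified model, assuming a polymerase that neither displaces nor degrades a probe and stops exactly at the probe's 5’ end; the function `extend_primer` and its coordinate conventions are hypothetical.

```python
from typing import Optional, Tuple


def extend_primer(
    next_position: int,
    template_length: int,
    probe_start: Optional[int] = None,
) -> Tuple[int, bool]:
    """Model extension by a polymerase that cannot displace or degrade a probe.

    next_position: template index of the first nucleotide the polymerase adds.
    template_length: index at which the template (and its adapter) ends.
    probe_start: template index of the 5' end of a downstream blocking probe,
        or None when no probe is annealed downstream (e.g., recombined CDR3).

    Returns (nucleotides added, whether extension reached the template end).
    """
    if not 0 <= next_position <= template_length:
        raise ValueError("next_position must lie within the template")
    stop = template_length
    # A probe annealed at or downstream of the primer halts extension at its 5' end;
    # a probe annealed upstream of the primer does not affect extension.
    if probe_start is not None and next_position <= probe_start < template_length:
        stop = probe_start
    return stop - next_position, stop == template_length
```

On a 200-nucleotide template with extension starting at position 60, a probe whose 5’ end sits at position 80 permits only 20 nucleotides of extension and the product never reaches the far adapter, whereas with no downstream probe the primer extends 140 nucleotides to the template end.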
[0215] In some embodiments, the sample comprises a mixture of DNA lacking a CDR3 sequence and DNA comprising a CDR3 sequence, and the extension of at least one primer is blocked on the DNA lacking a CDR3 sequence but not on the DNA comprising a CDR3 sequence. Thus, such methods can preferentially generate products in an amplified (primer-extended) sample complementary to the CDR3 sequence. This can facilitate the detection of such CDR3 sequences because they are more common in the amplified sample than in the initial sample.
[0216] In some embodiments, the sample comprises a mixture of DNA having a wild-type sequence and a structural variation mutant sequence, and the extension of at least one primer is blocked on the wild-type sequence but not on the structural variation mutant sequence. Thus, such methods can preferentially generate products in an amplified (primer-extended) sample complementary to the structural variation mutant sequence. This can facilitate the detection of such sequences because they are more common in the amplified sample than in the initial sample.
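The gain in variant (CDR3 or structural variation) representation can be quantified with a two-population exponential model. This sketch assumes that blocking acts only as a lower per-cycle amplification efficiency for wild-type molecules and ignores linear carry-over; the function name and parameters are hypothetical.

```python
def variant_fraction_after_cycles(
    initial_fraction: float,
    cycles: int,
    variant_efficiency: float,
    wild_type_efficiency: float,
) -> float:
    """Variant fraction after `cycles` of idealized exponential amplification.

    Each efficiency is the per-cycle probability (0 to 1) that a molecule is
    copied; blocking is modeled only as a lower wild-type efficiency.
    """
    if not 0.0 < initial_fraction < 1.0:
        raise ValueError("initial_fraction must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    for eff in (variant_efficiency, wild_type_efficiency):
        if not 0.0 <= eff <= 1.0:
            raise ValueError("efficiencies must be between 0 and 1")
    variant = initial_fraction * (1.0 + variant_efficiency) ** cycles
    wild_type = (1.0 - initial_fraction) * (1.0 + wild_type_efficiency) ** cycles
    return variant / (variant + wild_type)
```

Under this model, a variant present at 0.1% rises to roughly 51% after 10 cycles if wild-type amplification is fully blocked, while equal efficiencies leave the fraction unchanged.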
[0217] In some embodiments, a method of analyzing DNA in an adapted library is provided. Methods disclosed herein include preparing the adapted library by ligating adapters to cfDNA from a subject having or suspected of having a cancer, thereby producing adapted cfDNA, optionally followed by amplifying (e.g., LP-PCR) the adapted cfDNA. The adapted library may be partitioned into at least a first subsample comprising recombined CDR3 sequences, a second subsample comprising sequence-variable target regions and / or epigenetic target regions, and an optional third subsample that can be retained as a test backup. A plurality of target regions comprising sequence-variable target regions and / or epigenetic target regions from the second subsample can be captured, thereby providing captured regions. At least a portion of primers bound to DNA comprising recombined CDR3 sequences can be extended across the recombined CDR3 sequences of the DNA of the first subsample, thereby providing CDR3-enriched DNA. The captured regions and the CDR3-enriched DNA can be sequenced. In some embodiments, the captured regions and the CDR3-enriched DNA can be pooled and sequenced together. In other embodiments, the captured regions and the CDR3-enriched DNA can be sequenced separately.
[0218] Some embodiments of the disclosed methods comprise (a) contacting the DNA with (i) adapter-blocking probes and intron-blocking probes, thereby providing blocked DNA; and (ii) a plurality of primers, wherein the primers bind J regions and the intron-blocking probes bind J-region introns, or the primers bind V regions and the intron-blocking probes bind V-region introns; (b) extending at least a portion of the primers, wherein at least a portion of primers bound to DNA comprising recombined CDR3 sequences are extended across the recombined CDR3 sequences, thereby providing CDR3-enriched DNA; and (c) sequencing the CDR3-enriched DNA. In some embodiments, the adapter-blocking probes and intron-blocking probes are contacted with the DNA before the DNA is contacted with the plurality of primers. In some embodiments, the adapter-blocking probes and intron-blocking probes are contacted with the DNA at the same time (i.e., simultaneously) as the DNA is contacted with the plurality of primers. In some embodiments, the primers are extended using a polymerase that lacks 5’ to 3’ exonuclease activity.
[0219] In some embodiments, the DNA is contacted with adapter-blocking probes, V region intron-blocking probes, and a plurality of primers that bind V regions. In some embodiments, the DNA is contacted with adapter-blocking probes, J region intron-blocking probes, and a plurality of primers that bind J regions. In some embodiments, the DNA is contacted with adapter-blocking probes, J region intron-blocking probes, V region intron-blocking probes, a plurality of primers that bind V regions, and a plurality of primers that bind J regions. In some embodiments, an adapter-blocking probe blocks a 3’ adapter of the DNA of the adapted library (e.g., by annealing to the adapter). In some embodiments, an adapter-blocking probe blocks a 5’ adapter of the DNA of the adapted library (e.g., by annealing to the adapter).
[0220] In certain embodiments, the plurality of primers that bind V regions are oriented to prime extension toward the J exon. In some embodiments, the plurality of primers that bind J regions are oriented to prime extension toward the V exon.
[0221] In some embodiments, the plurality of primers comprises from 5 to 300, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 120 to 140, from 140 to 160, from 160 to 180, from 180 to 200, from 200 to 220, from 220 to 240, from 240 to 260, from 260 to 280, from 280 to 300, from 40 to 300, from 40 to 280, from 100 to 300, from 100 to 280, from 150 to 300, from 100 to 200, from 200 to 300, from 10 to 150, from 15 to 125, from 20 to 100, from 30 to 90, from 40 to 80, or from 50 to 70 primers that bind V regions. In some embodiments, the plurality of primers comprises from 5 to 120, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 50 to 120, from 80 to 120, from 60 to 100, from 5 to 30, from 10 to 25, or from 10 to 20 primers that bind J regions. In some embodiments, the plurality of primers comprises from 5 to 300, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 120 to 140, from 140 to 160, from 160 to 180, from 180 to 200, from 200 to 220, from 220 to 240, from 240 to 260, from 260 to 280, from 280 to 300, from 40 to 300, from 40 to 280, from 100 to 300, from 100 to 280, from 150 to 300, from 100 to 200, from 200 to 300, from 10 to 150, from 15 to 125, from 20 to 100, from 30 to 90, from 40 to 80, or from 50 to 70 primers that bind V regions, and the plurality of primers comprises from 5 to 120, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 50 to 120, from 80 to 120, from 60 to 100, from 5 to 30, from 10 to 25, or from 10 to 20 primers that bind J regions.
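When assembling a primer panel, the broadest recited counts (5 to 300 primers that bind V regions, 5 to 120 primers that bind J regions) can serve as a simple sanity check. The helper below is hypothetical, treats both bounds as inclusive (an assumption), and says nothing about which specific primers a panel should contain.

```python
from typing import List

# Broadest ranges recited above, treated here as inclusive bounds (assumption).
V_PRIMER_RANGE = (5, 300)  # primers that bind V regions
J_PRIMER_RANGE = (5, 120)  # primers that bind J regions


def panel_size_issues(n_v_primers: int, n_j_primers: int) -> List[str]:
    """Return messages for primer counts outside the broadest recited ranges.

    An empty list means both counts fall within the ranges.
    """
    issues = []
    for label, count, (low, high) in (
        ("V", n_v_primers, V_PRIMER_RANGE),
        ("J", n_j_primers, J_PRIMER_RANGE),
    ):
        if not low <= count <= high:
            issues.append(f"{label} primer count {count} is outside {low}-{high}")
    return issues
```

For instance, a panel of about 60 V primers and about 14 J primers passes, while 400 V primers would be flagged.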
[0222] In some embodiments, the adapter-blocking probes comprise from 5 to 300, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 120 to 140, from 140 to 160, from 160 to 180, from 180 to 200, from 200 to 220, from 220 to 240, from 240 to 260, from 260 to 280, from 280 to 300, from 40 to 300, from 40 to 280, from 100 to 300, from 100 to 280, from 150 to 300, from 100 to 200, from 200 to 300, from 10 to 150, from 15 to 125, from 20 to 100, from 30 to 90, from 40 to 80, or from 50 to 70 adapter-blocking probes.
[0223] In some embodiments, the intron-blocking probes comprise from 5 to 300, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 120 to 140, from 140 to 160, from 160 to 180, from 180 to 200, from 200 to 220, from 220 to 240, from 240 to 260, from 260 to 280, from 280 to 300, from 40 to 300, from 40 to 280, from 100 to 300, from 100 to 280, from 150 to 300, from 100 to 200, from 200 to 300, from 10 to 150, from 15 to 125, from 20 to 100, from 30 to 90, from 40 to 80, or from 50 to 70 intron-blocking probes. In some embodiments, the intron-blocking probes comprise about 5, about 10, about 20, about 30, about 40, about 45, about 50, about 52, about 54, about 55, about 56, about 57, about 58, about 59, about 60, about 61, about 62, about 63, about 65, about 70, about 75, about 80, about 90, about 100, about 110, about 120, about 130, about 140, about 150, about 152, about 160, about 170, about 180, about 190, about 200, about 210, about 220, about 230, about 240, about 250, about 260, about 266, about 270, about 280, about 290, or about 300 intron-blocking probes. In some embodiments, the intron-blocking probes comprise from 5 to 120, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 50 to 120, from 80 to 120, from 60 to 100, from 5 to 30, from 10 to 25, or from 10 to 20 intron-blocking probes. In some embodiments, the intron-blocking probes comprise about 5, about 10, about 11, about 12, about 13, about 14, about 15, about 20, about 30, about 40, about 50, about 60, about 70, about 74, about 80, about 89, about 90, about 100, about 110, or about 120 intron-blocking probes.
[0224] In some embodiments, the V region intron-blocking probes comprise from 5 to 300, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 120 to 140, from 140 to 160, from 160 to 180, from 180 to 200, from 200 to 220, from 220 to 240, from 240 to 260, from 260 to 280, from 280 to 300, from 40 to 300, from 40 to 280, from 100 to 300, from 100 to 280, from 150 to 300, from 100 to 200, from 200 to 300, from 10 to 150, from 15 to 125, from 20 to 100, from 30 to 90, from 40 to 80, or from 50 to 70 V region intron-blocking probes. In some embodiments, the V region intron-blocking probes comprise about 5, about 10, about 20, about 30, about 40, about 45, about 50, about 52, about 54, about 55, about 56, about 57, about 58, about 59, about 60, about 61, about 62, about 63, about 65, about 70, about 75, about 80, about 90, about 100, about 110, about 120, about 130, about 140, about 150, about 152, about 160, about 170, about 180, about 190, about 200, about 210, about 220, about 230, about 240, about 250, about 260, about 266, about 270, about 280, about 290, or about 300 V region intron-blocking probes.
[0225] In some embodiments, the J region intron-blocking probes comprise from 5 to 120, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 50 to 120, from 80 to 120, from 60 to 100, from 5 to 30, from 10 to 25, or from 10 to 20 J region intron-blocking probes. In some embodiments, the J region intron-blocking probes comprise about 5, about 10, about 11, about 12, about 13, about 14, about 15, about 20, about 30, about 40, about 50, about 60, about 70, about 74, about 80, about 89, about 90, about 100, about 110, or about 120 J region intron-blocking probes.
[0226] In some embodiments of the disclosed methods, the DNA contacted with the adapter-blocking probes, intron-blocking probes, and primers is in a first subsample. In some embodiments, at least a second subsample of the adapted library is retained as a backup.
[0227] In some embodiments, a plurality of target regions is captured from the DNA, thereby providing captured regions. In some embodiments, the plurality of target regions comprise sequence-variable target regions and / or epigenetic target regions. In particular embodiments, the plurality of target regions comprise sequence-variable target regions and epigenetic target regions. In particular embodiments, the plurality of target regions comprise sequence-variable target regions. In particular embodiments, the plurality of target regions comprise epigenetic target regions. In some embodiments, the plurality of target regions is captured from a first subsample or from a second subsample of the adapted library. In particular embodiments, a third subsample of the adapted library is retained as a backup. In some embodiments, the plurality of target regions are captured from a first subsample of the adapted library (such as from the same subsample from which CDR3-enriched DNA is separated). In some embodiments, the plurality of target regions are captured from a second subsample of the adapted library. In some embodiments, the plurality of target regions are captured from a third subsample of the adapted library.
[0228] In some embodiments, the captured regions are sequenced. In some embodiments, the captured sequence-variable target regions are sequenced. In some embodiments, the captured epigenetic target regions are sequenced. In some embodiments, the captured sequence-variable target regions and the captured epigenetic target regions are sequenced. In some embodiments, the captured regions are amplified prior to sequencing. In particular embodiments, the captured regions and the CDR3-enriched DNA are pooled and sequenced together. In other particular embodiments, the captured regions and the CDR3-enriched DNA are sequenced separately.
[0229] In some embodiments, the epigenetic target regions comprise hypermethylation variable target regions, hypomethylation variable target regions, methylation control target regions, or fragmentation variable target regions. In some embodiments, a somatic mutation load is quantified using a plurality of captured regions comprising the sequence-variable target regions.
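The description does not specify how somatic mutation load is computed from the captured sequence-variable target regions. One common convention, assumed here rather than taken from this disclosure, normalizes the count of somatic variants called in the captured regions by the size of the captured territory in megabases; variant calling and filtering are not modeled, and the function name is hypothetical.

```python
def mutations_per_megabase(somatic_variant_count: int, captured_bases: int) -> float:
    """Somatic variants per megabase of captured sequence-variable territory."""
    if somatic_variant_count < 0:
        raise ValueError("variant count must be non-negative")
    if captured_bases <= 0:
        raise ValueError("captured_bases must be positive")
    return somatic_variant_count / (captured_bases / 1_000_000)
```

For example, 12 somatic variants across 1.5 Mb of captured sequence-variable target regions corresponds to 8 mutations per megabase.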
[0230] In some embodiments, at least a portion of the plurality of primers that bind J regions and / or at least a portion of the plurality of primers that bind V regions do not exponentially amplify a target region that does not comprise a CDR3 sequence. In some embodiments, hybridization of an intron-blocking probe to the DNA at least partially blocks extension of at least a portion of the plurality of primers. In some embodiments, hybridization of a J-region intron-blocking probe to a region of the DNA at least partially blocks extension of at least a portion of the plurality of primers that bind J regions and / or at least a portion of the plurality of primers that bind V regions. In some embodiments, hybridization of a V-region intron-blocking probe to a region of the DNA at least partially blocks extension of at least a portion of the plurality of primers that bind J regions and / or at least a portion of the plurality of primers that bind V regions.
[0231] In certain embodiments, the plurality of primers that bind V regions and / or the plurality of primers that bind J regions comprise a label. In some embodiments, the plurality of primers that bind V regions and / or the plurality of primers that bind J regions comprise the same label. In other embodiments, the plurality of primers that bind V regions comprises a first label and the plurality of primers that bind J regions comprises a second label. In some embodiments, as described elsewhere herein, a label is incorporated into the CDR3-enriched DNA during extension across segments comprising recombined CDR3 sequences. In some embodiments, the label is biotin, avidin, streptavidin, neutravidin, an oligonucleotide, digoxigenin, a histidine tag, an affinity tag, an immunoglobulin constant domain, a nucleic acid comprising a particular nucleotide sequence, a hapten recognized by an antibody, or a magnetically attractable particle.
[0232] In some embodiments, the CDR3 sequence is a part of a T cell receptor (TCR), TCR beta chain, B cell receptor, immunoglobulin, B cell receptor heavy chain, or immunoglobulin heavy chain.
[0233] In some embodiments, the CDR3-enriched DNA comprises extended primers that bind V regions, and / or extended primers that bind J regions, wherein extension was not blocked by a J region intron-blocking probe or by a V region intron-blocking probe. In some embodiments, the CDR3-enriched DNA comprises extended primers that bind V regions, and extended primers that bind J regions, wherein extension was not blocked by a J region intron-blocking probe or by a V region intron-blocking probe. In some embodiments, the CDR3-enriched DNA comprises extended primers that bind V regions, and extended primers that bind J regions, wherein extension was not blocked by a J region intron-blocking probe. In some embodiments, the CDR3-enriched DNA comprises extended primers that bind V regions, and extended primers that bind J regions, wherein extension was not blocked by a V region intron-blocking probe.
[0234] In some embodiments, the CDR3-enriched DNA comprises extended primers that bind V regions, wherein extension was not blocked by a J region intron-blocking probe or by a V region intron-blocking probe. In some embodiments, the CDR3-enriched DNA comprises extended primers that bind V regions, wherein extension was not blocked by a J region intron-blocking probe. In some embodiments, the CDR3-enriched DNA comprises extended primers that bind V regions, wherein extension was not blocked by a V region intron-blocking probe. In some embodiments, the CDR3-enriched DNA comprises extended primers that bind J regions, wherein extension was not blocked by a J region intron-blocking probe or by a V region intron-blocking probe. In some embodiments, the CDR3-enriched DNA comprises extended primers that bind J regions, wherein extension was not blocked by a J region intron-blocking probe. In some embodiments, the CDR3-enriched DNA comprises extended primers that bind J regions, wherein extension was not blocked by a V region intron-blocking probe.
[0235] In some embodiments of the disclosed methods, each of the plurality of primers that bind J regions and / or each of the plurality of primers that bind V regions comprises at least 18, 19, or 20 linked nucleosides. In some embodiments, each of the plurality of primers that bind J regions and each of the plurality of primers that bind V regions comprises at least 18, 19, or 20 linked nucleosides. In some embodiments, each of the plurality of primers that bind J regions comprises at least 18, 19, or 20 linked nucleosides. In some embodiments, each of the plurality of primers that bind V regions comprises at least 18, 19, or 20 linked nucleosides.
[0236] In some embodiments of the disclosed methods, each of the plurality of primers that bind J regions and / or each of the plurality of primers that bind V regions comprises 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 linked nucleosides. In some embodiments, each of the plurality of primers that bind J regions and each of the plurality of primers that bind V regions comprises 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 linked nucleosides. In some embodiments, each of the plurality of primers that bind J regions comprises 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 linked nucleosides. In some embodiments, each of the plurality of primers that bind V regions comprises 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 linked nucleosides.
[0237] In some embodiments, each of the plurality of primers that bind J regions and / or each of the plurality of primers that bind V regions consists of 18, 19, or 20 to 60 linked nucleosides. In some embodiments, each of the plurality of primers that bind J regions and each of the plurality of primers that bind V regions consists of 18, 19, or 20 to 60 linked nucleosides. In some embodiments, each of the plurality of primers that bind J regions consists of 18, 19, or 20 to 60 linked nucleosides. In some embodiments, each of the plurality of primers that bind V regions consists of 18, 19, or 20 to 60 linked nucleosides.
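The length ranges above (e.g., 18 to 30, or 18 to 60 linked nucleosides) can be checked programmatically when curating primer sequences. This sketch assumes unmodified DNA primers written with A, C, G, and T only; modified nucleosides would need a different representation, and the function name and defaults are illustrative.

```python
VALID_BASES = frozenset("ACGT")


def primer_length_in_range(primer: str, min_len: int = 18, max_len: int = 60) -> bool:
    """Check that a primer has between `min_len` and `max_len` nucleotides.

    Assumes an unmodified DNA primer written with A, C, G, T only.
    """
    seq = primer.strip().upper()
    invalid = set(seq) - VALID_BASES
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    return min_len <= len(seq) <= max_len
```

Passing `max_len=30` applies the narrower 18-to-30 range recited in paragraph [0236].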
[0238] In some embodiments, the plurality of primers that bind J regions and / or the plurality of primers that bind V regions are resistant to 5’ exonucleolysis.
[0239] Some embodiments of the disclosed methods comprise (a) contacting the DNA with one or more blocking probes, thereby providing blocked DNA; (b) performing multiplex amplification of a plurality of target regions that may comprise a structural variation using a plurality of first primers and a plurality of second primers that anneal to the plurality of target regions, wherein the blocking probes inhibit amplification of wild-type DNA, thereby providing structural variation-enriched DNA; and (c) sequencing the structural variation-enriched DNA. In some embodiments, the multiplex amplification is performed with a non-strand-displacing polymerase. In some embodiments, the polymerase lacks 5’ to 3’ exonuclease activity.
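How a structural variation escapes blocking can be sketched at the sequence level. This is a toy model under stated assumptions: primer and probe are written as the same-strand text they would match in the template (ignoring complementarity), matches must be exact, and the sequences below are hypothetical, not taken from Tables 1-3.

```python
def extension_blocked(template: str, primer: str, probe: str) -> bool:
    """Return True if a blocking-probe site lies downstream of the primer site."""
    start = template.find(primer)
    if start == -1:
        raise ValueError("primer site not found in template")
    primer_end = start + len(primer)
    # Search only downstream of the primer's 3' end, the direction of extension.
    return template.find(probe, primer_end) != -1


# Hypothetical sequences for illustration only.
primer = "GATTACA"
probe = "CCGGAATT"
wild_type = "TT" + primer + "ACGTAC" + probe + "TGCATG"
deletion_mutant = "TT" + primer + "ACGTAC" + "TGCATG"     # probe site deleted
translocated = "TT" + probe + primer + "ACGTAC" + "TGCATG"  # probe site moved upstream
```

In the wild-type template the probe site lies downstream of the primer, so extension is blocked; deleting the probe site or moving it upstream of the primer leaves nothing to block extension, matching the deletion and translocation cases described above.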
[0240] In some such embodiments, the plurality of first primers comprises from 5 to 300, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 120 to 140, from 140 to 160, from 160 to 180, from 180 to 200, from 200 to 220, from 220 to 240, from 240 to 260, from 260 to 280, from 280 to 300, from 40 to 300, from 40 to 280, from 100 to 300, from 100 to 280, from 150 to 300, from 100 to 200, from 200 to 300, from 10 to 150, from 15 to 125, from 20 to 100, from 30 to 90, from 40 to 80, or from 50 to 70 primers. In some embodiments, the plurality of first primers comprises from 5 to 120, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 50 to 120, from 80 to 120, from 60 to 100, from 5 to 30, from 10 to 25, or from 10 to 20 primers.
[0241] In some such embodiments, the plurality of second primers comprises from 5 to 300, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 120 to 140, from 140 to 160, from 160 to 180, from 180 to 200, from 200 to 220, from 220 to 240, from 240 to 260, from 260 to 280, from 280 to 300, from 40 to 300, from 40 to 280, from 100 to 300, from 100 to 280, from 150 to 300, from 100 to 200, from 200 to 300, from 10 to 150, from 15 to 125, from 20 to 100, from 30 to 90, from 40 to 80, or from 50 to 70 primers. In some embodiments, the plurality of second primers comprises from 5 to 120, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 50 to 120, from 80 to 120, from 60 to 100, from 5 to 30, from 10 to 25, or from 10 to 20 primers.
[0242] In some embodiments, the adapter-blocking probes comprise from 5 to 300, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 120 to 140, from 140 to 160, from 160 to 180, from 180 to 200, from 200 to 220, from 220 to 240, from 240 to 260, from 260 to 280, from 280 to 300, from 40 to 300, from 40 to 280, from 100 to 300, from 100 to 280, from 150 to 300, from 100 to 200, from 200 to 300, from 10 to 150, from 15 to 125, from 20 to 100, from 30 to 90, from 40 to 80, or from 50 to 70 adapter-blocking probes.
[0243] In some embodiments, the blocking probes comprise from 5 to 300, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 120 to 140, from 140 to 160, from 160 to 180, from 180 to 200, from 200 to 220, from 220 to 240, from 240 to 260, from 260 to 280, from 280 to 300, from 40 to 300, from 40 to 280, from 100 to 300, from 100 to 280, from 150 to 300, from 100 to 200, from 200 to 300, from 10 to 150, from 15 to 125, from 20 to 100, from 30 to 90, from 40 to 80, or from 50 to 70 blocking probes. In some embodiments, the blocking probes comprise about 5, about 10, about 20, about 30, about 40, about 45, about 50, about 52, about 54, about 55, about 56, about 57, about 58, about 59, about 60, about 61, about 62, about 63, about 65, about 70, about 75, about 80, about 90, about 100, about 110, about 120, about 130, about 140, about 150, about 152, about 160, about 170, about 180, about 190, about 200, about 210, about 220, about 230, about 240, about 250, about 260, about 266, about 270, about 280, about 290, or about 300 blocking probes. In some embodiments, the blocking probes comprise from 5 to 120, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 50 to 120, from 80 to 120, from 60 to 100, from 5 to 30, from 10 to 25, or from 10 to 20 blocking probes. In some embodiments, the blocking probes comprise about 5, about 10, about 11, about 12, about 13, about 14, about 15, about 20, about 30, about 40, about 50, about 60, about 70, about 74, about 80, about 89, about 90, about 100, about 110, or about 120 blocking probes.
[0244] In some embodiments, the blocking probes comprise adaptor-blocking probes. In some embodiments, the blocking probes comprise intron-blocking probes. In some embodiments, the blocking probes comprise adaptor-blocking probes and intron-blocking probes.
[0245] In particular embodiments, a first subsample of the DNA is contacted with the one or more blocking probes. In some embodiments, a second subsample of the adapted library is retained as a backup. In some embodiments, a second plurality of target regions is captured from the DNA, thereby providing captured regions. In some embodiments, the second plurality of target regions comprise sequence-variable target regions and / or epigenetic target regions. In some embodiments, the second plurality of target regions comprise sequence-variable target regions. In some embodiments, the second plurality of target regions comprise epigenetic target regions. In some embodiments, the second plurality of target regions are captured from a first subsample or from a second subsample of the adapted library. In some embodiments, a third subsample of the adapted library is retained as a backup. In some embodiments, the second plurality of target regions are captured from a first subsample of the adapted library (such as from the same subsample from which structural variation-enriched DNA is separated). In some embodiments, the second plurality of target regions are captured from a second subsample of the adapted library. In some embodiments, the second plurality of target regions are captured from a third subsample of the adapted library.
[0246] In some embodiments, the captured regions are sequenced. In some embodiments, the captured sequence-variable target regions are sequenced. In some embodiments, the captured epigenetic target regions are sequenced. In some embodiments, the captured sequence-variable target regions and the captured epigenetic target regions are sequenced. In some embodiments, the captured regions are amplified prior to sequencing. In particular embodiments, the captured regions and the structural variation-enriched DNA are pooled and sequenced together. In other particular embodiments, the captured regions and the structural variation-enriched DNA are sequenced separately.
[0247] In some embodiments, the epigenetic target regions comprise hypermethylation variable target regions, hypomethylation variable target regions, methylation control target regions, or fragmentation variable target regions. In some embodiments, a somatic mutation load is quantified using a plurality of captured regions comprising the sequence-variable target regions.
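The paragraph above states that a somatic mutation load can be quantified from captured sequence-variable target regions but does not specify the calculation. One common convention, assumed here rather than taken from this disclosure, expresses the load as somatic variants per megabase of targeted territory. The sketch below assumes variant calls are already available as `(chromosome, position)` pairs and that the target regions are non-overlapping, 0-based half-open intervals; the function name is hypothetical.

```python
from typing import Iterable, List, Tuple

# (chromosome, start, end) using 0-based, half-open coordinates.
Interval = Tuple[str, int, int]


def mutation_load_per_mb(variants: Iterable[Tuple[str, int]],
                         regions: List[Interval]) -> float:
    """Hypothetical helper: somatic variants per megabase of target territory.

    Assumes `regions` do not overlap, so their lengths can simply be summed.
    """
    territory = sum(end - start for _, start, end in regions)
    if territory <= 0:
        raise ValueError("target regions must cover at least one base")
    # Count only variants that fall inside a captured target region.
    in_target = sum(
        1
        for chrom, pos in variants
        if any(c == chrom and s <= pos < e for c, s, e in regions)
    )
    return in_target / (territory / 1_000_000)


regions = [("chr1", 0, 500_000), ("chr2", 0, 500_000)]
calls = [("chr1", 10), ("chr2", 499_999), ("chr3", 5)]
print(mutation_load_per_mb(calls, regions))  # 2.0: two on-target calls in 1 Mb
```

Normalizing by territory rather than reporting a raw count makes loads comparable across panels of different sizes; overlapping regions would first need to be merged.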
[0248] In some embodiments, at least a portion of the plurality of first primers and / or at least a portion of the plurality of second primers do not exponentially amplify a target region that does not comprise a structural variation. In some embodiments, hybridization of a blocking probe to a region of the DNA at least partially blocks extension of at least a portion of the plurality of first primers and / or at least a portion of the plurality of second primers.
[0249] In some embodiments, the structural variation comprises a rearrangement, an insertion, or a deletion. In particular embodiments, the rearrangement comprises translocations, gene fusions, duplications, copy-number variants, or inversions. In some embodiments, the rearrangement is a CDR3 sequence.
[0250] In some embodiments, the structural variation-enriched DNA comprises extended first primers, and / or extended second primers, wherein extension was not blocked by a blocking probe.
[0251] In certain embodiments, the plurality of first primers and / or the plurality of second primers comprise a label. In some embodiments, the plurality of first primers and the plurality of second primers comprise the same label. In other embodiments, the plurality of first primers comprises a first label and the plurality of second primers comprises a second label. In some embodiments, such as discussed elsewhere herein, a label is incorporated into the structural variation-enriched DNA during the multiplex amplification of the plurality of target regions that may comprise a structural variation. In some embodiments, the label is biotin, avidin, streptavidin, neutravidin, an oligonucleotide, digoxigenin, a histidine tag, an affinity tag, an immunoglobulin constant domain, a nucleic acid comprising a particular nucleotide sequence, a hapten recognized by an antibody, or a magnetically attractable particle.
[0252] In some embodiments of the disclosed methods, each of the plurality of first primers and / or each of the plurality of second primers comprises at least 18, 19, or 20 linked nucleosides. In some embodiments, each of the plurality of first primers and each of the plurality of second primers comprises at least 18, 19, or 20 linked nucleosides. In some embodiments, each of the plurality of first primers comprises at least 18, 19, or 20 linked nucleosides. In some embodiments, each of the plurality of second primers comprises at least 18, 19, or 20 linked nucleosides.
[0253] In some embodiments of the disclosed methods, each of the plurality of first primers and / or each of the plurality of second primers comprises 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 linked nucleosides. In some embodiments, each of the plurality of first primers and each of the plurality of second primers that bind V regions comprises 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 linked nucleosides. In some embodiments, each of the plurality of first primers comprises 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 linked nucleosides. In some embodiments, each of the plurality of second primers comprises 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 linked nucleosides.
[0254] In some embodiments, each of the plurality of first primers and / or each of the plurality of second primers consists of 18, 19, or 20 to 60 linked nucleosides. In some embodiments, each of the plurality of first primers and each of the plurality of second primers consists of 18, 19, or 20 to 60 linked nucleosides. In some embodiments, each of the plurality of first primers consists of 18, 19, or 20 to 60 linked nucleosides. In some embodiments, each of the plurality of second primers consists of 18, 19, or 20 to 60 linked nucleosides.
[0255] Some embodiments of the disclosed methods comprise preparing the adapted library by ligating adapters to DNA, thereby producing adapted DNA. In some embodiments, the adapted DNA comprises molecular barcodes. In some embodiments, the adapted library is prepared from cfDNA, and the adapted library is prepared by ligating adaptors to cfDNA, thereby producing adapted cfDNA, and the adapted cfDNA can then optionally be amplified. In some embodiments, the adapted library is prepared from DNA from a subject having or suspected of having a cancer. Some embodiments of the disclosed methods comprise preparing the adapted library by ligating adaptors to cfDNA from a subject having or suspected of having a cancer, thereby producing adapted cfDNA. Thus, some embodiments further comprise determining a likelihood that the subject has a cancer.
[0256] In some embodiments, the primers used herein can be shorter than 100 nucleosides in length and at least 20 nucleotides in length. In some embodiments, the primers can be 20-60, 25-60, or 30-40 nucleotides in length.
[0257] In some embodiments, a majority (e.g., at least 60%, 70%, 80%, 90%, or all) of the primers used in primer extension anneal to their corresponding target regions in a parallel orientation. In such embodiments, the 3’ end of each primer points toward the 5’ end of another primer and / or the 5’ end of each primer points toward the 3’ end of another primer when they are annealed to the same DNA strand. Thus, in such embodiments, only one strand of DNA in the target region anneals to any of the plurality of primers. In some embodiments, a majority (e.g., at least 60%, 70%, 80%, 90%, or all) of the primers anneal to a site separated from the site to which another primer anneals by less than or equal to about 50 nucleotides (e.g., less than or equal to 40, 30, 20, 10, 5, 4, 3, 2, or 1 nucleotides).
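The spacing criterion above can be checked computationally. The sketch below takes the annealing footprints of primers on a single strand and reports the fraction of primers that lie within a given distance of a neighbouring primer; that fraction can then be compared against a "majority" threshold such as 60%. The coordinate convention and the function name are assumptions for illustration only, and the sketch does not verify primer orientation.

```python
from typing import List, Tuple


def fraction_within_gap(sites: List[Tuple[int, int]], max_gap: int = 50) -> float:
    """Fraction of primers annealing within `max_gap` nt of a neighbouring primer.

    `sites` holds each primer's (start, end) annealing footprint on one strand,
    0-based and half-open; a gap of 0 means the footprints abut.
    """
    if len(sites) < 2:
        return 0.0
    ordered = sorted(sites)
    near_neighbour = [False] * len(ordered)
    for i in range(len(ordered) - 1):
        gap = ordered[i + 1][0] - ordered[i][1]  # nt between consecutive footprints
        if gap <= max_gap:
            near_neighbour[i] = near_neighbour[i + 1] = True
    return sum(near_neighbour) / len(ordered)


sites = [(0, 20), (60, 80), (200, 220)]
frac = fraction_within_gap(sites)
print(frac >= 0.6)  # True: 2 of 3 primers lie within 50 nt of a neighbour
```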
[0258] In some embodiments, the primers used have approximately uniform melting temperatures with respect to annealing to their complementary sequences, e.g., so that substantially all of the primers are capable of annealing under the same condition. For example, the plurality of primers may each anneal to their respective complementary sequence with a melting temperature within a range of 10°C, 9°C, 8°C, 7°C, 6°C, 5°C, 4°C, 3°C, 2°C, or 1°C. Varying parameters such as primer length, GC content, and nucleotide modifications (e.g., methylation, base analogs, or LNA modifications) is a known approach for adjusting primer melting temperatures to desired values.
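A quick way to screen a primer set for approximately uniform melting temperatures is to estimate each Tm and compare the spread with the chosen window. The sketch below uses the basic GC-content approximation Tm = 64.9 + 41 × (nG + nC - 16.4) / N (in °C). That formula ignores salt, primer concentration, and nearest-neighbour effects and is assumed here only for illustration; a nearest-neighbour model would normally be preferred for real primer design. The function names are hypothetical.

```python
from typing import Iterable


def basic_tm(seq: str) -> float:
    """Rough Tm estimate (deg C) from GC content; not a nearest-neighbour model."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def tm_spread_ok(primers: Iterable[str], window_c: float = 5.0) -> bool:
    """True if all estimated Tms fall within `window_c` degrees C of each other."""
    tms = [basic_tm(p) for p in primers]
    return max(tms) - min(tms) <= window_c


pair = ["ACGTACGTACGTACGTACGT", "GCGCGCGCGCGCATATATAT"]  # 10 and 12 G/C of 20 nt
print([round(basic_tm(p), 2) for p in pair])  # [51.78, 55.88]
print(tm_spread_ok(pair, 5.0), tm_spread_ok(pair, 4.0))  # True False
```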
[0259] In some embodiments, the amplified products are enriched and / or captured using a solid support linked to a binding partner of the label, thereby also enriching and / or capturing the DNA molecules isolated from the sample that are hybridized to the amplified products. In some embodiments, the DNA molecules are denatured from their corresponding amplified products and sequenced to determine whether they comprise a rearrangement and, if so, the breakpoint(s) of the rearrangement, for further analysis as described herein. In some embodiments, the DNA molecules isolated from the sample that are hybridized to the amplified products comprise adapters and / or barcodes. In some embodiments, the adapters are used to amplify such molecules after denaturation. The barcodes can be used to identify sequence reads originating from the same molecule.
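The last sentence above notes that barcodes can identify reads originating from the same molecule. A minimal sketch of that grouping, assuming each read has already been parsed into a barcode, a mapping position, and a sequence, groups reads into families and builds a simple per-position majority consensus. Keying families on barcode plus mapped start, and the majority-vote rule, are assumptions made here; practical pipelines typically also account for barcode errors, strand, and base qualities.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# (barcode, chromosome, mapped start, read sequence)
Read = Tuple[str, str, int, str]
FamilyKey = Tuple[str, str, int]


def collapse_families(reads: List[Read]) -> Dict[FamilyKey, str]:
    """Group reads sharing barcode and start; return a majority-vote consensus per family."""
    families: Dict[FamilyKey, List[str]] = defaultdict(list)
    for barcode, chrom, start, seq in reads:
        families[(barcode, chrom, start)].append(seq)
    consensus: Dict[FamilyKey, str] = {}
    for key, seqs in families.items():
        length = min(len(s) for s in seqs)  # compare only positions every read covers
        consensus[key] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0] for i in range(length)
        )
    return consensus


reads = [
    ("AAT", "chr1", 100, "ACGT"),
    ("AAT", "chr1", 100, "ACGT"),
    ("AAT", "chr1", 100, "ACTT"),  # minority base at position 2 is outvoted
    ("GGC", "chr1", 100, "ACGA"),
]
print(collapse_families(reads))  # {('AAT', 'chr1', 100): 'ACGT', ('GGC', 'chr1', 100): 'ACGA'}
```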
[0260] In some embodiments, the first or second primers that anneal to the target region of the DNA do not comprise a tail that does not bind to a target region. In some embodiments, the first or second primers that anneal to the target region of the DNA do comprise a tail that does not bind to a target region but is short enough to allow the primers to anneal to the target region. In some such embodiments, the primer tail is at the 5’ end of the primer. In such embodiments, the primer tail binds to the 5’ end of the adapter ligated to the 3’ end of the DNA molecule. In some embodiments, the primer tail comprises a capture moiety and the deoxynucleoside triphosphates used in primer extension may not comprise a label. In other embodiments, the primer tail does not comprise a label and the deoxynucleoside triphosphates used in primer extension comprise a label. The primer-extended products are captured using a solid support linked to a binding partner of the capture moiety and amplified on the solid support using PCR primers that anneal to a sequence within the primer tail and to the adapter ligated to the 5’ end of the DNA. Alternatively, the primer tail comprises a modification at the 5’ end that protects the primer from exonuclease activity, such as phosphorothioate internucleoside linkages. In some such embodiments, primer-extended products are enriched by contacting the primer-extended sample with a 5’ to 3’ exonuclease to degrade wild-type sequences. The remaining sequences are amplified by PCR. In some embodiments, TdT ddATP tailing is performed prior to amplification in order to prevent truncated sequences from acting as primers in the PCR.
[0261] In some embodiments, the CDR3-enriched DNA or the structural variation-enriched DNA is separated from non-enriched DNA in the sample. In some embodiments, the separating uses the label to separate the structural variation-enriched DNA or the CDR3-enriched DNA from non-enriched DNA in the sample. In some embodiments, the separating comprises precipitating the structural variation-enriched DNA or the CDR3-enriched DNA.
[0262] In particular embodiments, the separating is performed at a temperature that facilitates (i) separation of extended primers that bind J regions from non-extended or partially-extended primers that bind J regions and / or separation of extended primers that bind V regions from non-extended or partially-extended primers that bind V regions; or (ii) separation of extended first primers and / or extended second primers from non-extended or partially-extended first and / or second primers. In some embodiments, the separating is performed at a temperature that facilitates separation of extended primers that bind J regions from non-extended or partially-extended primers that bind J regions and / or separation of extended primers that bind V regions from non-extended or partially-extended primers that bind V regions. In some embodiments, the separating is performed at a temperature that facilitates separation of extended primers that bind J regions from non-extended or partially-extended primers that bind J regions. In some embodiments, the separating is performed at a temperature that facilitates separation of extended primers that bind V regions from non-extended or partially-extended primers that bind V regions. In some embodiments, the separating is performed at a temperature that facilitates separation of extended first primers and / or extended second primers from non-extended or partially-extended first and / or second primers. In some embodiments, the separating is performed at a temperature that facilitates separation of extended first primers from non-extended or partially-extended first primers. In some embodiments, the separating is performed at a temperature that facilitates separation of extended second primers from non-extended or partially-extended second primers.
[0263] In some embodiments, the separating is performed at a temperature that is (i) higher than the melting temperature of the non-extended and / or the partially-extended primers that bind J regions, and / or (ii) higher than the melting temperature of the non-extended and / or the partially-extended primers that bind V regions. In some embodiments, the separating is performed at a temperature that is (i) higher than the melting temperature of the non-extended and / or the partially-extended primers that bind J regions, and (ii) higher than the melting temperature of the non-extended and / or the partially-extended primers that bind V regions. In some embodiments, the separating is performed at a temperature that is higher than the melting temperature of the non-extended and / or the partially-extended primers that bind J regions. In some embodiments, the separating is performed at a temperature that is higher than the melting temperature of the non-extended and / or the partially-extended primers that bind V regions.
[0264] In particular embodiments, the temperature is 1-15, 1-10, 1-5, or 5-15, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 degrees Fahrenheit above the melting temperature of (i) the non-extended and / or the partially-extended primers that bind J regions and / or (ii) the non-extended or partially-extended primers that bind V regions. In some embodiments, the temperature is 1-15, 1-10, 1-5, or 5-15, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 degrees Fahrenheit above the melting temperature of (i) the non-extended and / or the partially-extended primers that bind J regions and (ii) the non-extended or partially-extended primers that bind V regions. In some embodiments, the temperature is 1-15, 1-10, 1-5, or 5-15, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 degrees Fahrenheit above the melting temperature of the non-extended and / or the partially-extended primers that bind J regions. In some embodiments, the temperature is 1-15, 1-10, 1-5, or 5-15, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 degrees Fahrenheit above the melting temperature of the non-extended or partially-extended primers that bind V regions.
[0265] In some embodiments, the separating is performed at a temperature that is higher than the melting temperature of the non-extended and / or the partially-extended first and / or second primers. In some embodiments, the separating is performed at a temperature that is higher than the melting temperature of the non-extended and / or the partially-extended first and second primers. In some embodiments, the separating is performed at a temperature that is higher than the melting temperature of the non-extended and / or the partially-extended first primers. In some embodiments, the separating is performed at a temperature that is higher than the melting temperature of the non-extended and / or the partially-extended second primers.
[0266] In particular embodiments, the temperature is 1-15, 1-10, 1-5, or 5-15, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 degrees Fahrenheit above the melting temperature of the non-extended and / or the partially-extended first and / or second primers.
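Because the offsets above are stated in degrees Fahrenheit while melting temperatures are usually reported in degrees Celsius, note that a temperature difference converts by the factor 5/9 alone; the 32-degree offset applies only to absolute temperatures. The sketch below, with a hypothetical function name, places the separation temperature the stated offset above the highest estimated Tm among the non-extended or partially-extended primers, so that none of them is expected to remain annealed. The Tm values are assumed inputs.

```python
from typing import Iterable


def separation_temp_c(primer_tms_c: Iterable[float], offset_f: float) -> float:
    """Separation temperature (deg C) lying `offset_f` degrees F above the highest primer Tm."""
    if not 1 <= offset_f <= 15:
        raise ValueError("offset outside the 1-15 degree F range described")
    offset_c = offset_f * 5.0 / 9.0  # a difference of 1 deg F equals 5/9 deg C
    return max(primer_tms_c) + offset_c


print(separation_temp_c([52.0, 55.0], 9))  # 60.0 (a 9 F offset is 5 C)
```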
[0267] In some embodiments, the DNA is rendered single stranded prior to the separating. In some embodiments, the plurality of the first and second primers are resistant to 5’ exonucleolysis.
[0268] In some embodiments, the first subsample and the second subsample are differentially tagged. In some embodiments, the first subsample and the second subsample are differentially tagged and pooled. In some embodiments, the pool comprises less than or equal to about 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% of the DNA of the second subsample. In some embodiments, the pool comprises about 70-90%, about 75-85%, or about 80% of the DNA of the second subsample. In particular embodiments, the pool comprises substantially all of the DNA of the first subsample.
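To make the pooling percentages concrete, the sketch below computes how much DNA each differentially tagged subsample contributes when substantially all of the first subsample is pooled with a chosen fraction of the second. The input masses and the function name are hypothetical.

```python
from typing import Dict


def pool_composition(first_ng: float, second_ng: float,
                     second_fraction: float) -> Dict[str, float]:
    """Mass (ng) and share of each subsample in a pool containing all of the
    first subsample plus `second_fraction` of the second subsample."""
    if not 0.0 <= second_fraction <= 1.0:
        raise ValueError("second_fraction must be between 0 and 1")
    second_in_pool = second_ng * second_fraction
    total = first_ng + second_in_pool
    if total <= 0:
        raise ValueError("pool would contain no DNA")
    return {
        "total_ng": total,
        "first_share": first_ng / total,
        "second_share": second_in_pool / total,
    }


# e.g., all of a 20 ng first subsample plus 80% of a 30 ng second subsample
print(pool_composition(first_ng=20.0, second_ng=30.0, second_fraction=0.8))
```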
[0269] Some embodiments of the disclosed methods comprise detecting a presence or absence of a DNA molecule that comprises a CDR3 sequence of interest. A CDR3 sequence of interest may be a cancer-associated CDR3 sequence. In some embodiments, a CDR3 sequence of interest is part of a cancer-associated TCR sequence / motif. Some embodiments of the disclosed methods comprise detecting a presence or absence of a DNA molecule that comprises a structural variation. In some embodiments, the detecting comprises generating a plurality of sequence reads, and the method further comprises mapping the plurality of sequence reads to one or more reference sequences to generate mapped sequence reads and processing the mapped sequence reads to determine the likelihood that the subject has cancer. Some embodiments further comprise detecting a presence or absence of DNA originating or derived from a tumor cell using the mapped sequence reads.
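As a simplified illustration of detecting the presence or absence of a CDR3 sequence of interest, the sketch below counts reads that contain that sequence on either strand and calls it present when a minimum number of supporting reads is reached. Exact substring matching, the read-count threshold, and the example CDR3 nucleotide sequence are all assumptions for illustration. The paragraph above describes mapping reads to reference sequences, which this sketch does not attempt.

```python
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def supporting_reads(reads: Iterable[str], cdr3: str) -> int:
    """Number of reads containing the CDR3 sequence on either strand (exact match)."""
    forward = cdr3.upper()
    reverse = reverse_complement(forward)
    return sum(1 for r in reads if forward in r.upper() or reverse in r.upper())


def cdr3_present(reads: Iterable[str], cdr3: str, min_reads: int = 2) -> bool:
    """Hypothetical presence call: at least `min_reads` supporting reads."""
    return supporting_reads(reads, cdr3) >= min_reads


cdr3_of_interest = "TGTGCCAGCAGT"  # hypothetical example sequence
reads = ["AAATGTGCCAGCAGTAAA", "GGACTGCTGGCACAGG", "CCCCCCCC"]
print(supporting_reads(reads, cdr3_of_interest))  # 2
print(cdr3_present(reads, cdr3_of_interest))      # True
```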
[0270] The disclosed methods can be combined with analysis of one or more additional biomarkers. In some embodiments, the disclosed methods are combined with one or more methods, such as but not limited to, methods for assessing DNA methylation patterns, DNA mutations (such as somatic mutations), nucleic acid fragmentation patterns, non-coding RNA (such as microRNAs (miRNAs), ribosomal RNAs, transfer RNAs, small nucleolar RNAs (snoRNAs), and / or small nuclear RNAs (snRNAs)) levels, and / or cell type proportions / levels, cellular locations, and / or structural modifications of one or more proteins (such as in a sample from a subject), and / or levels or abundance or proportions of one or more metabolites or metabolite signatures, and / or levels or abundance or proportions of one or more lipid molecules or lipid signatures, and / or the levels or proportion or abundance of one or more proteins and / or nucleic acids associated with extracellular vesicles (e.g., exosomes). In some embodiments, the disclosed methods are combined with one or more analyses of genetic variations including mutations, rare mutations, indels, rearrangements, copy number variations, transversions, translocations, recombinations, inversions, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, and / or abnormal changes in nucleic acid 5-methylcytosine.
B. Adapter ligation or addition; tagging
[0271] In some embodiments, the methods comprise ligating adapters to DNA. In some embodiments, ligating adapters to DNA produces adapter-ligated DNA. In some embodiments, DNA molecules can be subjected to blunt-end ligation with blunt-ended adapters. In some embodiments, DNA molecules can be subjected to sticky-end ligation with sticky-ended adapters. DNA molecules can be ligated to adapters at either one end or both ends. DNA molecules can be ligated to an at least partially double-stranded adapter (e.g., a Y-shaped or bell-shaped adapter).
[0272] In some embodiments, the ligation step can take place prior to sequencing the DNA. In some embodiments, the ligation step can take place prior to or after capturing the DNA. In some embodiments, the ligation step can take place prior to capturing the DNA. In some embodiments, the ligation step can take place prior to or after capturing the DNA and prior to or after sequencing the DNA. In some embodiments, the ligation step can take place after capturing the DNA and prior to sequencing the DNA. In some embodiments, the ligation step can take place prior to or after partitioning the DNA into a plurality of subsamples. In some embodiments, the ligation step can take place prior to partitioning the DNA into a plurality of subsamples. In some embodiments, the ligation step can take place prior to or after partitioning the DNA into a plurality of subsamples and prior to or after sequencing the DNA. In some embodiments, the ligation step can take place after partitioning the DNA into a plurality of subsamples and prior to sequencing the DNA. In some embodiments, the ligation step can take place prior to or after subjecting the sample or one or more subsamples to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase. In some embodiments, the ligation step can take place prior to subjecting the sample or one or more subsamples to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase. In some embodiments, the ligation step can take place prior to or after subjecting the sample or one or more subsamples to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase and prior to or after sequencing the DNA. In some embodiments, the ligation step can take place after subjecting the sample or one or more subsamples to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase and prior to sequencing the DNA.
[0273] In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase is a conversion step. In some embodiments, the ligation step can take place before or after the conversion step. In general, “conversion step” or “conversion procedure” refers to any step or procedure that changes the base pairing specificity of one or more nucleotides. In some embodiments, the conversion step comprises contacting DNA with a deaminase.
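The effect of a conversion step on sequencing reads can be modeled in silico. The sketch below assumes a conversion in which unprotected cytosines are deaminated to uracil and read as thymine after amplification, while cytosines at protected positions (for example, modified cytosines that resist the conversion) are still read as cytosine. Which cytosines are protected depends on the chemistry used, so the `protected` set is an input assumption. Only the top strand is modeled, and the function name is hypothetical.

```python
from typing import Set


def simulate_conversion(seq: str, protected: Set[int]) -> str:
    """Read-level view of a C-to-U conversion: unprotected C reads as T.

    `protected` holds 0-based positions of cytosines that resist conversion.
    """
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq.upper())
    )


print(simulate_conversion("ACGCTC", protected={1}))  # ACGTTT
```

Comparing converted reads with the unconverted reference then reveals which cytosines were protected, which is the basis for reading out modification status after such a step.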
[0274] DNA ligase and adapters are added to ligate an adapter to one or both ends of DNA molecules in the sample, i.e., to form adapted DNA. As used herein, “adapter” refers to a short nucleic acid (e.g., less than about 500, less than about 100, or less than about 50 nucleotides in length, or 20-30, 20-40, 30-50, 30-60, 40-60, 40-70, 50-60, 50-70, 20-500, or 30-100 bases from end to end) that is typically at least partially double-stranded and can be ligated to the end of a given sample DNA molecule. In some instances, two adapters can be ligated to a single sample DNA molecule, with one adapter ligated to each end of the sample nucleic acid molecule.
[0275] In some embodiments, the ligase used in ligation reactions can act on both single strand DNA nicks and double stranded DNA ends. In some cases, the ligase is T4 DNA ligase or T3 DNA ligase. Adapters can include nucleic acid primer binding sites to permit amplification of a sample DNA molecule flanked by adapters at both ends, and / or a sequencing primer binding site, including primer binding sites for sequencing applications, such as various next generation sequencing (NGS) applications. Adapters can include a sequence for hybridizing to a solid support, e.g., a flow cell sequence. Adapters can also include binding sites for capture probes, such as an oligonucleotide attached to a flow cell support or the like. Adapters can also include sample indexes and / or molecular barcodes. These are typically positioned relative to amplification primer and sequencing primer binding sites, such that the sample index and / or molecular barcode is included in amplicons and sequencing reads of a given DNA molecule. Adapters of the same or different sequence can be linked to the respective ends of a sample DNA molecule. In some cases, adapters of the same or different sequence are linked to the respective ends of the DNA molecule except that the sample index and / or molecular barcode differs in its sequence. In some embodiments, the adapter is an asymmetric adapter, such as a Y-shaped adapter in which one end is blunt ended or tailed as described herein, for joining to a nucleic acid molecule, which is also blunt ended or tailed with one or more complementary nucleotides to those in the tail of the adapter. In another exemplary embodiment, an adapter is a bell-shaped adapter that includes a blunt or tailed end for joining to a DNA molecule to be analyzed. Other exemplary adapters include T-tailed, C-tailed or hairpin shaped adapters and bubble adapters. 
For example, a hairpin-shaped adapter can comprise a complementary double-stranded portion and a loop portion, where the double-stranded portion can be attached (e.g., ligated) to a double-stranded polynucleotide. Hairpin-shaped sequencing adapters can be attached to both ends of a polynucleotide fragment to generate a circular molecule, which can be sequenced multiple times. The adapters used in the methods of the present disclosure comprise one or more known modified nucleosides, such as methylated nucleosides. In some embodiments, the modified nucleosides comprise modification-resistant cytosines. In some embodiments, each cytosine in each adapter is a modification-resistant cytosine. In some embodiments, the modification-resistant cytosine is a deamination-resistant cytosine. In some embodiments, the deamination-resistant cytosine comprises 5-propynylC (5pyC), 5-pyrrolo-dC (5pyrC), 5-hydroxymethylcytosine (5hmC), glucosylated 5-hydroxymethylcytosine (5ghmC), cytosine 5-methylenesulfonate (CMS), or N4-modified cytosine. In some embodiments, the adapters are resistant to digestion by a methylation-sensitive restriction enzyme (MSRE). In some embodiments, the MSRE digestion-resistant adapters comprise one or more methylated nucleotides (e.g., 5-methylcytosine, 5-hydroxymethylcytosine, or a combination thereof), comprise one or more nucleotide analogs resistant to methylation-sensitive restriction enzymes, or do not comprise a nucleotide sequence recognized by the MSRE. In some embodiments, the one or more methylated nucleotides in the MSRE digestion-resistant adapters comprise 5-methylcytosine and / or 5-hydroxymethylcytosine. In some embodiments, the adapters are resistant to digestion by a methylation-dependent restriction enzyme (MDRE).
In some embodiments, the MDRE digestion-resistant adapters comprise one or more unmethylated nucleotides, comprise one or more nucleotide analogs resistant to methylation dependent restriction enzymes, or do not comprise a nucleotide sequence recognized by the MDRE.
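Because sample indexes are carried into the sequencing reads, reads from pooled libraries can be assigned back to their samples. The sketch below matches an observed index against the expected indexes allowing at most one mismatch, and leaves ambiguous or unmatched reads unassigned. The index sequences, sample names, and the one-mismatch tolerance are illustrative assumptions, not values from this disclosure.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Mismatch count; any length difference counts as additional mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def assign_sample(observed: str, sample_indexes: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the single sample whose index is within `max_mismatches`, else None."""
    hits = [name for name, index in sample_indexes.items()
            if hamming(observed.upper(), index.upper()) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


indexes = {"sample_1": "ACGTAC", "sample_2": "TTGCAA"}
print(assign_sample("ACGTAA", indexes))  # sample_1 (one mismatch)
print(assign_sample("GGGGGG", indexes))  # None (no index within one mismatch)
```

Returning `None` when more than one index matches avoids silently misassigning reads when indexes are too similar for the chosen tolerance.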
[0276] In instances where two adapters are ligated to a sample nucleic acid (one at each end), either or both of the adapters may comprise one or more known modified nucleosides. Typically, the primer binding site(s), sequencing primer binding site(s), sample index(es) and / or molecular barcode(s), if present, do not comprise the known modified nucleosides that change base pairing specificity as a result of the conversion procedure.
[0277] In some embodiments, adapters may be added to the DNA or a subsample thereof. Adapters can be ligated to DNA at any point in the methods herein. In some embodiments, adapters are ligated to the DNA in a sample. In some embodiments, adapters are ligated to the DNA of a sample or subsample thereof prior to annealing primers to the DNA for capture probe generation. In some such embodiments, the adapter-ligated DNA is amplified prior to annealing primers to the DNA for capture probe generation. In some embodiments, adapters are ligated to the DNA of a sample or subsample thereof before the DNA is contacted with the capture probes. In some embodiments, the DNA to which the adapters are ligated is in the same sample or subsample as the DNA used as a template to generate capture probes. In some embodiments, the DNA to which the adapters are ligated is in a different sample or subsample, e.g., a second sample or a second subsample of a first sample, than the DNA used as a template to generate capture probes. In some embodiments, adapters are ligated to DNA captured by the capture probes.
[0278] In some embodiments, the primers used to generate capture probes are not complementary to adapters, and the resulting capture probes therefore do not comprise adapters. Adapter-ligated DNA can therefore be selectively amplified in the presence of capture probes that do not comprise adapters. Similarly, adapter-ligated DNA can be separated from DNA that does not comprise adapters.
[0279] In some embodiments, the disclosed methods comprise analyzing DNA in a sample. In such methods, adapters may be added to the DNA. This may be done concurrently with an amplification procedure, e.g., by providing the adapters in a 5’ portion of a primer (where PCR is used, this can be referred to as library prep-PCR or LP-PCR), before, or after an amplification step. In some embodiments, adapters are added by other approaches, such as ligation. In some such methods, first adapters are added to the 3’ ends of the nucleic acids by ligation, which may include ligation to single-stranded DNA. In some embodiments, prior to any partitioning or capturing steps, first adapters are added to the nucleic acids by ligation, which may include ligation to single-stranded DNA (e.g., to the 3’ ends thereof). In some embodiments, the capture probes can be isolated after partitioning and ligation. For example, the hypomethylated partition can be ligated with adapters and a portion of the ligated hypomethylated partition can then be used to generate the capture probes for rearrangements. The adapter can be used as a priming site for second-strand synthesis, e.g., using a universal primer and a DNA polymerase. A second adapter can then be ligated to at least the 3’ end of the second strand of the now double-stranded molecule. In some embodiments, the first adapter comprises an affinity tag, such as biotin, and nucleic acid ligated to the first adapter is bound to a solid support (e.g., bead), which may comprise a binding partner for the affinity tag, such as streptavidin. For further discussion of a related procedure, see Gansauge et al., Nature Protocols 8:737-748 (2013). Commercial kits for sequencing library preparation compatible with single-stranded nucleic acids are available, e.g., the Accel-NGS® Methyl-Seq DNA Library Kit from Swift Biosciences. In some embodiments, after adapter ligation, nucleic acids are amplified.
[0280] In some embodiments, the single-stranded DNA library preparation is performed in a one-step combined phosphorylation / ligation reaction, e.g., as described in Troll et al., BMC Genomics, 20: 1023 (2019), available at https://doi.org/10.1186/s12864-019-6355-0. This method, called Single Reaction Single-stranded LibrarY (“SRSLY”), can be performed without end-polishing. SRSLY may be useful for converting short and fragmented DNA molecules, e.g., cfDNA fragments, into sequencing libraries while retaining their native lengths and ends. The SRSLY method can create sequencing libraries (e.g., Illumina sequencing libraries) from fragmented or degraded template (input) DNA. In particular embodiments, the template DNA is first heat-denatured and then immediately cold-shocked to render the template DNA molecules single-stranded. The DNA can be kept single-stranded throughout the ligation reaction by the inclusion of a thermostable single-stranded binding protein (SSB). Next, the template DNA, which at this point can be single-stranded and coated with SSB, is placed in a dual phosphorylation / ligation reaction with directional dsDNA NGS adapters that contain single-stranded overhangs. The forward and reverse sequencing adapters can share similar structures but differ in which terminus is unblocked, in order to facilitate proper ligation. Both sequencing adapters can comprise a dsDNA portion and a single-stranded splint overhang of random nucleotides, located at the 3-prime terminus of the bottom strand of the forward adapter and at the 5-prime terminus of the bottom strand of the reverse adapter. In this way, the forward adapter (e.g., the Illumina P5 adapter) can be delivered to the 5-prime end of template molecules and the reverse adapter (e.g., the Illumina P7 adapter) to the 3-prime end of template molecules. Thus, the native polarity of the input DNA molecules can be retained.
[0281] During the dual phosphorylation / ligation reaction, T4 Polynucleotide Kinase (PNK) can be used to prepare template DNA termini for ligation by phosphorylating 5-prime termini and dephosphorylating 3-prime termini. T4 PNK acts on both ssDNA and dsDNA molecules and has no activity on the phosphorylation state of proteins. Simultaneously, the random nucleotides of the splint adapter can anneal to the single-stranded template molecule. This creates a short, localized dsDNA region, enabling ligation of template to adapter with a ligase such as T4 DNA ligase, which has high ligation efficiency on dsDNA templates but low efficiency on ssDNA. After the single phosphorylation / ligation reaction is complete, the library DNA can be, e.g., purified and placed directly into standard NGS indexing PCR, compatible with either traditional single-index or dual-index primers.
[0282] In some embodiments, the adapters include different tags in sufficient numbers that the number of tag combinations results in a low probability (e.g., 5, 1, or 0.1%) of two nucleic acids with the same start and stop points receiving the same combination of tags, i.e., a 95, 99, or 99.9% probability that they receive different combinations. Adapters, whether bearing the same or different tags, can include the same or different primer binding sites, but preferably adapters include the same primer binding site.
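As a rough illustration, assume that each end of a molecule independently receives one of n equally likely tags. Two molecules with the same start and stop points then receive the same ordered pair of tags with probability 1/n². The following Python sketch (function names are hypothetical) computes that probability and the smallest n that meets a target collision probability:

```python
def pair_collision_probability(tags_per_end: int) -> float:
    """Probability that two molecules receive the same ordered pair of end tags,
    assuming each end independently draws one of `tags_per_end` equally likely tags."""
    return 1.0 / (tags_per_end ** 2)


def min_tags_per_end(max_collision: float) -> int:
    """Smallest number of tags per end whose pairwise collision probability
    does not exceed `max_collision` (e.g., 0.05, 0.01, 0.001)."""
    if not 0 < max_collision <= 1:
        raise ValueError("max_collision must be in (0, 1]")
    n = 1
    while pair_collision_probability(n) > max_collision:
        n += 1
    return n


for target in (0.05, 0.01, 0.001):
    n = min_tags_per_end(target)
    print(f"collision <= {target}: {n} tags per end ({n * n} combinations)")
```

Under this uniform-ligation assumption, 5, 10, and 32 tags per end bring the pairwise collision probability to at most 5%, 1%, and 0.1%, respectively. Real ligation is not perfectly uniform, so these figures are likely optimistic.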
[0283] In some embodiments, following attachment of adapters, the nucleic acids are subject to amplification. The amplification can use, e.g., universal primers that recognize primer binding sites in the adapters.
[0284] In some embodiments, following attachment of adapters, the DNA or a subsample or portion of the DNA is partitioned, which comprises contacting the DNA with an agent that preferentially binds to nucleic acids bearing a sequence-variable target region or an epigenetic modification. The nucleic acids are partitioned into at least two partitioned subsamples that differ in the extent to which the nucleic acids bear the modification, as reflected in their binding to the agent. For example, if the agent has affinity for nucleic acids bearing the modification, nucleic acids overrepresented for the modification (compared with the median representation in the population) preferentially bind to the agent, whereas nucleic acids underrepresented for the modification do not bind or are more easily eluted from the agent. The nucleic acids can then be amplified using primers that bind to the primer binding sites within the adapters. Alternatively, partitioning may be performed before adapter attachment, in which case the adapters may comprise differential tags that include a component identifying the partition in which a molecule occurred.
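A highly idealized sketch of this partitioning logic follows. It assumes that binding depends only on a molecule's count of modified nucleobases, and that molecules above the population median bind while the rest do not. Real affinity partitioning is graded rather than all-or-none; the function name and the hard median cutoff are hypothetical.

```python
from statistics import median


def partition_by_modification(molecules: dict[str, int]) -> tuple[set[str], set[str]]:
    """Split molecules into (bound, unbound) partitions.

    `molecules` maps a molecule ID to its count of modified nucleobases
    (e.g., methylated cytosines). As an idealization of affinity binding,
    molecules whose count exceeds the population median are treated as bound;
    the rest are treated as not bound or easily eluted.
    """
    cutoff = median(molecules.values())
    bound = {mol for mol, count in molecules.items() if count > cutoff}
    unbound = set(molecules) - bound
    return bound, unbound


# Hypothetical molecules and modification counts.
sample = {"mol1": 0, "mol2": 1, "mol3": 6, "mol4": 9, "mol5": 2}
bound, unbound = partition_by_modification(sample)
print(sorted(bound), sorted(unbound))
```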
[0285] In some embodiments, the nucleic acids are linked at both ends to Y-shaped adapters including primer binding sites and tags, and the molecules are amplified.
C. Molecular Tagging
[0286] In some embodiments, the DNA molecules of the adapted library may be tagged with sample indexes and / or molecular barcodes (referred to generally as “tags”). In some embodiments, the DNA molecules of the sample comprise barcodes. Tags can be molecules, such as nucleic acids, containing information that indicates a feature of the molecule with which the tag is associated. For example, DNA molecules can bear a sample tag or sample index (which distinguishes molecules in one sample from those in a different sample), a partition tag (which distinguishes molecules in one partition from those in a different partition), and / or a molecular tag / molecular barcode (which distinguishes different molecules from one another, in both unique and non-unique tagging scenarios).
[0287] Tagging strategies can be divided into unique tagging and non-unique tagging strategies. In unique tagging, all or substantially all of the molecules in a sample bear a different tag, so that reads can be assigned to original molecules based on tag information alone. Tags used in such methods are sometimes referred to as “unique tags”. In non-unique tagging, different molecules in the same sample can bear the same tag, so that other information in addition to tag information is used to assign a sequence read to an original molecule. Such information may include start and stop coordinates, the coordinate to which the molecule maps, the start or stop coordinate alone, etc. Tags used in such methods are sometimes referred to as “non-unique tags”. Accordingly, it is not necessary to uniquely tag every molecule in a sample. It suffices to uniquely tag molecules falling within an identifiable class within a sample. Thus, molecules in different identifiable families can bear the same tag without loss of information about the identity of the tagged molecule.
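The difference between unique and non-unique tagging can be illustrated with a small Python sketch; the `Read` record and `group_into_families` function are hypothetical. With a non-unique tag, grouping by tag alone merges distinct molecules. Grouping by the tag together with the mapped start and stop coordinates separates those molecules while still collapsing PCR duplicates of the same molecule:

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    tag: str    # molecular barcode (possibly non-unique)
    start: int  # mapped start coordinate
    stop: int   # mapped stop coordinate
    seq: str    # read sequence


def group_into_families(reads, use_coordinates=True):
    """Group reads by tag alone, or by (tag, start, stop) for non-unique tagging."""
    families = defaultdict(list)
    for read in reads:
        key = (read.tag, read.start, read.stop) if use_coordinates else read.tag
        families[key].append(read)
    return dict(families)


reads = [
    Read("AC", 100, 260, "ACGTA"),
    Read("AC", 100, 260, "ACGTA"),  # PCR duplicate of the first molecule
    Read("AC", 512, 680, "TTGCA"),  # different molecule that shares the tag "AC"
    Read("GT", 100, 260, "ACGTA"),  # different molecule with the same coordinates
]
by_tag = group_into_families(reads, use_coordinates=False)
by_tag_and_coords = group_into_families(reads)
print(len(by_tag), len(by_tag_and_coords))
```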
[0288] In certain embodiments, a tag can comprise one or a combination of barcodes. As used herein, the term “barcode” refers to a nucleic acid molecule having a particular nucleotide sequence, or to the nucleotide sequence itself, depending on context. A barcode can have, for example, between 10 and 100 nucleotides. A collection of barcodes can have degenerate sequences or can have sequences separated by a certain Hamming distance, as desired for the specific purpose. For example, a molecular barcode can comprise one barcode or a combination of two barcodes, each attached to a different end of a molecule. Additionally or alternatively, different sets of molecular barcodes, molecular tags, or molecular indexes can be used for different partitions and / or samples, such that the barcodes serve as molecular tags through their individual sequences and also identify the partition and / or sample to which they correspond based on the set of which they are a member.
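As an illustration of barcodes designed with a minimum Hamming distance, the sketch below computes the minimum pairwise distance of a hypothetical barcode set. It then corrects an observed barcode to the single whitelist entry within one mismatch. A set whose minimum pairwise distance is d can correct up to floor((d - 1) / 2) substitutions. The whitelist and the one-mismatch tolerance are assumptions for illustration.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: list[str]) -> int:
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def correct_barcode(observed: str, barcodes: list[str], max_mismatches: int = 1):
    """Return the unique whitelist barcode within `max_mismatches` of `observed`,
    or None if there is no match or the match is ambiguous."""
    hits = [b for b in barcodes if hamming(observed, b) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


whitelist = ["AACCGG", "CCGGTT", "GGTTAA", "TTAACC"]  # hypothetical barcode set
print(min_pairwise_distance(whitelist))
print(correct_barcode("AACCGT", whitelist))  # one substitution from AACCGG
```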
[0289] For example, barcodes can be used to allow the origin of the DNA (e.g., the subject, biological sample (e.g., samples collected at various time points), enriched DNA sample (e.g., enriched DNA comprising an epigenetic target region set or enriched DNA comprising a sequence-variable target region set), partition, or similar) to be identified, e.g., following pooling of a plurality of samples for parallel sequencing. Tags comprising barcodes can be incorporated into or otherwise joined to adapters. Tags can be incorporated by ligation or overlap extension PCR, among other methods. Tags can be used to label the individual polynucleotide population partitions so as to correlate the tag (or tags) with a specific partition. Alternatively, tags can be used in embodiments of the disclosure that do not employ a partitioning step. In some embodiments, a single tag can be used to label a specific partition. In some embodiments, multiple different tags can be used to label a specific partition. In embodiments employing multiple different tags to label a specific partition, the set of tags used to label one partition can be readily differentiated from the set of tags used to label other partitions. In some embodiments, the tags may have additional functions; for example, the tags can be used to index sample sources or used as unique molecular identifiers (which can be used to improve the quality of sequencing data by differentiating sequencing errors from mutations, for example as in Kinde et al., Proc Nat’l Acad Sci USA 108: 9530-9535 (2011), Kou et al., PLoS ONE, 11: e0146638 (2016)) or used as non-unique molecule identifiers, for example as described in US Pat. No. 9,598,731.
Similarly, in some embodiments, the tags may have additional functions, for example the tags can be used to index sample sources or used as non-unique molecular identifiers (which can be used to improve the quality of sequencing data by differentiating sequencing errors from mutations).
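To illustrate how molecular identifiers help distinguish sequencing errors from mutations, the following sketch takes reads already grouped into one family (i.e., copies of the same original molecule) and builds a per-position majority-vote consensus. An error present in a minority of reads is outvoted, whereas a variant present in the original molecule appears in all copies. The 0.7 agreement threshold, the use of "N" for ambiguous positions, and the requirement for equal-length, pre-aligned reads are simplifying assumptions, not the specific procedures of the cited references.

```python
from collections import Counter


def consensus(family_seqs: list[str], min_fraction: float = 0.7) -> str:
    """Per-position majority vote over the reads of one family.

    Positions where the most common base is supported by fewer than
    `min_fraction` of reads are reported as 'N'. Assumes all reads are
    aligned to one another and have equal length.
    """
    if len({len(s) for s in family_seqs}) != 1:
        raise ValueError("reads in a family must be non-empty and of equal length")
    out = []
    for column in zip(*family_seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_fraction else "N")
    return "".join(out)


# Hypothetical family: the third read carries a sequencing error at position 4.
family = ["ACGTTA", "ACGTTA", "ACGATA", "ACGTTA"]
print(consensus(family))
```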
[0290] Tags may be incorporated into or otherwise joined to adapters by chemical synthesis, ligation (e.g., blunt-end or sticky-end ligation, as described above), or overlap extension polymerase chain reaction (PCR), among other methods. Such adapters are ultimately joined to the sample DNA molecule. In other embodiments, one or more rounds of amplification cycles (e.g., PCR amplification) may be applied to introduce sample indexes to a nucleic acid molecule using conventional nucleic acid amplification methods. The amplifications may be conducted in one or more reaction mixtures (e.g., a plurality of microwells in an array). Molecular barcodes and / or sample indexes may be introduced simultaneously or in any sequential order. In some embodiments, molecular barcodes and / or sample indexes are introduced before and / or after any conversion procedure. Where molecular barcodes and / or sample indexes are introduced through amplification, the conversion step is performed before the molecular barcodes and / or sample indexes are introduced. In some embodiments, molecular barcodes and / or sample indexes are introduced before and / or after any sequence capture steps are performed. In some embodiments, only the molecular barcodes are introduced before probe capture, and the sample indexes are introduced after the sequence capture steps are performed. In some embodiments, both the molecular barcodes and the sample indexes are introduced before any probe-based capture steps are performed. In some embodiments, the sample indexes are introduced after any sequence capture steps are performed. In some embodiments, sample indexes are incorporated through overlap extension polymerase chain reaction (PCR).
[0291] In some embodiments, the tags may be located at one end or at both ends of the sample DNA molecule. In some embodiments, tags are predetermined or random or semi-random sequence oligonucleotides. In some embodiments, the tag(s) may together be less than about 500, 200, 100, 50, 20, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 nucleotides in length. Typically, tags are about 5 to 20 or 6 to 15 nucleotides in length. The tags may be linked to sample DNA molecules randomly or non-randomly.
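Each position of a random or semi-random tag can take one of four bases, so a tag of length L can take at most 4^L distinct sequences. The sketch below uses hypothetical helper names to find the shortest fully random tag whose sequence space covers a required number of distinct tags. Designed tag sets that enforce a minimum Hamming distance use only a subset of this space, so they would need longer tags:

```python
def sequence_space(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct fully random sequences of the given length."""
    return alphabet_size ** length


def min_tag_length(required_distinct: int, alphabet_size: int = 4) -> int:
    """Shortest tag length whose sequence space is at least `required_distinct`."""
    length = 1
    while sequence_space(length, alphabet_size) < required_distinct:
        length += 1
    return length


for length in (5, 6, 15, 20):
    print(length, sequence_space(length))
print(min_tag_length(1225), min_tag_length(1_000_000))
```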
[0292] In some embodiments, each sample or partition (discussed below) is uniquely tagged with a sample index or a combination of sample indexes. In some embodiments, each nucleic acid molecule of a sample or sub-sample is uniquely tagged with a molecular barcode or a combination of molecular barcodes. In other embodiments, a plurality of molecular barcodes may be used such that molecular barcodes are not necessarily unique to one another in the plurality (e.g., non-unique molecular barcodes). In these embodiments, molecular barcodes are generally attached (e.g., by ligation as part of an adapter) to individual molecules such that the combination of the molecular barcode and the sequence to which it is attached creates a unique sequence that may be individually tracked. Detection of non-unique molecular barcodes in combination with endogenous sequence information (e.g., the beginning (start) and / or end (stop) genomic location / position corresponding to the sequence of the original DNA molecule in the sample, start and stop genomic positions corresponding to the sequence of the original DNA molecule in the sample, the beginning (start) and / or end (stop) genomic location / position of the sequence read that is mapped to the reference sequence, start and stop genomic positions of the sequence read that is mapped to the reference sequence, sub-sequences of sequence reads at one or both ends, length of sequence reads, and / or length of the original DNA molecule in the sample) typically allows for the assignment of a unique identity to a particular molecule. In some embodiments, the beginning region comprises the first 1, 2, 5, 10, 15, 20, 25, or 30, or at least the first 30, base positions at the 5' end of the sequencing read that align to the reference sequence.
In some embodiments, the end region comprises the last 1, 2, 5, 10, 15, 20, 25, or 30, or at least the last 30, base positions at the 3' end of the sequencing read that align to the reference sequence. The length, or number of base pairs, of an individual sequence read is also optionally used to assign a unique identity to a given molecule. As described herein, once fragments from a single strand of nucleic acid have been assigned a unique identity, this may permit subsequent identification of fragments from the parent strand and / or a complementary strand.
[0293] In certain embodiments of non-unique tagging, the number of different tags used can be sufficient that there is a very high likelihood (e.g., at least 99%, at least 99.9%, at least 99.99%, or at least 99.999%) that all DNA molecules of a particular group bear a different tag. This number, in turn, is a function of the number of molecules falling into the class. For example, the class may be all molecules mapping to the same start-stop position on a reference genome. The class may be all molecules mapping across a particular genetic locus, e.g., a particular base or a particular region (e.g., up to 100 bases or a gene or an exon of a gene). It is to be noted that when barcodes are used as tags, and when barcodes are attached, e.g., randomly, to both ends of a molecule, the combination of barcodes, together, can constitute a tag. In certain embodiments, the number of different tags used to uniquely identify a number of molecules, z, in a class can be between any of 2*z, 3*z, 4*z, 5*z, 6*z, 7*z, 8*z, 9*z, 10*z, 11*z, 12*z, 13*z, 14*z, 15*z, 16*z, 17*z, 18*z, 19*z, 20*z or 100*z (e.g., lower limit) and any of 100,000*z, 10,000*z, 1000*z or 100*z (e.g., upper limit).
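Assuming tags are assigned uniformly at random, the probability that all z molecules in a class receive different tags from a pool of N tags follows the familiar birthday-problem product, P = ∏ (1 - i/N) for i = 0, ..., z - 1. The sketch below evaluates it in log space for several multiples N = c·z within the range described above; z = 10 is an arbitrary illustrative choice:

```python
import math


def prob_all_distinct(num_molecules: int, num_tags: int) -> float:
    """Probability that `num_molecules` molecules, each drawing a tag uniformly
    at random (with replacement) from `num_tags` tags, all receive different tags."""
    if num_molecules > num_tags:
        return 0.0  # pigeonhole: at least two molecules must share a tag
    # Sum logs of (1 - i/N) for numerical stability, then exponentiate.
    log_p = sum(math.log1p(-i / num_tags) for i in range(num_molecules))
    return math.exp(log_p)


z = 10  # hypothetical number of molecules in one class
for multiplier in (2, 10, 100, 1000, 10_000, 100_000):
    print(multiplier, round(prob_all_distinct(z, multiplier * z), 4))
```

Under this assumption, the probability that every molecule in the class is uniquely tagged rises steeply as the tag pool grows relative to z.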
[0294] In some embodiments, molecular barcodes are introduced at an expected ratio of a set of identifiers (e.g., a combination of unique or non-unique molecular barcodes) to molecules in a sample. One example format uses from about 2 to about 1,000,000 different molecular barcode sequences, or from about 5 to about 150 different molecular barcode sequences, or from about 20 to about 50 different molecular barcode sequences, ligated to both ends of a target molecule. Alternatively, from about 25 to about 1,000,000 different molecular barcode sequences may be used. For example, 20-50 x 20-50 molecular barcode sequences (i.e., one of the 20-50 different molecular barcode sequences can be attached to each end of the target molecule) can be used. Such numbers of identifiers are typically sufficient for different molecules having the same start and stop points to have a high probability (e.g., at least 94%, 99.5%, 99.99%, or 99.999%) of receiving different combinations of identifiers. In some embodiments, about 80%, about 90%, about 95%, or about 99% of molecules have the same combinations of molecular barcodes. For example, in a sample of about 5 ng to 30 ng of cell free DNA, one expects around 3000 molecules to map to a particular nucleotide coordinate, and between about 3 and 10 molecules having any start coordinate to share the same stop coordinate. Accordingly, about 50 to about 50,000 different tags (e.g., between about 6 and 220 barcode combinations) can suffice to uniquely tag all such molecules. To uniquely tag all 3000 molecules mapping across a nucleotide coordinate, about 1 million to about 20 million different tags would be required.
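The scenario above, in which one of n barcodes is ligated to each end of a molecule and only a few molecules (e.g., 3 to 10) share the same start and stop coordinates, can be checked with a small Monte Carlo simulation. The sketch assumes each end receives a barcode independently and uniformly at random, which real ligation may not satisfy. It compares the simulated fraction of trials in which all molecules receive distinct barcode pairs against the exact product over n² combinations. Function names are hypothetical.

```python
import math
import random


def simulate_distinct_pairs(barcodes_per_end: int, molecules: int,
                            trials: int = 10_000, seed: int = 0) -> float:
    """Estimate the probability that `molecules` molecules sharing start and stop
    coordinates all receive different (end1, end2) barcode pairs."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        # Each molecule independently receives one barcode at each end.
        pairs = {(rng.randrange(barcodes_per_end), rng.randrange(barcodes_per_end))
                 for _ in range(molecules)}
        if len(pairs) == molecules:
            successes += 1
    return successes / trials


def exact_distinct_pairs(barcodes_per_end: int, molecules: int) -> float:
    """Exact birthday-problem probability over barcodes_per_end**2 combinations."""
    combos = barcodes_per_end ** 2
    return math.prod(1 - i / combos for i in range(molecules))


for n in (20, 35, 50):
    print(n, simulate_distinct_pairs(n, 10), round(exact_distinct_pairs(n, 10), 4))
```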
[0295] In some embodiments, the assignment of unique or non-unique molecular barcodes in reactions is performed using methods and systems described, for example, in U.S. Patent Application Nos. 20010053519, 20030152490, and 20110160078, and U.S. Patent Nos. 6,582,908, 7,537,898, 9,598,731, and 9,902,992, each of which is hereby incorporated by reference in its entirety. Alternatively, in some embodiments, different nucleic acid molecules of a sample may be identified using only endogenous sequence information (e.g., start and / or stop positions, sub-sequences of one or both ends of a sequence, and / or lengths). Tags can be linked to sample nucleic acids randomly or non-randomly.
[0296] In some embodiments, the assignment of unique molecular barcodes in reactions is performed using methods and systems described in Lim et al., Communications Biology 8: 1098 (2025), e.g., SPIDER-seq. In some such embodiments, amplicons are tagged with a pair of unique molecular barcodes using primers that contain a barcode. Successive daughter strands synthesized in each round of PCR amplification are grouped into clusters (e.g., peer-to-peer networks, as illustrated in Fig. 1c of Lim et al.) based on a chain of shared unique barcodes between immediate parent and daughter strands. That is, each strand synthesis event (using a previously synthesized strand as the template) copies one barcode from the template and incorporates one new barcode from the primer, so each daughter strand shares a unique barcode with its parent. Clustering strands in this way allows a consensus sequence to be generated that reduces errors.
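The parent-daughter chaining idea can be sketched with a union-find (disjoint-set) structure. Each strand joins the two barcodes it carries, so strands linked through shared barcodes fall into the same connected component. This is a generic illustration under simplifying assumptions (error-free barcode reads, no barcode reused across unrelated molecules), not the implementation described by Lim et al.; the strand IDs and barcodes are hypothetical.

```python
class UnionFind:
    """Minimal disjoint-set structure over barcode strings."""

    def __init__(self):
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_strands(strands: dict[str, tuple[str, str]]) -> list[set[str]]:
    """Cluster strand IDs whose barcode pairs are connected by shared barcodes.

    `strands` maps a strand ID to the two barcodes it carries. Because each
    daughter shares one barcode with its parent, whole lineages end up in the
    same connected component.
    """
    uf = UnionFind()
    for bc1, bc2 in strands.values():
        uf.union(bc1, bc2)
    clusters: dict[str, set[str]] = {}
    for strand_id, (bc1, _) in strands.items():
        clusters.setdefault(uf.find(bc1), set()).add(strand_id)
    return list(clusters.values())


strands = {
    "s0": ("A", "B"),  # original template strand
    "s1": ("C", "B"),  # daughter of s0: copies B, gains C from the primer
    "s2": ("C", "D"),  # daughter of s1: copies C, gains D
    "s3": ("E", "F"),  # unrelated molecule
    "s4": ("G", "F"),  # daughter of s3
}
clusters = cluster_strands(strands)
print(sorted(sorted(c) for c in clusters))
```

Reads within each resulting cluster could then be combined into a consensus, for example by per-position majority vote.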
[0297] In some embodiments, the tagged nucleic acids are sequenced after loading into a microwell plate. The microwell plate can have 96, 384, or 1536 microwells. In some cases, they are introduced at an expected ratio of unique tags to microwells. For example, the unique tags may be loaded so that more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10000, 50,000, 100,000, 500,000, 1,000,000, 10,000,000, 50,000,000 or 1,000,000,000 unique tags are loaded per genome sample. In some cases, the unique tags may be loaded so that less than about 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10000, 50,000, 100,000, 500,000, 1,000,000, 10,000,000, 50,000,000 or 1,000,000,000 unique tags are loaded per genome sample. In some cases, the average number of unique tags loaded per sample genome is less than, or greater than, about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10000, 50,000, 100,000, 500,000, 1,000,000, 10,000,000, 50,000,000 or 1,000,000,000 unique tags per genome sample.
[0298] In some embodiments, a format uses 20-50 different tags (e.g., barcodes) ligated to both ends of target nucleic acids. For example, 35 different tags (e.g., barcodes) ligated to both ends of target molecules create 35 x 35 = 1225 permutations. Such numbers of tags are sufficient so that different molecules having the same start and stop points have a high probability (e.g., at least 94%, 99.5%, 99.99%, or 99.999%) of receiving different combinations of tags. Other barcode combinations use any number of tags between 10 and 500 at each end, e.g., about 15x15, about 35x35, about 75x75, about 100x100, about 250x250, or about 500x500.
[0299] In some cases, unique tags may be predetermined or random or semi-random sequence oligonucleotides. In other cases, a plurality of barcodes may be used such that barcodes are not necessarily unique to one another in the plurality. In this example, barcodes may be ligated to individual molecules such that the combination of the barcode and the sequence it may be ligated to creates a unique sequence that may be individually tracked. As described herein, detection of non-unique barcodes in combination with sequence data of beginning (start) and end (stop) portions of sequence reads may allow assignment of a unique identity to a particular molecule. The length or number of base pairs, of an individual sequence read may also be used to assign a unique identity to such a molecule. As described herein, fragments from a single strand of nucleic acid having been assigned a unique identity, may thereby permit subsequent identification of fragments from the parent strand.
[0300] In some embodiments, the method includes adding one or more internal control DNAs and forward and reverse primers for amplifying the internal control DNAs. The internal control DNAs may be added before amplification using the primers that anneal upstream and downstream of the rearrangement breakpoints. The forward and reverse primers for amplifying the internal control DNAs may be included with, or added at the same time as, the primers that anneal upstream and downstream of the rearrangement breakpoints. The internal control DNAs may comprise or consist of sequences that do not occur in the genome of the subject, or that do not occur in the genome of the species of which the subject is a member (e.g., the human genome). The forward and / or reverse primers for amplifying the internal control DNAs may comprise sequences that are not complementary to any sequence in the genome of the subject, e.g., the human genome. The internal control DNAs may be used to ensure that the amplification process proceeded as designed. As such, the method may comprise detecting (e.g., sequencing) molecules amplified from and / or captured by the one or more internal control DNAs. The method can comprise comparing an amount of internal control DNAs (e.g., the number of molecules or reads detected that correspond to an internal control DNA sequence) to a predetermined threshold, and either rejecting the sequencing results if the predetermined threshold is not met or accepting them if the predetermined threshold is met. The predetermined threshold may be established, e.g., based on historical data or by testing the method on samples of DNA from test subjects, such as healthy volunteers. For example, amplification and detection of the one or more internal control DNAs provides confirmation that the amplification process proceeded properly, thus reducing the likelihood of a false negative.
D. Partitioning
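The accept/reject step can be sketched as a simple check of the read count for each internal control against a predetermined threshold. The control identifiers, the threshold of 500 reads, and the choice to require every control to pass are hypothetical; in practice, the threshold would be set from historical data or healthy-volunteer samples as described above.

```python
def control_qc(read_counts: dict[str, int], control_ids: set[str],
               threshold: int) -> tuple[bool, dict[str, int]]:
    """Accept a run only if every internal control meets the read-count threshold.

    Returns (accepted, per-control counts); a control with no reads counts as 0.
    """
    control_counts = {cid: read_counts.get(cid, 0) for cid in control_ids}
    accepted = all(count >= threshold for count in control_counts.values())
    return accepted, control_counts


# Hypothetical read counts per target; "ctrl_1" and "ctrl_2" are internal controls.
run = {"ctrl_1": 800, "ctrl_2": 650, "target_A": 42}
accepted, counts = control_qc(run, {"ctrl_1", "ctrl_2"}, threshold=500)
print("accept" if accepted else "reject", counts)
```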
[0301] In some instances, a heterogeneous nucleic acid sample is partitioned into two or more partitions (sub-samples). In some embodiments, each partition is differentially tagged. Tagged partitions can then be pooled together for collective sample prep and / or sequencing. The partitioning-tagging-pooling steps can occur more than once, with each round of partitioning occurring based on different characteristics and tagged using differential tags that are distinguished from other partitions and partitioning means. In some embodiments, the separating comprises partitioning the DNA in the sample into a plurality of partitioned subsamples. In some embodiments, the plurality of partitioned subsamples comprises a first subsample and a second subsample.
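Pooling differentially tagged partitions requires that reads be assigned back to their partition after sequencing. A minimal sketch follows, assuming each partition is labeled with its own non-overlapping set of tags; the partition names and tag sequences below are hypothetical.

```python
from collections import defaultdict

# Hypothetical partition tags: each partition is labeled with its own set of tags,
# and the sets must not overlap so every tag maps back to exactly one partition.
PARTITION_TAGS = {
    "hyper": {"ACGTAC", "TGCATG"},
    "hypo": {"CTAGCT", "AGTCAG"},
}


def build_lookup(partition_tags: dict[str, set[str]]) -> dict[str, str]:
    """Invert partition -> tag sets into tag -> partition, rejecting shared tags."""
    lookup: dict[str, str] = {}
    for partition, tags in partition_tags.items():
        for tag in tags:
            if tag in lookup:
                raise ValueError(f"tag {tag} is assigned to more than one partition")
            lookup[tag] = partition
    return lookup


def demultiplex(reads: list[tuple[str, str]], lookup: dict[str, str]) -> dict[str, list[str]]:
    """Assign pooled (tag, sequence) reads back to their partition of origin."""
    by_partition: dict[str, list[str]] = defaultdict(list)
    for tag, seq in reads:
        by_partition[lookup.get(tag, "unassigned")].append(seq)
    return dict(by_partition)


lookup = build_lookup(PARTITION_TAGS)
pooled = [("ACGTAC", "AAA"), ("AGTCAG", "CCC"), ("TGCATG", "GGG"), ("NNNNNN", "TTT")]
result = demultiplex(pooled, lookup)
print(result)
```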
[0302] In some embodiments, the adapted library is partitioned into at least a first subsample and a second subsample. This may be accomplished simply by dividing the library into identical or substantially identical subsamples. Alternatively, in some methods, different DNA (e.g., sequence-variable target regions, recombined CDR3 sequences, and epigenetic target regions) can be partitioned based on one or more characteristics of the DNA. Detecting aberrant features in DNA (whether sequence-based, epigenetic, or both) while also detecting recombined CDR3 sequences and / or target regions comprising sequence-variable target regions and / or epigenetic target regions may provide greater specificity and / or sensitivity for identifying an abnormal state than detecting the DNA features alone or levels of one or more post-translationally modified proteins alone.
[0303] In some embodiments, the first subsample comprises sequence-variable target regions and / or epigenetic target regions in a greater proportion than the second subsample. In some embodiments, the CDR3-enriched DNA is separated from non-enriched DNA of the first subsample of the adapted library. In some embodiments, the CDR3-enriched DNA is separated from non-enriched DNA of the second subsample of the adapted library.
[0304] In some embodiments, sequence-variable target regions and / or epigenetic target regions are captured from the adapted library. In some embodiments, sequence-variable target regions are captured from the adapted library. In some embodiments, epigenetic target regions are captured from the adapted library. In some embodiments, sequence-variable target regions and / or epigenetic target regions are captured from a first subsample of the adapted library. In some embodiments, sequence-variable target regions and / or epigenetic target regions are captured from a second subsample of the adapted library. In some embodiments, sequence-variable target regions and / or epigenetic target regions are captured from a third subsample of the adapted library. In some embodiments, sequence-variable target regions are captured from a first subsample of the adapted library. In some embodiments, sequence-variable target regions are captured from a second subsample of the adapted library. In some embodiments, sequence-variable target regions are captured from a third subsample of the adapted library. In some embodiments, epigenetic target regions are captured from a first subsample of the adapted library. In some embodiments, epigenetic target regions are captured from a second subsample of the adapted library. In some embodiments, epigenetic target regions are captured from a third subsample of the adapted library. In some embodiments, CDR3-enriched DNA is separated from non-enriched DNA of a first subsample of the adapted library. In some embodiments, CDR3-enriched DNA is separated from non-enriched DNA of a second subsample of the adapted library. In some embodiments, CDR3-enriched DNA is separated from non-enriched DNA of a third subsample of the adapted library.
In some embodiments, CDR3-enriched DNA is separated from non-enriched DNA of a first subsample of the adapted library, and sequence-variable target regions and / or epigenetic target regions are captured from the first subsample of the adapted library. In some embodiments, CDR3-enriched DNA is separated from non-enriched DNA of a first subsample of the adapted library, and sequence-variable target regions and epigenetic target regions are captured from the first subsample of the adapted library. In some embodiments, CDR3-enriched DNA is separated from non-enriched DNA of a first subsample of the adapted library, and sequence-variable target regions and / or epigenetic target regions are captured from a second subsample of the adapted library. In some embodiments, CDR3-enriched DNA is separated from non-enriched DNA of a first subsample of the adapted library, and sequence-variable target regions and epigenetic target regions are captured from a second subsample of the adapted library. In some embodiments, CDR3-enriched DNA is separated from non-enriched DNA of a first subsample of the adapted library, and sequence-variable target regions are captured from a second subsample of the adapted library. In some embodiments, CDR3-enriched DNA is separated from non-enriched DNA of a first subsample of the adapted library, and epigenetic target regions are captured from a second subsample of the adapted library. In some embodiments, CDR3-enriched DNA is separated from non-enriched DNA of a first subsample of the adapted library, sequence-variable target regions are captured from a second subsample of the adapted library, and epigenetic target regions are captured from a third subsample of the adapted library.
[0305] In some embodiments, the partitioning the DNA into a plurality of subsamples comprises contacting the DNA with an agent that recognizes methyl cytosine in the DNA. The partitioning step can occur prior to or after capturing an epigenetic target region set of DNA or a sequence-variable target region of the DNA. The partitioning step can occur prior to capturing an epigenetic target region set of DNA or a sequence-variable target region of the DNA. The partitioning step can occur prior to or after capturing an epigenetic target region set of DNA or a sequence-variable target region of the DNA and prior to or after sequencing the DNA. The partitioning step can occur after capturing an epigenetic target region set of DNA or a sequence-variable target region of the DNA and prior to sequencing the DNA.
[0306] In some embodiments, a second subsample of the adapted library is retained as a backup. In some embodiments, a third subsample of the adapted library is retained as a backup.
[0307] Disclosed methods herein comprise analyzing DNA in a sample. In some embodiments described herein, the disclosed methods comprise partitioning DNA. In such methods, different forms of DNA (e.g., hypermethylated and hypomethylated DNA) can be physically partitioned based on one or more characteristics of the DNA. This approach can be used to determine, for example, whether certain sequences are hypermethylated or hypomethylated.
[0308] In some embodiments, a first subsample or aliquot of a sample is subjected to steps for making capture probes as described elsewhere herein and a second subsample or aliquot of a sample is subjected to partitioning. In some embodiments, a sample or subsample or aliquot thereof is subjected to partitioning and differential tagging, followed by a capture step using capture probes for rearranged sequences and optionally additional capture probes, e.g., for sequence-variable and / or epigenetic target regions.
[0309] Methylation profiling can involve determining methylation patterns across different regions of the genome. For example, after partitioning molecules based on extent of methylation (e.g., relative number of methylated nucleobases per molecule) and sequencing, the sequences of molecules in the different partitions can be mapped to a reference genome. This can show regions of the genome that, compared with other regions, are more highly methylated or are less highly methylated. In this way, genomic regions, in contrast to individual molecules, may differ in their extent of methylation.
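Mapping partitioned reads to a reference genome and summarizing them per region can be sketched as below. The sketch assumes two partitions labeled "hyper" and "hypo", fixed-width genomic bins, and, as the summary statistic, the per-bin fraction of molecules found in the hyper partition. All of these are illustrative choices rather than a prescribed analysis.

```python
from collections import Counter


def regional_methylation(mapped, bin_size: int = 1000) -> dict[tuple[str, int], float]:
    """Per genomic bin, the fraction of molecules found in the hypermethylated partition.

    `mapped` is an iterable of (partition, chrom, position) tuples for molecules
    mapped to a reference genome; partition is "hyper" or "hypo".
    """
    hyper: Counter = Counter()
    total: Counter = Counter()
    for partition, chrom, pos in mapped:
        key = (chrom, pos // bin_size)  # fixed-width bin index
        total[key] += 1
        if partition == "hyper":
            hyper[key] += 1
    return {key: hyper[key] / total[key] for key in total}


# Hypothetical mapped molecules from two partitions.
mapped = [
    ("hyper", "chr1", 150), ("hyper", "chr1", 900),
    ("hypo", "chr1", 1500), ("hypo", "chr1", 1700), ("hyper", "chr1", 1800),
]
profile = regional_methylation(mapped)
print(profile)
```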
[0310] In some embodiments, the partitioning comprises contacting the DNA with an agent that recognizes a modification associated with (e.g., in) the DNA. In some embodiments, the agent that recognizes the modification is an antibody or a methyl binding domain (MBD) protein. In some embodiments, the agent is immobilized on a solid support. In some embodiments, the solid support comprises a bead. In some embodiments, the partitioning comprises immunoprecipitation, e.g., using the agent that recognizes the modification, such as an antibody or an MBD protein, immobilized on a solid support.
[0311] In some embodiments, the partitioning comprises precipitating the methylated DNA. In some embodiments, the partitioning comprises precipitating the methylated DNA to separate it from the unmethylated DNA. In some embodiments, precipitating the methylated DNA can be performed using any pair of binding partners. In some embodiments, one of the binding partners may be linked to the MBD protein or antibody, and the other binding partner may be linked to a solid support. In some embodiments, the binding partners comprise biotin and streptavidin. In some embodiments, the biotin may be linked to the MBD protein, and the streptavidin may be linked to a solid support. In some embodiments, the MBD protein is linked to a solid support, optionally using any pair of binding partners. In some embodiments, the partitioning comprises immunoprecipitating the methylated DNA. In some embodiments, the partitioning comprises immunoprecipitating the methylated DNA separately from the unmethylated DNA.
[0312] In some embodiments, the modification is methylation, and in some such embodiments, the partitioning comprises partitioning on the basis of methylation level. In some such embodiments, the agent is a methyl binding reagent. In some embodiments, the methyl binding reagent specifically recognizes 5-methylcytosine. In some such embodiments, the agent is a hydroxymethyl binding reagent. In some embodiments, the hydroxymethyl binding reagent specifically recognizes 5-hydroxymethylcytosine, biotinylated 5-hydroxymethylcytosine, glucosylated 5-hydroxymethylcytosine, or sulfonylated 5-hydroxymethylcytosine. In some embodiments, the partitioning comprises partitioning on the basis of binding to a protein, which comprises contacting the sample comprising the DNA with a binding reagent specific for the protein. In some such embodiments, the binding reagent specifically binds a methylated protein or an acetylated protein, such as a methylated or acetylated histone, or an unmethylated protein or an unacetylated protein, such as an unmethylated or unacetylated histone. In some embodiments, the binding reagent specifically binds an unmethylated or unacetylated protein epitope.
[0313] In some embodiments, the modification is hydroxymethylation, and in some such embodiments, the partitioning comprises partitioning on the basis of hydroxymethylation level. In some such embodiments, the agent is a hydroxymethyl binding reagent, such as an antibody. In some embodiments, the hydroxymethyl binding reagent (e.g., antibody) specifically recognizes 5-hydroxymethylcytosine (5-hmC). In some embodiments, a modification such as hydroxymethylation is labeled (e.g., biotinylated, glucosylated, or sulfonated) before being contacted with an agent that recognizes the labeled form of the modification. For example, 5-hmC can be enzymatically glucosylated and then partitioned based on binding to J-binding protein 1. Exemplary methods of labeling and / or partitioning 5-hmC are provided, e.g., in Song et al., Nat. Biotech. 29:68-72 (2010); Ko et al., Nature 468:839-843 (2010); and Robertson et al., Nucleic Acids Res. 39:e55 (2011).
[0314] Where immunoprecipitation is used and involves an antibody that recognizes single-stranded DNA, the DNA may be converted to double-stranded form by complementary strand synthesis before a subsequent step. Such synthesis may use an adapter as a primer binding site or may use random priming.
[0315] Partitioning nucleic acid molecules in a sample can increase a rare signal, e.g., by enriching rare nucleic acid molecules that are more prevalent in one partition of the sample. For example, a genetic variation present in epigenetic target regions, sequence-variable target regions, or recombined CDR3 sequences, e.g., in a TCR, BCR, or immunoglobulin, can be more easily detected by partitioning a sample into a subsample comprising those target regions. By analyzing multiple partitions of a sample, a multi-dimensional analysis of a single molecule can be performed, and hence greater sensitivity can be achieved. Partitioning may include physically partitioning nucleic acid molecules into partitions or subsamples based on the presence or absence of one or more methylated nucleobases. A sample may be partitioned into partitions or subsamples based on a characteristic that is indicative of differential gene expression or a disease state. A sample may be partitioned based on a characteristic, or combination of characteristics, that provides a difference in signal between a normal and a diseased state during analysis of nucleic acids, e.g., cell-free DNA (cfDNA), non-cfDNA, tumor DNA, circulating tumor DNA (ctDNA), and cell-free nucleic acids (cfNA).
[0316] In some embodiments, hypermethylation and / or hypomethylation variable epigenetic target regions are analyzed to determine whether they show differential methylation characteristic of tumor cells or cells of a type that does not normally contribute to the DNA sample being analyzed (such as cfDNA), and / or particular immune cell types.
[0317] In some instances, heterogeneous DNA in a sample can be partitioned into two or more partitions (e.g., at least 3, 4, 5, 6, or 7 partitions). In some embodiments, each partition is differentially tagged. Tagged partitions can then be pooled together for collective sample preparation and / or sequencing. The partitioning-tagging-pooling steps can occur more than once, with each round of partitioning based on a different characteristic (examples provided herein) and with tagging using differential tags that distinguish each partition and partitioning means from the others. In other instances, the differentially tagged partitions are separately sequenced.
[0318] In some embodiments, sequence reads from differentially tagged and pooled DNA are obtained and analyzed in silico. After sequencing, analysis of reads can be performed on a partition-by-partition level, as well as a whole DNA population level. Tags are used to sort reads from different partitions. Analysis to detect genetic variants can be performed on a partition-by-partition level, as well as a whole nucleic acid population level. For example, analysis can include in silico analysis to determine genetic variants, such as copy number variations (CNVs), single nucleotide variations (SNVs), insertions / deletions (indels), and / or fusions in nucleic acids in each partition. In some instances, in silico analysis can include analysis to determine epigenetic variation (one or more of methylation, chromatin structure, etc.). Analysis can include in silico analysis using sequence information, genomic coordinates, length, coverage, and / or copy number. For example, coverage of sequence reads can be used to determine nucleosome positioning in chromatin. Higher coverage can correlate with higher nucleosome occupancy in a genomic region, while lower coverage can correlate with lower nucleosome occupancy or a nucleosome depleted region (NDR).
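The in silico sorting of pooled reads by partition tag can be illustrated with a short Python sketch. It assumes, purely for illustration, that each read begins with a 4-base partition tag drawn from a hypothetical table `PARTITION_TAGS`; actual tag sequences, lengths, and positions depend on the adapter design and are not specified here.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical partition tags; real tag sequences and positions are assay-specific.
PARTITION_TAGS = {"ACGT": "hyper", "TGCA": "intermediate", "GATC": "hypo"}
TAG_LENGTH = 4


def sort_reads_by_partition(reads: List[str]) -> Dict[str, List[str]]:
    """Split pooled reads into per-partition lists using a leading partition tag.

    The tag is assumed to occupy the first TAG_LENGTH bases of each read; reads
    with an unrecognized tag are collected under 'unassigned'.
    """
    by_partition: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        tag, insert = read[:TAG_LENGTH], read[TAG_LENGTH:]
        by_partition[PARTITION_TAGS.get(tag, "unassigned")].append(insert)
    return dict(by_partition)


pooled = ["ACGTAAAA", "GATCCCCC", "TGCAGGGG", "NNNNTTTT", "ACGTTTTT"]
per_partition = sort_reads_by_partition(pooled)
# Whole-population analysis can simply use every assigned insert together.
all_inserts = [
    s for name, group in per_partition.items() if name != "unassigned" for s in group
]
```

Keeping both the per-partition lists and the pooled list mirrors the two analysis levels described above: variant calling can run on each partition separately or on the whole population.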
[0319] Examples of characteristics that can be used for partitioning include sequence length, methylation level, nucleosome binding, sequence mismatch, immunoprecipitation, and / or proteins that bind to DNA. Resulting partitions can include one or more of the following nucleic acid forms: single-stranded DNA (ssDNA), double-stranded DNA (dsDNA), shorter DNA fragments, and longer DNA fragments. In some embodiments, partitioning based on a cytosine modification (e.g., cytosine methylation) or on methylation generally is performed and is optionally combined with at least one additional partitioning step, which may be based on any of the foregoing characteristics or forms of DNA. In some embodiments, a heterogeneous population of nucleic acids is partitioned into nucleic acids with and without one or more base modifications, including, e.g., nucleic acids comprising one or more sequence-variable target regions or one or more epigenetic modifications, and nucleic acids with recombined CDR3 sequences. Examples of base modifications are described elsewhere herein. Alternatively or additionally, a heterogeneous population of nucleic acids can be partitioned into nucleic acid molecules associated with nucleosomes and nucleic acid molecules devoid of nucleosomes. Alternatively or additionally, a heterogeneous population of nucleic acids may be partitioned into single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA). Alternatively or additionally, a heterogeneous population of nucleic acids may be partitioned based on nucleic acid length (e.g., molecules of up to 160 bp and molecules having a length of greater than 160 bp).
[0320] In some cases, different procedures are applied to different partitions to determine different characteristics of the initial sample. In some embodiments, the DNA of at least one partition is subjected to an end repair and sequencing procedure described herein. In some embodiments, at least one partition is not subjected to the end repair and sequencing procedure described herein. In cases where the method comprises a conversion procedure, corresponding sequences from the converted and non-converted partitions can be compared to identify single nucleotides that have undergone conversion and therefore identify corresponding modified nucleosides in the initial sample.
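As a rough illustration of comparing corresponding converted and non-converted sequences, the Python sketch below reports positions where a C in the non-converted sequence reads as T after conversion. It assumes both sequences are already aligned without gaps on the same strand and that the conversion deaminates cytosine to uracil (sequenced as T); the function name `converted_positions` is hypothetical. Which cytosines are inferred to be modified depends on the chemistry: with a procedure that deaminates only unmodified cytosine, the retained Cs (position 4 in the example) would mark candidate modified cytosines.

```python
from typing import List


def converted_positions(unconverted: str, converted: str) -> List[int]:
    """0-based positions where a C in the non-converted sequence reads as T
    in the corresponding converted sequence.

    Assumes both sequences are the same strand, already aligned without gaps,
    and that the conversion deaminates cytosine to uracil (sequenced as T).
    """
    if len(unconverted) != len(converted):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i
        for i, (before, after) in enumerate(zip(unconverted, converted))
        if before == "C" and after == "T"
    ]


# Cs at positions 1 and 5 were converted; the C at position 4 was not.
print(converted_positions("ACGTCCG", "ATGTCTG"))  # [1, 5]
```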
[0321] In some embodiments, partition tagging comprises tagging molecules in each partition with a partition tag. After re-combining partitions (e.g., to reduce the number of sequencing runs needed and avoid unnecessary cost) and sequencing molecules, the partition tags identify the source partition. In another embodiment, different partitions are tagged with different sets of molecular tags, e.g., each comprising a pair of barcodes. In this way, each molecular barcode indicates the source partition and is also useful for distinguishing molecules within a partition. For example, a first set of 35 barcodes can be used to tag molecules in a first partition, while a second set of 35 barcodes can be used to tag molecules in a second partition.
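The use of disjoint barcode sets can be sketched as a lookup table from barcode to partition. This Python sketch is illustrative only: the barcodes are simply consecutive 8-mers from a hypothetical generator, and the pair-consistency rule in `source_partition` (both barcodes of a pair must point to the same partition) is an assumption rather than a disclosed requirement. A practical design would also enforce a minimum edit distance between barcodes to tolerate sequencing errors.

```python
import itertools
from typing import Dict, Optional, Sequence, Tuple


def build_partition_barcodes(
    partitions: Sequence[str], per_partition: int = 35, length: int = 8
) -> Dict[str, str]:
    """Assign a disjoint set of barcodes to each partition (barcode -> partition).

    Barcodes here are simply consecutive k-mers; a real design would enforce a
    minimum edit distance between barcodes to tolerate sequencing errors.
    """
    kmers = ("".join(p) for p in itertools.product("ACGT", repeat=length))
    lookup: Dict[str, str] = {}
    for partition in partitions:
        for _ in range(per_partition):
            lookup[next(kmers)] = partition
    return lookup


def source_partition(pair: Tuple[str, str], lookup: Dict[str, str]) -> Optional[str]:
    """Return the partition indicated by a barcode pair, or None if the two
    barcodes are unknown or point to different partitions."""
    first, second = lookup.get(pair[0]), lookup.get(pair[1])
    return first if first is not None and first == second else None


lookup = build_partition_barcodes(["first", "second"])
barcodes = list(lookup)  # insertion order: 35 'first' barcodes, then 35 'second'
```

With 35 barcodes per partition, ordered pairs give 35 × 35 = 1225 combinations, so the same pair both identifies the source partition and helps distinguish molecules within it.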
[0322] In some embodiments, after partitioning and tagging with partition tags, the molecules may be pooled for sequencing in a single run. In some embodiments, a sample tag is added to the molecules, e.g., in a step subsequent to addition of partition tags and pooling. Sample tags can facilitate pooling material generated from multiple samples for sequencing in a single sequencing run.
[0323] Alternatively, in some embodiments, partition tags may be correlated to the sample as well as the partition. As a simple example, a first tag can indicate a first partition of a first sample; a second tag can indicate a second partition of the first sample; a third tag can indicate a first partition of a second sample; and a fourth tag can indicate a second partition of the second sample.
[0324] While tags may be attached to molecules already partitioned based on one or more characteristics, the final tagged molecules in the library may no longer possess that characteristic. For example, while single-stranded DNA molecules may be partitioned and tagged, the final tagged molecules in the library are likely to be double-stranded. Similarly, while DNA may be partitioned based on different levels of methylation, in the final library, tagged molecules derived from these molecules are likely to be unmethylated. Accordingly, the tag attached to a molecule in the library typically indicates the characteristic of the “parent molecule” from which the ultimate tagged molecule is derived, not necessarily the characteristic of the tagged molecule itself.
[0325] As an example, barcodes 1, 2, 3, 4, etc. are used to tag and label molecules in the first partition; barcodes A, B, C, D, etc. are used to tag and label molecules in the second partition; and barcodes a, b, c, d, etc. are used to tag and label molecules in the third partition. Differentially tagged partitions can be pooled prior to sequencing. Differentially tagged partitions can be separately sequenced or sequenced together concurrently, e.g., in the same flow cell of an Illumina sequencer.
[0326] After sequencing, analysis of reads can be performed on a partition-by-partition level, as well as a whole DNA population level. Tags are used to sort reads from different partitions. Analysis can include in silico analysis to determine genetic and epigenetic variation (one or more of methylation, chromatin structure, etc.) using sequence information, genomic coordinates, length, coverage, and / or copy number. In some embodiments, higher coverage can correlate with higher nucleosome occupancy in a genomic region, while lower coverage can correlate with lower nucleosome occupancy or a nucleosome depleted region (NDR).
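One way to turn the coverage-occupancy relationship into an analysis step is to compute per-base coverage from fragment coordinates and flag sustained low-coverage stretches as candidate NDRs. The Python sketch below does this with a difference array; the `threshold` and `min_length` parameters are hypothetical tuning values, and a fixed coverage cutoff is a simplifying assumption rather than the disclosed analysis.

```python
from typing import List, Tuple


def coverage_track(fragments: List[Tuple[int, int]], region_length: int) -> List[int]:
    """Per-base read coverage over [0, region_length) from half-open [start, end) fragments."""
    diff = [0] * (region_length + 1)
    for start, end in fragments:
        start, end = max(start, 0), min(end, region_length)  # clip to the region
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    coverage, running = [], 0
    for delta in diff[:-1]:
        running += delta
        coverage.append(running)
    return coverage


def candidate_ndrs(
    coverage: List[int], threshold: float, min_length: int
) -> List[Tuple[int, int]]:
    """[start, end) runs where coverage stays below `threshold` for >= `min_length` bases."""
    runs: List[Tuple[int, int]] = []
    run_start = None
    for i, depth in enumerate(coverage + [threshold]):  # sentinel closes a trailing run
        if depth < threshold and run_start is None:
            run_start = i
        elif depth >= threshold and run_start is not None:
            if i - run_start >= min_length:
                runs.append((run_start, i))
            run_start = None
    return runs


cov = coverage_track([(0, 10), (0, 10), (20, 30), (20, 30)], region_length=30)
print(candidate_ndrs(cov, threshold=1, min_length=5))  # [(10, 20)]
```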
[0327] The agents used to partition populations of nucleic acids within a sample can be affinity agents, such as antibodies with the desired specificity, natural binding partners or variants thereof (Bock et al., Nat Biotech 28: 1106-1114 (2010); Song et al., Nat Biotech 29: 68-72 (2011)), or artificial peptides selected, e.g., by phage display to have specificity to a given target. In some embodiments, the agent used in the partitioning is an agent that recognizes a modified nucleobase. In some embodiments, the modified nucleobase recognized by the agent is a modified cytosine, such as a methylcytosine (e.g., 5-methylcytosine). In some embodiments, the modified nucleobase recognized by the agent is a product of a procedure that affects the first nucleobase in the DNA differently from the second nucleobase in the DNA of the sample. In some embodiments, the modified nucleobase may be a “converted nucleobase,” meaning that its base pairing specificity was changed by a procedure. For example, certain procedures convert unmethylated or unmodified cytosine to dihydrouracil, or more generally, at least one modified or unmodified form of cytosine undergoes deamination, resulting in uracil (considered a modified nucleobase in the context of DNA) or a further modified form of uracil. Examples of partitioning agents include antibodies, such as antibodies that recognize a modified nucleobase, which may be a modified cytosine, such as a methylcytosine (e.g., 5-methylcytosine). In some embodiments, the partitioning agent is an antibody that recognizes a modified cytosine other than 5-methylcytosine, such as 5-carboxylcytosine (5-caC). Alternative partitioning agents include methyl binding domains (MBDs) and methyl binding proteins (MBPs) as described herein, including proteins such as MeCP2 and MBD2, and antibodies preferentially binding to 5-methylcytosine.
Where an antibody is used to immunoprecipitate methylated DNA, the methylated DNA may be recovered in single-stranded form. In such embodiments, a second strand can be synthesized. Hypermethylated (and optionally intermediately methylated) subsamples may then be contacted with a methylation-sensitive nuclease that does not cleave hemi-methylated DNA, such as HpaII, BstUI, or Hin6I. Alternatively or in addition, hypomethylated (and optionally intermediately methylated) subsamples may be contacted with a methylation-dependent nuclease that cleaves hemi-methylated DNA.
[0328] Additional, non-limiting examples of partitioning agents are histone binding proteins which can separate nucleic acids bound to histones from free or unbound nucleic acids. Examples of histone binding proteins that can be used in the methods disclosed herein include RBBP4, RbAp48 and SANT domain peptides.
[0329] In some embodiments, partitioning can comprise both binary partitioning and partitioning based on degree / level of modifications. For example, methylated fragments can be partitioned by methylated DNA immunoprecipitation (MeDIP), or all methylated fragments can be partitioned from unmethylated fragments using methyl binding domain proteins (e.g., MethylMiner™ Methylated DNA Enrichment Kit (ThermoFisher Scientific)). Subsequently, additional partitioning may involve eluting fragments having different levels of methylation by adjusting the salt concentration in a solution with the methyl binding domain and bound fragments. As salt concentration increases, fragments having greater methylation levels are eluted.
[0330] Analyzing DNA may comprise detecting or quantifying DNA of interest. Analyzing DNA can comprise detecting genetic variants and / or epigenetic features (e.g., DNA methylation and / or DNA fragmentation). In some embodiments, the DNA of interest is one or more differentially methylated regions of the DNA. In some embodiments, detecting or quantifying the DNA of interest comprises quantifying and / or detecting a level of methylation at one or more differentially methylated regions of the DNA. In some embodiments, quantifying and / or detecting the level of methylation at one or more differentially methylated regions of the DNA comprises sequencing at least a portion of the amplified DNA or performing quantitative PCR (qPCR). In some embodiments, the DNA of interest is a copy number variant. In some embodiments, detecting or quantifying the DNA of interest comprises quantifying and / or detecting a level of a copy number variant of the DNA. In some embodiments, quantifying and / or detecting the level of a copy number variant of the DNA comprises quantitative PCR (qPCR).
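The text does not specify how qPCR data are converted into a copy number level. One widely used option, swapped in here for illustration, is the comparative Ct method: copy number ≈ c × 2^(-ΔΔCt), where ΔΔCt = (Ct_target - Ct_reference) in the sample minus (Ct_target - Ct_reference) in a calibrator carrying c copies of the target. The Python sketch assumes roughly 100% amplification efficiency for both assays and a reference locus at normal copy number; the function name and the default of two calibrator copies are hypothetical.

```python
def relative_copy_number(
    ct_target_sample: float,
    ct_reference_sample: float,
    ct_target_calibrator: float,
    ct_reference_calibrator: float,
    calibrator_copies: float = 2.0,
) -> float:
    """Estimate target-locus copy number with the comparative Ct (2^-ddCt) method.

    Assumes ~100% amplification efficiency for both assays, a reference locus
    at normal copy number, and a calibrator carrying `calibrator_copies` copies.
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return calibrator_copies * 2.0 ** (-delta_delta_ct)


# The target amplifies one cycle earlier (relative to the reference) than in the
# calibrator, i.e., roughly twice as many copies: 2 * 2 = 4.
print(relative_copy_number(24.0, 25.0, 25.0, 25.0))  # 4.0
```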
[0331] In some embodiments, methylation levels can be determined using partitioning, modification-sensitive conversion such as bisulfite conversion, direct detection during sequencing, methylation-sensitive restriction enzyme digestion, methylation-dependent restriction enzyme digestion, or any other suitable approach. For example, different forms of DNA (e.g., hypermethylated and hypomethylated DNA) can be physically partitioned based on one or more characteristics of the DNA. For example, a methylated DNA binding protein (e.g., an MBD such as MBD2, MBD4, or MeCP2) or an antibody specific for 5-methylcytosine (as in MeDIP) can be used to partition the DNA. This approach can be used to determine, for example, whether certain sequences are hypermethylated or hypomethylated. In some embodiments, a DNA fragmentation pattern can be determined based on endpoints and / or centerpoints of DNA molecules, such as cfDNA molecules.
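A DNA fragmentation pattern based on endpoints and centerpoints can be summarized, for example, as binned histograms over reference coordinates. In this minimal Python sketch, each fragment is assumed to be a half-open interval [start, end) from paired-end alignment, and the bin width is a hypothetical parameter; how such profiles are then compared between samples is not addressed here.

```python
from collections import Counter
from typing import Iterable, Tuple


def end_and_center_profiles(
    fragments: Iterable[Tuple[int, int]], bin_size: int = 10
) -> Tuple[Counter, Counter]:
    """Histogram fragment endpoints and centerpoints into fixed-width bins.

    Each fragment is a half-open interval [start, end) in reference coordinates;
    both the first and the last covered base count as endpoints.
    """
    ends: Counter = Counter()
    centers: Counter = Counter()
    for start, end in fragments:
        ends[start // bin_size] += 1
        ends[(end - 1) // bin_size] += 1
        centers[((start + end - 1) // 2) // bin_size] += 1
    return ends, centers


ends, centers = end_and_center_profiles([(0, 20), (5, 25)])
```

For the two example fragments, three endpoints fall in the first two bins and one in the third, while the centerpoints (bases 9 and 14) fall in bins 0 and 1.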
[0332] In some instances, the final partitions are enriched in nucleic acids having different extents of modifications (overrepresented or underrepresented in modifications). Overrepresentation and underrepresentation can be defined by the number of modifications borne by a nucleic acid relative to the median number of modifications per strand in a population. For example, if the median number of 5-methylcytosine residues in nucleic acids in a sample is 2, a nucleic acid including more than two 5-methylcytosine residues is overrepresented in this modification, and a nucleic acid with one or zero 5-methylcytosine residues is underrepresented. The effect of affinity separation is to enrich for nucleic acids overrepresented in a modification in a bound phase and for nucleic acids underrepresented in a modification in an unbound phase (i.e., in solution). The nucleic acids in the bound phase can be eluted before subsequent processing.
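The median-based definition above can be expressed directly in code. This Python sketch assumes the number of modifications per molecule is already known (e.g., counted from reads); the label strings are arbitrary choices, and molecules exactly at the median are given their own label because the definition above leaves them in neither class.

```python
from statistics import median
from typing import List


def classify_by_modification_count(counts: List[int]) -> List[str]:
    """Label each molecule relative to the population median of modifications per molecule."""
    m = median(counts)
    return [
        "over" if n > m else "under" if n < m else "at_median"
        for n in counts
    ]


# Median of these 5-methylcytosine counts is 2, matching the example above.
print(classify_by_modification_count([0, 1, 2, 2, 3, 5]))
# ['under', 'under', 'at_median', 'at_median', 'over', 'over']
```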
[0333] When using MeDIP or the MethylMiner™ Methylated DNA Enrichment Kit (ThermoFisher Scientific), various levels of methylation can be partitioned using sequential elutions. For example, a hypomethylated partition (no methylation) can be separated from a methylated partition by contacting the nucleic acid population with the MBD from the kit, which is attached to magnetic beads. The beads are used to separate the methylated nucleic acids from the non-methylated nucleic acids. Subsequently, one or more elution steps are performed sequentially to elute nucleic acids having different levels of methylation. For example, a first set of methylated nucleic acids can be eluted at a salt concentration of 160 mM or higher, e.g., at least 150 mM, at least 200 mM, 300 mM, 400 mM, 500 mM, 600 mM, 700 mM, 800 mM, 900 mM, 1000 mM, or 2000 mM. After such methylated nucleic acids are eluted, magnetic separation is once again used to separate nucleic acids with higher levels of methylation from those with lower levels of methylation. The elution and magnetic separation steps can be repeated to create various partitions, such as a hypomethylated partition (enriched in nucleic acids comprising no methylation), a methylated partition (enriched in nucleic acids comprising low levels of methylation), and a hypermethylated partition (enriched in nucleic acids comprising high levels of methylation).
[0334] In some methods, nucleic acids bound to an agent used for affinity-separation-based partitioning are subjected to a wash step. The wash step washes off nucleic acids weakly bound to the affinity agent. Such nucleic acids can be enriched in nucleic acids having the modification to an extent close to the mean or median (i.e., intermediate between nucleic acids remaining bound to the solid phase and nucleic acids not binding to the solid phase on initial contacting of the sample with the agent).
[0335] The affinity separation results in at least two, and sometimes three or more partitions of nucleic acids with different extents of a modification. While the partitions are still separate, the nucleic acids of at least one partition, and usually two or three (or more) partitions are linked to nucleic acid tags, usually provided as components of adapters, with the nucleic acids in different partitions receiving different tags that distinguish members of one partition from another. The tags linked to nucleic acid molecules of the same partition can be the same or different from one another. But if different from one another, the tags may have part of their code in common so as to identify the molecules to which they are attached as being of a particular partition.
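Tags that share part of their code within a partition can be sketched as a shared prefix followed by a molecule-specific part. The layout below (a 2-base partition code in `PARTITION_CODES` followed by arbitrary bases) is hypothetical and chosen only to show how the shared portion identifies the partition while the remainder can differ between molecules.

```python
from typing import Dict, Tuple

# Hypothetical tag layout: a 2-base partition code followed by a molecule-specific part.
PARTITION_CODES: Dict[str, str] = {"AA": "hypo", "CC": "intermediate", "GG": "hyper"}
CODE_FOR: Dict[str, str] = {name: code for code, name in PARTITION_CODES.items()}
CODE_LENGTH = 2


def make_tag(partition: str, molecule_part: str) -> str:
    """Build a tag whose prefix is shared by all molecules of one partition."""
    return CODE_FOR[partition] + molecule_part  # KeyError for an unknown partition


def parse_tag(tag: str) -> Tuple[str, str]:
    """Recover (partition, molecule-specific part) from a tag."""
    code, molecule_part = tag[:CODE_LENGTH], tag[CODE_LENGTH:]
    if code not in PARTITION_CODES:
        raise ValueError(f"unknown partition code: {code!r}")
    return PARTITION_CODES[code], molecule_part


tags = [make_tag("hyper", "ACTG"), make_tag("hyper", "TTGA"), make_tag("hypo", "ACTG")]
print([parse_tag(t) for t in tags])
# [('hyper', 'ACTG'), ('hyper', 'TTGA'), ('hypo', 'ACTG')]
```

The two `hyper` tags differ from one another yet parse to the same partition, and the same molecule-specific part can be reused in another partition without ambiguity.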
[0336] For further details regarding partitioning nucleic acid samples based on characteristics such as methylation, see WO2018/119452, which is incorporated herein by reference.
[0337] In some embodiments, the nucleic acid molecules can be partitioned into different partitions based on the nucleic acid molecules that are bound to a specific protein or a fragment thereof and those that are not bound to that specific protein or fragment thereof.
[0338] Nucleic acid molecules can be partitioned based on DNA-protein binding. Protein-DNA complexes can be partitioned based on a specific property of a protein. Examples of such properties include various epitopes, modifications (e.g., histone methylation or acetylation), or enzymatic activity. Examples of proteins that may bind to DNA and serve as a basis for fractionation include, but are not limited to, protein A and protein G. Any suitable method can be used to partition the nucleic acid molecules based on protein-bound regions. Examples of methods used to partition nucleic acid molecules based on protein-bound regions include, but are not limited to, SDS-PAGE, chromatin immunoprecipitation (ChIP), heparin chromatography, and asymmetrical field flow fractionation (AF4).
[0339] In some embodiments, the partitioning comprises contacting the DNA with a methylation sensitive restriction enzyme (MSRE) and / or a methylation dependent restriction enzyme (MDRE). Following the treatment of the DNA with a MSRE or a MDRE, the DNA may be partitioned based on size to generate hypermethylated (longest DNA molecules following MSRE treatment and shortest DNA fragments following MDRE treatment), intermediate (intermediate length DNA molecules following MSRE or MDRE treatment), and hypomethylated (shortest DNA molecules following MSRE treatment and longest DNA fragments following MDRE treatment) subsamples.
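The size-based assignment after MSRE or MDRE digestion can be sketched as a simple lookup on fragment length. In this Python sketch, the cutoffs `short_max` and `long_min` are hypothetical values that would need to be chosen empirically, and the code only captures the bookkeeping of which length class corresponds to which methylation subsample; it does not model the physical size selection itself.

```python
from typing import Dict, List


def size_partitions(
    lengths: List[int], short_max: int, long_min: int, enzyme: str
) -> Dict[str, List[int]]:
    """Assign post-digestion fragment lengths to methylation-level subsamples.

    After an MSRE (cuts only unmethylated sites) long fragments are treated as
    hypermethylated; after an MDRE (cuts only methylated sites) the mapping is
    reversed. Lengths between the two cutoffs are called intermediate.
    """
    if enzyme not in ("MSRE", "MDRE"):
        raise ValueError("enzyme must be 'MSRE' or 'MDRE'")
    if short_max >= long_min:
        raise ValueError("short_max must be below long_min")
    if enzyme == "MSRE":
        long_label, short_label = "hypermethylated", "hypomethylated"
    else:
        long_label, short_label = "hypomethylated", "hypermethylated"
    out: Dict[str, List[int]] = {"hypermethylated": [], "intermediate": [], "hypomethylated": []}
    for n in lengths:
        if n >= long_min:
            out[long_label].append(n)
        elif n <= short_max:
            out[short_label].append(n)
        else:
            out["intermediate"].append(n)
    return out


print(size_partitions([50, 120, 300], short_max=80, long_min=200, enzyme="MSRE"))
# {'hypermethylated': [300], 'intermediate': [120], 'hypomethylated': [50]}
```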
[0340] In some embodiments, the partitioning is performed by contacting the nucleic acids with a methyl binding domain (“MBD”) of a methyl binding protein (“MBP”). In some such embodiments, the nucleic acids are contacted with an entire MBP. In some embodiments, an MBD binds to 5-methylcytosine (5mC), and an MBP comprises an MBD and is referred to interchangeably herein as a methyl binding protein or a methyl binding domain protein. In some embodiments, MBD is coupled to paramagnetic beads, such as Dynabeads® M-280 Streptavidin via a biotin linker. Partitioning into fractions with different extents of methylation can be performed by eluting fractions by increasing the NaCl concentration.
[0341] In some embodiments, bound DNA is eluted by contacting the antibody or MBD with a protease, such as proteinase K. This may be performed instead of or in addition to elution steps using NaCl as discussed above.
[0342] Examples of agents that recognize a modified nucleobase contemplated herein include, but are not limited to:
(a) MeCP2, a protein that preferentially binds to 5-methylcytosine over unmodified cytosine.
(b) RPL26, PRP8, and the DNA mismatch repair protein MSH6, which preferentially bind to 5-hydroxymethylcytosine over unmodified cytosine.
(c) FOXK1, FOXK2, FOXP1, FOXP4, and FOXI3, which preferentially bind to 5-formylcytosine over unmodified cytosine (Iurlaro et al., Genome Biol. 14: R119 (2013)).
(d) Antibodies specific to one or more methylated or modified nucleobases or conversion products thereof, such as 5mC, 5-caC, or DHU.
[0343] In general, elution is a function of the number of modifications, such as the number of methylated sites per molecule, with molecules having more methylation eluting under increased salt concentrations. To elute the DNA into distinct populations based on the extent of methylation, one can use a series of elution buffers of increasing NaCl concentration. Salt concentration can range from about 100 mM to about 2500 mM NaCl. In one embodiment, the process results in three (3) partitions. Molecules are contacted, at a first salt concentration, with a solution comprising a molecule that comprises an agent that recognizes a modified nucleobase, which molecule can be attached to a capture moiety, such as streptavidin. At the first salt concentration, a population of molecules will bind to the agent and a population will remain unbound. The unbound population can be separated as a “hypomethylated” population. For example, a first partition enriched in the hypomethylated form of DNA is that which remains unbound at a low salt concentration, e.g., 100 mM or 160 mM. A second partition enriched in intermediately methylated DNA is eluted using an intermediate salt concentration, e.g., a concentration between 100 mM and 2000 mM. This partition is also separated from the sample. A third partition enriched in the hypermethylated form of DNA is eluted using a high salt concentration, e.g., at least about 2000 mM.
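The three-partition salt scheme can be illustrated with a toy model. The Python sketch below summarizes each molecule by a hypothetical "release concentration" that is assumed to rise with its number of methylated sites; these values are invented inputs, not measured properties. Molecules that release at or below the binding concentration stay unbound (hypomethylated partition), and the rest fall into the first elution step that reaches their release concentration (here, intermediate and hypermethylated partitions). Molecules still bound after the last step are reported separately.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence


def assign_elution_fractions(
    release_mM: Sequence[float],
    binding_mM: float,
    elution_steps_mM: Sequence[float],
) -> Dict[str, List[int]]:
    """Toy model of stepwise NaCl elution from a methyl-binding agent.

    Each molecule is summarized by a hypothetical release concentration (mM NaCl)
    assumed to rise with its number of methylated sites. Molecules releasing at or
    below the binding concentration stay unbound; the rest elute in the first
    step whose concentration reaches their release concentration.
    """
    steps = sorted(elution_steps_mM)
    fractions: Dict[str, List[int]] = {"unbound": []}
    fractions.update({f"elution_{k + 1}": [] for k in range(len(steps))})
    fractions["retained"] = []  # never released by the highest step
    for idx, release in enumerate(release_mM):
        if release <= binding_mM:
            fractions["unbound"].append(idx)
            continue
        k = bisect_left(steps, release)  # first step with concentration >= release
        key = f"elution_{k + 1}" if k < len(steps) else "retained"
        fractions[key].append(idx)
    return fractions


# Binding at 160 mM, then elution at 1000 mM (intermediate) and 2000 mM (high).
print(assign_elution_fractions([100, 500, 1500, 2500], 160, [1000, 2000]))
# {'unbound': [0], 'elution_1': [1], 'elution_2': [2], 'retained': [3]}
```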
[0344] In some embodiments, a monoclonal antibody raised against 5-methylcytidine (5mC) is used to purify methylated DNA. DNA is denatured, e.g., at 95°C in order to yield single-stranded DNA fragments. Protein G coupled to standard or magnetic beads as well as washes following incubation with the anti-5mC antibody are used to immunoprecipitate DNA bound to the antibody. Such DNA may then be eluted. Partitions may comprise unprecipitated DNA and one or more partitions eluted from the beads.
[0345] In some embodiments, the partitions of DNA are desalted and concentrated in preparation for enzymatic steps of library preparation.
[0346] Sequences that comprise aberrantly high copy numbers may tend to be hypermethylated. Accordingly, in some embodiments, the DNA contacted with target-specific probes specific for members of an epigenetic target region set comprising a plurality of target regions that are both type-specific differentially methylated regions and copy number variants comprises at least a portion of a hypermethylated partition. The DNA from or comprising at least a portion of the hypermethylated partition may or may not be combined with DNA from or comprising at least a portion of one or more other partitions, such as an intermediate partition or a hypomethylated partition.
[0347] In some cases, different procedures are applied to different partitions to determine different characteristics of the initial sample. In some embodiments, the DNA of at least one partition is subjected to an end repair and sequencing procedure described herein. In some embodiments, at least one partition is not subjected to the end repair and sequencing procedure according to the methods of the disclosure described herein. In cases where the sequencing procedure comprises a conversion procedure, corresponding sequences from the converted and non-converted partitions can be compared to identify single nucleotides that have undergone conversion and therefore identify corresponding modified nucleosides in the initial sample.
[0348] Disclosed methods herein can comprise analyzing DNA in a sample. In some embodiments described herein, the disclosed methods comprise partitioning DNA. In such methods, different forms of DNA (e.g., hypermethylated and hypomethylated DNA) can be physically partitioned based on one or more characteristics of the DNA. This approach can be used to determine, for example, whether certain sequences are hypermethylated or hypomethylated and whether certain hypermethylated regions overlap with regions with copy number variants. In some embodiments, a first subsample or aliquot of a sample is subjected to steps for making capture probes as described elsewhere herein and a second subsample or aliquot of a sample is subjected to partitioning. In some embodiments, a sample or subsample or aliquot thereof is subjected to partitioning and differential tagging, followed by a capture step using capture probes for rearranged sequences and optionally additional capture probes, e.g., for sequence-variable and / or epigenetic target regions.
[0349] Methylation profiling can involve determining methylation patterns across different regions of the genome. For example, after partitioning molecules based on extent of methylation (e.g., relative number of methylated nucleobases per molecule) and sequencing, the sequences of molecules in the different partitions can be mapped to a reference genome. This can show regions of the genome that, compared with other regions, are more highly methylated or are less highly methylated. In this way, genomic regions, in contrast to individual molecules, may differ in their extent of methylation.
E. Conversion; Contacting the DNA with a Deaminase
[0350] The methods disclosed herein can comprise subjecting the sample or one or more subsamples (e.g., DNA in an adapted library) to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase. In some embodiments, the first nucleobase is a modified or an unmodified nucleobase, and the second nucleobase is a modified or an unmodified nucleobase different from the first nucleobase. In some embodiments, the first nucleobase and the second nucleobase have the same base pairing specificity. In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises a conversion procedure that changes the base pairing specificity of the base or does not change the base pairing specificity of the base, depending on the modification status of the base. In some embodiments, the first nucleobase is an unmodified cytosine and the second nucleobase is a modified cytosine (e.g., 5-methylcytosine or 5-hydroxymethylcytosine).
[0351] In some embodiments, the procedure that affects a first nucleobase of the DNA differently from a second nucleobase of the DNA is conversion. In some embodiments, the procedure that affects a first nucleobase of the DNA differently from a second nucleobase of the DNA is methylation-sensitive conversion. The methods disclosed herein can comprise contacting DNA in a sample with a deaminase, thereby providing a converted sample. In some embodiments, the deaminase is a methyl-sensitive deaminase or a methyl-insensitive deaminase. In some embodiments, the deaminase is a dsDNA deaminase and / or an ssDNA deaminase. This step of contacting the DNA in the sample with a deaminase can be referred to as, or be included in, a conversion procedure, such as any of the conversion procedures described elsewhere herein. For an exemplary description of conversion using a deaminase, see, e.g., Schutsky et al., Nature Biotechnology 2018; 36: 1083-1090. In some embodiments, the DNA in the converted sample is then sequenced, and a level of methylation at one or more differentially methylated regions of the DNA is quantified, or a variation of the copy number at one or more regions of the DNA is quantified.
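Quantifying a methylation level at a differentially methylated region from converted reads can be sketched as the fraction of unconverted C calls at CpG cytosines. This Python sketch covers only the methyl-sensitive case and assumes a conversion that deaminates unmodified cytosine (sequenced as T) while leaving 5-methylcytosine as C; reads are hypothetical `(start, sequence)` pairs aligned without gaps to the forward strand, and CpG positions are taken from the reference. A methyl-insensitive or modification-targeting chemistry would need a different readout rule.

```python
from typing import Iterable, Sequence, Tuple


def region_methylation_level(
    reads: Iterable[Tuple[int, str]], cpg_positions: Sequence[int]
) -> float:
    """Fraction of unconverted (C) calls at CpG cytosines in a region.

    Each read is (0-based start, sequence), aligned to the forward strand without
    gaps. Assumes a conversion that deaminates unmodified C (read as T) and leaves
    5-methylcytosine as C; any other base at a CpG position is ignored.
    """
    sites = set(cpg_positions)
    methylated = unmethylated = 0
    for start, seq in reads:
        for offset, base in enumerate(seq):
            if start + offset in sites:
                if base == "C":
                    methylated += 1
                elif base == "T":
                    unmethylated += 1
    total = methylated + unmethylated
    return methylated / total if total else float("nan")


# Reference "ACGTCG" has CpG cytosines at positions 1 and 4.
reads = [(0, "ACGTTG"), (0, "ATGTCG")]
print(region_methylation_level(reads, cpg_positions=[1, 4]))  # 0.5
```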
[0352] In some embodiments, the deaminase (e.g., the methyl-sensitive deaminase or the methyl-insensitive deaminase) comprises any one or more of the following deaminases or a truncated version thereof, such as any of the truncated versions disclosed in Vaisvila et al., Mol Cell. 2024 Mar 7;84(5):854-866.e7. doi: 10.1016/j.molcel.2024.01.027. Epub 2024 Feb 22, available at www.cell.com/molecular-cell/fulltext/S1097-2765(24)00094-7 (“Vaisvila 2024”): MsddA, AshDa01, MGYPDa21, PpDa03, SbDa01, BlDa01, PpDa04, CsDa01, MGYPDa22, FIDa01, MGYPDa24, AaDa02, MmgDa01, PbDa01, BcDa02, LsfDa01, SmgDa01, XcDa01, KsDa01, PwDa01, CaDa01, SrDa01, NgDa01, NsDa01, SzDa01, SpDa01, AdDa01, MGYPDa23, WWTPDa07, PdDa01, MGYPDa25, MGYPDa26, DaDa01, EcDa01, EcDa02, NgDa02, PaDa01, AsDa01, HgmDa01, MsDa02, XinDa01, XjaDa01, RhDa01, MGYPDa04, MGYPDa05, BaDa01, WWTPDa04, PbDa02, CrDa01, MGYPDa15, MGYPDa16, MGYPDa17, BaDa02, VsDa01, MGYPDa18, MGYPDa19, HmDa06, MmgDa02, HgmDa02, CgmDa01, FbiDa01, PvmDa01, MGYPDa408, MGYPDa687, MGYPDa917, MGYPDa624, DddA, StsDa01, LbsDa01, BpDa02, AmDa03, MsDa01, KsDa02, MGYPDa829, PaDa02, RaDa01, BadTF3, HmDa01, HmDa02, HmDa03, AmDa01, SjDa01, MGYPDa01, SqDa01, TeDa01, StsDa03, SaDa02, PpDa02, EcDa04, MGYPDa02, MGYPDa03, BcDa01, IfDa01, PcDa01, StsDa04, AmDa04, AbDa02, WWTPDa05, WWTPDa06, PeDa01, SaDa03, HgDa01, AbcDa01, HmDa04, AmDa02, AcDa01, MGYPDa13, LbDa01, CbDa01, HcDa01, MGYPDa06, CseDa01, AvDa01, LbDa02, MGYPDa07, FbDa01, IfDa02, RsDa01, NoDa01, PfDa01, ScDa03, PsDa01, PvDa01, CdDa01, AzDa01, BdDa01, MGYPDa08, MGYPDa09, AoDa01, MGYPDa10, MGYPDa11, MGYPDa12, MGYPDa14, SsdA, gp317, xp12da, APOBEC3A, KcDa01, TuDa01, BsDa01, PpDa01, SaDa01, CpDa01, EcDa03, ScDa01, BpDa01, ScDa02, StsDa02, OTT-1508, NpDa01, BmDa01, BsDa02, PIDa01, BbDa01, OlDa01, WcDa01, BbDa02, PrDa01, VRDa02, VRDa03, VRDa04, VRDa05, VRDa06, AbDa01, AaDa01, WWTPDa01, WWTPDa02, WWTPDa03, SoCaDa01, SoCaDa02, SoCaDa03, SoCaDa04, SoCaDa05, SoCaDa06, SoCaDa07, SoCaDa08, or SoCaDa09.
In some embodiments, the deaminase (e.g., the methyl-sensitive deaminase or the methyl-insensitive deaminase) comprises a mutant deaminase or an alternatively truncated deaminase.
[0353] Table 4 summarizes exemplary deamination methods and the types of modified bases detectable with each. These methods are described in more detail below.
Table 4 - Exemplary deamination methods
[0354] As outlined below, there are various methods of detecting and/or identifying modified nucleosides that rely on a conversion procedure that changes the base-pairing specificity of a nucleoside, based on the modification status of the nucleosides. These changes of base-pairing specificity can then be detected, and thus the modification status of the nucleoside inferred, by sequencing.
[0355] In some embodiments, the conversion procedure used in the methods of the disclosure is one that changes the base pairing specificity of a modified nucleoside (e.g., methylated cytosine) but does not change the base pairing specificity of the corresponding unmodified nucleoside (e.g., cytosine), or does not change the base pairing specificity of any unmodified nucleoside (e.g., cytosine, adenosine, guanosine, and thymidine (or uracil)). Advantages of methods that do not change the base-pairing specificity of unmodified nucleosides include reduced loss of sequence complexity, higher sequencing efficiency, and reduced alignment losses. Additionally, methods such as TAPS may in some cases be preferred over methods such as bisulfite sequencing and EM-seq because they are less destructive (especially important for low-yield samples such as cfDNA or FFPE samples) and do not require denaturation, meaning that non-conversion errors are theoretically more likely to be random. In methods that require denaturation for conversion, failure to denature a DNA molecule will result in non-conversion of all bases in the DNA molecule. Because biological changes in methylation are predominantly concentrated in localized regions of interest, these non-random (localized) non-conversion events can appear as false negatives (non-methylated regions). By contrast, random non-conversion can affect at most a small percentage of the bases within a region, so the specificity of methylation change detection can be maximized (reducing false positives) by placing a threshold on the percentage of bases within a region that are methylated or non-methylated. Hence, in some cases, a conversion procedure that does not involve denaturation can be preferred.
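The thresholding argument can be illustrated with a short Python sketch. Each molecule is represented by its per-CpG calls within a region (True = read as methylated), modeled for a conversion that changes modified cytosines (as in TAPS), so a methylated CpG that fails to convert reads as unmethylated. The 0.8 threshold and the function name are assumptions chosen for illustration, not values from the disclosure.

```python
def call_molecule(cpg_methylated: list[bool], threshold: float = 0.8) -> bool:
    """Call a molecule methylated if its fraction of methylated CpGs meets threshold.

    cpg_methylated: one entry per CpG in the region (True = read as methylated).
    threshold: hypothetical cutoff used only for this illustration.
    """
    if not cpg_methylated:
        raise ValueError("molecule covers no CpGs in the region")
    fraction = sum(cpg_methylated) / len(cpg_methylated)
    return fraction >= threshold


# A fully methylated molecule covering 10 CpGs.
truth = [True] * 10

# Random non-conversion: a single CpG fails to convert and reads as unmethylated.
random_miss = truth.copy()
random_miss[3] = False

# Localized non-conversion (e.g., the molecule was never denatured): every CpG
# fails to convert, so under this conversion every CpG reads as unmethylated.
localized_miss = [False] * 10

print(call_molecule(random_miss))     # True: 9/10 still meets the threshold
print(call_molecule(localized_miss))  # False: the whole molecule becomes a false negative
```

A sporadic error shifts the per-molecule fraction by only 1/n, which a threshold absorbs; a molecule-wide failure cannot be absorbed this way.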
[0356] In other cases, the conversion procedure that can be used in the methods of the disclosure is one that changes the base pairing specificity of an unmodified nucleoside (e.g., cytosine) but does not change the base pairing specificity of the corresponding modified nucleoside (e.g., a modified cytosine such as 5mC and/or 5hmC). Such methods include, for example, bisulfite sequencing.
[0357] The skilled person can select a suitable method according to their needs, including which nucleoside modifications are to be detected and/or identified and which type of modified base is used in the end repair reaction.
[0358] In some embodiments, the conversion procedure converts modified nucleosides. In some embodiments, the conversion procedure which converts modified nucleosides comprises Tet-assisted conversion with a substituted borane reducing agent, optionally wherein the substituted borane reducing agent is 2-picoline borane, borane pyridine, tert-butylamine borane, ammonia borane, or pyridine borane. In Tet-assisted conversion with a substituted borane reducing agent, a TET protein is used to convert 5mC and 5hmC to 5caC without affecting unmodified C. 5caC, and 5fC if present, are then converted to dihydrouracil (DHU) by treatment with 2-picoline borane (pic-borane) or another substituted borane reducing agent such as borane pyridine, tert-butylamine borane, or ammonia borane, also without affecting unmodified C. See, e.g., Liu et al., Nature Biotechnology 2019; 37:424-429 (e.g., at Supplementary Fig. 1 and Supplementary Note 7). Thus, when this type of conversion is used, the first nucleobase comprises one or more of 5mC, 5fC, 5caC, or 5hmC, and the second nucleobase comprises unmodified cytosine. DHU is read as T in sequencing. Sequencing of the converted DNA identifies positions that are read as cytosine as being unmodified C positions. Meanwhile, positions that are read as T are identified as being T, 5mC, 5fC, 5caC, or 5hmC. Performing TAPS conversion, such as on a DNA sample as described herein, thus facilitates identifying positions containing unmodified C using the sequence reads obtained.
[0359] Hence, in these embodiments, the end repair reaction can be performed with dNTPs, wherein the at least one type of dNTP comprises a 5mC or 5hmC, and regions of the end-repaired DNA synthesized during the end repair reaction can be identified as those regions comprising 5mC or 5hmC (via T being called at positions which are C in the reference) at non-CpG positions. This procedure encompasses Tet-assisted pyridine borane sequencing (TAPS), described in further detail in Liu et al. 2019, supra. In this method, a TET enzyme is used to progressively oxidize 5mC and 5hmC to 5fC or 5caC, and pyridine borane then converts 5fC and 5caC to DHU, which is amplified as T.
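The read-level logic for locating end-repair-synthesized positions can be sketched in Python. The sketch assumes a gapless forward-strand alignment (read and reference of equal length, same coordinates), considers only reference cytosines not followed by G (non-CpG), and reports positions whose called base equals `marker_base`: "T" for TAPS-type conversion as described in this paragraph, or "C" for bisulfite-type conversion, where incorporated 5mC or 5hmC is protected. Function and parameter names are hypothetical; a real pipeline would also handle the reverse strand, indels, and base quality.

```python
def end_repair_marker_positions(reference: str, read: str, marker_base: str = "T") -> list[int]:
    """Return 0-based positions of non-CpG reference cytosines read as marker_base.

    Assumes `read` is gaplessly aligned to `reference` on the forward strand.
    marker_base="T": TAPS-type conversion (incorporated 5mC/5hmC is converted).
    marker_base="C": bisulfite-type conversion (incorporated 5mC/5hmC is protected).
    """
    if len(reference) != len(read):
        raise ValueError("sketch assumes a gapless, equal-length alignment")
    if marker_base not in ("C", "T"):
        raise ValueError("marker_base must be 'C' or 'T'")
    positions = []
    for i, ref_base in enumerate(reference):
        if ref_base != "C":
            continue
        is_cpg = i + 1 < len(reference) and reference[i + 1] == "G"
        if not is_cpg and read[i] == marker_base:
            positions.append(i)
    return positions


ref = "ACCGTCACAGCTA"
read = "ACCGTTATAGCTA"
print(end_repair_marker_positions(ref, read, marker_base="T"))  # [5, 7]
```

CpG positions are skipped because native CpG methylation would otherwise be confused with the incorporated marker.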
[0360] Alternatively, protection of 5hmC (e.g., using βGT or 5-hydroxymethylcytosine carbamoyltransferase) can be combined with Tet-assisted conversion with a substituted borane reducing agent, e.g., as described above. In this method (TAPSβ), 5hmC can be protected from conversion, for example through glucosylation using β-glucosyltransferase (βGT), forming 5-glucosylhydroxymethylcytosine (5ghmC), or through carbamoylation using 5-hydroxymethylcytosine carbamoyltransferase, forming 5cmC. This is described in Yu et al., Cell 2012; 149: 1368-80. Treatment with a TET protein such as mTet1 then converts 5mC to 5caC but does not convert C, 5ghmC, or 5cmC. 5caC is then converted to DHU by treatment with pic-borane or another substituted borane reducing agent such as borane pyridine, tert-butylamine borane, or ammonia borane, also without affecting 5ghmC, 5cmC, or unmodified C. Thus, when Tet-assisted conversion with a substituted borane reducing agent is combined with 5hmC protection, the first nucleobase comprises 5mC, and the second nucleobase comprises one or more of unmodified cytosine or 5hmC, such as unmodified cytosine and optionally 5hmC, 5fC, and/or 5caC. Sequencing of the converted DNA identifies positions that are read as cytosine as being either 5hmC or unmodified C positions. Meanwhile, positions that are read as T are identified as being T, 5fC, 5caC, or 5mC. Performing TAPSβ conversion on a sample as described herein thus facilitates distinguishing positions containing unmodified C or 5hmC on the one hand from positions containing 5mC using the sequence reads obtained. Hence, in these embodiments, the end repair reaction can be performed with dNTPs, wherein the at least one type of dNTP comprises a 5mC, and regions synthesized during the end repair reaction can be identified as those regions comprising 5mC (via T being called at positions which are C in the reference) at non-CpG positions.
For an exemplary description of this type of conversion, see, e.g., Liu et al., Nature Biotechnology 2019; 37:424-429. 5-hydroxymethylcytosine carbamoyltransferase is described in Yang et al., Bio-protocol, 2022; 12(17): e4496.
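To make the order of operations concrete, the Python sketch below models TAPS and its 5hmC-protected variant as chains of per-base transformations on symbolic cytosine states: β-glucosyltransferase protection of 5hmC, TET oxidation to 5caC, substituted-borane reduction of 5fC and 5caC to DHU, and a read-out in which DHU is read as T. It is a state-level model under the simplifying assumption that every step goes to completion; the state labels and function names are hypothetical.

```python
from collections.abc import Callable

Step = Callable[[str], str]


def protect_5hmc(state: str) -> str:
    # beta-glucosyltransferase: 5hmC -> 5ghmC, which TET does not oxidize
    return "5ghmC" if state == "5hmC" else state


def tet_oxidize(state: str) -> str:
    # TET: 5mC, 5hmC, and 5fC are oxidized (to completion, by assumption) to 5caC
    return "5caC" if state in {"5mC", "5hmC", "5fC"} else state


def borane_reduce(state: str) -> str:
    # substituted borane (e.g., pic-borane): 5fC and 5caC -> dihydrouracil
    return "DHU" if state in {"5fC", "5caC"} else state


def read_out(state: str) -> str:
    # DHU (like thymine) is read as T; every remaining cytosine form is read as C
    return "T" if state in {"DHU", "T"} else "C"


def convert(state: str, steps: list[Step]) -> str:
    """Apply each conversion step in order, then return the sequenced base."""
    for step in steps:
        state = step(state)
    return read_out(state)


TAPS = [tet_oxidize, borane_reduce]
TAPS_BETA = [protect_5hmc, tet_oxidize, borane_reduce]

for base in ["C", "5mC", "5hmC", "5fC", "5caC"]:
    print(base, convert(base, TAPS), convert(base, TAPS_BETA))
```

Only 5hmC changes its read-out when the protection step is prepended, which is the property that lets the protected variant separate 5mC from 5hmC.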
[0361] In some embodiments, the conversion procedure converts modified nucleosides. In some embodiments, the conversion procedure which converts modified nucleosides comprises chemical-assisted conversion with a substituted borane reducing agent, optionally wherein the substituted borane reducing agent is 2-picoline borane, borane pyridine, tert-butylamine borane, or ammonia borane. In chemical-assisted conversion with a substituted borane reducing agent, an oxidizing agent such as potassium perruthenate (KRuO4) (also suitable for use in Ox-BS conversion) is used to specifically oxidize 5hmC to 5fC. Treatment with pic-borane or another substituted borane reducing agent such as borane pyridine, tert-butylamine borane, or ammonia borane converts 5fC and 5caC to DHU but does not affect 5mC or unmodified C. Thus, when this type of conversion is used, the first nucleobase comprises one or more of 5hmC, 5fC, and 5caC, and the second nucleobase comprises one or more of unmodified cytosine or 5mC, such as unmodified cytosine and optionally 5mC. Sequencing of the converted DNA identifies positions that are read as cytosine as being either 5mC or unmodified C positions. Meanwhile, positions that are read as T are identified as being T, 5fC, 5caC, or 5hmC. Performing this type of conversion as described herein thus facilitates distinguishing positions containing unmodified C or 5mC on the one hand from positions containing 5hmC using the sequence reads obtained. Hence, in these embodiments, the end repair reaction can be performed with dNTPs, wherein at least one type of dNTP comprises a 5hmC, and regions synthesized during the end repair reaction can be identified as those regions comprising 5hmC (via T being called at positions which are C in the reference) at non-CpG positions. For an exemplary description of this type of conversion, see, e.g., Liu et al., Nature Biotechnology 2019; 37:424-429.
[0362] Exemplary conversion procedures that change the base-pairing specificity of modified cytosines have been described. However, the methods described herein could in principle use any modified nucleoside and suitable conversion procedure (i.e., single-base epigenetic conversion assay) that changes the base-pairing specificity of the modified nucleoside and thereby allows the modified base to be distinguished from the corresponding unmodified nucleoside and/or other types of modification when sequenced. For example, any conversion procedure could be used allowing any one of N6-methyladenine (6mA), N6-hydroxymethyladenine (6hmA), or N6-formyladenine (6fA) to be distinguished from unmodified adenosine.
[0363] In some embodiments, the conversion procedure converts unmodified nucleosides. In some embodiments, the conversion procedure which converts unmodified nucleosides comprises bisulfite conversion. Treatment with bisulfite converts unmodified cytosine and certain modified cytosine nucleotides (e.g., 5-formylcytosine (5fC) or 5-carboxylcytosine (5caC)) to uracil, whereas other modified cytosines (e.g., 5mC and 5hmC) are not converted. Thus, where bisulfite conversion is used, the first nucleobase comprises one or more of unmodified cytosine, 5fC, 5caC, or other cytosine forms affected by bisulfite, and the second nucleobase may comprise one or more of 5mC and 5hmC, such as 5mC and optionally 5hmC. Sequencing of bisulfite-treated DNA identifies positions that are read as cytosine as being 5mC or 5hmC positions. Meanwhile, positions that are read as T are identified as being T or a bisulfite-susceptible form of C, such as unmodified cytosine, 5fC, or 5caC. Thus, performing bisulfite conversion, such as on a DNA sample as described herein, facilitates identifying positions containing 5mC or 5hmC. Hence, in these embodiments, the end repair reaction can be performed with dNTPs, wherein at least one type of dNTP comprises a 5mC and/or a 5hmC, and regions synthesized during the end repair reaction can be identified as those regions comprising 5mC or 5hmC (via C being called at these positions) at non-CpG positions. For an exemplary description of bisulfite conversion, see, e.g., Moss et al., Nat Commun. 2018; 9: 5068.
[0364] In some embodiments, the procedure which converts unmodified nucleosides comprises oxidative bisulfite (Ox-BS) conversion. This procedure first converts 5hmC to 5fC, which is bisulfite susceptible, followed by bisulfite conversion. Thus, when oxidative bisulfite conversion is used, the first nucleobase comprises one or more of unmodified cytosine, 5fC, 5caC, 5hmC, or other cytosine forms affected by bisulfite, and the second nucleobase comprises 5mC. Sequencing of Ox-BS converted DNA identifies positions that are read as cytosine as being 5mC positions. Meanwhile, positions that are read as T are identified as being T or a bisulfite-susceptible form of C, such as unmodified cytosine, 5fC, or 5hmC. Hence, in these embodiments, the end repair reaction can be performed with dNTPs, wherein at least one type of dNTP comprises a 5mC, and regions synthesized during the end repair reaction can be identified as those regions comprising 5mC (via C being called at these positions) at non-CpG positions. Performing Ox-BS conversion thus facilitates identifying positions containing 5mC. For an exemplary description of oxidative bisulfite conversion, see, e.g., Booth et al., Science 2012; 336: 934-937.
[0365] In some embodiments, the procedure which converts unmodified nucleosides comprises Tet-assisted bisulfite (TAB) conversion. In TAB conversion, 5hmC is protected from conversion and 5mC is oxidized in advance of bisulfite treatment, so that positions originally occupied by 5mC are converted to U while positions originally occupied by 5hmC remain as a protected form of cytosine. For example, as described in Yu et al., Cell 2012; 149: 1368-80, β-glucosyltransferase can be used to protect 5hmC (forming 5-glucosylhydroxymethylcytosine (5ghmC)), then a TET protein such as mTet1 can be used to convert 5mC to 5caC, and then bisulfite treatment can be used to convert C and 5caC to U while 5ghmC remains unaffected.
[0366] Alternatively, a carbamoyltransferase enzyme, such as 5-hydroxymethylcytosine carbamoyltransferase as described in Yang et al., Bio-protocol, 2022; 12(17): e4496, can be used to protect 5hmC (by converting 5hmC to 5-carbamoyloxymethylcytosine (5cmC)), then a TET protein such as mTet1 can be used to convert 5mC to 5caC, and then bisulfite treatment can be used to convert C and 5caC to U while 5cmC remains unaffected. Thus, when TAB conversion is used, the first nucleobase comprises one or more of unmodified cytosine, 5fC, 5caC, 5mC, or other cytosine forms affected by bisulfite, and the second nucleobase comprises 5hmC. Sequencing of TAB-converted DNA identifies positions that are read as cytosine as being 5hmC positions. Meanwhile, positions that are read as T are identified as being T, or a bisulfite-susceptible form of C, such as unmodified cytosine, 5mC, 5fC, or 5caC. Performing TAB conversion on a first subsample as described herein thus facilitates identifying positions containing 5hmC. Hence, in these embodiments, the end repair reaction can be performed with dNTPs, wherein at least one type of dNTP comprises a 5hmC, and regions synthesized during the end repair reaction can be identified as those regions comprising 5hmC (via C being called at these positions) at non-CpG positions.
[0367] In some embodiments, the conversion procedure which converts unmodified cytosines comprises APOBEC-coupled epigenetic (ACE) conversion. In ACE conversion, an AID/APOBEC family DNA deaminase enzyme such as APOBEC3A (A3A) is used to deaminate unmodified cytosine and 5mC without deaminating 5hmC, 5fC, or 5caC. Thus, when ACE conversion is used, the first nucleobase comprises unmodified C and/or 5mC (e.g., unmodified C and optionally 5mC), and the second nucleobase comprises 5hmC. Sequencing of ACE-converted DNA identifies positions that are read as cytosine as being 5hmC, 5fC, or 5caC positions. Meanwhile, positions that are read as T are identified as being T, unmodified C, or 5mC. Performing ACE conversion as described herein thus facilitates distinguishing positions containing 5hmC from positions containing 5mC or unmodified C using the sequence reads obtained from the first subsample. In some embodiments, the end repair reaction can be performed with dNTPs, wherein at least one type of dNTP comprises a 5hmC, and regions synthesized during the end repair reaction can be identified as those regions comprising 5hmC (via C being called at these positions) at non-CpG positions. For an exemplary description of ACE conversion, see, e.g., Schutsky et al., Nature Biotechnology 2018; 36: 1083-1090.
[0368] In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample comprises enzymatic conversion of the first nucleobase, e.g., as in EM-Seq. See, e.g., Vaisvila R, et al. (2019) EM-seq: Detection of DNA methylation at single base resolution from picograms of DNA. bioRxiv, DOI: 10.1101/2019.12.20.884692, available at www.biorxiv.org/content/10.1101/2019.12.20.884692v1. For example, TET2 and T4-βGT or 5-hydroxymethylcytosine carbamoyltransferase (described in Yang et al., Bio-protocol, 2022; 12(17): e4496) can be used to convert 5mC and 5hmC into substrates that cannot be deaminated by a deaminase (e.g., APOBEC3A), and then a deaminase (e.g., APOBEC3A) can be used to deaminate unmodified cytosines, converting them to uracils.
[0369] In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA comprises enzymatic conversion of the first nucleobase using a non-specific, modification-sensitive double-stranded DNA deaminase, e.g., as in SEM-seq. See, e.g., Vaisvila et al. (2023) Discovery of novel DNA cytosine deaminase activities enables a nondestructive single-enzyme methylation sequencing method for base resolution high-coverage methylome mapping of cell-free and ultra-low input DNA. bioRxiv; DOI: 10.1101/2023.06.29.547047, available at https://www.biorxiv.org/content/10.1101/2023.06.29.547047v1. SEM-seq employs a nonspecific, modification-sensitive double-stranded DNA deaminase (MsddA) in a nondestructive single-enzyme 5-methylcytosine sequencing (SEM-seq) method that deaminates unmodified cytosines. Accordingly, SEM-seq does not require the TET2 and T4-βGT or 5-hydroxymethylcytosine carbamoyltransferase protection and denaturing steps that are of use, e.g., in APOBEC3A-based protocols. Additionally, MsddA does not deaminate 5-formylcytosine (5fC) or 5-carboxylcytosine (5caC). In SEM-seq, unmodified cytosines in the DNA are deaminated to uracil, which is read as “T” during sequencing. Modified cytosines (e.g., 5mC) are not converted and are read as “C” during sequencing. Positions that are read as T are identified as unmodified (e.g., unmethylated) cytosines or as thymines in the DNA. Performing SEM-seq conversion thus facilitates identifying positions containing 5mC using the sequence reads obtained. In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA comprises enzymatic conversion of unmodified cytosine using MsddA or a modification-sensitive DNA deaminase A (MsddA)-like deaminase. For an exemplary description of MsddA and MsddA-like deaminases, see, e.g., Vaisvila et al. Mol Cell.
2024 Mar 7;84(5):854-866.e7, which illustrates in Fig. 2A-C that MsddA-like deaminases have reduced activity on each of 5mC, 5hmC, and 5gmC relative to unmodified cytosine in dsDNA, e.g., a reduction of about 75%, 80%, or more on each of 5mC, 5hmC, and 5gmC relative to unmodified cytosine (e.g., using assay conditions as described in Vaisvila et al., such as analysis of deamination of C in E. coli or lambda dcm- DNA, deamination of 5mC in XP12 phage DNA, deamination of 5hmC in a C-hydroxymethylated adenovirus PCR fragment or fully C-hydroxymethylated T4147 phage DNA, and deamination of 5gmC in alpha-glucosyltransferase knockout (AGT-) T4 phage DNA). Deamination can be performed by contacting substrate DNA with deaminase, and the products can be analyzed using NGS as follows: 50 ng of unmodified E. coli C2566 genomic DNA can be combined with the control DNAs (about 1 ng of Lambda, XP12, and T4147, and 0.1 ng of the 5hmC adenovirus PCR fragment), sheared to about 300 bp, ligated to pyrrolo-dC adapters, and treated with 1 µL of in vitro synthesized deaminase (e.g., synthesized using the PURExpress In Vitro Protein Synthesis kit (NEB, Ipswich, MA) following the manufacturer’s recommendations with 100-400 ng of PCR fragment template DNA containing a codon-optimized deaminase coding sequence and T7 promoter and terminator). Exemplary deamination reaction conditions are 50 mM Bis-Tris pH 6.0, 0.1% Triton X-100 for 1 hour at 37 degrees C. After the deamination reaction, 1 µL of Thermolabile Proteinase K (NEB, Ipswich, MA) can be added and incubated for 30 min at 37 degrees C, and the Proteinase K can then be heat inactivated at 60 degrees C for 10 minutes. The deaminated product can then be used for library amplification using the NEBNext Q5U Master Mix (New England Biolabs, Ipswich, MA, USA) with 5 mM of NEBNext Unique Dual Index Primers.
The resulting library can be purified using 1X NEBNext Sample Purification Beads according to the manufacturer’s instructions, and the purified library can be analyzed and quantified on an Agilent Bioanalyzer 2100 High Sensitivity DNA chip. The libraries can be sequenced using the Illumina NextSeq and NovaSeq platforms. Paired-end sequencing of 75 cycles (2 x 75 bp) can be performed for all the sequencing runs. Base calling and demultiplexing can be carried out with the standard Illumina pipeline.
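The activity criterion in this paragraph, a reduction of about 75%, 80%, or more on each modified cytosine relative to unmodified cytosine, reduces to a simple ratio. The Python sketch below computes the percent reduction from per-substrate deamination fractions (for example, the share of C read as T in the unmodified control DNA versus the share of 5mC read as T in the methylated control). The input values are made-up placeholders, not data from Vaisvila et al., and the 75% default cutoff simply mirrors the lower figure quoted above; the function names are hypothetical.

```python
def percent_reduction(modified_rate: float, unmodified_rate: float) -> float:
    """Percent reduction in deamination of a modified C relative to unmodified C."""
    if unmodified_rate <= 0:
        raise ValueError("unmodified cytosine must show measurable deamination")
    return 100.0 * (1.0 - modified_rate / unmodified_rate)


def is_msdda_like(rates: dict[str, float], cutoff: float = 75.0) -> bool:
    """True if activity on each of 5mC, 5hmC, and 5gmC is reduced by at least cutoff %."""
    c_rate = rates["C"]
    return all(
        percent_reduction(rates[mod], c_rate) >= cutoff for mod in ("5mC", "5hmC", "5gmC")
    )


# Hypothetical deamination fractions (share of each base read as T after treatment).
example = {"C": 0.95, "5mC": 0.02, "5hmC": 0.05, "5gmC": 0.01}
print(round(percent_reduction(example["5hmC"], example["C"]), 1))  # 94.7
print(is_msdda_like(example))  # True
```

Normalizing to the unmodified-C rate makes the criterion insensitive to overall enzyme yield, which varies between in vitro synthesis batches.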
[0370] In some embodiments, the conversion procedure converts modified nucleosides. In some embodiments, the conversion procedure which converts modified nucleosides comprises enzymatic conversion, such as DM-seq, for example, as described in Wang et al., Nat Chem Biol. 2023, 19(8): 1004-1012 and WO2023/288222A1. In DM-seq, unmodified cytosines in the DNA are enzymatically protected from a subsequent deamination step in which 5mC in a 5mCpG context is converted to T. The enzymatically protected unmodified (e.g., unmethylated) cytosines are not converted and are read as “C” during sequencing. Positions that are read as T (in a CpG context) are identified as methylated cytosines in the DNA.
[0371] Thus, when this type of conversion is used, the first nucleobase comprises unmodified (such as unmethylated) cytosine, and the second nucleobase comprises modified (such as methylated) cytosine. Sequencing of the converted DNA identifies positions that are read as cytosine as being unmodified C positions. Meanwhile, positions that are read as T are identified as being T or 5mC. Performing DM-seq conversion thus facilitates identifying positions containing 5mC using the sequence reads obtained.
[0372] Exemplary cytosine deaminases for use herein include APOBEC enzymes, for example, APOBEC3A. Generally, AID/APOBEC family DNA deaminase enzymes such as APOBEC3A (A3A) are used to deaminate (unprotected) unmodified cytosine and 5mC. For an exemplary description of APOBEC enzymes, see, e.g., Gajula et al., Nucleic Acids Res. 2014 Sep;42(15):9964-75 and Schutsky et al., Nucleic Acids Res. 2017 Jul 27;45(13):7655-7665. For an exemplary description of APOBEC conversion, see, e.g., Schutsky et al., Nature Biotechnology 2018; 36: 1083-1090.
[0373] The enzymatic protection of unmodified cytosines in the DNA comprises addition of a protective group to the unmodified cytosines. Such protective groups can comprise an alkyl group, an alkyne group, a carboxyl group, a carboxyalkyl group, an amino group, a hydroxymethyl group, a glucosyl group, a glucosylhydroxymethyl group, an isopropyl group, or a dye. For example, DNA can be treated with a methyltransferase, such as a CpG-specific methyltransferase, which adds the protective group to unmodified cytosines. The term methyltransferase is used broadly herein to refer to enzymes capable of transferring a methyl or substituted methyl (e.g., carboxymethyl) group to a substrate (e.g., a cytosine in a nucleic acid). In some embodiments, the DNA is contacted with a CpG-specific DNA methyltransferase (MTase), such as a CpG-specific carboxymethyltransferase (CxMTase), and a substituted methyl donor, such as a carboxymethyl donor (e.g., carboxymethyl-S-adenosyl-L-methionine). See, e.g., WO2021/236778A2. In particular embodiments, the CxMTase can facilitate the addition of a protective carboxymethyl group to an unmethylated cytosine. In some embodiments, the unmethylated cytosine is unmodified cytosine. The carboxymethyl group can prevent deamination of the cytosine during a deamination step (such as a deamination step using an APOBEC enzyme, such as A3A). Substituted methyl or carboxymethyl donors useful in the disclosed methods include, but are not limited to, S-adenosyl-L-methionine (SAM) analogs, optionally wherein the SAM analog is carboxy-S-adenosyl-L-methionine (CxSAM). SAM analogs are described, for example, in WO2022/197593A1. The MTase may be, for example, a CpG methyltransferase from Spiroplasma sp. strain MQ1 (M.SssI), DNA-methyltransferase 1 (DNMT1), DNA-methyltransferase 3 alpha (DNMT3A), DNA-methyltransferase 3 beta (DNMT3B), or DNA adenine methyltransferase (Dam).
The CxMTase may be a CpG methyltransferase from Mycoplasma penetrans (M.MpeI).
[0374] In one embodiment, the methyltransferase enzyme is a variant of M.MpeI having an N374R substitution or an N374K substitution. The methyltransferase can further comprise one or more amino acid substitutions selected from a) substitution of one or both residues T300 and E305 with S, A, G, Q, D, or N; b) substitution of one or more residues A323, N306, and Y299 with a positively charged amino acid selected from K, R, or H; and/or c) substitution of S323 with A, G, K, R, or H, which may enhance the activity of the enzyme.
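Substitutions such as N374R use the common notation of wild-type residue, 1-based position, and replacement residue. The Python sketch below parses that notation and applies substitutions to a protein sequence, refusing any substitution whose stated wild-type residue does not match the sequence. The sequence used is a short made-up placeholder, not the M.MpeI sequence, and the function names are hypothetical.

```python
import re

# One-letter codes for the 20 standard amino acids.
SUBSTITUTION = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def parse_substitution(code: str) -> tuple[str, int, str]:
    """Split a code such as 'N374R' into ('N', 374, 'R')."""
    match = SUBSTITUTION.match(code)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {code!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_substitutions(sequence: str, codes: list[str]) -> str:
    """Apply substitutions, checking each wild-type residue at its 1-based position."""
    residues = list(sequence)
    for code in codes:
        wild_type, position, replacement = parse_substitution(code)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{code}: position outside the sequence")
        found = residues[position - 1]
        if found != wild_type:
            raise ValueError(f"{code}: expected {wild_type} at {position}, found {found}")
        residues[position - 1] = replacement
    return "".join(residues)


# Placeholder 8-residue sequence (hypothetical; not M.MpeI).
toy = "MTNYSEGK"
print(parse_substitution("N374R"))                 # ('N', 374, 'R')
print(apply_substitutions(toy, ["N3R", "E6Q"]))    # MTRYSQGK
```

Checking the wild-type residue before substituting catches numbering mismatches, for example when a construct carries an extra N-terminal tag.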
[0375] Optionally, the conversion procedure further includes enzymatic protection of 5hmCs, such as by glucosylation of the 5hmCs (e.g., using βGT) or by carbamoylation of the 5hmCs (e.g., using 5-hydroxymethylcytosine carbamoyltransferase), in the DNA prior to the deamination of unprotected modified cytosines. In this method, 5hmC can be protected from conversion, for example through glucosylation using β-glucosyltransferase (βGT), forming 5-glucosylhydroxymethylcytosine (5ghmC), or through carbamoylation using 5-hydroxymethylcytosine carbamoyltransferase, forming 5cmC. This is described, for example, in Yu et al., Cell 2012; 149: 1368-80, and in Yang et al., Bio-protocol, 2022; 12(17): e4496. Glucosylation or carbamoylation of 5hmC can reduce or eliminate deamination of 5hmC by a deaminase such as APOBEC3A. Treatment with an MTase or CxMTase then adds a protecting group to unmodified (unmethylated) cytosines in the DNA. 5mC (but not protected unmodified cytosine, 5ghmC, or 5cmC) is then deaminated (converted to T) by treatment with a deaminase, for example, an APOBEC enzyme (such as APOBEC3A). Sequencing of the converted DNA identifies positions that are read as cytosine as being either 5hmC or unmodified C positions. Meanwhile, positions that are read as T are identified as being T or 5mC. Performing DM-seq conversion with protection of 5hmC on a sample as described herein thus facilitates distinguishing positions containing unmodified C or 5hmC on the one hand from positions containing 5mC using the sequence reads obtained.
[0376] Also provided herein are methods in which alternative base conversion schemes can be used. For example, unmethylated cytosines can be left intact while methylated cytosines and hydroxymethylcytosines are converted to a base read as a thymine (e.g., uracil, thymine, or dihydrouracil).
[0377] In some embodiments, methylating a cytosine in at least one first complementary strand or second complementary strand comprises contacting the cytosine with a methyltransferase such as DNMT1 or DNMT5. In such embodiments, the step of oxidizing a 5-hydroxymethylcytosine to 5-formylcytosine (such as by contacting the 5-hydroxymethylcytosine in a first strand and a second strand with KRuO4) can be optional.
[0378] In some embodiments, converting the modified cytosine in at least one first or second strand to a thymine or a base read as thymine comprises oxidizing a hydroxymethylcytosine, e.g., the hydroxymethylcytosine is oxidized to formylcytosine. In some embodiments, oxidizing the hydroxymethylcytosine to formylcytosine comprises contacting the hydroxymethylcytosine with a ruthenate, such as potassium perruthenate (KRuO4).
[0379] In some embodiments, the modified cytosine is converted to thymine, uracil, or dihydrouracil. In any such embodiments, amplification methods may comprise uracil- and/or dihydrouracil-tolerant amplification methods, such as PCR using a uracil- and/or dihydrouracil-tolerant DNA polymerase.
[0380] In some embodiments, the method comprises converting a formylcytosine and/or a methylcytosine to carboxylcytosine as part of converting the modified cytosine in at least one first or second strand to a thymine or a base read as thymine. For example, converting the formylcytosine and/or the methylcytosine to carboxylcytosine can comprise contacting the formylcytosine and/or the methylcytosine with a TET enzyme, such as TET1, TET2, or TET3. In some embodiments, the method comprises reducing the carboxylcytosine as part of converting the modified cytosine in at least one first or second strand to a thymine or a base read as thymine, and/or the carboxylcytosine is reduced to dihydrouracil. In some embodiments, reducing the carboxylcytosine comprises contacting the carboxylcytosine with a borane or borohydride reducing agent.
[0381] In some embodiments, the borane or borohydride reducing agent comprises pyridine borane, 2-picoline borane, borane, tert-butylamine borane, ammonia borane, sodium borohydride, sodium cyanoborohydride (NaBH3CN), lithium borohydride (LiBH4), ethylenediamine borane, dimethylamine borane, sodium triacetoxyborohydride, morpholine borane, 4-methylmorpholine borane, trimethylamine borane, dicyclohexylamine borane, or a salt thereof. In other embodiments, the reducing agent comprises lithium aluminum hydride, sodium amalgam, amalgam, sulfur dioxide, dithionate, thiosulfate, iodide, hydrogen peroxide, hydrazine, diisobutylaluminum hydride, oxalic acid, carbon monoxide, cyanide, ascorbic acid, formic acid, dithiothreitol, beta-mercaptoethanol, or any combination thereof.
[0382] As discussed above, in some embodiments, a TET protein can be used to convert 5mC and optionally 5hmC (but not unmodified C) into substrates (e.g., 5caC) that cannot be deaminated by a deaminase, and then a deaminase (e.g., APOBEC3A) can be used to deaminate unmodified cytosines, converting them to uracils. Various TET enzymes may be used in the disclosed methods as appropriate. In some embodiments, the one or more TET enzymes comprise TETv. TETv is described in US Patent 10,260,088 and its sequence is SEQ ID NO: 1 therein. In some embodiments, the one or more TET enzymes comprise TETcd. TETcd is described in US Patent 10,260,088 and its sequence is SEQ ID NO: 3 therein. In some embodiments, the one or more TET enzymes comprise TET1. In some embodiments, the one or more TET enzymes comprise TET2. TET2 may be expressed and used as a fragment comprising TET2 residues 1129-1480 joined to TET2 residues 1844-1936 by a linker as described, e.g., in US Patent 10,961,525. In some embodiments, the one or more TET enzymes comprise TET1 and TET2. In some embodiments, the one or more TET enzymes comprise a T1372 TET mutant, such as T1372S. In some embodiments, the one or more TET enzymes comprise a V1900 TET mutant, such as a V1900A, V1900C, V1900G, V1900I, or V1900P TET mutant. In some embodiments, the one or more TET enzymes comprise a V1900 TET2 mutant, such as a V1900A, V1900C, V1900G, V1900I, or V1900P TET2 mutant. It can be beneficial to use a TET enzyme that maximizes formation of 5-carboxylcytosine (5-caC) relative to less oxidized modified cytosines, particularly 5-formylcytosine, because 5-caC is not a substrate for enzymatic deamination, e.g., by APOBEC enzymes such as APOBEC3A. Maximizing formation of 5-caC thus reduces the risk of false calls in which a base is identified as unmethylated because it underwent deamination even though it was methylated (or hydroxymethylated) in the original sample.
Accordingly, in some embodiments, the TET enzyme comprises a mutation that increases formation of 5-caC. Exemplary mutations are set forth above. “A mutation that increases formation of 5-caC” means that the TET enzyme having the mutation produces more 5-caC than a TET enzyme that lacks the mutation but is otherwise identical. 5-caC production can be measured as described, e.g., in Liu et al., Nat Chem Biol 13: 181-187 (2017) (see Online Methods section, TET reactions in vitro subsection, “driving” conditions). Any variants and / or mutants described in Liu et al. (2017) can be used in the disclosed methods as appropriate.
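To make the consequence of [0382] concrete, here is a minimal sketch of per-cytosine calling after TET oxidation followed by APOBEC3A-style deamination: a reference C read as T is called unmodified, and a reference C read as C is called modified. It assumes a read already aligned base-for-base to the reference (no indels), the top strand only, and no C>T variants at the positions examined; `call_methylation` and `fraction_methylated` are hypothetical helper names.

```python
def call_methylation(ref: str, read: str) -> dict[int, bool]:
    """Call each reference cytosine as modified (True) or unmodified (False).

    Under this chemistry, unmodified C is deaminated to U (read as T), while
    5mC/5hmC protected as 5caC are read as C. Positions where the read shows
    neither C nor T (e.g., a sequencing error) are skipped.
    """
    if len(ref) != len(read):
        raise ValueError("ref and read must have equal length (no indels assumed)")
    calls = {}
    for i, (r, q) in enumerate(zip(ref.upper(), read.upper())):
        if r != "C":
            continue
        if q == "C":
            calls[i] = True   # protected from deamination -> modified
        elif q == "T":
            calls[i] = False  # deaminated -> unmodified
    return calls


def fraction_methylated(calls: dict[int, bool]) -> float:
    """Fraction of informative cytosines called as modified."""
    return sum(calls.values()) / len(calls) if calls else float("nan")


calls = call_methylation("ACGTACGTCA", "ACGTATGTTA")
print(calls)
print(fraction_methylated(calls))
```

A residual 5-formylcytosine that is deaminated would show up here as T and be called unmodified; this is the false call that maximizing 5-caC formation is intended to reduce.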
[0383] In some embodiments, the one or more TET enzymes comprise a TET2 enzyme comprising a T1372S mutation, such as TET2-CS-T1372S and TET2-CD-T1372S. A TET2 comprising a T1372S mutation is described in US Patent 10,961,525 and may be expressed and used as a fragment comprising TET2 residues 1129-1480 joined to TET2 residues 1844-1936 by a linker. Position 1372 of TET2 corresponds to position 258 of SEQ ID NO: 21 (wild type TET2 catalytic domain) of US Patent 10,961,525. Thus, the sequence of a T1372S TET2 catalytic domain may be obtained by changing the threonine at position 258 of SEQ ID NO: 21 of US Patent 10,961,525 to serine. TET2 comprising a T1372S mutation is also described in Liu et al., Nat Chem Biol. 2017 February; 13(2): 181-187. As demonstrated in Liu et al., TET2 comprising a T1372S mutation can more efficiently oxidize 5mC to produce 5-carboxylcytosine (5-caC) than other versions of TET2 such as TET2 lacking a T1372S mutation.
[0384] In some embodiments, the deaminase is thermally inactivated after contacting DNA with the deaminase. In some embodiments, the thermal inactivation comprises heating or cooling of the deaminase to a temperature at which the deaminase has reduced or inhibited activity relative to a deaminase that has not been subjected to heating or cooling. In some embodiments, the thermal inactivation completely inhibits the activity of the deaminase or reduces the activity of the deaminase by at least about 5%, about 10%, about 15%, about 20%, about 25%, about 50%, about 75%, about 90%, about 95%, about 98%, about 99%, or 100% relative to a deaminase that has not been subjected to heating or cooling.
F. Contacting DNA with a methylation-sensitive or methylation-dependent nuclease
[0385] In some embodiments, a DNA sample or a subsample thereof (e.g., a first, second, or third subsample prepared by partitioning a sample as described herein, such as on the basis of a level of a cytosine modification, such as methylation, e.g., 5-methylation of cytosine) is contacted with a methylation-dependent nuclease or methylation-sensitive nuclease. The contacting can be performed using a subsample of a sample that has been divided into a plurality of subsamples as disclosed herein, and / or using a sample that has been partitioned into a plurality of subsamples as disclosed herein. Unless otherwise indicated, where partitioning is performed on the basis of a cytosine modification, the first subsample is the subsample with a higher level of the modification; the second subsample is the subsample with a lower level of the modification; and, when present, the third subsample has a level of the modification intermediate between the first and second subsamples.
[0386] In some embodiments, methods herein comprise contacting DNA with a methylation-sensitive nuclease, thereby degrading DNA comprising unmethylated sequences or sequences having low levels of methylation. In some such embodiments, the methylation-sensitive nuclease is a methylation-sensitive restriction enzyme (MSRE), thereby degrading DNA comprising an unmethylated recognition site of the MSRE. Methylation-sensitive nucleases can thus be used in methods herein comprising one or more steps that deplete unmodified or unmethylated sequences, such as those that are prevalent in cfDNA from a subject.
[0387] In some embodiments, methods herein comprise contacting DNA with a methylation-dependent nuclease, thereby degrading DNA comprising methylated sequences or sequences having high levels of methylation. In some such embodiments, the methylation-dependent nuclease is a methylation-dependent restriction enzyme (MDRE), thereby degrading DNA comprising a methylated recognition site of the MDRE. Methylation-dependent nucleases can thus be used in methods herein comprising one or more steps that deplete modified or methylated sequences, such as those that are prevalent in cfDNA from a subject.
[0388] As discussed above, partitioning procedures may result in imperfect sorting of DNA molecules among the subsamples. The choice of a methylation-dependent nuclease or methylation-sensitive nuclease can be made so as to degrade nonspecifically partitioned DNA. For example, the second subsample can be contacted with a methylation-dependent nuclease, such as a methylation-dependent restriction enzyme. This can degrade nonspecifically partitioned DNA in the second subsample (e.g., methylated DNA) to produce a treated second subsample. Alternatively or in addition, the first subsample can be contacted with a methylation-sensitive endonuclease, such as a methylation-sensitive restriction enzyme, thereby degrading nonspecifically partitioned DNA in the first subsample to produce a treated first subsample. Degradation of nonspecifically partitioned DNA in either or both of the first or second subsamples is proposed as an improvement to the performance of methods that rely on accurate partitioning of DNA on the basis of a cytosine modification, e.g., to detect the presence of aberrantly modified DNA in a sample, to determine the tissue of origin of DNA, and / or to determine whether a subject has cancer. For example, such degradation may provide improved sensitivity and / or simplify downstream analyses. In general, where nonspecifically partitioned DNA would be hypermethylated, such as in a hypomethylated partition, a methylation-dependent nuclease, such as a methylation-dependent restriction enzyme, should be used. Conversely, where nonspecifically partitioned DNA would be hypomethylated, such as in a hypermethylated partition, a methylation-sensitive nuclease, such as a methylation-sensitive restriction enzyme, should be used.
Methylation-dependent nucleases, such as methylation-dependent restriction enzymes, preferentially cut methylated DNA relative to unmethylated DNA, while methylation-sensitive nucleases, such as methylation-sensitive restriction enzymes, preferentially cut unmethylated DNA relative to methylated DNA.
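The selection rule in [0388] can be written as a small lookup: pick the nuclease class that cuts the kind of DNA expected to be a contaminant in each partition. This is a sketch only; the `Partition` enum, the `pooled_with` parameter, and the function name are hypothetical, and the enzyme names in the comments are the examples given in [0392].

```python
from enum import Enum


class Partition(Enum):
    HYPER = "hypermethylated"      # first subsample
    HYPO = "hypomethylated"        # second subsample
    INTERMEDIATE = "intermediate"  # optional third subsample


def nuclease_class_for(partition: Partition,
                       pooled_with: Partition = Partition.HYPER) -> str:
    """Return the nuclease class that degrades DNA nonspecifically sorted into `partition`.

    Contaminants in a hypomethylated partition are methylated, so a
    methylation-dependent nuclease (cuts methylated DNA) is chosen. Contaminants in
    a hypermethylated partition are unmethylated, so a methylation-sensitive
    nuclease (cuts unmethylated DNA) is chosen. An intermediate partition is
    treated like the partition it is pooled with.
    """
    if partition is Partition.INTERMEDIATE:
        if pooled_with is Partition.INTERMEDIATE:
            raise ValueError("pooled_with must be HYPER or HYPO")
        partition = pooled_with
    if partition is Partition.HYPO:
        return "methylation-dependent"  # e.g., FspEI
    return "methylation-sensitive"      # e.g., BstUI, HpaII, and Hin6I


for p in Partition:
    print(p.value, "->", nuclease_class_for(p))
```

Routing the intermediate partition through `pooled_with` mirrors the option, described later in this section, of pooling it with either the hypermethylated or the hypomethylated partition before digestion.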
[0389] In contacting a subsample with a nuclease, one or more nucleases can be used. In some embodiments, a subsample is contacted with a plurality of nucleases. The subsample may be contacted with the nucleases sequentially or simultaneously. Simultaneous use of nucleases may be advantageous when the nucleases are active under similar conditions (e.g., buffer composition) to avoid unnecessary sample manipulation. Contacting the second subsample with more than one methylation-dependent restriction enzyme can more completely degrade nonspecifically partitioned hypermethylated DNA. Similarly, contacting the first subsample with more than one methylation-sensitive restriction enzyme can more completely degrade nonspecifically partitioned hypomethylated and / or unmethylated DNA.
[0390] In some embodiments, a methylation-dependent nuclease comprises one or more of MspJI, LpnPI, FspEI, or McrBC. In some embodiments, at least two methylation-dependent nucleases are used. In some embodiments, at least three methylation-dependent nucleases are used. In some embodiments, the methylation-dependent nuclease comprises FspEI. In some embodiments, the methylation-dependent nuclease comprises FspEI and MspJI, e.g., used sequentially.
[0391] In some embodiments, a methylation-sensitive nuclease comprises one or more of AatII, AccII, AciI, Aor13HI, Aor15HI, BspT104I, BssHII, BstUI, Cfr10I, ClaI, CpoI, Eco52I, HaeII, HapII, HhaI, Hin6I, HpaII, HpyCH4IV, MluI, MspI, NaeI, NotI, NruI, NsbI, PmaCI, Psp1406I, PvuI, SacII, SalI, SmaI, and SnaBI. In some embodiments, at least two methylation-sensitive nucleases are used. In some embodiments, at least three methylation-sensitive nucleases are used. In some embodiments, the methylation-sensitive nucleases comprise BstUI and HpaII. In some embodiments, the two methylation-sensitive nucleases comprise HhaI and AccII. In some embodiments, the methylation-sensitive nucleases comprise BstUI, HpaII, and Hin6I.
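Why combining several methylation-sensitive enzymes helps can be seen by listing which recognition sites in a sequence would be cut. The sketch uses the recognition sequences of BstUI (CGCG), HpaII (CCGG), and Hin6I (GCGC), the triple named above. It is a simplified model, assumed for illustration: a site is cut only if none of the CpG cytosines inside it (on the given strand) is methylated, and hemimethylation, the opposite strand, and incomplete digestion are ignored. `predicted_cuts` is a hypothetical helper name.

```python
import re

# Top-strand recognition sequences of three CpG-containing MSREs named in the text.
MSRE_SITES = {"BstUI": "CGCG", "HpaII": "CCGG", "Hin6I": "GCGC"}


def predicted_cuts(seq: str, methylated_c: set[int]) -> list[tuple[str, int]]:
    """Return (enzyme, site_start) pairs expected to be cleaved, ordered by position.

    Simplified model: a site is cut only if no CpG cytosine inside it is listed
    in `methylated_c` (0-based positions on this strand).
    """
    seq = seq.upper()
    cuts = []
    for enzyme, site in MSRE_SITES.items():
        # A zero-width lookahead also finds overlapping occurrences (e.g., CGCGCG).
        for m in re.finditer(f"(?={site})", seq):
            start = m.start()
            cpg_cs = [start + i for i in range(len(site) - 1) if site[i:i + 2] == "CG"]
            if not any(pos in methylated_c for pos in cpg_cs):
                cuts.append((enzyme, start))
    return sorted(cuts, key=lambda c: c[1])


seq = "TTCCGGAACGCGTT"
print(predicted_cuts(seq, methylated_c=set()))       # unmethylated: both sites cut
print(predicted_cuts(seq, methylated_c={3, 8, 10}))  # CpGs methylated: no cuts
```

A molecule that lacks one enzyme's site may still carry another's, so a mixture of enzymes with different recognition sequences leaves fewer unmethylated contaminants intact.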
[0392] In some embodiments, FspEI is used for digesting the nucleic acid molecules in at least one subsample (e.g., a hypomethylated partition). In some embodiments, BstUI, HpaII, and Hin6I are used for digesting the nucleic acid molecules in at least one subsample (e.g., a hypermethylated partition) and FspEI is used for digesting the nucleic acid molecules in at least one other subsample (e.g., a hypomethylated partition). In embodiments involving an intermediately methylated partition, the nucleic acid molecules therein may be digested with a methylation-sensitive nuclease or a methylation-dependent nuclease. In some embodiments, the nucleic acid molecules in an intermediately methylated partition are digested with the same nuclease(s) as the hypermethylated partition. For example, the intermediately methylated partition may be pooled with the hypermethylated partition and then the pooled partitions may be subjected to digestion. In some embodiments, the nucleic acid molecules in an intermediately methylated partition are digested with the same nuclease(s) as the hypomethylated partition. For example, the intermediately methylated partition may be pooled with the hypomethylated partition and then the pooled partitions may be subjected to digestion.
[0393] In some embodiments, a subsample is contacted with a nuclease as described above after a step of tagging or attaching adapters to both ends of the DNA. The tags or adapters can be resistant to cleavage by the nuclease using any of the approaches described above. In this approach, cleavage can prevent the nonspecifically partitioned molecule from being carried through the analysis because the cleavage products lack tags or adapters at both ends.
[0394] Alternatively, a step of tagging or attaching adapters can be performed after cleavage with a nuclease as described above. Cleaved molecules can then be identified in sequence reads based on having an end (point of attachment to tag or adapter) corresponding to a nuclease recognition site. Processing the molecules in this way can also allow the acquisition of information from the cleaved molecule, e.g., observation of somatic mutations. When tagging or attaching adapters after contacting the subsample with a nuclease, and low molecular weight DNA such as cfDNA is being analyzed, it may be desirable to remove high molecular weight DNA (such as contaminating genomic DNA) from the sample before the contacting step. It may also be desirable to use nucleases that can be heat-inactivated at a relatively low temperature (e.g., 65°C or less, or 60°C or less) to avoid denaturing DNA, because denaturation may interfere with subsequent ligation steps.
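The read-level identification in [0394] amounts to checking whether a fragment end falls on an expected cut coordinate. A minimal sketch follows, using HpaII (which cuts C^CGG) as the example. Coordinates are 0-based and half-open, only the top strand is modeled, and the `tolerance` allowance for end-repair of overhangs is a hypothetical parameter, not a value from the text. Both function names are hypothetical.

```python
def site_cut_coordinates(ref: str, site: str, cut_offset: int) -> set[int]:
    """Top-strand cut coordinates (0-based, between bases) for every occurrence of `site`."""
    ref, site = ref.upper(), site.upper()
    return {i + cut_offset for i in range(len(ref) - len(site) + 1)
            if ref[i:i + len(site)] == site}


def flag_cleaved(fragments: list[tuple[int, int]], cut_sites: set[int],
                 tolerance: int = 0) -> list[bool]:
    """Flag fragments (start, end; half-open) with an end at or near a cut coordinate."""
    def near(x: int) -> bool:
        return any(abs(x - c) <= tolerance for c in cut_sites)
    return [near(start) or near(end) for start, end in fragments]


ref = "AAAACCGGAAAAAAAACCGGAAAA"
cuts = site_cut_coordinates(ref, "CCGG", cut_offset=1)  # HpaII: C^CGG
print(sorted(cuts))
print(flag_cleaved([(5, 17), (0, 12), (2, 10)], cuts))
```

Flagged fragments could then be excluded from methylation-based analysis while still being examined for somatic mutations, as the paragraph notes.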
[0395] Where a sample is partitioned into three subsamples, including a third subsample containing intermediately methylated molecules, the third subsample is in some embodiments contacted with a methylation-sensitive nuclease. Such a step may have any of the features described elsewhere herein with respect to contacting steps, and may be performed before or after a step of tagging or attaching adapters as discussed above. In some embodiments, the first and third subsamples are combined before being contacted with a methylation-sensitive nuclease. Such a step may have any of the features described elsewhere herein with respect to contacting steps, and may be performed before or after a step of tagging or attaching adapters as discussed above. In some embodiments, the first and third subsamples are differentially tagged before being combined.
[0396] Alternatively, where a sample is partitioned into three subsamples, including a third subsample containing intermediately methylated molecules, the third subsample is in some embodiments contacted with a methylation-dependent nuclease. Such a step may have any of the features described elsewhere herein with respect to contacting steps, and may be performed before or after a step of tagging or attaching adapters as discussed above. In some embodiments, the second and third subsamples are combined before being contacted with a methylation-dependent nuclease. Such a step may have any of the features described elsewhere herein with respect to contacting steps, and may be performed before or after a step of tagging or attaching adapters as discussed above. In some embodiments, the second and third subsamples are differentially tagged before being combined.
[0397] In some embodiments, the DNA is purified after being contacted with the nuclease, e.g., using SPRI beads. Such purification may occur after heat inactivation of the nuclease. Alternatively, purification can be omitted; thus, for example, a subsequent step such as amplification can be performed on the subsample containing heat-inactivated nuclease. In another embodiment, the contacting step can occur in the presence of a purification reagent such as SPRI beads, e.g., to minimize losses associated with tube transfers. After cleavage and heat inactivation, the SPRI beads can be re-used for cleanup by adding molecular crowding reagents (e.g., PEG) and salt.
[0398] In some embodiments, DNA fragmentation is detected by determining the endpoints and / or midpoints of sequenced fragments of DNA (e.g., cfDNA). For example, differences in fragmentation patterns may occur depending on whether the fragments originated from a tumor or from healthy cells. To detect tumor-derived DNA in cfDNA based on fragmentation, the presence or absence of an increased level of abnormal fragments can be determined at regions with copy-number amplifications (e.g., proportional to the degree of amplification), e.g., where the increase and abnormality are relative to control or healthy samples.
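A minimal sketch of the comparison in [0398]: assign fragments to a region by their midpoint, then compare the fraction of "abnormal" fragments in a sample against a control. Here "abnormal" is approximated as shorter than 150 bp; that cutoff, the made-up coordinates, and both function names are hypothetical stand-ins, since the text does not define which fragment property is abnormal.

```python
def fragment_stats(fragments: list[tuple[int, int]], region: tuple[int, int],
                   short_cutoff: int = 150) -> tuple[int, float]:
    """Summarize fragments (start, end; half-open) whose midpoint lies in a half-open region.

    Returns (number of fragments, fraction shorter than `short_cutoff`).
    """
    r_start, r_end = region
    lengths = [end - start for start, end in fragments
               if r_start <= (start + end) / 2 < r_end]
    if not lengths:
        return 0, float("nan")
    n_short = sum(1 for length in lengths if length < short_cutoff)
    return len(lengths), n_short / len(lengths)


def excess_short_fraction(sample, control, region, short_cutoff=150) -> float:
    """Short-fragment fraction in the sample minus that in a control, for one region."""
    _, f_sample = fragment_stats(sample, region, short_cutoff)
    _, f_control = fragment_stats(control, region, short_cutoff)
    return f_sample - f_control


# Made-up fragment coordinates around a hypothetical amplified region.
sample = [(1000, 1130), (1010, 1180), (1050, 1170), (1100, 1240)]
control = [(1000, 1167), (1020, 1187), (1040, 1200), (1100, 1240)]
region = (1000, 1200)
print(fragment_stats(sample, region))
print(excess_short_fraction(sample, control, region))
```

Using midpoints rather than overlap assigns each fragment to exactly one region, which keeps per-region counts comparable across samples.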
[0399] In some embodiments, where a modification-sensitive conversion is performed on a sample or subsample, the subsequent capturing of one or more target region sets (e.g., at least an epigenetic target region set) from that sample or subsample uses target-specific probes that comprise probes specific for a modification state (e.g., of at least one base in the sequence to which the probe hybridizes), e.g., complementary to target sequences that have undergone conversion (e.g., conversion of modified or unmodified cytosines to uracils or analogs thereof, such as DHU, that preferentially pair with adenine) or that have not undergone conversion, as desired. As such, the probes can be specific for sequences in which a modification of interest, such as methylation, was or was not present. In some embodiments, where a modification-sensitive conversion is performed on a sample or subsample, the subsequent capturing of one or more target region sets (e.g., at least an epigenetic target region set) from that sample or subsample uses target-specific probes that comprise probes that can hybridize to target sequences regardless of modification state (e.g., comprise a promiscuously pairing nucleobase at a position that may or may not have undergone conversion of modified or unmodified cytosines to uracils or analogs thereof, such as DHU, that preferentially pair with adenine; for example, inosine can pair with C or U).
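The two probe strategies in [0399] can be sketched as sequence operations: write the target as it would look after conversion (converted C written as T, since U and DHU pair with A), take the reverse complement, and optionally place inosine ("I") opposite cytosines whose conversion state is unknown. The sketch is chemistry-agnostic: the caller decides which cytosines convert (unmodified ones under deamination, modified ones under TET/borane). It assumes an A/C/G/T-only target, and the function names are hypothetical.

```python
from collections.abc import Set

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def converted_target(target: str, converted_c: Set[int]) -> str:
    """Target strand after conversion; converted C is written as T (U/DHU pair with A)."""
    return "".join("T" if b == "C" and i in converted_c else b
                   for i, b in enumerate(target.upper()))


def probe_for(target: str, converted_c: Set[int],
              promiscuous_c: Set[int] = frozenset()) -> str:
    """Probe (5'->3') complementary to the converted target.

    Target positions in `promiscuous_c` receive inosine ('I') in the probe, so the
    probe can pair there whether or not that cytosine was converted.
    """
    conv = converted_target(target, converted_c)
    probe = []
    for i in reversed(range(len(conv))):  # reverse complement, position by position
        probe.append("I" if i in promiscuous_c else COMPLEMENT[conv[i]])
    return "".join(probe)


target = "ACGTTCGA"                        # cytosines at positions 1 and 5
print(probe_for(target, set()))            # probe for the unconverted state
print(probe_for(target, {1, 5}))           # probe for the fully converted state
print(probe_for(target, {1}, {1, 5}))      # state-agnostic probe with inosines
```

The first two probes are state-specific, so capture enriches one methylation state; the inosine-containing probe is identical regardless of `converted_c`, which is the point of the promiscuous-base option.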
[0400] In some embodiments, the methods comprise preparing a pool comprising at least a portion of the DNA of the second subsample (also referred to as the hypomethylated partition) and at least a portion of the DNA of the first subsample (also referred to as the hypermethylated partition). Target regions, e.g., including epigenetic target regions and / or sequence-variable target regions, may be captured from the pool. The steps of capturing a target region set from at least a portion of a subsample described elsewhere herein encompass capture steps performed on a pool comprising DNA from the first and second subsamples. A step of amplifying DNA in the pool may be performed before capturing target regions from the pool. The capturing step may have any of the features described elsewhere herein.
[0401] The epigenetic target regions may show differences in methylation levels and / or fragmentation patterns depending on whether they originated from a tumor or from healthy cells, or what type of tissue they originated from, as discussed elsewhere herein. The sequence-variable target regions may show differences in sequence depending on whether they originated from a tumor or from healthy cells.
[0402] Analysis of epigenetic target regions from the hypomethylated partition may be less informative in some applications than analysis of sequence-variable target regions from the hypermethylated and hypomethylated partitions and epigenetic target regions from the hypermethylated partition. As such, in methods where sequence-variable target regions and epigenetic target regions are being captured, epigenetic target regions from the hypomethylated partition may be captured to a lesser extent than one or more of the sequence-variable target regions from the hypermethylated and hypomethylated partitions and epigenetic target regions from the hypermethylated partition. For example, sequence-variable target regions can be captured from the portion of the hypomethylated partition not pooled with the hypermethylated partition, and the pool can be prepared with some (e.g., a majority, substantially all, or all) of the DNA from the hypermethylated partition and none or some (e.g., a minority) of the DNA from the hypomethylated partition. Such approaches can reduce or eliminate sequencing of epigenetic target regions from the hypomethylated partition, thereby reducing the amount of sequencing data needed for further analysis.
[0403] In some embodiments, including a minority of the DNA of the hypomethylated partition in the pool facilitates quantification of one or more epigenetic features (e.g., methylation or other epigenetic feature(s) discussed in detail elsewhere herein), e.g., on a relative basis.
[0404] In some embodiments, the pool comprises a minority of the DNA of the hypomethylated partition, e.g., less than about 50% of the DNA of the hypomethylated partition, such as less than or equal to about 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 5%-25% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 10%-20% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 10% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 15% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 20% of the DNA of the hypomethylated partition.
[0405] In some embodiments, the pool comprises a portion of the hypermethylated partition, which may be at least about 50% of the DNA of the hypermethylated partition. For example, the pool may comprise at least about 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% of the DNA of the hypermethylated partition. In some embodiments, the pool comprises 50-55%, 55-60%, 60-65%, 65-70%, 70-75%, 75-80%, 80-85%, 85-90%, 90-95%, or 95-100% of the DNA of the hypermethylated partition. In some embodiments, the pool comprises all or substantially all of the hypermethylated partition.
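The percentages in [0404]-[0405] describe the fraction of each partition that goes into the pool, not the pool's final composition. The short sketch below converts one into the other. The input masses are made-up, and `pool_composition` is a hypothetical helper name.

```python
def pool_composition(hyper_ng: float, hypo_ng: float,
                     hyper_frac: float, hypo_frac: float) -> dict[str, float]:
    """Mass contributed by each partition to the pool, and the hypomethylated share.

    `hyper_frac` and `hypo_frac` are the fractions of each partition's DNA
    added to the pool (e.g., 1.0 for all of it, 0.15 for 15%).
    """
    if not (0 <= hyper_frac <= 1 and 0 <= hypo_frac <= 1):
        raise ValueError("fractions must be between 0 and 1")
    hyper_in = hyper_ng * hyper_frac
    hypo_in = hypo_ng * hypo_frac
    total = hyper_in + hypo_in
    return {
        "hyper_ng": hyper_in,
        "hypo_ng": hypo_in,
        "total_ng": total,
        "hypo_share": hypo_in / total if total else float("nan"),
    }


# Hypothetical partitions: 5 ng hypermethylated, 20 ng hypomethylated.
# All of the hypermethylated DNA and 25% of the hypomethylated DNA are pooled.
print(pool_composition(5.0, 20.0, hyper_frac=1.0, hypo_frac=0.25))
```

With these made-up inputs, a minority fraction of a larger partition still makes up half of the pool, so the absolute partition sizes matter when choosing the fractions.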
[0406] In some embodiments, the methods comprise preparing a first pool comprising at least a portion of the DNA of the hypomethylated partition. In some embodiments, the methods comprise preparing a second pool comprising at least a portion of the DNA of the hypermethylated partition. In some embodiments, the first pool further comprises a portion of the DNA of the hypermethylated partition. In some embodiments, the second pool further comprises a portion of the DNA of the hypomethylated partition. In some embodiments, the first pool comprises a majority of the DNA of the hypomethylated partition, and optionally a minority of the DNA of the hypermethylated partition. In some embodiments, the second pool comprises a majority of the DNA of the hypermethylated partition and a minority of the DNA of the hypomethylated partition. In some embodiments involving an intermediately methylated partition, the second pool comprises at least a portion of the DNA of the intermediately methylated partition, e.g., a majority of the DNA of the intermediately methylated partition. In some embodiments, the first pool comprises a majority of the DNA of the hypomethylated partition, and the second pool comprises a majority of the DNA of the hypermethylated partition and a majority of the DNA of the intermediately methylated partition.
[0407] In some embodiments, the methods comprise capturing at least a first set of target regions from the first pool, e.g., wherein the first pool is as set forth in any of the embodiments above. In some embodiments, the first set comprises sequence-variable target regions. In some embodiments, the first set comprises hypomethylation variable target regions and / or fragmentation variable target regions. In some embodiments, the first set comprises sequence-variable target regions and fragmentation variable target regions. In some embodiments, the first set comprises sequence-variable target regions, hypomethylation variable target regions and fragmentation variable target regions. A step of amplifying DNA in the first pool may be performed before this capture step. In some embodiments, capturing the first set of target regions from the first pool comprises contacting the DNA of the first pool with a first set of target-specific probes. In some embodiments, the first set of target-specific probes comprises target-binding probes specific for the sequence-variable target regions. In some embodiments, the first set of target-specific probes comprises target-binding probes specific for the sequence-variable target regions, hypomethylation variable target regions and / or fragmentation variable target regions.
[0408] In some embodiments, the methods comprise capturing a second set of target regions or plurality of sets of target regions from the second pool, e.g., wherein the second pool is as set forth in any of the embodiments above. In some embodiments, the second plurality comprises epigenetic target regions, such as hypermethylation variable target regions and / or fragmentation variable target regions. In some embodiments, the second plurality comprises sequence-variable target regions and epigenetic target regions, such as hypermethylation variable target regions and / or fragmentation variable target regions. A step of amplifying DNA in the second pool may be performed before this capture step. In some embodiments, capturing the second plurality of sets of target regions from the second pool comprises contacting the DNA of the second pool with a second set of target-specific probes, wherein the second set of target-specific probes comprises target-binding probes specific for the sequence-variable target regions and target-binding probes specific for the epigenetic target regions. In some embodiments, the first set of target regions and the second set of target regions are not identical. For example, the first set of target regions may comprise one or more target regions not present in the second set of target regions. Alternatively or in addition, the second set of target regions may comprise one or more target regions not present in the first set of target regions. In some embodiments, at least one hypermethylation variable target region is captured from the second pool but not from the first pool. In some embodiments, a plurality of hypermethylation variable target regions are captured from the second pool but not from the first pool. In some embodiments, the first set of target regions comprises sequence-variable target regions and / or the second set of target regions comprises epigenetic target regions.
In some embodiments, the first set of target regions comprises sequence-variable target regions and fragmentation variable target regions; and the second set of target regions comprises epigenetic target regions, such as hypermethylation variable target regions and fragmentation variable target regions. In some embodiments, the first set of target regions comprises sequence-variable target regions, fragmentation variable target regions, and hypomethylation variable target regions; and the second set of target regions comprises epigenetic target regions, such as hypermethylation variable target regions and fragmentation variable target regions.
[0409] In some embodiments, the first pool comprises a majority of the DNA of the hypomethylated partition and a portion of the DNA of the hypermethylated partition (e.g., about half), and the second pool comprises a portion of the DNA of the hypermethylated partition (e.g., about half). In some such embodiments, the first set of target regions comprises sequence-variable target regions and / or the second set of target regions comprises epigenetic target regions. The sequence-variable target regions and / or the epigenetic target regions may be as set forth in any of the embodiments described elsewhere herein.
G. Enriching / Capturing step; amplification
[0410] Methods disclosed herein can comprise enriching, capturing, or isolating target regions and / or segments comprising recombined CDR3 sequences from DNA, such as cfDNA, e.g., from the first subsample. In some embodiments, e.g., to capture target regions such as sequence-variable target regions and / or epigenetic target regions, the capturing comprises contacting the DNA with probes specific for the target regions. In some embodiments, e.g., to enrich segments comprising recombined CDR3 sequences, the segments can be amplified by performing multiplex amplification, e.g., from the second subsample. Exemplary primers for such multiplex amplification are provided elsewhere herein. In some embodiments, the capturing comprises enriching for or capturing one or more primer-extended products and / or target regions hybridized to the one or more primer-extended products. Enrichment or capture may be performed on any sample or subsample described herein using any suitable approach known in the art.
[0411] In some embodiments, a CDR3 sequence can be a part of a T cell receptor (TCR), TCR beta chain, B cell receptor, immunoglobulin, B cell receptor heavy chain, or immunoglobulin heavy chain. In some embodiments, the CDR3 can be part of a TCR CDR3 repertoire. A TCR repertoire, for example, generated by the process of V(D)J recombination, encompasses the T cell clones within a given individual or sample, and TCRs can be indicative of disease (e.g., cancer) status, prior infections or immunizations, and individual-specific attributes of epitope selection. In some embodiments, a structural variation is a recombined CDR3 sequence.
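One common way to summarize a CDR3 repertoire from sequencing data is to count distinct CDR3 sequences and report their relative frequencies, so that an expanded clone stands out. The sketch below does this with made-up amino-acid CDR3 strings (nucleotide sequences work the same way). The 0.5 cutoff for calling a clone "expanded" is a hypothetical choice, and both function names are hypothetical; the text does not specify a clonality metric.

```python
from collections import Counter


def clone_frequencies(cdr3_seqs: list[str]) -> dict[str, float]:
    """Relative frequency of each distinct CDR3 sequence, most frequent first."""
    counts = Counter(cdr3_seqs)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.most_common()}


def expanded_clones(cdr3_seqs: list[str], min_freq: float = 0.5) -> list[str]:
    """CDR3 sequences whose frequency is at or above `min_freq` (hypothetical cutoff)."""
    return [seq for seq, f in clone_frequencies(cdr3_seqs).items() if f >= min_freq]


# Made-up CDR3 calls from reads covering recombined V(D)J segments.
reads = ["CASSLGQAYEQYF"] * 6 + ["CASSPDRGNTEAFF"] * 2 + ["CASRGTGELFF"] * 2
print(clone_frequencies(reads))
print(expanded_clones(reads))
```

In practice, reads would typically be collapsed by molecular tag before counting, so that PCR duplicates do not inflate a clone's apparent frequency.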
[0412] In some embodiments, the probes specific for DNA target regions comprise a capture moiety that facilitates the enrichment or capture of the DNA hybridized to the probes. In some embodiments, the amplified products comprise a capture moiety that facilitates the enrichment or capture of the primer-extended products and DNA hybridized to the primer-extended products. The capture moiety may be provided as part of the primer or incorporated during extension as part of a modified deoxyribonucleotide triphosphate as discussed in detail elsewhere herein.
[0413] As discussed above, nucleic acids in a sample can be subject to a capture step, in which molecules having certain characteristics are captured and analyzed. Target capture can involve use of a bait set comprising oligonucleotide baits labeled with a capture moiety, such as biotin or the other examples noted below. The probes can have sequences selected to tile across a panel of regions, such as genes. In some embodiments, a bait set can have higher and lower capture yields for sets of target regions such as those of the sequence-variable target region set and the epigenetic target region set, respectively, as discussed elsewhere herein. Such bait sets are combined with a sample under conditions that allow hybridization of the target molecules with the baits. Then, captured molecules are isolated using the capture moiety. DNA capture can involve use of oligonucleotides labeled with a capture moiety, such as target-specific probes labeled with biotin, and a second moiety or binding partner that binds to the capture moiety, such as streptavidin. In some embodiments, a capture moiety and binding partner can have higher and lower capture yields for different sets of target regions, such as those of the sequence-variable target region set, recombined CDR3 sequences, and / or an epigenetic target region set, respectively, as discussed elsewhere herein.
[0414] Capture may be performed using any suitable approach known in the art. Oligonucleotide baits of the kind described above are one type of probe useful herein. After hybridization, captured molecules are isolated using the capture moiety; for example, molecules bearing a biotin capture moiety can be isolated using bead-based streptavidin. Such methods are further described in, for example, U.S. patent 9,850,523, issued December 26, 2017, which is incorporated herein by reference.
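Tiling bait sequences across a panel of regions, as described for target capture, can be sketched as generating overlapping probe intervals over each region. The 120-nt probe length and 60-nt step (2x tiling) are illustrative assumptions, not values from the text, and `tile_probes` is a hypothetical helper name. Coordinates are 0-based and half-open.

```python
def tile_probes(region_start: int, region_end: int,
                probe_len: int = 120, step: int = 60) -> list[tuple[int, int]]:
    """Probe intervals tiling the half-open region [region_start, region_end).

    Probes start every `step` bases; the final probe is placed so that it ends
    exactly at region_end. A region no longer than one probe gets a single
    probe centered on it.
    """
    if region_end - region_start <= probe_len:
        center = (region_start + region_end) // 2
        start = center - probe_len // 2
        return [(start, start + probe_len)]
    probes = []
    start = region_start
    while start + probe_len < region_end:
        probes.append((start, start + probe_len))
        start += step
    probes.append((region_end - probe_len, region_end))  # cover the region's 3' edge
    return probes


print(tile_probes(1000, 1300))
```

Changing `step` trades panel size against capture uniformity: a smaller step gives denser tiling with more baits per region.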
[0415] In some embodiments, a label (e.g., on a primer) comprises a capture moiety. Capture moieties include, without limitation, biotin, avidin, streptavidin, a nucleic acid comprising a particular nucleotide sequence, digoxygenin, a histidine tag, an affinity tag, an immunoglobulin constant domain, a hapten recognized by an antibody, and magnetically attractable particles. In some embodiments, the immunoglobulin constant domain may be bound using protein A, protein G, or a secondary antibody. In some embodiments, the secondary antibody comprises an anti-mouse secondary antibody. In some embodiments, the anti-mouse secondary antibody is a goat anti-mouse secondary antibody, a rabbit anti-mouse secondary antibody, or a donkey anti-mouse secondary antibody. The extraction moiety can be a member of a binding pair, such as biotin / streptavidin or hapten / antibody. In some embodiments, a capture moiety that is attached to an analyte is captured by its binding pair, which is attached to an isolatable moiety, such as a magnetically attractable particle or a large particle that can be sedimented through centrifugation. The capture moiety can be any type of molecule that allows affinity separation of nucleic acids bearing the capture moiety from nucleic acids lacking the capture moiety. Exemplary capture moieties include biotin, which allows affinity separation by binding to streptavidin linked or linkable to a solid phase, and an oligonucleotide, which allows affinity separation through binding to a complementary oligonucleotide linked or linkable to a solid phase.
[0416] In some embodiments, the probes specific for the target regions (i.e., target-specific probes) comprise a capture moiety that facilitates the enrichment or capture of the DNA hybridized to the probes. In some embodiments, the capture moiety is biotin. In some such embodiments, streptavidin attached to a solid support, such as magnetic beads, is used to bind to the biotin. Nonspecifically bound DNA that does not comprise a target region is washed away from the captured DNA. In some embodiments, DNA is then dissociated from the probes and eluted from the solid support using salt washes or buffers comprising another DNA denaturing agent. In some embodiments, the probes are also eluted from the solid support by, e.g., disrupting the biotin-streptavidin interaction. In some embodiments, captured DNA is amplified following elution from the solid support. In some such embodiments, DNA comprising adapters is amplified using PCR primers that anneal to the adapters. In some embodiments, captured DNA is amplified while attached to the solid support. In some such embodiments, the amplification comprises the use of a PCR primer that anneals to a sequence within an adapter and a PCR primer that anneals to a sequence within a probe annealed to the target region of the DNA.
[0417] A panel of regions targeted for enrichment can be selected such that it does not contain regions known to include the base modification used in the end repair reaction. When the end repair is performed with dNTPs comprising 5mC or 5hmC, a panel of regions targeted for enrichment may be selected such that it does not contain CpH dinucleotides (where H is A, C, or T) that are known to be naturally methylated in the subject (e.g., humans). Such CpH dinucleotides can be identified through the use of publicly available resources (e.g., MethBank 3.0: a database of DNA methylomes across a variety of species, Nucleic Acids Res. 2018). Such an approach has the advantage that any detected methylated CpH dinucleotides can unambiguously be attributed to regions synthesized in the end repair.
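The region-selection idea above can be sketched in Python as a filter that drops any panel region overlapping a CpH position known to be naturally methylated. The `filter_panel` function, the 0-based half-open coordinates, and the input format (a mapping from chromosome to known methylated CpH positions, assumed to be compiled beforehand from a public methylome resource) are illustrative assumptions, not part of this disclosure.

```python
import bisect
from typing import Dict, Iterable, List, Tuple

# (chromosome, start, end) with 0-based, half-open coordinates (assumption of this sketch).
Region = Tuple[str, int, int]


def filter_panel(regions: Iterable[Region],
                 methylated_cph: Dict[str, Iterable[int]]) -> List[Region]:
    """Keep only regions containing no CpH position known to be naturally methylated.

    methylated_cph maps chromosome -> positions of the C in naturally methylated
    CpH dinucleotides (hypothetical input format).
    """
    # Sort positions once per chromosome so each overlap test is a binary search.
    sorted_pos = {chrom: sorted(pos) for chrom, pos in methylated_cph.items()}
    kept: List[Region] = []
    for chrom, start, end in regions:
        positions = sorted_pos.get(chrom, [])
        i = bisect.bisect_left(positions, start)  # first known position >= start
        if i < len(positions) and positions[i] < end:
            continue  # region overlaps a naturally methylated CpH; exclude it
        kept.append((chrom, start, end))
    return kept
```

With a panel filtered this way, a methylated CpH call observed inside a retained region would, under the reasoning of this paragraph, be attributed to end repair rather than to native methylation.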
[0418] In some embodiments, the methods herein comprise enriching for or capturing DNA comprising the sequence-variable target region set, recombined CDR3 sequences, and / or the epigenetic target region set. Such regions may be captured from an aliquot of a sample (e.g., a sample that has undergone attachment of adapters and amplification). In some embodiments, capturing comprises contacting the DNA to be captured with a set of target-specific probes. Enriching for or capturing DNA comprising the sequence-variable target region set, recombined CDR3 sequences, and the epigenetic target region set may comprise contacting the DNA with a different set of target-specific probes. The set of target-specific probes may have any of the features described herein for sets of target-specific probes, including but not limited to those in the embodiments set forth above and in the sections relating to probes below. Capturing may be performed on one or more subsamples prepared during methods disclosed herein. In some embodiments, DNA is captured from the first subsample or the second subsample, e.g., from at least the first subsample and the second subsample. In some embodiments, the subsamples are differentially tagged (e.g., as described herein) and then pooled before undergoing capture.
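To illustrate differential tagging followed by pooling, the sketch below tags molecules from two subsamples with distinct sequences, pools them, and assigns reads back to their subsample by tag after sequencing. The tag sequences, the exact-prefix matching, and the function names are hypothetical; this sketch uses exact matching and so does not tolerate sequencing errors within a tag.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical 6-nt subsample tags; the source does not specify tag sequences.
SUBSAMPLE_TAGS = {"first": "ACGTAC", "second": "TGCATG"}


def tag_subsample(name: str, molecules: Iterable[str]) -> List[str]:
    """Prepend the subsample's tag to each molecule (stand-in for tagged adapters)."""
    tag = SUBSAMPLE_TAGS[name]
    return [tag + m for m in molecules]


def demultiplex(reads: Iterable[str]) -> Dict[str, List[str]]:
    """Assign reads to subsamples by exact tag prefix; others go to 'undetermined'."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        for name, tag in SUBSAMPLE_TAGS.items():
            if read.startswith(tag):
                bins[name].append(read[len(tag):])  # strip the tag
                break
        else:
            bins["undetermined"].append(read)
    return dict(bins)


# Usage: pool two tagged subsamples, "capture" and "sequence" them, then split by tag.
pooled = tag_subsample("first", ["AAAA", "CCCC"]) + tag_subsample("second", ["GGGG"])
by_subsample = demultiplex(pooled)
```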
[0419] The capturing step may be performed using conditions suitable for specific nucleic acid hybridization, which generally depend to some extent on features of the probes such as length, base composition, etc. Those skilled in the art will be familiar with appropriate conditions given general knowledge in the art regarding nucleic acid hybridization. In some embodiments, complexes of target-specific probes and DNA are formed.
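Because suitable hybridization conditions depend on probe length and base composition, one quick way to compare probes is a basic melting-temperature (Tm) estimate. The sketch below uses two common textbook approximations: the Wallace rule (2 °C per A/T and 4 °C per G/C) for oligonucleotides shorter than 14 nt, and Tm = 64.9 + 41(GC - 16.4)/N for longer ones. Both ignore salt and probe concentration, so they are only rough guides; the function name and the 14-nt cutoff are assumptions of this sketch rather than conditions stated in this disclosure.

```python
def basic_tm(seq: str) -> float:
    """Rough Tm estimate in degrees Celsius for a DNA oligonucleotide (A/C/G/T only)."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    n = len(seq)
    gc = seq.count("G") + seq.count("C")
    at = n - gc
    if n < 14:
        # Wallace rule for short oligonucleotides.
        return 2.0 * at + 4.0 * gc
    # Basic GC-content formula for longer oligonucleotides.
    return 64.9 + 41.0 * (gc - 16.4) / n
```

For example, `basic_tm("ACGTACGTAC")` gives 30.0 (5 A/T and 5 G/C), and at equal length a GC-rich probe yields a higher estimate than an AT-rich one.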
[0420] In some embodiments, a method described herein comprises capturing cfDNA obtained from a subject for a plurality of sets of target regions. The target regions comprise epigenetic target regions, which may show differences in methylation levels and / or fragmentation patterns depending on whether they originated from a tumor or from healthy cells. The target regions also comprise sequence-variable target regions, which may show differences in sequence depending on whether they originated from a tumor or from healthy cells. The capturing step produces a captured set of cfDNA molecules, in which cfDNA molecules corresponding to the sequence-variable target region set are captured at a greater capture yield than cfDNA molecules corresponding to the epigenetic target region set. For additional discussion of capturing steps, capture yields, and related aspects, see WO2020/160414, which is incorporated herein by reference for all purposes.
[0421] In some embodiments, a method described herein comprises contacting cfDNA obtained from a subject with a set of target-specific probes, wherein the set of target-specific probes is configured to capture cfDNA corresponding to the sequence-variable target region set at a greater capture yield than cfDNA corresponding to the epigenetic target region set. In some embodiments, a method described herein comprises contacting cfDNA obtained from a subject with a set of target-specific probes, wherein the set of target-specific probes is configured to capture cfDNA corresponding to the ligation-extension products at a greater capture yield than cfDNA corresponding to other target region sets.
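The effect of differential capture yields on the composition of the captured set can be worked through with simple arithmetic: the expected number of captured molecules for each target-region set is its input count multiplied by its capture yield. The function names, counts, and yields below are illustrative assumptions, not values from this disclosure.

```python
from typing import Dict


def expected_captured(input_counts: Dict[str, int],
                      capture_yield: Dict[str, float]) -> Dict[str, float]:
    """Expected captured molecules per target-region set: input count x capture yield."""
    captured = {}
    for region_set, count in input_counts.items():
        y = capture_yield[region_set]
        if not 0.0 <= y <= 1.0:
            raise ValueError(f"capture yield for {region_set!r} must be between 0 and 1")
        captured[region_set] = count * y
    return captured


def composition(captured: Dict[str, float]) -> Dict[str, float]:
    """Fraction of the captured set contributed by each target-region set."""
    total = sum(captured.values())
    if total == 0:
        return {k: 0.0 for k in captured}
    return {k: v / total for k, v in captured.items()}


if __name__ == "__main__":
    # Illustrative numbers only: equal input, 3x higher yield for sequence-variable regions.
    counts = {"sequence_variable": 10_000, "epigenetic": 10_000}
    yields = {"sequence_variable": 0.75, "epigenetic": 0.25}
    cap = expected_captured(counts, yields)
    print(cap)               # {'sequence_variable': 7500.0, 'epigenetic': 2500.0}
    print(composition(cap))  # {'sequence_variable': 0.75, 'epigenetic': 0.25}
```

The same arithmetic applies when the favored set is the ligation-extension products rather than the sequence-variable target region set.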
[0422] In some embodiments, it can be beneficial to capture cfDNA corresponding to the ligation-extension products at a greater capture yield than cfDNA corresponding to the sequence-variable target region s...
Claims
What is claimed is:
1. A method of analyzing DNA in an adapted library, the method comprising: a) contacting the DNA with (i) adapter-blocking probes and intron-blocking probes, thereby providing blocked DNA; and (ii) a plurality of primers, wherein the primers bind J regions and the intron-blocking probes bind J-region introns, or the primers bind V regions and the intron-blocking probes bind V-region introns; b) extending at least a portion of the primers, wherein at least a portion of primers bound to DNA comprising recombined CDR3 sequences are extended across the recombined CDR3 sequences, thereby providing CDR3-enriched DNA; and c) sequencing the CDR3-enriched DNA.
2. The method of the immediately preceding claim, wherein the primers are extended using a non-strand-displacing polymerase.
3. The method of any one of the preceding claims, wherein the primers are extended using a polymerase that lacks 5’ to 3’ exonuclease activity.
4. The method of any one of claims 1-3, comprising contacting the DNA with adapter-blocking probes, V region intron-blocking probes, and a plurality of primers that bind V regions.
5. The method of any one of claims 1-3, comprising contacting the DNA with adapter-blocking probes, J region intron-blocking probes, and a plurality of primers that bind J regions.
6. The method of any one of claims 1-3, comprising contacting the DNA with adapter-blocking probes, J region intron-blocking probes, V region intron-blocking probes, a plurality of primers that bind V regions, and a plurality of primers that bind J regions.
7. The method of any one of the preceding claims, wherein the plurality of primers that bind V regions are oriented to prime extension toward the J exon.
8. The method of any one of the preceding claims, wherein the plurality of primers that bind J regions are oriented to prime extension toward the V exon.
9. The method of any one of the preceding claims, wherein the plurality of primers that bind V regions comprises from 5 to 300, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 120 to 140, from 140 to 160, from 160 to 180, from 180 to 200, from 200 to 220, from 220 to 240, from 240 to 260, from 260 to 280, from 280 to 300, from 40 to 300, from 40 to 280, from 100 to 300, from 100 to 280, from 150 to 300, from 100 to 200, from 200 to 300, from 10 to 150, from 15 to 125, from 20 to 100, from 30 to 90, from 40 to 80, or from 50 to 70 primers that bind V regions.
10. The method of any one of the preceding claims, wherein the plurality of primers that bind J regions comprises from 5 to 120, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 50 to 120, from 80 to 120, from 60 to 100, from 5 to 30, from 10 to 25, or from 10 to 20 primers that bind J regions.
11. The method of any one of the preceding claims, wherein the DNA contacted with the adapter-blocking probes, intron-blocking probes, and primers is in a first subsample.
12. The method of the immediately preceding claim, wherein at least a second subsample of the adapted library is retained as a backup.
13. The method of any one of the preceding claims, further comprising capturing a plurality of target regions from the DNA, thereby providing captured regions.
14. The method of the immediately preceding claim, wherein the plurality of target regions comprises sequence-variable target regions.
15. The method of claim 13 or claim 14, wherein the plurality of target regions comprises epigenetic target regions.
16. The method of any one of claims 13-15, wherein the plurality of target regions comprises sequence-variable target regions and epigenetic target regions.
17. The method of any one of claims 13-16, wherein the plurality of target regions is captured from a first subsample or from a second subsample of the adapted library.
18. The method of any one of claims 11-17, wherein a third subsample of the adapted library is retained as a backup.
19. The method of any one of claims 13-18, further comprising sequencing the captured regions.
20. The method of any one of claims 14-19, further comprising sequencing the captured sequence-variable target regions.
21. The method of any one of claims 15-20, further comprising sequencing the captured epigenetic target regions.
22. The method of any one of claims 13-21, wherein the captured regions are amplified prior to sequencing.
23. The method of any one of claims 13-22, wherein the captured regions and the CDR3-enriched DNA are pooled and sequenced together.
24. The method of any one of claims 13-22, wherein the captured regions and the CDR3-enriched DNA are sequenced separately.
25. The method of any one of claims 15-24, wherein the epigenetic target regions comprise hypermethylation variable target regions, hypomethylation variable target regions, methylation control target regions, or fragmentation variable target regions.
26. The method of any one of claims 14-25, further comprising quantifying a somatic mutation load using a plurality of captured regions comprising the sequence-variable target regions.
27. The method of any one of the preceding claims, wherein at least a portion of the plurality of primers that bind J regions and / or at least a portion of the plurality of primers that bind V regions do not exponentially amplify a target region that does not comprise a CDR3 sequence.
28. The method of any one of the preceding claims, wherein hybridization of an intron-blocking probe to the DNA at least partially blocks extension of at least a portion of the plurality of primers.
29. The method of any one of the preceding claims, wherein the plurality of primers that bind V regions and / or the plurality of primers that bind J regions comprise a label.
30. The method of the immediately preceding claim, wherein the plurality of primers that bind V regions and / or the plurality of primers that bind J regions comprise the same label.
31. The method of claim 29 or claim 30, wherein the plurality of primers that bind V regions comprises a first label and the plurality of primers that bind J regions comprises a second label.
32. The method of any one of the preceding claims, wherein a label is incorporated into the CDR3-enriched DNA during extension across segments comprising recombined CDR3 sequences.
33. The method of any one of claims 29-32, wherein the label is biotin, avidin, streptavidin, neutravidin, an oligonucleotide, digoxygenin, a histidine tag, an affinity tag, an immunoglobulin constant domain, a nucleic acid comprising a particular nucleotide sequence, a hapten recognized by an antibody, or a magnetically attractable particle.
34. The method of any one of the preceding claims, wherein the CDR3 sequence is a part of a T cell receptor (TCR), TCR beta chain, B cell receptor, immunoglobulin, B cell receptor heavy chain, or immunoglobulin heavy chain.
35. The method of any one of the preceding claims, wherein the CDR3-enriched DNA comprises extended primers that bind V regions, and / or extended primers that bind J regions, wherein extension was not blocked by a J region intron-blocking probe or by a V region intron-blocking probe.
36. The method of any one of the preceding claims, wherein each of the plurality of primers that bind J regions and / or each of the plurality of primers that bind V regions comprises at least 18, 19, or 20 linked nucleosides.
37. The method of any one of the preceding claims, wherein each of the plurality of primers that bind J regions and / or each of the plurality of primers that bind V regions consists of 18, 19, or 20 to 60 linked nucleosides.
38. The method of any of the preceding claims, wherein the plurality of primers that bind J regions and / or the plurality of primers that bind V regions are resistant to 5’ exonucleolysis.
39. A method of analyzing DNA in an adapted library, the method comprising: a) contacting the DNA with one or more blocking probes, thereby providing blocked DNA; b) performing multiplex amplification of a plurality of target regions that may comprise a structural variation using a plurality of first primers and a plurality of second primers that anneal to the plurality of target regions, wherein the blocking probes inhibit amplification of wild-type DNA, thereby providing structural variation-enriched DNA; and c) sequencing the structural variation-enriched DNA.
40. The method of the immediately preceding claim, wherein the multiplex amplification is performed with a non-strand-displacing polymerase that lacks 5’ to 3’ exonuclease activity.
41. The method of claim 39 or 40, wherein the plurality of first primers comprises from 5 to 300, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 120 to 140, from 140 to 160, from 160 to 180, from 180 to 200, from 200 to 220, from 220 to 240, from 240 to 260, from 260 to 280, from 280 to 300, from 40 to 300, from 40 to 280, from 100 to 300, from 100 to 280, from 150 to 300, from 100 to 200, from 200 to 300, from 10 to 150, from 15 to 125, from 20 to 100, from 30 to 90, from 40 to 80, or from 50 to 70 primers.
42. The method of any one of claims 39-41, wherein the plurality of second primers comprises from 5 to 300, from 5 to 10, from 10 to 20, from 20 to 30, from 30 to 40, from 40 to 50, from 50 to 60, from 60 to 70, from 70 to 80, from 80 to 90, from 90 to 100, from 100 to 120, from 120 to 140, from 140 to 160, from 160 to 180, from 180 to 200, from 200 to 220, from 220 to 240, from 240 to 260, from 260 to 280, from 280 to 300, from 40 to 300, from 40 to 280, from 100 to 300, from 100 to 280, from 150 to 300, from 100 to 200, from 200 to 300, from 10 to 150, from 15 to 125, from 20 to 100, from 30 to 90, from 40 to 80, or from 50 to 70 primers.
43. The method of any one of claims 39-41, comprising contacting a first subsample of the DNA with the one or more blocking probes.
44. The method of the immediately preceding claim, wherein a second subsample of the adapted library is retained as a backup.
45. The method of any one of claims 39-44, further comprising capturing a second plurality of target regions from the DNA, thereby providing captured regions.
46. The method of the immediately preceding claim, wherein the second plurality of target regions comprises sequence-variable target regions.
47. The method of claim 45 or claim 46, wherein the second plurality of target regions comprises epigenetic target regions.
48. The method of any one of claims 45-47, wherein the second plurality of target regions comprises sequence-variable target regions and epigenetic target regions.
49. The method of any one of claims 45-48, wherein the second plurality of target regions is captured from a first subsample or from a second subsample of the adapted library.
50. The method of any one of claims 43-49, wherein a third subsample of the adapted library is retained as a backup.
51. The method of any one of claims 45-50, further comprising sequencing the captured regions.
52. The method of any one of claims 46-51, further comprising sequencing the captured sequence-variable target regions.
53. The method of any one of claims 47-52, further comprising sequencing the captured epigenetic target regions.
54. The method of any one of claims 45-53, wherein the captured regions are amplified prior to sequencing.
55. The method of any one of claims 45-54, wherein the captured regions and the structural variation-enriched DNA are pooled and sequenced together.
56. The method of any one of claims 45-55, wherein the captured regions and the structural variation-enriched DNA are sequenced separately.
57. The method of any one of claims 47-56, wherein the epigenetic target regions comprise hypermethylation variable target regions, hypomethylation variable target regions, methylation control target regions, or fragmentation variable target regions.
58. The method of any one of claims 46-57, further comprising quantifying a somatic mutation load using a plurality of captured regions comprising the sequence-variable target regions.
59. The method of any one of claims 39-58, wherein at least a portion of the plurality of first primers and / or at least a portion of the plurality of second primers do not exponentially amplify a target region that does not comprise a structural variation.
60. The method of any one of claims 39-59, wherein hybridization of a blocking probe to a region of the DNA at least partially blocks extension of at least a portion of the plurality of first primers and / or at least a portion of the plurality of second primers.
61. The method of any one of claims 39-60, wherein the structural variation comprises a rearrangement, an insertion, or a deletion.
62. The method of the immediately preceding claim, wherein the rearrangement comprises translocations, gene fusions, duplications, copy-number variants, or inversions.
63. The method of any one of claims 39-62, wherein the structural variation-enriched DNA comprises extended first primers, and / or extended second primers, wherein extension was not blocked by a blocking probe.
64. The method of any one of claims 39-63, wherein the plurality of first primers and / or the plurality of second primers comprise a label.
65. The method of the immediately preceding claim, wherein the plurality of first primers and the plurality of second primers comprise the same label.
66. The method of any one of claims 39-65, wherein the plurality of first primers comprises a first label and the plurality of second primers comprises a second label.
67. The method of any one of claims 39-66, wherein a label is incorporated into the structural variation-enriched DNA during the multiplex amplification of the plurality of target regions that may comprise a structural variation.
68. The method of any one of claims 64-67, wherein the label is biotin, avidin, streptavidin, neutravidin, an oligonucleotide, digoxygenin, a histidine tag, an affinity tag, an immunoglobulin constant domain, a nucleic acid comprising a particular nucleotide sequence, a hapten recognized by an antibody, or a magnetically attractable particle.
69. The method of any one of the preceding claims, wherein each of the plurality of the first and second primers comprises at least 20 linked nucleosides.
70. The method of any one of the preceding claims, wherein each of the plurality of the first and second primers consists of 18, 19, or 20 to 60 linked nucleosides.
71. The method of any of the preceding claims, wherein the plurality of the first and second primers are resistant to 5’ exonucleolysis.
72. The method of any one of the preceding claims, wherein the method comprises preparing the adapted library by ligating adapters to DNA, thereby producing adapted DNA.
73. The method of claim 72, wherein the adapted DNA comprises molecular barcodes.
74. The method of any one of the preceding claims, wherein the adapted library is prepared from cfDNA.
75. The method of any one of the preceding claims, wherein the method comprises preparing the adapted library by ligating adaptors to cfDNA, thereby producing adapted cfDNA.
76. The method of any one of the preceding claims, wherein the adapted library is prepared from DNA from a subject having or suspected of having a cancer.
77. The method of any one of the preceding claims, wherein the method comprises preparing the adapted library by ligating adaptors to cfDNA from a subject having or suspected of having a cancer, thereby producing adapted cfDNA.
78. The method of any one of the preceding claims, further comprising determining a likelihood that the subject has a cancer.
79. The method of any one of claims 76-78, wherein the cancer is a lymphocytic cancer.
80. The method of the immediately preceding claim, wherein the lymphocytic cancer is a leukemia, a lymphoma, or a myeloma.
81. The method of any one of claims 76-80, wherein the cancer is a lymphoma.
82. The method of claim 81, wherein the lymphoma is B-cell lymphoma, non-Hodgkin lymphoma, diffuse large B-cell lymphoma, mantle cell lymphoma, T cell lymphoma, non-Hodgkin lymphoma, precursor T-lymphoblastic lymphoma / leukemia, or peripheral T cell lymphoma.
83. The method of any one of the preceding claims, further comprising separating the CDR3-enriched DNA or the structural variation-enriched DNA from non-enriched DNA in the sample.
84. The method of the immediately preceding claim, wherein the separating uses the label to separate the structural variation-enriched DNA or the CDR3-enriched DNA from non-enriched DNA in the sample.
85. The method of claim 83 or claim 84, wherein the separating comprises precipitating the structural variation-enriched DNA or the CDR3-enriched DNA.
86. The method of any one of claims 83-85, wherein the separating is performed at a temperature that facilitates (i) separation of extended primers that bind J regions from non-extended or partially-extended primers that bind J regions and / or separation of extended primers that bind V regions from non-extended or partially-extended primers that bind V regions; or (ii) separation of extended first primers and / or extended second primers from non-extended or partially-extended first and / or second primers.
87. The method of any one of claims 83-86, wherein the separating is performed at a temperature that is (i) higher than the melting temperature of the non-extended and / or the partially-extended primers that bind J regions, and / or (ii) higher than the melting temperature of the non-extended and / or the partially-extended primers that bind V regions.
88. The method of claim 86 or claim 87, wherein the temperature is 1-15, 1-10, 1-5, 5-15, or 5-15, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 degrees Fahrenheit above the melting temperature of (i) the non-extended and / or the partially-extended primers that bind J regions and / or (ii) the non-extended or partially-extended primers that bind V regions.
89. The method of any one of claims 83-86, wherein the separating is performed at a temperature that is higher than the melting temperature of the non-extended and / or the partially-extended first and / or second primers.
90. The method of claim 86 or claim 89, wherein the temperature is 1-15, 1-10, 1-5, 5-15, or 5-15, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 degrees Fahrenheit above the melting temperature of the non-extended and / or the partially-extended first and / or second primers.
91. The method of any one of claims 83-90, wherein the DNA is rendered single stranded prior to the separating.
92. The method of any of the preceding claims, comprising differentially tagging and pooling the first subsample and the second subsample.
93. The method of claim 92, wherein the pool comprises less than or equal to about 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% of the DNA of the second subsample.
94. The method of the immediately preceding claim, wherein the pool comprises about 70-90%, about 75-85%, or about 80% of the DNA of the second subsample.
95. The method of any one of claims 92-94, wherein the pool comprises substantially all of the DNA of the first subsample.
96. The method of any one of the preceding claims, further comprising detecting a presence or absence of a DNA molecule that comprises a CDR3 sequence of interest.
97. The method of any one of claims 39-96, comprising detecting a presence or absence of a DNA molecule that comprises a structural variation.
98. The method of claim 96 or 97, wherein the detecting comprises generating a plurality of sequencing reads; and the method further comprises mapping the plurality of sequence reads to one or more reference sequences to generate mapped sequence reads, and processing the mapped sequence reads to determine the likelihood that the subject has cancer.
99. The method of the immediately preceding claim, further comprising detecting a presence or absence of DNA originating or derived from a tumor cell using the mapped sequence reads.
100. The method of the immediately preceding claim, further comprising determining a cancer recurrence score that is indicative of the presence or absence of the DNA originating or derived from the tumor cell for the test subject, optionally further comprising determining a cancer recurrence status based on the cancer recurrence score, wherein the cancer recurrence status of the test subject is determined to be at risk for cancer recurrence when a cancer recurrence score is determined to be at or above a predetermined threshold or the cancer recurrence status of the test subject is determined to be at lower risk for cancer recurrence when the cancer recurrence score is below the predetermined threshold.
101. The method of the immediately preceding claim, further comprising comparing the cancer recurrence score of the test subject with a predetermined cancer recurrence threshold, wherein the test subject is classified as a candidate for a subsequent cancer treatment when the cancer recurrence score is above the cancer recurrence threshold or not a candidate for a subsequent cancer treatment when the cancer recurrence score is below the cancer recurrence threshold.
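The threshold comparisons recited in claims 100 and 101 can be expressed as two small decision functions. How the cancer recurrence score itself is computed is not specified in these claims, so the sketch below simply takes the score and threshold as inputs; the enum and function names are hypothetical. Claim 101 does not state the outcome for a score exactly equal to the threshold, so the sketch returns `None` in that case rather than guessing.

```python
from enum import Enum
from typing import Optional


class RecurrenceStatus(Enum):
    AT_RISK = "at risk for cancer recurrence"
    LOWER_RISK = "at lower risk for cancer recurrence"


def recurrence_status(score: float, threshold: float) -> RecurrenceStatus:
    """Claim 100 rule: a score at or above the threshold is classified as at risk."""
    return RecurrenceStatus.AT_RISK if score >= threshold else RecurrenceStatus.LOWER_RISK


def treatment_candidate(score: float, threshold: float) -> Optional[bool]:
    """Claim 101 rule: above the threshold -> candidate; below -> not a candidate.

    Returns None when the score equals the threshold, a case the claim leaves open.
    """
    if score > threshold:
        return True
    if score < threshold:
        return False
    return None
```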