Compositions and methods for analyzing DNA using a methylation-discriminating nuclease and conversion

Methylation-discriminating nucleases and base conversion processes improve DNA analysis in liquid biopsies by enhancing sensitivity and accuracy for cancer detection, addressing the limitations of existing methods in precise methylation pattern identification and reducing sequencing costs.

WO2026136701A1 · PCT designated stage · Publication Date: 2026-06-25 · GUARDANT HEALTH INC

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
GUARDANT HEALTH INC
Filing Date
2025-12-18
Publication Date
2026-06-25


Abstract

The present disclosure provides compositions and methods related to analyzing DNA, such as cell-free DNA. In some embodiments, the cell-free DNA is from a subject having or suspected of having cancer and/or the cell-free DNA includes DNA from cancer cells. In some embodiments, the DNA is contacted with a methylation-discriminating nuclease, thereby degrading methylated or unmethylated DNA to produce a treated sample, and the treated sample is subjected to a procedure that affects a first nucleobase differently from a second nucleobase to produce a treated and converted sample. In some embodiments, the DNA is subjected to a procedure that affects a first nucleobase differently from a second nucleobase to produce a converted sample and the converted sample is contacted with a methylation-discriminating nuclease, thereby degrading methylated or unmethylated DNA to produce a treated and converted sample.

Description

Atty. Docket No. GH0193WO

COMPOSITIONS AND METHODS FOR ANALYZING DNA USING A METHYLATION-DISCRIMINATING NUCLEASE AND CONVERSION

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of priority to US Provisional Patent Application No. 63/736,936, filed December 20, 2024, which is incorporated by reference herein in its entirety for all purposes.

FIELD OF THE INVENTION

[0002] The present disclosure provides compositions and methods related to analyzing DNA, such as cell-free DNA. In some embodiments, the DNA is contacted with a methylation-discriminating nuclease, thereby degrading methylated or unmethylated DNA to produce a treated sample, and the treated sample is subjected to a procedure that affects a first nucleobase differently from a second nucleobase to produce a treated and converted sample. In some embodiments, the DNA is subjected to a procedure that affects a first nucleobase differently from a second nucleobase to produce a converted sample and the converted sample is contacted with a methylation-discriminating nuclease, thereby degrading methylated or unmethylated DNA to produce a treated and converted sample. In some embodiments, the DNA is from a subject having or suspected of having cancer and/or the DNA includes DNA from cancer cells.

INTRODUCTION AND SUMMARY

[0003] Cancer is responsible for millions of deaths per year worldwide. Early cancer detection may result in improved outcomes because early-stage cancer tends to be more susceptible to treatment.

[0004] Improperly controlled cell growth is a hallmark of cancer. Cancer is usually caused by the accumulation of mutations within an individual's normal cells, at least some resulting in improperly regulated cell division. Such mutations commonly include single nucleotide variations (SNVs), gene fusions, insertions and deletions (indels), transversions, translocations, and inversions. Cancers may also exhibit an accumulation of epigenetic changes, including modification of cytosine (e.g., 5-methylcytosine, 5-hydroxymethylcytosine, and other more oxidized forms) and association of DNA with chromatin proteins and transcription factors. Thus, cancer can be indicated by non-sequence modifications, such as methylation. Examples of methylation changes in cancer include local gains of DNA methylation, e.g., in the CpG islands at the transcription start sites of genes involved in normal growth control, DNA repair, cell cycle regulation, and/or cell differentiation. Hypermethylation can be associated with an aberrant loss of transcriptional capacity of involved genes and occurs at least as frequently as point mutations and deletions as a cause of altered gene expression. Furthermore, without wishing to be bound by any particular theory, cells in or around a cancer or neoplasm may shed more DNA than cells of the same tissue type in a healthy subject. The DNA from such cells may differ epigenetically from shed DNA in a healthy subject. As such, the distribution of epigenetically modified (e.g., methylated) DNA in certain DNA samples, such as cell-free DNA (cfDNA), may change upon carcinogenesis. Thus, sufficiently sensitive epigenetic (e.g., DNA methylation) profiling can be used to detect aberrant methylation in DNA of a sample.

[0005] Biopsies represent a traditional approach for detecting or diagnosing cancer in which cells or tissue are extracted from a possible cancer site and analyzed for relevant phenotypic and/or genotypic features. Biopsies have the drawback of being invasive. Cancer detection based on analysis of body fluids (“liquid biopsies”), such as blood, is an intriguing alternative based on the observation that DNA from cancer cells is released into body fluids. A liquid biopsy is noninvasive (sometimes requiring only a blood draw). However, it has been challenging to develop accurate and sensitive methods for analyzing liquid biopsy material, in part because the amount of nucleic acids released into body fluids is low and variable, as is recovery of nucleic acids from such fluids in analyzable form. Further, the contribution of DNA from cells in or around a cancer or neoplasm to a sample may be small relative to the contribution from other cells, and the DNA contributed from other cells may be uninformative as to cancer status. Isolating and processing cell-free DNA useful for further analysis in liquid biopsy procedures can be a useful part of these methods.

[0006] Current methods of cancer diagnostic assays of cell-free nucleic acids (e.g., cell-free DNA or cell-free RNA) may focus on the detection of tumor-related somatic variants, including single nucleotide variants (SNVs), copy number variations (CNVs), fusions, and indels (i.e., insertions or deletions), which are all mainstream targets for liquid biopsy. There is growing evidence that non-sequence modifications like methylation status and fragmentomic signal in cell-free DNA can provide information on the source of cell-free DNA and disease level. Detailed knowledge of the non-sequence modifications of the cell-free DNA (e.g., when combined with somatic mutation calling) can improve assessments of tumor status.

[0007] Accordingly, there is a need for improved methods and compositions for analyzing non-sequence modifications (such as methylation status or copy number variants) in DNA, including cell-free DNA, e.g., in liquid biopsies.

[0008] Sequencing workflows, including next-generation sequencing workflows, are often employed for DNA-based disease (e.g., cancer) diagnostics because a high level of target multiplexing is needed for desired sensitivity and/or performance. Where methylated molecules are of interest, sequencing unmethylated molecules at each target adds significant sequencing costs. Methylation enrichment sequencing workflows have been developed but can result in coarser, fragment-based methylation quantification, which theoretically limits the ability to precisely identify disease-specific methylation patterns of a target, which in turn reduces the performance of the assay. As such, there is a need for workflows that enrich methylated molecules while maintaining the ability to precisely identify methylation patterns.
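By way of non-limiting illustration only, the difference between coarse, fragment-based quantification and single-site quantification may be sketched as follows. The per-molecule CpG calls, the function names, and the `min_methylated` cutoff are hypothetical and are not taken from the disclosure.

```python
# Toy data (hypothetical): each inner list is one molecule covering the same
# three CpG sites of a target; 1 = methylated call, 0 = unmethylated call.
molecules = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
]


def fragment_level_count(mols, min_methylated=2):
    """Fragment-based readout: count molecules treated as 'methylated' overall.

    The cutoff of 2 methylated CpGs is an arbitrary assumption for this sketch.
    """
    return sum(1 for m in mols if sum(m) >= min_methylated)


def per_site_fractions(mols):
    """Single-site readout: methylated fraction at each CpG position."""
    n = len(mols)
    return [sum(m[i] for m in mols) / n for i in range(len(mols[0]))]


print(fragment_level_count(molecules))   # one number for the whole target
print(per_site_fractions(molecules))     # one value per CpG site
```

In this toy example the fragment-based readout collapses the target to a single count, whereas the single-site readout retains the pattern across CpG positions.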

[0009] The present disclosure aims to meet the need for improved sensitivity of DNA analysis, such as in a cfDNA sample, provide other benefits, or at least provide the public with a useful choice. In some embodiments, the present disclosure provides methods for analyzing DNA in a sample through contacting the sample with a methylation-discriminating nuclease to produce a treated sample, wherein the sample is an original sample or is a proportionately methylated sample relative to an original sample, and subjecting the treated sample to a procedure that affects a first nucleobase differently from a second nucleobase to produce a treated and converted sample. In some embodiments, the present disclosure provides methods for analyzing DNA in a sample through subjecting the sample to a procedure that affects a first nucleobase differently from a second nucleobase to produce a converted sample and contacting the converted sample with a methylation-discriminating nuclease to produce a treated and converted sample. In some embodiments, the methylation-discriminating nuclease degrades methylated or unmethylated DNA. In some embodiments, the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity. In some embodiments, the procedure that affects a first nucleobase differently from a second nucleobase alters the base pairing specificity of the first or second nucleobase.

[0010] Accordingly, the embodiments described herein are provided, which include steps that can provide information about DNA variations and modifications, including but not limited to epigenetic, copy number, and sequence variations in cfDNA. Such methods comprising DNA analysis may provide even more improved information about the likelihood of a particular disease state of a subject. Improved detection of cancer markers in blood allows for more accurate detection of disorders (diagnosis) and therefore improved treatments.

[0011] Without wishing to be bound by any particular theory, the present disclosure provides methods that can remove abundant molecules of a certain methylation state (e.g., hypermethylated molecules or hypomethylated molecules) followed by single-site resolution methylation sequencing for a high accuracy of methylation detection and reduced sequencing costs. For example, the methods of the present disclosure can reduce the amount of sequencing of unmethylated molecules in an assay of cell-free DNA for hypermethylation at cancer-specific differentially methylated regions. The hypermethylated regions at cancer-specific differentially methylated regions can be enriched prior to sequencing, reducing the costs of sequencing while maintaining a high technical (e.g., methylation detection) and clinical detection accuracy. The methods of the present disclosure can enable the resolution of molecules that indicate the presence of a disease (e.g., cancer) and molecules that indicate the absence of a disease (e.g., cancer) based on a methylation level threshold. Furthermore, the methods of the present disclosure can provide better molecular recovery than other methods, e.g., that involve partitioning to enrich methylated molecules or high stringency hybridizations after base conversion, and/or can utilize samples with a low input of DNA.
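By way of non-limiting illustration only, the two-step logic described above (removal of molecules of one methylation state, followed by single-site readout) may be sketched in silico as follows. The sketch assumes an MSRE whose recognition sequence is CCGG and whose cleavage is blocked by methylation of the internal cytosine (as is known for HpaII), and assumes bisulfite-style conversion in which unmethylated cytosine reads as thymine. The toy sequence, the function names, and the representation of methylation as a set of 0-based positions are hypothetical simplifications; real enzymes and conversions are not perfectly efficient.

```python
import re

SITE = "CCGG"  # assumed MSRE recognition sequence (HpaII-like)


def survives_msre(seq, methylated):
    """Return True if no CCGG site is cleavable.

    A site is treated as cleavable when its internal cytosine (site start + 1)
    is not in the set of methylated positions.
    """
    for match in re.finditer("(?=" + SITE + ")", seq):
        if match.start() + 1 not in methylated:
            return False
    return True


def bisulfite_read(seq, methylated):
    """Top-strand read after conversion and amplification.

    Unmethylated C reads as T; methylated C still reads as C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


seq = "ATCCGGTACGTT"  # toy molecule; CpG cytosines at positions 3 and 8

# Hypothetical sample: the same sequence in three methylation states.
sample = [{3, 8}, {3}, set()]

treated = [m for m in sample if survives_msre(seq, m)]
reads = [bisulfite_read(seq, m) for m in treated]
print(reads)
```

In this toy run the fully unmethylated molecule is removed before conversion, so sequencing capacity is spent only on molecules that carry methylation at the enzyme site, while the converted reads still report each remaining CpG individually.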

[0012] Accordingly, the following exemplary embodiments are provided.

[0013] Embodiment 1 is a method of analyzing DNA in a sample, the method comprising: a) contacting the sample with a methylation-discriminating nuclease, thereby degrading methylated or unmethylated DNA to produce a treated sample, wherein the sample is an original sample or is a proportionately methylated sample relative to an original sample; and b) subjecting the treated sample to a procedure that affects a first nucleobase differently from a second nucleobase, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity, and wherein the procedure alters the base pairing specificity of the first or second nucleobase to produce a treated and converted sample.

[0014] Embodiment 2 is a method of analyzing DNA in a sample, the method comprising: a) subjecting the sample to a procedure that affects a first nucleobase differently from a second nucleobase, wherein the sample is an original sample or is a proportionately methylated sample relative to an original sample, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity, and wherein the procedure alters the base pairing specificity of the first or second nucleobase to produce a converted sample; and b) contacting the converted sample with a methylation-discriminating nuclease, thereby degrading methylated or unmethylated DNA to produce a treated and converted sample.

[0015] Embodiment 3 is the method of any one of the preceding embodiments, wherein the DNA comprises cell-free DNA (cfDNA).

[0016] Embodiment 4 is the method of any one of the preceding embodiments, wherein the sample is a tissue sample.

[0017] Embodiment 5 is the method of any one of the preceding embodiments, wherein the sample is a blood sample.

[0018] Embodiment 6 is the method of the immediately preceding embodiment, wherein the blood sample is a whole blood sample, a plasma sample, a buffy coat sample, a leukapheresis sample, or a peripheral blood mononuclear cell (PBMC) sample.

[0019] Embodiment 7 is the method of any one of the preceding embodiments, wherein the methylation-discriminating nuclease is a methylation-dependent restriction enzyme (MDRE).

[0020] Embodiment 8 is the method of the immediately preceding embodiment, wherein the MDRE cleaves a methylated CpG sequence.

[0021] Embodiment 9 is the method of embodiment 7 or embodiment 8, wherein the MDRE is one or more of MspJI, LpnPI, FspEI, or McrBC.

Embodiment 9.1 is the method of any one of embodiments 7-9, wherein the MDRE is one or more of MspJI.

Embodiment 9.2 is the method of any one of embodiments 7-9, wherein the MDRE is one or more of LpnPI.

Embodiment 9.3 is the method of any one of embodiments 7-9, wherein the MDRE is one or more of FspEI.

Embodiment 9.4 is the method of any one of embodiments 7-9, wherein the MDRE is one or more of McrBC.

[0022] Embodiment 10 is the method of any one of embodiments 1-6, wherein the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

[0023] Embodiment 11 is the method of the immediately preceding embodiment, wherein the MSRE cleaves an unmethylated CpG sequence.

[0024] Embodiment 12 is the method of embodiment 10 or embodiment 11, wherein the MSRE is one or more of AatII, AccII, AciI, Aor13HI, Aor15HI, BspT104I, BssHII, BstUI, Cfr10I, ClaI, CpoI, Eco52I, HaeII, HapII, HhaI, Hin6I, HpaII, HpyCH4IV, MluI, NaeI, NotI, NruI, NsbI, PmaCI, Psp1406I, PvuI, SacII, SalI, SmaI, and SnaBI.

Embodiment 12.1 is the method of any one of embodiments 10-12, wherein the MSRE is AccII.

Embodiment 12.2 is the method of any one of embodiments 10-12, wherein the MSRE is BstUI.

Embodiment 12.3 is the method of any one of embodiments 10-12, wherein the MSRE is HhaI.

Embodiment 12.4 is the method of any one of embodiments 10-12, wherein the MSRE is Hin6I.

Embodiment 12.5 is the method of any one of embodiments 10-12, wherein the MSRE is HpaII.

[0025] Embodiment 13 is the method of any one of the preceding embodiments, wherein the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises a conversion procedure.

[0026] Embodiment 14 is the method of any one of the preceding embodiments, wherein the first nucleobase is an unmodified cytosine and the second nucleobase is a modified cytosine, optionally wherein the modified cytosine is 5-methylcytosine or 5-hydroxymethylcytosine.

[0027] Embodiment 15 is the method of any one of the preceding embodiments, wherein the procedure that affects a first nucleobase of the DNA differently from a second nucleobase of the DNA is methylation-sensitive conversion.

[0028] Embodiment 16 is the method of the immediately preceding embodiment, wherein the methylation-sensitive conversion is bisulfite conversion, oxidative bisulfite (Ox-BS) conversion, Tet-assisted bisulfite (TAB) conversion, APOBEC-coupled epigenetic (ACE) conversion, enzymatic methyl-seq (EM-seq) conversion, single-enzyme 5-methylcytosine sequencing (SEM-seq) conversion, direct methylation sequencing (DM-seq), Tet-assisted pyridine borane sequencing (TAPS), or Tet-assisted pyridine borane sequencing with protection of 5hmC (TAPS-P).

Embodiment 16.1 is the method of embodiment 15 or embodiment 16, wherein the methylation-sensitive conversion is bisulfite conversion.

Embodiment 16.2 is the method of embodiment 15 or embodiment 16, wherein the methylation-sensitive conversion is oxidative bisulfite (Ox-BS) conversion.

Embodiment 16.3 is the method of embodiment 15 or embodiment 16, wherein the methylation-sensitive conversion is Tet-assisted bisulfite (TAB) conversion.

Embodiment 16.4 is the method of embodiment 15 or embodiment 16, wherein the methylation-sensitive conversion is APOBEC-coupled epigenetic (ACE) conversion.

Embodiment 16.5 is the method of embodiment 15 or embodiment 16, wherein the methylation-sensitive conversion is enzymatic methyl-seq (EM-seq) conversion.

Embodiment 16.6 is the method of embodiment 15 or embodiment 16, wherein the methylation-sensitive conversion is single-enzyme 5-methylcytosine sequencing (SEM-seq) conversion.

Embodiment 16.7 is the method of embodiment 15 or embodiment 16, wherein the methylation-sensitive conversion is direct methylation sequencing (DM-seq).

Embodiment 16.8 is the method of embodiment 15 or embodiment 16, wherein the methylation-sensitive conversion is Tet-assisted pyridine borane sequencing (TAPS).

Embodiment 16.9 is the method of embodiment 15 or embodiment 16, wherein the methylation-sensitive conversion is Tet-assisted pyridine borane sequencing with protection of 5hmC (TAPS-P).

[0029] Embodiment 17 is the method of the immediately preceding embodiment, wherein the Tet-assisted conversion further comprises a substituted borane reducing agent, optionally wherein the substituted borane reducing agent is 2-picoline borane, borane pyridine, tert-butylamine borane, or ammonia borane.

[0030] Embodiment 18 is the method of any one of embodiments 16-17, wherein the procedure that affects a first nucleobase of the DNA differently from a second nucleobase comprises contacting the DNA with a CpG-specific DNA methyltransferase (MTase) or a CpG-specific carboxymethyltransferase (CxMTase), a methyl donor or a carboxymethyl donor, and a cytosine deaminase.

[0031] Embodiment 19 is the method of the immediately preceding embodiment, wherein the cytosine deaminase is an APOBEC enzyme, optionally wherein the APOBEC enzyme is APOBEC3A.

[0032] Embodiment 20 is the method of any one of embodiments 13-16, wherein the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises bisulfite conversion; the contacting the sample with the methylation-discriminating nuclease occurs prior to bisulfite conversion; and the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

[0033] Embodiment 21 is the method of any one of embodiments 13-16, wherein the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises direct methylation sequencing (DM-seq); the contacting the sample with the methylation-discriminating nuclease occurs prior to direct methylation sequencing (DM-seq); and the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

[0034] Embodiment 22 is the method of any one of embodiments 13-16, wherein the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises APOBEC-coupled epigenetic (ACE) conversion; the contacting the sample with the methylation-discriminating nuclease occurs prior to contacting the sample with APOBEC; and the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

[0035] Embodiment 23 is the method of any one of embodiments 13-16, wherein the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises enzymatic methyl-seq (EM-seq) conversion, wherein EM-seq comprises contacting a sample with APOBEC; the contacting the sample with the methylation-discriminating nuclease occurs prior to contacting the sample with APOBEC; and the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

[0036] Embodiment 24 is the method of any one of embodiments 13-16, wherein the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises single-enzyme 5-methylcytosine sequencing (SEM-seq) conversion; the contacting the sample with the methylation-discriminating nuclease occurs prior to single-enzyme 5-methylcytosine sequencing (SEM-seq) conversion; and the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

[0037] Embodiment 25 is the method of any one of the preceding embodiments, further comprising ligating one or more adapters to the DNA, thereby producing adapter-ligated DNA.

[0038] Embodiment 26 is the method of the immediately preceding embodiment, wherein at least one cytosine in the one or more adapters is an unmodified cytosine, optionally wherein each cytosine in the one or more adapters is an unmodified cytosine.

[0039] Embodiment 27 is the method of any one of embodiments 25-26, wherein at least one cytosine in the one or more adapters is a modification resistant cytosine, optionally wherein each cytosine in the one or more adapters is a modification resistant cytosine.

[0040] Embodiment 28 is the method of the immediately preceding embodiment, wherein the modification resistant cytosine is a deaminase resistant cytosine.

[0041] Embodiment 29 is the method of the immediately preceding embodiment, wherein the deaminase resistant cytosine is 5-propynylC (5pyC), 5-pyrrolo-dC (5pyrC), 5-hydroxymethylcytosine (5hmC), glucosylated 5-hydroxymethylcytosine (5ghmC), cytosine 5-methylenesulfonate (CMS), or N4-modified cytosine.

[0042] Embodiment 30 is the method of any one of embodiments 25-29, wherein the one or more adapters are Y-shaped adapters.

[0043] Embodiment 31 is the method of any one of embodiments 25-30, wherein the one or more adapters comprise molecular barcodes.

[0044] Embodiment 32 is the method of any one of embodiments 25-31, wherein the one or more adapters is resistant to digestion by the methylation-discriminating nuclease.

[0045] Embodiment 33 is the method of the immediately preceding embodiment, wherein the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE) and wherein the one or more adapters that is resistant to digestion by the MSRE:
(i) comprises one or more methylated nucleotides, optionally wherein the methylated nucleotides comprise 5-methylcytosine and/or 5-hydroxymethylcytosine;
(ii) comprises one or more nucleotide analogs resistant to methylation sensitive restriction enzymes; or
(iii) does not comprise a nucleotide sequence recognized by the MSRE.

[0046] Embodiment 34 is the method of embodiment 32, wherein the methylation-discriminating nuclease is a methylation-dependent restriction enzyme (MDRE) and wherein the one or more adapters that is resistant to digestion by the MDRE:
(i) comprises one or more unmethylated nucleotides;
(ii) comprises one or more nucleotide analogs resistant to methylation dependent restriction enzymes; or
(iii) does not comprise a nucleotide sequence recognized by the MDRE.
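By way of non-limiting illustration only, option (iii) of the two preceding embodiments (an adapter that does not comprise a nucleotide sequence recognized by the nuclease) may be checked in silico by scanning candidate adapter sequences for recognition sites. The sketch uses the known recognition sequences of three MSREs (HpaII: CCGG; HhaI: GCGC; BstUI: CGCG); because these sites are palindromic, scanning one strand is sufficient. The adapter sequences and function names are hypothetical and do not correspond to any adapter described herein.

```python
# Recognition sequences of three MSREs (all palindromic).
MSRE_SITES = {"HpaII": "CCGG", "HhaI": "GCGC", "BstUI": "CGCG"}


def sites_in_adapter(adapter_seq, sites=MSRE_SITES):
    """Return the sorted names of enzymes whose site occurs in the adapter."""
    seq = adapter_seq.upper()
    return sorted(name for name, site in sites.items() if site in seq)


def is_site_free(adapter_seq, sites=MSRE_SITES):
    """True if the adapter contains none of the listed recognition sites."""
    return not sites_in_adapter(adapter_seq, sites)


# Hypothetical candidate adapters, for illustration only.
candidates = ["AGTCATGCAT", "AGTCCGGCAT", "TTGCGCGAA"]
for adapter in candidates:
    print(adapter, sites_in_adapter(adapter))
```

A candidate that returns an empty list would satisfy option (iii) for the enzymes scanned; a candidate containing a site would instead need to rely on option (i) or (ii).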

[0047] Embodiment 35 is the method of any one of embodiments 25-34, wherein the ligating one or more adapters to the DNA occurs prior to subjecting the sample or the treated sample to a procedure that affects a first nucleobase differently from a second nucleobase.

[0048] Embodiment 36 is the method of any one of the preceding embodiments, further comprising subjecting the DNA to end repair to generate end-repaired DNA molecules, wherein the end repair is performed using deoxynucleotide triphosphates (dNTPs).

[0049] Embodiment 37 is the method of the immediately preceding embodiment, wherein the end repair is performed using at least one type of dNTP which comprises a modified base, wherein the modified base is other than 5mC or 5hmC, and the at least one type of dNTP comprising a modified base is incorporated into a repaired region of the end-repaired DNA molecules at one or more locations.

[0050] Embodiment 38 is the method of embodiment 36, wherein the end repair is performed using at least one type of dNTP which comprises a modified base, wherein the modified base is a methylated cytosine, optionally wherein the methylated base is 5mC or 5hmC, and the at least one type of dNTP comprising a modified base is incorporated into a repaired region of the end-repaired DNA molecules at one or more locations.

[0051] Embodiment 39 is the method of embodiment 36, wherein the end repair is performed using at least one type of dNTP which comprises a modified base, wherein the modified base is a methylated cytosine, optionally wherein the methylated base is 5mC or 5hmC, wherein the at least one type of dNTP comprising a modified base is incorporated into a repaired region of the end-repaired DNA molecules at one or more locations, and the repaired region is defined as:
(i) the sequence between two non-methylated cytosines which span one or more methylated CpH cytosines; and/or
(ii) the sequence between a methylated CpH cytosine and an end of a sequence read, wherein the methylated CpH cytosine is the CpH cytosine most distant from the end of the sequence read, or a subsequence thereof comprising one or more methylated CpH cytosines.

[0052] Embodiment 40 is the method of embodiment 36, wherein at least one type of dNTP comprises a modified base, and the at least one dNTP comprising a modified base is incorporated into a repaired region of the end-repaired DNA molecules at one or more locations.

[0053] Embodiment 41 is the method of any one of embodiments 36-40, wherein the end repair is performed using a DNA polymerase that does not have 5’-3’ exonuclease activity and/or is not a strand displacing DNA polymerase.

[0054] Embodiment 42 is the method of any one of embodiments 36-40, wherein the end repair is performed using a DNA polymerase that has 5’-3’ exonuclease activity and/or is a strand displacing DNA polymerase.

[0055] Embodiment 43 is the method of any one of embodiments 36-42, wherein the at least one type of dNTP which comprises a modified base, wherein the modified base includes a dNTP comprising 4-methylcytosine (4mC), a dNTP comprising 5-methylcytosine (5mC), a dNTP comprising 5-hydroxymethyl-cytosine (5hmC), a dNTP comprising N6-methyladenosine (6mA), a dNTP comprising bromodeoxyuridine (BrdU) and/or a dNTP comprising 8-oxoguanine (8oxoG).

[0056] Embodiment 44 is the method of any one of embodiments 36-43, wherein the subjecting the DNA to end repair occurs prior to step a) and/or prior to ligating one or more adapters to the DNA.

[0057] Embodiment 45 is the method of any one of the preceding embodiments, further comprising performing an A-tailing reaction, optionally after a step of subjecting the DNA to end repair.

[0058] Embodiment 46 is the method of the immediately preceding embodiment, wherein the end-repair and the A-tailing reaction are performed in the same reaction mixture, optionally wherein the end-repair and the A-tailing reaction are performed in a single tube and/or optionally wherein the end-repair and the A-tailing reaction are performed without an intervening clean-up step.

[0059] Embodiment 47 is the method of embodiment 45 or embodiment 46, wherein the A-tailing is performed using a DNA polymerase that does not possess 5’-3’ exonuclease activity and/or is not a strand displacing DNA polymerase, optionally wherein the DNA polymerase is HemoKlen Taq.

[0060] Embodiment 48 is the method of any one of embodiments 45-48, wherein the A-tailing is performed using a thermostable DNA polymerase.

[0061] Embodiment 49 is the method of any one of the preceding embodiments, further comprising amplifying DNA in the sample using a DNA polymerase.

[0062] Embodiment 50 is the method of the immediately preceding embodiment, wherein the DNA polymerase is a uracil-tolerant DNA polymerase.

[0063] Embodiment 51 is the method of embodiment 49 or embodiment 50, wherein the amplifying occurs after step b).

[0064] Embodiment 52 is the method of any one of the preceding embodiments, further comprising capturing a first target region set comprising epigenetic target regions from the DNA.

[0065] Embodiment 53 is the method of the immediately preceding embodiment, wherein the capturing comprises contacting the DNA in the sample with a plurality of target-specific probes specific for members of the epigenetic target region set, thereby providing captured DNA.

[0066] Embodiment 54 is the method of embodiment 52 or embodiment 53, wherein the capturing further comprises capturing sequence-variable target regions of the DNA, comprising contacting the DNA with a plurality of target-specific probes specific for the sequence-variable target regions.

[0067] Embodiment 55 is the method of any one of embodiments 52-54, wherein the capturing occurs after step b).

[0068] Embodiment 56 is the method of any one of embodiments 52-55, wherein the capturing occurs after amplifying the DNA.

[0069] Embodiment 57 is the method of any one of embodiments 52-56, wherein the capturing occurs after step b) and after amplifying the DNA.

[0070] Embodiment 58 is the method of any one of embodiments 52-57, wherein the first target region set comprises a hypermethylation variable target region set.

[0071] Embodiment 59 is the method of the immediately preceding embodiment, wherein the hypermethylation variable target region set comprises regions having a higher degree of methylation in at least one type of tissue than the degree of methylation in cell-free DNA from a healthy subject.

[0072] Embodiment 60 is the method of embodiment 58 or embodiment 59, wherein the method further comprises determining a presence, absence, or likelihood of cancer based at least in part on sequences or quantities of regions in the hypermethylation variable target region set.

[0073] Embodiment 61 is the method of any one of embodiments 58-60, further comprising quantifying tumor DNA in the sample based at least in part on sequences or quantities of regions in the hypermethylation variable target region set.

[0074] Embodiment 62 is the method of any one of embodiments 52-57, wherein the epigenetic target regions comprise a hypomethylation variable target region set.

[0075] Embodiment 63 is the method of the immediately preceding embodiment, wherein the hypomethylation variable target region set comprises regions having a lower degree of methylation in at least one type of tissue than the degree of methylation in cell-free DNA from a healthy subject.

[0076] Embodiment 64 is the method of the immediately preceding embodiment, wherein the method further comprises determining a presence, absence, or likelihood of cancer based at least in part on sequences or quantities of regions in the hypomethylation variable target region set.

[0077] Embodiment 65 is the method of any one of embodiments 62-64, further comprising quantifying tumor DNA in the sample based at least in part on sequences or quantities of regions in the hypomethylation variable target region set.

[0078] Embodiment 66 is the method of any one of embodiments 52-65, wherein the epigenetic target regions comprise a methylation control target region set.

[0079] Embodiment 67 is the method of any one of embodiments 65-66, wherein the epigenetic target region set comprises a fragmentation variable target region set.

[0080] Embodiment 68 is the method of the immediately preceding embodiment, wherein the fragmentation variable target region set comprises transcription start site regions.

[0081] Embodiment 69 is the method of embodiment 67 or embodiment 68, wherein the fragmentation variable target region set comprises CTCF binding regions.

[0082] Embodiment 70 is the method of any one of embodiments 54-69, wherein DNA molecules corresponding to the sequence-variable target region set are captured with a greater capture yield than DNA molecules corresponding to the epigenetic target region set.

[0083] Embodiment 71 is the method of any one of embodiments 52-70, wherein capturing comprises contacting DNA to be captured with a set of target-specific probes, whereby complexes of target-specific probes and DNA are formed.

[0084] Embodiment 72 is the method of the immediately preceding embodiment, wherein capturing further comprises separating the complexes from DNA not bound to target-specific probes, thereby providing captured DNA.

[0085] Embodiment 73 is the method of embodiment 71 or embodiment 72, wherein the set of target-specific probes is configured to capture DNA corresponding to the sequence-variable target region set with a greater capture yield than DNA corresponding to the epigenetic target region set.

[0086] Embodiment 74 is the method of any one of embodiments 54-73, comprising sequencing DNA molecules corresponding to the sequence-variable target region set to a greater depth of sequencing than DNA molecules corresponding to the epigenetic target region set.

[0087] Embodiment 75 is the method of any one of the preceding embodiments, further comprising sequencing at least a portion of the DNA in the sample.

[0088] Embodiment 76 is the method of the immediately preceding embodiment, wherein the sequencing occurs after step b).

[0089] Embodiment 77 is the method of embodiment 75 or embodiment 76, wherein the sequencing occurs after amplifying a treated and converted sample.

[0090] Embodiment 78 is the method of any one of embodiments 75-77, wherein the sequencing occurs after capturing a first target region set comprising epigenetic target regions from the sample.

[0091] Embodiment 79 is the method of any one of embodiments 75-78, wherein the sequencing occurs after step b), after amplifying a treated and converted sample, and after capturing a first target region set comprising epigenetic target regions from the sample.

[0092] Embodiment 80 is the method of any one of the preceding embodiments, further comprising quantifying a level of methylation at one or more differentially methylated regions of the DNA.

[0093] Embodiment 81 is the method of the immediately preceding embodiment, wherein quantifying the level of methylation at one or more differentially methylated regions of the DNA comprises sequencing at least a portion of the amplified DNA or quantitative PCR.

[0094] Embodiment 82 is the method of any one of the preceding embodiments, wherein the sequencing comprises sequencing the DNA in a manner that distinguishes the first nucleobase from the second nucleobase.

[0095] Embodiment 83 is the method of any one of embodiments 74-82, wherein the sequencing comprises next-generation sequencing (NGS).

[0096] Embodiment 84 is the method of the immediately preceding embodiment, wherein the NGS comprises pyrosequencing, sequencing-by-synthesis, semiconductor sequencing, sequencing-by-ligation, or sequencing-by-hybridization.

[0097] Embodiment 85 is the method of any one of embodiments 74-82, wherein the sequencing comprises single-molecule real time (SMRT) sequencing.

[0098] Embodiment 86 is the method of any one of embodiments 74-82, wherein the sequencing comprises long-read sequencing.

[0099] Embodiment 87 is the method of any one of embodiments 74-82, wherein the sequencing comprises nanopore-based sequencing.

[0100] Embodiment 88 is the method of any one of embodiments 74-82, wherein the sequencing comprises 5-letter or 6-letter sequencing.

[0101] Embodiment 89 is the method of embodiments 74-82 or 87, wherein the sequencing comprises nanopore-based sequencing and the method comprises subjecting the DNA in the sample to end repair to generate end-repaired DNA molecules, wherein the end repair is performed using at least one type of dNTP which comprises a modified base including a dNTP comprising 4mC, a dNTP comprising 5mC, a dNTP comprising 5hmC, a dNTP comprising 6mA, a dNTP comprising BrdU, dUTP, a dNTP comprising fluorodeoxyuridine (FldU), a dNTP comprising 5-iododeoxyuridine (IdU), and / or a dNTP comprising 5-ethynyldeoxyuridine (EdU), and the at least one type of dNTP comprising a modified base is incorporated into a repaired region of the end-repaired DNA molecules at one or more locations.

[0102] Embodiment 90 is the method of embodiments 74-82 or 85, wherein the sequencing comprises single-molecule real time (SMRT) sequencing and the method comprises subjecting the DNA in the sample to end repair to generate end-repaired DNA molecules, wherein the end repair is performed using at least one type of dNTP which comprises a modified base including a dNTP comprising a 4mC, a dNTP comprising 5mC, a dNTP comprising 5hmC, a dNTP comprising 6mA, and / or a dNTP comprising 8oxoG, and the at least one type of dNTP comprising a modified base is incorporated into a repaired region of the end-repaired DNA molecules at one or more locations.

[0103] Embodiment 91 is the method of any one of embodiments 74-90, further comprising analyzing at least some of the sequence data corresponding to regions that are not identified as being synthesized during the end repair to detect the presence or absence of base modifications or mutations present in the DNA sample.

[0104] Embodiment 92 is the method of any one of embodiments 74-91, wherein the method further comprises detecting the methylation status of cytosines in the DNA in the sample, and further comprises analyzing the sequence data, wherein the analyzing the sequence data comprises filtering out the one or more repaired regions of the end-repaired DNA molecules such that the one or more repaired regions are not used to determine the methylation status of cytosines in the DNA sample.

[0105] Embodiment 93 is the method of any one of embodiments 74-91, wherein the method is for detecting the single nucleotide variants (SNVs) in the DNA sample, and further comprises analyzing the sequence data, wherein the analyzing the sequence data comprises classifying all base calls within the one or more end repaired regions as not having double stranded support.

[0106] Embodiment 94 is the method of any one of the preceding embodiments, further comprising analyzing the sequence data to determine a level of measured artifacts in the DNA of the sample.

[0107] Embodiment 95 is the method of any one of the preceding embodiments, wherein the sample is from a subject.

[0108] Embodiment 96 is the method of any one of the preceding embodiments, wherein the sample is from a subject and the method further comprises determining the presence or absence of cancer in the subject based at least in part on the sequencing data.

[0109] Embodiment 97 is the method of any one of embodiments 95-96, wherein the subject is an animal.

[0110] Embodiment 98 is the method of the immediately preceding embodiment, wherein the subject is a human.

[0111] Embodiment 99 is the method of any one of embodiments 95-98, wherein the subject has or is at risk of having a cancer.

[0112] Embodiment 100 is the method of any one of embodiments 95-99, further comprising determining the presence or status of a cancer in the subject.

[0113] Embodiment 101 is the method of any one of embodiments 95-99, further comprising determining a likelihood that the subject has cancer.

[0114] Embodiment 102 is the method of the immediately preceding embodiment, wherein the sequencing generates a plurality of sequencing reads; and the method further comprises mapping the plurality of sequence reads to one or more reference sequences to generate mapped sequence reads, and processing the mapped sequence reads corresponding to the sequence-variable target region set and to the epigenetic target region set to determine the likelihood that the subject has cancer.

[0115] Embodiment 103 is the method of embodiment 101, wherein the test subject was previously diagnosed with a cancer and received one or more previous cancer treatments, optionally wherein the cfDNA is obtained at one or more preselected time points following the one or more previous cancer treatments, and sequencing the captured set of cfDNA molecules, whereby a set of sequence information is produced.

[0116] Embodiment 104 is the method of the immediately preceding embodiment, further comprising detecting a presence or absence of DNA originating or derived from a tumor cell at a preselected timepoint using the set of sequence information.

[0117] Embodiment 105 is the method of the immediately preceding embodiment, further comprising determining a cancer recurrence score that is indicative of the presence or absence of the DNA originating or derived from the tumor cell for the test subject, optionally further comprising determining a cancer recurrence status based on the cancer recurrence score, wherein the cancer recurrence status of the test subject is determined to be at risk for cancer recurrence when a cancer recurrence score is determined to be at or above a predetermined threshold or the cancer recurrence status of the test subject is determined to be at lower risk for cancer recurrence when the cancer recurrence score is below the predetermined threshold.

[0118] Embodiment 106 is the method of the immediately preceding embodiment, further comprising comparing the cancer recurrence score of the test subject with a predetermined cancer recurrence threshold, wherein the test subject is classified as a candidate for a subsequent cancer treatment when the cancer recurrence score is above the cancer recurrence threshold or not a candidate for a subsequent cancer treatment when the cancer recurrence score is below the cancer recurrence threshold.
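
The threshold comparisons recited in embodiments 105 and 106 can be illustrated with a minimal Python sketch. The function names, the example scores, and the threshold value of 0.5 are hypothetical and are not taken from the disclosure; how a cancer recurrence score is computed is outside the scope of the sketch. Embodiment 106 does not address a score exactly equal to the threshold, so the sketch returns None in that case rather than guessing.

```python
from typing import Optional


def recurrence_status(score: float, threshold: float) -> str:
    """Embodiment 105: at or above the threshold is 'at risk', below is 'lower risk'."""
    return "at risk" if score >= threshold else "lower risk"


def treatment_candidate(score: float, threshold: float) -> Optional[bool]:
    """Embodiment 106: above the threshold is a candidate, below is not.

    A score equal to the threshold is not addressed, so None is returned.
    """
    if score > threshold:
        return True
    if score < threshold:
        return False
    return None


if __name__ == "__main__":
    threshold = 0.5  # hypothetical predetermined threshold
    for score in (0.2, 0.5, 0.8):  # hypothetical recurrence scores
        print(score, recurrence_status(score, threshold),
              treatment_candidate(score, threshold))
```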

[0119] In some embodiments, the results of the methods disclosed herein are used as an input to generate a report. The report may be in a paper or electronic format. For example, true copy number variation, as obtained by the methods disclosed herein, or information derived therefrom, can be displayed directly in such a report. Alternatively or additionally, diagnostic information or therapeutic recommendations which are at least in part based on the methods disclosed herein can be included in the report.

[0120] The various steps of the methods disclosed herein may be carried out at the same or different times, in the same or different geographical locations, e.g. countries, and / or by the same or different people.

[0121] Additional advantages will be set forth in part in the description which follows or may be learned by practice. The advantages will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims.

I. BRIEF DESCRIPTION OF THE DRAWINGS

[0122] FIG. 1 is a schematic representation of a comparative MSRE-sequencing method for determining the methylation status of nucleic acid molecules in a polynucleotide sample obtained from a subject. cfDNA is isolated from a blood sample, and the cfDNA sample is digested with one or more MSREs, cleaving unmethylated cfDNA molecules at the RE recognition site, to produce a treated sample; the treated sample is then sequenced using NGS. The cfDNA sample can contain a hypermethylated differentially methylated region (DMR) from a subject with cancer. MSRE cleaves abundant unmethylated and some lowly methylated cfDNA molecules from a sample. MSRE-sequencing enriches methylated molecules, but partially methylated molecules lead to a low signal-to-noise ratio. MSRE-sequencing can create false positive cancer detection because of partially methylated molecules with a methylated MSRE site, including in samples from subjects without cancer.

[0123] FIG. 2 is a schematic representation of a comparative single-site methylation sequencing (SSM) method for determining the methylation status of nucleic acid molecules in a polynucleotide sample obtained from a subject. cfDNA is isolated from a blood sample, the cfDNA sample is converted with an SSM conversion process, and the converted sample is then sequenced using NGS. The cfDNA sample can contain a hypermethylated differentially methylated region (DMR) from a subject with cancer. SSM accurately resolves DNA molecules derived from a cancerous sample and normal DNA molecules, but SSM has high sequencing costs because unmethylated molecules are sequenced in the assay.

[0124] FIG. 3 is a schematic diagram of a method for detecting the presence or absence of cancer in a subject according to certain embodiments of the disclosure. cfDNA is isolated from a blood sample, and the cfDNA sample is digested with one or more MSREs, cleaving unmethylated cfDNA molecules at the RE recognition site, to produce a treated sample; the treated sample is converted, and the converted sample is then sequenced using NGS. The cfDNA sample can contain a hypermethylated differentially methylated region (DMR) from a subject with cancer. MSRE cleaves abundant unmethylated and some lowly methylated cfDNA molecules from a sample. The method accurately resolves DNA molecules derived from a cancerous sample and normal DNA molecules without necessitating sequencing of unmethylated molecules.

[0125] FIG. 4 is a flow chart representation of a method for detecting the presence or absence of cancer in a subject according to certain embodiments of the disclosure. cfDNA is isolated from a blood sample, and the cfDNA sample undergoes end-repair and A-tailing reactions; the end-repaired cfDNA is ligated to an adapter that is protected from a conversion step; the ligated cfDNA is digested with one or more MSREs, cleaving unmethylated cfDNA molecules at the MSRE recognition site, to produce a treated sample; the treated sample is converted, producing a treated and converted sample; and then the treated and converted sample is amplified, optionally enriched for target sequences, and sequenced using NGS. End-repair may be performed with mCTP. This enables identification of end-repair synthesized regions in conversion analysis, leading to improved methylation calling accuracy. The method enriches methylated molecules with a high resolution of signal.
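
The digestion-then-conversion workflow of FIGS. 3 and 4 can be sketched in silico as follows. This is a simplified illustration under stated assumptions, not the disclosed assay: the 16-base reference sequence, the molecule names, and the choice of an HpaII-like recognition site (CCGG, with cleavage blocked when the internal cytosine is methylated) are picked for illustration only, and conversion is modeled as a bisulfite-type readout in which unmethylated cytosine reads as T and methylated cytosine reads as C.

```python
from dataclasses import dataclass
from typing import FrozenSet

REFERENCE = "ACGTCCGGACGTTCGA"  # hypothetical locus with four CpGs and one CCGG site
SITE = "CCGG"                   # assumed HpaII-like MSRE recognition site
SITE_CPG_OFFSET = 1             # internal C of CCGG; methylation here blocks cleavage


@dataclass(frozen=True)
class Molecule:
    name: str
    methylated: FrozenSet[int]  # 0-based positions of methylated cytosines


def survives_msre(reference: str, mol: Molecule) -> bool:
    """Return True if every recognition site on the molecule is methylated."""
    start = reference.find(SITE)
    while start != -1:
        if start + SITE_CPG_OFFSET not in mol.methylated:
            return False  # an unmethylated site is cleaved
        start = reference.find(SITE, start + 1)
    return True


def convert(reference: str, mol: Molecule) -> str:
    """Bisulfite-type readout: unmethylated C reads as T, methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in mol.methylated else base
        for i, base in enumerate(reference)
    )


def called_cpg_fraction(reference: str, converted: str) -> float:
    """Fraction of reference CpG cytosines that still read as C after conversion."""
    cpgs = [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]
    if not cpgs:
        return 0.0
    return sum(converted[i] == "C" for i in cpgs) / len(cpgs)


molecules = [
    Molecule("tumor_like", frozenset({1, 5, 9, 13})),   # all four CpGs methylated
    Molecule("unmethylated", frozenset()),              # no methylation
    Molecule("partially_methylated", frozenset({5})),   # only the MSRE-site CpG
]

treated = [m for m in molecules if survives_msre(REFERENCE, m)]
results = {m.name: called_cpg_fraction(REFERENCE, convert(REFERENCE, m)) for m in treated}
print(results)
```

In this toy example the unmethylated molecule is removed by digestion and is never sequenced. The partially methylated molecule survives because its CCGG site is methylated, but its converted read retains C at only one of four CpGs, so it can be told apart from the fully methylated molecule.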

[0126] FIG. 5 is a schematic diagram of an example of a system suitable for use with some embodiments of the disclosure.

II. DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

[0127] Reference will now be made in detail to certain embodiments of the disclosure. While the disclosure will be described in conjunction with such embodiments, it will be understood that they are not intended to limit the disclosure to those embodiments. On the contrary, the disclosure is intended to cover all alternatives, modifications, and equivalents, which may be included within the disclosure as defined by the appended claims.

[0128] Before describing the present teachings in detail, it is to be understood that the disclosure is not limited to specific compositions or process steps, as such may vary. It should be noted that, as used in this specification and the appended claims, the singular form “a”, “an” and “the” include plural references unless the context clearly dictates otherwise. Thus, for example, reference to “a nucleic acid” includes a plurality of nucleic acids.

[0129] Numeric ranges are inclusive of the numbers defining the range. Measured and measurable values are understood to be approximate, taking into account significant digits and the error associated with the measurement.

[0130] Unless specifically noted in the above specification, embodiments in the specification that recite “comprising” various components are also contemplated as “consisting of” or “consisting essentially of” the recited components.

[0131] The section headings used herein are for organizational purposes and are not to be construed as limiting the disclosed subject matter in any way.

[0132] All patents, patent applications, websites, other publications or documents and the like cited herein whether supra or infra, are expressly incorporated by reference in their entirety for all purposes to the same extent as if each individual item were specifically and individually indicated to be so incorporated by reference. If different versions of a publication, website or the like are published at different times, the version most recently published at the effective filing date of the application is meant, unless otherwise indicated.

Definitions

[0133] “Solid tissue” or “solid tissue cells” as used herein means tissue or cells, respectively, in or derived from a solid tissue. Solid tissue cells exclude circulating cell types, such as cells normally present in blood or lymph. Examples of solid tissue types include but are not limited to colon, lung, breast, skin, prostate, stomach, pancreas, bladder, kidney, and liver.

[0134] A “reaction cleanup” refers to the removal of contaminants such as salts, enzymes, unincorporated dNTPs, primers, ethidium bromide, and other impurities that can interfere with downstream analysis. For example, when a reaction cleanup is performed between end repair and an A-tailing reaction, it removes unincorporated dNTPs such that the A-tailing reaction can be performed solely in the presence of dATP (i.e., not dCTP, dGTP and dTTP, as used in the end tailing reaction). Reaction cleanups can be performed using commercially available kits such as MinElute Reaction Cleanup Kit (Qiagen).

[0135] “Regions of the end-repaired DNA that were synthesized during the end repair reaction”, also referred to as “repaired regions” or “synthesized regions,” refer to regions of the DNA that were not present in the DNA prior to the end repair and A-tailing reactions. They are regions which have been synthesized by the polymerases used in the end repair and / or A tailing reactions, if present. In instances where the A-tailing is performed in the same tube as the end repair reaction, all four types of dNTPs will be present, and thus the polymerases used for A-tailing may generate synthesized regions, e.g. through nick translation. In instances where the A-tailing is performed separately to the end repair reaction, and these steps are separated by a reaction cleanup, only dATP will be present in the A-tailing reaction, and thus the polymerases used for A-tailing will not typically generate synthesized regions because the dNTP components are not all present in the A-tailing reaction mix.

[0136] A “type of dNTP” refers to a dNTP comprising a specific base, including A, T, G or C. Accordingly, wherein an end repair reaction is performed with dNTPs, wherein at least one type of dNTP comprises a modified base, the end repair reaction may be performed using dCTP comprising 5mC, and dATP, dTTP and dGTP all comprising non-modified bases.

[0137] “Capable of identifying the base modification in the at least one type of dNTP” refers to the ability of a modification-sensitive sequencing method to detect the presence or absence of the base modification in the at least one type of dNTP comprising a modified base used in the end repair. This detection of the base modification may be direct, such as in nanopore sequencing or single molecule real time sequencing, wherein the sequencing data itself indicates the presence or absence of a base modification. Alternatively, the detection of the base modification may be indirect, for example wherein the method involves a conversion procedure which alters the base pairing specificity dependent on the base modification status. It is these changes in base pairing specificity which can be detected by the sequencing method, e.g. through the comparison of the sequencing data to a reference sequence. Moreover, a modification-sensitive sequencing method is capable of identifying the base modification in the at least one type of dNTP regardless of whether it can distinguish one base modification from all other base modifications. For example, one form of modification-sensitive sequencing is sequencing after bisulfite conversion. This method is capable of distinguishing 5hmC and 5mC from unmethylated cytosine, but cannot distinguish 5hmC from 5mC.
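
The bisulfite example can be made concrete with a short Python sketch. The state labels and the function name are illustrative assumptions; the mapping encodes only the behavior stated above, namely that after bisulfite conversion and amplification unmethylated cytosine is read as T, while 5mC and 5hmC are both read as C.

```python
# Base read at a cytosine position after bisulfite conversion and amplification.
BISULFITE_READOUT = {"C": "T", "5mC": "C", "5hmC": "C"}


def distinguishable(state_a: str, state_b: str) -> bool:
    """Two cytosine states are distinguishable only if their readouts differ."""
    return BISULFITE_READOUT[state_a] != BISULFITE_READOUT[state_b]


print(distinguishable("C", "5mC"))     # unmethylated vs 5mC
print(distinguishable("C", "5hmC"))    # unmethylated vs 5hmC
print(distinguishable("5mC", "5hmC"))  # 5mC vs 5hmC
```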

[0138] Bases of the “same identity” refer to the same base, regardless of modification status of that base. For example, cytosine is considered to be the “same identity” as 5-methylcytosine (5mC) and / or 5-hydroxymethyl-cytosine (5hmC), despite them having different modification statuses.

[0139] “Cell-free DNA,” “cfDNA molecules,” or simply “cfDNA” include DNA molecules that naturally occur in a subject in extracellular form (e.g., in blood, serum, plasma, or other bodily fluids such as lymph, cerebrospinal fluid, urine, or sputum). While the cfDNA originally existed in a cell or cells in a large complex biological organism, e.g., a mammal, it has undergone release from the cell(s) into a fluid found in the organism, and may be obtained from a sample of the fluid without the need to perform an in vitro cell lysis step.

[0140] As used herein, “cellular nucleic acids” means nucleic acids that are located within one or more cells from which the nucleic acids have originated, at least at the point a sample is taken or collected from a subject, even if those nucleic acids are subsequently removed (e.g., via cell lysis) as part of a given analytical process.

[0141] A “target region set” or “set of target regions” or “target regions” or “target regions of interest” or “regions of interest” or “genomic regions of interest” refers to a plurality of genomic loci or a plurality of genomic regions targeted for capture and / or targeted by a set of probes (e.g., through sequence complementarity).

[0142] “Sequence-variable target region set” refers to a set of target regions that may exhibit changes in sequence such as nucleotide substitutions (i.e., single nucleotide variations), insertions, deletions, or gene fusions or transpositions in neoplastic cells (e.g., tumor cells and cancer cells).

[0143] “Epigenetic target region set” refers to target regions that may show sequence-independent changes in neoplastic cells (e.g., tumor cells or cancer cells) or that may show sequence-independent changes in cfDNA from subjects having cancer relative to cfDNA from healthy subjects. Examples of sequence-independent changes include, but are not limited to, changes in methylation (increases or decreases), nucleosome distribution, CCCTC-binding factor (“CTCF”) binding, transcription start sites, and regulatory protein binding regions. For present purposes, loci susceptible to neoplasia-, tumor-, or cancer-associated focal amplifications and / or gene fusions may also be included in an epigenetic target region set because detection of a change in copy number by sequencing or a fused sequence that maps to more than one locus in a reference genome tends to be more similar to detection of exemplary epigenetic changes discussed above than detection of nucleotide substitutions, insertions, or deletions, e.g., in that the focal amplifications and / or gene fusions can be detected at a relatively shallow depth of sequencing because their detection does not depend on the accuracy of base calls at one or a few individual positions.

[0144] As used herein, an “epigenetic feature” refers to any feature of DNA or chromatin other than primary sequence (i.e., the sequence of A, C, G, and T bases). Epigenetic features include covalent modifications of bases, such as methylation, and modifications and positioning of histones and other stably DNA-associated proteins.

[0145] As used herein, a “differentially methylated region” (DMR) refers to a region of DNA having a detectably different degree of methylation in at least one cell or tissue type relative to the degree of methylation in the same region of DNA from at least one other cell or tissue type; or having a detectably different degree of methylation in at least one cell or tissue type obtained from a subject having a disease or disorder relative to the degree of methylation in the same region of DNA in the same cell or tissue type obtained from a healthy subject. In some embodiments, a DMR has a detectably higher degree of methylation (e.g., hypermethylated region) in at least one cell or tissue type relative to the degree of methylation in the same region of DNA from at least one other cell or tissue type or from the same cell or tissue type from a healthy subject. In some embodiments, a DMR has a detectably lower degree of methylation (e.g., hypomethylated region) in at least one cell or tissue type relative to the degree of methylation in the same region of DNA from at least one other cell or tissue type or from the same cell or tissue type from a healthy subject.

[0146] As used herein, “type-specific” in the context of an epigenetic variation means an epigenetic variation that is present at a detectably different degree in one cell or tissue type, or in a plurality of related cell or tissue types, relative to other cell or tissue types. Similarly, a “type-specific epigenetic target region” is an epigenetic target region that has a detectably different epigenetic characteristic in one cell or tissue type, or in a plurality of related cell or tissue types, relative to other cell or tissue types. Exemplary epigenetic characteristics are discussed in the definition of epigenetic target regions set forth above. For example, a “type-specific differentially methylated region” is a region of DNA that has a detectably different degree of methylation in one cell or tissue type, or in a plurality of related cell or tissue types, relative to other cell or tissue types. Examples of a type-specific differentially methylated region include tissue-specific differentially methylated regions, including those associated with copy-number gain in early cancer. In some embodiments, capturing, identification, and / or detection of type-specific differentially methylated regions facilitates identification of the cell or tissue type from which the DNA originated. The cell or tissue from which a type-specific differentially methylated region originated may be a wild type cell or tissue or a neoplastic cell or tissue. In another example, a “type-specific fragment” of DNA is a DNA fragment arising from a type-specific fragmentation pattern that is present at a detectably different degree in one cell or tissue type, or in a plurality of related cell or tissue types, relative to other cell or tissue types. In some embodiments, a type-specific fragment is only present in the specific cell or tissue type(s). In some embodiments, a type-specific fragment is present to a detectably greater extent in the specific cell or tissue type(s).

[0147] As used herein, a “blood sample” refers to a sample comprising whole blood or a component thereof (e.g., plasma, serum, buffy coat, plasma pellet).

[0148] As used herein, “partitioning” refers to physically separating or fractionating a mixture of nucleic acid molecules in a sample based on a characteristic of the nucleic acid molecules. The partitioning can be physical partitioning of molecules. Partitioning can involve separating the nucleic acid molecules into groups or sets based on the level of epigenetic feature (e.g., methylation). For example, the nucleic acid molecules can be partitioned based on the level of methylation of the nucleic acid molecules. In some embodiments, the methods and systems used for partitioning may be found in PCT Patent Application No. PCT / US2017 / 068329, which is hereby incorporated by reference in its entirety.

[0149] As used herein, “partitioned set” or “partition” refers to a set of nucleic acid molecules partitioned into a set or group based on the differential binding affinity of the nucleic acid molecules or proteins associated with the nucleic acid molecules to a binding agent. A partitioned set may also be referred to as a subsample. The binding agent binds preferentially to the nucleic acid molecules comprising nucleotides with epigenetic modification. For example, if the epigenetic modification is methylation, the binding agent can be a methyl binding domain (MBD) protein. In some embodiments, a partitioned set can comprise nucleic acid molecules belonging to a particular level or degree of epigenetic feature (e.g., methylation). For example, the nucleic acid molecules can be partitioned into three sets - one set for highly methylated nucleic acid molecules (first subsample, hyper partition, hyper partitioned set or hypermethylated partitioned set), a second set for low methylated nucleic acid molecules (second subsample, hypo partition, hypo partitioned set or hypomethylated partitioned set), and a third set for intermediate methylated nucleic acid molecules (third subsample, intermediate partitioned set, intermediately methylated partitioned set, residual partitioned set, or residual partition). In another example, the nucleic acid molecules can be partitioned based on the number of methylated nucleotides - one partitioned set can have nucleic acid molecules with nine methylated nucleotides, and another partitioned set can have unmethylated nucleic acid molecules (zero methylated nucleotides).
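
The three-set example above can be illustrated with a minimal Python sketch that groups molecules by their number of methylated nucleotides. This is a bookkeeping analogy only: the cutoffs `hypo_max` and `hyper_min`, the function names, and the molecule names are hypothetical, and physical partitioning relies on differential binding to an agent such as an MBD protein rather than on an explicit count.

```python
from collections import defaultdict
from typing import Dict, List


def assign_partition(methylated_count: int, hypo_max: int = 1, hyper_min: int = 5) -> str:
    """Map a methylated-nucleotide count to a partition label (hypothetical cutoffs)."""
    if methylated_count >= hyper_min:
        return "hyper"
    if methylated_count <= hypo_max:
        return "hypo"
    return "intermediate"


def partition_molecules(counts: Dict[str, int]) -> Dict[str, List[str]]:
    """Group molecule names by the partition their methylation count falls into."""
    partitions: Dict[str, List[str]] = defaultdict(list)
    for name, count in counts.items():
        partitions[assign_partition(count)].append(name)
    return dict(partitions)


# Hypothetical molecules with nine, zero, and three methylated nucleotides.
print(partition_molecules({"mol_a": 9, "mol_b": 0, "mol_c": 3}))
```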

[0150] As used herein, the form of the “originally isolated” sample refers to the composition or chemical structure of a sample at the time it was isolated and before undergoing any procedure that changes the chemical structure of the isolated sample. Similarly, a feature that is “originally present” in a molecule refers to a feature present in an “original molecule” or in molecules “originally comprising” the feature before the molecule undergoes any procedure that changes the chemical structure of the molecule.

[0151] As used herein, “base pairing specificity” refers to the standard DNA base (A, C, G, or T) with which a given base most preferentially pairs. Thus, for example, unmodified cytosine and 5-methylcytosine have the same base pairing specificity (i.e., specificity for G) whereas uracil and cytosine have different base pairing specificity because uracil has base pairing specificity for A while cytosine has base pairing specificity for G. The ability of uracil to form a wobble pair with G, for example, is irrelevant because uracil nonetheless most preferentially pairs with A among the four standard DNA bases.
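
As a minimal sketch of this definition, base pairing specificity can be represented as a lookup from each base to its most preferred standard partner. The table name and function name are illustrative; the entries cover only the standard bases plus the two examples discussed above (5-methylcytosine and uracil).

```python
# Most preferred standard DNA pairing partner for each base.
PREFERRED_PARTNER = {
    "A": "T",
    "T": "A",
    "G": "C",
    "C": "G",
    "5mC": "G",  # same specificity as unmodified cytosine
    "U": "A",    # wobble pairing with G is ignored by definition
}


def same_base_pairing_specificity(base_a: str, base_b: str) -> bool:
    """True if both bases most preferentially pair with the same standard base."""
    return PREFERRED_PARTNER[base_a] == PREFERRED_PARTNER[base_b]


print(same_base_pairing_specificity("C", "5mC"))
print(same_base_pairing_specificity("C", "U"))
```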

[0152] “Capturing” one or more target molecules refers to preferentially isolating or separating the one or more target molecules from non-target molecules.

[0153] A “captured set” of nucleic acids refers to nucleic acids that have undergone capture.

[0154] “Corresponding to a target region set” means that a nucleic acid, such as cfDNA, originated from a locus in the target region set or specifically binds one or more probes for the target-region set.

[0155] As used herein, a “label” is a capture moiety, fluorophore, oligonucleotide, or other moiety that facilitates detection, separation, or isolation of that to which it is attached.

[0156] As used herein, a “capture moiety” is a molecule that allows affinity separation of molecules linked to the capture moiety from molecules lacking the capture moiety. Exemplary capture moieties include biotin, which allows affinity separation by binding to streptavidin linked or linkable to a solid phase, or an oligonucleotide, which allows affinity separation through binding to a complementary oligonucleotide linked or linkable to a solid phase.

[0157] As used herein, a “target-specific probe” means a probe that specifically binds to a target region, such as an epigenetic target region or a sequence-variable target region. In some embodiments, target-specific probes comprise a capture moiety to facilitate capture of the target region to which they specifically bind.

[0158] As used herein, a “tag” is a molecule, such as a nucleic acid, label, fluorophore, or peptide, containing information that indicates a feature of the molecule to which the tag is associated. For example, molecules can bear a sample tag (which distinguishes molecules in one sample from those in a different sample), a molecular tag / molecular barcode / barcode (which distinguishes different molecules from one another, in both unique and non-unique tagging scenarios), a purification tag, and / or a detectable tag or label.

[0159] As used herein, a “target molecule” is a molecule, such as a protein, carbohydrate, nucleic acid, or lipid, that is targeted for capture, identification, and / or detection. In some embodiments, a target molecule is a nucleic acid comprising an epigenetic target region and / or a sequence-variable target region.

[0160] “Specifically binds” in the context of a probe or other oligonucleotide and a target sequence means that under appropriate hybridization conditions, the oligonucleotide or probe hybridizes to its target sequence, or replicates thereof, to form a stable probe:target hybrid, while at the same time formation of stable probe:non-target hybrids is minimized. Thus, a probe hybridizes to a target sequence or replicate thereof to a sufficiently greater extent than to a non-target sequence, to enable capture or detection of the target sequence. Appropriate hybridization conditions are well-known in the art, may be predicted based on sequence composition, or can be determined by using routine testing methods (see, e.g., Sambrook et al., Molecular Cloning, A Laboratory Manual, 2nd ed. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989) at §§ 1.90-1.91, 7.37-7.57, 9.47-9.51 and 11.47-11.57, particularly §§ 9.50-9.51, 11.12-11.13, 11.45-11.47 and 11.55-11.57, incorporated by reference herein).

[0161] DNA is “derived from cancerous cells” if it originated from a tumor cell. Cell-free DNA derived from cancerous cells includes ctDNA, or circulating tumor DNA. Tumor cells are neoplastic cells that originated from a tumor, regardless of whether they remain in the tumor or become separated from the tumor (as in the cases, e.g., of metastatic cancer cells and circulating tumor cells).

[0162] The “capture yield” of a collection of probes for a given target region set refers to the amount (e.g., amount relative to another target region set or an absolute amount) of nucleic acid corresponding to the target set that the collection of probes captures under typical conditions. Exemplary typical capture conditions are an incubation of the sample nucleic acid and probes at 65°C for 10-18 hours in a small reaction volume (about 20 µL) containing stringent hybridization buffer. The capture yield may be expressed in absolute terms or, for a plurality of collections of probes, relative terms. When capture yields for a plurality of sets of target regions are compared, they are normalized for the footprint size of the target region set (e.g., on a per-kilobase basis). Thus, for example, if the footprint sizes of first and second target regions are 50 kb and 500 kb, respectively (giving a normalization factor of 0.1), then the DNA corresponding to the first target region set is captured with a higher yield than DNA corresponding to the second target region set when the mass per volume concentration of the captured DNA corresponding to the first target region set is more than 0.1 times the mass per volume concentration of the captured DNA corresponding to the second target region set. As a further example, using the same footprint sizes, if the captured DNA corresponding to the first target region set has a mass per volume concentration of 0.2 times the mass per volume concentration of the captured DNA corresponding to the second target region set, then the DNA corresponding to the first target region set was captured with a two-fold greater capture yield than the DNA corresponding to the second target region set.
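As a non-limiting sketch of the footprint normalization described above, the per-kilobase comparison can be written out as a short calculation. The function name and argument names are hypothetical and introduced only for this illustration; concentrations are assumed to be in the same mass per volume units for both target region sets.

```python
def relative_capture_yield(conc_first, footprint_first_kb,
                           conc_second, footprint_second_kb):
    """Per-kilobase capture yield of the first target region set
    relative to the second (hypothetical helper for illustration)."""
    per_kb_first = conc_first / footprint_first_kb
    per_kb_second = conc_second / footprint_second_kb
    return per_kb_first / per_kb_second


# Footprints of 50 kb and 500 kb; the captured DNA for the first set has
# 0.2 times the mass per volume concentration of the second set.
ratio = relative_capture_yield(0.2, 50.0, 1.0, 500.0)
print(round(ratio, 6))
```

A ratio above 1 indicates that the first target region set was captured with the higher normalized yield; with the values above the ratio is approximately 2, matching the two-fold example, and a concentration ratio of exactly 0.1 would give a ratio of approximately 1.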

[0163] The term “methylation” or “DNA methylation” refers to addition of a methyl group to a nucleotide base in a nucleic acid molecule. In some embodiments, methylation refers to addition of a methyl group to a cytosine at a CpG site (cytosine-phosphate-guanine site (i.e., a cytosine followed by a guanine in a 5’ -> 3’ direction of the nucleic acid sequence)). In some embodiments, DNA methylation refers to addition of a methyl group to adenine, such as in N6-methyladenine (6mA). In some embodiments, DNA methylation is 5-methylation (modification of the carbon in the 5th position of the cytosine ring). In some embodiments, 5-methylation refers to addition of a methyl group to the 5C position of the cytosine to create 5-methylcytosine (5mC). In some embodiments, methylation comprises a derivative of 5mC. Derivatives of 5mC include, but are not limited to, 5-hydroxymethylcytosine (5-hmC), 5-formylcytosine (5-fC), and 5-carboxylcytosine (5-caC). In some embodiments, DNA methylation is 3C methylation (modification of the carbon in the 3rd position of the cytosine ring). In some embodiments, 3C methylation comprises addition of a methyl group to the 3C position of the cytosine to generate 3-methylcytosine (3mC). Methylation can also occur at non-CpG sites; for example, methylation can occur at a CpA, CpT, or CpC site. DNA methylation can change the activity of a methylated DNA region. For example, when DNA in a promoter region is methylated, transcription of the gene may be repressed. DNA methylation is critical for normal development and abnormality in methylation may disrupt epigenetic regulation. The disruption, e.g., repression, in epigenetic regulation may cause diseases, such as cancer. Promoter methylation in DNA may be indicative of cancer.

[0164] The “modified nucleoside profile of DNA” means the position and identity of the nucleoside and the modification status of the nucleoside, such as methylations, within a DNA sequence. As described above, different modification sensitive sequencing methods can be used to detect such modifications. This includes methods which involve conversion followed by sequencing to detect one or more different types of modified or unmodified nucleoside. For example, the TAPS method detects, but does not distinguish between, 5-methylcytosine (5mC) and 5-hydroxymethyl-cytosine (5hmC). Hence, a method for analyzing the modified nucleoside profile of DNA in a sample typically means identifying particular modifications or groups of modification, such as 5mC and / or 5hmC. Modified nucleosides are identified according to the specific method / conversion procedure being used as described above. This generally involves comparing sequence data obtained from DNA that has been subjected to a conversion procedure to a reference sequence. Typically, the method involves (i) comparing the sequence data with (A) one or more pre-determined reference sequences; or (B) sequence data obtained by sequencing a sub-sample of the DNA that was not subjected to the conversion procedure, for example a subsample that was separated before subjecting a separate subsample to the conversion procedure, for example as described herein; and (ii) identifying point differences between the converted DNA sequences and the reference sequence(s) (A) or non-converted DNA sequences (B) as nucleosides (in the initial sample) having a modification status that permits a change in base pairing specificity on exposure to the conversion procedure.
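The comparison in steps (i) and (ii) above can be illustrated with a minimal sketch. It assumes that the converted sequence and the reference (or non-converted) sequence are already aligned, are of equal length, and are on the same strand, and that the conversion procedure produces a C-to-T read-out. Which cytosines are converted (modified or unmodified) depends on the conversion procedure used, so the sketch only reports positions and leaves their interpretation to the caller. The function name is hypothetical.

```python
def c_to_t_point_differences(reference, converted):
    """Return 0-based positions where the reference has C and the
    converted sequence reads T (hypothetical helper for illustration)."""
    if len(reference) != len(converted):
        raise ValueError("sequences must be pre-aligned and of equal length")
    pairs = zip(reference.upper(), converted.upper())
    return [i for i, (ref_base, conv_base) in enumerate(pairs)
            if ref_base == "C" and conv_base == "T"]


reference = "ACGTCCGA"
converted = "ACGTTCGA"
print(c_to_t_point_differences(reference, converted))  # prints [4]
```

Real data would additionally require alignment, strand handling, and filtering of sequence variants and sequencing errors, none of which is shown here.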

[0165] As used herein, a modification or other feature is present in “a greater proportion” in a first sample or population of nucleic acid than in a second sample or population when the fraction of nucleotides with the modification or other feature is higher in the first sample or population than in the second sample or population. For example, if in a first sample, one tenth of the nucleotides are mC, and in a second sample, one twentieth of the nucleotides are mC, then the first sample comprises the cytosine modification of 5-methylation in a greater proportion than the second sample.

[0166] As used herein, “without substantially altering base-pairing specificity” of a given nucleobase means that a majority of molecules comprising that nucleobase that can be sequenced do not have alterations of the base pairing specificity of that nucleobase relative to its base pairing specificity as it was in the originally isolated sample. In some embodiments, 75%, 90%, 95%, or 99% of molecules comprising that nucleobase that can be sequenced do not have alterations of the base pairing specificity of that nucleobase relative to its base pairing specificity as it was in the originally isolated sample.

[0167] As used herein, “modified cytosine” refers to a cytosine in which at least one position of the cytosine has been substituted with a chemical moiety, such as a methyl or hydroxymethyl, that is different from the substituent at that position in unmodified cytosine. For the avoidance of doubt, “modified cytosine” does not include unmodified cytosine.

[0168] As used herein, a “combination” comprising a plurality of members refers to either of a single composition comprising the members or a set of compositions in proximity, e.g., in separate containers or compartments within a larger container, such as a multiwell plate, tube rack, refrigerator, freezer, incubator, water bath, ice bucket, machine, or other form of storage.

[0169] The term “hypermethylation” refers to an increased level or degree of methylation of nucleic acid molecule(s) relative to the other nucleic acid molecules within a population (e.g., sample) of nucleic acid molecules. In some embodiments, hypermethylated DNA can include DNA molecules comprising at least 1 methylated residue, at least 2 methylated residues, at least 3 methylated residues, at least 5 methylated residues, or at least 10 methylated residues.

[0170] As used herein, “type-specific hypermethylation” means an increased level or degree of methylation of nucleic acid molecules in at least one cell or tissue type, or in a plurality of related cell or tissue types, relative to other cell or tissue types. In some embodiments, capturing, identification, and / or detection of type-specific hypermethylated regions facilitates identification of the cell or tissue type from which the nucleic acid molecules originated. The cell or tissue from which a type-specific hypermethylated region originated may be a wild type cell or tissue or a neoplastic cell or tissue.

[0171] The term “hypomethylation” refers to a decreased level or degree of methylation of nucleic acid molecule(s) relative to the other nucleic acid molecules within a population (e.g., sample) of nucleic acid molecules. In some embodiments, hypomethylated DNA includes unmethylated DNA molecules. In some embodiments, hypomethylated DNA can include DNA molecules comprising 0 methylated residues, at most 1 methylated residue, at most 2 methylated residues, at most 3 methylated residues, at most 4 methylated residues, or at most 5 methylated residues.

[0172] As used herein, “type-specific hypomethylation” means a decreased level or degree of methylation of nucleic acid molecules in at least one cell or tissue type, or in a plurality of related cell or tissue types, relative to other cell or tissue types. In some embodiments, capturing, identification, and / or detection of type-specific hypomethylated regions facilitates identification of the cell or tissue type from which the nucleic acid molecules originated. The cell or tissue from which a type-specific hypomethylated region originated may be a wild type cell or tissue or a neoplastic cell or tissue.

[0173] As used herein, “methylation status” can refer to the presence or absence of a methyl group on a DNA base (e.g., cytosine) at a particular genomic position in a nucleic acid molecule. It can also refer to the degree of methylation in a nucleic acid sequence (e.g., highly methylated, low methylated, intermediately methylated or unmethylated nucleic acid molecules). The methylation status can also refer to the number of nucleotides methylated in a particular nucleic acid molecule.

[0174] As used herein, “mutation” refers to a variation from a known reference sequence and includes mutations such as, for example, single nucleotide variants (SNVs), and insertions or deletions (indels). A mutation can be a germline or somatic mutation. In some embodiments, a reference sequence for purposes of comparison is a wildtype genomic sequence of the species of the subject providing a test sample, typically the human genome.

[0175] As used herein, the terms “neoplasm” and “tumor” are used interchangeably. They refer to abnormal growth of cells in a subject. A neoplasm or tumor can be benign, potentially malignant, or malignant. A malignant tumor is referred to as a cancer or a cancerous tumor.

[0176] As used herein, “next-generation sequencing” or “NGS” refers to sequencing technologies having increased throughput as compared to traditional Sanger- and capillary electrophoresis-based approaches, for example, with the ability to generate hundreds of thousands of relatively small sequence reads at a time. Some examples of next-generation sequencing techniques include, but are not limited to, sequencing by synthesis, sequencing by ligation, and sequencing by hybridization. In some embodiments, next-generation sequencing includes the use of instruments capable of sequencing single molecules. Examples of commercially available instruments for performing next-generation sequencing include, but are not limited to, NextSeq, HiSeq, NovaSeq, MiSeq, Ion PGM and Ion GeneStudio S5.

[0177] As used herein, “nucleic acid tag” refers to a short nucleic acid (e.g., less than about 500 nucleotides, about 100 nucleotides, about 50 nucleotides, or about 10 nucleotides in length), used to distinguish nucleic acids from different samples (e.g., representing a sample index), distinguish nucleic acids from different partitions (e.g., representing a partition tag) or different nucleic acid molecules in the same sample (e.g., representing a molecular barcode), of different types, or which have undergone different processing. The nucleic acid tag comprises a predetermined, fixed, non-random, random or semi-random oligonucleotide sequence. Such nucleic acid tags may be used to label different nucleic acid molecules or different nucleic acid samples or sub-samples. Nucleic acid tags can be single-stranded, double-stranded, or at least partially double-stranded. Nucleic acid tags optionally have the same length or varied lengths. Nucleic acid tags can also include double-stranded molecules having one or more blunt ends, include 5’ or 3’ single-stranded regions (e.g., an overhang), and / or include one or more other single-stranded regions at other locations within a given molecule. Nucleic acid tags can be attached to one end or to both ends of the other nucleic acids (e.g., sample nucleic acids to be amplified and / or sequenced). Nucleic acid tags can be decoded to reveal information such as the sample of origin, form, or processing of a given nucleic acid. For example, nucleic acid tags can also be used to enable pooling and / or parallel processing of multiple samples comprising nucleic acids bearing different molecular barcodes and / or sample indexes in which the nucleic acids are subsequently being deconvolved by detecting (e.g., reading) the nucleic acid tags. Nucleic acid tags can also be referred to as identifiers (e.g., molecular identifier, sample identifier).
Additionally, or alternatively, nucleic acid tags can be used as molecular identifiers (e.g., to distinguish between different molecules or amplicons of different parent molecules in the same sample or sub-sample). This includes, for example, uniquely tagging different nucleic acid molecules in a given sample, or non-uniquely tagging such molecules. In the case of non-unique tagging applications, a limited number of tags (i.e., molecular barcodes) may be used to tag each nucleic acid molecule such that different molecules can be distinguished based on their endogenous sequence information (for example, start and / or stop positions where they map to a selected reference genome, a sub-sequence of one or both ends of a sequence, and / or length of a sequence) in combination with at least one molecular barcode. Typically, a sufficient number of different molecular barcodes are used such that there is a low probability (e.g., less than about a 10%, less than about a 5%, less than about a 1%, or less than about a 0.1% chance) that any two molecules may have the same endogenous sequence information (e.g., start and / or stop positions, subsequences of one or both ends of a sequence, and / or lengths) and also have the same molecular barcode. Terms such as “library adapters having distinct molecular barcodes” encompass library adapters for uniquely or non-uniquely tagging molecules, in that regardless of whether the adapters are for unique or non-unique tagging, distinct barcodes will be present in the population of adapters.
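As an illustrative sketch of non-unique tagging, reads may be grouped into families keyed on endogenous sequence information combined with the molecular barcode. The sketch below assumes, for simplicity, that each read is summarized as a hypothetical (start, stop, barcode) tuple; the function name and the example coordinates and barcodes are invented for illustration.

```python
from collections import defaultdict


def group_into_families(reads):
    """Group read indices by (start, stop, barcode).

    reads: iterable of (start, stop, barcode) tuples (hypothetical format).
    Returns a dict mapping each key to the list of read indices sharing it.
    """
    families = defaultdict(list)
    for index, (start, stop, barcode) in enumerate(reads):
        families[(start, stop, barcode)].append(index)
    return dict(families)


reads = [
    (100, 266, "ACGT"),
    (100, 266, "ACGT"),  # same endpoints and barcode: same family
    (100, 266, "TTAG"),  # same endpoints, different barcode
    (105, 266, "ACGT"),  # same barcode, different start position
]
families = group_into_families(reads)
print(len(families))  # prints 3
```

Under the simplifying assumption that barcodes are drawn uniformly and independently from a pool of N distinct barcodes, two different molecules with identical endogenous sequence information would share a barcode with probability 1/N, which is one way to reason about how many barcodes are sufficient for a desired collision probability.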

[0178] As used herein, DNA that is “not immobilized” or that is “free in solution” refers to DNA that is not bound covalently or non-covalently to a solid support, such as a bead. Such DNA may be free in solution during any step (such as all steps) of the disclosed methods.

[0179] The term “agent that recognizes a modified nucleobase in DNA,” such as an “agent that recognizes a modified cytosine in DNA,” refers to a molecule or reagent that binds to or detects one or more modified nucleobases in DNA, such as methyl cytosine. A “modified nucleobase” is a nucleobase that comprises a difference in chemical structure from an unmodified nucleobase. In the case of DNA, an unmodified nucleobase is adenine, cytosine, guanine, or thymine. In some embodiments, a modified nucleobase is a modified cytosine. In some embodiments, a modified nucleobase is a methylated nucleobase. In some embodiments, a modified cytosine is a methyl cytosine, e.g., a 5-methyl cytosine. In such embodiments, the cytosine modification is a methyl. Agents that recognize a methyl cytosine in DNA include but are not limited to “methyl binding reagents,” which refer herein to reagents that bind to a methyl cytosine. Methyl binding reagents include but are not limited to methyl binding domains (MBDs) and methyl binding proteins (MBPs). In some such embodiments, the DNA may be single-stranded or double-stranded. Suitable agents include agents that recognize modified nucleotides in double-stranded DNA, single-stranded DNA, and both double-stranded and single-stranded DNA.

[0180] As used herein, “polynucleotide”, “nucleic acid”, “nucleic acid molecule”, or “oligonucleotide” refers to a linear polymer of nucleosides (including deoxyribonucleosides, ribonucleosides, or analogs thereof) joined by inter-nucleosidic linkages. Typically, a polynucleotide comprises at least three nucleosides. Oligonucleotides often range in size from a few monomeric units, e.g., 3-4, to hundreds of monomeric units. Whenever a polynucleotide is represented by a sequence of letters, such as “ATGCCTG”, the nucleotides are in 5’ - 3’ order from left to right, and in the case of DNA, “A” denotes deoxyadenosine, “C” denotes deoxycytidine, “G” denotes deoxyguanosine, and “T” denotes deoxythymidine, unless otherwise noted. The letters A, C, G, and T may be used to refer to the bases themselves, to nucleosides, or to nucleotides comprising the bases.

[0181] As used herein, “processing” refers to a set of steps used to generate a library of nucleic acids that is suitable for sequencing. The set of steps can include, but are not limited to, partitioning, end repairing, addition of sequencing adapters, tagging, and / or PCR amplification of nucleic acids.

[0182] As used herein, “quantitative measure” refers to an absolute or relative measure. A quantitative measure can be, without limitation, a number, a statistical measurement (e.g., frequency, mean, median, standard deviation, or quantile), or a degree or a relative quantity (e.g., high, medium, and low). A quantitative measure can be a ratio of two quantitative measures. A quantitative measure can be a linear combination of quantitative measures. A quantitative measure may be a normalized measure.

[0183] As used herein, “reference sequence” refers to a known sequence used for purposes of comparison with experimentally determined sequences. For example, a known sequence can be an entire genome, a chromosome, or any segment thereof. A reference sequence can align with a single contiguous sequence of a genome or chromosome or chromosome arm or can include non-contiguous segments that align with different regions of a genome or chromosome. Examples of reference sequences include, for example, human genomes, such as hg19 and hg38.

[0184] As used herein, “sample” means anything capable of being analyzed by the methods and / or systems disclosed herein.

[0185] As used herein, an “original sample” is a sample (e.g., of blood, plasma, or serum) as originally obtained from a source, such as a subject, tissue, or cell.

[0186] As used herein, a sample is a “proportionately methylated sample relative to an original sample” when it is an aliquot of the original sample or otherwise has not undergone manipulations that selectively enrich, capture, or deplete DNA molecules based on methylation or lack thereof. Proportionately methylated samples include samples or aliquots that were purified or isolated in a manner that does not substantially discriminate on the basis of methylation, such as ion exchange chromatography, size fractionation, hydrophobic interaction chromatography, target capture that is independent of methylation status (e.g., target capture performed prior to base conversion or that does not discriminate based on base conversion), or other techniques that do not rely on specific binding to methylated DNA or specific binding to unmethylated DNA. “Proportionately methylated samples,” as used herein, may have undergone processing steps such as end repair and adapter ligation (including end repair and adapter ligation using nucleotides and / or adapters comprising methylation or other modifications). In contrast, subsamples resulting from partitioning on the basis of methylation, e.g., using MBD or an antibody specific for methylated cytosine, are not proportionately methylated samples relative to their original sample.

[0187] As used herein, “sequencing” refers to any of a number of technologies used to determine the sequence (e.g., the identity and order of monomer units) of a biomolecule, e.g., a nucleic acid such as DNA or RNA. Examples of sequencing methods include, but are not limited to, targeted sequencing, single molecule real-time sequencing, exon or exome sequencing, intron sequencing, electron microscopy-based sequencing, panel sequencing, transistor-mediated sequencing, direct sequencing, random shotgun sequencing, Sanger dideoxy termination sequencing, whole-genome sequencing, sequencing by hybridization, pyrosequencing, duplex sequencing, cycle sequencing, single-base extension sequencing, solid-phase sequencing, high-throughput sequencing, massively parallel signature sequencing, emulsion PCR, co-amplification at lower denaturation temperature-PCR (COLD-PCR), multiplex PCR, sequencing by reversible dye terminator, paired-end sequencing, near-term sequencing, exonuclease sequencing, sequencing by ligation, short-read sequencing, single-molecule sequencing, sequencing-by-synthesis, real-time sequencing, reverse-terminator sequencing, nanopore sequencing, 454 sequencing, Solexa Genome Analyzer sequencing, SOLiD™ sequencing, MS-PET sequencing, and a combination thereof. In some embodiments, sequencing can be performed by a gene analyzer such as, for example, gene analyzers commercially available from Illumina, Inc., Pacific Biosciences, Inc., or Applied Biosystems / Thermo Fisher Scientific, among many others.

[0188] As used herein, “sequence information” in the context of a nucleic acid polymer means the order and identity of monomer units (e.g., nucleotides, etc.) in that polymer.

[0189] As used herein, the terms “somatic mutation” or “somatic variation” are used interchangeably. They refer to a mutation in the genome that occurs after conception. Somatic mutations can occur in any cell of the body except germ cells and accordingly, are not passed on to progeny.

[0190] As used herein, “subject” refers to an animal, such as a mammalian species (e.g., human) or avian (e.g., bird) species, or other organism, such as a plant. More specifically, a subject can be a vertebrate, e.g., a mammal such as a mouse, a primate, a simian or a human. Animals include farm animals (e.g., production cattle, dairy cattle, poultry, horses, pigs, and the like), sport animals, and companion animals (e.g., pets or support animals). A subject can be a healthy individual, an individual that has or is suspected of having a disease or a predisposition to the disease, or an individual in need of therapy or suspected of needing therapy. The terms “individual” or “patient” are intended to be interchangeable with “subject”. For example, a subject can be an individual who has been diagnosed with a cancer, is going to receive a cancer therapy, and / or has received at least one cancer therapy. The subject can be in remission of a cancer. As another example, the subject can be an individual who is diagnosed as having an autoimmune disease. As another example, the subject can be a female individual who is pregnant or who is planning on getting pregnant, who may have been diagnosed with or suspected of having a disease, e.g., a cancer or an autoimmune disease.

[0191] As used herein, “tumor fraction” refers to the proportion of cfDNA molecules that originated from tumor cells for a given sample, or sample-region pair.

[0192] As used herein, an “asymmetric adapter” is a double stranded adapter in which the two strands are not completely complementary or are otherwise distinguishable such that synthesis of a complementary sequence of one strand of the adapter results in a sequence that is distinguishable from the sequence of the other strand of the adapter. Examples of asymmetric adapters are Y-shaped adapters and bubble adapters.

[0193] As used herein, a “Y-shaped adapter” refers to an adapter comprising two DNA strands comprising complementary and non-complementary parts, wherein the non-complementary parts form single-stranded arms. The adapter can be attached to a sample or insert DNA molecule, e.g., by ligation, such that the complementary (double-stranded) part of the adapter is proximal to the sample or insert DNA molecule. Prior to attachment, the double stranded portion of the Y-shaped adapter may have a blunt end or an overhang, e.g., of one to three nucleotides. The single stranded arms may or may not be of identical length.

[0194] As used herein, a “bubble adapter” refers to an adapter comprising two DNA strands comprising a non-complementary part flanked by complementary parts, such that the adapter has a single stranded region located between double-stranded regions. The adapter can be attached to a sample or insert DNA molecule, e.g., by ligation, such that one of the complementary (double-stranded) parts of the adapter is proximal to the sample or insert DNA molecule. Prior to attachment, the double stranded portion of the bubble adapter that would be attached to the insert or sample molecule may have a blunt end or an overhang, e.g., of one to three nucleotides. The single stranded portions of the two strands may or may not be of identical length.

[0195] The terms “or a combination thereof” and “or combinations thereof” as used herein refer to any and all permutations and combinations of the listed terms preceding the term. For example, “A, B, C, or combinations thereof” is intended to include at least one of: A, B, C, AB, AC, BC, or ABC, and if order is important in a particular context, also BA, CA, CB, ACB, CBA, BCA, BAC, or CAB. Continuing with this example, expressly included are combinations that contain repeats of one or more item or term, such as BB, AAA, AAB, BBC, AAABCCCC, CBBAAA, CABABB, and so forth. The skilled artisan will understand that typically there is no limit on the number of items or terms in any combination, unless otherwise apparent from the context.

[0196] “Buffy coat” refers to the portion of a blood (such as whole blood) or bone marrow sample that contains all or most of the white blood cells and platelets of the sample. The buffy coat fraction of a sample can be prepared from the sample using centrifugation, which separates sample components by density. For example, following centrifugation of a whole blood sample, the buffy coat fraction is situated between the plasma and erythrocyte (red blood cell) layers. The buffy coat can contain both mononuclear (e.g., T cells, B cells, NK cells, dendritic cells, and monocytes) and polymorphonuclear (e.g., granulocytes such as neutrophils and eosinophils) white blood cells.

[0197] As used herein, “leukapheresis” refers to a procedure in which white blood cells (leukocytes) are isolated from a sample of blood collected from a subject. Leukapheresis may be performed, e.g., to obtain cells for research, diagnostic, prognostic, or monitoring purposes, such as those described herein. Thus, as used herein, a “leukapheresis sample” refers to a sample comprising leukocytes collected from a subject using leukapheresis.

[0198] As used herein, “peripheral blood mononuclear cells” or “PBMCs” refers to immune cells having a single, round nucleus that originate in bone marrow and are found in the peripheral circulation. Such cells include, e.g., lymphocytes (T cells, B cells, and NK cells) as well as monocytes, and are isolated from blood samples (such as from a whole blood sample collected from a subject) using density gradient centrifugation.

[0199] As used herein, “amplify,” “amplifying,” or “amplification” refers to a process by which extra or multiple copies of a particular polynucleotide are formed. Amplification methods can include any suitable methods known in the art. As used herein, a nucleic acid molecule amplified using “methylation-preserving amplification” substantially maintains its methylation status post-amplification.

[0200] A “X1nnnX2 mutation” in a specified polypeptide as used herein, where X1 and X2 are amino acids and nnn is a position in an amino acid sequence, refers to a substitution in the polypeptide of amino acid X1 present at position nnn of the full-length wild-type polypeptide with amino acid X2. The polypeptide is the human polypeptide unless indicated otherwise. The polypeptide comprising the X1nnnX2 mutation may, but does not necessarily, comprise additional differences from the wild-type sequence, including but not limited to truncations and deletions as well as other substitutions. For example, a “T1372S mutation” in TET2 refers to a substitution in a TET2 enzyme of the threonine present at position 1372 of the full-length wild-type human TET2 enzyme with a serine. Position 1372 of wild-type human TET2 aligns to positions 258 and 248, respectively, of the truncated TET2 sequences disclosed as SEQ ID NOs: 23 and 24 of US Patent 10,961,525. Similarly, a “V1900X2 mutation” where X2 is A, C, G, I, or P in TET2 refers to a substitution in a TET2 enzyme of the valine present at position 1900 of the full-length wild-type human TET2 enzyme with an alanine, cysteine, glycine, isoleucine, or proline.
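
By way of non-limiting illustration only, the X1nnnX2 notation can be parsed and applied to a sequence as in the following Python sketch. The names `Substitution`, `parse_substitution`, and `apply_substitution`, the `offset` parameter, and the toy sequences are hypothetical and are not part of the disclosure. The sketch assumes one-letter amino acid codes and, for a truncated polypeptide, a constant numbering offset relative to the full-length sequence, which may not hold where the truncated sequence also contains internal deletions.

```python
import re
from typing import NamedTuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_LABEL = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


class Substitution(NamedTuple):
    wild_type: str  # X1: residue in the full-length wild-type polypeptide
    position: int   # nnn: 1-based position in the full-length wild-type polypeptide
    variant: str    # X2: replacement residue


def parse_substitution(label: str) -> Substitution:
    """Split a label such as 'T1372S' into (X1, nnn, X2)."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not an X1nnnX2 label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def apply_substitution(sequence: str, sub: Substitution, offset: int = 0) -> str:
    """Apply the substitution to `sequence`.

    `offset` (hypothetical) is the number of residues by which full-length
    numbering exceeds the numbering of `sequence`, e.g. for a truncation.
    """
    index = sub.position - offset - 1
    if not 0 <= index < len(sequence):
        raise ValueError("position falls outside the supplied sequence")
    if sequence[index] != sub.wild_type:
        raise ValueError("wild-type residue does not match the label")
    return sequence[:index] + sub.variant + sequence[index + 1:]
```

Checking the wild-type residue before substituting guards against applying a label to a sequence whose numbering does not match the full-length reference.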

[0201] “Or” is used in the inclusive sense, i.e., equivalent to “and / or,” unless the context requires otherwise.

[0202] The term “methylation-dependent nuclease” refers to a nuclease that preferentially cuts methylated DNA relative to unmethylated DNA. For example, a methylation-dependent nuclease may cut at or near a recognition sequence such as a restriction site in a manner dependent on methylation of at least one of the nucleobases in the recognition sequence, such as a cytosine. In some embodiments, the nucleolytic activity of the methylation-dependent nuclease is at least 10, 20, 50, or 100-fold higher on a methylated recognition site relative to an unmethylated control in a standard nucleolysis assay. Methylation-dependent nucleases include methylation-dependent restriction enzymes.

[0203] As used herein, “methylation-dependent restriction enzyme” or “MDRE” refers to a restriction enzyme that is dependent on methylation of the DNA (e.g., cytosine methylation), i.e., the presence or absence of a methyl group in a nucleotide base alters the rate at which the enzyme cleaves the target DNA. In some embodiments, the methylation-dependent restriction enzymes do not cleave the DNA if a particular nucleotide base is unmethylated at the recognition sequence. For example, MspJI is a methylation-dependent restriction enzyme with a recognition sequence “mCNNR(N9)” and it does not cleave DNA in the absence of the methylated cytosine (mC) in the recognition sequence.

[0204] The term “methylation-sensitive nuclease” refers to a nuclease that preferentially cuts unmethylated DNA relative to methylated DNA. For example, a methylation-sensitive nuclease may cut at or near a recognition sequence such as a restriction site in a manner dependent on lack of methylation of at least one of the nucleobases in the recognition sequence, such as a cytosine. In some embodiments, the nucleolytic activity of the methylation-sensitive nuclease is at least 10, 20, 50, or 100-fold higher on an unmethylated recognition site relative to a methylated control in a standard nucleolysis assay. Methylation-sensitive nucleases include methylation-sensitive restriction enzymes.

[0205] As used herein, “methylation-sensitive restriction enzyme” or “MSRE” refers to a restriction enzyme that is sensitive to the methylation status of the DNA (e.g., cytosine methylation), i.e., the presence or absence of a methyl group in a nucleotide base alters the rate at which the enzyme cleaves the target DNA. In some embodiments, the methylation-sensitive restriction enzymes do not cleave the DNA if a particular nucleotide base is methylated at the recognition sequence. For example, HpaII is a methylation-sensitive restriction enzyme with a recognition sequence “CCGG” and it does not cleave DNA if the second cytosine in the recognition sequence is methylated.
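
By way of non-limiting illustration only, the HpaII example above can be expressed as a short Python sketch that locates “CCGG” sites in a sequence and reports whether any site has an unmethylated second cytosine. The function names and the representation of methylation as a set of 0-based cytosine positions are hypothetical. The sketch models a single strand and treats cleavage as all-or-nothing, whereas an actual enzyme alters the cleavage rate as described above.

```python
def find_sites(sequence: str, site: str = "CCGG") -> list:
    """Return the 0-based start positions of every occurrence of `site`."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def hpaii_would_cleave(sequence: str, methylated_positions: set) -> bool:
    """True if at least one CCGG site has an unmethylated second cytosine.

    `methylated_positions` holds 0-based indices of methylated cytosines.
    """
    for start in find_sites(sequence.upper()):
        second_cytosine = start + 1
        if second_cytosine not in methylated_positions:
            return True
    return False
```

For instance, for the toy sequence `ATCCGGTA` the site starts at index 2, so the molecule is treated as cleavable unless index 3 is methylated.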

[0206] As used herein, “restriction enzyme” is an enzyme that recognizes and cleaves the DNA at or near a specific recognition site.

II. Exemplary methods

A. Overview

[0207] Cancer formation and progression may arise from both genetic modification and epigenetic features of deoxyribonucleic acid (DNA). The present disclosure provides methods and systems for analyzing DNA, such as cell-free DNA (cfDNA) and / or for analyzing epigenetic and / or sequence-variable target regions. The present disclosure provides methods and systems for improving the signal-to-noise ratio of methylation partitioning assays.

[0208] Without wishing to be bound by any particular theory, cells in or around a cancer or neoplasm may shed more DNA than cells of the same tissue type in a healthy subject. As such, the distribution of tissue of origin of certain DNA samples, such as cfDNA, may change upon carcinogenesis. Thus, for example, an increase in the level of hypermethylation variable target regions that show lower methylation in healthy cfDNA than in at least one other tissue type can be an indicator of the presence (or recurrence, depending on the history of the subject) of cancer. Similarly, an increase in the level of hypomethylation variable target regions in the sample can be an indicator of the presence (or recurrence, depending on the history of the subject) of cancer.

[0209] Additionally, cancer can be indicated by non-sequence modifications, such as methylation. Examples of methylation changes in cancer include local gains of DNA methylation in the CpG islands at the TSS of genes involved in normal growth control, DNA repair, cell cycle regulation, and / or cell differentiation. This hypermethylation can be associated with an aberrant loss of transcriptional capacity of involved genes and occurs at least as frequently as point mutations and deletions as a cause of altered gene expression.

[0210] Thus, DNA methylation profiling can be used to detect aberrant methylation in DNA of a sample. The DNA can correspond to certain genomic regions (“differentially methylated regions” or “DMRs”) that are normally hypermethylated or hypomethylated in a given sample type (e.g., cfDNA from the bloodstream) but which may show an abnormal degree of methylation that correlates to a neoplasm or cancer, e.g., because of unusually increased contributions of tissues to the type of sample (e.g., due to increased shedding of DNA in or around the neoplasm or cancer) and / or from extents of methylation of the genome that are altered during development or that are perturbed by disease, for example, cancer or any cancer-associated disease.

[0211] In some embodiments, DNA methylation comprises addition of a methyl group to a cytosine residue at a CpG site (cytosine-phosphate-guanine site, i.e., a cytosine followed by a guanine in a 5’ -> 3’ direction of the nucleic acid sequence). In some embodiments, DNA methylation comprises addition of a methyl group to an adenine residue, such as in N6-methyladenine. In some embodiments, DNA methylation is 5-methylation (modification of the carbon in the 5th position of the cytosine ring). In some embodiments, 5-methylation comprises addition of a methyl group to the 5C position of the cytosine residue to create 5-methylcytosine (m5c or 5-mC or 5mC). In some embodiments, methylation comprises a derivative of m5c. Derivatives of m5c include, but are not limited to, 5-hydroxymethylcytosine (5-hmC or 5hmC), 5-formylcytosine (5-fC), and 5-carboxylcytosine (5-caC). In some embodiments, DNA methylation is 3C methylation (modification of the carbon in the 3rd position of the cytosine ring). In some embodiments, 3C methylation comprises addition of a methyl group to the 3C position of the cytosine residue to generate 3-methylcytosine (3mC). Methylation can also occur at non-CpG sites, for example, methylation can occur at a CpA, CpT, or CpC site. DNA methylation can change the activity of a methylated DNA region. For example, when DNA in a promoter region is methylated, transcription of the gene may be repressed. DNA methylation is critical for normal development and abnormality in methylation may disrupt epigenetic regulation. The disruption, e.g., repression, in epigenetic regulation may cause diseases, such as cancer. Promoter methylation in DNA may be indicative of cancer.

[0212] Methylation profiling can involve determining methylation patterns across different regions of the genome. For example, after sequencing, the sequences of molecules can be mapped to a reference genome. This can show regions of the genome that, compared with other regions, are more highly methylated or are less highly methylated. In this way, genomic regions, in contrast to individual molecules, may differ in their extent of methylation.
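
By way of non-limiting illustration only, the region-level comparison described above can be sketched in Python as an aggregation of per-molecule methylation calls by the genomic region to which each molecule maps. The function name, the region labels, and the input tuple layout are hypothetical, and the sketch assumes that mapping and per-CpG methylation calling have already been performed.

```python
from collections import defaultdict


def region_methylation(calls):
    """Aggregate per-molecule calls into a methylated fraction per region.

    `calls` is an iterable of (region, methylated_cpgs, total_cpgs) tuples,
    one tuple per mapped molecule.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for region, n_methylated, n_total in calls:
        methylated[region] += n_methylated
        total[region] += n_total
    # Regions with no covered CpG sites are omitted rather than reported as 0
    return {
        region: methylated[region] / total[region]
        for region in total
        if total[region] > 0
    }


example_calls = [
    ("promoter_A", 4, 5),
    ("promoter_A", 5, 5),
    ("gene_body_B", 0, 4),
    ("gene_body_B", 1, 6),
]
fractions = region_methylation(example_calls)
```

In this toy input, `promoter_A` is more highly methylated than `gene_body_B`, even though individual molecules within each region differ.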

[0213] Combining the signals obtained from methylation profiling with the signals obtained from somatic variations (e.g., SNV, indel, CNV, and gene fusions) facilitates the detection of cancer.

[0214] Many commercialized methods and methods undergoing development target specific cancer changes that occur in early-stage cancers and precancers. However, analysis of DNA, including copy number variants, methylation detection accuracy, molecular recovery, and coverage uniformity, can be improved in these methods (e.g., single-site methylation assays), which would lead to improved clinical assay performance and / or assay cost reduction.

[0215] FIG. 3 illustrates an example embodiment of a method for detecting the presence or absence of cancer in a subject according to an embodiment of the disclosure. In some embodiments, a polynucleotide sample is obtained from the subject. In some embodiments, the polynucleotide sample is a DNA sample obtained from a tumor tissue biopsy. In some embodiments, the polynucleotide sample is a cell-free DNA (cfDNA) sample obtained from blood. In some embodiments, the sample comprises cfDNA molecules belonging to hypermethylation variable target regions (Hyper DMR), lowly methylated normal regions, and unmethylated normal regions. In some embodiments, a sample is a proportionately methylated sample relative to an original sample, such as an aliquot of the original sample or a sample that has not undergone manipulations that selectively enrich, capture, or deplete DNA molecules based on methylation or lack thereof. Proportionately methylated samples include samples or aliquots that were purified or isolated in a manner that does not substantially discriminate on the basis of methylation, such as ion exchange chromatography, size fractionation, hydrophobic interaction chromatography, target capture that is independent of methylation status (e.g., target capture performed prior to base conversion or that does not discriminate based on base conversion), or other techniques that do not rely on specific binding to methylated DNA or specific binding to unmethylated DNA. Proportionately methylated samples may have undergone processing steps such as end repair and adapter ligation (including end repair and adapter ligation using nucleotides and / or adapters comprising methylation or other modifications). In some embodiments, the sample is not a subsample resulting from partitioning on the basis of methylation, e.g., using MBD or an antibody specific for methylated cytosine. Such subsamples are not considered proportionately methylated samples.

[0216] In contrast to the methods illustrated in FIG. 3, FIG. 1 illustrates a comparative MSRE-sequencing method for determining the methylation status of nucleic acid molecules in a polynucleotide sample obtained from a subject. In MSRE-sequencing, MSRE cleaves abundant unmethylated and some lowly methylated cfDNA molecules from a sample. MSRE-sequencing enriches methylated molecules, but uncleaved partially methylated molecules with a methylated MSRE site lead to a low signal-to-noise ratio. MSRE-sequencing can create false positive cancer detection because of such uncleaved partially methylated molecules with a methylated MSRE site, including in samples from subjects without cancer. Additionally in contrast to the methods illustrated in FIG. 3, FIG. 2 illustrates a comparative single-site methylation sequencing (SSM) method for determining the methylation status of nucleic acid molecules in a polynucleotide sample obtained from a subject. SSM sequencing can accurately resolve DNA molecules derived from a cancerous sample and normal DNA molecules; however, SSM has high sequencing costs because unmethylated molecules are sequenced in the assay. The method illustrated in FIG. 3, according to an embodiment of the disclosure, advantageously avoids sequencing unmethylated molecules and also accurately resolves the methylation status of the sequenced nucleic acid molecules in a polynucleotide sample.

[0217] In some embodiments, the nucleic acid molecules of the sample are digested with at least one methylation-sensitive restriction enzyme (MSRE). In some embodiments, the MSRE digestion removes abundant unmethylated and some lowly methylated cfDNA molecules from a sample.

[0218] In some embodiments, the genomic regions of interest can comprise differentially methylated regions for cancer detection. In some embodiments, at least a subset of the nucleic acid molecules undergoes single-site resolution methylation sequencing (SSM). In some embodiments, the SSM uses a next generation sequencer. In some embodiments, the sequencing reads generated by the sequencer are then analyzed using bioinformatic tools / algorithms to determine the methylation status at one or more genetic loci of the nucleic acid molecules. In some embodiments, the one or more genetic loci can comprise multiple genetic loci. In some embodiments, the one or more genetic loci can comprise one or more genomic regions. In some embodiments, the genomic regions can be promoter regions of genes. In some embodiments, prior to sequencing, the nucleic acid molecules can be amplified via PCR amplification. In some embodiments, the primers used in the amplification can comprise at least one sample index.

[0219] In some embodiments, the method can further comprise detecting the presence or absence of cancer in the subject, e.g., based on the methylation status at one or more genetic loci of the nucleic acid molecules. In some embodiments, the method further comprises determining a level of DNA from tumor cells in the polynucleotide sample.

[0220] FIG. 4 illustrates an exemplary workflow, e.g., that can be used to detect the presence or absence of cancer, according to certain embodiments of the disclosure, beginning with a cfDNA sample in which cfDNA is isolated from a blood sample. The cfDNA undergoes end-repair and A-tailing reactions; the end-repaired cfDNA is ligated to an adapter that is protected from a conversion step; the ligated cfDNA is digested with one or more MSREs, cleaving unmethylated cfDNA molecules at the RE recognition site; the digested cfDNA is converted with an SSM process; and the converted samples are then amplified, optionally enriched for the target sequences, and sequenced using NGS.
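
By way of non-limiting illustration only, the digestion and conversion steps of such a workflow can be modeled in silico as in the following Python sketch. The `Molecule` class, the function names, and the toy sequences are hypothetical. The sketch assumes HpaII (“CCGG”) as the MSRE and bisulfite conversion as the SSM process, treats digestion as complete, models a single strand, and omits end repair, adapter ligation, amplification, enrichment, and sequencing.

```python
from dataclasses import dataclass, field

MSRE_SITE = "CCGG"  # HpaII recognition sequence, used here as the example MSRE


@dataclass(frozen=True)
class Molecule:
    name: str
    sequence: str
    # 0-based indices of methylated cytosines (hypothetical representation)
    methylated: frozenset = field(default_factory=frozenset)


def survives_msre(molecule: Molecule) -> bool:
    """A molecule survives only if every CCGG site has a methylated second C."""
    start = molecule.sequence.find(MSRE_SITE)
    while start != -1:
        if start + 1 not in molecule.methylated:
            return False  # cleaved, so it is not carried into sequencing
        start = molecule.sequence.find(MSRE_SITE, start + 1)
    return True


def bisulfite_read(molecule: Molecule) -> str:
    """Unmethylated C reads as T after conversion; methylated C still reads as C."""
    return "".join(
        "T" if base == "C" and index not in molecule.methylated else base
        for index, base in enumerate(molecule.sequence)
    )


def run_workflow(molecules):
    """Digest first, then convert only the molecules that remain intact."""
    return {m.name: bisulfite_read(m) for m in molecules if survives_msre(m)}


sample = [
    Molecule("unmethylated", "ACCGGTCGA"),
    Molecule("partial", "ACCGGTCGA", frozenset({2})),
    Molecule("full", "ACCGGTCGA", frozenset({2, 6})),
]
reads = run_workflow(sample)
```

In this toy sample the unmethylated molecule is removed by digestion, while the partially methylated molecule that escapes digestion is still distinguishable from the fully methylated molecule after conversion, because its CpG outside the MSRE site reads as T.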

[0221] The present disclosure provides methods and systems for analyzing DNA, such as cell-free DNA (cfDNA), in a sample. The disclosed methods can comprise contacting the sample with a methylation-discriminating nuclease, thereby degrading methylated or unmethylated DNA to produce a treated sample, wherein the sample is an original sample or is a proportionately methylated sample relative to an original sample, and subjecting the treated sample to a procedure that affects a first nucleobase differently from a second nucleobase, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity, and wherein the procedure alters the base pairing specificity of the first or second nucleobase to produce a treated and converted sample. The disclosed methods can comprise subjecting the sample to a procedure that affects a first nucleobase differently from a second nucleobase, wherein the sample is an original sample or is a proportionately methylated sample relative to an original sample, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity, and wherein the procedure alters the base pairing specificity of the first or second nucleobase to produce a converted sample and contacting the converted sample with a methylation-discriminating nuclease, thereby degrading methylated or unmethylated DNA to produce a treated and converted sample. Degrading methylated or unmethylated DNA can improve the efficiency of downstream analyses, such as library preparation, sequencing, and subsequent data analysis, e.g., analyzing the sequencing data to quantify a level of methylation at one or more differentially methylated regions of the DNA, such that resources used downstream, e.g., during sequencing and analysis, can be more efficiently focused on methylated DNA.

[0222] Some embodiments of the disclosed methods of analyzing DNA in a sample comprise: (a) contacting the sample with a methylation-discriminating nuclease, thereby degrading methylated or unmethylated DNA to produce a treated sample, wherein the sample is an original sample or is a proportionately methylated sample relative to an original sample; and (b) subjecting the treated sample to a procedure that affects a first nucleobase differently from a second nucleobase, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity, and wherein the procedure alters the base pairing specificity of the first or second nucleobase to produce a treated and converted sample. Some embodiments of the disclosed methods of analyzing DNA in a sample comprise: (a) subjecting the sample to a procedure that affects a first nucleobase differently from a second nucleobase, wherein the sample is an original sample or is a proportionately methylated sample relative to an original sample, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity, and wherein the procedure alters the base pairing specificity of the first or second nucleobase to produce a converted sample; and (b) contacting the converted sample with a methylation-discriminating nuclease, thereby degrading methylated or unmethylated DNA to produce a treated and converted sample.

[0223] In some embodiments, the DNA comprises cell-free DNA (cfDNA). In some embodiments, the sample is a tissue sample. In some embodiments, the sample is a blood sample. In some embodiments, the blood sample is a whole blood sample, a plasma sample, a buffy coat sample, a leukapheresis sample, or a peripheral blood mononuclear cell (PBMC) sample.

[0224] In some embodiments, the methylation-discriminating nuclease is a methylation-dependent restriction enzyme (MDRE). In some embodiments, the MDRE cleaves a methylated CpG sequence. In some embodiments, the MDRE is one or more of MspJI, LpnPI, FspEI, or McrBC.

[0225] In some embodiments, the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE). In some embodiments, the MSRE cleaves an unmethylated CpG sequence. In some embodiments, the MSRE is one or more of AatII, AccII, AciI, Aor13HI, Aor15HI, BspT104I, BssHII, BstUI, Cfr10I, ClaI, CpoI, Eco52I, HaeII, HapII, HhaI, Hin6I, HpaII, HpyCH4IV, MluI, NaeI, NotI, NruI, NsbI, PmaCI, Psp1406I, PvuI, SacII, SalI, SmaI, and SnaBI.

[0226] In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises a conversion procedure.

[0227] In some embodiments, the first nucleobase is an unmodified cytosine and the second nucleobase is a modified cytosine, optionally wherein the modified cytosine is 5-methylcytosine or 5-hydroxymethylcytosine.

[0228] In some embodiments, the procedure that affects a first nucleobase of the DNA differently from a second nucleobase of the DNA is methylation-sensitive conversion. In some embodiments, the methylation-sensitive conversion is bisulfite conversion, oxidative bisulfite (Ox-BS) conversion, Tet-assisted bisulfite (TAB) conversion, APOBEC-coupled epigenetic (ACE) conversion, enzymatic methyl-seq (EM-seq) conversion, single-enzyme 5-methylcytosine sequencing (SEM-seq) conversion, direct methylation sequencing (DM-seq), Tet-assisted pyridine borane sequencing (TAPS), or Tet-assisted pyridine borane sequencing with protection of 5hmC (TAPS-P). In some embodiments, the Tet-assisted conversion further comprises a substituted borane reducing agent, optionally wherein the substituted borane reducing agent is 2-picoline borane, borane pyridine, tert-butyl amine borane, or ammonia borane.
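
By way of non-limiting illustration only, the way in which a conversion procedure affects one nucleobase differently from another with the same base pairing specificity can be sketched in Python as a lookup of how each cytosine state reads after conversion and amplification. The table below is a simplified summary, offered as an assumption, of how these chemistries are generally described; it covers only five of the listed procedures, assumes complete conversion, and is not taken from the disclosure. The names `READOUT` and `convert` are hypothetical.

```python
# Base read at each cytosine state after conversion and amplification.
# Simplified, idealized behavior (assumption): complete conversion, no damage.
READOUT = {
    "bisulfite": {"C": "T", "5mC": "C", "5hmC": "C"},
    "ox_bs": {"C": "T", "5mC": "C", "5hmC": "T"},
    "tab": {"C": "T", "5mC": "T", "5hmC": "C"},
    "em_seq": {"C": "T", "5mC": "C", "5hmC": "C"},
    "taps": {"C": "C", "5mC": "T", "5hmC": "T"},
}


def convert(bases, method):
    """Return the sequence read after applying `method` to a list of bases.

    `bases` is a list of tokens such as ["A", "C", "5mC", "5hmC", "G"];
    tokens other than cytosine states are passed through unchanged.
    """
    table = READOUT[method]
    return "".join(table.get(base, base) for base in bases)


template = ["A", "C", "5mC", "5hmC", "G"]
results = {method: convert(template, method) for method in READOUT}
```

Comparing the outputs for the same template shows which cytosine state each procedure distinguishes: for example, in this simplified model bisulfite and EM-seq give the same readout, while TAPS leaves unmodified cytosine unchanged and alters the modified cytosines instead.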

[0229] In some embodiments, the procedure that affects a first nucleobase of the DNA differently from a second nucleobase comprises contacting the DNA with a CpG-specific DNA methyltransferase (MTase) or a CpG-specific carboxymethyltransferase (CxMTase), a methyl donor or a carboxymethyl donor, and a cytosine deaminase. In some embodiments, the cytosine deaminase is an APOBEC enzyme, optionally wherein the APOBEC enzyme is APOBEC3A.

[0230] In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises bisulfite conversion; the contacting the sample with the methylation-discriminating nuclease occurs prior to bisulfite conversion; and the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

[0231] In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises direct methylation sequencing (DM-seq); the contacting the sample with the methylation-discriminating nuclease occurs prior to direct methylation sequencing (DM-seq); and the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

[0232] In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises APOBEC-coupled epigenetic (ACE) conversion; the contacting the sample with the methylation-discriminating nuclease occurs prior to contacting the sample with APOBEC; and the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

[0233] In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises enzymatic methyl-seq (EM-seq) conversion, wherein EM-seq comprises contacting a sample with APOBEC; the contacting the sample with the methylation-discriminating nuclease occurs prior to contacting the sample with APOBEC; and the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

[0234] In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises single-enzyme 5-methylcytosine sequencing (SEM-seq) conversion; the contacting the sample with the methylation-discriminating nuclease occurs prior to single-enzyme 5-methylcytosine sequencing (SEM-seq) conversion; and the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

[0235] In some embodiments, the method further comprises ligating one or more adapters to the DNA, thereby producing adapter-ligated DNA. In some embodiments, at least one cytosine in the one or more adapters is an unmodified cytosine, optionally wherein each cytosine in the one or more adapters is an unmodified cytosine. In some embodiments, at least one cytosine in the one or more adapters is a modification resistant cytosine, optionally wherein each cytosine in the one or more adapters is a modification resistant cytosine. In some embodiments, the modification resistant cytosine is a deaminase resistant cytosine. In some embodiments, the deaminase resistant cytosine is 5-propynylC (5pyC), 5-pyrrolo-dC (5pyrC), 5-hydroxymethylcytosine (5hmC), glucosylated 5-hydroxymethylcytosine (5ghmC), cytosine 5-methylenesulfonate (CMS), or N4-modified cytosine.

[0236] In some embodiments, the one or more adapters are Y-shaped adapters. In some embodiments, the one or more adapters comprise molecular barcodes.

[0237] In some embodiments, the one or more adapters are resistant to digestion by the methylation-discriminating nuclease.

[0238] In some embodiments, the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE) and the one or more adapters that are resistant to digestion by the MSRE: comprise one or more methylated nucleotides, optionally wherein the methylated nucleotides comprise 5-methylcytosine and / or 5-hydroxymethylcytosine; comprise one or more nucleotide analogs resistant to methylation-sensitive restriction enzymes; or do not comprise a nucleotide sequence recognized by the MSRE.

[0239] In some embodiments, the methylation-discriminating nuclease is a methylation-dependent restriction enzyme (MDRE) and the one or more adapters that are resistant to digestion by the MDRE: comprise one or more unmethylated nucleotides; comprise one or more nucleotide analogs resistant to methylation-dependent restriction enzymes; or do not comprise a nucleotide sequence recognized by the MDRE.
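
By way of non-limiting illustration only, the option in which an adapter does not comprise a nucleotide sequence recognized by the nuclease can be checked in silico as in the following Python sketch, which screens a candidate adapter sequence against a small panel of MSRE recognition sequences on both strands. The function names and the adapter sequences are hypothetical, the panel is limited to four enzymes with short, fixed recognition sequences, and the sketch does not address adapters protected by methylated nucleotides or nucleotide analogs.

```python
# Recognition sequences (5' -> 3') for a small example panel of MSREs
MSRE_SITES = {
    "HpaII": "CCGG",
    "HhaI": "GCGC",
    "BstUI": "CGCG",
    "AciI": "CCGC",  # non-palindromic, so the reverse complement is also checked
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    return sequence.translate(_COMPLEMENT)[::-1]


def sites_in_adapter(adapter: str, sites=MSRE_SITES) -> list:
    """Return the panel enzymes whose recognition sequence occurs in the adapter."""
    adapter = adapter.upper()
    hits = []
    for enzyme, site in sites.items():
        if site in adapter or reverse_complement(site) in adapter:
            hits.append(enzyme)
    return hits
```

An empty result indicates that, for the enzymes in the panel, the candidate adapter lacks a recognition sequence; a non-empty result identifies the enzymes for which another form of protection would be needed.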

[0240] In some embodiments, the ligating one or more adapters to the DNA occurs prior to subjecting the sample or the treated sample to a procedure that affects a first nucleobase differently from a second nucleobase.

[0241] The disclosed methods can be combined with analysis of one or more additional biomarkers. In some embodiments, the disclosed methods are combined with one or more methods, such as but not limited to, methods for assessing DNA methylation patterns, DNA mutations (such as somatic mutations), nucleic acid fragmentation patterns, non-coding RNA (such as micro RNAs (miRNAs), ribosomal RNAs, transfer RNAs, small nucleolar RNAs (snoRNAs), and / or small nuclear RNAs (snRNAs)) levels, and / or cell type proportions / levels, cellular locations, and / or structural modifications of one or more proteins (such as in a sample from a subject). In some embodiments, the disclosed methods are combined with one or more analyses of genetic variations including mutations, rare mutations, indels, rearrangements, copy number variations, transversions, translocations, recombinations, inversion, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, and / or abnormal changes in nucleic acid 5-methylcytosine.

B. Partitioning

[0242] In some embodiments, a heterogeneous nucleic acid sample is separated. In some embodiments, the separating comprises separating at least a portion of the treated sample or the treated and converted sample. In some embodiments, sequence-variable target regions and / or epigenetic target regions can be separated from the treated sample or the treated and converted sample. In some embodiments, sequence-variable target regions and epigenetic target regions can be separated from the treated sample or the treated and converted sample. In some embodiments, sequence-variable target regions can be separated from the treated sample or the treated and converted sample. In some embodiments, epigenetic target regions can be separated from the treated sample or the treated and converted sample.

[0243] In some embodiments, the separating uses the label to separate the sequence-variable target regions and / or epigenetic target regions from the DNA of the treated sample or the treated and converted sample. In some embodiments, the DNA is rendered single stranded prior to the separating.

[0244] The separating step can occur after contacting the sample with a methylation-discriminating nuclease. The separating step can occur after contacting the converted sample with a methylation-discriminating nuclease.

[0245] In some embodiments, separating comprises partitioning. In some instances, a heterogeneous nucleic acid sample is partitioned into two or more partitions (sub-samples). In some embodiments, each partition is differentially tagged. Tagged partitions can then be pooled together for collective sample prep and / or sequencing. The partitioning-tagging-pooling steps can occur more than once, with each round of partitioning occurring based on different characteristics and tagged using differential tags that are distinguished from other partitions and partitioning means. In some embodiments, the separating comprises partitioning the DNA in the sample into a plurality of partitioned subsamples. In some embodiments, the plurality of partitioned subsamples comprises a first subsample and a second subsample.

[0246] The partitioning step can occur after contacting the sample with a methylation-discriminating nuclease. The partitioning step can occur after contacting the converted sample with a methylation-discriminating nuclease.

[0247] In some embodiments, the treated sample or the treated and converted sample is partitioned into at least a first subsample and a second subsample. This may be accomplished simply by dividing the library into identical or substantially identical subsamples. Alternatively, in some methods, different DNA (e.g., sequence-variable target regions and epigenetic target regions) can be partitioned based on one or more characteristics of the DNA. Detecting aberrant features in DNA (whether sequence-based, epigenetic, or both) while also detecting target regions comprising sequence-variable target regions and / or epigenetic target regions may provide greater specificity and / or sensitivity for identifying an abnormal state than detecting the DNA features alone or levels of one or more post-translationally modified proteins alone.

[0248] In some embodiments, the first subsample comprises sequence-variable target regions and / or epigenetic target regions in a greater proportion than the second subsample. In some embodiments, the second subsample comprises sequence-variable target regions and / or epigenetic target regions in a greater proportion than the first subsample.

[0249] In some embodiments, the partitioning the DNA into a plurality of subsamples comprises contacting the DNA with an agent that recognizes methyl cytosine in the DNA. The partitioning step can occur prior to or after capturing an epigenetic target region set of DNA or a sequence-variable target region of the DNA. The partitioning step can occur prior to capturing an epigenetic target region set of DNA or a sequence-variable target region of the DNA. The partitioning step can occur prior to or after capturing an epigenetic target region set of DNA or a sequence-variable target region of the DNA and prior to or after sequencing the DNA. The partitioning step can occur after capturing an epigenetic target region set of DNA or a sequence-variable target region of the DNA and prior to sequencing the DNA.

[0250] Disclosed methods herein comprise analyzing DNA in a sample. In some embodiments described herein, the disclosed methods comprise partitioning DNA. In such methods, different forms of DNA (e.g., hypermethylated and hypomethylated DNA) can be physically partitioned based on one or more characteristics of the DNA.

[0251] In some embodiments, a first subsample or aliquot of a sample is subjected to steps for making capture probes as described elsewhere herein and a second subsample or aliquot of a sample is subjected to partitioning. In some embodiments, a sample or subsample or aliquot thereof is subjected to partitioning and differential tagging, followed by a capture step using capture probes for rearranged sequences and optionally additional capture probes, e.g., for sequence-variable and / or epigenetic target regions.

[0252] Methylation profiling can involve determining methylation patterns across different regions of the genome. For example, after partitioning molecules based on extent of methylation (e.g., relative number of methylated nucleobases per molecule) and sequencing, the sequences of molecules in the different partitions can be mapped to a reference genome. This can show regions of the genome that, compared with other regions, are more highly methylated or are less highly methylated. In this way, genomic regions, in contrast to individual molecules, may differ in their extent of methylation.
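The mapping step described above can be sketched in code. The following is a minimal illustration only, assuming that reads have already been aligned and labeled with their source partition; the read records, the partition labels "hyper" and "hypo", and the 1,000 bp bin width are hypothetical and are not taken from the disclosure.

```python
from collections import Counter, defaultdict

# Hypothetical aligned reads: (chromosome, start position, partition label).
reads = [
    ("chr1", 1_050, "hyper"),
    ("chr1", 1_200, "hyper"),
    ("chr1", 1_900, "hypo"),
    ("chr1", 5_100, "hypo"),
    ("chr1", 5_400, "hypo"),
    ("chr1", 5_800, "hyper"),
]

BIN_SIZE = 1_000  # assumed bin width in bp, chosen only for illustration


def hyper_fraction_by_bin(reads, bin_size=BIN_SIZE):
    """Return {(chrom, bin_start): fraction of reads from the hyper partition}."""
    counts = defaultdict(Counter)
    for chrom, start, partition in reads:
        bin_start = (start // bin_size) * bin_size
        counts[(chrom, bin_start)][partition] += 1
    return {
        region: c["hyper"] / sum(c.values())
        for region, c in counts.items()
    }


fractions = hyper_fraction_by_bin(reads)
```

In this sketch, a bin whose reads come mostly from the hypermethylated partition would be read as a more highly methylated region relative to bins dominated by the hypomethylated partition.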

[0253] In some embodiments, the partitioning comprises contacting the DNA with an agent that recognizes a modification associated with (e.g., in) the DNA. In some embodiments, the agent that recognizes the modification is an antibody or a methyl binding domain (MBD) protein. In some embodiments, the agent is immobilized on a solid support. In some embodiments, the solid support comprises a bead. In some embodiments, the partitioning comprises immunoprecipitation, e.g., using the agent that recognizes the modification, such as an antibody or an MBD protein, immobilized on solid support.

[0254] In some embodiments, the partitioning comprises precipitating the methylated DNA. In some embodiments, the partitioning comprises precipitating the methylated DNA to separate it from the unmethylated DNA. In some embodiments, the precipitating the methylated DNA can be performed using any pair of binding partners. In some embodiments, one of the binding partners may be linked to the MBD protein or antibody, and the other binding partner may be linked to a solid support. In some embodiments, the binding partner comprises biotin and streptavidin. In some embodiments, the biotin may be linked to the MBD protein, and the streptavidin may be linked to a solid support. In some embodiments, the MBD protein is linked to a solid support, optionally using any pair of binding partners. In some embodiments, the partitioning comprises immunoprecipitating the methylated DNA. In some embodiments, the partitioning comprises immunoprecipitating the methylated DNA separately from the unmethylated DNA.

[0255] In some embodiments, the modification is methylation, and in some such embodiments, the partitioning comprises partitioning on the basis of methylation level. In some such embodiments, the agent is a methyl binding reagent. In some embodiments, the methyl binding reagent specifically recognizes 5-methylcytosine. In some such embodiments, the agent is a hydroxymethyl binding reagent. In some embodiments, the methyl binding reagent specifically recognizes 5-hydroxymethylcytosine, biotinylated 5-hydroxymethylcytosine, glucosylated 5-hydroxymethylcytosine, or sulfonylated 5-hydroxymethylcytosine. In some embodiments, the partitioning comprises partitioning on the basis of binding to a protein comprising contacting the sample comprising the DNA with a binding reagent specific for the protein. In some such embodiments, the binding reagent specifically binds a methylated protein or an acetylated protein, such as a methylated or acetylated histone, or an unmethylated protein or an unacetylated protein such as an unmethylated or unacetylated histone. In some embodiments, the binding reagent specifically binds an unmethylated or unacetylated protein epitope.

[0256] In some embodiments, the modification is hydroxymethylation, and in some such embodiments, the partitioning comprises partitioning on the basis of hydroxymethylation level. In some such embodiments, the agent is a hydroxymethyl binding reagent, such as an antibody. In some embodiments, the hydroxymethyl binding reagent (e.g., antibody) specifically recognizes 5-hydroxymethylcytosine (5-hmC). In some embodiments, a modification such as hydroxymethylation is labeled (e.g., biotinylated, glucosylated, or sulfonated) before being contacted with an agent that recognizes the labeled form of the modification. For example, 5-hmC can be enzymatically glucosylated and then partitioned based on binding to J-binding protein 1. Exemplary methods of labeling and / or partitioning 5-hmC are provided, e.g., in Song et al., Nat. Biotech. 29:68-72 (2010); Ko et al., Nature 468:839-843 (2010); and Robertson et al., Nucleic Acids Res. 39:e55 (2011).

[0257] Where immunoprecipitation is used and involves an antibody that recognizes single-stranded DNA, the DNA may be converted to double-stranded form by complementary strand synthesis before a subsequent step. Such synthesis may use an adapter as a primer binding site, or can use random priming.

[0258] Partitioning nucleic acid molecules in a sample can increase a rare signal, e.g., by enriching rare nucleic acid molecules that are more prevalent in one partition of the sample. For example, a genetic variation present in epigenetic target regions and / or sequence-variable target regions can be more easily detected by partitioning a sample into a subsample comprising epigenetic target regions or sequence-variable target regions. By analyzing multiple partitions of a sample, a multi-dimensional analysis of a single molecule can be performed, and hence, greater sensitivity can be achieved. Partitioning may include physically partitioning nucleic acid molecules into partitions or subsamples based on the presence or absence of one or more methylated nucleobases. A sample may be partitioned into partitions or subsamples based on a characteristic that is indicative of differential gene expression or a disease state. A sample may be partitioned based on a characteristic, or combination thereof that provides a difference in signal between a normal and diseased state during analysis of nucleic acids, e.g., cell free DNA (cfDNA), non-cfDNA, tumor DNA, circulating tumor DNA (ctDNA) and cell free nucleic acids (cfNA).

[0259] In some embodiments, hypermethylation and / or hypomethylation variable epigenetic target regions are analyzed to determine whether they show differential methylation characteristic of tumor cells or cells of a type that does not normally contribute to the DNA sample being analyzed (such as cfDNA), and / or particular immune cell types.

[0260] In some instances, heterogenous DNA in a sample can be partitioned into two or more partitions (e.g., at least 3, 4, 5, 6 or 7 partitions). In some embodiments, each partition is differentially tagged. Tagged partitions can then be pooled together for collective sample prep and / or sequencing. The partitioning-tagging-pooling steps can occur more than once, with each round of partitioning occurring based on a different characteristic (examples provided herein) and tagged using differential tags that are distinguished from other partitions and partitioning means. In other instances, the differentially tagged partitions are separately sequenced.

[0261] In some embodiments, sequence reads from differentially tagged and pooled DNA are obtained and analyzed in silico. After sequencing, analysis of reads can be performed on a partition-by-partition level, as well as a whole DNA population level. Tags are used to sort reads from different partitions. Analysis to detect genetic variants can be performed on a partition-by-partition level, as well as whole nucleic acid population level. For example, analysis can include in silico analysis to determine genetic variants, such as copy number variations (CNVs), single nucleotide variations (SNVs), insertions / deletions (indels), and / or fusions in nucleic acids in each partition. In some instances, in silico analysis can include analysis to determine epigenetic variation (one or more of methylation, chromatin structure, etc.). Analysis can include in silico analysis using sequence information, genomic coordinates, length, coverage, and / or copy number. For example, coverage of sequence reads can be used to determine nucleosome positioning in chromatin. Higher coverage can correlate with higher nucleosome occupancy in a genomic region, while lower coverage can correlate with lower nucleosome occupancy or a nucleosome depleted region (NDR).
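The use of tags to sort pooled reads by partition can be illustrated as follows. This is a sketch under stated assumptions, not a description of any particular pipeline: the tag sequences, the 4-nucleotide tag length, the placement of the tag at the start of each read, and the partition names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from partition tag sequence to partition name.
TAG_TO_PARTITION = {"ACGT": "hyper", "TGCA": "hypo"}
TAG_LEN = 4  # assumed tag length; tags used in practice are typically longer


def sort_reads_by_partition(reads):
    """Group read inserts by the partition indicated by their leading tag."""
    by_partition = defaultdict(list)
    for read in reads:
        tag, insert = read[:TAG_LEN], read[TAG_LEN:]
        partition = TAG_TO_PARTITION.get(tag, "unassigned")
        by_partition[partition].append(insert)
    return dict(by_partition)


pooled_reads = ["ACGTTTGACC", "TGCAGGATCC", "ACGTCCGGAA", "NNNNAAAAAA"]
sorted_reads = sort_reads_by_partition(pooled_reads)
```

Once reads are grouped this way, variant calling or epigenetic analysis can be run on each group separately or on all groups together, corresponding to the partition-by-partition and whole-population levels described above.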

[0262] Examples of characteristics that can be used for partitioning include sequence length, methylation level, nucleosome binding, sequence mismatch, immunoprecipitation, and / or proteins that bind to DNA. Resulting partitions can include one or more of the following nucleic acid forms: single-stranded DNA (ssDNA), double-stranded DNA (dsDNA), shorter DNA fragments and longer DNA fragments. In some embodiments, partitioning based on a cytosine modification (e.g., cytosine methylation) or methylation generally is performed and is optionally combined with at least one additional partitioning step, which may be based on any of the foregoing characteristics or forms of DNA. In some embodiments, a heterogeneous population of nucleic acids is partitioned into nucleic acids with base modification and without one or more base modifications, including e.g., one or more sequence-variable target regions or one or more epigenetic modifications. Examples of base modifications are described elsewhere herein. Alternatively or additionally, a heterogeneous population of nucleic acids can be partitioned into nucleic acid molecules associated with nucleosomes and nucleic acid molecules devoid of nucleosomes. Alternatively or additionally, a heterogeneous population of nucleic acids may be partitioned into single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA). Alternatively, or additionally, a heterogeneous population of nucleic acids may be partitioned based on nucleic acid length (e.g., molecules of up to 160 bp and molecules having a length of greater than 160 bp).

[0263] In some cases, different procedures are applied to different partitions to determine different characteristics of the initial sample. In some embodiments, the DNA of at least one partition is subjected to an end repair and sequencing procedure described herein. In some embodiments at least one partition is not subjected to the end repair and sequencing procedure described herein. In cases where the method comprises a conversion procedure, corresponding sequences from the converted and non-converted partitions can be compared to identify single nucleotides that have undergone conversion and therefore identify corresponding modified nucleosides in the initial sample.

[0264] In some embodiments, partition tagging comprises tagging molecules in each partition with a partition tag. After re-combining partitions (e.g., to reduce the number of sequencing runs needed and avoid unnecessary cost) and sequencing molecules, the partition tags identify the source partition. In another embodiment, different partitions are tagged with different sets of molecular tags, e.g., comprised of a pair of barcodes. In this way, each molecular barcode indicates the source partition as well as being useful to distinguish molecules within a partition. For example, a first set of 35 barcodes can be used to tag molecules in a first partition, while a second set of 35 barcodes can be used to tag molecules in a second partition.
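As a hedged illustration of the two-set example above, the sketch below builds two disjoint sets of 35 barcodes and a reverse lookup from barcode to partition. The 4-nucleotide barcode length and the way the sequences are enumerated are assumptions made only to keep the example small; single barcodes are used rather than barcode pairs.

```python
from itertools import islice, product

SET_SIZE = 35  # number of barcodes per partition, as in the example above
BARCODE_LEN = 4  # assumed length; 4**4 = 256 possible sequences

# Enumerate candidate barcodes lazily and take two consecutive, disjoint sets.
all_barcodes = ("".join(p) for p in product("ACGT", repeat=BARCODE_LEN))
first_set = list(islice(all_barcodes, SET_SIZE))
second_set = list(islice(all_barcodes, SET_SIZE))

# Reverse lookup: barcode -> partition index.
barcode_to_partition = {bc: 1 for bc in first_set}
barcode_to_partition.update({bc: 2 for bc in second_set})


def source_partition(barcode):
    """Return the partition a barcode belongs to, or None if unknown."""
    return barcode_to_partition.get(barcode)
```

Because the two sets share no sequence, a barcode observed after pooling identifies the source partition, while the 35 distinct barcodes within a set remain available to distinguish molecules inside that partition.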

[0265] In some embodiments, after partitioning and tagging with partition tags, the molecules may be pooled for sequencing in a single run. In some embodiments, a sample tag is added to the molecules, e.g., in a step subsequent to addition of partition tags and pooling. Sample tags can facilitate pooling material generated from multiple samples for sequencing in a single sequencing run.

[0266] Alternatively, in some embodiments, partition tags may be correlated to the sample as well as the partition. As a simple example, a first tag can indicate a first partition of a first sample; a second tag can indicate a second partition of the first sample; a third tag can indicate a first partition of a second sample; and a fourth tag can indicate a second partition of the second sample.

[0267] While tags may be attached to molecules already partitioned based on one or more characteristics, the final tagged molecules in the library may no longer possess that characteristic. For example, while single stranded DNA molecules may be partitioned and tagged, the final tagged molecules in the library are likely to be double stranded. Similarly, while DNA may be subject to partition based on different levels of methylation, in the final library, tagged molecules derived from these molecules are likely to be unmethylated. Accordingly, the tag attached to a molecule in the library typically indicates the characteristic of the “parent molecule” from which the ultimate tagged molecule is derived, not necessarily the characteristic of the tagged molecule itself.

[0268] As an example, barcodes 1, 2, 3, 4, etc. are used to tag and label molecules in the first partition; barcodes A, B, C, D, etc. are used to tag and label molecules in the second partition; and barcodes a, b, c, d, etc. are used to tag and label molecules in the third partition. Differentially tagged partitions can be pooled prior to sequencing. Differentially tagged partitions can be separately sequenced or sequenced together concurrently, e.g., in the same flow cell of an Illumina sequencer.

[0269] After sequencing, analysis of reads can be performed on a partition-by-partition level, as well as a whole DNA population level. Tags are used to sort reads from different partitions. Analysis can include in silico analysis to determine genetic and epigenetic variation (one or more of methylation, chromatin structure, etc.) using sequence information, genomic coordinates, length, coverage, and / or copy number. In some embodiments, higher coverage can correlate with higher nucleosome occupancy in a genomic region, while lower coverage can correlate with lower nucleosome occupancy or a nucleosome depleted region (NDR).
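The relationship between coverage and nucleosome occupancy can be sketched with a simple per-base depth calculation. The read intervals, the half-open coordinate convention, and the depth threshold used to flag candidate nucleosome depleted positions are all assumptions made for illustration; a real analysis would use aligned reads and a statistically motivated threshold.

```python
def coverage(read_intervals, region_start, region_end):
    """Per-base read depth over [region_start, region_end).

    read_intervals are half-open (start, end) pairs in the same coordinates.
    """
    depth = [0] * (region_end - region_start)
    for start, end in read_intervals:
        for pos in range(max(start, region_start), min(end, region_end)):
            depth[pos - region_start] += 1
    return depth


def low_coverage_positions(depth, region_start, threshold):
    """Positions whose depth is below an assumed threshold (candidate NDR)."""
    return [region_start + i for i, d in enumerate(depth) if d < threshold]


# Hypothetical read intervals over a toy 18 bp region.
reads = [(0, 6), (2, 8), (12, 16), (13, 18)]
depth = coverage(reads, 0, 18)
candidates = low_coverage_positions(depth, 0, threshold=1)
```

Here the gap between the two clusters of reads is reported as low coverage, which under the correlation described above would be consistent with lower nucleosome occupancy.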

[0270] The agents used to partition populations of nucleic acids within a sample can be affinity agents, such as antibodies with the desired specificity, natural binding partners or variants thereof (Bock et al., Nat Biotech 28: 1106-1114 (2010); Song et al., Nat Biotech 29: 68-72 (2011)), or artificial peptides selected e.g., by phage display to have specificity to a given target. In some embodiments, the agent used in the partitioning is an agent that recognizes a modified nucleobase. In some embodiments, the modified nucleobase recognized by the agent is a modified cytosine, such as a methylcytosine (e.g., 5-methylcytosine). In some embodiments, the modified nucleobase recognized by the agent is a product of a procedure that affects the first nucleobase in the DNA differently from the second nucleobase in the DNA of the sample. In some embodiments, the modified nucleobase may be a “converted nucleobase,” meaning that its base pairing specificity was changed by a procedure. For example, certain procedures convert unmethylated or unmodified cytosine to dihydrouracil, or more generally, at least one modified or unmodified form of cytosine undergoes deamination, resulting in uracil (considered a modified nucleobase in the context of DNA) or a further modified form of uracil. Examples of partitioning agents include antibodies, such as antibodies that recognize a modified nucleobase, which may be a modified cytosine, such as a methylcytosine (e.g., 5-methylcytosine). In some embodiments, the partitioning agent is an antibody that recognizes a modified cytosine other than 5-methylcytosine, such as 5-carboxylcytosine (5-caC). Alternative partitioning agents include methyl binding domains (MBDs) and methyl binding proteins (MBPs) as described herein, including proteins such as MeCP2, MBD2, and antibodies preferentially binding to 5-methylcytosine. Where an antibody is used to immunoprecipitate methylated DNA, the methylated DNA may be recovered in single-stranded form. In such embodiments, a second strand can be synthesized. Hypermethylated (and optionally intermediately methylated) subsamples may then be contacted with a methylation sensitive nuclease that does not cleave hemi-methylated DNA, such as HpaII, BstUI, or Hin6I. Alternatively or in addition, hypomethylated (and optionally intermediately methylated) subsamples may then be contacted with a methylation dependent nuclease that cleaves hemi-methylated DNA.

[0271] Additional, non-limiting examples of partitioning agents are histone binding proteins which can separate nucleic acids bound to histones from free or unbound nucleic acids. Examples of histone binding proteins that can be used in the methods disclosed herein include RBBP4, RbAp48 and SANT domain peptides.

[0272] In some embodiments, partitioning can comprise both binary partitioning and partitioning based on degree / level of modifications. For example, methylated fragments can be partitioned by methylated DNA immunoprecipitation (MeDIP), or all methylated fragments can be partitioned from unmethylated fragments using methyl binding domain proteins (e.g., MethylMiner™ Methylated DNA Enrichment Kit (ThermoFisher Scientific)). Subsequently, additional partitioning may involve eluting fragments having different levels of methylation by adjusting the salt concentration in a solution with the methyl binding domain and bound fragments. As salt concentration increases, fragments having greater methylation levels are eluted.

[0273] In some embodiments, methylation levels can be determined using partitioning, modification-sensitive conversion such as bisulfite conversion, direct detection during sequencing, methylation-sensitive restriction enzyme digestion, methylation-dependent restriction enzyme digestion, or any other suitable approach. For example, different forms of DNA (e.g., hypermethylated and hypomethylated DNA) can be physically partitioned based on one or more characteristics of the DNA. For example, a methylated DNA binding protein (e.g., an MBD such as MBD2, MBD4, or MeCP2) or an antibody specific for 5-methylcytosine (as in MeDIP) can be used to partition the DNA. This approach can be used to determine, for example, whether certain sequences are hypermethylated or hypomethylated. In some embodiments, a DNA fragmentation pattern can be determined based on endpoints and / or centerpoints of DNA molecules, such as cfDNA molecules.

[0274] In some instances, the final partitions are enriched in nucleic acids having different extents of modifications (overrepresentative or underrepresentative of modifications). Overrepresentation and underrepresentation can be defined by the number of modifications borne by a nucleic acid relative to the median number of modifications per strand in a population. For example, if the median number of 5-methylcytosine residues in nucleic acid in a sample is 2, a nucleic acid including more than two 5-methylcytosine residues is overrepresented in this modification and a nucleic acid with 1 or zero 5-methylcytosine residues is underrepresented. The effect of affinity separation is to enrich for nucleic acids overrepresented in a modification in a bound phase and for nucleic acids underrepresented in a modification in an unbound phase (i.e., in solution). The nucleic acids in the bound phase can be eluted before subsequent processing.
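The median-based definition above can be expressed directly in code. The sketch below is illustrative only; the per-molecule 5-methylcytosine counts are hypothetical, and the label "at median" for molecules that match the median exactly is an added convention, since the text defines only the over- and underrepresented cases.

```python
from statistics import median


def classify_by_median(mc_counts):
    """Label each molecule relative to the population median 5mC count."""
    med = median(mc_counts)
    labels = []
    for n in mc_counts:
        if n > med:
            labels.append("overrepresented")
        elif n < med:
            labels.append("underrepresented")
        else:
            labels.append("at median")
    return med, labels


# Hypothetical 5-methylcytosine counts per molecule; the median is 2.
counts = [0, 1, 2, 2, 2, 3, 5]
med, labels = classify_by_median(counts)
```

With a median of 2, molecules carrying 3 or 5 residues are labeled overrepresented and molecules carrying 0 or 1 are labeled underrepresented, matching the worked example in the paragraph above.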

[0275] When using MeDIP or MethylMiner™ Methylated DNA Enrichment Kit (ThermoFisher Scientific), various levels of methylation can be partitioned using sequential elutions. For example, a hypomethylated partition (no methylation) can be separated from a methylated partition by contacting the nucleic acid population with the MBD from the kit, which is attached to magnetic beads. The beads are used to separate out the methylated nucleic acids from the non-methylated nucleic acids. Subsequently, one or more elution steps are performed sequentially to elute nucleic acids having different levels of methylation. For example, a first set of methylated nucleic acids can be eluted at a salt concentration of 160 mM or higher, e.g., at least 150 mM, at least 200 mM, 300 mM, 400 mM, 500 mM, 600 mM, 700 mM, 800 mM, 900 mM, 1000 mM, or 2000 mM. After such methylated nucleic acids are eluted, magnetic separation is once again used to separate higher level of methylated nucleic acids from those with lower level of methylation. The elution and magnetic separation steps can be repeated to create various partitions such as a hypomethylated partition (enriched in nucleic acids comprising no methylation), a methylated partition (enriched in nucleic acids comprising low levels of methylation), and a hypermethylated partition (enriched in nucleic acids comprising high levels of methylation).

[0276] In some methods, nucleic acids bound to an agent used for affinity separation based partitioning are subjected to a wash step. The wash step washes off nucleic acids weakly bound to the affinity agent. Such nucleic acids can be enriched in nucleic acids having the modification to an extent close to the mean or median (i.e., intermediate between nucleic acids remaining bound to the solid phase and nucleic acids not binding to the solid phase on initial contacting of the sample with the agent).

[0277] The affinity separation results in at least two, and sometimes three or more partitions of nucleic acids with different extents of a modification. While the partitions are still separate, the nucleic acids of at least one partition, and usually two or three (or more) partitions are linked to nucleic acid tags, usually provided as components of adapters, with the nucleic acids in different partitions receiving different tags that distinguish members of one partition from another. The tags linked to nucleic acid molecules of the same partition can be the same or different from one another. But if different from one another, the tags may have part of their code in common so as to identify the molecules to which they are attached as being of a particular partition.

[0278] For further details regarding partitioning nucleic acid samples based on characteristics such as methylation, see WO2018 / 119452, which is incorporated herein by reference.

[0279] In some embodiments, the nucleic acid molecules can be partitioned into different partitions based on the nucleic acid molecules that are bound to a specific protein or a fragment thereof and those that are not bound to that specific protein or fragment thereof.

[0280] Nucleic acid molecules can be partitioned based on DNA-protein binding. Protein-DNA complexes can be partitioned based on a specific property of a protein. Examples of such properties include various epitopes, modifications (e.g., histone methylation or acetylation) or enzymatic activity. Examples of proteins which may bind to DNA and serve as a basis for fractionation may include, but are not limited to, protein A and protein G. Any suitable method can be used to partition the nucleic acid molecules based on protein bound regions. Examples of methods used to partition nucleic acid molecules based on protein bound regions include, but are not limited to, SDS-PAGE, chromatin-immuno-precipitation (ChIP), heparin chromatography, and asymmetrical field flow fractionation (AF4).

[0281] In some embodiments, the partitioning comprises contacting the DNA with a methylation sensitive restriction enzyme (MSRE) and / or a methylation dependent restriction enzyme (MDRE). Following the treatment of the DNA with a MSRE or a MDRE, the DNA may be partitioned based on size to generate hypermethylated (longest DNA molecules following MSRE treatment and shortest DNA fragments following MDRE treatment), intermediate (intermediate length DNA molecules following MSRE or MDRE treatment), and hypomethylated (shortest DNA molecules following MSRE treatment and longest DNA fragments following MDRE treatment) subsamples.
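The size-based assignment described above depends on which enzyme type was used, and the following sketch makes that dependence explicit. The function name and the length cutoffs (100 bp and 160 bp) are hypothetical values chosen for illustration and are not taken from the disclosure.

```python
def classify_by_size(length_bp, enzyme, short_max=100, long_min=160):
    """Assign a size-selected fragment to a methylation subsample.

    short_max and long_min are assumed cutoffs in bp, not values from the text.
    """
    if enzyme not in ("MSRE", "MDRE"):
        raise ValueError("enzyme must be 'MSRE' or 'MDRE'")
    if length_bp <= short_max:
        size_class = "short"
    elif length_bp >= long_min:
        size_class = "long"
    else:
        return "intermediate"
    # An MSRE cuts unmethylated sites, so long survivors are hypermethylated;
    # an MDRE cuts methylated sites, so the relationship is reversed.
    if enzyme == "MSRE":
        return "hypermethylated" if size_class == "long" else "hypomethylated"
    return "hypomethylated" if size_class == "long" else "hypermethylated"
```

For example, a 180 bp molecule would be assigned to the hypermethylated subsample after MSRE treatment but to the hypomethylated subsample after MDRE treatment.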

[0282] In some embodiments, the partitioning is performed by contacting the nucleic acids with a methyl binding domain (“MBD”) of a methyl binding protein (“MBP”). In some such embodiments, the nucleic acids are contacted with an entire MBP. In some embodiments, an MBD binds to 5-methylcytosine (5mC), and an MBP comprises an MBD and is referred to interchangeably herein as a methyl binding protein or a methyl binding domain protein. In some embodiments, MBD is coupled to paramagnetic beads, such as Dynabeads® M-280 Streptavidin via a biotin linker. Partitioning into fractions with different extents of methylation can be performed by eluting fractions by increasing the NaCl concentration.

[0283] In some embodiments, bound DNA is eluted by contacting the antibody or MBD with a protease, such as proteinase K. This may be performed instead of or in addition to elution steps using NaCl as discussed above.

[0284] Examples of agents that recognize a modified nucleobase contemplated herein include, but are not limited to:
(a) MeCP2 is a protein that preferentially binds to 5-methyl-cytosine over unmodified cytosine.
(b) RPL26, PRP8 and the DNA mismatch repair protein MSH6 preferentially bind to 5-hydroxymethyl-cytosine over unmodified cytosine.
(c) FOXK1, FOXK2, FOXP1, FOXP4 and FOXI3 preferentially bind to 5-formyl-cytosine over unmodified cytosine (Iurlaro et al., Genome Biol. 14: R119 (2013)).
(d) Antibodies specific to one or more methylated or modified nucleobases or conversion products thereof, such as 5mC, 5-caC, or DHU.

[0285] In general, elution is a function of the number of modifications, such as the number of methylated sites per molecule, with molecules having more methylation eluting under increased salt concentrations. To elute the DNA into distinct populations based on the extent of methylation, one can use a series of elution buffers of increasing NaCl concentration. Salt concentration can range from about 100 mM to about 2500 mM NaCl. In one embodiment, the process results in three (3) partitions. Molecules are contacted with a solution at a first salt concentration and comprising a molecule comprising an agent that recognizes a modified nucleobase, which molecule can be attached to a capture moiety, such as streptavidin. At the first salt concentration a population of molecules will bind to the agent and a population will remain unbound. The unbound population can be separated as a “hypomethylated” population. For example, a first partition enriched in hypomethylated form of DNA is that which remains unbound at a low salt concentration, e.g., 100 mM or 160 mM. A second partition enriched in intermediate methylated DNA is eluted using an intermediate salt concentration, e.g., between 100 mM and 2000 mM concentration. This is also separated from the sample. A third partition enriched in hypermethylated form of DNA is eluted using a high salt concentration, e.g., at least about 2000 mM.
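The three-partition salt series can be modeled as a lookup against increasing buffer concentrations. This is a simplified sketch: the per-molecule "release concentration" is a hypothetical quantity introduced for illustration, the 1000 mM intermediate buffer is an assumed value within the range given above, and the "retained" outcome for molecules that stay bound at the highest concentration is an added convention.

```python
from bisect import bisect_left

# Buffer series in mM NaCl: low-salt binding step, intermediate, high.
# 160 and 2000 follow the examples in the text; 1000 is an assumed value.
BUFFERS_MM = [160, 1000, 2000]
PARTITION_NAMES = ["hypomethylated", "intermediate", "hypermethylated"]


def assign_partition(release_mm):
    """Return the partition for a molecule that releases at release_mm NaCl.

    release_mm is a hypothetical per-molecule quantity: the lowest salt
    concentration at which the molecule no longer stays bound to the agent.
    """
    idx = bisect_left(BUFFERS_MM, release_mm)
    # Molecules that would need more than the highest buffer stay bound.
    return PARTITION_NAMES[idx] if idx < len(BUFFERS_MM) else "retained"
```

Under this model a molecule that does not bind at the first salt concentration falls in the hypomethylated partition, and molecules with progressively more methylated sites would be assigned higher release concentrations and therefore later fractions.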

[0286] In some embodiments, a monoclonal antibody raised against 5-methylcytidine (5mC) is used to purify methylated DNA. DNA is denatured, e.g., at 95°C in order to yield single-stranded DNA fragments. Protein G coupled to standard or magnetic beads as well as washes following incubation with the anti-5mC antibody are used to immunoprecipitate DNA bound to the antibody. Such DNA may then be eluted. Partitions may comprise unprecipitated DNA and one or more partitions eluted from the beads.

[0287] In some embodiments, the partitions of DNA are desalted and concentrated in preparation for enzymatic steps of library preparation.

[0288] Sequences that comprise aberrantly high copy numbers may tend to be hypermethylated. Accordingly, in some embodiments, the DNA contacted with target-specific probes specific for members of an epigenetic target region set comprising a plurality of target regions that are both type-specific differentially methylated regions and copy number variants comprises at least a portion of a hypermethylated partition. The DNA from or comprising at least a portion of the hypermethylated partition may or may not be combined with DNA from or comprising at least a portion of one or more other partitions, such as an intermediate partition or a hypomethylated partition.

[0289] In some cases, different procedures are applied to different partitions to determine different characteristics of the initial sample. In some embodiments, the DNA of at least one partition is subjected to an end repair and sequencing procedure described herein. In some embodiments at least one partition is not subjected to the end repair and sequencing procedure according to the methods of the disclosure described herein. In cases where the sequencing procedure comprises a conversion procedure, corresponding sequences from the converted and non-converted partitions can be compared to identify single nucleotides that have undergone conversion and therefore identify corresponding modified nucleosides in the initial sample.

[0290] Disclosed methods herein can comprise analyzing DNA in a sample. In some embodiments described herein, the disclosed methods comprise partitioning DNA. In such methods, different forms of DNA (e.g., hypermethylated and hypomethylated DNA) can be physically partitioned based on one or more characteristics of the DNA. This approach can be used to determine, for example, whether certain sequences are hypermethylated or hypomethylated and whether certain hypermethylated regions overlap with regions with copy number variants. In some embodiments, a first subsample or aliquot of a sample is subjected to steps for making capture probes as described elsewhere herein and a second subsample or aliquot of a sample is subjected to partitioning. In some embodiments, a sample or subsample or aliquot thereof is subjected to partitioning and differential tagging, followed by a capture step using capture probes for rearranged sequences and optionally additional capture probes, e.g., for sequence-variable and / or epigenetic target regions.

[0291] Methylation profiling can involve determining methylation patterns across different regions of the genome. For example, after partitioning molecules based on extent of methylation (e.g., relative number of methylated nucleobases per molecule) and sequencing, the sequences of molecules in the different partitions can be mapped to a reference genome. This can show regions of the genome that, compared with other regions, are more highly methylated or are less highly methylated. In this way, genomic regions, in contrast to individual molecules, may differ in their extent of methylation.

C. Contacting DNA with a methylation-discriminating nuclease

[0292] In some embodiments, a DNA sample is contacted with a methylation-discriminating nuclease to produce a treated sample. In some embodiments, a converted sample is contacted with a methylation-discriminating nuclease to produce a treated and converted sample. In some embodiments, the methylation-discriminating nuclease degrades methylated or unmethylated DNA. In some embodiments, the methylation-discriminating nuclease is a methylation-dependent nuclease or methylation-sensitive nuclease. In some embodiments, the converted sample is produced by subjecting the sample to a procedure that affects a first nucleobase differently from a second nucleobase, in which the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity. In some embodiments, the procedure alters the base pairing specificity of the first or second nucleobase.

[0293] In some embodiments, methods herein comprise contacting DNA (e.g., DNA in a sample or DNA in a converted sample) with a methylation-sensitive nuclease, thereby degrading DNA comprising unmethylated sequences (e.g., an unmethylated CpG sequence) or sequences having low levels of methylation. In some such embodiments, the methylation-sensitive nuclease is a methylation-sensitive restriction enzyme (MSRE), thereby degrading DNA comprising an unmethylated recognition site of the MSRE. Methylation-sensitive nucleases can thus be used in methods herein comprising one or more steps that deplete unmodified or unmethylated sequences, such as those that are prevalent in cfDNA from a subject.
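The depletion described above can be illustrated with a minimal in silico sketch. It is not part of the disclosed methods. The function name `survives_msre`, the single-strand methylation model, and the rule that one methylated CpG cytosine protects a whole recognition site are simplifying assumptions made only for illustration; the recognition sequences shown (HpaII CCGG, BstUI CGCG, HhaI GCGC) are the commonly reported ones.

```python
# Commonly reported recognition sequences (5'->3') for three MSREs.
MSRE_SITES = {"HpaII": "CCGG", "BstUI": "CGCG", "HhaI": "GCGC"}


def survives_msre(seq, methylated_c, enzymes):
    """Return True if the molecule is expected to escape digestion.

    seq          -- top-strand sequence of the molecule
    methylated_c -- set of 0-based indices of methylated CpG cytosines
    enzymes      -- names of MSREs applied (keys of MSRE_SITES)

    Assumption (hypothetical simplification): a recognition site is
    protected when at least one CpG cytosine inside it is methylated.
    """
    for enzyme in enzymes:
        site = MSRE_SITES[enzyme]
        start = seq.find(site)
        while start != -1:
            # Positions of CpG cytosines that fall inside this site.
            cpg_cs = [start + i for i in range(len(site) - 1)
                      if site[i:i + 2] == "CG"]
            if not any(pos in methylated_c for pos in cpg_cs):
                return False  # unmethylated site: molecule is cleaved
            start = seq.find(site, start + 1)
    return True


molecule = "AACCGGTTCGCGAA"
print(survives_msre(molecule, set(), ["HpaII"]))         # unmethylated
print(survives_msre(molecule, {3}, ["HpaII"]))           # CCGG protected
print(survives_msre(molecule, {3}, ["HpaII", "BstUI"]))  # CGCG still open
```

Adding a second enzyme in the last call shows why combining nucleases can degrade unmethylated DNA more completely: a molecule that escapes one enzyme may still carry an unmethylated site for another.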

[0294] In some embodiments, methods herein comprise contacting DNA (e.g., DNA in a sample or DNA in a converted sample) with a methylation-dependent nuclease, thereby degrading DNA comprising methylated sequences (e.g., a methylated CpG sequence) or sequences having high levels of methylation. In some such embodiments, the methylation-dependent nuclease is a methylation-dependent restriction enzyme (MDRE), thereby degrading DNA comprising a methylated recognition site of the MDRE. Methylation-dependent nucleases can thus be used in methods herein comprising one or more steps that deplete modified or methylated sequences, such as those that are prevalent in cfDNA from a subject.

[0295] In contacting a sample with a nuclease, one or more nucleases can be used. In some embodiments, a sample is contacted with a plurality of nucleases. The sample may be contacted with the nucleases sequentially or simultaneously. Simultaneous use of nucleases may be advantageous when the nucleases are active under similar conditions (e.g., buffer composition) to avoid unnecessary sample manipulation. Contacting the second sample with more than one methylation-dependent restriction enzyme can more completely degrade hypermethylated DNA. Similarly, contacting the first sample with more than one methylation-sensitive restriction enzyme can more completely degrade hypomethylated and / or unmethylated DNA.

[0296] In some embodiments, a methylation-dependent nuclease comprises one or more of MspJI, LpnPI, FspEI, or McrBC. In some embodiments, at least two methylation-dependent nucleases are used. In some embodiments, at least three methylation-dependent nucleases are used. In some embodiments, the methylation-dependent nuclease comprises FspEI. In some embodiments, the methylation-dependent nuclease comprises FspEI and MspJI, e.g., used sequentially.

[0297] In some embodiments, a methylation-sensitive nuclease comprises one or more of AatII, AccII, AciI, Aor13HI, Aor15HI, BspT104I, BssHII, BstUI, Cfr10I, ClaI, CpoI, Eco52I, HaeII, HapII, HhaI, Hin6I, HpaII, HpyCH4IV, MluI, MspI, NaeI, NotI, NruI, NsbI, PmaCI, Psp1406I, PvuI, SacII, SalI, SmaI, and SnaBI. In some embodiments, a methylation-sensitive nuclease comprises one or more of BstUI, HpaII, Hin6I, HhaI, or AccII. In some embodiments, at least two methylation-sensitive nucleases are used. In some embodiments, at least three methylation-sensitive nucleases are used. In some embodiments, the methylation-sensitive nucleases comprise BstUI and HpaII. In some embodiments, the two methylation-sensitive nucleases comprise HhaI and AccII. In some embodiments, the methylation-sensitive nucleases comprise BstUI, HpaII and Hin6I.

[0298] In some embodiments, a sample is contacted with a nuclease as described above after a step of tagging or attaching adapters to both ends of the DNA. The tags or adapters can be resistant to cleavage by the nuclease using any of the approaches described above. In this approach, cleavage can prevent the molecule from being carried through the analysis because the cleavage products lack tags or adapters at both ends.

[0299] Alternatively, a step of tagging or attaching adapters can be performed after cleavage with a nuclease as described above. Cleaved molecules can then be identified in sequence reads based on having an end (point of attachment to tag or adapter) corresponding to a nuclease recognition site. Processing the molecules in this way can also allow the acquisition of information from the cleaved molecule, e.g., observation of somatic mutations. When tagging or attaching adapters after contacting the subsample with a nuclease, and low molecular weight DNA such as cfDNA is being analyzed, it may be desirable to remove high molecular weight DNA (such as contaminating genomic DNA) from the sample before the contacting step. It may also be desirable to use nucleases that can be heat-inactivated at a relatively low temperature (e.g., 65°C or less, or 60°C or less) to avoid denaturing DNA, in that denaturation may interfere with subsequent ligation steps.
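As a hedged illustration of identifying cleaved molecules from their ends, the following sketch flags an aligned fragment whose start or end coincides with a cut position in the reference. The function name `ends_at_cut_site` and the coordinate convention are hypothetical. The default models a blunt cutter with the commonly reported BstUI site (CG^CG), so end repair of overhangs does not need to be modeled; a real pipeline would also handle both strands and alignment uncertainty.

```python
def ends_at_cut_site(reference, frag_start, frag_end,
                     site="CGCG", cut_offset=2):
    """Flag a fragment whose start or end matches a nuclease cut position.

    reference  -- reference sequence the fragment was aligned to
    frag_start -- 0-based start of the fragment (inclusive)
    frag_end   -- 0-based end of the fragment (exclusive)
    site       -- recognition sequence (default: BstUI, CG^CG, blunt)
    cut_offset -- number of site bases to the left of the cut
    """
    cut_positions = set()
    pos = reference.find(site)
    while pos != -1:
        cut_positions.add(pos + cut_offset)
        pos = reference.find(site, pos + 1)
    # A blunt cut at position p yields one fragment ending at p
    # (exclusive) and another starting at p.
    return frag_start in cut_positions or frag_end in cut_positions


reference = "TTTTCGCGAAAA"
print(ends_at_cut_site(reference, 0, 6))   # ends at the cut
print(ends_at_cut_site(reference, 6, 12))  # starts at the cut
print(ends_at_cut_site(reference, 0, 12))  # spans the site uncut
```

A fragment that spans the site without ending in it is consistent with an uncleaved (e.g., protected) molecule, whereas fragments ending at the cut can be set aside or analyzed separately, e.g., for somatic mutations.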

[0300] In some embodiments, the DNA is purified after being contacted with the nuclease, e.g., using SPRI beads. Such purification may occur after heat inactivation of the nuclease. Alternatively, purification can be omitted; thus, for example, a subsequent step such as amplification can be performed on the subsample containing heat-inactivated nuclease. In another embodiment, the contacting step can occur in the presence of a purification reagent such as SPRI beads, e.g., to minimize losses associated with tube transfers. After cleavage and heat inactivation, the SPRI beads can be re-used for cleanup by adding molecular crowding reagents (e.g., PEG) and salt.

[0301] In some embodiments, DNA fragmentation is detected by determining the endpoints and / or midpoints of sequenced fragments of DNA (e.g., cfDNA). For example, differences in fragmentation patterns may occur depending on whether the fragments originated from a tumor or from healthy cells. To detect tumor-cell-derived DNA in cfDNA based on fragmentation, the presence or absence of an increased level of abnormal fragments can be determined at regions with copy-number amplifications (e.g., proportional to the degree of amplification), e.g., where the increase and abnormality are relative to control or healthy samples.

[0302] In some embodiments, where a modification sensitive conversion is performed on a sample or subsample, the subsequent capturing of one or more target region sets (e.g., at least an epigenetic target region set) from that sample or subsample uses target-specific probes that comprise probes specific for a modification state (e.g., of at least one base in the sequence to which the probe hybridizes), e.g., complementary to target sequences that have undergone conversion (e.g., conversion of modified or unmodified cytosines to uracils or analogs thereof, such as DHU, that preferentially pair with adenine) or that have not undergone conversion, as desired. As such, the probes can be specific for sequences in which a modification of interest, such as methylation, was or was not present. In some embodiments, where a modification sensitive conversion is performed on a sample or subsample, the subsequent capturing of one or more target region sets (e.g., at least an epigenetic target region set) from that sample or subsample uses target-specific probes that comprise probes that can hybridize to target sequences regardless of modification state (e.g., comprise a promiscuously pairing nucleobase at a position that may or may not have undergone conversion of modified or unmodified cytosines to uracils or analogs thereof, such as DHU, that preferentially pair with adenine; for example, inosine can pair with C or U).

[0303] In some embodiments, the methods comprise preparing a pool comprising at least a portion of the DNA of the second subsample (also referred to as the hypomethylated partition) and at least a portion of the DNA of the first subsample (also referred to as the hypermethylated partition). Target regions, e.g., including epigenetic target regions and / or sequence-variable target regions, may be captured from the pool. The steps of capturing a target region set from at least a portion of a subsample described elsewhere herein encompass capture steps performed on a pool comprising DNA from the first and second subsamples. A step of amplifying DNA in the pool may be performed before capturing target regions from the pool. The capturing step may have any of the features described elsewhere herein.

[0304] The epigenetic target regions may show differences in methylation levels and / or fragmentation patterns depending on whether they originated from a tumor or from healthy cells, or what type of tissue they originated from, as discussed elsewhere herein. The sequence-variable target regions may show differences in sequence depending on whether they originated from a tumor or from healthy cells.

[0305] Analysis of epigenetic target regions from the hypomethylated partition may be less informative in some applications than analysis of sequence-variable target-regions from the hypermethylated and hypomethylated partitions and epigenetic target regions from the hypermethylated partition. As such, in methods where sequence-variable target-regions and epigenetic target regions are being captured, the latter may be captured to a lesser extent than one or more of the sequence-variable target-regions from the hypermethylated and hypomethylated partitions and epigenetic target regions from the hypermethylated partition. For example, sequence-variable target regions can be captured from the portion of the hypomethylated partition not pooled with the hypermethylated partition, and the pool can be prepared with some (e.g., a majority, substantially all, or all) of the DNA from the hypermethylated partition and none or some (e.g., a minority) of the DNA from the hypomethylated partition. Such approaches can reduce or eliminate sequencing of epigenetic target regions from the hypomethylated partition, thereby reducing the amount of sequencing data that suffices for further analysis.

[0306] In some embodiments, including a minority of the DNA of the hypomethylated partition in the pool facilitates quantification of one or more epigenetic features (e.g., methylation or other epigenetic feature(s) discussed in detail elsewhere herein), e.g., on a relative basis.

[0307] In some embodiments, the pool comprises a minority of the DNA of the hypomethylated partition, e.g., less than about 50% of the DNA of the hypomethylated partition, such as less than or equal to about 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 5%-25% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 10%-20% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 10% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 15% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 20% of the DNA of the hypomethylated partition.

[0308] In some embodiments, the pool comprises a portion of the hypermethylated partition, which may be at least about 50% of the DNA of the hypermethylated partition. For example, the pool may comprise at least about 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% of the DNA of the hypermethylated partition. In some embodiments, the pool comprises 50-55%, 55-60%, 60-65%, 65-70%, 70-75%, 75-80%, 80-85%, 85-90%, 90-95%, or 95-100% of the DNA of the hypermethylated partition. In some embodiments, the second pool comprises all or substantially all of the hypermethylated partition.
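The pooling fractions above amount to simple arithmetic, sketched below under stated assumptions. The function name `pool_composition`, the use of nanograms as the unit, and the example input amounts are hypothetical and chosen only to illustrate a pool containing all of the hypermethylated partition and about 10% of the hypomethylated partition.

```python
def pool_composition(hyper_ng, hypo_ng, hyper_fraction=1.0,
                     hypo_fraction=0.10):
    """Compute how much DNA from each partition ends up in the pool.

    hyper_ng, hypo_ng -- DNA mass of each partition (hypothetical unit: ng)
    hyper_fraction    -- fraction of the hypermethylated partition pooled
    hypo_fraction     -- fraction of the hypomethylated partition pooled
    """
    for name, value in (("hyper_fraction", hyper_fraction),
                        ("hypo_fraction", hypo_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    hyper_in_pool = hyper_ng * hyper_fraction
    hypo_in_pool = hypo_ng * hypo_fraction
    total = hyper_in_pool + hypo_in_pool
    return {
        "hyper_ng": hyper_in_pool,
        "hypo_ng": hypo_in_pool,
        # Share of the pool contributed by hypomethylated DNA.
        "hypo_share": hypo_in_pool / total if total else 0.0,
    }


# Illustrative amounts only: 10 ng hypermethylated, 40 ng hypomethylated.
print(pool_composition(10.0, 40.0, hyper_fraction=1.0, hypo_fraction=0.10))
```

The remaining hypomethylated DNA (here 90%) stays outside the pool and can be used, for example, for capture of sequence-variable target regions.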

[0309] In some embodiments, the methods comprise preparing a first pool comprising at least a portion of the DNA of the hypomethylated partition. In some embodiments, the methods comprise preparing a second pool comprising at least a portion of the DNA of the hypermethylated partition. In some embodiments, the first pool further comprises a portion of the DNA of the hypermethylated partition. In some embodiments, the second pool further comprises a portion of the DNA of the hypomethylated partition. In some embodiments, the first pool comprises a majority of the DNA of the hypomethylated partition, and optionally a minority of the DNA of the hypermethylated partition. In some embodiments, the second pool comprises a majority of the DNA of the hypermethylated partition and a minority of the DNA of the hypomethylated partition. In some embodiments involving an intermediately methylated partition, the second pool comprises at least a portion of the DNA of the intermediately methylated partition, e.g., a majority of the DNA of the intermediately methylated partition. In some embodiments, the first pool comprises a majority of the DNA of the hypomethylated partition, and the second pool comprises a majority of the DNA of the hypermethylated partition and a majority of the DNA of the intermediately methylated partition.

[0310] In some embodiments, the methods comprise capturing at least a first set of target regions from the first pool, e.g., wherein the first pool is as set forth in any of the embodiments above. In some embodiments, the first set comprises sequence-variable target regions. In some embodiments, the first set comprises hypomethylation variable target regions and / or fragmentation variable target regions. In some embodiments, the first set comprises sequence-variable target regions and fragmentation variable target regions. In some embodiments, the first set comprises sequence-variable target regions, hypomethylation variable target regions and fragmentation variable target regions. A step of amplifying DNA in the first pool may be performed before this capture step. In some embodiments, capturing the first set of target regions from the first pool comprises contacting the DNA of the first pool with a first set of target-specific probes. In some embodiments, the first set of target-specific probes comprises target-binding probes specific for the sequence-variable target regions. In some embodiments, the first set of target-specific probes comprises target-binding probes specific for the sequence-variable target regions, hypomethylation variable target regions and / or fragmentation variable target regions.

[0311] In some embodiments, the methods comprise capturing a second set of target regions or plurality of sets of target regions from the second pool, e.g., wherein the first pool is as set forth in any of the embodiments above. In some embodiments, the second plurality comprises epigenetic target regions, such as hypermethylation variable target regions and / or fragmentation variable target regions. In some embodiments, the second plurality comprises sequence-variable target regions and epigenetic target regions, such as hypermethylation variable target regions and / or fragmentation variable target regions. A step of amplifying DNA in the second pool may be performed before this capture step. In some embodiments, capturing the second plurality of sets of target regions from the second pool comprises contacting the DNA of the first pool with a second set of target-specific probes, wherein the second set of target-specific probes comprises target-binding probes specific for the sequence-variable target regions and target-binding probes specific for the epigenetic target regions. In some embodiments, the first set of target regions and the second set of target regions are not identical. For example, the first set of target regions may comprise one or more target regions not present in the second set of target regions. Alternatively or in addition, the second set of target regions may comprise one or more target regions not present in the first set of target regions. In some embodiments, at least one hypermethylation variable target region is captured from the second pool but not from the first pool. In some embodiments, a plurality of hypermethylation variable target regions are captured from the second pool but not from the first pool. In some embodiments, the first set of target regions comprises sequence-variable target regions and / or the second set of target regions comprises epigenetic target regions. In some embodiments, the first set of target regions comprises sequence-variable target regions and fragmentation variable target regions; and the second set of target regions comprises epigenetic target regions, such as hypermethylation variable target regions and fragmentation variable target regions. In some embodiments, the first set of target regions comprises sequence-variable target regions, fragmentation variable target regions, and hypomethylation variable target regions; and the second set of target regions comprises epigenetic target regions, such as hypermethylation variable target regions and fragmentation variable target regions.

[0312] In some embodiments, the first pool comprises a majority of the DNA of the hypomethylated partition and a portion of the DNA of the hypermethylated partition (e.g., about half), and the second pool comprises a portion of the DNA of the hypermethylated partition (e.g., about half). In some such embodiments, the first set of target regions comprises sequence-variable target regions and / or the second set of target regions comprises epigenetic target regions. The sequence-variable target regions and / or the epigenetic target regions may be as set forth in any of the embodiments described elsewhere herein.

D. Subjecting the sample or treated sample to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase

[0313] Methods disclosed herein may comprise a step of subjecting a sample to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase. In some embodiments, the sample is an original sample or is a proportionately methylated sample relative to an original sample. Methods disclosed herein may comprise a step of subjecting a treated sample to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase.

[0314] In some embodiments, the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity. In some embodiments, if the first nucleobase is a modified or unmodified adenine, then the second nucleobase is a modified or unmodified adenine; if the first nucleobase is a modified or unmodified cytosine, then the second nucleobase is a modified or unmodified cytosine; if the first nucleobase is a modified or unmodified guanine, then the second nucleobase is a modified or unmodified guanine; and if the first nucleobase is a modified or unmodified thymine, then the second nucleobase is a modified or unmodified thymine (where modified and unmodified uracil are encompassed within modified thymine for the purpose of this step). Such a procedure can be used to identify nucleotides in the subsample that have or lack certain modifications, such as methylation.

[0315] In some embodiments, the first nucleobase is a modified or unmodified cytosine, and the second nucleobase is a modified or unmodified cytosine. In some embodiments, the first nucleobase is a modified cytosine, and the second nucleobase is an unmodified cytosine. In some embodiments, the first nucleobase is an unmodified cytosine, and the second nucleobase is a modified cytosine. For example, the first nucleobase may comprise unmodified cytosine (C) and the second nucleobase may comprise one or more of 5-methylcytosine (mC) and 5-hydroxymethylcytosine (hmC). Alternatively, the second nucleobase may comprise C and the first nucleobase may comprise one or more of mC and hmC. Other combinations are also possible, as indicated, e.g., in the Summary above and the following discussion, such as where one of the first and second nucleobases comprises mC and the other comprises hmC.

[0316] In some embodiments, the procedure that affects a first nucleobase of the DNA differently from a second nucleobase of the DNA is a conversion. In some embodiments, the procedure that affects a first nucleobase of the DNA differently from a second nucleobase of the DNA is methylation-sensitive conversion. The methods disclosed herein can comprise contacting DNA in a sample with a deaminase, thereby providing a converted sample. In some embodiments, the deaminase is a methyl-sensitive deaminase or a methyl-insensitive deaminase. In some embodiments, the deaminase is a dsDNA deaminase and / or a ssDNA deaminase. This step of contacting the DNA in the sample with a deaminase can be referred to as, or be included in, a conversion procedure, such as any of the conversion procedures described elsewhere herein. For an exemplary description of conversion using a deaminase, see, e.g., Schutsky et al., Nature Biotechnology 2018; 36: 1083-1090. In some embodiments, the DNA in the converted sample is then sequenced, and a level of methylation at one or more differentially methylated regions of the DNA is quantified, or a variation of the copy number at one or more regions of the DNA is quantified.

[0317] Table 1 summarizes exemplary methods of deamination with the type of modified bases detectable with these methods. These are described in more detail below.

[0318] As outlined below, there are various methods of detecting and / or identifying modified nucleosides that rely on a conversion procedure that changes the base-pairing specificity of a nucleoside, based on the modification status of the nucleosides. These changes of base-pairing specificity can then be detected, and thus the modification status of the nucleoside inferred, by sequencing.

[0319] In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises bisulfite conversion. In some embodiments, contacting the sample with the methylation-discriminating nuclease occurs prior to bisulfite conversion, in which the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

[0320] In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises direct methylation sequencing (DM-seq). In some embodiments, contacting the sample with the methylation-discriminating nuclease occurs prior to direct methylation sequencing (DM-seq), in which the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

[0321] In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises APOBEC-coupled epigenetic (ACE) conversion. In some embodiments, contacting the sample with the methylation-discriminating nuclease occurs prior to contacting the sample with APOBEC, in which the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

[0322] In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises enzymatic methyl-seq (EM-seq) conversion. In some embodiments, as discussed below, EM-seq comprises contacting a sample with APOBEC. In some embodiments, contacting the sample with the methylation-discriminating nuclease occurs prior to contacting the sample with APOBEC, in which the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

[0323] In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises single-enzyme 5-methylcytosine sequencing (SEM-seq) conversion. In some embodiments, contacting the sample with the methylation-discriminating nuclease occurs prior to single-enzyme 5-methylcytosine sequencing (SEM-seq) conversion, in which the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

[0324] In some embodiments, the conversion procedure used in the methods of the disclosure is one that changes the base pairing specificity of a modified nucleoside (e.g., methylated cytosine) but does not change the base pairing specificity of the corresponding unmodified nucleoside (e.g., cytosine) or does not change the base pairing specificity of any unmodified nucleoside (e.g., cytosine, adenosine, guanosine and thymidine (or uracil)). Advantages of methods that do not convert the base-pairing specificity of unmodified nucleosides include reduced loss of sequence complexity, higher sequencing efficiency and reduced alignment losses. Additionally, methods such as TAPS may in some cases be preferred over methods such as bisulfite sequencing and EM-seq because they are less destructive (especially important for low yield samples such as cfDNA or FFPE samples) and do not require denaturation, meaning that non-conversion errors are theoretically more likely to be random. In methods that require denaturation for conversion, failure to denature a DNA molecule will result in non-conversion of all bases in the DNA molecule. As biological changes in methylation are predominantly concentrated in localized regions of interest, these non-random (localized) non-conversion events can appear as false negatives (non-methylated regions). Random non-conversion can affect at most a low percentage of bases within a region, and thus the specificity of methylation change detection can be maximized (reducing false positives) by placing a threshold on the percentage of bases within a region that are methylated / non-methylated. Hence, in some cases, a conversion procedure that does not involve denaturation can be preferred.
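The thresholding idea in the preceding paragraph can be sketched as follows. This is an illustrative example only; the function name `call_region` and the 80% default threshold are hypothetical and are not values taken from the disclosure.

```python
def call_region(cpg_calls, min_fraction=0.8):
    """Call a region on one molecule as methylated or not.

    cpg_calls    -- list of booleans, True where a CpG in the region
                    was read as methylated on this molecule
    min_fraction -- hypothetical threshold on the methylated fraction

    Returns (fraction_methylated, is_methylated), or None for a region
    with no CpG calls.
    """
    if not cpg_calls:
        return None
    fraction = sum(cpg_calls) / len(cpg_calls)
    return fraction, fraction >= min_fraction


# One isolated (random) conversion error among ten CpGs stays far below
# the threshold, so the region is not called methylated.
print(call_region([True] + [False] * 9))

# A molecule-wide event (every CpG affected) passes any such threshold,
# which is why non-random errors are harder to filter this way.
print(call_region([True] * 10))
```

The contrast between the two calls mirrors the argument above: a per-region threshold absorbs sparse random errors but cannot separate a concerted biological signal from a whole-molecule conversion failure.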

[0325] In other cases, the conversion procedure that can be used in the methods of the disclosure is one that changes the base pairing specificity of an unmodified nucleoside (e.g., cytosine) but does not change the base pairing specificity of the corresponding modified nucleoside (e.g., methylated cytosine such as 5hmC and / or 5mC). Such methods include, for example, bisulfite sequencing.

[0326] The skilled person can select a suitable method according to their needs, including which nucleoside modifications are to be detected and / or identified and which type of modified base is used in the end repair reaction.

[0327] In some embodiments, the conversion procedure converts modified nucleosides. In some embodiments, the conversion procedure which converts modified nucleosides comprises Tet-assisted conversion with a substituted borane reducing agent, optionally wherein the substituted borane reducing agent is 2-picoline borane, borane pyridine, tert-butylamine borane, ammonia borane or pyridine borane. In Tet-assisted conversion with a substituted borane reducing agent, a TET protein is used to convert 5mC and 5hmC to 5caC, without affecting unmodified C. 5caC, and 5fC if present, are then converted to dihydrouracil (DHU) by treatment with 2-picoline borane (pic-borane) or another substituted borane reducing agent such as borane pyridine, tert-butylamine borane, or ammonia borane, also without affecting unmodified C. See, e.g., Liu et al., Nature Biotechnology 2019; 37:424-429 (e.g., at Supplementary Fig. 1 and Supplementary Note 7). Thus, when this type of conversion is used, the first nucleobase comprises one or more of 5mC, 5fC, 5caC, or 5hmC, and the second nucleobase comprises unmodified cytosine. DHU is read as a T in sequencing. Sequencing of the converted DNA identifies positions that are read as cytosine as being unmodified C positions. Meanwhile, positions that are read as T are identified as being T, 5mC, 5fC, 5caC, or 5hmC. Performing TAPS conversion, such as on a DNA sample as described herein, thus facilitates identifying positions containing unmodified C using the sequence reads obtained.
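The readout logic described above can be illustrated with a minimal sketch. It assumes a read already aligned, gap-free, to the top strand of a reference; the function name `classify_cytosines` and the labels it returns are hypothetical.

```python
def classify_cytosines(reference, read):
    """Interpret a read after Tet-assisted borane conversion.

    reference -- reference sequence (top strand)
    read      -- aligned read of the same length, no gaps (assumption)

    For each reference C position, returns:
      "unmodified"  -- read as C, i.e., unmodified cytosine
      "modified"    -- read as T, i.e., 5mC, 5hmC, 5fC or 5caC
                       (a true C>T sequence variant would look the same)
      "other"       -- any other base (e.g., sequencing error or variant)
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")
    calls = {}
    for index, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[index] = "unmodified"
        elif read_base == "T":
            calls[index] = "modified"
        else:
            calls[index] = "other"
    return calls


print(classify_cytosines("ACGTCG", "ATGTCG"))
```

Because only modified cytosines change identity, most of the read still matches the reference, which is the basis of the reduced loss of sequence complexity noted for this type of conversion.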

[0328] Hence, in these embodiments, the end repair reaction can be performed with dNTPs, wherein the at least one type of dNTP comprises a 5mC or 5hmC, and regions of the end-repaired DNA synthesized during the end repair reaction can be identified as those regions comprising 5mC or 5hmC (via T being called at positions which are C in the reference) at non-CpG positions. This procedure encompasses Tet-assisted pyridine borane sequencing (TAPS), described in further detail in Liu et al. 2019, supra. In this method, a Tet enzyme is used to progressively oxidize 5mC and 5hmC to 5fC or 5caC, and pyridine borane then converts 5fC and 5caC to DHU, which is amplified as T.

[0329] Alternatively, protection of 5hmC (e.g., using βGT or 5-hydroxymethylcytosine carbamoyltransferase) can be combined with Tet-assisted conversion with a substituted borane reducing agent, e.g., as described above. In this method (TAPSβ), 5hmC can be protected from conversion, for example through glucosylation using β-glucosyl transferase (βGT), forming 5-glucosylhydroxymethylcytosine (5ghmC), or through carbamoylation using 5-hydroxymethylcytosine carbamoyltransferase, forming 5cmC. This is described in Yu et al., Cell 2012; 149: 1368-80. Treatment with a TET protein such as mTet1 then converts 5mC to 5caC but does not convert C, 5ghmC, or 5cmC. 5caC is then converted to DHU by treatment with pic-borane or another substituted borane reducing agent such as borane pyridine, tert-butylamine borane, or ammonia borane, also without affecting 5ghmC, 5cmC, or unmodified C. Thus, when Tet-assisted conversion with a substituted borane reducing agent is used, the first nucleobase comprises mC, and the second nucleobase comprises one or more of unmodified cytosine or hmC, such as unmodified cytosine and optionally hmC, fC, and / or caC. Sequencing of the converted DNA identifies positions that are read as cytosine as being either 5hmC or unmodified C positions. Meanwhile, positions that are read as T are identified as being T, 5fC, 5caC, or 5mC. Performing TAPSβ conversion on a sample as described herein thus facilitates distinguishing positions containing unmodified C or 5hmC on the one hand from positions containing 5mC using the sequence reads obtained. Hence, in these embodiments, the end repair reaction can be performed with dNTPs, wherein the at least one type of dNTP comprises a 5mC, and regions synthesized during the end repair reaction can be identified as those regions comprising 5mC (via T being called at positions which are C in the reference) at non-CpG positions. For an exemplary description of this type of conversion, see, e.g., Liu et al., Nature Biotechnology 2019; 37:424-429. 5-hydroxymethylcytosine carbamoyltransferase is described in Yang et al., Bio-protocol, 2023; 12(17): e4496.

[0330] In some embodiments, the conversion procedure converts modified nucleosides. In some embodiments, the conversion procedure which converts modified nucleosides comprises chemical-assisted conversion with a substituted borane reducing agent, optionally wherein the substituted borane reducing agent is 2-picoline borane, borane pyridine, tert-butylamine borane, or ammonia borane. In chemical-assisted conversion with a substituted borane reducing agent, an oxidizing agent such as potassium perruthenate (KRuO4) (also suitable for use in ox-BS conversion) is used to specifically oxidize 5hmC to 5fC. Treatment with pic-borane or another substituted borane reducing agent such as borane pyridine, tert-butylamine borane, or ammonia borane converts 5fC and 5caC to DHU but does not affect 5mC or unmodified C. Thus, when this type of conversion is used, the first nucleobase comprises one or more of hmC, fC, and caC, and the second nucleobase comprises one or more of unmodified cytosine or mC, such as unmodified cytosine and optionally mC. Sequencing of the converted DNA identifies positions that are read as cytosine as being either 5mC or unmodified C positions. Meanwhile, positions that are read as T are identified as being T, 5fC, 5caC, or 5hmC. Performing this type of conversion as described herein thus facilitates distinguishing positions containing unmodified C or 5mC on the one hand from positions containing 5hmC using the sequence reads obtained. Hence, in these embodiments, the end repair reaction can be performed with dNTPs, wherein at least one type of dNTP comprises a 5hmC, and regions synthesized during the end repair reaction can be identified as those regions comprising 5hmC (via T being called at positions which are C in the reference) at non-CpG positions. For an exemplary description of this type of conversion, see, e.g., Liu et al., Nature Biotechnology 2019; 37:424-429.
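The three borane-based conversions described in the preceding paragraphs differ only in which cytosine forms end up read as T. The lookup below restates those outcomes as given in the text; the dictionary keys (`tet_assisted`, `tet_assisted_hmc_protected`, `chemical_assisted`) and the function names are hypothetical labels introduced for illustration.

```python
CYTOSINE_FORMS = ["C", "5mC", "5hmC", "5fC", "5caC"]

# Base read at a cytosine position after each conversion, per the text:
# converted forms become DHU and are read as T; others are read as C.
READOUT = {
    # Tet oxidation of 5mC/5hmC, then borane reduction of 5fC/5caC.
    "tet_assisted": {
        "C": "C", "5mC": "T", "5hmC": "T", "5fC": "T", "5caC": "T"},
    # Same, but 5hmC is protected (e.g., glucosylated) beforehand.
    "tet_assisted_hmc_protected": {
        "C": "C", "5mC": "T", "5hmC": "C", "5fC": "T", "5caC": "T"},
    # Chemical oxidation of 5hmC to 5fC, then borane reduction.
    "chemical_assisted": {
        "C": "C", "5mC": "C", "5hmC": "T", "5fC": "T", "5caC": "T"},
}


def read_as(method, form):
    """Return the base ('C' or 'T') read for a cytosine form."""
    return READOUT[method][form]


def forms_read_as_t(method):
    """List the cytosine forms that a method reports as T."""
    return [form for form in CYTOSINE_FORMS if READOUT[method][form] == "T"]


def distinguishes(method, form_a, form_b):
    """True if the method gives different readouts for two forms."""
    return read_as(method, form_a) != read_as(method, form_b)


for method in READOUT:
    print(method, forms_read_as_t(method))
```

Comparing rows shows how the procedures complement each other: for example, `distinguishes("tet_assisted_hmc_protected", "5mC", "5hmC")` is true while the same comparison under `tet_assisted` is false.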

[0331] Exemplary conversion procedures that change the base-pairing specificity of modified cytosines have been described. However, the methods described herein could in principle use any modified nucleoside and suitable conversion procedure (i.e., single-base epigenetic conversion assay) that changes the base-pairing specificity of the modified nucleoside and thereby allows the modified base to be distinguished from the corresponding unmodified nucleoside and / or other types of modification when sequenced. For example, any conversion procedure could be used allowing any one of N6-methyladenine (6mA), N6-hydroxymethyladenine (6hmA), or N6-formyladenine (6fA) to be distinguished from unmodified adenosine.

[0332] In some embodiments, the conversion procedure converts unmodified nucleosides. In some embodiments, the conversion procedure which converts unmodified nucleosides comprises bisulfite conversion. Treatment with bisulfite converts unmodified cytosine and certain modified cytosine nucleotides (e.g., 5-formylcytosine (5fC) or 5-carboxylcytosine (5caC)) to uracil whereas other modified cytosines (e.g., 5mC and 5hmC) are not converted. Thus, where bisulfite conversion is used, the first nucleobase comprises one or more of unmodified cytosine, 5fC, 5caC, or other cytosine forms affected by bisulfite, and the second nucleobase may comprise one or more of 5mC and 5hmC, such as 5mC and optionally 5hmC. Sequencing of bisulfite-treated DNA identifies positions that are read as cytosine as being 5mC or 5hmC positions. Meanwhile, positions that are read as T are identified as being T or a bisulfite-susceptible form of C, such as unmodified cytosine, 5fC, or 5caC. Thus, performing bisulfite conversion, such as on a DNA sample as described herein, facilitates identifying positions containing 5mC or 5hmC. Hence, in these embodiments, the end repair reaction can be performed with dNTPs, wherein at least one type of dNTP comprises a 5mC and / or a 5hmC, and regions synthesized during the end repair reaction can be identified as those regions comprising 5mC or a 5hmC (via C being called at these positions) at non-CpG positions. For an exemplary description of bisulfite conversion, see, e.g., Moss et al., Nat Commun. 2018; 9: 5068.
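The identification of end-repair-synthesized regions by C calls at non-CpG positions after bisulfite conversion can be illustrated with a minimal sketch. The function name, the example sequences, and the simplifying assumptions (a forward-strand read already aligned to the reference, equal lengths, no indels) are hypothetical and are introduced here for illustration only; they are not part of the disclosed methods.

```python
def non_cpg_retained_c(reference: str, read: str) -> list[int]:
    """Return 0-based positions where a non-CpG reference C is still read as C.

    Assumes (for illustration) a forward-strand read aligned to the
    reference with no indels, so both strings have the same length.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be the same length")
    ref = reference.upper()
    obs = read.upper()
    hits = []
    for i, base in enumerate(ref):
        if base != "C":
            continue
        if i + 1 >= len(ref):
            # Context of the last base is unknown, so it is skipped.
            continue
        in_cpg = ref[i + 1] == "G"
        # After bisulfite conversion, an unmodified non-CpG C is read as T;
        # a retained C at such a position suggests 5mC or 5hmC.
        if not in_cpg and obs[i] == "C":
            hits.append(i)
    return hits


# Hypothetical example: reference C at positions 1 and 8 are in CpG context,
# positions 4 and 7 are not. Position 4 was converted, position 7 was retained.
print(non_cpg_retained_c("ACGTCATCCGA", "ACGTTATCCGA"))
```

Under these assumptions, a run of retained non-CpG cytosines near a fragment end would be consistent with a region filled in during end repair using modified dNTPs.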

[0333] In some embodiments, the procedure which converts unmodified nucleosides comprises oxidative bisulfite (Ox-BS) conversion. This procedure first converts 5hmC to 5fC, which is bisulfite susceptible, followed by bisulfite conversion. Thus, when oxidative bisulfite conversion is used, the first nucleobase comprises one or more of unmodified cytosine, 5fC, 5caC, 5hmC, or other cytosine forms affected by bisulfite, and the second nucleobase comprises 5mC. Sequencing of Ox-BS converted DNA identifies positions that are read as cytosine as being 5mC positions. Meanwhile, positions that are read as T are identified as being T or a bisulfite-susceptible form of C, such as unmodified cytosine, 5fC, or 5hmC. Hence, in these embodiments, the end repair reaction can be performed with dNTPs, wherein at least one type of dNTP comprises a 5mC, and regions synthesized during the end repair reaction can be identified as those regions comprising 5mC (via C being called at these positions) at non-CpG positions. Performing Ox-BS conversion thus facilitates identifying positions containing mC. For an exemplary description of oxidative bisulfite conversion, see, e.g., Booth et al., Science 2012; 336: 934-937.

[0334] In some embodiments, the procedure which converts unmodified nucleosides comprises Tet-assisted bisulfite (TAB) conversion. In TAB conversion, 5hmC is protected from conversion and 5mC is oxidized in advance of bisulfite treatment, so that positions originally occupied by 5mC are converted to U while positions originally occupied by 5hmC remain as a protected form of cytosine. For example, as described in Yu et al., Cell 2012; 149: 1368-80, β-glucosyl transferase can be used to protect 5hmC (forming 5-glucosylhydroxymethylcytosine (5ghmC)), then a TET protein such as mTet1 can be used to convert 5mC to 5caC, and then bisulfite treatment can be used to convert C and 5caC to U while 5ghmC remains unaffected.

[0335] Alternatively, a carbamoyltransferase enzyme, such as 5-hydroxymethylcytosine carbamoyltransferase as described in Yang et al., Bio-protocol, 2023; 12(17): e4496, can be used to protect hmC (by converting hmC to 5-carbamoyloxymethylcytosine (5cmC)), then a TET protein such as mTet1 can be used to convert mC to caC, and then bisulfite treatment can be used to convert C and caC to U while 5cmC remains unaffected. Thus, when TAB conversion is used, the first nucleobase comprises one or more of unmodified cytosine, 5fC, 5caC, 5mC, or other cytosine forms affected by bisulfite, and the second nucleobase comprises 5hmC. Sequencing of TAB-converted DNA identifies positions that are read as cytosine as being 5hmC positions. Meanwhile, positions that are read as T are identified as being T, or a bisulfite-susceptible form of C, such as unmodified cytosine, 5mC, 5fC, or 5caC. Performing TAB conversion on a first subsample as described herein thus facilitates identifying positions containing 5hmC. Hence, in these embodiments, the end repair reaction can be performed with dNTPs, wherein at least one type of dNTP comprises a 5hmC, and regions synthesized during the end repair reaction can be identified as those regions comprising 5hmC (via C being called at these positions) at non-CpG positions.

[0336] In some embodiments, the conversion procedure which converts unmodified cytosines comprises APOBEC-coupled epigenetic (ACE) conversion. In ACE conversion, an AID / APOBEC family DNA deaminase enzyme such as APOBEC3A (A3A) is used to deaminate an unmodified cytosine and 5mC without deaminating 5hmC, 5fC, or 5-caC. Thus, when ACE conversion is used, the first nucleobase comprises unmodified C and / or mC (e.g., unmodified C and optionally mC), and the second nucleobase comprises hmC. Sequencing of ACE-converted DNA identifies positions that are read as cytosine as being 5hmC, 5fC, or 5-caC positions. Meanwhile, positions that are read as T are identified as being T, unmodified C, or 5mC. Performing ACE conversion as described herein thus facilitates distinguishing positions containing 5hmC from positions containing 5mC or unmodified C using the sequence reads obtained from the first subsample. In some embodiments, the end repair reaction can be performed with dNTPs, wherein at least one type of dNTP comprises a 5hmC, and regions synthesized during the end repair reaction can be identified as those regions comprising 5hmC (via C being called at these positions) at non-CpG positions. For an exemplary description of ACE conversion, see, e.g., Schutsky et al., Nature Biotechnology 2018; 36: 1083-1090.
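The sequencing read-outs described above for borane-based chemical-assisted conversion, bisulfite, Ox-BS, TAB, and ACE conversion can be summarized in a small lookup, sketched below. The dictionary keys, function names, and the restriction to five cytosine forms are illustrative assumptions; the sets themselves restate which forms are read as T under each procedure as described in the preceding paragraphs, and protected intermediates (e.g., 5ghmC) are not modeled.

```python
# Cytosine forms considered in this sketch (protected intermediates omitted).
CYTOSINE_FORMS = ("C", "5mC", "5hmC", "5fC", "5caC")

# For each conversion procedure, the cytosine forms that are read as T.
# All other forms in CYTOSINE_FORMS are read as C.
READS_AS_T = {
    "borane": {"5hmC", "5fC", "5caC"},      # with prior oxidation of 5hmC
    "bisulfite": {"C", "5fC", "5caC"},
    "ox-BS": {"C", "5hmC", "5fC", "5caC"},
    "TAB": {"C", "5mC", "5fC", "5caC"},
    "ACE": {"C", "5mC"},
}


def read_base(procedure: str, form: str) -> str:
    """Return the base call ("C" or "T") expected for a cytosine form."""
    if form not in CYTOSINE_FORMS:
        raise ValueError(f"unknown cytosine form: {form}")
    return "T" if form in READS_AS_T[procedure] else "C"


def forms_read_as_c(procedure: str) -> list[str]:
    """List the cytosine forms that remain readable as C for a procedure."""
    return [f for f in CYTOSINE_FORMS if read_base(procedure, f) == "C"]


for name in READS_AS_T:
    print(name, "-> read as C:", forms_read_as_c(name))
```

In this simplified view, a C call after Ox-BS conversion points to 5mC and a C call after TAB conversion points to 5hmC, which is why the two procedures are complementary.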

[0337] In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample comprises enzymatic conversion of the first nucleobase, e.g., as in EM-Seq. See, e.g., Vaisvila R, et al. (2019) EM-seq: Detection of DNA methylation at single base resolution from picograms of DNA. bioRxiv, DOI: 10.1101/2019.12.20.884692, available at www.biorxiv.org/content/10.1101/2019.12.20.884692v1. For example, TET2 and T4-βGT or 5-hydroxymethylcytosine carbamoyltransferase (described in Yang et al., Bio-protocol, 2023; 12(17): e4496) can be used to convert 5mC and 5hmC into substrates that cannot be deaminated by a deaminase (e.g., APOBEC3A), and then a deaminase (e.g., APOBEC3A) can be used to deaminate unmodified cytosines, converting them to uracils.

[0338] In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA comprises enzymatic conversion of the first nucleobase using a non-specific, modification-sensitive double-stranded DNA deaminase, e.g., as in SEM-seq. See, e.g., Vaisvila et al. (2023) Discovery of novel DNA cytosine deaminase activities enables a nondestructive single-enzyme methylation sequencing method for base resolution high-coverage methylome mapping of cell-free and ultra-low input DNA. bioRxiv; DOI: 10.1101/2023.06.29.547047, available at https://www.biorxiv.org/content/10.1101/2023.06.29.547047v1. SEM-seq employs a non-specific, modification-sensitive double-stranded DNA deaminase (MsddA) in a nondestructive single-enzyme 5-methylcytosine sequencing (SEM-seq) method that deaminates unmodified cytosines. Accordingly, SEM-seq does not require the TET2 and T4-βGT or 5-hydroxymethylcytosine carbamoyltransferase protection and denaturing steps that are of use, e.g., in APOBEC3A-based protocols. Additionally, MsddA does not deaminate 5-formylated cytosines (5fC) or 5-carboxylated cytosines (5-caC). In SEM-seq, unmodified cytosines in the DNA are deaminated to uracil and are read as “T” during sequencing. Modified cytosines (e.g., 5mC) are not converted and are read as “C” during sequencing. Cytosines that are read as thymines are identified as unmodified (e.g., unmethylated) cytosines or as thymines in the DNA. Performing SEM-seq conversion thus facilitates identifying positions containing 5mC using the sequence reads obtained. In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA comprises enzymatic conversion of unmodified cytosine using MsddA or a modification-sensitive DNA deaminase A (MsddA)-like deaminase. For an exemplary description of MsddA and MsddA-like deaminases, see, e.g., Vaisvila et al. Mol Cell. 2024 Mar 7;84(5):854-866.e7, which illustrates in Fig. 2A-C that MsddA-like deaminases have reduced activity on each of 5mC, 5hmC, and 5gmC relative to unmodified cytosine in dsDNA, e.g., a reduction of about 75%, 80%, or more on each of 5mC, 5hmC, and 5gmC relative to unmodified cytosine (e.g., using assay conditions as described in Vaisvila et al., such as analysis of deamination of C in E. coli or lambda dcm- DNA, deamination of 5mC in XP12 phage DNA, deamination of 5hmC in a C-hydroxymethylated adenovirus PCR fragment or fully C-hydroxymethylated T4147 phage DNA, and deamination of 5gmC in alpha-glucosyltransferase knockout (AGT-) T4 phage DNA). Deamination can be performed by contacting substrate DNA with deaminase and analyzed using NGS as follows: 50 ng of unmodified E. coli C2566 genomic DNA can be combined with the control DNAs (about 1 ng of Lambda, XP12, and T4147, and 0.1 ng of the 5hmC Adenovirus PCR fragment), sheared to about 300 bp and ligated to pyrrolo-dC adapters with 1 uL of in vitro synthesized deaminase (e.g., synthesized using the PURExpress In Vitro Protein Synthesis kit (NEB, Ipswich, MA) following manufacturer’s recommendations with 100-400 ng of PCR fragment template DNA containing codon-optimized deaminase coding sequence and T7 promoter and terminator). Exemplary deamination reaction conditions are 50 mM Bis-Tris pH 6.0, 0.1% Triton X-100 for 1 hour at 37 degrees C. After the deamination reaction, 1 uL of Thermolabile Proteinase K (NEB, Ipswich, MA) can be added and incubated for 30 min at 37 degrees C and then the Proteinase K can be heat inactivated at 60 degrees C for 10 minutes. The deaminated product can then be used for library amplification using the NEBNext Q5U Master Mix (New England Biolabs, Ipswich, MA, USA) with 5 mM of NEBNext Unique Dual Index Primers.
The resulting library can be purified using 1X NEBNext Sample Purification Beads according to the manufacturer’s instructions and the purified library can be analyzed and quantified by an Agilent Bioanalyzer 2100 DNA High Sensitivity chip. The libraries can be sequenced using the Illumina NextSeq and NovaSeq platforms. Paired-end sequencing of 75 cycles (2 x 75 bp) can be performed for all the sequencing runs. Base calling and demultiplexing can be carried out with the standard Illumina pipeline.
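The comparison of deaminase activity on modified versus unmodified cytosine described above can be expressed as a simple calculation on per-substrate base calls, sketched below. The function names and all read counts are hypothetical placeholders chosen only to make the arithmetic concrete; they are not measured values and are not taken from Vaisvila et al.

```python
def deamination_rate(t_calls: int, c_calls: int) -> float:
    """Fraction of reference-C positions read as T in one control DNA."""
    total = t_calls + c_calls
    if total == 0:
        raise ValueError("no base calls at reference-C positions")
    return t_calls / total


def relative_reduction(rate_modified: float, rate_unmodified: float) -> float:
    """Reduction in deamination on a modified substrate, relative to unmodified C."""
    if rate_unmodified <= 0:
        raise ValueError("unmodified-C deamination rate must be positive")
    return 1.0 - rate_modified / rate_unmodified


# Hypothetical (T calls, C calls) at reference-C positions per control DNA.
counts = {
    "C": (9800, 200),     # e.g., unmodified control DNA
    "5mC": (500, 9500),   # e.g., fully methylated control DNA
}

rate_c = deamination_rate(*counts["C"])
rate_5mc = deamination_rate(*counts["5mC"])
reduction = relative_reduction(rate_5mc, rate_c)
print(f"deamination of C: {rate_c:.2%}, of 5mC: {rate_5mc:.2%}")
print(f"relative reduction on 5mC: {reduction:.1%}")
```

With these placeholder counts, the reduction exceeds the approximately 75% or 80% thresholds mentioned above; the same calculation would be repeated for the 5hmC and 5gmC control DNAs.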

[0339] In some embodiments, the conversion procedure converts modified nucleosides. In some embodiments, the conversion procedure which converts modified nucleosides comprises enzymatic conversion, such as DM-seq, for example, as described in Wang et al., Nat Chem Biol. 2023, 19(8): 1004-1012 and WO2023 / 288222A1. In DM-seq, unmodified cytosines in the DNA are enzymatically protected from a subsequent deamination step wherein 5mC in 5mCpG is converted to T. The enzymatically protected unmodified (e.g., unmethylated) cytosines are not converted and are read as “C” during sequencing. Cytosines that are read as thymines (in a CpG context) are identified as methylated cytosines in the DNA. For an exemplary description of mCpG binding domain proteins, see, e.g., Du et al., Methyl-CpG-binding domain proteins: readers of the epigenome. Epigenomics. 2015;7(6): 1051-73.

[0340] Thus, when this type of conversion is used, the first nucleobase comprises unmodified (such as unmethylated) cytosine, and the second nucleobase comprises modified (such as methylated) cytosine. Sequencing of the converted DNA identifies positions that are read as cytosine as being unmodified C positions. Meanwhile, positions that are read as T are identified as being T or 5mC. Performing DM-seq conversion thus facilitates identifying positions containing 5mC using the sequence reads obtained.
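The DM-seq read-out described above can be sketched as a per-CpG call that compares each read base to the reference. The function name, the call labels, and the assumptions of a forward-strand read aligned to the reference without indels are hypothetical and serve only as an illustration.

```python
def call_cpg_after_dmseq(reference: str, read: str) -> dict[int, str]:
    """Classify each reference CpG cytosine from a DM-seq-converted read.

    Assumes (for illustration) a forward-strand read aligned to the
    reference with no indels. Keys are 0-based positions of the C in CpG.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be the same length")
    ref = reference.upper()
    obs = read.upper()
    calls = {}
    for i in range(len(ref) - 1):
        if ref[i:i + 2] != "CG":
            continue
        if obs[i] == "T":
            # 5mC in a CpG context is deaminated and read as T.
            calls[i] = "5mC"
        elif obs[i] == "C":
            # Enzymatically protected unmodified C is read as C.
            calls[i] = "unmodified C"
        else:
            calls[i] = "no call"
    return calls


# Hypothetical example: CpG cytosines at positions 1 and 5.
print(call_cpg_after_dmseq("ACGTTCGA", "ATGTTCGA"))
```

Because a true thymine is also read as T, the comparison to a reference C at each CpG position is what allows a T call to be attributed to 5mC in this sketch.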

[0341] Exemplary cytosine deaminases for use herein include APOBEC enzymes, for example, APOBEC3A. Generally, AID / APOBEC family DNA deaminase enzymes such as APOBEC3A (A3A) are used to deaminate (unprotected) unmodified cytosine and 5mC. For an exemplary description of APOBEC enzymes, see, e.g., Gajula et al., Nucleic Acids Res. 2014 Sep;42(15):9964-75 and Schutsky et al., Nucleic Acids Res. 2017 Jul 27;45(13):7655-7665. For an exemplary description of APOBEC conversion, see, e.g., Schutsky et al., Nature Biotechnology 2018, 36: 1083-1090.

[0342] The enzymatic protection of unmodified cytosines in the DNA comprises addition of a protective group to the unmodified cytosines. Such protective groups can comprise an alkyl group, an alkyne group, a carboxyl group, a carboxyalkyl group, an amino group, a hydroxymethyl group, a glucosyl group, a glucosylhydroxymethyl group, an isopropyl group, or a dye. For example, DNA can be treated with a methyltransferase, such as a CpG-specific methyltransferase, which adds the protective group to unmodified cytosines. The term methyltransferase is used broadly herein to refer to enzymes capable of transferring a methyl or substituted methyl (e.g., carboxymethyl) to a substrate (e.g., a cytosine in a nucleic acid). In some embodiments, the DNA is contacted with a CpG-specific DNA methyltransferase (MTase), such as a CpG-specific carboxymethyltransferase (CxMTase), and a substituted methyl donor, such as a carboxymethyl donor (e.g., carboxymethyl-S-adenosyl-L-methionine). See, e.g., WO2021 / 236778A2. In particular embodiments, the CxMTase can facilitate the addition of a protective carboxymethyl group to an unmethylated cytosine. In some embodiments, the unmethylated cytosine is unmodified cytosine. The carboxymethyl group can prevent deamination of the cytosine during a deamination step (such as a deamination step using an APOBEC enzyme, such as A3A). Substituted methyl or carboxymethyl donors useful in the disclosed methods include, but are not limited to, S-adenosyl-L-methionine (SAM) analogs, optionally wherein the SAM analog is carboxy-S-adenosyl-L-methionine (CxSAM). SAM analogs are described, for example, in WO2022 / 197593A1. The MTase may be, for example, a CpG methyltransferase from Spiroplasma sp. strain MQ1 (M.SssI), DNA-methyltransferase 1 (DNMT1), DNA-methyltransferase 3 alpha (DNMT3A), DNA-methyltransferase 3 beta (DNMT3B), or DNA adenine methyltransferase (Dam). The CxMTase may be a CpG methyltransferase from Mycoplasma penetrans (M.MpeI).

[0343] In one embodiment, the methyltransferase enzyme is a variant of M.MpeI having an N374R substitution or an N374K substitution. The methyltransferase can further comprise one or more amino acid substitutions selected from a) substitution of one or both residues T300 and E305 with S, A, G, Q, D, or N; b) substitution of one or more residues A323, N306, and Y299 with a positively charged amino acid selected from K, R or H; and / or c) substitution of S323 with A, G, K, R or H, which may enhance the activity of the enzyme.

[0344] Optionally, the conversion procedure further includes enzymatic protection of 5hmCs, such as by glucosylation of the 5hmCs (e.g., using βGT) or by carbamoylation of the 5hmCs (e.g., using 5-hydroxymethylcytosine carbamoyltransferase), in the DNA prior to the deamination of unprotected modified cytosines. In this method, 5hmC can be protected from conversion, for example through glucosylation using β-glucosyl transferase (βGT), forming 5-glucosylhydroxymethylcytosine (5ghmC), or through carbamoylation using 5-hydroxymethylcytosine carbamoyltransferase, forming 5cmC. This is described, for example, in Yu et al., Cell 2012; 149: 1368-80, and in Yang et al., Bio-protocol, 2023; 12(17): e4496. Glucosylation or carbamoylation of 5hmC can reduce or eliminate deamination of 5hmC by a deaminase such as APOBEC3A. Treatment with an MTase or CxMTase then adds a protecting group to unmodified (unmethylated) cytosines in the DNA. 5mC (but not protected, unmodified cytosine and not 5ghmC or 5cmC) is then deaminated (converted to T in the case of 5mC) by treatment with a deaminase, for example, an APOBEC enzyme (such as APOBEC3A). Sequencing of the converted DNA identifies positions that are read as cytosine as being either 5hmC or unmodified C positions. Meanwhile, positions that are read as T are identified as being T or 5mC. Performing DM-seq conversion with glucosylation of 5hmC on a sample as described herein thus facilitates distinguishing positions containing unmodified C or 5hmC on the one hand from positions containing 5mC using the sequence reads obtained.

[0345] Also provided herein are methods in which alternative base conversion schemes can be used. For example, unmethylated cytosines can be left intact while methylated cytosines and hydroxymethylcytosines are converted to a base read as a thymine (e.g., uracil, thymine, or dihydrouracil).

[0346] In some embodiments, methylating a cytosine in at least one first complementary strand or second complementary strand comprises contacting the cytosine with a methyltransferase such as DNMT1 or DNMT5. In such embodiments, the step of oxidizing a 5-hydroxymethylated cytosine to 5-formylcytosine (such as by contacting the 5-hydroxymethylcytosine in a first strand and a second strand with KRuO4) can be optional.

[0347] In some embodiments, converting the modified cytosine in at least one first or second strand to a thymine or a base read as thymine comprises oxidizing a hydroxymethylcytosine, e.g., the hydroxymethylcytosine is oxidized to formylcytosine. In some embodiments, oxidizing the hydroxymethylcytosine to formylcytosine comprises contacting the hydroxymethylcytosine with a ruthenate, such as potassium ruthenate (KRuO4).

[0348] In some embodiments, the modified cytosine is converted to thymine, uracil, or dihydrouracil. In any such embodiments, amplification methods may comprise uracil- and / or dihydrouracil-tolerant amplification methods, such as PCR using a uracil- and / or dihydrouracil-tolerant DNA polymerase.

[0349] In some embodiments, the method comprises converting a formylcytosine and / or a methylcytosine to carboxylcytosine as part of converting the modified cytosine in at least one first or second strand to a thymine or a base read as thymine. For example, converting the formylcytosine and / or the methylcytosine to carboxylcytosine can comprise contacting the formylcytosine and / or the methylcytosine with a TET enzyme, such as TET1, TET2, or TET3. In some embodiments, the method comprises reducing the carboxylcytosine as part of converting the modified cytosine in at least one first or second strand to a thymine or a base read as thymine, and / or the carboxylcytosine is reduced to dihydrouracil. In some embodiments, reducing the carboxylcytosine comprises contacting the carboxylcytosine with a borane or borohydride reducing agent.

[0350] In some embodiments, the borane or borohydride reducing agent comprises pyridine borane, 2-picoline borane, borane, tert-butylamine borane, ammonia borane, sodium borohydride, sodium cyanoborohydride (NaBH3CN), lithium borohydride (LiBH4), ethylenediamine borane, dimethylamine borane, sodium triacetoxyborohydride, morpholine borane, 4-methylmorpholine borane, trimethylamine borane, dicyclohexylamine borane, or a salt thereof. In other embodiments, the reducing agent comprises lithium aluminum hydride, sodium amalgam, amalgam, sulfur dioxide, dithionate, thiosulfate, iodide, hydrogen peroxide, hydrazine, diisobutylaluminum hydride, oxalic acid, carbon monoxide, cyanide, ascorbic acid, formic acid, dithiothreitol, beta-mercaptoethanol, or any combination thereof.

[0351] As discussed above, in some embodiments, a TET protein can be used to convert 5mC and optionally 5hmC (but not unmodified C) into substrates (e.g., 5caC) that cannot be deaminated by a deaminase, and then a deaminase (e.g., APOBEC3A) can be used to deaminate unmodified cytosines, converting them to uracils. Various TET enzymes may be used in the disclosed methods as appropriate. In some embodiments, the one or more TET enzymes comprise TETv. TETv is described in US Patent 10,260,088 and its sequence is SEQ ID NO: 1 therein. In some embodiments, the one or more TET enzymes comprise TETcd. TETcd is described in US Patent 10,260,088 and its sequence is SEQ ID NO: 3 therein. In some embodiments, the one or more TET enzymes comprise TET1. In some embodiments, the one or more TET enzymes comprise TET2. TET2 may be expressed and used as a fragment comprising TET2 residues 1129-1480 joined to TET2 residues 1844-1936 by a linker as described, e.g., in US Patent 10,961,525. In some embodiments, the one or more TET enzymes comprise TET1 and TET2. In some embodiments, the one or more TET enzymes comprise a T1372 TET mutant, such as T1372S. In some embodiments, the one or more TET enzymes comprise a V1900 TET mutant, such as a V1900A, V1900C, V1900G, V1900I, or V1900P TET mutant. In some embodiments, the one or more TET enzymes comprise a V1900 TET2 mutant, such as a V1900A, V1900C, V1900G, V1900I, or V1900P TET2 mutant. It can be beneficial to use a TET enzyme that maximizes formation of 5-carboxylcytosine (5-caC) relative to less oxidized modified cytosines, particularly 5-formylcytosine, because 5-caC is not a substrate for enzymatic deamination, e.g., by APOBEC enzymes such as APOBEC3A. Maximizing formation of 5-caC thus reduces the risk of false calls in which a base is identified as unmethylated because it underwent deamination even though it was methylated (or hydroxymethylated) in the original sample. 
Accordingly, in some embodiments, the TET enzyme comprises a mutation that increases formation of 5-caC. Exemplary mutations are set forth above. “A mutation that increases formation of 5-caC” means that the TET enzyme having the mutation produces more 5-caC than a TET enzyme that lacks the mutation but is otherwise identical. 5-caC production can be measured as described, e.g., in Liu et al., Nat Chem Biol 13: 181-187 (2017) (see Online Methods section, TET reactions in vitro subsection, “driving” conditions). Any variants and / or mutants described in Liu et al. (2017) can be used in the disclosed methods as appropriate.

[0352] In some embodiments, the one or more TET enzymes comprise a TET2 enzyme comprising a T1372S mutation, such as TET2-CS-T1372S and TET2-CD-T1372S. A TET2 comprising a T1372S mutation is described in US Patent 10,961,525 and may be expressed and used as a fragment comprising TET2 residues 1129-1480 joined to TET2 residues 1844-1936 by a linker. Position 1372 of TET2 corresponds to position 258 of SEQ ID NO: 21 (wild type TET2 catalytic domain) of US Patent 10,961,525. Thus, the sequence of a T1372S TET2 catalytic domain may be obtained by changing the threonine at position 258 of SEQ ID NO: 21 of US Patent 10,961,525 to serine. TET2 comprising a T1372S mutation is also described in Liu et al., Nat Chem Biol. 2017 February; 13(2): 181-187. As demonstrated in Liu et al., TET2 comprising a T1372S mutation can more efficiently oxidize 5mC to produce 5-carboxylcytosine (5-caC) than other versions of TET2 such as TET2 lacking a T1372S mutation.

[0353] In some embodiments, the deaminase is thermally inactivated after contacting DNA with the deaminase. In some embodiments, the thermal inactivation comprises heating or cooling of the deaminase to a temperature at which the deaminase has reduced or inhibited activity relative to a deaminase that has not been subjected to heating or cooling. In some embodiments, the thermal inactivation completely inhibits the activity of the deaminase or reduces the activity of the deaminase by at least about 5%, about 10%, about 15%, about 20%, about 25%, about 50%, about 75%, about 90%, about 95%, about 98%, about 99%, or 100% relative to a deaminase that has not been subjected to heating or cooling.

E. Samples and Subjects

[0354] The disclosure relates to methods of analyzing DNA in a sample. In some embodiments, the sample is a proportionately methylated sample relative to an original sample when it is an aliquot of the original sample or otherwise has not undergone manipulations that selectively enrich, capture, or deplete DNA molecules based on methylation or lack thereof. Proportionately methylated samples include samples or aliquots that were purified or isolated in a manner that does not substantially discriminate on the basis of methylation, such as ion exchange chromatography, size fractionation, hydrophobic interaction chromatography, target capture that is independent of methylation status (e.g., target capture performed prior to base conversion or that does not discriminate based on base conversion), or other techniques that do not rely on specific binding to methylated DNA or specific binding to unmethylated DNA. Proportionately methylated samples may have undergone processing steps such as end repair and adapter ligation (including end repair and adapter ligation using nucleotides and / or adapters comprising methylation or other modifications). In contrast, subsamples resulting from partitioning on the basis of methylation, e.g., using MBD or an antibody specific for methylated cytosine, are not proportionately methylated samples relative to their original sample.

[0355] In some embodiments, an original sample is a sample (e.g., of blood, plasma, or serum) as originally obtained from a source, such as a subject.

[0356] In some cases, the DNA sample used in a method disclosed herein is obtained or has been obtained from a subject. In some embodiments, the DNA sample may comprise or consist of DNA from a biological sample obtained from a subject. The subject may be a human, a mammal, an animal, a primate, a rodent (including mice and rats), or other common laboratory, domestic, companion, service or agricultural animal, for example a rabbit, dog, cat, horse, cow, sheep, goat or pig. Preferably, the DNA sample is from a human. The subject may in some cases have or be suspected of having a cancer, tumor or neoplasm. In other cases, the subject may not have cancer or a detectable cancer symptom. The subject may have been treated with one or more cancer therapies, e.g., any one or more of chemotherapies, antibodies, vaccines or biologics. The subject may be in remission, e.g., from a tumor, cancer, or neoplasia (e.g., following treatment such as chemotherapy, surgical resection, radiation, or a combination thereof). The subject may or may not be diagnosed as being susceptible to cancer or any cancer-associated genetic mutations / disorders. In some embodiments, the sample is a DNA sample obtained from a tumor tissue biopsy. The cancer, tumor, or neoplasm may generally be of any type, for example a cancer, tumor, or neoplasm of the lung, colon, rectum (or colorectum), kidney, breast, prostate, or liver, or other type of cancer as described herein. In some embodiments, the sample is obtained from a subject in remission from a tumor, cancer, or neoplasia (e.g., following chemotherapy, surgical resection, radiation, or a combination thereof). In any of the foregoing embodiments, the precancer, cancer, tumor, or neoplasia or suspected precancer, cancer, tumor, or neoplasia may be of the bladder, head and neck, lung, colon, rectum, kidney, breast, prostate, skin, or liver. 
In some embodiments, the precancer, cancer, tumor, or neoplasia or suspected precancer, cancer, tumor, or neoplasia is of the lung. In some embodiments, the precancer, cancer, tumor, or neoplasia or suspected precancer, cancer, tumor, or neoplasia is of the colon or rectum. In some embodiments, the precancer, cancer, tumor, or neoplasia or suspected precancer, cancer, tumor, or neoplasia is of the breast. In some embodiments, the precancer, cancer, tumor, or neoplasia or suspected precancer, cancer, tumor, or neoplasia is of the prostate. In any of the foregoing embodiments, the subject may be a human subject. In some embodiments, the sample is obtained from a subject having a stage I cancer, stage II cancer, stage III cancer or stage IV cancer.

[0357] In some embodiments, the subject may have an infection, a transplant rejection, or other disease or disorder related to changes in the immune system. The subject may not have cancer or a detectable cancer symptom. The subject may have been treated with one or more cancer therapies, e.g., any one or more of chemotherapies, antibodies, vaccines or biologics. The subject may be in remission. The subject may or may not be diagnosed as being susceptible to cancer or any cancer-associated genetic mutations / disorders.

[0358] The biological sample can be any biological sample isolated from a subject. Biological samples can include body tissues, such as known or suspected solid tumors (such as carcinomas, adenocarcinomas, or sarcomas), whole blood, platelets, serum, plasma, stool, red blood cells, white blood cells or leucocytes, endothelial cells, tissue biopsies, cerebrospinal fluid, synovial fluid, lymphatic fluid, ascites fluid, interstitial or extracellular fluid, the fluid in spaces between cells, including gingival crevicular fluid, bone marrow, pleural effusions, saliva, mucus, sputum, semen, sweat, and urine. In some embodiments, biological samples are body fluids, particularly blood and fractions thereof (e.g., plasma and / or serum) or urine. A sample can be in the form originally isolated from a subject or can have been subjected to further processing to remove or add components, such as cells, or enrich for one component relative to another.

[0359] In some embodiments, a population of nucleic acids is obtained from a serum, plasma or blood sample from a subject suspected of having neoplasia, a tumor, precancer, or cancer or previously diagnosed with neoplasia, a tumor, precancer, or cancer. The population includes nucleic acids having varying levels of sequence variation, epigenetic variation, and / or post-replication or transcriptional modifications. Post-replication modifications include modifications of cytosine, particularly at the 5-position of the nucleobase, e.g., 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine and 5-carboxylcytosine.

[0360] A sample can be isolated or obtained from a subject and transported to a site of sample analysis. The sample may be preserved and shipped at a desirable temperature, e.g., room temperature, 4°C, -20°C, and / or -80°C. A sample can be isolated or obtained from a subject at the site of the sample analysis.

[0361] In a particular embodiment, the DNA sample comprises cell-free DNA. In another particular embodiment the DNA sample is a DNA sample from a formalin fixed paraffin embedded (FFPE) sample.

[0362] The subject can be a human, a mammal, an animal, a companion animal, a service animal, or a pet. The subject may have a cancer, precancer, infection, transplant rejection, or other disease or disorder related to changes in the immune system. The subject may not have cancer or a detectable cancer symptom. The subject may have been treated with one or more cancer therapies, e.g., any one or more of chemotherapies, antibodies, vaccines or biologics. The subject may be in remission. The subject may or may not be diagnosed as being susceptible to cancer or any cancer-associated genetic mutations / disorders.

[0363] In some embodiments, the sample comprises plasma. The volume of plasma used to obtain the DNA sample can depend on the desired read depth for sequenced regions. Exemplary volumes are 0.4-40 mL, 5-20 mL, and 10-20 mL. For example, the volume can be 0.5 mL, 1 mL, 2 mL, 3 mL, 4 mL, 5 mL, 6 mL, 7 mL, 8 mL, 9 mL, 10 mL, 20 mL, 30 mL, or 40 mL. A volume of sampled plasma may be 5 to 20 mL. In some embodiments, the sample volume is 3-5 mL of plasma, such as 4 mL of plasma, per 10 mL whole blood.

[0364] In some embodiments, the sample comprises whole blood. Exemplary volumes of sampled whole blood are 0.4-40 mL, 5-20 mL, 10-20 mL, 1-6 mL, 1-3 mL, and 3-5 mL. For example, the volume can be 0.5 mL, 1 mL, 2 mL, 3 mL, 4 mL, 5 mL, 6 mL, 7 mL, 8 mL, 9 mL, 10 mL, 20 mL, 30 mL, or 40 mL. A volume of sampled whole blood may be 5 to 20 mL. In some embodiments, the sample volume is 1-5 mL of whole blood, such as 2.5 mL of whole blood.

[0365] In some embodiments, the sample comprises buffy coat separated from whole blood. Exemplary volumes of sampled buffy coat are 0.1-20 mL, 1-10 mL, 1-5 mL, 0.2-0.6 mL, and 0.3-0.5 mL. For example, the volume can be 0.1 mL, 0.2 mL, 0.3 mL, 0.4 mL, 0.5 mL, 0.6 mL, 0.7 mL, 0.8 mL, 0.9 mL, 1 mL, 2 mL, 3 mL, 4 mL, 5 mL, 10 mL, or 20 mL. A volume of sampled buffy coat may be 1 to 10 mL. In some embodiments, the sample volume is 0.1-0.5 mL of buffy coat, such as 0.3 mL of buffy coat, per 10 mL whole blood.

[0366] In some embodiments, the sample comprises PBMCs separated from whole blood. Exemplary volumes of sampled PBMCs are 0.1-20 mL, 1-10 mL, 1-5 mL, 0.2-0.6 mL, and 0.3-0.5 mL. For example, the volume can be 0.1 mL, 0.2 mL, 0.3 mL, 0.4 mL, 0.5 mL, 0.6 mL, 0.7 mL, 0.8 mL, 0.9 mL, 1 mL, 2 mL, 3 mL, 4 mL, 5 mL, 10 mL, or 20 mL. A volume of sampled PBMCs may be 1 to 10 mL. In some embodiments, the sample volume is 0.1-0.5 mL of PBMCs, such as 0.3 mL of PBMCs, per 10 mL whole blood.

[0367] In some embodiments, the sample comprises leukocytes separated from subject blood using leukapheresis. Exemplary volumes of sampled leukocytes from leukapheresis are 0.1-20 mL, 1-10 mL, 1-5 mL, 0.2-0.6 mL, and 0.3-0.5 mL. For example, the volume can be 0.1 mL, 0.2 mL, 0.3 mL, 0.4 mL, 0.5 mL, 0.6 mL, 0.7 mL, 0.8 mL, 0.9 mL, 1 mL, 2 mL, 3 mL, 4 mL, 5 mL, 10 mL, or 20 mL. A volume of sampled leukocytes from leukapheresis may be 1 to 10 mL. In some embodiments, the sample volume is 0.1-0.6 mL of leukocytes from leukapheresis, such as 0.4 mL of leukocytes, per 10 mL whole blood.

[0368] A sample can comprise various amounts of nucleic acid that contain genome equivalents. For example, a sample of about 30 ng DNA can contain about 10,000 (10^4) haploid human genome equivalents and, in the case of cfDNA, about 200 billion (2×10^11) individual polynucleotide molecules. Similarly, a sample of about 100 ng of DNA can contain about 30,000 haploid human genome equivalents and, in the case of cfDNA, about 600 billion individual molecules.
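
The arithmetic behind these figures can be sketched as follows. This is a rough illustration only, not part of the disclosed methods: the constants (about 3.3 pg per haploid human genome, about 650 g/mol per double-stranded base pair, and a typical cfDNA fragment length of about 168 bp) and the function names are assumptions chosen for the example.

```python
AVOGADRO = 6.022e23          # molecules per mole
HAPLOID_GENOME_PG = 3.3      # assumed mass of one haploid human genome (pg)
BP_MASS_G_PER_MOL = 650.0    # assumed average molar mass of one base pair
MEAN_FRAGMENT_BP = 168       # assumed typical cfDNA fragment length (bp)


def haploid_genome_equivalents(mass_ng: float) -> float:
    """Estimate haploid genome equivalents in a DNA mass given in ng."""
    return mass_ng * 1000.0 / HAPLOID_GENOME_PG  # ng -> pg, then per genome


def cfdna_molecules(mass_ng: float, fragment_bp: int = MEAN_FRAGMENT_BP) -> float:
    """Estimate the number of cfDNA fragments in a DNA mass given in ng."""
    grams = mass_ng * 1e-9
    moles = grams / (fragment_bp * BP_MASS_G_PER_MOL)
    return moles * AVOGADRO


if __name__ == "__main__":
    for ng in (30, 100):
        print(ng, "ng:",
              round(haploid_genome_equivalents(ng)), "genome equivalents,",
              f"{cfdna_molecules(ng):.2e} molecules")
```

Under these assumptions, 30 ng corresponds to roughly 9,000 haploid genome equivalents and on the order of 10^11 fragments, which agrees with the rounded values stated above.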

[0369] A sample can comprise nucleic acids from different sources, e.g., nucleic acids from cells and cell-free nucleic acids of the same subject, and nucleic acids from cells and cell-free nucleic acids of different subjects. In some embodiments, the nucleic acid may be DNA. A sample can comprise nucleic acids (e.g., DNA) carrying mutations. For example, a sample can comprise DNA carrying germline mutations and / or somatic mutations. Germline mutations refer to mutations existing in germline DNA of a subject. Somatic mutations refer to mutations originating in somatic cells of a subject, e.g., cancer cells. A sample can comprise DNA carrying cancer-associated mutations (e.g., cancer-associated somatic mutations). A sample can comprise an epigenetic variant (i.e., a chemical or protein modification), wherein the epigenetic variant is associated with the presence of a genetic variant such as a cancer-associated mutation. In some embodiments, the sample comprises an epigenetic variant associated with the presence of a genetic variant, wherein the sample does not comprise the genetic variant.

[0370] The DNA sample may be or comprise cell-free nucleic acids or cfDNA. The cfDNA may be obtained from a test subject, for example as described above. For example, the sample for analysis may be plasma or serum containing cell-free nucleic acids. “Cell-free DNA,” “cfDNA molecules,” or “cfDNA”, for example, include DNA molecules that naturally occur in a subject in extracellular form (e.g., in blood, serum, plasma, or other bodily fluids such as lymph, cerebrospinal fluid, urine, or sputum). While the cfDNA originally existed in a cell or cells in a large complex biological organism, e.g., a mammal, it has undergone release from the cell(s) in vivo into a fluid found in the organism, and may be obtained by obtaining a sample of the fluid without the need to perform an in vitro cell lysis step. In other words, cell-free nucleic acids or cfDNA are nucleic acids or DNA not contained within or otherwise bound to a cell, or the nucleic acids or DNA remaining in a sample after removing intact cells. Cell-free nucleic acids include DNA, RNA, and hybrids thereof, including genomic DNA, mitochondrial DNA, siRNA, miRNA, circulating RNA (cRNA), tRNA, rRNA, small nucleolar RNA (snoRNA), Piwi-interacting RNA (piRNA), long non-coding RNA (long ncRNA), or fragments of any of these. Cell-free nucleic acids can be double-stranded, single-stranded, or a hybrid thereof. A cell-free nucleic acid can be released into bodily fluid through secretion or cell death processes, e.g., cellular necrosis and apoptosis. Some cell-free nucleic acids are released into bodily fluid from cancer cells, e.g., circulating tumor DNA (ctDNA). Others are released from healthy cells. In some embodiments, cfDNA is cell-free fetal DNA (cffDNA). In some embodiments, cell-free nucleic acids are produced by tumor cells. In some embodiments, cell-free nucleic acids are produced by a mixture of tumor cells and non-tumor cells.

[0371] Exemplary amounts of cell-free nucleic acids (e.g., cfDNA) in a sample before amplification range from about 1 fg to about 1 µg, e.g., 1 pg to 200 ng, 1 ng to 100 ng, 10 ng to 1000 ng. For example, the amount can be up to about 600 ng, up to about 500 ng, up to about 400 ng, up to about 300 ng, up to about 200 ng, up to about 100 ng, up to about 50 ng, or up to about 20 ng of cell-free nucleic acid molecules. The amount can be at least 1 fg, at least 10 fg, at least 100 fg, at least 1 pg, at least 10 pg, at least 100 pg, at least 1 ng, at least 10 ng, at least 100 ng, at least 150 ng, or at least 200 ng of cell-free nucleic acid molecules. The amount can be up to 1 femtogram (fg), 10 fg, 100 fg, 1 picogram (pg), 10 pg, 100 pg, 1 ng, 10 ng, 100 ng, 150 ng, or 200 ng of cell-free nucleic acid molecules. The method can comprise obtaining 1 femtogram (fg) to 200 ng of cell-free nucleic acid molecules from samples.

[0372] Cell-free nucleic acids have an exemplary size distribution of about 100-500 nucleotides, with molecules of about 110 to about 230 nucleotides representing about 90% of molecules, a mode of about 168 nucleotides, and a second minor peak in a range between 240 and 440 nucleotides.

[0373] Cell-free nucleic acids can be isolated from bodily fluids through a fractionation or partitioning step in which cell-free nucleic acids, as found in solution, are separated from intact cells and other non-soluble components of the bodily fluid. In some embodiments, a blood sample is fractionated prior to capturing at least an epigenetic target region set of DNA. Partitioning may include techniques such as centrifugation or filtration. Alternatively, cells in bodily fluids can be lysed and cell-free and cellular nucleic acids processed together. Generally, after addition of buffers and wash steps, nucleic acids can be precipitated with an alcohol. Further clean up steps may be used such as silica-based columns to remove contaminants or salts. Non-specific bulk carrier nucleic acids, DNA or protein for bisulfite sequencing, hybridization, and / or ligation, may be added throughout the reaction to optimize certain aspects of the procedure such as yield.

[0374] After such processing, samples can include various forms of nucleic acid including double stranded DNA, single stranded DNA and single stranded RNA. In some embodiments, single stranded DNA and RNA can be converted to double stranded forms so they are included in subsequent processing and analysis steps.

[0375] The methods disclosed herein are also particularly suited for the analysis of DNA from formalin-fixed paraffin-embedded (FFPE) tissue samples. While the formalin fixation process adequately preserves the ultrastructure of the tissues, it results in various types of damage to the DNA within the tissues, such as nicks in the DNA. As explained elsewhere herein, these nicks can lead to synthesis of regions of the DNA molecule in the end repair process. The methods disclosed herein allow for these regions to be identified and the sequence data to be interpreted accordingly.

[0376] Reference or control molecules can be added to or spiked into a sample as a control or normalization standard. For example, a certain amount of modified DNA from a species other than the species of the subject from which the sample was obtained or synthetic nucleic acids comprising certain modifications may be added to the sample. In some embodiments, the reference or control molecules are distinguishable from the molecules originally present in the sample. In some embodiments, the detected DNA sequences are normalized to the reference or control molecules.

F. Pooling of DNA from first and second subsamples or portions thereof

[0377] In some embodiments, the methods comprise preparing a pool comprising at least a portion of the DNA of the second subsample (also referred to as the hypomethylated partition) and at least a portion of the DNA of the first subsample (also referred to as the hypermethylated partition). Target regions, e.g., including epigenetic target regions and / or sequence-variable target regions, may be captured from the pool. The steps of capturing a target region set from at least a portion of a subsample described elsewhere herein encompass capture steps performed on a pool comprising DNA from the first and second subsamples. A step of amplifying DNA in the pool may be performed before capturing target regions from the pool. The capturing step may have any of the features described elsewhere herein.

[0378] The epigenetic target regions may show differences in methylation levels and / or fragmentation patterns depending on whether they originated from a tumor or from healthy cells, or what type of tissue they originated from, as discussed elsewhere herein. The sequence-variable target regions may show differences in sequence depending on whether they originated from a tumor or from healthy cells.

[0379] Analysis of epigenetic target regions from the hypomethylated partition may be less informative in some applications than analysis of sequence-variable target-regions from the hypermethylated and hypomethylated partitions and epigenetic target regions from the hypermethylated partition. As such, in methods where sequence-variable target-regions and epigenetic target regions are being captured, the latter may be captured to a lesser extent than one or more of the sequence-variable target-regions from the hypermethylated and hypomethylated partitions and epigenetic target regions from the hypermethylated partition. For example, sequence-variable target regions can be captured from the portion of the hypomethylated partition not pooled with the hypermethylated partition, and the pool can be prepared with some (e.g., a majority, substantially all, or all) of the DNA from the hypermethylated partition and none or some (e.g., a minority) of the DNA from the hypomethylated partition. Such approaches can reduce or eliminate sequencing of epigenetic target regions from the hypomethylated partition, thereby reducing the amount of sequencing data that suffices for further analysis.

[0380] In some embodiments, including a minority of the DNA of the hypomethylated partition in the pool facilitates quantification of one or more epigenetic features (e.g., methylation or other epigenetic feature(s) discussed in detail elsewhere herein), e.g., on a relative basis.

[0381] In some embodiments, the pool comprises a minority of the DNA of the hypomethylated partition, e.g., less than about 50% of the DNA of the hypomethylated partition, such as less than or equal to about 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 5%-25% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 10%-20% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 10% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 15% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 20% of the DNA of the hypomethylated partition.
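
The pool composition described above can be illustrated with a short calculation. This is a minimal sketch under stated assumptions, not a prescribed protocol: the function name `build_pool`, the use of DNA mass in ng as the unit, and the default fractions are hypothetical choices made for the example.

```python
def build_pool(hyper_ng: float, hypo_ng: float,
               hyper_fraction: float = 1.0,
               hypo_fraction: float = 0.15) -> dict:
    """Compute how much DNA from each partition goes into the pool.

    hyper_fraction: share of the hypermethylated partition pooled
                    (assumed here to be a majority, i.e. 0.5 to 1.0).
    hypo_fraction:  share of the hypomethylated partition pooled
                    (a minority, i.e. below 0.5).
    """
    if not 0.5 <= hyper_fraction <= 1.0:
        raise ValueError("hyper_fraction is expected to be a majority (0.5-1.0)")
    if not 0.0 <= hypo_fraction < 0.5:
        raise ValueError("hypo_fraction is expected to be a minority (< 0.5)")

    pooled_hyper = hyper_ng * hyper_fraction
    pooled_hypo = hypo_ng * hypo_fraction
    return {
        "pooled_hyper_ng": pooled_hyper,
        "pooled_hypo_ng": pooled_hypo,
        "pool_total_ng": pooled_hyper + pooled_hypo,
        # DNA left out of the pool, e.g. available for a separate capture
        "hypo_remaining_ng": hypo_ng - pooled_hypo,
    }


if __name__ == "__main__":
    print(build_pool(hyper_ng=10.0, hypo_ng=40.0))
```

The unpooled remainder of the hypomethylated partition is reported separately because, as noted above, sequence-variable target regions can be captured from that portion.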

[0382] In some embodiments, the pool comprises a portion of the hypermethylated partition, which may be at least about 50% of the DNA of the hypermethylated partition. For example, the pool may comprise at least about 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% of the DNA of the hypermethylated partition. In some embodiments, the pool comprises 50-55%, 55-60%, 60-65%, 65-70%, 70-75%, 75-80%, 80-85%, 85-90%, 90-95%, or 95-100% of the DNA of the hypermethylated partition. In some embodiments, the second pool comprises all or substantially all of the hypermethylated partition.

G. Sequencing

[0383] In some embodiments, the method comprises sequencing at least a portion of the DNA. In some embodiments, the method comprises sequencing at least a portion of the treated and converted sample. In some embodiments, the sequencing occurs after amplifying the treated and converted sample. In some embodiments, the sequencing occurs after capturing a first target region set comprising epigenetic target regions from the sample (e.g., the treated and converted sample). In some embodiments, the sequencing occurs after producing the treated and converted sample and after amplifying a treated and converted sample. In some embodiments, the sequencing occurs after producing the treated and converted sample and after capturing a first target region set comprising epigenetic target regions from the sample. In some embodiments, the sequencing occurs after amplifying a treated and converted sample and after capturing a first target region set comprising epigenetic target regions from the sample. In some embodiments, the sequencing occurs after producing the treated and converted sample, after amplifying a treated and converted sample, and after capturing a first target region set comprising epigenetic target regions from the sample.

[0384] In some embodiments, sequencing comprises sequencing the DNA in a manner that distinguishes the first nucleobase from the second nucleobase. In some embodiments, subsamples are pooled prior to the sequencing. In some embodiments, subsamples are produced using a partitioning step. In general, sample nucleic acids, including nucleic acids flanked by adapters, with or without prior amplification can be subject to sequencing. Sequencing methods include, for example, Sanger sequencing, high-throughput sequencing, pyrosequencing, sequencing-by-synthesis, long-read sequencing (also known as single-molecule sequencing or third generation sequencing), nanopore sequencing (a type of long-read sequencing), 5-letter sequencing or 6-letter sequencing, semiconductor sequencing, sequencing-by-ligation, sequencing-by-hybridization, Digital Gene Expression (Helicos), Next generation sequencing (NGS), Single Molecule Sequencing by Synthesis (SMSS) (Helicos), massively-parallel sequencing, Clonal Single Molecule Array (Solexa), shotgun sequencing, Ion Torrent, Oxford Nanopore, Roche Genia, Maxam-Gilbert sequencing, primer walking, and sequencing using PacBio, SOLiD, Ion Torrent, or Nanopore platforms. Sequencing reactions can be performed in a variety of sample processing units, which may include multiple lanes, multiple channels, multiple wells, or other means of processing multiple sample sets substantially simultaneously. A sample processing unit can also include multiple sample chambers to enable processing of multiple runs simultaneously.

[0385] In some embodiments, sequencing comprises detecting and / or distinguishing unmodified and modified nucleobases. For example, long-read sequencing (also referred to herein as third generation sequencing) methods include those that can generate longer sequencing reads, such as reads in excess of 10 kilobases, as compared to short-read sequencing methods, which generally produce reads of up to about 600 bases in length. Compared to short reads, long reads can improve de novo assembly, transcript isoform identification, and detection and / or mapping of structural variants. Furthermore, long-read sequencing of native DNA or RNA molecules reduces amplification bias and preserves base modifications, such as methylation status. Long-read sequencing technologies useful herein can include any suitable long-read sequencing methods, including, but not limited to, Pacific Biosciences (PacBio) single-molecule real-time (SMRT) sequencing, Oxford Nanopore Technologies (ONT) nanopore sequencing, and synthetic long-read sequencing approaches, such as linked reads, proximity ligation strategies, and optical mapping. Synthetic long-read approaches comprise assembly of short reads from the same DNA molecule to generate synthetic long reads, and may be used in conjunction with “true” long-read sequencing technologies, such as SMRT and nanopore sequencing methods.

[0386] Single-molecule real-time (SMRT) sequencing can facilitate direct detection of, e.g., 5-methylcytosine and 5-hydroxymethylcytosine as well as unmodified cytosine. (Weirather JL, et al., “Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis,” F1000Research, 6: 100, 2017). Whereas next-generation sequencing methods detect augmented signals from a clonal population of amplified DNA fragments, SMRT sequencing captures a single DNA molecule, maintaining base modification during sequencing. The error rate of raw PacBio SMRT sequencing-generated data is about 13-15%, as the signal-to-noise ratio from single DNA molecules is not high. To increase accuracy, this platform uses a circular DNA template by ligating hairpin adapters to both ends of target double-stranded DNA. As the polymerase repeatedly traverses and replicates the circular molecule, the DNA template is sequenced multiple times to generate a continuous long read (CLR). The CLR can be split into multiple reads (“subreads”) by removing adapter sequences, and multiple subreads generate circular consensus sequence (“CCS”) reads with higher accuracy. The average length of a CLR is >10 kb and up to 60 kb, with length depending on the polymerase lifetime. Thus, the length and accuracy of CCS reads depends on the fragment sizes. PacBio sequencing has been utilized for genome (e.g., de novo assembly, detection of structural variants and haplotyping) and transcriptome (e.g., gene isoform reconstruction and novel gene / isoform discovery) studies.
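
The idea that multiple subreads of the same template yield a more accurate consensus can be sketched as a per-position majority vote. This is a deliberately simplified illustration and not the vendor's consensus algorithm: it assumes the subreads have already been oriented and aligned to equal length with no insertions or deletions, and the function name `consensus` is hypothetical.

```python
from collections import Counter


def consensus(subreads: list[str]) -> str:
    """Per-position majority vote across pre-aligned, equal-length subreads.

    Positions where the two most frequent bases tie are reported as 'N'.
    """
    if not subreads:
        raise ValueError("at least one subread is required")
    length = len(subreads[0])
    if any(len(s) != length for s in subreads):
        raise ValueError("subreads must be pre-aligned to equal length")

    called = []
    for column in zip(*subreads):
        ranked = Counter(column).most_common()
        top_base, top_count = ranked[0]
        if len(ranked) > 1 and ranked[1][1] == top_count:
            called.append("N")  # ambiguous: no single majority base
        else:
            called.append(top_base)
    return "".join(called)


if __name__ == "__main__":
    # Three passes over the same template, each with one random error.
    print(consensus(["ACGTAC", "ACGTTC", "ACCTAC"]))
```

Because independent errors rarely hit the same position in most passes, the vote recovers the template sequence even when every individual subread contains an error.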

[0387] SMRT sequencing relies on sequencing-by-synthesis, where the sequence of a circular DNA template is determined from the succession of fluorescence pulses, each resulting from the addition of one labelled nucleotide by a polymerase fixed to the bottom of a well. Base modifications do not affect the base-called sequence, but they affect the kinetics of the polymerase. By considering the inter-pulse duration (IPD), base modifications can be inferred from the comparison of a modified template to an in silico model or an unmodified template. Such methods can therefore use the pulse width of a signal from sequencing bases, the interpulse duration (IPD) of bases, and the identity of the bases in order to detect a modification in a base or in a neighboring base. (See e.g., Weirather et al., F1000Research, 6: 100, 2017.) SMRT sequencing can thus be used to detect base modifications such as 5-caC, 4mC, 5mC, 5hmC, 6mA, and 8oxoG (Gouil & Keniry Essays in Biochemistry (2019) 63 639-648). Accordingly, in some embodiments, the sequencing comprises SMRT sequencing.
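
The comparison of IPDs between a sample template and an unmodified template can be sketched as a per-position ratio test. This is an illustrative simplification, not the method used by any particular instrument software: the function names, the input layout (one list of IPD observations per template position), and the threshold of 2.0 are assumptions made for the example.

```python
from statistics import mean


def ipd_ratios(sample_ipds: list[list[float]],
               control_ipds: list[list[float]]) -> list[float]:
    """Ratio of mean IPD (sample / unmodified control) at each position."""
    if len(sample_ipds) != len(control_ipds):
        raise ValueError("sample and control must cover the same positions")
    return [mean(s) / mean(c) for s, c in zip(sample_ipds, control_ipds)]


def flag_modified(ratios: list[float], threshold: float = 2.0) -> list[int]:
    """Return positions whose IPD ratio meets a (hypothetical) threshold."""
    return [i for i, r in enumerate(ratios) if r >= threshold]


if __name__ == "__main__":
    sample = [[0.2, 0.2], [0.9, 1.1], [0.3, 0.3]]    # seconds, per position
    control = [[0.2, 0.2], [0.25, 0.25], [0.3, 0.3]]
    ratios = ipd_ratios(sample, control)
    print(ratios, flag_modified(ratios))
```

A slowed polymerase at or near a modified base appears as an elevated ratio; in practice the control may be an in silico model rather than a measured unmodified template, as described above.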

[0388] Some sequencing reactions involve use of an enzyme to control passage of a nucleic acid through a nanopore, and in such cases reaction data can include both kinetics and other behavior of the enzyme and fluctuations in current through the nanopore. For example, ratchet proteins, helicases, or motor proteins can be used to push or pull a nucleic acid molecule through a hole in a biological or synthetic membrane. The kinetics of these proteins can vary depending on the sequence context of a nucleic acid on which they are acting. For example, they may slow down or pause at a modified base, and this behavior, captured as a part of the reaction data, is indicative of the presence of the modified base even where the modified base is not within the sensing portion of the nanopore.

[0389] One example of a nanopore-based single molecule sequencing system is that commercialized by Oxford Nanopore Technologies (ONT). (Weirather JL, et al., F1000Research, 6: 100, 2017). ONT directly sequences a native single-stranded DNA (ssDNA) molecule by measuring characteristic current changes as the bases are threaded through the nanopore by a molecular motor protein. ONT uses a hairpin library structure similar to the PacBio circular DNA template: the DNA template and its complement are bound by a hairpin adapter. Therefore, the DNA template passes through the nanopore, followed by a hairpin and finally the complement. The raw read can be split into two “1D” reads (“template” and “complement”) by removing the adapter. The consensus sequence of two “1D” reads is a “2D” read with a higher accuracy.

[0390] Nanopore sequencing can be used to detect base modifications including 5-caC, 5mC, 5hmC, 6mA, BrdU, FldU, IdU, and EdU (see e.g., Gouil & Keniry Essays in Biochemistry (2019) 63 639-648; Kutyavin, Biochemistry (2008), 47, 51, 13666-1367; Muller et al., Nature Methods (2019), volume 16, pages 429-436; Hennion et al., Genome Biology (2020), volume 21, Article number: 125). Accordingly, in some embodiments, the sequencing comprises nanopore sequencing.

[0391] 5-letter and 6-letter sequencing methods include whole genome sequencing methods capable of sequencing A, C, T, and G in addition to 5mC and 5hmC to provide a 5-letter (A, C, T, G, and either 5mC or 5hmC) or 6-letter (A, C, T, G, 5mC, and 5hmC) digital readout in a single workflow. The processing of the DNA sample is entirely enzymatic and avoids the DNA degradation and genome coverage biases of bisulfite treatment. In an exemplary 5-letter sequencing method developed by Cambridge Epigenetix, the sample DNA is first fragmented via sonication and then ligated to short, synthetic DNA hairpin adapters at both ends (Fullgrabe, et al. 2022, bioRxiv doi: https://doi.org/10.1101/2022.07.08.499285). The construct is then split to separate the sense and antisense sample strands. For each original sample strand a complementary copy strand is synthesized by DNA polymerase extension of the 3’-end to generate a hairpin construct with the original sample DNA strand connected to its complementary strand, lacking epigenetic modifications, via a synthetic loop. Sequencing adapters are then ligated to the end. Modified cytosines are enzymatically protected. The unprotected Cs are then deaminated to uracil, which is subsequently read as thymine. In any such embodiments, amplification methods may comprise uracil- and / or dihydrouracil-tolerant amplification methods, such as PCR using a uracil- and / or dihydrouracil-tolerant DNA polymerase (i.e., a DNA polymerase that can read and amplify templates comprising uracil and / or dihydrouracil bases). The deaminated constructs are no longer fully complementary and have substantially reduced duplex stability, thus the hairpins can be readily opened and amplified by PCR. The constructs can be sequenced in paired-end format whereby read 1 (P1 primed) is the original strand and read 2 (P2 primed) is the copy strand. The read data is pairwise aligned so read 1 is aligned to its complementary read 2. Cognate residues from both reads are computationally resolved to produce a single genetic or epigenetic letter. Pairings of cognate bases that differ from the permissible five are the result of incomplete fidelity at some stage(s) comprising sample preparation, amplification, or erroneous base calling during sequencing. As these errors occur independently to cognate bases on each strand, substitutions result in a non-permissible pair. Non-permissible pairs are masked (marked as N) within the resolved read and the read itself is retained, leading to minimal information loss and high accuracy at read-level. The resolved read is aligned to the reference genome. Genetic variants and methylation counts are produced by read-counting at base-level.
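
The resolution of cognate bases into a single letter can be sketched as a lookup over permissible pairs. This is an illustrative reconstruction, not the published implementation: the pair table below is derived only from the chemistry described above (protected modified C stays C, unprotected C on either strand reads as T, and the copy strand carries no modifications), read 2 is assumed to have already been aligned and expressed in original-strand coordinates, and the symbol `M` for a modified cytosine is a hypothetical label.

```python
# (read 1 base, read 2 base in original-strand coordinates) -> resolved letter
PERMISSIBLE_PAIRS = {
    ("A", "A"): "A",  # original A; copy-strand T is unaffected
    ("T", "T"): "T",  # original T; copy-strand A is unaffected
    ("G", "A"): "G",  # copy-strand C opposite G is deaminated and reads as T
    ("T", "C"): "C",  # unmodified C deaminated on original; copy G intact
    ("C", "C"): "M",  # modified C protected on original; copy G intact
}


def resolve(read1: str, read2_as_original: str) -> str:
    """Resolve aligned cognate bases; non-permissible pairs are masked as N."""
    if len(read1) != len(read2_as_original):
        raise ValueError("reads must be pairwise aligned to equal length")
    return "".join(
        PERMISSIBLE_PAIRS.get(pair, "N")
        for pair in zip(read1, read2_as_original)
    )


if __name__ == "__main__":
    # Original strand A, C, G, T, modified C as it would appear in each read.
    print(resolve("ATGTC", "ACATC"))
    # A single-strand error produces a non-permissible pair and is masked.
    print(resolve("ATGGC", "ACATC"))
```

Masking rather than discarding keeps the rest of the read usable, which matches the retention of reads with masked positions described above.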

[0392] 5hmC has been shown to have value as a marker of biological states and disease which includes early cancer detection from cell-free DNA. In adapting 5-letter to 6-letter sequencing, 5mC is disambiguated from 5hmC without compromising genetic base calling within the same sample fragment. The first three steps of the workflow are identical to 5-letter sequencing described above, to generate the adapter ligated sample fragment with the synthetic copy strand. Methylation at 5mC is enzymatically copied across the CpG unit to the C on the copy strand, whilst 5hmC is enzymatically protected from such a copy. Thus, unmodified C, 5mC and 5hmC in each of the original CpG units are distinguished by unique 2-base combinations. The unmodified cytosines are then deaminated to uracil, which is subsequently read as thymine. The DNA is subjected to PCR amplification and sequencing as described earlier. The reads are pairwise aligned and resolved using a 2-base code. Each of unmodified C, 5mC, and 5hmC can be resolved as the three CpG units are distinct sequencing environments of the 2-base code.

[0393] In some embodiments, sequence coverage of the genome may be, for example, less than 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 99%, 99.9% or 100%. In some embodiments, the sequence reactions may provide for sequence coverage of at least 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, or 80% of the genome. Sequence coverage can be performed on at least 5, 10, 20, 70, 100, 200 or 500 different genes, or up to, for example, 5000, 2500, 1000, 500 or 100 different genes.

[0394] Simultaneous sequencing reactions may be performed using multiplex sequencing. In some cases, cell-free nucleic acids may be sequenced with at least, for example, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 50000, 100,000 sequencing reactions. In other embodiments, cell-free nucleic acids may be sequenced with less than, for example, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 50000, 100,000 sequencing reactions. Sequencing reactions may be performed sequentially or simultaneously. Subsequent data analysis may be performed on all or part of the sequencing reactions. In some cases, data analysis may be performed on at least, for example, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 50000, 100,000 sequencing reactions. In other cases, data analysis may be performed on less than, for example, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 50000, 100,000 sequencing reactions. An exemplary read depth is 1000-50000, 1000-10000, or 1000-20000 reads per locus (base).

[0395] In some embodiments, sequencing DNA that was amplified using RCA (e.g., as described elsewhere herein) provides sequence reads comprising multiple copies of the sequence of an original sample molecule or converted molecule and the copies are used to determine a consensus sequence of the original sample molecule or converted molecule.

1. Differential depth of sequencing

[0396] In some embodiments, nucleic acids corresponding to the sequence-variable target region set are sequenced to a greater depth of sequencing than nucleic acids corresponding to the epigenetic target region set. In some embodiments, nucleic acids corresponding to the hydroxymethylation-variable target region set are sequenced to a greater depth of sequencing than nucleic acids corresponding to at least one other target region set. For example, the depth of sequencing for nucleic acids corresponding to the sequence-variable and / or hydroxymethylation-variable target region sets may be at least 1.25-, 1.5-, 1.75-, 2-, 2.25-, 2.5-, 2.75-, 3-, 3.5-, 4-, 4.5-, 5-, 6-, 7-, 8-, 9-, 10-, 11-, 12-, 13-, 14-, or 15-fold greater, or 1.25- to 1.5-, 1.5- to 1.75-, 1.75- to 2-, 2- to 2.25-, 2.25- to 2.5-, 2.5- to 2.75-, 2.75- to 3-, 3- to 3.5-, 3.5- to 4-, 4- to 4.5-, 4.5- to 5-, 5- to 5.5-, 5.5- to 6-, 6- to 7-, 7- to 8-, 8- to 9-, 9- to 10-, 10- to 11-, 11- to 12-, 13- to 14-, 14- to 15-fold, or 15- to 100-fold greater, than the depth of sequencing for nucleic acids corresponding to the epigenetic target region set or to at least one other target region set. In some embodiments, said depth of sequencing is at least 2-fold greater. In some embodiments, said depth of sequencing is at least 5-fold greater. In some embodiments, said depth of sequencing is at least 10-fold greater. In some embodiments, said depth of sequencing is 4- to 10-fold greater. In some embodiments, said depth of sequencing is 4- to 100-fold greater. Each of these embodiments refers to the extent to which nucleic acids corresponding to the sequence-variable target region set are sequenced to a greater depth of sequencing than nucleic acids corresponding to the epigenetic target region set.
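
The relationship between a fixed sequencing budget and a chosen fold difference in depth can be illustrated with simple arithmetic. This is a hypothetical planning sketch, not a described embodiment: it assumes all sequenced bases fall on target, that depth is uniform within each target region set, and that the panel footprints and function name are invented for the example.

```python
def allocate_depth(total_on_target_bases: float,
                   seq_variable_bp: int,
                   epigenetic_bp: int,
                   fold: float) -> tuple[float, float]:
    """Split a base budget so sequence-variable depth = fold x epigenetic depth.

    Returns (sequence_variable_depth, epigenetic_depth) in reads per base.
    """
    if fold < 1:
        raise ValueError("fold is expected to be at least 1")
    # budget = epi_depth * epigenetic_bp + (fold * epi_depth) * seq_variable_bp
    epi_depth = total_on_target_bases / (epigenetic_bp + fold * seq_variable_bp)
    return fold * epi_depth, epi_depth


if __name__ == "__main__":
    seq_depth, epi_depth = allocate_depth(
        total_on_target_bases=3_000_000,
        seq_variable_bp=100,   # hypothetical footprint
        epigenetic_bp=500,     # hypothetical footprint
        fold=5,
    )
    print(seq_depth, epi_depth)
```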

[0397] In some embodiments, the captured cfDNA corresponding to the sequence-variable target region set and the captured cfDNA corresponding to the epigenetic target region set are sequenced concurrently, e.g., in the same sequencing cell (such as the flow cell of an Illumina sequencer) and / or in the same composition, which may be a pooled composition resulting from recombining separately captured sets or a composition obtained by capturing the cfDNA corresponding to the sequence-variable target region set and the captured cfDNA corresponding to the epigenetic target region set in the same vessel.

[0398] In some embodiments, the captured cfDNA corresponding to the hydroxymethylation variable target region set and the captured cfDNA corresponding to the at least one other target region set are sequenced concurrently, e.g., in the same sequencing cell (such as the flow cell of an Illumina sequencer) and / or in the same composition, which may be a pooled composition resulting from recombining separately captured sets or a composition obtained by capturing the cfDNA corresponding to the hydroxymethylation variable target region set and the captured cfDNA corresponding to the at least one other target region set in the same vessel.

2. Sequencing Methods with Partitioning

[0399] As a variation on grouping sequencing reads of the same original molecule by molecular barcodes, a sample can be partitioned into aliquots as described in PCT / US2025 / 035226, incorporated by reference herein. Partitioning can be used either for individual samples or pooled samples, in which nucleic acids from different samples are distinguished by sample indexes. Partitioning preferably occurs before any amplification of original sample nucleic acid molecules so that amplicons of the same original molecule are not partitioned from each other. Partitioning reduces the number of instances of nucleic acid molecules having the same start and stop points in an individual aliquot relative to the sample before partitioning. Preferably the number of instances of nucleic acid molecules having the same start and stop points is reduced such that at least 75%, 80%, 90%, 95% or 99% of nucleic acid molecules in each aliquot have unique start and stop sequences.

[0400] The number of partitions depends on the characteristics of a population of nucleic acid molecules to be partitioned. These characteristics include the mean, median and mode of nucleic acid molecules having the same start and stop points, the maximum number of instances of nucleic acid molecules having the same start and stop points, and the overall distribution of instances of nucleic acid molecules having the same start and stop points.

[0401] For it to be statistically probable that an aliquot contains no instances of multiple nucleic acid molecules with the same start and stop points, the number of partitions should be equal to or greater than (e.g., at least 1x, 2x, 5x or 10x) the maximum number of instances of the same start and stop points in the sample before partition. Eight or sixteen partitions can sometimes be suitable.
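The sizing rule in the preceding paragraph can be illustrated with a minimal Python sketch. It assumes that fragments are already described by their mapped start and stop coordinates; the function names, the `factor` parameter, and the example coordinates are hypothetical and are not part of the disclosure.

```python
from collections import Counter

def max_start_stop_multiplicity(fragments):
    """Return the largest number of molecules sharing one (chrom, start, stop)."""
    counts = Counter(fragments)
    return max(counts.values()) if counts else 0

def partitions_needed(fragments, factor=1):
    """Number of partitions = factor x the maximum start/stop multiplicity.

    `factor` stands for the 1x, 2x, 5x or 10x multiplier (assumed parameter).
    """
    return max(1, factor * max_start_stop_multiplicity(fragments))

# Hypothetical fragments given as (chromosome, start, stop) tuples.
fragments = [
    ("chr1", 100, 266), ("chr1", 100, 266), ("chr1", 100, 266),
    ("chr1", 120, 290), ("chr2", 500, 667), ("chr2", 500, 667),
]
print(partitions_needed(fragments))            # multiplier of 1x
print(partitions_needed(fragments, factor=5))  # multiplier of 5x
```

In this toy input the most frequent start and stop pair occurs three times, so the 1x rule suggests three partitions and the 5x rule fifteen. In practice the multiplicity would have to be estimated from the characteristics of the population rather than counted directly.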

[0402] With or without additional processing steps in separated partitions, the partitioned nucleic acid molecules can be labelled with partition indexes, such that nucleic acid molecules in the same aliquot receive the same partition index and nucleic acid molecules in at least some, and sometimes all, of the different aliquots receive different partition indexes. Thus, linkage of sample molecules to partition indexes does not require random assortment of the partition indexes to the sample molecules. Partition indexes can be linked to sample molecules as primer components or by ligation, e.g., as a component of a further adapter. Preferably a partition index is included in one or both members of a pair of primers suitable for amplification of nucleic acid molecules in an aliquot. For example, such a primer pair can have 3’ regions complementary to adapter sequences flanking sample nucleic acid molecules, with one or both of the primers having a 5’ tail region including a partition index. If partition indexes are included in both members of a primer pair, the partition indexes can be the same or different from each other. After hybridization of such primers to adapter sequences, an amplification can be conducted, thereby covalently attaching partition indexes to sample nucleic acids.

[0403] An index is a short nucleic acid (e.g., less than 500, 100, 50, 20, 15, 10 or 5 nucleotides long), used to label nucleic acid molecules, for example to distinguish nucleic acids from different samples (a sample index), or nucleic acid molecules in different aliquots of the sample (partition indexes). The particular code stored by an index can be referred to as a designation of an index. Indexes are typically provided as sets of multiple different individual indexes for distinguishing samples or aliquots of a sample. That is, different samples receive different sample indexes from a set of sample indexes, and different aliquots receive different partition indexes.

[0404] In general, the distinction between a set of sample indexes and a set of partition indexes lies in the stages at which they are added, the number of different indexes in the set, how the indexes are linked to sample nucleic acids, and the molecules they are used to distinguish, rather than in the indexes themselves. In principle, a set of sample indexes could be used as a set of partition indexes and vice versa. Preferably the code designations of a set of sample and partition indexes are mutually exclusive with one another.

[0405] After incorporation of partition indexes, further processing steps can be conducted on the aliquots separately or aliquots differentially labelled with partition indexes can be pooled and further processing steps performed on the pooled aliquots. Alternatively, the methods can be performed without use of partition indexes, in which case, all further processing steps are performed on separate aliquots so that it is known which sequencing reads originate from which aliquots. The methods can also be performed with some aliquots pooled and some kept separate from one another. The methods can also be performed with aliquots grouped in subpools, in which the aliquots within a subpool have different partition indexes from one another but aliquots in different subpools can have the same partition indexes as any of the other subpools. The different subpools are then kept separate from one another in subsequent processing whereas the aliquots within a subpool are processed together. Sequencing reads can be traced back to the aliquot of origin based on a combination of the partition index present in a sequencing read and knowledge of the subpool from which it originated.

[0406] Further processing steps can include further amplification, affinity-enrichment for DNA molecules from selected genomic regions, sequencing and analysis of sequence reads. When partition indexes are used, sequencing is preferably performed after pooling of aliquots into a single vessel. Thus, nucleic acid molecules from the previously separate aliquots and different samples are sequenced together. When partition indexes are not used, sequencing is preferably performed keeping nucleic acid molecules from the different aliquots separate.

[0407] Sequencing reads from a sample are grouped to their molecule of origin by aliquot of origin determined by partition index or otherwise as described above, and a measure of sequence identity or similarity between sequencing reads. This measure can be start and stop points, which can be determined, for example, after alignment of sequencing reads with a reference sequence, length of sequencing reads, or minimum sequence similarity between reads (e.g., at least 95 or 99% identity after maximal alignment). If samples are pooled, sequence reads can be traced to a sample of origin from a sample index in the sequencing read. Lane information, when determined, can also be used in grouping sequencing reads. Grouping of sequencing reads by molecule of origin permits distinction of genuine genetic or epigenetic variation from amplification and sequencing errors as further described below.
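As an illustration of the grouping described above, the following Python sketch groups reads into families keyed by sample index, partition index, and aligned start and stop points. It is a simplified sketch under stated assumptions: reads are taken to be already aligned and demultiplexed into the fields shown, start and stop points are used as the only similarity measure, and the `Read` record and index labels are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple

class Read(NamedTuple):
    sample_index: str      # identifies the sample of origin
    partition_index: str   # identifies the aliquot of origin
    start: int             # aligned start point
    stop: int              # aligned stop point
    sequence: str

def group_by_molecule(reads):
    """Group reads that share sample, aliquot, and start/stop points."""
    families = defaultdict(list)
    for r in reads:
        key = (r.sample_index, r.partition_index, r.start, r.stop)
        families[key].append(r)
    return dict(families)

reads = [
    Read("S1", "P1", 100, 266, "ACGT"),
    Read("S1", "P1", 100, 266, "ACGT"),  # same molecule as the read above
    Read("S1", "P2", 100, 266, "ACGA"),  # same coordinates, different aliquot
    Read("S2", "P1", 100, 266, "ACGT"),  # same coordinates, different sample
]
families = group_by_molecule(reads)
print(len(families))  # number of inferred original molecules
```

Here the two reads with identical sample index, partition index, and coordinates are assigned to one family, while reads with the same coordinates from a different aliquot or sample are treated as distinct molecules.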

[0408] Methylation analysis can involve methylation-based separation of nucleic acid molecules. In some embodiments, methylation-based separation of nucleic acid molecules is performed by contacting the nucleic acid molecules with an agent that recognizes methylated DNA, such as 5-methylcytosine. In particular embodiments, the agent is a methyl binding reagent. In particular embodiments, the methyl binding reagent is a methyl binding domain (MBD) protein or an antibody. In some embodiments, the methyl binding reagent specifically recognizes 5-methylcytosine. For example, methylated fragments in a DNA sample can be separated via methylated DNA immunoprecipitation (MeDIP), or methylated fragments can be separated from unmethylated fragments using methyl binding domain proteins (e.g., MethylMiner™ Methylated DNA Enrichment Kit (ThermoFisher Scientific)).

[0409] One application of partition methods is analysis of methylation state of nucleic acids. Methylation analysis can comprise subjecting parent nucleic acids or amplification products thereof to a procedure that affects a first nucleobase in the nucleic acid differently from a second nucleobase, for example wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity. In some embodiments, the procedure that affects a first nucleobase of the nucleic acid differently from a second nucleobase of the nucleic acid is a methylation-sensitive conversion. In particular embodiments, the methylation-sensitive conversion is bisulfite conversion, oxidative bisulfite (Ox-BS) conversion, Tet-assisted bisulfite (TAB) conversion, APOBEC-coupled epigenetic (ACE) conversion, enzymatic methyl-seq (EM-seq) conversion, single-enzyme 5-methylcytosine sequencing (SEM-seq) conversion, or direct methylation sequencing (DM-seq).

[0410] Comparison of sequencing reads from treated and control groups indicates which cytosines were subject to modification. Splitting into groups for analysis of DNA modification is preferably performed after partitioning of samples or combined samples into aliquots so members of the same pairs of duplex strands are present in the same aliquot. Conversion also preferably precedes amplification. Conversion can occur before or after enrichment. If conversion occurs before enrichment, probes must be modified to hybridize with modified bases (e.g., U / T in place of C). Thus, a preferred order of steps is to attach sample indexes to different samples, pool the different samples, partition the pooled samples, convert portions of the partitioned samples, amplify, enrich and sequence.
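The comparison of treated and control reads can be sketched in Python as follows. The sketch assumes a bisulfite-type conversion in which an unmodified cytosine is read as T while a protected cytosine (e.g., 5-methylcytosine) is still read as C; other conversion procedures listed herein behave differently. It also assumes the two reads are from the same strand and already aligned to equal length. The function name and example sequences are hypothetical.

```python
def classify_cytosines(control: str, treated: str):
    """Compare an unconverted (control) read with a converted (treated) read.

    Returns (converted, retained): positions where a control C reads as T
    after conversion, and positions where it still reads as C.
    """
    if len(control) != len(treated):
        raise ValueError("reads must be aligned to the same length")
    converted, retained = [], []
    for pos, (c, t) in enumerate(zip(control, treated)):
        if c != "C":
            continue  # only cytosines in the control read are informative
        if t == "T":
            converted.append(pos)
        elif t == "C":
            retained.append(pos)
    return converted, retained

control = "ACGTCCGA"
treated = "ATGTCTGA"
converted, retained = classify_cytosines(control, treated)
print(converted, retained)
```

Under the stated assumption, the retained positions are candidates for modified cytosines and the converted positions are candidates for unmodified cytosines. A real analysis would also need to handle sequencing errors and strand orientation, which this sketch ignores.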

[0411] Methylation analysis can alternatively involve methylation-based separation of nucleic acid molecules. In some embodiments, methylation-based separation of nucleic acid molecules is performed by contacting the nucleic acid molecules with an agent that recognizes methylated DNA, such as 5-methylcytosine. In particular embodiments, the agent is a methyl binding reagent. In particular embodiments, the methyl binding reagent is a methyl binding domain (MBD) protein (e.g., see WO2018119452) or an antibody. In some embodiments, the methyl binding reagent specifically recognizes 5-methylcytosine. For example, methylated fragments in a DNA sample can be separated via methylated DNA immunoprecipitation (MeDIP), or methylated fragments can be separated from unmethylated fragments using methyl binding domain proteins (e.g., MethylMiner™ Methylated DNA Enrichment Kit (ThermoFisher Scientific)). These types of methods separate DNA fragments having a high methyl C content from those with a low methyl C content before sequencing.

[0412] In one format, MBD separation is performed on individual samples, resulting in two portions for each sample, one having high methyl C content, the other lower methyl C content. The portions are then labelled with sample indexes. The portions are then pooled, high methyl content portions being pooled together, and low methyl content portions being pooled together. The two pools are then partitioned. Amplification and enrichment are performed in the separate partitions followed by attachment of partition indexes. The partitions are then combined for sequencing. In another format, after ligation of sample indexes all portions are combined in the same pool instead of splitting into high and low methyl content pools. In another format, sample indexes are attached to samples before MBD separation. Thus, high and low methyl portions after MBD separation have the same sample index and are kept separate by pooling into two pools, one with high methyl content, the other with low methyl content. The two pools are separately partitioned. The partitions are subject to amplification and enrichment followed by incorporation of partition indexes. The partitions are then combined for sequencing.

[0413] In some embodiments, sequencing of different aliquots is performed in different flow cells or different regions or lanes of the same flow cell. Different aliquots can be tracked using aliquot-specific partition indices (“Variation #1”) or tracked using partition indices and separate sequencing (“Variation #2”). In both variations, 96 samples, for example, are each ligated to a different sample index, and subsequently mixed and aliquoted into 96 wells. The particular numbers of samples and partitions are provided as an example. In Variation #1, each well receives a different partition index via PCR with labelled primers (i.e., the partition indices are aliquot-specific), and aliquots are subsequently pooled into a single pool prior to sequencing. The deconvolution of sequencing reads to original molecules is performed using the partition index, start / stop positions, and (for sample demultiplexing) the sample index. In Variation #2, each column of wells receives the same partition index whereas partition indices vary across each row, such that partition indices are aliquot-specific only with respect to a subset of the aliquots (and not all aliquots). In this variation, each row of aliquots is pooled (i.e., the pooling is amongst aliquots differentially labelled with partition indices), and each subset pool is sequenced separately. For example, each of the eight subset pools can be loaded onto a separate lane of a flow cell comprising eight lanes (or loaded on different flow cells or different sequencing instruments). The deconvolution of sequencing reads to original molecules is performed using the partition index, the separate sequencing, start / stop positions, and (for sample demultiplexing) the sample index.
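The Variation #2 layout can be sketched in Python as a lookup from lane and partition index to the well of origin. The sketch assumes a 96-well plate arranged as 8 rows (A to H) by 12 columns, one lane per pooled row, and one partition index per column; the index labels (`P01` to `P12`), lane numbering, and well names are hypothetical conventions chosen for illustration.

```python
import string

ROWS = string.ascii_uppercase[:8]   # rows A..H: one subset pool (lane) per row
COLUMNS = range(1, 13)              # columns 1..12: one partition index per column

def build_layout():
    """Map (lane, partition_index) -> well of origin for a 96-well plate."""
    lookup = {}
    for lane, row in enumerate(ROWS, start=1):
        for col in COLUMNS:
            partition_index = f"P{col:02d}"  # same index for every well in a column
            lookup[(lane, partition_index)] = f"{row}{col}"
    return lookup

layout = build_layout()
print(len(layout))         # number of distinct (lane, index) pairs
print(layout[(3, "P07")])  # well for a read from lane 3 carrying index P07
```

Twelve partition indexes are reused across the eight pools, yet each combination of lane and partition index identifies a single aliquot. This is why the separate sequencing of each subset pool is needed for deconvolution in this variation.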

[0414] In some embodiments, the methods do not necessarily involve an initial step of sample mixing before partitioning. Mixing or pooling nucleic acids from different samples after initial processing steps advantageously allows different samples to be subjected to different processing steps (e.g. different enrichment reactions). For example, in one embodiment, each of 96 samples is partitioned into eight aliquots, i.e. one column of wells per sample. The particular numbers of samples and partitions are provided as an example. Partition indices are introduced via PCR, wherein four different partition indices are used, such that two aliquots of each sample receive the same partition index. Aliquots of the same sample that have been differentially labelled with partition indices are then pooled such that two subset pools are generated per sample, which in turn means that two enrichment reactions are performed per sample (the enrichment reactions are performed on the subset pool). The two different subset pools deriving from the same sample are sequenced separately (e.g. in different lanes), and subsequent deconvolution of sequencing reads to original molecules is performed using the partition index, the separate sequencing, and start / stop positions. The partition indices may not be sample-specific (the partition indices are the same across rows), so the method can use tagging with sample indices before sample multiplexing. Alternatively, partition indices can be sample-specific, e.g. each column of wells can receive a different set of four partition indices; in such a case the ligation of separate indices for sample demultiplexing is not required. Demultiplexing by sample of origin is based on the sample index or the sample-specific partition index.

H. Analysis

[0415] The present disclosure provides methods of analyzing DNA or nucleic acids in a sample. In some embodiments, a method described herein comprises identifying the presence of DNA produced by a tumor (or neoplastic cells, or cancer cells). Methods of analyzing DNA or nucleic acids from a sample herein can comprise contacting the sample with a methylation-discriminating nuclease, thereby degrading methylated or unmethylated DNA to produce a treated sample, wherein the sample is an original sample or is a proportionately methylated sample relative to an original sample, and subjecting the treated sample to a procedure that affects a first nucleobase differently from a second nucleobase, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity, and wherein the procedure alters the base pairing specificity of the first or second nucleobase to produce a treated and converted sample. Methods of analyzing DNA or nucleic acids from a sample herein can comprise subjecting the sample to a procedure that affects a first nucleobase differently from a second nucleobase, wherein the sample is an original sample or is a proportionately methylated sample relative to an original sample, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity, and wherein the procedure alters the base pairing specificity of the first or second nucleobase to produce a converted sample, and contacting the converted sample with a methylation-discriminating nuclease, thereby degrading methylated or unmethylated DNA to produce a treated and converted sample.

[0416] The present methods can be used to diagnose the presence of conditions, particularly cancer or precancer, in a subject, to characterize conditions (e.g., staging cancer or determining heterogeneity of a cancer), to monitor response to treatment of a condition, or to effect prognosis of the risk of developing a condition or of the subsequent course of a condition. The present disclosure can also be useful in determining the efficacy of a particular treatment option. Successful treatment options may decrease the amount of copy number variation or rare mutations detected in a subject’s blood, as there will be fewer cancer cells to shed DNA. In other examples, this may not occur. In another example, certain treatment options may be correlated with genetic profiles of cancers over time. This correlation may be useful in selecting a therapy.

[0417] Analyzing DNA may comprise detecting or quantifying DNA of interest. Analyzing DNA can comprise detecting genetic variants and / or epigenetic features (e.g., DNA methylation and / or DNA fragmentation). In some embodiments, the DNA of interest is one or more differentially methylated regions of the DNA. In some embodiments, the detecting or quantifying the DNA of interest comprises quantifying and / or detecting a level of methylation at one or more differentially methylated regions of the DNA. In some embodiments, quantifying and / or detecting the level of methylation at one or more differentially methylated regions of the DNA comprises sequencing at least a portion of the amplified DNA or quantitative PCR (qPCR). In some embodiments, the DNA of interest is a differentially methylated region. In some embodiments, the detecting or quantifying the DNA of interest comprises quantifying and / or detecting a level of a differentially methylated region of the DNA. In some embodiments, quantifying and / or detecting the level of a differentially methylated region of the DNA comprises quantitative PCR (qPCR).

[0418] Additionally, if a cancer is observed to be in remission after treatment, the present methods can be used to monitor residual disease or recurrence of disease.

[0419] The types and number of cancers that may be detected may include blood cancers, brain cancers, lung cancers, skin cancers, nose cancers, throat cancers, liver cancers, bone cancers, lymphomas, pancreatic cancers, bowel cancers, rectal cancers, colon cancers, prostate cancers, thyroid cancers, bladder cancers, head and neck cancers, kidney cancers, mouth cancers, stomach cancers, solid state tumors, heterogeneous tumors, homogenous tumors and the like. Type and / or stage of cancer can be detected from genetic variations including mutations, rare mutations, indels, copy number variations, transversions, translocations, recombination, inversion, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, and abnormal changes in nucleic acid 5-methylcytosine.

[0420] In some embodiments, a method described herein comprises identifying the presence of nucleic acids, such as DNA, produced by a tumor (or neoplastic cells, or cancer cells) or by precancer cells.

[0421] Genetic data can be used for characterizing a specific form of cancer. Cancers are often heterogeneous in both composition and staging. Genetic profile data may allow characterization of specific sub-types of cancer that may be useful in the diagnosis or treatment of that specific sub-type. This information may also provide a subject or practitioner clues regarding the prognosis of a specific type of cancer and allow either a subject or practitioner to adapt treatment options in accord with the progress of the disease. Some cancers progress, becoming more aggressive and genetically unstable. Other cancers may remain benign, inactive or dormant. The system and methods of this disclosure may be useful in determining disease progression.

[0422] The present methods are useful in determining the efficacy of a particular treatment option. The present methods can also be used for detecting epigenetic variations in conditions other than cancer. Further, the methods of the disclosure may be used to characterize the heterogeneity of an abnormal condition in a subject, the method comprising generating a genetic profile of extracellular polynucleotides in the subject, wherein the genetic profile comprises a plurality of data resulting from epigenetic information (such as methylation profiling), and optionally copy number variation and rare mutation analyses. In some cases, including but not limited to cancer, a disease may be heterogeneous. Disease cells may not be identical. In the example of cancer, some tumors are known to comprise different types of tumor cells, some cells in different stages of the cancer. In other examples, heterogeneity may comprise multiple foci of disease. Again, in the example of cancer, there may be multiple tumor foci, perhaps where one or more foci are the result of metastases that have spread from a primary site.

[0423] The present methods can thus be used to generate a profile, fingerprint or set of data that is a summation of epigenetic, and optionally genetic, information derived from different cells in a heterogeneous disease. This set of data may comprise epigenetic information, copy number variation, and / or rare mutation analyses alone or in combination.

[0424] The present disclosure provides methods of analyzing DNA. In some embodiments, the disclosed methods comprise analyzing DNA (such as DNA from a subject) to identify at least one cell type, cell cluster type, tissue type, and / or cancer type from which one or more type-specific epigenetic target regions and / or type-specific sequence-variable target regions originated. In some embodiments, methods comprise determining the level of one or more type-specific epigenetic target regions and / or type-specific sequence-variable target regions that originated from the at least one cell type, cell cluster type, tissue type, and / or cancer type.

[0425] In some embodiments, detecting the presence, levels, or absence of DNA sequences and / or modifications facilitates disease diagnosis or identification of appropriate treatments. In some embodiments, the presence of or a change in the levels of one or more sequences and / or modifications is indicative of the presence or absence of a disease or disorder in a subject, such as cancer or precancer, or other disorder that causes changes in nucleic acids relative to a healthy subject.

[0426] Information and data generated by the methods disclosed herein can also be used for characterizing a specific form of cancer. The methods disclosed herein may allow characterization of specific sub-types of cancer that may be important in the diagnosis or treatment of that specific sub-type. This information may also provide a subject or practitioner clues regarding the prognosis of a specific type of cancer and allow either a subject or practitioner to adapt treatment options in accord with the progress of the disease. Some cancers can progress to become more aggressive and genetically unstable. Other cancers may remain benign, inactive or dormant. The system and methods of this disclosure may be useful in determining disease progression.

[0427] Further, the methods of the disclosure may be used to characterize the heterogeneity of a condition in a subject. Such methods can include, e.g., generating an aggregate profile of extracellular nucleic acids derived from the subject, wherein the aggregate profile comprises a plurality of data resulting from various nucleic acid analyses. In some embodiments, the aggregate profile comprises epigenetic and mutation analyses. In some embodiments, an aggregate profile comprises a summation of information derived from different cells in a heterogeneous disease. This summation may comprise structural variation identities and levels, copy number variation, epigenetic variation, or other mutation analyses.

[0428] An exemplary method for analyzing DNA comprises the following steps:
1. Extracting a DNA sample (e.g., extracted blood plasma DNA from a human sample).
2. Performing end-repair and A-tailing reactions on the DNA in the sample.
3. Ligating at least a portion of the end-repaired products to an adapter, in which the adapter is protected from a conversion step.
4. Digesting the ligated DNA with one or more MSREs, cleaving unmethylated DNA molecules at the RE recognition site, to produce a treated sample.
5. Converting the treated sample to produce a treated and converted sample.
6. Optionally, enriching the treated and converted sample, targeting genomic regions.
7. Sequencing at least a portion of the treated and converted sample, optionally on an NGS instrument.
8. Analyzing the sequencing data, optionally with the adapters being used to identify unique molecules. This analysis can yield information on relative 5-methylcytosine for genomic regions, concurrent with standard genetic sequencing / variant detection.
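The digestion and conversion steps of the exemplary method above (steps 4 and 5) can be modeled in silico with a short Python sketch. This is a toy model under several assumptions that are not taken from the disclosure: a single hypothetical MSRE with recognition site CCGG that cuts unless the internal cytosine of the site is methylated, cleaved molecules being lost from downstream analysis, and a bisulfite-type conversion in which unmethylated cytosine reads as T. Each molecule is represented as a sequence plus a set of methylated cytosine positions.

```python
SITE = "CCGG"  # assumed recognition site of a hypothetical MSRE

def is_cleaved(seq, methylated):
    """True if any recognition site has an unmethylated internal cytosine."""
    start = seq.find(SITE)
    while start != -1:
        if (start + 1) not in methylated:  # internal C of the site
            return True
        start = seq.find(SITE, start + 1)
    return False

def convert(seq, methylated):
    """Bisulfite-type conversion (assumed): unmethylated C reads as T."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )

def digest_then_convert(molecules):
    """Drop cleaved molecules, then return converted sequences of survivors."""
    return [convert(seq, meth) for seq, meth in molecules
            if not is_cleaved(seq, meth)]

molecules = [
    ("ATCCGGTACA", {3}),    # site present, internal C methylated: survives
    ("ATCCGGTACA", set()),  # site present, unmethylated: cleaved and lost
    ("ATTAGCTACA", set()),  # no recognition site: survives
]
result = digest_then_convert(molecules)
print(result)
```

In this model the surviving reads are either methylated at the recognition site or lack the site entirely, and the retained C in the first surviving read marks the methylated position after conversion.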

[0429] An exemplary method for analyzing DNA comprises the following steps:
1. Extracting a DNA sample (e.g., extracted blood plasma DNA from a human sample).
2. Performing end-repair and A-tailing reactions on the DNA in the sample.
3. Ligating at least a portion of the end-repaired products to an adapter, in which the adapter is protected from a conversion step.
4. Subjecting the ligated sample to a conversion procedure that alters the base pairing specificity of mC without affecting unmethylated cytosines, providing a converted sample.
5. Contacting the converted sample with one or more methylation-sensitive restriction enzymes (MSRE). The MSREs cleave at least a portion of unmethylated DNA to produce a treated and converted sample.
6. Optionally, enriching the treated and converted sample, targeting genomic regions.
7. Sequencing at least a portion of the treated and converted sample, optionally on an NGS instrument.
8. Analyzing the sequencing data, optionally with the adapters being used to identify unique molecules. This analysis can yield information on relative 5-methylcytosine for genomic regions, concurrent with standard genetic sequencing / variant detection.

[0430] An exemplary method for analyzing DNA comprises the following steps:
1. Extracting a DNA sample (e.g., extracted blood plasma DNA from a human sample).
2. Performing end-repair and A-tailing reactions on the DNA in the sample.
3. Ligating at least a portion of the end-repaired products to an adapter, in which the adapter is protected from a conversion step.
4. Digesting the ligated products with one or more MDREs, cleaving methylated DNA molecules at the RE recognition site to produce a treated sample.
5. Converting the treated sample to produce a treated and converted sample.
6. Optionally, enriching the treated and converted sample, targeting genomic regions.
7. Sequencing at least a portion of the treated and converted sample, optionally on an NGS instrument.
8. Analyzing the sequencing data, optionally with the adapters being used to identify unique molecules. This analysis can yield information on relative 5-methylcytosine for genomic regions, concurrent with standard genetic sequencing / variant detection.

[0431] An exemplary method for analyzing DNA comprises the following steps:
1. Extracting a DNA sample (e.g., extracted blood plasma DNA from a human sample).
2. Performing end-repair and A-tailing reactions on the DNA in the sample.
3. Ligating at least a portion of the end-repaired products to an adapter, in which the adapter is protected from a conversion step.
4. Converting the ligated products to produce a converted sample.
5. Digesting the converted sample with one or more MDREs, cleaving methylated DNA molecules at the RE recognition site, to produce a treated and converted sample.
6. Optionally, enriching the treated and converted sample, targeting genomic regions.
7. Sequencing at least a portion of the treated and converted sample, optionally on an NGS instrument.
8. Analyzing the sequencing data, optionally with the adapters being used to identify unique molecules. This analysis can yield information on relative 5-methylcytosine for genomic regions, concurrent with standard genetic sequencing / variant detection.

I. Amplification

[0432] In some embodiments, DNA is amplified. In some embodiments, the DNA can be subjected to a plurality of distinct amplification reactions. For example, adapted DNA can be amplified (e.g. by PCR) prior to, or as part of, sequencing. For example, in sequencing procedures which comprise a conversion step, the adapted DNA may be amplified after the conversion step. In sequencing procedures which involve single molecule sequencing (such as nanopore-based sequencing or SMRT sequencing), there may be no amplification step. In some embodiments, the DNA of a treated and converted sample is amplified. In some embodiments, the DNA is amplified prior to a step of subjecting DNA of a treated and converted sample to sequencing. In some embodiments, DNA is amplified after ligating adapters to the DNA and / or before sequencing the DNA.

[0433] In some embodiments, the amplification of the DNA in the converted sample or treated and converted sample comprises using a DNA polymerase. In some embodiments, the DNA polymerase is a uracil-tolerant DNA polymerase.

[0434] In some embodiments, the uracil-tolerant polymerase may be Q5U® Hot Start High-Fidelity DNA Polymerase, OneTaq® DNA Polymerase, Taq DNA Polymerase, LongAmp® Taq DNA Polymerase, Hemo KlenTaq®, EpiMark® Hot Start Taq DNA Polymerase, Bst DNA Polymerase, Full Length, Bst DNA Polymerase, Large Fragment, Bst 2.0 DNA Polymerase, Bst 3.0 DNA Polymerase, Bsu DNA Polymerase, Large Fragment, phi29 DNA Polymerase, phi29-XT DNA Polymerase, Therminator™ DNA Polymerase, DNA Polymerase I (E. coli), DNA Polymerase I, Large (Klenow) Fragment (“Klenow fragment”), Klenow Fragment (3'→5' exo-), or any combination thereof.

[0435] Amplification is typically primed by primers binding to primer binding sites in adapters flanking a DNA molecule to be amplified. Amplification methods can involve cycles of denaturation, annealing and extension, resulting from thermocycling or can be isothermal as in transcription-mediated amplification. For example, sample nucleic acids flanked by adapters can be amplified by PCR and other amplification methods. Amplification methods of use herein can include any suitable methods, such as known to those of ordinary skill in the art. In some embodiments, amplification is primed by primers binding to primer binding sites in adapters flanking a DNA molecule to be amplified. Amplification methods can involve cycles of denaturation, annealing and extension, resulting from thermocycling, such as polymerase chain reaction (PCR), or can be isothermal, such as in linear amplification methods, transcription-mediated amplification, recombinase polymerase amplification (RPA), helicase-dependent amplification (HDA), loop-mediated isothermal amplification (LAMP) (Notomi et al., Nuc. Acids Res., 28, e63, 2000), rolling-circle amplification (RCA) (Blanco et al., J. Biol. Chem., 264, 8935-8940, 1989), or hyperbranched rolling circle amplification (Lizardi et al., Nat. Genetics, 19, 225-232, 1998). Other amplification methods include the ligase chain reaction, strand displacement amplification, nucleic acid sequence-based amplification, and self-sustained sequence-based replication.

[0436] In some embodiments, the present methods perform dsDNA ligations with T-tailed and C-tailed adapters. The addition of C-tailed adapters can increase ligation efficiency because the A-tailing reaction can also add G-tails to a small portion of the DNA molecules, when the A-tailing is performed in the presence of dGTP, such as when the A-tailing is performed in the same reaction as the end repair. The use of T-tailed and C-tailed adapters can result in amplification of at least 50, 60, 70 or 80% of double stranded nucleic acids. The present methods can increase the amount or number of amplified molecules relative to control methods performed with T-tailed adapters alone by at least 10, 15 or 20%.
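The benefit of adding C-tailed adapters can be illustrated with a simple tail-compatibility calculation. This is a minimal sketch under strong assumptions: every fragment carries exactly one kind of 3' tail, ligation succeeds whenever a complementary adapter overhang is present, and the tail fractions below are hypothetical values, not measurements from this disclosure.

```python
# Fragment 3' tail -> adapter overhang that can pair with it.
COMPATIBLE_ADAPTER = {"A": "T", "G": "C"}


def ligatable_fraction(tail_fractions, adapter_tails):
    """Fraction of fragments whose tail has a complementary adapter in the pool."""
    return sum(
        fraction
        for tail, fraction in tail_fractions.items()
        if COMPATIBLE_ADAPTER.get(tail) in adapter_tails
    )


# Hypothetical tailing outcome: most fragments A-tailed, a small portion G-tailed.
tails = {"A": 0.9, "G": 0.1}

t_only = ligatable_fraction(tails, {"T"})
t_and_c = ligatable_fraction(tails, {"T", "C"})
relative_gain = t_and_c / t_only - 1.0  # increase relative to T-tailed adapters alone
```

Under these assumed fractions, the G-tailed fragments are recovered only when C-tailed adapters are present, which is the source of the relative increase in ligated (and therefore amplifiable) molecules.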

[0437] In some embodiments, adapted DNA is amplified before sequencing. Amplification may in some cases be before one or more capture steps. In some embodiments, the ligation step occurs after the conversion step. In some embodiments, the ligation occurs before or simultaneously with amplification.

[0438] In some embodiments, the amplification of the DNA (e.g., adapter ligated DNA) comprises using a DNA polymerase. In some embodiments, the DNA polymerase may be Q5® High-Fidelity DNA Polymerase, Q5U® Hot Start High-Fidelity DNA Polymerase, Phusion® High-Fidelity DNA Polymerase, OneTaq® DNA Polymerase, Taq DNA Polymerase, LongAmp® Taq DNA Polymerase, Hemo KlenTaq, EpiMark® Hot Start Taq DNA Polymerase, Bst DNA Polymerase, Full Length, Bst DNA Polymerase, Large Fragment, Bst 2.0 DNA Polymerase, Bst 3.0 DNA Polymerase, Bsu DNA Polymerase, Large Fragment, phi29 DNA Polymerase, phi29-XT DNA Polymerase, Sulfolobus DNA Polymerase IV, Therminator™ DNA Polymerase, T7 DNA Polymerase, DNA Polymerase I (E. coli), DNA Polymerase I, Large (Klenow) Fragment (“Klenow fragment”), Klenow Fragment (3'→5' exo-), T4 DNA Polymerase, Vent® DNA Polymerase, Vent® (exo-) DNA Polymerase, Deep Vent® DNA Polymerase, Deep Vent® (exo-) DNA Polymerase, or any combination thereof.

[0439] In some embodiments, DNA can be amplified by methylation-preserving amplification. In some embodiments, the methylation-preserving amplification can occur before contacting the DNA in a sample with an mCpG-binding protein. For an exemplary description of mCpG binding domain proteins, see, e.g., Du et al., Methyl-CpG-binding domain proteins: readers of the epigenome. Epigenomics. 2015;7(6):1051-73.

[0440] Amplification, including methylation-preserving amplification, is typically primed by primers binding to primer binding sites in adapters flanking a DNA molecule to be amplified. Amplification methods can involve cycles of denaturation, annealing and extension, resulting from thermocycling or can be isothermal as in transcription-mediated amplification. For example, DNA flanked by adapters added to the DNA as described herein can be amplified by PCR or other amplification methods. Amplification methods of use herein, including methylation-preserving amplification, can include any suitable methods, such as known to those of ordinary skill in the art. In some embodiments, amplification is primed by primers binding to primer binding sites in adapters flanking a DNA molecule to be amplified. Amplification methods can involve cycles of denaturation, annealing and extension, resulting from thermocycling, such as polymerase chain reaction (PCR), or can be isothermal, such as in linear amplification methods, transcription-mediated amplification, recombinase polymerase amplification (RPA), helicase-dependent amplification (HDA), loop-mediated isothermal amplification (LAMP) (Notomi et al., Nuc. Acids Res., 28, e63, 2000), rolling-circle amplification (RCA) (Blanco et al., J. Biol. Chem., 264, 8935-8940, 1989), or hyperbranched rolling circle amplification (Lizardi et al., Nat. Genetics, 19, 225-232, 1998). Other amplification methods include the ligase chain reaction, strand displacement amplification, nucleic acid sequence-based amplification, and self-sustained sequence-based replication. In some embodiments, the methylation-preserving amplification comprises linear amplification with thermocycling.

[0441] In some embodiments, methylation-preserving amplification comprises amplification performed in the presence of a methyltransferase. Methylating agents of use in methylation-preserving amplification methods described herein are known to those of ordinary skill in the art, and can include, for example, any suitable methyltransferase. In some embodiments, the methylating agent is DNMT1. DNMT1 is the most abundant DNA methyltransferase in mammalian cells and predominantly methylates hemimethylated CpG dinucleotides in the mammalian genome. For example, DNA molecules replicated using PCR amplification with DNMT1 incubation will maintain their methylation status post-amplification, for use in further analyses, such as those described herein (such as an epigenetic base conversion step and / or an enrichment step).
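The effect of a maintenance methyltransferase during amplification can be sketched with an idealized model. The first function marks which cytosines on a newly synthesized strand sit opposite a methylated CpG, that is, the hemimethylated sites that a maintenance enzyme such as DNMT1 would be expected to methylate. The second function assumes that each cycle doubles the strands and that a new strand inherits methylation from a methylated template with a fixed per-cycle efficiency. Both function names, the example sequence, and the efficiency values are hypothetical and serve only as an illustration.

```python
def maintained_sites(top_seq, top_methylated):
    """Bottom-strand cytosines (in top-strand coordinates) opposite a methylated CpG.

    A methylated C at index i in a CpG pairs its neighbouring G (index i + 1)
    with the bottom-strand C of the same CpG. Methylated cytosines outside a
    CpG context are ignored in this sketch.
    """
    return {i + 1 for i in top_methylated if top_seq[i:i + 2] == "CG"}


def retained_fraction(cycles, maintenance_efficiency):
    """Fraction of strands still methylated at a site after exponential cycles.

    Model assumption: per cycle, f_next = (f + efficiency * f) / 2, because
    old strands keep their state and each new strand is methylated with
    probability `efficiency` when its template is methylated.
    """
    if not 0.0 <= maintenance_efficiency <= 1.0:
        raise ValueError("maintenance_efficiency must be between 0 and 1")
    return ((1.0 + maintenance_efficiency) / 2.0) ** cycles


sites = maintained_sites("ACGTCGAC", {1, 7})  # index 7 is not in a CpG
no_maintenance = retained_fraction(3, 0.0)
ideal_maintenance = retained_fraction(10, 1.0)
partial_maintenance = retained_fraction(10, 0.95)  # illustrative efficiency
```

Without maintenance, the methylated fraction halves every cycle in this model, whereas with maintenance it decays only as the efficiency falls below 1.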

[0442] Additional methylating agents useful herein include the mammalian methyltransferases, DNMT3a and DNMT3b, the plant methyltransferases, MET1, and CMT3. In some embodiments, DNMT1 or another suitable methyltransferase is used with a methyl donor and may be used with or without cofactors known to those of ordinary skill in the art. DNMT1 works in vitro at 95% efficiency without a cofactor; however, DNMT1 may be used with a cofactor such as NP95 (Uhrf1), such as described in Bashtrykov PI, et al. “The UHRF1 protein stimulates the activity and specificity of the maintenance DNA methyltransferase DNMT1 by an allosteric mechanism.” J Biol Chem. 2014. In some embodiments, DNMT1 is used at a concentration of about 50-10000 U / mL, such as about 50-2000, about 50-5000, about 2500-7500, or about 5000-10000 U / mL. In some embodiments, DNMT1 is used at a concentration of about 100-500, about 500-1000, about 100-1000, about 1000-1500, about 500-1500, about 600-1400, about 700-1300, about 800-1200, about 900-1100, or about 950-1050 U / mL. In some embodiments, DNMT1 is used at a concentration of about 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, or about 2000 U / mL. In some embodiments, DNMT1 is used at a concentration of about 1,000 U / mL.
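The concentrations above translate into enzyme amounts by simple unit arithmetic: units = concentration (U / mL) × reaction volume (mL). The sketch below is illustrative only; the 50 µL reaction volume and the 2000 U / mL stock concentration are hypothetical values, and the range check simply mirrors the broadest range recited above.

```python
def dnmt1_units(target_u_per_ml, reaction_volume_ul):
    """Units of DNMT1 needed to reach a target concentration in a reaction."""
    if not 50 <= target_u_per_ml <= 10000:
        raise ValueError("target outside the about 50-10000 U/mL range")
    return target_u_per_ml * reaction_volume_ul / 1000.0  # uL -> mL


def stock_volume_ul(target_u_per_ml, reaction_volume_ul, stock_u_per_ml):
    """Volume of enzyme stock (uL) that supplies the required units."""
    units = dnmt1_units(target_u_per_ml, reaction_volume_ul)
    return units / stock_u_per_ml * 1000.0  # mL -> uL


# Hypothetical reaction: about 1,000 U/mL in 50 uL, from a 2000 U/mL stock.
units_needed = dnmt1_units(1000, 50)
volume_needed = stock_volume_ul(1000, 50, 2000)
```

With these assumed values, the reaction requires 50 U of enzyme, supplied by 25 µL of the hypothetical stock.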

[0443] In some embodiments, enriching methylated DNA in a sample comprises amplification, such as embodiments comprising quantitative PCR (qPCR) or digital PCR. Some such embodiments comprising targeted detection of DNA sequences using qPCR or digital PCR do not comprise standard DNA library preparation steps, such as adapter ligation or tagging.

[0444] In some embodiments, the present methods perform dsDNA ligations with T-tailed and C-tailed adapters. The addition of C-tailed adapters can increase ligation efficiency because the A-tailing reaction can also add G-tails to a small portion of the DNA molecules, when the A-tailing is performed in the presence of dGTP, such as when the A-tailing is performed in the same reaction as the end repair. The use of T-tailed and C-tailed adapters can result in amplification of at least 50, 60, 70 or 80% of double stranded nucleic acids. The present methods can increase the amount or number of amplified molecules relative to control methods performed with T-tailed adapters alone by at least 10, 15, or 20%.

[0445] In some embodiments, adapted DNA is amplified before sequencing. Amplification may in some cases be before one or more capture steps. In some embodiments, the ligation step occurs after the conversion step. In some embodiments, the ligation occurs before or simultaneously with amplification.

[0446] In some embodiments, an amplification of the DNA in a sample comprises rolling-circle amplification (RCA). In some embodiments, RCA comprises circularizing a DNA template (e.g., DNA in the converted sample). In some embodiments, RCA comprises copying the circularized DNA template using a rolling circle polymerase to generate a plurality of circularized DNA templates. In some embodiments, the rolling circle polymerase is a phi29 DNA polymerase. Exemplary methods of RCA are provided, e.g., in Lou et al., Proc. Natl. Acad. Sci. 110 (49) 19872-19877 (2013). In some embodiments, the RCA occurs prior to a step of sequencing the DNA.

[0447] This may be an additional amplification step subsequent to an earlier amplification step, such as amplification as described elsewhere herein. In some embodiments, amplification of adapted DNA comprises RCA, e.g., as described above. In some embodiments, RCA comprises copying the circularized DNA template using a rolling circle polymerase to generate a plurality of circularized DNA templates. In some embodiments, the rolling circle polymerase is a phi29 DNA polymerase.
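Because RCA copies a circular template repeatedly, a single read of an RCA product can contain several tandem copies of the same molecule, and those copies can be collapsed by majority vote. The following is a minimal sketch under simplifying assumptions: the length of one copy is known, the copies are in register, and errors are substitutions only (no insertions or deletions). The function names and the example read are hypothetical.

```python
from collections import Counter


def split_copies(read, unit_length):
    """Cut a concatemeric read into full-length tandem copies; drop a trailing partial copy."""
    return [
        read[start:start + unit_length]
        for start in range(0, len(read) - unit_length + 1, unit_length)
    ]


def consensus(copies):
    """Majority base at each position across the copies."""
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*copies)
    )


# Hypothetical read: three copies of an 8-base unit, the third with one error (G -> C).
read = "ACGTTGAT" + "ACGTTGAT" + "ACCTTGAT" + "AC"
copies = split_copies(read, 8)
molecule = consensus(copies)
```

In this example the isolated error in the third copy is outvoted by the two matching copies, so the consensus recovers the sequence of the original unit.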

[0448] In some embodiments, sequencing DNA that was amplified using RCA (e.g., as described elsewhere herein) provides sequence reads comprising multiple copies of the sequence of an original sample molecule or converted molecule and the copies are used to determine a consensus sequence of the original sample molecule or converted molecule.

J. Ligation to Adapters

[0449] In some embodiments, the methods comprise ligating adapters to DNA. In some embodiments, the ligating adapters to DNA produces adapter-ligated DNA. In some embodiments, DNA molecules can be subjected to blunt-end ligation with blunt-ended adapters. In some embodiments, DNA molecules can be subjected to sticky-end ligation with sticky-ended adapters. DNA molecules can be ligated to adapters at either one end or both ends. DNA molecules can be ligated with at least partially double stranded adapter (e.g., a Y shaped or bell-shaped adapter).

[0450] In some embodiments, the ligation step can take place prior to or after sequencing the DNA. In some embodiments, the ligation step can take place prior to sequencing the DNA. In some embodiments, the ligation step can take place prior to or after subjecting the sample or treated sample to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase. In some embodiments, the ligation step can take place prior to subjecting the sample or treated sample to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase. In some...

Claims

What is claimed is:

1. A method of analyzing DNA in a sample, the method comprising: a) contacting the sample with a methylation-discriminating nuclease, thereby degrading methylated or unmethylated DNA to produce a treated sample, wherein the sample is an original sample or is a proportionately methylated sample relative to an original sample; and b) subjecting the treated sample to a procedure that affects a first nucleobase differently from a second nucleobase, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity, and wherein the procedure alters the base pairing specificity of the first or second nucleobase to produce a treated and converted sample.

2. A method of analyzing DNA in a sample, the method comprising: a) subjecting the sample to a procedure that affects a first nucleobase differently from a second nucleobase, wherein the sample is an original sample or is a proportionately methylated sample relative to an original sample, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity, and wherein the procedure alters the base pairing specificity of the first or second nucleobase to produce a converted sample; and b) contacting the converted sample with a methylation-discriminating nuclease, thereby degrading methylated or unmethylated DNA to produce a treated and converted sample.

3. The method of any one of the preceding claims, wherein the DNA comprises cell-free DNA (cfDNA).

4. The method of any one of the preceding claims, wherein the sample is a tissue sample.

5. The method of any one of the preceding claims, wherein the sample is a blood sample.

6. The method of the immediately preceding claim, wherein the blood sample is a whole blood sample, a plasma sample, a buffy coat sample, a leukapheresis sample, or a peripheral blood mononuclear cell (PBMC) sample.

7. The method of any one of the preceding claims, wherein the methylation-discriminating nuclease is a methylation-dependent restriction enzyme (MDRE).

8. The method of the immediately preceding claim, wherein the MDRE cleaves a methylated CpG sequence.

9. The method of claim 7 or claim 8, wherein the MDRE is one or more of MspJI, LpnPI, FspEI, or McrBC.

10. The method of any one of claims 1-6, wherein the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

11. The method of the immediately preceding claim, wherein the MSRE cleaves an unmethylated CpG sequence.

12. The method of claim 10 or claim 11, wherein the MSRE is one or more of AatII, AccII, AciI, Aor13HI, Aor15HI, BspT104I, BssHII, BstUI, Cfr10I, ClaI, CpoI, Eco52I, HaeII, HapII, HhaI, Hin6I, HpaII, HpyCH4IV, MluI, NaeI, NotI, NruI, NsbI, PmaCI, Psp1406I, PvuI, SacII, SalI, SmaI, and SnaBI.

13. The method of any one of the preceding claims, wherein the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises a conversion procedure.

14. The method of any one of the preceding claims, wherein the first nucleobase is an unmodified cytosine and the second nucleobase is a modified cytosine, optionally wherein the modified cytosine is 5-methylcytosine or 5-hydroxymethylcytosine.

15. The method of any one of the preceding claims, wherein the procedure that affects a first nucleobase of the DNA differently from a second nucleobase of the DNA is methylation-sensitive conversion.

16. The method of the immediately preceding claim, wherein the methylation-sensitive conversion is bisulfite conversion, oxidative bisulfite (Ox-BS) conversion, Tet-assisted bisulfite (TAB) conversion, APOBEC-coupled epigenetic (ACE) conversion, enzymatic methyl-seq (EM-seq) conversion, single-enzyme 5-methylcytosine sequencing (SEM-seq) conversion, direct methylation sequencing (DM-seq), Tet-assisted pyridine borane sequencing (TAPS), or Tet-assisted pyridine borane sequencing with protection of 5hmC (TAPS-P).

17. The method of the immediately preceding claim, wherein the Tet-assisted conversion further comprises a substituted borane reducing agent, optionally wherein the substituted borane reducing agent is 2-picoline borane, borane pyridine, tert-butyl amine borane, or ammonia borane.

18. The method of any one of claims 16-17, wherein the procedure that affects a first nucleobase of the DNA differently from a second nucleobase comprises contacting the DNA with a CpG-specific DNA methyltransferase (MTase) or a CpG-specific carboxymethyltransferase (CxMTase), a methyl donor or a carboxymethyl donor, and a cytosine deaminase.

19. The method of the immediately preceding claim, wherein the cytosine deaminase is an APOBEC enzyme, optionally wherein the APOBEC enzyme is APOBEC3A.

20. The method of any one of claims 13-16, wherein the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises bisulfite conversion; the contacting the sample with the methylation-discriminating nuclease occurs prior to bisulfite conversion; and the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

21. The method of any one of claims 13-16, wherein the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises direct methylation sequencing (DM-seq); the contacting the sample with the methylation-discriminating nuclease occurs prior to direct methylation sequencing (DM-seq); and the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

22. The method of any one of claims 13-16, wherein the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises APOBEC-coupled epigenetic (ACE) conversion; the contacting the sample with the methylation-discriminating nuclease occurs prior to contacting the sample with APOBEC; and the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

23. The method of any one of claims 13-16, wherein the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises enzymatic methyl-seq (EM-seq) conversion, wherein EM-seq comprises contacting a sample with APOBEC; the contacting the sample with the methylation-discriminating nuclease occurs prior to contacting the sample with APOBEC; and the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

24. The method of any one of claims 13-16, wherein the procedure that affects a first nucleobase in the DNA differently from a second nucleobase comprises single-enzyme 5-methylcytosine sequencing (SEM-seq) conversion; the contacting the sample with the methylation-discriminating nuclease occurs prior to single-enzyme 5-methylcytosine sequencing (SEM-seq) conversion; and the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE).

25. The method of any one of the preceding claims, further comprising ligating one or more adapters to the DNA, thereby producing adapter-ligated DNA.

26. The method of the immediately preceding claim, wherein at least one cytosine in the one or more adapters is an unmodified cytosine, optionally wherein each cytosine in the one or more adapters is an unmodified cytosine.

27. The method of any one of claims 25-26, wherein at least one cytosine in the one or more adapters is a modification resistant cytosine, optionally wherein each cytosine in the one or more adapters is a modification resistant cytosine.

28. The method of the immediately preceding claim, wherein the modification resistant cytosine is a deaminase resistant cytosine.

29. The method of the immediately preceding claim, wherein the deaminase resistant cytosine is 5-propynylC (5pyC), 5-pyrrolo-dC (5pyrC), 5-hydroxymethylcytosine (5hmC), glucosylated 5-hydroxymethylcytosine (5ghmC), cytosine 5-methylenesulfonate (CMS), or N4-modified cytosine.

30. The method of any one of claims 25-29, wherein the one or more adapters are Y-shaped adapters.

31. The method of any one of claims 25-30, wherein the one or more adapters comprise molecular barcodes.

32. The method of any one of claims 25-31, wherein the one or more adapters is resistant to digestion by the methylation-discriminating nuclease.

33. The method of the immediately preceding claim, wherein the methylation-discriminating nuclease is a methylation-sensitive restriction enzyme (MSRE) and wherein the one or more adapters that is resistant to digestion by the MSRE:
(i) comprises one or more methylated nucleotides, optionally wherein the methylated nucleotides comprise 5-methylcytosine and / or 5-hydroxymethylcytosine;
(ii) comprises one or more nucleotide analogs resistant to methylation sensitive restriction enzymes; or
(iii) does not comprise a nucleotide sequence recognized by the MSRE.

34. The method of claim 32, wherein the methylation-discriminating nuclease is a methylation-dependent restriction enzyme (MDRE) and wherein the one or more adapters that is resistant to digestion by the MDRE:
(i) comprises one or more unmethylated nucleotides;
(ii) comprises one or more nucleotide analogs resistant to methylation dependent restriction enzymes; or
(iii) does not comprise a nucleotide sequence recognized by the MDRE.

35. The method of any one of claims 25-34, wherein the ligating one or more adapters to the DNA occurs prior to subjecting the sample or the treated sample to a procedure that affects a first nucleobase differently from a second nucleobase.

36. The method of any one of the preceding claims, further comprising subjecting the DNA to end repair to generate end-repaired DNA molecules, wherein the end repair is performed using deoxynucleotide triphosphates (dNTPs).

37. The method of the immediately preceding claim, wherein the end repair is performed using at least one type of dNTP which comprises a modified base, wherein the modified base is other than 5mC or 5hmC, and the at least one type of dNTP comprising a modified base is incorporated into a repaired region of the end-repaired DNA molecules at one or more locations.

38. The method of claim 36, wherein the end repair is performed using at least one type of dNTP which comprises a modified base, wherein the modified base is a methylated cytosine, optionally wherein the methylated base is 5mC or 5hmC, and the at least one type of dNTP comprising a modified base is incorporated into a repaired region of the end-repaired DNA molecules at one or more locations.

39. The method of claim 36, wherein the end repair is performed using at least one type of dNTP which comprises a modified base, wherein the modified base is a methylated cytosine, optionally wherein the methylated base is 5mC or 5hmC, wherein the at least one type of dNTP comprising a modified base is incorporated into a repaired region of the end-repaired DNA molecules at one or more locations, and the repaired region is defined as:
(i) the sequence between two non-methylated cytosines which span one or more methylated CpH cytosines; and / or
(ii) the sequence between a methylated CpH cytosine and an end of a sequence read, wherein the methylated CpH cytosine is the CpH cytosine most distant from the end of the sequence read, or a subsequence thereof comprising one or more methylated CpH cytosines.

40. The method of claim 36, wherein at least one type of dNTP comprises a modified base, and the at least one dNTP comprising a modified base is incorporated into a repaired region of the end-repaired DNA molecules at one or more locations.

41. The method of any one of claims 36-40, wherein the end repair is performed using a DNA polymerase that does not have 5’-3’ exonuclease activity and / or is not a strand displacing DNA polymerase.

42. The method of any one of claims 36-40, wherein the end repair is performed using a DNA polymerase that has 5’-3’ exonuclease activity and / or is a strand displacing DNA polymerase.

43. The method of any one of claims 36-42, wherein the at least one type of dNTP which comprises a modified base, wherein the modified base includes a dNTP comprising 4-methylcytosine (4mC), a dNTP comprising 5-methylcytosine (5mC), a dNTP comprising 5-hydroxymethylcytosine (5hmC), a dNTP comprising N6-methyladenosine (6mA), a dNTP comprising bromodeoxyuridine (BrdU) and / or a dNTP comprising 8-oxoguanine (8oxoG).

44. The method of any one of claims 36-43, wherein the subjecting the DNA to end repair occurs prior to step a) and / or prior to ligating one or more adapters to the DNA.

45. The method of any one of the preceding claims, further comprising performing an A-tailing reaction, optionally after a step of subjecting the DNA to end repair.

46. The method of the immediately preceding claim, wherein the end-repair and the A-tailing reaction are performed in the same reaction mixture, optionally wherein the end-repair and the A-tailing reaction are performed in a single tube and / or optionally wherein the end-repair and the A-tailing reaction are performed without an intervening clean-up step.

47. The method of claim 45 or claim 46, wherein the A-tailing is performed using a DNA polymerase that does not possess 5’-3’ exonuclease activity and / or is not a strand displacing DNA polymerase, optionally wherein the DNA polymerase is Hemo KlenTaq.

48. The method of any one of claims 45-47, wherein the A-tailing is performed using a thermostable DNA polymerase.

49. The method of any one of the preceding claims, further comprising amplifying DNA in the sample using a DNA polymerase.

50. The method of the immediately preceding claim, wherein the DNA polymerase is a uracil-tolerant DNA polymerase.

51. The method of claim 49 or claim 50, wherein the amplifying occurs after step b).

52. The method of any one of the preceding claims, further comprising capturing a first target region set comprising epigenetic target regions from the DNA.

53. The method of the immediately preceding claim, wherein the capturing comprises contacting the DNA in the sample with a plurality of target-specific probes specific for members of the epigenetic target region set, thereby providing captured DNA.

54. The method of claim 52 or claim 53, wherein the capturing further comprises capturing sequence-variable target regions of the DNA, comprising contacting the DNA with a plurality of target-specific probes specific for the sequence-variable target regions.

55. The method of any one of claims 52-54, wherein the capturing occurs after step b).

56. The method of any one of claims 52-55, wherein the capturing occurs after amplifying the DNA.

57. The method of any one of claims 52-56, wherein the capturing occurs after step b) and after amplifying the DNA.

58. The method of any one of claims 52-57, wherein the first target region set comprises a hypermethylation variable target region set.

59. The method of the immediately preceding claim, wherein the hypermethylation variable target region set comprises regions having a higher degree of methylation in at least one type of tissue than the degree of methylation in cell-free DNA from a healthy subject.

60. The method of claim 58 or claim 59, wherein the method further comprises determining a presence, absence, or likelihood of cancer based at least in part on sequences or quantities of regions in the hypermethylation variable target region set.

61. The method of any one of claims 58-60, further comprising quantifying tumor DNA in the sample based at least in part on sequences or quantities of regions in the hypermethylation variable target region set.

62. The method of any one of claims 52-57, wherein the epigenetic target regions comprise a hypomethylation variable target region set.

63. The method of the immediately preceding claim, wherein the hypomethylation variable target region set comprises regions having a lower degree of methylation in at least one type of tissue than the degree of methylation in cell-free DNA from a healthy subject.

64. The method of the immediately preceding claim, wherein the method further comprises determining a presence, absence, or likelihood of cancer based at least in part on sequences or quantities of regions in the hypomethylation variable target region set.

65. The method of any one of claims 62-64, further comprising quantifying tumor DNA in the sample based at least in part on sequences or quantities of regions in the hypomethylation variable target region set.

66. The method of any one of claims 52-65, wherein the epigenetic target regions comprise a methylation control target region set.

67. The method of any one of claims 65-66, wherein the epigenetic target region set comprises a fragmentation variable target region set.

68. The method of the immediately preceding claim, wherein the fragmentation variable target region set comprises transcription start site regions.

69. The method of claim 67 or claim 68, wherein the fragmentation variable target region set comprises CTCF binding regions.

70. The method of any one of claims 54-69, wherein DNA molecules corresponding to the sequence-variable target region set are captured with a greater capture yield than DNA molecules corresponding to the epigenetic target region set.

71. The method of any one of claims 52-70, wherein capturing comprises contacting DNA to be captured with a set of target-specific probes, whereby complexes of target-specific probes and DNA are formed.

72. The method of the immediately preceding claim, wherein capturing further comprises separating the complexes from DNA not bound to target-specific probes, thereby providing captured DNA.

73. The method of claim 71 or claim 72, wherein the set of target-specific probes is configured to capture DNA corresponding to the sequence-variable target region set with a greater capture yield than DNA corresponding to the epigenetic target region set.

74. The method of any one of claims 54-73, comprising sequencing DNA molecules corresponding to the sequence-variable target region set to a greater depth of sequencing than DNA molecules corresponding to the epigenetic target region set.

75. The method of any one of the preceding claims, further comprising sequencing at least a portion of the DNA in the sample.

76. The method of the immediately preceding claim, wherein the sequencing occurs after step b).

77. The method of claim 75 or claim 76, wherein the sequencing occurs after amplifying a treated and converted sample.

78. The method of any one of claims 75-77, wherein the sequencing occurs after capturing a first target region set comprising epigenetic target regions from the sample.

79. The method of any one of claims 75-78, wherein the sequencing occurs after step b), after amplifying a treated and converted sample, and after capturing a first target region set comprising epigenetic target regions from the sample.

80. The method of any one of the preceding claims, further comprising quantifying a level of methylation at one or more differentially methylated regions of the DNA.

81. The method of the immediately preceding claim, wherein quantifying the level of methylation at one or more differentially methylated regions of the DNA comprises sequencing at least a portion of the amplified DNA or quantitative PCR.

82. The method of any one of the preceding claims, wherein the sequencing comprises sequencing the DNA in a manner that distinguishes the first nucleobase from the second nucleobase.

83. The method of any one of claims 74-82, wherein the sequencing comprises next-generation sequencing (NGS).

84. The method of the immediately preceding claim, wherein the NGS comprises pyrosequencing, sequencing-by-synthesis, semiconductor sequencing, sequencing-by-ligation, or sequencing-by-hybridization.

85. The method of any one of claims 74-82, wherein the sequencing comprises single-molecule real time (SMRT) sequencing.

86. The method of any one of claims 74-82, wherein the sequencing comprises long-read sequencing.

87. The method of any one of claims 74-82, wherein the sequencing comprises nanopore-based sequencing.

88. The method of any one of claims 74-82, wherein the sequencing comprises 5-letter or 6-letter sequencing.

89. The method of claims 74-82 or 87, wherein the sequencing comprises nanopore-based sequencing and the method comprises subjecting the DNA in the sample to end repair to generate end-repaired DNA molecules, wherein the end repair is performed using at least one type of dNTP which comprises a modified base including a dNTP comprising 4mC, a dNTP comprising 5mC, a dNTP comprising 5hmC, a dNTP comprising 6mA, a dNTP comprising BrdU, dUTP, a dNTP comprising fluorodeoxyuridine (FldU), a dNTP comprising 5-iododeoxyuridine (IdU), and/or a dNTP comprising 5-ethynyldeoxyuridine (EdU), and the at least one type of dNTP comprising a modified base is incorporated into a repaired region of the end-repaired DNA molecules at one or more locations.

90. The method of claims 74-82 or 85, wherein the sequencing comprises single-molecule real time (SMRT) sequencing and the method comprises subjecting the DNA in the sample to end repair to generate end-repaired DNA molecules, wherein the end repair is performed using at least one type of dNTP which comprises a modified base including a dNTP comprising 4mC, a dNTP comprising 5mC, a dNTP comprising 5hmC, a dNTP comprising 6mA, and/or a dNTP comprising 8oxoG, and the at least one type of dNTP comprising a modified base is incorporated into a repaired region of the end-repaired DNA molecules at one or more locations.

91. The method of any one of claims 74-90, further comprising analyzing at least some of the sequence data corresponding to regions that are not identified as being synthesized during the end repair to detect the presence or absence of base modifications or mutations present in the DNA sample.

92. The method of any one of claims 74-91, wherein the method further comprises detecting the methylation status of cytosines in the DNA in the sample, and further comprises analyzing the sequence data, wherein the analyzing the sequence data comprises filtering out the one or more repaired regions of the end-repaired DNA molecules such that the one or more repaired regions are not used to determine the methylation status of cytosines in the DNA sample.

93. The method of any one of claims 74-91, wherein the method is for detecting single nucleotide variants (SNVs) in the DNA sample, and further comprises analyzing the sequence data, wherein the analyzing the sequence data comprises classifying all base calls within the one or more end-repaired regions as not having double-stranded support.

94. The method of any one of the preceding claims, further comprising analyzing the sequence data to determine a level of measured artifacts in the DNA of the sample.

95. The method of any one of the preceding claims, wherein the sample is from a subject.

96. The method of any one of the preceding claims, wherein the sample is from a subject and the method further comprises determining the presence or absence of cancer in the subject based at least in part on the sequencing data.

97. The method of any one of claims 95-96, wherein the subject is an animal.

98. The method of the immediately preceding claim, wherein the subject is a human.

99. The method of any one of claims 95-98, wherein the subject has or is at risk of having a cancer.

100. The method of any one of claims 95-99, further comprising determining the presence or status of a cancer in the subject.

101. The method of any one of claims 95-99, further comprising determining a likelihood that the subject has cancer.

102. The method of the immediately preceding claim, wherein the sequencing generates a plurality of sequencing reads; and the method further comprises mapping the plurality of sequence reads to one or more reference sequences to generate mapped sequence reads, and processing the mapped sequence reads corresponding to the sequence-variable target region set and to the epigenetic target region set to determine the likelihood that the subject has cancer.

103. The method of claim 101, wherein the test subject was previously diagnosed with a cancer and received one or more previous cancer treatments, optionally wherein the cfDNA is obtained at one or more preselected time points following the one or more previous cancer treatments, and sequencing the captured set of cfDNA molecules, whereby a set of sequence information is produced.

104. The method of the immediately preceding claim, further comprising detecting a presence or absence of DNA originating or derived from a tumor cell at a preselected timepoint using the set of sequence information.

105. The method of the immediately preceding claim, further comprising determining a cancer recurrence score that is indicative of the presence or absence of the DNA originating or derived from the tumor cell for the test subject, optionally further comprising determining a cancer recurrence status based on the cancer recurrence score, wherein the cancer recurrence status of the test subject is determined to be at risk for cancer recurrence when a cancer recurrence score is determined to be at or above a predetermined threshold or the cancer recurrence status of the test subject is determined to be at lower risk for cancer recurrence when the cancer recurrence score is below the predetermined threshold.

106. The method of the immediately preceding claim, further comprising comparing the cancer recurrence score of the test subject with a predetermined cancer recurrence threshold, wherein the test subject is classified as a candidate for a subsequent cancer treatment when the cancer recurrence score is above the cancer recurrence threshold or not a candidate for a subsequent cancer treatment when the cancer recurrence score is below the cancer recurrence threshold.