Systems and methods for determining cancer progression
The method addresses the limitations of existing genetic testing by dynamically analyzing methylation signals to identify initiating cancer-related changes, enhancing cancer detection and treatment personalization.
Patent Information
- Authority / Receiving Office: WO
- Patent Type: Applications
- Current Assignee / Owner: GRAIL INC
- Filing Date: 2025-12-18
- Publication Date: 2026-06-25
AI Technical Summary
Existing genetic testing methods for cancer detection based on differential methylation analysis fail to account for biological heterogeneity, such as cancer stages and tumor fraction levels, resulting in lengthy and less relevant lists of differentially methylated regions (DMRs).
A computer-implemented method that analyzes differential methylation signals across feature values, using threshold criteria to identify initiating methylation changes and refine DMRs by focusing on biologically meaningful changes related to cancer progression, including cancer stage, tumor fraction, and other dynamic factors.
Provides a more targeted and actionable dataset for cancer detection, enabling personalized treatment recommendations, monitoring, and risk stratification by identifying methylation changes that drive cancer progression.
Description
Attorney Docket No. 00316-0025-00304 Client Ref. No. P0221-WO
SYSTEMS AND METHODS FOR DETERMINING CANCER PROGRESSION
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims the benefit of priority to U.S. Provisional Patent Application No. 63/736,040, filed on December 19, 2024, which is hereby incorporated by reference in its entirety.
TECHNICAL FIELD
[0002] The present disclosure relates generally to the field of genetic testing and, more specifically, to systems and methods for detecting initiating methylation changes that drive the progression of disease, e.g., cancer.
BACKGROUND
[0003] In the realm of genetic testing, the analysis of methylation patterns permits detection and/or identification of potential disease signals, e.g., cancer signals, in an individual’s biological sample. Such analysis may facilitate cancer detection, as well as prediction of cancer signal origin. Specifically, differential methylation analysis may be utilized to detect differences in methylation signals between cancer and non-cancer samples. The regions that differ, known as differentially methylated regions (DMRs), may be compared between cancer and non-cancer genetic samples to identify areas of interest for potential cancer detection.
[0004] Simple, static comparisons of differential methylation between cancer and non-cancer genetic samples, however, may return a lengthy list of DMRs. Such comparisons may, for example, fail to account for relevant biological heterogeneity that reflects cancer progression, such as cancer stage (e.g., stages 1-4) or tumor fraction level, and as such may fail to provide meaningful data and analysis. Embodiments of the present disclosure may address one or more of these problems.
[0005] The background description provided herein is for the purpose of generally presenting the context of the disclosure. Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art, or suggestions of the prior art, by inclusion in this section.
SUMMARY OF THE DISCLOSURE
[0006] According to certain aspects of the disclosure, systems and methods are described for analyzing biological samples to detect and identify differentially methylated regions and/or initiating methylation changes or events. In some aspects, the systems and methods may analyze differential signal values between samples, or between samples and reference samples, to detect differential methylation as well as changes in methylation signals or signal velocity across feature values, according to threshold criteria.
[0007] In summary, one aspect described herein provides a computer-implemented method for detecting initiating methylation changes, including: receiving, at a computer system, genomic data associated with at least one sample, wherein the genomic data includes sequenced genetic material corresponding to the at least one sample; generating, using a processor associated with the computer system, a feature value for at least one region of the at least one sample; assembling, using the processor, at least one sample group based upon the feature value for the at least one sample, wherein each sample group is assembled according to user-defined feature value criteria; generating, using the processor, at least one sample dataset for each of the at least one sample groups; comparing, using the processor, each at least one sample dataset associated with a sample group to a reference dataset associated with a reference sample; identifying, based on the comparing and responsive to determining that an at least one sample dataset meets a differential threshold with the reference dataset, a differentially methylated region; generating, using the processor, an effect size for each differentially methylated region, wherein the effect size corresponds to a mean differential of the at least one sample dataset and the reference dataset and is associated with the feature value; comparing, using the processor, the effect size associated with each feature value to the effect size associated with each additional feature value; and identifying, responsive to determining that a change in the effect size over feature values meets an initiating threshold, an initiating methylation change, or responsive to determining that the change in effect size over feature values does not meet an initiating threshold, an absence of an initiating methylation change.
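The method of [0007] can be illustrated with a minimal Python sketch. It assumes that each sample carries one feature value (here, a cancer stage label) and a per-region methylation signal, that the reference dataset is reduced to one signal value per region, that a region qualifies as a DMR when the absolute mean differential in any group meets the differential threshold, and that the "change in the effect size over feature values" is the largest absolute difference between any two groups. All function names, group labels, region IDs, and threshold values are hypothetical and are not taken from the disclosure.

```python
from itertools import combinations
from statistics import mean
from typing import Callable

# Hypothetical data model: region ID -> methylation signal (e.g., fraction of methylated reads).
Signals = dict[str, float]


def assemble_groups(samples: list[dict],
                    criteria: dict[str, Callable[[str], bool]]) -> dict[str, list[Signals]]:
    """Put each sample's signals into every group whose user-defined criterion accepts its feature value."""
    groups: dict[str, list[Signals]] = {label: [] for label in criteria}
    for sample in samples:
        for label, accepts in criteria.items():
            if accepts(sample["feature"]):
                groups[label].append(sample["signals"])
    return groups


def effect_sizes(groups: dict[str, list[Signals]],
                 reference: Signals) -> dict[str, dict[str, float]]:
    """Effect size per region and group: mean group signal minus the reference signal."""
    return {
        region: {label: mean(s[region] for s in members) - ref_value
                 for label, members in groups.items() if members}  # skip empty groups
        for region, ref_value in reference.items()
    }


def find_dmrs(sizes: dict[str, dict[str, float]], differential_threshold: float) -> set[str]:
    """Treat a region as a DMR if any group's |effect size| meets the differential threshold."""
    return {region for region, by_group in sizes.items()
            if any(abs(e) >= differential_threshold for e in by_group.values())}


def initiating_changes(sizes: dict[str, dict[str, float]], dmrs: set[str],
                       initiating_threshold: float) -> dict[str, bool]:
    """Compare each group's effect size with every other group's; True marks an initiating
    change, False marks the absence of one."""
    flags = {}
    for region in sorted(dmrs):
        values = list(sizes[region].values())
        largest = max((abs(a - b) for a, b in combinations(values, 2)), default=0.0)
        flags[region] = largest >= initiating_threshold
    return flags


# Usage with made-up numbers: two stage-based groups compared with a non-cancer reference.
samples = [
    {"feature": "I", "signals": {"r1": 0.30, "r2": 0.52}},
    {"feature": "II", "signals": {"r1": 0.34, "r2": 0.48}},
    {"feature": "III", "signals": {"r1": 0.70, "r2": 0.50}},
    {"feature": "IV", "signals": {"r1": 0.74, "r2": 0.54}},
]
criteria = {"early": lambda f: f in {"I", "II"}, "late": lambda f: f in {"III", "IV"}}
reference = {"r1": 0.10, "r2": 0.50}

sizes = effect_sizes(assemble_groups(samples, criteria), reference)
dmrs = find_dmrs(sizes, differential_threshold=0.1)
print(dmrs, initiating_changes(sizes, dmrs, initiating_threshold=0.2))
```

Taking the largest pairwise difference is one way to "compare the effect size associated with each feature value to the effect size associated with each additional feature value"; an ordered comparison across stages is an equally plausible reading.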
[0008] In some aspects, the techniques described herein relate to a computer-implemented method, wherein the feature value includes cancer type.
[0009] In some aspects, the techniques described herein relate to a computer-implemented method, wherein the feature value includes cancer stage.
[0010] In some aspects, the techniques described herein relate to a computer-implemented method, wherein the feature value includes age.
[0011] In some aspects, the techniques described herein relate to a computer-implemented method, wherein the feature value includes smoking status.
[0012] In some aspects, the techniques described herein relate to a computer-implemented method, wherein the at least one sample dataset and the reference dataset each include a methylation signal value.
[0013] In some aspects, the techniques described herein relate to a computer-implemented method, wherein the at least one sample dataset and the reference dataset each include a tumor fraction value.
[0014] In some aspects, the techniques described herein relate to a computer-implemented method, the method further including: generating, using the processor, a methylation differential value calculated as a difference between a sample methylation signal value associated with a differentially methylated region and a reference methylation signal value associated with a reference sample; comparing, using the processor, the methylation differential value with a first differential threshold and with a second differential threshold; and identifying, based on the comparing and responsive to determining that the methylation differential value is greater than the first differential threshold, a hypermethylated region, or identifying, based on the comparing and responsive to determining that the methylation differential value is less than the second differential threshold, a hypomethylated region, or identifying, based on the comparing and responsive to determining that the methylation differential value is less than the first differential threshold and greater than the second differential threshold, an insufficiently differentiated region.
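Paragraph [0014] classifies a region by comparing its methylation differential value (sample signal minus reference signal) against two thresholds. The minimal Python sketch below assumes that the first differential threshold is the upper (positive) cut-off and the second is the lower (negative) cut-off; the enum and function names are hypothetical. The disclosure does not say how a value exactly equal to a threshold is handled, so this sketch labels such values insufficiently differentiated.

```python
from enum import Enum


class RegionCall(Enum):
    HYPERMETHYLATED = "hypermethylated"
    HYPOMETHYLATED = "hypomethylated"
    INSUFFICIENTLY_DIFFERENTIATED = "insufficiently differentiated"


def classify_region(sample_signal: float, reference_signal: float,
                    first_threshold: float, second_threshold: float) -> RegionCall:
    """Classify a region from its methylation differential value (sample minus reference)."""
    if first_threshold <= second_threshold:
        raise ValueError("first_threshold must exceed second_threshold")
    differential = sample_signal - reference_signal
    if differential > first_threshold:
        return RegionCall.HYPERMETHYLATED
    if differential < second_threshold:
        return RegionCall.HYPOMETHYLATED
    # Between the thresholds (or exactly on one): not differentiated enough to call.
    return RegionCall.INSUFFICIENTLY_DIFFERENTIATED


print(classify_region(0.8, 0.2, first_threshold=0.3, second_threshold=-0.3))
```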
[0015] In some aspects, the techniques described herein relate to a computer-implemented method, wherein the at least one sample and the reference sample are associated with a sample type selected from the group consisting of cell-free DNA (cfDNA), cell-free RNA (cfRNA), bone marrow, urine, tissue, saliva, and plasma.
[0016] In some aspects, the techniques described herein relate to a system for detecting initiating methylation changes, including: one or more processors; and one or more computer-readable media storing instructions that are executable by the one or more processors to perform operations to: receive, at a computer system, genomic data associated with at least one sample, wherein the genomic data includes sequenced genetic material corresponding to the at least one sample; generate, using a processor associated with the computer system, a feature value for at least one region of the at least one sample; assemble, using the processor, at least one sample group based upon the feature value for the at least one sample, wherein each sample group is assembled according to user-defined feature value criteria; generate, using the processor, at least one sample dataset for each of the at least one sample groups; compare, using the processor, each at least one sample dataset associated with a sample group to a reference dataset associated with a reference sample; identify, based on the comparing and responsive to determining that an at least one sample dataset meets a differential threshold with the reference dataset, a differentially methylated region; generate, using the processor, an effect size for each differentially methylated region, wherein the effect size corresponds to a mean differential of the at least one sample dataset and the reference dataset and is associated with the feature value; compare, using the processor, the effect size associated with each feature value to the effect size associated with each additional feature value; and identify, responsive to determining that a change in the effect size over feature values meets an initiating threshold, an initiating methylation change, or responsive to determining that the change in effect size over feature values does not meet an initiating threshold, an absence of an initiating methylation change.
[0017] In some aspects, the techniques described herein relate to a system, wherein the feature value includes cancer type.
[0018] In some aspects, the techniques described herein relate to a system, wherein the feature value includes cancer stage.
[0019] In some aspects, the techniques described herein relate to a system, wherein the feature value includes age.
[0020] In some aspects, the techniques described herein relate to a system, wherein the feature value includes smoking status.
[0021] In some aspects, the techniques described herein relate to a system, wherein the at least one sample dataset and the reference dataset each include a methylation signal value.
[0022] In some aspects, the techniques described herein relate to a system, wherein the at least one sample dataset and the reference dataset each include a tumor fraction value.
[0023] In some aspects, the techniques described herein relate to a system, wherein the instructions stored on the one or more computer-readable media are further executable by the one or more processors to perform operations to: generate, using the processor, a methylation differential value calculated as a difference between a sample methylation signal value associated with a differentially methylated region and a reference methylation signal value associated with a reference sample; compare, using the processor, the methylation differential value with a first differential threshold and with a second differential threshold; and identify, based on the comparing and responsive to determining that the methylation differential value is greater than the first differential threshold, a hypermethylated region, or identify, based on the comparing and responsive to determining that the methylation differential value is less than the second differential threshold, a hypomethylated region, or identify, based on the comparing and responsive to determining that the methylation differential value is less than the first differential threshold and greater than the second differential threshold, an insufficiently differentiated region.
[0024] In some aspects, the techniques described herein relate to a system, wherein the at least one sample and the reference sample are associated with a sample type selected from the group consisting of cell-free DNA (cfDNA), cell-free RNA (cfRNA), bone marrow, urine, tissue, saliva, and plasma.
[0025] In some aspects, the techniques described herein relate to a non-transitory computer-readable medium storing computer-executable instructions which, when executed by a system, cause the system to perform operations including: receiving, at a computer system, genomic data associated with at least one sample, wherein the genomic data includes sequenced genetic material corresponding to the at least one sample; generating, using a processor associated with the computer system, a feature value for at least one region of the at least one sample; assembling, using the processor, at least one sample group based upon the feature value for the at least one sample, wherein each sample group is assembled according to user-defined feature value criteria; generating, using the processor, at least one sample dataset for each of the at least one sample groups; comparing, using the processor, each at least one sample dataset associated with a sample group to a reference dataset associated with a reference sample; identifying, based on the comparing and responsive to determining that an at least one sample dataset meets a differential threshold with the reference dataset, a differentially methylated region; generating, using the processor, an effect size for each differentially methylated region, wherein the effect size corresponds to a mean differential of the at least one sample dataset and the reference dataset and is associated with the feature value; comparing, using the processor, the effect size associated with each feature value to the effect size associated with each additional feature value; and identifying, responsive to determining that a change in the effect size over feature values meets an initiating threshold, an initiating methylation change, or responsive to determining that the change in effect size over feature values does not meet an initiating threshold, an absence of an initiating methylation change.
[0026] In some aspects, the techniques described herein relate to a non-transitory computer-readable medium, wherein the feature value includes cancer type.
[0027] In some aspects, the techniques described herein relate to a non-transitory computer-readable medium, wherein the feature value includes cancer stage.
[0028] In some aspects, the techniques described herein relate to a non-transitory computer-readable medium, wherein the feature value includes age.
[0029] In some aspects, the techniques described herein relate to a non-transitory computer-readable medium, wherein the feature value includes smoking status.
[0030] In some aspects, the techniques described herein relate to a non-transitory computer-readable medium, wherein the at least one sample dataset and the reference dataset each include a methylation signal value.
[0031] In some aspects, the techniques described herein relate to a non-transitory computer-readable medium, wherein the at least one sample dataset and the reference dataset each include a tumor fraction value.
[0032] In some aspects, the techniques described herein relate to a non-transitory computer-readable medium, wherein the computer-executable instructions, when executed by the system, cause the system to perform operations further including: generating, using the processor, a methylation differential value calculated as a difference between a sample methylation signal value associated with a differentially methylated region and a reference methylation signal value associated with a reference sample; comparing, using the processor, the methylation differential value with a first differential threshold and with a second differential threshold; and identifying, based on the comparing and responsive to determining that the methylation differential value is greater than the first differential threshold, a hypermethylated region, or identifying, based on the comparing and responsive to determining that the methylation differential value is less than the second differential threshold, a hypomethylated region, or identifying, based on the comparing and responsive to determining that the methylation differential value is less than the first differential threshold and greater than the second differential threshold, an insufficiently differentiated region.
[0033] In some aspects, the techniques described herein relate to a system, wherein the at least one sample and the reference sample are associated with a sample type selected from the group consisting of cell-free DNA (cfDNA), cell-free RNA (cfRNA), bone marrow, urine, tissue, saliva, and plasma.
[0034] In some aspects, the techniques described herein relate to a computer-implemented method for identifying a differentially methylated region, including: receiving, at a computer system, genomic data associated with at least one sample, wherein the genomic data includes sequenced genetic material corresponding to the at least one sample; generating, using a processor associated with the computer system, a feature value for at least one region of the at least one sample; assembling, using the processor, at least one sample group based upon the feature value for the at least one sample, wherein each sample group is assembled according to user-defined feature value criteria; generating, using the processor, at least one sample dataset for each of the at least one sample groups; comparing, using the processor, each at least one sample dataset associated with a sample group to a reference dataset associated with a reference sample; and identifying, based on the comparing and responsive to determining that an at least one sample dataset meets a differential threshold with the reference dataset, a differentially methylated region.
[0035] In some aspects, the techniques described herein relate to a computer-implemented method for detecting initiating methylation changes, including: receiving, at a computer system, genomic data associated with at least one sample, wherein the genomic data includes sequenced genetic material corresponding to the at least one sample; receiving, at a computer system, genomic data associated with at least one reference sample, wherein the genomic data associated with the at least one reference sample includes a reference dataset; generating, using a processor associated with the computer system, a feature value and at least one sample dataset for at least one region of the at least one sample; comparing, using the processor, each at least one sample dataset associated with a sample group to the reference dataset; identifying, based on the comparing and responsive to determining that an at least one sample dataset meets a differential threshold with the reference dataset, a differentially methylated region; generating, using the processor, an effect size for at least one differentially methylated region associated with the at least one sample, wherein the effect size corresponds to a mean differential of the at least one sample dataset and a reference dataset, wherein the reference dataset is associated with a reference sample; comparing, using the processor, the effect size associated with each feature value to the effect size associated with each additional feature value; and identifying, responsive to determining that a change in the effect size over feature values meets an initiating threshold, an initiating methylation change, or responsive to determining that the change in effect size over feature values does not meet an initiating threshold, an absence of an initiating methylation change.
[0036] In some aspects, the techniques described herein relate to a computer-implemented method for detecting initiating methylation changes, including: receiving, at a computer system, genomic data associated with at least one sample and at least one reference sample, wherein the genomic data includes sequenced genetic material; generating, using a processor associated with the computer system, a feature value for at least one region of the at least one sample; assembling, using the processor, at least one sample group based upon the feature value for the at least one sample, wherein each sample group is assembled according to user-defined feature value criteria; generating, using the processor, at least one sample dataset for each of the at least one sample groups and a reference dataset for the at least one reference sample; generating, using the processor, an effect size for at least one differentially methylated region, wherein the effect size corresponds to a mean differential of each sample dataset and each reference dataset and is associated with the feature value; comparing, using the processor, the effect size associated with each feature value to the effect size associated with each additional feature value; and identifying, responsive to determining that a change in the effect size over feature values meets an initiating threshold, an initiating methylation change, or responsive to determining that the change in effect size over feature values does not meet an initiating threshold, an absence of an initiating methylation change.
[0037] In some aspects, the techniques described herein relate to a computer-implemented method for detecting initiating methylation changes, including: receiving, at a computer system, genomic data associated with at least one sample, wherein the genomic data includes sequenced genetic material corresponding to the at least one sample; generating, using a processor associated with the computer system, a feature value for at least one region of the at least one sample; generating, using the processor, at least one sample dataset for each of the at least one samples; comparing, using the processor, each at least one sample dataset associated with each sample to a reference dataset associated with a reference sample; identifying, based on the comparing and responsive to determining that an at least one sample dataset meets a differential threshold with the reference dataset, a differentially methylated region; generating, using the processor, an effect size for each differentially methylated region, wherein the effect size corresponds to a mean differential of the at least one sample dataset and the reference dataset and is associated with the feature value; comparing, using the processor, the effect size associated with each feature value to the effect size associated with each additional feature value; and identifying, responsive to determining that a change in the effect size over feature values meets an initiating threshold, an initiating methylation change, or responsive to determining that the change in effect size over feature values does not meet an initiating threshold, an absence of an initiating methylation change.
[0038] Additional objects and advantages of the disclosed embodiments will be set forth in part in the description that follows, and in part will be apparent from the description, or may be learned by practice of the disclosed embodiments. The objects and advantages of the disclosed embodiments will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims.
[0039] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosed embodiments, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[0040] The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate several embodiments, and together with the description, serve to explain the principles of the disclosure.
[0041] FIG. 1 depicts an exemplary illustration of methylation patterns for a cancerous tumor as differentiated from that of normal, non-cancerous tissue, based on hyper- or hypomethylation patterns, according to one or more aspects of the present disclosure.
[0042] FIG. 2 depicts an example illustration of detection of differentially methylated regions (DMRs) for cancerous versus non-cancerous genetic material, according to one or more aspects of the present disclosure.
[0043] FIG. 3A depicts an example illustration of regions that are not differentially methylated in cancerous versus non-cancerous genetic material, according to one or more aspects of the present disclosure.
[0044] FIG. 3B depicts an example illustration of differentially methylated regions (DMRs) for cancerous versus non-cancerous genetic material, according to one or more aspects of the present disclosure.
[0045] FIG. 4 depicts an example system environment, according to one or more embodiments of the present disclosure.
[0046] FIG. 5 depicts an example illustration of a process flow for a DMR identification component of an exemplary system environment, according to one or more embodiments of the present disclosure.
[0047] FIG. 6 depicts an example illustration of a process flow for DMR identification of an exemplary system environment, according to one or more embodiments of the present disclosure.
[0048] FIG. 7 depicts an example illustration of a process flow of the system environment illustrated in FIG. 1, according to one or more embodiments of the present disclosure.
[0049] FIG. 8 depicts a process flow for detecting hypermethylation and hypomethylation, according to one or more embodiments of the present disclosure.
[0050] FIG. 9 depicts an example illustration of a process flow for detecting initiating methylation events, according to one or more embodiments of the present disclosure.
[0051] FIG. 10 depicts an example illustration of the identification of a plurality of DMRs from cfDNA-derived sample groups, according to one or more aspects of the present disclosure.
[0052] FIG. 11 depicts an exemplary plot of the effect size of initiating methylation events, according to one or more aspects of the present disclosure.
[0053] FIG. 12 depicts an exemplary illustration of a feature coverage map for HOXA9, correlated across numerous cancers, according to one or more aspects of the present disclosure.
[0054] FIG. 13 depicts an exemplary illustration of HOXA9 methylation signals shown across different cancer stages relative to a non-cancer signal, according to one or more aspects of the present disclosure.
[0055] FIG. 14 depicts an exemplary illustration of HOXA9 methylation signals for different cancer types shown across different cancer stages relative to a non-cancer signal, according to one or more aspects of the present disclosure.
[0056] FIG. 15 depicts an exemplary illustration of calculated methylation signal corresponding to individual cancer stages and shown relative to a non-cancerous sample, according to one or more aspects of the present disclosure.
[0057] FIG. 16 depicts an exemplary illustration of a sample plot of calculated beta values for a number of sample groups referenced against a reference sample, according to one or more aspects of the present disclosure.
[0058] FIG. 17 depicts an exemplary plot of the effect size of initiating methylation events, according to one or more aspects of the present disclosure.
[0059] FIG. 18 depicts an example computing system, according to one or more aspects of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
[0060] The terminology used below may be interpreted in its broadest reasonable manner, even though it is being used in conjunction with a detailed description of certain specific examples of the present disclosure. Indeed, certain terms may even be emphasized below; however, any terminology intended to be interpreted in any restricted manner will be overtly and specifically defined as such in this detailed description section. Both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the features, as claimed.
[0061] In the realm of modern cancer medicine, early detection is one of the most promising tools in cancer treatment. Many cancers are often found too late, when prognosis may be poor and treatment options are limited. Recent advances in early cancer detection include testing methods that detect signals (e.g., methylation patterns) in genetic materials that are common to or shared by many cancers.
[0062] In general, for example, researchers and/or clinicians may extract and sequence nucleic acids, such as DNA (e.g., cell-free DNA or cfDNA) fragments or RNA (e.g., cell-free RNA or cfRNA) from an individual’s biological sample to determine whether those DNA or RNA fragments originate from healthy cells or cancerous cells. Biological samples may include blood samples, urine samples, or any other suitable samples. The human body naturally sheds small amounts of DNA and RNA into the bloodstream, including DNA and RNA from both cancerous and non-cancerous cells. To determine whether a particular DNA or RNA fragment in an individual’s blood originated from a cancerous cell or a non-cancerous cell, testing procedures and systems may be utilized to detect the presence (or absence) of methylation patterns that may be associated with cancer. If cancerous DNA or RNA fragments are detected above a certain threshold, these procedures and methods may further identify the cells from which the cancerous fragments originated (i.e., cancer signal of origin (CSO) prediction).
[0063] Modern genetic testing and investigation may utilize differential methylation analyses to detect cancer signal(s) in a biological sample, including identification of one or more of cancer stage, tumor fraction, and/or cancer signal origin site(s).
[0064] For example, and as shown in FIG. 1, the DNA methylation pattern for cancerous tumor tissue may be different from that of normal, non-cancerous tissue. In FIG. 1, the filled-in circles indicate methylated regions of DNA, whereas the empty circles indicate unmethylated regions of DNA. As shown in FIG. 1, tumor tissue may have one or more regions of DNA that are hypermethylated compared to normal tissue. In other aspects, tumor tissue may have one or more regions of DNA that are hypomethylated compared to normal tissue. In some instances, these DMRs may be indicative of cancer. Accordingly, analysis of DMRs may provide insight into whether or not cancer is present in an individual.
[0065] For example, and as shown in FIG. 2, detecting differentially methylated regions for cancerous versus non-cancerous genetic material may allow for identification of specific portions of the genome that are relevant for cancer detection, cancer detection verification, and/or identifying the cancer stage or cancer signal of origin. Specifically, methylation may be analyzed at certain regions of the DNA, such as CpG sites, and the measured methylation percentages may be compared as between cancer and non-cancer genetic materials. By identifying these DMRs as relevant to cancer detection, a clinician or investigator may focus on methylation patterns at specific DMRs (e.g., at specific CpG sites) to determine whether a patient’s genetic material includes relevant DMRs. This process may permit identification or verification of a patient’s cancer, cancer stage, and/or the cancer signal origin.
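The per-site comparison described above can be sketched in Python. A common way to express the methylation level of a CpG site is the percentage of sequencing reads covering that site that are methylated; this sketch assumes such read counts are already available per site and reports the cancer minus non-cancer difference for sites covered in both inputs. The function names, site IDs, and counts are made up for illustration.

```python
def methylation_percentage(methylated_reads: int, unmethylated_reads: int) -> float:
    """Percentage of reads at a CpG site that are methylated."""
    total = methylated_reads + unmethylated_reads
    if total == 0:
        raise ValueError("no reads cover this CpG site")
    return 100.0 * methylated_reads / total


def site_differences(cancer_counts: dict[str, tuple[int, int]],
                     non_cancer_counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Cancer minus non-cancer methylation percentage for CpG sites present in both inputs.

    Each input maps a site ID to (methylated_reads, unmethylated_reads).
    """
    shared = sorted(cancer_counts.keys() & non_cancer_counts.keys())
    return {site: methylation_percentage(*cancer_counts[site])
            - methylation_percentage(*non_cancer_counts[site])
            for site in shared}


# Hypothetical counts; cpg_3 is only covered in the non-cancer input, so it is skipped.
cancer = {"cpg_1": (45, 5), "cpg_2": (10, 10)}
non_cancer = {"cpg_1": (5, 45), "cpg_2": (9, 11), "cpg_3": (3, 7)}
print(site_differences(cancer, non_cancer))  # {'cpg_1': 80.0, 'cpg_2': 5.0}
```

A large positive or negative difference (as at `cpg_1`) would mark a candidate site of interest, whereas a small one (as at `cpg_2`) would not.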
[0066] Similarly, and as shown in FIGS. 3A-3B, non-cancer samples and cancer samples are graphically plotted, wherein methylation percentage is displayed on the x-axis and density is displayed on the y-axis.
[0067] As shown in the plots of FIG. 3A, the non-cancer samples and the cancer samples are not significantly differentially methylated, as these plots contain substantial overlap and relative proximity at specific coordinates. In contrast, FIG. 3B illustrates an example of a differentially methylated plot, in which there is relatively little overlap and proximity between the plots of a non-cancer sample and a cancer sample.
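One simple way to quantify the overlap contrasted in FIGS. 3A-3B is the shared area of two normalized histograms of methylation percentages. This is an illustrative assumption, since the disclosure does not name an overlap metric; the ten-bin layout and the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def histogram(values: Sequence[float], n_bins: int = 10) -> list[float]:
    """Normalized histogram of methylation percentages over [0, 100]."""
    width = 100.0 / n_bins
    # Clamp so a value of exactly 100 falls in the last bin.
    counts = Counter(min(int(v / width), n_bins - 1) for v in values)
    return [counts[b] / len(values) for b in range(n_bins)]


def overlap_coefficient(a: Sequence[float], b: Sequence[float], n_bins: int = 10) -> float:
    """Shared area of two normalized histograms: 1.0 = identical, 0.0 = disjoint."""
    ha, hb = histogram(a, n_bins), histogram(b, n_bins)
    return sum(min(x, y) for x, y in zip(ha, hb))


non_cancer = [5.0, 8.0, 12.0, 15.0]
print(overlap_coefficient(non_cancer, [6.0, 9.0, 11.0, 14.0]))   # FIG. 3A-like overlap
print(overlap_coefficient(non_cancer, [75.0, 82.0, 88.0, 91.0]))  # FIG. 3B-like separation
```

A value near 1.0 corresponds to the heavily overlapping plots of FIG. 3A, and a value near 0.0 to the well-separated plots of FIG. 3B.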
[0068] As described above, simple, static comparisons of differential methylation in cancer versus non-cancer samples often return a lengthy listing of DMRs, including potentially overbroad or less relevant DMR listings. For example, these comparisons may fail to account for relevant biological heterogeneity that reflects cancer progression, such as cancer stage (such as stages I to IV, or stage II to stage IV) or tumor fraction level.
[0069] Accordingly, the present disclosure provides a novel approach for differential methylation analysis via identification of initiating methylation changes. Embodiments of the disclosure identify and focus on methylation changes that occur in the earlier stages of cancer (e.g., stage I and / or stage II) and continue in later stages, with an increasing velocity, to identify methylation changes that drive cancer progression. By focusing on DMRs that are present in both early and later stages of cancer, the systems and methods described herein may return a more relevant and informative list of DMRs, representing persistent methylation changes that may be enriched for initiating cancer events. The results of these analyses demonstrate that such initiating methylation changes may persist as cancer progresses and may be present across multiple, different cancer types, where increasing cancer signal and / or cancer signal velocity may be driven by tumor fraction.
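The selection of initiating methylation changes can be sketched as a per-region rule over stage-specific results. This is a minimal sketch under stated assumptions: the 0.05 significance cutoff, the choice of stages I and II as "early", and the reading of "increasing velocity" as a differential that never shrinks from stage to stage and ends larger than it began are all hypothetical. A stricter reading would additionally require each successive stage-to-stage step to be larger than the last.

```python
from typing import Mapping

STAGES = ("I", "II", "III", "IV")


def is_initiating_change(
    delta: Mapping[str, float],   # mean cancer minus mean non-cancer methylation, per stage
    pvalue: Mapping[str, float],  # p-value for that difference, per stage
    alpha: float = 0.05,          # hypothetical significance cutoff
    early: tuple[str, ...] = ("I", "II"),
) -> bool:
    """Flag a region whose differential signal appears early and grows with stage."""
    # 1. Present early: significant in at least one early stage.
    onset = next((s for s in STAGES if s in early and pvalue[s] < alpha), None)
    if onset is None:
        return False
    later = STAGES[STAGES.index(onset):]
    direction = 1 if delta[onset] > 0 else -1  # hyper- (+1) or hypomethylation (-1)
    # 2. Persists: significant, with the same sign, at every stage from onset onward.
    if any(pvalue[s] >= alpha or delta[s] * direction <= 0 for s in later):
        return False
    # 3. Grows: the signed magnitude never shrinks and ends larger than at onset.
    mags = [delta[s] * direction for s in later]
    steps = [b - a for a, b in zip(mags, mags[1:])]
    return all(step >= 0 for step in steps) and mags[-1] > mags[0]


p = {s: 0.01 for s in STAGES}
print(is_initiating_change({"I": 0.05, "II": 0.08, "III": 0.15, "IV": 0.30}, p))  # True
```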
[0070] Such approaches may provide a more general framework for detecting cancer initiating methylation events or alternative cancer initiating events. For example, these applications may be utilized for one or more of cancer detection, verification of cancer signal detection predictions, and understanding disease mechanism(s) to inform treatment modalities and targeting. Such DMRs may also be interpreted with other measures of cancer progression, such as tumor fraction, which may yield further insights. It will be appreciated by one of ordinary skill in the art that the present disclosure is not limited to use with humans; rather, the present disclosure may similarly be applied in non-human applications (e.g., differential methylation analysis for cancer detection in non-human organisms).
[0071] Traditional genetic testing methods rely on static comparisons of methylation patterns between samples, which may yield large and overbroad datasets with limited relevance to disease progression. These methods often fail to account for biological heterogeneity, such as differences in cancer stages, tumor fraction levels, or other dynamic factors. The concepts described herein introduce advanced data processing techniques that dynamically analyze methylation signal changes over various biological feature values, such as cancer stage or tumor fraction. This approach allows the system to: i) refine the identification of DMRs by filtering out irrelevant or less significant data; and ii) focus on initiating methylation changes that are biologically meaningful and directly related to disease progression, thereby providing more targeted and actionable data. By enabling such complex and targeted analysis, the described concepts improve the computer’s ability to process large genomic datasets efficiently, which represents an improvement to computer technology by optimizing how genomic data is managed, compared, and processed. Additionally, the disclosed methods may be automated such that the system can automatically process biological samples, group data based on user-defined features, and execute complex statistical comparisons against reference datasets. This automation minimizes the need for manual intervention, thereby reducing human error. Furthermore, the statistical and algorithmic processes involved in the computer analysis cannot be performed in the human mind due to the complexity, volume, and speed of data processing required, as well as the need for advanced statistical modeling and real-time analysis. For instance, genomic datasets, particularly when they include methylation patterns, include millions or even billions of data points (e.g., methylation levels at CpG sites across the genome). 
Comparing methylation levels between samples, generating feature values, and assembling sample groups based on these comparisons involve computations on a scale that far exceeds human cognitive capabilities.
[0072] Using the results of the methylation analysis, the system described herein may perform one or more automatic downstream actions. For instance, using the results of the methylation analysis, a system may automatically recommend personalized treatment plans for a subject. For example, the system may compare a subject’s methylation profile to a pre-existing database of known methylation patterns associated with different treatment responses. This database may be populated with data from clinical trials and studies, correlating specific methylation changes with subject outcomes for various treatments. Based on the identified methylation patterns and their correlation with treatment efficacy, the system may automatically generate a treatment recommendation. In an aspect, the system may incorporate patient-specific data, such as age, medical history, previous treatment responses, other contextual data, etc., to further tailor the treatment recommendation. For example, a subject with a certain methylation profile and a history of intolerance to chemotherapy may automatically be recommended an alternative therapy type.
[0073] In an embodiment, the system may automatically monitor the subject’s progress by analyzing follow-up methylation data and adjust the treatment plan as needed. More particularly, the system may regularly receive new samples from the subject, process the methylation data, and compare the new data to previous methylation profiles. By comparing changes in methylation patterns, the system may determine whether the subject is responding to the current treatment. If the system detects that the treatment is not reducing the disease-specific methylation signals, it can automatically issue an alert to a clinician, or another individual or healthcare provider, suggesting a review of the treatment plan. Additionally or alternatively, the system may be able to dynamically suggest a modification to an existing treatment plan based on the analysis of new data and its comparison to a subject’s methylation profile.
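The follow-up comparison can be sketched as a check of the newest sample's signal against the previous one. The relative-drop rule, the 10% default, and the `assess_response` name are hypothetical; the text states only that an alert may issue when disease-specific methylation signals are not reducing.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class MonitoringResult:
    responding: Optional[bool]  # None when no comparison could be made
    message: str


def assess_response(
    signal_history: Sequence[float],  # disease-specific methylation signal per sample, oldest first
    min_relative_drop: float = 0.10,  # hypothetical: require a >= 10% drop since the prior sample
) -> MonitoringResult:
    """Compare the newest follow-up signal with the previous one and flag non-response."""
    if len(signal_history) < 2:
        return MonitoringResult(None, "Insufficient follow-up data; no comparison made.")
    previous, latest = signal_history[-2], signal_history[-1]
    if previous <= 0:
        return MonitoringResult(None, "No measurable prior signal to compare against.")
    drop = (previous - latest) / previous
    if drop >= min_relative_drop:
        return MonitoringResult(True, f"Signal fell {drop:.0%} since the previous sample.")
    return MonitoringResult(False, "ALERT: signal not reducing; suggest clinician review of the treatment plan.")


print(assess_response([0.42, 0.40]).message)
```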
[0074] Additionally or alternatively, the results from methylation analysis may be used to stratify subjects based on their risk of disease progression. For instance, based on the methylation analysis, the system may generate a risk profile by identifying the subject’s likelihood of aggressive disease progression. For high-risk subjects, the system may automatically recommend early intervention measures, e.g., initiating aggressive treatment earlier or scheduling more frequent follow-up visits. The system can adjust the frequency of follow-up tests based on the subject’s risk level. For example, a high-risk subject may automatically be scheduled for monthly methylation tests, while a low-risk subject may be monitored every six months.
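The risk-tiered follow-up schedule can be sketched as a lookup keyed on a risk tier. The monthly and six-monthly intervals come from the text; the 0.5 cutoff, the 30- and 182-day approximations of those intervals, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Monthly vs. six-monthly follow-up, approximated in days (the approximation is an assumption).
FOLLOW_UP_DAYS = {"high": 30, "low": 182}


def risk_tier(progression_probability: float, high_cutoff: float = 0.5) -> str:
    """Map a modeled likelihood of aggressive progression to a tier (cutoff is hypothetical)."""
    return "high" if progression_probability >= high_cutoff else "low"


def next_test_date(last_test: date, progression_probability: float) -> date:
    """Schedule the next methylation test according to the subject's risk tier."""
    return last_test + timedelta(days=FOLLOW_UP_DAYS[risk_tier(progression_probability)])


print(next_test_date(date(2025, 1, 1), 0.8))  # 2025-01-31
```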
[0075] In an embodiment, the system may be configured to automatically match subjects to clinical trials based on their methylation profiles and disease progression. For instance, the system may analyze the subject’s methylation profile and compare it to the eligibility criteria of ongoing clinical trials stored in a database. Based on the methylation changes detected, the system may automatically identify trials for which the subject qualifies, particularly for new drugs targeting specific methylation-related pathways. In an embodiment, the system may send automatic alerts to the subject’s clinician, recommending enrollment in relevant clinical trials and providing trial details.
[0076] At the outset, samples may be collected from subjects, or may be received based on prior collection. Suitable samples include, e.g., liquid biopsy samples (e.g., blood samples (e.g., whole blood or plasma), urine, saliva, etc.) and / or solid biopsy (e.g., tissue, bone marrow, tumor) collection. Once collected, they may be processed according to user-defined processes. For example, genomic regions containing one or more CpG sites may be processed to exclude specific CpG sites (e.g., noisy CpG sites that overlap a single nucleotide polymorphism (SNP) site).
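The SNP-based exclusion of noisy CpG sites can be sketched as a set lookup. This sketch assumes each CpG site is recorded as the 0-based forward-strand position of its C, so the dinucleotide occupies `pos` and `pos + 1`; the tuple layout and the function name are hypothetical.

```python
from typing import Iterable


def exclude_snp_cpgs(
    cpg_sites: Iterable[tuple[str, int]],  # (chromosome, 0-based position of the C in the CpG)
    snp_sites: Iterable[tuple[str, int]],  # (chromosome, 0-based SNP position)
) -> list[tuple[str, int]]:
    """Drop CpG sites whose C or G base coincides with a known SNP position."""
    snps = set(snp_sites)
    return [
        (chrom, pos)
        for chrom, pos in cpg_sites
        if (chrom, pos) not in snps and (chrom, pos + 1) not in snps
    ]


print(exclude_snp_cpgs([("chr1", 100), ("chr1", 200), ("chr2", 100)], [("chr1", 201)]))
```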
[0077] The concepts described herein utilize a process involving nucleic acid (e.g., DNA or RNA) analyses to detect and identify genetic changes that drive cancer and to detect and analyze the velocity of those changes, the differential methylation signal across samples, and other persisting features. Although the specification discusses DNA in particular for convenience, it will be understood that an alternative genetic material or nucleic acid, such as RNA, may alternatively or additionally be used in embodiments of the disclosure.
[0078] The subject matter of the present disclosure will now be described more fully hereinafter with reference to the accompanying drawings, which form a part hereof, and which show, by way of illustration, specific exemplary embodiments. An embodiment or implementation described herein as “exemplary” is not to be construed as preferred or advantageous, for example, over other embodiments or implementations; rather, it is intended to reflect or indicate that the embodiment(s) is / are “example” embodiment(s). Subject matter may be embodied in a variety of different forms and, therefore, covered or claimed subject matter is intended to be construed as not being limited to any exemplary embodiments set forth herein; exemplary embodiments are provided merely to be illustrative. Likewise, a reasonably broad scope for claimed or covered subject matter is intended. Among other things, for example, subject matter may be embodied as methods, devices, components, or systems. Accordingly, embodiments may, for example, take the form of hardware, software, firmware, or any combination thereof. The following detailed description is, therefore, not intended to be taken in a limiting sense.
[0079] Throughout the specification and claims, terms may have nuanced meanings suggested or implied in context beyond an explicitly stated meaning. Likewise, the phrase “in one embodiment” or “in some embodiments,” or “in one aspect” or “in some aspects” as used herein does not necessarily refer to the same embodiment or aspect, and the phrase “in another embodiment” or “in another aspect” as used herein does not necessarily refer to a different embodiment or aspect. It is intended, for example, that claimed subject matter include combinations of exemplary embodiments in whole or in part.
[0080] Embodiments of the disclosure may be drawn to methods and systems for computing sample-level features (e.g., features that are bioinformatically extracted from genomic data and features indicative of different disease progression states, disease transitional states, disease types, and / or disease subtypes) from raw or pre-processed fragment methylation patterns, combining these fragment methylation patterns into user-defined groups, and then modeling the methylation states between groups to identify biologically and / or clinically meaningful DMRs. In some aspects, systems and methods described herein may be customizable, e.g., to allow users to test different statistical models and apply them genome-wide across many (e.g., thousands of) samples. In some aspects, biologically and / or clinically relevant DMRs identified by systems and methods of the present disclosure may be output and stored in a database (e.g., database 102D, described in reference to FIG. 4 below). Outputs of the disclosed systems and methods may include, e.g., methylation states genome-wide across different cancer types, or across all cancer types, and across demographic groups. In some aspects, such outputs may be used to inform how methylation drives different biological processes.
[0081] Referring now to FIG. 4, an exemplary system environment 100 is depicted that may be utilized to identify DMRs associated with cancer and to detect initiating methylation changes associated with various cancer features, including cancer stage, CSO, methylation signal, and / or tumor fraction level. Additionally or alternatively, the system environment may be used to validate or invalidate previously identified DMRs. The system environment 100 may receive one or more samples 10 and may include a computing device 102. Sample 10 may be from any source (e.g., whole blood, blood plasma, urine, tissue, white blood cell, solid tumor, etc.), of any type (e.g., cfDNA, genomic DNA (gDNA)), or of any assay type, including data drawn from an existing methylation dataset.
[0082] Although depicted in FIG. 4 as components all belonging to a single computing device 102, it should be understood that one or more components, or portions thereof, may, in some embodiments, be integrated with or incorporated on other devices. For example, computing device 102 may be a user device that may be configured to interact with another device on which DMR identification component 105 may be incorporated. In some embodiments, operations or aspects of one or more of the components listed herein may be distributed amongst one or more other components. The one or more other components may be physically co-located or may be physically distributed (e.g., in a cloud computing environment). The one or more components may be owned and operated by one or more owners, although the overall orchestration of the components relevant to this disclosure may be performed at the direction of a single entity. Any suitable arrangement and / or integration of the various systems and devices of the environment 100 may be used.
[0083] In some embodiments, the components of the computing device 102 may be associated with a common entity (e.g., a single business or organization, etc.). Alternatively, one or more of the components may be associated with a different entity than another. In some embodiments, the computing device 102 may be a computer system such as, for example, a desktop computer, a mobile device, a tablet device, a laptop computer, a hybrid device, etc. The computing device 102 may include a display / user interface (UI) 102A, a processor 102B, a memory 102C, a database 102D, and / or a network interface 102E. The computing device may execute, by the processor or multiple processors 102B, an operating system (O/S) and at least one electronic application (each stored in memory 102C). The electronic application may be a desktop program, a browser program, a web client, or a mobile application program (which may also be a browser program in a mobile O/S), system control software, system monitoring software, software development tools, or the like. In an aspect, the application may manage the memory 102C, such as a database, to store and provide, e.g., DMRs associated with certain samples. The display / UI 102A may be a touch screen or a display with other input systems (e.g., mouse, keyboard, etc.) so that the user(s) may interact with the application and / or the O/S. The network interface 102E may be a TCP/IP network interface for, e.g., Ethernet or wireless communications with a network (not illustrated). The processor 102B, while executing the application, may generate data and / or receive user inputs from the display / UI 102A and / or receive / transmit messages to external components.
[0084] The electronic application, executed by processor 102B of computing device 102, may generate one or many points of data that can be accessed, viewed, and / or interacted with by a user of the computing device 102. As an example, the electronic application may enable users to view, edit, and control processing of sequence reads associated with received genomic data or the generation of feature values and / or datasets. A user may further utilize the electronic application to generate and compare genotype datasets among samples and as associated with feature values, as further described herein.
[0085] The computing device 102 may include an electronic data system and computer-readable memory, such as a hard drive, flash drive, disk, etc. In some embodiments, the computing device 102 includes and / or interacts with an application programming interface for exchanging data with other systems, e.g., one or more of the other components of the environment. The computing device 102 may include and / or act as the host for an application platform (e.g., a DMR identification component, an initiating methylation change detection component, etc.) that may be accessible by users and / or other components.
[0086] The processor 102B may include and / or execute instructions to implement a DMR identification component 105, which may include executing methylation toolbox 107A and tertiary analyses component 107B. Methylation toolbox 107A may be an application programming interface (API) configured to compute sample-level features from raw (or pre-processed) methylation patterns of genetic material, combine these features into user-defined groups, and model the methylation states between groups to identify biologically and / or clinically meaningful DMRs, as described further below. The user-defined groups may be, for example, participant samples, genomic intervals, or other group definitions.
[0087] The identified DMRs may then be output as methylation toolbox results that provide data concerning the methylation states across different cancer groups or cancer stages (and / or by alternative or additional user-defined metrics or criteria). The methylation toolbox results may be further processed and / or manipulated by tertiary analyses component 107B according to user-defined parameters.
[0088] The tertiary analysis component 107B may be utilized to manipulate and / or analyze the results of the methylation toolbox 107A according to user-defined parameters to allow for interpretation of identified DMRs and / or initiating methylation changes. As a non-limiting example, methylation toolbox results may include identified DMRs that may be utilized for further analysis or applications. Similarly, for example, tertiary analysis component 107B may be utilized to annotate DMRs (e.g., by relevant genes, biological functions, pathways, type of gene regions, etc.) or to manipulate DMRs for specific applications or according to specific user-defined requirements. Similarly, for example, high-level analysis and plotting of methylation toolbox results may be generated by tertiary analysis component 107B, where for example, visualizations may be generated for clearer understanding and representation of methylation patterns and / or methylation variances, such as identification and representation of methylation trends across cancer stages (e.g., from stage I to stage IV, or from stage II to stage IV, as compared to non-cancer sample(s)). Additionally, for example, the tertiary analysis component 107B may utilize methylation toolbox results for exploratory modeling and inferencing prior to implementation in further applications, such as consolidating the methylation toolbox results to summarize results and biological insights by cancer type across the genome. According to one non-limiting embodiment, an implementation may be accomplished via executable software (e.g., software written in R and / or Python and / or other suitable programming languages).
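Annotation of DMRs by overlapping genes can be sketched with half-open interval overlap. The `Interval` record, the gene names in the usage line, and the linear scan are assumptions for illustration; a genome-scale implementation would more likely use an interval tree or a sorted sweep.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int      # 0-based, inclusive
    end: int        # exclusive
    name: str = ""


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def annotate_dmrs(dmrs: list[Interval], genes: list[Interval]) -> dict[str, list[str]]:
    """Map each DMR name to the names of genes it overlaps (linear scan for clarity)."""
    return {d.name: [g.name for g in genes if overlaps(d, g)] for d in dmrs}


dmrs = [Interval("chr1", 1000, 2000, "dmr1"), Interval("chr2", 0, 500, "dmr2")]
genes = [Interval("chr1", 1500, 3000, "GENE_A"), Interval("chr1", 2000, 2500, "GENE_B")]
print(annotate_dmrs(dmrs, genes))  # {'dmr1': ['GENE_A'], 'dmr2': []}
```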
[0089] In an embodiment, methylation toolbox 107A and tertiary analyses component 107B may both be contained within DMR identification component 105 on computing device 102. Alternatively, in another embodiment, one or both of methylation toolbox 107A and tertiary analyses component 107B may reside on other components located within the system environment 100. For example, methylation toolbox 107A may reside on computing device 102 and tertiary analyses component 107B may reside on another computing device or server (not illustrated).
[0090] In an embodiment, DMR identification component 105 may employ a “machine-learning model” or “trained classifier” that may be used in conjunction with the processes executed by methylation toolbox 107A and / or tertiary analysis component 107B. As used herein, a “machine-learning model” or “trained classifier” generally encompasses instructions, data, and / or a model configured to receive input, and apply one or more of a weight, bias, classification, or analysis on the input to generate an output. The output may include, for example, a classification of the input, an analysis based on the input, a design, process, prediction, or recommendation associated with the input, or any other suitable type of output. A machine-learning model is generally trained using training data, e.g., experiential data and / or samples of input data, which are fed into the model in order to establish, tune, or modify one or more aspects of the model, e.g., the weights, biases, criteria for forming classifications or clusters, or the like. Aspects of a machine-learning model may operate on an input linearly, in parallel, via a network (e.g., a neural network), or via any suitable configuration. In some aspects, the machine-learning model may be trained on a combination of real and synthetic sample data.
[0091] The execution of the machine-learning model may include deployment of one or more machine-learning techniques, such as k-nearest neighbors, linear regression, logistic regression, random forest, gradient boosted machine (GBM), deep learning, a deep neural network, and / or any other suitable machine-learning technique that solves problems in the field of Natural Language Processing (NLP). Supervised, semi-supervised, and / or unsupervised training may be employed. For example, supervised learning may include providing training data and labels corresponding to the training data, e.g., as ground truth. Unsupervised approaches may include clustering, classification or the like. K-means clustering or K-Nearest Neighbors may also be used, which may be supervised or unsupervised. Combinations of K-Nearest Neighbors and an unsupervised cluster technique may also be used. Any suitable type of training may be used, e.g., stochastic, gradient boosted, random seeded, recursive, epoch or batch-based, etc.
[0092] In an exemplary use case, a machine-learning model may be trained to analyze data from a test sample from a test subject whose status with respect to a medical condition is unknown and subsequently classify the unknown test sample based on the likelihood of the subject fitting into a particular category. In some embodiments, the one or more parameters may include a binomial probability score that is calculated based on logistic regression analysis. As disclosed herein, the binomial probability score may correspond to the likelihood of a subject having a certain medical condition, such as cancer. For example, a score above a predefined threshold may indicate that the subject associated with a test sample is more likely to have cancer than not have cancer. In some embodiments, the one or more parameters may include a sequencing or methylation data distribution pattern correlating with the presence of cancer. A subject associated with a test sample having sequencing or methylation data with a pattern resembling the cancer pattern to a sufficient degree may be predicted as having cancer. In some embodiments, a sequencing or methylation data distribution pattern may be identified in connection with a specific type of cancer, determining a tissue of origin or cancer signal origin, thus allowing a test sample to be classified as indicative of a certain cancer type.
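The logistic-regression score and threshold comparison described above can be sketched as follows. The weights, bias, and 0.5 threshold are hypothetical placeholders for parameters that would come from a trained model; the sigmoid is written in a numerically stable form so that extreme inputs do not overflow.

```python
import math
from typing import Sequence


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so this cannot overflow
    return e / (1.0 + e)


def logistic_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Binomial probability score from a fitted logistic regression: sigmoid(w . x + b)."""
    return _sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def predicts_cancer(features: Sequence[float], weights: Sequence[float], bias: float,
                    threshold: float = 0.5) -> bool:
    """True when the score exceeds the (hypothetical) predefined threshold."""
    return logistic_score(features, weights, bias) > threshold


print(predicts_cancer([2.0], [1.5], -1.0))  # True
```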
[0093] Referring now to FIG. 5A, a process for detecting initiating methylation changes in sample genetic material is provided, according to one or more aspects of the present disclosure. For example, the detecting process described herein may be utilized to detect and identify specific DMRs as relevant for identifying cancer(s) at specific parameters, including for example, at specific cancer stages, CSOs, and / or tumor fraction levels.
[0094] At step 205, input file 12 is received and processed via arbitrary filtering for user-defined metrics, where input file 12 may include data concerning participants, samples, genomic intervals, and / or group definitions. It will be appreciated that step 205 may be omitted (or partially omitted) where processing of input file 12 is unnecessary before proceeding to step 210. For example, input file 12 may include existing methylation datasets that do not require the preprocessing of step 205, or that require only limited preprocessing, such as previously processed data that is already properly configured for analysis. According to one non-limiting embodiment, the input sample(s) may be cfDNA, e.g., wherein the cfDNA is analyzed at 1k base pair bin intervals, and the cfDNA is sequenced via CCGA1 whole-genome bisulfite sequencing.
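Aggregating CpG-level calls into 1k base pair bins can be sketched as integer division on genomic position. The `(chrom, position, methylated_reads, total_reads)` record layout and the pooled methylated-read fraction used as the per-bin summary are assumptions; other per-bin metrics could be substituted.

```python
from collections import defaultdict
from typing import Iterable

BIN_SIZE = 1_000  # 1k base pair bins, per the embodiment above


def bin_methylation(
    calls: Iterable[tuple[str, int, int, int]],  # (chrom, position, methylated_reads, total_reads)
) -> dict[tuple[str, int], float]:
    """Return {(chrom, bin_start): pooled fraction of methylated reads in that bin}."""
    meth: dict[tuple[str, int], int] = defaultdict(int)
    total: dict[tuple[str, int], int] = defaultdict(int)
    for chrom, pos, m, t in calls:
        key = (chrom, (pos // BIN_SIZE) * BIN_SIZE)  # start coordinate of the enclosing bin
        meth[key] += m
        total[key] += t
    return {k: meth[k] / total[k] for k in total if total[k] > 0}


print(bin_methylation([("chr1", 150, 3, 4), ("chr1", 999, 1, 4), ("chr1", 1000, 4, 4)]))
```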
[0095] At step 210, methylation toolbox 107A analyzes the processed input file via DNA fragment level analyses, grouping of samples by user-defined metrics, and comparisons of resulting groupings according to user-defined metrics. Methylation toolbox 107A is configured to compute DNA fragment level metrics across arbitrary genomic intervals and different groupings of samples, synthesizing these metrics into blocks of consistent methylation profiles (e.g., DMRs). Methylation toolbox 107A may interface with tools configured to cluster large numbers of biosynthetic gene clusters (BGCs), for distributed computation of genome-wide analyses. Methylation toolbox 107A may filter DNA fragments based on at least one of p-value, length, number of CpGs, or other suitable metrics. Samples may be combined into groups (e.g., by cancer type, age, coverage, classification score, sample type, sample source, etc.), and computation may be performed for metrics and / or features per sample, per group, and / or between / among samples and / or groups. The aggregation of metrics over genomic intervals (e.g., per sample and / or per grouping) may be used to define DMRs within genomic regions jointly across all groups. Arbitrary intermediate files may be output for interim analyses and validation by, for example, tertiary analyses component 107B.
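The fragment filtering and group aggregation of step 210 can be sketched as below. The `Fragment` fields, every filter cutoff, and the two-level averaging (fragments to samples, then samples to groups) are hypothetical choices; the text states only that fragments may be filtered by p-value, length, and CpG content and that samples may be grouped by user-defined metrics.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Mapping


@dataclass
class Fragment:
    sample_id: str
    length: int             # fragment length in bp
    n_cpgs: int             # CpG sites covered by the fragment
    pvalue: float           # fragment-level p-value, assumed computed upstream
    methylated_frac: float  # fraction of covered CpGs that are methylated


def passes_filters(f: Fragment, max_p: float = 0.01, min_len: int = 50,
                   max_len: int = 500, min_cpgs: int = 3) -> bool:
    """Keep fragments meeting all (hypothetical) p-value, length, and CpG cutoffs."""
    return f.pvalue <= max_p and min_len <= f.length <= max_len and f.n_cpgs >= min_cpgs


def group_means(fragments: Iterable[Fragment], sample_group: Mapping[str, str]) -> dict[str, float]:
    """Average filtered fragments per sample, then average samples per user-defined group."""
    per_sample: dict[str, list[float]] = defaultdict(list)
    for f in fragments:
        if passes_filters(f):
            per_sample[f.sample_id].append(f.methylated_frac)
    per_group: dict[str, list[float]] = defaultdict(list)
    for sample_id, fracs in per_sample.items():
        per_group[sample_group[sample_id]].append(mean(fracs))
    return {group: mean(values) for group, values in per_group.items()}
```

Averaging per sample before per group keeps a single deeply sequenced sample from dominating its group; pooling all fragments directly would be an equally valid alternative design.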
[0096] It will be appreciated that step 205 may be performed external to DMR identification component 105, wherein the data to be output in step 205 may be previously provided as part of input file 12 where, for example, input file 12 may include existing methylation datasets that do not require preprocessing or may require limited preprocessing.
[0097] At step 215, the methylation toolbox results are output for performance of tertiary analysis at step 220, wherein the methylation toolbox results may be manipulated and / or analyzed according to user-defined parameters to allow for interpretation of the identified DMRs and / or initiating methylation changes. As a non-limiting example, methylation toolbox results may include identified DMRs that may be utilized for further analysis or applications.
[0098] Similarly, for example, at step 220, DMRs may be annotated (e.g., by relevant genes, biological functions, pathways, type of gene regions, etc.) or may be manipulated for specific applications or according to specific user-defined requirements. Similarly, for example, high-level analysis and plotting of methylation toolbox results may be generated, where for example, visualizations may be generated for clearer understanding and representation of the methylation patterns and / or methylation variances, such as identification and representation of methylation trends across cancer stages (e.g., from stage I to stage IV, or from stage II to stage IV, as compared to non-cancer sample(s)). Additionally, for example, the methylation toolbox results may be used for exploratory modeling and inferencing prior to implementation in further applications, such as consolidating the methylation toolbox results to summarize results and biological insights by cancer type across the genome. According to one non-limiting embodiment, an implementation may be accomplished via executable software (e.g., software written in R and / or Python and / or other suitable programming languages).
[0099] At step 225, the tertiary analysis results are output, whereupon the results may be analyzed or examined by a user or may be submitted for further processing.
[0100] In some aspects, biologically and / or clinically relevant DMRs identified may be output and stored in a database (e.g., database 102D, described in reference to FIG. 4). Outputs of the disclosed systems and methods may include, e.g., methylation states genome-wide across different cancer types, or across all cancer types, and across demographic groups. In some aspects, such outputs may be used to inform how methylation drives different biological processes.
[0101] According to one non-limiting embodiment, the described system and method may execute separate processing events for samples from each cancer stage. Similarly, each separate processing event may be executed for one or multiple cancer types for each cancer stage.
[0102] For example, in a cancer-stage specific analysis, the methylation toolbox 107A may be utilized to compare cancerous reference samples from different cancer stages (e.g., stage I, stage II, stage III, or stage IV, or stages II through IV) with a non-cancer sample (or sample group) to identify specific DMRs associated with a respective cancer stage and to identify methylation changes across individual cancer stages to detect initiating methylation changes. Similarly, for a tumor fraction level analysis, the methylation toolbox may be utilized to compare cancerous reference samples having different tumor fraction levels with a non-cancer sample (or sample group) to identify specific DMRs associated with a respective tumor fraction level and to identify methylation changes across tumor fraction levels to detect initiating methylation changes.
[0103] The methylation toolbox 107A may compute the mean calculated methylation signal in cancerous samples and the mean calculated methylation signal in non-cancerous samples, and methylation toolbox 107A may compute the delta of these mean methylation signals for a specific region(s). The methylation toolbox 107A and / or tertiary analysis component 107B of DMR identification component 105 may then identify delta values that are statistically significant differentials for every region of every feature value being examined, including cancer type, tumor fraction level, and / or cancer stage. The output of these computations may be, e.g., p-values and / or effect sizes representing methylation differentials, wherein the output may be further manipulated by tertiary analysis component 107B to provide user-defined data, plotting, and / or analysis.
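The per-region delta of mean methylation signals and its significance can be sketched with a two-sample permutation test. The permutation test is a swapped-in stand-in, since the disclosure does not name a statistical model; the function name, the 2,000-permutation default, and the fixed seed are assumptions.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def delta_and_pvalue(
    cancer: Sequence[float],      # per-sample methylation signal in one region, cancer group
    non_cancer: Sequence[float],  # same region, non-cancer group
    n_permutations: int = 2000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (delta of group means, two-sided permutation p-value) for one region."""
    observed = mean(cancer) - mean(non_cancer)
    pooled = list(cancer) + list(non_cancer)
    n = len(cancer)
    rng = random.Random(seed)  # fixed seed for reproducible p-values
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        if abs(mean(pooled[:n]) - mean(pooled[n:])) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_permutations + 1)


d, p = delta_and_pvalue([0.82, 0.85, 0.90, 0.88, 0.80, 0.86], [0.10, 0.12, 0.08, 0.15, 0.11, 0.09])
print(round(d, 4), p < 0.05)  # 0.7433 True
```

In use, such a function would be applied to each region for each feature value (for example, each cancer stage or tumor fraction level), producing the per-region deltas and p-values that tertiary analysis could then summarize or plot.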
[0104] It will be appreciated that methylation toolbox 107A and / or tertiary analysis component 107B of DMR identification component 105 may alternatively identify delta values that are statistically significant differentials for one or more regions, one or more cancer types, and / or one or more cancer stages where an application requires that fewer than all regions, cancer types, and / or stages be examined. For example, this narrower, more targeted analysis may be part of a second (or subsequent) processing round to further refine and / or focus the initial processing round. This application may be used for performing targeted analysis on genomic regions that are identified as particularly relevant to the user's application or research. Similarly, this approach may also allow researchers to discard regions of little or no interest where, for example, the differential methylation signal(s) are not statistically significant.
[0105] For example, where the system environment 100 has identified a specific cancer type and / or a specific cancer stage, a user may perform additional targeted processing of specifically identified regions of interest to identify persistent features that are specific to cancer types and / or stages. Similarly, this approach may be used in post-treatment cancer detection applications, where a sample may be processed to detect minimal residual disease. For example, where an individual has been treated for a specific cancer type at a specific cancer stage, aspects of the present disclosure may be utilized to perform a narrow, targeted analysis of the individual’s sample post-treatment for potential detection of differential methylation relative to a non-cancerous sample and / or relative to an individual’s sample taken at the conclusion of treatment. Such an approach may allow detection of residual cancer and / or detection of cancer proliferation following treatment.
[0106] Referring now to FIG. 6, an illustration of a flow diagram 200 for detecting initiating methylation changes in sample genetic material is provided, according to one or more aspects of the present disclosure. In general, an input file 12 is input into and received by methylation toolbox 107A and is processed by methylation toolbox 107A, wherein the input file 12 may contain data concerning one or more genomic samples. In an embodiment, methylation toolbox 107A analyzes the input file 12 via DNA fragment level analyses, groups samples by user-defined metrics, and compares resulting groupings according to user-defined metrics. Methylation toolbox 107A may compute DNA fragment level metrics across arbitrary genomic intervals and different groupings of samples, synthesizing these metrics into blocks of consistent methylation profiles (e.g., DMRs).
[0107] The processed results of methylation toolbox 107A are output as methylation toolbox results 110, whereupon the methylation toolbox results 110 may be further processed by tertiary analysis component 107B. For example, tertiary analysis component 107B may further analyze the output of methylation toolbox 107A to allow for interpretation of the identified DMRs to detect initiating methylation changes, whereupon the processed results are output as DMR output 20.
[0108] Referring now to FIG. 7, an illustration of sample processing within the methylation toolbox 107A is provided. Sample 10 may be of any source (e.g., whole blood, blood plasma, urine, tissue, white blood cell, tumor, etc.), type (e.g., cfDNA, gDNA, cfRNA, RNA, etc.), or assay from any methylation dataset, and samples 10 or sample groups 22 may be of a single cancer type or of more than one cancer type. According to one non-limiting embodiment, the regions to be processed and analyzed may be 100 base pair intervals containing at least one CpG, although longer or shorter base pair intervals may be processed and analyzed. Similarly, the sample processing of FIG. 7 may be performed on an individually identified region or on more than one region, concurrently or consecutively.
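The 100 base pair region layout described above could be generated along the lines of the following minimal sketch. It assumes a plain reference sequence string and counts each CpG in the window that contains its C; the function name `cpg_windows` is hypothetical.

```python
def cpg_windows(sequence, window=100):
    """Return (start, end, n_cpg) for fixed-width windows holding at least one CpG.

    A CpG is counted in the window that contains its C. Coordinates are
    0-based and half-open; the final window may be shorter than `window`.
    """
    seq = sequence.upper()
    cpg_positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]
    regions = []
    for start in range(0, len(seq), window):
        end = min(start + window, len(seq))
        n_cpg = sum(1 for pos in cpg_positions if start <= pos < end)
        if n_cpg > 0:
            regions.append((start, end, n_cpg))
    return regions


toy = "A" * 50 + "CG" + "A" * 148 + "CG" + "A" * 48  # 250 bp, CpGs at 50 and 200
print(cpg_windows(toy))  # [(0, 100, 1), (200, 250, 1)]
```

Windows without any CpG (here, positions 100 to 199) are simply dropped, matching the "at least one CpG" condition.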
[0109] Fragment filter 305 may perform arbitrary filtering of sample fragments based upon user-defined metrics, including, for example, p-value, number of CpGs per fragment, fragment length, etc. It will be appreciated that fragment filter 305 processing steps may be optional where, for example, sample 10 has been previously processed according to user-defined metrics.
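A fragment filter of this kind might look like the following minimal sketch. The `Fragment` record and the default cutoffs (at least three CpGs, 50 to 500 bp) are hypothetical placeholders for user-defined metrics, not values taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    start: int
    end: int
    cpg_states: list  # one entry per covered CpG: 1 = methylated, 0 = unmethylated

    @property
    def length(self):
        return self.end - self.start


def filter_fragments(fragments, min_cpgs=3, min_length=50, max_length=500):
    """Keep fragments that satisfy the (hypothetical) user-defined cutoffs."""
    return [
        f for f in fragments
        if len(f.cpg_states) >= min_cpgs and min_length <= f.length <= max_length
    ]


fragments = [
    Fragment(0, 160, [1, 1, 0, 1]),  # kept
    Fragment(10, 40, [1, 1, 1]),     # too short (30 bp)
    Fragment(0, 200, [0, 1]),        # too few CpGs
]
kept = filter_fragments(fragments)
```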
[0110] Fragment to features module 310 may convert the methylation states of individual sample 10 fragments within a region into one or more sample features for that sample 10. For example, the fragment to features module 310 may process the means and variances of any values (e.g., beta values) of specific fragments or CpG sections (i.e., CpG islands) in the specified region, outputting three main metrics per sample 10: mean(meanBeta), var(meanBeta), and mean(varBeta). Sample coverage may also be tracked for quality control purposes.
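One way to read the three per-sample metrics is sketched below, under the assumption (not stated in the disclosure) that a fragment's beta is the fraction of its CpGs that are methylated and that varBeta is the variance of the CpG states within a fragment. Under that reading, mean(meanBeta), var(meanBeta), and mean(varBeta) summarize one sample's fragments in one region, and the fragment count serves as coverage.

```python
from statistics import mean, pvariance


def fragment_features(fragments):
    """Summarize one sample's fragments in one region.

    `fragments` is a list of per-fragment CpG state lists (1 = methylated,
    0 = unmethylated). Population variance is used throughout (an assumption).
    """
    mean_betas = [mean(states) for states in fragments]      # per-fragment beta
    var_betas = [pvariance(states) for states in fragments]  # within-fragment spread
    return {
        "mean(meanBeta)": mean(mean_betas),
        "var(meanBeta)": pvariance(mean_betas),
        "mean(varBeta)": mean(var_betas),
        "coverage": len(fragments),  # tracked for quality control
    }


features = fragment_features([[1, 1, 1, 1], [0, 0, 0, 0], [1, 0, 1, 0]])
```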
[0111] As another non-limiting example, fragment to features module 310 may process a histogram of fragment counts binned by the beta values of specific fragments, where such processing may use variable numbers of CpGs and different length formats. As another non-limiting example, fragment to features module 310 may process fit parameters where the beta values for a region are fit to a beta-binomial distribution. As another non-limiting example, fragment to features module 310 may process the quantiles of fragment or CpG beta values in a specified region. It will be appreciated that genomic regions may contain one or more CpGs, with options to exclude specific CpGs (e.g., noisy CpGs that may overlap an SNP site) where desired by a user, as regions are user-specified. In an aspect, computations within each region may be made independently of other regions, or may be made across, or integrated with, different regions.
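The histogram and quantile features mentioned above could be computed roughly as follows. This minimal sketch assumes per-fragment beta values in [0, 1] and equal-width bins; the bin count is an arbitrary illustrative choice, and the beta-binomial fit is omitted.

```python
from statistics import quantiles


def beta_histogram(fragment_betas, n_bins=5):
    """Count fragments per equal-width beta bin over [0, 1]."""
    counts = [0] * n_bins
    for beta in fragment_betas:
        # int(beta * n_bins) picks the bin; a beta of exactly 1.0 goes in the last bin.
        counts[min(int(beta * n_bins), n_bins - 1)] += 1
    return counts


def beta_quantiles(fragment_betas, n=4):
    """Cut points dividing the beta values into n equal-probability groups."""
    return quantiles(fragment_betas, n=n, method="inclusive")


betas = [0.0, 0.1, 0.25, 0.5, 0.75, 1.0]
print(beta_histogram(betas))  # [2, 1, 1, 1, 1]
```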
[0112] Samples to group module 315 computes group-level metrics from the set of samples 10 contained within a sample group 22 prior to determining status of the DMR in the sample 10 region. Particularly, samples to group module 315 receives a set(s) of sample features for a user-defined group of participants and computes / infers metrics describing the group or samples 10. It will be appreciated that if the user-defined metrics for the system environment 100 require sample-level metrics and / or inter-sample group 22 comparisons, samples to group module 315 may simply pass through the set of sample features for each group to the groups to DMR module 320. In an aspect, each group may be treated and / or processed independently of other groups.
[0113] As a non-limiting example, group-level computations performed by samples to group module 315 may include: the mean and variance over all the sample features; a histogram of sample features; and / or fit parameters to a distribution of sample features.
[0114] Sample groups 22 may contain features or other data for one or more samples 10 output from fragment to features module 310, and according to one non-limiting embodiment, a sample group 22 may contain approximately five samples 10. As illustrated in FIG. 7, there may be a plurality of sample groups, with M representing the total number of sample groups 22 in this non-limiting illustration.
[0115] Samples to group module 315 may perform groupings based upon arbitrary features or characteristics including, for example, grouping by: cancer type, cancer subtype, cancer stage, age, sex, coverage, classification score, sample type, sample source, and / or smoking status.
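Grouping by an arbitrary feature and computing group-level metrics, as described in the preceding paragraphs, might be sketched as follows. The sample record layout (a dict holding one numeric `"feature"`, such as a per-region methylation summary, plus metadata fields) is a hypothetical assumption made for illustration.

```python
from collections import defaultdict
from statistics import mean, pvariance


def group_metrics(samples, key):
    """Group samples by metadata field `key` and summarize each group.

    Each sample is a dict with a numeric "feature" (e.g., a per-region
    methylation summary) and metadata such as "stage" or "cancer_type".
    """
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[key]].append(sample["feature"])
    return {
        label: {"n": len(values), "mean": mean(values), "var": pvariance(values)}
        for label, values in groups.items()
    }


samples = [
    {"feature": 0.25, "stage": "II", "cancer_type": "lung"},
    {"feature": 0.75, "stage": "II", "cancer_type": "ovarian"},
    {"feature": 0.5, "stage": "IV", "cancer_type": "lung"},
]
by_stage = group_metrics(samples, "stage")
by_type = group_metrics(samples, "cancer_type")
```

Because the grouping key is just a parameter, the same function covers grouping by cancer stage, cancer type, sex, smoking status, or any other recorded characteristic.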
[0116] Groups to DMR module 320 defines the characteristics of a DMR and assesses the status of each sample group 22. Additionally, or alternatively, one or more samples 10 and / or sample groups 22 may be selected as a reference group / sample. For example, a user may choose one sample group 22 to serve as a reference sample 24, wherein the methylation levels of each sample group 22 may be assessed relative to the reference sample 24. In some aspects, reference sample 24 may be, e.g., a non-cancerous sample group 22. It will be appreciated by one of ordinary skill that reference sample 24 may consist of a single sample or sample group or, additionally or alternatively, may consist of one or more samples or sample groups. For example, groups to DMR module 320 may apply a DMR model across sample groups 22 and assess relative methylation among or between sample groups 22 and between sample groups 22 and a reference sample 24. Particularly, groups to DMR module 320 may process the output from the samples to group module 315 to determine which sample groups 22 are differentially methylated and, if so, the classification into which each sample group 22 falls (e.g., hypermethylated, hypomethylated, no change, or insufficiently differentiated). The complexity of the resulting model, and the statistical rigor or precision applied in classifying each sample group 22, may be defined by a user.
[0117] According to one non-limiting embodiment, the user may pick one sample group 22 (or sample 10) as a reference group 24, and the methylation levels and / or patterns of the remaining sample groups 22 (or a subset of those sample groups 22) may be assessed against the methylation levels and / or patterns of the reference group 24. For example, a reference group 24 may include a non-cancerous sample, and the remaining sample groups 22 may include exclusively cancerous samples, wherein said cancerous samples may include arbitrary sample groupings by features (e.g., by cancer type, cancer subtype, cancer stage, tumor fraction level, etc.). The methylation levels / patterns of the non-cancerous reference group 24 may thus be assessed against the methylation levels / patterns of the cancerous sample groups 22.
[0118] According to one non-limiting embodiment, groups to DMR module 320 computes DMRs for each sample group 22 as compared to the reference group 24 utilizing the relative difference between the distributions where, for example, the computed DMR signal is defined as the difference in means divided by the square root of the sum over all variance terms. If the DMR signal is greater than a first specified threshold value, the sample group(s) 22 is classified as hypermethylated; if the DMR signal is less than a second specified threshold value, the sample group(s) 22 is classified as hypomethylated; otherwise, the sample group(s) 22 is classified as not sufficiently differentiated relative to the reference sample 24.
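The DMR signal described above (difference in means divided by the square root of the summed variance terms) can be written as a short Python function. The disclosure does not specify which variance terms are summed; this sketch assumes each term is the squared standard error of a group mean (sample variance divided by group size), which is only one reasonable reading.

```python
from math import sqrt
from statistics import mean, variance


def dmr_signal(group, reference):
    """Z-like DMR signal of a sample group relative to a reference group.

    Assumption: each variance term is variance / n (squared standard error).
    Each list needs at least two values for the sample variance.
    """
    variance_terms = variance(group) / len(group) + variance(reference) / len(reference)
    return (mean(group) - mean(reference)) / sqrt(variance_terms)


signal = dmr_signal([0.6, 0.7, 0.8], [0.1, 0.2, 0.3])
```

A positive signal indicates a higher mean in the sample group than in the reference; swapping the two groups flips the sign but not the magnitude.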
[0119] Furthermore, the various cutoff thresholds utilized in the processing and comparing processes may promote accurate results, even for samples with varying characteristics (e.g., different sample types, such as a cell-free DNA (cfDNA) or cell-free RNA (cfRNA) sample versus a tissue sample).
[0120] Non-limiting examples of group-level computations performed by groups to DMR module 320 include: the Z-score distance cutoff between the reference group 24 and the non-reference sample groups; the p-value for the estimated distance between the reference group 24 and each non-reference sample group; or other statistical tests for the difference between two distributions (e.g., Kolmogorov-Smirnov (KS) tests, Jensen-Shannon divergence, etc.).
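As an illustration of one distribution-level comparison named above, the Jensen-Shannon divergence between two beta-value histograms (for example, a sample group versus reference group 24) can be computed as below. This is a generic standard-library sketch, not the disclosure's implementation, and it assumes both histograms share the same bins.

```python
from math import log2


def jensen_shannon(p_counts, q_counts):
    """Jensen-Shannon divergence (base 2, bounded by [0, 1]) between two histograms."""
    p_total, q_total = sum(p_counts), sum(q_counts)
    p = [c / p_total for c in p_counts]
    q = [c / q_total for c in q_counts]
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x == 0 contribute nothing,
        # and y > 0 wherever x > 0 because y is the mixture.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


print(jensen_shannon([3, 1, 0], [3, 1, 0]))  # 0.0 for identical histograms
print(jensen_shannon([5, 0], [0, 5]))        # 1.0 for non-overlapping histograms
```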
[0121] In an aspect, the changes and / or velocity of changes of the methylation levels / patterns of the non-cancerous reference group 24 may be assessed against the changes and / or velocity of changes of the cancerous sample groups 22. The output of this assessment may then be used to detect and identify initiating methylation changes that may drive cancer progression.
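A simple way to quantify the "velocity" of methylation change is to take successive differences of the effect size (group mean minus reference mean) across ordered feature values such as cancer stages. The sketch below assumes that interpretation; the disclosure does not define velocity more precisely, and the function name is hypothetical.

```python
def effect_size_velocity(stage_means, reference_mean, stage_order):
    """Per-step change in effect size across ordered stages.

    stage_means: stage label -> mean methylation of that cancerous group.
    reference_mean: mean methylation of the non-cancerous reference group.
    """
    effects = [stage_means[stage] - reference_mean for stage in stage_order]
    return [later - earlier for earlier, later in zip(effects, effects[1:])]


stages = ["I", "II", "III", "IV"]
means = {"I": 0.125, "II": 0.25, "III": 0.5, "IV": 0.875}
print(effect_size_velocity(means, 0.0, stages))  # [0.125, 0.25, 0.375]
```

Steadily growing steps, as in this toy input, are the kind of accelerating change that the assessment above would flag for further review.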
[0122] In an aspect, groups to DMR module 320 may assign DMR categories to sample groups 22, wherein such DMR categories are user-defined and / or created by groups to DMR module 320 or via an alternate component of methylation toolbox 107A and / or DMR identification component 105. According to one non-limiting embodiment, the assigned DMR categories may include: hypermethylated, hypomethylated, and / or unchanged, relative to the reference group 24.
[0123] A DMR signal value (sometimes referred to as a Z-score) may be calculated for each sample group 22 relative to the reference sample 24 and is defined as the difference in mean values divided by the square root of the sum over all variance term values. The Z-score and / or additional metrics may then be assessed to determine whether a sample group(s) 22 is hypermethylated, hypomethylated, or not sufficiently differentiated from the reference sample 24.
[0124] In an aspect, after determining the category of each sample group 22, the DMRs may be assessed for excessive noise in the reference sample 24 and / or for whether the DMR is quiescent (e.g., where none of the sample groups 22 differ from the reference sample 24).
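The post-classification checks could be expressed as two boolean flags, as in this minimal sketch. The noise cutoff on the reference variance is a hypothetical placeholder, since the disclosure does not specify how excessive noise is measured.

```python
def assess_dmr(categories, reference_variance, max_reference_variance=0.05):
    """Flag a candidate DMR as quiescent and/or noisy in the reference.

    categories: sample group label -> "hypermethylated", "hypomethylated",
    or "not sufficiently differentiated" (relative to the reference).
    """
    quiescent = all(c == "not sufficiently differentiated" for c in categories.values())
    noisy_reference = reference_variance > max_reference_variance  # hypothetical cutoff
    return {"quiescent": quiescent, "noisy_reference": noisy_reference}


flags = assess_dmr({"stage II": "hypermethylated", "stage IV": "hypermethylated"}, 0.01)
```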
[0125] Referring now to FIG. 8, a process for identifying DMRs and analyzing whether a DMR is hypermethylated, hypomethylated, or not sufficiently differentiated is provided, according to one or more aspects of the present disclosure. In an aspect, the goal of identifying DMRs and determining relative methylation states is to identify DMRs of interest to a user's methylation analysis, which may then be used to assess the effect size and differential methylation values / patterns as between cancerous and non-cancerous samples. The results may be further used to identify persistent differentially methylated regions and their accompanying patterns / values that are associated with specific cancer types, cancer stages, or other identifying information.
[0126] At step 405, one or more samples 10 or sample groups 22 may be received at methylation toolbox 107A. The methylation toolbox 107A may compute the mean and variance for all sample fragments and organize the fragments into groups, whereupon the mean and variance within each group may be calculated. At step 410, the methylation signal values are compared to a reference sample using the relative difference between the two distributions (e.g., the difference in means of the sample vs. reference sample, divided by the square root of the sum over all variance terms). At step 415, DMRs and accompanying signal values are identified according to the complexity and statistical rigor of user-defined classification. At step 420, a threshold check may be performed to determine whether a DMR signal value is greater than a first threshold. This threshold check serves as a criterion for determining whether a specific DMR is hypermethylated. In certain exemplary aspects, the predetermined threshold value may be based on the Z-score calculation (e.g., the difference in means of the sample vs. reference sample, divided by the square root of the sum over all variance terms), p-values for the estimated distance between the sample(s) 10 or sample group(s) 22 and the reference group 24, or other statistical tests for the difference between two distributions.
[0127] Responsive to determining, at step 420, that the DMR signal value is greater than a first threshold, an embodiment may designate, at step 425, a result identifying that the sample(s) 10 or sample group(s) 22 are sufficiently hypermethylated relative to the reference sample 24. In contrast, responsive to determining, at step 420, that the DMR signal value is not greater than a first threshold, an embodiment may determine that sufficient hypermethylation is not present and proceed further in the process to step 430.
[0128] At step 430, a threshold check may be performed to determine whether a DMR signal value is less than a second threshold. Responsive to determining, at step 430, that the DMR signal value is less than a second threshold, an embodiment may designate, at step 435, a result identifying that the sample(s) 10 or sample group(s) 22 are sufficiently hypomethylated relative to the reference sample 24. In contrast, responsive to determining, at step 430, that the DMR signal value is not less than a second threshold, an embodiment may determine that sufficient hypomethylation is not present and may generate, at step 440, a result indicating that the DMR signal value is not sufficiently differentiated from the reference sample 24.
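The two-threshold decision of FIG. 8 (steps 420 through 440) maps onto a small function. The default thresholds of +3 and -3 are illustrative assumptions only; the disclosure leaves both thresholds to the user.

```python
def classify_dmr_signal(signal, first_threshold=3.0, second_threshold=-3.0):
    """Classify a DMR signal value following the FIG. 8 flow."""
    if signal > first_threshold:              # step 420 -> step 425
        return "hypermethylated"
    if signal < second_threshold:             # step 430 -> step 435
        return "hypomethylated"
    return "not sufficiently differentiated"  # step 440


for value in (4.2, -5.0, 1.1):
    print(value, classify_dmr_signal(value))
```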
[0129] In some embodiments, situations may exist in which multiple regions of a sample 10 or sample group 22 are compared against the reference sample 24 or against alternate samples 10 or sample groups 22. At the conclusion of the comparison process for each of the samples 10 or sample groups 22, a subset of the samples 10 or sample groups 22 (or particular regions thereof) may be identified as being hypermethylated, hypomethylated, or not sufficiently differentiated, according to the defined threshold values for these features.
[0130] Additionally or alternatively to the foregoing, the DMR signal value may provide various non-binary decisions or insights that are not limited to “hypermethylated,” “hypomethylated,” or “not sufficiently different” outputs. For instance, upon analyzing the DMR signal value relative to the reference sample 24, an embodiment may provide insight into the degree or other quantification of the hypermethylation or hypomethylation levels, which may indicate a static, persistent, or progressive methylation level indicative of cancer type, cancer stage, or other characteristic information.
[0131] Referring now to FIG. 9, a process for identifying DMRs and detecting initiating methylation events and persistent methylation changes is provided, according to one or more aspects of the present disclosure. In an aspect, the goal of identifying DMRs and detecting initiating methylation states is to identify DMRs of interest to a user’s methylation analysis, which may be used to assess methylation states that indicate the potential presence of DNA associated with cancerous tissue. The results may be further used to identify persistent differentially methylated regions and their accompanying patterns / values that are associated with specific cancer types, cancer stages, or other identifying information.
[0132] At step 705, one or more samples 10 or sample groups 22 may be used to generate feature values and sample groups. The feature values may be associated with the specific sample and may include such categories as cancer type, cancer subtype, cancer stage, age of sample subject, sex of sample subject, smoking status of sample subject, etc. If samples 10 are not already processed into sample groups 22, the samples 10 may be processed into sample groups based upon arbitrary characteristics including, for example, grouping by: cancer type, cancer subtype, cancer stage, age, sex, coverage, classification score, sample type, sample source, and / or smoking status.
[0133] At step 710, one or more sample datasets may be assembled for each sample or sample group. At step 715, the sample dataset may be compared with a reference sample dataset, where, for example, the sample dataset and reference sample dataset may include corresponding data types (e.g., methylation signal values, tumor fraction values, etc.). Step 715 may further include computation of differential mean and / or variance values for each sample dataset as calculated relative to the reference dataset as a representation of the comparison between the sample dataset and reference dataset. Similarly, the sample dataset and reference sample dataset that are compared may be of corresponding types where, for example, the sample dataset and reference dataset are obtained from similar sample types. The dataset may include, for example, Z-score calculation (e.g., the difference in means of the sample vs. reference sample, divided by the square root of the sum over all variance terms), p-values for the estimated distance between the sample(s) 10 or sample group(s) 22 and the reference group 24, mean methylation or tumor fraction differential values, or other statistical tests indicative of the difference between two distributions. It will be appreciated that according to alternative, non-limiting embodiments, step 715 may be performed with respect to individual samples 10 in addition to or alternative to performance with sample groups 22.
[0134] At step 720, a threshold check may be performed to determine whether a difference between the sample dataset and the reference dataset meets a differential threshold. The differential threshold may be user-defined or may be generated based upon analysis and / or results from previous processing or comparison events.
[0135] Responsive to determining, at step 720, that the differential threshold has been met or exceeded, an embodiment may designate, at step 730, a result identifying the sample(s) 10 or sample group(s) 22 as including a DMR. In contrast, responsive to determining, at step 720, that the differential threshold has not been met or exceeded, an embodiment may designate, at step 725, a result identifying the sample(s) 10 or sample group(s) 22 as not including a DMR.
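Steps 715 through 730 can be sketched as a comparison of one sample dataset against the reference dataset. This minimal version assumes both datasets are lists of methylation signal values, represents the comparison by differential mean and variance, and treats the differential threshold (a hypothetical default of 0.1) as applying to the absolute mean difference.

```python
from statistics import mean, pvariance


def compare_to_reference(sample_values, reference_values, differential_threshold=0.1):
    """Return the differential summary and whether a DMR is designated."""
    diff_mean = mean(sample_values) - mean(reference_values)           # step 715
    diff_var = pvariance(sample_values) - pvariance(reference_values)  # step 715
    has_dmr = abs(diff_mean) >= differential_threshold  # step 720: "met or exceeded"
    # has_dmr True corresponds to step 730 (DMR); False to step 725 (no DMR).
    return {"diff_mean": diff_mean, "diff_var": diff_var, "has_dmr": has_dmr}


result = compare_to_reference([0.5, 0.5], [0.25, 0.25])
```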
[0136] At step 735, an effect size is generated for each identified DMR, where, for example, the effect size may represent a mean methylation differential value between sample group versus reference group (or sample versus reference sample).
[0137] Responsive to determining, at step 740, that a change in effect size over feature value meets an initiating threshold, an embodiment may designate, at step 745, a result identifying an initiating methylation event. In contrast, responsive to determining, at step 740, that a change in effect size over feature value does not meet an initiating threshold, an embodiment may designate, at step 750, a result identifying an absence of an initiating methylation event.
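Steps 735 through 750 could be approximated by computing an effect size per feature value and testing how fast it changes. In this sketch, the "change in effect size over feature value" is assumed to be the least-squares slope of effect size against stage index, and the initiating threshold of 0.05 per stage is a hypothetical value; the disclosure does not fix either choice.

```python
def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


def detect_initiating_event(effect_by_stage, stage_order, initiating_threshold=0.05):
    """Return (is_initiating, slope) for one DMR.

    effect_by_stage: stage label -> effect size (mean methylation of the sample
    group minus mean methylation of the reference group), as in step 735.
    """
    xs = list(range(1, len(stage_order) + 1))  # stage I -> 1, stage II -> 2, ...
    ys = [effect_by_stage[stage] for stage in stage_order]
    slope = least_squares_slope(xs, ys)
    return abs(slope) >= initiating_threshold, slope  # step 740 -> 745 / 750


stages = ["I", "II", "III", "IV"]
rising = {"I": 0.1, "II": 0.2, "III": 0.3, "IV": 0.4}
flat = {"I": 0.25, "II": 0.25, "III": 0.25, "IV": 0.25}
```

Taking the absolute slope lets a steadily deepening hypomethylation qualify in the same way as a steadily growing hypermethylation.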
[0138] Additionally or alternatively to the foregoing, the identification of an initiating methylation event(s) may not be limited to binary decisions where, for example, an embodiment may provide relative assessments of whether an initiating methylation event is detected.
[0139] Referring now to FIG. 10, an illustration is shown in which the system environment 100 identified a plurality of DMRs from cfDNA-derived sample groups, wherein these identified DMRs represent persisting features for initiating events in cancer. As shown in this example, DMRs are classified according to their correlation with features that were common to specific cancer stages II, III, and IV, as relevant to all cancer types. It will be appreciated that in certain embodiments, a user may analyze samples for features that are common to an additional or alternative set of cancer stages (e.g., stage I, stage II, stage III, and stage IV, or stage II to stage IV) relevant to specific cancer types.
[0140] Of the DMRs analyzed, 87 DMRs included features that were common to stage II, stage III, and stage IV. These 87 DMRs are identified to be “initiating methylation events,” since they are present at each of the three cancer stages, and thus were consistently associated with the presence of cancer.
[0141] Referring now to FIG. 11, the system environment 100 plotted the effect size of methylation for each of the 87 initiating methylation events identified in FIG. 10, wherein the effect size represents the mean methylation differential between cancer versus non-cancer groups. The x-axis of the plot represents the cancer stage (e.g., stage I, stage II, stage III, stage IV, or additionally or alternatively stage II to stage IV) of each cancerous sample. The y-axis of the plot represents the effect size of methylation, wherein hypermethylation corresponds to a positive (i.e., greater than zero) effect size, and hypomethylation corresponds to a negative (i.e., less than zero) effect size. In one embodiment, the effect size represents the difference in mean calculated methylation signal as between cancerous and non-cancerous samples and / or sample groups. As can be seen in FIG. 11, the general trend for most of the 87 initiating methylation events is that the effect size grew in magnitude (i.e., became more positive or more negative) as the cancer stage advanced from stage I to stage IV (or from stage II to stage IV). This suggests that the majority of the 87 DMRs identified (those that followed this trend) are indeed initiating methylation events associated with cancer, because their effect sizes grew in magnitude as the cancer stage advanced, as would be expected. This subset of DMRs may therefore be more informative than the full set of DMRs identified in FIG. 10, since not all of those DMRs may be associated with cancer initiation and / or progression. The subset of DMRs present across at least stages II, III, and IV may facilitate early cancer detection in a way that DMRs present only in later cancer stages (e.g., stage IV or stages III and IV) may not.
[0142] Referring to the initiating methylation events identified in FIG. 11, one specific initiating methylation event, indicated by the arrow in FIG. 11, demonstrated a consistent, increasing positive effect size across cancer stages I, II, III, and IV. This specific initiating methylation event was identified as associated with the HOXA9 gene. Because of the HOXA9 gene's known, significant functional role in multiple cancer types (e.g., breast, oral, lung, ovarian, bladder, leukemia), the process's identification of the HOXA9 gene as an initiating methylation event serves as evidence corroborating the accuracy and sensitivity of the embodiment's cancer signal detection.
[0143] For example, and as illustrated in FIG. 11, a number of DMRs can be assessed across cancer stages (e.g., stages I-IV or stages II-IV), where an increase in DMR effect size corresponds with an increase in cancer stage and may thus indicate the presence of an initiating methylation event present at a specific genomic position.
[0144] Referring now to FIG. 12, the embodiment's classifier feature coverage for HOXA9 is shown as being densely correlated for numerous cancers, including upper GI, lung, pancreas / gallbladder, and head / neck, as referenced by genomic positioning on chromosome 7.
[0145] Referring now to FIG. 13, HOXA9 methylation is shown to have a consistent signal across cancer stages in upper GI biopsies, as shown relative to a non-cancer signal, wherein the y-axis Beta values represent calculated methylation levels and the x-axis represents genomic positioning.
[0146] Referring now to FIG. 14, HOXA9 methylation is shown as plotted for a number of specific cancer types, as shown relative to methylation patterns for a non-cancerous sample, wherein the y-axis Beta values represent calculated methylation levels and the x-axis represents genomic positioning. HOXA9 hypermethylation is shown to be common across numerous cancer types, including, for example, sarcoma, bladder and urothelial, ovarian, upper GI, lung, head and neck, liver and bile duct, and melanoma. Based on these unique patterns, a sample's cancer type may be identified and then processed for further analysis to obtain additional meaningful information from additional methylation characteristics or patterns, or from methylation patterns at different positions.
[0147] As shown in FIG. 15, the calculated HOXA9 methylation signal is shown to increase in correspondence with an increase in the cancer stage (i.e., from cancer stage I to cancer stage IV, or additionally or alternatively, stage II to stage IV).
[0148] Referring now to FIG. 16, an illustration of a sample plot of calculated beta values for a number of sample groups is shown as referenced against a non-cancer reference sample. These calculated beta values (y-axis) represent methylation levels for individual samples or sample groups, and the x-axis values represent genomic positioning.
[0149] Referring now to FIG. 17, an illustration of a sample plot of effect size (y-axis) versus cancer stage (x-axis) is shown. The plot illustrates the identified regions being approximately evenly distributed between hypermethylation and hypomethylation as cancer progresses from stage I to stage IV, or stage II to stage IV, with stage IV having higher absolute effect sizes for hypermethylation and hypomethylation.
[0150] In an aspect, DMRs that are identified from each cancer stage may be intersected post-hoc to identify significant genetic regions that are common to (or include features common to) multiple cancer stages. Identification of such intersections could allow additional or alternative means of identifying cancer-initiating methylation events.
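The post-hoc intersection can be expressed as a set intersection over stage-specific DMR lists, as in the minimal sketch below; the region identifiers are placeholders. Applied to stages II, III, and IV, such an intersection corresponds to the notion of DMRs common to those stages discussed with respect to FIG. 10.

```python
def common_dmrs(dmrs_by_stage, stages):
    """Regions identified as DMRs in every listed stage."""
    return set.intersection(*(set(dmrs_by_stage[stage]) for stage in stages))


dmrs_by_stage = {
    "II": ["region_a", "region_b", "region_c"],
    "III": ["region_a", "region_c", "region_d"],
    "IV": ["region_a", "region_c", "region_e"],
}
print(sorted(common_dmrs(dmrs_by_stage, ["II", "III", "IV"])))  # ['region_a', 'region_c']
```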
[0151] In an aspect, methylation signal covariance may be jointly modeled across cancer stages to improve detection of subtle cancer signals (e.g., initiating methylation events) by utilizing information over multiple regions and cancer progression levels. Compared with a post-hoc intersection approach, such a method may be particularly useful for detecting early-stage (or low tumor fraction level) cancers. For example, the stage I or stage II (or low tumor fraction level) state of certain cancers may include subtle methylation changes that are difficult to detect, where such a stage I or stage II (or low tumor fraction level) cancer may not include any (or may include relatively few) intersections with later cancer stages (or higher tumor fraction levels).
[0152] In an aspect, the described embodiments may identify DMRs demonstrating methylation progression trends that are more extreme than expected under a null model. For example, the most likely regions for initiating cancer progression can be identified by disregarding DMRs having inconsistent signals across cancer stages and / or DMRs where the largest relative sample difference does not occur at the latest cancer stage and / or at the highest tumor fraction level. Similarly, the described embodiments may utilize time-series based statistical methods to identify temporal trends over increasing cancer progression measurements. For example, differential temporal trends over separate cancer measurement features (e.g., cancer type, cancer stage, tumor fraction level, etc.) may allow identification of initiating methylation events, as well as methylation features associated with transitions between cancer stages.
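The filtering rule above (discard DMRs with inconsistent signals across stages, or whose largest difference is not at the latest stage) and a null-model comparison can be sketched together. The null model here is an assumption: an exact permutation test that shuffles effect sizes across stage labels and asks how often a linear-trend statistic is at least as extreme as the observed one. With only four stages there are 24 orderings, so the smallest attainable p-value under this statistic is 2/24.

```python
from itertools import permutations


def is_consistent_progression(effects):
    """effects: effect sizes ordered from earliest to latest stage."""
    same_sign = all(e > 0 for e in effects) or all(e < 0 for e in effects)
    peak_index = max(range(len(effects)), key=lambda i: abs(effects[i]))
    return same_sign and peak_index == len(effects) - 1


def trend_statistic(effects):
    """Absolute covariance-style trend of effect size against stage index."""
    centre = (len(effects) - 1) / 2
    return abs(sum((i - centre) * e for i, e in enumerate(effects)))


def permutation_p_value(effects, tolerance=1e-12):
    """Exact permutation p-value for the trend under exchangeable stage labels."""
    observed = trend_statistic(effects)
    orderings = list(permutations(effects))
    extreme = sum(1 for o in orderings if trend_statistic(o) >= observed - tolerance)
    return extreme / len(orderings)


effects = [0.125, 0.25, 0.5, 1.0]  # illustrative, steadily growing hypermethylation
print(is_consistent_progression(effects), permutation_p_value(effects))
```

Exhaustive enumeration is practical only for a handful of stages; with many ordered feature values (for example, fine-grained tumor fraction bins), random permutations would be the usual substitute.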
[0153] Although the specification discusses DNA in particular for convenience, it will be understood that any nucleic acid, such as RNA, may alternatively or additionally be used in embodiments of the disclosure.
[0154] Referring now back to FIG. 4, the electronic application, executed by processor 102B of computing device 102, may generate one or many points of data that can be accessed, viewed, and / or interacted with by a user of the computing device 102. As an example, the electronic application may enable users to view, edit, and control processing of fragment analysis, sample grouping and group analysis, DMR and initiating methylation event identification and analysis associated with received genomic data. A user may further utilize the electronic application to identify and compare DMRs and methylation (or other) values between individual samples 10, between individual samples 10 and reference groups 24, between sample groups 22, or between sample groups 22 and reference groups 24 (FIG. 7).
[0155] The computing device 102 may include an electronic data system and computer-readable memory, such as a hard drive, flash drive, disk, etc. In some embodiments, the computing device 102 includes and / or interacts with an application programming interface for exchanging data with other systems, e.g., one or more of the other components of the environment. The computing device 102 may include and / or act as the host for an application platform (e.g., a sample comparison platform, etc.) that may be accessible by users and / or other components.
[0156] In general, any process discussed in this disclosure that is understood to be computer-implementable may be performed by one or more processors of a computer system, such as system environment 100, as described above. A process or process step performed by one or more processors may also be referred to as an operation. The one or more processors may be configured to perform such processes by having access to instructions (e.g., software or computer-readable code) that, when executed by the one or more processors, cause the one or more processors to perform the processes. The instructions may be stored in a memory of the computer system. A processor may be a central processing unit (CPU), a graphics processing unit (GPU), or any other suitable type of processing unit.
[0157] A computer system, such as system environment 100, may include one or more computing devices. If the one or more processors of the computer system are implemented as a plurality of processors, the plurality of processors may be included in a single computing device or distributed among a plurality of computing devices. If a system environment comprises a plurality of computing devices, the memory of the computer system may include the respective memory of each computing device of the plurality of computing devices.
[0158] FIG. 18 is a simplified functional block diagram of a computer system 600 that may be configured as a computing device for executing the processes described herein, according to exemplary embodiments of the present disclosure. In various embodiments, any of the systems herein may be an assembly of hardware including, for example, a data communication interface 620 for packet data communication. The platform also may include a central processing unit ("CPU") 602, in the form of one or more processors, for executing program instructions. The platform may include an internal communication bus 608, and a storage unit 606 (such as ROM, HDD, SSD, etc.) that may store data on a computer readable medium 622, although the system 600 may also receive programming and data through network communications over electronic network 625 (e.g., voice, video, audio, images, or any other data over the electronic network 625). The system 600 may also have a memory 604 (such as RAM) storing instructions 624 for executing techniques presented herein, although the instructions 624 may be stored temporarily or permanently within other modules of system 600 (e.g., processor 602 and / or computer readable medium 622). The system 600 also may include input and output ports 612 and / or a display 610 to connect with input and output devices such as keyboards, mice, touchscreens, monitors, displays, etc. The various system functions may be implemented in a distributed fashion on a number of similar platforms, to distribute the processing load. Alternatively, the systems may be implemented by appropriate programming of one computer hardware platform.
[0159] In this disclosure, the term "based on" means "based at least in part on." The singular forms "a," "an," and "the" include plural referents unless the context dictates otherwise. The term "exemplary" is used in the sense of "example" rather than "ideal." The terms "comprises," "comprising," "includes," "including," or other variations thereof, are intended to cover a non-exclusive inclusion such that a process, method, or product that comprises a list of elements does not necessarily include only those elements, but may include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Relative terms, such as "about," "approximately," "substantially," and "generally," are used to indicate a possible variation of ±10% of a stated or understood value. In addition, the term "between" used in describing ranges of values is intended to include the minimum and maximum values described herein. The use of the term "or" in the claims and specification is used to mean "and / or" unless explicitly indicated to refer to alternatives only or the alternatives are mutually exclusive, although the disclosure supports a definition that refers to only alternatives and "and / or." As used herein, "another" may mean at least a second or more.
[0160] As used herein, the term "user" generally encompasses any person or entity, such as a researcher and / or a care provider (e.g., a doctor, etc.), that may desire information or resolution of an issue, or that may engage in any other type of interaction with a provider of the systems and methods described herein (e.g., via an application interface resident on their electronic device, etc.). The term "electronic application" or "application" may be used interchangeably with other terms like "program," or the like, and generally encompasses software that is configured to interact with, modify, override, supplement, or operate in conjunction with other software.
[0161] Program aspects of the technology may be thought of as "products" or "articles of manufacture," typically in the form of executable code and / or associated data that is carried on or embodied in a type of machine-readable medium. "Storage" type media include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer of the mobile communication network into the computer platform of a server and / or from a server to the mobile device. Thus, another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links, or the like, also may be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible "storage" media, terms such as computer or machine "readable medium" refer to any medium that participates in providing instructions to a processor for execution.
[0162] Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention, and form different embodiments, as would be understood by those skilled in the art. For example, in the following claims, any of the claimed embodiments can be used in any combination.
[0163] Thus, while certain embodiments have been described, those skilled in the art will recognize that other and further modifications may be made thereto without departing from the spirit of the disclosure, and it is intended to claim all such changes and modifications as falling within the scope of the disclosure. For example, functionality may be added to or deleted from the block diagrams, and operations may be interchanged among functional blocks. Steps may be added to or deleted from methods described within the scope of the present invention.
[0164] The above disclosed subject matter is to be considered illustrative, and not restrictive, and the appended claims are intended to cover all such modifications, enhancements, and other implementations, which fall within the true spirit and scope of the present disclosure. Thus, to the maximum extent allowed by law, the scope of the present disclosure is to be determined by the broadest permissible interpretation of the following claims and their equivalents, and shall not be restricted or limited by the foregoing detailed description. While various implementations of the disclosure have been described, it will be apparent to those of ordinary skill in the art that many more implementations are possible within the scope of the disclosure. Accordingly, the disclosure is not to be restricted except in light of the attached claims and their equivalents.
Claims
WHAT IS CLAIMED IS:
1. A computer-implemented method for detecting initiating methylation changes, comprising: receiving, at a computer system, genomic data associated with at least one sample, wherein the genomic data includes sequenced genetic material corresponding to the at least one sample; generating, using a processor associated with the computer system, a feature value for at least one region of the at least one sample; assembling, using the processor, at least one sample group based upon the feature value for the at least one sample, wherein each sample group is assembled according to user-defined feature value criteria; generating, using the processor, at least one sample dataset for each of the at least one sample groups; comparing, using the processor, each at least one sample dataset associated with a sample group to a reference dataset associated with a reference sample; identifying, based on the comparing and responsive to determining that an at least one sample dataset meets a differential threshold with the reference dataset, a differentially methylated region; generating, using the processor, an effect size for each differentially methylated region, wherein the effect size corresponds to a mean differential of the at least one sample dataset and the reference dataset and is associated with the feature value; comparing, using the processor, the effect size associated with each feature value to the effect size associated with each additional feature value; and identifying, responsive to determining that a change in the effect size over feature values meets an initiating threshold, an initiating methylation change, or responsive to determining that the change in effect size over feature values does not meet an initiating threshold, an absence of an initiating methylation change.
2. The computer-implemented method of claim 1, wherein the feature value includes cancer type.
3. The computer-implemented method of claim 1, wherein the feature value includes cancer stage.
4. The computer-implemented method of claim 1, wherein the feature value includes age.
5. The computer-implemented method of claim 1, wherein the feature value includes smoking status.
6. The computer-implemented method of claim 1, wherein the at least one sample dataset and the reference dataset each include a methylation signal value.
7. The computer-implemented method of claim 1, wherein the at least one sample dataset and the reference dataset each include a tumor fraction value.
8. The computer-implemented method of claim 6, the method further comprising: generating, using the processor, a methylation differential value calculated as a difference between a sample methylation signal value associated with a differentially methylated region and a reference methylation signal value associated with a reference sample; comparing, using the processor, the methylation differential value with a first differential threshold and with a second differential threshold; and identifying, based on the comparing and responsive to determining that the methylation differential value is greater than the first differential threshold, a hypermethylated region, or identifying, based on the comparing and responsive to determining that the methylation differential value is less than the second differential threshold, a hypomethylated region, or identifying, based on the comparing and responsive to determining that the methylation differential value is less than the first differential threshold and greater than the second differential threshold, an insufficiently differentiated region.
9. The computer-implemented method of claim 1, wherein the at least one sample and the reference sample are associated with a sample type selected from the group consisting of: cell-free DNA (cfDNA), cell-free RNA (cfRNA), bone marrow, urine, tissue, saliva, or plasma.
10. A system for detecting initiating methylation changes, comprising: one or more processors; and one or more computer readable media storing instructions that are executable by the one or more processors to perform operations to: receive, at a computer system, genomic data associated with at least one sample, wherein the genomic data includes sequenced genetic material corresponding to the at least one sample; generate, using a processor associated with the computer system, a feature value for at least one region of the at least one sample; assemble, using the processor, at least one sample group based upon the feature value for the at least one sample, wherein each sample group is assembled according to user-defined feature value criteria; generate, using the processor, at least one sample dataset for each of the at least one sample groups; compare, using the processor, each at least one sample dataset associated with a sample group to a reference dataset associated with a reference sample; identify, based on the comparing and responsive to determining that an at least one sample dataset meets a differential threshold with the reference dataset, a differentially methylated region; generate, using the processor, an effect size for each differentially methylated region, wherein the effect size corresponds to a mean differential of the at least one sample dataset and the reference dataset and is associated with the feature value; compare, using the processor, the effect size associated with each feature value to the effect size associated with each additional feature value; and identify, responsive to determining that a change in the effect size over feature values meets an initiating threshold, an initiating methylation change, or responsive to determining that the change in effect size over feature values does not meet an initiating threshold, an absence of an initiating methylation change.
11. The system of claim 10, wherein the feature value includes cancer type.
12. The system of claim 10, wherein the feature value includes cancer stage.
13. The system of claim 10, wherein the feature value includes age.
14. The system of claim 10, wherein the feature value includes smoking status.
15. The system of claim 10, wherein the at least one sample dataset and the reference dataset each include a methylation signal value.
16. The system of claim 10, wherein the at least one sample dataset and the reference dataset each include a tumor fraction value.
17. The system of claim 15, wherein the one or more computer readable media storing instructions that are executable by the one or more processors to perform operations further comprise operations to: generate, using the processor, a methylation differential value calculated as a difference between a sample methylation signal value associated with a differentially methylated region and a reference methylation signal value associated with a reference sample; compare, using the processor, the methylation differential value with a first differential threshold and with a second differential threshold; and identify, based on the comparing and responsive to determining that the methylation differential value is greater than the first differential threshold, a hypermethylated region, or identify, based on the comparing and responsive to determining that the methylation differential value is less than the second differential threshold, a hypomethylated region, or identify, based on the comparing and responsive to determining that the methylation differential value is less than the first differential threshold and greater than the second differential threshold, an insufficiently differentiated region.
18. The system of claim 10, wherein the at least one sample and the reference sample are associated with a sample type selected from the group consisting of: cell-free DNA (cfDNA), cell-free RNA (cfRNA), bone marrow, urine, tissue, saliva, or plasma.
19. A non-transitory computer-readable medium storing computer-executable instructions which, when executed by a system, cause the system to perform operations comprising: receiving, at a computer system, genomic data associated with at least one sample, wherein the genomic data includes sequenced genetic material corresponding to the at least one sample; generating, using a processor associated with the computer system, a feature value for at least one region of the at least one sample; assembling, using the processor, at least one sample group based upon the feature value for the at least one sample, wherein each sample group is assembled according to user-defined feature value criteria; generating, using the processor, at least one sample dataset for each of the at least one sample groups; comparing, using the processor, each at least one sample dataset associated with a sample group to a reference dataset associated with a reference sample; identifying, based on the comparing and responsive to determining that an at least one sample dataset meets a differential threshold with the reference dataset, a differentially methylated region; generating, using the processor, an effect size for each differentially methylated region, wherein the effect size corresponds to a mean differential of the at least one sample dataset and the reference dataset and is associated with the feature value; comparing, using the processor, the effect size associated with each feature value to the effect size associated with each additional feature value; and identifying, responsive to determining that a change in the effect size over feature values meets an initiating threshold, an initiating methylation change, or responsive to determining that the change in effect size over feature values does not meet an initiating threshold, an absence of an initiating methylation change.
20. The non-transitory computer-readable medium of claim 19, wherein the feature value includes cancer type.
21. The non-transitory computer-readable medium of claim 19, wherein the feature value includes cancer stage.
22. The non-transitory computer-readable medium of claim 19, wherein the feature value includes age.
23. The non-transitory computer-readable medium of claim 19, wherein the feature value includes smoking status.
24. The non-transitory computer-readable medium of claim 19, wherein the at least one sample dataset and the reference dataset each include a methylation signal value.
25. The non-transitory computer-readable medium of claim 19, wherein the at least one sample dataset and the reference dataset each include a tumor fraction value.
26. The non-transitory computer-readable medium of claim 24, wherein the computer-executable instructions which, when executed by a system, cause the system to perform operations further comprise: generating, using the processor, a methylation differential value calculated as a difference between a sample methylation signal value associated with a differentially methylated region and a reference methylation signal value associated with a reference sample; comparing, using the processor, the methylation differential value with a first differential threshold and with a second differential threshold; and identifying, based on the comparing and responsive to determining that the methylation differential value is greater than the first differential threshold, a hypermethylated region, or identifying, based on the comparing and responsive to determining that the methylation differential value is less than the second differential threshold, a hypomethylated region, or identifying, based on the comparing and responsive to determining that the methylation differential value is less than the first differential threshold and greater than the second differential threshold, an insufficiently differentiated region.
27. The non-transitory computer-readable medium of claim 19, wherein the at least one sample and the reference sample are associated with a sample type selected from the group consisting of: cell-free DNA (cfDNA), cell-free RNA (cfRNA), bone marrow, urine, tissue, saliva, or plasma.
28. A computer-implemented method for identifying a differentially methylated region, comprising: receiving, at a computer system, genomic data associated with at least one sample, wherein the genomic data includes sequenced genetic material corresponding to the at least one sample; generating, using a processor associated with the computer system, a feature value for at least one region of the at least one sample; assembling, using the processor, at least one sample group based upon the feature value for the at least one sample, wherein each sample group is assembled according to user-defined feature value criteria; generating, using the processor, at least one sample dataset for each of the at least one sample groups; comparing, using the processor, each at least one sample dataset associated with a sample group to a reference dataset associated with a reference sample; and identifying, based on the comparing and responsive to determining that an at least one sample dataset meets a differential threshold with the reference dataset, a differentially methylated region.
29. A computer-implemented method for detecting initiating methylation changes, comprising: receiving, at a computer system, genomic data associated with at least one sample, wherein the genomic data includes sequenced genetic material corresponding to the at least one sample; receiving, at a computer system, genomic data associated with at least one reference sample, wherein the genomic data associated with the at least one reference sample includes a reference dataset; generating, using a processor associated with the computer system, a feature value and at least one sample dataset for at least one region of the at least one sample; comparing, using the processor, each at least one sample dataset associated with a sample group to the reference dataset; identifying, based on the comparing and responsive to determining that an at least one sample dataset meets a differential threshold with the reference dataset, a differentially methylated region; generating, using the processor, an effect size for at least one differentially methylated region associated with the at least one sample, wherein the effect size corresponds to a mean differential of the at least one sample dataset and a reference dataset, wherein the reference dataset is associated with a reference sample; comparing, using the processor, the effect size associated with each feature value to the effect size associated with each additional feature value; and identifying, responsive to determining that a change in the effect size over feature values meets an initiating threshold, an initiating methylation change, or responsive to determining that the change in effect size over feature values does not meet an initiating threshold, an absence of an initiating methylation change.
30. A computer-implemented method for detecting initiating methylation changes, comprising: receiving, at a computer system, genomic data associated with at least one sample and at least one reference sample, wherein the genomic data includes sequenced genetic material; generating, using a processor associated with the computer system, a feature value for at least one region of the at least one sample; assembling, using the processor, at least one sample group based upon the feature value for the at least one sample, wherein each sample group is assembled according to user-defined feature value criteria; generating, using the processor, at least one sample dataset for each of the at least one sample groups and a reference dataset for the at least one reference sample; generating, using the processor, an effect size for at least one differentially methylated region, wherein the effect size corresponds to a mean differential of each sample dataset and each reference dataset and is associated with the feature value; comparing, using the processor, the effect size associated with each feature value to the effect size associated with each additional feature value; and identifying, responsive to determining that a change in the effect size over feature values meets an initiating threshold, an initiating methylation change, or responsive to determining that the change in effect size over feature values does not meet an initiating threshold, an absence of an initiating methylation change.
31. A computer-implemented method for detecting initiating methylation changes, comprising: receiving, at a computer system, genomic data associated with at least one sample, wherein the genomic data includes sequenced genetic material corresponding to the at least one sample; generating, using a processor associated with the computer system, a feature value for at least one region of the at least one sample; generating, using the processor, at least one sample dataset for each of the at least one samples; comparing, using the processor, each at least one sample dataset associated with each sample to a reference dataset associated with a reference sample; identifying, based on the comparing and responsive to determining that an at least one sample dataset meets a differential threshold with the reference dataset, a differentially methylated region; generating, using the processor, an effect size for each differentially methylated region, wherein the effect size corresponds to a mean differential of the at least one sample dataset and the reference dataset and is associated with the feature value; comparing, using the processor, the effect size associated with each feature value to the effect size associated with each additional feature value; and identifying, responsive to determining that a change in the effect size over feature values meets an initiating threshold, an initiating methylation change, or responsive to determining that the change in effect size over feature values does not meet an initiating threshold, an absence of an initiating methylation change.
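The steps recited in claims 1 and 8 can be illustrated for a single region with the following minimal sketch. It assumes each sample contributes one feature value (for example, a cancer-stage label) and one methylation signal value for the region, that the user supplies the ordering of feature values, and that the effect size is the difference between a group's mean signal and the reference mean. The function names, the zero baseline before the first feature value, the consecutive-step rule used for the initiating threshold, and the handling of values exactly at a threshold are hypothetical choices; the claims do not specify them.

```python
from statistics import mean
from typing import Optional, Sequence

# Hypothetical labels for the three-way call recited in claim 8.
HYPER = "hypermethylated"
HYPO = "hypomethylated"
INSUFFICIENT = "insufficiently differentiated"


def classify_differential(diff: float, first_threshold: float, second_threshold: float) -> str:
    """Claim 8 call; a value exactly at a threshold is treated as insufficient (assumption)."""
    if diff > first_threshold:
        return HYPER
    if diff < second_threshold:
        return HYPO
    return INSUFFICIENT


def group_by_feature(samples: Sequence[tuple[str, float]]) -> dict[str, list[float]]:
    """Assemble sample groups keyed by feature value from (feature_value, signal) pairs."""
    groups: dict[str, list[float]] = {}
    for feature_value, signal in samples:
        groups.setdefault(feature_value, []).append(signal)
    return groups


def analyze_region(
    samples: Sequence[tuple[str, float]],
    reference: Sequence[float],
    feature_order: Sequence[str],
    first_threshold: float,
    second_threshold: float,
    initiating_threshold: float,
) -> tuple[bool, Optional[str]]:
    """Return (is_dmr, feature value at which an initiating change is detected, or None)."""
    groups = group_by_feature(samples)
    reference_mean = mean(reference)
    # Effect size per feature value: group mean minus reference mean (one reading of
    # "mean differential"), in the user-supplied order of feature values.
    effects = [mean(groups[value]) - reference_mean for value in feature_order]

    # Hypothetical DMR rule: any group called hyper- or hypomethylated.
    calls = [classify_differential(e, first_threshold, second_threshold) for e in effects]
    if all(call == INSUFFICIENT for call in calls):
        return False, None

    # Hypothetical initiating rule: the first feature value whose step in effect size
    # (from the previous value, or from an assumed zero baseline) meets the threshold.
    previous = 0.0
    for value, effect in zip(feature_order, effects):
        if abs(effect - previous) >= initiating_threshold:
            return True, value
        previous = effect
    return True, None  # a DMR, but no initiating methylation change detected


if __name__ == "__main__":
    stage_samples = [
        ("I", 0.12), ("I", 0.14), ("II", 0.15), ("II", 0.17),
        ("III", 0.40), ("III", 0.44), ("IV", 0.50), ("IV", 0.52),
    ]
    reference_signals = [0.10, 0.12, 0.11]
    print(analyze_region(stage_samples, reference_signals, ["I", "II", "III", "IV"], 0.1, -0.1, 0.2))
```

With these example values the effect sizes are roughly 0.02, 0.05, 0.31, and 0.40, so stages III and IV are called hypermethylated, the jump into stage III meets the 0.2 initiating threshold, and the script prints `(True, 'III')`. Pooling the reference into a single mean keeps the sketch short; a per-sample differential or a formal statistical test could be substituted.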