Determining significant copy number variants
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- FOUNDATION MEDICINE INC
- Filing Date
- 2025-10-30
- Publication Date
- 2026-06-11
Abstract
Description
DETERMINING SIGNIFICANT COPY NUMBER VARIANTS
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims priority to U.S. Provisional App. No. 63/714,820, which was filed on October 31, 2024 and is incorporated by reference herein in its entirety.
BACKGROUND
[0002] Copy number variants are sequences that are repeated in the genome of a subject where the number and type of repeated sequences vary across different individuals of the same species. In some cases, the number of copies of a given sequence in a subject's genome can increase in somatic and/or germline cells during a process referred to as "amplification." Amplification of a gene in a subject's genome can impact how much the gene is expressed, in some cases. Further, in some cases, a copy of a gene can be deleted from one chromosome of a subject's genome, such that the subject may only have a single copy of the gene on another chromosome, assuming a diploid genome. In some examples, copies of the gene are deleted from both chromosomes in a chromosome pair. These variants can be referred to as "losses."
[0003] Copy number variants, such as amplification variants and heterozygous losses, are common in cancer cells. These variants can have a significant impact on cancer type, progression, and treatment options. Various techniques can be used to identify copy number variants from data representative of sequences of nucleic acid molecules in a sample obtained from a subject. However, existing techniques for identifying significant copy number variants suffer from limited robustness and reproducibility.
BRIEF DESCRIPTION OF THE DRAWINGS
[0004] Various aspects of the disclosed methods, devices, and systems are set forth with particularity in the appended claims. A better understanding of the features and advantages of the disclosed methods, devices, and systems will be obtained by reference to the following detailed description of illustrative embodiments and the accompanying drawings, of which:
[0005] FIG. 1 illustrates an example environment for identifying significant copy numbers of segments of a genome of a subject.
[0006] FIG. 2 is a diagram illustrating how significant copy number variants are identified, according to various implementations of the present disclosure.
[0007] FIG. 3 illustrates an example process for determining whether a computed copy number of a sequence-of-interest is significant.
[0008] FIG. 4 illustrates an example report summarizing predicted categories of a cancer of a subject.
[0009] FIG. 5 illustrates an example environment for sequencing various nucleic acid molecules.
[0010] FIG. 6 illustrates one or more devices configured to perform various operations described herein.
DETAILED DESCRIPTION
[0011] Various implementations of the present disclosure relate to techniques for identifying significant copy number variants (e.g., copy number amplifications and/or losses) present in a genome of a sample. In various cases, a ratio representing a modeled copy number of a segment (e.g., a sequence-of-interest, such as a gene) of a sample genome with respect to a model ploidy of a sample is calculated. The ratio, in various cases, is compared to one or more thresholds. A significance of the modeled copy number is identified based on the comparison of the ratio to the threshold(s). In some cases, a report or genomic analysis is generated based on significant modeled copy numbers associated with various segments of the sample genome. In some examples, the report or genomic analysis is performed without utilizing insignificant modeled copy numbers associated with segments of the sample genome. In particular cases, a "significant" modeled copy number is any copy number whose associated ratio exceeds an amplification threshold (e.g., indicative of an amplification variant) or whose associated ratio is less than a loss threshold (e.g., indicative of a homozygous or heterozygous loss).
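The ratio-and-threshold comparison described in paragraph [0011] can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the names `Significance` and `classify_copy_number`, the threshold values (1.5 and 0.5), and the segment data are hypothetical placeholders chosen only for the example.

```python
from enum import Enum


class Significance(Enum):
    AMPLIFICATION = "amplification"
    LOSS = "loss"
    NOT_SIGNIFICANT = "not significant"


def classify_copy_number(
    modeled_copy_number: float,
    model_ploidy: float,
    amplification_threshold: float = 1.5,  # hypothetical value, not from the disclosure
    loss_threshold: float = 0.5,  # hypothetical value, not from the disclosure
) -> Significance:
    """Compare the copy-number-to-ploidy ratio against two thresholds."""
    if model_ploidy <= 0:
        raise ValueError("model ploidy must be positive")
    ratio = modeled_copy_number / model_ploidy
    if ratio > amplification_threshold:
        return Significance.AMPLIFICATION
    if ratio < loss_threshold:
        return Significance.LOSS
    return Significance.NOT_SIGNIFICANT


# Illustrative segment names and modeled copy numbers (made-up data).
segments = {"GENE_A": 8.0, "GENE_B": 2.0, "GENE_C": 0.0}
ploidy = 2.0

# Keep only the segments whose modeled copy numbers are significant,
# mirroring a report that omits insignificant copy numbers.
classified = {name: classify_copy_number(cn, ploidy) for name, cn in segments.items()}
significant = {
    name: result
    for name, result in classified.items()
    if result is not Significance.NOT_SIGNIFICANT
}
```

Under these assumed thresholds, a modeled copy number of 4 in a sample with a model ploidy of 4 yields a ratio of 1 and is not flagged, whereas the same copy number in a sample with a model ploidy of 2 yields a ratio of 2 and is flagged as an amplification.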
[0012] Implementations of the present disclosure provide significant improvements to the technical field of medical diagnostics. Modeling copy numbers based on next generation sequencing (NGS) data, such as data obtained by performing sequencing-by-synthesis techniques, produces uncertainty due to ploidy variations within the sample genome. By generating a ratio of a modeled copy number to the ploidy, significant modeled copy numbers can be efficiently distinguished from apparent copy number variants due to the ploidy of the sample.
[0013] Various analyses described herein cannot be performed in the human mind, or by pen and paper. For example, it is impossible for a human to identify the sequences of nucleic acid molecules in a sample obtained from a subject. Moreover, sequence read data obtained using various sequencing techniques described herein may represent numerous sequence reads that are sufficiently complex as to prevent analysis in the human mind or by pen and paper. Various techniques described herein for modeling a copy number of a segment, such as a gene, as well as modeling the ploidy of a sample, are similarly too complex to be performed in the human mind or by pen and paper.
Example Definitions
[0014] As used herein, the terms "deoxyribonucleic acid," "DNA," "DNA molecule," and their equivalents, may refer to a polymer of nucleotides (also referred to as "nucleobases") containing deoxyribose. The nucleotides in DNA include cytosine (C), guanine (G), adenine (A), and thymine (T). Each DNA nucleotide includes a deoxyribose and a phosphate group. An example single-stranded DNA (ssDNA) molecule includes a chain of covalently bonded DNA nucleotides. In the example ssDNA molecule, the phosphate group of the mth nucleotide is covalently bonded to the deoxyribose of the (m-1)th nucleotide, wherein m is a positive integer greater than 2 and less than or equal to the number of DNA nucleotides in the chain. In various examples, DNA is double-stranded and includes two ssDNA molecules that are complementary to one another and coiled around each other in a double helix form. The nucleotides of one ssDNA molecule are hydrogen bonded to the nucleotides of the other ssDNA molecule. In particular, adenine (A) and thymine (T) hydrogen bond to each other, and cytosine (C) and guanine (G) hydrogen bond to each other.
[0015] As used herein, the terms "ribonucleic acid," "RNA," "RNA molecule," and their equivalents, may refer to a polymer of nucleotides containing ribose. The nucleotides in RNA include cytosine (C), guanine (G), adenine (A), and uracil (U). Each RNA nucleotide includes a ribose and a phosphate group. In an example RNA molecule, the phosphate group of the nth nucleotide is covalently bonded to the ribose of the (n-1)th nucleotide, wherein n is a positive integer greater than 2 and less than or equal to the number of RNA nucleotides in the chain. Messenger RNA (mRNA) is a type of RNA molecule that is synthesized (or "transcribed") by RNA polymerase (an enzyme) to be complementary to a gene encoded in a DNA sequence, and is also used by a ribosome to synthesize a polypeptide or protein. An mRNA is therefore an example of a "coding RNA." In various cases, intron sequences are removed from an mRNA via a process known as "RNA splicing." MicroRNA ("miRNA") are single-stranded RNA molecules that perform post-transcriptional gene expression regulation. For instance, a miRNA may bind to a complementary mRNA molecule, thereby cleaving, destabilizing, or otherwise preventing the mRNA molecule from being translated into a polypeptide or protein by a ribosome. In various examples, a miRNA has a length in a range of 21 to 23 RNA nucleotides. As used herein, the term "non-coding RNA" may refer to a type of RNA that is not translated into a protein. Examples of non-coding RNA include miRNA, transfer RNA (tRNA), and ribosomal RNA (rRNA). The term "functional RNA," and its equivalents, may refer to any RNA molecule that impacts a biological process. For instance, functional RNA may include mRNA, miRNA, tRNA, rRNA, and the like.
[0016] As used herein, the term "base," and its equivalents, may refer to a monomer of a polymer. For example, a base of DNA or RNA is a nucleotide.
[0017] As used herein, the term "base pair," and its equivalents, may refer to a pair of complementary DNA nucleotides, which are hydrogen-bonded to one another in a double-stranded DNA molecule. For example, a base pair includes a first base in a first ssDNA and a second base in a second ssDNA, wherein the first and second bases are complementary and hydrogen-bonded to one another.
[0018] As used herein, the terms "nucleotide," "nucleobase," "nucleic acid," "nucleic acid molecule," and their equivalents, may refer to an organic molecule that includes a nitrogenous base, a sugar, and a phosphate group. In various cases, a nucleotide is a monomer of DNA or RNA. A nucleotide, for instance, is a chemical structure.
[0019] As used herein, the terms "3' end," "3-prime end," and their equivalents, may refer to a terminus of a single-stranded nucleotide polymer that includes a base whose third carbon in its deoxyribose or ribose is bound to a hydroxyl group while being unbound to another base.
[0020] As used herein, the terms "5' end," "5-prime end," and their equivalents, may refer to a terminus of a single-stranded nucleotide polymer that includes a base whose fifth carbon in its deoxyribose or ribose ring is unbound to another base. In some cases, the fifth carbon is bound to a phosphate group.
[0021] As used herein, the "length" of a polymer refers to a number of covalently bonded monomers that are included in the polymer. For instance, the length of a DNA molecule may be the number of covalently bonded nucleotides in at least one strand of the DNA molecule and/or the number of base pairs in the DNA molecule. In various examples, the length of an RNA molecule may be the number of covalently bonded nucleotides in the RNA molecule.
[0022] As used herein, the term "gene," and its equivalents, refers to a sequence of DNA nucleotides that is transcribed into a functional RNA. The functional RNA, for instance, is RNA that is translated into a polypeptide or protein (e.g., mRNA) or that has some other biological function (e.g., miRNA, tRNA, etc.). A gene is "expressed" when it is used as a template to generate a functional RNA. A subject, for instance, has numerous genes contained in the subject's genome. A gene may include both introns and exons. As used herein, the term "intron," and its equivalents, may refer to a subset of DNA nucleotides in a gene that is not used to code for any functional RNA that is expressed by the organism. As used herein, the term "exon," and its equivalents, may refer to a subset of DNA nucleotides in a gene that is used to code for a functional RNA. For instance, an exon may encode a polypeptide or protein that is expressed by the organism. In various examples, a gene can be represented in data (e.g., as data representative of the sequence of DNA nucleotides in the gene) or as a chemical structure (e.g., as the sequence of DNA nucleotides itself).
[0023] As used herein, the term "genome," and its equivalents, refers to the aggregate of genes of a subject. In various cases, a genome represents the sequences of several linear DNA molecules that are present in a subject's chromosomes. A "reference genome" refers to an aggregation of genes of one or more reference subjects. In various cases, a genome is represented in data.
[0024] As used herein, the terms "pangenome," "pan-genome," "supragenome," and their equivalents, refer to an aggregate set of genes from multiple subgroups (e.g., strains) within a population (e.g., a clade) of subjects. A pangenome, for example, indicates genes that are present in all subjects within the population, as well as genes that are present in some of the subjects of the population. A pangenome is represented in data, for instance.
[0025] As used herein, the term "transcriptome," and its equivalents, refers to the aggregate of RNA sequences of a subject. In some cases, a transcriptome is limited to mRNA sequences. In various examples, a transcriptome is represented in data.
[0026] As used herein, the terms "genomic DNA," "gDNA," "chromosomal DNA," and their equivalents, may refer to DNA molecules that are obtained from a chromosome and/or nucleus of a cell.
[0027] As used herein, the terms "DNA fragment," "fragment," and their equivalents, may refer to DNA molecules that are excised and/or broken off from a larger DNA molecule.
[0028] As used herein, the terms "cell-free DNA," "cfDNA," and their equivalents, may refer to DNA fragments that are non-encapsulated and obtained outside of cells within a sample (e.g., a liquid biopsy sample).
[0029] As used herein, the terms "circulating tumor DNA," "ctDNA," and their equivalents, may refer to a cfDNA molecule that originates from a cancer cell.
[0030] As used herein, the term "promoter," and its equivalents, may refer to a portion of a DNA molecule that binds one or more proteins in order to initiate transcription of a gene. For example, the promoter is located "upstream" of the gene. For example, the promoter is located between the 5' end of the DNA molecule and the gene. A promoter may include one or more binding sites for RNA polymerase, and/or one or more transcription factor binding sites. In some examples, a promoter includes one or more CpG islands. A promoter, for instance, includes a transcription start site.
[0031] As used herein, the terms "CpG island," "CGI," "CpG site," and their equivalents, may refer to a continuous portion of a DNA molecule whose sequence includes greater than a threshold amount (e.g., greater than 50%) of G-C base pairs.
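The threshold test in the definition of paragraph [0031] can be illustrated with a short sketch. It follows only the definition given here (fraction of G and C bases compared to a threshold); the function names `gc_fraction` and `exceeds_gc_threshold` are hypothetical, and the 0.5 default mirrors the "greater than 50%" example.

```python
def gc_fraction(sequence: str) -> float:
    """Return the fraction of bases in the sequence that are G or C."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    gc_count = sum(1 for base in sequence if base in "GC")
    return gc_count / len(sequence)


def exceeds_gc_threshold(sequence: str, threshold: float = 0.5) -> bool:
    """Return True if the G-C content is strictly greater than the threshold."""
    return gc_fraction(sequence) > threshold
```

Because the comparison is strict, a sequence with exactly 50% G-C content (e.g., `"ATGC"`) does not satisfy the example threshold.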
[0032] As used herein, the term "enhancer," and its equivalents, may refer to a portion of a DNA molecule that binds one or more proteins in order to increase the chance that a gene will be transcribed. For instance, an enhancer includes one or more transcription factor binding sites. In various cases, an enhancer includes one or more CpG islands.
[0033] As used herein, the term "cancer," and its equivalents, may refer to a condition of a subject in which particular cells (referred to as "cancer cells") divide uncontrollably in the subject's body. In some cases, a cancer is characterized by a location or tissue type from which the cancer cells originated. In some examples, a cancer is characterized by a location or tissue type in which the cancer cells are located.
[0034] As used herein, the terms "tumor," "neoplasm," and their equivalents, may refer to a mass of tissue including cancer cells.
[0035] As used herein, the terms "tissue of origin," "tissue origin," and their equivalents, refer to a differentiated type of tissue in which cancer cells in the body of a subject began dividing uncontrollably.
[0036] As used herein, the terms "liquid biopsy," "fluid biopsy," and their equivalents, may refer to a process of obtaining a fluid sample from a subject's body. The sample, for instance, can be referred to as a "liquid biopsy sample." Examples of fluids that are sampled from the body include blood, plasma, cerebrospinal fluid, sputum, stool, urine, lymphatic fluid, and saliva.
[0037] As used herein, the term "tissue biopsy," and its equivalents, may refer to a process of obtaining a sample of cells from a subject's body. A tissue biopsy, in various cases, is performed by cutting a mass of cells from the subject's body. For instance, a tissue biopsy is a procedure performed by a surgeon, interventional radiologist, interventional cardiologist, or other specialized clinician. The term "tissue" or "tissue biopsy sample" can be used to refer to the sample of cells obtained using a tissue biopsy.
[0038] As used herein, the term "subject," and its equivalents, may refer to a human or non-human animal. A subject that is receiving care from at least one care provider may be referred to as a "patient."
[0039] As used herein, the terms "machine learning," "ML," "computer learning," "artificial intelligence," and their equivalents, may refer to the use of one or more computing devices to learn patterns in training data. The process of learning these patterns may be referred to as "training." In particular cases, one or more computing devices may perform machine learning by executing a machine learning model. As used herein, the terms "machine learning model," "ML model," and their equivalents, may refer to data encoding instructions that, when executed by at least one computing device, cause the at least one computing device to learn patterns in training data by optimizing one or more metrics, values, or other types of parameters. After training, an ML model, when executed by at least one computing device, causes the at least one computing device to utilize the optimized parameters in order to perform one or more tasks.
[0040] As used herein, the term "variant," and its equivalents, may refer to a difference between a subject genetic sequence and a reference sequence. For instance, a variant may correspond to a difference between one or more nucleotides in a genome of a subject and one or more corresponding nucleotides in at least one reference genome or pangenome. A variant may be characterized by its identity (e.g., what nucleotides are different), its position (e.g., where the nucleotides are located in the genome, what chromosome contains the nucleotides, what gene contains the nucleotides, etc.), its length (e.g., how many nucleotides are different from the reference sequence), its type (e.g., substitution, insertion, deletion, copy number alteration, rearrangement of fusion, etc.), and other features that indicate its significance and/or relevance. In some cases, a variant represents any apparent alteration in a sequence that has been read from a nucleic acid molecule with respect to the reference sequence, such as reads cleaved by restriction enzymes (RE). In various examples, a variant can be represented in data (e.g., by data characterizing the variant) or as a chemical structure (e.g., the nucleotides themselves). As used herein, the term "mutation," and its equivalents, may refer to a change in a gene.
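One way a variant could be "represented in data" using the characteristics listed in paragraph [0040] (identity, position, length, type) is sketched below. The class and field names, the 1-based position convention, and the example values are all hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class VariantType(Enum):
    SUBSTITUTION = "substitution"
    INSERTION = "insertion"
    DELETION = "deletion"
    COPY_NUMBER_ALTERATION = "copy number alteration"
    REARRANGEMENT = "rearrangement"


@dataclass(frozen=True)
class Variant:
    """Hypothetical record describing one variant relative to a reference."""

    chromosome: str             # position: which chromosome contains the nucleotides
    position: int               # position: 1-based coordinate (assumed convention)
    reference: str              # identity: nucleotides in the reference sequence
    observed: str               # identity: nucleotides in the subject sequence
    length: int                 # number of nucleotides that differ from the reference
    variant_type: VariantType   # type of the variant
    gene: Optional[str] = None  # position: gene containing the nucleotides, if any


# Illustrative, made-up single-nucleotide substitution.
example = Variant(
    chromosome="chr1",
    position=12345,
    reference="A",
    observed="G",
    length=1,
    variant_type=VariantType.SUBSTITUTION,
    gene="GENE_A",
)
```

Making the record immutable (`frozen=True`) is a design choice for the sketch; it lets variant records be used as dictionary keys or set members when aggregating calls.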
[0041] As used herein, the term "substitution," and its equivalents, can refer to a nucleotide in a subject sequence that is different than an equivalent nucleotide (e.g., a nucleotide at the same position) in a reference sequence.
[0042] As used herein, the term "insertion," and its equivalents, can refer to a nucleotide in a subject sequence that is added with respect to a reference sequence.
[0043] As used herein, the term "deletion," and its equivalents, can refer to the removal of a nucleotide from a nucleotide sequence.
[0044] As used herein, the terms "copy number alteration," "CNA," "copy number variation," "CNV," and their equivalents, can refer to a portion of a reference sequence that is repeated.
[0045] As used herein, the terms "rearrangement of fusion," "fusion rearrangement," "translocation," and their equivalents, can refer to a change in the relative position of one or more portions of a reference sequence, thereby generating a gene that was not present in the reference sequence.
[0046] As used herein, the term "sequencing," and its equivalents, may refer to a process of identifying the order and identity of monomers in a polymer chain, such as the order and identity of nucleotides in a DNA or RNA molecule. The terms "whole genome sequencing," "WGS," and their equivalents, may refer to the process of sequencing an entire genome of a subject, including the introns and exons of the genes of the subject. The terms "whole exome sequencing," "WES," and their equivalents, may refer to the process of sequencing all exons (the exome) of a subject. The term "targeted sequencing," and its equivalents, may refer to the process of sequencing a portion of the genome of a subject, such as sequencing a single gene of the subject. Various techniques can be utilized to sequence a DNA or RNA molecule, such as massively parallel sequencing (MPS), nanopore sequencing, direct sequencing, Sanger sequencing, or next-generation sequencing (NGS). In various cases, sequencing is performed on physical molecules (e.g., RNA or DNA) and is used to generate data.
[0047] As used herein, the terms "massive parallel sequencing," "massively parallel sequencing," "MPS," and their equivalents, may refer to a technique for simultaneously performing multiple reactions that can be used to identify the order and identity of monomers in multiple polymer chains. In particular cases, massive parallel sequencing can be performed using sequencing-by-synthesis on clonally amplified DNA molecules that are located in spatially separated regions, which are individually monitored by sensors.
[0048] As used herein, the term "nanopore sequencing," and its equivalents, may refer to a technique for identifying the order and identity of monomers in a polymer chain by transporting the polymer chain from a first space to a second space, wherein the first space and the second space are separated by a substrate, by directing the polymer chain through a small hole (known as a "nanopore") embedded in the substrate, and monitoring a relative electrical signal (e.g., a voltage or current) between the first space and the second space.
[0049] As used herein, the term "sensor," and its equivalents, may refer to a physical device or other apparatus that is configured to detect one or more detection signals.
[0050] As used herein, the term "detection signal," and its equivalents, may refer to a physical signal that can be identified, characterized, or otherwise perceived by a sensor.
[0051] As used herein, the term "sequence read data," and its equivalents, may refer to data that is indicative of an order and identity of monomers in a polymer, such as the order and identity of nucleotides in a DNA or RNA sequence. In various implementations, sequence read data is generated via a sequencing operation.
[0052] As used herein, the term "image," and its equivalents, may refer to a 2D or 3D array of data indicative of an array of pixels or voxels.
[0053] As used herein, the term "ligating," and its equivalents, may refer to a process of joining two molecules together, for example, with a chemical bond.
[0054] As used herein, the term "adapter," and its equivalents, may refer to an oligonucleotide that can be ligated to a target nucleic acid molecule. In various cases, an adapter prepares the target nucleic acid molecule for sequencing.
[0055] As used herein, the term "bait molecule," and its equivalents, may refer to a nucleic acid molecule having a region that is complementary to a region of a target molecule (e.g., cfDNA). A bait molecule includes, for instance, a nucleic acid molecule that can hybridize to (i.e., is complementary to) a target molecule and can be used to capture the target molecule. In some instances, the bait molecule is a capture oligonucleotide (or capture probe). In some instances, the bait molecule is suitable for solution phase hybridization to the target molecule. In some instances, the bait molecule is suitable for solid phase hybridization to the target molecule. In some instances, the bait molecule is suitable for both solution-phase and solid-phase hybridization to the target molecule. The design and construction of bait molecules is described in more detail in, e.g., International Patent Application Publication No. WO 2020/236941.
[0056] As used herein, the term "amplifying," and its equivalents, may refer to a process of generating copies of a target molecule, such as a nucleic acid molecule.
[0057] As used herein, the term "hybridization," and its equivalents, may refer to a process by which two complementary single-stranded nucleic acid molecules bind to one another, thereby forming a double-stranded nucleic acid molecule. In certain examples, the double-stranded nature of the nucleic acid molecule is maintained under stringent hybridization conditions. Exemplary stringent hybridization conditions include an overnight incubation at 42 °C in a solution including 50% formamide, 5X SSC (750 mM NaCl, 75 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5X Denhardt's solution, 10% dextran sulfate, and 20 µg/ml denatured, sheared salmon sperm DNA, followed by washing the filters in 0.1X SSC at 50 °C.
[0058] As used herein, the term "complementary," and its equivalents, may refer to a state of two single-stranded nucleic acid molecules with respective sequences that cause the nucleic acid molecules to spontaneously hybridize to one another. One nucleic acid molecule, for instance, may have a sequence that causes each nucleic acid to hydrogen bond to a respective nucleic acid in the other nucleic acid molecule.
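For DNA, the complementarity described in paragraph [0058] can be checked computationally with a reverse-complement comparison. The sketch below assumes both strands are written 5' to 3' and tests only perfect complementarity; the names `COMPLEMENT`, `reverse_complement`, and `are_complementary` are hypothetical.

```python
# Standard DNA base pairing: A with T, and C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


def are_complementary(strand_a: str, strand_b: str) -> bool:
    """Return True if two 5'-to-3' DNA strands are perfectly complementary."""
    return strand_b.upper() == reverse_complement(strand_a)
```

For example, `reverse_complement("ATGC")` evaluates to `"GCAT"`, so `are_complementary("ATGC", "GCAT")` is true. Hybridization under real conditions can tolerate some mismatches, which this exact-match sketch does not model.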
[0059] As used herein, the terms "therapy," "treatment," and their equivalents, may refer to a composition or process that can be used to remediate a health problem. Cancer therapies, for instance, include surgery, radiotherapy, chemotherapy, immunotherapy, cell-based therapies, and the like. Examples of cancer therapies include abemaciclib (Verzenio), abiraterone acetate (Zytiga), acalabrutinib (Calquence), ado-trastuzumab emtansine (Kadcyla), afatinib dimaleate (Gilotrif), aldesleukin (Proleukin), alectinib (Alecensa), alemtuzumab (Campath), alitretinoin (Panretin), alpelisib (Piqray), amivantamab-vmjw (Rybrevant), anastrozole (Arimidex), apalutamide (Erleada), asciminib hydrochloride (Scemblix), atezolizumab (Tecentriq), avapritinib (Ayvakit), avelumab (Bavencio), axicabtagene ciloleucel (Yescarta), axitinib (Inlyta), belantamab mafodotin-blmf (Blenrep), belimumab (Benlysta), belinostat (Beleodaq), belzutifan (Welireg), bevacizumab (Avastin), bexarotene (Targretin), binimetinib (Mektovi), blinatumomab (Blincyto), bortezomib (Velcade), bosutinib (Bosulif), brentuximab vedotin (Adcetris), brexucabtagene autoleucel (Tecartus), brigatinib (Alunbrig), cabazitaxel (Jevtana), cabozantinib (Cabometyx, Cometriq), canakinumab (Ilaris), capmatinib hydrochloride (Tabrecta), carfilzomib (Kyprolis), cemiplimab-rwlc (Libtayo), ceritinib (LDK378/Zykadia), cetuximab (Erbitux), cobimetinib (Cotellic), copanlisib hydrochloride (Aliqopa), crizotinib (Xalkori), dabrafenib (Tafinlar), dacomitinib (Vizimpro), daratumumab (Darzalex), daratumumab and hyaluronidase-fihj (Darzalex Faspro), darolutamide (Nubeqa), dasatinib (Sprycel), denileukin diftitox (Ontak), denosumab (Xgeva), dinutuximab (Unituxin), dostarlimab-gxly (Jemperli), durvalumab (Imfinzi), duvelisib (Copiktra), elotuzumab (Empliciti), enasidenib mesylate (Idhifa), encorafenib (Braftovi), enfortumab vedotin-ejfv (Padcev), entrectinib (Rozlytrek), enzalutamide (Xtandi), erdafitinib (Balversa), erlotinib (Tarceva), everolimus (Afinitor), exemestane (Aromasin), fam-trastuzumab deruxtecan-nxki (Enhertu), fedratinib hydrochloride (Inrebic), fulvestrant (Faslodex), gefitinib (Iressa), gemtuzumab ozogamicin (Mylotarg), gilteritinib (Xospata), glasdegib maleate (Daurismo), hyaluronidase-zzxf (Phesgo), ibrutinib (Imbruvica), ibritumomab tiuxetan (Zevalin), idecabtagene vicleucel (Abecma), idelalisib (Zydelig), imatinib mesylate (Gleevec), infigratinib phosphate (Truseltiq), inotuzumab ozogamicin (Besponsa), iobenguane I 131 (Azedra), ipilimumab (Yervoy), isatuximab-irfc (Sarclisa), ivosidenib (Tibsovo), ixazomib citrate (Ninlaro), lanreotide acetate (Somatuline Depot), lapatinib (Tykerb), larotrectinib sulfate (Vitrakvi), lenvatinib mesylate (Lenvima), letrozole (Femara), lisocabtagene maraleucel (Breyanzi), loncastuximab tesirine-lpyl (Zynlonta), lorlatinib (Lorbrena), lutetium Lu 177-dotatate (Lutathera), margetuximab-cmkb (Margenza), midostaurin (Rydapt), mobocertinib succinate (Exkivity), mogamulizumab-kpkc (Poteligeo), moxetumomab pasudotox-tdfk (Lumoxiti), naxitamab-gqgk (Danyelza), necitumumab (Portrazza), neratinib maleate (Nerlynx), nilotinib (Tasigna), niraparib tosylate monohydrate (Zejula), nivolumab (Opdivo), obinutuzumab (Gazyva), ofatumumab (Arzerra), olaparib (Lynparza), olaratumab (Lartruvo), osimertinib (Tagrisso), palbociclib (Ibrance), panitumumab (Vectibix), panobinostat (Farydak), pazopanib (Votrient), pembrolizumab (Keytruda), pemigatinib (Pemazyre), pertuzumab (Perjeta), pexidartinib hydrochloride (Turalio), polatuzumab vedotin-piiq (Polivy), ponatinib hydrochloride (Iclusig), pralatrexate (Folotyn), pralsetinib (Gavreto), radium 223 dichloride (Xofigo), ramucirumab (Cyramza), regorafenib (Stivarga), ribociclib (Kisqali), ripretinib (Qinlock), rituximab (Rituxan), rituximab and hyaluronidase human (Rituxan Hycela), romidepsin (Istodax), rucaparib camsylate (Rubraca), ruxolitinib phosphate (Jakafi), sacituzumab govitecan-hziy (Trodelvy), seliciclib, selinexor (Xpovio), selpercatinib (Retevmo), selumetinib sulfate (Koselugo), siltuximab (Sylvant), sipuleucel-T (Provenge), sirolimus protein-bound particles (Fyarro), sonidegib (Odomzo), sorafenib (Nexavar), sotorasib (Lumakras), sunitinib (Sutent), tafasitamab-cxix (Monjuvi), tagraxofusp-erzs (Elzonris), talazoparib tosylate (Talzenna), tamoxifen (Nolvadex), tazemetostat hydrobromide (Tazverik), tebentafusp-tebn (Kimmtrak), temsirolimus (Torisel), tepotinib hydrochloride (Tepmetko), tisagenlecleucel (Kymriah), tisotumab vedotin-tftv (Tivdak), tocilizumab (Actemra), tofacitinib (Xeljanz), tositumomab (Bexxar), trametinib (Mekinist), trastuzumab (Herceptin), tretinoin (Vesanoid), tivozanib hydrochloride (Fotivda), toremifene (Fareston), tucatinib (Tukysa), umbralisib tosylate (Ukoniq), vandetanib (Caprelsa), vemurafenib (Zelboraf), venetoclax (Venclexta), vismodegib (Erivedge), vorinostat (Zolinza), zanubrutinib (Brukinsa), ziv-aflibercept (Zaltrap), and combinations thereof. Examples of cancer therapies also include targeted antibody-based therapies (e.g., antibody-drug conjugates and antibody-radioisotope conjugates) and targeted immune cell therapies (e.g., immune effector cells genetically modified to express a chimeric antigen receptor (CAR)).
[0060] As used herein, the term "treatment-responsive," and its equivalents, may refer to a type of cancer cells that can be substantially killed using a predetermined type of therapy. For example, cancer cells of a subject may be responsive to a particular treatment if, after the subject is administered the treatment, the cancer cells are diminished by a particular progression level (e.g., radiographic progression level, marker-based progression level, such as prostate-specific antigen (PSA) progression, etc.). Accordingly, the responsiveness of the cells to the type of therapy may indicate the effectiveness of that therapy.
[0061] As used herein, the term "treatment-resistant," and its equivalents, may refer to a type of cancer that cannot be substantially killed using a predetermined type of therapy.
[0062] As used herein, the term "metastasis profile," and its equivalents, may refer to a propensity of a type of cancer to metastasize into one or more differentiated tumor types besides the cancer's tissue origin. In some implementations, the metastasis profile can further indicate the type of tissue in which the cancer can or is likely to metastasize.
[0063] As used herein, the term "clinical trial," and its equivalents, may refer to a research study used to evaluate a hypothesis based on participation by one or more subjects. In various examples, a clinical trial can be used to assess the efficacy and/or safety of a proposed therapy. A clinical trial may be performed in furtherance of approval of a treatment by a regulatory authority (e.g., the United States Food & Drug Administration (FDA)).
Description of Example Implementations
[0064] Various implementations of the present disclosure will now be described with reference to the accompanying Figures.
[0065] FIG. 1 illustrates an example environment 100 for identifying significant copy number variants of segments of a genome of a subject 102. In various cases, a subject 102 presents with one or more symptoms of a condition. The condition, for instance, is a pathological condition.
[0066] The subject 102, for instance, may present to the clinical environment with a lesion 104. In various cases, the lesion 104 may be a tumor that includes cancer cells. According to various examples, the subject 102 has one or more types of cancer, such as adrenal cancer, bladder cancer, blood cancer, bone cancer, brain cancer, breast cancer, carcinoma, cervical cancer, colon cancer, colorectal cancer, corpus uterine cancer, ear, nose and throat (ENT) cancer, endometrial cancer, esophageal cancer, gastrointestinal cancer, head and neck cancer, Hodgkin's disease, intestinal cancer, kidney cancer, larynx cancer, leukemia, liver cancer, lymph node cancer, lymphoma, lung cancer, melanoma, mesothelioma, myeloma, nasopharynx cancer, a neuroblastoma, non-Hodgkin's lymphoma, oral cancer, ovarian cancer, pancreatic cancer, penile cancer, pharynx cancer, prostate cancer, rectal cancer, sarcoma, seminoma, skin cancer, stomach cancer, a teratoma, testicular cancer, thyroid cancer, uterine cancer, vaginal cancer, a vascular tumor, or combinations or metastases thereof.
[0067] In some embodiments, the subject 102 has a B cell cancer (multiple myeloma), a melanoma, breast cancer, lung cancer, bronchus cancer, colorectal cancer, prostate cancer, pancreatic cancer, stomach cancer, ovarian cancer, urinary bladder cancer, brain cancer, central nervous system cancer, peripheral nervous system cancer, esophageal cancer, cervical cancer, uterine cancer, endometrial cancer, cancer of an oral cavity, cancer of a pharynx, liver cancer, kidney cancer, testicular cancer, biliary tract cancer, small bowel cancer, appendix cancer, salivary gland cancer, thyroid gland cancer, adrenal gland cancer, osteosarcoma, chondrosarcoma, a cancer of hematological tissue, an adenocarcinoma, an inflammatory myofibroblastic tumor, a gastrointestinal stromal tumor (GIST), colon cancer, multiple myeloma (MM), myelodysplastic syndrome (MDS), myeloproliferative disorder (MPD), acute lymphocytic leukemia (ALL), acute myelocytic leukemia (AML), chronic myelocytic leukemia (CML), chronic lymphocytic leukemia (CLL), polycythemia vera, Hodgkin lymphoma, non-Hodgkin lymphoma (NHL), soft-tissue sarcoma, fibrosarcoma, myxosarcoma, liposarcoma, osteogenic sarcoma, chordoma, angiosarcoma, endotheliosarcoma, lymphangiosarcoma, lymphangioendotheliosarcoma, synovioma, mesothelioma, Ewing's tumor, leiomyosarcoma, rhabdomyosarcoma, squamous cell carcinoma, basal cell carcinoma, adenocarcinoma, sweat gland carcinoma, sebaceous gland carcinoma, papillary carcinoma, papillary adenocarcinomas, medullary carcinoma, bronchogenic carcinoma, renal cell carcinoma, hepatoma, bile duct carcinoma, choriocarcinoma, seminoma, embryonal carcinoma, Wilms' tumor, bladder carcinoma, epithelial carcinoma, glioma, astrocytoma, medulloblastoma, craniopharyngioma, ependymoma, pinealoma, hemangioblastoma, acoustic neuroma, oligodendroglioma, meningioma, neuroblastoma, retinoblastoma, follicular lymphoma, diffuse large B-cell lymphoma, mantle cell lymphoma, hepatocellular carcinoma, thyroid cancer, gastric cancer, head and neck cancer, small cell cancer, essential thrombocythemia, agnogenic myeloid metaplasia, hypereosinophilic syndrome, systemic mastocytosis, familial hypereosinophilia, chronic eosinophilic leukemia, neuroendocrine cancers, or a carcinoid tumor.
[0068] In some embodiments, the subject 102 has acute lymphoblastic leukemia (Philadelphia chromosome positive), acute lymphoblastic leukemia (precursor B-cell), acute myeloid leukemia (FLT3+), acute myeloid leukemia (with an IDH2 mutation), anaplastic large cell lymphoma, basal cell carcinoma, B-cell chronic lymphocytic leukemia, bladder cancer, breast cancer (HER2 overexpressed / amplified), breast cancer (HER2+), breast cancer (HR+, HER2-), cervical cancer, cholangiocarcinoma, chronic lymphocytic leukemia, chronic lymphocytic leukemia (with 17p deletion), chronic myelogenous leukemia, chronic myelogenous leukemia (Philadelphia chromosome positive), classical Hodgkin lymphoma, colorectal cancer, colorectal cancer (dMMR / MSI-H), colorectal cancer (KRAS wild type), cryopyrin-associated periodic syndrome, a cutaneous T-cell lymphoma, dermatofibrosarcoma protuberans, a diffuse large B-cell lymphoma, fallopian tube cancer, a follicular B-cell non-Hodgkin lymphoma, a follicular lymphoma, gastric cancer, gastric cancer (HER2+), gastroesophageal junction (GEJ) adenocarcinoma, a gastrointestinal stromal tumor, a gastrointestinal stromal tumor (KIT+), a giant cell tumor of the bone, a glioblastoma, granulomatosis with polyangiitis, a head and neck squamous cell carcinoma, a hepatocellular carcinoma, Hodgkin lymphoma, juvenile idiopathic arthritis, lupus erythematosus, a mantle cell lymphoma, medullary thyroid cancer, melanoma, a melanoma with a BRAF V600 mutation, a melanoma with a BRAF V600E or V600K mutation, Merkel cell carcinoma, multicentric Castleman's disease, multiple hematologic malignancies including Philadelphia chromosome-positive ALL and CML, multiple myeloma, myelofibrosis, a non-Hodgkin's lymphoma, a nonresectable subependymal giant cell astrocytoma associated with tuberous sclerosis, a non-small cell lung cancer, a non-small cell lung cancer (ALK+), a non-small cell lung cancer (PD-L1+), a non-small cell lung cancer (with ALK fusion or ROS1 gene alteration), a non-small cell lung cancer (with BRAF V600E mutation), a non-small cell lung cancer (with an EGFR exon 19 deletion or exon 21 substitution (L858R) mutations), a non-small cell lung cancer (with an EGFR T790M mutation), a non-small cell lung cancer KRAS (+ / - G12C), a non-small cell lung cancer TMB-H, a non-small cell lung cancer MET exon 14 skipping, a non-small cell lung cancer ERBB2 inframe indel, a non-small cell lung cancer EGFR exon 20 indel, a neurotrophic tyrosine receptor kinase (NTRK)-positive cancer, ovarian cancer, ovarian cancer (with a BRCA mutation), pancreatic cancer, a pancreatic, gastrointestinal, or lung origin neuroendocrine tumor, a pediatric neuroblastoma, a peripheral T-cell lymphoma, peritoneal cancer, prostate cancer, a renal cell carcinoma, a small lymphocytic lymphoma, a soft tissue sarcoma, a solid tumor (MSI-H / dMMR), a squamous cell cancer of the head and neck, a squamous non-small cell lung cancer, thyroid cancer, a thyroid carcinoma, urothelial cancer, a urothelial carcinoma, or Waldenstrom's macroglobulinemia.
[0069] In various cases, a care provider 106 (also referred to as a "healthcare provider”) is responsible for diagnosing and / or treating the subject 102. According to some implementations, the lesion 104 may be initially identified using a noninvasive technique. For example, the lesion 104 may be visualized using an imaging modality, such as ultrasound, x-ray, computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), single photon emission CT (SPECT), or any combination thereof. Using the noninvasive technique, the care provider 106 may identify the presence of the lesion 104 but may be unable to determine whether the lesion 104 is a cancerous tumor using noninvasive diagnostic methodologies.
[0070] In various implementations, the care provider 106 is unable to accurately identify a condition of the subject 102 based solely on noninvasive diagnostic techniques. In various cases, the care provider 106 cannot conclusively determine whether the subject 102 has a type of cancer based on noninvasive diagnostic techniques. For example, the care provider 106 is unable to identify a type of the lesion 104 (e.g., a tumor) using imaging techniques. The care provider 106 may be unable to identify a characteristic of the subject 102 presenting with a disease (e.g., cancer), wherein the characteristic is determinative of, or at least correlated with, an effectiveness of at least one therapy at treating the disease, an ineffectiveness of at least one therapy at treating the disease, a survivability (e.g., a likelihood that the subject will survive by a predetermined date or time), an expected quality of life, at least one predetermined symptom, at least one comorbidity, another factor relevant to the prognosis associated with the disease, or any combination thereof.
[0071] To further assess the condition of the subject 102, a sample 108 is obtained from the subject 102. In some examples, the sample 108 includes a tissue biopsy sample. For instance, the sample 108 is obtained by removing cells from the lesion 104 and from the subject 102. In some cases, the tissue biopsy sample is surgically excised from the subject 102. The care provider 106 could identify the condition of the subject 102 using histochemistry and / or immunohistochemistry. For instance, the care provider 106 could surgically remove a tissue sample from the lesion 104 and / or review the tissue sample using histochemistry and / or immunohistochemistry.
[0072] In some cases, the sample includes a liquid biopsy sample. The liquid biopsy sample 108, for instance, includes blood, plasma, cerebrospinal fluid, sputum, stool, urine, lymphatic fluid, saliva, or some other fluid obtained from the body of the subject 102. In some cases, a blood sample is obtained intravenously from the subject 102. The liquid biopsy sample 108, according to various examples, is a plasma sample obtained from the blood of the subject 102. The liquid biopsy sample 108, for instance, can be obtained in a minimally invasive procedure, which could be performed by a medical technician rather than a surgeon.
[0073] The sample 108 includes nucleic acid molecules 110. According to some examples, the nucleic acid molecules 110 include genomic DNA (gDNA). For instance, the nucleic acid molecules 110 include chromosomal DNA that is located in, or extracted from, cells in the sample 108. According to some cases, the DNA is extracted from nuclei and the cells in the sample 108 using mechanical shearing and / or the introduction of a chemical (e.g., a detergent). The DNA may be subsequently isolated from proteins and other cellular materials. In some implementations, the nucleic acid molecules 110 indicate an entire genome of the subject 102 and / or the lesion 104. Thus, a genome of the subject 102 and / or the lesion 104 can be determined by sequencing the DNA in the nucleic acid molecules 110.
[0074] In some examples, the nucleic acid molecules 110 include RNA. In some implementations, the nucleic acid molecules 110 include messenger RNA (mRNA), microRNA, non-coding RNA, functional RNA, or any combination thereof. Various RNA in the nucleic acid molecules 110 may be indicative of proteins expressed in the cells of the subject 102 and / or the lesion 104.
[0075] In some cases, the nucleic acid molecules 110 include cell-free DNA (cfDNA). In examples in which the subject 102 has cancer (e.g., the lesion 104 is a cancerous tumor), the cfDNA, for instance, includes circulating tumor DNA (ctDNA) and / or non-ctDNA. In cases wherein the lesion 104 is a tumor, cancer cells within the lesion 104 will lyse and release the ctDNA into the bloodstream of the subject 102. In some cases, the ctDNA is released from circulating tumor cells (CTCs). Further, other cells additionally release non-ctDNA into the bloodstream of the subject 102. In general, the cfDNA includes fragments with lengths that are in a range of 1 to 500, 3 to 500, or 100 to 500 bases. For instance, the cfDNA includes fragments that are 170 bases long and / or fragments that are 340 bases long. For example, the cfDNA includes fragments that are 100 to 240 bases long and / or fragments that are 270 to 410 bases long.
[0076] In various cases, the sample 108 is transported to a location that is remote from the subject 102 for further processing. For example, the sample 108 is removed from the subject 102 in a clinical environment (e.g., a hospital) and is then transported to a remote laboratory for further testing and analysis.
[0077] A sequencer 112 is configured to generate sequence read data 114 indicating the sequences of the nucleic acid molecules 110. The sequencer 112, for instance, includes one or more devices that are configured to generate the sequence read data 114 by processing at least a portion of the sample 108. In some cases, the nucleic acid molecules 110 are extracted from the sample 108. The extraction can be performed by the sequencer 112, by another device, manually (e.g., by a laboratory technician), or any combination thereof. Any appropriate extraction method known to those of ordinary skill in the art can be utilized. The sequence read data 114, in some cases, corresponds to a portion of a genome of the sample 108 (e.g., a genome of the subject 102 and / or lesion 104). In some examples, sequence read data 114 is based on a full genome, full transcriptome, full exome, or a combination thereof, of the sample 108.
[0078] In various cases, the sequencer 112 is configured to perform one or more processes (e.g., chemical reactions) on the nucleic acid molecules 110 in order to prepare the nucleic acid molecules 110 for sequencing. For instance, the sequencer 112 may ligate adapters onto the nucleic acid molecules 110 and / or amplify the nucleic acid molecules 110, such that numerous copies of the ligated nucleic acid molecules 110 are available for sequencing. Examples of the adapters include, for example, amplification primers, flow cell adapter sequences, substrate adapter sequences, or sample index sequences. The nucleic acid molecules 110 (e.g., the ligated nucleic acid molecules 110) may be amplified by generating multiple copies of the nucleic acid molecules 110 using one or more techniques such as polymerase chain reaction (PCR), a non-PCR amplification technique, or an isothermal amplification technique.
[0079] The sequencer 112 may identify the length, position, and identity of the bases in the nucleic acid molecules 110 by sequencing the nucleic acid molecules 110 (e.g., the amplified and / or ligated nucleic acid molecules 110). In various implementations, the sequencer 112 utilizes first-generation sequencing (e.g., Sanger sequencing), second-generation sequencing (e.g., massively parallel sequencing), third-generation sequencing (e.g., nanopore sequencing), or a combination thereof. For instance, the sequencer 112 may be a next-generation sequencer configured to perform next-generation sequencing (NGS). In some cases, the sequencer 112 is configured to sequence substantially all of the nucleotides of all fragments of the nucleic acid molecules 110 obtained from the sample 108. In some examples, the sequencer 112 is configured to perform targeted sequencing. For instance, the sequencer 112 may determine whether the fragments of the nucleic acid molecules 110 contain one or more predetermined sequences at one or more genomic locations.
[0080] In various cases, the sequencer 112 includes one or more sensors that are configured to detect physical signals (also referred to as "detection signals”) that are indicative of the nucleotide sequences of the nucleic acid molecules 110. The sequencer 112 may perform sequencing-by-synthesis. For example, the sequencer 112 may include one or more optical sensors configured to detect optical signals emitted from fluorescently tagged nucleotide triphosphates (NTPs) that are joined together in a synthesized DNA strand using the ligated nucleic acid molecules 110 as templates. The optical signals detected by the optical sensor(s), for instance, are indicative of the sequences of the nucleic acid molecules 110. The sequencer 112 may perform nanopore sequencing. In various cases, the sequencer 112 includes one or more electrical sensors configured to measure an electrical signal (e.g., an electrical current) across a substrate as the ligated nucleic acid molecules 110 are directed through a nanopore extending through the substrate. The electrical signal over time, in various cases, is indicative of the sequences of the nucleic acid molecules 110 in the sample 108. The sequencer 112, in various implementations, is configured to generate the sequence read data 114 as digital data based on the analog signals detected by the sensor(s). For instance, the sequencer 112 includes one or more analog to digital converters (ADCs). In various cases, the sequencer 112 includes at least one processor configured to generate the sequence read data 114.
[0081] In some implementations, the sequencer 112 performs RNA sequencing (RNA-seq) on the nucleic acid molecules 110. For example, the nucleic acid molecules 110 include RNA that is extracted from the sample 108. In some examples, the RNA in the nucleic acid molecules 110 is fragmented. In various implementations, complementary DNA (cDNA) is generated using reverse transcriptase, such that the cDNA includes sequences that are complementary to the RNA in the nucleic acid molecules 110 from the sample 108. The cDNA, according to various cases, can be sequenced using the DNA sequencing techniques described above. Accordingly, in some cases, the sequence read data 114 indicates sequences of RNA present in the sample 108, which may be indicative of the transcriptome of the subject 102 and / or the lesion 104.
[0082] In various cases, the sequencer 112 performs sequencing on a subset of the nucleic acid molecules 110. For instance, the sequencer 112 may perform targeted sequencing on one or more predetermined genes, such as any of the genes described herein. The sequencer 112, in some cases, may refrain from sequencing at least a portion of the nucleic acid molecules 110 that do not correspond to the subset.
[0083] Various analyses can be performed on the sequence read data 114 in order to identify a condition of the subject 102. For example, a copy number modeler 116 is configured to generate a copy number model 118 by analyzing the sequence read data 114. The copy number model 118, for instance, represents a computed copy number for each of multiple sequences-of-interest within the genome of the lesion 104 and / or the subject 102. A sequence-of-interest, for instance, is a segment within the genome.
[0084] In some cases, each sequence-of-interest includes one or more genes, or a portion of a gene. For instance, a sequence-of-interest may include one or more of, or include a portion of, at least one of ABL1, ACVR1B, AKT1, AKT2, AKT3, ALK, ALOX12B, AMER1, APC, AR, ARAF, ARFRP1, ARID1A, ASXL1, ATM, ATR, ATRX, AURKA, AURKB, AXIN1, AXL, BAP1, BARD1, BCL2, BCL2L1, BCL2L2, BCL6, BCOR, BCORL1, BCR, BRAF, BRCA1, BRCA2, BRD4, BRIP1, BTG1, BTG2, BTK, CALR, CARD11, CASP8, CBFB, CBL, CCND1, CCND2, CCND3, CCNE1, CD22, CD274, CD70, CD74, CD79A, CD79B, CDC73, CDH1, CDK12, CDK4, CDK6, CDK8, CDKN1A, CDKN1B, CDKN2A, CDKN2B, CDKN2C, CEBPA, CHEK1, CHEK2, CIC, CREBBP, CRKL, CSF1R, CSF3R, CTCF, CTNNA1, CTNNB1, CUL3, CUL4A, CXCR4, CYP17A1, DAXX, DDR1, DDR2, DIS3, DNMT3A, DOT1L, EED, EGFR, EMSY (C11orf30), EP300, EPHA3, EPHB1, EPHB4, ERBB2, ERBB3, ERBB4, ERCC4, ERG, ERRFI1, ESR1, ETV4, ETV5, ETV6, EWSR1, EZH2, EZR, FAM46C, FANCA, FANCC, FANCG, FANCL, FAS, FBXW7, FGF10, FGF12, FGF14, FGF19, FGF23, FGF3, FGF4, FGF6, FGFR1, FGFR2, FGFR3, FGFR4, FH, FLCN, FLT1, FLT3, FOXL2, FUBP1, GABRA6, GATA3, GATA4, GATA6, GID4 (C17orf39), GNA11, GNA13, GNAQ, GNAS, GRM3, GSK3B, H3F3A, HDAC1, HGF, HNF1A, HRAS, HSD3B1, ID3, IDH1, IDH2, IGF1R, IKBKE, IKZF1, INPP4B, IRF2, IRF4, IRS2, JAK1, JAK2, JAK3, JUN, KDM5A, KDM5C, KDM6A, KDR, KEAP1, KEL, KIT, KLHL6, KMT2A (MLL), KMT2D (MLL2), KRAS, LTK, LYN, MAF, MAP2K1, MAP2K2, MAP2K4, MAP3K1, MAP3K13, MAPK1, MCL1, MDM2, MDM4, MED12, MEF2B, MEN1, MERTK, MET, MITF, MKNK1, MLH1, MPL, MRE11A, MSH2, MSH3, MSH6, MST1R, MTAP, MTOR, MUTYH, MYB, MYC, MYCL, MYCN, MYD88, NBN, NF1, NF2, NFE2L2, NFKBIA, NKX2-1, NOTCH1, NOTCH2, NOTCH3, NPM1, NRAS, NT5C2, NTRK1, NTRK2, NTRK3, NUTM1, P2RY8, PALB2, PARK2, PARP1, PARP2, PARP3, PAX5, PBRM1, PDCD1, PDCD1LG2, PDGFRA, PDGFRB, PDK1, PIK3C2B, PIK3C2G, PIK3CA, PIK3CB, PIK3R1, PIM1, PMS2, POLD1, POLE, PPARG, PPP2R1A, PPP2R2A, PRDM1, PRKAR1A, PRKCI, PTCH1, PTEN, PTPN11, PTPRO, QKI, RAC1, RAD21, RAD51, RAD51B, RAD51C, RAD51D, RAD52, RAD54L, RAF1, RARA, RB1, RBM10, REL, RET, RICTOR, RNF43, ROS1, RPTOR, RSPO2, SDC4, SDHA, SDHB, SDHC, SDHD, SETD2, SF3B1, SGK1, SLC34A2, SMAD2, SMAD4, SMARCA4, SMARCB1, SMO, SNCAIP, SOCS1, SOX2, SOX9, SPEN, SPOP, SRC, STAG2, STAT3, STK11, SUFU, SYK, TBX3, TEK, TERC, TERT, TET2, TGFBR2, TIPARP, TMPRSS2, TNFAIP3, TNFRSF14, TP53, TSC1, TSC2, TYRO3, U2AF1, VEGFA, VHL, WHSC1, WHSC1L1, WT1, XPO1, XRCC2, ZNF217, ZNF703, ABL, ALK, ALL, B4GALNT1, BAFF, BCL2, BRAF, BRCA, BTK, CD19, CD20, CD3, CD30, CD319, CD38, CD52, CDK4, CDK6, CML, CRACC, CS1, CTLA-4, dMMR, EGFR, ERBB1, ERBB2, FGFR1-3, FLT3, GD2, HDAC, HER1, HER2, HR, IDH2, IL-10, IL-6, IL-6R, JAK1, JAK2, JAK3, KIT, KRAS, MEK, MET, MSI-H, mTOR, PARP, PD-1, PDGFR, PDGFRa, PDGFRβ, PD-L1, PI3Kδ, PIGF, PTCH, RAF, RANKL, RET, ROS1, SLAMF7, VEGF, VEGFA, or VEGFB.
[0085] For example, the copy number of a sequence-of-interest corresponds to a number of copies of the sequence-of-interest present in the genome of cells of the subject 102, such as cells in the lesion 104 itself. The genomes of cancer cells, for instance, can include various types of copy number variants. In some cases, these copy number variants can have a significant impact on the prognostic outcomes of host subjects, treatment resistance, treatment efficacy, and other characteristics related to expression. In some cases, if the cells of the lesion 104 include a significant number of copies of a gene, the cells may overexpress the gene.
[0086] The copy number of each sequence-of-interest can be calculated, for instance, by analyzing the sequence read data 114. In various cases, the copy number represents a modeled copy number, such as a copy number determined according to the techniques described in International Publication No. WO 2023 / 060250, which is incorporated by reference herein in its entirety. For instance, the modeled copy number is determined by fitting a model (e.g., a copy number grid model corresponding to the modeled copy number) to the sequence read data 114. In various cases, the modeled copy number can be calculated by determining a minor allele coverage ratio (e.g., for a gene-of-interest) by analyzing the sequence read data 114 and determining a major allele coverage ratio by analyzing the sequence read data 114. The minor and major allele coverage ratios, for instance, logically relate to the copy number of the gene associated with the minor and major alleles, as well as the tumor purity of the sample and the ploidy of the gene. The modeled copy number can further be determined by segmenting the genome into genomic segments and generating input data by determining a difference and / or sum between the major allele coverage ratio and the minor allele coverage ratio for genetic loci in the genome. If the difference between the major allele coverage ratio and the minor allele coverage ratio is plotted against the sum of the major allele coverage ratio and the minor allele coverage ratio, each genetic locus is expected to lie on one of a set of evenly spaced grid points. Copy numbers, for instance, are necessarily integer values. Various copy number grid models can be generated that represent the copy number space scaled and translated as a function of ploidy and tumor purity values. For instance, the grid models can correspond to different tumor purity and tumor ploidy estimates.
[0087] In some cases, the modeled copy number is determined by further fitting different copy number grid models (corresponding to allowed copy number states) to the input data, and selecting one of the copy number grid models that fits the input data. The modeled copy number, for instance, corresponds to the selected copy number grid model. In various cases, the modeled copy number is an integer value.
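As an illustrative, non-limiting sketch of the grid fitting described above, the following Python example generates candidate copy number grid models and selects the one that best fits a set of (sum, difference) points. The coverage model used here is an assumption chosen for illustration and is not taken from WO 2023 / 060250: normal cells are assumed to carry one copy of each allele, and the expected coverage ratio of an allele is assumed to be (purity × allele copies + (1 − purity)) / (purity × ploidy + 2 × (1 − purity)). The function names, the maximum copy count, and the squared-distance fit score are likewise hypothetical.

```python
from itertools import product


def grid_points(purity, ploidy, max_copies=6):
    """Map each integer (major, minor) allele state to its expected
    (sum, difference) of allele coverage ratios.

    Assumed model (illustrative only): normal cells contribute one copy of
    each allele, and coverage is normalized by the average copy number of
    the mixed tumor/normal sample.
    """
    denom = purity * ploidy + 2.0 * (1.0 - purity)
    points = {}
    for major in range(max_copies + 1):
        for minor in range(major + 1):
            ratio_sum = (purity * (major + minor) + 2.0 * (1.0 - purity)) / denom
            ratio_diff = purity * (major - minor) / denom
            points[(major, minor)] = (ratio_sum, ratio_diff)
    return points


def fit_error(observations, purity, ploidy):
    """Sum of squared distances from each observed point to its nearest grid point."""
    grid = list(grid_points(purity, ploidy).values())
    return sum(
        min((s - gs) ** 2 + (d - gd) ** 2 for gs, gd in grid)
        for s, d in observations
    )


def select_grid_model(observations, purities, ploidies):
    """Return the (purity, ploidy) pair whose grid best fits the observations."""
    return min(
        product(purities, ploidies),
        key=lambda model: fit_error(observations, *model),
    )


def nearest_state(point, purity, ploidy):
    """Return the integer (major, minor) state closest to an observed point."""
    s, d = point
    grid = grid_points(purity, ploidy)
    return min(grid, key=lambda st: (s - grid[st][0]) ** 2 + (d - grid[st][1]) ** 2)


# Synthetic observations generated from a known model (purity 0.8, ploidy 2.0).
true_grid = grid_points(0.8, 2.0)
observations = [true_grid[state] for state in [(1, 1), (2, 1), (2, 0), (1, 0)]]
best_purity, best_ploidy = select_grid_model(
    observations, purities=[0.4, 0.6, 0.8], ploidies=[2.0, 3.0]
)
major, minor = nearest_state(observations[1], best_purity, best_ploidy)
modeled_copy_number = major + minor  # integer-valued by construction
```

Because every grid point corresponds to an integer pair of allele copy numbers, the modeled copy number read off the selected grid is an integer, consistent with the description above. A practical implementation would likely need additional safeguards (e.g., noise handling and penalties for overly dense grids), which this sketch omits.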
[0088] The copy number model 118 indicates the calculated copy number of various sequences-of-interest in the genome of the subject 102 and / or lesion 104. However, in some cases, the calculated copy numbers are incorrect and / or insignificant. It may be beneficial to identify which of the calculated copy numbers in the copy number model 118 are significant— that is, relevant to the condition of the subject 102.
[0089] In various implementations of the present disclosure, a significance determiner 120 is configured to distinguish between significant and insignificant copy numbers indicated in the copy number model 118. The significance determiner 120, for instance, is configured to calculate a ratio for each sequence-of-interest (e.g., gene-of-interest) reflected in the copy number model 118. The ratio of a sequence-of-interest, for instance, is based on the modeled copy number for that sequence-of-interest (e.g., as indicated in the copy number model 118) and a ploidy 122 of the subject 102 and / or the lesion 104.
[0090] The ploidy 122, in various cases, represents an overall ploidy of the sample 108. In various cases, a ploidy calculator 124 is configured to determine the ploidy 122 based on the copy number model 118. For example, the ploidy calculator 124 determines the ploidy 122 by calculating an average (e.g., mean, median, mode, etc.) modeled copy number indicated in the copy number model 118. For instance, if the copy number model 118 indicates modeled copy numbers for one hundred sequences-of-interest, then the ploidy 122 may be calculated by dividing a sum of the modeled copy numbers by one hundred. In some cases, the ploidy 122 is weighted by genomic size and / or the number of supporting genomic data points.
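The averaging described above can be sketched as follows. This is a minimal illustration, assuming the modeled copy numbers are available as a simple list and that any weights (e.g., segment lengths in bases) are supplied in a parallel list; the function name `sample_ploidy` and its parameters are hypothetical.

```python
from statistics import mean, median


def sample_ploidy(copy_numbers, weights=None, method="mean"):
    """Estimate an overall ploidy as an average of modeled copy numbers.

    If weights are given (e.g., genomic size or number of supporting data
    points per sequence-of-interest), a weighted mean is returned instead.
    """
    if not copy_numbers:
        raise ValueError("at least one modeled copy number is required")
    if weights is not None:
        if len(weights) != len(copy_numbers):
            raise ValueError("weights must align with copy_numbers")
        total_weight = sum(weights)
        if total_weight <= 0:
            raise ValueError("weights must sum to a positive value")
        return sum(c * w for c, w in zip(copy_numbers, weights)) / total_weight
    if method == "mean":
        return mean(copy_numbers)
    if method == "median":
        return median(copy_numbers)
    raise ValueError(f"unsupported averaging method: {method}")


# Unweighted mean over four sequences-of-interest: (2 + 2 + 2 + 4) / 4 = 2.5.
unweighted = sample_ploidy([2, 2, 2, 4])

# Weighted by hypothetical segment sizes: (2 * 3e6 + 4 * 1e6) / 4e6 = 2.5.
weighted = sample_ploidy([2, 4], weights=[3_000_000, 1_000_000])
```

Weighting by genomic size prevents many short sequences-of-interest from dominating the estimate when they share the same copy number state.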
[0091] In various cases, the significance determiner 120 compares each ratio to a threshold. To determine whether sequences-of-interest have been significantly amplified, the significance determiner 120 may selectively report modeled copy numbers whose ratios meet or exceed one or more thresholds. For example, if the significance determiner 120 determines that the ratio is equal to or above a threshold, then the significance determiner 120 may infer that the modeled copy number is significant. In contrast, if the significance determiner 120 determines that the ratio is below the threshold, then the significance determiner 120 may infer that the modeled copy number is insignificant. In a particular instance, a modeled copy number of an EGFR gene is 10, a ploidy of the sample 108 is 2.5, the ratio is therefore 10 / 2.5 = 4.0, and the threshold is 3.0. In this case, because the ratio of 4.0 is above the threshold of 3.0, the copy number of the EGFR gene is determined to be significant.
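The comparison in the preceding paragraph can be expressed as a short sketch, using the EGFR values from the text. The function names are hypothetical, and the ratio is assumed to be the modeled copy number divided by the ploidy, which matches the worked example (10 / 2.5 = 4.0).

```python
def copy_number_ratio(modeled_copy_number, ploidy):
    """Ratio of a modeled copy number to the overall ploidy of the sample."""
    if ploidy <= 0:
        raise ValueError("ploidy must be positive")
    return modeled_copy_number / ploidy


def is_significant_amplification(modeled_copy_number, ploidy, threshold=3.0):
    """True when the ratio is equal to or above the amplification threshold."""
    return copy_number_ratio(modeled_copy_number, ploidy) >= threshold


# EGFR example from the text: copy number 10, ploidy 2.5, threshold 3.0.
egfr_ratio = copy_number_ratio(10, 2.5)
egfr_significant = is_significant_amplification(10, 2.5, threshold=3.0)
```

Normalizing by ploidy means that the same absolute copy number can be significant in a near-diploid sample yet insignificant in a sample whose genome is broadly duplicated.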
[0092] Implementations of the present disclosure can also be utilized to identify losses of sequences-of-interest. For example, individuals without cancer may generally have two copies of a tumor suppressor gene (e.g., TP53), one on each chromosome of a chromosome pair in their respective genomes. However, the subject 102 may have experienced a mutation that deleted one or both copies of the tumor suppressor gene from their genome. This loss can be identified by determining a modeled copy number for the tumor suppressor gene in the sample 108, generating a ratio by dividing the modeled copy number by the ploidy 122, and determining whether the ratio is equal to or below a loss threshold (e.g., 0.0 or 0.5). If the ratio is equal to or below the loss threshold, then the significance determiner 120 may indicate the loss of the tumor suppressor gene.
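A loss check follows the same pattern with the comparison reversed. The sketch below assumes the example loss thresholds from the text (0.0 and 0.5); the function name and the TP53 copy numbers are illustrative values, not data from the disclosure.

```python
def is_loss(modeled_copy_number, ploidy, loss_threshold=0.5):
    """True when the copy-number-to-ploidy ratio is at or below the loss threshold."""
    if ploidy <= 0:
        raise ValueError("ploidy must be positive")
    return modeled_copy_number / ploidy <= loss_threshold


# Hypothetical TP53 calls in a sample with ploidy 2.0.
one_copy_lost = is_loss(1, 2.0, loss_threshold=0.5)     # ratio 0.5
both_copies_lost = is_loss(0, 2.0, loss_threshold=0.0)  # ratio 0.0
no_loss = is_loss(2, 2.0, loss_threshold=0.5)           # ratio 1.0
```

Under these assumptions, a loss threshold of 0.0 flags only the deletion of every copy, whereas a threshold of 0.5 also flags the loss of a single copy in a sample whose ploidy is 2.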
[0093] The significance determiner 120, in various cases, is configured to output one or more significant copy number variants 126. The significant copy number variant(s) 126, for instance, indicate modeled copy numbers, sequences-of-interest, ratios, or any combination thereof, of any significant copy numbers that have been identified. The significant copy number variant(s) 126, for example, correspond to any sequence-of-interest that has been determined to have been significantly amplified and / or subject to a heterozygous loss.
[0094] One or more thresholds can be applied to ratios of various sequences-of-interest. According to various cases, each threshold is in a range of 0.5 to 5.0. In various cases, example thresholds are equal to 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, or 5.0.
[0095] In some examples, the threshold is specific to the sequence-of-interest associated with the ratio. For example, the ratio associated with the modeled copy number of a first sequence-of-interest may be compared to a first threshold, whereas the ratio associated with a modeled copy number of a second sequence-of-interest may be compared to a second threshold. In a particular example, the threshold is 2.0 for ERBB2 and 3.0 for a non-ERBB2 gene.
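Gene-specific thresholds of this kind can be sketched as a lookup with a default. The values 2.0 for ERBB2 and 3.0 for other genes come from the example above; the dictionary layout, the function names, and the copy numbers in the usage example are assumptions made for illustration.

```python
DEFAULT_THRESHOLD = 3.0             # example value for non-ERBB2 genes
GENE_THRESHOLDS = {"ERBB2": 2.0}    # example gene-specific override


def amplification_threshold(gene):
    """Return the gene-specific threshold, falling back to the default."""
    return GENE_THRESHOLDS.get(gene, DEFAULT_THRESHOLD)


def significant_amplifications(modeled_copy_numbers, ploidy):
    """Return {gene: ratio} for genes whose ratio meets their own threshold."""
    significant = {}
    for gene, copy_number in modeled_copy_numbers.items():
        ratio = copy_number / ploidy
        if ratio >= amplification_threshold(gene):
            significant[gene] = ratio
    return significant


# Hypothetical modeled copy numbers for a sample with ploidy 2.5.
calls = significant_amplifications({"ERBB2": 6, "EGFR": 6, "MYC": 10}, ploidy=2.5)
```

In this example ERBB2 and EGFR share the same ratio (6 / 2.5), yet only ERBB2 is reported because its threshold is lower.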
[0096] According to some cases, the threshold(s) can be modified based on one or more additional characteristics of the subject 102. The additional characteristics, for instance, include demographic characteristics of the subject 102 (e.g., a sex, age, etc. of the subject 102), previously diagnosed conditions of the subject 102 (e.g., whether the subject 102 is pregnant, whether the subject 102 has a genetic disorder, the presence and / or location of the lesion 104 as identified via imaging studies, etc.), and the like. Lowering the threshold for one or more sequences-of-interest, in various cases, may increase the sensitivity but potentially reduce the specificity of the indication of the significant copy number variant(s) 126. Lowering the threshold to increase sensitivity of the indication of the significant copy number variant(s) 126 may outweigh the drawbacks in cases in which contextual information already provides clues regarding a classification of a condition of the subject 102. For instance, if the subject 102 is known to have breast cancer (e.g., the lesion 104 is located in breast tissue of the subject 102 and has characteristics of a tumor), a threshold associated with a gene of particular interest to breast cancer classification (e.g., estrogen receptor (ER) gene) may be lowered as compared to a different subject that does not have breast cancer.
[0097] In some implementations, the threshold(s) can be based on the tissue type from which the sample 108 is obtained. For example, a first threshold may be utilized if the sample 108 is a liquid biopsy sample, a second threshold may be utilized if the sample 108 is a tissue biopsy sample obtained from breast tissue, a third threshold may be utilized if the sample 108 is a tissue biopsy sample obtained from liver tissue, and the like.
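One way to organize sample-dependent thresholds is a table keyed by sample type and tissue of origin, as in the sketch below. The disclosure does not state numeric values for these cases, so every threshold shown here is a placeholder, and the key names and function name are hypothetical.

```python
# Placeholder thresholds keyed by (sample type, tissue of origin).
# The numeric values are illustrative assumptions, not values from the text.
THRESHOLDS_BY_SAMPLE = {
    ("liquid", None): 2.5,
    ("tissue", "breast"): 3.0,
    ("tissue", "liver"): 3.5,
}


def threshold_for_sample(sample_type, tissue=None, default=3.0):
    """Select a threshold from the sample type and, for tissue biopsies, the tissue."""
    key = (sample_type, tissue if sample_type == "tissue" else None)
    return THRESHOLDS_BY_SAMPLE.get(key, default)


liquid_threshold = threshold_for_sample("liquid")
breast_threshold = threshold_for_sample("tissue", "breast")
unlisted_threshold = threshold_for_sample("tissue", "lung")  # falls back to default
```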
[0098] In various cases, the significance determiner 120 selectively reports the significant copy numbers as the one or more significant copy number variants 126. In some cases, the significant copy number variant(s) 126 omit copy numbers whose ratios were below the threshold.
[0099] The report generator 128 is configured to generate a report 130 based, at least in part, on the significant copy number variant(s) 126. The report 130, for example, includes consumable data that can inform the care provider 106 about a condition of the subject 102. In various implementations, the report 130 may indicate the results of additional analyses, such as the results of a histological study, whole transcriptome sequencing, cfRNA sequencing, whole exome sequencing, whole genome sequencing, a cancer (e.g., DNA) hotspot panel test, a DNA methylation test, a tumor mutational burden (TMB) test, a DNA fragmentation test, an RNA fragmentation test, a microsatellite instability (MSI) test, or a viral status test. The report 130, for example, may include a genomic profile of the subject 102 based on various combinations of the above analyses and tests.
[0100] Optionally, the report generator 128 includes a classifier that classifies a condition of the subject 102 based, at least in part, on the significant copy number variant(s) 126. For instance, the report generator 128 may include a pretrained machine learning (ML) model that is configured to classify the condition of the subject 102 based on input data that includes the significant copy number variant(s) 126. The input data may also include other characteristics of the subject 102 and / or the sample 108, for instance.
[0101] The classifier, in various cases, includes one or more model types. For instance, the classifier includes at least one of an artificial neural network (e.g., feedforward neural networks, multi-layer perceptrons (MLPs), convolutional neural networks (CNNs), and backpropagation models), a nearest-neighbor model, a regression analysis model, a clustering model (e.g., k-means clustering, mean-shift clustering, expectation-maximization (EM) clustering, and agglomerative hierarchical clustering), a principal component analysis model, a gradient boosting model, a random forest, or the like.
[0102] The classifier, in various cases, may be defined by various parameters that enable the classifier to identify predictive attributes of the input data that are correlated to or otherwise associated with example categories, such as cancer types, treatment resistance, effective treatments, prognostic indicators, or other clinically relevant classifications.
[0103] In some implementations, the report 130 indicates that a follow-up test of the subject 102 is recommended and / or needed. For instance, in response to determining that the categorization of the condition of the subject 102 is inconclusive, the report generator 128 may generate the report 130 to indicate that one or more additional tests (e.g., a histological study, genome sequencing, exome sequencing, additional DNA sequencing, RNA sequencing, transcriptome sequencing, etc.) should be performed in order to accurately identify the condition of the subject 102.
[0104] In various cases, the report 130 is output to a clinical device 132. For example, the report generator 128 transmits the report 130 to the clinical device 132. In various implementations, the clinical device 132 is a computing device that is operated by, owned by, or otherwise associated with the care provider 106. For instance, the clinical device 132 may be a desktop computer, a laptop computer, a smart phone, or some other computing device associated with the care provider 106. The clinical device 132, in various cases, outputs the report 130 to the care provider 106. In some cases, the clinical device 132 includes a display (e.g., a screen) that visually presents the report 130. In various cases, the clinical device 132 includes a speaker that outputs a sound indicative of the report 130. The clinical device 132, in various cases, may output the information in the report 130 using one or more output mechanisms or devices.
[0105] The care provider 106 may review the report 130 by interacting with the clinical device 132. The report 130, in various cases, may enhance the clinical decision-making of the care provider 106. For instance, the care provider 106 may prepare and / or administer a therapy to the subject 102 based on the report 130. According to various implementations, the care provider 106 may initiate the therapy and / or refer the subject 102 to another care provider to receive the therapy. In various cases, if the predicted condition of the subject 102 is a disease (e.g., cancer), the care provider 106 may prescribe, recommend, or administer an agent in order to treat the disease of the subject 102. For instance, if the predicted condition of the subject 102 is a type of cancer, the care provider 106 may administer an anticancer therapy to the subject 102.
[0106] In various implementations, the care provider 106 may develop a diagnosis and / or prognosis of the subject 102 based on the report 130. For instance, the care provider 106 may determine to administer a treatment including at least one of a chemotherapy, radiation therapy, immunotherapy, or surgery to the subject 102. In some cases, the care provider 106 determines a dosage of a treatment based on the report 130. In various implementations, the care provider 106 may communicate information in the report 130 to the subject 102. According to some cases, the care provider 106 may determine that the subject 102 qualifies for a clinical trial based on the report 130.
[0107] FIG. 1 illustrates various elements that can be embodied in one or more computing devices. For example, at least a portion of the functions of the sequencer 112, the copy number modeler 116, the significance determiner 120, the ploidy calculator 124, the report generator 128, the clinical device 132, or a combination thereof, are performed by one or more processors in at least one computing device. Examples of computing devices include server computers, desktop computers, laptop computers, tablet computers, mobile phones, wearable devices, Internet of Things (IoT) devices, and the like. In various cases, instructions for performing at least a portion of the functions of these elements are stored in memory and / or in a non-transitory computer readable medium. The instructions, for instance, are executed by the processor(s).
[0108] FIG. 1 also illustrates various types of data. For example, the sequence read data 114, the copy number model 118, the ploidy 122, the significant copy number variant(s) 126, the report 130, or any combination thereof, includes data. The various types of data illustrated in FIG. 1 may be stored, such as in memory or in non-transitory computer readable media. In various implementations, at least a portion of the data is transmitted or otherwise output by one or more computing devices. For example, a computing device may transmit one or more communication signals to another computing device, wherein the communication signal(s) encode at least a portion of the data. Examples of communication signals include electromagnetic signals, optical signals, ultrasonic signals, and electrical signals. For example, communication signals can be transmitted wirelessly and / or in a wired fashion. The communication signals, for instance, are transmitted over one or more wireless channels and / or one or more wired channels (e.g., optical cabling, electrical cabling, etc.). In various cases, the communication signal(s) are transmitted over one or more communication networks. A communication network, for instance, may be defined according to one or more physical channels, such as one or more frequency spectra. In some cases, a communication network is defined according to one or more communication protocols and / or standards. Examples of communication networks include fiber optic networks, Institute of Electrical and Electronics Engineers (IEEE) networks (e.g., WI-FI™ networks, WiMAX networks, BLUETOOTH™ networks, etc.), cellular networks (e.g., a 3rd Generation Partnership Project (3GPP) radio network, such as a Long Term Evolution (LTE) network or a New Radio (NR) network; or a cellular core network, such as a 3rd Generation (3G) core, a 4th Generation (4G) core, a 5th Generation (5G) core, etc.), ultrasonic networks, and the like. In some cases, the data is broadcast from one device to multiple other devices. In some cases, the data is unicast from one device to another device. For instance, various forms of data described herein may be transmitted via a peer-to-peer (P2P) connection.
[0109] FIG. 2 is a diagram illustrating how significant copy number variants are identified, according to various implementations of the present disclosure. A genomic region 200 of a subject includes multiple sequences-of-interest, such as a first gene 202, a second gene 204, a third gene 206, and a fourth gene 208. The genomic region 200, in various cases, is a subgenomic sequence. In various cases, the genomic region 200 is a continuous region of a single chromosome of the subject.
[0110] In various implementations, copy numbers are modeled for each of the sequences of interest. A first modeled copy number 210 indicates computed copies of the first gene 202 in the genomic region 200, a second modeled copy number 212 indicates computed copies of the second gene 204 in the genomic region 200, a third modeled copy number 214 indicates computed copies of the third gene 206 in the genomic region 200, and a fourth modeled copy number 216 indicates computed copies of the fourth gene 208 in the genomic region 200. In various cases, the first modeled copy number 210, the second modeled copy number 212, the third modeled copy number 214, and the fourth modeled copy number 216 are computed based on sequence read data corresponding to the genomic region 200 of the subject.
[0111] A ploidy of the genomic region 200 can be further calculated based on the sequence read data. For example, the ploidy can be determined by calculating an average modeled copy number across the genomic region 200. In various cases, the ploidy is calculated by dividing a sum of modeled copy numbers for various sequences (e.g., all or a portion of the sequences) of the genomic region 200 by the number of the sequences in the genomic region 200. For example, if the genomic region 200 is a region of a chromosome, the ploidy may represent an estimated number of copies of the chromosome present in the sample of the subject. In some cases, the ploidy is a real number. For diploid samples, the ploidy may be about 2.0.
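As a minimal, non-limiting sketch of the averaging described above, the ploidy can be computed as the sum of the modeled copy numbers divided by the number of sequences considered. The function name and the four input values are assumptions made for the example.

```python
def estimate_ploidy(modeled_copy_numbers):
    """Average the modeled copy numbers of the sequences in a genomic region.

    The caller may pass all of the sequences in the region or only a portion
    of them; the divisor is the number of sequences actually passed in.
    """
    values = list(modeled_copy_numbers)
    if not values:
        raise ValueError("at least one modeled copy number is required")
    return sum(values) / len(values)


# Assumed modeled copy numbers for four sequences in a region.
print(estimate_ploidy([6, 1, 4, 2]))  # 3.25
```

Because the result is an average, it is generally a real number and not an integer, consistent with the description above.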
[0112] Various ratios between the modeled copy numbers and the ploidy are further calculated, for instance. A first ratio 218 is calculated by dividing the first modeled copy number 210 by the ploidy, a second ratio 220 is calculated by dividing the second modeled copy number 212 by the ploidy, a third ratio 222 is calculated by dividing the third modeled copy number 214 by the ploidy, and a fourth ratio 224 is calculated by dividing the fourth modeled copy number 216 by the ploidy.
[0113] The ratios are compared to one or more thresholds in order to determine whether the modeled copy numbers are significant. For example, significant copy number amplifications can be identified by determining that one or more of the ratios exceed an amplification threshold 226. In some cases, the amplification threshold 226 is in a range of 1.0 to 5.0. In the example of FIG. 2, the first modeled copy number 210 is determined to be representative of a significant copy number amplification because the first ratio 218 exceeds the amplification threshold 226. Further, because the third ratio 222 is equal to the amplification threshold 226, the third modeled copy number 214 may be determined to be representative of a significant copy number amplification. That is, using various techniques described herein, the first gene 202 and the third gene 206 may be determined to be significantly amplified in the sample. In contrast, because the second ratio 220 and the fourth ratio 224 are below the amplification threshold 226, the second gene 204 and the fourth gene 208 may be determined to have not been significantly amplified in the sample of the subject. In some cases, the second modeled copy number 212 and / or the fourth modeled copy number 216 are not reported and / or not relied upon to generate a report or identify a condition of the subject.
[0114] Further, losses (e.g., homozygous or heterozygous losses) can be identified based on the ratios. For example, the second ratio 220 is equal to or below a loss threshold 228. The loss threshold 228, for instance, is in a range of 0.0 to 0.8. For example, the loss threshold 228 may be 0.0 or 0.5. For instance, although FIG. 2 illustrates that the loss threshold 228 is greater than 0.0, in some cases, the loss threshold 228 is equal to 0.0. Because the second ratio 220 is equal to or below the loss threshold 228, a loss of the second gene 204 may be identified. For example, for a diploid sample, the second gene 204 may be determined to be present on neither of the two chromosomes or on only one of the two chromosomes. In various cases, the loss of the second gene 204 (or, optionally, the second modeled copy number 212) is reported and / or relied upon to generate a report or identify a condition of the subject. In some cases, the loss of the second gene 204 can be identified if the second modeled copy number 212 is determined to be equal to or lower than the loss threshold 228.
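The comparisons described with reference to FIG. 2 can be sketched as follows. This is an illustration under stated assumptions: the copy numbers, the ploidy of 2.0, the amplification threshold of 3.0, and the loss threshold of 0.5 are example values selected to reproduce the outcome described for FIG. 2, and the dictionary keys are hypothetical labels. In keeping with the FIG. 2 example, a ratio equal to the amplification threshold is treated here as a significant amplification.

```python
def classify_ratio(ratio, amplification_threshold, loss_threshold):
    """Label a copy-number-to-ploidy ratio using two thresholds."""
    if ratio >= amplification_threshold:  # equality counts, as in FIG. 2
        return "amplification"
    if ratio <= loss_threshold:
        return "loss"
    return "not significant"


ploidy = 2.0  # assumed diploid sample
modeled_copy_numbers = {  # assumed values for the four genes of FIG. 2
    "first_gene_202": 8,
    "second_gene_204": 1,
    "third_gene_206": 6,
    "fourth_gene_208": 3,
}

calls = {
    gene: classify_ratio(copy_number / ploidy, 3.0, 0.5)
    for gene, copy_number in modeled_copy_numbers.items()
}
for gene, call in calls.items():
    print(gene, call)
```

With these assumed inputs, the first gene 202 and the third gene 206 are labeled as amplifications, the second gene 204 is labeled as a loss, and the fourth gene 208 is labeled as not significant.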
[0115] FIG. 3 illustrates an example process 300 for determining whether a computed copy number of a sequence-of-interest is significant. The process 300 may be performed by an entity, such as a computing device, at least one processor, a medical device, a sequencer (e.g., the sequencer 112), a copy number modeler (e.g., the copy number modeler 116), a ploidy calculator (e.g., the ploidy calculator 124), a significance determiner (e.g., the significance determiner 120), a report generator (e.g., the report generator 128), a clinical device (e.g., the clinical device 132), or any combination thereof.
[0116] At 302, the entity determines a copy number of a segment of a sample. The copy number may be determined based on sequence read data of the sample. The sequence read data, for instance, represents sequence reads of nucleic acid molecules extracted from the sample. The sample, for instance, is obtained from a subject. The sample may be a liquid biopsy sample, a tissue biopsy sample, a normal control, or a combination thereof, for instance. In various cases, the sample is a liquid biopsy sample that includes circulating tumor cells (CTCs), cell-free DNA (cfDNA), circulating tumor DNA (ctDNA), or a combination thereof. For example, the sample includes DNA fragments.
[0117] The segment is a sequence-of-interest, which may have relevance to a condition of the subject. In various cases, the segment is a subgenomic segment of the sample, such as a segment including one or more genes (e.g., ERBB2, a non-ERBB2 gene), or at least a portion of a gene. In some cases, the copy number of the segment of the sample is an integer.
[0118] In various cases, the copy number of the segment is a modeled copy number. For example, the copy number is determined based on coverage ratio data, allele fraction data, and one or more copy number models associated with the segment of the sample.
[0119] At 304, the entity determines a ploidy of the sample. The ploidy, for example, is determined by calculating an average copy number of multiple segments across a genome and / or chromosome of the sample. The multiple segments may include the segment analyzed with respect to 302. In various cases, the genome and / or chromosome of the sample consists of the multiple segments.
[0120] At 306, the entity determines a ratio by dividing the copy number by the ploidy. At 308, the entity compares the ratio to a threshold. The threshold, in some cases, may be in a range of 0.3 to 5.0. The threshold, for instance, may be determined based on a previously identified condition of the subject (e.g., a previous diagnosis, pregnancy, etc.). In some cases, the threshold is based on one or more genes in the segment. Different segments within the genome may be associated with different thresholds. The threshold may be an amplification threshold or a loss threshold.
[0121] In some examples, the entity determines that the segment has been significantly amplified if the ratio exceeds an amplification threshold. In various cases, the amplification threshold is in a range of 2.0 to 5.0. For instance, the amplification threshold may be equal to 2.0 if the segment includes, or is included in, ERBB2, or may be 3.0 if the segment is not associated with ERBB2.
[0122] In some cases, the entity determines that the segment is associated with a loss by determining if the ratio is equal to or lower than a loss threshold. In various examples, the loss threshold is less than 1.0. For example, the loss threshold may be equal to 0.0 or 0.5.
[0123] In some cases, the entity outputs an indication of the significant amplification or loss. In various implementations, the entity outputs the copy number, an indication of the segment (e.g., a name of a gene associated with the segment), the ratio, or a combination thereof, based on determining that the segment has been significantly amplified or determining that the segment is associated with a loss. In some cases, the entity refrains from outputting the copy number, the indication of the segment, the ratio, or the combination thereof, if the entity determines that the segment has not been significantly amplified and is not associated with a loss.
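The operations at 306 and 308, together with the selective output described above, can be sketched as follows. The sketch is non-limiting and rests on several assumptions: the names `SegmentCall` and `evaluate_segment`, the example segments and copy numbers, and the ploidy of 2.0 are hypothetical. The thresholds of 2.0 for ERBB2, 3.0 for other segments, and 0.5 for losses are example values mentioned in the description.

```python
from dataclasses import dataclass
from typing import Optional

ERBB2_AMPLIFICATION_THRESHOLD = 2.0    # example value for ERBB2 segments
DEFAULT_AMPLIFICATION_THRESHOLD = 3.0  # example value for other segments
LOSS_THRESHOLD = 0.5                   # one of the example loss thresholds


@dataclass
class SegmentCall:
    gene: str
    copy_number: float
    ratio: float
    call: str  # "amplification" or "loss"


def evaluate_segment(gene: str, copy_number: float, ploidy: float) -> Optional[SegmentCall]:
    """Return a call for a significant segment, or None to refrain from output."""
    if ploidy <= 0:
        raise ValueError("ploidy must be positive")
    ratio = copy_number / ploidy
    if gene == "ERBB2":
        amplification_threshold = ERBB2_AMPLIFICATION_THRESHOLD
    else:
        amplification_threshold = DEFAULT_AMPLIFICATION_THRESHOLD
    if ratio > amplification_threshold:
        return SegmentCall(gene, copy_number, ratio, "amplification")
    if ratio <= LOSS_THRESHOLD:
        return SegmentCall(gene, copy_number, ratio, "loss")
    return None


# Assumed copy numbers and ploidy, for illustration only.
segments = {"ERBB2": 5, "MYC": 5, "CDKN2A": 0, "TP53": 2}
ploidy = 2.0

reported = []
for gene, copy_number in segments.items():
    result = evaluate_segment(gene, copy_number, ploidy)
    if result is not None:
        reported.append(result)

for item in reported:
    print(item.gene, item.copy_number, item.ratio, item.call)
```

In this example, ERBB2 and MYC have the same copy number and the same ratio of 2.5, but only ERBB2 is output, because its segment-specific threshold is lower. CDKN2A is output as a loss, and TP53 is omitted.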
[0124] According to various implementations of the present disclosure, the entity identifies a condition of the subject, or a fetus or embryo hosted by the subject, based at least in part on the presence of the significant amplification or loss. For example, the entity may infer that the subject, fetus, or embryo has a type of cancer, a subtype of cancer, a genetic disease or abnormality, or any combination thereof. The condition, for example, may be a pathological condition. In some cases, the entity may determine or otherwise select a treatment for administration to the subject, fetus, or embryo based on the presence of the significant amplification or loss. In some examples, the entity may determine a dosage of the treatment based on the presence of the significant amplification or loss. In some cases, the entity determines a genomic profile of the subject, fetus, or embryo based, at least in part, on the presence of the significant amplification or loss. In some examples, the entity may determine whether the subject qualifies for a clinical trial based, at least in part, on the presence of the significant amplification or loss.
[0125] FIG. 4 illustrates an example report 400 summarizing predicted categories of a cancer of a subject. In various cases, the report 400 is the report 130 described above with reference to FIG. 1. The report 400, for instance, may be displayed to a patient and / or care provider. In some cases, the report 400 is generated based on features of a sample (e.g., a liquid biopsy sample, a tissue biopsy sample, or the like) obtained from the subject. In various examples, the report 400 is at least partially generated based on a genomic test of the subject. In various implementations, the report 400 is generated based on significant amplifications and / or losses identified using various techniques described herein. For example, the report 400 may include the copy number model 118, the significant copy number variant(s) 126, the ploidy 122, or any combination thereof.
[0126] The report 400 includes a tissue origin 402 of the cancer. The tissue origin 402, for instance, indicates a histological tissue type 404, a primary site 406, a cell subtype 407, or any combination thereof, of the cancer.
[0127] In various cases, the report 400 includes one or more therapy indicators 408. For instance, the therapy indicator(s) 408 convey whether the cancer is predicted to be resistant to one or more predetermined therapies and / or whether the cancer is predicted to be responsive to one or more predetermined therapies. In some cases, the therapy indicator(s) 408 include one or more suggested treatment decisions for the subject. In some examples, the therapy indicator(s) 408 indicate one or more therapeutic agents predicted to treat the cancer and / or a dosage of the therapeutic agent(s).
[0128] In some examples, the report 400 includes one or more prognostic indicators 410. The prognostic indicator(s) 410, for instance, indicate a prognosis of the subject in view of the categorized cancer. For example, the prognostic indicator(s) 410 may indicate a survivability, a recoverability, a quality of life indicator, or other information indicative of the prognosis of the subject.
[0129] The report 400 may include a trial qualification 412 of the subject. The trial qualification 412, for instance, indicates whether the subject is predicted to be eligible for a predetermined clinical trial.
[0130] The report 400, in various implementations, includes a metastasis profile 414 of the subject. The metastasis profile 414, for instance, indicates a likelihood that the cancer will metastasize (e.g., at a particular point in time), one or more tissues in which the cancer is predicted to metastasize, or the like.
[0131] In various cases, the report 400 includes recommended follow-up tests 416. For example, the report 400 may include a recommendation to perform whole genome sequencing on the subject, particularly in cases where the cancer cannot be categorized above a threshold certainty.
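One way to express the certainty check described above is sketched below. This is a hedged illustration, not a description of any particular classifier: the function name, the form of the input (a mapping from category to probability), the certainty threshold of 0.8, and the example probabilities are all assumptions.

```python
def recommend_follow_up(category_probabilities, certainty_threshold=0.8):
    """Return (category, follow_up_tests) for a set of category probabilities.

    If no category reaches the assumed certainty threshold, the category is
    None and whole genome sequencing is recommended as a follow-up test.
    """
    top_category = max(category_probabilities, key=category_probabilities.get)
    if category_probabilities[top_category] < certainty_threshold:
        return None, ["whole genome sequencing"]
    return top_category, []


# Assumed classifier outputs, for illustration only.
print(recommend_follow_up({"breast": 0.55, "lung": 0.45}))
print(recommend_follow_up({"breast": 0.90, "lung": 0.10}))
```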
[0132] The report 400 may include a genomic profile 418 of the subject. In various cases, the genomic profile 418 includes or is generated based on the results of one or more genomic analyses performed on one or more samples obtained from the subject. The genomic profile 418, for example, may include results from one or more nucleic acid sequencing-based tests or analyses. For instance, the genomic profile 418 may be generated based on a comprehensive genomic profiling test, a WGS test, a WES test, a gene expression profiling test, a cancer hotspot panel test, a DNA methylation test, a DNA fragmentation test, an RNA fragmentation test, or any combination thereof.
[0133] FIG. 5 illustrates an example environment 500 for sequencing various nucleic acid molecules 502. In various implementations, the nucleic acid molecules 502 include cfDNA and / or gDNA. For instance, the nucleic acid molecules 502 may include ctDNA. The nucleic acid molecules 502, in various cases, are extracted from a sample, such as a biological sample obtained from a subject. In some implementations, the nucleic acid molecules 502 include DNA that is complementary to RNA present in the sample.
[0134] The nucleic acid molecules 502, in various cases, are ligated with adapters 504. For example, the adapters 504 are hybridized to the nucleic acid molecules 502. The adapters 504, for example, include additional nucleic acid molecules. In various implementations, the adapters 504 have a shorter length than the nucleic acid molecules 502 being sequenced. For instance, the adapters 504 include amplification primers, flow cell adapter sequences, substrate adapter sequences, or sample index sequences. Although FIG. 5 illustrates adapters 504 being ligated to one end of each of the nucleic acid molecules 502, implementations are not so limited. For example, the adapters 504 may be ligated to both ends of each of the nucleic acid molecules 502.
[0135] In various examples, the nucleic acid molecules 502 ligated with the adapters 504 are amplified in order to generate amplified molecules 506. Various amplification techniques can be performed. For instance, the amplified molecules 506 are generated using PCR, a non-PCR amplification technique, an isothermal amplification technique, or any combination thereof.
[0136] The amplified molecules 506 may be captured by bait molecules 510 and sequenced. In some implementations, the amplified molecules 506 are sequenced via sequencing-by-synthesis. In various cases, fluorescently tagged deoxyribonucleotide triphosphates (dNTPs) 512 are utilized to synthesize a strand that is complementary to DNA strands bound to a substrate 508. When a dNTP 512 is added to the strand (e.g., by an enzyme), the dNTP 512 emits an optical signal 514. In various implementations, the frequency of the optical signal 514 is dependent on the type of dNTP 512 from which the optical signal 514 is emitted. By detecting the optical signals 514 as the strand is being synthesized, the sequence of the original nucleic acid molecules 502 can be derived.
[0137] In some implementations, the amplified molecules 506 are sequenced via nanopore sequencing. For instance, the amplified molecules 506 are directed through a nanopore 516 extending through a substrate 518. In various cases, the amplified molecules 506 are negatively charged, such that they can be directed through the nanopore 516 by imposing an electrical field across the substrate 518. In various cases, the amplified molecules 506 and the nanopore 516 are in the presence of a charged solution. Thus, charged solutes traveling through the nanopore 516 can be monitored by reviewing an electrical signal (e.g., a current) sensed between electrodes 520 on either side of the substrate 518. For instance, the electrodes 520 are disposed (e.g., located) in spaces on either side of the substrate 518. As an amplified molecule 506 is directed through the nanopore 516, the individual bases within the amplified molecule 506 will block the nanopore 516, which may decrease the amount of charged solutes traveling through the nanopore 516 and, consequently, the magnitude of the electrical signal detected by the electrodes 520. Each of the four types of bases within the amplified molecules 506 may block the nanopore 516 to a different extent. Therefore, the sequence of the nucleic acid molecules 502 can be derived by analyzing the measured electrical signal with respect to time as the amplified molecules 506 are directed through the nanopore 516.
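The principle that each base type reduces the measured current to a different extent can be illustrated with a deliberately simplified sketch. The sketch assumes one current sample per base and a fixed, hypothetical reference level for each base. It is a toy nearest-level lookup and is not a description of how the sequencer 112 or any practical nanopore base caller operates. The names and numeric values are placeholders.

```python
# Hypothetical reference current levels (arbitrary units) for each base.
# These values are placeholders and do not describe any real nanopore.
BASE_LEVELS = {"A": 50.0, "C": 40.0, "G": 30.0, "T": 20.0}


def call_bases(current_trace):
    """Map each current sample to the base with the nearest reference level.

    Assumes exactly one sample per base, which is a simplification made
    only for this illustration.
    """
    return "".join(
        min(BASE_LEVELS, key=lambda base: abs(BASE_LEVELS[base] - sample))
        for sample in current_trace
    )


print(call_bases([49.2, 21.1, 30.8, 39.5]))  # ATGC
```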
[0138] FIG. 6 illustrates one or more devices 600 configured to perform various operations described herein. The device(s) 600 include one or more processor(s) 602. In some implementations, the processor(s) 602 includes a central processing unit (CPU), a graphics processing unit (GPU), both CPU and GPU, or other processing unit or component known in the art.
[0139] The processor(s) 602 is operably connected to memory 604. In various implementations, the memory 604 is volatile (such as random access memory (RAM)), non-volatile (such as read only memory (ROM), flash memory, etc.), or some combination of the two. The memory 604 stores instructions that, when executed by the processor(s) 602, cause the processor(s) 602 to perform various operations. In various examples, the memory 604 stores methods, threads, processes, applications, objects, modules, any other sort of executable instruction, or a combination thereof. In some cases, the memory 604 stores files, databases, or a combination thereof. In some examples, the memory 604 includes, but is not limited to, RAM, ROM, electrically erasable programmable read-only memory (EEPROM), flash memory, or any other memory technology. In some examples, the memory 604 includes one or more types of non-transitory computer readable media, such as CD-ROMs, digital versatile discs (DVDs), content-addressable memory (CAM), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the processor(s) 602. For instance, the memory 604 stores instructions that, when executed by the processor(s) 602, cause the processor(s) 602 to perform operations of the copy number modeler 116, the significance determiner 120, the ploidy calculator 124, the report generator 128, or any combination thereof.
[0140] The processor(s) 602 is operably connected to one or more input devices 606 and one or more output devices 608. Collectively, the input device(s) 606 and the output device(s) 608 function as an interface between at least one user and the device(s) 600. The input device(s) 606 is configured to receive an input from a user and includes at least one of a keypad, a cursor control, a touch-sensitive display, a voice input device (e.g., a microphone), a haptic feedback device (e.g., a gyroscope), or any combination thereof. The output device(s) 608 includes at least one of a display, a speaker, a haptic output device, a printer, or any combination thereof. In various examples, the processor(s) 602 causes a display among the input device(s) 606 to visually output various data described herein. In some implementations, the input device(s) 606 includes one or more touch sensors, the output device(s) 608 includes a display screen, and the touch sensor(s) are integrated with the display screen.
[0141] In various implementations, the processor(s) 602 is operably connected to one or more transceivers 610 that transmit and / or receive data over one or more communication networks 612. For example, the transceiver(s) 610 includes a network interface card (NIC), a network adapter, a local area network (LAN) adapter, or a physical, virtual, or logical address to connect to the various external devices and / or systems. In various examples, the transceiver(s) 610 includes any sort of wireless transceivers capable of engaging in wireless communication (e.g., radio frequency (RF) communication). For example, the communication network(s) 612 includes one or more wireless networks that include a 3rd Generation Partnership Project (3GPP) network, such as a Long Term Evolution (LTE) radio access network (RAN) (e.g., over one or more LTE bands), a New Radio (NR) RAN (e.g., over one or more NR bands), or a combination thereof. In some cases, the transceiver(s) 610 includes other wireless modems, such as a modem for engaging in WI-FI®, WIGIG®, WIMAX®, BLUETOOTH®, or infrared communication over the communication network(s) 612.
[0142] The device(s) 600 may further include the sequencer 112. In various implementations, the sequencer 112 includes one or more fluidic circuits 614 configured to receive a sample 616 derived from a subject 617. The sequencer 112, in various cases, may be configured to generate data indicative of one or more sequences of nucleic acid molecules (e.g., DNA and / or RNA) present in the sample 616. In various cases, the sequencer 112 introduces one or more reagents 618 to the fluidic circuit(s) 614 in order to prepare for and perform sequencing of the nucleic acid molecules. Further, the sequencer 112 may include one or more sensors 620 configured to measure or otherwise detect detection signals from the fluidic circuit(s) 614, which may be indicative of the sequences of the nucleic acid molecules. According to various implementations, the sensor(s) 620 may further include one or more ADCs. The sequencer 112, in various cases, outputs sequence read data to the processor(s) 602 for additional processing.
Example Clauses
[0143] The following clauses provide various examples of the present disclosure.1. A method, including: providing a plurality of nucleic acid molecules obtained from a sample from a subject; ligating one or more adapters onto one or more nucleic acid molecules from the plurality of nucleic acid molecules; amplifying the one or more ligated nucleic acid molecules from the plurality of nucleic acid molecules; capturing amplified nucleic acid molecules from the amplified nucleic acid molecules; sequencing, by a sequencer, all or a subset of the captured amplified nucleic acid molecules to obtain a plurality of sequence reads that represent the sequenced amplified nucleic acid molecules thereby generating sequence read data; receiving, at one or more processors, the sequence read data for the plurality of sequence reads; determining, by the one or more processors, copy numbers of multiple segments across a genome of the sample; determining, by the one or more processors, a ploidy of the sample by determining an average of the copy numbers; determining a first ratio by dividing a first copy number, among the copy numbers, of a first segment, among the multiple segments, by the ploidy of the sample; determining that the first segment has been amplified by determining that the first ratio exceeds a threshold associated with the first segment; determining a second ratio by dividing a second copy number, among the copy numbers, of a second segment, among the multiple segments, by the ploidy of the sample; determining that the second ratio is less than or equal to a threshold associated with the second segment; and in response to determining that the first ratio exceeds the threshold associated with the first segment and determining that the second ratio isless than or equal to the threshold associated with the second segment, generating a report including an indication of the first copy number and omitting an indication of the second copy number.2. 
The method of clause 1, wherein the first segment includes at least one first gene, andwherein the second segment includes at least one second gene.3. The method of clause 2, wherein the at least one first gene includes ERRB2 and the threshold associated with the first segment is about 2.0, and wherein the at least one second gene includes a non-ERRB2 gene and the threshold associated with the second segment is in a range of about 3.0 to about 5.0.4. The method of clause 2 or 3, further including: determining the threshold associated with the first segment based on the at least one first gene; and determining the threshold associated with the second segment based on the at least one second gene.5. The method of any of clauses 1 to 4, further including: predicting, based on the first copy number, a cancer type or cancer subtype of the subject, wherein the report includes the cancer type or cancer subtype of the subject.6. The method of any of clauses 1 to 5, further including: predicting, based on the first copy number, a predicted effective therapy to treat a condition of the subject, wherein the report includes the predicted effective therapy.7. The method of any of clauses 1 to 6, further including: outputting the report.8. A method, including: determining, by analyzing sequence read data of a sample obtained from a subject, a copy number of a segment of the sample; determining, by analyzing the sequence read data, a ploidy of the sample by determining an average copy number of multiple segments across a genome of the sample, the multiple segments across the genome of the sample including the segment of the sample; determining a ratio by dividing the copy number of the segment of the sample by the ploidy of the sample; comparing the ratio to a threshold; and based on comparing the ratio to the threshold, outputting an indication of the copy number of the segment of the sample.9. 
The method of clause 8, wherein the segment includes a subgenomic segment of the sample.10. The method of clause 8 or 9, wherein the segment of the sample includes at least one gene.11. The method of clause 10, wherein the at least one gene includes ABL1, ACVR1B, AKT1, AKT2, AKT3, ALK, ALOX12B, AMER1, APC, AR, ARAF, ARFRP1, ARID1A, ASXL1, ATM, ATR, ATRX, AURKA, AURKB, AXIN1, AXL, BAP1, BARD1, BCL2, BCL2L1, BCL2L2, BCL6, BCOR, BCORL1, BCR, BRAF, BRCA1, BRCA2, BRD4, BRIP1, BTG1, BTG2, BTK, CALR, CARD11, CASP8, CBFB, CBL, CCND1, CCND2, CCND3, CCNE1, CD22, CD274, CD70, CD74, CD79A, CD79B, CDC73, CDH1, CDK12, CDK4, CDK6, CDK8, CDKN1A, CDKN1B, CDKN2A, CDKN2B, CDKN2C, CEBPA, CHEK1, CHEK2, CIC, CREBBP, CRKL, CSF1R, CSF3R, CTCF, CTNNA1, CTNNB1, CUL3, CUL4A, CXCR4, CYP17A1, DAXX, DDR1, DDR2, DIS3, DNMT3A, DOT1L, EED, EGFR, EMSY (C11orf30), EP300, EPHA3, EPHB1, EPHB4, ERBB2, ERBB3, ERBB4, ERCC4, ERG, ERRFI1, ESR1, ETV4, ETV5, ETV6, EWSR1, EZH2, EZR, FAM46C, FANCA, FANCC, FANCG, FANCL, FAS, FBXW7, FGF10, FGF12, FGF14, FGF19, FGF23, FGF3, FGF4, FGF6, FGFR1, FGFR2, FGFR3, FGFR4, FH, FLCN, FLT1, FLT3, FOXL2, FUBP1, GABRA6, GATA3, GATA4, GATA6, GID4 (C17orf39), GNA11, GNA13, GNAQ, GNAS, GRM3, GSK3B, H3F3A, HDAC1, HGF, HNF1A, HRAS, HSD3B1, ID3, IDH1, IDH2, IGF1R, IKBKE, IKZF1, INPP4B, IRF2, IRF4, IRS2, JAK1, JAK2, JAK3, JUN, KDM5A, KDM5C, KDM6A, KDR, KEAP1, KEL, KIT, KLHL6, KMT2A (MLL), KMT2D (MLL2), KRAS, LTK, LYN, MAF, MAP2K1, MAP2K2, MAP2K4, MAP3K1, MAP3K13, MAPK1, MCL1, MDM2, MDM4, MED12, MEF2B, MEN1,MERTK, MET, MITF, MKNK1, MLH1, MPL, MRE11A, MSH2, MSH3, MSH6, MST1R, MTAP, MTOR, MUTYH, MYB, MYC, MYCL, MYCN, MYD88, NBN, NF1, NF2, NFE2L2, NFKBIA, NKX2-1, NOTCH1, NOTCH2, NOTCH3, NPM1, NRAS, NT5C2, NTRK1, NTRK2, NTRK3, NUTM1, P2RY8, PALB2, PARK2, PARP1, PARP2, PARP3, PAX5, PBRM1, PDCD1, PDCD1LG2, PDGFRA, PDGFRB, PDK1, PIK3C2B, PIK3C2G, PIK3CA, PIK3CB, PIK3R1, PIM1, PMS2, POLD1, POLE, PPARG, PPP2R1A, PPP2R2A, PRDM1, PRKAR1A, PRKCI, PTCH1, PTEN, PTPN11, PTPRO, QKI, RAC1, RAD21, 
RAD51, RAD51B, RAD51C, RAD51D, RAD52, RAD54L, RAF1, RARA, RB1, RBM10, REL, RET, RICTOR, RNF43, ROS1, RPTOR, RSPO2, SDC4, SDHA, SDHB, SDHC, SDHD, SETD2, SF3B1, SGK1, SLC34A2, SMAD2, SMAD4, SMARCA4, SMARCB1, SMO, SNCAIP, SOCS1, SOX2, SOX9, SPEN, SPOP, SRC, STAG2, STAT3, STK11, SUFU, SYK, TBX3, TEK, TERC, TERT, TET2, TGFBR2, TIPARP, TMPRSS2, TNFAIP3, TNFRSF14, TP53, TSC1, TSC2, TYRO3, U2AF1, VEGFA, VHL, WHSC1, WHSC1L1, WT1, XPO1, XRCC2, ZNF217, ZNF703, ABL, ALK, ALL, B4GALNT1, BAFF, BCL2, BRAF, BRCA, BTK, CD19, CD20, CD3, CD30, CD319, CD38, CD52, CDK4, CDK6, CML, CRACC, CS1, CTLA-4, dMMR, EGFR, ERBB1, ERBB2, FGFR1-3, FLT3, GD2, HDAC, HER1, HER2, HR, IDH2, IL-1β, IL-6, IL-6R, JAK1, JAK2, JAK3, KIT, KRAS, MEK, MET, MSI-H, mTOR, PARP, PD-1, PDGFR, PDGFRα, PDGFRβ, PD-L1, PI3Kδ, PIGF, PTCH, RAF, RANKL, RET, ROS1, SLAMF7, VEGF, VEGFA, or VEGFB. 12. The method of any of clauses 8 to 11, wherein at least one gene includes the segment of the sample. 13. The method of any of clauses 8 to 12, wherein the copy number of the segment of the sample is an integer. 14. The method of any of clauses 8 to 13, wherein the genome of the sample consists of the segments across the genome of the sample. 15. The method of any of clauses 8 to 14, wherein determining, by analyzing sequence read data of the sample obtained from the subject, the copy number of the segment of the sample includes: determining coverage ratio data, allele fraction data, and at least one copy number model associated with the segment of the sample; and determining the copy number of the segment of the sample based on the coverage ratio data, the allele fraction data, and the at least one copy number model associated with the segment of the sample. 16. The method of any of clauses 8 to 15, wherein the threshold is about 2.0 or greater. 17. The method of any of clauses 8 to 16, wherein the threshold is in a range of about 2.0 to about 5.0. 18. 
The method of any of clauses 8 to 16, wherein the threshold is about 0.5 or about 0.0. 19. The method of any of clauses 8 to 16, wherein the threshold is in a range of about 1.0 to about 2.0. 20. The method of any of clauses 8 to 19, wherein the segment of the sample includes ERBB2 and the threshold is 2.0. 21. The method of any of clauses 8 to 20, wherein the sample includes a tissue biopsy sample, a liquid biopsy sample, or a normal control. 22. The method of any of clauses 8 to 21, wherein the sample includes blood, plasma, cerebrospinal fluid, sputum, stool, urine, lymphatic fluid, or saliva. 23. The method of any of clauses 8 to 22, wherein the sample is a liquid biopsy sample and includes circulating tumor cells (CTCs). 24. The method of clause 23, wherein the sample is a liquid biopsy sample and includes cell-free DNA (cfDNA), circulating tumor DNA (ctDNA), or any combination thereof. 25. The method of any of clauses 8 to 24, wherein outputting the indication of the copy number of the segment of the sample includes: generating a report based on the copy number of the segment of the sample; and outputting the report. 26. The method of clause 25, wherein outputting the report includes: transmitting data indicating the report to an external device. 27. The method of clause 26, wherein the external device is associated with the subject and / or a healthcare provider. 28. The method of clause 26 or 27, wherein the data is transmitted over one or more communication networks. 29. The method of any of clauses 26 to 28, wherein the data is transmitted over a peer-to-peer connection. 30. The method of any of clauses 25 to 29, wherein outputting the report includes: visually presenting, by a display, the report. 31. 
The method of any of clauses 25 to 30, further including: based on comparing the ratio to the threshold, determining, based on the copy number of the segment of the sample, one or more therapies to treat a condition of the subject, wherein the report further indicates the one or more therapies.32. The method of clause 31, wherein the condition includes at least one type or subtype of cancer.33. The method of any of clauses 8 to 32, further including: based on comparing the ratio to the threshold, predicting a condition of the subject based on the copy number of the segment, wherein outputting the indication of the copy number of the segment includes outputting an indication of the condition.34. The method of clause 33, wherein the condition includes a cancer type, a cancer subtype, or a genetic disease.35. The method of any of clauses 8 to 34, wherein comparing the ratio to the threshold includes determining that the ratio exceeds the threshold, and wherein the indication of the copy number of the segment of the sample includes an indication that the segment has been amplified.36. The method of any of clauses 8 to 35, wherein comparing the ratio to the threshold includes determining that the ratio is below the threshold, and wherein the indication of the copy number of the segment of the sample includes an indication that the segment is associated with a loss.37. The method of any of clauses 8 to 36, further including: based on comparing the ratio to the threshold, predicting a condition of an embryo or a fetus of the subject based on the copy number of the segment, wherein outputting the indication of the copy number of the segment includes outputting an indication of the condition.38. 
The method of any of clauses 8 to 37, further including: receiving a plurality of nucleic acid molecules obtained from the sample; ligating one or more adapters onto one or more nucleic acid molecules from the plurality of nucleic acid molecules; amplifying the one or more ligated nucleic acid molecules; capturing all or a subset of the amplified nucleic acid molecules; and sequencing, by a sequencer, the captured nucleic acid molecules to obtain a plurality of sequence reads that represent the captured nucleic acid molecules, thereby generating the sequence read data for the genome of the sample.39. The method of clause 38, wherein the one or more adapters include amplification primers, flow cell adapter sequences, substrate adapter sequences, or sample index sequences.40. The method of clause 38 or 39, wherein the captured nucleic acid molecules are captured from the amplified nucleic acid molecules by hybridization to one or more bait molecules.41. The method of clause 40, wherein the one or more bait molecules include one or more additional nucleic acid molecules, each of the one or more additional nucleic acid molecules including a region that is complementary to a region of a captured nucleic acid molecule.42. The method of any of clauses 38 to 41, wherein amplifying the one or more ligated nucleic acid molecules includes performing a polymerase chain reaction (PCR) amplification technique, a non-PCR amplification technique, or an isothermal amplification technique.43. The method of any of clauses 38 to 42, wherein sequencing the captured nucleic acid molecules includes use of a massively parallel sequencing (MPS) technique, whole genome sequencing (WGS), whole exome sequencing, targeted sequencing, direct sequencing, or Sanger sequencing.44. The method of any of clauses 38 to 43, wherein sequencing the captured nucleic acid molecules includes next generation sequencing (NGS).45. 
The method of any of clauses 38 to 44, wherein the sequencer includes a next generation sequencer. 46. The method of any of clauses 38 to 45, wherein sequencing the captured nucleic acid molecules includes sequencing-by-synthesis or nanopore sequencing.47. The method of any of clauses 8 to 46, further including: generating ligated molecules by ligating adapters onto nucleic acid molecules of the sample; generating amplified ligated molecules by amplifying the ligated molecules; generating, using the amplified ligated molecules, detection signals; detecting, by at least one sensor, the detection signals; and generating the sequence read data based on the detection signals.48. The method of clause 47, wherein the detection signals include electrical signals and / or optical signals. 49. The method of clause 47 or 48, wherein generating, using the amplified ligated molecules, the detection signals includes: synthesizing, by a polymerase using fluorescently tagged nucleotide triphosphates (NTPs), a synthesized nucleic acid molecule that is complementary to one of the amplified ligated molecules, and wherein detecting, by the at least one sensor, the detection signals includes: detecting, by at least one optical sensor, optical signals emitted by the fluorescently tagged NTPs upon binding to the synthesized nucleic acid molecule, the optical signals being indicative of at least one sequence of the nucleic acid molecules of the sample.50. 
The method of any of clauses 47 to 49, wherein generating, using the amplified ligated molecules, the detection signals includes: directing the amplified ligated molecules through a nanopore extending from a first space to a second space through a substrate, and wherein detecting, by the at least one sensor, the detection signals includes: detecting, by sensors disposed in the first space and the second space, an electrical signal over time, the electrical signal being indicative of at least one sequence of the nucleic acid molecules of the sample.51. The method of any of clauses 47 to 50, wherein the sequence read data indicates a full genome or RNA transcriptome of the sample.52. The method of any of clauses 47 to 51, wherein the sequence read data indicates a whole exome of the sample.53. The method of any of clauses 47 to 52, wherein the sequence read data indicates a predetermined panel of genes of the sample.54. The method of clause 53, wherein the predetermined panel includes one or more of ABL1, ACVR1B, AKT1, AKT2, AKT3, ALK, ALOX12B, AMER1, APC, AR, ARAF, ARFRP1, ARID1A, ASXL1, ATM, ATR, ATRX, AURKA, AURKB, AXIN1, AXL, BAP1, BARD1, BCL2, BCL2L1, BCL2L2, BCL6, BCOR, BCORL1, BCR, BRAF, BRCA1, BRCA2, BRD4, BRIP1, BTG1, BTG2, BTK, CALR, CARD11, CASP8, CBFB, CBL, CCND1, CCND2, CCND3, CCNE1, CD22, CD274, CD70, CD74, CD79A, CD79B, CDC73, CDH1, CDK12, CDK4, CDK6, CDK8, CDKN1A, CDKN1B, CDKN2A, CDKN2B, CDKN2C, CEBPA, CHEK1, CHEK2, CIC, CREBBP, CRKL, CSF1R, CSF3R, CTCF, CTNNA1, CTNNB1, CUL3, CUL4A, CXCR4, CYP17A1, DAXX, DDR1, DDR2, DIS3, DNMT3A, DOT1L, EED, EGFR, EMSY (C11orf30), EP300, EPHA3, EPHB1, EPHB4, ERBB2, ERBB3, ERBB4, ERCC4, ERG, ERRFI1, ESR1, ETV4, ETV5, ETV6, EWSR1, EZH2, EZR, FAM46C, FANCA, FANCC, FANCG, FANCL, FAS, FBXW7, FGF10, FGF12, FGF14, FGF19, FGF23, FGF3, FGF4, FGF6, FGFR1, FGFR2, FGFR3, FGFR4, FH, FLCN, FLT1, FLT3, FOXL2, FUBP1, GABRA6, GATA3, GATA4, GATA6, GID4 (C17orf39), GNA11, GNA13, GNAQ, GNAS, GRM3, GSK3B, H3F3A, HDAC1, HGF, HNF1A, HRAS, 
HSD3B1, ID3, IDH1, IDH2, IGF1R, IKBKE, IKZF1, INPP4B, IRF2, IRF4, IRS2, JAK1, JAK2, JAK3, JUN, KDM5A, KDM5C, KDM6A, KDR, KEAP1, KEL, KIT, KLHL6, KMT2A (MLL), KMT2D (MLL2), KRAS, LTK, LYN, MAF, MAP2K1, MAP2K2, MAP2K4, MAP3K1, MAP3K13, MAPK1, MCL1, MDM2, MDM4, MED12, MEF2B, MEN1, MERTK, MET, MITF, MKNK1, MLH1, MPL, MRE11A, MSH2, MSH3, MSH6, MST1R, MTAP, MTOR, MUTYH, MYB, MYC, MYCL, MYCN, MYD88, NBN, NF1, NF2, NFE2L2, NFKBIA, NKX2-1, NOTCH1, NOTCH2, NOTCH3, NPM1, NRAS, NT5C2, NTRK1, NTRK2, NTRK3, NUTM1, P2RY8, PALB2, PARK2, PARP1, PARP2, PARP3, PAX5, PBRM1, PDCD1, PDCD1LG2, PDGFRA, PDGFRB, PDK1, PIK3C2B, PIK3C2G, PIK3CA, PIK3CB, PIK3R1, PIM1, PMS2, POLD1, POLE, PPARG, PPP2R1A, PPP2R2A, PRDM1, PRKAR1A, PRKCI, PTCH1, PTEN, PTPN11, PTPRO, QKI, RAC1, RAD21, RAD51, RAD51B, RAD51C, RAD51D, RAD52, RAD54L, RAF1, RARA, RB1, RBM10, REL, RET, RICTOR, RNF43, ROS1, RPTOR, RSPO2, SDC4, SDHA, SDHB, SDHC, SDHD, SETD2, SF3B1, SGK1, SLC34A2, SMAD2, SMAD4, SMARCA4, SMARCB1, SMO, SNCAIP, SOCS1, SOX2, SOX9, SPEN, SPOP, SRC, STAG2, STAT3, STK11, SUFU, SYK, TBX3, TEK, TERC, TERT, TET2, TGFBR2, TIPARP, TMPRSS2, TNFAIP3, TNFRSF14, TP53, TSC1, TSC2, TYRO3, U2AF1, VEGFA, VHL, WHSC1, WHSC1L1, WT1, XPO1, XRCC2, ZNF217, ZNF703, ABL, ALK, ALL, B4GALNT1, BAFF, BCL2, BRAF, BRCA, BTK, CD19, CD20, CD3, CD30, CD319, CD38, CD52, CDK4, CDK6, CML, CRACC, CS1, CTLA-4, dMMR, EGFR, ERBB1, ERBB2, FGFR1-3, FLT3, GD2, HDAC, HER1, HER2, HR, IDH2, IL-1β, IL-6, IL-6R, JAK1, JAK2, JAK3, KIT, KRAS, MEK, MET, MSI-H, mTOR, PARP, PD-1, PDGFR, PDGFRα, PDGFRβ, PD-L1, PI3Kδ, PIGF, PTCH, RAF, RANKL, RET, ROS1, SLAMF7, VEGF, VEGFA, or VEGFB. 55. The method of any of clauses 47 to 54, further including: receiving the sample. 56. The method of clause 55, wherein the sample includes a tissue biopsy sample, a liquid biopsy sample, or a normal control. 57. The method of clause 55 or 56, wherein the sample is a liquid biopsy sample and includes blood, plasma, cerebrospinal fluid, sputum, stool, urine, lymphatic fluid, or saliva. 58. 
The method of any of clauses 55 to 57, wherein the sample is a liquid biopsy sample and includes circulating tumor cells (CTCs).59. The method of any of clauses 55 to 58, wherein the sample is a liquid biopsy sample and includes cell-free DNA (cfDNA), circulating tumor DNA (ctDNA), or any combination thereof.60. The method of any of clauses 55 to 59, further including extracting DNA or RNA from the sample.61. The method of clause 60, wherein the DNA includes genomic DNA or cDNA.62. The method of clause 60 or 61, wherein the RNA includes messenger RNA, microRNA, or non-coding RNA.63. The method of any of clauses 8 to 62, further including: determining the threshold based on a gene in the segment.64. The method of clause 63, wherein determining the threshold based on the gene in the segment includes determining the threshold further based on a demographic of the subject.65. The method of any of clauses 8 to 64, further including: determining a condition of the subject; and determining the threshold based on the condition.66. The method of clause 65, wherein the condition is a pathogenic condition.67. The method of clause 65 or 66, wherein the condition is based on a result of another diagnostic test on the subject.68. The method of any of clauses 8 to 67, the ratio being a first ratio, the segment being a first segment of the sample, the threshold being a first threshold, the method further including: determining a second ratio by dividing a copy number of a second segment of the sample by the ploidy of the sample; determining that the second ratio is less than or equal to a second threshold; and in response to determining that the second ratio is less than or equal to the second threshold, refraining from outputting an indication of the copy number of the second segment of the sample.69. The method of clause 68, wherein the second threshold is equal to the first threshold.70. 
The method of clause 68 or 69, wherein the second threshold is different than the first threshold.71. The method of any of clauses 68 to 70, wherein outputting the indication of the copy number of the first segment of the sample includes: outputting a report including the indication of the copy number of the first segment of the sample, the report omitting the copy number of the second segment of the sample.72. The method of any of clauses 8 to 71, further including: generating, based on the copy number of the segment, a genomic profile of the subject.73. The method of clause 72, wherein the genomic profile includes results from at least one of: a comprehensive genomic profiling test; a whole genome sequencing (WGS) test; a whole exome sequencing (WES) test; a gene expression profiling test; a cancer hotspot panel test; a DNA methylation test; a DNA fragmentation test; or an RNA fragmentation test.74. The method of clause 72 or 73, wherein the genomic profile of the subject includes: results from a nucleic acid sequencing-based test.75. The method of any of clauses 72 to 74, further including: selecting, based on the genomic profile and / or the copy number of the segment, an anticancer agent for administration to the subject.76. The method of clause 75, further including: administering the anticancer agent to the subject.77. The method of clause 76, further including: applying, based on the genomic profile, an anticancer therapy to the subject.78. The method of clause 77, wherein the anticancer therapy includes at least one of chemotherapy, radiation therapy, immunotherapy, a targeted therapy, or surgery.79. The method of any of clauses 8 to 78, further including: identifying, based on the copy number of the segment, a suggested treatment decision for the subject.80. The method of clause 79, wherein the suggested treatment decision includes radiotherapy and / or chemotherapy.81. 
The method of any of clauses 8 to 80, further including: generating, based on the copy number of the segment, a therapy for the subject. 82. The method of clause 81, wherein the therapy includes a dosage of one or more therapeutic agents predicted to treat a condition of the subject that is associated with the copy number of the segment. 83. The method of any of clauses 8 to 82, further including: determining, based on the copy number of the segment, whether the subject is eligible for a clinical trial. 84. The method of any of clauses 8 to 83, wherein the method is performed by one or more processors. 85. A non-transitory computer readable medium encoding instructions for performing the method of any of clauses 8 to 84. 86. A system, including: at least one processor; and memory storing non-transitory instructions that, when executed by the at least one processor, cause the at least one processor to perform operations including the method of any of clauses 8 to 84. 87. The system of clause 86, further including: a sequencer configured to generate the sequence read data. 88. The system of clause 86 or 87, further including: a transceiver configured to transmit, to an external device, a communication signal including the indication of the copy number of the segment of the sample. 89. The system of any of clauses 86 to 88, further including: an output device configured to output the indication of the copy number of the segment of the sample.
Conclusion
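By way of illustration only, the ratio test recited in clauses 1 and 8 can be sketched in Python. This is a minimal sketch under stated assumptions, not the disclosed implementation: the `Segment` class, the function names, and the example copy numbers are hypothetical, the ploidy is taken as an unweighted average of the segment copy numbers, and only the amplification branch (ratio exceeds threshold) is shown.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Segment:
    name: str          # hypothetical label, e.g. a gene located in the segment
    copy_number: int   # integer copy number of the segment (see clause 13)
    threshold: float   # ratio threshold associated with this segment


def sample_ploidy(segments):
    """Ploidy taken as the unweighted average of the segment copy numbers."""
    return mean(s.copy_number for s in segments)


def amplified_segments(segments):
    """Return (name, copy_number, ratio) for each segment whose ratio exceeds its threshold.

    Segments whose ratio is less than or equal to the threshold are omitted,
    mirroring the report in clause 1 that omits the second copy number.
    """
    ploidy = sample_ploidy(segments)
    reported = []
    for s in segments:
        ratio = s.copy_number / ploidy
        if ratio > s.threshold:
            reported.append((s.name, s.copy_number, ratio))
    return reported


# Hypothetical sample: the ERBB2 segment uses a threshold of 2.0 (clause 3),
# the other segments use 3.0, within the range of about 3.0 to about 5.0.
example = [
    Segment("ERBB2", 10, 2.0),
    Segment("MYC", 4, 3.0),
    Segment("segment_A", 2, 3.0),
    Segment("segment_B", 2, 3.0),
    Segment("segment_C", 2, 3.0),
]
print(amplified_segments(example))
```

In this hypothetical sample the ploidy is 4, so the ERBB2 segment has a ratio of 2.5 and is reported, while the MYC segment has a ratio of 1.0 and is omitted. A loss test of the kind recited in clause 36 would compare the same ratio against a lower threshold instead.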
[0144] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference in their entirety to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference in its entirety. In the event of a conflict between a term herein and a term in an incorporated reference, the term herein controls.
[0145] The features disclosed in the foregoing description, or the following claims, or the accompanying drawings, expressed in their specific forms or in terms of a means for performing the disclosed function, or a method or process for attaining the disclosed result, as appropriate, may, separately, or in any combination of such features, be used for realizing implementations of the disclosure in diverse forms thereof.
[0146] As will be understood by one of ordinary skill in the art, each implementation disclosed herein can comprise, consist essentially of, or consist of its particular stated element, step, or component. Thus, the terms "include" or "including" should be interpreted to recite: "comprise, consist of, or consist essentially of." The transition term "comprise" or "comprises" means has, but is not limited to, and allows for the inclusion of unspecified elements, steps, ingredients, or components, even in major amounts. The transitional phrase "consisting of" excludes any element, step, ingredient, or component not specified. The transitional phrase "consisting essentially of" limits the scope of the implementation to the specified elements, steps, ingredients, or components and to those that do not materially affect the implementation. As used herein, the term "based on" is equivalent to "based at least partly on," unless otherwise specified.
[0147] Unless otherwise indicated, all numbers expressing quantities, properties, conditions, and so forth used in the specification and claims are to be understood as being modified in all instances by the term "about." Accordingly, unless indicated to the contrary, the numerical parameters set forth in the specification and attached claims are approximations that may vary depending upon the desired properties sought to be obtained by the present disclosure. At the very least, and not as an attempt to limit the application of the doctrine of equivalents to the scope of the claims, each numerical parameter should at least be construed in light of the number of reported significant digits and by applying ordinary rounding techniques. When further clarity is required, the term "about" has the meaning reasonably ascribed to it by a person skilled in the art when used in conjunction with a stated numerical value or range, i.e., denoting somewhat more or somewhat less than the stated value or range, to within a range of ±20% of the stated value; ±19% of the stated value; ±18% of the stated value; ±17% of the stated value; ±16% of the stated value; ±15% of the stated value; ±14% of the stated value; ±13% of the stated value; ±12% of the stated value; ±11% of the stated value; ±10% of the stated value; ±9% of the stated value; ±8% of the stated value; ±7% of the stated value; ±6% of the stated value; ±5% of the stated value; ±4% of the stated value; ±3% of the stated value; ±2% of the stated value; or ±1% of the stated value.
[0148] Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the disclosure are approximations, the numerical values set forth in the specific examples are reported as precisely as possible. Any numerical value, however, inherently contains certain errors necessarily resulting from the standard deviation found in their respective testing measurements.
[0149] The terms "a," "an," "the," and similar referents used in the context of describing implementations (especially in the context of the following claims) are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range. Unless otherwise indicated herein, each individual value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., "such as") provided herein is intended merely to better illuminate implementations of the disclosure and does not pose a limitation on the scope of the disclosure. No language in the specification should be construed as indicating any non-claimed element essential to the practice of implementations of the disclosure.
[0150] Groupings of alternative elements or implementations disclosed herein are not to be construed as limitations. Each group member may be referred to and claimed individually or in any combination with other members of the group or other elements found herein. It is anticipated that one or more members of a group may be included in, or deleted from, a group for reasons of convenience and / or patentability. When any such inclusion or deletion occurs, the specification is deemed to contain the group as modified thus fulfilling the written description of all Markush groups used in the appended claims.
[0151] Unless otherwise indicated, the practice of the present disclosure can employ conventional techniques of immunology, molecular biology, microbiology, cell biology, and recombinant DNA. These methods are described in the following publications. See, e.g., Sambrook, et al., Molecular Cloning: A Laboratory Manual, 2nd Edition (1989); F. M. Ausubel, et al., eds., Current Protocols in Molecular Biology (1987); the series Methods in Enzymology (Academic Press, Inc.); M. MacPherson, et al., PCR: A Practical Approach, IRL Press at Oxford University Press (1991); MacPherson et al., eds., PCR 2: Practical Approach (1995); Harlow and Lane, eds., Antibodies, A Laboratory Manual (1988); and R. I. Freshney, ed., Animal Cell Culture (1987).
[0152] Tumor mutational burden (TMB) is a measure of the number of mutations carried by tumor cells. By comparing DNA sequences from a patient’s healthy tissues and tumor cells, the number of acquired somatic and / or germline mutations present in tumors, but not in normal tissues, may be determined. In some instances, driver mutations may be excluded from a TMB calculation.
[0153] In certain examples, "tumor mutational burden" or "TMB" refers to the number of somatic and / or germline mutations in a tumor's genome and / or the number of somatic and / or germline mutations per area of the tumor's genome. In some embodiments, TMB, as used herein, refers to the number of somatic and / or germline mutations per megabase (Mb) of DNA sequenced. In some embodiments, germline (inherited) variants are excluded when determining TMB, given that the immune system has a higher likelihood of recognizing these as self. In various cases, driver mutations are excluded from a TMB calculation.
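As a worked illustration of the per-megabase definition, the following Python sketch counts mutations and normalizes by the amount of DNA sequenced. It is a sketch under assumptions: the `Mutation` class, its flags, and the `tmb_per_mb` function are hypothetical names, and the two exclusion switches simply mirror the embodiments in which germline variants and driver mutations are left out of the count.

```python
from dataclasses import dataclass


@dataclass
class Mutation:
    gene: str                  # hypothetical label for the mutated gene
    is_germline: bool = False  # True for inherited variants
    is_driver: bool = False    # True for known driver mutations


def tmb_per_mb(mutations, bases_sequenced,
               exclude_germline=True, exclude_drivers=True):
    """Return the number of counted mutations per megabase (Mb) of DNA sequenced."""
    if bases_sequenced <= 0:
        raise ValueError("bases_sequenced must be positive")
    counted = [
        m for m in mutations
        if not (exclude_germline and m.is_germline)
        and not (exclude_drivers and m.is_driver)
    ]
    # Multiply before dividing so exact inputs give exact results.
    return len(counted) * 1_000_000 / bases_sequenced


# Hypothetical call set: five mutations, one germline and one driver.
calls = [
    Mutation("GENE_A"),
    Mutation("GENE_B"),
    Mutation("GENE_C"),
    Mutation("GENE_D", is_germline=True),
    Mutation("GENE_E", is_driver=True),
]
print(tmb_per_mb(calls, bases_sequenced=1_500_000))
```

With both exclusions enabled, three mutations over 1.5 Mb give a TMB of 2.0 mutations per Mb in this hypothetical call set.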
[0154] Microsatellites are highly polymorphic DNA-repeat regions. In certain examples, "microsatellite" refers to a repetitive nucleic acid having repeat units of less than about 10 base pairs or nucleotides in length. In certain examples, a microsatellite refers to a tract of tandemly repeated (i.e., adjacent) DNA motifs ranging from one to six or up to ten nucleotides, with each motif repeated 5 to 50 times. "Microsatellite instability" refers to genetic instability in the microsatellite regions. Cancer patients with microsatellite instability classified as being high (MSI-H or MSI-High) frequently exhibit an accumulation of somatic and / or germline mutations in tumor cells that leads to a range of molecular and biological changes including high tumor mutational burden, increased expression of neoantigens, and abundant tumor-infiltrating lymphocytes. Chang et al., "Microsatellite Instability: A Predictive Biomarker for Cancer Immunotherapy," Appl Immunohistochem Mol Morphol, 26(2):e15-e21 (2018). These changes have been linked to increased sensitivity to checkpoint inhibitor drugs, such as pembrolizumab, which is used to treat advanced melanoma, head and neck squamous cell carcinoma, non-small cell lung cancer (NSCLC), and classical Hodgkin lymphoma.
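The structural definition of a microsatellite given above (a short motif repeated in tandem) can be illustrated with a small Python sketch. It is an assumption-laden example rather than a clinical MSI caller: the function name is hypothetical, the motif length is limited to one to six nucleotides, a minimum of five tandem copies is required, and the upper bound of 50 copies is not enforced.

```python
import re

# A motif of 1 to 6 nucleotides followed by at least 4 further copies of
# itself, i.e. at least 5 tandem copies in total. The lazy quantifier makes
# the shortest repeating motif win (e.g. "CA" rather than "CACA").
_TANDEM_REPEAT = re.compile(r"([ACGT]{1,6}?)\1{4,}")


def find_microsatellites(sequence):
    """Return (start, motif, copies) for each tandem-repeat tract in the sequence."""
    hits = []
    for match in _TANDEM_REPEAT.finditer(sequence.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


# Hypothetical sequence containing six tandem copies of the dinucleotide CA.
print(find_microsatellites("GGTCACACACACACATTG"))
```

Comparing the number of copies found at the same locus in tumor and matched normal reads is one way such tracts could feed an instability assessment, although that comparison is outside this sketch.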
[0155] A viral status test refers to a test that identifies the presence of viral RNA or DNA in a subject. The test can identify viral load and / or viral identity. For example, the viral status test can identify the presence of viral RNA or DNA associated with the occurrence of certain cancers. Examples of such viruses include Hepatitis B Virus (HBV) and Hepatitis C Virus (HCV), Kaposi Sarcoma-Associated Herpesvirus (KSHV), Merkel Cell Polyomavirus (MCV), Human Papillomavirus (HPV), Human Immunodeficiency Virus Type 1 (HIV-1, or HIV), Human T-Cell Lymphotropic Virus Type 1 (HTLV-1), and Epstein-Barr Virus (EBV).
[0156] Cancer "hotspot" mutations give rise to oncological outcomes. PhyloP, SIFT, Grantham, COSMIC, and PolyPhen-2 are in silico tools that can be used to assess pathogenicity of identified variants. Exemplary hotspot genes and mutations include EGFR exon 19 activating mutation, EGFR exon 19 deletion, EGFR exon 19 insertion, EGFR exon 19 sensitizing mutation, EGFR exon 20 activation mutation, EGFR exon 20 insertion, EGFR G719 mutation, EGFR L858R mutation, EGFR L861 mutation, EGFR S768 mutation, EGFR T790M mutation, C797 mutation, KIT activating mutation, KRAS activating mutation, MET activating mutation, NRAS activating mutation, and PMS2 promoter mutations, among many others. Hotspot mutations also occur in the following genes: AKT2, BRCA1, BRCA2, ERC1, NSD1, POLH, PPM1G, PTEN, RAD18, RAD51, RAD51B, RB1, TERT, TP53, TP53BP1, ALK, ARMT1, ATAD5, ATG7, ATIC, AXL, BIRC6, BRD3, BRD4, CAPRIN1, CCAR2, CCDC6, CDK5RAP2, CHD9, CIT, CTNNB1, CUL1, EBF1, EIF3E, HIP1, HMGA2, IRF2BP2, NOTCH1, NOTCH4, NPM1, OFD1, TACC1, TACC3, TERF2, TMEM106B, UBE2L3, USP10, WDRD48, YAP1, ZEB2, and ZMYND8.
[0157] A "DNA methylation test" refers to an assay, which can be commercially available, for distinguishing methylated versus unmethylated cytosine loci in DNA. Techniques for measuring cytosine methylation include bisulfite-based methylation assays. The addition of bisulfite to DNA results in the deamination of unmethylated cytosine and its ultimate conversion to uracil. Uracil has base-pairing properties similar to those of thymine in the DNA sequence. Previously methylated cytosine does not undergo similar chemical conversion on exposure to bisulfite. Bisulfite assays can thus be used to discriminate previously methylated versus unmethylated cytosine.
[0158] An exemplary quantitative methylation detection assay is combined bisulfite restriction analysis (COBRA), which combines bisulfite treatment with restriction analysis and uses methylation sensitive restriction endonucleases, gel electrophoresis, and detection based on labeled hybridization probes. (Xiong and Laird, Nucleic Acids Res. 1997; 25:2532-4). Another exemplary detection assay is the methylation specific polymerase chain reaction (MSPCR) for amplification of DNA segments of interest. This assay can be performed after sodium bisulfite conversion of cytosine and uses methylation sensitive probes. Other detection assays include the Quantitative Methylation (QM) assay, which combines PCR amplification with fluorescent probes designed to bind to putative methylation sites; MethyLight™ (Qiagen, Redwood City, CA), a quantitative methylation detection assay that uses fluorescence-based PCR (Eads, et al., Cancer Res. 1999; 59:2302-2306); and Ms-SNuPE, a quantitative technique for determining differences in methylation levels in CpG sites. As with other techniques, Ms-SNuPE also requires bisulfite treatment to be performed first, leading to the conversion of unmethylated cytosine to uracil while methyl cytosine is unaffected. PCR primers specific for bisulfite converted DNA are then used to amplify the target sequence of interest. The amplified PCR product is isolated and used to quantitate the methylation status of the CpG site of interest. (Gonzalgo and Jones, Nucleic Acids Res. 1997; 25:2529-31).
[0159] In particular embodiments, pyrosequencing can be used to detect marker methylation. Pyrosequencing is a method of DNA sequencing that relies on detection of the release of pyrophosphates as DNA is synthesized (and is therefore a "sequencing by synthesis" technique). To assess methylation by pyrosequencing, a DNA sample can be incubated with sodium bisulfite, converting unmethylated cytosine to uracil. The presence of uracil will result in thymine incorporation during PCR amplification. Therefore, sequencing results that include thymine at a nucleotide position that is known to encode cytosine can be interpreted as unmethylated sites. In contrast, cytosines present in the sequencing results indicate that the site was methylated in the original DNA sample, because methylation protects cytosine from conversion to uracil upon treatment. Bisulfite treatment can also be performed on control samples with known methylation patterns, to reduce or eliminate false positive results. Commercially available pyrosequencing machines include PyroMark Q96 (Qiagen, Hilden, Germany). For more details on methods to use pyrosequencing for measurement of methylation, see Delaney et al., Methods Mol Biol. 2015; 1343:249-264. Pyrosequencing is especially useful for detecting methylation in the CpG sites within genes.
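The interpretation rule described above (thymine at a known cytosine position indicates an unmethylated site, while cytosine indicates a methylated site) can be sketched as follows. This is a minimal illustration and not the implementation of any sequencing instrument. The function name `methylation_fraction` is hypothetical, and the sketch assumes that each read is already aligned to the reference with the same length and no insertions or deletions.

```python
def methylation_fraction(reference, reads, position):
    """Estimate the methylated fraction at one reference cytosine.

    Reads showing C at the position are counted as methylated, reads
    showing T as unmethylated. Other bases are treated as uninformative.
    """
    if reference[position] != "C":
        raise ValueError("position is not a cytosine in the reference")
    methylated = sum(1 for read in reads if read[position] == "C")
    unmethylated = sum(1 for read in reads if read[position] == "T")
    informative = methylated + unmethylated
    if informative == 0:
        return None  # no usable reads at this site
    return methylated / informative


# Hypothetical bisulfite-converted reads aligned to a four-base reference.
reference = "ACGT"
reads = ["ACGT", "ATGT", "ACGT", "ATGT"]
fraction = methylation_fraction(reference, reads, position=1)
```

In practice, control samples with known methylation patterns, as mentioned above, would be used to check that the conversion was complete before such fractions are interpreted.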
[0160] In particular embodiments, a protein marker is detected by contacting a sample with reagents (e.g., antibodies), generating complexes of reagent and marker(s), and detecting the complexes. Particular embodiments for detecting and measuring protein levels can use methods including agglutination, chemiluminescence, electrochemiluminescence (ECL), enzyme-linked immunosorbent assays (ELISA), immunoassay, immunoblotting, immunodiffusion, immunoelectrophoresis, immunofluorescence, immunohistochemistry, immunoprecipitation, mass-spectrometry, and western blot. See also, e.g., E. Maggio, Enzyme-Immunoassay (1980), CRC Press, Inc., Boca Raton, Fla; and U. S. Pat. Nos. 4,727,022; 4,659,678; 4,376,110; 4,275,149; 4,233,402; and 4,230,797.
[0161] Read depth refers to the number of times that a specific genomic site is sequenced during a sequencing run.
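By way of a simple illustration of this definition, read depth at a site can be computed by counting the aligned reads that cover it. The sketch below assumes that each read is represented by a hypothetical `(start, end)` pair of zero-based, half-open reference coordinates; this representation is an assumption made for illustration and is not taken from the disclosure.

```python
def read_depth(reads, site):
    """Count reads whose half-open interval [start, end) covers the site."""
    return sum(1 for start, end in reads if start <= site < end)


# Hypothetical aligned reads given as (start, end) reference coordinates.
aligned_reads = [(100, 150), (120, 170), (160, 210)]
depth_at_125 = read_depth(aligned_reads, 125)  # covered by the first two reads
depth_at_300 = read_depth(aligned_reads, 300)  # covered by no read
```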
[0162] Certain implementations are described herein, including the best mode known to the inventors for carrying out implementations of the disclosure. Of course, variations on these described implementations will become apparent to those of ordinary skill in the art upon reading the foregoing description. The inventors expect skilled artisans to employ such variations as appropriate, and the inventors intend for implementations to be practiced otherwise than specifically described herein. Accordingly, the scope of this disclosure includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by implementations of the disclosure unless otherwise indicated herein or otherwise clearly contradicted by context.
Claims
CLAIMS
What is claimed is:
1. A method, comprising:
determining, by analyzing sequence read data of a sample obtained from a subject, a copy number of a segment of the sample;
determining, by analyzing the sequence read data, a ploidy of the sample by determining an average copy number of multiple segments across a genome of the sample, the multiple segments across the genome of the sample comprising the segment of the sample;
determining a ratio by dividing the copy number of the segment of the sample by the ploidy of the sample;
comparing the ratio to a threshold; and
based on comparing the ratio to the threshold, outputting an indication of the copy number of the segment of the sample.
2. The method of claim 1, wherein the segment of the sample includes at least one gene comprising ABL1, ACVR1B, AKT1, AKT2, AKT3, ALK, ALOX12B, AMER1, APC, AR, ARAF, ARFRP1, ARID1A, ASXL1, ATM, ATR, ATRX, AURKA, AURKB, AXIN1, AXL, BAP1, BARD1, BCL2, BCL2L1, BCL2L2, BCL6, BCOR, BCORL1, BCR, BRAF, BRCA1, BRCA2, BRD4, BRIP1, BTG1, BTG2, BTK, CALR, CARD11, CASP8, CBFB, CBL, CCND1, CCND2, CCND3, CCNE1, CD22, CD274, CD70, CD74, CD79A, CD79B, CDC73, CDH1, CDK12, CDK4, CDK6, CDK8, CDKN1A, CDKN1B, CDKN2A, CDKN2B, CDKN2C, CEBPA, CHEK1, CHEK2, CIC, CREBBP, CRKL, CSF1R, CSF3R, CTCF, CTNNA1, CTNNB1, CUL3, CUL4A, CXCR4, CYP17A1, DAXX, DDR1, DDR2, DIS3, DNMT3A, DOT1L, EED, EGFR, EMSY (C11orf30), EP300, EPHA3, EPHB1, EPHB4, ERBB2, ERBB3, ERBB4, ERCC4, ERG, ERRFI1, ESR1, ETV4, ETV5, ETV6, EWSR1, EZH2, EZR, FAM46C, FANCA, FANCC, FANCG, FANCL, FAS, FBXW7, FGF10, FGF12, FGF14, FGF19, FGF23, FGF3, FGF4, FGF6, FGFR1, FGFR2, FGFR3, FGFR4, FH, FLCN, FLT1, FLT3, FOXL2, FUBP1, GABRA6, GATA3, GATA4, GATA6, GID4 (C17orf39), GNA11, GNA13, GNAQ, GNAS, GRM3, GSK3B, H3F3A, HDAC1, HGF, HNF1A, HRAS, HSD3B1, ID3, IDH1, IDH2, IGF1R, IKBKE, IKZF1, INPP4B, IRF2, IRF4, IRS2, JAK1, JAK2, JAK3, JUN, KDM5A, KDM5C, KDM6A, KDR, KEAP1, KEL, KIT, KLHL6, KMT2A (MLL), KMT2D (MLL2), KRAS, LTK, LYN, MAF, MAP2K1, MAP2K2, MAP2K4, MAP3K1, MAP3K13, MAPK1, MCL1, MDM2, MDM4, MED12, MEF2B, MEN1, MERTK, MET, MITF, MKNK1, MLH1, MPL, MRE11A, MSH2, MSH3, MSH6, MST1R, MTAP, MTOR, MUTYH, MYB, MYC, MYCL, MYCN, MYD88, NBN, NF1, NF2, NFE2L2, NFKBIA, NKX2-1, NOTCH1, NOTCH2, NOTCH3, NPM1, NRAS, NT5C2, NTRK1, NTRK2, NTRK3, NUTM1, P2RY8, PALB2, PARK2, PARP1, PARP2, PARP3, PAX5, PBRM1, PDCD1, PDCD1LG2, PDGFRA, PDGFRB, PDK1, PIK3C2B, PIK3C2G, PIK3CA, PIK3CB, PIK3R1, PIM1, PMS2, POLD1, POLE, PPARG, PPP2R1A, PPP2R2A, PRDM1, PRKAR1A, PRKCI, PTCH1, PTEN, PTPN11, PTPRO, QKI, RAC1, RAD21, RAD51, RAD51B, RAD51C, RAD51D, RAD52, RAD54L, RAF1, RARA, RB1, RBM10, REL, RET, RICTOR, RNF43, ROS1, RPTOR, RSPO2, SDC4, SDHA, SDHB, SDHC, SDHD, SETD2, 
SF3B1, SGK1, SLC34A2, SMAD2, SMAD4, SMARCA4, SMARCB1, SMO, SNCAIP, SOCS1, SOX2, SOX9, SPEN, SPOP, SRC, STAG2, STAT3, STK11, SUFU, SYK, TBX3, TEK, TERC, TERT, TET2, TGFBR2, TIPARP, TMPRSS2, TNFAIP3, TNFRSF14, TP53, TSC1, TSC2, TYRO3, U2AF1, VEGFA, VHL, WHSC1, WHSC1L1, WT1, XPO1, XRCC2, ZNF217, ZNF703, ABL, ALK, ALL, B4GALNT1, BAFF, BCL2, BRAF, BRCA, BTK, CD19, CD20, CD3, CD30, CD319, CD38, CD52, CDK4, CDK6, CML, CRACC, CS1, CTLA-4, dMMR, EGFR, ERBB1, ERBB2, FGFR1-3, FLT3, GD2, HDAC, HER1, HER2, HR, IDH2, IL-1β, IL-6, IL-6R, JAK1, JAK2, JAK3, KIT, KRAS, MEK, MET, MSI-H, mTOR, PARP, PD-1, PDGFR, PDGFRα, PDGFRβ, PD-L1, PI3Kδ, PIGF, PTCH, RAF, RANKL, RET, ROS1, SLAMF7, VEGF, VEGFA, or VEGFB.
3. The method of claim 1, wherein determining, by analyzing sequence read data of the sample obtained from the subject, the copy number of the segment of the sample comprises:
determining coverage ratio data, allele fraction data, and at least one copy number model associated with the segment of the sample; and
determining the copy number of the segment of the sample based on the coverage ratio data, the allele fraction data, and the at least one copy number model associated with the segment of the sample.
4. The method of claim 1, wherein the threshold is in a range of about 2.0 to about 5.0.
5. The method of claim 1, wherein the threshold is about 0.5 or about 0.0.
6. The method of claim 1, wherein the segment of the sample comprises ERBB2 and the threshold is 2.0.
7. The method of claim 1, wherein comparing the ratio to the threshold comprises determining that the ratio exceeds the threshold, and
wherein the indication of the copy number of the segment of the sample comprises an indication that the segment has been amplified.
8. The method of claim 1, wherein comparing the ratio to the threshold comprises determining that the ratio is below the threshold, and
wherein the indication of the copy number of the segment of the sample comprises an indication that the segment is associated with a loss.
9. The method of claim 1, further comprising:
based on comparing the ratio to the threshold, predicting a condition of an embryo or a fetus of the subject based on the copy number of the segment,
wherein outputting the indication of the copy number of the segment comprises outputting an indication of the condition.
10. The method of claim 1, further comprising:
determining the threshold based on a gene in the segment.
11. The method of claim 10, wherein determining the threshold based on the gene in the segment comprises determining the threshold further based on a demographic of the subject.
12. The method of claim 1, further comprising:
determining a condition of the subject; and
determining the threshold based on the condition.
13. The method of claim 1, the ratio being a first ratio, the segment being a first segment of the sample, the threshold being a first threshold, the method further comprising:
determining a second ratio by dividing a copy number of a second segment of the sample by the ploidy of the sample;
determining that the second ratio is less than or equal to a second threshold; and
in response to determining that the second ratio is less than or equal to the second threshold, refraining from outputting an indication of the copy number of the second segment of the sample.
14. A system, comprising:
a sequencer configured to generate sequence read data of a sample obtained from a subject;
at least one processor; and
memory storing non-transitory instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising:
determining, by analyzing the sequence read data, a copy number of a segment of the sample;
determining, by analyzing the sequence read data, a ploidy of the sample by determining an average copy number of multiple segments across a genome of the sample, the multiple segments across the genome of the sample comprising the segment of the sample;
determining a ratio by dividing the copy number of the segment of the sample by the ploidy of the sample;
comparing the ratio to a threshold; and
based on comparing the ratio to the threshold, outputting an indication of the copy number of the segment of the sample.
15. A method, comprising:
providing a plurality of nucleic acid molecules obtained from a sample from a subject;
ligating one or more adapters onto one or more nucleic acid molecules from the plurality of nucleic acid molecules;
amplifying the one or more ligated nucleic acid molecules from the plurality of nucleic acid molecules;
capturing amplified nucleic acid molecules from the amplified nucleic acid molecules;
sequencing, by a sequencer, all or a subset of the captured amplified nucleic acid molecules to obtain a plurality of sequence reads that represent the sequenced amplified nucleic acid molecules thereby generating sequence read data;
receiving, at one or more processors, the sequence read data for the plurality of sequence reads;
determining, by the one or more processors, copy numbers of multiple segments across a genome of the sample;
determining, by the one or more processors, a ploidy of the sample by determining an average of the copy numbers;
determining a first ratio by dividing a first copy number, among the copy numbers, of a first segment, among the multiple segments, by the ploidy of the sample;
determining that the first segment has been amplified by determining that the first ratio exceeds a threshold associated with the first segment;
determining a second ratio by dividing a second copy number, among the copy numbers, of a second segment, among the multiple segments, by the ploidy of the sample;
determining that the second ratio is less than or equal to a threshold associated with the second segment; and
in response to determining that the first ratio exceeds the threshold associated with the first segment and determining that the second ratio is less than or equal to the threshold associated with the second segment, generating a report comprising an indication of the first copy number and omitting an indication of the second copy number.
16. The method of claim 15, wherein the first segment comprises at least one first gene, and
wherein the second segment comprises at least one second gene.
17. The method of claim 16, wherein the at least one first gene comprises ERBB2 and the threshold associated with the first segment is about 2.0, and
wherein the at least one second gene comprises a non-ERBB2 gene and the threshold associated with the second segment is in a range of about 3.0 to about 5.0.
18. The method of claim 16, further comprising:
determining the threshold associated with the first segment based on the at least one first gene; and
determining the threshold associated with the second segment based on the at least one second gene.
19. The method of claim 15, further comprising:
predicting, based on the first copy number, a cancer type or cancer subtype of the subject,
wherein the report comprises the cancer type or cancer subtype of the subject.
20. The method of claim 15, further comprising:
predicting, based on the first copy number, a predicted effective therapy to treat a condition of the subject,
wherein the report comprises the predicted effective therapy.
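
For illustration only, the ratio test recited in the claims above (a segment copy number divided by the sample ploidy, compared to a threshold) can be sketched in Python as follows. This is a minimal sketch under stated assumptions and does not limit the claims. The names `classify_segments`, `AMPLIFICATION_THRESHOLDS`, `DEFAULT_AMPLIFICATION_THRESHOLD`, and `LOSS_THRESHOLD` are hypothetical. The threshold values are chosen from the recited ranges (about 2.0 for ERBB2, about 3.0 to about 5.0 for other genes, and about 0.5 for a loss). The ploidy is taken as an unweighted mean of the segment copy numbers, and each segment is keyed by a single gene name, both of which are simplifying assumptions.

```python
from statistics import mean

# Hypothetical thresholds chosen from the ranges recited in the claims.
AMPLIFICATION_THRESHOLDS = {"ERBB2": 2.0}   # gene-specific threshold
DEFAULT_AMPLIFICATION_THRESHOLD = 3.0       # assumed value for other genes
LOSS_THRESHOLD = 0.5                        # assumed loss threshold


def classify_segments(copy_numbers):
    """Return the ploidy and a report of segments passing a ratio threshold.

    copy_numbers maps a gene (standing in for a segment) to its copy number.
    Segments that are neither amplified nor lost are omitted from the report.
    """
    if not copy_numbers:
        raise ValueError("at least one segment is required")
    # Ploidy as the (unweighted, by assumption) average segment copy number.
    ploidy = mean(copy_numbers.values())
    if ploidy <= 0:
        raise ValueError("ploidy must be positive to form a ratio")
    report = {}
    for gene, copy_number in copy_numbers.items():
        ratio = copy_number / ploidy
        threshold = AMPLIFICATION_THRESHOLDS.get(
            gene, DEFAULT_AMPLIFICATION_THRESHOLD
        )
        if ratio > threshold:
            report[gene] = {"call": "amplified", "copy_number": copy_number}
        elif ratio < LOSS_THRESHOLD:
            report[gene] = {"call": "loss", "copy_number": copy_number}
    return ploidy, report


# Hypothetical copy numbers for four segments of one sample.
example = {"ERBB2": 8, "MYC": 4, "CDKN2A": 0, "TP53": 2}
ploidy, report = classify_segments(example)
```

In this hypothetical example the ploidy is 3.5, so ERBB2 (ratio of about 2.29) exceeds its threshold and is reported as amplified, CDKN2A (ratio 0.0) is reported as a loss, and MYC and TP53 are omitted from the report because their ratios fall between the thresholds.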