A method and system for cancer screening

This invention combines low-depth cfDNA sequencing with methylation and tumor fraction analysis to address the insufficient sensitivity and specificity of existing lung cancer screening methods, enabling efficient, accurate, and lower-cost lung cancer screening.

CN115896258B · Active · Publication Date: 2026-06-30 · BIOCHAIN BEIJING SCI & TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: BIOCHAIN BEIJING SCI & TECH
Filing Date: 2021-09-30
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

Existing lung cancer screening methods are insufficient in terms of sensitivity and specificity, and are costly, making it difficult to achieve efficient and accurate early detection.

Method used

Lung cancer patients were identified by detecting methylation and tumor fraction through low-depth cfDNA sequencing, combined with copy number variation density distribution characteristics. cfDNA samples were treated with bisulfite; tumor fraction was analyzed with ichorCNA and methylation with bismark_methylation_extractor. ROC curves were used to determine the interpretation thresholds.

Benefits of technology

It achieves highly sensitive and specific screening for lung cancer patients, significantly reducing screening costs and improving accuracy.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a method and system for cancer screening. The method includes: sequencing a sample to obtain a first sequence set; aligning the first sequence set with a human reference genome to obtain a second sequence set; identifying copy number variations in the second sequence set and calculating a first and a second characteristic value; analyzing the second sequence set to obtain a third characteristic value; and determining whether the sample is cancerous based on the second and third characteristic values. This method enables the detection of methylation and tumor fraction to predict cancer patients, significantly reducing the cost of cancer screening and improving screening accuracy.

Description

Technical Field

[0001] This invention relates to the field of cancer screening technology, and more particularly to a method and system for cancer screening.

Background Art

[0002] Lung cancer is one of the cancers with the highest incidence and mortality rates worldwide, and in China it ranks first in both incidence and mortality. Routine screening methods for lung cancer include low-dose spiral CT (LDCT) and protein biomarkers such as carcinoembryonic antigen (CEA), squamous cell carcinoma antigen (SCC), and neuron-specific enolase (NSE); however, the sensitivity and specificity of these methods vary. DNA methylation patterns have been shown to be tissue-specific and can be used for early cancer detection. Furthermore, the methylation characteristics of circulating tumor DNA (ctDNA) can be used to trace the primary tumor site.

[0003] Liquid biopsy analyzes cancer-related components in blood to enable early cancer screening, molecular subtyping, prognosis, medication guidance, and recurrence monitoring. As a new precision medicine technology, liquid biopsy plays an increasingly important role in cancer diagnosis and treatment due to its ability to qualitatively and quantitatively detect tumor cells and DNA directly related to tumors, and its non-invasive, convenient sampling, and real-time monitoring characteristics.

[0004] Studies have shown that copy number variations (CNVs) are an important factor in tumor development and progression, promoting tumorigenesis by altering the activity of proto-oncogenes and tumor suppressor genes. Multiple studies have shown that CNV detection has potential as an indicator for tumor diagnosis.

[0005] Therefore, there is a need in the art for a non-invasive cancer screening method that detects methylation and tumor fraction to identify lung cancer patients, thereby significantly reducing the cost of cancer screening and improving its accuracy.

Summary of the Invention

[0006] In view of the above problems, the present invention provides a method for cancer screening, which is based on low-depth cfDNA sequencing that can simultaneously detect methylation and tumor fraction. By combining the differences in the methylation level of the biomarker with the copy number variation density distribution characteristics and the tumor fraction, the method characterizes the cfDNA in cancer patients, preferably lung cancer patients, thereby accurately identifying cancer patients, preferably lung cancer patients.

[0007] The specific technical solution of this invention is as follows:

[0008] 1. A method for cancer screening, comprising:

[0009] The sample is sequenced to obtain a first sequence set of the sample;

[0010] The first sequence set is aligned with the human reference genome to obtain a second sequence set of the sample;

[0011] Copy number variations in the second sequence set of the sample are identified, and a first feature value and a second feature value are calculated;

[0012] The second sequence set of the sample is analyzed to obtain a third feature value of the sample;

[0013] Whether the sample is a cancer sample is determined based on its second and third feature values.

[0014] 2. The method according to item 1, wherein the sample is cfDNA extracted from the plasma of the subject and then subjected to methylation treatment.

[0015] 3. The method according to item 1 or 2, wherein the first sequence set is the original sequence set, and the second sequence set is the processed and aligned sequence set.

[0016] 4. The method according to any one of items 1-3, wherein the first feature value is copy number alteration (CNA), the second feature value is tumor fraction, and the third feature value is methylation level.

[0017] 5. The method according to any one of items 2-4, wherein the methylation-treated sample is a bisulfite-treated sample.

[0018] 6. The method according to any one of items 1-5, wherein the cancer is lung cancer.

[0019] 7. The method according to any one of items 1-6, wherein the third feature value is the methylation level of the target region, and the second feature value is the tumor fraction of the target region; preferably, the target region is SEQ ID NO:1.

[0020] 8. The method according to any one of items 1-7, wherein whether a sample is a cancer sample is determined based on a threshold I of the second feature value and a threshold II of the third feature value of the sample, wherein threshold I is determined based on the tumor fractions of given cancer samples and healthy samples, and threshold II is determined based on the methylation levels of given positive and negative subjects.

[0021] 9. The method according to item 8, wherein if the second feature value of the sample exceeds the threshold I, the sample is a cancer sample, and if the second feature value of the sample is lower than the threshold I, the sample is a healthy sample.

[0022] 10. The method according to item 8 or 9, wherein if the third feature value of the sample exceeds the threshold II, the sample is a cancer sample, and if the third feature value of the sample is below the threshold II, the sample is a healthy sample.

[0023] 11. A system for cancer screening, comprising:

[0024] Sequencing module: used to sequence a sample to obtain a first sequence set of the sample;

[0025] Alignment module: used to align the first sequence set with the human reference genome to obtain a second sequence set of the sample;

[0026] Identification and calculation module: used to identify copy number variations in the second sequence set of the sample and calculate the first feature value and the second feature value;

[0027] Analysis module: used to analyze the second sequence set of the sample to obtain a third feature value of the sample;

[0028] Interpretation module: used to determine whether a sample is a cancer sample based on its second and third feature values.

[0029] 12. The system according to item 11, wherein the sample is cfDNA extracted from the plasma of a subject and then subjected to methylation treatment.

[0030] 13. The system according to item 11 or 12, wherein the first sequence set is an original sequence set and the second sequence set is a processed and aligned sequence set.

[0031] 14. The system according to any one of items 11-13, wherein the first feature value is CNA, the second feature value is tumor fraction, and the third feature value is methylation level.

[0032] 15. The system according to any one of items 12-14, wherein the methylation-treated sample is a bisulfite-treated sample.

[0033] 16. The system according to any one of items 11-15, wherein the cancer is lung cancer.

[0034] 17. The system according to any one of items 11-16, wherein the third feature value is the methylation level of the target region, the second feature value is the tumor fraction of the target region, and preferably, the target region is SEQ ID NO:1.

[0035] 18. The system according to any one of items 11-17, wherein, in the interpretation module, whether a sample is a cancer sample is determined based on a threshold I of the second feature value and a threshold II of the third feature value of the sample, wherein threshold I is determined based on the tumor fractions of given cancer samples and healthy samples, and threshold II is determined based on the methylation levels of given positive and negative subjects.

[0036] 19. The system according to item 18, wherein if the second feature value of a sample exceeds the threshold I, the sample is a cancer sample, and if the second feature value of a sample is below the threshold I, the sample is a healthy sample.

[0037] 20. The system according to item 18 or 19, wherein if the third feature value of a sample exceeds the threshold II, the sample is a cancer sample, and if the third feature value of a sample is below the threshold II, the sample is a healthy sample.

[0038] 21. A biomarker for detecting cancer, the sequence of which is shown in SEQ ID NO:1.

[0039] 22. The biomarker according to item 21, wherein the cancer is lung cancer.

[0040] The effects of the invention

[0041] The method described in this invention detects methylation and tumor fraction to identify cancer patients, which can significantly reduce the cost of cancer screening and improve its accuracy.

Brief Description of the Drawings

[0042] Figure 1 compares tumor fraction levels between healthy subjects and lung cancer patients (Health_VS_LungCancer).

[0043] Figure 2 shows the ROC curve for Health_VS_LungCancer.

[0044] Figure 3 compares methylation levels between healthy subjects and lung cancer patients (Health_VS_LungCancer).

[0045] Figure 4 shows the ROC curve for Health_VS_LungCancer based on methylation level.

[0046] Figure 5 shows the ROC curve for Health_VS_LungCancer based on methylation level and tumor fraction.

Detailed Description

[0047] The present invention will now be described in detail with reference to the accompanying drawings, wherein the same numerals in all the drawings denote the same features. While specific embodiments of the invention are shown in the drawings, it should be understood that the invention can be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this invention will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

[0048] It should be noted that certain terms are used in the specification and claims to refer to specific components. Those skilled in the art will understand that different terms may be used to refer to the same component. This specification and claims do not distinguish components based on differences in terminology, but rather on differences in function. The terms "comprising" or "including" used throughout the specification and claims are open-ended and should be interpreted as "comprising but not limited to." The following descriptions are preferred embodiments for carrying out the invention; however, these descriptions are for the purpose of understanding the general principles of the specification and are not intended to limit the scope of the invention. The scope of protection of this invention is determined by the appended claims.

[0049] This invention provides a method for cancer screening, comprising:

[0050] The sample is sequenced to obtain a first sequence set of the sample;

[0051] The first sequence set is aligned with the human reference genome to obtain a second sequence set of the sample;

[0052] Copy number variations in the second sequence set of the sample are identified, and a first feature value and a second feature value are calculated;

[0053] The second sequence set of the sample is analyzed to obtain a third feature value of the sample;

[0054] Whether the sample is a cancer sample is determined based on its second and third feature values.

[0055] The sample is cfDNA extracted from the subject's plasma and then subjected to methylation treatment.

[0056] cfDNA can be extracted from the subject's plasma by methods conventional in the art, for example by centrifuging the blood sample to obtain plasma and then lysing, binding, and eluting to obtain cfDNA.

[0057] The methylation treatment refers to treatment with bisulfite.

[0058] The first sequence set is the original sequence set, that is, the sequence set obtained after sequencing and removing low-quality reads.

[0059] The second sequence set is a processed and aligned sequence set, that is, a sequence set obtained after being aligned with the human reference genome.

[0060] In one embodiment, the first feature value is copy number alteration (CNA), the second feature value is tumor fraction, and the third feature value is methylation level.

[0061] CNA (copy number alteration) is a form of genomic structural variation that mainly reflects gains or losses in the copy number of large genomic segments.

[0062] Peripheral circulating cell-free DNA (cfDNA) consists of DNA fragments released into the blood plasma after cell apoptosis, necrosis, or other pathological changes. The term cfDNA covers all forms of cell-free DNA in peripheral blood. Circulating tumor DNA (ctDNA) is the subset of cfDNA derived from tumor cells. The tumor fraction is the proportion of ctDNA in the total cfDNA, that is, the fraction of cfDNA that originates from tumor cells.
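To make the role of tumor fraction concrete, the relation below uses a standard two-component mixture model of the kind used by tumor-fraction estimators such as ichorCNA. The symbols are ours, not the patent's. Normal cells are assumed diploid. The tumor carries c copies in a given bin and has overall ploidy φ. A tumor fraction f then gives an expected copy ratio R_c relative to the sample average:

$$f = \frac{\text{ctDNA}}{\text{total cfDNA}}, \qquad R_c = \frac{2(1-f) + c\,f}{2(1-f) + \varphi\,f}, \qquad \mathrm{logR}_c = \log_2 R_c$$

As a worked example, assume a diploid tumor (φ = 2) with a single-copy gain (c = 3):

- At f = 0.1, R_3 = (1.8 + 0.3)/2 = 1.05 and logR ≈ 0.070.
- At f = 0.039 (the tumor fraction threshold reported in the Example), R_3 = (1.922 + 0.117)/2 ≈ 1.0195 and logR ≈ 0.028.

At low tumor fractions, copy number changes therefore shift logR only slightly. This is why the read-count corrections described below matter.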

[0063] Studies have shown that the malignant proliferation of tumor cells generally produces copy number variations, and that as a tumor develops, tumor-derived DNA is released into the bloodstream as part of the cfDNA. The ichorCNA software assesses read depth, calculates CNA with a hidden Markov model (HMM), and then estimates the tumor fraction from the copy number variations with the expectation-maximization (EM) algorithm. These two values describe two different aspects of the sample.

[0064] In one implementation, ichorCNA is used to identify copy number variations in the second sequence set of the sample and to calculate the first feature value and the second feature value.

[0065] In this method, ichorCNA is applied to whole-genome bisulfite sequencing (WGBS) data to identify copy number variations in the second sequence set and calculate the first and second feature values. Its operating principle is as follows:

[0066] The core algorithm of ichorCNA calculates CNA with a hidden Markov model (HMM) and estimates the tumor fraction with the expectation-maximization (EM) algorithm. The procedure is as follows:
(1) Set the bin size (1 Mb or 500 kb) and count the reads in each bin.
(2) Correct the read count in each bin for GC content, mappability, and differences in sequencing depth.
(3) Calculate the log ratio (logR) of each bin by comparing its corrected read count with that of the same bin in the software's built-in panel of normals.
(4) Use the HMM and EM algorithms to fit each candidate solution and compute its likelihood.
(5) Select the solution with the highest likelihood as the final result, which gives the first feature value and the second feature value.
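Steps (1) to (3) above can be sketched as follows. This is a simplified illustration, not ichorCNA's code. ichorCNA itself takes binned read counts in WIG format (typically produced with HMMcopy's readCounter) and applies loess-based GC and mappability correction. This sketch makes different, simpler choices:

- It assumes read start positions on a single chromosome.
- It uses a crude per-GC-stratum median normalization.
- It omits mappability filtering.

The function names are hypothetical. Steps (4) and (5), the HMM/EM fit, are not shown.

```python
import math
from collections import defaultdict
from statistics import median


def count_reads_per_bin(read_starts, n_bins, bin_size):
    """Step (1): count reads whose start position falls in each fixed-size bin."""
    counts = [0] * n_bins
    for pos in read_starts:
        idx = pos // bin_size
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def gc_normalize(counts, gc_fraction, n_strata=10):
    """Step (2), simplified: divide each bin's count by the median count of bins
    with similar GC content. Because the result is a ratio to a typical bin,
    overall depth differences between samples also cancel out."""
    keys = [min(int(gc * n_strata), n_strata - 1) for gc in gc_fraction]
    strata = defaultdict(list)
    for c, k in zip(counts, keys):
        strata[k].append(c)
    medians = {k: median(v) for k, v in strata.items()}
    return [c / medians[k] if medians[k] > 0 else float("nan")
            for c, k in zip(counts, keys)]


def log_ratio(sample_norm, panel_norm):
    """Step (3): per-bin log2 ratio of the sample to a panel-of-normals reference."""
    return [math.log2(s / p) if s > 0 and p > 0 else float("nan")
            for s, p in zip(sample_norm, panel_norm)]


if __name__ == "__main__":
    bin_size = 100  # toy value; the text uses 1 Mb or 500 kb bins
    reads = [0] * 10 + [100] * 10 + [200] * 20 + [300] * 10  # bin 2 has twice the reads
    gc = [0.4] * 4
    sample = gc_normalize(count_reads_per_bin(reads, 4, bin_size), gc)
    panel = gc_normalize([10, 10, 10, 10], gc)
    print(log_ratio(sample, panel))  # [0.0, 0.0, 1.0, 0.0]
```

In this toy input, the third bin carries twice the normalized coverage of the panel, so its logR is 1.0.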

[0067] In one implementation, the second sequence set of the sample is analyzed to obtain the third feature value of the sample. Methylation site information in the second sequence set can be extracted with bismark_methylation_extractor to obtain the third feature value, i.e., the methylation level of the sample.
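Below is a minimal sketch of how a region-level methylation level could be computed from bismark_methylation_extractor output. It assumes the Bismark coverage file (`.bismark.cov`, produced with the `--bedGraph` option). That file has six tab-separated columns: chromosome, start, end, methylation percentage, count methylated, and count unmethylated.

Two further points are assumptions:

- The patent gives the target region only as a sequence (SEQ ID NO:1), not as genomic coordinates, so the coordinates in the usage example are hypothetical.
- Pooling counts across CpGs, rather than averaging per-CpG percentages, is our choice.

```python
def region_methylation_level(cov_lines, chrom, start, end):
    """Pooled methylation level of one region: methylated calls divided by all
    calls over the CpGs whose position lies in [start, end] (inclusive).
    Expects Bismark coverage lines: chrom, start, end, %meth, n_meth, n_unmeth."""
    methylated = unmethylated = 0
    for line in cov_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[0] != chrom:
            continue
        pos = int(fields[1])
        if start <= pos <= end:
            methylated += int(fields[4])
            unmethylated += int(fields[5])
    total = methylated + unmethylated
    # NaN when the region has no coverage at all
    return methylated / total if total else float("nan")


if __name__ == "__main__":
    # Hypothetical coverage records and region coordinates, for illustration only.
    cov = [
        "chr1\t100\t100\t50.0\t2\t2",
        "chr1\t105\t105\t100\t3\t0",
        "chr1\t500\t500\t0\t0\t5",
        "chr2\t100\t100\t100\t4\t0",
    ]
    print(region_methylation_level(cov, "chr1", 90, 200))  # 0.7142857142857143 (= 5/7)
```

With a real file, `open(path)` can be passed directly as `cov_lines`, since iterating over a file yields its lines.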

[0068] In one implementation, the cancer is lung cancer.

[0069] In one embodiment, the third feature value is the methylation level of the target region, and the second feature value is the tumor fraction of the target region. Preferably, the target region is SEQ ID NO:1.

[0070] The sequence of SEQ ID NO:1 is as follows:

[0071] GCAGGCAGTACCTCGGCGTGACGCGGTGACGCAGCCGCAGG
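Methylation is read out after bisulfite treatment. Bisulfite converts unmethylated cytosines to uracil (read as thymine after PCR) and leaves 5-methylcytosine unchanged. The sketch below shows what this does to SEQ ID NO:1. It first locates the CpG dinucleotides in the sequence; there are six.

It then performs an in-silico conversion of this single strand under two simplifying assumptions:

- All non-CpG cytosines are unmethylated.
- All CpGs share one methylation state.

The function names are hypothetical. Real bisulfite aligners such as Bismark handle both strands and mixed methylation states.

```python
# SEQ ID NO:1, as given in paragraph [0071]
SEQ_ID_NO_1 = "GCAGGCAGTACCTCGGCGTGACGCGGTGACGCAGCCGCAGG"


def cpg_positions(seq):
    """0-based positions of the C in each CpG dinucleotide of seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


def bisulfite_convert(seq, cpg_methylated=True):
    """In-silico bisulfite conversion of one strand: unmethylated C -> T.
    Non-CpG cytosines are treated as unmethylated; CpG cytosines remain C
    only when cpg_methylated is True."""
    seq = seq.upper()
    protected = set(cpg_positions(seq)) if cpg_methylated else set()
    return "".join("T" if base == "C" and i not in protected else base
                   for i, base in enumerate(seq))


if __name__ == "__main__":
    print(len(SEQ_ID_NO_1), cpg_positions(SEQ_ID_NO_1))  # 41 [13, 16, 21, 23, 29, 35]
    print(bisulfite_convert(SEQ_ID_NO_1, cpg_methylated=True))   # methylated template
    print(bisulfite_convert(SEQ_ID_NO_1, cpg_methylated=False))  # unmethylated template
```

The two converted reads differ only at the six CpG positions. That difference is what the methylation level of the target region measures.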

[0072] In one implementation, whether a sample is a cancer sample is determined based on threshold I for the second feature value and threshold II for the third feature value, wherein threshold I is determined from the tumor fractions of given cancer samples and healthy samples, and threshold II is determined from the methylation levels of given positive and negative subjects.

[0073] Threshold I for the second feature value and threshold II for the third feature value are each taken as the optimal cut-off point on an ROC curve plotted from the test dataset.

[0074] The confusion matrix at the optimal cut-off point on the ROC curve is the basis for calculating metrics such as sensitivity, specificity, and accuracy. The optimal cut-off point is typically selected with the Youden index, also known as the accuracy index or Youden's J statistic, which is the sum of sensitivity and specificity minus 1: Youden index = Sensitivity + Specificity - 1. For any classifier that performs at least as well as chance, the Youden index ranges from 0 to 1 and reflects the model's overall ability to distinguish true patients from non-patients; a higher Youden index indicates better classification performance.
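A minimal sketch of Youden-based threshold selection on an empirical ROC curve follows. It assumes the rule "score above the threshold means cancer" and uses as candidate cut-offs the midpoints between consecutive observed scores, plus both extremes. The function name and toy data are hypothetical. In R, pROC's `coords()` with `best.method = "youden"` serves the same purpose.

```python
def youden_optimal_threshold(scores, labels):
    """Return (threshold, youden_j, sensitivity, specificity) that maximizes
    J = sensitivity + specificity - 1 under the rule 'score > threshold => positive'.
    labels: 1 = cancer (positive), 0 = healthy (negative)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    values = sorted(set(scores))
    # Candidate cut-offs: midpoints between consecutive scores, plus both extremes.
    candidates = [float("-inf")]
    candidates += [(a + b) / 2 for a, b in zip(values, values[1:])]
    candidates.append(float("inf"))
    best = None
    for t in candidates:
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= t)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[1]:  # keep the first cut-off on ties
            best = (t, j, sens, spec)
    return best


if __name__ == "__main__":
    tumor_fraction = [0.010, 0.020, 0.030, 0.050, 0.060, 0.080]  # toy values
    is_cancer = [0, 0, 0, 1, 1, 1]
    print(youden_optimal_threshold(tumor_fraction, is_cancer))
    # cut-off midway between 0.03 and 0.05, with J = 1.0 (perfect separation)
```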

[0075] In one implementation, if the second feature value of a sample exceeds the threshold I, the sample is a cancer sample; if the second feature value of a sample is below the threshold I, the sample is a healthy sample.

[0076] In one implementation, if the third characteristic value of a sample exceeds the threshold II, the sample is a cancer sample; if the third characteristic value of a sample is below the threshold II, the sample is a healthy sample.

[0077] In one embodiment, the present invention provides a method for cancer screening, comprising: sequencing a sample to obtain a first sequence set of the sample; aligning the first sequence set with a human reference genome to obtain a second sequence set of the sample; identifying copy number variations in the second sequence set and calculating a first feature value and a second feature value; analyzing the second sequence set to obtain a third feature value of the sample; and determining whether the sample is a cancer sample based on the second and third feature values. In this embodiment, the sample is cfDNA extracted from the plasma of a subject and subjected to methylation treatment; the first sequence set is the original sequence set and the second sequence set is the processed and aligned sequence set; the first feature value is CNA, the second feature value is tumor fraction, and the third feature value is methylation level, specifically the methylation level of a target region, preferably SEQ ID NO:1; and the cancer is lung cancer. Whether a sample is a cancer sample is determined based on threshold I for the second feature value and threshold II for the third feature value: if the second feature value exceeds threshold I, the sample is a cancer sample, and if it is below threshold I, the sample is a healthy sample; likewise, if the third feature value exceeds threshold II, the sample is a cancer sample, and if it is below threshold II, the sample is a healthy sample.

[0078] By detecting the second and third feature values of a sample simultaneously, this invention achieves high sensitivity and specificity, with a higher AUC than detection based on either feature value alone, thus improving predictive ability.

[0079] This invention provides a system for cancer screening, comprising:

[0080] Sequencing module: used to sequence a sample to obtain a first sequence set of the sample;

[0081] Alignment module: used to align the first sequence set with the human reference genome to obtain a second sequence set of the sample;

[0082] Identification and calculation module: used to identify copy number variations in the second sequence set of the sample and calculate the first feature value and the second feature value;

[0083] Analysis module: used to analyze the second sequence set of the sample to obtain a third feature value of the sample;

[0084] Interpretation module: used to determine whether a sample is a cancer sample based on its second and third feature values.

[0085] The system described in this invention can simultaneously detect the second and third feature values, which can significantly reduce the cost of cancer screening and improve the accuracy of screening.

[0086] This invention provides a biomarker for detecting cancer, the biomarker being SEQ ID NO:1.

[0087] In one implementation, the cancer is lung cancer.

[0088] The inventors used epigenomic and bioinformatic techniques to analyze whole-genome methylation data from cancers such as lung cancer, identified a methylated gene associated with these cancers, and determined the target sequence that is aberrantly methylated in them. Using this target sequence, the methylation status of the gene can be detected sensitively and specifically, which can then be applied to the detection of cell-free DNA in peripheral blood.

[0089] Example

[0090] The materials and methods used in the experiments are described below in general and/or specific terms. In the following examples, unless otherwise specified, % represents wt%, i.e., weight percentage. Unless otherwise specified, all reagents and instruments used are conventional commercially available products.

[0091] Example 1

[0092] 1.1 cfDNA extraction and purification

[0093] 1.1.1 Plasma sample preparation:

[0094] Centrifuge the blood sample at 2,000 × g for 10 minutes at 4°C and transfer the plasma to a new centrifuge tube. Centrifuge the plasma at 16,000 × g for 10 minutes at 4°C. The next step depends on the type of collection tube used; in this experiment, the "other" collection tube type was used.

[0095] Table 1

[0096]

[0097]

[0098] 1.1.2 Lysis and Binding

[0099] 1.1.2.1. Prepare the Binding Solution / Beads Mix according to the table below, and then mix thoroughly.

[0100] Table 2

[0101]

[0102] Add an appropriate volume of plasma sample.

[0103] 1.1.2.2. Thoroughly mix the plasma sample and Binding Solution / Beads Mix.

[0104] 1.1.2.3. Mix thoroughly on a rotary mixer for 10 minutes to allow the cfDNA to bind to the magnetic beads.

[0105] 1.1.2.4. Place the binding tube on the magnetic rack for 5 minutes until the solution becomes clear and the magnetic beads are completely adsorbed on the magnetic rack.

[0106] 1.1.2.5. Carefully discard the supernatant with a pipette, and continue to keep the tube on the magnetic rack for a few minutes. Remove any remaining supernatant with a pipette.

[0107] 1.1.3 Washing

[0108] 1.1.3.1. Resuspend the beads in 1 ml of Wash Solution.

[0109] 1.1.3.2. Transfer the resuspension to a new low-binding 1.5 ml centrifuge tube. Retain the binding tube.

[0110] 1.1.3.3. Place the centrifuge tube containing the bead resuspension on a magnetic rack for 20 seconds.

[0111] 1.1.3.4. Aspirate the supernatant from this separation and use it to rinse the binding tube. Return the beads recovered by the rinse to the resuspension, and discard the lysis / binding tube.

[0112] 1.1.3.5. Place the tube on the magnetic rack for 2 minutes, until the solution becomes clear and the beads gather on the magnetic rack. Remove the supernatant with a 1 ml pipette.

[0113] 1.1.3.6. Leave the tube on the magnetic rack and remove as much residual liquid as possible using a 200μL pipette.

[0114] 1.1.3.7. Remove the tube from the magnetic rack, add 1 ml of Wash Solution, and vortex for 30 seconds.

[0115] 1.1.3.8. Place on a magnetic rack for 2 minutes until the solution is clear and the beads gather on the magnetic rack. Remove the supernatant with a 1 ml pipette.

[0116] 1.1.3.9. Leave the tube on the magnetic rack and use a 200 μL pipette to completely remove any remaining liquid.

[0117] 1.1.3.10. Remove the tube from the magnetic rack, add 1 ml of 80% ethanol, and vortex for 30 seconds.

[0118] 1.1.3.11. Place on a magnetic rack for 2 minutes until the solution becomes clear, then remove the supernatant with a 1 ml pipette.

[0119] 1.1.3.12. Leave the tube on the magnetic rack and remove any remaining liquid using a 200 μL pipette.

[0120] 1.1.3.13. Repeat steps 1.1.3.10 to 1.1.3.12 once with 80% ethanol, removing as much supernatant as possible.

[0121] 1.1.3.14. Leave the tube on the magnetic rack and let the beads dry in the air for 3-5 minutes.

[0122] 1.1.4 Elution of cfDNA

[0123] 1.1.4.1. Add the solution according to the table below.

[0124] Table 3

[0125]

[0126] 1.1.4.2. Place on a magnetic rack for 2 minutes until the solution becomes clear, then aspirate the cfDNA from the supernatant.

[0127] 1.1.4.3. The purified cfDNA can be used immediately, or the supernatant can be transferred to a new centrifuge tube and stored at -20°C.

[0128] 1.2 gDNA fragmentation and purification:

[0129] 1.2.1. Based on the Qubit concentration, take 2 μg of DNA, add water to a final volume of 125 μl, transfer it to a 130 μl Covaris fragmentation tube, and set the program: 50 W peak incident power, 20% duty factor, 200 cycles per burst, 250 s treatment time.
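The input set-up in step 1.2.1 (and likewise in step 1.3.1, 50 ng in 50 μl) is simple arithmetic from the Qubit reading. The helper below is a hypothetical sketch that returns the DNA and water volumes. The concentrations in the usage example are illustrative, not values from the text.

```python
def dilution_volumes(target_ng, final_volume_ul, conc_ng_per_ul):
    """Return (dna_ul, water_ul) needed to bring target_ng of DNA to
    final_volume_ul, given a Qubit concentration in ng/ul."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    dna_ul = target_ng / conc_ng_per_ul
    if dna_ul > final_volume_ul:
        raise ValueError("sample too dilute: DNA volume exceeds the final volume")
    return dna_ul, final_volume_ul - dna_ul


if __name__ == "__main__":
    # Step 1.2.1: 2 ug of gDNA in 125 ul, hypothetical Qubit reading of 40 ng/ul
    print(dilution_volumes(2000, 125, 40))  # (50.0, 75.0)
    # Step 1.3.1: 50 ng in 50 ul, hypothetical Qubit reading of 2.5 ng/ul
    print(dilution_volumes(50, 50, 2.5))    # (20.0, 30.0)
```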

[0130] 1.2.2 After fragmentation is complete, take 1 μl of the sample and check the fragment size distribution on an Agilent 2100. After successful fragmentation, the main peak is at approximately 150 to 200 bp.

[0131] cfDNA samples were checked for fragment size on the Agilent 2100, quantified with Qubit, and used directly in subsequent steps without fragmentation.

[0132] 1.3 End repair and 3' A-tailing:

[0133] 1.3.1. Take 50 ng of fragmented gDNA or cfDNA into a PCR tube, add nuclease-free water to a final volume of 50 μl, add the following reagents, and vortex to mix:

[0134] Table 4

[0135] Component | Volume
gDNA / cfDNA | 50 μl
End Repair & A-Tailing Buffer | 7 μl
End Repair & A-Tailing Enzyme Mix | 3 μl
Total volume | 60 μl

[0136] 1.3.2. Set the following program to perform the reaction on the PCR instrument:

[0137] Heated lid: 85℃.

[0138] Table 5

[0139]

[0140]

[0141] 1.4 Adapter ligation and purification:

[0142] 1.4.1. Dilute the adapter to the appropriate concentration in advance according to the table below:

[0143] Table 6

[0144] Fragmented DNA per 50 μl ER & AT (end repair and A-tailing) reaction | Adapter concentration
1 μg | 10 μM
500 ng | 10 μM
250 ng | 10 μM
100 ng | 10 μM
50 ng | 10 μM
25 ng | 10 μM
10 ng | 3 μM
5 ng | 5 μM
2.5 ng | 2.5 μM
1 ng | 625 nM

[0145] 1.4.2. Prepare the following reagents according to the table below, gently pipette and mix well, then briefly centrifuge:

[0146] Table 7

[0147] Component | Volume
End repair and A-tailing reaction product | 60 μl
Adapter | 5 μl
Nuclease-free water | 5 μl
Ligation Buffer | 30 μl
DNA Ligase | 10 μl
Total volume | 110 μl

[0148] 1.4.3. Set the following program to perform the reaction on the PCR instrument:

[0149] Heated lid: off.

[0150] Table 8

[0151] Temperature | Time
20℃ | 30 min
4℃ | ∞ (hold)

[0152] 1.4.4. Purify the ligation product with magnetic beads using the following system (bring the Agencourt AMPure XP beads to room temperature and mix them thoroughly beforehand):

[0153] Table 9

[0154]

[0155]

[0156] 1.4.4.1. Gently pipette up and down 6 times to mix.

[0157] 1.4.4.2. Incubate at room temperature for 5-15 minutes, then place the PCR tube on a magnetic rack for 3 minutes to allow the solution to clarify.

[0158] 1.4.4.3. Remove the supernatant, keep the PCR tube on the magnetic rack, add 200 μl of 80% ethanol solution to the PCR tube, and let it stand for 30 seconds.

[0159] 1.4.4.4. Remove the supernatant, then add 200 μl of 80% ethanol solution to the PCR tube, let it stand for 30 seconds, and then completely remove the supernatant (it is recommended to use a 10 μl pipette to remove any residual ethanol solution at the bottom).

[0160] 1.4.4.5. Let stand at room temperature for 3-5 minutes to allow the residual ethanol to evaporate completely.

[0161] 1.4.4.6. Add 22 μl of nuclease-free water, remove the PCR tube from the magnetic rack, gently pipette to resuspend the beads without generating air bubbles, and let stand at room temperature for 2 minutes.

[0162] 1.4.4.7. Place the PCR tube on a magnetic rack for 2 minutes to allow the solution to clarify.

[0163] 1.4.4.8. Use a pipette to draw 20 μl of supernatant and transfer it to a new PCR tube.

[0164] 1.5 Bisulfite conversion and purification:

[0165] 1.5.1. Prepare the required reagents in advance and dissolve them. Add the reagents according to the table below:

[0166] Table 10

[0167] Component | High-concentration sample (1 ng to 2 μg) | Low-concentration sample (1 to 500 ng)
Purified adapter-ligated product | 20 μl | 40 μl
Bisulfite solution | 85 μl | 85 μl
DNA Protect Buffer | 35 μl | 15 μl
Total volume | 140 μl | 140 μl

[0168] 1.5.2. DNA Protect buffer turns the liquid blue upon addition. Gently pipette and mix well, then divide into two tubes and place them on the PCR instrument.

[0169] 1.5.3. Configure and run the following program:

[0170] Heated lid: 105℃.

[0171] Table 11

[0172] Temperature | Time
95℃ | 5 min
60℃ | 10 min
95℃ | 5 min
60℃ | 10 min
4℃ | ∞ (hold)

[0173] 1.5.4. Centrifuge briefly, then combine the two tubes of each sample into a single clean 1.5 ml centrifuge tube.

[0174] 1.5.5. Add 310 μl of Buffer BL to each sample (for samples containing less than 100 ng of DNA, also add 1 μl of Carrier RNA at 1 μg / μl), vortex to mix, and centrifuge briefly.

[0175] 1.5.6. Add 250 μl of anhydrous ethanol to each sample, vortex to mix for 15 s, centrifuge briefly, and add the mixture to the corresponding prepared centrifuge column.

[0176] 1.5.7. Let stand for 1 minute and centrifuge for 1 minute; transfer the liquid in the collection tube back onto the centrifuge column, centrifuge for 1 minute, and discard the flow-through.

[0177] 1.5.8. Add 500 μl of Buffer BW (check that anhydrous ethanol has been added to the buffer), centrifuge for 1 min, and discard the flow-through.

[0178] 1.5.9. Add 500 μl of Buffer BD (check that anhydrous ethanol has been added to the buffer), cap the column, and incubate at room temperature for 15 min. Centrifuge for 1 min and discard the flow-through.

[0179] 1.5.10. Add 500 μl of Buffer BW (check that anhydrous ethanol has been added to the buffer), centrifuge for 1 min, and discard the flow-through; repeat once, for a total of two washes.

[0180] 1.5.11. Add 250 μl of anhydrous ethanol, centrifuge for 1 min, transfer the centrifuge column to a new 2 ml collection tube, and discard all remaining liquid.

[0181] 1.5.12. Place the centrifuge column into a clean 1.5ml centrifuge tube, add 20μl of nuclease-free water to the center of the centrifuge column membrane, gently cap the tube, incubate at room temperature for 1 min, and centrifuge for 1 min.

[0182] 1.5.13. Transfer the eluate back onto the centrifuge column, let it stand at room temperature for 1 min, and then centrifuge for 1 min.

[0183] 1.6 Library quality control

[0184] 1.6.1. Measure the library concentration on 1 μl of sample with a Qubit and record it.

[0185] 1.6.2. Analyze 1 μl of sample on an Agilent 2100 to determine the library fragment length. The library fragments are approximately 270-320 bp.

[0186] 1.6.3. Sequence the libraries on an Illumina high-throughput sequencing platform.

[0187] 1.6.4. Methylation bioinformatics analysis workflow. The general process is as follows: quality control software such as fastp is used to check sequencing quality and remove low-quality reads. Alignment software such as Bismark is then used to align the clean data to the reference genome. Methylation site information is extracted with bismark_methylation_extractor, and the methylation level of each target region (SEQ ID NO:1) is calculated. Tumor fraction is calculated with the ichorCNA software.
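The source does not give the genomic coordinates of SEQ ID NO:1 or the exact formula used for the target-region methylation level. One common definition, assumed here, pools counts over all CpGs in the region: methylated calls divided by all calls. The Python sketch below reads lines in the Bismark coverage layout (chromosome, start, end, methylation percentage, count methylated, count unmethylated); the function name and the toy coordinates are hypothetical.

```python
import csv


def region_methylation_level(cov_lines, chrom, start, end):
    """Pooled methylation level of the CpGs inside [start, end] on chrom.

    cov_lines: iterable of tab-separated lines in the Bismark coverage layout
    chrom, start, end, methylation %, count methylated, count unmethylated.
    Returns methylated / (methylated + unmethylated), or None if no coverage.
    """
    methylated = unmethylated = 0
    for row in csv.reader(cov_lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        c, pos = row[0], int(row[1])
        if c == chrom and start <= pos <= end:
            methylated += int(row[4])
            unmethylated += int(row[5])
    total = methylated + unmethylated
    return methylated / total if total else None


# Toy coverage lines with made-up coordinates, for illustration only.
TOY_COV = [
    "chr_example\t100\t100\t75.0\t3\t1",
    "chr_example\t120\t120\t50.0\t2\t2",
    "chr_example\t500\t500\t0.0\t0\t5",
]
print(region_methylation_level(TOY_COV, "chr_example", 90, 150))  # 0.625
```

Pooling counts weights each CpG by its read depth; averaging per-CpG percentages instead would weight every site equally, and the source does not say which was used.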

[0188] Using 24 positive and 38 negative samples, ROC curves were plotted with the R package pROC. The resulting thresholds were 0.039 for the tumor score and 0.276 for the target-region methylation level. A sample with a tumor score above 0.039 is interpreted as cancer, and one below 0.039 as normal. Likewise, a sample with a target-region methylation level above 0.276 is interpreted as cancer, and one below 0.276 as normal.
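The source does not state which criterion was used to read the cut-offs off the ROC curves. A common choice, and the default `best.method` of pROC's `coords(..., x = "best")`, is Youden's index J = sensitivity + specificity - 1. The Python sketch below assumes that criterion: it computes the AUC as a Mann-Whitney statistic and scans midpoints between observed values for the cut-off that maximises J. The function names and toy scores are illustrative only and are not data from this study.

```python
def auc(pos, neg):
    """AUC as the Mann-Whitney probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_threshold(pos, neg):
    """Return (cutoff, sensitivity, specificity) maximising J = sens + spec - 1.

    Candidate cutoffs are midpoints between adjacent distinct observed values.
    A value above the cutoff is called positive (cancer); on ties in J the
    lowest cutoff is kept.
    """
    values = sorted(set(pos) | set(neg))
    if len(values) < 2:
        raise ValueError("need at least two distinct values")
    best = None
    for lo, hi in zip(values, values[1:]):
        cutoff = (lo + hi) / 2
        sens = sum(p > cutoff for p in pos) / len(pos)
        spec = sum(n <= cutoff for n in neg) / len(neg)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Toy scores for illustration only (not data from this study).
TOY_POS = [0.4, 0.7, 0.9, 0.95]
TOY_NEG = [0.1, 0.2, 0.5]
print(auc(TOY_POS, TOY_NEG), youden_threshold(TOY_POS, TOY_NEG))
```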

[0189] Example 2

[0190] A lung cancer sample was analyzed using the method described in this application. Peripheral blood was collected according to Example 1; a library was constructed, and sequencing was performed using the Illumina platform. The sequencing data were analyzed using the aforementioned bioinformatics workflow to obtain the methylation level of the target region and the tumor score, as detailed below:

[0191] The sample was analyzed using a model. Its tumor score of 0.071 exceeds the threshold of 0.039, so the sample was interpreted as a cancer sample. Its target-region methylation level of 0.667 exceeds the threshold of 0.276, which also indicates a cancer sample.

[0192] Example 3

[0193] A lung cancer sample was analyzed using the method described in this application. Peripheral blood was collected according to Example 1; a library was constructed, and sequencing was performed using the Illumina platform. The sequencing data were analyzed using the aforementioned bioinformatics workflow to obtain the methylation level of the target region and the tumor score, as detailed below:

[0194] The sample was analyzed using a model. Its tumor score of 0.092 exceeds the threshold of 0.039, so the sample was interpreted as a cancer sample. Its target-region methylation level of 0.423 exceeds the threshold of 0.276, which also indicates a cancer sample.

[0195] Example 4

[0196] A lung cancer sample was analyzed using the method described in this application. Peripheral blood was collected according to Example 1; a library was constructed, and sequencing was performed using the Illumina platform. The sequencing data were analyzed using the aforementioned bioinformatics workflow to obtain the methylation level of the target region and the tumor score, as detailed below:

[0197] The sample was analyzed using a model. Its tumor score of 0.063 exceeds the threshold of 0.039, so the sample was interpreted as a cancer sample. Its target-region methylation level of 0.715 exceeds the threshold of 0.276, which also indicates a cancer sample.

[0198] Example 5

[0199] A normal human sample was analyzed using the method described in Example 1. Peripheral blood was collected, a library was constructed, and sequencing was performed using the Illumina platform. The sequencing data were analyzed using the aforementioned bioinformatics workflow to obtain the methylation level of the target region and the tumor score, as detailed below:

[0200] The sample was analyzed using a model. The tumor score was 0.013, less than the threshold of 0.039, and therefore the sample was classified as healthy. The methylation level in the target region was 0.15, less than the threshold of 0.276, and the sample was also classified as healthy.

[0201] Example 6

[0202] A normal human sample was analyzed using the method described in Example 1. Peripheral blood was collected, a library was constructed, and sequencing was performed using the Illumina platform. The sequencing data were analyzed using the aforementioned bioinformatics workflow to obtain the methylation level of the target region and the tumor score, as detailed below:

[0203] The sample was analyzed using a model. The tumor score was 0.026, less than the threshold of 0.039, and therefore the sample was classified as healthy. The methylation level in the target region was 0.183, less than the threshold of 0.276, and the sample was also classified as healthy.

[0204] Example 7

[0205] A normal human sample was analyzed using the method described in Example 1. Peripheral blood was collected, a library was constructed, and sequencing was performed using the Illumina platform. The sequencing data were analyzed using the aforementioned bioinformatics workflow to obtain the methylation level of the target region and the tumor score, as detailed below:

[0206] The sample was analyzed using a model. The tumor score was 0.019, less than the threshold of 0.039, and therefore the sample was classified as healthy. The methylation level in the target region was 0.06, less than the threshold of 0.276, and the sample was also classified as healthy.
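The interpretation rule of [0188] (and of claims 5 and 6) can be applied to the values reported in Examples 2 to 7 with the minimal Python sketch below. Each indicator is interpreted on its own. The source does not specify what happens when a value equals a threshold exactly or when the two indicators disagree, so the sketch treats an equal value as healthy (an assumption) and does not combine the two calls.

```python
TUMOR_SCORE_THRESHOLD = 0.039  # threshold I (tumor score), from [0188]
METHYLATION_THRESHOLD = 0.276  # threshold II (target-region methylation), from [0188]

# (example, tumor score, target-region methylation level) as reported in Examples 2-7.
EXAMPLES = [
    ("Example 2", 0.071, 0.667),
    ("Example 3", 0.092, 0.423),
    ("Example 4", 0.063, 0.715),
    ("Example 5", 0.013, 0.15),
    ("Example 6", 0.026, 0.183),
    ("Example 7", 0.019, 0.06),
]


def interpret(value, threshold):
    """Call 'cancer' above the threshold, otherwise 'healthy'.

    A value exactly equal to the threshold falls to 'healthy' (an assumption).
    """
    return "cancer" if value > threshold else "healthy"


for name, tumor_score, methylation in EXAMPLES:
    print(name,
          interpret(tumor_score, TUMOR_SCORE_THRESHOLD),
          interpret(methylation, METHYLATION_THRESHOLD))
```

Examples 2 to 4 come out as cancer on both indicators and Examples 5 to 7 as healthy on both, matching the interpretations stated in the examples.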

[0207] Example 8: Statistical results using tumor score and target-region methylation level in combination

[0208] From the Example 1 test dataset of 24 lung cancer samples and 38 normal samples, 70% of the data were randomly selected as the model training set. A random forest model combining each sample's tumor score and target-region methylation level was built on the training set with the R package randomForest. The cutoff for predicting lung cancer from tumor score and target-region methylation level was 0.382; that is, the specified threshold is 0.382. For example, for an unknown sample, the measured tumor score and methylation level are fed into the trained random forest model to obtain a value, and if this value is greater than 0.382, the subject is identified as a lung cancer patient. ROC curves were plotted on the entire test dataset (24 lung cancer samples and 38 normal samples). The ROC curve is shown in Figure 5, and the sensitivity and specificity are shown in Table 12.
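The random forest itself was built with the R package randomForest and is not reproduced here. The Python sketch below only illustrates the surrounding steps under stated assumptions: a plain random 70/30 split of the 62 samples (the source does not say whether the split was stratified, and the seed is arbitrary), and the 0.382 cutoff applied to a model output. In randomForest, such an output would typically be the vote fraction for the cancer class from `predict(model, newdata, type = "prob")`, but the source does not say which output was used. The names `split_training_set` and `call_lung_cancer` are hypothetical.

```python
import random

RF_CUTOFF = 0.382  # specified threshold from Example 8


def split_training_set(samples, train_fraction=0.7, seed=0):
    """Randomly assign train_fraction of the samples to the training set.

    This is a plain, unstratified random split; the seed is arbitrary.
    Returns (training_set, remaining_samples).
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def call_lung_cancer(model_output, cutoff=RF_CUTOFF):
    """Identify a lung cancer patient when the model output exceeds the cutoff."""
    return model_output > cutoff


# Placeholder sample labels: 24 lung cancer and 38 normal samples.
samples = [("lung_cancer", i) for i in range(24)] + [("normal", i) for i in range(38)]
training_set, remaining = split_training_set(samples)
print(len(training_set), len(remaining))
```

With 62 samples, 70% rounds down to 43 training samples, leaving 19.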

[0209] Table 12 Sensitivity and specificity based on tumor score and methylation level

[0210]

[0211]

[0212] As shown in Figure 5 and Table 12, the combined model has an AUC of 0.984, a sensitivity of 100%, and a specificity of 94.73%.

[0213] Comparative Example 1: Statistical results using tumor score alone and target-region methylation level alone

[0214] (1) Statistical results calculated using tumor scores alone

[0215] In the test dataset of 24 lung cancer samples and 38 normal human samples, all samples were tested according to the method described in Example 1. Based on the resulting tumor scores, box plots of the tumor score distribution in lung cancer and normal samples were generated with R packages such as ggpubr and ggstatsplot, as shown in Figure 1. ROC curves were then plotted on this dataset to validate the results, as shown in Figure 2, and the sensitivity and specificity are given in Table 13. Here, specificity is the proportion of truly negative individuals who are correctly identified as negative, and sensitivity is the proportion of truly positive individuals who are correctly identified as positive.

[0216] Table 13 Sensitivity and specificity based on tumor score

[0217]

[0218] Figure 1 shows that healthy individuals have low tumor scores, whereas lung cancer patients have high tumor scores.

[0219] As shown in Figure 2 and Table 13, the AUC is 0.703, the sensitivity is 79.16%, and the specificity is 60.53%.

[0220] (2) Statistical results calculated using methylation level alone

[0221] The distribution of methylation levels in this dataset is shown in Figure 3, which was generated in the same way as Figure 1. The ROC curve plotted from the methylation level on this dataset is shown in Figure 4, and the sensitivity and specificity are given in Table 14.

[0222] Figure 3 shows that healthy individuals have lower methylation levels, whereas lung cancer patients have higher methylation levels.

[0223] As shown in Figure 4 and Table 14, the AUC is 0.845, the sensitivity is 70.83%, and the specificity is 92.11%.

[0224] Table 14 Sensitivity and specificity based on methylation level

[0225]
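The sensitivity and specificity definitions in [0215] can be checked against the reported figures. The Python sketch below uses correct-call counts that are not stated in the source but are inferred from the reported percentages (for example, 19 of 24 cancer samples gives 79.17%, reported as 79.16%); with these counts each computed value agrees with the reported one to within 0.01 percentage point.

```python
def sensitivity(tp, fn):
    """True positives divided by all actual positives."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negatives divided by all actual negatives."""
    return tn / (tn + fp)


N_POS, N_NEG = 24, 38  # lung cancer and normal samples in the test dataset

# (true positives, true negatives) INFERRED from the reported percentages;
# these counts do not appear in the source.
INFERRED = {
    "tumor score alone": (19, 23),        # reported 79.16% / 60.53%
    "methylation level alone": (17, 35),  # reported 70.83% / 92.11%
    "combined random forest": (24, 36),   # reported 100% / 94.73%
}

for model, (tp, tn) in INFERRED.items():
    sens = sensitivity(tp, N_POS - tp)
    spec = specificity(tn, N_NEG - tn)
    print(f"{model}: sensitivity {100 * sens:.2f}%, specificity {100 * spec:.2f}%")
```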

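[0226] By detecting both the tumor fraction and the target-region methylation level of a sample, the detection method of this invention achieves higher sensitivity and specificity, and therefore better predictive ability, than either indicator alone (AUC 0.984 in combination, versus 0.703 for the tumor score alone and 0.845 for the methylation level alone).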

[0227] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention in any other way. Any person skilled in the art may make changes or modifications to the above-disclosed technical content to create equivalent embodiments. However, any simple modifications, equivalent changes, and modifications made to the above embodiments based on the technical essence of the present invention without departing from the scope of the present invention shall still fall within the protection scope of the present invention.

SEQUENCE LISTING
<110> Bocheng (Beijing) Technology Co., Ltd.
<120> A method and system for cancer screening
<130> PE01730
<140> 202111164343.8
<141> 2021-09-30
<160> 1
<170> PatentIn version 3.5
<210> 1
<211> 41
<212> DNA
<213> Artificial sequence
<220>
<223> artificially synthesized
<400> 1
gcaggcagta cctcggcgtg acgcggtgac gcagccgcag g 41

Claims

1. A system for cancer screening, comprising: a sequencing module, configured to sequence a sample to obtain a first sequence set of the sample; an alignment module, configured to align the first sequence set with the human reference genome to obtain a second sequence set of the sample; an identification and calculation module, configured to identify copy number variations in the second sequence set of the sample and to calculate a first feature value and a second feature value; an analysis module, configured to analyze the second sequence set of the sample to obtain a third feature value of the sample; and an interpretation module, configured to determine whether the sample is a cancer sample based on its second and third feature values; wherein the sample is cfDNA extracted from the subject's plasma and then subjected to methylation treatment; the first feature value is copy number variation (CNA), the second feature value is tumor fraction, and the third feature value is methylation level; the cancer is lung cancer, and the target region is SEQ ID NO:1; and the third feature value is the methylation level of the target region, and the second feature value is the tumor score of the target region.

2. The system according to claim 1, wherein the first sequence set is the raw sequence set, and the second sequence set is the sequence set obtained after processing and alignment.

3. The system according to claim 1, wherein the methylation-treated sample is a sample treated with bisulfite.

4. The system according to any one of claims 1-3, wherein, in the interpretation module, whether a sample is a cancer sample is determined based on a threshold I for the second feature value and a threshold II for the third feature value; threshold I is determined from the tumor scores of known cancer samples and healthy samples, and threshold II is determined from the methylation levels of known positive and negative subjects.

5. The system according to claim 4, wherein, if the second feature value of a sample exceeds threshold I, the sample is a cancer sample, and if the second feature value of a sample is below threshold I, the sample is a healthy sample.

6. The system according to claim 4, wherein, if the third feature value of a sample exceeds threshold II, the sample is a cancer sample, and if the third feature value of a sample is below threshold II, the sample is a healthy sample.

7. Use of a reagent for detecting a biomarker in the preparation of a kit for detecting lung cancer, wherein the biomarker is a methylated biomarker and the sequence of the biomarker is shown in SEQ ID NO:1.