Methylation marker combination of esophageal cancer and application, kit, system, model construction method and nucleic acid fragment combination

By combining methylation immunoprecipitation and high-throughput sequencing with a binary classification model, and utilizing combinations of methylation markers from specific chromosomal regions, the invasiveness and insufficient sensitivity of existing esophageal cancer screening methods have been addressed, achieving efficient and sensitive early screening results.

CN122256504A · Pending · Publication Date: 2026-06-23 · JIANGSU MOLE BIOSCI +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
JIANGSU MOLE BIOSCI
Filing Date
2024-12-20
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing esophageal cancer screening methods are highly invasive, and blood DNA testing is not sensitive enough to meet the needs of early screening. Traditional bisulfite sequencing is inefficient and biased, so there is a need to develop more efficient and sensitive ctDNA methylation marker screening technologies.

Method used

A binary classification model for esophageal cancer was constructed using methylation immunoprecipitation combined with liquid-phase hybridization capture and high-throughput sequencing. Esophageal cancer screening was performed using random forest, extra tree, or K-nearest neighbor models. A combination of methylation markers containing specific chromosomal regions was used to classify the cancer based on relative methylation levels.

Benefits of technology

It achieves highly sensitive and specific screening for esophageal cancer, with high patient compliance, enabling early detection of esophageal cancer risk and providing a more efficient screening program.

✦ Generated by Eureka AI based on patent content.

Abstract

The application discloses a methylation marker combination of esophageal cancer and application, a kit, a system, a model construction method and a nucleic acid fragment combination. The methylation marker combination of the esophageal cancer comprises at least 10 regions in 37 chromosomal regions defined by Hg38 coordinates. The methylation marker combination of the esophageal cancer provided by the application has high sensitivity to the esophageal cancer, and can be used for early screening of esophageal cancer risk and screening of the esophageal cancer.

Description

Technical Field

[0001] This invention belongs to the field of molecular biology and gene detection, and specifically relates to a methylation marker combination for esophageal cancer and its application, a kit, a system, a model construction method, and a nucleic acid fragment combination.

Background Technology

[0002] Esophageal cancer is one of the most common malignant tumors in the world. China is a high-incidence area for esophageal cancer. Although the incidence and mortality rates of esophageal cancer in China are declining, it is still a major malignant tumor threatening the health of Chinese residents.

[0003] Esophageal cancer is one of the most common malignant tumors in China, characterized by high malignancy and a heavy disease burden. Data from the National Cancer Center show that in 2022 China had 224,000 new cases of esophageal cancer and 187,500 deaths, accounting for nearly 50% of both global esophageal cancer incidence and mortality. Currently, over 95% of newly diagnosed esophageal cancer cases in China are in the middle or late stages, indicating a very low early detection rate, which severely impacts long-term survival and prognosis. The "Guidelines for the Diagnosis and Treatment of Esophageal Cancer (2022 Edition)" issued by the National Health Commission point out that early screening of high-risk groups for esophageal cancer can help improve the early detection rate and fundamentally reduce the disease burden of esophageal cancer in China.

[0004] Pathological diagnosis by esophageal endoscopic biopsy is the gold standard for esophageal cancer screening. However, because the procedure is highly invasive, compliance with esophageal endoscopy screening among Chinese people is low, and the number of people undergoing esophageal endoscopy is increasing year by year. Large-scale application of esophagoscopy for screening would also result in a huge waste of resources.

[0005] Currently available blood DNA-based auxiliary diagnostic technologies for esophageal cancer generally have low sensitivity, with sensitivity for esophageal cancer generally below 86%, failing to meet clinical needs. According to publicly available data from products approved for marketing by the National Medical Products Administration in 2024, blood MT-1A, Epo, and Septin9 gene methylation detection from Beijing Bocheng achieved a sensitivity of 85.5% and a specificity of 93.6% for esophageal cancer. Clinical practice shows that blood MT-1A, Epo, and Septin9 gene methylation detection is mainly used in patients with esophageal cancer-related symptoms and signs, such as a choking sensation, foreign body sensation when swallowing, retrosternal pain, or significant dysphagia, and esophagography revealing localized thickening of the esophageal mucosa, localized wall rigidity, filling defects, or niche shadows. Therefore, extensive research is still needed to improve sensitivity and specificity.

[0006] In the mining of marker types for early screening of esophageal cancer, circulating tumor DNA (ctDNA) methylation, compared with ctDNA mutation, not only has more modification sites, but also has tissue / cancer type specificity, which can take into account both signal abundance and signal intensity, and has obvious advantages compared with other indicators.

[0007] To obtain information on methylation changes associated with specific cancers, the most traditional, gold-standard screening method for methylation markers is whole-genome bisulfite sequencing (WGBS), which can obtain signals at single-base resolution. There are also reports of using reduced representation bisulfite sequencing (RRBS), which enriches CpG regions using restriction endonucleases. By contrast, methylated DNA immunoprecipitation sequencing cannot obtain methylation signals at single-base resolution; it can only determine the presence of methylation in a region from enrichment peaks. It has therefore been less frequently reported and applied in the screening of cancer-related methylated DNA markers.

[0008] However, the bisulfite treatment process involves denaturation, deamination, and desulfonation. DNA is first denatured into single strands and then exposed to extreme temperature, salt, acid, and alkali conditions. The converted DNA is predominantly single-stranded with some double-stranded portions, and it is fragmented, damaged, and contains uracil. This process typically leads to the loss of 90% of the DNA template, making a significant amount of methylation information undetectable in subsequent processes. Furthermore, incomplete conversion or over-conversion of bases introduces bias, which is amplified by subsequent PCR amplification, resulting in substantial data waste. To reduce WGBS bias, more input DNA is needed, the number of PCR cycles must be reduced, and the efficiency of the PCR polymerase must be optimized. These are the main reasons for the low efficiency of WGBS in screening ctDNA methylation markers.

[0009] Therefore, to carry out early screening for esophageal cancer, there is an urgent need to use liquid biopsy DNA samples with high clinical user compliance, such as blood samples, and to explore ctDNA methylation marker screening technologies that are more efficient and sensitive than bisulfite-based methods, so that new markers usable for early screening of esophageal cancer can be discovered effectively and on a large scale.

Summary of the Invention

[0010] One of the technical problems to be solved by the present invention is to provide an application of a combination of methylation markers for esophageal cancer in the preparation of a product for the purpose of screening for esophageal cancer. The methylation marker combination for esophageal cancer is a combination of multiple methylation markers for esophageal cancer, comprising at least 10 regions from the following chromosomal regions (37 chromosomal regions) defined by Hg38 coordinates: chr20:21709322-21709521, chr2:5692940-5693139, chr6:392591-392790, chr10:16520527-16520727, chr13:36431914-36432113, chr2:5696139-5696338, chr2:153477699-153477898, chr2:206274607-206274808, chr3:194487612-194487813, chr4:20395205-20395404, chr4:16531136-16531335, chr4:20253187-20253386, chr5:41418626-41418825, chr5:50969510-50969710, chr5:159100465-159100664, chr5:115816316-115816515, chr8:72075503-72075702, chr9:98707329-98707528, chr13:107867073-107867272, chr10:23199681-23199882, chr11:65833730-65833930, chr11:65833761-65833962, chr11:43581270-43581471, chr12:4909539-4909740, chr12:24903117-24903316, chr13:107867815-107868014, chr13:91399046-91399245, chr14:85530051-85530250, chr16:51151107-51151306, chr17:49972810-49973009, chr19:57584000-57584201, chr20:43915976-43916177, chr20:21713762-21713963, chr13:112069044-112069243, chr1:110674844-110675042, chr17:74923550-74923750, chr10:17229389-17229588.

[0011] In some embodiments, the methylation marker combination for esophageal cancer comprises at least 20 regions from the following chromosomal regions defined by Hg38 coordinates: chr20:21709322-21709521, chr2:5692940-5693139, chr6:392591-392790, chr10:16520527-16520727, chr13:36431914-36432113, chr2:5696139-5696338, chr2:153477699-153477898, chr2:206274607-206274808, chr3:194487612-194487813, chr4:20395205-20395404, chr4:16531136-16531335, chr4:20253187-20253386, chr5:41418626-41418825, chr5:50969510-50969710, chr5:159100465-159100664, chr5:115816316-115816515, chr8:72075503-72075702, chr9:98707329-98707528, chr13:107867073-107867272, chr10:23199681-23199882, chr11:65833730-65833930, chr11:65833761-65833962, chr11:43581270-43581471, chr12:4909539-4909740, chr12:24903117-24903316, chr13:107867815-107868014, chr13:91399046-91399245, chr14:85530051-85530250, chr16:51151107-51151306, chr17:49972810-49973009, chr19:57584000-57584201, chr20:43915976-43916177, chr20:21713762-21713963, chr13:112069044-112069243, chr1:110674844-110675042, chr17:74923550-74923750, chr10:17229389-17229588.

[0012] In some embodiments, the methylation marker combination for esophageal cancer comprises the following chromosomal regions defined by Hg38 coordinates: chr20:21709322-21709521, chr2:5692940-5693139, chr6:392591-392790, chr10:16520527-16520727, chr13:36431914-36432113, chr2:5696139-5696338, chr2:153477699-153477898, chr2:206274607-206274808, chr3:194487612-194487813, chr4:20395205-20395404, chr4:16531136-16531335, chr4:20253187-20253386, chr5:41418626-41418825, chr5:50969510-50969710, chr5:159100465-159100664, chr5:115816316-115816515, chr8:72075503-72075702, chr9:98707329-98707528, chr13:107867073-107867272, chr10:23199681-23199882.
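As an illustration of how region strings of the form chrN:start-end might be handled in software, the following minimal Python sketch parses such strings and checks whether a candidate combination contains at least a given number of regions from a panel. The names Region, parse_region, PANEL_EXCERPT, and meets_minimum are hypothetical and do not appear in the application. The sketch assumes 1-based inclusive coordinates, which the text does not state, and lists only the first 12 of the 37 regions for brevity.

```python
import re
from typing import Iterable, List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Assumption: 1-based inclusive coordinates (not stated in the source).
        return self.end - self.start + 1


_REGION_RE = re.compile(r"^(chr[0-9XYM]+):(\d+)-(\d+)$")


def parse_region(text: str) -> Region:
    """Parse a string such as 'chr20:21709322-21709521', ignoring stray whitespace."""
    cleaned = re.sub(r"\s+", "", text)
    match = _REGION_RE.match(cleaned)
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start is greater than end: {text!r}")
    return Region(chrom, start, end)


# Excerpt only: the first 12 of the 37 regions listed in the text.
PANEL_EXCERPT: List[str] = [
    "chr20:21709322-21709521", "chr2:5692940-5693139", "chr6:392591-392790",
    "chr10:16520527-16520727", "chr13:36431914-36432113", "chr2:5696139-5696338",
    "chr2:153477699-153477898", "chr2:206274607-206274808",
    "chr3:194487612-194487813", "chr4:20395205-20395404",
    "chr4:16531136-16531335", "chr4:20253187-20253386",
]


def count_panel_members(selected: Iterable[str], panel: Iterable[str]) -> int:
    """Number of distinct selected regions that are also in the panel."""
    panel_set = {parse_region(r) for r in panel}
    return len({parse_region(r) for r in selected} & panel_set)


def meets_minimum(selected: Iterable[str], panel: Iterable[str], minimum: int = 10) -> bool:
    """True if the selection contains at least `minimum` panel regions."""
    return count_panel_members(selected, panel) >= minimum


if __name__ == "__main__":
    region = parse_region("chr20:21709322-21709521")
    print(region, region.length)
    print(meets_minimum(PANEL_EXCERPT[:10], PANEL_EXCERPT))
```

Because parse_region strips whitespace before matching, a string broken by a stray space during text extraction is still read as a single region.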

[0013] In some implementations, the application includes the following steps:

[0014] A1. The sample to be tested is subjected to methylated DNA immunoprecipitation, and the methylation sequencing results of the sample are then obtained by liquid-phase hybridization capture and high-throughput sequencing;

[0015] A2. Input the methylation sequencing results of the sample to be tested obtained in step A1 into the esophageal cancer binary classification model to obtain the esophageal cancer screening results;

[0016] The method for constructing the esophageal cancer binary classification model involves selecting a random forest model, an extra-tree model, or a K-nearest neighbor model for model construction. The model is obtained through training on a training set, testing on a test set, and evaluation on an independent validation set. The data type for training, testing, and validation is the relative methylation matrix of each sample from the esophageal cancer group and the normal healthy and benign lesion groups. The relative methylation matrix consists of the relative methylation level of each methylation marker in the esophageal cancer methylation marker combination. The relative methylation level of each methylation marker in the esophageal cancer methylation marker combination is calculated using Formula I.

[0017]

[0018] y is the relative methylation level of a single methylation marker, x is the number of sequencing reads within that methylation marker, z is the number of CpG sites in that methylation marker, and n is the number of methylation markers in the methylation marker combination for esophageal cancer.
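The two per-marker inputs defined above, x (sequencing reads within a marker) and z (CpG sites in a marker), could be computed along the following lines. This is a minimal sketch under stated assumptions, not the applicant's implementation: a read is counted if it overlaps the marker region by at least one base, coordinates are inclusive, and z is taken as the number of "CG" dinucleotides in the region's reference sequence. The function names are hypothetical. Formula I itself is not reproduced in this text, so the sketch stops at x and z and does not compute y.

```python
from typing import Iterable, Tuple

# (chromosome, start, end) with inclusive coordinates; an assumed convention.
Interval = Tuple[str, int, int]


def count_reads_in_region(reads: Iterable[Interval], region: Interval) -> int:
    """x: number of reads overlapping the marker region by at least one base."""
    chrom, start, end = region
    return sum(
        1
        for r_chrom, r_start, r_end in reads
        if r_chrom == chrom and r_start <= end and r_end >= start
    )


def count_cpg_sites(sequence: str) -> int:
    """z: number of CpG dinucleotides in the marker's reference sequence."""
    return sequence.upper().count("CG")


if __name__ == "__main__":
    marker = ("chr6", 392591, 392790)
    # Toy read intervals, invented for illustration only.
    toy_reads = [
        ("chr6", 392500, 392600),  # overlaps the start of the marker
        ("chr6", 392800, 392900),  # lies entirely after the marker
        ("chr2", 392600, 392700),  # different chromosome
        ("chr6", 392700, 392790),  # lies inside the marker
    ]
    print(count_reads_in_region(toy_reads, marker))
    print(count_cpg_sites("acgtCGcgTT"))
```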

[0019] In some implementations, the sample is a tissue DNA sample or a liquid biopsy DNA sample. Liquid biopsy DNA samples include plasma cfDNA (circulating cell-free DNA) samples, leukocyte gDNA (genomic DNA) samples, or urine utDNA (urine tumor-derived DNA) samples, preferably plasma cfDNA samples.

[0020] In some implementations, the application uses the Lazy Predict package to score the RandomForestClassifier, ExtraTreesClassifier, and KNeighborsClassifier models, considering metrics such as AUC, F1-score, and recall, and selects a model based on these scores.
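The metrics named above can be computed directly from labels, predictions, and scores. The following standard-library Python sketch does not use the Lazy Predict package; it only illustrates what recall, F1-score, and AUC measure for a binary classifier (1 = esophageal cancer, 0 = control). The selection rule in pick_best, which takes the model with the highest value of a single metric, is an assumption; the text does not state how the metrics are weighed against one another. All function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall (sensitivity): tp / (tp + fn)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1: harmonic mean of precision and recall, written as 2tp / (2tp + fp + fn)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def pick_best(results: Dict[str, Dict[str, float]], metric: str = "auc") -> str:
    """Assumed rule: choose the model name with the highest value of one metric."""
    return max(results, key=lambda name: results[name][metric])


if __name__ == "__main__":
    # Toy labels, predictions, and scores, invented for illustration only.
    labels = [1, 1, 1, 0, 0, 0]
    predictions = [1, 1, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.5]
    print(recall(labels, predictions), f1_score(labels, predictions), roc_auc(labels, scores))
```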

[0021] The second technical problem to be solved by the present invention is to provide a kit for screening esophageal cancer; the kit contains a detection reagent of the combination of methylation markers for esophageal cancer as described above.

[0022] The third technical problem to be solved by this invention is to provide a system for screening esophageal cancer; the system includes:

[0023] B1. Reagents and / or instruments used for immunoprecipitation sequencing of methylated DNA;

[0024] B2. An apparatus for establishing a binary classification model for esophageal cancer and determining whether a sample to be tested has esophageal cancer using the binary classification model. The binary classification model for esophageal cancer is constructed using the following method: a random forest model, an extra-tree model, or a K-nearest neighbor model is selected for constructing the binary classification model; the binary classification model for esophageal cancer is obtained through training on a training set, testing on a test set, and evaluation on an independent validation set; the data type for training, testing, and validation is the relative methylation matrix of each sample in the esophageal cancer group and the normal healthy and benign lesion groups; the relative methylation matrix consists of the relative methylation level of each methylation marker in the methylation marker combination for esophageal cancer; the relative methylation level of each methylation marker in the methylation marker combination for esophageal cancer is calculated using Formula I.

[0025]

[0026] y is the relative methylation level of a single methylation marker, x is the number of sequencing reads within that methylation marker, z is the number of CpG sites in that methylation marker, and n is the number of methylation markers in the methylation marker combination for esophageal cancer.

[0027] The methylation marker combination for esophageal cancer comprises at least 10 regions from the following chromosomal regions defined by Hg38 coordinates: chr20:21709322-21709521, chr2:5692940-5693139, chr6:392591-392790, chr10:16520527-16520727, chr13:36431914-36432113, chr2:5696139-5696338, chr2:153477699-153477898, chr2:206274607-206274808, chr3:194487612-194487813, chr4:20395205-20395404, chr4:16531136-16531335, chr4:20253187-20253386, chr5:41418626-41418825, chr5:50969510-50969710, chr5:159100465-159100664, chr5:115816316-115816515, chr8:72075503-72075702, chr9:98707329-98707528, chr13:107867073-107867272, chr10:23199681-23199882, chr11:65833730-65833930, chr11:65833761-65833962, chr11:43581270-43581471, chr12:4909539-4909740, chr12:24903117-24903316, chr13:107867815-107868014, chr13:91399046-91399245, chr14:85530051-85530250, chr16:51151107-51151306, chr17:49972810-49973009, chr19:57584000-57584201, chr20:43915976-43916177, chr20:21713762-21713963, chr13:112069044-112069243, chr1:110674844-110675042, chr17:74923550-74923750, chr10:17229389-17229588.

[0028] In some embodiments, the methylation marker combination for esophageal cancer comprises at least 20 regions from the following chromosomal regions defined by Hg38 coordinates: chr20:21709322-21709521, chr2:5692940-5693139, chr6:392591-392790, chr10:16520527-16520727, chr13:36431914-36432113, chr2:5696139-5696338, chr2:153477699-153477898, chr2:206274607-206274808, chr3:194487612-194487813, chr4:20395205-20395404, chr4:16531136-16531335, chr4:20253187-20253386, chr5:41418626-41418825, chr5:50969510-50969710, chr5:159100465-159100664, chr5:115816316-115816515, chr8:72075503-72075702, chr9:98707329-98707528, chr13:107867073-107867272, chr10:23199681-23199882, chr11:65833730-65833930, chr11:65833761-65833962, chr11:43581270-43581471, chr12:4909539-4909740, chr12:24903117-24903316, chr13:107867815-107868014, chr13:91399046-91399245, chr14:85530051-85530250, chr16:51151107-51151306, chr17:49972810-49973009, chr19:57584000-57584201, chr20:43915976-43916177, chr20:21713762-21713963, chr13:112069044-112069243, chr1:110674844-110675042, chr17:74923550-74923750, chr10:17229389-17229588.

[0029] In some embodiments, the methylation marker combination for esophageal cancer comprises the following chromosomal regions defined by Hg38 coordinates: chr20:21709322-21709521, chr2:5692940-5693139, chr6:392591-392790, chr10:16520527-16520727, chr13:36431914-36432113, chr2:5696139-5696338, chr2:153477699-153477898, chr2:206274607-206274808, chr3:194487612-194487813, chr4:20395205-20395404, chr4:16531136-16531335, chr4:20253187-20253386, chr5:41418626-41418825, chr5:50969510-50969710, chr5:159100465-159100664, chr5:115816316-115816515, chr8:72075503-72075702, chr9:98707329-98707528, chr13:107867073-107867272, chr10:23199681-23199882.

[0030] The fourth technical problem to be solved by this invention is to provide a method for constructing a binary classification model for esophageal cancer. This method selects a random forest model, an extra-tree model, or a K-nearest neighbor model for constructing the binary classification model. The esophageal cancer binary classification model is obtained through training on a training set, testing on a test set, and evaluation on an independent validation set. The data type for training, testing, and validation is the relative methylation matrix of each sample in the esophageal cancer group and the normal healthy and benign lesion groups. The relative methylation matrix consists of the relative methylation level of each methylation marker in the methylation marker combination for esophageal cancer. The relative methylation level of each methylation marker in the esophageal cancer methylation marker combination is calculated using Formula I.

[0031]

[0032] y is the relative methylation level of a single methylation marker, x is the number of sequencing reads within that methylation marker, z is the number of CpG sites in that methylation marker, and n is the number of methylation markers in the methylation marker combination for esophageal cancer.

[0033] The methylation marker combination for esophageal cancer comprises at least 10 regions from the following chromosomal regions defined by Hg38 coordinates: chr20:21709322-21709521, chr2:5692940-5693139, chr6:392591-392790, chr10:16520527-16520727, chr13:36431914-36432113, chr2:5696139-5696338, chr2:153477699-153477898, chr2:206274607-206274808, chr3:194487612-194487813, chr4:20395205-20395404, chr4:16531136-16531335, chr4:20253187-20253386, chr5:41418626-41418825, chr5:50969510-50969710, chr5:159100465-159100664, chr5:115816316-115816515, chr8:72075503-72075702, chr9:98707329-98707528, chr13:107867073-107867272, chr10:23199681-23199882, chr11:65833730-65833930, chr11:65833761-65833962, chr11:43581270-43581471, chr12:4909539-4909740, chr12:24903117-24903316, chr13:107867815-107868014, chr13:91399046-91399245, chr14:85530051-85530250, chr16:51151107-51151306, chr17:49972810-49973009, chr19:57584000-57584201, chr20:43915976-43916177, chr20:21713762-21713963, chr13:112069044-112069243, chr1:110674844-110675042, chr17:74923550-74923750, chr10:17229389-17229588.

[0034] In some embodiments, the methylation marker combination for esophageal cancer comprises at least 20 regions from the following chromosomal regions defined by Hg38 coordinates: chr20:21709322-21709521, chr2:5692940-5693139, chr6:392591-392790, chr10:16520527-16520727, chr13:36431914-36432113, chr2:5696139-5696338, chr2:153477699-153477898, chr2:206274607-206274808, chr3:194487612-194487813, chr4:20395205-20395404, chr4:16531136-16531335, chr4:20253187-20253386, chr5:41418626-41418825, chr5:50969510-50969710, chr5:159100465-159100664, chr5:115816316-115816515, chr8:72075503-72075702, chr9:98707329-98707528, chr13:107867073-107867272, chr10:23199681-23199882, chr11:65833730-65833930, chr11:65833761-65833962, chr11:43581270-43581471, chr12:4909539-4909740, chr12:24903117-24903316, chr13:107867815-107868014, chr13:91399046-91399245, chr14:85530051-85530250, chr16:51151107-51151306, chr17:49972810-49973009, chr19:57584000-57584201, chr20:43915976-43916177, chr20:21713762-21713963, chr13:112069044-112069243, chr1:110674844-110675042, chr17:74923550-74923750, chr10:17229389-17229588.

[0035] In some embodiments, the methylation marker combination for esophageal cancer comprises the following chromosomal regions defined by Hg38 coordinates: chr20:21709322-21709521, chr2:5692940-5693139, chr6:392591-392790, chr10:16520527-16520727, chr13:36431914-36432113, chr2:5696139-5696338, chr2:153477699-153477898, chr2:206274607-206274808, chr3:194487612-194487813, chr4:20395205-20395404, chr4:16531136-16531335, chr4:20253187-20253386, chr5:41418626-41418825, chr5:50969510-50969710, chr5:159100465-159100664, chr5:115816316-115816515, chr8:72075503-72075702, chr9:98707329-98707528, chr13:107867073-107867272, chr10:23199681-23199882.
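The model construction method described above (training on a training set, testing on a test set, and evaluation on a validation set) can be sketched for one of the three named model types, K-nearest neighbor, using only the Python standard library. This is an illustrative sketch, not the applicant's model. The data are synthetic and invented for the example, the relative methylation values are random numbers rather than outputs of Formula I, the validation set is carved from the same pool rather than collected independently, and the 60/20/20 stratified split and k = 3 are assumptions. All names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Tuple

# (relative methylation level per marker, label); 1 = cancer, 0 = control.
Sample = Tuple[Tuple[float, ...], int]


def make_synthetic(n_per_class: int = 20, n_markers: int = 3, seed: int = 1) -> List[Sample]:
    """Synthetic, well-separated toy data; not real methylation measurements."""
    rng = random.Random(seed)
    data: List[Sample] = []
    for label, centre in ((1, 0.8), (0, 0.2)):
        for _ in range(n_per_class):
            features = tuple(centre + rng.uniform(-0.1, 0.1) for _ in range(n_markers))
            data.append((features, label))
    return data


def stratified_split(samples: List[Sample], fractions: Tuple[float, float] = (0.6, 0.2), seed: int = 0):
    """Split each class separately into train, test, and validation subsets."""
    rng = random.Random(seed)
    train: List[Sample] = []
    test: List[Sample] = []
    validation: List[Sample] = []
    for label in sorted({lab for _, lab in samples}):
        group = [s for s in samples if s[1] == label]
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_test = round(len(group) * fractions[1])
        train += group[:n_train]
        test += group[n_train:n_train + n_test]
        validation += group[n_train + n_test:]
    return train, test, validation


def knn_predict(train: List[Sample], features: Tuple[float, ...], k: int = 3) -> int:
    """Majority vote among the k training samples nearest in Euclidean distance."""
    neighbours = sorted(train, key=lambda s: math.dist(s[0], features))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def sensitivity_specificity(train: List[Sample], evaluation: List[Sample], k: int = 3) -> Tuple[float, float]:
    """Sensitivity and specificity of the k-NN classifier on an evaluation set."""
    tp = fn = tn = fp = 0
    for features, label in evaluation:
        predicted = knn_predict(train, features, k)
        if label == 1:
            tp += predicted == 1
            fn += predicted == 0
        else:
            tn += predicted == 0
            fp += predicted == 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


data = make_synthetic()
train_set, test_set, validation_set = stratified_split(data)

if __name__ == "__main__":
    print("test:", sensitivity_specificity(train_set, test_set))
    print("validation:", sensitivity_specificity(train_set, validation_set))
```

Splitting each class separately keeps both cancer and control samples in every subset, so sensitivity and specificity are always defined. The toy data are separable by construction, so the resulting figures say nothing about the performance of the marker combination itself.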

[0036] The fifth technical problem to be solved by the present invention is to provide an electronic device, comprising: a processor and a memory, wherein the processor is connected to the memory;

[0037] The memory is used to store the computer program of the processor;

[0038] The processor is configured to implement the method for constructing the esophageal cancer binary classification model as described above by executing the computer program.

[0039] The sixth technical problem to be solved by the present invention is to provide a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method for constructing the esophageal cancer binary classification model as described above.

[0040] The seventh technical problem to be solved by this invention is to provide a methylation marker combination for esophageal cancer, comprising at least 10 regions from the following chromosomal regions defined by Hg38 coordinates: chr20:21709322-21709521, chr2:5692940-5693139, chr6:392591-392790, chr10:16520527-16520727, chr13:36431914-36432113, chr2:5696139-5696338, chr2:153477699-153477898, chr2:206274607-206274808, chr3:194487612-194487813, chr4:20395205-20395404, chr4:16531136-16531335, chr4:20253187-20253386, chr5:41418626-41418825, chr5:50969510-50969710, chr5:159100465-159100664, chr5:115816316-115816515, chr8:72075503-72075702, chr9:98707329-98707528, chr13:107867073-107867272, chr10:23199681-23199882, chr11:65833730-65833930, chr11:65833761-65833962, chr11:43581270-43581471, chr12:4909539-4909740, chr12:24903117-24903316, chr13:107867815-107868014, chr13:91399046-91399245, chr14:85530051-85530250, chr16:51151107-51151306, chr17:49972810-49973009, chr19:57584000-57584201, chr20:43915976-43916177, chr20:21713762-21713963, chr13:112069044-112069243, chr1:110674844-110675042, chr17:74923550-74923750, chr10:17229389-17229588.

[0041] In some embodiments, the methylation marker combination for esophageal cancer comprises at least 20 regions from the following chromosomal regions defined by Hg38 coordinates: chr20:21709322-21709521, chr2:5692940-5693139, chr6:392591-392790, chr10:16520527-16520727, chr13:36431914-36432113, chr2:5696139-5696338, chr2:153477699-153477898, chr2:206274607-206274808, chr3:194487612-194487813, chr4:20395205-20395404, chr4:16531136-16531335, chr4:20253187-20253386, chr5:41418626-41418825, chr5:50969510-50969710, chr5:159100465-159100664, chr5:115816316-115816515, chr8:72075503-72075702, chr9:98707329-98707528, chr13:107867073-107867272, chr10:23199681-23199882, chr11:65833730-65833930, chr11:65833761-65833962, chr11:43581270-43581471, chr12:4909539-4909740, chr12:24903117-24903316, chr13:107867815-107868014, chr13:91399046-91399245, chr14:85530051-85530250, chr16:51151107-51151306, chr17:49972810-49973009, chr19:57584000-57584201, chr20:43915976-43916177, chr20:21713762-21713963, chr13:112069044-112069243, chr1:110674844-110675042, chr17:74923550-74923750, chr10:17229389-17229588.

[0042] In some embodiments, the methylation marker combination for esophageal cancer comprises the following chromosomal regions defined by Hg38 coordinates: chr20:21709322-21709521, chr2:5692940-5693139, chr6:392591-392790, chr10:16520527-16520727, chr13:36431914-36432113, chr2:5696139-5696338, chr2:153477699-153477898, chr2:206274607-206274808, chr3:194487612-194487813, chr4:20395205-20395404, chr4:16531136-16531335, chr4:20253187-20253386, chr5:41418626-41418825, chr5:50969510-50969710, chr5:159100465-159100664, chr5:115816316-115816515, chr8:72075503-72075702, chr9:98707329-98707528, chr13:107867073-107867272, chr10:23199681-23199882.

[0043] The eighth technical problem to be solved by the present invention is to provide a nucleic acid fragment combination, wherein the nucleic acid fragment combination records nucleic acid sequence information encoding the combination of methylation markers for esophageal cancer as described above in human blood samples or tissue samples.

[0044] The advantages of this invention over the prior art are as follows:

[0045] 1. The combination of methylation markers for esophageal cancer provided by this invention has high sensitivity for esophageal cancer and can be used for early screening of esophageal cancer risk and screening for esophageal cancer.

[0046] 2. This invention can obtain cfDNA methylation sequencing results through methylated DNA immunoprecipitation sequencing technology, and use the esophageal cancer binary classification model for esophageal cancer screening. It has high sensitivity and good specificity, and only requires blood samples, resulting in high patient compliance and thus has higher clinical application value.

[0047] 3. The esophageal cancer binary classification model used in the system, the model construction method, the electronic device, and the computer-readable storage medium provided by the present invention takes as input a matrix of relative methylation levels obtained by Formula I. It can classify normal healthy people (including people with benign lesions) and esophageal cancer patients, and has high sensitivity for esophageal cancer, providing a new technical solution for esophageal cancer screening.

[0048] The concept, specific structure, and technical effects of the present invention are further explained below in conjunction with the accompanying drawings, so that the purpose, features, and effects of the present invention can be fully understood.

Attached Figure Description

[0049] To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0050] Figure 1A is an IGV visualization of some characteristic methylation differential regions, showing the differences in the PAX1 gene between esophageal cancer samples and normal healthy and benign lesion samples. The first eight samples are from the esophageal cancer group; the last eight samples are from the normal healthy and benign lesion groups.

[0051] Figure 1B is an IGV visualization of some characteristic methylation differential regions, showing the differences in the SOX11 gene between esophageal cancer samples and normal healthy and benign lesion samples. The first eight samples are from the esophageal cancer group; the last eight samples are from the normal healthy and benign lesion groups.

[0052] Figure 1C is an IGV visualization of some characteristic methylation differential regions, showing the differences in the IRF4 gene between esophageal cancer samples and normal healthy and benign lesion samples. The first eight samples are from the esophageal cancer group; the last eight samples are from the normal healthy and benign lesion groups.

[0053] Figure 1D is an IGV visualization of some characteristic methylation differential regions, showing the differences in the C1QL3 gene between esophageal cancer samples and normal healthy and benign lesion samples. The first eight samples are from the esophageal cancer group; the last eight samples are from the normal healthy and benign lesion groups.

[0054] Figure 1E is an IGV visualization of some characteristic methylation differential regions, showing the differences in the CCNA1 gene between esophageal cancer samples and normal healthy and benign lesion samples. The first eight samples are from the esophageal cancer group; the last eight samples are from the normal healthy and benign lesion groups.

[0055] Figure 2 is the ROC curve of the training / test set of the esophageal cancer binary classification model (based on 37 differential regions) constructed in Example 3, distinguishing the esophageal cancer group from the normal healthy group and the benign lesion group.

[0056] Figure 3 is the ROC curve of the validation set of the esophageal cancer binary classification model (based on 37 differential regions) constructed in Example 3, distinguishing the esophageal cancer group from the normal healthy group and the benign lesion group.

[0057] Figure 4 is a flowchart of the detection process in Comparative Example 1, which uses qPCR to analyze the specificity and sensitivity of methylated DNA immunoprecipitation and bisulfite conversion.

[0058] Figure 5A shows the qPCR amplification curves for detecting the PAX1 gene in Comparative Example 1 after the same sample was treated by methylated DNA immunoprecipitation and by bisulfite conversion, respectively.

[0059] Figure 5B shows the qPCR amplification curves for detecting the SOX11 gene in Comparative Example 1 after the same sample was treated by methylated DNA immunoprecipitation and by bisulfite conversion, respectively.

[0060] Figure 5C shows the qPCR amplification curves for detecting the IRF4 gene in Comparative Example 1 after the same sample was treated by methylated DNA immunoprecipitation and by bisulfite conversion, respectively.

[0061] Figure 5D shows the qPCR amplification curves for detecting the C1QL3 gene in Comparative Example 1 after the same sample was treated by methylated DNA immunoprecipitation and by bisulfite conversion, respectively.

[0062] Figure 5E shows the qPCR amplification curves for detecting the CCNA1 gene in Comparative Example 1 after the same sample was treated by methylated DNA immunoprecipitation and by bisulfite conversion, respectively.

Detailed Implementation

[0063] To facilitate understanding by those skilled in the art, some terms appearing in this document are explained and clarified.

[0064] In this document, the singular forms “a,” “an,” and “the” include their plural forms unless the context otherwise requires. Thus, for example, “an agent” can be understood to include multiple agent components.

[0065] In this document, unless otherwise stated, the terms “comprising,” “including,” or “containing” mean that the listed values, steps, or ingredients are included, but do not exclude the inclusion of other values, steps, or ingredients.

[0066] In this document, the terms "individual" or "patient" are used interchangeably and refer to a vertebrate, preferably a mammal. A mammal may be a human, a non-human primate, a mouse, a rat, a dog, a cat, a horse, or a cow, but is not limited to these examples.

[0067] According to embodiments of the present invention, the methylation biomarker for esophageal cancer refers to a chromosomal region that can be used to detect or diagnose whether a subject has the disease. It is a nucleic acid fragment of a certain length and is a product rather than simply information. The terms "methylation biomarker," "characteristic methylation differential region," and "target biomarker" have the same meaning, referring to a region whose methylation level indicates that the subject has the disease. It should be understood that it can include all transcript variants of the genes described herein and all their promoters and regulatory elements. Furthermore, it should be understood that the term "methylation biomarker" includes both the sense and antisense strand sequences of the biomarker or gene.

[0068] In some embodiments of the present invention, the target biomarkers also include various variants of the aforementioned genes. Variants include nucleic acid sequences from the same region that have at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity with the genes or regions described herein (i.e., having one or more deletions, insertions, substitutions, reverse sequences, etc.). Therefore, the scope of the present invention should be understood to extend to such variants that achieve the same results, although in reality, the actual nucleic acid sequences between individuals exhibit minor genetic variation. The term "methylation biomarker" as used is broadly interpreted to include both 1) the original biomarker found in a biological sample or genomic DNA (in a specific methylation state) and 2) its processed sequences (e.g., the corresponding region after immunoprecipitation treatment).

[0069] In this article, “normal healthy” samples in the normal healthy and benign lesion group refer to samples of the same type isolated from individuals known to be free of the aforementioned cancer, tumor, polyp, or adenoma.

[0070] The term "adenoma" refers to a benign tumor originating from a gland. While these growths are benign, they can progress to become malignant over time. The "site" of a polyp, adenoma, cancer, etc., refers to the tissue, organ, cell type, anatomical region, or body part in which the polyp, adenoma, cancer, etc., is located in the subject's body.

[0071] In this article, the "benign lesion" samples in the normal healthy and benign lesion groups include samples of chronic esophagitis, Barrett's esophagus, esophageal leukoplakia, esophageal diverticulum, achalasia, reflux esophagitis, and benign esophageal strictures caused by various reasons.

[0072] The term "AUC" is an abbreviation for "area under the curve." Specifically, it refers to the area under the receiver operating characteristic (ROC) curve. An ROC curve is a plot of the true positive rate versus the false positive rate for different possible cutoff points in a diagnostic test. It shows the trade-off between sensitivity and specificity depending on the selected cutoff point (any increase in sensitivity will be accompanied by a decrease in specificity). The area under the ROC curve (AUC) is a measure of the accuracy of a diagnostic test (the larger the area, the better; the optimum is 1; a random test has an ROC curve lying on the diagonal, with an area of 0.5; see: J. P. Egan (1975) Signal Detection Theory and ROC Analysis, Academic Press, New York).
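As an illustration of the AUC definition above, the following minimal Python sketch computes the area under the ROC curve as the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score, with ties counted as one half. This pairwise formulation is a standard equivalent of the area under the ROC curve; the function name and the toy scores are assumptions made for illustration and are not taken from the embodiments.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive sample outscores a negative one.

    labels: 1 for the esophageal cancer group, 0 for the other group.
    scores: model outputs, higher meaning "more likely cancer".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # positive ranked above negative
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Toy scores (illustrative only, not data from the examples).
print(roc_auc([1, 1, 0, 0], [0.9, 0.3, 0.4, 0.1]))
```

A perfectly separating score gives 1, and a score that cannot distinguish the groups gives 0.5, matching the limits stated in the definition.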

[0073] The present invention will be explained below with reference to embodiments. Those skilled in the art will understand that the following embodiments are for illustrative purposes only and should not be considered as limiting the scope of the invention. Where specific techniques or conditions are not specified in the embodiments, they are performed according to the techniques or conditions described in the literature in the field or according to the product instructions. Reagents or instruments whose manufacturers are not specified are all conventional products that can be obtained commercially.

[0074] In the following embodiments, the material sources and pretreatment methods are as follows:

[0075] 1. Sample Collection: Blood and tissue samples were collected from subjects with esophageal cancer, benign esophageal diseases, and normal esophagus from multiple clinical hospitals in different regions of China. Tissue samples were collected simultaneously, ideally including both adjacent lesions and normal tissue. Fresh cancerous tissue was preferred, but paraffin-embedded tissue sections could also be used.

[0076] 2. Sample Pretreatment: Pretreatment of circulating cell-free DNA (cfDNA) in blood samples and DNA in tissue samples can employ techniques commonly used in the field for subsequent library preparation. For example, a 10 mL blood sample from a patient can be centrifuged twice: first at 4°C and 600 × g for 20 min, and then at 16,000 × g for 10 min to obtain plasma. cfDNA can be extracted from 1–2 mL of plasma using the VAHTS Free-Circulating DNA Maxi Kit (Cat. No. N903-03, Vazyme). The concentration of cfDNA can be quantified using Qubit 4.0. The cfDNA yield per sample should not be less than 5 ng. Alternatively, tissue can be scraped from a slide, and genomic DNA can be extracted (using a tissue genomic DNA extraction kit, Tiangen Biotech (Beijing) Co., Ltd.). The DNA concentration can be quantified using Qubit 4.0, and the DNA yield per sample should not be less than 200 ng.

[0077] Example 1: DNA sequencing to obtain methylation sequencing results of the sample to be tested.

[0078] (1) The DNA in the sample to be tested was treated by methylation immunoprecipitation, and then the methylation sequencing results of the sample to be tested were obtained by liquid phase hybridization capture and high-throughput sequencing.

[0079] Blood samples were collected from 112 patients with esophageal cancer, 30 patients with high-grade esophageal intraepithelial neoplasia, and 351 subjects with a normal healthy esophagus or benign lesions, totaling 493 samples. Tissue samples comprised 56 pairs of cancerous and adjacent tissues from esophageal cancer patients, 16 tissue samples of high-grade esophageal intraepithelial neoplasia, and 65 normal healthy or benign tissue samples, totaling 137 samples.

[0080] 1.1 Construction of methylated DNA enrichment library

[0081] DNA libraries were constructed using a library construction kit for the Illumina high-throughput sequencing platform (e.g., VAHTS Universal Pro DNA Library Prep Kit for Illumina, Cat. No. ND608-02, Vazyme). Methylated fragments were enriched with antibodies using a methylated DNA enrichment kit (e.g., MagMeDIP kit, Cat. No. C02010021, Diagenode). Fragment purification and PCR amplification were then performed using a DNA purification kit (e.g., IPure Kit v2, Diagenode) and a library construction kit for the Illumina high-throughput sequencing platform (e.g., VAHTS Universal Pro DNA Library Prep Kit for Illumina, Cat. No. ND608-02, Vazyme). The number of PCR cycles was generally 9-11, not exceeding 15. After amplification, the product was purified using an equal volume of magnetic beads (e.g., VAHTS DNA Clean Beads, Cat. No. N411-03, Vazyme). This yielded a MeDIP methylation-enriched library based on methylation immunoprecipitation (MeDIP) technology.

[0082] The preferred methylation immunoprecipitation method is the 5-methylated cytosine antibody method. Preferably, equal volumes of esophageal cancer sample libraries and normal sample libraries are added to the same immunoprecipitation reaction; alternatively, samples with unknown clinical outcomes can be added to the same immunoprecipitation reaction. The total amount of DNA library input for each immunoprecipitation reaction should be in the range of 10 ng to 1000 ng, allowing for the immunoprecipitation reaction of a single DNA library sample or up to 100 DNA libraries. The preferred DNA library denaturation conditions are 95 degrees Celsius for 10 minutes.

[0083] 1.2 Liquid-phase hybridization capture

[0084] Liquid-phase hybridization capture was performed using a hybridization capture kit (e.g., Hybrid Capture Reagents, Cat. No. REF1005101, Naonda): 500 ng of methylated enriched library was taken, and human Cot DNA and blocking sequences (e.g., Nano Blockers) were added. The mixture was added to the hybridization reaction solution (containing liquid-phase hybridization capture probes) and hybridized for 4-16 hours using the program 95℃ / 30 sec; 65℃ / hold (hot lid at 100℃). Then, streptavidin magnetic beads were added to the hybridization system and incubated for 40 minutes, vortexing every 10 minutes to ensure complete resuspension of the magnetic beads. The bound magnetic beads were washed, and residual solution was discarded at each step. Finally, 20 μL of nuclease-free water was added and gently vortexed to mix.

[0085] The hybridization capture reaction temperature is 65°C, instead of the 63°C required for methylation probes designed based on bisulfite conversion. Preferably, the liquid-phase hybridization capture probes are designed to fully cover the target region without gaps or overlap, and each probe is 120 nt in length; the liquid-phase hybridization capture probes are 5' biotin-modified oligonucleotides. Preferably, seven endogenous human fragments (for assessing gene content in the sample) and two exogenous fragments, pUC19 and λDNA (for assessing method effectiveness), are added as internal standards during the liquid-phase hybridization capture process. Hybridization capture reactions can be single-hybrid or multi-hybrid, and the total amount of MeDIP amplification library input for each hybridization capture reaction should range from 300 ng to 8 μg.

[0086] 1.3 High-throughput sequencing

[0087] The hybridization capture products were amplified by PCR and purified using kits (e.g., VAHTS Universal Pro DNA Library Prep Kit for Illumina, Cat. No. ND608-02; VAHTS DNA Clean Beads, Cat. No. N411-03, Vazyme). The libraries were diluted to 4 nM and pooled according to the required data volume, with the total data volume not exceeding 120 G. After pooling, 5 μL of the library was taken, and 5 μL of 0.2 N NaOH was added. The mixture was mixed by pipetting and incubated for 5 minutes. Immediately after incubation, 990 μL of hybridization buffer (HT1 Buffer, REF: 15058251, Illumina) was added. After vortexing, 105 μL was taken, and 1295 μL of HT1 Buffer was added. After vortexing, the resulting library, at a concentration of 1.5 pM, was ready for loading onto the sequencer.
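The two-step dilution described above can be checked with simple arithmetic. The following Python sketch is only a worked calculation using the volumes stated in the paragraph; the helper name `dilute` is an assumption introduced for illustration.

```python
def dilute(conc, sample_ul, total_ul):
    """Concentration after bringing sample_ul of stock to a final volume of total_ul."""
    return conc * sample_ul / total_ul


stock_pm = 4 * 1000                              # 4 nM pooled library, expressed in pM
# Step 1: 5 uL library + 5 uL 0.2 N NaOH + 990 uL HT1 Buffer = 1000 uL
step1_pm = dilute(stock_pm, 5, 5 + 5 + 990)
# Step 2: 105 uL of the step-1 dilution + 1295 uL HT1 Buffer = 1400 uL
final_pm = dilute(step1_pm, 105, 105 + 1295)

print(step1_pm, final_pm)                        # 20.0 pM, then 1.5 pM
```

The intermediate concentration is 20 pM and the final concentration is 1.5 pM, in agreement with the loading concentration given in the text.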

[0088] The sequencer used was an Illumina NextSeq 550Dx. Reagents used included High Output Reagent Cartridge v2 (REF:15057929, Illumina) (300 cycles), High Output Flow Cell Cartridge v2.5 (REF:20022408, Illumina), and Buffer Cartridge v2 (REF:15057941, Illumina). 1300 μL of the library was added to the sample space of the High Output Reagent Cartridge v2, and each reagent was added sequentially. Sequencing could then begin. Paired-end sequencing was used, and the total sequencing time was approximately 30 hours.

[0089] Fastp (version 0.22.0) was used to perform quality control on the sequencing data, removing low-quality bases. The overall Q20 of the clean data was above 90%, Q30 was above 85%, and the average sequencing depth was around 300x.

[0090] The above steps yield the methylation sequencing results of the sample to be tested.

[0091] Example 2 Screening of methylation markers

[0092] The methylation sequencing results of the obtained samples were aligned to the human reference genome (Hg38), followed by peak detection of methylation-enriched regions and screening for differentially methylated regions in esophageal cancer. The DiffBind tool (version 3.8.4) was used to screen for differential peaks between esophageal cancer and healthy individuals. Preferably, the differentially methylated regions selected from blood and tissue samples were further intersected and then analyzed to obtain methylation markers for esophageal cancer. The specific operational steps were as follows:

[0093] 1. The blood sample data were grouped, with esophageal cancer as one group, denoted as the POS group, and normal healthy individuals and benign lesions as another group, denoted as the NEG group.

[0094] 2. Use the DiffBind toolkit to analyze the grouped data, with the peak length set to 200.

[0095] 3. Using the DESeq2 and edgeR algorithms of the DiffBind tool, find the difference peaks between the blood sample (cfDNA) and the normal blood sample or the human genome (Hg38), respectively, to obtain the methylation difference region of the blood sample. The intersection of the methylation difference regions obtained by the two algorithms (i.e. the difference peak common to both algorithms) is named S1.

[0096] 4. Group the tissue sample data and repeat steps 1-3 to obtain the intersection of the two algorithms for the methylation difference regions of the tissue samples, named S2;

[0097] 5. Use the intersect function in the bedtools program to find the intersection of the results of steps 3 and 4, i.e., S1∩S2=S, to obtain the differentially methylated regions of blood samples with tissue sample support, and name them S;

[0098] 6. Statistically count the CpG sites in the differentially methylated regions of blood samples with tissue sample support from step 5, and obtain the number of CpG sites for each S.

[0099] 7. Annotate the differentially methylated regions of blood samples with tissue sample support obtained in step 5, calculate the relative methylation levels (RML) value of each S, and obtain a list of related genes.
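Steps 5 and 6 above can be sketched in a few lines of Python. This is a simplified stand-in for the overlap reporting of the bedtools intersect function and for CpG counting, not the tool itself. The interval representation (chromosome, start, end, treated as half-open), the function names, and the toy intervals and sequence are all assumptions made for illustration.

```python
def intersect(regions_a, regions_b):
    """Return the overlapping parts of two lists of (chrom, start, end) intervals."""
    overlaps = []
    for chrom_a, start_a, end_a in regions_a:
        for chrom_b, start_b, end_b in regions_b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:                      # keep non-empty overlaps only
                overlaps.append((chrom_a, start, end))
    return sorted(overlaps)


def count_cpg(sequence):
    """Number of CpG dinucleotides (a C immediately followed by a G) in a sequence."""
    return sequence.upper().count("CG")


# Toy inputs (illustrative only): S1 from blood samples, S2 from tissue samples.
s1 = [("chr1", 100, 300), ("chr2", 50, 250)]
s2 = [("chr1", 200, 400), ("chr3", 10, 20)]
s = intersect(s1, s2)                            # S = S1 ∩ S2
print(s, count_cpg("ACGTCGcgAA"))
```

In practice the sequence of each region S would be taken from the Hg38 reference before counting CpG sites; that lookup is omitted here.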

[0100]

[0101] y represents the relative methylation level of a single S, x represents the number of sequencing reads within that S, z represents the number of CpG sites in that S, and n represents the number of S. The denominator is represented by the product of z and the sum of the number of sequencing reads for all S, which helps to normalize the influence of CG.
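The formula itself is not reproduced in this text. Based only on the symbol definitions above, one consistent reading is y = x / (z × Σx), where Σx is the sum of the sequencing reads over all n regions S. The Python sketch below implements that reading; the placement of x in the numerator and the absence of any constant scaling factor are assumptions, and the function name and toy counts are illustrative.

```python
def relative_methylation_levels(reads, cpg_counts):
    """RML per region under the assumed form y_i = x_i / (z_i * sum of all x).

    reads:      sequencing read counts x_i for each of the n regions S.
    cpg_counts: CpG site counts z_i for the same regions, in the same order.
    """
    if len(reads) != len(cpg_counts):
        raise ValueError("reads and cpg_counts must have the same length")
    total_reads = sum(reads)
    if total_reads == 0:
        raise ValueError("no reads in any region")
    if any(z <= 0 for z in cpg_counts):
        raise ValueError("every region needs at least one CpG site")
    return [x / (z * total_reads) for x, z in zip(reads, cpg_counts)]


# Toy counts (illustrative only): three regions with 30, 10 and 60 reads.
rml = relative_methylation_levels([30, 10, 60], [10, 5, 20])
print(rml)
```

Dividing by z normalizes for CpG density, and dividing by the total read count normalizes for sequencing depth, which is the stated purpose of the denominator.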

[0102] Based on the obtained RML values, the test samples are grouped and labeled. Specifically, the esophageal cancer group is labeled with 1, and the normal healthy and benign lesion group is labeled with 0. The samples are randomly divided into a training set and a test set at a ratio of 8:2, and an RML matrix is constructed.
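The labeling and the random 8:2 split can be sketched as follows. This is a minimal illustration using only the Python standard library; the sample identifiers, the fixed random seed, and the rounding rule used to place the cut are assumptions, and the RML vectors are omitted for brevity.

```python
import random


def split_train_test(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)        # seed fixed only for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical sample list: (sample_id, label), 1 = esophageal cancer, 0 = other.
samples = [(f"POS_{i}", 1) for i in range(42)] + [(f"NEG_{i}", 0) for i in range(63)]
train, test = split_train_test(samples)
print(len(train), len(test))
```

With 105 samples this gives 84 training samples and 21 test samples. A purely random split does not guarantee the same class ratio in both sets; whether the split was stratified is not stated in the text.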

[0103] The Feature Selector package is used to rank the importance of n S's, and several S's with significant differences are selected as methylation biomarkers. Specifically, the ranking of the importance of n S's using the Feature Selector package includes using Feature Select to score and rank the importance of the S's, constructing a cumulative importance curve, and selecting several S's with significant differences as methylation biomarkers for esophageal cancer.
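The idea of selecting regions from a cumulative importance curve can be illustrated with the short Python sketch below. It does not call the Feature Selector package; it only shows the underlying logic of ranking by importance and keeping regions until a cumulative share is reached. The threshold value, the function name, and the toy importance scores are assumptions.

```python
def select_by_cumulative_importance(importances, threshold=0.9):
    """Keep the top-ranked features whose normalized importances sum to the threshold."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0.0
    for name, score in ranked:
        selected.append(name)
        running += score / total                 # cumulative normalized importance
        if running >= threshold:
            break
    return selected


# Toy importance scores (illustrative only).
toy_importances = {"S1": 0.5, "S2": 0.3, "S3": 0.15, "S4": 0.05}
print(select_by_cumulative_importance(toy_importances))
```

Here the three highest-ranked regions already account for at least 90% of the total importance, so the fourth is dropped.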

[0104] The methylation sequencing results of the samples obtained in Example 1 were sorted by importance according to the steps described above. The top 37 regions were selected as characteristic methylation difference regions, which serve as methylation markers for esophageal cancer (significant differences in methylation levels exist between normal healthy and benign lesion samples and esophageal cancer samples). Their Hg38 coordinates in the Human Genome Database are listed in Table 1. The IGV visualization results of the PAX1, SOX11, IRF4, C1QL3, and CCNA1 genes in the characteristic methylation difference regions are shown in Figures 1A to 1E.

[0105] Table 1. 37 Characteristic Regions of Methylation Difference

[0106]

[0107]

[0108] Example 3: Construction and Validation of a Binary Classification Model (Based on 37 Characteristic Differential Methylation Regions)

[0109] Based on actual needs, multiple methylation biomarkers of esophageal cancer (i.e., the characteristic methylation difference regions screened in this invention) are selected as biomarkers for constructing a binary classification model. The training set and test set are split, and binary classification models are constructed using random forest model, extra tree classifier, K-nearest neighbor model, etc.

[0110] The construction of a binary classification model includes the following steps:

[0111] Step 1. Using the methylation biomarkers for esophageal cancer screened by this invention (37 methylation biomarkers screened in Example 2 in this embodiment) as candidate biomarkers for constructing a binary classification model, the FeatureSelector package is used to score the importance of the 37 candidate biomarkers, constructing a cumulative importance curve, and selecting the methylation biomarkers with significant differences. In this embodiment, all 37 candidate biomarkers are selected as methylation biomarkers with significant differences for use in the following steps of this embodiment.

[0112] Step 2. Use the Lazy predict package to score the random forest model, extra tree model, and K-nearest neighbor model, and select one model from the scores. When using the Lazy predict package for scoring, the metrics considered include AUC value, F1-Score, recall rate, etc. In this embodiment, Step 2 selects the random forest model for the construction of the binary classification model based on the scores.

[0113] Step 3. Use the multiple significantly different methylation markers selected in Step 1 and the model selected in Step 2 to build a model; perform 10-fold cross-validation, and further screen the methylation markers based on the changes in AUC values. In this embodiment, the initial AUC value reaches 0.78. If the AUC value decreases, the candidate marker is deleted; if the AUC value does not decrease, the candidate marker is retained. Then, the model is validated on the test set to establish a binary classification model. Finally, the binary classification model is evaluated on the independent validation set.
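One possible reading of the AUC-based screening in Step 3 is a stepwise procedure in which candidate markers are added one at a time and a candidate is kept only if the cross-validated AUC does not decrease. The Python sketch below illustrates that reading only. The scoring callback stands in for training the selected model with 10-fold cross-validation, and both it and the toy scores are invented for illustration rather than taken from the embodiment.

```python
def screen_markers(candidates, cv_auc):
    """Add candidates in order; keep each one only if the AUC does not drop.

    cv_auc: callable taking a list of markers and returning a cross-validated AUC.
    """
    kept, best_auc = [], 0.0
    for marker in candidates:
        trial = kept + [marker]
        auc = cv_auc(trial)
        if auc >= best_auc:                      # AUC did not decrease: retain marker
            kept, best_auc = trial, auc
        # otherwise the candidate is deleted and the previous set is kept
    return kept, best_auc


# Toy stand-in for a cross-validated model score (invented numbers).
toy_gain = {"m1": 0.78, "m2": 0.05, "m3": -0.02, "m4": 0.10}


def toy_cv_auc(markers):
    return round(sum(toy_gain[m] for m in markers), 2)


kept, best_auc = screen_markers(["m1", "m2", "m3", "m4"], toy_cv_auc)
print(kept, best_auc)
```

In this toy run the third candidate lowers the score and is discarded, while the others are retained.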

[0114] 3.1 Construction of a binary classification model (based on 37 characteristic methylation differential regions)

[0115] 105 cfDNA samples were randomly selected, including 42 esophageal cancer samples and 63 normal healthy and benign lesion samples, to construct a binary classification model.

[0116] The sensitivity and specificity of the binary classification model are shown in Table 2. In the training / test set, the sensitivity for esophageal cancer was 100%, and the specificity for normal healthy individuals and benign lesions was 96.83%. The ROC curve is shown in Figure 2, with AUC = 0.98.

[0117] Table 2. Sensitivity and specificity of the binary classification model for screening esophageal cancer, normal healthy individuals, and benign lesions (based on 37 characteristic methylation differential regions) on the training / test sets.

[0118]

[0119] 3.2 Validation of the binary classification model (based on 37 characteristic methylation differential regions)

[0120] Unlike the clinical samples used in Example 3.1, 87 additional cfDNA samples were randomly selected for this validation phase, including 16 cases of esophageal cancer and 71 cases of normal healthy individuals and benign lesions, for independent validation of the binary classification model. Sequencing was performed according to Example 1, or it can be performed using methods well known to those skilled in the art.

[0121] The constructed binary classification model was used for analysis, and the results are shown in Table 3 below. The sensitivity for esophageal cancer was 93.75%, and the specificity for normal esophagus (i.e., normal healthy) and benign lesions was 91.55%. The ROC curve is shown in Figure 3, with AUC = 0.95. Table 3 shows the sensitivity and specificity of the independent validation set for the binary classification model (based on 37 characteristic methylation differential regions) used to screen for esophageal cancer, normal healthy individuals, and benign lesions.

[0122]

[0123] Example 4: Construction and Validation of a Binary Classification Model (Based on 20 Characteristic Differential Methylation Regions)

[0124] The model construction steps are the same as in Example 3, except that in step (1), 20 of the 37 characteristic methylation difference regions in Table 1 are randomly selected as methylation markers. The Hg38 coordinates of the 20 characteristic methylation difference regions selected in this example are shown in Table 4.

[0125] Table 4. List of 20 characteristic methylation difference regions

[0126]
No.  chr    Start      End        Gene Name
1    chr20  21709322   21709521   PAX1
2    chr2   5692940    5693139    SOX11
3    chr6   392591     392790     IRF4
4    chr10  16520527   16520727   C1QL3
5    chr13  36431914   36432113   CCNA1
6    chr2   5696139    5696338    SOX11(2)
7    chr2   153477699  153477898  RPRM
8    chr2   206274607  206274808  ZDBF2
9    chr3   194487612  194487813  LINC00884
10   chr4   20395205   20395404   SLIT2-IT1
11   chr4   16531136   16531335   TAPT1-AS1
12   chr4   20253187   20253386   SLIT2
13   chr5   41418626   41418825   PLCXD3
14   chr5   50969510   50969710   LINC02106
15   chr5   159100465  159100664  LINC02202
16   chr5   115816316  115816515  CDO1
17   chr8   72075503   72075702   TRPA1
18   chr9   98707329   98707528   GABBR2
19   chr13  107867073  107867272  FAM155A
20   chr10  23199681   23199882   PTF1A

[0127] 4.1 Construction of a binary classification model (based on 20 characteristic methylation differential regions)

[0128] 105 cfDNA samples were randomly selected, including 42 esophageal cancer samples and 63 normal healthy and benign lesion samples, to construct a binary classification model.

[0129] The sensitivity and specificity of the binary classification model are shown in Table 5. In the training / test set, the sensitivity for esophageal cancer is 100%, and the specificity for normal healthy individuals and benign lesions is 95.23%.

[0130] Table 5. Sensitivity and specificity of the binary classification model for esophageal cancer screening (based on 20 characteristic methylation differential regions) on the training / test sets.

[0131]

[0132]

[0133] 4.2 Validation of the binary classification model (based on 20 characteristic methylation differential regions)

[0134] Unlike the clinical samples used in Example 4.1, 87 additional cfDNA samples were randomly selected for this validation phase, including 16 cases of esophageal cancer and 71 cases of normal healthy individuals and benign lesions, for independent validation of the binary classification model. Sequencing was performed according to Example 1, or it can be performed using methods well known to those skilled in the art.

[0135] The analysis was performed using the constructed binary classification model, and the results are shown in Table 6 below. The sensitivity for esophageal cancer was 93.75%, and the specificity for normal esophageal and benign diseases was 92.96%.

[0136] Table 6. Sensitivity and specificity of the independent validation set for the binary classification model for screening esophageal cancer, normal and benign diseases (based on 20 characteristic methylation differential regions).

[0137]

[0138] Example 5: Construction and Validation of a Binary Classification Model (Based on 10 Characteristic Differential Methylation Regions)

[0139] In this embodiment, the 10 characteristic methylation difference regions selected are the bottom 10 of the 37 characteristic methylation difference regions selected, ranked by importance, in order to evaluate the performance of the selected characteristic methylation difference regions.

[0140] The Hg38 coordinates of the 10 characteristic methylation difference regions selected in this embodiment are shown in Table 7.

[0141] Table 7. List of 10 characteristic methylation difference regions

[0142]

[0143]

[0144] The construction and validation process of the binary classification model based on the 10 characteristic methylation difference regions described above is as described in Example 4. The sensitivity and specificity of the model are shown in Table 8. In the training / test set, the sensitivity for esophageal cancer is 95.24%, and the specificity for normal healthy individuals and benign diseases is 96.83%.

[0145] Table 8. Sensitivity and specificity of the binary classification model for screening esophageal cancer, normal healthy individuals, and benign lesions (based on 10 characteristic methylation differential regions) on the training / test sets.

[0146]

[0147] The constructed binary classification model was validated on 87 samples, and the results are shown in Table 9. The sensitivity for esophageal cancer was 87.5%, and the specificity for normal esophageal and benign diseases was 94.37%.

[0148] Table 9. Sensitivity and specificity of the independent validation set for the binary classification model for screening esophageal cancer, normal healthy individuals, and benign lesions (based on 10 characteristic differentially methylated regions).

[0149]

[0150] Comparative Example 1: Comparison of the MeDIP method and the bisulfite treatment (BS) method

[0151] To further demonstrate the technical advantages of the MeDIP method compared to the BS method, qPCR was used to detect and analyze the specificity and sensitivity of the two methods. In this comparative example, 24 plasma samples clinically diagnosed with esophageal cancer (case group) and 48 plasma samples with normal esophagoscopy findings (control group) were selected. The case group comprised 6 samples of stage 0-I esophageal cancer, 6 samples of stage II, 7 samples of stage III, and 5 samples of stage IV. The qPCR detection procedure for the methylated genes is shown in Figure 4.

[0152] Specifically as follows:

[0153] 1) Methylated DNA was treated using the MeDIP method, with the same procedure as in 1.1 of Example 1.

[0154] 2) BS method for treating methylated DNA

[0155] Bisulfite treatment of each cfDNA sample was performed in a separate reaction, with approximately 10–100 ng of cfDNA per sample. Bisulfite conversion of DNA can be carried out using commercial kits or self-prepared reagents. When a commercial kit is used, the procedure should follow the kit instructions, for example using the ZYMO RESEARCH DNA conversion kit (EZ DNA Methylation Kit, D5002) for bisulfite modification of DNA. The elution volume was 20 μL–50 μL.

[0156] 3) qPCR detection

[0157] Five target genes were selected: PAX1, SOX11, IRF4, C1QL3, and CCNA1. The Taqman MGB probe primer pairs for the MeDIP and BS methods are shown in Table 10 (the fluorescent reporter groups of the probes can be FAM, VIC, JOE, TET, etc., and the quencher groups can be TAMRA, BHQ, etc.). The designed sequences of the probes and primer pairs, as well as the relevant CpG sites, were derived from a portion of the characteristic methylation differential regions screened in Example 2. PCR amplification was performed using enriched or bisulfite-converted DNA as templates, with a final concentration of 10 μM for each primer. The PCR reaction system consisted of 2–15 μL of template DNA, 2.5 μL of premixed solution containing the primers, 17.5 μL of PCR reaction reagents (e.g., 2× Rapid Taq Master Mix, Vazyme), and water to a final volume of 35 μL. The PCR reaction conditions were as follows: 95℃ for 5 minutes, followed by 48 cycles of 95℃ for 15 seconds and 60℃ for 40 seconds.

[0158] Table 10. Taqman MGB probe primer pair sequences

[0159]

[0160]

[0161] Results analysis:

[0162] Differences in the enrichment effects of the MeDIP method and the BS method on target genes

[0163] The test data were analyzed, and the threshold cycle (Ct value) for samples with no amplification curve or with an undetermined result was set to 48. A reference Ct value was set through ROC curve analysis. If the target gene amplification Ct value of the tested sample is equal to or lower than the set reference Ct value, the sample is judged as positive; otherwise, it is judged as negative. Figures 5A to 5E show the qPCR amplification curves of DNA from the same esophageal cancer patient, processed using both the MeDIP and BS methods, for the five target genes. As can be seen from the figures, for the PAX1 gene, the amplification Ct value was 31.61 for MeDIP and 36.73 for BS; for the SOX11 gene, it was 32.4 for MeDIP and 36.93 for BS; for the IRF4 gene, it was 31.7 for MeDIP and 36.35 for BS; for the C1QL3 gene, it was 32.7 for MeDIP and 35.57 for BS; and for the CCNA1 gene, it was 32.51 for MeDIP and 38.2 for BS. Therefore, the MeDIP method resulted in lower amplification Ct values for the target genes compared to the BS method, indicating that MeDIP has a better enrichment effect on the target genes in the sample.
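The calling rule and the Ct comparison above can be expressed as a short Python sketch. The Ct values are those reported in the paragraph; the reference Ct of 35.0 used in the example call is a hypothetical placeholder, because the reference values obtained by ROC analysis are not given in this text, and the function and variable names are illustrative.

```python
NO_AMPLIFICATION_CT = 48.0   # Ct assigned when there is no amplification curve


def call_positive(ct, reference_ct):
    """Positive if the Ct is at or below the reference Ct; None means no amplification."""
    value = NO_AMPLIFICATION_CT if ct is None else ct
    return value <= reference_ct


# Ct values reported for the same patient sample: gene -> (MeDIP, BS)
ct_values = {
    "PAX1": (31.61, 36.73),
    "SOX11": (32.4, 36.93),
    "IRF4": (31.7, 36.35),
    "C1QL3": (32.7, 35.57),
    "CCNA1": (32.51, 38.2),
}

# Difference BS - MeDIP; a positive value means MeDIP amplified earlier.
delta_ct = {gene: round(bs - medip, 2) for gene, (medip, bs) in ct_values.items()}
print(delta_ct)

# Hypothetical reference Ct, for illustration of the calling rule only.
print(call_positive(31.61, 35.0), call_positive(None, 35.0))
```

For all five genes the difference is positive, which corresponds to the lower Ct values observed with MeDIP.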

[0164] Differences in validation performance of target genes between the MeDIP method and the BS method

[0165] In this comparative study, 24 plasma samples from patients clinically diagnosed with esophageal cancer (case group) and 48 plasma control samples from patients with normal esophagoscopy findings (control group) were selected. Table 11 shows the performance differences between the MeDIP method and the BS method in qPCR validation of target genes. Specifically, for different target genes, the MeDIP method showed higher sensitivity for esophageal cancer samples than the BS method, higher specificity for non-esophageal cancer samples than the BS method, and higher accuracy. This indicates that the MeDIP method performs better than the BS method in validating target genes.

[0166] In conclusion, the MeDIP method outperforms the BS method in overall performance.

[0167] Table 11. Performance differences between the MeDIP method and the BS method in qPCR gene validation.

[0168]

[0169] Sensitivity is the proportion of samples in the case group that tested positive by the detection method. Specificity is the proportion of samples in the control group that tested negative by the detection method. Accuracy is the proportion of all samples that were correctly detected, i.e., the number of case-group samples that tested positive plus the number of control-group samples that tested negative, divided by the total number of samples.
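These three definitions can be written as a small Python function. As a worked example, the sketch uses the 87-sample independent validation set of Example 3 (16 esophageal cancer and 71 normal healthy or benign samples). The counts of 15 true positives and 65 true negatives are back-calculated from the reported 93.75% and 91.55% and are therefore an inference, not figures stated in the text.

```python
def screening_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # positives among the case group
    specificity = tn / (tn + fp)                 # negatives among the control group
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # correct calls among all samples
    return sensitivity, specificity, accuracy


# Inferred counts: 15 of 16 cancer samples positive, 65 of 71 non-cancer samples negative.
sens, spec, acc = screening_metrics(tp=15, fn=1, tn=65, fp=6)
print(round(sens * 100, 2), round(spec * 100, 2), round(acc * 100, 2))
```

The sensitivity and specificity reproduce the reported percentages; the accuracy value follows from the same inferred counts.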

[0170] This document uses specific examples to illustrate the principles and implementation methods of the present invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of the present invention. At the same time, for those skilled in the art, based on the ideas of the present invention, there will be changes in the specific implementation methods and application scope. Modifications and improvements to the present invention are possible without exceeding the concept and scope defined by the claims. Therefore, the content of the embodiments in this specification should not be construed as a limitation of the present invention.

Claims

1. The application of a combination of methylation markers for esophageal cancer in the preparation of a product, characterized in that, The product is intended for screening esophageal cancer; the methylation marker combination for esophageal cancer is a combination of multiple methylation markers for esophageal cancer, comprising at least 10 regions from the following chromosomal regions defined by Hg38 coordinates: chr20:21709322-21709521, chr2:5692940-5693139, chr6:392591-392790, chr10:16520527-16520727, chr13:36431914-36432113, chr2:5696139-5696338, chr2:153477699-153477898, chr2: 206274607-206274808, chr3:194487612-194487813, chr4:20395205-20395404, chr4:16531136-16531335, chr4:20253187-20253386, chr5: 41418626-41418825, chr5:50969510-50969710, chr5:159100465-159100664, chr5:115816316-115816515, chr8:72075503-72075702, chr9:9 8707329-98707528、chr13:107867073-107867272、chr10:23199681-23199882、chr11:65833730-65833930、chr11:65833761-65833962、chr1 1:43581270-43581471, chr12:4909539-4909740, chr12:24903117-24903316, chr13:107867815-107868014, chr13:91399046-91399245, chr1 4:85530051-85530250, chr16:51151107-51151306, chr17:49972810-49973009, chr19:57584000-57584201, chr20:43915976-43916177, chr 20:21713762-21713963, chr13:112069044-112069243, chr1:11067484 4-110675042, chr17:74923550-74923750, chr10:17229389-17229588.

2. The application as described in claim 1, characterized in that the methylation marker combination for esophageal cancer comprises at least 20 regions from the following chromosomal regions defined by Hg38 coordinates: chr20:21709322-21709521, chr2:5692940-5693139, chr6:392591-392790, chr10:16520527-16520727, chr13:36431914-36432113, chr2:5696139-5696338, chr2:153477699-153477898, chr2:206274607-206274808, chr3:194487612-194487813, chr4:20395205-20395404, chr4:16531136-16531335, chr4:20253187-20253386, chr5:41418626-41418825, chr5:50969510-50969710, chr5:159100465-159100664, chr5:115816316-115816515, chr8:72075503-72075702, chr9:98707329-98707528, chr13:107867073-107867272, chr10:23199681-23199882, chr11:65833730-65833930, chr11:65833761-65833962, chr11:43581270-43581471, chr12:4909539-4909740, chr12:24903117-24903316, chr13:107867815-107868014, chr13:91399046-91399245, chr14:85530051-85530250, chr16:51151107-51151306, chr17:49972810-49973009, chr19:57584000-57584201, chr20:43915976-43916177, chr20:21713762-21713963, chr13:112069044-112069243, chr1:110674844-110675042, chr17:74923550-74923750, chr10:17229389-17229588.

3. The application as described in claim 1, characterized in that the methylation marker combination for esophageal cancer comprises the following chromosomal regions defined by Hg38 coordinates: chr20:21709322-21709521, chr2:5692940-5693139, chr6:392591-392790, chr10:16520527-16520727, chr13:36431914-36432113, chr2:5696139-5696338, chr2:153477699-153477898, chr2:206274607-206274808, chr3:194487612-194487813, chr4:20395205-20395404, chr4:16531136-16531335, chr4:20253187-20253386, chr5:41418626-41418825, chr5:50969510-50969710, chr5:159100465-159100664, chr5:115816316-115816515, chr8:72075503-72075702, chr9:98707329-98707528, chr13:107867073-107867272, chr10:23199681-23199882.

4. The application as described in claim 1, characterized in that it includes the following steps: A1. subjecting the sample to be tested to methylation immunoprecipitation, and then obtaining the methylation sequencing results of the sample to be tested by liquid-phase hybridization capture and high-throughput sequencing; A2. inputting the methylation sequencing results of the sample to be tested obtained in step A1 into the esophageal cancer binary classification model to obtain the esophageal cancer screening results; the method for constructing the esophageal cancer binary classification model involves selecting a random forest model, an extra-tree model, or a K-nearest neighbor model for model construction; the model is obtained through training on a training set, testing on a test set, and evaluation on an independent validation set; the data type for training, testing, and validation is the relative methylation matrix of each sample from the esophageal cancer group and the normal healthy and benign lesion groups; the relative methylation matrix consists of the relative methylation level of each methylation marker in the esophageal cancer methylation marker combination; the relative methylation level of each methylation marker in the esophageal cancer methylation marker combination is calculated using Formula I, where y is the relative methylation level of a single methylation marker, x is the number of sequencing reads within that methylation marker, z is the number of CpG sites in that methylation marker, and n is the number of methylation markers in the methylation marker combination for esophageal cancer.
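To make step A2 concrete, the following minimal Python sketch classifies one sample from a row of relative methylation levels using a K-nearest neighbor vote, one of the three model families named in the claim. It is an illustration under stated assumptions, not the claimed implementation: the function name `knn_predict`, the choice of k = 3, the Euclidean distance, and all numeric values are hypothetical. The rows are assumed to have already been computed with Formula I, which is not reproduced here, and only three markers are shown for brevity.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Majority vote among the k training rows closest to `query`.

    Each row holds one relative methylation level per marker.
    Labels: 1 = esophageal cancer, 0 = healthy or benign control.
    """
    nearest = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Synthetic rows, invented purely for illustration (3 markers per sample).
train_rows = [
    [0.50, 0.30, 0.20],
    [0.55, 0.25, 0.20],
    [0.60, 0.25, 0.15],
    [0.20, 0.40, 0.40],
    [0.25, 0.35, 0.40],
    [0.15, 0.45, 0.40],
]
train_labels = [1, 1, 1, 0, 0, 0]

prediction = knn_predict(train_rows, train_labels, [0.52, 0.28, 0.20])
```

With an odd k and two classes the vote cannot tie. A random forest or extra-tree model would replace `knn_predict` while consuming the same matrix rows.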

5. The application as described in claim 4, characterized in that the random forest, extra-tree, and K-nearest neighbor models are scored using the Lazy Predict package, with metrics including AUC, F1-score, and recall, and a model is then selected based on the scores.
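The three metrics named in claim 5 can be computed directly from labels and model outputs. The sketch below is a plain-Python illustration and does not call the Lazy Predict package. The candidate names `model_a` to `model_c`, their scores, the 0.5 decision threshold, and the rule of ranking by AUC, then F1-score, then recall are all assumptions made for the example; the claim does not state how the scores are combined.

```python
def recall(y_true, y_pred):
    """Fraction of true positives among all actual positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic test-set labels and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
candidate_scores = {
    "model_a": [0.9, 0.8, 0.6, 0.4, 0.3, 0.1],
    "model_b": [0.9, 0.7, 0.4, 0.6, 0.2, 0.1],
    "model_c": [0.6, 0.6, 0.4, 0.6, 0.4, 0.2],
}


def evaluate(scores, threshold=0.5):
    """Return (AUC, F1, recall) for one candidate model."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return auc(y_true, scores), f1_score(y_true, y_pred), recall(y_true, y_pred)


results = {name: evaluate(s) for name, s in candidate_scores.items()}
# Assumed selection rule: compare the (AUC, F1, recall) tuples lexicographically.
best_model = max(results, key=results.get)
```

These toy numbers say nothing about which of the three claimed model families performs best on real data.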

6. A reagent kit, characterized in that the kit is intended for screening esophageal cancer; the kit contains a detection reagent for a combination of methylation markers for esophageal cancer as described in any one of claims 1-5.

7. A system, characterized in that the system is used for screening esophageal cancer; the system includes: B1. reagents and/or instruments used for immunoprecipitation sequencing of methylated DNA; B2. an apparatus for establishing a binary classification model for esophageal cancer and determining whether a sample to be tested has esophageal cancer using the binary classification model. The binary classification model for esophageal cancer is constructed using the following method: a random forest model, an extra-tree model, or a K-nearest neighbor model is selected for constructing the binary classification model; the binary classification model for esophageal cancer is obtained through training on a training set, testing on a test set, and evaluation on an independent validation set; the data type for training, testing, and validation is the relative methylation matrix of each sample in the esophageal cancer group and the normal healthy and benign lesion groups; the relative methylation matrix consists of the relative methylation level of each methylation marker in the methylation marker combination for esophageal cancer; the relative methylation level of each methylation marker in the methylation marker combination for esophageal cancer is calculated using Formula I, where y is the relative methylation level of a single methylation marker, x is the number of sequencing reads within that methylation marker, z is the number of CpG sites in that methylation marker, and n is the number of methylation markers in the methylation marker combination for esophageal cancer.
The methylation marker combination for esophageal cancer comprises at least 10 regions from the following chromosomal regions defined by Hg38 coordinates: chr20:21709322-21709521, chr2:5692940-5693139, chr6:392591-392790, chr10:16520527-16520727, chr13:36431914-36432113, chr2:5696139-5696338, chr2:153477699-153477898, chr2:206274607-206274808, chr3:194487612-194487813, chr4:20395205-20395404, chr4:16531136-16531335, chr4:20253187-20253386, chr5:41418626-41418825, chr5:50969510-50969710, chr5:159100465-159100664, chr5:115816316-115816515, chr8:72075503-72075702, chr9:98707329-98707528, chr13:107867073-107867272, chr10:23199681-23199882, chr11:65833730-65833930, chr11:65833761-65833962, chr11:43581270-43581471, chr12:4909539-4909740, chr12:24903117-24903316, chr13:107867815-107868014, chr13:91399046-91399245, chr14:85530051-85530250, chr16:51151107-51151306, chr17:49972810-49973009, chr19:57584000-57584201, chr20:43915976-43916177, chr20:21713762-21713963, chr13:112069044-112069243, chr1:110674844-110675042, chr17:74923550-74923750, chr10:17229389-17229588.

8. A method for constructing a binary classification model for esophageal cancer, characterized in that a random forest model, an extra-tree model, or a K-nearest neighbor model is selected for constructing the binary classification model; the esophageal cancer binary classification model is obtained through training on a training set, testing on a test set, and evaluation on an independent validation set; the data type for training, testing, and validation is the relative methylation matrix of each sample from the esophageal cancer group, the normal healthy group, and the benign lesion group; the relative methylation matrix consists of the relative methylation level of each methylation marker in the esophageal cancer methylation marker combination; the relative methylation level of each methylation marker in the esophageal cancer methylation marker combination is calculated using Formula I, where y is the relative methylation level of a single methylation marker, x is the number of sequencing reads within that methylation marker, z is the number of CpG sites in that methylation marker, and n is the number of methylation markers in the methylation marker combination for esophageal cancer.
The methylation marker combination for esophageal cancer comprises at least 10 regions from the following chromosomal regions defined by Hg38 coordinates: chr20:21709322-21709521, chr2:5692940-5693139, chr6:392591-392790, chr10:16520527-16520727, chr13:36431914-36432113, chr2:5696139-5696338, chr2:153477699-153477898, chr2:206274607-206274808, chr3:194487612-194487813, chr4:20395205-20395404, chr4:16531136-16531335, chr4:20253187-20253386, chr5:41418626-41418825, chr5:50969510-50969710, chr5:159100465-159100664, chr5:115816316-115816515, chr8:72075503-72075702, chr9:98707329-98707528, chr13:107867073-107867272, chr10:23199681-23199882, chr11:65833730-65833930, chr11:65833761-65833962, chr11:43581270-43581471, chr12:4909539-4909740, chr12:24903117-24903316, chr13:107867815-107868014, chr13:91399046-91399245, chr14:85530051-85530250, chr16:51151107-51151306, chr17:49972810-49973009, chr19:57584000-57584201, chr20:43915976-43916177, chr20:21713762-21713963, chr13:112069044-112069243, chr1:110674844-110675042, chr17:74923550-74923750, chr10:17229389-17229588.
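The training, test, and validation partition named in claim 8 can be sketched as a stratified split over sample labels. This is a hedged illustration only: the function name `stratified_split`, the 60/20/20 proportions, and the fixed seed are assumptions, as the claim gives no proportions. The claim's "independent validation set" may in practice be a separately collected cohort rather than a held-out partition as shown here.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split sample indices into (train, test, validation) lists.

    Each class is shuffled and divided separately so that every
    partition keeps roughly the same class balance.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test, validation = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_test = round(len(indices) * fractions[1])
        train += indices[:n_train]
        test += indices[n_train:n_train + n_test]
        validation += indices[n_train + n_test:]  # remainder
    return train, test, validation


# Synthetic labels: 10 cancer samples (1) and 10 control samples (0).
labels = [1] * 10 + [0] * 10
train_idx, test_idx, val_idx = stratified_split(labels)
```

The returned indices select rows of the relative methylation matrix, so the same split can be reused for each candidate model.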

9. The method for constructing the esophageal cancer binary classification model as described in claim 8, characterized in that the methylation marker combination for esophageal cancer comprises at least 20 regions from the following chromosomal regions defined by Hg38 coordinates: chr20:21709322-21709521, chr2:5692940-5693139, chr6:392591-392790, chr10:16520527-16520727, chr13:36431914-36432113, chr2:5696139-5696338, chr2:153477699-153477898, chr2:206274607-206274808, chr3:194487612-194487813, chr4:20395205-20395404, chr4:16531136-16531335, chr4:20253187-20253386, chr5:41418626-41418825, chr5:50969510-50969710, chr5:159100465-159100664, chr5:115816316-115816515, chr8:72075503-72075702, chr9:98707329-98707528, chr13:107867073-107867272, chr10:23199681-23199882, chr11:65833730-65833930, chr11:65833761-65833962, chr11:43581270-43581471, chr12:4909539-4909740, chr12:24903117-24903316, chr13:107867815-107868014, chr13:91399046-91399245, chr14:85530051-85530250, chr16:51151107-51151306, chr17:49972810-49973009, chr19:57584000-57584201, chr20:43915976-43916177, chr20:21713762-21713963, chr13:112069044-112069243, chr1:110674844-110675042, chr17:74923550-74923750, chr10:17229389-17229588.

10. The method for constructing the esophageal cancer binary classification model as described in claim 8, characterized in that the methylation marker combination for esophageal cancer comprises nucleic acid fragments from the following chromosomal regions defined by Hg38 coordinates: chr20:21709322-21709521, chr2:5692940-5693139, chr6:392591-392790, chr10:16520527-16520727, chr13:36431914-36432113, chr2:5696139-5696338, chr2:153477699-153477898, chr2:206274607-206274808, chr3:194487612-194487813, chr4:20395205-20395404, chr4:16531136-16531335, chr4:20253187-20253386, chr5:41418626-41418825, chr5:50969510-50969710, chr5:159100465-159100664, chr5:115816316-115816515, chr8:72075503-72075702, chr9:98707329-98707528, chr13:107867073-107867272, chr10:23199681-23199882.

11. An electronic device, characterized in that it includes: a processor and a memory, wherein the processor is connected to the memory; the memory is used to store a computer program for the processor; the processor is configured to implement the method of any one of claims 8-10 by executing the computer program.

12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor, implements the method as described in any one of claims 8-10.

13. A combination of nucleic acid fragments, characterized in that the nucleic acid fragment combination records nucleic acid sequence information encoding a combination of methylation markers for human esophageal cancer; the combination of methylation markers for esophageal cancer includes at least 10 regions from the following chromosomal regions defined by Hg38 coordinates: chr20:21709322-21709521, chr2:5692940-5693139, chr6:392591-392790, chr10:16520527-16520727, chr13:36431914-36432113, chr2:5696139-5696338, chr2:153477699-153477898, chr2:206274607-206274808, chr3:194487612-194487813, chr4:20395205-20395404, chr4:16531136-16531335, chr4:20253187-20253386, chr5:41418626-41418825, chr5:50969510-50969710, chr5:159100465-159100664, chr5:115816316-115816515, chr8:72075503-72075702, chr9:98707329-98707528, chr13:107867073-107867272, chr10:23199681-23199882, chr11:65833730-65833930, chr11:65833761-65833962, chr11:43581270-43581471, chr12:4909539-4909740, chr12:24903117-24903316, chr13:107867815-107868014, chr13:91399046-91399245, chr14:85530051-85530250, chr16:51151107-51151306, chr17:49972810-49973009, chr19:57584000-57584201, chr20:43915976-43916177, chr20:21713762-21713963, chr13:112069044-112069243, chr1:110674844-110675042, chr17:74923550-74923750, chr10:17229389-17229588.