A biomarker for diagnosing hepatocellular carcinoma and use thereof
By integrating information from multiple databases to screen for specific biomarkers, the HCC Score diagnostic model was constructed, which solved the problems of insufficient sensitivity and specificity in the early diagnosis of hepatocellular carcinoma and achieved efficient early identification of hepatocellular carcinoma.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SUZHOU INST OF NANO TECH & NANO BIONICS CHINESE ACADEMY OF SCI
- Filing Date
- 2025-09-19
- Publication Date
- 2026-06-26
AI Technical Summary
Existing technologies lack sufficient sensitivity and specificity in the early diagnosis of hepatocellular carcinoma, resulting in poor clinical efficacy. Furthermore, liver cancer cells are prone to developing resistance to chemotherapy and targeted drugs, leading to poor patient prognosis.
By integrating transcription profile information from the TCGA, GEO, and CCLE databases, the nucleic acids encoding α-1-microglobulin, vitronectin, apolipoprotein A1, fibrinogen γ chain, and fibrinogen α chain were selected as biomarkers, the HCC Score diagnostic model was constructed, and early hepatocellular carcinoma was diagnosed by detecting the mRNA expression levels in CD147-positive extracellular vesicles.
It improved the sensitivity and specificity of early diagnosis of hepatocellular carcinoma, with a sensitivity of 85.71%, a specificity of 87.10%, and an overall diagnostic accuracy of 86.47%. It effectively reduced misdiagnosis of patients with cirrhosis and the model showed high stability.
Figure CN120829975B_ABST
Description
Technical Field
[0001] This invention belongs to the field of medical testing technology and relates to a biomarker for diagnosing hepatocellular carcinoma and its application.
Background Technology
[0002] Hepatocellular carcinoma (HCC) accounts for 75%-85% of primary liver cancers. It is an aggressive tumor with a relatively insidious onset. At the time of initial diagnosis, less than 30% of liver cancer patients are suitable for radical treatment, and most patients are diagnosed at an intermediate or advanced stage. Treatment for intermediate and advanced HCC mainly involves comprehensive therapies such as radiotherapy, chemotherapy, and targeted drugs, but the clinical efficacy still needs improvement. Currently, the 5-year survival rate for HCC patients in China is only 14.1%, partly because liver cancer cells easily develop multidrug resistance to chemotherapy and targeted drugs, ultimately leading to a poor prognosis. Therefore, early detection of HCC is crucial for improving patient survival rates. Current clinical diagnostic methods mainly include imaging and serum biomarker testing, but their diagnostic effectiveness is unsatisfactory, especially in the early stages.
[0003] Therefore, exploring sensitive and specific early diagnostic strategies for hepatocellular carcinoma (HCC) has become a key research focus. In recent years, extracellular vesicles, as nanoscale bilayered membrane vesicles secreted by cells, have gradually become a promising diagnostic tool in liquid biopsy due to their high stability, rich content of biomarkers such as proteins, lipids, and nucleic acids, and their ability to reflect the pathological state of the source cells. Studies have found that the expression profile of extracellular vesicles in the body fluids of HCC patients differs significantly from that of healthy individuals. In particular, the combination of extracellular vesicle membrane proteins and nucleic acids has shown good performance in distinguishing early-stage HCC patients from high-risk groups (such as patients with chronic hepatitis B or cirrhosis). Therefore, a high-throughput detection platform based on multi-omics biomarkers of extracellular vesicles holds promise for achieving non-invasive, sensitive, and early diagnosis of HCC, promoting the application of precision medicine in liver cancer prevention and control.
[0004] CN112280857A discloses a biomarker for the diagnosis of hepatocellular carcinoma, which is NDUFB3 protein or its mRNA. By detecting the expression level of NDUFB3 protein or its mRNA in clinical hepatocellular carcinoma tissue samples, compared with adjacent normal liver tissue, the expression of NDUFB3 protein or its mRNA in hepatocellular carcinoma tissue samples is significantly lower. Therefore, NDUFB3 protein or its mRNA can be used as a basis for early clinical diagnosis of hepatocellular carcinoma.
[0005] In summary, developing novel biomarkers associated with early-stage hepatocellular carcinoma (HCC), together with corresponding detection methods, and expanding the available diagnostic techniques are of great significance for detecting early-stage HCC.
Summary of the Invention
[0006] To address the shortcomings of existing technologies and practical needs, this invention provides a biomarker for diagnosing hepatocellular carcinoma (HCC) and its application. By integrating HCC tissue transcription profile information from the TCGA database, the GEO database, and the CCLE database, biomarkers related to HCC are identified through screening. An HCC Score diagnostic model is constructed using the detection results of these biomarkers, demonstrating high diagnostic sensitivity and specificity for HCC. This model shows significant potential for early liver cancer screening and has important practical significance for the early diagnosis of HCC.
[0007] To achieve this objective, the present invention adopts the following technical solution:
[0008] In a first aspect, the present invention provides a biomarker for diagnosing hepatocellular carcinoma, the biomarker comprising a combination of the nucleic acids encoding α-1-microglobulin, vitronectin, apolipoprotein A1, fibrinogen γ chain, fibrinogen α chain, and albumin; wherein the encoding nucleic acid comprises mRNA or cDNA.
[0009] This invention utilizes large-scale data integration for feature gene screening to identify hepatocellular carcinoma (HCC) biomarkers, offering greater reliability and generalization ability compared to single-database transcriptome sequencing data. The invention purifies CD147-positive extracellular vesicles from plasma and performs PCR detection on the mRNA of the HCC biomarkers they carry. Using the detection results of these biomarkers as a feature, an HCC Score diagnostic model is constructed, exhibiting high diagnostic sensitivity and specificity for HCC. This model shows significant potential for early liver cancer screening and holds important practical significance for the early diagnosis of HCC.
[0010] In a second aspect, the present invention provides the use of the biomarker for diagnosing hepatocellular carcinoma described in the first aspect, and / or of reagents for detecting it, in the preparation of hepatocellular carcinoma diagnostic products.
[0011] In a third aspect, the present invention provides a kit for the diagnosis of hepatocellular carcinoma, the kit comprising reagents for detecting the presence or expression level of the biomarkers for the diagnosis of hepatocellular carcinoma described in the first aspect.
[0012] Preferably, the reagent includes primers and / or probes for detecting the biomarkers for diagnosing hepatocellular carcinoma as described in the first aspect.
[0013] Preferably, the kit further includes the carrier and carrier purification reagent for the biomarker for diagnosing hepatocellular carcinoma as described in the first aspect.
[0014] Preferably, the carrier comprises CD147-positive extracellular vesicles.
[0015] Preferably, the carrier purification reagent comprises silica spheres or magnetic beads modified with antibodies or aptamers targeting CD147.
[0016] In a fourth aspect, the present invention provides a method for constructing a diagnostic model for hepatocellular carcinoma, the method comprising the following steps:
[0017] (1) Integration of samples from different databases: Transcriptome sequencing data from hepatocellular carcinoma tissues and hepatocellular carcinoma cell lines are integrated into a dataset covering both tissues and cell lines;
[0018] (2) Screening of biomarkers for hepatocellular carcinoma: Based on the data set described in step (1), expression analysis is performed to obtain genes that are highly expressed in hepatocellular carcinoma tissues and hepatocellular carcinoma cell lines. Among the obtained highly expressed genes, genes that are lowly expressed in immune cells are screened to obtain the biomarkers described in the first aspect.
[0019] (3) Sample detection: CD147 positive extracellular vesicles were isolated from the subject's plasma using purification reagents, and RNA was extracted after lysing the extracellular vesicles and reverse transcribed into cDNA. The mRNA expression level of the biomarkers mentioned in the first aspect was quantitatively detected by PCR.
[0020] (4) Model construction: Binary logistic regression analysis was performed on the mRNA expression level data of hepatocellular carcinoma biomarkers in clinical samples with early hepatocellular carcinoma and cirrhosis samples to obtain a hepatocellular carcinoma diagnostic model. The model output variable, the subject's hepatocellular carcinoma score (HCC Score), was used to determine whether the subject was positive for hepatocellular carcinoma. The standard for judgment was: when the HCC Score value was greater than -0.6, the subject was judged to be positive for hepatocellular carcinoma.
[0021] In a fifth aspect, the present invention provides a hepatocellular carcinoma diagnostic model, which is constructed by the method for constructing a hepatocellular carcinoma diagnostic model described in the fourth aspect.
[0022] Preferably, the input variable of the hepatocellular carcinoma diagnostic model is the mRNA or cDNA expression level of the biomarker for diagnosing hepatocellular carcinoma as described in the first aspect, and the output variable of the hepatocellular carcinoma diagnostic model is the subject's hepatocellular carcinoma score (HCC Score).
[0023] Preferably, the formula for calculating the subject's hepatocellular carcinoma score is as shown in equation (1):
[0024] HCC Score = (0.255×AMBP) + (0.517×ALB) + (0.819×VTN) - (0.19×APOA1) - (0.435×FGG) + (0.684×FGA) - 4.534 Equation (1).
[0025] Here, HCC Score is the subject's hepatocellular carcinoma score, AMBP is the expression level of the nucleic acid encoding α-1-microglobulin, ALB is the expression level of the nucleic acid encoding albumin, VTN is the expression level of the nucleic acid encoding vitronectin, APOA1 is the expression level of the nucleic acid encoding apolipoprotein A1, FGG is the expression level of the nucleic acid encoding fibrinogen γ chain, and FGA is the expression level of the nucleic acid encoding fibrinogen α chain; the encoding nucleic acid includes mRNA or cDNA.
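Equation (1) can be sketched in a few lines of Python. The coefficients, intercept, and -0.6 cutoff below are copied from the text; the function names and the example expression values are hypothetical. The text does not state the scale on which the expression levels enter the equation, so the inputs are assumed to be in whatever units the model was fitted on.

```python
# Coefficients and intercept copied from Equation (1).
COEFFICIENTS = {
    "AMBP": 0.255,
    "ALB": 0.517,
    "VTN": 0.819,
    "APOA1": -0.19,
    "FGG": -0.435,
    "FGA": 0.684,
}
INTERCEPT = -4.534
CUTOFF = -0.6  # a subject is judged positive when HCC Score > -0.6


def hcc_score(expression):
    """Return the HCC Score for a dict mapping gene symbol to expression level."""
    missing = set(COEFFICIENTS) - set(expression)
    if missing:
        raise ValueError(f"missing expression values: {sorted(missing)}")
    return sum(COEFFICIENTS[g] * expression[g] for g in COEFFICIENTS) + INTERCEPT


def is_hcc_positive(expression, cutoff=CUTOFF):
    """Apply the decision rule stated in the text (strictly greater than the cutoff)."""
    return hcc_score(expression) > cutoff


# Hypothetical expression values, for illustration only (not patent data).
example = {"AMBP": 2.0, "ALB": 3.0, "VTN": 2.0, "APOA1": 1.0, "FGG": 1.0, "FGA": 2.0}
print(round(hcc_score(example), 3), is_hcc_positive(example))
```

Keeping the coefficients in a single mapping makes it easy to check them term by term against the published equation.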
[0026] In this invention, the clinical test results of the model showed an AUC value of 0.9309, indicating high discriminative ability. The model's sensitivity was 85.71%, its specificity was 87.10%, and its overall diagnostic accuracy was 86.47%, so it effectively identifies individuals with early-stage liver cancer while limiting misdiagnosis of patients with cirrhosis. When machine learning methods combined with a cross-validation strategy were used to evaluate the model, performance remained stable, with an AUC value of 0.9287, a sensitivity of 88.17%, a specificity of 84.42%, and a diagnostic accuracy of 86.47%, consistent with the performance of the logistic regression model.
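For readers unfamiliar with the AUC, the following minimal sketch computes it as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (the rank-based definition). The scores are hypothetical and are not the patent's data; the function name is also an assumption.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical HCC Scores for illustration only.
hcc_scores = [1.2, 0.4, -0.3, 2.1]
cirrhosis_scores = [-1.5, -0.8, 0.1, -2.0]
auc = roc_auc(hcc_scores, cirrhosis_scores)
print(auc)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a cohort of a few hundred subjects.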
[0027] In a sixth aspect, the present invention provides a hepatocellular carcinoma diagnostic device, the hepatocellular carcinoma diagnostic device comprising a detection unit and an evaluation unit.
[0028] The detection unit is used to perform the following:
[0029] CD147-positive extracellular vesicles were isolated from the subject's plasma using purification reagents. The extracellular vesicles were lysed to extract RNA, which was then reverse transcribed into cDNA. The mRNA or cDNA expression level of the biomarkers described in the first aspect was quantitatively detected using PCR.
[0030] The evaluation unit is used to perform the following:
[0031] The mRNA or cDNA expression levels of the biomarkers detected by the detection unit are input into the hepatocellular carcinoma diagnostic model described in the fifth aspect, and whether the subject is positive for hepatocellular carcinoma is determined from the hepatocellular carcinoma score (HCC Score). The formula for calculating the HCC Score is shown in equation (1).
[0032] HCC Score = (0.255×AMBP) + (0.517×ALB) + (0.819×VTN) - (0.19×APOA1) - (0.435×FGG) + (0.684×FGA) - 4.534 Equation (1);
[0033] Here, HCC Score is the subject's hepatocellular carcinoma score, AMBP is the expression level of the nucleic acid encoding α-1-microglobulin, ALB is the expression level of the nucleic acid encoding albumin, VTN is the expression level of the nucleic acid encoding vitronectin, APOA1 is the expression level of the nucleic acid encoding apolipoprotein A1, FGG is the expression level of the nucleic acid encoding fibrinogen γ chain, and FGA is the expression level of the nucleic acid encoding fibrinogen α chain; the encoding nucleic acid includes mRNA or cDNA; the criterion for judgment is: when the HCC Score value is greater than -0.6, the subject is judged to be hepatocellular carcinoma positive.
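The text does not say how the HCC Score relates to a probability. If one assumes that the score is the linear predictor (log-odds) of the binary logistic regression, which is a common convention but is not confirmed by the source, the -0.6 cutoff can be translated into an equivalent probability threshold of roughly 0.35. The function name below is hypothetical.

```python
import math


def score_to_probability(score):
    """Logistic transform, assuming the HCC Score is a log-odds value (an assumption)."""
    return 1.0 / (1.0 + math.exp(-score))


# Probability threshold equivalent to the stated cutoff of -0.6 under that assumption.
cutoff_probability = score_to_probability(-0.6)
print(round(cutoff_probability, 3))
```

Because the logistic transform is monotonic, thresholding the score at -0.6 and thresholding the probability at this value give the same classification.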
[0034] Compared with the prior art, the present invention has the following beneficial effects:
[0035] (1) This invention uses large-scale data integration to screen characteristic genes and mine biomarkers for hepatocellular carcinoma. It is more reliable and has stronger generalization ability than single-database transcriptome sequencing data.
[0036] (2) In large-sample validation, this invention further confirmed the effectiveness of six hepatocyte metabolism-related genes as biomarkers for early hepatocellular carcinoma diagnosis in CD147-positive extracellular vesicles, and established a high-performance prediction model based on logistic regression and machine learning algorithms, providing a methodological reference for the clinical diagnosis of hepatocellular carcinoma.
Description of the Attached Figures
[0037] Figure 1 A flowchart illustrating the process of establishing a diagnostic model;
[0038] Figure 2 A heatmap of gene expression levels in 376 hepatocellular carcinoma tissues from the TCGA database, with the top 100 expressed genes sorted from ALB to LBP.
[0039] Figure 3 Venn diagrams of candidate genes from four datasets in the GEO database;
[0040] Figure 4 A diagram showing the crossover comparison of candidate genes from seven hepatocellular carcinoma cell lines in the CCLE database;
[0041] Figure 5 Results of clinical sample testing. Figure 5A is a heatmap of the expression of the six genes in 93 cirrhosis samples and 77 early-stage hepatocellular carcinoma samples. Figure 5B is a box plot comparing the cirrhosis and hepatocellular carcinoma groups based on the output HCC Score. Figure 5C is an ROC curve evaluating the hepatocellular carcinoma diagnostic model based on the output HCC Score, with an AUC value of 0.9309. Figure 5D is a confusion matrix validating the effectiveness of the hepatocellular carcinoma diagnostic model, showing a sensitivity of 85.71%, a specificity of 87.10%, and an overall diagnostic accuracy of 86.47%. Figure 5E shows the model performance evaluation using machine learning combined with cross-validation; the model performed stably, with an AUC value of 0.9287, a sensitivity of 88.17%, a specificity of 84.42%, and a diagnostic accuracy of 86.47%.
Detailed Implementation
[0042] To further illustrate the technical means and effects of this invention, the following description, in conjunction with embodiments and accompanying drawings, provides a further explanation of the invention. It is understood that the specific embodiments described herein are merely illustrative of the invention and not intended to limit it.
[0043] Where specific techniques or conditions are not specified in the examples, they shall be performed in accordance with the techniques or conditions described in the literature in this field, or in accordance with the product instructions. Reagents or instruments whose manufacturers are not specified are all conventional products that can be purchased through legitimate channels.
[0044] This invention utilizes publicly available transcriptome sequencing data of hepatocellular carcinoma tissues and cancer cell lines to screen for hepatocellular carcinoma-specific biomarkers and further construct a diagnostic model. The workflow is shown in Figure 1.
[0045] In one specific embodiment of the present invention, a kit for the diagnosis of hepatocellular carcinoma is provided. The kit includes reagents for detecting the presence or expression level of biomarkers for the diagnosis of hepatocellular carcinoma as described in the present invention, such as AMBP primers and probes (Hs00155697_m1), VTN primers and probes (Hs00169863_m1), APOA1 primers and probes (Hs00163641_m1), FGG primers and probes (Hs00241037_m1), FGA primers and probes (Hs00241027_m1), ALB primers and probes (Hs00609411_m1), and silica beads or magnetic beads modified with antibodies or aptamers targeting CD147.
[0046] In a specific embodiment of the present invention, a method for constructing a diagnostic model for hepatocellular carcinoma is provided, the method comprising the following steps:
[0047] (1) Integration of samples from different databases: Transcriptome sequencing data from hepatocellular carcinoma tissues and hepatocellular carcinoma cell lines are integrated into a dataset covering both tissues and cell lines;
[0048] (2) Screening of biomarkers for hepatocellular carcinoma: Based on the dataset described in step (1), expression analysis was performed to obtain genes that are highly expressed in hepatocellular carcinoma tissues and hepatocellular carcinoma cell lines. Among these highly expressed genes, those with low expression in immune cells were selected, yielding a biomarker combination of the nucleic acids encoding α-1-microglobulin, vitronectin, apolipoprotein A1, fibrinogen γ chain, fibrinogen α chain, and albumin.
[0049] (3) Sample detection: CD147-positive extracellular vesicles were isolated from the subject's plasma using purification reagents. The extracellular vesicles were lysed and RNA was extracted. The RNA was then reverse transcribed into cDNA. The mRNA expression levels of α-1-microglobulin, vitronectin, apolipoprotein A1, fibrinogen γ chain, fibrinogen α chain, and albumin were quantitatively detected by PCR.
[0050] (4) Model construction: Binary logistic regression analysis was performed on the mRNA expression levels of α-1-microglobulin, vitronectin, apolipoprotein A1, fibrinogen γ chain, fibrinogen α chain, and albumin in clinical samples from patients with early hepatocellular carcinoma and cirrhosis and from patients with cirrhosis alone, to obtain a hepatocellular carcinoma diagnostic model. The model output variable, the subject's hepatocellular carcinoma score (HCC Score), was used to determine whether the subject was positive for hepatocellular carcinoma. The criterion for judgment was: when the HCC Score value was greater than -0.6, the subject was judged to be positive for hepatocellular carcinoma.
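Step (4) can be illustrated with a minimal binary logistic regression fitted by batch gradient descent. This is a sketch only: the text does not name the fitting software or algorithm, the data are synthetic with two hypothetical markers, and the learning rate and epoch count are arbitrary choices.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic data (hypothetical): marker 0 is higher in positives, marker 1 is uninformative.
random.seed(0)
X, y = [], []
for _ in range(100):
    X.append([random.gauss(2.0, 1.0), random.gauss(1.0, 1.0)])
    y.append(1)
    X.append([random.gauss(0.0, 1.0), random.gauss(1.0, 1.0)])
    y.append(0)

weights, intercept = fit_logistic(X, y)
# The linear predictor plays the role of the score; 0 is the default logistic cutoff here.
scores = [sum(w * x for w, x in zip(weights, xi)) + intercept for xi in X]
accuracy = sum((s > 0) == bool(t) for s, t in zip(scores, y)) / len(y)
print(weights, intercept, accuracy)
```

In this sketch the fitted weights and intercept correspond to the coefficients and constant term of a score equation, and the cutoff could be moved away from 0 to trade sensitivity against specificity.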
[0051] In one specific embodiment of the present invention, a hepatocellular carcinoma diagnostic model is provided, which is constructed by the method for constructing a hepatocellular carcinoma diagnostic model.
[0052] Preferably, the input variables of the hepatocellular carcinoma diagnostic model are the expression levels of the combination of nucleic acids encoding α-1-microglobulin, vitronectin, apolipoprotein A1, fibrinogen γ chain, fibrinogen α chain, and albumin, and the output variable of the hepatocellular carcinoma diagnostic model is the subject's hepatocellular carcinoma score (HCC Score). The formula for calculating the subject's hepatocellular carcinoma score is shown in equation (1):
[0053] HCC Score = (0.255×AMBP) + (0.517×ALB) + (0.819×VTN) - (0.19×APOA1) - (0.435×FGG) + (0.684×FGA) - 4.534 Equation (1).
[0054] Here, HCC Score is the subject's hepatocellular carcinoma score, AMBP is the expression level of the nucleic acid encoding α-1-microglobulin, ALB is the expression level of the nucleic acid encoding albumin, VTN is the expression level of the nucleic acid encoding vitronectin, APOA1 is the expression level of the nucleic acid encoding apolipoprotein A1, FGG is the expression level of the nucleic acid encoding fibrinogen γ chain, and FGA is the expression level of the nucleic acid encoding fibrinogen α chain; the encoding nucleic acid includes mRNA or cDNA.
[0055] In one specific embodiment of the present invention, a hepatocellular carcinoma diagnostic device is provided, characterized in that the hepatocellular carcinoma diagnostic device includes a detection unit and an evaluation unit.
[0056] The detection unit is used to perform the following:
[0057] CD147-positive extracellular vesicles were isolated from the plasma of the subjects using purification reagents. After the extracellular vesicles were lysed, RNA was extracted and reverse transcribed into cDNA. The mRNA or cDNA expression levels of the nucleic acids encoding the following were quantitatively detected by PCR: α-1-microglobulin, vitronectin, apolipoprotein A1, fibrinogen γ chain, fibrinogen α chain, and albumin.
[0058] The evaluation unit is used to perform the following:
[0059] The expression levels of biomarkers mRNA or cDNA detected by the detection unit are input into the hepatocellular carcinoma diagnostic model. The HCC Score is used to determine whether the patient is positive for hepatocellular carcinoma. The formula for calculating the HCC Score is shown in Equation (1):
[0060] HCC Score = (0.255×AMBP) + (0.517×ALB) + (0.819×VTN) - (0.19×APOA1) - (0.435×FGG) + (0.684×FGA) - 4.534 Equation (1);
[0061] Here, HCC Score is the subject's hepatocellular carcinoma score, AMBP is the expression level of the nucleic acid encoding α-1-microglobulin, ALB is the expression level of the nucleic acid encoding albumin, VTN is the expression level of the nucleic acid encoding vitronectin, APOA1 is the expression level of the nucleic acid encoding apolipoprotein A1, FGG is the expression level of the nucleic acid encoding fibrinogen γ chain, and FGA is the expression level of the nucleic acid encoding fibrinogen α chain; the encoding nucleic acid includes mRNA or cDNA; the criterion for judgment is: when the HCC Score value is greater than -0.6, the subject is judged to be hepatocellular carcinoma positive.
[0062] Example 1
[0063] This embodiment involves screening for hepatocellular carcinoma-specific biomarkers.
[0064] Screening was based on publicly available transcriptome sequencing data from hepatocellular carcinoma (HCC) tissues and HCC cell lines. The tumor transcriptome data from the TCGA database comprise 376 HCC tissues sequenced on the Illumina HiSeq 2000 RNA sequencing platform, with gene expression levels normalized to TPM (transcripts per million). The 100 most highly expressed genes in HCC tissues, ranked from ALB to LBP, were selected, as shown in Figure 2. Four high-quality HCC expression profile datasets (GSE164760, GSE63898, GSE56140, and GSE25097) were selected from the GEO database; each contains HCC tissues and corresponding non-tumor control tissues. The 200 most highly expressed genes in HCC tissues were taken from each dataset, and a pairwise intersection strategy was applied: the high-expression gene sets from the four datasets were intersected pairwise, and candidate genes appearing in two or more datasets were extracted, as shown in Figure 3. This GEO-based screening yielded candidate genes that are highly expressed in multiple independent HCC datasets. These candidates were then intersected with the TCGA screening results to select representative genes with stable high expression in HCC tissues.
To further validate the expression stability and representativeness of the candidate genes at the cellular level, the CCLE (Cancer Cell Line Encyclopedia) database was used to analyze HCC-derived tumor cell lines. Seven representative HCC cell lines were selected from the CCLE database. These cell lines are widely used in basic and translational HCC research, cover different molecular characteristics and biological behaviors, and are both representative and experimentally tractable. The gene expression matrix of the seven cell lines was extracted, genes were ranked by expression within each cell line, and the top 200 highly expressed genes per line were taken as a preliminary set of cellular-level candidates. To improve robustness, the pairwise intersection approach was applied again to the seven top-200 gene sets, and genes highly expressed across multiple cell lines were extracted as the final cellular-level candidates, as shown in Figure 4. The aim of this strategy is to identify metabolism-related genes that are stably expressed across multiple HCC cell models, eliminating the expression bias introduced by the background of any single cell line. After the candidates from TCGA, GEO, and CCLE were integrated, a human leukocyte transcriptome dataset (such as data from the Human Protein Atlas) was introduced to analyze expression of the candidate genes in the major immune cell types (including T cells, B cells, monocytes, NK cells, and granulocytes). Genes with significant expression in leukocytes were strictly eliminated to minimize false positives and signal interference in subsequent plasma extracellular vesicle detection.
In summary, the bioinformatics workflow combined stepwise screening and elimination: screening for metabolism-related genes highly expressed in HCC tissues from the TCGA database; selecting the top 200 highly expressed genes from each of four independent GEO datasets and intersecting them pairwise; selecting the top 200 expressed genes from each of seven representative HCC cell lines and intersecting them; and removing interfering genes by reverse screening against leukocyte expression profiles, retaining only hepatocyte metabolism-related genes. Ultimately, six candidate genes were identified: the nucleic acids encoding α-1-microglobulin, vitronectin, apolipoprotein A1, fibrinogen γ chain, fibrinogen α chain, and albumin (AMBP, VTN, APOA1, FGG, FGA, and ALB). These genes were highly expressed in HCC tissues and cell models and expressed at low levels or not at all in peripheral leukocytes, demonstrating good specificity, stability, and detectability.
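The screening logic of this example (top-N ranking, pairwise intersection, and exclusion of leukocyte-expressed genes) can be sketched with Python sets. Several points are assumptions rather than details from the text: the function names, the leukocyte expression threshold `leukocyte_max`, the way the tissue-level and cell-level candidates are combined (a plain intersection here), and the toy expression values. The metabolism-related filter is omitted because the text does not say how it was applied.

```python
from itertools import combinations


def top_n(expression, n):
    """Gene symbols of the n most highly expressed genes in one dataset."""
    return set(sorted(expression, key=expression.get, reverse=True)[:n])


def in_two_or_more(gene_sets):
    """Union of all pairwise intersections: genes present in at least two sets."""
    shared = set()
    for a, b in combinations(gene_sets, 2):
        shared |= a & b
    return shared


def screen(tcga, geo_datasets, ccle_lines, leukocyte,
           n_tcga=100, n_other=200, leukocyte_max=1.0):
    """Combine tissue-level and cell-level candidates, then drop leukocyte-expressed genes."""
    tissue = top_n(tcga, n_tcga) & in_two_or_more([top_n(d, n_other) for d in geo_datasets])
    cells = in_two_or_more([top_n(c, n_other) for c in ccle_lines])
    return {g for g in tissue & cells if leukocyte.get(g, 0.0) <= leukocyte_max}


# Toy expression tables (hypothetical values), with small N so the effect is visible.
tcga = {"ALB": 900.0, "ACTB": 800.0, "FGA": 500.0, "GAPDH": 100.0}
geo = [
    {"ALB": 50.0, "FGA": 40.0, "ACTB": 30.0, "XIST": 1.0},
    {"ALB": 60.0, "FGA": 55.0, "ACTB": 50.0, "GAPDH": 5.0},
]
ccle = [
    {"ALB": 9.0, "FGA": 8.0, "ACTB": 7.0, "MYC": 1.0},
    {"ACTB": 9.0, "ALB": 8.0, "FGA": 7.0, "MYC": 6.0},
]
leukocyte = {"ACTB": 500.0, "ALB": 0.1, "FGA": 0.0}

candidates = screen(tcga, geo, ccle, leukocyte, n_tcga=3, n_other=3)
print(sorted(candidates))
```

In the toy data, ACTB passes every tissue and cell-line filter but is removed at the last step because it is strongly expressed in leukocytes, which is the purpose of the reverse screening.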
[0065] Example 2
[0066] This embodiment describes the construction and evaluation of a diagnostic model for hepatocellular carcinoma.
[0067] 1. Collection and preparation of clinical data and samples
[0068] A retrospective analysis was conducted on samples collected from patients with hepatocellular carcinoma and cirrhosis and from patients with cirrhosis alone.
[0069] Inclusion criteria for the hepatocellular carcinoma group: all were early-stage cases eligible for radical surgery; the tumor pathological (Edmondson) grade was confirmed by pathological diagnosis; complete basic case information was available; and all cases had cirrhosis. Exclusion criteria: pregnancy, germ cell tumors, malignant tumors in other organs, severe infectious diseases, and severe diseases of other vital organs (such as heart, lung, kidney, etc.).
[0070] Inclusion criteria for the cirrhosis group (control group): A confirmed diagnosis of cirrhosis by clinical, imaging, or liver biopsy, with no evidence of hepatocellular carcinoma or other liver tumors; complete basic case information, clinical examination results, and follow-up data; no liver transplantation or other resective liver treatment within 6 months prior to enrollment; exclusion of the following: pregnant patients, germ cell tumors, malignant tumors of other organs, severe infectious diseases, and severe diseases of other vital organs (such as heart, lung, kidney, etc.).
[0071] 2. Isolation and RNA extraction of CD147-positive extracellular vesicles from subject blood samples
[0072] A 4 mL sample of peripheral venous blood was collected from each subject into an EDTA vacuum anticoagulant blood collection tube. After collection, the tube was inverted five times, stored at 4°C, and processed within 24 hours. To avoid epithelial cell contamination, no sample was taken from the first tube of blood drawn. The blood collection tubes were centrifuged at 300 g for 15 min in a horizontal centrifuge at room temperature to separate the plasma from the lower layer of blood cells. The upper plasma layer was collected and centrifuged at 2500 g for 15 min at 4°C, and the precipitate was discarded. CD147-positive extracellular vesicles were isolated from the subjects' plasma using purification reagents. The extracellular vesicles were lysed using QIAzol lysis buffer to release RNA. RNA was extracted using the Qiagen miRNeasy Micro Kit and reverse transcribed into a cDNA template using a reverse transcription kit.
[0073] 3. Quantification by ddPCR of the six mRNAs screened in Example 1
[0074] Primers and probes (Thermo Fisher) were ordered according to the mRNA coding sequences of the target genes (AMBP, ALB, APOA1, FGG, FGA, and VTN) screened in Example 1. The mRNA expression levels of the target genes in each sample were measured using a QX200 AutoDG Droplet Digital PCR (ddPCR) system. The ddPCR reaction volume was 20 μL, consisting of ddPCR probe premix (dUTP-free), nuclease-free sterile water, cDNA template, and the target gene primers and probes. After PCR amplification of the target gene cDNA template, the cDNA corresponding to the target gene mRNAs was quantified using a droplet reader (QX200 Droplet Reader) in transcripts / μL, and the mRNA expression level of the target genes captured in each sample was calculated.
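The text reports ddPCR results in transcripts / μL without describing the calculation. As general background, droplet digital PCR concentrations are usually derived from the fraction of negative droplets by a Poisson correction; the sketch below illustrates that principle. The droplet volume of 0.85 nL is an assumed nominal value, the function name and droplet counts are hypothetical, and the instrument software may apply additional corrections.

```python
import math

# Assumed nominal droplet volume (0.85 nL) in microliters; check the instrument documentation.
DROPLET_VOLUME_UL = 0.00085


def copies_per_ul(positive_droplets, total_droplets, droplet_volume_ul=DROPLET_VOLUME_UL):
    """Poisson-corrected target concentration in the reaction, in copies per microliter."""
    if total_droplets <= 0 or not 0 <= positive_droplets < total_droplets:
        raise ValueError("need 0 <= positive_droplets < total_droplets")
    negative_fraction = 1.0 - positive_droplets / total_droplets
    # Mean copies per droplet: lambda = -ln(fraction of droplets with no target).
    copies_per_droplet = -math.log(negative_fraction)
    return copies_per_droplet / droplet_volume_ul


# Hypothetical droplet counts for one well.
concentration = copies_per_ul(positive_droplets=2000, total_droplets=15000)
print(round(concentration, 1))
```

The correction matters because a positive droplet may contain more than one template molecule, so simply counting positive droplets would underestimate the concentration at higher loads.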
[0075] The target gene primers and probes for the nucleic acids are commercial products, as shown in Table 1 below:
[0076] Table 1
[0077]
[0078] 4. Construct and validate an early diagnostic model for hepatocellular carcinoma.
[0079] A gene expression heatmap was generated from the gene expression data. As shown in Figure 5A, the six genes (the nucleic acids encoding α-1-microglobulin, vitronectin, apolipoprotein A1, fibrinogen γ chain, fibrinogen α chain, and albumin) exhibited clear expression differences between the early-stage hepatocellular carcinoma (HCC) and cirrhosis groups, demonstrating good classification potential and consistent expression trends. Furthermore, a binary logistic regression model was constructed using gene expression data from these 170 samples for the diagnostic prediction of early-stage HCC. The input variables are the mRNA expression levels of the nucleic acids encoding α-1-microglobulin, vitronectin, apolipoprotein A1, fibrinogen γ chain, fibrinogen α chain, and albumin. The output variable is the subject's score, HCC Score, which is calculated as: HCC Score = (0.255×AMBP) + (0.517×ALB) + (0.819×VTN) - (0.19×APOA1) - (0.435×FGG) + (0.684×FGA) - 4.534 (Equation 1).
[0080] In Equation 1, HCC Score is the subject's hepatocellular carcinoma score, AMBP is the expression level of the nucleic acid encoding α-1-microglobulin, ALB is the expression level of the nucleic acid encoding albumin, VTN is the expression level of the nucleic acid encoding vitronectin, APOA1 is the expression level of the nucleic acid encoding apolipoprotein A1, FGG is the expression level of the nucleic acid encoding fibrinogen γ chain, and FGA is the expression level of the nucleic acid encoding fibrinogen α chain. As shown in Figure 5B, this model exhibits excellent diagnostic performance. As shown in Figure 5C, the ROC curve plotted from the HCC Score to evaluate the hepatocellular carcinoma diagnostic model has an AUC value of 0.9309. In the confusion matrix validating the effectiveness of the diagnostic model, the model's sensitivity was 85.71%, its specificity was 87.10%, and its overall diagnostic accuracy was 86.47%. As shown in Figure 5D, the model can effectively identify individuals with early-stage liver cancer and reduce misdiagnosis of patients with cirrhosis. To further verify the model's generalization ability and stability, machine learning methods combined with cross-validation were used to evaluate its performance. The model's performance remained stable: as shown in Figure 5E, the AUC value was 0.9287, the sensitivity was 88.17%, the specificity was 84.42%, and the diagnostic accuracy was 86.47%.
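The scoring rule in Equation 1 can be illustrated with a short Python sketch. This is a minimal illustration, not the implementation used in the invention. The function names (hcc_score, is_positive, logistic, confusion_metrics), the example expression values, and the example confusion-matrix counts are all hypothetical, and the unit or normalization of the expression inputs is not specified here. The coefficients and intercept are taken from Equation 1, and the positivity cutoff of -0.6 is the one stated in claim 1. Converting the score to a probability assumes that the score is the linear predictor (logit) of the logistic regression model.

```python
import math

# Coefficients and intercept copied from Equation 1.
COEFFICIENTS = {
    "AMBP": 0.255,
    "ALB": 0.517,
    "VTN": 0.819,
    "APOA1": -0.19,
    "FGG": -0.435,
    "FGA": 0.684,
}
INTERCEPT = -4.534
CUTOFF = -0.6  # claim 1: positive when HCC Score > -0.6


def hcc_score(expression):
    """Return the HCC Score for a dict of expression levels keyed by gene symbol."""
    missing = set(COEFFICIENTS) - set(expression)
    if missing:
        raise ValueError(f"missing expression values for: {sorted(missing)}")
    return INTERCEPT + sum(w * expression[g] for g, w in COEFFICIENTS.items())


def is_positive(score, cutoff=CUTOFF):
    """Apply the decision rule: positive when the score exceeds the cutoff."""
    return score > cutoff


def logistic(score):
    """Assumption: the score is a logit, so this maps it to a probability."""
    return 1.0 / (1.0 + math.exp(-score))


def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical expression values, for illustration only.
example = {"AMBP": 2.0, "ALB": 3.0, "VTN": 4.0, "APOA1": 1.0, "FGG": 1.0, "FGA": 2.0}
score = hcc_score(example)
print(round(score, 3), is_positive(score), round(logistic(score), 3))

# Hypothetical counts: 8 true positives, 2 false negatives,
# 9 true negatives, 1 false positive.
print(confusion_metrics(tp=8, fn=2, tn=9, fp=1))
```

Under the logit assumption, the cutoff of -0.6 corresponds to a predicted probability of roughly 0.35 rather than 0.5, which would favor sensitivity over specificity.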
[0081] In summary, this invention identifies biomarkers for hepatocellular carcinoma and, based on clinical data, binary logistic regression, and machine learning algorithms, constructs a diagnostic model that accurately distinguishes hepatocellular carcinoma from cirrhosis controls.
[0082] The above embodiments illustrate the detailed method of the present invention. However, the present invention is not limited to that detailed method; that is, implementation of the present invention does not depend on it. Those skilled in the art should understand that any improvements to the present invention, equivalent substitutions of the raw materials of the product of the present invention, addition of auxiliary components, selection of specific methods, etc., all fall within the protection scope and disclosure scope of the present invention.
Claims
1. A diagnostic device for early hepatocellular carcinoma in high-risk populations with cirrhosis, characterized in that the hepatocellular carcinoma diagnostic device includes a detection unit and an evaluation unit; the detection unit is used to perform the following actions: isolating CD147-positive extracellular vesicles from the plasma of patients with cirrhosis using purification reagents, wherein the purification reagents include silica spheres or magnetic beads modified with antibodies or aptamers targeting CD147; lysing the extracellular vesicles to extract RNA, and reverse transcribing it into cDNA; and quantitatively detecting the mRNA or cDNA expression level of the biomarker using PCR; wherein the biomarker includes a combination of nucleic acids encoding α-1-microglobulin, vitronectin, apolipoprotein A1, fibrinogen γ chain, fibrinogen α chain, and albumin; wherein the encoding nucleic acid includes mRNA or cDNA; the evaluation unit is used to perform the following: inputting the expression level of biomarker mRNA or cDNA detected by the detection unit into the hepatocellular carcinoma diagnostic model, and determining whether the subject is positive for hepatocellular carcinoma based on the hepatocellular carcinoma score (HCC Score).
The calculation formula for the hepatocellular carcinoma score (HCC Score) is shown in Equation (1): HCC Score = (0.255×AMBP) + (0.517×ALB) + (0.819×VTN) - (0.19×APOA1) - (0.435×FGG) + (0.684×FGA) - 4.534 Equation (1); wherein HCC Score is the subject's hepatocellular carcinoma score, AMBP is the expression level of the nucleic acid encoding α-1-microglobulin, ALB is the expression level of the nucleic acid encoding albumin, VTN is the expression level of the nucleic acid encoding vitronectin, APOA1 is the expression level of the nucleic acid encoding apolipoprotein A1, FGG is the expression level of the nucleic acid encoding fibrinogen γ chain, and FGA is the expression level of the nucleic acid encoding fibrinogen α chain; the encoding nucleic acid includes mRNA or cDNA; the criterion for judgment is: when the HCC Score value is greater than -0.6, the subject is judged as positive for hepatocellular carcinoma.
2. Use of the detection reagent for the biomarker and the purification reagent for CD147-positive extracellular vesicles as described in claim 1 in the preparation of diagnostic products for early hepatocellular carcinoma in high-risk populations with cirrhosis.
3. The use according to claim 2, characterized in that the detection reagent for the biomarker includes primers and/or probes for detecting the biomarker.