Individual risk score method and system based on multiple gene characteristics

By screening for significant SNP sites and HLA typing interaction features, a multigene risk scoring model was established, which solved the accuracy and reliability problems of diabetes risk assessment in existing technologies and achieved more accurate individual risk assessment.

CN122224480A · Pending · Publication Date: 2026-06-16 · BEIJING CHILDRENS HOSPITAL AFFILIATED TO CAPITAL MEDICAL UNIV +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
BEIJING CHILDRENS HOSPITAL AFFILIATED TO CAPITAL MEDICAL UNIV
Filing Date
2026-01-20
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing technologies have low accuracy and reliability in diabetes risk assessment. Existing PRS models do not adequately screen SNP sites and correct for population structure during model construction, and lack consideration for complex abnormal interactions.

Method used

Based on genome-wide association analysis results and diabetes literature datasets, significantly associated single nucleotide polymorphism sites were screened. Combined with HLA typing and its interaction characteristics, an individual polygenic risk scoring model was established, and the individual risk score was calculated by weighting multidimensional genetic effects.

Benefits of technology

It improves the accuracy and reliability of diabetes risk assessment, enabling more accurate quantification of an individual's risk of developing the disease and providing a basis for early screening and prevention of diabetes.

✦ Generated by Eureka AI based on patent content.

Abstract

The application provides an individual risk score method and system based on a multi-gene feature, relates to the technical field of risk assessment, and comprises the following steps: screening significant single nucleotide polymorphism sites under a risk classification of interest, obtaining capture sequencing data of a genetic feature group of a target population and a healthy control group, performing association analysis by using the capture sequencing data and a preliminary candidate site set, performing interaction analysis after genetic feature correction and HLA typing, establishing an individual multi-gene risk score model, performing multi-dimensional risk quantitative assessment of a target individual by using the individual multi-gene risk score model, and outputting a risk score. The application solves the technical problem that the accuracy and reliability of risk assessment in the prior art are low. The application achieves more accurate scoring of individual risk and improves the technical effects of risk assessment accuracy and reliability.

Description

Technical Field

[0001] This invention relates to the field of risk assessment technology, specifically to an individual risk scoring method and system based on multi-gene characteristics.

Background Technology

[0002] With the advancement of genomics research, single nucleotide polymorphisms (SNPs) have received widespread attention in susceptibility studies for complex diseases such as diabetes. Existing studies have identified multiple SNP loci associated with diabetes, but the explanatory power of a single locus in terms of disease risk is limited, with low independent predictive power, explaining only a small portion of the genetic risk. Current PRS models do not adequately screen SNP loci and correct for population structure during model construction, lack consideration for complex abnormal interactions, and suffer from low accuracy and computational precision in risk assessment.

[0003] Existing technologies suffer from low accuracy and reliability in risk assessment.

Summary of the Invention

[0004] The purpose of this application is to provide an individual risk scoring method and system based on multi-gene characteristics, in order to solve the technical problem that the accuracy and reliability of risk assessment in the prior art are low.

[0005] In view of the above problems, this application provides a method and system for individual risk scoring based on multi-gene characteristics.

[0006] The first aspect of this application provides an individual risk scoring method based on polygenic characteristics. The method includes: screening significantly associated single nucleotide polymorphism (SNP) sites under risk categories of interest based on genome-wide association analysis (GWAS) results and a diabetes literature dataset, and establishing a preliminary candidate site set; acquiring captured sequencing data of the genetic characteristics of the target population and a healthy control population, performing association analysis using the captured sequencing data and the preliminary candidate site set to generate a raw genetic data matrix of the target population; performing genetic characteristic correction on the raw genetic data matrix, performing HLA typing, and then performing interaction analysis, establishing an individual polygenic risk scoring model using the interaction analysis results; and using the individual polygenic risk scoring model to perform multidimensional risk quantification assessment of the target individual and output a risk score.

[0007] Optionally, after performing quality control and sample filtering using Plink software, the effect value of each SNP locus in the target population is calculated based on logistic regression; significant SNP loci are screened according to the calculation results and weighted coefficients are output; and an individual polygenic risk scoring model is established based on the significant SNP loci and weighted coefficients.

[0008] Optionally, HLA genotyping is performed on the sample data after quality control and sample filtering to obtain the HLA-DQA1, HLA-DQB1, and HLA-DRB1 genotyping results of the target population samples; the interaction between HLA-DQA1 and HLA-DQB1 is calculated to obtain the interaction pair scores and interaction pair weights; the genotyping score and genotyping weight of HLA-DRB1 are calculated; and an individual polygenic risk scoring model is established based on the significant SNP sites, weighting coefficients, interaction pair scores, interaction pair weights, genotyping scores, and genotyping weights.

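[0009] Optionally, the individual polygenic risk scoring model is as follows:

$$\mathrm{PRS} = \sum_{i=1}^{N} w_i G_i + \sum_{j=1}^{M} u_j I_j + \sum_{k=1}^{K} v_k T_k$$

where $\mathrm{PRS}$ is the risk score, $N$ is the number of SNPs, $i$ is the SNP index, $G_i$ is the genotype of the $i$-th SNP, $w_i$ is its weighting coefficient, $M$ is the number of interaction pairs between HLA-DQA1 and HLA-DQB1, $I_j$ and $u_j$ are the score and weight of the $j$-th interaction pair, $K$ is the number of HLA-DRB1 genotypes, and $T_k$ and $v_k$ are the score and weight of the $k$-th genotype. (The formula image did not survive extraction; the additive form and the symbol names are reconstructed from the definitions in this paragraph and in [0008].)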

[0010] Optionally, the HLA-DQA1 and HLA-DQB1 interaction pairs include 58 valid combinations, and the HLA-DRB1 typing includes 20 valid typings.

[0011] Optionally, the weighting coefficient is constructed by calculating the ratio of SNP loci in the genetic characteristic group of the target population to that in the healthy control group, and taking the natural logarithm to obtain the statistical effect value.

[0012] Optionally, the screened significant SNP sites include rs117267808, rs73069541, rs2313430, rs2624847, rs2581787, rs6752053, rs76155021, rs9515905, rs505922 and rs117767867.

[0013] A second aspect of this application provides an individual risk scoring system based on multigene characteristics. The system includes: a data screening module for screening significantly associated single nucleotide polymorphism (SNP) sites under risk categories of interest based on genome-wide association analysis (GWAS) results and a diabetes literature dataset, establishing a preliminary candidate site set; an association analysis module for acquiring captured sequencing data from a target population's genetic characteristics group and a healthy control group, performing association analysis using the captured sequencing data and the preliminary candidate site set, and generating an original genetic data matrix for the target population; a model building module for performing genetic characteristic correction on the original genetic data matrix, performing HLA typing, and then performing interaction analysis, using the interaction analysis results to establish an individual multigene risk scoring model; and a risk assessment module for using the individual multigene risk scoring model to perform multidimensional risk quantification assessment of the target individual and output a risk score.

[0014] One or more technical solutions provided in this application have at least the following technical effects or advantages: The method provided in this application, based on genome-wide association analysis results and a diabetes literature dataset, screens significantly associated single nucleotide polymorphism (SNP) sites under risk categories of interest to establish a preliminary candidate site set; acquires captured sequencing data of the genetic characteristics of the target population and a healthy control population; performs association analysis using the captured sequencing data and the preliminary candidate site set to generate a raw genetic data matrix for the target population; after genetic feature correction of the raw genetic data matrix, performs HLA genotyping and interaction analysis, and establishes an individual polygenic risk scoring model using the interaction analysis results; and uses the individual polygenic risk scoring model to perform multidimensional risk quantification assessment of the target individual and outputs a risk score. By introducing significant SNP sites, HLA genotyping and their interaction characteristics, and combining multidimensional genetic effects to weighted calculate the individual polygenic risk scoring model, a more accurate risk score for individuals is achieved, thereby improving the accuracy and reliability of risk assessment.

[0015] The above description is merely an overview of the technical solution of this application. To enable a clearer understanding of the technical means of this application and to facilitate its implementation according to the description, and to make the above and other objects, features, and advantages of this application more apparent, specific embodiments of this application are described below. It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of this application, nor is it intended to limit the scope of this application. Other features of this application will become readily apparent through the following description.

Attached Figure Description

[0016] To more clearly illustrate the technical solutions in this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are merely exemplary. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.

[0017] Figure 1 is a flowchart of the individual risk scoring method based on multi-gene characteristics provided in this application.

[0018] Figure 2 shows the ROC curve of the individual multigene risk scoring model used in the individual risk scoring method based on multigene features provided in this application.

[0019] Figure 3 shows the ROC curve of the classification performance of the individual multigene risk scoring model, used to validate the individual risk scoring method based on multigene features provided in this application.

[0020] Figure 4 is a schematic diagram of the structure of the individual risk scoring system based on multi-gene characteristics provided in this application.

[0021] Reference numerals: data screening module 11, association analysis module 12, model building module 13, risk assessment module 14.

Detailed Implementation

[0022] This application provides a method and system for individual risk scoring based on polygenic characteristics, addressing the technical problem of low accuracy and reliability in risk assessment in existing technologies. By introducing significant SNP loci, HLA genotyping, and their interaction characteristics, and combining them with a multidimensional genetic effect weighted calculation model for individual polygenic risk scoring, a more accurate individual risk score is achieved, thereby improving the accuracy and reliability of risk assessment.

[0023] The technical solutions of the present invention will now be clearly and completely described with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of them. It should be understood that the present invention is not limited to the exemplary embodiments described herein. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort are within the scope of protection of the present invention. It should also be noted that, for ease of description, only the parts related to the present invention are shown in the accompanying drawings, not all of them.

[0024] Example 1: As shown in Figure 1, this application provides an individual risk scoring method based on multi-gene features, which includes: based on genome-wide association analysis results and diabetes literature datasets, screening significantly associated single nucleotide polymorphism sites under risk categories of interest and establishing a preliminary candidate site set.

[0025] Specifically, based on genome-wide association analysis (GWAS) results, single nucleotide polymorphisms (SNPs) significantly associated with the risk of developing diabetes were screened across the entire genome. GWAS is a statistical method that uses genotype-phenotype associations in large populations to identify genetic variations in disease. By calculating the difference in allele frequencies at each locus between the case and control groups, the significance of each SNP with disease risk was obtained. A significance threshold was set, and multiple public or internal diabetes GWAS databases were integrated and screened. Biological validation and supplementation were performed using diabetes literature datasets, including published diabetes-related research papers, functional gene annotation databases, and disease-related gene aggregation platforms such as DisGeNET. Significantly associated SNPs under different risk categories were identified. Based on the screened significantly associated SNPs, a preliminary candidate locus set was constructed. This preliminary candidate locus set was statistically significant and biologically reasonable, effectively representing the genetic characteristics associated with diabetes risk. This provided a precise locus basis for data association analysis in the target population, improving the accuracy and reliability of individual risk assessment. The target population was children.

[0026] Acquire captured sequencing data of the genetic characteristic group of the target population and of the healthy control population, and use the captured sequencing data and the preliminary candidate locus set to perform association analysis to generate the original genetic data matrix of the target population.

[0027] Specifically, capture sequencing was performed on the genetic characteristic group of the target population and on the healthy control population to obtain capture sequencing data. The target population refers to the population in the area where risk analysis is required. The capture sequencing data were obtained as follows:

S1, establishing the sample population: The sample population includes 143 samples in the T1 abnormal characteristic group and 110 samples in the healthy control group, all derived from children's samples. Members of the T1 abnormal characteristic group meet the World Health Organization (WHO) and American Diabetes Association (ADA) diagnostic criteria for diabetes, are positive for at least one of the GADA, ICA, IAA, IA-2A and ZnT8A antibody tests, and are dependent on insulin treatment. The healthy control group is composed of healthy people from the same geographical area as the T1 patients, without diabetes or a family history of diabetes.

S2, DNA extraction and agarose gel electrophoresis:
(1) The extracted whole blood was inverted and mixed, and 3 ml was added to a centrifuge tube containing red blood cell lysis buffer and allowed to stand at room temperature for 10 minutes. During this period, the centrifuge tube was inverted and mixed 9 times to ensure complete lysis. The tube was centrifuged at 3000 rpm for 2 minutes, then 3 ml of red blood cell lysis buffer was added, and the tube was vortexed for 5 seconds, allowed to stand for 3 minutes, and centrifuged at 3000 rpm for 1 minute, after which the supernatant was discarded.
(2) 3 ml of red blood cell lysis buffer was added to the centrifuge tube, which was vortexed until the white blood cell clumps were observed to be fully resuspended. The tube was placed in a 60-65℃ constant temperature oven for 60 minutes and inverted and mixed once every 8-10 minutes during this period.
(3) The protein precipitation solution was aliquoted into the centrifuge tubes, 1 ml per tube. The tubes were vortexed to ensure complete mixing and centrifuged at 3000 rpm for 20 minutes, and the supernatant was aspirated and transferred to a new centrifuge tube. An equal volume of isopropanol was added, and the tubes were inverted until a cotton-like white DNA precipitate was observed. The DNA precipitate was aspirated and transferred to a clean centrifuge tube and centrifuged at 12000 rpm for 1 minute, and the supernatant was discarded.
(4) The DNA was rinsed: 500 μL of 75% ethanol was added, the tube was inverted several times and centrifuged at 12000 rpm for 1 minute, and the supernatant was discarded. The tube was left at room temperature until the ethanol had evaporated.
(5) 300 μL of DNA dissolving solution was added to the dried EP tube, the tube was gently shaken to mix thoroughly, and it was left at room temperature or in a 4°C refrigerator for at least 8 hours to allow the DNA precipitate to hydrate.
(6) The concentration of the extracted DNA samples was measured with a NanoDrop 2000, and the A230, A260, and A280 of each sample were recorded. Agarose gel electrophoresis was performed on all samples to determine whether the DNA was contaminated or degraded; unqualified samples were excluded and not used for subsequent capture sequencing.

S3, DNA sample library construction and sequencing: This step includes DNA fragmentation, end repair and A-tailing, adapter ligation, library purification and fragment selection, and library quality control and sequencing. The capture sequencing data are obtained through sequencing.

[0028] Genome-wide association analysis was performed using the genetic characteristics of the target population and the captured sequencing data of the healthy control population, along with a preliminary candidate locus set. This generated susceptible SNP loci in the target population. Based on the association analysis results, a raw genetic data matrix of the target population was generated. Each row of the raw genetic data matrix corresponds to one sample, and each column corresponds to one SNP locus, which is a single nucleotide polymorphism locus selected from the candidate locus set or susceptible locus set. The matrix elements represent the genotype of the sample at that SNP locus.
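The matrix layout described above can be pictured with a small sketch. It assumes genotypes are coded as alternative-allele counts (0, 1, 2) with `None` for a missing call; the sample IDs, the genotype values, and the helper names `genotype_of` and `column` are hypothetical, and only the rsIDs come from this application.

```python
# Hypothetical illustration of the raw genetic data matrix layout:
# one row per sample, one column per SNP locus, entries = genotype.
# Assumption: genotypes are alternative-allele counts (0, 1, 2),
# with None for a missing call. Sample IDs and values are invented.

snp_ids = ["rs117267808", "rs73069541", "rs2313430"]

genotype_matrix = {
    "sample_001": [0, 1, 2],
    "sample_002": [1, None, 0],
    "sample_003": [2, 1, 1],
}


def genotype_of(matrix, snps, sample_id, snp_id):
    """Look up one matrix element: the genotype of a sample at a SNP."""
    return matrix[sample_id][snps.index(snp_id)]


def column(matrix, snps, snp_id):
    """Return all genotypes at one SNP locus (one matrix column)."""
    j = snps.index(snp_id)
    return [row[j] for row in matrix.values()]
```

A row lookup serves per-individual scoring, while a column lookup serves per-locus steps such as quality control and association testing.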

[0029] After genetic feature correction of the original genetic data matrix, HLA typing is performed, followed by interaction analysis. The results of the interaction analysis are used to establish an individual polygenic risk scoring model.

[0030] Furthermore, genetic feature correction is performed on the original genetic data matrix, including: after performing quality control and sample filtering using Plink software, calculating the effect value of each SNP locus in the target population based on logistic regression; screening significant SNP loci based on the calculation results and outputting weighting coefficients; and establishing an individual polygenic risk scoring model based on the significant SNP loci and weighting coefficients.

[0031] Specifically, the Plink software was used to perform data quality control and sample filtering on the raw genetic data matrix. Data quality control included filtering on SNP call rate, minor allele frequency, and Hardy-Weinberg equilibrium, and removing samples and loci with insufficient sequencing depth or low genotype quality. Sample filtering refers to removing duplicate samples, related samples, or population outliers identified by principal component analysis (PCA) to eliminate biases caused by population structure and kinship.
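The three per-SNP filters named above can be sketched in a few lines. This is only an illustration of what the metrics measure, not a substitute for Plink: the application gives no thresholds, so the defaults below are assumed placeholder values, the Hardy-Weinberg check uses a simple 1-degree-of-freedom chi-square test rather than an exact test, and the function names are hypothetical.

```python
import math


def snp_qc_metrics(genotypes):
    """Return (call_rate, maf, hwe_p) for one SNP column.

    genotypes: alternative-allele counts (0, 1, 2); None marks a missing call.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    call_rate = n / len(genotypes)
    if n == 0:
        return call_rate, 0.0, 1.0
    n_hom_ref = called.count(0)
    n_het = called.count(1)
    n_hom_alt = called.count(2)
    q = (n_het + 2 * n_hom_alt) / (2 * n)  # alternative-allele frequency
    p = 1.0 - q
    maf = min(p, q)
    # Chi-square goodness of fit against Hardy-Weinberg proportions.
    observed = [n_hom_ref, n_het, n_hom_alt]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))
    return call_rate, maf, hwe_p


def passes_qc(genotypes, min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-6):
    """Apply the three filters; the thresholds are assumed, not from the source."""
    call_rate, maf, hwe_p = snp_qc_metrics(genotypes)
    return call_rate >= min_call_rate and maf >= min_maf and hwe_p >= min_hwe_p
```

A locus with genotype counts in exact Hardy-Weinberg proportions passes, while a locus where every sample is heterozygous fails the equilibrium filter, which is the typical signature of a genotyping artifact.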

[0032] Logistic regression is used to calculate the effect size of each SNP locus in the target population. Specifically, using a binary phenotype (e.g., 1 for diabetes cases and 0 for healthy controls) as the dependent variable, SNP genotype as the independent variable, and age, sex, etc., as covariates, binary logistic regression is performed on each SNP to obtain the regression coefficient, which represents the effect size of the SNP locus in the target population. This coefficient is the increase in the log-odds of disease for each additional copy of the alternative allele of the SNP. The log-odds is the natural logarithm of the ratio of the probability of disease occurring to the probability of it not occurring, so exponentiating the coefficient gives the odds ratio, which quantifies the strength of association between each SNP and the disease or phenotype. Based on the logistic regression results, significant SNP loci are selected using a p-value or FDR control as a significance threshold, and the weighting coefficient of each SNP locus is output. This weighting coefficient represents the contribution weight of each SNP locus in the multigene risk scoring model. An individual multigene risk scoring model is then established based on the significant SNP loci and their weighting coefficients to assess individual risk scores.
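As a rough illustration of the per-SNP regression, the sketch below fits a logistic model with an intercept and a single genotype term by Newton's method. It is a simplification under stated assumptions: the covariates mentioned above (age, sex) are omitted, no p-value is computed, the function name `fit_snp_logistic` is hypothetical, and the data in the usage note are invented.

```python
import math


def fit_snp_logistic(genotypes, phenotypes, iterations=25):
    """Fit logit(P(case)) = b0 + b1 * genotype by Newton's method.

    genotypes: alternative-allele counts per sample (0, 1, 2).
    phenotypes: 1 for case, 0 for control.
    Returns (b0, b1); b1 is the per-allele log-odds effect size.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for x, y in zip(genotypes, phenotypes):
            prob = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - prob
            g1 += (y - prob) * x
            w = prob * (1.0 - prob)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break  # degenerate data, e.g. every sample has the same genotype
        # Newton step: beta <- beta + H^{-1} g, using the 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

For example, with 40 cases of whom 30 carry one alternative allele and 40 controls of whom 10 do, `fit_snp_logistic` returns a `b1` equal to the natural logarithm of the 2x2 odds ratio (30/10)/(10/30) = 9, and `math.exp(b1)` recovers that odds ratio.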

[0033] By eliminating noise and population structure interference through quality control, calculating the true effect value of each SNP through logistic regression, and screening significant loci, the accuracy and reliability of individual polygenic risk scores are improved.

[0034] Furthermore, after performing HLA genotyping, interaction analysis is performed, including: performing HLA genotyping on the sample data after quality control and sample filtering to obtain the HLA-DQA1, HLA-DQB1, and HLA-DRB1 genotyping results of the target population samples; calculating the interaction between HLA-DQA1 and HLA-DQB1 to obtain the interaction pair score and interaction pair weight; calculating the genotyping score and genotyping weight of HLA-DRB1; and establishing an individual polygenic risk scoring model based on the significant SNP sites, weighting coefficients, interaction pair scores, interaction pair weights, genotyping scores, and genotyping weights.

[0035] Specifically, after genetic feature correction of the original genetic data matrix, HLA-LA software was used to perform HLA typing on the captured sequencing data of each sample, obtaining HLA-DQA1, HLA-DQB1, and HLA-DRB1 typing results for the target population samples. Typing allows for accurate identification of allele combinations at HLA loci for each sample. HLA typing determines the allele composition of a sample in major histocompatibility complex gene regions, such as DQA1, DQB1, and DRB1. HLA genes are closely related to immune responses, and some allele combinations are significantly associated with the risk of developing diabetes.

[0036] Interaction analysis was performed on allele combinations of HLA-DQA1 and HLA-DQB1 to determine whether each sample carried a specific combination. An interaction refers to the combined effect of an HLA-DQA1 allele and an HLA-DQB1 allele on disease risk. First, the presence of each interaction pair was determined: if the sample carries the susceptibility combination, the interaction pair score is 1; otherwise, it is 0. The interaction pair weight was obtained by dividing the ratio of samples with the interaction pair to samples without it in the case group by the same ratio in the control group.

[0037] Simultaneously, HLA-DRB1 genotyping was analyzed separately to determine whether each sample carried each HLA-DRB1 genotype. If a genotype was present, its score was 1; otherwise, it was 0. The scores of the individual genotypes were then summed to obtain the HLA-DRB1 genotype score. The genotype weight was obtained by taking the natural logarithm of the ratio.
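The 0/1 scoring of interaction pairs and HLA-DRB1 genotypes can be sketched as follows. The allele names are illustrative examples only and are not the application's list of valid combinations; the function names are hypothetical, and a sample is assumed to be represented by the set of alleles it carries at each locus.

```python
def interaction_pair_score(dqa1_alleles, dqb1_alleles, pair):
    """Return 1 if the sample carries both alleles of an HLA-DQA1/HLA-DQB1 pair."""
    dqa1, dqb1 = pair
    return 1 if dqa1 in dqa1_alleles and dqb1 in dqb1_alleles else 0


def drb1_genotype_scores(drb1_alleles, valid_genotypes):
    """Return a 0/1 presence score for each HLA-DRB1 genotype of interest."""
    return {t: (1 if t in drb1_alleles else 0) for t in valid_genotypes}


# Illustrative sample (invented typing result, two alleles per locus).
sample = {
    "DQA1": {"DQA1*05:01", "DQA1*03:01"},
    "DQB1": {"DQB1*02:01", "DQB1*03:02"},
    "DRB1": {"DRB1*03:01", "DRB1*04:05"},
}

pair_present = interaction_pair_score(
    sample["DQA1"], sample["DQB1"], ("DQA1*05:01", "DQB1*02:01")
)
pair_absent = interaction_pair_score(
    sample["DQA1"], sample["DQB1"], ("DQA1*01:02", "DQB1*06:02")
)
drb1_scores = drb1_genotype_scores(
    sample["DRB1"], ["DRB1*03:01", "DRB1*04:05", "DRB1*15:01"]
)
drb1_total = sum(drb1_scores.values())  # summed HLA-DRB1 genotype score
```

Each score is later multiplied by its weight, so a pair or genotype the sample does not carry contributes nothing to the risk score.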

[0038] After obtaining the interaction pair scores and weights of HLA-DQA1 and HLA-DQB1, and the genotyping scores and weights of HLA-DRB1, a polygenic risk scoring model for individuals was established by combining the selected significant SNP sites and their weighting coefficients. This model can comprehensively and accurately reflect an individual's genetic risk status. Among them, significant SNP sites represent gene variant sites that are significantly associated with diabetes, the weighting coefficients reflect the contribution of each SNP site to the disease risk, the interaction pair scores and weights reflect the impact of interactions between different genes on the disease risk, and the genotyping scores and weights reflect the association between specific genetic traits and disease risk.

[0039] By establishing an individual polygenic risk scoring model, we can intuitively, comprehensively, and accurately quantify the degree of risk of an individual's disease, reduce the assessment bias caused by one-sided consideration of a single genetic factor, and improve the accuracy and reliability of risk assessment.

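[0040] Furthermore, the individual polygenic risk scoring model is as follows:

$$\mathrm{PRS} = \sum_{i=1}^{N} w_i G_i + \sum_{j=1}^{M} u_j I_j + \sum_{k=1}^{K} v_k T_k$$

where $\mathrm{PRS}$ is the risk score, $N$ is the number of SNPs, $i$ is the SNP index, $G_i$ is the genotype of the $i$-th SNP, $w_i$ is its weighting coefficient, $M$ is the number of interaction pairs between HLA-DQA1 and HLA-DQB1, $I_j$ and $u_j$ are the score and weight of the $j$-th interaction pair, $K$ is the number of HLA-DRB1 genotypes, and $T_k$ and $v_k$ are the score and weight of the $k$-th genotype. (The formula image did not survive extraction; the additive form and the symbol names are reconstructed from the definitions in this paragraph and in [0034].)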

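[0041] Specifically, the polygenic risk score (PRS) is calculated based on a statistical model. The individual polygenic risk score model is as follows:

$$\mathrm{PRS} = \sum_{i=1}^{N} w_i G_i + \sum_{j=1}^{M} u_j I_j + \sum_{k=1}^{K} v_k T_k$$

where $N$ is the number of SNPs, $G_i$ is the genotype of the $i$-th SNP (0, 1, 2), $w_i$ is its weighting coefficient, $M$ is the number of interaction pairs between HLA-DQA1 and HLA-DQB1, $I_j$ and $u_j$ are the score and weight of the $j$-th interaction pair, $K$ is the number of HLA-DRB1 genotypes, and $T_k$ and $v_k$ are the score and weight of the $k$-th genotype. (The formula image did not survive extraction; the additive form and the symbol names are reconstructed from the definitions in this paragraph.)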
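A minimal sketch of how such a score could be computed for one individual, assuming the three components (SNP genotypes, HLA-DQA1/HLA-DQB1 interaction pair scores, HLA-DRB1 genotype scores) are each multiplied by their weights and summed. The additive combination, the function name, and all numeric weights below are assumptions for illustration, not values from this application.

```python
def polygenic_risk_score(genotypes, snp_weights,
                         pair_scores, pair_weights,
                         genotype_scores, genotype_weights):
    """Weighted sum over SNPs, interaction pairs, and HLA-DRB1 genotypes.

    Each *_scores / genotypes argument maps a feature name to its value
    (SNP genotype 0/1/2, or a 0/1 presence score); each *_weights argument
    maps the same feature name to its weight.
    """
    snp_term = sum(snp_weights[s] * g for s, g in genotypes.items())
    pair_term = sum(pair_weights[p] * v for p, v in pair_scores.items())
    drb1_term = sum(genotype_weights[t] * v for t, v in genotype_scores.items())
    return snp_term + pair_term + drb1_term


# Invented example values; only the rsIDs appear in this application.
score = polygenic_risk_score(
    genotypes={"rs505922": 2, "rs2313430": 1},
    snp_weights={"rs505922": 0.5, "rs2313430": -0.2},
    pair_scores={"pair_A": 1, "pair_B": 0},
    pair_weights={"pair_A": 1.2, "pair_B": 0.9},
    genotype_scores={"drb1_X": 1, "drb1_Y": 0},
    genotype_weights={"drb1_X": 0.7, "drb1_Y": 0.3},
)
```

Keeping the three terms separate inside the function makes it easy to report how much of a score comes from SNPs versus HLA features.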

[0042] By establishing an individual polygenic risk scoring model for multi-gene synergistic assessment, a specific and accurate risk score is provided for each individual, intuitively reflecting their risk level of developing diabetes. This approach is more comprehensive and accurate than assessment methods based on single genetic factors. Through genetic testing and assessment, individuals with higher genetic risk can be identified, enabling risk identification during the asymptomatic stage. This provides a clear target population for early screening, disease prevention, and intervention measures for diabetes, thereby reducing the risk of disease development.

[0043] The efficacy of the individual polygenic risk scoring model, built on 88 identified susceptibility features (58 valid HLA-DQA1/HLA-DQB1 combinations, 20 valid HLA-DRB1 genotypes, and 10 significant SNP loci), was validated in an independent external validation cohort. As shown in Figure 2, in the discovery cohort the model effectively distinguishes T1 genetic traits from control genetic traits, with an AUC of 0.903; the specificity and sensitivity are 0.818 and 0.860, respectively. The discovery cohort is the sample population used for the construction and parameter training of the individual polygenic risk scoring model. As shown in Figure 3, in the validation cohort the model retains good classification performance, with an AUC of 0.857 and specificity and sensitivity of 0.826 and 0.776, respectively. The validation cohort is a sample population independent of the discovery cohort, used to perform external performance validation of the established model. This ensures that the constructed individual polygenic risk scoring model can accurately distinguish individuals susceptible to diabetes, improving the comprehensiveness and accuracy of risk assessment and enhancing the stability and reliability of risk prediction.
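For readers less familiar with these metrics, the sketch below shows one standard way to compute them from risk scores: AUC as the fraction of case-control pairs in which the case scores higher (ties count as half), and sensitivity and specificity at a chosen cutoff. The application does not state how its cutoff was chosen, so the threshold here is an assumed parameter, and the scores are invented.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_specificity(case_scores, control_scores, threshold):
    """Classify score >= threshold as high risk and return (sens, spec)."""
    true_pos = sum(1 for s in case_scores if s >= threshold)
    true_neg = sum(1 for s in control_scores if s < threshold)
    return true_pos / len(case_scores), true_neg / len(control_scores)


# Invented risk scores for three cases and three controls.
cases = [0.9, 0.8, 0.4]
controls = [0.5, 0.3, 0.1]
example_auc = auc(cases, controls)
example_sens, example_spec = sensitivity_specificity(cases, controls, 0.45)
```

The pairwise form is quadratic in the number of samples, which is acceptable for cohorts of a few hundred samples such as the one described here.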

[0044] Furthermore, the interaction pairs of HLA-DQA1 and HLA-DQB1 include 58 valid combinations, and the HLA-DRB1 genotyping includes 20 valid genotypes.

[0045] Specifically, after quality control and sample screening in the target population, the interaction pairs of HLA-DQA1 and HLA-DQB1 included 58 valid combinations. That is, there are 58 allele combinations of HLA-DQA1 and HLA-DQB1 that have a risk impact on diabetes. These HLA-DQA1 and HLA-DQB1 interaction combinations include: .

[0046] The HLA-DRB1 typing includes 20 valid typings, which include: .

[0047] The 58 valid HLA-DQA1 and HLA-DQB1 interaction pairs and the 20 valid HLA-DRB1 genotypes are the specific HLA allele combinations or genotypes that significantly influence the risk of diabetes in the target population. They are used to quantify individual risk in the polygenic risk scoring model: the interaction scores and genotype scores are calculated together with their corresponding weights and integrated into the individual's comprehensive risk score, which improves the comprehensiveness and accuracy of individual risk assessment.

[0048] Furthermore, the weighting coefficient is constructed by calculating the ratio of SNP loci in the genetic characteristic group of the target population to that in the healthy control group, and then taking the natural logarithm to obtain the statistical effect value.

[0049] Specifically, for each SNP locus, the number of individuals carrying the risk allele and the number not carrying it are counted separately in the genetic trait group and in the healthy control group of the target population. For each group, the ratio of carriers to non-carriers is computed. Dividing the ratio in the genetic trait group by the ratio in the healthy control group gives the odds ratio of the SNP locus, which represents the odds of disease for individuals carrying the effect allele relative to those not carrying it. The natural logarithm of the odds ratio is then taken to convert it into a statistical effect value, which is the weighting coefficient.
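The weight calculation can be sketched directly from the carrier counts. The function name `log_odds_ratio_weight` is hypothetical, the carrier counts in the example are invented (only the group sizes of 143 and 110 come from this application), and the handling of empty cells by raising an error is an assumption, since the source does not say how such cases are treated.

```python
import math


def log_odds_ratio_weight(case_carriers, case_noncarriers,
                          control_carriers, control_noncarriers):
    """Natural log of the odds ratio from a 2x2 carrier table."""
    cells = (case_carriers, case_noncarriers,
             control_carriers, control_noncarriers)
    if any(c <= 0 for c in cells):
        # Assumption: refuse to compute a weight when any cell is empty.
        raise ValueError("all four counts must be positive")
    odds_case = case_carriers / case_noncarriers
    odds_control = control_carriers / control_noncarriers
    return math.log(odds_case / odds_control)


# Invented counts: 60 of 143 cases and 22 of 110 controls carry the allele.
weight = log_odds_ratio_weight(60, 143 - 60, 22, 110 - 22)
```

A positive weight means the allele is more common among cases, a negative weight means it is more common among controls, and a weight of zero means no difference in carrier odds.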

[0050] By obtaining quantifiable risk weights through statistical association, we can ensure that the contribution of each significant SNP in the polygenic risk score is consistent with the true genetic effect, thereby improving the accuracy and reliability of individual disease risk prediction.

[0051] Further, the significant SNP sites selected included rs117267808, rs73069541, rs2313430, rs2624847, rs2581787, rs6752053, rs76155021, rs9515905, rs505922, and rs117767867.

[0052] Specifically, after quality control and sample filtering using PLINK software, the significance value of each SNP in the target population is obtained by logistic regression. Ten significant SNP loci are identified: rs117267808, rs73069541, rs2313430, rs2624847, rs2581787, rs6752053, rs76155021, rs9515905, rs505922, and rs117767867. The information on the significant SNP loci is shown in Table 1.

Table 1: Information on significant SNP sites

The identified SNP sites are those significantly associated with the risk of diabetes. They provide reliable and accurate genetic markers for constructing the individual polygenic risk scoring model and improve the accuracy and effectiveness of its risk assessment.

[0053] The individual multi-gene risk scoring model is used to conduct a multi-dimensional risk quantification assessment of the target individual and output a risk score.

[0054] Specifically, the genetic information of the target individual is obtained through genetic testing, the individual multi-gene risk scoring model is used to conduct a multi-dimensional risk quantification assessment of the target individual, and the PRS value of the target individual is calculated to obtain the risk score. The risk score intuitively reflects the individual's level of risk of developing diabetes, identifies a clear target population for early screening, disease prevention, and intervention measures for diabetes, and helps reduce the risk of disease occurrence.
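The scoring of a target individual can be sketched in Python as a weighted sum over three groups of features: significant SNPs, HLA-DQA1/HLA-DQB1 interaction pairs, and HLA-DRB1 types. This is a minimal sketch under stated assumptions: each SNP genotype is coded as the number of effect alleles (0, 1, or 2), and the HLA scores and weights are supplied as (score, weight) tuples. The function name `polygenic_risk_score`, the genotype coding, and every numeric value are hypothetical; only the rsIDs are taken from the text.

```python
def polygenic_risk_score(dosages, snp_weights,
                         interaction_terms=(), typing_terms=()):
    """Combine SNP, interaction pair, and typing contributions into one score.

    dosages: {rsID: effect-allele count (0, 1, or 2)}   (assumed coding)
    snp_weights: {rsID: weighting coefficient}
    interaction_terms, typing_terms: iterables of (score, weight) tuples.
    """
    for rsid, dosage in dosages.items():
        if dosage not in (0, 1, 2):
            raise ValueError(f"invalid dosage for {rsid}: {dosage}")
    # Every weighted SNP must be genotyped; a missing rsID raises KeyError.
    snp_part = sum(weight * dosages[rsid]
                   for rsid, weight in snp_weights.items())
    interaction_part = sum(score * weight
                           for score, weight in interaction_terms)
    typing_part = sum(score * weight for score, weight in typing_terms)
    return snp_part + interaction_part + typing_part


# Made-up weights and genotypes, for illustration only.
example_score = polygenic_risk_score(
    dosages={"rs117267808": 2, "rs505922": 1},
    snp_weights={"rs117267808": 0.25, "rs505922": 0.5},
    interaction_terms=[(1, 0.75)],
    typing_terms=[(1, 0.25)],
)
```

Raising an error for an ungenotyped SNP, rather than silently treating it as zero, avoids under-reporting risk when a sample fails at one locus.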

[0055] Example 2: Based on the same inventive concept as the individual risk scoring method based on multi-gene characteristics in the foregoing example, and as shown in Figure 4, this application provides an individual risk scoring system based on multi-gene characteristics, which includes: a data screening module 11, used to screen significantly associated single nucleotide polymorphism (SNP) sites under the risk category of interest based on genome-wide association analysis results and diabetes literature datasets, and to establish a preliminary candidate site set; an association analysis module 12, used to acquire capture sequencing data of the genetic characteristic group of the target population and of the healthy control group, and to perform association analysis using the capture sequencing data and the preliminary candidate site set to generate the original genetic data matrix of the target population; a model building module 13, used to perform HLA typing and interaction analysis on the original genetic data matrix after genetic characteristic correction, and to establish an individual polygenic risk scoring model using the interaction analysis results; and a risk assessment module 14, used to perform a multidimensional risk quantification assessment of the target individual using the individual polygenic risk scoring model and to output a risk score.

[0056] Furthermore, the model building module 13 is also used to: perform quality control and sample filtering using PLINK software and calculate the effect value of each SNP site in the target population based on logistic regression; screen significant SNP sites according to the calculation results and output weighting coefficients; and establish an individual polygenic risk scoring model based on the significant SNP sites and weighting coefficients.

[0057] Furthermore, the model building module 13 is also used to: perform HLA genotyping on the sample data after quality control and sample filtering to obtain the HLA-DQA1, HLA-DQB1, and HLA-DRB1 genotyping results of the target population samples; calculate the interaction between HLA-DQA1 and HLA-DQB1 to obtain the interaction pair score and interaction pair weight; calculate the genotyping score and genotyping weight of HLA-DRB1; and establish an individual polygenic risk scoring model based on the significant SNP sites, weighting coefficients, interaction pair scores, interaction pair weights, genotyping scores, and genotyping weights.

[0058] Furthermore, the model building module 13 is also used to establish the individual polygenic risk scoring model as follows:

$$\mathrm{PRS} = \sum_{i=1}^{n} \beta_i G_i + \sum_{j=1}^{m} w_j I_j + \sum_{k=1}^{p} v_k T_k$$

where $\mathrm{PRS}$ is the risk score, $n$ is the number of SNPs, $i$ is the SNP index, $G_i$ is the genotype of the $i$-th SNP, $\beta_i$ is its weighting coefficient, $m$ is the number of interaction pairs between HLA-DQA1 and HLA-DQB1, $I_j$ and $w_j$ are the interaction pair score and interaction pair weight, $p$ is the number of HLA-DRB1 types, and $T_k$ and $v_k$ are the typing score and typing weight.

[0059] Furthermore, in the model building module 13, the HLA-DQA1 and HLA-DQB1 interaction pairs include 58 valid combinations, and the HLA-DRB1 typing includes 20 valid types.

[0060] Furthermore, the model building module 13 is also used to construct the weighting coefficient by calculating the odds ratio of each SNP locus between the genetic characteristic group of the target population and the healthy control group, and taking the natural logarithm of that ratio to obtain the statistical effect value.

[0061] Furthermore, in the model building module 13, the screened significant SNP sites include rs117267808, rs73069541, rs2313430, rs2624847, rs2581787, rs6752053, rs76155021, rs9515905, rs505922, and rs117767867.

[0062] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from the others. The individual risk scoring method based on multi-gene characteristics and its specific examples in the foregoing Example 1 also apply to the individual risk scoring system based on multi-gene characteristics in this example. From the foregoing detailed description of the method, those skilled in the art can clearly understand the system in this example, so for the sake of brevity it is not described in detail here.

[0063] The above description of the disclosed embodiments enables those skilled in the art to make or use this application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

[0064] Obviously, those skilled in the art can make several improvements and modifications to this application without departing from the principles of this application, and these improvements and modifications also fall within the protection scope of this application.

Claims

1. An individual risk scoring method based on multi-gene characteristics, characterized in that the method includes: screening, based on genome-wide association analysis results and diabetes literature datasets, significantly associated single nucleotide polymorphism sites under the risk category of interest, and establishing a preliminary candidate site set; acquiring capture sequencing data of the genetic characteristic group of the target population and of the healthy control group, and performing association analysis using the capture sequencing data and the preliminary candidate site set to generate the original genetic data matrix of the target population; performing genetic characteristic correction on the original genetic data matrix, then performing HLA typing followed by interaction analysis, and establishing an individual polygenic risk scoring model using the results of the interaction analysis; and using the individual polygenic risk scoring model to conduct a multi-dimensional risk quantification assessment of the target individual and outputting a risk score.

2. The individual risk scoring method based on multi-gene characteristics as described in claim 1, characterized in that performing genetic characteristic correction on the original genetic data matrix includes: after performing quality control and sample filtering using PLINK software, calculating the effect value of each SNP locus in the target population based on logistic regression; screening significant SNP sites according to the calculation results and outputting weighting coefficients; and establishing an individual polygenic risk scoring model based on the significant SNP sites and weighting coefficients.

3. The individual risk scoring method based on multi-gene characteristics as described in claim 2, characterized in that performing HLA typing followed by interaction analysis includes: performing HLA typing on the sample data after quality control and sample filtering to obtain the HLA-DQA1, HLA-DQB1, and HLA-DRB1 typing results of the target population samples; calculating the interaction between HLA-DQA1 and HLA-DQB1 to obtain the interaction pair scores and interaction pair weights; calculating the HLA-DRB1 typing scores and typing weights; and establishing an individual polygenic risk scoring model based on the significant SNP sites, weighting coefficients, interaction pair scores, interaction pair weights, typing scores, and typing weights.

4. The individual risk scoring method based on multi-gene characteristics as described in claim 3, characterized in that the individual polygenic risk scoring model is as follows:

$$\mathrm{PRS} = \sum_{i=1}^{n} \beta_i G_i + \sum_{j=1}^{m} w_j I_j + \sum_{k=1}^{p} v_k T_k$$

where $\mathrm{PRS}$ is the risk score, $n$ is the number of SNPs, $i$ is the SNP index, $G_i$ is the genotype of the $i$-th SNP, $\beta_i$ is its weighting coefficient, $m$ is the number of interaction pairs between HLA-DQA1 and HLA-DQB1, $I_j$ and $w_j$ are the interaction pair score and interaction pair weight, $p$ is the number of HLA-DRB1 types, and $T_k$ and $v_k$ are the typing score and typing weight.

5. The individual risk scoring method based on multi-gene characteristics as described in claim 3, characterized in that, The HLA-DQA1 and HLA-DQB1 interaction pairs include 58 valid combinations, and the HLA-DRB1 genotyping includes 20 valid genotypes.

6. The individual risk scoring method based on multi-gene characteristics as described in claim 2, characterized in that the weighting coefficients are constructed by calculating the odds ratio of each SNP locus between the genetic characteristic group of the target population and the healthy control group, and then taking the natural logarithm of that ratio to obtain the statistical effect value.

7. The individual risk scoring method based on multi-gene characteristics as described in claim 2, characterized in that, The significant SNP sites identified after screening include rs117267808, rs73069541, rs2313430, rs2624847, rs2581787, rs6752053, rs76155021, rs9515905, rs505922, and rs117767867.

8. An individual risk scoring system based on multi-gene characteristics, characterized in that the system is used to implement the steps of the individual risk scoring method based on multi-gene characteristics according to any one of claims 1 to 7, and includes: a data screening module, used to screen significantly associated single nucleotide polymorphism sites under the risk category of interest based on genome-wide association analysis results and diabetes literature datasets, and to establish a preliminary candidate site set; an association analysis module, used to acquire capture sequencing data of the genetic characteristic group of the target population and of the healthy control group, and to perform association analysis using the capture sequencing data and the preliminary candidate site set to generate the original genetic data matrix of the target population; a model building module, used to perform genetic characteristic correction on the original genetic data matrix, perform HLA typing, perform interaction analysis, and use the interaction analysis results to build an individual polygenic risk scoring model; and a risk assessment module, used to perform a multidimensional risk quantification assessment of the target individual using the individual polygenic risk scoring model and to output a risk score.