System and method for assessing complex gene-gene interactions for genetic risk diagnosis

A computerized system assesses HLA and IRF5 gene interactions to predict autoimmune diseases, offering high predictive accuracy and enabling early detection and prevention strategies.

WO2026143147A1 · PCT designated stage · Publication Date: 2026-07-02 · NEW YORK SOC FOR THE RUPTURED & CRIPPLED MAINTAINING THE HOSPITAL FOR SPECIAL SURGERY

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
NEW YORK SOC FOR THE RUPTURED & CRIPPLED MAINTAINING THE HOSPITAL FOR SPECIAL SURGERY
Filing Date
2025-12-23
Publication Date
2026-07-02

AI Technical Summary

Technical Problem

Current methods fail to accurately predict autoimmune diseases like systemic lupus erythematosus and Sjogren’s disease due to the complex genetic interactions involving multiple genetic risk factors, which individually confer small risks and are not sufficient for disease prediction.

Method used

A computerized system and method assess gene-gene interactions between the HLA and IRF5 gene regions by performing statistical analyses, including Fisher exact tests, Bonferroni correction, and Monte Carlo simulation to identify high-risk groups through complex compound heterozygosity and synergy, forming predictive models for autoimmune diseases.

Benefits of technology

The system provides a highly predictive model for autoimmune diseases, with positive predictive values up to 97.6% for early detection, enabling preventive strategies and reducing irreversible organ damage.

✦ Generated by Eureka AI based on patent content.


Abstract

A computerized system and method are provided for assessing a number of gene-gene interactions between the HLA and IRF5 gene regions. At least one computing device enrolls subjects in a registry, including SLE patients having met classification criteria for SLE and Sjogren's patients having met AECG criteria. Moreover, at least one computing device can perform genotyping for the subjects and healthy control subjects, for submission to a genotyping platform. Further, at least one computing device can develop HLA risk factor models for each of a plurality of stages, and perform statistical analysis for each of the plurality of stages.

Description

SYSTEM AND METHOD FOR ASSESSING COMPLEX GENE-GENE INTERACTIONS FOR GENETIC RISK DIAGNOSIS

Field

[0001] The present disclosure relates, generally, to computing technology and, more particularly, to developing a new model for complex multi-locus gene-gene interaction.

Background

[0002] Autoimmune diseases, including systemic lupus erythematosus and Sjogren’s disease, are serious chronic conditions characterized by inflammation in multiple organ systems. Immune dysregulation begins years before the emergence of disease. Lack of early detection precludes preventive strategies, and delays in treatment can result in irreversible organ damage. Autoimmune diseases are genetically complex, caused by multiple genetic risk factors. The common genetic variants underlying the diseases individually confer small risk, and thus far have not allowed for prediction of disease.

[0003] The present system and method address these and other deficiencies, and it is with respect to these and other considerations that the disclosure made herein is presented.

Brief Summary

[0004] In one or more implementations of the present disclosure, a computerized system and method are provided for assessing a number of gene-gene interactions between the HLA and IRF5 gene regions. At least one computing device enrolls subjects in a registry, including SLE patients having met classification criteria for SLE and Sjogren’s patients having met AECG criteria. Moreover, at least one computing device can perform genotyping for the subjects and healthy control subjects, for submission to a genotyping platform. Further, at least one computing device can develop HLA risk factor models for each of a plurality of stages, and perform statistical analysis for each of the plurality of stages.

[0005] In one or more implementations of the present disclosure, the at least one computing device can perform Fisher exact tests between each HLA allele variant and a respective outcome. Further, the at least one computing device can perform risk factor association scans using Fisher’s exact tests. Moreover, at least one computing device can expand dimensionality and specificity of HLA risk factors by including multiple allele combinations across multiple loci on one of the chromosome strands, and can apply a classification system to N3,out factors. Further, the at least one computing device can define a group risk factor by combining each of a plurality of intra-group combinations using logical OR operators.
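The association scan in paragraph [0005] can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes a two-sided Fisher exact test on a 2x2 carrier-by-outcome table, and the function names, the table layout, and the count format are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical): rows are carriers / non-carriers,
    columns are cases / controls.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def association_scan(carrier_counts: dict[str, tuple[int, int]],
                     n_cases: int, n_controls: int) -> dict[str, float]:
    """Run one Fisher exact test per allele.

    carrier_counts maps an allele name to (carrier cases, carrier controls).
    """
    results = {}
    for allele, (case_carriers, control_carriers) in carrier_counts.items():
        results[allele] = fisher_exact_two_sided(
            case_carriers, control_carriers,
            n_cases - case_carriers, n_controls - control_carriers)
    return results
```

A call such as `association_scan({"DRB1*3": (120, 60)}, 1000, 1000)` (made-up counts) returns one p-value per allele, which can then be compared against a multiple-testing threshold.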

[0006] In one or more implementations of the present disclosure, at least one computing device can apply Bonferroni correction and identify individual risk factors having p values less than Bonferroni-corrected 95% significance thresholds.

[0007] In one or more implementations of the present disclosure, the Bonferroni correction includes using Monte Carlo simulation.
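The thresholding described in paragraphs [0006] and [0007] can be sketched as below. The sketch covers only the plain Bonferroni step (dividing the significance level by the number of tests); the Monte Carlo component mentioned in [0007] is not shown, and the function names and the 0.05 default are assumptions for illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_factors(p_values: dict[str, float],
                        alpha: float = 0.05) -> list[str]:
    """Names of risk factors whose p-value falls below the corrected threshold.

    The number of tests is taken to be the number of factors scanned.
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return sorted(name for name, p in p_values.items() if p < threshold)
```

With five scanned factors, for example, the corrected threshold is 0.05 / 5 = 0.01, so only factors with p below 0.01 are retained.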

[0008] In one or more implementations of the present disclosure, at least one computing device can form a high-risk group by identifying combinations of alleles present at significant frequencies in SLE patients.

Brief Description Of The Drawings

[0009] Aspects of the present disclosure will be more readily appreciated upon review of the detailed description of its various embodiments, described below, when taken in conjunction with the accompanying drawings.

[0010] FIG. 1 illustrates charts representing an association analysis representing risk epistasis due to HLA+ (DRB1*3 carrier) and IRF5+ (TACA carrier) gene-gene interaction, in which non-carriers are indicated by the minus postfix (HLA- or IRF5-);

[0011] FIG. 2 represents a LD heatmap, in which three regions are modeled with the HLA pairs separately;

[0012] FIG. 3 illustrates the various types of interaction patterns in the high-risk group that were observed between the maternal and paternal HLA alleles and the IRF5 region in each ancestral background and disease category;

[0013] FIG. 4 includes Venn diagrams for each of the sub-models, and the numbers show the subject carrier counts in patient cohorts, showing minimal overlaps, suggesting potentially unique mechanism / pathways for each of the sub-models;

[0014] FIG. 5A illustrates performance of the model at varying levels of complexity, from simple single gene on the left to the final model on the right, with the ORs observed in each model and subject category;

[0015] FIG. 5B shows the ORs of the various risk groups on a logarithmic scale, along with the frequency of that combination group, showing the percentage of patients that the gene-gene combination category applies to;

[0016] FIGS. 6A - 6B illustrate box plots of risk and protective factors for each of the SLE EA, SLE AA, SLE HA, SS EA cohorts, grouped by HLA allele age (0: all alleles are more than 100k years old, 1: contains at least a newer allele that is less than 100k years old);

[0017] FIG. 6C shows SLE EA cohort HLA risk factors for each HLA locus, broken down by the allele age groups (N: newer alleles, less than 100k years old; O: older alleles, more than 100k years old), for the protective, low-medium risk (M+L), high risk cases;

[0018] FIG. 7 is a block diagram illustrating steps associated with an example implementation of the present disclosure;

[0019] FIG. 8 is a diagram that illustrates an example implementation of the present disclosure, including an association of a plurality of devices and flow of information associated with the devices; and

[0020] FIG. 9 is a block diagram that illustrates functional elements of one or more of a data processing apparatus and computing device.

[0021] FIG. 10 illustrates the various types of interaction patterns in the high-risk group that were observed between the maternal and paternal HLA alleles and the IRF5 region in each ancestral background and disease category. High-risk factor counts are included for SLE European-American (EA), SLE African-American (AA), SLE Hispanic-American (HA), SS European-American (EA) for HLA class combinations interacting with IRF5 region genes (KCP, IRF5, TNPO3) and with no interaction (HLA only). The diagrams illustrate chromosome 6 pairs for HLA genes at the top, and chromosome 7 for IRF5 region genes at the right side.

[0022] FIG. 11A. Odds ratios for high-risk classes for each cohort studied (SLE EA, SLE AA, SLE HA, SS EA), comparing the association strength for the sub-models of HLA risk factors with IRF5 region gene interactions (KCP, IRF5, TNPO3), and for the combined models that include all risk factors. The asterisks indicate the p-values for each model.

[0023] FIG. 11B. The Venn diagrams at the top are labeled for each of the sub-models, and the numbers show the subject carrier counts in patient cohorts, showing minimal overlaps, suggesting potentially unique mechanism / pathways for each of the sub-models.

[0024] FIG. 11C. Association analysis results for SLE EA, SLE AA, SLE HA, SS EA showing increasing strength from Model 0 (single alleles / haplotypes), Model 1 (single HLA alleles interacting with IRF5 region genes), Model 2 (2-dimensional HLA alleles interacting with IRF5 region genes), Model 3 (3-dimensional HLA alleles interacting with IRF5 region genes). The odds ratios indicate association strength for SLE patients relative to control population for each model.

[0025] FIG. 12A. Bar plots show composite odds ratios (bars ± 95% CI; log scale, top panels) and corresponding -log10 p-values (bottom panels) for high-, medium-, and low-risk variant classes across SLE European American (SLE-EA), African American (SLE-AA), Hispanic American (SLE-HA), and Sjogren’s Syndrome European American (SS-EA) cohorts. Discovery: full risk factors set applied on discovery dataset; Validation: replicated risk factors set applied on validation dataset; Meta: replicated risk factors set applied to discovery and validation datasets. Horizontal dotted lines mark the nominal significance threshold (p = 0.05).

[0026] FIG. 12B. Association analysis results for the Model 3, with the 95% confidence intervals shown, for the odds ratios and cumulative coverage (percentage of subjects that carry the gene-gene combination factors in each cohort). The positive predictive value (PPV) and negative predictive value (NPV) are shown for the high-risk groups.

[0027] FIG. 13A. Association between composite risk odds ratios and HLA allele age category. Each point represents a composite genetic risk factor, grouped by HLA allele age (0: all alleles are more than 100k years old, 1: contains at least a newer allele that is less than 100k years old). The y-axis shows odds ratios (OR) from the training dataset. Mean and 95% confidence intervals are shown as diamonds and whiskers. A two-sample t-test assuming unequal variances indicated that composite risk factors containing at least one younger allele had significantly higher odds ratios than those composed of older alleles (p=1.6e-7), suggesting the increased risk for SLE / SS diagnosis may be associated with newer HLA alleles.

[0028] FIG. 13B. SLE EA cohort HLA risk factors for each HLA locus, broken down by the allele age groups (N: newer alleles, less than 100k years old; O: older alleles, more than 100k years old), for the protective, low-medium risk (M+L), high risk cases. Similar patterns are observed across all loci, where higher risk factors are associated with higher proportions of newer allele pairs (N / N), and lower proportions of older allele pairs (O / O), with the exception of the DQA1 & DQB1 loci.

Detailed Description of Certain Implementations

[0029] By way of introduction and overview, a computerized method and system are provided, including a new genetic model for autoimmune disease, complex multi-locus gene-gene interaction, which can be usable in diagnostic and preventive strategies. The present disclosure includes features assessing a number of gene-gene interactions between the human leukocyte antigen (HLA) and interferon regulatory factor 5 (IRF5) gene regions that cumulatively result in an extremely strong predictive model for lupus, for example, in European ancestry (OR=583.5, p=3.6 x 10^-68) at a respective level of clinical utility. These gene-gene interactions can be characterized by complex compound heterozygosity in the HLA region and synergy with the IRF5 region. High-risk combinations can be associated with more severe disease in the lupus cohort. Similar but distinct interactions are observable in lupus patients of other ancestral backgrounds and Sjogren’s disease patients, which supports a conclusion of relevance of this overall genetic model in autoimmune disease. HLA alleles which arose in the human lineage <100,000 years ago are over-represented in high-risk combinations, and older alleles are more common in protective combinations. This suggests evolutionary pressure from infectious diseases resulting in pro-inflammatory HLA alleles that confer autoimmunity risk.

[0030] Both the HLA and IRF5 gene regions have been strongly associated with systemic lupus erythematosus (SLE), as well as autoantibody formation in SLE patients. One hypothesis set forth herein is that IRF5 variants may interact with HLA variants to predispose to autoimmunity. Modeling well characterized risk alleles for SLE from the IRF5 and HLA regions, in combinations, revealed an expected association with SLE and additive risk in combinations. FIG. 1 illustrates charts representing an association analysis representing risk epistasis due to HLA+ (DRB1*3 carrier) and IRF5+ (TACA carrier) gene-gene interaction, in which non-carriers are indicated by the minus postfix (HLA- or IRF5-). As illustrated in FIG. 1, the upper and lower 95% confidence intervals are shown by the lines. Association analysis results are shown in tabular form. More particularly, FIG. 1 indicates, via an association analysis, risk epistasis due to HLA+ (DRB1*3 carrier) and IRF5+ (TACA carrier) gene-gene interaction, with non-carriers indicated by the minus postfix (HLA- or IRF5-).
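The association measures used throughout (odds ratios with 95% confidence intervals) can be illustrated with a short sketch. The disclosure does not state how its intervals are computed; the log-scale (Woolf) interval below is an assumption, as are the function names. The frequency-based helper can be checked against the HLA+ IRF5+ row of Table 2 (10.07% of cases, 3.84% of controls, reported OR 2.80).

```python
from math import exp, log, sqrt


def odds_ratio_from_freqs(case_freq: float, control_freq: float) -> float:
    """Odds ratio from carrier frequencies (fractions between 0 and 1)."""
    case_odds = case_freq / (1.0 - case_freq)
    control_odds = control_freq / (1.0 - control_freq)
    return case_odds / control_odds


def odds_ratio_with_ci(carrier_cases: int, noncarrier_cases: int,
                       carrier_controls: int, noncarrier_controls: int,
                       z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio and an approximate 95% CI on the log scale (Woolf method).

    Assumption: all four counts are positive; zero cells need a
    continuity correction that is not shown here.
    """
    a, b, c, d = carrier_cases, noncarrier_cases, carrier_controls, noncarrier_controls
    or_value = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(or_value) - z * se_log_or)
    upper = exp(log(or_value) + z * se_log_or)
    return or_value, lower, upper


# Carrier frequencies for the HLA+ IRF5+ group as listed in Table 2.
or_double_carrier = odds_ratio_from_freqs(0.1007, 0.0384)
```

The confidence interval requires the underlying counts, which Table 2 does not list, so only the point estimate is reproduced here.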

[0031] Furthermore, more complex interactions are detected comprehensively, including by examining combinations of major common haplotypes across the HLA region, as well as the interaction of IRF5 gene region variants with these HLA region pairs. In operation, reported HLA association with SLE fits a compound heterozygosity model, in which the associated allele exerts a stronger effect in the heterozygous state than the homozygous state. A potential synergy between different HLA alleles in susceptibility is at least suggested. Moreover, the present disclosure includes steps associated with examining combinations of HLA alleles across the class I and class II HLA region while focusing on HLA alleles that individually show significant associations. Moreover, an interaction between these HLA combinations and common variants in the IRF5 region is sought, as described in greater detail herein. The IRF5 region, for example, can be represented as three separate linkage disequilibrium (LD) regions, the KCP gene upstream of IRF5, the region containing and immediately surrounding IRF5, and the downstream region that contains the TNPO3 gene. These different regions are chosen at least partially because they are not in strong LD with each other, as illustrated in FIG. 2, which represents a LD heatmap, in which these three regions are modeled with the HLA pairs separately. The heatmap illustrates pairwise linkage disequilibrium (LD) across the analyzed single nucleotide polymorphisms (SNPs) in IRF5 regions, represented by r^2 values. Higher r^2 values (darker shading) indicate stronger LD between pairs of SNPs, signifying closer genetic association, whereas lower r^2 values (lighter shading) represent weaker LD.
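The r^2 measure shown in the LD heatmap can be computed for one SNP pair as sketched below. The sketch assumes phased haplotypes coded 0/1 at each SNP, which the disclosure does not specify; the function name and input format are hypothetical.

```python
def ld_r2(haplotypes: list[tuple[int, int]]) -> float:
    """Pairwise LD (r^2) between two biallelic SNPs.

    Each haplotype is (allele at SNP A, allele at SNP B), coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n        # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n        # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                           # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom
```

Values near 1 correspond to the darker cells of such a heatmap and values near 0 to the lighter cells, so regions with low pairwise r^2 can reasonably be modeled separately.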

[0032] The present disclosure includes systems and methods that identify combinations of alleles that were present at significant frequencies in SLE patients, which were not found in controls. Accordingly, these identified combinations form a high-risk group. The approach set forth herein for this group mirrors that used in monogenic disease, in which the abnormal disease associated variant is not commonly found in healthy populations. FIG. 3 illustrates the various types of interaction patterns in the high-risk group that were observed between the maternal and paternal HLA alleles and the IRF5 region in each ancestral background and disease category. High-risk factor counts are represented for SLE European-American (EA), SLE African-American (AA), SLE Hispanic-American (HA), SS European-American (EA) for HLA class combinations interacting with IRF5 region genes (KCP, IRF5, TNPO3) and with no interaction (HLA only). Chromosome 6 pairs are represented for HLA genes at the top (class I: yellow, class II: orange), and chromosome 7 for IRF5 region genes at the right side. There are 35, 25, 23, 3 high-risk factors for SLE EA, SLE AA, SLE HA, SS EA respectively. Accordingly, the HLA allele pairs largely fit the compound heterozygosity model and frequently include alleles in both class I and class II HLA regions. Interestingly, class II / class II allelic combinations appear more common in African-American SLE high-risk combinations as compared to European-American SLE (p=0.02). These associated high-risk combinations are epistatic multiplicative gene-gene interactions between HLA allele pairs and IRF5 region variants, which supports a staggering level of complexity in the interaction between these two chromosomal regions.
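One way to picture how a subject is matched against such combinations is sketched below. It assumes phased HLA alleles per strand and reads a combination as "these alleles on one chromosome 6 strand, those alleles on the other, plus an optional IRF5-region variant"; the class names, fields, and matching rule are hypothetical and not taken from the disclosure. Group membership is the logical OR over the individual factors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RiskFactor:
    strand_one: frozenset          # HLA alleles required on one strand
    strand_two: frozenset          # HLA alleles required on the other strand
    irf5_variant: Optional[str] = None   # None models an "HLA only" factor


@dataclass(frozen=True)
class Subject:
    maternal: frozenset            # HLA alleles on the maternal strand
    paternal: frozenset            # HLA alleles on the paternal strand
    irf5_variants: frozenset       # IRF5-region variants carried


def carries(subject: Subject, factor: RiskFactor) -> bool:
    """True if the subject matches the factor in either strand orientation."""
    hla_match = (
        (factor.strand_one <= subject.maternal and factor.strand_two <= subject.paternal)
        or (factor.strand_one <= subject.paternal and factor.strand_two <= subject.maternal)
    )
    irf5_match = (factor.irf5_variant is None
                  or factor.irf5_variant in subject.irf5_variants)
    return hla_match and irf5_match


def in_group(subject: Subject, factors: list) -> bool:
    """Group risk factor: logical OR over the individual factors."""
    return any(carries(subject, f) for f in factors)


def case_only_factors(factors: list, cases: list, controls: list) -> list:
    """Factors carried by at least one case and by no control."""
    return [f for f in factors
            if any(carries(s, f) for s in cases)
            and not any(carries(s, f) for s in controls)]
```

The significance filtering that the disclosure applies before a combination enters the high-risk group is omitted here for brevity.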

[0033] Table 1, below, shows the top 3 individual high-risk combinations for each IRF5 gene region in European ancestry. Top 3 high-risk factors for SLE EA sorted by odds ratio, for each interaction group (HLA+KCP, HLA+IRF5, HLA+TNPO3, HLA only). The odds ratios (OR) indicate association strength for SLE patients relative to control population for subjects that carry the respective risk factors. The epistatic coefficient is the difference between the observed OR and the expected OR for an additive model. A positive epistatic coefficient indicates a synergistic or enhanced interaction that is stronger than expected, if they were simply additive. Further, odds ratios are provided for high-risk classes for each cohort studied (SLE EA, SLE AA, SLE HA, SS EA), comparing the association strength for the sub-models of HLA risk factors, without (HLA only) and with IRF5 region gene interactions (KCP, IRF5, TNPO3), and for the combined models that include all risk factors. The asterisks indicate the p-values for each model.
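The epistatic coefficient defined above can be written out as a short sketch. The disclosure does not spell out how the expected additive OR is obtained; the form used here (sum of the single-factor ORs minus 1, a common convention for additive excess risk) is an assumption, and the function names are hypothetical.

```python
def expected_additive_or(or_hla: float, or_irf5: float) -> float:
    """Expected joint OR if the two effects were simply additive.

    Assumption: additive excess risks, i.e. (OR_a - 1) + (OR_b - 1) + 1.
    """
    return or_hla + or_irf5 - 1.0


def epistatic_coefficient(observed_or: float, or_hla: float, or_irf5: float) -> float:
    """Observed OR minus the OR expected under the additive model.

    A positive value indicates an interaction stronger than additive.
    """
    return observed_or - expected_additive_or(or_hla, or_irf5)
```

Under this convention, single-factor ORs of 2.0 and 3.0 give an expected additive OR of 4.0, so an observed joint OR of 10.0 would correspond to a coefficient of 6.0 (illustrative numbers only).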

[0034] With reference to FIG. 4, Venn diagrams can represent each of the sub-models, and the numbers show the subject carrier counts in patient cohorts, showing minimal overlaps, suggesting potentially unique mechanism / pathways for each of the sub-models. The full list of high-risk combinations is shown in Table 2, including Bonferroni p-value threshold of 1.3E-6 to account for the number of possible allelic pairs. These epistatic gene-gene combinations not found in controls were combined to form the high-risk group. Grouping these allelic combinations together can produce a predictive power similar to that of a monogenic disorder with high penetrance. The strength of association of these HLA / IRF5 combinations organized by the various IRF5 regions in high-risk class is shown in FIG. 4. The absence of the high-risk combinations in controls is consistent with an estimated very low frequency, given known allele frequencies and the assumption of independent assortment of these loci that reside on different chromosomes (maternal and paternal strands at chromosome 6 for HLA, and chromosome 7 for IRF5). Expected frequencies of such high-risk allelic combinations in controls were estimated using a Monte Carlo simulation, using their allelic frequencies and assuming independent assortment, as described in greater detail herein. As shown in Tables 1 and 2, frequencies of these SLE-associated combinations range from 3.4e-4 (1 in 2976 people) to 1.5e-6. 12.7 percent of SLE patients had one of these high-risk combinations, supporting the idea that this model explains a significant proportion of the liability to SLE. Interestingly, the high-risk (IRF5 haplotype) group had statistically significant associations with more severe clinical manifestations of SLE (ACR score >9), OR=8.1 (1.6-41.5), as represented in Table 2.

Table 1

| Risk Factor | Interaction | Case freq. (obs.) | Control freq. (est.) | Odds Ratio (95% CI) | P-value | Epistatic Coefficient |
|---|---|---|---|---|---|---|
| DPB1*1 / C*7 & B*27 & rs1495461_CT | HLA+KCP | 4.2E-03 | 3.5E-05 | 112.32 (45.35-278.23) | 1.7E-09 | 11.90 |
| DRB1*3 / DRB1*15 & DPB1*5 & rs1495461_CT | HLA+KCP | 3.4E-03 | 4.3E-05 | 78.30 (28.69-213.72) | 3.1E-07 | 14.62 |
| DRB1*15 / DQB1*2 & C*3 & rs1495461_CT | HLA+KCP | 6.7E-03 | 3.4E-04 | 20.10 (9.99-40.46) | 1.2E-08 | 27.31 |
| B*8 / B*37 & C*7 & TATA | HLA+IRF5 | 3.4E-03 | 1.9E-05 | 177.21 (63.15-497.28) | 1.4E-08 | 15.48 |
| DRB1*10 / DQB1*2 & C*6 & GCTA | HLA+IRF5 | 3.4E-03 | 2.2E-05 | 153.04 (54.90-426.60) | 2.4E-08 | 13.56 |
| B*7 / C*7 & A*3 & TACAhom | HLA+IRF5 | 3.4E-03 | 2.6E-05 | 132.04 (47.65-365.90) | 4.2E-08 | 12.95 |
| DQA1*5 / DQA1*5 & C*2 & TNPO3 | HLA+TNPO3 | 4.2E-03 | 1.5E-06 | 2808.20 (670.36-11763.81) | 4.2E-15 | 21.39 |
| C*12 / DQB1*2 & DQA1*3 & TNPO3 | HLA+TNPO3 | 3.4E-03 | 2.5E-06 | 1346.80 (361.21-5021.57) | 1.6E-11 | 13.11 |
| DRB1*15 / A*1 & DPB1*1 & TNPO3 | HLA+TNPO3 | 5.0E-03 | 4.0E-06 | 1264.75 (438.17-3650.65) | 1.3E-16 | 23.05 |
| B*18 & A*3 / B*8 | HLA only | 5.0E-03 | 7.9E-05 | 64.03 (28.28-144.98) | 1.0E-09 | 19.26 |
| DRB1*15 & C*15 / A*1 | HLA only | 4.2E-03 | 1.0E-04 | 41.70 (17.14-101.46) | 2.0E-07 | 15.32 |
| B*27 / DQB1*2 & DQA1*3 | HLA only | 5.0E-03 | 2.3E-04 | 21.80 (9.72-48.88) | 5.1E-07 | 20.36 |

Table 2

| Risk Factors | HLA Allele | Frequency (%), Case (N) | Frequency (%), Control (N) | OR (95% CI) | p-value |
|---|---|---|---|---|---|
| HLA- IRF5- | DRB1*3 & TACA | 51.01 | 66.73 | 0.52 (0.45-0.60) | 1.1 x 10^-18 |
| HLA- IRF5+ | DRB1*3 & TACA | 21.81 | 16.21 | 1.44 (1.20-1.73) | 5.1 x 10^-5 |
| HLA+ IRF5- | DRB1*3 & TACA | 17.11 | 13.22 | 1.36 (1.11-1.65) | 1.6 x 10^-3 |
| HLA+ IRF5+ | DRB1*3 & TACA | 10.07 | 3.84 | 2.80 (2.09-3.77) | 3.3 x 10^-12 |
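The Monte Carlo estimate of a combination's expected frequency in controls, described in paragraph [0034], can be sketched as follows. This is a simplified illustration under stated assumptions: every allele is drawn as an independent Bernoulli event with its allele frequency (ignoring LD within a strand and mutual exclusivity at a locus), the combination may occur in either strand orientation, and an IRF5-region variant counts if present on either chromosome 7 copy. All names and input frequencies are hypothetical.

```python
import random
from math import prod


def analytic_frequency(strand_one_freqs, strand_two_freqs, irf5_freq):
    """Closed-form expectation under the same independence assumptions."""
    p1, p2 = prod(strand_one_freqs), prod(strand_two_freqs)
    hla = 2 * p1 * p2 - (p1 * p2) ** 2      # either strand orientation
    irf5 = 1 - (1 - irf5_freq) ** 2         # at least one of two copies
    return hla * irf5


def simulate_frequency(strand_one_freqs, strand_two_freqs, irf5_freq,
                       n_draws, seed=0):
    """Monte Carlo estimate of how often a random individual carries the combination."""
    rng = random.Random(seed)

    def strand_carries(freqs):
        # Draw every allele independently; the strand must carry all of them.
        draws = [rng.random() < f for f in freqs]
        return all(draws)

    hits = 0
    for _ in range(n_draws):
        mat_one = strand_carries(strand_one_freqs)
        mat_two = strand_carries(strand_two_freqs)
        pat_one = strand_carries(strand_one_freqs)
        pat_two = strand_carries(strand_two_freqs)
        hla = (mat_one and pat_two) or (mat_two and pat_one)
        copy_a = rng.random() < irf5_freq
        copy_b = rng.random() < irf5_freq
        if hla and (copy_a or copy_b):
            hits += 1
    return hits / n_draws
```

For combinations as rare as those in Table 1 (on the order of 1e-5 to 1e-6), a very large number of draws is needed before the simulated estimate stabilizes, which is why the closed-form value is a useful cross-check in this simplified setting.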

[0035] The present disclosure further includes systems and methods to explore data sets from non-European ancestry SLE patients using a similar interaction model such as, for example, shown in FIG. 4. In Amerindian ancestry SLE, the present disclosure reveals evidence for a similar epistatic model with the IRF5 risk haplotype TACA showing the strongest interaction with HLA compound heterozygous genotypes, and the odds ratio for the high-risk group in this ancestry is OR = 393.1 (24.3 - 6347.5), p=1.2 x 10^-43. The overall high-risk group is significantly associated with more severe disease (ACR score >7) with OR=1.9 (1-3.5), p=3.4e-2 and arthritis [OR=1.8 (1-3.2), p=3.9e-2] (see, for example, Table 2). In African-American subjects, the present disclosure again reveals an epistatic relationship between HLA and IRF5. The odds ratio for the African-American cohort for the high-risk group is OR = 368.7 (23 - 5941.5), p=6.5 x 10^-45. The overall high-risk group is significantly associated with more severe disease (ACR score >9) with OR=2.6 (1.4-4.6), p=2.5e-3 and renal disorder [OR=2.3 (1.3-4.1), p=1.8e-3] (Table 2). In the Venn diagrams shown in FIG. 4, each subject is plotted according to the number of high-risk genetic models fulfilled thereby. The vast majority of subjects fit into only one group. In the case of overlap, it is most commonly a different HLA pair interacting with an alternate IRF5 allele that results in being categorized in two models, supporting the idea that there are two distinct risk models operative in that person rather than simple correlation between nearby alleles resulting in dual categorization (see Table 2).

[0036] Moreover, the systems and methods set forth in the present disclosure can be effective to establish medium risk, low risk, and protective groups, as above in each ancestral background. The protective group can contain allele combinations that were rarely observed in patients, providing strong power to exclude disease (OR = 0.1 (p=2.4e-2), NPV = 99.8%, accuracy = 99.1%). While the observed odds ratios for the high-risk groups may be somewhat lower in non-European ancestry, it has been noted that the HLA region is not as strongly associated with SLE in non-European ancestry. One or more algorithms set forth in the present disclosure work well in these two additional ancestral backgrounds, thereby providing additional validation.

[0037] Further, the present disclosure provides systems and methods for analyzing data from Sjogren’s syndrome, another autoimmune disease in which both HLA and IRF5 are risk factors. Using the same algorithm, many complex epistatic interactions between HLA allele pairs and IRF5 can be revealed. Such combinations differ from those found in SLE, supporting differences in the genetic pathogenesis of these two autoimmune diseases. Pulling the high-risk alleles together in a group produced an OR of 95.7 (5.6 - 1629), p = 1.8 x 10^-8 for SS. These data demonstrate that the overall strategy extends beyond SLE to other autoimmune diseases, and provides yet another replication of the strategy.

[0038] FIG. 5A illustrates performance of the model at varying levels of complexity, from simple single gene on the left to the final model on the right, with the ORs observed in each model and subject category. As illustrated in FIG. 5A, association analysis results for SLE EA, SLE AA, SLE HA, SS EA show increasing strength from Model 0 (single alleles / haplotypes), Model 1 (single HLA alleles interacting with IRF5 region genes), Model 2 (2-dimensional HLA alleles interacting with IRF5 region genes), Model 3 (3-dimensional HLA alleles interacting with IRF5 region genes). Moreover, the odds ratios indicate association strength for SLE patients relative to control population for each model.

[0039] FIG. 5B shows the ORs of the various risk groups on a logarithmic scale, along with the frequency of that combination group, showing the percentage of patients that the gene-gene combination category applies to. In each case, the high-risk group represents a significant proportion of patients (4.2-19.3%). More particularly, association analysis results are shown in FIG. 5B for Model 3, with the 95% confidence intervals shown, for the odds ratios and cumulative coverage (percentage of subjects that carry the gene-gene combination factors in each cohort). The positive predictive value (PPV) and negative predictive value (NPV) are shown for the high-risk and protective groups.

[0040] FIGS. 6A and 6B illustrate box plots of risk and protective factors for each of the SLE EA, SLE AA, SLE HA, SS EA cohorts, grouped by HLA allele age (0: all alleles are more than 100k years old, 1: contains at least a newer allele that is less than 100k years old). Each point on the plot corresponds to a risk / protective factor. The lines connect the means of the groups. The SLE EA and SS EA cohorts show >95% significance in a t-test, where the odds ratios are higher for the factors that contain at least one newer allele, suggesting the increased risk for SLE / SS diagnosis may be associated with newer HLA alleles. For example, the estimated ages of the HLA alleles within the human lineage are examined using MAFFT alignment and BEAST software. Using a cutoff of <100,000 years old as a newer allele, and >100,000 years old as older alleles, the HLA alleles that combine with IRF5 are revealed to form the high-risk groups that are significantly enriched in newer alleles, while the protective group is enriched for older alleles, and the medium and low risk groups are intermediate. Interestingly, this pattern can be observed with class I alleles, and in class II alleles with the exception of DQA1 in which no age difference was observed. This suggests evolutionary pressure due to infectious disease driving more pro-inflammatory HLA allele development which then carries additional risk of SLE. Evolution and selection in the HLA region are complex, and the large number of alleles that can be identified in accordance with the present disclosure makes it likely that this relationship is complex.
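The comparison of odds ratios between the two allele-age groups can be sketched with an unequal-variance (Welch) t-statistic, the variant named in the description of FIG. 13A. The sketch returns only the statistic and the Welch-Satterthwaite degrees of freedom; converting these to a p-value needs a t-distribution routine that is not in the Python standard library. The function name and sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_new: list, group_old: list) -> tuple:
    """Welch t-statistic and degrees of freedom for two independent samples.

    group_new: ORs of factors containing at least one newer allele.
    group_old: ORs of factors composed only of older alleles.
    """
    n1, n2 = len(group_new), len(group_old)
    v1 = variance(group_new) / n1      # squared standard error, group 1
    v2 = variance(group_old) / n2      # squared standard error, group 2
    t_stat = (mean(group_new) - mean(group_old)) / sqrt(v1 + v2)
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, dof
```

A positive statistic means the newer-allele group has the higher mean OR. Whether the analysis was run on raw or log-transformed odds ratios is not stated in the source, so the input scale is left to the caller.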

[0041] FIG. 6C illustrates SLE EA cohort HLA risk factors for each HLA locus, broken down by the allele age groups (N: newer alleles, less than 100k years old; O: older alleles, more than 100k years old), for the protective, low-medium risk (M+L), high risk cases. Similar patterns are observed across all loci, where higher risk factors are associated with higher proportions of newer allele pairs (N / N), and lower proportions of older allele pairs (O / O), with the exception of the DQA1 locus.

[0042] The systems and methods provided herein indicate that both IRF5 and HLA are among the strongest common genetic risk factors for SLE, and these alleles are demonstrated to combine in powerfully synergistic ways to influence autoimmune disease risk. The degree of complexity is usable to explain the relatively small number of gene-gene interactions, despite the large number of risk loci documented in complex disease. Epistatic gene-gene interactions with complex compound heterozygosity would not be detected by typical interaction screening methods, which typically do not take into account multiple variants and multiple loci simultaneously. A combination of prior knowledge, biological hypotheses, and directed models may be required to identify this type of interaction.

[0043] The associated HLA regions are long haplotypes containing many immune system genes, and there is precedent that both regulatory elements as well as coding-change polymorphisms in MHC molecules mediate risk of autoimmunity. The HLA polymorphisms may alter MHC binding to peptides and thus mediate the break in tolerance to specific antigens, while regulatory variations impacting other immune system genes in the HLA region could synergize with IRF5-related immune activation to result in risk of SLE or Sjogren’s. The strength of association in the high-risk group is analogous to single gene defects associated with SLE, which have yielded important biological insights. It is expected that study of the biological impact of these allelic combinations will shed significant light on SLE pathogenesis. There is previous precedent for HLA class I / class II interactions in both clinical and serological autoimmunity, although dual compound class I / class II heterozygosity interacting with a non-HLA allele has not been reported.

[0044] The high-risk group HLA alleles are more often newer alleles in the human lineage (<100,000 years old), which is striking and may suggest the action of selective forces driving risk of disease. While this precedent has been set in some monogenic examples, such as the APOL1 gene, it is not certain whether this is the case in the HLA region. The HLA region is complex, containing a large number of immune system genes, and some studies have suggested that features of selection are present in this region. It is logical that selective pressure from infectious disease could drive selection at the HLA locus toward more pro-inflammatory alleles, and that this could result in risk of SLE. This could well be the case for IRF5 as well.

[0045] In accordance with the findings produced in accordance with the teachings herein, the gene-gene interactions upon SLE risk are an actionable result. With SLE prevalence of 164 per 100,000 (0.16%) in European ancestry women, the positive predictive values for each cohort can be predicted as a measure of clinical diagnostic utility. In European ancestry SLE, the high-risk group has a PPV of 97.6% for individuals carrying the high-risk combinations to develop SLE. For Amerindian ancestry SLE the PPV is 96.2%, African American ancestry SLE PPV is 96.4%, and European ancestry Sjogren’s PPV is 71.5% (see FIG. 5B). This novel strong predictive power can be useful in clinical applications, such as for early detection of autoimmune diseases (even at a very young age or for newborns). Moreover, the time to diagnosis can play a significant role in the outcome of the disease, as those diagnosed earlier can potentially seek preventive treatment or a close monitoring program. While HLA typing is likely too expensive to consider for the general population, the present disclosure makes screening in high-risk groups now feasible. First-degree relatives of SLE patients have increased risk of disease at 4-8% (21, 22). A first-degree relative that carries one of the reported high-risk IRF5 / HLA genotypes would have an almost determinative probability to develop the disease. Prevention trials for SLE are underway, and the risk genotypes that can be reported can help in strategizing recruitment. Prevention trials are currently limited by low rates of people developing disease, and enrichment of high-risk individuals would greatly increase the chance to see statistically significant differences at the end of the study.
In accordance with the systems and methods shown and described herein, complex genetic interactions between common risk alleles can be documented, resulting in significant overall liability to SLE and autoimmunity, and the novel genetic model of the present disclosure can provide a way forward to detect gene-gene interactions in other complex diseases.
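The link between disease prevalence, carrier frequencies, and PPV can be illustrated with Bayes' rule. The following is a minimal sketch only: the function name and the carrier frequencies used in the example call are hypothetical and are not values taken from the present disclosure; only the 0.16% prevalence comes from the text.

```python
def positive_predictive_value(prevalence, carrier_freq_cases, carrier_freq_controls):
    """Return P(disease | carrier) using Bayes' rule.

    prevalence            -- fraction of the population with the disease
    carrier_freq_cases    -- fraction of patients carrying the risk group
    carrier_freq_controls -- fraction of unaffected people carrying it
    """
    true_pos = carrier_freq_cases * prevalence
    false_pos = carrier_freq_controls * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no carriers expected in either group")
    return true_pos / (true_pos + false_pos)


# Prevalence of 0.16% is from the text; both carrier frequencies are assumed.
ppv = positive_predictive_value(0.0016, 0.12, 5e-6)
print(f"PPV = {ppv:.3f}")
```

Because the prevalence is low, the PPV is dominated by how rare the combination is among unaffected people, which is why an estimate of the control carrier frequency matters so much in this setting.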

[0046] With reference now to particular methodologies, in an example implementation of the present disclosure SLE patients were enrolled in the registry having met the American College of Rheumatology classification criteria for SLE, and Sjogren’s patients met the American-European Consensus Group (AECG) criteria (702, FIG. 7). Healthy controls included 1502 European-American, 473 Amerindian, and 1372 African-American control samples from OMRF, and subject data from the 1000 Genomes database was also examined as an additional control data set (503 European, 347 Amerindian, and 661 African ancestry control samples). Genetic ancestry was confirmed in all subjects and no duplicates were included.

[0047] With regard to genotyping (704), subjects in the OMRF registry and the healthy controls can be genotyped and passed through quality control for submission to the IMMUNOCHIP genotyping platform. HLA haplotypes can be imputed from the SNPs available on the chip, such as 5054 SNPs used for the HLA imputation. All case and control data can be imputed together. IRF5 SNP rs2004640 can be imputed using a server computing device, such as the Michigan Imputation Server. HLA alleles can be imputed for all subjects using the same server and HIBAG. Further, HLA alleles with a posterior probability >0.9, >0.8, >0.7 for European-American, Amerindian, and African-American, respectively, were taken forward into subsequent analyses.
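The ancestry-specific posterior probability filter described above can be sketched as follows. This is an illustrative sketch only: the record layout (keys 'subject', 'ancestry', 'allele', 'posterior') and the example calls are hypothetical, and only the three cut-off values come from the text.

```python
# Ancestry-specific posterior probability cut-offs taken from the text.
POSTERIOR_CUTOFFS = {
    "European-American": 0.9,
    "Amerindian": 0.8,
    "African-American": 0.7,
}


def filter_imputed_alleles(calls):
    """Keep imputed HLA calls whose posterior exceeds the ancestry cut-off.

    `calls` is a list of dicts with the hypothetical keys
    'subject', 'ancestry', 'allele', and 'posterior'.
    """
    return [c for c in calls if c["posterior"] > POSTERIOR_CUTOFFS[c["ancestry"]]]


# Hypothetical imputed calls, for illustration only.
example_calls = [
    {"subject": "S1", "ancestry": "European-American", "allele": "DRB1*03", "posterior": 0.95},
    {"subject": "S2", "ancestry": "European-American", "allele": "DRB1*15", "posterior": 0.85},
    {"subject": "S3", "ancestry": "African-American", "allele": "B*08", "posterior": 0.75},
    {"subject": "S4", "ancestry": "Amerindian", "allele": "A*02", "posterior": 0.80},
]
kept = filter_imputed_alleles(example_calls)
print([c["subject"] for c in kept])
```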

[0048] Thereafter, risk factor models are developed and statistical analysis is performed (706). Risk factor models can be developed in multiple stages 708 independently in SLE for each ancestry separately (European-American, Hispanic-American, African-American) and for Sjogren’s disease, using a similar framework. An example of this framework is described below for each of example stages 1-6.

[0049] At stage 1, multiple Fisher’s exact tests can be performed to check for association between each HLA allele variant and the outcome (diagnosis vs control). The number of potential risk factors tested in this stage can be N1,in, and N1,out factors are obtained that meet suitable significance criteria C1 for this stage.
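A stage 1 style scan can be sketched as below. This is a minimal sketch under stated assumptions: the two-sided Fisher's exact test is implemented directly from the hypergeometric distribution using only the standard library, the function names are hypothetical, the carrier counts are invented for illustration, and a plain 0.05 cut-off stands in for the significance criteria C1, which the text does not specify.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def stage1_scan(carrier_counts, n_cases, n_controls, alpha=0.05):
    """Return alleles whose association p-value is below alpha.

    carrier_counts maps allele -> (carriers among cases, carriers among controls).
    """
    selected = []
    for allele, (case_carriers, control_carriers) in carrier_counts.items():
        p = fisher_exact_two_sided(case_carriers, n_cases - case_carriers,
                                   control_carriers, n_controls - control_carriers)
        if p < alpha:
            selected.append(allele)
    return selected


# Invented counts for illustration only.
counts = {"DRB1*03": (300, 150), "A*02": (500, 500)}
passed = stage1_scan(counts, n_cases=1000, n_controls=1000)
print(f"N1,in = {len(counts)}, N1,out = {len(passed)}: {passed}")
```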

[0050] At stage 2, 3-component risk factor association scans (using Fisher’s exact tests) can be performed. Each risk factor can contain three components: a left-strand HLA allele (for each of the N1,out factors in Stage 1), a right-strand HLA allele (for each of the N1,out factors in Stage 1), and a 3rd component for one of IRF5, KCP, TNPO3, or no third component. IRF5: subjects that carry each specific common IRF5 haplotype (TACA, TATA, GCTG, GCTA, TCTA) are treated as risk factor carriers. KCP: subjects that carry each specific allele pair of SNP rs1495461 are treated as risk factor carriers. TNPO3: a logistic regression model can be applied with Least Absolute Shrinkage and Selection Operator (LASSO) regularization as a predictor based on the 85 SNPs (GRCh37 chromosome positions 128593948 to 128695983). Further, a cross-validation method can be used to optimize the model. Subjects with predicted probability >0.5 can be treated as risk factor carriers, and as non-carriers otherwise. The models have AUC = 0.7, 0.74, 0.69 for EA, HA and AA, respectively. Further, HLA only: no 3rd component.
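The structure of a 3-component risk factor can be sketched as follows. This is an illustrative sketch, not the disclosed implementation: the subject record layout, the function names, and the label strings are hypothetical. It also assumes that the "left" and "right" strand labels are interchangeable, so a factor matches when its two HLA alleles sit on opposite strands in either orientation, and unordered allele pairs are enumerated once.

```python
from itertools import combinations_with_replacement


def enumerate_factors(stage1_alleles, third_components):
    """List candidate (left allele, right allele, third component) factors.

    A third component of None represents the 'HLA only' case.
    """
    pairs = combinations_with_replacement(sorted(stage1_alleles), 2)
    return [(left, right, third) for left, right in pairs for third in third_components]


def carries_factor(subject, factor):
    """True if the subject carries the two HLA alleles on opposite strands
    and, when a third component is given, carries that component as well."""
    left, right, third = factor
    strand1, strand2 = subject["strands"]
    hla_ok = ((left in strand1 and right in strand2)
              or (left in strand2 and right in strand1))
    third_ok = third is None or third in subject["third"]
    return hla_ok and third_ok


# Hypothetical subject: alleles grouped by chromosome strand, plus third components.
subject = {
    "strands": ({"DRB1*03", "B*08"}, {"DRB1*15", "A*02"}),
    "third": {"IRF5:TACA"},
}
factors = enumerate_factors({"DRB1*03", "DRB1*15"}, ["IRF5:TACA", None])
print(len(factors), "candidate factors")
print(carries_factor(subject, ("DRB1*15", "DRB1*03", "IRF5:TACA")))
```

In this sketch a pair such as DRB1*03 with B*08 does not count for the hypothetical subject, because both alleles sit on the same strand.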

[0051] In accordance with an example implementation of the present disclosure, in total, there are N2,in potential factors tested in Stage 2, which results in N2,out factors that meet suitable significance criteria C2 for this stage.

[0052] At stage 3, for the N2,in factors from Stage 2 that are not in N2,out and meet criteria, the dimensionality and specificity of the HLA risk factors are expanded by including multiple allele combinations across multiple loci on one of the chromosome strands. This results in 4-component risk factors, on which association analysis (using Fisher’s exact test) can be performed. Each 4-component risk factor can contain: a left-strand HLA allele, a right-strand HLA allele, a 3rd component (IRF5, KCP, TNPO3, or none), and a 4th HLA allele component that is located on one of the two strands. In total, there can be N3,in potential factors tested in Stage 3, which results in N3,out factors that meet suitable significance criteria C3 for this stage.

[0053] At stage 4, a classification system can be applied to the N3,out factors according to the following criteria. Combinations that have an odds ratio > 10, meet 95% significance, and are not found at all in controls are classified into the high-risk group. Combinations that have an odds ratio > 10, meet 95% significance, and have at least 1 sample in controls are classified into the medium-risk group. Factors that have an odds ratio of 8-10 with 95% significance are grouped into the low-risk group, and factors that have an odds ratio < 0.1 with 95% significance are grouped into the protective group. Regularized regression (using ELASTIC NET with AIC validation to optimize the regularization hyperparameters) can be applied, which resulted in 101 significant risk factors (effect test p-value < 0.05): 65 for high-risk, 19 for medium-risk, 12 for low-risk, and 5 for protective groups. ELASTIC NET was used to identify risk factors that contribute significantly to disease association, remove correlated / redundant factors, and penalize model complexity (this is particularly helpful for excluding HLA risk factors that are in LD), which is effective to manage multicollinearity and to perform model feature selection to prevent overfitting. The result of Stage 4 is N4,out risk factors, classified into several groups: N4,out,high factors in the high-risk group, N4,out,medium in the medium-risk group, N4,out,low in the low-risk group, and N4,out,protective in the protective group.
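The stage 4 classification rules can be written out as a small decision function. This is a sketch only: the function name is hypothetical, the regularized regression step is not shown, and the handling of boundary values (an odds ratio of exactly 10 is placed in the low-risk group here) is an assumption, since the text does not say which side the boundaries fall on.

```python
def classify_factor(odds_ratio, significant, control_carriers):
    """Assign a risk group from the odds ratio, significance, and control count.

    Returns 'high', 'medium', 'low', 'protective', or None when no rule applies.
    """
    if not significant:
        return None
    if odds_ratio > 10:
        # High risk requires complete absence of carriers among controls.
        return "high" if control_carriers == 0 else "medium"
    if 8 <= odds_ratio <= 10:
        return "low"
    if odds_ratio < 0.1:
        return "protective"
    return None


# Hypothetical factors: (odds ratio, meets 95% significance, carriers in controls).
examples = [(93.3, True, 0), (21.5, True, 3), (9.0, True, 12), (0.05, True, 40), (3.0, True, 5)]
print([classify_factor(*e) for e in examples])
```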

[0054] At stage 5, for each of the 4 risk groups (protective, low, medium, high), a group risk factor can be defined by combining each of the intra-group combinations using logical OR operators. For example, a high-risk group carrier carries at least one of the N4,out,high factors. Association analysis against the diagnosis (vs control) can then be performed on each of the 4 group risk factors using Fisher’s exact test, and the group risk factors defined as Fprotective, Flow, Fmedium, and Fhigh.
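The logical OR used to form a group risk factor amounts to a set union over the carriers of each individual factor. The sketch below shows this and builds the 2x2 table on which an association test could then be run; the function names, factor labels, and subject identifiers are hypothetical.

```python
def group_carriers(factor_carriers):
    """Logical OR across factors: a subject is a group carrier if they
    carry at least one of the factors in the group."""
    carriers = set()
    for subjects in factor_carriers.values():
        carriers |= set(subjects)
    return carriers


def group_contingency_table(carriers, cases, controls):
    """2x2 table [[case carriers, case non-carriers],
                  [control carriers, control non-carriers]]."""
    cases, controls = set(cases), set(controls)
    a = len(carriers & cases)
    c = len(carriers & controls)
    return [[a, len(cases) - a], [c, len(controls) - c]]


# Hypothetical high-risk factors and the subjects carrying each of them.
high_risk = {"factor_1": {"P1", "P2"}, "factor_2": {"P2", "P3"}}
cases = ["P1", "P2", "P3", "P4", "P5"]
controls = ["C1", "C2", "C3", "C4", "C5"]
f_high = group_carriers(high_risk)
table = group_contingency_table(f_high, cases, controls)
print(sorted(f_high), table)
```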

[0055] At stage 6, to account for the collective Type 1 error rate from multiple hypothesis testing performed across Stages 1-5, Bonferroni correction factors can be applied to the significance thresholds for each of the risk factors in Fprotective, Flow, Fmedium, Fhigh, to account for the number of possible combinations. To aid with this analysis, the limited control samples can be expanded using a Monte Carlo method to simulate the upper bounds of the risk carrier frequencies in the control cohort, based on independent assortment of 3-dimensional components (HLA left strand, HLA right strand, and the 3rd component as defined in Stage 2). Given the associations observed from the sample data of these risk factors, the true carrier frequencies are expected to be lower than independent random assortment frequencies. The Monte Carlo simulation can be performed for 2 million samples, which is useful to obtain a non-zero factor carrier count. For example, most SLE EA high-risk factors have carrier frequencies between 1 in ~10 thousand and 1 in ~700 thousand in the control cohort, as opposed to only ~2000 control samples.
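The idea of simulating control carrier frequencies under independent assortment can be sketched as follows. This is a simplified illustration, not the disclosed simulation: each of the three components is drawn as an independent Bernoulli event with an assumed marginal frequency, the function name and the frequencies are hypothetical, and the example call uses fewer samples than the 2 million mentioned in the text so that it runs quickly.

```python
import random


def simulate_carrier_frequency(p_left, p_right, p_third, n_samples=2_000_000, seed=0):
    """Estimate how often all three components co-occur when each is drawn
    independently with the given marginal frequency.

    Returns (carrier count, carrier frequency).
    """
    rng = random.Random(seed)
    carriers = 0
    for _ in range(n_samples):
        if (rng.random() < p_left
                and rng.random() < p_right
                and rng.random() < p_third):
            carriers += 1
    return carriers, carriers / n_samples


# Assumed marginal frequencies, for illustration only.
count, freq = simulate_carrier_frequency(0.2, 0.3, 0.5, n_samples=200_000, seed=1)
print(f"{count} simulated carriers, frequency {freq:.4f} (product of marginals: 0.03)")
```

With rare components the expected count in a cohort of a few thousand controls is well below one, which is why a much larger simulated cohort is needed to obtain a non-zero carrier count.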

[0056] Using the expanded control sample data from the Monte Carlo simulation and the complete case cohort data, Bonferroni correction can be applied for each of the individual factors in Fprotective, Flow, Fmedium, Fhigh to identify the individual risk factors that have p values less than the Bonferroni-corrected 95% significance thresholds. This can result in N'4,out,protective, N'4,out,low, N'4,out,medium, and N'4,out,high factors for each of the risk groups.
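A Bonferroni-corrected filter of this kind can be sketched in a few lines. The function name, the p-values, and the number of tests below are hypothetical; the number of possible combinations actually used as the correction factor is not restated here.

```python
def bonferroni_significant(p_values, n_tests, alpha=0.05):
    """Keep factors whose p-value is below alpha divided by the number of tests."""
    threshold = alpha / n_tests
    return {name: p for name, p in p_values.items() if p < threshold}


# Hypothetical p-values and test count, for illustration only.
p_values = {"factor_1": 1e-9, "factor_2": 1e-4}
kept_factors = bonferroni_significant(p_values, n_tests=10_000)
print(kept_factors)
```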

[0057] Moreover, for completeness, the N'4,out,protective, N'4,out,low, N'4,out,medium, and N'4,out,high factors can, as in Stage 5, form the risk group factors F'protective, F'low, F'medium, and F'high, and an additional Bonferroni correction can be applied for combining the intra-group factors.

[0058] In connection with allele age, a phylogenetic analysis can be performed to estimate the age of HLA alleles by constructing a phylogenetic tree and using the branch lengths to estimate divergence times. HLA nucleotide sequence data can be retrieved, for example, from the IPD-IMGT/HLA database, and multiple sequence alignment can be performed using a multiple sequence alignment program (MAFFT), followed by analysis with Bayesian evolutionary analysis sampling trees (BEAST) software, which uses Bayesian statistical methods to infer evolutionary relationships, divergence times, and demographic history from the genetic sequence data.

[0059] In connection with clinical phenotypes, an association analysis can be performed for the case cohorts between the group risk factors Fprotective, Flow, Fmedium, Fhigh against each clinical manifestation using Fisher’s exact tests.

[0060] Accordingly, and as shown and described herein, application of a new genetic model for autoimmune disease and complex multi-locus gene-gene interaction is provided, which is usable, for example, in diagnostic and preventive strategies.

[0061] Referring now to FIG. 8, a block diagram is shown illustrating an example implementation of the present disclosure and that represents an association of a plurality of devices and the flow 108 of information associated with the devices. In the example shown in FIG. 8, various computing devices 802 and 804 are shown, each capable of executing desktop and / or mobile computing device web browser application(s) including MICROSOFT EDGE, INTERNET EXPLORER, CHROME, FIREFOX, and other (e.g., SAFARI, OPERA). In addition to standard web browser application functionality, user information can be gathered via Push Notifications, and information can be retrieved from a computing device using a “REST” interface. Various mobile devices running different operating systems are shown, including IOS, ANDROID and other (e.g., PALM, WINDOWS or other mobile device) operating system.

[0062] In the example shown in FIG. 8, one or more data processing apparatuses 802 is operatively coupled to one or more user computing device(s) 804. Devices 802 / 804 can be respectively operated by one or more users skilled in the use of the proposed workflow, including, but not limited to, healthcare providers and associated staff, medical specialists, and / or biomechanical specialists. Healthcare providers can include, for example, physicians, physician assistants, nurses, therapists and / or other providers of healthcare services. Biomechanical specialists can include, for example, engineers specialized in biomechanics. Data processing apparatus 802 and / or user computing device 804 can be operable to access and / or store various information on database(s) 803 including, for example, historic medical and procedure information for patients, physicians, devices, or the like.

[0063] Continuing with reference to FIG. 8, network 806 is illustrated, which can be configured as a local area network (LAN), wide area network (WAN), Peer-to-Peer network (“P2P”), Multi-Peer network, the Internet, one or more telephony networks or a combination thereof, that is operable to connect data processing apparatus 802 and / or devices. Though many of the examples and implementations shown and described herein relate to product and / or service recommendations, many other forms of content can be provided and / or delivered by system 800.

[0064] FIG. 9 is a block diagram that illustrates functional elements of one or more of data processing apparatus 802 or computing device 804 and preferably include one or more central processing units (CPU) 902 used to execute software code in order to control operations, including of data processing apparatus 802, read only memory (ROM) 904, random access memory (RAM) 906, one or more network interfaces 908 to transmit and receive data to and from other computing devices across a communication network, storage devices 910 such as a hard disk drive, solid state drive, universal serial bus (USB) drive, floppy disk drive, tape drive, CD-ROM or DVD drive for storing program code, databases and application code, one or more input devices 912 such as a keyboard, mouse, track ball and the like, and a display 914.

[0065] The various components of devices 802 and / or 804 need not be physically contained within the same chassis or even located in a single location. For example, storage device 910 can be located at a site which is remote from the remaining elements of computing devices 802 and / or 804 and can even be connected to CPU 902 across communication network 806 via network interface 908.

[0066] The functional elements shown in FIG. 9 (designated by reference numbers 902-914) are preferably of the same categories of functional elements preferably present in computing device 802 and / or 804. However, not all elements need be present, for example, storage devices in the case of mobile computing devices (e.g., smartphones), and the capacities of the various elements are arranged to accommodate expected user demand. For example, CPU 902 in computing device 804 can be of a smaller capacity than CPU 902 as present in data processing apparatus 802. Similarly, it is likely that data processing apparatus 802 will include storage devices 910 of a much higher capacity than storage devices 910 present in computing device 804. Of course, one of ordinary skill in the art will understand that the capacities of the functional elements can be adjusted as needed. For example, one or more graphics processing units (GPU) can be utilized for processing and providing functionality shown and described herein. In addition, or in the alternative, a cluster of computing devices can work to provide functionality shown and described herein.

[0067] The nature of the present disclosure is such that one skilled in the art of writing computer executed code (software) can implement the described functions using one or more or a combination of a popular computer programming language including but not limited to C++, JAVA, ACTIVEX, HTML, XML, ASP, SOAP, IOS, OBJECTIVE C, ANDROID, TORR, PYTHON, MATLAB, and various web application development environments.

[0068] As used herein, references to displaying data on computing device 804 refer to the process of communicating data to the computing device 804 across communication network 806 and processing the data such that the data can be viewed on the user computing device 804 display 914 using a web browser, custom application or the like. The display screens on computing devices 802 / 804 present areas within system 800 such that a user can proceed from area to area within the system 800 by selecting a desired link. Therefore, each user’s experience with system 800 will be based on the order with which (s)he progresses through the display screens. In other words, because the system is not completely hierarchical in its arrangement of display screens, users can proceed from area to area without the need to “backtrack” through a series of display screens. For that reason and unless stated otherwise, the following discussion is not intended to represent any sequential operation steps, but rather the discussion of the components of system 800.

Further Examples

[0069] Aspects of the present disclosure are now further described with reference to FIGS. 10-13B, which provide further data related to identifying combinations of alleles that are present at significant frequencies in SLE patients and were not found in controls. Specifically, the following data shows a number of gene-gene interactions between the HLA and IRF5 gene regions that cumulatively result in an extremely strong predictive model for lupus in European ancestry (meta-analysis of discovery and validation cohorts: OR = 981.9 (61.2-15,762.3), p = 2.2 x 10^-103), which is at the level of clinical utility. As with the foregoing data, these gene-gene interactions are overwhelmingly characterized by complex compound heterozygosity in the HLA region and synergy with the IRF5 region. The high-risk combinations were associated with more severe disease in the lupus cohort. Similar but distinct interactions were observed in lupus patients of other ancestral backgrounds and Sjogren’s disease patients, supporting the relevance of this overall genetic model in autoimmune disease. HLA alleles which arose in the human lineage <100,000 years ago are again over-represented in high-risk combinations, and older alleles are more common in lower risk combinations.

[0070] Patients and samples

[0071] SLE patients were enrolled in the registry having met the American College of Rheumatology classification criteria for SLE, and Sjogren’s patients met the AECG criteria. Patients were recruited at the Oklahoma Medical Research Foundation and the University of Alabama (1719 European-American, 698 Amerindian, and 1744 African-American SLE patient samples, and 264 European-American Sjogren’s patient samples). Healthy controls included 1502 European-American, 473 Amerindian, and 1372 African-American control samples from OMRF, and subject data from the 1000 Genomes database was also examined as an additional control data set (503 European, 347 Amerindian, and 661 African ancestry control samples). Genetic ancestry was confirmed in all subjects, and it was confirmed that there were no duplicates. Independent validation datasets from the All of Us Research Program included 989 European, 1,744 African, and 813 Hispanic SLE cases, 2,481 European Sjogren’s cases, and corresponding controls (4,942 European, 3,306 African, 3,979 Hispanic).

[0072] Genotyping

[0073] Subjects in the OMRF registry and the healthy controls were genotyped and passed quality control for the Immunochip genotyping platform. HLA haplotypes were imputed from the SNPs available on the chip; 5054 SNPs were used for the HLA imputation. All case and control data were imputed together. IRF5 SNP rs2004640 was imputed using the Michigan Imputation Server. Haplotype frequencies in SLE patients and controls were similar to known data. HLA alleles were imputed for all subjects using the same Michigan Imputation Server and HIBAG. HLA alleles with a posterior probability >0.9, >0.8, >0.7 for European-American, Amerindian, and African-American, respectively, were taken forward into subsequent analyses.

[0074] Risk factor models and statistical analysis

[0075] The risk factor models were developed in multiple stages independently in SLE for each ancestry separately (European-American, Hispanic-American, African-American) and for Sjogren’s disease using the same framework described herein with reference to FIG. 7 (stages 1-6).

[0076] Results

[0077] To detect more complex interactions comprehensively, combinations of all major common haplotypes across the HLA region were examined, and then the interaction of IRF5 gene region variants with these HLA region pairs was examined. Focusing on HLA alleles that individually showed significant associations, first, combinations of HLA alleles across the class I and class II HLA region were examined, and then interaction between these HLA combinations and common variants in the IRF5 region were examined.

[0078] The results indicated many combinations of alleles that were present at significant frequencies in SLE patients that were not found in the controls. These formed the high-risk group. The approach for this group mirrors that used in monogenic disease, in which the disease-associated variant is either not found or very rarely found in healthy populations.

[0079] FIG. 10 illustrates the various types of interaction patterns in the high-risk group that were observed between the maternal and paternal HLA alleles and the IRF5 region in each ancestral background and disease category. The HLA allele pairs largely fit the compound heterozygosity model, and they frequently included alleles in both class I and class II HLA regions. These associated high-risk combinations were epistatic multiplicative gene-gene interactions between HLA allele pairs and IRF5 region variants.

[0080] Table 3 below shows the top 10 individual high-risk combinations in European ancestry (EA). A Bonferroni p-value threshold of 6.1E-6 was used to account for the number of possible allelic pairs. These epistatic gene-gene combinations not found in controls were combined to form the high-risk group. Grouping these allelic combinations together produces a predictive power similar to that of a monogenic disorder with high penetrance. The odds ratio for the high-risk group in this ancestry is OR = 565.4 (35.2 - 9076.6), p = 3.9 x 10^-75. The strength of association of these HLA / IRF5 combinations organized by the various IRF5 regions in the high-risk class is shown in FIGS. 11A-11C. This absence of the high-risk combinations in controls is consistent with their estimated very low frequency, given known allele frequencies and the assumption of independent assortment of these loci that reside on different chromosomes (maternal and paternal strands at chromosome 6 for HLA, and chromosome 7 for IRF5). Expected frequencies of these high-risk allelic combinations in controls were estimated using a Monte Carlo simulation, using their allelic frequencies and assuming independent assortment. As shown in Table 3, frequencies of these SLE-associated combinations range from 6.1e-4 (1 in 1636 people) to 8.0e-6. 12.3 percent of SLE patients had one of these high-risk combinations, supporting the idea that this model explains a significant proportion of the liability to SLE.

[0081] Table 3. Top 10 high-risk factors for SLE EA sorted by odds ratio. The odds ratios (OR) indicate association strength for SLE patients relative to control population for subjects that carry the respective risk factors.

Table 3
HLA | IRF5 | Case freq. (obs.) | Control freq. (est.) | Odds Ratio (95% CI) | p-value
DQB1*2 / DRB1*3 & B*40 | TNPO3 | 2.3e-3 | 2.5e-5 | 93.30 (32.40-268.30) | 2.0 x 10^-7
DQB1*2 / DRB1*3 & B*27 | TNPO3 | 2.3e-3 | 3.0e-5 | 77.70 (27.40-220.90) | 3.8 x 10^-7
B*8 / DRB1*15 & A*1 | TNPO3 | 2.3e-3 | 4.7e-5 | 49.60 (17.90-137.90) | 2.0 x 10^-6
DRB1*3 / A*24 & DQB1*2 | TNPO3 | 2.3e-3 | 7.8e-5 | 29.90 (10.90-81.80) | 1.4 x 10^-5
B*27 / B*8 & DQA1*3 | IRF5 | 2.3e-3 | 8.6e-5 | 27.10 (9.90-74.00) | 2.0 x 10^-5
DRB1*3 / DPB1*5 | IRF5 | 4.1e-3 | 1.4e-4 | 29.00 (13.60-62.00) | 9.4 x 10^-9
DRB1*1 / DRB1*15 & A*2 | TNPO3 | 5.2e-3 | 1.9e-4 | 27.80 (14.20-54.40) | 1.0 x 10^-10
DRB1*15 / DQB1*2 & DPB1*2 | TNPO3 | 7.0e-3 | 3.3e-4 | 21.50 (12.10-38.30) | 1.5 x 10^-12
A*24 / DRB1*3 & A*2 | TNPO3 | 2.3e-3 | 1.6e-4 | 14.50 (5.40-39.10) | 2.1 x 10^-4
DQB1*2 / DRB1*3 & B*40 | IRF5 | 3.5e-3 | 1.6e-4 | 93.30 (32.40-268.30) | 4.9 x 10^-7
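When a risk group has no carriers among controls, the ordinary odds ratio is undefined, so some adjustment is needed to report a finite value with a confidence interval. The text does not state which adjustment was used. The sketch below assumes one common choice, the Haldane-Anscombe correction (adding 0.5 to every cell when any cell is zero), together with a Wald interval on the log odds ratio. The function name is hypothetical, and the counts in the example call (212 carriers among 1719 cases, none among 2005 controls) are assumptions inferred from the cohort sizes and the 12.3 percent carrier figure rather than values stated in the disclosure.

```python
from math import exp, log, sqrt


def odds_ratio_ci(case_carriers, n_cases, control_carriers, n_controls, z=1.96):
    """Odds ratio with a Wald confidence interval on the log scale.

    If any cell of the 2x2 table is zero, 0.5 is added to every cell
    (Haldane-Anscombe correction, an assumption of this sketch).
    """
    a = case_carriers
    b = n_cases - case_carriers
    c = control_carriers
    d = n_controls - control_carriers
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Assumed counts (see the note above); not values stated in the disclosure.
or_high, ci_low, ci_high = odds_ratio_ci(212, 1719, 0, 2005)
print(f"OR = {or_high:.1f} ({ci_low:.1f} - {ci_high:.1f})")
```

Under these assumed counts the sketch gives an odds ratio of roughly 565 with a very wide interval, which illustrates how a zero control count produces both a large point estimate and large uncertainty.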

[0082] Data sets from non-European ancestry SLE patients were also examined using a similar interaction model (see FIGS. 11A-11C). In Amerindian ancestry SLE, evidence for a similar epistatic model was found with the IRF5 risk haplotype TACA showing the strongest interaction with HLA compound heterozygous genotypes, and the odds ratio for the high-risk group in this ancestry is OR = 106.2 (6.5 - 1729.6), p = 3.4 x 10^-15. In African-American subjects, an epistatic relationship was again observed between HLA and IRF5. The odds ratio for the African-American cohort for the high-risk group is OR = 173.8 (10.8 - 2807.3), p = 6.8 x 10^-25.

[0083] In the Venn diagrams in FIG. 11B, each subject is plotted according to the number of high-risk genetic models they fulfill. The vast majority of subjects fit into only one group. In the case of overlap, it is most commonly a different HLA pair interacting with an alternate IRF5 allele that results in being categorized in two models, supporting the idea that there are two distinct risk models operative in that person rather than simple correlation or linkage disequilibrium between nearby alleles resulting in dual categorization.

[0084] Medium risk and low risk groups were established as above in each ancestral background. While the observed odds ratios for the high-risk groups are somewhat lower in non-European ancestry, it has been noted that the HLA region is not as strongly associated with SLE in non-European ancestry. Again, one or more algorithms of the present application worked well in these two additional ancestral backgrounds, providing further validation for this approach.

[0085] Further data are analyzed from Sjogren’s syndrome (SS) in which, as stated herein, both HLA and IRF5 are risk factors. Using the same algorithm, many complex epistatic interactions between HLA allele pairs and IRF5 were again found. These combinations differ from those found in SLE, further supporting differences in the genetic pathogenesis of these two autoimmune diseases.

[0086] High-risk alleles pulled together in a group produce an OR of 347.6 (21.3 - 5682.3), p = 1.7 x 10^-27 for SS. As such, these data further demonstrate that the one or more algorithms of the present application and the overall strategy extend beyond SLE to other autoimmune diseases. FIG. 12A illustrates the performance of the model at varying levels of complexity, from simple single gene on the left to the final model on the right, with the ORs observed in each model and subject category. FIG. 12B shows the ORs of the various risk groups on a logarithmic scale, along with the frequency of that combination group, showing the percentage of patients that the gene-gene combination category applies to. In each case, the high-risk group represents a significant proportion of patients (4.1-14%).

[0087] To further ensure the rigor and generalizability of the study, independent validation was performed using new case-control datasets derived from the All of Us Research Program, distinct from the discovery cohorts. This step provided a stringent test of model robustness and reproducibility across populations. The full set of risk factors identified in the discovery cohorts was applied to the independent validation datasets for individual risk factors that meet replication criteria (FIG. 12A).

[0088] Risk factors demonstrating directional concordance between discovery and validation cohorts, a one-sided replication p-value < 0.05, low heterogeneity (I^2 < 0.25), and a significant pooled fixed-effect meta-analysis p-value < 0.05 were considered replicated. For the high-risk class, an additional stringency criterion requiring complete absence of carriers among controls was imposed. In the discovery (training) cohort, these high-risk variants yielded markedly elevated composite odds ratios (OR = 565.4 (35.2-9076.6)), indicating strong enrichment among cases with high statistical significance (-log10 p > 2) despite their rarity. The independent validation cohort is based on the subset of individual risk factors that met predefined replication criteria and reproduced this pattern (up to OR = 723.1 (44.8-11,673.5)), confirming that the observed magnitude of effect is robust and reproducible in an independent dataset.
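The replication criteria listed above can be sketched with a standard inverse-variance fixed-effect meta-analysis of log odds ratios. This is an illustrative sketch only: the function names are hypothetical, p-values are taken from a normal approximation, I^2 is computed from Cochran's Q, the effect sizes in the example calls are invented, and the additional high-risk criterion (no carriers among controls) is not shown.

```python
from math import erfc, log, sqrt


def fixed_effect_meta(log_ors, std_errs):
    """Inverse-variance fixed-effect pooling of log odds ratios.

    Returns (pooled log OR, pooled standard error, I^2, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_w
    pooled_se = sqrt(1.0 / total_w)
    # Cochran's Q and the I^2 heterogeneity statistic derived from it.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    p_two_sided = erfc(abs(pooled / pooled_se) / sqrt(2.0))
    return pooled, pooled_se, i2, p_two_sided


def is_replicated(discovery, validation, alpha=0.05, max_i2=0.25):
    """Apply the four replication criteria to (log OR, standard error) pairs."""
    (b_d, se_d), (b_v, se_v) = discovery, validation
    concordant = b_d * b_v > 0
    # One-sided test of the validation effect in the discovery direction.
    direction = 1.0 if b_d > 0 else -1.0
    p_one_sided = 0.5 * erfc(direction * b_v / se_v / sqrt(2.0))
    _, _, i2, p_meta = fixed_effect_meta([b_d, b_v], [se_d, se_v])
    return concordant and p_one_sided < alpha and i2 < max_i2 and p_meta < alpha


# Invented effect sizes, for illustration only.
print(is_replicated((log(20), 0.5), (log(15), 0.6)))   # similar effects
print(is_replicated((log(20), 0.5), (-0.5, 0.6)))      # opposite direction
```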

[0089] Medium- and low-risk variant composites displayed moderate effects (OR ~ 5-18). Importantly, when analyses were restricted to replicated variants that met all predefined replication criteria (directional concordance, one-sided p < 0.05, I^2 < 0.25, Meta_P_FE < 0.05), significant composite odds ratios were consistently observed across all ancestral cohorts (SLE-EA, SLE-AA, SLE-HA, and SS-EA). These results reinforce that the strongest genetic signals represent stable associations across independent cohorts. Collectively, these findings indicate that while high-risk alleles are rare, their cumulative impact across replicated loci substantially contributes to SLE susceptibility across multiple ancestries and to Sjogren’s syndrome in European-American individuals.

[0090] The relationship between high-risk genetic burden and clinical manifestations was also examined. In the SLE European-American cohort, high-risk individuals showed significant associations with neurological or renal involvement (OR = 2.2 [1.0-4.5], p = 4.3 x 10^-2). In the SLE African-American cohort, the high-risk group was significantly associated with more severe disease (ACR score > 5; OR = 2.2 [1.1-4.1], p = 1.3 x 10^-2).

[0091] The estimated age of the disease-associated HLA alleles within the human lineage was then examined using MAFFT alignment and BEAST software as known in the art. Using a cutoff of <100,000 years old as a newer allele, and >100,000 years old as an older allele, it was found that the HLA alleles that combine with IRF5 to form the high-risk groups are significantly enriched in newer alleles, while the protective group is enriched for older alleles, and the medium and low risk groups are intermediate (see FIGS. 13A-13B). Interestingly, this pattern was observed with class I alleles, and in class II alleles with the exception of DQA1 and DQB1, in which no age difference was observed. Again, this suggests evolutionary pressure due to infectious disease driving more pro-inflammatory HLA allele development, which then carries additional risk of SLE. Evolution and selection in the HLA region are complex, and the large number of alleles identified in accordance with the present disclosure makes it likely that this relationship is complex.

[0092] As exemplified by the results of this additional study, the systems and methods provided herein indicate that both IRF5 and HLA are among the strongest common genetic risk factors for SLE, and this study further demonstrates that these alleles combine in powerfully synergistic ways to influence autoimmune disease risk.

[0093] Again, the degree of complexity can explain the relatively small number of gene-gene interactions reported to date, despite the large number of risk loci documented in complex disease. Epistatic gene-gene interactions with complex compound heterozygosity would not be detected by typical interaction screening methods, which typically do not take into account multiple variants and multiple loci simultaneously.

[0094] The findings of this additional study, produced in accordance with the systems and methods disclosed herein, further demonstrate that the disclosed gene-gene interactions upon SLE risk are an actionable result. With SLE prevalence of 164 per 100,000 (0.16%) in European ancestry women, the positive predictive values (PPV) for each cohort were also determined as a measure of clinical diagnostic utility. In European ancestry SLE, it was found that the high-risk group has a PPV of 100% (98.1-100%) for individuals carrying the high-risk combinations to develop SLE. For Amerindian ancestry SLE the PPV was 100% (78.2-100%), for African American ancestry SLE the PPV was 100% (94.9-100%), and for European ancestry Sjogren’s the PPV was 100% (95.8-100%) (see FIG. 12B). As mentioned herein, a strong predictive power can be useful for clinical applications for early detection of autoimmune diseases (even at a very young age or for newborns).

[0095] Although the present disclosure is described by way of example herein in terms of a web-based system using web browsers, custom applications and a web site server (data processing apparatus 802), and with mobile computing devices, system 800 is not limited to that particular configuration. It is contemplated that system 800 can be arranged such that computing device 804 can communicate with, and display data received from, data processing apparatus 802 using any known communication and display method, for example, using a non-Internet browser Windows viewer coupled with a local area network protocol such as the Internetwork Packet Exchange (IPX). It is further contemplated that any suitable operating system can be used on computing device 804, for example, WINDOWS, MAC OS, OSX, LINUX, IOS, ANDROID and any suitable PDA or other computer operating system.

[0096] As used herein, the terms “function” or “module” refer to hardware, firmware, or software in combination with hardware and / or firmware for implementing features described herein. In the hardware sense, a module can be a functional hardware unit designed for use with other components or modules. For example, a module may be implemented using discrete electronic components, or it can form a portion of an entire electronic circuit such as an Application Specific Integrated Circuit (ASIC). Numerous other possibilities exist, and those of ordinary skill in the art will appreciate that the system can also be implemented as a combination of hardware and software modules. In the software sense, a module may be implemented as logic executing in a collection of software instructions, possibly having entry and exit points, written in a programming language, such as, for example, Java, Lua, C or C++. A software module may be compiled and linked into an executable program, installed in a dynamic link library, or may be written in an interpreted programming language such as, for example, Perl or Python. It will be appreciated that software modules may be callable from other modules or from themselves, and / or may be invoked in response to detected events or interrupts. Software instructions may be embedded in firmware. Moreover, the modules described herein can be implemented as software modules, but may be represented in hardware or firmware. Generally, the modules described herein refer to logical modules that may be combined with other modules or divided into sub-modules despite their physical organization or storage.

[0097] While operations shown and described herein may be in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing can be advantageous. Moreover, the separation of various system components in the embodiments described herein should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

[0098] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and / or “comprising”, when used in this disclosure, specify the presence of stated features, integers, steps, operations, elements, and / or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and / or groups thereof.

[0099] It should be noted that use of ordinal terms such as “first,” “second,” “third,” etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed, but are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term) to distinguish the claim elements.

[0100] Also, the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” or “having,” “containing,” “involving,” and variations thereof herein, is meant to encompass the items listed thereafter and equivalents thereof as well as additional items.

[0101] Particular embodiments of the subject matter described in this disclosure have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing can be advantageous.

Claims

WHAT IS CLAIMED:

1. A computerized method for assessing a number of gene-gene interactions between the HLA and IRF5 gene regions, the method comprising: enrolling, by at least one computing device in a registry, subjects including SLE patients having met classification criteria for SLE and Sjogren’s patients having met AECG criteria; genotyping, by at least one computing device, the subjects and healthy control subjects for submission to a genotyping platform; developing, by at least one computing device, HLA risk factor models for each of a plurality of stages; and performing, by at least one computing device, statistical analysis for each of the plurality of stages.

2. The method of claim 1, further comprising: performing, by at least one computing device, Fisher exact tests between each HLA allele variant and a respective outcome; performing, by at least one computing device, risk factor association scans using Fisher’s exact tests; expanding, by at least one computing device, dimensionality and specificity of HLA risk factors by including multiple allele combinations across multiple loci on one of the chromosome strands; applying, by at least one computing device, a classification system to N3,out factors; and defining, by at least one computing device, a group risk factor by combining each of a plurality of intra-group combinations using logical OR operators.

3. The method of claim 2, further comprising: applying, by at least one computing device, Bonferroni correction; and identifying, by at least one computing device, individual risk factors having p values less than Bonferroni-corrected 95% significance thresholds.

4. The method of claim 3, wherein the Bonferroni correction includes using Monte Carlo simulation.

5. The method of claim 1, further comprising: forming, by at least one computing device, a high-risk group by identifying combinations of alleles present at significant frequencies in SLE patients.

6. A computerized system for assessing a number of gene-gene interactions between the HLA and IRF5 gene regions, the system comprising: at least one computing device accessing instructions stored on processor-readable media, wherein the at least one computing device is configured by executing the instructions for: enrolling, in a registry, subjects including SLE patients having met classification criteria for SLE and Sjogren’s patients having met AECG criteria; genotyping the subjects and healthy control subjects for submission to a genotyping platform; developing HLA risk factor models for each of a plurality of stages; and performing statistical analysis for each of the plurality of stages.

7. The system of claim 6, wherein the at least one computing device is further configured for: performing Fisher exact tests between each HLA allele variant and a respective outcome; performing risk factor association scans using Fisher’s exact tests; expanding dimensionality and specificity of HLA risk factors by including multiple allele combinations across multiple loci on one of the chromosome strands; applying a classification system to N3,out factors; and defining a group risk factor by combining each of a plurality of intra-group combinations using logical OR operators.

8. The system of claim 7, wherein the at least one computing device is further configured for: applying Bonferroni correction; and identifying individual risk factors having p values less than Bonferroni-corrected 95% significance thresholds.

9. The system of claim 8, wherein the Bonferroni correction includes using Monte Carlo simulation.

10. The system of claim 6, wherein the at least one computing device is further configured for: forming a high-risk group by identifying combinations of alleles present at significant frequencies in SLE patients.