Modeling method of ophthalmic disease classification model

By using symptom and sign feature similarity algorithms and DBSCAN algorithm to generate a homogeneous disease set in the ophthalmic disease classification model, and combining it with the physiological and pathological baseline feature library and SHAP feature attribution analysis, the problems of insufficient specialty feature guidance and lack of hierarchical classification in existing models are solved, and accurate disease hierarchical division and orthodontic intervention fit quantification are achieved.

CN122245814A · Pending · Publication Date: 2026-06-19 · XUZHOU FIRST PEOPLES HOSPITAL

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
XUZHOU FIRST PEOPLES HOSPITAL
Filing Date
2026-03-17
Publication Date
2026-06-19


Abstract

This invention relates to the field of modeling technology, specifically disclosing a modeling method for an ophthalmic disease classification model, including: collecting comprehensive ophthalmic diagnosis and treatment data and completing structured processing; dividing homogeneous and non-homogeneous disease sets based on a cosine similarity algorithm guided by ophthalmic specialty features; performing feature association analysis and aggregation optimization on the homogeneous disease sets, generating subgroups through secondary clustering; and completing the subgroup physiological and pathological boundary failure judgment by combining a standardized ophthalmic physiological and pathological baseline feature library; if the physiological and pathological boundary fails, extracting the core fusion features suitable for orthodontic intervention through the SHAP feature attribution algorithm, constructing an effective organic determination system for correction, and combining it with the ophthalmic specialty classification system, using a multi-dimensional feature weighted voting algorithm for hierarchical classification, outputting exclusive feature labels, improving the specialty specificity and accuracy of ophthalmic disease classification, and quantifying the orthodontic intervention suitability.

Description

Technical Field

[0001] This invention relates to the field of modeling technology, specifically to a modeling method for classification models of ophthalmic diseases.

Background Technology

[0002] With the deep integration of medical digitalization and artificial intelligence technologies, the field of ophthalmology has achieved large-scale collection and intelligent analysis of multi-dimensional data. Currently, the classification of ophthalmological diseases is developing towards data-driven, precise, and hierarchical directions, while hierarchical classification and feature attribution analysis are gradually becoming the core trends in model optimization.

[0003] However, existing ophthalmic disease classification models still suffer from two major deficiencies, making it difficult to meet the needs of precision diagnosis and treatment: First, the modeling logic lacks specialty feature guidance and physiological and pathological boundary quantification. Existing models mostly use general similarity algorithms and clustering logic for modeling, without designing specific similarity calculation and clustering modeling rules, and without constructing a boundary failure judgment module. This results in vague homogeneous disease classifications output by the models, making it impossible to accurately distinguish the physiological and pathological abnormalities of subgroups, and the specialty specificity and accuracy of the model classification are insufficient. Second, hierarchical classification and orthodontic fitting modeling mechanisms are lacking. Existing models do not construct a modeling module for orthodontic fusion feature extraction and judgment based on SHAP feature attribution, making it impossible to achieve hierarchical and accurate modeling of disease categories, subtypes, and orthodontic fitting subtypes. Orthodontic intervention fit cannot be quantitatively output through the model, making it difficult to support the formulation of precise clinical diagnosis and treatment plans.

[0004] Therefore, this invention provides a modeling method for classifying ophthalmic diseases.

Summary of the Invention

[0005] The purpose of this invention is to provide a modeling method for classifying ophthalmic diseases in order to solve the aforementioned background problems.

[0006] The objective of this invention can be achieved through the following technical solution. A modeling method for an ophthalmic disease classification model includes: acquiring a full-dimensional ophthalmology diagnosis and treatment dataset, and using a symptom and sign feature similarity algorithm to perform homogeneous disease clustering to determine the homogeneous disease set; based on the homogeneous disease set, carrying out feature aggregation and combination analysis to generate homogeneous disease subgroups, and, combined with the ophthalmic physiological and pathological baseline feature library, making a physiological and pathological boundary failure judgment; if the physiological and pathological boundary fails, extracting orthodontic fusion features from the homogeneous disease subgroups through feature attribution priority analysis, and constructing an orthodontic effective organic determination system to determine orthodontic fusion in the homogeneous disease subgroups; and, based on the orthodontic fusion determination results of the homogeneous disease subgroups and combined with the ophthalmic disease specialty classification system, carrying out a hierarchical classification of the homogeneous disease subgroups through a multi-dimensional feature weighted voting classification algorithm, and outputting the feature labels of the homogeneous diseases.

[0007] As a further aspect of the present invention, the process of clustering homogeneous diseases to determine the homogeneous disease set is as follows: The symptom data from the full-dimensional ophthalmology diagnosis and treatment dataset are categorized according to the same ophthalmology specialty characteristics, and after one-hot encoding, they are integrated with the physical examination data into a specialty feature matrix. Based on the symptom and sign feature similarity algorithm, all patient pairs in the specialty feature matrix are traversed, and the cosine similarity of the specialty feature vectors of any two patients is calculated one by one to form a similarity matrix. A similarity standard and the minimum number of samples for the DBSCAN algorithm are set so that patients with highly consistent specialty characteristics are grouped together. The distance metric derived from the similarity matrix is calculated as 1 - cosine similarity. Patients whose cosine similarity is greater than or equal to the similarity standard and who meet the minimum sample size requirement are grouped into the same cluster by the DBSCAN algorithm. For each cluster, the common specialty feature combination of the patients within the cluster is extracted. If the specialty feature combination is unique across clusters, the cluster is determined to be a valid specialty feature cluster. By summarizing all valid specialty feature clusters, a homogeneous disease set is obtained.
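By way of illustration only, the clustering procedure of paragraph [0007] can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the function names `cosine_similarity` and `dbscan_cosine` are hypothetical, the patients are assumed to be already encoded as numeric specialty feature vectors, and a small standard-library DBSCAN is written out only to keep the example self-contained (in practice an existing clustering library would normally be used).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def dbscan_cosine(vectors, similarity_standard=0.88, min_samples=10):
    """Return one label per vector: a cluster id (0, 1, ...) or -1 for noise.

    The distance metric is 1 - cosine similarity, and the neighbourhood
    radius is 1 - similarity_standard, as described in the text.
    """
    eps = 1.0 - similarity_standard
    n = len(vectors)
    # neighbours[i] holds every index j (including i) within distance eps of i.
    neighbours = [
        [j for j in range(n)
         if 1.0 - cosine_similarity(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
    return labels
```

Patients labelled -1 belong to no cluster; under the method described above they would fall into the non-homogeneous disease set.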

[0008] Furthermore, the process of generating homogeneous disease subgroups is as follows: For any homogeneous disease group, the specialty characteristics are divided into a symptom manifestation category and a physical examination category according to whether they contain numerical values. The absolute value of the Pearson correlation coefficient between all specialty characteristics within the homogeneous disease group is calculated and labeled as the correlation coefficient. Features with a correlation coefficient greater than or equal to the correlation coefficient threshold are classified as strongly correlated features. Each group of strongly correlated features is integrated into a feature combination. Based on the physical examination data of each feature combination, the K-means clustering algorithm is used to perform secondary clustering on the homogeneous disease group to obtain several sub-clusters. The feature combination of each sub-cluster is extracted. If the feature combination is unique across clusters, the sub-cluster is identified as a homogeneous disease subgroup.
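The correlation step of paragraph [0008] may be illustrated with the following Python sketch. It is offered under assumptions rather than as the embodiment's implementation: the names `pearson` and `strongly_correlated_groups` are hypothetical, the threshold value is left to the caller because the text does not fix one, and treating "each group of strongly correlated features" as a connected component of the pairwise correlation relation is an interpretation. Features with no strong correlation to any other feature are dropped.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant column correlates with nothing
    return cov / math.sqrt(var_x * var_y)


def strongly_correlated_groups(columns, threshold):
    """Group feature names whose |Pearson r| >= threshold.

    columns maps a feature name to its list of values, one per patient.
    Groups are connected components of the "strongly correlated" relation;
    isolated features are removed.
    """
    names = list(columns)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Each returned group corresponds to one feature combination on which the secondary K-means clustering would then be run.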

[0009] Furthermore, the process for determining the failure of the physiological and pathological boundary is as follows: For each homogeneous disease subgroup, physiological and pathological features are extracted from the feature combinations. The general normal reference range for the corresponding physiological and pathological features is retrieved from the ophthalmic physiological and pathological baseline feature database. Deviation checks are performed on the physiological and pathological characteristics of all patients within the homogeneous disease subgroup: if a physiological and pathological feature exceeds the baseline reference range, or a classification feature is marked "yes", the patient is determined to have an abnormal physiological and pathological boundary for that feature. The number of patients with abnormal physiological and pathological boundaries in each homogeneous disease subgroup is counted, and the percentage of abnormal patients is calculated as the number of abnormal patients / the total number of patients in the homogeneous disease subgroup. If the percentage of abnormal patients is greater than the failure judgment threshold, the physiological and pathological boundary is judged to have failed.
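The boundary failure judgment of paragraph [0009] reduces to a ratio test, which can be sketched as follows. This is an illustrative sketch only: the names `REFERENCE_RANGES`, `is_abnormal` and `boundary_failed` are hypothetical, the two reference ranges are the example values given later in the description (intraocular pressure 10-21 mmHg, corneal curvature 40.00-46.00 D), and the failure judgment threshold is passed in by the caller because the text does not state a numerical value.

```python
# Example baseline ranges taken from the description; a real baseline
# feature database would supply these values.
REFERENCE_RANGES = {
    "iop": (10.0, 21.0),        # intraocular pressure, mmHg
    "curvature": (40.0, 46.0),  # corneal curvature, D
}


def is_abnormal(patient, reference_ranges, categorical_flags):
    """True if any numeric feature is out of range or any flag is set."""
    for feature, (low, high) in reference_ranges.items():
        if not (low <= patient[feature] <= high):
            return True
    return any(patient[flag] for flag in categorical_flags)


def boundary_failed(patients, reference_ranges, categorical_flags,
                    failure_threshold):
    """Return (failed, abnormal_ratio) for one homogeneous disease subgroup.

    The boundary is judged failed only when the abnormal ratio is strictly
    greater than the failure judgment threshold.
    """
    abnormal = sum(
        is_abnormal(p, reference_ranges, categorical_flags) for p in patients
    )
    ratio = abnormal / len(patients)
    return ratio > failure_threshold, ratio
```

Because the comparison is strict, a subgroup whose abnormal ratio equals the threshold is judged not to have failed, matching the "less than or equal to" case in the detailed description.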

[0010] Furthermore, the process of extracting orthotic fusion features from homogeneous disease subgroups is as follows: Using all specialty feature data of homogeneous disease subgroups as input features and physiological and pathological boundary failure markers as output labels, a logistic regression model is constructed. Calculate the attribution weight of each input feature to the output label, and perform orthodontic removal screening on the input features: retain the input features that are directly related to orthodontic intervention; Redundancy check is performed on the input features after the orthodontic correction elimination screening: extract the Pearson correlation coefficient between any two features, and if it is greater than or equal to the correlation coefficient threshold, then the feature with the smaller attribution weight among the two features is eliminated; The input features obtained through orthodontic elimination screening and redundancy testing are integrated to obtain orthodontic fusion features of homogeneous disease subgroups.
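The screening and redundancy check of paragraph [0010] can be sketched as below, assuming the attribution weights and pairwise Pearson coefficients have already been computed. The function name `select_fusion_features`, the argument layout, and the idea of supplying the intervention-related features as a caller-provided set are all assumptions made for illustration; the text does not specify how "directly related to orthodontic intervention" is decided.

```python
def select_fusion_features(weights, correlations, intervention_related,
                           threshold):
    """Return the sorted names of the retained fusion features.

    weights:              {feature: attribution weight}
    correlations:         {(feature_a, feature_b): Pearson coefficient}
    intervention_related: features regarded as directly related to the
                          intervention (hypothetical, caller-supplied)
    threshold:            correlation coefficient threshold
    """
    # Screening: keep only the intervention-related input features.
    kept = {f for f in weights if f in intervention_related}
    # Redundancy check: for each strongly correlated pair that is still
    # present, eliminate the feature with the smaller attribution weight.
    for (a, b), r in sorted(correlations.items()):
        if a in kept and b in kept and abs(r) >= threshold:
            kept.discard(a if weights[a] < weights[b] else b)
    return sorted(kept)
```

Pairs are visited in sorted order so that the result is deterministic; when two weights are equal, the second feature of the pair is the one eliminated, which is an arbitrary tie-breaking choice.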

[0011] Furthermore, the process of constructing the orthodontic effective organic determination system is as follows: The extracted orthodontic fusion features are divided into two categories based on their function: corrective effectiveness features and organic association features. The total attribution weight is obtained by summing the attribution weights of all orthodontic fusion features. The total attribution weight of the corrective effectiveness features and the total attribution weight of the organic association features are calculated, and each is divided by the total attribution weight to obtain the proportion of corrective effective weight and the proportion of organic weight. For the orthodontic fusion features, the average deviation of the corrective effectiveness features and the average deviation of the organic association features are calculated. Deviation percentage = Average deviation / Average attribution weight.

[0012] Furthermore, the process of calculating the average deviation of the corrective effectiveness features and the average deviation of the organic association features is as follows: The average attribution weight of all orthodontic fusion features is obtained as the ratio of the total attribution weight to the number of features. The absolute value of the difference between a feature's attribution weight and the average attribution weight gives the absolute deviation of that orthodontic fusion feature. The mean of the absolute deviations of the corrective effectiveness features gives the average deviation of the corrective effectiveness features; the mean of the absolute deviations of the organic association features gives the average deviation of the organic association features.
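The quantities defined in paragraphs [0011] and [0012] can be computed as in the following sketch. The function name `class_statistics` and the class labels used in the usage note are hypothetical, and the sketch deliberately stops at the weight proportion, average deviation and deviation percentage, because the expression that combines them into the comprehensive judgment value is not reproduced in the text.

```python
def class_statistics(weights, classes):
    """Per-class weight proportion, average deviation and deviation percentage.

    weights: {feature: attribution weight} for all fusion features
    classes: {feature: class label}, one label per feature
    """
    total = sum(weights.values())
    mean_weight = total / len(weights)  # average attribution weight
    stats = {}
    for label in set(classes.values()):
        members = [f for f in weights if classes[f] == label]
        class_total = sum(weights[f] for f in members)
        # Mean of the absolute deviations from the average attribution weight.
        mean_dev = sum(abs(weights[f] - mean_weight) for f in members) / len(members)
        stats[label] = {
            "weight_proportion": class_total / total,
            "average_deviation": mean_dev,
            "deviation_percentage": mean_dev / mean_weight,
        }
    return stats
```

For example, with weights of 0.5 and 0.3 in a "corrective" class and 0.2 in an "organic" class, the average attribution weight is 1/3, the corrective class has a weight proportion of 0.8 and an average deviation of 0.1, and the organic class has a weight proportion of 0.2.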

[0013] Furthermore, the process for determining orthotic fusion in homogeneous disease subgroups is as follows: Based on any homogeneous disease subgroup: If the comprehensive judgment value is greater than or equal to the comprehensive judgment threshold, the orthodontic fusion intervention is deemed successful. If the comprehensive judgment value is less than the comprehensive judgment threshold, the intervention for orthotics fusion is deemed unsuccessful.

[0014] Furthermore, the process of hierarchically classifying homogeneous disease subgroups and outputting feature labels for homogeneous diseases is as follows: For any homogeneous disease subgroup, three classes of features are extracted as classification criteria: Category 1 features: the unique combination of specialty features; Category 2 features: the orthodontic fusion features; Category 3 features: the orthodontic fusion determination result. The weights of the Category 1, Category 2, and Category 3 features of the homogeneous disease subgroup are calculated and then normalized. Based on the different features, the classification is divided into three layers: the disease category layer, the disease subtype layer, and the orthodontic fitting subtype layer. For any homogeneous disease subgroup, the comprehensive score of each candidate category is calculated according to the correspondence of candidate categories, where i=1 corresponds to the disease category layer, i=2 corresponds to the disease subtype layer, i=3 corresponds to the orthodontic fitting subtype layer, and the effectiveness coefficient is obtained from the correspondence between the effectiveness coefficient and the fit. The candidate category with the highest comprehensive score at each layer is taken as the final classification result for that layer. The final classification results at each layer are combined with the physiological and pathological boundary failure judgment results to obtain the feature labels of each homogeneous disease.

[0015] Furthermore, the process of calculating the weights of the Category 1, Category 2, and Category 3 features of the homogeneous disease subgroup is as follows: The proportion of patients within the homogeneous disease subgroup in whom the unique combination of specialty features occurs is calculated to obtain the Category 1 feature weight of the homogeneous disease subgroup. The Category 2 features are substituted into the logistic regression model, the attribution weight of each feature is extracted, the mean is taken to obtain the attribution weight of the homogeneous disease subgroup, and min-max normalization is performed on the attribution weights to obtain the Category 2 feature weight of the homogeneous disease subgroup. The comprehensive judgment values (Category 3 features) of all homogeneous disease subgroups are min-max normalized to obtain the Category 3 feature weight of each homogeneous disease subgroup. Fit = (Number of matching features between feature combination and candidate category / Total number of features) × 100%. The correspondence between candidate categories is as follows: the candidate categories of the disease category layer correspond only to the disease category layer; the candidate categories of the disease subtype layer correspond to the disease category layer + the disease subtype layer; the candidate categories of the orthodontic fitting subtype layer correspond to the disease subtype layer + the orthodontic fitting subtype layer.
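Two of the calculations in paragraph [0015], the fit and the min-max normalization, can be sketched as follows. The function names are hypothetical, and two points are assumptions rather than statements from the text: "total number of features" is taken to mean the number of features in the subgroup's feature combination, and a list whose values are all equal is normalized to zeros to avoid division by zero. The comprehensive score itself is not sketched because its formula is not reproduced in the text.

```python
def fit_percentage(feature_combination, candidate_features):
    """Fit (%) between a subgroup's feature combination and a candidate category."""
    combination = set(feature_combination)
    matches = len(combination & set(candidate_features))
    return matches / len(combination) * 100.0


def min_max_normalize(values):
    """Scale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # assumption: no spread means weight 0
    return [(v - low) / (high - low) for v in values]
```

As a usage example, a combination of three features of which two appear in the candidate category has a fit of about 66.7%.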

[0016] The beneficial effects of this invention are: (1) Improve the specialty specificity and classification accuracy of ophthalmic disease classification model: By collecting structured ophthalmic diagnosis and treatment data in all dimensions, using ophthalmic specialty feature-guided similarity algorithm, combined with clustering technology to divide disease sets, then generating subgroups through feature aggregation, and combining with physiological and pathological baseline database to complete boundary failure judgment, the problems of ambiguous division and insufficient specialty specificity caused by the general clustering logic of existing models are solved. (2) Achieve precise hierarchical classification of ophthalmic diseases and quantitative output of orthodontic intervention fit: Extract orthodontic fusion features through feature attribution and complete fit determination. Combined with the ophthalmology specialty classification system, adopt the weighted voting algorithm to optimize the strategy of hierarchical classification of diseases, and make up for the defects of existing models in lack of hierarchical classification and lack of quantitative basis for orthodontic fit.

Attached Figure Description

[0017] The invention will now be further described with reference to the accompanying drawings.

[0018] Figure 1 is a flowchart illustrating the steps involved in building a classification model for ophthalmic diseases. Figure 2 is a logic diagram illustrating the modeling method for an ophthalmic disease classification model.

Detailed Implementation

[0019] To make the technical means, creative features, objectives and effects of this invention easier to understand, the invention will be further described below in conjunction with specific embodiments.

[0020] As shown in Figures 1-2, this invention is a modeling method for ophthalmic disease classification. This invention primarily addresses the problems in the current ophthalmic disease classification process based on multi-dimensional diagnostic and treatment data, such as insufficient homogeneity (scattered patient characteristics, lack of unified clustering standards), blurred physiological and pathological boundaries, and a lack of precise judgment criteria for orthodontic intervention and adaptation. These issues lead to unclear disease classification hierarchies and difficulty in predicting the effects of orthodontic intervention. By distinguishing between homogeneous / non-homogeneous disease sets, subgroup division and boundary judgment, orthodontic fusion judgment, and hierarchical classification, this invention achieves accurate hierarchical classification of homogeneous ophthalmic diseases, including the following steps: Step 1: Obtain a full-dimensional ophthalmology diagnosis and treatment dataset, and use a symptom and sign feature similarity algorithm to perform homogeneous disease clustering to determine the homogeneous disease set and the non-homogeneous disease set. In step one, the full-dimensional ophthalmology diagnosis and treatment dataset refers to the data of several patients, each record including symptom data (degree of visual impairment, characteristics of blurred vision, photophobia and tearing, etc.) and physical examination data (intraocular pressure, corneal curvature, signs of fundus lesions, etc.).
The acquisition of a full-dimensional ophthalmology diagnosis and treatment dataset can be achieved by: collecting data through electronic medical record (EMR) systems and the data interfaces of ophthalmic examination equipment (such as OCT equipment, tonometers, corneal topographers, etc.), performing natural language processing (such as BERT-based text classification models) on unstructured text related to symptoms and signs in the medical records (such as a patient experiencing persistent blurred vision, significant photophobia, and occasional slight tearing in the past month), and extracting structured features. Specifically, according to the "Guidelines for Clinical Diagnosis and Treatment of Ophthalmology (2022 Edition)," the text description is converted into preset quantitative indicators or classification labels (such as mapping "persistent blurred vision" to "blurred vision feature - persistent," "significant photophobia" to "photophobia - moderate," and "slight tearing" to "tearing - slight"). In step one, the symptom and sign feature similarity algorithm refers to a cosine similarity matching algorithm based on specialty feature guidance. Its core function is to represent the degree of similarity in ophthalmic specialty features between different patients. The structured symptom and sign data are categorized according to ophthalmic specialty feature dimensions (such as refractive-related features, intraocular pressure-related features, fundus-related features, etc.) and transformed into specialty feature vectors. Symptom data (such as blurred vision type, photophobia level) are converted into numerical values through one-hot encoding. Physical examination data (such as intraocular pressure, corneal curvature) are normalized and quantified.
Then, the specialty feature similarity is calculated as the ratio of the vector inner product to the product of the vector norms, as shown in the formula:

$$\mathrm{sim}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

where A and B are the specialty feature vectors of two different patients, $A \cdot B$ is the vector dot product, and $\|A\|$ and $\|B\|$ are the vector lengths (norms). The similarity value ranges over [0,1]; the closer the value is to 1, the more consistent the patients' specialty characteristics are. In step one, the process of performing homogeneous disease clustering is as follows: The symptom data from the full-dimensional ophthalmology diagnosis and treatment dataset are categorized according to the same ophthalmology specialty characteristics (such as "decreased vision + abnormal corneal curvature" and "blurred vision + fundus lesions"), and after one-hot encoding, they are integrated with the physical examination data into a specialty feature matrix. Based on the symptom and sign feature similarity algorithm, all patients in the specialty feature matrix are traversed, and the cosine similarity of the specialty feature vectors of any two patients is calculated one by one to form a similarity matrix, whose elements represent the degree of fit of the specialty feature performance between patients. The similarity standard is set within a range of 0.8-0.95, and this embodiment uses 0.88. The similarity standard is set based on the systematic and practical principles in Article 5.1 of Chapter 5 of WS / T 306-2023 "Classification and Coding Rules for Health Information Datasets" (which requires datasets to be systematically arranged according to the inherent relationship of features, ensuring unique categories and reasonable structure, and meeting the consistent understanding of health data query and application).
It is also combined with the specialty requirements for homogeneous ophthalmic diseases in the diagnosis and treatment chapters of core diseases such as refractive errors and glaucoma in the "Guidelines for Clinical Diagnosis and Treatment of Ophthalmology (2022 Edition)" (which require that the clinical fit of patient symptoms and signs within the cluster be ≥85%). Similarity optimization method: By traversing the threshold within the value range, the optimal similarity standard is determined with the maximization of the silhouette coefficient as the optimization index. At the same time, based on the basic concept of a dataset in Chapter 4.1 of WS / T 306-2023, which states that a dataset is a collection of several data records, and the systematic principle in Article 5.1, combined with the sample distribution characteristics of the ophthalmology diagnosis and treatment dataset (the total sample size of the full-dimensional ophthalmology diagnosis and treatment dataset in this embodiment is 1000 cases), the minimum sample size of the DBSCAN algorithm is set to cluster patients with highly consistent specialty characteristics into one cluster. It should be noted that, through experimental verification, when the similarity standard is set to 0.88, the silhouette coefficient of the clusters reaches 0.82 (the optimal value in the interval), and the clinical fit of the patient specialty characteristics within the clusters reaches 92%, which meets the requirements of the ophthalmology specialty guidelines. The minimum number of samples for the DBSCAN algorithm is 10 cases. At this time, the Davies-Bouldin index of the clusters is 0.35 (the minimum value in the interval, with the best separation between clusters), and it meets the general statistical requirements of health datasets, which stipulate that the number of samples should not be less than 1% of the total sample size and not less than 5.
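The parameter values quoted in this embodiment can be checked with a short calculation. The sketch below is illustrative only: the function name `dbscan_parameters` is hypothetical, the neighbourhood radius follows from the distance metric (1 - cosine similarity), and reading the stated requirement as "the larger of 1% of the total sample size and 5 cases" is an interpretation of the text.

```python
import math


def dbscan_parameters(similarity_standard, total_samples):
    """Derive the DBSCAN neighbourhood radius and minimum sample count."""
    # Distance = 1 - cosine similarity, so the radius is 1 - similarity standard.
    eps = 1.0 - similarity_standard
    # At least 1% of the total sample size and at least 5 cases.
    min_samples = max(math.ceil(total_samples / 100), 5)
    return eps, min_samples


eps, min_samples = dbscan_parameters(0.88, 1000)
```

With a similarity standard of 0.88 and 1000 cases this gives a radius of approximately 0.12 and a minimum of 10 samples, consistent with the values used in this embodiment.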

[0021] Calculate the distance metric in the similarity matrix: Distance metric = 1 - Cosine similarity. DBSCAN clustering core parameters: Neighborhood radius ε = 1 - similarity standard (preferably ε = 0.12 in this embodiment). The minimum sample size is determined based on the practicality and systematic principles of the basic principles of dataset classification in Chapter 5.1 of WS / T 306-2023, and the basic concept of datasets in Chapter 4.1, which defines health information datasets as sets of thematic information that meet clinical diagnosis and treatment needs. This is combined with the minimum sample size requirements for homogeneous disease subgroup analysis in the chapter on ophthalmology clinical examination and data management specifications in the "Guidelines for Clinical Diagnosis and Treatment of Ophthalmology (2022 Edition)". In this embodiment, the total sample size is 1000 cases, and the minimum sample size of 10 cases is used. This meets both the general statistical requirement in WS / T 306-2023 that the sample size for clinical dataset cluster analysis should not be less than 1% of the total sample size and not less than 5 cases, and the sample size specifications for homogeneous disease subgroup analysis in the ophthalmology specialty guidelines. Verification has shown that this value provides the best clinical fit for the clustering results. Parameter optimization method: The silhouette coefficient and the Davies-Bouldin index are used to verify the clustering effect and determine the optimal parameter combination. By using the DBSCAN algorithm for clustering, patients whose cosine similarity is ≥ the similarity standard and who meet the minimum sample size requirement are divided into the same cluster, and each cluster corresponds to a group of patients with the same specialty characteristics.
For each cluster, extract the common specialty feature combinations among patients within the cluster (e.g., "persistent blurred vision + intraocular pressure 18-22 mmHg + corneal curvature 43-45D"). If the specialty feature combination is unique across clusters (not the same as the specialty feature combinations of other clusters), it is determined to be a valid specialty feature cluster. In step one, the process of determining the homogeneous disease set and the non-homogeneous disease set is as follows: All valid specialty feature clusters are summarized to obtain a homogeneous disease set. The set contains multiple homogeneous disease groups, and each homogeneous disease group corresponds to a combination of specialty features (such as homogeneous disease group 1: "mild visual impairment + no fundus lesions + normal intraocular pressure", homogeneous disease group 2: "persistent blurred vision + macular degeneration + elevated intraocular pressure", etc.). Patient data that are not effective specialty feature clusters (with no obvious commonalities in specialty features) and patient data corresponding to invalid clusters with non-unique combinations of specialty features within the clusters together constitute a non-homogeneous disease set. Patients in this type of data lack uniform specialty feature manifestations and have the specificity of specialty diseases, so no further classification is required. Step 2: Based on the homogeneous disease set, conduct feature aggregation and combination analysis to generate homogeneous disease subgroups, and combine them with the ophthalmic physiological and pathological baseline feature library to make a decision on the failure of the physiological and pathological boundary. In step two, the feature aggregation and combination analysis refers to an analysis method that classifies, associates, integrates, and optimizes the multi-dimensional specialty features of each homogeneous disease group in a homogeneous disease set. 
In step two, the process of generating homogeneous disease subgroups is as follows: For any homogeneous disease group, the specialty characteristics are divided, according to whether they contain numerical values, into a symptom manifestation category (such as degree of visual impairment, blurred vision) and a sign examination category (such as intraocular pressure value, corneal curvature, fundus lesion markers). The correlation coefficient (absolute value of the Pearson correlation coefficient) is used to calculate the correlation between features. A correlation coefficient threshold is set, and features with a correlation coefficient ≥ the correlation coefficient threshold are judged as strongly correlated features. The correlation coefficient threshold is based on the systematic principle of the basic principles of dataset classification in Chapter 5, Section 5.1 of WS / T 306-2023, combined with the clinical correlation rules of symptom-sign characteristics in the diagnosis and treatment chapters of various diseases in the "Guidelines for Clinical Diagnosis and Treatment of Ophthalmology (2022 Edition)" (such as the clinical strong correlation criteria between corneal curvature and visual impairment, and intraocular pressure and fundus lesions). Each group of strongly correlated features is integrated into a feature combination (such as "intraocular pressure value + fundus lesion marker" or "corneal curvature + degree of visual impairment"), and isolated features that are independent and have no strong correlation are removed.
Based on the physical examination data of each feature combination, the K-means clustering algorithm is used to perform secondary clustering on the homogeneous disease group to obtain several sub-clusters. The elbow rule is applied as follows: The candidate range for the number of clusters k is set to 2-8; the within-cluster sum of squared errors (SSE) corresponding to each k value is calculated in turn; the k-SSE variation curve is plotted and the inflection point (elbow point) of the curve is selected as the initial optimal number of clusters; the final number of clusters k is determined by combining the number of disease subtypes in the clinical guidelines. The number of disease subtypes in the clinical guidelines refers to the number of official subtypes of each ophthalmic disease recorded in the "Guidelines for Clinical Diagnosis and Treatment of Ophthalmology (2022 Edition)". Extract the feature combinations of each sub-cluster (e.g., subgroup 1 under homogeneous disease group 1: "mild visual impairment + normal intraocular pressure + corneal curvature 40-42D", subgroup 2: "mild visual impairment + normal intraocular pressure + corneal curvature 43-45D"). If the feature combination has cross-cluster uniqueness (no overlap with other feature combinations), it is determined to be a homogeneous disease subgroup. The homogeneous disease subgroups obtained by splitting each homogeneous disease group are summarized to obtain a set of homogeneous disease subgroups. In step two, the ophthalmic physiological and pathological baseline feature database refers to a standardized feature database constructed based on large-scale ophthalmic physical examination data of the general healthy population and authoritative clinical guidelines. The data come from the national multi-center epidemiological survey of the general healthy population (sample size ≥ 100,000) and the "ICD-11 Clinical Diagnosis and Treatment Guidelines for Ophthalmic Diseases".
In step two, the process of determining the failure of the physiological and pathological boundary is as follows: For each homogeneous disease subgroup, physiological and pathological features are extracted from feature combinations (such as "mild visual impairment + normal intraocular pressure + corneal curvature 40-42D"). Physiological and pathological characteristics refer to the set of features of the core disease attributes of a homogeneous disease subgroup, specifically quantitative or classification features extracted from patient clinical data that can reflect the physiological state, pathological changes and disease progression of the target ophthalmic disease (such as glaucoma, diabetic retinopathy, macular degeneration). Retrieve the general normal reference ranges for the corresponding physiological and pathological features from the ophthalmic physiological and pathological baseline feature database (such as intraocular pressure 10-21 mmHg, corneal curvature 40.00-46.00D, and fundus structure normality marked as "No"). Deviation checks were performed on the physiological and pathological characteristics of all patients within the homogeneous disease subgroup: If the physiological and pathological features exceed the baseline reference range or the classification features (such as fundus lesion markers) are yes, then the patient is determined to have an abnormal physiological and pathological boundary (i.e., an abnormality exists). The number of patients with abnormal physiological and pathological boundaries in this homogeneous disease subgroup was counted, and the percentage of abnormal patients was calculated as: number of abnormal patients / total number of patients with homogeneous diseases. 
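The deviation check and the abnormal-patient percentage can be sketched as follows. The reference ranges are the ones quoted above (intraocular pressure 10-21 mmHg, corneal curvature 40.00-46.00 D); the patient records, the field names, and the failure threshold of 0.5 are hypothetical, since the text does not state a numerical threshold.

```python
def is_abnormal(patient, reference_ranges, categorical_flags):
    """A patient is abnormal if any numeric feature leaves its baseline range
    or any categorical marker (e.g. fundus lesion) is positive."""
    for name, (low, high) in reference_ranges.items():
        if not low <= patient[name] <= high:
            return True
    return any(patient[name] for name in categorical_flags)


def boundary_failure(patients, reference_ranges, categorical_flags, failure_threshold):
    """Return (abnormal proportion, whether the boundary is judged to have failed)."""
    abnormal = sum(is_abnormal(p, reference_ranges, categorical_flags) for p in patients)
    proportion = abnormal / len(patients)
    return proportion, proportion > failure_threshold


reference_ranges = {"iop_mmHg": (10, 21), "corneal_curvature_D": (40.0, 46.0)}
categorical_flags = ["fundus_lesion"]

# Hypothetical subgroup of four patients.
subgroup = [
    {"iop_mmHg": 15, "corneal_curvature_D": 43.0, "fundus_lesion": False},
    {"iop_mmHg": 24, "corneal_curvature_D": 42.5, "fundus_lesion": False},
    {"iop_mmHg": 18, "corneal_curvature_D": 47.2, "fundus_lesion": False},
    {"iop_mmHg": 16, "corneal_curvature_D": 44.0, "fundus_lesion": True},
]
proportion, failed = boundary_failure(subgroup, reference_ranges, categorical_flags, 0.5)
print(proportion, failed)
```

Three of the four hypothetical patients deviate from the baseline, so the abnormal proportion is 0.75 and, with the assumed threshold, the boundary is judged to have failed.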
If the proportion of abnormal patients is greater than the failure judgment threshold, the physiological and pathological features of the patients in the homogeneous disease subgroup generally deviate from the healthy baseline, and the physiological and pathological boundary is judged to have failed. If the proportion of abnormal patients is less than or equal to the failure judgment threshold, the physiological and pathological features of the patients in the homogeneous disease subgroup do not generally deviate, and the physiological and pathological boundary is judged not to have failed. The failure judgment threshold is set according to WS / T 306-2023 and the requirements for judging abnormal physiological and pathological boundaries of homogeneous ophthalmic disease subgroups in the "Guidelines for Clinical Diagnosis and Treatment of Ophthalmology (2022 Edition)".
Step 3: If the physiological and pathological boundary fails, orthotic fusion features are extracted from the homogeneous disease subgroup through feature attribution priority analysis, an effective organic determination system for orthotic correction is constructed, and the orthotic fusion determination is made for the homogeneous disease subgroup.
In step three, the feature attribution priority analysis refers to a feature attribution method based on SHAP (SHapley Additive exPlanations) values, which quantifies the contribution of each specialty feature in the homogeneous disease subgroup to the result "failure of the physiological and pathological boundary" and ranks the features by that contribution. A logistic regression model is constructed with the specialty feature data of the homogeneous disease subgroup as input and the physiological and pathological boundary failure identifier as output, and the marginal contribution of each feature to the failure result is calculated by using the additivity and fairness properties of SHAP values.
In step three, the process of extracting orthotic fusion features from the homogeneous disease subgroup is as follows: Using all specialty feature data (such as degree of visual impairment, intraocular pressure, corneal curvature, etc.) of the homogeneous disease subgroup as input features, and using the physiological and pathological boundary failure indicator ("yes / no") of the subgroup as the binary output label, a logistic regression model is constructed. Loss function: binary cross-entropy, matching the binary judgment of physiological and pathological boundary failure; Optimizer: Adam, for adaptive gradient updates of the model parameters; Regularization: L2, to suppress overfitting; Maximum number of iterations: 1000; Convergence threshold: 10^-5; Dataset partitioning: random split into training and validation sets in an 8:2 ratio. Model training process: The input specialty feature data is min-max normalized and split into the training and validation sets. The logistic regression model and regularization parameters are initialized. The model weights are updated iteratively on the training set. Training stops when the change in the loss function is less than the convergence threshold or the maximum number of iterations is reached. Accuracy is used as the performance index, and the validation set is used for performance verification and parameter fine-tuning. The SHAP value analysis algorithm is then used to calculate the attribution weight of each input feature to the output label (i.e., the mean of the absolute SHAP values of that feature). This weight directly represents the contribution of the feature to the failure of the physiological and pathological boundary.
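The attribution weight (mean absolute SHAP value) can be illustrated for an already fitted logistic regression as follows. This is a sketch under explicit assumptions, not the patented procedure: the coefficients are hypothetical stand-ins for a trained model, SHAP values are taken in log-odds space, and features are treated as independent, in which case the SHAP value of feature j for a sample x reduces to coef_j × (x_j - mean of feature j). Model training with Adam and L2 regularization is not reproduced here.

```python
from statistics import mean


def min_max(column):
    """Min-max normalize one feature column to [0, 1]."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]
    return [(value - low) / (high - low) for value in column]


def attribution_weights(columns, coefficients):
    """Mean |SHAP| per feature for a linear model with independent features.

    For such a model the SHAP value of feature j on sample x (log-odds scale)
    is coefficients[j] * (x[j] - mean(column j)).
    """
    weights = {}
    for name, values in columns.items():
        center = mean(values)
        weights[name] = mean(abs(coefficients[name] * (v - center)) for v in values)
    return weights


# Hypothetical raw measurements for four patients, normalized as in the text.
columns = {
    "intraocular_pressure": min_max([12, 18, 24, 30]),
    "corneal_curvature": min_max([40, 42, 44, 46]),
}
# Hypothetical coefficients standing in for a trained logistic regression.
coefficients = {"intraocular_pressure": 3.0, "corneal_curvature": -1.5}

weights = attribution_weights(columns, coefficients)
ranking = sorted(weights, key=weights.get, reverse=True)
print(weights, ranking)
```

Sorting the features by these weights gives the priority ranking; in this toy case intraocular pressure ranks above corneal curvature.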
Orthotic screening is performed on the input features: input features that are directly related to orthotic intervention are retained (such as refraction-related signs, corneal curvature, degree of visual impairment, eye position-related features, etc., which can be corrected by orthotic devices such as eyeglasses and contact lenses). Input features that are directly related to organic lesions and cannot be corrected by orthotic devices are excluded (such as the degree of macular degeneration, retinal hemorrhage markers, etc., which require drug or surgical treatment and do not respond to orthotic intervention). A redundancy check is then performed on the input features that remain after the orthotic screening: the Pearson correlation coefficient between any two features is calculated; if its absolute value is greater than or equal to the correlation coefficient threshold (the same threshold used in the feature aggregation and combination analysis), the two features are judged to be strongly redundant, and the one with the smaller attribution weight is removed. The input features that pass both the orthotic screening and the redundancy check are integrated to obtain the orthotic fusion features of the homogeneous disease subgroup (this feature set reflects both the core factors driving the failure of the physiological and pathological boundary and the feasibility of orthotic intervention).
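The screening and redundancy check can be sketched as follows. The set of orthotically correctable features, the attribution weights, the patient values, and the threshold 0.8 are all hypothetical. Processing features in descending order of attribution weight and dropping any feature that is strongly correlated with one already kept is one possible reading of "remove the feature with the smaller attribution weight"; the text does not fix the order in which pairs are examined.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def orthotic_fusion_features(columns, weights, correctable, threshold):
    """Keep correctable features, then drop the weaker one of each redundant pair."""
    candidates = [name for name in columns if name in correctable]
    candidates.sort(key=lambda name: weights[name], reverse=True)
    selected = []
    for name in candidates:
        redundant = any(
            abs(pearson(columns[name], columns[kept])) >= threshold for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Hypothetical data for four patients.
columns = {
    "corneal_curvature": [40, 42, 44, 46],
    "refractive_error": [-1.0, -2.0, -3.0, -4.2],
    "visual_impairment": [1, 3, 2, 2],
    "retinal_hemorrhage": [0, 1, 0, 1],
}
weights = {
    "corneal_curvature": 0.5,
    "refractive_error": 0.3,
    "visual_impairment": 0.2,
    "retinal_hemorrhage": 0.9,
}
correctable = {"corneal_curvature", "refractive_error", "visual_impairment"}

print(orthotic_fusion_features(columns, weights, correctable, threshold=0.8))
```

Here the retinal hemorrhage marker is excluded as an organic feature despite its high weight, and the refractive error column is dropped because it is almost perfectly correlated with corneal curvature and carries the smaller attribution weight.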
In step three, the process of constructing an effective organic lesion assessment system is as follows: The extracted orthotic fusion features were divided into two categories according to their function: features with corrective effectiveness (features that can be improved through orthotic intervention and are directly related to the corrective effect in the "Clinical Guidelines for Ophthalmic Orthotic Intervention", such as features related to visual acuity improvement potential, features related to refractive correction fit, and features related to eye position correction) and features related to organic lesions (features that are related to organic lesions and affect the effect of orthotic intervention, such as features related to fundus lesions and features related to optic nerve dysfunction). The total attribution weight is obtained by summing the attribution weights of all orthotics fusion features. Calculate the total attribution weights of the effective correction category features and the total attribution weights of the organic association category features, respectively. 
The effective correction weight percentage is calculated as: effective correction weight percentage = total attribution weight of the effective correction features / total attribution weight. The organic weight percentage is calculated as: organic weight percentage = total attribution weight of the organic association features / total attribution weight. The average attribution weight of all orthotic fusion features is calculated as: average attribution weight = total attribution weight / number of features. For each orthotic fusion feature, the absolute value of the difference between its attribution weight and the average attribution weight gives the absolute deviation of that feature. The mean of the absolute deviations of the effective correction features gives the average deviation of the effective correction category, and the mean of the absolute deviations of the organic association features gives the average deviation of the organic association category. For each category, deviation percentage = average deviation / average attribution weight (this quantifies the dispersion of the attribution weights of the two categories relative to the benchmark value). The comprehensive judgment value is then calculated as:
Comprehensive judgment value = (effective correction weight percentage - organic weight percentage) × (1 - deviation percentage of the effective correction category) × (1 - deviation percentage of the organic association category).
It should be noted that the logic of the comprehensive judgment value is as follows: the higher the effective correction weight percentage and the lower the deviation percentages of the two categories, the higher the comprehensive judgment value. The product (1 - deviation percentage of the effective correction category) × (1 - deviation percentage of the organic association category) is the penalty term.
The higher a deviation percentage, the more dispersed the feature weights are, the smaller the penalty term, and the lower the final comprehensive judgment value. This ensures that the judgment result focuses on the core influencing features. If the effective correction weight percentage is less than the organic weight percentage, the comprehensive judgment value of the homogeneous disease subgroup is directly determined to be lower than the comprehensive judgment threshold, and the orthotic fusion intervention is judged not to have passed.
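The comprehensive judgment value can be worked through in a few lines. The attribution weights, feature names, and the comprehensive judgment threshold of 0.1 below are hypothetical; treating the deviation percentage of an empty category as 0 is also an assumption, since the text does not cover that case.

```python
from statistics import mean


def comprehensive_judgment(weights, effective, organic, threshold):
    """Return (comprehensive judgment value, whether the orthotic fusion intervention passes)."""
    total = sum(weights.values())
    average = total / len(weights)
    effective_share = sum(weights[name] for name in effective) / total
    organic_share = sum(weights[name] for name in organic) / total

    def deviation_share(names):
        # Mean absolute deviation from the overall average weight, relative to it.
        if not names:
            return 0.0  # assumption: an empty category contributes no penalty
        return mean(abs(weights[name] - average) for name in names) / average

    value = (
        (effective_share - organic_share)
        * (1 - deviation_share(effective))
        * (1 - deviation_share(organic))
    )
    # Explicit rule from the text: a subgroup whose effective share is below its
    # organic share never passes, whatever the sign of the penalty terms.
    passed = effective_share >= organic_share and value >= threshold
    return value, passed


# Hypothetical attribution weights of four orthotic fusion features.
weights = {
    "corneal_curvature": 0.4,
    "refraction_fit": 0.3,
    "fundus_lesion": 0.2,
    "optic_nerve_function": 0.1,
}
effective = ["corneal_curvature", "refraction_fit"]
organic = ["fundus_lesion", "optic_nerve_function"]

value, passed = comprehensive_judgment(weights, effective, organic, threshold=0.1)
print(round(value, 3), passed)
```

In this example the shares are 0.7 and 0.3, both deviation percentages are 0.4, and the comprehensive judgment value is (0.7 - 0.3) × 0.6 × 0.6 = 0.144, which exceeds the assumed threshold.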

[0022] In step three, the process of determining orthodontic fusion in the homogeneous disease subgroup is as follows: Based on any homogeneous disease subgroup: Extract the comprehensive judgment value and compare it with the comprehensive judgment threshold: The comprehensive judgment threshold was set according to WS / T 306-2023 and the criteria for judging the effectiveness of orthotic intervention in the "Guidelines for Clinical Diagnosis and Treatment of Ophthalmology (2022 Edition)". If the comprehensive judgment value is greater than or equal to the comprehensive judgment threshold, it indicates that the corrective effective features are dominant in the subgroup and the influence of the features is concentrated, while the influence of organic correlation is weak. The orthodontic intervention can achieve the expected effect, and the orthodontic fusion intervention is deemed to have passed. If the comprehensive judgment value is less than the comprehensive judgment threshold, it indicates that the proportion of organic-related features in the subgroup is too high, or the feature weights are scattered. The effect of orthopedic intervention may be affected by organic factors and not meet expectations. Therefore, the orthopedic fusion intervention is deemed unsuccessful. Step 4: Based on the orthodontic fusion determination results of the homogeneous disease subgroups, and combined with the ophthalmology disease specialty classification system, a hierarchical classification of the homogeneous disease subgroups is carried out through a multi-dimensional feature weighted voting classification algorithm, and the feature labels of the homogeneous diseases are output. 
In step four, the ophthalmic disease specialty classification system refers to a three-level hierarchical classification system built on the core of the "ICD-11 Clinical Diagnosis and Treatment Guidelines for Ophthalmic Diseases" and combined with the characteristics of ophthalmic specialty diagnosis and treatment (such as refractive correction, intraocular pressure regulation, etc.), covering three core dimensions: disease category layer, disease subtype layer, and orthotic fitting subtype layer.

[0023] In step four, the multi-dimensional feature weighted voting classification algorithm refers to an algorithm that takes three core features of a homogeneous disease subgroup (the unique specialty feature combination, the orthotic fusion features, and the orthotic fusion determination result) together with their weights and, combined with the ophthalmic disease specialty classification system, achieves hierarchical classification by calculating the comprehensive score of each candidate category.
In step four, the process of hierarchically classifying homogeneous disease subgroups is as follows: For any homogeneous disease subgroup, three classes of features are extracted as classification criteria: first-class features: the unique specialty feature combination; second-class features: the orthotic fusion features; third-class features: the orthotic fusion determination result (orthotic fusion intervention passed / failed). The frequency with which the unique specialty feature combination occurs among all patients in the homogeneous disease subgroup, expressed as a proportion, gives the first-class feature weight of the subgroup. The second-class features are substituted into the logistic regression model that takes specialty features as input and the physiological and pathological boundary failure indicator as output; the attribution weights of the individual features are extracted and averaged to obtain the attribution weight of the homogeneous disease subgroup. The attribution weights of all homogeneous disease subgroups are min-max normalized to obtain the second-class feature weight of each subgroup. The comprehensive judgment values of all homogeneous disease subgroups are min-max normalized to obtain the third-class feature weight of each subgroup.
The first-class, second-class, and third-class feature weights of each homogeneous disease subgroup are then normalized so that the three weights of the same subgroup sum to 1. The first-class, second-class, and third-class features are assigned to the disease category layer, the disease subtype layer, and the orthotic fitting subtype layer, respectively. The correspondence of candidate categories is as follows: candidate categories at the disease category layer are determined by the disease category layer alone; candidate categories at the disease subtype layer are determined jointly by the disease subtype layer and the disease category layer; and candidate categories at the orthotic fitting subtype layer are determined jointly by the disease subtype layer and the orthotic fitting subtype layer. For any homogeneous disease subgroup, the comprehensive score of each candidate category is calculated [formula missing from the source text], where i=1 corresponds to the disease category layer, i=2 corresponds to the disease subtype layer, and i=3 corresponds to the orthotic fitting subtype layer; the effectiveness coefficient is an indicator of how effectively the feature combination matches the homogeneous disease subgroup.
It should be noted that the effectiveness coefficient is obtained from the fit by a piecewise linear mapping: if the fit between the feature combination and the homogeneous disease subgroup is 100%, the effectiveness coefficient is 1.0; if 50% ≤ fit ≤ 99%, the effectiveness coefficient = 0.1 + (fit - 50) × 0.8 / 49, with the fit expressed in percentage points (a linear mapping that gives 0.1 at 50% and 0.9 at 99%); if the fit is below 50%, the effectiveness coefficient is 0.
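The piecewise mapping from fit to effectiveness coefficient can be written directly as a function. The fit is computed as matching features over total features times 100, following the definition given in the claims. One assumption is flagged in the code: the text leaves fits strictly between 99% and 100% undefined, and this sketch applies the linear segment to them.

```python
def fit_percent(matching_features, total_features):
    """Fit = (number of matching features / total number of features) x 100."""
    return matching_features / total_features * 100


def effectiveness_coefficient(fit):
    """Map a fit in percentage points to the effectiveness coefficient."""
    if fit >= 100:
        return 1.0
    if fit >= 50:
        # Linear segment: 0.1 at 50% and 0.9 at 99%.
        # Assumption: fits between 99% and 100% also use this segment.
        return 0.1 + (fit - 50) * 0.8 / 49
    return 0.0


for matched in (4, 3, 2, 1):
    fit = fit_percent(matched, 4)
    print(fit, round(effectiveness_coefficient(fit), 4))
```

For example, a feature combination matching 3 of 4 features has a fit of 75% and an effectiveness coefficient of 0.1 + 25 × 0.8 / 49, about 0.508, while a match of 1 of 4 features falls below 50% and yields 0.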

[0024] The candidate category with the highest comprehensive score was selected as the final classification result for the disease category layer, disease subtype layer, and orthodontic fitting subtype layer. In step four, the process of outputting the feature labels of homogeneous diseases is as follows: Based on any homogeneous disease in a homogeneous disease set: Extract the final classification results of the disease category layer, the final classification results of the disease subtype layer, the final classification results of the orthotics fitting subtype layer, and the physiological and pathological boundary failure judgment results, and combine them to output the feature labels of homogeneous diseases. For example, the feature labels of homogeneous diseases are as follows: For a certain homogeneous disease: the final classification result of the disease category layer is: refractive error category, the final classification result of the disease subtype layer is: astigmatism subtype, the final classification result of the orthotic fitting subtype layer is: orthopedic subtype, and the physiological and pathological boundary failure judgment result is: physiological and pathological boundary failure; then the feature labels of the homogeneous disease are: refractive error category - astigmatism subtype - orthopedic subtype - physiological and pathological boundary failure.

[0025] The working principle of this invention is as follows: By collecting and structuring comprehensive ophthalmic diagnosis and treatment data, a cosine similarity algorithm based on specialty features is used in conjunction with DBSCAN clustering to divide homogeneous and non-homogeneous disease sets. Then, feature aggregation and combination analysis is performed on the homogeneous disease sets, and secondary clustering is used to generate subgroups. Combined with an ophthalmic physiological and pathological baseline feature library, a physiological and pathological boundary failure judgment is made. For subgroups with boundary failure, SHAP feature attribution is used to extract orthodontic fusion features and construct an organic determination system for effective correction, completing the orthodontic fusion intervention judgment. Finally, combined with the ophthalmic disease specialty classification system, a multi-dimensional feature weighted voting algorithm is used to achieve hierarchical classification of homogeneous diseases and output feature labels, thereby accurately classifying the major categories, subtypes, and orthodontic fitting subtypes of homogeneous ophthalmic diseases, and clarifying the physiological and pathological state and orthodontic intervention suitability of each subgroup.

[0026] The foregoing has provided a detailed description of one embodiment of the present invention, but this description is merely a preferred embodiment and should not be construed as limiting the scope of the invention. All equivalent variations and modifications made within the scope of the present invention should still fall within the scope of the present invention.

Claims

1. A modeling method for an ophthalmic disease classification model, characterized by comprising: acquiring a full-dimensional ophthalmic diagnosis and treatment dataset, and performing homogeneous disease clustering with a symptom and sign feature similarity algorithm to determine a homogeneous disease set; performing feature aggregation and combination analysis on the homogeneous disease set to generate homogeneous disease subgroups, and making a physiological and pathological boundary failure judgment in combination with an ophthalmic physiological and pathological baseline feature library; if the physiological and pathological boundary fails, extracting orthotic fusion features from the homogeneous disease subgroups through feature attribution priority analysis, and constructing an effective organic determination system for orthotic correction to make an orthotic fusion determination for the homogeneous disease subgroups; and, based on the orthotic fusion determination results of the homogeneous disease subgroups and in combination with an ophthalmic disease specialty classification system, hierarchically classifying the homogeneous disease subgroups through a multi-dimensional feature weighted voting classification algorithm, and outputting feature labels of the homogeneous diseases.

2. The modeling method for the ophthalmic disease classification model according to claim 1, characterized in that: The process of clustering homogeneous diseases to determine a homogeneous disease set is as follows: The symptom data in the full-dimensional ophthalmic diagnosis and treatment dataset are categorized by ophthalmic specialty feature, one-hot encoded, and integrated with the physical examination data into a specialty feature matrix. Based on the symptom and sign feature similarity algorithm, all patient pairs in the specialty feature matrix are traversed, and the cosine similarity of the specialty feature vectors of every two patients is calculated. By setting a similarity criterion and a minimum number of samples for the DBSCAN algorithm, patients with highly consistent specialty features are grouped together. The distance metric derived from the similarity matrix is calculated as 1 - cosine similarity. Patients whose cosine similarity is greater than or equal to the similarity criterion and who meet the minimum sample requirement are grouped into the same cluster by the DBSCAN algorithm. For each cluster, the specialty feature combination common to the patients within the cluster is extracted. If the specialty feature combination is unique across clusters, the cluster is determined to be a valid specialty feature cluster. All valid specialty feature clusters are summarized to obtain the homogeneous disease set.

3. The modeling method for the ophthalmic disease classification model according to claim 1, characterized in that: The process of generating homogeneous disease subgroups is as follows: For any homogeneous disease group, the specialty features are divided into a symptom manifestation category and a physical examination category according to whether they contain numerical values. The absolute value of the Pearson correlation coefficient between all specialty features within the homogeneous disease group is calculated and recorded as the correlation coefficient. Features with a correlation coefficient greater than or equal to the correlation coefficient threshold are classified as strongly correlated features. Each group of strongly correlated features is integrated into a feature combination; Based on the physical examination data of each feature combination, the K-means clustering algorithm is used to perform secondary clustering on the homogeneous disease group to obtain several sub-clusters. The feature combinations of each sub-cluster are extracted. If a feature combination is unique across clusters, the sub-cluster is identified as a homogeneous disease subgroup.

4. The modeling method for the ophthalmic disease classification model according to claim 1, characterized in that: The process of determining the failure of the physiological and pathological boundary is as follows: For each homogeneous disease subgroup, physiological and pathological features are extracted from the feature combinations; The general normal reference range of each corresponding physiological and pathological feature is retrieved from the ophthalmic physiological and pathological baseline feature database; A deviation check is performed on the physiological and pathological features of all patients within the homogeneous disease subgroup: If a physiological and pathological feature exceeds the baseline reference range, or a classification feature is "yes", the patient is determined to have an abnormal physiological and pathological boundary. The number of patients with abnormal physiological and pathological boundaries in each homogeneous disease subgroup is counted, and the percentage of abnormal patients is calculated as the number of abnormal patients / the total number of patients in the homogeneous disease subgroup. If the percentage of abnormal patients is greater than the failure judgment threshold, the physiological and pathological boundary is judged to have failed.

5. The modeling method for the ophthalmic disease classification model according to claim 1, characterized in that: The process of extracting orthotic fusion features from homogeneous disease subgroups is as follows: Using all specialty feature data of homogeneous disease subgroups as input features and physiological and pathological boundary failure markers as output labels, a logistic regression model is constructed. Calculate the attribution weight of each input feature to the output label, and perform orthodontic removal screening on the input features: retain the input features that are directly related to orthodontic intervention; Redundancy check is performed on the input features after the orthodontic correction elimination screening: extract the Pearson correlation coefficient between any two features, and if it is greater than or equal to the correlation coefficient threshold, then the feature with the smaller attribution weight among the two features is eliminated; The input features obtained through orthodontic elimination screening and redundancy testing are integrated to obtain orthodontic fusion features of homogeneous disease subgroups.

6. The modeling method for the ophthalmic disease classification model according to claim 1, characterized in that: The process of constructing the effective organic determination system is as follows: The extracted orthotic fusion features are divided into two categories based on their function: effective correction features and organic association features. The total attribution weight is obtained by summing the attribution weights of all orthotic fusion features. The total attribution weight of the effective correction features and the total attribution weight of the organic association features are calculated, and each is divided by the total attribution weight to obtain the effective correction weight percentage and the organic weight percentage. For the orthotic fusion features, the average deviation of the effective correction features and the average deviation of the organic association features are calculated. Deviation percentage = average deviation / average attribution weight; Comprehensive judgment value = (effective correction weight percentage - organic weight percentage) × (1 - deviation percentage of the effective correction category) × (1 - deviation percentage of the organic association category).

7. The modeling method for the ophthalmic disease classification model according to claim 6, characterized in that: The process of calculating the average deviation of the effective correction features and the average deviation of the organic association features is as follows: The average attribution weight of all orthotic fusion features is obtained as the ratio of the total attribution weight to the number of features. The absolute value of the difference between the attribution weight of each orthotic fusion feature and the average attribution weight gives the absolute deviation of that feature; The mean of the absolute deviations of the effective correction features gives the average deviation of the effective correction features; The mean of the absolute deviations of the organic association features gives the average deviation of the organic association features.

8. The modeling method for the ophthalmic disease classification model according to claim 6, characterized in that: The process for determining orthopedic fusion in homogeneous disease subgroups is as follows: Based on any homogeneous disease subgroup: If the comprehensive judgment value is greater than or equal to the comprehensive judgment threshold, the orthodontic fusion intervention is deemed successful. If the comprehensive judgment value is less than the comprehensive judgment threshold, the intervention for orthotics fusion is deemed unsuccessful.

9. The modeling method for the ophthalmic disease classification model according to claim 1, characterized in that: The process of hierarchically classifying homogeneous disease subgroups and outputting feature labels for homogeneous diseases is as follows: For any homogeneous disease subgroup, three classes of features are extracted as classification criteria: first-class features: the unique specialty feature combination; second-class features: the orthotic fusion features; third-class features: the orthotic fusion determination result; The first-class, second-class, and third-class feature weights of the homogeneous disease subgroup are calculated and then normalized. According to the class of feature, the classification is divided into three layers: the disease category layer, the disease subtype layer, and the orthotic fitting subtype layer. For any homogeneous disease subgroup, the comprehensive score of each candidate category is calculated according to the correspondence of candidate categories [formula missing from the source text], where i=1 corresponds to the disease category layer, i=2 corresponds to the disease subtype layer, i=3 corresponds to the orthotic fitting subtype layer, and the effectiveness coefficient is obtained from the correspondence between the effectiveness coefficient and the fit. The candidate category with the highest comprehensive score at each layer is taken as the final classification result for that layer; The final classification results of all layers are combined with the physiological and pathological boundary failure judgment result to obtain the feature label of each homogeneous disease.

10. The modeling method for the ophthalmic disease classification model according to claim 9, characterized in that: The process of calculating the first-class, second-class, and third-class feature weights of the homogeneous disease subgroups is as follows: The frequency with which the unique specialty feature combination occurs among all patients in a homogeneous disease subgroup, expressed as a proportion, gives the first-class feature weight of the subgroup. The second-class features are substituted into the logistic regression model, the attribution weights of the individual features are extracted and averaged to obtain the attribution weight of the homogeneous disease subgroup, and the attribution weights are min-max normalized to obtain the second-class feature weight of the subgroup. The comprehensive judgment values of all homogeneous disease subgroups are min-max normalized to obtain the third-class feature weight of each subgroup. Fit = (number of matching features between the feature combination and the candidate category / total number of features) × 100%; The correspondence of candidate categories is as follows: Candidate categories at the disease category layer correspond only to the disease category layer. Candidate categories at the disease subtype layer correspond to the disease category layer + the disease subtype layer; Candidate categories at the orthotic fitting subtype layer correspond to the disease subtype layer + the orthotic fitting subtype layer.