Fundus image diagnosis and analysis system based on pattern recognition

By integrating multi-scale feature collaborative extraction, structured knowledge graphs, and evidence reasoning, the problem of insufficient interpretability and generalization ability of fundus image diagnostic models has been solved, achieving high-precision and interpretable fundus image diagnosis and improving the credibility and efficiency of clinical applications.

CN122243941A · Pending · Publication Date: 2026-06-19 · ZHEJIANG MEDICAL COLLEGE

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ZHEJIANG MEDICAL COLLEGE
Filing Date
2026-03-19
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing fundus image diagnostic models suffer from poor interpretability, strong dependence on labeled data, and insufficient generalization ability to identify lesions at multiple scales and in various forms, which limits their application in clinical diagnosis.

Method used

By employing a multi-scale feature collaborative extraction module, a structured pathological knowledge graph module, and an evidence-based reasoning decision fusion module, combined with an image preprocessing and standardization module, high-precision and interpretable diagnosis of fundus images can be achieved through multi-scale feature extraction, structured knowledge guidance, and evidence-based reasoning fusion.

Benefits of technology

It significantly improves the ability to identify different types of fundus lesions and the overall generalization performance, enhances the interpretability and clinical credibility of the system, and promotes the effectiveness of human-machine collaborative diagnosis.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a pattern recognition-based fundus image diagnostic analysis system, belonging to the field of medical image processing and computer-aided diagnostic technology. The system includes an image preprocessing and standardization module, a multi-scale feature collaborative extraction module, a structured pathological knowledge graph module, an evidence-based reasoning decision fusion module, and an interpretable visualization output module. This invention aims to address the problems of poor interpretability, strong dependence on labeled data, and insufficient generalization ability for multi-scale lesion recognition in existing technologies. Through multi-scale feature collaborative extraction and knowledge graph-guided evidence fusion, it achieves accurate and interpretable identification of fundus lesions and quantifies the uncertainty of diagnostic decisions.

Description

Technical Field

[0001] This invention belongs to the field of medical image processing and computer-aided diagnosis technology, and particularly relates to a fundus image diagnostic analysis system based on pattern recognition.

Background Technology

[0002] Artificial intelligence (AI) technology is increasingly being applied in the healthcare field. Its core lies in using algorithmic models to assist or replace humans in performing complex diagnostic and analytical tasks, thereby improving the efficiency and accuracy of medical services. Among these applications, intelligent analysis of medical images, as a crucial branch of AI, is gradually changing the traditional model that relies on doctors' subjective experience for interpretation.

[0003] Among them, disease screening and diagnostic analysis based on fundus images is a key technological direction for intelligent medical image analysis. This technology aims to use computer vision and pattern recognition algorithms to automatically extract lesion features from fundus images and use them to assist in the diagnosis of eye diseases such as diabetic retinopathy and glaucoma, thereby alleviating the pressure of insufficient professional ophthalmologist resources.

[0004] Existing technologies typically employ deep learning models such as convolutional neural networks for end-to-end classification or segmentation of fundus images. However, these methods face significant challenges: the interpretability of these models is generally poor, and their diagnostic decision-making process is like a "black box," making it difficult to gain the trust of clinicians; model performance heavily relies on large-scale, high-quality, and precisely labeled training data, which is difficult to obtain and prone to annotation noise in real-world medical scenarios; furthermore, a single model often struggles to simultaneously and sensitively identify multiple fundus lesions of different morphologies and scales, resulting in insufficient generalization ability. In the high-reliability application scenario of clinical diagnosis, these problems directly restrict the practical deployment and effective application of AI-assisted diagnostic systems. Therefore, constructing a fundus image analysis system that combines high accuracy, strong interpretability, and good generalization ability has become an urgent technical challenge.

Summary of the Invention

[0005] The purpose of this invention is to provide a pattern recognition-based fundus image diagnostic analysis system to address the contradictions in existing technologies, such as poor model interpretability, strong dependence on labeled data, and insufficient generalization ability to recognize multi-scale and multi-morphological lesions.

[0006] To solve the above-mentioned technical problems, the present invention provides the following technical solution: A pattern recognition-based fundus image diagnostic analysis system includes: The image preprocessing and standardization module receives raw fundus images and performs a series of standardization operations. First, this module performs image quality assessment, specifically by calculating the image's global contrast, local sharpness indicators, and the presence of overexposed or underexposed areas to determine if the image meets the analysis requirements. For images that do not meet the quality threshold, the system generates a re-acquisition command. For images that meet the quality requirements, this module performs vascular structure-guided image registration to eliminate image positional deviations caused by shooting angle or eye movement. Furthermore, this module employs an adaptive normalization algorithm based on the retinal vascular network topology to correct the image's brightness and color, eliminating inter-domain differences caused by different imaging devices and shooting conditions, and outputting a standardized fundus image.

[0007] The multi-scale feature collaborative extraction module, connected to the image preprocessing and standardization module, extracts pathological features at different levels from the standardized image in parallel. This module contains three parallel feature extraction sub-networks, each focusing on visual patterns at a different scale. The first sub-network is a high-resolution local detail network. It employs a densely connected convolutional layer structure with a receptive field designed to cover 2% to 5% of the image area, and captures the fine texture and edge features of minute lesions such as microaneurysms and small hemorrhages. The second sub-network is a mesoscale region association network. It employs a convolutional structure with spatial pyramid pooling, with a receptive field covering 10% to 30% of the image area, and identifies the regional distribution of mesoscale lesions such as exudates and cotton wool spots and their association with surrounding blood vessels. The third sub-network is a global structural semantic network. It employs a deep residual network structure combined with a self-attention mechanism, with a receptive field covering the entire image, and captures the macroscopic morphology of the optic disc and macula, the overall orientation of the vascular arches, and the spatial layout of lesions in the global context. The three sub-networks share the low-level feature maps and are connected in the intermediate layers by a feature interaction gateway, which allows feature information at different scales to be selectively fused, so that together they build a hierarchical and complementary multi-scale feature representation.

[0008] The structured pathology knowledge graph module is an independently built and continuously updated knowledge base that stores prior medical knowledge about fundus diseases. This knowledge graph is organized in an entity-relationship-attribute format. Entities include specific pathological signs, anatomical structures, and disease diagnostic categories; relationships define the causal, concurrent, and exclusionary medical logical connections between entities; and attributes describe the typical morphological parameters and statistical characteristics of entities in images. For example, the knowledge graph explicitly encodes the entity "microaneurysm," whose attributes include a typical round or oval shape, a diameter ranging from 10 to 100 micrometers, and a common location in the posterior pole; it also defines the relationship that "microaneurysm" is an "early typical sign" of "diabetic retinopathy." This module connects to the output of a multi-scale feature collaborative extraction module through a knowledge embedding layer, mapping the extracted numerical feature vectors to the semantic space of the knowledge graph and calculating the matching degree between features and each pathological entity node in the graph.
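
The entity-relationship-attribute organization described above can be illustrated with a small Python sketch. The class names (`Entity`, `Triple`, `KnowledgeGraph`), the `kind` labels, and the relation identifier `early_typical_sign_of` are hypothetical and chosen only for illustration; the patent does not specify a storage schema, and only the microaneurysm example values come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    kind: str  # "sign", "anatomy", or "disease" (labels assumed for illustration)
    attributes: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str


class KnowledgeGraph:
    def __init__(self):
        self.entities = {}
        self.triples = set()

    def add_entity(self, entity):
        self.entities[entity.name] = entity

    def add_triple(self, head, relation, tail):
        # Both endpoints must already be registered as entities.
        if head not in self.entities or tail not in self.entities:
            raise KeyError("unknown entity in triple")
        self.triples.add(Triple(head, relation, tail))

    def related(self, head, relation):
        """Return the tails of all triples matching (head, relation, ?)."""
        return sorted(t.tail for t in self.triples
                      if t.head == head and t.relation == relation)


kg = KnowledgeGraph()
kg.add_entity(Entity("microaneurysm", "sign", {
    "shape": ("round", "oval"),
    "diameter_um": (10, 100),
    "common_location": "posterior pole",
}))
kg.add_entity(Entity("diabetic retinopathy", "disease"))
kg.add_triple("microaneurysm", "early_typical_sign_of", "diabetic retinopathy")
```

A query such as `kg.related("microaneurysm", "early_typical_sign_of")` then returns the diseases linked to that sign, which is the kind of path lookup the visualization module would need when explaining a diagnosis.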

[0009] The evidence-based reasoning decision fusion module receives hierarchical features from the multi-scale feature collaborative extraction module and semantic matching evidence from the structured pathology knowledge graph module. At its core is a Dempster-Shafer evidence theory fusion framework. Specifically, this module constructs a basic probability assignment function for each of the three feature extraction sub-networks. Each function takes the feature vector extracted by the corresponding sub-network as input and outputs a set of basic confidence assignments over the recognition results. The frame of discernment for these assignments consists of a specific set of pathological signs and an "unknown" proposition. In parallel, the semantic matching degree provided by the structured pathology knowledge graph module is converted into an independent source of evidence, whose basic probability assignment supports recognition hypotheses that are consistent with prior knowledge and weakens those that are not. The module then applies Dempster's combination rule to combine the basic probability assignments of all four evidence sources pairwise and iteratively, obtaining a single confidence distribution that integrates the features at all scales with the prior knowledge. The decision step does not simply select the category with the highest confidence; it applies a confidence threshold and an uncertainty threshold. When the combined confidence of a pathological sign or diagnostic category exceeds the confidence threshold of 0.85 and the confidence assigned to the "unknown" proposition is below the uncertainty threshold of 0.15, the system outputs a definitive diagnosis. Otherwise, it outputs "suspected" together with the confidence value of each candidate proposition, indicating that further examination or expert review is needed.
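
The dual-threshold decision step can be sketched in a few lines of Python. The function name `decide`, the dictionary layout, and the output fields are assumptions made for illustration; only the threshold values 0.85 and 0.15 come from the text.

```python
def decide(masses, confidence_threshold=0.85, uncertainty_threshold=0.15):
    """Apply the dual-threshold rule to a fused confidence distribution.

    masses: dict mapping proposition name -> fused confidence,
            including the key "unknown".
    """
    unknown = masses.get("unknown", 0.0)
    candidates = {k: v for k, v in masses.items() if k != "unknown"}
    best = max(candidates, key=candidates.get)

    # Definitive only if the best candidate is confident enough AND
    # the residual "unknown" confidence is small enough.
    if candidates[best] > confidence_threshold and unknown < uncertainty_threshold:
        return {"status": "definitive", "diagnosis": best}

    # Otherwise report every candidate with its confidence, highest first.
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    return {"status": "suspected", "candidates": ranked, "unknown": unknown}
```

With `{"microaneurysm": 0.90, "hemorrhage": 0.05, "unknown": 0.05}` the rule yields a definitive result, while `{"microaneurysm": 0.60, "hemorrhage": 0.20, "unknown": 0.20}` is returned as "suspected" for expert review.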

[0010] The interpretability visualization output module, connected to all the aforementioned modules, generates understandable diagnostic reports for clinicians. First, based on the activation maps of each sub-network in the multi-scale feature collaborative extraction module, it generates heatmaps using activation-mapping techniques, visually displaying which regions in the image contribute most to the final diagnostic decision and highlighting suspected lesion locations. Second, this module retrieves knowledge paths related to the current diagnosis from the structured pathology knowledge graph module and explains, in natural language, why specific signs are inferred from image features and how these signs are associated with the final disease diagnosis based on medical knowledge. Finally, this module displays key evidence and confidence values in the evidence-based reasoning decision fusion process, quantitatively presenting the confidence level of the system's decision and the competition between different hypotheses, transforming the "black box" decision-making process into a transparent and traceable chain of evidence.

[0011] As one embodiment of the present invention, the image registration process guided by vascular structure in the image preprocessing and standardization module is as follows: First, a template image with a standard vascular topology is selected from the standard image library; then, the vascular structure in the input image and the template image is enhanced using a Hessian matrix filter; then, a non-rigid registration algorithm based on phase information is used to align the two images with vascular ridges as features; finally, the input image is spatially transformed using a bicubic interpolation algorithm to align its vascular network with the template in terms of topology.
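
To make the Hessian-based vessel enhancement step more concrete, the following sketch computes the Hessian eigenvalues at a single pixel by finite differences and applies an eigenvalue-ratio test. It is a simplified illustration only: the function names, the ratio threshold 0.25, and the minimum strength 1.0 are assumed values, and a practical filter would also smooth the image at several scales before differentiating.

```python
import math


def hessian_eigenvalues(img, y, x):
    """Eigenvalues of the 2x2 Hessian at an interior pixel (central differences).

    img is a list of rows of grayscale values. Returns (small, large),
    ordered by absolute value.
    """
    hxx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
    hyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
    hxy = (img[y + 1][x + 1] - img[y + 1][x - 1]
           - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
    # Closed-form eigenvalues of the symmetric matrix [[hxx, hxy], [hxy, hyy]].
    mean = (hxx + hyy) / 2.0
    radius = math.sqrt(((hxx - hyy) / 2.0) ** 2 + hxy ** 2)
    return tuple(sorted((mean + radius, mean - radius), key=abs))


def is_ridge_like(img, y, x, ratio_threshold=0.25, min_strength=1.0):
    """Line-like structure: one eigenvalue near zero, the other large."""
    small, large = hessian_eigenvalues(img, y, x)
    if abs(large) < min_strength:
        return False  # flat region, no curvature in any direction
    return abs(small) / abs(large) < ratio_threshold
```

For a dark vertical line on a bright background, the curvature across the line is large and the curvature along it is zero, so the centre pixel passes the ratio test, whereas a uniform region does not.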

[0012] In one embodiment of the present invention, the feature interaction gateway in the multi-scale feature collaborative extraction module is implemented using a gated recurrent unit structure. This gateway takes the feature map of a certain layer of the current sub-network as input and the feature maps of corresponding layers or the previous layer of other sub-networks as context information, generating a gated weight vector between 0 and 1 using the sigmoid function. This weight vector controls how much content from the context information needs to be added to the feature stream of the current sub-network, thereby achieving dynamic and adaptive fusion of cross-scale features and avoiding information redundancy and the curse of dimensionality caused by simple feature concatenation.

[0013] In one embodiment of the present invention, the update mechanism of the structured pathological knowledge graph module adopts a human-machine collaborative iterative approach. When a "suspected" case output by the evidence-based reasoning decision fusion module is reviewed and confirmed by authoritative experts, its corresponding image features and final diagnostic label will constitute a new candidate knowledge triple. The system performs a consistency check between this candidate triple and the existing knowledge graph. If there are no logical conflicts, its vectorized representation is integrated into the existing knowledge graph space through a knowledge graph embedding model, and the connection weights of relevant entities and relationships are adjusted to achieve continuous evolution and improvement of the knowledge base.
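
The consistency check on a candidate triple might look like the sketch below. The patent does not define what counts as a logical conflict, so this example assumes one simple rule: a positive relation between two entities contradicts an "excludes" relation between the same two entities. The entity names are placeholders, and the embedding update that follows a successful check is not shown.

```python
def conflicts(candidate, triples, exclusion="excludes"):
    """Return the existing triples that contradict the candidate triple."""
    head, relation, tail = candidate
    found = []
    for h, r, t in triples:
        if {h, t} != {head, tail}:
            continue  # different entity pair, cannot conflict under this rule
        # Exactly one of the two relations is an exclusion -> contradiction.
        if (r == exclusion) != (relation == exclusion):
            found.append((h, r, t))
    return found


def try_add(candidate, triples):
    """Add the candidate to the triple set only if it is conflict-free."""
    if conflicts(candidate, triples):
        return False
    triples.add(candidate)
    return True


triples = {("sign_A", "excludes", "disease_B")}
accepted_conflicting = try_add(("sign_A", "is_early_sign_of", "disease_B"), triples)
accepted_new = try_add(("sign_A", "is_early_sign_of", "disease_C"), triples)
```

Rejected candidates would presumably be routed back to the reviewing expert rather than silently dropped, although the text does not say so.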

[0014] In one embodiment of the present invention, the basic probability assignment function in the evidence-based reasoning decision fusion module is constructed using a deep evidential neural network. This network takes feature vectors as input and outputs not only confidence estimates for each category but also Dirichlet distribution concentration parameters that measure the uncertainty of the evidence. The network is trained end-to-end by maximizing the evidence likelihood of correctly classified samples and minimizing the evidence likelihood of misclassified samples. As a result, the network learns to assign high concentration parameters and concentrated confidence to reliable predictions, and low concentration parameters and dispersed confidence to ambiguous or difficult predictions, thereby quantifying the model's own uncertainty more reasonably.

[0015] Compared with the prior art, the beneficial technical effects of the present invention are as follows: This invention designs a multi-scale feature collaborative extraction module, constructing a parallel and interactive feature extraction pathway that can simultaneously and sensitively capture lesion information across all scales, from microscopic details to macroscopic structures. The feature interaction gateway enables adaptive fusion of cross-scale information, allowing the system to overcome the limitations of a single model with a fixed receptive field and significantly improve the joint recognition ability and overall generalization performance of different types of fundus lesions with varying morphologies and sizes.

[0016] This invention introduces a structured pathological knowledge graph module, which transforms the difficult-to-formulate clinical expert experience into computable structured knowledge. The system combines data-driven feature learning with knowledge-driven logical reasoning, using prior medical knowledge to constrain and guide the model's learning and decision-making process. This not only reduces the model's pure dependence on massive labeled data and enhances its robustness in scenarios with small samples or noisy data, but more importantly, it provides a semantic interpretation basis for decision-making that conforms to medical logic, fundamentally improving the system's interpretability and clinical credibility.

[0017] This invention employs an evidence-based reasoning decision fusion framework, which rigorously integrates evidence from feature extractors at different scales and prior knowledge bases in a mathematical sense. This framework can explicitly handle and express the uncertainty in the identification process. By setting dual thresholds for reliability and uncertainty, the system can distinguish between confirmed diagnoses, suspected cases, and cases that cannot be determined. This prudent decision-making mechanism is more in line with the requirements of clinical practice, avoiding erroneous outputs due to overconfidence. At the same time, its quantitative evidence presentation makes the decision-making process completely transparent to doctors, greatly promoting the feasibility and effectiveness of human-machine collaborative diagnosis.

Attached Figure Description

[0018] Figure 1 is a schematic diagram of the overall technical architecture of the pattern recognition-based fundus image diagnostic analysis system proposed in this invention; Figure 2 is a schematic diagram illustrating the core principle and process of multi-scale feature collaborative extraction and evidence fusion in this invention.

Detailed Implementation

[0019] The features and exemplary embodiments of various aspects of the present invention will now be described in detail. To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely intended to explain the present invention and not to limit the present invention. For those skilled in the art, the present invention can be practiced without some of these specific details. The following description of the embodiments is merely to provide a better understanding of the present invention by illustrating examples of the invention.

[0020] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes said element.

[0021] In the embodiments of the present invention, the same reference numerals denote the same components, and for the sake of brevity, detailed descriptions of the same components are omitted in different embodiments. It should be understood that the thickness, length, width, and other dimensions of various components in the embodiments of the present invention shown in the accompanying drawings, as well as the overall thickness, length, width, and other dimensions of the integrated device, are merely illustrative and should not constitute any limitation on the present invention; the term "multiple" in the present invention refers to two or more (including two).

[0022] Example 1: The overall architecture of the pattern recognition-based fundus image diagnostic analysis system proposed in this invention is shown in Figure 1. The system consists of five core functional units: an image preprocessing and standardization module, a multi-scale feature collaborative extraction module, a structured pathology knowledge graph module, an evidence-based reasoning decision fusion module, and an interpretable visualization output module. These modules collaborate through strictly defined data interfaces and logical connections, forming a complete closed-loop process from raw image input to clinically interpretable diagnostic output. The entire system runs on a medical image analysis server equipped with a high-performance graphics processor and large-capacity memory. The operating system adopts an environment that complies with medical device software security standards, and all data transmission and storage adhere to medical information privacy protection regulations.

[0023] First, the image preprocessing and standardization module receives the raw fundus image acquired by the fundus camera as input. This image is typically a three-channel (red, green, and blue) color image with a resolution of at least 3072×2048 pixels and a bit depth of 8 bits. The module first performs image quality assessment, which comprises three parallel sub-steps: global contrast calculation, local sharpness index assessment, and exposure anomaly region detection. Global contrast is obtained by calculating the standard deviation of the image's grayscale histogram; if this value is below 15, the overall image contrast is considered too low. The local sharpness index is calculated by convolving the image with the Laplacian operator and taking the mean of the absolute values; if this value is less than 30, the image is considered blurry. Exposure anomaly regions are identified by computing the percentage of the image area in which any of the red, green, or blue channel pixel values exceeds 240 or is below 20; if this percentage exceeds 10% of the total image area, the image is considered overexposed or underexposed. If any of the three indicators fails its preset threshold, the system generates a re-acquisition command and, through the human-computer interaction interface, prompts the operator to adjust the shooting parameters or refocus. For images that pass the quality assessment, the module proceeds to the vascular structure-guided image registration stage. This stage first retrieves from the built-in standard fundus image library a template image with a typical posterior pole vascular distribution. The optic disc center of this template image is located at image coordinates (1536, 1024), and the fovea is located at (1800, 1024). The module then applies Hessian matrix filters to both the input and template images to enhance vascular structures. The Hessian matrix is defined as the symmetric matrix of second-order partial derivatives at each pixel (x, y) in the image; its eigenvalues can be used to distinguish linear structures from the background. By setting an eigenvalue ratio threshold, the system extracts continuous vascular ridges. On this basis, the module employs a phase-information-based non-rigid registration algorithm, which uses the phase consistency of the vascular ridges as the feature matching criterion and optimizes the thin-plate spline transformation parameters to align the vascular network of the input image topologically with that of the template image. After registration, the module executes an adaptive normalization algorithm based on the retinal vascular network topology. The algorithm first divides the image into vascular and non-vascular regions using the registered vascular mask. It then calculates the average brightness and chromaticity of each region. Finally, it uses a linear transformation to map the brightness and chromaticity distribution of the input image to a preset standard range, in which the mean brightness is corrected to 128±5 and the chromaticity saturation is limited to between 40 and 70. This processing brings images acquired from different devices and under different lighting conditions into the same visual domain and outputs a standardized fundus image for use by subsequent modules.
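
The three quality checks described in this paragraph can be sketched with the Python standard library alone. The thresholds (15, 30, 240, 20, and 10%) are taken from the text; everything else is an assumption for illustration, including the function names, the BT.601 grayscale conversion, the 4-neighbour Laplacian kernel, and the reading of "standard deviation of the grayscale histogram" as the standard deviation of the gray levels.

```python
import statistics


def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to gray levels (BT.601 weights, assumed)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def global_contrast(gray):
    """Population standard deviation of all gray levels."""
    return statistics.pstdev([v for row in gray for v in row])


def local_sharpness(gray):
    """Mean absolute response of a 4-neighbour Laplacian over interior pixels."""
    h, w = len(gray), len(gray[0])
    total, count = 0.0, 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x])
            total += abs(lap)
            count += 1
    return total / count if count else 0.0


def abnormal_exposure_fraction(rgb, high=240, low=20):
    """Fraction of pixels with any channel above `high` or below `low`."""
    pixels = [p for row in rgb for p in row]
    bad = sum(1 for p in pixels if any(c > high or c < low for c in p))
    return bad / len(pixels)


def assess_quality(rgb):
    gray = to_gray(rgb)
    checks = {
        "contrast_ok": global_contrast(gray) >= 15,
        "sharpness_ok": local_sharpness(gray) >= 30,
        "exposure_ok": abnormal_exposure_fraction(rgb) <= 0.10,
    }
    # The image is accepted only if all three indicators pass.
    checks["accept"] = all(checks.values())
    return checks
```

An image for which `assess_quality(...)["accept"]` is false would trigger the re-acquisition command; the individual flags indicate whether the operator should adjust exposure or refocus.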

[0024] The standardized image is fed into the multi-scale feature collaborative extraction module, whose internal structure is shown in Figure 2. The module comprises three parallel feature extraction sub-networks: a high-resolution local detail network, a mesoscale region association network, and a global structural semantic network. The high-resolution local detail network takes the standardized image at its original resolution as input. Its front end consists of five stacked, densely connected 3×3 convolutional layers with 64 output channels per layer and rectified linear unit (ReLU) activations. The receptive field of this network is controlled to cover 2% to 5% of the total image area, corresponding to a physical size of approximately 100 to 250 micrometers, which matches the typical scale of small lesions such as microaneurysms and punctate hemorrhages. Through a deep supervision mechanism, an auxiliary classification head is attached to the intermediate layers, forcing the network to learn local texture and edge features that are sensitive to small lesions. The mesoscale region association network takes an image downsampled to 2/3 of the original resolution as input and uses a convolutional structure with spatial pyramid pooling as its backbone. This structure contains four parallel dilated convolutional branches with dilation rates of 1, 2, 4, and 8, whose effective receptive field covers 10% to 30% of the image (approximately 500 to 1500 micrometers). It captures the regional distribution patterns of medium-scale lesions such as exudates and cotton wool spots, as well as their topological relationships with adjacent blood vessels. The spatial pyramid pooling layer fuses contextual information at different scales into a fixed-length feature vector, avoiding the input size limitations of traditional fully connected layers. The global structural semantic network takes an image further downsampled to 1/4 resolution as input and uses the deep residual network ResNet-50 as its architecture, with a self-attention mechanism embedded in its last two residual blocks. The self-attention module dynamically aggregates global contextual information by calculating the correlation weights between any two locations in the feature map, enabling the network to understand macroscopic structural semantics such as the integrity of the optic disc contour, the presence of the foveal reflection, and the symmetry of the vascular arch. The three sub-networks are not completely independent: they share the feature maps of the first three convolutional layers to reduce redundant computation and promote consistency of early features. More importantly, a feature interaction gateway is established in the middle layers of the network. This gateway is implemented using a gated recurrent unit structure. Specifically, for the high-resolution local detail network, the fourth-layer feature map $F_1$ is used for feature interaction. The gateway receives the feature map $F_2$ from layer 3 of the mesoscale region association network and the feature map $F_3$ from layer 2 of the global structural semantic network as contextual information. The gateway first projects $F_2$ and $F_3$ through 1×1 convolutions to the same channel dimension as $F_1$, concatenates them, and feeds the result into a small convolutional network, which outputs a gating weight matrix $G$ with the same channel dimension and spatial size as $F_1$; the elements of $G$ are constrained to between 0 and 1 by a sigmoid function. Finally, $F_1$ is updated by adding to it the context features, transformed by a learnable weight matrix $W$ and weighted by $G$ through element-wise multiplication ($\odot$). This mechanism allows the high-resolution network to selectively absorb mid- and low-frequency information, avoiding the dimensionality explosion and information interference caused by simple concatenation. Similarly, the other sub-networks have corresponding interaction gateways. After multiple rounds of cross-scale information exchange, the three sub-networks output their respective high-level feature vectors, denoted $f_1$, $f_2$, and $f_3$, which together constitute a hierarchical and complementary multi-scale feature representation.
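
The gating idea behind the feature interaction gateway can be illustrated on plain lists of numbers. This is a heavily simplified sketch under stated assumptions: the exact update formula does not survive in the text, so an additive gated form is assumed from the description that the gate controls how much context is added to the current feature stream. The 1×1 projections and the small convolutional network are replaced here by scalar weights, and all parameter names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_update(f1, context, gate_weights, gate_bias, w):
    """Fuse context into f1 element by element.

    f1, context:  equal-length lists, assumed already projected to the
                  same channel dimension.
    gate_weights: (a, b) scalars standing in for the small gate network.
    gate_bias:    scalar bias of the gate.
    w:            scalar standing in for the learnable weight matrix W.
    """
    a, b = gate_weights
    out = []
    for x, c in zip(f1, context):
        g = sigmoid(a * x + b * c + gate_bias)  # gate value in (0, 1)
        out.append(x + g * (w * c))             # add the gated, weighted context
    return out


closed = gated_update([1.0, 2.0], [10.0, 10.0], (0.0, 0.0), -50.0, 1.0)
opened = gated_update([1.0, 2.0], [10.0, 10.0], (0.0, 0.0), 50.0, 1.0)
```

With a strongly negative bias the gate is effectively closed and `closed` stays at `[1.0, 2.0]`; with a strongly positive bias the gate opens and `opened` approaches `[11.0, 12.0]`, showing how the gate interpolates between ignoring and fully absorbing the context.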

[0025] Meanwhile, the structured pathology knowledge graph module runs continuously as an independent knowledge base. This module pre-constructs a medical knowledge graph containing 128 entities, 215 relationships, and 432 attributes. Entities include "microaneurysm," "hard exudate," "optic disc edema," "diabetic retinopathy," and "age-related macular degeneration"; relationships include "is an early sign of," "often occurs concurrently with," and "excludes"; attributes describe the morphological features of entities as numerical ranges or discrete labels. For example, the diameter attribute of "microaneurysm" is [10, 100] micrometers, and the typical color attribute of "hard exudate" is "bright yellow." The knowledge graph is stored in a graph database, which supports efficient subgraph queries and path reasoning. During system operation, this module interfaces with the multi-scale feature collaborative extraction module through a knowledge embedding layer. The knowledge embedding layer uses the TransR model to map each entity and relationship in the graph to a 128-dimensional continuous vector space. When the multi-scale feature vectors $f_1$, $f_2$, and $f_3$ are input, the system first concatenates them into a comprehensive feature vector $f$, which is then projected by a two-layer fully connected network into a semantic space aligned with the knowledge graph embedding space to obtain the query vector $q$. The system then calculates the cosine similarity between $q$ and the embedding vectors of all pathological entity nodes in the graph to obtain a matching degree vector $s = (s_1, s_2, \dots, s_n)$, where $s_i$ represents the semantic matching degree between the current image features and the $i$-th pathological entity. The matching degree vector $s$ serves as an independent source of evidence and is passed to the subsequent decision fusion module.
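
The matching step reduces to a cosine similarity between the query vector and each entity embedding. The sketch below uses toy 3-dimensional vectors instead of the 128-dimensional TransR embeddings, and the function names and embedding values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def matching_degrees(query, entity_embeddings):
    """Map each pathological entity name to its similarity with the query."""
    return {name: cosine(query, emb) for name, emb in entity_embeddings.items()}


# Toy embeddings for illustration only.
embeddings = {
    "microaneurysm": [2.0, 0.0, 0.0],
    "hard exudate": [0.0, 1.0, 0.0],
}
scores = matching_degrees([1.0, 0.0, 0.0], embeddings)
```

Cosine similarity ranges from -1 to 1, so a further mapping is needed before the scores can act as a basic probability assignment; the text does not describe that conversion.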

[0026] The evidence-based reasoning-based decision fusion module is the core decision engine of the system. This module receives four sources of evidence: basic probability assignments from a high-resolution local detail network. Basic probability assignments from mesoscale regional association networks Basic probability assignments from global structural semantic networks And semantic matching evidence from the structured pathology knowledge graph module. Each basic probability assignment function is implemented using a deep evidence neural network. For example, its input is a feature vector. The network structure contains three fully connected layers, and the output layer dimension is equal to the recognition frame. Add 1 to the base, where Include A specific category of pathological signs (such as microaneurysms, hemorrhage, exudation, etc.) and an "unknown" proposition. The network output is a nonnegative vector of dimension , is called the concentration parameter of the Dirichlet distribution. The basic probability assignment is calculated using the following formula:

[0027] where Θ is the recognition framework, comprising K specific categories of pathological signs (such as microaneurysms, hemorrhages, and exudates) and an "unknown" proposition; the concentration parameter with index j is the Dirichlet concentration parameter for the j-th category or the unknown category; the Dirichlet strength, that is, the sum of all concentration parameters, represents the total strength of evidence; m_i(A) is the reliability assigned by evidence source i to proposition A, where A is a non-empty subset of Θ; K is the number of known pathological sign categories; and the reliability assigned to the empty set is 0. This formula shows that the reliability assignment is proportional to the value of the concentration parameter minus 1. When one concentration parameter is much larger than the other components, the reliability assigned to the corresponding category approaches 1, indicating a high degree of confidence; when all concentration parameters approach 1, the reliability that is not committed to any specific category becomes large, indicating high uncertainty. The deep evidence neural networks are trained by maximizing the evidence likelihood of correctly classified samples. Their loss function is a weighted sum of the negative log-likelihood and a KL divergence regularization term, ensuring that the network outputs large concentration parameters when its prediction is reliable and small concentration parameters when the input is ambiguous.
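One reading consistent with the description above is the standard subjective-logic mapping used in evidential deep learning: the mass on category j is (α_j − 1)/S and the uncommitted mass is n/S, where S is the sum of the n concentration parameters. The sketch below assumes that form; the symbols α and S, the convention α = evidence + 1, and the function name `dirichlet_to_bpa` are assumptions, not taken from the patent text.

```python
import numpy as np

def dirichlet_to_bpa(alpha):
    """Map Dirichlet concentration parameters to reliability masses.

    Returns (belief, uncertainty): belief[j] = (alpha[j] - 1) / S for each
    category, and uncertainty = n / S is the mass not committed to any
    category, where S = sum(alpha) and n = len(alpha).
    """
    alpha = np.asarray(alpha, dtype=float)
    if np.any(alpha < 1.0):
        # Assumed convention: alpha = non-negative evidence + 1.
        raise ValueError("expected alpha = evidence + 1, so every alpha >= 1")
    strength = alpha.sum()                  # Dirichlet strength S
    belief = (alpha - 1.0) / strength       # proportional to alpha - 1
    uncertainty = alpha.size / strength     # remaining, uncommitted mass
    return belief, uncertainty

# One dominant component: confident prediction.
confident, u_conf = dirichlet_to_bpa([97.0, 1.0, 1.0, 1.0])
# All components near 1: no evidence, maximal uncertainty.
vague, u_vague = dirichlet_to_bpa([1.0, 1.0, 1.0, 1.0])
```

Under this assumed form the masses always sum to 1, and the two limiting cases in the paragraph fall out directly: a dominant concentration parameter drives its category mass toward 1, while parameters near 1 leave almost all mass uncommitted.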

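[0028] After the four basic probability assignments are obtained, the module fuses them iteratively using the Dempster combination rule. For two basic probability assignments $m_1$ and $m_2$, the Dempster combination rule is defined as follows:

$$m_{1 \oplus 2}(A) = \frac{1}{1-\kappa} \sum_{B \cap C = A} m_1(B)\, m_2(C), \quad A \neq \emptyset, \qquad \kappa = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C),$$

where $\kappa$ is the conflict coefficient between the two evidence sources and $m_{1 \oplus 2}(\emptyset) = 0$.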
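Dempster's rule for a pair of evidence sources can be sketched as follows. The dictionary-of-frozensets representation, the function name `dempster_combine`, and the three-element example frame are illustrative assumptions; the patent's recognition framework additionally contains an "unknown" proposition.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions given as {frozenset: mass} dictionaries.

    Returns (combined, conflict). Raises if the sources are in total conflict.
    """
    combined, conflict = {}, 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc          # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}, conflict

# Hypothetical frame and masses for two evidence sources.
FRAME = frozenset({"microaneurysm", "hemorrhage", "exudate"})
MA, HE = frozenset({"microaneurysm"}), frozenset({"hemorrhage"})
m_a = {MA: 0.6, HE: 0.1, FRAME: 0.3}
m_b = {MA: 0.5, HE: 0.2, FRAME: 0.3}
fused, k = dempster_combine(m_a, m_b)
```

In this example the two sources agree on "microaneurysm", so its fused mass rises above either input, while the mass left on the whole frame shrinks.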

[0029] The fusion proceeds in sequence: first, the first two basic probability assignments are fused to obtain an intermediate result; this result is then fused with the third basic probability assignment; finally, the outcome is fused with the fourth to yield the final comprehensive basic probability assignment. Throughout the fusion process, the system monitors the conflict coefficient in real time. If the conflict coefficient exceeds 0.95, the conflict resolution mechanism is triggered, temporarily reducing the weight of the knowledge graph evidence to avoid fusion failures caused by serious contradictions between prior knowledge and data evidence. After fusion is completed, the system checks whether the result meets the criteria for a definitive diagnosis, that is, whether there exists a pathological sign whose comprehensive reliability exceeds the reliability threshold of 0.85 while the reliability of the "unknown" proposition is below the uncertainty threshold of 0.15. If the condition is met, a definitive diagnosis result is output; otherwise, the system outputs a "suspected" status, lists the top 3 candidate signs with the highest reliability and their corresponding m-values, marks the level of uncertainty, and indicates that expert review is required.
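The weight reduction and the final decision check can be sketched as below. The text does not say how the knowledge graph weight is reduced, so classical Shafer discounting is used here purely as an assumed stand-in; the weight 0.5, the function names, and the example masses are hypothetical, while the thresholds 0.85 and 0.15 are the values stated in claim 6.

```python
def discount(m, weight, frame):
    """Shafer discounting: scale every mass by `weight`, move the rest to the whole frame."""
    out = {a: weight * v for a, v in m.items() if a != frame}
    out[frame] = 1.0 - sum(out.values())
    return out

def decide(m, unknown, belief_thr=0.85, unknown_thr=0.15, top_k=3):
    """Return ('definitive', sign) or ('suspected', top candidates with their masses)."""
    singles = {next(iter(a)): v for a, v in m.items()
               if len(a) == 1 and a != unknown}
    ranked = sorted(singles.items(), key=lambda kv: kv[1], reverse=True)
    unknown_mass = m.get(unknown, 0.0)
    if ranked and ranked[0][1] > belief_thr and unknown_mass < unknown_thr:
        return "definitive", ranked[0][0]
    return "suspected", ranked[:top_k]

# Hypothetical frame with an explicit "unknown" proposition.
FRAME = frozenset({"microaneurysm", "hemorrhage", "exudate", "unknown"})
UNKNOWN = frozenset({"unknown"})
MA, HE, EX = (frozenset({s}) for s in ("microaneurysm", "hemorrhage", "exudate"))

m_final = {MA: 0.90, HE: 0.04, UNKNOWN: 0.06}
m_vague = {MA: 0.50, HE: 0.20, EX: 0.10, UNKNOWN: 0.20}
kg_evidence = {EX: 0.8, FRAME: 0.2}
softened = discount(kg_evidence, 0.5, FRAME)   # applied when conflict exceeds 0.95
```

Discounting moves mass from specific propositions to the whole frame, so a heavily discounted source behaves almost like "no opinion" and can no longer cause a near-total conflict during combination.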

[0030] Finally, the interpretable visualization output module integrates information from all the aforementioned modules to generate a structured diagnostic report for clinicians. This report comprises three core parts. The first part is heatmap visualization. The module uses class activation mapping technology to extract, from the three feature extraction sub-networks, the spatial regions that contribute most to the final diagnosis. Specifically, for the high-resolution network, the final classification score is backpropagated to the last convolutional layer, and the feature maps are weighted, summed, and passed through a ReLU activation to obtain a local lesion heatmap; for the mesoscale network, gradient-weighted class activation mapping is used to highlight the regional lesion distribution; and for the global network, self-attention weights are used to generate a macroscopic structural attention map. The three heatmaps are color-coded (red indicates high contribution) and superimposed on the original standardized image to visually display the suspicious areas on which the system focuses. The second part is natural language interpretation. The module retrieves knowledge paths related to the current diagnostic result from the structured pathology knowledge graph. For example, if the system diagnoses "early diabetic retinopathy," it automatically extracts the triple path "microaneurysm → is an early sign of → diabetic retinopathy," and combines it with features such as the number and location (e.g., posterior pole) of microaneurysms detected in the image to generate explanatory text such as "Multiple circular high-brightness spots located in the posterior pole were detected in the image, consistent with the typical morphology of microaneurysms. According to medical knowledge, this is a typical early sign of diabetic retinopathy." The third part is the evidence chain display. The module presents, as a bar chart, the support of each of the four evidence sources for the final diagnosis together with the overall reliability value after fusion, and marks the uncertainty threshold line, enabling doctors to clearly understand the basis and confidence level of the system's decision. The entire report is output in a standard template conforming to the medical document format, supporting printing and integration with electronic medical record systems.
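The gradient-weighted class activation mapping step can be illustrated with plain NumPy, assuming the activations of the last convolutional layer and the gradients of the class score with respect to them have already been extracted from the network. The function names, array shapes, and the red-channel overlay are assumptions for illustration; in practice the heatmap would first be upsampled to the image resolution.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from one convolutional layer.

    activations, gradients: arrays of shape (channels, height, width); the
    gradients are d(class score)/d(activations). Returns a map in [0, 1].
    """
    weights = gradients.mean(axis=(1, 2))               # one weight per channel
    cam = np.tensordot(weights, activations, axes=1)    # weighted sum over channels
    cam = np.maximum(cam, 0.0)                          # ReLU keeps positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def overlay(image_gray, heatmap, strength=0.5):
    """Blend a [0, 1] heatmap into the red channel of a grayscale image in [0, 1]."""
    rgb = np.stack([image_gray] * 3, axis=-1)
    rgb[..., 0] = np.clip(rgb[..., 0] + strength * heatmap, 0.0, 1.0)
    return rgb

# Toy example: channel 0 fires at one location and has a positive gradient.
acts = np.zeros((2, 4, 4))
acts[0, 1, 1] = 1.0
grads = np.stack([np.ones((4, 4)), -np.ones((4, 4))])
heat = grad_cam(acts, grads)
report_image = overlay(np.full((4, 4), 0.2), heat)
```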

[0031] Furthermore, the system incorporates a human-machine collaborative knowledge update mechanism. When the decision fusion module outputs a "suspected" case and it is verified and confirmed by an authoritative ophthalmologist, the expert marks the final diagnosis label and the corresponding lesion area on the interactive interface. The system automatically extracts the multi-scale feature vector v and the final label y of the case to construct a new knowledge triple (lesion feature description, is the manifestation of..., y). This triple is sent to the knowledge graph consistency verification engine, which checks whether it conflicts with the logical rules in the existing graph (for example, if a new sample shows "hard exudate" in "retinal vein occlusion" rather than "diabetic retinopathy," but existing knowledge considers the two to be strongly correlated, then manual review is required). If there is no conflict, the system calls the knowledge graph embedding model to vectorize the new entities and relations, and fine-tunes the embedding vectors of relevant nodes in the graph through gradient descent, while updating the connection weights. This process ensures that the knowledge base continuously evolves with clinical practice, improving the long-term performance of the system.
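The consistency check on a candidate triple can be sketched as a simple rule lookup. This is a deliberately reduced illustration: the graph is modeled as a set of (head, relation, tail) tuples, only the "excludes" relation is treated as a logical rule, the entity names are placeholders, and the subsequent embedding fine-tuning is not shown. The function names are hypothetical.

```python
def find_conflicts(candidate, triples):
    """Return existing triples that logically contradict the candidate triple.

    A candidate (head, relation, tail) is flagged when the graph already states
    that head and tail exclude each other, in either direction.
    """
    head, _, tail = candidate
    return [t for t in triples
            if t[1] == "excludes" and {t[0], t[2]} == {head, tail}]

def update_graph(candidate, triples):
    """Add the candidate if consistent; otherwise return it for manual review."""
    conflicts = find_conflicts(candidate, triples)
    if conflicts:
        return triples, {"needs_review": candidate, "conflicts": conflicts}
    return triples | {candidate}, None

# Placeholder graph; the names carry no medical meaning.
graph = {
    ("sign A", "excludes", "disease B"),
    ("sign C", "is an early sign of", "disease B"),
}
accepted, review_1 = update_graph(("sign C", "is a manifestation of", "disease D"), graph)
unchanged, review_2 = update_graph(("sign A", "is a manifestation of", "disease B"), graph)
```

Returning a new set rather than mutating the graph in place keeps the expert-review branch free of side effects: a rejected candidate leaves the knowledge base exactly as it was.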

[0032] In summary, this embodiment constructs a high-precision, robust, and highly interpretable fundus image diagnostic analysis system through a triple mechanism of multi-scale feature collaborative extraction, structured knowledge guidance, and evidence-based reasoning fusion. This system can not only comprehensively identify various fundus lesions but also explain the decision-making logic to doctors in a way that aligns with clinical thinking, effectively bridging the gap between artificial intelligence and clinical practice.

[0033] Embodiment 2. Building upon the aforementioned embodiment, this embodiment improves the feature interaction gateway in the multi-scale feature collaborative extraction module to further enhance the efficiency and relevance of cross-scale information fusion. Specifically, although the gated recurrent unit structure used in the original embodiment achieves selective information transmission, its gate weights rely solely on a static combination of context features and do not take the specific needs of the current task into account. Therefore, this embodiment introduces a task-aware dynamic routing mechanism.

[0034] The core of this mechanism lies in the fact that the feature interaction gateway no longer only receives feature maps from other sub-networks as context, but also additionally receives feedback signals from the evidence-based reasoning decision fusion module. In the initial stage of system operation, before the decision fusion module has formed a stable output, the gateway still uses the gating mechanism of the original embodiment. However, once the decision fusion module completes the first round of fusion and outputs a preliminary confidence distribution, this distribution is encoded as a task vector t, whose dimension is the same as the cardinality of the recognition framework Θ, and each element represents the current confidence value of the corresponding pathological sign. This task vector t is broadcast to all feature interaction gateways.

[0035] Taking the gateway of the high-resolution local detail network as an example, its input includes, in addition to the feature map of the current sub-network and the context features from the other sub-networks, the task vector t. The gateway first transforms t through an embedding layer into an attention guidance map aligned with the feature map space, with the same size as the context features. The gateway then computes the element-wise product of the context features and the attention guidance map to obtain task-weighted context information. The generation of the gating weights therefore depends not only on the context features themselves but also on the modulation by the task vector. This allows the system to prioritize enhancing high-frequency information related to microaneurysms when the current task focuses on microaneurysms, and to suppress irrelevant details and enhance mesoscale textures when the task focuses on exudates. This mechanism achieves "diagnosis-driven feature focusing," dynamically aligning the feature extraction process with the final decision objective.
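A minimal NumPy sketch of such a task-aware gate is given below. It assumes a channel-wise guidance map broadcast over the spatial dimensions, a single linear embedding of the task vector, and a per-pixel linear gate followed by a sigmoid; the weight shapes, random inputs, and the function name `task_aware_gate` are illustrative assumptions rather than the structure disclosed in the patent.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def task_aware_gate(current, context, task, w_embed, w_gate):
    """Fuse context features into the current feature map under task guidance.

    current, context: (channels, height, width) feature maps of equal shape.
    task: confidence vector of length n_signs; w_embed: (n_signs, channels);
    w_gate: (2 * channels, channels).
    """
    channels = current.shape[0]
    guidance = sigmoid(task @ w_embed).reshape(channels, 1, 1)  # attention guidance
    guidance = np.broadcast_to(guidance, context.shape)         # same size as context
    weighted = context * guidance                               # element-wise product
    stacked = np.concatenate([current, weighted], axis=0)       # (2C, H, W)
    gate = sigmoid(np.einsum("chw,cd->dhw", stacked, w_gate))   # gate values in (0, 1)
    return current + gate * weighted, gate

rng = np.random.default_rng(0)
C, H, W, n_signs = 8, 16, 16, 4                    # hypothetical sizes
current = rng.normal(size=(C, H, W))
context = rng.normal(size=(C, H, W))
task = np.array([0.7, 0.1, 0.1, 0.1])              # preliminary confidence per sign
w_embed = rng.normal(size=(n_signs, C))
w_gate = rng.normal(size=(2 * C, C)) * 0.1
fused_map, gate = task_aware_gate(current, context, task, w_embed, w_gate)
```

The gate decides, per channel and per pixel, how much of the task-weighted context is added to the current feature stream, so a change in the task vector alters both the context that is offered and how much of it is admitted.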

[0036] Furthermore, this embodiment optimizes the embedding method of the structured pathology knowledge graph module. The original embodiment used a static TransR model, in which all entities share the same projection matrix. This embodiment introduces context-aware knowledge embedding, which dynamically adjusts the entity embeddings according to the global semantic representation of the input image (derived from the output of the global structural semantic network). Specifically, this global semantic representation is input to a hypernetwork as a conditional variable, and the hypernetwork generates a personalized projection matrix for the current case to map the entities in the knowledge graph into the semantic space. As a result, the embedding vector of "microaneurysm" differs slightly between images of diabetic patients and images of hypertensive patients, better reflecting the diversity of actual clinical situations.
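The idea of a hypernetwork that emits a case-specific projection matrix can be sketched as follows. The single linear layer, the tanh-bounded perturbation of a shared base projection, the scale factor, and all dimensions are assumptions chosen to keep the example small; the patent does not specify the hypernetwork structure.

```python
import numpy as np

def hyper_projection(global_feature, w_hyper, base_projection, scale=0.1):
    """Generate a case-specific projection matrix from the global image feature.

    The hypernetwork here is a single linear layer whose output is reshaped
    into a bounded perturbation of a shared base projection.
    """
    delta = np.tanh(global_feature @ w_hyper).reshape(base_projection.shape)
    return base_projection + scale * delta

def embed_entity(entity_vector, projection):
    """Map an entity vector into the semantic space with the given projection."""
    return entity_vector @ projection

rng = np.random.default_rng(1)
base = rng.normal(size=(8, 8))                 # shared (static) projection
w_hyper = rng.normal(size=(16, 64)) * 0.1      # hypernetwork weights, 64 = 8 * 8
entity = rng.normal(size=8)                    # e.g. the "microaneurysm" entity vector
g_case_a = rng.normal(size=16)                 # global semantics of one image
g_case_b = rng.normal(size=16)                 # global semantics of another image

e_a = embed_entity(entity, hyper_projection(g_case_a, w_hyper, base))
e_b = embed_entity(entity, hyper_projection(g_case_b, w_hyper, base))
```

Bounding the perturbation keeps every case-specific embedding close to the shared one, which matches the "slight difference" described above while preserving the overall geometry of the knowledge graph embedding space.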

[0037] The above improvements significantly enhance the ability to identify complex cases with comorbidities while maintaining the overall system architecture. Experiments show that on a test set containing multiple coexisting fundus diseases, the diagnostic accuracy of this embodiment is 3.2 percentage points higher than that of Embodiment 1, while the clinical relevance score of the interpretability report is improved by 12%.

[0038] The above description is merely a preferred embodiment of the present invention and is not intended to limit the scope of protection of the present invention. Therefore, all equivalent changes made in accordance with the structure, shape, and principle of the present invention should be covered within the scope of protection of the present invention.

Claims

1. A fundus image diagnostic analysis system based on pattern recognition, characterized in that it comprises: an image preprocessing and standardization module, used to receive raw fundus images and perform standardization operations to output standardized fundus images; a multi-scale feature collaborative extraction module, connected to the image preprocessing and standardization module and used to extract pathological features of different scales in parallel from the standardized fundus image, the multi-scale feature collaborative extraction module containing three parallel feature extraction sub-networks; a structured pathology knowledge graph module, serving as an independently constructed and continuously updated knowledge base and used to store prior medical knowledge about fundus diseases, wherein the knowledge graph is organized in entity-relationship-attribute form, and the structured pathology knowledge graph module is connected to the output of the multi-scale feature collaborative extraction module through a knowledge embedding layer, which maps the extracted numerical feature vectors to the semantic space of the knowledge graph and calculates the matching degree between the features and each pathological entity node in the graph; an evidence-based reasoning decision fusion module, used to receive hierarchical features from the multi-scale feature collaborative extraction module and semantic matching evidence from the structured pathology knowledge graph module, and to perform decision fusion within the Dempster-Shafer evidence theory framework; and an interpretable visualization output module, connected to all the aforementioned modules and used to generate understandable diagnostic reports for clinicians, wherein the interpretable visualization output module generates heatmaps based on the activation maps of each sub-network in the multi-scale feature collaborative extraction module, retrieves knowledge paths related to the current diagnosis from the structured pathology knowledge graph module and explains them in natural language, and displays the key evidence and reliability values in the evidence-based reasoning decision fusion process.

2. The fundus image diagnostic analysis system based on pattern recognition according to claim 1, characterized in that the standardization operations in the image preprocessing and standardization module include image quality assessment, vascular structure-guided image registration, and adaptive normalization based on the retinal vascular network topology; the vascular structure-guided image registration proceeds as follows: a template image with a standard vascular topology is selected from the standard image library; the vascular structures in the input image and the template image are enhanced using a Hessian matrix filter; a phase-information-based non-rigid registration algorithm aligns the two images using vascular ridges as features; and a bicubic interpolation algorithm spatially transforms the input image so that its vascular network is topologically aligned with the template.

3. The fundus image diagnostic analysis system based on pattern recognition according to claim 1, characterized in that the three sub-networks in the multi-scale feature collaborative extraction module are a high-resolution local detail network, a mesoscale region association network, and a global structural semantic network; the three sub-networks are cascaded by sharing the bottom-level feature map, and feature interaction gateways are set up in the middle layers, so as to collaboratively construct a hierarchical and complementary multi-scale feature representation; the feature interaction gateway is implemented using a gated recurrent unit structure; the gateway takes the feature map of the current sub-network at a given layer as input and the feature maps of the corresponding layer or the previous layer of the other sub-networks as context information, and generates a gating weight vector with values between 0 and 1 through the sigmoid function; the weight vector controls how much of the context information is added to the feature stream of the current sub-network.

4. The fundus image diagnostic analysis system based on pattern recognition according to claim 3, characterized in that the high-resolution local detail network employs a densely connected convolutional layer structure, with a receptive field designed to cover 2% to 5% of the image area; the mesoscale region association network employs a convolutional structure with spatial pyramid pooling, with a receptive field covering 10% to 30% of the image area; and the global structural semantic network employs a deep residual network structure combined with a self-attention mechanism, with a receptive field covering the entire image.

5. The fundus image diagnostic analysis system based on pattern recognition according to claim 1, characterized in that the update mechanism of the structured pathology knowledge graph module adopts a human-machine collaborative iterative approach: when a "suspected" case output by the evidence-based reasoning decision fusion module is reviewed and confirmed by an authoritative expert, the corresponding image features and the final diagnostic label constitute a new candidate knowledge triple; the system performs consistency verification between the candidate triple and the existing knowledge graph; and if there is no logical conflict, the system embeds the vectorized representation of the triple into the existing graph space through the knowledge graph embedding model and adjusts the connection weights of the related entities and relationships.

6. The fundus image diagnostic analysis system based on pattern recognition according to claim 1, characterized in that the basic process of the evidence-based reasoning decision fusion module is as follows: the decision fusion module constructs a basic probability assignment function for each of the three feature extraction sub-networks, and transforms the semantic matching degree provided by the structured pathology knowledge graph module into an independent evidence source; the Dempster combination rule is used to perform pairwise combination and iterative fusion of the basic probability assignments of the four evidence sources to obtain a comprehensive reliability distribution; the decision fusion module sets a reliability threshold and an uncertainty threshold, and when the comprehensive reliability of a certain pathological sign or diagnostic category exceeds the reliability threshold of 0.85 and the reliability of the "unknown" proposition is lower than the uncertainty threshold of 0.15, a definitive diagnosis is output; otherwise, a "suspected" state is output; the basic probability assignment function is constructed using a deep evidence neural network, which takes feature vectors as input and outputs reliability estimates for each category together with Dirichlet distribution concentration parameters that measure the uncertainty of the evidence; and the network is trained end-to-end by maximizing the evidence likelihood of correctly classified samples and minimizing the evidence of misclassified samples.

7. The fundus image diagnostic analysis system based on pattern recognition according to claim 6, characterized in that the basic probability assignment function is calculated as follows: the deep evidence neural network outputs a non-negative vector, with a dimension equal to the cardinality of the recognition framework plus 1, as the concentration parameters of the Dirichlet distribution, and the basic probability assignment is proportional to the value of the concentration parameter minus 1.

8. The fundus image diagnostic analysis system based on pattern recognition according to claim 1, characterized in that, in the knowledge graph of the structured pathology knowledge graph module, entities include specific pathological signs, anatomical structures, and disease diagnosis categories; relationships define the causal, concurrent, and exclusionary medical logical connections between entities; and attributes describe the typical morphological parameters and statistical characteristics of entities in the image.

9. The fundus image diagnostic analysis system based on pattern recognition according to claim 1, characterized in that the heatmap produced by the interpretable visualization output module is generated using class activation mapping technology, and the natural language description explains the logical path by which specific signs are inferred from image features and how, based on medical knowledge, these signs relate to the final disease diagnosis.

10. The fundus image diagnostic analysis system based on pattern recognition according to claim 1, characterized in that the system runs on a medical image analysis server equipped with a high-performance graphics processor and large-capacity memory, and all data transmission and storage comply with medical information privacy protection standards.