An eye image classification method and system based on deep learning
By combining a two-branch Bayesian feature extraction network with a Bayesian classification head, the method explicitly characterizes both lesion features and observer disagreement features in ocular ultrasound images. This addresses model overfitting and the lack of an objective basis for follow-up recommendations when multiple lesions coexist. The method quantifies and spatially localizes prediction uncertainty, generates clinical follow-up recommendations tied to the disputed regions, and thereby improves the reliability and clinical application value of ocular image classification.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANGHAI SHIQUAN SHIMEI TECH DEV CO LTD
- Filing Date
- 2026-03-13
- Publication Date
- 2026-06-19
AI Technical Summary
Existing deep learning-based ocular ultrasound image classification methods lack explicit characterization of model prediction uncertainty in scenarios with multiple coexisting lesions. They are prone to overfitting to regions with blurred boundaries and conflicting features, and the review recommendations lack objective basis, resulting in insufficient reliability of classification results and clinical application value.
A two-branch Bayesian feature extraction network extracts lesion semantic features and observer disagreement features. A Bayesian classification head then generates a lesion probability map and an observer disagreement map. The two maps are combined into an uncertainty overlay heatmap that is used to identify disputed areas. A cross-modal review suggestion network generates structured review suggestions for those areas, so that prediction uncertainty is both quantified and spatially located.
Explicitly characterizing model prediction uncertainty, accurately locating prediction discrepancies, and generating reliable clinical review suggestions improves the reliability and clinical application value of ocular image classification results, and enhances the interpretability and trustworthiness of model output results.
Figure CN122244932A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of medical image recognition technology, and in particular to a method and system for classifying eye images based on deep learning.
Background Technology
[0002] In ophthalmic ultrasound screening, a single B-mode ultrasound image often simultaneously reveals multiple lesions, such as vitreous opacities, shallow retinal detachment, and choroidal thickening. These lesions overlap in spatial location and morphological characteristics, with blurred boundaries, increasing the complexity of image interpretation. Clinical practice shows low consistency in lesion delineation results among different physicians on the same ophthalmic ultrasound image, with typically low overlap, leading to widespread label ambiguity in the labeled data used for model training. Existing deep learning-based ophthalmic ultrasound image classification methods still have certain limitations in practical applications.
[0003] On the one hand, in complex scenarios where multiple lesions coexist or lesion morphologies overlap, existing methods typically classify based on a single forward prediction result, lacking explicit characterization of the model's prediction uncertainty. This can easily lead to overfitting of regions with blurred boundaries, conflicting features, or significant labeling discrepancies, thus risking the treatment of regions with large prediction discrepancies as deterministic lesions and affecting the reliability of the classification results.
[0004] On the other hand, after outputting classification results, existing methods mostly only provide overall diagnostic conclusions or simple confidence information, failing to clearly reveal the spatial distribution sources of model prediction discrepancies. This leads to subsequent clinical review recommendations often relying on human experience or uniform rules, lacking objective evidence corresponding to specific disputed areas, and making it difficult to provide doctors with targeted decision support.
[0005] Therefore, there is an urgent need for an ocular image classification method that can explicitly characterize the uncertainty of model prediction in scenarios with multiple coexisting lesions, accurately locate controversial spatial regions, and generate reliable clinical follow-up recommendations based on this.
Summary of the Invention
[0006] To address the problems that existing methods lack an explicit characterization of prediction uncertainty when multiple lesions coexist, tend to overfit to disputed regions, and produce follow-up recommendations without an objective basis, this invention provides a deep learning-based eye image classification method and system.
[0007] To achieve the above objectives, the technical solution adopted by the present invention is as follows. On the one hand, this invention discloses a deep learning-based eye image classification method, comprising the following steps:
Step 1: Acquire an ocular ultrasound image containing regions with multiple coexisting lesions, and input the ocular ultrasound image into a two-branch Bayesian feature extraction network to output a lesion semantic feature tensor and an observer divergence feature tensor;
Step 2: Concatenate the lesion semantic feature tensor and the observer divergence feature tensor along the channel dimension to form a joint ambiguity-aware feature tensor, and output the reweighted ambiguity-aware feature tensor;
Step 3: Input the reweighted ambiguity-aware feature tensor into the pixel-wise Bayesian classification head, which outputs the lesion probability map; simultaneously calculate the prediction variance map of the lesion probability map at each pixel position, and use the prediction variance map as the observer divergence map;
Step 4: Perform dual-image overlay encoding on the lesion probability map and the observer divergence map to generate an uncertainty overlay heat map, and mark the connected regions in the uncertainty overlay heat map that are higher than the preset uncertainty threshold as dispute candidate regions to obtain a dispute region mask map;
Step 5: Perform pixel-level multiplication of the dispute region mask map and the original input ocular ultrasound image to extract the set of disputed-region image blocks, then input this set into the cross-modal review suggestion network, which outputs structured review suggestion labels;
Step 6: Perform multi-image linkage visualization of the lesion probability map, the observer divergence map, and the structured review suggestion label to generate a color overlay display map with an uncertainty scale, and send the color overlay display map back to the doctor's workstation to complete the deep learning-based eye image classification.
[0008] Furthermore, step 1 includes: The ocular ultrasound image is input into the first branch of the two-branch Bayesian feature extraction network. Multiple Bayesian convolutional layers in the first branch extract the lesion edge-texture coupling information step by step during the weight distribution sampling process to form a lesion semantic feature tensor. The same ocular ultrasound image is input in parallel into the second branch of the two-branch Bayesian feature extraction network. Multiple Bayesian convolutional layers in the second branch capture the sampling differences of the same pixel position in multiple rounds during the same weight distribution sampling process to form an observer divergence feature tensor. In each forward propagation, Monte Carlo sampling is performed on the weight distribution of the first branch and the second branch simultaneously to ensure that the lesion semantic feature tensor and the observer divergence feature tensor are generated and output synchronously in the same probability space.
[0009] Furthermore, step 2 includes: The lesion semantic feature tensor and the observer divergence feature tensor are concatenated along the channel dimension to obtain a joint ambiguity-aware feature tensor. The concatenation order places all channels of the lesion semantic feature tensor first, followed by all channels of the observer divergence feature tensor, so that lesion information and divergence information can be perceived simultaneously. The joint ambiguity-aware feature tensor is input into the channel attention submodule, which sequentially performs global average pooling, fully connected compression, ReLU activation, fully connected recovery, and Sigmoid activation to generate a channel weight vector with the same number of channels as the joint ambiguity-aware feature tensor. The channel weight vector is multiplied channel by channel with the joint ambiguity-aware feature tensor to obtain the channel reweighted feature tensor. The channel reweighted feature tensor is input into the spatial attention submodule, which sequentially performs averaging and maximum-value extraction along the channel dimension with stacking of the two results, 7×7 convolution, and Sigmoid activation to generate a spatial weight map with the same spatial size as the channel reweighted feature tensor. The spatial weight map is then multiplied pixel by pixel with the channel reweighted feature tensor to output the reweighted ambiguity-aware feature tensor.
[0010] Furthermore, step 3 includes: The reweighted ambiguity-aware feature tensor is input into a pixel-wise Bayesian classification head. The convolution kernels of the pixel-wise Bayesian classification head, whose weights are represented as distributions, are Monte Carlo sampled in each of T forward propagations, generating T pixel-level classification logit tensors. Each pixel-level classification logit tensor has the same spatial size as the reweighted ambiguity-aware feature tensor. The T logit tensors are fed into a shared Softmax operator to obtain T pixel-level classification probability tensors. The T probability tensors are arithmetically averaged along the sampling dimension to output a lesion probability map, in which each pixel position stores the average probability that the corresponding position belongs to a lesion. The variance of the T probability tensors is calculated along the sampling dimension to generate a prediction variance map with the same spatial size as the lesion probability map. The variance value at each pixel position in the prediction variance map is used directly as the observer divergence value at the corresponding position in the observer divergence map, and the two maps are output synchronously.
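The Monte Carlo aggregation described in this step can be sketched as follows. This is a minimal NumPy illustration, assuming the T logit tensors are already available as an array of shape (T, K, H, W). The function name `mc_probability_and_variance` and the convention that class index 1 is the lesion class are assumptions made for the example and are not stated in the specification.

```python
import numpy as np

def softmax(logits, axis):
    """Numerically stable softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def mc_probability_and_variance(logits, lesion_class=1):
    """logits: (T, K, H, W) from T stochastic forward passes.

    Returns the lesion probability map and the prediction variance map,
    each of shape (H, W). `lesion_class` is a hypothetical convention.
    """
    probs = softmax(logits, axis=1)      # shared Softmax over the class axis
    lesion = probs[:, lesion_class]      # (T, H, W) lesion probabilities
    prob_map = lesion.mean(axis=0)       # arithmetic mean over the T samples
    var_map = lesion.var(axis=0)         # variance over the T samples
    return prob_map, var_map

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 2, 4, 4))   # hypothetical sizes: T=8, K=2, H=W=4
prob_map, var_map = mc_probability_and_variance(logits)
```

In this sketch the variance map plays the role of the observer divergence map: pixels where the T samples agree have variance near zero, and pixels where they disagree have larger values.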
[0011] Furthermore, step 4 includes: The lesion probability map and the observer divergence map are stacked pixel-wise at the same spatial position to form a dual-channel overlay map. A 3×3 convolution, batch normalization, and ReLU activation are then performed on the dual-channel overlay map to complete the dual-map overlay encoding, outputting an uncertainty overlay heatmap. The value of each pixel in the uncertainty overlay heatmap is compared with a preset uncertainty threshold. Pixels with values higher than the threshold are marked as dispute candidate pixels, and all such pixels together form the candidate dispute region. An 8-connected component analysis is performed on the candidate dispute region, and connected components with a pixel count greater than a preset minimum area threshold are retained to generate a dispute region mask map. The dispute region mask map has a value of 1 at pixel positions within the retained connected components and a value of 0 at all other positions.
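The thresholding and connected-component filtering in this step can be sketched as follows. The example takes the uncertainty overlay heatmap as given (the learned 3×3 convolution encoding is omitted) and uses a plain breadth-first flood fill for the 8-connected analysis. The function name `dispute_mask` and the toy values are assumptions for illustration only.

```python
import numpy as np
from collections import deque

def dispute_mask(heatmap, threshold, min_area):
    """Mask (1/0) of 8-connected regions above `threshold` that contain
    more than `min_area` pixels."""
    candidate = heatmap > threshold
    h, w = candidate.shape
    visited = np.zeros((h, w), dtype=bool)
    mask = np.zeros((h, w), dtype=np.uint8)
    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                  (0, 1), (1, -1), (1, 0), (1, 1)]
    for r in range(h):
        for c in range(w):
            if not candidate[r, c] or visited[r, c]:
                continue
            # Breadth-first flood fill collects one connected component.
            component = []
            queue = deque([(r, c)])
            visited[r, c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in neighbours:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and candidate[ny, nx] and not visited[ny, nx]):
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            if len(component) > min_area:   # drop components that are too small
                ys, xs = zip(*component)
                mask[list(ys), list(xs)] = 1
    return mask

heat = np.zeros((5, 5))
heat[0, 0] = heat[1, 1] = heat[2, 2] = 0.9   # diagonal blob, 8-connected
heat[4, 0] = 0.9                             # isolated pixel
mask = dispute_mask(heat, threshold=0.5, min_area=2)
```

The diagonal blob is kept only because diagonal neighbours count as connected under 8-connectivity; under 4-connectivity it would split into three single-pixel components and be discarded.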
[0012] Furthermore, step 5 includes: The dispute region mask map is multiplied pixel by pixel with the original input ocular ultrasound image. The original grayscale values of pixels where the mask has a value of 1 are retained, while the remaining pixels are set to zero, yielding a grayscale image of the disputed region. This grayscale image is then cropped with a sliding window at a preset fixed step size to obtain a set of disputed-region image blocks of uniform size. Each image block carries its original coordinate information in the ocular ultrasound image. The set of disputed-region image blocks is input into a cross-modal review suggestion network and processed sequentially by its convolutional layer, global average pooling layer, and fully connected classification layer to output structured review suggestion labels. The structured review suggestion labels include labels suggesting additional OCT examination, labels suggesting additional fundus endoscopy review, and labels suggesting continued follow-up observation.
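The masking and sliding-window cropping in this step can be sketched as follows. The block size, the stride, the function name `extract_dispute_patches`, and the choice to skip windows that contain no disputed pixel are all assumptions made for the example. The specification only states that blocks are cropped at a fixed step and carry their original coordinates.

```python
import numpy as np

def extract_dispute_patches(image, mask, patch=4, stride=4):
    """Return a list of (row, col, block) tuples.

    `row`, `col` are the top-left coordinates of the block in the
    original image, so each block keeps its spatial provenance.
    """
    masked = image * mask            # pixels outside the mask become 0
    h, w = masked.shape
    patches = []
    for r in range(0, h - patch + 1, stride):
        for c in range(0, w - patch + 1, stride):
            # Assumption: windows without any disputed pixel are skipped.
            if mask[r:r + patch, c:c + patch].any():
                patches.append((r, c, masked[r:r + patch, c:c + patch]))
    return patches

image = np.arange(64, dtype=float).reshape(8, 8)   # toy grayscale image
mask = np.zeros((8, 8))
mask[1:3, 5:7] = 1                                 # toy disputed region
patches = extract_dispute_patches(image, mask)
```

Each returned block could then be passed to the review suggestion network, with the stored coordinates used to map the resulting label back onto the original image.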
[0013] Furthermore, step 6 includes: The lesion probability map, the observer divergence map, and the structured review suggestion label are input into the multi-image linkage visualization module. The multi-image linkage visualization module assigns red-blue gradient values to the lesion probability map and yellow-green gradient values to the observer divergence map to generate a lesion pseudo-color image and a divergence pseudo-color image. The lesion pseudo-color image and the divergence pseudo-color image are then pixel-level superimposed and fused according to a preset transparency to obtain a color superimposed display image. The text content of the structured review suggestion label is embedded in the lower right corner of the color superimposed display image. The color superimposed display image containing the text content is pushed to the doctor's workstation display window via DICOM write-back, completing the presentation of the eye image classification results based on deep learning.
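The pseudo-color mapping and transparency-based fusion can be sketched as follows. The concrete color endpoints (blue to red for lesion probability, green to yellow for divergence), the min-max normalization of the divergence map, and the function names are assumptions for illustration. Text embedding and DICOM write-back are outside the scope of this sketch.

```python
import numpy as np

def gradient_colormap(values, low_rgb, high_rgb):
    """Linearly interpolate between two RGB colours for values in [0, 1]."""
    v = np.clip(values, 0.0, 1.0)[..., None]
    return (1.0 - v) * np.asarray(low_rgb, float) + v * np.asarray(high_rgb, float)

def overlay(prob_map, div_map, alpha=0.5):
    """Blend the lesion pseudo-colour image with the divergence pseudo-colour image."""
    span = div_map.max() - div_map.min()
    # Assumption: divergence is min-max normalised before colour mapping.
    div_norm = (div_map - div_map.min()) / span if span > 0 else np.zeros_like(div_map)
    lesion_rgb = gradient_colormap(prob_map, (0, 0, 1), (1, 0, 0))   # blue -> red
    diverg_rgb = gradient_colormap(div_norm, (0, 1, 0), (1, 1, 0))   # green -> yellow
    return alpha * lesion_rgb + (1.0 - alpha) * diverg_rgb           # preset transparency

prob = np.array([[0.0, 1.0]])
div = np.array([[0.0, 0.25]])
fused = overlay(prob, div, alpha=0.5)   # shape (1, 2, 3), RGB in [0, 1]
```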
[0014] On the other hand, the present invention also discloses a deep learning-based eye image classification system, comprising:
Feature extraction module: Acquires an ocular ultrasound image containing regions with multiple coexisting lesions, and inputs the ocular ultrasound image into a two-branch Bayesian feature extraction network to output a lesion semantic feature tensor and an observer divergence feature tensor;
Ambiguity-aware fusion module: Concatenates the lesion semantic feature tensor and the observer divergence feature tensor along the channel dimension to form a joint ambiguity-aware feature tensor, and outputs a reweighted ambiguity-aware feature tensor;
Bayesian classification module: Inputs the reweighted ambiguity-aware feature tensor into the pixel-wise Bayesian classification head, which outputs the lesion probability map; simultaneously calculates the prediction variance map of the lesion probability map at each pixel position, and uses the prediction variance map as the observer divergence map;
Disputed region identification module: Performs dual-image overlay encoding on the lesion probability map and the observer divergence map to generate an uncertainty overlay heat map, and marks the connected regions in the uncertainty overlay heat map that are higher than a preset uncertainty threshold as dispute candidate regions, thus obtaining a dispute region mask map;
Review suggestion generation module: Performs pixel-level multiplication of the dispute region mask map and the original input ocular ultrasound image to extract the set of disputed-region image blocks, and inputs this set into the cross-modal review suggestion network, which outputs structured review suggestion labels;
Visualization module: The lesion probability map, the observer divergence map, and the structured review suggestion label are visualized in a multi-map linkage to generate a color overlay display map with an uncertainty scale. The color overlay display map is then sent back to the doctor's workstation to complete the eye image classification based on deep learning.
[0015] The technological advancements achieved by this invention compared to existing technologies are as follows: This invention introduces an observer divergence modeling mechanism during lesion prediction to explicitly characterize the model's prediction uncertainty in the spatial dimension. This enables the model to distinguish between predicted stable regions and regions with significant prediction divergence, avoiding overfitting to controversial regions in cases of multiple overlapping lesions or blurred boundaries. This improves the robustness of classification results and effectively suppresses the problem of model overfitting to controversial regions in scenarios with multiple coexisting lesions. By constructing an observer divergence map and spatially locating high-uncertainty regions, this invention prevents the system from directly using regions with significant prediction divergence as the basis for deterministic classification in scenarios with multiple coexisting lesions or similar morphologies, thereby effectively reducing the risk of erroneous decisions. Using a mask map of controversial regions as a constraint, this invention generates review suggestions only for regions with significant prediction divergence, establishing a clear spatial correspondence between review suggestions and specific controversial regions, thus enhancing the rationality and credibility of clinical review decisions.
[0016] This invention organically combines the quantification of predictive uncertainty, the localization of disputed regions, and the generation of structured follow-up recommendations. This enables the system to proactively provide targeted clinical follow-up pathways while identifying sources of predicted risk, overcoming the lack of clear triggering criteria for follow-up recommendations in existing methods. By visually presenting lesion probability information, observer disagreement information, and follow-up recommendations through multi-graph linkage, this invention enhances the interpretability of the model's output, allowing physicians to intuitively understand the model's judgment basis and sources of uncertainty, thus increasing trust in clinical applications. This invention generates follow-up recommendations based on internal uncertainty modeling and a disputed region-driven strategy, eliminating the need for additional manual rules or multiple rounds of annotation. This effectively improves the system's clinical applicability and promotional value in complex multi-lesion scenarios.
[0017] In summary, this invention solves the technical problems in the prior art where models tend to overfit to controversial areas and follow-up recommendations lack credible basis in scenarios with multiple coexisting lesions. It can explicitly characterize the uncertainty of model predictions, accurately locate spatial areas where predictions differ, and generate credible clinical follow-up recommendations corresponding to specific controversial areas, thereby improving the reliability and clinical application value of ocular image classification results.
Attached Figure Description
[0018] The accompanying drawings are provided to further illustrate the invention and form part of the specification. They are used together with the embodiments of the invention to explain the invention and do not constitute a limitation thereof.
[0019] In the attached drawings: Figure 1 is a flowchart of the present invention.
Detailed Implementation
[0020] The following specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments. The embodiments of the present invention will now be described with reference to the accompanying drawings.
[0021] Example 1
This embodiment discloses a deep learning-based eye image classification method, including the following steps:
Step 1: Acquire an ocular ultrasound image containing regions with multiple coexisting lesions, and input the ocular ultrasound image into a two-branch Bayesian feature extraction network to output a lesion semantic feature tensor and an observer divergence feature tensor;
Step 2: Concatenate the lesion semantic feature tensor and the observer divergence feature tensor along the channel dimension to form a joint ambiguity-aware feature tensor, and output the reweighted ambiguity-aware feature tensor;
Step 3: Input the reweighted ambiguity-aware feature tensor into the pixel-wise Bayesian classification head, which outputs the lesion probability map; simultaneously calculate the prediction variance map of the lesion probability map at each pixel position, and use the prediction variance map as the observer divergence map;
Step 4: Perform dual-image overlay encoding on the lesion probability map and the observer divergence map to generate an uncertainty overlay heat map, and mark the connected regions in the uncertainty overlay heat map that are higher than the preset uncertainty threshold as dispute candidate regions to obtain a dispute region mask map;
Step 5: Perform pixel-level multiplication of the dispute region mask map and the original input ocular ultrasound image to extract the set of disputed-region image blocks, then input this set into the cross-modal review suggestion network, which outputs structured review suggestion labels;
Step 6: Perform multi-image linkage visualization of the lesion probability map, the observer divergence map, and the structured review suggestion label to generate a color overlay display map with an uncertainty scale, and send the color overlay display map back to the doctor's workstation to complete the deep learning-based eye image classification.
[0022] Specifically, step 1 includes: The purpose of step 1 is to use a two-branch Bayesian feature extraction network to explicitly model two representations of an ocular ultrasound image that contains multiple lesions, occluded boundaries, and significant discrepancies in manual annotation: the uncertain semantic representation of the lesions themselves (the lesion semantic feature tensor) and the uncertain representation caused by annotation ambiguity and differences in observers' subjective judgment (the observer divergence feature tensor). Both are modeled in the same probability space and under the same weight sampling conditions, which provides a unified probabilistic basis for subsequent ambiguity-aware fusion, Bayesian classification, and identification of disputed regions.
[0023] In step 1, an ocular ultrasound image containing multiple coexisting lesions is first acquired. This ocular ultrasound image is a two-dimensional B-mode grayscale image, which may simultaneously contain multiple lesion structures such as vitreous opacities, shallow retinal detachment, and choroidal thickening, with different lesions exhibiting spatial overlap, occlusion, or weak boundary features. Before being input into the two-branch Bayesian feature extraction network, the ocular ultrasound image undergoes only size unification and intensity normalization to ensure the numerical stability of subsequent Bayesian convolutional layers; no artificial prior segmentation or lesion enhancement processing is performed on the image to avoid introducing additional deterministic bias. The preprocessed ocular ultrasound image, as a single input instance, is fed in parallel into the first and second branches of the two-branch Bayesian feature extraction network.
[0024] The bi-branch Bayesian feature extraction network consists of a first branch and a second branch. The two branches maintain structural isomorphism in terms of network depth, kernel size, and feature map resolution variation paths, but have a clear division of labor in terms of feature representation objectives. The core commonalities of the two branches are: both are composed of multiple cascaded Bayesian convolutional layers; the kernel weights of each Bayesian convolutional layer are not represented as fixed values, but as parameterized probability distributions; and during each forward propagation, the weight distributions of all Bayesian convolutional layers are simultaneously sampled using Monte Carlo methods.
[0025] The above design ensures that the first branch and the second branch use completely identical weight sampling instances in the same forward propagation, thereby generating the two types of feature tensors under the same probability space and the same source of randomness.
[0026] In step 1, the first branch is used to generate the lesion semantic feature tensor. Specifically, after the ocular ultrasound image is input to the first branch, it passes through multiple Bayesian convolutional layers in sequence. In each Bayesian convolutional layer, the kernel weights $\theta$ follow a parameterized probability distribution, for example $\theta \sim \mathcal{N}(\mu_\theta, \sigma_\theta^{2})$, where $\mu_\theta$ and $\sigma_\theta$ are learnable parameters.
[0027] In a single forward propagation, a specific weight instance $\hat{\theta}$ is obtained by one Monte Carlo draw from the weight distribution, and the convolution operation is completed with $\hat{\theta}$. Multiple Bayesian convolutional layers are stacked sequentially to achieve the following functions:
- Local structural information, such as lesion edges and abrupt changes in echo intensity, is extracted in the shallow layers;
- The distribution of lesion texture, morphological continuity, and spatial context are gradually integrated in the middle and upper layers;
- The inherent uncertainty of lesion semantics is characterized in a probabilistic sense.
[0028] Finally, a multi-channel feature tensor is formed at the output of the first branch. The feature tensor corresponds to the input image in the spatial dimension and encodes the high-dimensional semantic representation of the lesion in the channel dimension. This feature tensor is defined as the lesion semantic feature tensor.
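The weight sampling performed by a Bayesian convolutional layer can be sketched as follows. This is a minimal single-channel illustration that assumes a Gaussian weight distribution and draws samples with the reparameterization form `mu + softplus(rho) * eps`. The specification gives a parameterized distribution only as an example, so the Gaussian form, the softplus scale, and the names `sample_kernel` and `conv2d_same` are assumptions of this sketch.

```python
import numpy as np

def sample_kernel(mu, rho, rng):
    """Draw one kernel instance: mu + softplus(rho) * eps, with eps ~ N(0, 1)."""
    sigma = np.log1p(np.exp(rho))          # softplus keeps the scale positive
    return mu + sigma * rng.standard_normal(mu.shape)

def conv2d_same(image, kernel):
    """Single-channel 'same' cross-correlation with zero padding (odd kernel size)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kh, kw))
    return np.einsum("ijkl,kl->ij", windows, kernel)

rng = np.random.default_rng(0)
mu = np.zeros((3, 3))
mu[1, 1] = 1.0                             # hypothetical mean kernel (identity)
rho = np.full((3, 3), -3.0)                # hypothetical scale parameter
image = rng.random((6, 6))

# Each forward pass uses a freshly sampled kernel, so outputs differ per pass.
outputs = np.stack([conv2d_same(image, sample_kernel(mu, rho, rng))
                    for _ in range(20)])
```

Passing one shared random generator (or one shared noise draw) to both branches is one way to realize the requirement that the two branches use the same source of randomness in a given forward propagation.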
[0029] Parallel to the first branch, the ocular ultrasound image is simultaneously input to the second branch of the two-branch Bayesian feature extraction network to generate the observer divergence feature tensor. The second branch also consists of multiple Bayesian convolutional layers and shares the same set of weight distribution sampling results with the first branch in each forward propagation. The difference is that the feature modeling target of the second branch is not the deterministic semantics of the lesion, but rather the response instability pattern of the same pixel under multiple random weight samplings. Specifically, this is achieved as follows:
- Under the same weight sampling conditions, the second branch focuses on preserving, in each layer, the sensitivity to local echo changes, weak-contrast regions, and structurally blurred boundaries;
- As the network deepens, the ability to represent differences in response at the same location under probabilistic perturbations is gradually amplified;
- The implicit discrepancies in the judgments of multiple potential observers are encoded along the channel dimension.
[0030] Finally, the second branch outputs a feature tensor that is spatially consistent with the lesion semantic feature tensor and independently encoded in the channel dimension. This feature tensor is defined as the observer divergence feature tensor.
[0031] In a complete forward propagation process:
- The first branch outputs the lesion semantic feature tensor;
- The second branch outputs the observer divergence feature tensor;
- Both are generated from the same ocular ultrasound image input, the same weight distribution sampling instance, and the same probability space.
This completes step 1: without assuming that manual annotations are consistent, the lesion semantic representation and the observer divergence representation are obtained simultaneously, laying a unified probabilistic feature foundation for subsequent ambiguity-aware fusion and Bayesian classification.
[0032] Specifically, step 2 includes: Step 2 takes as input the lesion semantic feature tensor and the observer divergence feature tensor output synchronously in step 1. Its core purpose is to explicitly couple the semantic information of the lesions themselves with the observer divergence information in a unified probability space, through feature-level fusion and an attention reweighting mechanism. The result is a reweighted ambiguity-aware feature tensor that is sensitive to both the coexistence of multiple lesions and label ambiguity, providing a feature representation that balances discrimination and uncertainty for the subsequent pixel-by-pixel Bayesian classification.
[0033] In step 1, the lesion semantic feature tensor and the observer divergence feature tensor, which have consistent spatial dimensions, have been obtained. They are denoted as $F_{\mathrm{les}} \in \mathbb{R}^{C_{\mathrm{les}} \times H \times W}$ and $F_{\mathrm{obs}} \in \mathbb{R}^{C_{\mathrm{obs}} \times H \times W}$, where $H$ and $W$ are the spatial dimensions, $C_{\mathrm{les}}$ is the number of channels in the lesion semantic feature tensor, and $C_{\mathrm{obs}}$ is the number of channels in the observer divergence feature tensor.
[0034] In step 2, the two feature tensors are first concatenated along the channel dimension, strictly following the order described in the claims: all channels of the lesion semantic feature tensor are placed first, followed by all channels of the observer divergence feature tensor, resulting in the joint ambiguity-aware feature tensor. The joint ambiguity-aware feature tensor satisfies $F_{\mathrm{joint}} = \mathrm{Concat}(F_{\mathrm{les}}, F_{\mathrm{obs}}) \in \mathbb{R}^{(C_{\mathrm{les}} + C_{\mathrm{obs}}) \times H \times W}$. This concatenation enables the network to perceive lesion structure semantics and the observer divergence pattern simultaneously within the same feature space, giving the subsequent attention mechanism an explicit basis for distinguishing deterministic lesion features from highly divergent features.
[0035] The joint ambiguity-aware feature tensor is then input into the channel attention submodule, which is used to adaptively evaluate the contribution of various features to the current input image in the channel dimension.
[0036] 1. Global statistics and channel compression
First, global average pooling is performed over the spatial dimensions of the joint ambiguity-aware feature tensor $F_{\mathrm{joint}} \in \mathbb{R}^{C \times H \times W}$ to obtain the channel-level global description vector $z \in \mathbb{R}^{C}$, with $z_c = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W} F_{\mathrm{joint}}(c, i, j)$. This global description vector probabilistically reflects the overall activation intensity of each channel in the current image.
[0037] 2. Nonlinear channel weight modeling
The global description vector is processed sequentially through fully connected compression, ReLU activation, fully connected recovery, and Sigmoid activation to generate a channel weight vector $w = \sigma\left(\mathbf{W}_2\,\delta(\mathbf{W}_1 z)\right)$, where $\mathbf{W}_1$ and $\mathbf{W}_2$ are learnable parameters, $\delta(\cdot)$ represents the ReLU activation function, and $\sigma(\cdot)$ represents the Sigmoid activation function. The dimension of the channel weight vector is exactly the same as the number of channels in the joint ambiguity-aware feature tensor, and it characterizes the importance of the different semantic and divergence channels under the current input conditions.
[0038] 3. Channel-level reweighting
The channel weight vector is multiplied channel by channel with the joint ambiguity-aware feature tensor to obtain the channel reweighted feature tensor $F_{\mathrm{ch}}(c, i, j) = w_c \cdot F_{\mathrm{joint}}(c, i, j)$. Through this process, the network can suppress channels that contribute little to the current sample or are noisy, while strengthening feature channels that are closely related to lesion identification or divergence characterization.
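The concatenation and channel attention steps above can be sketched in NumPy as follows. The sketch omits bias terms, and the channel counts, the reduction to 2 hidden units, and the randomly initialized matrices `w1` and `w2` are hypothetical values chosen only to make the example runnable.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(f_joint, w1, w2):
    """f_joint: (C, H, W); w1: (C_hidden, C); w2: (C, C_hidden).

    Returns the channel weight vector (C,) and the channel reweighted tensor.
    """
    z = f_joint.mean(axis=(1, 2))          # global average pooling -> (C,)
    hidden = np.maximum(w1 @ z, 0.0)       # fully connected compression + ReLU
    weights = sigmoid(w2 @ hidden)         # fully connected recovery + Sigmoid
    return weights, f_joint * weights[:, None, None]

rng = np.random.default_rng(0)
f_les = rng.random((4, 5, 5))              # hypothetical lesion semantic tensor
f_obs = rng.random((4, 5, 5))              # hypothetical observer divergence tensor
f_joint = np.concatenate([f_les, f_obs], axis=0)   # lesion channels first
w1 = rng.normal(size=(2, 8))
w2 = rng.normal(size=(8, 2))
weights, f_ch = channel_attention(f_joint, w1, w2)
```

Because the weight vector has one entry per channel, the first four entries scale the lesion semantic channels and the last four scale the observer divergence channels, which is what the fixed concatenation order makes possible.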
[0039] The channel-weighted feature tensor is further input into the spatial attention submodule to locate key areas in the spatial dimension where lesions may coexist or where observers may disagree.
[0040] 1. Construction of the spatial saliency map. First, average pooling and max pooling are performed on the channel-reweighted feature tensor $F_{\mathrm{ch}}$ along the channel dimension, and the two results are stacked along the channel dimension to form a two-channel spatial description tensor $D$: $D = [\mathrm{AvgPool}_{c}(F_{\mathrm{ch}});\ \mathrm{MaxPool}_{c}(F_{\mathrm{ch}})] \in \mathbb{R}^{2 \times H \times W}$. This operation preserves both the overall response trend and local extreme value information, which helps highlight weak boundaries and high-uncertainty regions.
[0041] 2. Spatial weight map generation. The spatial description tensor $D$ is subjected to a 7×7 convolution and activated by a Sigmoid function to generate the spatial weight map $M_{\mathrm{sp}}$: $M_{\mathrm{sp}} = \sigma(\mathrm{Conv}_{7 \times 7}(D)) \in \mathbb{R}^{H \times W}$. The spatial weight map has exactly the same spatial dimensions as the channel-reweighted feature tensor.
[0042] 3. Spatial-level reweighting. The spatial weight map is multiplied pixel by pixel with the channel-reweighted feature tensor to obtain the final reweighted ambiguity-aware feature tensor $F_{\mathrm{rw}}$: $F_{\mathrm{rw}}(c, i, j) = M_{\mathrm{sp}}(i, j) \cdot F_{\mathrm{ch}}(c, i, j)$. Through the cascaded reweighting of channel attention and spatial attention described above, step 2 finally outputs the reweighted ambiguity-aware feature tensor. This feature tensor distinguishes the importance of lesion semantic information and observer disagreement information at the channel level, and focuses on regions of multiple lesion coexistence, blurred boundaries, and high disagreement at the spatial level, providing highly discriminative and uncertainty-sensitive input features for the pixel-wise Bayesian classification head in step 3.
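The spatial attention submodule can be sketched in the same way. The helper `conv2d_same`, the zero padding, the absence of a bias term, and the random 7×7 kernel are assumptions made for illustration; in practice the kernel would be learned.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Cross-correlate a (Cin, H, W) input with one (Cin, k, k) kernel, zero padded."""
    _, h, w = x.shape
    k = kernel.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((h, w))
    for di in range(k):
        for dj in range(k):
            # Accumulate the contribution of kernel tap (di, dj) over all input channels.
            out += np.einsum("c,chw->hw", kernel[:, di, dj], xp[:, di:di + h, dj:dj + w])
    return out

def spatial_attention(feat, kernel):
    """feat: (C, H, W) channel-reweighted tensor; kernel: (2, 7, 7)."""
    desc = np.stack([feat.mean(axis=0), feat.max(axis=0)])          # (2, H, W) description
    weight_map = 1.0 / (1.0 + np.exp(-conv2d_same(desc, kernel)))   # 7x7 conv + Sigmoid
    return feat * weight_map[None], weight_map                      # pixel-by-pixel product

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))
kernel = rng.standard_normal((2, 7, 7)) * 0.05
out, weight_map = spatial_attention(feat, kernel)
print(out.shape, weight_map.shape)
```

Because the same weight map is broadcast over all channels, the spatial dimensions of the output match those of the input exactly, as the text requires.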
[0043] Specifically, step 3 includes: Step 3 takes the reweighted ambiguity-aware feature tensor output from step 2 as its sole input. Its core technical objective is to obtain, at the pixel scale and through Bayesian modeling and Monte Carlo sampling, a probabilistic judgment of whether each pixel belongs to a lesion (the lesion probability map) and a prediction variance representation that jointly reflects model uncertainty and potential observer disagreement (the observer disagreement map). This prevents the single, uninterpretable hard decision that a traditional deterministic classification network would give in disputed areas.
[0044] The reweighted ambiguity-aware feature tensor output in step 2 is denoted as $F_{\mathrm{rw}} \in \mathbb{R}^{C \times H \times W}$, where $H$ and $W$ are the spatial dimensions and $C$ is the number of feature channels after reweighting by channel and spatial attention.
[0045] The reweighted ambiguity-aware feature tensor is input to a pixel-wise Bayesian classification head. This classification head consists of at least one Bayesian convolutional layer, whose kernel weights are also represented as parameterized probability distributions, and Monte Carlo sampling is performed in each forward propagation. The output of the pixel-wise Bayesian classification head maintains the same spatial dimensions as the input feature tensor to ensure that the classification results correspond one-to-one with the original ocular ultrasound image at the pixel level.
[0046] In step 3, the pixel-wise Bayesian classification head executes $T$ forward propagations. In the $t$-th forward propagation ($t = 1, \dots, T$), an independent Monte Carlo sample is drawn from the weight distributions of all Bayesian convolution kernels in the classification head to obtain the weight instance $\theta^{(t)}$, and the pixel-level classification logit (logical value) tensor is calculated from it: $Z^{(t)} = f(F_{\mathrm{rw}};\ \theta^{(t)}) \in \mathbb{R}^{K \times H \times W}$, where $F_{\mathrm{rw}}$ is the reweighted ambiguity-aware feature tensor, $f(\cdot)$ represents the forward mapping of the pixel-wise Bayesian classification head, and $K$ represents the number of categories.
[0047] Each pixel-level classification logit tensor has exactly the same spatial dimensions as the reweighted ambiguity-aware feature tensor, so the outputs at the same pixel position under different samples can be statistically analyzed directly.
[0048] Each pixel-level classification logit tensor generated during forward propagation is fed into a shared Softmax operator, which maps the logits to probability values and yields $T$ pixel-level classification probability tensors: $P^{(t)}_{k}(i, j) = \exp\big(Z^{(t)}_{k}(i, j)\big) \big/ \sum_{k'=1}^{K} \exp\big(Z^{(t)}_{k'}(i, j)\big)$, where $(i, j)$ indicates the spatial location of a pixel and $k$ is the category index.
[0049] Therefore, under $T$ Monte Carlo samplings, a set of probability predictions is obtained for each pixel position: $\{P^{(t)}(i, j)\}_{t=1}^{T}$.
[0050] To obtain a stable estimate of lesion presence, the $T$ pixel-level classification probability tensors are arithmetically averaged along the sampling dimension to obtain the lesion probability map: $\bar{P}(i, j) = \frac{1}{T} \sum_{t=1}^{T} P^{(t)}_{\mathrm{les}}(i, j)$, where the subscript $\mathrm{les}$ denotes the lesion category and $\bar{P}(i, j)$ represents the average predicted probability that pixel $(i, j)$ belongs to the lesion category. The lesion probability map has exactly the same spatial dimensions as the input ocular ultrasound image, and the value at each pixel position is continuously distributed in [0,1], characterizing the probability intensity of lesion presence.
[0051] Based on the $T$ pixel-level classification probability tensors, the variance of the prediction results at the same pixel location along the sampling dimension is further calculated to generate the prediction variance map: $V(i, j) = \frac{1}{T} \sum_{t=1}^{T} \big(P^{(t)}_{\mathrm{les}}(i, j) - \bar{P}(i, j)\big)^{2}$. The prediction variance map has exactly the same spatial dimensions as the lesion probability map, and the variance value at each pixel position directly reflects how much the prediction at that position fluctuates under multiple Bayesian samplings.
[0052] In this invention, the prediction variance is not additionally normalized or reinterpreted; instead, the prediction variance map is directly used as the observer disagreement map and output synchronously. Technically, a pixel position with a lower variance value indicates that the model's prediction is stable in that region, corresponding to high consistency of observer judgment; a pixel position with a higher variance value indicates that the prediction fluctuates significantly in the sampling space, corresponding to potential observer disagreement or a disputed lesion boundary area.
[0053] Through multiple Monte Carlo samplings and statistical modeling with the pixel-wise Bayesian classification head, step 3 simultaneously outputs: the lesion probability map, used to depict the probability distribution of lesions at the pixel level; and the observer disagreement map, used to characterize the prediction uncertainty at each pixel location and the degree of potential observer disagreement.
[0054] Both are generated on the same spatial scale and within the same probabilistic modeling framework, providing a direct and quantifiable input basis for uncertainty superposition coding and disputed area identification in the subsequent step 4.
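A minimal NumPy sketch of the Monte Carlo procedure of step 3 is given below. It assumes a single 1×1 Bayesian convolution whose weights follow independent Gaussians with parameters `w_mu` and `w_sigma`, a lesion class index of 1, and a 1/T variance normalization; none of these specifics are stated in the text, and the function name `mc_bayesian_head` is hypothetical.

```python
import numpy as np

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mc_bayesian_head(feat, w_mu, w_sigma, T, rng, lesion_class=1):
    """feat: (C, H, W); w_mu, w_sigma: (K, C) Gaussian parameters (assumption)."""
    probs = []
    for _ in range(T):
        # One independent Monte Carlo sample of the kernel weights.
        w = w_mu + w_sigma * rng.standard_normal(w_mu.shape)
        logits = np.einsum("kc,chw->khw", w, feat)        # pixel-level logits, (K, H, W)
        probs.append(softmax(logits, axis=0)[lesion_class])
    probs = np.stack(probs)                               # (T, H, W)
    # Mean over samples -> lesion probability map; variance -> observer disagreement map.
    return probs.mean(axis=0), probs.var(axis=0)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))
w_mu = rng.standard_normal((2, 8))
w_sigma = np.full((2, 8), 0.3)
lesion_prob, disagreement = mc_bayesian_head(feat, w_mu, w_sigma, T=20, rng=rng)
print(lesion_prob.shape, disagreement.shape)
```

Setting `w_sigma` to zero collapses the head to a deterministic classifier, in which case the variance map is zero everywhere; this is the degenerate case that the Bayesian formulation is meant to avoid.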
[0055] Specifically, step 4 includes: Step 4 takes the lesion probability map and observer disagreement map output synchronously in Step 3 as input. Its core technical purpose is to construct an uncertainty superposition heatmap that can simultaneously reflect the intensity of lesion probability and the degree of disagreement by spatially aligning and jointly encoding the probability of lesion existence and the uncertainty of prediction / observer disagreement. Based on this, it automatically identifies the controversial areas most likely to cause ambiguity in clinical judgment for subsequent review suggestions and visualization.
[0056] In step 3, the lesion probability map and the observer disagreement map, which have identical spatial dimensions, have been obtained and are denoted $\bar{P} \in [0, 1]^{H \times W}$ and $V \in \mathbb{R}^{H \times W}$ respectively, where $\bar{P}(i, j)$ represents the average predicted probability that pixel $(i, j)$ belongs to a lesion, and $V(i, j)$ represents the prediction variance at pixel $(i, j)$, corresponding to the degree of observer disagreement.
[0057] In step 4, the two maps are first stacked pixel by pixel at the same spatial positions to form a dual-channel overlay image: $S = [\bar{P};\ V] \in \mathbb{R}^{2 \times H \times W}$. The dual-channel overlay image maintains the pixel alignment of the original ocular ultrasound image in the spatial dimensions, and explicitly preserves probability and disagreement information in the channel dimension.
[0058] To further integrate the nonlinear relationship between lesion probability and observer discrepancy, a convolutional encoding operation is performed on the dual-channel overlay image. Specifically, the dual-channel overlay image is sequentially input into: 1. A single 3×3 convolution operation is used to jointly model probability changes and divergence changes within a local neighborhood; 2. Batch normalization operation to stabilize the numerical distribution at different pixel locations; 3. ReLU activation function to enhance response to regions of high uncertainty.
[0059] After the above operations, the uncertainty superposition heatmap is output, denoted as $U = \mathrm{ReLU}(\mathrm{BN}(\mathrm{Conv}_{3 \times 3}(S))) \in \mathbb{R}^{H \times W}$, where $S$ is the dual-channel overlay image. The uncertainty superposition heatmap comprehensively reflects the following at each pixel location: areas with a high lesion probability and significant disagreement; areas with moderate lesion probability but unstable prediction and ambiguous boundaries; and potentially disputed areas where the coexistence of multiple lesions may lead to inconsistent model assessments.
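The dual-image overlay encoding (stacking, 3×3 convolution, batch normalization, ReLU) can be sketched as follows. Batch normalization is written in its inference form with scalar statistics, and the kernel values are random placeholders; these choices and the function name `encode_uncertainty` are assumptions for illustration.

```python
import numpy as np

def encode_uncertainty(prob_map, var_map, kernel, bn_mean=0.0, bn_var=1.0,
                       gamma=1.0, beta=0.0, eps=1e-5):
    """prob_map, var_map: (H, W); kernel: (2, 3, 3). Returns an (H, W) heatmap."""
    stacked = np.stack([prob_map, var_map])                  # dual-channel overlay image
    h, w = prob_map.shape
    padded = np.pad(stacked, ((0, 0), (1, 1), (1, 1)))       # zero padding keeps H x W
    conv = np.zeros((h, w))
    for di in range(3):
        for dj in range(3):
            conv += np.einsum("c,chw->hw", kernel[:, di, dj],
                              padded[:, di:di + h, dj:dj + w])
    # Inference-style batch normalization with fixed statistics (assumption).
    normed = gamma * (conv - bn_mean) / np.sqrt(bn_var + eps) + beta
    return np.maximum(normed, 0.0)                           # ReLU

rng = np.random.default_rng(0)
prob_map = rng.random((16, 16))
var_map = rng.random((16, 16)) * 0.05
kernel = rng.standard_normal((2, 3, 3)) * 0.3
heatmap = encode_uncertainty(prob_map, var_map, kernel)
print(heatmap.shape, float(heatmap.min()))
```

The output is a single-channel map aligned pixel for pixel with the two input maps, which is what the subsequent thresholding step relies on.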
[0060] After obtaining the uncertainty superposition heatmap $U$, a preset uncertainty threshold $\tau$ is introduced and the heatmap is compared with it pixel by pixel. For any pixel $(i, j)$, the rule for determining disputed candidates is: $B(i, j) = \begin{cases} 1, & U(i, j) > \tau \\ 0, & \text{otherwise} \end{cases}$, where $B$ is the binary label map of the disputed candidate region; pixels with a value of 1 are labeled as disputed candidate pixels. The region consisting of all disputed candidate pixels constitutes the disputed candidate region, which is usually concentrated at lesion boundaries, echo transition zones, or areas where multiple lesions overlap.
[0061] To avoid interference from scattered noisy pixels in subsequent analysis, an 8-adjacent connected component analysis is performed on the disputed candidate region. The specific processing steps include: 1. Under the 8-adjacency rule, perform connected component partitioning on all disputed candidate pixels; 2. Count the number of pixels in each connected component; 3. Retain connected regions with a pixel count greater than the preset minimum area threshold, and filter out isolated regions with too small an area.
[0062] After the above processing, the final disputed region mask map is generated, denoted as $M_{\mathrm{d}} \in \{0, 1\}^{H \times W}$, where $M_{\mathrm{d}}(i, j) = 1$ at pixel positions within the disputed area and $M_{\mathrm{d}}(i, j) = 0$ at undisputed pixel positions.
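Thresholding followed by 8-adjacency connected component filtering can be sketched with a breadth-first search. The function name `disputed_mask` and the toy heatmap are hypothetical; the strict comparisons (`>` for both thresholds) follow the wording "higher than" and "greater than" in the text.

```python
from collections import deque
import numpy as np

def disputed_mask(heatmap, threshold, min_area):
    """Binary mask of 8-connected regions above `threshold` with more than `min_area` pixels."""
    candidate = heatmap > threshold                  # disputed candidate pixels
    h, w = candidate.shape
    visited = np.zeros((h, w), dtype=bool)
    mask = np.zeros((h, w), dtype=np.uint8)
    for si in range(h):
        for sj in range(w):
            if not candidate[si, sj] or visited[si, sj]:
                continue
            component, queue = [], deque([(si, sj)])
            visited[si, sj] = True
            while queue:                             # breadth-first flood fill
                i, j = queue.popleft()
                component.append((i, j))
                for di in (-1, 0, 1):                # 8-adjacency: all surrounding pixels
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < h and 0 <= nj < w
                                and candidate[ni, nj] and not visited[ni, nj]):
                            visited[ni, nj] = True
                            queue.append((ni, nj))
            if len(component) > min_area:            # keep only sufficiently large regions
                for i, j in component:
                    mask[i, j] = 1
    return mask

heat = np.zeros((6, 6))
for pos in [(0, 0), (1, 1), (1, 2), (2, 3)]:         # diagonally linked blob of 4 pixels
    heat[pos] = 0.9
heat[5, 5] = 0.9                                     # isolated noisy pixel
mask = disputed_mask(heat, threshold=0.5, min_area=2)
print(int(mask.sum()), int(mask[5, 5]))              # prints "4 0"
```

Under 4-adjacency the blob above would split into three fragments and be discarded; 8-adjacency keeps diagonal boundary pixels together, which suits thin, oblique lesion boundaries.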
[0063] Through the dual-image overlay coding and spatial connectivity analysis in step 4, two outputs are finally obtained: the uncertainty superposition heatmap, which continuously characterizes the joint distribution of lesion probability and observer disagreement; and the disputed region mask map, which clearly locates the spatial regions that are most ambiguous in clinical judgment and most in need of re-examination or further imaging support. The disputed region mask map is used in subsequent step 5 to extract disputed region image patches, providing accurate input for the cross-modal review suggestion network.
[0064] Specifically, step 5 includes: Step 5 uses the disputed area mask generated in Step 4 as the core guiding information, combined with the initially input ocular ultrasound image. Its technical purpose is to accurately extract local areas where the model and the observer have a high concentration of disagreement from the entire ocular ultrasound image, and automatically generate structured review suggestion labels based on the image features of these disputed areas, thereby upgrading the model output from result prompts to clinical decision support.
[0065] In step 4, a disputed region mask map with spatial dimensions consistent with the original ocular ultrasound image has been obtained, denoted as $M_{\mathrm{d}}$. Let the original input ocular ultrasound image be denoted as $I \in \mathbb{R}^{H \times W}$. In step 5, pixel-level multiplication is first performed on the disputed region mask map and the original ocular ultrasound image: $I_{\mathrm{d}} = M_{\mathrm{d}} \odot I$. This generates the grayscale image of the disputed area, $I_{\mathrm{d}}$. Its technical effect is that the original echo grayscale information within the disputed area is fully preserved, while the pixel values of non-disputed areas are uniformly set to zero to avoid interference from irrelevant background in subsequent analysis.
[0066] To transform the disputed region information into a local input suitable for network processing, a sliding cropping operation with a fixed step size is performed on the grayscale image of the disputed region. Specifically, sliding cropping is performed with a preset window size and a fixed step size to obtain multiple disputed region image patches of uniform size, forming a set of disputed region image patches, denoted as $\{p_n\}_{n=1}^{N}$. All disputed region image patches $p_n$ have the same spatial dimensions; each patch contains at least a portion of a pixel region with a mask value of 1; and each patch carries its original coordinate information in the original ocular ultrasound image, which is used for subsequent result backtracking and visual localization. This processing transforms irregularly shaped disputed regions into structurally unified and semantically focused local image representations.
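The masking and sliding crop can be illustrated as below. The function name `extract_disputed_patches`, the window size, and the stride are hypothetical values chosen for the example; windows that do not overlap the mask are skipped so that every retained patch contains at least one mask pixel, as the text requires.

```python
import numpy as np

def extract_disputed_patches(image, mask, window, stride):
    """Return the masked image and a list of ((top, left), patch) pairs."""
    disputed = image * mask                           # zero out non-disputed pixels
    h, w = image.shape
    patches = []
    for top in range(0, h - window + 1, stride):
        for left in range(0, w - window + 1, stride):
            if mask[top:top + window, left:left + window].any():
                # Keep the top-left coordinate so results can be traced back to the image.
                patches.append(((top, left),
                                disputed[top:top + window, left:left + window]))
    return disputed, patches

image = np.arange(1, 65, dtype=float).reshape(8, 8)   # toy 8 x 8 "ultrasound" image
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:4, 2:4] = 1                                    # one small disputed region
disputed, patches = extract_disputed_patches(image, mask, window=4, stride=4)
print(len(patches), patches[0][0])                    # prints "1 (0, 0)"
```

A stride smaller than the window would yield overlapping patches and denser coverage of irregular regions, at the cost of more network evaluations.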
[0067] The set of image patches representing the disputed region is input as a whole into a cross-modal review suggestion network. This network is used to learn the correspondence between the imaging patterns of the disputed region and the clinical review methods. The cross-modal review suggestion network includes, in sequence: 1. Convolutional layer: Extracts local features for each disputed region image patch, focusing on modeling echo distribution, boundary blurring degree, and internal texture structure; 2. Global average pooling layer: aggregates spatial features to generate an overall representation vector for the current disputed region; 3. Fully connected classification layer: Maps the feature representation of the disputed region to the review suggestion category space.
[0068] For a disputed region image patch $p_n$, the predicted review suggestion can be formally represented as $\hat{y}_n = g(p_n)$, where $g(\cdot)$ represents the overall mapping function of the cross-modal review suggestion network.
[0069] The output of the cross-modal review suggestion network is a structured review suggestion label belonging to a predefined set of discrete categories, including but not limited to: a "recommend additional OCT examination" label, indicating that further confirmation of retinal or choroidal structures is needed through high-resolution tomography; a "recommend ophthalmoscopy re-examination" label, indicating that a direct or indirect fundus examination is needed to assist in the diagnosis; and a "continue follow-up observation" label, indicating that the current disputed area shows no clear indication for further examination and can be followed up dynamically.
[0070] When the set of image patches in the disputed area contains multiple image patches, a corresponding set of structured review suggestion labels can be formed based on the prediction results of all image patches, providing doctors with multi-regional and multi-angle review reference information.
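The three-stage structure of the review suggestion network (convolution, global average pooling, fully connected classification) can be sketched for a single patch as follows. The random weights, the number of filters, the ReLU after the convolution, and the argmax decision rule are assumptions; the label strings paraphrase the three categories named in the text.

```python
import numpy as np

LABELS = ("recommend additional OCT examination",
          "recommend ophthalmoscopy re-examination",
          "continue follow-up observation")

def suggest_review(patch, conv_kernels, fc_weight, fc_bias):
    """patch: (h, w); conv_kernels: (F, 3, 3); fc_weight: (3, F); fc_bias: (3,)."""
    h, w = patch.shape
    padded = np.pad(patch, 1)                                 # zero padding on all sides
    feats = np.zeros((conv_kernels.shape[0], h, w))
    for di in range(3):
        for dj in range(3):
            feats += conv_kernels[:, di, dj][:, None, None] * padded[di:di + h, dj:dj + w]
    feats = np.maximum(feats, 0.0)                            # ReLU (assumption)
    pooled = feats.mean(axis=(1, 2))                          # global average pooling -> (F,)
    scores = fc_weight @ pooled + fc_bias                     # fully connected classification
    return LABELS[int(np.argmax(scores))], scores

rng = np.random.default_rng(0)
patch = rng.random((16, 16))
conv_kernels = rng.standard_normal((4, 3, 3))
fc_weight = rng.standard_normal((3, 4))
fc_bias = np.zeros(3)
label, scores = suggest_review(patch, conv_kernels, fc_weight, fc_bias)
print(label)
```

Applying `suggest_review` to every patch in the set yields one label per patch, which corresponds to the set of structured review suggestion labels described above.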
[0071] Through the processing flow in step 5, the disputed areas identified in step 4 are accurately mapped back to the original ocular ultrasound images; highly ambiguous image regions are extracted in the form of local image blocks to avoid interference from overall image noise; based on the image features of the disputed areas, interpretable and actionable structured review suggestion labels are automatically generated. These structured review suggestion labels are then visualized in a multi-image linkage with the lesion probability map and observer divergence map in subsequent step 6, ultimately forming a complete decision support output for clinical use.
[0072] Specifically, step 6 includes: Step 6 is the result presentation and clinical interaction stage of the method of the present invention, which takes the following three types of core results generated in the upstream steps 1-5 as input: the lesion probability map generated in step 3; the observer divergence map generated in step 3; and the structured review suggestion label generated in step 5.
[0073] The technical objective of step 6 is to visualize the model's probability judgment of lesion existence, the uncertainty of observer disagreement, and the corresponding clinical review suggestions through spatially consistent and semantically complementary multi-image linkage, and to transmit them back to the doctor's workstation in a manner that conforms to the clinical workflow, thereby achieving an interpretable, traceable, and decision-supportable presentation of eye image classification results.
[0074] In step 6, the following three types of data are simultaneously input into the multi-image linkage visualization module: 1. Lesion probability map: used to characterize the probability distribution intensity of lesions in space.
[0075] 2. Observer disagreement map: used to characterize the instability of model predictions and the degree of potential observer disagreement.
[0076] 3. Structured review suggestion labels: these include the "recommend additional OCT examination" label, the "recommend ophthalmoscopy re-examination" label, and the "continue follow-up observation" label, which provide clear clinical action guidance for disputed areas.
[0077] The above data are strictly aligned in the spatial dimension to ensure that the subsequent visualization results can correspond to the original ocular ultrasound images at the pixel level.
[0078] First, pseudo-color mapping is performed on the lesion probability map. The multi-map visualization module uses a red-blue gradient to assign color values to the lesion probability map, where: Low-probability areas are mapped to cool colors (blue); High-probability areas are mapped to warm colors (red); The intermediate probability region corresponds to a continuous gradient color level.
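[0079] Formally: $I_{\mathrm{les}} = \Phi_{\mathrm{RB}}(\bar{P})$, where $\bar{P}$ is the lesion probability map, $\Phi_{\mathrm{RB}}(\cdot)$ represents the red-blue gradient color mapping function, and $I_{\mathrm{les}}$ is the lesion pseudo-color image.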
[0080] This mapping enables doctors to intuitively identify the spatial distribution and intensity variations of lesion probabilities.
[0081] While pseudo-color mapping is performed on the lesion probability map, an independent pseudo-color mapping is applied to the observer disagreement map. The multi-image linkage visualization module uses a yellow-green gradient to assign color values to the observer disagreement map, where: low-disagreement regions are mapped to green, indicating high prediction consistency; high-disagreement regions are mapped to yellow, indicating significant uncertainty and disputed judgments.
[0082] This mapping can be expressed as: $I_{\mathrm{dis}} = \Phi_{\mathrm{YG}}(V)$, where $V$ is the observer disagreement map, $\Phi_{\mathrm{YG}}(\cdot)$ represents the yellow-green gradient color mapping function, and $I_{\mathrm{dis}}$ is the disagreement pseudo-color image, used to explicitly indicate to doctors which areas require careful interpretation.
[0083] To present lesion probability and observer disagreement information in the same view, the lesion pseudo-color image and the disagreement pseudo-color image are overlaid and fused at the pixel level. Under the control of a preset transparency parameter $\alpha \in [0, 1]$, a color overlay display image is generated: $I_{\mathrm{ov}} = \alpha\, I_{\mathrm{les}} + (1 - \alpha)\, I_{\mathrm{dis}}$, where $I_{\mathrm{les}}$ and $I_{\mathrm{dis}}$ are the lesion and disagreement pseudo-color images. Through this transparency fusion mechanism, doctors can perceive the following simultaneously within a single display interface: the spatial location and probability intensity of lesions; and the reliability and degree of disagreement of the prediction at each location.
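The two pseudo-color mappings and the transparency fusion can be sketched as follows. The linear color ramps, the scaling of the disagreement map by its maximum for display purposes, and the convex-combination blend are assumptions; the text fixes only the color families and the existence of a preset transparency parameter.

```python
import numpy as np

def red_blue(prob_map):
    """Map probability 0 -> blue (0, 0, 1) and 1 -> red (1, 0, 0), linearly in between."""
    p = np.clip(prob_map, 0.0, 1.0)
    return np.stack([p, np.zeros_like(p), 1.0 - p], axis=-1)

def yellow_green(dis_map):
    """Map low disagreement -> green (0, 1, 0) and high -> yellow (1, 1, 0)."""
    peak = dis_map.max()
    # Display-only scaling by the maximum value (assumption).
    d = dis_map / peak if peak > 0 else np.zeros_like(dis_map)
    return np.stack([d, np.ones_like(d), np.zeros_like(d)], axis=-1)

def overlay(lesion_rgb, dis_rgb, alpha):
    """Pixel-level transparency fusion of the two pseudo-color images."""
    return alpha * lesion_rgb + (1.0 - alpha) * dis_rgb

prob = np.array([[0.0, 1.0]])
dis = np.array([[0.0, 0.02]])
lesion_rgb = red_blue(prob)
dis_rgb = yellow_green(dis)
blend = overlay(lesion_rgb, dis_rgb, alpha=0.6)
print(blend.shape)
```

With `alpha=0.6`, a low-probability, low-disagreement pixel blends blue with green, while a high-probability, high-disagreement pixel blends red with yellow, so both quantities remain distinguishable in one view.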
[0084] After generating the color overlay image, the structured follow-up recommendation labels output in step 5 are embedded in the preset display area of the color overlay image in text form. Preferably, the structured follow-up recommendation labels are embedded in the lower right corner of the color overlay image, and their content includes, but is not limited to: recommending additional OCT examination; recommending additional fundus examination; recommending continued follow-up observation. By presenting the follow-up recommendations synchronously with the image results, a direct transition from image interpretation to clinical action recommendations is achieved.
[0085] Finally, a color overlay image containing lesion probability information, observer disagreement information, and structured review recommendations is pushed to the doctor's workstation display window via DICOM write-back. This color overlay image is stored and displayed as a standard medical image object and can be directly integrated into existing ophthalmic image browsing and diagnostic workflows to complete the presentation of deep learning-based eye image classification results.
[0086] Step 6 enables the simultaneous display of lesion probability, uncertainty, and follow-up recommendations in a single view across multiple graphs; it explicitly presents the credibility of the model's predictions, avoiding the misuse of highly divergent areas as definitive conclusions; and it outputs results in a manner consistent with clinical practice, enhancing doctors' understanding and trust in the deep learning-assisted system.
[0087] This completes the workflow of a deep learning-based eye image classification method, from inputting ocular ultrasound images to outputting interpretable and executable clinical decision support.
[0088] Embodiment 2 This embodiment discloses a deep learning-based eye image classification system, including:
Feature extraction module: acquires ocular ultrasound images containing regions with multiple coexisting lesions, and inputs the ocular ultrasound images into a two-branch Bayesian feature extraction network to output lesion semantic feature tensors and observer divergence feature tensors;
Ambiguity-aware fusion module: concatenates the lesion semantic feature tensor and the observer divergence feature tensor along the channel dimension to form a joint ambiguity-aware feature tensor, and outputs a reweighted ambiguity-aware feature tensor;
Bayesian classification module: inputs the reweighted ambiguity-aware feature tensor into the pixel-wise Bayesian classification head, which outputs the lesion probability map; the prediction variance map of the lesion probability map at each pixel position is calculated simultaneously and used as the observer divergence map;
Disputed region identification module: performs dual-image superposition encoding on the lesion probability map and the observer divergence map to generate an uncertainty superposition heat map, and marks the connected regions in the uncertainty superposition heat map that are higher than a preset uncertainty threshold as disputed candidate regions, thus obtaining a disputed region mask map;
Review suggestion generation module: performs a pixel-level multiplication of the disputed region mask map and the original input ocular ultrasound image to extract the set of disputed region image blocks, and inputs this set into the cross-modal review suggestion network, which outputs structured review suggestion labels;
Visualization module: The lesion probability map, the observer divergence map, and the structured review suggestion label are visualized in a multi-map linkage to generate a color overlay display map with an uncertainty scale. The color overlay display map is then sent back to the doctor's workstation to complete the eye image classification based on deep learning.
[0089] The modules in Embodiment 2 are used to implement the functions in Embodiment 1. This embodiment can be implemented by a system including a processor and a memory. The memory stores computer program instructions, which, when executed by the processor, implement the deep learning-based eye image classification method according to Embodiment 1 of this application. The system also includes other components well known to those skilled in the art, such as a communication bus and a communication interface. Their settings and functions are known in the art and will not be described in detail here.
[0090] In this application, the aforementioned memory can be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus, or device. For example, the computer-readable storage medium can be any suitable storage medium, such as a magnetic or magneto-optical storage medium, Resistive Random Access Memory (RRAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Enhanced Dynamic Random Access Memory (EDRAM), High-Bandwidth Memory (HBM), Hybrid Memory Cube (HMC), etc., or any other medium that can be used to store the required information and can be accessed by an application, a module, or both. Any such computer storage medium can be part of a device, or accessible to or connected to a device. Any application or module described in this application can be implemented using computer-readable / executable instructions that can be stored or otherwise retained by such a computer-readable medium.
[0091] Finally, it should be noted that the above descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the claims of the present invention.
Claims
1. A deep learning-based eye image classification method, characterized in that it includes the following steps:
Step 1: Acquire ocular ultrasound images containing regions with multiple coexisting lesions, and input the ocular ultrasound images into a two-branch Bayesian feature extraction network to output lesion semantic feature tensors and observer divergence feature tensors;
Step 2: Concatenate the lesion semantic feature tensor and the observer divergence feature tensor along the channel dimension to form a joint ambiguity-aware feature tensor, and output the reweighted ambiguity-aware feature tensor;
Step 3: Input the reweighted ambiguity-aware feature tensor into the pixel-wise Bayesian classification head, output the lesion probability map from the pixel-wise Bayesian classification head, simultaneously calculate the prediction variance map of the lesion probability map at each pixel position, and use the prediction variance map as the observer divergence map;
Step 4: Perform dual-image overlay encoding on the lesion probability map and the observer divergence map to generate an uncertainty overlay heat map, and mark the connected regions in the uncertainty overlay heat map that are higher than the preset uncertainty threshold as dispute candidate regions to obtain a dispute region mask map;
Step 5: Perform pixel-level multiplication on the dispute region mask map and the original input ocular ultrasound image to extract the set of disputed region image blocks, then input this set into the cross-modal review suggestion network, which outputs structured review suggestion labels;
Step 6: Perform multi-image linkage visualization of the lesion probability map, the observer divergence map, and the structured review suggestion label to generate a color overlay display map with an uncertainty scale, and send the color overlay display map back to the doctor's workstation to complete the deep learning-based eye image classification.
2. The deep learning-based eye image classification method according to claim 1, characterized in that, Step 1 includes: The ocular ultrasound image is input into the first branch of the two-branch Bayesian feature extraction network. Multiple Bayesian convolutional layers in the first branch extract the lesion edge-texture coupling information step by step during the weight distribution sampling process to form a lesion semantic feature tensor. The same ocular ultrasound image is input in parallel into the second branch of the two-branch Bayesian feature extraction network. Multiple Bayesian convolutional layers in the second branch capture the sampling differences of the same pixel position in multiple rounds during the same weight distribution sampling process to form an observer divergence feature tensor. In each forward propagation, Monte Carlo sampling is performed on the weight distribution of the first branch and the second branch simultaneously to ensure that the lesion semantic feature tensor and the observer divergence feature tensor are generated and output synchronously in the same probability space.
3. The deep learning-based eye image classification method according to claim 1, characterized in that, Step 2 includes: The lesion semantic feature tensor and the observer divergence feature tensor are concatenated along the channel dimension to obtain a joint ambiguity-aware feature tensor. The concatenation order places all channels of the lesion semantic feature tensor first and then all channels of the observer divergence feature tensor, to ensure that lesion information and divergence information can be perceived simultaneously. The joint ambiguity-aware feature tensor is input into the channel attention submodule, which sequentially performs global average pooling, fully connected compression, ReLU activation, fully connected recovery, and Sigmoid activation to generate a channel weight vector with the same number of channels as the joint ambiguity-aware feature tensor. The channel weight vector is multiplied channel by channel by the joint ambiguity-aware feature tensor to obtain the channel reweighted feature tensor. The channel reweighted feature tensor is input into the spatial attention submodule, which sequentially performs channel dimension averaging and maximum value stacking, 7×7 convolution, and Sigmoid activation to generate a spatial weight map with the same spatial size as the channel reweighted feature tensor. The spatial weight map is then multiplied pixel by pixel by the channel reweighted feature tensor to output the reweighted ambiguity-aware feature tensor.
4. The deep learning-based eye image classification method according to claim 1, characterized in that, Step 3 includes: The reweighted ambiguity-aware feature tensor is input into a pixel-wise Bayesian classification head. The weight distribution convolution kernel of the pixel-wise Bayesian classification head continuously performs Monte Carlo sampling during T forward propagations, generating T pixel-level classification logic value tensors. Each pixel-level classification logic value tensor maintains the same spatial size as the reweighted ambiguity-aware feature tensor. The T pixel-level classification logic value tensors are fed into a shared Softmax operator to obtain T pixel-level classification probability tensors. The T pixel-level classification probability tensors are arithmetically averaged along the sampling dimension to output a lesion probability map. Each pixel position in the lesion probability map stores the average probability value of the corresponding position belonging to a lesion. The variance of the T pixel-level classification probability tensors is calculated along the sampling dimension to generate a prediction variance map with the same spatial size as the lesion probability map. The variance value of each pixel position in the prediction variance map is directly used as the observer divergence value of the corresponding position in the observer divergence map and output synchronously.
5. The deep learning-based eye image classification method according to claim 1, characterized in that, Step 4 includes: The lesion probability map and the observer divergence map are stacked pixel-wise at the same spatial position to form a dual-channel overlay map. A 3×3 convolution, batch normalization, and ReLU activation are then performed on the dual-channel overlay map to complete the dual-map overlay encoding, outputting an uncertainty overlay heatmap. The value of each pixel in the uncertainty overlay heatmap is compared pixel-by-pixel with a preset uncertainty threshold. Pixels with values higher than the preset uncertainty threshold are marked as candidate pixels for dispute, resulting in a candidate dispute region composed of all candidate pixels. An 8-adjacent connected component analysis is performed on the candidate dispute region, retaining connected components with a pixel count greater than a preset minimum area threshold to generate a dispute region mask map. The dispute region mask map has a value of 1 at pixel positions within the candidate dispute region and a value of 0 at other positions.
6. The deep learning-based eye image classification method according to claim 1, characterized in that Step 5 includes: The disputed region mask map is multiplied pixel-wise with the original input ocular ultrasound image. The original grayscale values of pixels where the disputed region mask map has a value of 1 are retained, while the remaining pixels are set to zero, to obtain a disputed region grayscale image. The disputed region grayscale image is then cropped with a sliding window at a preset fixed step size to obtain a set of disputed region image blocks of uniform size. Each disputed region image block carries its original coordinate information in the ocular ultrasound image. The set of disputed region image blocks is input into a cross-modal review suggestion network and processed sequentially by the convolutional layer, global average pooling layer, and fully connected classification layer of that network to output structured review suggestion labels. The structured review suggestion labels include a label suggesting an additional OCT examination, a label suggesting an additional fundus endoscopy review, and a label suggesting continued follow-up observation.
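The masking and sliding-window cropping in claim 6 could look roughly like the sketch below. It is illustrative only: the cross-modal review suggestion network is not modeled, images are plain nested lists, and windows that would extend past the image border are simply skipped, which the claim does not specify. The names `disputed_region_blocks`, `block_size`, and `step` are hypothetical.

```python
def disputed_region_blocks(image, mask, block_size, step):
    """Mask the image, then crop fixed-size blocks at a fixed step.

    Returns (masked grayscale image, list of ((top, left), block)),
    where (top, left) is the block origin in the original image.
    """
    h, w = len(image), len(image[0])
    # Pixel-wise multiplication: keep grey values where mask == 1, else 0.
    gray = [[image[r][c] * mask[r][c] for c in range(w)] for r in range(h)]
    blocks = []
    for top in range(0, h - block_size + 1, step):
        for left in range(0, w - block_size + 1, step):
            block = [row[left:left + block_size]
                     for row in gray[top:top + block_size]]
            blocks.append(((top, left), block))
    return gray, blocks
```

Carrying `(top, left)` alongside each block is one simple way to preserve the original coordinate information, so that a suggestion produced for a block can be traced back to its position in the ultrasound image.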
7. The deep learning-based eye image classification method according to claim 1, characterized in that Step 6 includes: The lesion probability map, the observer divergence map, and the structured review suggestion label are input into a multi-image linkage visualization module. The multi-image linkage visualization module assigns red-blue gradient values to the lesion probability map and yellow-green gradient values to the observer divergence map to generate a lesion pseudo-color image and a divergence pseudo-color image. The lesion pseudo-color image and the divergence pseudo-color image are then superimposed and fused pixel by pixel according to a preset transparency to obtain a color superimposed display image. The text content of the structured review suggestion label is embedded in the lower right corner of the color superimposed display image. The color superimposed display image containing the text content is pushed to the display window of the doctor's workstation via DICOM write-back, completing the presentation of the deep learning-based eye image classification results.
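A rough sketch of the pseudo-coloring and transparency fusion in claim 7 is given below. The specific color ramps (blue at 0 to red at 1 for lesion probability, green at 0 to yellow at 1 for divergence), the normalization of divergence values to [0, 1], and the linear alpha blend are all assumptions made for illustration. Text embedding and DICOM write-back are not shown, and the function names are hypothetical.

```python
def lesion_color(p):
    """Blue (p = 0) to red (p = 1) gradient as an (R, G, B) triple in [0, 1]."""
    return (p, 0.0, 1.0 - p)


def divergence_color(v):
    """Green (v = 0) to yellow (v = 1) gradient as an (R, G, B) triple."""
    return (v, 1.0, 0.0)


def color_overlay(prob_map, div_map, alpha=0.6):
    """Blend the two pseudo-color images pixel by pixel.

    `alpha` is the weight of the lesion layer; the divergence layer
    receives 1 - alpha. Both maps are assumed to hold values in [0, 1].
    """
    fused = []
    for p_row, d_row in zip(prob_map, div_map):
        row = []
        for p, d in zip(p_row, d_row):
            lc, dc = lesion_color(p), divergence_color(d)
            row.append(tuple(alpha * a + (1.0 - alpha) * b
                             for a, b in zip(lc, dc)))
        fused.append(row)
    return fused
```

For example, a pixel with lesion probability 1 and divergence 0, blended at `alpha=0.5`, mixes pure red with pure green in equal parts.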
8. A deep learning-based eye image classification system, characterized in that it comprises:
Feature extraction module: acquires ocular ultrasound images containing regions with multiple coexisting lesions, and inputs the ocular ultrasound images into a two-branch Bayesian feature extraction network to output a lesion semantic feature tensor and an observer divergence feature tensor;
Ambiguity-aware fusion module: concatenates the lesion semantic feature tensor and the observer divergence feature tensor along the channel dimension to form a joint ambiguity-aware feature tensor, and outputs a reweighted ambiguity-aware feature tensor;
Bayesian classification module: inputs the reweighted ambiguity-aware feature tensor into the pixel-wise Bayesian classification head, which outputs the lesion probability map; the prediction variance map of the lesion probability map at each pixel position is calculated at the same time and is used as the observer divergence map;
Disputed region identification module: performs dual-map overlay encoding on the lesion probability map and the observer divergence map to generate an uncertainty overlay heatmap, and marks the connected regions in the uncertainty overlay heatmap whose values are higher than a preset uncertainty threshold as disputed candidate regions, thereby obtaining a disputed region mask map;
Review suggestion generation module: performs a pixel-wise multiplication of the disputed region mask map and the original input ocular ultrasound image to extract the set of disputed region image blocks, and inputs the set of disputed region image blocks into the cross-modal review suggestion network, which outputs structured review suggestion labels;
Visualization module: performs multi-image linkage visualization of the lesion probability map, the observer divergence map, and the structured review suggestion labels to generate a color superimposed display image with an uncertainty scale, and writes the color superimposed display image back to the doctor's workstation to complete the deep learning-based eye image classification.