Quality map for optical coherence tomography angiography

Machine learning is applied within the OCT system to generate quality maps, replacing subjective and time-consuming manual assessment of OCT scan quality with automated identification of low-quality scans and suggestions for improving them, which makes scan quality assessment more reliable and efficient.

CN122243893A · Pending · Publication Date: 2026-06-19 · CARL ZEISS MEDITEC INC +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CARL ZEISS MEDITEC INC
Filing Date
2021-11-29
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing OCTA scan quality assessment methods are subjective and time-consuming and do not provide objective quality metrics. Low-quality acquisitions can therefore lead to diagnostic uncertainty and data loss, and there is no method for quickly identifying and improving scan quality.

Method used

By applying machine learning techniques, especially deep learning and neural networks, to the OCT system, quantitative OCT/OCTA data quality maps are generated, low-quality areas are identified, and correction suggestions are provided, thereby achieving automated scan quality improvement.

Benefits of technology

It provides objective quality metrics for OCT/OCTA scans, quickly identifies low-quality areas, and automatically suggests improvement measures, thereby improving the reliability and efficiency of scan quality.

Abstract

A system, method, and / or apparatus for determining a quality metric for OCT structural data and / or OCTA functional data uses a machine learning model trained to provide a single overall quality metric, or a quality map distribution, for the OCT / OCTA data based on multiple feature maps extracted from one or more slice views of the OCT / OCTA data. The extracted feature maps may be maps of different texture types, and the machine learning model is trained to determine the quality metric based on the texture maps.

Description

[0001] This application is a divisional application of the invention patent application filed on November 29, 2021, which entered the Chinese national phase on May 23, 2023, with application number 202180078832.3 and the invention title "Quality map for optical coherence tomography angiography".

Field of the Invention

[0002] This invention relates generally to optical coherence tomography (OCT) systems. More specifically, this invention relates to methods for determining quality metrics of OCT and OCT angiography scans and for generating quality maps.

Background of the Invention

[0003] Optical coherence tomography (OCT) is a non-invasive imaging technique that uses light waves to penetrate tissue and generate image information at different depths within the tissue. Essentially, an OCT system is an interferometric imaging system that determines the scattering profile of a sample along the OCT beam by detecting the interference of light reflected from the sample with a reference beam, thus creating a three-dimensional (3D) representation of the sample. Each scattering profile in the depth direction (e.g., z-axis or axial direction) can be individually reconstructed as an axial scan, or A-scan. Cross-sectional slice images (e.g., two-dimensional (2D) bisecting scans, or B-scans) and volumetric images (e.g., three-dimensional (3D) cube scans, or C-scans) can be constructed from multiple A-scans acquired as the OCT beam is scanned / moved through a set of lateral (e.g., x-axis and y-axis) positions on the sample. OCT systems also allow the construction of planar, frontal-view (e.g., en face) images of a selected portion of a tissue volume (e.g., a slice view of a target tissue sub-volume or a target tissue layer, such as the retina of the eye).

[0004] In the field of ophthalmology, OCT systems were initially developed to provide structural data, such as cross-sectional images of retinal tissue, but today they can also provide functional information, such as flow information. While OCT structural data allows observation of different tissue layers of the retina, OCT angiography (OCTA) extends the functionality of OCT systems by also identifying (e.g., presenting in image format) the presence or absence of flow in retinal tissue. For example, OCTA can identify flow by recognizing differences over time (e.g., contrast differences) in multiple OCT scans of the same retinal region and designating differences that meet predetermined criteria as flow. Although data generated by an OCT system (e.g., OCT data) can include both OCT structural data and OCT flow data, depending on the capabilities of the OCT system, for ease of discussion, unless otherwise stated or understood from the context, OCT structural data may be referred to herein as "OCT data" and OCT angiography (or flow) data may be referred to herein as "OCTA data." Thus, OCT can be said to provide structural information, while OCTA provides flow (e.g., functional) information. However, since both OCT and OCTA data can be extracted from the same one or more OCT scans, the term "OCT scan" can be understood to include both structural OCT scans (e.g., OCT acquisition) and / or functional OCT scans (e.g., OCTA acquisition), unless otherwise stated. A more in-depth discussion of OCT and OCTA is provided below.
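The idea of designating temporal differences between repeated scans as flow can be illustrated with a minimal sketch. It assumes, purely for illustration, that the repeated B-scans are already co-registered, that the difference measure is the per-pixel variance across repeats, and that the "predetermined criterion" is a single hypothetical threshold; commercial OCTA algorithms use more elaborate contrast measures.

```python
import numpy as np

def flow_mask(repeated_bscans, threshold):
    """Mark pixels whose intensity changes across repeated scans as flow.

    repeated_bscans: array of shape (repeats, depth, width) holding
        co-registered B-scans of the same location (assumed input layout).
    threshold: hypothetical cutoff on the inter-scan variance.
    """
    scans = np.asarray(repeated_bscans, dtype=float)
    variance = scans.var(axis=0)  # change over time at each pixel
    return variance > threshold

# Toy data: 4 repeats of a 2x3 B-scan in which only pixel (0, 1) fluctuates.
scans = np.full((4, 2, 3), 10.0)
scans[:, 0, 1] = [2.0, 18.0, 4.0, 16.0]
mask = flow_mask(scans, threshold=1.0)
```

Static tissue gives zero variance and is left unmarked, while the fluctuating pixel exceeds the threshold and is designated as flow.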

[0005] OCTA provides valuable diagnostic information not found in structural OCT, but OCTA scans can suffer from acquisition problems that may lead to unsatisfactory quality. Existing attempts to quantify OCT scan quality focus on OCT structural data and typically rely on signal strength measurements, such as those described in "A new quality assessment parameter for optical coherence tomography" by D. M. Stein et al., *British Journal of Ophthalmology*, 2006, 90:186-190. While signal strength measurements have been found useful for evaluating OCT structural data, their use for OCTA data is limited because the quality of the derived flow information depends on many other factors not captured by such quantification.

[0006] Therefore, OCTA scan quality is often judged subjectively by an observer to decide whether a particular OCTA acquisition (e.g., an OCTA scan) is suitable for diagnosis or should be included in a wider study. Examples of this approach can be found in: "Determinants of Quantitative Optical Coherence Tomography Angiography Metrics in Patients with Diabetes," Tang FY et al., *Scientific Reports*, 2018;8:7314; "Swept Source Optical Coherence Tomography Angiography for Contact Lens-Related Corneal Vascularization," Ang M et al., *Journal of Ophthalmology*, 2016, 9685297; and "Impact of eye-tracking technology on OCT-angiography imaging quality in age-related macular degeneration," Lauermann et al., *Graefe's Archive for Clinical and Experimental Ophthalmology*, 2017, 255:1535. However, these methods are highly subjective and time-consuming. Furthermore, subjective quality assessment is often performed a posteriori, when the scans are reviewed after the patient's examination. Once the patient has left the clinic, it is no longer possible to acquire additional scans of better quality to replace the low-quality data, which leads to data loss or indeterminate diagnoses. Even for operators who proactively assess the quality of OCTA scans during acquisition, while the patient is still in the clinic, there are currently no guidelines with quantitative quality scores that would help establish an objective quality cutoff for rescanning or for improving the quality of subsequent acquisitions.

[0007] The purpose of this invention is to provide a system / apparatus / method for providing objective quality measurements of OCT / OCTA data.

[0008] Another objective of this invention is to provide a rapid determination of when OCT / OCTA scans are of insufficient quality and may need to be repeated.

[0009] Another object of the present invention is to provide a quality metric for OCTA data on an A-scan-by-A-scan basis.

[0010] Another object of the present invention is to provide a (e.g., 2D or 3D) quality map of OCTA data that visually identifies portions of OCTA scans that may have poor quality, for example, as determined by the present system / method / apparatus.

Summary of the Invention

[0011] The above objectives are achieved in methods / systems / devices used to identify low-quality OCT scans (or low-quality portions within an OCT scan, such as OCT structural scans and / or OCTA functional scans), identify possible sources of low quality, and recommend (or implement) corrective actions to improve subsequent OCT scans. Additionally, a quality map of the OCT scan can also be provided.

[0012] For example, this system / method / apparatus can provide one or more (e.g., 2D and / or 3D) quantitative quality maps describing the quality of an OCT / OCTA acquisition, for example, at each en face location (e.g., pixel or pixel region / window location). The resulting quality map correlates well with subjective quality metrics (e.g., provided by human graders) observed in slices generated from the acquisition at the corresponding en face location. Optionally, the values in the quality map can be averaged to provide an overall quality score for the acquisition, which has also been found to correlate well with subjective quality grades. This contrasts with prior quality assessment methods, which measure the total signal strength recorded in the OCT structural component relative to a noise baseline. Such prior methods do not provide reliable quality values, or location-specific quality values, for the OCTA flow component. The appended claims describe the invention in more detail.

[0013] The system / method / apparatus can also identify and output one or more possible sources / causes of a low-quality acquisition or of a low-quality area. For example, the system / method / apparatus can identify the source of a low-quality acquisition as incorrect focusing, opacities (e.g., cataracts or floaters of opaque media), illumination below a predetermined threshold (e.g., possibly caused by a small pupil), or tracking problems (e.g., due to blinking), and can suggest corrective actions, such as correcting / adjusting the focus, using a different imaging angle to avoid the opacity, dilating the pupil, or addressing the identified possible cause of the loss of eye tracking. This information can be used to provide recommendations to the system operator (or to an automated / semi-automated subsystem within the OCT system) during data acquisition, so that repeat scans with better image quality can be acquired. For example, the OCT system can use this information to automatically (or semi-automatically, e.g., in response to an OK input signal from the system operator) take the recommended corrective actions to improve subsequent scan acquisitions.

[0014] Other objects and achievements of the invention, as well as a more complete understanding, will become apparent and clear from the following description and claims, taken in conjunction with the accompanying drawings.

[0015] Several publications may be cited or referenced herein to aid in understanding the invention. All publications cited or referenced herein are incorporated herein by reference in their entirety.

[0016] The embodiments disclosed herein are merely examples, and the scope of the invention is not limited thereto. Any embodiment feature mentioned in one claim category, such as a system, can also be claimed in another claim category, such as a method or apparatus. The dependencies or references back in the appended claims are chosen for formal reasons only. However, any subject matter resulting from a deliberate reference back to any preceding claims can also be claimed, so that any combination of claims and their features is disclosed and can be claimed, regardless of the dependencies chosen in the appended claims.

Brief Description of the Drawings

[0017] Priority applications U.S. Serial Nos. 63 / 119377 and 63 / 233033 contain at least one color drawing, which is incorporated herein by reference.

[0018] In the accompanying drawings, the same reference symbols / characters denote the same parts. Figure 1 shows a set of Haralick features extracted from a retinal flow slice in a 250-micron circular neighborhood using a pixel-by-pixel sliding window.

[0019] Figure 2 provides a representation of an exemplary training scheme workflow according to the present invention.

[0020] Figure 3 provides a representation of an exemplary application (or test) phase workflow according to the present invention.

[0021] Figure 4 illustrates the effect of reducing the overlap ratio, such as to increase computational speed, and the application of Gaussian filtering.

[0022] Figure 5 provides exemplary results of applying extrapolation to fill not-a-number (NaN) values in a quality map with standard filtering according to the present invention (e.g., filling NaN values along the perimeter of the quality map), wherein the quality map is obtained from a retinal angiography slice (e.g., a frontal image).

[0023] Figures 6A to 6E show various exemplary criteria for grading flow image quality on a 1-5 scale.

[0024] Figure 6F shows a first example of results from a scan of good overall quality (top row, Gd1) and a second example of results from a scan of poor overall quality (bottom row, Pr1).

[0025] Figure 7 provides examples of four slices 71-74 used for feature extraction (max_flow, avg_struc, max_struc, and min_struc, respectively), together with a marking of the considered neighborhood (white circle) and the resulting target quality map 75.

[0026] Figure 8A provides a graph of the predicted values in the training data before adjustment (correction) with a quadratic polynomial is applied.

[0027] Figure 8B provides the results after adjustment / correction with a quadratic polynomial.

[0028] Figure 9 illustrates the effect of smoothing the average grader map when it is considered as the ground truth.

[0029] Figure 10A shows an analysis of the average grader score versus the predicted value for all pixels in all images of the training set, where the horizontal axis represents the average score given by the expert graders and the vertical axis represents the score predicted by the trained model.

[0030] Figure 10B provides the percentage of failure cases as the failure threshold is varied for the images in the training set, where failure is defined as a given ratio (e.g., percentage or fraction) of an image that deviates from the ground truth by more than 1 quality point.

[0031] Figure 11 shows example results comparing a retinal slice 111, the ground-truth quality scores collected from three expert graders 112, the quality grading results of the present algorithm 113, the difference between the ground-truth and algorithm quality maps 114 scaled to gray levels 0-5, and the regions 115 where the ground-truth and algorithm quality maps deviate by more than 1.0 quality point.

[0032] Figure 12A is a graph of the average grader score versus the predicted value for all 6,500,000 data points in the test set.

[0033] Figure 12B shows the percentage of failure cases encountered as the failure threshold is varied for the images in the training set, where failure is defined as a given ratio (e.g., percentage or fraction) of an image whose computed quality metric deviates from the ground truth by more than 1 quality point.

[0034] Figure 13A shows exemplary results for three images considered acceptable given a 20% acceptable deviation limit.

[0035] Figure 13B shows examples of three images, out of the 26 analyzed, considered suboptimal given the same 20% deviation limit.

[0036] Figure 14 shows a first example of quality maps generated / obtained (using the present invention) from the right eye (four images on the left) and left eye (four images on the right) of a first patient for different 3×3 mm OCTA acquisitions (organized in pairs).

[0037] Figure 15 shows a second example of quality maps obtained from the right eye (four images on the left) and left eye (four images on the right) of a second patient for different 3×3 mm OCTA acquisitions (organized in pairs).

[0038] Figure 16 shows examples of quality maps obtained from a patient's right eye for OCTA acquisitions with different fields of view (FOV): 3×3 mm, 6×6 mm, 9×9 mm, and 12×12 mm. The OCTA acquisitions (organized in pairs) are displayed in a manner similar to Figure 14.

[0039] Figure 17 provides the percentage of acceptable results for each human grader and for the algorithm, using the average of the three human graders as ground-truth test data.

[0040] Figure 18 provides the percentage of acceptable results obtained by comparing the annotations of each grader (separately removed from those used to establish the ground-truth test data) and the results of the algorithm against the average of the two remaining graders as ground-truth test data.

[0041] Figure 19 shows a generalized frequency-domain optical coherence tomography system for collecting 3D image data of the eye, suitable for use with this invention.

[0042] Figure 20 shows an exemplary OCT B-scan image of a normal retina of a human eye and identifies, by way of example, various canonical retinal layers and boundaries.

[0043] Figure 21 shows an example of a frontal (en face) vasculature image.

[0044] Figure 22 shows an exemplary OCTA (optical coherence tomography angiography) scan image of the vasculature.

[0045] Figure 23 shows an example of a multilayer perceptron (MLP) neural network.

[0046] Figure 24 shows a simplified neural network consisting of an input layer, hidden layers, and an output layer.

[0047] Figure 25 shows an example convolutional neural network structure.

[0048] Figure 26 shows an example U-Net structure.

[0049] Figure 27 shows an example computer system (or computing device or computer).

Detailed Description

[0050] Optical coherence tomography (OCT) scans can suffer from acquisition problems that adversely affect the quality of the acquisition / scan. These problems include, among others, incorrect focusing, the presence of floaters of opaque media, low illumination (e.g., signal strength less than 6 on a scale from 0 to 10), low light penetration (e.g., less than half a target penetration, or less than 5 μm), tracking / motion artifacts, and / or high noise (e.g., root-mean-square noise values above a predetermined threshold). These problems can degrade the quality of OCT system data, as seen in B-scans or frontal views (e.g., in slice form), and can reduce the accuracy of data extraction (or image processing) techniques or algorithms applied to the OCT system data, such as segmentation or vessel density quantification. Low-quality OCT system data, especially OCTA data, can therefore make accurate diagnosis difficult. A quantitative assessment of the quality of acquired OCT / OCTA scans is thus needed to quickly determine whether a scan is viable.

[0051] The quality of OCT structural data is typically determined from a total signal strength value. Generally, if the total signal strength value is below a predetermined threshold, the entire OCT structural scan is considered poor, e.g., a failed scan. Signal-strength-based quality metrics therefore provide (e.g., output) only a single quality value / measurement for the entire volumetric field of view (FOV), which is not sufficiently reliable or adaptable when assessing, for example, the quality of a frontal image whose structural information varies from one scan position to another. This approach is also particularly unsuitable for assessing the quality of OCTA acquisitions, which provide functional flow information rather than structural information. To the applicant's knowledge, there is no commercially available method to automatically and quantitatively assess the quality of flow information in OCTA scans, whether as a single value per scan or in the form of a quality map.

[0052] A system and method for generating quality maps of OCT system data are provided herein, wherein the quality metric of the OCT system data can vary across the acquisition (e.g., across a desired FOV). Some parts of the present discussion may describe the invention as applied to one or the other of OCT structural data or OCTA flow data; however, it should be understood that, unless otherwise stated, the description may also be applied to the other of OCT structural data or OCTA flow data.

[0053] This invention provides a system and method for quantitatively measuring the relative quality of OCT system data at each of multiple image quality locations / positions in the OCT system data (e.g., at each scan position, such as each A-scan position, or at each quality metric window position (or pixel neighborhood), which may span multiple A-scan positions). While this quality assessment method can be applied to any OCT system data viewing / imaging technique (e.g., frontal, A-scan, B-scan, and / or C-scan images), for ease of discussion it is primarily described herein as applied to frontal images (unless otherwise stated). It should be understood that the same method / technique (or a substantially similar one, as would be understood by those skilled in the art) can be applied to any other OCT system data viewing / imaging technique (e.g., A-scan, B-scan, and / or C-scan images).

[0054] This system / method can evaluate a set of texture properties of OCT / OCTA data in the vicinity of each image quality location (e.g., each frontal location (e.g., a pixel or image quality window or pixel neighborhood)) and assign a quantitative quality score related to scan quality to that location (and / or vicinity). In the case of a frontal image, the result is a two-dimensional quality map that describes the scan quality at each frontal location, for example, by using a color code (or grayscale code) indicating image quality.

[0055] This quality map can be used to judge / determine / calculate the quality of individual scans across their FOV, to quantify the quality differences between several acquisitions (e.g., OCT system scan acquisitions) of the same subject at each frontal position, and / or to provide an overall quality metric (e.g., a measure) for each acquisition, such as by averaging the quality map values. As discussed in more detail below, OCTA flow data can be determined by identifying contrast differences over time in multiple OCT scans (or acquisitions) of the same tissue (e.g., retina) region. A quality map according to the present invention can be determined for each individual OCT scan used to define an OCTA flow image, and the quality maps of the individual OCT scans can be averaged to define the quality map of the OCTA flow image they define. Alternatively or additionally, the quality map technique can be applied directly to the defined OCTA flow data or image (which may be based on contrast information or other flow-indicating data from multiple OCT scans). Optionally, this directly determined OCTA quality map can also be combined with the quality maps of the individual OCT scans that define the OCTA flow data / image (e.g., by weighted averaging, such as with equal weighting or with heavier weighting toward the directly determined OCTA quality map).
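The averaging and weighted combination described in this paragraph can be sketched as follows. This is only an illustration under stated assumptions: the function names, the `direct_weight` parameter, and the use of a NaN-aware mean for the overall score are hypothetical choices, not taken from the patent.

```python
import numpy as np

def overall_score(quality_map):
    """Average a quality map into one score, ignoring NaN (unscored) pixels."""
    return float(np.nanmean(quality_map))

def combined_octa_quality(scan_maps, direct_map=None, direct_weight=0.5):
    """Average per-scan quality maps and optionally blend in a direct OCTA map.

    scan_maps: quality maps of the individual OCT scans, each of shape (H, W).
    direct_map: quality map computed directly on the OCTA flow image, or None.
    direct_weight: hypothetical weight in [0, 1]; 0.5 is equal weighting,
        larger values weight the directly determined OCTA map more heavily.
    """
    mean_scan_map = np.mean(np.asarray(scan_maps, dtype=float), axis=0)
    if direct_map is None:
        return mean_scan_map
    direct = np.asarray(direct_map, dtype=float)
    return direct_weight * direct + (1.0 - direct_weight) * mean_scan_map

# Toy maps: two OCT scans scored 4 and 2 everywhere, direct OCTA map scored 5.
scan_maps = [np.full((2, 2), 4.0), np.full((2, 2), 2.0)]
direct = np.full((2, 2), 5.0)
blended = combined_octa_quality(scan_maps, direct, direct_weight=0.5)
score = overall_score(blended)
```

With equal weighting, the per-scan average of 3 and the direct value of 5 blend to 4 at every pixel, so the overall score is also 4.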

[0056] Regardless, the defined quality map (or overall quality metric of the acquisition) can provide important information to the OCT system operator for determining when the acquisition quality is low and a rescan (e.g., an OCT scan or an OCTA scan) is required. The system can further identify one or more possible causes of the low quality and output (e.g., to the system operator or to an automated / semi-automated subsystem of the OCT system) recommendations aimed at achieving better quality in subsequent acquisitions. For example, if the quality map indicates that at least a predefined target retinal area (e.g., a predefined region of interest (ROI)) in the acquisition is below a predefined threshold quality metric, or if the overall metric of the acquisition is below a predefined threshold overall quality metric, the automated system can use the quality map (or overall metric) to determine that another acquisition is needed. The automated system can then initiate another acquisition automatically or in response to an approval input signal from the system operator. The system can also identify one or more corrective actions to improve the acquisition quality and automatically perform one or more of the identified corrective actions before initiating another acquisition. Alternatively or additionally, quality maps (e.g., OCTA quality maps) of multiple acquisitions of the same retinal region can be compared with each other, and the best-quality (or higher-quality) portions / regions of the multiple acquisitions (as determined from their respective quality maps, e.g., per pixel or per window) can be combined to define a composite acquisition with higher overall quality than each of the individual acquisitions (OCT and / or OCTA acquisitions).
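A minimal sketch of the two uses described above, deciding on a rescan and compositing several acquisitions, might look like this. It assumes the acquisitions are already registered to each other and selects the best acquisition per pixel; the threshold values and function names are hypothetical.

```python
import numpy as np

def needs_rescan(quality_map, roi_mask, min_roi_quality, min_overall_quality):
    """Return True if the ROI or the whole map falls below its threshold."""
    q = np.asarray(quality_map, dtype=float)
    roi_quality = float(np.mean(q[roi_mask]))
    overall = float(np.mean(q))
    return roi_quality < min_roi_quality or overall < min_overall_quality

def composite_acquisition(images, quality_maps):
    """Pick, per pixel, the value from the acquisition with the best quality.

    images, quality_maps: sequences of K registered (H, W) arrays.
    Returns the composite image and its per-pixel quality.
    """
    images = np.asarray(images, dtype=float)
    quality_maps = np.asarray(quality_maps, dtype=float)
    best = np.argmax(quality_maps, axis=0)  # index of best acquisition per pixel
    rows, cols = np.indices(best.shape)
    return images[best, rows, cols], quality_maps[best, rows, cols]

# Toy data: acquisition A is good on the left, acquisition B on the right.
img_a, img_b = np.full((2, 2), 1.0), np.full((2, 2), 2.0)
q_a = np.array([[5.0, 1.0], [5.0, 1.0]])
q_b = np.array([[2.0, 4.0], [2.0, 4.0]])
roi = np.ones((2, 2), dtype=bool)  # hypothetical ROI covering the whole map

rescan_a = needs_rescan(q_a, roi, min_roi_quality=3.5, min_overall_quality=3.0)
composite, composite_q = composite_acquisition([img_a, img_b], [q_a, q_b])
rescan_composite = needs_rescan(composite_q, roi, 3.5, 3.0)
```

In this toy case acquisition A alone would trigger a rescan, while the composite built from the better half of each acquisition would not.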

[0057] A specific embodiment of the present invention is applied to OCTA acquisitions at the frontal level. This embodiment generates a 2D quantitative map describing the quality of the OCTA acquisition (scan) at each frontal location. The technique first extracts a set of features related to image texture, and other features, from pixel neighborhoods in a slice visualization (e.g., a frontal image) obtained from the OCTA volume. Features are extracted for different pixel neighborhoods and assigned to the neighborhoods in a sliding-window manner. For example, the window can be of any shape (e.g., rectangular, circular, etc.) and contain a predetermined number of pixels (e.g., a 3×3 pixel window). At each window location, features can be determined for a target pixel (e.g., the center pixel) within the window using information from multiple (e.g., all) pixels within the window. Once the features of the target (e.g., center) pixel are determined, the window can be moved by one (or more) pixel locations, and new features can be determined for another pixel (e.g., the new center pixel) within the new window location. The result is a set of two-dimensional feature maps, each describing a different image feature at each frontal location. These features can be handcrafted (e.g., intensity, energy, entropy) or learned as the result of training a deep learning scheme (or other machine learning or artificial intelligence technique). Examples of machine learning techniques include artificial neural networks, decision trees, support vector machines, regression analysis, Bayesian networks, etc. Typically, machine learning techniques include one or more training phases, followed by one or more testing or application phases. A more detailed discussion of, for example, neural networks that can be used in this invention is provided below.
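The sliding-window extraction of handcrafted features for a center pixel can be sketched as below. This is an illustrative reading only: the square window, reflective padding at the image border, the histogram bin count, and the particular definitions of intensity (window mean), energy (mean squared intensity), and entropy (Shannon entropy of the window histogram) are assumptions, not definitions from the patent.

```python
import numpy as np

def window_features(image, half=1, bins=8):
    """Return (mean, energy, entropy) maps from a (2*half+1)^2 sliding window.

    Each output pixel is computed from all pixels of the window centered on it.
    """
    img = np.asarray(image, dtype=float)
    padded = np.pad(img, half, mode="reflect")  # assumed border handling
    mean_map = np.zeros_like(img)
    energy_map = np.zeros_like(img)
    entropy_map = np.zeros_like(img)
    lo, hi = img.min(), img.max()
    size = 2 * half + 1
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            win = padded[r:r + size, c:c + size]
            mean_map[r, c] = win.mean()
            energy_map[r, c] = np.mean(win ** 2)
            hist, _ = np.histogram(win, bins=bins, range=(lo, hi))
            p = hist[hist > 0] / hist.sum()
            entropy_map[r, c] = -np.sum(p * np.log2(p))
    return mean_map, energy_map, entropy_map

# Toy 4x4 "frontal image" with a 3x3 window (half=1).
image = np.arange(16, dtype=float).reshape(4, 4)
mean_map, energy_map, entropy_map = window_features(image)
```

Each returned array has the same size as the input image, so the three arrays together form a small set of 2D feature maps of the kind described above.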

[0058] In the training phase of one or more of the machine learning methods of this invention, a set of two-dimensional feature maps obtained from a set of training OCTA acquisitions is combined in a machine learning or deep learning method to produce a model whose output corresponds to the quality scores previously provided manually by (human) expert graders for the same set of acquisitions. Furthermore, the model can be trained to indicate common acquisition problems that can be inferred from previous annotations of the images, such as incorrect focusing, low illumination or light penetration, tracking / motion artifacts, etc.

[0059] During the testing or application phase, the model learned in the training phase is applied to a set of 2D feature maps obtained from previously unseen data (e.g., data not used in the training phase) to produce a 2D quality map as output. Individual quality metrics, or combined quality metrics of one or more sub-regions of the 2D quality map (e.g., partial areas or sectors, combined for example by averaging the individual quality metrics within the respective sub-region), can be compared to a predetermined minimum quality threshold to identify regions of the scan that are below the desired quality. Alternatively or additionally, the values in the 2D quality map can be averaged across the entire map to produce an overall quality score. Furthermore, if the model is trained to indicate potential acquisition problems in the image, the feature maps can also be used to provide this information for unseen test images. The different parts of this method are discussed in more detail below.
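The application phase can be sketched as follows, assuming for illustration that the trained model is a per-pixel linear combination of feature maps (the patent allows many model types) and that sub-regions are non-overlapping square blocks. The weights, bias, block size, and threshold are hypothetical values.

```python
import numpy as np

def predict_quality_map(feature_maps, weights, bias):
    """Apply a per-pixel linear model to a stack of feature maps.

    feature_maps: array of shape (n_features, H, W).
    weights: length n_features; bias: scalar. Both are assumed to come
        from a previously trained model.
    """
    stack = np.asarray(feature_maps, dtype=float)
    return np.tensordot(weights, stack, axes=1) + bias

def low_quality_regions(quality_map, block, min_quality):
    """Average the map over block x block sub-regions and flag low ones."""
    h, w = quality_map.shape
    trimmed = quality_map[:h - h % block, :w - w % block]
    regions = trimmed.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    return regions, regions < min_quality

# Toy input: two 4x4 feature maps; the second is higher on the right half.
f0 = np.ones((4, 4))
f1 = np.zeros((4, 4))
f1[:, 2:] = 2.0
quality = predict_quality_map([f0, f1], weights=[1.0, 1.0], bias=1.0)
regions, flagged = low_quality_regions(quality, block=2, min_quality=3.0)
overall = float(quality.mean())
```

Here the left half of the map scores 2 and the right half 4, so the two left sub-regions are flagged as below the threshold of 3 while the overall score is 3.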

[0060] Extraction of feature maps

[0061] A single frontal image (or slice), or a number N of frontal images, is generated from the OCTA cube. Each of these frontal images is analyzed to generate a set of M feature maps. As discussed above, these feature maps can be designed from known handcrafted image properties (e.g., gradient, entropy, or texture) in a given neighborhood (e.g., window or pixel neighborhood) of each frontal location, or can result from the intermediate layers of a deep learning (or other machine learning) scheme. The result is a set of N×M feature maps for each OCTA acquisition.

[0062] For each handcrafted image attribute (or abstract attribute from a deep learning scheme) and each generated slice, a sliding-window approach is used to generate a single map of the same size as the slice, in which the neighborhood of each pixel is considered to generate a single attribute value (e.g., a texture value, such as one or more Haralick features). This attribute value is assigned to the pixel's neighborhood in the map. As the sliding window moves and is centered on different pixel locations within the slice, the values computed for each neighborhood are accumulated, and the resulting values at each location are averaged. The neighborhood can be defined differently depending on the application, for example as a rectangular or circular neighborhood. Similarly, the extent of the neighborhood and the overlap in the sliding-window approach can be defined depending on the application. For example, Figure 1 shows a set of 22 Haralick features H1 to H22 extracted from a retinal flow slice 13 using a pixel-by-pixel sliding window 11 with a 250-micron circular neighborhood, as indicated by arrow 15. Haralick features, or Haralick texture features, are a well-known mathematical method for extracting texture features from matrices or image regions. A more detailed discussion of Haralick features can be found in Robert M. Haralick, "Statistical and structural approaches to texture," Proceedings of the IEEE, Vol. 67, No. 5, 1979, pp. 786–804, which is incorporated herein by reference in its entirety.
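The neighborhood-assignment and averaging step can be illustrated with one Haralick-type feature. The sketch below computes the contrast of a gray-level co-occurrence matrix (GLCM) inside a circular neighborhood, assigns that value to every pixel of the neighborhood, and averages overlapping assignments. The horizontal-neighbor offset, the number of gray levels, the radius in pixels (rather than microns), and the stride are all assumptions made for illustration.

```python
import numpy as np

def glcm_contrast(quantized, mask, levels):
    """Haralick contrast from a symmetric horizontal-neighbor GLCM within mask."""
    pair_ok = mask[:, :-1] & mask[:, 1:]  # both pixels of the pair lie in the neighborhood
    left = quantized[:, :-1][pair_ok]
    right = quantized[:, 1:][pair_ok]
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (left, right), 1.0)
    glcm += glcm.T
    total = glcm.sum()
    if total == 0:
        return np.nan
    p = glcm / total
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))

def neighbourhood_feature_map(image, radius, stride=1, levels=8):
    """Assign each neighborhood's contrast to all its pixels and average overlaps."""
    img = np.asarray(image, dtype=float)
    span = img.max() - img.min()
    scaled = (img - img.min()) / span if span > 0 else np.zeros_like(img)
    quantized = np.minimum((scaled * levels).astype(int), levels - 1)
    rows, cols = np.indices(img.shape)
    total = np.zeros_like(img)
    count = np.zeros_like(img)
    for r in range(0, img.shape[0], stride):
        for c in range(0, img.shape[1], stride):
            mask = (rows - r) ** 2 + (cols - c) ** 2 <= radius ** 2
            value = glcm_contrast(quantized, mask, levels)
            if not np.isnan(value):
                total[mask] += value
                count[mask] += 1
    with np.errstate(invalid="ignore", divide="ignore"):
        # Pixels never covered by any neighborhood are left as NaN.
        return np.where(count > 0, total / count, np.nan)

# Toy slice with vertical stripes: every horizontal neighbor pair differs.
stripes = np.tile([0.0, 1.0], (4, 2))
contrast_map = neighbourhood_feature_map(stripes, radius=1, levels=2)
```

A larger stride reduces the overlap between neighborhoods and the amount of computation, at the cost of a coarser map and possibly uncovered (NaN) pixels.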

[0063] Training phase

[0064] Figure 2 provides a representation of an exemplary training scheme workflow according to the present invention. This example illustrates K training OCTA acquisition (scan) samples A1 to AK. For each acquisition, N slices (e.g., frontal images) 23 can be generated, and M feature maps 25 (e.g., the Haralick features shown in Figure 1) can be defined for each of the N slices 23. This example can be trained using a holistic scoring method, where each OCTA acquisition is assigned a single overall (e.g., quality) score, and / or using a region-based scoring method, where each OCTA acquisition is divided into multiple regions (e.g., P distinct regions) and each region is assigned a corresponding (e.g., quality) score. How the training vector is defined can depend on whether the holistic or the region-based scoring method is used. In either case, for the purposes of this discussion, the training vector used for each OCTA acquisition is referred to herein as a "case". Figure 2 therefore shows K cases (e.g., cases 1 to K), one for each OCTA acquisition A1 to AK. Optionally, additional labels 27 (e.g., overall quality, segment quality, image or physiological or other identifying characteristics, etc.) can be combined with the feature maps to define the corresponding cases, as shown by arrow 29. Cases 1 to K can then be submitted to the model training module 31, which outputs the generated model 33.

[0065] In summary, training can be performed using an overall quality score and / or information score given to the entire OCTA scan and / or region-based scores given to specific regions of the OCTA scan. For example, if an overall score is provided (e.g., per OCTA acquisition), the average of each feature map (or any other aggregation function) can be used to provide a corresponding single value. In this case, a single N×M feature vector can be generated as the training input for a single OCTA acquisition (e.g., A1), and the provided overall score serves as the training result (e.g., the training target output). Alternatively, if region-based scores are provided for each acquisition, the average of each feature map within each region (or any other aggregation function) can be used for training, resulting in multiple training instances per acquisition.

[0066] In this scenario, if an OCTA acquisition / scan is graded across P different regions, this would mean training with P feature vectors of size N×M (f_1 to f_P), with the P corresponding scores assigned as training results (e.g., the training target outputs). This method is flexible with respect to the labels used during the training phase, and even allows training with a single overall score for each acquisition. This accelerates the collection of training data, because images can simply be graded with scores.
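
As a non-limiting illustration of the two labeling options described above, the following sketch builds training cases from a stack of feature maps. It assumes (hypothetically) that the feature maps of one acquisition are stored as an array of shape (N, M, H, W) and that regions are given as an integer label image; the function names and the default aggregation (`np.nanmean`) are illustrative only.

```python
import numpy as np

def holistic_case(feature_maps, score, aggregate=np.nanmean):
    """One acquisition, one overall score -> one N*M feature vector.

    feature_maps: array of shape (N, M, H, W), N slices x M feature maps.
    """
    n, m = feature_maps.shape[:2]
    vector = np.array([aggregate(feature_maps[i, j])
                       for i in range(n) for j in range(m)])
    return vector, float(score)

def region_cases(feature_maps, region_labels, region_scores,
                 aggregate=np.nanmean):
    """One acquisition graded in P regions -> P feature vectors f_1..f_P.

    region_labels: (H, W) integer image with values 0..P-1 (assumption).
    region_scores: length-P sequence of grades, one per region.
    """
    n, m = feature_maps.shape[:2]
    flat = feature_maps.reshape(n * m, *feature_maps.shape[2:])
    X = np.array([[aggregate(fm[region_labels == p]) for fm in flat]
                  for p in range(len(region_scores))])
    return X, np.asarray(region_scores, dtype=float)
```

In both cases each row has N×M entries, so holistic and region-based cases can be stacked into the same training matrix.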

[0067] Adjusting predicted scores with higher-order polynomials

[0068] Depending on the model and the data used to train the algorithm according to the present invention, additional adjustments may be made to the resulting quality scores. For example, a linear model that describes quality as a weighted combination of features (such as linear regression) may not properly follow the subjective scoring scale and may require adjustment. That is, the quantitative difference between scores 1 and 2 is not guaranteed to be the same as the difference between scores 2 and 3. Furthermore, training data in which some scores are represented more than others can lead to an imbalanced model, which may produce better results for some scores while having larger errors for others. One way to mitigate this behavior is to apply an additional adjustment, fitting the model's predicted scores to the given target scores with a polynomial of higher order than the one initially used to train the model. For example, when a linear model is used to train the algorithm, a second-order polynomial may be used to adjust the predicted scores to better represent the target data. Examples of such adjustments (e.g., applied to a linear model) are provided below.
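
A minimal sketch of such an adjustment, assuming the raw predictions of an already trained linear model and the corresponding target scores are available as one-dimensional arrays, is given below. The function name `fit_score_adjustment` is hypothetical; `np.polyfit` and `np.poly1d` are standard NumPy routines for least-squares polynomial fitting and evaluation.

```python
import numpy as np

def fit_score_adjustment(predicted, target, degree=2):
    """Fit a polynomial that maps raw predicted scores to target scores.

    predicted, target: 1-D arrays over the training neighborhoods.
    Returns a callable that adjusts new predicted scores.
    """
    coeffs = np.polyfit(predicted, target, degree)  # least-squares fit
    return np.poly1d(coeffs)

# Illustrative use with made-up numbers (not data from the invention):
raw = np.array([1.8, 2.3, 2.9, 3.4, 3.8])       # raw linear-model output
grades = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # target grader scores
adjust = fit_score_adjustment(raw, grades)
adjusted = adjust(raw)                           # scores after adjustment
```

The same fitted polynomial would then be applied to every score predicted by the model in the application phase.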

[0069] Application phase

[0070] Figure 3 provides a representation of an exemplary application (or test) phase workflow according to the present invention. Similar to the training phase of Figure 2, during the application phase corresponding slices 41 and feature maps 43 are extracted from a sample (OCTA acquisition 45) that was not previously seen / used during the training phase. As shown in box 47, the generated model 33 from the training phase (shown in Figure 2) can be applied in a sliding window manner to the neighborhood of a given pixel in the feature maps 43, producing a result value for each neighborhood. As the sliding window moves, centered on different pixel locations in the feature maps, the values calculated for each neighborhood are accumulated, and the result values in each neighborhood are averaged. The result is a map over the en face locations of OCTA acquisition 45, such as a quality map 49 indicating angiographic quality (or any other attribute or information the model is designed to represent). The values in the resulting map 49 can also be aggregated (e.g., by averaging) to provide an overall value (e.g., a quality score) for acquisition 45.

[0071] Post-processing

[0072] When texture neighborhood regions are used to predict the quality of a single pixel in a resulting quality map of W×H (width×height) pixels, a total of W×H neighborhood regions need to be evaluated to produce the complete map. This process can be applied using a sliding window approach, but sliding the window pixel by pixel can take longer than desirable, because the feature extraction process and the model prediction can be computationally expensive. To speed up this process, the sliding window can be defined to have a given overlap between the neighborhood regions under consideration, with values averaged where neighborhood regions overlap. While a lower overlap between neighborhood regions is desirable for faster computation, it can lead to a pixelated map with abrupt transitions when the overlap is too low. To correct for this, a Gaussian filter can be applied to the generated quality map. Since larger neighborhood regions with lower overlap ratios may require more aggressive filtering, this Gaussian filter can be adapted to the defined neighborhood region size and the overlap ratio between neighborhood regions to produce visually pleasing results with minimal image degradation. Exemplary filter parameters can be defined as follows: [equations omitted in source], where σ_x and σ_y are the σ parameters of the Gaussian filter; Filter_radius_x and Filter_radius_y are the filter radii (extent) of the filter function; RadFeatPix_x and RadFeatPix_y are the neighborhood (or window) sizes, in pixels, in the horizontal and vertical directions, respectively; and overlapR_x and overlapR_y are the overlap ratios defined in the horizontal and vertical directions, respectively.
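
To illustrate why reducing the overlap speeds up computation, the following sketch counts how many neighborhoods must be evaluated for a given overlap ratio. It is an illustration only: it assumes that the overlap ratio is the fraction of the window diameter shared by two consecutive windows, so that the step between window centers is diameter × (1 − overlap), and the image size and window diameter in the example are made-up values. It does not reproduce the filter parameter definitions referred to above.

```python
def window_centers(length, diameter, overlap):
    """Window center positions along one axis (all values in pixels).

    Assumption: consecutive windows share `overlap` (0..1) of their
    diameter, giving a step of diameter * (1 - overlap), at least 1 pixel.
    """
    step = max(1, round(diameter * (1.0 - overlap)))
    radius = diameter // 2
    return list(range(radius, length - radius, step))

def evaluations(width, height, diameter, overlap_x, overlap_y):
    """Number of neighborhoods to evaluate for one quality map."""
    return (len(window_centers(width, diameter, overlap_x))
            * len(window_centers(height, diameter, overlap_y)))

# Hypothetical 500x500-pixel map with a 40-pixel window diameter.
for ratio in (0.95, 0.75, 0.50):
    print(ratio, evaluations(500, 500, 40, ratio, ratio))
```

Under these assumptions, going from 95% to 50% overlap reduces the number of feature extraction and prediction steps by a factor of 100, which is the trade-off that the subsequent Gaussian filtering is meant to compensate.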

[0073] Figure 4 illustrates the effect of reducing the overlap ratio (e.g., to increase computational speed) and of applying the present Gaussian filtering. Box B1 provides color and / or grayscale quality maps of a given input image for overlap ratios from 95% to 50%, before filtering. Box B2 provides color and / or grayscale versions of the same quality maps after applying Gaussian filtering. It can be seen that reducing the overlap ratio produces more pixelation and more abrupt changes between neighboring regions in the maps before filtering; however, this effect can be reduced by applying appropriate filtering, while still producing results substantially similar to those obtained with higher overlap ratios. For example, after filtering, there is almost no difference between a quality map computed with 50% overlap and one computed with 85% or 95% overlap.

[0074] As can be seen in Figure 4, a further limitation of the sliding window method is that, when circular neighborhoods are used, not all locations near the image boundary are evaluated, because not every boundary location can be covered by a circular neighborhood of the given radius and overlap ratio. This problem can be more pronounced when Gaussian filtering is applied, as regions lacking information (e.g., "Not a Number" or "NaN") cannot be included in the filtering process. This effect can be eliminated by applying a Gaussian filtering that ignores NaN pixels with appropriate weighting and / or extrapolates to produce values at such NaN pixels. This can be achieved through the following steps: 1) define an image, Img1, of the same size as the quality map, in which all pixel values are 1; 2) replace all NaN positions in the unfiltered quality map, and the same positions in Img1, with the value 0; 3) apply the Gaussian filter to Img1; 4) apply the Gaussian filter to the unfiltered quality map; 5) divide the result of step 4 by the result of step 3. For comparison, Figure 5 provides an exemplary result 51 (e.g., in color or grayscale) of applying the above NaN extrapolation, together with a quality map 53 (e.g., in color or grayscale) obtained with a standard filter, for a retinal angiography slice (en face image) 55. The result of this operation is a filtered image 51 with values correctly extrapolated at the NaN locations.
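
The five steps above can be sketched as follows. This is a minimal illustration using only NumPy, with a simple separable Gaussian kernel written for the example; the function names, the zero padding at the image border, and the σ and radius values passed by the caller are assumptions of the sketch, not parameters defined by the invention. The kernel length (2 × radius + 1) is assumed not to exceed the image size.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2*radius + 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def gaussian_filter2d(img, sigma_x, sigma_y, radius_x, radius_y):
    """Separable Gaussian smoothing; values outside the image count as 0."""
    kx = gaussian_kernel(sigma_x, radius_x)
    ky = gaussian_kernel(sigma_y, radius_y)
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kx, mode="same"), 1, img)
    return np.apply_along_axis(
        lambda c: np.convolve(c, ky, mode="same"), 0, rows)

def nan_aware_gaussian(quality_map, sigma_x, sigma_y, radius_x, radius_y):
    """Gaussian filtering that ignores NaN pixels and fills them in."""
    nan_mask = np.isnan(quality_map)
    img1 = np.ones_like(quality_map, dtype=float)      # step 1
    img1[nan_mask] = 0.0                               # step 2 (Img1)
    filled = np.where(nan_mask, 0.0, quality_map)      # step 2 (map)
    weights = gaussian_filter2d(
        img1, sigma_x, sigma_y, radius_x, radius_y)    # step 3
    smoothed = gaussian_filter2d(
        filled, sigma_x, sigma_y, radius_x, radius_y)  # step 4
    with np.errstate(invalid="ignore", divide="ignore"):
        result = smoothed / weights                    # step 5
    # No valid pixel within the kernel support: nothing to extrapolate.
    result[weights == 0] = np.nan
    return result
```

Dividing by the filtered Img1 renormalizes each output pixel by the total kernel weight that fell on valid pixels, which is why NaN locations (and the image border) receive a weighted average of their valid neighbors instead of being pulled toward zero.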

[0075] Training program

[0076] This section provides examples for training and testing models to characterize subjective angiographic quality in retinal flow OCTA slices. The model is trained on retinal flow slices collected from multiple acquisitions (e.g., 72 or 250 OCTA scans, each a 6×6 mm OCTA scan).

[0077] Collected annotations

[0078] A set of (e.g., human) graders independently grade each acquisition at each pixel location in each en face slice, for example, using a numeric grading scale. For instance, each grader could use the grading scale to delineate different regions within each slice based on the quality of each region. The numeric grading scale could consist of quality values ranging from 1 to 5, where 1 indicates the worst quality (unusable data) and 5 indicates the best quality (optimal). Figures 6A to 6E show various exemplary criteria (or examples thereof) for grading the quality of flow maps using the 1-5 grading scale. Manually graded outputs can also be used to define one or more weights for the Haralick coefficients in flow quality algorithms.

[0079] Figure 6A is a first example of assigning individual quality grades to different regions of an en face flow map 61 using the 1-5 quality grading scale. For ease of illustration, each region is identified by its grade value and by a boundary with a corresponding color code, line pattern, or brightness intensity. For example, a blue or solid black boundary corresponds to grade 1, a red or solid gray boundary corresponds to grade 2, a green or long-dashed boundary corresponds to grade 3, a light purple or short-dashed boundary corresponds to grade 4, and a yellow or dashed boundary corresponds to grade 5. In this example, grade 5 (the region with the yellow / dashed perimeter) identifies the highest quality image region, as determined by an expert (e.g., human) grader. The grade 5 region exhibits excellent brightness and / or contrast, capillaries are well delineated, and the grader can follow the capillaries. In the grader's estimation, the grade 5 region is an example of the best achievable image quality. Grade 4 (the light purple / short-dashed region) identifies image regions with reduced brightness and / or contrast compared to the grade 5 region, but in which the grader can still follow the capillaries well. Grade 3 (the green / long-dashed region) is of lower quality than grade 4, but the grader can still infer (or guess) the presence of capillaries, which exhibit some discontinuities. Therefore, the grader may miss some capillaries within a grade 3 region. Grade 2 regions (red / solid gray) are of lower quality than grade 3 and define areas where the grader can see some signal between large vessels but cannot distinguish capillaries. Within grade 2 regions, some areas between capillaries may falsely appear as localized ischemia. In grade 1 regions (blue / solid black), the content is washed out, and the grader considers these regions unusable.

[0080] Figure 6B provides a second example, in which elements similar to those of Figure 6A have similar reference numerals and are defined above. Figure 6B illustrates the effect on image quality of artifacts caused by floaters. In this example, floaters are identified by noting that such artifacts are absent in other scans of the same area.

[0081] To provide more accurate annotations for training the algorithm, specific regions of a retinal slice may require specific annotation instructions. For example, determining capillary visibility is more difficult in the foveal region, because this region is typically avascular. Below is an example of dividing an en face flow map into three regions of interest, each with its own grading criteria, to account for differences in vascular characteristics. Figure 6C illustrates grading of the foveal avascular zone (FAZ), Figure 6D illustrates grading of the nasal segment, and Figure 6E illustrates grading of the rest of the image.

[0082] In the grading of Figure 6C, grade 5 again represents the best image quality, with excellent brightness and / or contrast. In these areas, the grader can follow the contour of the FAZ without difficulty or hesitation. Grade 4 areas have reduced brightness and / or contrast, but the grader can still trace the FAZ. In grade 3 areas, capillaries are in focus, and the grader can follow the contour of the FAZ with some noticeable discontinuities. Capillaries may be missed in these areas. In grade 2 areas, poor signal obscures some portions of the FAZ, and some capillaries appear out of focus. As before, grade 1 areas are unusable, and at least some portion of the FAZ content is washed out.

[0083] In the example of Figure 6D, in which the nasal segment is graded, grade 5 represents the best image quality, with excellent brightness and / or contrast. In grade 5 regions, the retinal nerve fiber layer (RNFL) is well delineated, allowing the grader to follow the RNFL without difficulty. Grade 4 regions have reduced contrast, but the RNFL pattern is still recognizable. In grade 3 regions, the grader can infer (guess) the presence of the RNFL despite some discontinuities. In these regions, the grader can deduce the presence of RNFL capillaries, but individual capillaries may be difficult to trace. Grade 2 represents regions where the grader may see some signal between large vessels, but the RNFL cannot be resolved. Within grade 2 regions, some areas may exhibit pseudo-ischemia. Grade 1 regions are considered unusable. Within grade 1 regions, medium-sized vessels may appear blurry, and the content often appears washed out.

[0084] In the grading of Figure 6E, grade 5 represents the best quality areas, with excellent brightness and / or contrast and very well delineated capillaries that are easy to follow. Grade 4 areas have reduced contrast, but capillaries can still be followed and have a higher visible density than in grade 3 areas, where the presence of capillaries can be guessed but the capillaries cannot be followed. Nevertheless, in grade 3 areas the branches of fine vessels (arterioles and venules) are well focused, although the details of the capillaries are lost. Grade 2 identifies areas where some signal between large vessels is still visible, but the capillaries are not resolved. Within grade 2 areas, fine vessels may appear slightly out of focus, and some areas between capillaries may appear pseudo-ischemic (e.g., due to the presence of floaters). Grade 1 represents unusable areas, where large vessels are typically blurred and the content is washed out.

[0085] Using the grading examples described above, a free-hand drawing tool, such as an ImageJ plugin known in the art, can be used to collect grader annotations. The grader can draw and label regions of a slice using any shape, based on the quality of each region, with the aim of covering the slice's entire field of view with annotated regions. Annotations collected from different graders can be averaged to generate an average manual quality map, which can be used as the target result when training a quality map model.
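
A minimal sketch of this averaging step is shown below. It assumes that each grader's annotation has already been rasterized into a per-pixel grade image of the same size as the slice, and that pixels a grader left unlabeled are stored as NaN; both the representation and the function name `average_grader_map` are assumptions made for illustration.

```python
import numpy as np

def average_grader_map(annotation_maps):
    """Pixel-wise average of per-grader grade maps (values 1..5).

    Pixels left unlabeled by a grader (NaN) are excluded from the average
    at that pixel; pixels labeled by no grader remain NaN.
    """
    stack = np.stack([np.asarray(m, dtype=float) for m in annotation_maps])
    labeled = ~np.isnan(stack)
    counts = labeled.sum(axis=0)
    sums = np.where(labeled, stack, 0.0).sum(axis=0)
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
```

The resulting map has fractional values (e.g., 3.67 when three graders assign 3, 4, and 4), which is what allows it to serve as a continuous regression target.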

[0086] Figure 6F shows a first example (top row, Gd1) of results from a scan of overall good quality and a second example (bottom row, Pr1) of results from a scan of overall poor quality. The leftmost image (e.g., in column C1) is a single 6×6 mm angiographic retinal slice; the center image (e.g., in column C2) is the quality map (in grayscale) on a scale from 1 to 5; and the rightmost image (e.g., in column C3) shows the quality map overlaid on the retinal slice (e.g., C1) with a color-coded or grayscale-coded scale from 1 to 5, as shown in the figure. This exemplary method evaluates a set of texture properties of the OCTA data in the vicinity of each en face location (e.g., pixel or window / region) and assigns a quantitative score associated with the scan quality at that location. The result is a two-dimensional quality map describing the scan quality at each en face location. This map can be used to determine the quality of an individual scan across its FOV, to quantify quality differences between multiple acquisitions of the same subject at each en face location, or to provide an overall quality measure for each acquisition by averaging the map values. Furthermore, these quality maps can provide OCT / OCTA operators with important information about the need for rescanning due to poor quality and / or recommendations aimed at obtaining better quality scans in subsequent acquisitions. Other potential applications of this algorithm include providing a reliability measure for other algorithms applied to OCTA (or OCT) data; automatically excluding low-quality regions from quantification performed by other algorithms; and serving as relative weights, based on quality, when averaging or mosaicking OCTA slices or OCTA volumes from multiple acquisitions at overlapping en face locations.

[0087] Features considered for extraction

[0088] For each OCTA scan considered for training, the retinal slice definition can be used to generate four distinct slice images: an en face flow slice (max_flow) generated by averaging the five maximum pixels at each A-scan location; an en face structure slice (avg_struc) generated by averaging the values at each A-scan location; an en face structure slice (max_struc) generated by averaging the five maximum pixels at each A-scan location; and an en face structure slice (min_struc) generated by averaging the five minimum pixels at each A-scan location. Optionally, no further processing or resizing is applied when generating these en face projections. For each of the four slice images, a set of 22 Haralick features indicating texture properties can be extracted using a sliding window with a circular neighborhood of 250-micron radius and a given offset (e.g., a per-pixel offset or a 75% (i.e., 0.75) overlap). For example, if 72 images are graded, this means extracting 88 features from each of 133,128 distinct neighborhoods used during training. Figure 7 provides examples of the four slices 71, 72, 73, and 74 used for feature extraction (max_flow, avg_struc, max_struc, and min_struc, respectively) and of a labeled neighborhood (white circle) from which the eighty-eight features are extracted (22 per image). The resulting target quality map 75 (e.g., in color or grayscale) is also shown.
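
The four projections described above can be sketched as follows. The sketch assumes, for illustration only, that the flow and structure volumes have already been cropped to the retinal slice definition and are stored as arrays of shape (depth, height, width), with depth along each A-scan; the function names are hypothetical.

```python
import numpy as np

def mean_of_extremes(volume, k=5, largest=True):
    """Average the k largest (or k smallest) values along each A-scan.

    volume: array of shape (depth, height, width); depth is axis 0.
    """
    ordered = np.sort(volume, axis=0)             # ascending along depth
    chosen = ordered[-k:] if largest else ordered[:k]
    return chosen.mean(axis=0)

def make_slices(flow, struc, k=5):
    """Generate the four en face slice images used for feature extraction."""
    return {
        "max_flow": mean_of_extremes(flow, k, largest=True),
        "avg_struc": struc.mean(axis=0),
        "max_struc": mean_of_extremes(struc, k, largest=True),
        "min_struc": mean_of_extremes(struc, k, largest=False),
    }
```

With 22 texture features per slice image, the four slices yield 4 × 22 = 88 features per neighborhood, and 133,128 neighborhoods over 72 graded images correspond to 1,849 neighborhoods per image.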

[0089] It can be noted that the Haralick features extracted from images can be highly dependent on the characteristics of the specific instrument and software version, such as baseline signal levels, internal normalization, or possible internal data filtering. For example, in the proof-of-concept implementation of this invention, the scans used for training undergo an internal process of 3D Gaussian filtering of the acquired flow volumes. To apply the algorithm to software versions that do not include this internal Gaussian filtering, the same type of filtering needs to be applied beforehand. That is, flow volumes obtained from instruments that do not include internal Gaussian filtering are pre-filtered by the algorithm before feature extraction.

[0090] Training Model

[0091] In one example, Lasso regression (a generalized linear regression model) is trained on the averages of the computed feature maps to predict the overall quality score of an acquisition. In another example, Lasso regression is trained on the set of 88 features extracted from each of the 133,128 neighborhoods to predict the manual quality grade of the center pixel of the corresponding neighborhood in the target quality map, as shown in Figure 7. The Lasso model is trained by selecting the regularization coefficient (e.g., λ) that produces the minimum mean squared error in the prediction. A more detailed discussion of Lasso regression can be found in Robert Tibshirani, "Regression Shrinkage and Selection via the Lasso," Journal of the Royal Statistical Society, Series B (Methodological), Wiley, 1996, 58(1): 267–288, the entire contents of which are incorporated herein by reference. The resulting model can be applied in a sliding window manner (as defined for feature extraction) to different acquisitions to produce angiographic quality maps.
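
For illustration, a compact Lasso fit with selection of λ by minimum mean squared error is sketched below. The sketch is not the implementation of the invention: it uses a plain cyclic coordinate descent written with NumPy, and it assumes that the mean squared error is measured on a held-out validation split (cross-validation would be another option). All function names and the candidate λ values are hypothetical.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold` (Lasso proximal step)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_fit(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - b - Zw||^2 + lam*||w||_1, Z = standardized X.

    Returns coefficients and intercept expressed for the original X.
    """
    n, d = X.shape
    mean, scale = X.mean(axis=0), X.std(axis=0)
    scale[scale == 0] = 1.0                 # guard constant features
    Z = (X - mean) / scale
    b0 = y.mean()
    w = np.zeros(d)
    residual = y - b0
    for _ in range(n_iter):
        for j in range(d):
            residual += Z[:, j] * w[j]      # remove feature j's effect
            rho = Z[:, j] @ residual / n    # correlation with residual
            w[j] = soft_threshold(rho, lam) # unit-variance column
            residual -= Z[:, j] * w[j]      # add updated effect back
    coef = w / scale
    return coef, b0 - mean @ coef

def select_lambda(X_train, y_train, X_val, y_val, lambdas):
    """Return (mse, lam, coef, intercept) with the lowest validation MSE."""
    best = None
    for lam in lambdas:
        coef, intercept = lasso_fit(X_train, y_train, lam)
        mse = float(np.mean((X_val @ coef + intercept - y_val) ** 2))
        if best is None or mse < best[0]:
            best = (mse, lam, coef, intercept)
    return best
```

Here each row of `X` would hold the 88 features of one neighborhood and `y` the grade of its center pixel in the target quality map. A large λ drives all weights to zero, while a small λ approaches ordinary least squares, which is why λ is chosen by prediction error.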

[0092] As discussed above in the "Adjusting predicted scores with higher-order polynomials" section, because a linear model is used in this example, and because different amounts of data are available for each of the grades 1-5, an additional second-order polynomial is used to adjust the training results. For comparison purposes, Figure 8A provides a graph of the predicted values on the training data before the adjustment (correction), and Figure 8B provides the results after applying the second-order polynomial (after adjustment / correction). In both graphs, the horizontal axis represents the given score in the target quality map for each training neighborhood (e.g., as manually assigned by the expert graders), while the vertical axis represents the score predicted by the trained model. Also in both graphs, the light gray shaded area represents the standard deviation, while the dark gray shaded area represents the 95% limits of the predicted scores for a given target score. From these two graphs, it can be observed that the average prediction for each given score becomes closer to the given score after applying the present adjustment, and that the confidence interval of the prediction is more stable across the different given scores.

[0093] Results

[0094] To evaluate the accuracy of the algorithm, the quality maps generated by the automated algorithm were compared with ground truth quality maps. As explained above, the ground truth quality maps were constructed from manual region grading averaged across different graders (see the "Collected annotations" section). Since the average grader maps can exhibit abrupt changes due to the region-based annotation process, whereas the automated quality maps are smooth due to the moving window analysis across regions and the smoothing post-processing, the average grader maps were smoothed to provide a fair comparison. This smoothing was chosen to match the expected behavior of the automated algorithm, using a smoothing filter with a kernel equal to the neighborhood extent used in the moving window processing of the automated algorithm (in this case, a circular neighborhood with a radius of 250 micrometers).

[0095] Figure 9 illustrates the effect of smoothing the average grader map when it is considered as the ground truth. For illustrative purposes, the figure shows the max_flow image of an OCTA retinal slice 91, the average of the manual grader scores 92 for the max_flow image, the smoothed average grader map 93 (considered as the ground truth), and the result of the present algorithm 94 (e.g., in color or grayscale).

[0096] The evaluation was conducted in two steps: (1) first, the behavior on the same data used to train the algorithm was analyzed to understand the best expected behavior; and (2) then the behavior was analyzed on a separate set of test images.

[0097] Expected Results - Analysis of Training Data

[0098] While accuracy measured on the training data itself does not typically represent the results to be expected on independent test data, in this example a linear model is fitted to a very large number of instances using a much smaller number of extracted features as predictors, which makes overfitting highly unlikely, so the results are a good indication of what to expect on independent test data. Analyzing the results obtained on the training data helps to understand the algorithm's best expected behavior and to set the boundaries of what can be considered good and suboptimal results.

[0099] Returning to Figure 9, it shows example results obtained for one of the cases used to train the algorithm, allowing observation of how closely the automatic result 94 from the algorithm resembles the smoothed average grader scores 93 (used as the ground truth). On review, the results for all training cases are, from a subjective point of view, "acceptable."

[0100] To understand how close the results from the algorithm are to the ground truth in the training data, the values predicted by the algorithm for all pixels in all cases of the training set (a total of 14,948,928 data points) are compared with the ground truth values. Figure 10A shows an analysis of the average grader scores versus the predicted scores for all pixels across all images in the training set. The horizontal axis represents the average score given by the expert graders, while the vertical axis represents the score predicted by the trained model. The light and dark gray shaded areas represent the standard deviation and the 95% limits, respectively, of the predicted scores for a given target score. It can be observed that the average predicted score closely follows the average grader score. Outliers in the predicted scores cause large tails, but across the different quality levels the standard deviation and the 95% limits of the difference mostly fall below 0.5 and 1.0 points, respectively, of the given quality.

[0101] The results on the training data are used to determine what constitutes optimal and suboptimal results, ultimately helping to set the pass / fail criterion when establishing algorithm requirements. To do this, the percentage of failure cases can be determined as the threshold for considering a case a failure is varied, as shown in Figure 10B, where a failure is defined as an image in which more than a given ratio (e.g., fraction or percentage) of the image deviates from the ground truth by more than one quality point. As the fraction of the image with a deviation greater than 1 that is considered a failure becomes smaller, a larger percentage of failure cases is seen, since this is a more restrictive requirement. It was observed that when the ratio threshold was set to 0.2, the algorithm produced no failures in the training data. That is, by setting the requirement for acceptable results to no more than 20% of an image deviating from the ground truth by more than 1 quality point, all results would be acceptable for the images in the training set. This analysis was used to further evaluate the results on independent test data.
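
The acceptance criterion described above can be sketched as follows. The function names are hypothetical, and the sketch assumes that the algorithm quality map and the ground truth map of each image are arrays of the same size, with NaN marking pixels that have no value.

```python
import numpy as np

def deviating_fraction(predicted, truth, max_deviation=1.0):
    """Fraction of valid pixels deviating from truth by > max_deviation."""
    valid = ~(np.isnan(predicted) | np.isnan(truth))
    diff = np.abs(predicted[valid] - truth[valid])
    return float(np.mean(diff > max_deviation))

def is_failure(predicted, truth, ratio_threshold=0.2, max_deviation=1.0):
    """An image fails if more than `ratio_threshold` of it deviates."""
    return deviating_fraction(predicted, truth, max_deviation) > ratio_threshold

def failure_percentage(cases, ratio_thresholds):
    """Percentage of failed images for each candidate ratio threshold.

    cases: iterable of (predicted_map, truth_map) pairs, one per image.
    """
    fractions = np.array([deviating_fraction(p, t) for p, t in cases])
    return [100.0 * float(np.mean(fractions > r)) for r in ratio_thresholds]
```

Sweeping `ratio_thresholds` from small to large values produces a curve of the kind described for Figure 10B: the smaller the tolerated fraction, the more images are counted as failures.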

[0102] Results on independent test data

[0103] As part of the proof-of-concept implementation, 26 6×6 mm OCTA scans of eyes different from those used in the training set were analyzed as independent test data. As indicated in the "Features considered for extraction" section above, the flow volumes of scans acquired using an instrument version without internal Gaussian filtering were pre-filtered by the algorithm prior to feature extraction. Following the same method as for the training data, the retinal OCTA flow slice of each scan was manually labeled at each pixel location, independently, by three different expert graders, as discussed above in the "Collected annotations" section.

[0104] Figure 11 shows exemplary results (e.g., in color or grayscale) comparing a retinal slice ("Retinal OCTA En Face," 111), the ground truth quality scores collected from the three expert graders ("Smoothed Average Grader Map," 112), the quality grading result of the present algorithm ("Algorithm Quality Map," 113), the difference between the ground truth and the algorithm quality map on a 0-5 grayscale scale ("Grader and Algorithm Difference," 114), and the regions where the algorithm quality map deviates from the ground truth by more than 1.0 quality points ("Regions with Difference > 1," 115). It can be observed that, for this particular example, the algorithm's results are on average similar to those collected from the expert graders, and are also consistent with the quality of the retinal slice, which has a region of significantly lower quality on the left side of the image. It can also be seen that, using the algorithm criterion previously established from the analysis of the training data, this case can be considered acceptable: the region in which the algorithm deviates from the ground truth by more than 1.0 covers less than 20% of the image.

[0105] Similarly, the values predicted by the algorithm for all pixels in all cases of the test set (a total of 6,500,000 data points) are compared with the ground truth values. Figure 12A is a graph comparing the average grader scores to the predicted scores across all 6,500,000 data points in the test set. The graph shows that, on average, the predicted scores are similar to those given by the average grader. The observed tails are smaller than in the training set, but the standard deviation of the differences (the light gray area in the graph) and the 95% limits (the dark gray area in the graph) appear slightly larger than in the training set. This could be because the algorithm fits the data used to train it better, and because only three graders were used for the ground truth of the test set, which may make it less reliable. Figure 12B shows the percentage of failed cases as the threshold for considering an image a failure is varied, as was done for the training set, where a failure is defined as an image in which a given ratio (e.g., fraction or percentage) of the image has a calculated quality metric that deviates from the ground truth by more than 1 quality point. Taking as the requirement for acceptable results that no more than 20% of an image deviates from the ground truth by more than 1 quality point (as determined from the training data), 23 of the 26 images are acceptable, i.e., 88% of the images have acceptable results.

[0106] Figure 13A shows example results (e.g., in color or grayscale) for three images considered acceptable given the 20% deviation limit, and Figure 13B shows examples (e.g., in color or grayscale) of the three images, out of the 26 analyzed, considered suboptimal given the same 20% deviation limit. In Figures 13A and 13B, from left to right, the first column of images shows the retinal slice ("Retinal OCTA En Face," 131); the second column shows the corresponding ground truth map ("Smoothed Average Grader Map," 132); the third column shows the corresponding result of the present algorithm ("Algorithm Quality Map," 133); the fourth column shows the difference between the corresponding ground truth map and the algorithm map ("Grader and Algorithm Difference," 134, on a 0-5 grayscale scale); and the fifth column shows the regions with a deviation greater than 1 ("Regions with Difference > 1," 135). With reference to Figure 13B, it can be observed that although the area with a deviation greater than 1 covers more than 20% of the image in the failed cases, the quality map algorithm still produces results (e.g., the quality maps of column 133) that are broadly consistent with the quality of the corresponding retinal slices (e.g., the retinal OCTA en face images of column 131).

[0107] Figure 14 shows (e.g., in color or grayscale) a first example of quality maps generated / obtained (using the present invention) from the right eye (four images on the left) and left eye (four images on the right) of a first patient for different 3×3 mm OCTA acquisitions (organized in pairs). Each pair of images shows the acquired retinal flow slice on the right, with the visual score given by a reader indicated at the top, and the overlaid angiography quality map on the left, with the overall average score calculated by the present model indicated at the top (the overlay also shows the retinal flow slice). It can be seen that the calculated overall scores correlate well with the given manual scores. In addition, higher quality areas in each slice are shown as having higher region scores than lower quality areas.

[0108] Figure 15 shows (e.g., in color or grayscale) a second example of quality maps obtained from the right eye (four images on the left) and the left eye (four images on the right) of a second patient for different 3×3 mm OCTA acquisitions, organized in pairs. For ease of discussion, the results of Figure 15 are displayed in a manner similar to those of Figure 14. In the example of Figure 15, it can be seen that higher quality acquisitions result in higher overall scores, and that higher quality regions within each image also receive higher scores.

[0109] Figure 16 shows (e.g., in color or grayscale) examples of quality maps obtained from a patient's right eye for OCTA acquisitions with different FOVs: 3×3 mm, 6×6 mm, 9×9 mm, and 12×12 mm. The OCTA acquisitions (organized in pairs) are displayed in a manner similar to that of Figure 14. The example of Figure 16 illustrates how the method is flexible with respect to (e.g., adapts to) different scan sizes.

[0110] Comparison of differences between readers

[0111] To better understand the performance of the algorithm (e.g., compared to human experts), in an exemplary application, an image test set is given for evaluation to the same three graders (readers) who were used to build the ground truth training set used to train the model (e.g., the algorithm) according to the invention. Ground truth test data is defined (e.g., for each test image in the test image set) by taking the average of the quality assessment results provided by the three graders. The individual results of each grader and the results produced by the algorithm are compared to the ground truth test data, and their individual performance is determined using the same quality criterion used to define a failure of the algorithm, as discussed above. More specifically, a test image result is considered a failure if 20% or more (e.g., not less than 20%) of the submitted quality data deviates from the ground truth test data by more than one quality point. First, the annotations made by each of the three graders are compared in the same manner to the average quality map used as the ground truth test data. Figure 17 provides the percentage of acceptable results for each human grader and for the algorithm, using the average of the three human graders as the ground truth test data. While this analysis is biased, because the annotations from each grader are also included in the average annotations used as the ground truth test data, it can be observed that the percentage of acceptable results from the human graders is similar to that of the algorithm. In fact, one of the human graders achieves a lower rate of acceptable results than the algorithm.
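The acceptance criterion described above can be sketched as follows. This is a minimal illustration only: the function names, the 10×10 map size, and the synthetic grader values are assumptions made for the example and do not appear in the original text.

```python
import numpy as np

def ground_truth_map(grader_maps):
    """Average the per-pixel quality maps supplied by several graders."""
    return np.mean(np.stack(grader_maps, axis=0), axis=0)

def is_acceptable(quality_map, truth_map, max_dev=1.0, max_fraction=0.20):
    """Fail when 20% or more of the pixels deviate by more than one quality point."""
    deviating = np.abs(quality_map - truth_map) > max_dev
    fraction = float(deviating.mean())
    return fraction < max_fraction, fraction

# Synthetic (hypothetical) grader maps: constant scores of 2, 3 and 4.
graders = [np.full((10, 10), score, dtype=float) for score in (2, 3, 4)]
truth = ground_truth_map(graders)            # 3.0 everywhere

good = truth.copy()
good[:1, :] = 5.0                            # 10% of pixels deviate by 2 points
bad = truth.copy()
bad[:2, :] = 5.0                             # exactly 20% of pixels deviate

good_ok, good_fraction = is_acceptable(good, truth)
bad_ok, bad_fraction = is_acceptable(bad, truth)
print(good_ok, good_fraction, bad_ok, bad_fraction)
```

The comparison is strict (`fraction < max_fraction`) so that a map with exactly 20% deviating pixels is counted as a failure, matching the "20% or more" wording.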

[0112] To remove this bias, the results from one grader were removed from the data used to build the ground truth test data, and the removed grader's results were compared with the modified ground truth test data. This experiment was repeated separately for each human grader. Figure 18 provides the percentage of acceptable results obtained by comparing the annotations of each grader (individually removed from those used to build the ground truth test data) and the results of the algorithm with the average of the other two remaining graders, which served as the ground truth test data. As shown in the figure, the percentage of acceptable results for the individual graders under this method is lower than the percentage of acceptable results for the algorithm when compared against the same ground truth test data. Figure 18 highlights the good performance of the algorithm and its ability to generate quality maps with quality annotations similar to those produced by an average expert grader, taking into account the difficulty of the task of grading scan quality in images and the stringent verification criterion used.
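The leave-one-grader-out comparison can be sketched in the same way. The helper names and the synthetic maps below are hypothetical; only the procedure (ground truth from the two remaining graders, same 20% / one-point criterion) follows the text.

```python
import numpy as np

def fraction_deviating(a, b, max_dev=1.0):
    """Fraction of pixels where two quality maps differ by more than max_dev."""
    return float((np.abs(a - b) > max_dev).mean())

def leave_one_out(grader_maps, algorithm_map, max_fraction=0.20):
    """Compare each held-out grader and the algorithm with the mean of the other graders."""
    results = []
    for k, held_out in enumerate(grader_maps):
        others = [m for i, m in enumerate(grader_maps) if i != k]
        truth = np.mean(np.stack(others, axis=0), axis=0)
        results.append({
            "grader": k,
            "grader_ok": fraction_deviating(held_out, truth) < max_fraction,
            "algorithm_ok": fraction_deviating(algorithm_map, truth) < max_fraction,
        })
    return results

# Synthetic example: two graders agree (score 3), the third scores 5 everywhere.
graders = [np.full((4, 4), s, dtype=float) for s in (3, 3, 5)]
algorithm = np.full((4, 4), 3.5)
results = leave_one_out(graders, algorithm)
for row in results:
    print(row)
```

In this toy case the outlying third grader fails against the mean of the other two, while the algorithm map stays within one quality point of every leave-one-out ground truth.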

[0113] Processing speed

[0114] In the exemplary application, the average execution time for 26 scans (e.g., a combination of scans that require and do not require additional Gaussian filtering) was 2.58 seconds (with a standard deviation of 0.79). Of these 26 scans, 11 required additional Gaussian filtering (internally within the algorithm), while 15 did not. For those requiring Gaussian filtering, the average processing time was 2.9 seconds (0.96 std), while for those not requiring Gaussian filtering, the average processing time was 2.34 seconds (0.56 std).
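As a consistency check on the figures above, the overall mean follows from the two subgroup means weighted by their scan counts. The variable names are illustrative.

```python
# Scan counts and mean processing times (seconds) taken from the paragraph above.
n_filtered, mean_filtered = 11, 2.90   # scans that required additional Gaussian filtering
n_plain, mean_plain = 15, 2.34         # scans that did not

overall = (n_filtered * mean_filtered + n_plain * mean_plain) / (n_filtered + n_plain)
print(round(overall, 2))  # prints 2.58
```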

[0115] The following provides a description of various hardware and architectures applicable to this invention.

[0116] Optical coherence tomography imaging system

[0117] Typically, optical coherence tomography (OCT) uses low-coherence light to produce two-dimensional (2D) and three-dimensional (3D) internal views of biological tissues. OCT enables in vivo imaging of retinal structures. OCT angiography (OCTA) produces flow information, such as blood flow from within the retina. Examples of OCT systems are provided in U.S. Patents 6,741,359 and 9,706,915, and examples of OCTA systems can be found in U.S. Patents 9,700,206 and 9,759,544, all of which are incorporated herein by reference. Exemplary OCT / OCTA systems are provided herein.

[0118] Figure 19 illustrates a general-purpose frequency-domain optical coherence tomography (FD-OCT) system suitable for collecting 3D image data of the eye. The FD-OCT system OCT_1 includes a light source LtSrc1. Typical light sources include, but are not limited to, broadband light sources with short temporal coherence lengths or swept laser sources. The beam from the light source LtSrc1 is typically guided by an optical fiber Fbr1 to illuminate a sample, such as the eye E; a typical sample is tissue within the human eye. In the case of spectral-domain OCT (SD-OCT), the light source LtSrc1 can be, for example, a broadband light source with a short temporal coherence length, and in the case of swept-source OCT (SS-OCT), the light source can be a wavelength-tunable laser source. The beam is typically scanned by a scanner Scnr1 located between the output of the optical fiber Fbr1 and the sample E, such that the beam (dashed line Bm) scans laterally across the sample region to be imaged. The beam from the scanner Scnr1 can pass through a scanning lens SL and an ophthalmic lens OL and be focused onto the sample E to be imaged. The scanning lens SL can receive the beam from the scanner Scnr1 at multiple incident angles and produce substantially collimated light, which can then be focused onto the sample by the ophthalmic lens OL. This example illustrates a scanning beam that needs to be scanned in two lateral directions (e.g., the x and y directions of a Cartesian plane) to cover the desired field of view (FOV). An example of this is a point-field OCT, which uses a point-field beam to scan the sample.
Thus, the scanner Scnr1 is illustratively shown to include two sub-scanners: a first sub-scanner Xscn for scanning the point-field beam across the sample in a first direction (e.g., the horizontal x direction), and a second sub-scanner Yscn for scanning the point-field beam across the sample in a traversing second direction (e.g., the vertical y direction). If the scanning beam is a line-field beam (e.g., a line-field OCT), which can sample an entire line portion of the sample at a time, then only one scanner may be needed to scan the line-field beam across the sample to span the desired FOV. If the scanning beam is a full-field beam (e.g., full-field OCT), no scanner is required, and the full-field beam can be applied over the entire desired FOV at once.

[0119] Regardless of the type of beam used, light scattered from the sample (e.g., sample light) is collected. In this example, the scattered light returning from the sample is collected into the same fiber Fbr1 that is used to guide the illumination light. The reference light from the same light source LtSrc1 travels through a separate path, in this case involving fiber Fbr2 and a retroreflector RR1 with an adjustable optical delay. Those skilled in the art will recognize that a transmissive reference path can also be used, and that the adjustable delay can be placed in either the sample arm or the reference arm of the interferometer. The collected sample light is combined with the reference light, for example, in fiber coupler Cplr1, to form optical interference at the OCT photodetector Dtctr1 (e.g., a photodetector array, digital camera, etc.). Although a single fiber port leading to detector Dtctr1 is shown, those skilled in the art will recognize that interferometers of various designs can be used for balanced or unbalanced detection of the interference signal. The output from detector Dtctr1 is provided to a processor (e.g., an internal or external computing device) Cmp1, which converts the observed interference into depth information of the sample. The depth information can be stored in memory associated with processor Cmp1 and / or displayed on a display (e.g., computer / electronic display / screen) Scn1. The processing and storage functions can reside within the OCT device, or the functions can be offloaded to (e.g., executed on) an external processor (e.g., an external computing device) to which the collected data can be transferred. An example of a computing device (or computer system) is shown in Figure 27. This unit can be dedicated to data processing or can perform other tasks that are quite general and not specific to the OCT device.
The processor (computing device) Cmp1 may include, for example, a field-programmable gate array (FPGA), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), a system-on-a-chip (SoC), a central processing unit (CPU), a general-purpose graphics processing unit (GPGPU), or a combination thereof, which can perform some or all of the processing steps in a serial and / or parallel manner with one or more host processors and / or one or more external computing devices.

[0120] The sample and reference arms in the interferometer can consist of bulk optics, fiber optics, or hybrid bulk-optic systems and can have different architectures, such as Michelson, Mach-Zehnder, or common-path-based designs known to those skilled in the art. The term beam as used herein should be interpreted as any carefully guided optical path. Instead of mechanically scanning the beam, an optical field can illuminate a one-dimensional or two-dimensional region of the retina to generate the OCT data (e.g., see U.S. Patent 9,332,902; D. Hillmann et al., “Holoscopy - Holographic Optical Coherence Tomography,” Optics Letters, 36(13):2390 2011; Y. Nakamura et al., “High-Speed Three-Dimensional Imaging of the Human Retina by Line-Field Spectral Domain Optical Coherence Tomography,” Optics Express, 15(12):7103 2007; Blazkiewicz et al., “Signal-to-Noise Ratio Study of Full-Field Fourier Domain Optical Coherence Tomography,” Applied Optics, 44(36):7772 (2005)). In time-domain systems, the reference arm needs to have an adjustable optical delay to generate interference. Balanced detection systems are typically used in TD-OCT and SS-OCT systems, while spectrometers are used at the detection port of SD-OCT systems. The invention described herein can be applied to any type of OCT system. Various aspects of this invention can also be applied to other types of ophthalmic diagnostic systems and / or multiple ophthalmic diagnostic systems, including but not limited to fundus imaging systems, visual field testing devices, and scanning laser polarimeters.

[0121] In Fourier domain optical coherence tomography (FD-OCT), each measurement is a real-valued spectral interferogram $S_j(k)$. The real-valued spectral data typically undergoes several post-processing steps, including background subtraction and dispersion correction. The Fourier transform of the processed interferogram produces a complex-valued OCT signal output $A_j(z) = |A_j| e^{i\varphi_j}$. The absolute value $|A_j|$ of this complex OCT signal reveals the profile of scattering intensity at different path lengths, and thus reveals scattering as a function of depth (z-direction) in the sample. Similarly, the phase $\varphi_j$ can also be extracted from the complex-valued OCT signal. The scattering profile as a function of depth is called an axial scan (A-scan). A set of A-scans measured at adjacent locations in a sample produces a cross-sectional image (tomogram or B-scan) of the sample. A collection of B-scans acquired at different lateral locations on the sample constitutes a data volume or cube. For a given data volume, the term "fast axis" refers to the scanning direction along a single B-scan, while "slow axis" refers to the axis along which multiple B-scans are collected. The term "cluster scan" can refer to a single unit or block of data generated by repeated acquisition at the same (or substantially the same) location (or region) for the purpose of analyzing motion contrast, which can be used to identify flow. A cluster scan can consist of multiple A-scans or B-scans collected at approximately the same location on the sample at relatively short time intervals. Because the scans in a cluster scan cover the same region, static structure remains relatively unchanged from scan to scan within the cluster scan, and motion contrast between scans that meets predetermined criteria can be identified as blood flow.
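The chain from spectral interferogram to A-scan can be sketched as follows. This is a simplified illustration under stated assumptions: the spectrum is assumed to be sampled uniformly in wavenumber k, dispersion correction is omitted, and the function name and the synthetic single-reflector spectrum are hypothetical.

```python
import numpy as np

def a_scan_from_interferogram(spectrum, background):
    """Turn a real-valued spectral interferogram S_j(k) into |A_j(z)| and its phase."""
    processed = spectrum - background              # background subtraction
    a_complex = np.fft.fft(processed)              # complex-valued OCT signal A_j(z)
    half = a_complex[: processed.size // 2]        # real input: keep the non-mirrored half
    return np.abs(half), np.angle(half)

# Synthetic spectrum: Gaussian source envelope modulated by one reflector at depth bin 100.
n = 1024
k = np.arange(n)
envelope = np.exp(-((k - n / 2) / (n / 4)) ** 2)
depth_bin = 100
spectrum = envelope * (1.0 + 0.5 * np.cos(2 * np.pi * depth_bin * k / n))

magnitude, phase = a_scan_from_interferogram(spectrum, background=envelope)
print(int(np.argmax(magnitude)))
```

Repeating this for adjacent lateral positions and stacking the resulting magnitude profiles column by column would give a B-scan; stacking B-scans gives the data volume.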

[0122] Various methods for creating B-scans are known in the art, including but not limited to scanning along the horizontal or x-direction, along the vertical or y-direction, along a diagonal of x and y, or in a circular or spiral pattern. B-scans can be in the x-z dimensions, but can be any cross-sectional image that includes the z-dimension. Figure 20 shows an example OCT B-scan image of a normal human retina. An OCT B-scan of the retina provides a view of the structure of the retinal tissue. For illustrative purposes, Figure 20 identifies various typical retinal layers and layer boundaries. The identified retinal layers and boundaries include (from top to bottom): the inner limiting membrane (ILM) Layer 1, the retinal nerve fiber layer (RNFL or NFL) Layer 2, the ganglion cell layer (GCL) Layer 3, the inner plexiform layer (IPL) Layer 4, the inner nuclear layer (INL) Layer 5, the outer plexiform layer (OPL) Layer 6, the outer nuclear layer (ONL) Layer 7, the junction between the outer segments (OS) and inner segments (IS) of the photoreceptors (indicated by reference symbol Layer 8), the external or outer limiting membrane (ELM or OLM) Layer 9, the retinal pigment epithelium (RPE) Layer 10, and Bruch's membrane (BM) Layer 11.

[0123] In OCT angiography or functional OCT, analysis algorithms can be applied to OCT data collected at the same or approximately the same locations on the sample at different times (e.g., cluster scans) to analyze motion or flow (see, for example, U.S. Patent Publications 2005 / 0171438, 2012 / 0307014, 2010 / 0027857, 2012 / 0277579, and U.S. Patent No. 6,549,801, the entire contents of which are incorporated herein by reference). OCT systems can use any of a variety of OCT angiography processing algorithms (e.g., motion contrast algorithms) to identify blood flow. For example, motion contrast algorithms can be applied to intensity information derived from the image data (intensity-based algorithms), to phase information from the image data (phase-based algorithms), or to the complex image data (complex-based algorithms). An en face (frontal) image is a 2D projection of 3D OCT data (e.g., obtained by averaging the intensity of each individual A-scan, such that each A-scan defines a pixel in the 2D projection). Similarly, an en face vasculature image is an image displaying the motion contrast signal, in which the data dimension corresponding to depth (e.g., the z-direction along an A-scan) is displayed as a single representative value (e.g., a pixel in a 2D projection image), typically by summing or integrating all or an isolated portion of the data (see, for example, U.S. Patent No. 7,301,644, the entire contents of which are incorporated herein by reference). An OCT system providing angiographic imaging capabilities may be referred to as an OCT angiography (OCTA) system.
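A minimal sketch of an intensity-based motion contrast computation and a frontal projection is given below. Variance across the repeated B-scans of a cluster scan is used here only as one simple, illustrative intensity-based measure; it is not presented as the specific algorithm of any of the cited references, and the array layouts and function names are assumptions.

```python
import numpy as np

def motion_contrast(cluster):
    """cluster has shape (repeats, z, x): repeated intensity B-scans at one location.
    Returns the per-pixel variance across repeats as a motion-contrast B-scan."""
    return np.var(cluster, axis=0)

def frontal_projection(volume, z_start, z_stop):
    """volume has shape (z, x, y). Sum a depth range into one value per A-scan."""
    return volume[z_start:z_stop].sum(axis=0)

# Static tissue is identical in all four repeats; one pixel fluctuates (simulated flow).
cluster = np.ones((4, 8, 8))
cluster[:, 3, 5] = [1.0, 2.0, 0.5, 1.5]
contrast = motion_contrast(cluster)           # zero everywhere except the flow pixel

# Build a toy motion-contrast volume from three identical B-scans along the slow axis.
volume = np.repeat(contrast[:, :, None], 3, axis=2)
projection = frontal_projection(volume, z_start=2, z_stop=5)
print(contrast[3, 5], projection.shape)
```

Choosing `z_start` and `z_stop` from segmented layer boundaries, rather than fixed indices, is what restricts the projection to a given tissue depth range.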

[0124] Figure 21 shows an example of an en face (frontal) vasculature image. After processing the data with any motion contrast technique known in the art to highlight motion contrast, the range of pixels corresponding to a given tissue depth from the surface of the inner limiting membrane (ILM) in the retina can be summed to generate the en face (e.g., frontal view) image of the vasculature. Figure 22 shows an exemplary B-scan of a vasculature (OCTA) image. As illustrated, structural information may not be well defined, because blood flow may traverse multiple retinal layers, making them less well defined than in a structural OCT B-scan, as shown in Figure 20. Nonetheless, OCTA provides a non-invasive technique for imaging the microvasculature of the retina and choroid, which can be critical for the diagnosis and / or monitoring of various pathologies. For example, OCTA can be used to identify diabetic retinopathy by identifying microaneurysms and neovascular complexes and by quantifying the foveal avascular zone and non-perfused areas. Furthermore, OCTA has shown good agreement with fluorescein angiography (FA), a more traditional but more invasive technique that requires dye injection to observe vascular flow in the retina. In addition, in dry age-related macular degeneration, OCTA has been used to monitor a general decrease in choriocapillaris flow. Similarly, in wet age-related macular degeneration, OCTA can provide qualitative and quantitative analysis of choroidal neovascular membranes. OCTA is also used to study vascular occlusions, for example, to evaluate non-perfused areas and the integrity of the superficial and deep vascular plexuses.

[0125] Neural Networks

[0126] As discussed above, this invention can utilize neural network (NN) machine learning (ML) models. For completeness, a general discussion of neural networks is provided herein. This invention can use any of the neural network architectures described below, either alone or in combination. A neural network is a network of interconnected neurons, where each neuron represents a node in the network. Groups of neurons can be arranged in layers, where the output of one layer is fed forward to the next layer in a multilayer perceptron (MLP) arrangement. An MLP can be understood as a feedforward neural network model that maps a set of input data to a set of output data.

[0127] Figure 23 shows an example of a multilayer perceptron (MLP) neural network. Its structure may include multiple hidden (e.g., inner) layers HL1 to HLn, which map an input layer InL (which receives a set of inputs (or vector input) in_1 to in_3) to an output layer OutL, which produces a set of outputs (or vector output), such as out_1 and out_2. Each layer can have any given number of nodes, which are illustratively shown herein as circles within each layer. In this example, the first hidden layer HL1 has two nodes, while hidden layers HL2, HL3, and HLn each have three nodes. Generally, the deeper the MLP (e.g., the greater the number of hidden layers in the MLP), the greater its learning capacity. The input layer InL receives a vector input (illustratively shown as a three-dimensional vector consisting of in_1, in_2, and in_3), and the received vector input can be applied to the first hidden layer HL1 in the sequence of hidden layers. The output layer OutL receives the output from the last hidden layer (e.g., HLn) in the multilayer model, processes its input, and produces a vector output (illustratively shown as a two-dimensional vector consisting of out_1 and out_2).
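A forward pass through an MLP with the layer sizes of this example (three inputs, hidden layers of two, three, three, and three nodes, and two outputs) can be sketched as follows. The tanh activation, the random weights, and the choice of exactly four hidden layers for HL1 to HLn are assumptions made for illustration.

```python
import numpy as np

def mlp_forward(x, weights, biases, activation=np.tanh):
    """Feed a vector input forward through each layer in turn."""
    for W, b in zip(weights, biases):
        x = activation(W @ x + b)
    return x

# InL (3 passive nodes) -> HL1 (2) -> HL2 (3) -> HL3 (3) -> HLn (3) -> OutL (2)
layer_sizes = [3, 2, 3, 3, 3, 2]
rng = np.random.default_rng(0)
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

vector_input = np.array([0.1, 0.2, 0.3])       # in_1, in_2, in_3
vector_output = mlp_forward(vector_input, weights, biases)   # out_1, out_2
print(vector_output.shape)
```

The input layer has no weight matrix of its own here because its nodes only pass their values on to HL1.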

[0128] Typically, each neuron (or node) produces a single output, which is fed forward to neurons in the immediately following layer. However, each neuron in a hidden layer can receive multiple inputs from either the input layer or the outputs of neurons in the preceding hidden layer. Generally, each node can apply a function to its inputs to produce that node's output. Nodes in hidden layers (e.g., learning layers) can apply the same function to their respective inputs to produce their respective outputs. However, some nodes, such as those in the input layer InL, receive only one input and can be passive, meaning they simply forward the value of their single input to their outputs; for example, they provide a copy of their input to their outputs, as indicated by the dashed arrows within the nodes of the input layer InL.

[0129] For the purpose of explanation, Figure 24 shows a simplified neural network consisting of an input layer InL', a hidden layer HL1', and an output layer OutL'. The input layer InL' is shown as having two input nodes i1 and i2 that receive inputs Input_1 and Input_2, respectively (e.g., the input nodes of layer InL' receive a two-dimensional input vector). The input layer InL' feeds forward to a hidden layer HL1' with two nodes h1 and h2, which in turn feeds forward to an output layer OutL' with two nodes o1 and o2. The interconnections or links between neurons (indicated by solid arrows) have weights w1 to w8. Typically, except for the input layer, a node (neuron) can receive as input the outputs of the nodes in its immediately preceding layer. Each node computes its output by multiplying each of its inputs by the corresponding interconnection weight, summing these products, adding (or multiplying by) a constant defined by another weight or bias that can be associated with that particular node (e.g., node weights w9, w10, w11, w12 corresponding to nodes h1, h2, o1, and o2, respectively), and then applying a nonlinear function or a logarithmic function to the result. The nonlinear function can be called an activation function or a transfer function. Various activation functions are known in the art, and the choice of a particular activation function is not critical to the present discussion. It should be noted, however, that the operation of an ML model, or the behavior of a neural network, depends on the weight values, which can be learned so that the neural network provides the desired output for a given input.
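The per-node computation for this two-input, two-hidden-node, two-output network can be written out explicitly. The sigmoid activation, the numeric values, and the assignment of w1 to w8 to particular links are assumptions for illustration; the text only fixes w9 to w12 as the node weights of h1, h2, o1, and o2.

```python
import math

def sigmoid(x):
    """Logistic activation (an assumed choice of activation function)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(input_1, input_2, w):
    # Hidden layer HL1': weighted sum of inputs plus the node's own bias weight.
    h1 = sigmoid(w["w1"] * input_1 + w["w2"] * input_2 + w["w9"])
    h2 = sigmoid(w["w3"] * input_1 + w["w4"] * input_2 + w["w10"])
    # Output layer OutL': weighted sum of hidden outputs plus bias weight.
    o1 = sigmoid(w["w5"] * h1 + w["w6"] * h2 + w["w11"])
    o2 = sigmoid(w["w7"] * h1 + w["w8"] * h2 + w["w12"])
    return o1, o2

weights = {
    "w1": 0.15, "w2": 0.20, "w3": 0.25, "w4": 0.30,   # input -> hidden links
    "w5": 0.40, "w6": 0.45, "w7": 0.50, "w8": 0.55,   # hidden -> output links
    "w9": 0.35, "w10": 0.35, "w11": 0.60, "w12": 0.60,  # node biases
}
o1, o2 = forward(0.05, 0.10, weights)
print(o1, o2)
```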

[0130] Neural networks learn (e.g., are trained to determine) appropriate weight values to achieve the desired output for a given input during a training or learning phase. Before training the neural network, each weight can be individually assigned an initial (e.g., random and optionally non-zero) value, for example, from a random number seed. Various methods for assigning initial weights are known in the art. The weights are then trained (optimized) such that, for a given training vector input, the neural network produces an output close to the desired (predetermined) training vector output. For example, the weights can be incrementally adjusted over thousands of iterations using a technique called backpropagation. In each cycle of backpropagation, a training input (e.g., a vector input or training input image / sample) is fed forward through the neural network to determine its actual output (e.g., a vector output). The error for each output neuron or output node is then calculated based on the actual neuron output and the target training output for that neuron (e.g., the training output image / sample corresponding to the current training input image / sample). The error is then propagated back through the neural network (in the direction from the output layer back to the input layer), and the weights are updated based on how much each weight contributes to the total error, so that the output of the neural network moves closer to the desired training output. This cycle is then repeated until the actual output of the neural network falls within an acceptable error range of the desired training output for a given training input. It should be understood that each training input may require many backpropagation iterations before reaching the desired error range. Typically, an epoch refers to one backpropagation iteration (e.g., one forward pass and one backward pass) over all training samples, so that training a neural network may require many epochs.
Generally, the larger the training set, the better the performance of the trained ML model, so various data augmentation methods can be used to increase the size of the training set. For example, when the training set includes multiple pairs of corresponding training input and training output images, the training images can be divided into multiple corresponding image segments (or patches). Corresponding patches from the training input and training output images can be paired to define multiple training patch pairs from a single input / output image pair, which enlarges the training set. However, training on a large training set places high demands on computational resources such as memory and data processing resources. The computational requirements can be reduced by dividing the large training set into multiple mini-batches, where the mini-batch size defines the number of training samples in one forward / backward pass. In this case, one epoch can include multiple mini-batches. Another problem is the possibility that a neural network (NN) overfits the training set, reducing its ability to generalize from specific inputs to different inputs. Overfitting can be mitigated by creating an ensemble of neural networks or by randomly dropping nodes within the neural network during training, which effectively removes the dropped nodes from the network. Various dropout regulation methods, such as inverted dropout, are known in the art.
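The training loop described above (forward pass, error, backpropagation, mini-batches, epochs, and inverted dropout) can be sketched for a small one-hidden-layer network. The network size, learning rate, mean squared error loss, synthetic data, and all names are assumptions chosen to keep the example short.

```python
import numpy as np

def train(X, Y, hidden=8, epochs=200, batch_size=16, lr=0.05, keep_prob=0.8, seed=0):
    rng = np.random.default_rng(seed)
    # Random initial weights, zero biases.
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, Y.shape[1])); b2 = np.zeros(Y.shape[1])

    def predict(inputs):
        return np.tanh(inputs @ W1 + b1) @ W2 + b2      # no dropout at inference

    losses = [float(np.mean((predict(X) - Y) ** 2))]
    for _ in range(epochs):                              # one epoch = all samples once
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):       # one mini-batch per pass
            idx = order[start:start + batch_size]
            x, y = X[idx], Y[idx]
            # Forward pass with inverted dropout on the hidden layer.
            h = np.tanh(x @ W1 + b1)
            mask = (rng.random(h.shape) < keep_prob) / keep_prob
            h_drop = h * mask
            out = h_drop @ W2 + b2
            # Backward pass: gradient of the mean squared error.
            d_out = 2.0 * (out - y) / len(x)
            dW2 = h_drop.T @ d_out
            db2 = d_out.sum(axis=0)
            d_h = (d_out @ W2.T) * mask * (1.0 - h ** 2)
            dW1 = x.T @ d_h
            db1 = d_h.sum(axis=0)
            # Incremental weight update.
            W1 -= lr * dW1; b1 -= lr * db1
            W2 -= lr * dW2; b2 -= lr * db2
        losses.append(float(np.mean((predict(X) - Y) ** 2)))
    return losses

data_rng = np.random.default_rng(1)
X = data_rng.uniform(-1.0, 1.0, size=(64, 2))
Y = 0.5 * (X[:, :1] - X[:, 1:])                          # simple synthetic target
losses = train(X, Y)
print(losses[0], losses[-1])
```

Dividing the mask by `keep_prob` during training is what makes the dropout "inverted": the expected hidden activation is unchanged, so no rescaling is needed when dropout is switched off at inference.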

[0131] It is important to note that the operation of a trained neural network (NN) is less a straightforward algorithm of operational / analytical steps. In fact, when a trained NN receives an input, that input is not analyzed in the traditional sense. Instead, regardless of the subject or nature of the input (e.g., a vector defining a real-time image / scan or a vector defining some other entity, such as a demographic description or an activity log), the input is subjected to the same predefined architectural construct of the trained neural network (e.g., the same node / layer arrangement, trained weights and biases, predefined convolution / deconvolution operations, activation functions, pooling operations, etc.), and it may be unclear how the architectural construct of the trained network produces its output. Furthermore, the values of the trained weights and biases are not deterministic and depend on many factors, such as the amount of time the neural network is given for training (e.g., the number of epochs in training), the random starting values of the weights before training begins, the computer architecture of the machine on which the NN is trained, the selection of training samples, the distribution of training samples across multiple mini-batches, the choice of activation functions, the choice of error functions used to modify the weights, and even whether training is interrupted on one machine (e.g., with a first computer architecture) and completed on another machine (e.g., with a different computer architecture). The key point is that the reasons why a trained ML model reaches certain outputs are not yet clear, and much research is currently underway to try to determine the factors on which ML model outputs are based. Therefore, the processing of real-time data by a neural network cannot be reduced to a simple step-by-step algorithm.
Instead, its operation depends on its training architecture, training sample set, training sequence, and various circumstances during the training of the ML model.

[0132] In summary, building a neural network (NN) machine learning model can include a learning (or training) phase and a classification (or operation) phase. In the learning phase, the neural network is trained for a specific purpose, and a set of training examples, including training (sample) inputs and training (sample) outputs, can be provided to the neural network, and optionally a set of validation examples to test the progress of training. During this learning process, various weights associated with the nodes and node interconnections in the neural network are incrementally adjusted to reduce the error between the actual output of the neural network and the expected training output. In this way, a multi-layered feedforward neural network (such as those discussed above) can be made capable of approximating any measurable function to any desired accuracy. The result of the learning phase is a machine learning (ML) model that has been learned (e.g., trained). In the operation phase, a set of test inputs (or real-time inputs) can be submitted to the learned (trained) ML model, which can apply what it has learned to produce output predictions based on the test inputs.

[0133] Like the regular neural networks of Figure 23 and Figure 24, convolutional neural networks (CNNs) also consist of neurons with learnable weights and biases. Each neuron receives inputs, performs an operation (e.g., a dot product), and optionally follows it with a non-linearity. However, a CNN can take raw image pixels at one end (e.g., the input) and provide classification (or class) scores at the other end (e.g., the output). Because CNNs expect images as input, they are optimized to work with volumes (e.g., the pixel height and width of the image, plus the image's depth, such as color depth, like an RGB depth defined by three colors: red, green, and blue). For example, the layers of a CNN can be optimized for neurons arranged in three dimensions. Neurons in a CNN layer can also be connected to a small region of the preceding layer, rather than to all of its neurons as in a fully connected NN. The final output layer of a CNN can reduce the entire image to a single vector (classification) arranged along the depth dimension.

[0134] Figure 25 provides an example convolutional neural network (CNN) architecture. A CNN can be defined as a sequence of two or more layers (e.g., layers 1 to N), where each layer may include an (image) convolution step, a weighted sum (of results) step, and a nonlinear function step. Convolution can be performed on the input data, for example, by applying filters (or kernels) over a moving window across the input data, to produce a feature map. Each layer and its components may have different predetermined filters (from a filter bank), weights (or weighting parameters), and / or function parameters. In this example, the input data is an image with a given pixel height and width, which may be the raw pixel values of the image. In this example, the input image is shown as having a depth of three color channels RGB (red, green, and blue). Optionally, the input image may undergo various preprocessing steps, and the preprocessed result may be used instead of, or in addition to, the original input image. Some examples of image preprocessing include retinal blood vessel map segmentation, color space transformation, adaptive histogram equalization, generation of connected components, etc. Within a layer, a dot product can be computed between the given weights and the small region of the input volume to which they are connected. Many ways of configuring CNNs are known in the art, but as an example, a layer can be configured to apply an element-wise activation function, such as a max(0,x) threshold at zero. A pooling function (e.g., along the x-y directions) can be performed to downsample the volume. Fully connected layers can be used to determine the classification output and produce a one-dimensional output vector, which has been found to be useful for image recognition and classification. However, for image segmentation, the CNN will need to classify each pixel.
Since each CNN layer tends to reduce the resolution of the input image, another stage is needed to upsample the image back to its original resolution. This can be achieved by applying a transposed convolution (or deconvolution) stage TC, which typically does not use any predefined interpolation method but instead has learnable parameters.
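The layer operations named above (convolution over a moving window, a max(0,x) activation, pooling, and a transposed convolution for upsampling) can be sketched as follows. A single-channel image and fixed kernels are assumed to keep the example short; in a trained network the kernels would be learned and the input would typically have several channels.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image and take a dot product at each position."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Element-wise max(0, x) threshold at zero."""
    return np.maximum(0, x)

def max_pool(x, size=2):
    """Downsample by taking the maximum over non-overlapping size x size windows."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def transposed_conv2d(x, kernel, stride=2):
    """Upsample by placing a scaled copy of the kernel at each strided output location."""
    kh, kw = kernel.shape
    out = np.zeros(((x.shape[0] - 1) * stride + kh, (x.shape[1] - 1) * stride + kw))
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * kernel
    return out

image = np.arange(100, dtype=float).reshape(10, 10)       # intensity ramp
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)            # horizontal gradient filter
feature_map = relu(conv2d(image, edge_kernel))            # 8 x 8 feature map
pooled = max_pool(feature_map)                            # 4 x 4 after pooling
upsampled = transposed_conv2d(pooled, np.ones((2, 2)))    # back to 8 x 8
print(feature_map.shape, pooled.shape, upsampled.shape)
```

With the all-ones 2×2 kernel and stride 2 used here, the transposed convolution reduces to nearest-neighbour upsampling; with learnable kernel values it becomes the trainable upsampling stage described in the text.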

[0135] Convolutional neural networks have been successfully applied to many computer vision problems. As explained above, training a CNN typically requires a large training dataset. The U-Net architecture is based on CNNs and can usually be trained on a smaller training dataset than traditional CNNs.

[0136] Figure 26 shows an example U-Net architecture. This exemplary U-Net includes an input module (or input layer or level) that receives an input U-in (e.g., an input image or image patch) of any given size. For illustrative purposes, the image size at any stage or layer is indicated within a box representing the image; for example, the input module contains the number "128×128" to indicate that the input image U-in consists of 128×128 pixels. The input image can be a fundus image, an OCT / OCTA en face image, a B-scan image, etc. It should be understood, however, that the input can be of any size or dimension. For example, the input image can be an RGB color image, a monochrome image, a volumetric image, etc. The input image passes through a series of processing layers, each shown with exemplary dimensions; these dimensions are for illustrative purposes only and will depend on, for example, the size of the image, the convolutional filters, and / or the pooling levels. The architecture consists of a contraction path (exemplarily comprising four encoding modules), followed by an expansion path (exemplarily comprising four decoding modules), and copy-and-crop links (e.g., CC1 to CC4) between corresponding modules / levels. Each link copies the output of an encoding module in the contraction path and concatenates it to the upsampled input of the corresponding decoding module in the expansion path (e.g., by appending it to the end of that input). This produces the distinctive U-shape from which the architecture takes its name. Optionally, for computational reasons, a "bottleneck" module / level (BN) may be located between the contraction and expansion paths. The bottleneck BN may comprise two convolutional layers (with batch normalization and optional dropout).

[0137] The contraction path is similar to an encoder and typically captures contextual (or feature) information using feature maps. In this example, each encoding module in the contraction path may include two or more convolutional layers, as indicated by the asterisk symbol "*", which may be followed by a max-pooling layer (e.g., a downsampling layer). For example, the input image U-in is illustrated as passing through two convolutional layers, each with 32 feature maps. It should be understood that each convolutional kernel produces one feature map (e.g., the output of a convolution operation with a given kernel is an image commonly referred to as a "feature map"). For example, the input U-in undergoes a first convolution that applies 32 convolutional kernels (not shown) to produce an output consisting of 32 corresponding feature maps. However, as is known in the art, the number of feature maps produced by a convolution operation can be adjusted (up or down). For example, the number of feature maps can be reduced by averaging groups of feature maps, discarding some feature maps, or using other known methods of feature map reduction. In this example, the first convolution is followed by a second convolution, the output of which is limited to 32 feature maps. Another way to envision feature maps is to consider the output of a convolutional layer as a 3D image, whose 2D dimensions are given by the listed X-Y planar pixel dimensions (e.g., 128×128 pixels) and whose depth is given by the number of feature maps (e.g., a depth of 32 planar images). Following this analogy, the output of the second convolution (e.g., the output of the first encoding module in the contraction path) can be described as a 128×128×32 image. The output of the second convolution then undergoes a pooling operation, which reduces the 2D dimensions of each feature map (e.g., the X and Y dimensions can each be halved). The pooling operation can be implemented within the downsampling operation, as indicated by the down arrow. Several pooling methods, such as max pooling, are known in the art, and the specific pooling method is not critical to this invention. The number of feature maps can be doubled at each pooling, starting with 32 feature maps in the first encoding module (or block), 64 feature maps in the second encoding module, and so on. The contraction path thus forms a convolutional network consisting of multiple encoding modules (or levels or blocks). As is typical of convolutional networks, each encoding module provides at least one convolutional level, followed by an activation function (e.g., a rectified linear unit (ReLU) or a sigmoid layer) (not shown) and a max-pooling operation. Typically, the activation function introduces non-linearity into the layer (e.g., to help avoid overfitting), receives the layer's results, and determines whether to "activate" the output (e.g., determining whether the value of a given node meets a predefined criterion for forwarding the output to the next layer / node). In summary, the contraction path generally reduces spatial information while increasing feature information.
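
The size bookkeeping described above can be sketched as follows. The function name is hypothetical, and the sketch assumes size-preserving convolutions and 2×2 pooling, which is consistent with the 128×128×32 example but is only one possible configuration.

```python
def contraction_path_shapes(height, width, base_maps=32, n_modules=4):
    """Return the (height, width, feature_maps) output shape of each encoding module.

    Assumptions (not fixed by the text): the convolutions preserve the X-Y size,
    each pooling halves X and Y, and the number of feature maps doubles after
    each pooling.
    """
    shapes = []
    maps = base_maps
    for _ in range(n_modules):
        shapes.append((height, width, maps))      # output of this encoding module
        height, width = height // 2, width // 2   # pooling halves each 2D dimension
        maps *= 2                                 # next module uses twice as many maps
    return shapes


for level, shape in enumerate(contraction_path_shapes(128, 128), start=1):
    print(f"encoding module {level}: {shape[0]}x{shape[1]}x{shape[2]}")
```

With a 128×128 input and four encoding modules, the X-Y size shrinks from 128×128 to 16×16 while the number of feature maps grows from 32 to 256, which illustrates the trade of spatial information for feature information.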

[0138] The expansion path is similar to a decoder and can provide localization and spatial information for the results of the contraction path, despite the downsampling and any max pooling performed in the contraction stage. The expansion path comprises multiple decoding modules, each of which concatenates its current upsampled input with the output of the corresponding encoding module. In this way, feature and spatial information are combined in the expansion path through a sequence of up-convolutions (e.g., upsampling, transposed convolutions, or deconvolutions) and concatenations (e.g., via CC1 to CC4) with high-resolution features from the contraction path. The output of a deconvolutional layer is thus concatenated with the corresponding (optionally cropped) feature map from the contraction path, followed by two convolutional layers and an activation function (with optional batch normalization).
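
A minimal sketch of the first half of a decoding module follows: a stride-2 transposed convolution that doubles the X-Y size, then concatenation with an (optionally cropped) feature map copied from the encoder. All names and values are illustrative assumptions. For brevity the transposed convolution handles a single channel with a fixed 2×2 kernel, whereas in a trained network the kernel weights would be learned and summed over all input channels.

```python
def transposed_conv2x2(fmap, kernel):
    """Stride-2 transposed convolution with a 2x2 kernel.

    Each input pixel is spread over its own 2x2 output block, scaled by the
    kernel weights, so the output is twice as tall and twice as wide.
    """
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            for i in range(2):
                for j in range(2):
                    out[2 * y + i][2 * x + j] = fmap[y][x] * kernel[i][j]
    return out


def center_crop(fmap, height, width):
    """Crop the central height x width window of a feature map."""
    top = (len(fmap) - height) // 2
    left = (len(fmap[0]) - width) // 2
    return [row[left:left + width] for row in fmap[top:top + height]]


def concat_skip(upsampled_maps, encoder_maps):
    """Append the encoder feature maps (cropped to match) after the upsampled ones."""
    h, w = len(upsampled_maps[0]), len(upsampled_maps[0][0])
    return upsampled_maps + [center_crop(m, h, w) for m in encoder_maps]


coarse = [[1, 2],
          [3, 4]]                                   # low-resolution decoder input (assumed)
kernel = [[1.0, 0.5],
          [0.5, 0.25]]                              # stand-in for learned weights
up = transposed_conv2x2(coarse, kernel)             # 4x4
skip = [[float(6 * y + x) for x in range(6)] for y in range(6)]  # 6x6 encoder output (assumed)
merged = concat_skip([up], [skip])                  # 2 feature maps, each 4x4
print(len(merged), len(merged[0]), len(merged[0][0]))
```

The concatenated stack would then pass through the two convolutional layers and the activation function mentioned above.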

[0139] The output of the final expansion module in the expansion path can be fed into another processing / training block or layer, such as a classifier block, which can be trained along with the U-Net architecture. Alternatively, or additionally, the output of the final upsampling block (at the end of the expansion path) can be submitted to another convolution operation (e.g., an output convolution), as indicated by the dashed arrow, before producing its output U-out. The kernel size of the output convolution can be chosen to reduce the dimensionality of the final upsampling block to a desired size. For example, the neural network may have multiple features per pixel just before reaching the output convolution, which can provide a 1×1 convolution operation to combine these multiple features into a single output value per pixel, on a pixel-by-pixel basis.
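
The 1×1 output convolution can be sketched as a per-pixel weighted sum across feature maps. The function name, weights, and feature values below are illustrative assumptions; in a trained network the weights and bias would be learned.

```python
def conv1x1(feature_maps, weights, bias=0.0):
    """Combine several feature maps into one output map with a 1x1 convolution.

    For every pixel, the output is the weighted sum of that pixel's values
    across all feature maps, plus a bias. No neighbouring pixels are involved.
    """
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [bias + sum(wt * fm[y][x] for wt, fm in zip(weights, feature_maps))
         for x in range(w)]
        for y in range(h)
    ]


# Three 2x2 feature maps per pixel (assumed values) reduced to one value per pixel.
features = [
    [[1, 2], [3, 4]],
    [[4, 4], [4, 4]],
    [[0, 8], [0, 8]],
]
weights = [0.5, 0.25, 0.25]   # hypothetical weights
u_out = conv1x1(features, weights)
print(u_out)
```

Because the kernel covers a single pixel, the X-Y size is unchanged and only the depth is reduced, here from three features per pixel to one.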

[0140] Computing devices / systems

[0141] Figure 27 illustrates an example computer system (or computing device or computer apparatus). In some embodiments, one or more computer systems may provide the functionality described or illustrated herein and / or perform one or more steps of one or more methods described or illustrated herein. The computer system may take any suitable physical form. For example, the computer system may be an embedded computer system, a system-on-a-chip (SOC), a single-board computer system (SBC) (e.g., a computer-on-module (COM) or system-on-module (SOM)), a desktop computer system, a laptop or notebook computer system, a mesh of computer systems, a mobile phone, a personal digital assistant (PDA), a server, a tablet computer system, an augmented / virtual reality device, or a combination of two or more of these. Where appropriate, the computer system may reside in a cloud, which may include one or more cloud components in one or more networks.

[0142] In some embodiments, the computer system may include a processor Cpnt1, a memory Cpnt2, a storage device Cpnt3, an input / output (I / O) interface Cpnt4, a communication interface Cpnt5, and a bus Cpnt6. The computer system may also optionally include a display Cpnt7, such as a computer monitor or screen.

[0143] Processor Cpnt1 includes hardware for executing instructions, such as those that constitute a computer program. For example, processor Cpnt1 may be a central processing unit (CPU) or a general-purpose computing on graphics processing unit (GPGPU). Processor Cpnt1 may retrieve (or fetch) instructions from internal registers, an internal cache, memory Cpnt2, or storage device Cpnt3, decode and execute the instructions, and write one or more results to internal registers, an internal cache, memory Cpnt2, or storage device Cpnt3. In a particular embodiment, processor Cpnt1 may include one or more internal caches for data, instructions, or addresses. Processor Cpnt1 may include one or more instruction caches and one or more data caches, such as those for storing data tables. Instructions in the instruction cache may be copies of instructions in memory Cpnt2 or storage device Cpnt3, and the instruction cache may accelerate the retrieval of those instructions by processor Cpnt1. Processor Cpnt1 may include any suitable number of internal registers and may include one or more arithmetic logic units (ALUs). Processor Cpnt1 may be a multi-core processor, or may include one or more processors Cpnt1. While this invention describes and illustrates a particular processor, it covers any suitable processor.

[0144] Memory Cpnt2 may include main memory for storing instructions for processor Cpnt1 to execute or for holding intermediate data during processing. For example, the computer system may load instructions or data (e.g., a data table) from storage device Cpnt3 or from another source (e.g., another computer system) into memory Cpnt2. Processor Cpnt1 may load instructions and data from memory Cpnt2 into one or more internal registers or internal caches. To execute instructions, processor Cpnt1 may retrieve and decode instructions from the internal registers or internal caches. During or after instruction execution, processor Cpnt1 may write one or more results (which may be intermediate or final results) to internal registers, internal caches, memory Cpnt2, or storage device Cpnt3. Bus Cpnt6 may include one or more memory buses (each of which may include an address bus and a data bus) and may couple processor Cpnt1 to memory Cpnt2 and / or storage device Cpnt3. Optionally, one or more memory management units (MMUs) facilitate the transfer of data between processor Cpnt1 and memory Cpnt2. Memory Cpnt2 (which may be fast volatile memory) may include random access memory (RAM), such as dynamic RAM (DRAM) or static RAM (SRAM). Storage device Cpnt3 may include long-term or high-capacity storage for data or instructions. Storage device Cpnt3 may be internal or external to the computer system and may include one or more of the following: a disk drive (e.g., a hard disk drive (HDD) or solid-state drive (SSD)), flash memory, ROM, EPROM, an optical disk, a magneto-optical disk, magnetic tape, a universal serial bus (USB) accessible drive, or another type of non-volatile memory.

[0145] The I / O interface Cpnt4 can be software, hardware, or a combination of both, and includes one or more interfaces (e.g., serial or parallel communication ports) for communicating with I / O devices, which can enable communication with a person (e.g., a user). For example, I / O devices may include a keyboard, keypad, microphone, monitor, mouse, printer, scanner, speaker, still camera, stylus, tablet, touchscreen, trackball, video camera, another suitable I / O device, or a combination of two or more of these.

[0146] The communication interface Cpnt5 can provide a network interface for communicating with other systems or networks. The communication interface Cpnt5 may include a Bluetooth interface or other types of packet-based communication. For example, the communication interface Cpnt5 may include a network interface controller (NIC) and / or a wireless NIC or wireless adapter for communicating with a wireless network. The communication interface Cpnt5 can provide communication with Wi-Fi networks, ad hoc networks, personal area networks (PANs), wireless PANs (e.g., Bluetooth WPANs), local area networks (LANs), wide area networks (WANs), metropolitan area networks (MANs), cellular telephone networks (e.g., Global System for Mobile Communications (GSM) networks), the Internet, or a combination of two or more of these.

[0147] Bus Cpnt6 can provide communication links between the aforementioned components of the computing system. For example, bus Cpnt6 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand bus, a low-pin-count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or another suitable bus, or a combination of two or more of these.

[0148] Although the present invention describes and illustrates a particular computer system having a particular number of particular components in a particular arrangement, the present invention contemplates any suitable computer system having any suitable number of any suitable components in any suitable arrangement.

[0149] Herein, a computer-readable non-transitory storage medium may include one or more semiconductor-based or other integrated circuits (ICs) (e.g., field-programmable gate arrays (FPGAs) or application-specific integrated circuits (ASICs)), hard disk drives (HDDs), hybrid hard drives (HHDs), optical disks, optical disk drives (ODDs), magneto-optical disks, magneto-optical drives, floppy disks, floppy disk drives (FDDs), magnetic tape, solid-state drives (SSDs), RAM drives, Secure Digital cards or drives, any other suitable computer-readable non-transitory storage media, or any suitable combination of two or more of these. A computer-readable non-transitory storage medium may be volatile, non-volatile, or a combination of volatile and non-volatile.

[0150] While the invention has been described in conjunction with several specific embodiments, many other substitutions, modifications, and variations will be apparent to those skilled in the art in light of the foregoing description. Therefore, the invention described herein is intended to encompass all such substitutions, modifications, applications, and variations that fall within the spirit and scope of the appended claims.

Claims

1. A method for generating a quality metric for optical coherence tomography (OCT) data, comprising: obtaining OCT data for a volume; defining one or more slice views based on the volumetric OCT data; generating a plurality of feature maps based on each slice view; determining the quality metric based on image attributes of the plurality of feature maps; and displaying the quality metric or storing the quality metric for further processing; wherein obtaining the OCT data for the volume includes: a) collecting a plurality of OCT volume samples of different sizes from the same retinal tissue region; and b) applying the following steps to each of the OCT volume samples: defining one or more slice views, generating the plurality of feature maps, and determining the quality metric based on the plurality of feature maps, thereby defining a plurality of 2D quality map samples corresponding to the plurality of different OCT volume samples; and wherein the method further includes defining composite OCT data based on regions, among the plurality of different OCT volume samples, for which the corresponding 2D quality map samples indicate higher quality than for other regions.

2. The method according to claim 1, wherein multiple slice views are defined.

3. The method according to claim 1 or 2, wherein each slice view is a frontal planar view of a sub-volume of the volume of the OCT data.

4. The method according to claim 1, 2 or 3, wherein determining the quality metric includes submitting the plurality of feature maps to a machine model that is trained using a plurality of pre-graded OCT data volume samples, one or more training slice views defined for each OCT data volume sample, and a plurality of training feature maps generated based on each training slice view.

5. The method according to claim 4, wherein the machine model is a deep learning model.

6. The method according to claim 4, wherein the machine model is a neural network model.

7. The method according to any one of claims 1 to 6, wherein the image attributes of the feature maps include image texture features.

8. The method according to any one of claims 1 to 6, wherein the image attributes of the feature maps include Haralick features.

9. The method according to any one of claims 1 to 8, wherein the OCT data is OCT angiography data.

10. The method according to any one of claims 1 to 9, wherein the quality metric is a two-dimensional (2D) quality map that identifies the quality metrics of different regions of the corresponding slice view.

11. The method according to claim 10, wherein: determining the quality metric includes submitting the plurality of feature maps to a machine model that is trained to determine the quality map based on the plurality of feature maps; and the machine model is further trained to identify one or more causes of low-quality regions in the quality map whose quality metrics are below a predetermined threshold, and to identify one or more corrective actions to improve the low quality metrics in subsequent OCT acquisitions.

12. The method according to claim 11, wherein the one or more causes are selected from a predefined list of error sources that includes one or more of the following: incorrect focus, opacity, illumination below a predefined threshold, light penetration below a predefined threshold, tracking or motion artifacts, and noise above a predefined threshold.

13. The method according to claim 11 or 12, wherein the one or more corrective actions include focus adjustment, recommendation of pupil dilation, identification of alternative imaging angles, or identification of possible causes of a loss of eye tracking.

14. The method according to any one of claims 11 to 13, wherein the corrective action is output to an electronic display.

15. The method according to any one of claims 11 to 14, wherein the corrective action is transmitted to an automated subsystem that automatically performs the corrective action before a subsequent data acquisition.

16. The method according to any one of claims 10 to 15, further comprising defining an overall quality score of the acquisition based at least in part on the average of the individual quality metric distributions of the quality map.

17. The method according to any one of claims 10 to 16, wherein the acquisition is an OCTA acquisition defined based on multiple OCT scans of the same area of the retina, and the method further includes defining an overall quality score of the OCTA acquisition based at least in part on the average of the individual quality metric distributions of the quality maps of the multiple OCT scans that define the OCTA acquisition.

18. The method according to any one of claims 10 to 17, further comprising: identifying a target region within the acquisition, and designating the entire acquisition as good or bad based on the quality map metric corresponding to the target region.