OCT en face pathology segmentation using channel-coded slabs
By integrating multiple information channels into OCT images and applying a machine learning model based on the U-Net architecture, the method addresses the accuracy and efficiency limitations of geographic atrophy (GA) segmentation in OCT, achieving more accurate and faster GA region recognition.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CARL ZEISS MEDITEC INC
- Filing Date
- 2021-04-28
- Publication Date
- 2026-06-16
AI Technical Summary
Existing optical coherence tomography (OCT) techniques rely on analysis of a single sub-RPE reflectance image when identifying geographic atrophy (GA). This analysis is easily affected by other factors, resulting in inaccurate and time-consuming segmentation and requiring additional B-scan checks for confirmation.
Multiple OCT information sources are integrated into different channels of a single image, including retinal layer thickness, RPE integrity, and optical attenuation coefficient, and the GA region is automatically segmented using machine learning models such as the U-Net architecture.
It improves the accuracy and efficiency of GA region identification, reduces reliance on B-scan inspection, and provides more accurate and faster segmentation results.
Figure CN115443480B_ABST
Abstract
Description
Technical Field
[0001] This invention generally relates to a method for analyzing optical coherence tomography (OCT) data to identify target pathologies. More specifically, this invention relates to analyzing OCT data to identify geographic atrophy (GA).
Background Technology
[0002] Age-related macular degeneration (AMD) is the most common eye disease in the elderly, caused by macular damage that leads to central vision loss. Some patients with AMD develop geographic atrophy (GA), which refers to areas of retinal cell degeneration and death. GA is a late-stage manifestation of non-exudative macular degeneration that affects the macula. GA has a characteristic appearance caused by the loss of the photoreceptor layer, retinal pigment epithelium (RPE), and choriocapillaris. GA typically first appears in the perifoveal region and progresses around the fovea, then through the fovea, leading to loss of central vision. While there are currently no known treatments that effectively delay or reverse the effects of GA, characterizing and monitoring macular areas affected by GA is essential for patient diagnosis, monitoring, management, and treatment research purposes.
[0003] The appearance of GA regions has been investigated in reflectance (color) fundus imaging, autofluorescence imaging, and more recently in optical coherence tomography (OCT) imaging (e.g., using 2D morphology imaging techniques). GA regions are effectively visualized in OCT not by directly observing RPE interruptions, but by utilizing the effect of these interruptions on light transmitted into the choroid. The RPE is a highly reflective layer for OCT signals, and the increased penetration of light (OCT signals) into the choroid in atrophied areas allows the presence of GA to be visualized in frontal (en face) sub-RPE images, for example, by axial projection of a sub-volume (e.g., a slab) of OCT data extending from (or slightly above) the RPE and below it into the choroid. The presence of GA in OCT images can be identified as a brighter region within the frontal sub-RPE image. This characteristic of GA in OCT images can be termed sub-RPE hyperreflectivity.
[0004] The quantification of GA properties that may be valuable or meaningful for monitoring this condition (e.g., region size or distance to the foveal center, which can be viewed in frontal sub-RPE images or other 2D frontal images) depends on the delineation or segmentation of the GA region within the macula. However, manually segmenting GA regions in OCT data is a challenging and time-consuming task.
[0005] While sub-RPE hyperreflectivity remains a reasonable method for visualizing GA in OCT data, the presence of GA is inferred (e.g., based on changes in reflectivity) rather than directly observed. Therefore, this method can be affected by other factors that may influence reflectivity, such as the presence of superficial or choroidal vessels producing shadows, possible retinal opacities such as hyperreflective foci, or areas of increased choroidal signal within the intact RPE. Due to these difficulties, frontal (e.g., en face, planar) images of sub-RPE slabs alone have been insufficient to date, necessitating careful B-scan examination of suspected GA areas to confirm their presence. A B-scan provides an axial (or lateral) slice view of the area (e.g., a slice through the retinal layers), and the presence of GA can be confirmed by observing verifiable loss of the RPE layer and retinal thinning or collapse.
[0006] The most common automatic GA segmentation methods are based on the analysis of the sub-RPE hyperreflectivity in a single sub-RPE frontal image, and are subject to the aforementioned difficulties and potential errors. To date, checking and correcting potential errors requires careful B-scan inspection, as a single sub-RPE frontal image cannot provide all the necessary information.
[0007] A method is needed to identify the presence of GA that takes into account information from the entire OCT volume, but is as simple as traditional methods that use sub-RPE frontal images.
[0008] One object of the present invention is to provide a more accurate GA detection system for use with OCT.
[0009] Another object of the present invention is to provide an OCT-based GA detection method that takes into account multiple types of OCT information, such as information that can be obtained from a combination of frontal images and B-scan images.
[0010] A further object of the present invention is to provide an OCT-based system that provides a frontal image representation of a combination of OCT-based data, non-OCT image data, and / or non-image data.
[0011] Another object of the present invention is to provide a system / method that identifies / segments GA regions in a frontal OCT image while taking into account image information and non-image information from different types of imaging modes.
Summary of the Invention
[0012] The aforementioned objectives are achieved in methods / systems for analyzing optical coherence tomography (OCT) data to identify / segment target pathologies (e.g., geographic atrophy) in frontal images. This invention integrates different aspects of OCT volume information into different image channels (e.g., color channels) of a single image to produce improved tools for GA detection, analysis, and segmentation. This method provides better visualization of GA in a single image, which can be used in segmentation algorithms to provide more accurate segmentation (e.g., GA segmentation), or for viewing and editing segmentation results. Unlike prior art that focuses on displaying and characterizing a single factor (e.g., sub-RPE reflectivity) associated with possible GA, this method integrates multiple different aspects of the OCT volume into additional channels (e.g., pixel / voxel channels), which can be combined to more accurately analyze and segment GA. For example, additional information may include analysis or related data on RPE integrity or retinal thinning (e.g., thinning of a specific retinal layer), which previously required analysis of individual B-scans.
[0013] Other objects and achievements, as well as a fuller understanding of the invention, will become apparent and understood by taking into account the accompanying drawings and the following description and claims.
[0014] To facilitate understanding of this invention, several publications may be cited or referenced herein. All publications cited or referenced herein are incorporated herein in their entirety by reference.
[0015] The embodiments disclosed herein are merely examples, and the scope of this disclosure is not limited thereto. Any feature of an embodiment referenced in one claim class (e.g., a system) may also be claimed in another claim class (e.g., a method). Dependencies or references in the appended claims are chosen solely for formal reasons. However, any subject matter arising from an intentional reference back to any prior claim, thereby disclosing any combination of claims and their features, may also be claimed, regardless of the dependency chosen in the appended claims.
Attached Figure Description
[0016] In the accompanying drawings, the same reference symbols / characters refer to the same parts:
[0017] Figure 1 shows three examples of GA in images from three different imaging modes.
[0018] Figure 2 provides an example of a B-scan through a GA region.
[0019] Figure 3 shows an embodiment of the present invention.
[0020] Figure 4A shows a sub-RPE slab projection, which can traditionally be used to identify candidate GA regions.
[0021] Figure 4B shows a channel-encoded image (multi-channel composite image) according to the present invention, in which each image channel (e.g., a color channel) embodies a different pathology-specific feature.
[0022] Figure 5 shows an alternative implementation, in which the multichannel composite image consists of images from different imaging modes and optional non-image data.
[0023] Figure 6 illustrates the general workflow of the present invention, including the U-Net architecture implemented as a proof of concept.
[0024] Figure 7 shows qualitative results of this proof-of-concept implementation.
[0025] Figure 8 shows quantitative measurements of this proof-of-concept implementation.
[0026] Figure 9 provides a brief overview of the present invention.
[0027] Figure 10 shows an example of a visual field testing instrument (perimeter) used to test a patient's visual field.
[0028] Figure 11 shows an example of a slit-scan ophthalmic system for imaging the fundus.
[0029] Figure 12 shows a generalized frequency-domain optical coherence tomography system, suitable for use with the present invention, for collecting 3D image data of the eye.
[0030] Figure 13 shows an exemplary OCT B-scan image of a normal retina of the human eye, with various typical retinal layers and boundaries illustratively identified.
[0031] Figure 14 shows an example of a frontal vascular system image.
[0032] Figure 15 shows an exemplary B-scan image of the vascular system (OCTA).
[0033] Figure 16 shows an example of a multilayer perceptron (MLP) neural network.
[0034] Figure 17 shows a simplified neural network consisting of an input layer, hidden layers, and an output layer.
[0035] Figure 18 shows an example convolutional neural network architecture.
[0036] Figure 19 shows an example U-Net architecture.
[0037] Figure 20 shows an example computer system (or computing device or computer).
Detailed Implementation
[0038] Geographic atrophy (GA) refers to areas of retinal cell depletion and death (atrophy). These atrophied areas often result in blind spots in a person's visual field. Therefore, monitoring and characterizing retinal areas affected by GA is fundamental to patient diagnosis and management. Various imaging modalities have been proven useful for detecting and characterizing GA, such as fundus imaging (including autofluorescence and fluorescein angiography), optical coherence tomography (OCT), and OCT angiography (OCTA).
[0039] Fundus imaging (e.g., obtained using a fundus camera) typically provides a frontal planar view of the fundus seen through the pupil. Fundus imaging can use light of different frequencies, such as white, red, blue, green, infrared, etc., to image tissues, or can use selected frequencies to excite fluorescent molecules in certain tissues (e.g., autofluorescence) or to excite fluorescent dyes injected into the patient (e.g., fluorescein angiography). A more detailed discussion of different fundus imaging techniques is provided below.
[0040] OCT is a non-invasive imaging technique that uses light waves to produce cross-sectional images of retinal tissue. For example, OCT allows the distinct tissue layers of the retina to be viewed. Typically, an OCT system is an interferometric imaging system that determines the scattering distribution of the sample along the OCT beam by detecting the interference between light reflected from the sample and a reference beam, thus producing a three-dimensional (3D) representation of the sample. Each scattering distribution in the depth direction (e.g., z-axis or axial direction) can be reconstructed individually as an axial scan or A-scan. As the OCT beam scans / moves through a set of lateral (e.g., x-axis and y-axis) positions on the sample, cross-sectional two-dimensional (2D) images (B-scans) and extended 3D volumes (C-scans or cube scans) can be constructed from the obtained A-scans. OCT also allows the construction of planar, frontal-view (e.g., en face) 2D images of selected portions of the tissue volume (e.g., target tissue slabs (sub-volumes) or target tissue layers of the retina). OCTA is an extension of OCT that can identify (e.g., render in image format) the presence or absence of blood flow in tissue layers. OCTA can identify blood flow by recognizing differences (e.g., contrast differences) over time in multiple OCT images of the same retinal region, and designating differences that meet predefined criteria as blood flow. A more in-depth discussion of OCT and OCTA is provided below.
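The relationship between A-scans, B-scans, the cube scan, and a frontal slab projection can be sketched with array indexing. This is only an illustration on a synthetic NumPy array: the axis order (y, x, z), the array sizes, and the slab limits are assumptions made for the example and are not taken from any particular OCT system.

```python
import numpy as np

# Synthetic cube scan. The axis order (y, x, z) is an assumption for this
# sketch, with z as the axial (depth) direction.
rng = np.random.default_rng(0)
volume = rng.random((4, 6, 32))   # 4 B-scans, 6 A-scans each, 32 depth samples

a_scan = volume[2, 3, :]          # one axial scattering profile
b_scan = volume[2, :, :]          # one cross-sectional slice (x by z)

# Frontal (en face) image of a slab: project a depth range onto the x-y plane.
z_top, z_bottom = 10, 20          # hypothetical slab limits, in depth samples
en_face = volume[:, :, z_top:z_bottom].mean(axis=-1)
```

Each pixel of `en_face` corresponds to one A-scan location, which is the property the channel-coded images described later rely on.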
[0041] Each imaging modality may characterize GA differently. Figure 1 shows three examples of GA in images from three different imaging modalities. Image 11 is a reflectance (color) fundus image and image 13 is an autofluorescence image (obtained using 2D topographic imaging techniques). Image 15 is a sub-RPE OCT frontal image. In images 11, 13, and 15, GA is identified as region 17. GA typically results in the loss or thinning of several retinal layers, such as the photoreceptor layer, retinal pigment epithelium (RPE), and choriocapillaris. Suspicious regions in the frontal view can be identified, at least in part, by the effects of the loss / reduction of these retinal layers caused by GA. As seen in fundus, OCT, and / or OCTA images, the reduction of these layers can lead to characteristic changes in color, intensity, or texture in the frontal planar view of the retina. For example, in a healthy retina, these layers, especially the RPE, tend to reflect OCT signals and limit their penetration into the retina. However, in GA regions, due to the thinning or loss of these layers, OCT signals tend to penetrate the retina more deeply, resulting in characteristic sub-RPE hyperreflectivity, as shown in image 15. While this hyperreflectivity may help identify potential candidate / suspicious regions for GA, it is not a direct detection of GA, as other factors can also affect the reflectivity of OCT signals.
[0042] A more direct way to examine GA is to use a B-scan, which provides a slice view of the GA region and displays the individual retinal layers. Figure 2 provides an example of a B-scan through a GA region. As shown, GA region 19 can be identified by thinning of the RPE and other layers, increased reflectivity / brightness beneath the RPE layer, and / or overall thinning or collapse of the retina. A more direct method of searching for GA is therefore to examine individual B-scans, but checking every B-scan in a volume for the presence of GA is typically time-consuming. A faster method is to examine a single frontal planar view of the fundus / retina, identify suspicious areas such as those shown in images 11, 13, or 15 of Figure 1, and then examine the individual B-scans (one after another) that traverse the suspicious areas.
[0043] The detection, characterization, and segmentation of GA in OCT data has traditionally been accomplished by analyzing, in sub-RPE frontal slabs / images, the increased signal from the choroid that results from RPE interruptions. Examples of this approach can be found in Qiang Chen et al., "Semi-automatic geographic atrophy segmentation for SD-OCT images," Biomed. Opt. Express 4, 2729-2750 (2013), and Sijie Niu et al., "Automated geographic atrophy segmentation for SD-OCT images using region-based C-V model via local similarity factor," Biomed. Opt. Express 7, 581-600 (2016). However, this approach can encounter difficulties and is limited by the many factors that can affect the signal observed in sub-RPE frontal images (e.g., reflectivity). Consequently, commercially available automated GA segmentation tools based on this approach are known not to provide optimal results, requiring subsequent B-scan checks to confirm the presence of potential GA.
[0044] Ji Z. et al., “Retinal Layers: A Deep Voting Model for Automated Geographic Atrophy Segmentation in SD-OCT Images”, Transl Vis Sci Technol, 2018; 7(1), describes another approach. This approach is based on a neural network and directly uses a single B-scan for GA segmentation. However, this process can be complex due to the need to annotate each B-scan, and may produce discontinuous results in adjacent B-scans. Furthermore, this approach does not address the issue of needing subsequent B-scans to confirm the GA segmentation results.
[0045] Therefore, OCT-based GA characterization methods have traditionally focused on analyzing sub-RPE reflectance in frontal images, neglecting important information such as direct analysis of RPE integrity or retinal thinning. Previously, additional examination of multiple images (e.g., from different imaging modalities) or multiple OCT B scans was required to account for this information.
[0046] This invention collects GA feature information from multiple different image views (and / or imaging modes) and combines / merges this information into a single, customized frontal retinal image. This image, by presenting multiple information sources, allows for more accurate identification of GA regions. In other words, this invention presents these information sources in a single image, which can be used for manual determination of the presence of GA or for automatic or semi-automatic GA segmentation tools. Different feature information can be stored (or encoded) in additional channels of the image (e.g., color channels), for example, based on pixels (or voxels). Combining different features (e.g., GA-related information) into different channels of a single image allows for more accurate and efficient characterization and segmentation of GA regions.
[0047] In summary, a first exemplary implementation of the present invention uses a set of differently defined frontal images (e.g., 2D frontal images formed from the OCT volume) to project volumetric information onto different channels (e.g., color channels) of a single image. These are then stacked to generate a channel-coded image, in which different attributes characterizing the presence of GA (or other target pathology) are encoded in different channels. A preliminary step may be to segment a set of retinal layers within the OCT volume to aid in frontal image creation (e.g., to help select layers that are likely to be characteristic of the target pathology, and to define the slabs from which to create the frontal images). A set of distinct frontal images is then created using the segmented layers and / or the defined set of slabs. These frontal images can then be stacked (or combined) in different channels (e.g., pixel / voxel color channels). The resulting stacked slab set can be used to visualize GA (or other specific pathology) and other retinal landmarks (e.g., if three different slabs are stacked in place of the individual red, green, and blue (RGB) color channels of a typical color image), and / or to automatically segment GA using the information encoded in the different channels.
[0048] Figure 3 illustrates an embodiment of the invention. The first step is to acquire OCT data 21 (e.g., one or more data volumes of the same region of the retina). Optionally, the OCT data 21 may be submitted to a retinal layer segmentation process. As part of or in addition to this process, each A-scan in the OCT data 21 is processed to extract a series of measures 23, where each measure is selected to measure (or emphasize, for example, by weight) a feature of a specific target pathology (e.g., GA). That is, the extracted measures may be associated with the same pathology type, such that each measure provides a different indicator of the same pathology type. The extracted measures are categorized or otherwise collected into corresponding measure groups (measure-1 group to measure-n group), optionally having a one-to-one correspondence. The OCT data 21 may include OCT structural data and / or OCTA flow data, and the extracted measures may include OCT-based measures extracted from the OCT structural data, such as retinal layer thickness, distance from a specific A-scan to a specific retinal structure (e.g., distance to the foveal center), layer integrity (e.g., loss of a specific layer), sub-RPE reflectivity, internal RPE reflectivity, total retinal thickness, and / or optical attenuation coefficient (OAC).
[0049] The optical attenuation coefficient (OAC) is an optical property of a medium that determines how the power of a coherent beam propagating through the medium (e.g., a turbid medium, such as tissue) is attenuated along its path due to scattering and absorption. The Lambert-Beer law gives the irradiance (power per unit area) of a coherent beam propagating through a medium (e.g., a homogeneous medium): $L(z) = L_0 e^{-\mu z}$, where $L(z)$ is the irradiance of the beam after passing through a distance $z$ of the medium, $L_0$ is the irradiance of the incident beam, and $\mu$ is the optical attenuation coefficient. A large attenuation coefficient causes the irradiance of the coherent beam to decrease rapidly (exponentially) with depth. Since OAC is an optical property of the medium, determining OAC provides information about the composition of that medium. The applicant proposes that providing OAC (per A-scan) as one of the extracted measures may be helpful in identifying specific pathologies (e.g., GA), especially because it can indicate the current state of the tissue at a particular A-scan location (e.g., its light attenuation state). An example of how OAC can be determined / calculated is provided in K. A. Vermeer et al., “Depth-Resolved Model-Based Reconstruction of Attenuation Coefficients in Optical Coherence Tomography”, Biomedical Optics Express, Vol. 5, Issue 1, pp. 322-337 (2014). Discussions of previous applications of OAC can be found in Utku Baran et al., “In Vivo Tissue Injury Mapping Using Optical Coherence Tomography Based Methods”, Applied Optics, Vol. 54, No. 21, July 20, 2015.
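As a rough illustration of the Lambert-Beer relationship, the sketch below recovers the attenuation coefficient from a simulated, noise-free irradiance profile by fitting a straight line to the logarithm of the irradiance. It assumes a homogeneous medium and is not the depth-resolved method of the cited reference; the function name `fit_attenuation`, the depth units, and the simulated values are all hypothetical.

```python
import numpy as np

def fit_attenuation(z_mm, irradiance):
    """Estimate mu (per mm) from L(z) = L0 * exp(-mu * z).

    Taking logs gives log L = log L0 - mu * z, so mu is minus the slope of a
    least-squares line through (z, log L). Assumes a homogeneous medium.
    """
    slope, _intercept = np.polyfit(z_mm, np.log(irradiance), 1)
    return -slope

# Simulated profile with hypothetical values: L0 = 2.0, mu = 3.0 per mm.
z_mm = np.linspace(0.0, 1.0, 50)
true_mu = 3.0
irradiance = 2.0 * np.exp(-true_mu * z_mm)
estimated_mu = fit_attenuation(z_mm, irradiance)
```

Applying such an estimate independently at every A-scan location would yield one value per frontal pixel, which is the form needed for an OAC channel.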
[0050] The extracted metrics may also include OCTA-based metrics extracted from OCTA flow data, such as flow measurements (e.g., blood flow) at locations within one or more layers (e.g., choriocapillaris, Sattler's layer, Haller's layer, etc.) and distances from the flow data to the foveal center. In this way, each set of metrics can describe a different pathological feature, and multiple sets of metrics (metric-1 to metric-n) can be used to define multiple corresponding pathological feature images (PCI-1 to PCI-n), each highlighting a different pathological feature. Each pathological feature image PCI-1 to PCI-n can define a frontal image. Different pathological feature images can then be used to define different pixel channels (Ch1 to Chn) and combined (as shown in box 25) to define a multi-channel composite image (e.g., a channel-encoded image) 27. Optionally, the dimension of the multi-channel composite image 27 may be lower than that of the OCT data 21, such as a frontal image (and / or a B-scan image), where each pixel location of the composite image 27 is based on the corresponding A-scan location of the OCT data 21. In this way, each metric can be used as the basis for a different corresponding channel in the composite image 27. The resulting multi-channel composite image 27 can then be submitted to a machine learning model 29, which is trained to identify target pathologies (e.g., GA) based on pathological feature data (e.g., metric sets) embodied in individual image channels. The identified pathologies can then be displayed or stored for future processing in computing device 31.
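The combining step of box 25 can be sketched as stacking same-sized frontal images along a channel axis. This is a minimal illustration, assuming each pathological feature image has already been computed on the same A-scan grid; the helper name `stack_channels` and the placeholder values are hypothetical.

```python
import numpy as np

def stack_channels(feature_images):
    """Stack n frontal feature images (each H x W) into one H x W x n image."""
    shapes = {img.shape for img in feature_images}
    if len(shapes) != 1:
        raise ValueError("all feature images must share the same frontal grid")
    return np.stack(feature_images, axis=-1)

# Placeholder feature images standing in for PCI-1 to PCI-3.
h, w = 8, 8
pci_1 = np.full((h, w), 0.2)
pci_2 = np.full((h, w), 0.5)
pci_3 = np.full((h, w), 0.9)
composite = stack_channels([pci_1, pci_2, pci_3])
```

With three channels the result can be displayed directly as an RGB image; with more channels it remains usable as model input even though it no longer maps onto a color display.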
[0051] For example, the proof-of-concept implementation uses three sets of metrics to define three distinct slabs (pathological feature images) assigned to the three typical red, green, and blue (RGB) color channels of an image. It should be understood that channel-encoded images may optionally have more (or fewer) channels. Figure 4A shows a sub-RPE slab projection, which can traditionally be used to identify candidate GA regions, and Figure 4B shows a channel-encoded image (multichannel composite image) according to the present invention, wherein each image channel (e.g., a color channel) embodies a different pathology-specific feature (i.e., a different set of metrics). In this example, the red channel (or mid-gray in a monochrome image) contains sub-RPE reflectance data. To collect metrics for the red channel, a 300 μm slab is defined below the RPE layer, toward the choroid, with its surface limits specified as the RPE layer plus offsets of 50 μm and 350 μm, respectively. The OCT signal in this slab is filtered to remove noise and then processed so that the signal at each A-scan location is a decreasing function of depth, filling the "valleys" in the signal. That is, each pixel in an A-scan is set to the highest value recorded in that A-scan between the pixel under consideration and the deeper limit of the slab. This operation is intended to eliminate the lower signal values caused by the presence of choroidal vessels. The resulting data is projected onto a frontal image by averaging, for each A-scan, the pixel values within the slab limits. The resulting values in the frontal image are then normalized to a range between 0 and 1. The objective of this slab is to characterize the increased reflectivity present in the choroid of the GA region.
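The valley-filling and projection steps for the sub-RPE slab can be sketched as a reverse running maximum followed by a mean. The sketch assumes the slab has already been cropped and denoised, that the last axis is depth with index increasing downward, and that "normalized" means min-max scaling over the image; the function names are hypothetical, and the normalization choice is an assumption because the text does not specify it.

```python
import numpy as np

def fill_valleys_decreasing(slab):
    """Set each pixel to the max from that pixel down to the slab bottom.

    slab: array (..., n_z), depth index increasing with depth.
    The result is non-increasing with depth along each A-scan.
    """
    return np.maximum.accumulate(slab[..., ::-1], axis=-1)[..., ::-1]

def sub_rpe_channel(slab):
    """Project the valley-filled slab to a frontal image scaled to [0, 1]."""
    en_face = fill_valleys_decreasing(slab).mean(axis=-1)
    lo, hi = en_face.min(), en_face.max()
    if hi == lo:                      # flat image: avoid division by zero
        return np.zeros_like(en_face)
    return (en_face - lo) / (hi - lo)  # min-max scaling (assumed)

# Two toy A-scans; the dip to 2 in the first mimics a vessel shadow.
slab = np.array([[[1.0, 5.0, 2.0, 4.0, 3.0],
                  [2.0, 2.0, 2.0, 2.0, 2.0]]])
filled = fill_valleys_decreasing(slab)
red = sub_rpe_channel(slab)
```

In the first A-scan the dip to 2 is lifted to 4, the highest value found at or below it, so a shadow from a choroidal vessel no longer pulls down the averaged projection.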
[0052] In this example, the green channel (e.g., light gray in a monochrome image) contains the internal RPE reflectivity. To collect measurements for the green channel, a 20 μm slab is defined relative to the RPE-Fit layer (an estimate of the Bruch's membrane curvature set at the RPE centerline level), with its surface limits specified as the RPE-Fit layer plus offsets of -50 μm and -30 μm, respectively. The OCT signal within this slab is filtered to remove noise and then processed so that the signal at each A-scan location is an increasing function of depth, filling the "valleys" in the signal. That is, each pixel in an A-scan is set to the highest value recorded in that A-scan between the inner slab limit and the pixel under consideration. This operation is intended to eliminate lower signal values caused by shadows from other high-opacity structures in the intact RPE (e.g., blood vessels, drusen, or hyperreflective lesions). The resulting data is projected onto the frontal image by averaging, for each A-scan, the pixel values within the slab limits. The resulting values in the frontal image are normalized to a range between 0 and 1. The objective of this slab is to characterize the lower reflectivity in locations with photoreceptor and RPE loss.
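Because this slab follows a surface rather than a fixed depth range, a sketch needs a per-A-scan mask before the running maximum. The example below assumes the RPE-Fit depth is given in pixels for each A-scan, that negative offsets point toward the inner retina (smaller depth index), that an axial pixel size `um_per_px` is known, and that normalization is min-max scaling. All of these, along with the function name, are assumptions for illustration.

```python
import numpy as np

def intra_rpe_channel(volume, rpe_fit_px, um_per_px,
                      top_offset_um=-50.0, bottom_offset_um=-30.0):
    """Frontal image of a surface-following slab with forward valley filling.

    volume: (n_y, n_x, n_z) denoised intensities, depth index increasing.
    rpe_fit_px: (n_y, n_x) depth of the RPE-Fit surface, in pixels.
    """
    z = np.arange(volume.shape[-1])
    top = rpe_fit_px + top_offset_um / um_per_px
    bottom = rpe_fit_px + bottom_offset_um / um_per_px
    in_slab = (z >= top[..., None]) & (z < bottom[..., None])

    # Running max from the inner slab limit down to each pixel; samples
    # outside the slab are masked so they cannot contribute.
    masked = np.where(in_slab, volume, -np.inf)
    filled = np.maximum.accumulate(masked, axis=-1)
    filled = np.where(in_slab, filled, 0.0)

    counts = np.maximum(in_slab.sum(axis=-1), 1)
    en_face = filled.sum(axis=-1) / counts
    lo, hi = en_face.min(), en_face.max()
    if hi == lo:
        return np.zeros_like(en_face)
    return (en_face - lo) / (hi - lo)  # min-max scaling (assumed)

# Toy data: 10 um per pixel, RPE-Fit at depth index 8, so the slab covers
# depth indices 3 and 4 in both A-scans.
volume = np.array([[[0, 9, 1, 3, 2, 5, 0, 0, 0, 0],
                    [0, 9, 1, 1, 4, 5, 0, 0, 0, 0]]], dtype=float)
rpe_fit_px = np.array([[8.0, 8.0]])
green = intra_rpe_channel(volume, rpe_fit_px, um_per_px=10.0)
```

The bright sample at depth index 1 lies outside the slab and is ignored, which is why the mask is applied before the running maximum rather than after it.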
[0053] In this example, the blue channel (or dark gray in a monochrome image) contains retinal thickness. To collect measurements for the blue channel, the distance (retinal thickness) between the ILM layer and the RPE-Fit layer was measured for each A-scan location and projected onto the frontal image. The recorded values were then scaled with an inverse linear operation to take values from 0 to 1, such that a retinal thickness of 100 μm was assigned a value of 1, and a thickness of 350 μm was assigned a value of 0. The goal of this slab is to characterize localized regions of retinal thinning and collapse features present in GA.
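The inverse linear scaling maps 100 μm to 1 and 350 μm to 0, which can be written as (350 - t) / (350 - 100) for a thickness t in micrometers. The sketch below applies that mapping; clipping values outside the 100 to 350 μm range to [0, 1] is an assumption, as are the function and argument names.

```python
import numpy as np

def thickness_channel(ilm_um, rpe_fit_um, thin_um=100.0, thick_um=350.0):
    """Map ILM-to-RPE-Fit thickness to [0, 1], with thinner retina brighter."""
    thickness = rpe_fit_um - ilm_um
    scaled = (thick_um - thickness) / (thick_um - thin_um)
    return np.clip(scaled, 0.0, 1.0)   # clipping is an assumption

# Surface depths in micrometers for four A-scan locations (toy values).
ilm = np.array([[0.0, 0.0, 0.0, 0.0]])
rpe_fit = np.array([[100.0, 225.0, 350.0, 400.0]])
blue = thickness_channel(ilm, rpe_fit)
```

The inversion means thinned or collapsed retina appears bright in this channel, matching the bright appearance of GA in the sub-RPE reflectance channel.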
[0054] Figure 5 shows an alternative implementation, wherein the multichannel composite image (i.e., color-coded or monochrome-coded image) 27 consists of images of different imaging modes and optional non-image data. In this embodiment, OCT data 21 is acquired or otherwise accessed. OCT data 21 may be a cube scan consisting of multiple A-scans and / or B-scans, and may optionally include multiple scans of the same region separated in time. In this case, OCTA data 22 can be determined from OCT data 21, as indicated by dashed arrow 24. Alternatively, OCTA data 22 may be acquired / accessed separately. A series of measures can then be extracted from each of OCT data 21 and OCTA data 22, and a set of pathological feature images (or maps) can be defined from the extracted measures (e.g., measure groups). In this illustration, images OCT1 to OCT4 are defined based on measures extracted from OCT data 21, and images OCTA1 to OCTA3 are defined based on measures extracted from OCTA data 22. For example, images OCT1 through OCT4 could represent, respectively, a frontal sub-RPE image; a thickness map of selected layers (e.g., photoreceptor layers, retinal pigment epithelium (RPE), and / or choriocapillaris), which could further locate the fovea, for example, by using an automatic fovea localization algorithm; a layer integrity map (e.g., typically determined from multiple B-scans); and an OAC map. In this example, images OCTA1 through OCTA3 could represent, respectively, a frontal OCTA image of choriocapillaris flow, an image of Sattler's layer (e.g., the layer lying between Bruch's membrane and the choriocapillaris on one side and Haller's layer and the suprachoroid on the other), and / or flow maps of other selected layers related to the location of the fovea.
[0055] As explained above, GA can lead to progressive vision loss, especially of central vision. However, GA may begin with vision loss outside the central area and progress towards the center over time. Therefore, it is advantageous to incorporate information from visual field (VF) test results. A visual field test is a method of measuring an individual's entire visual field, such as their central and peripheral (lateral) vision. It maps the visual field of each eye individually and can detect blind spots (scotomas) as well as more subtle areas of visual blur. A visual field tester, or "perimeter," is a specialized machine / device / system used to perform visual field tests on a patient. A more in-depth discussion of perimeters and visual field tests is provided below. All or selected portions of the visual field test (e.g., VF grayscale or numerical grayscale mapped to the corresponding retinal location) can be incorporated into the multichannel composite image 27.
[0056] Additional imaging modalities may include one or more fundus images (FI, such as white, red, blue, green, infrared, autofluorescence, etc.) and fluorescein angiography images (FL).
[0057] Each of the different data types described above can represent a different pathological feature image, and these are combined as shown in box 25 to define a multichannel composite image 27. As shown, each pixel (shown as circle Px1) can include data (e.g., measurements) from each of the aforementioned sources. For example, each pixel can define (for the corresponding retinal location) a data record consisting of multiple data fields, one data field for each merged pathological feature image. Each pixel can include a visual field test data field (VF-1), a fundus image data field (FI-1), a fluorescein angiography image data field (FL-1), OCT structure data fields (OCT1-1, OCT2-1, OCT3-1, and OCT4-1) from each corresponding OCT structure image, and OCTA flow data fields (OCTA1-1, OCTA2-1, and OCTA3-1) from each corresponding OCTA flow image.
[0058] The composite image 27 can then be submitted to a machine learning model 29 for processing or training, as described below. As shown in Figure 5, in this implementation the output from machine learning model 29 can be submitted to a computing device (not shown) for display or storage. Optionally, non-image data 28 can also be submitted to machine learning model 29 for processing, indirectly via dashed arrow 26A or directly via dashed arrow 26B. That is, non-image data 28 can optionally be incorporated into the composite image 27 via box 25. Non-image data 28 may include patient demographic data (e.g., age, ethnicity, etc.) and / or medical history data (e.g., previously prescribed medications and diagnosed diseases related to the sought pathology), which may be obtained, for example, from an electronic medical record (EMR) system.
[0059] A proof-of-concept implementation of this invention realizes machine learning model 29 as a neural network trained for automatic segmentation of GA regions in the composite image 27 (e.g., in the generated channel-encoded image). A general discussion of neural networks is provided below. All accessed images (and / or maps) used to define the composite image 27 can be normalized and resized to 256 × 256 × 3 pixels. Each image is then divided into nine overlapping patches of 128 × 128 × 3 pixels, with a 64-pixel (50%) overlap in both directions.
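The tiling arithmetic above (a 256 × 256 image, 128 × 128 tiles, 64-pixel overlap, nine tiles) can be checked with a short Python sketch. The helper names `patch_origins` and `extract_patches` are hypothetical, and the nested-list image representation is an assumption chosen to keep the example dependency-free.

```python
def patch_origins(size=256, patch=128, overlap=64):
    """Top-left (row, col) corners of overlapping square tiles."""
    stride = patch - overlap  # 64-pixel step gives a 50% overlap
    starts = list(range(0, size - patch + 1, stride))  # 0, 64, 128
    return [(r, c) for r in starts for c in starts]


def extract_patches(image, patch=128, overlap=64):
    """Cut a square H x W x C image (nested lists) into overlapping tiles."""
    size = len(image)
    return [
        [row[c:c + patch] for row in image[r:r + patch]]
        for r, c in patch_origins(size, patch, overlap)
    ]


# A blank 256 x 256 x 3 image stands in for a normalized channel-encoded image.
image = [[[0.0, 0.0, 0.0] for _ in range(256)] for _ in range(256)]
origins = patch_origins()
patches = extract_patches(image)
```

With three start positions per axis (0, 64, and 128), the sketch yields 3 × 3 = 9 tiles, matching the count stated in the text.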
[0060] Figure 6 illustrates the general workflow of the present invention, including a proof-of-concept implementation of the U-Net architecture. As shown, OCT data 21 is accessed / acquired, and multiple pathological feature images (PCIs) are defined from the OCT data 21, as described above. For example, the pathological feature images may include the images of sub-RPE reflectance, internal RPE reflectance, retinal thickness, and / or optical attenuation coefficient (OAC) described above with reference to Figures 3 and 4. Alternatively, other pathological feature images may be used, such as those described above with reference to Figure 5, for example en face images / maps of specific retinal layer thickness or layer integrity, OCTA flow images (e.g., flow at or near the choroidal capillaries), and / or other pathological feature data. The pathological feature images are then combined into different (e.g., color or monochrome) channels of a channel-encoded image 27, which is fed to a machine learning model 29, implemented here using the U-Net architecture. A discussion of the U-Net architecture is provided below.
[0061] In this exemplary U-Net architecture, the contracting path consists of four convolutional neural network (CNN) blocks. Each CNN block in the contracting path may include two (e.g., 3×3) convolutions, as indicated by the asterisk symbol "*", and an activation function (e.g., a rectified linear unit (ReLU)), optionally with batch normalization. The output of each CNN block in the contracting path is downsampled, for example by 2×2 max pooling, as indicated by the down arrow. The output of the contracting path is fed into a bottleneck BN, shown here as consisting of two convolutional layers (e.g., with batch normalization and an optional dropout of 0.5). The expanding path follows the bottleneck BN and here consists of five CNN blocks. In the expanding path, the output of each block is provided to a transposed convolution (or deconvolution) that upsamples the image / information / data (e.g., an up-convolution). In this example, the up-convolution is characterized by a 2×2 kernel (or convolution matrix), as indicated by the up arrow. Copy-and-crop links CC1 through CC4 between corresponding downsampling and upsampling blocks copy the output of a downsampling block and concatenate it to the input of its corresponding upsampling block. At the end of the expanding path, the output of the last upsampling block is submitted to another convolution operation (e.g., a 1×1 output convolution), as indicated by the dashed arrow, which produces the output U-out. For example, the neural network may have multiple features per pixel just before the 1×1 output convolution, and the 1×1 convolution combines these features into a single output value per pixel, on a pixel-by-pixel basis.
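Two of the operations named above, 2×2 max pooling and the final 1×1 output convolution, are simple enough to sketch in plain Python. This is not the described network: the function names are hypothetical, the size trace assumes padded convolutions that preserve spatial size, and the sigmoid applied after the 1×1 convolution is an assumption (the text does not name the output activation).

```python
import math


def downsampled_sizes(input_size, num_blocks=4):
    """Spatial size after each 2x2 max pooling in the downsampling path.

    Assumes the convolutions themselves preserve spatial size (padding).
    """
    sizes = [input_size]
    for _ in range(num_blocks):
        sizes.append(sizes[-1] // 2)
    return sizes


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on an H x W map (H and W even)."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, w - 1, 2)]
        for r in range(0, h - 1, 2)
    ]


def conv_1x1(features, weights, bias=0.0):
    """1x1 convolution: one weighted sum of the features at each pixel.

    features is H x W x F; the sigmoid squashing is an assumed choice.
    """
    return [
        [1.0 / (1.0 + math.exp(-(sum(w * f for w, f in zip(weights, px)) + bias)))
         for px in row]
        for row in features
    ]


sizes = downsampled_sizes(128)
pooled = max_pool_2x2([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
out = conv_1x1([[[1.0, -1.0]]], weights=[2.0, 2.0])
```

For a 128 × 128 input tile, the four pooling steps give spatial sizes 128, 64, 32, 16, and 8, which is why the skip links are needed to restore fine spatial detail during upsampling.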
[0062] A combination of binary cross-entropy and Dice coefficient loss was used for training. The final layer was fine-tuned using "icing on the cake," a method in which only the last layer is (re)trained after normal training. Training used 250 macular cubes (58 cubes of 512×128×1024 pixels and 192 cubes of 200×200×1024 pixels), acquired using CIRRUS™ HD-OCT 4000 and 5000 (ZEISS, Dublin, CA) from 155 patients. Experts manually drew the GA contours on the en face images, while also examining the hyperreflectivity beneath the RPE and any potential RPE disruptions in the available en face images and B-scans. For each macular cube, a 3-channel en face image was generated as described above (e.g., see Figures 3 and 4A). The training and test sets of the generated en face images consisted of 225 eyes (187 with GA; 19 with drusen without GA; and 19 from healthy subjects) and 25 eyes (11 with GA; 5 with drusen without GA; and 9 from healthy subjects), respectively.
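A combined binary cross-entropy and Dice loss of the kind mentioned above might be sketched as follows. The equal weighting (`bce_weight=0.5`), the smoothing constant `eps`, and the function name are assumptions; the text does not state how the two terms were combined.

```python
import math


def bce_dice_loss(pred, target, bce_weight=0.5, eps=1e-7):
    """Weighted sum of binary cross-entropy and (1 - Dice coefficient).

    pred holds per-pixel probabilities in [0, 1]; target holds 0/1 labels.
    The 50/50 weighting is an illustrative assumption.
    """
    # Flatten and clip probabilities so the logarithms stay finite.
    p = [min(max(v, eps), 1.0 - eps) for row in pred for v in row]
    t = [float(v) for row in target for v in row]
    bce = -sum(ti * math.log(pi) + (1.0 - ti) * math.log(1.0 - pi)
               for pi, ti in zip(p, t)) / len(p)
    intersection = sum(pi * ti for pi, ti in zip(p, t))
    dice = (2.0 * intersection + eps) / (sum(p) + sum(t) + eps)
    return bce_weight * bce + (1.0 - bce_weight) * (1.0 - dice)


target = [[1, 0], [0, 1]]
loss_perfect = bce_dice_loss([[1.0, 0.0], [0.0, 1.0]], target)
loss_inverted = bce_dice_loss([[0.0, 1.0], [1.0, 0.0]], target)
```

The cross-entropy term penalizes each pixel independently, while the Dice term measures region overlap and is less sensitive to the imbalance between GA and non-GA pixels, which is a common reason for combining the two.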
[0063] In operation, the trained U-Net outputs a GA segmentation 33 based on the channel encoding in image 21, as shown in Figure 6. The output GA segmentation 33 can be submitted to a threshold operation (and other known segmentation cleanup operations) to produce segmentation output 35.
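The thresholding and cleanup step can be illustrated with a small Python sketch. The 0.5 threshold, the choice of small-region removal as the cleanup operation, the 4-connectivity, and the function names are all assumptions; the text only says that a threshold and other known cleanup operations may be applied.

```python
def threshold_mask(prob_map, threshold=0.5):
    """Binarize a per-pixel probability map (threshold value is assumed)."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]


def remove_small_regions(mask, min_pixels=2):
    """Drop 4-connected foreground regions smaller than min_pixels."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if out[r][c] == 1 and not seen[r][c]:
                # Flood fill to collect one connected region.
                stack, region = [(r, c)], []
                seen[r][c] = True
                while stack:
                    y, x = stack.pop()
                    region.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and out[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                if len(region) < min_pixels:
                    for y, x in region:
                        out[y][x] = 0
    return out


prob = [[0.9, 0.8, 0.1], [0.2, 0.1, 0.1], [0.1, 0.1, 0.7]]
mask = threshold_mask(prob)
cleaned = remove_small_regions(mask, min_pixels=2)
```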
[0064] The segmentations produced by the algorithm on the test set were compared with the manually labeled segmentations using qualitative and quantitative measures (e.g., area, Bland-Altman analysis, and Pearson correlation). Figure 7 shows qualitative results of this proof-of-concept implementation (e.g., the proposed algorithm), and Figure 8 shows its quantitative measurements. In Figure 7, column a) shows the acquired OCT en face image, column b) shows the generated 3-channel encoded image used as input to the trained U-Net machine model, column c) shows the ground truth (i.e., the GA regions annotated by human experts), column d) shows the output produced by the proposed algorithm, and column e) shows the output of the "Advanced RPE Analysis" tool currently available in the CIRRUS™ HD-OCT review software. As shown in Figure 8, the absolute and fractional area differences between the GA regions generated by the proposed algorithm and those manually labeled by experts are 0.11 ± 0.17 mm² and 5.51 ± 4.7%, respectively, compared with 0.54 ± 0.82 mm² and 25.61 ± 42.3% for the Advanced RPE Analysis tool. Using an i7 @ 2.90 GHz CPU, the inference time is 1183 ms per en face image. The correlation between the GA areas generated by the proposed algorithm and the expert manual labels, and the correlation between the GA areas generated by the Advanced RPE Analysis tool and the manual labels, are 0.9996 (p-value < 0.001) and 0.9259 (p-value < 0.001), respectively. The Bland-Altman plots show stronger agreement between the manually labeled segmentations and those generated by the proposed algorithm than between the manually labeled segmentations and those generated by the existing Advanced RPE Analysis tool.
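The quantitative measures named above (area difference, Pearson correlation, Bland-Altman agreement) can be computed along the following lines. This is a sketch under stated assumptions: the fractional difference is taken relative to the manual area, the limits of agreement use the conventional mean ± 1.96 standard deviations, and all function names and sample numbers are illustrative rather than taken from the reported study.

```python
import math


def area_mm2(mask, pixel_area_mm2):
    """Area of a binary mask, given the area covered by one pixel."""
    return sum(sum(row) for row in mask) * pixel_area_mm2


def area_differences(auto_area, manual_area):
    """Absolute difference (mm^2) and difference as a percentage of manual area."""
    abs_diff = abs(auto_area - manual_area)
    return abs_diff, 100.0 * abs_diff / manual_area


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def bland_altman(xs, ys):
    """Mean difference (bias) and 95% limits of agreement."""
    diffs = [x - y for x, y in zip(xs, ys)]
    mean = sum(diffs) / len(diffs)
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (len(diffs) - 1))
    return mean, mean - 1.96 * sd, mean + 1.96 * sd


# Hypothetical areas (mm^2) for three eyes: algorithm vs. manual labeling.
auto_areas = [1.0, 2.0, 3.0]
manual_areas = [2.0, 4.0, 6.0]
r = pearson_r(auto_areas, manual_areas)
abs_diff, pct_diff = area_differences(1.1, 1.0)
bias, lower, upper = bland_altman([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

Note that a high Pearson correlation alone does not imply agreement (the toy areas above correlate perfectly while differing by a factor of two), which is why a Bland-Altman analysis is reported alongside it.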
[0065] Figure 9 provides a brief overview of the invention. A method for analyzing optical coherence tomography (OCT) data to identify a specific pathology (e.g., GA) can begin by accessing OCT data (step S1), which may include OCTA data (or the OCTA data may be derived from the accessed OCT data). That is, the accessed data (e.g., data captured using an OCT system, data read from a data store of previously captured / processed OCT data, etc.) may include OCT structural data and OCTA flow data.
[0066] Optionally, this method may also include accessing non-OCT-based data (step S2), including imaging data from imaging modalities different from OCT. For example, the system may access fundus images, fluorescein angiography images, visual field test maps, and / or non-image data (e.g., patient demographics, disease and medication history, etc.).
[0067] In step S3, a series of measures is extracted from the accessed OCT data (and optionally from other data accessed in step S2). The extracted measures may include OCT-based measures extracted from the OCT structural data and / or OCTA-based measures extracted from the OCTA flow data. The measures may be specific to a particular retinal layer and / or may include information related to the distance from the current location to predefined retinal landmarks. For example, measures may be extracted from each individual A-scan, and the measures may include information about the current A-scan location (or axial position within the current A-scan) relative to the fovea, relative to a specific retinal layer region, or relative to other retinal landmarks.
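One of the landmark-relative measures mentioned above, the distance of each A-scan location from the fovea, can be sketched as a simple distance map. The function name, the assumption that the fovea position is already known in pixel coordinates, and the pixel spacing values are all hypothetical.

```python
import math


def fovea_distance_map(height, width, fovea_rc, spacing_mm=(1.0, 1.0)):
    """Distance (mm) from each A-scan location to the fovea.

    fovea_rc is the (row, col) of the fovea on the en face grid;
    spacing_mm is the (row, col) distance between adjacent A-scans.
    """
    fr, fc = fovea_rc
    dy, dx = spacing_mm
    return [
        [math.hypot((r - fr) * dy, (c - fc) * dx) for c in range(width)]
        for r in range(height)
    ]


# Toy 3x3 grid with the fovea at the center and 0.5 mm A-scan spacing.
dist = fovea_distance_map(3, 3, fovea_rc=(1, 1), spacing_mm=(0.5, 0.5))
```

A map like this has the same en face layout as the other feature images, so it could be supplied as an additional channel.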
[0068] In step S4, a set of images is created, where each image defines or highlights pathology-specific (e.g., GA) features. That is, the created images can characterize (e.g., be associated with) the same pathology type. The created images can be based on the extracted measures or on any other data type accessed in step S2. For example, the measures extracted from each A-scan can be sorted into corresponding measure groups (e.g., having a one-to-one correspondence), and a different image can be created from each corresponding measure group. Images created from OCT-based data can be en face images, while images created from non-OCT-based data can be planar, front-view images. For example, the created images can include en face images of sub-RPE reflectance, internal RPE reflectance, retinal thickness, choroidal capillary flow, Sattler's layer, and Haller's layer, as well as 2D maps of fundus images (e.g., white light, red light, blue light, green light, infrared light, autofluorescence, etc.), fluorescein angiography images, visual field test maps, and / or non-image data (e.g., patient demographic data).
[0069] In step S5, a multi-channel image is defined from the set of images. For example, a multi-channel image can define multiple "color" channels for each pixel, where each created image defines a separate color channel. In other words, a multi-channel image can include multiple image channels based on multiple imaging modalities, respectively. Optionally, a combination of created images can define a single color channel.
[0070] In step S6, the defined multi-channel image is submitted to a machine learning model (e.g., a neural network with a U-Net architecture) that is trained to identify one or more pathologies based on pathological feature data from the individual image channels (preferably trained to identify the target pathology). The machine model can identify the target pathology by outlining / segmenting the pathology on an en face OCT image. That is, the locations of the individual image channels can be mapped to a common en face OCT image, and the regions of the multi-channel image identified as containing pathology (based on a combination of the pathological feature data provided by the individual channels of each pixel in the multi-channel image) can be mapped back to the common en face OCT image.
[0071] In step S7, the identified pathology is displayed or stored in a computing device for future reference.
[0072] The following provides a description of various hardware and architectures applicable to this invention.
[0073] Visual field testing system
[0074] The improvements described herein can be used with any type of visual field testing instrument / system (e.g., a perimeter). One such system is the "bowl-shaped" visual field tester VF0 shown in Figure 10. The subject (e.g., a patient) VF1 is shown observing a hemispherical projection screen (or other type of display) VF2, which is typically bowl-shaped and gives the tester VF0 its name. Typically, the subject is instructed to fixate on a point VF3 at the center of the hemispherical screen. The subject rests his / her head on a patient support, which may include a chin rest VF12 and / or a forehead rest VF14. For example, the subject rests his / her chin on the chin rest VF12 and his / her forehead on the forehead rest VF14. Optionally, the chin rest VF12 and the forehead rest VF14 may move together or independently of each other to properly fix / position the patient's eyes, for example, relative to the test lens holder VF9, which can hold a lens through which the subject views the screen VF2. For example, the chin rest and the forehead rest may move independently in the vertical direction to accommodate different patient head sizes, and move together in the horizontal and / or vertical directions to properly position the head. However, this is not limiting, and those skilled in the art can envision other arrangements / movements.
[0075] A projector or other imaging device VF4, controlled by processor VF5, displays a series of test stimuli (e.g., test points of any shape) VF6 on screen VF2. Subject VF1 indicates that he / she has seen a stimulus VF6 by actuating user input VF7 (e.g., pressing an input button). This response can be recorded by processor VF5 and used to assess the visual field of the eye, for example by determining the size, location, and / or intensity of a test stimulus VF6 at which it can no longer be seen by subject VF1, and thus determining the (visibility) threshold of the test stimulus VF6. Camera VF8 can be used to capture the patient's gaze (e.g., gaze direction) throughout the test. Gaze direction can be used to align the patient and / or to determine whether the patient is following the correct test procedure. In this example, camera VF8 is located on the Z-axis relative to the patient's eye (e.g., relative to the test lens holder VF9) and behind the bowl (screen VF2) to capture real-time images or video of the patient's eye. In other embodiments, the camera may be located off this Z-axis. Images from the gaze camera VF8 can optionally be displayed on a second display VF10 to a clinician (or, interchangeably, a technician) to aid patient alignment or test verification. The camera VF8 can record and store one or more images of the eye during each stimulus presentation. This may result in the collection of dozens to hundreds of images per visual field test, depending on the testing conditions. Alternatively, the camera VF8 can record and store a full-length video during the test, with timestamps indicating when each stimulus was presented. Furthermore, images can be collected between stimulus presentations to provide detailed information about the subject's overall attention throughout the VF test.
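The idea of deriving a visibility threshold from recorded subject responses can be sketched as follows. This is a deliberately simplified illustration, not the strategy used by any particular perimeter: it assumes stimulus intensity is given in arbitrary units where larger means brighter, it takes the threshold at a location to be the dimmest stimulus the subject reported seeing, and the type aliases, function names, and sample log are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Location = Tuple[float, float]  # stimulus position on the screen (assumed units: degrees)
Response = Tuple[float, bool]   # (stimulus intensity, whether the subject pressed the button)


def visibility_threshold(responses: List[Response]) -> Optional[float]:
    """Dimmest intensity reported as seen, or None if nothing was seen."""
    seen = [intensity for intensity, was_seen in responses if was_seen]
    return min(seen) if seen else None


def threshold_map(log: Dict[Location, List[Response]]) -> Dict[Location, Optional[float]]:
    """Per-location thresholds from the recorded stimulus/response log."""
    return {loc: visibility_threshold(resp) for loc, resp in log.items()}


# Hypothetical log for two test locations.
log = {
    (3.0, 3.0): [(10.0, True), (5.0, True), (2.0, False)],
    (9.0, -9.0): [(10.0, False)],
}
thresholds = threshold_map(log)
```

A location where no stimulus was seen (returned here as `None`) would correspond to a candidate scotoma at the tested intensities.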
[0076] The test lens holder VF9 can be placed in front of the patient's eye to correct any refractive error in the eye. Optionally, the lens holder VF9 can carry or hold a liquid test lens (see, for example, U.S. Patent No. 8,668,338, the contents of which are incorporated herein by reference in their entirety), which can be used to provide variable refractive correction for the patient VF1. However, it should be noted that the invention is not limited to using a liquid test lens for refractive correction, and other conventional / standard test lenses known in the art can also be used.
[0077] In some implementations, one or more light sources (not shown) may be positioned in front of the eye of subject VF1 to produce reflections from ocular surfaces, such as the cornea. In one variation, the light sources may be light-emitting diodes (LEDs).
[0078] Although Figure 10 shows a projection-type visual field tester VF0, the invention described herein can be used with other types of devices (visual field testers), including those that generate images via liquid crystal displays (LCDs) or other electronic displays (see, for example, U.S. Patent No. 8,132,916, which is incorporated herein by reference). Other types of visual field testers include, for example, flat-screen testers, miniaturized testers, and binocular visual field testers. Examples of these types of testers can be found in U.S. Patent Nos. 8,371,696, 5,912,723, 8,931,905, and U.S. Design Patent D472,637, each of which is incorporated herein by reference in its entirety.
[0079] The visual field testing device VF0 may include an instrument control system (e.g., an algorithm, which may be software, code, and / or routines) using hardware signals and an electric positioning system to automatically position the patient's eyes in the desired location, such as the center of the refractive corrective lens at the lens holder VF9. For example, stepper motors can move the chin rest VF12 and forehead rest VF14 under software control. A rocker switch may be provided, allowing the attending technician to adjust the patient's head position by operating the chin rest and forehead stepper motors. Manually movable refractive lenses may also be placed in front of the patient's eyes on the lens holder VF9, as close as possible to the patient's eyes without adversely affecting patient comfort. Optionally, the instrument control algorithm may pause visual field testing execution while the chin rest and / or forehead motors are in motion, if such movement would interrupt test execution.
[0080] Fundus imaging system
[0081] Two types of imaging systems used for fundus imaging are flood illumination imaging systems (or flood illumination imagers) and scanning illumination imaging systems (or scanning imagers). A flood illumination imager simultaneously illuminates the entire field of view (FOV) of a sample with floodlight, for example, by using a flash and capturing a full-frame image of the sample (e.g., the fundus) using a full-frame camera (e.g., a camera with a sufficiently large two-dimensional (2D) light sensor array to capture the desired FOV as a whole). For example, a flood illumination fundus imager illuminates the fundus of the eye and captures a full-frame image of the fundus in a single image capture sequence from the camera. A scanning imager provides a scanning beam that scans across a subject (e.g., the eye), and as the scanning beam scans across the subject, it images at different scanning locations, producing a series of image segments that can be reconstructed (e.g., montage) to create a synthetic image of the desired FOV. The scanning beam can be a point, a line, or a two-dimensional region, such as a slit or a wide line. Examples of fundus imagers are provided in U.S. Patents 8,967,806 and 8,998,411.
[0082] Figure 11 shows an example of a slit-scanning ophthalmic system SLO-1 for imaging the fundus F. The fundus F is the inner surface of the eye E opposite the eye lens (or crystalline lens) CL and may include the retina, optic disc, macula, fovea, and posterior pole. In this example, the imaging system is in a so-called "scan-descan" configuration, where the scan line beam SB scans the fundus F through the optical components of the eye E (including the cornea Cm, iris Irs, pupil Ppl, and lens CL). In the case of a flood illumination fundus imager, a scanner is not required, and light is applied to the entire desired field of view (FOV) at once. Other scanning configurations are known in the art, and the specific scanning configuration is not critical to the invention. As depicted, the imaging system includes one or more light sources LtSrc, preferably a multicolor LED system or a laser system, in which the étendue has been suitably adjusted. An optional slit Slt (adjustable or static) is located in front of the light source LtSrc and can be used to adjust the width of the scan line beam SB. Furthermore, the slit Slt can remain stationary during imaging or can be adjusted to different widths to allow for different levels of confocality and different applications, either for a particular scan or during a scan for use in suppressing reflections. An optional objective lens ObjL can be placed in front of the slit Slt. The objective lens ObjL can be any lens known in the art, including but not limited to refractive, diffractive, reflective, or hybrid lenses / systems. Light from the slit Slt passes through the pupil splitter SM and is directed to the scanner LnScn. It is desirable to bring the scanning plane and the pupil plane as close together as possible to reduce vignetting in the system. Optional optics DL can be included to manipulate the optical distance between the images of the two components. 
The pupil splitter SM can transmit the illumination beam from the light source LtSrc to the scanner LnScn and reflect the detection beam from the scanner LnScn (e.g., reflected light returning from the eye E) toward the camera Cmr. The task of the pupil splitter SM is to split the illumination and detection beams and help suppress system reflections. The scanner LnScn can be a rotating galvanometer scanner or other types of scanners (e.g., piezoelectric or voice coil, microelectromechanical systems (MEMS) scanners, electro-optic deflectors, and / or rotating polygon scanners). Depending on whether pupil splitting occurs before or after the scanner LnScn, the scan can be divided into two steps, with one scanner in the illumination path and a separate scanner in the detection path. A particular pupil splitting arrangement is described in detail in U.S. Patent No. 9,456,746, which is incorporated herein by reference in its entirety.
[0083] From the scanner LnScn, the illumination beam passes through one or more optics, in this case a scanning lens SL and an ophthalmic or ocular lens OL, which allow the pupil of the eye E to be imaged onto an image pupil of the system. Typically, the scanning lens SL receives the scanning illumination beam from the scanner LnScn at any of a plurality of scanning angles (incident angles) and produces a scanning line beam SB with a substantially planar focal plane (e.g., a collimated optical path). The ophthalmic lens OL can then focus the scanning line beam SB onto the object to be imaged. In this example, the ophthalmic lens OL focuses the scanning line beam SB onto the fundus F (or retina) of the eye E to image the fundus. In this way, the scanning line beam SB produces a transverse scanning line across the fundus F. One possible configuration of these optics is a Keplerian telescope, in which the distance between the two lenses is selected to produce an approximately telecentric intermediate fundus image (4-f configuration). The ophthalmic lens OL can be a single lens, an achromatic lens, or an arrangement of different lenses. As those skilled in the art will know, all lenses can be refractive, diffractive, reflective, or hybrid. The focal lengths of the ophthalmic lens OL and scanning lens SL, and the size and / or form of the pupil splitter SM and scanner LnScn, can vary depending on the desired field of view (FOV). An arrangement can therefore be envisioned in which multiple components can be switched in and out of the beam path, for example by using flip-in optics, motorized wheels, or detachable optics, depending on the FOV. Since changes in FOV result in different beam sizes at the pupil, the pupil splitting can also be altered in conjunction with changes in FOV. For example, a 45° to 60° FOV is typical or standard for fundus cameras. Higher FOVs, such as 60°–120° or even larger, may also be feasible. 
A wide FOV may be desirable for combining a broad-line fundus imager (BLFI) with another imaging modality (e.g., optical coherence tomography (OCT)). The upper limit of the FOV can be determined by the achievable working distance combined with the physiological conditions surrounding the human eye. Since the typical human retina has a FOV of 140° horizontally and 80°–100° vertically, an asymmetrical FOV may be necessary to achieve the highest possible FOV on the system.
[0084] The scanning line beam SB passes through the pupil Ppl of the eye E and is directed toward the retina or fundus surface F. The scanner LnScn adjusts the position of the light on the retina or fundus F so that a range of lateral positions on the eye E is illuminated. Reflected or scattered light (or emitted light in the case of fluorescence imaging) is guided back along a path similar to that of the illumination to define the collection beam CB on the detection path to the camera Cmr.
[0085] In the "scan-descan" configuration of this exemplary slit-scanning ophthalmic system SLO-1, the light returning from eye E is "descanned" by scanner LnScn on its way to the pupil splitter SM. That is, scanner LnScn scans the illumination beam from the pupil splitter SM to define a scanning illumination beam SB across eye E, but since scanner LnScn also receives the returning light from eye E at the same scanning position, scanner LnScn has the effect of descanning the returning light (e.g., canceling the scanning action) to define a non-scanning (e.g., stable or stationary) collection beam from scanner LnScn to pupil splitter SM, which folds the collection beam toward camera Cmr. At pupil splitter SM, reflected light (or emitted light in the case of fluorescence imaging) is separated from the illumination beam onto a detection path directed to camera Cmr, which can be a digital camera with a light sensor for capturing an image. An imaging (e.g., objective) lens ImgL can be positioned in the detection path to image the fundus onto camera Cmr. Similar to the objective lens ObjL, the imaging lens ImgL can be any type of lens known in the art (e.g., refractive, diffractive, reflective, or hybrid lens). Additional operational details, particularly methods for reducing artifacts in images, are described in PCT Publication No. WO2016 / 124644, the contents of which are incorporated herein by reference in their entirety. The camera Cmr captures the received image, for example by creating an image file, which can be further processed by one or more (electronic) processors or computing devices (e.g., the computer system shown in Figure 20). Thus, the collection beams (returning from all scan positions of the scan line beam SB) are collected by the camera Cmr, and a full-frame image Img can be constructed from a composite of the individually captured collection beams, for example by montaging. 
However, other scanning configurations are also conceivable, including configurations where the illumination beam scans on the eye E and the collected beam scans on the camera's light sensor array. PCT Publication WO 2012 / 059236 and U.S. Patent Publication No. 2015 / 0131050 (incorporated herein by reference) describe several embodiments of a slit scanning ophthalmoscope, including various designs in which the returned light sweeps across the camera's light sensor array and in which the returned light does not sweep across the camera's light sensor array.
[0086] In this example, the camera Cmr is connected to a processor (e.g., a processing module) Proc and a display (e.g., a display module, computer screen, electronic screen, etc.) Dspl. Both can be part of the imaging system itself, or they can be part of separate dedicated processing and / or display units, such as a computer system, where data is transmitted from the camera Cmr to the computer system via cable or computer network (including wireless network). The display and processor can be an all-in-one unit. The display can be a conventional electronic display / screen or touchscreen type, and can include a user interface for displaying and receiving information to and from the instrument operator or user. The user can interact with the display using any type of user input device known in the art, including but not limited to a mouse, knob, button, pointer, and touchscreen.
[0087] During imaging, it may be necessary for the patient to maintain a fixed gaze. One way to achieve this is to provide a fixation target that the patient can be directed to look at. The fixation target can be inside or outside the instrument, depending on which area of the eye is being imaged. Figure 11 illustrates one implementation of an internal fixation target. In addition to the primary light source LtSrc used for imaging, a second optional light source FxLtSrc, such as one or more LEDs, can be positioned to image a light pattern onto the retina using a lens FxL, a scanning element FxScn, and a reflector / mirror FxM. The fixation scanner FxScn can move the position of the light pattern, and the reflector FxM guides the light pattern from the fixation scanner FxScn to the fundus F of the eye E. Preferably, the fixation scanner FxScn is positioned so that it lies in the pupil plane of the system, allowing the light pattern on the retina / fundus to move according to the desired fixation position.
[0088] Slit-scanning ophthalmoscope systems can operate in different imaging modes depending on the light source and wavelength-selective filtering elements used. True-color reflectance imaging (similar to what clinicians observe when examining the eye using a handheld or slit-lamp ophthalmoscope) is achieved by imaging the eye with a series of colored LEDs (red, blue, and green). The image for each color can be built up in steps, with each LED turned on at each scanning position, or each color image can be captured in its entirety separately. The three color images can be combined to display a true-color image, or they can be displayed individually to highlight different features of the retina. The red channel best highlights the choroid, the green channel highlights the retina, and the blue channel highlights the anterior retinal layers. Furthermore, light of specific frequencies (e.g., from a single colored LED or laser) can be used to excite different fluorophores in the eye (e.g., autofluorescence), and the resulting fluorescence can be detected by filtering out the excitation wavelength.
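Combining three separately captured color images into one true-color image amounts to interleaving them per pixel, as the following Python sketch shows. The function name and the toy 1×2 images are hypothetical, and the sketch assumes the three captures are already registered to the same pixel grid.

```python
def combine_rgb(red, green, blue):
    """Merge three registered single-color images into one RGB image.

    Each input is an H x W grid of intensities; the output holds an
    (r, g, b) tuple per pixel.
    """
    if not (len(red) == len(green) == len(blue)):
        raise ValueError("color images must have the same number of rows")
    return [
        [(r, g, b) for r, g, b in zip(red_row, green_row, blue_row)]
        for red_row, green_row, blue_row in zip(red, green, blue)
    ]


# Toy 1x2 captures from the red, green, and blue LEDs.
rgb = combine_rgb([[1, 2]], [[3, 4]], [[5, 6]])
red_only = [[px[0] for px in row] for row in rgb]  # view a single channel
```

Keeping the channels separable, as in `red_only`, supports the individual display mentioned above, for example to emphasize choroidal features in the red channel.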
[0089] Fundus imaging systems can also provide infrared reflectance images, for example, by using an infrared laser (or other infrared light source). The advantage of infrared (IR) mode is that the eye is not sensitive to IR wavelengths. This allows the user to capture images continuously without disturbing the eye (e.g., in a preview / alignment mode), which assists the user during instrument alignment. Furthermore, IR wavelengths penetrate tissue more deeply and can provide improved visualization of choroidal structures. Additionally, fluorescein angiography (FA) and indocyanine green (ICG) angiography can be performed by collecting images after a fluorescent dye has been injected into the subject's bloodstream. For example, in FA (and / or ICG), a series of time-lapse images can be captured after a light-reactive dye (e.g., a fluorescent dye) has been injected into the subject's bloodstream. Caution must be exercised, as fluorescent dyes can cause life-threatening allergic reactions in a portion of the population. High-contrast, grayscale images are captured by exciting the dye using specific, selected light frequencies. As the dye flows through the eye, different parts of the eye emit bright light (e.g., fluorescence), making it possible to discern the progress of the dye and, consequently, the blood flow through the eye.
[0090] Optical coherence tomography system
[0091] Typically, optical coherence tomography (OCT) uses low-coherence light to produce two-dimensional (2D) and three-dimensional (3D) internal views of biological tissues. OCT is capable of in vivo imaging of retinal structures. OCT angiography (OCTA) produces blood flow information, such as blood flow from vessels within the retina. Examples of OCT systems are provided in U.S. Patents 6,741,359 and 9,706,915, and examples of OCTA systems can be found in U.S. Patents 9,700,206 and 9,759,544, all of which are incorporated herein by reference in their entirety. Exemplary OCT / OCTA systems are provided herein.
[0092] Figure 12 illustrates a generalized frequency-domain optical coherence tomography (FD-OCT) system for collecting 3D image data of an eye, suitable for use with this invention. The FD-OCT system OCT_1 includes a light source LtSrc1. Typical light sources include, but are not limited to, broadband light sources with short temporal coherence lengths or swept laser sources. The beam from the light source LtSrc1 is typically routed through an optical fiber Fbr1 to illuminate a sample, such as an eye E; a typical sample is tissue within the human eye. The light source LtSrc1 can be, for example, a broadband light source with a short temporal coherence length in the case of spectral-domain OCT (SD-OCT), or a wavelength-tunable laser source in the case of swept-source OCT (SS-OCT). A scanner Scnr1 can be used, typically between the output of the optical fiber Fbr1 and the sample E, such that the beam (dashed line Bm) scans laterally across the sample area to be imaged. The beam from the scanner Scnr1 can pass through a scanning lens SL and an ophthalmic lens OL and be focused onto the sample E being imaged. The scanning lens SL can receive the beam from the scanner Scnr1 at multiple incident angles and produce substantially collimated light, which the ophthalmic lens OL can then focus onto the sample. This example illustrates a scanning beam that needs to be scanned in two lateral directions (e.g., the x and y directions in the Cartesian plane) to cover the desired field of view (FOV). An example of this is a point-field OCT, which uses a point-field beam to scan the sample. Thus, the scanner Scnr1 is illustratively shown as comprising two sub-scanners: a first sub-scanner Xscn for scanning the point-field beam across the sample in a first direction (e.g., the horizontal x direction); and a second sub-scanner Yscn for scanning the point-field beam across the sample in a second direction (e.g., the vertical y direction).
If the scanning beam is a line-field beam (e.g., a line-field OCT), it can sample the entire line portion of the sample at once, so only one scanner is needed to scan the line-field beam across the sample to span the desired FOV. If the scanning beam is a full-field beam (e.g., a full-field OCT), no scanner is needed, and the full-field beam can be applied to the entire desired FOV at once.
[0093] Regardless of the type of beam used, the light scattered from the sample (e.g., sample light) is collected. In this example, the scattered light returning from the sample is collected into the same fiber Fbr1 used to route the light for illumination. The reference light from the same light source LtSrc1 travels along a separate path, in this case involving fiber Fbr2 and a retro-reflector RR1 with adjustable optical delay. Those skilled in the art will recognize that a transmissive reference path can also be used, and that the adjustable delay can be placed in either the sample arm or the reference arm of the interferometer. The collected sample light is combined with the reference light, for example in a fiber coupler Cplr1, to form optical interference at an OCT photodetector Dtctr1 (e.g., a photodetector array, digital camera, etc.). Although a single fiber port is shown leading to detector Dtctr1, those skilled in the art will recognize that various interferometer designs can be used for balanced or unbalanced detection of the interference signal. The output from detector Dtctr1 is provided to a processor (e.g., an internal or external computing device) Cmp1, which converts the observed interference into depth information of the sample. Depth information can be stored in memory associated with processor Cmp1 and / or displayed on a display (e.g., computer / electronic display / screen) Scn1. Processing and storage functions can be located within the OCT instrument, or functions can be offloaded to (e.g., executed on) an external processor (e.g., an external computing device), to which the collected data can be transferred. An example of a computing device (or computer system) is shown in Figure 20. This unit can be dedicated to data processing or can perform other tasks that are quite general and not specific to the OCT device.
The processor (computing device) Cmp1 may include, for example, a field-programmable gate array (FPGA), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), a system-on-a-chip (SoC), a central processing unit (CPU), a general-purpose graphics processing unit (GPGPU), or a combination thereof, which may use one or more host processors and / or one or more external computing devices to perform some or all of the processing steps in a serial and / or parallel manner.
[0094] The sample and reference arms in the interferometer can be composed of bulk optics, fiber optics, or hybrid bulk-optic systems, and can have different architectures, such as those based on Michelson, Mach-Zehnder, or common-path designs as known to those skilled in the art. The term light beam as used herein should be interpreted as any carefully directed light path. Instead of mechanically scanning the beam, a field of light can illuminate a one-dimensional or two-dimensional region of the retina to generate OCT data (see, for example, U.S. Patent 9332902; D. Hillmann et al., “Holoscopy-Holographic Optical Coherence Tomography,” Optics Letters, 36(13):2390 (2011); Y. Nakamura et al., “High-Speed Three Dimensional Human Retinal Imaging by Line Field Spectral Domain Optical Coherence Tomography,” Optics Express, 15(12):7103 (2007); Blazkiewicz et al., “Signal-To-Noise Ratio Study of Full-Field Fourier-Domain Optical Coherence Tomography,” Applied Optics, 44(36):7722 (2005)). In time-domain systems, the reference arm needs to have an adjustable optical delay to produce interference. Balanced detection systems are typically used in TD-OCT and SS-OCT systems, while spectrometers are used at the detection port of SD-OCT systems. The invention described herein can be applied to any type of OCT system. Various aspects of this invention can also be applied to other types of ophthalmic diagnostic systems and / or multiple ophthalmic diagnostic systems, including but not limited to fundus imaging systems, visual field testing devices, and scanning laser polarimeters.
[0095] In Fourier domain optical coherence tomography (FD-OCT), each measurement is a real-valued spectral interferogram (Sj(k)). The real-valued spectral data typically undergoes several post-processing steps, including background subtraction and dispersion correction. The Fourier transform of the processed interferogram generates a complex-valued OCT signal output. The absolute value |Aj| of this complex OCT signal reveals the scattering intensity distribution for different path lengths, and therefore the scattering as a function of depth (z-direction) in the sample. Similarly, phase can also be extracted from the complex-valued OCT signal. The scattering distribution as a function of depth is called an axial scan (A-scan). A set of A-scans measured at adjacent locations in a sample produces a cross-sectional image (tomogram or B-scan) of the sample. Sets of B-scans collected at different lateral locations on the sample constitute a data volume or cube. For a specific data volume, the term fast axis refers to the scanning direction along a single B-scan, while slow axis refers to the axis along which multiple B-scans are collected. The term "cluster scan" can refer to a single cell or block of data generated by repeated acquisitions at the same (or substantially the same) location (or region) for analyzing motion contrast, which can be used to identify blood flow. A cluster scan can consist of multiple A-scans or B-scans collected at approximately the same location on the sample at relatively short time intervals. Because the scans in a cluster scan belong to the same region, the static structure remains relatively unchanged between scans within the cluster scan, while the motion contrast between scans that meet predetermined criteria can be identified as blood flow.
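By way of illustration only, the processing chain described above (background subtraction, Fourier transform, magnitude as a function of depth) can be sketched as follows. This is a minimal sketch under stated assumptions, not the processing of any particular instrument: the function name `reconstruct_a_scan`, the synthetic fringe, and the use of a direct discrete Fourier transform are all hypothetical choices made for readability, and dispersion correction is omitted.

```python
import cmath
import math

def reconstruct_a_scan(interferogram, background):
    """Return the A-scan magnitude |A_j| for one spectral interferogram S_j(k)."""
    # Background subtraction removes the static (non-interferometric) spectrum.
    corrected = [s - b for s, b in zip(interferogram, background)]
    n = len(corrected)
    # A discrete Fourier transform over wavenumber k gives a complex-valued
    # signal indexed by depth; for real input only the first half is unique.
    a_scan = []
    for z in range(n // 2):
        total = sum(corrected[k] * cmath.exp(-2j * math.pi * k * z / n)
                    for k in range(n))
        a_scan.append(abs(total))  # scattering intensity at depth bin z
    return a_scan

# Synthetic example (assumed data): a single reflector produces a cosine
# fringe whose frequency encodes its depth, here depth bin 5 of 64 samples.
N = 64
background = [1.0] * N
fringe = [1.0 + 0.5 * math.cos(2 * math.pi * 5 * k / N) for k in range(N)]
a_scan = reconstruct_a_scan(fringe, background)
peak_depth = max(range(len(a_scan)), key=a_scan.__getitem__)
```

In this sketch the strongest A-scan sample falls in the depth bin that matches the fringe frequency. Repeating the reconstruction at adjacent lateral positions would yield the rows of a B-scan.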
[0096] Various ways of creating B-scans are known in the art, including, but not limited to: along the horizontal or x-direction, along the vertical or y-direction, along the diagonal of x and y, or in a circular or spiral pattern. A B-scan can be in the x-z dimensions, but can be any cross-sectional image that includes the z dimension. Figure 13 shows an example OCT B-scan image of a normal human retina. An OCT B-scan of the retina provides a view of the structure of retinal tissue. For illustrative purposes, Figure 13 identifies various typical retinal layers and layer boundaries. The identified retinal boundary layers include (from top to bottom): the internal limiting membrane (ILM) Layer1, the retinal nerve fiber layer (RNFL or NFL) Layer2, the ganglion cell layer (GCL) Layer3, the inner plexiform layer (IPL) Layer4, the inner nuclear layer (INL) Layer5, the outer plexiform layer (OPL) Layer6, the outer nuclear layer (ONL) Layer7, the junction between the outer segments (OS) and inner segments (IS) of the photoreceptors (indicated by reference character Layer8), the external or outer limiting membrane (ELM or OLM) Layer9, the retinal pigment epithelium (RPE) Layer10, and Bruch's membrane (BM) Layer11.
[0097] In OCT angiography or functional OCT, analysis algorithms can be applied to OCT data collected at the same or substantially the same location on the sample at different times (e.g., cluster scans) to analyze motion or flow (see, for example, U.S. Patent Publications 2005 / 0171438, 2012 / 0307014, 2010 / 0027857, 2012 / 0277579 and U.S. Patent No. 6,549,801, all of which are incorporated herein by reference in their entirety). OCT systems can use any of a variety of OCT angiography processing algorithms (e.g., motion contrast algorithms) to identify blood flow. For example, motion contrast algorithms can be applied to intensity information derived from image data (intensity-based algorithms), phase information from image data (phase-based algorithms), or complex image data (complex-based algorithms). An en face image is a 2D projection of 3D OCT data (e.g., obtained by averaging the intensity of each individual A-scan so that each A-scan defines a pixel in the 2D projection). Similarly, an en face vasculature image is an image that displays motion contrast signals, where the data dimension corresponding to depth (e.g., along the z-direction of the A-scan) is displayed as a single representative value (e.g., a pixel in a 2D projected image), typically by summing or integrating all or an isolated portion of the data (see, for example, U.S. Patent No. 7,301,644, the entire contents of which are incorporated herein by reference). An OCT system that provides angiographic imaging capabilities may be referred to as an OCT angiography (OCTA) system.
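As a non-limiting illustration of the 2D projection described above, the following sketch collapses each A-scan of a small volume into one pixel by averaging along depth. The function name `en_face_projection`, the `volume[y][x][z]` indexing, and the fixed depth limits are assumptions made for this example; in practice a slab would more likely follow segmented layer boundaries than fixed depth indices.

```python
def en_face_projection(volume, z_start=0, z_stop=None):
    """Collapse a volume indexed as volume[y][x][z] into a 2D image.

    Each A-scan (the list along z) contributes one pixel: the mean of its
    samples between z_start (inclusive) and z_stop (exclusive).
    """
    image = []
    for b_scan in volume:            # one B-scan per slow-axis position
        row = []
        for a_scan in b_scan:        # one A-scan per fast-axis position
            slab = a_scan[z_start:z_stop]
            row.append(sum(slab) / len(slab))
        image.append(row)
    return image

# Tiny synthetic volume (assumed data): 2 B-scans x 3 A-scans x 4 depths.
volume = [
    [[1, 1, 1, 1], [0, 2, 4, 6], [2, 2, 2, 2]],
    [[0, 0, 0, 0], [4, 4, 0, 0], [1, 3, 5, 7]],
]
full = en_face_projection(volume)          # full-depth projection
slab = en_face_projection(volume, 2, 4)    # slab limited to depth indices 2 and 3
```

Replacing the mean with a sum, or feeding motion-contrast values instead of intensities, would give the summed vasculature projection mentioned in the text.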
[0098] Figure 14 shows an example of an en face vasculature image. After processing the data using any motion contrast technique known in the art to highlight motion contrast, a range of pixels corresponding to a given tissue depth from the surface of the internal limiting membrane (ILM) in the retina can be summed to generate an en face (e.g., frontal view) image of the vasculature. Figure 15 shows an exemplary B-scan image of the vasculature (OCTA). As illustrated, structural information may not be well defined because blood flow may traverse multiple retinal layers, making them less defined than in a structural OCT B-scan such as that shown in Figure 13. Nevertheless, OCTA provides a non-invasive technique for imaging the microvasculature of the retina and choroid, which can be crucial for the diagnosis and / or monitoring of various pathologies. For example, OCTA can be used to identify diabetic retinopathy by identifying microaneurysms and neovascular complexes, and by quantifying the foveal avascular zone and non-perfused areas. Furthermore, OCTA has shown good agreement with fluorescein angiography (FA), a more traditional but more invasive technique that requires dye injection to observe vascular flow in the retina. In addition, in dry age-related macular degeneration, OCTA has been used to monitor a general reduction in choriocapillaris flow. Similarly, in wet age-related macular degeneration, OCTA can provide qualitative and quantitative analysis of choroidal neovascular membranes. OCTA has also been used to study vascular occlusions, for example to evaluate non-perfused areas and the integrity of the superficial and deep plexuses.
[0099] Neural Networks
[0100] As discussed above, this invention can utilize neural network (NN) machine learning (ML) models. For completeness, a general discussion of neural networks is provided herein. This invention can use any of the following neural network architectures, individually or in combination. A neural network, or neural net, is a network of interconnected neurons, where each neuron represents a node in the network. Groups of neurons can be arranged in layers, with the output of one layer fed forward to the next layer in a multilayer perceptron (MLP) arrangement. An MLP can be understood as a feedforward neural network model that maps a set of input data to a set of output data.
[0101] Figure 16 shows an example of a multilayer perceptron (MLP) neural network. Its structure may include multiple hidden (e.g., inner) layers HL1 to HLn, which map an input layer InL (receiving a set of inputs (or vector inputs) in_1 to in_3) to an output layer OutL, which produces a set of outputs (or vector outputs), such as out_1 and out_2. Each layer can have any given number of nodes, which are illustratively shown herein as circles within each layer. In this example, the first hidden layer HL1 has two nodes, while hidden layers HL2, HL3, and HLn each have three nodes. Generally, the deeper the MLP (e.g., the more hidden layers in the MLP), the greater its learning capacity. The input layer InL receives vector inputs (shown as a three-dimensional vector consisting of in_1, in_2, and in_3) and can apply the received vector inputs to the first hidden layer HL1 in the sequence of hidden layers. The output layer OutL receives the output from the last hidden layer (e.g., HLn) in the multilayer model, processes its inputs, and produces a vector output result (illustratively shown as a two-dimensional vector consisting of out_1 and out_2).
[0102] Typically, each neuron (or node) produces a single output, which is fed forward to neurons in the immediately following layer. However, each neuron in a hidden layer can receive multiple inputs, either from the input layer or from the outputs of neurons in the immediately preceding hidden layer. Generally, each node can apply a function to its inputs to generate an output for that node. Nodes in hidden layers (e.g., learning layers) can apply the same function to their respective inputs to produce their respective outputs. However, some nodes, such as those in the input layer InL, receive only one input and may be passive, meaning they simply relay the value of their single input to their output; for example, they provide a copy of their input to their output, as indicated by the dashed arrow within the nodes of the input layer InL.
[0103] For the purpose of explanation, Figure 17 shows a simplified neural network consisting of an input layer InL', a hidden layer HL1', and an output layer OutL'. The input layer InL' is shown as having two input nodes i1 and i2, which receive inputs Input_1 and Input_2 respectively (e.g., the input nodes of layer InL' receive a two-dimensional input vector). The input layer InL' feeds forward to a hidden layer HL1' with two nodes h1 and h2, which in turn feeds forward to an output layer OutL' with two nodes o1 and o2. The interconnections or links between neurons (shown as solid arrows) have weights w1 to w8. Typically, except for those in the input layer, nodes (neurons) can receive as inputs the outputs of the nodes in the immediately preceding layer. Each node can compute its output by multiplying each of its inputs by that input's corresponding interconnection weight, summing the products, adding (or multiplying by) a constant defined by another weight or bias that may be associated with that particular node (e.g., node weights w9, w10, w11, w12 corresponding to nodes h1, h2, o1, and o2 respectively), and then applying a nonlinear or logarithmic function to the result. The nonlinear function can be called an activation function or a transfer function. Several activation functions are known in the art, and the choice of a particular activation function is not critical to this discussion. However, it is important to note that the operation of an ML model, or the behavior of a neural network, depends on the weight values, which can be learned so that the neural network provides the desired output for a given input.
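The per-node computation described for the simplified network of Figure 17 can be sketched as below, purely for illustration. The assignment of weights w1 to w8 to specific links is not stated in the text and is an assumption of this sketch, as are the function names and the choice of a sigmoid as the activation function.

```python
import math

def sigmoid(x):
    # One common activation function; the text leaves the choice open.
    return 1.0 / (1.0 + math.exp(-x))

def node_output(inputs, link_weights, bias):
    # Multiply each input by its link weight, sum the products, add the
    # node's bias, then apply the activation function.
    weighted_sum = sum(i * w for i, w in zip(inputs, link_weights))
    return sigmoid(weighted_sum + bias)

def forward(input_1, input_2, w):
    """Forward pass; w maps the names 'w1'..'w12' to values.

    Assumed (hypothetical) wiring: w1, w2 feed h1; w3, w4 feed h2;
    w5, w6 feed o1; w7, w8 feed o2; w9..w12 are the biases of
    h1, h2, o1, and o2.
    """
    # Input nodes i1 and i2 are passive: they relay their inputs unchanged.
    i1, i2 = input_1, input_2
    h1 = node_output([i1, i2], [w["w1"], w["w2"]], w["w9"])
    h2 = node_output([i1, i2], [w["w3"], w["w4"]], w["w10"])
    o1 = node_output([h1, h2], [w["w5"], w["w6"]], w["w11"])
    o2 = node_output([h1, h2], [w["w7"], w["w8"]], w["w12"])
    return o1, o2

# With every weight and bias set to zero, each node outputs sigmoid(0) = 0.5.
zero_weights = {f"w{n}": 0.0 for n in range(1, 13)}
outputs = forward(0.3, -1.2, zero_weights)
```

The zero-weight case is chosen only because its result is easy to verify by hand; any other weight values would be supplied by training.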
[0104] During the training or learning phase, the neural network learns (e.g., is trained to determine) appropriate weight values to achieve the desired output for a given input. Before training the neural network, an initial (e.g., random and optionally non-zero) value, such as a random number seed, can be assigned individually to each weight. Various methods for assigning initial weights are known in the art. The weights are then trained (optimized) so that, for a given training vector input, the neural network produces an output close to the desired (predetermined) training vector output. For example, the weights can be progressively adjusted over thousands of iterations using a technique called backpropagation. In each loop of backpropagation, the training input (e.g., a vector input or a training input image / sample) is fed forward through the neural network to determine its actual output (e.g., a vector output). The error of each output neuron or output node is then calculated based on the actual neuron output and the target training output of that neuron (e.g., a training output image / sample corresponding to the current training input image / sample). Then, a backpropagation through the neural network (in the direction from the output layer back to the input layer) updates the weights according to the effect of each weight on the overall error so that the output of the neural network is closer to the desired training output. This loop is then repeated until, for a given training input, the actual output of the neural network is within an acceptable error range of the desired training output. As is understandable, each training input may require multiple backpropagation iterations before reaching the desired error range. Typically, an epoch refers to one backpropagation iteration across all training samples (e.g., one forward pass and one backward pass), so training a neural network may require many epochs. 
Generally, the larger the training set, the better the performance of the trained ML model, so various data augmentation methods can be used to increase the size of the training set. For example, when the training set includes corresponding pairs of training input and training output images, the training images can be divided into multiple corresponding image segments (or patches). Corresponding patches from the training input and training output images can be paired to define multiple training patch pairs from one input / output image pair, which expands the training set. However, training on a large training set places high demands on computational resources, such as memory and data processing resources. The computational requirements can be reduced by dividing the large training set into multiple mini-batches, where the mini-batch size defines the number of training samples in one forward / backward pass. In this case, one epoch may include multiple mini-batches. Another problem is that the NN may overfit the training set, thus reducing its ability to generalize from the specific training inputs to different inputs. Overfitting can be mitigated by creating an ensemble of neural networks or by randomly dropping nodes from the neural network during training, which effectively removes the dropped nodes from the network. Various dropout regularization methods, such as inverse dropout, are known in the art.
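For illustration, the training procedure outlined above (random initial weights, forward pass, backward pass, mini-batches, repeated epochs) can be sketched on a very small network. Everything specific here is an assumption of the sketch rather than part of the described system: the 2-2-1 layer sizes, the sigmoid activation, the squared-error measure, the learning rate, and the toy OR-function training set.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(params, x):
    # 2 inputs -> 2 hidden nodes -> 1 output node, all with sigmoid activation.
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(params["W1"], params["b1"])]
    y = sigmoid(sum(w * hi for w, hi in zip(params["W2"], h)) + params["b2"])
    return h, y

def mean_squared_error(params, samples):
    return sum((forward(params, x)[1] - t) ** 2 for x, t in samples) / len(samples)

def train(samples, epochs=2000, batch_size=2, lr=0.5, seed=0):
    rng = random.Random(seed)
    params = {  # random, non-zero initial link weights; biases start at zero
        "W1": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)],
        "b1": [0.0, 0.0],
        "W2": [rng.uniform(-1, 1) for _ in range(2)],
        "b2": 0.0,
    }
    initial_loss = mean_squared_error(params, samples)
    for _ in range(epochs):                  # one epoch visits every sample once
        order = samples[:]
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            grads = []
            for x, t in batch:               # forward pass, then error terms
                h, y = forward(params, x)
                # Output error term (constant factor 2 absorbed into lr).
                d_out = (y - t) * y * (1 - y)
                # Propagate the error back to the hidden nodes.
                d_hid = [d_out * params["W2"][j] * h[j] * (1 - h[j])
                         for j in range(2)]
                grads.append((x, h, d_out, d_hid))
            step = lr / len(batch)           # average over the mini-batch
            for x, h, d_out, d_hid in grads:
                params["b2"] -= step * d_out
                for j in range(2):
                    params["W2"][j] -= step * d_out * h[j]
                    params["b1"][j] -= step * d_hid[j]
                    for i in range(2):
                        params["W1"][j][i] -= step * d_hid[j] * x[i]
    return params, initial_loss, mean_squared_error(params, samples)

# Toy training set (assumed): the logical OR of two binary inputs.
or_samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
trained, loss_before, loss_after = train(or_samples)
```

All gradients for a mini-batch are computed before any weight is changed, so each update reflects the averaged error of that mini-batch. Dropout and ensembles, also mentioned in the text, are not shown.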
[0105] It is important to note that the operation of a trained neural network (NN) is not a straightforward algorithm of operational or analysis steps. When a trained NN receives an input, that input is not analyzed in the traditional sense. Instead, regardless of the subject or nature of the input (e.g., a vector defining a live image / scan or a vector defining some other entity, such as a demographic description or activity record), the input is subjected to the same predefined architectural construct of the trained neural network (e.g., the same node / layer arrangement, trained weights and biases, predefined convolution / deconvolution operations, activation functions, pooling operations, etc.), and it may not be clear how the architectural construct of the trained network produces its output. Furthermore, the values of the trained weights and biases are not deterministic and depend on many factors, such as the amount of time the neural network is given for training (e.g., the number of epochs in training), the random initial values of the weights before training begins, the computer architecture of the machine on which the NN is trained, the selection of training samples, the distribution of training samples across multiple mini-batches, the choice of activation functions, the choice of error functions used to modify the weights, and even whether training is interrupted on one machine (e.g., with a first computer architecture) and completed on another machine (e.g., with a different computer architecture). The point is that the reasons why a trained ML model reaches certain outputs are not clear, and extensive research is underway to try to determine the factors upon which ML model outputs are based. Therefore, the processing of live data by a neural network cannot be reduced to a simple step-by-step algorithm. Instead, its operation depends on its training architecture, training sample set, training sequence, and various circumstances during the training of the ML model.
[0106] In summary, the construction of a neural network (NN) machine learning model can include a learning (or training) phase and a classification (or operation) phase. During the learning phase, the neural network can be trained for a specific purpose and can be provided with a set of training examples, including training (sample) inputs and training (sample) outputs, and optionally a set of validation examples to test the progress of the training. During this learning process, various weights associated with the nodes and node interconnections in the neural network are incrementally adjusted to reduce the error between the actual output of the neural network and the desired training output. In this way, a multilayer feedforward neural network (such as those discussed above) can be made capable of approximating any measurable function to any desired degree of accuracy. The result of the learning phase is a machine learning (ML) model that has been learned (e.g., trained). In the operation phase, a set of test inputs (or live inputs) can be submitted to the learned (trained) ML model, which can apply what it has learned to produce output predictions based on the test inputs.
[0107] Like the conventional neural networks of Figures 16 and 17, a convolutional neural network (CNN) also consists of neurons with learnable weights and biases. Each neuron receives inputs, performs an operation (such as a dot product), and is optionally followed by a non-linearity. However, a CNN might receive raw image pixels at one end (e.g., the input) and provide classification (or class) scores at the other end (e.g., the output). Because CNNs expect images as input, they are optimized for working with volumes (e.g., the pixel height and width of the image plus the image depth, such as a color depth, e.g., an RGB depth defined by three colors: red, green, and blue). For example, the layers of a CNN may be optimized for neurons arranged in three dimensions. Neurons in a CNN layer may also be connected only to a small region of the layer preceding them, rather than to all of the neurons as in a fully connected NN. The final output layer of a CNN can reduce the entire image to a single vector (classification) arranged along the depth dimension.
[0108] Figure 18 provides an example convolutional neural network architecture. A convolutional neural network can be defined as a sequence of two or more layers (e.g., layers 1 through N), where each layer can include an (image) convolution step, a weighted sum (of results) step, and a nonlinear function step. Convolution can be performed on the input data by applying filters (or kernels), for example, over a moving window across the input data, to generate feature maps. Each layer and its components can have different predefined filters (from a filter bank), weights (or weighting parameters), and / or function parameters. In this example, the input data is an image with a given pixel height and width, which can be the raw pixel values of the image. In this example, the input image is shown as having a depth of three color channels RGB (red, green, and blue). Optionally, the input image can be preprocessed in various ways, and the preprocessed results can be used to replace or supplement the original input image. Some examples of image preprocessing include: retinal blood vessel map segmentation, color space transformation, adaptive histogram equalization, connected component generation, etc. Within a layer, dot products can be computed between the given weights and the small regions they connect to in the input volume. Many ways to configure a CNN are known in the art, but as an example, layers can be configured to apply element-wise activation functions, such as max(0,x) thresholding at zero. Pooling functions (e.g., along the x-y directions) can be performed to downsample the volume. Fully connected layers can be used to determine the classification output and produce a one-dimensional output vector, which has been found to be useful for image recognition and classification. However, for image segmentation, a CNN would need to classify each pixel.
Since each CNN layer tends to downsample the input image, another stage is needed to upsample the image back to its original resolution. This can be achieved by applying a transposed convolution (or deconvolution) stage TC, which typically does not use any predefined interpolation methods but instead has learnable parameters.
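The layer operations named above (moving-window convolution, max(0,x) activation, pooling, and transposed convolution for upsampling) can be sketched on a single-channel image as follows. This is an illustrative sketch only: the function names, the 5×5 test image, and the hand-picked kernels are assumptions, whereas in a trained network the kernel values would be learned.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution as used in CNNs (no kernel flip): slide the
    kernel over the image and take a dot product at each window position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + a][c + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(feature_map):
    # Element-wise max(0, x) activation.
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool_2x2(feature_map):
    # Downsample by keeping the maximum of each non-overlapping 2x2 block.
    return [[max(feature_map[r][c], feature_map[r][c + 1],
                 feature_map[r + 1][c], feature_map[r + 1][c + 1])
             for c in range(0, len(feature_map[0]) - 1, 2)]
            for r in range(0, len(feature_map) - 1, 2)]

def transposed_conv2d(feature_map, kernel, stride=2):
    # Upsample: each input value adds a scaled copy of the kernel to the
    # output at a position determined by the stride.
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(feature_map) - 1) * stride + kh
    out_w = (len(feature_map[0]) - 1) * stride + kw
    out = [[0] * out_w for _ in range(out_h)]
    for r, row in enumerate(feature_map):
        for c, v in enumerate(row):
            for a in range(kh):
                for b in range(kw):
                    out[r * stride + a][c * stride + b] += v * kernel[a][b]
    return out

image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 3, 1, 0, 1],
]
diag_kernel = [[1, 0], [0, -1]]   # hand-picked for the example, not trained

features = relu(conv2d(image, diag_kernel))               # 4x4 feature map
pooled = max_pool_2x2(features)                           # 2x2 after pooling
upsampled = transposed_conv2d(pooled, [[1, 1], [1, 1]])   # back to 4x4
```

With an all-ones 2×2 kernel and stride 2, the transposed convolution here simply replicates each pooled value into a 2×2 block; a learned kernel would instead produce a trained interpolation.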
[0109] Convolutional neural networks have been successfully applied to many computer vision problems. As explained above, training a CNN typically requires a large training dataset. The U-Net architecture is based on CNNs and can usually be trained on a smaller training dataset than a traditional CNN.
[0110] Figure 19 illustrates an example U-Net architecture. This exemplary U-Net includes an input module (or input layer or stage) that receives an input U-in of any given size (e.g., an input image or image patch). For illustrative purposes, the image size at any stage or layer is indicated within the box representing the image; for example, the input module encloses the number "128×128" to indicate that the input image U-in consists of 128×128 pixels. The input image can be a fundus image, an OCT / OCTA en face image, a B-scan image, etc. However, it should be understood that the input can be of any size or dimension. For example, the input image can be a multi-channel image (e.g., an RGB color image), a monochrome image, a volumetric image, etc. The input image passes through a series of processing layers, each illustrated with exemplary sizes, but these sizes are for illustrative purposes only and will depend on, for example, the image size, convolutional filters, and / or pooling stages. This architecture consists of a contracting path (illustratively comprising four encoding modules herein) followed by an expanding path (illustratively comprising four decoding modules herein), along with copy-and-crop links (e.g., CC1 to CC4) between corresponding modules / stages. Each link copies the output of an encoding module in the contracting path and concatenates it to (e.g., appends it to) the up-converted input of the corresponding decoding module in the expanding path. This produces the characteristic U-shape from which the architecture derives its name. Optionally, for computational reasons, a "bottleneck" module / stage (BN) can be positioned between the contracting and expanding paths. The bottleneck BN may consist of two convolutional layers (with batch normalization and optional dropout).
[0111] The contracting path is similar to an encoder and typically captures context (or feature) information using feature maps. In this example, each encoding module in the contracting path may include two or more convolutional layers, illustratively indicated by the asterisk “*”, followed by a max-pooling layer (e.g., a downsampling layer). For example, the input image U-in is illustratively shown as undergoing two convolutional layers, each with 32 feature maps. It is understood that each convolution kernel produces a feature map (e.g., the output of a convolution operation with a given kernel is an image commonly referred to as a “feature map”). For example, the input U-in undergoes a first convolution, which applies 32 convolution kernels (not shown) to produce an output consisting of 32 corresponding feature maps. However, as is known in the art, the number of feature maps produced by a convolution operation can be adjusted (up or down). For example, the number of feature maps can be reduced by averaging groups of feature maps, discarding some feature maps, or other known feature map reduction methods. In this example, the first convolution is followed by a second convolution, the output of which is limited to 32 feature maps. Another way to envision feature maps is to consider the output of a convolutional layer as a 3D image whose 2D dimensions are given by the listed X-Y plane pixel dimensions (e.g., 128×128 pixels), and whose depth is given by the number of feature maps (e.g., 32 planes deep). Following this analogy, the output of the second convolution (e.g., the output of the first encoding module in the contracting path) can be described as a 128×128×32 image. The output from the second convolution then undergoes a pooling operation, which reduces the 2D dimensions of each feature map (e.g., the X and Y dimensions can each be halved). The pooling operation can be embodied in a downsampling operation, as indicated by the down arrow.
Various pooling methods are known in the art, such as max pooling, and the specific pooling method is not critical to this invention. The number of feature maps may double with each pooling operation, starting with 32 feature maps in the first encoding module (or block), 64 in the second, and so on. Thus, the contracting path forms a convolutional network consisting of multiple encoding modules (or stages or blocks). As is typical of convolutional networks, each encoding module provides at least one convolution stage, followed by an activation function (e.g., a rectified linear unit (ReLU) or a sigmoid layer) (not shown) and a max-pooling operation. Typically, the activation function introduces non-linearity into the layer (e.g., to help avoid overfitting), receives the layer's results, and determines whether to "activate" the output (e.g., determining whether the value of a given node meets a predefined criterion to forward the output to the next layer / node). In summary, the contracting path generally reduces spatial information while increasing feature information.
[0112] The expansion path resembles a decoder and, among other things, provides localization and spatial information for the results of the contracting path, despite the downsampling and any max pooling performed in the contracting stage. The expansion path comprises multiple decoding modules, each of which concatenates its current up-converted input with the output of the corresponding encoding module. In this way, feature and spatial information are combined in the expansion path through a series of up-convolutions (e.g., upsampling, transposed convolutions, or deconvolutions) and concatenations with high-resolution features from the contracting path (e.g., via CC1 to CC4). Thus, the output of a deconvolutional layer is concatenated with the corresponding (optionally cropped) feature map from the contracting path, followed by two convolutional layers and an activation function (with optional batch normalization).
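As a minimal sketch of one decoding step, the following plain-Python example upsamples a set of decoder feature maps and concatenates them with encoder feature maps of matching size. Nearest-neighbour upsampling is used here only as a simple stand-in for the up-convolution; the function names and the channel-first list layout are assumptions made for illustration.

```python
def upsample_2x(feature_map):
    """Nearest-neighbour upsampling: each value is repeated in a 2x2 tile."""
    out = []
    for row in feature_map:
        wide = [v for v in row for _ in range(2)]  # repeat along X
        out.append(wide)
        out.append(list(wide))                     # repeat along Y
    return out


def decode_step(decoder_maps, encoder_maps):
    """Upsample decoder feature maps and append the encoder's feature maps.

    Both arguments are lists of 2D feature maps (channel-first layout).
    """
    upsampled = [upsample_2x(fm) for fm in decoder_maps]
    target = (len(upsampled[0]), len(upsampled[0][0]))
    for fm in encoder_maps:
        if (len(fm), len(fm[0])) != target:
            raise ValueError("encoder map must match the upsampled size")
    return upsampled + encoder_maps  # concatenation along the channel axis


decoder_maps = [[[1, 2], [3, 4]],
                [[5, 6], [7, 8]]]               # two 2x2 feature maps
encoder_maps = [[[0] * 4 for _ in range(4)]]    # one 4x4 high-resolution map
merged = decode_step(decoder_maps, encoder_maps)
# merged holds three 4x4 feature maps; merged[0] is
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

In a full network the merged feature maps would then pass through the two convolutional layers and activation function mentioned above; that part is omitted here.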
[0113] The output of the last expansion block in the expansion path can be fed into another processing / training block or layer, such as a classifier block, which can be trained together with the U-Net architecture. Alternatively or additionally, the output of the last upsampling block (at the end of the expansion path) can be submitted to another convolution operation (e.g., an output convolution), as shown by the dashed arrow, before producing the output U-out. The kernel size of the output convolution can be chosen to reduce the dimensions of the last upsampling block to the desired size. For example, just before reaching the output convolution, the neural network may have multiple features per pixel, and a 1×1 convolution operation can combine these multiple features into a single output value per pixel, on a pixel-by-pixel basis.
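The 1×1 output convolution can be illustrated with a short plain-Python sketch: each output pixel is a weighted sum of that pixel's values across all feature maps, plus a bias. The weights, the bias, and the final sigmoid threshold of 0.5 for producing a binary mask are assumptions chosen for the example, not values taken from the description.

```python
import math


def conv_1x1(feature_maps, weights, bias=0.0):
    """Combine C feature maps into one map via a per-pixel weighted sum."""
    if len(feature_maps) != len(weights):
        raise ValueError("need one weight per feature map")
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [
            bias + sum(w * fm[r][c] for w, fm in zip(weights, feature_maps))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def sigmoid(x):
    """Map a real-valued score to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Three feature maps, each 1x2 pixels (hypothetical values).
features = [[[1.0, 0.0]],
            [[2.0, 1.0]],
            [[0.0, 3.0]]]
logits = conv_1x1(features, weights=[0.5, -1.0, 2.0], bias=0.5)
# logits == [[-1.0, 5.5]]

# Assumed post-processing: threshold the sigmoid output at 0.5.
mask = [[1 if sigmoid(v) > 0.5 else 0 for v in row] for row in logits]
# mask == [[0, 1]]
```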
[0114] Computing devices / systems
[0115] Figure 20 illustrates an example computer system (or computing device or computer apparatus). In some implementations, one or more computer systems may provide the functionality described or illustrated herein and / or perform one or more steps of one or more methods described or illustrated herein. The computer system may take any suitable physical form. For example, the computer system may be an embedded computer system, a system-on-a-chip (SOC), a single-board computer system (SBC) (such as, for example, a computer on a module (COM) or a system on a module (SOM)), a desktop computer system, a laptop or notebook computer system, a mesh of computer systems, a mobile phone, a personal digital assistant (PDA), a server, a tablet computer system, an augmented / virtual reality device, or a combination of two or more of these. Where appropriate, the computer system may reside in the cloud, which may include one or more cloud components in one or more networks.
[0116] In some implementations, the computer system may include a processor Cpnt1, memory Cpnt2, storage Cpnt3, input / output (I / O) interface Cpnt4, communication interface Cpnt5, and bus Cpnt6. The computer system may also optionally include a display Cpnt7, such as a computer monitor or screen.
[0117] Processor Cpnt1 includes hardware for executing instructions, such as those that constitute a computer program. For example, processor Cpnt1 may be a central processing unit (CPU) or a graphics processing unit used for general-purpose computing (GPGPU). Processor Cpnt1 may retrieve (or fetch) instructions from internal registers, internal caches, memory Cpnt2, or storage Cpnt3, decode and execute the instructions, and write one or more results to internal registers, internal caches, memory Cpnt2, or storage Cpnt3. In certain embodiments, processor Cpnt1 may include one or more internal caches for data, instructions, or addresses. Processor Cpnt1 may include one or more instruction caches and one or more data caches, such as for storing data tables. Instructions in the instruction cache may be copies of instructions in memory Cpnt2 or storage Cpnt3, and the instruction cache may accelerate the retrieval of these instructions by processor Cpnt1. Processor Cpnt1 may include any suitable number of internal registers and may include one or more arithmetic logic units (ALUs). Processor Cpnt1 may be a multi-core processor or may include one or more processors Cpnt1. Although this disclosure describes and illustrates a particular processor, this disclosure contemplates any suitable processor.
[0118] Memory Cpnt2 may include main memory for storing instructions for processor Cpnt1 to execute or for holding temporary data during processing. For example, the computer system may load instructions or data (e.g., data tables) from storage Cpnt3 or from another source (e.g., another computer system) into memory Cpnt2. Processor Cpnt1 may load instructions and data from memory Cpnt2 into one or more internal registers or internal caches. To execute instructions, processor Cpnt1 may retrieve and decode instructions from internal registers or internal caches. During or after instruction execution, processor Cpnt1 may write one or more results (which may be intermediate or final results) to internal registers, internal caches, memory Cpnt2, or storage Cpnt3. Bus Cpnt6 may include one or more memory buses (each of which may include an address bus and a data bus) and may couple processor Cpnt1 to memory Cpnt2 and / or storage Cpnt3. Optionally, one or more memory management units (MMUs) facilitate data transfer between processor Cpnt1 and memory Cpnt2. Memory Cpnt2 (which may be fast volatile memory) may include random access memory (RAM), such as dynamic RAM (DRAM) or static RAM (SRAM). Storage Cpnt3 may include long-term or high-capacity storage for data or instructions. Storage Cpnt3 may be internal or external to the computer system and may include one or more of the following: disk drives (e.g., hard disk drives (HDDs) or solid-state drives (SSDs)), flash memory, ROM, EPROM, optical disks, magneto-optical disks, magnetic tape, Universal Serial Bus (USB) accessible drives, or other types of non-volatile memory.
[0119] The I / O interface Cpnt4 can be software, hardware, or a combination of both, and includes one or more interfaces (e.g., serial or parallel communication ports) for communicating with I / O devices, which enable communication with a person (e.g., a user). For example, I / O devices may include a keyboard, keypad, microphone, monitor, mouse, printer, scanner, speaker, still camera, stylus, tablet, touchscreen, trackball, video camera, other suitable I / O devices, or a combination of two or more of these.
[0120] The communication interface Cpnt5 provides a network interface for communicating with other systems or networks. The communication interface Cpnt5 may include a Bluetooth interface or other types of packet-based communication. For example, the communication interface Cpnt5 may include a network interface controller (NIC) and / or a wireless NIC or a wireless adapter for communicating with a wireless network. The communication interface Cpnt5 can provide communication with Wi-Fi networks, ad hoc networks, personal area networks (PANs), wireless PANs (e.g., Bluetooth WPANs), local area networks (LANs), wide area networks (WANs), metropolitan area networks (MANs), cellular telephone networks (e.g., GSM networks), the Internet, or a combination of two or more of these.
[0121] Bus Cpnt6 can provide communication links between the aforementioned components of the computing system. For example, bus Cpnt6 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand bus, a low-pin-count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or other suitable buses, or a combination of two or more of these.
[0122] Although this disclosure describes and illustrates a particular computer system having a particular number of particular components in a particular arrangement, this disclosure contemplates any suitable computer system having any suitable number of any suitable components in any suitable arrangement.
[0123] In this document, one or more computer-readable non-transitory storage media may include one or more semiconductor-based or other integrated circuits (ICs) (such as field-programmable gate arrays (FPGAs) or application-specific integrated circuits (ASICs)), hard disk drives (HDDs), hybrid hard disk drives (HHDs), optical disks, optical disk drives (ODDs), magneto-optical disks, magneto-optical drives, floppy disks, floppy disk drives (FDDs), magnetic tape, solid-state drives (SSDs), RAM drives, secure digital cards or drives, any other suitable computer-readable non-transitory storage media, or any suitable combination of two or more of these (where appropriate). Where appropriate, computer-readable non-transitory storage media may be volatile, non-volatile, or a combination of volatile and non-volatile.
[0124] Although the invention has been described in conjunction with several specific embodiments, it will be evident to those skilled in the art that many further alternatives, modifications, and variations are apparent in light of the foregoing description. Therefore, the invention described herein is intended to include all such alternatives, modifications, applications, and variations that may fall within the spirit and scope of the appended claims.
Claims
1. A method for analyzing optical coherence tomography (OCT) data, comprising: collecting OCT data using an OCT system, the OCT data including multiple A-scans; extracting a series of metrics, including an optical attenuation coefficient, from each individual A-scan; defining a set of images based on the extracted metrics, each image defining pathological feature data, wherein the pathology is geographic atrophy; defining a multi-channel image based on the set of images, wherein each metric defines a separate corresponding channel for each pixel of the multi-channel image; submitting the multi-channel image to a machine learning model trained to identify one or more pathologies based on the pathological feature data; and displaying or storing the identified pathologies for future processing.
2. The method of claim 1, wherein the stored OCT data is volume data, and each image in the set of images is a two-dimensional image.
3. The method of claim 1 or 2, wherein one or more pixels in the image are defined based on the relative distance from the corresponding A-scan to a predefined ophthalmic landmark.
4. The method of claim 3, wherein the one or more pixels are based on the distance from each A-scan to the fovea.
5. The method according to any one of claims 1 to 4, further comprising accessing additional imaging data of one or more other imaging modalities different from OCT, wherein the multi-channel image comprises one or more image channels respectively based on the one or more other imaging modalities.
6. The method of claim 5, wherein the one or more other imaging modalities include one or more of fundus images, autofluorescence images, fluorescein angiography images, OCT angiography images, and visual field test images.
7. The method of claim 5, wherein the machine learning model is further trained using non-image data.
8. The method of claim 7, wherein the non-image data includes patient demographic data.
9. The method according to any one of claims 1 to 8, further comprising: acquiring visual field function data, wherein the multi-channel image includes at least one image channel based on the visual field function data.
10. The method according to any one of claims 1 to 9, further comprising: classifying the metrics extracted from each A-scan into corresponding metric groups in a one-to-one correspondence.
11. The method of claim 10, wherein each channel of the multi-channel image is based on a corresponding metric group.
12. The method according to any one of claims 1 to 11, wherein the metrics extracted from each A-scan are associated with the same pathology type.
13. The method according to any one of claims 1 to 12, wherein: the OCT data includes OCT structural data and OCT angiography (OCTA) flow data; the series of metrics includes OCT-based metrics extracted from the OCT structural data and OCTA-based metrics extracted from the OCTA flow data; the set of images includes OCT-based images and OCTA-based images; and the multi-channel image is based on the OCT-based images and the OCTA-based images.
14. The method according to any one of claims 1 to 13, wherein the machine learning model identifies regions in the multi-channel image where the pathology exists based on a combination of pathological feature data provided by individual channels of each pixel of the multi-channel image.
15. The method of claim 14, wherein each A-scan is mapped to a pixel in the multi-channel image, and the identified regions with pathological features are mapped to the accessed OCT data.
16. The method according to any one of claims 1 to 15, wherein: the series of metrics includes one or more of the following: sub-RPE (retinal pigment epithelium) reflectance, intra-RPE reflectance, retinal thickness, and choriocapillaris flow.
17. The method according to any one of claims 1 to 16, wherein the machine learning model is embodied by a neural network.
18. The method of claim 17, wherein the neural network is a U-Net architecture.
19. The method according to any one of claims 1 to 18, wherein each channel in the multi-channel image is a color channel.
20. The method of claim 19, wherein the multi-channel image is an en face image.
21. A system for analyzing optical coherence tomography (OCT) data, comprising: an OCT system; and a processor configured to perform the method of any one of claims 1 to 20.
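For illustration only, and without limiting the claims, the channel-stacking step recited in claim 1 can be sketched in plain Python: several same-sized 2D metric maps, each holding one value per A-scan, are combined so that every pixel carries one value per metric. The function name, the metric names used as dictionary keys, and the numeric values are hypothetical; how each metric is computed from an A-scan is not shown.

```python
def stack_channels(metric_maps):
    """Build a multi-channel image from same-sized 2D metric maps.

    metric_maps: dict mapping a metric name to a 2D list in which each
    entry is the value extracted from one A-scan.
    Returns (channel_names, image), where image[r][c] is a tuple holding
    one value per metric, in the order given by channel_names.
    """
    if not metric_maps:
        raise ValueError("at least one metric map is required")
    names = list(metric_maps)
    first = metric_maps[names[0]]
    rows, cols = len(first), len(first[0])
    for name in names:
        m = metric_maps[name]
        if len(m) != rows or any(len(row) != cols for row in m):
            raise ValueError(f"metric map {name!r} has a different size")
    image = [
        [tuple(metric_maps[n][r][c] for n in names) for c in range(cols)]
        for r in range(rows)
    ]
    return names, image


# Hypothetical 2x2 en face maps, one value per A-scan position.
maps = {
    "sub_rpe_reflectance": [[0.2, 0.9], [0.3, 0.8]],
    "retinal_thickness": [[250.0, 120.0], [240.0, 130.0]],
    "attenuation_coefficient": [[1.1, 0.4], [1.0, 0.5]],
}
names, image = stack_channels(maps)
# image[0][1] == (0.9, 120.0, 0.4): the three channel values of one pixel
```

A three-metric stack of this kind could be stored as a color image with one metric per color channel, in line with claim 19.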