Method and system for digital staining of microscopic images using deep learning
By training with deep neural networks, virtual stained microscopic images are generated using IHC and FLIM images, solving the problem of efficient imaging and diagnosis of unstained tissue samples, and achieving rapid and accurate pathological diagnosis and image processing.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- RGT UNIV OF CALIFORNIA
- Filing Date
- 2020-12-22
- Publication Date
- 2026-06-26
AI Technical Summary
Existing technologies require time-consuming and laborious manual staining processes in tissue microscopy imaging, and it is difficult to effectively utilize endogenous fluorescence features for efficient diagnosis, especially in pathology and biological sciences where imaging and diagnosis of unstained tissue samples present challenges.
The system employs deep neural networks for training, utilizing microscopic images from immunohistochemical (IHC) staining and fluorescence lifetime (FLIM) images to generate virtual stained microscopic images. Alternatively, it can autofocus and perform virtual staining using defocused and focused images from an incoherent microscope, or use various chemical staining agents for image conversion to achieve virtual destaining and staining agent transformation.
It enables the generation of microscopic images similar to those obtained by chemical staining without the need for traditional chemical staining, improving imaging efficiency, simplifying the pathological diagnosis process, enhancing image quality and focusing accuracy, and enabling rapid and accurate identification of histopathological features.
Figure CN114945954B_ABST
Description
[0001] Related applications
[0002] This application claims priority to U.S. Provisional Patent Application No. 63/058,329, filed July 29, 2020, and U.S. Provisional Patent Application No. 62/952,964, filed December 23, 2019, the entire contents of which are incorporated herein by reference. Priority is claimed pursuant to 35 U.S.C. § 119 and any other applicable regulations.
Technical Field
[0003] The technical field generally relates to methods and systems for imaging unstained (i.e., label-free) tissues. Specifically, the technical field relates to microscopy methods and systems that use deep neural network learning to digitally or virtually stain images of unstained or label-free tissues. Deep learning in neural networks, a class of machine learning algorithms, is used to digitally stain images of label-free tissue sections into images that are equivalent to microscopic images of the same samples after staining or labeling.
Background Technology
[0004] Microscopic imaging of tissue samples is a fundamental tool for diagnosing a wide range of diseases and forms the backbone of pathology and bioscience. The clinically established gold standard image of tissue sections is the result of a time-consuming and laborious process involving formalin-fixed paraffin-embedded (FFPE) tissue specimens, slicing them into thin sections (typically about 2 μm to 10 μm), labeling / staining them, and mounting them on a glass slide, followed by microscopic imaging using, for example, a bright-field microscope. All these steps use a variety of reagents and introduce irreversible effects on the tissue. Recent efforts have focused on altering this workflow using different imaging modalities. Attempts have been made to image fresh, non-paraffin-embedded tissue samples using nonlinear microscopy based on, for example, two-photon fluorescence, second harmonic generation, third harmonic generation, and Raman scattering. Other attempts have used controlled supercontinuum light sources to acquire multimodal images for chemical analysis of fresh tissue samples. These methods require the use of ultrafast lasers or supercontinuum light sources, which may not be readily available in most setups, and require relatively long scan times due to the weak light signal. In addition to these, other microscopy methods have emerged that image unsectioned tissue samples by using UV excitation on stained samples or by utilizing the fluorescence emission of biological tissue at short wavelengths.
[0005] In fact, fluorescence signals have created rare opportunities for imaging tissue samples by using fluorescence emitted from endogenous fluorophores. These endogenous fluorescence features have been shown to carry useful information that can be mapped to the functional and structural properties of biological specimens, and have therefore been widely used for diagnostic and research purposes. A major focus of these efforts is the spectroscopic study of the relationship between different biomolecules and their structural properties under different conditions. Some of these well-characterized biological components include vitamins (e.g., vitamin A, riboflavin, thiamine), collagen, coenzymes, fatty acids, and more.
[0006] While some of the techniques discussed above possess unique capabilities to differentiate cell types and subcellular components in tissue samples using various contrast mechanisms, pathologists and tumor classification software are typically trained to examine "gold standard" stained tissue samples when making diagnostic decisions. Partly motivated by this, some of these techniques have been extended to produce pseudo-hematoxylin and eosin (H&E) images based on a linear approximation that relates the fluorescence intensity of the image to the dye concentration per tissue volume, using an empirically determined constant that represents the average spectral response of the various dyes embedded in the tissue. These methods also use exogenous staining to enhance the contrast of the fluorescence signal in order to generate a virtual H&E image of the tissue sample.
Summary of the Invention
[0007] In one embodiment, a method for generating a virtual stained microscopic image of a sample includes: providing a deep neural network trained by image processing software using one or more processors of a computing device, wherein the trained deep neural network is trained using a plurality of matched immunohistochemical (IHC) stained microscopic images or image patches and corresponding fluorescence lifetime (FLIM) microscopic images or image patches of the same sample obtained prior to IHC staining. A fluorescence lifetime (FLIM) image of the sample is obtained using a fluorescence microscope and at least one excitation light source, and the FLIM image of the sample is input into the trained deep neural network. The trained deep neural network outputs a virtual stained microscopic image of the sample, which is substantially equivalent to a corresponding image of the same sample that has already been immunohistochemically stained (IHC).
[0008] In another embodiment, a method for virtual autofocusing of microscopic images of a sample obtained using an incoherent microscope includes: providing a trained deep neural network executed by image processing software using one or more processors of a computing device, wherein the trained deep neural network is trained using multiple pairs of defocused and / or focused microscopic images or image patches used as input images to the deep neural network and corresponding or matching focused microscopic images or image patches of the same sample obtained using an incoherent microscope as ground truth images for training the deep neural network. Defocused or focused images of the sample are obtained using an incoherent microscope. The defocused or focused images of the sample obtained from the incoherent microscope are then input to the trained deep neural network. The trained deep neural network outputs an output image with improved focus, which substantially matches the focused image (ground truth) of the same sample obtained by the incoherent microscope.
[0009] In another embodiment, a method for generating a virtual stained microscopic image of a sample using incoherent microscopy includes: providing a trained deep neural network executed by image processing software using one or more processors of a computing device, wherein the trained deep neural network is trained using multiple pairs of defocused and / or focused microscopic images or image patches, the multiple pairs of defocused and / or focused microscopic images or image patches serving as input images to the deep neural network, and each matching a corresponding focused microscopic image or image patch of the same sample obtained after a chemical staining process using incoherent microscopy to generate a ground truth image for training the deep neural network. A defocused or focused image of the sample is obtained using incoherent microscopy, and the defocused or focused image of the sample obtained from the incoherent microscopy is input to the trained deep neural network. The trained deep neural network outputs an output image of the sample, which has improved focus and is virtually stained to be substantially similar to and matches the chemically stained focused image of the same sample obtained after a chemical staining process using incoherent microscopy.
[0010] In another embodiment, a method for generating a virtual stained microscopic image of a sample includes: providing a trained deep neural network executed by image processing software using one or more processors of a computing device, wherein the trained deep neural network is trained using multiple pairs of stained microscopic images or image patches that are either virtually stained by at least one algorithm or chemically stained with a first staining agent type, each matched with a corresponding stained microscopic image or image patch of the same sample that is either virtually stained by at least one algorithm or chemically stained with another, different staining agent type, the latter constituting the ground truth images for training the deep neural network to convert an input image that is histochemically or virtually stained with the first staining agent type into an output image that is virtually stained with a second staining agent type. An input image of the sample that is histochemically or virtually stained with the first staining agent type is obtained. This input image is input to the trained deep neural network, which converts it into an output image that is virtually stained with the second staining agent type. The output image of the trained deep neural network is virtually stained so that it is substantially similar to, and matches, a chemically stained image of the same sample obtained by incoherent microscopy after a chemical staining process using the second staining agent type.
[0011] In another embodiment, a method for generating virtual stained microscopic images of samples with multiple different staining agents using a single trained deep neural network includes: providing a trained deep neural network executed by image processing software using one or more processors of a computing device, wherein the trained deep neural network is trained using multiple matched chemically stained microscopic images or image patches, used as ground truth images for training the deep neural network, and corresponding microscopic images or image patches of the same sample obtained prior to chemical staining, used as input images for training the deep neural network. A fluorescence image of the sample is obtained using a fluorescence microscope and at least one excitation light source. One or more class condition matrices are applied to condition the trained deep neural network. The fluorescence image of the sample, along with the one or more class condition matrices, is input to the trained deep neural network. The trained and conditioned deep neural network outputs a virtual stained microscopic image of the sample with one or more different staining agents, wherein the output image or a sub-region thereof is substantially equivalent to a corresponding microscopic image or image sub-region of the same sample histochemically stained with the corresponding one or more different staining agents.
[0012] In another embodiment, a method for generating virtual stained microscopic images of samples with multiple different staining agents using a single trained deep neural network includes: providing a trained deep neural network executed by image processing software using one or more processors of a computing device, wherein the trained deep neural network is trained using multiple matched microscopic images or image patches chemically stained with multiple chemical staining agents and corresponding microscopic images or image patches of the same sample obtained prior to chemical staining. An input image of the sample is obtained using a microscope. One or more class condition matrices are applied to condition the trained deep neural network. The input image of the sample, together with the one or more class condition matrices, is input to the trained deep neural network. The trained and conditioned deep neural network outputs a virtual stained microscopic image of the sample with one or more different staining agents, wherein the output image or a sub-region thereof is substantially equivalent to a corresponding microscopic image or image sub-region of the same sample histochemically stained with the corresponding one or more different staining agents.
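The class-conditioned input described in the two embodiments above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes each class condition matrix has the same height and width as the input image, that one matrix is used per staining agent, and that the matrices are concatenated to the image as extra channels. The staining agent list `STAINS` and the function names are hypothetical.

```python
import numpy as np

# Hypothetical staining agent vocabulary; the order fixes each agent's channel index.
STAINS = ("H&E", "Jones", "Masson")

def class_condition_matrices(height, width, stain):
    """Return an array of shape (height, width, len(STAINS)) with a one-hot channel."""
    cond = np.zeros((height, width, len(STAINS)), dtype=np.float32)
    cond[..., STAINS.index(stain)] = 1.0  # all pixels request the same staining agent
    return cond

def conditioned_input(image, stain):
    """Concatenate the class condition matrices to the image along the channel axis."""
    image = np.asarray(image, dtype=np.float32)
    if image.ndim == 2:  # promote a single-channel image to (H, W, 1)
        image = image[..., np.newaxis]
    cond = class_condition_matrices(image.shape[0], image.shape[1], stain)
    return np.concatenate([image, cond], axis=-1)

# Stand-in for an autofluorescence image of an unstained sample.
autofluorescence = np.random.default_rng(0).random((64, 64))
network_input = conditioned_input(autofluorescence, "Jones")
print(network_input.shape)
```

Because the condition is defined per pixel, the same layout could also request different staining agents in different sub-regions of the image by filling the condition channels region by region.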
[0013] In another embodiment, a method for generating a virtual destained microscopic image of a sample includes: providing a first trained deep neural network executed by image processing software using one or more processors of a computing device, wherein the first trained deep neural network is trained using a plurality of matched chemically stained microscopic images or image patches, used as training inputs to the deep neural network, and corresponding unstained microscopic images or image patches of the same one or more samples obtained prior to chemical staining, which constitute the ground truth during the training of the deep neural network. A microscopic image of the chemically stained sample is obtained using a microscope. The image of the chemically stained sample is input to the first trained deep neural network. The first trained deep neural network outputs a virtual destained microscopic image of the sample, which is substantially equivalent to a corresponding image of the same sample obtained before any chemical staining or without any chemical staining.
Attached Figure Description
[0014] Figure 1 schematically illustrates a system for generating a digitally/virtually stained output image of a sample from an unstained microscope image of the sample, according to one embodiment.
[0015] Figure 2 shows a schematic representation of a deep learning-based digital/virtual histological staining procedure using fluorescence images of unstained tissue.
[0016] Figures 3A to 3H show the digital/virtual staining results, matched with chemically stained H&E samples. The first two (2) columns (Figure 3A and Figure 3E) show autofluorescence images of unstained salivary gland tissue sections (used as input to the deep neural network), and the third column (Figure 3C and Figure 3G) shows the digital/virtual staining results. The last column (Figure 3D and Figure 3H) shows bright-field images of the same tissue sections after the histochemical staining process. Evaluation of both Figure 3C and Figure 3D demonstrates an island of infiltrating tumor cells within the subcutaneous fibro-adipose tissue. Note that the nuclear detail, including the distinction between nucleoli (arrows in Figure 3C and Figure 3D) and chromatin texture, is clearly appreciated in both panels. Similarly, in Figure 3G and Figure 3H, the H&E staining demonstrates infiltrating squamous cell carcinoma. The desmoplastic reaction with edematous myxoid change in the adjacent stroma (asterisk in Figure 3G and Figure 3H) is clearly identifiable in both staining methods/panels.
[0017] Figures 4A to 4H show the digital/virtual staining results, matched with chemically stained Jones samples. The first two (2) columns (Figure 4A and Figure 4E) show autofluorescence images of unstained kidney tissue sections (used as input to the deep neural network), and the third column (Figure 4C and Figure 4G) shows the digital/virtual staining results. The last column (Figure 4D and Figure 4H) shows bright-field images of the same tissue sections after the histochemical staining process.
[0018] Figures 5A to 5P show the digital/virtual staining results, matched with Masson's trichrome stain, for liver and lung tissue sections. The first two (2) columns show autofluorescence images of the unstained liver tissue sections (first and second rows: Figure 5A, Figure 5B, Figure 5E, Figure 5F) and the unstained lung tissue sections (third and fourth rows: Figure 5I, Figure 5J, Figure 5M, Figure 5N) used as input to the deep neural network. The third column (Figure 5C, Figure 5G, Figure 5K, Figure 5O) shows the digital/virtual staining results for these tissue samples. The last column (Figure 5D, Figure 5H, Figure 5L, Figure 5P) shows bright-field images of the same tissue sections after the histochemical staining process.
[0019] Figure 6A shows a graph of the combined loss function versus the number of iterations for random initialization and for transfer learning initialization. Figure 6A demonstrates how transfer learning achieves superior convergence: a new deep neural network is initialized using the weights and biases learned from salivary gland tissue sections in order to perform virtual H&E staining of thyroid tissue. Compared to random initialization, transfer learning converges faster and reaches a lower local minimum.
[0020] Figure 6B shows the network output images at different stages of the learning process for both random initialization and transfer learning, to better illustrate the effect of transfer learning when translating the presented method to a new tissue/stain combination.
[0021] Figure 6C shows the corresponding bright-field image of the H&E chemically stained sample.
[0022] Figure 7A shows virtual staining (H&E staining) of skin tissue using only the DAPI channel.
[0023] Figure 7B shows virtual staining (H&E staining) of skin tissue using the DAPI and Cy5 channels. Cy5 refers to a far-red fluorescent cyanine dye used to label biomolecules.
[0024] Figure 7C shows the corresponding histologically stained (i.e., H&E chemically stained) tissue.
[0025] Figure 8 shows the process of field-of-view matching and registration of the autofluorescence images of unstained tissue samples with the bright-field images of the same samples acquired after chemical staining.
[0026] Figure 9 schematically illustrates the training process of the virtual staining network using a GAN.
[0027] Figure 10 shows the generative adversarial network (GAN) architecture for the generator and the discriminator, according to one embodiment.
[0028] Figure 11A illustrates machine learning-based virtual IHC staining using autofluorescence and fluorescence lifetime images of unstained tissue. Once the deep neural network model has been trained, the trained deep neural network rapidly outputs a virtually stained tissue image in response to an autofluorescence lifetime image of an unstained tissue section, bypassing the standard IHC staining procedure used in histology.
[0029] Figure 11B shows the generative adversarial network (GAN) architecture for the generator (G) and the discriminator (D) used for fluorescence lifetime imaging (FLIM), according to one embodiment.
[0030] Figure 12 shows the digital/virtual staining results matched with HER2 staining. The two leftmost columns show the autofluorescence intensity images (first column) and lifetime images (second column) of unstained human breast tissue sections (used as input to the deep neural network 10) for two excitation wavelengths, and the middle column shows the virtual staining results. The last column shows bright-field images of the same tissue sections after the IHC staining procedure.
[0031] Figure 13 shows a comparison between standard (prior art) autofocus methods and the virtual focusing method disclosed herein. Top right: standard autofocus methods require acquiring multiple images so that an autofocus algorithm can select the most in-focus image according to predefined criteria. In contrast, the disclosed method requires only a single aberrated image, which it virtually refocuses using a trained deep neural network.
[0032] Figure 14 demonstrates the refocusing capability of the post-imaging computational autofocusing method for various imaging planes of a sample, where z denotes the focal distance from the in-focus plane, which is located at z = 0 (the reference plane). The refocusing capability of the network can be evaluated qualitatively and quantitatively using the structural similarity index shown in the upper left corner of each panel, which compares that image with the reference in-focus image (at z = 0).
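As an illustration of the structural similarity index mentioned for Figure 14, the following sketch computes a simplified, single-window (global) version of the index between a reference in-focus image and a degraded image. It is an assumption-laden example: widely used implementations average the index over local sliding windows, and the constants `k1 = 0.01` and `k2 = 0.03` are the commonly used defaults rather than values taken from this document. The 2×2 averaging used as a stand-in for defocus is also purely illustrative.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Structural similarity computed over the whole image as a single window."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError("images must have the same shape")
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)

rng = np.random.default_rng(0)
reference = rng.random((32, 32))  # stand-in for the in-focus image at z = 0
# Crude stand-in for a defocused image: average each pixel with three neighbors.
blurred = (reference
           + np.roll(reference, 1, axis=0)
           + np.roll(reference, 1, axis=1)
           + np.roll(reference, (1, 1), axis=(0, 1))) / 4.0

print(global_ssim(reference, reference))
print(global_ssim(reference, blurred))
```

An image compared with itself scores 1, and the score drops as the compared image departs from the reference, which is why the index can serve as a per-panel measure of refocusing quality.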
[0033] Figure 15 shows a schematic diagram of a machine learning-based approach that uses class conditions to apply multiple staining agents to a microscope image in order to generate a virtually stained image.
[0034] Figure 16 illustrates a display, according to one embodiment, showing a graphical user interface (GUI) used to display an image generated by the trained deep neural network. Different regions or sub-regions of the image are highlighted, and these regions or sub-regions can be virtually stained with different staining agents. This can be done manually, automatically by the image processing software, or by some combination of the two (e.g., a hybrid method).
[0035] Figure 17 shows an example of staining agent micro-structuring. A diagnostician can manually label sections of the unstained tissue. These labels are used by the network to stain different regions of the tissue with the desired staining agents. A co-registered image of the histochemically stained H&E tissue is shown for comparison.
[0036] Figures 18A to 18G show an example of staining agent blending. Figure 18A shows the autofluorescence image used as input to the machine learning algorithm. Figure 18B shows a co-registered image of the histochemically stained H&E tissue for comparison. Figure 18C shows kidney tissue virtually stained with H&E (without Jones). Figure 18D shows kidney tissue virtually stained with the Jones staining agent (without H&E). Figure 18E shows kidney tissue virtually stained with H&E:Jones at an input class condition ratio of 3:1. Figure 18F shows kidney tissue virtually stained with H&E:Jones at an input class condition ratio of 1:1. Figure 18G shows kidney tissue virtually stained with H&E:Jones at an input class condition ratio of 1:3.
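The input class condition ratios listed for Figures 18E to 18G can be pictured with the sketch below. It assumes, for illustration only, that a mixing ratio such as 3:1 is normalized so that the per-pixel condition values sum to one and that every pixel receives the same mixture; the function name `blended_condition` is hypothetical and the actual encoding used by the network is not stated in this section.

```python
import numpy as np

def blended_condition(height, width, ratio):
    """Build per-pixel class condition matrices from a staining agent mixing ratio.

    ratio: one non-negative weight per staining agent, e.g. (3, 1) for H&E:Jones = 3:1.
    Returns an array of shape (height, width, len(ratio)) whose channels sum to 1.
    """
    weights = np.asarray(ratio, dtype=np.float32)
    if weights.ndim != 1 or weights.size == 0:
        raise ValueError("ratio must be a non-empty 1-D sequence")
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("ratio weights must be non-negative and not all zero")
    weights = weights / weights.sum()  # 3:1 becomes (0.75, 0.25)
    return np.broadcast_to(weights, (height, width, weights.size)).copy()

# Ratios corresponding to H&E only, 3:1, 1:1, 1:3, and Jones only.
for ratio in [(1, 0), (3, 1), (1, 1), (1, 3), (0, 1)]:
    cond = blended_condition(4, 4, ratio)
    print(ratio, cond[0, 0])
```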
[0037] Figure 19A and Figure 19B show another embodiment for virtually destaining images obtained from a microscope (e.g., converting a stained image into a destained image). An optional machine learning-based operation is also shown that restains the image with a chemical staining agent different from that of the image obtained from the microscope. This illustrates virtual destaining and restaining.
[0038] Figure 20A shows a virtual staining network that can generate both H&E and special staining agent images.
[0039] Figure 20B shows an embodiment of a style transfer network (e.g., a CycleGAN network) used to transfer styles to virtually and/or histochemically stained images.
[0040] Figure 20C shows the method used to train the staining transformation network (10_stainTN). During training, the staining transformation network is randomly given either the virtually stained H&E tissue image or an image of the same field of view after it has passed through one of K = 8 style transfer networks (10_styleTN). The perfectly matched tissue image virtually stained with the desired special staining agent (in this case, PAS) is used as the ground truth for training the neural network.
[0041] Figure 21 schematically illustrates the transformations performed by the individual networks during the training phase of the style transformation (e.g., CycleGAN).
[0042] Figure 22A shows the structure of the first generator network G(x) used for style transformation.
[0043] Figure 22B shows the structure of the second generator network F(y) used for style transformation.
Detailed Implementation
[0044] Figure 1 schematically illustrates an embodiment of a system 2 for outputting a digitally stained image 40 from an input microscopic image 20 of a sample 22. As explained herein, the input image 20 is a fluorescence image 20 of a sample 22 (such as tissue in one embodiment) that has not been stained or labeled with a fluorescent staining agent or marker. That is, the input image 20 is an autofluorescence image 20 of the sample 22, wherein the fluorescence emitted by the sample 22 is the result of one or more endogenous fluorophores or other endogenous emitters of frequency-shifted light contained therein. Frequency-shifted light is light emitted at a different frequency (or wavelength) than the incident frequency (or wavelength). The endogenous fluorophores or endogenous emitters of frequency-shifted light may include molecules, compounds, complexes, molecular species, biomolecules, pigments, tissues, etc. In some embodiments, the input image 20 (e.g., the raw fluorescence image) undergoes one or more linear or nonlinear preprocessing operations selected from contrast enhancement, contrast reversal, and image filtering. The system includes a computing device 100 containing one or more processors 102 and image processing software 104 that incorporates the trained deep neural network 10 (e.g., a convolutional neural network as explained in one or more embodiments herein). As explained herein, the computing device 100 may include a personal computer, laptop computer, mobile computing device, remote server, etc., although other computing devices may be used (e.g., devices incorporating one or more graphics processing units (GPUs) or other application-specific integrated circuits (ASICs)). GPUs or ASICs may be used to accelerate training as well as final image output. The computing device 100 may be associated with or connected to a monitor or display 106 for displaying the digitally stained image 40. The display 106 may be used to display a graphical user interface (GUI) that a user uses to display and view the digitally stained image 40. In one embodiment, a user may manually trigger or toggle between multiple different digital/virtual stains for a particular sample 22 using, for example, the GUI. Alternatively, triggering or toggling between different stains may be performed automatically by the computing device 100. In a preferred embodiment, the trained deep neural network 10 is a convolutional neural network (CNN).
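The three preprocessing operations named above (contrast enhancement, contrast reversal, and image filtering) can be sketched as follows. The specific choices here are assumptions made for illustration and are not stated in this document: percentile-based linear stretching for contrast enhancement, inversion of a [0, 1] image for contrast reversal, and a 3×3 mean filter for image filtering. All function names and parameter values are hypothetical.

```python
import numpy as np

def enhance_contrast(image, low_pct=1.0, high_pct=99.0):
    """Linearly stretch intensities between two percentiles to the range [0, 1]."""
    image = np.asarray(image, dtype=np.float32)
    lo, hi = np.percentile(image, [low_pct, high_pct])
    if hi <= lo:  # flat image: nothing to stretch
        return np.zeros_like(image)
    return np.clip((image - lo) / (hi - lo), 0.0, 1.0).astype(np.float32)

def reverse_contrast(image):
    """Invert an image that is already scaled to [0, 1]."""
    return 1.0 - np.asarray(image, dtype=np.float32)

def mean_filter3(image):
    """Apply a 3x3 mean filter to a 2-D image, replicating edge pixels."""
    image = np.asarray(image, dtype=np.float32)
    padded = np.pad(image, 1, mode="edge")
    height, width = image.shape
    out = np.zeros((height, width), dtype=np.float32)
    for dy in range(3):  # sum the nine shifted copies of the padded image
        for dx in range(3):
            out += padded[dy:dy + height, dx:dx + width]
    return out / 9.0

raw = np.random.default_rng(0).random((32, 32)) * 4095.0  # stand-in raw image
preprocessed = mean_filter3(enhance_contrast(raw))
reversed_image = reverse_contrast(enhance_contrast(raw))
print(preprocessed.shape, float(preprocessed.min()), float(preprocessed.max()))
```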
[0045] For example, in a preferred embodiment described herein, the trained deep neural network 10 is trained using a GAN model. In a GAN-trained deep neural network 10, two models are used for training. A generative model captures the data distribution, while a second model estimates the probability that a sample came from the training data rather than from the generative model. Details about GANs can be found in Goodfellow et al., Generative Adversarial Nets, Advances in Neural Information Processing Systems, 27, pp. 2672–2680 (2014), which is incorporated herein by reference. Network training of the deep neural network 10 (e.g., GAN) may be performed on the same or a different computing device 100. For example, in one embodiment, a personal computer can be used to train the GAN, although such training may take a considerable amount of time. To accelerate the training process, one or more dedicated GPUs can be used for training. As explained herein, such training and testing were performed on GPUs obtained from commercially available graphics cards. Once the deep neural network 10 has been trained, it can be used or executed on a different computing device 110, which may include one with fewer computing resources than were used for the training process (although GPUs may also be integrated into the execution of the trained deep neural network 10).
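The two-model training idea described above can be made concrete with the adversarial loss terms from the cited Goodfellow et al. paper. The sketch below is a generic illustration that operates on discriminator output probabilities; it assumes the commonly used non-saturating form of the generator loss and does not reproduce the combined loss function used for the networks in this document, which is not specified in this section.

```python
import numpy as np

EPS = 1e-12  # keeps the logarithm finite when a probability is exactly 0 or 1

def discriminator_loss(d_real, d_fake):
    """Cross-entropy loss: real images should score 1, generated images 0.

    d_real: discriminator probabilities for real (e.g., chemically stained) images.
    d_fake: discriminator probabilities for generator outputs.
    """
    d_real = np.asarray(d_real, dtype=np.float64)
    d_fake = np.asarray(d_fake, dtype=np.float64)
    return float(-(np.log(d_real + EPS).mean() + np.log(1.0 - d_fake + EPS).mean()))

def generator_loss(d_fake):
    """Non-saturating generator loss: generator outputs should score 1."""
    d_fake = np.asarray(d_fake, dtype=np.float64)
    return float(-np.log(d_fake + EPS).mean())

# A confident discriminator (real scored high, generated scored low) has a low loss,
# while the generator's loss is high until its outputs start to fool the discriminator.
print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))
print(generator_loss([0.1, 0.2]))
print(generator_loss([0.9, 0.8]))
```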
[0046] The image processing software 104 can be implemented using Python and TensorFlow, although other software packages and platforms can be used. The trained deep neural network 10 is not limited to a specific software platform or programming language, and can be executed using any number of commercially available software languages or platforms. The image processing software 104, which incorporates or runs in coordination with the trained deep neural network 10, can run in a local environment or a cloud-based environment. In some embodiments, some functions of the image processing software 104 (e.g., image normalization) may run in one particular language or platform, while the trained deep neural network 10 may run in another particular language or platform. Nevertheless, both operations are performed by the image processing software 104.
[0047] As shown in Figure 1, in one embodiment, the trained deep neural network 10 receives a single fluorescence image 20 of an unlabeled sample 22. In other embodiments, for example, when multiple excitation channels are used (see the discussion of melanin herein), multiple fluorescence images 20 of the unlabeled sample 22 may be input to the trained deep neural network 10 (e.g., one image per channel). The fluorescence images 20 may include a wide-field fluorescence image 20 of the unlabeled tissue sample 22. Wide-field here indicates that a wide field of view (FOV) is obtained by scanning a smaller FOV, with the wide FOV in the size range of 10 to 2,000 mm². For example, smaller FOVs can be obtained using a scanning fluorescence microscope 110, and the image processing software 104 digitally stitches the smaller FOVs together to create the wider FOV. A wide FOV can be used, for example, to obtain whole-slide images (WSI) of the sample 22. The fluorescence images are obtained using the imaging device 110. For the fluorescence embodiments described herein, this may include a fluorescence microscope 110. The fluorescence microscope 110 includes at least one excitation light source that illuminates the sample 22 and one or more image sensors (e.g., CMOS image sensors) for capturing the fluorescence emitted by the fluorophores or other endogenous emitters of frequency-shifted light contained in the sample 22. In some embodiments, the fluorescence microscope 110 may be able to illuminate the sample 22 with excitation light at multiple different wavelengths or wavelength ranges/bands. This can be achieved using multiple different light sources and/or different filter sets (e.g., standard UV or near-UV excitation/emission filter sets). Furthermore, in some embodiments, the fluorescence microscope 110 may include multiple filter sets that can filter different emission bands. For example, in some embodiments, multiple fluorescence images 20 can be captured, each in a different emission band using a different filter set.
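The digital stitching of smaller FOVs into a wider FOV can be pictured with the simplified sketch below. It assumes equally sized, already registered, non-overlapping tiles acquired in row-major order on a regular grid; practical stitching of scanned microscope tiles typically also has to handle overlap and registration between neighboring tiles, which is omitted here. The function name `stitch_grid` is hypothetical.

```python
import numpy as np

def stitch_grid(tiles, grid_shape):
    """Assemble equally sized tiles (row-major order) into one wide-FOV image."""
    rows, cols = grid_shape
    if len(tiles) != rows * cols:
        raise ValueError("number of tiles does not match the grid shape")
    if any(tile.shape != tiles[0].shape for tile in tiles):
        raise ValueError("all tiles must have the same shape")
    # Join the tiles of each grid row side by side, then stack the rows vertically.
    row_strips = [
        np.concatenate(tiles[r * cols:(r + 1) * cols], axis=1) for r in range(rows)
    ]
    return np.concatenate(row_strips, axis=0)

# Six small tiles, each filled with its acquisition index, laid out on a 2 x 3 grid.
tiles = [np.full((2, 3), index, dtype=np.float32) for index in range(6)]
wide_fov = stitch_grid(tiles, (2, 3))
print(wide_fov.shape)
print(wide_fov)
```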
[0048] In some embodiments, the sample 22 may include a portion of tissue disposed on or in a substrate 23. In some embodiments, the substrate 23 may include an optically transparent substrate (e.g., a glass or plastic slide). The sample 22 may include tissue sections cut into thin slices using a microtome or similar device. Thin sections of tissue 22 can be considered weakly scattering phase objects with limited amplitude contrast modulation under bright-field illumination. The sample 22 may be imaged with or without a coverslip. The sample may involve frozen sections or paraffin (wax) sections. The tissue sample 22 may be fixed (e.g., using formalin) or unfixed. The tissue sample 22 may include mammalian (e.g., human or animal) tissue or plant tissue. The sample 22 may also include other biological samples, environmental samples, etc. Examples include particles, cells, organelles, pathogens, parasites, fungi, or other microscopic objects of interest (with micrometer-scale dimensions or smaller). The sample 22 may include smears of biological fluids or tissues, for example blood smears and Papanicolaou (Pap) smears. As explained herein, in fluorescence-based embodiments, the sample 22 includes one or more naturally occurring or endogenous fluorophores whose fluorescence emission is captured by the fluorescence microscope 110. Most plant and animal tissues exhibit some degree of autofluorescence when excited with ultraviolet or near-ultraviolet light. By way of illustration, endogenous fluorophores may include proteins such as collagen and elastin, as well as fatty acids, vitamins, riboflavin, porphyrin, lipofuscin, and coenzymes (e.g., NAD(P)H). In some alternative embodiments, exogenously added fluorescent labels or other exogenous light emitters may also be added (for training the deep neural network 10, for testing new samples 12, or both). As explained herein, the sample 22 may also contain other endogenous emitters of frequency-shifted light.
[0049] In response to input image 20, the trained deep neural network 10 outputs or generates a digitally stained or labeled output image 40. The output image 40 is "stained" digitally: the staining is integrated into the image by the trained deep neural network 10 rather than applied to the tissue. In some embodiments, such as those involving tissue sections, the digitally stained output image 40 appears to a skilled observer (e.g., a trained histopathologist) essentially equivalent to a corresponding bright-field image of the same tissue section sample 22 after chemical staining. Indeed, as explained herein, experimental results obtained using the trained deep neural network 10 show that trained pathologists were able to identify histopathological features with both staining techniques (chemical staining versus digital / virtual staining), with a high degree of agreement between the techniques and without a clear preference for either (virtual versus histological). The digitally or virtually stained tissue section sample 22 appears as if it had undergone histochemical staining, even though no such staining operation was performed.
[0050] Figure 2 schematically illustrates the operations involved in a typical fluorescence-based implementation. As shown in Figure 2, a sample 22, such as an unstained tissue section, is obtained. The sample may be obtained from living tissue, for example via a biopsy B. The unstained tissue section sample 22 is then imaged using a fluorescence microscope 110 to generate a fluorescence image 20. This fluorescence image 20 is input into a trained deep neural network 10, which rapidly outputs a digitally stained image 40 of the tissue section sample 22. This digitally stained image 40 closely resembles a bright-field image of the same tissue section sample 22 after histochemical staining. Figure 2 also illustrates (using dashed arrows) the conventional procedure, in which the tissue section sample 22 undergoes histochemical or immunohistochemical (IHC) staining 44, followed by conventional bright-field microscopy 46 to generate a conventional bright-field image 48 of the stained tissue section sample 22. As shown in Figure 2, the digitally stained image 40 is very similar to the actual chemically stained image 48. Similar resolution and color profiles are obtained using the digital staining platform described herein. As shown in Figure 1, the digitally stained image 40 can be displayed on a computer monitor 106, but it should be understood that the digitally stained image 40 can be displayed on any suitable display (e.g., computer monitor, tablet computer, mobile computing device, mobile phone, etc.). A GUI can be displayed on the computer monitor 106, allowing the user to view the digitally stained image 40 and optionally interact with it (e.g., zoom, crop, highlight, mark, adjust exposure, etc.).
[0051] In one embodiment, fluorescence microscope 110 acquires a fluorescence lifetime image of an unstained tissue sample 22, and the trained deep neural network 10 outputs an image 40 that matches well with a bright-field image 48 of the same field of view after IHC staining. Fluorescence lifetime imaging (FLIM) generates an image based on differences in the excited-state decay rate of a fluorescent sample. Thus, FLIM is a fluorescence imaging technique in which contrast is based on the lifetime or decay of individual fluorophores. Fluorescence lifetime is generally defined as the average time a molecule or fluorophore remains in an excited state before returning to its ground state by emitting a photon. Of all the inherent properties of unlabeled tissue samples, the fluorescence lifetime of endogenous fluorophores is one of the most informative channels.
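By way of a non-limiting illustration of how a lifetime value might be derived for a single pixel, the following minimal Python sketch assumes a mono-exponential decay, I(t) = I0 · exp(−t / τ), and estimates τ with a log-linear least-squares fit. The decay model, the function name `estimate_lifetime`, and the synthetic data are assumptions made for illustration only; the description does not specify how the microscope 110 computes the lifetime image.

```python
import numpy as np

def estimate_lifetime(times_ns, counts):
    """Estimate a mono-exponential lifetime (in ns) from a photon-count decay curve."""
    times_ns = np.asarray(times_ns, dtype=float)
    counts = np.asarray(counts, dtype=float)
    valid = counts > 0                       # the logarithm is undefined for empty bins
    # ln I(t) = ln I0 - t / tau, so the fitted slope equals -1 / tau.
    slope, _ = np.polyfit(times_ns[valid], np.log(counts[valid]), 1)
    return -1.0 / slope

# Synthetic, noise-free decay with a hypothetical lifetime of 2.5 ns.
t = np.linspace(0.0, 10.0, 50)
decay = 1000.0 * np.exp(-t / 2.5)
tau = estimate_lifetime(t, decay)
```

Repeating such a fit for every pixel would yield a lifetime map; real photon-counting data are noisy and often multi-exponential, so a practical system may use a different estimator.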
[0052] It is well known that the lifetime of endogenous fluorescent emitters, such as flavin adenine dinucleotide (FAD) and nicotinamide adenine dinucleotide (NAD+ or NADH), depends on the surrounding chemical environment, such as the abundance of oxygen, and therefore indicates physiological and biological changes within tissue that are not readily apparent in bright-field or fluorescence microscopy. Although the existing literature has confirmed a close correlation between lifetime changes and the benign or cancerous state of tissue, there has been a lack of cross-modal image transformation methods that would allow pathologists or computer software to diagnose disease in unlabeled tissue based on the color contrast on which they were trained. In this embodiment of the invention, a machine learning algorithm (i.e., a trained deep neural network 10) enables virtual IHC staining of unstained tissue samples 22 based on fluorescence lifetime imaging. Using this method, the laborious and time-consuming IHC staining procedure can be replaced by virtual staining, which is significantly faster and preserves the tissue for further analysis.
[0053] In one embodiment, the neural network 10 is trained using lifetime (e.g., decay time) fluorescence images 20 of the unstained sample 22, each paired with a ground truth image 48, which is a bright-field image of the same field of view after IHC staining. In another embodiment, the neural network 10 can also be trained using a combination of lifetime fluorescence images 20 and fluorescence intensity images 20. Once the neural network 10 has converged (i.e., been trained), it can be used to perform blind inference on new lifetime images 20 from unstained tissue samples 22 and convert them to the equivalent of bright-field images 40 after staining, without any parameter tuning, as shown in Figure 11A and Figure 11B.
[0054] To train the artificial neural network 10, a generative adversarial network (GAN) framework was used to perform virtual staining. The training dataset consisted of autofluorescence (endogenous fluorophore) lifetime images 20 of multiple tissue sections 22 at single or multiple excitation and emission wavelengths. The samples 22 were scanned using a standard fluorescence microscope 110 with photon counting capability, which outputs a fluorescence intensity image 20I and a lifetime image 20L for each field of view. The tissue samples 22 were also sent to a pathology laboratory for IHC staining and then scanned with a bright-field microscope to generate the ground truth training images 48. Fluorescence lifetime images 20L and bright-field images 48 of the same field of view were paired. The training dataset consisted of thousands of such pairs 20L, 48, which were used as the input and the target output, respectively, for training the network 10. Typically, the artificial neural network model 10 converged after approximately 30 hours on two Nvidia 1080 Ti GPUs. Once the neural network 10 had converged, the method was able to perform virtual IHC staining of unlabeled tissue sections 22 in real time, as shown in Figure 12. Note that the deep neural network 10 can be trained using both the fluorescence lifetime image 20L and the fluorescence intensity image 20I, as shown in Figure 11B.
[0055] In another embodiment, a trained deep neural network 10a is provided, which takes an aberrated and / or defocused input image 20 and outputs a corrected image 20a that substantially matches a focused image of the same field of view. A key step for high-quality and rapid microscopic imaging, for example of tissue sample 22, is autofocusing. Autofocusing is typically performed using a combination of optical and algorithmic methods. These methods are time-consuming because they image sample 22 at multiple focal depths. The growing demand for higher-throughput microscopy requires making more assumptions about the profile of the specimen. In other words, the accuracy typically gained through acquisition at multiple focal depths is sacrificed by assuming that the profile of the specimen is uniform across adjacent fields of view. This type of assumption often leads to focusing errors. These errors may require re-imaging of the specimen, which is not always feasible, for example in life science experiments. In digital pathology, such focusing errors can delay the diagnosis of a patient's disease.
[0056] In this particular embodiment, a trained deep neural network 10a is used to perform computational autofocusing after image acquisition for incoherent imaging modalities. It can therefore be used with images obtained through fluorescence microscopy and other imaging modalities. Examples include fluorescence microscopy, wide-field microscopy, super-resolution microscopy, confocal microscopy, confocal microscopy with single-photon or multiphoton-excited fluorescence, second-harmonic or high-harmonic generation fluorescence microscopy, light-sheet microscopy, FLIM microscopy, bright-field microscopy, dark-field microscopy, structured light illumination microscopy, total internal reflection microscopy, computational microscopy, polarizing microscopy, synthetic aperture-based microscopy, or phase-contrast microscopy. In some embodiments, the output of the trained deep neural network 10a is a modified input image 20a that is in focus or more focused than the original input image 20. This modified input image 20a with improved focus is then input into a separately trained deep neural network 10, as described herein, which transforms it from a first image modality to a second image modality (e.g., fluorescence microscopy to bright-field microscopy). In this case, the trained deep neural networks 10a and 10 are coupled together in a "daisy-chain" configuration, with the output of the trained autofocus neural network 10a serving as the input to the deep neural network 10 trained for digital / virtual staining. In another embodiment, the machine learning algorithm used to train the deep neural network 10 combines the autofocus function with the function of transforming an image from one microscope modality to another, as described herein. In the latter embodiment, two separately trained deep neural networks are not required. Instead, a single trained deep neural network 10a is provided to perform both virtual autofocus and digital / virtual staining. The functionality of both networks 10 and 10a is combined into a single network 10a. This deep neural network 10a follows the architecture described herein.
[0057] Whether this method is implemented in a single trained deep neural network 10a or in multiple trained deep neural networks 10, 10a, a virtually stained image 40 can be generated even from a defocused input image 20, which increases the scanning speed when imaging sample 22. The deep neural network 10a is trained using images obtained at different focal depths, while the output (which, depending on the implementation, is either the refocused input image 20 or the virtually stained image 40) is a focused image of the same field of view. Images used for training are acquired using a standard optical microscope. For training the deep neural network 10a, a "gold standard" or "ground truth" image is paired with various defocused or aberrated images. The gold standard / ground truth image used for training may include a focused image of sample 22 that can be identified, for example, by any number of focus criteria (e.g., sharp edges or other features). The "gold standard" image may also include an extended depth-of-field (EDOF) image, which is a composite focused image based on multiple images that provides a focused view over a greater depth of field. For training the deep neural network 10a, some training images may themselves be focused images. A combination of defocused and focused images can be used to train the deep neural network 10a.
[0058] After the training phase, the deep neural network 10a can be used to refocus an aberrated image from a single defocused image, as shown in Figure 13, in contrast to standard autofocus techniques that require acquiring multiple images at multiple depth planes. As shown in Figure 13, a single defocused image 20d obtained from microscope 110 is input into the trained deep neural network 10a, which generates a focused image 20f. In one embodiment explained herein, this focused image 20f can then be input into the trained deep neural network 10. Alternatively, the functionality (virtual staining) of the deep neural network 10 can be combined with the autofocus function in a single deep neural network 10a.
[0059] To train the deep neural network 10a, a generative adversarial network (GAN) can be used to perform virtual focusing. The training dataset consists of autofluorescence (endogenous fluorophore) images of multiple tissue sections for multiple excitation and emission wavelengths. In another embodiment, the training images can be other microscope modalities (e.g., bright-field microscopes, super-resolution microscopes, confocal microscopes, light-sheet microscopes, FLIM microscopes, wide-field microscopes, dark-field microscopes, structured light illumination microscopes, computational microscopes, polarizing microscopes, synthetic aperture-based microscopes, or total internal reflection microscopes, as well as phase-contrast microscopes).
[0060] The sample was scanned using an Olympus microscope, and a stack of 21 image layers with an axial spacing of 0.5 μm was acquired at each field of view (in other embodiments, different numbers of images can be obtained at different axial spacings). Defocused and focused images of the same field of view were paired. The training dataset consisted of thousands of such pairs, which were used as input and output for network training. Training on approximately 30,000 image pairs took approximately 30 hours on an Nvidia 2080 Ti GPU. After training the deep neural network 10a, the method was able to refocus images 20d of the sample 22 at multiple defocus distances into focused images 20f, as shown in Figure 14.
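As one possible way to assemble such defocused / focused training pairs from a z-stack, the following Python sketch scores every plane with a gradient-energy focus measure, takes the sharpest plane as the ground truth, and records each plane's signed defocus distance. The focus measure, the function names `sharpness` and `make_focus_pairs`, and the synthetic stack (in which reduced contrast merely stands in for defocus blur) are illustrative assumptions; the description only states that focused images may be identified by focus criteria such as sharp edges.

```python
import numpy as np

def sharpness(img):
    """Gradient-energy focus score: larger values indicate sharper edges."""
    gy, gx = np.gradient(np.asarray(img, dtype=float))
    return float(np.mean(gx ** 2 + gy ** 2))

def make_focus_pairs(stack, axial_step_um=0.5):
    """Pair every plane of a z-stack with the sharpest plane of that stack.

    Returns the index of the in-focus plane and a list of
    (defocus_distance_um, input_plane, target_plane) tuples.
    """
    scores = [sharpness(plane) for plane in stack]
    focus_idx = int(np.argmax(scores))
    target = stack[focus_idx]
    pairs = [((i - focus_idx) * axial_step_um, plane, target)
             for i, plane in enumerate(stack)]
    return focus_idx, pairs

# Synthetic 21-plane stack: a checkerboard whose contrast peaks at plane 10.
base = (np.indices((32, 32)).sum(axis=0) % 2).astype(float)
stack = np.stack([base / (1 + abs(i - 10)) for i in range(21)])
focus_idx, pairs = make_focus_pairs(stack)
```

With 21 planes at 0.5 μm spacing and the focus at the central plane, the recorded defocus distances span −5 μm to +5 μm.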
[0061] The autofocusing method can also be applied to thick specimens or sample 22, wherein network 10a can be trained to refocus on specific depth features of the specimen (e.g., the surface of a thick tissue section), thereby eliminating defocus scattering that significantly degrades image quality. The user can define various user-defined depths or planes. This can include the upper surface of sample 22, the mid-plane of sample 22, or the bottom surface of sample 22. The output of the trained deep neural network 10a can then be used as input to a second, independently trained virtual staining neural network 10, as explained herein, to virtually stain the label-free tissue sample 22. The output image 20f of the first trained deep neural network 10a is then input to the virtual staining trained neural network 10. In an alternative embodiment, a similar process as outlined above can be used to train a single neural network 10 capable of directly capturing a defocused image 20d from an incoherent microscope 110 (such as a fluorescence, bright-field, dark-field, or phase microscope) to directly output a virtual stained image 40 of the label-free sample 22, wherein the original image 20d (at the input of the same neural network) is defocused. Compared to the image modality of the incoherent microscope 110 that obtains a defocused image 20d, the virtually stained image 40 is similar to another image modality. For example, a defocused image 20d can be obtained using a fluorescence microscope, while the focused and digitally stained output image 40 is essentially similar to a brightfield microscope image.
[0062] In another embodiment, a machine learning-based framework is utilized, wherein a trained deep neural network 10 enables digital / virtual staining of sample 22 with multiple staining agents. Multiple histochemical virtual staining agents can be applied to an image using a single trained deep neural network 10. Furthermore, the method enables user-defined regions of interest to undergo specific virtual staining, as well as the mixing of multiple virtual staining agents (e.g., to generate other unique staining agents or staining agent combinations). For example, a graphical user interface (GUI) can be provided to allow users to paint or highlight specific regions of an image of unlabeled histological tissue with one or more virtual staining agents. The method uses a conditional convolutional neural network 10 to transform an input consisting of one or more input images 20, which in one specific embodiment include autofluorescence images 20 of an unlabeled tissue sample 22.
[0063] As an example, to demonstrate its practicality, a single trained deep neural network 10 was used to virtually stain images of unlabeled sections of tissue sample 22 using the following stains: hematoxylin and eosin (H&E) stain, hematoxylin, eosin, Jones silver stain, Masson's trichrome stain, periodic acid-Schiff (PAS) stain, Congo red stain, Alcian blue stain, blue iron, silver nitrate, trichrome stain, Ziehl-Neelsen, Grocott methenamine silver (GMS) stain, Gram stain, acidic stain, basic stain, silver stain, Nissl, Weigert stain, Golgi stain, Luxol fast blue stain, toluidine blue, Genta, Mallory trichrome stain, Gomori trichrome stain, van Gieson, Giemsa, Sudan black, Perls' Prussian blue, Best's carmine, acridine orange, immunofluorescence stains, immunohistochemical stains, Kinyoun cold stain, Albert stain, flagella stain, spore stain, melanin, and India ink.
[0064] This method can also be used to generate new staining agents, which are a combination of various virtual staining agents and staining of specific tissue microstructures using these trained staining agents. In yet another alternative implementation, image processing software can be used to automatically identify or segment regions of interest within an image of the unlabeled tissue sample 22. These identified or segmented regions of interest can be presented to the user for virtual staining or have already been stained by the image processing software. As an example, cell nuclei can be automatically segmented and “digitally” stained with a specific virtual staining agent without requiring identification by a pathologist or other operator.
[0065] In this embodiment, one or more autofluorescence images 20 of unlabeled tissue 22 are used as input to the trained deep neural network 10. This input is transformed into an equivalent image 40 of a stained tissue section of the same field of view using a conditional generative adversarial network (c-GAN) (see Figure 15). During network training, the class input into the deep neural network 10 is specified as the class of the ground truth image corresponding to that input image. In one implementation, the class condition can be implemented as a set of "one-hot" encoded matrices (Figure 15) with the same vertical and horizontal dimensions as the network input image. During training, the number of classes can be changed. Alternatively, multiple staining agents can be mixed by modifying the class encoding matrices M to use a mixture of multiple classes instead of a simple one-hot encoding, thereby creating an output image 40 with a unique staining agent whose features derive from the various staining agents learned by the deep neural network 10 (Figures 18A to 18G).
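A class condition of this kind can be sketched in Python as follows. The sketch builds one plane per stain class, each with the same height and width as the input image, sets the selected class plane to one, and stacks the planes with the autofluorescence channels. The class order in `STAINS`, the function names, and the choice to concatenate the planes along the channel axis are assumptions made for illustration; the description states only that the one-hot matrices share the spatial dimensions of the network input.

```python
import numpy as np

STAINS = ["H&E", "Masson", "PAS", "Jones"]   # hypothetical class order

def condition_planes(stain, height, width):
    """Build one-hot class planes with the same height and width as the input."""
    planes = np.zeros((len(STAINS), height, width), dtype=np.float32)
    planes[STAINS.index(stain)] = 1.0
    return planes

def conditioned_input(autofluor, stain):
    """Stack autofluorescence channels (C, H, W) with the class planes (assumed layout)."""
    _, h, w = autofluor.shape
    return np.concatenate([autofluor, condition_planes(stain, h, w)], axis=0)

image = np.zeros((2, 64, 64), dtype=np.float32)   # e.g., DAPI and TxRed channels
x = conditioned_input(image, "PAS")
```

Here `x` has six channels: two image channels followed by four class planes, of which only the PAS plane is non-zero.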
[0066] Since the deep neural network 10 aims to learn the transformation from autofluorescence images 20 of the unlabeled tissue sample 22 to images of the stained sample (i.e., the gold standard), accurate FOV alignment is crucial. Furthermore, when more than one autofluorescence channel is used as input to the network 10, the individual filter channels must be aligned with one another. To use four different staining agents (H&E, Masson's trichrome stain, PAS, and Jones' stain), image preprocessing and alignment were performed for each input and target image pair (training pair) from these four staining datasets. The image preprocessing and alignment followed the global and local registration processes described herein and illustrated in Figure 8. However, a key difference is that when multiple autofluorescence channels are used as network inputs (i.e., DAPI and TxRed as shown here), the channels themselves must be aligned. Even when images from both channels are captured using the same microscope, the corresponding fields of view (FOVs) from the two channels are not precisely aligned, especially at the edges of the FOVs. Therefore, an elastic registration algorithm, as described herein, is used to precisely align the autofluorescence channels. The elastic registration algorithm matches the local features of the two channels (e.g., DAPI and TxRed) by hierarchically dividing the images into progressively smaller blocks while matching corresponding blocks. A transformation map is then calculated and applied to the TxRed image to ensure alignment with the corresponding image from the DAPI channel. Finally, the aligned images from both channels are combined into aligned whole-slide images containing both the DAPI and TxRed channels.
[0067] At the end of the co-registration process, images 20 from one or more autofluorescence channels of unlabeled tissue sections are well aligned with the corresponding bright-field images 48 of the histochemically stained tissue sections 22. Before these aligned pairs are fed into the deep neural network 10 for training, the whole-slide images of the DAPI and TxRed channels are each normalized separately. This whole-slide normalization is performed by subtracting the mean value over the entire tissue sample and dividing by the standard deviation of the pixel values. After the training procedure, using the class condition, multiple virtual stains can be applied to the same input image 20 using a single algorithm. In other words, no additional network is required for each individual stain. A single trained neural network can be used to apply one or more digital / virtual stains to the input image 20.
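The whole-slide normalization described above can be sketched in Python as follows, with each channel normalized independently to zero mean and unit standard deviation over all of its pixels. The function name `normalize_whole_slide`, the array sizes, and the random stand-in data are illustrative assumptions.

```python
import numpy as np

def normalize_whole_slide(channel):
    """Subtract the mean over all pixels and divide by their standard deviation."""
    channel = np.asarray(channel, dtype=np.float64)
    std = channel.std()
    if std == 0:
        raise ValueError("a constant image cannot be normalized")
    return (channel - channel.mean()) / std

# Random stand-ins for whole-slide DAPI and TxRed channel images.
rng = np.random.default_rng(0)
dapi = rng.uniform(0, 4095, size=(128, 128))
txred = rng.uniform(0, 1023, size=(128, 128))
dapi_n = normalize_whole_slide(dapi)
txred_n = normalize_whole_slide(txred)
```

Computing the statistics over the whole slide, rather than per image patch, keeps relative intensity differences between regions of the same slide intact.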
[0068] Figure 16 shows a display presenting a graphical user interface (GUI) for displaying an output image 40 from a trained deep neural network 10, according to one embodiment. In this embodiment, a list of tools (e.g., pointers, markers, erasers, circles, highlighters, etc.) is provided to the user for identifying and selecting certain regions of the output image 40 for virtual staining. For example, the user can use one or more tools to select certain regions or areas of tissue in the output image 40 for virtual staining. In this specific example, three regions manually selected by the user are identified by hatched lines (regions A, B, C). A palette of staining agents can be provided for the user to select a staining agent for these regions. For example, staining agent options (e.g., Masson's trichrome stain, Jones' stain, H&E) can be provided to the user for staining the tissue. The user can then select different regions to be stained with one or more of these staining agents. This produces a micro-structured output such as that shown in Figure 17. In a separate embodiment, image processing software 104 can be used to automatically identify or segment certain regions of the output image 40. For example, image segmentation and computer-generated mapping can be used to identify certain histological features in the imaged sample 22. For example, cell nuclei, certain cell types, or tissue types can be automatically identified by image processing software 104. These automatically identified regions of interest in the output image 40 can be stained manually and / or automatically using one or more staining agents or staining agent combinations.
[0069] In another embodiment, a mixture of multiple staining agents can be generated in the output image 40. For example, multiple staining agents can be mixed together in different ratios or percentages to produce unique staining agents or combinations of staining agents. Examples disclosed herein (Figures 18E to 18G) use staining agent mixtures with different input class conditions to generate the virtually stained network output image 40. Figure 18E shows a virtual mixture of H&E and Jones staining agents at a ratio of 3:1. Figure 18F shows a virtual mixture of H&E and Jones at a ratio of 1:1. Figure 18G shows a virtual mixture of H&E and Jones at a ratio of 1:3.
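One way such a mixing ratio might be expressed as an input class condition is sketched below in Python: the ratio is converted into per-class weights that replace the one-hot values in the class planes. The function name `mixed_condition`, the class order, and the choice to normalize the weights so that they sum to one at every pixel are assumptions for illustration; the description does not state how the ratios are encoded numerically.

```python
import numpy as np

def mixed_condition(ratios, class_order, height, width):
    """Turn stain ratios such as {"H&E": 3, "Jones": 1} into weighted class planes."""
    total = float(sum(ratios.values()))
    if total <= 0:
        raise ValueError("ratios must sum to a positive value")
    planes = np.zeros((len(class_order), height, width), dtype=np.float32)
    for stain, part in ratios.items():
        # Each class plane holds that stain's share of the mixture.
        planes[class_order.index(stain)] = part / total
    return planes

order = ["H&E", "Masson", "PAS", "Jones"]    # hypothetical class order
m = mixed_condition({"H&E": 3, "Jones": 1}, order, 4, 4)
```

For the 3:1 example, the H&E plane holds 0.75 and the Jones plane holds 0.25 at every pixel; a 1:1 or 1:3 mixture follows by changing the ratio dictionary.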
[0070] While the digital / virtual staining methods have been described for fluorescence images obtained from unlabeled sample 22, it should be understood that the multi-stain digital / virtual staining methods can also be used with other microscopy imaging modalities. These include, for example, bright-field microscopic images of stained or unstained sample 22. In other examples, the microscopy may include: single-photon fluorescence microscopy, multiphoton microscopy, second-harmonic generation microscopy, high-harmonic generation microscopy, optical coherence tomography (OCT) microscopy, confocal reflection microscopy, fluorescence lifetime microscopy, Raman spectroscopy microscopy, bright-field microscopy, dark-field microscopy, phase-contrast microscopy, quantitative phase microscopy, structured light illumination microscopy, super-resolution microscopy, light-sheet microscopy, computational microscopy, polarizing microscopy, synthetic aperture-based microscopy, and total internal reflection microscopy.
[0071] Digital / virtual staining methods can be used with any number of staining agents, including, for example, hematoxylin and eosin (H&E) stain, hematoxylin, eosin, Jones silver stain, Masson's trichrome stain, periodic acid-Schiff (PAS) stain, Congo red stain, Alcian blue stain, blue iron, silver nitrate, trichrome stain, Ziehl-Neelsen, Grocott methenamine silver (GMS) stain, Gram stain, acidic stain, basic stain, silver stain, Nissl, Weigert stain, Golgi stain, Luxol fast blue stain, toluidine blue, Genta, Mallory trichrome stain, Gomori trichrome stain, van Gieson, Giemsa, Sudan black, Perls' Prussian blue, Best's carmine, acridine orange, immunofluorescence stains, immunohistochemical stains, Kinyoun cold stain, Albert stain, flagella stain, spore stain, melanin, and India ink. The imaged sample 22 may include tissue sections or cells / cellular structures.
[0072] In another embodiment, the trained deep neural networks 10', 10" are operable for virtual destaining (and optionally, virtual restaining of the sample with a different staining agent). In this embodiment, a first trained deep neural network 10' is provided that is executed by image processing software 104 using one or more processors 102 of a computing device 100 (see Figure 1). The first deep neural network 10' is trained using multiple matched chemically stained microscopic images or image patches 80train and their corresponding unstained ground truth microscopic images or image patches 82GT of the same sample obtained before chemical staining (Figure 19A). Therefore, in this embodiment, the ground truth images used to train the deep neural network 10' are unstained (or unlabeled) images, while the training images are chemically stained (e.g., IHC stained) images. In this embodiment, a microscopic image 84test of the chemically stained sample 12 (i.e., the sample 12 to be tested or imaged) is obtained using any type of microscope 110 described herein. The image 84test of the chemically stained sample is then input into the trained deep neural network 10'. The trained deep neural network 10' then outputs a virtually destained microscopic image 86 of sample 12, which is substantially equivalent to a corresponding image of the same sample obtained without chemical staining (i.e., unstained or unlabeled).
[0073] Optionally, as shown in Figure 19B, a second trained deep neural network 10" is provided that is executed by image processing software 104 using one or more processors 102 of the computing device 100 (see Figure 1). The second deep neural network 10" is trained using multiple matched unstained or unlabeled microscopic images or image patches 88train and their corresponding stained ground truth microscopic images or image patches 90GT of the same sample obtained after chemical staining (Figure 19B). In this embodiment, the stained ground truth corresponds to a different staining agent than that used in the 80train images. Next, the virtually destained microscopic image 86 of sample 12 (i.e., sample 12 to be tested or imaged) is input into the second trained deep neural network 10". The second trained deep neural network 10" then outputs a stained or labeled microscopic image 92 of sample 12, which is substantially equivalent to a corresponding image of the same sample 12 obtained with the different chemical staining agent.
[0074] For example, the staining agent can be converted from or to one of the following: hematoxylin and eosin (H&E) stain, hematoxylin, eosin, Jones silver stain, Masson's trichrome stain, periodic acid-Schiff (PAS) stain, Congo red stain, Alcian blue stain, blue iron, silver nitrate, trichrome stain, Ziehl-Neelsen, Grocott methenamine silver (GMS) stain, Gram stain, acidic stain, basic stain, silver stain, Nissl, Weigert stain, Golgi stain, Luxol fast blue stain, toluidine blue, Genta, Mallory trichrome stain, Gomori trichrome stain, van Gieson, Giemsa, Sudan black, Perls' Prussian blue, Best's carmine, acridine orange, immunofluorescence stains, immunohistochemical stains, Kinyoun cold stain, Albert stain, flagella stain, spore stain, melanin, and India ink.
[0075] It should be understood that this implementation can be combined with machine learning-based training of both defocused and focused images. Therefore, the network (e.g., a deep neural network 10') can be trained to focus or eliminate optical aberrations in addition to destaining / restaining. Furthermore, for all the implementations described herein, in some cases, the input image 20 may have the same or substantially similar numerical aperture and resolution as the ground truth (GT) image. Alternatively, the input image 20 may have a lower numerical aperture and poorer resolution compared to the GT image.
[0076] Experiment - Digital staining of label-free tissues using autofluorescence
[0077] Virtual staining of tissue samples
[0078] Different combinations of tissue section samples 22 and staining agents were used to test and validate the system 2 and method described herein. After training the CNN-based deep neural network 10, its inference was blindly tested by feeding it autofluorescence images 20 of unlabeled tissue sections 22 that did not overlap with the images used in the training or validation sets. Figures 4A to 4H show the results for salivary gland tissue sections digitally / virtually stained to match a bright-field image 48 (i.e., a ground truth image) of the same sample 22 after H&E staining. These results demonstrate the ability of system 2 to convert the fluorescence image 20 of unlabeled tissue section 22 into a bright-field equivalent image 40, showing the correct color scheme expected from H&E-stained tissue containing various components such as epithelioid cells, nuclei, nucleoli, stroma, and collagen. Evaluation of both Figure 3C and Figure 3D shows that the H&E staining reveals islands of invasive tumor cells within the subcutaneous fibro-adipose tissue. Note the nuclear details, including the distinction between nucleoli (arrows) and chromatin texture, clearly discernible in both panels. Similarly, in Figure 3G and Figure 3H, the H&E staining reveals invasive squamous cell carcinoma. In both stains, a desmoplastic reaction with edematous myxoid change (asterisk) in the adjacent stroma is clearly identifiable.
[0079] Next, the deep network 10 was trained to perform digital / virtual staining on other tissue types using two different stains (i.e., Jones hexamine silver stain (kidney) and Masson trichrome stain (liver and lung)). Figures 4A to 4H and Figures 5A to 5P summarize the results of deep learning-based digital / virtual staining of these tissue sections 22, which match very well the bright-field images 48 of the same samples 22 captured after the histochemical staining process. These results demonstrate that the trained deep neural network 10 is capable of inferring the staining patterns of different types of histological stains used for different tissue types from a single fluorescence image 20 of an unlabeled specimen (i.e., without any histochemical staining agent). As with the overall conclusions drawn from Figures 3A to 3H, pathologists also confirmed that the neural network output images in Figure 4C and Figure 5G correctly reveal the histological features corresponding to hepatocytes, sinusoidal tubules, collagen, and fat droplets (Figure 5G), consistent with how they appear in the bright-field images 48 of the same tissue samples 22 captured after chemical staining (Figure 5D and Figure 5H). Similarly, the same expert also confirmed that the deep neural network output images 40 reported in Figure 5K and Figure 5O (lung) reveal consistently stained histological features corresponding to blood vessels, collagen, and alveolar cavities, as they appear in the bright-field images 48 of the same tissue samples 22 imaged after chemical staining (Figure 6L and Figure 6P).
[0080] The digitally / virtually stained output images 40 from the trained deep neural network 10 were compared with standard histochemically stained images 48 for diagnosing various conditions on multiple types of tissue, including formalin-fixed paraffin-embedded (FFPE) and frozen sections. The results are summarized in Table 1 below. Analysis of fifteen (15) tissue sections by four certified pathologists (who were unfamiliar with the virtual staining technique) showed 100% non-major discordance, defined as no clinically significant difference in diagnosis among professional observers. "Diagnosis time" varied significantly among observers, from an average of 10 seconds / image for observer 2 to an average of 276 seconds / image for observer 3. However, intra-observer variability was very small and tended toward shorter diagnosis times with the virtually stained slide images 40 for all observers except observer 2, whose time was equal for the virtual slide images 40 and the histologically stained slide images 48, i.e., approximately 10 seconds / image. These results indicate very similar diagnostic utility between the two image modalities.
[0081] Table 1
[0082]
[0083]
[0084]
[0085] Blind evaluation of staining quality in whole slide images (WSI)
[0086] After evaluating different tissue sections and staining agents, the capability of the virtual staining system 2 was tested in a specialized staining histology workflow. Specifically, the autofluorescence distributions of 15 unlabeled liver tissue sections and 13 unlabeled kidney tissue sections were imaged using a 20× / 0.75NA objective. All liver and kidney tissue sections were obtained from different patients and included both small biopsies and larger resections. All tissue sections were FFPE sections without coverslips. Following autofluorescence scanning, the tissue sections were histologically stained with Masson's trichrome stain (4 μm liver tissue sections) and Jones's stain (2 μm kidney tissue sections). The WSIs were then split into training and testing sets. For the liver slide group, 7 WSIs were used to train the virtual staining algorithm and 8 WSIs were used for blinded testing; for the kidney slide group, 6 WSIs were used to train the algorithm and 7 WSIs were used for testing. The research pathologists were blinded to the staining technique used for each WSI and were instructed to rate the quality of the different stains on a 1-4 numerical scale: 4 = Perfect, 3 = Very Good, 2 = Acceptable, 1 = Unacceptable. In addition, for the liver only, the research pathologists applied the same rating scale (1-4) to specific features: nuclear details (ND), cytoplasmic details (CD), and extracellular fibrosis (EF). These results are summarized in Table 2 (liver) and Table 3 (kidney) below (winners are bolded). The data indicate that the pathologists were able to identify histopathological features with both staining techniques, with a high degree of agreement between the techniques and without a clear preference for either one (virtual versus histological).
[0087] Table 2
[0088]
[0089]
[0090] Table 3
[0091]
[0092] Quantification of network output image quality
[0093] Next, beyond the visual comparison provided in Figures 3A to 3H, Figures 4A to 4H, and Figures 5A to 5P, the results of the trained deep neural network 10 were quantified by first calculating the pixel-level differences between the bright-field images 48 of the chemically stained samples 22 and the digitally / virtually stained images 40 synthesized by the deep neural network 10 without any labels / stains. Table 4 below summarizes this comparison using the YCbCr color space, where the chromaticity components Cb and Cr fully define the color and Y defines the lightness component of the image. This comparison reveals that the average differences between the two sets of images are below approximately 5% for the chromaticity channels (Cb, Cr) and below approximately 16% for the lightness channel (Y). Next, the comparison was further quantified using a second metric, the Structural Similarity Index (SSIM), which is typically used to predict the score a human observer would give an image relative to a reference image (Equation 8 herein). SSIM ranges between 0 and 1, where 1 is the score for identical images. The results of this SSIM quantification are also summarized in Table 4, which illustrates the strong structural similarity between the network output images 40 and the bright-field images 48 of the chemically stained samples.
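The two comparisons described above can be illustrated with a minimal sketch. It is not the evaluation code behind Table 4. It assumes RGB images scaled to [0, 1], the ITU-R BT.601 full-range YCbCr conversion, a mean absolute difference expressed as a percentage of full scale, and a single-window (global) SSIM with the conventional constants 0.01 and 0.03. The function names and the synthetic test images are hypothetical.

```python
import numpy as np


def rgb_to_ycbcr(rgb):
    """Convert RGB values in [0, 1] to full-range YCbCr in [0, 1] (BT.601 weights)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.5 + (b - y) / 1.772
    cr = 0.5 + (r - y) / 1.402
    return np.stack([y, cb, cr], axis=-1)


def channel_difference_percent(output_rgb, target_rgb):
    """Mean absolute difference per YCbCr channel, as a percentage of full scale."""
    diff = np.abs(rgb_to_ycbcr(output_rgb) - rgb_to_ycbcr(target_rgb))
    y, cb, cr = 100.0 * diff.reshape(-1, 3).mean(axis=0)
    return {"Y": float(y), "Cb": float(cb), "Cr": float(cr)}


def global_ssim(a, b, dynamic_range=1.0):
    """SSIM computed over the whole image as one window (assumed simplification)."""
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2.0 * mu_a * mu_b + c1) * (2.0 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2)
    return float(num / den)


# Synthetic stand-ins: `target` for a bright-field image 48, `output` for a network output 40.
rng = np.random.default_rng(0)
target = rng.random((64, 64, 3))
output = np.clip(target + 0.02 * rng.standard_normal(target.shape), 0.0, 1.0)

diff = channel_difference_percent(output, target)
ssim_y = global_ssim(rgb_to_ycbcr(output)[..., 0], rgb_to_ycbcr(target)[..., 0])
```

Practical SSIM implementations usually average the index over local windows; the global form is used here only to keep the sketch short.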
[0094] Table 4
[0095]
[0096] It should be noted that the bright-field image 48 of the chemically stained tissue sample 22 does not actually provide a true gold standard for this specific SSIM and YCbCr analysis of the network output image 40, because uncontrolled changes and structural alterations occur in the tissue during the histochemical staining process and the associated dehydration and clearing steps. Another variation noted for some images is that the automated microscopy scanning software selected different autofocus planes for the two imaging modalities. All these variations pose some challenges to an absolute quantitative comparison of the two sets of images (i.e., the network output 40 for the unlabeled tissue versus the bright-field image 48 of the same tissue after the histological staining process).
[0097] Staining standardization
[0098] An interesting byproduct of the digital / virtual staining system 2 is staining normalization. In other words, the trained deep neural network 10 converges to a "common staining" scheme, because the variability in the histologically stained tissue images 48 is higher than that in the virtually stained tissue images 40. The coloring of the virtual stain is solely a result of its training (i.e., the gold standard histological stain used during the training phase) and can be further adjusted based on the pathologist's preferences by retraining the network with new stains. This "improved" training can be created from scratch or accelerated through transfer learning. This potential staining standardization using deep learning can mitigate the negative impact of person-to-person variability at different stages of sample preparation, create a common basis across different clinical laboratories, enhance clinicians' diagnostic workflows, and help develop new algorithms, such as automated tissue metastasis detection or grading of different types of cancer.
[0099] Transfer learning from other tissue-stain combinations
[0100] Using the concept of transfer learning, training procedures for new tissue and / or staining agent types can converge faster while also achieving improved performance, i.e., a better local minimum of the training cost / loss function. This means that a CNN model deep neural network 10 pre-learned from a different tissue-staining agent combination can be used to initialize the deep neural network 10 that statistically learns virtual staining for a new combination. Figures 6A to 6C illustrate the advantageous properties of this approach: a new deep neural network 10 is trained to virtually stain autofluorescence images 20 of unstained thyroid tissue sections, and it is initialized with the weights and biases of another deep neural network 10 previously trained for H&E virtual staining of salivary glands. The evolution of the loss metric as a function of the number of iterations used during the training phase clearly demonstrates that the new thyroid deep network 10 converges quickly to a lower minimum than the same network architecture trained from scratch with random initialization, as shown in Figure 6A. Figure 6B compares the output images 40 of the thyroid network 10 at different stages of its learning process, further demonstrating the impact of transfer learning in rapidly adapting the proposed approach to new tissue / stain combinations. After a training phase of, for example, ≥6,000 iterations, the network output images 40 revealed irregularly shaped cell nuclei with nuclear grooves and pale chromatin, suggestive of papillary thyroid carcinoma; the cells also showed mild to moderate eosinophilic cytoplasm, and the fibrovascular core in the network output images showed increased inflammatory cells, including lymphocytes and plasma cells. Figure 6C shows the corresponding bright-field image 48 of the H&E chemical staining.
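The warm-start effect described above can be illustrated with a deliberately small toy example. It is not the CNN of this disclosure: a two-parameter linear model stands in for the network, two closely related linear targets stand in for the salivary gland and thyroid tasks, and a fixed zero start stands in for random initialization. All names and numerical values are hypothetical.

```python
import numpy as np


def train(x, y, w, b, steps, lr=0.1):
    """Gradient descent on the mean squared error of y ~ w * x + b.

    Returns the final parameters and the loss recorded before each update.
    """
    losses = []
    for _ in range(steps):
        err = w * x + b - y
        losses.append(float(np.mean(err ** 2)))
        w -= lr * 2.0 * float(np.mean(err * x))
        b -= lr * 2.0 * float(np.mean(err))
    return w, b, losses


x = np.linspace(-1.0, 1.0, 50)
y_source = 2.0 * x + 1.0   # stands in for the previously learned tissue / stain task
y_target = 2.2 * x + 0.8   # stands in for the new, related tissue / stain task

# Pre-train on the source task, then reuse those parameters on the target task.
w_src, b_src, _ = train(x, y_source, 0.0, 0.0, steps=200)
_, _, loss_scratch = train(x, y_target, 0.0, 0.0, steps=20)
_, _, loss_warm = train(x, y_target, w_src, b_src, steps=20)
```

With the same number of iterations, the run initialized from the source-task parameters starts at a lower loss and also ends at a lower loss than the run started from scratch, which mirrors the qualitative behavior reported for Figure 6A.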
[0101] Using multiple fluorescence channels at different resolutions
[0102] The method using the trained deep neural network 10 can be combined with other excitation wavelengths and / or imaging modalities to enhance its inference performance for different tissue components. For example, an attempt was made to detect melanin in skin tissue sections using virtual H&E staining. However, melanin was not clearly identified in the network output because it exhibits a weak autofluorescence signal at the DAPI excitation / emission wavelengths measured in the experimental system described herein. One potential approach to enhance the autofluorescence of melanin is to image the samples while they are in an oxidizing solution. However, a more practical alternative is to use an additional autofluorescence channel obtained with, for example, a Cy5 filter (excitation 628 nm / emission 692 nm), which allows the melanin signal to be enhanced and accurately inferred by the trained deep neural network 10. By training the network 10 using both the DAPI and Cy5 autofluorescence channels, the trained deep neural network 10 was able to successfully determine where melanin is present in the sample, as shown in Figures 7A to 7C. In contrast, when only the DAPI channel is used (Figure 7A), network 10 cannot identify the regions containing melanin (these regions appear white). In other words, network 10 uses the additional autofluorescence information from the Cy5 channel to distinguish melanin from background tissue. For the results shown in Figures 7A to 7C, images 20 were acquired using a lower-resolution objective (10× / 0.45NA) for the Cy5 channel to supplement the high-resolution DAPI scan (20× / 0.75NA), since it is assumed that the most essential information is found in the high-resolution DAPI scan and that additional information (e.g., the presence of melanin) can be encoded using the lower-resolution scan. In this way, two distinct channels are used, one of which is acquired at a lower resolution to identify melanin. 
This may require multiple scans of sample 22 using a fluorescence microscope 110. In yet another multichannel implementation, multiple images 20 may be fed into a trained deep neural network 10. For example, this could include combining the raw fluorescence image with one or more images that have undergone linear or nonlinear image preprocessing, such as contrast enhancement, contrast inversion, and image filtering.
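One way to assemble such a multichannel network input is sketched below. This is an assumed arrangement, not the implementation of this disclosure: the two scans are taken to be already co-registered, the low-resolution Cy5 image is taken to cover the same area with an integer pixel-size ratio, nearest-neighbor upsampling is used, and a contrast-inverted copy of the DAPI image serves as an example of a preprocessed extra channel. The function names are hypothetical.

```python
import numpy as np


def upsample_nearest(img, factor):
    """Nearest-neighbor upsampling by an integer factor along both image axes."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)


def build_network_input(dapi, cy5_lowres, add_inverted=False):
    """Stack a high-resolution DAPI image with an upsampled low-resolution Cy5 image.

    Returns an array of shape (height, width, channels).
    """
    factor = dapi.shape[0] // cy5_lowres.shape[0]
    channels = [dapi, upsample_nearest(cy5_lowres, factor)]
    if add_inverted:
        # Optional preprocessed channel: contrast-inverted copy of the raw image.
        channels.append(dapi.max() - dapi)
    return np.stack(channels, axis=-1)


# Synthetic stand-ins: an 8x8 "DAPI" image and a 4x4 "Cy5" image with two bright pixels.
dapi = np.arange(64, dtype=float).reshape(8, 8)
cy5 = np.zeros((4, 4))
cy5[0, 1] = 1.0
cy5[2, 2] = 1.0
x = build_network_input(dapi, cy5, add_inverted=True)
```

Each bright low-resolution Cy5 pixel becomes a 2×2 block in the stacked input, so the coarse melanin cue stays aligned with the high-resolution DAPI detail.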
[0103] The system 2 and method described herein demonstrate the ability to digitally / virtually stain unlabeled tissue sections 22 using a supervised deep learning technique that uses a single fluorescence image 20 of the sample captured by a standard fluorescence microscope 110 and a filter set as input (in other embodiments, multiple fluorescence images 20 are input when multiple fluorescence channels are used). This statistical learning-based approach has the potential to restructure clinical workflows in histopathology and can benefit from a variety of imaging modalities, such as fluorescence microscopy, nonlinear microscopy, holographic microscopy, stimulated Raman scattering microscopy, and optical coherence tomography, to potentially provide a digital alternative to standard practices for histochemical staining of tissue samples 22. Here, the method is demonstrated using fixed unstained tissue samples 22 to provide a meaningful comparison with chemically stained tissue samples, which is necessary for training the deep neural network 10 and blind testing the performance of the network output against clinically approved methods. However, the proposed deep learning-based method is broadly applicable to different types and states of samples 22, including fresh, unsectioned tissue samples (e.g., after a biopsy procedure), without the use of any markers or staining agents. After training, the deep neural network 10 can be used to perform digital / virtual staining on images of unlabeled fresh tissue samples 22 acquired using, for example, UV or deep UV excitation or even nonlinear microscopy modalities. For example, Raman microscopy can provide a very rich set of unlabeled biochemical features that can further enhance the effectiveness of the virtual staining learned by the neural network.
[0104] A crucial part of the training process involves matching the fluorescence image 20 of the unlabeled tissue sample 22 with its corresponding bright-field image 48 (i.e., the chemically stained image) after the histochemical staining process. It should be noted that during the staining process and related steps, some tissue morphology may be lost or distorted in a way that misleads the loss / cost function during the training phase. However, this is only a challenge relevant to training and validation and does not impose any limitations on the practice of a well-trained deep neural network 10 for virtual staining of the unlabeled tissue sample 22. To ensure the quality of the training and validation phases and to minimize the impact of this challenge on network performance, a threshold is established for an acceptable correlation value between the two image sets (i.e., before and after the histochemical staining process), and mismatched image pairs are eliminated from the training / validation sets to ensure that the deep neural network 10 learns the actual signal, rather than the perturbation of tissue morphology due to the chemical staining process. In fact, this process of removing training / validation image data can be done iteratively: it can begin with a coarse elimination of samples with significant alterations and thus converge to the trained neural network 10. Following this initial training phase, the output image 40 of each sample in the available image set can be filtered against its corresponding bright-field image 48 to set a finer threshold, rejecting some additional images and further purifying the training / validation image set. Through several iterations of this process, not only can the image set be further refined, but the performance of the finally trained deep neural network 10 can also be improved.
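The iterative cleaning of the training / validation set described above can be sketched as follows. This is a minimal illustration under several assumptions: a Pearson correlation is used as the similarity score, the input and target images have already been brought to comparable contrast, and `train_fn` is a hypothetical callable that returns a trained model mapping an input image to an output image. The threshold values are placeholders.

```python
import numpy as np


def correlation(a, b):
    """Pearson correlation coefficient between two images of equal shape."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return float(a @ b / denom) if denom > 0 else 0.0


def prune(pairs, predict, threshold):
    """Keep the (input, target) pairs whose predicted image correlates with the target."""
    return [(x, t) for x, t in pairs if correlation(predict(x), t) >= threshold]


def iterative_refinement(pairs, train_fn, coarse_threshold, fine_thresholds):
    """Coarse pass on the raw pairs, then retrain and re-screen with finer thresholds."""
    kept = prune(pairs, lambda x: x, coarse_threshold)
    for threshold in fine_thresholds:
        model = train_fn(kept)
        kept = prune(kept, model, threshold)
    return kept


rng = np.random.default_rng(0)


def make_pair(noise):
    x = rng.random((32, 32))
    return x, x + noise * rng.standard_normal(x.shape)


# Three well-matched pairs, one moderately distorted pair, one unrelated pair.
pairs = [make_pair(0.05) for _ in range(3)]
pairs.append(make_pair(0.3))
pairs.append((rng.random((32, 32)), rng.random((32, 32))))

# An identity "model" stands in for the trained network in this toy run.
kept = iterative_refinement(pairs, lambda kept_pairs: (lambda x: x),
                            coarse_threshold=0.5, fine_thresholds=[0.9])
```

In this toy run the unrelated pair is rejected by the coarse pass and the moderately distorted pair by the finer pass, leaving the three well-matched pairs.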
[0105] The methods described above will alleviate some of the training challenges caused by the random loss of some tissue features following histological staining. In fact, this highlights another motivation to skip the laborious and expensive procedures involved in histochemical staining, as it is easier to preserve the histology of local tissues in a label-free method without requiring expert handling of some of the delicate procedures involved in the staining process, which sometimes also necessitates observation of the tissue under a microscope.
[0106] Training the deep neural network 10 on a desktop PC takes a considerable amount of time (e.g., approximately 13 hours for the salivary gland network). However, the entire process can be significantly accelerated by using dedicated GPU-based computer hardware. Furthermore, as already emphasized in Figures 6A to 6C, transfer learning provides a "warm start" for the training phase of new tissue / stain combinations, making the entire process significantly faster. Once the deep neural network 10 has been trained, digital / virtual staining of the sample images 40 is performed in a single, non-iterative manner, which does not require trial and error or any parameter tuning to achieve optimal results. Owing to its feedforward and non-iterative architecture, the deep neural network 10 rapidly outputs virtually stained images in less than one second (e.g., 0.59 seconds, corresponding to a sample field of view of approximately 0.33 mm × 0.33 mm). With further GPU-based acceleration, it has the potential to achieve real-time or near-real-time performance in outputting digitally / virtually stained images 40, which may be particularly useful in the operating room or for in vivo imaging applications. It should be understood that this method can also be used in the in vitro imaging applications described herein.
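As a rough worked example of what the quoted inference speed implies for a larger area, the arithmetic below extrapolates from 0.59 seconds per 0.33 mm × 0.33 mm field of view. It assumes that inference time scales linearly with area and ignores tile overlap, data transfer, and stitching; the 1 cm² area is a hypothetical example, not a figure from this disclosure.

```python
fov_side_mm = 0.33        # field-of-view side length quoted above
seconds_per_fov = 0.59    # inference time quoted above for one field of view

fov_area_mm2 = fov_side_mm ** 2   # about 0.109 mm^2 per field of view


def inference_time_seconds(area_mm2):
    """Estimated inference time for a given tissue area, assuming linear scaling."""
    return area_mm2 / fov_area_mm2 * seconds_per_fov


# Hypothetical example: a 1 cm^2 (100 mm^2) tissue area.
t_seconds = inference_time_seconds(100.0)   # about 542 s, roughly 9 minutes
t_minutes = t_seconds / 60.0
```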
[0107] The implemented digital / virtual staining process is based on a separate CNN deep neural network 10 trained for each tissue / stain combination. If autofluorescence images 20 with different tissue / stain combinations are fed to the CNN-based deep neural network 10, it will not perform as expected. However, this is not a limitation, because for histological applications, the tissue type and staining type are predetermined for each sample 22 of interest, and therefore, the specific CNN selection for creating the digital / virtual stained image 40 from the autofluorescence image 20 of the unlabeled sample 22 does not require additional information or resources. Of course, a more general CNN model for multiple tissue / stain combinations can be learned, for example, by increasing the number of training parameters in the model, at the cost of potentially increasing training and inference time. Another approach is the potential of System 2 and the method to perform multiple virtual stainings on the same unlabeled tissue type.
[0108] A significant advantage of System 2 is its high flexibility. If diagnostic failures are detected through clinical comparison, the feedback can be adapted to statistically correct performance by penalizing the captured failures accordingly. This iterative training and transfer learning cycle, based on clinical evaluation of network output performance, will help optimize the robustness and clinical impact of the proposed method. Finally, this method and System 2 can identify regions of interest based on virtual staining and use this information to guide subsequent tissue analyses, such as microimmunohistochemistry or sequencing, for microguided molecular analysis at the unstained tissue level. This type of virtual microguidance for unlabeled tissue samples can facilitate high-throughput identification of subtype diseases and also help develop patient-specific therapies.
[0109] Sample preparation
[0110] Formalin-fixed paraffin-embedded 2 μm thick tissue sections were deparaffinized using xylene and mounted on a standard slide using Cytoseal™ (Thermo-Fisher Scientific, Waltham, MA, USA), followed by placement of a coverslip (Fisherfinest™, 24×50-1, Fisher Scientific, Pittsburgh, PA, USA). Following the initial autofluorescence imaging of the unlabeled tissue samples (using a DAPI excitation and emission filter set), the slides were immersed in xylene for approximately 48 hours or until the coverslip could be removed without damaging the tissue. Once the coverslip was removed, the slides were dipped (approximately 30 dips) in anhydrous ethanol and in 95% ethanol, and then washed in DI water for approximately 1 minute. This step was followed by the corresponding staining procedure for H&E, Masson trichrome staining, or Jones staining. This tissue processing pathway was used only for training and validating the method and is not required after the network has been trained. To test the system and method, different combinations of tissues and staining agents were used: salivary gland and thyroid tissue sections were stained with H&E, kidney tissue sections with Jones stain, and liver and lung tissue sections with Masson trichrome stain.
[0111] In the WSI study, FFPE 2–4 μm thick tissue sections were not covered with coverslips during the autofluorescence imaging phase. After autofluorescence imaging, the tissue samples were histologically stained as described above (Masson trichrome stain for liver tissue sections and Jones stain for kidney tissue sections). Unstained frozen samples were prepared by embedding tissue sections in O.C.T. (Tissue Tek, SAKURA FINETEK USA INC) and immersing them in 2-methylbutane with dry ice. The frozen sections were then cut into 4 μm sections and kept in a freezer until imaging. After the imaging process, the tissue sections were washed with 70% alcohol, stained with H&E, and covered with coverslips. Samples were obtained from the Translational Pathology Core Laboratory (TPCL) and prepared by the histology laboratory at UCLA. Kidney tissue sections from diabetic and non-diabetic patients were obtained under IRB 18-001029 (UCLA). All samples were obtained after de-identification of patient-related information, and all samples were prepared from existing specimens. Therefore, this work did not interfere with standard practices of care or sample collection procedures.
[0112] Data Acquisition
[0113] Label-free autofluorescence images 20 of tissue were captured using a conventional fluorescence microscope 110 (Olympus Corporation, Tokyo, Japan) equipped with a motorized stage, with the image acquisition process controlled by microscopy automation software (Molecular Devices, LLC). Unstained tissue samples were excited with near-UV light and imaged using a DAPI filter (OSFI3-DAPI-5060C, excitation wavelength 377 nm / 50 nm bandwidth, emission wavelength 447 nm / 60 nm bandwidth) with a 40× / 0.95NA objective (Olympus UPLSAPO40X2 / 0.95NA, WD0.18) or a 20× / 0.75NA objective (Olympus UPLSAPO 20X / 0.75NA, WD0.65). For melanin inference, autofluorescence images of the samples were additionally acquired using a Cy5 filter (CY5-4040C-OFX, excitation wavelength 628 nm / 40 nm bandwidth, emission wavelength 692 nm / 40 nm bandwidth) with a 10× / 0.4NA objective (Olympus UPLSAPO 10X2). Each autofluorescence image was captured using a scientific CMOS sensor (ORCA-flash4.0 v2, Hamamatsu Photonics KK, Shizuoka Prefecture, Japan) with an exposure time of approximately 500 ms. Bright-field images 48 (for training and validation) were acquired using a slide scanner microscope (Aperio AT, Leica Biosystems) with a 20× / 0.75NA objective lens (Plan Apo) equipped with a 2× magnification adapter.
[0114] Image preprocessing and alignment
[0115] Since the deep neural network 10 aims to learn a statistical transformation between the autofluorescence image 20 of a chemically unstained tissue sample 22 and the bright-field image 48 of the same tissue sample 22 after histochemical staining, accurately matching the FOV of the input and target images (i.e., the unstained autofluorescence image 20 and the stained bright-field image 48) is important. Figure 8 describes the overall scheme of the global and local image registration process, which was implemented in MATLAB (The MathWorks Inc., Natick, MA, USA). The first step of this process is to find candidate features for matching the unstained autofluorescence images and the chemically stained bright-field images. For this purpose, each autofluorescence image 20 (2048 × 2048 pixels) is downsampled to match the effective pixel size of the bright-field microscope images. This produces a 1351 × 1351 pixel image of unstained autofluorescent tissue, whose contrast is enhanced by saturating the bottom 1% and the top 1% of all pixel values and then inverted (image 20a in Figure 8) to better represent the color map of the grayscale-converted whole slide image. Then, a correlation patch process 60 is performed, in which a normalized correlation score matrix is calculated by correlating each 1351 × 1351 pixel patch with the corresponding patch of the same size extracted from the whole-slide grayscale image 48a. The entry with the highest score in this matrix represents the most likely matching FOV between the two imaging modalities. Using this information (which defines a pair of coordinates), the matching FOV 48c is cropped from the original whole-slide bright-field image 48 to produce the target image 48d. After this FOV matching procedure 60, the autofluorescence image 20 and the bright-field microscope image 48 are coarsely matched. 
However, they are still not precisely registered at the single-pixel level because of slight mismatches in sample placement between the two microscopic imaging experiments (autofluorescence, followed by bright-field), which randomly cause a slight rotation angle (e.g., about 1–2 degrees) between the input and target images of the same sample.
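The coarse FOV matching step can be sketched in a few lines. The registration described here was implemented in MATLAB; the Python sketch below is only an illustration of the same idea at toy scale, with an exhaustive sliding-window search in place of whatever search strategy the original implementation uses. The function names and the synthetic images are hypothetical.

```python
import numpy as np


def enhance_and_invert(img, low_pct=1.0, high_pct=99.0):
    """Saturate the bottom and top 1% of pixel values, rescale to [0, 1], invert contrast."""
    lo, hi = np.percentile(img, [low_pct, high_pct])
    stretched = np.clip((img - lo) / (hi - lo), 0.0, 1.0)
    return 1.0 - stretched


def normalized_correlation(a, b):
    """Zero-mean normalized correlation score between two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def best_match(template, slide_gray):
    """Return the top-left coordinates and score of the best-matching window."""
    th, tw = template.shape
    best_score, best_rc = -2.0, (0, 0)
    for r in range(slide_gray.shape[0] - th + 1):
        for c in range(slide_gray.shape[1] - tw + 1):
            score = normalized_correlation(template, slide_gray[r:r + th, c:c + tw])
            if score > best_score:
                best_score, best_rc = score, (r, c)
    return best_rc, best_score


# Synthetic stand-ins: a 40x40 grayscale "whole slide" and a 16x16 autofluorescence
# patch whose contrast is the inverse of the matching bright-field region.
rng = np.random.default_rng(1)
slide_gray = rng.random((40, 40))
autofluor = 1.0 - slide_gray[10:26, 7:23]

template = enhance_and_invert(autofluor)
(top, left), score = best_match(template, slide_gray)
```

The returned pair of coordinates plays the role of the matrix entry with the highest score, from which the matching FOV would be cropped. For the image sizes quoted above, the downsampling ratio is 1351 / 2048 ≈ 0.66.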
[0116] The second part of the input-target matching process involves a global registration step 64, which corrects for the slight rotation angle between the autofluorescence image and the bright-field image. This is accomplished by extracting feature vectors (descriptors) and their corresponding locations from the image pair and matching the features using the extracted descriptors. The transformation matrix corresponding to the matched pairs is then found using the M-estimator Sample Consensus (MSAC) algorithm, a variant of the Random Sample Consensus (RANSAC) algorithm. Finally, the angle-corrected image 48e is obtained by applying this transformation matrix to the original bright-field microscope image patch 48d. After this rotation is applied, images 20b and 48e are further cropped by 100 pixels (50 pixels on each side) to accommodate undefined pixel values at the image edges caused by the rotation angle correction.
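The robust estimation step can be illustrated with a minimal MSAC-style sketch. It assumes that matched point coordinates are already available (descriptor extraction and matching are not shown), restricts the transformation to a rotation plus a translation, and uses a least-squares (Kabsch) fit on two-point samples. The threshold, the number of trials, and the synthetic matches are hypothetical, and this is not the MATLAB routine used in the work described here.

```python
import numpy as np


def fit_rigid(src, dst):
    """Least-squares rotation and translation mapping src points onto dst points."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    h = (src - src_c).T @ (dst - dst_c)
    u, _, vt = np.linalg.svd(h)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0   # forbid reflections
    rot = vt.T @ np.diag([1.0, d]) @ u.T
    return rot, dst_c - rot @ src_c


def msac_rigid(src, dst, threshold=2.0, trials=200, seed=0):
    """MSAC: score each sampled model by its truncated squared error, keep the best."""
    rng = np.random.default_rng(seed)
    best_cost, best_model = np.inf, None
    for _ in range(trials):
        idx = rng.choice(len(src), size=2, replace=False)
        rot, shift = fit_rigid(src[idx], dst[idx])
        sq_err = np.sum((src @ rot.T + shift - dst) ** 2, axis=1)
        cost = np.minimum(sq_err, threshold ** 2).sum()
        if cost < best_cost:
            best_cost, best_model = cost, (rot, shift)
    rot, shift = best_model
    inliers = np.sum((src @ rot.T + shift - dst) ** 2, axis=1) < threshold ** 2
    return fit_rigid(src[inliers], dst[inliers])   # refit on the inliers only


# Synthetic matches: 1.5 degree rotation, small shift, 0.2 px noise, 10 wrong matches.
rng = np.random.default_rng(0)
angle = np.deg2rad(1.5)
true_rot = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle), np.cos(angle)]])
src = rng.random((40, 2)) * 1000.0
dst = src @ true_rot.T + np.array([12.0, -7.0]) + 0.2 * rng.standard_normal((40, 2))
dst[:10] = rng.random((10, 2)) * 1000.0

rot, shift = msac_rigid(src, dst)
recovered_deg = float(np.degrees(np.arctan2(rot[1, 0], rot[0, 0])))
```

Unlike plain RANSAC, which counts inliers, the MSAC cost also rewards how well the inliers fit, which is why the truncated squared error is summed rather than thresholded into a count.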
[0117] Finally, for the local feature registration operation 68, an elastic image registration matches the local features of the two image sets (autofluorescence 20b versus bright-field 48e) by hierarchically matching corresponding blocks, from large to small. A neural network 71 is used to learn the transformation between the coarsely matched images. This network 71 uses the same structure as network 10 in Figure 10. A low number of iterations is used so that network 71 learns only the accurate color mapping and not any spatial transformation between the input and label images. The transformation map computed in this step is finally applied to each bright-field image patch 48e. At the end of these registration steps 60, 64, 68, the autofluorescence image patch 20b and its corresponding bright-field tissue image patch 48f are precisely matched to each other and can be used as input and label pairs for training the deep neural network 10, allowing the network to focus solely on learning the problem of virtual histological staining.
[0118] A similar process was used for the 20× objective images (used to generate the data in Tables 2 and 3). Instead of downsampling the autofluorescence images 20, the bright-field microscope images 48 were downsampled to 75.85% of their original size to match the lower-magnification images. Furthermore, additional shading correction and normalization techniques were applied to generate whole slide images from these 20× images. Before being fed into network 71, each field of view was normalized by subtracting the mean value across the entire slide and dividing by the standard deviation of the pixel values. This normalizes the network input both within each slide and between slides. Finally, shading correction was applied to each image to account for the lower relative intensity measured at the edges of each field of view.
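The slide-level normalization can be sketched as below. The sketch assumes that the fields of view of one slide are held in a single array and that the standard deviation is also taken over all pixels of the slide; the shading correction is not sketched because its method is not specified here. The array sizes and intensity values are hypothetical.

```python
import numpy as np


def normalize_slide(fovs):
    """Normalize all fields of view of one slide with slide-level statistics.

    fovs has shape (n_fov, height, width). One mean and one standard deviation
    are computed over the whole slide and applied to every field of view.
    """
    return (fovs - fovs.mean()) / fovs.std()


# Synthetic stand-in for six fields of view of one slide; the first is brighter.
rng = np.random.default_rng(0)
slide = rng.random((6, 16, 16)) * 200.0 + 50.0
slide[0] += 30.0

normalized = normalize_slide(slide)
```

Because the statistics are shared across the slide, brightness differences between fields of view of the same slide are preserved, while different slides are brought to a common scale.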
[0119] Deep Neural Network Architecture and Training
[0120] Here, a GAN architecture is used to learn the transformation from an unlabeled, unstained autofluorescence input image 20 to the corresponding bright-field image 48 of the chemically stained sample. Standard convolutional neural network-based training minimizes a loss / cost function between the network's output and the target label. Therefore, the choice of this loss function 69 (Figure 9 and Figure 10) is a key component of deep network design. For example, simply choosing an l2-norm penalty as the cost function tends to produce blurry results because the network averages a weighted probability of all plausible outcomes; therefore, additional regularization terms are usually needed to guide the network to preserve the desired sharp sample features at its output. GANs avoid this problem by learning a criterion designed to accurately distinguish between true and false (i.e., correct or incorrect) output images of the deep network. As a result, output images that are inconsistent with the desired labels are not tolerated, which allows the loss function to adapt to the data and the desired task at hand. To achieve this, the GAN training procedure involves training two different networks, as shown in Figure 9 and Figure 10: (i) a generator network 70, which in this case aims to learn a statistical transformation between the unstained autofluorescence input image 20 and the corresponding bright-field image 48 of the same sample 12 after histological staining; and (ii) a discriminator network 74, which learns how to distinguish between the real bright-field image of the stained tissue section and the output image of the generator network. Ultimately, the desired result of this training process is a trained deep neural network 10 that transforms the unstained autofluorescence input image 20 into a digitally stained image 40 that is indistinguishable from the stained bright-field image 48 of the same sample 22. 
For this task, the loss function 69 for the generator 70 and discriminator 74 is defined as follows:
[0121]
[0122] Here, D(·) refers to the output of the discriminator network, $z_{label}$ denotes the bright-field image of the chemically stained tissue, and $z_{output}$ denotes the output of the generator network. The generator loss function uses the regularization parameters (λ, α) to balance the pixel-wise mean square error (MSE) of the generator network's output image relative to its label, the total variation (TV) operator of the output image, and the discriminator network's prediction for the output image. These regularization parameters are empirically set to different values so that the corresponding terms amount to approximately 2% of the pixel-wise MSE loss and approximately 20% of the combined generator loss ($\ell_{generator}$), respectively. The TV operator for an image z is defined as:
[0123]
[0124] Here, p and q are pixel indices. Based on equation (1), the discriminator attempts to minimize the output loss while maximizing the probability of correctly classifying the real label (i.e., the bright-field image of the chemically stained tissue). Ideally, the discriminator network would aim to achieve $D(z_{label}) = 1$ and $D(z_{output}) = 0$, but if the generator is successfully trained by the GAN, $D(z_{output})$ will ideally converge to 0.5.
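The loss terms described above can be sketched as follows. The expressions for equations (1) and (2) are not reproduced in this text, so the sketch rests on assumptions: squared-error adversarial terms that are zero at the ideal discriminator outputs stated above, an anisotropic TV computed as the sum of absolute differences between adjacent pixels, and placeholder values for λ and α. It illustrates how the three generator terms combine and is not the definitive form of the loss function 69.

```python
import numpy as np


def total_variation(z):
    """Sum of absolute differences between vertically and horizontally adjacent pixels."""
    return float(np.abs(np.diff(z, axis=0)).sum() + np.abs(np.diff(z, axis=1)).sum())


def generator_loss(z_output, z_label, d_output, lam, alpha):
    """Pixel-wise MSE + lam * TV(output) + alpha * adversarial term (assumed squared form)."""
    mse = float(np.mean((z_output - z_label) ** 2))
    return mse + lam * total_variation(z_output) + alpha * (1.0 - d_output) ** 2


def discriminator_loss(d_output, d_label):
    """Zero when D(z_output) = 0 and D(z_label) = 1 (assumed squared form)."""
    return d_output ** 2 + (1.0 - d_label) ** 2


# Toy values: lam and alpha are placeholders, not the values used in this work.
checker = np.array([[0.0, 1.0], [1.0, 0.0]])
tv_checker = total_variation(checker)                       # 4 unit jumps
loss_ideal_d = discriminator_loss(d_output=0.0, d_label=1.0)
loss_fooled_d = discriminator_loss(d_output=0.5, d_label=0.5)
loss_g = generator_loss(checker, checker, d_output=1.0, lam=0.02, alpha=1.0)
```

In this form a discriminator that outputs 0.5 for both real and generated images, the ideal convergence point mentioned above, has a loss of 0.5.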
[0125] The generator deep neural network architecture 70 is detailed in Figure 10. The input image 20 is processed by the network 70 in a multi-scale fashion, using downsampling and upsampling paths, which helps the network learn the virtual staining task at various scales. The downsampling path consists of four individual steps (four blocks #1, #2, #3, #4), each containing one residual block, and each residual block maps a feature map $x_k$ into a feature map $x_{k+1}$:
[0126] $x_{k+1} = x_k + \mathrm{LReLU}[\mathrm{CONV}_{k3}\{\mathrm{LReLU}[\mathrm{CONV}_{k2}\{\mathrm{LReLU}[\mathrm{CONV}_{k1}\{x_k\}]\}]\}]$ (3)
[0127] where CONV{.} is the convolution operator (which includes a bias term), k1, k2, and k3 denote the serial numbers of the convolutional layers, and LReLU[.] is the nonlinear activation function (i.e., the leaky rectified linear unit) used throughout the network, defined as:
[0128]
[0129] The number of input channels at each level of the downsampling path is set to 1, 64, 128, and 256, while the number of output channels is set to 64, 128, 256, and 512. To avoid a dimension mismatch in each block, the feature map $x_k$ is zero-padded to match the number of channels in $x_{k+1}$. The connection between consecutive downsampling levels is a 2×2 average pooling layer with a stride of 2 pixels, which downsamples the feature maps by a factor of 4 (2-fold in each direction). Following the output of the fourth downsampling block, another convolutional layer (CL) maintains the number of feature maps at 512 before connecting to the upsampling path. The upsampling path consists of four symmetric upsampling steps (#1, #2, #3, #4), each containing one convolutional block. The convolutional block operation, which maps a feature map $y_k$ into a feature map $y_{k+1}$, is given by:
[0130] $y_{k+1} = \mathrm{LReLU}[\mathrm{CONV}_{k6}\{\mathrm{LReLU}[\mathrm{CONV}_{k5}\{\mathrm{LReLU}[\mathrm{CONV}_{k4}\{\mathrm{CONCAT}(x_{k+1}, \mathrm{US}\{y_k\})\}]\}]\}]$ (5)
[0131] where CONCAT(.) is the concatenation between two feature maps, which merges their channels, US{.} is the upsampling operator, and k4, k5, and k6 denote the serial numbers of the convolutional layers. The number of input channels at each level of the upsampling path is set to 1024, 512, 256, and 128, and the number of output channels is set to 256, 128, 64, and 32, respectively. The last layer is a convolutional layer (CL) that maps 32 channels into 3 channels, represented in the YCbCr color map. Both the generator and discriminator networks are trained with a patch size of 256×256 pixels.
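The channel counts quoted for the two paths can be checked with simple arithmetic. The sketch below assumes that each upsampling block concatenates the upsampled feature map with the skip connection from the downsampling block of matching depth, which is what the CONCAT operator suggests; the function name `check_generator_channels` is hypothetical.

```python
def check_generator_channels(down_in, down_out, up_in, up_out, center_out):
    """Verify channel bookkeeping of a U-shaped generator.

    ASSUMPTION: upsampling block k receives CONCAT(skip, upsampled map),
    where the skip comes from the downsampling block of matching depth.
    Returns the channel count entering the final convolutional layer.
    """
    # Each downsampling block must feed the next one.
    for k in range(len(down_in) - 1):
        assert down_out[k] == down_in[k + 1]
    skips = down_out[::-1]          # deepest skip connection first
    previous = center_out           # channels leaving the central layer
    for k in range(len(up_in)):
        # Concatenation adds the channel counts of its two inputs.
        assert up_in[k] == skips[k] + previous
        previous = up_out[k]
    return previous

final_channels = check_generator_channels(
    down_in=[1, 64, 128, 256],
    down_out=[64, 128, 256, 512],
    up_in=[1024, 512, 256, 128],
    up_out=[256, 128, 64, 32],
    center_out=512,
)
```

Under this assumption the numbers are self-consistent: 512 + 512 = 1024 at the deepest level, and 32 channels reach the last layer that maps to 3 output channels.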
[0132] The discriminator network, summarized in Figure 10, receives three (3) input channels, corresponding to the YCbCr color space of the input image 40YCbCr, 48YCbCr. This input is then transformed into a 64-channel representation using a convolutional layer, followed by five blocks of the following operator:
[0133] $z_{k+1} = \mathrm{LReLU}[\mathrm{CONV}_{k2}\{\mathrm{LReLU}[\mathrm{CONV}_{k1}\{z_k\}]\}]$ (6)
[0134] where k1 and k2 denote the serial numbers of the convolutional layers. The number of channels for each layer is 3, 64, 64, 128, 128, 256, 256, 512, 512, 1024, 1024, and 2048. The next layer is an average pooling layer with a filter size equal to the patch size (256×256), which results in a vector with 2048 entries. The output of this average pooling layer is then fed into two fully connected layers (FC) with the following structure:
[0135] $z_{k+1} = \mathrm{FC}[\mathrm{LReLU}[\mathrm{FC}\{z_k\}]]$ (7)
[0136] where FC denotes a fully connected layer with learnable weights and biases. The first fully connected layer outputs a vector with 2048 entries, while the second outputs a scalar value. This scalar value is used as the input to a sigmoid activation function D(z) = 1 / (1 + exp(-z)), which computes the probability (between 0 and 1) that the discriminator network input is real/genuine or fake; ideally, $D(z_{\mathrm{label}}) = 1$, as shown by output 67 in Figure 10.
[0137] The convolution kernels throughout the GAN are set to 3×3. These kernels were randomly initialized using a truncated normal distribution with a standard deviation of 0.05 and a mean of 0; all network biases were initialized to 0. The learnable parameters are updated during the training stage of the deep neural network 10 by backpropagation (shown by the dashed arrows in Figure 10) using an adaptive moment estimation (Adam) optimizer, with a learning rate of 1×10⁻⁴ for the generator network 70 and 1×10⁻⁵ for the discriminator network 74. In addition, for each iteration of the discriminator 74 there are 4 iterations of the generator network 70, to avoid training stagnation following a potential overfit of the discriminator network to the labels. A batch size of 10 was used in training.
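The kernel initialization can be sketched in NumPy as follows. The text specifies only a truncated normal distribution with mean 0 and standard deviation 0.05; the truncation point of two standard deviations (samples outside it are redrawn) is an assumption, as are the function name `truncated_normal_kernel` and the channel-last kernel layout.

```python
import numpy as np

def truncated_normal_kernel(in_ch, out_ch, size=3, std=0.05, rng=None):
    """Draw a size x size convolution kernel from a truncated normal.

    ASSUMPTION: values farther than 2 * std from the mean (0) are
    redrawn until every weight lies inside that interval.
    """
    rng = np.random.default_rng() if rng is None else rng
    w = rng.normal(0.0, std, size=(size, size, in_ch, out_ch))
    outside = np.abs(w) > 2 * std
    while outside.any():
        w[outside] = rng.normal(0.0, std, size=int(outside.sum()))
        outside = np.abs(w) > 2 * std
    return w

# First discriminator layer in the text: 3 input channels, 64 output channels.
kernel = truncated_normal_kernel(3, 64, rng=np.random.default_rng(0))
bias = np.zeros(64)  # biases start at 0, as stated above
```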
[0138] Once all fields of view have passed through the network 10, the whole-slide images are stitched together using the Fiji Grid/Collection stitching plugin (see, e.g., Schindelin, J. et al.: Fiji: an open-source platform for biological-image analysis. Nat. Methods 9, 676-682 (2012), which is incorporated herein by reference). This plugin calculates the exact overlap between each tile and linearly blends them into a single large image. Overall, inference and stitching take approximately 5 minutes and 30 seconds per cm², respectively, and can be substantially improved with hardware and software advances. Before being shown to the pathologists, sections that are out of focus or have major aberrations (e.g., due to dust particles) in either the autofluorescence or bright-field images are cropped out. Finally, the images are exported to the Zoomify format (designed to enable viewing of large images using a standard web browser; http://zoomify.com/) and uploaded to the GIGAmacro website (https://viewer.gigamacro.com/) for easy access and viewing by the pathologists.
[0139] Implementation details
[0140] Table 5 below lists further implementation details, including the number of training patches, the number of epochs, and the training time. The digital/virtual staining deep neural network 10 was implemented using Python version 3.5.0. The GAN was implemented using TensorFlow framework version 1.4.0. Other Python libraries used were os, time, tqdm, the Python Imaging Library (PIL), SciPy, glob, ops, sys, and numpy. The software was implemented on a desktop computer with a 4.2 GHz Intel i7-7700K CPU and 64 GB of RAM, running the Windows 10 operating system (Microsoft). Network training and testing were performed using a GTX 1080Ti GPU (Nvidia).
[0141] Table 5
[0142]
[0143]
[0144] Experiment - Virtual staining of samples using fluorescence lifetime imaging (FLIM)
[0145] In this embodiment, a trained neural network 10 performs virtual IHC staining of an unstained tissue sample 22 based on fluorescence lifetime imaging. The algorithm takes a fluorescence lifetime image 20L of the unstained tissue sample 22 and outputs an image 40 that closely matches a bright-field image 48 of the same field of view after IHC staining. With this method, the laborious and time-consuming IHC staining procedure can be replaced by virtual staining, which is significantly faster and preserves the tissue for further analysis.
[0146] Data Acquisition
[0147] Referring to Figure 11A, unstained formalin-fixed, paraffin-embedded (FFPE) breast tissue (e.g., obtained via biopsy B) was sectioned into thin 4 μm slices and fixed onto standard microscope slides. These tissue sections 22 were obtained under IRB 18-001029. The tissue was deparaffinized with xylene and mounted on a standard slide using Cytoseal (Thermo Fisher Scientific). The autofluorescence lifetime of these unlabeled tissue sections 22 was imaged on a standard fluorescence lifetime microscope 110 (SP8-DIVE, Leica Microsystems) equipped with a 20×/0.75 NA objective (Leica HC PL APO 20×/0.75 IMM) and two separate hybrid photodetectors receiving fluorescence signals in the 435–485 nm and 535–595 nm wavelength ranges, respectively. Excitation used a 700 nm wavelength laser at approximately 0.3 W, generating images 20L. The scan speed was 200 Hz for 1024×1024 pixels with a pixel size of 300 nm. Once the autofluorescence lifetime images 20L were obtained, the slides were IHC stained using standard HER2, ER, or PR staining procedures. Staining was performed by the UCLA Translational Pathology Core Laboratory (TPCL). These IHC-stained slides were then imaged using a commercially available slide scanning microscope equipped with a 20×/0.75 NA objective (Aperio AT, Leica Biosystems) to create the target images 48 used to train, validate, and test the neural network 10.
[0148] Image preprocessing and registration
[0149] Because the deep neural network 10 aims to learn the transformation from the autofluorescence lifetime images 20L of the unlabeled sample 22, it is important to accurately align their FOVs with those of the corresponding target bright-field images 48. Image preprocessing and alignment follow the global and local registration process described herein and illustrated in Figure 8. At the end of the registration process, images of one or more autofluorescence and/or lifetime channels from the unlabeled tissue sections 22 are well aligned with the corresponding bright-field images 48 of the IHC-stained tissue sections. Before these aligned image pairs 20L, 48 are fed into the neural network 10, slide normalization is applied to the fluorescence intensity images by subtracting the mean value across the entire slide and dividing by the standard deviation of the pixel values.
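The slide normalization step can be sketched in a few lines of NumPy. This is a minimal illustration, assuming all tiles of one slide are available as a single array; the function name `normalize_slide` is hypothetical.

```python
import numpy as np

def normalize_slide(tiles):
    """Normalize fluorescence intensity tiles using whole-slide statistics.

    The mean and standard deviation are computed over every pixel of
    every tile of the slide, not tile by tile.
    """
    stack = np.asarray(tiles, dtype=np.float64)
    mean = stack.mean()   # mean across the entire slide
    std = stack.std()     # standard deviation of the pixel values
    return (stack - mean) / std

# Example with three synthetic 64x64 tiles standing in for one slide.
rng = np.random.default_rng(0)
slide_tiles = rng.random((3, 64, 64)) * 1000.0
normalized = normalize_slide(slide_tiles)
```

Using slide-level rather than tile-level statistics keeps relative intensity differences between tiles of the same slide intact.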
[0150] Deep Neural Network Architecture, Training and Validation
[0151] For the trained deep neural network 10, a conditional GAN architecture was used to learn the transformation from an unlabeled, unstained autofluorescence lifetime input image 20L to the corresponding bright-field image 48 for three different stains (HER2, PR, and ER). After the autofluorescence lifetime images 20L were registered to the bright-field images 48, these precisely aligned FOVs were randomly partitioned into overlapping patches of 256×256 pixels, which were then used to train the GAN-based deep neural network 10.
[0152] The GAN-based neural network 10 consists of two deep neural networks: a generator network (G) and a discriminator network (D). For this task, the loss functions of the generator and discriminator are defined as follows:
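Random partitioning of a registered image pair into overlapping patches can be sketched as below. The key point is that the same crop window is applied to the input and to the target so that the pair stays aligned; the function name `sample_patch_pairs` and the uniform sampling of the window position are assumptions.

```python
import numpy as np

def sample_patch_pairs(inp, target, n_patches, size=256, rng=None):
    """Crop n_patches random, possibly overlapping, aligned patch pairs.

    `inp` and `target` must be co-registered arrays with the same
    height and width (extra channel axes are preserved).
    """
    rng = np.random.default_rng() if rng is None else rng
    height, width = inp.shape[:2]
    pairs = []
    for _ in range(n_patches):
        r = int(rng.integers(0, height - size + 1))  # top-left row
        c = int(rng.integers(0, width - size + 1))   # top-left column
        pairs.append((inp[r:r + size, c:c + size],
                      target[r:r + size, c:c + size]))
    return pairs
```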
[0153]
[0154]
[0155] The anisotropic total variation (TV) operator and the L1 norm are defined as follows:
[0156]
[0157] where D(·) and G(·) refer to the outputs of the discriminator and generator networks, respectively, $z_{\mathrm{label}}$ denotes the bright-field image of the histologically stained tissue, and $z_{\mathrm{output}}$ denotes the output of the generator network.
[0158] The Structural Similarity Index (SSIM) is defined as follows:
[0159]
[0160] where $\mu_x$ and $\mu_y$ are the means of the images x and y, $\sigma_x^2$ and $\sigma_y^2$ are the variances of x and y, $\sigma_{x,y}$ is the covariance of x and y, and c1 and c2 are constants used to stabilize the division when the denominator is small. An SSIM value of 1.0 indicates identical images. The generator loss function balances the pixel-wise SSIM and L1 norm of the generator network output image with respect to its label, the total variation (TV) operator of the output image, and the discriminator network prediction of the output image. The regularization parameters (μ, α, v, λ) are set to (0.3, 0.7, 0.05, 0.002).
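Because the SSIM equation itself is not reproduced in this text, the following sketch uses the widely published form of the index, computed here from global image statistics for brevity (practical implementations usually average it over local windows). The function name `ssim_global` and the constant values c1 = 0.01² and c2 = 0.03² (common defaults for images scaled to [0, 1]) are assumptions.

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Structural similarity index from global statistics (a sketch).

    SSIM = (2*mu_x*mu_y + c1) * (2*cov_xy + c2)
           / ((mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu_x, mu_y = x.mean(), y.mean()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (x.var() + y.var() + c2)
    return float(numerator / denominator)
```

Comparing an image with itself gives a value of 1.0 up to floating-point rounding, which matches the statement that an SSIM of 1.0 indicates identical images.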
[0161] The deep neural network architecture of the generator G follows the structure of the deep neural network 10 shown in Figure 11B (and described herein). In this implementation, however, the network 10 begins with a convolutional layer mapping the input lifetime and/or autofluorescence image data 200 into sixteen (16) channels, followed by a downsampling path consisting of five individual downsampling operations 202. The number of input channels at each level of the downsampling path is set to 16, 32, 64, 128, and 256, while the number of output channels is set to 32, 64, 128, 256, and 512. After a center block convolutional layer 203, the upsampling path consists of five symmetric upsampling operations 204. The number of input channels at each level of the upsampling path is set to 1024, 512, 256, 128, and 64, and the number of output channels is set to 256, 128, 64, 32, and 16, respectively. The last layer 206 is a convolutional layer that maps 16 channels into 3 channels with a tanh() activation function 208, represented in the RGB color map. Both the generator (G) and discriminator (D) networks are trained with an image patch size of 256×256 pixels.
[0162] The discriminator network (D) receives three (3) input channels (i.e., red, green, and blue), corresponding to the RGB color space of the input image. A convolutional layer 210 then transforms this three-channel input into a 16-channel representation, followed by five blocks 212 of the following operator:
[0163] $z_{k+1} = \mathrm{POOL}(\mathrm{LReLU}[\mathrm{CONV}_{k2}\{\mathrm{LReLU}[\mathrm{CONV}_{k1}\{z_k\}]\}])$ (11)
[0164] where CONV{.} is the convolution operator (which includes a bias term), k1 and k2 denote the serial numbers of the convolutional layers, POOL(.) is a 2×2 average pooling operation, and LReLU[.] is the nonlinear activation function used throughout the network (i.e., the leaky rectified linear unit), defined as:
[0165]
[0166] The number of input and output channels at each level exactly follows the generator downsampling path, followed by a center block convolutional layer 213. The last level 214 is represented as follows:
[0167]-[0168] $z_{k+1} = \mathrm{Sigmoid}(\mathrm{FC}[\mathrm{Dropout}[\mathrm{LReLU}[\mathrm{FC}[\mathrm{LReLU}[\mathrm{CONV}_{k2}\{\mathrm{LReLU}[\mathrm{CONV}_{k1}\{z_k\}]\}]]]]])$ (13)
[0169] where FC[.] denotes a fully connected layer with learnable weights and biases, Sigmoid(.) denotes the sigmoid activation function, and Dropout[.] randomly removes 50% of the connections from the fully connected layer.
[0170] The convolution filter size throughout the GAN-based deep neural network 10 is set to 3×3. The learnable parameters are updated through training using an adaptive moment estimation (Adam) optimizer, with a learning rate of 1×10⁻⁴ for the generator network (G) and 1×10⁻⁵ for the discriminator network (D). The training batch size is set to 48.
[0171] Implementation details
[0172] The virtual staining network 10 was implemented using Python version 3.7.1 with PyTorch framework version 1.3. The software was implemented on a desktop computer with a 3.30 GHz Intel i9-7900X CPU and 64 GB of RAM, running Microsoft Windows 10. Network training and testing were performed using two NVIDIA GeForce GTX 1080Ti GPUs.
[0173] Experiment - Post-imaging computational autofocusing
[0174] This embodiment relates to post-imaging computational autofocusing for incoherent imaging modalities such as bright-field microscopy and fluorescence microscopy. The method requires only a single aberrated image, which it virtually refocuses using a trained deep neural network. This data-driven machine learning algorithm takes an aberrated and/or defocused image and outputs an image that closely matches the in-focus image of the same field of view. Using this method, the scanning speed of microscopes that image samples such as tissue can be increased.
[0175] Fluorescence image acquisition
[0176] Referring to Figure 13, tissue autofluorescence images 20 were obtained with an inverted Olympus microscope (IX83, Olympus) 110 controlled by the Micro-Manager microscope automation software. The unstained tissue 22 was excited near the ultraviolet range and imaged using a DAPI filter cube (OSFI3-DAPI-5060C, excitation wavelength 377 nm / 50 nm bandwidth, emission wavelength 447 nm / 60 nm bandwidth). The images 20 were acquired with a 20×/0.75 NA objective lens (Olympus UPLSAPO 20×/0.75 NA, WD 0.65). At each stage position, the automation software performed autofocusing based on image contrast and acquired a z-stack from -10 μm to 10 μm with an axial step of 0.5 μm. Each image 20 was captured with a scientific CMOS sensor (ORCA-flash4.0 v.2, Hamamatsu Photonics) with an exposure time of approximately 100 ms.
[0177] Image preprocessing
[0178] To correct for rigid shifts and rotations caused by the microscope stage, the autofluorescence image stacks (2048×2048×41) were first aligned using the ImageJ plugin 'StackReg'. An extended depth-of-field (EDOF) image was then generated for each stack using the ImageJ plugin 'Extended Depth of Field'. The stacks and their corresponding EDOF images were cropped laterally into non-overlapping 512×512 patches, and the plane with the highest structural similarity index (SSIM) to the EDOF image was set as the most in-focus plane (the target image). Then, the 10 planes above and below the in-focus plane (corresponding to ±5 μm defocus) were selected from the stack, and input images for training the network 10a were generated from each of these 21 planes.
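The plane-selection step can be sketched as follows: score each plane of a cropped stack against its EDOF patch, take the best-scoring plane as the target, and keep the surrounding planes as inputs. The SSIM here is a simplified global-statistics version with assumed constants, and the names `ssim_global` and `select_training_planes` are hypothetical.

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Simplified SSIM from global statistics (constants are assumed)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu_x, mu_y = x.mean(), y.mean()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (x.var() + y.var() + c2)
    return float(num / den)

def select_training_planes(stack, edof, half_range=10):
    """Return (focus index, target plane, input planes) for one patch.

    `stack` has shape (planes, H, W). The in-focus plane is the one most
    similar to the EDOF image; inputs are the planes within +/- half_range
    of it (clipped at the ends of the stack).
    """
    scores = [ssim_global(plane, edof) for plane in stack]
    focus = int(np.argmax(scores))
    lo = max(0, focus - half_range)
    hi = min(len(stack), focus + half_range + 1)
    return focus, stack[focus], stack[lo:hi]
```

With a 0.5 μm axial step, `half_range=10` corresponds to the ±5 μm defocus range and yields 21 planes when the in-focus plane is not near the ends of the stack.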
[0179] To generate the training and validation datasets, defocused and in-focus images of the same field of view were paired and used as the input and target output, respectively, for training the network 10a. The original dataset consisted of approximately 30,000 such pairs, which were randomly divided into a training dataset and a validation dataset comprising 85% and 15% of the data, respectively. The training dataset was augmented 8-fold during training by random flipping and rotation, while the validation dataset was not augmented. A test dataset was cropped from separate fields of view (FOVs) that did not appear in the training and validation datasets. Images were normalized per FOV using their mean and standard deviation before being fed into the network 10a.
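The split and augmentation can be sketched as below. The sketch assumes that the 8-fold augmentation refers to the eight orientations reachable by 90° rotations combined with a mirror flip, and that the same transform is applied to both images of a pair; the function names are hypothetical.

```python
import numpy as np

def split_pairs(n_pairs, train_fraction=0.85, rng=None):
    """Randomly split pair indices into training and validation sets."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(n_pairs)
    n_train = int(round(train_fraction * n_pairs))
    return order[:n_train], order[n_train:]

def random_flip_rotate(inp, target, rng):
    """Apply one of 8 orientations (ASSUMED: 4 rotations x optional flip).

    The identical transform is applied to input and target so that the
    defocused/in-focus pair stays aligned.
    """
    k = int(rng.integers(0, 4))          # number of 90-degree rotations
    flip = bool(rng.integers(0, 2))      # mirror left-right or not
    inp, target = np.rot90(inp, k), np.rot90(target, k)
    if flip:
        inp, target = np.fliplr(inp), np.fliplr(target)
    return inp, target

train_idx, val_idx = split_pairs(30000, rng=np.random.default_rng(0))
```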
[0180] Deep Neural Network Architecture, Training and Validation
[0181] A generative adversarial network (GAN) 10a is used here to perform snapshot autofocusing. The GAN 10a consists of a generator network (G) and a discriminator network (D). The generator network (G) is a U-Net with residual connections, and the discriminator network (D) is a convolutional neural network. During training, the network 10a iteratively minimizes the loss functions of the generator and discriminator, defined as:
[0182]-[0183] $L_G = \mathrm{MAE}\{z_{\mathrm{label}}, z_{\mathrm{output}}\} + \lambda \times \mathrm{MSSSIM}\{z_{\mathrm{label}}, z_{\mathrm{output}}\} + \beta \times \mathrm{MSE}\{z_{\mathrm{label}}, z_{\mathrm{output}}\} + \alpha \times (1 - D(z_{\mathrm{output}}))^2$ (14)
[0184] $L_D = D(z_{\mathrm{output}})^2 + (1 - D(z_{\mathrm{label}}))^2$ (15)
[0185] where $z_{\mathrm{label}}$ denotes the in-focus fluorescence image, $z_{\mathrm{output}}$ denotes the generator output, and D is the discriminator output. The generator loss function is a combination of the mean absolute error (MAE), the multi-scale structural similarity (MS-SSIM) index, and the mean squared error (MSE), balanced by the regularization parameters λ, β, and α. In training, the parameters are empirically set to λ = 50, β = 1, and α = 1. The multi-scale structural similarity (MS-SSIM) index is defined as:
[0186]
[0187] where $x_j$ and $y_j$ are the distorted and reference images downsampled $2^{j-1}$ times; $\mu_x$ and $\mu_y$ are the means of x and y; $\sigma_x^2$ and $\sigma_y^2$ are the variances of x and y; $\sigma_{xy}$ is the covariance of x and y; and C1, C2, and C3 are small constants used to stabilize the division when the denominator is small.
[0188] An adaptive moment estimation (Adam) optimizer is used to update the learnable parameters, with learning rates of 1×10⁻⁴ and 1×10⁻⁶ for the generator (G) and discriminator (D), respectively. In addition, six updates of the generator loss and three updates of the discriminator loss are performed at each iteration. A batch size of five (5) was used in training. The validation set was tested every 50 iterations, and the best model was chosen as the one with the smallest loss on the validation set.
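As a numerical illustration of equations (14) and (15), the sketch below evaluates both losses for given images and discriminator outputs. The MS-SSIM term is passed in as a precomputed value and is added exactly as equation (14) is written; the function names and the default weights (taken from the empirically set values above) are illustrative only.

```python
import numpy as np

def generator_loss(z_label, z_output, d_output, ms_ssim,
                   lam=50.0, beta=1.0, alpha=1.0):
    """Evaluate equation (14) as written in the text.

    d_output is D(z_output), a scalar in [0, 1];
    ms_ssim is the precomputed MSSSIM{z_label, z_output} term.
    """
    diff = np.asarray(z_label, dtype=float) - np.asarray(z_output, dtype=float)
    mae = np.mean(np.abs(diff))
    mse = np.mean(diff ** 2)
    return float(mae + lam * ms_ssim + beta * mse + alpha * (1.0 - d_output) ** 2)

def discriminator_loss(d_label, d_output):
    """Evaluate equation (15): D(z_output)^2 + (1 - D(z_label))^2."""
    return float(d_output ** 2 + (1.0 - d_label) ** 2)
```

The discriminator loss is zero when real images score 1 and generated images score 0, and equals 0.5 when the discriminator outputs 0.5 for both.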
[0189] Implementation details
[0190] The network was implemented using TensorFlow on a PC with a 2.3 GHz Intel Xeon W-2195 CPU, 256 GB of RAM, and an Nvidia GeForce 2080Ti GPU. Training with approximately 30,000 image pairs of 512×512 pixels took about 30 hours. Testing on a 512×512 pixel image patch took about 0.2 seconds.
[0191] Experiment - Virtual staining using multiple staining agents with a single network
[0192] In this embodiment, a conditional convolutional neural network 10 is used to transform input images consisting of one or more autofluorescence images 20 of an unlabeled tissue sample 22. As an example, to demonstrate its utility, a single network 10 was used to virtually stain images of unlabeled slides with hematoxylin and eosin (H&E), Jones silver stain, Masson's trichrome, and periodic acid-Schiff (PAS) stains. The trained neural network 10 is capable of generating new stains and of staining specific tissue microstructures with these trained stains.
[0193] Data Acquisition
[0194] Unstained formalin-fixed, paraffin-embedded (FFPE) kidney tissue was sectioned into 2 μm slices and fixed onto standard microscope slides. These tissue sections 22 were obtained under IRB 18-001029. The autofluorescence of the unlabeled tissue sections 22 was imaged with a standard wide-field fluorescence microscope 110 (IX83, Olympus) equipped with a 20×/0.75 NA objective (Olympus UPLSAPO 20×/0.75 NA, WD 0.65) and two separate filter cubes, DAPI (OSFI3-DAPI-5060C, EX 377/50 nm, EM 447/60 nm, Semrock) and TxRed (OSFI3-TXRED-4040C, EX 562/40 nm, EM 624/40 nm, Semrock). The exposure time was approximately 50 ms for the DAPI channel and approximately 300 ms for the TxRed channel. Once the autofluorescence of the tissue sections 22 had been imaged, the slides were histologically stained using standard H&E, Jones, Masson's trichrome, or PAS staining procedures. Staining was performed by the UCLA Translational Pathology Core Laboratory (TPCL). These histologically stained slides were then imaged using an FDA-approved slide scanning microscope 110 (Aperio AT, Leica Biosystems, with a 20×/0.75 NA objective) to create the target images 48 used to train, validate, and test the neural network 10.
[0195] Deep Neural Network Architecture, Training and Validation
[0196] A conditional GAN architecture is used for the trained deep neural network 10, which learns the transformation from an unlabeled, unstained autofluorescence input image 20 to the corresponding bright-field image 48 for four different stains (H&E, Masson's trichrome, PAS, and Jones). Of course, other or additional stains can be trained in the deep neural network 10. After the autofluorescence images 20 and the bright-field images 48 are co-registered, these precisely aligned FOVs are randomly partitioned into overlapping patches of 256×256 pixels, which are then used to train the GAN network 10. In the implementation of the conditional GAN network 10, one-hot encoded matrices M (Figure 15), each corresponding to a different stain, are concatenated to the network's 256×256 input image/image stack patches during training. One way to represent this conditioning is as follows:
[0197]
[0198] where [·] refers to concatenation, and $c_i$ is a 256×256 matrix representing the label for the i-th stain type (in this example: H&E, Masson's trichrome, PAS, and Jones). For an input and target image pair from the i-th staining dataset, $c_i$ is set to an all-ones matrix, while all the remaining matrices are assigned zero values (see Figure 15). The conditional GAN network 10 consists of two deep neural networks, a generator (G) and a discriminator (D), as explained herein. For this task, the loss functions of the generator and discriminator are defined as follows:
[0199]
[0200]
[0201] The anisotropic TV operator and the L1 norm are defined as follows:
[0202]
[0203] where D(·) and G(·) refer to the outputs of the discriminator and generator networks, respectively, $z_{\mathrm{label}}$ denotes the bright-field image of the histologically stained tissue, and $z_{\mathrm{output}}$ denotes the output of the generator network. P and Q denote the number of vertical and horizontal pixels in the image patch, and p and q are the summation indices. The regularization parameters (λ, α) are set to 0.02 and 2000, which sets the total variation loss term to approximately 2% of the L1 loss and the discriminator loss term to 98% of the total generator loss.
[0204] The deep neural network architecture of the generator (G) follows the structure of the deep neural network 10 shown in Figure 10 (and described herein). However, in one implementation, the number of input channels at each level of the downsampling path is set to 1, 96, 192, and 384. The discriminator network (D) receives seven input channels: three (the YCbCr color map) come from either the generator output or the target, and four come from the one-hot encoded class condition matrices M. This input is transformed into a 64-channel feature map using a convolutional layer. The convolution filter size throughout the GAN is set to 3×3. The learnable parameters are updated through training using an adaptive moment estimation (Adam) optimizer, with a learning rate of 1×10⁻⁴ for the generator network (G) and 2×10⁻⁶ for the discriminator network (D). For each discriminator step, there are ten iterations of the generator network. The training batch size is set to 9.
[0205] Single virtual tissue staining
[0206] Once the deep neural network 10 is trained, the one-hot encoded labels are used to condition the network 10 to generate the desired stained image 40. In other words, for the i-th stain (in a single-stain implementation), the matrix $c_i$ is set to all ones and the remaining matrices are set to all zeros. One or more condition matrices can therefore be applied to the deep neural network 10 to generate the corresponding stains on all or on sub-regions of the imaged sample. The condition matrices M define the sub-regions or boundaries of each stain channel.
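The conditioning described above can be sketched as a concatenation of per-stain indicator matrices to the input patch. The channel-last array layout, the stain ordering in `STAINS`, and the function name `condition_input` are assumptions made for illustration.

```python
import numpy as np

# ASSUMED ordering of the four stain classes (index i selects c_i).
STAINS = ("H&E", "Masson's trichrome", "PAS", "Jones")

def condition_input(image_stack, stain_index, n_stains=len(STAINS)):
    """Concatenate one-hot condition matrices to an input patch.

    image_stack: H x W or H x W x C autofluorescence patch.
    Returns an H x W x (C + n_stains) array in which channel
    C + stain_index is all ones and the other condition channels are zero.
    """
    image_stack = np.asarray(image_stack, dtype=np.float32)
    if image_stack.ndim == 2:
        image_stack = image_stack[..., None]
    h, w = image_stack.shape[:2]
    cond = np.zeros((h, w, n_stains), dtype=np.float32)
    cond[..., stain_index] = 1.0
    return np.concatenate([image_stack, cond], axis=-1)

# Two autofluorescence channels conditioned on PAS (index 2).
patch = np.zeros((256, 256, 2), dtype=np.float32)
conditioned = condition_input(patch, STAINS.index("PAS"))
```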
[0207] Staining agent mixing and microstructuring
[0208] After training, the condition matrices can also be used in ways on which the network 10 was not trained, in order to create new or novel types of stains. The encoding rule that should be satisfied can be summarized in the following equation:
[0209]
[0210] In other words, for a given set of indices i, j, the sum across the number of stains on which the network 10 was trained ($N_{\mathrm{stains}} = 4$ in our example) should be equal to 1. In one possible implementation, this is achieved by modifying the class encoding matrices to use a mixture of multiple classes, as described in the following equation:
[0211]
[0212] Various stains can thus be blended to produce unique stains that exhibit features arising from the various stains learned by the artificial neural network. This is shown in Figure 17.
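A blended condition can be sketched as a set of constant-valued matrices whose values sum to 1 at every pixel, which is the encoding rule stated above. The function name `blended_condition` and the channel-last layout are assumptions; the sketch checks only the sum-to-one rule.

```python
import numpy as np

def blended_condition(weights, shape=(256, 256)):
    """Build condition matrices for a uniform mixture of stains.

    weights: one mixing weight per trained stain; they must sum to 1
    so that the condition matrices sum to 1 at every pixel (i, j).
    Returns an array of shape (H, W, N_stains).
    """
    w = np.asarray(weights, dtype=float)
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("stain weights must sum to 1")
    return np.broadcast_to(w, tuple(shape) + (w.size,)).copy()

# Equal-parts mixture of the first two of four trained stains.
mix = blended_condition([0.5, 0.5, 0.0, 0.0])
```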
[0213] Another option is to divide the field of view of the tissue into different regions of interest (ROIs), where each ROI can be virtually stained using a specific staining agent or a mixture of these staining agents:
[0214]
[0215] Here, ROI is a defined region of interest within the field of view. Multiple non-overlapping ROIs can be defined across the field of view. In one implementation, different stains can be used for different regions of interest or microstructures. These can be user-defined and manually labeled, as explained herein, or generated algorithmically. As an example, a user can manually define various tissue regions via a GUI and stain them with different stains (Figure 16 and Figure 17). This results in different tissue components being stained differently, as shown in Figure 16 and Figure 17. Selective staining (microstructuring) of the ROIs was implemented using a Python segmentation package (Labels). Using this package, a logical mask of the ROIs is generated, which is then processed into labels for microstructuring. In another implementation, the tissue structures can be stained based on a computer-generated map, for example one obtained by segmentation software. For instance, all cell nuclei could be virtually stained with one (or more) stains, while the remainder of the tissue 22 is stained with another stain or combination of stains. Other manual, software-based, or hybrid methods can be used to select the tissue structures.
[0216] Figure 16 shows an example of a GUI on a display 106 that includes a toolbar 120, which can be used to select certain regions of interest in the network output image 40 for staining with a specific stain. The GUI may also include different stain options or a stain palette 122 that can be used to select the desired stain for the network output image 40 or for a selected region of interest within it. In this specific example, three regions identified by hashed lines (regions A, B, and C) have been manually selected by the user. In other embodiments, these regions may also be identified automatically by the image processing software 104. Figure 16 also shows a system that includes a computing device 100 containing one or more processors 102 and image processing software 104 incorporating the trained deep neural network 10.
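The step from logical ROI masks to per-pixel stain labels can be sketched as follows. This is a minimal illustration, assuming boolean masks of the same size as the field of view and one weight vector per ROI; the function name `roi_condition` is hypothetical and does not correspond to any particular segmentation package.

```python
import numpy as np

def roi_condition(shape, default_weights, rois):
    """Build per-pixel condition matrices from logical ROI masks.

    shape: (H, W) of the field of view.
    default_weights: stain weights used outside every ROI.
    rois: iterable of (mask, weights); mask is an H x W boolean array and
          weights is the stain (or stain mixture) applied inside it.
    Returns an H x W x N_stains array that sums to 1 at every pixel.
    """
    default = np.asarray(default_weights, dtype=float)
    cond = np.broadcast_to(default, tuple(shape) + (default.size,)).copy()
    for mask, weights in rois:
        cond[np.asarray(mask, dtype=bool)] = np.asarray(weights, dtype=float)
    if not np.allclose(cond.sum(axis=-1), 1.0):
        raise ValueError("condition matrices must sum to 1 at every pixel")
    return cond

# One square ROI gets the third stain; the rest keeps the first stain.
roi_mask = np.zeros((8, 8), dtype=bool)
roi_mask[:4, :4] = True
labels = roi_condition((8, 8), [1, 0, 0, 0], [(roi_mask, [0, 0, 1, 0])])
```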
[0217] Implementation details
[0218] The virtual staining network was implemented using Python version 3.6.0 with TensorFlow framework version 1.11.0. This software can be implemented on any computing device 100. For the experiments described herein, the computing device 100 was a desktop computer with a 2.30 GHz Intel Xeon W-2195 CPU and 256 GB of RAM, running the Microsoft Windows 10 operating system. Network training and testing were performed using four NVIDIA GeForce RTX 2080Ti GPUs.
[0219] Augmented training using multiple styles
[0220] Figures 20A to 20C illustrate another embodiment, in which the training of a deep neural network to act as a stain transformation network 10stainTN (Figure 20C) is augmented with additional styles. The images that are generated using the virtual staining network 10 and intended to be used as inputs for training the stain transfer network 10stainTN are normalized (or standardized). In some cases, however, augmentation with additional styles is needed, because whole-slide images generated using standard histological staining and scanning exhibit unavoidable variability, which results from different staining procedures and reagents across laboratories and from the specific characteristics of digital whole-slide scanners. This embodiment thus produces a stain transfer network 10stainTN that can accommodate a wide range of input images by incorporating multiple styles during training.
[0221] Therefore, to generalize the performance of the network 10stainTN to this staining variability, the network training is augmented with additional staining styles. A style refers to the different variability in images that may appear in chemically stained tissue samples. In this implementation, this augmentation is facilitated by K=8 unique style transfer (stain normalization) networks trained using the CycleGAN method, although other style transfer networks can be used. CycleGAN is a GAN method that uses two generators (G) and two discriminators (D). One pair is used to transform images from a first domain to a second domain; the other pair is used to transform images from the second domain to the first domain. This can be seen, for example, in Figure 21. Details regarding the CycleGAN method can be found, for example, in Zhu et al., Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, arXiv:1703.10593v1-v7 (2017-2020), which is incorporated herein by reference. These style augmentations ensure that a broad sample space is covered by the subsequent stain transformation network 10stainTN, so that the stain transfer will be effective when applied to chemically H&E-stained tissue samples, regardless of inter-technician, inter-laboratory, or inter-equipment variations.
[0222] The style transformation network 10styleTN can be used to augment the virtual staining generator/transformation network 10stainTN for virtual-to-virtual stain transformations or chemical-to-virtual stain transformations. Given the variability seen in chemical stains (e.g., H&E stains), the latter is more likely to be used with the stain transformation network. For example, there is a need in the industry to transform one type of chemical stain into another. An example is a chemical H&E stain and the need to produce special stains such as PAS, MT, or JMS. For instance, the evaluation of non-neoplastic kidney disease relies on these "special stains" to provide standard-of-care pathological assessment. In many clinical practices, the H&E stain is available before the special stains, and the pathologist can provide a "preliminary diagnosis" so that the patient's nephrologist can begin treatment. This is particularly useful in settings such as crescentic glomerulonephritis or transplant rejection, where a rapid diagnosis followed by rapid initiation of treatment can lead to significant improvements in clinical outcomes. In settings where only H&E slides are initially available, the preliminary diagnosis is followed by a final diagnosis, typically provided on the following business day. As explained herein, an improved deep neural network 10stainTN was developed to improve the preliminary diagnosis by generating three additional special stains (PAS, MT, and Jones methenamine silver (JMS)) from the H&E-stained slides. These generated stains can be reviewed by the pathologist alongside the histochemically stained H&E slide.
[0223] A set of supervised deep learning-based workflows is proposed, which enables users to perform a transformation between two stains using the stain transformation network 10 stainTN. This is achieved by first generating registered pairs of virtually stained H&E images and special stain images from the same autofluorescence image of an unlabeled tissue section (Figure 20A). These generated images can be used to transform chemically stained images into special stains. This facilitates the creation of a perfectly spatially registered (paired) dataset and allows the stain transformation network 10 stainTN to be trained without relying solely on distribution matching losses and unpaired data. Furthermore, no aberrations are introduced by misalignment, which improves the accuracy of the transformation. This was validated by evaluating kidney tissue with various non-neoplastic diseases.
[0224] A deep neural network 10 stainTN is used to perform the transformation between H&E-stained tissue and the special stains. To train this neural network, a set of additional deep neural networks 10 is used in conjunction with one another. This workflow relies on the ability of virtual staining to generate images of multiple different stains from a single unlabeled tissue section (Figure 20A). By using a single neural network 10 to generate both the H&E image and one of the special stains (PAS, MT, or JMS), a perfectly matched (i.e., paired) dataset can be created. Because the images generated by the virtual staining network 10 are normalized (standardized), the virtually stained images 40 that serve as inputs when training the stain transformation network 10 stainTN are augmented with additional staining styles to ensure generalization (Figure 20B). This is shown in Figure 20B, where the virtually stained image 40 is run through a style transfer network 10 styleTN to create the augmented, or style-transferred, output image 40'. In other words, the stain transformation network 10 stainTN is designed to handle the unavoidable variability in H&E staining that results from the staining procedures and reagents used across different laboratories, as well as from the specific characteristics of digital whole-slide scanners. This augmentation is performed by K=8 unique style transfer (staining normalization) networks 10 styleTN trained with CycleGAN (Figure 20B). These augmentations ensure that a wide H&E sample space is covered by the subsequent stain transformation network 10 stainTN, which will therefore be effective when applied to H&E-stained tissue samples regardless of inter-technician, inter-laboratory, or inter-equipment variations. Note that although the deep neural network 10 stainTN is used to transform chemical stains into one or more virtual stains, virtually stained inputs with different staining styles are used for its augmented training.
[0225] Using this dataset, the stain transformation network 10 stainTN can be trained using the scheme shown in Figure 20C. The network 10 stainTN is randomly fed image patches that come either from the virtually stained tissue 40 (upper path) or from virtually stained images passed through one of the K=8 style transfer networks 10 styleTN (left path), and it generates a histochemical stain-transferred output image 40''. The corresponding special stain (the virtual special stain from the same unlabeled field of view) is used as the ground truth 48, regardless of the CycleGAN style transfer. The network 10 stainTN is then blindly tested (on patients/cases with which the network was not trained) on various digitized H&E slides obtained from the UCLA repository, which represent a cohort of diseases and staining variations.
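By way of non-limiting illustration, the random input selection described for Figure 20C might be sketched as follows. The function names, the dictionary-based patch representation, and the 50/50 split between the unmodified and style-transferred paths are assumptions made only for this example; the placeholder "style networks" merely tag a patch rather than restyle it.

```python
import random

K_STYLES = 8  # number of style transfer networks stated in the text


def make_style_networks(k):
    """Return k placeholder style-transfer callables (hypothetical stand-ins)."""
    def make(idx):
        # A real network would restyle the H&E patch; here it is only tagged.
        return lambda patch: {"pixels": patch["pixels"], "style": idx}
    return [make(i) for i in range(1, k + 1)]


def sample_training_pair(virtual_he, virtual_special, style_nets, rng,
                         p_original=0.5):
    """Choose the network input for one training step.

    With probability p_original (an assumed value) the unmodified virtually
    stained H&E patch is used; otherwise the patch is passed through one
    randomly chosen style network. The ground truth is the virtual special
    stain of the same field of view in both cases.
    """
    if rng.random() < p_original:
        net_input = virtual_he
    else:
        net_input = rng.choice(style_nets)(virtual_he)
    return net_input, virtual_special
```

In this sketch the ground truth is returned unchanged on both paths, which mirrors the statement that the special stain target does not depend on the style transfer.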
[0226] Method of Stain Style Transfer for Data Augmentation
[0227] To ensure that the neural network 10 stainTN can be applied to tissue sections stained with a wide variety of H&E styles, the CycleGAN model is used to augment the training dataset by performing style transformation with a style transformation network 10 styleTN (Figure 20B). This neural network 10 styleTN learns the mapping between two domains X and Y given training samples x and y, where X is the domain of the original virtually stained H&E (image 40 x in Figure 21), and Y is the domain of the H&E images 48 y generated by different laboratories or hospitals. The model performs two mappings, G: X→Y and F: Y→X. In addition, two adversarial discriminators $D_X$ and $D_Y$ are introduced. Figure 21 illustrates the relationships between these different networks. As shown in Figure 21, the virtual H&E image 40 x (style X) is input into generator G(x) to create a generated H&E image 40 y with style Y. A second generator F(G(x)) is then used to map back to the X domain to generate image 40 x. Referring to Figure 21, the histochemically stained H&E image 40 y (style Y) is input into generator F(y) to create a generated H&E image 48 x with style X. A second generator G(F(y)) is then used to map back to the Y domain to generate image 48 y. Figure 22A and Figure 22B show the structure of the two generator networks G(x) (Figure 22A) and F(y) (Figure 22B).
[0228] The generator loss function $l_{generator}$ contains two terms: an adversarial loss $l_{adv}$, which matches the staining style of the generated image to the style of histochemically stained images in the target domain, and a cycle consistency loss $l_{cycle}$, which prevents the learned mappings G and F from contradicting each other. The total loss is therefore described by the following equation:
[0229] $l_{generator} = \lambda \times l_{cycle} + \varphi \times l_{adv}$ (22)
[0230] where $\lambda$ and $\varphi$ are constants used to weight the loss function. For all networks, $\lambda$ is set to 10 and $\varphi$ is set equal to 1. As shown in Figure 21, each generator is associated with a discriminator ($D_X$ or $D_Y$), which ensures that the generated image matches the distribution of the ground truth. The loss of each generator network can be written as:
[0231] $l_{adv}^{X \to Y} = (1 - D_Y(G(x)))^2$ (23)
[0232] $l_{adv}^{Y \to X} = (1 - D_X(F(y)))^2$ (24)
[0233] Furthermore, the cycle consistency loss can be described as:
[0234] $l_{cycle} = L_1\{y, G(F(y))\} + L_1\{x, F(G(x))\}$ (25)
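[0235] The L1 loss, or mean absolute error loss, is given by the following formula:
[0236] $L_1(A, B) = \frac{1}{P \times Q} \sum_{p} \sum_{q} \left| A_{p,q} - B_{p,q} \right|$ (26)
[0237] In this equation, A and B are the two images being compared, p and q are pixel indices, and P and Q are the total numbers of pixels along the two lateral dimensions.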
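By way of non-limiting illustration, the generator loss terms described above might be computed as in the following sketch. Images are represented as nested Python lists and each discriminator output as a single scalar, both simplifications made only for this example. The function and parameter names are hypothetical, and the pairing of the weight 10 with the cycle term and the weight 1 with the adversarial term is an assumption.

```python
def l1_loss(a, b):
    """Mean absolute error between two equally sized 2-D images (lists of rows)."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += abs(va - vb)
            count += 1
    return total / count


def adversarial_loss(d_out):
    """Generator adversarial term (1 - D(generated))^2, as in equations (23)-(24)."""
    return (1.0 - d_out) ** 2


def cycle_loss(x, x_cycled, y, y_cycled):
    """Equation (25): L1{y, G(F(y))} + L1{x, F(G(x))}."""
    return l1_loss(y, y_cycled) + l1_loss(x, x_cycled)


def generator_loss(l_cycle, l_adv, lam=10.0, adv_weight=1.0):
    """Weighted sum of both terms; which weight scales which term is assumed."""
    return lam * l_cycle + adv_weight * l_adv


# Toy values: x is reconstructed imperfectly, y is reconstructed exactly.
x = [[0.0, 1.0], [0.5, 0.5]]
x_cycled = [[0.1, 0.9], [0.5, 0.7]]
y = [[0.2, 0.4], [0.6, 0.8]]
y_cycled = [[0.2, 0.4], [0.6, 0.8]]

l_cyc = cycle_loss(x, x_cycled, y, y_cycled)
l_adv = adversarial_loss(0.8)  # 0.8 is a made-up discriminator score
total = generator_loss(l_cyc, l_adv)
```

Because the cycle term is weighted ten times more heavily in this sketch, reconstruction errors dominate the total unless the discriminator rejects the generated image strongly.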
[0238] The adversarial loss terms used to train $D_X$ and $D_Y$ are defined as:
[0239] $l_{D_X} = (1 - D_X(x))^2 + D_X(F(y))^2$ (27)
[0240] $l_{D_Y} = (1 - D_Y(y))^2 + D_Y(G(x))^2$ (28)
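By way of non-limiting illustration, a discriminator objective might be sketched as follows. The sketch assumes a least-squares form that mirrors the generator terms in equations (23) and (24), in which the discriminator is pushed toward 1 on real images and toward 0 on generated images. Scalar discriminator outputs and the function name are simplifications introduced only for this example.

```python
def discriminator_loss(d_real, d_fake):
    """Assumed least-squares discriminator loss.

    d_real: discriminator output for a real image of the domain.
    d_fake: discriminator output for an image generated into that domain.
    """
    return (1.0 - d_real) ** 2 + d_fake ** 2


# For D_X, the real sample is x and the generated sample is F(y);
# for D_Y, the real sample is y and the generated sample is G(x).
loss_dx = discriminator_loss(d_real=0.9, d_fake=0.2)  # made-up scores
loss_dy = discriminator_loss(d_real=0.5, d_fake=0.5)  # undecided discriminator
```

A discriminator that outputs exactly 1 for real images and 0 for generated images has zero loss under this form.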
[0241] For these CycleGAN models, G and F use a U-net architecture. This architecture consists of three "down-blocks" 220 followed by three "up-blocks" 222. Each down-block 220 and up-block 222 contains three convolutional layers with a 3×3 kernel size, each activated by the LeakyReLU activation function. Each down-block 220 doubles the number of channels and ends with an average pooling layer with a stride and kernel size of 2. Each up-block 222 begins with bicubic upsampling before the convolutional layers are applied. Skip connections are used between the blocks of a given layer to pass data through the network without having to go through all the blocks.
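By way of non-limiting illustration, the following sketch traces how the tensor shape might change through the three channel-doubling, pooling blocks and the three upsampling blocks of such a generator. The input size of 256×256 pixels, the starting count of 32 channels, and the halving of channels in each upsampling block are assumptions made only for this example and are not stated in the description.

```python
def trace_unet_shapes(height, width, channels, n_blocks=3):
    """Trace (name, height, width, channels) through the generator blocks.

    Pooling block: channels double, then 2x2 average pooling with stride 2
    halves each spatial dimension.
    Upsampling block: 2x upsampling doubles each spatial dimension, then the
    channel count is halved (the halving is an assumption).
    """
    shapes = [("input", height, width, channels)]
    for i in range(n_blocks):
        channels *= 2
        height //= 2
        width //= 2
        shapes.append((f"down{i + 1}", height, width, channels))
    for i in range(n_blocks):
        height *= 2
        width *= 2
        channels //= 2
        shapes.append((f"up{i + 1}", height, width, channels))
    return shapes


shapes = trace_unet_shapes(256, 256, 32)
for name, h, w, c in shapes:
    print(f"{name}: {h} x {w} x {c}")
```

Under these assumptions, the output of the final upsampling block has the same shape as the input, and blocks at the same depth have matching shapes, which is what allows a skip connection to join them.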
[0242] The discriminators $D_X$ and $D_Y$ consist of four blocks. Each block contains two convolutional layer and LeakyReLU pairs, which together double the number of channels, followed by an average pooling layer with a stride of 2. After five blocks, two fully connected layers reduce the output dimension to a single value.
[0243] During training, an adaptive moment estimation (Adam) optimizer is used to update the learnable parameters, with a learning rate of 2 × 10^-5 for both the generator (G) and discriminator (D) networks. For each step of discriminator training, one iteration of generator training is performed, and the batch size used for training is set to 6.
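By way of non-limiting illustration, the Adam update and the one-to-one alternation between discriminator and generator steps might be sketched as follows. The learning rate is taken as 2 × 10^-5. The moment decay rates (0.9 and 0.999) and the epsilon value are commonly used defaults assumed here, not values from the description, and the gradient functions are hypothetical stand-ins for backpropagation through the networks.

```python
import math


class Adam:
    """Minimal Adam optimizer for a flat list of float parameters."""

    def __init__(self, lr=2e-5, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # first-moment (mean) estimates
        self.v = None  # second-moment (uncentered variance) estimates
        self.t = 0     # number of steps taken

    def step(self, params, grads):
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


def alternate_training(num_steps, d_params, g_params, d_grad_fn, g_grad_fn):
    """Run one discriminator update followed by one generator update per step."""
    d_opt, g_opt = Adam(), Adam()
    for _ in range(num_steps):
        d_params = d_opt.step(d_params, d_grad_fn(d_params))
        g_params = g_opt.step(g_params, g_grad_fn(g_params))
    return d_params, g_params
```

On the first step the bias-corrected ratio has magnitude close to 1, so each parameter moves by approximately the learning rate regardless of the size of its gradient.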
[0244] In some implementations, the style transformation network 10 styleTN can include normalization of the staining vectors. The trained network 10 stainTN can also be trained in a supervised manner between two different variations of the same stained slide, where the variation results from imaging the sample with different microscopes or from histochemical staining followed by a second re-staining of the same slide. The trained network 10 stainTN can be trained from a set of paired virtually stained images, each stained with a different virtual stain and generated by a single virtual staining neural network 10.
[0245] Although embodiments of the invention have been shown and described, various modifications may be made without departing from the scope of the invention. For example, while different embodiments have been described as generating digitally/virtually stained microscopic images of unlabeled or unstained samples, the methods may also be used where the sample is labeled with one or more exogenous fluorescent markers or other exogenous light emitters. Such samples are labeled, but not with conventional immunohistochemistry (IHC) stains. Therefore, the invention should not be limited except by the appended claims and their equivalents.
Claims
1. A method for generating a virtually stained microscopic image of a sample, comprising: providing a deep neural network trained by image processing software using one or more processors of a computing device, wherein the deep neural network is trained with multiple pairs of stained and matched microscopic images or image patches, wherein each pair of stained and matched microscopic images or image patches includes an input image of the sample virtually or chemically stained with a first staining agent type and an output image of the sample virtually or chemically stained with a different second staining agent type; obtaining the input image of the sample virtually or chemically stained with the first staining agent type; inputting the input image of the sample into the deep neural network; and converting, by the deep neural network, the input image into the output image of the sample, wherein the output image is virtually stained with the second staining agent type so as to resemble and match a stained microscopic image of the same sample chemically stained with the second staining agent type.
2. The method according to claim 1, wherein the input image includes a virtually stained image generated by the output of at least one digitization algorithm or a virtually stained image generated by a separately trained neural network.
3. The method according to claim 1, wherein the input image includes a virtually stained image generated by the output of a machine learning algorithm or a virtually stained image generated by a separately trained neural network.
4. The method according to claim 1, wherein the input image includes a stained image of a chemically stained sample obtained using an incoherent microscope.
5. The method according to claim 1, wherein the deep neural network is trained using augmented training images produced by a style transformation network or algorithm.
6. The method according to claim 5, wherein the style transformation network includes a CycleGAN-based style transformation network.
7. The method according to claim 5, wherein the style transformation network performs normalization of the staining vectors.
8. The method according to claim 5, wherein the deep neural network is trained in a supervised manner between two different variations of the same stained slide, wherein the variations are the result of imaging the sample with different microscopes or of chemical staining followed by a second re-staining of the same slide.
9. The method according to claim 5, wherein the deep neural network is trained with a set of paired virtually stained images generated by a single virtual staining neural network or by two different virtual staining neural networks.
10. The method according to claim 1, wherein the chemical staining agent is one of the following: an acidic staining agent, a basic staining agent, or a silver staining agent.
11. The method according to claim 4, wherein the incoherent microscope used to obtain the input image that is input into the deep neural network includes one of the following: a fluorescence microscope, a wide-field microscope, a super-resolution microscope, a confocal microscope, a confocal microscope with single-photon or multi-photon excited fluorescence, a second-harmonic or high-harmonic generation fluorescence microscope, a light sheet microscope, a FLIM microscope, a bright-field microscope, a dark-field microscope, a structured illumination microscope, a total internal reflection microscope, a computational microscope, a polarizing microscope, a synthetic aperture-based microscope, or a phase-contrast microscope.
12. The method according to claim 4, wherein the stained image of the sample obtained using the incoherent microscope is out of focus within an axial range around the plane of the in-focus image.
13. The method according to claim 1, wherein the first staining agent type includes hematoxylin and eosin (H&E).
14. The method according to claim 13, wherein the second staining agent type includes a Jones methenamine silver staining agent, a Masson's trichrome staining agent, or a periodic acid-Schiff (PAS) staining agent.