Image recognition based on few-sample learning for tissue-level whole-slide digital sections

By combining convolutional neural networks and support vector machines with pathologist input, the problem of unreliable manual interpretation in histopathology is solved, achieving efficient and accurate identification of tissue lesions.

CN117115548B · Active · Publication Date: 2026-06-26 · NANTOMICS LLC

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: NANTOMICS LLC
Filing Date: 2018-09-12
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

In existing technologies, methods for examining whether a tissue pathology is diseased rely on manual interpretation, resulting in unreliable, expensive, and time-consuming results. Furthermore, neural network methods require a large amount of training data, which is difficult to obtain.

Method used

Using convolutional neural networks and support vector machines, combined with pathologist input, we identify whether tissues are diseased through computer vision, and perform sample analysis using a limited number of positive and negative training patches.

Benefits of technology

It improves the accuracy and efficiency of tissue lesion identification, reduces reliance on manual interpretation, and decreases the need for large amounts of training data.

✦ Generated by Eureka AI based on patent content.


Abstract

A computer-implemented method is provided that generates regions of interest of at least one shape in a digital image. The method includes obtaining, by an image processing engine, access to a digital tissue image of a biological sample; tiling, by the image processing engine, the digital tissue image into a set of image patches; obtaining, by the image processing engine, a plurality of features from each patch in the set of image patches, the plurality of features defining a patch feature vector in a multi-dimensional feature space that includes the plurality of features as dimensions; determining, by the image processing engine, a user selection of a subset of patches in the set of image patches; classifying, by applying a trained classifier to patch vectors of other patches in the set of patches, the other patches as belonging or not belonging to a same class of interest as the subset of user-selected patches; and identifying one or more regions of interest based at least in part on results of the classifying.
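The tiling and feature-extraction steps of the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the image is assumed to be a small grayscale grid held in nested lists, the patches are assumed square and non-overlapping, and the two-value feature vector (mean and variance) is a hypothetical stand-in for the features that the method obtains for each patch. The names `tile_image` and `patch_feature_vector` are invented for this example.

```python
from typing import List, Tuple

Image = List[List[float]]          # grayscale pixels, row-major (assumed representation)
Patch = Tuple[int, int, Image]     # (top row, left column, pixel block)


def tile_image(image: Image, patch_size: int) -> List[Patch]:
    """Split an image into non-overlapping square patches, dropping partial edges."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            block = [row[left:left + patch_size] for row in image[top:top + patch_size]]
            patches.append((top, left, block))
    return patches


def patch_feature_vector(patch: Patch) -> List[float]:
    """Toy feature vector (assumption): mean and variance of pixel intensity."""
    _, _, block = patch
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [mean, variance]


# A synthetic 4x6 "image": the four left columns are dark, the two right columns bright.
image = [[0.0, 0.0, 0.0, 0.0, 1.0, 1.0] for _ in range(4)]
patches = tile_image(image, patch_size=2)
features = [patch_feature_vector(p) for p in patches]
```

Each entry of `features` is one point in the multi-dimensional feature space; a classifier trained on the user-selected patches would then be applied to the remaining points.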

Description

[0001] Cross-reference to related applications

[0002] This application is a divisional application of application number 201880059228.4, entitled "Image Recognition Based on Few-Sample Learning for Tissue-Level Full-View Digital Slices". This application claims priority to U.S. Provisional Application No. 62/557,737, filed September 12, 2017, and relates to U.S. Application No. 15/791,209, filed October 23, 2017, and U.S. Provisional Application No. 62/411,290, filed October 21, 2016, each of which has previously been filed with the U.S. Patent and Trademark Office. All the foregoing applications and all other documents cited herein are incorporated herein by reference in their entirety.

[0003] Introduction

[0004] The technology of this invention generally relates to identification, and in some embodiments, to image recognition. In some embodiments, the technology relates to histopathology, microscopic examination of tissues for determining whether tissue is diseased and / or studying diseased tissue. Tissue can be removed from any part of the body, including specimens of, for example, breast lumps, intestines, kidneys, liver, endometrium, lungs, chest, lymph nodes, muscles, nerves, skin, testes, thyroid gland, etc.

[0005] In some embodiments, the techniques disclosed herein relate to identifying regions of interest within digital images, such as identifying foreground objects from a background scene, or identifying cancer cells within a digital histopathological image. The cancer types of the cancer cells may include, but are not limited to, breast cancer, bladder cancer, brain cancer, lung cancer, pancreatic cancer, skin cancer, colorectal cancer, prostate cancer, stomach cancer, liver cancer, cervical cancer, esophageal cancer, leukemia, non-Hodgkin's lymphoma, kidney cancer, uterine cancer, bile duct cancer, bone cancer, ovarian cancer, gallbladder cancer, gastrointestinal cancer, oral cancer, laryngeal cancer, eye cancer, pelvic cancer, spinal cord cancer, testicular cancer, vaginal cancer, vulvar cancer, and thyroid cancer. Regions of interest and classes of interest can also be broader and include abnormal tissue, benign tissue, malignant tissue, bone tissue, skin tissue, nerve tissue, mesenchymal tissue, muscle tissue, connective tissue, scar tissue, lymphatic tissue, fat, epithelial tissue, and blood vessels.

[0006] Tissue can be collected from subjects in a variety of settings, including biopsies, surgeries, or autopsies. After removal from the subject, the tissue is chemically fixed by placing it in a fixative (such as formalin) to prevent decay. The tissue is then frozen or placed in molten paraffin. Tissue sections are then cut and placed on glass slides.

[0007] Once a tissue section is placed on a glass slide, a pathologist examines it under a microscope to determine, for example, whether the tissue is diseased, and if so, the stage of the disease. For instance, a pathologist can determine whether a breast lump contains breast cancer cells, and if so, the grade and / or stage of the cancer. Besides determining whether something is diseased, pathologists can make other determinations about the tissue. For example, a pathologist can determine whether the tissue contains lymphocytes. However, these determinations present a technical problem: they are often unreliable, expensive, time-consuming, and typically require verification by multiple pathologists to minimize the possibility of incorrect determinations.

[0008] One solution to this technical problem is to use computer vision to determine tissue characteristics (such as the type and / or grade of cancer) by training neural networks (or other machine learning systems) to determine whether a digital image of the tissue shows diseased tissue, and to determine the type of disease (e.g., breast cancer) and / or stage (e.g., stage 3). However, this approach has its own technical problem: it requires a large amount of training data for each disease (e.g., a large number of positive and negative training patches would be needed for each of the various cancers).

[0009] Some embodiments of the present invention address the aforementioned technical problems and provide a technical solution for using neural networks (more specifically, convolutional neural networks) and support vector machines in conjunction with limited inputs (such as from pathologists or other individuals or entities) to determine whether tissue is likely to be diseased.

Description of the Drawings

[0010] The patent or application documents contain at least one color drawing. A copy of this patent or patent application disclosure with one or more color drawings will be provided by the official authority upon request and after payment of the necessary fees.

[0011] Figure 1 shows a block diagram of a distributed computer system in which one or more aspects of embodiments of the present invention can be implemented;

[0012] Figure 2 shows a block diagram of an electronic device that can implement one or more aspects of embodiments of the present invention;

[0013] Figure 3 shows an architectural diagram of an electronic device that can implement one or more aspects of embodiments of the present invention;

[0014] Figure 4A shows a general deep learning architecture that can implement one or more aspects of embodiments of the present invention;

[0015] Figure 4B shows a layer of a convolutional neural network that can implement one or more aspects of embodiments of the present invention;

[0016] Figure 4C shows a hardware architecture diagram of an apparatus that can implement one or more aspects of embodiments of the present invention;

[0017] Figures 5A to 5D illustrate a process performed by an electronic device capable of implementing one or more aspects of embodiments of the present invention;

[0018] Figure 6A shows two classes of objects according to one or more aspects of embodiments of the present invention;

[0019] Figure 6B shows a neural network's classification results for the objects in Figure 6A;

[0020] Figure 6C shows two classes of objects according to one or more aspects of embodiments of the present invention;

[0021] Figure 6D shows a neural network's classification results for the objects in Figure 6C;

[0022] Figures 7A to 7C show simplified diagrams illustrating multiple tissue patches to be processed by an electronic device implementing one or more aspects of embodiments of the present invention;

[0023] Figure 8 shows a one-class support vector machine (SVM) with a radial basis function (RBF) kernel according to one or more aspects of embodiments of the present invention;

[0024] Figures 9A to 9B illustrate a two-class SVM according to one or more aspects of embodiments of the present invention;

[0025] Figure 10A shows a representation of positive patches after analysis by a two-class SVM according to one or more aspects of embodiments of the present invention;

[0026] Figure 10B shows a diagram illustrating the convex hull around a region of interest generated by an electronic device implementing one or more aspects of embodiments of the present invention; and

[0027] Figures 11A to 11I show slide images generated by an electronic device that implements one or more aspects of embodiments of the present invention, as displayed on a graphical user interface.

[0028] Although the invention has been described with reference to the accompanying drawings, the drawings are intended to be illustrative, and other embodiments within the spirit of the invention are contemplated.

Detailed Description

[0029] The invention will now be described more fully below with reference to the accompanying drawings, which illustrate specific embodiments in which the invention may be practiced. However, the invention can be implemented in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be exhaustive and complete, and will fully convey the scope of the invention to those skilled in the art. Among other things, the invention can be implemented as an apparatus or method. Therefore, the invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Therefore, the following detailed description should not be considered limiting.

[0030] Throughout the specification and claims, unless the context clearly indicates otherwise, the following terms have the meaning explicitly associated herein. The phrases “in one embodiment,” “in an embodiment,” etc., as used herein do not necessarily refer to the same embodiment, although they may. Furthermore, the phrase “in another embodiment,” as used herein, does not necessarily refer to different embodiments, although they may. Therefore, as described below, various embodiments of the invention can be readily combined without departing from the scope or spirit of the invention.

[0031] Additionally, as used herein, unless the context explicitly states otherwise, the term “or” is an inclusive “or” operator and is equivalent to the term “and / or”. Unless the context explicitly states otherwise, the term “based on” is not exclusive and allows for basing on additional factors not described. Furthermore, throughout the specification, the meanings of “a,” “an,” and “the” include plural references. The meaning of “in” includes both “in” and “on”.

[0032] It should be noted that the description herein is not intended as an extensive overview, and therefore, concepts may be simplified for the sake of clarity and conciseness.

[0033] All documents referenced in this application are incorporated herein by reference in their entirety. Any process described in this application may be performed in any order, and any step in the process may be omitted. A process may also be combined with other processes or steps of other processes.

[0034] Figure 1 shows components of one embodiment of an environment in which the invention can be practiced. Not all components are required for practicing the invention, and the arrangement and type of components can be changed without departing from the spirit or scope of the invention. As shown, system 100 includes one or more local area networks (“LAN”) / wide area networks (“WAN”) 112, one or more wireless networks 110, one or more wired or wireless client devices 106, mobile or other wireless client devices 102-106, servers 107-109, an optical microscope system 111, a laser 113, and may include or communicate with one or more data storage devices or databases. Various client devices 102-106 may include, for example, desktop computers, laptop computers, set-top boxes, tablet computers, mobile phones, smartphones, etc. Servers 107-109 may include, for example, one or more application servers, content servers, search servers, web servers, graphics processing unit (GPU) servers, etc.

[0035] The optical microscope system 111 may include a microscope, an eyepiece assembly, a camera, a slide platform, and other components such as those of the electronic device 200 shown in Figure 2. Although Figure 1 shows the optical microscope system 111 communicatively coupled to the network 112, it can also be coupled to any or all of the servers 107-109, the wireless network 110, and / or any of the client devices 102-106.

[0036] A laser 113 that can be connected to network 112 can be used to cut a portion of tissue that is believed to have cancer cells (or other types of cells).

[0037] Figure 2 shows a block diagram of an electronic device 200 according to an embodiment of the present invention, which can implement one or more aspects of a system and method for interactive video generation and rendering. Examples of the electronic device 200 may include a server (e.g., servers 107-109), an optical microscope system 111, and client devices (e.g., client devices 102-106). Typically, the electronic device 200 may include a processor / CPU 202, a memory 230, a power supply 206, and input / output (I / O) components / devices 240, such as microphones, speakers, displays, smartphone displays, touchscreens, keyboards, mice, keypads, GPS components, etc., operable to provide a graphical user interface.

[0038] Users can provide input via the touchscreen of electronic device 200. The touchscreen can determine whether a user is providing input, for example, by determining whether the user is touching the touchscreen with a part of the user's body (such as his or her finger). Electronic device 200 may also include a communication bus 204 connecting the aforementioned elements of electronic device 200. Network interface 214 may include a receiver and transmitter (or transceiver) and one or more antennas for wireless communication.

[0039] Processor 202 may include one or more of any type of processing device, such as a central processing unit (CPU). Furthermore, the processor may be, for example, central processing logic or other logic, and may include hardware, firmware, software, or a combination thereof to perform one or more functions or actions, or to cause one or more functions or actions from one or more other components. Moreover, depending on the desired application or requirements, the central processing logic or other logic may include, for example, a software-controlled microprocessor, discrete logic (e.g., application-specific integrated circuits (ASICs), programmable / programmed logic devices, memory devices containing instructions, etc.), or combinational logic implemented in hardware. Additionally, the logic may also be implemented entirely as software. Electronic device 200 may also include a GPU (not shown), i.e., dedicated electronic circuitry designed to manipulate and modify memory to accelerate the creation and processing of images intended to be output to a frame buffer of a display device.

[0040] The memory 230, which may include random access memory (RAM) 212 and read-only memory (ROM) 232, may be implemented by one or more of any type of memory device, such as a main storage device (directly accessible by the CPU) or an auxiliary storage device (indirectly accessible by the CPU) (e.g., flash memory, disk, optical disk, etc.). RAM may include an operating system 221, a data storage device 224 that may include one or more databases, and software programs and / or application programs 222 that may include, for example, digital histopathology and microanatomy programs 223. ROM 232 may also include a basic input / output system (BIOS) 220 for electronic devices.

[0041] Program 223 is intended to broadly include or represent all programs, applications, algorithms, software, and other tools necessary for implementing or promoting the methods and systems according to embodiments of the present invention. Elements of the systems and methods for interactive video generation and rendering programs may reside on a single server computer or may be distributed among multiple computers, servers, devices, or entities that may include advertisers, publishers, data providers, etc. If the systems and methods for interactive video generation and rendering programs are distributed among multiple computers, servers, devices, or entities, such multiple computers would communicate, for example, as shown in Figure 1.

[0042] The power supply 206 includes one or more power supply components and facilitates the supply and management of power to the electronic device 200.

[0043] The input / output components, including input / output (I / O) interface 240, may include, for example, any interface used to facilitate communication between components of electronic device 200, components of external devices (e.g., components of other devices in network or system 100), and end users. For example, such a component may include a network interface card (NIC), which may be an integration of a receiver, transmitter, transceiver, and one or more input / output interfaces. For example, the NIC may facilitate wired or wireless communication with other devices in the network. In the case of wireless communication, an antenna may facilitate such communication. Furthermore, some of the input / output interfaces 240 and bus 204 may facilitate communication between components of electronic device 200 and, in this example, may simplify the processing performed by processor 202.

[0044] In the case where electronic device 200 is a server, it may include a computing device capable of transmitting or receiving signals, for example, via a wired or wireless network, or capable of processing signals or storing signals in memory, for example, as a physical memory state. The server may be an application server, which includes configuration for providing one or more applications (e.g., aspects of systems and methods for interactive video generation and rendering) to another device via a network. Furthermore, the application server may, for example, host a website that provides a user interface for managing example aspects of systems and methods for interactive video generation and rendering.

[0045] Any computing device capable of sending, receiving, and processing data via wired and / or wireless networks can act as a server, for example, to facilitate aspects of implementations of systems and methods for interactive video generation and rendering. Therefore, devices acting as servers can include devices such as dedicated rack servers, desktop computers, laptop computers, set-top boxes, integrated devices combining one or more of the aforementioned devices, etc.

[0046] Server configurations and capabilities can vary greatly, but they typically include one or more central processing units, memory, large-capacity data storage devices, power supplies, wired or wireless network interfaces, input / output interfaces, and operating systems such as Windows Server, Mac OS X, Unix, Linux, and FreeBSD.

[0047] A server may include, for example, a device configured to provide data or content to another device via one or more networks, or include configurations for providing data or content to another device via one or more networks, such as aspects of example systems and methods for interactive video generation and rendering. For example, one or more servers may be used to host websites, such as www.microsoft.com. One or more servers may host various types of sites, such as, for example, business sites, information sites, social networking sites, educational sites, wikis, financial sites, government sites, personal sites, etc.

[0048] The server can also provide various services, such as web services, third-party services, audio services, video services, email services, HTTP or HTTPS services, instant messaging (IM) services, short message service (SMS) services, multimedia messaging service (MMS) services, file transfer protocol (FTP) services, voice over IP (VoIP) services, calendar services, telephone services, etc., all of which can work in conjunction with example aspects of the sample systems and methods used for interactive video generation and rendering. Content can include, for example, text, images, audio, video, etc.

[0049] In examples of systems and methods for interactive video generation and rendering, the client device may include, for example, any computing device capable of sending and receiving data via wired and / or wireless networks. Such client devices may include desktop computers as well as portable devices such as cellular phones, smartphones, display pagers, radio frequency (RF) devices, infrared (IR) devices, personal digital assistants (PDAs), handheld computers, GPS-enabled devices, tablet computers, sensor-equipped devices, laptop computers, set-top boxes, wearable computers, integrated devices combining one or more of the aforementioned devices, etc.

[0050] Client devices that can be used in example systems and methods for interactive video generation and rendering can vary widely in terms of capabilities and features. For example, a cellular phone, smartphone, or tablet computer might have a numeric keypad and a monochrome liquid crystal display (LCD) that can only display a few lines of text. In another example, a web-enabled client device could have a physical or virtual keyboard, data storage devices (such as flash memory or SD cards), an accelerometer, a gyroscope, GPS or other location-aware capabilities, and a 2D or 3D color touch-sensitive screen that can display both text and graphics simultaneously.

[0051] For example, client devices (such as client devices 102-106) that can be used in the example systems and methods for interactive video generation and rendering can run various operating systems, including personal computer operating systems such as Windows, iOS, or Linux, and mobile operating systems such as iOS, Android, and Windows Mobile. The client device can be used to run one or more applications configured to send or receive data from another computing device. Client applications can provide and receive text content, multimedia information, etc. Client applications can perform actions such as: browsing web pages, using web search engines, interacting with various apps stored on a smartphone, sending and receiving messages via email, SMS, or MMS, playing games (e.g., Fantasy Sports League), receiving advertisements, watching locally stored or streamed videos, or participating in social networks.

[0052] In examples of systems and methods for interactive video generation and rendering, one or more networks (such as network 110 or 112) can, for example, couple server and client devices to other computing devices, including coupling to client devices via wireless networks. A network can be enabled to use any form of computer-readable medium to transfer information from one electronic device to another. In addition to local area networks (LANs), wide area networks (WANs), direct connections (such as via a universal serial bus (USB) port), other forms of computer-readable media, or any combination thereof, networks can also include the Internet. On a set of interconnected LANs (including LANs based on different architectures and protocols), routers act as links between LANs, enabling data to be sent between them.

[0053] Communication links within a LAN may include twisted-pair or coaxial cables, while communication links between networks may utilize analog telephone lines, cable lines, optical lines, full or partial dedicated digital lines including T1, T2, T3, and T4, Integrated Services Digital Network (ISDN), Digital Subscriber Line (DSL), wireless links including satellite links, fiber optic links, or other communication links known to those skilled in the art. Furthermore, remote computers and other related electronic devices may be remotely connected to the LAN or WAN via modems and telephone links.

[0054] In example systems and methods for interactive video generation and rendering, a wireless network (such as wireless network 110) can couple the device to the network. The wireless network can be a standalone self-organizing network, a mesh network, a wireless LAN (WLAN) network, a cellular network, etc.

[0055] Wireless networks can further include autonomous systems such as terminals, gateways, and routers connected via wireless radio links. These connected devices can be configured to move freely and randomly and organize themselves arbitrarily, allowing the topology of the wireless network to change rapidly. Wireless networks can further employ various access technologies, including second-generation (2G), third-generation (3G), fourth-generation (4G), Long Term Evolution (LTE) radio access for cellular systems, WLAN, and Wireless Router (WR) meshes. Access technologies such as 2G, 2.5G, 3G, and 4G, as well as future access networks, can provide wide-area coverage for client devices (e.g., client devices with various mobility). For example, wireless networks can achieve radio connectivity through radio network access technologies such as Global System for Mobile Communications (GSM), Universal Mobile Telecommunications System (UMTS), General Packet Radio Service (GPRS), Enhanced Data GSM Environment (EDGE), 3GPP Long Term Evolution (LTE), Advanced LTE, Wideband Code Division Multiple Access (WCDMA), Bluetooth, and 802.11b / g / n. Wireless networks can essentially include any wireless communication mechanism through which information can be propagated between client devices and other computing devices, networks, etc.

[0056] The Internet Protocol (IP) is used to transmit data communication packets over networks participating in digital communication networks, and can include protocols such as TCP / IP, UDP, DECnet, NetBEUI, IPX, and AppleTalk. Versions of the Internet Protocol include IPv4 and IPv6. The Internet includes Local Area Networks (LANs), Wide Area Networks (WANs), wireless networks, and long-distance public networks, which allow the transmission of data packets between LANs. Data packets can be transmitted between nodes in a network connected to a site, each node having a unique local network address. Data communication packets can be sent from a user site over the Internet via access nodes connected to the Internet. If the destination site's address is included in the packet header, the packet can be forwarded to any destination site connected to the network by network nodes. Each data packet transmitted over the Internet can be routed via paths determined by gateways and servers, which switch packets based on the destination address and the availability of network paths to connect to the destination site.

[0057] The header of a data packet may include, for example, source port (16 bits), destination port (16 bits), sequence number (32 bits), acknowledgment number (32 bits), data offset (4 bits), reserved (6 bits), checksum (16 bits), urgent pointer (16 bits), options (variable length, a multiple of 8 bits), and padding (which may consist of all zeros and include multiple bits to ensure the header ends with a 32-bit boundary). The number of bits for each of these can also be higher or lower.
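The field widths listed above match the fixed part of a TCP header, so the layout can be illustrated with Python's standard `struct` module. This is only a sketch under that assumption: the 6 flag bits and the 16-bit window field are not named in the paragraph and are added here so that the fixed header packs to whole 32-bit words, and options and padding are omitted. The helper names `pack_header` and `unpack_header` are hypothetical.

```python
import struct

# Assumed layout: the fixed 20-byte TCP header. The flag bits and the window
# field are NOT listed in the paragraph above; they are included only so the
# fields fill whole 32-bit words.
HEADER_FORMAT = "!HHIIHHHH"  # "!" selects network (big-endian) byte order


def pack_header(src_port, dst_port, seq, ack, flags=0, window=0, checksum=0, urgent=0):
    """Pack the fixed header fields into 20 bytes (no options, no padding)."""
    data_offset = 5  # header length in 32-bit words when there are no options
    # One 16-bit field holds: 4-bit data offset, 6 reserved bits (zero), 6 flag bits.
    offset_reserved_flags = (data_offset << 12) | (flags & 0x3F)
    return struct.pack(HEADER_FORMAT, src_port, dst_port, seq, ack,
                       offset_reserved_flags, window, checksum, urgent)


def unpack_header(raw):
    """Recover the fields named in the paragraph from the first 20 bytes."""
    src, dst, seq, ack, off_flags, _window, checksum, urgent = struct.unpack(
        HEADER_FORMAT, raw[:20])
    return {
        "source_port": src,
        "destination_port": dst,
        "sequence_number": seq,
        "acknowledgment_number": ack,
        "data_offset": off_flags >> 12,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }


header = pack_header(src_port=443, dst_port=50000, seq=1, ack=0)
fields = unpack_header(header)
```

The checksum is left at zero here; a real sender would compute it over the header and payload before transmission.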

[0058] As used in example systems and methods for interactive video generation and rendering, a "content delivery network" or "content distribution network" (CDN) generally refers to a distributed computing system comprising a collection of autonomous computers linked by one or more networks, and software, systems, protocols, and technologies designed to facilitate the storage, caching, or transmission of various services, such as content, streaming media, and applications representing content providers. Such services may utilize supporting technologies, including but not limited to "cloud computing," distributed storage, DNS request processing, provisioning, data monitoring and reporting, content targeting, personalization, and business intelligence. A CDN can also enable entities to operate and / or manage third-party website infrastructure, wholly or partially on behalf of third parties.

[0059] Peer-to-peer (P2P) computer networks rely primarily on the computing power and bandwidth of network participants, rather than centralizing them on a given set of dedicated servers. P2P networks are typically used to connect nodes via a large number of self-organized connections. Pure peer-to-peer networks do not have the concept of clients or servers; instead, they only have equal peer nodes that simultaneously act as both "clients" and "servers" for other nodes on the network.

[0060] One embodiment of the present invention includes a system, method, and one or more non-transitory computer-readable storage media that tangibly store computer program logic relating to digital histopathology and microanatomy that can be executed by a computer processor.

[0061] As mentioned above, relying solely on individual pathologists to examine and determine whether a tissue sample (“sample”) possesses a certain characteristic (e.g., a lesion, or especially cancer) can be unreliable, expensive, or time-consuming. On the other hand, if determination is made solely through neural networks, a large amount of training data (including positive and negative training data) may be required for each of the various tissue characteristics (e.g., the type and grade of disease), which is difficult to collect. For example, generating such training data might require receiving input from one or more pathologists regarding whether such images are positive or negative for a particular disease, or regarding other characteristics of the sample images.

[0062] Embodiments of the present invention include determining whether a sample is diseased. The embodiments described below relate particularly to cancer. However, embodiments of the present invention can be used to determine other characteristics of a sample. For example, embodiments of the present invention can be used to determine whether a sample shows other lesions or disease severity. As another example, embodiments of the present invention can be used to determine whether a sample contains lymphocytes.

[0063] Embodiments of the present invention relate to determining whether a sample is cancerous by using computer vision and input from one or more pathologists regarding whether one or more patches on an image are positive or negative for a specific type or grade of cancer. Other embodiments relate to identifying whether a sample (e.g., an image, sound, pattern, etc.) is positive or negative for meeting criteria (e.g., criteria for identifying a person, object, sound, pattern, etc.).

[0064] Computer vision involves automatically extracting, analyzing, and understanding useful information from one or more digital images. For example, computer vision can be used to determine the age of a person in a photograph by determining the location of a face in a digital image, determining the location of the person's eyes, and measuring the person's interpupillary distance.

[0065] In the field of machine learning, convolutional neural networks (“CNNs”) are a type of artificial neural network that can be used in computer vision. The article “Rethinking the Inception Architecture for Computer Vision” by Christian Szegedy et al. (arXiv:1512.00567v3 [cs.CV], December 11, 2015) discusses the use of CNNs in computer vision, and that article is incorporated herein by reference in its entirety.

[0066] Figure 3 shows an architectural diagram of an electronic device that can implement one or more aspects of embodiments of the present invention. The system of Figure 3 includes an image processing engine 301 that processes digital images to output a region of interest 321. The image processing engine 301 includes a patch generation engine 303, a feature extraction engine 305, a user selection engine 307, an SVM training engine 308, a classification engine 309, and a grouping engine 317. The image processing engine 301 is coupled to (e.g., in communication with) a CNN 319, a digital tissue image database 323, training patch databases 302F and 302J, a positive one-class support vector machine (SVM) 311, a negative one-class SVM 313, and a two-class SVM 315. The image processing engine 301 is also coupled to a CNN 329, which can be trained, using the output of the image processing engine 301 as training data, to classify other digital images.

[0067] Figure 4A shows the deep learning architecture of a CNN. As shown, for example, in Figure 4B, a CNN has multiple layers, and each layer has multiple parameters (input size). Figure 4B includes information about the layer type, patch size, and input size for each layer. The values of the parameters determine the output of the CNN.

[0068] The CNN 319 can be given tissue sample images (or patches of such images) as input, and the CNN 319 can provide multiple (e.g., 2048) image feature values (i.e., a feature extraction or feature representation of visual descriptors) as output. This output comes from a linear layer. The softmax layer shown directly below the linear layer in Figure 4B may not be necessary and can be removed from the CNN 319 architecture. The images of tissue samples can be slide images, and specifically digital histopathological images.

[0069] Figure 5A illustrates a process performed by an electronic device capable of implementing one or more aspects of embodiments of the present invention. More specifically, Figure 5A illustrates a high-level end-to-end process that begins with a histological technician slicing a tissue sample into thin sections and ends with the technician cutting out a portion of the sample that the system has identified as cancerous (or as having another disease). Reference is made to the hardware architecture diagram shown in Figure 4C. For simplicity, some hardware has been omitted from Figure 4C. For example, Figure 4C does not show networks, switches, routers, etc. However, those skilled in the art will understand that clients connect to servers or other clients via a network, and servers connect to each other via a network. That is, for example, clients 401 and 403 of Figure 4C can be the client devices 102-106, the optical microscope system 111, or the laser 113 of Figure 1, and servers 405, 407, 409, and 411 of Figure 4C can be servers 107-109 of Figure 1. Alternatively, any of the processing described below can be performed on any other device (e.g., processing described as performed on a web server can be performed on a GPU server instead).

[0070] In step 500A of Figure 5A, the histological technician receives a block of tissue sample and cuts it into thin and thick sections. The thin sections are used for scanning, and once the relevant portions of the thin sections are identified, the thick sections are used for microdissection. The technician can be located in, for example, a hospital, a doctor's office, or a laboratory.

[0071] In step 500B, the technician stains the slides with, for example, hematoxylin and eosin (H&E) and scans the H&E glass slides to generate digital tissue images, also known as whole slide images (WSIs). Scanning can be performed using a scanner that supports a resolution of 200,000 × 200,000. The size of a single WSI may be approximately 5 GB.

[0072] In step 500C, a technician uploads the WSI to web server 405 using client 401 or client 403 (there may be multiple web servers, but only one is shown for simplicity). Web server 405 then transfers the WSI to file server 407, which stores multiple WSIs and related metadata. Web server 405 then uses a queuing engine, running on the web server and based on a load balancing algorithm, to determine which of the GPU servers 409 or 411 should be used to break the WSI down into patches and extract features. Once a GPU server is selected, web server 405 transmits instructions to the selected GPU server (e.g., GPU server 409) to break the WSI down into patches and to extract features using a convolutional neural network running on GPU server 409, as discussed in more detail below with reference to step 505 of Figure 5B.
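The server selection in step 500C can be sketched as follows. The text does not name the load balancing algorithm, so this minimal sketch assumes a least-queued-jobs policy; `GpuServer`, `pick_gpu_server`, `dispatch`, and the instruction fields are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class GpuServer:
    name: str
    queued_jobs: int = 0  # assumed load metric; the text does not specify one


def pick_gpu_server(servers):
    """Return the server with the fewest queued jobs (ties go to the first listed)."""
    return min(servers, key=lambda s: s.queued_jobs)


def dispatch(wsi_id, servers):
    """Choose a GPU server, record the new job, and build the instruction to send."""
    chosen = pick_gpu_server(servers)
    chosen.queued_jobs += 1
    return {"server": chosen.name, "wsi": wsi_id, "task": "tile_and_extract_features"}


servers = [GpuServer("gpu-409", queued_jobs=2), GpuServer("gpu-411", queued_jobs=1)]
instruction = dispatch("wsi-0001", servers)
```

Any other policy (round robin, weighted by GPU memory, etc.) could be substituted in `pick_gpu_server` without changing the surrounding flow.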

[0073] In step 500D, once GPU server 409 (or 411) has extracted features using a convolutional neural network, it transmits a message to file server 407 to store metadata associated with the WSI. This metadata includes the patch location, patch size (e.g., 400×400 pixels), and the value of each feature of the patch extracted by GPU server 409. The metadata is then stored on file server 407 (associated with the stored WSI) and can be accessed by any GPU server (e.g., 409 and 411) or web server 405.
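The per-patch metadata described in step 500D could be represented as a small record. This is only a sketch: the class name `PatchMetadata`, its field names, and the JSON serialization are assumptions, since the text specifies the contents (location, size, feature values) but not a storage format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PatchMetadata:
    wsi_id: str      # hypothetical identifier linking the record to its stored WSI
    x: int           # patch location (top-left corner), in pixels
    y: int
    width: int       # patch size, e.g. 400 x 400 pixels
    height: int
    features: list   # one value per extracted feature (e.g. 2048 values)


record = PatchMetadata("wsi-0001", x=800, y=1200, width=400, height=400,
                       features=[0.0] * 2048)
payload = json.dumps(asdict(record))  # what a GPU server might send to the file server
```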

[0074] In step 500E, once metadata for all patches of the WSI has been generated and stored on file server 407, web server 405 notifies a user (e.g., a pathologist) that the metadata for all patches of the WSI has been generated. The pathologist could be, for example, a user at client 403.

[0075] In step 500F, the user at client 403 selects one or more positive patches and zero or more negative patches, as discussed in more detail below with reference to step 507 of Figure 5B. The user's selections are then transmitted from the client 403 to the web server 405, which in turn transmits them to the file server 407 for storage as metadata associated with the WSI.

[0076] In step 500G, based on the user's selection and the extracted features, web server 405 generates a mask, such as a microdissection mask, as discussed in detail below with reference to steps 509, 511, and 513 of Figure 5B.

[0077] In step 500H, web server 405 transmits the mask to the technician at client 401. Then, in step 500J, the technician uses laser 113 to cut the portion of the thick tissue section identified in the mask.

[0078] Figure 5B illustrates a process performed by an electronic device capable of implementing one or more aspects of embodiments of the present invention. More specifically, Figure 5B illustrates a method for receiving a digital tissue image of a biological sample and determining, based in part on user input and machine learning, which portions of the sample are likely cancerous. As described above with reference to Figure 5A, once these portions have been identified, a portion of the biological sample can be microdissected for further processing. The steps of Figure 5B can be performed in the order shown or in another order. For example, step 503 can be performed after step 505 or step 507.

[0079] According to embodiments of the invention, before CNN 319 receives the relevant tissue sample images (i.e., test samples), or patches of such images, as input and provides image feature values as output, CNN 319 can be trained on general images (i.e., images that are not of cancer cells and that do not contain cancer cells). That is, CNN 319 can be viewed as a trained general classifier. For example, ImageNet images can be used to train CNN 319. ImageNet is a large visual database for use in visual object recognition software research, organized according to the so-called WordNet hierarchy (which currently includes a large number of nouns), where each node of the hierarchy includes a large number of images (e.g., 500). See, for example, http://www.image-net.org/. For example, ImageNet includes a large number of images organized hierarchically for each of the following nouns: animal, vertebrate, mammal, placental mammal, primate, monkey, baboon, mandrill. Each category can include synonyms and/or related terms; for example, animal includes the related terms animate being, beast, and creature. In WordNet, each term (i.e., each category of images) is represented by a large number of images showing the relevant subject in various poses, times, ages, locations, image qualities, and colors, and with additional objects, backgrounds, etc. This hierarchy may have, for example, 1000 nodes. The multiple images initially used to train CNN 319 may, for example, include multiple images (or patches) of various general subjects that are not necessarily related to any disease. In one example, CNN 319 has previously been trained on tens of thousands or even millions of such images.

[0080] CNN 319, positive one-class SVM 311, negative one-class SVM 313, two-class SVM 315, one-class SVM training patches 302F, two-class SVM training patches 302J, test patches 323, and region of interest 321 can also be stored and executed on servers 107-109, or alternatively on one or more client devices 102-106, or on a combination of servers 107-109 and/or client devices 102-106. More specifically, SVMs 311, 313, and 315 can be stored and executed on web server 405 of Figure 4C.

[0081] The trained CNN 319 can be used to extract features from images of biological samples of unknown classification (i.e., test patches). Once trained, CNN 319 is ready to be used with “test” patches. Test patches (e.g., test patch 323) are patches from actual patient tissue samples that may have been captured using the microscope, eyepiece assembly, camera, and slide platform of the optical microscope system 111.

[0082] In step 502, the image processing engine 301 gains access to a digital tissue image of the biological sample. Instructions can be provided to the user regarding the selection of a digital tissue image of the biological sample (i.e., a whole slide image), as shown in Figure 11C. Digital images can be in various formats, such as SVS, TIFF, VMS, VMU, NDPI, SCN, MRXS, SVSLIDE, BIF, PDF, JPG, BMP, GIF, and any other digital image format. Furthermore, digital images can reside on servers, can be large images (many GB in size), and can be stored in the cloud. All of the analysis of Figure 5B can be performed in the cloud. The cloud can include servers 107-109. However, the steps of Figure 5B can also be performed at one or more client devices 102-106, or on a combination of servers 107-109 and/or client devices 102-106. The processing can be parallel and can be performed on multiple servers. The digital image can include biological sample metadata, which includes digital information associated with at least one of the following: tissue type, tissue donor, scanner, staining agent, staining technique, preparer identifier, image size, sample identifier, tracking identifier, version number, file type, image date, symptoms, diagnosis, attending physician identification information, tissue donor medical history, tissue donor demographic information, tissue donor family medical history, and tissue donor species.

[0083] In step 503, the tile generation engine 303 tiles the digital tissue image into a set of image patches 323 (test patches 323). Each tile/patch in the test patches 323 may, for example, be less than or equal to 1000×1000 pixels, less than or equal to 400×400 pixels, less than or equal to 256×256 pixels, or any other suitable number of pixels. The tiling step may be performed iteratively or in parallel by one or more computers. Tiling may include creating image patches with a uniform size and shape. The patch size may be a function of the sizes of the previous positive patches 302D and the previous negative patches 302E, that is, of the size of patches previously selected by a pathologist as positive or negative for cancer, a cancer grade, the specific type or grade of cancer suspected in the test patches 323, or some other disease. For example, if the patches previously selected by the pathologist as positive or negative are 400×400 patches, the tile generation engine 303 can tile the image into patches of the same size (400×400), or of a size within 1%, 3%, 5%, 10%, 20%, 25%, or 30% of the previous patch size.

[0084] In step 503, the test patches 323 may or may not have a uniform size and shape. For example, one patch may be 400×400, while another patch may be 300×300 or 300×200. The patches do not have to be square; they can be rectangular, circular, elliptical, or more complex shapes. Various tiling techniques can be used, such as Penrose tiling, batch exclusion, and/or bounding boxes.

[0085] In step 503, the generated patches can be overlapping or non-overlapping. That is, the same area of the digital image may or may not be included in more than one tile/patch.
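The tiling options of step 503 (uniform patch size, overlapping or non-overlapping patches) can be sketched with a simple grid generator. This is an illustrative sketch only: the function name `tile_rectangles`, the `stride` parameter, and the choice to clip patches at the image border are assumptions, not details from the text.

```python
def tile_rectangles(image_w, image_h, patch=400, stride=400):
    """Yield (x, y, w, h) patch rectangles covering the image.

    stride == patch gives non-overlapping patches; stride < patch gives
    overlapping ones. Patches at the right/bottom border are clipped
    (an assumption; padding or dropping them would also be possible).
    """
    for y in range(0, image_h, stride):
        for x in range(0, image_w, stride):
            yield (x, y, min(patch, image_w - x), min(patch, image_h - y))


non_overlapping = list(tile_rectangles(1000, 800, patch=400, stride=400))
overlapping = list(tile_rectangles(1000, 800, patch=400, stride=200))
```

For a real WSI of roughly 200,000 × 200,000 pixels the generator form matters, because the rectangles can be consumed lazily or distributed across several workers.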

[0086] The generated patches are stored as test patches 323 in the memory of servers 107-109 and/or client devices 102-106, or a combination of servers 107-109 and/or client devices 102-106.

[0087] In step 505, the feature extraction engine 305 extracts multiple features from each patch of the test patches 323 and stores them as metadata or a separate data structure within the test patches 323, or in a separate database/data structure outside of the test patches 323. Specifically, the feature extraction engine 305 extracts image features from the patches 323. In one embodiment, the feature extraction engine 305 extracts features that are identified as useful by the CNN 319 (e.g., a CNN based on Inception v3 and trained on various images such as ImageNet images). The feature extraction engine 305 utilizes the fully trained CNN 319 to extract features from the image. In one embodiment, the feature extraction engine 305 communicates with the trained CNN 319 and provides it with the patches 323 for feature extraction, either one at a time or in a single call/message. When the CNN 319 receives each patch from the feature extraction engine 305, it processes the patch 323 through the layers shown in Figure 4B (from the topmost convolutional layer to the second-to-last layer, the linear layer). The linear layer provides multiple features (e.g., 2048) representing the input patch as output, and these 2048 feature values are provided to the feature extraction engine 305, as shown in Figure 3. These values can be provided by CNN 319 to the feature extraction engine 305 in, for example, arrays, vectors, linked lists, or other suitable data structures. The features can then be stored in the test patches 323.

[0088] In step 505, rather than extracting features from all test patches 323, the feature extraction engine 305 can identify/select patches from the test patches 323 based on pixel content. For example, identification may include filtering patches based on the color channels of pixels within the image patch. Identification may be based on the variance of the patch. The variance of the patch may be based on the variance of the red, green, and blue (RGB) channels, and/or hue, saturation, value (HSV), and/or hue, lightness, saturation (HLS), and/or hue, saturation, intensity (HIS) in a particular patch. This step helps ensure that only patches containing cells are considered. Once step 505 is complete, only patches containing cells may be identified/selected for feature extraction.
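The variance-based patch filter can be sketched as below, assuming RGB pixels and a simple mean of per-channel variances. The function names and the threshold value of 50.0 are hypothetical; the text says only that selection may be based on channel variance.

```python
from statistics import pvariance


def channel_variance(pixels):
    """pixels: list of (r, g, b) tuples. Returns the mean of the per-channel variances."""
    channels = list(zip(*pixels))
    return sum(pvariance(channel) for channel in channels) / len(channels)


def likely_contains_cells(pixels, min_variance=50.0):
    """Keep a patch only if its color variance reaches the (assumed) threshold."""
    return channel_variance(pixels) >= min_variance


blank_glass = [(240, 240, 240)] * 16                      # uniform background
stained_tissue = [(200, 120, 180), (90, 40, 130)] * 8     # mixed stain colors

kept = [name for name, px in (("blank", blank_glass), ("tissue", stained_tissue))
        if likely_contains_cells(px)]
```

The same structure applies to HSV or HLS values; only the conversion of each pixel before computing the variance would change.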

[0089] In step 507, the user selection engine 307 determines a user selection of a subset of the set of image patches. As shown in Figure 11D, instructions can be provided to the user regarding the user selection. Specifically, a graphical user interface (GUI), controlled by the user selection engine 307, displays the test patches 323 to the user (e.g., a pathologist, or one or more pathologists at one or more sites). For example, Figures 7A to 7C illustrate multiple patches that can be shown to the user. On the I/O interface 240 (such as a screen, smartphone screen, tablet computer, touchscreen, etc.), the user selection engine 307 can display to the user a GUI similar to that of Figure 7A, Figure 7B, or Figure 7C (with more or fewer patches shown). Figures 11E and 11F show the GUI after a user has selected a positive patch. The user can use a positive/negative dropdown to indicate whether they want to select positive or negative patches. Once the user has selected positive or negative, they can click on one or more patches to select them. If the user believes they may have made a mistake, they can click the clear button and start selecting patches again. When the "Fill Holes" checkbox shown in Figure 11E is selected, if a hole (an area of one or more patches that the user did not select) exists within an area that the user has selected as positive for cancer, that hole (the unselected area) is treated as having been selected as positive for cancer and is therefore filled (the same applies if the user is selecting negative patches). Once the user is certain that he or she has selected all the positive or negative patches he or she wants, he or she can click the submit button.
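The "Fill Holes" behavior can be sketched as a flood fill over the patch grid. This sketch assumes that a hole is a group of unselected patches that cannot reach the edge of the grid through other unselected patches (4-connectivity); the text does not define the term more precisely, and `fill_holes` is a hypothetical name.

```python
from collections import deque


def fill_holes(selected):
    """selected: 2-D list of booleans (True = patch selected by the user).

    Returns a new grid in which every unselected patch fully enclosed by
    selected patches is also marked as selected.
    """
    rows, cols = len(selected), len(selected[0])
    outside = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the flood fill with every unselected patch on the grid border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and not selected[r][c]:
                outside[r][c] = True
                queue.append((r, c))
    # Spread through unselected neighbors; anything reached is not a hole.
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and not selected[nr][nc] and not outside[nr][nc]):
                outside[nr][nc] = True
                queue.append((nr, nc))
    return [[selected[r][c] or not outside[r][c] for c in range(cols)]
            for r in range(rows)]


ring = [
    [False, False, False, False, False],
    [False, True,  True,  True,  False],
    [False, True,  False, True,  False],
    [False, True,  True,  True,  False],
    [False, False, False, False, False],
]
filled = fill_holes(ring)
```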

[0090] First, the user can choose to select either positive patches (i.e., patches that the user believes are positive for cancer, a specific type or grade of cancer, or other related diseases) or negative patches (i.e., patches that the user believes are negative for cancer, a specific type or grade of cancer, or other related diseases). Once this selection is made, the user will select one or more positive or negative patches (depending on whether the user selected negative or positive patches above). Optionally, the user can select positive patches first, or can select only positive patches without selecting negative patches. Once the user is satisfied with having selected a sufficient number of patches, they can select the "Submit" (or similar) button to let the user selection engine 307 know that they have completed the selection of a specific set (positive or negative) of patches.

[0091] Then, any positive patches selected by the user are stored in the SVM training patch 302F, and specifically in the current positive patch 302B, while any negative patches selected by the user are stored in the current negative patch 302C. The entire selected patch can be stored, or the identifiers of patches in test patches 323 can be referenced. For example, if there are 10,000 test patches 323, and each patch has an identifier between 1 and 10,000, the current positive patch 302B and the current negative patch 302C can store multiple identifiers, each referencing a test patch in test patches 323. For example, after the user selection in step 507, the user selection engine 307 can store a linked list with identifiers 8, 500, 1011, 5000, and 9899 in the current positive patch 302B. Similarly, the user selection engine 307 can store a linked list with identifiers 10, 550, 1015, 4020, and 9299 in the current negative patch 302C.

[0092] The user's selection helps the image processing engine 301 correctly identify other test patches 323 that may be negative or positive for cancer. Assuming the user correctly selects positive and negative patches, the image processing engine 301 can then compare its known positive patches (via the user's selection of positive patches in step 507) with other patches in the test patches 323, selecting such patches as potentially positive based on the feature distance between candidate test patches in the test patches 323 and the user-selected positive patches, as discussed in more detail below with positive one-class support vector machine (SVM) 311, negative one-class SVM 313, and two-class SVM 315. The image processing engine can similarly compare the user-selected negative patches with the test patches 323 to determine possible negative patches.

[0093] Although the image processing engine 301 could use the CNN 319 trained on general images in step 501 to select patches that are likely positive or negative without SVMs 311, 313, and 315, such a process, as shown in Figures 6A to 6D, may produce classification errors, as discussed in more detail below.

[0094] Figure 6A shows two types of images: (1) pine trees and bamboo (“bamboo”); and (2) dumplings. The images of Figure 6A are input into a CNN (e.g., Inception v3) that has already been trained on general images (e.g., ImageNet). Based on the ImageNet training, the linear layer of the CNN generates 2048 features for each test image shown in Figure 6A. Image processing engine 301 then calculates the distance between each image and every other image (i.e., the distance between image pairs), e.g., the Euclidean/L2 distance over the 2048 features extracted by the linear layer of the CNN. An analysis can be performed to determine which of the 2048 features are particularly important and to increase their weight, producing a modified L2 distance that weights some features more heavily than others. Using an a priori distance threshold, the system then classifies pairs of images as belonging to the same class. Once the user makes a selection of a positive image/patch, the system uses the above analysis to find matching patches. The same analysis can be performed for the negative patches selected by the user.

[0095] Figure 6B shows a vertical bar chart of the classification results, obtained as discussed above, for the images of Figure 6A. The x-axis represents the L2 distance of the pairwise difference (of the values of the 2048 features extracted by CNN 319). The y-axis is a normalized histogram showing the normalized number of pairs falling within that L2 distance.

[0096] In Figure 6B, each green bar (i.e., the first 8 bars from the left) shows matching pairs (i.e., bamboo-bamboo or dumpling-dumpling), while each red bar (the last 2 bars near the right end) shows non-matching pairs (i.e., bamboo-dumpling). As shown in Figure 6B, a user can predetermine an L2 distance threshold to programmatically separate matching pairs from non-matching pairs based on their L2 distance. For example, if an L2 distance of approximately 18 is chosen (where any image pair with an L2 distance less than or equal to 18 is considered a match, and any image pair with an L2 distance greater than 18 is considered a non-match), this appears to provide accurate results for matching and non-matching pairs.
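The pairwise distance and threshold test described for Figures 6A and 6B can be sketched as follows. The three-dimensional vectors stand in for the 2048 CNN features and are made-up values, not data read from the figures; `weighted_l2` and `classify_pairs` are hypothetical helper names, and the optional weights illustrate the modified L2 distance.

```python
from itertools import combinations
from math import sqrt


def weighted_l2(u, v, weights=None):
    """L2 distance between two feature vectors, optionally weighting each feature."""
    if weights is None:
        weights = [1.0] * len(u)
    return sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v)))


def classify_pairs(features, threshold, weights=None):
    """Return {(name_a, name_b): (distance, is_match)} for every image pair."""
    result = {}
    for a, b in combinations(sorted(features), 2):
        d = weighted_l2(features[a], features[b], weights)
        result[(a, b)] = (d, d <= threshold)
    return result


features = {
    "bamboo_1": [0.0, 0.0, 1.0],
    "bamboo_2": [0.0, 3.0, 5.0],
    "dumpling_1": [20.0, 0.0, 1.0],
}
pairs = classify_pairs(features, threshold=18.0)
```

With these toy values the two bamboo images fall within the threshold of 18 and both bamboo-dumpling pairs fall outside it, mirroring the well-separated case.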

[0097] However, as can be seen from Figures 6C to 6D, the above method may not work for other image sets. Similar to Figure 6A, Figure 6C shows two types of images: (1) bamboo; and (2) dumplings. Similar to Figure 6B, Figure 6D shows a vertical bar chart of the classification results for the images of Figure 6C. However, as can be seen from Figure 6D, when the same processing as in Figures 6A to 6B is used, several non-matching pairs (red bars) have lower L2 distances than matching pairs (green bars). Therefore, it is impossible to pre-select an L2 distance threshold such that only matching pairs fall at or below the threshold and only non-matching pairs fall above it.
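Whether a single threshold can separate the two groups reduces to comparing the largest matching distance with the smallest non-matching distance. The sketch below uses made-up distances chosen to resemble the two situations (separable and overlapping); they are not values taken from Figures 6B or 6D, and `separating_threshold` is a hypothetical name.

```python
def separating_threshold(match_distances, non_match_distances):
    """Return a threshold that splits the two groups exactly, or None if they overlap."""
    highest_match = max(match_distances)
    lowest_non_match = min(non_match_distances)
    if highest_match < lowest_non_match:
        # Any value in between works; the midpoint is one reasonable choice.
        return (highest_match + lowest_non_match) / 2
    return None


separable = separating_threshold([10, 14, 17], [19, 22])     # clean split exists
overlapping = separating_threshold([10, 14, 21], [15, 22])   # a non-match sits below a match
```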

[0098] Furthermore, in pathological samples, variations in the test images/patches (e.g., in color and intensity) make classification even more difficult than in Figures 6A to 6D. Therefore, using the matching method of Figures 6A to 6D on pathological samples may result in even more incorrectly matched image/patch pairs.

[0099] Returning to Figure 5B, once the user selection of positive patches and/or negative patches is determined in step 507, the training images for the two-class SVM 315 are determined in step 509. Step 509 is described in further detail below with reference to Figure 5C.

[0100] In step 509A, the SVM training engine 308 accesses the current positive patches 302B (i.e., the positive patches selected by the user from the current test patches 323). In general, data such as the current positive patches 302B can be accessed directly or can be provided to the current engine (here, the SVM training engine 308) by a previous or other engine (here, the user selection engine 307). In step 509B, the SVM training engine 308 then accesses the previous positive patches 302D. For example, the SVM training engine 308 can select all or a predetermined number (e.g., 2, 3, 5, 10, 15, 20, 30, 50, or 100) of the most recently selected previous positive patches 302D; or it can select all or a predetermined number of the previous positive patches 302D, taken from previous test patches, that involve the same type of cancer/disease suspected in the test patches 323; or it can select all or a predetermined number of the previous positive patches 302D selected by the current user.
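The selection options in step 509B (most recent patches, same disease, same user, with an optional cap) can be sketched as a filter over stored records. The record fields and the function `select_previous` are hypothetical; the text does not describe how previous patches are indexed.

```python
from dataclasses import dataclass


@dataclass
class PreviousPatch:
    patch_id: int
    disease: str       # assumed label for the suspected cancer/disease type
    selected_by: str   # assumed user identifier
    selected_at: int   # any sortable timestamp


def select_previous(patches, limit=None, disease=None, user=None):
    """Filter by disease and/or user, newest first, optionally capped at `limit`."""
    candidates = [p for p in patches
                  if (disease is None or p.disease == disease)
                  and (user is None or p.selected_by == user)]
    candidates.sort(key=lambda p: p.selected_at, reverse=True)
    return candidates if limit is None else candidates[:limit]


history = [
    PreviousPatch(1, "lung", "alice", 100),
    PreviousPatch(2, "lung", "bob", 200),
    PreviousPatch(3, "breast", "alice", 300),
    PreviousPatch(4, "lung", "alice", 400),
]
most_recent_two = select_previous(history, limit=2)
same_disease_and_user = select_previous(history, disease="lung", user="alice")
```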

[0101] Since the user selected the current positive patch 302B, it is highly likely to be accurate. However, the previous positive patches 302D include patches from other images / patches (not the current test patch 323). Therefore, it is necessary to determine whether they are related to the test patch 323 by selecting only the previous positive patches 302D that are relevant to the current positive patch 302B. This determination is performed in steps 509C to 509D.

[0102] In step 509C, the SVM training engine 308 accesses the positive one-class SVM 311 and trains it, as shown in Figure 8. An SVM is a supervised learning model with associated learning algorithms that analyze data for classification and regression analysis (classification in this case). An SVM first receives training data (e.g., images) for one or more classes as input. It then builds a model based on the training data that can determine the class to which a new test image belongs. A one-class SVM is trained on images of a single class and classifies new images as belonging or not belonging to that class.

[0103] Figure 8 shows an example of the results of using a one-class SVM with a radial basis function (RBF) kernel according to one or more aspects of embodiments of the present invention. As can be seen, Figure 8 shows multiple bands, including 801, 803, and 805. Figure 8 also shows green training observations 802, which are the patches used to train the positive one-class SVM 311. The training observations include only the current positive patches 302B (or a predetermined number of the current positive patches 302B).

[0104] Once the current positive patches 302B are input into the positive one-class SVM 311 to train it, the SVM generates bands using a Gaussian distribution based on the data points (vectors) in the 2048-dimensional feature space used in this embodiment to describe the current positive patches 302B. Then, once test patches are input into the positive one-class SVM 311, the test patches 323 within the two innermost pink elliptical regions 801 are most similar to the current positive patches 302B. The innermost, darkest blue band 803, which completely surrounds the two pink regions 801, includes patches that are less similar to the patches in the innermost pink regions (or, generally, to the training patches, i.e., the current positive patches 302B). Regions 805 are even farther from the pink regions 801, and the patches in these regions are even less similar to the patches in the pink regions 801.

[0105] SVM training engine 308 determines which bands indicate inclusion in the class of the training data (in this case, positive patches) and which bands indicate exclusion. For example, SVM training engine 308 can determine (or it can be predetermined) that only previous positive patches 302D (observations) identified as being in regions 801 and 803 are included in the same class as the training data (in this example, the current positive patches 302B). That is, previous positive patches 302D that the positive one-class SVM 311 places in region 801 or 803 are considered to be in the same class as the current positive patches 302B. Specifically, after training, SVM training engine 308 provides a subset (or all) of the previous positive patches 302D as input to the positive one-class SVM 311. The positive one-class SVM 311 determines the region for each patch based on the training data and identifies only the previous positive patches 302D in region 801 or 803 as belonging to the same class.
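The filtering of previous positive patches with a one-class SVM can be sketched with scikit-learn's `OneClassSVM`, which is an assumed library choice (the text names no implementation). Two-dimensional toy vectors stand in for the 2048-dimensional features, and `gamma=0.5` and `nu=0.1` are arbitrary illustrative settings. Treating a prediction of +1 as "in regions 801 or 803" is a simplification: the library exposes a single decision boundary rather than the bands of Figure 8.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Toy stand-ins for the feature vectors of the current positive patches 302B:
# two concentric rings of points around the origin.
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
ring = np.column_stack([np.cos(angles), np.sin(angles)])
current_positive = np.vstack([0.5 * ring, 0.2 * ring])

# Candidate previous positive patches 302D: two near the cluster, one far away.
previous_positive = np.array([[0.0, 0.0], [0.1, -0.1], [5.0, 5.0]])

# Train on the current positives only, then keep the previous positives
# that the model treats as inliers (prediction +1).
model = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(current_positive)
keep = model.predict(previous_positive) == 1
selected_previous = previous_positive[keep]
```

A graded notion of bands could be approximated by thresholding `model.decision_function(...)` at several levels instead of using `predict`.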

[0106] Note that the process described in step 509C, which uses the positive one-class SVM 311 to determine which of the previous positive patches 302D belong to the same class as the current positive patches 302B, can instead be performed using other outlier detection methods. For example, an elliptic envelope or an isolation forest (which, e.g., uses binary trees to detect anomalies in the data) can be used instead of, or in combination with, a one-class SVM to determine which of the previous positive patches 302D should be removed as outliers.

[0107] Once the SVM training engine 308 determines which previous positive patches 302D belong to the same class as the current positive patches 302B, in step 509E the current positive patches 302B and the selected previous positive patches 302D (as selected by the positive one-class SVM 311) are combined and stored as two-class positive training patches 302G (the patches can be stored as images or as identifiers of the relevant test patches 323). These patches 302G are then used to train the two-class SVM 315.

[0108] Then, a similar process can be performed for the negative patches. That is, in step 509F, similar to step 509A, the current negative patches 302C can be accessed. In step 509G, similar to step 509B, the previous negative patches 302E can be accessed. In step 509H, similar to step 509C, the negative one-class SVM 313 can be trained. In step 509J, similar to step 509D, a subset of the previous negative patches 302E can be determined. In step 509K, similar to step 509E, the current negative patches 302C and the subset of the previous negative patches 302E can be combined and stored in the two-class negative training patches 302H.

[0109] As an alternative, instead of using the negative one-class SVM 313 to perform steps 509F-509K, the positive one-class SVM 311 can be used for both positive and negative patches. That is, for example, if the user does not provide current negative patches 302C, the SVM training engine 308 can use the positive one-class SVM 311 to determine which previous negative patches to use as the two-class negative training patches 302H.

[0110] Specifically, in step 509F, no current negative patches are available to access, so this step is skipped. In step 509G, the previous negative patches 302E are accessed. Step 509H can be skipped, and the already trained positive one-class SVM 311 is used instead. Step 509J is modified so that, instead of selecting the previous negative patches 302E located within inner bands 801 and 803, only the previous negative patches 302E not located within bands 801 and 803 are selected. That is, only the previous negative patches 302E (band 805, etc.) that are not sufficiently similar to the current positive patches 302B are selected. The selected previous negative patches 302E (or their identifiers) are then stored in the two-class negative training patches 302H. If no previous negative patches 302E exist, the SVM training engine 308 may store stock negative patches that were selected by users of other systems, or that were never selected by any user but have the characteristics of negative patches. Alternatively, these stock negative patches may be stored in the previous negative patches 302E.
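The inverted selection of the modified step 509J can be sketched with scikit-learn's `OneClassSVM` (an assumed library choice): previous negative patches are kept only when the positive model rejects them. The toy 2-D vectors, the `gamma` and `nu` values, and the helper `choose_negative_training` with its stock-patch fallback are all illustrative assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Toy stand-ins for the current positive patches 302B (two rings around the origin).
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
ring = np.column_stack([np.cos(angles), np.sin(angles)])
current_positive = np.vstack([0.5 * ring, 0.2 * ring])
positive_model = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(current_positive)


def choose_negative_training(model, previous_negative, stock_negative):
    """Keep previous negatives the positive model rejects; fall back to stock patches."""
    if len(previous_negative) == 0:
        return stock_negative
    outliers = model.predict(previous_negative) == -1  # not similar to the positives
    return previous_negative[outliers]


previous_negative = np.array([[0.05, 0.0], [4.0, -4.0], [-6.0, 1.0]])
stock_negative = np.array([[9.0, 9.0]])

negatives = choose_negative_training(positive_model, previous_negative, stock_negative)
fallback = choose_negative_training(positive_model, np.empty((0, 2)), stock_negative)
```

Here the first candidate sits inside the positive cluster and is dropped, which matches the intent of excluding negatives that look too much like the current positives.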

[0111] Once the two-class positive training patches 302G and the two-class negative training patches 302H have been determined, the SVM training engine 308 proceeds to step 509L to train the binary (two-class) SVM 315 using patches 302G and 302H. That is, the two classes of the binary SVM 315 are positive patches and negative patches, and they are populated with the corresponding positive and negative patches determined by the one-class SVMs 311 and 313.

[0112] In step 509L, once the two-class positive training patches 302G (shown, with one exception, in the upper right corner of Figure 9A) and the two-class negative training patches 302H (shown in the lower left corner of Figure 9A) are provided to the (linear) binary SVM 315, the binary SVM 315 selects negative support vectors (the circled patches crossed by the bottom dashed line) and positive support vectors (the circled patches crossed by the top dashed line) so as to maximize the margin width (the distance between the two parallel dashed lines). The hyperplane is the solid line midway between the two dashed lines. Figure 9B is similar.
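By way of non-limiting illustration, the margin maximization described above corresponds to the textbook hard-margin formulation of a linear SVM. The symbols below are assumptions introduced only for this illustration and do not appear in the description: x_i is a patch feature vector, y_i is +1 for a positive training patch and -1 for a negative one, and w and b define the hyperplane. The description does not state whether a hard or soft margin is used, and the mention of one exception in Figure 9A may suggest that some slack is tolerated.

```latex
\min_{\mathbf{w},\, b} \; \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) \ge 1,
\qquad i = 1, \dots, n

% Hyperplane (solid line):   \mathbf{w}^{\top} \mathbf{x} + b = 0
% Margin edges (dashed):     \mathbf{w}^{\top} \mathbf{x} + b = \pm 1
% Margin width:              2 / \lVert \mathbf{w} \rVert
```

Under this formulation, minimizing the norm of w is equivalent to maximizing the margin width, and the support vectors are the training patches for which the constraint holds with equality.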

[0113] Once the hyperplane of the binary SVM 315 has been determined in step 509L, step 509 of Figure 5B is complete.

[0114] In step 511, the classification engine 309 provides all or a subset of the test patches 323 as input to the now-trained binary SVM 315. The trained binary SVM 315 then classifies each test patch 323 as follows: if the patch is located on the positive side of the hyperplane, it is classified as a positive patch (e.g., positive for cancer); if the patch is located on the negative side of the hyperplane, it is classified as a negative patch (e.g., negative for cancer), as shown in Figures 9A to 9B. Once step 511 is completed, each test patch 323 is marked as positive or negative.
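By way of non-limiting illustration, the classification by hyperplane side can be sketched as follows for a linear SVM. The function name `classify_patches`, the weight vector `w`, the offset `b`, and the two-dimensional feature vectors are hypothetical and are not taken from the description. Assigning a patch lying exactly on the hyperplane to the positive class is likewise an assumption.

```python
from typing import List, Sequence


def classify_patches(features: Sequence[Sequence[float]],
                     w: Sequence[float],
                     b: float) -> List[str]:
    """Label each patch feature vector by its side of the hyperplane w.x + b = 0."""
    labels = []
    for x in features:
        # Signed decision value: its sign tells which side of the hyperplane x is on.
        decision = sum(wi * xi for wi, xi in zip(w, x)) + b
        # Assumption: a value of exactly 0 is treated as positive.
        labels.append("positive" if decision >= 0 else "negative")
    return labels


# Hypothetical hyperplane x0 + x1 - 1 = 0 and three hypothetical test patches.
w, b = [1.0, 1.0], -1.0
test_features = [[0.9, 0.8], [0.1, 0.2], [0.5, 0.7]]
labels = classify_patches(test_features, w, b)
```

In practice the feature vectors would have as many dimensions as the extracted patch features, and `w` and `b` would come from the trained binary SVM rather than being set by hand.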

[0115] Once the classification in step 511 is completed, the system proceeds to step 513, the grouping step. In step 513, test patches 323 that have been identified as positive are grouped together with other adjacent positive patches. Similarly, test patches 323 that have been identified as negative are grouped together with other adjacent negative patches. Step 513 is described in further detail below with reference to Figure 5D.

[0116] In general, the process of generating convex hulls is as follows: (1) construct a patch grid map from the output of the binary SVM (positive is 1, negative is 0); (2) convolve the map with a neighborhood kernel; (3) create primary blobs, where within each primary blob every patch must have a score of at least 7.5 and all patches are connected; (4) create supporting blobs, where within each supporting blob every patch must have a score of at least 4.0 and all patches are connected; (5) construct a convex hull for each of the primary blobs and supporting blobs.

[0117] In step 513A, the grouping engine 317 determines a probability score for each test patch 323 by analyzing the patch and its neighboring patches. Specifically, the score starts at 0 and is increased by 2 if the patch itself has been classified into the relevant class (positive or negative), by 1 for each patch directly above, below, to the left of, or to the right of the patch that has been classified into the same class, and by 0.5 for each diagonally adjacent (corner) patch that has been classified into the same class.

[0118] For example, Figure 10A shows a representative portion of the test patches 323. A "P" in a patch indicates that the binary SVM 315 has identified the patch as positive. Consider patch (2,4) (column 2, row 4), whose probability score is 7.5. We add 2 because the binary SVM 315 has identified patch (2,4) itself as positive. We add 1 for each of the patches (1,4), (2,5), (3,4), and (2,3) directly above, below, to the left of, and to the right of this patch. We add 0.5 for each of the three positive corner patches (1,5), (3,5), and (3,3). Since (1,3) is not indicated as positive, it contributes no additional 0.5. Therefore, the probability score of patch (2,4) is 2 + 4 + 1.5 = 7.5.

[0119] As another example, patch (4,2) has a score of 5.0 (1.0 for each of patches (3,2), (4,3), (5,2), and (4,1), and 0.5 for each of corner patches (3,3) and (5,3)).
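By way of non-limiting illustration, the neighborhood scoring can be sketched as a convolution of the binary patch map with a 3 x 3 kernel holding the weights given above (2 for the patch itself, 1 for each edge neighbor, 0.5 for each corner neighbor). The name `neighborhood_scores` and the small example grid are hypothetical. The grid is not Figure 10A. Treating positions outside the grid as contributing 0 is an assumption.

```python
from typing import List

# Weights from the description: 2 for the patch itself, 1 for each patch
# directly above, below, left, or right, and 0.5 for each corner patch.
KERNEL = [
    [0.5, 1.0, 0.5],
    [1.0, 2.0, 1.0],
    [0.5, 1.0, 0.5],
]


def neighborhood_scores(grid: List[List[int]]) -> List[List[float]]:
    """Score every cell of a binary patch map (1 = in class, 0 = not in class)."""
    rows, cols = len(grid), len(grid[0])
    scores = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    # Assumption: cells outside the grid contribute nothing.
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += KERNEL[dr + 1][dc + 1] * grid[rr][cc]
            scores[r][c] = total
    return scores


# Hypothetical 3 x 3 map: the center patch is positive, all four edge
# neighbors are positive, and three of its four corner neighbors are positive.
grid = [
    [0, 1, 1],
    [1, 1, 1],
    [1, 1, 1],
]
scores = neighborhood_scores(grid)
center_score = scores[1][1]  # 2 + 4 * 1 + 3 * 0.5
```

Because the convolution is evaluated at every cell, a patch that is not itself in the class still receives a score from its neighbors, which appears consistent with the 5.0 example above, where no contribution from the patch itself is listed.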

[0120] In step 513B, once the above processing has been performed for all test patches 323, any patch with a probability score of 7.5 or higher is considered part of a primary blob, and any patch with a probability score of 4.0 or higher is considered part of a supporting (secondary) blob (these numbers are merely examples, and different numbers can be used). More specifically, two binary maps are generated: a primary map and a supporting map. The primary binary map includes only patches with a score of 7.5 or higher, and the supporting binary map includes only patches with a score of 4.0 or higher.
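By way of non-limiting illustration, the two binary maps and the connected blobs derived from them can be sketched as follows. The names `threshold_map` and `connected_blobs` and the score values are hypothetical. The description requires that all patches within a blob be connected but does not state the connectivity rule, so 8-connectivity (edge and corner neighbors) is assumed here.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def threshold_map(scores: List[List[float]], threshold: float) -> List[List[int]]:
    """Binary map with 1 wherever the score meets or exceeds the threshold."""
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


def connected_blobs(binary_map: List[List[int]]) -> List[Set[Cell]]:
    """Group the 1-cells into blobs by flood fill (assumed 8-connectivity)."""
    rows, cols = len(binary_map), len(binary_map[0])
    seen: Set[Cell] = set()
    blobs: List[Set[Cell]] = []
    for r in range(rows):
        for c in range(cols):
            if not binary_map[r][c] or (r, c) in seen:
                continue
            blob: Set[Cell] = set()
            stack = [(r, c)]
            seen.add((r, c))
            while stack:
                cr, cc = stack.pop()
                blob.add((cr, cc))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and binary_map[nr][nc]
                                and (nr, nc) not in seen):
                            seen.add((nr, nc))
                            stack.append((nr, nc))
            blobs.append(blob)
    return blobs


# Hypothetical probability scores for a 3 x 5 region of test patches.
scores = [
    [7.5, 8.0, 4.5, 0.0, 0.0],
    [7.5, 7.5, 4.0, 0.0, 0.0],
    [2.0, 3.0, 1.0, 0.0, 4.0],
]
primary_blobs = connected_blobs(threshold_map(scores, 7.5))
supporting_blobs = connected_blobs(threshold_map(scores, 4.0))
```

With these example thresholds, every primary patch also appears in the supporting map, since a score of 7.5 or higher necessarily meets the 4.0 threshold.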

[0121] In step 513C, the grouping engine 317 generates convex hulls, line strings, and points for all primary blobs, and assigns a probability score of 7.5 (or the highest score of any patch within the convex hull) to all patches within each convex hull.

[0122] Figure 10B shows a convex hull generated around multiple positive patches (or negative patches, as the case may be). The process of Figure 5D is performed for positive patches and then for negative patches.
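By way of non-limiting illustration, a convex hull around the patches of a blob can be computed with Andrew's monotone chain algorithm. The description does not name a hull algorithm, so this choice is an assumption, as are the names `cross` and `convex_hull` and the use of patch grid coordinates as hull input. A blob with one or two distinct positions yields a point or a line string rather than a polygon.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts  # degenerate case: a single point or a line string
    lower: List[Point] = []
    for p in pts:
        # Drop the last vertex while it does not make a strict left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


# Hypothetical (column, row) positions of the patches in one blob.
blob_positions = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
hull = convex_hull(blob_positions)
```

Interior and collinear positions are discarded, so the hull of this example blob has five vertices even though the blob contains seven patches.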

[0123] In step 513D, the grouping engine 317 generates convex hulls, line strings, and points for the supporting blobs, and assigns a probability score of 4.0 (or the highest score of any patch within the convex hull) to all patches within each convex hull.

[0124] In step 513E, if a primary blob and a supporting blob have overlapping geometry, the grouping engine 317 merges the supporting blob into the primary blob. Overlapping geometry is found if a vertex of one convex hull (e.g., the convex hull of the primary blob) lies inside another convex hull (e.g., the convex hull of the supporting blob). That is, for each vertex of one convex hull, the system determines whether it lies inside any other convex hull. If no vertex of one convex hull lies inside the other, the two convex hulls do not overlap.
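By way of non-limiting illustration, the vertex-containment test can be sketched as follows for hulls given as counter-clockwise vertex lists with at least three vertices. The names `point_in_convex_polygon` and `hulls_overlap` and the example coordinates are hypothetical. Counting a vertex that lies exactly on a boundary as inside, and testing the vertices of each hull against the other, are assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def point_in_convex_polygon(p: Point, polygon: List[Point]) -> bool:
    """True if p is inside or on a convex polygon listed counter-clockwise."""
    n = len(polygon)
    for i in range(n):
        a, b = polygon[i], polygon[(i + 1) % n]
        # p must not be strictly to the right of any edge.
        if cross(a, b, p) < 0:
            return False
    return True


def hulls_overlap(hull_a: List[Point], hull_b: List[Point]) -> bool:
    """Vertex-containment test: does any vertex of one hull lie in the other?"""
    return (any(point_in_convex_polygon(v, hull_b) for v in hull_a)
            or any(point_in_convex_polygon(v, hull_a) for v in hull_b))


# Hypothetical hulls: the supporting hull has its corner (3, 3) inside the
# primary hull, while the distant hull shares no area with it.
primary_hull = [(0, 0), (4, 0), (4, 4), (0, 4)]
supporting_hull = [(3, 3), (6, 3), (6, 6), (3, 6)]
distant_hull = [(10, 10), (12, 10), (12, 12), (10, 12)]
merge_needed = hulls_overlap(primary_hull, supporting_hull)
```

As a geometric aside, two convex polygons can cross without either containing a vertex of the other (for example, two long thin rectangles forming a plus sign), so a vertex-only test of this kind would not report that configuration as overlapping.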

[0125] In step 513F, if convex hulls, line strings, or points intersect, the grouping engine 317 groups them together using a maximum-overlap criterion.

[0126] In step 513G, the probability of each primary blob is determined as follows: probability_primary = k * primary_patch_count, where k is a constant, such as 0.1, 0.2, 0.3, 0.4, or 0.5, and primary_patch_count is the number of patches in a particular primary blob. The constant k can be set by the user or predetermined. For example, if k is 0.1 and primary_patch_count is 13, then the probability is 0.1 * 13 = 1.3. The purpose of this calculation is to reflect that a blob containing a larger number of patches identified as positive is more likely to be positive than a blob containing fewer such patches.

[0127] In step 513H, the probability score of the supporting blobs merged into a primary blob is determined as follows: probability_supporting = k * supporting_patch_count / j, where k is a constant, such as 0.1, 0.2, 0.3, 0.4, or 0.5; supporting_patch_count is the total patch count of all supporting blobs that have been merged into the particular primary blob; and j is a constant, such as 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, or 150. For example, if k is 0.5, supporting_patch_count is 300, and j is 100, then the probability is 0.5 * 300 / 100 = 1.5. Finally, if probability_supporting is greater than or equal to k (or another constant), probability_supporting is set to k. In the example above, because probability_supporting was determined to be 1.5, it is set to the value of k, which is 0.5.

[0128] In step 513J, the final probability of a primary blob together with the supporting blobs merged into it is determined as follows: probability_final = probability_primary + probability_supporting. For example, if probability_primary is 0.7 and probability_supporting is 0.5, then probability_final is 1.2. However, if the final probability is greater than 1.0, it is set to 1.0. Therefore, in the example above, the final probability is set to 1.0.
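By way of non-limiting illustration, the three probability calculations of steps 513G to 513J can be sketched as follows. The function names are hypothetical. The default values of k and j are simply two of the example constants listed above, and the cap on the supporting probability is taken to be k itself rather than "another constant".

```python
def primary_probability(primary_patch_count: int, k: float = 0.1) -> float:
    """Step 513G: probability grows linearly with the primary blob's patch count."""
    return k * primary_patch_count


def supporting_probability(supporting_patch_count: int,
                           k: float = 0.1,
                           j: float = 100.0) -> float:
    """Step 513H: scaled patch count of merged supporting blobs, capped at k."""
    return min(k * supporting_patch_count / j, k)


def final_probability(p_primary: float, p_supporting: float) -> float:
    """Step 513J: sum of the two contributions, capped at 1.0."""
    return min(p_primary + p_supporting, 1.0)


# The worked examples from the description.
p_primary_example = primary_probability(13, k=0.1)                   # 0.1 * 13
p_supporting_example = supporting_probability(300, k=0.5, j=100.0)   # 1.5 capped to k
p_final_example = final_probability(0.7, 0.5)                        # 1.2 capped to 1.0
```

Note that the primary contribution alone can exceed 1.0 (as in the 13-patch example), so the cap in the final step is what keeps the result within the range of a probability.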

[0129] In step 513K, the grouping engine 317 uses the convex hulls of the final group to reassign to the group the maximum score of any convex hull within that group. The scores of the convex hulls are then stored in the region of interest 321, and any convex hull whose score is above a threshold (e.g., 1.0) is shown in green in Figures 11A to 11B and Figures 11G to 11H to indicate a high probability of being positive or negative. For example, green may indicate a high probability of a positive patch. Embodiments of the present invention can be used to indicate a high probability of being positive for various characteristics of a sample / image. For example, in Figures 11A to 11B and Figures 11G to 11H, green patches indicate a high probability of being positive for cancer, while in Figure 11I, green indicates cells with a high probability of being lymphocytes.

[0130] A region of interest (ROI) 321 (which may include multiple regions) can then be generated based on the output of step 513K. The ROI 321 may include a tissue mask (e.g., a microdissection mask), which can be used in laser microdissection to excise the target ROI for further processing. Depending on the size and quality of the ROIs, only certain ROIs may be usable as microdissection masks. That is, some ROIs may not be suitable for microdissection.

[0131] The embodiments described herein incorporate pathologist input to implement few-shot learning, enabling more efficient and rapid classification of tissue regions of interest (ROIs) within a sample. Pathologists no longer need to manually delineate all ROIs in an image. Instead, systems using the few-shot learning techniques described herein help accelerate the pathologist's work by at least partially automating the process of identifying ROIs. In another embodiment, the output of the disclosed embodiments is used as training data for a deep learning network (e.g., the CNN 329 shown in Figure 3) or for other machine learning systems that require large amounts of training data. This allows such training data to be generated more efficiently.

[0132] Although certain illustrative embodiments have been described herein, it should be understood that these embodiments are presented by way of example only and not by way of limitation. While embodiments have been specifically shown and described, it should be understood that various changes in form and detail may be made. Although various embodiments have been described as having specific combinations of features and / or components, other embodiments may also have any combination of features and / or components from any of the embodiments described above.

Claims

1. A computer-implemented method for generating a region of interest of at least one shape in a digital image, the method comprising: obtaining, by an image processing engine, access to a digital tissue image of a biological sample; tiling, by the image processing engine, the digital tissue image into a set of image patches; obtaining, by the image processing engine, a plurality of features from each patch in the set of image patches, the plurality of features defining a patch feature vector in a multi-dimensional feature space that includes the plurality of features as dimensions; determining, by the image processing engine, a user selection of a subset of patches in the set of image patches; training a one-class support vector machine based on the user-selected subset of patches; accessing all or a predetermined number of previously selected patches, and using the trained one-class support vector machine to determine a subset of the previously selected patches that is in the same class as the user-selected subset of patches; training a binary support vector machine based on the user-selected subset of patches and the determined subset of previously selected patches; classifying, by applying the trained binary support vector machine to patch vectors of other patches in the set of image patches, the other patches as belonging or not belonging to the same class of interest as the user-selected subset of patches; and identifying one or more regions of interest based at least in part on results of the classifying.

2. The method of claim 1, wherein tiling the digital tissue image includes creating image patches of uniform size and shape.

3. The method of claim 2, wherein the image patches of uniform size and shape include: a. square patches of less than or equal to 1,000 pixels by 1,000 pixels; b. square patches of less than or equal to 400 pixels by 400 pixels; or c. square patches of less than or equal to 256 pixels by 256 pixels.

4. The method of claim 1, wherein: a. tiling the digital tissue image includes creating image patches of non-uniform size and shape; b. the set of image patches includes non-overlapping patches.

5. The method of claim 1, wherein obtaining the plurality of features includes submitting each patch to a trained image processing neural network that performs feature extraction, the neural network having been trained on images of a variety of known objects.

6. The method of claim 5, wherein: a. the neural network is a convolutional neural network; b. the neural network has been trained on at least 1,000,000 images, and the variety of known objects includes objects belonging to at least 1,000 different classes of known objects; c. a majority of the known objects are not digital tissue images of biological samples; d. none of the known objects are digital tissue images of biological samples; or e. none of the known objects are in the same class of interest as the user-selected subset of patches.

7. The method of claim 1, further comprising causing a computing device to render the shape of the region of interest on a display.

8. The method of claim 1, wherein the shape of the region of interest includes at least one tissue mask, and the at least one tissue mask includes a microdissection mask.

9. The method of claim 1, wherein the class of interest includes at least one cancer class, and the at least one cancer class includes one of the following cancer types: breast cancer, bladder cancer, brain cancer, lung cancer, pancreatic cancer, skin cancer, colorectal cancer, prostate cancer, stomach cancer, liver cancer, cervical cancer, esophageal cancer, leukemia, non-Hodgkin's lymphoma, kidney cancer, uterine cancer, bile duct cancer, bone cancer, ovarian cancer, gallbladder cancer, gastrointestinal cancer, oral cancer, laryngeal cancer, eye cancer, pelvic cancer, spinal cord cancer, testicular cancer, vaginal cancer, vulvar cancer, or thyroid cancer.

10. The method of claim 1, wherein the class of interest includes at least one of the following tissue types: abnormal tissue, benign tissue, malignant tissue, bone tissue, skin tissue, mesenchymal tissue, muscle tissue, connective tissue, scar tissue, lymphatic tissue, fat, epithelial tissue, nerve tissue, or blood vessels.

11. The method of claim 1, wherein the digital tissue image includes a slide image of a tissue sample slide, wherein the slide image includes a digital histopathological image, or the slide image includes biological sample metadata, which includes digital information associated with at least one of the following: tissue type, tissue donor, scanner, staining agent, staining technique, preparer identifier, image size, sample identifier, tracking identifier, version number, file type, image date, symptoms, diagnosis, attending physician identification information, medical history of the tissue donor, demographic information of the tissue donor, family medical history of the tissue donor, or species of the tissue donor.

12. An apparatus for generating a region of interest of at least one shape in a digital image, the apparatus comprising: a non-transitory computer-readable storage device storing software instructions; and a processor coupled to the computer-readable storage device and configured, upon execution of the software instructions, to: obtain access to a digital tissue image of a biological sample; tile the digital tissue image into a set of image patches; obtain a plurality of features from each patch in the set of image patches, the plurality of features defining a patch feature vector in a multi-dimensional feature space that includes the plurality of features as dimensions; determine a user selection of a subset of patches in the set of image patches; train a one-class support vector machine based on the user-selected subset of patches; access all or a predetermined number of previously selected patches, and use the trained one-class support vector machine to determine a subset of the previously selected patches that is in the same class as the user-selected subset of patches; train a binary support vector machine based on the user-selected subset of patches and the determined subset of previously selected patches; classify, by applying the trained binary support vector machine to patch vectors of other patches in the set of image patches, the other patches as belonging or not belonging to the same class of interest as the user-selected subset of patches; and identify one or more regions of interest based at least in part on results of the classifying.

13. A computer-implemented method for generating training data to train a machine learning system, the method comprising: obtaining, by an image processing engine, access to a digital tissue image of a biological sample; tiling, by the image processing engine, the digital tissue image into a set of image patches; obtaining, by the image processing engine, a plurality of features from each patch in the set of image patches, the plurality of features defining a patch feature vector in a multi-dimensional feature space that includes the plurality of features as dimensions; determining, by the image processing engine, a user selection of a subset of patches in the set of image patches; training a one-class support vector machine based on the user-selected subset of patches; accessing all or a predetermined number of previously selected patches, and using the trained one-class support vector machine to determine a subset of the previously selected patches that is in the same class as the user-selected subset of patches; training a binary support vector machine based on the user-selected subset of patches and the determined subset of previously selected patches; classifying, by applying the trained binary support vector machine to patch vectors of other patches in the set of image patches, the other patches as belonging or not belonging to the same class of interest as the user-selected subset of patches; and generating, by the image processing engine, training data for the machine learning system based on the user-selected patches and the other patches in the set of image patches.