Capsule endoscopy GI tract segmentation using clips classification

A deep learning neural network classifies capsule endoscopy images to automate the detection of gastrointestinal tract transitions, enhancing diagnostic efficiency by reducing the number of images that need to be reviewed.

WO2026133080A1 · PCT designated stage · Publication Date: 2026-06-25 · GIVEN IMAGING LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: GIVEN IMAGING LTD
Filing Date: 2025-12-15
Publication Date: 2026-06-25

AI Technical Summary

Technical Problem

Existing capsule endoscopy systems require manual review of thousands of images to identify transitions between gastrointestinal tract segments, such as from the stomach to the small bowel and from the small bowel to the colon, which is time-consuming and inefficient.

Method used

A system utilizing a deep learning neural network to classify video clips of the gastrointestinal tract into distinct segments, enabling automated detection of transitions between these segments, thereby reducing the number of images that need to be reviewed and improving diagnostic efficiency.

Benefits of technology

Automated detection of gastrointestinal tract transitions reduces the time and resources required for image review, facilitating faster diagnosis and more precise localization of medical conditions.

✦ Generated by Eureka AI based on patent content.


Abstract

A system for analyzing images includes memory storing instructions, which when executed by at least one processor, cause the system to access a plurality of video clips of at least a portion of a GIT and for each video clip of the plurality of video clips, provide a score for classifying the video clip to a segment of a plurality of segments of the GIT. The system is also caused to classify each video clip of a subset of the plurality of video clips to one of the plurality of segments of the GIT and detect, in consecutive video clips in the subset, a change from a first classification to a second classification. The system is also caused to classify, among the video clips in the subset, a transition between two adjacent segments of the plurality of segments of the GIT.

Description

Attorney Docket: A0013182W001

CAPSULE ENDOSCOPY GI TRACT SEGMENTATION USING CLIPS CLASSIFICATION

CROSS-REFERENCE TO RELATED APPLICATION(S)

[0001] This application claims priority to U.S. Provisional Patent application No. 63/735,929, filed December 19, 2024, which is incorporated herein by reference in its entirety.

FIELD

[0002] This disclosure relates to image analysis methods and systems and, more particularly, to systems and methods for analyzing a stream of images of a gastrointestinal tract.

BACKGROUND

[0003] Capsule endoscopy (CE) allows examining the gastrointestinal tract (GIT) endoscopically. There are capsule endoscopy systems and methods that are aimed at examining a specific portion of the GIT, such as the small bowel (SB) or the colon. CE is a non-invasive procedure which does not require the patient to be admitted to a hospital, and the patient can continue most daily activities while the capsule is in the patient’s body. The capsule, which is about the size of a multi-vitamin, is swallowed by the patient under the supervision of a health professional (e.g., a nurse or a physician) at the medical facility and the patient is provided with a wearable device, e.g., a sensor belt and a recorder placed in a pouch and strap to be placed around the patient's shoulder. The wearable device typically includes a storage device.

[0004] The capsule captures images as it travels naturally through the GIT. Images and additional data (e.g., metadata) are then transmitted to the recorder that is worn by the patient. The capsule is typically disposable and passes naturally with a bowel movement. The procedure data (e.g., the captured images or a portion of them and additional metadata) is stored on the storage device of the wearable device. The procedure data is then downloaded to a computing device typically located at the medical facility, which has an engine software stored thereon. The received procedure data is then processed by the engine to a compiled study (or “study”).

[0005] A reader (which may be the procedure supervising physician, a dedicated physician, or the referring physician) may access the study via a reader application. The reader application may provide an interface for displaying the study on the computing device, typically as a video or movie (e.g., a series of moving images). The reader then reviews the study, evaluates the procedure, and provides input via the reader application. Since the reader needs to review thousands of images, reading a study typically takes between half an hour and an hour. A report is then generated by the reader application based on the compiled study and the reader's input. On average, it would take an hour to generate a report. The report may include, for example, images of interest, e.g., images which are identified as including pathologies, selected by the reader; evaluation or diagnosis of the patient's medical condition based on the procedure's data (i.e., the study) and / or recommendations for follow up and / or treatment provided by the reader. The report may then be forwarded to the referring physician. The referring physician may decide on a required follow up or treatment based on the report.

[0006] Accurately identifying transitions of the capsule from the stomach to the SB and from the SB to the colon would enable precise localization and diagnosis of the patient's medical condition.

SUMMARY

[0007] The present disclosure relates to systems and methods for analyzing a stream of images of a GIT. More specifically, the present disclosure relates to classifying video clips of the GIT into one of a plurality of distinct classes using a machine learning system such as a deep learning neural network. Distinct classes of the GIT may include, for example: the stomach, the transition from the stomach to the SB, the SB, the transition from the SB to the colon, and the colon. By processing the classified results, various heuristics can be applied to automatically and accurately determine points in the stream of images corresponding to transitions between particular GIT segments, such as transitioning between the stomach and the SB, or transitioning between the SB and the colon. Even though examples are shown and described with respect to images captured in vivo by a capsule endoscopy device, the disclosed technology can be applied to images captured by other devices or mechanisms, including anatomical images captured by MRI, for example.

[0008] Provided in accordance with aspects of the present disclosure is a system for analyzing images. The system includes at least one processor and at least one memory storing instructions which, when executed by the at least one processor, cause the system to access a plurality of video clips of at least a portion of a gastrointestinal tract (GIT) captured by a capsule endoscopy device; for each video clip of the plurality of video clips, provide, by a deep learning neural network, a score for classifying each video clip to one of a plurality of segments of the GIT; classify each video clip of a subset of the plurality of video clips, whose scores satisfy a confidence criterion, to one of the segments of the GIT; detect, in consecutive video clips in the subset, a change from a first classification corresponding to a first segment of the GIT to a second classification corresponding to a second segment of the GIT adjacent to the first segment of the GIT; and classify, among the video clips in the subset, a transition between two adjacent segments of the segments of the GIT based on the detected change from the first classification to the second classification.
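The claimed flow can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the applicant's implementation: the per-clip scores are taken as given (one score per segment, as a classifier like the one of FIG. 4 might output), the confidence criterion is assumed to be a threshold on the top score (`CONF_THRESHOLD` is a hypothetical value), only three anatomical segments are modeled, and adjacency is assumed to follow anatomical order.

```python
from typing import List, Sequence, Tuple

# Segments in anatomical order; adjacency is defined by this order (assumption).
SEGMENTS = ["stomach", "small_bowel", "colon"]
CONF_THRESHOLD = 0.8  # hypothetical confidence criterion on the top score


def classify_confident(scores: Sequence[Sequence[float]],
                       threshold: float = CONF_THRESHOLD) -> List[Tuple[int, int]]:
    """Keep clips whose highest score meets the threshold.

    Returns (clip_index, segment_index) pairs for the confident subset.
    """
    subset = []
    for i, clip_scores in enumerate(scores):
        best = max(range(len(clip_scores)), key=lambda k: clip_scores[k])
        if clip_scores[best] >= threshold:
            subset.append((i, best))
    return subset


def detect_transitions(subset: List[Tuple[int, int]]) -> List[Tuple[str, str, int]]:
    """Report changes between adjacent segments in consecutive subset clips.

    Each result is (from_segment, to_segment, clip_index), where clip_index is
    the first confident clip that carries the new classification.
    """
    transitions = []
    for (_, prev_seg), (idx, seg) in zip(subset, subset[1:]):
        if abs(seg - prev_seg) == 1:  # adjacent segments only; jumps are ignored
            transitions.append((SEGMENTS[prev_seg], SEGMENTS[seg], idx))
    return transitions


if __name__ == "__main__":
    scores = [
        [0.95, 0.04, 0.01],  # clip 0: stomach
        [0.50, 0.45, 0.05],  # clip 1: not confident, left out of the subset
        [0.10, 0.85, 0.05],  # clip 2: small bowel
        [0.05, 0.90, 0.05],  # clip 3: small bowel
        [0.02, 0.08, 0.90],  # clip 4: colon
    ]
    print(detect_transitions(classify_confident(scores)))
    # [('stomach', 'small_bowel', 2), ('small_bowel', 'colon', 4)]
```

Placing the transition at the first confident clip of the new class is only one possible choice; a real system could instead pick the midpoint between the two confident clips or apply the further heuristics the disclosure mentions.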

[0009] Another system for analyzing images is provided in accordance with aspects of the present disclosure and includes a capsule endoscopy device configured to capture, over time, a plurality of video clips of at least a portion of a gastrointestinal tract (GIT) of a person, and a receiving device configured to be secured to the person, to be communicatively coupled with the capsule endoscopy device, and to receive the plurality of video clips. The system also includes a computing system configured to be communicatively coupled with the receiving device and to receive the plurality of video clips. The computing system includes at least one processor and at least one memory storing instructions which, when executed by the at least one processor, cause the computing system to access the plurality of video clips of at least a portion of a gastrointestinal tract (GIT) captured by a capsule endoscopy device; for each video clip of the plurality of video clips, provide, by a machine learning system, scores for classifying the video clip to each of a plurality of segments of the GIT; classify each video clip of a subset of the plurality of video clips, whose scores satisfy a confidence criterion, to one of the segments of the GIT; detect, in consecutive video clips in the subset, a change from a first classification corresponding to a first segment of the GIT to a second classification corresponding to a second segment of the GIT adjacent to the first segment of the GIT; and classify, among the video clips in the subset, a transition between two adjacent segments of the segments of the GIT based on the detected change from the first classification to the second classification.

[0010] Also provided in accordance with aspects of the present disclosure is a non-transitory machine readable medium storing instructions which, when executed by a processor, cause the processor to perform a method. The method includes accessing a plurality of video clips of at least a portion of a gastrointestinal tract (GIT) captured by a capsule endoscopy device; for each video clip of the plurality of video clips, providing, by a deep learning neural network, a score for classifying each video clip to one segment of a plurality of segments of the GIT; classifying each video clip of a subset of the plurality of video clips, whose scores satisfy a confidence criterion, to one of the plurality of segments of the GIT; detecting, in consecutive video clips in the subset, a change from a first classification corresponding to a first segment of the GIT to a second classification corresponding to a second segment of the GIT adjacent to the first segment of the GIT; and classifying, among the video clips in the subset, a transition between two adjacent segments of the plurality of segments of the GIT based on the detected change from the first classification to the second classification.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] The above and other aspects and features of the present disclosure will become more apparent in view of the following detailed description when taken in conjunction with the accompanying drawings wherein like reference numerals identify similar or identical elements. The components in the drawings are not necessarily to scale. Instead, emphasis is placed on clearly illustrating the principles of the present disclosure.

[0012] FIG. 1 is a diagram illustrating a human gastrointestinal tract (GIT);

[0013] FIG. 2 is a block diagram of an exemplary system for analyzing images captured in vivo by a capsule endoscopy device, in accordance with aspects of the present disclosure;

[0014] FIG. 3 is a block diagram of an exemplary computing device, in accordance with aspects of the disclosure;

[0015] FIG. 4 is a block diagram of an exemplary deep learning neural network, in accordance with aspects of the present disclosure;

[0016] FIG. 5 is a flow diagram of an exemplary operation of determining a transition between adjacent segments of a GIT, in accordance with aspects of the present disclosure;

[0017] FIG. 6 is a flow diagram of another exemplary operation of determining a transition between adjacent segments of a GIT, in accordance with aspects of the present disclosure; and

[0018] FIG. 7 is a flow diagram of yet another exemplary operation of determining a transition between adjacent segments of a GIT, in accordance with aspects of the present disclosure.

DETAILED DESCRIPTION

[0019] The present disclosure provides systems and methods for transition detection in a stream of images captured during a CE procedure. The transition that is detected in the stream of images can be a transition from images of one anatomical area of a gastrointestinal tract (GIT) to images of another anatomical area of a GIT, or can be a transition from images of a GIT segment with a pathology present to images of another GIT segment with the pathology not present, or can be a transition from images of a sick / diseased segment of a GIT to images of a healthy segment of a GIT, and / or combinations thereof. Accordingly, as used herein, a “segment” of a GIT includes but is not limited to anatomical portions that have given names. Rather, the term “segment” also includes a portion of a GIT having a particular characteristic, such as sick / diseased, healthy, presence of a pathology, and / or absence of a pathology, among other characteristics. Even though examples are shown and described with respect to images captured in vivo by a capsule endoscopy device, the disclosed technology can be applied to images captured by other devices or mechanisms, including anatomical images captured by MRI, for example.

[0020] In various aspects, the disclosed transition detection utilizes a combination of transition detection operations. Certain transition detection operations are used for detecting a transition from images of the stomach to images of the SB, and certain transition detection operations are used for detecting a transition from images of the SB to images of the colon. Additionally, some transition detection operations are suitable for “online” use, e.g., for use at the same time that the capsule progresses through the GIT, while some transition detection operations are suitable for “offline” use, e.g., after the capsule has exited the patient body.

[0021] By detecting transitions in a stream of in-vivo images, in accordance with the present disclosure, portions of the stream of images can be identified to provide localization information to the reader of a study (typically a physician) and / or to remove images which are not relevant. According to some aspects of the present disclosure, a user (e.g., a physician) may build his or her understanding of a case by reviewing a study, which includes a display of images (e.g., captured by a CE imaging device) that were selected, e.g., automatically, as images that may be of interest. Since the study typically includes thousands of images, its review may be a time-consuming task. Reducing the number of images included in a study may ease the review process for the user, reduce the reading time per case, and may lead to better diagnosis. For example, in a SB procedure, once the transition to the colon is identified, all the images captured after the transition point may be removed. This may facilitate the generation of a study having a reduced number of images and thus may reduce the study reading time. Furthermore, it may save processing time and resources.
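The trimming step in the SB example above can be sketched as follows, assuming the SB-to-colon transition has already been located as a frame index. The `margin` parameter, which keeps a few frames past the transition for context, is a hypothetical addition and not part of the disclosure.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def trim_after_transition(frames: Sequence[T], transition_index: int,
                          margin: int = 0) -> List[T]:
    """Drop frames captured after the SB-to-colon transition.

    transition_index is the index of the first frame assigned to the colon.
    margin (hypothetical) keeps that many extra frames past the transition.
    """
    if not 0 <= transition_index <= len(frames):
        raise ValueError("transition_index is outside the stream")
    end = min(len(frames), transition_index + max(0, margin))
    return list(frames[:end])
```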

[0022] In various embodiments, transition detection can be utilized online to indicate the end of a procedure, which would “release” the patient and allow the patient to be uncoupled from equipment while the capsule progresses through non-relevant portions of the GIT. In various embodiments, transition detection can be used to define an anatomical area of interest (e.g., the stomach, the SB, the colon) and / or to segment the GIT (or a portion of it) into different anatomical areas. Various combinations of one or more such embodiments are contemplated to be within the scope of the present disclosure.
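One way such an online check could work is a streaming rule that declares the SB-to-colon transition once several consecutive clips score confidently as colon. The sketch below assumes that rule, a per-clip colon score between 0 and 1, and clip indices that arrive consecutively; `OnlineColonDetector`, `required`, and `threshold` are hypothetical names and values, not taken from the disclosure.

```python
from typing import Optional


class OnlineColonDetector:
    """Streaming sketch: flag the SB-to-colon transition after `required`
    consecutive clips whose colon score meets `threshold` (hypothetical rule)."""

    def __init__(self, required: int = 3, threshold: float = 0.8):
        self.required = required
        self.threshold = threshold
        self._run = 0  # length of the current run of confident colon clips
        self.transition_clip: Optional[int] = None

    def update(self, clip_index: int, colon_score: float) -> bool:
        """Feed one clip as it arrives; return True once the procedure may end."""
        if self.transition_clip is not None:
            return True
        if colon_score >= self.threshold:
            self._run += 1
            if self._run == self.required:
                # Place the transition at the first clip of the confirming run
                # (assumes consecutive clip indices).
                self.transition_clip = clip_index - self.required + 1
                return True
        else:
            self._run = 0  # a non-colon clip breaks the run
        return False
```

Requiring a run of confident clips trades a short delay for robustness against single misclassified clips, which matters when the result is used to release the patient.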

[0023] In the following detailed description, specific details are set forth in order to provide a thorough understanding of the disclosure. However, it will be understood by those skilled in the art that the disclosure may be practiced without these specific details. In other instances, well-known methods, procedures, and components have not been described in detail so as not to obscure the present disclosure. Some features or elements described with respect to one system may be combined with features or elements described with respect to other systems. For the sake of clarity, discussion of same or similar features or elements may not be repeated.

[0024] Although the disclosure is not limited in this regard, discussions utilizing terms such as, for example, “processing,” “computing,” “calculating,” “determining,” “establishing,” “analyzing,” “checking,” or the like, may refer to operation(s) and / or process(es) of a computer, a computing platform, a computing system, or other electronic computing device, that manipulates and / or transforms data represented as physical (e.g., electronic) quantities within the computer’s registers and / or memories into other data similarly represented as physical quantities within the computer’s registers and / or memories or other information non-transitory storage medium that may store instructions to perform operations and / or processes. Although the disclosure is not limited in this regard, the terms “plurality” and “a plurality” as used herein may include, for example, “multiple” or “two or more”. The terms “plurality” or “a plurality” may be used throughout the specification to describe two or more components, devices, elements, units, parameters, or the like. The term “set” when used herein may include one or more items. Unless explicitly stated, the methods described herein are not constrained to a particular order or sequence. Additionally, some of the described methods or elements thereof can occur or be performed simultaneously, at the same point in time, or concurrently.

[0025] The term “location” and its derivatives, as referred to herein with respect to an image, may refer to the estimated location of the capsule along the GIT while capturing the image or to the estimated location of the portion of the GIT shown in the image along the GIT.

[0026] A type of CE procedure may be determined based on, inter alia, the portion of the GIT that is of interest and is to be imaged (e.g., the colon or the SB), or based on the specific use (e.g., for checking the status of a GI disease, such as Crohn’s disease, or for colon cancer screening).

[0027] The terms “surrounding” or “adjacent” as referred to herein with respect to images (e.g., images that surround another image(s), or that are adjacent to other image(s)), may relate to spatial and / or temporal characteristics unless specifically indicated otherwise. For example, images that surround or are adjacent to other image(s) may be images that are estimated to be located near the other image(s) along the GIT and / or images that were captured near the capture time of another image, within a certain threshold, e.g., within one or two centimeters, or within one, five, or ten seconds.
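For the temporal sense of adjacency, a minimal sketch is a window query over sorted capture times. It assumes timestamps in seconds sorted in capture order; the function name and the default five-second window (one of the example thresholds above) are illustrative choices.

```python
import bisect
from typing import List, Sequence


def temporally_adjacent(timestamps: Sequence[float], index: int,
                        window_s: float = 5.0) -> List[int]:
    """Indices of images captured within window_s seconds of image `index`.

    Assumes timestamps are sorted in capture order; the image itself is excluded.
    """
    t = timestamps[index]
    lo = bisect.bisect_left(timestamps, t - window_s)
    hi = bisect.bisect_right(timestamps, t + window_s)
    return [i for i in range(lo, hi) if i != index]
```

Binary search keeps each query logarithmic in the stream length, which is helpful for streams of tens of thousands of images.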

[0028] The terms “GIT” and “a portion of the GIT” may each refer to or include the other, according to their context. Thus, the term “a portion of the GIT” may also refer to the entire GIT and the term “GIT” may also refer only to a portion of the GIT.

[0029] The terms “image” and “frame” may each refer to or include the other and may be used interchangeably in the present disclosure to refer to a single capture by an imaging device. For convenience, the term “image” may be used more frequently in the present disclosure, but it will be understood that references to an image shall apply to a frame as well.

[0030] Referring to FIG. 1 , an illustration of the GIT 100 is shown. The GIT 100 is an organ system within humans and other animals. The GIT 100 generally includes a mouth 102 for taking in sustenance, salivary glands 104 for producing saliva, an esophagus 106 through which food passes aided by contractions, a stomach 108 to secrete enzymes and stomach acid to aid in digesting food, a liver 110, a gall bladder 112, a pancreas 114, a small intestine 116 (e.g., SB) for the absorption of nutrients, and a colon 400 (e.g., large intestine) for storing water and waste material as feces prior to defecation. The colon 400 generally includes an appendix 402, a rectum 428, and an anus 430. Food taken in through the mouth is digested by the GIT to take in nutrients and the remaining waste is expelled as feces through the anus 430.

[0031] Studies of different portions of the GIT 100 (e.g., SB 116, colon 400, esophagus 106, and / or stomach 108) may be presented via a suitable user interface. As used herein, the terms “study” and “studies” refer to and include at least a set of images selected from the images captured by a CE imaging device (e.g., 212, FIG. 2) and can optionally include information other than images as well. The type of procedure performed may determine which portion of the GIT 100 is the portion of interest. Examples of types of procedures performed include, without limitation, a SB procedure, a colon procedure, a SB and colon procedure, a procedure aimed to specifically exhibit or check the SB, a procedure aimed to specifically exhibit or check the colon, a procedure aimed to specifically exhibit or check the colon and the SB, or a procedure to exhibit or check the entire GIT: esophagus, stomach, SB, and colon.

[0032] FIG. 2 shows a block diagram of a system for analyzing medical images and / or video clips captured in vivo via a CE procedure. The system generally includes a capsule system 210 configured to capture images of the GIT and a computing system 300 (e.g., local system and / or cloud system) configured to process the captured images. The capsule system 210 may include a swallowable CE imaging device 212 (e.g., a capsule) configured to capture images and / or video clips of the GIT as the CE imaging device 212 travels through the GIT. The images may be stored on the CE imaging device 212 and / or transmitted to a receiving device 214 typically including an antenna. In some capsule systems 210, the receiving device 214 may be located on the patient who swallowed the CE imaging device 212 and may, for example, take the form of a belt worn by the patient or a patch secured to the patient.

[0033] The capsule system 210 may be communicatively coupled with the computing system 300 and can communicate captured images and / or video clips to the computing system 300. The computing system 300 may process the received images using image processing technologies, machine learning technologies, and / or signal processing technologies, among other technologies. The computing system 300 can include local computing devices that are local to the patient and / or the patient’s treatment facility, a cloud computing platform that is provided by cloud services, or a combination of local computing devices and a cloud computing platform.

[0034] In the case where the computing system 300 includes a cloud computing platform, the images and / or video clips captured by the capsule system 210 may be transmitted online to the cloud computing platform. In various embodiments, the images can be transmitted via the receiving device 214 worn or carried by the patient. In various embodiments, the images and / or video clips can be transmitted via the patient’s smartphone or via any other device connected to the Internet and which may be coupled with the CE imaging device 212 or the receiving device 214.

[0035] FIG. 3 shows a high-level block diagram of an exemplary computing system 300 that may be used with image analyzing systems of the present disclosure. Computing system 300 may include a processor or controller 305 that may be or include, for example, one or more central processing unit processor(s) (CPU), one or more Graphics Processing Unit(s) (GPU or GPGPU), a chip or any suitable computing or computational device, an operating system 315, a memory 320, a storage 330, input devices 335 and output devices 340. Modules or equipment for collecting or receiving (e.g., a receiver worn on a patient) or displaying or selecting for display (e.g., a workstation) medical images collected by the CE imaging device 212 (FIG. 2) may be or include, or may be executed by, the computing system 300 shown in FIG. 3. A communication component 322 of the computing system 300 may allow communications with remote or external devices, e.g., via the Internet or another network, via radio, or via a suitable network protocol such as File Transfer Protocol (FTP), etc.

[0036] The computing system 300 includes an operating system 315 that may be or may include any code segment designed and / or configured to perform tasks involving coordination, scheduling, arbitration, supervising, controlling or otherwise managing operation of computing system 300, for example, scheduling execution of programs. Memory 320 may be or may include, for example, a Random Access Memory (RAM), a read-only memory (ROM), a Dynamic RAM (DRAM), a Synchronous DRAM (SD-RAM), a double data rate (DDR) memory chip, a Flash memory, a volatile memory, a non-volatile memory, a cache memory, a buffer, a short term memory unit, a long term memory unit, or other suitable memory units or storage units. Memory 320 may be or may include a plurality of possibly different memory units. Memory 320 may store, for example, instructions to carry out a method (e.g., executable code 325), and / or data such as user responses, interruptions, etc.

[0037] Executable code 325 may be any executable code, e.g., an application, a program, a process, task or script. Executable code 325 may be executed by controller 305 possibly under control of operating system 315. For example, execution of executable code 325 may cause the display or selection for display of medical images as described herein. In some systems, more than one computing system 300 or components of computing system 300 may be used for multiple functions described herein. For the various modules and functions described herein, one or more computing systems 300 or components of computing system 300 may be used. Devices that include components similar or different to those included in the computing system 300 may be used, and may be connected to a network and used as a system. One or more processor(s) 305 may be configured to carry out methods of the present disclosure by for example executing software or code. Storage 330 may be or may include, for example, a hard disk drive, a floppy disk drive, a Compact Disk (CD) drive, a CD-Recordable (CD-R) drive, a universal serial bus (USB) device or other suitable removable and / or fixed storage unit. Data such as instructions, code, medical images, image streams, etc. may be stored in storage 330 and may be loaded from storage 330 into memory 320 where it may be processed by controller 305. In some embodiments, some of the components shown in FIG. 3 may be omitted.

[0038] Input devices 335 may include for example a mouse, a keyboard, a touch screen or pad or any suitable input device. It will be recognized that any suitable number of input devices may be operatively coupled to computing system 300. Output devices 340 may include one or more monitors, screens, displays, speakers and / or any other suitable output devices. It will be recognized that any suitable number of output devices may be operatively coupled to computing system 300 as shown by block 340. Any applicable input / output (I / O) devices may be operatively coupled to computing system 300, for example, a wired or wireless network interface card (NIC), a modem, printer or facsimile machine, a universal serial bus (USB) device or external hard drive may be included in input devices 335 and / or output devices 340.

[0039] Multiple computer systems 300 including some or all of the components shown in FIG. 3 may be used with the described systems and methods. For example, CE imaging device 212, a receiver, a cloud-based system, and / or a workstation or portable computing device for displaying images may include some or all of the components of the computer system of FIG. 3. A cloud platform (e.g., a remote server) including components such as computing system 300 of FIG. 3 may receive procedure data such as images and metadata, process and generate a study, and may also display the generated study for the doctor’s review (e.g., on a web browser executed on a workstation or portable computer). An “on-premises” option may use a workstation or local server of a medical facility to store, process and display images and / or a study.

[0040] Referring now to FIG. 4, there is shown a block diagram of an exemplary deep learning neural network 400 for classifying images and / or video clips. The deep learning neural network 400 can be implemented and executed by the computing system 300 of FIGS. 2 and 3. Generally, and as persons skilled in the art will understand, a deep learning neural network 400 includes an input layer, a plurality of hidden layers, and an output layer. The input layer, the hidden layers, and the output layer all include neurons / nodes. The neurons between the various layers are interconnected via weights. Each neuron in the deep learning neural network 400 computes an output value by applying a specific function to the input values coming from the nodes in the previous layer. The function that is applied to the input values is determined by a vector of weights and a bias. Learning, in the deep learning neural network, progresses by making iterative adjustments to these biases and weights.
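The neuron computation and the iterative weight and bias adjustment described above can be sketched for a single neuron. The sigmoid activation, the binary cross-entropy loss, and plain gradient descent are assumptions chosen for illustration because their combined gradient is simple; the disclosure does not specify them for network 400.

```python
import math
from typing import List, Sequence, Tuple


def neuron_forward(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Weighted sum of the inputs plus the bias, passed through a sigmoid."""
    if len(x) != len(w):
        raise ValueError("one weight per input is required")
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sgd_step(x: Sequence[float], y: float, w: Sequence[float], b: float,
             lr: float = 0.1) -> Tuple[List[float], float]:
    """One gradient-descent update of the weights and the bias.

    With a sigmoid output and binary cross-entropy loss, dL/dz = y_hat - y,
    so dL/dw_i = (y_hat - y) * x_i and dL/db = y_hat - y.
    """
    err = neuron_forward(x, w, b) - y
    new_w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return new_w, b - lr * err
```

Repeating `sgd_step` over many labeled examples is the "iterative adjustment" in miniature; a deep network applies the same idea to every layer via backpropagation.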

[0041] In various embodiments, a deep learning neural network 400 includes a convolutional neural network (CNN). In machine learning, a CNN is a class of artificial neural network that is most commonly applied to analyzing visual imagery. As persons skilled in the art will understand, the convolutional aspect of a CNN relates to applying matrix processing operations to localized portions of an image, and the results of those operations are sets of features that are used to train neural networks. A CNN typically includes convolution layers, activation function layers, and pooling (typically max pooling) layers to reduce dimensionality without losing too many features. Additional information may be included in the operations that generate these features. Providing distinctive information that yields informative features can ultimately give the neural networks an aggregate way to differentiate between different data inputs.
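The three CNN layer types named above can be sketched on a single-channel array with NumPy. This is a toy illustration, not the architecture of network 400: real CNNs learn their kernels, process many channels at once, and rely on optimized library kernels rather than Python loops.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2-D cross-correlation, the operation CNN layers call convolution."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow), dtype=float)
    for i in range(oh):
        for j in range(ow):
            # Apply the kernel to one localized patch of the image.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x: np.ndarray) -> np.ndarray:
    """Activation layer: zero out negative responses."""
    return np.maximum(x, 0.0)


def max_pool2d(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling; rows/cols that do not fill a window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))
```

Pooling halves each spatial dimension here, which is the dimensionality reduction the paragraph refers to.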

[0042] In the illustrated embodiment, the deep learning neural network 400 may utilize one or more CNNs to classify one or more video clips 422 captured by the CE imaging device 212 (FIG. 2) to a portion of the GIT. In aspects of the present disclosure, the one or more video clips 422 may have a predefined length (e.g., between about 5 seconds and about 10 seconds). In the illustrated embodiment, the portions of the GIT classified by the deep learning neural network 400 include the stomach 412, a transition from the stomach to the SB 414, the SB 416, a transition from the SB to the colon 418, and the colon 420. The deep learning neural network 400 may be executed on the computer system 300 (FIG. 3). Persons skilled in the art will understand the deep learning neural network 400 and how to implement it.
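Cutting the image stream into clips of a predefined duration could look like the sketch below. It assumes per-frame capture timestamps in seconds and groups frames on a fixed time grid, so clips may hold different frame counts if the capsule's frame rate varies; the grid-based grouping and the function name are assumptions, not details given in the disclosure.

```python
from typing import Dict, List, Sequence


def split_into_clips(timestamps: Sequence[float],
                     clip_seconds: float = 5.0) -> List[List[int]]:
    """Group frame indices into consecutive clips of clip_seconds each.

    Clip k covers capture times [t0 + k*clip_seconds, t0 + (k+1)*clip_seconds),
    where t0 is the first timestamp. Empty time slots produce no clip.
    """
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    if not timestamps:
        return []
    t0 = timestamps[0]
    clips: Dict[int, List[int]] = {}
    for i, t in enumerate(timestamps):
        clips.setdefault(int((t - t0) // clip_seconds), []).append(i)
    return [clips[k] for k in sorted(clips)]
```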

[0043] The deep learning neural network 400 may be trained based on labels 424 for training video clips 422 and/or objects in training images. For example, a video clip 422 may be labeled as a portion of the GIT (for example, the stomach, a transition from the stomach to the SB, the SB, a transition from the SB to the colon, or the colon). In various embodiments, the training may include supervised learning. The training may further include augmenting the training video clips 422 by adding noise, changing colors, hiding portions of the training images, scaling the training images, rotating the training images, and/or stretching the training images. Persons skilled in the art will understand training of a deep learning neural network 400 and how to implement it.
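Several of the training-time augmentations listed in paragraph [0043] can be sketched on single-channel frames stored as nested lists. This is an assumption-laden illustration: intensity scaling stands in for a color change, geometric scaling and stretching are omitted for brevity, and drawing one set of random parameters per clip (so all frames of a clip are augmented consistently) is a design choice of this sketch, not something the disclosure specifies.

```python
import random


def add_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in img]


def scale_intensity(img, factor):
    """Single-channel stand-in for a color change: scale all pixel values."""
    return [[v * factor for v in row] for row in img]


def cutout(img, top, left, size, fill=0.0):
    """Hide a square portion of the image by overwriting it with a fill value."""
    out = [row[:] for row in img]
    for i in range(top, min(top + size, len(img))):
        for j in range(left, min(left + size, len(img[0]))):
            out[i][j] = fill
    return out


def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augment_clip(clip, rng):
    """Draw one set of augmentation parameters and apply it to every frame of the clip."""
    sigma = rng.uniform(0.0, 0.05)
    factor = rng.uniform(0.9, 1.1)
    h, w = len(clip[0]), len(clip[0][0])
    size = max(1, min(h, w) // 4)
    top, left = rng.randrange(h - size + 1), rng.randrange(w - size + 1)
    rotate = rng.random() < 0.5
    out = []
    for frame in clip:
        f = cutout(scale_intensity(add_noise(frame, sigma, rng), factor), top, left, size)
        out.append(rotate90(f) if rotate else f)
    return out
```

Passing a seeded `random.Random` makes the augmentation reproducible across training runs.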

[0044] In various embodiments, the deep learning neural network 400 may be used to classify video clips 422 captured by the CE imaging device 212 (FIG. 2). Classifying the video clips 422 may include classifying each clip to a segment of the GIT. For example, the video clip classifications may include the stomach 412, a transition from the stomach to the SB 414, the SB 416, a transition from the SB to the colon 418, and the colon 420. The deep learning neural network 400 provides a classification score for each of the segments of the GIT. The classifications 412-420 of FIG. 4 are exemplary, and other classifications for other portions, segments, or consecutive segments of a GIT are contemplated to be within the scope of the present disclosure.
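A per-segment classification score of the kind described in paragraph [0044] is commonly obtained by applying a softmax to a network's raw outputs. The sketch below assumes that approach; the logits are hypothetical values for one clip, and the segment labels follow the five exemplary classes of FIG. 4.

```python
import math

# Five exemplary classes, in the order of FIG. 4 (labels are illustrative).
SEGMENTS = (
    "stomach",
    "stomach-to-SB transition",
    "small bowel",
    "SB-to-colon transition",
    "colon",
)


def softmax(logits):
    """Map raw network outputs to scores that are positive and sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def segment_scores(logits):
    """Pair each GIT segment with its classification score."""
    return dict(zip(SEGMENTS, softmax(logits)))


scores = segment_scores([0.2, 0.1, 3.0, 0.4, -1.0])  # hypothetical logits for one clip
best = max(scores, key=scores.get)
```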

[0045] The description above relates to classifying video clips acquired by a capsule endoscopy device to segments of a GIT. The following detailed description describes techniques that use a heuristic-based approach to detect transitions in an image stream corresponding to the capsule transitioning from one GIT segment to another GIT segment (e.g., a transition from the stomach to the SB, a transition from the SB to the colon, etc.). Detecting such transitions is beneficial for organizing the stream of images from the capsule, which can include 50,000 to 100,000 images, and for reducing the computing burden by isolating the images associated with a particular segment of interest for a pathology study.
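Once transition points are known, isolating the images of a segment of interest reduces to slicing the stream. The sketch below assumes transitions are expressed as the clip indices at which each later segment begins; the function name and study figures are hypothetical.

```python
def segment_ranges(num_clips, transitions, segments):
    """Map each segment to the range of clip indices it spans.

    transitions: sorted clip indices at which each later segment begins.
    """
    if len(transitions) != len(segments) - 1:
        raise ValueError("need exactly one transition between each pair of adjacent segments")
    bounds = [0, *transitions, num_clips]
    return {seg: range(bounds[k], bounds[k + 1]) for k, seg in enumerate(segments)}


# Hypothetical study: 1500 clips, small bowel starts at clip 120, colon at clip 900.
ranges = segment_ranges(1500, [120, 900], ("stomach", "small bowel", "colon"))
sb_clips = ranges["small bowel"]  # only these clips go to a small-bowel review
```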

[0046] FIG. 5 shows a flowchart, in accordance with an aspect of the present disclosure, illustrating an operation that uses a heuristic-based step detection approach for automatically detecting and/or classifying transitions in an image stream corresponding to a CE device (e.g., CE imaging device 212) transitioning from one GIT segment to another GIT segment (e.g., transitioning from the stomach to the SB, transitioning from the SB to the colon, etc.). At block 500, the operation accesses a plurality of images (e.g., one or more video clips) of at least a portion of a GIT captured by a capsule endoscopy device (e.g., CE imaging device 212). Each video clip may be of a predefined length (e.g., from about 5 seconds to about 10 seconds, and/or a length corresponding to a predefined number of images). At block 502, for each video clip, the operation provides, by a machine learning system (e.g., a deep learning neural network), scores for classifying the video clip to each of a plurality of consecutive segments of the GIT. At block 504, the operation classifies each video clip of a subset of video clips, whose scores satisfy a confidence criterion, to one of the consecutive segments of the GIT. An example of a machine learning system, such as the deep learning neural network 400 of FIG. 4, for providing classification scores of images of a GIT acquired by a capsule endoscopy device is described in commonly-owned U.S. Patent Application Publication No. 2023/0148834, filed on April 27, 2021, the entire contents of which are incorporated herein by reference. A high score (e.g., above a threshold) indicates high confidence that the label of the frame is the one indicated by the score. For example, a high colon score indicates high confidence that the frame is an image of the colon.
Where scores are “intermediate,” as defined by boundary thresholds, such intermediate scores indicate moderate confidence and uncertainty. Confidence scores are normalized; normalized scores that do not meet a confidence criterion (e.g., normalized scores between an upper threshold and a lower threshold) are removed, and normalized scores that meet the confidence criterion (e.g., normalized scores above the upper threshold or below the lower threshold) are retained.
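The confidence criterion of block 504 can be sketched as a filter that removes clips with intermediate scores. This sketch assumes that each clip's non-negative raw scores are normalized by dividing by their sum, that a clip is retained only if every normalized score lies above the upper threshold or below the lower threshold, and that a retained clip is assigned to its highest-scoring segment. The threshold values 0.1 and 0.7 are hypothetical.

```python
def normalize(scores):
    """Scale non-negative raw scores so they sum to 1 (assumed normalization)."""
    total = sum(scores)
    return [s / total for s in scores]


def filter_confident(clip_scores, lower=0.1, upper=0.7):
    """Keep clips whose normalized scores are all above `upper` or below `lower`.

    Returns (clip_index, segment_index) pairs, the segment being the highest-scoring one.
    Clips with any intermediate score (between the thresholds) are removed.
    """
    kept = []
    for idx, raw in enumerate(clip_scores):
        norm = normalize(raw)
        if all(s < lower or s > upper for s in norm):
            kept.append((idx, max(range(len(norm)), key=norm.__getitem__)))
    return kept


clip_scores = [
    [9.0, 0.25, 0.25, 0.25, 0.25],  # confident: stomach
    [4.0, 4.0, 1.0, 0.5, 0.5],      # ambiguous: removed
    [0.2, 0.2, 9.2, 0.2, 0.2],      # confident: small bowel
]
subset = filter_confident(clip_scores)
```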

[0047] At block 506, the operation applies a heuristic-based step detection approach to the classifications of the video clips in the subset to detect, in consecutive video clips in the subset, a change from the classification of a first segment of the GIT to the classification of a second segment of the GIT that is adjacent to the first segment. For example, a change from one classification to another (e.g., from the stomach to the SB) that is sustained for at least a threshold number of consecutive video clips in the subset indicates a transition from the stomach to the SB. In aspects of the present disclosure, a change from one classification to another may be considered “sustained” if the subsequent classification is monitored for a predefined threshold number of video clips and/or for a predefined period of time, whether over the course of multiple video clips or over the course of a single video clip. At block 508, the operation classifies, among the video clips in the subset, a transition between two adjacent segments of the consecutive segments of the GIT based on the outcome of the heuristic-based step detection approach of block 506.
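The heuristic-based step detection of block 506 can be sketched as a scan over the temporally ordered classifications of the retained clips, looking for a change from one segment to the adjacent segment that is sustained for a threshold number of clips. The run-length threshold and labels are assumptions for illustration; the returned position indexes the subset and would need to be mapped back to the original clip index.

```python
def detect_step(labels, first, second, min_run):
    """Return the position where `first` changes to `second` and `second` is
    sustained for at least `min_run` consecutive clips, else None."""
    for i in range(1, len(labels)):
        if labels[i - 1] == first and labels[i] == second:
            run = labels[i:i + min_run]
            if len(run) == min_run and all(lab == second for lab in run):
                return i
    return None


# Classifications of the confident subset in temporal order (S = stomach, B = small bowel).
labels = ["S", "S", "B", "S", "S", "B", "B", "B", "B"]
step = detect_step(labels, "S", "B", min_run=3)
```

The brief switch at position 2 is ignored because it is not sustained; the step is reported at position 5.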

[0048] FIG. 6 shows a flowchart, in accordance with an aspect of the present disclosure, illustrating an operation that uses a heuristic-based temporal consistency approach for automatically detecting and/or classifying transitions in an image stream corresponding to a CE device (e.g., CE imaging device 212) transitioning from one GIT segment to another GIT segment (e.g., transitioning from the stomach to the SB, transitioning from the SB to the colon, etc.). At block 600, the operation accesses one or more video clips (e.g., a plurality of images) of at least a portion of a GIT captured by a capsule endoscopy device (e.g., CE imaging device 212). Each video clip may be of a predefined length (e.g., between about 5 seconds and about 10 seconds and/or corresponding to a predefined number of images). At block 602, for each video clip, the operation provides, by a machine learning system (e.g., a deep learning neural network), scores for classifying the video clip to each of a plurality of consecutive segments of the GIT. At block 604, the operation classifies each video clip of a subset of video clips, whose scores satisfy a confidence criterion, to one of the consecutive segments of the GIT. At block 606, the operation applies a heuristic-based temporal consistency approach to the classifications of the video clips in the subset to detect transitions of a CE device (e.g., CE imaging device 212) from one GIT segment to another GIT segment by ensuring that detected transitions are consistent over time. More specifically, the operation detects a transition when the classifications of a threshold number of video clips following an earlier video clip do not revert to the classification of that earlier video clip.
For example, once the classifications are observed to change from the stomach classification to the SB classification, the video clips that follow those classified as stomach should not revert to the stomach classification. At block 608, the operation classifies, among the video clips in the subset, a transition between two adjacent segments of the consecutive segments of the GIT based on the outcome of the heuristic-based temporal consistency approach of block 606.
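The temporal consistency check of block 606 differs from step detection in that it only requires the classification not to revert to the earlier segment, so intermediate classes such as a transition class are tolerated. A minimal sketch, assuming a fixed look-ahead window of clips; the window size and labels are hypothetical.

```python
def consistent_transition(labels, earlier, window):
    """Return the first position where the classification leaves `earlier` and
    none of the next `window` clips revert to `earlier`, else None."""
    for i in range(1, len(labels)):
        if labels[i - 1] == earlier and labels[i] != earlier:
            following = labels[i:i + window]
            if len(following) == window and earlier not in following:
                return i
    return None


# S = stomach, T = stomach-to-SB transition, B = small bowel.
labels = ["S", "T", "S", "S", "T", "B", "B", "B"]
pos = consistent_transition(labels, "S", window=3)
```

The change at position 1 is rejected because the stream reverts to stomach; the change at position 4 holds for the whole window.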

[0049] FIG. 7 shows a flowchart, in accordance with an aspect of the present disclosure, illustrating an operation that uses a heuristic-based approach for automatically detecting and/or classifying transitions in an image stream corresponding to a CE device (e.g., CE imaging device 212) transitioning from one GIT segment to another GIT segment (e.g., transitioning from the stomach to the small bowel, transitioning from the small bowel to the colon, etc.). The heuristic-based approach of FIG. 7 uses the detection of specific spatial features and/or patterns unique to each GIT segment to reinforce the detection of transitions from one GIT segment to another GIT segment. At block 700, the operation accesses one or more video clips (e.g., a plurality of images) of at least a portion of a GIT captured by a CE device (e.g., CE imaging device 212). Each video clip may be of a predefined length (e.g., from about 5 seconds to about 10 seconds, and/or a length corresponding to a predefined number of images). At block 702, for each video clip, the operation provides, by a deep learning neural network, scores for classifying the video clip to each of a plurality of consecutive segments of the GIT. At block 704, the operation classifies each video clip of a subset of video clips, whose scores satisfy a confidence criterion, to one of the consecutive segments of the GIT. At block 706, the operation uses the detection of specific spatial features and/or patterns unique to each GIT segment to detect and/or reinforce the detection of a transition from one GIT segment to another GIT segment. For example, upon detecting a consistent change from spatial features and/or patterns unique to the stomach to spatial features and/or patterns unique to the SB, the operation detects a transition from the stomach to the SB and/or reinforces an existing detection of that transition.
At block 708, the operation classifies, among the video clips in the subset, a transition between two adjacent segments of the consecutive segments of the GIT based on the outcome of the heuristic-based approach of block 706.
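How the spatial features of block 706 reinforce a detection is not detailed here. One possible sketch, assuming a separate per-clip score that measures patterns characteristic of the later segment, confirms a candidate transition when that score rises by at least a margin across the candidate position. The feature score, window, and margin are all hypothetical.

```python
from statistics import fmean


def reinforce_transition(feature_scores, candidate, window, margin=0.3):
    """Confirm a candidate transition if a per-clip spatial-feature score for the
    later segment rises by at least `margin` across the candidate position.

    feature_scores[i] is a hypothetical 0..1 measure of patterns characteristic
    of the later segment in clip i.
    """
    before = feature_scores[max(0, candidate - window):candidate]
    after = feature_scores[candidate:candidate + window]
    if not before or not after:
        return False  # not enough context on one side of the candidate
    return fmean(after) - fmean(before) >= margin


scores = [0.1, 0.2, 0.1, 0.8, 0.9, 0.85]
```

In this sketch a candidate at position 3 is confirmed, while one at position 1 is not, because the feature score has not yet changed consistently there.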

[0050] In aspects of the present disclosure, the accuracy of the automated transition detection results may be tested by comparing them against a separate test dataset with known transition points. Additionally or alternatively, the automated transition detection results may be compared with manual annotations by a qualified nurse or physician to assess the performance and/or reliability of the automated detection.
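The comparison against known transition points could be summarized with simple metrics. The sketch below assumes transition times in seconds, reports the mean absolute error over studies where a transition was detected, and counts a detection as correct if it falls within a tolerance; the metrics, tolerance, and data are illustrative assumptions.

```python
def evaluate(detected, reference, tolerance):
    """Compare detected transition times with reference annotations.

    detected, reference: per-study transition times (e.g., seconds); None in
    `detected` means the transition was missed. Returns the mean absolute error
    over detected studies and the fraction of all studies within `tolerance`.
    """
    if not reference or len(detected) != len(reference):
        raise ValueError("one detection per reference annotation is expected")
    errors = [abs(d - r) for d, r in zip(detected, reference) if d is not None]
    hits = sum(1 for e in errors if e <= tolerance)
    mae = sum(errors) / len(errors) if errors else float("nan")
    return {"mae": mae, "hit_rate": hits / len(reference)}


result = evaluate(detected=[100, 250, None], reference=[110, 200, 300], tolerance=30)
```

Counting missed transitions against the hit rate, rather than only excluding them from the error, keeps a detector from looking better by declining to answer.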

[0051] The embodiments of FIGS. 4-7 are exemplary, and the disclosed systems and methods can be applied to segments of a gastrointestinal tract (GIT) other than the stomach, the SB, and the colon. Additionally, in various embodiments, another machine learning system can be applied in place of the deep learning neural network shown in FIG. 4, such as a classic machine learning system, a neural network with fewer than five classes, or a neural network with more than five classes. As persons skilled in the art will understand, a “classic” machine learning system is one that involves feature engineering. Such applications are contemplated to be within the scope of the present disclosure.
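As an example of the classic alternative mentioned in paragraph [0051], the sketch below pairs hand-engineered features with a nearest-centroid classifier. Using the mean of each color channel as the feature and nearest-centroid as the classifier are assumptions for illustration only, and the training feature values are hypothetical.

```python
import math


def color_features(frame):
    """Hand-engineered feature vector: mean value of each RGB channel.

    frame: list of (r, g, b) pixel tuples with values in [0, 1].
    """
    n = len(frame)
    return tuple(sum(px[c] for px in frame) / n for c in range(3))


def fit_centroids(examples):
    """Average the feature vectors of each label. examples: [(features, label), ...]."""
    sums, counts = {}, {}
    for feats, label in examples:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for k, v in enumerate(feats):
            acc[k] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, feats):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], feats))


training = [
    ((0.80, 0.30, 0.20), "stomach"),
    ((0.70, 0.35, 0.25), "stomach"),
    ((0.60, 0.50, 0.20), "colon"),
    ((0.55, 0.55, 0.25), "colon"),
]
centroids = fit_centroids(training)
```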

[0052] Even though examples are shown and described with respect to images captured in vivo by a capsule endoscopy device, the disclosed technology can be applied to images captured by other devices or mechanisms, including anatomical images captured by MRI, for example.

[0053] Aspects of this disclosure may be further described by reference to the following numbered clauses:

[0054] 1. A system for analyzing images, comprising: at least one processor; and at least one memory storing instructions which, when executed by the at least one processor, cause the system to: access a plurality of video clips of at least a portion of a gastrointestinal tract (GIT) captured by a capsule endoscopy device; for each video clip of the plurality of video clips, provide, by a deep learning neural network, scores for classifying the video clip to each of a plurality of segments of the GIT; classify each video clip of a subset of the plurality of video clips, whose scores satisfy a confidence criterion, to one of the segments of the GIT; detect, in consecutive video clips in the subset, a change from a first classification corresponding to a first segment of the GIT to a second classification corresponding to a second segment of the GIT adjacent to the first segment of the GIT; and classify, among the video clips in the subset, a transition between two adjacent segments of the segments of the GIT based on the detected change from the first classification to the second classification.

[0055] 2. The system according to clause 1, wherein the transition between two adjacent segments of the segments of the GIT is one of a transition between the stomach and the small bowel or a transition between the small bowel and the colon.

[0056] 3. The system according to clause 1, wherein the plurality of segments of the GIT include the stomach, a transition between the stomach and the small bowel, the small bowel, a transition between the small bowel and the colon, and the colon.

[0057] 4. The system according to clause 1, wherein each video clip of the plurality of video clips is from about seven seconds in length to about ten seconds in length.

[0058] 5. The system according to clause 1, wherein the instructions, when executed by the at least one processor, further cause the system to provide the subset of the plurality of video clips, as images from the plurality of video clips whose scores, when normalized, are above an upper threshold or below a lower threshold but are not between the upper threshold and the lower threshold.

[0059] 6. The system according to clause 1, further comprising a receiving device configured to be secured to the person, to be communicatively coupled with the capsule endoscopy device, and to receive the plurality of video clips.

[0060] 7. The system according to clause 1, wherein the transition between the two adjacent segments is a transition between an earlier segment of the GIT and a later segment of the GIT.

[0061] 8. A system for analyzing images, comprising: a capsule endoscopy device configured to capture a plurality of video clips, over time, of at least a portion of a gastrointestinal tract (GIT) of a person; a receiving device configured to be secured to the person, to be communicatively coupled with the capsule endoscopy device, and to receive the plurality of video clips; and a computing system configured to be communicatively coupled with the receiving device and to receive the plurality of video clips, the computing system including: at least one processor; and at least one memory storing instructions which, when executed by the at least one processor, cause the computing system to: access the plurality of video clips of at least a portion of a gastrointestinal tract (GIT) captured by a capsule endoscopy device; for each video clip of the plurality of video clips, provide, by a machine learning system, a score for classifying each video clip to one segment of a plurality of segments of the GIT; classify each video clip of a subset of the plurality of video clips, whose scores satisfy a confidence criterion, to one of the segments of the GIT; detect, in consecutive video clips in the subset, a change from a first classification corresponding to a first segment of the GIT to a second classification corresponding to a second segment of the GIT adjacent to the first segment of the GIT; and classify, among the video clips in the subset, a transition between two adjacent segments of the segments of the GIT based on the detected change from the first classification to the second classification.

[0062] 9. The system according to clause 8, wherein the machine learning system is one of a deep learning neural network or a classic machine learning system.

[0063] 10. The system according to clause 8, wherein the transition between two adjacent segments of the segments of the GIT is one of a transition between the stomach and the small bowel or a transition between the small bowel and the colon.

[0064] 11. The system according to clause 8, wherein the plurality of segments of the GIT include the stomach, a transition between the stomach and the small bowel, the small bowel, a transition between the small bowel and the colon, and the colon.

[0065] 12. The system according to clause 8, wherein each video clip of the plurality of video clips is from about seven seconds in length to about ten seconds in length.

[0066] 13. The system according to clause 8, wherein the instructions, when executed by the at least one processor, further cause the system to provide the subset of the plurality of video clips, as images from the plurality of video clips whose scores, when normalized, are above an upper threshold or below a lower threshold but are not between the upper threshold and the lower threshold.

[0067] 14. The system according to clause 8, wherein the transition between the two adjacent segments is a transition between an earlier segment of the GIT and a later segment of the GIT.

[0068] 15. A non-transitory machine readable medium storing instructions which, when executed by a processor, cause the processor to perform a method comprising: accessing a plurality of video clips of at least a portion of a gastrointestinal tract (GIT) captured by a capsule endoscopy device; for each video clip of the plurality of video clips, providing, by a deep learning neural network, a score for classifying each video clip to one segment of a plurality of segments of the GIT; classifying each video clip of a subset of the plurality of video clips, whose scores satisfy a confidence criterion, to one of the segments of the GIT; detecting, in consecutive video clips in the subset, a change from a first classification corresponding to a first segment of the GIT to a second classification corresponding to a second segment of the GIT adjacent to the first segment of the GIT; and classifying, among the video clips in the subset, a transition between two adjacent segments of the segments of the GIT based on the detected change from the first classification to the second classification.

[0069] 16. The non-transitory machine readable medium according to clause 15, wherein the transition between two adjacent segments of the segments of the GIT is one of a transition between the stomach and the small bowel or a transition between the small bowel and the colon.

[0070] 17. The non-transitory machine readable medium according to clause 15, wherein the plurality of segments of the GIT include the stomach, a transition between the stomach and the small bowel, the small bowel, a transition between the small bowel and the colon, and the colon.

[0071] 18. The non-transitory machine readable medium according to clause 15, wherein each video clip of the plurality of video clips is from about seven seconds in length to about ten seconds in length.

[0072] 19. The non-transitory machine readable medium according to clause 15, wherein the instructions, when executed by the processor, further cause the processor to provide the subset of the plurality of video clips, as images from the plurality of video clips whose scores, when normalized, are above an upper threshold or below a lower threshold but are not between the upper threshold and the lower threshold.

[0073] 20. The non-transitory machine readable medium according to clause 15, wherein the transition between the two adjacent segments is a transition between an earlier segment of the GIT and a later segment of the GIT.

[0074] While several embodiments of the disclosure have been shown in the drawings, it is not intended that the disclosure be limited thereto, as it is intended that the disclosure be as broad in scope as the art will allow and that the specification be read likewise. Therefore, the above description should not be construed as limiting, but merely as exemplifications of particular embodiments. Those skilled in the art will envision other modifications within the scope and spirit of the claims appended hereto.

Claims

Attorney Docket: A0013182W001

WHAT IS CLAIMED IS:

1. A system (300) for analyzing images, comprising: at least one processor (305); and at least one memory (320) storing instructions which, when executed by the at least one processor (305), cause the system (300) to: access a plurality of video clips (422) of at least a portion of a gastrointestinal tract (GIT 100) captured by a capsule endoscopy device (212); for each video clip (422) of the plurality of video clips, provide, by a deep learning neural network (400), a score for classifying each video clip (422) to one of a plurality of segments of the GIT (412-420); classify each video clip (422) of a subset of the plurality of video clips, whose scores satisfy a confidence criterion, to one of the segments of the GIT (412-420); detect, in consecutive video clips (422) in the subset, a change from a first classification corresponding to a first segment of the GIT (100) to a second classification corresponding to a second segment of the GIT adjacent to the first segment of the GIT; and classify, among the video clips (422) in the subset, a transition between two adjacent segments (412-420) of the segments of the GIT based on the detected change from the first classification to the second classification.

2. The system (300) according to claim 1, wherein the transition between two adjacent segments of the segments of the GIT (412-420) is one of a transition between the stomach (412) and the small bowel (416) or a transition between the small bowel (416) and the colon (420).

3. The system (300) according to any one of claims 1 to 2, wherein the plurality of segments of the GIT (412-420) includes the stomach (412), a transition between the stomach and the small bowel (414), the small bowel (416), a transition between the small bowel and the colon (418), and the colon (420).

4. The system (300) according to any one of claims 1 to 3, wherein each video clip (422) of the plurality of video clips is from about seven seconds in length to about ten seconds in length.

5. The system (300) according to any one of claims 1 to 4, wherein the instructions, when executed by the at least one processor (305), further cause the system to provide the subset of the plurality of video clips (422), as images from the plurality of video clips (422) whose scores, when normalized, are above an upper threshold or below a lower threshold but are not between the upper threshold and the lower threshold.

6. The system (300) according to any one of claims 1 to 5, further comprising a receiving device (214) configured to be secured to the person, to be communicatively coupled with the capsule endoscopy device (212), and to receive the plurality of video clips (422).

7. The system (300) according to any one of claims 1 to 6, wherein the transition between the two adjacent segments is a transition between an earlier segment of the GIT (412-420) and a later segment of the GIT (412-420).

8. The system (300) according to claim 1, further comprising: a capsule endoscopy device (212) configured to capture, over time, a plurality of video clips (422) of at least a portion of a gastrointestinal tract (GIT 100) of a person; and a receiving device (214) configured to be secured to the person, to be communicatively coupled with the capsule endoscopy device (212), and to receive the plurality of video clips (422); wherein the system (300) is configured to be communicatively coupled with the receiving device (214) to receive the plurality of video clips (422), the system (300) including at least one processor (305) and at least one memory (320) storing instructions which, when executed by the at least one processor (305), cause the system (300) to: access the plurality of video clips (422) of at least a portion of a gastrointestinal tract (GIT 100) captured by a capsule endoscopy device (212); for each video clip (422) of the plurality of video clips, provide, by a machine learning system (400), a score for classifying each video clip (422) to one segment of a plurality of segments of the GIT (412-420); classify each video clip (422) of a subset of the plurality of video clips, whose scores satisfy a confidence criterion, to one of the segments of the GIT (412-420); detect, in consecutive video clips (422) in the subset, a change from a first classification corresponding to a first segment of the GIT to a second classification corresponding to a second segment of the GIT adjacent to the first segment of the GIT; and classify, among the video clips (422) in the subset, a transition between two adjacent segments of the segments of the GIT (412-420) based on the detected change from the first classification to the second classification.

9. The system (300) according to claim 8, wherein the machine learning system (400) is one of a deep learning neural network or a classic machine learning system.

10. The system (300) according to claim 8, wherein the transition between two adjacent segments of the segments of the GIT (412-420) is one of a transition between the stomach (412) and the small bowel (416) or a transition between the small bowel (416) and the colon (420).

11. The system (300) according to claim 8, wherein the plurality of segments of the GIT (412-420) includes the stomach (412), a transition between the stomach (412) and the small bowel (416), the small bowel (416), a transition between the small bowel (416) and the colon (420), and the colon (420).

12. The system (300) according to claim 8, wherein each video clip (422) of the plurality of video clips is from about seven seconds in length to about ten seconds in length.

13. The system (300) according to claim 8, wherein the instructions, when executed by the at least one processor (305), further cause the system (300) to provide the subset of the plurality of video clips (422), as images from the plurality of video clips (422) whose scores, when normalized, are above an upper threshold or below a lower threshold but are not between the upper threshold and the lower threshold.

14. The system (300) according to claim 8, wherein the transition between the two adjacent segments is a transition between an earlier segment of the GIT (412-420) and a later segment of the GIT (412-420).

15. The system (300) according to any one of claims 1 to 14, wherein the memory (320) stores instructions which, when executed by the processor (305), cause the processor (305) to perform a method comprising: accessing a plurality of video clips (422) of at least a portion of a gastrointestinal tract (GIT 100) captured by a capsule endoscopy device (212); for each video clip (422) of the plurality of video clips, providing, by a deep learning neural network (400), a score for classifying each video clip (422) to one segment of a plurality of segments of the GIT (412-420); classifying each video clip (422) of a subset of the plurality of video clips, whose scores satisfy a confidence criterion, to one of the plurality of segments of the GIT (412-420); detecting, in consecutive video clips (422) in the subset, a change from a first classification corresponding to a first segment of the GIT (100) to a second classification corresponding to a second segment of the GIT adjacent to the first segment of the GIT; and classifying, among the video clips (422) in the subset, a transition between two adjacent segments of the plurality of segments of the GIT (412-420) based on the detected change from the first classification to the second classification.