Liver cancer image analysis system, method of segmenting liver cancer image, and medium
By combining DenseUNet and BDLSTM into the DA-BDLSTM-DenseUNet model, the invention addresses the insufficient use of feature relationships in liver cancer detection, enabling more efficient liver cancer segmentation and earlier diagnosis, improving segmentation accuracy, and helping improve liver cancer survival rates.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- THE UNIVERSITY OF HONG KONG
- Filing Date
- 2021-03-23
- Publication Date
- 2026-06-16
AI Technical Summary
Existing technologies fail to effectively use the relationships between features in the encoding and decoding paths for liver cancer detection, and they rely on separate object detection models to locate tumors, resulting in inaccurate segmentation and diagnostic delays.
By combining DenseUNet, a bidirectional long short-term memory (BDLSTM) network, and attention mechanisms, a DA-BDLSTM-DenseUNet model is formed. The model enhances the network's expressive power by exploring the relationships between features in the encoding and decoding paths, and its attention gates and second attention mechanism suppress irrelevant background regions and highlight salient lesion regions.
It improves the accuracy and efficiency of liver cancer segmentation, reduces reliance on clinicians' experience, enables earlier targeted therapy, and improves liver cancer survival rates.
Abstract figure: CN115699083B_ABST
Abstract
Description
[0001] Cross-reference to related applications
[0002] This international patent application claims the benefit of U.S. Provisional Patent Application No. 63/004,563, filed April 3, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
[0003] Disclosed are a system and method for detecting and characterizing liver cancer using artificial intelligence.
Background
[0004] Hepatocellular carcinoma (HCC) was the fourth leading cause of cancer death worldwide in 2018. In Hong Kong, China, HCC accounted for 10.3% of cancer deaths. Early diagnosis and detection of HCC can promote early treatment and improve survival rates. In accordance with international guidelines, suspected HCC patients are generally diagnosed by radiological examination using computed tomography (CT) or magnetic resonance (MR) scans, without liver biopsy. Traditionally, clinicians analyze CT scan images by visual inspection, so the accuracy of diagnosis depends heavily on their experience. Accurate diagnosis of liver lesions can therefore be challenging, and if repeated scans are required, there may be a significant delay before diagnosis, which in turn delays effective treatment. Artificial intelligence has become an important technological advancement in medical diagnosis, and researchers have recently adopted deep learning methods that use multiphase CT images to diagnose HCC. Sun et al. [6] designed a multi-channel fully convolutional network for segmenting tumors from multiphase contrast-enhanced CT images, in which a separate network is trained for each CT phase and the high-level features of the multiphase images are fused for tumor segmentation. Todoroki et al. [7] studied liver segmentation of contrast-enhanced multiphase CT images of liver tumors [12,13] and used a deep convolutional neural network for liver tumor classification. Lee et al. [9] proposed an optimized version of the Single Shot MultiBox Detector (SSD) [14], which groups the convolutions of multiphase features and exploits information from multiphase CT images. Liang et al. [11] classified multiphase CT images of focal liver lesions by combining convolutional and recurrent networks. Ouhmichi et al. [8] proposed a cascaded convolutional neural network based on U-Net and designed two strategies to fuse multiphase information: 1) concatenating multidimensional feature maps at the input layer, and 2) independently computing an output map for each phase before merging them to produce the final segmentation.
Summary of the Invention
[0005] To provide a basic understanding of certain aspects of the invention, a brief overview of the invention is given below. This overview is not a comprehensive summary of the invention. It is neither intended to identify key or essential elements of the invention nor to describe its scope. Rather, the sole purpose of this overview is to present some concepts of the invention in a simplified form as a prelude to the more detailed description that follows.
[0006] While the aforementioned works have achieved satisfactory performance, they do not consider the relationship between the encoded features in the encoding path and the upsampled features in the decoding path. Furthermore, these methods tend to rely on additional object detection models to locate tumors for reliable segmentation. To address these issues, an enhanced segmentation model, termed DA-BDLSTM-DenseUNet, is proposed herein; it combines DenseUNet and bidirectional long short-term memory (BDLSTM) and incorporates an attention mechanism into each of these modules. The advantages of this model are as follows:
[0007] DenseUNet can learn a wide variety of informative features and enriches the network's expressive power by modulating the information flow through the network;
[0008] Bidirectional LSTM explores the relationship between encoded features and upsampled features, and provides a novel feature-fusion strategy as an alternative to simple feature concatenation;
[0009] In DenseUNet, attention gates suppress feature-map responses in irrelevant background regions and progressively highlight responses in salient lesion regions; and
[0010] The attention mechanism in BDLSTM accounts for the different contributions of the encoded and decoded features and identifies the importance of each feature map at the channel level.
[0011] This disclosure presents a novel segmentation model, DA-BDLSTM-DenseUNet, which integrates DenseUNet and bidirectional LSTM with attention mechanisms. DenseUNet enables the learning of sufficiently diverse features and enhances the network's representational power by modulating the information flow. The bidirectional LSTM explores the relationship between encoded and upsampled features in the encoding and decoding paths. In addition, an attention gate (AG) is introduced into DenseUNet to progressively suppress responses from irrelevant background regions and amplify responses from salient regions. The attention in the bidirectional LSTM considers the different contributions of encoded and upsampled features to segmentation and assigns appropriate weights to these two types of features accordingly. Experiments are conducted on a liver CT image dataset collected from multiple hospitals, comparing the proposed DA-BDLSTM-DenseUNet method with state-of-the-art segmentation models. The experimental results demonstrate the effectiveness of the proposed method, which achieves comparable performance in terms of the Dice coefficient.
[0012] To achieve the foregoing and related objectives, the invention comprises the features fully described below and particularly pointed out in the claims. The following description and drawings set forth certain illustrative aspects and embodiments of the invention in detail. These are, however, merely indicative of a few of the various ways in which the principles of the invention can be employed. Other objects, advantages, and novel features of the invention will become apparent from the following detailed description when considered in conjunction with the accompanying drawings.
Brief Description of the Drawings
[0013] Figure 1 shows the overall architecture of the proposed DA-BDLSTM-DenseUNet according to one embodiment.
[0014] Figure 2 is a histogram showing the number of slices in each of the four phases of the PYN dataset.
[0015] Figure 3 shows the overall architecture of the DA-BDLSTM-DenseUNet described herein, according to one embodiment.
[0016] Figure 4 is a block diagram of an example electronic computing environment that can be implemented in connection with one or more aspects described herein.
[0017] Figure 5 is a block diagram depicting an example data communication network that can operate in conjunction with the various aspects described herein.
Detailed Description
[0018] Disclosed herein are methods and systems for automated lesion detection and recognition in cross-sectional medical images of the liver, which provide more efficient diagnosis and reduce the variability associated with clinician experience and judgment. These methods and systems also enable earlier targeted treatment, thereby improving liver cancer survival rates. They fuse Dense-UNet and BD-LSTM into a unified framework and introduce two attention mechanisms. The framework therefore combines the strengths of UNet, densely connected convolutional networks, and BD-LSTM in capturing salient feature responses corresponding to candidate liver lesion regions. In addition, the attention mechanisms emphasize pixel-wise and feature-map-level responses, i.e., the responses relevant to tumor detection and recognition. As a result, liver lesion segmentation performance is improved.
[0019] Limitations in this field include 1) the limited availability of liver cross-sectional images with corresponding lesion segmentation annotations, 2) domain differences between liver cross-sectional images from different hospitals, and 3) hardware resource requirements. These methods and systems address these limitations by 1) collecting and annotating liver cross-sectional images with lesions; 2) employing domain adaptation to learn transferable knowledge; and 3) using neural architecture search to explore novel network architectures with lower hardware requirements.
[0020] Because images with manual segmentation annotations underpin the effectiveness of this method, more images were collected from hospitals and annotated. Publicly available annotated images can also supplement these data during model training. Second, domain adaptation is used to transfer knowledge learned from one image dataset (the source domain) to another image dataset (the target domain), taking into account inter-domain gaps or differences in lesion distribution. Third, inspired by the success of neural architecture search, the feasibility of reducing the hardware requirements of the proposed method can be explored. In this way, the required image volume, annotation cost, and diagnostic time can be reduced. Finally, the heatmaps learned by a classification network, and the relationship between these heatmaps and manual segmentation annotations, are explored. Based on the learned relationship, the dependence on manual segmentation annotations can be bypassed by using only lesion label information.
[0021] Referring to Figure 1, the proposed DA-BDLSTM-DenseUNet model for liver lesion segmentation in CT images is illustrated in detail. The overall framework of the proposed DA-BDLSTM-DenseUNet consists of four parts: DenseUNet, with a densely connected convolutional network as its backbone; BDLSTM; and the two attention mechanisms that make up the dual attention module.
[0022] DenseUNet
[0023] Inspired by the success of UNet and DenseNet, DenseUNet was proposed by replacing the backbone of a fully convolutional network with DenseNet. DenseUNet therefore has the following advantages: 1) it encourages the network to learn sufficiently diverse, rather than redundant, features; 2) it enhances the network's learning ability by allowing efficient information flow and feature reuse; and 3) it reduces the risk of exploding or vanishing gradients, because gradients are passed quickly back to earlier layers along the backpropagation path.
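To make the dense connectivity concrete, the toy sketch below represents a feature map as a flat list of per-channel values and each layer as a hypothetical function that emits `growth_rate` new channels computed from the concatenation of all earlier outputs. The layer function (`toy_layer`), `dense_block`, and the growth rate are illustrative assumptions, not the patented network; a real DenseUNet layer would apply convolution, normalization, and activation instead.

```python
from typing import Callable, List

Features = List[float]  # one value per channel, standing in for a full feature map


def toy_layer(growth_rate: int) -> Callable[[Features], Features]:
    """Hypothetical layer: emits `growth_rate` channels from all of its inputs.

    A real dense layer would apply BN-ReLU-Conv; here each new channel is the
    mean of the inputs plus an offset, so the output is easy to check by hand.
    """
    def layer(x: Features) -> Features:
        mean = sum(x) / len(x)
        return [mean + k for k in range(growth_rate)]
    return layer


def dense_block(x: Features, num_layers: int, growth_rate: int) -> Features:
    """Dense connectivity: each layer sees the concatenation of x and all earlier outputs."""
    features = list(x)
    for _ in range(num_layers):
        new = toy_layer(growth_rate)(features)  # input = every preceding feature map
        features = features + new               # concatenate rather than add
    return features


out = dense_block([1.0, 2.0, 3.0], num_layers=4, growth_rate=2)
print(len(out))  # 3 input channels + 4 layers * 2 new channels = 11
```

Because outputs are concatenated rather than summed, the channel count grows linearly as in_channels + num_layers × growth_rate, and every layer has direct access to all earlier features, which is the feature-reuse property described above.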
[0024] BD-LSTM
[0025] In natural language processing, bidirectional long short-term memory (BD-LSTM) is widely used to learn temporal relationships in data. In this method, BD-LSTM is used to explore, in both directions, the correlation between the encoded features in the encoder and the corresponding upsampled features in the decoder. An LSTM unit consists of three gates and a cell. The three gates, namely the input gate, the forget gate, and the output gate, control the amount of information entering and leaving the cell, and the cell remembers values over the temporal dimension. Each direction of BD-LSTM can be described as follows:
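[0026] $i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i)$ (1)
[0027] $f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f)$ (2)
[0028] $o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o)$ (3)
[0029] $c_t = f_t \circ c_{t-1} + i_t \circ \sigma_c(W_c x_t + U_c h_{t-1} + b_c)$ (4)
[0030] $h_t = o_t \circ \sigma_h(c_t)$ (5)
[0031] where $\circ$ denotes the element-wise product, $t$ denotes the $t$-th time step, $W_*$, $U_*$, and $b_*$ are the input weights, recurrent weights, and biases of each gate, $\sigma_g$ and $\sigma_c$ are the sigmoid and hyperbolic tangent functions, respectively, and $\sigma_h(x) = x$. By considering both the forward and backward hidden states, the output of BD-LSTM can be defined as: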
[0032]
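The LSTM updates above can be sketched in plain Python as follows. This is a minimal, dependency-free sketch under stated assumptions, not the patented implementation: it uses the standard cell update $c_t = f_t \circ c_{t-1} + i_t \circ \sigma_c(W_c x_t + U_c h_{t-1} + b_c)$ and $h_t = o_t \circ \sigma_h(c_t)$ with $\sigma_h(x) = x$, treats an encoded feature vector and its upsampled counterpart as a two-step sequence (an assumption about how the two paths are fed to BD-LSTM), and, because the exact BD-LSTM output formula is not reproduced here, simply concatenates the forward and backward hidden states. The helpers `make_params`, `lstm_step`, `run`, and `bidirectional` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple

Vec = List[float]
Mat = List[List[float]]


def matvec(m: Mat, v: Vec) -> Vec:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vadd(*vs: Vec) -> Vec:
    return [sum(xs) for xs in zip(*vs)]


def vmul(a: Vec, b: Vec) -> Vec:  # element-wise (Hadamard) product
    return [x * y for x, y in zip(a, b)]


def sigmoid(v: Vec) -> Vec:  # sigma_g
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def tanh(v: Vec) -> Vec:  # sigma_c
    return [math.tanh(x) for x in v]


def make_params(in_dim: int, hidden: int, value: float = 0.0) -> Dict[str, object]:
    """Hypothetical helper: constant-filled W_*, U_*, b_* for gates i, f, o and candidate c."""
    p: Dict[str, object] = {}
    for g in "ifoc":
        p["W_" + g] = [[value] * in_dim for _ in range(hidden)]
        p["U_" + g] = [[value] * hidden for _ in range(hidden)]
        p["b_" + g] = [value] * hidden
    return p


def lstm_step(x: Vec, h_prev: Vec, c_prev: Vec, p: Dict) -> Tuple[Vec, Vec]:
    """One LSTM time step: gates, cell update, and hidden state."""
    def pre(g: str) -> Vec:  # W_g x_t + U_g h_{t-1} + b_g
        return vadd(matvec(p["W_" + g], x), matvec(p["U_" + g], h_prev), p["b_" + g])

    i, f, o = sigmoid(pre("i")), sigmoid(pre("f")), sigmoid(pre("o"))  # input, forget, output gates
    c = vadd(vmul(f, c_prev), vmul(i, tanh(pre("c"))))                  # cell update
    h = vmul(o, c)                                                      # hidden state, sigma_h(x) = x
    return h, c


def run(seq: List[Vec], p: Dict, hidden: int) -> List[Vec]:
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for x in seq:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


def bidirectional(seq: List[Vec], p_fwd: Dict, p_bwd: Dict, hidden: int) -> List[Vec]:
    """Forward and backward passes; outputs concatenated as [h_fwd; h_bwd] (assumption)."""
    fwd = run(seq, p_fwd, hidden)
    bwd = run(seq[::-1], p_bwd, hidden)[::-1]  # re-align backward outputs with time steps
    return [hf + hb for hf, hb in zip(fwd, bwd)]


encoded, upsampled = [0.2, -0.1, 0.4], [0.3, 0.0, -0.2]  # toy feature vectors
ys = bidirectional([encoded, upsampled], make_params(3, 2, 0.1), make_params(3, 2, -0.1), hidden=2)
print(len(ys), len(ys[0]))  # 2 time steps, 2 * hidden = 4 values each
```

Separate parameter sets are used for the forward and backward directions, as in a standard bidirectional LSTM; a practical model would operate on feature maps (for example with convolutional LSTM cells) rather than on short vectors.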
[0033] Dual attention mechanism
[0034] The method employs two attention mechanisms, operating within the DenseUNet and BD-LSTM modules, respectively. In standard CNNs such as the DenseUNet encoder, the receptive field grows as the convolutional layers get deeper, so that contextual information is captured. In other words, features learned by the convolutional kernels model relationships between objects at a coarser grid level. However, avoiding false positive predictions remains challenging for small objects with large shape variability. Existing segmentation methods address this issue by introducing auxiliary object localization subnetworks, which results in a large number of network parameters. In practice, the object localization subnetwork can be replaced by an attention gate (AG) that prunes responses from irrelevant context regions and identifies relevant salient regions layer by layer. For each pixel vector $x_i^l \in \mathbb{R}^{F_l}$, where $F_l$ is the number of convolutional kernels (feature maps) in layer $l$, the output of the AG is the element-wise product of the feature response and the corresponding attention activation, defined as:
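[0035] $\hat{x}_i^l = x_i^l \cdot \alpha_i^l$
[0036] where
[0037] $q_{att}^l = \psi^T\left(\sigma_1\left(W_x^T x_i^l + W_g^T g_i + b_g\right)\right) + b_\psi, \qquad \alpha_i^l = \sigma_{att}\left(q_{att}^l(x_i^l, g_i; \Theta_{att})\right)$
[0038] Here, $\sigma_1$ and $\sigma_{att}$ are the ReLU and sigmoid functions, respectively, $g_i$ is the gating signal, and the AG is parameterized by $\Theta_{att}$, which comprises $W_x \in \mathbb{R}^{F_l \times F_a}$, $W_g \in \mathbb{R}^{F_g \times F_a}$, $\psi \in \mathbb{R}^{F_a \times 1}$, $b_\psi \in \mathbb{R}$, and $b_g \in \mathbb{R}^{F_a}$.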
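The attention gate can be sketched for a single pixel as follows. This is a minimal sketch assuming the additive form used by attention-gated U-Nets, consistent with the parameter shapes listed for $W_x$, $W_g$, $\psi$, $b_g$, and $b_\psi$: the pixel vector x (length F_l) and a gating vector g (length F_g, assumed to come from a coarser scale) are projected, passed through ReLU ($\sigma_1$), reduced by $\psi$, and squashed by a sigmoid ($\sigma_{att}$) into a coefficient that rescales x. The function name `attention_gate` and all numeric values are illustrative.

```python
import math
from typing import List

Vec = List[float]
Mat = List[List[float]]  # row-major: rows x cols


def transpose_matvec(m: Mat, v: Vec) -> Vec:
    """Compute m^T v for m of shape (len(v), cols)."""
    cols = len(m[0])
    return [sum(m[r][c] * v[r] for r in range(len(v))) for c in range(cols)]


def attention_gate(x: Vec, g: Vec, W_x: Mat, W_g: Mat, psi: Vec, b_g: Vec, b_psi: float) -> Vec:
    """Additive attention gate for one pixel vector x (length F_l).

    Shapes: W_x is F_l x F_a, W_g is F_g x F_a, psi has length F_a,
    b_g has length F_a, and b_psi is a scalar.
    """
    z = [a + b + c for a, b, c in zip(transpose_matvec(W_x, x), transpose_matvec(W_g, g), b_g)]
    hidden = [max(0.0, v) for v in z]                         # sigma_1 = ReLU
    q_att = sum(p * h for p, h in zip(psi, hidden)) + b_psi   # psi^T(...) + b_psi
    alpha = 1.0 / (1.0 + math.exp(-q_att))                    # sigma_att = sigmoid
    return [alpha * v for v in x]                             # rescale the feature response


x = [2.0, -4.0]                 # F_l = 2
g = [1.0]                       # F_g = 1
W_x = [[1.0, 0.0], [0.0, 1.0]]  # F_l x F_a with F_a = 2
W_g = [[0.0, 0.0]]
psi, b_g = [0.0, 0.0], [0.0, 0.0]
print(attention_gate(x, g, W_x, W_g, psi, b_g, b_psi=0.0))  # alpha = 0.5 -> [1.0, -2.0]
```

A strongly negative `b_psi` (or a negative score) drives the coefficient toward 0 and suppresses the pixel, while a positive score keeps it, which is how the gate can prune irrelevant background responses without a separate localization subnetwork.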
[0039] A second attention mechanism, originally introduced for neural machine translation, is incorporated into BD-LSTM. In translation, this attention assigns different weights to different parts of the encoded input. Similarly, it is used here to capture the different contributions of encoded features and upsampled features to segmentation. This attention...
[0040] It can be expressed by the following formula:
[0041]
[0042] where $v_a$ and $W_a$ denote transformation matrices, and $\beta_{i,t}$ reflects the importance of the hidden state of input $x_i$ to the corresponding output.
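The exact formula in [0041] is not reproduced here, so the following sketch assumes a simplified additive scoring form, $e_i = v_a^T \tanh(W_a h_i)$, in the spirit of Bahdanau-style attention but without a decoder query, with the weights $\beta$ obtained by a softmax over the hidden states. It illustrates how weights could be assigned to the hidden states of an encoded feature and an upsampled feature before they are concatenated; it is not the patented formula, and `additive_attention` and the toy values are hypothetical.

```python
import math
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def matvec(m: Mat, v: Vec) -> Vec:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(scores: Vec) -> Vec:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(states: List[Vec], W_a: Mat, v_a: Vec) -> Tuple[Vec, List[Vec]]:
    """Score each hidden state, normalize the scores to weights beta, and rescale the states.

    Assumed score: e_i = v_a^T tanh(W_a h_i). The reweighted states are returned
    separately so that they can be concatenated afterwards.
    """
    scores = [sum(w * math.tanh(z) for w, z in zip(v_a, matvec(W_a, h))) for h in states]
    beta = softmax(scores)
    weighted = [[b * x for x in h] for b, h in zip(beta, states)]
    return beta, weighted


h_encoded, h_upsampled = [1.0, 0.0], [0.0, 1.0]
W_a = [[1.0, 0.0], [0.0, 1.0]]
beta, weighted = additive_attention([h_encoded, h_upsampled], W_a, v_a=[2.0, 0.0])
fused = weighted[0] + weighted[1]  # list concatenation of the reweighted features
print([round(b, 3) for b in beta])
```

With these toy values the encoded state scores higher and therefore receives the larger weight; with `v_a` set to zeros both states would receive equal weights of 0.5.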
[0043] Example
[0044] Unless otherwise stated in the embodiments, description and claims, all parts and percentages are by weight, all temperatures are in degrees Celsius, and pressures are at or near atmospheric pressure.
[0045] In this section, the effectiveness of the proposed method is evaluated through experiments on two liver CT image datasets: the public LiTS dataset and a dataset collected from a hospital in Hong Kong, China. Segmentation quality is measured using the Dice coefficient (DC), defined as follows:
[0046] $DC = \frac{2\,|Y \cap Y'|}{|Y| + |Y'|}$
[0047] Where Y and Y' represent the ground truth and predicted labels of all pixels in a given CT image, respectively.
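The Dice coefficient can be computed on flattened binary masks as in the sketch below, using the standard definition DC = 2|Y ∩ Y'| / (|Y| + |Y'|), where each mask marks lesion pixels with 1. Returning 1.0 when both masks are empty is an assumed convention that the source does not specify, and `dice_coefficient` is a hypothetical name.

```python
from typing import Sequence


def dice_coefficient(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """DC = 2|Y ∩ Y'| / (|Y| + |Y'|) for flattened binary masks (1 = lesion pixel)."""
    if len(y_true) != len(y_pred):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    total = sum(1 for t in y_true if t) + sum(1 for p in y_pred if p)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))  # 2*1 / (2+2) = 0.5
```

For a 512 × 512 slice, the masks would simply be flattened to 262,144 pixel labels before calling the function.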
[0048] Data Description
[0049] The first dataset, LiTS, includes 100 and 31 contrast-enhanced abdominal CT scans, used as the training and test sets, respectively. The slice spacing in LiTS varies between 0.46 mm and 6.0 mm. The second dataset (PYN) was collected from a hospital in Hong Kong, China, and includes 1/1.25/1.5 mm thin-slice abdominal CT images from 571 patients. Of these, 72 patients were diagnosed with HCC, and 499 patients had other non-HCC lesions or no lesions. Each patient's CT images are divided into four phases: non-contrast, late arterial, portal venous, and delayed. Slices in both datasets are 512 × 512 pixels, while the number of slices per patient in each phase varies. Figure 2 and Table 1 summarize the number of slices in each of the four phases of the PYN dataset.
[0050] Table 1
[0051]
[0052] Quantitative assessment
[0053] Table 2 compares the DC achieved by DA-BDLSTM-DenseUNet and by existing segmentation models on the LiTS and PYN datasets. The competing methods include UNet [4], ResUNet [5], and DenUNet [6]. In addition, BDLSTM was integrated into these models to obtain BDLSTM-UNet, BDLSTM-ResUNet, and BDLSTM-DenseUNet, respectively. The best previously reported liver lesion segmentation result [7] reached 0.8570 on the LiTS dataset. Table 2 shows that ResUNet and DenUNet achieve better segmentation than UNet. This is because ResUNet uses skip connections to regulate the flow of identity information, which allows deeper networks with greater representational power. Similarly, DenUNet improves the network's capability by densely connecting each layer to all subsequent layers. Meanwhile, combining BDLSTM with UNet yields approximately 1.2% better segmentation performance than UNet alone. Similar gains are observed when comparing ResUNet with BDLSTM-ResUNet and DenUNet with BDLSTM-DenseUNet. This indicates that BDLSTM can improve segmentation performance by exploring the relationship between encoded and upsampled features. DA-BDLSTM-DenseUNet achieves approximately 0.85% better performance than BDLSTM-DenseUNet owing to the two attention mechanisms introduced in DenseUNet and BDLSTM. The attention gates in DenseUNet focus on pixel-wise responses in salient regions of lesion candidates, and the attention in BDLSTM accounts for the different contributions of encoded and upsampled features, assigning appropriate weights before feature concatenation. Consequently, DA-BDLSTM-DenseUNet performs best among all compared methods.
[0054] Table 2
[0055]
[0056] Furthermore, ablation studies were conducted to examine the contribution of the two attention mechanisms to segmentation performance. Specifically, the attention gates in DenseUNet and the attention mechanism in BDLSTM were removed in turn, and the resulting variants are denoted BDLSTM-DenseUNet-V1 and BDLSTM-DenseUNet-V2, respectively.
[0057] The comparison results are shown in Table 3, which reports, according to one embodiment, the DC results of these ablation studies on the LiTS and PYN datasets, obtained by removing attention modules from the method described herein. Compared with BDLSTM-DenseUNet, improvements of 0.47% and 0.63% are observed from attention in DenseUNet and in BDLSTM, respectively, demonstrating the effectiveness of the attention mechanisms. The contribution of attention in DenseUNet to the performance improvement is greater than that of attention in BDLSTM, because the latter operates on features produced by the former, which emphasizes salient responses and suppresses irrelevant ones. Moreover, combining the two attention mechanisms achieves better segmentation.
[0058] Table 3
[0059]
Qualitative assessment
[0061] This section presents segmentation results obtained by the method, together with the original CT image slices and the corresponding ground truth. As shown in Figure 3, the method described herein can identify lesion locations and achieve satisfactory segmentation in simple CT image slices. For the difficult slice in Figure 3(d), which contains three or more lesions of different sizes, the method locates most lesions, but the segmented area is smaller than that of the true lesions.
Summary
[0063] A novel network scheme for automatic liver lesion segmentation, named DA-BDLSTM-DenseUNet, is proposed. First, a densely connected convolutional network is used as the backbone of UNet. Second, bidirectional long short-term memory (BD-LSTM) is introduced to explore the correlation between encoded features in the encoding path and upsampled features in the decoding path. Third, attention mechanisms are incorporated into both the convolutional network and BD-LSTM. Experiments were conducted on liver CT images collected from several hospitals, and the results validate the effectiveness of the proposed method in terms of the Dice coefficient.
[0064] Furthermore, the feasibility of using conditional generative adversarial networks (CGANs) to synthesize sufficiently diverse liver images can be explored. First, CGANs can generate images with specific types of lesions, alleviating data scarcity and class imbalance and thereby improving lesion classification performance. Second, a semi-supervised segmentation scheme based on domain adaptation can be constructed using all CT images and corresponding segmentations from one hospital, some CT images and corresponding segmentations from another hospital, and unlabeled CT images. Here, the GAN learns the distribution of lesions and generates annotated images of given lesions. Furthermore, using the ideas of domain adaptation and multimodal learning, classification and segmentation models trained on CT images can be ported to MR images, since CT and MR images can readily be treated as source and target domains, or as two related but distinct modality datasets. Additionally, lesion segmentation on multiple organs, such as joint analysis of the liver and lungs, can be explored.
[0065] Example computing environment
[0066] As mentioned above, the techniques described herein can advantageously be applied to any device and/or network used to perform data analysis. Accordingly, the general-purpose remote computer described below with reference to Figure 4 is merely one example, and the disclosed subject matter can be implemented with any client having network/bus interoperability and interaction. The disclosed subject matter can therefore be implemented in a networked hosted-service environment involving very few or minimal client resources, e.g., a networked environment in which the client device serves merely as an interface to the network/bus, such as an object placed in a facility.
[0067] Although not required, some aspects of the disclosed subject matter can be implemented partly via an operating system, for use by a developer of services for a device or object, and/or included within application software that operates in conjunction with components of the disclosed subject matter. Software can be described in the general context of computer-executable instructions, such as program modules or components, executed by one or more computers (e.g., projection display devices, viewing devices, or other devices). Those skilled in the art will appreciate that the disclosed subject matter can be practiced with other computer system configurations and protocols.
[0068] Figure 4 thus illustrates an example of a suitable computing system environment 1100 in which some aspects of the disclosed subject matter can be implemented. As stated above, however, computing system environment 1100 is merely one example of a suitable computing environment for a device and is not intended to suggest any limitation on the scope of use or functionality of the disclosed subject matter. Nor should computing environment 1100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 1100.
[0069] With reference to Figure 4, an exemplary device for implementing the disclosed subject matter includes a general-purpose computing device in the form of a computer 1110. Components of computer 1110 may include, but are not limited to, a processing unit 1120, a system memory 1130, and a system bus 1121 that couples various system components, including the system memory, to processing unit 1120. System bus 1121 may be any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.
[0070] Computer 1110 typically includes a variety of computer-readable media. Computer-readable media can be any available medium accessible to computer 1110. By way of example and not limitation, computer-readable media can include computer storage media and communication media. Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other storage technologies, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic tape cassettes, magnetic tape, disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and is accessible to computer 1110. Communication media typically contains computer-readable instructions, data structures, program modules, or other data in modulated data signals such as carrier waves or other transmission mechanisms, and includes any information transmission medium.
[0071] System memory 1130 may include computer storage media in the form of volatile and/or non-volatile memory, such as read-only memory (ROM) and/or random access memory (RAM). A basic input/output system (BIOS), containing the basic routines that help transfer information between elements within computer 1110, such as during startup, may be stored in memory 1130. Memory 1130 typically also contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 1120. By way of example and not limitation, memory 1130 may also include an operating system, application programs, other program modules, and program data.
[0072] Computer 1110 may also include other removable/non-removable, volatile/non-volatile computer storage media. For example, computer 1110 may include a hard disk drive that reads from or writes to non-removable, non-volatile magnetic media, a magnetic disk drive that reads from or writes to a removable, non-volatile magnetic disk, and/or an optical disk drive that reads from or writes to a removable, non-volatile optical disk, such as a CD-ROM or other optical media. Other removable/non-removable, volatile/non-volatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid-state RAM, solid-state ROM, and the like. The hard disk drive is typically connected to system bus 1121 through a non-removable memory interface, while the magnetic disk drive or optical disk drive is typically connected to system bus 1121 through a removable memory interface.
[0073] Users can enter commands and information into computer 1110 through input devices such as a keyboard and a pointing device (commonly a mouse, trackball, or touchpad). Other input devices may include a microphone, joystick, game controller, satellite dish, scanner, wireless keyboard, voice commands, and the like. These and other input devices are typically connected to processing unit 1120 through user input 1140 and associated interfaces coupled to system bus 1121, but may be connected through other interface and bus structures, such as a parallel port, game port, or universal serial bus (USB). A graphics subsystem may also be connected to system bus 1121. A projection unit in a projection display device, or a HUD in a viewing device or other type of display device, may also be connected to system bus 1121 through an interface such as output interface 1150, which may in turn communicate with video memory. In addition to a monitor, the computer may also include other peripheral output devices, such as speakers, which can be connected through output interface 1150.
[0074] Computer 1110 can operate in a networked or distributed environment using logical connections to one or more other remote computers, such as remote computer 1170, which may in turn have media capabilities different from those of device 1110. Remote computer 1170 may be a personal computer, server, router, network PC, peer device, personal digital assistant (PDA), mobile phone, handheld computing device, projection display device, viewing device, or other common network node, or any other remote media consumption or transmission device, and may include any or all of the elements described above with respect to computer 1110. The logical connections depicted in Figure 4 include a network 1171, such as a local area network (LAN) or a wide area network (WAN), but may also include other wired or wireless networks/buses. Such networking environments are commonplace in homes, offices, enterprise-wide computer networks, intranets, and the Internet.
[0075] When used in a LAN network environment, computer 1110 can be connected to LAN 1171 via a network interface or adapter. When used in a WAN network environment, computer 1110 may typically include communication components, such as a modem, or other devices for establishing communication over a WAN, such as the Internet. Communication components, such as wireless communication components or modems, which may be built-in or external, can be connected to system bus 1121 via user input interface 1140 or other suitable mechanisms. In a networked environment, program modules or portions thereof described relative to computer 1110 may be stored in a remote memory storage device. It should be understood that the network connections shown and described are exemplary, and other means of establishing communication links between computers may be used.
[0076] Example network environment
[0077] Figure 5 provides a schematic diagram of an exemplary networked or distributed computing environment 1200. The distributed computing environment includes computing objects 1210, 1212, etc., and computing objects or devices 1220, 1222, 1224, 1226, 1228, etc., which may include programs, methods, data stores, programmable logic, etc., as represented by applications 1230, 1232, 1234, 1236, 1238 and data store 1240. It can be appreciated that computing objects 1210, 1212, etc., and computing objects or devices 1220, 1222, 1224, 1226, 1228, etc., may comprise different devices, including multimedia display devices or similar devices described herein, or other devices such as mobile phones, personal digital assistants (PDAs), audio/video devices, MP3 players, personal computers, laptops, etc. It should also be understood that data store 1240 may include one or more cache memories, one or more registers, or other similar data stores disclosed herein.
[0078] Each computing object 1210, 1212, etc., and computing object or device 1220, 1222, 1224, 1226, 1228, etc., can communicate directly or indirectly with one or more other computing objects 1210, 1212, etc., and computing objects or devices 1220, 1222, 1224, 1226, 1228, etc., through communication network 1242. Although shown as a single element in Figure 5, communication network 1242 may include other computing objects and computing devices that provide services to the system of Figure 5, and/or may represent multiple interconnected networks that are not shown. Each computing object 1210, 1212, etc., or computing object or device 1220, 1222, 1224, 1226, 1228, etc., may also contain applications, such as applications 1230, 1232, 1234, 1236, 1238, that may use APIs or other objects, software, firmware, and/or hardware suitable for communicating with or implementing the technologies and disclosures described herein.
[0079] There are various system, component, and network configurations that support distributed computing environments. For example, computing systems can be connected together via wired or wireless systems, through local networks, or through widely distributed networks. Currently, many networks are coupled to the Internet, which provides the infrastructure for widely distributed computing and encompasses many different networks, although any network infrastructure can be used for the exemplary communications associated with the automated diagnostic data collection systems described in the various embodiments herein.
[0080] Therefore, a wide range of network topologies and infrastructures can be utilized, such as client/server, peer-to-peer, or hybrid architectures. A "client" is a member of a class or group that uses the services of another class or group to which it is not related. A client can be a process, that is, essentially a set of instructions or tasks, that requests a service provided by another program or process. The client process utilizes the requested service without, in some cases, having to "know" any working details about the other program or the service itself.
[0081] In client/server architectures, especially networked systems, a client is typically a computer that accesses shared network resources provided by another computer, such as a server. In the illustration of Figure 5, as a non-limiting example, computing objects or devices 1220, 1222, 1224, 1226, 1228, etc., can be considered clients and computing objects 1210, 1212, etc., can be considered servers, wherein computing objects 1210, 1212, etc., act as servers that provide data services, such as receiving data from client computing objects or devices 1220, 1222, 1224, 1226, 1228, etc., storing data, processing data, and transmitting data to client computing objects or devices 1220, 1222, 1224, 1226, 1228, etc., although, depending on the circumstances, any computer can be considered a client, a server, or both.
[0082] A server is typically a remote computer system accessible via a remote or local network, such as the Internet or wireless network infrastructure. Client processes may be active on a first computer system, while server processes may be active on a second computer system. They communicate with each other via a communication medium, thereby providing distributed functionality and allowing multiple clients to utilize the server's information gathering capabilities. Any software objects utilized according to the techniques described herein may be provided independently or distributed across multiple computing devices or objects.
[0083] In a network environment in which the communication network 1242 or bus is the Internet, for example, computing objects 1210, 1212, etc., can be web servers with which other computing objects or devices 1220, 1222, 1224, 1226, 1228, etc., communicate via any of a variety of known protocols, such as the Hypertext Transfer Protocol (HTTP). Computing objects 1210, 1212, etc., can act as servers or as clients of computing objects or devices 1220, 1222, 1224, 1226, 1228, etc., as may be characteristic of a distributed computing environment.
[0084] Throughout this specification, references to "an embodiment," "an example," "an implementation," "a disclosed aspect," or "an aspect" mean that a particular feature, structure, or characteristic described in connection with the embodiment, implementation, or aspect is included in at least one embodiment, implementation, or aspect of this disclosure. Therefore, the appearances of the phrases "in an embodiment," "in an example," "in an aspect," or "in an implementation" in various places throughout the specification do not necessarily all refer to the same embodiment. Furthermore, the particular features, structures, or characteristics can be combined in any suitable manner in one or more of the disclosed embodiments.
[0085] As used herein, the terms “component,” “system,” “architecture,” “engine,” etc., are intended to refer to a computer-related or electronics-related entity, which can be hardware, a combination of hardware and software, software (e.g., in execution), or firmware. For example, a component can be one or more transistors, memory cells, an arrangement of transistors or memory cells, a gate array, a programmable gate array, an application-specific integrated circuit, a controller, a processor, a process running on a processor, an object that accesses or interfaces with semiconductor memory, a computer, or the like, an executable program, a program or application, or a suitable combination thereof. The component can include erasable programming (e.g., processing instructions at least partially stored in erasable memory) or hard programming (e.g., processing instructions burned into non-erasable memory at the time of manufacture).
[0086] For example, both processes executing from memory and processors can be components. As another example, an architecture can include electronic hardware (e.g., parallel or serial transistors), processing instructions, and an arrangement of processors that implements the processing instructions in a manner suitable for the electronic hardware arrangement. Furthermore, an architecture can include individual components (e.g., transistors, gate arrays, etc.) or arrangements of components (e.g., series or parallel arrangements of transistors, gate arrays connected to program circuitry, power leads, electrical grounds, input signal lines, and output signal lines, etc.). A system can include one or more components and one or more architectures. An example system can include a switch block architecture that includes crossed input / output lines and transmission gate transistors, as well as one or more power supplies, signal generators, communication buses, controllers, I / O interfaces, address registers, etc. It should be understood that some overlap in the definition is foreseeable, and that an architecture or system can be a separate component or a component of another architecture, system, etc.
[0087] In addition to the foregoing, the disclosed subject matter can be implemented as a method, apparatus, or article of manufacture using standard manufacturing, programming, or engineering techniques to produce hardware, firmware, software, or any suitable combination thereof to control an electronic device to implement the disclosed subject matter. The terms “apparatus” and “article of manufacture” as used herein are intended to cover electronic devices, semiconductor devices, computers, or computer programs accessible from any computer-readable device, carrier, or medium. Computer-readable media can include hardware media or software media. Furthermore, media can include non-transitory media or transmission media. In one example, a non-transitory medium can include a computer-readable hardware medium. Specific examples of computer-readable hardware media may include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, magnetic strips…), optical discs (e.g., compact discs (CDs), digital versatile discs (DVDs)…), smart cards, and flash memory devices (e.g., cards, sticks, key drives…). Computer-readable transmission media can include carrier waves and the like. Of course, those skilled in the art will recognize that many modifications can be made to this configuration without departing from the scope or spirit of the disclosed subject matter.
[0088] For any number or range of values with a given property, a number or parameter from one range can be combined with another number or parameter from a different range of the same property to produce a range of values.
[0089] Other than in the operating examples, or where otherwise indicated, all numbers, values and/or expressions referring to quantities of ingredients, reaction conditions, etc., used in the specification and claims are to be understood as modified in all instances by the term "about".
[0090] While the invention has been explained in conjunction with certain embodiments, it should be understood that various modifications will become apparent to those skilled in the art upon reading the specification. Therefore, it should be understood that the invention disclosed herein is intended to cover such modifications falling within the scope of the appended claims.
[0091] Note that the following is cited here for reference:
[0092] [6] X. Li, H. Chen, X. Qi, Q. Dou, C.-W. Fu, P.-A. Heng: H-DenseUNet: hybrid densely connected UNet for liver and tumor segmentation from CT volumes. IEEE Transactions on Medical Imaging, 37(12), pp. 2663-2674, 2018.
[0093] [7] C. Sun, S. Guo, H. Zhang, J. Li, M. Chen, S. Ma, L. Jin, X. Liu, X. Li, X. Qian: Automatic segmentation of liver tumors from multiphase contrast-enhanced CT images based on FCNs. Artificial Intelligence in Medicine, 83, pp. 58-66, 2017.
[0094] [8] F. Ouhmich, V. Agnus, V. Noblet, F. Heitz, P. Pessaux: Liver tissue segmentation in multiphase CT scans using cascaded convolutional neural networks. International Journal of Computer Assisted Radiology and Surgery, 14, pp. 1275-1284, 2019.
[0095] [9] S. Lee, J. S. Bae, H. Kim, J. H. Kim, S. Yoon: Liver lesion detection from weakly-labeled multi-phase CT volumes with a grouped single shot multibox detector. MICCAI, 2018.
[0096] [11] D. Liang, L. Lin, H. Hu, Q. Zhang, Q. Chen, X. Han, Y. Chen: Combining convolutional and recurrent neural networks for classification of focal liver lesions in multi-phase CT images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 666-675, Springer, Cham, 2018.
[0097] [12] C. Dong, Y. Chen, L. Lin, H. Hu, C. Jin, H. Yu, X. Han, T. Tomoko: Simultaneous segmentation of multiple organs using random walks. Journal of Information Processing, 24(2), pp. 320-329, 2016.
[0098] [13] C. Dong, Y. Chen, A. Foruzan, L. Lin, X. Han, T. Tomoko, X. Wu, X. Gang, H. Jiang: Segmentation of liver and spleen based on computational anatomy models. Computers in Biology and Medicine, 67, pp. 146-160, 2015.
[0099] [14] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, A. C. Berg: SSD: Single shot multibox detector. In European Conference on Computer Vision, pp. 21-37, Springer, Cham, 2016.
Claims
1. A liver cancer image analysis system, comprising: a memory that stores computer-executable components; and a processor that executes the computer-executable components stored in the memory, wherein the computer-executable components include: a learning component that learns multiple diverse features of a liver sample and calculates the relationship between encoded features and upsampled features in the encoding and decoding paths, wherein the learning component includes a DenseUNet unit and a bidirectional long short-term memory (BD-LSTM) unit, the DenseUNet unit learns the multiple diverse features of the liver sample, and the BD-LSTM unit calculates the relationship between the encoded features and the upsampled features in the encoding and decoding paths; and a first attention mechanism and a second attention mechanism that operate in the DenseUNet unit and the BD-LSTM unit, respectively, wherein the first attention mechanism includes attention gates capable of pruning responses to irrelevant context regions and identifying relevant salient regions layer by layer, the BD-LSTM unit introduces the second attention mechanism, which gives different attention to different parts of the encoded input information, and the BD-LSTM unit is located between the encoding and decoding paths of the DenseUNet unit; wherein the attention gate includes a response filter component that gradually reduces the response to irrelevant background regions and amplifies the response to salient regions; and the second attention mechanism includes a weighting component that determines the relative contributions of the encoded features and the upsampled features.
2. The system according to claim 1, wherein the BD-LSTM unit includes an input gate, a forget gate, an output gate, and a first unit, the input gate, forget gate, and output gate control the amount of information entering and leaving the first unit, and the first unit stores values over the time dimension.
3. The system according to claim 1, wherein the BD-LSTM unit is represented by the following formulas:
$$i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i) \tag{1}$$
$$f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f) \tag{2}$$
$$o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o) \tag{3}$$
$$c_t = f_t \circ c_{t-1} + i_t \circ \sigma_c(W_c x_t + U_c h_{t-1} + b_c) \tag{4}$$
$$h_t = o_t \circ \sigma_h(c_t) \tag{5}$$
where $\circ$ denotes the element-wise product, $t$ denotes the $t$-th timestamp, $\sigma_g$ and $\sigma_c$ are the sigmoid and hyperbolic tangent functions, respectively, and $\sigma_h(x) = x$.
4. The system according to any one of claims 1-3, wherein the output of the BD-LSTM unit, which takes into account both the forward and backward hidden states, is defined by the following equation: [equation not reproduced]
5. The system according to claim 1, wherein the response filter component is defined by the following formula: [formula not reproduced], where $\sigma_1$ and $\sigma_{att}$ are the ReLU and sigmoid functions, respectively, and $W_x \in \mathbb{R}^{F_l \times F_a}$, $W_g \in \mathbb{R}^{F_g \times F_a}$, $\psi \in \mathbb{R}^{F_a \times 1}$, $b_\psi \in \mathbb{R}$ and $b_g \in \mathbb{R}^{F_a}$.
6. The system according to claim 1, wherein the weighting component is represented by the following formula: [formula not reproduced], where $v_a$ and $W_a$ are transformation matrices and $\beta_{i,t}$ reflects the importance of each hidden state of the input $x_i$ to the corresponding output.
7. A method for segmenting liver cancer images, comprising: a) learning multiple diverse features of a liver sample and calculating the relationship between encoded features and upsampled features in the encoding and decoding paths, wherein the learning in step a) is performed by a DenseUNet unit and a bidirectional long short-term memory (BD-LSTM) unit, the DenseUNet unit learns the multiple diverse features of the liver sample, and the BD-LSTM unit calculates the relationship between the encoded features and the upsampled features in the encoding and decoding paths, a first attention mechanism and a second attention mechanism operate in the DenseUNet unit and the BD-LSTM unit, respectively, the first attention mechanism includes attention gates that can prune responses of irrelevant context regions and identify relevant salient regions layer by layer, the BD-LSTM unit introduces the second attention mechanism, which gives different attention to different parts of the encoded input information, and the BD-LSTM unit is located between the encoding and decoding paths of the DenseUNet unit; b) gradually reducing, by the attention gate, the response of irrelevant background regions and amplifying the response of salient regions; c) determining, by the second attention mechanism, the relative contributions of the encoded features and the upsampled features; and d) segmenting liver cancer images from the liver sample.
8. The method according to claim 7, wherein the BD-LSTM unit includes an input gate, a forget gate, an output gate, and a first unit, the input gate, forget gate, and output gate control the amount of information entering and leaving the first unit, and the first unit stores values over the time dimension.
9. The method according to claim 7, wherein the BD-LSTM unit is represented by the following formulas:
$$i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i) \tag{1}$$
$$f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f) \tag{2}$$
$$o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o) \tag{3}$$
$$c_t = f_t \circ c_{t-1} + i_t \circ \sigma_c(W_c x_t + U_c h_{t-1} + b_c) \tag{4}$$
$$h_t = o_t \circ \sigma_h(c_t) \tag{5}$$
where $\circ$ denotes the element-wise product, $t$ denotes the $t$-th timestamp, $\sigma_g$ and $\sigma_c$ are the sigmoid and hyperbolic tangent functions, respectively, and $\sigma_h(x) = x$.
10. The method according to any one of claims 7-9, wherein the output of the BD-LSTM unit, which takes into account both the forward and backward hidden states, is defined by the following equation: [equation not reproduced]
11. The method according to claim 7, wherein step b) is performed by a response filter component defined by the following formula: [formula not reproduced], where $\sigma_1$ and $\sigma_{att}$ are the ReLU and sigmoid functions, respectively, and $W_x \in \mathbb{R}^{F_l \times F_a}$, $W_g \in \mathbb{R}^{F_g \times F_a}$, $\psi \in \mathbb{R}^{F_a \times 1}$, $b_\psi \in \mathbb{R}$ and $b_g \in \mathbb{R}^{F_a}$.
12. The method according to claim 7, wherein step c) is performed by a weighting component represented by the following formula: [formula not reproduced], where $v_a$ and $W_a$ are transformation matrices and $\beta_{i,t}$ reflects the importance of each hidden state of the input $x_i$ to the corresponding output.
13. A non-transitory machine-readable storage medium comprising executable instructions that, when executed by a processor, facilitate performance of operations, the operations comprising: a) learning multiple diverse features of a liver sample and computing the relationship between encoded features and upsampled features in the encoding and decoding paths, wherein the learning in step a) is performed by a DenseUNet unit and a bidirectional long short-term memory (BD-LSTM) unit, wherein the BD-LSTM unit in step a) computes the relationship between the encoded features and the upsampled features in the encoding and decoding paths, a first attention mechanism and a second attention mechanism operate in the DenseUNet unit and the BD-LSTM unit, respectively, the first attention mechanism includes attention gates capable of pruning responses to irrelevant context regions and identifying relevant salient regions layer by layer, the BD-LSTM unit introduces the second attention mechanism, which gives different attention to different parts of the encoded input information, and the BD-LSTM unit is located between the encoding and decoding paths of the DenseUNet unit; b) gradually reducing, by the attention gate, the response of irrelevant background regions and amplifying the response of salient regions; c) determining, by the second attention mechanism, the relative contributions of the encoded features and the upsampled features; and d) segmenting liver cancer images from the liver sample.
14. The non-transitory machine-readable storage medium according to claim 13, wherein the BD-LSTM unit includes an input gate, a forget gate, an output gate, and a first unit, the input gate, forget gate, and output gate control the amount of information entering and leaving the first unit, and the first unit stores values over the time dimension.
15. The non-transitory machine-readable storage medium according to claim 13, wherein the BD-LSTM unit is represented by the following formulas:
$$i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i) \tag{1}$$
$$f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f) \tag{2}$$
$$o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o) \tag{3}$$
$$c_t = f_t \circ c_{t-1} + i_t \circ \sigma_c(W_c x_t + U_c h_{t-1} + b_c) \tag{4}$$
$$h_t = o_t \circ \sigma_h(c_t) \tag{5}$$
where $\circ$ denotes the element-wise product, $t$ denotes the $t$-th timestamp, $\sigma_g$ and $\sigma_c$ are the sigmoid and hyperbolic tangent functions, respectively, and $\sigma_h(x) = x$.
16. The non-transitory machine-readable storage medium according to any one of claims 13-15, wherein the output of the BD-LSTM unit, considering both the forward and backward hidden states, is defined as: [equation not reproduced]
17. The non-transitory machine-readable storage medium according to claim 13, wherein step b) is performed by a response filter component defined by the following formula: [formula not reproduced], where $\sigma_1$ and $\sigma_{att}$ are the ReLU and sigmoid functions, respectively, and $W_x \in \mathbb{R}^{F_l \times F_a}$, $W_g \in \mathbb{R}^{F_g \times F_a}$, $\psi \in \mathbb{R}^{F_a \times 1}$, $b_\psi \in \mathbb{R}$ and $b_g \in \mathbb{R}^{F_a}$.
18. The non-transitory machine-readable storage medium according to claim 13, wherein step c) is performed by a weighting component represented by the following formula: [formula not reproduced], where $v_a$ and $W_a$ are transformation matrices and $\beta_{i,t}$ reflects the importance of each hidden state of the input $x_i$ to the corresponding output.
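For illustration only, the LSTM update recited in claims 3, 9 and 15 can be sketched as a single time step in plain Python. This is a minimal sketch, not the disclosed implementation: it assumes that equations (4) and (5) take the standard form $c_t = f_t \circ c_{t-1} + i_t \circ \sigma_c(W_c x_t + U_c h_{t-1} + b_c)$ and $h_t = o_t \circ \sigma_h(c_t)$ with $\sigma_h(x) = x$. Because the output equation of claims 4, 10 and 16 is not reproduced, it also assumes that the bidirectional output simply concatenates the forward and backward hidden states at each step. The helper names (`lstm_step`, `run`, `bidirectional`) and the parameter-dictionary keys are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vadd(*vs: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(v: Vector) -> Vector:
    """sigma_g: logistic sigmoid applied element-wise."""
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def tanh(v: Vector) -> Vector:
    """sigma_c: hyperbolic tangent applied element-wise."""
    return [math.tanh(x) for x in v]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector, p: Dict[str, list]):
    """One time step; p holds hypothetical keys 'W_i', 'U_i', 'b_i', ... for i, f, o, c."""

    def pre(g: str) -> Vector:
        # W_g x_t + U_g h_{t-1} + b_g
        return vadd(matvec(p["W_" + g], x_t), matvec(p["U_" + g], h_prev), p["b_" + g])

    i_t = sigmoid(pre("i"))  # (1) input gate
    f_t = sigmoid(pre("f"))  # (2) forget gate
    o_t = sigmoid(pre("o"))  # (3) output gate
    # (4), assumed standard form: c_t = f_t o c_{t-1} + i_t o sigma_c(W_c x_t + U_c h_{t-1} + b_c)
    cand = tanh(pre("c"))
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, cand)]
    # (5), assumed form: h_t = o_t o sigma_h(c_t) with sigma_h(x) = x
    h_t = [o * c for o, c in zip(o_t, c_t)]
    return h_t, c_t


def run(xs: List[Vector], p: Dict[str, list], hidden: int) -> List[Vector]:
    """Unroll the cell over a sequence, starting from zero hidden and cell states."""
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, p)
        outputs.append(h)
    return outputs


def bidirectional(xs: List[Vector], p_fwd, p_bwd, hidden: int) -> List[Vector]:
    """Assumed BD-LSTM output: concatenate forward and backward hidden states per step."""
    fwd = run(xs, p_fwd, hidden)
    bwd = run(xs[::-1], p_bwd, hidden)[::-1]  # process reversed, then realign in time
    return [hf + hb for hf, hb in zip(fwd, bwd)]
```

As a check, with every weight and bias set to zero except $W_c = [[1.0]]$, all three gates equal 0.5, so one step on input `[1.0]` gives $c_t = 0.5\tanh(1)$ and $h_t = 0.25\tanh(1)$.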
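The response filter component of claims 5, 11 and 17 lists $\sigma_1$ (ReLU), $\sigma_{att}$ (sigmoid), $W_x \in \mathbb{R}^{F_l \times F_a}$, $W_g \in \mathbb{R}^{F_g \times F_a}$, $\psi$, $b_\psi$ and $b_g$, but its formula is not reproduced. These symbols and shapes match the additive attention gate popularized by Attention U-Net, so the sketch below assumes that form: $q_{att} = \psi^T \sigma_1(W_x^T x_i + W_g^T g_i + b_g) + b_\psi$ and $\alpha_i = \sigma_{att}(q_{att})$, where $x_i$ is a skip-connection feature vector at location $i$ and $g_i$ is the gating signal. This is an assumption rather than the claimed formula, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def transposed_matvec(w: Matrix, v: Vector) -> Vector:
    """Compute W^T v for W of shape (len(v), F_a), stored as a list of rows."""
    return [sum(w[r][k] * v[r] for r in range(len(v))) for k in range(len(w[0]))]


def attention_coefficient(x_i: Vector, g_i: Vector, W_x: Matrix, W_g: Matrix,
                          psi: Vector, b_g: Vector, b_psi: float) -> float:
    """Assumed form: alpha_i = sigma_att(psi^T sigma_1(W_x^T x_i + W_g^T g_i + b_g) + b_psi)."""
    # sigma_1 = ReLU on the additive combination of skip feature and gating signal
    hidden = [max(0.0, a + b + c) for a, b, c in
              zip(transposed_matvec(W_x, x_i), transposed_matvec(W_g, g_i), b_g)]
    # psi in R^{F_a x 1} is stored here as a flat list of F_a weights
    q_att = sum(p * h for p, h in zip(psi, hidden)) + b_psi
    return 1.0 / (1.0 + math.exp(-q_att))  # sigma_att = sigmoid


def attention_gate(xs: List[Vector], gs: List[Vector], W_x: Matrix, W_g: Matrix,
                   psi: Vector, b_g: Vector, b_psi: float) -> List[Vector]:
    """Scale each skip-connection feature vector x_i by its coefficient alpha_i."""
    gated = []
    for x_i, g_i in zip(xs, gs):
        alpha = attention_coefficient(x_i, g_i, W_x, W_g, psi, b_g, b_psi)
        gated.append([alpha * v for v in x_i])
    return gated
```

Because $\alpha_i$ lies in $(0, 1)$, the gate can only attenuate a feature, never amplify it beyond its input value; locations whose combined response is positive keep more of their signal than locations clipped to zero by the ReLU.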
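The weighting component of claims 6, 12 and 18 names transformation matrices $v_a$ and $W_a$ and weights $\beta_{i,t}$, which resembles the notation of additive (concat-style) attention. Assuming that form, which the source does not confirm, the score of hidden state $h_i$ for output step $t$ is $e_{i,t} = v_a^T \tanh(W_a [q_t; h_i])$ and $\beta_{i,t}$ is the softmax of these scores over $i$. The minimal sketch below computes $\beta_{i,t}$ and the $\beta$-weighted sum of hidden states; the query vector $q_t$ and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def additive_scores(query: Vector, hiddens: List[Vector], W_a: Matrix, v_a: Vector) -> Vector:
    """Assumed form: e_{i,t} = v_a^T tanh(W_a [q_t; h_i]) for every hidden state h_i."""
    scores = []
    for h_i in hiddens:
        z = query + h_i  # list concatenation gives [q_t; h_i]
        proj = [math.tanh(sum(w * x for w, x in zip(row, z))) for row in W_a]
        scores.append(sum(v * p for v, p in zip(v_a, proj)))
    return scores


def softmax(scores: Vector) -> Vector:
    """Normalize scores into weights that sum to 1 (max subtracted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, hiddens: List[Vector], W_a: Matrix,
           v_a: Vector) -> Tuple[Vector, Vector]:
    """Return the weights beta_{i,t} and the beta-weighted sum of the hidden states."""
    beta = softmax(additive_scores(query, hiddens, W_a, v_a))
    context = [sum(b * h[k] for b, h in zip(beta, hiddens)) for k in range(len(hiddens[0]))]
    return beta, context
```

Identical hidden states receive identical weights, while a hidden state that produces a larger score receives a proportionally larger share of the total weight of 1.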