Segmentation apparatus and segmentation method for medical images

By using an ANN processor to adjust the shape of anatomical structures and generate segmentation masks, the problems of disconnection and error accumulation in existing medical image segmentation techniques are solved, achieving more accurate anatomical structure segmentation and motion tracking.

CN115830312B · Active · Publication Date: 2026-06-23 · SHANGHAI UNITED IMAGING INTELLIGENCE CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SHANGHAI UNITED IMAGING INTELLIGENCE CO LTD
Filing Date: 2022-11-09
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

Existing medical image segmentation techniques suffer from defects such as disconnected regions, anatomically incorrect shapes, and inconsistent results. In motion tracking applications, prediction errors accumulate across frames and affect the integrity of the results, and the absence of local image features makes it difficult to directly estimate point correspondences between frames.

Method used

An artificial neural network (ANN) processor is used to receive medical scan images of anatomical structures and point cloud representations of statistical shapes. The shape of the anatomical structures is adjusted using deformation and affine transformation parameters, and a segmentation mask is generated. During training, the parameters are optimized using a loss function to improve segmentation accuracy.

Benefits of technology

It improves the accuracy and consistency of medical image segmentation and motion tracking, reduces error accumulation, and enhances the ability to track the movement of anatomical structures.

✦ Generated by Eureka AI based on patent content.

Abstract

A medical image segmentation device and method. Systems, methods, and apparatuses associated with segmenting and / or determining the shape of anatomical structures are described herein. These tasks are performed using artificial neural networks (ANNs) based on statistical shape models of anatomical structures. The ANNs are trained by evaluating and backpropagating a plurality of losses associated with shape estimation and segmentation mask generation. Models obtained using these techniques can be used for different clinical purposes, including, for example, motion estimation and motion tracking.

Description

TECHNICAL FIELD

[0001] The present application relates to the field of medical image processing, and in particular to a segmentation device and method for medical images.

BACKGROUND

[0002] Segmentation is an important medical image analysis technique. By identifying the pixels of an anatomical structure (such as a human heart) against the background of a medical image, the technique can provide key knowledge about the shape and / or volume of the anatomical structure, which can then be used for multiple clinical purposes, including, for example, volume analysis, strain analysis, motion estimation, and / or motion tracking. With the aid of newly developed machine learning methods and deeper and faster artificial neural networks, prior-art segmentation techniques have improved greatly in speed and accuracy. Many challenges remain, however, because similarities between anatomical structures, image artifacts, and the like often lead to undesirable defects such as disconnected regions, anatomically incorrect shapes, and inconsistent results. In motion tracking applications (such as applications that rely on tracking features across multiple image frames), these shortcomings can be exacerbated, since errors associated with feature prediction can accumulate and eventually affect the integrity of the results as frames move farther from the initial position. In the absence of local image features, directly estimating point correspondences between different image frames can also be very challenging.

[0003] Therefore, it is highly desirable to have systems, methods, and apparatuses for improving the quality of medical image segmentation and / or motion tracking.

SUMMARY

[0004] Systems, methods, and apparatuses associated with organ shape tracking and image segmentation are described herein. An apparatus configured to perform these tasks can include one or more processors configured to receive a representation of an anatomical structure and a medical scan image of the anatomical structure. The representation can include a point cloud indicative of a statistical shape of the anatomical structure. Such a shape can be, for example, an average shape of the anatomical structure determined based on a predetermined statistical shape model of the anatomical structure. The one or more processors of the apparatus can be configured to implement an artificial neural network (ANN) and can use the ANN to determine, based on the received medical scan image, a plurality of first parameters for adjusting a shape of the anatomical structure indicated by the received representation and a plurality of second parameters for transforming the received representation. Using the plurality of first parameters and the plurality of second parameters, the one or more processors of the apparatus can be further configured to use the ANN to generate a modified representation of the anatomical structure and segment (e.g., by a segmentation mask) the anatomical structure in the medical scan image based on the modified representation of the anatomical structure.

[0005] In examples, the ANN described herein may include one or more rendering layers configured to generate the segmentation mask in a differentiable manner based on the modified representation of the anatomical structure. This allows a loss to be determined based on the segmentation mask during ANN training, and this loss can be used to adjust the parameters of the ANN. In examples, the ANN may include one or more shape adjustment layers configured to adjust the shape of the anatomical structure using the plurality of first parameters to obtain a distorted representation of the anatomical structure, and the ANN may also include one or more transformation layers configured to apply an affine transformation to the distorted representation using the plurality of second parameters.

[0006] In examples, the ANN described herein can be trained through a process comprising: receiving a training image of the anatomical structure; receiving a training representation of the anatomical structure (e.g., a point cloud) indicating the average shape of the anatomical structure; estimating values of a plurality of first parameters (e.g., deformation parameters) and a plurality of second parameters (e.g., affine parameters); adjusting the training representation of the anatomical structure using the estimated values of the plurality of first parameters and the plurality of second parameters; predicting a segmentation of the anatomical structure based on the adjusted training representation; and adjusting the parameters of the ANN based on a loss (e.g., a difference) between the various prediction / estimation results and their associated gold standards. For example, the parameters of the ANN can be adjusted based on the difference between the predicted segmentation of the anatomical structure and the gold standard segmentation of the anatomical structure. The parameters of the ANN can also be adjusted based on the difference between the adjusted training representation of the anatomical structure and the gold standard representation. The parameters of the ANN can also be adjusted based on the differences between the plurality of first parameters and the gold standard for the plurality of first parameters and / or the differences between the plurality of second parameters and the gold standard for the plurality of second parameters.

[0007] The parameters, point clouds, and / or segmentation masks determined using the techniques described herein can be used to serve multiple clinical purposes. Using the aforementioned plurality of first parameters and plurality of second parameters, one or more processors of the device can also be configured to determine the shape of an anatomical structure over a period of time (e.g., a cardiac cycle), thereby tracking the movement of the anatomical structure during that time period.

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] The examples disclosed herein can be understood in more detail from the following description, which is given by way of example in conjunction with the accompanying drawings.

[0009] Figure 1 is a simplified diagram illustrating an example neural network according to one or more embodiments described herein.

[0010] Figure 2 is a simplified diagram illustrating an example of a neural network for determining the shape and / or segmentation of an anatomical structure according to one or more embodiments described herein.

[0011] Figure 3 is a simplified diagram illustrating the training of a neural network for performing shape modification and / or segmentation tasks according to one or more embodiments described herein.

[0012] Figure 4 is a simplified diagram illustrating an example neural network that may include a feature encoder and a feature decoder according to one or more embodiments described herein.

[0013] Figure 5 is a simplified diagram illustrating an example neural network structure according to one or more embodiments described herein.

[0014] Figure 6 is a simplified diagram illustrating example operations that can be performed when training a neural network described in one or more embodiments provided herein.

[0015] Figure 7 is a simplified diagram illustrating an example application scenario of the techniques described in one or more embodiments provided herein.

[0016] Figure 8 is a simplified diagram illustrating example components of a device that can be configured to perform the tasks described in one or more embodiments provided herein.

DETAILED DESCRIPTION

[0017] The present disclosure is illustrated by way of example rather than limitation in the figures.

[0018] Figure 1 is a simplified diagram illustrating an example neural network 100 according to one or more embodiments described herein. As shown, the neural network 100 can be configured to receive a medical scan image 102 and a representation 104 of an anatomical structure, and to generate at least one of a modified representation 106 or a segmentation mask 108 of the anatomical structure. The anatomical structures described herein can include organs or tissues of the human body, such as the myocardium, left ventricular epicardium, left ventricular endocardium, right ventricular epicardium, right ventricular endocardium, etc. The medical scan image 102 of such an anatomical structure can be captured using various imaging modalities, including magnetic resonance imaging (MRI), computed tomography (CT), X-ray imaging, ultrasound, etc. The medical scan image 102 can include a single (e.g., static) scan image (such as a single MRI scan image) or a series of scan images (e.g., dynamic scan images) (such as the images included in an MRI movie). In the latter case, the neural network 100 can process the series of scan images individually (e.g., as multiple static images), for example, according to the chronological order of the images in the movie.

[0019] The representation 104 of the anatomical structure may include a point cloud (e.g., a set of data points in space) that can indicate the shape of the anatomical structure. Representation 104 may also be provided in other forms, including, for example, a three-dimensional (3D) mesh of the anatomical structure. In any case, representation 104 may be derived from a group or population and may represent the average shape of the anatomical structures within that group or population. The techniques used to derive such an average shape will be described in more detail below. Since the average shape may only represent the baseline shape of the anatomical structure (e.g., the average shape), it may not accurately reflect the actual shape of the anatomical structure depicted by the medical scan image 102. The neural network 100 may be configured to adjust the shape of the anatomical structure indicated by representation 104 and generate a modified representation 106 of the anatomical structure based on prior knowledge obtained from a statistical shape model. Representation 106 may be generated in the same format as representation 104 (e.g., a point cloud) and may correspond to a deformed (e.g., distorted) and transformed (e.g., via affine transformation) version of representation 104. The processes and / or techniques used to generate representation 106 and the training of the neural network 100 for performing these tasks will be described in more detail below.

[0020] In addition to generating the modified representation 106 to indicate the shape of the anatomical structure, the neural network 100 can also be configured to segment the anatomical structure in the medical scan image 102 based on the modified shape indicated by the representation 106. For example, the neural network 100 can be configured to generate a segmentation mask 108 that identifies the pixels in the medical scan image 102 corresponding to the anatomical structure (e.g., left ventricular epicardium, left ventricular endocardium, right ventricular epicardium, right ventricular endocardium, etc.). As will be described in more detail below, the segmentation mask 108 can not only delineate the anatomical structure in the medical scan image 102, but also provide an additional reference that can be used to refine the deformation and / or affine parameters (e.g., the point cloud) predicted by the neural network.

[0021] Figure 2 is a simplified diagram illustrating an example of a neural network 200 (e.g., the neural network 100 shown in Figure 1) for determining the shape and / or segmentation of an anatomical structure. As illustrated, the neural network 200 can be configured to determine the shape and / or segmentation based on a medical scan image 202 of the anatomical structure (e.g., the medical scan image 102 of Figure 1) and a representation 204 of the anatomical structure (e.g., the representation 104 of Figure 1). The medical scan image 202 may include MRI images, MRI movies, CT images, ultrasound images, etc., while the representation 204 may include a point cloud (e.g., a 2D or 3D point cloud) representing the average shape of the anatomical structure in a population.

[0022] The neural network 200 may include multiple layers, such as one or more convolutional layers, one or more pooling layers, and / or one or more fully connected layers. Each convolutional layer may include multiple convolutional kernels or filters with corresponding weights configured to extract specific features from the medical scan image 202. Following the convolution operation may be batch normalization and / or linear or non-linear activation, and the features extracted by the convolutional layers (e.g., in the form of feature maps or feature vectors) may be downsampled (e.g., using a 2×2 window and a stride of 2) by pooling layers and / or fully connected layers to reduce feature redundancy and / or size (e.g., reduced by a factor of 2). The extracted features may be used by fully connected layers to regress expected values.
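The downsampling step mentioned above (a 2×2 window with a stride of 2, halving each spatial dimension) can be sketched as follows. This is a minimal illustration only: the text does not say which pooling operation is used, so max pooling is an assumption here, and the function name `max_pool_2x2` is hypothetical.

```python
import numpy as np

def max_pool_2x2(feature_map: np.ndarray) -> np.ndarray:
    """Downsample an (H, W) feature map with a 2x2 window and a stride of 2.

    Max pooling is assumed for illustration; the patent text only specifies
    the window size and the stride.
    """
    h, w = feature_map.shape
    # Drop a trailing odd row/column so the map tiles evenly into 2x2 blocks.
    trimmed = feature_map[: h - h % 2, : w - w % 2]
    # Split each spatial axis into (block index, position inside the block).
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2)
    # Take the maximum inside every 2x2 block.
    return blocks.max(axis=(1, 3))

feature_map = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool_2x2(feature_map)
print(pooled.shape)  # (2, 2): each spatial dimension is reduced by a factor of 2
```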

[0023] In examples, a subset of the aforementioned layers (e.g., multiple convolutional layers followed by one or more fully connected layers) can form a parameter prediction module 200a (e.g., a parameter determination subnetwork), configured to predict (e.g., regress) multiple first parameters β and multiple second parameters θ for modifying the average shape of the anatomical structure indicated by representation 204 so that it matches the shape of the anatomical structure in the medical scan image 202. The multiple first parameters β can be used to adjust the shape of the anatomical structure indicated by representation 204 (e.g., deforming or distorting representation 204 into a distorted representation 206). Therefore, the multiple first parameters can be referred to herein as deformation parameters. The multiple second parameters θ can be used to transform (e.g., via affine transformation) the distorted representation 206 into a modified (e.g., distorted and transformed) representation 208 (e.g., another 2D or 3D point cloud). Therefore, the multiple second parameters can be referred to herein as affine transformation parameters or affine parameters. In examples, the deformation parameters β can correspond to weights associated with the principal components of a statistical shape model, which will be described in more detail below. The affine parameters θ can include one or more transformation vectors or matrices that can be used to modify the geometry of the distorted representation 206 (e.g., by translation, rotation, scaling, etc.) to obtain representation 208.

[0024] The neural network 200 may include a shape adjustment module 200b (e.g., one or more shape adjustment layers) and an affine transformation module 200c (e.g., one or more affine transformation layers), which are configured to perform the deformation (e.g., twisting) and transformation operations described herein, as illustrated in equation (1) below:

[0025] $P = \theta\,(P_m + \beta \cdot C)$ (1)

[0026] Where θ can represent the affine parameters predicted by neural network 200 for transforming the shape of the anatomical structure in image space, C can represent the principal component matrix (e.g., including eigenvectors computed from shape space), β can represent the deformation parameters predicted by neural network 200 for distorting the average shape of the anatomical structure, P_m can represent the predetermined average point cloud of the anatomical structure (e.g., representation 204), and P can represent the target point cloud that can be generated by the neural network 200 (e.g., representation 208). The training of the neural network 200, the derivation of P_m and C, and the statistical shape model will be described in more detail below.

[0027] Various techniques can be used to perform the deformation and transformation operations. For example, deformation module 200b can be configured to deform (e.g., distort) representation 204 by mapping one or more pixels or voxels (e.g., individual pixels or voxels) of representation 204 to corresponding pixels or voxels in distorted representation 206 based on the deformation parameters β. Affine transformation module 200c can be configured to manipulate the geometry of distorted representation 206 by applying one or more of translation, rotation, or scaling to distorted representation 206 based on the affine parameters θ.
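Equation (1) and the two operations above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the point cloud is stored as a (T, D) array, the principal component matrix C has one row per mode over the flattened coordinates, and θ is split into a linear matrix plus a translation vector. The function name `apply_shape_model` and all argument names are hypothetical, not taken from the patent.

```python
import numpy as np

def apply_shape_model(mean_points, components, beta, affine_matrix, translation):
    """Evaluate P = theta(P_m + beta * C) for a point cloud of shape (T, D).

    mean_points   : (T, D) average point cloud P_m
    components    : (K, T*D) principal component matrix C (assumed layout)
    beta          : (K,) deformation parameters
    affine_matrix : (D, D) linear part of theta (rotation / scaling)
    translation   : (D,) translation part of theta
    """
    num_points, dim = mean_points.shape
    # Deformation: add a weighted sum of principal components to the mean shape.
    deformed = mean_points.reshape(-1) + beta @ components
    deformed = deformed.reshape(num_points, dim)
    # Affine transformation: linear part followed by translation.
    return deformed @ affine_matrix.T + translation

# Toy example: a unit square whose second point can slide along x.
mean_points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
components = np.zeros((1, 8))
components[0, 2] = 1.0                       # mode 0 moves point 1 in x
beta = np.array([0.5])
affine_matrix = 2.0 * np.eye(2)              # uniform scaling by 2
translation = np.array([1.0, -1.0])

modified = apply_shape_model(mean_points, components, beta, affine_matrix, translation)
print(modified)
```

Keeping the deformation and the affine step as separate array operations mirrors the split between the shape adjustment module and the affine transformation module described in the text.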

[0028] The neural network 200 can also be configured to segment the anatomical structure from the medical scan image 202, for example, by generating a segmentation mask 210 (e.g., a 2D segmentation mask) of the anatomical structure based on the representation 208 predicted by the network. For this purpose, the neural network 200 can also include an image rendering module 200d (e.g., one or more rendering layers) configured to perform the segmentation task. In examples, the rendering module 200d can be configured to generate the segmentation mask 210 in a differentiable manner, such that when training the neural network 200, additional losses (e.g., in addition to parameter prediction losses) can be determined based on the segmentation operation and backpropagated through the neural network to improve prediction accuracy for the deformation parameters β and the affine parameters θ.

[0029] Various techniques can be employed to render the segmentation mask 210 in a differentiable manner. For example, one or more of the following operations can be performed during the rendering process to make it differentiable. The point cloud included in representation 208 can be converted into polygons, for example, by triangulation. For example, assuming the point cloud comprises T points representing the shape of an anatomical structure (e.g., the myocardium), half of the T points (e.g., 0, 1, 2, ..., (T / 2-1)) can be used to cover the inner boundary of the anatomical structure, and the remaining half of the T points (e.g., T / 2, (T / 2+1), (T / 2+2), ..., (T-1)) can be used to cover the outer boundary of the anatomical structure. Using these points, the faces of the anatomical structure (e.g., as triangles) can be formulated using the following indices / vertices: {0, 1, (T / 2)}, {1, 2, (T / 2+1)}, ..., {(T / 2-2), (T / 2-1), (T-2)}, {T / 2, (T / 2+1), 1}, {(T / 2+1), (T / 2+2), 2}, ..., {(T-2), (T-1), (T / 2-1)}, resulting in a total of (T-2) triangulated faces. The segmentation mask 210 can then be rendered based on the vertices and triangulated faces through a rasterization process, where pixels within the triangulated faces can be considered to have a value of one, while pixels outside the triangulated faces can be considered to have a value of zero. Rasterization (e.g., sampling) can be performed in a gradual and thus differentiable manner (e.g., rather than as a threshold-based discrete operation), for example, by using interpolation (e.g., linear interpolation) to approximate sudden changes in sampled values. As will be described in more detail below, by including a differentiable rendering module or layer 200d in the neural network 200, the system can not only generate a mask for segmenting anatomical structures, but also utilize information obtained during the segmentation process (e.g., a loss) to further improve the performance of the parameter prediction module 200a, deformation module 200b, and / or affine transformation module 200c.
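The face indexing and the one / zero rasterization rule described above can be sketched as follows. The sketch assumes 2D points, an even T, and pixel centers at half-integer coordinates; `ring_faces` and `rasterize` are hypothetical names. It implements only the hard (threshold-style) rasterization; the differentiable variant the text calls for would replace the inside / outside test with a smooth or interpolated coverage value, which is not shown here.

```python
import numpy as np

def ring_faces(num_points: int) -> np.ndarray:
    """Build the (T-2) triangle index triples for T points (inner half, then outer half)."""
    half = num_points // 2
    i = np.arange(half - 1)
    inner = np.stack([i, i + 1, half + i], axis=1)             # {i, i+1, T/2+i}
    outer = np.stack([half + i, half + i + 1, i + 1], axis=1)  # {T/2+i, T/2+i+1, i+1}
    return np.concatenate([inner, outer], axis=0)

def rasterize(points: np.ndarray, faces: np.ndarray, height: int, width: int) -> np.ndarray:
    """Hard rasterization: 1 for pixel centers inside any face, 0 elsewhere."""
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5                                # pixel centers
    mask = np.zeros((height, width), dtype=np.uint8)
    for a, b, c in points[faces]:
        # Edge functions: the sign tells on which side of each edge a pixel lies.
        e0 = (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])
        e1 = (c[0] - b[0]) * (py - b[1]) - (c[1] - b[1]) * (px - b[0])
        e2 = (a[0] - c[0]) * (py - c[1]) - (a[1] - c[1]) * (px - c[0])
        # Inside when all three signs agree (covers both vertex orderings).
        inside = ((e0 >= 0) & (e1 >= 0) & (e2 >= 0)) | ((e0 <= 0) & (e1 <= 0) & (e2 <= 0))
        mask[inside] = 1
    return mask

# Toy ring: 16 inner points (radius 4) and 16 outer points (radius 8).
half = 16
angles = np.linspace(0.0, 2.0 * np.pi, half)   # first and last angle coincide
center = np.array([16.0, 16.0])
unit = np.stack([np.cos(angles), np.sin(angles)], axis=1)
points = np.concatenate([center + 4.0 * unit, center + 8.0 * unit])
faces = ring_faces(len(points))
mask = rasterize(points, faces, 32, 32)
print(len(faces))  # T - 2 faces
```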

[0030] Figure 3 is a simplified diagram illustrating the training of a neural network 300 (e.g., an instance of the neural network 100 of Figure 1 or the neural network 200 of Figure 2) for performing the shape modification and / or segmentation tasks described herein. Training can be performed end-to-end using a training dataset that includes multiple medical scan images 302 (e.g., 2D or 3D images) of the anatomical structures described herein, a representation 304 (e.g., a point cloud) of the anatomical structures indicating the average shape of the anatomical structures, gold standard deformation parameters β', gold standard affine parameters θ', a gold standard representation 308G (e.g., a point cloud) of the anatomical structures corresponding to the scan images 302, and a gold standard segmentation mask 310G of the anatomical structures corresponding to the scan images 302.

[0031] In examples, the dataset used to train the neural network 300 can be prepared by performing one or more of the following operations. Medical scan images 302 of the anatomical structure and corresponding gold standard segmentation masks 310G can be obtained, for example, from a public MRI movie dataset. Based on the scan images 302 and the segmentation masks 310G, a gold standard representation 308G (e.g., a point cloud) of the anatomical structure can be derived, for example, as follows. The segmentation masks can be registered to each other via affine transformations (e.g., transformed to a canonical template domain) to remove the effects of translation, rotation, and / or scaling from shape determination (e.g., because the segmentations may differ from each other in scale and / or position). During registration, an arbitrary segmentation mask can be selected as a reference to which all other segmentation masks are registered. The registered segmentation masks can then be averaged, and the average can be used as a new reference to which all segmentation masks are registered again. This process can be repeated multiple times (e.g., in a manner similar to generalized Procrustes analysis (GPA)) until the registration of the segmentation masks converges, after which a point cloud P can be determined based on the average of the registered segmentations. Based on the point cloud P, for example through inverse deformation and / or transformation, a point cloud P_i can be derived in the image domain for each image i (e.g., i = 1...N). From these point clouds (e.g., P_1, P_2, ..., P_N), a statistical shape model can be established, for example, by determining the average point cloud P_m as the mean of the point clouds P_1, P_2, ..., P_N (e.g., the average point cloud P_m in equation (1) and / or the representation 204 in Figure 2), and by applying principal component analysis (PCA) to the point clouds P_1, P_2, ..., P_N to extract the principal modes of variation from the average shape (e.g., the average point cloud P_m), from which the principal component matrix C can be determined (e.g., the principal component matrix C in equation (1)).
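The last step above, building a statistical shape model from already-registered point clouds, can be sketched with NumPy. The sketch assumes the clouds have been aligned beforehand (the registration loop is not shown), that every cloud has the same number of points in corresponding order, and that PCA is computed through a singular value decomposition of the centered data. The names `build_shape_model` and `project` are hypothetical.

```python
import numpy as np

def build_shape_model(point_clouds: np.ndarray, num_modes: int):
    """Return the mean cloud P_m, the component matrix C, and the mode variances.

    point_clouds : (N, T, D) array of registered point clouds P_1..P_N
    """
    n = point_clouds.shape[0]
    flat = point_clouds.reshape(n, -1)          # one row per flattened cloud
    mean = flat.mean(axis=0)                    # average point cloud P_m (flattened)
    centered = flat - mean
    # Rows of vt are the principal directions of the centered data.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:num_modes]                 # principal component matrix C
    variances = singular_values[:num_modes] ** 2 / (n - 1)
    return mean.reshape(point_clouds.shape[1:]), components, variances

def project(point_cloud: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Deformation weights beta that best reproduce one cloud from the model."""
    return (point_cloud - mean).reshape(-1) @ components.T

# Synthetic data: five clouds that differ only along one known direction.
rng = np.random.default_rng(0)
base = rng.normal(size=(5, 2))
mode = np.zeros(10)
mode[3] = 1.0
coeffs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
clouds = base + coeffs[:, None, None] * mode.reshape(5, 2)

mean, components, variances = build_shape_model(clouds, num_modes=1)
beta = project(clouds[4], mean, components)
reconstructed = mean + (beta @ components).reshape(5, 2)
```

Weights obtained with `project` on a gold standard point cloud are one plausible way to get reference deformation parameters for training, although the source does not state how β' is derived.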

[0032] Once the data is prepared, training of the neural network 300 can begin, for example, by receiving a training scan image 302 as input and predicting, through the parameter prediction module 300a of the neural network, multiple first parameters β (e.g., weights associated with the principal component matrix C) and multiple second parameters θ (e.g., one or more affine transformation vectors or matrices). Using the multiple first parameters β, the neural network 300 (e.g., a deformation module 300b of the neural network) can deform representation 304, for example, according to equation (1), to obtain a distorted representation 306 (e.g., a 2D or 3D point cloud). Using the multiple second parameters θ, the neural network 300 (e.g., an affine transformation module 300c of the neural network) can further transform the distorted representation 306 (e.g., according to equation (1)) to obtain a modified representation 308 (e.g., another 2D or 3D point cloud). The neural network 300 can then compare the modified representation 308 with the gold standard representation 308G corresponding to the training scan image 302 and determine the loss between the two representations. This loss can be calculated in different ways, for example, as the mean squared error (MSE) between the modified representation 308 and the gold standard representation 308G. One or more other losses may also be determined to facilitate the training of the neural network 300, including, for example, the loss between the predicted deformation parameters β and the gold standard deformation parameters β' (e.g., an MSE loss) and / or the loss between the predicted affine parameters θ and the gold standard affine parameters θ' (e.g., an MSE loss). Once these losses are determined, they can be backpropagated through the neural network 300 individually or as a combined loss (e.g., as the average of multiple losses) to adjust the operating parameters of the neural network (e.g., weights associated with one or more of the parameter prediction module 300a, deformation module 300b, or affine transformation module 300c).

[0033] In examples, the neural network 300 can also be configured to perform a segmentation task in conjunction with the parameter prediction task described herein during the training process. For example, the neural network 300 can predict a segmentation mask 310 of the anatomical structure based on the modified representation 308. As described herein, such a segmentation mask can be rendered in a differentiable manner using the rendering module or rendering layer 300d of the neural network 300. Once generated, the segmentation mask 310 can be compared with the gold standard segmentation mask 310G corresponding to the training scan image 302, and an additional loss between the two masks can be determined. This loss can be computed, for example, as a Dice loss between the predicted mask 310 and the gold standard mask 310G. Since the rendering of the segmentation mask 310 is performed in a differentiable manner, the loss associated with the mask can be backpropagated through the neural network 300 (e.g., via gradient descent based on the loss) to further refine the network's operating parameters.
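The loss terms named in the two paragraphs above (MSE on the point cloud and on the parameters, Dice on the mask) and their combination by averaging can be written out as a small NumPy sketch. It only evaluates the losses; actual backpropagation would need an automatic differentiation framework, which is outside this illustration. The dictionary keys, the smoothing constant `eps`, and the function names are assumptions made for the example.

```python
import numpy as np

def mse(pred, target) -> float:
    """Mean squared error between two arrays of the same shape."""
    pred, target = np.asarray(pred, dtype=float), np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))

def dice_loss(pred_mask, target_mask, eps: float = 1e-6) -> float:
    """1 - Dice coefficient; accepts soft masks with values in [0, 1]."""
    pred = np.asarray(pred_mask, dtype=float)
    target = np.asarray(target_mask, dtype=float)
    intersection = np.sum(pred * target)
    dice = (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)
    return float(1.0 - dice)

def combined_loss(pred: dict, gold: dict) -> float:
    """Average of the point cloud, deformation, affine, and segmentation losses."""
    terms = [
        mse(pred["points"], gold["points"]),   # modified vs. gold standard point cloud
        mse(pred["beta"], gold["beta"]),       # deformation parameters
        mse(pred["theta"], gold["theta"]),     # affine parameters
        dice_loss(pred["mask"], gold["mask"]), # segmentation masks
    ]
    return sum(terms) / len(terms)

gold = {
    "points": np.zeros((4, 2)),
    "beta": np.zeros(3),
    "theta": np.zeros(6),
    "mask": np.array([[1.0, 1.0], [0.0, 0.0]]),
}
pred = {
    "points": np.ones((4, 2)),
    "beta": np.zeros(3),
    "theta": np.zeros(6),
    "mask": np.array([[1.0, 1.0], [0.0, 0.0]]),
}
loss = combined_loss(pred, gold)
```

Equal weighting is used because the text gives averaging as its example; per-term weights would be a straightforward extension.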

[0034] The training techniques illustrated in and described with reference to Figure 3 can address the problems associated with the large search space for the deformation parameters β and the transformation parameters θ (e.g., converging a parameter prediction network within such a large search space can be difficult). At the same time, misalignment between the predicted point cloud and the gold standard point cloud can lead to defective segmentation masks being generated from the misaligned point cloud. By considering the losses associated with both parameter prediction and segmentation during the training process, the results of both operations can be improved.

[0035] In examples, the neural network described herein (e.g., the neural network 100 of Figure 1 and / or the neural network 200 of Figure 2) can be configured to further combine the parameter prediction and segmentation operations, for example, by having one branch of the neural network output the segmentation of the anatomical structure and another branch of the network output the deformation and transformation parameters described herein, and by allowing the two branches to share certain structures and / or intermediate results (e.g., features extracted from the input scan image) to improve the performance of both branches.

[0036] Figure 4 shows an example of such a neural network 400. The neural network 400 may include a feature encoder 400a and a feature decoder 400b configured to segment anatomical structures from an input scan image 402. The encoder 400a may include a convolutional neural network (CNN), which in turn may include multiple layers, such as one or more convolutional layers, one or more pooling layers, and / or one or more fully connected layers. Each convolutional layer may include multiple convolutional kernels or filters configured to extract specific features from the input scan image 402. The convolution operation may be followed by batch normalization and / or linear or non-linear activation, and the features extracted by the convolutional layers (e.g., in the form of feature maps or feature vectors) may be downsampled by pooling layers and / or fully connected layers to reduce feature redundancy and / or size. The decoder 400b may include one or more un-pooling layers and one or more transposed convolutional layers. Through the un-pooling layers, the decoder 400b may upsample the features extracted by the encoder 400a, and the upsampled features may be further processed by one or more transposed convolution operations (e.g., via one or more transposed convolutional layers) to derive dense feature maps.

[0037] Utilizing both low-level structural and high-level semantic information extracted by encoder 400a and decoder 400b, neural network 400 can predict segmentation mask 404 through branches of the network including encoder 400a and decoder 400b. The encoder / decoder branches can also serve as the backbone of parametric regression branch 400c, configured to predict deformation parameters β and transformation parameters θ as described herein, and / or point clouds representing the shape of anatomical structures as described herein. For example, image features extracted from one or more (e.g., all) encoder layers (or decoder layers) can be concatenated (e.g., to avoid bypassing), and the features can be forwarded to a bottleneck layer to extract information for the regression task. By directly utilizing these segmentation features, information learned through the segmentation task can be used to improve the quality of shape parameter prediction and / or point cloud estimation. Furthermore, neural network 400 can be trained based on a combination of losses (e.g., between the prediction result and the corresponding gold standard (GT)), including, for example, parametric regression loss, point cloud estimation loss, and / or segmentation loss. This training technique can also improve network performance compared to training the neural network based on only a single loss (e.g., parametric regression loss only).

[0038] Figure 5 illustrates an example structure that can be included as part of the neural network 400. As shown, a feature encoder (e.g., on the left side of the figure) and a feature decoder (e.g., on the right side of the figure) can form the backbone of the neural network, extracting features from an input scan image and estimating a segmentation mask at the output based on the extracted features. The extracted features can also be concatenated and forwarded to the bottleneck of the neural network (e.g., a bottleneck layer), where they can be used by multiple (e.g., three) fully connected layers (e.g., attached to the bottleneck) to regress the deformation parameters β and transformation parameters θ described herein. The neural network can be trained based on multiple losses, including, for example, losses associated with point cloud generation and losses associated with image segmentation.
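The regression head described here can be sketched as a stack of three fully connected layers whose output vector is split into β and θ. The layer sizes, the ReLU activation, the random initialisation, and the choice of 12 values for θ (a 3x4 affine matrix in 3D) are assumptions made for illustration; none of them is stated in the disclosure.

```python
import random


def linear(x, weights, bias):
    """Fully connected layer: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise ReLU activation (assumed; the text names no activation)."""
    return [max(0.0, v) for v in x]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero bias (illustrative initialisation only)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def regress_parameters(features, layers, num_beta):
    """Run the fully connected stack and split its output into (beta, theta)."""
    x = features
    for i, (w, b) in enumerate(layers):
        x = linear(x, w, b)
        if i < len(layers) - 1:  # no activation after the final layer
            x = relu(x)
    return x[:num_beta], x[num_beta:]


rng = random.Random(0)
NUM_FEATURES, HIDDEN, NUM_BETA, NUM_THETA = 16, 8, 5, 12  # assumed sizes
layers = [
    init_layer(NUM_FEATURES, HIDDEN, rng),
    init_layer(HIDDEN, HIDDEN, rng),
    init_layer(HIDDEN, NUM_BETA + NUM_THETA, rng),
]
bottleneck = [rng.random() for _ in range(NUM_FEATURES)]  # stand-in for real features
beta, theta = regress_parameters(bottleneck, layers, NUM_BETA)
```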

[0039] Figure 6 illustrates example operations that may be performed while training a neural network (e.g., the neural network 100 of Figure 1, the neural network 200 of Figure 2, the neural network 300 of Figure 3, the neural network 400 of Figure 4, etc.) according to one or more embodiments described herein. For example, at 602, parameters of the neural network (e.g., weights associated with various filters or kernels of the neural network) can be initialized. The parameters can be initialized, for example, based on samples drawn from one or more probability distributions or on parameter values of another neural network with a similar architecture. At 604, the neural network can receive a training scan image of an anatomical structure (e.g., an MRI image of the myocardium) and a training representation of the anatomical structure (e.g., a 3D point cloud) indicating the average shape of the anatomical structure derived from a population. At 606, the neural network can extract features from the training scan image and, based on the extracted features, predict corresponding values of multiple first parameters β (e.g., deformation parameters) and multiple second parameters θ (e.g., affine transformation parameters). At 608, the neural network can deform (e.g., distort) the received training representation of the anatomical structure using the multiple first parameters β to obtain a distorted representation of the anatomical structure, and further transform the distorted representation using the multiple second parameters θ to obtain a transformed representation of the anatomical structure. At 610, the neural network can compare the representation obtained at 608 (e.g., which may indicate the adjusted shape of the anatomical structure) with a gold standard representation of the anatomical structure (e.g., which may indicate the gold standard shape of the anatomical structure), and determine a first loss based on this comparison. The first loss may be determined, for example, based on the mean squared error associated with the predicted representation.
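Steps 608 and 610 can be sketched on a toy 2D point cloud. The sketch assumes a linear statistical shape model in which each deformation parameter scales one displacement mode added to the mean shape, and an affine transform of the form p -> A p + t; the exact deformation model is not given in this passage, so both forms, all numeric values, and the function names are illustrative assumptions.

```python
def deform(mean_shape, modes, beta):
    """Warp the mean shape: p_i + sum_k beta[k] * modes[k][i] (assumed linear model)."""
    return [
        tuple(p[d] + sum(b * mode[i][d] for b, mode in zip(beta, modes))
              for d in range(len(p)))
        for i, p in enumerate(mean_shape)
    ]


def affine(points, matrix, translation):
    """Apply p -> A p + t to every point."""
    return [
        tuple(sum(a * c for a, c in zip(row, p)) + t
              for row, t in zip(matrix, translation))
        for p in points
    ]


def mse(predicted, target):
    """Mean squared error over all point coordinates (the first loss)."""
    diffs = [(a - b) ** 2 for p, q in zip(predicted, target) for a, b in zip(p, q)]
    return sum(diffs) / len(diffs)


# Hypothetical mean shape: four points on a unit circle.
mean_shape = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
# One hypothetical deformation mode: stretch along the x axis.
modes = [[(1.0, 0.0), (0.0, 0.0), (-1.0, 0.0), (0.0, 0.0)]]
beta = [0.5]                                  # predicted deformation parameters
theta_matrix = [[2.0, 0.0], [0.0, 2.0]]       # predicted affine part: uniform scale
theta_translation = [10.0, 20.0]              # predicted affine part: translation

warped = deform(mean_shape, modes, beta)                           # step 608, part 1
transformed = affine(warped, theta_matrix, theta_translation)      # step 608, part 2

gold = [(13.0, 20.0), (10.0, 22.0), (7.0, 20.0), (10.0, 19.0)]     # hypothetical gold standard
first_loss = mse(transformed, gold)                                # step 610
```

Because both operations are built from additions and multiplications, gradients of the loss with respect to β and θ exist everywhere, which is what allows the loss to be backpropagated to the layers that predict them.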

[0040] At 612, the neural network can render a segmentation mask associated with the anatomical structure in a differentiable manner based on the representation of the anatomical structure obtained at 608. The neural network can then compare the rendered segmentation mask with a gold standard segmentation mask and determine a second loss based on the comparison. The second loss can be determined as, for example, the Dice loss between the mask rendered by the neural network and the gold standard mask. At 614, the neural network can determine whether one or more training termination criteria have been met. For example, the training termination criteria can be considered met if the aforementioned first and second losses are below corresponding predetermined thresholds, if the change in loss value between two training iterations (e.g., between consecutive training iterations) is below a predetermined threshold, etc. If it is determined at 614 that the training termination criteria have been met, training can end. Otherwise, before training returns to 606, the neural network can adjust its parameters at 616 by backpropagating the first and second losses through the neural network (e.g., based on gradient descent using the gradients associated with the first and second losses, or with a combined loss such as the average of the first and second losses).
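The second loss and the termination check at 614 can be sketched as follows. The soft Dice formulation with a small smoothing constant `eps`, the flattened list representation of the masks, and the helper name `should_stop` are assumptions for illustration; the differentiable rendering step itself is not shown.

```python
def dice_loss(pred, gold, eps=1e-6):
    """Soft Dice loss, 1 - 2*sum(p*g) / (sum(p) + sum(g)), on flattened masks in [0, 1]."""
    inter = sum(p * g for p, g in zip(pred, gold))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(gold) + eps)


def should_stop(losses, prev_losses, loss_thresholds, delta_threshold):
    """Return True if either termination criterion described in the text is met."""
    # Criterion 1: every loss is below its own predetermined threshold.
    below = all(l < t for l, t in zip(losses, loss_thresholds))
    if prev_losses is None:  # first iteration: no previous losses to compare with
        return below
    # Criterion 2: the losses changed by less than a threshold since the last iteration.
    stalled = all(abs(l - p) < delta_threshold for l, p in zip(losses, prev_losses))
    return below or stalled


# Hypothetical flattened 2x2 masks (illustrative values only).
rendered = [1.0, 1.0, 0.0, 0.0]
gold_mask = [1.0, 0.0, 0.0, 0.0]
second_loss = dice_loss(rendered, gold_mask)   # overlap 1, sizes 2 and 1 -> about 1/3

stop_converged = should_stop([0.05, 0.04], None, [0.1, 0.1], 1e-3)
stop_stalled = should_stop([0.5, 0.4], [0.5004, 0.4003], [0.1, 0.1], 1e-3)
keep_training = not should_stop([0.5, 0.4], [0.7, 0.6], [0.1, 0.1], 1e-3)
```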

[0041] It should be noted that, although Figure 6 shows only the first loss associated with shape estimation (e.g., point cloud estimation) and the second loss associated with segmentation, other types of losses can also be determined and / or utilized to facilitate the training of the neural network. These losses may include, for example, a loss between the estimated deformation parameters β and the gold standard deformation parameters β' and / or a loss between the estimated affine transformation parameters θ and the gold standard affine transformation parameters θ'.
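A parameter-level loss of this kind can be sketched as a mean squared error between estimated and gold standard parameter vectors, together with its gradient with respect to the estimates. The choice of mean squared error, the numeric values, and the function names are assumptions; the disclosure does not name a specific loss for the parameters.

```python
def parameter_loss(estimated, gold):
    """Mean squared error between estimated and gold standard parameters."""
    return sum((e - g) ** 2 for e, g in zip(estimated, gold)) / len(estimated)


def parameter_loss_grad(estimated, gold):
    """Gradient of the mean squared error with respect to each estimate: 2(e - g)/n."""
    n = len(estimated)
    return [2.0 * (e - g) / n for e, g in zip(estimated, gold)]


# Hypothetical deformation parameters (illustrative values only).
beta_estimated = [0.5, -0.2]
beta_gold = [0.4, 0.0]

beta_loss = parameter_loss(beta_estimated, beta_gold)
beta_grad = parameter_loss_grad(beta_estimated, beta_gold)
```

The same two functions apply unchanged to θ and θ', since both are plain vectors of numbers.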

[0042] For the sake of simplicity, the training steps are depicted and described in a specific order herein. However, it should be understood that training operations can occur in various orders, simultaneously, and / or with other operations not presented or described herein. Furthermore, it should be noted that not all operations that may be included in the training process are depicted and described herein, and not all exemplified operations need to be performed.

[0043] The parameters, representations (e.g., point clouds), and / or segmentations obtained using the neural networks described herein can serve multiple clinical purposes. For example, as described herein, the neural networks are capable of processing not only a single scan image (e.g., a still image frame) but also a series of scan images (e.g., moving images), such as the scan images included in a cine movie (e.g., a cardiac cine). When given a series of scan images of an anatomical structure as input, the neural network can process the images individually and generate a point cloud and / or segmentation mask for each image. Such point clouds and / or segmentation masks can indicate changes in the shape of the anatomical structure over a period of time, and thus can be used to track the movement of the anatomical structure during that period.

[0044] Figure 7 illustrates an example application scenario of the techniques described herein. As shown in the figure, the neural network described herein can be used to generate point clouds (e.g., such as the representations 208 and 308 shown in Figure 2 and Figure 3, respectively) that indicate the shape of the myocardium during a complete cycle of cardiac contraction and relaxation. These point clouds can be used to track the movement of the myocardium from the start of diastole, through contraction, and back to diastole.
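Motion tracking from per-frame point clouds can be sketched as follows. The sketch assumes that point i in one frame corresponds to point i in every other frame, which is plausible when every cloud is obtained by warping the same mean shape but is not stated explicitly in the text. The toy 2D frames and the function names are hypothetical.

```python
import math


def displacements(cloud_a, cloud_b):
    """Per-point displacement vectors between two frames (index-wise correspondence)."""
    return [tuple(b - a for a, b in zip(p, q)) for p, q in zip(cloud_a, cloud_b)]


def mean_displacement(cloud_a, cloud_b):
    """Average Euclidean distance moved by the points between two frames."""
    vecs = displacements(cloud_a, cloud_b)
    return sum(math.hypot(*v) for v in vecs) / len(vecs)


# Hypothetical per-frame point clouds for one cycle (illustrative values only).
frames = [
    [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0)],  # relaxed
    [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)],  # contracted
    [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0)],  # relaxed again
]

# Mean motion between each pair of consecutive frames.
motion = [mean_displacement(a, b) for a, b in zip(frames, frames[1:])]

# Net displacement over the whole cycle (first frame to last frame).
net = mean_displacement(frames[0], frames[-1])
```

Because each frame is processed independently against the same template, an error in one frame does not carry over to the next, unlike approaches that chain frame-to-frame estimates.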

[0045] The systems, methods, and / or apparatuses described herein may be implemented using one or more processors, one or more storage devices, and / or other suitable auxiliary devices (such as display devices, communication devices, input / output devices, etc.). Figure 8 is a block diagram illustrating an example device 800 that can be configured to perform the shape modification and segmentation tasks described herein. As shown, device 800 may include a processor (e.g., one or more processors) 802, which may be a central processing unit (CPU), graphics processing unit (GPU), microcontroller, reduced instruction set computer (RISC) processor, application-specific integrated circuit (ASIC), application-specific instruction set processor (ASIP), physics processing unit (PPU), digital signal processor (DSP), field-programmable gate array (FPGA), or any other circuitry or processor capable of performing the functions described herein. Device 800 may also include communication circuitry 804, memory 806, mass storage device 808, input device 810, and / or communication link 812 (e.g., a communication bus) through which one or more of the components shown in the figure exchange information.

[0046] Communication circuitry 804 can be configured to send and receive information using one or more communication protocols (e.g., TCP / IP) and one or more communication networks, including local area networks (LANs), wide area networks (WANs), the Internet, and wireless data networks (e.g., Wi-Fi, 3G, 4G / LTE, or 5G networks). Memory 806 may include a storage medium (e.g., a non-transitory storage medium) configured to store machine-readable instructions that, when executed, cause processor 802 to perform one or more functions described herein. Examples of machine-readable media may include volatile or non-volatile memory, including but not limited to semiconductor memory (e.g., electrically programmable read-only memory (EPROM) or electrically erasable programmable read-only memory (EEPROM)), flash memory, etc. Mass storage device 808 may include one or more disks, such as one or more internal hard disks, one or more removable disks, one or more magneto-optical disks, one or more CD-ROMs or DVD-ROMs, etc., on which instructions and / or data may be stored for operation of processor 802. Input device 810 may include a keyboard, mouse, voice-controlled input device, touch-sensitive input device (e.g., touch screen), etc., for receiving user input for device 800.

[0047] It should be noted that device 800 can operate as a standalone device or can be connected to other computing devices (e.g., networked or clustered) to perform the functions described herein. Even though only one instance of each component is shown in Figure 8, those skilled in the art will understand that device 800 may include multiple instances of one or more of the components shown in the figure.

[0048] Although this disclosure has been described according to certain embodiments and generally associated methods, changes and variations of the embodiments and methods will be apparent to those skilled in the art. Therefore, the above description of exemplary embodiments does not limit this disclosure. Other changes, substitutions, and modifications are possible without departing from the spirit and scope of this disclosure. Furthermore, unless specifically stated otherwise, discussions using terms such as “analyze,” “determine,” “enable,” “identify,” and “modify” refer to the actions and processes of a computer system or similar electronic computing device that manipulate and transform data representing physical (e.g., electronic) quantities within the registers and memories of the computer system into other data representing physical quantities within the computer system's memory or other such information storage, transmission, or display devices.

[0049] It should be understood that the above description is intended to be illustrative and not restrictive. Many other embodiments will become apparent to those skilled in the art upon reading and understanding the above description. Therefore, the scope of this disclosure should be determined by reference to the appended claims and the full scope of their equivalents.

Claims

1. A medical image segmentation device, comprising one or more processors configured to: receive a representation of an anatomical structure, wherein the representation indicates the shape of the anatomical structure, the shape being the average shape of the anatomical structure determined based on a predetermined statistical shape model of the anatomical structure; receive a medical scan image of the anatomical structure; determine, based on the received medical scan image, a plurality of first parameters for adjusting the shape of the anatomical structure indicated by the received representation and a plurality of second parameters for transforming the received representation, wherein the plurality of first parameters and the plurality of second parameters are determined using an artificial neural network (ANN), the ANN determining the plurality of first parameters and the plurality of second parameters through a parameter determination subnetwork, the ANN being trained to make the determination; generate, using the ANN, a modified representation of the anatomical structure by applying the plurality of first parameters and the plurality of second parameters to the received representation of the anatomical structure; and segment, using the ANN, the anatomical structure in the medical scan image based on the modified representation of the anatomical structure, wherein the ANN includes one or more rendering layers configured to render a segmentation mask of the anatomical structure in a differentiable manner based on the modified representation of the anatomical structure.

2. The device according to claim 1, wherein the representation of the anatomical structure received by the one or more processors includes a point cloud.

3. The device according to claim 1, wherein the ANN includes one or more shape adjustment layers configured to adjust the shape of the anatomical structure using the plurality of first parameters to obtain a distorted representation of the anatomical structure, and the ANN further includes one or more transformation layers configured to apply an affine transformation to the distorted representation of the anatomical structure using the plurality of second parameters.

4. The device according to claim 1, wherein the ANN is trained through a process that includes: receiving a training image of the anatomical structure; receiving a training representation of the anatomical structure, the training representation indicating the average shape of the anatomical structure; estimating values of the plurality of first parameters and the plurality of second parameters; adjusting the training representation of the anatomical structure using the estimated values of the plurality of first parameters and the plurality of second parameters; predicting a segmentation of the anatomical structure based on the adjusted training representation of the anatomical structure; and adjusting the parameters of the ANN based on the difference between the predicted segmentation of the anatomical structure and a gold standard segmentation of the anatomical structure.

5. The device according to claim 1, wherein the one or more processors are further configured to use the plurality of first parameters and the plurality of second parameters to track the motion of the anatomical structure.

6. A method for segmenting a medical image, the method comprising: receiving a representation of an anatomical structure, wherein the representation indicates the shape of the anatomical structure, the shape being the average shape of the anatomical structure determined based on a predetermined statistical shape model of the anatomical structure; receiving a medical scan image of the anatomical structure; determining, based on the received medical scan image, a plurality of first parameters for adjusting the shape of the anatomical structure indicated by the received representation and a plurality of second parameters for transforming the received representation, wherein the plurality of first parameters and the plurality of second parameters are determined using an artificial neural network (ANN), the ANN determining the plurality of first parameters and the plurality of second parameters through a parameter determination subnetwork, the ANN being trained to make the determination; generating, using the ANN, a modified representation of the anatomical structure by applying the plurality of first parameters and the plurality of second parameters to the received representation of the anatomical structure; and segmenting, using the ANN, the anatomical structure in the medical scan image based on the modified representation of the anatomical structure, wherein the ANN includes one or more rendering layers configured to render a segmentation mask of the anatomical structure in a differentiable manner based on the modified representation of the anatomical structure.