A local skeletal image segmentation method and system based on an attention mechanism

CN122199573A · Pending · Publication Date: 2026-06-12 · BEIJING JISHUITAN HOSPITAL

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
BEIJING JISHUITAN HOSPITAL
Filing Date
2026-03-04
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing deep learning methods cannot effectively correct segmentation errors that do not conform to anatomical topology in skeletal image segmentation, resulting in a lack of topology optimization in the segmentation results.

Method used

We employ a local skeletal image segmentation method based on an attention mechanism. By constructing an anatomical prior constraint knowledge base and an anatomical correction database, we optimize feature maps using anatomical knowledge. Combining spatial and channel attention mechanisms, we further generate high-precision segmentation mask maps through a diffusion model.

Benefits of technology

It effectively corrects segmentation errors that do not conform to anatomical topology, improves segmentation accuracy, and can better optimize local details to generate segmentation results that meet clinical needs.

✦ Generated by Eureka AI based on patent content.


Abstract

The application provides a local skeletal image segmentation method and system based on an attention mechanism. The steps of the method include: constructing the constraint knowledge of each bone position into a constraint tensor based on an anatomical prior constraint knowledge base; standardizing each image slice of an original bone image into a standard bone image, cropping the standard bone image, and calculating an enhanced feature map based on a first key region enhancement model; expanding the spatial dimensions of the constraint tensor to obtain a spatial condition feature map, and inputting the enhanced feature map and the spatial condition feature map into a second key region enhancement model to obtain a second spatial weight mask map; inputting the enhanced feature map and the constraint tensor into a channel enhancement processing model to obtain a channel weight mask map; calculating a fused feature map based on the second spatial weight mask map, the channel weight mask map and the enhanced feature map; and inputting the fused feature map into a diffusion model, which outputs a segmentation probability map from which a segmentation mask map is determined.

Description

Technical Field

[0001] This invention relates to the field of medical image processing technology, and in particular to a method and system for local skeletal image segmentation based on an attention mechanism.

Background Technology

[0002] Skeletal image segmentation is a crucial task in medical image analysis, and its accuracy directly impacts clinical diagnosis, surgical planning, and treatment outcome evaluation. With the development of medical imaging technology, traditional image segmentation methods (such as thresholding, region growing, and edge detection) are no longer sufficient to meet clinical needs, and deep learning methods (such as U-Net, V-Net, and TransUNet) are gradually becoming mainstream.

[0003] Deep learning methods can automatically extract image features and achieve high-precision segmentation by learning from large amounts of medical image data. However, existing segmentation results lack topology optimization, and current post-processing methods mostly rely on morphological operations, which can only optimize local details and cannot correct segmentation errors that do not conform to anatomical topology, such as bone fractures and joint dislocations.

[0004] In view of this, the present invention is proposed.

Summary of the Invention

[0005] The purpose of this invention is to provide a local skeletal image segmentation method and system based on an attention mechanism. This solution can make full use of anatomical knowledge, integrating anatomical knowledge into the construction of enhanced feature maps and fused feature maps, and finally obtaining a segmentation mask map through fused feature maps, which can better optimize local details and correct segmentation errors that do not conform to anatomical topology.

[0006] This invention provides a local skeletal image segmentation method based on an attention mechanism, the method comprising the following steps: An anatomical prior constraint knowledge base and an anatomical correction database are obtained, the constraint knowledge of each bone position in the anatomical prior constraint knowledge base is extracted, and the constraint knowledge of each bone position is constructed as a constraint tensor. The original bone image is acquired and each image slice is standardized to obtain a standard bone image. The standard bone image is cropped based on the preset anatomical correction database to obtain a coarse-matched bone ROI image. The coarse-matched bone ROI image is input into the first key region enhancement model to obtain an enhanced feature map. The constraint tensor corresponding to the original bone image is expanded to the same spatial dimensions as the enhanced feature map to obtain a spatial condition feature map. The enhanced feature map and the spatial condition feature map are input into the second key region enhancement model to obtain a second spatial weight mask map. The enhanced feature map and the constraint tensor are input into the channel enhancement processing model to obtain a channel weight mask map. A fused feature map is calculated based on the second spatial weight mask map, the channel weight mask map, and the enhanced feature map: the three are multiplied element-wise to obtain the final fused feature map. The fused feature map is input into a preset diffusion model, which adopts the U-Net model structure. The diffusion model outputs a segmentation probability map, and a segmentation mask map is determined based on the segmentation probability map.

[0007] The proposed scheme first references anatomical constraints and constructs an anatomical prior constraint knowledge base. Each anatomical constraint is constructed as a constraint tensor. Standardized skeletal images are then cropped using an anatomical correction database and preliminarily enhanced using a first key region enhancement model to obtain an enhanced feature map. A second key region enhancement model is then used to construct a second spatial weight mask map for the spatial dimension, and finally, a fused feature map is obtained. In further processing, the fused feature map is input into a diffusion model to obtain the final segmentation mask map. This scheme fully utilizes anatomical knowledge, integrating it into the construction of the enhanced and fused feature maps, and ultimately obtains the segmentation mask map from the fused feature map. This approach better optimizes local details and corrects segmentation errors that do not conform to anatomical topology.

[0008] In some embodiments of the present invention, in the step of cropping the standard bone image based on the preset anatomical correction database, the coordinate range of the ROI region in the standard bone image is calculated based on a preset cropping algorithm to obtain the range of the coarse-matched ROI region. Based on the range of the coarse-matched ROI region, a first ROI region in the standard bone image is determined, and the standard bone image is adjusted into a coarse-matched bone ROI image based on the first ROI region. The coarse-matched bone ROI image is input into the first key region enhancement model, and the first key region enhancement model outputs an enhanced feature map.

[0009] In some embodiments of the present invention, the step of calculating the coordinate range of the ROI region in the standard bone image based on a preset cropping algorithm to obtain the range of the coarse-matched ROI region comprises: obtaining the template image corresponding to the standard bone image; normalizing the pixel values of the standard bone image and the template image to [0,1]; traversing the normalized standard bone image and template image using a sliding window with a step of 1; calculating the average pixel value of each window; and calculating the NCC (normalized cross-correlation) value of each window based on the calculated average pixel value of each window. The NCC value of each window is compared with a preset NCC threshold. If it is greater than the preset NCC threshold, an ROI sub-region range is determined based on the corresponding position of the window in the standard bone image. The range of the coarse-matched ROI region is determined based on all of the ROI sub-region ranges.

[0010] In some embodiments of the present invention, the window size is the same as the template image size, and in the step of calculating the NCC value of each window based on the calculated average pixel value of each window, the NCC value is calculated using the following formula:

$$\mathrm{NCC}(x,y)=\frac{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[I(x+i,y+j)-\bar{I}(x,y)\right]\left[T(i,j)-\bar{T}\right]}{\sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[I(x+i,y+j)-\bar{I}(x,y)\right]^{2}\,\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[T(i,j)-\bar{T}\right]^{2}}}$$

[0011] where $\mathrm{NCC}(x,y)$ is the NCC value of the window whose top-left corner is at $(x,y)$; $m$ is the height of the template image and $n$ is its width; $I(x+i,y+j)$ is the pixel value at position $(x+i,y+j)$ in the standard bone image; $\bar{I}(x,y)$ is the average pixel value of the window whose top-left corner is at $(x,y)$ in the standard bone image; $T(i,j)$ is the pixel value at position $(i,j)$ in the template image; and $\bar{T}$ is the average pixel value of the template image.

[0012] Using the above scheme, this step utilizes template images pre-stored in the anatomical correction database corresponding to bone positions. Based on these template images, the effective image range of the standard bone image, i.e., the range of the coarse matching ROI region, is determined. The area outside the range of the coarse matching ROI region in the original image is rendered as a preset pixel, which can be black, to highlight the effective area. This step uses the template images in the anatomical correction database to initially divide the effective area.
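The coarse-matching step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes 2D floating-point images already normalized to [0,1], uses the standard zero-mean NCC, takes the bounding box of all windows above the threshold as the coarse-matched ROI range, and fills everything outside it with black. The function names and the default threshold of 0.8 are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ncc_map(image: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Zero-mean NCC of `template` against every stride-1 window of `image`."""
    m, n = template.shape
    windows = sliding_window_view(image, (m, n))        # (H-m+1, W-n+1, m, n)
    win_zero = windows - windows.mean(axis=(2, 3), keepdims=True)
    tpl_zero = template - template.mean()
    numerator = (win_zero * tpl_zero).sum(axis=(2, 3))
    denominator = np.sqrt((win_zero ** 2).sum(axis=(2, 3)) * (tpl_zero ** 2).sum())
    # Flat windows have zero variance; report NCC = 0 there instead of dividing by 0.
    return np.divide(numerator, denominator,
                     out=np.zeros_like(numerator), where=denominator > 0)


def coarse_roi(image: np.ndarray, template: np.ndarray, ncc_threshold: float = 0.8):
    """Bounding box (top, left, bottom, right) covering all windows above the threshold."""
    m, n = template.shape
    ys, xs = np.nonzero(ncc_map(image, template) > ncc_threshold)
    if ys.size == 0:
        return None                                     # no window matched the template
    return ys.min(), xs.min(), ys.max() + m, xs.max() + n


def mask_outside(image: np.ndarray, roi, fill: float = 0.0) -> np.ndarray:
    """Keep the ROI and render everything else as the preset fill value (black)."""
    top, left, bottom, right = roi
    out = np.full_like(image, fill)
    out[top:bottom, left:right] = image[top:bottom, left:right]
    return out
```

Calling `mask_outside(image, coarse_roi(image, template))` on a normalized slice returns an image of the same size in which only the matched region keeps its pixel values.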

[0013] In some embodiments of the present invention, in the step of inputting the coarse-matched bone ROI image into a first key region enhancement model, and the first key region enhancement model outputting an enhanced feature map, the first key region enhancement model is sequentially configured with an input adaptation layer, a max pooling layer, an average pooling layer, a 7×7 convolutional layer, a Sigmoid activation function layer, and a weight fusion layer. The coarse-matched bone ROI image is input into the input adaptation layer, which is sequentially configured with a 3×3 convolutional layer, a BN layer, and a ReLU activation function layer. The ReLU activation function layer outputs a normalized feature map. The max pooling layer and the average pooling layer respectively perform pooling processing on the normalized feature map. The outputs of the max pooling layer and the average pooling layer are concatenated and input into the 7×7 convolutional layer. The Sigmoid activation function layer connected to the 7×7 convolutional layer outputs a first spatial weight mask map. The weight fusion layer multiplies the first spatial weight mask map element-wise with the normalized feature map to obtain the enhanced feature map.
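The pooling, 7×7 convolution, Sigmoid, and weight fusion stages described above can be sketched in NumPy as follows. Several points are assumptions made for illustration: the max and average pooling are taken along the channel axis (the text does not state the pooling axis), the input adaptation layer (3×3 convolution, BN, ReLU) is omitted so that `features` stands for the normalized feature map, and `kernel` stands in for learned 7×7 weights.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def conv2d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Zero-padded 'same' cross-correlation. x: (C, H, W), kernel: (C, k, k) -> (H, W)."""
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    windows = sliding_window_view(padded, (k, k), axis=(1, 2))   # (C, H, W, k, k)
    return np.einsum("chwij,cij->hw", windows, kernel)


def spatial_attention(features: np.ndarray, kernel: np.ndarray):
    """features: (C, H, W); kernel: (2, 7, 7). Returns (weight mask, enhanced features)."""
    # Assumed: pool across channels, giving one max map and one average map.
    pooled = np.stack([features.max(axis=0), features.mean(axis=0)])  # (2, H, W)
    mask = sigmoid(conv2d_same(pooled, kernel))                        # (H, W), values in (0, 1)
    # Weight fusion: the mask broadcasts over the channel axis.
    return mask, features * mask
```

Because the mask lies strictly between 0 and 1, the fusion step can only attenuate responses, so regions with low mask values are suppressed relative to regions with high values.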

[0014] The proposed scheme employs spatial weighted masks to construct enhanced feature maps, effectively capturing fine-grained anatomical features in medical bone images. While preserving global contextual information, it precisely focuses on key details within the bone region, providing high-quality feature representations for subsequent conditional diffusion segmentation models. Specifically, spatial weighted masks significantly enhance the expression of edge, texture, and morphological features in bone regions, suppressing noise and irrelevant information from the background, further improving the robustness and discriminative power of the feature representations. This provides more representative input features for the conditional diffusion segmentation model, ultimately achieving high-precision segmentation of medical bone images.

[0015] In some embodiments of the present invention, in the steps of inputting the enhanced feature map and the spatial condition feature map into a second key region enhancement model to obtain a second spatial weight mask map; inputting the enhanced feature map and the constraint tensor into a channel enhancement processing model to obtain a channel weight mask map; and calculating a fused feature map based on the second spatial weight mask map, the channel weight mask map, and the enhanced feature map, the enhanced feature map and the spatial condition feature map are concatenated in the channel dimension to obtain a combined feature map; the combined feature map is input into the second key region enhancement model to obtain the second spatial weight mask map; and the enhanced feature map and the constraint tensor are input into the channel enhancement processing model, which outputs a channel weight mask map.

[0016] Using the above approach, this scheme further employs a channel attention mechanism, which can adaptively learn the importance weights of different channels in the feature map, accurately select the feature channels most relevant to the medical bone image segmentation task, and effectively suppress noise and redundant information interference from irrelevant channels. The channel attention mechanism captures global statistical information and local extreme value information of different channels by performing global average pooling and max pooling on the feature map, and then generates channel attention weights through fully connected layers and activation functions to assign a reasonable importance score to each feature channel.

[0017] In some embodiments of the present invention, in the step of inputting the combined feature map into the second key region enhancement model to obtain the second spatial weight mask map, the second key region enhancement model is provided with a max pooling layer, an average pooling layer, a 7×7 convolutional layer, a Sigmoid activation function layer, and a weight fusion layer. The max pooling layer and the average pooling layer respectively perform pooling processing on the combined feature map. The outputs of the max pooling layer and the average pooling layer are concatenated and input into the 7×7 convolutional layer. The second spatial weight mask map is output through the Sigmoid activation function layer connected to the 7×7 convolutional layer.

[0018] In some embodiments of the present invention, in the step of inputting the enhanced feature map and the constraint tensor into the channel enhancement processing model, and the channel enhancement processing model outputting the channel weight mask map, the channel enhancement processing model compresses the enhanced feature map into a channel feature vector through global average pooling, concatenates the channel feature vector with the constraint tensor in the channel dimension to obtain a fused vector, and processes the fused vector into a channel weight mask map through sequentially set MLP layers and Sigmoid function layers.
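A minimal NumPy sketch of the channel enhancement processing described above follows. The text specifies global average pooling, concatenation with the constraint tensor, MLP layers, and a Sigmoid; the two-layer MLP shape, the ReLU hidden activation, and flattening the constraint tensor before concatenation are assumptions, and the weight arguments are hypothetical stand-ins for learned parameters.

```python
import numpy as np


def channel_attention(features, constraint, w1, b1, w2, b2):
    """features: (C, H, W); constraint: any shape. Returns (C,) channel weights in (0, 1)."""
    channel_vec = features.mean(axis=(1, 2))                    # global average pooling -> (C,)
    fused = np.concatenate([channel_vec, constraint.ravel()])   # fused vector, (C + K,)
    hidden = np.maximum(0.0, w1 @ fused + b1)                   # assumed MLP hidden layer with ReLU
    return 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))            # Sigmoid -> channel weight mask
```

With C feature channels and a 3×3×3 constraint tensor, `w1` has shape (hidden, C + 27) and `w2` has shape (C, hidden), so the output has one weight per feature channel.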

[0019] In some embodiments of the present invention, the diffusion model includes an encoder module and a decoder module. The encoder module includes multiple downsampling blocks, each of which includes two 3×3 convolutional layers, a ReLU activation layer, and a 2×2 max pooling layer. The decoder module includes multiple upsampling blocks, each of which includes a 2×2 deconvolutional layer, two 3×3 convolutional layers, and a ReLU activation layer.

[0020] In some embodiments of the present invention, the step of determining a segmentation mask map based on the segmentation probability map output by the diffusion model further includes updating the segmentation probability map output by the diffusion model using an energy function. In the step of updating the segmentation probability map output by the diffusion model using an energy function, the probability density of each position in the segmentation region of the segmentation probability map is calculated, and a data term is calculated based on the probability density of each position in the segmentation probability map; feature values of multiple segmentation features are determined based on the segmentation probability map; a prior term is calculated based on the feature value of each segmentation feature and the constraint feature values in the constraint tensor; an energy function is calculated based on the prior term and the data term; and the segmentation probability map output by the diffusion model is updated based on the energy function.

[0021] Using the above scheme, this scheme further updates the segmentation probability map with an energy function, which can effectively correct anatomical errors and topological inconsistencies in the initial segmentation results output by the diffusion model, achieving a secondary improvement in segmentation accuracy. The energy function, through joint optimization of data terms and regularization terms, constrains the segmentation results to conform to the anatomical prior knowledge and topological characteristics of the skeleton while retaining the correct features in the initial segmentation results: the data terms model the gray value distribution of the segmentation region based on the Gaussian-mixture distribution model, and ensure that the segmentation results are consistent with the gray value features of the input image through Bayesian posterior probability calculation; the regularization term introduces the anatomical topological constraint tensor of the skeleton, such as the arrangement order of the vertebrae of the spine and the direction of the long axis of the femur, to force the segmentation results to conform to the clinically recognized anatomical structure. For areas with blurred boundaries, large noise interference, or complex anatomical structures in medical images, the iterative optimization of the energy function can correct segmentation deviations and generate segmentation results that better meet the actual clinical needs.

[0022] In some embodiments of the present invention, in the step of calculating the data term based on the probability density of each position in the segmentation probability map, the data term is calculated using the following formula:

[0023] In the step of calculating the prior term based on the feature value of each segmentation feature and the constraint feature values in the constraint tensor, the prior term is calculated using the following formula:

[0024] In the step of calculating the energy function based on the prior term and the data term, the energy function is calculated using the following formula:

[0025] In these formulas, E represents the energy function value, which is calculated from the data term and the prior term, and max() denotes taking the maximum of the values in the parentheses. For each segmentation feature c in the set Q of segmentation features, the formulas use the feature value of c, the constraint feature value of c in the constraint tensor, and a preset tolerance threshold for c. They also use the horizontal and vertical coordinate positions in the segmentation region of the segmentation probability map, the maximum values of those horizontal and vertical coordinates, the probability density at each such position in the segmentation probability map, the natural constant, and a preset balance parameter.
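The formulas for paragraphs [0022] to [0024] are not reproduced in this text, so the following is only a sketch of one common way such an energy could be assembled from the quantities listed above. Every functional form here is an assumption: the data term is taken as a negative log-likelihood summed over positions, the prior term as a hinge penalty that is zero while a feature stays within its tolerance of the constraint value, and the energy as the data term plus the balance parameter times the prior term. All function names are hypothetical.

```python
import numpy as np


def data_term(prob_density: np.ndarray, eps: float = 1e-12) -> float:
    """Assumed form: negative natural-log likelihood summed over all positions."""
    return float(-np.log(np.clip(prob_density, eps, None)).sum())


def prior_term(features, targets, tolerances) -> float:
    """Assumed form: sum over features c of max(0, |f_c - f_c*| - tolerance_c)."""
    deviation = np.abs(np.asarray(features, dtype=float) - np.asarray(targets, dtype=float))
    return float(np.maximum(0.0, deviation - np.asarray(tolerances, dtype=float)).sum())


def energy(prob_density, features, targets, tolerances, balance: float = 1.0) -> float:
    """Assumed form: E = E_data + balance * E_prior."""
    return data_term(prob_density) + balance * prior_term(features, targets, tolerances)
```

As a usage example under the same assumptions, a measured neck-shaft angle of 130 against a constraint value of 125 with tolerance 15 contributes nothing to the prior term, while a head diameter of 30 against a constraint value of 20 with tolerance 5 contributes 5.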

[0026] In some embodiments of the present invention, in the step of updating the segmentation probability map output by the diffusion model based on the energy function, the segmentation probability map is updated using the following formula:

[0027] In this formula, the segmentation probability map at time step t is updated to the segmentation probability map at time step t+1, using a preset learning rate parameter.
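The update formula itself is not reproduced in this text. A plain gradient-descent step is one natural reading of an update "based on the energy function" that is controlled by a learning rate, and the sketch below assumes that form. The clipping to [0, 1] and the function name `update_step` are also assumptions; `energy_grad` is a hypothetical callable returning the gradient of the energy with respect to the probability map.

```python
import numpy as np


def update_step(prob_map: np.ndarray, energy_grad, learning_rate: float = 0.1) -> np.ndarray:
    """One assumed gradient-descent step: P(t+1) = clip(P(t) - lr * dE/dP(t), 0, 1)."""
    return np.clip(prob_map - learning_rate * energy_grad(prob_map), 0.0, 1.0)


def iterate(prob_map: np.ndarray, energy_grad, learning_rate: float = 0.1, steps: int = 50):
    """Apply the assumed update repeatedly for a fixed number of time steps."""
    for _ in range(steps):
        prob_map = update_step(prob_map, energy_grad, learning_rate)
    return prob_map
```

For a toy quadratic energy whose gradient is `p - target`, each step moves the map a fraction `learning_rate` of the way toward `target`, which makes the role of the learning rate easy to see.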

[0028] In some embodiments of the present invention, in the step of determining the segmentation mask map based on the segmentation probability map, the segmentation probability map is binarized based on a preset segmentation threshold to obtain the segmentation mask map.

[0029] In some embodiments of the present invention, the method further includes stacking the segmentation mask maps of each image slice of the original bone image based on the position of the image slice to obtain a three-dimensional mask, and using the Marching Cubes algorithm to extract the isosurface of the three-dimensional mask to generate an initial three-dimensional mesh model.

[0030] Using the above scheme, this scheme further constructs a three-dimensional model through the segmentation results of each slice, which can more intuitively represent the three-dimensional shape of the target location.

[0031] In some embodiments of the present invention, the method further includes the following steps: Extract the coordinates of all vertices in the initial 3D mesh model as node features of the graph, extract all adjacent vertices in the initial 3D mesh model, construct the edges of the graph, and obtain graph data; The graph data is input into a preset graph neural network model, and the graph neural network outputs a topological feature vector. Based on the topological feature vector, skeletal features are determined, and the skeletal features are compared with the constraint knowledge in the anatomical prior constraint knowledge base to determine anomaly indicators and construct a comparison report.
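The graph construction described above can be sketched as follows, assuming the initial 3D mesh model is a triangle mesh given as a vertex array and a face index array (the usual output format of Marching Cubes implementations). Two vertices are treated as adjacent when they share a triangle edge; the function name is hypothetical.

```python
import numpy as np


def mesh_to_graph(vertices: np.ndarray, faces: np.ndarray):
    """vertices: (V, 3) coordinates; faces: (F, 3) vertex indices.

    Returns (node_features, edges), where node_features is the (V, 3) coordinate
    array and edges is an (E, 2) array of unique undirected vertex pairs.
    """
    # Each triangle contributes its three sides as candidate edges.
    pairs = np.concatenate([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    # Sort each pair so (a, b) and (b, a) coincide, then drop duplicates.
    edges = np.unique(np.sort(pairs, axis=1), axis=0)
    return vertices.astype(np.float32), edges
```

For a tetrahedron with four triangular faces, this yields four nodes and six edges, since every pair of vertices shares a side.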

[0032] Using the above approach, the generated 3D data allows the 3D structure to be further compared with the constraint knowledge. Indicators that do not conform to the constraint knowledge can be added to the comparison report, providing doctors with a preliminary reference.

[0033] In some embodiments of the present invention, in the step of inputting the graph data into a preset graph neural network model and the graph neural network outputting a topological feature vector, the graph neural network model includes a graph convolutional layer, an attention mechanism layer and a graph pooling layer arranged in sequence, and the graph pooling layer outputs a topological feature vector.

[0034] Another aspect of the present invention relates to a local skeleton image segmentation system based on an attention mechanism. The system includes a computer device, which includes a processor and a memory. The memory stores computer instructions, and the processor is used to execute the computer instructions stored in the memory. When the computer instructions are executed by the processor, the system implements the steps of the method.

[0035] In summary, the present invention has the following beneficial effects:

1. This scheme first references anatomical constraints and constructs an anatomical prior constraint knowledge base. For each anatomical constraint, a constraint tensor is constructed. Standardized bone images are cropped using an anatomical correction database and initially enhanced using a first key region enhancement model to obtain an enhanced feature map. Then, a second key region enhancement model is used to construct a second spatial weight mask map for the spatial dimension, and finally, a fused feature map is obtained. In further processing, the fused feature map is input into a diffusion model to obtain the final segmentation mask map. This scheme can fully utilize anatomical knowledge, integrating it into the construction of enhanced and fused feature maps, and finally obtaining a segmentation mask map from the fused feature map, which can better optimize local details and correct segmentation errors that do not conform to anatomical topology.

2. This scheme employs spatial weighted masks to construct enhanced feature maps, effectively capturing fine-grained anatomical features in medical bone images. While preserving global contextual information, it precisely focuses on key details of the bone region, providing high-quality feature representations for subsequent conditional diffusion segmentation models. Specifically, spatial weighted masks significantly enhance the expression of edge, texture, and morphological features of bone regions, suppressing noise and irrelevant information interference from background regions, further improving the robustness and discriminative power of feature representations, providing more representative input features for conditional diffusion segmentation models, and ultimately achieving high-precision segmentation of medical bone images.

3. This scheme further adopts a channel attention mechanism, which can adaptively learn the importance weights of different channels in the feature map, accurately select the feature channels most relevant to the medical bone image segmentation task, and effectively suppress noise and redundant information interference from irrelevant channels. The channel attention mechanism can capture the global statistical information and local extreme value information of different channels by performing global average pooling and max pooling on the feature map. Then, channel attention weights are generated through fully connected layers and activation functions to assign reasonable importance scores to each feature channel.

4. This scheme uses an energy function to further update the segmentation probability map, which can effectively correct anatomical errors and topological inconsistencies in the initial segmentation results output by the diffusion model, achieving a secondary improvement in segmentation accuracy. The energy function, through joint optimization of data terms and regularization terms, constrains the segmentation results to conform to the anatomical prior knowledge and topological characteristics of the skeleton while retaining the correct features in the initial segmentation results: the data terms model the gray value distribution of the segmentation region based on the Gaussian-mixture distribution model, and ensure that the gray value features of the segmentation results are consistent with those of the input image through Bayesian posterior probability calculation; the regularization term introduces anatomical and topological constraint tensors of the skeleton, such as the arrangement order of the vertebrae of the spine and the direction of the long axis of the femur, to force the segmentation results to conform to clinically recognized anatomical structures. For areas in medical images with blurred boundaries, large noise interference, or complex anatomical structures, the iterative optimization of the energy function can correct segmentation deviations and generate segmentation results that better meet the actual clinical needs.

Attached Figure Description

[0036] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0037] Figure 1 is a schematic diagram of the first embodiment of the attention mechanism-based local skeletal image segmentation method of the present invention;
Figure 2 is a schematic diagram of the second embodiment of the attention mechanism-based local skeletal image segmentation method of the present invention;
Figure 3 is a schematic diagram of the third embodiment of the attention mechanism-based local skeletal image segmentation method of the present invention;
Figure 4 is a schematic diagram of the fourth embodiment of the attention mechanism-based local skeletal image segmentation method of the present invention.

Detailed Implementation

[0038] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of systems and methods consistent with some aspects of the invention as detailed in the appended claims.

[0039] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The singular forms “a,” “an,” and “the” used in this invention and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term “and/or” as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.

[0040] As shown in Figure 1, this invention provides a local skeletal image segmentation method based on an attention mechanism, the steps of which include:

Step S100: Obtain the anatomical prior constraint knowledge base and the anatomical correction database, extract the constraint knowledge of each bone position in the anatomical prior constraint knowledge base, and construct the constraint knowledge of each bone position as a constraint tensor.

In the specific implementation process, in the step of constructing the constraint knowledge of each bone position into a constraint tensor, the constraint knowledge of the femur is organized into a 3×3×3 constraint tensor whose three groups correspond to geometric constraints, topological constraints, and morphological constraints, respectively. Each entry lists a minimum, maximum, and ideal value:

Geometric constraints: [110, 140, 125] for the femoral neck-shaft angle; [15, 25, 20] for the femoral head diameter (unit: mm); [100, 150, 125] for the femoral shaft length (unit: mm).
Topological constraints: [0.1, 0.3, 0.2] for the femoral head-acetabulum overlap ratio; [0.0, 0.0, 0.0], indicating no overlap constraint on the femoral neck; [0.0, 0.0, 0.0], indicating no overlap constraint on the femoral shaft.
Morphological constraints: [0.8, 1.2, 1.0] for the ratio of the long axis to the short axis of the femoral head; [0.5, 0.7, 0.6] for the ratio of the long axis to the short axis of the femoral neck; [0.3, 0.5, 0.4] for the ratio of the long axis to the short axis of the femoral shaft.
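The femur example can be written down directly as a NumPy array. The values are those listed in the text; the axis order (constraint group, item, then [minimum, maximum, ideal]) and the helper `within_range` are assumptions made for illustration.

```python
import numpy as np

# Axis 0: geometric, topological, morphological. Axis 2: [minimum, maximum, ideal].
FEMUR_CONSTRAINTS = np.array([
    [[110, 140, 125],    # femoral neck-shaft angle
     [15, 25, 20],       # femoral head diameter (mm)
     [100, 150, 125]],   # femoral shaft length (mm)
    [[0.1, 0.3, 0.2],    # femoral head-acetabulum overlap ratio
     [0.0, 0.0, 0.0],    # no overlap constraint on the femoral neck
     [0.0, 0.0, 0.0]],   # no overlap constraint on the femoral shaft
    [[0.8, 1.2, 1.0],    # femoral head long-axis to short-axis ratio
     [0.5, 0.7, 0.6],    # femoral neck long-axis to short-axis ratio
     [0.3, 0.5, 0.4]],   # femoral shaft long-axis to short-axis ratio
], dtype=np.float32)


def within_range(tensor: np.ndarray, group: int, item: int, value: float) -> bool:
    """Check a measured value against the [minimum, maximum] of one constraint entry."""
    minimum, maximum, _ideal = tensor[group, item]
    return bool(minimum <= value <= maximum)
```

For instance, `within_range(FEMUR_CONSTRAINTS, 0, 0, 130)` checks a neck-shaft angle of 130 against the range 110 to 140.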

[0041] Step S200: Obtain the original bone image and standardize each image slice to obtain a standard bone image. Crop the standard bone image based on the preset anatomical correction database to obtain a coarse-matched bone ROI image. Input the coarse-matched bone ROI image into the first key region enhancement model to obtain an enhanced feature map.

Step S300: Expand the constraint tensor corresponding to the original bone image to the same spatial dimensions as the enhanced feature map to obtain a spatial condition feature map; input the enhanced feature map and the spatial condition feature map into the second key region enhancement model to obtain a second spatial weight mask map; input the enhanced feature map and the constraint tensor into the channel enhancement processing model to obtain a channel weight mask map; calculate a fused feature map based on the second spatial weight mask map, the channel weight mask map, and the enhanced feature map. In the specific implementation process, the second spatial weight mask map, the channel weight mask map, and the enhanced feature map are multiplied element-wise to obtain the final fused feature map.
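The shape handling in step S300 can be sketched with NumPy broadcasting. This is an illustration under assumptions: "expanding" the constraint tensor is taken to mean tiling each of its values over every spatial position, feature maps are laid out as (channels, height, width), and the function names are hypothetical.

```python
import numpy as np


def spatial_condition_map(constraint: np.ndarray, height: int, width: int) -> np.ndarray:
    """Tile each constraint value over an H x W grid: (K values) -> (K, H, W)."""
    flat = constraint.ravel()
    return np.broadcast_to(flat[:, None, None], (flat.size, height, width))


def combine(enhanced: np.ndarray, condition: np.ndarray) -> np.ndarray:
    """Concatenate along the channel axis: (C, H, W) + (K, H, W) -> (C + K, H, W)."""
    return np.concatenate([enhanced, condition], axis=0)


def fuse(enhanced: np.ndarray, spatial_mask: np.ndarray, channel_mask: np.ndarray) -> np.ndarray:
    """Element-wise product of (C, H, W) features, (H, W) mask, and (C,) channel weights."""
    return enhanced * spatial_mask[None, :, :] * channel_mask[:, None, None]
```

The spatial mask is shared by all channels and the channel mask by all positions, so the fused feature map keeps the shape of the enhanced feature map.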

[0042] Step S400: Input the fused feature map into a preset diffusion model. The diffusion model adopts the U-Net model structure and outputs a segmentation probability map. Determine the segmentation mask map based on the segmentation probability map.

[0043] In some embodiments of the present invention, in the step of determining the segmentation mask map based on the segmentation probability map, each position in the segmentation probability map is segmented based on a preset segmentation probability threshold to obtain the segmentation mask map; specifically, positions in the segmentation probability map that are greater than the segmentation probability threshold are set as first pixels, and positions that are not greater than the segmentation probability threshold are set as second pixels, with the first pixel and the second pixel corresponding to black and white, respectively.
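The thresholding described in paragraph [0043] can be sketched as follows. The threshold value 0.5 and the 8-bit encodings 0 for black and 255 for white are assumptions; the text only states that positions above the threshold become the first pixel (black) and the rest become the second pixel (white).

```python
import numpy as np

def probability_to_mask(prob_map, threshold=0.5, first_pixel=0, second_pixel=255):
    """Positions strictly above the threshold get first_pixel, all others second_pixel."""
    prob_map = np.asarray(prob_map, dtype=np.float32)
    return np.where(prob_map > threshold, first_pixel, second_pixel).astype(np.uint8)

demo_mask = probability_to_mask([[0.9, 0.5], [0.2, 0.51]])
```

A position exactly equal to the threshold is treated as "not greater than" and therefore receives the second pixel value.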

[0044] The above-mentioned scheme first references anatomical constraints and constructs an anatomical prior constraint knowledge base. For each anatomical constraint, a constraint tensor is constructed. Standardized skeletal images are then cropped using an anatomical correction database and initially enhanced using a first key region enhancement model to obtain an enhanced feature map. A second key region enhancement model is then used to construct a second spatial weight mask map for the spatial dimension, and finally, a fused feature map is obtained. In further processing, the fused feature map is input into a diffusion model to obtain the final segmentation mask map. This scheme fully utilizes anatomical knowledge, integrating it into the construction of the enhanced and fused feature maps, and finally obtains the segmentation mask map from the fused feature map. This allows for better optimization of local details and correction of segmentation errors that do not conform to anatomical topology.

[0045] In some embodiments of the present invention, in the step of cropping a standard bone image based on a preset anatomical correction database, the coordinate range of the ROI region in the standard bone image is calculated based on a preset cropping algorithm to obtain the range of the coarse-matched ROI region. Based on the range of the coarse-matched ROI region, a first ROI region in the standard bone image is determined, and the standard bone image is adjusted into a coarse-matched bone ROI image based on the first ROI region. In the specific implementation process, the area outside the first ROI region in the standard bone image is rendered as a preset pixel value, which can be black, to highlight the effective area and obtain the coarse-matched bone ROI image.
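Rendering the area outside the first ROI region as a preset pixel value can be sketched as below. The ROI is assumed to be an axis-aligned rectangle given as (row_start, row_end, col_start, col_end) with exclusive end indices, and the fill value 0 is assumed to represent black; `apply_roi` is a hypothetical helper name.

```python
import numpy as np

def apply_roi(image, roi, fill_value=0.0):
    """Keep pixels inside the rectangular ROI and set everything else to fill_value."""
    r0, r1, c0, c1 = roi
    out = np.full_like(image, fill_value)
    out[r0:r1, c0:c1] = image[r0:r1, c0:c1]
    return out

demo_img = np.arange(1, 26, dtype=np.float32).reshape(5, 5)
demo_roi_img = apply_roi(demo_img, (1, 3, 2, 4))
```

The output keeps the original image size, which matches the description of adjusting the image rather than cutting it down.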

[0046] The coarsely matched skeletal ROI image is input into the first key region enhancement model, and the first key region enhancement model outputs an enhanced feature map.

[0047] In some embodiments of the present invention, in the step of calculating the coordinate range of the ROI region in the standard skeleton image based on a preset cropping algorithm to obtain the range of the coarse-matched ROI region: obtain the template image corresponding to the standard skeleton image; normalize the pixel values of the standard skeleton image and the template image to [0,1]; traverse the normalized standard skeleton image with a sliding window with a stride of 1; calculate the average pixel value of each window; and calculate the NCC value of each window based on the calculated average pixel value of each window. The NCC value of each window is compared with a preset NCC threshold. If it is greater than the preset NCC threshold, an ROI sub-region range is determined based on the corresponding position of the window in the standard skeleton image. The coarse-matched ROI region range is determined based on all of the ROI sub-region ranges taken together.

[0048] In specific implementation, the NCC threshold can be set within the range of [0.7, 0.9], specifically 0.8.

[0049] In some embodiments of the present invention, the window size is the same as the template image size, and in the step of calculating the NCC value of each window based on the calculated average pixel value of each window, the NCC value is calculated using the following formula:

$$\mathrm{NCC}(x,y)=\frac{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left(I(x+i,y+j)-\bar{I}(x,y)\right)\left(T(i,j)-\bar{T}\right)}{\sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left(I(x+i,y+j)-\bar{I}(x,y)\right)^{2}\;\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left(T(i,j)-\bar{T}\right)^{2}}}$$

[0050] where $\mathrm{NCC}(x,y)$ is the NCC value of the window whose top-left corner is at coordinates $(x,y)$; $m$ represents the height of the template image, and $n$ represents the width of the template image; $I(x+i,y+j)$ is the pixel value at location $(x+i,y+j)$ in the standard skeleton image; $\bar{I}(x,y)$ is the average pixel value of the window whose top-left corner is at coordinates $(x,y)$ in the standard skeleton image; $T(i,j)$ is the pixel value at location $(i,j)$ in the template image; and $\bar{T}$ is the average pixel value of the template image.
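The sliding-window matching of paragraphs [0047] to [0050] can be sketched with a zero-mean normalized cross-correlation. This is a minimal illustration, not the filed implementation: it assumes min-max normalization to [0,1], a window equal in size to the template, a stride of 1, and it merges the matching windows into one bounding box, which is only one possible reading of how the ROI sub-regions are combined. The default threshold 0.8 follows paragraph [0048].

```python
import numpy as np

def normalize01(img):
    """Min-max normalize pixel values to [0, 1]."""
    img = np.asarray(img, dtype=np.float64)
    span = img.max() - img.min()
    return (img - img.min()) / span if span > 0 else np.zeros_like(img)

def ncc_map(image, template):
    """NCC of every template-sized window; entry (x, y) is the window's top-left corner."""
    m, n = template.shape
    t = template - template.mean()
    t_energy = np.sum(t * t)
    rows, cols = image.shape[0] - m + 1, image.shape[1] - n + 1
    out = np.zeros((rows, cols))
    for x in range(rows):
        for y in range(cols):
            w = image[x:x + m, y:y + n]
            w = w - w.mean()                      # subtract the window average
            denom = np.sqrt(np.sum(w * w) * t_energy)
            out[x, y] = np.sum(w * t) / denom if denom > 0 else 0.0
    return out

def coarse_roi(image, template, threshold=0.8):
    """Bounding box (r0, r1, c0, c1) covering all windows whose NCC exceeds the threshold."""
    scores = ncc_map(normalize01(image), normalize01(template))
    hits = np.argwhere(scores > threshold)
    if hits.size == 0:
        return None
    m, n = template.shape
    r0, c0 = hits.min(axis=0)
    r1, c1 = hits.max(axis=0) + (m, n)
    return int(r0), int(r1), int(c0), int(c1)

# Toy example: the template is cut from the image, so its own location scores about 1.
rng = np.random.default_rng(0)
demo_image = rng.random((12, 12))
demo_template = demo_image[3:7, 5:9].copy()
demo_scores = ncc_map(normalize01(demo_image), normalize01(demo_template))
demo_roi = coarse_roi(demo_image, demo_template)
```

The double loop is written for readability; for full-size images an FFT-based or library implementation of template matching would normally be used instead.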

[0051] Using the above scheme, this step utilizes template images pre-stored in the anatomical correction database corresponding to bone positions. Based on these template images, the effective image range of the standard bone image, i.e., the range of the coarse matching ROI region, is determined. The area outside the range of the coarse matching ROI region in the original image is rendered as a preset pixel, which can be black, to highlight the effective area. This step uses the template images in the anatomical correction database to initially divide the effective area.

[0052] In some embodiments of the present invention, in the step of inputting the coarse-matched skeletal ROI image into a first key region enhancement model, and the first key region enhancement model outputting an enhanced feature map, the first key region enhancement model is sequentially configured with an input adaptation layer, a max pooling layer, an average pooling layer, a 7×7 convolutional layer, a Sigmoid activation function layer, and a weight fusion layer. The coarse-matched skeletal ROI image is input into the input adaptation layer, which is sequentially configured with a 3×3 convolutional layer, a batch normalization (BN) layer, and a ReLU activation function layer. The ReLU activation function layer outputs a normalized feature map. The max pooling layer and the average pooling layer respectively perform pooling processing on the normalized feature map. The outputs of the max pooling layer and the average pooling layer are concatenated and input into the 7×7 convolutional layer. The Sigmoid activation function layer connected to the 7×7 convolutional layer outputs a first spatial weight mask map. The weight fusion layer multiplies the first spatial weight mask map element-wise with the normalized feature map to obtain the enhanced feature map.
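The pooling, 7×7 convolution, Sigmoid, and weight fusion stages of the first key region enhancement model can be sketched in plain NumPy. Several points are assumptions: the max and average pooling are taken along the channel axis (as in CBAM-style spatial attention), the convolution uses zero padding and stride 1, the kernel is random rather than learned, and the input adaptation layer is omitted so the input stands in for the normalized feature map.

```python
import numpy as np

def conv2d_same(x, kernel):
    """x: (C_in, H, W), kernel: (C_in, k, k) -> (H, W); zero padding, stride 1."""
    _, h, w = x.shape
    k = kernel.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[:, i:i + k, j:j + k] * kernel)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(features, kernel):
    """features: (C, H, W). Returns the (1, H, W) weight mask and the reweighted features."""
    max_map = features.max(axis=0, keepdims=True)          # channel-wise max pooling
    avg_map = features.mean(axis=0, keepdims=True)         # channel-wise average pooling
    pooled = np.concatenate([max_map, avg_map], axis=0)    # (2, H, W)
    mask = sigmoid(conv2d_same(pooled, kernel))[None]      # (1, H, W), values in (0, 1)
    return mask, features * mask                           # weight fusion: element-wise product

rng = np.random.default_rng(0)
demo_features = rng.standard_normal((8, 16, 16))
demo_kernel = rng.standard_normal((2, 7, 7)) * 0.1         # stand-in for learned 7x7 weights
demo_sp_mask, demo_sp_out = spatial_attention(demo_features, demo_kernel)
```

Because the mask has a single channel, every feature channel is scaled by the same per-pixel weight.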

[0053] The proposed scheme employs spatial weighted masks to construct enhanced feature maps, effectively capturing fine-grained anatomical features in medical bone images. While preserving global contextual information, it precisely focuses on key details within the bone region, providing high-quality feature representations for subsequent conditional diffusion segmentation models. Specifically, spatial weighted masks significantly enhance the expression of edge, texture, and morphological features in bone regions, suppressing noise and irrelevant information from the background, further improving the robustness and discriminative power of the feature representations. This provides more representative input features for the conditional diffusion segmentation model, ultimately achieving high-precision segmentation of medical bone images.

[0054] In some embodiments of the present invention, in the steps of inputting the enhanced feature map and the spatial condition feature map into a second key region enhancement model to obtain a second spatial weight mask map; inputting the enhanced feature map and the constraint tensor into a channel enhancement processing model to obtain a channel weight mask map; and calculating a fused feature map based on the second spatial weight mask map, the channel weight mask map, and the enhanced feature map, the enhanced feature map and the spatial condition feature map are concatenated in the channel dimension to obtain a combined feature map; the combined feature map is input into the second key region enhancement model to obtain the second spatial weight mask map; and the enhanced feature map and the constraint tensor are input into the channel enhancement processing model, which outputs a channel weight mask map.

[0055] Using the above approach, this scheme further employs a channel attention mechanism, which can adaptively learn the importance weights of different channels in the feature map, accurately select the feature channels most relevant to the medical bone image segmentation task, and effectively suppress noise and redundant information interference from irrelevant channels. The channel attention mechanism captures global statistical information and local extreme value information of different channels by performing global average pooling and max pooling on the feature map, and then generates channel attention weights through fully connected layers and activation functions to assign a reasonable importance score to each feature channel.

[0056] In some embodiments of the present invention, in the step of inputting the combined feature map into the second key region enhancement model to obtain the second spatial weight mask map, the second key region enhancement model is provided with a max pooling layer, an average pooling layer, a 7×7 convolutional layer, a Sigmoid activation function layer, and a weight fusion layer. The max pooling layer and the average pooling layer respectively perform pooling processing on the combined feature map. The outputs of the max pooling layer and the average pooling layer are concatenated and input into the 7×7 convolutional layer. The second spatial weight mask map is output through the Sigmoid activation function layer connected to the 7×7 convolutional layer.

[0057] In some embodiments of the present invention, in the step of inputting the enhanced feature map and the constraint tensor into the channel enhancement processing model, and the channel enhancement processing model outputting the channel weight mask map, the channel enhancement processing model compresses the enhanced feature map into a channel feature vector through global average pooling, concatenates the channel feature vector with the constraint tensor in the channel dimension to obtain a fused vector, and processes the fused vector into a channel weight mask map through sequentially set MLP layers and Sigmoid function layers.

[0058] Specifically, the enhanced feature map is compressed into a channel feature vector of (batch_size, 64, 1, 1) using global average pooling; feature fusion: the channel feature vector and the constraint tensor are concatenated along the channel dimension to obtain a fused vector of (batch_size, 128, 1, 1); feature transformation: the fused vector is mapped to channel weights using MLP (64→16→64) to obtain a weight vector of (batch_size, 64, 1, 1); activation generation: the weights are normalized to the [0,1] interval using the Sigmoid function to obtain the final channel weight mask map.
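The channel enhancement processing of paragraphs [0057] and [0058] can be sketched as below for a single sample. The text gives a 128-channel fused vector and an MLP of 64→16→64; this sketch assumes the first MLP layer accepts all 128 fused values, that the constraint tensor has already been projected to 64 values (the random `demo_projection` is hypothetical), and that a ReLU sits between the two MLP layers. All weights are random stand-ins for learned parameters.

```python
import numpy as np

def channel_attention(features, constraint_embedding, w1, b1, w2, b2):
    """features: (C, H, W); constraint_embedding: (K,). Returns a (C, 1, 1) weight mask."""
    pooled = features.mean(axis=(1, 2))                      # global average pooling -> (C,)
    fused = np.concatenate([pooled, constraint_embedding])   # concatenation -> (C + K,)
    hidden = np.maximum(0.0, w1 @ fused + b1)                # first MLP layer (ReLU assumed)
    logits = w2 @ hidden + b2                                # second MLP layer -> (C,)
    weights = 1.0 / (1.0 + np.exp(-logits))                  # Sigmoid -> [0, 1]
    return weights[:, None, None]

rng = np.random.default_rng(0)
demo_feat = rng.standard_normal((64, 16, 16))
demo_projection = rng.standard_normal((64, 27)) * 0.1       # hypothetical 27 -> 64 projection
demo_constraint = demo_projection @ rng.random(27)           # stand-in flattened constraints
demo_w1, demo_b1 = rng.standard_normal((16, 128)) * 0.1, np.zeros(16)
demo_w2, demo_b2 = rng.standard_normal((64, 16)) * 0.1, np.zeros(64)
demo_ch_mask = channel_attention(demo_feat, demo_constraint,
                                 demo_w1, demo_b1, demo_w2, demo_b2)
```

The (C, 1, 1) output shape lets the mask broadcast over the spatial dimensions when it is multiplied with the enhanced feature map.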

[0059] In some embodiments of the present invention, the diffusion model includes an encoder module and a decoder module. The encoder module includes multiple downsampling blocks, each of which includes two 3×3 convolutional layers, a ReLU activation layer, and a 2×2 max pooling layer. The decoder module includes multiple upsampling blocks, each of which includes a 2×2 deconvolutional layer, two 3×3 convolutional layers, and a ReLU activation layer.

[0060] In the specific implementation process, for each downsampling block of the encoder module, the feature map size is halved and the number of channels is doubled; for each upsampling block of the decoder module, the feature map size is doubled and the number of channels is halved. Residual branches are added to the downsampling and upsampling blocks to alleviate the vanishing gradient problem in deep networks.
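The halving and doubling rule for the encoder and decoder blocks can be traced with a few lines of Python. The input size 256, base channel count 64, and depth 3 used in the example are assumptions chosen for illustration; the text does not specify them.

```python
def unet_shapes(size, channels, depth):
    """Return (encoder, decoder) lists of (channels, height, width) after each block."""
    encoder, decoder = [], []
    c, s = channels, size
    for _ in range(depth):
        c, s = c * 2, s // 2      # downsampling block: size halved, channels doubled
        encoder.append((c, s, s))
    for _ in range(depth):
        c, s = c // 2, s * 2      # upsampling block: size doubled, channels halved
        decoder.append((c, s, s))
    return encoder, decoder

demo_encoder, demo_decoder = unet_shapes(256, 64, 3)
```

With equal numbers of downsampling and upsampling blocks, the decoder output returns to the input resolution and channel count, which is what allows a per-pixel segmentation probability map.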

[0061] In the specific implementation process, in the step of determining the initial segmentation mask based on the segmentation probability map, the segmentation probability map is binarized based on a preset segmentation threshold to obtain the segmentation mask.

[0062] As shown in Figure 2, in some embodiments of the present invention, the step of determining a segmentation mask map based on the segmentation probability map output by the diffusion model further includes updating the segmentation probability map output by the diffusion model using an energy function. The update includes: step S410, calculating the probability density of each position in the segmentation region of the segmentation probability map and calculating a data term based on the probability density of each position in the segmentation probability map; step S420, determining the feature values of multiple segmentation features based on the segmentation probability map and calculating a prior term based on the feature value of each segmentation feature and the constraint feature values in the constraint tensor; step S430, calculating an energy function based on the prior term and the data term and updating the segmentation probability map output by the diffusion model based on the energy function; and step S440, segmenting each position in the segmentation probability map based on a preset segmentation probability threshold to obtain a segmentation mask map.

[0063] In the specific implementation process, in the step of calculating the probability density of each position in the segmentation region of the segmentation probability map, the positions in the segmentation probability map that are greater than the segmentation probability threshold are grouped together to form a segmentation region, and the probability density of each position in the segmentation region is calculated using the following formula:

$$p(x,y)=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{\left(g(x,y)-\mu\right)^{2}}{2\sigma^{2}}\right)$$

[0064] where $p(x,y)$ is the probability density of location $(x,y)$ in the segmentation region of the segmentation probability map; $\sigma^{2}$ represents the variance of the pixel values at the locations within the segmentation region; $\mu$ represents the average pixel value of the locations within the segmentation region; and $g(x,y)$ is the pixel value of location $(x,y)$ in the segmentation region of the segmentation probability map.
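The per-position probability density can be sketched as follows, assuming a single Gaussian fitted to the pixel values inside the segmentation region. The Gaussian form, the threshold 0.5, and the separate `pixel_values` argument (which may be the probability map itself or the gray values of the input image) are assumptions made for illustration.

```python
import numpy as np

def region_density(pixel_values, prob_map, threshold=0.5):
    """Gaussian density of each pixel value inside the region where prob_map > threshold.

    Returns (density, region); density is 0 outside the region.
    """
    pixel_values = np.asarray(pixel_values, dtype=np.float64)
    region = np.asarray(prob_map) > threshold
    density = np.zeros_like(pixel_values)
    if not region.any():
        return density, region
    vals = pixel_values[region]
    mu, var = vals.mean(), vals.var()             # region mean and variance
    if var > 0:
        density[region] = np.exp(-(vals - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return density, region

demo_pixels = np.array([[0.2, 0.4], [0.6, 0.8]])
demo_probs = np.array([[0.9, 0.9], [0.1, 0.9]])
demo_density, demo_region = region_density(demo_pixels, demo_probs)
```

In the toy example the region mean is about 0.47, so the pixel with value 0.4 receives the highest density of the three region pixels.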

[0065] In the specific implementation process, in the step of determining the feature values ​​of multiple segmentation features based on the segmentation probability map, the segmentation features include pixel features and morphological features. The pixel features include mean, variance, and entropy. The morphological features include the area, perimeter, circularity, and aspect ratio of the segmented region. The constraint feature values ​​are constraint values ​​set for each segmentation feature.

[0066] Using the above scheme, this scheme further updates the segmentation probability map with an energy function, which can effectively correct anatomical errors and topological inconsistencies in the initial segmentation results output by the diffusion model, achieving a secondary improvement in segmentation accuracy. The energy function, through joint optimization of data terms and regularization terms, constrains the segmentation results to conform to the anatomical prior knowledge and topological characteristics of the skeleton while retaining the correct features in the initial segmentation results: the data terms model the gray value distribution of the segmentation region based on the Gaussian-mixture distribution model, and ensure that the segmentation results are consistent with the gray value features of the input image through Bayesian posterior probability calculation; the regularization term introduces the anatomical topological constraint tensor of the skeleton, such as the arrangement order of the vertebrae of the spine and the direction of the long axis of the femur, to force the segmentation results to conform to the clinically recognized anatomical structure. For areas with blurred boundaries, large noise interference, or complex anatomical structures in medical images, the iterative optimization of the energy function can correct segmentation deviations and generate segmentation results that better meet the actual clinical needs.

[0067] In some embodiments of the present invention, in the step of calculating the data term based on the probability density of each position in the segmentation probability map, the data term is calculated using the following formula:

[0068] In the step of calculating the prior term based on the feature value of each segmentation feature and the constraint feature values in the constraint tensor, the prior term is calculated using the following formula:

[0069] In the step of calculating the energy function based on the prior term and the data term, the energy function is calculated using the following formula:

[0070] where $E$ represents the energy function value; $E_{\mathrm{data}}$ represents the data term; $E_{\mathrm{prior}}$ represents the prior term; $\max(\cdot)$ indicates taking the maximum value among its arguments; $f_c$ represents the feature value of segmentation feature $c$; $\hat{f}_c$ represents the constraint feature value of segmentation feature $c$ in the constraint tensor; $Q$ represents the set of segmentation features; $\tau_c$ represents the preset tolerance threshold of segmentation feature $c$; $X$ and $Y$ represent the maximum values of the horizontal and vertical coordinates in the segmentation region of the segmentation probability map, respectively; $x$ and $y$ represent the horizontal and vertical coordinate positions within the segmentation region of the segmentation probability map, respectively; $p(x,y)$ represents the probability density at location $(x,y)$ in the segmentation probability map; $e$ represents the natural constant; and $\lambda$ represents the preset balance parameter.
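The formulas referenced in paragraphs [0067] to [0069] are not reproduced in this text. One form that is consistent with the symbol definitions in paragraph [0070] is sketched below; it is an assumption offered for orientation, not the filed formulas. Here $p(x,y)$ is the probability density at location $(x,y)$, $f_c$ and $\hat{f}_c$ are the measured and constraint feature values of segmentation feature $c \in Q$, $\tau_c$ is its tolerance threshold, and $\lambda$ is the balance parameter. The natural constant is assumed to enter as the base of the logarithm, and the prior term is assumed to be a hinge penalty that is zero while a feature stays within its tolerance.

```latex
\begin{aligned}
E_{\mathrm{data}}  &= -\sum_{x=1}^{X}\sum_{y=1}^{Y}\ln p(x,y),\\
E_{\mathrm{prior}} &= \sum_{c\in Q}\max\bigl(0,\ \lvert f_c-\hat{f}_c\rvert-\tau_c\bigr),\\
E                  &= E_{\mathrm{data}}+\lambda\,E_{\mathrm{prior}}.
\end{aligned}
```

Under this reading, a low energy means the region's pixel values are likely under the fitted density and every segmentation feature lies within tolerance of its anatomical constraint value.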

[0071] In practice, the balance parameter can be set within the range of 0.1 to 1, specifically 0.5.

[0072] In some embodiments of the present invention, in the step of updating the segmentation probability map output by the diffusion model based on the energy function, the segmentation probability map is updated using the following formula:

$$P^{t+1}=P^{t}-\eta\,\nabla E\left(P^{t}\right)$$

[0073] where $P^{t}$ represents the segmentation probability map at time step $t$; $P^{t+1}$ represents the updated segmentation probability map at time step $t+1$; $\eta$ represents the preset learning rate parameter; $\nabla$ indicates the calculation of the gradient; and $\nabla E$ represents the gradient of the energy function.

[0074] In practice, the learning rate parameter can be 5×10⁻⁴.

[0075] In the specific implementation process, during the gradient calculation, the partial derivative of the data term with respect to each position in the segmentation probability map is calculated, and the partial derivative results at each position are collected as the gradient value of the data term; the partial derivative of the prior term with respect to each position in the segmentation probability map is calculated, and the partial derivative results at each position are collected as the gradient value of the prior term; the gradient value of the data term and the gradient value of the prior term are weighted and summed to obtain the gradient of the energy function.

[0076] In some embodiments of the present invention, in the step of determining the segmentation mask map based on the segmentation probability map, the segmentation probability map is binarized based on a preset segmentation threshold to obtain the segmentation mask map.

[0077] As shown in Figure 3, in some embodiments of the present invention, the method further includes step S500: stacking the segmentation mask maps of the image slices of the original skeleton image according to the positions of the image slices to obtain a three-dimensional mask, and using the Marching Cubes algorithm to extract the isosurface of the three-dimensional mask to generate an initial three-dimensional mesh model.
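Step S500 can be sketched as stacking followed by isosurface extraction. The sketch assumes each slice mask is a 2D array with 1 for bone and 0 for background, that slices are keyed by an orderable position, and that scikit-image is available for the Marching Cubes step (it is imported only when `extract_mesh` is called). The function names are hypothetical.

```python
import numpy as np

def stack_slices(masks_by_position):
    """masks_by_position: {slice_position: 2D mask}. Returns a (D, H, W) volume."""
    ordered = [masks_by_position[k] for k in sorted(masks_by_position)]
    return np.stack(ordered, axis=0).astype(np.uint8)

def extract_mesh(volume, level=0.5):
    """Marching Cubes isosurface of the 3D mask; requires scikit-image."""
    from skimage import measure
    verts, faces, _normals, _values = measure.marching_cubes(
        volume.astype(np.float32), level=level)
    return verts, faces

# Slices supplied out of order are stacked by position.
demo_slices = {
    2: np.full((4, 4), 1),
    0: np.zeros((4, 4)),
    1: np.eye(4),
}
demo_volume = stack_slices(demo_slices)
```

An isosurface level of 0.5 sits between the background value 0 and the bone value 1, so the extracted surface follows the mask boundary.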

[0078] Using the above scheme, this scheme further constructs a three-dimensional model through the segmentation results of each slice, which can more intuitively represent the three-dimensional shape of the target location.

[0079] As shown in Figure 4, in some embodiments of the present invention, the method further includes the following steps: Step S600: Extract the coordinates of all vertices in the initial 3D mesh model as the node features of a graph; extract all pairs of adjacent vertices in the initial 3D mesh model to construct the edges of the graph; and thereby obtain graph data. Step S700: Input the graph data into a preset graph neural network model, and the graph neural network model outputs a topological feature vector. Step S800: Determine skeletal features based on the topological feature vector, compare the skeletal features with the constraint knowledge in the anatomical prior constraint knowledge base, determine anomaly indicators, and construct a comparison report.
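The graph construction in step S600 can be sketched from a triangle mesh. The sketch assumes the mesh is given as a vertex array and a triangular face index array (as Marching Cubes implementations typically return), treats two vertices as adjacent when they share a triangle edge, and stores undirected edges once in a (2, E) index array; this layout is an assumption, not something the text prescribes.

```python
import numpy as np

def mesh_to_graph(vertices, faces):
    """Return (node_features, edge_index) from a triangle mesh.

    node_features: (V, 3) vertex coordinates
    edge_index:    (2, E) unique undirected edges between adjacent vertices
    """
    node_features = np.asarray(vertices, dtype=np.float32)
    edges = set()
    for a, b, c in np.asarray(faces, dtype=np.int64):
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((int(min(u, v)), int(max(u, v))))   # store each edge once
    edge_index = np.array(sorted(edges), dtype=np.int64).T
    return node_features, edge_index

# Toy mesh: a tetrahedron has 4 vertices, 4 faces, and 6 edges.
demo_verts = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
demo_faces = [[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]]
demo_nodes, demo_edges = mesh_to_graph(demo_verts, demo_faces)
```

Graph neural network libraries often expect both directions of each edge; if so, the index array can be concatenated with its row-swapped copy.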

[0080] Using the above approach, based on the generated 3D data, this approach can further compare the 3D structure with the constraint knowledge. For indicators that do not conform to the constraint knowledge, they can be added to the comparison report, providing doctors with preliminary reference.

[0081] In some embodiments of the present invention, in the step of inputting the graph data into a preset graph neural network model and the graph neural network outputting a topological feature vector, the graph neural network model includes a graph convolutional layer, an attention mechanism layer and a graph pooling layer arranged in sequence, and the graph pooling layer outputs a topological feature vector.

[0082] In the specific implementation process, the steps of this solution also include model pre-training. In the model pre-training step, a loss function is calculated based on the segmentation probability map of the label and the actual obtained segmentation probability map. The loss function can be the cross-entropy loss function or the MSE loss function, etc. The model is trained based on the loss function using the backpropagation algorithm. The specific loss function used can also be other existing loss functions.
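The two loss functions named in paragraph [0082] can be written out for a predicted and a labeled segmentation probability map. This is a minimal NumPy sketch that assumes a binary (bone versus background) label map and clips predictions away from 0 and 1 for numerical stability; the backpropagation step itself would be handled by a deep learning framework and is not shown.

```python
import numpy as np

def binary_cross_entropy(pred, label, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    pred = np.clip(np.asarray(pred, dtype=np.float64), eps, 1.0 - eps)
    label = np.asarray(label, dtype=np.float64)
    return float(-np.mean(label * np.log(pred) + (1.0 - label) * np.log(1.0 - pred)))

def mse(pred, label):
    """Mean squared error between predicted and labeled probability maps."""
    pred = np.asarray(pred, dtype=np.float64)
    label = np.asarray(label, dtype=np.float64)
    return float(np.mean((pred - label) ** 2))

demo_bce = binary_cross_entropy([0.5, 0.5], [1.0, 0.0])
demo_mse = mse([0.5, 0.5], [1.0, 0.0])
```

Both losses are zero only when the prediction matches the label exactly; cross-entropy penalizes confident mistakes more heavily than MSE does.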

[0083] Another aspect of the present invention relates to a local skeleton image segmentation system based on an attention mechanism. The system includes a computer device, which includes a processor and a memory. The memory stores computer instructions, and the processor is used to execute the computer instructions stored in the memory. When the computer instructions are executed by the processor, the system implements the steps of the method.

[0084] This invention also provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the aforementioned attention-based local skeleton image segmentation method. The computer-readable storage medium can be a tangible storage medium, such as random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, floppy disks, hard disks, removable storage disks, CD-ROMs, or any other form of storage medium known in the art.

[0085] Those skilled in the art will understand that the exemplary components, systems, and methods described in conjunction with the embodiments disclosed herein can be implemented in hardware, software, or a combination of both. Whether implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this invention. When implemented in hardware, it can be, for example, electronic circuits, application-specific integrated circuits (ASICs), appropriate firmware, plug-ins, function cards, etc. When implemented in software, the elements of this invention are programs or code segments used to perform the desired tasks. The programs or code segments can be stored in a machine-readable medium or transmitted over a transmission medium or communication link via data signals carried in a carrier wave.

[0086] It should be clarified that the present invention is not limited to the specific configurations and processes described above and shown in the figures. For the sake of brevity, detailed descriptions of known methods are omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method process of the present invention is not limited to the specific steps described and shown. Those skilled in the art can make various changes, modifications, and additions, or change the order of steps, after understanding the spirit of the present invention.

[0087] In this invention, features described and / or illustrated for one embodiment may be used in the same or similar manner in one or more other embodiments, and / or combined with or in place of features of other embodiments.

[0088] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention. For those skilled in the art, various modifications and variations of the embodiments of the present invention are possible. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A local skeleton image segmentation method based on an attention mechanism, characterized in that the steps of the method include: obtaining the anatomical prior constraint knowledge base and the anatomical correction database, extracting the constraint knowledge of each bone position in the anatomical prior constraint knowledge base, and constructing the constraint knowledge of each bone position as a constraint tensor; acquiring the original bone image and standardizing each image slice to obtain a standard bone image, cropping the standard bone image based on the preset anatomical correction database to obtain a coarse-matched bone ROI image, and inputting the coarse-matched bone ROI image into the first key region enhancement model to obtain an enhanced feature map; expanding the constraint tensor corresponding to the original bone image to the same spatial dimensions as the enhanced feature map to obtain a spatial condition feature map, inputting the enhanced feature map and the spatial condition feature map into the second key region enhancement model to obtain a second spatial weight mask map, inputting the enhanced feature map and the constraint tensor into the channel enhancement processing model to obtain a channel weight mask map, and calculating a fused feature map based on the second spatial weight mask map, the channel weight mask map, and the enhanced feature map; and inputting the fused feature map into a preset diffusion model, which outputs a segmentation probability map, and determining a segmentation mask map based on the segmentation probability map.

2. The local skeleton image segmentation method based on attention mechanism according to claim 1, characterized in that, In the step of cropping the standard bone image based on the preset anatomical correction database, the coordinate range of the ROI region in the standard bone image is calculated based on the preset cropping algorithm to obtain the coarse matching ROI region range. Based on the range of the coarse-matched ROI region, a first ROI region in the standard bone image is determined, and the standard bone image is adjusted into a coarse-matched bone ROI image based on the first ROI region. The coarsely matched skeletal ROI image is input into the first key region enhancement model, and the first key region enhancement model outputs an enhanced feature map.

3. The local skeleton image segmentation method based on attention mechanism according to claim 2, characterized in that, in the step of calculating the coordinate range of the ROI region in the standard skeleton image based on a preset cropping algorithm to obtain the range of the coarse-matched ROI region: the template image corresponding to the standard skeleton image is obtained; the pixel values of the standard skeleton image and the template image are normalized to [0,1]; the normalized standard skeleton image is traversed with a sliding window with a stride of 1; the average pixel value of each window is calculated; and the NCC value of each window is calculated based on the calculated average pixel value of each window. The NCC value of each window is compared with a preset NCC threshold. If it is greater than the preset NCC threshold, an ROI sub-region range is determined based on the corresponding position of the window in the standard skeleton image. The coarse-matched ROI region range is determined based on all of the ROI sub-region ranges taken together.

4. The local skeleton image segmentation method based on attention mechanism according to claim 3, characterized in that the window size is the same as the template image size, and in the step of calculating the NCC value for each window based on the calculated average pixel value of each window, the NCC value is calculated using the following formula: $$\mathrm{NCC}(x,y)=\frac{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left(I(x+i,y+j)-\bar{I}(x,y)\right)\left(T(i,j)-\bar{T}\right)}{\sqrt{\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left(I(x+i,y+j)-\bar{I}(x,y)\right)^{2}\;\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left(T(i,j)-\bar{T}\right)^{2}}}$$ where $\mathrm{NCC}(x,y)$ is the NCC value of the window whose top-left corner is at coordinates $(x,y)$; $m$ represents the height of the template image, and $n$ represents the width of the template image; $I(x+i,y+j)$ is the pixel value at location $(x+i,y+j)$ in the standard skeleton image; $\bar{I}(x,y)$ is the average pixel value of the window whose top-left corner is at coordinates $(x,y)$ in the standard skeleton image; $T(i,j)$ is the pixel value at location $(i,j)$ in the template image; and $\bar{T}$ is the average pixel value of the template image.

5. The local skeleton image segmentation method based on attention mechanism according to claim 2, characterized in that, in the step of inputting the coarse-matched skeletal ROI image into the first key region enhancement model, and the first key region enhancement model outputting an enhanced feature map, the first key region enhancement model is sequentially configured with an input adaptation layer, a max pooling layer, an average pooling layer, a 7×7 convolutional layer, a Sigmoid activation function layer, and a weight fusion layer. The coarse-matched skeletal ROI image is input into the input adaptation layer, which is sequentially configured with a 3×3 convolutional layer, a batch normalization layer, and a ReLU activation function layer. The ReLU activation function layer outputs a normalized feature map. The max pooling layer and the average pooling layer respectively perform pooling processing on the normalized feature map. The outputs of the max pooling layer and the average pooling layer are concatenated and input into the 7×7 convolutional layer. The Sigmoid activation function layer connected to the 7×7 convolutional layer outputs a first spatial weight mask map. The weight fusion layer multiplies the first spatial weight mask map element-wise with the normalized feature map to obtain the enhanced feature map.

6. The local skeleton image segmentation method based on attention mechanism according to claim 1, characterized in that, In the steps of inputting the enhanced feature map and the spatial condition feature map into the second key region enhancement model to obtain the second spatial weight mask map; inputting the enhanced feature map and the constraint tensor into the channel enhancement processing model to obtain the channel weight mask map; and calculating the fused feature map based on the second spatial weight mask map, the channel weight mask map, and the enhanced feature map, the enhanced feature map and the spatial condition feature map are concatenated in the channel dimension to obtain the combined feature map; the combined feature map is input into the second key region enhancement model to obtain the second spatial weight mask map; and the enhanced feature map and the constraint tensor are input into the channel enhancement processing model, which outputs the channel weight mask map.

7. The local skeleton image segmentation method based on an attention mechanism according to claim 6, characterized in that, in the step of inputting the combined feature map into the second key region enhancement model to obtain the second spatial weight mask map, the second key region enhancement model is provided with a max pooling layer, an average pooling layer, a 7×7 convolutional layer, a Sigmoid activation function layer, and a weight fusion layer. The max pooling layer and the average pooling layer each perform pooling on the combined feature map. The outputs of the max pooling layer and the average pooling layer are concatenated and input into the 7×7 convolutional layer. The second spatial weight mask map is output through the Sigmoid activation function layer connected to the 7×7 convolutional layer.

8. The local skeleton image segmentation method based on an attention mechanism according to claim 6, characterized in that, in the step of inputting the enhanced feature map and the constraint tensor into the channel enhancement processing model and the channel enhancement processing model outputting the channel weight mask map, the channel enhancement processing model compresses the enhanced feature map into a channel feature vector through global average pooling, concatenates the channel feature vector with the constraint tensor in the channel dimension to obtain a fused vector, and processes the fused vector into the channel weight mask map through an MLP layer and a Sigmoid function layer arranged in sequence.
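
The channel branch of claim 8 (global average pooling, concatenation with the constraint tensor, MLP, Sigmoid) might look like the following NumPy sketch. The two-layer MLP with a ReLU hidden layer, the hidden width, the random weights, and the final per-channel multiplication are all assumptions made for illustration; the claim only names an MLP layer followed by a Sigmoid layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(enhanced, constraint, w1, b1, w2, b2):
    """enhanced: (C, H, W); constraint: (K,); returns a channel weight mask of shape (C,)."""
    channel_vec = enhanced.mean(axis=(1, 2))             # global average pooling -> (C,)
    fused = np.concatenate([channel_vec, constraint])    # fused vector -> (C + K,)
    hidden = np.maximum(w1 @ fused + b1, 0.0)            # assumed: one hidden layer with ReLU
    return sigmoid(w2 @ hidden + b2)                     # one weight in (0, 1) per channel

rng = np.random.default_rng(1)
C, K, HIDDEN = 8, 3, 4                                   # hypothetical sizes
enhanced = rng.standard_normal((C, 16, 16))
constraint = rng.standard_normal(K)
w1, b1 = rng.standard_normal((HIDDEN, C + K)) * 0.1, np.zeros(HIDDEN)
w2, b2 = rng.standard_normal((C, HIDDEN)) * 0.1, np.zeros(C)

channel_mask = channel_attention(enhanced, constraint, w1, b1, w2, b2)
reweighted = enhanced * channel_mask[:, None, None]      # assumed way of applying the mask
```

Because the constraint values enter before the MLP, the same image features can receive different channel weights for different skeletal positions.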

9. The local skeleton image segmentation method based on an attention mechanism according to claim 1, characterized in that the diffusion model includes an encoder module and a decoder module. The encoder module includes multiple downsampling blocks, each of which includes two 3×3 convolutional layers, a ReLU activation layer, and a 2×2 max pooling layer. The decoder module includes multiple upsampling blocks, each of which includes a 2×2 deconvolutional layer, two 3×3 convolutional layers, and a ReLU activation layer.
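
To make the encoder and decoder layout of claim 9 concrete, the short sketch below traces how the spatial size of a square feature map could change through the blocks. It assumes the 3×3 convolutions use padding 1 (size preserved), the 2×2 max pooling and 2×2 deconvolution use stride 2, and the number of upsampling blocks equals the number of downsampling blocks; none of these values appear in the claim, and `trace_spatial_sizes` is a hypothetical helper.

```python
def trace_spatial_sizes(size, depth):
    """Return (stage, side length) pairs for `depth` down blocks followed by `depth` up blocks."""
    trace = [("input", size)]
    for level in range(1, depth + 1):
        if size % 2:
            raise ValueError(f"side length {size} cannot be halved exactly at down{level}")
        # Two padded 3x3 convolutions keep the size; the 2x2 max pooling halves it.
        size //= 2
        trace.append((f"down{level}", size))
    for level in range(1, depth + 1):
        # The 2x2 stride-2 deconvolution doubles the size; the 3x3 convolutions keep it.
        size *= 2
        trace.append((f"up{level}", size))
    return trace

trace = trace_spatial_sizes(256, 4)
for stage, side in trace:
    print(f"{stage}: {side} x {side}")
```

Under these assumptions the input side length must be divisible by 2 raised to the number of downsampling blocks for the output to match the input resolution.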

10. The local skeleton image segmentation method based on an attention mechanism according to any one of claims 1-9, characterized in that the step of determining a segmentation mask map based on the segmentation probability map output by the diffusion model further includes updating the segmentation probability map output by the diffusion model using an energy function. In this step, the probability density of each position in the segmentation region of the segmentation probability map is calculated, and a data term is calculated based on the probability density of each position. Feature values of multiple segmentation features are determined based on the segmentation probability map. Prior terms are calculated based on the feature value of each segmentation feature and the constraint feature values in the constraint tensor. An energy function is calculated based on the prior terms and the data term, and the segmentation probability map output by the diffusion model is updated based on the energy function.

11. The local skeleton image segmentation method based on an attention mechanism according to claim 10, characterized in that, in the step of calculating the data term based on the probability density at each position of the segmentation probability map, the data term is calculated using the following formula: [formula missing from the source text]; in the step of calculating the prior terms based on the feature value of each segmentation feature and the constraint feature values in the constraint tensor, the prior terms are calculated using the following formula: [formula missing from the source text]; in the step of calculating the energy function based on the prior terms and the data term, the energy function is calculated using the following formula: [formula missing from the source text]; where E represents the energy function value. The remaining symbols, whose glyphs are likewise missing from the source text, represent in order: the data term; the prior terms; the operation max(), which takes the maximum value among its arguments; the feature value of segmentation feature c; the constraint feature value of segmentation feature c in the constraint tensor; the set Q of segmentation features; the preset tolerance threshold of segmentation feature c; the maximum values of the horizontal and vertical coordinates in the segmentation region of the segmentation probability map; the horizontal and vertical coordinate positions within the segmentation region of the segmentation probability map; the probability density at that position in the segmentation probability map; the natural constant; and the preset balance parameter.
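
Since the formulas of claim 11 did not survive in this text, the sketch below shows only one plausible shape for such an energy function, chosen to match the listed symbols (a max() operation, a per-feature tolerance threshold, the natural constant, and a balance parameter). Every formula in it is an assumption: the data term is taken as the mean negative natural logarithm of the probability density, the prior term as a hinge penalty on deviations beyond the tolerance, and the energy as their weighted sum. The feature names and numeric values are hypothetical.

```python
import numpy as np

def data_term(prob, eps=1e-8):
    """Assumed form: mean negative natural log of the probability density over the region."""
    return float(-np.mean(np.log(prob + eps)))

def prior_term(features, constraints, tolerances):
    """Assumed hinge form: penalize only deviations larger than each feature's tolerance."""
    return float(sum(
        max(0.0, abs(features[c] - constraints[c]) - tolerances[c])
        for c in features
    ))

def energy(prob, features, constraints, tolerances, balance=1.0):
    """Assumed combination: data term plus balance parameter times prior term."""
    return data_term(prob) + balance * prior_term(features, constraints, tolerances)

prob = np.full((4, 4), 0.5)                               # toy probability map region
features = {"length_mm": 52.0, "angle_deg": 30.0}         # hypothetical measured features
constraints = {"length_mm": 50.0, "angle_deg": 31.0}      # hypothetical constraint values
tolerances = {"length_mm": 1.0, "angle_deg": 2.0}         # hypothetical tolerance thresholds

e_prior = prior_term(features, constraints, tolerances)
e_total = energy(prob, features, constraints, tolerances, balance=1.0)
```

In this toy case only the length feature exceeds its tolerance, so only it contributes to the prior term; a feature within tolerance adds nothing, which is the behaviour a max() with a threshold would typically provide.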

12. The local skeleton image segmentation method based on an attention mechanism according to claim 1, characterized in that the method further includes the following steps: extracting the coordinates of all vertices in the initial 3D mesh model as node features of a graph, extracting all pairs of adjacent vertices in the initial 3D mesh model to construct the edges of the graph, and thereby obtaining graph data; inputting the graph data into a preset graph neural network model, which outputs a topological feature vector; determining skeletal features based on the topological feature vector; and comparing the skeletal features with the constraint knowledge in the anatomical prior constraint knowledge base to determine anomaly indicators and construct a comparison report.
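
The graph construction step of claim 12 can be sketched as below. The example assumes the initial 3D mesh model is a triangle mesh given as a vertex list and a list of faces holding vertex indices, and that two vertices are adjacent when they share a face edge; the claim does not specify the mesh format, and `mesh_to_graph` is a hypothetical helper. The graph neural network itself is not shown.

```python
import numpy as np

def mesh_to_graph(vertices, faces):
    """Build node features (N, 3) and an undirected edge list (2, E) from a triangle mesh."""
    node_features = np.asarray(vertices, dtype=float)   # vertex coordinates as node features
    edges = set()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            # Store each undirected edge once, with the smaller index first.
            edges.add((min(u, v), max(u, v)))
    edge_index = np.array(sorted(edges), dtype=int).T   # shape (2, E)
    return node_features, edge_index

# Toy mesh: a tetrahedron with 4 vertices and 4 triangular faces.
vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
nodes, edge_index = mesh_to_graph(vertices, faces)
```

The `(2, E)` edge layout is the form many graph neural network libraries accept; a directed variant would simply list each edge in both directions.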

13. A local skeleton image segmentation system based on an attention mechanism, characterized in that the system includes a computer device comprising a processor and a memory. The memory stores computer instructions, and the processor executes the computer instructions stored in the memory. When the computer instructions are executed by the processor, the system implements the steps of the method according to any one of claims 1-12.