Fracture segmentation U-Net based on semi-implicit memory ordinary differential equation

By constructing a U-shaped fracture segmentation network based on a semi-implicit memory ordinary differential equation and combining it with weak feature enhancement and lightweight dynamic compression excitation modules, the invention addresses the high computational cost and insufficient robustness of U-shaped networks and achieves efficient, accurate fracture image segmentation.

CN122289286A · Pending · Publication Date: 2026-06-26 · TAIYUAN UNIVERSITY OF SCIENCE AND TECHNOLOGY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
TAIYUAN UNIVERSITY OF SCIENCE AND TECHNOLOGY
Filing Date
2026-03-27
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing U-shaped networks suffer from high computational cost, insufficient robustness, and high parameter sensitivity in medical image segmentation, especially in complex scenarios where they struggle to effectively understand spatial relationships.

Method used

A fracture segmentation U-shaped network based on a semi-implicit memory ordinary differential equation is adopted. A lightweight U-shaped network architecture is constructed through discretization and combined with a weak feature enhancement module and a lightweight dynamic compression excitation module, which strengthens feature extraction and adaptively fuses local features of different scales.

Benefits of technology

It significantly reduces the number of parameters and computational complexity, while improving the accuracy and robustness of fracture image segmentation. It can capture complex fracture structures more accurately with smoother boundaries, adapting to different task requirements.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention belongs to the field of medical image segmentation and analysis technology and discloses a U-shaped network for fracture segmentation based on a semi-implicit memory ordinary differential equation. The technical solution is as follows. By employing different discretization methods of the neural memory ordinary differential equation, two plug-and-play decoders are proposed. These decoders integrate features at different levels by processing information from skip connections and performing numerical operations on the upsampling path. Because the fracture region is small and has low contrast with surrounding tissue, a weak feature enhancement module and a lightweight dynamic compression excitation module are designed in the skip connection part to improve feature extraction. Experiments on datasets demonstrate that the number of parameters and floating-point operations are significantly reduced while performance is maintained, and that fracture segmentation is enhanced. The proposed network significantly improves segmentation performance and has the potential to adapt to all U-shaped networks.

Description

Technical Field

[0001] This invention belongs to the field of medical image segmentation and analysis technology, specifically involving a fracture segmentation U-shaped network based on a semi-implicit memory ordinary differential equation.

Background Technology

[0002] Deep learning is increasingly important in medicine, providing strong support for disease detection and diagnosis. UNet, a landmark encoder-decoder convolutional neural network (CNN) with skip connections, has fully validated its effectiveness in medical image segmentation and has become the foundational framework for the best-known medical image segmentation methods. To achieve higher accuracy, many studies have introduced complex modules or increased the number of parameters. For example, ResUNet combines the UNet architecture with ResNet-inspired residual connections, Swin UNet uses the Swin Transformer architecture, and TransFuse employs a dual-branch configuration that combines CNNs and ViTs to capture global and local information simultaneously.

[0003] Existing U-shaped networks improve on UNet's performance, but most of them increase computational cost, which poses challenges in real-world scenarios with limited computational resources. To address this issue, some researchers have turned to lightweight models. In 2022, Valanarasu and Patel proposed UNeXt, which combines UNet with the MLP design proposed by Tolstikhin et al. in 2021 to obtain a lightweight architecture that achieves excellent performance while reducing the number of parameters and the computational requirements. Also in 2022, Ruan et al. proposed MALUNet, which reduces model size by decreasing the number of channels and integrating multiple attention modules. In 2023, Ruan et al. proposed the efficient group-enhanced EGE-UNet, which reduces the number of parameters and the computational complexity by integrating enhanced attention mechanisms and feature fusion modules.

[0004] UNet is the cornerstone of all UNet variants for semantic image segmentation. It has a U-shaped structure with a contracting path for context extraction and an expanding path for precise localization, and its skip connections preserve fine-grained feature details. In 2018, Oktay et al. proposed Attention UNet (Att-UNet), which introduces an attention mechanism into UNet and significantly improves its efficiency in extracting image features. In 2022, Valanarasu and Patel proposed UNeXt, a lightweight medical image segmentation network that integrates convolutions and multilayer perceptrons. It greatly reduces network parameters and computational workload, but it is not robust enough to low-quality images and is highly sensitive to hyperparameters. Also in 2022, Ruan et al. proposed MALUNet, which expands the receptive field through dilated convolutions, focuses on the target region through gated attention mechanisms, and reduces computational overhead through depthwise separable convolutions. It performs well in skin cancer segmentation, but in complex scenarios such as tumor boundaries and intertwined multiple organs its understanding of spatial relationships is weaker than that of Transformer models. Building on MALUNet, in 2023 Ruan et al. proposed EGE-UNet, which further improves the attention mechanism by performing Hadamard attention on feature groups, extracts pathological information from different perspectives, and introduces a new feature fusion module. It significantly reduces network complexity while preserving network performance, and it outperforms many large networks in both performance and efficiency.

[0005] Ordinary differential equations (ODEs) describe a class of dynamical systems. Neural ODEs (NODEs) provide the mathematical principles that explain ResNet, laying the foundation for unifying neural networks and ODEs. A significant advantage of NODEs is that they avoid storing intermediate quantities during forward propagation, greatly reducing parameters and computational overhead. However, mapping data through NODEs has limitations. When problems are modeled with differential equations, attractors of the dynamical system have been shown to be related to memory capacity, yet traditional NODEs cannot effectively use the memory capacity that attractors provide.

[0006] nmODE is a specialized variant of NODE, designed to overcome the inherent limitations of traditional NODEs and exploit the full memory capacity offered by dynamical systems. It enhances the nonlinear expressive power of neural networks by employing implicit mappings and nonlinear activation functions. Modeled on the dynamical system that governs memory in neocortical neurons, nmODE has a distinctive feature: clear dynamics achieved by separating learning neurons from memory neurons. Unlike previous NODE methods, nmODE treats input data as an external parameter rather than as the initial value of the ODE. Learning occurs only in the learning part, while the memory part maps the input to its global attractor, establishing a mapping from the input space to the memory space. nmODE has been applied successfully in various segmentation tasks. For example, in 2023 Dong et al. proposed nmPLS-Net, which uses the robust nonlinear representation and storage capabilities of nmODE to build a decoding network based on edge segmentation and achieves accurate lobe segmentation. Other studies have integrated nmODEs into UNet using direct discretization: Wang et al. (2024) reported promising results in tasks such as diabetic kidney segmentation, and He et al. (2023) in liver segmentation. Also in 2023, Hu et al. used nmODEs to enhance the robustness of medical image segmentation and achieved good results. These applications demonstrate the versatility and effectiveness of nmODEs across medical image segmentation challenges.

[0007] While the aforementioned methods demonstrate impressive capabilities in reducing model parameters and computational complexity, they often face challenges in adapting to existing frameworks and lack generality. To achieve lightweighting, most of these networks significantly reduce the number of channels in their architecture, trading network performance against reductions in parameters and computation. However, these networks typically treat lightweighting and network performance as separate objectives and lack a comprehensive solution.

Summary of the Invention

[0008] To address the technical problems existing in the prior art, this invention provides a fracture segmentation U-shaped network based on a semi-implicit memory ordinary differential equation. The nmODE is transformed by discretization into a decoder that can be embedded in the U-shaped network, and a weak feature enhancement module and a lightweight dynamic compression excitation module are added to strengthen feature extraction and make the model lighter.

[0009] To achieve the above objectives, the technical solution adopted in this invention is a fracture segmentation U-shaped network based on semi-implicit memory ordinary differential equations. A lightweight U-shaped network architecture is constructed using discretizations of the nmODE. The U-shaped network extracts low-level features through skip connections and injects them into the internal state for feature aggregation, yielding a series of low-parameter nmODE decoders, which are discretized into concise modules that can be embedded in the U-shaped network. A weak feature enhancement module and a lightweight dynamic compression excitation module are designed for the feature extraction and skip connection parts, adaptively fusing local features of different scales for the fracture region in the fracture image. The weak feature enhancement module adapts the model to the actual task requirements by dynamically selecting and fusing relevant features. The lightweight dynamic compression excitation module dynamically adjusts its structure according to the feature extraction stage of the network to balance detail preservation against the capture of semantic relationships.

[0010] The nmODE decoder is discretized using two different methods. The differential equation of the nmODE is as follows:

[0011]

[0012] where the left-hand side is the derivative of the state $y$ with respect to time $t$, describing the rate of change of $y$ over time; $y(t)$ is the network's internal state vector at time $t$, corresponding to the feature information of the upsampling path; the external input vector at time $t$ corresponds to the low-level feature information passed by the skip connection; the learnable parameters associated with the external input act on the feature transformation function to transform and adapt the low-level features; the feature transformation function maps the skip-connection input to the feature dimensions of the internal state $y(t)$; and the nonlinear activation function uses batch normalization.

[0013] Given the derivative of $y$, choose a step size along the $t$ axis, advance by one step from the time point $t_n$ to the next time point $t_{n+1}$, and set the value $y_{n+1}$ at $t_{n+1}$ as:

[0014]

[0015] where $y_n$ is the approximate value of the network's internal state $y$ at discrete time step $t_n$, i.e. $y_n \approx y(t_n)$, corresponding to the input features of a given layer of the U-shaped network decoder; $y_{n+1}$ is the approximate value of the internal state $y$ at discrete time step $t_{n+1}$, corresponding to the output features of the current decoder layer; the discretization step size takes the value $1/L$ in the U-shaped network; the derivative of the state $y$ with respect to time $t$ at time step $t_n$ equals the right-hand side of formula (1) evaluated at $(t_n, y_n)$, for which a shorthand notation is used; and $t_n$ is the $n$-th discrete time point.

[0016] Let ... From formulas (1) and (2), we can derive:

[0017]

[0018] In a U-shaped network, the value at the lowest level serves as the initial value. The process is carried out in discrete form, with a step size inversely proportional to the number of network layers:

[0019]

[0020] When applied to a U-shaped network, $L$ is the total number of layers and the layer index denotes the current layer. In formula (4), the state term is the input of the current layer of the upsampling path, used as the initial value of the solve; the input term is the information transmitted via the skip connection; the step size is $1/L$; and the parameter term denotes the parameters of the current layer's skip connection.

[0021] Given the derivative of $y$, choose a step size along the $t$ axis and set $y_{n+1}$ as follows:

[0022]

[0023] where $y_{n+1}$ is the approximate solution at time $t_{n+1}$, i.e. $y_{n+1} \approx y(t_{n+1})$, and is the final result determined by $y_n$ and $y_{n-1}$.

[0024] Let ... From formulas (1) and (5), we can derive:

[0025]

[0026] Rewrite formula (6) in discrete form:

[0027]

[0028] In the decoder whose internal structure is designed according to formula (7), the linear multistep method takes the upsampling-path information of two layers as input, namely the current layer and the previous layer, which constrains the layers at which it can be applied;

[0029] The nonlinear term of the nmODE is treated as the nonlinear part and solved with an explicit method.

[0030] The right-hand side of the nmODE is split into two neural networks whose form is not restricted. It is expressed as follows:

[0031]

[0032] where the left-hand side is the derivative of the state $y$ with respect to time $t$ and describes the rate at which the system state evolves; $y$ is the state vector of the system; the first network is a nonlinear neural network mapping and provides the nonlinear term; and the second network is a linear neural network mapping and provides the linear term. Because the second mapping is linear, it can be written as a linear transformation matrix acting on $y$, and the expression becomes:

[0033]

[0034] In the forward pass, an implicit-explicit method is used to solve the partitioned nmODE: on the right-hand side, the nonlinear term is treated explicitly while the linear term is treated implicitly. At each time step, the solution is advanced from $y_n$ to $y_{n+1}$ by the following formulas:

[0035]

[0036]

[0037] where the intermediate state of the $i$-th stage is used to gradually approximate the solution at the final time step; $y_n$ is the state at the $n$-th time step; $y_{n+1}$ is the state at the $(n+1)$-th time step; the step length is the time step; the two function evaluations at the $j$-th stage are the outputs of the nonlinear and linear terms, respectively; one coefficient matrix holds the explicit coefficients and the other holds the implicit coefficients; and the two sets of weighting coefficients give the final weights of the explicit part and of the implicit part.

[0038] To transform formula (11) into a system of linear equations, define the linear term as the linear transformation matrix applied to the stage value. Substituting this into formula (11) gives:

[0039]

[0040] The terms containing the unknown $i$-th stage value are moved to the left-hand side, and the remaining known terms stay on the right. After simplification, we obtain formula (13):

[0041]

[0042] At this point, the left-hand side of formula (13) collects the linear terms in the unknown stage value, and the right-hand side consists entirely of known quantities: $y_n$, the state at the previous time step; the outputs of the nonlinear term for the previous $i-1$ stages; and the known stage values of the previous $i-1$ stages. The identity matrix on the left has the same dimension as the state vector.

[0043] Features that have undergone sparse feature preservation and dynamic channel compression are concatenated and then subjected to global average pooling to compress spatial dimensions and obtain channel-level global information. Channel weights are then generated using a 1×1×1 convolutional layer and a sigmoid activation function. The formula is:

[0044]

[0045] where the two concatenated inputs are the features after sparsity processing and after dynamic compression processing, respectively, and each element of the resulting channel-weight vector corresponds to the importance of one channel.

[0046] The channel weights are applied channel by channel to the concatenated features, and a 1×1×1 convolutional layer then reduces the number of channels from 2C back to the original number C, yielding the channel-filtered features. The formula is:

[0047]

[0048] The spatial-level dynamic selection process operates directly on the two original input features: each is passed through a 1×1×1 convolutional layer for dimensionality reduction and enhanced feature interaction, and the results are added element-wise and passed through a sigmoid activation function to generate the spatial weights. The formula is:

[0049]

[0050] where each element of the spatial weight map corresponds to the importance (between 0 and 1) of one spatial location in the feature map; a larger value indicates that the location belongs to a key segmentation region.

[0051] The spatial weights are applied location by location to the channel-filtered features to obtain the final adaptively fused features. The formula is:

[0052]

[0053] The weak feature enhancement module (WFEM) integrates low-level edge detail and high-level structural information by fusing features from different layers and scales. This lets the model analyze fracture images from both global and local perspectives, improving the accuracy and robustness of fracture image segmentation.
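
The channel-level and spatial-level weighting described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function and argument names are hypothetical, the batch dimension is omitted, a 1×1×1 convolution is modeled as a channel-mixing matrix, and the spatial branch is assumed to reduce to a single channel.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1x1x1(x, weight):
    """Pointwise 3-D convolution on a (C_in, D, H, W) array; weight is (C_out, C_in)."""
    return np.einsum("oc,cdhw->odhw", weight, x)

def weak_feature_fusion(f_sparse, f_dyn, x_a, x_b, w_chan, w_reduce, w_a, w_b):
    """Channel filtering followed by spatial weighting (hypothetical names)."""
    cat = np.concatenate([f_sparse, f_dyn], axis=0)              # (2C, D, H, W)
    pooled = cat.mean(axis=(1, 2, 3), keepdims=True)             # global average pooling
    chan_w = sigmoid(conv1x1x1(pooled, w_chan))                  # one weight per channel
    filtered = conv1x1x1(chan_w * cat, w_reduce)                 # 2C -> C channels
    spat_w = sigmoid(conv1x1x1(x_a, w_a) + conv1x1x1(x_b, w_b))  # (1, D, H, W), in (0, 1)
    return spat_w * filtered                                     # adaptively fused features

rng = np.random.default_rng(0)
C, D, H, W = 2, 3, 4, 4
f_sparse, f_dyn, x_a, x_b = (rng.normal(size=(C, D, H, W)) for _ in range(4))
fused = weak_feature_fusion(
    f_sparse, f_dyn, x_a, x_b,
    w_chan=rng.normal(size=(2 * C, 2 * C)),
    w_reduce=rng.normal(size=(C, 2 * C)),
    w_a=rng.normal(size=(1, C)),
    w_b=rng.normal(size=(1, C)),
)
print(fused.shape)
```

Both weights pass through a sigmoid, so they rescale features rather than replace them; the random weight matrices stand in for learned convolution kernels.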

[0054] The loss functions used are the binary cross-entropy (BCE) loss and the Dice loss. The overall loss function is:

[0055]

[0056] where the weighting coefficient is the loss weight, and the two weighted terms are the binary cross-entropy loss and the Dice coefficient loss;

[0057] The expression for BCE loss is:

[0058]

[0059] where the quantities are the ground-truth label of each pixel, the predicted probability of each pixel, and the total number of pixels in the batch;

[0060] The expression for Dice loss is:

[0061]

[0062] where the quantities are the predicted probability and the ground-truth label at each location of each sample, the height and width of the image, the smoothing term, and the batch size.
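
As a minimal sketch of how the two losses combine, the following pure-Python code implements the textbook BCE and Dice losses. The convex combination with a single weight, the per-sample averaging of the Dice term, the default smoothing value, and all function names are assumptions made for illustration.

```python
import math

def bce_loss(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over every pixel in the batch."""
    total, count = 0.0, 0
    for p_sample, y_sample in zip(probs, labels):
        for p, y in zip(p_sample, y_sample):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count

def dice_loss(probs, labels, smooth=1.0):
    """One minus the smoothed Dice coefficient, averaged over samples."""
    per_sample = []
    for p_sample, y_sample in zip(probs, labels):
        inter = sum(p * y for p, y in zip(p_sample, y_sample))
        denom = sum(p_sample) + sum(y_sample)
        per_sample.append(1.0 - (2.0 * inter + smooth) / (denom + smooth))
    return sum(per_sample) / len(per_sample)

def total_loss(probs, labels, weight=0.5):
    """Assumed combination: weight * BCE + (1 - weight) * Dice."""
    return weight * bce_loss(probs, labels) + (1.0 - weight) * dice_loss(probs, labels)

# Each sample is a flattened H*W list of probabilities or 0/1 labels.
labels = [[1, 0, 1, 0]]
print(total_loss([[0.9, 0.1, 0.8, 0.2]], labels))
```

The Dice term counters the class imbalance of small fracture regions, while BCE provides per-pixel gradients.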

[0063] Compared with the prior art, the specific beneficial effects of this invention are reflected in:

[0064] First, this invention proposes two plug-and-play decoders by employing different discretization methods of neural memory ordinary differential equations (nmODEs). These decoders process information from skip connections and perform numerical operations on the upsampling path, thereby fusing features at different abstraction levels. They can be embedded into different U-shaped networks, where they significantly reduce the number of parameters and floating-point operations without degrading performance, and they enhance fracture segmentation.

[0065] Second, this invention proposes a weak feature enhancement module in the skip connection part. Because fracture regions are small and poorly distinguished from the background, the module is designed to make the fusion of small targets and sparse features more robust. Through dual-dimensional dynamic calibration, it considers global information in both the channel and spatial dimensions, avoiding the feature loss caused by single-dimensional screening.

[0066] Third, this invention proposes a lightweight feature extraction module that dynamically matches feature requirements at different stages, preserves details at the shallow level and captures semantics at the deep level, and significantly reduces the number of model parameters and computational cost without reducing segmentation performance.

[0067] Fourth, compared with the linear combination and simple convolution of traditional decoders, the segmentation network based on the semi-implicit neural memory ordinary differential equation proposed in this invention is better able to capture small and complex structures in fracture images, and it segments subtle fracture regions more accurately and with smoother boundaries. This design not only improves segmentation performance but also, through dynamic trajectory modeling, adaptively fuses the low-level detail features of the skip connections (such as edges and textures) with the high-level semantic features of the upsampling path (such as the overall contour of the fracture). By applying the semi-implicit neural memory ordinary differential equation, it achieves the goals of being universal, efficient, high-performance, and easy to deploy.

Attached Figure Description

[0068] Figure 1 is a general framework diagram of the present invention.

[0069] Figure 2 is a comparison chart showing the differences between a traditional NODE and an nmODE.

[0070] Figure 3 is a diagram of the internal structure of the semi-implicit Euler decoder.

[0071] Figure 4 is a diagram of the internal structure of the semi-implicit linear multistep decoder.

[0072] Figure 5 is a schematic diagram of the weak feature enhancement module.

[0073] Figure 6 is a schematic diagram of the lightweight dynamic compression excitation module.

[0074] Figure 7 shows the ablation experiments for different modules.

[0075] Figure 8 is a visualization of the segmentation results of the present invention: Figure 8 (a) is a visualization for the YSLMD decoder, and Figure 8 (b) is a visualization for the YSEED decoder.

[0076] Figure 9 shows the ablation experiment on the initial number of channels of y(0).

Detailed Implementation

[0077] To make the technical problems to be solved, the technical solutions, and the beneficial effects of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit the present invention.

[0078] As shown in Figure 1, the fracture segmentation U-shaped network based on a semi-implicit memory ordinary differential equation uses discretizations of the nmODE to build a lightweight U-shaped network architecture. nmODE is naturally suited to U-shaped network decoders. These U-shaped networks use skip connections to extract low-level features and inject them into the internal state for feature aggregation, thereby generating a series of low-parameter nmODE decoders. The decoders inherit the nonlinear expressiveness, memory ability, and stability of nmODEs, and discretization turns them into concise modules that can be embedded in U-shaped networks.

[0079] Figure 2 illustrates the input differences between a traditional NODE and an nmODE. In a typical NODE configuration, the initial value is derived from the data itself, and the output represents the numerical solution of the ODE. In contrast, in the nmODE architecture the data serves as a continuous external input, and the initial value is set to an arbitrary value, such as 0. nmODE receives two inputs, which aligns naturally with the decoder of the U-shaped network: information from the skip connections is used as the sequential external input to the nmODE decoder, while information from the upsampling path is used as the internal state. This allows the nmODE to be fully exploited.
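
To make the contrast concrete, the following minimal sketch integrates the one-dimensional nmODE known from the literature, dy/dt = -y + sin²(y + γ), with explicit Euler steps. It is illustrative only: the scalar setting, the value of γ (standing in for the transformed external input), the step size, and the function name `integrate_nmode` are assumptions and not part of the invention.

```python
import math

def integrate_nmode(gamma, y0=0.0, step=0.1, n_steps=500):
    """Explicit-Euler integration of dy/dt = -y + sin^2(y + gamma).

    gamma plays the role of the external input (the data); y0 is the
    initial state, which is chosen independently of the data.
    """
    y = y0
    for _ in range(n_steps):
        y = y + step * (-y + math.sin(y + gamma) ** 2)
    return y

# The data enters through gamma, not through the initial value, so
# different initial states settle on the same attractor.
from_zero = integrate_nmode(gamma=1.0, y0=0.0)
from_five = integrate_nmode(gamma=1.0, y0=5.0)
print(abs(from_zero - from_five) < 1e-9)
```

Because the final state depends only on γ, the initial value can be fixed (for example at 0) while the data is supplied as the external input.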

[0080] In the improved lightweight U-shaped network, the upsampling path is parameter-free. Only a few parameters, located in the skip connections, handle information integration so that the skip features match the feature sizes on the upsampling path. The nmODE decoder is discretized using two different methods and replaces the original decoder of the U-shaped network. The internal structure of the decoder depends on the discretization method. Both discretization methods solve the initial value problem of the nmODE and share common steps.

[0081] First, the differential equation of the nmODE is as follows:

[0082]

[0083] where the left-hand side is the derivative of the state $y$ with respect to time $t$, describing the rate of change of $y$ over time; $y(t)$ is the network's internal state vector at time $t$, corresponding to the feature information of the upsampling path; the external input vector at time $t$ corresponds to the low-level feature information passed by the skip connection; the learnable parameters associated with the external input act on the feature transformation function to transform and adapt the low-level features; the feature transformation function maps the skip-connection input to the feature dimensions of the internal state $y(t)$; and the nonlinear activation function, which enhances the nonlinear expressive power of the network, employs batch normalization.

[0084] Given the derivative of $y$, choose a step size along the $t$ axis, advance by one step from the time point $t_n$ to the next time point $t_{n+1}$, and set the value $y_{n+1}$ at $t_{n+1}$ as:

[0085]

[0086] where $y_n$ is the approximate value of the network's internal state $y$ at discrete time step $t_n$, i.e. $y_n \approx y(t_n)$, corresponding to the input features of a given layer of the U-shaped network decoder; $y_{n+1}$ is the approximate value of the internal state $y$ at discrete time step $t_{n+1}$, corresponding to the output features of the current decoder layer; the discretization step size is $1/L$ in the U-shaped network; the derivative of the state $y$ with respect to time $t$ at time step $t_n$ equals the right-hand side of formula (1) evaluated at $(t_n, y_n)$, for which a shorthand notation is used; and $t_n$ is the $n$-th discrete time point.
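
As a worked sketch of this step, assume the nmODE right-hand side has the commonly used form $F = -y + f(y + g(x, \theta))$ and write the step size as $\delta$. The symbol names $f$, $g$, $x$, $\theta$, $\delta$, and $F$ are chosen here for illustration. Under these assumptions, the explicit Euler update reads:

```latex
\begin{aligned}
y_{n+1} &= y_n + \delta\, F(t_n, y_n) \\
        &= y_n + \delta\,\bigl(-y_n + f\bigl(y_n + g(x_n, \theta_n)\bigr)\bigr) \\
        &= (1-\delta)\, y_n + \delta\, f\bigl(y_n + g(x_n, \theta_n)\bigr),
\qquad \delta = \tfrac{1}{L}.
\end{aligned}
```

With $\delta = 1/L$, each decoder layer blends the incoming state with an activated fusion of the state and the transformed skip features, and the only learnable parameters sit inside $g$.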

[0087] Theorem 1: The explicit Euler method is conceptually the most direct discretization method and the most convenient in practice. Let ... From formulas (1) and (2), we can derive:

[0088]

[0089] Formula (3) is written for continuous values and must be adapted to a discrete neural network. In a U-shaped network, the value at the lowest level serves as the initial value, and the initial value problem is solved while moving up the upsampling path. The process is carried out in discrete form, with a step size inversely proportional to the number of network layers:

[0090]

[0091] When applied to a U-shaped network, $L$ is the total number of layers and the layer index denotes the current layer. In formula (4), the state term is the input of the current layer of the upsampling path, used as the initial value of the solve; the input term is the information transmitted via the skip connection; the step size is $1/L$; and the parameter term denotes the parameters of the current layer's skip connection. The internal implementation of the decoder based on formula (4) is shown in Figure 3.
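
A single decoder layer of this kind can be sketched in plain Python as follows. The sketch assumes an nmODE right-hand side of the form -y + f(y + g(x)), uses tanh in place of the batch-normalized activation, and omits upsampling. The names `euler_decoder_step` and `halve` are hypothetical, with `halve` standing in for the learnable skip-connection transform.

```python
import math

def euler_decoder_step(state, skip, n_layers, transform, activation=math.tanh):
    """One explicit-Euler decoder layer.

    Computes y_next = (1 - d) * y + d * f(y + g(x)) element-wise with
    d = 1 / n_layers, assuming the right-hand side -y + f(y + g(x)).
    """
    d = 1.0 / n_layers
    adapted = transform(skip)  # g: adapt skip features to the state
    return [(1.0 - d) * y + d * activation(y + gx) for y, gx in zip(state, adapted)]

def halve(features):
    """Hypothetical stand-in for the learnable skip-connection transform."""
    return [0.5 * v for v in features]

state = [0.0, 1.0]   # features arriving on the upsampling path
skip = [2.0, -2.0]   # features arriving through the skip connection
next_state = euler_decoder_step(state, skip, n_layers=4, transform=halve)
print(next_state)
```

The state update itself has no parameters; everything learnable is confined to the transform applied to the skip features, which is what keeps such a decoder light.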

[0092] Similarly, given the derivative of $y$, choose a step size along the $t$ axis and set $y_{n+1}$ as follows:

[0093]

[0094] where $y_{n+1}$ is the approximate solution at time $t_{n+1}$, i.e. $y_{n+1} \approx y(t_{n+1})$, and is the final result determined by $y_n$ and $y_{n-1}$.

[0095] Linear multistep methods use linear combinations of derivatives at several selected points. Most commonly two points are selected, and the unknown point is computed from the information at the two known points; this linear multistep method is also called Euler's two-step method. Let ... From formulas (1) and (5), we can derive:

[0096]

[0097] Rewrite formula (6) in discrete form:

[0098]

[0099] The internal structure of the decoder designed according to formula (7) is shown in Figure 4. The linear multistep method requires the upsampling-path information of two layers as input, namely the current layer and the previous layer. This constraint cannot be met at layer $L$; therefore, the decoder of layer $L$ applies EED.
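
The two-step idea can be sketched on scalars as follows. The sketch assumes the textbook two-step (midpoint) rule y_{n+1} = y_{n-1} + 2δF(y_n), an nmODE right-hand side of the form -y + f(y + g(x)) with tanh as the activation, and an Euler step wherever only one state is available. All names and the assignment of skip inputs to steps are illustrative assumptions.

```python
import math

def rhs(y, gx, f=math.tanh):
    """Assumed nmODE right-hand side: -y + f(y + g(x))."""
    return -y + f(y + gx)

def euler_step(y_curr, gx, d):
    """Single-step fallback used when only one state is known."""
    return y_curr + d * rhs(y_curr, gx)

def two_step(y_prev, y_curr, gx, d):
    """Two-step update: y_next = y_prev + 2 * d * F(y_curr)."""
    return y_prev + 2.0 * d * rhs(y_curr, gx)

def decode(y_start, skip_inputs):
    """Run all decoder steps; the first one falls back to Euler."""
    d = 1.0 / len(skip_inputs)
    states = [y_start, euler_step(y_start, skip_inputs[0], d)]
    for gx in skip_inputs[1:]:
        states.append(two_step(states[-2], states[-1], gx, d))
    return states

states = decode(0.0, [1.0, 1.0, 0.5])
print(states)
```

Each two-step update reads two earlier states, which is why the first step needs a single-step method to get started.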

[0100] The solution logic of YSLMD is similar to that of YSEED. In YSLMD, the linear term of the nmODE is treated as the linear part and solved with an implicit method, while the nonlinear term is treated as the nonlinear part and solved with an explicit method.

[0101] To accommodate the semi-implicit solution logic, the right-hand side of the nmODE is split into two neural networks whose form is not restricted. It is expressed as follows:

[0102]

[0103] where the left-hand side is the derivative of the state $y$ with respect to time $t$ and describes the rate at which the system state evolves; $y$ is the state vector of the system; the first network is a nonlinear neural network mapping (such as a feedforward neural network or MLP) and provides the nonlinear term; and the second network is a linear neural network mapping (such as a fully connected or convolutional layer without an activation function) and provides the linear term. Because the second mapping is linear, it can be written as a linear transformation matrix acting on $y$, and the expression becomes:

[0104]

[0105] In the forward pass, the implicit-explicit Runge-Kutta (IMEX-RK) method is used to solve the partitioned nmODE: on the right-hand side, the nonlinear term is treated explicitly while the linear term is treated implicitly. At each time step, the solution is advanced from $y_n$ to $y_{n+1}$ by the following formulas:

[0106]

[0107]

[0108] where the intermediate state (stage value) of the $i$-th stage is used to gradually approximate the solution at the final time step; $y_n$ is the state at the $n$-th time step (the given input); $y_{n+1}$ is the state at the $(n+1)$-th time step (to be determined and output); the step length is the time step, that is, the interval between two adjacent time points; the two function evaluations at the $j$-th stage are the outputs of the nonlinear term (treated explicitly) and the linear term (treated implicitly), respectively; the explicit coefficient matrix is strictly lower triangular, so that explicit calculations depend only on earlier stages; the implicit coefficient matrix is lower triangular and may have non-zero diagonal entries; and the two sets of weight coefficients give the final weights of the explicit part and of the implicit part. All coefficients are defined by the Butcher tableau.

[0109] To avoid nonlinear iteration and improve computational efficiency, formula (11) can be transformed into a system of linear equations. Specifically, define the linear term at any stage $k$ as the linear transformation matrix applied to the $k$-th stage value. Substituting this into formula (11) gives:

[0110]

[0111] The terms containing the unknown $i$-th stage value (that is, the term for which $j = i$) are moved to the left-hand side, and the remaining known terms stay on the right. After simplification, we obtain formula (13):

[0112]

[0113] At this point, the left side of formula (13) contains unknowns. The set of linear terms, where the right-hand side consists entirely of known quantities, where... It is the state of the previous time step. The identity matrix, dimensions, and state vector are given. Consistent It was before The output of the nonlinear term of the stage, It was before The known stage values ​​of the stage.

[0114] Formula (13) transforms the problem into a system of linear formulas for solution, which can be quickly obtained using readily available linear solvers (such as LU decomposition and matrix-independent iteration methods). No iteration is required. Linear transformation matrix. It is a constant (recalculated only when the model parameters are updated), therefore the coefficient matrix on the left is... It can be reused across different stages of the same time step, or even different time steps, which greatly reduces the computational cost.
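As a worked special case (an illustration under assumed notation, not taken from the filing), consider the simplest first-order IMEX pair, in which the nonlinear term $f$ is advanced by forward Euler and the linear term $g(y) = Ay$ by backward Euler. With state $y_n$, step size $h$ and identity matrix $I$, the stage equations collapse to a single linear solve per time step:

```latex
\begin{aligned}
Y_1 &= y_n, \\
\left(I - hA\right) Y_2 &= y_n + h\, f(Y_1), \\
y_{n+1} &= Y_2 .
\end{aligned}
```

Because $I - hA$ does not depend on $n$, its factorization can be computed once and reused at every time step, which is the saving described above.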

[0115] The nonlinear explicit part $f$ is evaluated by direct forward computation of the neural network on results that are already known, without iteratively solving a system of equations: when the $i$-th stage value $Y_i$ is computed, it relies only on the known stage values $Y_1, \dots, Y_{i-1}$ of the previous $i-1$ stages and does not involve the unknown of the current stage.

[0116] Taking the s-stage IMEX-RK as an example:

[0117] Stage 1 ($i = 1$): because the explicit summation term is empty (equal to 0) when $i = 1$, $f$ does not need to be evaluated; proceed directly to solving the linear implicit part for $Y_1$;

[0118] Stage 2 ($i = 2$): first call the neural network $f$ with the known stage value $Y_1$ of stage 1 as input and obtain $f(Y_1)$ through one forward propagation. The result is substituted into formula (11) as a known quantity to assist in solving the linear implicit part for $Y_2$;

[0119] Stage $i$ (general $i \le s$): collect the known stage values $Y_1, \dots, Y_{i-1}$ of the previous $i-1$ stages, feed each of them into $f$ to obtain $f(Y_1), \dots, f(Y_{i-1})$, and then sum them through the explicit terms of formula (11), $h \sum_{j=1}^{i-1} a^{E}_{ij} f(Y_j)$, thus obtaining the known terms required to solve the linear implicit part.

[0120] Time step update: after all $s$ stage values $Y_1, \dots, Y_s$ have been calculated, the outputs $f(Y_i)$ are summed according to the weights $b^{E}_{i}$ in formula (11) and added to the weighted result of the linear implicit part to obtain the output of the current time step, $y_{n+1}$.
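The stage-by-stage procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the decoder of the invention: the state is a scalar, so the linear map $A$ becomes a number `a_lin` and the linear solve becomes a division; the function name `imex_rk_step` and the first-order tableau constants (forward Euler paired with backward Euler) are hypothetical choices made for the example.

```python
def imex_rk_step(y_n, h, f, a_lin, AE, AI, bE, bI):
    """Advance y' = f(y) + a_lin * y by one IMEX-RK step (scalar state).

    AE / AI are the explicit / implicit Butcher matrices (lists of rows),
    bE / bI the corresponding final weights.
    """
    s = len(bE)
    Y = []  # stage values Y_1 .. Y_s
    F = []  # explicit evaluations f(Y_1) .. f(Y_s)
    for i in range(s):
        # Known right-hand side: only earlier stages (j < i) appear.
        rhs = y_n
        for j in range(i):
            rhs += h * (AE[i][j] * F[j] + AI[i][j] * a_lin * Y[j])
        # Linear implicit solve; for a vector state this would be a
        # solve with the matrix (I - h * AI[i][i] * A).
        Y_i = rhs / (1.0 - h * AI[i][i] * a_lin)
        Y.append(Y_i)
        F.append(f(Y_i))  # one forward evaluation of the nonlinear term
    # Time step update using the final weights.
    y_next = y_n
    for i in range(s):
        y_next += h * (bE[i] * F[i] + bI[i] * a_lin * Y[i])
    return y_next


# Assumed first-order tableau: forward Euler (explicit) + backward Euler (implicit).
AE_EULER = [[0.0, 0.0], [1.0, 0.0]]
AI_EULER = [[0.0, 0.0], [0.0, 1.0]]
BE_EULER = [1.0, 0.0]
BI_EULER = [0.0, 1.0]

if __name__ == "__main__":
    y = 0.0
    for _ in range(200):
        y = imex_rk_step(y, 0.1, lambda v: 1.0, -2.0,
                         AE_EULER, AI_EULER, BE_EULER, BI_EULER)
    print(y)  # approaches the steady state of y' = 1 - 2y
```

No nonlinear iteration appears anywhere in the loop: each stage costs one evaluation of `f` and one linear solve, which is the property the text relies on.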

[0121] To address the differences in shape and size of fracture regions in fracture images, the present invention adaptively fuses local features at different scales to improve the model's ability to segment complex structures. A weak feature enhancement module is designed in the skip connection part; it dynamically selects and fuses relevant features, replacing the traditional feature fusion method, so that the model can be optimized according to the actual task requirements. The weak feature enhancement module is shown in Figure 5.

[0122] Considering that the feature signals of some small organs or sparse structures in fracture images are weak and easily covered by the strong features of large organs during the fusion process, the sparse features of small organs may be lost during channel compression. Therefore, in the channel fusion stage, a sparse feature detection module, namely the sparse feature preservation branch, is designed first. It measures the sparsity of features by the L1 norm, identifies the channels corresponding to small targets, preserves the residuals of these channels separately, and then directly concatenates them with the compressed feature map. Finally, it enters the channel selection stage and participates in the final spatial selection stage. In addition, to address the issue that the redundancy of features varies and cannot be simply compressed from 2C channels to C channels, a dynamic channel compression ratio strategy is adopted. By calculating the cosine similarity between features, the compression ratio is dynamically adjusted. That is, if the redundancy of two feature maps is high, it is compressed to C channels; if the redundancy is low, some extra channels (such as 1.5C) are retained. Then, key channels are selected through channel attention to balance fusion efficiency and feature integrity.
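The two selection rules in this paragraph can be illustrated with a minimal pure-Python sketch. Everything specific here is an assumption made for illustration: the function names, the thresholds `0.1` and `0.8`, the choice to treat a channel with a low mean absolute activation (L1 norm divided by size) as "sparse", and the use of exactly C or 1.5C output channels.

```python
import math


def l1_norm(values):
    """L1 norm of one flattened channel."""
    return sum(abs(v) for v in values)


def sparse_channel_indices(feature, threshold=0.1):
    """Indices of channels whose mean absolute activation is below threshold.

    feature: list of channels, each a flat list of activations.
    """
    return [i for i, ch in enumerate(feature)
            if l1_norm(ch) / len(ch) < threshold]


def cosine_similarity(a, b):
    """Cosine similarity of two flat vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def compressed_channels(feat_a, feat_b, base_channels, high_redundancy=0.8):
    """Pick the output width after fusing two C-channel feature maps.

    High redundancy (similar maps) -> compress to C channels;
    low redundancy -> keep some extra channels (1.5 * C here).
    """
    flat_a = [v for ch in feat_a for v in ch]
    flat_b = [v for ch in feat_b for v in ch]
    if cosine_similarity(flat_a, flat_b) >= high_redundancy:
        return base_channels
    return int(1.5 * base_channels)
```

In a full module the indices returned by `sparse_channel_indices` would select the residual channels that bypass compression and are concatenated back afterwards.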

[0123] Features that have undergone sparse feature preservation and dynamic channel compression are concatenated and then subjected to global average pooling (AVGPool) to compress the spatial dimensions and obtain channel-level global information. Channel weights are then generated using a 1×1×1 convolutional layer (Conv_1) and a sigmoid activation function. The formula is:

[0124] $$W_c = \sigma\left(\mathrm{Conv\_1}\left(\mathrm{AVGPool}\left(\mathrm{Concat}(F_s, F_d)\right)\right)\right)$$

[0125] where $F_s$ and $F_d$ are the features after sparsity processing and dynamic compression processing, respectively, and $\sigma$ is the sigmoid function. Each element of the channel weight $W_c$ corresponds to the importance of a channel (between 0 and 1), with a larger value indicating that the channel feature is more critical to segmentation.

[0126] The channel weights $W_c$ are used to weight the concatenated features $\mathrm{Concat}(F_s, F_d)$ channel by channel, and a 1×1×1 convolutional layer then reduces the number of channels from 2C to the original number of channels C, giving the channel-filtered features $F_c$. This preserves important channel features while keeping the dimensionality of the feature map consistent. The formula is as follows:

[0127] $$F_c = \mathrm{Conv}_{1\times1\times1}\left(W_c \odot \mathrm{Concat}(F_s, F_d)\right)$$

[0128] The spatial-level dynamic selection process operates directly on the original input features $F_1$ and $F_2$. Dimensionality reduction and feature interaction enhancement are achieved through 1×1×1 convolutional layers, followed by element-wise addition and a sigmoid activation function to generate the spatial weights $W_s$. The formula is:

[0129] $$W_s = \sigma\left(\mathrm{Conv}_{1\times1\times1}(F_1) + \mathrm{Conv}_{1\times1\times1}(F_2)\right)$$

[0130] where each element of $W_s$ corresponds to the importance of a spatial location in the feature map (between 0 and 1), with a larger value indicating that the location belongs to a key segmentation region.

[0131] The spatial weights $W_s$ are used to weight the channel-filtered features $F_c$ by spatial location, giving the final adaptive fusion features $F_{out}$. The formula is:

[0132] $$F_{out} = W_s \odot F_c$$

[0133] The WFEM module integrates low-level edge detail information and high-level structural information by fusing features from different layers and scales, enabling the model to analyze fracture images from both global and local perspectives, thereby improving the accuracy and robustness of fracture image segmentation.
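The channel-selection and spatial-selection steps of the WFEM described above can be sketched with NumPy as follows. This is an illustrative sketch under assumptions rather than the patented implementation: a 1×1×1 convolution is modelled as a plain matrix applied along the channel axis (no bias), the spatial branch is assumed to reduce to a single channel, and all function and parameter names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv1x1x1(x, weight):
    """1x1x1 convolution without bias.

    x: (C_in, D, H, W); weight: (C_out, C_in) -> (C_out, D, H, W).
    """
    return np.einsum("oc,cdhw->odhw", weight, x)


def wfem_fuse(f_s, f_d, f_1, f_2, w_gate, w_reduce, w_sp1, w_sp2):
    """Channel selection followed by spatial selection.

    f_s, f_d: sparsity-preserved / dynamically compressed features, (C, D, H, W)
    f_1, f_2: original input features, (C, D, H, W)
    w_gate:   (2C, 2C) weights producing the channel gate
    w_reduce: (C, 2C)  weights reducing 2C channels to C
    w_sp1/2:  (1, C)   weights of the assumed single-channel spatial branch
    """
    f_cat = np.concatenate([f_s, f_d], axis=0)               # (2C, D, H, W)
    pooled = f_cat.mean(axis=(1, 2, 3), keepdims=True)       # global average pool
    w_c = sigmoid(conv1x1x1(pooled, w_gate))                 # channel weights in (0, 1)
    f_c = conv1x1x1(w_c * f_cat, w_reduce)                   # channel-filtered, (C, D, H, W)
    w_s = sigmoid(conv1x1x1(f_1, w_sp1) + conv1x1x1(f_2, w_sp2))  # (1, D, H, W)
    return w_s * f_c                                         # spatially weighted output
```

Broadcasting handles both gates: `w_c` has shape (2C, 1, 1, 1) and scales whole channels, while `w_s` has shape (1, D, H, W) and scales every channel at a given location.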

[0134] As shown in Figure 6, the lightweight dynamic compression excitation module dynamically selects between a shallow mode and a deep mode based on the channel-to-space size ratio (C / H / W) of the feature map. When the feature map is at an early stage (C / H / W < 1), the shallow mode is selected. The shallow mode uses lightweight pointwise convolution and inverted residual structures to reduce computation. SERatio = 4 is set, meaning that the number of output channels is 4 times the number of input channels, to enhance feature representation. This operation focuses on preserving high-resolution details and avoids information loss during downsampling. When the feature map enters a deep layer (C / H / W ≥ 1), the deep mode is selected. The deep mode adopts a channel splitting strategy: the channel dimension is first split into 8 independent subsets, each subset is processed by "depthwise separable convolution + pointwise convolution", and the channels of the 8 processed subsets are then merged. SERatio is then set to 1 / 4 to achieve channel compression and aggregation of key semantic features. After the nonlinear expression is enhanced through activation functions, the result is fused with the other subset features to form unified deep semantic information. This mode focuses on semantic relationship modeling and improves robustness to complex lesions (such as blurred boundaries and small lesions).
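The mode switch can be expressed as a small helper. The sketch below is an assumption-laden illustration: it reads "C / H / W" literally as C divided by H and then by W, and the function name, the returned dictionary keys and the rounding of the hidden width are hypothetical.

```python
def select_se_mode(channels, height, width):
    """Choose the shallow or deep branch from the channel-to-space ratio."""
    ratio = channels / height / width
    if ratio < 1:
        # Early, high-resolution stage: expand channels (SERatio = 4).
        se_ratio, groups, mode = 4.0, 1, "shallow"
    else:
        # Deep, low-resolution stage: 8 channel subsets, compress (SERatio = 1/4).
        se_ratio, groups, mode = 0.25, 8, "deep"
    return {
        "mode": mode,
        "se_ratio": se_ratio,
        "groups": groups,
        "hidden_channels": max(1, int(channels * se_ratio)),
    }
```

For example, a 32-channel map at 128 × 128 resolution has a ratio far below 1 and takes the shallow branch, while a 512-channel map at 16 × 16 has a ratio of 2 and takes the deep branch.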

[0135] The lightweight dynamic compression excitation module design incorporated in the encoder downsampling stage combines the detail preservation path and the semantic optimization path. It not only preserves the fine structure and edge details of the fracture area, but also strengthens the global feature association of the fracture area through dynamic semantic aggregation, which significantly improves the pertinence and effectiveness of feature expression.

[0136] The loss functions used are BCE loss and Dice loss. The overall loss function expression is as follows:

[0137]

[0138] where $\lambda$ denotes the loss weight, $L_{BCE}$ is the binary cross-entropy loss, and $L_{Dice}$ is the Dice coefficient loss.

[0139] The expression for BCE loss is:

[0140] $$L_{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]$$

[0141] where $y_i$ denotes the true label of the $i$-th pixel, $p_i$ denotes the predicted probability of the $i$-th pixel, and $N$ denotes the total number of pixels in the batch.

[0142] The expression for Dice loss is:

[0143]

[0144] where $p_{i,j}$ denotes the predicted probability at location $j$ of the $i$-th sample; $y_{i,j}$ denotes the true label at location $j$ of the $i$-th sample; $H$ and $W$ denote the height and width of the image; $\epsilon$ denotes the smoothing term; and $B$ denotes the batch size.
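A combined BCE and Dice loss of the kind described here can be sketched in pure Python as follows. The exact expressions of the filing are not reproduced; the sketch assumes a commonly used form, namely a convex combination `weight * BCE + (1 - weight) * Dice` and a per-sample soft Dice with the smoothing term added to numerator and denominator. The function names and the default values of `weight`, `smooth` and `eps` are hypothetical.

```python
import math


def bce_loss(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over a flat list of pixels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return -total / len(probs)


def dice_loss(batch_probs, batch_labels, smooth=1.0):
    """Soft Dice loss averaged over the samples of a batch.

    batch_probs / batch_labels: one flat list of H*W values per sample.
    """
    losses = []
    for probs, labels in zip(batch_probs, batch_labels):
        intersection = sum(p * y for p, y in zip(probs, labels))
        denominator = sum(probs) + sum(labels)
        losses.append(1.0 - (2.0 * intersection + smooth) / (denominator + smooth))
    return sum(losses) / len(losses)


def total_loss(batch_probs, batch_labels, weight=0.5):
    """Assumed combination: weight * BCE + (1 - weight) * Dice."""
    flat_p = [p for sample in batch_probs for p in sample]
    flat_y = [y for sample in batch_labels for y in sample]
    return (weight * bce_loss(flat_p, flat_y)
            + (1.0 - weight) * dice_loss(batch_probs, batch_labels))
```

The Dice term counteracts the class imbalance caused by small fracture regions, while the BCE term supplies a per-pixel gradient.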

[0145] The dataset used is FracAtlas, which contains X-ray scan images of four body parts: shoulder, wrist, hip, and leg. The dataset contains a total of 4083 X-ray images, which were manually annotated using the open-source annotation platform makesense.ai. Each fracture region in the image is equipped with an independent segmentation mask and bounding box, which can be used for fracture classification, localization, and segmentation tasks.

[0146] The dataset contains 717 images with fractures, totaling 922 fracture instances annotated. By location, there are 1538 hand scans, including 437 fractures; 2272 leg scans, including 263 fractures; 338 hip scans, including 63 fractures; and 349 shoulder scans, including 63 fractures.

[0147] During training, the Adam optimizer with 0.5 momentum and 0.01 weight decay was used for 300 epochs of iterative training. All experiments were implemented on the PyTorch platform using a 32GB NVIDIA RTX2080 GPU. All images were uniformly resized to a resolution of 256 × 256 pixels. To ensure a fair comparison, all models used the same batch size of 8, and the same evaluation method was used during testing. The initial learning rate was 1e-3. Data augmentation was used during training, including random horizontal and vertical flipping, rotation, and random cropping. Segmentation results were evaluated using the mean Intersection over Union (mIoU) and the Dice Similarity Coefficient (DSC), two performance metrics commonly used in medical image segmentation.
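For reference, the two evaluation metrics can be computed from binary masks as in the sketch below. It is a minimal illustration under assumptions: masks are flat lists of 0/1 values, the scores are averaged over images for the foreground class only, and two empty masks are scored as a perfect match; the function names are hypothetical.

```python
def iou_and_dice(pred, target):
    """Foreground IoU and Dice for one pair of flat binary masks."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    union = tp + fp + fn
    if union == 0:
        return 1.0, 1.0  # both masks empty: treated as a perfect match
    return tp / union, 2 * tp / (2 * tp + fp + fn)


def mean_scores(preds, targets):
    """Average IoU (mIoU) and Dice (DSC) over a list of mask pairs."""
    scores = [iou_and_dice(p, t) for p, t in zip(preds, targets)]
    miou = sum(s[0] for s in scores) / len(scores)
    dsc = sum(s[1] for s in scores) / len(scores)
    return miou, dsc
```

The two metrics are monotonically related for a single mask pair (Dice = 2·IoU / (1 + IoU)), but their averages over a test set can rank models differently, which is why both are reported.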

[0148] To verify the effectiveness of the proposed algorithm, a network design was completed based on the UNet network with a fully explicit decoder. Specifically, the contributions of two semi-implicit decoders (YSLMD and YSEED), the Weak Feature Enhancement Module (WFEM), and the Lightweight Dynamic Compression Excitation Module (EDSSE) to fracture image segmentation performance were explored. Ablation experiments were conducted separately for the two decoders. First, all three modules were removed, and a baseline model, denoted as Model 1, was constructed based on the UNet network with the fully explicit decoder. Then, the proposed YSLMD module was introduced into the baseline model, forming Model 2; adding the WFEM module to the same baseline model formed Model 3; and adding the WFEM module to Model 2 formed Model 4. Finally, the YSLMD, WFEM, and EDSSE modules were integrated into the baseline model to form a complete model, denoted as Model 5. The construction approach for the YSEED ablation experiment model was the same as that for YSLMD.

[0149] The results of the ablation experiments are detailed in Table 1, with the best performance indicators highlighted in bold. From Table 1, the following conclusions can be drawn: incorporating a semi-implicit algorithm into the baseline model improves the model's segmentation ability; adding the weak feature enhancement module allows the model to analyze fracture images from both global and local perspectives by fusing features from different layers and scales, thereby improving the accuracy and robustness of fracture image segmentation; adding the lightweight dynamic compression excitation module significantly reduces the number of parameters and the computational load while maintaining performance. Figure 7 is a scatter plot of the ablation experiments. The horizontal axis represents the mIoU, where further to the right is better; the vertical axis represents the DSC, where higher is better; the size of each bubble represents the parameter count, where smaller is better. The plot shows that Model 9 has better performance, but Model 10 offers the lowest parameter count while maintaining high performance.

[0150] Table 1 Ablation experiments for different modules

[0151]

[0152] The segmentation results of the YSLMD and YSEED decoders are visualized in Figure 8: the first row shows the X-ray image of the fracture, the second row shows the ground truth, and the third row shows the segmentation result of the method of the present invention. As can be seen from the visualization, the method of the present invention can not only segment large fracture areas such as those in the legs and shoulders, but also achieves good segmentation results for small fracture areas in the hands and ankles, as well as for images containing two fractures at the same time.

[0153] Furthermore, the number of channels of the initial state y(0) largely determines the overall network complexity and significantly affects its ability to integrate information, so it is a key hyperparameter. Ablation experiments were therefore conducted with the initial number of channels of y(0) set to 3, 6, 8, and 16, respectively. Table 2 shows that increasing the number of channels of y(0) to 6 improves network performance; choosing a higher number of channels, such as 8 or 16, improves the performance metrics further but also increases the number of parameters and the network complexity. For tasks with strict limits on computational cost and parameter count, a configuration with 6 channels is preferable.

[0154] Table 2 Ablation experiments with the initial number of channels y(0)

[0155]

[0156] The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included within the scope of the present invention.

Claims

1. A fracture segmentation U-shaped network based on semi-implicit memory ordinary differential formula, characterized in that, A lightweight U-shaped network architecture is constructed using the discretization method of nmODE. The U-shaped network extracts low-level features through skip connections and injects these features into the internal state for feature aggregation, generating a low-parameter nmODE decoder. This decoder is then discretized into a concise module that can be embedded into the U-shaped network. A weak feature enhancement module and a lightweight dynamic compression excitation module are designed in the feature extraction and skip connection parts. Local features of different scales are adaptively fused for the fracture region in fracture images. The weak feature enhancement module dynamically selects and fuses relevant features, allowing the model to be optimized according to the actual task requirements. The lightweight dynamic compression excitation module dynamically adjusts the structure according to the network feature extraction stage to balance detail preservation and semantic relationship capture.

2. The fracture segmentation U-shaped network based on semi-implicit memory ordinary differential formula according to claim 1, characterized in that, The nmODEs decoder is discretized using two different methods, and the differential formula for nmODEs is as follows: ; in, Representing state Regarding time The derivative, This represents the internal state vector of the network at time t. Representative moment external input vector, Indicates external input The corresponding learnable parameters, It is the characteristic transformation function. It is a non-linear activation function; Given derivative , for along Choose a value for the size of each step of the axis. and set , and of for: ; Indicates at discrete time step At that time, the internal state of the network Approximate value; Indicates at discrete time step At that time, the internal state of the network Approximate value; Indicates the step size of discretization; Indicates at time step At that time, state Regarding time The derivative; This is a simplified notation of formula (1), representing the differential expression of nmODEs; Indicates the first A discrete time point; set up From formulas (1) and (2): ; In the context of a U-shaped network structure, the value at the lowest level As an initial value, the process is generated in discrete form, and the number of solution steps is inversely proportional to the number of network layers: ; In the case of applying a U-type network, the total number of layers is: ; It is the current layer number. In formula (4) Indicates the first uplink path The layer's input is used as the solution. The initial value; This indicates information transmitted via a hop connection. for ; Indicates the first Parameters of layer skip connections; Given derivative , for along Choose a value for the size of each step of the axis. and set as follows: ; in, It is time The approximate solution at that point, i.e. 
, Is with and The relevant final results; set up From formulas (1) and (5), we can derive: ; Rewrite formula (6) in discrete form: ; Based on the internal structure of the decoder designed according to formula (7), the linear multi-step method takes the uplink path information of the two layers as input. and ; Will Treat it as a nonlinear component and solve it using an explicit method; The right-hand side term of nmODE is split into two neural networks. and The expression is as follows: ; Representing state Regarding time The derivative, Represents the state vector of the system. Represents a nonlinear neural network mapping. Represents a linear neural network mapping. If the transformation matrix is ​​linear, then its expression can be represented as: ; In the forward pass, the partition nmODE is solved using an implicit-explicit method, on the right side. It is explicitly processed, while Implicitly processed, at each time step, the solution is obtained from the following formula. Advance to ; ; ; It is an intermediate state of the i-th stage. This is the state at the nth time step. It is the state at the (n+1)th time step. It is the time step. The coefficient matrix corresponds to the outputs of the nonlinear and linear terms at the j-th stage, respectively. Indicates the explicit coefficients. Represents the implicit coefficients and weighting coefficients. For the final weights of the explicit part, The final weight of the implicit part; To transform formula (11) into a system of linear formulas, the specific steps are to define the linear terms. Substituting into formula (11), we get: ; containing unknown quantities The term is moved to the left, and the remaining known terms remain on the right. After simplification, we obtain formula (13): ; At this point, the left side of formula (13) contains unknowns. The set of linear terms, where the right-hand side consists entirely of known quantities, where... It is the state of the previous time step. 
The identity matrix, dimensions, and state vector are given. Consistent, It was before The output of the nonlinear term of the stage, It was before The known stage values ​​of the stage.

3. The fracture segmentation U-shaped network based on semi-implicit memory ordinary differential formula according to claim 2, characterized in that features that have undergone sparse feature preservation and dynamic channel compression are concatenated and then subjected to global average pooling to compress the spatial dimensions and obtain channel-level global information; channel weights are then generated using a 1×1×1 convolutional layer and a sigmoid activation function, the formula being: $W_c = \sigma\left(\mathrm{Conv\_1}\left(\mathrm{AVGPool}\left(\mathrm{Concat}(F_s, F_d)\right)\right)\right)$; where $F_s$ and $F_d$ are the features after sparsity processing and dynamic compression processing, respectively, and each element of $W_c$ corresponds to the importance of a channel; the channel weights $W_c$ are used to weight the concatenated features $\mathrm{Concat}(F_s, F_d)$ channel by channel, and a 1×1×1 convolutional layer then reduces the number of channels from 2C to the original number of channels C, giving the channel-filtered features $F_c$, the formula being: $F_c = \mathrm{Conv}_{1\times1\times1}\left(W_c \odot \mathrm{Concat}(F_s, F_d)\right)$; the spatial-level dynamic selection process operates directly on the original input features $F_1$ and $F_2$, dimensionality reduction and feature interaction enhancement being achieved through 1×1×1 convolutional layers, followed by the generation of the spatial weights $W_s$ using element-wise addition and a sigmoid activation function, the formula being: $W_s = \sigma\left(\mathrm{Conv}_{1\times1\times1}(F_1) + \mathrm{Conv}_{1\times1\times1}(F_2)\right)$; where each element of $W_s$ corresponds to the importance of a spatial location in the feature map; the spatial weights $W_s$ are used to weight the channel-filtered features $F_c$ by spatial location, giving the final adaptive fusion features $F_{out}$, the formula being: $F_{out} = W_s \odot F_c$.

4. The fracture segmentation U-shaped network based on semi-implicit memory ordinary differential formula according to claim 3, characterized in that the loss functions used are BCE loss and Dice loss, which are combined into an overall loss function in which $\lambda$ denotes the loss weight, $L_{BCE}$ is the binary cross-entropy loss, and $L_{Dice}$ is the Dice coefficient loss; the expression for BCE loss is: $L_{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]$; where $y_i$ denotes the true label of the $i$-th pixel, $p_i$ denotes the predicted probability of the $i$-th pixel, and $N$ denotes the total number of pixels in the batch; in the expression for Dice loss, $p_{i,j}$ denotes the predicted probability at location $j$ of the $i$-th sample, $y_{i,j}$ denotes the true label at location $j$ of the $i$-th sample, $H$ and $W$ denote the height and width of the image, $\epsilon$ denotes the smoothing term, and $B$ denotes the batch size.