Articular cartilage image segmentation method and system based on hybrid attention mechanism

By identifying the spatiotemporal correlation features of multimodal data of articular cartilage through a hybrid attention mechanism, the problem of multimodal data fusion in existing technologies not adapting to changes in load conditions is solved, and high-precision segmentation and accurate grading of cartilage damage are achieved.

CN120635441B · Active · Publication Date: 2026-06-19 · TIANJIN PEOPLE HOSPITAL

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TIANJIN PEOPLE HOSPITAL
Filing Date
2025-05-26
Publication Date
2026-06-19

Figure CN120635441B_ABST

Abstract

This application provides a method and system for articular cartilage image segmentation based on a hybrid attention mechanism. The method includes: acquiring piezoelectric and polarized light stress data of articular cartilage under dynamic load conditions; identifying the spatiotemporal correlation features between the piezoelectric and polarized light stress data; enhancing the spatiotemporal correlation features to obtain enhanced spatiotemporal correlation features; inputting the enhanced spatiotemporal correlation features into a pre-trained segmentation network, which uses dynamic feature weights to weight features of different modalities and outputs segmentation results. The segmentation results include the deformation regions of the articular cartilage, and the degree of articular cartilage damage is determined based on the deformation regions. The dynamic feature weights are generated by a hybrid attention mechanism. The technical solution provided in this application achieves accurate segmentation and graded assessment of articular cartilage damage by fusing multimodal mechanical data and employing dynamic weighting.

Description

Technical Field

[0001] This application relates to the field of image processing technology, and in particular to a method and system for segmenting articular cartilage images based on a hybrid attention mechanism.

Background Technology

[0002] In the field of articular cartilage injury assessment, it is necessary to monitor the mechanical response of cartilage in real time under high dynamic load conditions to accurately identify early damage and quantify the degree of damage. During the monitoring process, it is necessary to solve the problem of efficient fusion of multimodal mechanical data (such as piezoelectric and optical stress data) and establish a mapping relationship between mechanical characteristics and the damaged area to support accurate clinical diagnosis and treatment decisions.

[0003] Existing solutions commonly rely on multimodal data fusion based on convolutional neural networks. These methods extract spatial features from the piezoelectric data and the polarized light stress data separately, fuse them by simple feature concatenation or weighted averaging, and finally feed the fused features into a segmentation network to predict cartilage damage areas.

[0004] However, existing solutions do not dynamically model the spatiotemporal correlation of multimodal data in the feature fusion stage. As a result, the contributions of the different modalities are fixed and cannot adapt to changes in load conditions. In complex mechanical environments in particular, features are easily suppressed or noise is amplified, which degrades the accuracy of damage classification.

Summary of the Invention

[0005] This application provides a method and system for segmenting articular cartilage images based on a hybrid attention mechanism, in order to solve the problem of low accuracy in articular cartilage image segmentation in the prior art.

[0006] Firstly, this application provides a method for segmenting articular cartilage images based on a hybrid attention mechanism, including:

[0007] Acquire piezoelectric and polarized light stress data of articular cartilage under dynamic load conditions;

[0008] Identify the spatiotemporal correlation features between the piezoelectric data and the polarized light stress data;

[0009] The spatiotemporal correlation features are enhanced to obtain the enhanced spatiotemporal correlation features;

[0010] The enhanced spatiotemporal correlation features are input into a pre-trained segmentation network. The pre-trained segmentation network uses dynamic feature weights to weight features of different modalities and outputs segmentation results. The segmentation results include the deformed regions of articular cartilage, so as to determine the articular cartilage damage level based on the deformed regions of articular cartilage. The dynamic feature weights are generated by a hybrid attention mechanism.

[0011] Optionally, the step of inputting the enhanced spatiotemporal correlation features into a pre-trained segmentation network, wherein the pre-trained segmentation network uses dynamic feature weights to weight features of different modalities and outputs segmentation results, includes:

[0012] The enhanced spatiotemporal correlation features are input into a pre-trained segmentation network. The segmentation network divides the enhanced spatiotemporal correlation features into channels according to time series and spatial distribution to generate a time channel feature set and a spatial distribution channel feature set.

[0013] Based on the time channel feature set and the spatial distribution channel feature set, a hybrid attention mechanism is used to calculate the spatiotemporal correlation strength of each channel and generate dynamic feature weights.

[0014] Based on the dynamic feature weights, the time channel features in the time channel feature set and the spatial distribution channel features in the spatial distribution channel feature set are weighted and fused channel by channel to generate a multimodal fusion feature map.

[0015] Perform cross-scale feature aggregation on the multimodal fused feature map to obtain the aggregation result;

[0016] Based on the preset continuity constraint on the articular cartilage deformation boundary and the aggregation result, the deformation region is predicted to obtain a segmentation result containing the articular cartilage deformation region.

[0017] Optionally, the step of calculating the spatiotemporal correlation strength of each channel based on the temporal channel feature set and the spatial distribution channel feature set, and generating dynamic feature weights, includes:

[0018] The time channel feature set and the spatial distribution channel feature set are subjected to multi-dimensional correlation analysis between channels to generate a cross-modal correlation matrix;

[0019] Based on the preset constraint conditions for consistency of the principal stress direction of articular cartilage and the preset constraint conditions for continuity of the deformation boundary of articular cartilage, the spatiotemporal correlation strength of each channel in the cross-modal correlation matrix is directionally corrected.

[0020] The spatiotemporal correlation strength of each channel after correction is normalized to obtain the normalized spatiotemporal correlation strength of each channel.

[0021] Based on the spatiotemporal correlation strength of each channel after normalization, and combined with the pre-defined viscoelastic properties of articular cartilage, dynamic feature weights at the channel level are generated.
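Steps [0018] to [0021] can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation. It assumes each channel is a flattened feature vector and uses Pearson correlation as the "multi-dimensional correlation analysis". The directional correction is passed in as a given per-channel factor (`correction`). Softmax serves as the normalization, and the viscoelastic prior is represented by a hypothetical per-channel factor (`visco_prior`). The function names are illustrative.

```python
import numpy as np


def cross_modal_correlation(temporal, spatial):
    """Pearson correlation between every temporal channel and every spatial channel.

    temporal: (Ct, N) array, one flattened feature vector per temporal channel.
    spatial:  (Cs, N) array, one flattened feature vector per spatial channel.
    Returns a (Ct, Cs) cross-modal correlation matrix ([0018]).
    """
    t = temporal - temporal.mean(axis=1, keepdims=True)
    s = spatial - spatial.mean(axis=1, keepdims=True)
    t = t / (np.linalg.norm(t, axis=1, keepdims=True) + 1e-12)
    s = s / (np.linalg.norm(s, axis=1, keepdims=True) + 1e-12)
    return t @ s.T


def dynamic_channel_weights(temporal, spatial, correction, visco_prior, temperature=1.0):
    """Channel-level weights for the Ct + Cs channels (temporal channels first)."""
    corr = np.abs(cross_modal_correlation(temporal, spatial))
    # Strength of a channel = its strongest coupling to any channel of the other modality.
    strength = np.concatenate([corr.max(axis=1), corr.max(axis=0)])
    strength = strength * correction                 # directional correction ([0019])
    z = np.exp((strength - strength.max()) / temperature)
    normalized = z / z.sum()                         # softmax normalization ([0020])
    w = normalized * visco_prior                     # hypothetical viscoelastic prior ([0021])
    return w / w.sum()


rng = np.random.default_rng(0)
temporal = rng.normal(size=(4, 50))
spatial = rng.normal(size=(3, 50))
spatial[0] = 2.0 * temporal[0] + 1.0                 # a perfectly coupled channel pair
weights = dynamic_channel_weights(temporal, spatial,
                                  correction=np.ones(7), visco_prior=np.ones(7))
```

In this toy input, the coupled pair (temporal channel 0 and spatial channel 0) receives the largest and equal weights. Softmax is only one possible normalization choice.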

[0022] Optionally, the directional correction of the spatiotemporal correlation strength of each channel in the cross-modal correlation matrix based on the preset constraint conditions for consistency of principal stress directions of articular cartilage and the preset constraint conditions for continuity of articular cartilage deformation boundaries includes:

[0023] Extract the temporal channel features and spatial distribution channel features of each channel from the cross-modal correlation matrix;

[0024] Calculate the angle between the principal stress direction of the time channel feature and the deformation boundary direction of the spatial distribution channel feature for each channel to generate directional deviation;

[0025] A directional deviation correction weight is generated based on the directional deviation and the preset viscoelastic property constraints of articular cartilage.

[0026] Based on the directional deviation correction weight, the spatiotemporal correlation strength of each channel in the cross-modal correlation matrix is iteratively corrected until the spatial distribution relationship between the principal stress direction and the deformation boundary direction satisfies the preset viscoelastic property constraint condition of articular cartilage, thus obtaining the corrected spatiotemporal correlation strength of each channel.
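The iterative correction of [0023] to [0026] can be sketched as follows, under loudly stated assumptions. Directions are treated as undirected axes in degrees. The correction weight `exp(-deviation / tau)` is a hypothetical choice. The viscoelastic constraint, which the application does not specify numerically, is replaced by a hypothetical stopping rule: the strength-weighted mean deviation must fall to at most `tol` degrees. Function names and parameters are illustrative only.

```python
import numpy as np


def axial_deviation_deg(theta_a, theta_b):
    """Smallest angle (0-90 degrees) between two undirected axes given in degrees."""
    d = np.abs(np.asarray(theta_a, dtype=float) - np.asarray(theta_b, dtype=float)) % 180.0
    return np.minimum(d, 180.0 - d)


def correct_strength(strength, stress_dir, boundary_dir, tau=30.0, tol=10.0, max_iter=50):
    """Iteratively down-weight channels whose principal stress direction deviates
    from the deformation boundary direction.

    Hypothetical choices: correction weight exp(-deviation / tau) ([0025]) and the
    stopping rule 'strength-weighted mean deviation <= tol degrees', which stands in
    for the viscoelastic constraint of [0026].
    """
    dev = axial_deviation_deg(stress_dir, boundary_dir)     # directional deviation ([0024])
    corr_w = np.exp(-dev / tau)
    s = np.asarray(strength, dtype=float).copy()
    mean_dev = np.sum(s * dev) / np.sum(s)
    n_iter = 0
    while mean_dev > tol and n_iter < max_iter:
        s *= corr_w
        s /= s.sum()                                        # keep strengths on a common scale
        mean_dev = np.sum(s * dev) / np.sum(s)
        n_iter += 1
    return s, mean_dev, n_iter


strength = np.array([0.9, 0.8, 0.7])
stress_dir = np.array([10.0, 100.0, 175.0])     # principal stress directions (degrees)
boundary_dir = np.array([15.0, 40.0, 5.0])      # deformation boundary directions (degrees)
corrected, mean_dev, n_iter = correct_strength(strength, stress_dir, boundary_dir)
```

Each pass shrinks the strength of strongly deviating channels more than that of aligned ones, so the weighted mean deviation decreases monotonically. The `max_iter` cap guards against a constraint that can never be met.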

[0027] Optionally, the step of performing channel-by-channel weighted fusion of the time channel features in the time channel feature set and the spatial distribution channel features in the spatial distribution channel feature set according to the dynamic feature weights to generate a multimodal fusion feature map includes:

[0028] Based on the mechanical properties of stress concentration areas in articular cartilage, the dynamic feature weights are modified;

[0029] Based on the corrected dynamic feature weights, the time channel features in the time channel feature set and the spatial distribution channel features in the spatial distribution channel feature set are nonlinearly superimposed channel by channel to generate a multimodal fusion feature map.

[0030] Optionally, identifying the spatiotemporal correlation features between the piezoelectric data and the polarized light stress data includes:

[0031] The piezoelectric data and polarized light stress data are spatiotemporally registered along the mesh elements on the surface of the articular cartilage to generate spatiotemporally synchronized joint piezoelectric and polarized light stress data;

[0032] For each grid cell on the articular cartilage surface, the spatiotemporal dynamic fluctuation features of the piezoelectric signal and the distribution features of the polarized light stress direction are extracted from the joint piezoelectric and polarized light stress data.

[0033] The dynamic mode decomposition algorithm is used to extract the coupled modes of the spatiotemporal dynamic fluctuation characteristics and the polarization stress direction distribution characteristics in the frequency domain, and the spatiotemporal correlation characteristics are generated according to the energy ratio of the coupled modes.
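The dynamic mode decomposition (DMD) step of [0033] can be illustrated with standard exact DMD. The snapshot matrix stacks both modalities' per-cell features over time. The application does not say how a "coupled mode" is identified. As a hypothetical rule, a mode counts as coupled here when each modality holds at least `min_share` of its energy. The energy ratio returned is the coupled modes' share of total mode energy. The synthetic signals, the rank, and all names are illustrative assumptions.

```python
import numpy as np


def dmd(X, dt, rank):
    """Exact DMD of a snapshot matrix X (features x time), truncated to `rank`."""
    X1, X2 = X[:, :-1], X[:, 1:]
    U, S, Vh = np.linalg.svd(X1, full_matrices=False)
    U, S, V = U[:, :rank], S[:rank], Vh[:rank].conj().T
    A_tilde = U.conj().T @ X2 @ V / S               # reduced operator U* X2 V S^-1
    eigvals, W = np.linalg.eig(A_tilde)
    modes = X2 @ V / S @ W                          # exact DMD modes
    amps = np.linalg.lstsq(modes, X[:, 0].astype(complex), rcond=None)[0]
    freqs = np.angle(eigvals) / (2 * np.pi * dt)    # oscillation frequency in Hz
    return modes, amps, freqs


def coupled_mode_energy_ratio(piezo, polar, dt, rank, min_share=0.2):
    """Stack both modalities, run DMD, and keep modes whose energy is shared.

    piezo: (n_p, T) fluctuation features; polar: (n_q, T) direction features.
    A mode counts as 'coupled' (hypothetical rule) when each modality holds at
    least `min_share` of its energy. Returns the coupled-mode frequencies and
    their share of total mode energy.
    """
    modes, amps, freqs = dmd(np.vstack([piezo, polar]), dt, rank)
    n_p = piezo.shape[0]
    scale = np.abs(amps) ** 2
    e_piezo = np.sum(np.abs(modes[:n_p]) ** 2, axis=0) * scale
    e_polar = np.sum(np.abs(modes[n_p:]) ** 2, axis=0) * scale
    e_total = e_piezo + e_polar
    coupled = np.minimum(e_piezo, e_polar) / (e_total + 1e-12) >= min_share
    return freqs[coupled], e_total[coupled].sum() / e_total.sum()


dt = 0.01                                           # 100 Hz frames (illustrative)
t = np.arange(300) * dt
rng = np.random.default_rng(1)
ph = rng.uniform(0, 2 * np.pi, size=(3, 3))
# Piezo rows: a 2 Hz component shared with the optical rows plus a piezo-only 7 Hz component.
piezo = np.array([np.sin(2 * np.pi * 2 * t + ph[0, i])
                  + 0.5 * np.sin(2 * np.pi * 7 * t + ph[1, i]) for i in range(3)])
# Polarized light rows: the same 2 Hz component with different phases.
polar = np.array([0.8 * np.cos(2 * np.pi * 2 * t + ph[2, i]) for i in range(3)])
coupled_freqs, ratio = coupled_mode_energy_ratio(piezo, polar, dt, rank=4)
```

On this noise-free toy data, only the ±2 Hz mode pair is flagged as coupled. The 7 Hz pair lives entirely in the piezoelectric rows. Real data would need a rank chosen from the singular value spectrum.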

[0034] Optionally, the step of enhancing the spatiotemporal correlation features to obtain enhanced spatiotemporal correlation features includes:

[0035] Wavelet packet transform is performed on the spatiotemporal correlation features to obtain high-frequency stress fluctuation components and low-frequency viscoelastic relaxation components.

[0036] Feature enhancement is performed on the high-frequency stress fluctuation component to obtain the feature-enhanced high-frequency stress fluctuation component;

[0037] The enhanced spatiotemporal correlation features are obtained by fusing the high-frequency stress fluctuation component and the low-frequency viscoelastic relaxation component through skip connections in the U-shaped convolutional network architecture.
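The decomposition in [0035] to [0037] can be sketched with a Haar wavelet packet on a 1-D feature sequence. The all-approximation leaf is taken as the low-frequency viscoelastic relaxation component, and all other leaves form the high-frequency stress fluctuation component. Several choices here are assumptions not taken from the application: the Haar filter, the decomposition depth, the per-leaf soft threshold (median-absolute-deviation noise estimate), and the gain. The U-shaped network's skip-connection fusion is also not reproduced. A simple weighted sum stands in for it.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_split(x):
    """One Haar analysis step: (approximation, detail), each half the input length."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / SQRT2, (even - odd) / SQRT2


def haar_merge(a, d):
    """Inverse of haar_split (perfect reconstruction)."""
    x = np.empty(2 * a.size)
    x[0::2] = (a + d) / SQRT2
    x[1::2] = (a - d) / SQRT2
    return x


def wavelet_packet(x, level):
    """Full Haar wavelet packet tree: leaf nodes keyed by paths such as 'aad'."""
    nodes = {"": np.asarray(x, dtype=float)}
    for _ in range(level):
        nodes = {p + s: c for p, v in nodes.items() for s, c in zip("ad", haar_split(v))}
    return nodes


def reconstruct(nodes, keep):
    """Synthesize a signal from packet leaves, zeroing every leaf not in `keep`."""
    cur = {p: (v if p in keep else np.zeros_like(v)) for p, v in nodes.items()}
    while len(next(iter(cur))) > 0:
        parents = {p[:-1] for p in cur}
        cur = {q: haar_merge(cur[q + "a"], cur[q + "d"]) for q in parents}
    return cur[""]


def enhance(x, level=3, gain=1.5):
    """Split into low-frequency (all-'a' leaf) and high-frequency (all other leaves)
    parts, soft-threshold and amplify the high-frequency part, then recombine."""
    nodes = wavelet_packet(x, level)
    low_key = "a" * level
    high = {}
    for p, v in nodes.items():
        if p != low_key:
            thr = np.median(np.abs(v)) / 0.6745          # robust noise estimate per leaf
            high[p] = np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)
    low_sig = reconstruct(nodes, {low_key})
    high_sig = reconstruct({**nodes, **high}, set(high))
    return low_sig, high_sig, low_sig + gain * high_sig  # sum stands in for skip fusion


rng = np.random.default_rng(0)
t = np.arange(256) / 256.0
x = np.exp(-3 * t) + 0.2 * np.sin(2 * np.pi * 60 * t) + 0.02 * rng.normal(size=t.size)
low, high, fused = enhance(x)
```

Because the Haar packet is orthogonal and linear, the low- and high-frequency reconstructions (before thresholding) sum exactly to the input. The input length must be divisible by `2 ** level`.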

[0038] Secondly, this application provides an articular cartilage image segmentation system based on a hybrid attention mechanism, comprising:

[0039] The acquisition module is used to acquire piezoelectric and polarized light stress data of articular cartilage under dynamic load conditions;

[0040] The identification module is used to identify the spatiotemporal correlation characteristics between the piezoelectric data and the polarized light stress data;

[0041] The enhancement module is used to enhance the spatiotemporal correlation features to obtain enhanced spatiotemporal correlation features;

[0042] The output module is used to input the enhanced spatiotemporal correlation features into a pre-trained segmentation network. The pre-trained segmentation network uses dynamic feature weights to weight the features of different modalities and outputs segmentation results. The segmentation results include the deformation region of articular cartilage, so as to determine the articular cartilage damage level based on the deformation region of articular cartilage. The dynamic feature weights are generated by a hybrid attention mechanism.

[0043] Thirdly, this application provides a computing device, including a processing component and a storage component. The storage component stores one or more computer instructions, which are invoked and executed by the processing component to implement the articular cartilage image segmentation method based on a hybrid attention mechanism as described in any implementation of the first aspect.

[0044] Fourthly, this application provides a computer storage medium storing a computer program which, when executed by a computer, implements the method for segmenting articular cartilage images based on a hybrid attention mechanism as described in any implementation of the first aspect.

[0045] This application provides a method for segmenting articular cartilage images based on a hybrid attention mechanism. The method includes: acquiring piezoelectric data and polarized light stress data of articular cartilage under dynamic load conditions; identifying the spatiotemporal correlation features between the piezoelectric data and the polarized light stress data; performing feature enhancement on the spatiotemporal correlation features to obtain enhanced spatiotemporal correlation features; inputting the enhanced spatiotemporal correlation features into a pre-trained segmentation network, whereby the pre-trained segmentation network uses dynamic feature weights to weight features of different modalities and outputs segmentation results. The segmentation results include the deformation regions of the articular cartilage, so as to determine the damage level of the articular cartilage based on the deformation regions. The dynamic feature weights are generated by a hybrid attention mechanism.

[0046] This application constructs a multi-physics coupled sensing system by simultaneously acquiring piezoelectric and polarized light stress data of articular cartilage under dynamic loads. This overcomes the limitations of single-modal data representation and achieves collaborative analysis of mechanical and optical properties. Feature enhancement algorithms, such as spatiotemporal convolution or graph neural networks, are used to extract spatiotemporal correlation features between piezoelectric signals and polarized light stress fields, such as stress-charge coupling phase difference and strain rate correlation, improving the signal-to-noise ratio of weak damage signals. A hybrid attention mechanism dynamically allocates the weights of piezoelectric and polarized light features, emphasizing mechanical data under high loads and focusing on optical features under micro-damage, enabling the segmentation network to achieve sub-millimeter-level localization accuracy of cartilage deformation regions, a 35% improvement over fixed-weight strategies. By combining the segmentation results (deformation region area, edge sharpness) with prior biomechanical knowledge (such as Young's modulus attenuation gradient), a four-level classification of cartilage damage (normal / mild / moderate / severe) is achieved, with a clinical concordance rate exceeding 92%. Furthermore, after inputting the enhanced spatiotemporal correlation features into the segmentation network, channels are first divided according to time series and spatial distribution, generating temporal channel feature sets and spatial distribution channel feature sets. A hybrid attention mechanism is used to calculate the cross-modal correlation matrix of the two feature sets. Directional correction and normalization are performed by combining the consistency of principal stress directions of articular cartilage and the continuity of deformation boundaries, generating channel-level dynamic feature weights. 
Subsequently, the temporal and spatial features are weighted and fused channel by channel to form a multimodal fusion feature map. After cross-scale feature aggregation, the cartilage deformation region is predicted under viscoelastic constraints, outputting high-precision segmentation results. This method, through the synergistic effect of the hybrid attention mechanism and multiple constraints (such as directionality, continuity, and viscoelasticity), achieves adaptive dynamic fusion of piezoelectric and polarized light data, reducing the spatiotemporal consistency error of cartilage deformation region segmentation by 45%. Cross-scale feature aggregation combined with biomechanical prior constraints improves the detection rate of minor damage, and the correlation coefficient between the segmentation results and clinical biomechanical assessment results exceeds 0.95, providing a reliable basis for the accurate grading of articular cartilage damage.

[0047] These and other aspects of this application will become more apparent in the following description of the embodiments.

Brief Description of the Drawings

[0048] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0049] Figure 1 is a flowchart of an articular cartilage image segmentation method based on a hybrid attention mechanism provided in an embodiment of this application;

[0050] Figure 2 is a schematic structural diagram of an articular cartilage image segmentation system based on a hybrid attention mechanism provided in an embodiment of this application;

[0051] Figure 3 is a schematic structural diagram of a computing device provided in an embodiment of this application.

Detailed Implementation

[0052] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.

[0053] In some of the processes described in the specification, claims, and accompanying drawings of this application, multiple operations appearing in a specific order are included. However, it should be clearly understood that these operations may not be executed in the order they appear herein, or may be executed in parallel. The operation numbers, such as 11, 12, etc., are merely used to distinguish different operations and do not themselves represent any execution order. Furthermore, these processes may include more or fewer operations, and these operations may be executed sequentially or in parallel. It should be noted that the descriptions such as "first," "second," etc., in this document are used to distinguish different messages, devices, modules, etc., and do not represent a sequential order, nor do they limit "first" and "second" to different types.

[0054] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0055] Figure 1 is a flowchart of a method for segmenting articular cartilage images based on a hybrid attention mechanism provided in an embodiment of this application. As shown in Figure 1, the method includes:

[0056] S11. Obtain piezoelectric and polarized light stress data of articular cartilage under dynamic load conditions.

[0057] Dynamic loading conditions refer to the time-varying mechanical loads experienced by articular cartilage during movement, including periodic, impact, or random stress. Unlike static loading (constant pressure), dynamic loading induces viscoelastic responses (such as stress relaxation and creep) and electromechanical coupling effects (piezoelectric properties) in cartilage, which are key factors in the accumulation of cartilage damage. Piezoelectric data are the charge distribution or voltage signals generated in articular cartilage under dynamic mechanical loading; they are acquired with piezoelectric sensors and reflect the electromechanical coupling characteristics of cartilage tissue. Polarized light stress data are images of the birefringent stress distribution within cartilage acquired with a polarized optical imaging system; they characterize the mechanical response of the tissue microstructure.

[0058] For example, in a knee cartilage mechanics experiment, isolated human knee cartilage (femoral condyle) was prepared and preserved in saline. A biomechanical testing machine applied periodic pressure (0.5 Hz, 0–5 MPa) to simulate joint loading during walking. A miniature piezoelectric sensor array (1 mm spacing between sensors) was implanted into the cartilage surface to record voltage signals under load in real time at a sampling rate of 1 kHz. At a pressure of 3 MPa, sensor A output 12 mV and sensor B output 8 mV, reflecting an uneven stress distribution. Optical coherence tomography (OCT) was used to scan the cartilage under load, acquiring birefringent images with a resolution of 10 μm. The output data comprise the piezoelectric data and the polarized light stress data.

[0059] S12. Identify the spatiotemporal correlation features between the piezoelectric data and the polarized light stress data.

[0060] Here, the spatiotemporal correlation feature refers to the cross-modal coupling relationship between the time-varying behavior of the piezoelectric signal (such as a peak frequency shift) and the spatial distribution of the polarized light stress (such as the principal stress vector field).

[0061] S13. Enhance the spatiotemporal correlation features to obtain the enhanced spatiotemporal correlation features.

[0062] Feature enhancement uses nonlinear transformations to improve the signal-to-noise ratio of damage-related features and to suppress interference such as motion artifacts. The enhanced spatiotemporal correlation features are the fused features obtained by enhancing the original correlation features of the piezoelectric data and the polarized light stress data.

[0063] S14. The enhanced spatiotemporal correlation features are input into the pre-trained segmentation network. The pre-trained segmentation network uses dynamic feature weights to weight the features of different modalities and outputs the segmentation results. The segmentation results include the deformed regions of articular cartilage, so as to determine the damage level of articular cartilage based on the deformed regions of articular cartilage. The dynamic feature weights are generated by a hybrid attention mechanism.

[0064] The dynamic feature weights are channel weight coefficients generated by the hybrid attention mechanism; they reflect the contribution of each modality's features to damage recognition. Segmentation in the segmentation network can proceed as follows. Features at different scales are extracted with convolutional layers or dilated convolutions to capture transient response information and spatial continuity information, respectively. A boundary detection module, such as edge detection or a conditional random field, is introduced into the decoder of the segmentation network to refine the segmentation boundary using the spatial continuity features of the polarized light stress data. A temporal feature processing module is introduced into the encoder of the segmentation network to strengthen the dynamic characteristics using the transient response features of the piezoelectric data. The features of different modalities are the independent feature representations of the multi-source data (such as piezoelectric and polarized light data) within the segmentation network.
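The following sketch shows how the dilated convolutions mentioned above enlarge the receptive field without adding parameters. It is a plain NumPy cross-correlation with zero padding and an untrained all-ones 3×3 kernel. It illustrates the general technique, not the application's network layers, and `footprint_width` is a hypothetical helper.

```python
import numpy as np


def dilated_conv2d(x, kernel, dilation=1):
    """'Same'-size 2-D cross-correlation with a dilated k x k kernel and zero padding.

    Taps are spaced `dilation` pixels apart, so a k x k kernel spans
    dilation * (k - 1) + 1 pixels without adding parameters.
    """
    k = kernel.shape[0]
    pad = dilation * (k // 2)
    xp = np.pad(x.astype(float), pad)
    h, w = x.shape
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            di, dj = i * dilation, j * dilation
            out += kernel[i, j] * xp[di:di + h, dj:dj + w]
    return out


def footprint_width(dilation, size=9):
    """Width (in pixels) of the response of a 3 x 3 kernel to a single centred impulse."""
    impulse = np.zeros((size, size))
    impulse[size // 2, size // 2] = 1.0
    out = dilated_conv2d(impulse, np.ones((3, 3)), dilation)
    cols = np.flatnonzero(out.any(axis=0))
    return int(cols.max() - cols.min() + 1)


widths = [footprint_width(d) for d in (1, 2, 4)]
```

With dilation rates 1, 2 and 4, the same nine weights cover 3, 5 and 9 pixels. Stacking such layers is one common way to gather both short-range transient detail and wider spatial continuity.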

[0065] Here is a specific example:

Dynamic loads (0-500 N at 1 Hz, simulating human walking load) were applied to a knee cartilage sample measuring 10 cm × 8 cm × 3 mm. An implanted piezoelectric sensor array (0.1 Pa resolution, 1 kHz sampling rate) recorded micro-stress fluctuations in the superficial and deep cartilage layers under dynamic load, producing time-series data of dimension [T × 128] with T = 3000 time slices. A polarized light imaging system (532 nm wavelength, 50 μm spatial resolution) captured the principal stress direction distribution of the collagen fibers within the cartilage, producing a stress tensor matrix of dimension [100 × 80 × 3] (a 100 × 80 two-dimensional grid with 3 principal stress directions). The piezoelectric data were divided into 300 time windows of 10 ms each according to the load period (1 Hz), and the polarized light data were spatially registered to the same time slices to generate a spatiotemporally synchronized joint dataset. For each time window, the high-frequency fluctuation amplitude (the energy share of the 100-500 Hz band) and the principal stress direction were extracted; for each grid cell, the principal stress direction and the stress gradient amplitude (the derivative along the collagen fiber direction) were extracted. The spatial angle between the piezoelectric principal stress direction and the polarized light principal stress direction was then calculated, and regions with a mean angle ≤ 15° were marked as highly correlated. The Pearson correlation coefficient between the piezoelectric high-frequency fluctuation amplitude and the polarized light stress gradient was analyzed, and a spatiotemporal correlation feature set of dimension [100 × 80 × 5] was output, comprising the angle, the correlation coefficient, the piezoelectric amplitude, the polarization gradient amplitude, and the viscoelastic relaxation rate.
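The feature extraction in this example can be sketched as follows. The sketch makes two assumptions not stated in the text. First, the piezoelectric per-window features have already been mapped from the 128 sensors onto the 100 × 80 grid, and that mapping is not specified. Second, all direction fields are undirected angles in degrees. The 1 kHz rate, the 10 ms windows, the 100-500 Hz band and the 15° threshold come from the example. Function names and the random toy data are illustrative.

```python
import numpy as np

FS = 1000   # piezoelectric sampling rate in the example (1 kHz)
WIN = 10    # 10 ms windows -> 10 samples per window


def hf_energy_share(signal, fs=FS, win=WIN, band=(100.0, 500.0)):
    """Per-window share of spectral energy inside `band`.

    signal: (T, n_sensors) array -> returns (T // win, n_sensors).
    """
    n_win = signal.shape[0] // win
    frames = signal[: n_win * win].reshape(n_win, win, -1)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    # One-sided spectrum: count interior bins twice (Parseval), DC and Nyquist once.
    weights = np.full(freqs.size, 2.0)
    weights[0] = 1.0
    if win % 2 == 0:
        weights[-1] = 1.0
    power = power * weights[None, :, None]
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return power[:, in_band].sum(axis=1) / (power.sum(axis=1) + 1e-12)


def axial_angle_deg(a, b):
    """Smallest angle (0-90 degrees) between two undirected directions in degrees."""
    d = np.abs(a - b) % 180.0
    return np.minimum(d, 180.0 - d)


def pearson_over_windows(a, b):
    """Pearson correlation along axis 0 (time windows) for every grid cell."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    den = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0)) + 1e-12
    return (a * b).sum(axis=0) / den


def correlation_features(piezo_dir, polar_dir, piezo_amp, polar_grad, relax_rate):
    """Stack the five per-cell features of the example into an (H, W, 5) array.

    piezo_dir, polar_dir, piezo_amp, polar_grad: (n_windows, H, W); relax_rate: (H, W).
    Returns the feature set and the 'highly correlated' mask (mean angle <= 15 deg).
    """
    angle = axial_angle_deg(piezo_dir, polar_dir).mean(axis=0)
    corr = pearson_over_windows(piezo_amp, polar_grad)
    feats = np.stack([angle, corr, piezo_amp.mean(axis=0),
                      polar_grad.mean(axis=0), relax_rate], axis=-1)
    return feats, angle <= 15.0


rng = np.random.default_rng(0)
share = hf_energy_share(rng.normal(size=(3000, 128)))       # -> (300, 128)
H, W, n = 4, 5, 300                                          # small toy grid
piezo_dir = rng.uniform(0, 180, size=(n, H, W))
polar_dir = piezo_dir + rng.normal(0, 5, size=(n, H, W))     # nearly aligned directions
piezo_amp = rng.random((n, H, W))
polar_grad = 0.5 * piezo_amp + 0.1 * rng.random((n, H, W))
feats, high_corr = correlation_features(piezo_dir, polar_dir, piezo_amp, polar_grad,
                                        relax_rate=np.full((H, W), 0.3))
```

With 10-sample windows at 1 kHz, the FFT bins fall at 0, 100, ..., 500 Hz. The 100-500 Hz band therefore covers every bin except DC.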
Directional correction: for regions with an angle < 15°, the piezoelectric amplitude weight was scaled up by the dynamic coupling coefficient (α = 1.2); for regions with an angle > 30°, the polarization gradient amplitude was compressed by the viscoelastic relaxation coefficient (β = 0.6). In deep (compression-dominated) regions, if the piezoelectric amplitude exceeded the viscoelastic bearing capacity threshold of 200 Pa, its weight was scaled down by β = 0.8; in surface (shear-dominated) regions, if the polarization gradient followed the same trend as the piezoelectric fluctuation, the cross-modal correlation weight was scaled up by α = 1.5. The enhanced spatiotemporal correlation feature set ([100 × 80 × 5]) was then input into the segmentation network, where multi-scale features were extracted by 3 convolutional layers; the hybrid attention module assigned a weight wc = 0.7 to deep compression features and wc = 1.3 to surface shear features. The output is a probability map of deformed regions (dimension [100 × 80]), in which pixels with a probability above 0.5 are marked as damaged. The area ratio of the deformed region was calculated (15% for mild damage, 30% for moderate damage, and > 50% for severe damage) and combined with the stress amplitude of the deformed region (an amplitude > 300 Pa raises the damage grade by one level). The damage grades determined in this way agreed with the histological examination results in 90% of cases (classification accuracy of 92% / 88% / 85% for mild / moderate / severe damage, respectively).
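The correction rules and grading of this example can be sketched cell by cell. The coefficients and thresholds (α = 1.2, β = 0.6, β = 0.8, α = 1.5, 200 Pa, 0.5, 300 Pa) come from the text. The rest are assumptions, labeled in the code. "Consistent trend" is read as a positive correlation coefficient. The 15% / 30% / > 50% area ratios are read as lower bounds for mild / moderate / severe damage, with a one-level upgrade capped at "severe". The stress check uses the mean amplitude over the deformed region. Function names are hypothetical.

```python
import numpy as np

GRADES = ["normal", "mild", "moderate", "severe"]


def directional_correction(angle, piezo_amp, polar_grad, corr, is_deep,
                           alpha=1.2, beta=0.6, beta_deep=0.8, alpha_surf=1.5,
                           bearing_limit=200.0):
    """Apply the example's correction rules cell by cell (all inputs (H, W))."""
    # Masks are evaluated on the uncorrected inputs so rule order does not matter.
    aligned = angle < 15.0
    misaligned = angle > 30.0
    deep_over = is_deep & (piezo_amp > bearing_limit)
    # ASSUMPTION: 'consistent trend' is read as a positive correlation coefficient.
    surf_consistent = ~is_deep & (corr > 0)
    piezo_amp = np.where(aligned, alpha * piezo_amp, piezo_amp)
    piezo_amp = np.where(deep_over, beta_deep * piezo_amp, piezo_amp)
    polar_grad = np.where(misaligned, beta * polar_grad, polar_grad)
    corr = np.where(surf_consistent, alpha_surf * corr, corr)
    return piezo_amp, polar_grad, corr


def grade_damage(prob_map, stress_amp, threshold=0.5, upgrade_pa=300.0):
    """Grade damage from the deformed-area ratio, then upgrade one level on high stress."""
    damaged = prob_map > threshold
    ratio = damaged.mean()
    # ASSUMPTION: one reading of the 15% / 30% / >50% area-ratio thresholds.
    if ratio > 0.50:
        level = 3
    elif ratio >= 0.30:
        level = 2
    elif ratio >= 0.15:
        level = 1
    else:
        level = 0
    # ASSUMPTION: the mean stress amplitude over the deformed region is compared to 300 Pa.
    if damaged.any() and stress_amp[damaged].mean() > upgrade_pa:
        level = min(level + 1, 3)
    return GRADES[level], float(ratio)


p, g, c = directional_correction(np.array([[10.0, 40.0]]), np.array([[250.0, 100.0]]),
                                 np.array([[1.0, 1.0]]), np.array([[0.5, 0.5]]),
                                 is_deep=np.array([[True, False]]))
prob = np.zeros((10, 10))
prob[:2, :] = 0.9                                   # 20% of pixels above 0.5
grade_hi, ratio = grade_damage(prob, np.full((10, 10), 350.0))
grade_lo, _ = grade_damage(prob, np.full((10, 10), 100.0))
```

In the toy call, the deep, well-aligned cell is first boosted (250 × 1.2) and then capped (× 0.8) because its raw amplitude exceeds 200 Pa. A 20% damaged area grades as mild and moves up to moderate when the regional stress exceeds 300 Pa.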

[0067] By executing steps S11-S14, this embodiment achieves highly sensitive, automated, and quantitative assessment of articular cartilage damage through multimodal data fusion and dynamic feature optimization. It is particularly suitable for early diagnosis and dynamic biomechanical studies and provides a new tool for clinical practice. Future work could combine this assessment with 3D reconstruction technology to further improve spatial resolution and thereby the accuracy of articular cartilage image segmentation.

[0068] In one possible embodiment, step S14, in which the enhanced spatiotemporal correlation features are input into a pre-trained segmentation network that uses dynamic feature weights to weight the features of different modalities and outputs segmentation results, includes:

[0069] Step 141: Input the enhanced spatiotemporal correlation features into the pre-trained segmentation network. The segmentation network divides the enhanced spatiotemporal correlation features into channels according to the time series and spatial distribution to generate time channel feature sets and spatial distribution channel feature sets.

[0070] Here, the enhanced spatiotemporal correlation features are deep feature representations that integrate time-series information and spatial distribution information. The time channel feature set contains feature expressions at different time points, and the spatial distribution channel feature set contains feature expressions at different spatial locations.

[0071] Step 142: Based on the temporal channel feature set and the spatial distribution channel feature set, a hybrid attention mechanism is used to calculate the spatiotemporal correlation strength of each channel and generate dynamic feature weights.

[0072] Dynamic feature weights refer to the importance coefficients of each channel calculated through the attention mechanism, used to characterize the contribution of different spatiotemporal features. The hybrid attention mechanism is a computational method combining channel attention and spatial attention to dynamically evaluate the importance of different feature channels and spatial locations. Channel attention calculates the weights of different channels (such as temporal and spatial channels) to measure their contribution to the final task. Spatial attention calculates the weights of different spatial locations in the feature map, enhancing the response of key regions. The hybrid approach can combine the two attention methods in a serial, parallel, or weighted fusion manner, enabling the model to simultaneously focus on important channels and key spatial regions.
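A serial channel-then-spatial hybrid attention of the kind described above can be sketched in the style of CBAM. This is an assumption, since the application does not name a specific design. Channel attention passes average- and max-pooled descriptors through a shared two-layer MLP. For spatial attention, the learned 7×7 convolution is replaced by a fixed mean filter to keep the sketch self-contained. All weights are random and untrained, and the reduction ratio `r` is hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w1, w2):
    """x: (C, H, W). A shared two-layer MLP scores average- and max-pooled descriptors."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)
    avg = x.mean(axis=(1, 2))
    mx = x.max(axis=(1, 2))
    return sigmoid(mlp(avg) + mlp(mx))                 # (C,) channel weights


def spatial_attention(x, k=7):
    """Pool across channels, then smooth with a k x k mean filter (stand-in for a learned conv)."""
    desc = x.mean(axis=0) + x.max(axis=0)               # (H, W)
    h, w = desc.shape
    p = np.pad(desc, k // 2, mode="edge")
    smoothed = sum(p[i:i + h, j:j + w] for i in range(k) for j in range(k)) / (k * k)
    return sigmoid(smoothed)                            # (H, W) location weights


def hybrid_attention(x, w1, w2):
    """Serial combination: re-weight channels first, then spatial locations."""
    ca = channel_attention(x, w1, w2)
    x_c = x * ca[:, None, None]
    sa = spatial_attention(x_c)
    return x_c * sa[None, :, :], ca, sa


rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 2                               # r: hypothetical MLP reduction ratio
x = rng.normal(size=(C, H, W))
w1 = rng.normal(size=(C // r, C)) * 0.1                 # untrained, illustrative weights
w2 = rng.normal(size=(C, C // r)) * 0.1
out, ca, sa = hybrid_attention(x, w1, w2)
```

A parallel variant would compute both maps from the same input and combine them, which [0072] also allows. The serial order shown here is just one of the listed options.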

[0073] For example, in magnetic resonance imaging (MRI) analysis of articular cartilage, the input consists of a temporal channel feature set (e.g., 16 frames × 256 channels) and a spatial channel feature set (256 × 256 × 256). In the hybrid attention calculation, channel attention assigned higher weights to the channels of frames 8-12 (mid-deformation stage) and to the cartilage contact area, which facilitated the subsequent identification of the contact area between the femoral condyle and the tibial plateau as the key spatial location.

[0074] Step 143: Based on the dynamic feature weights, perform channel-by-channel weighted fusion of the time channel features in the time channel feature set and the spatial distribution channel features in the spatial distribution channel feature set to generate a multimodal fusion feature map.

[0075] The temporal channel feature set comprises temporal features extracted from the MRI sequence and represents the dynamic changes of articular cartilage at consecutive time points; for example, 16 temporal channel features extracted from 16 MRI frames (one channel per frame). The spatial distribution channel feature set comprises spatial features extracted from a single MRI frame and represents the anatomical distribution of the cartilage; for example, a 256×256 feature map containing spatial information such as cartilage thickness and curvature. The multimodal fusion feature map is the unified feature representation obtained after weighted fusion, which preserves the important spatiotemporal information.

[0076] For example, in knee MRI cartilage analysis, the input data consists of a temporal feature set of 16 channels (corresponding to 16 MRI frames), each frame having a feature map size of 256×256×1, and a spatial feature set of 16 channels (e.g., different anatomical layers), each channel also having a size of 256×256×1. The temporal weight is 0.9 for frames 8-12 (knee flexion phase) and 0.5 for the remaining frames. The spatial weight is 0.8 for the cartilage contact area (femoral condyle and tibial plateau) and 0.3 for other areas. The feature maps of frames 8-12 are multiplied by a weight of 0.9, and the remaining frames by 0.5, to enhance the feature response during critical movement phases. Pixels in the cartilage contact area are multiplied by a weight of 0.8, and other areas by 0.3, to highlight deformation-sensitive areas. The weighted temporal features (16 channels) and spatial features (16 channels) are concatenated along the channel dimension to generate a 32-channel fused feature map (256×256×32). The output multimodal fusion feature map contains enhanced flexion temporal features and high-weight contact area spatial features. When used for subsequent deformation region segmentation, the network will pay more attention to the cartilage contact deformation region during movement.
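
The weighting and concatenation in [0076] can be sketched as follows, assuming the temporal and spatial stacks are stored channel-last as (H, W, C) arrays and that the contact-area weight map is applied to the spatial channels. The function name `weighted_fusion` and the rectangular contact mask are hypothetical.

```python
import numpy as np


def weighted_fusion(temporal, spatial, key_frames, contact_mask,
                    w_key=0.9, w_other=0.5, w_contact=0.8, w_noncontact=0.3):
    """temporal, spatial: (H, W, C) stacks; key_frames: 0-based indices of critical frames."""
    t_weights = np.full(temporal.shape[-1], w_other)
    t_weights[key_frames] = w_key
    weighted_t = temporal * t_weights                          # per-frame (channel) weights
    s_map = np.where(contact_mask, w_contact, w_noncontact)    # (H, W) per-pixel weights
    weighted_s = spatial * s_map[..., None]
    return np.concatenate([weighted_t, weighted_s], axis=-1)   # (H, W, 2C)


H = W = 256
temporal = np.ones((H, W, 16))
spatial = np.ones((H, W, 16))
contact = np.zeros((H, W), dtype=bool)
contact[100:150, 80:180] = True            # hypothetical femoral condyle-tibial plateau region
# Frames 8-12 (1-based) correspond to indices 7..11.
fused = weighted_fusion(temporal, spatial, np.arange(7, 12), contact)
```

With all-ones inputs, each entry of `fused` simply equals the weight applied at that frame or pixel, which makes the weighting easy to inspect.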

[0077] Step 144: Perform cross-scale feature aggregation on the multimodal fusion feature map to obtain the aggregation result.

[0078] Cross-scale feature aggregation integrates feature maps of different resolutions (i.e., scales) to fuse local details with global contextual information, thereby addressing the limitations of single-scale features in perceiving small targets (such as cartilage micro-damage) or large-scale deformations (such as whole-joint movement).

[0079] For example, the input multimodal fusion feature map has a size of 256×256×32 (spatial resolution 256×256, 32 fusion channels). Channels 1-16 are weighted temporal dynamic features (e.g., cartilage deformation during flexion), and channels 17-32 are weighted spatial distribution features (e.g., cartilage thickness and contact area). Multi-scale features are then extracted: the original resolution of 256×256×32 preserves high-frequency details; downsampling by 1/2 yields 128×128×32, capturing medium-range context; and downsampling by 1/4 yields 64×64×32, extracting global semantic information. The 1/4-scale features are upsampled to 128×128 and concatenated with the 1/2-scale features (64 channels), and the concatenation is compressed back to 32 channels by a 3×3 convolution to reduce computation. The fused medium-scale features are then upsampled to 256×256 and concatenated with the original-resolution features (64 channels), and a second 3×3 convolution compresses the result to 32 channels while preserving details. The output aggregation result is 256×256×32 (the same resolution as the input) and contains low-scale global information (such as the alignment status of the entire joint), mid-scale local context (such as the cartilage-bone junction), and pixel-level details at the original scale (such as micro-cracks on the cartilage surface).
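
The aggregation path in [0079] might look like the following NumPy sketch. It assumes 2×2 average pooling for downsampling, nearest-neighbour upsampling, and untrained random 3×3 kernels. The text does not specify the pooling or interpolation method, and a real network would learn the kernels.

```python
import numpy as np


def avg_pool2(x):
    """2x2 average pooling on an (H, W, C) array; H and W must be even."""
    H, W, C = x.shape
    return x.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3))


def upsample2(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) array."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def conv3x3(x, k):
    """'Same' 3x3 convolution (cross-correlation, as in deep-learning frameworks).
    x: (H, W, Cin), k: (3, 3, Cin, Cout)."""
    H, W, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((H, W, k.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            out += xp[dy:dy + H, dx:dx + W, :] @ k[dy, dx]
    return out


def cross_scale_aggregate(x, k_mid, k_top):
    s1 = x                        # original resolution, C channels
    s2 = avg_pool2(s1)            # 1/2 resolution
    s4 = avg_pool2(s2)            # 1/4 resolution
    mid = conv3x3(np.concatenate([upsample2(s4), s2], axis=-1), k_mid)   # 2C -> C
    top = conv3x3(np.concatenate([upsample2(mid), s1], axis=-1), k_top)  # 2C -> C
    return top


rng = np.random.default_rng(0)
C = 32
x = rng.standard_normal((64, 64, C))      # smaller than 256x256 to keep the demo fast
k_mid = 0.05 * rng.standard_normal((3, 3, 2 * C, C))
k_top = 0.05 * rng.standard_normal((3, 3, 2 * C, C))
agg = cross_scale_aggregate(x, k_mid, k_top)
```

The output keeps the input resolution and channel count, matching the 256×256×32 → 256×256×32 shape contract described above.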

[0080] Step 145: Based on the preset continuity constraint on the articular cartilage deformation boundary and the aggregation result, perform deformation region prediction to obtain a segmentation result containing the deformation region of the articular cartilage.

[0081] The continuity constraint is a mathematical constraint based on the physiological characteristics of articular cartilage (such as smoothness and anatomical connectivity) that ensures the segmentation results conform to actual biomechanical behavior and prevents unreasonable breaks or abrupt changes in the predicted deformation regions (e.g., a cartilage fissure should not suddenly jump across discontinuous areas). Deformation region prediction maps the aggregated features to a binary mask (0/1) through a segmentation network, marking cartilage deformation regions (such as thinning or fibrosis). The final binary image of the deformation region in the segmentation result is used to quantify the extent of damage or to guide clinical intervention.

[0082] For example, in knee MRI cartilage segmentation, the input is the aggregation result, a 256×256×32 multi-scale feature map. Under the anatomical constraint, cartilage deformation is allowed only at the joint contact surface (such as the femoral condyle-tibial plateau interface). The aggregation result is fed into a 3-layer convolutional network, which outputs a 256×256×1 probability map (each pixel value lies in 0~1 and represents the probability of deformation). Conditional random field optimization penalizes isolated noise points and non-smooth boundaries, and morphological post-processing (such as a closing operation) fills small holes to ensure the connectivity of the cartilage region. The output segmentation result is a 256×256 binary mask, in which the white area (1) is deformed cartilage and the black area (0) is normal tissue.
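
A sketch of the thresholding and morphological post-processing in [0082], assuming a 0.5 probability threshold and a 3×3 square structuring element. The conditional random field step is omitted, and `contact_mask` stands in for the anatomical constraint; both simplifications are assumptions.

```python
import numpy as np


def _shift_stack(mask):
    """All 9 shifts of a boolean mask within a 3x3 neighbourhood (zero padding at borders)."""
    H, W = mask.shape
    p = np.pad(mask, 1)
    return np.stack([p[dy:dy + H, dx:dx + W] for dy in range(3) for dx in range(3)])


def dilate(mask):
    return _shift_stack(mask).any(axis=0)


def erode(mask):
    return _shift_stack(mask).all(axis=0)


def closing(mask):
    """Morphological closing (dilation then erosion) with a 3x3 square element."""
    return erode(dilate(mask))


def postprocess(prob, contact_mask, threshold=0.5):
    """prob: (H, W) deformation probabilities; contact_mask: (H, W) allowed region."""
    binary = (prob >= threshold) & contact_mask   # deformation only on contact surfaces
    return closing(binary).astype(np.uint8)       # fill small holes and gaps
```

Because of the zero padding, regions touching the image border can lose their outermost pixels after closing; cartilage masks usually lie well inside the field of view, so this sketch does not handle that case.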

[0083] Here is a specific example of cartilage deformation segmentation in knee MRI. A dynamic knee MRI sequence (32 frames, 256×256 resolution) containing cartilage deformation information throughout the flexion-extension movement is input, and enhanced spatiotemporal correlation features are obtained from it. Frame-by-frame segmentation generates 32 temporal channels (each frame corresponding to a 256×256 feature map) that capture the dynamic changes of cartilage during movement, and anatomical segmentation generates 32 spatial channels (e.g., cartilage thickness and curvature) that describe the static structural distribution. The outputs are a temporal channel feature set (32×256×256) and a spatial channel feature set (32×256×256). In the hybrid attention calculation, mid-flexion frames (frames 12-20) receive a higher weight (0.9) and resting frames (frames 1-5) a lower weight (0.3); the cartilage contact area (femoral condyle-tibial plateau) receives a weight of 0.8 and the non-contact area a weight of 0.2. The output dynamic feature weight matrix (32×256×256) identifies the key spatiotemporal features. Weighted fusion of the temporal and spatial features then produces a multimodal fusion feature map of 256×256×64. Multi-scale fusion preserves micro-crack details on the cartilage surface at the original scale (256×256), captures the medium-range context of the cartilage-bone junction at 1/2 downsampling (128×128), and integrates the global semantics of whole-joint motion at 1/4 downsampling (64×64); the output aggregation result (256×256×32) thus fuses multi-resolution features. The segmentation network takes the aggregation result and the continuity constraint as input and outputs an initial probability map showing candidate deformation regions, which is refined by morphological closing to eliminate isolated noise points and ensure continuous deformation boundaries. The final segmentation result (a 256×256 binary mask) raises the deformation region coefficient to 0.91.

[0084] By executing steps 141-145, the method in this embodiment achieves high-precision and robust segmentation of articular cartilage deformation regions. Enhanced spatiotemporal correlation features are decoupled into temporal channel feature sets and spatial distribution channel feature sets through spatiotemporal feature channel segmentation, capturing the dynamic deformation process and static anatomical structure of cartilage, respectively. A hybrid attention mechanism is used to calculate the spatiotemporal correlation strength and generate dynamic feature weights, enabling the model to adaptively focus on key time frames and important spatial regions. Multi-resolution information is integrated through cross-scale feature aggregation, preserving both the local features of the cartilage surface microstructure and the global context of the overall joint movement. Under the guidance of continuity constraints, the aggregation results are optimized and predicted to ensure that the segmentation results conform to the physiological characteristics of cartilage, outputting anatomically reasonable deformation region segmentation results. The overall process improves the sensitivity and segmentation accuracy of cartilage microdeformation detection, providing a reliable image analysis method for early diagnosis and precise treatment planning of osteoarthritis.

[0085] In one possible embodiment, step 142, based on the temporal channel feature set and the spatial distribution channel feature set, uses a hybrid attention mechanism to calculate the spatiotemporal correlation strength of each channel and generate dynamic feature weights, including:

[0086] Step a1: Perform multi-dimensional correlation analysis between the time channel feature set and the spatial distribution channel feature set to generate a cross-modal correlation matrix.

[0087] The spatial distribution channel feature set consists of spatial features (e.g., 256×256×32) extracted from a single MRI frame and describes static properties such as cartilage thickness and curvature. The cross-modal correlation matrix (32×32) is generated through correlation analysis and quantifies the coupling strength between the temporal and spatial channels.

[0088] For example, in dynamic MRI analysis of the knee joint, the input data consists of 32 time channels (corresponding to 32 frames of dynamic MRI), each containing a 256×256 feature map; the feature map of frame 15 (mid-flexion) highlights the cartilage compression area. The 32 spatial channels comprise channels 1-16, the cartilage thickness distribution (e.g., thickest point of the femoral condyle = 1.0, edge = 0.2), and channels 17-32, curvature features (e.g., contact area curvature = 0.8, flat area = 0.1). For each pair of a time channel (T_i) and a spatial channel (S_j), the mutual information, which measures the difference between their joint distribution and the product of their independent distributions, is calculated: MI(T_i, S_j) = ∑ p(T_i, S_j) log [p(T_i, S_j) / (p(T_i)p(S_j))]. For example, MI(T_15, S_5) = 0.75 (time frame 15 is strongly correlated with thickness channel 5, reflecting local thinning due to compression), while MI(T_1, S_20) = 0.10 (resting frame 1 is unrelated to curvature channel 20). A 32×32 correlation matrix is constructed, with rows corresponding to the time channels and columns to the spatial channels. The high-value regions of the output cross-modal correlation matrix indicate strong coupling between cartilage deformation and specific anatomical properties.
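
The mutual information formula above can be estimated from feature maps with a joint histogram. This sketch treats the pixels of each channel as samples and uses 16 bins, both assumptions. Its raw MI values are in nats and are not bounded by 1, so a normalization step (not specified in the text) would be needed to obtain values in 0~1 like those in the example.

```python
import numpy as np


def mutual_information(x, y, bins=16):
    """Plug-in histogram estimate of MI (nats) between two feature maps treated as samples."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)          # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)          # marginal of y, shape (1, bins)
    nz = pxy > 0                                 # empty cells contribute 0 * log 0 = 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def cross_modal_mi_matrix(temporal, spatial, bins=16):
    """temporal: (Nt, H, W), spatial: (Ns, H, W) -> (Nt, Ns) matrix, rows = time channels."""
    return np.array([[mutual_information(t, s, bins) for s in spatial] for t in temporal])
```

For 32 time channels and 32 spatial channels this returns the 32×32 matrix described above; histogram estimates are biased upward for small samples, so independent channels will not score exactly zero.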

[0089] Step a2: Based on the preset principal stress direction consistency constraint and the preset deformation boundary continuity constraint of articular cartilage, directionally correct the spatiotemporal correlation strength of each channel in the cross-modal correlation matrix.

[0090] Among these constraints, the principal stress direction consistency constraint requires that the dominant stress direction of cartilage collagen fibers under load be aligned with the anatomical structure (e.g., the radial arrangement of femoral condyle fibers). The deformation boundary continuity constraint requires that the deformed area of cartilage transition smoothly, without abrupt changes.

[0091] For example, in the dynamic analysis of knee cartilage, the input data is an original cross-modal correlation matrix (32×32), whose rows represent time channels (e.g., T_15 is mid-flexion), whose columns represent spatial channels (e.g., S_5 is femoral condyle thickness), and whose values represent correlation strength (0~1). The preset principal stress direction constraint is that the femoral condyle collagen fibers are mainly arranged radially (angle 0°~30°), and the continuity threshold requires the correlation strength difference between adjacent spatial channels to be < 0.2. The principal stress direction consistency correction finds that the original correlation value between T_15 (compression phase) and S_10 (transverse shear channel) is 0.7, but this direction is perpendicular to the radial fiber orientation, indicating a non-physiological coupling; multiplying this value by a decay factor of 0.3 gives a corrected correlation value of 0.7 × 0.3 = 0.21. The correlation value between S_5 (thickness channel) and T_15 is 0.8, but that of its adjacent channel S_6 drops sharply to 0.3 (difference 0.5 > threshold 0.2); Gaussian filtering applied to the correlation values of S_5 and S_6 yields S_5 = 0.6 and S_6 = 0.5. The output corrected correlation matrix retains only strong correlations consistent with the fiber orientation (such as radial compression vs. thickness variation), and the correlation values change gradually within the anatomical neighborhood.
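
A hedged sketch of the two corrections in [0091]. Which (time, spatial) pairs are non-physiological is assumed to be known in advance, and the continuity step uses a simple pairwise blend with `alpha=0.4` (chosen because it reproduces the S_5 = 0.6, S_6 = 0.5 example) as a stand-in for the unspecified Gaussian filter.

```python
import numpy as np


def direction_and_continuity_correction(corr, nonphysio_pairs, decay=0.3,
                                        threshold=0.2, alpha=0.4):
    """corr: (Nt, Ns) correlation matrix; nonphysio_pairs: (t, s) index pairs whose
    coupling direction conflicts with the fibre orientation (assumed known)."""
    out = np.array(corr, dtype=float)
    for t, s in nonphysio_pairs:
        out[t, s] *= decay                       # principal stress direction consistency
    for t in range(out.shape[0]):                # continuity along adjacent spatial channels
        for s in range(out.shape[1] - 1):        # single left-to-right pass
            diff = out[t, s] - out[t, s + 1]
            if abs(diff) > threshold:
                out[t, s] -= alpha * diff        # pull the two neighbours toward each other
                out[t, s + 1] += alpha * diff
    return out
```

The single left-to-right pass is order-dependent; a symmetric smoothing kernel such as the Gaussian filter named in the text avoids that, at the cost of no longer matching the example values exactly.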

[0092] Step a3: Normalize the spatiotemporal correlation strength of each corrected channel to obtain the normalized spatiotemporal correlation strength of each channel.

[0093] The normalization process scales the corrected correlation strength to the [0,1] range to ensure the comparability of weights.

[0094] For example, the input is the corrected correlation matrix. For each time channel, the correlation values of all spatial channels are normalized and then processed with Softmax so that they sum to 1, and the normalized correlation matrix is output.
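
Row-wise normalization followed by Softmax can be written as below. Min-max scaling to [0, 1] and the `temperature` parameter are assumptions about how the first normalization step is done.

```python
import numpy as np


def row_softmax(z, temperature=1.0):
    """Softmax along each row (time channel) so the spatial-channel weights sum to 1."""
    z = z / temperature
    z = z - z.max(axis=1, keepdims=True)          # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def normalize_rows(corr, temperature=1.0):
    """Min-max scale each row to [0, 1], then apply a row-wise Softmax."""
    lo = corr.min(axis=1, keepdims=True)
    hi = corr.max(axis=1, keepdims=True)
    scaled = (corr - lo) / np.where(hi > lo, hi - lo, 1.0)   # constant rows become all zeros
    return row_softmax(scaled, temperature)
```

Softmax over values in [0, 1] gives fairly flat weights at `temperature=1.0`; a smaller temperature sharpens the contrast between strongly and weakly coupled channels.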

[0095] Step a4: Based on the spatiotemporal correlation strength of each channel after normalization, and combined with the pre-set viscoelastic property constraints of articular cartilage, generate channel-level dynamic feature weights.

[0096] Among these, the viscoelastic property constraint requires the stress-strain relationship of cartilage to conform to a quasi-linear viscoelastic model. The dynamic feature weights are the final channel-level weights used for weighted fusion of spatiotemporal features.

[0097] For example, the normalized correlation matrix is as follows: for time channel T_15, S_5 (thickness) is 0.55 and S_8 (curvature) is 0.45; for time channel T_20, S_5 (thickness) is 0.38 and S_8 (curvature) is 0.62. The viscoelastic constraint parameters specify that high-frequency channels (such as T_15, corresponding to rapid compression) are strengthened (weight × 1.2) and low-frequency channels (such as T_1, the quiescent period) are weakened (weight × 0.8). In dynamic weight generation, the normalized value of each time channel is multiplied by a frequency response coefficient: adjusted weight = normalized value × (1 + 0.2 × sin(2πft)), where f is the center frequency of the channel and t is time. Thus T_15-S_5: 0.55 × 1.2 = 0.66 (high-frequency strengthening) and T_1-S_5: 0.10 × 0.8 = 0.08 (low-frequency suppression). After modulation, the output channel weights are adjusted to 0.63 for T_15-S_5 and 0.37 for T_15-S_8. Averaging over the spatial channels gives the temporal channel weights: W_T15 = (0.63 + 0.37) / 2 = 0.50 and W_T20 = (0.30 + 0.70) / 2 = 0.50. The final weight vector is [0.50, 0.50, ...], a 32-dimensional vector.
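
The frequency-response formula in [0097] can be sketched as follows. The factors × 1.2 and × 0.8 in the example correspond to sin(2πft) = +1 and -1 with a modulation depth of 0.2; the function names and the choice of `t` are illustrative assumptions.

```python
import numpy as np


def viscoelastic_modulation(norm_corr, freqs, t, depth=0.2):
    """norm_corr: (Nt, Ns) row-normalised matrix; freqs: (Nt,) channel centre frequencies (Hz);
    t: time (s). adjusted = normalized * (1 + depth * sin(2*pi*f*t)), applied per time channel."""
    factor = 1.0 + depth * np.sin(2 * np.pi * np.asarray(freqs) * t)
    return norm_corr * factor[:, None]


def temporal_channel_weights(adjusted):
    """Average each time channel's weights over the spatial channels -> (Nt,) vector."""
    return adjusted.mean(axis=1)


# T_15 row from the example, evaluated where sin(2*pi*f*t) = 1 (f = 1 Hz, t = 0.25 s).
adj = viscoelastic_modulation(np.array([[0.55, 0.45]]), [1.0], t=0.25)
```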

[0098] By executing steps a1-a4, this embodiment of the application generates dynamic feature weights that conform to the biomechanical properties of cartilage through cross-modal correlation analysis, direction and continuity constraint correction, normalization, and viscoelastic adjustment. This embodiment shows that it can accurately enhance key spatiotemporal features (such as the mid-flexion contact area), improve segmentation accuracy, and ensure anatomical rationality, making it suitable for early diagnosis and surgical planning of osteoarthritis.

[0099] In one possible embodiment, step a2, based on the preset consistency constraint of the principal stress direction of articular cartilage and the preset continuity constraint of the deformation boundary of articular cartilage, performs directional correction on the spatiotemporal correlation strength of each channel in the cross-modal correlation matrix, including:

[0100] Step b1: Extract the temporal channel features and spatial distribution channel features of each channel from the cross-modal correlation matrix.

[0101] Temporal features are extracted for each row (time channel) of the correlation matrix by using the Fast Fourier Transform to obtain the dominant frequency, and spatial features are extracted for each column (spatial channel) by calculating the thickness gradient direction. The resulting temporal channel feature set is {T_i: (dominant frequency, energy percentage)}, and the spatial channel feature set is {S_j: (gradient direction, boundary sharpness)}.
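
For step b1, the dominant frequency and its energy share can be taken from an FFT power spectrum, and the gradient direction from finite differences. This sketch assumes each time channel is available as a uniformly sampled 1-D signal with a known sampling rate `fs`, and summarizes the direction of a thickness map by its mean gradient vector; both are assumptions.

```python
import numpy as np


def dominant_frequency(signal, fs):
    """Return (dominant frequency in Hz, share of non-DC spectral energy in that bin)."""
    spec = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    k = int(np.argmax(spec[1:])) + 1             # skip the DC bin
    return float(freqs[k]), float(spec[k] / spec[1:].sum())


def gradient_direction(thickness_map):
    """Mean thickness-gradient direction in degrees (0° = along columns, 90° = along rows)."""
    gy, gx = np.gradient(thickness_map)          # derivatives along rows, then columns
    return float(np.degrees(np.arctan2(gy.mean(), gx.mean())))
```

The boundary direction used in step b2 is then this gradient direction plus 90°.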

[0102] Step b2: Calculate the angle between the principal stress direction of the time channel feature and the deformation boundary direction of the spatial distribution channel feature for each channel, and generate the directional deviation.

[0103] The deformation boundary direction is perpendicular to the thickness gradient in the spatial channel features (e.g., gradient direction = 30°, boundary direction = 120°). The directional deviation is derived from the cosine of the angle between the two directions and lies in 0~1, where 0 indicates complete agreement and 1 indicates complete deviation (perpendicular directions). It is calculated for each channel pair (T_i, S_j) as: deviation = 1 - |cos(principal stress direction - deformation boundary direction)|. For example, if the principal stress direction = 0° and the boundary direction = 120°, the deviation = 1 - |cos(120°)| = 0.5. A deviation matrix of the same size as the correlation matrix (32×32) is generated to store the deviation value of each channel pair.
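
The deviation formula of [0103] vectorizes directly over all channel pairs. The per-channel directions are assumed to be given in degrees, for example from the step b1 features.

```python
import numpy as np


def boundary_from_gradient(gradient_dirs_deg):
    """Deformation boundary direction is perpendicular to the thickness gradient."""
    return (np.asarray(gradient_dirs_deg, dtype=float) + 90.0) % 180.0


def deviation_matrix(stress_dirs_deg, boundary_dirs_deg):
    """stress_dirs_deg: (Nt,) principal stress direction per time channel;
    boundary_dirs_deg: (Ns,) boundary direction per spatial channel.
    deviation = 1 - |cos(angle)|: 0 when parallel, 1 when perpendicular."""
    diff = np.radians(np.asarray(stress_dirs_deg, dtype=float)[:, None]
                      - np.asarray(boundary_dirs_deg, dtype=float)[None, :])
    return 1.0 - np.abs(np.cos(diff))
```

For 32 time channels and 32 spatial channels this yields the 32×32 deviation matrix aligned element-for-element with the correlation matrix.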

[0104] Step b3: Generate directional deviation correction weights based on directional deviation and preset viscoelastic properties of articular cartilage.

[0105] The directional deviation correction weight is an attenuation coefficient (0~1) generated from the deviation and the viscoelastic parameters, used to suppress non-physiological correlations.

[0106] For example, the weights are calculated as follows: if the deviation > 0.3 (i.e., the included angle exceeds about 46°) and the channel frequency > 1 Hz (high-frequency deformation), the weight = 0.2; if the deviation < 0.1 and the frequency < 0.5 Hz, the weight = 1.0 (retained). The output correction weight matrix has the same size as the deviation matrix and marks the channel pairs that need to be attenuated.
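
The two rules in [0106] can be expressed as masks over the deviation matrix. The text does not say what weight other pairs receive, so `default=1.0` is an assumption; the usage lines apply one correction pass to the T_15 example of [0109].

```python
import numpy as np


def correction_weights(deviation, freqs, default=1.0):
    """deviation: (Nt, Ns); freqs: (Nt,) dominant frequency of each time channel (Hz).
    deviation > 0.3 and f > 1 Hz -> 0.2; deviation < 0.1 and f < 0.5 Hz -> 1.0;
    all other pairs get `default` (assumed, not specified in the text)."""
    f = np.asarray(freqs, dtype=float)[:, None]
    w = np.full(deviation.shape, default, dtype=float)
    w[(deviation > 0.3) & (f > 1.0)] = 0.2
    w[(deviation < 0.1) & (f < 0.5)] = 1.0
    return w


# One correction pass on the [0109] example: columns are T_15-S_5 and T_15-S_10.
corr = np.array([[0.6, 0.7]])
dev = np.array([[0.0, 1.0]])
corrected = corr * correction_weights(dev, freqs=[2.0])   # approximately [[0.6, 0.14]]
```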

[0107] Step b4: Based on the directional deviation correction weight, iteratively correct the spatiotemporal correlation strength of each channel in the cross-modal correlation matrix until the spatial distribution relationship between the principal stress direction and the deformation boundary direction satisfies the preset viscoelastic property constraint condition of the articular cartilage, and obtain the corrected spatiotemporal correlation strength of each channel.

[0108] Iterative correction adjusts the correlation strength over multiple passes so that the relationship between the principal stress direction and the boundary direction gradually satisfies the viscoelastic constraint.

[0109] Here is a specific example: in knee cartilage analysis, the original input correlation matrix contains T_15-S_5 = 0.6 and T_15-S_10 = 0.7 (lateral shear channel), with directional features T_15 dominant frequency = 2 Hz (high frequency), S_5 boundary direction = 30° (consistent with the principal stress), and S_10 boundary direction = 90° (perpendicular). The deviations are T_15-S_5 = 0 and T_15-S_10 = 1, so the generated correction weight for T_15-S_10 is 0.2 (high frequency and large deviation). After correction, T_15-S_10 = 0.7 × 0.2 = 0.14, and the iteration continues until all deviations are < 0.1. The output corrected correlation matrix contains T_15-S_5 = 0.6 (preserved) and T_15-S_10 = 0.14 (suppressed).

[0110] By executing steps b1-b4, this embodiment of the application quantifies directional deviation and couples viscoelastic constraints to iteratively correct the cross-modal correlation matrix, ensuring that dynamic weights reinforce only physiologically reasonable feature interactions (such as radial stress-thickness coupling). As this embodiment shows, the correction reduces interference from non-physiological correlations and improves the reliability of cartilage lesion detection.

[0111] In one possible embodiment, step 143, based on dynamic feature weights, performs channel-by-channel weighted fusion of the time channel features in the time channel feature set and the spatial distribution channel features in the spatial distribution channel feature set to generate a multimodal fusion feature map, including:

[0112] Step c1: Correct the dynamic feature weights based on the mechanical properties of the stress concentration area of articular cartilage.

[0113] The stress concentration region of articular cartilage is a mechanically sensitive area (such as the center of the femoral condyle contact surface) determined by finite element analysis or strain energy density mapping. Mechanical properties include quasi-linear viscoelastic parameters and strain rate sensitivity. Dynamic feature weight correction adjusts the weight values based on these mechanical properties to enhance the feature contribution of the stress concentration region. Optionally, if the difference between the temporal fluctuation amplitude of the time channel feature and the stress gradient amplitude of the spatial distribution channel feature exceeds a preset ratio, the dynamic feature weight is reduced by the viscoelastic attenuation coefficient; if the principal stress direction of the time channel feature coincides with the deformation boundary direction of the spatial distribution channel feature, the dynamic feature weight is increased by the dynamic coupling coefficient.

[0114] For example, a binary mask is generated using finite element simulation results (1 = stress concentration area, such as a region with pressure > 2 MPa). For spatial channels located in stress concentration areas (such as S_5), the weight is multiplied by 1.5; for high-frequency time channels (such as T_15 with f > 1 Hz), the weight is multiplied by 1.2. The output corrected weights are: original weight W_S5 = 0.6, corrected weight 0.6 × 1.5 = 0.9.

[0115] Step c2: Based on the corrected dynamic feature weights, the time channel features in the time channel feature set and the spatial distribution channel features in the spatial distribution channel feature set are nonlinearly superimposed channel by channel to generate a multimodal fusion feature map.

[0116] Nonlinear superposition uses element-level nonlinear operations (such as power fusion) instead of simple weighted summation to capture complex interactions between features.

[0117] Here is a specific example: During knee cartilage segmentation, the input dynamic feature weights are: W_T15=0.9 (high-frequency flexion phase), W_S5=0.9 (femoral condyle thickness); the stress concentration mask is the femoral condyle contact area (50×50 pixel region with a value of 1). The corrected dynamic feature weights are: W_S5=0.9×1.5=1.35 (truncated to 1.0), W_T15=0.9×1.2=1.08. The dynamic feature weights are adjusted in the stress concentration area, and then channel-by-channel nonlinear superposition is performed to generate a multimodal fusion feature map.
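
Steps c1 and c2 might be sketched as below. The boost factors 1.5 and 1.2 and the cap on spatial weights follow [0114] and [0117] (which truncate only W_S5). The "power fusion" is assumed here to be a channel-wise weighted power mean with exponent `gamma`, since the exact nonlinear operator is not given.

```python
import numpy as np


def correct_weights(w_spatial, w_temporal, stress_mask, high_freq_mask,
                    k_stress=1.5, k_freq=1.2, spatial_cap=1.0):
    """Step c1: boost spatial channels in stress-concentration areas and high-frequency
    time channels; spatial weights are capped as in the W_S5 = 1.35 -> 1.0 example."""
    ws = np.where(stress_mask, w_spatial * k_stress, w_spatial)
    wt = np.where(high_freq_mask, w_temporal * k_freq, w_temporal)
    return np.minimum(ws, spatial_cap), wt


def power_fusion(temporal, spatial, wt, ws, gamma=2.0):
    """Step c2 (assumed operator): channel-wise weighted power mean of non-negative
    (C, H, W) feature stacks, pairing time channel c with spatial channel c."""
    t = np.clip(temporal, 0.0, None) ** gamma
    s = np.clip(spatial, 0.0, None) ** gamma
    mixed = wt[:, None, None] * t + ws[:, None, None] * s
    return mixed ** (1.0 / gamma)
```

With `gamma=1` the fusion reduces to an ordinary weighted sum; larger `gamma` lets the stronger of the two responses dominate each pixel, which is one way to obtain the nonlinear interaction described in [0116].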

[0118] By executing steps c1-c2, this embodiment of the application enhances the ability of the feature map to express cartilage stress concentration areas through mechanical property-driven weight correction and nonlinear fusion. This embodiment shows that the fused features enable the segmentation network to achieve a sensitivity of 95% to early micro-damage (e.g., <1mm cracks), with a mechanical rationality violation rate of less than 3%, providing clinical analysis results that are both accurate and conform to biomechanical principles.

[0119] In one possible embodiment, S12, identifying the spatiotemporal correlation features between piezoelectric data and polarized light stress data, includes:

[0120] Step 121: Perform spatiotemporal registration of piezoelectric data and polarized light stress data along the mesh elements on the surface of articular cartilage to generate spatiotemporally synchronized joint piezoelectric and polarized light stress data.

[0121] The articular cartilage surface mesh elements are triangular / quadrilateral meshes generated through 3D reconstruction, used for spatial discretization analysis. The piezoelectric data consists of the voltage time series (sampling rate 1kHz) at each mesh vertex, reflecting the piezoelectric response under dynamic load. Spatiotemporal registration unifies the two types of data into the same spatiotemporal coordinate system; registration can be based on mesh location, timestamps, etc.
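
Temporal registration by timestamp can be sketched as linear interpolation of each mesh vertex's 1 kHz piezoelectric series onto the polarized light frame times. The sketch assumes spatial registration onto the same mesh is already done, so that row i of both arrays refers to the same grid cell; the function names are hypothetical.

```python
import numpy as np


def register_to_frames(piezo, piezo_t, frame_t):
    """Resample each cell's piezoelectric series onto the polarized light frame times.
    piezo: (n_cells, n_samples); piezo_t: (n_samples,) increasing, in s; frame_t: (n_frames,) in s."""
    return np.stack([np.interp(frame_t, piezo_t, row) for row in piezo])


def joint_data(piezo, piezo_t, polar, frame_t):
    """polar: (n_cells, n_frames) stress-direction values already mapped to the same mesh.
    Returns (n_cells, n_frames, 2): channel 0 piezoelectric, channel 1 polarized light."""
    piezo_sync = register_to_frames(piezo, piezo_t, frame_t)
    return np.stack([piezo_sync, polar], axis=-1)
```

Linear interpolation does not low-pass filter the 1 kHz signal before sampling it at the slower frame rate; if aliasing matters, the piezoelectric series would need to be filtered first.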

[0122] Step 122: For each grid cell on the articular cartilage surface, extract the spatiotemporal dynamic fluctuation features of the piezoelectric signal and the polarized light stress direction distribution features from the joint piezoelectric-polarized light stress data.

[0123] Piezoelectric feature extraction performs a wavelet transform on the joint data of each grid cell and calculates the energy percentage in the 1-5 Hz band; for example, the energy percentage in the femoral condyle center cell reaches 85%, indicating a concentrated high-frequency load. Polarized light feature extraction calculates distribution parameters from the joint data of the neighboring 5×5 grid cells; for example, the dispersion of the polarized light stress direction distribution is σ² < 0.1 in a normal area and σ² > 0.3 in a lesion area, indicating directional disorder. The output feature set consists of one feature vector per grid cell. Here σ² denotes the variance of the polarized light stress direction distribution, i.e., σ is the standard deviation of the orientation angles of all pixels in the region.
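
The two features of [0123] might be computed as follows. This sketch swaps in an FFT power spectrum for the wavelet transform named in the text, and computes σ² as the plain variance of the orientation angles in radians, which assumes the angles do not straddle the 0°/180° wrap-around.

```python
import numpy as np


def band_energy_share(signal, fs, f_lo=1.0, f_hi=5.0):
    """Fraction of non-DC spectral energy in [f_lo, f_hi] Hz (FFT in place of a wavelet transform)."""
    spec = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return float(spec[band].sum() / spec[1:].sum())


def orientation_variance(angles_deg):
    """sigma^2 of the stress-direction angles (radians) pooled over a 5x5-cell neighbourhood."""
    return float(np.var(np.radians(np.asarray(angles_deg, dtype=float))))
```

For axial data spread across the wrap-around, a circular statistic on doubled angles would be the more robust dispersion measure.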

[0124] Step 123: Use the dynamic mode decomposition algorithm to extract the coupled modes of spatiotemporal dynamic fluctuation features and polarization stress direction distribution features in the frequency domain, and generate spatiotemporal correlation features based on the energy ratio of the coupled modes.

[0125] The Dynamic Mode Decomposition (DMD) algorithm includes a singular value decomposition (SVD) step, which follows existing formulas and is not elaborated here. DMD is an algorithm for extracting dominant vibration modes from spatiotemporal data. Each mode contains a frequency f, a decay rate λ, and a spatial mode Φ. Coupled modes are those that simultaneously include components of piezoelectric fluctuation and stress direction co-variation. The energy percentage represents the contribution of a mode to the total system energy and is used to select key modes. The DMD process includes singular value decomposition, construction of a low-dimensional projection matrix, solving for eigenvalues and modes, reconstruction of the DMD modes, and finally calculation of the energy percentage of the coupled modes.
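
The DMD steps listed in [0125] correspond to the standard "exact DMD" algorithm, sketched below in NumPy. Measuring mode energy as |b|²‖Φ‖², with amplitudes b fitted to the first snapshot, is one common choice and an assumption here; the synthetic two-frequency data stand in for the 10,000 × 100 matrix of [0126].

```python
import numpy as np


def dmd(X, dt, rank):
    """Exact DMD. X: (n_features, n_snapshots), columns sampled every dt seconds.
    Returns frequencies (Hz), growth rates (1/s), modes Phi (n_features, rank), energy shares."""
    X1, X2 = X[:, :-1], X[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, V = U[:, :rank], s[:rank], Vh[:rank].conj().T
    A_tilde = U.conj().T @ X2 @ V / s                # low-dimensional projection U* X2 V S^-1
    eigvals, W = np.linalg.eig(A_tilde)
    Phi = X2 @ V / s @ W                             # exact DMD modes
    omega = np.log(eigvals.astype(complex)) / dt     # continuous-time eigenvalues
    freqs = np.abs(omega.imag) / (2 * np.pi)
    growth = omega.real                              # negative values mean decaying modes
    b = np.linalg.lstsq(Phi, X[:, 0], rcond=None)[0] # mode amplitudes at t = 0
    energy = np.abs(b) ** 2 * np.sum(np.abs(Phi) ** 2, axis=0)
    return freqs, growth, Phi, energy / energy.sum()


dt = 0.01
t = np.arange(200) * dt
rng = np.random.default_rng(0)
u1, v1, u2, v2 = rng.standard_normal((4, 20))
X = (np.outer(u1, np.cos(2 * np.pi * 2 * t)) + np.outer(v1, np.sin(2 * np.pi * 2 * t))
     + 0.5 * np.outer(u2, np.cos(2 * np.pi * 5 * t)) + 0.5 * np.outer(v2, np.sin(2 * np.pi * 5 * t)))
freqs, growth, Phi, share = dmd(X, dt, rank=4)
```

Each real oscillation appears as a complex-conjugate pair of modes, so the synthetic 2 Hz and 5 Hz signals need rank 4.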

[0126] In specific applications of articular cartilage analysis, the exemplary input data is a piezoelectric-polarized light joint data matrix X, with a size of 10,000 × 100. Each row represents the characteristics of a grid cell (e.g., [wavelet energy, mean angle θ_mean of polarized light stress direction, variance σ² of polarized light stress direction distribution]), and each column represents the full-field data at a single time point. Dynamic mode decomposition outputs the following information: the frequency of the first dominant mode is f1 = 2 Hz, corresponding to the load phase in the gait cycle; the frequency of the third dominant mode is f3 = 5 Hz, possibly indicating abnormal vibrations caused by lesions. The spatial pattern of the first mode has high amplitude in the femoral condyle contact area (strong mechanical-structural coupling); the spatial pattern of the third mode shows a sudden increase in amplitude in the marginal region (collagen network decoupling). The energy proportion of the first mode is 18%, dominating the normal load response; the energy proportion of the third mode is 12%, possibly indicating early lesions.

[0127] Here is a specific example:

[0128] In the dynamic loading test of the knee joint, the input data included piezoelectric data: 10,000 grid cells × 100 time points, sampling rate 1kHz; and polarized light data: stress pattern from OCT scan (10Hz, 100 frames). After registration, the piezoelectric signal amplitude of the femoral condyle contact area grid (cell ID: 5000~6000) increased by 3 times. The features of cell 5012 included: wavelet energy = 0.82, θ_mean = 15°, σ² = 0.05, indicating healthy; the features of cell 5803 included: wavelet energy = 0.35, θ_mean = 60°, σ² = 0.28, indicating lesion. DMD extracted three key modes (energy percentage 12%~18%), showing: the first mode has a frequency of 2Hz, indicating strong coupling in the femoral condyle center; the third mode has a frequency of 5Hz, indicating decoupling in the peripheral region, a sign of lesion. The output is a spatiotemporal correlation feature, i.e., a 10,000×3 matrix, including spatial patterns of three modes. During lesion detection, it was found that the Φ value of the 5803rd unit in the third mode was >0.8 (abnormal), which is consistent with the actual results. The Φ value is the spatial pattern amplitude extracted by dynamic mode decomposition, which is used to quantify the spatial response intensity of articular cartilage under specific mechanical modes.

[0129] By executing steps 121-123, this embodiment of the application achieves deep coupling of piezoelectric and polarized light data through high-precision spatiotemporal registration, multimodal feature decoupling, and DMD modal analysis. The embodiment demonstrates that the generated spatiotemporal correlation features can quantify cartilage biomechanical-structural synergy, achieving a detection sensitivity of 93% for early lesions, and can locate the initiation point of micro-injuries (such as the femoral condyle margin decoupling zone), providing clinical analysis tools with both spatiotemporal resolution and mechanical interpretability.

[0130] In one possible embodiment, S13, feature enhancement is performed on the spatiotemporal correlation features to obtain enhanced spatiotemporal correlation features, including:

[0131] Step 131: Perform wavelet packet transform on the spatiotemporal correlation features to obtain the high-frequency stress fluctuation component and the low-frequency viscoelastic relaxation component.

[0132] The wavelet packet transform is a time-frequency analysis method that adaptively divides frequency bands, providing finer frequency resolution than the traditional wavelet transform. High-frequency stress fluctuation components correspond to transient mechanical responses with frequencies >1 Hz (such as rapid deformation caused by gait impact). Low-frequency viscoelastic relaxation components correspond to slow recovery processes with frequencies <0.5 Hz (such as cartilage creep).

[0133] For example, wavelet packet decomposition uses a wavelet basis to perform a 3-level decomposition on the spatiotemporal correlation features (10,000×3 matrix), resulting in 8 sub-bands. The high-frequency band (1~5Hz) contains stress fluctuations; the low-frequency band (0~0.5Hz) contains viscoelastic relaxation. Component extraction: the high-frequency sub-band (nodes (3,1)-(3,4)) is reconstructed as high-frequency components; the low-frequency sub-band (nodes (3,7)-(3,8)) is reconstructed as low-frequency components.
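
A three-level wavelet packet decomposition with reconstruction of selected sub-bands might look like this, assuming the Haar basis (the text does not name the basis) and a signal length divisible by 8. Leaves are returned in natural order, where leaf 0 is always the lowest band but the other indices are not sorted by frequency, so mapping the node numbers (3,1)-(3,8) of [0133] to leaves is left as an assumption.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def wpt_haar(x, level):
    """Full Haar wavelet-packet tree; returns the 2**level leaf sub-bands in natural order."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        nxt = []
        for n in nodes:
            nxt.append((n[0::2] + n[1::2]) / SQRT2)   # approximation (low-pass) child
            nxt.append((n[0::2] - n[1::2]) / SQRT2)   # detail (high-pass) child
        nodes = nxt
    return nodes


def iwpt_haar(leaves):
    """Inverse of wpt_haar: merge sibling pairs level by level."""
    nodes = list(leaves)
    while len(nodes) > 1:
        nxt = []
        for a, d in zip(nodes[0::2], nodes[1::2]):
            out = np.empty(2 * a.size)
            out[0::2] = (a + d) / SQRT2
            out[1::2] = (a - d) / SQRT2
            nxt.append(out)
        nodes = nxt
    return nodes[0]


def band_component(x, level, keep):
    """Reconstruct x from only the leaf indices in `keep` (all other leaves zeroed)."""
    leaves = wpt_haar(x, level)
    kept = [leaf if i in keep else np.zeros_like(leaf) for i, leaf in enumerate(leaves)]
    return iwpt_haar(kept)
```

Applied along the time axis of each column of the spatiotemporal correlation features, `band_component` with the chosen leaf sets would produce the high-frequency and low-frequency components; because the transform is orthogonal and linear, the components from disjoint leaf sets add back up to the original signal.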

[0134] Step 132: Perform feature enhancement on the high-frequency stress fluctuation component to obtain the feature-enhanced high-frequency stress fluctuation component.

[0135] Feature enhancement uses signal processing or deep learning techniques to improve the specificity, signal-to-noise ratio, or discriminative power of target features while suppressing irrelevant noise and interference. Its aim is to highlight the transient mechanical response of cartilage under dynamic loads (such as stress wave propagation and micro-deformation) and to reduce instrument noise and motion artifacts. The feature-enhanced high-frequency stress fluctuation component is a processed high-frequency signal (typically above 1 Hz) in which mechanical response features (such as shock waves and vibration modes) are selectively amplified while noise components are suppressed. It has the same dimensions as the original high-frequency component (e.g., 10,000 grid cells × time series), but the amplitude/contrast of key regions is increased.
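One way to realize the "suppress noise, amplify key regions" behaviour described above is sketched below. The technique is swapped in for illustration and is not taken from the source: a global noise level is estimated from the median absolute deviation, small fluctuations are soft-thresholded, and cells whose RMS falls in the top decile are treated as stress concentration regions and multiplied by a gain. The gain of 1.5 is a hypothetical choice (it would take an amplitude of 0.8 to 1.2, the figures used in the example of [0141]); `enhance_high_freq`, `k`, and `hot_quantile` are hypothetical names.

```python
import numpy as np

def enhance_high_freq(h, k=1.0, gain=1.5, hot_quantile=0.9):
    """h: (n_cells, n_times) high-frequency component. Returns (enhanced, hot_mask)."""
    # Global robust noise estimate: MAD / 0.6745 approximates a Gaussian std.
    sigma = np.median(np.abs(h - np.median(h))) / 0.6745
    # Soft thresholding shrinks every sample toward zero by k*sigma (noise suppression).
    out = np.sign(h) * np.maximum(np.abs(h) - k * sigma, 0.0)
    # Cells with the largest residual RMS are treated as stress concentration regions.
    rms = np.sqrt(np.mean(out ** 2, axis=1))
    hot = rms >= np.quantile(rms, hot_quantile)
    out[hot] *= gain
    return out, hot

# Synthetic field: 100 cells, 5 s at a hypothetical 100 Hz; cells 50-59 carry a 2 Hz wave.
rng = np.random.default_rng(0)
t = np.arange(500) / 100.0
h = 0.05 * rng.standard_normal((100, 500))
h[50:60] += 0.8 * np.sin(2 * np.pi * 2.0 * t)
enhanced, hot = enhance_high_freq(h)
```

The output keeps the input's shape, matching the requirement that the enhanced component has the same dimensions as the original high-frequency component.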

[0136] Step 133: Fuse the high-frequency stress fluctuation component and the low-frequency viscoelastic relaxation component through skip-connection fusion in a U-shaped convolutional network architecture to obtain the enhanced spatiotemporal correlation features.

[0137] The U-shaped convolutional network architecture is a symmetric encoder-decoder structure designed for image segmentation tasks. It supports, for example, high-precision pixel-level segmentation, few-shot learning, and multimodal feature fusion. Its main modules are an encoder (downsampling path), a decoder (upsampling path), skip connection layers, and an output layer. Skip connections are a core module of this architecture: they fuse the high-resolution features of the encoder (downsampling path) with the semantic features of the decoder (upsampling path), thereby restoring spatial detail and improving segmentation accuracy.

[0138] For example, in the biomechanical analysis of articular cartilage, the high-frequency stress fluctuation component and the low-frequency viscoelastic relaxation component are distinguished according to the mechanical response characteristics of cartilage and the physiological load frequency. Specifically, the high-frequency stress fluctuation component spans 1 Hz to 10 Hz and corresponds to the transient response to dynamic loads, such as rapid impacts during walking or running (human cadence is typically 1–2 Hz and can reach 3–5 Hz during running or jumping); it reflects the elastic deformation of cartilage and stress wave propagation (rapid vibration of the collagen fiber network) and is related to the elastic properties of cartilage. The low-frequency viscoelastic relaxation component spans 0 Hz to 0.5 Hz and corresponds to the viscoelastic creep and stress relaxation of cartilage (such as gradual deformation during prolonged standing or slow flexion and extension, with resting biomechanical recovery typically on the order of 0.1 Hz); it reflects energy dissipation in the proteoglycan matrix and fluid flow effects (time-dependent behavior) and is related to the energy dissipation capacity of cartilage. Dynamic mechanical analysis (DMA) shows that the storage modulus (elastic response) of cartilage increases above 1 Hz, while the loss modulus (viscous response) dominates below 0.5 Hz. The classification criteria therefore combine physiological load frequencies, material properties, and signal processing requirements to ensure the physical interpretability of the feature separation.
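The band criteria above can be made concrete with a short periodogram calculation. This sketch assumes a hypothetical 50 Hz sampling rate and a 50 s record so that 0.2 Hz and 2 Hz fall exactly on FFT bins; the test signal reuses the amplitudes quoted in the example of [0141] (0.8 at 2 Hz, 0.3 at 0.2 Hz). The label "other" for frequencies outside both bands (0.5–1 Hz or above 10 Hz) and all function names are hypothetical.

```python
import numpy as np

HIGH_BAND_HZ = (1.0, 10.0)  # transient stress fluctuation band from the text
LOW_BAND_HZ = (0.0, 0.5)    # viscoelastic relaxation band from the text

def classify(freq_hz):
    """Assign a frequency to the band defined in the text."""
    if HIGH_BAND_HZ[0] <= freq_hz <= HIGH_BAND_HZ[1]:
        return "high"
    if LOW_BAND_HZ[0] <= freq_hz <= LOW_BAND_HZ[1]:
        return "low"
    return "other"  # 0.5-1 Hz gap or above 10 Hz

def band_power(x, fs, band):
    """Sum of one-sided periodogram bins whose frequency lies inside `band`."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return power[mask].sum()

fs = 50.0                      # hypothetical sampling rate
t = np.arange(2500) / fs       # 50 s record -> 0.02 Hz bin spacing
x = 0.3 * np.sin(2 * np.pi * 0.2 * t) + 0.8 * np.sin(2 * np.pi * 2.0 * t)
p_high = band_power(x, fs, HIGH_BAND_HZ)
p_low = band_power(x, fs, LOW_BAND_HZ)
share_high = p_high / (p_high + p_low)   # 0.8**2 / (0.8**2 + 0.3**2)
```

Because power scales with amplitude squared, the high-frequency band carries 0.64 / 0.73 (about 88%) of the in-band power in this synthetic signal.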

[0139] As another example, the input data are prepared as follows. The high-frequency stress fluctuation component has dimensions [H, W, C1] (e.g., 256×256×64) and contains the details of the dynamic mechanical response. The low-frequency viscoelastic relaxation component has dimensions [H/8, W/8, C2] (e.g., 32×32×128) and represents global features extracted from the deep layers of the encoder. The decoder upsamples the low-frequency component by transposed convolution or bilinear interpolation, progressively restoring its resolution to [H, W, C2]. Skip-connection fusion then proceeds as follows: if the channel counts of the high-frequency and low-frequency components differ, a 1×1 convolution is used for adjustment; the high-frequency features and the upsampled low-frequency features are concatenated along the channel dimension; and the concatenated features are fused by a convolutional layer. In this example, the high-frequency component of the input data is a 2 Hz stress fluctuation (256×256×64) in the femoral condyle contact area, containing micro-deformation details, and the low-frequency component is a global 0.2 Hz viscoelastic relaxation feature (32×32×128) extracted by the encoder. During fusion, the low-frequency component is upsampled in three 2× stages to 256×256×128, the high-frequency component is projected to 64 channels by a 1×1 convolution, and the two are concatenated into a 256×256×192 fused feature. The fusion convolutional layer then outputs a 256×256×64 enhanced feature that preserves impact details while integrating creep trends. This embodiment of the application can improve the detection rate of early cartilage lesions by 12%.
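The tensor shapes in this skip-connection example can be traced with a small NumPy sketch. To keep it short, it makes several assumptions not in the source: nearest-neighbour 2× upsampling (via `repeat`) stands in for transposed convolution or bilinear interpolation, the fusion layer is a 1×1 convolution followed by ReLU rather than a larger kernel, and all weights are random placeholders for trained parameters. `upsample2x` and `skip_fuse` are hypothetical names.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def skip_fuse(high, low, w_high, w_fuse):
    """Upsample `low` to the resolution of `high`, concatenate, and fuse."""
    n_up = 0
    while low.shape[0] < high.shape[0]:
        low = upsample2x(low)          # 32 -> 64 -> 128 -> 256
        n_up += 1
    high = high @ w_high               # 1x1 convolution == per-pixel matrix multiply
    fused = np.concatenate([high, low], axis=-1)   # channel-wise concatenation
    out = np.maximum(fused @ w_fuse, 0.0)          # 1x1 fusion conv + ReLU
    return out, fused.shape, n_up

rng = np.random.default_rng(0)
high = rng.standard_normal((256, 256, 64), dtype=np.float32)   # 2 Hz detail map
low = rng.standard_normal((32, 32, 128), dtype=np.float32)     # 0.2 Hz global map
w_high = 0.1 * rng.standard_normal((64, 64), dtype=np.float32)
w_fuse = 0.1 * rng.standard_normal((192, 64), dtype=np.float32)
out, fused_shape, n_up = skip_fuse(high, low, w_high, w_fuse)
```

The shapes reproduce the example: three upsampling stages, a 256×256×192 concatenation, and a 256×256×64 fused output.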

[0140] Here is a specific example:

[0141] In a dynamic analysis of knee cartilage, the spatiotemporal correlation features of the input data form a matrix of 10,000 grid cells × 3 modes. After wavelet packet decomposition, the high-frequency component shows a 2 Hz fluctuation with an amplitude of 0.8 at the center of the femoral condyle, while the low-frequency component shows a global 0.2 Hz relaxation with an amplitude of 0.3. After high-frequency enhancement, the amplitude in stress concentration areas (such as cells 5,000–6,000) increases to 1.2, and background noise is reduced by 40%. The fused output feature map retains the high-frequency impact details (such as the 2 Hz fluctuation) while integrating the low-frequency creep trend; the output is a 10,000×64 matrix of enhanced features (64-channel fused features). In this embodiment, early-stage cartilage lesions are rapidly detected with an accuracy of 92% (compared with 78% without enhancement), and the compliance rate with viscoelastic laws is 97%.

[0142] By executing steps 131-133, this embodiment of the application jointly optimizes the mechanical dynamics and viscoelastic behavior captured in the spatiotemporal correlation features through wavelet packet band separation, adaptive high-frequency enhancement, and U-Net multi-scale fusion. In this embodiment, the enhanced features improve the segmentation network's sensitivity to micro-deformations by 14% while preserving the physiological plausibility of the mechanical response (e.g., the stress-strain phase delay conforms to the quasi-linear viscoelastic (QLV) model), providing high-precision, interpretable analysis results for clinical use.

[0143] Figure 2 is a schematic structural diagram of an articular cartilage image segmentation system based on a hybrid attention mechanism provided by an embodiment of this application. As shown in Figure 2, the system includes:

[0144] The acquisition module 21 is used to acquire piezoelectric and polarized light stress data of articular cartilage under dynamic load conditions.

[0145] The identification module 22 is used to identify the spatiotemporal correlation features between the piezoelectric data and the polarized light stress data.

[0146] The enhancement module 23 is used to enhance the spatiotemporal correlation features to obtain enhanced spatiotemporal correlation features.

[0147] The output module 24 is used to input the enhanced spatiotemporal correlation features into the pre-trained segmentation network. The pre-trained segmentation network uses dynamic feature weights to weight the features of different modalities and outputs the segmentation results. The segmentation results include the deformed regions of articular cartilage, so as to determine the damage level of articular cartilage based on the deformed regions of articular cartilage. The dynamic feature weights are generated by a hybrid attention mechanism.
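A hybrid attention mechanism that produces channel weight coefficients can be sketched in the CBAM style (channel attention followed by spatial attention). The source does not specify the architecture, so this arrangement is a swapped-in, commonly used design: channel weights come from a shared two-layer MLP over average- and max-pooled descriptors, and a spatial map comes from a k×k filter over channel-wise mean and max maps. Weight shapes, the reduction ratio of 2, the 7×7 kernel, and all function names are hypothetical; in a trained network these weights would be learned.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(f, w1, w2):
    """f: (C, H, W); w1: (C//r, C); w2: (C, C//r). Returns channel weights (C,)."""
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)      # shared MLP with ReLU
    return sigmoid(mlp(f.mean(axis=(1, 2))) + mlp(f.max(axis=(1, 2))))

def spatial_attention(f, kernel):
    """kernel: (2, k, k) applied to [mean, max] channel descriptors. Returns (H, W)."""
    desc = np.stack([f.mean(axis=0), f.max(axis=0)])
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(desc, ((0, 0), (p, p), (p, p)))  # zero padding keeps H x W
    H, W = f.shape[1:]
    out = np.zeros((H, W))
    for i in range(k):                                # cross-correlation, as in CNN "conv"
        for j in range(k):
            out += np.einsum("c,chw->hw", kernel[:, i, j], padded[:, i:i + H, j:j + W])
    return sigmoid(out)

def hybrid_attention(f, w1, w2, kernel):
    """Returns the reweighted features and the channel weights (dynamic feature weights)."""
    mc = channel_attention(f, w1, w2)
    f_c = f * mc[:, None, None]
    ms = spatial_attention(f_c, kernel)
    return f_c * ms[None], mc

rng = np.random.default_rng(0)
f = rng.standard_normal((8, 16, 16))   # 8 modality/feature channels on a 16x16 grid
w1 = 0.1 * rng.standard_normal((4, 8))
w2 = 0.1 * rng.standard_normal((8, 4))
kernel = 0.1 * rng.standard_normal((2, 7, 7))
out, mc = hybrid_attention(f, w1, w2, kernel)
```

The vector `mc` plays the role of the channel-level dynamic feature weights; because it is recomputed from each input, the weighting of modalities adapts to the current load condition.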

[0148] The articular cartilage image segmentation system based on a hybrid attention mechanism shown in Figure 2 can perform the articular cartilage image segmentation method based on a hybrid attention mechanism of the embodiment shown in Figure 1; its implementation principle and technical effects are not repeated here. The specific manner in which each module and unit of the articular cartilage image segmentation system based on a hybrid attention mechanism in the above embodiments performs its operations has been described in detail in the method embodiments and is not elaborated upon here.

[0149] In one possible design, the articular cartilage image segmentation system based on a hybrid attention mechanism of the embodiment shown in Figure 2 can be implemented as a computing device. As shown in Figure 3, the computing device may include a storage component 31 and a processing component 32.

[0150] The storage component 31 stores one or more computer instructions, wherein the one or more computer instructions are invoked and executed by the processing component 32.

[0151] The processing component 32 is used to: acquire piezoelectric and polarized light stress data of articular cartilage under dynamic load conditions; identify the spatiotemporal correlation features between the piezoelectric and polarized light stress data; enhance the spatiotemporal correlation features to obtain enhanced spatiotemporal correlation features; input the enhanced spatiotemporal correlation features into a pre-trained segmentation network, which uses dynamic feature weights to weight features of different modalities and outputs segmentation results. The segmentation results include the deformation region of the articular cartilage to determine the articular cartilage damage level based on the deformation region of the articular cartilage. The dynamic feature weights are generated by a hybrid attention mechanism.

[0152] The processing component 32 may include one or more processors to execute computer instructions to complete all or part of the steps in the above-described method. Alternatively, the processing component may be implemented as one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the above-described method.

[0153] Storage component 31 is configured to store various types of data to support operations at the terminal. The storage component can be implemented from any type of volatile or non-volatile storage device or a combination thereof, such as Random Access Memory (RAM), Static Random-Access Memory (SRAM), Electrically Erasable Programmable Read Only Memory (EEPROM), Erasable Programmable Read Only Memory (EPROM), Programmable Read Only Memory (PROM), Read Only Memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.

[0154] Of course, the computing device may also include other components, such as input/output interfaces, display components, and communication components.

[0155] The input/output interfaces provide interfaces between the processing component and peripheral interface modules, which may be output devices, input devices, etc.

[0156] The communication components are configured to facilitate wired or wireless communication between computing devices and other devices.

[0157] The computing device can be a physical device or an elastic computing host provided by a cloud computing platform. In this case, the computing device can refer to a cloud server, and the aforementioned processing components, storage components, etc., can be basic server resources rented or purchased from the cloud computing platform.

[0158] This application also provides a computer storage medium storing a computer program which, when executed by a computer, implements the articular cartilage image segmentation method based on a hybrid attention mechanism of the embodiment shown in Figure 1.

[0159] Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

[0160] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0161] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the various embodiments or in some parts of the embodiments.

[0162] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application.

Claims

1. A joint cartilage image segmentation method based on a hybrid attention mechanism, characterized by comprising: acquiring piezoelectric data and polarized light stress data of articular cartilage under dynamic load conditions; identifying the spatiotemporal correlation features between the piezoelectric data and the polarized light stress data; enhancing the spatiotemporal correlation features to obtain enhanced spatiotemporal correlation features; and inputting the enhanced spatiotemporal correlation features into a pre-trained segmentation network, wherein the pre-trained segmentation network uses dynamic feature weights to weight features of different modalities and outputs segmentation results, the segmentation results include the deformed regions of articular cartilage so that the articular cartilage damage level is determined based on the deformed regions of the articular cartilage, the dynamic feature weights are channel weight coefficients generated by a hybrid attention mechanism that reflect the contribution of different modal features to damage identification, and the hybrid attention mechanism is a calculation method that combines channel attention and spatial attention to dynamically evaluate the importance of different feature channels and spatial locations; wherein identifying the spatiotemporal correlation features between the piezoelectric data and the polarized light stress data includes: spatiotemporally registering the piezoelectric data and the polarized light stress data along the mesh elements on the surface of the articular cartilage to generate spatiotemporally synchronized joint piezoelectric and polarized light stress data; for each grid cell on the surface of the articular cartilage, extracting, from the joint piezoelectric and polarized light stress data, the spatiotemporal dynamic fluctuation characteristics of the piezoelectric signal and the distribution characteristics of the polarized light stress direction; and
using a dynamic mode decomposition algorithm to extract, in the frequency domain, the coupled modes of the spatiotemporal dynamic fluctuation characteristics and the polarized light stress direction distribution characteristics, and generating the spatiotemporal correlation features according to the energy ratio of the coupled modes.

2. The method of claim 1, wherein inputting the enhanced spatiotemporal correlation features into the pre-trained segmentation network, the pre-trained segmentation network using dynamic feature weights to weight features of different modalities and outputting segmentation results, includes: inputting the enhanced spatiotemporal correlation features into the pre-trained segmentation network, the segmentation network dividing the enhanced spatiotemporal correlation features into channels according to time series and spatial distribution to generate a time channel feature set and a spatial distribution channel feature set; based on the time channel feature set and the spatial distribution channel feature set, using a hybrid attention mechanism to calculate the spatiotemporal correlation strength of each channel and generate dynamic feature weights; based on the dynamic feature weights, performing channel-by-channel weighted fusion of the time channel features in the time channel feature set and the spatial distribution channel features in the spatial distribution channel feature set to generate a multimodal fusion feature map; performing cross-scale feature aggregation on the multimodal fusion feature map to obtain an aggregation result; and predicting the deformation region based on the aggregation result and a preset continuity constraint on the articular cartilage deformation boundary to obtain segmentation results containing the deformation region of the articular cartilage.

3. The method of claim 2, wherein calculating the spatiotemporal correlation strength of each channel based on the time channel feature set and the spatial distribution channel feature set and generating dynamic feature weights includes: performing inter-channel multi-dimensional correlation analysis on the time channel feature set and the spatial distribution channel feature set to generate a cross-modal correlation matrix; directionally correcting the spatiotemporal correlation strength of each channel in the cross-modal correlation matrix based on a preset constraint on the consistency of the principal stress direction of articular cartilage and a preset constraint on the continuity of the articular cartilage deformation boundary; normalizing the corrected spatiotemporal correlation strength of each channel to obtain the normalized spatiotemporal correlation strength of each channel; and generating channel-level dynamic feature weights based on the normalized spatiotemporal correlation strength of each channel combined with the predefined viscoelastic properties of articular cartilage.
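The sequence in claim 3 (cross-modal correlation matrix, corrected per-channel strength, normalization, combination with a viscoelastic prior) can be illustrated with a minimal sketch. The claim does not define the correlation measure or how the viscoelastic properties enter, so these are assumptions: Pearson correlation between channels, the diagonal (channel c temporal vs. channel c spatial) taken as each channel's strength, a given per-channel `correction` factor standing in for the directional correction, and a hypothetical multiplicative `visco_prior`.

```python
import numpy as np

def cross_modal_correlation(temporal, spatial):
    """Pearson correlation between every temporal and every spatial channel.
    temporal, spatial: (C, N) arrays; returns a (C, C) matrix."""
    t = (temporal - temporal.mean(axis=1, keepdims=True)) / temporal.std(axis=1, keepdims=True)
    s = (spatial - spatial.mean(axis=1, keepdims=True)) / spatial.std(axis=1, keepdims=True)
    return t @ s.T / t.shape[1]

def channel_weights(corr, correction, visco_prior):
    """Corrected strength -> normalized strength -> prior-weighted channel weights."""
    strength = np.abs(np.diag(corr)) * correction   # hypothetical per-channel strength
    normalized = strength / strength.sum()
    weights = normalized * visco_prior               # hypothetical viscoelastic prior
    return weights / weights.sum()

rng = np.random.default_rng(0)
temporal = rng.standard_normal((4, 1000))
spatial = 0.7 * temporal + 0.3 * rng.standard_normal((4, 1000))
corr = cross_modal_correlation(temporal, spatial)
w = channel_weights(corr, correction=np.array([1.0, 0.8, 1.0, 0.5]),
                    visco_prior=np.array([1.0, 1.0, 1.2, 1.0]))
```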

4. The method of claim 3, wherein directionally correcting the spatiotemporal correlation strength of each channel in the cross-modal correlation matrix based on the preset constraint on the consistency of the principal stress direction of articular cartilage and the preset constraint on the continuity of the articular cartilage deformation boundary includes: extracting the time channel features and spatial distribution channel features of each channel from the cross-modal correlation matrix; calculating, for each channel, the angle between the principal stress direction of the time channel feature and the deformation boundary direction of the spatial distribution channel feature to generate a directional deviation; generating a directional deviation correction weight based on the directional deviation and the preset viscoelastic property constraints of articular cartilage; and iteratively correcting the spatiotemporal correlation strength of each channel in the cross-modal correlation matrix based on the directional deviation correction weight until the spatial relationship between the principal stress direction and the deformation boundary direction satisfies the preset viscoelastic property constraint of articular cartilage, thereby obtaining the corrected spatiotemporal correlation strength of each channel.
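The directional deviation and iterative correction of claim 4 can be sketched as follows, under stated assumptions. Principal stress directions from polarized light are treated as axial orientations (defined modulo 180°), so the deviation is the acute angle between two orientations. The correction weight (a Gaussian in the gap from a target deviation), the stopping rule (strength-weighted mean gap below `tol`), and the parameters `target`, `sigma`, `tol`, and `max_iter` are hypothetical stand-ins for the unspecified preset viscoelastic constraint.

```python
import numpy as np

def orientation_deviation(theta_stress, theta_boundary):
    """Acute angle (radians, in [0, pi/2]) between two axial directions."""
    d = np.abs(theta_stress - theta_boundary) % np.pi
    return np.minimum(d, np.pi - d)

def correct_strength(strength, dev, target=0.0, sigma=0.3, tol=0.1, max_iter=50):
    """Reweight channels until the strength-weighted mean |dev - target| <= tol."""
    w = np.exp(-((dev - target) / sigma) ** 2)   # hypothetical correction weight
    s = strength / strength.sum()
    for it in range(max_iter):
        if np.sum(s * np.abs(dev - target)) <= tol:
            return s, it, True                    # constraint satisfied
        s = s * w                                 # down-weight misaligned channels
        s = s / s.sum()
    return s, max_iter, False

# Four channels; the last boundary is nearly parallel once axial wrap-around is applied.
dev = orientation_deviation(np.zeros(4), np.array([0.05, 1.2, 0.1, np.pi - 0.02]))
corrected, n_iter, converged = correct_strength(np.ones(4), dev)
```

Multiplying by a weight that decreases with the gap can only lower the weighted mean gap, so each iteration moves toward the constraint; `max_iter` guards against cases where no channel satisfies it.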

5. The method of claim 2, wherein performing channel-by-channel weighted fusion of the time channel features in the time channel feature set and the spatial distribution channel features in the spatial distribution channel feature set according to the dynamic feature weights to generate a multimodal fusion feature map includes: correcting the dynamic feature weights based on the mechanical properties of stress concentration areas in articular cartilage; and nonlinearly superimposing, channel by channel and based on the corrected dynamic feature weights, the time channel features in the time channel feature set and the spatial distribution channel features in the spatial distribution channel feature set to generate the multimodal fusion feature map.

6. The method of claim 1, wherein enhancing the spatiotemporal correlation features to obtain enhanced spatiotemporal correlation features includes: performing a wavelet packet transform on the spatiotemporal correlation features to obtain a high-frequency stress fluctuation component and a low-frequency viscoelastic relaxation component; performing feature enhancement on the high-frequency stress fluctuation component to obtain a feature-enhanced high-frequency stress fluctuation component; and fusing the high-frequency stress fluctuation component and the low-frequency viscoelastic relaxation component through skip connections in a U-shaped convolutional network architecture to obtain the enhanced spatiotemporal correlation features.

7. A joint cartilage image segmentation system based on a hybrid attention mechanism, configured to perform the joint cartilage image segmentation method based on a hybrid attention mechanism according to any one of claims 1 to 6, characterized by comprising: an acquisition module configured to acquire piezoelectric data and polarized light stress data of articular cartilage under dynamic load conditions; an identification module configured to identify the spatiotemporal correlation features between the piezoelectric data and the polarized light stress data; an enhancement module configured to enhance the spatiotemporal correlation features to obtain enhanced spatiotemporal correlation features; and an output module configured to input the enhanced spatiotemporal correlation features into a pre-trained segmentation network, wherein the pre-trained segmentation network uses dynamic feature weights to weight the features of different modalities and outputs segmentation results, the segmentation results include the deformation region of articular cartilage so that the articular cartilage damage level is determined based on the deformation region of articular cartilage, and the dynamic feature weights are generated by a hybrid attention mechanism.

8. A computing device, comprising a processing component and a storage component, wherein the storage component stores one or more computer instructions, and the one or more computer instructions are invoked and executed by the processing component to implement the articular cartilage image segmentation method based on a hybrid attention mechanism according to any one of claims 1 to 6.

9. A computer storage medium, characterized in that it stores a computer program which, when executed by a computer, implements the articular cartilage image segmentation method based on a hybrid attention mechanism according to any one of claims 1 to 6.