A knee joint degeneration automatic grading method and system based on cross-view feature manifold alignment and anatomical prior constraint
By using cross-view feature manifold alignment and anatomical prior constraints, the method automatically corrects pose deviations in knee X-ray images and fuses anteroposterior and lateral information. This addresses the accuracy and robustness problems of existing knee joint KL grading methods and enables efficient diagnosis of early knee joint degeneration.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- SOUTHEAST UNIV
- Filing Date
- 2026-04-23
- Publication Date
- 2026-06-12
Figure CN122199520A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of computer vision, medical artificial intelligence, and orthopedic imaging diagnostics. Specifically, this invention relates to an automatic grading method and system for knee joint degeneration based on cross-view feature manifold alignment and anatomical prior constraints. By introducing a self-supervised spatial transformation network and dual-view feature fusion technology, a deep learning model is constructed to automatically correct deviations and extract features from non-standard anteroposterior and lateral X-ray images, thereby achieving automatic grading and intelligent assisted diagnosis of the degree of knee joint degeneration.
Background Technology
[0002] With the increasing aging of the global population, osteoarthritis of the knee and other degenerative joint diseases pose a significant challenge to public health. These diseases often cause joint pain, stiffness, and a significant decline in mobility, leading to a reduced quality of life. Therefore, accurate assessment and early detection of the degree of knee joint degeneration (such as the clinically used Kellgren-Lawrence classification, or KL classification) are crucial for developing clinical intervention and treatment plans and improving patient prognosis.
[0003] Currently, most automated auxiliary diagnostic methods for knee joint Kellgren-Lawrence (KL) grading use convolutional neural networks based on a single view (the anteroposterior view only). However, existing technologies have the following significant limitations. First, information is incomplete: osteophytes of the patellofemoral joint and early degeneration around the intercondylar fossa are difficult to observe on anteroposterior radiographs alone, which easily leads to missed diagnoses at early stages (KL grades 1 and 2). Second, the models are sensitive to pose: in clinical imaging, knee rotation angles and degrees of flexion and extension vary greatly between patients, producing pseudo-narrowing of the joint space on X-rays, and traditional models cannot correct for pose. Finally, anatomical prior constraints are lacking: existing models usually treat anteroposterior and lateral images as independent inputs, ignoring the fact that the two views are projections of the same anatomical entity onto two mutually perpendicular planes, so the extracted features lack geometric consistency.
[0004] In summary, existing methods suffer from low sensitivity in early diagnosis, susceptibility to interference from patient positioning, and insufficient use of multi-view spatial information. A novel automatic grading method for knee joint degeneration based on cross-view feature manifold alignment and anatomical prior constraints is therefore needed. Such a method integrates complementary information from anteroposterior and lateral images and uses a deep learning model to automatically correct pose deviations and enforce anatomical constraints, thereby improving the accuracy of determining "joint space narrowing (JSN)" and "osteophyte formation" and addressing the clinical challenge of accurately grading early knee osteoarthritis.
Summary of the Invention
[0005] To address the problems in the prior art, the present invention provides an automatic grading method and system for knee joint degeneration based on cross-view feature manifold alignment and anatomical prior constraints. It aims to solve two problems: how to achieve nonlinear spatial alignment of anatomical features in anteroposterior and lateral X-ray films, and how to use complementary information from the anteroposterior and lateral views, without relying on text reports, to improve the accuracy of determining "joint space narrowing (JSN)" and "osteophyte formation".
[0006] To achieve the above objectives, the present invention adopts the following technical solution:
[0007] An automatic grading method for knee joint degeneration based on cross-view feature manifold alignment and anatomical prior constraints includes the following steps:
[0008] Step 1: Obtain original anteroposterior and lateral X-ray images of the knee joint in non-standard positions;
[0009] Step 2: Perform automated anatomical axis standardization on the original anteroposterior and lateral X-ray images of the knee joint, using a self-supervised spatial transformation network to predict an affine transformation matrix and output pose-corrected standardized images;
[0010] Step 3: Input the standardized anteroposterior and lateral images into the diagnostic backbone network for feature extraction, and obtain the anteroposterior features and lateral features respectively;
[0011] Step 4: Construct a dual-view feature fusion module that performs bidirectional cross-attention and self-attention computations along the row dimension of the feature space, achieving deep fusion and cross-view anatomical semantic alignment of the anteroposterior and lateral features;
[0012] Step 5: Based on the prior knowledge that the frontal and lateral views are projections of the same anatomical entity, calculate a geometric projection consistency loss that uses physical-space geometry to constrain the anatomical correspondence between the two views in vertical height;
[0013] Step 6: Apply a joint fine-tuning strategy that optimizes the global loss function by backpropagation, and output the KL grade prediction for knee joint degeneration.
[0014] Furthermore, in step 2, the automated anatomical axis standardization adopts a self-supervised alignment loss function based on gradient projection consistency, which optimizes the affine transformation parameters by minimizing the longitudinal projection entropy without any annotations.
[0015] Furthermore, in step 3, the diagnostic backbone network is ResNet50, which is truncated to the third residual block to output a feature map that maintains spatial resolution.
[0016] Furthermore, in step 4, the dual-view feature fusion module comprises multiple stacked layers, each consisting of a cross-attention layer and a self-attention layer; the cross-attention layer treats the features as sequences along the horizontal direction and performs bidirectional attention independently for each row, enhancing semantic alignment at the same anatomical height.
[0017] Furthermore, the dual-view feature fusion module stacks 6 fusion units, each containing a cross-attention layer and a self-attention layer, and takes the output of the anteroposterior branch as the final fusion result.
[0018] Furthermore, in step 5, the calculation steps for the geometric projection consistency loss include: pooling and compressing the frontal and lateral features along the horizontal direction to obtain a one-dimensional vertical feature descriptor containing only height information; then calculating the mean square error between the frontal and lateral vertical feature descriptors as the geometric projection consistency loss.
[0019] Furthermore, the method employs a cascaded optimization strategy consisting of two stages: self-supervised alignment pre-training and task-driven joint fine-tuning. The global loss function in step 6 includes the KL grading classification loss, the geometric projection consistency loss, and the self-supervised alignment loss.
[0020] Furthermore, the self-supervised alignment pre-training stage includes: extracting correctly positioned images from a public dataset, applying random affine transformations to generate perturbation samples, inputting them into a self-supervised spatial transformation network, and pre-training using a self-supervised alignment loss function.
[0021] Furthermore, in the task-driven joint fine-tuning stage, the multi-task weight coefficients of the global loss function give the self-supervised alignment loss a larger weight early in training to stabilize alignment, and increase the weight of the KL grading classification loss later in training to improve classification accuracy.
[0022] An automatic knee joint degeneration grading system based on cross-view feature manifold alignment and anatomical prior constraints for performing the method includes:
[0023] The automated anatomical axis standardization module is used to automatically correct the pose deviation of the original non-standard knee joint images and generate standardized diagnostic images.
[0024] The feature extraction module is used to extract deep spatial semantic features from the standardized frontal and lateral images;
[0025] The dual-view feature fusion module is used to achieve cross-view anatomical feature alignment and complementarity through a row-independent cross-attention mechanism;
[0026] The geometric projection constraint module is used to extract the one-dimensional descriptor of vertical height and calculate the geometric projection consistency loss.
[0027] The classification prediction and optimization module is used to calculate the joint loss of multiple tasks and output the final KL classification diagnosis result of the knee joint.
[0028] Beneficial effects:
[0029] Compared with the prior art, the present invention has the following advantages:
[0030] Automated correction: the STN module automatically corrects deviations in imaging posture, eliminating the pseudo-narrowing of the joint space caused by knee rotation or tilt and thereby improving diagnostic robustness.
[0031] Multi-view complementarity: anteroposterior and lateral features are deeply fused, enabling detection of occult lesions that are difficult to see in a single view and improving sensitivity for early diagnosis (KL grades 1 and 2).
[0032] Physically grounded artifact rejection: the geometric projection consistency constraint follows the principles of radiographic projection, which helps filter out foreign objects and artifacts that do not appear consistently in both views, making the diagnostic results more reliable.
[0033] Efficient computational deployment: the row-independent attention mechanism reduces computational complexity while preserving anatomical alignment, because each query attends only to the tokens in its own row; this yields fast inference and facilitates deployment on clinical mobile terminals.
[0034] Reduced dependence on annotations: a self-supervised pre-training strategy makes full use of unlabeled data, enhancing the model's generalization across different medical institutions.
Attached Figure Description
[0035] Figure 1 is a schematic diagram of the Cross Attention Fusion Block of this invention;
[0036] Figure 2 is a schematic diagram of the process of knee joint image features and loss functions in this invention.
Detailed Implementation
[0037] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be described in further detail below with reference to the accompanying drawings and specific embodiments. These embodiments are for illustrative purposes only and are not intended to limit the scope of the invention.
[0038] This invention provides an automatic grading method for knee joint degeneration based on cross-view feature manifold alignment and anatomical prior constraints, the specific technical solution of which is as follows:
[0039] 1. Automated Anatomical Axis Standardization Module (Self-Supervised STN)
[0040] During clinical imaging, knee rotation angles and degrees of flexion/extension differ significantly between patients, leading to pseudo-narrowing of the joint space on X-rays, and traditional models lack pose correction capabilities. To achieve alignment without relying on manual landmark annotation, this invention designs a self-supervised STN based on gradient projection consistency.
[0041] (1) Affine transformation parameter regression
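[0042] For a knee X-ray image that may be mispositioned or taken from an incorrect angle, features are first extracted with a ResNet18 network, and an MLP then predicts the parameters θ of an affine transformation $T_\theta$, which maps the original non-standard-position image $I$ to a normalized image $I'$:
[0043] $$\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix} = T_\theta\begin{pmatrix} x_i^t \\ y_i^t \end{pmatrix} = \begin{bmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{bmatrix}\begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}$$
[0044] Here, $(x_i^t, y_i^t)$ are the coordinates in the standardized image, $(x_i^s, y_i^s)$ are the corresponding sampling coordinates in the original image, and θ denotes the parameters controlling translation, rotation, and scaling.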
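As an illustration of the coordinate mapping above, the following pure-Python sketch warps a small image with a 2×3 affine matrix and bilinear sampling. It assumes the common spatial-transformer convention of normalized coordinates in [-1, 1] with corner-aligned pixels. The function names `affine_sample` and `_bilinear` are hypothetical and are not taken from the patent; a practical system would use a tensor library's grid-sampling operator instead of Python loops.

```python
import math


def _bilinear(image, px, py):
    """Bilinearly interpolate `image` at pixel position (px, py); outside pixels count as 0."""
    h, w = len(image), len(image[0])
    x0, y0 = math.floor(px), math.floor(py)
    value = 0.0
    for yy, wy in ((y0, 1.0 - (py - y0)), (y0 + 1, py - y0)):
        for xx, wx in ((x0, 1.0 - (px - x0)), (x0 + 1, px - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                value += wy * wx * image[yy][xx]
    return value


def affine_sample(image, theta):
    """Warp `image` (list of rows, at least 2x2) with a 2x3 affine matrix `theta`.

    Assumed convention: for each target pixel (x_t, y_t) in normalized [-1, 1]
    coordinates, the source location is (x_s, y_s) = theta @ (x_t, y_t, 1).
    """
    h, w = len(image), len(image[0])
    (a, b, c), (d, e, f) = theta
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Normalized target coordinates (corner-aligned).
            xt = -1.0 + 2.0 * j / (w - 1)
            yt = -1.0 + 2.0 * i / (h - 1)
            # Affine map from target grid to source sampling location.
            xs = a * xt + b * yt + c
            ys = d * xt + e * yt + f
            # Convert back to source pixel units and sample.
            px = (xs + 1.0) * (w - 1) / 2.0
            py = (ys + 1.0) * (h - 1) / 2.0
            out[i][j] = _bilinear(image, px, py)
    return out
```

With the identity matrix the image is reproduced. A translation of 2/(W-1) in normalized units samples one pixel to the right, and samples that fall outside the source read as zero.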
[0045] (2) Self-supervised alignment loss function
[0046] Without annotations, automatic alignment is achieved by minimizing the longitudinal projection entropy. When the long axis of the bone is parallel to the vertical axis of the image, the bone edges run vertically, so the horizontal gradients accumulated along the vertical direction are most sharply concentrated. The loss function for regressing the affine transformation parameters is therefore defined as follows:
[0047] $$L_{align} = -\sum_{x} P(x)\log P(x), \qquad P(x) = \frac{\sum_{y}\left|\nabla_x I'(x,y)\right|}{\sum_{x'}\sum_{y}\left|\nabla_x I'(x',y)\right|}$$
[0048] Here, $\nabla_x I'(x,y)$ denotes the horizontal gradient of the standardized image at $(x,y)$, and $P(x)$ denotes the normalized distribution of horizontal gradient magnitudes accumulated along the vertical direction. When the bone is straightened, the gradients concentrate on a few $x$ coordinates (the positions of the bone edges), and the entropy reaches its minimum.
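The entropy criterion can be checked numerically with a minimal sketch. It assumes forward differences for the horizontal gradient, absolute values to make the column masses non-negative, and natural logarithms. The name `projection_entropy` is hypothetical. A straight vertical edge puts all the gradient mass in one column (entropy 0), while a tilted edge spreads it across several columns.

```python
import math


def projection_entropy(image, eps=1e-12):
    """Entropy of the column-wise distribution of |horizontal gradients| (a sketch of L_align)."""
    h, w = len(image), len(image[0])
    # Accumulate |I(x+1, y) - I(x, y)| down each column x (forward difference).
    col_mass = [0.0] * (w - 1)
    for y in range(h):
        for x in range(w - 1):
            col_mass[x] += abs(image[y][x + 1] - image[y][x])
    total = sum(col_mass)
    if total < eps:
        # No edges at all; returns 0 here, a degenerate case a real loss would need to guard.
        return 0.0
    entropy = 0.0
    for m in col_mass:
        p = m / total
        if p > 0:
            entropy -= p * math.log(p)
    return entropy


vertical = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
diagonal = [[1.0 if x >= y else 0.0 for x in range(5)] for y in range(5)]
```

Here `projection_entropy(vertical)` is 0, and `projection_entropy(diagonal)` equals log 4 because the tilted edge spreads its gradient evenly over four columns.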
[0049] 2. Dual-view feature fusion module
[0050] To address the limited use of multi-view image information in existing methods, this invention designs a dual-view alternating attention fusion module for deep fusion of image features from different views. The module consists of several stacked Cross Attention Fusion Blocks. Each block contains a cross-attention layer for cross-view information alignment and a self-attention layer for single-view semantic modeling. Figure 1 is a schematic diagram of the Cross Attention Fusion Block of this invention.
[0051] Specifically, the AP and Lat image features are first reshaped into two-dimensional structures $F_A$ and $F_L$. In each cross-attention layer, bidirectional attention is performed independently for each row of the feature space (rows index the vertical dimension). The i-th row of the anteroposterior features, $F_A^{(i)}$, serves as the query, and the i-th row of the lateral features, $F_L^{(i)}$, serves as the key and value. This gives the cross-view aligned output:
[0052] $$F_A^{(i)\prime} = \mathrm{Attn}\left(F_A^{(i)}, F_L^{(i)}, F_L^{(i)}\right)$$
[0053] Here, $\mathrm{Attn}(\cdot)$ denotes the standard multi-head attention mechanism. Its inputs are the query, key, and value, and its output is the attention-weighted features.
[0054] Conversely, the i-th row of the lateral features, $F_L^{(i)}$, serves as the query, and the i-th row of the anteroposterior features, $F_A^{(i)}$, serves as the key and value. This gives the cross-view aligned output:
[0055] $$F_L^{(i)\prime} = \mathrm{Attn}\left(F_L^{(i)}, F_A^{(i)}, F_A^{(i)}\right)$$
[0056] A residual connection is then applied, followed by an MLP and LayerNorm normalization:
[0057] $$F_A^{(i)} \leftarrow \mathrm{LayerNorm}\left(F_A^{(i)} + \mathrm{MLP}\left(F_A^{(i)\prime}\right)\right)$$
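The following minimal sketch shows row-independent bidirectional cross-attention, written in plain Python for readability. It uses a single head and omits the learned query/key/value projections, the residual MLP, and LayerNorm, all of which a real block would include. The function names are hypothetical. The sketch demonstrates that row i of one view only ever attends to row i of the other view.

```python
import math


def _softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def _attend(query_row, key_row, value_row):
    """Single-head scaled dot-product attention within one row (W tokens of dim C)."""
    c = len(query_row[0])
    out = []
    for q in query_row:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(c) for k in key_row]
        weights = _softmax(scores)
        out.append([sum(wt * v[ch] for wt, v in zip(weights, value_row)) for ch in range(c)])
    return out


def row_cross_attention(f_ap, f_lat):
    """Bidirectional cross-attention between H x W x C maps, computed row by row."""
    assert len(f_ap) == len(f_lat), "both views must have the same number of rows H"
    # AP row i queries lateral row i, and vice versa; rows never mix.
    ap_out = [_attend(f_ap[i], f_lat[i], f_lat[i]) for i in range(len(f_ap))]
    lat_out = [_attend(f_lat[i], f_ap[i], f_ap[i]) for i in range(len(f_ap))]
    return ap_out, lat_out
```

Because each row is handled separately, changing row 1 of the lateral view leaves row 0 of the anteroposterior output untouched.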
[0058] In the self-attention layer, self-attention is computed separately for the anteroposterior and lateral features to further capture contextual information within the same view:
[0059] $$F_A' = \mathrm{Attn}(F_A, F_A, F_A)$$
[0060] $$F_L' = \mathrm{Attn}(F_L, F_L, F_L)$$
[0061] The same residual connection, MLP, and layer normalization operations follow. After several stacked layers, the fusion module outputs two fused features, and the output of the anteroposterior branch is taken as the final fusion result.
[0062] The cross-attention in the fusion module operates only within rows of the feature space. Both input features are explicitly reshaped into two-dimensional H×W structures whose row index corresponds to vertical (anatomical) height in both views. Row attention therefore strengthens the semantic alignment of anteroposterior and lateral images at the same anatomical height, improving the accuracy of cross-view modeling while reducing computational cost. Because each query attends only to the W tokens of its own row, the number of attention scores per direction falls from $(HW)^2$ for full attention to $HW^2$, a reduction by a factor of H.
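The cost reduction can be counted directly. The hypothetical helper below compares the number of query-key scores per attention direction for row-independent attention and for full attention over all H×W tokens. With the 14×14 feature maps of Example 1, row attention computes 2,744 scores instead of 38,416, which is H = 14 times fewer. This saving applies to the cross-attention layers; the self-attention layer described in Example 1 still attends over the full 196-token sequence.

```python
def attention_score_counts(h, w):
    """Return (row_wise, full) numbers of query-key scores for one attention direction."""
    row_wise = h * w * w   # H rows, each with W queries against W keys
    full = (h * w) ** 2    # every one of the H*W tokens against every other
    return row_wise, full


row_wise, full = attention_score_counts(14, 14)
```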
[0063] 3. Geometric projection consistency loss
[0064] In radiology, the anteroposterior (AP) and lateral (Lat) views are projections of the same knee joint onto two mutually perpendicular planes. In theory, therefore, any anatomical point (such as the lowest point of the femoral condyle or the center of the tibial plateau) should have the same vertical height (Y-axis coordinate) in the aligned AP and lateral views. On this basis, a physical-space constraint is introduced in the form of a geometric projection consistency loss.
[0065] Let the feature maps $F_A$ and $F_L$ both have dimensions H×W×C, where H is the height dimension (corresponding to the vertical anatomical direction), W is the width dimension, and C is the number of channels. First, both feature maps are compressed along the horizontal direction to obtain one-dimensional vertical feature descriptors containing only height information:
[0066] $$V_A(y) = \mathrm{pool}_x\left(F_A(x,y)\right)$$
[0067] $$V_L(y) = \mathrm{pool}_x\left(F_L(x,y)\right)$$
[0068] The geometric projection consistency loss is then calculated as the mean squared error between the two descriptors:
[0069] $$L_{geo} = \frac{1}{HC}\sum_{y=1}^{H}\left\lVert V_A(y) - V_L(y)\right\rVert_2^2$$
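The pooling and loss above can be sketched as follows. The text writes $\mathrm{pool}_x$ without specifying the pooling type, so average pooling is assumed here. The names `vertical_descriptor` and `geo_loss` are hypothetical. Because each row is averaged, the descriptor ignores horizontal position within a row, and the two views may even have different widths.

```python
def vertical_descriptor(feature_map):
    """Average-pool an H x W x C map over W, giving an H x C descriptor (pooling type assumed)."""
    w = len(feature_map[0])
    c = len(feature_map[0][0])
    return [[sum(row[x][ch] for x in range(w)) / w for ch in range(c)] for row in feature_map]


def geo_loss(f_ap, f_lat):
    """Mean squared error between the AP and lateral vertical descriptors."""
    v_ap = vertical_descriptor(f_ap)
    v_lat = vertical_descriptor(f_lat)
    assert len(v_ap) == len(v_lat), "both views must share the height dimension H"
    h, c = len(v_ap), len(v_ap[0])
    return sum((v_ap[y][ch] - v_lat[y][ch]) ** 2 for y in range(h) for ch in range(c)) / (h * c)


# H=2, C=1; the AP map has W=2 and the lateral map has W=3.
f_ap = [[[1.0], [3.0]], [[0.0], [0.0]]]
f_lat = [[[2.0], [2.0], [2.0]], [[1.0], [1.0], [1.0]]]
```

Here the AP descriptor is [[2.0], [0.0]] and the lateral descriptor is [[2.0], [1.0]], so `geo_loss(f_ap, f_lat)` is (0 + 1) / 2 = 0.5. Swapping the two columns of `f_ap` gives the same value.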
[0070] 4. Cascading Strategy Optimization
[0071] (1) Self-supervised alignment pre-training
[0072] For the STN module, correctly positioned original images are first extracted from a public dataset, and random affine transformations are applied to them to generate perturbed samples. The perturbed samples are then input into the STN module to predict the transformation matrix θ.
[0073] The loss function used here is $L_{align}$. Because it requires no KL labels, the model learns to correct pose deviations solely from the geometric features (bone edges) of the image itself.
[0074] (2) Task-driven joint fine-tuning
[0075] After STN achieves initial alignment capabilities, it is connected in series with subsequent cross-view fusion modules and classification heads for overall optimization. The global loss function is set as follows:
[0076] $$L_{final} = \lambda_1 L_{MSE} + \lambda_2 L_{geo} + \lambda_3 L_{align}$$
[0077] Here, $L_{MSE}$ is the KL grading loss (the main task), and $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the multi-task weight coefficients. Early in training, $\lambda_3$ is set larger to stabilize alignment; later in training, $\lambda_1$ is increased to improve classification accuracy.
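The weighting scheme can be illustrated with a hypothetical linear schedule. The text only states that $\lambda_3$ is larger early in training and that $\lambda_1$ is increased later. The start and end values, the constant $\lambda_2$, and the linear interpolation below are assumptions chosen for illustration, not values from the patent.

```python
def loss_weights(epoch, total_epochs, lam1_range=(0.1, 1.0), lam2=0.5, lam3_range=(1.0, 0.1)):
    """Hypothetical linear schedule: lambda1 rises and lambda3 falls over training."""
    t = epoch / max(total_epochs - 1, 1)  # training progress in [0, 1]
    lam1 = lam1_range[0] + t * (lam1_range[1] - lam1_range[0])
    lam3 = lam3_range[0] + t * (lam3_range[1] - lam3_range[0])
    return lam1, lam2, lam3


def global_loss(l_cls, l_geo, l_align, weights):
    """L_final = lambda1 * L_cls + lambda2 * L_geo + lambda3 * L_align (L_cls is L_MSE in the text)."""
    lam1, lam2, lam3 = weights
    return lam1 * l_cls + lam2 * l_geo + lam3 * l_align
```

For example, `loss_weights(0, 10)` returns (0.1, 0.5, 1.0), which emphasizes alignment. By the last epoch the values are approximately (1.0, 0.5, 0.1), which emphasizes classification.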
[0078] Figure 2 is a flowchart of the knee joint image features and loss functions in this invention. It shows the overall process from the original image input to the final grade output and where each loss function is computed.
[0079] The above technical solution will be described in detail below through specific embodiments.
[0080] Example 1
[0081] This embodiment includes the following steps:
[0082] 1. Self-supervised alignment pre-training
[0083] A standard anatomically oriented knee joint image was selected and scaled to a 224×224 pixel single-channel grayscale image. A random affine transformation perturbation was applied, and the perturbation parameter matrix θ was recorded. The perturbed image was input into a localization network (ResNet18) to extract a deep spatial location feature vector of dimension 512. This feature vector was then processed by a three-layer multilayer perceptron (MLP, neuron configuration 512-256-128-6). The MLP ultimately outputs the predicted affine transformation parameters θ'. The regression loss between the predicted parameters θ' and the perturbation parameters θ was calculated, and the weights of the localization network and the MLP were updated using the backpropagation algorithm. This completed the self-supervised pre-training of the STN structure, enabling it to automatically identify and correct pose deviations in the image.
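The perturbation step of this pre-training can be sketched by sampling a random similarity transform (rotation, isotropic scaling, translation) and flattening it to the six parameters that the 512-256-128-6 MLP regresses. The ranges, the restriction to a similarity transform, the mean-squared-error form of the regression loss, and the function names are all assumptions made for illustration. The patent specifies only a random affine perturbation and a regression loss between θ' and θ.

```python
import math
import random


def random_affine_theta(rng, max_rot_deg=15.0, max_scale=0.1, max_shift=0.1):
    """Sample a hypothetical pose perturbation as a flat 2x3 affine matrix (6 parameters)."""
    angle = math.radians(rng.uniform(-max_rot_deg, max_rot_deg))
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    tx = rng.uniform(-max_shift, max_shift)  # translation in normalized units
    ty = rng.uniform(-max_shift, max_shift)
    cos_a, sin_a = math.cos(angle) * scale, math.sin(angle) * scale
    return [cos_a, -sin_a, tx, sin_a, cos_a, ty]


def theta_regression_loss(theta_pred, theta_true):
    """Mean squared error between predicted and recorded 6-vector parameters."""
    return sum((p - t) ** 2 for p, t in zip(theta_pred, theta_true)) / len(theta_true)


theta = random_affine_theta(random.Random(0))
```

Because the sampled matrix is a scaled rotation, its determinant equals scale squared. The determinant is therefore always positive, so the perturbation never mirrors the knee.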
[0084] 2. Task-driven joint fine-tuning
[0085] (1) Automated Anatomical Axis Standardization Module:
[0086] Raw knee joint images acquired in clinical settings are scaled to 224×224 pixels and input into the pre-trained STN module, which outputs the correction loss $L_{align}$. The original image is spatially remapped using the affine transformation matrix θ and a bilinear interpolation sampler, yielding a 224×224 single-channel image normalized to the anatomical axis.
[0087] The standardized anteroposterior (AP) and lateral (Lat) images are input into the diagnostic backbone network (ResNet50). This network is truncated at the third residual block (Layer 3) to retain the necessary spatial resolution while preserving high-level semantics. The AP and Lat branches output feature maps $F_A$ and $F_L$, each of size 14×14×1024 (i.e., H=14, W=14, C=1024).
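The 14×14×1024 feature size follows from the downsampling in ResNet-50. This sketch assumes the standard torchvision layout: a stride-2 stem convolution, a stride-2 max-pool, layer1 at stride 1, and layer2 and layer3 at stride 2 each. Under that layout, the total stride at the end of layer3 is 16 and layer3 outputs 1024 channels. The hypothetical helper below encodes this arithmetic for input sizes divisible by 16.

```python
def resnet50_layer3_shape(height, width):
    """(H, W, C) after a torchvision-style ResNet-50 truncated at layer3 (sizes divisible by 16)."""
    if height % 16 or width % 16:
        raise ValueError("this simple stride arithmetic assumes sizes divisible by 16")
    # conv1 (/2) * max-pool (/2) * layer1 (/1) * layer2 (/2) * layer3 (/2) = /16
    total_stride = 2 * 2 * 1 * 2 * 2
    return height // total_stride, width // total_stride, 1024
```

For the 224×224 inputs used here, this gives (14, 14, 1024), and the flattened spatial grid has 14 × 14 = 196 positions.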
[0088] (2) Dual-view feature fusion module:
[0089] This module achieves deep feature interaction by stacking 6 identical fusion units, with each unit containing:
[0090] Cross-view attention layer: Leveraging the consistency of the knee joint's projection along the longitudinal anatomical axis in both anteroposterior and lateral views, the feature maps are treated as a sequence along the horizontal direction (X-axis), and bidirectional attention operations are performed independently in the row-wise direction. Specifically, the i-th row of features in the anteroposterior view is used as the query vector, and the i-th row of features in the lateral view is used as the key and value vectors, achieving cross-view anatomical feature alignment and complementarity. The output size of this layer remains 14×14×1024.
[0091] Self-attention layer: the spatial dimensions are flattened into a sequence of length 196 (196×1024), and self-attention captures long-range contextual dependencies within the same view (such as the morphological relationship between the femur and tibia). The two-dimensional spatial structure (14×14×1024) is then restored. After six stacked layers, the output of the anteroposterior branch, of size 14×14×1024, is taken. Training then proceeds by backpropagating the classification loss $L_{MSE}$ (a two-dimensional cross-entropy) computed on the classification predictions.
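The flatten-and-restore step around the self-attention layer can be sketched as a row-major reshape. It converts between an H×W grid of C-dimensional tokens and a sequence of H·W tokens. The helper names are hypothetical. With H = W = 14, this gives the 196-token sequence mentioned above, and token (i, j) lands at index i·W + j.

```python
def flatten_tokens(fmap):
    """H x W x C grid -> (H*W) x C sequence, row-major: (i, j) -> index i*W + j."""
    return [token for row in fmap for token in row]


def restore_grid(tokens, h, w):
    """Inverse of flatten_tokens: (H*W) x C sequence -> H x W x C grid."""
    assert len(tokens) == h * w, "sequence length must equal H*W"
    return [tokens[i * w:(i + 1) * w] for i in range(h)]


grid = [[[float(i), float(j)] for j in range(14)] for i in range(14)]
tokens = flatten_tokens(grid)
```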
[0092] 3. Geometric projection consistency loss:
[0093] $F_A$ and $F_L$ are pooled along the W direction to eliminate lateral anatomical differences, yielding two 14×1024 vertical feature descriptors. Their mean squared error is computed as the geometric projection consistency loss $L_{geo}$ and backpropagated. This loss serves as an auxiliary supervisory signal during training. Together with the final KL grading loss, it guides the parameter optimization of the spatial transformation network (STN), ensuring that the anteroposterior and lateral images are accurately aligned in anatomical height before they enter the cross-view attention fusion module.
[0094] Example 2
[0095] This embodiment provides an automatic knee joint degeneration grading system based on cross-view feature manifold alignment and anatomical prior constraints for performing the above-described method. The system includes the following modules:
[0096] (1) Automated Anatomical Axis Standardization Module
[0097] This module automatically corrects pose deviations in raw, non-standard knee X-ray images, generating anatomically standardized diagnostic images. Specifically, it employs a self-supervised spatial transformation network based on gradient projection consistency, comprising a ResNet18 localization network and a three-layer multilayer perceptron (MLP, neuron configuration 512-256-128-6). The localization network extracts a 512-dimensional deep spatial location feature vector from the input image, and the MLP predicts the affine transformation parameters θ from this vector. A bilinear interpolation sampler then spatially remaps the original image and outputs a pose-corrected standardized image. The module is trained with the self-supervised alignment loss function $L_{align}$, which optimizes the affine transformation parameters by minimizing the longitudinal projection entropy, so automatic alignment is achieved without manual annotation.
[0098] (2) Feature extraction module
[0099] This module extracts deep spatial semantic features from the standardized anteroposterior and lateral images. Specifically, it uses ResNet50 as the diagnostic backbone network, truncated at the third residual block (Layer 3) to retain the necessary spatial resolution while preserving high-level semantics. The anteroposterior (AP) and lateral (Lat) branches output feature maps $F_A$ and $F_L$ of size 14×14×1024, laying the foundation for subsequent cross-view fusion.
[0100] (3) Dual-view feature fusion module
[0101] This module achieves cross-view anatomical feature alignment and complementarity through a row-independent cross-attention mechanism. Specifically, it consists of six identical Cross Attention Fusion Blocks stacked together, each containing one cross-attention layer and one self-attention layer. In the cross-attention layer, the feature maps are treated as sequences along the horizontal direction, and bidirectional attention is performed independently for each row. The i-th row of anteroposterior features serves as the query and the i-th row of lateral features as the key and value, achieving cross-view anatomical feature alignment. Simultaneously, the i-th row of lateral features serves as the query and the i-th row of anteroposterior features as the key and value, performing the reverse alignment. Residual connections and layer normalization follow. In the self-attention layer, self-attention is computed separately on the anteroposterior and lateral features, capturing long-range contextual dependencies within each view. After the six stacked layers, the output of the anteroposterior branch is taken as the final fused feature, of size 14×14×1024. The structure of this module is shown in Figure 1.
[0102] (4) Geometric projection constraint module
[0103] This module extracts one-dimensional vertical-height descriptors and calculates the geometric projection consistency loss. Specifically, it pools the anteroposterior feature map $F_A$ and the lateral feature map $F_L$ along the horizontal (W) direction to eliminate lateral anatomical differences. This yields two one-dimensional vertical feature descriptors of size 14×1024 that contain only height information. The module then calculates the mean squared error between these two descriptors as the geometric projection consistency loss $L_{geo}$. This loss serves as an auxiliary supervisory signal during training. Together with the final KL grading loss, it guides the parameter optimization of the spatial transformation network, ensuring accurate alignment of the frontal and lateral images in anatomical height.
[0104] (5) Grading prediction and optimization module
[0105] This module calculates the multi-task joint loss and outputs the final KL grading diagnosis for the knee joint. Specifically, it maps the fused features output by the dual-view feature fusion module to a KL grade prediction through a classification head and calculates the classification loss $L_{MSE}$. It also receives the self-supervised alignment loss $L_{align}$ from the automated anatomical axis standardization module and the geometric projection consistency loss $L_{geo}$ from the geometric projection constraint module. From these, it constructs the global loss function $L_{final} = \lambda_1 L_{MSE} + \lambda_2 L_{geo} + \lambda_3 L_{align}$. Early in training, $\lambda_3$ is set larger to stabilize alignment; later in training, $\lambda_1$ is increased to improve classification accuracy. The module adopts a cascaded optimization strategy: self-supervised alignment pre-training first, then task-driven joint fine-tuning, and finally output of the KL grade prediction for knee joint degeneration. The overall process is shown in Figure 2.
[0106] Through the coordinated operation of the above five modules, the system achieves fully automated processing from input of non-standard knee joint anteroposterior and lateral X-ray images to KL grade output, which can be used for early and accurate auxiliary diagnosis of knee osteoarthritis.
[0107] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.
Claims
1. An automatic grading method for knee joint degeneration based on cross-view feature manifold alignment and anatomical prior constraints, characterized by comprising the following steps: Step 1: obtaining original anteroposterior and lateral X-ray images of the knee joint in non-standard positions; Step 2: performing automated anatomical axis standardization on the original anteroposterior and lateral X-ray images of the knee joint, using a self-supervised spatial transformation network to predict an affine transformation matrix and output pose-corrected standardized images; Step 3: inputting the standardized anteroposterior and lateral images into a diagnostic backbone network for feature extraction to obtain anteroposterior features and lateral features, respectively; Step 4: constructing a dual-view feature fusion module that performs bidirectional cross-attention and self-attention computations along the row dimension of the feature space, achieving deep fusion and cross-view anatomical semantic alignment of the anteroposterior and lateral features; Step 5: based on the prior knowledge that the frontal and lateral views are projections of the same anatomical entity, calculating a geometric projection consistency loss that uses physical-space geometry to constrain the anatomical correspondence between the two views in vertical height; Step 6: applying a joint fine-tuning strategy to optimize the global loss function by backpropagation, and outputting the KL grade prediction for knee joint degeneration.
2. The method according to claim 1, characterized in that in step 2, the automated anatomical axis standardization uses a self-supervised alignment loss function based on gradient projection consistency, which optimizes the affine transformation parameters by minimizing the longitudinal projection entropy in the absence of labels.
3. The method according to claim 1, characterized in that in step 3, the diagnostic backbone network is ResNet50, truncated at the third residual block to output a feature map that preserves spatial resolution.
4. The method according to claim 1, characterized in that in step 4, the dual-view feature fusion module comprises multiple stacked layers, each consisting of a cross-attention layer and a self-attention layer; the cross-attention layer treats the features as sequences along the horizontal direction and performs bidirectional attention independently for each row to enhance semantic alignment at the same anatomical height.
5. The method according to claim 4, characterized in that the dual-view feature fusion module stacks 6 fusion units, each containing a cross-attention layer and a self-attention layer, and takes the output of the anteroposterior branch as the final fusion result.
6. The method according to claim 1, characterized in that in step 5, the calculation of the geometric projection consistency loss comprises: pooling and compressing the frontal and lateral features along the horizontal direction to obtain one-dimensional vertical feature descriptors containing only height information; then calculating the mean square error between the frontal and lateral vertical feature descriptors as the geometric projection consistency loss.
7. The method according to claim 1, characterized in that the method is optimized with a cascaded strategy consisting of two stages: self-supervised alignment pre-training and task-driven joint fine-tuning; the global loss function in step 6 includes the KL grading classification loss, the geometric projection consistency loss, and the self-supervised alignment loss.
8. The method according to claim 7, characterized in that the self-supervised alignment pre-training stage comprises: extracting correctly positioned images from a public dataset, applying random affine transformations to generate perturbed samples, inputting them into the self-supervised spatial transformation network, and pre-training it with the self-supervised alignment loss function.
9. The method according to claim 7, characterized in that in the task-driven joint fine-tuning stage, the multi-task weight coefficients of the global loss function give the self-supervised alignment loss a larger weight early in training to stabilize alignment, and increase the weight of the KL grading classification loss later in training to improve classification accuracy.
10. An automatic grading system for knee joint degeneration based on cross-view feature manifold alignment and anatomical prior constraints for performing the method of any one of claims 1-9, characterized by comprising: an automated anatomical axis standardization module, configured to automatically correct the pose deviation of original non-standard knee joint images and generate standardized diagnostic images; a feature extraction module, configured to extract deep spatial semantic features from the standardized frontal and lateral images; a dual-view feature fusion module, configured to achieve cross-view anatomical feature alignment and complementarity through a row-independent cross-attention mechanism; a geometric projection constraint module, configured to extract one-dimensional vertical-height descriptors and calculate the geometric projection consistency loss; and a classification prediction and optimization module, configured to calculate the multi-task joint loss and output the final KL grading diagnosis of the knee joint.