A cascade-corrected osteoporosis diagnostic system and method
The osteoporosis diagnostic system with cascaded correction solves the problems of scarce labeled data and pseudo-label noise in osteoporosis diagnosis, achieving high accuracy and stability in osteoporosis diagnosis with a small number of labels, and is suitable for early screening and clinical diagnosis of osteoporosis.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HEBEI UNIVERSITY
- Filing Date
- 2026-03-25
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies for osteoporosis diagnosis suffer from problems such as scarce and unevenly distributed labeled data, easy introduction of noise from pseudo-label generation, and difficulty in adapting traditional learning strategies to the identification of low bone density cases, resulting in weak model generalization ability and insufficient diagnostic accuracy and stability.
The osteoporosis diagnostic system employing cascaded correction includes modules such as data collection and preprocessing, feature extraction and reconstruction, pseudo-label generation, cascaded correction, and semi-supervised training scheduling. Through adaptive soft pseudo-label generation, cascaded noise correction, and anti-curricular semi-supervised scheduling, the system improves the generalization and convergence stability of the model and outputs interpretable heatmaps and review prompts.
With limited labeled samples, the model significantly improves the accuracy and stability of osteoporosis diagnosis, reduces the risk of missed diagnoses, enhances the robustness and diagnostic efficiency of the model, and is suitable for early screening and clinical diagnosis of osteoporosis.
Figure CN122245719A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of medical image analysis technology, and specifically relates to a cascaded correction osteoporosis diagnostic system and method.
Background Technology
[0002] Osteoporosis and related bone abnormalities are common chronic skeletal diseases among middle-aged and elderly people. They are characterized by their insidious nature and slow progression. If not intervened in time, they can easily lead to fragility fractures, significantly increasing the risk of disability and death. Currently, clinical diagnosis mainly relies on dual-energy X-ray absorptiometry (DXA). Although it has high accuracy in measuring bone mineral density, its application in large-scale screening at primary healthcare institutions is limited by issues such as expensive equipment, complex operation, relatively high radiation dose, and long appointment times.
[0003] In contrast, lumbar spine anteroposterior and lateral X-rays are widely used in routine physical examinations and primary care due to their widespread availability, low examination cost, and low radiation dose, accumulating a large amount of image data that can potentially be used for bone quality assessment. In recent years, research on intelligent computer-aided detection and diagnosis of bone status based on X-ray images has gradually emerged. This research aims to use artificial intelligence technology to deeply explore implicit features in images, such as bone texture, cortical thickness, vertebral morphology, and structural degeneration, to achieve automated identification, accurate grading, and prediction of osteoporosis risk and progression trends. This provides a new approach for low-cost, wide-coverage early screening and long-term follow-up management.
[0004] However, such methods still face serious challenges in practice. First, high-quality labeled data is scarce and exhibits cross-institutional distribution bias, which restricts the generalization ability of supervised learning models. Second, when unlabeled data is used for semi-supervised learning, pseudo-label generation is prone to introducing noise, leading to error accumulation and degraded model performance. Third, the traditional "easy-to-difficult" curriculum learning strategy is poorly suited to identifying the rare but critical low bone density cases among bone lesions, and if the augmentation strategy, confidence threshold, and loss weights lack a dynamic scheduling mechanism during training, decision boundaries easily become unstable and convergence becomes difficult.
[0005] Therefore, there is an urgent need for an intelligent diagnostic framework that can effectively integrate large-scale unlabeled data, suppress noise propagation, and optimize the training dynamic process based on a small number of labeled samples, so as to improve the clinical practical value of X-ray images in bone status assessment.
Summary of the Invention
[0006] The present invention aims to at least partially solve one of the technical problems in the aforementioned related technologies.
[0007] Therefore, the purpose of this invention is to provide a cascaded correction osteoporosis diagnostic system and method. Through strategies such as adaptive soft pseudo-label generation, cascaded noise correction, anti-curricular semi-supervised scheduling, and temperature calibration, the system improves the generalization, convergence stability, and diagnostic accuracy of the model, and outputs interpretable heatmaps and re-examination prompts to reduce the risk of missed diagnoses. It can solve the problems of scarce annotations, uneven distribution, and domain differences in osteoporosis X-ray imaging diagnosis.
[0008] To solve the above-mentioned technical problems, the present invention is implemented as follows: the invention provides a cascaded correction osteoporosis diagnostic system, the system comprising: a data collection and preprocessing module, configured to provide standardized, high-quality image input, solving problems such as privacy leaks, inconsistent formats, and unbalanced sample distribution in raw images; a feature extraction and feature reconstruction module, configured to achieve effective feature mining and robust representation of lumbar spine X-ray images, solving the problems of weak generalization ability of single feature extraction and insufficient feature learning under limited annotation; a pseudo-label generation module, configured to generate high-quality soft pseudo-labels for unlabeled samples, solving the problem that a single pseudo-label prediction in semi-supervised learning is prone to introducing errors, and realizing the fusion optimization of model prediction and neighborhood distribution; a cascaded correction module, configured to perform step-by-step purification of pseudo-labels, solving the problem of model performance degradation caused by pseudo-label noise propagation, and realizing the credibility classification and accurate utilization of pseudo-labels; a semi-supervised training scheduling and anti-curricular control module, configured as a global control unit to address the problem that traditional curriculum learning, which progresses from easy to difficult, is unsuitable for identifying complex osteoporosis samples, thus achieving dynamic optimization of the training process; and an inference and diagnostic output module, configured to enable the clinical application of the model, solving the problems of uninterpretable intelligent diagnostic results and high risk of missed osteoporosis diagnosis, while balancing diagnostic accuracy and clinical practicality.
[0009] In addition, the cascaded correction osteoporosis diagnostic system according to the present invention may also have the following additional technical features: In some implementations, the data collection and preprocessing module performs: Core operations: de-identification of lumbar spine anteroposterior and lateral X-ray images to remove privacy information and mask text overlays; stratified sampling by proportion and category to divide the data into training, validation, and test sets, with 5% to 20% of the training-set samples extracted as the labeled subset and the rest used as the unlabeled subset; Image standardization: adaptive cropping is achieved through coarse spine localization and peak detection, the cropped images are unified to the same pixel size, and intensity normalization and noise suppression are performed; and, Data augmentation and distribution constraints: geometric and illumination augmentation methods are used to expand the samples, and upsampling and downsampling constraints are applied to the training set so that the distribution of the augmented samples approximates the original population distribution.
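The stratified split and the selection of a labeled subset described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the fixed random seed, and the rounding rule are assumptions, while the 8:1:1 ratio and the 5% to 20% labeled fraction come from the description.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split sample indices into train/val/test while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * ratios[0])
        n_val = round(len(idxs) * ratios[1])
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return train, val, test


def pick_labeled_subset(train_idx, labels, fraction=0.10, seed=0):
    """Keep labels for `fraction` of each class; the rest of the training set is unlabeled."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in train_idx:
        by_class[labels[idx]].append(idx)
    labeled = []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        labeled += idxs[:max(1, round(len(idxs) * fraction))]
    labeled_set = set(labeled)
    unlabeled = [i for i in train_idx if i not in labeled_set]
    return labeled, unlabeled
```

Sampling per class in both functions keeps the labeled subset from being dominated by the majority category, which matters when only a few dozen labels are available.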
[0010] In some implementations, the feature extraction and feature reconstruction module includes: Self-supervised pre-training: A contrastive learning method is used to pre-train a lightweight convolutional encoder to learn semantically sensitive general representations, which serve as initial weights for downstream tasks; and, Joint optimization of feature reconstruction: An autoencoder branch (three fully connected or lightweight convolutional layers) is connected in parallel to the encoder representation layer to perform reconstruction on the intermediate feature vectors. The model is jointly optimized by the weighted sum of reconstruction loss and task loss.
[0011] In some implementations, the pseudo-label generation module includes: Dual distribution acquisition: the class probability distribution output by the current model for each unlabeled sample is obtained, and a K-nearest neighbor graph is constructed in the feature space to obtain the neighborhood class probability distribution; Local density metric: the local density ρ of a sample is characterized by the inverse of the average neighborhood distance or neighborhood entropy, in order to distinguish between high-density and low-density samples; and, Adaptive fusion to generate soft pseudo-labels: the two distributions are fused through a dynamic weight function that increases monotonically with the training stage and the local density, and the confidence of each pseudo-label is output at the same time.
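One way to picture the fusion of the model distribution with the neighborhood distribution is the NumPy sketch below. The description does not give the explicit weight function, so the linear form of `alpha` used here is purely a placeholder that satisfies the stated monotonicity; the function name, the normalization of the density, and the use of the maximum fused probability as the confidence are likewise assumptions.

```python
import numpy as np


def soft_pseudo_labels(features, probs, k=5, stage=0, num_stages=5):
    """Fuse model predictions with K-nearest-neighbour class distributions."""
    # Pairwise Euclidean distances in feature space (O(n^2) memory, fine for a sketch).
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)                       # a sample is not its own neighbour
    nn_idx = np.argsort(dist, axis=1)[:, :k]             # (n, k) neighbour indices
    nn_dist = np.take_along_axis(dist, nn_idx, axis=1)   # (n, k) neighbour distances

    # Neighbourhood class distribution: mean of the neighbours' predicted distributions.
    p_knn = probs[nn_idx].mean(axis=1)

    # Local density rho: inverse of the average neighbourhood distance.
    rho = 1.0 / (nn_dist.mean(axis=1) + 1e-8)
    rho_norm = (rho - rho.min()) / (rho.max() - rho.min() + 1e-8)

    # Placeholder weight (assumption): grows with the stage and with the density.
    stage_frac = (stage + 1) / num_stages
    alpha = np.clip(0.5 * stage_frac + 0.5 * rho_norm, 0.0, 1.0)[:, None]

    fused = alpha * probs + (1.0 - alpha) * p_knn
    fused = fused / fused.sum(axis=1, keepdims=True)
    confidence = fused.max(axis=1)
    return fused, confidence, rho
```

With this weighting, early stages and sparse regions lean on the neighborhood distribution, while later stages and dense regions trust the model's own prediction more.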
[0012] In some implementations, the cascaded correction module includes: Gaussian mixture model clustering: a Gaussian mixture model is fitted to the set of pseudo-label confidence scores, with the number of components selected by the Bayesian information criterion and covariance regularization applied, in order to identify complex or low-confidence samples; Uncertainty measure: for complex samples, Monte Carlo Dropout is performed with T stochastic forward passes, and the variance or entropy of the T classification probabilities is used as the uncertainty measure u; and, Hierarchical utilization of hard and soft labels: a staged threshold τlow(s) is set. When u≤τlow(s), hard pseudo-labels are used for training. When u>τlow(s), temperature scaling is applied to the pseudo-labels to obtain soft labels, and noise propagation is suppressed by a KL divergence constraint.
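The routing between hard and soft pseudo-labels can be sketched in NumPy as follows. The sketch assumes the T Monte Carlo Dropout outputs are already available as an array, uses the entropy of the mean prediction as the uncertainty u (the description allows variance or entropy), and picks a fixed temperature of 2.0; all function names and numeric values are illustrative assumptions.

```python
import numpy as np


def mc_uncertainty(mc_probs):
    """Entropy of the mean prediction over T stochastic passes.

    mc_probs has shape (T, n_samples, n_classes).
    """
    mean_p = mc_probs.mean(axis=0)
    u = -(mean_p * np.log(mean_p + 1e-12)).sum(axis=1)
    return u, mean_p


def temperature_scale(p, temperature):
    """Renormalize p_i^(1/temperature); a temperature above 1 flattens the distribution."""
    logits = np.log(p + 1e-12) / temperature
    logits -= logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


def kl_divergence(target, pred):
    """Row-wise KL(target || pred), usable as the soft-label consistency loss."""
    return (target * (np.log(target + 1e-12) - np.log(pred + 1e-12))).sum(axis=1)


def route_pseudo_labels(mean_p, u, tau_low, temperature=2.0):
    """One-hot labels where u <= tau_low, temperature-scaled soft labels otherwise."""
    use_hard = u <= tau_low
    hard = np.eye(mean_p.shape[1])[mean_p.argmax(axis=1)]
    soft = temperature_scale(mean_p, temperature)
    targets = np.where(use_hard[:, None], hard, soft)
    return targets, use_hard
```

Samples routed to the soft branch contribute through `kl_divergence` rather than cross-entropy, so an uncertain pseudo-label pulls the model toward a flattened distribution instead of a possibly wrong class.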
[0013] In some implementations, the semi-supervised training scheduling and anti-curricular control module includes: Staged parameter scheduling: using the training stage s as the independent variable, the sample ratio, threshold, loss weight, and data augmentation intensity are dynamically adjusted; Anti-curricular learning strategy: in the early stages, the sampling ratio and loss weight of complex and low-density samples are increased to focus on learning boundary features; in the mid-to-late stages, the weight of high-confidence and high-density samples is increased and the augmentation intensity is reduced to solidify the classification boundary; and, Training convergence control: when the F1 value on the validation set does not improve within a preset window, an early stopping mechanism is triggered or the system reverts to the previous stable checkpoint to avoid overfitting.
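A stage-indexed schedule of this kind might look like the sketch below. The description fixes only the direction of each change, so every start and end value, the linear interpolation, and the key names are hypothetical.

```python
def lerp(start, end, frac):
    """Linear interpolation between start and end for frac in [0, 1]."""
    return start + (end - start) * frac


def anti_curricular_schedule(stage, num_stages):
    """Return training hyperparameters for a given stage (all values are assumptions)."""
    frac = stage / max(1, num_stages - 1)
    return {
        # Early stages emphasise complex, low-density samples.
        "complex_sample_ratio": lerp(0.7, 0.3, frac),
        "complex_loss_weight": lerp(1.5, 0.5, frac),
        # Later stages emphasise high-confidence, high-density samples.
        "confident_loss_weight": lerp(0.5, 1.5, frac),
        # Augmentation is weakened over time to consolidate the boundary.
        "augmentation_strength": lerp(1.0, 0.3, frac),
        # Staged uncertainty threshold used by the cascaded correction.
        "tau_low": lerp(0.6, 0.3, frac),
    }
```

Calling `anti_curricular_schedule(0, 5)` yields the boundary-focused settings, and `anti_curricular_schedule(4, 5)` the consolidation settings.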
[0014] In some implementations, the inference and diagnostic output module includes: Probability calibration and category determination: temperature calibration is performed on the model output probability, a category sensitivity threshold is set, and a diagnostic conclusion of normal bone density, osteopenia, or osteoporosis is generated; Interpretable output: interpretable heatmaps are generated using gradient-weighted class activation mapping, highlighting the regions the model attends to, including the vertebral cortex and trabeculae; and, Clinical review prompt: when the diagnostic confidence is below the threshold or the uncertainty is above the warning value, the result is automatically marked as pending review and a prompt is output for clinical review.
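The decision logic of this module (calibration, sensitivity thresholds, and the review flag) can be illustrated with the following sketch. The temperature, the per-class thresholds, and the review limits are hypothetical values chosen for the example; in practice they would be tuned on a validation set. Heatmap generation is omitted.

```python
import math

CLASSES = ("normal", "osteopenia", "osteoporosis")


def calibrated_probs(logits, temperature):
    """Softmax over logits divided by a calibration temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def diagnose(logits, uncertainty, temperature=1.5,
             sensitivity_thresholds=((2, 0.30), (1, 0.40)),
             min_confidence=0.60, max_uncertainty=0.50):
    """Return the diagnosis, its confidence, and whether clinical review is needed."""
    probs = calibrated_probs(logits, temperature)
    label = max(range(len(probs)), key=probs.__getitem__)
    # Check the more severe classes first; a low threshold favours sensitivity.
    for class_idx, threshold in sensitivity_thresholds:
        if probs[class_idx] >= threshold:
            label = class_idx
            break
    confidence = probs[label]
    pending_review = confidence < min_confidence or uncertainty > max_uncertainty
    return {"diagnosis": CLASSES[label], "confidence": confidence,
            "pending_review": pending_review}
```

Because the osteoporosis threshold is checked before the arg-max is accepted, a borderline case is reported as osteoporosis and flagged for review rather than silently classified as normal.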
[0015] This invention also provides a cascaded correction method for diagnosing osteoporosis, implemented using the cascaded correction osteoporosis diagnostic system described in any of the preceding embodiments; the method includes the following steps: S1. Image data acquisition and standardized preprocessing; S2. Image feature representation learning; S3. Soft pseudo-label generation for unlabeled samples; S4. Cascaded correction and purification of pseudo-labels; S5. Anti-curricular semi-supervised dynamic training; S6. Inference, diagnosis, and clinical output.
[0016] In addition, the cascaded correction method for diagnosing osteoporosis according to the present invention may also have the following additional technical features: In some implementations, step S3 includes: A K-nearest neighbor graph is constructed in the feature space to obtain the neighborhood distribution. A weight function that changes dynamically with the training stage and local density is introduced to achieve adaptive fusion of the model prediction distribution and the neighborhood distribution. At the same time, the local density is used to distinguish the characteristics of the samples, so that the pseudo-labels better match the distribution of the data itself, solving the problem of insufficient mining of information from unlabeled samples under limited labeling. Step S4 includes: By combining a Gaussian mixture model with Monte Carlo Dropout, a two-level correction of pseudo-labels is achieved: first, the Gaussian mixture model is used to cluster the confidence scores and identify complex samples, and then Monte Carlo Dropout is used to quantify sample uncertainty, enabling the hierarchical utilization of soft and hard labels. At the same time, temperature scaling and KL divergence are combined to constrain the noise propagation of high-uncertainty samples, solving the problem of model performance degradation caused by low pseudo-label quality in semi-supervised learning.
[0017] In some implementations, step S5 includes: Full-parameter dynamic scheduling with training phase as independent variable: In the early stage, the learning weights of complex and low-density samples are strengthened, and in the later stage, the classification boundary of high-confidence and high-density samples is consolidated. At the same time, the sample ratio, loss weight, enhancement intensity and threshold are adjusted in stages, and an early stop or rollback mechanism is set up to solve the problems of poor recognition effect and training convergence difficulty in rare but critical low bone density difficult cases. Based on the self-supervised pre-trained encoder, a parallel autoencoder branch is used to reconstruct features. The reconstruction loss, supervision loss, and pseudo-label loss are fused into a total loss function. By using multi-loss weighted joint optimization to constrain model learning, the model can learn more robust and semantic lumbar X-ray image features under limited annotation, thus solving the problem of weak generalization ability of single feature extraction.
[0018] Compared with the prior art, the present invention has at least the following beneficial effects: In this embodiment of the invention, the cascaded correction osteoporosis diagnosis system and method provided simulates the doctor's comparison process between anteroposterior and lateral images through a combined design of dual-view feature extraction, attention enhancement and bidirectional cross-fusion, thereby effectively improving the complementarity of cross-view features and diagnostic efficiency, and overcoming the problems of insufficient diagnostic information and low feature utilization in a single view. In the embodiments of the present invention, the cascaded correction osteoporosis diagnostic system and method provided still maintain strong robustness and generalization performance under limited data conditions. Experimental results show that it is significantly better than existing methods in terms of accuracy, sensitivity, specificity and other indicators, and can provide reliable support for early screening and clinical diagnosis of osteoporosis.
[0019] Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Attached Figure Description
[0020] Figure 1 is an overall block diagram of a cascaded correction osteoporosis diagnostic system disclosed in one embodiment of the present invention; Figure 2 is a schematic diagram of the overall network structure disclosed in one embodiment of the present invention; Figure 3 is a schematic diagram of a cascaded correction structure disclosed in one embodiment of the present invention; Figure 4 is a visualization output heatmap disclosed in one embodiment of the present invention.
Detailed Implementation
[0021] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0022] The embodiments of the present invention will be described in detail below with reference to the accompanying drawings and specific examples and application scenarios.
[0023] In some embodiments of the present invention, a cascaded correction osteoporosis diagnostic system is provided, which is a computer-aided diagnostic system for bone status applied to lumbar spine X-ray imaging. The system includes:
The data collection and preprocessing module acquires lumbar spine anteroposterior and lateral X-ray images and performs de-identification. Stratified sampling is used to divide the dataset into training, validation, and test sets, with 5%–20% of the training-set samples selected as the labeled subset and the remainder as the unlabeled subset. Image size standardization, intensity normalization, noise suppression, and data augmentation are performed; adaptive cropping is optional. The stratified sampling ratio is 8:1:1 (training:validation:test), and upsampling and downsampling constraints are applied to the number of samples in each category within the training set so that the augmented distribution approximates the original population distribution. Adaptive cropping includes coarse spine localization, peak detection on the row / column integral curves of the target, and fine cropping, after which the cropped images are unified to the same pixel size. Data augmentation includes rotation, scaling, flipping, and noise perturbation to expand the training samples;
The feature extraction and reconstruction module employs self-supervised pre-training to obtain the initial representation of the encoder. An autoencoder branch is connected in parallel to the encoder representation layer to perform feature reconstruction, and a weighted sum of the reconstruction loss and the task loss is used for joint optimization to form a robust feature representation. The autoencoder in this module consists of three fully connected or lightweight convolutional layers. The reconstruction loss is
$$\mathcal{L}_{rec} = \lVert f - \hat{f} \rVert_2^2$$
and the total loss is
$$\mathcal{L}_{total} = \mathcal{L}_{sup} + \lambda_h \mathcal{L}_{hard} + \lambda_s \mathcal{L}_{soft} + \lambda_r \mathcal{L}_{rec}$$
The parameters in the above formulas have the following meanings: $f$ is the true intermediate feature vector extracted by the encoder; $\hat{f}$ is the feature vector reconstructed by the autoencoder branch; $\mathcal{L}_{rec}$ is the feature reconstruction constraint, which measures the difference between the extracted features and the reconstructed features; $\mathcal{L}_{total}$ is the total loss function optimized during model training; $\mathcal{L}_{sup}$ is the cross-entropy supervision loss on labeled samples; $\mathcal{L}_{hard}$ is the cross-entropy loss on hard pseudo-labels; $\mathcal{L}_{soft}$ is the Kullback-Leibler (KL) divergence loss between the soft pseudo-labels and the model output; $\lambda_h$ is the weight coefficient of the hard pseudo-label loss; $\lambda_s$ is the weight coefficient of the soft pseudo-label loss; and $\lambda_r$ is the weight coefficient of the reconstruction constraint. The coefficients $\lambda_h$, $\lambda_s$, and $\lambda_r$ are set in stages by the semi-supervised training scheduling module during training to balance the contributions of the supervision signals and feature reconstruction at different stages;
The pseudo-label generation module obtains the class probability distribution $p_{model}$ output by the current model for each unlabeled sample; it constructs a K-nearest neighbor graph in the feature space and computes the neighborhood class distribution $p_{knn}$ and the local density $\rho$, which is characterized by the inverse of the average neighborhood distance or neighborhood entropy. A dynamic weight function $\alpha(s, \rho)$ is introduced to adaptively fuse the model distribution $p_{model}$ with the neighborhood distribution $p_{knn}$, generating soft pseudo-labels together with their confidence scores. K in the K-nearest neighbor graph is an odd number between 3 and 11, preferably 5. The dynamic weight function $\alpha(s, \rho)$ increases monotonically with the training stage $s$ and, in high-density regions, increases monotonically with the density $\rho$, so as to strengthen the weight of the regions where the model is confident;
The cascaded correction module fits a Gaussian mixture model (GMM) to the set of pseudo-label confidence scores to identify complex samples, and then performs Monte Carlo Dropout on the complex samples to estimate the uncertainty measure $u$. A staged threshold $\tau_{low}(s)$ and a temperature are used, both of which decrease with each stage: when $u \le \tau_{low}(s)$, hard pseudo-labels are used in training; when $u > \tau_{low}(s)$, temperature-scaled soft labels are used in training, and the Kullback-Leibler (KL) divergence between the temperature-scaled soft labels and the model output serves as the supervision signal, implementing a distribution consistency constraint and suppressing the propagation of pseudo-label noise. The number of components in the Gaussian mixture model is selected using the Bayesian information criterion, and covariance regularization is used for parameter estimation. The number of Monte Carlo Dropout samples is T, preferably T = 10–30. The uncertainty $u$ is measured by the variance or entropy of the T classification probabilities;
Semi-supervised training scheduling and anti-curricular control module: Using the training phase s as the independent variable, it implements phased scheduling of sample ratio, threshold, loss weight, and augmentation intensity; it adopts an anti-curricular strategy to increase the sampling ratio and loss weight of complex and low-density samples in the early stage, and increase the sampling ratio of high-confidence and high-density samples and weaken the augmentation intensity in the mid-to-late stage, performing multi-stage collaborative optimization until convergence; when the validation set F1 does not improve within a preset window, it triggers early stopping or rollback to the previous stable checkpoint; The inference and diagnosis output module performs temperature calibration on the output probability and sets a category sensitivity threshold during the inference stage to reduce the risk of missing osteoporosis-related categories. It generates imaging diagnostic conclusions or screening discrimination and confidence information, and outputs diagnostic conclusions / screening discrimination, confidence information and Grad-CAM class activation heatmap (an interpretable heatmap based on gradient-weighted class activation mapping). When the confidence level is lower than the threshold or the uncertainty is higher than the threshold, the result is marked as pending review and a prompt is automatically output.
[0024] As shown in Figure 1, in some embodiments of the present invention, a cascaded correction method for diagnosing osteoporosis is provided. First, anteroposterior and lateral X-ray images of the lumbar spine are de-identified, standardized, and adaptively cropped. Standardized data is then generated through augmentations such as rotation, scaling, flipping, and noise perturbation. A self-supervised pre-trained convolutional encoder is used to extract image features, and an autoencoder branch is introduced for feature reconstruction to improve representation robustness. For unlabeled samples, a K-nearest neighbor graph is constructed in the feature space, and soft pseudo-labels are generated by fusing the model prediction with the neighborhood distribution using local-density-based weights. A Gaussian mixture model and Monte Carlo Dropout are used to achieve cascaded correction of pseudo-labels: hard labels are used for low-uncertainty samples, while high-uncertainty samples are constrained by temperature scaling and KL divergence. Finally, a semi-supervised scheduling and anti-curricular control strategy is used to dynamically adjust the thresholds and loss weights. The steps of this method include: Step 1: Collect lumbar spine X-ray image data from medical imaging equipment; de-identify the images; divide the data into a training set, a validation set, and a test set stratified by category, and extract a small number of samples from the training set as the labeled subset, with the rest as the unlabeled subset; perform size standardization, intensity normalization, and noise reduction on the images, and generate a standardized input dataset according to the configured augmentation strategy library (geometric, illumination, noise, and hybrid augmentation).
[0025] Step 2: Use the feature extraction and feature reconstruction module to perform representation learning on the input image: use self-supervised pre-training to obtain the initial weights of the encoder; connect the autoencoder branch in parallel to the encoder representation layer to reconstruct the intermediate features to form an enhanced representation of the semantic neighborhood; update the model parameters during training using a weighted joint optimization method of task loss and reconstruction loss.
[0026] Step 3: Use the pseudo-label generation module to output the class probability distribution of unlabeled samples and construct the nearest neighbor relationship in the feature space to obtain the neighborhood class distribution; perform adaptive fusion based on the local density of the samples and the dynamic weight function in the training stage to generate soft pseudo-labels and their confidence information, which are used to characterize the pseudo-label quality and uncertainty of the basic data.
[0027] Step 4: Use the cascaded correction module to clean up pseudo-labels step by step: First, perform statistical modeling on the confidence distribution to identify complex samples; then perform multiple random forward inferences on the complex samples to estimate uncertainty; use hard pseudo-labels to directly include low uncertainty samples in training, use temperature-scaled soft labels to participate in training for high uncertainty samples, and suppress the propagation of pseudo-label noise through distribution consistency constraints.
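The first sub-step of Step 4, statistical modeling of the confidence distribution, can be illustrated with a small one-dimensional Gaussian mixture fitted by expectation-maximization and selected by the Bayesian information criterion. This is a self-contained NumPy sketch rather than the patented implementation: the initialization, the iteration count, the regularization constant, and the rule "flag the component with the lowest mean confidence" are all assumptions.

```python
import numpy as np


def _log_joint(x, weights, means, variances):
    """log(weight_k * N(x | mean_k, var_k)) for every sample and component."""
    log_pdf = (-0.5 * np.log(2 * np.pi * variances)
               - 0.5 * (x[:, None] - means) ** 2 / variances)
    return np.log(weights) + log_pdf


def fit_gmm_1d(x, k, n_iter=200, reg=1e-6):
    """EM for a 1-D Gaussian mixture; `reg` is the covariance regularization term."""
    weights = np.full(k, 1.0 / k)
    means = np.linspace(x.min(), x.max(), k + 2)[1:-1]   # spread over the data range
    variances = np.full(k, x.var() + reg)
    for _ in range(n_iter):
        log_joint = _log_joint(x, weights, means, variances)
        resp = np.exp(log_joint - np.logaddexp.reduce(log_joint, axis=1)[:, None])
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + reg
    return weights, means, variances


def select_gmm_by_bic(x, max_components=3):
    """Fit 1..max_components mixtures and keep the one with the lowest BIC."""
    best = None
    for k in range(1, max_components + 1):
        w, m, v = fit_gmm_1d(x, k)
        log_lik = np.logaddexp.reduce(_log_joint(x, w, m, v), axis=1).sum()
        bic = (3 * k - 1) * np.log(x.size) - 2.0 * log_lik   # 3k-1 free parameters
        if best is None or bic < best[0]:
            best = (bic, w, m, v)
    return best[1:]


def flag_complex_samples(conf, max_components=3):
    """Mark samples assigned to the component with the lowest mean confidence."""
    w, m, v = select_gmm_by_bic(conf, max_components)
    if m.size == 1:
        return np.zeros(conf.size, dtype=bool)   # a single component: nothing stands out
    assign = _log_joint(conf, w, m, v).argmax(axis=1)
    return assign == m.argmin()
```

Only the flagged samples would then go through the multiple stochastic forward passes, which keeps the cost of uncertainty estimation proportional to the number of doubtful pseudo-labels.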
[0028] Step 5: Use the semi-supervised training scheduling and anti-curricular control module as the global control unit to dynamically adjust the sample ratio, threshold parameters, loss weights, and augmentation intensity according to the training stage: increase the weight of complex and low-density samples in the early stage, and increase the weight of high-confidence and high-density samples in the later stage; execute the loop of pseudo-label generation, cascaded correction, and model update according to the multi-stage collaborative optimization strategy until convergence, and trigger the early stopping or rollback strategy when the validation set metric stagnates.
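The stagnation check at the end of Step 5 can be sketched as a small helper that tracks the best validation F1 and the checkpoint that produced it. The class name, the `patience` parameter (standing in for the preset window), and the use of an opaque checkpoint handle are assumptions for illustration.

```python
class EarlyStopper:
    """Track validation F1 and signal when it has stalled for `patience` rounds."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_f1 = float("-inf")
        self.best_checkpoint = None   # last stable checkpoint to roll back to
        self.stale_rounds = 0

    def update(self, val_f1, checkpoint):
        """Record one evaluation; return True when training should stop or roll back."""
        if val_f1 > self.best_f1:
            self.best_f1 = val_f1
            self.best_checkpoint = checkpoint
            self.stale_rounds = 0
            return False
        self.stale_rounds += 1
        return self.stale_rounds >= self.patience
```

When `update` returns True, the caller can either end training or reload `best_checkpoint` and continue with adjusted schedule parameters.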
[0029] Step 6: Use the inference and diagnosis output module to provide inference services for the trained model: perform temperature calibration and category sensitivity threshold determination on the probability output, generate computer-aided diagnostic results of bone status and their confidence level descriptions, and output visualization of the region of interest based on interpretability methods; when the confidence level is low or the uncertainty is higher than the preset threshold, output a review prompt for clinical reference.
[0030] Method Example 1: This embodiment provides a cascaded correction method for diagnosing osteoporosis, the steps of which include: Step 1: Data collection and preprocessing. The data used in this example came from the imaging center of a provincial general hospital. All samples were approved by the ethics committee and underwent de-identification processing. A total of 500 lumbar spine anteroposterior (AP) and lateral (LAT) X-ray images were collected, covering subjects of different genders, ages, and body types. Each sample was assigned a T-score measured by dual-energy X-ray absorptiometry (DXA) as the gold standard for bone status. Based on the DXA results, subjects were divided into three categories: normal bone mineral density (T-score ≥ −1.0), 180 cases; osteopenia (−2.5 < T-score < −1.0), 160 cases; and osteoporosis (T-score ≤ −2.5), 160 cases.
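The three-way grouping by DXA T-score can be written as a small helper. The sketch assumes the conventional cut-offs of −1.0 and −2.5; the function name is hypothetical.

```python
def bone_status_from_t_score(t_score):
    """Map a DXA T-score to one of the three bone status categories."""
    if t_score >= -1.0:
        return "normal"
    if t_score > -2.5:
        return "osteopenia"
    return "osteoporosis"
```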
[0031] In the preprocessing stage, all images are first de-identified to remove privacy information such as names and numbers from the DICOM file header, and any text overprinting in the images is masked or blurred. Then, stratified sampling is performed based on factors such as category, gender, age group, and imaging pose, dividing the data into training, validation, and test sets in an 8:1:1 ratio. 5%, 10%, 15%, and 20% of the samples in the training set are selected as labeled subsets, with the remainder as an unlabeled subset. After data augmentation, the training set undergoes upsampling and downsampling constraints to ensure the augmented distribution closely approximates the overall distribution. For spatial processing, an adaptive cropping strategy is employed: first, threshold segmentation and connected component analysis are used to coarsely locate the spinal region; then, peak detection of row / column integral curves is used to further determine the target region, achieving precise cropping of the lumbar spine region. The cropped images are then standardized to 512×512 pixels. In the intensity processing stage, Min-Max normalization is performed on the images, and median filtering or bilateral filtering is used for noise suppression to improve image quality and feature stability. Finally, a variety of data augmentation strategies are applied during the training phase, including random rotation (±10°), scaling (0.9–1.1x), horizontal flipping, random noise and contrast perturbation, and CutMix hybrid augmentation operation.
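The intensity and spatial steps of this preprocessing stage can be approximated in NumPy as shown below. The sketch simplifies the described pipeline: a fractional threshold on the row / column integral curves stands in for peak detection, nearest-neighbor indexing stands in for a proper resampling filter, and the coarse localization, filtering, and augmentation steps are omitted. The function names and the `frac` parameter are assumptions.

```python
import numpy as np


def min_max_normalize(img):
    """Scale intensities to [0, 1]; a constant image maps to all zeros."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)


def crop_by_integral_curves(img, frac=0.5):
    """Crop to the rows and columns whose integral reaches `frac` of the curve's peak."""
    rows = img.sum(axis=1)    # row integral curve
    cols = img.sum(axis=0)    # column integral curve
    r = np.flatnonzero(rows >= frac * rows.max())
    c = np.flatnonzero(cols >= frac * cols.max())
    return img[r[0]:r[-1] + 1, c[0]:c[-1] + 1]


def resize_nearest(img, size=512):
    """Resize to size x size pixels by nearest-neighbour index sampling."""
    r = np.arange(size) * img.shape[0] // size
    c = np.arange(size) * img.shape[1] // size
    return img[np.ix_(r, c)]
```

Chaining the three calls, `resize_nearest(crop_by_integral_curves(min_max_normalize(img)))`, produces a 512×512 array with intensities in [0, 1].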
[0032] Step 2: Feature extraction and feature reconstruction. The network structure for this part corresponds to the “pre-training” module shown in the upper part of Figure 2. In the feature extraction and feature reconstruction stage, this embodiment uses a lightweight convolutional neural network as the encoder backbone, with a structure based on an improved ResNet-18 framework. To improve the robustness of feature extraction under limited annotation, a self-supervised contrastive learning framework based on momentum update is first used to pre-train the encoder.
[0033] Specifically, strong and weak data augmentation are applied to the same unlabeled image. The strongly augmented view is input to an online encoder, which outputs a feature vector. The weakly augmented view is input to a momentum encoder, whose parameters are updated with a delay and which likewise outputs a feature vector. The two feature vectors are used to calculate the contrastive loss. Meanwhile, feature reconstruction modules (shown in the dashed box in Figure 2) are connected in parallel at the representation layers of both encoders. Each module contains three fully connected or lightweight convolutional branches with feature dimensions of 1024, 512, and 256, respectively; it performs reconstruction on the intermediate feature vector and outputs the reconstructed feature. This reconstruction task constrains the model to maintain the consistency of anatomical structure in semantic space. The reconstruction loss measures the difference between the reconstructed features and the original intermediate features and is defined as the squared Euclidean distance: $\mathcal{L}_{rec} = \lVert z - \hat{z} \rVert_2^2$.
[0034] In the above formula, $z$ denotes the intermediate feature vector extracted by the encoder (for either of the two encoders in the figure), and $\hat{z}$ denotes the corresponding feature vector reconstructed by the feature reconstruction module.
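The delayed parameter update of the momentum encoder and the squared-Euclidean reconstruction loss can be sketched as follows. The exponential-moving-average update rule and the momentum coefficient `m=0.99` are assumptions in the style commonly used by momentum-contrast methods; the source states only that the momentum encoder's parameters are updated with a delay. Parameters and features are represented as flat lists of floats purely for illustration.

```python
def momentum_update(online_params, momentum_params, m=0.99):
    """Return new momentum-encoder parameters as an exponential moving average.

    Assumed rule: theta_momentum <- m * theta_momentum + (1 - m) * theta_online.
    """
    return [m * k + (1.0 - m) * q for q, k in zip(online_params, momentum_params)]


def reconstruction_loss(z, z_hat):
    """Squared Euclidean distance between a feature vector and its reconstruction."""
    if len(z) != len(z_hat):
        raise ValueError("feature vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(z, z_hat))
```

With `m` close to 1, the momentum encoder changes slowly, which is what makes its output a stable target for the contrastive loss.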
[0035] During the pre-training phase, the contrastive loss and the reconstruction loss are jointly optimized, which enables the model to learn semantically sensitive general representations. After pre-training, the encoder parameters are used as the initial weights for the downstream semi-supervised task (the lower half of Figure 2). The model's total loss function consists of four parts: $\mathcal{L}_{total} = \mathcal{L}_{sup} + \lambda_1 \mathcal{L}_{hard} + \lambda_2 \mathcal{L}_{soft} + \lambda_3 \mathcal{L}_{rec}$, where $\mathcal{L}_{sup}$ is the cross-entropy supervision loss for labeled samples, $\mathcal{L}_{hard}$ is the cross-entropy loss for hard pseudo-labels, $\mathcal{L}_{soft}$ is the Kullback-Leibler (KL) divergence loss between the soft pseudo-labels and the model output, and $\mathcal{L}_{rec}$ is the reconstruction constraint term; $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the weight coefficients, which are set stage by stage by the semi-supervised training scheduling module during training to balance the contributions of the supervision signals and feature reconstruction at different stages.
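As a rough illustration of how the four loss terms might be combined, the sketch below computes per-sample cross-entropy and KL divergence and forms a weighted sum. The function names are hypothetical, and the choice to leave the supervised term unweighted while weighting the other three is an assumption; the source states only that the total loss has four parts with stage-dependent weight coefficients.

```python
import math


def cross_entropy(probs, label, eps=1e-12):
    """Cross-entropy of a predicted distribution against an integer class label."""
    return -math.log(max(probs[label], eps))


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between a soft pseudo-label p and a model output q."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def total_loss(l_sup, l_hard, l_soft, l_rec, weights):
    """Weighted sum of the four loss terms (assumed form, supervised term unweighted)."""
    w_hard, w_soft, w_rec = weights
    return l_sup + w_hard * l_hard + w_soft * l_soft + w_rec * l_rec
```

In a staged schedule, the `weights` tuple would be replaced at each stage transition rather than held constant.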
[0036] Step 3: Pseudo-Label Generation. In the pseudo-label generation stage, this embodiment first uses the currently trained model to output the class probability distribution $p_i^{model}$ for each unlabeled sample, which serves as the initial model prediction. A K-nearest neighbor graph is then constructed in the encoder's feature space to characterize the local structural relationships between samples, where K is an odd number from 3 to 11, preferably K=5. For each unlabeled sample, the class distributions of its neighboring samples are combined by weighted averaging to obtain the neighborhood class probability distribution $p_i^{nbr}$. To measure the local density of a sample, a density parameter $\rho_i$ is defined as the reciprocal of the sample's average neighborhood distance, so that high-density regions correspond to larger $\rho_i$ values. Combining the model prediction with the neighborhood statistics, a dynamic fusion strategy generates the final soft pseudo-label: $\tilde{p}_i = w(s, \rho_i)\, p_i^{model} + \bigl(1 - w(s, \rho_i)\bigr)\, p_i^{nbr}$, where the weight function $w(s, \rho_i)$ increases monotonically with the training phase $s$ and increases with $\rho_i$ in high-density regions. As a result, the model relies more on the neighborhood distribution in the early stages of training, while its own confident predictions gradually become dominant in later stages. The fused distribution $\tilde{p}_i$ is the soft pseudo-label used in semi-supervised training, and the class probability of its maximum component can be used as the confidence index of the pseudo-label.
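The fusion of model prediction and neighborhood statistics can be sketched in a few lines. This is only an illustration under stated assumptions: the inverse-distance weighting of neighbors, the specific form of `fusion_weight`, the `density_scale` parameter, and the convex-combination fusion are all hypothetical choices that merely satisfy the properties described above (a weight that grows with training stage and with local density).

```python
import math


def knn(features, i, k):
    """Return the k nearest neighbours of sample i as (distance, index) pairs."""
    dists = [(math.dist(features[i], f), j) for j, f in enumerate(features) if j != i]
    return sorted(dists)[:k]


def neighborhood_distribution(probs, neighbors, eps=1e-8):
    """Inverse-distance weighted average of the neighbours' class distributions."""
    weights = [1.0 / (d + eps) for d, _ in neighbors]
    total = sum(weights)
    n_classes = len(probs[0])
    return [sum(w * probs[j][c] for w, (_, j) in zip(weights, neighbors)) / total
            for c in range(n_classes)]


def local_density(neighbors, eps=1e-8):
    """Density as the reciprocal of the mean distance to the neighbours."""
    mean_dist = sum(d for d, _ in neighbors) / len(neighbors)
    return 1.0 / (mean_dist + eps)


def fusion_weight(stage, density, n_stages=3, density_scale=1.0):
    """Hypothetical weight in (0, 1], increasing in both stage and density."""
    stage_term = stage / n_stages
    density_term = density / (density + density_scale)
    return stage_term * (0.5 + 0.5 * density_term)


def soft_pseudo_label(features, probs, i, stage, k=5):
    """Fuse the model prediction for sample i with its neighbourhood distribution."""
    neighbors = knn(features, i, k)
    p_nbr = neighborhood_distribution(probs, neighbors)
    w = fusion_weight(stage, local_density(neighbors))
    fused = [w * pm + (1.0 - w) * pn for pm, pn in zip(probs[i], p_nbr)]
    return fused, max(fused)  # soft pseudo-label and its confidence index
```

Because both inputs are probability distributions and the weights sum to one, the fused vector is itself a valid distribution, so its largest component can serve directly as the confidence index.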
[0037] Step 4: Cascade Correction. The cascaded correction module in this embodiment is mainly used to assess the reliability of pseudo-labels and correct them, as shown in Figure 3. First, the set of pseudo-label confidence scores generated in the previous stage is modeled statistically: a Gaussian mixture model is used for clustering to distinguish high-confidence from low-confidence samples. The number of mixture components is selected automatically by the Bayesian Information Criterion (BIC), and the covariance matrix is regularized to ensure numerical stability. For images judged to be complex samples, Monte Carlo Dropout inference is further performed (T=20 sampling passes), and the variance or entropy of the resulting output probabilities is calculated as the uncertainty measure $u$. When the sample uncertainty satisfies $u \le \tau_{low}(s)$, where $\tau_{low}(s)$ is a stage-dependent threshold, its pseudo-label is treated as a trustworthy hard label and included in training; when $u > \tau_{low}(s)$, temperature scaling is applied to the pseudo-label to obtain a soft label, and KL divergence is used to constrain its consistency with the model's output distribution, thereby suppressing noise propagation while preserving the useful information in the pseudo-label. Through this cascaded cleaning process, the model can dynamically adjust the credibility level of the pseudo-labels, reducing the negative impact of low-quality pseudo-labels on training convergence.
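The second level of the cascade, routing a pseudo-label to hard or soft use according to its Monte Carlo Dropout uncertainty, might look like the sketch below. It assumes the T stochastic forward passes have already been collected as a list of probability vectors, uses the entropy of their mean as the uncertainty measure, and omits the Gaussian mixture gating step. The function names, the default temperature of 2.0, and the threshold values used with it are illustrative assumptions.

```python
import math


def predictive_entropy(mc_probs):
    """Entropy of the mean class distribution over T stochastic forward passes."""
    n_passes = len(mc_probs)
    n_classes = len(mc_probs[0])
    mean = [sum(p[c] for p in mc_probs) / n_passes for c in range(n_classes)]
    return -sum(m * math.log(m) for m in mean if m > 0)


def temperature_scale(probs, temperature):
    """Soften (temperature > 1) a distribution; equals softmax(logits / temperature)."""
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


def route_pseudo_label(pseudo_label, mc_probs, tau_low, temperature=2.0):
    """Return ("hard", class_index, u) or ("soft", softened_distribution, u)."""
    u = predictive_entropy(mc_probs)
    if u <= tau_low:
        # Low uncertainty: trust the pseudo-label as a hard label.
        return "hard", pseudo_label.index(max(pseudo_label)), u
    # High uncertainty: keep the label soft so a KL term can constrain it.
    return "soft", temperature_scale(pseudo_label, temperature), u
```

Hard labels would then feed the cross-entropy term and soft labels the KL divergence term of the training loss.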
[0038] Step 5: Semi-Supervised Training Scheduling and Anti-Curricular Control. The semi-supervised training scheduling and anti-curricular control module, as the core of global optimization, dynamically adjusts the sample ratio, thresholds, loss weights, and augmentation intensity at different training stages. This part is shown in the lower half of Figure 2. Using the training phase *s* as the independent variable, the system updates these parameters at each phase based on the model's convergence state and validation set performance. In the early phase (s=1), an anti-curricular learning strategy is employed: the sampling ratio and loss weights of difficult and low-density samples are increased so that the model learns boundary features first. In the mid phase (s=2), the ratio of easy to difficult samples is gradually balanced, and the augmentation intensity is reduced to stabilize the feature distribution. In the late phase (s=3), the weights of high-confidence and high-density samples are increased to solidify the classification boundary and reduce pseudo-label fluctuations. Meanwhile, the loss weights are increased or decreased from stage to stage according to the schedule, and the temperature parameter follows a preset decreasing function, so that training evolves from robustness-oriented to convergence-oriented. When the validation set F1 score does not improve within a set window, an early stopping mechanism is triggered or the system reverts to the previous best checkpoint to ensure training stability and final model performance.
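A staged schedule and the F1-based stopping rule can be sketched as follows. Every number in `SCHEDULE` is a placeholder chosen only to show the direction of change described above (more hard samples and stronger augmentation early, lower temperature late); the source does not disclose concrete values. `StageConfig` and `EarlyStopper` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    hard_sample_ratio: float  # share of difficult / low-density samples per batch
    augment_strength: float   # relative augmentation intensity
    temperature: float        # temperature used for soft pseudo-labels


# Placeholder values: anti-curricular ordering, hard samples emphasised first.
SCHEDULE = {
    1: StageConfig(hard_sample_ratio=0.7, augment_strength=1.0, temperature=2.0),
    2: StageConfig(hard_sample_ratio=0.5, augment_strength=0.7, temperature=1.5),
    3: StageConfig(hard_sample_ratio=0.3, augment_strength=0.5, temperature=1.0),
}


class EarlyStopper:
    """Track validation F1 and signal a stop after `patience` epochs without gain."""

    def __init__(self, patience):
        self.patience = patience
        self.best_f1 = float("-inf")
        self.best_epoch = None  # epoch whose checkpoint would be restored
        self.bad_epochs = 0

    def update(self, epoch, f1):
        """Record one validation result; return True when training should stop."""
        if f1 > self.best_f1:
            self.best_f1 = f1
            self.best_epoch = epoch
            self.bad_epochs = 0
            return False
        self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

When `update` returns `True`, a training loop could either stop or reload the checkpoint saved at `best_epoch`, matching the two options described above.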
[0039] Step 6: Reasoning and Diagnostic Output. In the inference and diagnostic output phase, the system performs forward inference on the test samples, outputs class probabilities, and applies temperature calibration to make the confidence levels more consistent. Class-specific sensitivity thresholds are then set for the different bone status categories, and the outputs are classified to generate a diagnostic conclusion of normal bone density, osteopenia, or osteoporosis, along with the corresponding confidence information. When the confidence level falls below the set threshold or the uncertainty exceeds the warning value, the system automatically outputs a "Pending Review" prompt for secondary clinical review. At the same time, this module uses the Grad-CAM algorithm to generate an interpretable heatmap for the diagnostic conclusion, as shown in Figure 4. The heatmap highlights the features the model attends to in the vertebral cortex, endplate, and trabecular bone regions, helping physicians understand the model's rationale and improving diagnostic reliability. The final results can be output on the workstation interface as a diagnostic report, confidence score, and visualized heatmap, enabling an intelligent and traceable workflow for assisted bone status diagnosis.
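The decision logic of this step, class-specific thresholds plus a review flag, can be illustrated with a short sketch. The input probabilities are assumed to be already temperature-calibrated. Checking classes from most to least severe, the fallback to the most probable class, and all threshold values are assumptions made for illustration; the source does not specify how the sensitivity thresholds are applied.

```python
CLASSES = ("normal bone density", "osteopenia", "osteoporosis")


def diagnose(probs, uncertainty, class_thresholds=(0.5, 0.4, 0.3),
             min_confidence=0.6, max_uncertainty=0.8):
    """Map calibrated class probabilities to a diagnosis with a review flag."""
    chosen = None
    # Check the most severe class first; its lower threshold makes it more sensitive.
    for idx in reversed(range(len(CLASSES))):
        if probs[idx] >= class_thresholds[idx]:
            chosen = idx
            break
    if chosen is None:
        # No class met its threshold: fall back to the most probable class.
        chosen = max(range(len(CLASSES)), key=lambda c: probs[c])

    confidence = probs[chosen]
    pending_review = confidence < min_confidence or uncertainty > max_uncertainty
    return {
        "diagnosis": CLASSES[chosen],
        "confidence": confidence,
        "pending_review": pending_review,
    }
```

A result with `pending_review` set to `True` corresponds to the "Pending Review" prompt, so the case is passed to a clinician for secondary review.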
[0040] Results Comparison: This embodiment conducted comparative experiments against mainstream semi-supervised models on datasets with different annotation ratios to verify the effectiveness of the proposed cascaded correction semi-supervised diagnostic model under low-annotation conditions. When only 5% of the samples are labeled, the model achieves an accuracy of 76.92% and an F1 score of 71.75%, outperforming other mainstream methods such as Pseudolabel (F1 54.97%) and FixMatch (F1 53.43%). When the annotation ratio increases to 10%, the model's F1 score is 69.96%, and it maintains its advantage over the compared methods. As the labeled sample ratio increases to 15% and 20%, the accuracy reaches 80.21% and 84.61%, respectively, and the F1 score improves to 71.93% and 77.74%, consistently ranking highest among the compared models. The comparative experiments under the different annotation ratios are shown in Tables 1-4.
[0041] To further explore the mechanisms behind the performance improvement, this embodiment outputs a visual comparison of the results, as shown in Figure 4. The system uses Grad-CAM to generate interpretable heatmaps for anteroposterior and lateral lumbar spine images at different annotation ratios (5%, 10%, 15%, and 20%). The warm-toned areas (red and orange regions) in the images represent the feature areas to which the model assigns high attention when predicting the osteoporosis diagnosis. As can be observed in Figure 4, even at an annotation ratio of only 5%, the model is already able to focus its attention on the critical L1 to L4 vertebral body regions, which demonstrates the effectiveness of combining self-supervised pre-training with the semi-supervised cascade correction mechanism for extracting core anatomical features under low-annotation conditions. As the annotation ratio increases to 20%, the model's high-attention areas become more refined and concentrated, and spurious background activation (such as the bright artifacts in the upper left corner of the 5% lateral view) is markedly suppressed. The high-confidence heatmap accurately covers the core regions of the L1 to L4 vertebral bodies, which visually supports the method's feature mining and noise-resistant generalization capabilities under limited data and provides clinicians with visual interpretation evidence for assessing bone status.
[0042] Table 1. Comparative experimental results of different models on a dataset with 5% labeled proportion.
[0043] Table 2. Comparative experimental results of different models on a dataset with 10% labeled proportion.
[0044] Table 3. Comparative experimental results of different models on a dataset with 15% labeled proportion.
[0045] Table 4. Comparative experimental results of different models on a dataset with 20% labeled proportion.
[0046] The overall results show that the bone status diagnosis method based on the cascade correction mechanism proposed in this invention exhibits superior classification accuracy and generalization ability under different annotation ratios. In particular, it can maintain a high level of diagnostic performance even when the number of labeled samples is very small, which verifies the effectiveness and robustness of the cascade correction strategy in semi-supervised osteoporosis diagnosis tasks.
[0047] For the parts of this invention not described in detail, please refer to the prior art or the art known to those skilled in the art. This embodiment does not limit these aspects and will not describe them in detail here.
[0048] The embodiments of the present invention have been described above with reference to the accompanying drawings. However, the present invention is not limited to the specific embodiments described above. The specific embodiments described above are merely illustrative and not restrictive. Those skilled in the art can make many other forms under the guidance of the present invention without departing from the spirit and scope of the claims, and all of these forms are within the protection scope of the present invention.
Claims
1. A cascade-corrected osteoporosis diagnostic system, characterized in that the system includes: a data collection and preprocessing module, configured to provide standardized, high-quality image input, solving problems such as privacy leaks, inconsistent formats, and unbalanced sample distribution in raw images; a feature extraction and feature reconstruction module, configured to achieve effective feature mining and robust representation of lumbar spine X-ray images, solving the problems of weak generalization ability of single feature extraction and insufficient feature learning under limited annotation; a pseudo-label generation module, configured to generate high-quality soft pseudo-labels for unlabeled samples, solving the problem that single pseudo-label prediction in semi-supervised learning is prone to introducing errors, and realizing the fusion optimization of model prediction and neighborhood distribution; a cascaded correction module, configured to purify pseudo-labels step by step, solving the problem of model performance degradation caused by pseudo-label noise propagation, and realizing credibility grading and accurate utilization of pseudo-labels; a semi-supervised training scheduling and anti-curricular control module, configured as a global control unit to address the problem that traditional curriculum learning, which progresses from easy to difficult, is unsuitable for identifying complex osteoporosis samples, thus achieving dynamic optimization of the training process; and a reasoning and diagnostic output module, configured to enable the clinical application of the model, solving the problems of uninterpretable intelligent diagnostic results and high risk of missed osteoporosis diagnosis, while balancing diagnostic accuracy and clinical practicality.
2. The cascade-corrected osteoporosis diagnostic system of claim 1, wherein the data collection and preprocessing module performs the following: Core operations: de-identifying anteroposterior and lateral lumbar spine X-ray images to remove privacy information and mask text overlays; performing stratified sampling by ratio and category to divide the data into training, validation, and test sets; and extracting 5% to 20% of the samples from the training set as the labeled subset, with the rest as the unlabeled subset; Image standardization: adaptive cropping is achieved through coarse spine localization and peak detection, the images are resized to a uniform pixel size, and intensity normalization and noise suppression are performed; and Data augmentation and distribution constraints: geometric and illumination augmentation methods are used to expand the samples, and upsampling and downsampling constraints are applied to the training set so that the distribution of the augmented samples approximates the original population distribution.
3. The cascade-corrected osteoporosis diagnostic system of claim 1, wherein the feature extraction and feature reconstruction module includes: Self-supervised pre-training: a contrastive learning method is used to pre-train a lightweight convolutional encoder to learn semantically sensitive general representations, which serve as initial weights for downstream tasks; and Feature reconstruction joint optimization: an autoencoder branch is connected in parallel to the encoder representation layer to perform reconstruction on the intermediate feature vectors, and the model is jointly optimized by a weighted sum of the reconstruction loss and the task loss.
4. The cascade-corrected osteoporosis diagnostic system of claim 1, wherein the pseudo-label generation module includes the following: Dual distribution acquisition: the class probability distribution of unlabeled samples is obtained from the output of the current model, and a K-nearest neighbor graph is constructed in the feature space to obtain the neighborhood class probability distribution; Local density measure: the local density ρ of a sample is characterized by the inverse of the average neighborhood distance or by the neighborhood entropy, to distinguish between high-density and low-density samples; and Adaptive fusion to generate soft pseudo-labels: the dual distributions are fused through a dynamic weight function that increases monotonically with the training phase and local density, and the confidence of the pseudo-labels is output at the same time.
5. The cascade-corrected osteoporosis diagnostic system of claim 1, wherein the cascaded correction module includes: Gaussian mixture model clustering: a Gaussian mixture model is fitted to the set of pseudo-label confidence scores, the number of components is selected by the Bayesian information criterion, and covariance regularization is applied, in order to identify complex or low-confidence samples; Uncertainty measure: for complex samples, Monte Carlo Dropout is performed, and the variance or entropy of the T classification probabilities is used as the uncertainty measure u; and Hierarchical utilization of hard and soft labels: a staged threshold τlow(s) is set; when u≤τlow(s), hard pseudo-labels are used for training; when u>τlow(s), temperature scaling is applied to the pseudo-labels to obtain soft labels, and noise propagation is suppressed by a KL divergence constraint.
6. The cascade-corrected osteoporosis diagnostic system of claim 1, wherein the semi-supervised training scheduling and anti-curricular control module includes the following: Staged parameter scheduling: using the training stage s as the independent variable, the sample ratio, threshold, loss weight, and data augmentation intensity are dynamically adjusted; Anti-curricular learning strategy: in the early stage, the sampling ratio and loss weight of complex and low-density samples are increased to focus on learning boundary features; in the mid-to-late stages, the weight of high-confidence and high-density samples is increased and the augmentation intensity is reduced to solidify the classification boundary; and Training convergence control: when the F1 value on the validation set does not improve within a preset window, an early stopping mechanism is triggered or the system reverts to the previous stable checkpoint to avoid overfitting.
7. The cascade-corrected osteoporosis diagnostic system of claim 1, wherein the reasoning and diagnostic output module includes: Probability calibration and category determination: temperature calibration is performed on the model output probabilities, a category sensitivity threshold is set, and a diagnostic conclusion of normal bone density, osteopenia, or osteoporosis is generated; Interpretable output: interpretable heatmaps are generated using gradient-weighted class activation mapping, highlighting the regions the model attends to, including the vertebral cortex and trabeculae; and Clinical review prompt: when the diagnostic confidence is below the threshold or the uncertainty is above the warning value, the case is automatically marked as pending review and a prompt is output for clinical review.
8. A cascade-corrected osteoporosis diagnostic method, characterized in that the method is implemented using the cascade-corrected osteoporosis diagnostic system according to any one of claims 1-7 and includes the following steps: S1. Image data acquisition and standardized preprocessing; S2. Image feature representation learning; S3. Soft pseudo-label generation for unlabeled samples; S4. Cascade correction and purification of pseudo-labels; S5. Anti-curricular semi-supervised dynamic training; S6. Reasoning, diagnosis, and clinical output.
9. The cascade-corrected osteoporosis diagnostic method according to claim 8, wherein step S3 includes: a K-nearest neighbor graph is constructed in the feature space to obtain the neighborhood distribution, and a weight function that changes dynamically with the training stage and local density is introduced to achieve adaptive fusion of the model prediction distribution and the neighborhood distribution; at the same time, the local density is used to distinguish the characteristics of the samples, so that the pseudo-labels better match the distribution of the data itself, solving the problem of insufficient mining of unlabeled sample information under limited labeling; and step S4 includes: by combining a Gaussian mixture model with Monte Carlo Dropout, a two-level correction of pseudo-labels is achieved: first, the Gaussian mixture model clusters the confidence scores to identify complex samples, and then Monte Carlo Dropout quantifies sample uncertainty, enabling the hierarchical utilization of soft and hard labels; at the same time, temperature scaling and KL divergence are combined to constrain the noise propagation of high-uncertainty samples, solving the problem of model performance degradation caused by low pseudo-label quality in semi-supervised learning.
10. The cascade-corrected osteoporosis diagnostic method of claim 8, wherein step S5 includes: full-parameter dynamic scheduling with the training phase as the independent variable: in the early stage, the learning weights of complex and low-density samples are strengthened, and in the later stage, the classification boundary is consolidated using high-confidence and high-density samples; at the same time, the sample ratio, loss weight, augmentation intensity, and threshold are adjusted in stages, and an early stopping or rollback mechanism is provided, solving the problems of poor recognition and difficult training convergence for rare but critical low-bone-density cases. Based on the self-supervised pre-trained encoder, a parallel autoencoder branch is used to reconstruct features. The reconstruction loss, supervision loss, and pseudo-label loss are fused into a total loss function. By using multi-loss weighted joint optimization to constrain model learning, the model can learn more robust and semantically meaningful lumbar X-ray image features under limited annotation, thus solving the problem of weak generalization ability of single feature extraction.