Low back pain anomaly analysis method based on weighted graph convolution network
By using a neighborhood repartition-based oversampling method and a weighted graph convolutional network, combined with a self-attention mechanism and a feature importance mask, the problems of sample imbalance and overfitting in LBP diagnosis are solved, achieving more accurate and stable diagnosis, which is suitable for auxiliary diagnosis of lower back pain in remote areas.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHANGZHOU UNIV
- Filing Date
- 2023-02-21
- Publication Date
- 2026-06-23
AI Technical Summary
Existing technologies neglect the relationships between patients in the diagnosis of lower back pain (LBP) and do not address the imbalanced distribution of sample categories, which affects diagnostic accuracy. Furthermore, traditional methods are prone to overfitting and make it difficult to develop personalized treatment plans.
A neighborhood-based oversampling method is used to balance the dataset. A weighted graph convolutional network is constructed, and potential relationships between patients are mined through edge connections. By combining a self-attention mechanism and a feature importance mask, the importance of biological attributes is dynamically captured, thereby achieving feature sparsity to avoid overfitting.
It improves the accuracy and stability of LBP diagnosis, reduces the false negative rate, enables the development of personalized treatment plans, and is suitable for auxiliary diagnosis and early prevention in remote areas.
Figure CN116153511B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of neural network technology, and in particular to a method for analyzing lower back pain abnormalities based on weighted graph convolutional networks.
Background Technology
[0002] Lower back pain (LBP) is currently the leading cause of disability worldwide and imposes a significant socioeconomic burden. Detecting LBP in its early stages can reduce the need for surgery and improve cure rates with minimal medication and appropriate physical therapy. Furthermore, in remote and relatively underdeveloped areas, where people often struggle to access timely, systematic specialist care at large hospitals, an auxiliary diagnostic and early prevention technique is of paramount importance.
[0003] Some research has applied artificial intelligence to the diagnosis of LBP, with most work exploring the impact of MRI and CT images on LBP. Sagittal parameters of the spine are important indicators for diagnosing spinal abnormalities, and many spinal diseases are directly related to these biomechanical properties, such as pelvic tilt, pelvic incidence, and sacral tilt. Therefore, this invention focuses on exploring the potential relationship between sagittal parameters of the spine and LBP. Currently, research on LBP diagnosis based on biomechanical characteristics is relatively limited. The common approach selects features according to certain principles and then trains traditional machine learning models to achieve good classification accuracy. This approach implicitly assumes that the samples are independent and ignores the correlation between patients. However, considering the relationships between patients is beneficial, as it helps in analyzing and studying similar patient groups. In clinical diagnosis, doctors also extract similar attributes from previous patients of the same type and apply them to the diagnosis and treatment of new patients.
[0004] Unlike popular neural network architectures, Graph Convolutional Networks (GCNs) offer a natural way to represent interactions between groups. Patients can be treated as nodes and the relationships between patients as edges to construct a graph network, so that structural information between patient samples is captured explicitly through edge connections. Patients are then clustered based on certain similarities, and patient-specific characteristics are learned for personalized diagnosis. Graph convolutional networks are an efficient method for performing convolution operations on graph structures and achieve good performance in various graph tasks. GCN applications in medical diagnosis primarily focus on diseases such as Alzheimer's and Parkinson's, with limited research on LBP. Existing research focuses on introducing attention mechanisms to strengthen edge connections between patients and on multimodal data fusion, with limited analysis of the inherent properties of medical datasets. Limited sample size and imbalanced class distribution are common problems in medical datasets. When the class distribution is skewed, graph classifiers tend to favor the majority class while ignoring samples in the minority classes; insufficient sample size can lead to overfitting of the trained model. In node classification tasks using graph convolutional networks, it is also necessary to consider the internal and external influences of each node: the internal influences correspond to its own attributes, while the external influences correspond to aggregated attributes. When investigating the priority of the internal attributes of nodes in each category, the attributes can be ranked by their relevance to the current task; the most important attributes are selected and sent along the graph edges, while less important attributes are suppressed in the message passing mechanism.
Summary of the Invention
[0005] To address the shortcomings of existing algorithms, this invention first employs a neighborhood repartition-based oversampling method to balance the LBP dataset, reducing the impact of uneven sample class distribution on prediction results. Secondly, it represents the physiological attributes of spinal abnormalities as a weighted feature graph, mining potential relationships between patients through edge connections and dynamically capturing the importance of biological attributes for different patients through feature weighting within nodes, which is beneficial for developing personalized treatment plans. Finally, it calculates feature importance based on feature information content and relevance, and uses this as a guide to generate feature masks, achieving feature sparsity while strengthening important features, avoiding overfitting in small sample learning, and making diagnostic results more stable. This invention focuses on information mining of internal node attributes rather than external attributes.
[0006] The technical solution adopted in this invention is: a method for analyzing lower back pain abnormalities based on weighted graph convolutional networks, comprising the following steps:
[0007] Step 1: Data preprocessing;
[0008] Furthermore, step one specifically includes:
[0009] Step S11: Analyze the distribution characteristics of samples in the dataset using box plots, and use the interquartile range (IQR) of the box plots to detect outliers;
[0010] Furthermore, the formula for outlier detection using the interquartile range (IQR) of box plots is as follows:
[0011]
[0012] Here, Upperfence is the upper limit, above which points are treated as outliers; Lowerfence is the lower limit, below which points are treated as outliers; QL and QU are the first and third quartiles, respectively.
[0013] Step S12: Based on the characteristics of the dataset, form a circular region around each minority class point with the distance to its third nearest neighbor as the radius, and push the majority class samples in the circular region out of the circular region by translation; determine the number of new samples to be generated according to the radius of each minority class point; randomly and iteratively generate new samples in the circular region, and calculate the cumulative proximity between the current minority class point and the other observations in the neighborhood using the Gaussian radial basis function; re-divide the neighborhood using formula (3) to select a new oversampling region, defined as the R function:
[0014]
[0015] Here, n represents the scope of the re-divided neighborhood, R represents the oversampled region after the division, φ is the value of each newly generated sample, and U is the set of these values.
[0016] Step 2: Construct a learning model based on graph networks and graph convolutions;
[0017] Furthermore, constructing a graph network includes:
[0018] The adjacency matrix A of graph G is constructed using the k-nearest neighbor algorithm, and the correlation between patient sample nodes is obtained by calculating the Euclidean distance between the features of patient sample nodes.
[0019] The distances between node i and all other nodes are sorted to obtain the adjacency matrix A, as shown in the formula:
[0020] Here, $C_i$ represents the set of nodes that are connected to node i by an edge.
[0021] Furthermore, constructing graph convolutions includes:
[0022] The constructed feature matrix X is spectrally convolved with the adjacency matrix A to obtain new node representations;
[0023] The formula for spectral convolution is:
[0024]
[0025] Here, $H^{(l)}$ represents the node features of the l-th layer; $H^{(l+1)}$ represents the updated node features of layer l+1; $W^{(l)}$ is the trainable weight matrix of each layer; and the remaining matrix in the formula is the normalized adjacency matrix.
[0026] Step 3: Add a feature weighting mechanism based on self-attention;
[0027] Furthermore, specifically including:
[0028] The weight coefficients are learned using a self-attention mechanism to obtain the relationship between different input features of the patient, thereby weighting the node representation.
[0029] For patient node i, the node embedding is $X_i \in \mathbb{R}^{1\times D}$. The self-attention score A represents the contribution of each feature value of the node embedding $X_i$ to the GCN network, and is expressed by formula (8):
[0030]
[0031] Here, $X_i^{T}$ is the transpose of $X_i$; $W_u$ and $W_v$ are the weights of the self-attention module; and softmax(·) is the normalized exponential function;
[0032] The representation of node i encoded by the self-attention mechanism is given by formula (9).
[0033] Step 4: Introduce a feature importance mask and evaluate its impact on classification performance; the feature importance mask selects features with special attributes for model adjustment. The feature mask retains the top m features by importance and sets the positions of the other features to zero.
[0034] Furthermore, specifically including:
[0035] Step S41: Calculate the information gain that adding feature V brings to H(U);
[0036] Furthermore, the information gain formula is:
[0037] IG(U,V)=H(U)-H(U|V) (10)
[0038] Here, H(U) represents the information entropy of set U, and H(U|V) represents the uncertainty of sample set U given set V.
[0039] Step S42: Calculate the information gain between the class labels $Y_{train}$ of the training set samples and the i-th column (i∈[1,D]) of the node feature matrix;
[0040] Step S43: Calculate the Pearson correlation coefficient between the i-th feature of the training set and the class labels $Y_{train}$ to measure the independence of the feature; the formula for the Pearson correlation coefficient is:
[0041]
[0042] Here, k∈[1,n] indexes the n samples of the training set; $Y_{train,k}$ and the corresponding feature value are the values of the k-th sample in $Y_{train}$ and in the i-th feature column, respectively; and the two mean terms are the means of $Y_{train}$ and of the i-th feature column;
[0043] Step S44: Define the importance score of the i-th feature as a weighted average of the feature's discriminability and independence;
[0044] Furthermore, the weighted average formula is:
[0045]
[0046] Here, α is an adjustable parameter. The formula combines the information gain between the sample class labels $Y_{train}$ and the i-th column (i∈[1,D]) of the node feature matrix with the Pearson correlation coefficient between the i-th feature and the class labels $Y_{train}$.
[0047] Step S45: Take the features with the m largest importance scores as representative features and assign the value 1 to their positions in the Mask matrix to obtain the feature mask. The final updated sparse feature matrix is then obtained.
[0048] Furthermore, the formula for the sparse feature matrix is:
[0049]
[0050] Here, ⊙ is the Hadamard product of matrices, $M_k$ is the feature mask, and the other operand is the representation of node i encoded by the self-attention mechanism.
[0051] The beneficial effects of this invention are:
[0052] 1. An oversampling method based on neighborhood repartitioning is used to equalize the LBP dataset, reducing the impact of uneven distribution of sample classes on the prediction results;
[0053] 2. Representing the abnormal physiological attributes of the spine as a weighted feature graph, we can mine the potential relationships between patients by connecting edges, and dynamically capture the importance of biological attributes to different patients by weighting the internal features of nodes, which is conducive to developing personalized treatment plans.
[0054] 3. Calculate the importance of features based on their information content and relevance, and use this as a guide to generate feature masks. This achieves feature sparsity while strengthening important features, avoids overfitting in small sample learning, makes the diagnostic results more stable, and reduces the LBP false negative rate.
Attached Figure Description
[0055] Figure 1 shows the model framework based on weighted graph convolutional networks;
[0056] Figure 2 shows the comparison results of data cleaning;
[0057] Figure 3 shows the impact of data balancing;
[0058] Figure 4 shows the feature weighting mechanism based on self-attention;
[0059] Figure 5 shows the feature sparsification process based on feature importance masks.
Detailed Implementation
[0060] The present invention will be further described below with reference to the accompanying drawings and embodiments. The drawings are simplified schematic diagrams, which only illustrate the basic structure of the present invention in a schematic manner, and therefore only show the components related to the present invention.
[0061] As shown in Figure 1, a method for analyzing lower back pain anomalies based on weighted graph convolutional networks includes the following steps:
[0062] Step 1: Data preprocessing. The performance of the model of this invention is evaluated on publicly available data from Kaggle.
[0063] Data cleaning: Each patient in the dataset is described by 12 features, which may contain outliers introduced when doctors measure and record them. First, the distribution characteristics of the samples in the dataset are analyzed with box plots. Outliers are points that lie far from the majority of points of the same feature attribute, and cleaning them is very important. Outliers are detected using the interquartile range (IQR) of the box plot according to the rule in formula (1), where QL and QU are the first and third quartiles, respectively: points above the upper limit or below the lower limit are considered outliers. Directly deleting the corresponding samples is not a suitable choice, so each outlier is replaced with the mean of its attribute.
[0064]
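The cleaning rule described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `replace_outliers_with_mean` is hypothetical, the fence multiplier 1.5 is the conventional box-plot value and is an assumption (the text does not state it), and the attribute mean is computed over all values including the outliers, which is also an assumption.

```python
import numpy as np

def replace_outliers_with_mean(X, whisker=1.5):
    """Replace per-feature IQR outliers with the mean of that feature.

    X is an (N, D) array of N patients and D attributes.
    whisker=1.5 is the conventional box-plot factor (assumed, not from the source).
    """
    X = np.asarray(X, dtype=float)
    q_l = np.percentile(X, 25, axis=0)    # first quartile QL
    q_u = np.percentile(X, 75, axis=0)    # third quartile QU
    iqr = q_u - q_l                       # interquartile range
    lower = q_l - whisker * iqr           # Lowerfence
    upper = q_u + whisker * iqr           # Upperfence
    outlier = (X < lower) | (X > upper)   # boolean outlier map, shape (N, D)
    col_mean = X.mean(axis=0)             # attribute mean (outliers included)
    cleaned = np.where(outlier, col_mean, X)
    return cleaned, outlier
```

Replacing values rather than deleting rows keeps the sample count unchanged, which matters for a dataset of only a few hundred patients.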
[0065] Data balancing: First, based on the characteristics of the dataset, a circular region is formed around each minority class point, with the distance to its third nearest neighbor as the radius. The majority class samples within this circular region are then shifted out of it, creating a clean circular region for subsequent oversampling. Second, the number of new samples to be generated for each minority class point is determined based on its radius. Then, new samples are randomly and iteratively generated within the circular region, and the cumulative proximity between the current minority class point and the other observations in its neighborhood is calculated using the Gaussian radial basis function.
[0066]
[0067] Here, $x_0$ represents the current minority class point, X represents the set of minority class points in the neighborhood of $x_0$, Euclidean distance is used as the distance metric, and $\varphi(x_0, X)$ represents the cumulative proximity of $x_0$ to the observation set X. The value of each newly generated sample is φ, and the set of these values is U. Finally, the circular region is divided using the set U, and formula (3) is used to re-divide the neighborhood and select new oversampling regions, defined as the R function:
[0068]
[0069] Here, n represents the extent of the re-divided domain, and R represents the oversampled region after the partitioning. The purpose of this behavior is to generate samples closer to the original minority class, which helps improve the classifier's prediction performance for the minority class without reducing its prediction performance for the majority class.
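As an illustration of the cumulative proximity used in the balancing step, the following Python sketch sums Gaussian radial basis function values between a point and its neighboring observations. The function name `cumulative_proximity`, the kernel width `gamma`, and the kernel form exp(-(d/gamma)^2) are assumptions; the text only states that a Gaussian radial basis function with Euclidean distance is used.

```python
import numpy as np

def cumulative_proximity(x0, X, gamma=1.0):
    """Sum of Gaussian RBF values between point x0 and every row of X.

    gamma (kernel width) and the exact kernel form are assumptions.
    """
    x0 = np.asarray(x0, dtype=float)
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X - x0, axis=1)   # Euclidean distance to each observation
    return float(np.sum(np.exp(-(dist / gamma) ** 2)))
```

Under this form, a candidate sample that lies close to existing minority points receives a larger value than one that lies far away, which is consistent with the stated aim of generating samples closer to the original minority class.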
[0070] For this dataset, outliers are filtered out based on the box plot detection results, as shown in Figure 2. Then, based on the data imbalance ratio, oversampling through neighborhood repartitioning is used for balancing to suppress noise and improve performance, as shown in Figure 3. Next, the dataset is divided into a training set and a test set: 30% of the samples are randomly selected as the test set, and the remaining 70% are used to train the network. The network is run 10 times using the same parameters, and the average results are reported.
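The evaluation protocol in this paragraph (random 70/30 hold-out, 10 runs, averaged results) can be sketched as below. The helper names `holdout_split` and `mean_and_std` and the use of a per-run seed are illustrative assumptions; the model training itself is omitted.

```python
import numpy as np

def holdout_split(n_samples, test_ratio=0.3, seed=0):
    """Return (train_idx, test_idx) for a random hold-out split."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)            # shuffled sample indices
    n_test = int(round(n_samples * test_ratio))  # size of the test set
    return perm[n_test:], perm[:n_test]

def mean_and_std(scores):
    """Average accuracy and its standard deviation over repeated runs."""
    scores = np.asarray(scores, dtype=float)
    return float(scores.mean()), float(scores.std())
```

In use, one would call `holdout_split(n, seed=run)` for `run` in `range(10)`, train and score the model on each split, and pass the ten accuracies to `mean_and_std`.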
[0071] Step 2: Learning model based on graph convolutional networks. The learning model of graph convolutional networks consists of two parts.
[0072] Graph network definition: Define an undirected graph G = (A, X), where $A \in \mathbb{R}^{N\times N}$ is the adjacency matrix, representing whether there are edge connections between the N nodes, and $X \in \mathbb{R}^{N\times D}$ is the feature matrix of the nodes, where D is the feature dimension. The dataset contains 12 physiological attribute features of 310 patients, so each patient sample can be regarded as an entity node and its 12 attributes as the node features X, i.e., N=310, D=12. Edges define the interaction between two nodes: the closer two nodes are, the stronger their interaction and the more likely an edge connects them.
[0073] The graph network construction part of the learning model: The adjacency matrix A of graph G is constructed using the k-nearest neighbor algorithm. The Euclidean distance $Dis(X_i, X_j)$ between the features of patient sample nodes is calculated using formula (4) to obtain their correlation, where $X_i, X_j \in \mathbb{R}^{1\times D}$.
[0074]
[0075] Then, the k-nearest neighbor algorithm sorts the distances Dis between node i and all other nodes and selects the K nearest nodes as neighbor nodes, as shown in formula (5), where $C_i$ represents the set of nodes connected to node i by an edge, sort(·) is the ascending sorting operator applied to the set of feature distances between node i and all other nodes, $p_i$ is the sorted distance vector, and $(\cdot)_k$ denotes the K minimum values selected by the k-nearest neighbor algorithm.
[0076]
[0077] Therefore, the adjacency matrix A can be expressed as shown in formula (6).
[0078]
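The graph construction described above (pairwise Euclidean distances, ascending sort, K nearest neighbors, adjacency matrix) can be sketched as follows. The function name `knn_adjacency` is hypothetical, and two details are assumptions not stated in the text: entries of A are binary, and A is symmetrized so that the graph is undirected.

```python
import numpy as np

def knn_adjacency(X, k):
    """Build a binary k-nearest-neighbor adjacency matrix from node features X (N, D)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # Dis(X_i, X_j) for all pairs
    np.fill_diagonal(dist, np.inf)             # a node is not its own neighbor
    order = np.argsort(dist, axis=1)           # ascending sort per node
    neighbors = order[:, :k]                   # C_i: the k nearest nodes of node i
    A = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    A[rows, neighbors.ravel()] = 1.0           # edge i -> j for every j in C_i
    return np.maximum(A, A.T)                  # symmetrize (assumed) for an undirected graph
```

For the dataset described here, `X` would have shape (310, 12); the choice of `k` is a hyperparameter that the text does not fix.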
[0079] The graph convolution part of the learning model: Substitute the constructed feature matrix X and the adjacency matrix A into formula (7) to perform spectral convolution operation to obtain the new node representation.
[0080]
[0081] Here, $H^{(l)}$ represents the node features of the l-th layer and $H^{(l+1)}$ represents the updated node features of the (l+1)-th layer, where the input layer node feature is $H^{(0)} = X$; $W^{(l)}$ is the trainable weight matrix of each layer. The formula uses a normalized adjacency matrix, designed to enable nodes to learn more of their own features during graph convolution, together with a normalized degree matrix. Through the graph convolution operation, the features of each node are weighted with the features of its neighboring nodes and then propagated to the next layer. As the number of convolutional layers increases, each node can aggregate the features of more distant neighbors, resulting in a larger receptive field. However, stacking too many layers also leads to over-smoothing, and the gradient may vanish during backpropagation.
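A single graph convolution step of this kind can be sketched with the widely used propagation rule H^(l+1) = ReLU(Â H^(l) W^(l)), where Â is the adjacency matrix with self-loops, symmetrically normalized by the degree matrix. Whether formula (7) has exactly this form is an assumption; in particular, the ReLU activation and the names `normalize_adjacency` and `gcn_layer` are not taken from the text.

```python
import numpy as np

def normalize_adjacency(A):
    """Add self-loops and apply symmetric degree normalization."""
    A_tilde = A + np.eye(A.shape[0])     # self-loops let a node keep its own features
    deg = A_tilde.sum(axis=1)            # node degrees including the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_hat, H, W):
    """One propagation step: aggregate neighbor features, project, apply ReLU (assumed)."""
    return np.maximum(A_hat @ H @ W, 0.0)
```

Stacking two such layers lets each node see its two-hop neighborhood, which is the growth in receptive field that the paragraph describes.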
[0082] Step 3: Add a feature weighting mechanism based on self-attention, as shown in Figure 4; the evaluation compares the results obtained with this mechanism.
[0083] An internal priority assessment is performed on the 12 feature values of a single node. The main purpose of using a self-attention mechanism is to learn weight coefficients and obtain the relationship between different input features of the patient, thereby weighting the node representation. For patient node i, the node embedding is $X_i \in \mathbb{R}^{1\times D}$. The self-attention score A represents the contribution of each feature value in the node embedding $X_i$ to the GCN network, and can be expressed by formula (8).
[0084]
[0085] Here, $X_i^{T}$ is the transpose of $X_i$; $W_u$ and $W_v$ are the weights of the self-attention module; and softmax(·) is applied to ensure that all weights sum to 1. The final representation of node i encoded by the self-attention mechanism can be expressed by formula (9).
[0086]
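One way such a feature-level self-attention could look is sketched below. The text names only the transpose of X_i, the weights W_u and W_v, and softmax; the tanh nonlinearity, the weight shapes (d × D and D × d with a hypothetical hidden size d), and the element-wise reweighting of X_i by the scores are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_weighted_embedding(x_i, W_u, W_v):
    """Score each of the D features of one node and reweight the embedding.

    x_i: (D,) node embedding; W_u: (d, D); W_v: (D, d). Shapes and tanh are assumed.
    """
    hidden = np.tanh(W_u @ x_i)        # project the transposed embedding
    scores = softmax(W_v @ hidden)     # one attention score per feature, summing to 1
    return scores, scores * x_i        # weighted node representation
```

Because the scores depend on the node's own embedding, two patients with different attribute values receive different feature weightings, which is the per-patient importance that the description refers to.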
[0087] Step 4, as shown in Figure 5: Introduce a feature importance mask and evaluate its impact on classification performance. The feature importance mask selects features with special attributes for model adjustment; the feature mask retains the top m features by importance and sets the positions of the other features to zero.
[0088] Ideally, each node would have its own attention score to quantify the impact of node attributes on the prediction result, but this would also lead to a heavy computational burden. In practice, only relatively important attributes can be propagated along the edges of the graph, while less important attributes are suppressed in the information propagation mechanism. Therefore, this invention also generates a mask with special values (0 / 1) of the same shape as the node feature matrix to record the locations of locally representative features. This reduces computational overhead and improves network performance by sparsifying the weighted feature matrix. A value of 1 in the mask represents a feature at that location that is a representative attribute of the node, while a value of 0 indicates that the corresponding value at that location may not be representative of the category.
[0089] Information gain is defined as shown in formula (10), where H(U) represents the information entropy of set U, and H(U|V) represents the uncertainty of sample set U given set V. The information gain of adding feature V to H(U) is calculated using formula (10). The larger the information gain of a feature, the greater its ability to reduce system uncertainty, and the more important it is in subsequent diagnostic tasks.
[0090] IG(U,V)=H(U)-H(U|V) (10)
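Formula (10) can be computed directly from label counts. The sketch below assumes that a continuous feature is first discretized into equal-width bins before the conditional entropy is evaluated; the binning scheme, the `bins` parameter, and the function names are assumptions, since the text does not say how continuous attributes are handled.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H(U) of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(y, feature, bins=4):
    """IG(U, V) = H(U) - H(U|V) with V discretized into equal-width bins (assumed)."""
    y = np.asarray(y)
    feature = np.asarray(feature, dtype=float)
    edges = np.histogram_bin_edges(feature, bins=bins)
    binned = np.digitize(feature, edges[1:-1])      # bin index in 0..bins-1
    h_cond = 0.0
    for b in np.unique(binned):
        in_bin = binned == b
        h_cond += in_bin.mean() * entropy(y[in_bin])  # weighted entropy within the bin
    return entropy(y) - h_cond
```

A feature whose bins separate the classes perfectly yields a gain equal to H(U), while a feature that is independent of the labels yields a gain near zero.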
[0091] The discriminative power of each feature is measured by the information gain between the class labels $Y_{train}$ of the training set samples and the i-th column (i∈[1,D]) of the node feature matrix, as shown in formula (11), which gives the information gain that introducing the i-th feature brings to the classification result. Since the labels of the test set are unknown, only the training set samples are used in the calculation.
[0092]
[0093] The independence of features can be measured by the Pearson correlation coefficient. The Pearson correlation coefficient between the i-th feature of the training set and the class labels $Y_{train}$ is shown in formula (12), where k∈[1,n] indexes the n training set samples, $Y_{train,k}$ and the corresponding feature value are the values of the k-th sample in $Y_{train}$ and in the i-th feature column, respectively, and the two mean terms are the means of $Y_{train}$ and of the i-th feature column.
[0094]
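The Pearson correlation coefficient between one feature column and the class labels follows its standard definition (covariance divided by the product of standard deviations), sketched below. The function name `pearson` is illustrative, and numeric class labels are assumed.

```python
import numpy as np

def pearson(feature, y):
    """Pearson correlation coefficient between a feature column and numeric labels."""
    f = np.asarray(feature, dtype=float)
    y = np.asarray(y, dtype=float)
    f_c = f - f.mean()                   # center the feature values
    y_c = y - y.mean()                   # center the labels
    denom = np.sqrt((f_c ** 2).sum() * (y_c ** 2).sum())
    return float((f_c * y_c).sum() / denom)
```

The result lies in [-1, 1]; the function is undefined (division by zero) when either input is constant, a case the caller would need to handle.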
[0095] The importance score of the i-th feature is defined as a weighted average of the discriminability and independence of the feature, as shown in formula (13):
[0096]
[0097] Here, α is an adjustable parameter used to control the relative importance of feature discriminability and independence. The larger the score, the more important the i-th feature. The scores are sorted from largest to smallest, the features i corresponding to the m largest scores are taken as representative features, and their positions are assigned the value 1 in the Mask matrix while all other positions are assigned 0, yielding the feature mask $M_k$. The value of m∈[1,D] can be adjusted according to the feature distribution of the training set samples. The final updated sparse feature matrix can be expressed as formula (14):
[0098]
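The scoring and masking steps can be sketched as below. The combination α·IG + (1 − α)·|ρ| is an assumed reading of the weighted average in formula (13); the use of the absolute correlation, the sharing of one mask row across all nodes, and the names `feature_mask` and `sparsify` are likewise assumptions for illustration.

```python
import numpy as np

def feature_mask(ig, rho, alpha, m, n_nodes):
    """Build a 0/1 mask keeping the m features with the highest importance score.

    ig, rho: (D,) information gains and Pearson coefficients per feature.
    The score alpha*ig + (1-alpha)*|rho| is an assumed form of the weighted average.
    """
    ig = np.asarray(ig, dtype=float)
    rho = np.asarray(rho, dtype=float)
    score = alpha * ig + (1.0 - alpha) * np.abs(rho)
    top = np.argsort(-score)[:m]               # indices of the m largest scores
    mask_row = np.zeros_like(score)
    mask_row[top] = 1.0                        # 1 marks a representative feature
    return np.tile(mask_row, (n_nodes, 1)), score

def sparsify(X_weighted, M):
    """Hadamard (element-wise) product of the mask and the weighted feature matrix."""
    return M * X_weighted
```

Because the mask is computed from training labels only, the same mask can be applied unchanged to the test rows.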
[0099] Here, ⊙ is the Hadamard product, i.e., element-wise multiplication of matrices; the test data and training data share the same feature importance mask.
[0100] Finally, the proposed method is compared with existing methods; Table 1 shows the comparison results of this invention with other commonly used machine learning methods. Validated with the hold-out method using the same parameters on the Kaggle public dataset, it achieved the highest accuracy and the strongest stability (smallest standard deviation) over 10 randomized experiments.
[0101] Table 1 Comparison with commonly used machine learning methods
[0102]
| Method | Accuracy ± standard deviation |
| --- | --- |
| KNN | 81.83% ± 3.7 |
| SVM | 84.41% ± 3.1 |
| Bagging SVM | 85.38% ± 3.0 |
| MLP | 86.34% ± 4.6 |
| This invention | 90.20% ± 1.3 |
[0103] Based on the above-described preferred embodiments of the present invention, and through the foregoing description, those skilled in the art can make various changes and modifications without departing from the inventive concept. The technical scope of this invention is not limited to the contents of the specification, but must be determined according to the scope of the claims.
Claims
1. A method for analyzing lower back pain abnormalities based on weighted graph convolutional networks, characterized in that it includes the following steps:
Step 1: Data preprocessing. Step 1 specifically includes:
Step S11: Analyze the distribution characteristics of samples in the dataset using box plots, and use the interquartile range (IQR) of the box plots to detect outliers;
Step S12: Form a circular region around each minority class point with the distance to its third nearest neighbor as the radius, and push the majority class samples within the circular region out of the circular region by translation; determine the number of new samples to be generated based on the radius of each minority class point; randomly and iteratively generate new samples within the circular region, and calculate the cumulative proximity between the current minority class point and the other observations in the neighborhood using the Gaussian radial basis function; select a new oversampling region by re-dividing the neighborhood, defined as the R function, where n represents the scope of the re-divided neighborhood, R represents the oversampled region after the division, φ is the value of each newly generated sample, and U is the set of these values;
Step 2: Construct a learning model based on graph networks and graph convolutions.
Constructing the graph network includes: the adjacency matrix A of graph G is constructed using the k-nearest neighbor algorithm, and the correlation between patient sample nodes is obtained by calculating the Euclidean distance between the features of the patient sample nodes; the distances between node i and all other nodes are sorted to obtain the adjacency matrix A;
Constructing the graph convolution includes: the constructed feature matrix X is spectrally convolved with the adjacency matrix A to obtain new node representations; in the formula for spectral convolution, $H^{(l)}$ represents the node features of the l-th layer, $H^{(l+1)}$ represents the updated node features of layer l+1, $W^{(l)}$ is the trainable weight matrix of each layer, and the remaining matrix is the normalized adjacency matrix;
Step 3: Add a feature weighting mechanism based on self-attention. Step 3 specifically includes: the weight coefficients are learned using a self-attention mechanism to obtain the relationship between different input features of the patient, thereby weighting the node representation; for patient node i, the node embedding is $X_i \in \mathbb{R}^{1\times D}$, and the self-attention score A represents the contribution of each feature value of the node embedding $X_i$ to the GCN network; in the formula for A, $X_i^{T}$ is the transpose of $X_i$, $W_u$ and $W_v$ are the weights of the self-attention module, and softmax(·) is the normalized exponential function; the representation of node i encoded by the self-attention mechanism is then obtained;
Step 4: Introduce a feature importance mask and evaluate its impact on classification performance; the feature importance mask selects features with special attributes for model adjustment; the feature mask retains the top m features by importance and sets the positions of the other features to zero. Step 4 specifically includes:
Step S41: Calculate the information gain that adding feature V brings to H(U);
Step S42: Calculate the information gain between the class labels $Y_{train}$ of the training set samples and the i-th column of the node feature matrix;
Step S43: Calculate the Pearson correlation coefficient between the i-th feature of the training set and the class labels $Y_{train}$ to measure the independence of the feature; in the formula for the Pearson correlation coefficient, k indexes the n samples of the training set, $Y_{train,k}$ and the corresponding feature value are the values of the k-th sample in $Y_{train}$ and in the i-th feature column, and the two mean terms are the means of $Y_{train}$ and of the i-th feature column;
Step S44: Define the importance score of the i-th feature as a weighted average of the feature's discriminability and independence;
Step S45: Take the features with the m largest importance scores as representative features and assign values to their positions in the Mask matrix to obtain the feature mask; the final updated sparse feature matrix is then obtained.
2. The method for analyzing lower back pain anomalies based on weighted graph convolutional networks according to claim 1, characterized in that the formula for detecting outliers using the interquartile range (IQR) of the box plot is:
where Upperfence is the upper limit, above which points are treated as outliers; Lowerfence is the lower limit, below which points are treated as outliers; QL and QU are the first and third quartiles, respectively.
3. The method for analyzing lower back pain anomalies based on weighted graph convolutional networks according to claim 1, characterized in that the information gain formula is: IG(U,V)=H(U)-H(U|V), where H(U) represents the information entropy of set U, and H(U|V) represents the uncertainty of sample set U given set V.
4. The method for analyzing lower back pain anomalies based on weighted graph convolutional networks according to claim 1, characterized in that the weighted average formula is:
where the formula combines the information gain between the sample class labels $Y_{train}$ and the i-th column of the node feature matrix with the Pearson correlation coefficient between the i-th feature and the class labels $Y_{train}$.
5. The method for analyzing lower back pain anomalies based on weighted graph convolutional networks according to claim 1, characterized in that the formula for the sparse feature matrix is:
where ⊙ is the Hadamard product of matrices, $M_k$ is the feature mask, and the other operand is the representation of node i encoded by the self-attention mechanism.