An unbalanced image classification method based on feature scaling and boundary samples

CN114155407B · Active · Publication Date: 2026-06-26 · EAST CHINA UNIV OF SCI & TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
EAST CHINA UNIV OF SCI & TECH
Filing Date
2021-12-01
Publication Date
2026-06-26


Abstract

The application discloses an unbalanced image classification method based on feature scaling and boundary samples. The method comprises the following steps: 1) dividing original sample data into a training set and a test set, and constructing the training set as unbalanced data; 2) performing feature scaling on each class with a feature scaling module; 3) performing boundary sample mining on each class with a boundary sample mining module, and constraining intra-class and inter-class distances; 4) regulating the training of the two modules with a loss scheduler; and 5) inputting original test pictures into the trained convolutional neural network and classifying them through a fully connected layer. By adjusting the model structure, the application enhances the classification performance and robustness of deep learning models on unbalanced image data.

Description

Technical Field

[0001] This invention relates to the field of image recognition, specifically, to an imbalanced image classification method based on feature scaling and boundary samples.

Background Technology

[0002] With the continuous development of deep learning, image classification has made tremendous progress. However, imbalance is very common in real-world problems such as cancer diagnosis and anomaly detection. Imbalance problems can be broadly divided into two categories: long-tail imbalance and step imbalance. In such settings, correctly predicting minority samples is generally far more valuable than correctly predicting majority samples. However, most existing algorithms are designed for balanced data; when applied to imbalanced data, they tend to favor the majority class.

[0003] Various methods have been proposed to address imbalance problems. Resampling and cost-sensitive learning are common approaches, and these methods can also be combined with deep learning to handle imbalanced images. Resampling rebalances the number of samples in each class through oversampling or downsampling. Oversampling expands the minority class by repeating some samples in the minority class, while downsampling prunes a subset of samples from the majority class. Another approach, cost-sensitive learning, is also used to address imbalance. This strategy assigns a higher penalty to misclassified minority samples, making the model pay more attention to them.

[0004] Over the past few decades, imbalanced learning methods have seen extensive development. A common approach is resampling, which rebalances the number of samples in each class through oversampling or downsampling. Oversampling expands the minority class by repeating some samples in the minority class, while downsampling removes some samples from the majority class. However, oversampling is prone to overfitting due to the presence of duplicated data, while downsampling can remove valuable information, leading to underfitting. Another approach, called cost-sensitive learning, is also used to address imbalance. This strategy assigns a higher penalty to misclassified minority samples, making the model focus more on them. However, the cost matrix needs to be determined empirically. In terms of algorithm improvement, ensemble learning is a typical approach, combining several simple classifiers to improve overall performance. While ensemble learning has proven effective across various domains, many ensemble learning methods have limitations in multi-class classification and image recognition.

[0005] This invention addresses the imbalanced image classification problem by exploiting geometric information between samples and modifying the model structure. Although deep learning achieves high image classification performance in most cases, it does not account for data imbalance: most network models are designed for balanced datasets and therefore perform poorly on imbalanced image datasets. To solve this problem, we introduce a feature scaling module and a Gabriel graph-based boundary sample mining module. The feature scaling module scales features according to the ratio of the hypersphere radii of different classes, thereby facilitating model training on the minority class. The Gabriel graph-based boundary sample mining module constrains intra-class and inter-class distances based on the constructed Gabriel graph, thereby improving the model's representational ability.

Summary of the Invention

[0006] The technical problem this invention aims to solve is to provide a more effective image recognition system that can further improve the recognition accuracy of imbalanced image data. This invention provides a deep learning algorithm capable of solving the imbalanced image classification problem by introducing feature scaling and boundary sample mining into traditional deep neural networks to enhance the model's recognition of imbalanced image data.

[0007] Specifically, an imbalanced image classification method based on feature scaling and boundary samples includes the following steps:

[0008] 1) Divide the original sample data into two parts: a training set and a test set, and construct the training set as imbalanced data;

[0009] 2) Feature scaling step: The features are scaled proportionally according to the radius of the hypersphere formed by the features of each class, thereby enhancing the training and learning of the model on minority class samples.

[0010] 3) Boundary sample mining step based on Gabriel graphs: construct a Gabriel graph on the sample features according to the definition of a Gabriel graph, mine the boundary sample features, and constrain intra-class and inter-class distances based on the mined features.

[0011] 4) A loss scheduler controls the training of the two modules above so that training focuses on representation ability in the early stage and on class discrimination in the later stage.

[0012] 5) In the testing step, the original test image is input into the trained convolutional neural network, features are extracted through the network, and then the image is judged through a fully connected layer.

[0013] In step 1), imbalanced data are constructed by building different types of imbalanced datasets from the training data, while the test and validation data are kept balanced; the effectiveness and generalization of the model are then validated on these datasets. On the one hand, long-tail imbalanced datasets are constructed using an exponential function: the samples of each class in the training set are randomly sampled so that $n_i = n_{\max}\,\rho^{-i/(c-1)}$, where $i$ is the class index, $n_i$ is the number of samples in the $i$-th class, $n_{\max}$ is the number of samples per class before sampling, $c$ is the number of classes, and $\rho$ is the imbalance ratio, so that the exponent $i/(c-1)$ ranges from 0 to 1. On the other hand, step imbalance is created by reducing the number of samples in half of the classes: the number of samples in one half of the classes remains unchanged, while the number of samples in the remaining classes is reduced to half of the original number.
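As a rough illustration, the per-class sample counts for the two kinds of imbalance can be computed as below. This is a sketch under stated assumptions: the long-tail profile $n_i = n_{\max}\rho^{-i/(c-1)}$ is assumed, the reduced classes in step imbalance are assumed to be the last half, and the function names `long_tail_counts` and `step_counts` are hypothetical.

```python
from __future__ import annotations


def long_tail_counts(n_max: int, num_classes: int, rho: float) -> list[int]:
    """Per-class counts for an exponential (long-tail) profile.

    Assumes n_i = n_max * rho ** (-i / (c - 1)): class 0 keeps n_max samples and
    the last class keeps n_max / rho samples (rho = imbalance ratio).
    """
    if num_classes < 2:
        return [n_max] * num_classes
    return [round(n_max * rho ** (-i / (num_classes - 1))) for i in range(num_classes)]


def step_counts(n_max: int, num_classes: int, minority_fraction: float = 0.5) -> list[int]:
    """Per-class counts for step imbalance.

    The first half of the classes keep n_max samples; the remaining classes are
    reduced to minority_fraction * n_max (0.5 matches the text). Which half is
    reduced is an assumption of this sketch.
    """
    n_major = num_classes // 2
    n_minor = round(n_max * minority_fraction)
    return [n_max] * n_major + [n_minor] * (num_classes - n_major)
```

For CIFAR-10 (5,000 training images per class) with ρ = 100, `long_tail_counts(5000, 10, 100)` keeps 5,000 images of class 0 and 50 of class 9; the actual subset of each class would then be drawn at random, as the text describes.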

[0014] In step 2), feature scaling computes the center of each class from the features of the samples of that class within a batch. After the center of each class in the current batch is obtained, the class center is updated with an exponential moving average, defined as follows:

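[0015] $$c_j^{(t)} = \alpha\, c_j^{(t-1)} + (1-\alpha)\, \hat{c}_j^{(t)}$$

[0016] where $c_j^{(t)}$ is the updated center of class $j$, $\alpha$ is the update coefficient, $c_j^{(t-1)}$ is the center calculated in the previous batch, and $\hat{c}_j^{(t)}$ is the center of class $j$ computed from the current batch.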

[0017] After calculating the center of each class, the radius of each class needs to be calculated. The Euclidean distance between the center vector of the class and the feature vectors of all samples of the same class is calculated, and the maximum value is taken as the radius of the current class.
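A minimal NumPy sketch of the center update and radius computation for one batch follows. It assumes that the update coefficient `alpha` weights the previous center, that classes absent from the batch keep their previous center and get radius 0, and that the radius is measured from the EMA-updated center; `update_centers` and `class_radii` are hypothetical names.

```python
import numpy as np


def update_centers(features, labels, prev_centers, alpha=0.9):
    """EMA update of per-class centers from one batch.

    features: (B, D) array, labels: (B,) int array, prev_centers: (C, D) array.
    Assumed convention: new = alpha * previous + (1 - alpha) * batch_center.
    Classes that do not appear in the batch keep their previous center.
    """
    centers = prev_centers.astype(float).copy()
    for k in np.unique(labels):
        batch_center = features[labels == k].mean(axis=0)
        centers[k] = alpha * prev_centers[k] + (1.0 - alpha) * batch_center
    return centers


def class_radii(features, labels, centers):
    """Radius of class k = largest Euclidean distance from its center to its samples in the batch."""
    radii = np.zeros(len(centers))
    for k in np.unique(labels):
        dists = np.linalg.norm(features[labels == k] - centers[k], axis=1)
        radii[k] = dists.max()
    return radii
```

Using the maximum distance makes the radius sensitive to a single far-away sample; that matches the text's definition, but it is worth keeping in mind when batches are small.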

[0018] Finally, the ratio of the radius of each class to the radius of the class with the most samples is calculated. Based on this ratio, the features of different classes are scaled. The scaling factor can be defined as follows:

[0019]

[0020] Here, a temperature coefficient controls the effect of the radius ratio on the original features; the ratio is taken between the radius of class j and the radius of the class containing the most samples.

[0021] After obtaining the scaling factor, the features extracted by the convolutional neural network are scaled, and finally normalized by the softmax function to obtain the prediction result.
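The scaling step can be sketched as follows. The exact formula for the scaling factor is not reproduced in this text, so the power form `(r_j / r_max) ** tau` below is purely a hypothetical stand-in; the names `scaling_factors` and `scaled_softmax` are also hypothetical. Each sample's factor is picked by its training label, which is why this sketch only applies during training.

```python
import numpy as np


def scaling_factors(radii, class_counts, tau=1.0, eps=1e-12):
    """HYPOTHETICAL per-class factor s_j = (r_j / r_max) ** tau.

    r_max is the radius of the class with the most samples. The power form is an
    assumption; only the radius ratio and the temperature come from the text.
    """
    radii = np.asarray(radii, dtype=float)
    r_max = radii[int(np.argmax(class_counts))]
    return (radii / max(r_max, eps)) ** tau


def scaled_softmax(features, labels, factors, weight, bias):
    """Scale each training feature by its class factor, then apply a fully
    connected layer and a numerically stable softmax."""
    scaled = features * factors[labels][:, None]
    logits = scaled @ weight.T + bias
    logits = logits - logits.max(axis=1, keepdims=True)  # avoid overflow in exp
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)
```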

[0022] Step 3) involves boundary sample mining based on the Gabriel graph. The Gabriel graph is defined as follows:

[0023] $$d^2(f_i, f_j) < d^2(f_i, f_k) + d^2(f_j, f_k), \quad \forall f_k \neq f_i, f_j$$

[0024] If the above condition is satisfied, an edge can be formed between features $f_i$ and $f_j$, where $d(\cdot,\cdot)$ denotes the Euclidean distance and $f_k$ ranges over all feature vectors other than $f_i$ and $f_j$.
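A brute-force construction of the Gabriel graph on one mini-batch of features can be sketched as below; `gabriel_edges` is a hypothetical name. It tests, for every pair, that no other point lies in the closed ball whose diameter is the segment between the two features, which is the condition $d^2(f_i, f_j) < d^2(f_i, f_k) + d^2(f_j, f_k)$ for all other $f_k$.

```python
import numpy as np


def gabriel_edges(X):
    """Return Gabriel-graph edges (i, j), i < j, for the rows of X.

    (i, j) is an edge iff d(i, j)^2 < d(i, k)^2 + d(j, k)^2 for every other k,
    i.e. no other point lies in the closed ball with diameter X[i]X[j].
    O(n^3) brute force, acceptable for a single mini-batch.
    """
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    n = len(X)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            others = np.ones(n, dtype=bool)
            others[[i, j]] = False
            if np.all(sq[i, j] < sq[i, others] + sq[j, others]):
                edges.append((i, j))
    return edges
```

For three collinear points (0, 0), (1, 0), (2, 0), the pair of outer points is not an edge because the middle point lies inside the ball spanned by them, so only the two neighboring pairs are connected.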

[0025] After features are extracted by the convolutional neural network, a Gabriel graph is constructed on them according to the definition of a Gabriel graph, and only edges whose two endpoints belong to different classes are retained; the endpoints of these edges are defined as the boundary samples of each class. A loss function is then applied to these boundary samples so that each boundary sample moves closer to its own class center, while boundary samples of different classes move further apart. This makes samples within a class more compact and samples of different classes more dispersed, thereby enhancing the model's representational ability. The specific loss function is defined as follows:

[0026]

[0027] Here, S is the set of edges in the Gabriel graph; the loss also involves a coefficient, the weight vector of the fully connected layer corresponding to each class, and a threshold that controls the difference in distance between classes. Class weights are used to counter the imbalance: because the classes have different numbers of samples, the majority class contributes more of the mined boundary samples, so its intra-class and inter-class distance values dominate the loss and the contribution of the minority class becomes almost negligible. The weights enhance the role of the minority-class distances in the loss function and are computed from the number of samples in the class with the most samples and the number of samples in each class.
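Because the exact loss formula is not reproduced in this text, the sketch below is only a hypothetical stand-in that follows the prose. It keeps the cross-class edges, pulls each boundary sample toward the fully connected weight vector of its class (used here as the class center, an assumption), and pushes cross-class boundary pairs at least `margin` apart. The squared distances, the hinge form, the averaging, and the weight $n_{\max}/n_j$ are all assumptions; `boundary_loss` is a hypothetical name.

```python
import numpy as np


def boundary_loss(features, labels, edges, W, class_counts, margin=1.0, lam=1.0):
    """HYPOTHETICAL boundary-sample loss (illustrative form only).

    edges: Gabriel-graph edges (i, j) over the batch.
    W: (C, D) fully connected weights, one row per class, used as class centers.
    Class weight n_max / n_j up-weights minority classes (assumed form).
    """
    counts = np.asarray(class_counts, dtype=float)
    w = counts.max() / counts
    hetero = [(i, j) for i, j in edges if labels[i] != labels[j]]
    if not hetero:
        return 0.0
    boundary = sorted({k for e in hetero for k in e})
    # Intra-class term: pull each boundary sample toward its own class center.
    intra = np.mean([w[labels[k]] * np.sum((features[k] - W[labels[k]]) ** 2) for k in boundary])
    # Inter-class term: hinge pushing cross-class boundary pairs at least `margin` apart.
    inter = np.mean([
        0.5 * (w[labels[i]] + w[labels[j]])
        * max(0.0, margin - np.linalg.norm(features[i] - features[j])) ** 2
        for i, j in hetero
    ])
    return float(intra + lam * inter)
```

In practice the edges would come from a Gabriel graph built on the batch features, and the loss would be written in an autodiff framework so that it can be back-propagated; plain NumPy is used here only to keep the sketch self-contained.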

[0028] The loss scheduler in step 4) controls the training of the two modules: in the early stages of training it focuses on representation ability, i.e., it pays more attention to …; in the later stages it focuses on discriminative ability, i.e., on the cross-entropy loss function. The change of the temperature coefficient in step 2) is also controlled by the loss scheduler. The coefficients that control the training process are defined as follows:

[0029]

[0030] Here, two hyperparameters are used, one of which is a decimal in [0, 1]. One controls the influence of the coefficients at different epochs; the other controls how much of the previously learned representation is retained in later training stages, preventing subsequent training from destroying the representations learned by the model in earlier epochs. epoch is the current training epoch, and epochs is the total number of epochs.
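The scheduling idea (representation first, discrimination later, with part of the representation loss retained) can be illustrated with a hypothetical schedule. The polynomial ramp, the floor `beta` in [0, 1], and the names `schedule` and `total_loss` are assumptions for illustration, not the formula of the source; the temperature schedule mentioned in the text is not modeled here.

```python
def schedule(epoch, epochs, alpha=2.0, beta=0.1):
    """HYPOTHETICAL scheduler returning (w_rep, w_ce) for the current epoch.

    w_ce rises from 0 to 1 as t = epoch / epochs goes from 0 to 1 (alpha shapes the
    curve); w_rep falls from 1 to the floor beta, so some weight stays on the
    representation (boundary) loss late in training.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    t = min(max(epoch / epochs, 0.0), 1.0)
    w_ce = t ** alpha
    w_rep = beta + (1.0 - beta) * (1.0 - w_ce)
    return w_rep, w_ce


def total_loss(ce_loss, rep_loss, epoch, epochs, **kwargs):
    """Combine cross-entropy and representation losses with the scheduled weights."""
    w_rep, w_ce = schedule(epoch, epochs, **kwargs)
    return w_ce * ce_loss + w_rep * rep_loss
```

A floor above zero is one simple way to realize the "retention" role described above: the representation loss never switches off completely, so late epochs cannot freely undo what it shaped earlier.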

[0031] Therefore, the overall loss function can be defined as follows:

[0032]

[0033] Here, one of the terms denotes the cross-entropy loss function.

[0034] When the model makes predictions on a new test set, the data are fed into the convolutional neural network, and the predicted class of each sample is:

[0035] $$\hat{y} = \arg\max_i \, p_i$$

[0036] where $p_i$ is the predicted probability of the $i$-th class.
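The prediction rule is a softmax over the class logits followed by an arg max. A minimal NumPy sketch, with the hypothetical name `predict`:

```python
import numpy as np


def predict(logits):
    """Softmax over class logits (rows = samples), then the most probable class per row."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    return p.argmax(axis=1), p
```

Since softmax is monotonic, the arg max of the probabilities equals the arg max of the raw logits; the probabilities are returned only for inspection.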

[0037] The beneficial effects of this invention are as follows. This invention provides an imbalanced image classification method based on feature scaling and boundary samples. The feature scaling module scales the features of each class by the ratio of its hypersphere radius to that of the class with the most samples, making the model focus more on training the minority class. The boundary sample mining module uses Gabriel graphs to mine boundary samples, minimizing intra-class distances while maximizing inter-class distances, which enhances the representational ability of the neural network. Together, the two modules enable the neural network to better adapt to imbalanced data, enhancing the robustness of the image recognition system on imbalanced data and improving classification performance.

Attached Figure Description

[0038] Figure 1 is the overall framework diagram of the present invention.

[0039] Figure 2 is the training flowchart of the present invention.

Detailed Implementation

[0040] The present invention will now be described in detail with reference to the accompanying drawings and specific embodiments, but this does not limit the present invention to the scope of the described embodiments.

[0041] Example

[0042] This embodiment provides an image recognition method for effectively identifying images in an imbalanced image dataset. Referring to Figure 1 and Figure 2, the present invention mainly includes the following steps:

[0043] Training phase

[0044] 1) Preprocess the image data by horizontally flipping the original image data, randomly cropping it, and standardizing it to obtain new image data, and then input the new image data into the model.
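The preprocessing in step 1) can be sketched in NumPy as below. The 50% flip probability and the zero padding of 4 pixels before cropping back to the original size are assumptions (common CIFAR practice, not stated in the text), and `preprocess` is a hypothetical name; `mean` and `std` may be scalars or per-channel arrays.

```python
import numpy as np


def preprocess(img, mean, std, rng, pad=4):
    """Random horizontal flip, random crop, and standardization of one image.

    img: (H, W, C) float array. The image is zero-padded by `pad` pixels on each
    side (assumption) and a random (H, W) window is cropped back out.
    """
    h, w, _ = img.shape
    if rng.random() < 0.5:
        img = img[:, ::-1, :]  # horizontal flip
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)))
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    crop = padded[top:top + h, left:left + w, :]
    return (crop - mean) / std
```

Usage: `preprocess(img, mean, std, np.random.default_rng(0))` returns an array with the same shape as `img`.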

[0045] 2) Using a convolutional neural network, extract the features of the new image data obtained in step 1) to obtain the high-order abstract features of the original image;

[0046] 3) Using the abstract features extracted in step 2), the feature scaling module calculates the hypersphere radius of each class. The class centers used to compute the radius are updated as follows:

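[0047] $$c_j^{(t)} = \alpha\, c_j^{(t-1)} + (1-\alpha)\, \hat{c}_j^{(t)}$$

[0048] where $c_j^{(t)}$ is the updated center of class $j$, $\alpha$ is the update coefficient, $c_j^{(t-1)}$ is the center calculated in the previous batch, and $\hat{c}_j^{(t)}$ is the center of class $j$ computed from the current batch;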

[0049] 4) Using the radius of each class obtained in step 3), calculate a scaling factor for each class to perform feature scaling and obtain new features. The feature scaling factor can be defined as:

[0050]

[0051] Here, a temperature coefficient controls the effect of the radius ratio on the original features; the ratio is taken between the radius of class j and the radius of the class containing the most samples;

[0052] 5) For the high-order abstract features extracted in step 2), a Gabriel graph is constructed on all data in the current batch, and only edges whose two endpoints belong to different classes are retained; the features at these endpoints are defined as boundary sample features. For the mined boundary sample features, a Gaussian loss function is used to constrain intra-class and inter-class similarity. The loss function is defined as follows:

[0053]

[0054] Here, S is the set of edges in the Gabriel graph; the loss also involves a coefficient, the weight vector of the fully connected layer corresponding to each class, and a threshold that controls the difference in distance between classes. To counter the imbalance, class weights enhance the role of the minority-class distances in the loss function; the weights are computed from the number of samples in the class with the most samples and the number of samples in each class;

[0055] 6) Construct a cross-entropy loss function on the new features obtained by scaling in step 4) to measure the gap between the class predicted by the model and the true label;

[0056] 7) A loss scheduler dynamically schedules the training of the loss functions constructed in steps 5) and 6) and of the feature scaling coefficient in step 4). Two scheduling coefficients control the training process: one controls the scaling factor, and the other controls the change in the loss functions. They are defined as follows:

[0057]

[0058] Here, two hyperparameters are used, one of which is a decimal in [0, 1]. One controls the influence of the coefficients at different epochs; the other controls how much of the previously trained representation is retained in later training stages, preventing subsequent training from destroying the representations learned by the model in earlier epochs. epoch is the current training epoch, and epochs is the total number of epochs.

[0059] 8) The overall loss function can be defined as follows:

[0060]

[0061] Here, one of the terms denotes the cross-entropy loss function.

[0062] Testing phase

[0063] 1) Input one or more images to be recognized into the model;

[0064] 2) Use a convolutional neural network to extract features from the image input in step 1) to obtain abstract features;

[0065] 3) After passing the abstract features obtained in step 2) through a fully connected layer, calculate the probability that the image belongs to each class, and take the class with the highest probability as the category to which the image belongs, as follows:

[0066] $$\hat{y} = \arg\max_i \, p_i$$

[0067] where $p_i$ is the predicted probability of the $i$-th class.

[0068] Experimental Design

[0069] Experimental dataset:

[0070] The experimental data come from the widely used image datasets CIFAR-10 and CIFAR-100. The CIFAR-10 dataset contains 60,000 images of 32x32 pixels in 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The CIFAR-100 dataset contains 60,000 images of 32x32 pixels in 100 categories.

[0071] Comparison Algorithms:

[0072] The method proposed in this paper is mainly for imbalanced image data. Therefore, for comparison, we use Focal Loss, Mixup, CB Focal, LDAM-DRW, and BBN, which are also applicable to imbalanced image data.

[0073] Performance measurement methods:

[0074] Since the test data is a balanced dataset, we use accuracy (ACC) to measure the performance of different methods, where P is the number of samples that are accurately predicted and N is the total number of samples.

[0075] $$\mathrm{ACC} = \frac{P}{N}$$
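The metric is simply the fraction of correctly predicted test samples. A minimal sketch, with the hypothetical name `accuracy`:

```python
def accuracy(predicted, labels):
    """ACC = P / N, where P counts correct predictions and N is the number of samples."""
    if len(predicted) != len(labels) or len(labels) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    p = sum(int(a == b) for a, b in zip(predicted, labels))
    return p / len(labels)
```

Plain accuracy is meaningful here only because the test sets are kept balanced; on an imbalanced test set it would be dominated by the majority classes.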

[0076] Experimental results

[0077] Table 1. Experimental results of the comparison methods on imbalanced CIFAR-10.

[0078]

[0079] As shown in Table 1, the proposed method achieves the highest accuracy on datasets with different imbalances, surpassing all the comparison methods, which proves the effectiveness and superiority of the proposed method.

[0080] Table 2. Experimental results of the comparison methods on imbalanced CIFAR-100.

[0081]

[0082] At the same imbalance ratio, CIFAR-100 has more classes than CIFAR-10 but fewer samples per class, which places higher demands on the classification system. Table 2 shows that the proposed classification system is more robust than the other methods.

[0083] To evaluate the different modules of the proposed system, we conducted ablation experiments on the feature scaling module (FSM) and the boundary sample mining module (BSMM). The results are shown in Table 3. They show that both the feature scaling module and the boundary sample mining module are effective for classifying imbalanced data, and that combining the two further improves classification performance.

[0084] Table 3. Ablation experiments on the different modules of the proposed system

[0085] In summary, the imbalanced image classification system based on feature scaling and boundary sample mining of this invention improves recognition performance on imbalanced image data: feature scaling uses the hypersphere radius to promote training on the minority class, while boundary sample mining constrains intra-class and inter-class distances. This invention also offers a reference for related problems in the field, leaves room for further development, and has broad application prospects.

[0086] The above description is merely a preferred embodiment of the present invention. Those skilled in the art should understand that it is only illustrative and that the scope of protection of the present invention is defined by the appended claims. Several improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.

Claims

1. An imbalanced image classification method based on feature scaling and boundary samples, characterized in that the method includes the following steps: 1) dividing the original sample data into a training set and a test set, and constructing the training set as imbalanced data; 2) a feature scaling step: scaling the features in proportion to the radius of the hypersphere formed by the features of each class, thereby enhancing the model's training and learning on minority-class samples; specifically, the center of each class is calculated from the features of the samples of that class within a batch, and after the center of each class in the current batch is obtained, the class center is updated with an exponential moving average, $c_j^{(t)} = \alpha\, c_j^{(t-1)} + (1-\alpha)\, \hat{c}_j^{(t)}$, where $c_j^{(t)}$ is the updated center of class $j$, $\alpha$ is the update coefficient, $c_j^{(t-1)}$ is the center calculated in the previous batch, and $\hat{c}_j^{(t)}$ is the center of class $j$ in the current batch; after the center of each class is calculated, the radius of each class is obtained by computing the Euclidean distances between the class center vector and the feature vectors of all samples of the same class and taking the maximum as the radius of the current class; finally, the ratio of the radius of each class to the radius of the class with the most samples is calculated, and the features of the different classes are scaled based on this ratio using a scaling factor in which a temperature coefficient controls the effect of the radius ratio on the original features; after the scaling factor is obtained, the features extracted by the convolutional neural network are scaled and finally normalized by the softmax function to obtain the prediction result; 
3) a boundary sample mining step based on Gabriel graphs: a Gabriel graph is constructed on the sample features according to its definition, boundary sample features are mined, and intra-class and inter-class distances are constrained based on the mined features; by the definition of a Gabriel graph, an edge can be formed between features $f_i$ and $f_j$ if $d^2(f_i, f_j) < d^2(f_i, f_k) + d^2(f_j, f_k)$ for all $f_k \neq f_i, f_j$, where $d(\cdot,\cdot)$ denotes the Euclidean distance and $f_k$ ranges over all feature vectors other than $f_i$ and $f_j$; after feature extraction by the convolutional neural network, a Gabriel graph is constructed on the features, only edges whose two endpoints belong to different classes are retained, and these endpoints are defined as the boundary samples of each class; based on these boundary samples, a loss function makes each boundary sample closer to its own class center and boundary samples of different classes further apart, so that intra-class samples become more compact and inter-class samples more dispersed, thereby enhancing the model's representational ability; in this loss function, S is the set of edges in the Gabriel graph, a coefficient and the weight vector of the fully connected layer corresponding to each class are used, a threshold controls the difference in distance between classes, and class weights computed from the number of samples in the class with the most samples and the number of samples in each class balance the imbalance; 4) a loss scheduler controls the training process of steps 2) and 3) so that training focuses on representation ability in the early stage and on class discrimination in the later stage; 5) in the testing step, the original test image is input into the trained convolutional neural network, features are extracted through the network, and the image is then classified through a fully connected layer.

2. The imbalanced image classification method based on feature scaling and boundary samples according to claim 1, characterized in that the loss scheduler in step 4) controls the training process of feature scaling and boundary sample mining, focusing on training representation ability in the early stages of training, i.e., paying more attention to …, and on training discriminative ability in the later stages, i.e., on the cross-entropy loss function; the change of the temperature coefficient in step 2) is also controlled by the loss scheduler; the scheduling coefficients that control the training process use two hyperparameters, one of which is a decimal in [0, 1]: one controls the influence of the coefficients at different epochs, and the other controls the retention of previously learned representations in later training stages, preventing subsequent training from destroying the representations learned by the model in previous epochs; epoch is the current training epoch, and epochs is the total number of epochs; the overall loss function is defined accordingly, with the cross-entropy loss function as one of its terms.