Small sample learning method based on generalized class-mixture distribution and bias complementary assumption

By generating new sample data through a generalized mixed distribution and the hypothesis of complementary bias, the problem of large dataset bias and incomplete information extraction in small sample learning is solved, the accuracy of classification models is improved, and the generator can be quickly embedded into existing models.

CN116416470B · Active · Publication Date: 2026-06-26 · SHANGHAI ULUCU ELECTRON TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHANGHAI ULUCU ELECTRON TECH CO LTD
Filing Date
2023-04-12
Publication Date
2026-06-26

Abstract

The application provides a small sample learning method based on a generalized category mixed distribution and a bias complementary assumption, and comprises the following steps: S1, data is collected, data corresponding to a target category to be identified is taken as target data to generate a target data set, and data corresponding to a category different from the target category but belonging to the same generalized category is taken as background data to generate a background data set; target data and background data are mixed to obtain a first training data set; the first training data set is preprocessed; the preprocessed first training data set is used to train a generator adopting a preset algorithm to obtain a trained generator; the trained generator is used to generate new sample data of the same category as the target data but with different attributes, and all the new sample data and all the target data are combined to obtain a second training data set; the second training data set is used to train a classifier of the target category; and the trained classifier is used for prediction.
Description

Technical Field

[0001] This invention belongs to the field of few-shot learning technology, and specifically relates to a few-shot learning method based on the generalized category mixture distribution and the bias complementarity hypothesis.

Background Technology

[0002] In recent years, deep learning has achieved tremendous success in image processing and natural language processing, with algorithms such as VGG and ResNet. However, most of this success is built on large-scale data sets, while learning from a small number of samples remains challenging. In contrast, humans can easily learn to distinguish new categories from just a few samples, indicating that current algorithms are still far from achieving human-level intelligence.

[0003] The challenge of learning effective pattern recognition from a small number of samples is called few-shot learning, and some research has been dedicated to solving this problem. Common learning methods include the following:

[0004] (1) Meta-learning (learning to learn): In few-shot learning, meta-learning can leverage learning experience from other tasks to quickly assist in learning new tasks, such as learning to design neural structures, initialization parameters, optimizers, and loss functions. Meta-learning classifiers can be easily transferred to new learning tasks, but the design and effectiveness of learning remain unsolved problems.

[0005] (2) Metric learning: This method maps samples to a metric space, so that samples of the same class are closer together and samples of different classes are farther apart, in order to achieve better classification results.

[0006] (3) Sample generation and augmentation methods: Since the main problem is the lack of samples, increasing the number of training samples is a natural idea. These methods fall into two categories. One directly generates new samples, for example with a GAN or VAE or by direct synthesis; the other is feature augmentation, mainly represented by attribute-guided augmentation, multi-level semantic feature augmentation, etc. However, these algorithms rarely analyze the prior distribution of the data. Attribute-guided augmentation directly pairs the training set according to corresponding attributes and then performs linear operations on the latent variables. Multi-level semantic feature augmentation trains an attribute operation network.

[0007] These algorithms, whether linear or nonlinear, operate on all attributes of the entire dataset without considering the prior distribution of the data. Operating on a latent variable with an unknown distribution introduces the problem that it is impossible to distinguish which changes are learnable from small samples and which are not, and the range of these changes remains undetermined.

Summary of the Invention

[0008] This invention is made to solve the above problems, and aims to provide a small-sample learning method based on the generalized category mixture distribution and the bias complementarity hypothesis.

[0009] It is generally believed that poor learning performance with small samples is mainly due to excessive data bias, which easily leads to overfitting when the dataset is too small. However, after creative thinking and multiple experimental verifications, the inventors believe that this is because current networks do not extract enough information from the dataset. More information should be extracted from the data, including previously considered invalid information; that is, all information in the dataset is useful. To this end, this invention first proposes the following two assumptions:

[0010] (1) Generalized Category Mixed Distribution Assumption: The mixed distribution of common attributes within the same generalized category approaches a normal distribution. A generalized category refers to a data category that shares some common attribute, such as all animals, all plants, and all handwritten characters. A common attribute refers to the attribute that all categories within this generalized category possess. For example, in the category of mammals, attributes such as skin color, eye size, and body length should be present. The probability distribution function of these attributes within a generalized category can be assumed to be a normal distribution.

[0011] (2) Complementary bias hypothesis: It is assumed that in the case of a small sample size, the distribution biases of different types of common attributes are also different. Since the above assumption is that the mixed distribution of common attributes under the same generalized class tends to be a normal distribution, it can be concluded that the biases of these different types should be complementary.

[0012] Through experimental analysis of data distributions at different scales, the inventors discovered that, based on the two assumptions above, the biases in data from different subclasses within the same generalized class can be supplemented. Specifically, under certain conditions, attribute changes from one class can be transferred to another, such as changes in posture or completeness. Therefore, these two assumptions can serve as a foundation for data augmentation under small sample sizes.
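One way to write the two assumptions down is sketched below. The notation is introduced here for illustration and is not taken from the patent: $K$ categories within one generalized category, mixing weights $\pi_k$, the true attribute density $p_k$ of category $k$, its small-sample estimate $\hat{p}_k$, and the resulting bias $\delta_k$.

```latex
p(x) = \sum_{k=1}^{K} \pi_k \, p_k(x) \approx \mathcal{N}(x \mid \mu, \sigma^2),
\qquad
\hat{p}_k(x) = p_k(x) + \delta_k(x),
\qquad
\sum_{k=1}^{K} \pi_k \, \delta_k(x) \approx 0
```

Read this way, "complementary" means that the per-category biases roughly cancel once the categories are mixed. This is only an interpretive reading of paragraphs [0010] and [0011], not a formula stated in the source.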

[0013] To achieve the above objectives, the present invention employs the following solution:

[0014] This invention provides a few-sample learning method based on a generalized class mixture distribution and the hypothesis of complementary bias, characterized by the following steps:

[0015] Step S1, Data Acquisition and Classification: Collect data, generate target dataset by using data corresponding to the target category to be identified as target data, and generate background dataset by using data corresponding to categories that are different from the target category but belong to the same general category as background data.

[0016] Step S2, data mixing: the target data and background data are mixed according to a preset ratio to obtain the first training dataset;

[0017] Step S3, data preprocessing: preprocess the first training dataset to obtain the preprocessed first training dataset;

[0018] Step S4: Train the generator. Use the preprocessed first training dataset to train the generator using the preset algorithm to obtain the trained generator.

[0019] Step S5, Sample generation: Use the trained generator to generate new sample data with the same category as the target data but different attributes, and merge all the new sample data and all the target data to obtain the second training dataset;

[0020] Step S6: Train the classifier for the target category. Use the second training dataset to train the classifier for the target category, thereby obtaining the trained classifier.

[0021] Step S7, Prediction: Use the trained classifier to predict new data corresponding to the target category.
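Steps S1 to S7 can be read as a simple pipeline. The following Python sketch is illustrative only: the function name `few_shot_pipeline` and the callable-based interface are assumptions introduced here, not part of the patent text, and the mixing, preprocessing, generator and classifier routines are stand-ins supplied by the caller.

```python
from typing import Callable, List, Sequence, TypeVar

Sample = TypeVar("Sample")


def few_shot_pipeline(
    target: Sequence[Sample],
    background: Sequence[Sample],
    mix: Callable[[Sequence[Sample], Sequence[Sample]], List[Sample]],
    preprocess: Callable[[Sample], Sample],
    train_generator: Callable[[List[Sample]], object],
    generate: Callable[[object, Sequence[Sample], Sequence[Sample]], List[Sample]],
    train_classifier: Callable[[List[Sample]], Callable[[Sample], object]],
) -> Callable[[Sample], object]:
    """Chain steps S2 to S6 and return the trained classifier used in S7.

    Step S1 (collecting target and background data) is assumed to have
    produced the two input sequences already.
    """
    first_set = mix(target, background)                     # S2: data mixing
    first_set = [preprocess(x) for x in first_set]          # S3: preprocessing
    generator = train_generator(first_set)                  # S4: train the generator
    new_samples = generate(generator, target, background)   # S5: generate new samples
    second_set = list(new_samples) + list(target)           # S5: merge with target data
    return train_classifier(second_set)                     # S6: train the classifier
```

The returned callable plays the role of the trained classifier in step S7; because it is an ordinary classifier, nothing from the generator is needed at prediction time.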

[0022] The few-sample learning method based on the generalized category mixture distribution and the bias complementarity hypothesis provided by this invention may also have the following feature: wherein step S5 specifically includes the following sub-steps:

[0023] Step S5-1: Select one target data point M from the target dataset, and select two background data points corresponding to any one category from the background dataset, denoted as $B_a$ and $B_b$ respectively;

[0024] Step S5-2: Input the target data M and the two background data $B_a$, $B_b$ into the trained generator to obtain the corresponding latent vectors $Z_m$, $Z_a$, $Z_b$;

[0025] Step S5-3: Calculate the attribute change $D_{ab}$ between the latent vectors $Z_a$ and $Z_b$ using the following formula:

[0026]

[0027] Where std is the statistical variance of the latent variable {Z} in each dimension of the entire background dataset;

[0028] Step S5-4: Calculate the latent vector Z' corresponding to the new sample data M' using the following formula;

[0029] $Z' = Z_m + D_{ab}$

[0030] Step S5-5: Input the latent vector Z' into the trained generator to obtain new sample data M';

[0031] Step S5-6: Merge all new sample data and all target data to obtain the second training dataset.

[0032] The few-sample learning method based on the generalized category mixed distribution and the bias complementarity hypothesis provided by the present invention may also have the following feature: wherein, in step S4, the preset algorithm is any one of the following: autoregressive model (AR), variational autoencoder (VAE), generative adversarial network (GAN), and reversible generative flow model (GLOW).

[0033] The small sample learning method based on the generalized category mixed distribution and the bias complementarity hypothesis provided by the present invention may also have the following feature: in step S2, the preset ratio is S:1, where S is a positive integer greater than or equal to 1.

[0034] The few-sample learning method based on the generalized category mixed distribution and the bias complementarity hypothesis provided by this invention may also have the following feature: step S3 specifically involves first scaling all data in the first training dataset to a size of M×N, and then scaling the value range of each data item to [a, b], with a mean m and a variance d, thereby obtaining the preprocessed first training dataset.

[0035] Compared with other few-shot learning methods, the few-shot learning method of this invention based on the generalized class mixture distribution and the bias complementarity hypothesis has the following advantages:

[0036] (1) Potential changes can be learned from other categories, such as changes in pose, lighting, and local attributes, and these changes can be projected onto the target category to generate new samples, thus fundamentally solving the problem of bias in small sample learning.

[0037] (2) The classification model required by the small sample learning method of the present invention is the same as a normal classification model, so it can be quickly embedded into existing work without adding extra burden during prediction.

Attached Figure Description

[0038] Figure 1 is a flowchart of the few-sample learning method based on the generalized category mixture distribution and bias complementarity assumption in an embodiment of the present invention;

[0039] Figure 2 shows the overall framework of the GLOW algorithm model in an embodiment of the present invention;

[0040] Figure 3 shows the detailed structure of the "step of flow" in the GLOW algorithm model in an embodiment of the present invention; and

[0041] Figure 4 is a diagram of the ResNet50 network structure in an embodiment of the present invention.

Detailed Implementation

[0042] To make the technical means, creative features, objectives and effects of this invention easy to understand, the invention will be specifically described below in conjunction with embodiments and accompanying drawings.

[0043] It is generally believed that poor learning performance with small samples is mainly due to excessive data bias, which easily leads to overfitting when the dataset is too small. However, after creative thinking and multiple experimental verifications, the inventors believe that this is because current networks do not extract enough information from the dataset. More information should be extracted from the data, including previously considered invalid information; that is, all information in the dataset is useful. To this end, this invention first proposes the following two assumptions:

[0044] (1) Generalized Category Mixed Distribution Assumption: The mixed distribution of common attributes within the same generalized category approaches a normal distribution. A generalized category refers to a data category that shares some common attribute, such as all animals, all plants, and all handwritten characters. A common attribute refers to the attribute that all categories within this generalized category possess. For example, in the category of mammals, attributes such as skin color, eye size, and body length should be present. The probability distribution function of these attributes within a generalized category can be assumed to be a normal distribution.

[0045] (2) Complementary bias hypothesis: It is assumed that in the case of a small sample size, the distribution biases of different types of common attributes are also different. Since the above assumption is that the mixed distribution of common attributes under the same generalized class tends to be a normal distribution, it can be concluded that the biases of these different types should be complementary.

[0046] Through experimental analysis of data distributions at different scales, the inventors discovered that, based on the two assumptions above, the biases in data from different subclasses within the same generalized class can be supplemented. Specifically, under certain conditions, attribute changes from one class can be transferred to another, such as changes in posture or completeness. Therefore, these two assumptions can serve as a foundation for data augmentation under small sample sizes.

[0047] <Example>

[0048] In this embodiment, the Omniglot handwritten character dataset is used as an example to illustrate the few-shot learning task. The Omniglot handwritten character dataset can be regarded as a generalized class, namely the handwritten character class; it comes from 50 different alphabets and contains 1623 different handwritten characters, with 20 samples in each class.

[0049] Referring to Figure 1, this invention provides a small-sample learning method based on the generalized category mixture distribution and the bias complementarity hypothesis, including the following steps S1-S7.

[0050] Step S1, Data Acquisition and Segmentation: Collect data, generate a target dataset by using the data corresponding to the target category to be identified as the target data, and generate a background dataset by using the data corresponding to the category that is different from the target category but belongs to the same general category as the background data.

[0051] The method of this invention requires two types of data for few-sample learning: one is target data of a small number of samples of the target category to be trained, i.e., data corresponding to the target category to be identified; the other is background data related to the target category, i.e., data corresponding to categories that are different from the target category but belong to the same generalized category. For example, when the target data to be identified in a small sample is handwritten characters, the background data is best composed of other handwritten characters. In this way, all handwritten characters are regarded as a generalized category, and under this generalized category, the attribute changes between the same category in the background data can be transferred to the target category.

[0052] In this embodiment, 20 characters are randomly selected from the 1623 characters in the Omniglot handwritten character dataset as the target category, and the rest are used as the background set.
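The split described in this embodiment can be sketched as follows. The function name `split_classes`, the integer class ids and the fixed seed are assumptions made for illustration; the patent only states that 20 of the 1623 characters are chosen at random as target categories.

```python
import random
from typing import List, Tuple


def split_classes(num_classes: int, num_target: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly pick target class ids; every other class becomes background."""
    rng = random.Random(seed)  # the seed is an assumption, used only for repeatability
    target = sorted(rng.sample(range(num_classes), num_target))
    target_set = set(target)
    background = [c for c in range(num_classes) if c not in target_set]
    return target, background


target_ids, background_ids = split_classes(1623, 20)
print(len(target_ids), len(background_ids))  # prints "20 1603"
```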

[0053] Step S2, data mixing: The target data and background data are mixed according to a preset ratio to obtain the first training dataset.

[0054] Specifically, the preset ratio is S:1, where S is a positive integer greater than or equal to 1. The purpose of this mixing is to ensure the balance of sampling between target data and background data during generator training.

[0055] In this embodiment, S is 1.
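A minimal sketch of the S:1 mixing follows. The source does not say which side of the ratio S refers to, so this sketch assumes that S target samples are drawn for every background sample and that the few target samples are reused cyclically. The function name `mix_datasets` and the tagged-pair output are hypothetical.

```python
from itertools import cycle
from typing import List, Sequence, Tuple


def mix_datasets(target: Sequence[str], background: Sequence[str], s: int = 1) -> List[Tuple[str, str]]:
    """Return (source, item) pairs with S target items per background item.

    Assumption: the ratio S:1 is read as target:background.
    """
    if s < 1:
        raise ValueError("S must be a positive integer")
    if not target:
        raise ValueError("the target dataset must not be empty")
    target_stream = cycle(target)  # the few target samples are reused in turn
    mixed: List[Tuple[str, str]] = []
    for item in background:
        for _ in range(s):
            mixed.append(("target", next(target_stream)))
        mixed.append(("background", item))
    return mixed
```

With S = 1, as in this embodiment, the generator sees target and background samples equally often even though the target set is much smaller.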

[0056] Step S3, data preprocessing: preprocess the first training dataset to obtain the preprocessed first training dataset.

[0057] In this invention, step S3 specifically involves first scaling all data in the first training dataset to a size of M×N, then scaling the value range of each data item to [a, b], with a mean m and a variance d, thereby obtaining the preprocessed first training dataset.

[0058] In this embodiment, M and N are both 64, a is -1, b is 1, m is 0, and d is 0.5.
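The resizing and range scaling can be sketched in plain Python as below. This is an illustration under stated assumptions: images are 2-D grayscale lists with pixel values in [0, 255], resizing uses nearest-neighbour sampling, and the helper names are hypothetical. The mean m and variance d of the embodiment are not applied, because the source does not spell out how they combine with the range scaling.

```python
from typing import List

Image = List[List[float]]


def resize_nearest(img: Image, height: int, width: int) -> Image:
    """Resize a 2-D grayscale image with nearest-neighbour sampling."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[r * src_h // height][c * src_w // width] for c in range(width)]
        for r in range(height)
    ]


def rescale(img: Image, low: float, high: float, src_max: float = 255.0) -> Image:
    """Map pixel values linearly from [0, src_max] to [low, high]."""
    span = high - low
    return [[low + span * (v / src_max) for v in row] for row in img]


def preprocess(img: Image, m: int = 64, n: int = 64, a: float = -1.0, b: float = 1.0) -> Image:
    """Resize to M x N, then scale values to [a, b] (defaults follow the embodiment)."""
    return rescale(resize_nearest(img, m, n), a, b)
```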

[0059] Step S4: Train the generator. Use the preprocessed first training dataset to train the generator using the preset algorithm to obtain the trained generator.

[0060] Specifically, the generator can be of various types, as long as it can generate complete data samples, for example autoregressive models (AR), variational autoencoders (VAEs), generative adversarial networks (GANs), and reversible generative flow models (GLOW).

[0061] In this embodiment, GLOW is used as an example. The mixed data is input into the GLOW network, and the gradient descent algorithm is used to minimize the loss function. The resulting model is the trained generator.

[0062] The principle of GLOW is to apply a reversible nonlinear transformation to the complex, high-dimensional input data. This transformation maps the high-dimensional input data to the latent space and produces independent latent variables (and vice versa); that is, it finds an invertible bijection that converts between the input space and the latent space. As shown in Figure 2, the latent vector Z is obtained from the input tensor X after several steps.
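For an invertible mapping, the standard change-of-variables identity relates the data density to the latent density. It is given here as general background on flow models; $f$, $p_X$ and $p_Z$ are notation introduced for this note rather than symbols from the patent.

```latex
z = f(x), \qquad x = f^{-1}(z), \qquad
\log p_X(x) = \log p_Z(z) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|
```

Minimizing the negative of this log-likelihood by gradient descent is the usual training objective for such models, which is one plausible reading of the loss function mentioned in [0061].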

[0063] "squeeze" compresses the channel dimension of the input tensor X, "split" separates the channel dimension of the input tensor X, and the remainder is simply a stack of K "step of flow" blocks.

[0064] The key step in the GLOW model is the "step of flow," the detailed structure of which is shown in Figure 3. It includes the following steps:

[0065] The first step is the Actnorm preprocessing layer, calculated using the following formula:

[0066] $y_{i,j} = s \odot x_{i,j} + b$

[0067] Here i, j are the two-dimensional width and height indices, $x_{i,j}$ is the input tensor, $y_{i,j}$ is the output tensor, s is the scaling parameter, and b is the bias parameter. The Actnorm layer normalizes the data by applying the scaling parameter s and the bias parameter b to each channel. This ensures that the mini-batch data has zero mean and unit variance after the transformation. The parameters s and b are updated as training progresses.
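A small sketch of the Actnorm step for a single channel is given below. It assumes the data-dependent initialization commonly used with this layer, in which s and b are chosen from the first mini-batch so that the output has zero mean and unit variance. The function names and the small constant `eps` are illustrative assumptions.

```python
from statistics import fmean, pstdev
from typing import List, Tuple


def actnorm_init(channel_values: List[float], eps: float = 1e-6) -> Tuple[float, float]:
    """Choose s and b so that s * x + b has zero mean and (almost exactly)
    unit variance on the given mini-batch values of one channel."""
    mean = fmean(channel_values)
    std = pstdev(channel_values)
    s = 1.0 / (std + eps)  # eps guards against division by zero
    b = -mean * s
    return s, b


def actnorm(channel_values: List[float], s: float, b: float) -> List[float]:
    """Apply y = s * x + b to every spatial position of one channel."""
    return [s * x + b for x in channel_values]


batch = [1.0, 2.0, 3.0, 4.0]
s, b = actnorm_init(batch)
normalized = actnorm(batch, s, b)
```

After initialization, s and b would be treated as ordinary trainable parameters, as the paragraph above describes.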

[0068] The second step is Invertible 1x1 Convolution, the formula is as follows:

[0069] $y_{i,j} = W x_{i,j}$

[0070] W is an invertible 1×1 convolution kernel. Its purpose is to mix the data across different channels through matrix multiplication, so that information is blended more thoroughly. Because W is a parameter matrix, it can also be trained. A 1×1 convolution with an equal number of input and output channels is a generalization of the permutation operation. Simplifying the matrix calculation reduces the overall computational load.
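The following toy example illustrates why a 1×1 convolution with equal input and output channels generalizes a permutation and how it is undone. It uses two channels and a closed-form 2×2 inverse purely for brevity; the helper names and the example matrices are assumptions, not values from the patent.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def conv1x1(pixels: List[Vector], w: Matrix) -> List[Vector]:
    """Apply y = W x independently to the channel vector of every pixel."""
    return [[sum(w_rc * x_c for w_rc, x_c in zip(row, x)) for row in w] for x in pixels]


def inverse_2x2(w: Matrix) -> Matrix:
    """Closed-form inverse of a 2x2 matrix; raises if W is not invertible."""
    (a, b), (c, d) = w
    det = a * d - b * c
    if det == 0:
        raise ValueError("W must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


pixels = [[1.0, 2.0], [3.0, 4.0]]  # two pixels, two channels each
swap = [[0.0, 1.0], [1.0, 0.0]]    # a channel permutation is a special case of W
w = [[2.0, 1.0], [1.0, 1.0]]       # a general invertible W (determinant 1)

permuted = conv1x1(pixels, swap)                         # channels swapped per pixel
restored = conv1x1(conv1x1(pixels, w), inverse_2x2(w))   # W followed by its inverse
```

Applying W and then its inverse returns the original pixels, which is the property that keeps the whole flow reversible.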

[0071] The third step is the Affine Transformation layer, calculated using the following formula:

[0072] $x_a, x_b = \mathrm{split}(x)$

[0073] $(\log s, t) = \mathrm{NN}(x_b)$

[0074] $s = \exp(\log s)$

[0075] $y_a = s \odot x_a + t$

[0076] $y_b = x_b$

[0077] $y = \mathrm{concat}(y_a, y_b)$

[0078] Divide the input tensor x into two halves along the channel dimension, namely the tensors $x_a$ and $x_b$. NN stands for a convolutional neural network. NN applies a nonlinear transformation to $x_b$ to obtain the output tensors log s and t. Exponentiate log s to obtain the tensor s. Use the tensors s and t to apply the affine coupling transformation to $x_a$ and obtain $y_a$. $y_b$ equals $x_b$. Concatenating $y_a$ and $y_b$ along the channel dimension gives the output y.
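The affine coupling equations above, together with their inverse, can be sketched on flat vectors as follows. The network NN is replaced by a hypothetical stand-in `toy_nn` (the patent uses a convolutional network), and vectors of even length stand in for tensors split along the channel dimension.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Net = Callable[[Vector], Tuple[Vector, Vector]]


def toy_nn(x_b: Vector) -> Tuple[Vector, Vector]:
    """Stand-in for NN: returns (log s, t) computed from x_b only."""
    log_s = [math.tanh(v) for v in x_b]
    t = [0.5 * v for v in x_b]
    return log_s, t


def coupling_forward(x: Vector, nn: Net = toy_nn) -> Vector:
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]                          # x_a, x_b = split(x)
    log_s, t = nn(x_b)                                     # (log s, t) = NN(x_b)
    s = [math.exp(v) for v in log_s]                       # s = exp(log s)
    y_a = [si * xi + ti for si, xi, ti in zip(s, x_a, t)]  # y_a = s * x_a + t
    return y_a + x_b                                       # y = concat(y_a, y_b), y_b = x_b


def coupling_inverse(y: Vector, nn: Net = toy_nn) -> Vector:
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    log_s, t = nn(y_b)  # y_b equals x_b, so NN returns the same (log s, t)
    x_a = [(yi - ti) / math.exp(ls) for yi, ti, ls in zip(y_a, t, log_s)]
    return x_a + y_b
```

Because $y_b$ passes through unchanged, the inverse can recompute s and t without inverting NN itself, which is why NN may be an arbitrary nonlinear network.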

[0079] Z represents the vector of the input tensor X in the latent space. Since the distributions in each dimension are assumed to be independent, Z can be considered as a decoupled representation of X, i.e., a representation of the properties of X.

[0080] Step S5, Sample generation: Use the trained generator to generate new sample data with the same category as the target data but different attributes, and merge all the new sample data and all the target data to obtain the second training dataset.

[0081] Specifically, step S5 includes the following sub-steps:

[0082] Step S5-1: Select one target data point M from the target dataset, and select two background data points corresponding to any one category from the background dataset, denoted as $B_a$ and $B_b$ respectively.

[0083] Step S5-2: Input the target data M and the two background data $B_a$, $B_b$ into the trained generator to obtain the corresponding latent vectors $Z_m$, $Z_a$, $Z_b$.

[0084] Step S5-3: Calculate the attribute change $D_{ab}$ between the latent vectors $Z_a$ and $Z_b$ using the following formula:

[0085]

[0086] Where std represents the statistical variance of the latent variable {Z} across all dimensions of the entire background dataset.

[0087] Step S5-4: Calculate the latent vector Z' corresponding to the new sample data M' using the following formula;

[0088] $Z' = Z_m + D_{ab}$

[0089] Step S5-5: Input the latent vector Z' into the trained generator to obtain new sample data M'.

[0090] Step S5-6: Merge all new sample data and all target data to obtain the second training dataset.
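Sub-steps S5-3 and S5-4 can be sketched on plain latent vectors as below. Only $Z' = Z_m + D_{ab}$ and the per-dimension variance come from the text. The formula for $D_{ab}$ is not reproduced in the source, so `plain_difference` is an assumed stand-in that ignores std, and a caller would pass the actual formula through the `change` argument. Encoding data into latent vectors and decoding $Z'$ back into a sample are left to the trained generator and are not shown.

```python
from statistics import pvariance
from typing import Callable, List, Sequence

Vector = List[float]
Change = Callable[[Vector, Vector, Vector], Vector]


def per_dimension_variance(latents: Sequence[Vector]) -> Vector:
    """Variance of the background latent vectors in each dimension ("std" in S5-3)."""
    return [pvariance(dim) for dim in zip(*latents)]


def plain_difference(z_a: Vector, z_b: Vector, std: Vector) -> Vector:
    """ASSUMED stand-in for D_ab: the source formula is missing, so this
    simply returns Z_a - Z_b and does not use std."""
    return [a - b for a, b in zip(z_a, z_b)]


def transfer(z_m: Vector, z_a: Vector, z_b: Vector, std: Vector,
             change: Change = plain_difference) -> Vector:
    """S5-4: Z' = Z_m + D_ab, with D_ab supplied by the `change` function."""
    d_ab = change(z_a, z_b, std)
    return [m + d for m, d in zip(z_m, d_ab)]
```

Keeping the attribute-change rule as a pluggable function makes it clear which part of the sketch is grounded in the text and which part is a placeholder.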

[0091] Step S6: Train the classifier for the target category. Use the second training dataset to train the classifier for the target category, thereby obtaining the trained classifier.

[0092] Specifically, classifiers typically consist of convolutional networks, fully connected networks, and recurrent networks. Taking the classic residual network ResNet50 as an example, as shown in Figure 4, it consists of multiple layers of convolutional networks, with the last layer outputting the category of the sample. The loss function can be cross-entropy loss, squared loss, etc., and is optimized using stochastic gradient descent or adaptive moment estimation until it reaches its target value.
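The two loss functions named above can be illustrated for a single sample as follows. The helper names are hypothetical, the classifier output is assumed to be a list of raw class scores (logits), and the squared loss is taken between the softmax output and a one-hot target, which is one common convention rather than something the patent specifies.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over class scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: List[float], label: int) -> float:
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[label])


def squared_loss(logits: List[float], label: int) -> float:
    """Squared error between the softmax output and a one-hot target."""
    probs = softmax(logits)
    return sum((p - (1.0 if i == label else 0.0)) ** 2 for i, p in enumerate(probs))
```

Either value would then be averaged over a mini-batch and minimized with stochastic gradient descent or adaptive moment estimation, as the paragraph describes.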

[0093] Step S7, Prediction: Use the trained classifier to predict new data corresponding to the target category.

[0094] Specifically, let the new target data sample be $X_i$. Inputting it into the classifier yields the predicted category.

[0095] The few-sample learning method based on the generalized class mixture distribution and the bias complementarity hypothesis in this embodiment can achieve an accuracy of over 90% on the Omniglot dataset, using 20 target data categories and one sample per category.

[0096] The above embodiments are preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention.

[0097] For example, the classifier in the above embodiments can be replaced by any kind of deep learning neural network, including but not limited to classification models composed of convolutional neural networks, fully connected networks, recurrent networks, etc.

Claims

1. A few-sample learning method based on the generalized category mixture distribution and the hypothesis of complementary bias, characterized in that it includes the following steps:
Step S1, Data Acquisition and Segmentation: acquire the Omniglot handwritten character dataset, generate the target dataset by using the data corresponding to the target category to be recognized as the target data, and generate the background dataset by using the data corresponding to categories that are different from the target category but belong to the same general category as the background data;
Step S2, data mixing: the target data and the background data are mixed according to a preset ratio to obtain the first training dataset;
Step S3, data preprocessing: preprocess the first training dataset to obtain the preprocessed first training dataset;
Step S4, train the generator: use the preprocessed first training dataset to train the generator using the preset algorithm to obtain the trained generator;
Step S5, sample generation: use the trained generator to generate new sample data with the same category as the target data but different attributes, and merge all the new sample data and all the target data to obtain the second training dataset;
Step S6, train the classifier for the target category: use the second training dataset to train the classifier for the target category, thereby obtaining the trained classifier;
Step S7, prediction: use the trained classifier to predict new data corresponding to the target category;
wherein step S5 specifically includes the following sub-steps:
Step S5-1: select one target data point M from the target dataset, and select two background data points corresponding to any one category from the background dataset, denoted as $B_a$ and $B_b$ respectively;
Step S5-2: input the target data M and the two background data $B_a$, $B_b$ into the trained generator to obtain the corresponding latent vectors $Z_m$, $Z_a$, $Z_b$;
Step S5-3: calculate the attribute change $D_{ab}$ between the latent vectors $Z_a$ and $Z_b$ using the following formula:
where std is the statistical variance of the latent variable {Z} in each dimension of the entire background dataset;
Step S5-4: calculate the latent vector Z' corresponding to the new sample data M' using the following formula: $Z' = Z_m + D_{ab}$;
Step S5-5: input the latent vector Z' into the trained generator to obtain new sample data M';
Step S5-6: merge all the new sample data and all the target data to obtain the second training dataset.

2. The few-sample learning method based on the generalized class mixture distribution and the bias complementarity hypothesis according to claim 1, characterized in that, in step S4, the preset algorithm is any one of the following: autoregressive model (AR), variational autoencoder (VAE), generative adversarial network (GAN), and reversible generative flow model (GLOW).

3. The few-sample learning method based on the generalized class mixture distribution and the bias complementarity hypothesis according to claim 1, characterized in that, in step S2, the preset ratio is S:1, where S is a positive integer greater than or equal to 1.

4. The few-sample learning method based on the generalized class mixture distribution and the bias complementarity hypothesis according to claim 1, characterized in that step S3 specifically involves first scaling all data in the first training dataset to a size of P×Q, then scaling the value range of each data point to [e, f], with a mean m and a variance d, thereby obtaining the preprocessed first training dataset.