A data augmentation method
By assessing the difficulty of data samples and using the attention map of the Transformer model for saliency detection and label mixing, the method addresses the high computational cost and slow speed of existing saliency detectors and achieves more efficient data augmentation.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents(China)
- Current Assignee / Owner: FOSHAN NANHAI GUANGDONG TECH UNIV CNC EQUIP COOP INNOVATION INST
- Filing Date: 2023-10-31
- Publication Date: 2026-06-12
Figure CN117576500B_ABST (abstract drawing)
Description
Technical Field
[0001] This invention relates to the field of machine vision technology, and in particular to a data augmentation method.
Background Technology
[0002] Various data augmentation methods have been proposed for computer vision tasks. Mixup is one of the most successful data augmentation methods, commonly used in image classification. Mixup is an algorithm used in computer vision to augment images by mixing images from different classes, thereby expanding the training dataset. Recent advances in Mixup methods have mainly focused on saliency-based mixing.
[0003] However, saliency-based Mixup methods have several limitations: the saliency of the mixed label set still needs improvement; gradient-based methods make saliency-aware data augmentation slow; and many saliency detectors require substantial computation, in part because Transformer models typically have a large number of parameters.
[0004] Therefore, a new data augmentation method is urgently needed to solve the above problems.
Summary of the Invention
[0005] This invention provides a data augmentation method aimed at addressing the problems in existing technologies, such as the need to improve the saliency of mixed labeled sets, the slow speed of gradient-based saliency-aware data augmentation, and the large amount of computation required by many saliency detectors.
[0006] The data augmentation method includes the following steps:
[0007] S1. Obtain data samples from the dataset;
[0008] S2. Evaluate multiple instances in the data sample according to a preset function to obtain an evaluation difficulty score. Determine whether to perform data augmentation on the instance based on the evaluation difficulty score. If so, proceed to step S3.
[0009] S3. Perform saliency detection and optimized assignment on the instances using the attention map of the Transformer model to obtain the hybrid target corresponding to each instance;
[0010] S4. Perform label mixing on the instances and their corresponding hybrid targets to obtain a hybrid label set;
[0011] S5. Perform data augmentation on the hybrid label set and output a data augmentation set, which is used as the training set for the baseline model.
[0012] Preferably, in step S2, the formula for calculating the assessment difficulty score is as follows:
[0013]
[0014] where the formula relates the difficulty evaluation function, $X^{(i)}$ (the intermediate tokens of the i-th instance), $Y^{(i)}$ (the predicted target value for the i-th instance), and ScoreNet (the parameterized module).
[0015] Preferably, in step S3, the saliency detection of the instance satisfies the following relationship:
[0016]
[0017] where t = 1, 2, ..., n denotes the token index; $S_t$ denotes the saliency score; and A denotes the saliency of the i-th layer tokens of the instance, inferred from layer i to layer i+l (l ≥ 0), which satisfies the following relation:
[0018] $A = \Phi^{(i)} \cdot \Phi^{(i+1)} \cdots \Phi^{(i+l)}$;
[0019] where H denotes the number of heads in the multi-head attention layer, and the per-head term denotes the h-th attention head of the i-th layer.
[0020] Preferably, in step S4, the hybrid label set satisfies the following relationship:
[0021]
[0022] where the formula involves the hybrid label set, the preset binary mask $M_t$, the saliency score of the instance, the saliency score of its paired instance, the predicted target value $Y^{(j)}$ of the paired instance, and the predicted target value $Y^{(i)}$ of the instance.
[0023] Preferably, in step S5, the data augmentation set is obtained by connecting the token set of the current layer and the token set of the previous layer in the hybrid label set through a vertical cross-attention mechanism, wherein the vertical cross-attention mechanism satisfies the following relationship:
[0024] $Z = \mathrm{Softmax}\left(X_1 W_q (X_2 W_k)^{T}\right) X_2 W_v$;
[0025] where $W_q$, $W_k$, and $W_v$ denote the query, key, and value projection layers, respectively; $X_1$ denotes the token set of the current layer; and $X_2$ denotes the token set of the previous layer to be concatenated.
[0026] Preferably, in step S1, the dataset is based on ImageNet-1k and CIFAR.
[0027] Preferably, in step S5, the baseline model is based on vanilla ViT-B / 16 and CCT.
[0028] Compared with existing technologies, the data augmentation method provided by this invention includes the following steps: obtaining data samples from a dataset; evaluating a number of instances in the data samples according to a preset function to obtain an evaluation difficulty score; determining whether to perform data augmentation on the instances based on the evaluation difficulty score; if so, proceeding to the next step; performing saliency detection and optimized assignment on the instances using the attention map of the Transformer model to obtain the hybrid target corresponding to each instance; performing label mixing on the instances and the corresponding hybrid targets to obtain a hybrid label set; and performing data augmentation on the hybrid label set to output a data augmentation set. Through the above steps, this invention effectively improves the saliency of the hybrid label set, provides faster saliency-aware data augmentation, and achieves multi-scale feature augmentation by performing hybrid labeling on a single instance.
Attached Figure Description
[0029] The present invention will now be described in detail with reference to the accompanying drawings. The above and other aspects of the present invention will become clearer and more readily understood through the detailed description following the accompanying drawings. In the drawings:
[0030] Figure 1 is a flowchart of the data augmentation method provided in an embodiment of the present invention.
Detailed Implementation
[0031] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.
[0032] Please refer to Figure 1, which is a flowchart of a data augmentation method provided in an embodiment of the present invention. The present invention provides a data augmentation method comprising the following steps:
[0033] S1. Obtain data samples from the dataset;
[0034] In this embodiment of the invention, the dataset is based on ImageNet-1k and CIFAR.
[0035] S2. Evaluate multiple instances in the data sample according to a preset function to obtain an evaluation difficulty score. Determine whether to perform data augmentation on the instance based on the evaluation difficulty score. If so, proceed to step S3.
[0036] In this embodiment of the invention, a preset score threshold τ is used to determine whether data augmentation is needed, together with a parameterized module, ScoreNet. ScoreNet is a simple multi-layer perceptron (MLP) that predicts the target value $Y^{(i)}$ from the intermediate tokens $X^{(i)}$. Given the ScoreNet output, the difficulty score is evaluated using the prediction loss, satisfying the following relationship:
[0037]
[0038] where the formula relates the difficulty evaluation function, $X^{(i)}$ (the intermediate tokens of the i-th instance), $Y^{(i)}$ (the predicted target value for the i-th instance), and ScoreNet (the parameterized module).
[0039] If the difficulty score is greater than the set threshold τ, the data sample is considered sufficiently hard and is left unmixed. Otherwise, the data sample is considered easy: it is paired with other data samples in the mini-batch and augmented by token-level mixing.
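The gating rule of step S2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `score_net` is a hypothetical one-hidden-layer MLP over mean-pooled tokens, cross-entropy is assumed as the prediction loss, and all shapes, weights, and the value of τ are made up for the example.

```python
import numpy as np

def score_net(tokens, w1, w2):
    """Hypothetical ScoreNet: mean-pool tokens, then a small MLP -> class logits."""
    pooled = tokens.mean(axis=1)              # (batch, dim)
    hidden = np.maximum(pooled @ w1, 0.0)     # ReLU
    return hidden @ w2                        # (batch, classes)

def difficulty_scores(logits, targets):
    """Per-sample cross-entropy, used here as the assumed prediction loss."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(targets * log_probs).sum(axis=1)

def select_for_mixing(scores, tau):
    """True where the sample is easy (score <= tau) and may be mixed."""
    return scores <= tau

rng = np.random.default_rng(0)
batch, n_tokens, dim, classes = 4, 8, 16, 3
tokens = rng.normal(size=(batch, n_tokens, dim))
targets = np.eye(classes)[rng.integers(0, classes, size=batch)]
w1 = rng.normal(size=(dim, 32)) * 0.1
w2 = rng.normal(size=(32, classes)) * 0.1

scores = difficulty_scores(score_net(tokens, w1, w2), targets)
mix_mask = select_for_mixing(scores, tau=1.0)   # hard samples stay unmixed
```

Samples whose entry in `mix_mask` is False correspond to the hard samples that the text leaves unmixed.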
[0040] S3. Perform saliency detection and optimized assignment on the instances using the attention map of the Transformer model to obtain the hybrid target corresponding to each instance;
[0041] In this embodiment of the invention, the Transformer attention map is used instead of a computationally intensive gradient-based saliency detector. By propagating attention across layers, the saliency of the i-th layer tokens can be inferred from layer i to layer i+l, according to the following relationship:
[0042] $A = \Phi^{(i)} \cdot \Phi^{(i+1)} \cdots \Phi^{(i+l)}$ (2)
[0043] where H denotes the number of heads in the multi-head attention layer, and the per-head term denotes the h-th attention head of the layer.
[0044] The saliency score $S_t$ can be calculated as follows:
[0045]
[0046] In the formula, t = 1, 2, ..., n denotes the token index.
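The attention-based saliency computation can be sketched as below. The product of per-layer maps follows Equation (2); two details are assumptions made only for this sketch, because the corresponding formulas are not legible in the text: each layer map is taken as the mean over its H heads, and the saliency score of token t is taken as the mean attention that token t receives (the column mean of A). The helper `random_attention` only generates toy inputs.

```python
import numpy as np

def layer_attention(heads):
    """Average (H, n, n) per-head attention maps into one (n, n) map (assumed)."""
    return heads.mean(axis=0)

def rollout(per_layer_heads):
    """A = Phi^(i) . Phi^(i+1) ... Phi^(i+l), multiplied in the order given."""
    maps = [layer_attention(h) for h in per_layer_heads]
    a = maps[0]
    for phi in maps[1:]:
        a = a @ phi
    return a

def saliency_scores(a):
    """Assumed S_t: mean attention received by token t (column mean of A)."""
    return a.mean(axis=0)

def random_attention(rng, heads, n):
    """Toy row-stochastic attention maps, shape (heads, n, n)."""
    logits = rng.normal(size=(heads, n, n))
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
layers = [random_attention(rng, heads=4, n=6) for _ in range(3)]
A = rollout(layers)          # (6, 6), rows still sum to 1
S = saliency_scores(A)       # (6,), one score per token
```

Only matrix products over attention maps that the forward pass already produces are needed, which is why this route avoids the backward pass of a gradient-based detector.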
[0047] Based on the estimated token saliency, the goal is to maximize the overall saliency level by optimally assigning a different mixing target to each instance. First, the saliency difference for an arbitrary instance pair (i, j) is defined and calculated using the following formula:
[0048]
[0049] where i, j = 1, 2, ..., b, and the saliency term for index i comes from the saliency map of the i-th instance. The objective is then expressed as an optimization problem:
[0050]
[0051] M is the set of all possible permutations of the b instances in the batch, and σ denotes an arbitrary permutation. ρ is the threshold hyperparameter that controls the minimum saliency gain required for token mixing, and $r_i$ is the binary decision vector of the i-th instance. That is, when $r_{i,t} = 1$, token t of instance i is replaced with token t of instance σ(i); when $r_{i,t} = 0$ (t = 1, 2, ..., n), the token is retained.
[0052] The optimization problem described above can be solved exactly using the Hungarian matching algorithm. The matching algorithm requires a score matrix C, calculated as follows:
[0053]
[0054] where each entry $C_{i,j}$ of C is the maximum saliency gain produced by mixing $X^{(i)}$ and $X^{(j)}$. The actual maximum gain is reached when ρ = 0, but this invention sets a positive threshold to control the minimum saliency gain required for each replaced token. Then, by applying the Hungarian algorithm, the optimal batch permutation $\sigma^{*}$ can be found. The optimal batch permutation satisfies the following relationship:
[0055]
[0056] The objective is thus optimized by mixing each instance $X^{(i)}$ with its corresponding paired instance $X^{(\sigma^{*}(i))}$. By incorporating the sample difficulty assessment into (5), the final objective can be written as:
[0057]
[0058]
[0059] where the difficulty term denotes the difficulty assessment function, τ is the difficulty threshold, $\mathbf{1}$ denotes the all-ones vector, and $\mathbb{1}$ denotes the indicator function.
[0060] If the data sample is hard, the constraint becomes $\mathbf{1}^{T} r_i \le 0$, which forces every element of $r_i$ to be 0. In this case, the tokens are not mixed. On the other hand, if the data sample is easy, the constraint becomes $\mathbf{1}^{T} r_i \le n$, which is satisfied by any $r_i$ vector, and the objective coincides with that expressed in Equation 5.
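The pairing step can be sketched as below, under several assumptions that the text does not confirm: the saliency difference of a pair is taken as $S_j - S_i$ (by analogy with the difference $P_{i,l(i)} = S_{l(i)} - S_i$ given for vertical mixing), the score $C_{i,j}$ sums only the per-token differences that exceed ρ, and hard samples get a zero row so that their tokens are never replaced. An exhaustive search over permutations stands in for the Hungarian algorithm here; it returns the same optimum but is practical only for very small batches.

```python
import itertools
import numpy as np

def score_matrix(saliency, rho, easy):
    """C[i, j]: assumed gain from replacing tokens of i with tokens of j.

    saliency: (b, n) per-instance token saliency; easy: (b,) boolean gate.
    """
    diff = saliency[None, :, :] - saliency[:, None, :]   # diff[i, j] = S_j - S_i
    gain = np.where(diff > rho, diff, 0.0).sum(axis=-1)  # count gains above rho only
    return gain * easy[:, None]                          # hard samples: r_i = 0

def best_permutation(c):
    """Exhaustive search over batch permutations (stand-in for Hungarian)."""
    b = c.shape[0]
    best, best_val = None, -np.inf
    for perm in itertools.permutations(range(b)):
        val = sum(c[i, perm[i]] for i in range(b))
        if val > best_val:
            best, best_val = perm, val
    return best, best_val

saliency = np.array([[0.7, 0.2, 0.1],
                     [0.1, 0.8, 0.1],
                     [0.2, 0.2, 0.6]])
easy = np.array([True, True, False])      # third sample is hard, never mixed
C = score_matrix(saliency, rho=0.05, easy=easy)
sigma, total = best_permutation(C)        # sigma[i] is the partner of instance i
```

For realistic batch sizes, an assignment solver such as SciPy's `scipy.optimize.linear_sum_assignment(C, maximize=True)` would replace the exhaustive search.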
[0061] S4. Perform label mixing on the instances and their corresponding hybrid targets to obtain a hybrid label set;
[0062] In this embodiment of the invention, mixing is performed on each instance $X^{(i)}$ and its corresponding hybrid target. Given the decision vector $r_i$ defined in Equation (5), mixing is performed by replacing token t of the original instance i with token t of the paired instance $j = \sigma^{*}(i)$ wherever $r_{i,t} = 1$. In other words, only the token indices whose saliency gain is greater than ρ are mixed. This is implemented by defining an appropriate binary mask, as follows:
[0063]
[0064] where the score term is the saliency score. The mask is then applied as follows:
[0065]
[0066] where ⊙ denotes element-wise multiplication, and $X^{(i)}$ and $X^{(j)}$ denote the paired instances. The labels can be reassigned based on the saliency scores of the replaced tokens, and the new mixed labels can be calculated using the following formula:
[0067]
[0068] where the formula involves the hybrid label set, the preset binary mask $M_t$, the saliency score of the instance, the saliency score of its paired instance, the predicted target value $Y^{(j)}$ of the paired instance, and the predicted target value $Y^{(i)}$ of the instance.
[0069] Therefore, relatively unimportant tokens are replaced by salient tokens from another instance, and the labels are adjusted according to the change in the overall saliency level.
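Token and label mixing for one pair of instances can be sketched as below. The exact mask and label formulas are not legible in the text, so this sketch assumes one plausible form: the mask is 1 where token t of instance i is kept (saliency gain not above ρ) and 0 where it is replaced, and the mixed label weights each target by the saliency mass that its instance contributes to the mixed sample. The function name and argument layout are hypothetical.

```python
import numpy as np

def mix_tokens_and_labels(x_i, x_j, s_i, s_j, y_i, y_j, rho):
    """Replace low-saliency tokens of i with tokens of j and reweight the label.

    x_*: (n, d) tokens, s_*: (n,) saliency scores, y_*: (classes,) targets.
    """
    keep = (s_j - s_i <= rho).astype(x_i.dtype)    # mask: 1 keeps token t of i
    mixed_x = keep[:, None] * x_i + (1.0 - keep)[:, None] * x_j
    w_i = (keep * s_i).sum()                       # saliency retained from i
    w_j = ((1.0 - keep) * s_j).sum()               # saliency brought in from j
    mixed_y = (w_i * y_i + w_j * y_j) / (w_i + w_j)
    return mixed_x, mixed_y, keep

x_i = np.zeros((3, 2))
x_j = np.ones((3, 2))
s_i = np.array([0.7, 0.2, 0.1])
s_j = np.array([0.1, 0.8, 0.1])
y_i = np.array([1.0, 0.0])
y_j = np.array([0.0, 1.0])

mixed_x, mixed_y, keep = mix_tokens_and_labels(x_i, x_j, s_i, s_j, y_i, y_j, rho=0.05)
```

In this toy pair only the middle token is replaced, because it is the only position where the paired instance is more salient by more than ρ.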
[0070] S5. Perform data augmentation on the hybrid label set and output a data augmentation set, which is used as the training set for the baseline model.
[0071] In this embodiment of the invention, the baseline model is a simple model used for evaluating and optimizing machine learning tasks. Token-level augmentation within a single sample is provided by using tokens from previous layers.
[0072] Similar to Equation 5, an objective for vertical token mixing is defined:
[0073]
[0074] where L is the set of previous layer indices, and l(i) returns an arbitrary layer index for the i-th instance. This objective can be optimized using a scheme similar to that in Algorithm 1. However, considering that tokens with the same index in different layers often contain similar information, the saliency difference matrix $P_{i,l(i)} = S_{l(i)} - S_i$ is likely to be nearly constant across indices. This could lead to meaningless mixing without a significant increase in the saliency level. Therefore, a different approach is adopted for vertical token mixing.
[0075] The simplest approach is to concatenate all tokens from the previous layer and apply self-attention. However, naive concatenation can lead to excessive computation due to the quadratic complexity of self-attention. To reduce the computation, the k most salient tokens are selectively gathered from the previous layer and concatenated to the original token set. Furthermore, to preserve the input dimension, a cross-attention mechanism is employed. If $X_1$ is the token set of the current layer and $X_2$ is the token set of the previous layer to be concatenated, then vertical cross-attention is expressed as:
[0076] $Z = \mathrm{Softmax}\left(X_1 W_q (X_2 W_k)^{T}\right) X_2 W_v$ (13)
[0077] where $W_q$, $W_k$, and $W_v$ denote the query, key, and value projection layers, respectively; $X_1$ denotes the token set of the current layer; and $X_2$ denotes the token set of the previous layer to be concatenated.
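Equation (13) can be sketched as below. The formula is implemented literally, so queries come from the current-layer tokens and keys and values come from the gathered previous-layer tokens, with no scaling factor since none appears in the equation. The helper `top_k_tokens`, the random projection matrices, and the sizes n, d, and k are assumptions made for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def top_k_tokens(tokens, saliency, k):
    """Gather the k most salient tokens (rows) of a previous layer."""
    idx = np.argsort(saliency)[::-1][:k]
    return tokens[idx]

def vertical_cross_attention(x1, x2, w_q, w_k, w_v):
    """Z = Softmax(X1 Wq (X2 Wk)^T) X2 Wv; Z has as many rows as X1."""
    attn = softmax((x1 @ w_q) @ (x2 @ w_k).T)     # (n1, n2), rows sum to 1
    return attn @ (x2 @ w_v)                      # (n1, d)

rng = np.random.default_rng(0)
n, d, k = 6, 8, 3
current = rng.normal(size=(n, d))                 # X1: current-layer tokens
previous = rng.normal(size=(n, d))                # tokens of a previous layer
prev_saliency = rng.random(n)                     # toy saliency scores
x2 = top_k_tokens(previous, prev_saliency, k)     # X2: k gathered tokens

w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
Z = vertical_cross_attention(current, x2, w_q, w_k, w_v)
```

Because the queries come only from the current layer, `Z` keeps the current layer's token count, which is how the input dimension is preserved, and the attention cost grows with n times k rather than quadratically in the concatenated length.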
[0078] In this embodiment of the invention, CCT (Compact Convolutional Transformer) and vanilla ViT-B/16 were each selected as baseline models. The data augmentation method provided by this invention was evaluated on three representative image classification datasets: CIFAR-10, CIFAR-100, and ImageNet-1K. Experimental results show that, compared to the Mixup method, the data augmentation method provided by this invention improves the average classification accuracy by 0.3, 0.6, and 1.1 percentage points on CIFAR-10, CIFAR-100, and ImageNet-1K, respectively.
[0079] Compared with existing technologies, the data augmentation method provided by this invention includes the following steps: obtaining data samples from a dataset; evaluating a number of instances in the data samples according to a preset function to obtain an evaluation difficulty score; determining whether to perform data augmentation on the instances based on the evaluation difficulty score; if so, proceeding to the next step; performing saliency detection and optimized assignment on the instances using the attention map of the Transformer model to obtain the hybrid target corresponding to each instance; performing label mixing on the instances and the corresponding hybrid targets to obtain a hybrid label set; and performing data augmentation on the hybrid label set to output a data augmentation set. Through the above steps, this invention effectively improves the saliency of the hybrid label set, provides faster saliency-aware data augmentation, and achieves multi-scale feature augmentation by performing hybrid labeling on a single instance.
[0080] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.
[0081] The embodiments of the present invention have been described above with reference to the accompanying drawings. The disclosed embodiments are merely preferred embodiments of the present invention. However, the present invention is not limited to the specific embodiments described above. The specific embodiments described above are merely illustrative and not restrictive. Those skilled in the art can make many equivalent changes in form without departing from the spirit and scope of the claims of the present invention, and all such changes are within the protection scope of the present invention.
Claims
1. A data augmentation method, characterized in that the data augmentation method includes the following steps:
S1. Obtain data samples from the dataset;
S2. Evaluate multiple instances in the data sample according to a preset function to obtain an evaluation difficulty score. Determine whether to perform data augmentation on the instance based on the evaluation difficulty score. If so, proceed to step S3;
S3. Perform saliency detection and optimized assignment on the instances using the attention map of the Transformer model to obtain the hybrid target corresponding to each instance;
S4. Perform label mixing on the instances and their corresponding hybrid targets to obtain a hybrid label set;
S5. Perform data augmentation on the hybrid label set and output a data augmentation set, which is used as the training set for the baseline model;
In step S2, the assessment difficulty score is calculated by a formula that relates the difficulty evaluation function, the intermediate tokens $X^{(i)}$ of the i-th instance, the predicted target value $Y^{(i)}$ of the i-th instance, and the parameterized module ScoreNet;
In step S3, the saliency detection of the instance satisfies a relationship in which t = 1, 2, ..., n denotes the token index, $S_t$ denotes the saliency score, and A denotes the saliency of the i-th layer tokens of the instance, inferred from layer i to layer i+l (l ≥ 0), which satisfies the following relation: $A = \Phi^{(i)} \cdot \Phi^{(i+1)} \cdots \Phi^{(i+l)}$; where H denotes the number of heads in the multi-head attention layer, and the per-head term denotes the h-th attention head of the i-th layer;
In step S4, the hybrid label set satisfies a relationship that involves the hybrid label set, the preset binary mask $M_t$, the saliency score of the instance, the saliency score of its paired instance, the predicted target value $Y^{(j)}$ of the paired instance, and the predicted target value $Y^{(i)}$ of the instance;
In step S5, the data augmentation set is obtained by connecting the token set of the current layer and the token set of the previous layer in the hybrid label set through a vertical cross-attention mechanism. The vertical cross-attention mechanism satisfies the following relationship: $Z = \mathrm{Softmax}\left(X_1 W_q (X_2 W_k)^{T}\right) X_2 W_v$; where $W_q$, $W_k$, and $W_v$ denote the query, key, and value projection layers, respectively; $X_1$ denotes the token set of the current layer; and $X_2$ denotes the token set of the previous layer to be concatenated.
2. The data augmentation method as described in claim 1, characterized in that, In step S1, the dataset is based on ImageNet-1k and CIFAR.
3. The data augmentation method as described in claim 1, characterized in that, In step S5, the baseline model is based on vanilla ViT-B / 16 and CCT.