A data augmentation system and method based on target pose perturbation

A data augmentation method based on target pose perturbation addresses the difficulty that traditional methods have in changing the geometric shape and semantic features of images. It generates augmented images that are diverse and semantically consistent, improving the generalization ability and robustness of the model.

CN119339180B · Active · Publication Date: 2026-06-26 · TIANJIN UNIV


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: TIANJIN UNIV
Filing Date: 2024-10-18
Publication Date: 2026-06-26


Abstract

The application discloses a data augmentation system and method based on target pose perturbation. The system comprises a key point extraction module, a key point perturbation module and a deformation module. The key point extraction module extracts key points from an original image: it takes the original image as input and outputs its key points as the pre-perturbation key points. The key point perturbation module perturbs the positions of the key points to generate new key points: it takes the pre-perturbation key points as input and outputs the new key points as the post-perturbation key points. The deformation module shifts the pre-perturbation key points to the post-perturbation key points on the original image and generates the enhanced image corresponding to the original image: using an iterative interpolation method, it moves the pre-perturbation key points on the original image to the positions of the corresponding post-perturbation key points and outputs the enhanced image. By perturbing the target pose, the application addresses the problem that traditional data augmentation methods cannot generate geometrically deformed samples and lack diversity.


Description

Technical Field

[0001] This invention relates to the field of machine learning, and in particular to a data augmentation system and method based on target pose perturbation.

Background Technology

[0002] Currently, data augmentation aims to modify or expand given data (usually images or portions of images) using a set of predefined transformations or augmentations. Essentially, it generates new samples that conform to the original distribution from a large pool of existing data. Existing data samples typically refer to images already in the model's training data, belonging to known categories and environments. New samples, on the other hand, refer to images with newly generated styles. These images place higher demands on the model's generalization ability.

[0003] Traditional data augmentation methods are typically limited to processing single or multiple images at the pixel level. These methods, based on simple image space operations, have significant limitations in handling changes to texture and geometry, making it difficult to significantly enhance the diversity and novelty of the underlying information in the original data. This limits their effectiveness in introducing new knowledge and improving model generalization. Introducing strategies that alter geometric features can effectively expand the representational power of the original data, providing the model with more new knowledge, which is crucial for improving model performance. However, most existing methods are limited to local geometric transformations and fail to adequately consider broader scenarios, resulting in limited performance in complex applications.

[0004] Furthermore, most existing feature-space-based data augmentation methods utilize information from the feature level to augment data by manipulating the data in the intermediate layers of the neural network. In the feature space, the data has already been transformed into more abstract representations by the first few layers of the neural network, and these representations typically capture the key semantic features of the input data. Therefore, performing data augmentation in the feature space can better maintain the semantic consistency of the samples, making the generated augmented samples more reasonable and meaningful. However, compared to pixel-space augmentation methods, feature-space operations are often difficult to visualize and intuitively understand, and may introduce spurious samples that are inconsistent with the original data distribution, thus negatively impacting model training.

[0005] In conclusion, to maintain model performance across a wider range of applications, especially in situations of data scarcity or imbalanced distribution, there is a pressing need to introduce more universal data augmentation methods. These methods should not only effectively expand the size of the dataset but also avoid introducing noise or invalid information to ensure the quality of the augmented data. Therefore, for data augmentation tasks, it is essential to consider not only the diversity and appropriateness of the generated images but also the consistency and correlation between the generated samples and the original data distribution, thereby truly improving the model's generalization ability and robustness.

Summary of the Invention

[0006] This invention provides a data augmentation system and method based on target pose perturbation to solve the technical problems existing in the prior art.

[0007] The technical solution adopted by this invention to solve the technical problems existing in the prior art is as follows:

[0008] A data augmentation system based on target pose perturbation includes a key point extraction module, a key point perturbation module, and a deformation module;

[0009] The key point extraction module is used to extract key points from the original image; it takes the original image as input and outputs the key points of the original image; the key points output are used as the key points before perturbation.

[0010] The keypoint perturbation module is used to perturb the position of keypoints and generate new keypoints; it takes the keypoints before perturbation as input and outputs the new keypoints; the keypoints it outputs are used as the keypoints after perturbation.

[0011] The deformation module is used to shift key points before perturbation to key points after perturbation based on the original image and generate an enhanced image of the corresponding original image. It takes the original image and the corresponding key points before and after perturbation as input. It uses an iterative interpolation method to shift the key points before perturbation on the original image to the corresponding key point positions after perturbation and outputs the enhanced image.

[0012] Furthermore, the deformation module uses thin-plate spline interpolation to shift the key points on the original image before perturbation to the corresponding key point positions after perturbation.

[0013] Furthermore, the key point extraction module includes a Gaussian difference pyramid and / or a skeleton extraction module; the Gaussian difference pyramid is used to extract candidate key points from the original image; the skeleton extraction module is used to extract the skeleton from the original image; and the key point extraction module extracts key points from the candidate key points and / or the skeleton.

[0014] Furthermore, it also includes a feature extraction module and a deformation loss calculation module;

[0015] The feature extraction module is used to extract features from the original image and the enhanced image; it takes the original image and the enhanced image as input respectively, and outputs feature maps of the original image and the enhanced image respectively.

[0016] The deformation loss calculation module is used to calculate the difference between the feature maps of the original image and the enhanced image; this difference is used as the deformation loss; when the deformation loss reaches the set threshold range, it outputs a deformation loss compliance signal to the deformation module. After receiving the deformation loss compliance signal, the deformation module stops iterative interpolation and uses the image generated before stopping iterative interpolation as the final enhanced image.

[0017] Furthermore, the feature extraction module includes the VGG feature extraction network.

[0018] The present invention also provides a data augmentation method based on target pose perturbation, which includes a key point extraction module, a key point perturbation module, and a deformation module.

[0019] The key point extraction module is used to extract key points from the original image; it takes the original image as input and outputs the key points of the original image; the output key points are then used as the key points before perturbation.

[0020] The key point perturbation module is used to perturb the position of the key points and generate new key points; the input is the key point before perturbation, and the output is the new key point; the output key point is used as the key point after perturbation.

[0021] The deformation module shifts the key points before perturbation to the key points after perturbation based on the original image and generates an enhanced image of the corresponding original image. The input is the original image and the corresponding key points before and after perturbation. The module uses an iterative interpolation method to shift the key points before perturbation on the original image to the corresponding key point positions after perturbation and outputs the enhanced image.

[0022] Furthermore, the deformation module uses thin-plate spline interpolation to shift the key points on the original image before perturbation to the corresponding key point positions after perturbation.

[0023] Furthermore, the key point extraction module is equipped with a Gaussian difference pyramid and / or a skeleton extraction module; the Gaussian difference pyramid is used to extract candidate key points from the original image; the skeleton extraction module is used to extract the skeleton from the original image; and the key point extraction module is configured to extract key points from the candidate key points and / or the skeleton accordingly.

[0024] Furthermore, a feature extraction module and a deformation loss calculation module are also included;

[0025] The feature extraction module is used to extract features from the original image and the enhanced image; it takes the original image and the enhanced image as input respectively, and outputs the feature maps of the original image and the enhanced image respectively.

[0026] The deformation loss calculation module calculates the difference between the feature maps of the original image and the enhanced image; this difference is used as the deformation loss; when the deformation loss reaches the set threshold range, it outputs a deformation loss compliance signal to the deformation module, so that the deformation module stops iterative interpolation after receiving the deformation loss compliance signal, and the image generated before stopping iterative interpolation is used as the final enhanced image.

[0027] Furthermore, the deformation loss calculation formula of the deformation loss calculation module is as follows:

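[0028] $L_{total} = L_{content} + \alpha L_{wrap} + \beta L_{tv} + E$;

[0029]

[0030] $L_{tv} = \sum_{e,f} \left( (I_{e+1,f} - I_{e,f})^2 + (I_{e,f+1} - I_{e,f})^2 \right)$;

[0031] $E = \sum_{j} w_i \varphi(\lVert p_j - p_i - \theta_i \rVert) + b \left[ \cot(\theta_{i,j}) L(i,j) + \cot(\theta_{j,i}) L(j,i) \right] (x_i - x_j)$;

[0032] Let $r = \lVert p_j - p_i - \theta_i \rVert$, then: $\varphi(r) = r^2 \log r$;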

[0033] In the formula:

[0034] $L_{total}$ is the total deformation loss;

[0035] $L_{content}$ is the content distortion loss between the original image and the enhanced image;

[0036] $L_{wrap}$ is the deformation loss between the original image and the enhanced image;

[0037] $L_{tv}$ is the total variation deformation loss between the original image and the enhanced image;

[0038] $E$ is the energy of rigid deformation;

[0039] $\alpha$ is the coefficient of the deformation loss;

[0040] $\beta$ is the coefficient of the total variation deformation loss;

[0041] The feature representation of the enhanced image at the v-th layer of the u-th block in the feature extraction module;

[0042] The feature representation of the original image at the v-th layer of the u-th block in the feature extraction module;

[0043] $w_i$ is the weight of the i-th key point;

[0044] $i$ and $j$ are the key point indices;

[0045] $\theta_i$ is the perturbation vector at the i-th key point;

[0046] $p_i$ is the coordinate vector, relative to the origin, of the i-th key point in the original image;

[0047] $p_j$ is the coordinate vector, relative to the origin, of the j-th key point in the original image;

[0048] $p_i'$ is the coordinate vector, relative to the origin, of the i-th key point of the original image in the enhanced image;

[0049] $I_{e,f}$ is the pixel of the enhanced image at position (e, f), where e is the horizontal coordinate and f is the vertical coordinate of the enhanced image;

[0050] $I_{e+1,f}$ is the pixel of the enhanced image at position (e+1, f);

[0051] $I_{e,f+1}$ is the pixel of the enhanced image at position (e, f+1);

[0052] $\theta_{i,j}$ is the angle between the perturbation vector of the i-th key point and the perturbation vector of the j-th key point;

[0053] $\theta_{j,i}$ is the angle between the perturbation vector of the j-th key point and the perturbation vector of the i-th key point;

[0054] $L(i,j)$ is the Laplace operator for the i-th key point and the j-th key point;

[0055] $L(j,i)$ is the Laplace operator for the j-th key point and the i-th key point;

[0056] $x_i$ is the x-coordinate of $p_i$;

[0057] $x_j$ is the x-coordinate of $p_j$;

[0058] $r$ is an intermediate variable;

[0059] $b$ is a coefficient;

[0060] $k$ is the number of key points.
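As a rough illustration of the total variation term defined above, the following Python sketch evaluates the sum of squared differences between each pixel and its right and lower neighbours. The function name `total_variation_loss` is hypothetical, and restricting the sum to pixels that have both neighbours is an assumption, since the text does not say how image borders are handled.

```python
import numpy as np

def total_variation_loss(image):
    """Sum of squared horizontal and vertical neighbour differences.

    image[f, e] holds the pixel I_{e,f}: e is the horizontal (column)
    coordinate and f is the vertical (row) coordinate.
    Assumption: only pixels with both a right and a lower neighbour are summed.
    """
    image = np.asarray(image, dtype=float)
    base = image[:-1, :-1]                  # I_{e,f}
    d_horizontal = image[:-1, 1:] - base    # I_{e+1,f} - I_{e,f}
    d_vertical = image[1:, :-1] - base      # I_{e,f+1} - I_{e,f}
    return float(np.sum(d_horizontal ** 2 + d_vertical ** 2))

flat = np.full((4, 4), 7.0)                 # a constant image has no variation
ramp = np.array([[0.0, 1.0], [2.0, 3.0]])   # one interior term: 1^2 + 2^2
tv_flat = total_variation_loss(flat)
tv_ramp = total_variation_loss(ramp)
```

A smoother enhanced image yields a smaller value, which is why such a term is commonly weighted into a loss to discourage noisy deformation results.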

[0061] The advantages and positive effects of this invention are:

[0062] 1. This invention proposes a data augmentation method based on target pose perturbation, aiming to solve the problem that traditional pixel-level data augmentation methods cannot effectively change the geometric shape of the target, while also making up for the lack of intuitive understanding of feature-level data augmentation. Pose perturbation can not only change the appearance of the target, but also enhance the diversity of the image without changing its semantic features.

[0063] 2. This invention proposes a method that combines an iterative style transfer framework with a rigid deformation approach to ensure that the geometric deformation in generated images is smooth and reasonable. The method allows local deformation of the generated image while preserving its overall structure, so the deformation looks natural and free of distortion. At the same time, the iterative style transfer framework effectively generates a variety of potential styles. When the same key point perturbation is applied to the segmentation mask, the mask undergoes the same geometric changes as the original image, which makes the method suitable for semantic segmentation tasks.

Attached Figure Description

[0064] Figure 1 is a schematic diagram of the workflow of a data augmentation system based on target pose perturbation according to the present invention.

[0065] In the figure:

[0066] P represents the key point; θ represents the perturbation angle.

Detailed Implementation

[0067] The present invention will now be described in detail with reference to the accompanying drawings and embodiments. It should be understood that the preferred embodiments described herein are for illustration and explanation only and are not intended to limit the present invention.

[0068] Referring to Figure 1, a data augmentation system based on target pose perturbation includes a key point extraction module, a key point perturbation module, and a deformation module;

[0069] The key point extraction module is used to extract key points from the original image; it takes the original image as input and outputs the key points of the original image; the key points output are used as the key points before perturbation.

[0070] The keypoint perturbation module is used to perturb the position of keypoints and generate new keypoints; it takes the keypoints before perturbation as input and outputs the new keypoints; the keypoints it outputs are used as the keypoints after perturbation.

[0071] The deformation module is used to shift key points before perturbation to key points after perturbation based on the original image and generate an enhanced image of the corresponding original image. It takes the original image and the corresponding key points before and after perturbation as input. It uses an iterative interpolation method to shift the key points before perturbation on the original image to the corresponding key point positions after perturbation and outputs the enhanced image.

[0072] The key point perturbation module randomly perturbs the extracted key points, thereby causing the generated image to exhibit geometric deformation.
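A minimal sketch of such a random perturbation is shown below. The function name `perturb_keypoints`, the uniform distribution, the `max_shift` parameter and the clipping to the image bounds are all assumptions made for illustration; the text only states that the key points are perturbed randomly.

```python
import numpy as np

def perturb_keypoints(keypoints, max_shift, image_shape, rng=None):
    """Shift each (x, y) key point by a random offset of at most max_shift per axis."""
    rng = np.random.default_rng() if rng is None else rng
    keypoints = np.asarray(keypoints, dtype=float)
    # One independent offset vector per key point (assumed uniform distribution).
    offsets = rng.uniform(-max_shift, max_shift, size=keypoints.shape)
    perturbed = keypoints + offsets
    # Keep the perturbed points inside the image (assumed policy).
    height, width = image_shape
    perturbed[:, 0] = np.clip(perturbed[:, 0], 0, width - 1)
    perturbed[:, 1] = np.clip(perturbed[:, 1], 0, height - 1)
    return perturbed

pre = np.array([[10.0, 20.0], [40.0, 15.0], [30.0, 50.0]])   # key points before perturbation
post = perturb_keypoints(pre, max_shift=5.0, image_shape=(64, 64),
                         rng=np.random.default_rng(0))       # key points after perturbation
```

Passing a seeded generator makes the augmentation reproducible, which can help when comparing training runs.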

[0073] Preferably, the deformation module can use thin-plate spline interpolation to shift the key points on the original image before perturbation to the corresponding key point positions after perturbation.

[0074] For a given set of key points before and after perturbation, the deformation module employs thin-plate spline interpolation to generate a smooth warping function that maps the key points before perturbation to those after. Because thin-plate spline interpolation minimizes an energy function, the module not only matches the target points exactly at the given control points but also minimizes the total deformation energy of the underlying physical model, thus ensuring the smoothness and reasonableness of the interpolation results.
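The thin-plate spline mapping described here can be sketched in a few lines of NumPy. This is a single-pass illustration under stated assumptions, not the iterative scheme of the invention: the helper names (`fit_tps`, `apply_tps`, `warp_image`) are hypothetical, the image is warped backward (each output pixel looks up its source location), and nearest-neighbour sampling is used for brevity. At least three non-collinear, distinct control points are needed for the linear system to be solvable.

```python
import numpy as np

def tps_kernel(r):
    """Radial basis phi(r) = r^2 log r, with phi(0) taken as 0."""
    out = np.zeros_like(r)
    positive = r > 0
    out[positive] = r[positive] ** 2 * np.log(r[positive])
    return out

def fit_tps(control, values):
    """Solve the TPS linear system so that the spline maps control[i] to values[i]."""
    n = control.shape[0]
    dists = np.linalg.norm(control[:, None, :] - control[None, :, :], axis=-1)
    affine = np.hstack([np.ones((n, 1)), control])   # columns: 1, x, y
    system = np.zeros((n + 3, n + 3))
    system[:n, :n] = tps_kernel(dists)
    system[:n, n:] = affine
    system[n:, :n] = affine.T
    rhs = np.zeros((n + 3, 2))
    rhs[:n] = values
    return np.linalg.solve(system, rhs)

def apply_tps(params, control, points):
    """Evaluate the fitted spline at arbitrary (x, y) points."""
    n = control.shape[0]
    dists = np.linalg.norm(points[:, None, :] - control[None, :, :], axis=-1)
    affine = np.hstack([np.ones((points.shape[0], 1)), points])
    return tps_kernel(dists) @ params[:n] + affine @ params[n:]

def warp_image(image, pre_pts, post_pts):
    """Backward warp: content at pre_pts ends up at post_pts in the output."""
    h, w = image.shape[:2]
    params = fit_tps(post_pts, pre_pts)              # output coords -> source coords
    ys, xs = np.mgrid[0:h, 0:w]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    src = apply_tps(params, post_pts, grid)
    sx = np.clip(np.rint(src[:, 0]), 0, w - 1).astype(int)
    sy = np.clip(np.rint(src[:, 1]), 0, h - 1).astype(int)
    return image[sy, sx].reshape(image.shape)

image = np.arange(32 * 32, dtype=float).reshape(32, 32)
pre = np.array([[0, 0], [31, 0], [0, 31], [31, 31], [16, 16]], dtype=float)
post = pre.copy()
post[4] += [3.0, -2.0]                               # move only the centre key point
warped = warp_image(image, pre, post)
```

Fixing the four corner points while moving the centre point keeps the image frame in place and deforms only the interior, which is one simple way to obtain a local deformation with a preserved overall layout.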

[0075] Preferably, the key point extraction module may include a Gaussian difference pyramid and / or a skeleton extraction module; the Gaussian difference pyramid can be used to extract candidate key points from the original image; the skeleton extraction module can be used to extract the skeleton from the original image; and the key point extraction module can correspondingly extract key points from the candidate key points and / or the skeleton.

[0076] The difference-of-Gaussian pyramid can be used to extract key points from images containing ordinary objects, while the skeleton extraction module can be used to extract key points from images containing animals.

[0077] Keypoint Extraction Module: This module sorts candidate keypoints generated from the Difference of Gaussian Pyramid and / or Skeleton Extraction Module according to their response values and selects the best keypoints.
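The selection step can be illustrated with a short sketch. The function name `select_top_keypoints` and the parameter `k` are hypothetical, and ranking by the raw response value in descending order is an assumption; the text does not specify how many key points are kept or how ties are handled.

```python
import numpy as np

def select_top_keypoints(candidates, responses, k):
    """Return the k candidates with the largest response values, best first."""
    candidates = np.asarray(candidates)
    responses = np.asarray(responses, dtype=float)
    # Stable sort on the negated responses gives descending order.
    order = np.argsort(-responses, kind="stable")[:k]
    return candidates[order], responses[order]

candidates = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])   # (x, y) candidate key points
responses = np.array([0.2, 0.9, 0.5, 0.1])                # detector response per candidate
best_points, best_scores = select_top_keypoints(candidates, responses, k=2)
```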

[0078] The Difference-of-Gaussian (DoG) pyramid is an image processing technique used for multi-scale image feature detection, widely applied in image matching, feature extraction, and computer vision tasks. The DoG pyramid consists of two main parts:

[0079] Gaussian Blur: This part blurs the input image using Gaussian kernels of different scales, each kernel corresponding to a different standard deviation. Lower-level Gaussian blurred images retain more global or large-scale information, while higher-level blurred images emphasize more details and local information. In this way, the difference-of-gaussian pyramid can acquire image features at multiple scales. Low-scale images are suitable for processing large, smooth regions, while high-scale images retain more local edge and corner features, providing a foundation for subsequent edge, corner, and other feature detection.

[0080] Gaussian Difference: This part performs a difference operation on Gaussian blurred images at adjacent scales to generate a Gaussian difference image. This method uses an approximate Laplacian operator to detect edges and feature points. The difference operation can effectively remove irrelevant factors such as illumination and color from the image while preserving key information such as edges and contours. Specifically, difference images at different scales can simultaneously detect feature points and edges of different sizes, thereby generating multi-scale, multi-information feature maps. This process utilizes the smoothness and gradation characteristics of each layer of Gaussian blur image, enhancing the image's feature extraction capability at different scales and helping to handle complex geometric transformation problems.
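The two parts described above, blurring at several scales and differencing adjacent scales, can be sketched as follows. This is a simplified single-octave illustration in plain NumPy; the function names, the kernel radius of three standard deviations, the edge padding and the example sigma values are assumptions rather than details taken from the text.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at about three standard deviations."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur(image, sigma):
    """Separable Gaussian blur of a 2-D image with edge padding."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    padded = np.pad(image, radius, mode="edge")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def dog_stack(image, sigmas):
    """Difference images between Gaussian blurs at adjacent scales."""
    blurred = [gaussian_blur(image, s) for s in sigmas]
    return np.stack([b_next - b_prev for b_prev, b_next in zip(blurred[:-1], blurred[1:])])

impulse = np.zeros((21, 21))
impulse[10, 10] = 1.0                          # a single bright pixel
dog = dog_stack(impulse, sigmas=[1.0, 1.6, 2.56])
```

Because each blur is normalized, a uniformly lit region produces a difference of zero, which illustrates why the difference images suppress flat illumination while responding to edges and blobs.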

[0081] Skeleton Extraction Module: This module employs the animal skeleton extraction method from the MMPose open-source framework. It primarily consists of a skeleton regression network and a head network for detecting and regressing skeletal keypoints. The skeleton regression network extracts multi-scale features, helping to combine multi-scale information to enhance keypoint detection performance; the head network generates keypoint heatmaps.

[0082] Preferably, it may also include a feature extraction module and a deformation loss calculation module;

[0083] The feature extraction module can be used to extract features from the original image and the enhanced image; it can take the original image and the enhanced image as input respectively, and output the feature maps of the original image and the enhanced image respectively.

[0084] The deformation loss calculation module can be used to calculate the difference between the feature maps of the original image and the enhanced image; this difference can be used as the deformation loss; when the deformation loss reaches the set threshold range, it can output a deformation loss compliance signal to the deformation module. After receiving the deformation loss compliance signal, the deformation module stops iterative interpolation and uses the image generated before stopping iterative interpolation as the final enhanced image.

[0085] Preferably, the feature extraction module may include a VGG feature extraction network.

[0086] VGG Feature Extraction Network: VGG16 is a classic convolutional neural network architecture, originally proposed by the Visual Geometry Group (VGG) at Oxford University, for image classification and feature extraction. Its main function is to extract feature maps from the original image and the generated image, which are then used in subsequent networks to calculate deformation loss. The first few convolutional layers of the VGG network extract low-level features such as edges and textures, while the middle layers focus more on shapes and contours, and the higher convolutional layers capture more complex image structural information. When used for feature extraction, only the convolutional layer portion is used, and the pre-trained weights are fixed.

[0087] The present invention also provides a data augmentation method based on target pose perturbation, which includes a key point extraction module, a key point perturbation module, and a deformation module.

[0088] The key point extraction module is used to extract key points from the original image; it takes the original image as input and outputs the key points of the original image; the output key points are then used as the key points before perturbation.

[0089] The key point perturbation module is used to perturb the position of the key points and generate new key points; the input is the key point before perturbation, and the output is the new key point; the output key point is used as the key point after perturbation.

[0090] The deformation module shifts the key points before perturbation to the key points after perturbation based on the original image and generates an enhanced image of the corresponding original image. The input is the original image and the corresponding key points before and after perturbation. The module uses an iterative interpolation method to shift the key points before perturbation on the original image to the corresponding key point positions after perturbation and outputs the enhanced image.

[0091] The number of iterations for iterative interpolation can be constrained by setting a fixed value or by setting other parameters. The number of iterations can be set to 200 to 300.
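The stopping logic, a capped number of iterations combined with a loss threshold, might be organized as in the sketch below. The function `iterate_until_compliant`, its parameters and the toy update rule are hypothetical stand-ins: `step_fn` represents one interpolation step of the deformation module and `loss_fn` represents the deformation loss, neither of which is specified at this level of detail in the text.

```python
def iterate_until_compliant(step_fn, loss_fn, initial, max_iters=300,
                            loss_range=(0.0, 1e-3)):
    """Repeat step_fn until the loss falls in loss_range or max_iters is reached.

    Returns (result, iterations_used, compliant).
    """
    low, high = loss_range
    current = initial
    for iteration in range(1, max_iters + 1):
        current = step_fn(current)
        if low <= loss_fn(current) <= high:
            # Corresponds to the "deformation loss compliance signal".
            return current, iteration, True
    return current, max_iters, False

# Toy stand-ins: move a scalar halfway toward a target on every step.
target = 1.0
result, iterations, compliant = iterate_until_compliant(
    step_fn=lambda x: x + 0.5 * (target - x),
    loss_fn=lambda x: (x - target) ** 2,
    initial=0.0,
    max_iters=300,
)
```

In this toy run the loss is 0.25 to the power of the iteration count, so the threshold of 1e-3 is first met on the fifth step, well before the cap.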

[0092] Preferably, the deformation module can use thin-plate spline interpolation to shift the key points on the original image before the disturbance to the corresponding key point positions after the disturbance.

[0093] Preferably, the key point extraction module may be equipped with a Gaussian difference pyramid and / or a skeleton extraction module; a Gaussian difference pyramid may be used to extract candidate key points from the original image; a skeleton extraction module may be used to extract the skeleton from the original image; so that the key point extraction module extracts key points from the candidate key points and / or the skeleton accordingly.

[0094] Preferably, a feature extraction module and a deformation loss calculation module may also be provided;

[0095] The feature extraction module can be used to extract features from the original image and the enhanced image; it can take the original image and the enhanced image as inputs and output feature maps of the original image and the enhanced image respectively.

[0096] The deformation loss calculation module can be used to calculate the difference between the feature maps of the original image and the enhanced image; this difference can be used as the deformation loss; when the deformation loss reaches the set threshold range, it can output a deformation loss compliance signal to the deformation module, so that the deformation module stops iterative interpolation after receiving the deformation loss compliance signal, and the image generated before stopping iterative interpolation is used as the final enhanced image.

[0097] The deformation loss calculation module can calculate the deformation loss between the original image and the generated image, and feed it back to the deformation module to obtain a better enhanced image.

[0098] By organically combining these modules, this data augmentation method can generate more diverse samples, enrich datasets, and be widely applied to segmentation tasks in the field of computer vision.
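For segmentation tasks, the image and its label mask need to receive the same geometric change. The sketch below shows one way to do this, assuming the deformation is available as per-pixel source coordinates; the sinusoidal displacement field is a hypothetical stand-in for the mapping produced by the deformation module, and nearest-neighbour sampling is chosen so that mask labels stay discrete.

```python
import numpy as np

def sample_nearest(array, src_x, src_y):
    """Look up array at the (rounded, clipped) source coordinates of each output pixel."""
    h, w = array.shape[:2]
    sx = np.clip(np.rint(src_x), 0, w - 1).astype(int)
    sy = np.clip(np.rint(src_y), 0, h - 1).astype(int)
    return array[sy, sx]

image = np.random.default_rng(0).random((8, 8))
mask = (image > 0.5).astype(np.uint8)          # toy segmentation mask

ys, xs = np.mgrid[0:8, 0:8]
# Hypothetical displacement field: shift rows horizontally by a smooth amount.
src_x = xs + 1.5 * np.sin(ys / 8 * np.pi)
src_y = ys.astype(float)

# The same coordinate mapping is applied to both arrays.
warped_image = sample_nearest(image, src_x, src_y)
warped_mask = sample_nearest(mask, src_x, src_y)
```

Because both arrays are sampled at identical coordinates, every warped pixel keeps the label it had before the deformation.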

[0099] Preferably, the deformation loss calculation formula of the deformation loss calculation module can be as follows:

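[0100] $L_{total} = L_{content} + \alpha L_{wrap} + \beta L_{tv} + E$;

[0101]

[0102] $L_{tv} = \sum_{e,f} \left( (I_{e+1,f} - I_{e,f})^2 + (I_{e,f+1} - I_{e,f})^2 \right)$;

[0103] $E = \sum_{j} w_i \varphi(\lVert p_j - p_i - \theta_i \rVert) + b \left[ \cot(\theta_{i,j}) L(i,j) + \cot(\theta_{j,i}) L(j,i) \right] (x_i - x_j)$;

[0104] Let $r = \lVert p_j - p_i - \theta_i \rVert$, then: $\varphi(r) = r^2 \log r$;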

[0105] In the formula:

[0106] $L_{total}$ is the total deformation loss;

[0107] $L_{content}$ is the content distortion loss between the original image and the enhanced image;

[0108] $L_{wrap}$ is the deformation loss between the original image and the enhanced image;

[0109] $L_{tv}$ is the total variation deformation loss between the original image and the enhanced image;

[0110] $E$ is the energy of rigid deformation;

[0111] $\alpha$ is the coefficient of the deformation loss;

[0112] $\beta$ is the coefficient of the total variation deformation loss;

[0113] The feature representation of the enhanced image at the v-th layer of the u-th block in the feature extraction module;

[0114] The feature representation of the original image at the v-th layer of the u-th block in the feature extraction module;

[0115] $w_i$ is the weight of the i-th key point;

[0116] $i$ and $j$ are the key point indices;

[0117] $\theta_i$ is the perturbation vector at the i-th key point;

[0118] $p_i$ is the coordinate vector, relative to the origin, of the i-th key point in the original image;

[0119] $p_j$ is the coordinate vector, relative to the origin, of the j-th key point in the original image;

[0120] $p_i'$ is the coordinate vector, relative to the origin, of the i-th key point of the original image in the enhanced image;

[0121] $I_{e,f}$ is the pixel of the enhanced image at position (e, f), where e is the horizontal coordinate and f is the vertical coordinate of the enhanced image;

[0122] $I_{e+1,f}$ is the pixel of the enhanced image at position (e+1, f);

[0123] $I_{e,f+1}$ is the pixel of the enhanced image at position (e, f+1);

[0124] $\theta_{i,j}$ is the angle between the perturbation vector of the i-th key point and the perturbation vector of the j-th key point;

[0125] $\theta_{j,i}$ is the angle between the perturbation vector of the j-th key point and the perturbation vector of the i-th key point;

[0126] $L(i,j)$ is the Laplace operator for the i-th key point and the j-th key point;

[0127] $L(j,i)$ is the Laplace operator for the j-th key point and the i-th key point;

[0128] $x_i$ is the x-coordinate of $p_i$;

[0129] $x_j$ is the x-coordinate of $p_j$;

[0130] $r$ is an intermediate variable;

[0131] $b$ is a coefficient;

[0132] $k$ is the number of key points;

[0133] $\varphi(r)$ is the function of $r$ defined above.
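To make the formula concrete, the sketch below evaluates the radial function φ(r) = r² log r and combines already-computed loss terms into the weighted total. The function names and the numeric values are purely illustrative assumptions; taking φ(0) = 0 is the usual convention for this function and is not stated in the text, and the individual terms are assumed to have been computed elsewhere.

```python
import numpy as np

def radial_basis(r):
    """phi(r) = r^2 log r, with phi(0) taken as 0 (the limit as r approaches 0)."""
    r = np.asarray(r, dtype=float)
    out = np.zeros_like(r)
    positive = r > 0
    out[positive] = r[positive] ** 2 * np.log(r[positive])
    return out

def total_deformation_loss(l_content, l_wrap, l_tv, energy, alpha, beta):
    """Weighted sum L_total = L_content + alpha * L_wrap + beta * L_tv + E."""
    return l_content + alpha * l_wrap + beta * l_tv + energy

phi_values = radial_basis([0.0, 1.0, np.e])    # phi(1) = 0 and phi(e) = e^2
example_total = total_deformation_loss(l_content=1.0, l_wrap=2.0, l_tv=3.0,
                                       energy=4.0, alpha=0.5, beta=0.1)
```

The coefficients α and β control how strongly the deformation and smoothness terms count relative to the content term; the values used here are placeholders.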

[0134] The structure, workflow, and working principle of the present invention are further illustrated below with reference to a preferred embodiment:

[0135] Referring to Figure 1, a data augmentation system based on target pose perturbation includes a key point extraction module, a key point perturbation module, a deformation module, a feature extraction module, and a deformation loss calculation module.

[0136] The key point extraction module is used to extract key points from the original image; it takes the original image as input and outputs the key points of the original image; the key points output are used as the key points before perturbation.

[0137] The keypoint perturbation module is used to perturb the position of keypoints and generate new keypoints; it takes the keypoints before perturbation as input and outputs the new keypoints; the keypoints it outputs are used as the keypoints after perturbation.

[0138] The deformation module is used to shift key points before perturbation to key points after perturbation based on the original image and generate an enhanced image of the corresponding original image. It takes the original image and the corresponding key points before and after perturbation as input. It uses an iterative interpolation method to shift the key points before perturbation on the original image to the corresponding key point positions after perturbation and outputs the enhanced image.

[0139] The feature extraction module is used to extract features from the original image and the enhanced image; it can take the original image and the enhanced image as input respectively, and output the feature maps of the original image and the enhanced image respectively.

[0140] The deformation loss calculation module is used to calculate the difference between the feature maps of the original image and the enhanced image; this difference can be used as the deformation loss; when the deformation loss reaches the set threshold range, it can output a deformation loss compliance signal to the deformation module. After receiving the deformation loss compliance signal, the deformation module stops iterative interpolation and uses the image generated before stopping iterative interpolation as the final enhanced image.

[0141] To improve performance in data augmentation tasks, this invention proposes a novel data augmentation branch. The deformation module implements target pose perturbation, addressing the problem that traditional pixel-level data augmentation methods cannot effectively change the target's geometry. The deformation generated by pose perturbation not only alters the target's appearance but also enhances image diversity without changing its semantic features.

[0142] This invention incorporates key points in the original image as part of the image generation process. The method is based on iterative style transfer and a rigid deformation algorithm, generating more diverse samples through iterative key point-based generation, thereby enhancing the generalization ability of other segmentation models.

[0143] The key point extraction module may include a difference-of-Gaussian pyramid and/or a skeleton extraction module; the difference-of-Gaussian pyramid is used to extract candidate key points from the original image; the skeleton extraction module is used to extract the skeleton from the original image; and the key point extraction module extracts key points from the candidate key points and/or the skeleton.

[0144] Both the difference-of-Gaussian pyramid and the skeleton extraction module can be used to extract key points; one of them or both can be used simultaneously.

[0145] The difference-of-Gaussian pyramid uses a kernel function to blur the image layer by layer with Gaussian kernels of different standard deviations, producing blurred images at different scales. The kernel function G(x, y, σ) is as follows:

[0146] $G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$

[0147] In the formula:

[0148] x is the horizontal coordinate of the current pixel relative to the blur center pixel;

[0149] y is the vertical coordinate of the current pixel relative to the blur center pixel;

[0150] σ is the standard deviation of the Gaussian function;

[0151] $\frac{1}{2\pi\sigma^2}$ is the normalization coefficient of the Gaussian function;

[0152] The exponential part of the Gaussian function, $\exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$, represents the influence of the distance between a pixel and the center point on the blur weight. The farther a pixel is from the center, the smaller its weight.
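As an illustration of the kernel just described, the following Python sketch samples the standard two-dimensional Gaussian on a square pixel grid. The function name `gaussian_kernel`, the truncation radius of 3σ and the final renormalization of the discrete weights are assumptions for the example, not details taken from the text.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Sample G(x, y, sigma) = exp(-(x^2 + y^2) / (2 sigma^2)) / (2 pi sigma^2)."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))  # assumption: truncate the kernel at 3 sigma
    coords = np.arange(-radius, radius + 1)
    x, y = np.meshgrid(coords, coords)  # offsets relative to the blur center pixel
    kernel = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    # Renormalize so the truncated, discretized weights sum to 1.
    return kernel / kernel.sum()
```

The largest weight sits at the center of the returned array and the weights fall off with distance from it, which is the behavior the exponential part is described as producing.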

[0153] The difference-of-Gaussian pyramid blurs the original image layer by layer through the kernel function with different standard deviations, producing blurred images at different scales. The value of each pixel is then compared with its neighboring pixels in both space and scale. If a pixel is a maximum or minimum in both space and scale, it is considered a candidate key point.

[0154] The key point extraction module sorts the candidate key points by response value and retains those with the strongest responses.
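The candidate extraction and filtering described above can be sketched in Python as follows. This is a hedged illustration using NumPy only. The function names, the list of standard deviations, the 3×3×3 comparison neighborhood, the `min_response` cutoff for flat regions and the `top_k` limit are all assumptions; the text states only that extrema in space and scale are found and then ranked by response.

```python
import numpy as np

def gaussian_blur(image, sigma):
    """Separable Gaussian blur; kernel truncated at 3 sigma, borders edge-padded."""
    radius = int(np.ceil(3 * sigma))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-offsets**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(image, radius, mode="edge")
    # Convolve along rows, then columns; "valid" removes the padding again.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def local_extrema(stack):
    """Mask of samples equal to the max or min of their 3x3x3 neighborhood."""
    padded = np.pad(stack, 1, mode="edge")
    n_scales, height, width = stack.shape
    shifted = np.stack([
        padded[ds:ds + n_scales, dr:dr + height, dc:dc + width]
        for ds in range(3) for dr in range(3) for dc in range(3)
    ])
    return (stack == shifted.max(axis=0)) | (stack == shifted.min(axis=0))

def dog_keypoints(image, sigmas=(1.0, 1.6, 2.56, 4.1), top_k=50, min_response=1e-6):
    """Return up to `top_k` (row, col, scale_index) candidates, strongest first."""
    image = np.asarray(image, dtype=float)
    blurred = [gaussian_blur(image, s) for s in sigmas]
    # Differences of adjacent blur levels, stacked along a leading scale axis.
    dog = np.stack([b1 - b0 for b0, b1 in zip(blurred[:-1], blurred[1:])])
    # Ignore flat regions, where every sample is trivially both max and min.
    mask = local_extrema(dog) & (np.abs(dog) > min_response)
    candidates = np.argwhere(mask)  # rows of (scale, row, col)
    responses = np.abs(dog[mask])
    order = np.argsort(-responses)[:top_k]  # sort by response, keep the strongest
    keypoints = [(int(r), int(c), int(s)) for s, r, c in candidates[order]]
    return keypoints, responses[order]
```

For an image containing a single bright pixel, the strongest candidate returned is that pixel's location at the finest scale.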

[0155] The deformation module employs thin-plate spline interpolation to minimize the energy function, thereby obtaining the optimal generation result. The energy $E_m$ is calculated as follows:

[0156] $E_m = \sum_{i=1}^{n} w_i \varphi(\lVert q_i - p_i \rVert)$

[0157] Let $s = \lVert q_i - p_i \rVert$;

[0158] $\varphi(s) = s^2 \log s$;

[0159] In the formula:

[0160] s is an intermediate variable;

[0161] $p_i$ is the coordinate vector of the i-th key point in the original image relative to the origin;

[0162] $q_i$ is the coordinate vector of the i-th perturbed key point in the original image relative to the origin;

[0163] $w_i$ is the weight of the i-th key point;

[0164] i is the key point index;

[0165] n is the number of key points;

[0166] φ(s) is the radial basis function of s.
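To make the role of φ(s) concrete, the following Python sketch fits the textbook thin-plate spline that carries each key point before perturbation onto its position after perturbation. It is a sketch under stated assumptions, not the patent's implementation: the function names are hypothetical, and the coefficients `w` solved for here are the kernel weights of the standard formulation, which may differ from the weights $w_i$ in the energy above.

```python
import numpy as np

def tps_kernel(s):
    """phi(s) = s^2 log s, with phi(0) = 0 by continuity."""
    s = np.asarray(s, dtype=float)
    out = np.zeros_like(s)
    positive = s > 0
    out[positive] = s[positive] ** 2 * np.log(s[positive])
    return out

def fit_tps(p, q):
    """Fit a TPS mapping control points p (n x 2) exactly onto q (n x 2)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    n = len(p)
    K = tps_kernel(np.linalg.norm(p[:, None] - p[None, :], axis=-1))
    P = np.hstack([np.ones((n, 1)), p])
    # Standard TPS linear system: [[K, P], [P^T, 0]] [w; a] = [q; 0].
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = q
    params = np.linalg.solve(A, b)
    return params[:n], params[n:]  # kernel weights (n x 2), affine part (3 x 2)

def apply_tps(points, p, w, a):
    """Map arbitrary points (m x 2) through the fitted spline."""
    points = np.asarray(points, dtype=float)
    p = np.asarray(p, dtype=float)
    U = tps_kernel(np.linalg.norm(points[:, None] - p[None, :], axis=-1))
    return U @ w + np.hstack([np.ones((len(points), 1)), points]) @ a
```

To warp an image rather than a point set, a common choice is to fit the spline in the opposite direction (from perturbed positions back to original positions) and sample the original image at the mapped coordinates of every output pixel.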

[0167] Preferably, the VGG feature extraction network can be a VGG-16 network. The skeleton extraction module can be an MMPose skeleton extraction module.

[0168] Preferably, the deformation loss calculation module calculates the difference between the feature maps of the original image and the enhanced image to obtain the deformation loss. The deformation loss function is calculated as follows:

[0169] $L_{total} = L_{content} + \alpha L_{wrap} + \beta L_{tv} + E$;

[0170]

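[0171] $L_{tv} = \sum_{e,f} \left( (I_{e+1,f} - I_{e,f})^2 + (I_{e,f+1} - I_{e,f})^2 \right)$;

[0172] $E = \sum_j w_i \varphi(\lVert p_j - p_i - \theta_i \rVert) + b\left[\cot(\theta_{i,j}) L(i,j) + \cot(\theta_{j,i}) L(j,i)\right](x_i - x_j)$;

[0173] Let $r = \lVert p_j - p_i - \theta_i \rVert$, then: $\varphi(r) = r^2 \log r$;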

[0174] In the formula:

[0175] $L_{total}$ is the total deformation loss;

[0176] $L_{content}$ is the content distortion loss between the original image and the enhanced image;

[0177] $L_{wrap}$ is the deformation loss between the original image and the enhanced image;

[0178] $L_{tv}$ is the total variation deformation loss between the original image and the enhanced image;

[0179] E is the energy of rigid deformation;

[0180] α is the coefficient of the deformation loss;

[0181] β is the coefficient of the total variation deformation loss;

[0182] The feature representation of the enhanced image at the v-th layer of the u-th block in the feature extraction module;

[0183] The feature representation of the original image at the v-th layer of the u-th block in the feature extraction module;

[0184] $w_i$ is the weight of the i-th key point;

[0185] i and j are key point indices;

[0186] $\theta_i$ is the perturbation vector at the i-th key point;

[0187] $p_i$ is the coordinate vector of the i-th key point in the original image relative to the origin;

[0188] $p_j$ is the coordinate vector of the j-th key point in the original image relative to the origin;

[0189] $p_i'$ is the coordinate vector, relative to the origin, of the position in the enhanced image of the i-th key point of the original image;

[0190] $I_{e,f}$ is the pixel of the enhanced image at position (e, f); e is the horizontal coordinate in the enhanced image, and f is the vertical coordinate in the enhanced image;

[0191] $I_{e+1,f}$ is the pixel of the enhanced image at position (e+1, f);

[0192] $I_{e,f+1}$ is the pixel of the enhanced image at position (e, f+1);

[0193] $\theta_{i,j}$ is the angle between the perturbation vector of the i-th key point and the perturbation vector of the j-th key point;

[0194] $\theta_{j,i}$ is the angle between the perturbation vector of the j-th key point and the perturbation vector of the i-th key point;

[0195] L(i, j) is the Laplace operator for the i-th key point and the j-th key point;

[0196] L(j, i) is the Laplace operator for the j-th key point and the i-th key point;

[0197] $x_i$ is the x-coordinate of $p_i$;

[0198] $x_j$ is the x-coordinate of $p_j$;

[0199] r is an intermediate variable;

[0200] b is a coefficient;

[0201] k is the number of key points.
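The total variation term and the weighted sum that forms the total loss are simple enough to sketch directly in Python. This is an illustration under assumptions: the function names and default coefficients are hypothetical, the first array axis is taken as the e index, the sum is restricted to pixels where both forward differences exist, and the content loss, deformation loss and rigid deformation energy are treated as already-computed numbers.

```python
import numpy as np

def total_variation(image):
    """L_tv: sum over (e, f) of the squared forward differences along both axes."""
    img = np.asarray(image, dtype=float)
    d_e = img[1:, :-1] - img[:-1, :-1]  # I[e+1, f] - I[e, f]
    d_f = img[:-1, 1:] - img[:-1, :-1]  # I[e, f+1] - I[e, f]
    return float(np.sum(d_e**2 + d_f**2))

def total_loss(l_content, l_wrap, l_tv, energy, alpha=1.0, beta=1.0):
    """L_total = L_content + alpha * L_wrap + beta * L_tv + E."""
    return l_content + alpha * l_wrap + beta * l_tv + energy
```

A constant image has zero total variation, so the β-weighted term penalizes only pixel-to-pixel roughness introduced into the enhanced image.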

[0202] The present invention also provides a data augmentation method based on target pose perturbation, in which a key point extraction module, a key point perturbation module, a deformation module, a feature extraction module, and a deformation loss calculation module are provided.

[0203] The key point extraction module is used to extract key points from the original image; it takes the original image as input and outputs the key points in the original image; the output key points are used as the key points before perturbation.

[0204] The key point perturbation module is used to perturb the position of the key points and generate new key points; the input is the key point before perturbation, and the output is the new key point; the output key point is used as the key point after perturbation.

[0205] The deformation module shifts the key points before perturbation to the key points after perturbation based on the original image and generates an enhanced image of the corresponding original image. The input is the original image and the corresponding key points before and after perturbation. The module uses an iterative interpolation method to shift the key points before perturbation on the original image to the corresponding key point positions after perturbation and outputs the enhanced image.

[0206] The feature extraction module is used to extract features from the original image and the enhanced image; it takes the original image and the enhanced image as input respectively, and outputs the feature maps of the original image and the enhanced image respectively.

[0207] The deformation loss calculation module calculates the difference between the feature maps of the original image and the enhanced image; this difference is used as the deformation loss; when the deformation loss reaches the set threshold range, it outputs a deformation loss compliance signal to the deformation module, so that the deformation module stops iterative interpolation after receiving the deformation loss compliance signal, and the image generated before stopping iterative interpolation is used as the final enhanced image.

[0208] The key point extraction module includes a difference-of-Gaussian pyramid and a key point filtering module, and/or a skeleton extraction module and a skeleton key point extraction module. The difference-of-Gaussian pyramid extracts candidate key points from the original image, and the key point filtering module filters the candidate key points; the skeleton extraction module extracts the skeleton from the original image, and the skeleton key point extraction module extracts key points from the skeleton. In other words, the key point extraction module extracts key points from the candidate key points and/or the skeleton.

[0209] The method includes the following steps:

[0210] Step 1: Use the difference-of-Gaussian pyramid to extract candidate key points from the original image, or use the skeleton extraction module to extract the skeleton from the original image;

[0211] Step 2: Use the key point filtering module to filter the candidate key points, or use the skeleton key point extraction module to extract key points on the skeleton; the skeleton extraction module extracts image features at different scales from the original image and generates a key point heatmap;

[0212] Step 3: Use the key point perturbation module to generate the perturbed key point coordinates;

[0213] Step 4: Use the deformation module to generate an enhanced image;

[0214] Step 5: Use the VGG feature extraction network to extract features from the enhanced image and the original image;

[0215] Step 6: Calculate the deformation loss between the enhanced image and the original image using the deformation loss calculation module;

[0216] Step 7: Repeat steps 4 to 6 until the deformation loss meets the target or the number of iterations reaches the set value, at which point the iteration is complete;

[0217] Step 8: Obtain the final enhanced image.
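The control flow of the steps above, in particular the loop over steps 4 to 6 with its two exit conditions, can be sketched in Python as follows. Every callable passed to `augment` is a hypothetical stand-in for the corresponding module, and "meets the target" is assumed here to mean that the loss is less than or equal to `loss_target`; neither detail is specified in the text.

```python
def augment(image, extract_keypoints, perturb, deform_step, extract_features,
            deformation_loss, loss_target, max_iterations=100):
    """Run the augmentation loop and return (enhanced_image, iterations_used)."""
    before = extract_keypoints(image)       # steps 1-2: key points before perturbation
    after = perturb(before)                 # step 3: key points after perturbation
    reference = extract_features(image)     # features of the original image
    enhanced = image
    iterations = 0
    while iterations < max_iterations:      # step 7: iteration limit
        # Step 4: one interpolation step toward the perturbed key points.
        enhanced = deform_step(image, enhanced, before, after)
        iterations += 1
        # Steps 5-6: compare features of the enhanced and original images.
        loss = deformation_loss(reference, extract_features(enhanced))
        if loss <= loss_target:             # assumption: "meets the target" means <=
            break
    return enhanced, iterations             # step 8: final enhanced image
```

Passing the modules in as callables keeps the loop independent of how each module is realized, which matches the statement that existing functional modules may be used.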

[0218] The aforementioned key point extraction module, key point perturbation module, deformation module, feature extraction module and deformation loss calculation module, Gaussian difference pyramid, key point screening module, skeleton extraction module, skeleton key point extraction module, VGG feature extraction network, VGG-16 network, MMPose skeleton extraction module and other functional modules can all adopt applicable functional modules in the existing technology, or adopt functional modules and software in the existing technology and construct them using conventional technical means.

[0219] The embodiments described above are only used to illustrate the technical ideas and features of the present invention. Their purpose is to enable those skilled in the art to understand the content of the present invention and implement it accordingly. The patent scope of the present invention should not be limited by these embodiments. That is, all equivalent changes or modifications made in accordance with the spirit disclosed in the present invention still fall within the patent scope of the present invention.

Claims

1. A data augmentation system based on target pose perturbation, characterized in that it includes a key point extraction module, a key point perturbation module, and a deformation module; the key point extraction module is used to extract key points from the original image; it takes the original image as input and outputs the key points in the original image; the key points it outputs are used as the key points before perturbation; the key point perturbation module is used to perturb the positions of the key points and generate new key points; it takes the key points before perturbation as input and outputs the new key points; the key points it outputs are used as the key points after perturbation; the deformation module is used to shift the key points before perturbation to the key points after perturbation based on the original image, and to generate an enhanced image corresponding to the original image; it takes the original image and the corresponding key points before and after perturbation as input; it uses an iterative interpolation method to shift the key points before perturbation on the original image to the corresponding key point positions after perturbation, and outputs the enhanced image; the system also includes a feature extraction module and a deformation loss calculation module; the feature extraction module is used to extract features from the original image and the enhanced image; it takes the original image and the enhanced image as input respectively, and outputs the feature maps of the original image and the enhanced image respectively; the deformation loss calculation module is used to calculate the difference between the feature maps of the original image and the enhanced image; this difference is used as the deformation loss; when the deformation loss reaches the set threshold range, it outputs a deformation loss compliance signal to the deformation module; after receiving the deformation loss compliance signal, the deformation module stops iterative interpolation and uses the image generated before stopping iterative interpolation as the final enhanced image;
the deformation loss calculation formula in the deformation loss calculation module is as follows: $L_{total} = L_{content} + \alpha L_{wrap} + \beta L_{tv} + E$; $L_{tv} = \sum_{e,f} \left( (I_{e+1,f} - I_{e,f})^2 + (I_{e,f+1} - I_{e,f})^2 \right)$; $E = \sum_j w_i \varphi(\lVert p_j - p_i - \theta_i \rVert) + b\left[\cot(\theta_{i,j}) L(i,j) + \cot(\theta_{j,i}) L(j,i)\right](x_i - x_j)$; let $r = \lVert p_j - p_i - \theta_i \rVert$, then: $\varphi(r) = r^2 \log r$;
in the formula: $L_{total}$ is the total deformation loss; $L_{content}$ is the content distortion loss between the original image and the enhanced image; $L_{wrap}$ is the deformation loss between the original image and the enhanced image; $L_{tv}$ is the total variation deformation loss between the original image and the enhanced image; E is the energy of rigid deformation; α is the coefficient of the deformation loss; β is the coefficient of the total variation deformation loss; the feature representation of the enhanced image at the v-th layer of the u-th block in the feature extraction module; the feature representation of the original image at the v-th layer of the u-th block in the feature extraction module; $w_i$ is the weight of the i-th key point; i and j are key point indices; $\theta_i$ is the perturbation vector at the i-th key point; $p_i$ is the coordinate vector of the i-th key point in the original image relative to the origin; $p_j$ is the coordinate vector of the j-th key point in the original image relative to the origin; $p_i'$ is the coordinate vector, relative to the origin, of the position in the enhanced image of the i-th key point of the original image; $I_{e,f}$ is the pixel of the enhanced image at position (e, f); e is the horizontal coordinate in the enhanced image; f is the vertical coordinate in the enhanced image; $I_{e+1,f}$ is the pixel of the enhanced image at position (e+1, f); $I_{e,f+1}$ is the pixel of the enhanced image at position (e, f+1); $\theta_{i,j}$ is the angle between the perturbation vector of the i-th key point and the perturbation vector of the j-th key point; $\theta_{j,i}$ is the angle between the perturbation vector of the j-th key point and the perturbation vector of the i-th key point; L(i, j) is the Laplace operator for the i-th key point and the j-th key point; L(j, i) is the Laplace operator for the j-th key point and the i-th key point; $x_i$ is the x-coordinate of $p_i$; $x_j$ is the x-coordinate of $p_j$; r is an intermediate variable; b is a coefficient; k is the number of key points.

2. The data augmentation system based on target pose perturbation according to claim 1, characterized in that, The deformation module uses thin-plate spline interpolation to shift the key points on the original image before perturbation to the corresponding key point positions after perturbation.

3. The data augmentation system based on target pose perturbation according to claim 1, characterized in that the key point extraction module includes a difference-of-Gaussian pyramid and/or a skeleton extraction module; the difference-of-Gaussian pyramid is used to extract candidate key points from the original image; the skeleton extraction module is used to extract the skeleton from the original image; and the key point extraction module extracts key points from the candidate key points and/or the skeleton.

4. The data augmentation system based on target pose perturbation according to claim 1, characterized in that, The feature extraction module includes the VGG feature extraction network.

5. A data augmentation method based on target pose perturbation, characterized in that a key point extraction module, a key point perturbation module, and a deformation module are provided; the key point extraction module is used to extract key points from the original image; it takes the original image as input and outputs the key points of the original image; the key points it outputs are used as the key points before perturbation; the key point perturbation module is used to perturb the positions of the key points and generate new key points; it takes the key points before perturbation as input and outputs the new key points; the key points it outputs are used as the key points after perturbation; the deformation module is used to shift the key points before perturbation to the key points after perturbation based on the original image, and to generate an enhanced image corresponding to the original image; it takes the original image and the corresponding key points before and after perturbation as input; it uses an iterative interpolation method to shift the key points before perturbation on the original image to the corresponding key point positions after perturbation, and outputs the enhanced image; a feature extraction module and a deformation loss calculation module are also provided; the feature extraction module is used to extract features from the original image and the enhanced image; it takes the original image and the enhanced image as input respectively, and outputs the feature maps of the original image and the enhanced image respectively; the deformation loss calculation module calculates the difference between the feature maps of the original image and the enhanced image; this difference is used as the deformation loss; when the deformation loss reaches the set threshold range, it outputs a deformation loss compliance signal to the deformation module, so that the deformation module stops iterative interpolation after receiving the deformation loss compliance signal, and the image generated before stopping iterative interpolation is used as the final enhanced image;
the deformation loss calculation formula in the deformation loss calculation module is as follows: $L_{total} = L_{content} + \alpha L_{wrap} + \beta L_{tv} + E$; $L_{tv} = \sum_{e,f} \left( (I_{e+1,f} - I_{e,f})^2 + (I_{e,f+1} - I_{e,f})^2 \right)$; $E = \sum_j w_i \varphi(\lVert p_j - p_i - \theta_i \rVert) + b\left[\cot(\theta_{i,j}) L(i,j) + \cot(\theta_{j,i}) L(j,i)\right](x_i - x_j)$; let $r = \lVert p_j - p_i - \theta_i \rVert$, then: $\varphi(r) = r^2 \log r$;
in the formula: $L_{total}$ is the total deformation loss; $L_{content}$ is the content distortion loss between the original image and the enhanced image; $L_{wrap}$ is the deformation loss between the original image and the enhanced image; $L_{tv}$ is the total variation deformation loss between the original image and the enhanced image; E is the energy of rigid deformation; α is the coefficient of the deformation loss; β is the coefficient of the total variation deformation loss; the feature representation of the enhanced image at the v-th layer of the u-th block in the feature extraction module; the feature representation of the original image at the v-th layer of the u-th block in the feature extraction module; $w_i$ is the weight of the i-th key point; i and j are key point indices; $\theta_i$ is the perturbation vector at the i-th key point; $p_i$ is the coordinate vector of the i-th key point in the original image relative to the origin; $p_j$ is the coordinate vector of the j-th key point in the original image relative to the origin; $p_i'$ is the coordinate vector, relative to the origin, of the position in the enhanced image of the i-th key point of the original image; $I_{e,f}$ is the pixel of the enhanced image at position (e, f); e is the horizontal coordinate in the enhanced image; f is the vertical coordinate in the enhanced image; $I_{e+1,f}$ is the pixel of the enhanced image at position (e+1, f); $I_{e,f+1}$ is the pixel of the enhanced image at position (e, f+1); $\theta_{i,j}$ is the angle between the perturbation vector of the i-th key point and the perturbation vector of the j-th key point; $\theta_{j,i}$ is the angle between the perturbation vector of the j-th key point and the perturbation vector of the i-th key point; L(i, j) is the Laplace operator for the i-th key point and the j-th key point; L(j, i) is the Laplace operator for the j-th key point and the i-th key point; $x_i$ is the x-coordinate of $p_i$; $x_j$ is the x-coordinate of $p_j$; r is an intermediate variable; b is a coefficient; k is the number of key points.

6. The data augmentation method based on target pose perturbation according to claim 5, characterized in that, The deformation module uses thin-plate spline interpolation to shift the key points on the original image before perturbation to the corresponding key point positions after perturbation.

7. The data augmentation method based on target pose perturbation according to claim 5, characterized in that the key point extraction module is configured with a difference-of-Gaussian pyramid and/or a skeleton extraction module; candidate key points are extracted from the original image using the difference-of-Gaussian pyramid; the skeleton is extracted from the original image using the skeleton extraction module; and the key point extraction module extracts key points from the candidate key points and/or the skeleton accordingly.