Image data enhancement method and binocular stereo matching model training method and device

By generating random masks and disordered images based on target sample image pairs, data augmentation image pairs are constructed, which solves the problem of large differences between artificially defined noise and real noise, and achieves better training results and model fit.

CN116740494B · Active · Publication Date: 2026-06-23 · PING AN TECH (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: PING AN TECH (SHENZHEN) CO LTD
Filing Date: 2023-06-02
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

In existing image data augmentation methods, the artificially defined noise differs significantly from real noise, resulting in low fit with the training sample set and affecting the training effect of deep learning models.

Method used

By generating first and second random masks and out-of-order images, data augmentation images are generated based on target sample image pairs. The noise originates from the original image itself. Data augmentation sample image pairs are constructed to improve the fit between noise and the training sample set.

Benefits of technology

This improves the fit between noise and the training sample set, resulting in better training performance. It also expands the training sample set and enhances the robustness and generalization ability of the model.

✦ Generated by Eureka AI based on patent content.


Abstract

The application belongs to the technical field of medical treatment and can be used for sample data enhancement in the scene of human body, tissue organ modeling and the like in the medical field, and particularly relates to an image data enhancement method, a training method and device of a binocular stereo matching model. The method comprises the following steps: obtaining a target sample image pair, the target sample image pair comprising a first image and a second image; generating a first random mask and a second random mask; obtaining a first disordered image and a second disordered image; obtaining a first data enhancement image based on the first random mask, the second disordered image and the first image; obtaining a second data enhancement image based on the second random mask, the first disordered image and the second image; and combining the first data enhancement image and the second data enhancement image to obtain a data enhancement sample image pair. The above method and device improve the adaptability of the introduced noise to the training sample set, so that the expanded training sample has better training effect.

Description

Technical Field

[0001] This application relates to the field of medical technology, and more specifically, to an image data enhancement method and to a training method and apparatus for a binocular stereo matching model.

Background Technology

[0002] Training neural network models using training samples is a common technique applicable to various fields, such as 3D modeling of patients' bodies or organs in the medical field. These techniques can be implemented through image recognition, which requires prior training of the image recognition model to achieve the desired recognition results. Since the training sample set significantly impacts the performance of deep learning algorithms, an insufficient or imbalanced training sample set often leads to overfitting in deep neural networks. Improving the quantity and quality of samples in the training sample set is a long-standing research topic in deep learning. However, acquiring large-scale training sample sets is often costly and time-consuming; for example, labeling binocular parallax training samples is extremely difficult. In such cases, data augmentation is needed to expand the training sample set. Data augmentation uses existing samples to generate new samples, avoiding network overfitting and improving model robustness and generalization ability. In computer vision, data augmentation methods include contrast adjustment, brightness adjustment, and noise addition. Among these, artificially defined noise methods such as salt-and-pepper noise and Gaussian noise are commonly used.

[0003] Since the noise introduced by existing image data augmentation methods is all artificially defined, the introduced noise differs greatly from real noise and fits the training sample set poorly.

Summary of the Invention

[0004] The main purpose of this application is to provide an image data augmentation method, a training method and apparatus for a binocular stereo matching model, which aims to solve the technical problem that artificially defined noise introduced during data augmentation in the medical field differs greatly from real noise and has poor adaptability to the training sample set.

[0005] To achieve the aforementioned objectives, this application provides an image data enhancement method for expanding training samples for a binocular stereo matching task, comprising:

[0006] Acquire a pair of target sample images, wherein the pair of target sample images includes a first image and a second image;

[0007] A first random mask is generated based on the first image, and a second random mask is generated based on the second image, wherein the first random mask has the same size as the first image, and the second random mask has the same size as the second image;

[0008] A first disordered image is obtained based on the first image, and a second disordered image is obtained based on the second image, wherein the first disordered image is obtained by scrambling the pixels of the first image, and the second disordered image is obtained by scrambling the pixels of the second image.

[0009] Based on the first random mask, the second disordered image, and the first image, a first data-enhanced image is obtained;

[0010] Based on the second random mask, the first disordered image, and the second image, a second data-enhanced image is obtained;

[0011] The first data-enhanced image and the second data-enhanced image are combined to obtain a data-enhanced sample image pair.

[0012] In one embodiment, both the first random mask and the second random mask are binary masks, and the first random mask and the second random mask are obtained by the following formula:

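[0013] $$M_l[p, q] = \begin{cases} 1, & \mathrm{rand}(0, 1) \le \rho \\ 0, & \mathrm{rand}(0, 1) > \rho \end{cases}$$

[0014] $$M_r[p, q] = \begin{cases} 1, & \mathrm{rand}(0, 1) \le \rho \\ 0, & \mathrm{rand}(0, 1) > \rho \end{cases}$$

[0015] where $M_l$ is the first random mask and $M_r$ is the second random mask; $M_l[p, q]$ marks whether the pixel at coordinates $(p, q)$ in the first image needs noise added, and $M_r[p, q]$ marks whether the pixel at coordinates $(p, q)$ in the second image needs noise added; $p$ and $q$ are pixel indices; $\mathrm{rand}(0, 1)$ is a random number uniformly distributed on (0, 1); and $\rho$ controls the proportion of pixels with noise added relative to the total number of pixels in the sample image pair.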

[0016] In one embodiment, the first data-enhanced image and the second data-enhanced image are obtained by the following formula:

[0017]

[0018]

[0019] where $x_l$ is the first image, $x_r$ is the second image, $x_l'$ is the first disordered image, $x_r'$ is the second disordered image, $M_l$ is the first random mask, $M_r$ is the second random mask, and $\lambda$ is the noise scaling factor.

[0020] In one embodiment, the steps of obtaining a first disordered image based on the first image and obtaining a second disordered image based on the second image include:

[0021] Get the preset scrambling template;

[0022] The pixel values of the first image are filled into the preset scrambling template according to preset rules to obtain the first disordered image, and the pixel values of the second image are filled into the preset scrambling template according to preset rules to obtain the second disordered image.

[0023] In one embodiment, the step of acquiring the target sample image pair includes:

[0024] Obtain the original sample image pairs;

[0025] The original sample image pairs are preprocessed to obtain target sample image pairs.

[0026] This application also provides a training method for a binocular stereo matching model, including:

[0027] Construct an image pair sample set, wherein the image pair sample set is constructed based on the image data enhancement method provided in any of the above embodiments;

[0028] The image sample set is divided into a training sample subset and a test sample subset;

[0029] At least one training sample from the training sample subset is input into the original stereo matching model for training. During the training process, the parameters of the original stereo matching model are adjusted according to the preset loss function, the first threshold, and the output of the original stereo matching model until the training loss of the original stereo matching model is less than the first threshold, thereby obtaining the undetermined binocular stereo matching model.

[0030] At least one test sample from the subset of test samples is input into the undetermined binocular stereo matching model for verification, and during the verification process, it is determined whether the output of the undetermined binocular stereo matching model meets the first preset condition.

[0031] If the output of the undetermined binocular stereo matching model satisfies the first preset condition, then the undetermined binocular stereo matching model is determined as the target binocular stereo matching model.

[0032] If the undetermined binocular stereo matching model does not meet the first preset condition, then the undetermined binocular stereo matching model will continue to be trained based on the training sample subset.

[0033] This application also provides an image data enhancement device for expanding training samples for a binocular stereo matching task, comprising:

[0034] A target sample image pair acquisition module is used to acquire target sample image pairs, wherein the target sample image pairs include a first image and a second image;

[0035] A random mask generation module is used to generate a first random mask based on the first image and a second random mask based on the second image, wherein the first random mask has the same size as the first image and the second random mask has the same size as the second image;

[0036] The disordered image acquisition module is used to obtain a first disordered image based on the first image and a second disordered image based on the second image, wherein the first disordered image is obtained by scrambling the pixels of the first image and the second disordered image is obtained by scrambling the pixels of the second image.

[0037] The first data augmentation image acquisition module is used to obtain a first data augmentation image based on the first random mask, the second disordered image, and the first image;

[0038] The second data augmentation image acquisition module is used to obtain a second data augmentation image based on the second random mask, the first disordered image, and the second image;

[0039] The data augmentation sample image pair acquisition module is used to combine the first data augmentation image and the second data augmentation image to obtain a data augmentation sample image pair.

[0040] This application also provides a training device for a binocular stereo matching model, comprising:

[0041] An image pair sample set construction module is used to construct an image pair sample set, wherein the image pair sample set is constructed based on the image data enhancement method provided in any of the above embodiments;

[0042] The sample subset partitioning module is used to divide the image sample set into a training sample subset and a test sample subset;

[0043] The model training module is used to input at least one training sample from the training sample subset into the original stereo matching model for training, and to adjust the parameters of the original stereo matching model according to the preset loss function, the first threshold, and the output of the original stereo matching model during the training process, until the training loss of the original stereo matching model is less than the first threshold, thereby obtaining the undetermined binocular stereo matching model.

[0044] The model testing module is used to input at least one test sample from the subset of test samples into the undetermined binocular stereo matching model for verification, and during the verification process, determine whether the output of the undetermined binocular stereo matching model meets a first preset condition; and,

[0045] to determine the undetermined binocular stereo matching model as the target binocular stereo matching model when the output of the undetermined binocular stereo matching model meets the first preset condition.

[0046] This application also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the image data enhancement method provided in any of the above embodiments.

[0047] This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the image data enhancement method provided in any of the above embodiments.

[0048] Beneficial effects:

[0049] This application provides an image data augmentation method and a training method and apparatus for a binocular stereo matching model, used for data augmentation in medical applications, specifically for expanding training samples for a binocular stereo matching task. Based on an acquired target sample image pair, the application generates a first random mask and a second random mask, as well as a first disordered image and a second disordered image. The first data-enhanced image is obtained from the first random mask, the second disordered image, and the first image: the second disordered image serves as the noise source, the first random mask determines the proportion of pixels in the first image that need noise added, and noise is added to the first image accordingly. Likewise, the first disordered image, constructed from the first image in the target sample image pair, serves as the noise source for the second image: the second random mask determines the proportion of pixels in the second image that need noise added, and noise is added to the second image to obtain the second data-enhanced image. Finally, the first data-enhanced image and the second data-enhanced image are combined to obtain a data-enhanced image pair that expands the training samples. Because the noise source is constructed from the original target sample image pair, the added noise is closer to the real-world noise distribution of the target sample image pair than artificially defined noise is. This improves the fit between the introduced noise and the training sample set, so that training the relevant model on the training sample set obtained by this image data augmentation method gives better training performance.

Attached Figure Description

[0050] Figure 1 is a schematic flowchart of an image data enhancement method according to an embodiment of this application;

[0051] Figure 2 is a schematic diagram illustrating an application scenario of image scrambling according to an embodiment of this application;

[0052] Figure 3 is a flowchart illustrating step S103 in an image data enhancement method according to an embodiment of this application;

[0053] Figure 4 is a flowchart illustrating step S101 in an image data enhancement method according to an embodiment of this application;

[0054] Figure 5 is a flowchart illustrating a training method for a binocular stereo matching model according to an embodiment of this application;

[0055] Figure 6 is a schematic diagram of the structure of an image data enhancement device according to an embodiment of this application;

[0056] Figure 7 is a schematic diagram of the structure of a training device for a binocular stereo matching model according to an embodiment of this application;

[0057] Figure 8 is a schematic diagram of the structure of a computer device according to an embodiment of this application.

[0058] The realization of the objective, functional features and advantages of the present invention will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Implementation

[0059] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0060] Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms “a,” “an,” and “the” used herein may also include the plural forms. It should be further understood that the term “comprising” as used in this specification means the presence of the stated features, integers, steps, operations, elements, modules, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, modules, components, and/or groups thereof. It should be understood that when an element is described as “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, or intermediate elements may be present. Furthermore, “connected” or “coupled” as used herein can include wireless connection or wireless coupling. The term “and/or” as used herein includes all or any modules and all combinations of one or more associated listed items.

[0061] It will be understood by those skilled in the art that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. It should also be understood that terms such as those defined in general dictionaries should be understood to have the same meaning as in the context of the prior art, and should not be interpreted in an idealized or overly formal sense unless specifically defined as herein.

[0062] Stereo matching has always been a research hotspot in binocular vision. Binocular cameras capture two viewpoint images of the same scene from the left and right perspectives, and stereo matching algorithms are used to obtain a disparity map, which in turn yields a depth map. Depth maps have a wide range of applications. Because they record the distance between objects in a scene and the camera, they can be used for measurement, 3D reconstruction, and the synthesis of virtual viewpoints. Therefore, they play a significant role in the medical field, for example, in the 3D modeling of patients' bodies or organs.
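The step from a disparity map to a depth map mentioned above can be illustrated with a minimal sketch. It assumes a rectified binocular camera and uses the standard pinhole relation Z = f · B / d; the function name `depth_from_disparity` and the numeric values for the focal length and baseline are illustrative assumptions and do not come from this application.

```python
import numpy as np

def depth_from_disparity(disparity, focal_length_px, baseline_m):
    """Convert a disparity map (pixels) to depth (metres) via Z = f * B / d.

    Assumes a rectified stereo pair; pixels with non-positive disparity
    are treated as unmatched and mapped to infinite depth.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth

# Hypothetical camera: 700 px focal length, 0.1 m baseline.
disparity_map = np.array([[35.0, 70.0], [0.0, 14.0]])
depth_map = depth_from_disparity(disparity_map, focal_length_px=700.0, baseline_m=0.1)
```

Larger disparities correspond to closer objects, which is why an accurate disparity map is the key output of the stereo matching model.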

[0063] To accurately, quickly, and intelligently perform binocular stereo matching tasks in the medical field, deep learning techniques are typically used to construct binocular stereo matching models to perform these tasks. The training sample set significantly impacts the performance of deep learning algorithms; an insufficient or imbalanced training sample set often leads to overfitting in deep neural networks. However, labeling binocular disparity training samples is challenging, making the acquisition of large-scale training samples extremely costly in terms of both technology and time. Therefore, data augmentation methods are needed to expand the training sample set. In computer vision, data augmentation methods include contrast adjustment, brightness adjustment, and noise addition. Common noise addition methods include salt-and-pepper noise and Gaussian noise. However, these methods have drawbacks: the introduced noise is artificially defined and differs significantly from real noise, resulting in insufficient fit with the training sample set. Therefore, it is necessary to research new noise addition methods to improve the correlation between artificially introduced noise and real noise, thereby enhancing the fit with the training sample set and ultimately obtaining a large number of high-performing training samples.

[0064] Referring to Figure 1, this application provides an image data enhancement method for use in the medical field. The method includes steps S101-S106, and the details of each step are described below.

[0065] In one embodiment, the image data enhancement method includes:

[0066] S101. Obtain a target sample image pair, wherein the target sample image pair includes a first image and a second image.

[0067] In this embodiment, the target sample image pair is obtained based on a binocular camera and can be a human body image or a partial human body image required in a medical process. The target sample image pair includes a first image and a second image (i.e., left and right images of the same scene captured by the binocular camera). The shooting scene (target object) in the target sample image pair can come from human body modeling, tissue and organ modeling scenes, etc. in the medical field.

[0068] S102. Generate a first random mask based on the first image and a second random mask based on the second image, wherein the first random mask has the same size as the first image and the second random mask has the same size as the second image.

[0069] Masking is a common operation in deep learning, essentially covering a raw tensor with a mask to either block or select specific elements. In image processing applications, its uses include region of interest extraction, image masking, and image feature extraction. In this embodiment, the first random mask is the same size as the first image and is used to determine the proportion of pixels in the first image that need noise added. The second random mask is the same size as the second image and is used to determine the pixels in the second image that need noise added.

[0070] In some embodiments, both the first random mask and the second random mask are binary masks, and the first random mask and the second random mask can be obtained by the following formulas (1) and (2), respectively:

[0071] $$M_l[p, q] = \begin{cases} 1, & \mathrm{rand}(0, 1) \le \rho \\ 0, & \mathrm{rand}(0, 1) > \rho \end{cases} \tag{1}$$

[0072] $$M_r[p, q] = \begin{cases} 1, & \mathrm{rand}(0, 1) \le \rho \\ 0, & \mathrm{rand}(0, 1) > \rho \end{cases} \tag{2}$$

[0073] where $M_l$ is the first random mask and $M_r$ is the second random mask; $M_l[p, q]$ marks whether the pixel at coordinates $(p, q)$ in the first image needs noise added, and $M_r[p, q]$ marks whether the pixel at coordinates $(p, q)$ in the second image needs noise added; $p$ and $q$ are pixel indices; and $\rho$ controls the proportion of pixels with added noise relative to the total number of pixels in the sample image pair.

[0074] The `rand` function generates random numbers uniformly distributed on the interval (0, 1); `rand(0, 1)` draws one such number for each pixel coordinate (p, q). When the random number is greater than ρ, the mask value at coordinate (p, q) is 0; when the random number is less than or equal to ρ, the mask value is 1. The same rule applies to both the first and the second random mask. Taking the element-wise product of the first random mask and the first image (multiplying elements with the same coordinates) selects the pixels in the first image that need noise added. Similarly, the element-wise product of the second random mask and the second image selects the pixels in the second image that need noise added.
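The mask rule described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the application's implementation: the function name `make_random_mask`, the fixed seed, and the value ρ = 0.25 are assumptions chosen for the example.

```python
import numpy as np

def make_random_mask(height, width, rho, rng):
    """Binary mask: 1 where a uniform random draw is <= rho, else 0."""
    draws = rng.random((height, width))   # one uniform draw per pixel
    return (draws <= rho).astype(np.uint8)

rng = np.random.default_rng(0)            # seeded only for reproducibility
# Each mask has the same size as the image it belongs to (here 4 x 6).
mask_l = make_random_mask(4, 6, rho=0.25, rng=rng)
mask_r = make_random_mask(4, 6, rho=0.25, rng=rng)
```

Because every pixel is drawn independently, ρ sets the expected fraction of masked pixels, and the exact count varies from one mask to the next.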

[0075] S103. Obtain a first disordered image based on the first image, and obtain a second disordered image based on the second image, wherein the first disordered image is obtained by scrambling the pixels of the first image, and the second disordered image is obtained by scrambling the pixels of the second image.

[0076] In this embodiment, a copy of the target sample image pair is first created. Assuming the target sample image pair is $(x_l, x_r)$, its copy is $(x_{l1}, x_{r1})$. The order of the pixels in the copied images $x_{l1}$ and $x_{r1}$ is then shuffled, with each pixel's value left unchanged by the rearrangement, yielding the shuffled copy $(x_l', x_r')$, where $x_l'$ is the first disordered image and $x_r'$ is the second disordered image.
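A minimal sketch of this copy-and-shuffle step is given below. It assumes a uniformly random permutation of pixel positions, which is only one possible way to scramble; the function name `shuffle_pixels` is hypothetical.

```python
import numpy as np

def shuffle_pixels(image, rng):
    """Return a copy of `image` with pixel positions randomly permuted.

    All channels of a pixel move together, so every pixel value is
    preserved and only its location changes.
    """
    h, w = image.shape[:2]
    flat = image.reshape(h * w, -1)        # one row per pixel
    perm = rng.permutation(h * w)          # random new ordering
    return flat[perm].reshape(image.shape) # fancy indexing makes a copy

rng = np.random.default_rng(0)
x_l = np.arange(24, dtype=np.float32).reshape(4, 6)
x_l_disordered = shuffle_pixels(x_l, rng)
```

The disordered image has exactly the same intensity histogram as its source, which is what makes it a noise source drawn from the data itself.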

[0077] S104. Based on the first random mask, the second disordered image, and the first image, obtain the first data augmentation image.

[0078] In this embodiment, a second disordered image noise source is constructed based on the second image in the target sample image pair. The proportion of pixels in the first image that need to be noise-added is determined by the first random mask, and noise is added to the first image to obtain the first data-enhanced image.

[0079] S105. Based on the second random mask, the first disordered image, and the second image, obtain the second data-enhanced image.

[0080] In this embodiment, a first disordered image noise source is constructed based on the first image in the target sample image pair. The proportion of pixels in the second image that need to be noise-added is determined by a second random mask, and noise is added to the second image to obtain a second data-enhanced image.

[0081] In steps S104-S105, since the noise source is the target sample image pair itself, the added noise is closer to the noise distribution of the target sample image pair in the real world compared to purely artificially defined noise.

[0082] S106. Combine the first data-enhanced image and the second data-enhanced image to obtain a data-enhanced sample image pair.

[0083] In this embodiment, the first data-enhanced image and the second data-enhanced image are combined to obtain a data-enhanced sample image pair. Based on steps S102-S106 described above, multiple different data-enhanced sample image pairs can be constructed by varying the first and second random masks and the first and second disordered images, thereby expanding the training sample set.

[0084] In one embodiment, 3D modeling of the human body or organs is required. A large number of images of patients' bodies or organs are captured with a binocular camera and used as samples. Then, based on steps S102-S106 above, multiple different data-enhanced sample image pairs are constructed by varying the first and second random masks and the first and second disordered images, yielding expanded training samples. The expanded training samples are then used to train the corresponding binocular stereo matching model, which gives better training results and produces, for example, the corresponding medical 3D modeling model.

[0085] This application provides an image data augmentation method for medical data augmentation, specifically for expanding training samples for a binocular stereo matching task. The method involves generating a first random mask and a second random mask based on acquired target sample image pairs, generating a first disordered image and a second disordered image, and obtaining a first data-enhanced image based on the first random mask, the second disordered image, and the first image. Using the second disordered image as a noise source, the method determines the proportion of pixels in the first image that need noise addition using the first random mask, adding noise to the first image to obtain the first data-enhanced image. Furthermore, the method constructs a first disordered image noise source using the first image in the target sample image pair, determines the proportion of pixels in the second image that need noise addition using the second random mask, adding noise to the second image to obtain the second data-enhanced image. Finally, the first data-enhanced image and the second data-enhanced image are combined to obtain a data-enhanced image pair to expand the training samples. Since the noise source is constructed based on the original target sample image pair, the noise added is closer to the noise distribution of the target sample image pair in the real world than the noise defined by humans. This improves the fit between the introduced noise and the training sample set, resulting in better training performance when training the relevant model using the training sample set obtained by the image data augmentation method provided in this application.

[0086] In some embodiments, the first data-enhanced image and the second data-enhanced image described above are obtained by the following formulas (3) and (4):

[0087]

[0088]

[0089] where $x_l$ is the first image, $x_r$ is the second image, $x_l'$ is the first disordered image, $x_r'$ is the second disordered image, $M_l$ is the first random mask, $M_r$ is the second random mask, and $\lambda$ is the noise scaling factor; $\odot$ denotes multiplying the elements at the same coordinate position in two matrices.

[0090] Using the above formulas (3) and (4), the data-enhanced image pair can be obtained. By adjusting the first and second random masks, the first and second disordered images, and the noise scaling factor λ, multiple sets of different data augmentation pairs can be obtained, thereby expanding the training samples.
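Formulas (3) and (4) are not reproduced in this text, so the sketch below rests on an assumed additive form: each image receives λ times the other view's disordered image at the masked positions. That form, the function name `augment_pair`, and the values ρ = 0.3 and λ = 0.1 are illustrative assumptions and may differ from the actual formulas.

```python
import numpy as np

def augment_pair(x_l, x_r, x_l_dis, x_r_dis, m_l, m_r, lam):
    """Cross-view noise injection under an ASSUMED additive form:

        aug_l = x_l + lam * (m_l * x_r_dis)
        aug_r = x_r + lam * (m_r * x_l_dis)

    where * is the element-wise product.
    """
    aug_l = x_l + lam * m_l * x_r_dis
    aug_r = x_r + lam * m_r * x_l_dis
    return aug_l, aug_r

rng = np.random.default_rng(0)
x_l = rng.random((4, 5))                   # stand-in for the first image
x_r = rng.random((4, 5))                   # stand-in for the second image
x_l_dis = rng.permutation(x_l.ravel()).reshape(x_l.shape)
x_r_dis = rng.permutation(x_r.ravel()).reshape(x_r.shape)
m_l = (rng.random(x_l.shape) <= 0.3).astype(x_l.dtype)
m_r = (rng.random(x_r.shape) <= 0.3).astype(x_r.dtype)
aug_l, aug_r = augment_pair(x_l, x_r, x_l_dis, x_r_dis, m_l, m_r, lam=0.1)
```

Redrawing the masks, reshuffling the disordered images, or changing λ each produces a different augmented pair from the same source pair.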

[0091] In some embodiments, referring to Figure 2, the steps of obtaining a first disordered image based on the first image and obtaining a second disordered image based on the second image include:

[0092] S103a, Obtain the preset scrambling template.

[0093] In this embodiment, the pixel positions of the first and second images are scrambled using a preset scrambling template. The scrambling template is set according to actual needs. It should be noted that the size of the scrambling template matches the size of the image to be scrambled. When the size of the image to be scrambled is smaller than the size of the scrambling template, the boundaries of the image to be scrambled need to be patched (e.g., padded with 0s outside its boundaries) to fit the scrambling template.

[0094] S103b: Fill the pixel values of the first image into the preset scrambling template according to the preset rules to obtain the first disordered image; and fill the pixel values of the second image into the preset scrambling template according to the preset rules to obtain the second disordered image.

[0095] In this embodiment, the first image and the second image are each combined with the scrambling template; that is, the pixel values of the first image and of the second image are respectively filled into the selected scrambling template according to preset rules, thereby obtaining the first disordered image and the second disordered image. Figure 3 shows an application scenario for image scrambling.
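Template-based scrambling can be sketched as follows. The text does not define the template's layout or the preset rules, so this example assumes the template is an integer array in which each entry names the flattened source position whose pixel value fills that slot, and that a smaller image is zero-padded at its bottom and right edges. The function name `scramble_with_template` and the reversed-order template are hypothetical.

```python
import numpy as np

def scramble_with_template(image, template):
    """Fill pixel values of a 2-D `image` into the slots of `template`.

    ASSUMPTION: template[i, j] holds the flattened index of the source
    pixel (after zero-padding the image to the template size) that is
    written to position (i, j).
    """
    th, tw = template.shape
    h, w = image.shape
    if h > th or w > tw:
        raise ValueError("image must not be larger than the template")
    padded = np.zeros((th, tw), dtype=image.dtype)
    padded[:h, :w] = image                 # pad with 0s outside the image
    return padded.ravel()[template]        # result has the template's shape

# Hypothetical 2 x 3 template that simply reverses the pixel order.
template = np.arange(6)[::-1].reshape(2, 3)
image = np.array([[1, 2], [3, 4]])
scrambled = scramble_with_template(image, template)
```

Because the template is fixed in advance, the same scrambling is reproducible across runs, in contrast to a fresh random permutation.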

[0096] In some embodiments, referring to Figure 4, the step of obtaining the target sample image pair includes:

[0097] S101a, Obtain the original sample image pairs.

[0098] In this embodiment, the original sample image pairs are obtained based on a stereo camera, and each pair includes left and right images of the same scene captured by the stereo camera. The scene (target object) in the original sample image pairs can originate from service scenarios of intelligent question-answering robots in the financial field, human body modeling, or tissue and organ modeling scenarios in the medical field, etc.

[0099] S101b: Preprocess the original sample image pair to obtain the target sample image pair.

[0100] In this embodiment, in order to obtain a large number of expanded training samples, before adding noise, some preprocessing that does not damage the pixels of the target objects in the original sample image pairs can be performed, such as rotation, flipping, and boundary cropping, to obtain multiple sets of different target sample image pairs, thereby obtaining more different training samples.
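The kind of pixel-preserving preprocessing described here can be sketched as below. The sketch assumes the same transform is applied to both images of a pair, and it uses a vertical flip and a uniform border crop as examples; the function name `preprocess_pair` and its parameters are hypothetical, and any ground-truth disparity labels would need matching treatment that is not shown.

```python
import numpy as np

def preprocess_pair(left, right, flip_vertical=False, crop=0):
    """Apply the same flip and border crop to both images of a pair.

    Pixel values are not altered; only their arrangement and the image
    extent change.
    """
    if flip_vertical:
        left, right = left[::-1], right[::-1]      # reverse row order
    if crop > 0:
        left = left[crop:-crop, crop:-crop]        # trim `crop` px per side
        right = right[crop:-crop, crop:-crop]
    return left.copy(), right.copy()

left = np.arange(48).reshape(6, 8)
right = left + 100
out_l, out_r = preprocess_pair(left, right, flip_vertical=True, crop=1)
```

Running several such variants on one original pair yields several distinct target sample image pairs before any noise is added.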

[0101] Referring to Figure 5, this application also provides a training method for a binocular stereo matching model, which includes the following steps:

[0102] S201. Construct an image pair sample set, wherein the image pair sample set is constructed based on the image data enhancement method provided in any of the above embodiments.

[0103] In this embodiment, a large number of training samples for training the binocular stereo matching model are constructed based on the image data enhancement method provided in any of the above embodiments, forming an image pair sample set.

[0104] S202. Divide the image pair sample set into a training sample subset and a test sample subset.

[0105] In this embodiment, to ensure that the trained model is stable, the image pair sample set is divided into a training sample subset and a test sample subset. The training sample subset is used for the model training process. When the training result converges, i.e., the preset training constraints are met, the stability of the model is tested using the test sample subset. If the model meets the preset stability conditions, training stops and the current model is used as the target model. If the preset stability conditions are not met, the model continues to be trained using the training samples in the training sample subset until it passes the test verification.
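
The division in S202 can be sketched as a seeded random split. The text does not state a split ratio or a splitting strategy, so the 20% test fraction and the shuffling below are assumptions, and `split_samples` is a hypothetical name.

```python
import random


def split_samples(samples, test_fraction=0.2, seed=0):
    """Split a sample set into (training subset, test subset).

    Assumption: a seeded random shuffle followed by a fixed-ratio cut.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one test sample so verification is always possible.
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]
```

Fixing the seed keeps the two subsets disjoint and reproducible across training runs.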

[0106] S203. Input at least one training sample from the training sample subset into the original stereo matching model for training, and adjust the parameters of the original stereo matching model according to the preset loss function, the first threshold, and the output of the original stereo matching model during the training process until the training loss of the original stereo matching model is less than the first threshold, thereby obtaining the undetermined stereo matching model.

[0107] In this embodiment, at least one training sample from the training sample subset is input into the original stereo matching model for training. For ease of training, batch processing is typically used: a batch of sample data is input in each training iteration, and the training loss is calculated over all samples in the batch. During training, the parameters of the original stereo matching model are adjusted based on a preset loss function, a first threshold (i.e., a convergence threshold), and the output of the original stereo matching model until the training loss of the original stereo matching model is less than the first threshold, thus obtaining the undetermined stereo matching model.
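
The stopping rule in S203 (train batch by batch until the training loss falls below the first threshold) can be illustrated with a deliberately small stand-in. The sketch below is not a stereo matching network: it assumes a one-parameter model y = w * x, a mean squared error loss, and plain gradient descent, all chosen only to make the loop runnable. `train_until_converged` and its parameters are hypothetical.

```python
def train_until_converged(batches, w=0.0, lr=0.05,
                          first_threshold=1e-4, max_steps=10_000):
    """Fit y = w * x by gradient descent until the batch loss is below the threshold.

    `batches` is a list of (inputs, targets) pairs; one batch is used per step.
    Returns the final parameter and the last batch loss that was evaluated.
    """
    loss = float("inf")
    for step in range(max_steps):
        xs, ys = batches[step % len(batches)]
        preds = [w * x for x in xs]
        # Training loss computed over all samples in the current batch.
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
        if loss < first_threshold:
            break  # convergence threshold reached
        grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        w -= lr * grad
    return w, loss
```

The `max_steps` cap is an added safeguard so that the loop terminates even if the threshold is never reached.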

[0108] S204. Input at least one test sample from the subset of test samples into the undetermined binocular stereo matching model for verification, and determine whether the output of the undetermined binocular stereo matching model meets the first preset condition during the verification process.

[0109] S205. If the output of the undetermined binocular stereo matching model satisfies the first preset condition, then the undetermined binocular stereo matching model is determined as the target binocular stereo matching model.

[0110] S206. If the undetermined binocular stereo matching model does not meet the first preset condition, then the undetermined binocular stereo matching model will continue to be trained based on the training sample subset.

[0111] In this embodiment, after training the undetermined stereo matching model, its stability needs to be tested and verified. Specifically, at least one test sample from the test sample subset is input into the undetermined stereo matching model for verification. During the verification process, it is determined whether the output of the undetermined stereo matching model meets a first preset condition, that is, whether the deviation between its predicted value and the true value is within a preset range (e.g., 2%). If so, the undetermined stereo matching model is considered to have high stability, the training process can be terminated, and it is determined as the target stereo matching model for use. If not, the model continues to be trained using the training samples from the training sample subset until the test verification is passed.
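
The verification loop of S204 to S206 can be sketched as below. The text only says that the deviation between the predicted value and the true value must lie within a preset range (e.g., 2%); reading this as a per-sample relative error is an assumption, as are the names `passes_verification` and `train_and_verify` and the callables passed to them.

```python
def passes_verification(predictions, targets, max_relative_deviation=0.02):
    """Return True if every prediction is within the allowed relative deviation."""
    for pred, true in zip(predictions, targets):
        if true == 0:
            # Relative deviation is undefined for a zero target; require an exact match.
            if pred != 0:
                return False
            continue
        if abs(pred - true) / abs(true) > max_relative_deviation:
            return False
    return True


def train_and_verify(train_step, predict, test_inputs, test_targets, max_rounds=100):
    """Alternate training and verification until the test samples pass.

    Returns the number of rounds used, or None if `max_rounds` is exhausted.
    """
    for round_index in range(1, max_rounds + 1):
        train_step()  # continue training on the training sample subset
        predictions = [predict(x) for x in test_inputs]
        if passes_verification(predictions, test_targets):
            return round_index  # model accepted as the target model
    return None
```

Here `train_step` stands for one further round of training on the training sample subset, and `predict` stands for running the undetermined model on one test sample.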

[0112] The stereo matching model trained by the above method can then be used to perform the corresponding stereo matching tasks accurately and efficiently.

[0113] Referring to Figure 6, this application also provides an image data enhancement device for expanding training samples for a binocular stereo matching task, comprising:

[0114] The target sample image pair acquisition module 601 is used to acquire target sample image pairs, wherein the target sample image pairs include a first image and a second image;

[0115] The random mask generation module 602 is used to generate a first random mask based on the first image and a second random mask based on the second image, wherein the first random mask has the same size as the first image and the second random mask has the same size as the second image;

[0116] The disordered image acquisition module 603 is used to obtain a first disordered image based on the first image and to obtain a second disordered image based on the second image, wherein the first disordered image is obtained by scrambling the pixels of the first image and the second disordered image is obtained by scrambling the pixels of the second image;

[0117] The first data augmentation image acquisition module 604 is used to obtain a first data augmentation image based on the first random mask, the second disordered image, and the first image;

[0118] The second data augmentation image acquisition module 605 is used to obtain a second data augmentation image based on the second random mask, the first disordered image, and the second image;

[0119] The data augmentation sample image pair acquisition module 606 is used to combine the first data augmentation image and the second data augmentation image to obtain a data augmentation sample image pair.

[0120] In this embodiment, target sample image pairs are acquired by the target sample image pair acquisition module 601. These target sample image pairs are obtained using a binocular camera and include a first image and a second image (i.e., left and right images of the same scene captured by the binocular camera). The shooting scene (target object) in the target sample image pair can originate from service scenarios of intelligent question-answering robots in the financial field, human body modeling in the medical field, or organ modeling scenarios, etc.

[0121] In this embodiment, the random mask generation module 602 generates a first random mask for the first image in the target sample image pair and a second random mask for the second image. Masking is a common operation in deep learning, equivalent to covering the original tensor with a mask to block or select specific elements. In image processing, it is used for region-of-interest extraction, image masking, and image feature extraction. In this embodiment, the first random mask has the same size as the first image and is used to determine the pixels in the first image to which noise is to be added; the second random mask has the same size as the second image and is used to determine the pixels in the second image to which noise is to be added.

[0122] In this embodiment, both the first random mask and the second random mask are binary masks. The first random mask and the second random mask can be obtained by the random mask generation module 602 based on the following formulas:

[0123] $$M_l[p, q] = \begin{cases} 1, & \mathrm{rand}(0, 1) \le \rho \\ 0, & \mathrm{rand}(0, 1) > \rho \end{cases}$$

[0124] $$M_r[p, q] = \begin{cases} 1, & \mathrm{rand}(0, 1) \le \rho \\ 0, & \mathrm{rand}(0, 1) > \rho \end{cases}$$

[0125] Here, $M_l$ is the first random mask and $M_r$ is the second random mask; $M_l[p, q]$ characterizes the coordinates of the pixels in the first image to which noise is to be added, and $M_r[p, q]$ characterizes the coordinates of the pixels in the second image to which noise is to be added; $p$ and $q$ are pixel indices; and $\rho$ controls the proportion of noise-added pixels relative to the total number of pixels in the sample image pair.

[0126] The `rand` function generates random numbers uniformly distributed in the interval (0, 1); `rand(0, 1)` generates one such random number. When this random number is greater than $\rho$, the element at coordinate $(p, q)$ of the first or second random mask is set to 0; when it is less than or equal to $\rho$, the element is set to 1. Multiplying the first random mask element-wise with the first image (that is, multiplying elements with the same coordinates) selects the pixels in the first image to which noise is to be added. Similarly, multiplying the second random mask element-wise with the second image selects the pixels in the second image to which noise is to be added.
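
The mask rule described above (1 where a uniform draw is at most ρ, 0 otherwise) can be sketched with NumPy. Drawing the two masks independently and the helper name `random_binary_mask` are assumptions; the text does not say whether the masks share random draws.

```python
import numpy as np


def random_binary_mask(shape, rho, rng):
    """Binary mask with M[p, q] = 1 where a uniform draw is <= rho, else 0.

    Note: NumPy draws from the half-open interval [0, 1), which differs from
    the open interval (0, 1) in the text only on a set of negligible probability.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return (rng.random(shape) <= rho).astype(np.uint8)


rng = np.random.default_rng(0)
first_image = np.full((4, 6), 100, dtype=np.uint8)
first_mask = random_binary_mask(first_image.shape[:2], rho=0.3, rng=rng)
# Element-wise product: non-zero only at the pixels selected for noise.
selected_pixels = first_mask * first_image
```

On average a fraction ρ of the mask entries equals 1, so ρ directly sets the expected share of pixels that receive noise.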

[0127] In this embodiment, the disordered image acquisition module 603 also creates a copy of the target sample image pair. Denoting the target sample image pair as $(x_l, x_r)$, its copy is $(x_{l1}, x_{r1})$. The pixel order of images $x_{l1}$ and $x_{r1}$ in the copy is then shuffled, with each pixel's value unchanged before and after the rearrangement, giving the shuffled copy $(x_l', x_r')$, where $x_l'$ is the first disordered image and $x_r'$ is the second disordered image.
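
The copy-and-shuffle step can be sketched as a random permutation of pixel positions. The sketch assumes that a pixel of a multi-channel image is moved as a whole, so its channel values stay together; the text does not address colour images. `shuffle_pixels` is a hypothetical name.

```python
import numpy as np


def shuffle_pixels(image, rng):
    """Return a copy of `image` with pixel positions randomly permuted.

    Pixel values are unchanged; only their positions differ. The input
    array is left untouched because fancy indexing produces a new array.
    """
    h, w = image.shape[:2]
    flat = image.reshape(h * w, *image.shape[2:])
    return flat[rng.permutation(h * w)].reshape(image.shape)
```

Since the output holds exactly the same multiset of pixel values as the input, its intensity histogram is identical to that of the original image.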

[0128] In this embodiment, the first data augmentation image acquisition module 604 uses the second disordered image, which is derived from the second image in the target sample image pair, as the noise source, and uses the first random mask to determine the pixels in the first image to which noise is to be added. Adding the noise to the first image yields the first data-enhanced image.

[0129] In this embodiment, the second data augmentation image acquisition module 605 uses the first disordered image, which is derived from the first image in the target sample image pair, as the noise source, and uses the second random mask to determine the pixels in the second image to which noise is to be added. Adding the noise to the second image yields the second data-enhanced image.

[0130] Because the noise in the first and second data-enhanced images is drawn from the target sample image pair itself, it is closer to the noise distribution that the target sample image pair exhibits in the real world than purely hand-defined noise would be.

[0131] In this embodiment, the data augmentation sample image acquisition module 606 combines the first data augmentation image and the second data augmentation image to obtain a data augmentation sample image pair. In other embodiments, multiple sets of different data augmentation sample image pairs can be constructed by adjusting the first and second random masks and the first and second disordered images to expand the training sample set.

[0132] In some embodiments, the first data-enhanced image and the second data-enhanced image described above are obtained by the first data augmentation image acquisition module 604 and the second data augmentation image acquisition module 605, respectively, based on the following formulas:

[0133]

[0134]

[0135] Here, $x_l$ is the first image, $x_r$ is the second image, $x_l'$ is the first disordered image, $x_r'$ is the second disordered image, $M_l$ is the first random mask, $M_r$ is the second random mask, and $\lambda$ is the noise scaling factor; $\odot$ denotes multiplication of the elements at the same coordinate position in two matrices.

[0136] The data-enhanced image pair can be obtained using the above formula. By adjusting the first and second random masks, the first and second disordered images, and the noise scaling factor λ, multiple sets of different data augmentation pairs can be obtained, thereby expanding the training samples.
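
The exact combining formulas are not reproduced in this text, so the following is only a sketch under an assumed additive form: the first data-enhanced image is taken as x_l + λ · (M_l ⊙ x_r′) and the second as x_r + λ · (M_r ⊙ x_l′). This form is an assumption chosen to be consistent with the symbol definitions above, not a statement of the patented formula. The sketch also assumes single-channel images of equal size; `augment_pair` and its defaults are hypothetical.

```python
import numpy as np


def augment_pair(x_l, x_r, rho=0.1, lam=0.2, seed=0):
    """Build one data-enhanced pair under an ASSUMED additive noise model.

    aug_l = x_l + lam * (M_l * x_r_shuffled)
    aug_r = x_r + lam * (M_r * x_l_shuffled)
    """
    x_l = np.asarray(x_l, dtype=np.float64)
    x_r = np.asarray(x_r, dtype=np.float64)
    if x_l.shape != x_r.shape:
        raise ValueError("both images must have the same shape")
    rng = np.random.default_rng(seed)
    # Binary masks: 1 marks a pixel that receives noise.
    m_l = (rng.random(x_l.shape) <= rho).astype(np.float64)
    m_r = (rng.random(x_r.shape) <= rho).astype(np.float64)
    # Disordered images: same pixel values, shuffled positions.
    x_l_shuffled = rng.permutation(x_l.reshape(-1)).reshape(x_l.shape)
    x_r_shuffled = rng.permutation(x_r.reshape(-1)).reshape(x_r.shape)
    # Cross use: noise for each view comes from the other view's disordered image.
    aug_l = x_l + lam * m_l * x_r_shuffled
    aug_r = x_r + lam * m_r * x_l_shuffled
    return aug_l, aug_r
```

Calling `augment_pair` with different values of `rho`, `lam`, or `seed` yields different data-enhanced pairs from the same target sample image pair, which is how the training sample set is expanded.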

[0137] In some embodiments, the disordered image acquisition module 603 includes a scrambling template acquisition unit and a disordered image acquisition unit. The scrambling template acquisition unit is used to acquire a preset scrambling template, and the disordered image acquisition unit is used to fill the pixel values of the first image into the preset scrambling template according to a preset rule to obtain a first disordered image, and to fill the pixel values of the second image into the preset scrambling template according to the preset rule to obtain a second disordered image.

[0138] In this embodiment, a preset scrambling template can be obtained through a scrambling template acquisition unit to scramble the pixel positions of the first and second images. The scrambling template is set according to actual needs. It should be noted that the size of the scrambling template matches the size of the image to be scrambled. When the size of the image to be scrambled is smaller than the size of the scrambling template, the boundaries of the image to be scrambled need to be patched (e.g., padded with 0s outside its boundaries) to fit the scrambling template.

[0139] In this embodiment, the disordered image acquisition unit combines the first image and the second image with the scrambling template: the pixel values of the first image and the pixel values of the second image are filled into the selected scrambling template according to the preset rule, yielding the first disordered image and the second disordered image.

[0140] In some embodiments, the target sample image pair acquisition module 601 includes an original sample image pair acquisition unit and a preprocessing unit. The original sample image pair acquisition unit is used to acquire original sample image pairs, and the preprocessing unit is used to preprocess the original sample image pairs to obtain target sample image pairs. In this embodiment, the original sample image pair is acquired by the original sample image pair acquisition unit. The original sample image pair is obtained based on a stereo camera, and the original sample image pair includes left and right images of the same scene captured by the stereo camera. The shooting scene (target object) in the original sample image pair can come from the service scene of intelligent question-answering robots in the financial field, human body modeling and tissue organ modeling scenes in the medical field, etc. In this embodiment, in order to obtain a lot of expanded training samples, before adding noise, the preprocessing unit also performs some non-destructive preprocessing on the original sample image pairs, such as rotation, flipping, and boundary cropping, to obtain multiple different target sample image pairs, thereby further obtaining more different training samples.

[0141] It is understood that each component of the image data enhancement apparatus proposed in this application can realize the function of any of the image data enhancement methods provided in any of the above embodiments, and the specific structure will not be described in detail.

[0142] Referring to Figure 7, this embodiment also provides a training device for a binocular stereo matching model, comprising:

[0143] The image pair sample set construction module 701 is used to construct an image pair sample set, wherein the image pair sample set is constructed based on the image data enhancement method provided in any of the above embodiments;

[0144] The sample subset partitioning module 702 is used to divide the image pair sample set into a training sample subset and a test sample subset;

[0145] The model training module 703 is used to input at least one training sample from the training sample subset into the original stereo matching model for training, and to adjust the parameters of the original stereo matching model according to the preset loss function, the first threshold, and the output of the original stereo matching model during the training process, until the training loss of the original stereo matching model is less than the first threshold, thereby obtaining the undetermined stereo matching model;

[0146] The model testing module 704 is used to input at least one test sample from the test sample subset into the undetermined binocular stereo matching model for verification, and to determine whether the output of the undetermined binocular stereo matching model meets a first preset condition during the verification process; and to determine the undetermined binocular stereo matching model as the target binocular stereo matching model when the output of the undetermined binocular stereo matching model meets the first preset condition.

[0147] In this embodiment, the image pair sample set construction module 701 uses the image data enhancement method provided in any of the above embodiments to construct a large number of training samples for training the binocular stereo matching model, thus forming the image pair sample set.

[0148] In this embodiment, to ensure that the trained model is stable, the sample subset partitioning module 702 further divides the image pair sample set into a training sample subset and a test sample subset. The training sample subset is used for the model training process. When the training result converges, that is, when the preset training constraints are met, the stability of the model is tested using the test sample subset. If the model meets the preset stability conditions, training stops and the current model is used as the target model. If the preset stability conditions are not met, the model continues to be trained using the training samples in the training sample subset until it passes the test verification.

[0149] In this embodiment, the model training module 703 inputs at least one training sample from the training sample subset into the original stereo matching model for training. For ease of training, batch processing is typically used: a batch of sample data is input in each training iteration, and the training loss is calculated over all samples in the batch. During training, the parameters of the original stereo matching model are adjusted based on a preset loss function, a first threshold (i.e., a convergence threshold), and the output of the original stereo matching model until the training loss of the original stereo matching model is less than the first threshold, thus obtaining the undetermined stereo matching model.

[0150] In this embodiment, at least one test sample from the test sample subset is input into the undetermined stereo matching model for verification via the model testing module 704. During the verification process, it is determined whether the output of the undetermined stereo matching model meets the first preset condition, that is, whether the deviation between its predicted value and the true value is within a preset range (e.g., 2%). If so, the undetermined stereo matching model is considered to have high stability, the training process can be terminated, and it is determined as the target stereo matching model for use. If not, the model continues to be trained using the training samples in the training sample subset until the test verification is passed.

[0151] Understandably, each component of the training device for the binocular stereo matching model proposed in this application can realize the function of any of the binocular stereo matching model training methods provided in any of the above embodiments, and the specific structure will not be described in detail.

[0152] Referring to Figure 8, this application also provides a computer device, which includes a processor, a memory, a network interface, and a database connected via a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for running the operating system and the computer programs stored in the non-volatile storage medium. The database of the computer device stores image pairs for binocular stereo matching tasks. The network interface of the computer device is used to communicate with external terminals via a network connection.

[0153] Furthermore, the aforementioned computer device may also be equipped with an input device and a display screen, etc. When the aforementioned computer program is executed by a processor to implement an image data enhancement method, it includes the following steps: acquiring a target sample image pair, the target sample image pair including a first image and a second image; generating a first random mask based on the first image and generating a second random mask based on the second image, wherein the first random mask has the same size as the first image and the second random mask has the same size as the second image; obtaining a first disordered image based on the first image and obtaining a second disordered image based on the second image, wherein the first disordered image is obtained by shuffling the pixels of the first image and the second disordered image is obtained by shuffling the pixels of the second image; obtaining a first data-enhanced image based on the first random mask, the second disordered image and the first image; obtaining a second data-enhanced image based on the second random mask, the first disordered image and the second image; and combining the first data-enhanced image and the second data-enhanced image to obtain a data-enhanced sample image pair.

[0154] Those skilled in the art will understand that the structure shown in Figure 8 is merely a block diagram of the portion of the structure related to the present application and does not limit the computer device to which the present application is applied.

[0155] This application embodiment also provides a computer-readable storage medium storing a computer program thereon. When the computer program is executed by a processor, it implements an image data enhancement method, including the following steps: acquiring a target sample image pair, the target sample image pair including a first image and a second image; generating a first random mask based on the first image and generating a second random mask based on the second image, wherein the first random mask has the same size as the first image, and the second random mask has the same size as the second image; obtaining a first disordered image based on the first image and obtaining a second disordered image based on the second image, wherein the first disordered image is obtained by shuffling pixels in the first image, and the second disordered image is obtained by shuffling pixels in the second image; obtaining a first data-enhanced image based on the first random mask, the second disordered image, and the first image; obtaining a second data-enhanced image based on the second random mask, the first disordered image, and the second image; and combining the first data-enhanced image and the second data-enhanced image to obtain a data-enhanced sample image pair. It is understood that the computer-readable storage medium in this embodiment can be a volatile readable storage medium or a non-volatile readable storage medium.

[0156] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the related hardware. The computer program can be stored in a computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, databases, or other media provided in this application and used in the embodiments can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

[0157] It should be noted that, in this document, the terms "comprising," "including," or any other variations are intended to cover non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, apparatus, article, or method. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, apparatus, article, or method that includes that element.

[0158] The above description is only a preferred embodiment of this application and does not limit the patent scope of this application. Any equivalent structural or procedural changes made based on the content of this application's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of this application.

Claims

1. An image data enhancement method for expanding training samples for a binocular stereo matching task, characterized by comprising: acquiring a target sample image pair, wherein the target sample image pair includes a first image and a second image; generating a first random mask based on the first image and a second random mask based on the second image, wherein the first random mask has the same size as the first image, the second random mask has the same size as the second image, and both the first random mask and the second random mask are binary masks used to determine the pixels in the corresponding image to which noise is to be added; obtaining a first disordered image based on the first image and a second disordered image based on the second image, wherein the first disordered image is obtained by scrambling the pixels of the first image and the second disordered image is obtained by scrambling the pixels of the second image; obtaining a first data-enhanced image based on the first random mask, the second disordered image, and the first image, that is, generating an enhanced image by selecting, through the first random mask, the pixels of the first image to which noise is to be added and combining them with the noise information of the second disordered image; obtaining a second data-enhanced image based on the second random mask, the first disordered image, and the second image, that is, generating an enhanced image by selecting, through the second random mask, the pixels of the second image to which noise is to be added and combining them with the noise information of the first disordered image; and combining the first data-enhanced image and the second data-enhanced image to obtain a data-enhanced sample image pair.

2. The image data enhancement method of claim 1, wherein the first random mask and the second random mask are obtained by the following formulas: $M_l[p, q] = \begin{cases} 1, & \mathrm{rand}(0, 1) \le \rho \\ 0, & \mathrm{rand}(0, 1) > \rho \end{cases}$, $M_r[p, q] = \begin{cases} 1, & \mathrm{rand}(0, 1) \le \rho \\ 0, & \mathrm{rand}(0, 1) > \rho \end{cases}$, wherein $M_l$ is the first random mask, $M_r$ is the second random mask, $M_l[p, q]$ represents the pixel coordinates in the first image to which noise is to be added, $M_r[p, q]$ represents the pixel coordinates in the second image to which noise is to be added, $p$ and $q$ are pixel indices, $\mathrm{rand}(0, 1)$ is a function for generating a uniformly distributed random number in the interval (0, 1), and $\rho$ is the proportion of noise-added pixels to the total number of pixels in the sample image pair. The value range of is 0 < p < 1. The value range of is 0 < q < 1.

3. The image data enhancement method of claim 1, wherein the first data-enhanced image and the second data-enhanced image are obtained by the following formula: , , wherein $x_l$ is the first image, $x_r$ is the second image, $x_l'$ is the first disordered image, $x_r'$ is the second disordered image, $M_l$ is the first random mask, $M_r$ is the second random mask, and $\lambda$ is the noise scaling factor.

4. The image data enhancement method of claim 1, wherein the step of obtaining a first disordered image based on the first image and obtaining a second disordered image based on the second image includes: acquiring a preset scrambling template; and filling the pixel values of the first image into the preset scrambling template according to a preset rule to obtain the first disordered image, and filling the pixel values of the second image into the preset scrambling template according to the preset rule to obtain the second disordered image.

5. The image data enhancement method of claim 1, wherein the step of obtaining the target sample image pair includes: obtaining an original sample image pair; and preprocessing the original sample image pair to obtain the target sample image pair.

6. A method for training a binocular stereo matching model, characterized by comprising: constructing an image pair sample set, wherein the image pair sample set is constructed based on the image data enhancement method according to any one of claims 1-5; dividing the image pair sample set into a training sample subset and a test sample subset; inputting at least one training sample from the training sample subset into the original stereo matching model for training, and adjusting the parameters of the original stereo matching model during the training process according to the preset loss function, the first threshold, and the output of the original stereo matching model until the training loss of the original stereo matching model is less than the first threshold, thereby obtaining the undetermined stereo matching model; inputting at least one test sample from the test sample subset into the undetermined binocular stereo matching model for verification, and determining during the verification process whether the output of the undetermined binocular stereo matching model meets the first preset condition; if the output of the undetermined binocular stereo matching model meets the first preset condition, determining the undetermined binocular stereo matching model as the target binocular stereo matching model; and if the output of the undetermined binocular stereo matching model does not meet the first preset condition, continuing to train the undetermined binocular stereo matching model based on the training sample subset.

7. An image data augmentation device for training sample expansion of a binocular stereo matching task, characterized by comprising: a target sample image pair acquisition module, used to acquire a target sample image pair, wherein the target sample image pair includes a first image and a second image; a random mask generation module, used to generate a first random mask based on the first image and a second random mask based on the second image, wherein the first random mask has the same size as the first image, the second random mask has the same size as the second image, and both the first random mask and the second random mask are binary masks used to determine the pixels in the corresponding image to which noise is to be added; a disordered image acquisition module, used to obtain a first disordered image based on the first image and a second disordered image based on the second image, wherein the first disordered image is obtained by scrambling the pixels of the first image and the second disordered image is obtained by scrambling the pixels of the second image; a first data augmentation image acquisition module, used to obtain a first data augmentation image based on the first random mask, the second disordered image and the first image, that is, to generate an augmented image by selecting, through the first random mask, the pixels of the first image to which noise is to be added and combining them with the noise information of the second disordered image; a second data augmentation image acquisition module, used to obtain a second data augmentation image based on the second random mask, the first disordered image and the second image, that is, to generate an augmented image by selecting, through the second random mask, the pixels of the second image to which noise is to be added and combining them with the noise information of the first disordered image; and a data augmentation sample image pair acquisition module, used to combine the first data augmentation image and the second data augmentation image to obtain a data augmentation sample image pair.
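
The module chain of claim 7 can be sketched with NumPy as below. This is an illustration under stated assumptions, not the claimed implementation: the function names, the `noise_ratio` parameter, and the pixel-replacement rule in `blend` (masked pixels are taken from the disordered image, all others from the original) are hypothetical choices, since the claim only states that the mask selects the pixels to receive noise and that the noise comes from the other view's disordered image. Both images are assumed to have the same shape.

```python
import numpy as np


def random_binary_mask(image, noise_ratio, rng):
    """Binary mask with the image's height and width; True marks a noise pixel."""
    h, w = image.shape[:2]
    return rng.random((h, w)) < noise_ratio


def shuffle_pixels(image, rng):
    """Disordered image: the same pixels as `image`, in random spatial order."""
    h, w = image.shape[:2]
    flat = image.reshape(h * w, -1)  # one row per pixel, channels kept together
    return flat[rng.permutation(h * w)].reshape(image.shape)


def blend(image, noise_source, mask):
    """Assumed combination rule: masked pixels come from `noise_source`."""
    if image.ndim == 3:
        mask = mask[..., None]  # broadcast the mask over the channel axis
    return np.where(mask, noise_source, image)


def augment_pair(first, second, noise_ratio=0.05, seed=None):
    """Return a data augmentation sample image pair for one stereo pair."""
    rng = np.random.default_rng(seed)
    mask1 = random_binary_mask(first, noise_ratio, rng)
    mask2 = random_binary_mask(second, noise_ratio, rng)
    disordered1 = shuffle_pixels(first, rng)
    disordered2 = shuffle_pixels(second, rng)
    # Cross-wise combination: each view receives noise from the other view.
    augmented1 = blend(first, disordered2, mask1)
    augmented2 = blend(second, disordered1, mask2)
    return augmented1, augmented2
```

Because the noise pixels are drawn from the pair itself rather than from a hand-picked distribution, the injected values stay within the intensity statistics of the sample set, which is the motivation given in the abstract.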

8. An apparatus for training a binocular stereo matching model, characterized by comprising: an image pair sample set construction module, used to construct an image pair sample set, wherein the image pair sample set is constructed based on the image data augmentation method according to any one of claims 1-5; a sample subset partitioning module, used to divide the image pair sample set into a training sample subset and a test sample subset; a model training module, used to input at least one training sample from the training sample subset into an original stereo matching model for training, and to adjust the parameters of the original stereo matching model according to a preset loss function, a first threshold and the output of the original stereo matching model during the training process, until the training loss of the original stereo matching model is less than the first threshold, thereby obtaining an undetermined binocular stereo matching model; and a model testing module, used to input at least one test sample from the test sample subset into the undetermined binocular stereo matching model for verification, to determine, during the verification process, whether the output of the undetermined binocular stereo matching model meets a first preset condition, and to determine the undetermined binocular stereo matching model as the target binocular stereo matching model when the output of the undetermined binocular stereo matching model meets the first preset condition.

9. A computer device, comprising a memory and a processor, wherein the memory stores a computer program and the computer device is configured to perform the method according to any one of claims 1-8, characterized in that, when the processor executes the computer program, it implements the steps of the image data enhancement method according to any one of claims 1-5.

10. A computer-readable storage medium having stored thereon a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the image data enhancement method according to any one of claims 1-5.