Picture generation method and device, computer device and storage medium

By performing instance segmentation, enhancement, and blurring on the image set, high-quality target images are generated, which solves the problem of low dataset diversity in existing technologies and improves the accuracy of the robot cleaning recognition neural network.

CN116597252B · Active · Publication Date: 2026-06-19 · SHENZHEN PUDU TECH CO LTD +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SHENZHEN PUDU TECH CO LTD
Filing Date: 2023-05-19
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing deep learning-based dirt recognition methods require large amounts of high-quality datasets for training, and the data generated by existing methods has low diversity, resulting in low accuracy of the robot cleaning recognition neural network.

Method used

By acquiring the set of images to be processed and the initial set of background images, instance segmentation is performed. Background images and foreground targets are randomly selected, and enhancement and blurring are performed based on scene type to generate high-quality target images, ensuring diversity and realism.

Benefits of technology

The method improves the accuracy of the robot's cleaning recognition neural network in identifying dirt and grime, enhances the diversity and quality of the dataset, and allows the generated dataset to be applied to a wider range of fields.

Generated by Eureka AI based on patent content.


Abstract

This application relates to an image generation method, apparatus, computer device, and storage medium. The method includes: performing instance segmentation processing on a set of images to be processed to obtain a target foreground set; randomly selecting a current background image from an initial background image set, detecting and obtaining the target scene type corresponding to the current background image; randomly selecting a current foreground target from the target foreground set, determining a target enhancement processing method based on the target scene type, and performing the target enhancement processing method to obtain an enhanced foreground target; performing background enhancement processing on the current background image to obtain an enhanced background image, and randomly pasting the enhanced foreground target into the enhanced background image to obtain the current image; determining a target blurring processing method based on the target scene type, and performing the target blurring processing method on the current image to generate a target image. This method can improve the data quality of images used for training the cleaning recognition neural network for robots.

Description

Technical Field

[0001] This application relates to the field of computer technology, and in particular to a method, apparatus, computer device, and storage medium for generating images.

Background Technology

[0002] With the rapid development of artificial intelligence technology, deep learning-based dirt recognition methods offer better generalization, higher detection accuracy, and more flexible deployment than traditional recognition methods such as digital image feature recognition, which is of great significance for improving robot cleaning efficiency and ensuring cleaning quality. However, deep learning-based methods usually require a large amount of high-quality data to train the neural network. Currently, manually labeling training data requires a lot of time and labor, introduces large errors and inconsistent labeling quality, and makes it difficult to produce large-scale, high-quality training data.

[0003] Furthermore, while existing methods can identify and locate dirt based on deep learning, the datasets used to train the neural network are all real data collected and labeled by the robot. Collecting and labeling these datasets is costly, and the data produced by existing methods has low diversity and low image quality. As a result, a robot's cleaning recognition neural network trained on such low-quality image datasets identifies dirt with low accuracy.

Summary of the Invention

[0004] Therefore, it is necessary to provide an image generation method, apparatus, computer equipment, and storage medium that can quickly and efficiently generate high-quality datasets to address the aforementioned technical problems, thereby improving the accuracy of the robot's corresponding cleaning recognition neural network in identifying dirt and debris.

[0005] An image generation method, the method comprising:

[0006] Obtain a set of images to be processed and an initial set of background images. Perform instance segmentation on the images in the set of images to be processed to obtain a target foreground set. The images to be processed include the foreground subject to be cleaned.

[0007] Randomly select a current background image from the initial background image set, detect the scene type corresponding to the current background image, and obtain the target scene type;

[0008] Randomly select a current foreground target from the target foreground set, determine the target enhancement processing method corresponding to the current foreground target based on the target scene type, and execute the operation corresponding to the target enhancement processing method to obtain the enhanced foreground target;

[0009] The current background image is enhanced to obtain an enhanced background image. The enhanced foreground object is then randomly pasted into the enhanced background image to obtain the current image.

[0010] Based on the target scene type, a target blurring processing method is determined, and the current image is subjected to the operation corresponding to the target blurring processing method to generate a target image.

[0011] In one embodiment, before obtaining the set of images to be processed, the method further includes:

[0012] Obtain the original image set captured by the camera, and calculate the distortion parameters and transformation matrix corresponding to each image in the original image set;

[0013] Based on the distortion parameters and transformation matrix, distortion correction processing is performed on the images in the original image set to obtain the image set to be processed.

[0014] In one embodiment, a set of images to be processed is obtained, and instance segmentation processing is performed on the images in the set of images to be processed to obtain a target foreground set, including:

[0015] Polygon annotation is performed on each element corresponding to each image in the image set to be processed to obtain the target polygon corresponding to each image;

[0016] Based on the target polygon, a cutout operation is performed on the corresponding image to obtain the target foreground set.

[0017] In one embodiment, detecting the scene type corresponding to the current background image and obtaining the target scene type includes:

[0018] Obtain the grayscale value and total number of pixels corresponding to each pixel in the current background image;

[0019] Based on the gray value corresponding to each pixel and the total number of pixels, calculate the probability of each gray level in the current background image;

[0020] Based on the gray level probabilities corresponding to each gray level, calculate the average gray level of the current background image;

[0021] Based on the grayscale value, grayscale level probability, and grayscale average value, calculate the grayscale variance value corresponding to the current background image;

[0022] Based on the average grayscale value and the grayscale variance value, the target scene type corresponding to the current background image is determined.
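
The grayscale statistics in the steps above can be sketched as follows. With n_i the number of pixels at gray level i and N the total number of pixels, the sketch computes p(i) = n_i / N, the mean as the sum of i * p(i), and the variance as the sum of (i - mean)^2 * p(i). It is only an illustration: the function name `gray_statistics`, the use of NumPy, and the assumption of an 8-bit image with 256 gray levels are not taken from the patent text.

```python
import numpy as np


def gray_statistics(gray: np.ndarray, levels: int = 256):
    """Return (level probabilities, mean gray value, gray variance).

    Assumes `gray` is a 2-D array of unsigned integers in [0, levels).
    """
    total = gray.size                                      # total number of pixels N
    counts = np.bincount(gray.ravel(), minlength=levels)   # n_i for each gray level i
    prob = counts / total                                  # p(i) = n_i / N
    values = np.arange(levels)
    mean = float(np.sum(values * prob))                    # sum_i i * p(i)
    variance = float(np.sum((values - mean) ** 2 * prob))  # sum_i (i - mean)^2 * p(i)
    return prob, mean, variance
```

A half-black, half-white image gives a mean at mid-gray and a large variance, while a flat, evenly lit floor gives a variance near zero, which is what makes the two statistics usable as scene cues.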

[0023] In one embodiment, determining the target scene type corresponding to the current background image based on the average grayscale value and the grayscale variance value includes:

[0024] Obtain a variance threshold; when the grayscale variance value is less than the variance threshold, use the first scene type as the target scene type.

[0025] When the grayscale variance value is greater than the variance threshold, the second scene type is taken as the target scene type;

[0026] When the average grayscale value meets the third scene condition, the third scene type is taken as the target scene type.
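
The decision rule above might look like the following sketch. The text does not state the threshold values, the exact third scene condition, or which rule takes precedence, so everything concrete here is an assumption: the third condition is modelled as "average gray value below a darkness threshold", it is checked first, a variance exactly equal to the threshold is assigned to the second scene type, and the default thresholds are placeholders.

```python
from enum import Enum


class SceneType(Enum):
    FIRST = "first"    # variance below the variance threshold
    SECOND = "second"  # variance at or above the variance threshold
    THIRD = "third"    # average gray value meets the third scene condition


def classify_scene(mean: float, variance: float,
                   variance_threshold: float = 1500.0,
                   dark_mean_threshold: float = 60.0) -> SceneType:
    """Map (mean, variance) of a background image to a scene type.

    Both thresholds are hypothetical placeholder values.
    """
    # Assumed third scene condition: the image is dark overall.
    if mean < dark_mean_threshold:
        return SceneType.THIRD
    if variance < variance_threshold:
        return SceneType.FIRST
    return SceneType.SECOND
```

In practice the thresholds would have to be tuned on real background images; the sketch only shows how the two statistics could drive a three-way decision.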

[0027] In one embodiment, determining the target enhancement processing method corresponding to the current foreground target based on the target scene type includes:

[0028] Based on the correspondence between scene type and foreground target enhancement processing method, the target enhancement processing method corresponding to the current foreground target is determined from the candidate foreground enhancement processing methods. The candidate foreground enhancement processing methods include size dimension enhancement processing method, pixel dimension enhancement processing method and grayscale dimension enhancement processing method.

[0029] In one embodiment, determining the target blurring method based on the target scene type includes:

[0030] Based on the correspondence between scene type and blurring processing method, the target blurring processing method is determined from the candidate blurring processing methods, which include a uniform distribution blurring processing method, a Gaussian distribution blurring processing method, and a nonlinear distribution blurring processing method.

[0031] An image generation apparatus, the apparatus comprising:

[0032] The acquisition and segmentation module is used to acquire a set of images to be processed and an initial set of background images, perform instance segmentation processing on the images in the set of images to be processed to obtain a target foreground set, wherein the images to be processed include the foreground subject to be cleaned;

[0033] The detection module is used to randomly select the current background image from the initial background image set, detect the scene type corresponding to the current background image, and obtain the target scene type;

[0034] The enhancement module is used to randomly select a current foreground target from the target foreground set, determine the target enhancement processing method corresponding to the current foreground target based on the target scene type, and execute the operation corresponding to the target enhancement processing method to obtain the enhanced foreground target;

[0035] The paste module is used to perform background enhancement processing on the current background image to obtain an enhanced background image, and randomly paste the enhanced foreground target into the enhanced background image to obtain the current image;

[0036] The blur module is used to determine the target blur processing method based on the target scene type, perform the operation corresponding to the target blur processing method on the current image, and generate the target image.

[0037] A computer device includes a memory and a processor, the memory storing a computer program, and the processor executing the computer program to implement the steps of the above-described image generation method.

[0038] A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the above-described image generation method.

[0039] The aforementioned image generation method, apparatus, computer equipment, and storage medium acquire a set of images to be processed and an initial set of background images. They then perform instance segmentation on the images in the set of images to be processed to obtain a target foreground set, where the images to be processed include foreground subjects to be cleaned. Next, a current background image is randomly selected from the initial set of background images, and the scene type corresponding to the current background image is detected to obtain a target scene type. A current foreground target is randomly selected from the target foreground set, and based on the target scene type, a target enhancement processing method corresponding to the current foreground target is determined. The operation corresponding to the target enhancement processing method is executed to obtain an enhanced foreground target. Background enhancement processing is performed on the current background image to obtain an enhanced background image. The enhanced foreground target is randomly pasted into the enhanced background image to obtain a current image. Finally, based on the target scene type, a target blurring processing method is determined, and the operation corresponding to the target blurring processing method is performed on the current image to generate a target image. By performing instance segmentation on the image set to be processed, the final generated target images can be applied to a wider range of fields, no longer limited to the data application of the robot cleaning recognition neural network in a single field. Through multiple random selection processes, the diversity of generated target images is ensured. Furthermore, by distinguishing scene types, more targeted enhancement and blurring processing is applied to the images, making the target images corresponding to different scenes closer to the real situation and obtaining higher quality target images. 
In this way, the image set of multiple generated target images can be integrated and used in the training of the robot cleaning recognition neural network, which improves the accuracy of the robot's corresponding cleaning recognition neural network in identifying dirt and debris to a certain extent.

Attached Figure Description

[0040] Figure 1 is an application environment diagram of the image generation method in one embodiment;

[0041] Figure 2 is a flowchart of the image generation method in one embodiment;

[0042] Figure 3 is a schematic diagram of the distortion correction process in one embodiment;

[0043] Figure 4 is a flowchart of instance segmentation in one embodiment;

[0044] Figure 5 is a schematic diagram of the process for calculating the grayscale value of an image in one embodiment;

[0045] Figure 6 is a schematic diagram of the scene determination process in one embodiment;

[0046] Figure 7 is a schematic diagram of the robot's dirty image synthesis process in one embodiment;

[0047] Figure 8 is a schematic diagram of the overall process of generating a robot's dirty image in one embodiment;

[0048] Figure 9 is a structural block diagram of an image generation apparatus in one embodiment;

[0049] Figure 10 is a diagram of the internal structure of the robot in one embodiment;

[0050] Figure 11 is an internal structural diagram of a computer device in one embodiment.

Detailed Implementation

[0051] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0052] The image generation method provided in the embodiments of this application can be applied in the application environment shown in Figure 1, in which robot 102 communicates with server 104 via a network. A data storage system can store the data that server 104 needs to process. The data storage system can be integrated into server 104 or placed on a cloud or other network server. Server 104 is used to acquire a set of images to be processed and an initial set of background images captured by a camera (which can be an RGB camera). It performs instance segmentation processing on the images in the set of images to be processed to obtain a target foreground set; the images to be processed include the foreground subject to be cleaned. It randomly selects a current background image from the initial background image set, detects the scene type corresponding to the current background image, and obtains the target scene type. It randomly selects a current foreground target from the target foreground set, determines the target enhancement processing method corresponding to the current foreground target based on the target scene type, and executes the operation corresponding to the target enhancement processing method to obtain an enhanced foreground target. It performs background enhancement processing on the current background image to obtain an enhanced background image, and randomly pastes the enhanced foreground target into the enhanced background image to obtain the current image. Based on the target scene type, it determines a target blurring processing method and performs the operation corresponding to the target blurring processing method on the current image to generate the target image. Robot 102 is used to deploy a cleaning recognition neural network trained on an image set assembled from multiple target images. Robot 102 can be a cleaning robot or another type of service robot with active cleaning capabilities. 
Server 104 can be implemented using a standalone server or a server cluster consisting of multiple servers.

[0053] In one embodiment, as shown in Figure 2, an image generation method is provided. Taking its application to the server in Figure 1 as an example, the method includes the following steps:

[0054] Step S200: Obtain the image set to be processed and the initial background image set. Perform instance segmentation processing on the images in the image set to be processed to obtain the target foreground set. The image set to be processed is obtained by the robot taking pictures of the preset location. The images to be processed include the foreground subject to be cleaned.

[0055] The "image set to be processed" refers to the collection of images used to generate the foreground targets. These images can be taken in various indoor and outdoor locations using visual sensors on robots or other devices; they depict dirty environments and have not been annotated or had their backgrounds removed. For ease of description, this application uses the example of a dirty indoor environment (such as a shopping mall, supermarket, office, or parking lot) captured through a robot's visual sensor. The "initial background image" refers to the initial image used as the background of the final generated dirty image (target image). These images are not enhanced in any way and are closest to the background in a real dirty scene. Instance segmentation is one of the four classic visual tasks. It combines the characteristics of semantic segmentation (pixel-level classification) with some features of object detection (localizing different instances, even those of the same class). However, unlike object detection, which outputs bounding boxes and categories, instance segmentation outputs the mask and category of each object. In simpler terms, segmentation here uses polygons to extract the target. For example, in an image of a leaf, points are placed in sequence along the edge of the leaf to form a closed polygon (there is no limit on the number of points, and more points follow the edge more closely). All pixels within the polygon are automatically selected, and a label (leaf) is assigned to that region. This automatically generates a foreground object annotation file, thus achieving foreground object segmentation and annotation. The target foreground set refers to the collection of foreground objects used to generate the dirty image (target image). It can be obtained by performing instance segmentation on images in the image set to be processed. 
The resulting foreground object annotations contain various information, such as the size, type, and location of the annotation points. The foreground subject refers to the objects that appear in the image. These objects are the dirt that the robot needs to clean up, such as the leaves in the example above. The foreground object is also the foreground subject.
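
The polygon-based extraction described above can be illustrated with a small sketch. It rasterizes a closed polygon into a pixel mask using the even-odd (ray casting) rule and crops the labelled foreground. The function names `polygon_mask` and `cut_out`, the dictionary layout of the result, and the choice to test pixel centers are assumptions made for illustration; the patent text does not specify how the pixels inside the polygon are selected.

```python
import numpy as np


def polygon_mask(height: int, width: int, polygon) -> np.ndarray:
    """Boolean mask of pixels whose centers lie inside the closed polygon.

    `polygon` is a sequence of (x, y) vertices listed in order along the edge.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5                  # pixel centers
    inside = np.zeros((height, width), dtype=bool)
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]            # wrap around to close the polygon
        if y1 == y2:
            continue                             # horizontal edge: never crossed by a horizontal ray
        crosses = (y1 > py) != (y2 > py)         # edge spans the pixel's row
        x_at_y = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (px < x_at_y)        # toggle on each crossing to the right
    return inside


def cut_out(image: np.ndarray, polygon, label: str) -> dict:
    """Crop the labelled foreground and its mask to the polygon's bounding box."""
    mask = polygon_mask(image.shape[0], image.shape[1], polygon)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("polygon covers no pixel centers")
    top, bottom = int(ys.min()), int(ys.max()) + 1
    left, right = int(xs.min()), int(xs.max()) + 1
    return {
        "label": label,
        "pixels": image[top:bottom, left:right].copy(),
        "mask": mask[top:bottom, left:right].copy(),
        "bbox": (left, top, right, bottom),
    }
```

For the leaf example, `cut_out(image, leaf_points, "leaf")` would return the cropped pixels, the mask, and a bounding box, which together correspond to the kind of information the annotation is said to carry (size, type, and location).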

[0056] Specifically, before performing the relevant image generation operations, it is necessary to create foreground and background images for image generation. The foreground objects are mainly obtained by acquiring the set of images to be processed and performing instance segmentation on the set. The resulting foreground objects are the foreground subjects extracted from the images to be processed. These foreground subjects are labeled with corresponding information, providing a data foundation for subsequent processes. Furthermore, both the images to be processed and the background images can be obtained through the robot, specifically image data related to the robot cleaning indoor dirt.

[0057] Step S202: Randomly select the current background image from the initial background image set, detect the scene type corresponding to the current background image, and obtain the target scene type.

[0058] Here, "current background image" refers to the background image used to synthesize the target image in the current step. This background image has not yet undergone any enhancement processing and is most similar to the environment in which the robot captured the background image. "Target scene type" refers to the scene corresponding to the current background image, which can be an open scene, a cluttered scene, or a dark scene, etc.

[0059] Specifically, to better reproduce more realistic images and improve the accuracy of the cleaning recognition neural network training during automatic robot cleaning, the scene type corresponding to the current background image is identified before generating the target image. Then, based on the scene type, corresponding enhancement processing methods are considered to make the generated images more realistic and diverse. When detecting the target scene type corresponding to the current background image, the relevant average or variance value can be calculated based on the grayscale value of the current background image, and then the scene type corresponding to the current background image can be determined based on the calculated data.

[0060] Step S204: Randomly select the current foreground target from the target foreground set, determine the target enhancement processing method corresponding to the current foreground target based on the target scene type, execute the operation corresponding to the target enhancement processing method, and obtain the enhanced foreground target.

[0061] Here, "current foreground target" refers to the foreground targets randomly obtained from the target foreground set in the current step. These foreground targets are used to generate the corresponding target image in the current step. "Target enhancement processing method" refers to the enhancement processing method applied to the current foreground target, determined according to the scene type. This can be methods such as geometric transformation enhancement or pixel transformation enhancement. "Enhanced foreground target" refers to the foreground target obtained after enhancing the current foreground target. These foreground targets may differ from the original current foreground target in terms of saturation, blur level, brightness, etc.

[0062] Specifically, to diversify the final generated target images, a random number of foreground targets can be randomly selected from the target foreground set for generating the target image in the current step. The randomness of the current foreground targets allows for diverse combinations of foreground targets, thereby increasing the diversity of the generated target images. The enhancement processing method for foreground targets is determined based on the target scene type, which effectively avoids realism deviations caused by differences between images of different scenes. For example, if the scene is relatively dark but the current foreground target is given enhancement operations that are inconsistent with a dark scene (over-brightening, over-saturation, etc.), the pasted target will look out of place. Targeted enhancement processing of the current foreground target makes the final enhanced foreground target more realistic and closer to the real environment, achieving a more realistic fusion of the foreground target and the background image and thus improving the quality of the generated target images.
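
A scene-dependent foreground enhancement could be sketched as below. The correspondence between scene types and enhancement methods is not disclosed in this passage, so the scene names (`"open"`, `"cluttered"`, `"dark"`), the mapping, and the parameter ranges are all hypothetical. The sketch only covers a size-dimension change (nearest-neighbour rescaling) and a grayscale-dimension change (brightness gain).

```python
import numpy as np


def enhance_size(pixels: np.ndarray, mask: np.ndarray, scale: float):
    """Nearest-neighbour rescale of a foreground patch and its mask."""
    h, w = mask.shape
    new_h = max(1, int(round(float(h * scale))))
    new_w = max(1, int(round(float(w * scale))))
    # Source row/column index for every output row/column.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return pixels[rows][:, cols], mask[rows][:, cols]


def enhance_gray(pixels: np.ndarray, gain: float) -> np.ndarray:
    """Scale brightness by `gain` and clip to the 8-bit range."""
    return np.clip(pixels.astype(np.float32) * gain, 0, 255).astype(np.uint8)


def enhance_foreground(pixels, mask, scene: str, rng: np.random.Generator):
    """Pick an enhancement by scene type (hypothetical mapping and ranges)."""
    if scene == "dark":
        # Darken so the target does not look over-bright in a dark scene.
        return enhance_gray(pixels, rng.uniform(0.4, 0.8)), mask
    if scene == "cluttered":
        return enhance_size(pixels, mask, rng.uniform(0.5, 1.5))
    # Assumed default ("open"): mild brightness jitter only.
    return enhance_gray(pixels, rng.uniform(0.9, 1.1)), mask
```

Passing a seeded `np.random.default_rng(...)` keeps the random choices reproducible while still varying from sample to sample.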

[0063] Step S206: Perform background enhancement processing on the current background image to obtain an enhanced background image. Randomly paste the enhanced foreground target into the enhanced background image to obtain the current image.

[0064] Background enhancement processing refers to operations that moderately alter the background, such as changing its blur, brightness, or saturation through pixel transformations. An enhanced background image refers to the image obtained after applying background enhancement processing to the current background image. The current image refers to the image generated by combining the enhanced foreground object with the enhanced background image, before any other operations have been performed on it.

[0065] Specifically, to obtain a higher quality background image, background enhancement processing can be performed on the current background image, which helps to reduce noise and thus improve image quality. To make the generated image more closely resemble the complexity and diversity of a real-world dirty environment, multiple pasting points can be randomly selected in the enhanced background image, and the enhanced foreground target can be randomly pasted onto these pasting points. This double randomization enhances the diversity and complexity of the generated image, preventing it from being too uniform and thus better reflecting the random complexity and diversity of a real-world dirty environment, thereby improving the overall image quality.
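
The random pasting step can be sketched as follows. The helper name `paste_random`, the rule that the patch must fit entirely inside the background, and the returned bounding box are assumptions for illustration; the patent text only states that paste points are chosen at random in the enhanced background image.

```python
import numpy as np


def paste_random(background: np.ndarray, fg_pixels: np.ndarray,
                 fg_mask: np.ndarray, rng: np.random.Generator):
    """Paste a masked foreground patch at a random position.

    Returns the composed image and the (left, top, right, bottom) box
    of the pasted patch.
    """
    bh, bw = background.shape[:2]
    fh, fw = fg_mask.shape
    if fh > bh or fw > bw:
        raise ValueError("foreground patch is larger than the background")
    top = int(rng.integers(0, bh - fh + 1))    # upper bound is exclusive
    left = int(rng.integers(0, bw - fw + 1))
    out = background.copy()                    # leave the input untouched
    region = out[top:top + fh, left:left + fw] # view into `out`
    region[fg_mask] = fg_pixels[fg_mask]       # copy only pixels inside the mask
    return out, (left, top, left + fw, top + fh)
```

Calling the function several times with different foreground patches on the same image gives the multiple random paste points described above, and the returned box can be kept as a label for the pasted target.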

[0066] Step S208: Based on the target scene type, determine the target blurring processing method, perform the corresponding operation on the current image to generate the target image.

[0067] Here, the target blurring processing method refers to the processing method determined based on the target scene type, which can further improve the quality of the current image. The target image refers to the image obtained after blurring the current image; this image conforms more closely to the actual environment and is more diverse.

[0068] Specifically, images corresponding to different scenes differ in color, blur, and saturation. Therefore, to obtain better quality images, different blurring methods can be determined for different scene types. Because the target scene type is detected from the current background image before background enhancement, it stays close to the scene corresponding to the initial background image. Furthermore, since scene detection largely depends on the background in the image, the target scene type detected from the current background image can be used as the target scene type corresponding to the current image. Based on this target scene type, the target blurring method corresponding to the current image is determined. This target blurring method can be mean blur, Gaussian blur, or median blur, etc. After the operation corresponding to the target blurring method is performed on the current image, image data that is closer to the real dirty scene is generated, which is the target image.
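
The three blurring methods named above can be sketched with NumPy for a single-channel image. The mapping from scene type to blur method at the end is hypothetical, since the text does not say which blur goes with which scene; the kernel size, the sigma value, and the edge-replication padding are also assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _windows(gray: np.ndarray, k: int) -> np.ndarray:
    """All k x k neighbourhoods (k odd) of an edge-padded 2-D image, shape (H, W, k, k)."""
    pad = k // 2
    padded = np.pad(gray.astype(np.float32), pad, mode="edge")
    return sliding_window_view(padded, (k, k))


def mean_blur(gray: np.ndarray, k: int = 3) -> np.ndarray:
    """Uniform weights over the window."""
    return _windows(gray, k).mean(axis=(2, 3))


def gaussian_blur(gray: np.ndarray, k: int = 3, sigma: float = 1.0) -> np.ndarray:
    """Gaussian weights over the window, normalized to sum to 1."""
    ax = np.arange(k) - k // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    kernel /= kernel.sum()
    return (_windows(gray, k) * kernel).sum(axis=(2, 3))


def median_blur(gray: np.ndarray, k: int = 3) -> np.ndarray:
    """Nonlinear filter: the median of each window."""
    return np.median(_windows(gray, k), axis=(2, 3))


# Hypothetical correspondence between scene type and blur method.
BLUR_BY_SCENE = {"open": mean_blur, "cluttered": gaussian_blur, "dark": median_blur}
```

A colour image could be handled by applying the chosen function to each channel separately. The median filter is the nonlinear one of the three: it removes an isolated bright pixel entirely, whereas the mean and Gaussian filters spread it over the neighbourhood.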

[0069] In one embodiment, a large dataset is used to train the robot's cleaning recognition neural network. To obtain more high-quality target images that are closer to real-world scenarios, the target image generation process can be iterated. In each iteration, the current background image, the current foreground target, and the pasting of the enhanced foreground target into the enhanced background image are all randomized. This ensures the diversity among the generated target images and covers various types of dirt conditions. As a result, the quality of the dataset used to train the robot's cleaning recognition neural network (i.e., the image set integrating multiple target images) is greatly improved, which is beneficial for improving the accuracy of dirt recognition by the neural network trained by the robot using the image set integrating multiple target images.

[0070] In the above image generation method, a set of images to be processed, captured by a robot at a preset location and including the foreground subject to be cleaned, is acquired together with an initial background image set. Instance segmentation is performed on the images in the set to obtain a target foreground set. A current background image is randomly selected from the initial background image set, and the target scene type corresponding to the current background image is detected. A current foreground target is then randomly selected from the target foreground set. Based on the target scene type, a corresponding target enhancement processing method is determined, and the operation corresponding to this method is performed on the current foreground target to obtain an enhanced foreground target. Next, background enhancement processing is performed on the current background image to obtain an enhanced background image, and the enhanced foreground target is randomly pasted into the enhanced background image to generate the current image. Finally, a target blurring processing method is determined based on the target scene type, and the corresponding operation is performed on the current image to generate the target image. Because instance segmentation is performed on the image set to be processed, the final generated target images can be applied to a wider range of fields, no longer limited to the data application of robot cleaning recognition neural networks in a single field. Through multiple random selection processes, the diversity of generated target images is ensured. Furthermore, by distinguishing scene types, the images are enhanced and blurred in a more targeted manner, making the target images corresponding to different scenes closer to the real situation and obtaining higher quality target images. 
In this way, the image set of multiple generated target images can be integrated and used in the training of the robot cleaning recognition neural network, which to a certain extent improves the accuracy of the robot's corresponding cleaning recognition neural network in identifying dirt and grime.

[0071] In one embodiment, as shown in Figure 3, the following steps are also included before step S200:

[0072] Step S300: Obtain the original image set captured by the camera, and calculate the distortion parameters and transformation matrix corresponding to each image in the original image set.

[0073] The original image set refers to the initial, unprocessed images captured. These can be images obtained through the RGB sensor (camera) of the cleaning robot or images captured by other camera devices. Distortion parameters refer to the distortion coefficients of the camera lens in the radial and tangential directions. The transformation matrix refers to the matrix data used for both the original and corrected images. It includes intrinsic and extrinsic parameter matrices. The intrinsic parameter matrix contains the physical dimensions of the pixels, focal length, distortion factor of the image's physical coordinates, and the horizontal and vertical offsets of the image origin relative to the optical center imaging point. The extrinsic parameter matrix contains the rotation and translation matrices for transforming the world coordinate system into the camera coordinate system.

[0074] Specifically, when the robot collects images from the original image set, it captures images from different angles and distances. For example, when capturing a particular original image, the robot moves from 0.05m to 2m, with each 0.2m interval serving as a distance. At each distance, the robot rotates 60° to take the next image. During this process, different lighting can be actively added to recreate most of the real environment. The robot then captures the corresponding original image at each distance point, and the collection of images from multiple locations forms the original image set. Furthermore, because the robot uses a wide-angle lens, the original images will be distorted. Therefore, distortion correction processing is required for these original images. This distortion correction method can employ Zhang Zhengyou's chessboard calibration method.

[0075] Step S302: Based on the distortion parameters and transformation matrix, perform distortion correction processing on the images in the original image set to obtain the image set to be processed.

[0076] Specifically, after obtaining the corresponding distortion parameters and transformation matrix, the distortion of the original images captured by the robot is corrected based on these data using the Zhang Zhengyou chessboard calibration method. This makes the corrected images closer to the images corresponding to the real scene, which helps to improve the realism of the foreground objects in the target foreground set obtained by instance segmentation from the image set to be processed. This leads to the generation of higher quality target images. The image set integrating multiple high-quality target images is then applied to the training of the robot cleaning recognition neural network, enabling the robot cleaning recognition neural network to better identify dirty objects.

[0077] In this embodiment, by acquiring the original image set, calculating the distortion parameters and transformation matrix corresponding to each image in the original image set, and performing distortion correction processing on the images in the original image set based on the distortion parameters and transformation matrix, a set of images to be processed is obtained. This achieves distortion correction from the original images to the images to be processed, which means that images distorted due to the large coverage of the robot's wide-angle lens are corrected, thereby ensuring the data quality of the set of images to be processed.
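As an illustration of how radial and tangential distortion coefficients can be used for correction, the following is a minimal sketch. It assumes a common radial and tangential (Brown-Conrady) lens model with coefficients `k1`, `k2`, `p1`, `p2`, applied to normalized image coordinates. The source does not state the exact model or the number of coefficients, so these names, the helper functions, and the fixed-point inversion are assumptions, not the patented procedure.

```python
def distort(x, y, k1, k2, p1, p2):
    """Apply radial (k1, k2) and tangential (p1, p2) distortion to a
    normalized image point (x, y). Coefficient names are assumptions."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return xd, yd


def undistort(xd, yd, k1, k2, p1, p2, iterations=20):
    """Invert distort() by fixed-point iteration, starting from the
    distorted point. Suitable for moderate distortion only."""
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        radial = 1.0 + k1 * r2 + k2 * r2 * r2
        dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
        dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
        x = (xd - dx) / radial
        y = (yd - dy) / radial
    return x, y


# Hypothetical coefficients, for illustration only.
coeffs = dict(k1=-0.2, k2=0.05, p1=0.001, p2=-0.002)
xd, yd = distort(0.3, -0.2, **coeffs)
xu, yu = undistort(xd, yd, **coeffs)
```

In practice the coefficients and the intrinsic matrix would come from the chessboard calibration, and pixel coordinates would first be converted to normalized coordinates using the intrinsic parameters.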

[0078] In one embodiment, as shown in Figure 4, step S200 includes:

[0079] Step S400: Perform polygon annotation on each element corresponding to each image in the image set to be processed to obtain the target polygon corresponding to each image.

[0080] Here, "elements" refers to various objects appearing in the image, including but not limited to foreground objects. "Polygon annotation" refers to the process of sequentially selecting points along the edges of each element's shape to form a closed-loop polygon, then automatically selecting all pixels within that polygon and labeling it. "Target polygon" refers to the polygon corresponding to each element in the image, and this polygon is labeled with the corresponding annotation information for each element.

[0081] Specifically, the target foreground set is obtained by performing operations such as segmentation and labeling on the images in the image set to be processed. To obtain a target foreground set that can be used to generate datasets in multiple domains, a polygon outline is first drawn around each element in the images of the image set to be processed; all pixels inside the polygon are then selected, and the polygon is labeled. For example, if the polygon outlines a leaf, it is given a leaf-related label, and the corresponding target polygon is then generated.

[0082] Step S402: Based on the target polygon, perform a cutout operation on the corresponding image to obtain the target foreground set.

[0083] Among them, the image cutout operation refers to the operation of cutting out or segmenting out the foreground object corresponding to the target polygon.

[0084] Specifically, the target polygons are various objects (including foreground objects) marked on the image. In order to obtain a single foreground object, the corresponding image can be cut out based on the target polygons, separating the foreground object corresponding to the target polygon from the original image to form a single foreground object. Through this operation, various foreground objects can be obtained, and these foreground objects are marked with corresponding size information, type information, and position information of the annotation points, etc., providing rich data materials for generating target images in subsequent processes.

[0085] In this embodiment, by annotating each element corresponding to the image in the image set to be processed with polygons, the target polygons corresponding to the image are obtained. Based on the target polygons, the corresponding image is cut out to obtain the target foreground set. This realizes the transformation from the image to be processed to multiple individual foreground targets, thereby obtaining rich data materials, which is conducive to generating multiple types of image datasets, and thus effectively improving the quality of the final generated target image.
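The polygon annotation and cutout steps can be sketched as follows. This is a minimal, dependency-free illustration that assumes images are nested lists of pixel values and that a target polygon is a list of `(x, y)` vertices with a text label. The function names, the even-odd inside test, and the returned dictionary layout are hypothetical choices; the source does not specify them.

```python
def point_in_polygon(px, py, polygon):
    """Even-odd ray casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through py count.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def cut_out(image, polygon, label):
    """Cut the labeled polygon out of image (a list of pixel rows).

    Returns the cropped pixels (None outside the polygon), the mask,
    and the annotation data kept with the foreground target.
    """
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    x0, x1 = max(0, int(min(xs))), min(len(image[0]) - 1, int(max(xs)))
    y0, y1 = max(0, int(min(ys))), min(len(image) - 1, int(max(ys)))
    pixels, mask = [], []
    for row in range(y0, y1 + 1):
        # Test pixel centers against the polygon.
        mask_row = [point_in_polygon(col + 0.5, row + 0.5, polygon)
                    for col in range(x0, x1 + 1)]
        pixels.append([image[row][x0 + i] if m else None
                       for i, m in enumerate(mask_row)])
        mask.append(mask_row)
    return {"label": label, "bbox": (x0, y0, x1, y1),
            "polygon": polygon, "pixels": pixels, "mask": mask}


image = [[10 * r + c for c in range(6)] for r in range(6)]
leaf = cut_out(image, [(1, 1), (4, 1), (4, 4), (1, 4)], "leaf")
```

Keeping the label, bounding box, and polygon vertices together with the cropped pixels mirrors the size, type, and annotation-point information that the text says accompanies each foreground target.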

[0086] In one embodiment, as shown in Figure 5, step S202 includes:

[0087] Step S500: Obtain the grayscale value and total number of pixels corresponding to each pixel in the current background image.

[0088] Grayscale value refers to the intensity or lightness of a color, or the brightness of a single pixel. Total pixels refers to the total number of pixels in the current background image.

[0089] Specifically, the scene type can be determined from the grayscale values of the pixels in the image: either from the range over which the grayscale values of the pixels vary, or from the average of the grayscale values of the pixels. Based on the scene type, more effective enhancement and blurring methods can be selected to make the generated target image more realistic.

[0090] Step S502: Based on the grayscale value of each pixel and the total number of pixels, calculate the grayscale probability of each grayscale level in the current background image.

[0091] The number of gray levels refers to the maximum number of distinct gray values in an image; the more gray levels there are, the wider the brightness range of the image. Gray level probability refers to the probability of each gray level appearing in the current background image.

[0092] Specifically, the number of times each gray value appears is counted from the gray values of the pixels, and the probability of each gray level is then calculated from these counts and the total number of pixels in the image, as shown in formula (1). In formula (1), $r_k$ is the $k$-th gray level, where $k$ can be $0, 1, 2, \ldots, L-1$, so that $r_k$ can represent all possible grayscale values in an image of size $M \times N$; $n_k$ is the number of times $r_k$ appears in the image; and $MN$ is the total number of pixels. In addition, the probabilities of all gray levels sum to 1, as shown in formula (2), whose symbols are the same as those in formula (1).

[0093] $$p(r_k) = \frac{n_k}{MN}, \quad k = 0, 1, 2, \ldots, L-1 \tag{1}$$

$$\sum_{k=0}^{L-1} p(r_k) = 1 \tag{2}$$

[0094] Step S504: Calculate the average grayscale value of the current background image based on the gray level probability of each gray level.

[0095] The average grayscale value refers to the average grayscale value in the current background image.

[0096] Specifically, the average grayscale value intuitively reflects the overall darkness or lightness of an image. For example, a lower average grayscale value indicates that the scene in the image is generally darker, and vice versa. The average grayscale value can be calculated using formula (3), where $m$ is the average grayscale value, and $r_k$ (the $k$-th gray level), $L$ (the number of gray levels), and $p(r_k)$ (the gray level probability) are the same as in formulas (1) and (2).

[0097] $$m = \sum_{k=0}^{L-1} r_k \, p(r_k) \tag{3}$$

[0098] Step S506: Calculate the grayscale variance value corresponding to the current background image based on the grayscale value, grayscale level probability, and grayscale average value.

[0099] The grayscale variance value refers to the variance of the grayscale values appearing in the current background image, and can be used to represent how much the grayscale values in the current background image vary.

[0100] Specifically, in addition to the average grayscale value, which intuitively reflects the overall darkness of the image, the grayscale variance of the current background image can also be calculated. The variance measures how much the grayscale values vary within the current background image, and the range of this variation is used to determine the openness of the corresponding scene. This allows more targeted enhancement and blurring methods to be selected based on the scene's openness, further ensuring the quality of the generated target image. The grayscale variance of the current background image can be calculated using formula (4), where $\sigma^2$ is the grayscale variance value, $m$ is the average grayscale value from formula (3), and $r_k$ (the $k$-th gray level), $L$ (the number of gray levels), and $p(r_k)$ (the gray level probability) are the same as in formulas (1) and (2).

[0101] $$\sigma^2 = \sum_{k=0}^{L-1} (r_k - m)^2 \, p(r_k) \tag{4}$$

[0102] Step S508: Determine the target scene type corresponding to the current background image based on the grayscale average value and grayscale variance value.

[0103] Specifically, different scene types can be determined based on the prominent features of the scene. The grayscale average value and grayscale variance reflect the different image characteristics of the current background image. The grayscale average value can be used to determine the darkness of the current background image and then determine the corresponding scene type; the grayscale variance value can be used to determine whether the current background image is open or cluttered, and thus determine the scene type of the current background image.

[0104] In this embodiment, by obtaining the grayscale value and total number of pixels corresponding to each pixel in the current background image, the grayscale probability corresponding to each grayscale level in the current background image is calculated based on the grayscale value and total number of pixels. Then, based on the grayscale probability corresponding to each grayscale level, the average grayscale value corresponding to the current background image is calculated. Then, based on the grayscale value, grayscale probability, and average grayscale value, the grayscale variance value corresponding to the current background image is calculated. Finally, based on the average grayscale value and grayscale variance value, the target scene type corresponding to the current background image is determined. This method effectively utilizes the characteristics of the image to determine the scene type corresponding to the current background image, which is beneficial for subsequent processes to perform more reasonable enhancement and blurring processing on the image according to the target scene type. To a certain extent, this helps to improve the data quality of the generated target image.
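The computation in steps S500 to S506 can be sketched as follows: count how often each gray level occurs, divide by the total number of pixels to get the gray level probabilities, then take the probability-weighted mean and variance. The sketch assumes an 8-bit grayscale image stored as a nested list of integers; the function name `gray_statistics` and the default of 256 levels are assumptions for illustration.

```python
def gray_statistics(gray, levels=256):
    """Return (probabilities, mean, variance) of a grayscale image.

    gray is a list of rows of integer gray values in 0..levels-1.
    """
    total = sum(len(row) for row in gray)        # total number of pixels
    counts = [0] * levels                        # occurrences of each gray level
    for row in gray:
        for value in row:
            counts[value] += 1
    probs = [count / total for count in counts]  # gray level probabilities
    mean = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return probs, mean, variance


# A half-black, half-white image: mean 127.5, variance 127.5 squared.
probs, mean, variance = gray_statistics([[0, 0], [255, 255]])
```

A uniform image gives a variance of 0, while an image split evenly between the darkest and brightest levels gives the largest possible variance for that mean, which is the property the scene-type decision relies on.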

[0105] In one embodiment, as shown in Figure 6, step S508 includes:

[0106] Step S600: Obtain the variance threshold. When the grayscale variance value is less than the variance threshold, the first scene type is taken as the target scene type.

[0107] The variance threshold refers to the boundary value used to classify scenes into different types. The first scene type refers to the scene corresponding to the smaller grayscale variance value of the current background image. This can refer to an open scene, that is, a scene with a relatively wide field of view and not too many subjects.

[0108] Specifically, the smaller the grayscale variance value, the smaller the range of grayscale variation in the current background image. In other words, a single color is distributed more evenly across the scene, as with the floor of a shopping mall, most of which is a solid color. That is, the smaller the grayscale variance value, the more open the corresponding scene. Scenes whose grayscale variance value is smaller than the variance threshold can be classified as the first scene type. The images corresponding to this scene type can be enhanced and blurred uniformly, which avoids over-processing images because of large scene differences and thereby effectively improves the quality of the image data for this scene type.

[0109] Step S602: When the grayscale variance value is greater than the variance threshold, the second scene type is taken as the target scene type.

[0110] The second scene type refers to the scene where the grayscale variance value of the current background image is large. It can refer to a cluttered scene. In a cluttered scene, there are many objects, which will cause the grayscale value of the image to change greatly, that is, the corresponding grayscale variance value is large.

[0111] Specifically, the more cluttered the scene, the more the grayscale values of the captured image vary, and the larger the grayscale variance. The variation in this type of scene is significantly larger than in the first scene type, so if the enhancement and blurring methods for the first scene type were applied here as well, the final image quality might be unsatisfactory. Therefore, scenes with larger grayscale variance values can be treated as the second scene type.

[0112] Step S604: When the average grayscale value meets the third scene condition, the third scene type is taken as the target scene type.

[0113] The third scene condition refers to the grayscale value used to determine if a scene belongs to the third scene type. It is a dividing line value, and scene types with an average grayscale value lower than this dividing line value can be classified as the third scene type. The third scene type refers to the scene corresponding to a low average grayscale value of the current background image.

[0114] Specifically, when a robot performs active cleaning, it can automatically clean in relatively dimly lit places, such as shopping mall parking lots. However, if the enhancement and blurring methods corresponding to the first and second scene types are directly applied to dimly lit scenes, the final generated target image may deviate significantly from the real scene. Consequently, the quality of the dataset corresponding to the dimly lit scene will not reach the ideal effect, and the cleaning recognition neural network of the robot, which should be trained on these datasets, will not be able to effectively identify dirt in dimly lit scenes, making it difficult for the robot to automatically clean in such scenes.

[0115] In this embodiment, by obtaining a variance threshold, when the grayscale variance value is less than the variance threshold, the first scene type is taken as the target scene type; when the grayscale variance value is greater than the variance threshold, the second scene type is taken as the target scene type; and when the grayscale average value meets the third scene condition, the third scene type is taken as the target scene type. This achieves the division of different scenes, which is beneficial for selecting more effective enhancement and blurring processing methods according to different scene types in subsequent processes, thereby generating higher quality image data more efficiently and better applied to the training of the robot cleaning recognition neural network.
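The decision in steps S600 to S604 can be sketched as a small function. The source gives neither threshold values nor the order in which the three conditions are checked, nor what happens when the variance equals the threshold. The sketch therefore assumes that the dark-scene check takes priority, that equality falls into the second scene type, and that the parameter names and example thresholds are purely hypothetical.

```python
def classify_scene(mean, variance, variance_threshold, dark_mean_threshold):
    """Map grayscale mean and variance to a scene type label.

    Assumptions (not stated in the source): the third-scene check runs
    first, and a variance equal to the threshold counts as "second".
    """
    if mean < dark_mean_threshold:
        return "third"    # dimly lit scene, low average grayscale value
    if variance < variance_threshold:
        return "first"    # open scene, small grayscale variation
    return "second"       # cluttered scene, large grayscale variation


# Hypothetical thresholds, for illustration only.
VARIANCE_THRESHOLD = 1000.0
DARK_MEAN_THRESHOLD = 60.0

open_scene = classify_scene(200.0, 100.0, VARIANCE_THRESHOLD, DARK_MEAN_THRESHOLD)
cluttered_scene = classify_scene(120.0, 5000.0, VARIANCE_THRESHOLD, DARK_MEAN_THRESHOLD)
dark_scene = classify_scene(30.0, 100.0, VARIANCE_THRESHOLD, DARK_MEAN_THRESHOLD)
```

In a real pipeline the two thresholds would be tuned on sample background images from each environment.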

[0116] In one embodiment, step S204 includes: determining the target enhancement processing method corresponding to the current foreground target from the candidate foreground enhancement processing methods based on the correspondence between scene type and foreground target enhancement processing method, wherein the candidate foreground enhancement processing method includes size dimension enhancement processing method, pixel dimension enhancement processing method and grayscale dimension enhancement processing method.

[0117] Candidate foreground enhancement processing methods refer to the methods used to enhance foreground targets, including enhancement in the size dimension, the pixel dimension, and the grayscale dimension. Size dimension enhancement refers to foreground enhancement based on size, which can be achieved by scaling the image by a random factor. Pixel dimension enhancement refers to foreground enhancement at the pixel level, which can be achieved by adjusting brightness, saturation, or blur. Grayscale dimension enhancement refers to foreground enhancement based on grayscale, which changes the grayscale values so that the foreground target blends more seamlessly with the corresponding scene. Specifically, this can be achieved by performing histogram equalization on the foreground target and adjusting the average grayscale value of the image to the grayscale level corresponding to the scene type, thereby fusing the foreground target with the third scene type.

[0118] Specifically, to make the final generated target image closer to the real environment, the enhancement processing method for the foreground target can be selected according to the scene type. That is, the enhancement processing method is matched to the characteristics of each scene type. In this embodiment, the scene types can include a first scene type, in which the background image has a small grayscale variance value; a second scene type, in which the background image has a large grayscale variance value; and a third scene type, in which the background image has a low average grayscale value. For an image of the first scene type, the grayscale values change little, meaning the corresponding scene is relatively open. Therefore, only the size dimension needs to be changed to bring the foreground target in this scene type closer to the real environment. In real-world open scenes, the dirty targets (that is, dirty objects, which can be foreground targets) photographed by the robot can be near or far, so the size of the foreground target varies widely. Size enhancement can therefore scale the foreground target by a certain factor (for example, 0.5-1.5x). This produces foreground targets of different sizes for generating target images, including target images whose foreground targets are of the same type but different sizes, which enriches the diversity and realism of the target images. Furthermore, the foreground target can be rotated by random angles in the range 0-360°, rotating by a certain amount each time to obtain foreground targets at different angles. The size enhancement techniques can also include geometric transformation enhancement, which includes, but is not limited to, the enhancement types mentioned above. The specific enhancement method can be determined based on further scene segmentation.

[0119] Furthermore, for foreground objects in the second scene type with a large grayscale variance corresponding to the background image, pixel-level enhancement processing can be performed on the foreground object. This includes appropriately adjusting the saturation and blur of the foreground object, allowing it to blend better with the background image of the second scene type, thus more closely resembling a dirty environment in a real scene. For the third scene type with a low average grayscale value corresponding to the background image, since the current foreground object is randomly selected, it cannot be guaranteed that it will fit the atmosphere of the third scene type. Therefore, it is necessary to modify the grayscale dimension of the current foreground object according to the grayscale enhancement processing method, so that the foreground object and the enhanced background image can blend better, obtaining more realistic image data corresponding to the third scene type, thereby effectively improving the quality of the image data corresponding to the third scene type.

[0120] In this embodiment, by determining the target enhancement processing method corresponding to the current foreground target from the candidate foreground enhancement processing methods based on the correspondence between scene type and foreground target enhancement processing method, the foreground target is enhanced in a targeted manner according to the scene type, so that the final foreground target and background image are better integrated, and the integrated image is closer to the real dirty environment.
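The correspondence between scene type and foreground enhancement can be sketched as a simple dispatch. The 0.5-1.5x scaling range comes from the text; everything else is an assumption for illustration: the function names, the nearest-neighbor resampling, the use of a brightness gain as the pixel-dimension adjustment with a hypothetical 0.8-1.2 range, and a plain mean shift standing in for the histogram equalization and mean adjustment described for the third scene type.

```python
import random


def scale_nearest(gray, factor):
    """Size dimension: resize a grayscale image by nearest-neighbor sampling."""
    h, w = len(gray), len(gray[0])
    new_h = max(1, round(h * factor))
    new_w = max(1, round(w * factor))
    return [[gray[min(h - 1, int(r / factor))][min(w - 1, int(c / factor))]
             for c in range(new_w)]
            for r in range(new_h)]


def adjust_brightness(gray, gain):
    """Pixel dimension: multiply every pixel by gain and clamp to 0..255."""
    return [[min(255, max(0, round(v * gain))) for v in row] for row in gray]


def shift_mean_gray(gray, target_mean):
    """Grayscale dimension: shift all pixels so the mean moves toward
    target_mean (exact unless clamping to 0..255 occurs)."""
    flat = [v for row in gray for v in row]
    offset = target_mean - sum(flat) / len(flat)
    return [[min(255, max(0, round(v + offset))) for v in row] for row in gray]


def enhance_foreground(gray, scene_type, rng, background_mean=None):
    """Pick the enhancement that corresponds to the scene type."""
    if scene_type == "first":
        return scale_nearest(gray, rng.uniform(0.5, 1.5))
    if scene_type == "second":
        return adjust_brightness(gray, rng.uniform(0.8, 1.2))  # hypothetical range
    if scene_type == "third":
        return shift_mean_gray(gray, background_mean)
    raise ValueError(f"unknown scene type: {scene_type}")


rng = random.Random(0)
patch = [[10] * 4 for _ in range(4)]
scaled = enhance_foreground(patch, "first", rng)
darkened = enhance_foreground([[100, 200]], "third", rng, background_mean=50)
```

Passing the random generator in explicitly keeps the augmentation reproducible when a fixed seed is used.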

[0121] In one embodiment, step S208 includes: determining the target blurring processing method from the candidate blurring processing methods based on the correspondence between scene type and blurring processing method, wherein the candidate blurring processing methods include a uniform distribution blurring method, a Gaussian distribution blurring method, and a nonlinear distribution blurring method.

[0122] Among them, candidate blurring methods refer to the methods used to blur the image after the foreground target and background image are merged. These include uniform distribution, Gaussian distribution, and non-linear distribution blurring methods. Uniform distribution blurring refers to blurring the image in a uniform manner, which can be based on the image characteristics of a first scene type, specifically a mean blurring method. Gaussian distribution blurring refers to blurring the image in a Gaussian distribution, which can be based on the image characteristics of a second scene type, specifically a Gaussian blurring method. Non-linear distribution blurring refers to blurring the image in a non-linear distribution, which can be based on the image characteristics of a third scene type, specifically a median blurring method.

[0123] Specifically, different scene types correspond to different image properties. To make the image after fusing the foreground target and background image closer to the real environment of the corresponding scene, different blurring methods can be applied to images of different scene types. In this embodiment, scene types can include a first scene type with a small grayscale variance value corresponding to the background image, a second scene type with a large grayscale variance value corresponding to the background image, and a third scene type with a low average grayscale value corresponding to the background image. Among them, the grayscale value variation range of the image corresponding to the first scene type is small, that is, the overall grayscale variation of the image is small. Therefore, in order to make the final blurred image more consistent with the uniformity of human vision, a uniformly distributed mean blurring method can be selected to blur the current image obtained by fusing the enhanced foreground target and enhanced background image. This makes the final target image closer to the real dirty environment corresponding to the first scene type.

[0124] Furthermore, the grayscale values of images corresponding to the second scene type vary significantly. If the first blurring method (mean blurring) were applied directly to the image obtained by fusing the enhanced foreground target and the enhanced background image, the resulting image would be blurred unevenly across its different parts and could differ significantly from the real-world environment. For example, in a two-pixel image, one pixel might have a grayscale value of 1 while the other has a grayscale value of 100. With mean blurring, the result is about 50, which does not adequately represent the grayscale value of either pixel and thus fails to achieve the desired blurring. By contrast, Gaussian distribution blurring (Gaussian blur) blurs each pixel using a weighted combination of the grayscale values of its neighboring pixels, which ensures uniform blurring between adjacent pixels. This prevents over-blurring and under-blurring in certain parts of a second-scene-type image caused by its large grayscale variation, thereby improving the quality of the target image for this scene type. When images of this scene type are used to train the robot's cleaning recognition neural network, the robot can better identify the types of dirt corresponding to the second scene type.

[0125] Furthermore, the environment corresponding to the third scene type is relatively dark. For such environments, the median grayscale value of a local area of the image can be used to replace the grayscale values within that area. For example, if a pixel has a grayscale value of 5 and its surrounding pixels have grayscale values of 3, 6, 8, and 4, the median of these five values (3, 4, 5, 6, 8) is 5, so the grayscale values of the surrounding pixels can be directly changed to 5. This non-linear median blurring method effectively improves the brightness of the image corresponding to the third scene type without excessively altering its dark overall appearance. This makes the processed image closer to the real environment, thus improving the data quality of the target image for the third scene type.

[0126] In this embodiment, the target blurring method is determined from the candidate blurring methods by the correspondence between scene type and blurring method. Different blurring methods are determined for different scene types. Based on the different characteristics of different scene types, different blurring methods are adopted, so that the final generated target image is closer to the real dirty environment corresponding to different scenes, thereby improving the data quality of the generated target image.
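The three candidate blurring methods and their correspondence to scene types can be sketched as follows. The sketch assumes a fixed 3×3 window, edge replication at the borders, and a standard median filter that replaces each pixel with the median of its window; the source does not specify kernel sizes or border handling, and the function and table names are hypothetical.

```python
from statistics import median

# 3x3 Gaussian kernel weights in row-major order; they sum to 16.
GAUSS_3X3 = (1, 2, 1, 2, 4, 2, 1, 2, 1)


def _window(gray, r, c):
    """Return the 3x3 neighborhood of (r, c), replicating edge pixels."""
    h, w = len(gray), len(gray[0])
    return [gray[min(h - 1, max(0, r + dr))][min(w - 1, max(0, c + dc))]
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)]


def _filter(gray, reducer):
    """Apply reducer to the 3x3 window of every pixel."""
    return [[reducer(_window(gray, r, c)) for c in range(len(gray[0]))]
            for r in range(len(gray))]


def mean_blur(gray):
    """Uniform distribution blurring: every neighbor has equal weight."""
    return _filter(gray, lambda win: round(sum(win) / 9))


def gaussian_blur(gray):
    """Gaussian distribution blurring: nearer neighbors weigh more."""
    return _filter(
        gray, lambda win: round(sum(v * k for v, k in zip(win, GAUSS_3X3)) / 16))


def median_blur(gray):
    """Non-linear blurring: each pixel becomes the median of its window."""
    return _filter(gray, median)


# Correspondence between scene type and blurring method, as in the text.
BLUR_BY_SCENE = {"first": mean_blur, "second": gaussian_blur, "third": median_blur}


def blur_for_scene(gray, scene_type):
    return BLUR_BY_SCENE[scene_type](gray)


spot = [[10, 10, 10], [10, 100, 10], [10, 10, 10]]
blurred_mean = blur_for_scene(spot, "first")
blurred_gauss = blur_for_scene(spot, "second")
blurred_median = blur_for_scene(spot, "third")
```

On the single bright pixel in `spot`, the mean filter spreads the value evenly, the Gaussian filter keeps more of the center value, and the median filter removes the outlier entirely.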

[0127] In one embodiment, Figure 7 shows the process of generating image data that can be used to train the robot cleaning recognition neural network. Figure 7 gives a relatively intuitive view of the appearance of some foreground objects and background images. In Figure 7, the multi-type dirt library is a repository that stores various foreground objects, and the multi-style background library is a repository that stores various background images. When a preset number of dirt images needs to be generated, foreground objects and background images can be randomly obtained directly from the multi-type dirt library and the multi-style background library, and the corresponding data augmentation and blur enhancement operations can then be performed to generate the dirt images. Figure 8 shows the detailed process, in one embodiment, of generating image data that can be used to train the robot cleaning recognition neural network. As Figure 8 shows, a random number of foreground targets can be selected from the multi-type dirt library, and data augmentation can be performed on the randomly selected foreground targets. In addition, a background image can be randomly selected from the multi-style background library. Because the background image strongly influences how the scene type is judged in the actual environment, scene type detection can be performed on the background image to better determine the scene type of the image obtained after the foreground target and background image are merged. After scene type detection, data augmentation is further performed on the background image, and the formats of the foreground target and background image are converted so that both have the same format. At this point, the background image can also be cropped and labeled. Then, the randomly selected foreground target is scaled and pasted at a randomly selected foreground pasting point in the background image, yielding the dirt image for this process. To make the generated dirt image closer to the real environment, blur enhancement is further performed on it, producing labeled image data that can be used directly to train the robot cleaning recognition neural network. In Figure 8, setting an appropriate quantity controls the final number of images obtained, and the multiple random operations ensure the large-scale generation of a wide variety of images that more closely resemble realistic dirty environments.

[0128] In one embodiment, the robot can take pictures in places such as shopping malls, supermarkets, offices, and parking lots to obtain corresponding image sets. The image sets after distortion correction are used as the image sets to be processed. Then, the image sets to be processed are segmented to obtain the foreground targets corresponding to the robot in these scenarios, and the set of these foreground targets is used as the target foreground set. In addition, the robot also takes background images of different cleaning scenarios on relatively clean floors in these locations. This collection of background images serves as the initial background image set. Then, it randomly selects a current background image from this initial set and detects the scene type corresponding to that image to determine the target scene type. Next, it randomly selects a random number of current foreground targets from this target foreground set. Based on the target scene type, it determines the target enhancement processing method for these current foreground targets and performs the corresponding operation to obtain the enhanced foreground targets. It then performs background enhancement processing on the current background image to obtain an enhanced background image. Next, it randomly selects multiple pasting points from this enhanced background image and randomly pastes these enhanced foreground targets onto these points to obtain the current image. To more closely resemble a real dirty environment, it further blurs the current image. Since different scene types have different characteristics, it determines the target blurring processing method corresponding to the current image based on the target scene type and performs the corresponding operation to generate the corresponding target image. 
To obtain a larger image dataset, the operation of randomly selecting current background images from the initial background image set can be repeated until the number of generated target images meets the corresponding requirements, thus generating a high-quality target image set. Using this high-quality set of target images to train the robot's cleaning recognition neural network allows the network to learn more about dirty objects in dirty environments, thereby improving the accuracy of the network in identifying dirt and enabling the robot to better perform proactive cleaning in dirty environments.
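The random pasting step can be sketched as follows. The sketch assumes grayscale images stored as nested lists and a boolean mask marking which foreground pixels belong to the target. The function name, the top-left pasting convention, and the returned `(left, top, width, height)` box are hypothetical; the source only states that enhanced foreground targets are pasted at randomly selected points.

```python
import random


def paste_foreground(background, fg_pixels, fg_mask, rng):
    """Paste a masked foreground at a random position.

    Returns a new image and the pasted box (left, top, width, height),
    which can serve as the label of the pasted target.
    """
    bh, bw = len(background), len(background[0])
    fh, fw = len(fg_pixels), len(fg_pixels[0])
    if fh > bh or fw > bw:
        raise ValueError("foreground is larger than the background")
    top = rng.randint(0, bh - fh)      # random pasting point, kept in bounds
    left = rng.randint(0, bw - fw)
    out = [row[:] for row in background]   # leave the input untouched
    for r in range(fh):
        for c in range(fw):
            if fg_mask[r][c]:              # copy only pixels inside the target
                out[top + r][left + c] = fg_pixels[r][c]
    return out, (left, top, fw, fh)


rng = random.Random(42)
floor = [[0] * 5 for _ in range(5)]
stain = [[9, 9], [9, 9]]
stain_mask = [[True, False], [True, True]]
pasted, box = paste_foreground(floor, stain, stain_mask, rng)
```

Repeating this call with freshly selected backgrounds and foreground targets, followed by the scene-specific blurring, yields as many labeled target images as required.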

[0129] Based on the same inventive concept, this application also provides an image generation apparatus for implementing the image generation method described above. The solution provided by this apparatus is similar to the implementation described in the above method; therefore, the specific limitations in one or more image generation apparatus embodiments provided below can be found in the limitations of the image generation method described above, and will not be repeated here.

[0130] In one embodiment, as shown in Figure 9, an image generation device is provided, including: an acquisition and segmentation module 900, a detection module 902, an enhancement module 904, a pasting module 906, and a blurring module 908, wherein:

[0131] The acquisition and segmentation module 900 is used to acquire a set of images to be processed and an initial set of background images, perform instance segmentation processing on the images in the set of images to be processed to obtain a target foreground set. The set of images to be processed is obtained by the robot taking pictures of a preset location, and the images to be processed include the foreground subject to be cleaned.

[0132] The detection module 902 is used to randomly select the current background image from the initial background image set, detect the scene type corresponding to the current background image, and obtain the target scene type.

[0133] The enhancement module 904 is used to randomly select a current foreground target from the target foreground set, determine the target enhancement processing method corresponding to the current foreground target based on the target scene type, and execute the operation corresponding to the target enhancement processing method to obtain an enhanced foreground target.

[0134] The pasting module 906 is used to perform background enhancement processing on the current background image to obtain an enhanced background image, and randomly paste the enhanced foreground target into the enhanced background image to obtain the current image.
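As a non-limiting illustration, random pasting can be sketched as follows. The sketch assumes images are lists of rows of grayscale values, that `None` marks a transparent foreground pixel, and that the foreground fits inside the background. The function name `paste_foreground` and the use of several pasting points per call (which the claims mention) are choices made for this example only.

```python
import random


def paste_foreground(background, foreground, num_points, rng):
    """Paste the foreground at num_points random positions on a copy of the background."""
    canvas = [row[:] for row in background]  # leave the input background untouched
    bg_h, bg_w = len(canvas), len(canvas[0])
    fg_h, fg_w = len(foreground), len(foreground[0])
    points = []
    for _ in range(num_points):
        # Choose a top-left corner such that the foreground stays inside the image.
        top = rng.randint(0, bg_h - fg_h)
        left = rng.randint(0, bg_w - fg_w)
        points.append((top, left))
        for r in range(fg_h):
            for c in range(fg_w):
                if foreground[r][c] is not None:  # skip transparent pixels
                    canvas[top + r][left + c] = foreground[r][c]
    return canvas, points


demo_rng = random.Random(1)
demo_background = [[0] * 6 for _ in range(6)]
demo_foreground = [[None, 7], [7, 7]]
current_image, paste_points = paste_foreground(demo_background, demo_foreground, 2, demo_rng)
```

Returning the pasting points alongside the image makes it straightforward to derive annotations for the pasted targets.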

[0135] The blurring module 908 is used to determine the target blurring processing method based on the target scene type, perform the operation corresponding to the target blurring processing method on the current image, and generate a target image.

[0136] In one embodiment, the image generation device further includes a distortion correction module 910, which is used to acquire an original image set captured by a camera, calculate the distortion parameters and transformation matrix corresponding to each image in the original image set, and perform distortion correction processing on the images in the original image set based on the distortion parameters and transformation matrix to obtain the image set to be processed.
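As a non-limiting illustration, distortion correction of a single pixel coordinate can be sketched as follows. The sketch assumes a two-coefficient radial distortion model (`k1`, `k2`) and treats the transformation matrix as a camera intrinsic matrix with focal lengths `fx`, `fy` and principal point `cx`, `cy`. These are assumptions for the example; this paragraph does not specify the distortion model, and the function names are hypothetical.

```python
def distort_pixel(u, v, fx, fy, cx, cy, k1, k2):
    """Apply the assumed radial distortion model to an ideal pixel coordinate."""
    x, y = (u - cx) / fx, (v - cy) / fy          # normalized camera coordinates
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor * fx + cx, y * factor * fy + cy


def undistort_pixel(u, v, fx, fy, cx, cy, k1, k2, iterations=20):
    """Invert the radial model by fixed-point iteration to recover the ideal pixel."""
    xd, yd = (u - cx) / fx, (v - cy) / fy        # distorted normalized coordinates
    x, y = xd, yd                                # initial guess: no distortion
    for _ in range(iterations):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / factor, yd / factor
    return x * fx + cx, y * fy + cy


# Round trip with illustrative parameter values.
params = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0, k1=-0.2, k2=0.05)
distorted = distort_pixel(400.0, 300.0, **params)
recovered = undistort_pixel(*distorted, **params)
```

Correcting a whole image would apply the same mapping to every pixel and resample. The fixed-point iteration converges quickly for mild distortion but is not guaranteed to converge for strong distortion.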

[0137] In one embodiment, the acquisition and segmentation module 900 is further configured to perform polygon annotation on each element corresponding to each image in the image set to be processed, to obtain the target polygon corresponding to each image; and to perform image matting operation on the corresponding image based on the target polygon, to obtain the target foreground set.
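As a non-limiting illustration, image matting from a polygon annotation can be sketched as follows. The sketch assumes the polygon is a list of `(x, y)` vertices in pixel coordinates, tests each pixel center with the ray casting rule, and marks pixels outside the polygon as `None` (transparent). The function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting test: count polygon edges crossed by a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def cut_out(image, polygon):
    """Keep pixels whose centers lie inside the polygon; set all others to None."""
    return [
        [pixel if point_in_polygon(c + 0.5, r + 0.5, polygon) else None
         for c, pixel in enumerate(row)]
        for r, row in enumerate(image)
    ]


demo_image = [[10 * r + c for c in range(4)] for r in range(4)]
demo_polygon = [(1, 1), (3, 1), (3, 3), (1, 3)]  # square covering the central 2x2 pixels
demo_cutout = cut_out(demo_image, demo_polygon)
```

Each annotated polygon yields one cut-out foreground target; collecting the cut-outs over all images gives the target foreground set.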

[0138] In one embodiment, the detection module 902 is further configured to obtain the grayscale value and total number of pixels corresponding to each pixel in the current background image; calculate the grayscale level probability corresponding to each grayscale level in the current background image based on the grayscale value and total number of pixels; calculate the grayscale average value corresponding to the current background image based on the grayscale level probability corresponding to each grayscale level; calculate the grayscale variance value corresponding to the current background image based on the grayscale value, grayscale level probability and grayscale average value; and determine the target scene type corresponding to the current background image based on the grayscale average value and the grayscale variance value.
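As a non-limiting illustration, the grayscale statistics can be computed from the image histogram as follows. The sketch assumes 8-bit grayscale images stored as lists of rows and uses the standard histogram definitions of mean and variance, which may differ in detail from the formulas given in the method embodiments. The function name is hypothetical.

```python
def grayscale_statistics(image, levels=256):
    """Return (level probabilities, mean, variance) of a grayscale image."""
    total_pixels = sum(len(row) for row in image)
    counts = [0] * levels
    for row in image:
        for gray in row:
            counts[gray] += 1
    # Probability of each gray level = number of pixels at that level / total pixels.
    probabilities = [count / total_pixels for count in counts]
    # Mean = sum over levels of (level * probability).
    mean = sum(level * p for level, p in enumerate(probabilities))
    # Variance = sum over levels of ((level - mean)^2 * probability).
    variance = sum((level - mean) ** 2 * p for level, p in enumerate(probabilities))
    return probabilities, mean, variance


demo_probs, demo_mean, demo_variance = grayscale_statistics([[0, 0], [255, 255]])
```

A nearly uniform background gives a small variance, while a background with strong texture or contrast gives a large one, which is what makes the variance usable as a scene indicator.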

[0139] In one embodiment, the image generation device further includes a scene determination module 912, used to obtain a variance threshold, and when the grayscale variance value is less than the variance threshold, to use a first scene type as the target scene type; when the grayscale variance value is greater than the variance threshold, to use a second scene type as the target scene type; and when the grayscale average value meets a third scene condition, to use a third scene type as the target scene type.
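As a non-limiting illustration, the scene decision can be sketched as follows. This paragraph does not state the third scene condition, the order in which the conditions are checked, or what happens when the variance equals the threshold. The sketch therefore assumes that the third condition is checked first, that equality falls to the second scene type, and that the third condition is a hypothetical "dark image" test on the mean. All threshold values are illustrative.

```python
def classify_scene(gray_mean, gray_variance, variance_threshold, third_condition):
    """Map the grayscale mean and variance to one of three scene type labels."""
    if third_condition(gray_mean):          # assumption: third condition takes priority
        return "third"
    if gray_variance < variance_threshold:
        return "first"
    return "second"                         # assumption: equality is treated as "second"


def is_dark(gray_mean):
    """Hypothetical third scene condition: the image is dark on average."""
    return gray_mean < 40


print(classify_scene(120, 500, 1000, is_dark))
```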

[0140] In one embodiment, the enhancement module 904 is further configured to determine the target enhancement processing method corresponding to the current foreground target from the candidate foreground enhancement processing methods based on the correspondence between scene type and foreground target enhancement processing method. The candidate foreground enhancement processing methods include size dimension enhancement processing method, pixel dimension enhancement processing method and grayscale dimension enhancement processing method.
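As a non-limiting illustration, selecting an enhancement method from a scene-type correspondence can be sketched with a lookup table. This paragraph names the three candidate methods but not their operations or which scene type maps to which method. The table `SCENE_TO_METHOD` and the three function bodies (scaling, per-pixel noise, and gray gain) are therefore assumptions for the example.

```python
import random


def enhance_size(foreground, rng):
    """Size dimension (assumed): nearest-neighbour upscaling by a factor of 1 to 3."""
    k = rng.randint(1, 3)
    return [[row[c // k] for c in range(len(row) * k)]
            for row in foreground for _ in range(k)]


def enhance_pixel(foreground, rng):
    """Pixel dimension (assumed): add small random noise, clamped to 0..255."""
    return [[min(255, max(0, p + rng.randint(-10, 10))) for p in row]
            for row in foreground]


def enhance_grayscale(foreground, rng):
    """Grayscale dimension (assumed): scale all gray values by a random gain."""
    gain = rng.uniform(0.7, 1.3)
    return [[min(255, max(0, round(p * gain))) for p in row] for row in foreground]


CANDIDATES = {"size": enhance_size, "pixel": enhance_pixel, "grayscale": enhance_grayscale}
# Hypothetical correspondence between scene type and enhancement method.
SCENE_TO_METHOD = {"first": "size", "second": "pixel", "third": "grayscale"}


def enhance_foreground(foreground, scene_type, rng):
    """Look up the method for the scene type and apply it to the foreground."""
    method = SCENE_TO_METHOD[scene_type]
    return method, CANDIDATES[method](foreground, rng)


demo_rng = random.Random(0)
demo_fg = [[100, 200], [50, 250]]
method_name, enhanced = enhance_foreground(demo_fg, "second", demo_rng)
```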

[0141] In one embodiment, the blurring module 908 is further configured to determine the target blurring processing method from candidate blurring processing methods based on the correspondence between scene type and blurring processing method, wherein the candidate blurring processing methods include a uniform distribution blurring processing method, a Gaussian distribution blurring processing method and a nonlinear distribution blurring processing method.
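As a non-limiting illustration, two of the candidate blurring methods can be sketched as convolution with a normalized kernel. The sketch assumes that uniform distribution blurring corresponds to a box (mean) kernel and Gaussian distribution blurring to a Gaussian kernel, and that image borders are handled by clamping coordinates. The nonlinear distribution method is not sketched because this paragraph does not define it. The function names are hypothetical.

```python
import math


def uniform_kernel(size):
    """Box kernel: every weight is equal and the weights sum to 1."""
    weight = 1.0 / (size * size)
    return [[weight] * size for _ in range(size)]


def gaussian_kernel(size, sigma):
    """Gaussian kernel centered on the middle element, normalized to sum to 1."""
    half = size // 2
    raw = [[math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2 * sigma ** 2))
            for x in range(size)] for y in range(size)]
    total = sum(map(sum, raw))
    return [[value / total for value in row] for row in raw]


def blur(image, kernel):
    """Convolve a grayscale image with the kernel, clamping coordinates at the borders."""
    height, width = len(image), len(image[0])
    half = len(kernel) // 2
    output = []
    for r in range(height):
        out_row = []
        for c in range(width):
            acc = 0.0
            for kr, kernel_row in enumerate(kernel):
                for kc, weight in enumerate(kernel_row):
                    rr = min(max(r + kr - half, 0), height - 1)
                    cc = min(max(c + kc - half, 0), width - 1)
                    acc += weight * image[rr][cc]
            out_row.append(acc)
        output.append(out_row)
    return output


impulse = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
box_blurred = blur(impulse, uniform_kernel(3))
gauss_blurred = blur(impulse, gaussian_kernel(3, 1.0))
```

The box kernel spreads a bright pixel evenly over its neighbourhood, whereas the Gaussian kernel keeps more weight at the center.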

[0142] Each module in the aforementioned image generation device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device, or stored in the memory of a computer device as software, so that the processor can call and execute the corresponding operations of each module.

[0143] In one embodiment, a robot is provided, whose internal structure may be as shown in Figure 10. The robot includes a processor, memory, communication interface, display screen, and input devices connected via a system bus. The robot's processor provides computing and control capabilities. The robot's memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The robot's communication interface is used for wired or wireless communication with external robots. Wireless communication can be achieved through Wi-Fi, carrier networks, NFC (Near Field Communication), or other technologies. When the computer program is executed by the processor, it implements an image generation method. The robot's display screen can be an LCD screen or an e-ink display screen. The robot's input devices can be a touch layer covering the display screen; buttons, a trackball, or a touchpad located on the robot's shell; or an external keyboard, touchpad, or mouse.

[0144] In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in Figure 11. This computer device includes a processor, memory, input/output (I/O) interfaces, and a communication interface. The processor, memory, and I/O interfaces are connected via a system bus, and the communication interface is also connected to the system bus via the I/O interfaces. The processor provides computational and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system, computer programs, and a database. The internal memory provides the environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The database stores data generated during execution. The I/O interfaces are used for exchanging information between the processor and external devices. The communication interface is used for communicating with external terminals via a network connection. When executed by the processor, the computer program implements an image generation method.

[0145] Those skilled in the art will understand that the structures shown in Figure 10 and Figure 11 are merely block diagrams of the portions of the structure related to the present application and do not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figures, or combine certain components, or have different component arrangements.

[0146] In one embodiment, a computer device is provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the above method embodiments.

[0147] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed by a processor, implements the steps in the above method embodiments.

[0148] In one embodiment, a computer program product or computer program is provided, the computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, causing the computer device to perform the steps in the above method embodiments.

[0149] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

[0150] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments described above. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc., and are not limited to these.

[0151] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0152] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.

Claims

1. A method for generating an image, characterized in that the method includes: obtaining a set of images to be processed and an initial set of background images, and performing instance segmentation on the images in the set of images to be processed to obtain a target foreground set, wherein the images to be processed include the foreground subject to be cleaned; randomly selecting a current background image from the initial background image set, and detecting the scene type corresponding to the current background image to obtain the target scene type; randomly selecting a current foreground target from the target foreground set, determining the target enhancement processing method corresponding to the current foreground target based on the target scene type, and executing the operation corresponding to the target enhancement processing method to obtain the enhanced foreground target; performing background enhancement processing on the current background image to obtain an enhanced background image, randomly selecting multiple pasting points, and randomly pasting the enhanced foreground target into the pasting points of the enhanced background image to obtain the current image; and determining a target blurring processing method based on the target scene type, and performing the operation corresponding to the target blurring processing method on the current image to generate a target image.

2. The method according to claim 1, characterized in that, before obtaining the set of images to be processed, the method further includes: obtaining the original image set captured by the camera, and calculating the distortion parameters and transformation matrix corresponding to each image in the original image set; and performing distortion correction processing on the images in the original image set based on the distortion parameters and transformation matrix to obtain the image set to be processed.

3. The method according to claim 1, characterized in that obtaining the image set to be processed and performing instance segmentation on the images in the image set to be processed to obtain the target foreground set includes: performing polygon annotation on each element corresponding to each image in the image set to be processed to obtain the target polygon corresponding to each image; and performing an image matting operation on the corresponding image based on the target polygon to obtain the target foreground set.

4. The method according to claim 1, characterized in that detecting the scene type corresponding to the current background image to obtain the target scene type includes: obtaining the grayscale value corresponding to each pixel in the current background image and the total number of pixels; calculating the grayscale level probability corresponding to each grayscale level in the current background image based on the grayscale value corresponding to each pixel and the total number of pixels; calculating the grayscale average value corresponding to the current background image based on the grayscale level probability corresponding to each grayscale level; calculating the grayscale variance value corresponding to the current background image based on the grayscale value, the grayscale level probability, and the grayscale average value; and determining the target scene type corresponding to the current background image based on the grayscale average value and the grayscale variance value.

5. The method according to claim 4, characterized in that determining the target scene type corresponding to the current background image based on the grayscale average value and the grayscale variance value includes: obtaining a variance threshold; when the grayscale variance value is less than the variance threshold, using the first scene type as the target scene type; when the grayscale variance value is greater than the variance threshold, using the second scene type as the target scene type; and when the grayscale average value meets the third scene condition, using the third scene type as the target scene type.

6. The method according to claim 1, characterized in that determining the target enhancement processing method corresponding to the current foreground target based on the target scene type includes: determining the target enhancement processing method corresponding to the current foreground target from candidate foreground enhancement processing methods based on the correspondence between scene type and foreground target enhancement processing method, wherein the candidate foreground enhancement processing methods include a size dimension enhancement processing method, a pixel dimension enhancement processing method and a grayscale dimension enhancement processing method.

7. The method according to claim 1, characterized in that determining the target blurring processing method based on the target scene type includes: determining the target blurring processing method from candidate blurring processing methods based on the correspondence between scene type and blurring processing method, wherein the candidate blurring processing methods include a uniform distribution blurring processing method, a Gaussian distribution blurring processing method and a nonlinear distribution blurring processing method.

8. An image generation apparatus, characterized in that the apparatus includes: an acquisition and segmentation module, used to acquire a set of images to be processed and an initial set of background images, and to perform instance segmentation on the images in the set of images to be processed to obtain a target foreground set, wherein the images to be processed include the foreground subject to be cleaned; a detection module, used to randomly select the current background image from the initial background image set, and detect the scene type corresponding to the current background image to obtain the target scene type; an enhancement module, used to randomly select a current foreground target from the target foreground set, determine the target enhancement processing method corresponding to the current foreground target based on the target scene type, and execute the operation corresponding to the target enhancement processing method to obtain the enhanced foreground target; a pasting module, used to perform background enhancement processing on the current background image to obtain an enhanced background image, randomly select multiple pasting points, and randomly paste the enhanced foreground target into the pasting points of the enhanced background image to obtain the current image; and a blurring module, used to determine the target blurring processing method based on the target scene type, and perform the operation corresponding to the target blurring processing method on the current image to generate the target image.

9. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.

Citation Information

Patent Citations

  • System and method for utilizing enhanced scene detection in a depth estimation procedure

    CN103685861A

  • A method for generating a labeled data set for training a deep learning target detection network

    CN109816014A