Anomalous image generation device and computer-readable storage medium
The abnormal image generation device addresses the limitations of existing models by generating controlled abnormal images with specified abnormalities, improving training and evaluation reliability through diverse image variations.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- FANUC LTD
- Filing Date
- 2024-12-25
- Publication Date
- 2026-07-02
Abstract
Description
Abnormal Image Generation Device and Computer-Readable Storage Medium
[0001] The present disclosure relates to an abnormal image generation device and a computer-readable storage medium.
[0002] Appearance inspections of products using AI (Artificial Intelligence) are now in practical use. These inspections rely on an abnormality detection model. Creating an abnormality detection model requires training on a large number of product images, and evaluation images are also required to evaluate the trained model. Because products with abnormalities are not numerous, there is a technology for creating learning images and evaluation images using AI.
[0003] When creating learning images and evaluation images, there is a desire to specify the position of the abnormality, the type of abnormality, etc.
[0004] In order to generate an abnormality at a specific position, in the creation of the learning image (and also the evaluation image) of Patent Document 1, an image of the abnormal part and a mask of the shape of the abnormal part are prepared in advance, and using an affine transformation or the like, the prepared image of the abnormal part is shaped into the form of the mask. Then, the texture of the base image is transformed and combined with the image of the abnormal part to generate a learning image.
[0005] Patent Document 1: Japanese Patent Application Laid-Open No. 2018-205123
[0006] An abnormal image created based on an existing image is called a pseudo-abnormal image. Since the image of the abnormal part of a pseudo-abnormal image is limited to the existing image, the achievable variations are limited. If the learning images lack variation, overfitting (overlearning) may occur, and if the evaluation images lack variation, the reliability of the evaluation results will be low.
[0007] In the field of abnormality detection using image recognition, it is desired to generate an abnormal image including a desired abnormality at a desired position.
[0008] The abnormal image generation apparatus according to this disclosure comprises: a base abnormal image generation unit that inputs a first base image and a mask image into a first image generation model to generate a base abnormal image in which an abnormality is generated in the mask region; a cropping unit that creates a partial abnormal image by cropping the area around the mask region from the base abnormal image; a data expansion unit that inputs the partial abnormal image and a mask image for data expansion into a second image generation model to reconstruct the mask region and generate one or more expanded images; and a synthesis unit that synthesizes the expanded images with the first base image or the second base image to generate one or more abnormal images.
[0009] Figure 1 is a block diagram of the abnormal image generation device of the first embodiment. Figure 2 illustrates the generation of a base abnormal image. Figure 3 illustrates the cropping of a partial abnormal image. Figure 4 illustrates the expansion of a partial abnormal image. Figure 5 illustrates the synthesis of partial abnormal images. Figure 6 is a flowchart illustrating the operation of the abnormal image generation device. Figure 7 illustrates the problem of the inpainting model. Figure 8 is a block diagram of the abnormal image generation device of the second embodiment. Figure 9 is a hardware configuration diagram of the abnormal image generation device.
[0010] Embodiments of this disclosure will be described below with reference to the drawings. In the following description, components having the same or similar functions will be denoted by the same reference numerals. Duplicate descriptions of these components may be omitted.
[0011] In this application, "based on XX" means "based on at least XX," and includes cases where it is based on another element in addition to XX. Furthermore, "based on XX" is not limited to cases where XX is used directly, but also includes cases where it is based on something that has been calculated or processed. "XX" is any element (for example, any information).
[0012] The abnormal image generation device 100 of this embodiment is applied to an information processing device such as a personal computer.
[0013] [First Embodiment] Figure 1 is a block diagram of the abnormal image generation device 100 according to the first embodiment. The abnormal image generation device 100 comprises a base abnormal image generation unit 11, a cropping unit 12, a data expansion unit 13, and a synthesis unit 14.
[0014] The base abnormal image generation unit 11 acquires a good product image (first base image) and a mask image. Figure 2 shows an example of a mask image and a good product image. The mask image is a binary image of the same size as the original image (good product image) that distinguishes the region where an abnormality is to be generated from the other regions. The good product image is an image of a product in which no abnormality has occurred.
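As a minimal sketch of the mask image described here, the following builds a binary array with the same height and width as a stand-in good product image. The function name `make_rect_mask`, the rectangular shape, and all sizes are illustrative assumptions; the disclosure does not prescribe how the mask is produced.

```python
import numpy as np

def make_rect_mask(height, width, top, left, box_h, box_w):
    """Return a binary mask (uint8, 0 or 1) of the given image size.

    1 marks the region where an abnormality should be generated,
    0 marks every other region. The rectangle is only an example shape.
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[top:top + box_h, left:left + box_w] = 1
    return mask

# Stand-in for a good product image (uniform gray, hypothetical data).
good_image = np.full((64, 64, 3), 128, dtype=np.uint8)
mask = make_rect_mask(64, 64, top=20, left=30, box_h=8, box_w=10)
```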
[0015] The base abnormal image generation unit 11 generates abnormalities (product defects) in good product images using a first image generation model. The first image generation model is a probabilistic generation model. In a probabilistic generation model, images are generated based on a probability distribution. The image generation model is, for example, a diffusion model.
[0016] The image generation model attempts to generate anomalies within the masked area. However, because it generates images probabilistically, it does not always generate anomalies as specified, and unexpected images may be generated. In the example in Figure 2, anomalies are generated outside the masked area. The image generation model may also generate anomalies with shapes different from the mask, or it may not generate any anomalies in the mask at all. The anomaly images generated by the image generation model are highly varied and cannot be controlled.
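One way to see how far a generated image strays from the mask is to measure what share of the changed pixels lies outside it. The sketch below is an illustration only and is not part of the disclosed device; the function name `changed_outside_mask`, the threshold value, and the assumption of color images with a trailing channel axis are all hypothetical.

```python
import numpy as np

def changed_outside_mask(good, generated, mask, threshold=30):
    """Fraction of noticeably changed pixels that lie outside the mask.

    A pixel counts as changed when any channel differs from the good
    image by more than `threshold` (an arbitrary illustrative value).
    """
    diff = np.abs(good.astype(np.int16) - generated.astype(np.int16)).max(axis=-1)
    changed = diff > threshold
    total = int(changed.sum())
    if total == 0:
        return 0.0  # nothing was generated at all
    outside = int((changed & ~mask.astype(bool)).sum())
    return outside / total
```

A value near 0 means the changes stay inside the mask; a larger value indicates abnormalities generated where none were requested.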
[0017] The cropping unit 12 receives the base anomaly image and the mask image as input, and based on the mask image, crops around the mask to create a partial anomaly image. The partial anomaly image includes the mask. Figure 3 shows an example of a partial anomaly image. In the example in Figure 3, a rectangular area is cropped from the base anomaly image to create a partial anomaly image. The cropped shape may also be a circle or other shape.
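The rectangular cropping can be sketched as taking the bounding box of the mask and widening it by a margin. The name `crop_around_mask` and the `margin` parameter are assumptions made for illustration; the disclosure only states that the cropped area surrounds and includes the mask.

```python
import numpy as np

def crop_around_mask(image, mask, margin=8):
    """Crop a rectangle around the nonzero region of `mask`.

    Returns the cropped image, the cropped mask, and the box
    (top, left, bottom, right) so the crop can be pasted back later.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask is empty")
    h, w = mask.shape
    top = max(int(ys.min()) - margin, 0)
    bottom = min(int(ys.max()) + 1 + margin, h)
    left = max(int(xs.min()) - margin, 0)
    right = min(int(xs.max()) + 1 + margin, w)
    box = (top, left, bottom, right)
    return image[top:bottom, left:right].copy(), mask[top:bottom, left:right].copy(), box
```

Returning the box alongside the crop keeps the position information needed for the later synthesis step.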
[0018] The data augmentation unit 13 receives a partial abnormal image as input and augments it (increases the number of variations) using an inpainting model (second image generation model). An inpainting model reconstructs a specified region of an image so that the result looks natural.
[0019] Inpainting models include learning-based models. Learning-based inpainting models are probabilistic generative models that utilize deep learning, similar to image generation models. They take an original image and a mask image as input and reconstruct, within the mask region, content that blends seamlessly with the surrounding image. Various methods have been proposed for learning-based inpainting models. The type of inpainting model used is not limited. An example of an inpainting model is a diffusion model. Because learning-based inpainting models are probabilistic generative models, their outputs vary widely.
[0020] The mask image input to the data augmentation unit 13 may be a dedicated one, or it may be the same mask image used to generate the base abnormal image. Image processing such as dilation or erosion may be applied to the mask image. Figure 4 shows a mask image for data augmentation.
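Dilation and erosion of a binary mask can be sketched with plain NumPy as follows. The 3x3 square neighborhood and the function names `dilate` and `erode` are assumptions for illustration; the disclosure does not specify a structuring element or a library.

```python
import numpy as np

def _neighbourhood_views(padded, h, w):
    """Yield the nine shifted views that make up a 3x3 neighbourhood."""
    for dy in range(3):
        for dx in range(3):
            yield padded[dy:dy + h, dx:dx + w]

def dilate(mask, iterations=1):
    """Grow the mask by one pixel per iteration (3x3 neighbourhood)."""
    out = mask.astype(bool)
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        for view in _neighbourhood_views(padded, h, w):
            grown |= view
        out = grown
    return out.astype(np.uint8)

def erode(mask, iterations=1):
    """Shrink the mask by one pixel per iteration (3x3 neighbourhood)."""
    out = mask.astype(bool)
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        kept = np.ones_like(out)
        for view in _neighbourhood_views(padded, h, w):
            kept &= view
        out = kept
    return out.astype(np.uint8)
```

Dilating the mask before inpainting lets the model redraw the boundary of the abnormality as well as its interior, while eroding restricts the redraw to the interior.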
[0021] The data augmentation unit 13 reconstructs the mask portion of the partial anomaly image. This generates multiple augmented images that include anomalies that blend seamlessly with the surrounding area. In the augmented images, the area near the mask boundary and the characteristics of the anomalies are represented in diverse ways.
[0022] The synthesis unit 14 synthesizes a good product image (the first or second base image) with an expanded image. The good product image to be synthesized may be the one used to generate the base abnormal image, or it may be any other good product image. Synthesis methods include pasting the image as it is, using image processing such as Poisson image editing, or replacing only the pixels whose values differ significantly. Figure 5 shows an example of an image synthesized from a good product image and an expanded image. The expanded image is pasted onto the rectangular area from which the partial abnormal image was cropped. By creating multiple good product images with expanded images pasted onto them, a large number of training images and evaluation images can be generated.
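Two of the synthesis methods mentioned here, pasting the patch as it is and replacing only the pixels that differ significantly, can be sketched as below. The function names, the `box` tuple layout (top, left, bottom, right), the threshold value, and the assumption of color images with a trailing channel axis are illustrative; Poisson image editing is not shown.

```python
import numpy as np

def paste(base, patch, box):
    """Paste `patch` into a copy of `base` at box = (top, left, bottom, right)."""
    top, left, bottom, right = box
    out = base.copy()
    out[top:bottom, left:right] = patch
    return out

def paste_where_different(base, patch, box, threshold=30):
    """Replace only pixels whose difference from `base` exceeds `threshold`."""
    top, left, bottom, right = box
    out = base.copy()
    region = out[top:bottom, left:right]  # view into `out`
    diff = np.abs(region.astype(np.int16) - patch.astype(np.int16)).max(axis=-1)
    changed = diff > threshold
    region[changed] = patch[changed]      # writes through to `out`
    return out
```

The second variant leaves small, incidental differences in the patch untouched, so only the pixels that make up the abnormality are transferred to the good product image.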
[0023] Figure 6 is a flowchart illustrating the operation of the abnormal image generation device 100. First, the abnormal image generation device 100 acquires a good product image and a mask image (step S1). The base abnormal image generation unit 11 uses an image generation model to generate a base abnormal image in which an abnormality is introduced in the mask region (step S2). The abnormal image generated by the base abnormal image generation unit is uncontrolled and may contain unexpected abnormalities.
[0024] The cropping unit 12 receives the base abnormality image, crops around the mask so that the abnormal portion is included, and creates a partial abnormality image (step S3). The partial abnormality image includes only the abnormality around the mask.
[0025] The data augmentation unit 13 receives the partially abnormal image as input and performs data augmentation of the partially abnormal image using an inpainting model (step S4). Through data augmentation, multiple augmented images are created in which only the masked portion of the partially abnormal image is reconstructed.
[0026] The synthesis unit 14 synthesizes the good product image and the expanded image (step S5). By synthesizing the good product image and the expanded image, it is possible to generate an abnormal image in which abnormalities occur only in the region of the specified mask.
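Steps S1 to S5 can be summarized in a short sketch. The two models are passed in as callables, and the toy stand-ins `toy_first_model` and `toy_inpaint_model` below are hypothetical placeholders that only mimic the interface (image and mask in, image out); they are not diffusion models and do not represent the models used in the disclosure. The `margin` and `n_variants` parameters are likewise assumptions.

```python
import numpy as np

def generate_abnormal_images(good_image, mask, first_model, inpaint_model,
                             n_variants=3, margin=8):
    """S1 inputs -> S2 base abnormal image -> S3 crop -> S4 expand -> S5 synthesize."""
    base_abnormal = first_model(good_image, mask)                      # S2
    ys, xs = np.nonzero(mask)                                          # S3
    h, w = mask.shape
    top, bottom = max(ys.min() - margin, 0), min(ys.max() + 1 + margin, h)
    left, right = max(xs.min() - margin, 0), min(xs.max() + 1 + margin, w)
    partial = base_abnormal[top:bottom, left:right]
    partial_mask = mask[top:bottom, left:right]
    expanded = [inpaint_model(partial, partial_mask)                   # S4
                for _ in range(n_variants)]
    results = []
    for patch in expanded:                                             # S5
        out = good_image.copy()
        out[top:bottom, left:right] = patch
        results.append(out)
    return results

rng = np.random.default_rng(0)

def toy_first_model(image, mask):
    """Stand-in: darkens the mask region and adds a stray artifact elsewhere."""
    out = image.copy()
    out[mask.astype(bool)] = 0
    out[0, 0] = 255  # unexpected change outside the mask
    return out

def toy_inpaint_model(image, mask):
    """Stand-in: refills the mask region with random dark values."""
    out = image.copy()
    m = mask.astype(bool)
    out[m] = rng.integers(0, 64, size=(int(m.sum()), image.shape[-1]), dtype=np.uint8)
    return out
```

With these stand-ins, the stray artifact introduced at pixel (0, 0) lies outside the cropped area and therefore does not appear in the synthesized results, which mirrors the reasoning behind cropping and pasting onto a good product image.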
[0027] The abnormal image generation device 100 of this embodiment first generates a base abnormal image from a good product image and a mask image. An image generation model is used to generate the base abnormal image. Images generated by the image generation model have a wide variety of variations, but they are difficult to control, and unexpected events may occur.
[0028] Therefore, a partial abnormal image is created by cropping the image around the mask, and the mask region of the partial abnormal image is reconstructed using an inpainting model to generate multiple augmented images. The inpainting model also uses deep learning, so its outputs vary widely, but it is difficult to control. For example, as shown in Figure 7, if the mask is small relative to the original image, the image may not change. To address this problem, in this embodiment, the partial abnormal image is cropped so that the mask region is sufficiently large relative to the image input to the inpainting model (the partial abnormal image). As a result, the image of the mask region is reconstructed.
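The effect of cropping on the relative size of the mask can be illustrated with a small calculation. The image size, mask size, and crop window below are made-up numbers chosen only to show the change in ratio; the disclosure does not state what ratio counts as sufficiently large.

```python
import numpy as np

def mask_ratio(mask):
    """Share of the image area covered by the mask."""
    return np.count_nonzero(mask) / mask.size

# Hypothetical 512x512 image with a 12x12 mask.
full_mask = np.zeros((512, 512), dtype=np.uint8)
full_mask[250:262, 250:262] = 1

# The same mask seen inside a 44x44 crop (16-pixel margin on each side).
cropped_mask = full_mask[234:278, 234:278]

ratio_full = mask_ratio(full_mask)        # 144 / 262144
ratio_cropped = mask_ratio(cropped_mask)  # 144 / 1936
```

In this example the mask covers well under 0.1% of the full image but roughly 7% of the cropped image, so the inpainting model receives an input in which the region to be reconstructed is far more prominent.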
[0029] Since good product images do not contain unnecessary information, pasting an expanded image onto a good product image makes it possible to generate abnormal images in which abnormalities are introduced only in the masked area.
[0030] [Second Embodiment] Figure 8 is a block diagram of the abnormal image generation device 100 of the second embodiment. The abnormal image generation device 100 of the second embodiment includes a first prompt acquisition unit 15 and a second prompt acquisition unit 16.
[0031] The first prompt acquisition unit 15 acquires prompts for generating a base abnormal image. A prompt is an instruction for image generation. For example, the type of abnormality can be specified using prompts such as "Generate a scratch," "Generate a burn," or "Generate a discoloration."
[0032] The second prompt acquisition unit 16 acquires prompts for data augmentation. A prompt is an instruction for image generation. Using prompts, it is possible to specify the type of anomaly, or even the state of the anomaly, such as "change the characteristics of the burn" or "make the scratches deeper." The content can be specified by the user.
[0033] One image generation model that accepts prompts and generates abnormal images is the diffusion model. The diffusion model incorporates a multimodal learning model called a VLM (Vision-Language Model) that learns the relationship between images and text. The VLM outputs information representing the relationship between text and images from the prompt and other inputs, and this information is added to the noise data of the diffusion model. When the diffusion model generates an image based on this information, it can produce an abnormal image that reflects the prompt.
[0034] In the abnormal image generation device 100 of the second embodiment, the type and shape of the abnormality can be controlled using prompts. This makes it possible to generate abnormalities of a desired location, type, and shape.
[0035] According to the abnormal image generation device 100 of this embodiment, it is possible to control the uncertainty of the image generation model while taking advantage of the diversity of image generation models, thereby generating the desired abnormal image.
[0036] The hardware configuration of the abnormal image generation device 100 to which this disclosure applies will be described below. Figure 9 is a hardware configuration diagram of the abnormal image generation device 100. As shown in Figure 9, the abnormal image generation device 100 includes a CPU 111 that controls the abnormal image generation device 100 as a whole, a ROM 112 that records programs and data, and a RAM 113 for temporarily expanding data. The CPU 111 reads the system program recorded in the ROM 112 via a bus.
[0037] The non-volatile memory 114 is backed up, for example, by a battery (not shown), so that its stored state is maintained even when the power to the abnormal image generation device 100 is turned off. Various data such as programs read from the external device 120 via interfaces 115, 118, and 119, and operation inputs input via the input device 20 are stored in the non-volatile memory 114. Programs and data for running the abnormal image generation device 100 of this embodiment may also be stored in the non-volatile memory 114.
[0038] Interface 115 is an interface for connecting the abnormal image generation device 100 to an external device 120 such as an adapter. Programs and various parameters are read from the external device 120. Interface 118 is an interface for connecting the abnormal image generation device 100 to a display device 30 such as a liquid crystal display. The display device 30 displays various data read into memory and data obtained as a result of executing programs and the like. Interface 119 is an interface for connecting the abnormal image generation device 100 to an input device 20 such as a keyboard or pointing device. The input device 20 passes commands, data, etc., based on operations by the operator to the CPU 111 via interface 119.
[0039] While embodiments of this disclosure have been described in detail above, this disclosure is not limited to the individual embodiments described above. Additions, replacements, modifications, partial deletions, and the like can be made to these embodiments in any way that does not depart from the spirit of the invention or from the idea and intent of this disclosure derived from the claims and their equivalents. For example, the order of operations and processes in the embodiments described above is shown as an example only and is not limiting. The same applies when numerical values or mathematical formulas are used in the description of the embodiments described above.
[0040] The following are annotations relating to embodiments of the present disclosure.
(Annotation 1) An abnormal image generation device (100) according to one aspect of the present disclosure includes: a base abnormal image generation unit (11) that inputs a first base image and a mask image into a first image generation model to generate a base abnormal image in which an abnormality is generated in the mask region; a cropping unit (12) that creates a partial abnormal image by cropping the area around the mask region from the base abnormal image; a data expansion unit (13) that inputs a mask image for data expansion into a second image generation model to reconstruct the mask region and generate one or more expanded images; and a synthesis unit (14) that synthesizes the expanded images with the first base image or the second base image to generate one or more abnormal images.
(Annotation 2) The abnormal image generation device (100) includes a first prompt acquisition unit (15) that acquires a prompt for generating a base abnormal image, and the base abnormal image generation unit (11) generates a base abnormal image with the prompt instructions added.
(Annotation 3) The first image generation model is a diffusion model.
(Annotation 4) The abnormal image generation device (100) includes a second prompt acquisition unit (16) for acquiring prompts for data expansion, and the data expansion unit (13) generates an expanded image with the prompt instructions added.
(Annotation 5) The second image generation model is a diffusion model.
(Annotation 6) The first base image and the second base image are good product images.
(Annotation 7) A computer-readable storage medium (112, 113, 114) according to one aspect of the present disclosure records a program that causes a computer (111) to operate as: a base anomaly image generation unit (11) that inputs a first base image and a mask image into a first image generation model to generate a base anomaly image in which an anomaly occurs in the mask region; a cropping unit (12) that creates a partial anomaly image by cropping the area around the mask region from the base anomaly image; a data expansion unit (13) that inputs a mask image for data expansion into a second image generation model to reconstruct the mask region and generate one or more expanded images; and a synthesis unit (14) that synthesizes the expanded images with the first or second base image to generate one or more anomaly images.
[0041]
100 Abnormal Image Generation Device
11 Base Abnormal Image Generation Unit
12 Cropping Unit
13 Data Expansion Unit
14 Synthesis Unit
15 First Prompt Acquisition Unit
16 Second Prompt Acquisition Unit
111 CPU
112 ROM
113 RAM
114 Non-Volatile Memory
Claims
1. An abnormal image generation device comprising: a base abnormal image generation unit that inputs a first base image and a mask image into a first image generation model to generate a base abnormal image in which an abnormality is generated in the mask region; a cropping unit that creates a partial abnormal image by cropping the area around the mask region from the base abnormal image; a data expansion unit that inputs a mask image for data expansion into a second image generation model to reconstruct the mask region and generate one or more expanded images; and a synthesis unit that synthesizes the expanded images with the first base image or the second base image to generate one or more abnormal images.
2. The abnormal image generation device according to claim 1, comprising a first prompt acquisition unit for acquiring a prompt for generating a base abnormal image, wherein the base abnormal image generation unit generates a base abnormal image with the prompt instructions added.
3. The abnormal image generation device according to claim 1, wherein the first image generation model is a diffusion model.
4. The abnormal image generation device according to claim 1, further comprising a second prompt acquisition unit for acquiring a prompt for data expansion, wherein the data expansion unit generates an expanded image with the prompt instructions added.
5. The abnormal image generation device according to claim 1, wherein the second image generation model is a diffusion model.
6. The abnormal image generation device according to claim 1, wherein the first base image and the second base image are good product images.
7. A computer-readable storage medium that records a program causing a computer to operate as: a base anomaly image generation unit that inputs a first base image and a mask image into a first image generation model to generate a base anomaly image in which an anomaly occurs in the mask region; a cropping unit that creates a partial anomaly image by cropping the area around the mask region from the base anomaly image; a data expansion unit that inputs a mask image for data expansion into a second image generation model to reconstruct the mask region and generate one or more expanded images; and a synthesis unit that synthesizes the expanded images with the first base image or the second base image to generate one or more anomaly images.