Image processing apparatus and image processing method

The image processing apparatus and method generate simulated images to visually assess the robustness of machine learning models against environmental changes, ensuring high accuracy and reliability in image recognition systems.

JP7880551B2 · Active · Publication Date: 2026-06-26 · PANASONIC INTELLECTUAL PROPERTY MANAGEMENT CO LTD

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Patents
Current Assignee / Owner: PANASONIC INTELLECTUAL PROPERTY MANAGEMENT CO LTD
Filing Date: 2022-03-11
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing image recognition systems using machine learning models lack the ability for system developers and operators to easily verify the robustness of the models against various environmental changes, leading to potential decreases in accuracy.

Method used

An image processing apparatus and method that generates simulated images reflecting different environmental conditions and superimposes state images onto original and simulated images to visually display recognition states, allowing users to assess the robustness of machine learning models.

Benefits of technology

Enables easy visual confirmation of machine learning model robustness against environmental changes, facilitating the construction of highly robust systems.




Abstract

PROBLEM TO BE SOLVED: To enable persons in charge of system development and system operation to easily and visually confirm the robustness of a machine learning model against the diverse environmental changes expected on site, and thereby to make it possible to construct a highly robust system.

SOLUTION: In accordance with a user operation that specifies a processing condition, a device executes image processing on an original image based on the specified processing condition to generate a simulated image that reproduces an image captured in a prescribed situation; generates a heat map (state image) that represents the recognition state of a detection target in image recognition processing performed on the simulated image; and outputs display information of a visualization result in which the heat map is superimposed on the simulated image.

SELECTED DRAWING: Figure 4


Description

Technical Field

[0001] The present invention relates to an image processing apparatus and an image processing method that enable a user to visually confirm the recognition performance of a machine learning model by visualizing and presenting to the user the recognition state of an event to be detected in image recognition processing when detecting a predetermined event from a captured image using the machine learning model.

Background Art

[0002] Systems that detect the occurrence of a predetermined event in a monitoring area by performing image recognition processing on an image captured by a camera in the monitoring area are widely used. In particular, in recent years, the accuracy of image recognition processing has been remarkably improved by using a machine learning model constructed by machine learning such as deep learning.

[0003] On the other hand, when a machine learning model is used for image recognition processing, the model is a black box, and the process by which it arrives at a recognition result is unknown. For this reason, a technique is known that visualizes, as an image or as text, the basis for the determination leading to the recognition result in image recognition processing using a machine learning model (see Patent Document 1).

Prior Art Documents

Patent Documents

[0004]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0005] According to the conventional technology, since the basis for the determination leading to the recognition result in image recognition processing using a machine learning model is visualized and displayed in an image or characters, the user can easily grasp the process leading to the recognition result in the machine learning model visually.

[0006] However, environmental changes can cause the conditions of the monitoring area to change in diverse ways. In this case, image recognition processing using a machine learning model may also be affected by the environmental changes, potentially leading to a decrease in accuracy. It is therefore insufficient to evaluate the recognition performance of a machine learning model using only images captured under specific conditions. It is desirable to evaluate the recognition performance using images that vary in response to environmental changes, that is, to confirm whether image recognition processing using the machine learning model is sufficiently robust against environmental changes. In particular, it is desirable that the persons in charge of system development and system operation be able to easily verify, by themselves, robustness against the diverse environmental changes that can be expected in the field.

[0007] Therefore, the main objective of the present invention is to provide an image processing device and an image processing method that enable system development and system operation personnel to easily visually confirm the robustness of machine learning models against various environmental changes expected in the field, and to construct a system with high robustness.

Means for Solving the Problem

[0008] The present invention is an image processing apparatus in which a processor executes a process of visualizing the recognition state of a detection target in image recognition processing that detects a predetermined event from a captured image using a machine learning model, wherein the processor: receives an original image for which the detection target is set; in response to a user operation specifying processing conditions, performs image processing on the original image based on the specified processing conditions to generate a simulated image that reproduces an image captured under a predetermined situation; generates state images representing the recognition state of the detection target when the image recognition processing is performed on the original image and on the simulated image; and outputs first visualization result display information in which the state image is superimposed on the original image and second visualization result display information in which the state image is superimposed on the simulated image.

[0009] Furthermore, the present invention is an image processing method in which an information processing device executes a process of visualizing the recognition state of a detection target in image recognition processing that detects a predetermined event from a captured image using a machine learning model, the method comprising: receiving an original image for which the detection target is set; in response to a user operation specifying processing conditions, performing image processing on the original image based on the specified processing conditions to generate a simulated image that reproduces an image captured under a predetermined situation; generating state images representing the recognition state of the detection target when the image recognition processing is performed on the original image and on the simulated image; and outputting first visualization result display information in which the state image is superimposed on the original image and second visualization result display information in which the state image is superimposed on the simulated image.

Effects of the Invention

[0010] According to the present invention, image recognition processing is performed on the original image and on simulated images that reflect various environmental changes, state images representing the recognition state of the detection target are generated, and first visualization result display information in which the state image is superimposed on the original image and second visualization result display information in which the state image is superimposed on the simulated image are output. This allows system developers and system operators to easily visually verify the robustness of machine learning models against various environmental changes expected in the field, enabling the construction of highly robust systems.

Brief Description of the Drawings

[0011]
[Figure 1] Overall configuration diagram of the robustness verification system according to this embodiment.
[Figure 2] Explanatory diagram showing an overview of the processing performed by the image processing device.
[Figure 3] Block diagram showing the schematic configuration of the image processing device.
[Figure 4] Block diagram showing an overview of the processing performed by the image processing device.
[Figure 5] Explanatory diagram showing the original image setting screen.
[Figure 6] Explanatory diagram showing the detection target setting screen.
[Figure 7] Explanatory diagram showing the processing condition setting screen.
[Figure 8] Explanatory diagram showing the simulated image display screen.
[Figure 9] Explanatory diagram showing the visualization result display screen.
[Figure 10] Explanatory diagram showing the visualization result display screen.
[Figure 11] Explanatory diagram showing another example of the visualization result display screen.
[Figure 12] Explanatory diagram showing the calculation procedure for the validity score.
[Figure 13] Explanatory diagram showing the real-time visualization result display screen.
[Figure 14] Explanatory diagram showing the visualization result display screen when multiple detection targets are specified.
[Figure 15] Explanatory diagram showing another example of the visualization result display screen when multiple detection targets are specified.
[Figure 16] Explanatory diagram showing another example of the visualization result display screen when multiple detection targets are specified.
[Figure 17] Flow diagram showing the operation procedure of the image processing apparatus.

Embodiments for Carrying Out the Invention

[0012] A first invention made to solve the above problems is an image processing apparatus in which a processor executes a process of visualizing the recognition state of a detection target in image recognition processing that detects a predetermined event from a captured image using a machine learning model. The processor is configured to: receive an original image for which the detection target is set; in response to a user operation specifying processing conditions, execute image processing on the original image based on the specified processing conditions to generate a simulated image that reproduces an image captured in a predetermined situation; generate state images representing the recognition state of the detection target when the image recognition processing is performed on the original image and on the simulated image; and output first visualization result display information in which the state image is superimposed on the original image and second visualization result display information in which the state image is superimposed on the simulated image.

[0013] According to this, image recognition processing is performed on the original image and on simulated images that reflect various environmental changes, state images representing the recognition state of the detection target are generated, and first visualization result display information in which the state image is superimposed on the original image and second visualization result display information in which the state image is superimposed on the simulated image are output. As a result, the persons in charge of system development and system operation can themselves easily and visually confirm the robustness of the machine learning model against the various environmental changes assumed at the site, and a system with high robustness can be constructed.

[0014] Further, a second invention is configured such that the processor presents a detection target setting screen to the user and sets the detection target specified by the user according to an operation of the user on the detection target setting screen.

[0015] According to this, by the user changing the detection target variously, the recognition states of various detection targets can be confirmed. Note that the detection target may be a specific type of object, or a specific state of a specific type of object.

[0016] Furthermore, the third invention is configured such that the processor presents a processing condition setting screen to the user and sets the processing conditions specified by the user in response to the user's operations on the processing condition setting screen.

[0017] According to this method, it is possible to generate simulated images that reproduce images taken under various environmental conditions where accuracy degradation is a concern in the monitoring area, thereby reliably verifying the robustness of the machine learning model to environmental changes.

[0018] Furthermore, the fourth invention is configured such that the image processing includes at least one of blurring, illumination adjustment, and virtual object overlay processing.

[0019] According to this, blurring can generate simulated images that reproduce, for example, images taken when the camera lens is fogged up or images taken outdoors in fog. Illumination adjustment can generate simulated images that reproduce, for example, images taken in bright sunlight, or images taken in weak sunlight with the lighting equipment off. Virtual object overlay processing can generate simulated images that represent, for example, a monitoring area crowded with people or a detection target hidden behind other objects.

[0020] Furthermore, the fifth invention is configured such that the processor superimposes, as the state image, a gradation image on the simulated image, the gradation image representing the degree of contribution of each part of the simulated image to the recognition result in the image recognition process.

[0021] According to this, users can appropriately understand the recognition state of the detection target in the image recognition process. Note that the gradation image may be one in which the hue changes gradually according to the degree of contribution, or a single-color image in which the brightness (density) changes gradually according to the degree of contribution.

[0022] Furthermore, the sixth invention is configured such that the processor superimposes a score image on the simulated image, which numerically represents the accuracy of the recognition state of the detected object relative to the simulated image, as the state image.

[0023] According to this, users can easily grasp the accuracy of the recognition state of the detected object in the simulated image, that is, the validity of the machine learning model for the simulated image. The accuracy of the recognition state of the detected object in the simulated image can be quantified, for example, based on the degree of consistency between the region of the detected object in the simulated image and the region of the state image superimposed on the simulated image.
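One way to picture the "degree of consistency" between the detection target region and the state image region is an intersection-over-union (IoU) measure. The sketch below is only an illustration under stated assumptions: the source does not say that IoU is used, and the function name `consistency_score`, the binary `target_mask`, and the `threshold` that decides which heat map cells count as the state image region are all hypothetical.

```python
def consistency_score(target_mask, heatmap, threshold=0.5):
    """Intersection-over-union between the detection target region and the
    high-contribution region of the heat map.

    target_mask: grid of 0/1 values, 1 where the detection target is (assumed input).
    heatmap:     grid of contribution values in [0, 1].
    threshold:   assumed cut-off above which a cell counts as "recognized".
    """
    intersection = 0
    union = 0
    for mask_row, heat_row in zip(target_mask, heatmap):
        for in_target, contribution in zip(mask_row, heat_row):
            is_hot = contribution >= threshold
            if in_target and is_hot:
                intersection += 1
            if in_target or is_hot:
                union += 1
    # An empty union means neither region exists; report zero consistency.
    return intersection / union if union else 0.0


target_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
heatmap = [
    [0.1, 0.9, 0.8, 0.7],
    [0.0, 0.6, 0.2, 0.1],
    [0.0, 0.0, 0.0, 0.0],
]
score = consistency_score(target_mask, heatmap)
```

In this toy example three of the four target cells are covered by the high-contribution region and one high-contribution cell lies outside the target, so the score is 3/5. A score image could then show such a number next to the superimposed state image.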

[0024] Furthermore, the seventh invention is configured such that, when a user specifies a plurality of detection targets, the processor superimposes the state images for each of the plurality of detection targets onto the simulated image in a way that allows for identification.

[0025] According to this, the user can visually grasp the recognition state of each detection target. In this case, for example, the state images for the respective detection targets may be displayed simultaneously in different display formats, specifically with different colors or patterns. Alternatively, the state image superimposed on the simulated image may be switched according to the user's operation of selecting a detection target from tabs displayed on the screen.
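As an illustration of giving each detection target a distinguishable display color, the following sketch spreads hues evenly around the color wheel. The function name `assign_target_colors` and the even hue spacing are assumptions made for this example; the source only states that different colors or patterns may be used.

```python
import colorsys


def assign_target_colors(targets):
    """Return a dict mapping each detection target name to an (r, g, b) color.

    Hues are spaced evenly so that the state images of different targets
    can be told apart when drawn on the same simulated image.
    """
    colors = {}
    for index, name in enumerate(targets):
        hue = index / len(targets)
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        colors[name] = (round(r * 255), round(g * 255), round(b * 255))
    return colors


colors = assign_target_colors(["person", "wheelchair", "stroller"])
```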

[0026] Furthermore, the eighth invention is an image processing method in which an information processing device executes a process of visualizing the recognition state of a detection target in image recognition processing that detects a predetermined event from a captured image using a machine learning model, the method comprising: receiving an original image for which the detection target is set; in response to a user operation specifying processing conditions, performing image processing on the original image based on the specified processing conditions to generate a simulated image that reproduces an image captured under a predetermined situation; generating state images representing the recognition state of the detection target when the image recognition processing is performed on the original image and on the simulated image; and outputting first visualization result display information in which the state image is superimposed on the original image and second visualization result display information in which the state image is superimposed on the simulated image.

[0027] According to this, similar to the first invention, the robustness of machine learning models against various environmental changes expected in the field can be easily visually confirmed by the system development and operation personnel themselves, enabling the construction of a highly robust system.

[0028] Hereinafter, embodiments of the present invention will be described with reference to the drawings.

[0029] Figure 1 is an overall diagram of the robustness verification system according to this embodiment.

[0030] This system comprises an image processing device 1 (information processing device), a camera 2, and a recorder 3.

[0031] The camera 2 captures images of the monitoring area. The recorder 3 stores the images captured by the camera 2. The image processing device 1 receives real-time images from the camera 2 and also receives the images stored in the recorder 3.

[0032] The image processing device 1 is comprised of a PC or similar device. The image processing device 1 is connected to a display 4 and an input device 5 such as a keyboard or mouse. Alternatively, the display 4 and input device 5 may be integrated into a single touch panel display.

[0033] The image processing device 1 performs a process of visualizing the recognition state of the detection target in image recognition processing that uses a machine learning model to detect a predetermined event from a captured image. By presenting this visualization result to the user, the device lets the user visually confirm the validity of the machine learning model. In particular, in this embodiment, the validity of the machine learning model is evaluated for captured images whose conditions change in various ways due to environmental changes. That is, the user can easily confirm whether the image recognition processing using the machine learning model is sufficiently robust against environmental changes.

[0034] Next, we will explain the processing performed by the image processing device 1. Figure 2 is an explanatory diagram showing an overview of the processing performed by the image processing device 1.

[0035] Image processing device 1 acquires the original image 21 (original captured image) from camera 2 or recorder 3. In this example, the monitoring area (camera 2's shooting area) is the elevator hall (the space used by people getting on and off the elevator). In this example, the object being detected is a wheelchair, and the original image 21 shows a person getting out of the elevator while moving the wheelchair.

[0036] The image processing device 1 performs image processing on the original image 21 to generate a simulated image that reproduces the captured image under predetermined conditions. The image processing is performed based on the processing conditions specified by the user, in response to the user's operation to specify the processing conditions. That is, the user specifies the processing conditions, i.e., various parameters in the image processing, based on the environmental changes expected in the target site (monitoring area).

[0037] In this embodiment, image processing includes blurring, illumination adjustment, and virtual object overlay processing.

[0038] In the blurring process, a blurring transformation is applied to the original image 21 to generate a simulated image 22 that reproduces a blurred captured image. Specifically, the generated simulated image 22 reproduces an image taken when the lens of the camera 2 is fogged, or an image taken outdoors in fog.
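A blurring transformation with an adjustable level can be sketched as a simple box blur. This is an assumption for illustration only: the source does not specify the blur kernel, and the function name `box_blur`, the grayscale list-of-rows image format, and the interpretation of `level` as a window radius are all hypothetical.

```python
def box_blur(image, level):
    """Blur a grayscale image given as a list of rows of 0-255 integers.

    `level` is treated as the window radius: each output pixel is the mean
    of the pixels within `level` steps of it, clipped at the image border.
    A level of 0 returns an unchanged copy.
    """
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0
            count = 0
            for ny in range(max(0, y - level), min(height, y + level + 1)):
                for nx in range(max(0, x - level), min(width, x + level + 1)):
                    total += image[ny][nx]
                    count += 1
            row.append(round(total / count))
        blurred.append(row)
    return blurred


original = [
    [0, 0, 0],
    [0, 255, 0],
    [0, 0, 0],
]
simulated = box_blur(original, level=1)
```

Raising `level` spreads the single bright pixel over a wider neighborhood, which is the effect a fogged lens or outdoor fog has on fine detail.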

[0039] In the illumination adjustment process, an image transformation is applied to the original image 21 to change its brightness, generating simulated images that reproduce images taken under low and high illumination conditions. Specifically, when the illumination is set to high, a simulated image is generated that reproduces an image taken under strong sunlight conditions. Conversely, when the illumination is set to low, a simulated image is generated that reproduces an image taken under weak sunlight conditions with the lighting equipment turned off.
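The brightness change described above can be illustrated with a multiplicative gain followed by clamping to the valid pixel range. The source does not state which transformation is used, so the gain model, the function name `adjust_illumination`, and the grayscale list-of-rows format are assumptions.

```python
def adjust_illumination(image, gain):
    """Scale every pixel of a grayscale image by `gain` and clamp to 0-255.

    gain > 1 imitates a high-illumination scene (for example strong sunlight);
    gain < 1 imitates a low-illumination scene (for example lighting turned off).
    """
    return [
        [min(255, max(0, round(pixel * gain))) for pixel in row]
        for row in image
    ]


original = [[10, 100, 200]]
bright = adjust_illumination(original, 1.5)
dark = adjust_illumination(original, 0.5)
```

Note that the bright variant saturates the last pixel at 255, which mirrors how highlights are lost in an overexposed captured image.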

[0040] In the virtual object overlay process, a predetermined virtual object image 23 is superimposed on the original image 21. In this example, an image of a person is superimposed on the original image 21 as the virtual object image 23. When the virtual object image 23 is an image of a person, a simulated image can be generated that represents a state in which the monitoring area is crowded with people, or a state in which the detection target is hidden behind other objects. The virtual object image 23 may be created by cutting out the region of an object (such as a person) from an image previously captured by the camera 2 or the like, or may be generated using computer graphics (CG). Note that the virtual object image 23 may also be a silhouette image.
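Superimposing a cut-out object at a chosen position can be sketched as a masked paste. The function name `overlay_object`, the binary `mask` marking which pixels belong to the object, and the grayscale list-of-rows format are assumptions for this example and are not taken from the source.

```python
def overlay_object(background, obj, mask, top, left):
    """Paste `obj` onto a copy of `background` wherever `mask` is 1.

    The object's top-left corner is placed at (top, left). Object pixels
    that would fall outside the background are skipped, and the original
    background is left unmodified.
    """
    result = [row[:] for row in background]
    for dy, (obj_row, mask_row) in enumerate(zip(obj, mask)):
        for dx, (pixel, opaque) in enumerate(zip(obj_row, mask_row)):
            y, x = top + dy, left + dx
            if opaque and 0 <= y < len(result) and 0 <= x < len(result[0]):
                result[y][x] = pixel
    return result


background = [[100] * 4 for _ in range(3)]
person = [[0, 0], [0, 0]]        # a dark silhouette used as the virtual object
person_mask = [[1, 0], [1, 1]]   # 1 = pixel belongs to the silhouette
simulated = overlay_object(background, person, person_mask, top=1, left=2)
```

Placing the silhouette so that it covers part of the detection target would produce the occlusion case described above.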

[0041] In this embodiment, blurring, illumination adjustment, and virtual object overlay are performed as image processing, but other image processing may also be performed. For example, the image resolution may be changed. Alternatively, an image editing application software may be launched, and various image processing operations may be performed in response to user input.

[0042] Furthermore, the image processing device 1 performs a visualization process to visualize the recognition state of the detected object in the image recognition process using a machine learning model for the target images (original image 21, simulated image 22). In this embodiment, visualization results 25 and 26 are generated in which a heat map 27 (state image) representing the recognition state of the detected object is superimposed on the target images (original image 21, simulated image 22).

[0043] The heatmap 27 is a gradation image that represents the degree of contribution of each part (e.g., each pixel, or each block containing multiple pixels) of the target image (original image 21, simulated image 22) to the recognition result in image recognition processing. Specifically, the heatmap 27 is a gradation image in terms of hue, with the hue changing gradually according to the degree of contribution. For example, the hue may change in the order of red, yellow, green, and blue, from highest to lowest contribution. Alternatively, the heatmap 27 may be a gradation image in terms of density, with the brightness (density) of a single color changing gradually according to the degree of contribution.
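The hue ordering of red, yellow, green, and blue from highest to lowest contribution corresponds to sweeping the HSV hue from 0 to 2/3. The following minimal sketch shows that mapping; the function name `contribution_to_rgb` and the assumption that contributions are normalized to the range 0 to 1 are not from the source.

```python
import colorsys


def contribution_to_rgb(contribution):
    """Map a contribution in [0, 1] to an (r, g, b) heat map color.

    1.0 maps to red, 0.75 to yellow, 0.5 to green and 0.0 to blue.
    Values outside [0, 1] are clamped.
    """
    c = min(1.0, max(0.0, contribution))
    hue = (1.0 - c) * (2.0 / 3.0)  # HSV hue: 0 is red, 2/3 is blue
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))


palette = [contribution_to_rgb(c) for c in (1.0, 0.75, 0.5, 0.0)]
```

For the density variant mentioned above, the same contribution value could instead scale the brightness of a single fixed color.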

[0044] The user can compare the image of the object to be detected in the target image (original image 21, simulated image 22) with the heatmap 27 superimposed on the target image, and visually determine the degree of overlap (consistency), in particular whether the area with a high contribution in the heatmap 27 is located at the center of the image of the object to be detected. This allows the user to confirm the accuracy of the recognition result of the image recognition process, i.e., the validity of the machine learning model.

[0045] In this embodiment, the recognition state of the detected object in the image recognition process using a machine learning model for the target image (original image 21, simulated image 22) is visualized by superimposing a heatmap 27 onto the target image (original image 21, simulated image 22). However, the method of visualization is not limited to a heatmap 27.

[0046] Next, the image processing device 1 will be described. Figure 3 is a block diagram showing the schematic configuration of the image processing device 1. Figure 4 is a block diagram showing an overview of the processing performed by the image processing device 1.

[0047] The image processing device 1 comprises a communication unit 11, a storage unit 12, and a processor 13.

[0048] The communication unit 11 communicates with the camera 2 and the recorder 3.

[0049] The storage unit 12 stores programs to be executed by the processor 13 and other data.

[0050] The processor 13 performs various processes by executing the programs stored in the storage unit 12. In this embodiment, the processor 13 performs original image acquisition, detection target setting, processing condition setting, image processing, image recognition, decision basis extraction, visualization, and output processing.

[0051] In the original image acquisition process, the processor 13 acquires the captured image (original image) received from the camera 2 or recorder 3 by the communication unit 11.

[0052] In the detection target setting process, the processor 13 sets the type of object to be detected by the image recognition process using the machine learning model (the condition for image recognition processing) in response to user input. In addition to setting the type of object as the detection target, the state of the object may also be set as the condition for detection. Specifically, a particular state of a particular object (for example, a person who has fallen over) may be set as the detection target.

[0053] In the processing condition setting process, the processor 13 sets the processing conditions (conditions for image processing) according to the user's operation.

[0054] In the image processing, the processor 13 processes the original image based on the processing conditions set in the processing condition setting process to generate a simulated image. Specifically, the image processing includes blurring, illumination adjustment, and virtual object overlay processing (see Figure 2).
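The processing conditions that drive the three kinds of image processing can be pictured as a small parameter record. The sketch below is purely illustrative: the class names `ProcessingConditions` and `OverlayPlacement`, their fields, and the default values are hypothetical and do not come from the source.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OverlayPlacement:
    """Where and how large one virtual object is drawn (assumed fields)."""
    object_name: str
    top: int
    left: int
    scale: float = 1.0


@dataclass
class ProcessingConditions:
    """User-specified parameters for generating one simulated image."""
    blur_level: int = 0              # 0 means no blurring
    illumination_gain: float = 1.0   # 1.0 means unchanged brightness
    overlays: list = field(default_factory=list)

    def __post_init__(self):
        if self.blur_level < 0:
            raise ValueError("blur_level must be non-negative")
        if self.illumination_gain <= 0:
            raise ValueError("illumination_gain must be positive")

    def enabled_steps(self):
        """List the processing steps these conditions actually request."""
        steps = []
        if self.blur_level > 0:
            steps.append("blur")
        if self.illumination_gain != 1.0:
            steps.append("illumination")
        if self.overlays:
            steps.append("overlay")
        return steps


foggy_and_crowded = ProcessingConditions(
    blur_level=3,
    overlays=[OverlayPlacement("person", top=40, left=120)],
)
steps = foggy_and_crowded.enabled_steps()
```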

[0055] In the image recognition process, the processor 13 uses a machine learning model (image recognition engine) to recognize the objects to be detected, as set in the detection target setting process, from the target images (original image and simulated image).

[0056] In the decision basis extraction process, processor 13 extracts decision basis information from the machine learning model on which the image recognition process was performed. Specifically, processor 13 extracts decision basis information contained in the intermediate layers of the neural network that constitutes the machine learning model. Decision basis information is information about the basis for the decision leading to the recognition result in the image recognition process using the machine learning model on the target image (original image and simulated image), that is, information representing the recognition state of the detected object in the image recognition process on the target image.
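The source does not name the technique used to turn intermediate-layer information into decision basis information. One widely used family of techniques is class activation mapping, in which the feature maps of an intermediate layer are combined as a weighted sum. The sketch below illustrates that idea only as an assumption; the function name `activation_map` and the way the per-channel weights are obtained are hypothetical.

```python
def activation_map(feature_maps, channel_weights):
    """Combine intermediate-layer feature maps into one contribution map.

    feature_maps:    list of 2-D grids, one per channel, all the same size.
    channel_weights: one weight per channel expressing how much that channel
                     supports the recognized class (assumed to be given).
    Returns a grid normalized to [0, 1]; negative evidence is clipped to 0.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    combined = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, channel_weights):
        for y in range(height):
            for x in range(width):
                combined[y][x] += weight * fmap[y][x]
    combined = [[max(0.0, value) for value in row] for row in combined]
    peak = max(max(row) for row in combined)
    if peak == 0.0:
        return combined
    return [[value / peak for value in row] for row in combined]


feature_maps = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 4.0], [0.0, 0.0]],
]
channel_weights = [1.0, -0.5]
contribution = activation_map(feature_maps, channel_weights)
```

In practice the resulting low-resolution map would be upsampled to the size of the target image before being rendered as a heat map.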

[0057] In the visualization process, processor 13 generates a heatmap that visualizes the basis for the judgment, and then generates display information of the visualization result by superimposing this heatmap onto the target image (original image and simulated image) (see Figure 2). Note that a machine learning model may be used in the visualization process, separate from the machine learning model used in the image recognition process.
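Superimposing a heat map on the target image can be sketched as per-pixel alpha blending in which the opacity grows with the contribution, so that low-contribution areas leave the underlying image visible. The function name `superimpose` and the `max_alpha` parameter are assumptions for this example; the source does not describe the blending rule.

```python
def superimpose(image, heat_colors, contribution, max_alpha=0.5):
    """Blend heat map colors over an RGB image.

    image, heat_colors: same-sized grids of (r, g, b) tuples.
    contribution:       same-sized grid of floats in [0, 1].
    max_alpha:          opacity used where the contribution is 1.0.
    """
    result = []
    for img_row, heat_row, contrib_row in zip(image, heat_colors, contribution):
        row = []
        for pixel, heat, c in zip(img_row, heat_row, contrib_row):
            alpha = max_alpha * c
            row.append(tuple(
                round((1.0 - alpha) * p + alpha * h)
                for p, h in zip(pixel, heat)
            ))
        result.append(row)
    return result


image = [[(100, 100, 100), (100, 100, 100)]]
heat_colors = [[(200, 0, 0), (0, 0, 200)]]
contribution = [[1.0, 0.0]]
overlay = superimpose(image, heat_colors, contribution)
```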

[0058] During output processing, the processor 13 outputs to the display 4 a detection target setting screen (see Figure 6) that allows the user to specify the detection target, a processing condition setting screen (see Figure 7) that allows the user to specify the processing conditions, and a visualization result display screen (see Figures 9 and 10) that presents the visualization results to the user.

[0059] Next, the screens displayed on the display 4 will be described. Figure 5 is an explanatory diagram showing the original image setting screen. Figure 6 is an explanatory diagram showing the detection target setting screen. Figure 7 is an explanatory diagram showing the processing condition setting screen. Figure 8 is an explanatory diagram showing the simulated image display screen. Figures 9 and 10 are explanatory diagrams showing the visualization result display screen.

[0060] The original image setting screen 101 shown in Figure 5 is equipped with an original image display unit 102. When the user operates the original image display unit 102, an original image selection screen (not shown) is displayed. On the original image selection screen, the user can select a file of captured images stored in the recorder 3. As a result, the captured images stored in the recorder 3 are input to the image processing device 1 as original images. The original image selection screen also allows the user to select a camera 2. As a result, the captured images output in real time from camera 2 are input to the image processing device 1 as original images. Once the original images are input to the image processing device 1, the system transitions to the detection target setting screen 111 (see Figure 6).

[0061] In the detection target setting screen 111 shown in Figure 6, the original image 21 input to the image processing device 1 is displayed on the original image display unit 102.

[0062] The detection target setting screen 111 is provided with a detection target selection unit 112. When the user operates the detection target selection unit 112, a detection target list 113 is displayed. The user can select a detection target from the detection target list. In this example, the user can select people, wheelchairs, strollers, bicycles, etc., as detection targets.

[0063] Furthermore, the detection target setting screen 111 is equipped with a setting button 114 and a registration button 115. When the user operates the setting button 114, the screen transitions to the processing condition setting screen 121 (see Figure 7).

[0064] The processing condition setting screen 121 shown in Figure 7 is provided with a processing condition specification unit 122 for each type of image processing. In this example, a processing condition specification unit 122 is provided for each of blurring, illumination adjustment, and virtual object overlay processing.

[0065] The processing condition specification unit 122 allows the user to specify processing conditions. At this time, the image processing device 1 executes image processing based on the specified processing conditions, and the processed simulated image 22 is displayed in the processing condition specification unit 122. The user can visually check whether an appropriate simulated image 22 has been obtained through image processing based on the specified processing conditions by looking at the simulated image 22.

[0066] Specifically, the processing condition specification unit 122 for blurring is provided with a level adjustment unit 123. The level adjustment unit 123 allows the user to adjust the level (strength) of the blurring.

[0067] The processing condition specification unit 122 for illuminance adjustment processing is provided with a level adjustment unit 124. The level adjustment unit 124 allows the user to adjust the illuminance level (adjust the brightness).
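The two level adjustments described above can be illustrated with a minimal sketch. It assumes a grayscale image stored as a list of rows with pixel values from 0 to 255, a box blur as the blurring method, and a simple additive brightness shift as the illuminance adjustment. The function names `box_blur` and `adjust_illuminance` and the meaning of `level` are hypothetical; the source does not specify the actual algorithms.

```python
from typing import List

Image = List[List[int]]  # grayscale image, pixel values 0..255 (assumed format)


def adjust_illuminance(image: Image, level: int) -> Image:
    """Shift every pixel by `level` and clamp the result to 0..255."""
    return [[max(0, min(255, p + level)) for p in row] for row in image]


def box_blur(image: Image, level: int) -> Image:
    """Average each pixel over a square window of width 2*level+1.

    level=0 returns an unblurred copy; larger levels blur more strongly.
    Windows are truncated at the image border.
    """
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[yy][xx]
                for yy in range(max(0, y - level), min(height, y + level + 1))
                for xx in range(max(0, x - level), min(width, x + level + 1))
            ]
            row.append(round(sum(window) / len(window)))
        blurred.append(row)
    return blurred
```

Under these assumptions, a call such as `adjust_illuminance(original, -20)` would produce a darker simulated image, and `box_blur(original, 2)` a blurred one.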

[0068] The processing condition specification unit 122 for virtual object overlay processing is provided with a detailed settings button 125. When the user operates the detailed settings button 125, an image editing screen (not shown) is displayed. On the image editing screen, the user can perform image editing to overlay a predetermined virtual object image 23 (such as a person image) onto the original image 21.

[0069] The virtual object image 23 may be an image extracted in advance from an image captured by camera 2, or it may be an image generated by computer graphics (CG). In the image editing screen (not shown), when the virtual object image 23 is superimposed on the original image 21, its position and size are adjusted according to user input. Additionally, in the case of a virtual object image 23 generated by CG using a 3D model, the orientation of the object is also adjusted according to user input when the virtual object image 23 is superimposed on the original image 21.
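As an illustration of the overlay step, the following sketch pastes a small object image onto a copy of the original image at a user-chosen position, after an optional integer enlargement. The data layout (grayscale lists, `None` for transparent pixels) and the names `scale_sprite` and `overlay_object` are assumptions made for this example only; orientation adjustment of 3D models is not covered.

```python
from typing import List, Optional

Image = List[List[int]]               # grayscale image, 0..255 (assumed format)
Sprite = List[List[Optional[int]]]    # None marks a transparent pixel


def scale_sprite(sprite: Sprite, factor: int) -> Sprite:
    """Enlarge a sprite by an integer factor using pixel repetition."""
    return [
        [p for p in row for _ in range(factor)]
        for row in sprite
        for _ in range(factor)
    ]


def overlay_object(base: Image, sprite: Sprite, top: int, left: int) -> Image:
    """Paste `sprite` onto a copy of `base`, top-left corner at (top, left).

    Transparent pixels and pixels that fall outside the base are skipped.
    """
    result = [row[:] for row in base]
    for dy, sprite_row in enumerate(sprite):
        for dx, pixel in enumerate(sprite_row):
            y, x = top + dy, left + dx
            if pixel is not None and 0 <= y < len(result) and 0 <= x < len(result[0]):
                result[y][x] = pixel
    return result
```

Calling `overlay_object` several times with different positions would give a rough stand-in for the crowded state mentioned in the text.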

[0070] The user specifies processing conditions in the processing condition specification unit 122 and visually inspects the simulated image 22 displayed in the processing condition specification unit 122 to confirm that an appropriate simulated image 22 can be obtained through image processing based on the specified processing conditions. Then, the user operates the registration button 115. As a result, the simulated image 22 generated by image processing based on the specified processing conditions is registered, and the system transitions to the simulated image display screen 131 (see Figure 8).

[0071] The simulated image display screen 131 shown in Figure 8 is equipped with a simulated image display unit 132. The simulated image display unit 132 displays the simulated images 22 generated by image processing based on the specified processing conditions. The simulated image display unit 132 is equipped with a scroll bar 135. By operating the scroll bar 135, the user can bring simulated images 22 that lie outside the display range into view.

[0072] Furthermore, when the user operates the setting button 114, the screen returns to the processing condition setting screen 121 (see Figure 7). Here, the user can specify different processing conditions, and when the user operates the registration button 115, a simulated image 22 representing a different state is registered. By repeating these operations, multiple simulated images 22 representing different states are registered. In this example, the following are registered: a simulated image 22 representing a blurred state produced by blurring processing, a simulated image 22 representing strong sunlight produced by illuminance adjustment processing, a simulated image 22 representing weak sunlight, and a simulated image 22 representing a crowded state produced by virtual object overlay processing.

[0073] Furthermore, the simulated image display screen 131 is provided with a visualization execution button 133. After the user has finished registering the necessary simulated images 22, they operate the visualization execution button 133. This executes the image recognition process, the judgment basis extraction process, and the visualization process, and the system transitions to the visualization result display screen 141 (see Figure 9).

[0074] The visualization result display screen 141 shown in Figure 9 is provided with an original image visualization result display unit 142. The original image visualization result display unit 142 displays a visualization result 25 based on the original image 21. The visualization result 25 is a heatmap 27 superimposed on the original image 21, which visualizes the basis for the judgment leading to the recognition result in the image recognition processing using a machine learning model on the original image 21.

[0075] Furthermore, the visualization result display screen 141 is provided with a simulated image visualization result display unit 143. The simulated image visualization result display unit 143 displays a visualization result 26 based on the simulated image 22. The visualization result 26 is a heatmap 27 superimposed on the simulated image 22, which visualizes the basis for the judgment leading to the recognition result in the image recognition processing using a machine learning model on the simulated image 22.
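Superimposing a heatmap on a target image can be sketched as a per-pixel blend. The example below assumes a grayscale image and a heatmap of contribution values between 0.0 and 1.0 with the same dimensions; the function name `superimpose_heatmap`, the `alpha` parameter, and the choice to blend toward white are illustrative assumptions, not details taken from the source.

```python
from typing import List

Image = List[List[int]]        # grayscale pixels, 0..255 (assumed format)
Heatmap = List[List[float]]    # per-pixel contribution, assumed 0.0..1.0


def superimpose_heatmap(image: Image, heatmap: Heatmap, alpha: float = 0.5) -> Image:
    """Blend each pixel toward white in proportion to its heatmap value.

    Pixels with zero contribution are left unchanged, so regions the
    model did not rely on look the same as in the input image.
    """
    result = []
    for image_row, heat_row in zip(image, heatmap):
        row = []
        for pixel, contribution in zip(image_row, heat_row):
            weight = alpha * contribution
            row.append(round((1 - weight) * pixel + weight * 255))
        result.append(row)
    return result
```

The same function would be applied to the original image 21 and to each simulated image 22, so the two kinds of visualization result can be compared side by side.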

[0076] The user can visually check whether the image recognition processing using the machine learning model is performed appropriately by looking at the visualization result 25 displayed on the original image visualization result display unit 142 and the visualization result 26 displayed on the simulated image visualization result display unit 143.

[0077] Here, the visualization result display screen 141 shown in Figure 9 is an example where there are no problems with robustness to environmental changes. In the visualization result display screen 141 shown in Figure 9, the heatmap 27 is appropriately displayed for all visualization results 26 for each simulated image 22 in the simulated image visualization result display unit 143.

[0078] On the other hand, the visualization result display screen 141 shown in Figure 10 is an example of a case where there is a problem with robustness to environmental changes. In the visualization result display screen 141 shown in Figure 10, the heatmap 27 is not displayed in some of the visualization results 26 for the simulated images 22 in the simulated image visualization result display unit 143 (the illuminance -20 example). Specifically, the heatmap 27 is not displayed in the visualization result 26 of the simulated image 22 that reproduces an image taken in low illuminance conditions. From this, the user can confirm that recognition accuracy drops in low illuminance conditions. In this case, robustness to environmental changes can be improved by performing additional training using images taken in low illuminance conditions, which improves recognition accuracy in those conditions.

[0079] In this embodiment, the recognition status of the detection target for simulated images 22 that reproduce images taken in various situations is visualized and displayed by a heatmap 27, allowing users to easily visually confirm the validity of the machine learning model in various situations. Furthermore, if it is confirmed that there is a problem with the recognition accuracy in a particular situation, robustness to environmental changes can be improved by performing additional training using images taken in that particular situation.

[0080] Next, we will describe another example of the visualization results display screen shown on display 4. Figure 11 is an explanatory diagram showing another example of the visualization results display screen. Figure 12 is an explanatory diagram showing the procedure for calculating the validity score.

[0081] In the visualization results display screen 141 shown in Figure 11, a score image 145 (state image) with a validity score is displayed on the original image visualization results display unit 142 and the simulated image visualization results display unit 143. The validity score is a numerical representation of the accuracy of the recognition state of the detected object in the image recognition processing for the target images (original image 21, simulated image 22), that is, the validity of the machine learning model for the target images. In this example, the score image 145 is displayed superimposed on the visualization results 25 and 26 for the target images (original image 21, simulated image 22).

[0082] Furthermore, the visualization results display screen 141 is equipped with a statistical information display unit 146. The statistical information display unit 146 displays statistical information regarding the validity score of each simulated image 22. Specifically, the average value (average score), maximum value (MAX), and minimum value (MIN) of the validity score for each simulated image 22 are displayed.
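The statistics shown in the statistical information display unit 146 amount to a simple summary over the per-image validity scores. A minimal sketch follows, assuming the scores are held in a dictionary keyed by a label for each simulated image; the labels and the function name `summarize_scores` are hypothetical.

```python
from statistics import mean
from typing import Dict


def summarize_scores(scores: Dict[str, float]) -> Dict[str, float]:
    """Return the average, maximum, and minimum of the validity scores."""
    values = list(scores.values())
    return {"average": mean(values), "max": max(values), "min": min(values)}
```

A low minimum relative to the average would point to one particular simulated condition in which recognition degrades.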

[0083] In order to calculate the validity score, a rectangular frame 31 is pre-defined in the area occupied by the detection target in the original image 21, as shown in Figure 12(A). This rectangular frame 31 is input by the user as annotation information. Specifically, the user visually specifies the range of the area occupied by the detection target on the original image 21 displayed on the screen.

[0084] When calculating the validity score, as shown in Figure 12(B), the regions of the heatmap 27 included in the visualization results 25,26 for the target image (original image 21, simulated image 22) are compared with the regions to be detected (rectangular frame 31), and the validity score is calculated based on the degree of consistency (overlap rate) between the two. Here, for example, if the discrepancy between the regions of the heatmap 27 and the regions to be detected (rectangular frame 31) is small, that is, if the degree of consistency between the two is high, the validity score will be high. In addition, the validity score may be calculated by reducing the weighting of parts of the heatmap 27 that have a low contribution to the recognition result.
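One way to turn the overlap rate into a number is intersection over union between the thresholded heatmap region and the rectangular frame. The sketch below takes that approach as an assumption; the source does not state the exact formula, and the threshold value, the box convention, and the name `validity_score` are all hypothetical. The optional down-weighting of low-contribution parts is not implemented here.

```python
from typing import List, Tuple

Heatmap = List[List[float]]        # per-pixel contribution, assumed 0.0..1.0
Box = Tuple[int, int, int, int]    # (top, left, bottom, right), end-exclusive


def validity_score(heatmap: Heatmap, frame: Box, threshold: float = 0.5) -> float:
    """Intersection over union of the thresholded heatmap and the frame."""
    top, left, bottom, right = frame
    intersection = 0
    union = 0
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            in_heat = value >= threshold
            in_frame = top <= y < bottom and left <= x < right
            if in_heat and in_frame:
                intersection += 1
            if in_heat or in_frame:
                union += 1
    return intersection / union if union else 0.0
```

With this definition, a heatmap that exactly fills the frame scores 1.0, and an image for which no heatmap region appears scores 0.0.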

[0085] Next, we will explain the real-time visualization results display screen shown on display 4. Figure 13 is an explanatory diagram showing the real-time visualization results display screen.

[0086] In the examples shown in Figures 7, 8, and 9, the user specifies processing conditions for image processing to generate a simulated image on the processing conditions setting screen 121, and then operates the "Visualization Execution" button 133 on the simulated image display screen 131, which transitions to the visualization results display screen 141, where visualization results 25 and 26 for the target images (original image 21, simulated image 22) are displayed.

[0087] On the other hand, in the real-time visualization results display screen 151 shown in Figure 13, when the user specifies processing conditions for the image processing that generates the simulated image 22, those conditions are reflected in real time in both the simulated image 22 and the visualization result 26.

[0088] The real-time visualization results display screen 151, like the processing condition setting screen 121 (see Figure 7), is provided with a processing condition specification unit 152 for each type of image processing. Specifically, there are processing condition specification units 152 for blurring, illuminance adjustment, and virtual object overlay processing. The processing condition specification unit 152 allows the user to specify processing conditions.

[0089] Furthermore, the real-time visualization results display screen 151 is equipped with a simulated image display unit 153. The simulated image display unit 153 displays a simulated image 22 generated by image processing based on specified processing conditions, in conjunction with the user's operation on the processing condition specification unit 152.

[0090] Furthermore, the real-time visualization results display screen 151 is provided with a simulated image visualization results display unit 154. The simulated image visualization results display unit 154 displays visualization results 26 based on the simulated image 22. The visualization results 26 are a heatmap 27 superimposed on the simulated image 22, which visualizes the basis for the judgment leading to the recognition result in the image recognition processing using a machine learning model on the simulated image 22.

[0091] The user can visually inspect the simulated image 22 displayed on the simulated image display unit 153 to confirm whether an appropriate simulated image 22 is obtained through image processing based on the specified processing conditions. At the same time, the user can visually inspect the visualization result 26 displayed on the simulated image visualization result display unit 154 to confirm whether the image recognition processing using the machine learning model is performed appropriately.

[0092] Next, we will explain the visualization results screen when multiple detection targets are specified. Figure 14 is an explanatory diagram showing the visualization results screen when multiple detection targets are specified.

[0093] In the visualization results display screen 161 shown in Figure 14, the user can specify multiple detection targets in the detection target selection unit 112.

[0094] Furthermore, on the visualization results display screen 161, the visualization results 25 of the original image visualization results display unit 142 and the visualization results 26 of the simulated image visualization results display unit 143 simultaneously display heatmaps 27 for each detected object in different display formats, specifically with different colors and patterns. In this example, two types of detected objects (wheelchair and person) are selected, and for example, the heatmap of one detected object is displayed in warm colors, while the heatmap of the other detected object is displayed in cool colors.
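Displaying several heatmaps at once in distinguishable colors can be sketched by blending each detection target's heatmap into an RGB image with its own color. The data layout, the color assignments, and the names `blend` and `overlay_target_heatmaps` are assumptions for illustration; patterns, which the text also mentions, are not covered.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]
ColorImage = List[List[RGB]]
Heatmap = List[List[float]]    # per-pixel contribution, assumed 0.0..1.0


def blend(pixel: RGB, color: RGB, weight: float) -> RGB:
    """Mix `color` into `pixel`; weight 0 keeps the pixel, 1 replaces it."""
    return tuple(round((1 - weight) * p + weight * c) for p, c in zip(pixel, color))


def overlay_target_heatmaps(
    image: ColorImage,
    heatmaps: Dict[str, Heatmap],
    colors: Dict[str, RGB],
    alpha: float = 0.5,
) -> ColorImage:
    """Superimpose one heatmap per detection target, each in its own color."""
    result = [row[:] for row in image]
    for target, heatmap in heatmaps.items():
        for y, row in enumerate(heatmap):
            for x, value in enumerate(row):
                result[y][x] = blend(result[y][x], colors[target], alpha * value)
    return result
```

For instance, a warm color such as `(255, 0, 0)` could be assigned to one detection target and a cool color such as `(0, 0, 255)` to the other, with the same mapping reused for the legend.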

[0095] Furthermore, the visualization results display screen 161 is provided with a legend display unit 162. The legend display unit 162 allows the user to determine which detection target the heatmap 27 displayed on the simulated image visualization results display unit 143 and the original image visualization results display unit 142 relates to.

[0096] In this example, two types of detection targets (wheelchair and person) are selected, but three or more types may be selected.

[0097] Next, we will explain another example of the visualization result display screen when multiple detection targets are specified. Figures 15 and 16 are explanatory diagrams showing another example of the visualization result display screen when multiple detection targets are specified.

[0098] The visualization results display screen 171 shown in Figures 15 and 16 has tabs 172 for selecting the object to be detected, with each tab representing a different type of object to be detected. By operating the tabs 172, the user can switch between different types of objects to be detected and display the visualization results 26.

[0099] In Figure 15, the user selects a wheelchair by operating the wheelchair tab 172, and the heatmap 27 is superimposed on the wheelchair area in the simulated image 22 of the visualization result 26. In Figure 16, the user selects a person by operating the person tab 172, and the heatmap 27 is superimposed on the person area in the simulated image 22 of the visualization result 26.

[0100] In addition, the original image visualization result display unit 142 displays a heatmap 27 as a visualization result 25 related to the detection target selected by operating the tab 172, similar to the simulated image visualization result display unit 143.

[0101] In this example, two types of detection targets (wheelchair and person) are selected, but three or more types may be selected. In this case, tab 172 will be provided for each type of detection target selected.

[0102] Next, the operation procedure of the image processing device 1 will be described. Figure 17 is a flowchart showing the operation procedure of the image processing device 1.

[0103] The image processing device 1 first acquires the original image from the camera 2 or recorder 3 (original image acquisition process) (ST101).

[0104] Next, the image processing device 1 sets the type of object to be detected by the image recognition process using the machine learning model in response to user operation (detection target setting process) (ST102).

[0105] Next, the image processing device 1 determines whether or not to proceed with registering a new simulated image in response to the user's operation (ST103). At this time, if the user operates the setting button 114 on the processing condition setting screen 121 (see Figure 7), it is determined that the user will proceed with registering a new simulated image. On the other hand, if the user operates the visualization execution button 133 on the simulated image display screen 131 (see Figure 8), it is determined that the user will not proceed with registering a new simulated image, that is, the registration of the simulated image will be terminated.

[0106] If the user chooses to proceed to registering a new simulated image (Yes in ST103), the image processing device 1 sets the processing conditions (conditions for image processing) according to the user's operation (processing condition setting process) (ST104).

[0107] Next, the image processing device 1 processes the original image based on the processing conditions set in the processing condition setting process to generate a simulated image (image processing) (ST105).

[0108] Next, the image processing device 1 registers the simulated image generated by the image processing into the simulated image list (simulated image registration process) (ST106), and returns to ST103. At this time, the simulated image registration process is executed in response to the user operating the registration button 115 on the processing condition setting screen 121 (see Figure 7).

[0109] On the other hand, if the registration of the simulated image is completed (No in ST103), the image processing device 1 uses a machine learning model (image recognition engine) to recognize the object to be detected, which was set in the detection target setting process, from the original image and the simulated image (image recognition process) (ST107).

[0110] Next, the image processing device 1 extracts judgment basis information from the machine learning model on which the image recognition process has been performed (judgment basis extraction process) (ST108).

[0111] Next, the image processing device 1 generates a heatmap that visualizes the judgment basis information, and generates display information of the visualization result by superimposing the heatmap onto the original image and the simulated image (visualization process) (ST109).
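The flow from ST103 to ST109 can be summarized in a short control-flow sketch. The individual processing steps are passed in as functions, because the source does not specify how they are implemented. All names here (`run_visualization`, `process`, `recognize`, `extract_basis`, `visualize`) are hypothetical, and passing the recognition output to `extract_basis` is a simplification of extracting the judgment basis from the model itself.

```python
from typing import Any, Callable, Dict, List, Tuple


def run_visualization(
    original: Any,
    conditions: List[Tuple[str, Any]],
    process: Callable[[Any, Any], Any],
    recognize: Callable[[Any], Any],
    extract_basis: Callable[[Any], Any],
    visualize: Callable[[Any, Any], Any],
) -> Dict[str, Any]:
    """Return one visualization result per target image, keyed by name."""
    # ST103 to ST106: generate and register one simulated image per condition.
    simulated_list = [(name, process(original, cond)) for name, cond in conditions]

    # ST107 to ST109: recognize, extract the judgment basis, then visualize,
    # for the original image and for every registered simulated image.
    results: Dict[str, Any] = {}
    for name, image in [("original", original)] + simulated_list:
        recognition = recognize(image)
        basis = extract_basis(recognition)
        results[name] = visualize(image, basis)
    return results
```

In an interactive system the list of conditions would be built up by repeated user operations rather than supplied in advance, but the order of the processing steps is the same.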

[0112] As described above, embodiments have been explained as examples of the technology disclosed in this application. However, the technology in this disclosure is not limited to these embodiments and can also be applied to embodiments in which modifications, replacements, additions, or omissions have been made. Furthermore, it is possible to create new embodiments by combining the components described in the above embodiments.

[Industrial applicability]

[0113] The image processing apparatus and image processing method according to the present invention have the effect of allowing system development and system operation personnel to easily visually confirm the robustness of the machine learning model against various environmental changes expected in the field, and to construct a system with high robustness. Furthermore, when a predetermined event is detected from a captured image by image recognition processing using a machine learning model, the recognition state of the event to be detected in the image recognition processing is visualized and presented to the user, making it useful as an image processing apparatus and image processing method that allows the user to visually confirm the recognition performance of the machine learning model.

[Explanation of symbols]

[0114]
1 Image processing device
2 Camera
3 Recorder
4 Display
5 Input device
11 Communication unit
12 Storage unit
13 Processor
21 Original image
22 Simulated image
23 Virtual object image
25 Visualization result
26 Visualization result
27 Heatmap (state image)
31 Rectangular frame

Claims

1. An image processing apparatus that performs, by a processor, a process of visualizing the recognition state of a detection target in image recognition processing that detects a predetermined event from a captured image using a machine learning model, wherein the processor: receives input of an original image in which the detection target is set; in response to a user operation specifying processing conditions for the original image, performs image processing on the original image based on the specified processing conditions to generate a simulated image that reproduces an image captured under a predetermined situation; generates state images representing the recognition state of the detection target when the image recognition processing is performed on the original image and on the simulated image; and outputs display information of a first visualization result obtained by superimposing the state image onto the original image, and display information of a second visualization result obtained by superimposing the state image onto the simulated image.

2. The image processing apparatus according to claim 1, wherein the processor presents a detection target setting screen to the user and sets the detection target specified by the user in accordance with the user's operation on the detection target setting screen.

3. The image processing apparatus according to claim 1, wherein the processor presents a processing condition setting screen to the user and sets the processing conditions specified by the user in response to the user's operations on the processing condition setting screen.

4. The image processing apparatus according to claim 1, wherein the image processing includes at least one of blurring, illuminance adjustment, and virtual object overlay.

5. The image processing apparatus according to claim 1, wherein the processor superimposes on the simulated image, as the state image, a grayscale image that represents the degree of contribution of each part of the simulated image to the recognition result in the image recognition processing.

6. The image processing apparatus according to claim 1, wherein the processor superimposes on the simulated image, as the state image, a score image that numerically represents the accuracy of the recognition state of the detection target for the simulated image.

7. The image processing apparatus according to claim 1, wherein, when the user specifies a plurality of detection targets, the processor superimposes the state images for each of the plurality of detection targets on the simulated image in a manner that allows them to be distinguished from one another.

8. An image processing method in which an information processing device performs a process of visualizing the recognition state of a detection target in image recognition processing that detects a predetermined event from a captured image using a machine learning model, the method comprising: receiving input of an original image in which the detection target is set; in response to a user operation specifying processing conditions for the original image, performing image processing on the original image based on the specified processing conditions to generate a simulated image that reproduces an image captured under a predetermined situation; generating state images representing the recognition state of the detection target when the image recognition processing is performed on the original image and on the simulated image; and outputting display information of a first visualization result obtained by superimposing the state image onto the original image, and display information of a second visualization result obtained by superimposing the state image onto the simulated image.