Information processing device and information processing method
The information processing device improves class identification accuracy in edge devices by using a user-corrected teacher model for retraining, addressing the challenge of new classes and resource limitations.
Patent Information
- Authority / Receiving Office: WO
- Patent Type: Applications
- Current Assignee / Owner: SONY SEMICON SOLUTIONS CORP
- Filing Date: 2025-09-29
- Publication Date: 2026-07-02
Description
INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING METHOD
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of Japanese Priority Patent Application JP 2024-226108 filed on December 23, 2024, the entire contents of which are incorporated herein by reference.
[0002] The present technology relates to an information processing device and a method thereof and particularly relates to a technical field related to a retraining process of an artificial intelligence (AI) model.
[0003] There is a technology for performing various types of image recognition processes using an AI model for an image captured by an imaging device. The image recognition process here broadly means processing for recognizing image content, such as an object detection process and an object recognition process.
[0004] The result of the image recognition process performed on the captured image is used to perform various analyses on the subject. For example, a plurality of imaging devices is installed in a target site such as a store, and the object detection process for a person is performed on captured images of the imaging devices using an AI model. Then, on the basis of the result of the object detection process, one could analyze the subject, for example, counting the number of people or analyzing movement of a person.
[0005] As a system for realizing various analyses on a subject as described above, there is a configuration in which an AI model is mounted in an on-site device such as an imaging device, and a server device arranged remotely from the site performs a management process of the AI model. Hereinafter, the on-site device in such a system is referred to as an “edge device”. Examples of the management process of the AI model performed by the server device include a process of deploying the AI model made available to the edge device by purchase or the like by the user, a process of retraining the AI model, and the like. Note that “deploy” means processing of transmitting the AI model to the edge device so that the AI model can be used by the edge device. Here, the retraining of the AI model is performed as training adapted to the site environment using the captured image in the actual site in order to prevent the inference performance from deteriorating due to a change in the use environment of the imaging device or the like.
[0006] Note that PTLs 1 and 2 below can be cited as related examples of the background art. PTL 1 below discloses a technique for performing labeling (annotation) on individual objects included in a captured image using a trained image recognition unit. In addition, PTL 2 below discloses a technique that includes an adding unit that adds (annotates) identification information to image data using a trained training model, and a correction unit that corrects the identification information added by the adding unit according to a correction request, and updates the training model (retrains the AI model) using the image data in which the identification information has been corrected by the correction unit.
[0007] Japanese Patent No. 7055259
Japanese Patent No. 7390628
Summary
[0008] Here, as the change in the use environment of the imaging device, it can be assumed that an object of a new class that has not been set as an imaging target in the past is included in the imaging target. For example, in a case where the imaging device is used for managing product inventory in a store, a case where a new product is displayed in the store is considered.
[0009] In this case, the AI model used in the edge device must be trained so that an object of a new class can be identified. To do so, the user must perform annotation work on the captured images used for retraining (images in which an object of a new class is included as an imaging target), which imposes a workload on the user.
[0010] In addition, since the AI model used in the edge device is a relatively small model due to local resource availability, it is difficult to train the AI model such that it appropriately identifies a new class if the AI model used in the edge device is directly retrained using annotation information reflecting a new class name as teacher data. There is also a possibility that class identification accuracy is deteriorated.
[0011] The present technology has been made in view of the above circumstances, and it is desirable to improve class identification accuracy of an AI model while reducing a workload of a user in a case of performing retraining of adding an identifiable class for the AI model used in the object detection process of an edge device.
[0012] An information processing device according to the present technology includes processing circuitry configured to perform an annotation process on an input image using a first AI model; receive a correction of the annotation process by a user; retrain the first AI model based on the correction; and after retraining the first AI model, retrain a second AI model used in an edge device with the first AI model as a teacher model.
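The four steps in the preceding paragraph can be sketched as follows. This is a minimal illustration only, not the implementation of the present technology: the nearest-centroid `CentroidModel`, the function `retrain_edge_model`, the `receive_correction` callback, the class names, and the 2-D feature vectors standing in for images are all hypothetical. The teacher step is simplified here to training the second model on labels predicted by the retrained first model, and the size difference between the two models is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class CentroidModel:
    """Toy stand-in for an AI model: one centroid per class name."""
    centroids: dict = field(default_factory=dict)

    def predict(self, features):
        # Return the class whose centroid is nearest to the feature vector.
        def sq_dist(name):
            return sum((a - b) ** 2 for a, b in zip(features, self.centroids[name]))
        return min(self.centroids, key=sq_dist)

    def fit(self, samples, labels):
        # Recompute every centroid from the labelled samples.
        grouped = {}
        for features, label in zip(samples, labels):
            grouped.setdefault(label, []).append(features)
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()
        }


def retrain_edge_model(first_model, second_model, images, receive_correction):
    # Step 1: annotation process on the input images using the first AI model.
    annotations = [first_model.predict(img) for img in images]
    # Step 2: receive the user's correction of the annotations.
    corrected = receive_correction(images, annotations)
    # Step 3: retrain the first AI model based on the correction.
    first_model.fit(images, corrected)
    # Step 4: retrain the second (edge) model with the first model as teacher.
    teacher_labels = [first_model.predict(img) for img in images]
    second_model.fit(images, teacher_labels)
    return annotations, corrected


# Hypothetical data: 2-D feature vectors standing in for captured images.
images = [[0, 1], [1, 0], [10, 9], [9, 10], [20, 0], [21, 1]]
first = CentroidModel({"bottle": [0, 0], "can": [10, 10]})
second = CentroidModel({"bottle": [0, 0], "can": [10, 10]})


def user_fix(imgs, labels):
    # Hypothetical user correction: items far to the right are a new class.
    return ["snack" if img[0] > 15 else lab for img, lab in zip(imgs, labels)]


before, after = retrain_edge_model(first, second, images, user_fix)
print(before[4], "->", after[4], "| edge model now predicts:", second.predict([19, 1]))
```

In this toy run the first model initially annotates the new product as an existing class, the user renames it, and the second model only ever sees labels produced by the retrained first model.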
[0013] Fig. 1 is a block diagram illustrating a configuration example of an information processing system as an embodiment.
Fig. 2 is a block diagram illustrating a configuration example of an imaging device.
Fig. 3 is a block diagram illustrating a hardware configuration example of an information processing device as the embodiment.
Fig. 4 is a diagram illustrating an example of a service Top screen.
Fig. 5 is a diagram illustrating an example of an edge model management Top screen.
Fig. 6 is a diagram illustrating an example of a job setting screen.
Fig. 7 is a diagram illustrating an example of an upload / training type setting screen.
Fig. 8 is a diagram illustrating an example of an edge model management Top screen in a case where the retraining process is being executed.
Fig. 9 is an explanatory diagram of an evaluation result screen.
Fig. 10 is an explanatory diagram of an evaluation result screen in the same manner.
Fig. 11 is a diagram illustrating an example of a deployment management screen.
Fig. 12 is a functional block diagram for describing various functions according to a retraining method as an embodiment.
Fig. 13 is an explanatory diagram of a retraining method of an edge model in a case of “with class addition”.
Fig. 14 is a diagram illustrating an example of a correction reception screen.
Fig. 15 is a diagram for describing an example of correction of a class name.
Fig. 16 is an explanatory diagram of the edge model retraining method in the case of “without class addition”.
Fig. 17 is a flowchart illustrating a specific processing procedure example to be executed to realize a retraining method as an embodiment.
Fig. 18 is a flowchart of processing related to an evaluation of a retrained edge model.
Fig. 19 is a flowchart of processing related to deployment of a retrained edge model.
[0014] Hereinafter, embodiments according to the present technology will be described in the following order with reference to the accompanying drawings.
<1. Configuration of information processing system>
(1-1. System overview)
(1-2. Configuration example of imaging device)
(1-3. Configuration example of information processing device)
<2. Screen transition example related to retraining>
<3. Retraining method as embodiment>
<4. Processing procedure>
<5. Modification>
<6. Summary of embodiments>
<7. Present technology>
<1. Configuration of information processing system>
(1-1. System overview)
[0015] Fig. 1 is a schematic explanatory diagram of an information processing system as an embodiment including an information processing device as the embodiment according to the present technology. As illustrated, the information processing system as an embodiment includes a server device 1, an imaging device 2, and a user terminal 3. The server device 1 and the user terminal 3 are each configured as a computer device including a microcomputer including a central processing unit (CPU), a read only memory (ROM), and a random access memory (RAM). The server device 1 corresponds to an example of an information processing device as the embodiment according to the present technology.
[0016] The server device 1 is configured to be able to perform data communication with each of the imaging device 2 and the user terminal 3 via a network NT serving as a communication network such as the Internet.
[0017] The imaging device 2 images a subject to obtain a captured image. Here, in the present specification, “imaging” broadly means obtaining image data capturing a subject. The image data mentioned here is a generic term for data including a plurality of pieces of pixel data, and the pixel data is a concept widely including not only data indicating the intensity of the amount of light received from the subject but also, for example, distance to the subject, polarization information about the subject, temperature information, and the like. That is, the “image data” (captured image data) obtained by the “imaging” includes data as a gradation image indicating information about an amount of received light for each pixel, data as a distance image indicating information about a distance to a subject for each pixel, data as a polarized image indicating polarization information about incident light for each pixel, data as a thermal image indicating information about temperature for each pixel, and the like.
[0018] As an example, it is assumed that the imaging device 2 in the present example is configured to obtain the above-described gradation image as a captured image as in a general digital camera device. Specifically, it is assumed that an RGB color image is obtained as a captured image.
[0019] In the information processing system of the present example, a plurality of imaging devices 2 is provided at a target site 100. The site 100 can vary depending on the use of the imaging device 2, and for example, in the case of use for monitoring a person such as a customer in a store, the site 100 is the store, or in the case of use for monitoring a vehicle or the like in a parking lot, the site 100 is the parking lot.
[0020] In the information processing system, the server device 1 is a computer device assumed to be used by a provider of a service using the information processing system. In addition, the user terminal 3 is a computer device assumed to be used by a user who receives the service.
[0021] The information processing system of the present example is configured as a system for performing an inference process using an artificial intelligence (AI) model, specifically, an image recognition process on a captured image obtained by the imaging device 2 as a target, and generating analysis information indicating an analysis result for a subject on the basis of a processing result (inference result) and presenting the analysis information to a user.
[0022] The image recognition process here broadly means processing of recognizing image content. Examples of the image recognition process include the object detection process of detecting a region where an object exists, the object recognition process of recognizing what object an object appearing in an image is, the semantic segmentation process, and the abnormality detection process such as PatchCore. Here, the object detection process, as exemplified by You Only Look Once (YOLO) and Single Shot MultiBox Detector (SSD), includes not only detection of a region where an object exists but also processing of recognizing what the object is.
[0023] In the present embodiment, as the image recognition process using the AI model, it is assumed that an object detection process of the type that, like YOLO and SSD described above, not only detects the region where an object exists but also recognizes what the object is (hereinafter simply referred to as an “object detection process”) is performed.
[0024] Here, in a case where a service of presenting the analysis information about the subject based on the inference result to the user is assumed as described above, the application of the imaging device 2 can include, for example, applications such as monitoring of the inside such as a store, an office, a house, or the like, monitoring (including a traffic monitoring camera or the like) of the outside such as a parking lot, a town, or the like, monitoring of a manufacturing line in factory automation (FA) or industrial automation (IA), and monitoring of the inside and the outside of the vehicle.
[0025] For example, in a case where the camera is used as a monitoring camera for a store, one could arrange a plurality of imaging devices 2 at predetermined positions in the store so as to allow the user to check customer demographics (gender, age group, or the like), customer actions (movement) within the store, and the like. In this case, one could generate, as the above-described analysis information, information about the customer demographics, information about the movement in the store, information about a congestion status at a checkout (for example, information about a waiting time at the checkout), and the like. Alternatively, in a case where the camera is used as a traffic monitoring camera, one could arrange each imaging device 2 at a position near a road so as to allow the user to recognize information such as a license number (vehicle number), a vehicle color, a vehicle type, and the like regarding a passing vehicle, and in this case, one could generate, as the above-described analysis information, information such as the license number, the vehicle color, the vehicle type, and the like.
[0026] Furthermore, in a case where the camera is used as a traffic monitoring camera for a parking lot, one could arrange the imaging device 2 so as to be able to monitor each parked vehicle and to monitor whether or not a person is acting suspiciously around each vehicle. In a case where there is such a suspicious person, one could make a notification of the presence of the suspicious person, an attribute (gender, age group, clothes, or the like) of the suspicious person, and the like. Moreover, one could monitor a street or an available space in the parking lot and notify the user of the location of the available parking space or the like.
[0027] Here, in the present example, the object detection process for the captured image is performed by the imaging device 2. That is, an AI model for performing the object detection process is set in the imaging device 2, and information (hereinafter, also referred to as “inference result information”) indicating a result of the object detection process using the AI model is transmitted to the server device 1. The server device 1 performs the above-described various types of analysis processing on the basis of the inference result information transmitted from the imaging device 2 in this manner.
[0028] By adopting the method in which the imaging device 2 performs the inference process to transmit the inference result information to the server device 1 in this manner, it is possible to significantly reduce the amount of communication data in implementing the inference process as compared with a case where the server device 1 performs the inference process on the captured image transmitted from the imaging device 2. In addition, since it is not necessary to transmit the captured image from the imaging device 2 to the server device 1, it is possible to prevent the captured image including the personal information from being leaked to the outside, and it is possible to protect privacy.
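The reduction in communication data described above can be illustrated with a rough size comparison. This is a sketch under stated assumptions: the message fields (`device_id`, `detections`, `class`, `score`, `bbox`), the JSON encoding, and the 1920 x 1080 frame size are hypothetical and are not specified in this document, and the comparison is against an uncompressed RGB frame.

```python
import json


def raw_rgb_frame_bytes(width, height):
    # Uncompressed RGB frame: 3 bytes (R, G, B) per pixel.
    return width * height * 3


def encode_inference_result(device_id, detections):
    # Serialize only the detection results, not the pixels.
    message = {"device_id": device_id, "detections": detections}
    return json.dumps(message, separators=(",", ":")).encode("utf-8")


# Hypothetical inference result for one frame containing two people.
detections = [
    {"class": "person", "score": 0.91, "bbox": [120, 80, 260, 400]},
    {"class": "person", "score": 0.84, "bbox": [600, 95, 730, 410]},
]
message = encode_inference_result("camera-01", detections)
frame_size = raw_rgb_frame_bytes(1920, 1080)
print(len(message), "bytes of inference result vs", frame_size, "bytes of raw frame")
```

Because the message carries only class names, scores, and box coordinates, it also contains no pixel data from which a person could be identified visually, which corresponds to the privacy point made in the paragraph.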
[0029] In the present example, the AI model used by the imaging device 2 is deployed from the server device 1 to the imaging device 2. “Deploy” referred to here means processing of transmitting the AI model to the edge device so that the AI model can be used by the edge device. In the present example, the imaging device 2 is owned by the user, and the AI model can be deployed from the server device 1 to the imaging device 2 by the user purchasing the right to use the AI model by paying a fee to the service operator.
[0030] Furthermore, in the information processing system of the present example, the server device 1 performs the process of retraining the AI model used by the imaging device 2. The AI model used by the imaging device 2 (in the present example, the AI model purchased by the user) is managed by the server device 1, but the AI model is assumed to have been subjected to the basic training process for each application of the object detection process at the time of purchase by the user. For example, training is performed so that a person can be detected in the case of the application of detecting a person, and training is performed so that a product can be detected in the case of the application of detecting a product. Alternatively, in the case of the application of detecting a license plate of a vehicle, training is performed so that the license plate can be detected. The retraining process here means that the AI model (hereinafter referred to as “template model”) after the basic training is performed in this manner is trained again so as to adapt to the environment of the site 100. It is also assumed that the retraining process is performed once more on an AI model that has already been trained to adapt to the environment of the site 100. Specifically, it is assumed that the retraining process of “with class addition” described later is performed as this further training process.
[0031] The retraining process of the AI model is performed by, for example, disposing the imaging device 2 in the site 100 in an arrangement form similar to that at the time of operation before starting the service and using the captured image of the arranged imaging device 2 as the training image.
[0032] Note that, in the above description, an example is described in which a plurality of imaging devices 2 is provided, but in the information processing system as an embodiment, the number of imaging devices 2 may be one or more. Furthermore, in Fig. 1, the number of user terminals 3 in the information processing system is one, but a plurality of user terminals 3 may be provided. That is, it is assumed that a plurality of users receive the service provided by the information processing system.
[0033] Furthermore, in the above description, an example is described in which the object detection process using the AI model is performed by the imaging device 2, but it is not essential to perform the object detection process using the AI model by the imaging device 2. For example, one could adopt a configuration in which a computer device (information processing device) such as a fog server capable of communicating with each imaging device 2 and the server device 1 has an AI model for performing the object detection process, and the computer device performs the object detection process using the AI model on an image captured by the imaging device 2 (one or a plurality of imaging devices). In this case, the computer device transmits the inference result information about the object detection process to the server device 1. For example, in a case where the imaging device 2 is used for monitoring a store, the computer device such as the fog server is assumed to be arranged in the store, or in a case where the user is a company having a plurality of stores, the computer device is assumed to be arranged in a facility different from the store such as a data center managed by the company.
[0034] At this time, as viewed from the server device 1 (that is, the cloud) that manages the AI model, the imaging device 2 and the computer device such as the fog server can each be regarded as a device arranged at the edge. In the present specification, a device arranged at the edge as viewed from the server device 1 as described above is referred to as an “edge device”. In addition, the AI model used by the edge device for the object detection process is referred to as an “edge model”.
(1-2. Configuration example of imaging device)
[0035] Fig. 2 is a block diagram illustrating a configuration example of the imaging device 2. As illustrated, the imaging device 2 includes an imaging optical system 41, an image sensor 42, an optical system drive unit 43, a camera control unit 44, a memory unit 45, and a communication unit 46. The image sensor 42, the camera control unit 44, the memory unit 45, and the communication unit 46 are connected via a bus BS and can perform data communication with each other.
[0036] In the present example, the image sensor 42 is configured as a gradation image sensor that obtains the above-described gradation image. Specifically, the image sensor 42 is configured as a solid-state imaging element such as a charge coupled device (CCD) type, a complementary metal oxide semiconductor (CMOS) type, or the like, for example.
[0037] The imaging optical system 41 includes lenses such as a cover lens, a zoom lens, and a focus lens, and a diaphragm (iris) mechanism. Light (incident light) from a subject is guided by the imaging optical system 41 and condensed on a light receiving surface (imaging surface) of the image sensor 42.
[0038] The optical system drive unit 43 comprehensively represents drive units of the zoom lens, the focus lens, and the diaphragm mechanism included in the imaging optical system 41. Specifically, the optical system drive unit 43 includes an actuator for driving each of the zoom lens, the focus lens, and the diaphragm mechanism, and a drive circuit of the actuator.
[0039] The camera control unit 44 includes, for example, a microcomputer including a CPU, a ROM, and a RAM, and performs the overall control of the imaging device 2 by causing the CPU to perform various processes in accordance with a program stored in the ROM or a program loaded in the RAM.
[0040] Furthermore, the camera control unit 44 instructs the optical system drive unit 43 to drive the zoom lens, the focus lens, the diaphragm mechanism, and the like. The optical system drive unit 43 moves the focus lens and the zoom lens, opens or closes a diaphragm blade of the diaphragm mechanism, or the like in response to such a drive instruction.
[0041] Furthermore, the camera control unit 44 controls the writing and reading of various types of data to and from the memory unit 45. The memory unit 45 is a nonvolatile storage device such as a hard disk drive (HDD) or a flash memory device, for example, and is used for storing data used in a case where the camera control unit 44 executes various processes. Furthermore, the memory unit 45 can also be used as a storage destination (recording destination) of the image data output from the image sensor 42.
[0042] Furthermore, the camera control unit 44 performs various pieces of data communication with an external device via the communication unit 46. The communication unit 46 in the present example is configured to be able to perform communication via the network NT illustrated in Fig. 1 and to be able to perform data communication with an external device connected to the network NT, in particular, at least the server device 1 in the present example.
[0043] As illustrated in the drawing, the image sensor 42 includes an imaging unit 51, an image signal processing unit 52, an in-sensor control unit 53, an AI processing unit 54, a memory unit 55, and a communication interface (I / F) 56, which are connected via a bus 57 and can perform data communication with each other.
[0044] The imaging unit 51 includes a pixel array unit in which pixels each having a light receiving element (photoelectric conversion element) such as a photodiode are two-dimensionally arranged, and a reading circuit that reads an electric signal (light reception signal) obtained by photoelectric conversion from each pixel included in the pixel array unit. The reading circuit performs, for example, a correlated double sampling (CDS) process, an automatic gain control (AGC) process, and the like on the electric signal obtained by the photoelectric conversion and further performs an analog / digital (A / D) conversion process on the electric signal.
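The order of operations in the reading circuit (CDS, then AGC, then A / D conversion) can be sketched numerically as follows. This is an idealised illustration and not the circuit of the imaging unit 51: the function names, the reset-minus-signal polarity, the peak-based gain rule, the gain cap, and the 10-bit resolution are all assumptions made for the example.

```python
def correlated_double_sampling(reset_level, signal_level):
    # Difference of two samples from the same pixel; a shared offset cancels out.
    return reset_level - signal_level


def automatic_gain(values, target_peak, max_gain=8.0):
    # Pick one gain so the largest value reaches target_peak, capped at max_gain.
    peak = max(values)
    if peak <= 0:
        return max_gain
    return min(target_peak / peak, max_gain)


def analog_to_digital(value, full_scale, bits=10):
    # Clip to [0, full_scale] and map onto integer codes 0 .. 2**bits - 1.
    max_code = (1 << bits) - 1
    clipped = min(max(value, 0.0), full_scale)
    return int(clipped / full_scale * max_code + 0.5)


def read_out(samples, full_scale=1.0, bits=10):
    # samples: list of (reset_level, signal_level) pairs, one per pixel.
    signals = [correlated_double_sampling(r, s) for r, s in samples]
    gain = automatic_gain(signals, target_peak=full_scale)
    return [analog_to_digital(v * gain, full_scale, bits) for v in signals]


# Two hypothetical pixels: one half-exposed, one dark.
print(read_out([(1.0, 0.5), (1.0, 1.0)]))
```

The point of taking a difference in the CDS step is that any offset common to both samples of a pixel drops out before gain and quantisation are applied.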
[0045] In the imaging unit 51 of the present example, a color filter that selectively transmits light of any color of R, G, and B is formed for each pixel so that an RGB color image can be obtained as a captured image. In the present example, the array mode of the color filter in the imaging unit 51 is, for example, a Bayer array mode of RGGB. Note that the Bayer array is merely an example, and other modes such as RYYB and RGBW may be used as the array mode (mosaic mode) of the color filters.
[0046] The imaging unit 51 outputs a RAW image as a captured image. The RAW image referred to herein means a digital captured image immediately after A / D conversion of a signal read from the pixel array unit. The captured image as the RAW image output from the imaging unit 51 is input to the image signal processing unit 52.
[0047] The image signal processing unit 52 performs preprocessing, synchronization processing, YC generation processing, codec processing, and the like on a captured image as a RAW image. In the preprocessing, clamp processing for clamping a black level to a predetermined level, correction processing between R, G, and B color channels, and the like are performed. In the preprocessing, adjustment processing related to brightness such as gamma correction processing, and adjustment processing related to color such as white balance adjustment processing and linear matrix processing are performed. The linear matrix processing is processing of correcting a color reproduction error and is processing of performing a predetermined matrix operation on RGB to perform color correction suitable for a desired color space.
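The per-pixel adjustments named above can be sketched as small functions on a normalised RGB triple. This is illustrative only: the function names, the value range of 0.0 to 1.0, the order in which the steps are chained, the gain and matrix values, and the gamma of 2.2 are assumptions for the example and are not taken from this document.

```python
def clamp_black_level(rgb, black_level):
    # Subtract the black offset so that black sits at the predetermined level 0.
    return tuple(max(c - black_level, 0.0) for c in rgb)


def white_balance(rgb, gains):
    # Scale each colour channel by its own gain.
    return tuple(c * g for c, g in zip(rgb, gains))


def linear_matrix(rgb, matrix):
    # Predetermined 3x3 matrix operation on (R, G, B) for colour correction.
    return tuple(sum(m * c for m, c in zip(row, rgb)) for row in matrix)


def gamma_correct(rgb, gamma=2.2):
    # Clip to [0, 1] and apply the power-law brightness curve.
    return tuple(min(max(c, 0.0), 1.0) ** (1.0 / gamma) for c in rgb)


# Hypothetical settings; each matrix row sums to 1 so neutral grey is preserved.
GAINS = (1.8, 1.0, 1.5)
MATRIX = (
    (1.5, -0.3, -0.2),
    (-0.2, 1.4, -0.2),
    (-0.1, -0.4, 1.5),
)

pixel = (0.30, 0.45, 0.35)
pixel = clamp_black_level(pixel, 0.05)
pixel = white_balance(pixel, GAINS)
pixel = linear_matrix(pixel, MATRIX)
pixel = gamma_correct(pixel)
print(pixel)
```

Choosing matrix rows that each sum to 1 is a common way to keep grey pixels grey while shifting saturated colours toward the target colour space.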
[0048] In the synchronization processing, color separation processing is performed so that image data for each pixel has all the R, G, and B color components. For example, as in the present example, in a case of an imaging element using a color filter of Bayer array, demosaicing processing is performed as the color separation processing. In the YC generation processing, a luminance (Y) signal and a color (C) signal are generated (separated) from the image data of R, G, and B.
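As a rough illustration of the two steps in the preceding paragraph, the following sketch demosaics an RGGB Bayer mosaic and then separates luminance and colour-difference signals. It is a deliberately simple version under stated assumptions: each 2 x 2 cell shares one RGB value (real demosaicing interpolates between neighbouring cells), the image dimensions are even, and the luminance weights are the ITU-R BT.601 values, which this document does not specify. The function names are hypothetical.

```python
def demosaic_rggb(raw):
    # raw: 2-D list with even dimensions, laid out in 2x2 cells as
    #   R G
    #   G B
    height, width = len(raw), len(raw[0])
    rgb = [[None] * width for _ in range(height)]
    for y in range(0, height, 2):
        for x in range(0, width, 2):
            r = raw[y][x]
            g = (raw[y][x + 1] + raw[y + 1][x]) / 2  # average the two greens
            b = raw[y + 1][x + 1]
            # Give every pixel of the cell all three colour components.
            for dy in (0, 1):
                for dx in (0, 1):
                    rgb[y + dy][x + dx] = (r, g, b)
    return rgb


def rgb_to_yc(rgb):
    # Luminance (Y) and colour-difference (Cb, Cr) signals, BT.601 weights.
    r, g, b = rgb
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / 1.772
    cr = (r - y) / 1.402
    return y, cb, cr


raw = [
    [100, 60],
    [40, 20],
]
image = demosaic_rggb(raw)
print(image[0][0], rgb_to_yc(image[0][0]))
```

For a neutral pixel such as (255, 255, 255) the colour-difference signals come out at approximately zero, which is a quick sanity check on the separation.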
[0049] In the codec processing, for example, encoding processing for recording or communication and file generation are performed on the image data subjected to the various types of processing described above. In the codec processing, it is possible to generate a file in a format such as a moving picture experts group (MPEG)-2 or H.264 as a moving image file format. It is possible to generate a file in a format such as joint photographic experts group (JPEG), tagged image file format (TIFF), or graphics interchange format (GIF) as a still image file.
[0050] The in-sensor control unit 53 includes a microcomputer including, for example, a CPU, a ROM, a RAM, and the like, and integrally controls the operation of the image sensor 42. For example, the in-sensor control unit 53 performs execution control of the imaging operation by issuing an instruction to the imaging unit 51. In addition, the in-sensor control unit 53 controls execution of processing for the image signal processing unit 52.
[0051] The AI processing unit 54 includes a programmable arithmetic processing device such as a digital signal processor (DSP) or a field programmable gate array (FPGA), for example, and performs the inference process (AI processing) using an AI model on the captured image.
[0052] As understood from the above description, in the present example, the AI model used by the AI processing unit 54 is an AI model that performs the object detection process. Furthermore, as understood from the above description, the AI model used by the AI processing unit 54 for the object detection process is an AI model as an edge model.
[0053] The memory unit 55 is used to store data necessary for the AI processing unit 54 to perform the object detection process. Specifically, the memory unit 55 stores data of an AI model used by the AI processing unit 54 for the object detection process. Note that, for example, in a case where the AI model includes a neural network such as a convolutional neural network (CNN), the data of the AI model here corresponds to a parameter indicating a structure of the neural network, data of a parameter as a filter coefficient used in convolution processing, or the like.
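To make the two kinds of model data concrete, the sketch below stores a structure description and filter coefficients separately and applies the coefficients in a convolution. The dictionary layout, key names, and kernel values are hypothetical; the operation is the unflipped sliding-window sum that CNN frameworks commonly call convolution, with no padding and a stride of 1.

```python
# Hypothetical model data: structure parameters plus filter coefficients.
model_data = {
    "structure": [{"layer": "conv", "kernel_size": 2, "stride": 1}],
    "weights": [
        [[1, 0],
         [0, -1]],
    ],
}


def conv2d_valid(image, kernel):
    # Slide the kernel over the image and sum the element-wise products.
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
feature_map = conv2d_valid(image, model_data["weights"][0])
print(feature_map)
```

Replacing the contents of `weights` while keeping `structure` unchanged changes what the layer responds to without changing its shape, which is one way to picture updating a model's parameters through retraining.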
[0054] The communication interface (I / F) 56 is an interface that communicates with each unit connected via the bus BS, such as the camera control unit 44 and the memory unit 45 outside the image sensor 42. For example, the communication interface 56 performs communication for acquiring an AI model and the like used by the AI processing unit 54 from the outside on the basis of the control of the in-sensor control unit 53. Furthermore, result information (inference result information) and the like of the object detection process by the AI processing unit 54 can be output to the outside of the image sensor 42 via the communication interface 56.
(1-3. Configuration example of information processing device)
[0055] Fig. 3 is a block diagram illustrating a hardware configuration example of the server device 1. Note that a computer device as the user terminal 3 illustrated in Fig. 1 may also adopt a hardware configuration similar to that illustrated in Fig. 3.
[0056] As illustrated, the server device 1 includes a processor 11. The processor 11 includes at least a CPU and executes various processes according to a program stored in the ROM 12 or a program loaded from the storage unit 19 to a RAM 13. Here, in the server device 1 of the present example, the processor 11 includes a graphics processing unit (GPU) in addition to a CPU in order to perform various types of image signal processes related to the retraining process of the AI model.
[0057] The RAM 13 appropriately stores data and the like necessary for the processor 11 to execute various types of processing. The processor 11, the ROM 12, and the RAM 13 are connected to one another via a bus 14. An input / output interface (I / F) 15 is connected to the bus 14.
[0058] An input unit 16 including an operation element or an operation device is connected to the input / output interface 15. For example, as the input unit 16, various types of operation elements and operation devices such as a keyboard, a mouse, a key, a dial, a touch panel, a touch pad, and a remote controller are assumed. A user's operation is detected by the input unit 16, and a signal corresponding to the input operation is interpreted by the processor 11.
[0059] Furthermore, a display unit 17 including a liquid crystal display (LCD), an organic electro-luminescence (EL) panel, or the like, and an audio output unit 18 including a speaker or the like are integrally or separately connected to the input / output interface 15. The display unit 17 is used for displaying various types of information, and includes, for example, a display device provided in a housing of a computer device, a separate display device connected to the computer device, or the like.
[0060] The display unit 17 executes display of an image for various types of image processing, a moving image to be processed, and the like on a display screen on the basis of an instruction from the processor 11. In addition, the display unit 17 displays various operation menus, icons, messages, and the like, that is, as a graphical user interface (GUI), on the basis of an instruction from the processor 11.
[0061] In some cases, the storage unit 19 including an HDD, a solid-state memory, or the like, and the communication unit 20 including a modem or the like are connected to the input / output interface 15.
[0062] The communication unit 20 performs communication processing via a transmission path such as the Internet and performs wired / wireless communication with various devices and communication based on bus communication or the like.
[0063] Furthermore, a drive 21 is connected to the input / output interface 15 as necessary, and a removable recording medium 22, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted appropriately.
[0064] Data (including a computer program or the like) used for each process can be read from the removable recording medium 22 by the drive 21. The read data is stored in the storage unit 19, and in a case where the read data is image data or audio data, an image or a voice is output by the display unit 17 or the audio output unit 18. Furthermore, a computer program or the like read from the removable recording medium 22 is installed in the storage unit 19 as necessary.
[0065] In the server device 1 having the hardware configuration as described above, for example, software for the processing of the present embodiment can be installed via network communication by the communication unit 20 or the removable recording medium 22. Alternatively, the software may be stored in advance in the ROM 12, the storage unit 19, or the like. The processor 11 performs processing operations on the basis of various programs, thereby executing information processing and communication processing necessary for the server device 1.
[0066] Note that the server device 1 is not limited to a single computer device as illustrated in Fig. 3 and may be configured by systematizing a plurality of computer devices. The plurality of computer devices may be systematized by a local area network (LAN) or the like or may be arranged in a remote place by a virtual private network (VPN) or the like using the Internet or the like. The plurality of computer devices may include a computer device as a server group (cloud) that can be used by a cloud computing service.
[0067] <2. Screen transition example related to retraining> As described above, in the information processing system according to the embodiment, the server device 1 retrains the AI model as the edge model held by the user. First, a transition example of the screens presented to the user regarding such a retraining process will be described. Note that the various screens described below are displayed on a display unit of the user terminal 3 (corresponding to the display unit 17 in Fig. 3) that displays various types of information for the user. These screens are displayed on the display unit of the user terminal 3 on the basis of control by the server device 1 (control by the processor 11 in the present example).
[0068] Fig. 4 illustrates an example of a service Top screen G1 which is a Top screen for a service provided to a user in the information processing system. Although illustration is omitted, the user is required to perform a login operation accompanied by input of account information in a case of accessing the service Top screen.
[0069] Various buttons related to services that can be provided in the information processing system are arranged on the service Top screen G1. Specifically, in the present example, at least a device management button B1, an edge model management button B2, and a deployment management button B3 are arranged as illustrated in the drawing. The device management button B1 is a button for calling a device management screen displaying various types of management information about the imaging device 2 (edge device) held by the user.
[0070] The edge model management button B2 is a button for calling a management screen of an edge model (an edge model management Top screen G2 described below) that displays various types of management information about the edge model held by the user. Furthermore, the deployment management button B3 is a button for calling a screen (a deployment management screen G6 as described later) for performing an operation of deploying an edge model to an edge device (the imaging device 2 in the present example).
[0071] Fig. 5 illustrates an example of an edge model management Top screen G2 displayed in response to the operation of the edge model management button B2. The edge model management Top screen G2 is a screen on which executed job information about a job executable for an edge model held by the user is displayed. In the present example, there are three types of executable jobs for the edge model: “retraining”, “parameter tuning”, and “performance evaluation”. Note that “parameter tuning” means tuning a parameter of an edge model, specifically, a parameter such as a threshold value for likelihood (reliability score calculated for each class) used for determination for obtaining a final class identification result.
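As a minimal sketch of the kind of parameter that “parameter tuning” adjusts, the following shows a likelihood threshold gating the final class identification result. The function name `identify_class`, the class names, and the threshold values are hypothetical and are not taken from the present description; how the edge model actually applies the threshold is not specified here.

```python
from typing import Dict, Optional


def identify_class(likelihoods: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the class with the highest likelihood, or None if it is below the threshold."""
    if not likelihoods:
        return None
    best_class = max(likelihoods, key=lambda name: likelihoods[name])
    return best_class if likelihoods[best_class] >= threshold else None


# Hypothetical per-class likelihoods for one detected object.
scores = {"bottle": 0.62, "can": 0.30, "box": 0.08}
accepted = identify_class(scores, threshold=0.5)   # 0.62 >= 0.5, so the class is accepted
rejected = identify_class(scores, threshold=0.7)   # 0.62 < 0.7, so no class is output
print(accepted, rejected)
```

Raising the threshold in this sketch trades fewer erroneous identifications for more objects left without a class, which is the kind of balance such tuning would be expected to adjust.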
[0072] On the edge model management Top screen G2, as the job information, information about the job name, information about the target model, and information about the process are displayed as illustrated in the drawing. The information about the target model means identification information about an edge model that is a job execution target. The information about the process is information indicating the progress of the job. Here, since a case where only job information about the job executed in the past is displayed is illustrated, all pieces of information about the process are “executed” as illustrated. As illustrated, a result display button B4 is displayed for the executed job. The result display button B4 is a button for instructing execution of performance evaluation of the target model and display of the evaluation result.
[0073] On the edge model management Top screen G2, a job addition button B5 is provided, and the user can add a new job by operating the job addition button B5.
[0074] Fig. 6 illustrates an example of a job setting screen G3 displayed in response to the operation of the job addition button B5. The job setting screen G3 is a screen for setting a job name, a target model, and a job type for a job to be added, and as illustrated in the drawing, an input box bi for inputting a job name, an input box bi for inputting a target model, and a check box cb for selecting a job type are arranged. The input box bi of the job name is a box to which any text information can be input. In addition, in the present example, the input box bi of the target model is a box in which an edge model can be selected and input by a pull-down method. The selection candidate is an edge model held by the user.
[0075] The job type can be selected from the items of “retraining”, “parameter tuning”, and “performance evaluation” described above, and a check box cb is arranged for each of these items.
[0076] In a case where the user instructs the selected target model to execute a job according to the selected job type, the user operates a next button B7 provided on the job setting screen G3. In addition, a return button B6 is arranged on the job setting screen G3, and the user can return the screen to the edge model management Top screen G2 illustrated in Fig. 5 by operating the return button B6.
[0077] Although not illustrated, in a case where “performance evaluation” is selected as the job type and the next button B7 is operated, the screen transitions to the edge model management Top screen G2 illustrated in Fig. 5, and the corresponding job information is displayed on the edge model management Top screen G2 together with process information indicating “in execution”. The process information is updated to “executed” upon completion of the process, and the result display button B4 is displayed for the job information. By operating the result display button B4, the user can display a result screen of the performance evaluation for the selected edge model (details will be described with reference to Figs. 9 and 10).
[0078] Fig. 7 illustrates an example of an upload / training type setting screen G4 displayed in a case where “retraining” is selected on the job setting screen G3 and the next button B7 is operated. The upload / training type setting screen G4 is a screen for setting image data to be uploaded and setting (selecting) the type of retraining in a case where the training image to be used for retraining of the edge model is uploaded to the server device 1.
[0079] As described above, in the present example, a captured image by each imaging device 2 installed in the site 100 is used as the training image. For example, the user stores a captured image (captured image group) to be a training image by each imaging device 2 in the user terminal 3, designates the stored captured image as an upload image, and causes the user terminal 3 to transmit the image to the server device 1.
[0080] Furthermore, in the present example, two types of “with class addition” and “without class addition” are set as the type of retraining. “With class addition” means retraining for adding a function of identifying an object of a new class to the edge model. “Without class addition” means retraining for adapting to a so-called domain change such as a change in a background or a change in an illumination condition at the site 100, for example, without adding a function of identifying an object of such a new class.
[0081] As illustrated in the drawing, on the upload / training type setting screen G4, a select button B8 and a display box di are provided, and a check box cb for selecting the retraining type is arranged. In a case where the select button B8 is operated, directory information about the data file stored in the user terminal 3 is displayed in the display box di. The user can designate a directory of a data file to be uploaded (that is, a data file as a captured image group captured by the imaging device 2) on the basis of the directory information displayed in this manner. Furthermore, the user can select (set) the type of retraining to be executed by operating the check box cb.
[0082] On the upload / training type setting screen G4, a return button B9 and an execution button B10 are arranged. The user can return the screen to the job setting screen G3 illustrated in Fig. 6 by operating the return button B9.
[0083] In a case where the execution button B10 is operated, a data file (captured image group) of the directory designated in the display box di is uploaded from the user terminal 3 to the server device 1. Then, in the server device 1, the edge model selected on the preceding job setting screen G3 is retrained using the uploaded captured image group as a retraining image group. Details of the retraining processes of “with class addition” and “without class addition” will be described later.
[0084] Note that the screen on which the setting and the execution instruction for uploading the training image are given and the screen on which the type of retraining is set and the execution instruction for retraining is given could be provided as separate screens.
[0085] In a case where the retraining process is started, the edge model management Top screen G2 illustrated in Fig. 8 is displayed on the display unit of the user terminal 3. That is, the edge model management Top screen G2 displays the job information indicating that the added retraining job is being executed. As in the job of “performance evaluation” described above, information indicating “in execution” is displayed as process information for the job being executed. Also in this case, the process information is updated to “executed” according to the completion of the process, and the result display button B4 is displayed for the job information. By operating the result display button B4, the user can instruct the execution of the performance evaluation of the retrained edge model and the display of an evaluation result screen G5 indicating the result of the performance evaluation.
[0086] Figs. 9 and 10 are explanatory diagrams of the evaluation result screen G5. As the performance evaluation process of the edge model, the server device 1 of the present example performs a process of calculating evaluation information by numerical values such as accuracy (accuracy rate), precision (precision rate), and recall (reproduction rate). The evaluation result screen G5 is basically a screen displaying such evaluation information by numerical values, but the evaluation result screen G5 of the present example is configured as a screen capable of displaying image preview information visually indicating the detection situation of the object (in the present example, the detection situation of the bounding box) by an image together with the display of the evaluation information by numerical values.
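The numerical evaluation information named above can be illustrated with the standard definitions of accuracy, precision, and recall. The following is a minimal sketch that assumes the counts of true positives, false positives, false negatives, and true negatives are already available; how detections are matched against the correct answer information to obtain these counts is not stated in the present description, and the function name `evaluation_metrics` and the sample counts are hypothetical.

```python
from typing import Dict


def evaluation_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, and recall from confusion counts.

    A ratio whose denominator is zero is reported as 0.0 (a choice made for this sketch).
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts for one class over a set of verification images.
metrics = evaluation_metrics(tp=80, fp=20, fn=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```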
[0087] On the evaluation result screen G5, a result display region Ae for displaying the evaluation result is provided, and an image preview tab T1 and a numerical information tab T2 for selecting display of image preview information and display of evaluation information by numerical values, respectively, are provided. Fig. 9 illustrates a state in which the image preview tab T1 is selected and the image preview information is displayed, and Fig. 10 illustrates a state in which the numerical information tab T2 is selected and the numerical value evaluation information is displayed. As illustrated in Fig. 9, in the present example, the image preview information is displayed as information indicating correct answer information (Ground Truth) and detection information (Predicted) for the bounding box in comparison for each image.
[0088] On the evaluation result screen G5, a return button B11 and an export button B12 are provided. The user can return the screen to the edge model management Top screen G2 by operating the return button B11 and can export the evaluation information (at least the evaluation information by numerical values) in a predetermined file format by operating the export button B12.
[0089] Fig. 11 illustrates an example of a deployment management screen G6 displayed in response to the operation of the deployment management button B3 on the service Top screen G1. As illustrated in the drawing, on the deployment management screen G6, the input box bi for selecting and inputting an edge model to be deployed, the input box bi for selecting and inputting an edge device to be deployed, and a return button B13 and a deploy button B14 are provided.
[0090] In the present example, the input box bi for selecting and inputting an edge model and the input box bi for selecting and inputting an edge device to be deployed are pull-down type selection input boxes. A list of edge models owned by the user is displayed in the former input box bi, and the user can select an edge model to be deployed from the list. In addition, a list of edge devices owned by the user is displayed in the latter input box bi, and the user can select an edge device to be deployed from the list.
[0091] On the deployment management screen G6, the user can return the screen to the service Top screen G1 by operating the return button B13. In addition, by operating the deploy button B14, the user can instruct the server device 1 to deploy the edge model designated by the selection input operation using each of the input boxes bi to the edge device designated.
[0092] <3. Retraining method as embodiment> Next, a retraining method as an embodiment will be described. Fig. 12 is a functional block diagram illustrating various functions related to the retraining method as the embodiment, the functions being included in the processor 11 of the server device 1. Specifically, here, the function related to the retraining of “with class addition” described above will be described.
[0093] As illustrated, the processor 11 has functions as an annotation processing unit F1, a reception processing unit F2, a large scale model retraining processing unit F3, an edge model retraining processing unit F4, an evaluation processing unit F5, and a deployment processing unit F6.
[0094] The annotation processing unit F1 performs the annotation process on the input image using the large scale AI model. The large scale AI model here means an AI model that uses more resources than the AI model (edge model) deployed in the edge device and has higher inference performance.
[0095] The reception processing unit F2 receives correction by a user of the annotation result by the annotation processing unit F1.
[0096] The large scale model retraining processing unit F3 retrains the large scale AI model using the annotation information corrected by the user.
[0097] The edge model retraining processing unit F4 retrains the edge model by knowledge distillation with the large scale AI model after retraining as a teacher model.
[0098] The evaluation processing unit F5 performs the performance evaluation process for the edge model retrained by the edge model retraining processing unit F4. Specifically, in the present example, the processing of calculating the evaluation information by numerical values as exemplified above in Fig. 10 is performed for the retrained edge model.
[0099] The deployment processing unit F6 deploys the edge model retrained by the edge model retraining processing unit F4 to the edge device. The deployment processing unit F6 in the present example performs a process of deploying the edge model designated (selected) on the deployment management screen G6 illustrated in Fig. 11 above to the edge device (in the present example, the imaging device 2) designated (selected) on the deployment management screen G6. That is, the deployment processing unit F6 of the present example deploys the edge model retrained by the edge model retraining processing unit F4 to the edge device in accordance with the designation of the edge model on the deployment management screen G6.
[0100] Details of the retraining process in the case of “with class addition” for the edge model, from the annotation processing unit F1 to the edge model retraining processing unit F4, will be described with reference to Fig. 13. In the figure, the retraining image group represents a plurality of training images uploaded by the user to the server device 1. In the present example, the retraining image group corresponds to the captured images of each imaging device 2 set as an upload target on the upload / training type setting screen G4 illustrated in Fig. 7. Note that the retraining image group used for retraining of “with class addition” is premised on including an image in which an object of the class to be added appears as a subject.
[0101] First, as illustrated in A of Fig. 13, the annotation processing unit F1 performs the annotation process on the retraining image group using the large scale AI model. Here, the large scale AI model used is one that performs the object detection process and is configured to output region information indicating an object detection region even for an object whose class is unidentifiable. Hereinafter, the large scale AI model satisfying such a condition is referred to as a “large scale AI model ML”. In the present example, it is assumed that the large scale AI model ML has been trained so that class identification can be performed for objects of all classes to be identified by the edge model to be retrained. Here, for the sake of description, a case where the edge model to be retrained is an AI model capable of identifying various products as classes will be described. Alternatively, the edge model may be an AI model or the like that can identify various persons (age, sex, etc.).
[0102] In the annotation processing, the annotation information about each training image is generated using the information about the class identification result by the large scale AI model ML. The annotation information for each image includes information (for example, information indicating the position and the range of the bounding box) indicating a region of the detected object and information indicating an identified class name.
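As an illustration of the annotation information described above, the following is a minimal sketch of one possible in-memory representation: per image, a list of detected objects, each holding a bounding box (position and range) and an identified class name. The type names, field names, pixel-based coordinates, file name, and class names are all hypothetical; the present description does not specify a data format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BoundingBox:
    x: float        # left edge of the box (assumed to be in pixels)
    y: float        # top edge of the box (assumed to be in pixels)
    width: float    # horizontal range of the box
    height: float   # vertical range of the box


@dataclass
class ObjectAnnotation:
    box: BoundingBox    # region of the detected object
    class_name: str     # class name identified for the object


@dataclass
class ImageAnnotation:
    image_id: str
    objects: List[ObjectAnnotation] = field(default_factory=list)


annotation = ImageAnnotation(
    image_id="shelf_0001.jpg",
    objects=[
        ObjectAnnotation(BoundingBox(40, 60, 120, 200), "bottle"),
        ObjectAnnotation(BoundingBox(220, 80, 90, 150), "can"),
    ],
)
class_names = [obj.class_name for obj in annotation.objects]
print(class_names)
```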
[0103] Next, the reception processing unit F2 receives correction by the user of the annotation information.
[0104] Fig. 14 illustrates an example of a correction reception screen G7 displayed by the reception processing unit F2 to receive correction by the user of the annotation information. In the present example, the correction reception screen G7 is displayed on the display unit of the user terminal 3 in response to the completion of the annotation processing by the annotation processing unit F1 after the execution instruction of the retraining of “with class addition” for the edge model is given.
[0105] As illustrated in the drawing, the correction reception screen G7 is provided with a target image display region Ap for displaying an image to be processed, and a candidate image display region Ac for displaying a list of thumbnail images of images to be processed, that is, images as a training image group. In a case where an operation of selecting one image from the candidate image display region Ac is performed, the selected image is enlarged and displayed in the target image display region Ap.
[0106] In the target image display region Ap, annotation information for an image selected from the candidate image display region Ac is displayed. Specifically, a bounding box Bb of an object detected in the image and a class name display box Bc indicating a class name of the object are displayed.
[0107] Fig. 14 illustrates an example of annotation information in a case where the large scale AI model ML cannot perform class identification for one object among the objects detected in the selected image. The large scale AI model ML of the present example is configured to output information about the bounding box Bb for the object whose class is unidentifiable, and output information indicating that the class name is unknown, such as “TargetX” illustrated in the figure, as the information about the class name.
[0108] The user corrects the information about the class name for the annotation information in which the information of “unknown” is indicated as the class name. Specifically, in the present example, as illustrated in Fig. 15, an operation of inputting information (text information) of a correct class name is performed in the corresponding class name display box Bc.
[0109] In a case where the class name correction operation is performed in this way, the reception processing unit F2 associates the corrected class name with the corresponding bounding box Bb as its class name information. As a result, the annotation information is corrected.
[0110] In the above description, the case of correcting the information about the class name of the object whose class is unidentifiable has been exemplified, but in a case where the large scale AI model ML performs the object detection process on the object of the new class, the object could be erroneously identified as the object of the existing class. That is, in this case, information of an erroneous class name is indicated as the annotation information for the object. The user is requested to correct the information about the class name erroneously attached as described above as the correction work of the annotation information on the correction reception screen G7.
[0111] Here, since the retraining image group consists of a large number of images, it is not desirable to have the user correct all of the class names that were unidentifiable and all of the class names that were erroneously recognized as described above, because this imposes a large burden on the user.
[0112] Therefore, the reception processing unit F2 in the present example has the following functions. That is, in a case where the correction by the user of the class name is received, the reception processing unit F2 in the present example performs an automatic setting process of setting the class name of the object whose correction has been received as the class name of another object having a similar feature amount to the object. Here, the feature amount means a value indicating the feature of the object, the value being calculated by the inference process of the large scale AI model ML. In addition, the fact that the feature amount is similar means that an error of the feature amount is within a predetermined range.
[0113] In this case, the correction work of the user is assumed to be performed on only some of the objects whose class names were unidentifiable or erroneously recognized, for example, about ten to several tens of objects. The reception processing unit F2 performs the automatic setting process using the class name correction results for these objects.
[0114] In the present example, as a result of performing such an automatic setting process by the reception processing unit F2, the corrected annotation information in which the information about the correct class name is associated with the object of the new class is obtained.
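The automatic setting process described above can be sketched as follows. This is a minimal illustration under several assumptions that are not stated in the present description: the feature amount is treated as a numeric vector, the "error" between feature amounts is taken to be the Euclidean distance, the predetermined range is a single value `max_error`, and when several user-corrected objects are within range the nearest one is used. All names and values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DetectedObject:
    feature: List[float]            # feature amount (assumed here to be a vector)
    class_name: Optional[str]       # None stands for an unidentified class
    user_corrected: bool = False    # True if the user set class_name on the correction screen


def auto_set_class_names(objects: List[DetectedObject], max_error: float) -> int:
    """Copy user-corrected class names to uncorrected objects with similar features.

    Returns the number of objects whose class name was changed.
    """
    references = [obj for obj in objects if obj.user_corrected]
    changed = 0
    for obj in objects:
        if obj.user_corrected:
            continue
        nearest = None
        nearest_error = math.inf
        for ref in references:
            error = math.dist(obj.feature, ref.feature)  # Euclidean distance (assumption)
            if error <= max_error and error < nearest_error:
                nearest, nearest_error = ref, error
        if nearest is not None and obj.class_name != nearest.class_name:
            obj.class_name = nearest.class_name
            changed += 1
    return changed


objects = [
    DetectedObject([0.0, 0.0], "new_snack", user_corrected=True),
    DetectedObject([0.1, 0.0], None),        # similar to the corrected object
    DetectedObject([5.0, 5.0], "bottle"),    # dissimilar, keeps its class name
]
changed = auto_set_class_names(objects, max_error=0.5)
print(changed, [obj.class_name for obj in objects])
```

In this sketch, an object erroneously identified as an existing class is overwritten in the same way as an unidentified one, provided its feature amount is within range of a user-corrected object.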
[0115] In response to acquisition of such corrected annotation information, as illustrated in B of Fig. 13, the large scale AI model ML is retrained by the large scale model retraining processing unit F3. The retraining process of the large scale AI model ML is performed as a process of retraining the large scale AI model ML by supervised training using the retraining image group as the training input image and the corrected annotation information as the teacher data. With this retraining process, it is possible to obtain an AI model capable of class identifying an object of a new class as the large scale AI model ML. The large scale AI model ML trained by the retraining process by the large scale model retraining processing unit F3 will be hereinafter referred to as a “retrained large scale AI model MLd”.
[0116] Next, as illustrated in C of Fig. 13, the edge model is retrained by the edge model retraining processing unit F4. Here, the edge model (hereinafter referred to as an “edge model ME”) selected as the “target model” (here, the retraining target model) on the job setting screen G3 of Fig. 6 is retrained by knowledge distillation with the retrained large scale AI model MLd as a teacher model. Specifically, the edge model retraining processing unit F4 includes an error transmission unit F41, and the error transmission unit F41 calculates an error between the output of the retrained large scale AI model MLd in a case where a training image is given as an input and the output of the edge model ME in a case where the same training image is given as an input. Then, the calculated error is transmitted to the edge model ME, and the edge model ME is retrained. The “output” of the AI model here means information about the likelihood for each class. That is, the above description is an example in which the error between the likelihoods for each class (so-called “soft target loss”) is transmitted as the knowledge distillation. Note that, in the knowledge distillation, it is also possible to transmit an error (so-called “hard target loss”) between the correct data obtained as the annotation result and the class identification result (information indicating the class as the inference result).
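The two error terms named above can be illustrated numerically. The following is a minimal sketch, not the processing of the error transmission unit F41 itself: it assumes that each model outputs raw scores that are converted to per-class likelihoods by a softmax, that the soft target loss is the Kullback-Leibler divergence from the teacher likelihoods to the student likelihoods, and that the hard target loss is the cross-entropy against the annotated class. The temperature and the weighting factor `alpha` are common conventions in knowledge distillation and are assumptions here; none of these choices is specified in the present description.

```python
import math
from typing import List


def softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Convert raw scores to per-class likelihoods (numerically stable)."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(teacher: List[float], student: List[float],
                     temperature: float = 1.0) -> float:
    """KL divergence between teacher and student likelihoods (assumed form of the error)."""
    p = softmax(teacher, temperature)
    q = softmax(student, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def hard_target_loss(student: List[float], correct_index: int) -> float:
    """Cross-entropy between the student likelihoods and the annotated class."""
    return -math.log(softmax(student)[correct_index])


def distillation_loss(teacher: List[float], student: List[float], correct_index: int,
                      alpha: float = 0.5, temperature: float = 1.0) -> float:
    """Weighted sum of the soft and hard target losses (the weighting is an assumption)."""
    soft = soft_target_loss(teacher, student, temperature)
    hard = hard_target_loss(student, correct_index)
    return alpha * soft + (1.0 - alpha) * hard


teacher_scores = [2.0, 1.0, 0.1]   # hypothetical output of the teacher model for one object
student_scores = [0.1, 1.0, 2.0]   # hypothetical output of the edge model for the same object
print(soft_target_loss(teacher_scores, student_scores))
print(distillation_loss(teacher_scores, student_scores, correct_index=0))
```

In an actual training framework, the gradient of such a loss with respect to the edge model parameters would be what is propagated back to the edge model.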
[0117] By performing the retraining process by the edge model retraining processing unit F4 as described above, it is possible to obtain an edge model capable of identifying a new class as the edge model ME.
[0118] Although the retraining method in the case of “with class addition” has been described above, in a case where execution of the retraining process of “without class addition”, in other words, a retraining process that does not require addition of an identifiable class, is instructed, the edge model retraining processing unit F4 retrains the edge model by machine learning using the annotation result by the annotation processing unit F1 as the teacher data.
[0119] Fig. 16 is an explanatory diagram of the method of retraining the edge model ME in the case of “without class addition”. As illustrated in A of Fig. 16, even in a case of “without class addition”, the annotation processing unit F1 executes the annotation processing on the retraining image group using the large scale AI model ML, and obtains the annotation information.
[0120] Then, in the case of “without class addition”, as illustrated in B of Fig. 16, the edge model retraining processing unit F4 retrains the edge model ME by machine learning with the retraining image group as the training input image and the annotation information as the teacher data. As a result, it is possible to retrain the edge model ME so as to be able to adapt to a so-called domain change such as the change in the background and the change in the illumination condition described above.
[0121] Here, as described above with reference to Figs. 5 and 8, in the information processing system of the present example, the performance evaluation process of the retrained edge model ME is performed according to the operation of the result display button B4 on the edge model management Top screen G2. This performance evaluation process is performed by the evaluation processing unit F5 described above. In the performance evaluation process, the target edge model is caused to execute the inference process using a verification image (for example, an image selected from the retraining image group that has not been used for retraining) as an input, and the performance is evaluated on the basis of the output. At this time, an AI processor for operating the edge model is required; as the AI processor, a virtual processor as a simulator may be used, or a processor as an actual machine may be used. Here, the “actual machine” means a processor that physically exists.
[0122] The evaluation processing unit F5 in the present example includes a processor as an actual machine as an AI processor used for the performance evaluation process of an edge model, that is, an AI processor capable of executing the object detection process using an edge model. By performing a performance evaluation using an AI processor as an actual machine, it is possible to perform a performance evaluation with higher accuracy than that in a case where the performance evaluation is performed using a virtual AI processor as a simulator.
[0123] Here, the AI processor as an actual machine used for the performance evaluation process by the evaluation processing unit F5 may be provided in the server device 1 or may be provided in an external device capable of communicating with the server device 1.
[0124] <4. Processing procedure> Fig. 17 is a flowchart illustrating a specific processing procedure example to be executed by the server device 1 in order to realize the retraining method as the embodiment described above. In the present example, the processing illustrated in Fig. 17 is executed by the processor 11 on the basis of a program stored in a predetermined storage device such as the ROM 12 or the storage unit 19.
[0125] In Fig. 17, in step S101, the processor 11 waits for an edge model retraining instruction. Specifically, the processor 11 waits until the execution button B10 is operated on the upload / training type setting screen G4 (Fig. 7).
[0126] In a case where, in step S101, the execution button B10 is operated and it is determined that a retraining instruction of the edge model is given, the processor 11 advances the process to step S102 and determines whether or not the type is “with class addition”. That is, in the present example, it is determined whether or not the execution button B10 has been operated in a state where the check box cb of “with class addition” is filled in on the upload / training type setting screen G4.
[0127] In a case where it is determined in step S102 that the type is “with class addition”, the processor 11 executes processing in steps S103 to S107. First, in step S103, the processor 11 executes the annotation process on the retraining image group using the large scale AI model ML, and in subsequent step S104, executes a correction reception process. That is, the correction reception screen illustrated in Fig. 14 is displayed on the display unit of the user terminal 3, and the correction by the user of the annotation information obtained by the annotation processing is received. As described above, in the present example, the correction by the user of the annotation information is performed only for some objects that need to be corrected.
[0128] In step S105 subsequent to step S104, the processor 11 executes an automatic setting process of the class name based on the correction result. That is, as described above, the class name set by the user for a corrected object is also set as the class name of another object having a feature amount similar to that of the corrected object.
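The automatic setting process of step S105 can be sketched as follows, under stated assumptions. The embodiment does not specify here how feature similarity is measured, so cosine similarity with a threshold of 0.9 is used purely as an illustrative choice, and the names `cosine_similarity` and `propagate_class_names` are hypothetical. For simplicity, the sketch fills in only objects that still lack a class name.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def propagate_class_names(
    features: Dict[str, Sequence[float]],
    class_names: Dict[str, Optional[str]],
    corrected: Dict[str, str],
    threshold: float = 0.9,  # assumed similarity threshold
) -> Dict[str, Optional[str]]:
    """Copy each user-corrected class name to unlabeled objects with similar features."""
    result = dict(class_names)
    result.update(corrected)  # the user's corrections always take effect
    for obj_id, feat in features.items():
        if result.get(obj_id) is not None:
            continue  # only fill objects that still lack a class name
        best_name, best_sim = None, threshold
        for src_id, name in corrected.items():
            sim = cosine_similarity(feat, features[src_id])
            if sim >= best_sim:
                best_name, best_sim = name, sim
        result[obj_id] = best_name
    return result


features = {"a": [1.0, 0.0], "b": [0.99, 0.05], "c": [0.0, 1.0]}
class_names = {"a": None, "b": None, "c": None}
labels = propagate_class_names(features, class_names, corrected={"a": "forklift"})
```

In this example the user corrects only object `a`; object `b` has a nearly identical feature vector and receives the same class name, while object `c` remains unlabeled.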
[0129] In step S106 subsequent to step S105, the processor 11 retrains the large scale AI model ML with the corrected annotation information as teacher data. That is, the large scale AI model ML is retrained with the corrected annotation information which is the annotation information reflecting the correction by the user of the class name received in step S104 and the automatic setting of the class name in the automatic setting process in step S105 as the teacher data and the retraining image group as the training input image. As a result, the retrained large scale AI model MLd is obtained.
[0130] In step S107 subsequent to step S106, the processor 11 retrains the selected edge model ME by knowledge distillation with the retrained large scale AI model MLd as a teacher model. That is, the edge model ME is retrained by the method described above with reference to C of Fig. 13.
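For readers unfamiliar with knowledge distillation, the following minimal sketch shows one commonly used form of distillation loss: the Kullback-Leibler divergence between temperature-softened class distributions of the teacher and the student. This is a generic illustration and not necessarily the method of C of Fig. 13. It covers only class scores (not detection regions), and the function names and the temperature value of 2.0 are assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,  # assumed value; higher values soften both distributions
) -> float:
    """KL(teacher || student) on softened distributions, scaled by temperature squared."""
    p = softmax(teacher_logits, temperature)  # teacher (e.g. the retrained model MLd)
    q = softmax(student_logits, temperature)  # student (e.g. the edge model ME)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return kl * temperature ** 2


same = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
different = distillation_loss([0.0, 0.0, 0.0], [2.0, 0.5, -1.0])
```

The loss is zero when the student reproduces the teacher's output and grows as the two outputs diverge, so minimizing it moves the student toward the teacher, including for classes the teacher has newly learned.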
[0131] In addition, in a case where it is determined that the type is not “with class addition” in step S102 described above, the processor 11 advances the processing to step S108. In step S108, the processor 11 executes an annotation process on the retraining image group using the large scale AI model ML. Then, in subsequent step S109, the processor 11 retrains the edge model ME with the annotation information as teacher data. That is, the edge model ME is trained by machine training with the annotation information obtained by the annotation processing in step S108 as teacher data and the retraining image group as the training input image.
[0132] The processor 11 ends the series of processes illustrated in Fig. 17 in response to execution of either of the processes of steps S107 and S109.
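The branching of Fig. 17 described above can be summarized in the following minimal sketch. The step identifiers are taken from the description; the handler dictionary, the function name `run_retraining`, and the constant names are hypothetical and stand in for the actual processing of each step.

```python
from typing import Callable, Dict, List

# Step order taken from the description of Fig. 17; the handler mechanism is an assumption.
CLASS_ADDITION_STEPS = ["S103", "S104", "S105", "S106", "S107"]
NO_CLASS_ADDITION_STEPS = ["S108", "S109"]


def run_retraining(
    with_class_addition: bool,
    handlers: Dict[str, Callable[[], None]],
) -> List[str]:
    """Run the branch chosen in step S102 and return the executed step IDs in order."""
    steps = CLASS_ADDITION_STEPS if with_class_addition else NO_CLASS_ADDITION_STEPS
    executed: List[str] = []
    for step in steps:
        handlers[step]()  # e.g. annotation, correction reception, retraining
        executed.append(step)
    return executed


# Placeholder handlers that only record which step ran.
log: List[str] = []
handlers = {
    step: (lambda step=step: log.append(step))
    for step in CLASS_ADDITION_STEPS + NO_CLASS_ADDITION_STEPS
}
executed = run_retraining(with_class_addition=False, handlers=handlers)
```

With `with_class_addition=False` only steps S108 and S109 run, which corresponds to retraining the edge model directly with the annotation information as teacher data.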
[0133] Fig. 18 is a flowchart of processing related to an evaluation of the retrained edge model ME. In step S201, the processor 11 waits for an evaluation instruction. That is, the processor 11 waits until the result display button B4, which is displayed for the process information about an edge model ME whose retraining process has been completed, is operated on the edge model management Top screen G2.
[0134] In a case where, in step S201, the above-described operation of the result display button B4 has been performed and it is determined that an evaluation instruction has been given, the processor 11 advances the process to step S202 and executes the evaluation process. That is, in the present example, the calculation of the evaluation information by the numerical value described above and the generation processing of the image preview information are performed.
[0135] In step S203 subsequent to step S202, the processor 11 performs a presentation process of the evaluation result. That is, processing of displaying the evaluation result screen G5 on the display unit of the user terminal 3 is performed, and the evaluation information by the numerical values described above and the image preview information are presented to the user.
[0136] In response to the execution of the process of step S203, the processor 11 ends the series of processes illustrated in Fig. 18.
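As a hedged illustration of numerical evaluation information for an object detection model, the following sketch computes precision and recall by matching predicted boxes to verification annotations using intersection over union (IoU). The choice of these metrics, the IoU threshold of 0.5, and all names are assumptions; this passage does not specify which numerical values the embodiment calculates.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Detection = Tuple[str, Box]              # (class name, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0.0 else 0.0


def precision_recall(
    predictions: List[Detection],
    ground_truth: List[Detection],
    iou_threshold: float = 0.5,  # assumed threshold
) -> Tuple[float, float]:
    """Greedily match each prediction to an unused annotation of the same class."""
    unmatched = list(ground_truth)
    true_positives = 0
    for cls, box in predictions:
        for gt in unmatched:
            if gt[0] == cls and iou(box, gt[1]) >= iou_threshold:
                unmatched.remove(gt)  # each annotation can be matched only once
                true_positives += 1
                break
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


truth = [("person", (0.0, 0.0, 10.0, 10.0)), ("cart", (20.0, 20.0, 30.0, 30.0))]
preds = [("person", (1.0, 1.0, 10.0, 10.0)), ("cart", (50.0, 50.0, 60.0, 60.0))]
precision, recall = precision_recall(preds, truth)
```

Here the predicted person box overlaps its annotation with an IoU of 0.81 and counts as correct, while the predicted cart box does not overlap its annotation, so both precision and recall are 0.5.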
[0137] Fig. 19 is a flowchart of processing related to deployment of the retrained edge model ME. In step S301, the processor 11 waits for a deployment instruction. That is, processing of waiting until the deploy button B14 is operated on the deployment management screen G6 (Fig. 11) is performed.
[0138] In a case where the deploy button B14 is operated and it is determined that a deployment instruction is given, the processor 11 advances the process to step S302 and performs a process of deploying the designated edge model ME to the designated imaging device 2. That is, processing of deploying the edge model ME designated according to the selection input operation on the input box bi on the deployment management screen G6 to the imaging device 2 designated according to the selection input operation on the input box bi is performed. As a result, the retrained edge model ME can be used in the imaging device 2.
[0139] In response to the execution of the process of step S302, the processor 11 ends the series of processes illustrated in Fig. 19.
[0140] <5. Modification> Although the embodiments according to the present technology have been described above, the embodiments are not limited to the specific examples described above, and configurations as various modifications can be used. For example, in the above description, an example is described in which the server device 1 retrains the edge model, but an edge device different from the imaging device 2, such as the fog server described above, could retrain the edge model. Here, although it is described above that it is possible to adopt a configuration in which an edge device different from the imaging device 2, such as a fog server, performs the object detection process using the edge model, one could also adopt a configuration in which such an edge device retrains the edge model. In this case, the object detection process using the edge model and the retraining process of the edge model are performed in the same device.
[0141] Furthermore, in the above description, an example is described in which the server device 1 performs the processing from reception of upload of the retraining image group to retraining of the edge model and the processing of deploying the retrained edge model, but one could adopt a configuration in which separate devices perform the processing from reception of upload of the retraining image group to retraining of the edge model and the processing of deploying the retrained edge model.
[0142] <6. Summary of embodiments> As described above, the information processing device (server device 1) as the embodiment includes an annotation processing unit (F1) that performs an annotation process on an input image using a large scale AI model (ML) that performs an object detection process and is configured to output region information indicating an object detection region even for a class unidentifiable object, a reception processing unit (F2) that receives correction by a user of an annotation result by the annotation processing unit, a large scale model retraining processing unit (F3) that retrains the large scale AI model using annotation information corrected by the user, and an edge model retraining processing unit (F4) that retrains an edge model that is an AI model used in an edge device that performs an object detection process on a captured image by knowledge distillation with the large scale AI model (retrained large scale AI model MLd) after retraining as a teacher model. By including the reception processing unit as described above, it is possible to cause the user to set information about a correct class name for an object whose class the large scale AI model before retraining is not capable of identifying or an object whose class is erroneously identified. Then, by retraining the large scale AI model using the corrected annotation information as described above, it is possible to obtain a large scale AI model capable of identifying an object of a new class (an object whose class is unidentifiable before retraining or an object whose class is erroneously identified). Furthermore, by retraining the edge model by knowledge distillation with such a large scale AI model after retraining as a teacher model, it is possible to obtain an edge model capable of identifying an object of a new class. That is, it is possible to realize retraining of adding an identifiable class for the AI model used in the object detection process in the edge device. 
Then, according to the above configuration, in realizing the retraining of adding the identifiable class for the AI model used in the object detection process of the edge device, it suffices for the user to perform at least the work of correcting the annotation result by the annotation processing unit. In addition, since the retraining of the edge model is performed by knowledge distillation with a large scale AI model retrained so as to be able to identify a new class, it is possible to obtain an edge model capable of performing class identification with high accuracy. From these points, according to the present embodiment, it is possible to improve class identification accuracy of an AI model while reducing a workload of a user in a case of performing retraining of adding an identifiable class for the AI model used in the object detection process of an edge device.
[0143] Furthermore, in the information processing device as the embodiment, the reception processing unit performs an automatic setting process of setting, in a case where correction by a user of a class name is received, a class name of an object for which the correction is received as a class name of another object having a similar feature amount to the object, and the large scale model retraining processing unit retrains the large scale AI model using annotation information for which a class name has been set by the automatic setting process. According to the above configuration, in a case where the class name of only some objects of the objects for which the class name is required to be corrected by the user is corrected, the same class name is automatically set for another object for which the class name is required to be corrected. Therefore, it is possible to further reduce the workload of the user required for retraining.
[0144] Furthermore, the information processing device as the embodiment further includes an evaluation processing unit that performs a performance evaluation process on the edge model retrained by the edge model retraining processing unit. As a result, it is possible to obtain performance evaluation information for allowing the user to check whether or not retraining has been appropriately performed.
[0145] Furthermore, in the information processing device as the embodiment, the evaluation processing unit performs the performance evaluation process using an AI processor as an actual machine capable of executing the object detection process using an edge model. By performing a performance evaluation using an AI processor as an actual machine, it is possible to perform a performance evaluation with higher accuracy than that in a case where the performance evaluation is performed using a virtual AI processor as a simulator.
[0146] Furthermore, the information processing device as the embodiment further includes a deployment processing unit (F6) that deploys the edge model retrained by the edge model retraining processing unit to an edge device. As a result, the retrained edge model can be used in the edge device. Therefore, the edge model used by the edge device can be adapted to the environmental change of the site.
[0147] Furthermore, in the information processing device as the embodiment, the edge model retraining processing unit retrains, in a case where execution of a retraining process that does not require addition of an identifiable class as a retraining process of the edge model is instructed, the edge model by machine training using an annotation result by the annotation processing unit as teacher data. As a result, it is possible to retrain the edge model so as to be able to adapt to a domain change such as a change in the illumination condition or a change in the background object at the site. Since it is possible to selectively execute training with different purposes as retraining of the edge model, convenience for the user is improved.
[0148] Furthermore, in the information processing device as the embodiment, the edge model retraining processing unit retrains the edge model selected by the user. As a result, in a case where the user has a plurality of edge models, only the necessary edge model can be retrained, and convenience can be improved.
[0149] An information processing method as the embodiment is an information processing method executed by an information processing device, the information processing method including performing an annotation process on an input image using a large scale AI model that performs an object detection process and is configured to output region information indicating an object detection region even for a class unidentifiable object, receiving correction by a user of an annotation result by the annotation process, retraining the large scale AI model using annotation information corrected by the user, and retraining an edge model that is an AI model used in an edge device that performs an object detection process on a captured image by knowledge distillation with the large scale AI model after retraining as a teacher model. Such an information processing method can produce functions and effects similar to the functions and effects produced by the information processing device as the embodiment described above.
[0150] Here, as the embodiment, it is possible to consider a program for realizing the processing described above with reference to Fig. 17 and the like by, for example, a CPU, a DSP, or a device including these. That is, a program according to the embodiment is a computer device readable program, the program causing the computer device to execute the functions of performing an annotation process on an input image using a large scale AI model that performs an object detection process and is configured to output region information indicating an object detection region even for a class unidentifiable object, receiving correction by a user of an annotation result by the annotation process, retraining the large scale AI model using annotation information corrected by the user, and retraining an edge model that is an AI model used in an edge device that performs an object detection process on a captured image by knowledge distillation with the large scale AI model after retraining as a teacher model. With such a program, the function of the server device 1 as the above-described embodiment can be realized in the computer device.
[0151] The program described above can be recorded in advance in an HDD or an SSD as a recording medium built in a device such as a computer device, and a ROM or the like in a microcomputer including a CPU. Alternatively, the program may be temporarily or permanently stored (recorded) in a removable recording medium such as a flexible disk, a compact disc read only memory (CD-ROM), a magneto optical (MO) disk, a digital versatile disc (DVD), a Blu-ray disc (registered trademark), a magnetic disk, a semiconductor memory, or a memory card. Such a removable recording medium may be provided as what is referred to as package software. Furthermore, such a program can be installed from the removable recording medium into a personal computer or the like, or can be downloaded from a download site via a network such as a LAN or the Internet.
[0152] In addition, such a program is suitable for providing a wide range of retraining methods as embodiments. Various forms of computer devices can be caused to function as devices that implement the retraining method of the present disclosure.
[0153] Note that the effects described in the present specification are merely examples and are not limiting, and other effects may be provided.
[0154] <7. Present technology> Note that the present technology can also have the following configurations.
(1) An information processing device including an annotation processing unit that performs an annotation process on an input image using a large scale AI model that performs an object detection process and is configured to output region information indicating an object detection region even for a class unidentifiable object, a reception processing unit that receives correction by a user of an annotation result by the annotation processing unit, a large scale model retraining processing unit that retrains the large scale AI model using annotation information corrected by the user, and an edge model retraining processing unit that retrains an edge model that is an AI model used in an edge device that performs an object detection process on a captured image by knowledge distillation with the large scale AI model after retraining as a teacher model.
(2) The information processing device according to the item (1), in which the reception processing unit performs an automatic setting process of setting, in a case where correction by a user of a class name is received, a class name of an object for which the correction is received as a class name of another object having a similar feature amount to the object, and the large scale model retraining processing unit retrains the large scale AI model using annotation information for which a class name has been set by the automatic setting process.
(3) The information processing device according to the item (1) or (2), further including an evaluation processing unit that performs a performance evaluation process on the edge model retrained by the edge model retraining processing unit.
(4) The information processing device according to the item (3), in which the evaluation processing unit performs the performance evaluation process using an AI processor as an actual machine capable of executing an object detection process using the edge model.
(5) The information processing device according to any one of the items (1) to (4), further including a deployment processing unit that deploys the edge model retrained by the edge model retraining processing unit to the edge device.
(6) The information processing device according to any one of the items (1) to (5), in which the edge model retraining processing unit retrains, in a case where execution of a retraining process that does not require addition of an identifiable class as a retraining process of the edge model is instructed, the edge model by machine training using an annotation result by the annotation processing unit as teacher data.
(7) The information processing device according to any one of the items (1) to (6), in which the edge model retraining processing unit retrains the edge model selected by the user.
(8) An information processing method executed by an information processing device, the information processing method including performing an annotation process on an input image using a large scale AI model that performs an object detection process and is configured to output region information indicating an object detection region even for a class unidentifiable object, receiving correction by a user of an annotation result by the annotation process, retraining the large scale AI model using annotation information corrected by the user, and retraining an edge model that is an AI model used in an edge device that performs an object detection process on a captured image by knowledge distillation with the large scale AI model after retraining as a teacher model.
[0155] It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and alterations may occur depending on design requirements and other factors insofar as they are within the scope of the appended claims or the equivalents thereof.
[0156]
1 Server device
2 Imaging device
3 User terminal
41 Imaging optical system
42 Image sensor
54 AI processing unit
43 Optical system drive unit
44 Camera control unit
45 Memory unit
46 Communication unit
11 Processor
12 ROM
13 RAM
17 Display unit
19 Storage unit
20 Communication unit
F1 Annotation processing unit
F2 Reception processing unit
F3 Large scale model retraining processing unit
F4 Edge model retraining processing unit
F41 Error transmission unit
F5 Evaluation processing unit
F6 Deployment processing unit
ML Large scale AI model
ME Edge model
MLd Retrained large scale AI model
Claims
1. An information processing system comprising: processing circuitry configured to: perform an annotation process on an input image using a first AI model; receive a correction of the annotation process by a user; retrain the first AI model based on the correction; and after retraining the first AI model, retrain a second AI model used in an edge device with the first AI model as a teacher model.
2. The information processing system according to claim 1, wherein the first AI model performs an object detection process and is configured to output region information indicating an object detection region even for a class unidentifiable object.
3. The information processing system according to claim 1, wherein the second AI model performs an object detection process on a captured image.
4. The information processing system according to claim 2, wherein the second AI model performs an object detection process on a captured image.
5. The information processing system according to claim 1, wherein the first AI model is a large scale AI model.
6. The information processing system according to claim 1, wherein the second AI model is an edge model.
7. The information processing system according to claim 5, wherein the second AI model is an edge model.
8. The information processing system according to claim 1, wherein the processing circuitry is configured to retrain the second AI model by knowledge distillation.
9. The information processing system according to claim 7, wherein the processing circuitry is configured to retrain the second AI model by knowledge distillation.
10. The information processing system according to claim 1, wherein the processing circuitry is configured to: perform an automatic setting process of setting, in a case where the correction by the user includes a class name of an object, the class name of the object as a class name of another object having a similar feature amount to the object, and retrain the first AI model using annotation information of the automatic setting process.
11. The information processing system according to claim 1, wherein the processing circuitry is configured to perform a performance evaluation process on the second AI model retrained with the first AI model.
12. The information processing system according to claim 11, wherein the processing circuitry is configured to perform the performance evaluation process using an AI processor as an actual machine capable of executing an object detection process using the second AI model.
13. The information processing system according to claim 1, wherein the processing circuitry is configured to deploy the second AI model retrained with the first AI model to the edge device.
14. The information processing system according to claim 1, wherein the processing circuitry is configured to retrain the second AI model by machine training using an annotation result of the annotation process as teacher data.
15. The information processing system according to claim 1, wherein the processing circuitry is configured to accept selection of the second AI model for the retraining by the user.
16. An information processing method comprising: performing an annotation process on an input image using a first AI model; receiving a correction of the annotation process by a user; retraining the first AI model based on the correction; and after retraining the first AI model, retraining a second AI model used in an edge device with the first AI model as a teacher model.
17. A system comprising: an edge device configured to use a second AI model; and an information processing device including processing circuitry configured to: perform an annotation process on an input image using a first AI model; receive a correction of the annotation process by a user; retrain the first AI model based on the correction; and after retraining the first AI model, retrain the second AI model used in the edge device with the first AI model as a teacher model.
18. The system according to claim 17, wherein the first AI model performs an object detection process and is configured to output region information indicating an object detection region even for a class unidentifiable object, and the second AI model performs an object detection process on an image captured by the edge device.
19. The system according to claim 17, wherein the first AI model is a large scale AI model, and the second AI model is an edge model.
20. The system according to claim 18, wherein the first AI model is a large scale AI model, and the second AI model is an edge model.