Computer-readable storage medium storing inference program, inference method, inference device, computer-readable storage medium storing learning program, learning method, and learning device

The system effectively identifies multiple objects in images and prevents fraud by using object and image classification models, enhancing user authentication in reward systems.

WO2026126847A1 · PCT designated stage · Publication Date: 2026-06-18 · CYGAMES INC

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
CYGAMES INC
Filing Date
2025-12-01
Publication Date
2026-06-18

  • Figure JP2025041794_18062026_PF_FP_ABST

Abstract

This inference device comprises: a communication unit that acquires a captured image in which a plurality of objects have been imaged, and a surrounding area image identification region for identifying surrounding area images of the objects, the surrounding area image identification region being set as a portion of the captured image; an object identification model that sets, for each of the plurality of objects appearing in the captured image, an object detection region in which the object has been detected, and identifies the object for each object detection region; and a determining unit that determines the success or failure of user authentication on the basis of the position of the object detection region relative to the surrounding area image identification region.

Description

Computer-readable storage medium storing an inference program, inference method, inference device, computer-readable storage medium storing a learning program, learning method, and learning device

[0001] The present invention relates to a computer-readable storage medium storing an inference program, an inference method, an inference device, a computer-readable storage medium storing a learning program, a learning method, and a learning device.

[0002] Conventionally, a reward granting system has been provided that identifies a product from a captured image of a single product and grants a reward (including benefits such as points, in-game currency, in-game items, etc.) to the user who transmitted the captured image when the captured product is valid. Since a privilege is granted to a user who is determined to have captured a valid product, it is expected that the user's purchasing desire will increase.

[0003] Patent Document 1 describes a system including a terminal device and a server. The terminal device described in Patent Document 1 determines whether any one of a plurality of types of objects is imaged in an image of a space imaged by the imaging device of the portable terminal device based on local feature amounts of shapes for detecting a plurality of types of objects. When it is determined that any one of the types of objects is imaged, the image determined to have imaged the object is taken in as an acquisition target object image, and the acquisition target object image is transmitted to the server. The server receives the transmitted acquisition target object image, determines a feature correlation amount of the acquisition target object image with respect to the one type of object based on the image feature amount of the one type of object among the plurality of types of objects, and determines whether the acquisition target object image is an image that has imaged the one type of object based on the determined feature correlation amount.

[0004] Patent Document 1: Japanese Patent No. 6517298

[0005] Conventional reward systems assume that only one product is captured in the image. If the image processing and object recognition performed on the image are deemed valid, the system authenticates the user and grants a reward. However, conventional reward systems cannot identify multiple products (e.g., a combination of a burger and a drink) in situations such as set menus in restaurants or food delivery services. Set menus often offer multiple drink and burger options, resulting in numerous combinations. Furthermore, it must be assumed that users will arrange the products in any order and photograph them from any angle. Therefore, image recognition techniques for single objects, such as the one disclosed in Patent Document 1, cannot be applied.

[0006] Furthermore, images containing multiple products raise a new challenge: occlusion. For example, a burger placed in front of a drink cup may hide part of the cup, or conversely, a drink placed in front of a burger may hide part of the burger. A mechanism was therefore needed that correctly identifies, in a photograph, the multiple items of a set menu in their predetermined combination and reliably authenticates the user.

[0007] Furthermore, in a reward system that awards rewards to users who submit images of set menus taken at any location, it is also necessary to address fraudulent activity by users. Fraudulent activity includes attempts to successfully authenticate by displaying an image of a set menu (for example, an image obtained from the internet) on the display of a smartphone or PC (Personal Computer), and then taking a picture of this displayed image.

[0008] This invention was made in view of the above circumstances, and aims to correctly identify multiple objects in a captured image, detect fraud, and determine whether authentication is successful or unsuccessful.

[0009] The present invention relates to a computer-readable storage medium that stores an inference program causing a computer to execute the following steps: a procedure for acquiring a captured image in which multiple objects are photographed, and region information of a peripheral image identification region for identifying the surrounding image of an object set in a part of the captured image; a procedure for an object identification model to set an object detection region where an object has been detected for each of the multiple objects in the captured image, and to identify the multiple objects in the captured image for each object detection region; and a procedure for determining the success or failure of user authentication based on the position of the object detection region relative to the peripheral image identification region. The computer-readable storage medium storing the above inference program is one aspect of the present invention, and an inference method and inference apparatus reflecting one aspect of the present invention are configured in the same manner as the computer-readable storage medium storing the above inference program.
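As an illustration of the third procedure, the relation between an object detection region and the peripheral image identification region can be computed from rectangle coordinates. The following Python sketch is not taken from this specification. The `Box` type, the modeling of the peripheral region as the band outside an inner rectangle, and the rule that every detection must lie inside that inner rectangle are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates (hypothetical helper type)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, other: "Box") -> bool:
        """Return True if `other` lies entirely inside this box."""
        return (self.left <= other.left and self.top <= other.top
                and other.right <= self.right and other.bottom <= self.bottom)


def detections_clear_of_periphery(inner_area: Box, detections: list[Box]) -> bool:
    """Assumed rule: pass only if at least one object was detected and every
    object detection region stays inside the inner area, i.e. none of them
    extends into the peripheral band."""
    return bool(detections) and all(inner_area.contains(d) for d in detections)


# Example: a 640x480 capture whose peripheral band is assumed to be 40 px wide.
inner = Box(40, 40, 600, 440)
burger = Box(100, 150, 300, 400)
drink = Box(320, 100, 450, 400)
off_frame = Box(10, 150, 200, 400)  # reaches into the peripheral band
```

Under these assumptions, `detections_clear_of_periphery(inner, [burger, drink])` passes, while replacing `burger` with `off_frame` fails because that region extends into the peripheral band.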

[0010] Furthermore, the present invention relates to a computer-readable storage medium that stores a learning program for causing a computer to execute a procedure for causing an object recognition model to learn to identify multiple objects that appear in a predetermined combination in a captured image for object recognition learning, and a procedure for recording the learned object recognition model in a recording unit. The computer-readable storage medium storing the above-mentioned learning program is one aspect of the present invention, and a learning method and learning apparatus that reflect one aspect of the present invention are configured in the same manner as the computer-readable storage medium storing the above-mentioned learning program.

[0011] According to the present invention, it is possible to correctly identify multiple objects in a captured image and detect fraud, thereby determining whether authentication is successful or unsuccessful, and thus preventing fraud. Problems, configurations, and effects other than those described above will be clarified by the following description of embodiments.

[0012] The drawings are briefly described as follows.
An overall configuration diagram showing an overview of the reward system according to the first embodiment of the present invention.
A block diagram showing an example of the hardware configuration of an information processing terminal according to the first embodiment of the present invention.
A block diagram showing the hardware configuration of a learning device according to the first embodiment of the present invention.
A block diagram showing an example of the functional configuration of a learning device according to the first embodiment of the present invention.
A flowchart showing an example of object recognition learning processing according to the first embodiment of the present invention.
A flowchart showing an example of image classification learning processing according to the first embodiment of the present invention.
A block diagram showing the hardware configuration of an inference device according to the first embodiment of the present invention.
A diagram showing an example of the shooting area, internal area, and surrounding image recognition area according to the first embodiment of the present invention.
A diagram showing an example of the display of the check-in screen when check-in is successful according to the first embodiment of the present invention.
A diagram showing an example of the display of the check-in screen when check-in fails according to the first embodiment of the present invention.
A block diagram showing an example of the functional configuration of the reward system according to the first embodiment of the present invention.
A sequence diagram showing an example of processing of an information processing terminal and an inference device according to the first embodiment of the present invention.
A flowchart showing an example of object recognition inference processing according to the first embodiment of the present invention.
A diagram showing an example of object recognition results according to the first embodiment of the present invention.
A flowchart showing an example of object recognition judgment processing according to the first embodiment of the present invention.
A diagram showing an example of object recognition results and check-in results according to the first embodiment of the present invention.
A flowchart showing an example of image classification inference processing according to the first embodiment of the present invention.
A diagram showing an example of the configuration of a judgment result table according to the first embodiment of the present invention.
A block diagram showing an example of the functional configuration of a reward system according to a second embodiment of the present invention.
A block diagram showing an example of the functional configuration of a reward system according to a third embodiment of the present invention.

[0013] Hereinafter, embodiments for carrying out the present invention will be described with reference to the accompanying drawings. In this specification and drawings, components having substantially the same function or configuration are denoted by the same reference numerals, and redundant descriptions are omitted.

[0014] [First Embodiment] Before deciding whether to grant a reward (including points, in-game currency, in-game items, etc.) to a user who has photographed an actual product with a camera, the system performs a process called check-in. Check-in authenticates the user by combining an object recognition process that determines whether the captured image shows the correct combination of objects (first determination) with a process that detects whether the captured image is fraudulent (second determination). A system that grants rewards to users authenticated through check-in is called a reward granting system, and is therefore also called a check-in system. In the reward granting system, users who perform fraudulent operations are determined to have failed authentication; these users are not given rewards, or any rewards already given are revoked. Realizing such a reward granting system is essential for promoting products and/or services.
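The way check-in combines the two determinations can be sketched in Python as follows. This is a minimal illustration rather than the implementation described in this specification. The function names, the `SET_MENU` contents, and the use of exact multiset equality for the first determination are assumptions.

```python
from collections import Counter


def first_determination(detected_labels, required_combination):
    """Pass if the detected products match the required set menu.

    Counter comparison ignores order, so the products may be arranged freely.
    Exact multiset equality is an assumed matching rule.
    """
    return Counter(detected_labels) == Counter(required_combination)


def check_in(detected_labels, required_combination, classified_as_positive):
    """Authenticate only if both determinations succeed."""
    combination_ok = first_determination(detected_labels, required_combination)
    not_fraudulent = bool(classified_as_positive)  # second determination
    return combination_ok and not_fraudulent


SET_MENU = ["burger", "drink"]  # hypothetical set menu
```

For example, `check_in(["drink", "burger"], SET_MENU, True)` succeeds regardless of the order in which the products were detected, whereas a missing product or a negative classification result causes the check-in to fail.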

[0015] Traditionally, user authentication has relied on methods such as QR codes or near-field communication (NFC). However, these methods require modifications to the POS (Point of Sale) system, resulting in significant implementation costs. Other systems use BLE (Bluetooth Low Energy) beacons for user authentication, which do not require modifications to the POS system. However, the small devices that emit BLE beacons can only be used for check-ins at specific locations, which makes it difficult to confirm that a user has purchased a specific product, and deploying them to thousands of stores tends to be costly.

[0016] In contrast, the reward system according to the first embodiment of the present invention targets a set menu consisting of predetermined products, and confirms a user's purchase by having the user photograph the purchased set menu with a smartphone camera. The embodiments described below describe a reward system that can identify product types, in particular a set menu consisting of a combination of specific, diverse products (an example of objects), and that is tamper-resistant so as to prevent fraudulent authentication.

[0017] <Overall Configuration Example of Reward Granting System> First, an example configuration of a reward granting system according to the first embodiment of the present invention will be described. This reward granting system is configured by combining a learning device that learns object recognition processing and image classification processing, an inference device that infers object recognition and image classification, and an information processing terminal. Figure 1 is an overall configuration diagram showing an overview of the reward granting system 10 according to the first embodiment of the present invention. Hereinafter, the object that is the target of object recognition processing, etc., may be described as a product.

[0018] The reward system 10 comprises a tablet terminal 2_1, a PC (Personal Computer) 2_2, a learning device 30, and an inference device 60. The tablet terminal 2_1 and PC 2_2 used by users can connect to the inference device 60 via a network N such as the Internet. The tablet terminal 2_1 and PC 2_2 used by administrators can connect to the learning device 30 and the inference device 60. In the following description, the tablet terminal 2_1 and PC 2_2 will be collectively referred to as the information processing terminal 2.

[0019] The learning device 30 is a device that trains an object recognition AI model 43 (see Figure 4 below), which is an example of an object recognition model, to perform object recognition processing, and trains an image classification AI model 53 (see Figure 4 below) to perform image classification processing. The inference device 60 is a device that has the trained object recognition AI model 43 identify products in the captured image, and has the trained image classification AI model 53 perform binary classification of the captured image into positive or negative examples.

[0020] Therefore, the learning device 30 and the inference device 60 manage the programs used as the object recognition AI model 43 and the image classification AI model 53, as well as various types of data. For example, the YOLO (You Only Look Once) model described later is used as the object recognition AI model 43. For example, the ResNet model described later is used as the image classification AI model 53.

[0021] The information processing terminal 2 will be described as being operated by a user. The user is assumed to be a person who purchases a set of products, takes a picture of the set of products, and operates the information processing terminal 2 to request authentication.

[0022] In the tablet terminal 2_1, which constitutes the information processing terminal 2, a touch panel display device is used in which the input device 26 and output device 27 are integrated. In PC 2_2, the input device 26 and output device 27 are separate units. Alternatively, PC 2_2 may be configured as a desktop PC, with the input device 26 and output device 27 connected separately to the desktop PC.

[0023] The information processing terminal 2 selects a program based on the operation signals that the user's actions generate at the input device 26, and outputs to the output device 27 a video signal suited to the screen of the output device 27. The output device 27 displays video based on that signal. The operation signals input from the input device 26 are, for example, signals corresponding to the individual keys of a keyboard. Through the input device 26, the user can enter instructions, instruct the inference device 60 to execute the inference program, or operate an inference program recorded in the recording device 22 of the terminal. Operations on the inference program include command inputs such as launching an application program, taking a picture, and instructing transmission of a captured image. Another example of an operation performed from the input device 26 is a tap, in which the user touches the screen of the output device 27 with a finger or pen.

[0024] The information processing terminal 2 performs processes such as reading image data from the recording device 22 and executing a program, and displaying a screen on the output device 27 in accordance with operation signals input from the input device 26. For example, in response to an operation by the user through the input device 26, the information processing terminal 2 displays a screen on the output device 27 in which an image based on the image data read from the recording device 22 has been drawn.

[0025] During training, the information processing terminal 2 is used by the administrator. The information processing terminal 2 receives input to send images of multiple products in a predetermined combination, intended for object recognition training, to the learning device 30, causing the object recognition AI model 43 to learn how to identify products. The information processing terminal 2 also receives input to send images of multiple products in a predetermined combination, intended for image classification, to the learning device 30, causing the image classification AI model 53 to learn how to classify the images into positive or negative examples. Furthermore, the information processing terminal 2 can also perform data augmentation on the images used for object recognition training.
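The specification does not state which data augmentation is applied. As one common example, a horizontal flip must mirror the labeled regions together with the pixels. The Python sketch below assumes a plain list-of-rows image and a hypothetical `(label, left, top, right, bottom)` box tuple with exclusive right and bottom edges.

```python
def flip_horizontal(image, boxes):
    """Mirror an image left-to-right and update its labeled boxes.

    image: list of pixel rows (each row is a list of pixel values).
    boxes: list of (label, left, top, right, bottom) in pixel coordinates,
           where right and bottom are exclusive.
    """
    width = len(image[0])
    flipped_image = [list(reversed(row)) for row in image]
    # A box spanning [left, right) maps to [width - right, width - left).
    flipped_boxes = [
        (label, width - right, top, width - left, bottom)
        for (label, left, top, right, bottom) in boxes
    ]
    return flipped_image, flipped_boxes


image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
boxes = [("drink", 0, 0, 1, 2)]  # covers the leftmost column
flipped_image, flipped_boxes = flip_horizontal(image, boxes)
```

After the flip, the box that covered the leftmost column covers the rightmost column, so the label still encloses the same product.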

[0026] The inference program according to this embodiment, which operates in the inference device 60, causes the object recognition AI model 43 to identify multiple products in the captured image 83 (see Figure 11, described later) based on instruction information input via the input device 26 (an example of an input unit). The inference program according to this embodiment also causes the image classification AI model 53 to classify the captured image into positive or negative examples.

[0027] <Example Hardware Configuration of the Reward Granting System> Next, an example of the hardware configuration of each terminal and device included in the reward granting system 10 according to the first embodiment will be described. Figure 2 is a block diagram showing an example of the hardware configuration of the information processing terminal 2. The hardware configuration examples of the learning device 30 and the inference device 60 will be described later.

[0028] (Example of Information Processing Terminal Configuration) The information processing terminal 2 is an example of a computer capable of executing various programs. The information processing terminal 2 is equipped with a processor 21, a recording device 22, and a network interface 24, each connected to a bus 23.

[0029] The processor 21 is composed of at least one of the following: a CPU (Central Processing Unit), an MPU (Microprocessor Unit), a GPU (Graphics Processing Unit), and an FPGA (Field Programmable Gate Array). The processor 21 reads the program code of the application that realizes each function according to this embodiment from the recording device 22, loads it into a temporary storage unit (not shown) provided in the recording device 22, and executes the program code. The processor 21 performs calculations based on information acquired from the inference device 60, and performs processing necessary to draw the GUI of the application on the output device 27 of the information processing terminal 2. The processor 21 also performs processing of the OS (Operating System) of the information processing terminal 2, and processing of data input and output performed in each part of the information processing terminal 2. Furthermore, when the processor 21 handles information related to user authentication, it can output an image signal to the output device 27 through the input / output interface 25.

[0030] The recording device 22 is composed of, for example, ROM (Read Only Memory) and RAM (Random Access Memory). ROM can be an optical disc, magneto-optical disc, DVD (Digital Versatile Disc)-ROM, CD-ROM, Blu-ray® disc, etc. RAM can be SRAM (Static RAM), DRAM (Dynamic RAM), etc. Variables and parameters generated during the processing of the processor 21 are temporarily written to the recording device 22, and these variables and parameters are read out by the processor 21 as needed.

[0031] Furthermore, the recording device 22 is composed of at least one of the following: an HDD (Hard Disk Drive), an SSD (Solid State Drive), and flash memory. The recording device 22 stores the OS of the information processing terminal 2, various parameters, programs for making the information processing terminal 2 function, application programs used for user authentication, and the like. As described above, the recording device 22 stores programs and data necessary for the processor 21 to operate, and is used as an example of a computer-readable, non-transitory storage medium that stores programs executed by the information processing terminal 2.

[0032] For example, a Network Interface Card (NIC) can be used as the network interface 24. Via a dedicated line connected to the terminals of the NIC and through the network N, the network interface 24 can exchange various types of data with the learning device 30 and the inference device 60, and can communicate with other information processing terminals 2.

[0033] The input / output interface 25 converts the operation signals received from the input device 26 into data in a predetermined format and passes the converted data to the processor 21. The input / output interface 25 also converts the data of the screen drawn by the processor 21 into a video signal and outputs it to the output device 27.

[0034] The input device 26 is a device that receives input instructions or various types of information from the user. An example of the input device 26 is a pointing device that can input coordinate information of a location specified by the user. This pointing device could be a mouse, a touch panel device, etc. A touch panel device is configured by combining the input device 26 and the output device 27. The input device 26 may also be a keyboard, mouse, etc.

[0035] The output device 27 is a device that outputs information processed by the processor 21. An example of an output device 27 is a display device (such as a display unit or touch panel). When the output device 27 is a display device, an image (for example, a captured image) based on the video signal received from the input / output interface 25 is displayed on the display device.

[0036] The camera 28 can capture various images through the operation of a user using the information processing terminal 2. The camera 28 saves still images as captured images to the recording device 22. The camera 28 can also save moving images as captured images to the recording device 22. When taking images using the camera 28, an image of the product is displayed on the output device 27, so the user can confirm the position of the product in the captured image.

[0037] <Example of Hardware Configuration of Learning Device> Next, an example of the hardware configuration of the learning device 30 will be described. Figure 3 is a block diagram showing the hardware configuration of the learning device 30 according to the first embodiment of the present invention. The learning device 30 is an example of a computer capable of executing various programs. The learning device 30 is one example of a system, consisting of one or more devices, for generating trained models, namely the object recognition AI model 43 and the image classification AI model 53 (see Figure 4, described later). In the following embodiments, however, it is described as a single device for ease of explanation. The system for generating the object recognition AI model 43 and the image classification AI model 53 can therefore also be understood to mean the learning device 30. The same applies to the inference device 60, which will be described later.

[0038] The learning device 30 comprises a processor 31, an input device 32, a display device 33, a recording device 34, and a communication device 35. These components are connected by a bus 36. Interfaces are assumed to be interposed between the bus 36 and each component as needed. The learning device 30 includes a configuration similar to that of a typical server or PC.

[0039] The processor 31 controls the operation of the entire learning device 30. For example, the processor 31 is at least one of a CPU, MPU, GPU, and FPGA. The processor 31 performs various processes by reading and executing programs (e.g., learning programs) and data stored in the recording device 34. The processor 31 may be composed of multiple processors.

[0040] The input device 32 is a user interface that receives input from the user to the learning device 30, and is, for example, a touch panel, touchpad, keyboard, mouse, or buttons. The display device 33 is a display that shows application screens and the like to the user of the learning device 30 according to the control of the processor 31.

[0041] The recording device 34 (an example of a recording unit) includes a main memory and an auxiliary memory. The main memory is a semiconductor memory such as RAM. RAM is a volatile storage medium that allows for high-speed reading and writing of information and is used as a storage area and work area when the processor 31 processes information. The main memory may also include ROM, which is a read-only non-volatile storage medium. The auxiliary memory stores various programs and data used by the processor 31 when executing each program. The auxiliary memory may be any non-volatile storage or non-volatile memory that can store information, and may be removable.

[0042] The communication device 35 exchanges data with the information processing terminal 2 or other computers such as a server via the network, and is, for example, a wireless LAN module. The communication device 35 can also be other wireless communication devices or modules such as a Bluetooth® module, or wired communication devices or modules such as an Ethernet® module or a USB interface. The system configuration and data structure of this embodiment will be described in detail below.

[0043] <Example of Functional Configuration of Learning Device> Next, the process of generating the object recognition AI model 43 and the image classification AI model 53 will be described. Figure 4 is a block diagram showing an example of the functional configuration of the learning device 30 according to the first embodiment of the present invention. As functional units, the learning device 30 includes an object recognition learning unit 42, an object recognition AI model conversion unit 44, an image classification learning unit 52, and an image classification AI model conversion unit 54. These units, together with the object recognition learning data 41, the object recognition AI model 43, the terminal object recognition AI model 45, the image classification learning data 51, the image classification AI model 53, and the terminal image classification AI model 55, are recorded as a learning program in the recording device 34, which is an example of a computer-readable storage medium.

[0044] (Object Recognition Learning) First, an example of the configuration and process related to object recognition learning will be described. The object recognition learning unit 42 reads the object recognition learning data 41 prepared in the recording device 34 shown in Figure 3 and generates an object recognition AI model 43 that has learned object recognition processing. Object recognition processing is a process of recognizing a plurality of products shown in a captured image. The object recognition learning data 41 stores a large number of captured images for object recognition learning. Each captured image shows a plurality of products in a predetermined combination, and a label representing the product name is attached to each product in advance. The object recognition AI model 43 generated by the object recognition learning unit 42 is stored in the recording device 34. When further object recognition learning data 41 is prepared, the object recognition learning unit 42 can read the object recognition AI model 43 from the recording device 34 and train it on object recognition processing again.

[0045] Returning to Figure 4, the object recognition AI model 43 described above is composed of, for example, YOLO. YOLO is a real-time deep learning model used for object detection. Its main feature is that it infers object detection and labeling end to end, which allows it to detect objects at extremely high speed and with high accuracy. In addition, because YOLO captures the background pattern and object features as a whole, it produces fewer false positives and generalizes better than other methods. In particular, even when one product is placed in front of another and occlusion occurs, YOLO can correctly identify the occluded object if its features partially match.

[0046] The YOLO training method is as follows. Items from a set menu are photographed with a "Web application for capturing training data," which is implemented using the same code as the prototype web application. Training data is prepared by selecting the regions of the photographed items and labeling them. The training images are scaled down and cropped to the same aspect ratio as is used during inference.
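The preparation step can be sketched as follows. The specification gives neither the target aspect ratio nor the label file format, so the centered crop and the function names below are assumptions. The `(class_id, x_center, y_center, width, height)` layout with values scaled to the range 0 to 1 is the label layout commonly used by YOLO tooling, shown here only as a plausible choice.

```python
def center_crop_box(width, height, target_w, target_h):
    """Return (left, top, right, bottom) of the largest centered crop of a
    width x height image whose aspect ratio is target_w:target_h."""
    # Compare aspect ratios by integer cross-multiplication to avoid
    # floating-point rounding.
    if width * target_h > height * target_w:
        # Image is too wide: keep the full height and trim the sides.
        crop_w = height * target_w // target_h
        left = (width - crop_w) // 2
        return left, 0, left + crop_w, height
    # Image is too tall (or already matches): keep the full width.
    crop_h = width * target_h // target_w
    top = (height - crop_h) // 2
    return 0, top, width, top + crop_h


def to_normalized_label(class_id, box, image_w, image_h):
    """Convert a pixel box (left, top, right, bottom) into
    (class_id, x_center, y_center, width, height), each scaled to 0..1."""
    left, top, right, bottom = box
    return (class_id,
            (left + right) / 2 / image_w,
            (top + bottom) / 2 / image_h,
            (right - left) / image_w,
            (bottom - top) / image_h)
```

Cropping before scaling keeps the products undistorted; scaling the cropped image down to the inference resolution can then be done with any image library.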

[0047] The object recognition AI model conversion unit 44 generates a terminal object recognition AI model 45 by optimizing the object recognition AI model 43 read from the recording device 34. The terminal object recognition AI model 45 uses less memory than the original object recognition AI model 43, and the size of the model file itself is also reduced. Therefore, the terminal object recognition AI model 45 can be operated even on an information processing terminal 2 with limited resources. The terminal object recognition AI model 45 is used in the second and third embodiments described later.
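The specification does not say which optimization the conversion unit applies. One widely used technique for shrinking a model is post-training quantization, which stores each weight in 8 bits instead of 32. The Python sketch below illustrates only that idea on a toy weight list; the function names are hypothetical and it is not the conversion described here.

```python
from array import array


def quantize_int8(weights):
    """Symmetric quantization of float weights to signed 8-bit integers.

    Returns the int8 values plus the scale needed to recover approximations.
    """
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    quantized = array("b", (round(w / scale) for w in weights))
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values."""
    return [q * scale for q in quantized]


weights = array("f", [0.5, -1.0, 0.25, 0.0])  # toy 32-bit float weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
```

Each quantized weight occupies one byte rather than four, at the cost of a rounding error of at most about half of `scale` per weight.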

[0048] Figure 5 is a flowchart showing an example of object recognition learning processing. First, the object recognition learning unit 42 acquires object recognition learning data 41 from the recording device 34 (S1). Next, the object recognition learning unit 42 generates an object recognition AI model 43 that has learned object recognition processing based on the object recognition learning data 41 (S2). Next, the object recognition learning unit 42 records the learned object recognition AI model 43 in the recording device 34 (S3). If the terminal object recognition AI model 45 is not used, this process ends.

[0049] When the terminal object recognition AI model 45 is used, the object recognition AI model conversion unit 44 converts the object recognition AI model 43 read from the recording device 34 into the terminal object recognition AI model 45 (S4). Next, the object recognition AI model conversion unit 44 records the terminal object recognition AI model 45 in the recording device 34 (S5) and ends this process.

[0050] (Image Classification Learning) Next, an example of the configuration and processing related to image classification learning will be described. Conventionally, there have been attempts to pass authentication in a reward system fraudulently, without purchasing a product, by photographing a picture obtained from the Internet or the like that is shown on a display or printed out. Since it is necessary to detect such fraud, in this embodiment, an image classification AI model 53 capable of classifying an image into a positive example or a negative example is constructed as an example of an image classification model.

[0051] The image classification learning unit 52 reads the image classification learning data 51 prepared in the recording device 34 and generates an image classification AI model 53 that has learned image classification processing. The image classification processing classifies a captured image, in which a plurality of products are captured in a predetermined combination, into a positive example or a negative example. The image classification learning data 51 stores a large number of captured images for image classification learning. Among these, captured images in which a plurality of products appear in a predetermined combination serve as positive examples, and fraudulent images serve as negative examples. The image classification AI model 53 generated by the image classification learning unit 52 is stored in the recording device 34. When other image classification learning data 51 is prepared, the image classification learning unit 52 can retrain the image classification AI model 53 read from the recording device 34 on the image classification processing.

[0052] Positive example images include the object recognition learning data 41. Negative example images are not illustrated, but they include, for example, photographs of images processed with a paint tool and of printed images.

[0053] The image classification AI model 53 is constructed using, for example, ResNet. Although ResNet is not specialized for any particular purpose, it is a model that exhibits extremely high performance in image classification. It is characterized by its ability to improve performance even with very deep networks (such as 700 layers) and to achieve high accuracy in image recognition tasks. Training ResNet is extremely simple; it learns binary classification processing using the data used in the YOLO training described above as positive examples and, for example, images of fraudulent check-ins intentionally created manually by debugging staff as negative examples.

[0054] The image classification AI model conversion unit 54 generates an optimized terminal image classification AI model 55 from the image classification AI model 53 read from the recording device 34. The terminal image classification AI model 55 uses less memory than the original image classification AI model 53, and the model file itself is also smaller. Therefore, the terminal image classification AI model 55 can be operated even on an information processing terminal 2 with limited resources. The terminal image classification AI model 55 is used in the third embodiment described later.

[0055] The object recognition AI model conversion unit 44 and the image classification AI model conversion unit 54 described above each have the function of converting the AI model into ONNX, a highly efficient file format for representing neural networks, thereby reducing the file size. The terminal object recognition AI model 45 and the terminal image classification AI model 55 are small enough to be loaded into the RAM or VRAM of the information processing terminal 2, making deployment on the information processing terminal 2 a realistic option. Furthermore, the terminal object recognition AI model 45 and the terminal image classification AI model 55 perform inference processing on middleware called ONNX Runtime, which is responsible for the optimized execution of neural network models. As a result, their behavior is automatically optimized for the SIMD instructions of the CPU and the GPU instructions of the information processing terminal 2, enabling inference at a realistic speed.

[0056] Figure 6 is a flowchart illustrating an example of the image classification learning process. First, the image classification learning unit 52 acquires image classification learning data 51 from the recording device 34 (S11). Next, the image classification learning unit 52 generates an image classification AI model 53 that has been trained to perform image classification based on the image classification learning data 51 (S12). Next, the image classification learning unit 52 records the trained image classification AI model 53 in the recording device 34 (S13). If the terminal image classification AI model 55 is not used, this process is terminated.

[0057] When using the terminal image classification AI model 55, the image classification AI model conversion unit 54 converts the image classification AI model 53 read from the recording device 34 into the terminal image classification AI model 55 (S14). Next, the image classification AI model conversion unit 54 records the terminal image classification AI model 55 into the recording device 34 (S15), and the process ends.

[0058] (Example of Hardware Configuration of Inference Device) Next, an example of the configuration of the inference device 60 will be described. Figure 7 is a block diagram showing the hardware configuration of the inference device 60 according to the first embodiment of the present invention. The inference device 60 is an example of a computer capable of executing various programs. The inference device 60 comprises a processor 61, an input device 62, a display device 63, a recording device 64, and a communication device 65. These components are connected by a bus 66. Interfaces are assumed to be interposed between the bus 66 and each component as needed. The inference device 60 has a configuration similar to that of a general server or PC.

[0059] The processor 61 controls the operation of the entire inference device 60. For example, the processor 61 is at least one of a CPU, MPU, GPU, and FPGA. The processor 61 performs various processes by reading and executing programs (e.g., inference programs) and data stored in the recording device 64. The processor 61 may be composed of multiple processors.

[0060] The input device 62 is a user interface that receives input from the user to the inference device 60, and is, for example, a touch panel, touchpad, keyboard, mouse, or buttons. The display device 63 is a display that shows application screens and the like to the user of the inference device 60 according to the control of the processor 61.

[0061] The recording device 64 includes a main memory and an auxiliary memory. The main memory is a semiconductor memory such as RAM. RAM is a volatile storage medium that allows for high-speed reading and writing of information and is used as a storage area and work area when the processor 61 processes information. The main memory may also include ROM, which is a read-only non-volatile storage medium. The auxiliary memory stores various programs and data used by the processor 61 when executing each program. The auxiliary memory may be any non-volatile storage or non-volatile memory that can store information and may be removable.

[0062] The communication device 65 exchanges data with the information processing terminal 2 or other computers such as a server via the network, and is, for example, a wireless LAN module. The communication device 65 can also be other wireless communication devices or modules such as a Bluetooth® module, or wired communication devices or modules such as an Ethernet® module or a USB interface.

[0063] <Description of the Function of the Inference Device> Next, the functions of the inference device 60 will be described. The reward granting system 10, including the inference device 60, consists of a user interface part (referred to as the UI part) that acquires check-in images and a backend part (referred to as the BE part) that verifies the check-in images using two AI models. First, the user interface part that the user directly sees will be described. The user interface for check-in in this embodiment and each area will be described with reference to Figures 8 to 10.

[0064] <Explanation of Each Area> First, with reference to Figure 8, each area displayed on the display device (output device 27) of the information processing terminal 2 will be explained. Figure 8 shows examples of the shooting area, internal area, and surrounding image recognition area.

[0065] The shooting area display unit 81 represents the screen displayed on the information processing terminal 2, which is used in portrait orientation. The shooting area 101 is the area enclosed by a dashed line in the figure, and is the area that the camera 28 can shoot. The display area 102 is the area filled with hatching in the figure. The display device displays the image that is captured in the display area 102 from the area that can be captured in the shooting area 101.

[0066] The shooting area 103 is the area enclosed by the dashed line in the figure, and represents the size of the captured image. The product photographed by the camera 28 is recorded in the recording device 22 shown in Figure 2 as a captured image 83 (see Figure 11 described later) that is the size displayed in the shooting area 103. The aspect ratio of the shooting area 103 is set to, for example, 10:13.

[0067] The internal area 104 is the area inside the shooting area 103, obtained by scaling the shooting area 103 down about its center point. The area of the internal area 104 is adjusted to be, for example, 70% or less of the area of the shooting area 103. The position of the internal area 104 is given by the x and y coordinates of its upper-left corner, and its size is given by its width w and height h. The area remaining after removing the internal area 104 from the shooting area 103 is called the surrounding image recognition area 105. The area of the surrounding image recognition area 105 is smaller than the area of the shooting area 103. For example, the area of the surrounding image recognition area 105 is adjusted to be 30% or more of the area of the shooting area 103.
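As an illustration of this geometry, the sketch below derives (x, y, w, h) for an internal area scaled about the center of the shooting area so that it covers a given fraction of the area. The function names and the uniform scaling of both sides are assumptions made for the example; scaling each side by the square root of the area ratio is simply one way to obtain the stated area.

```python
import math


def internal_region(img_w, img_h, area_ratio=0.7):
    """Return (x, y, w, h) of a centered region covering area_ratio
    of an img_w x img_h image (hypothetical helper)."""
    if not 0 < area_ratio <= 1:
        raise ValueError("area_ratio must be in (0, 1]")
    # Scaling both sides by sqrt(r) scales the area by r.
    scale = math.sqrt(area_ratio)
    w = img_w * scale
    h = img_h * scale
    # Center the region: (x, y) is the upper-left corner.
    x = (img_w - w) / 2
    y = (img_h - h) / 2
    return x, y, w, h


def peripheral_fraction(img_w, img_h, region):
    """Fraction of the image left outside the internal region."""
    _, _, w, h = region
    return 1.0 - (w * h) / (img_w * img_h)


region = internal_region(1000, 1300, area_ratio=0.7)
print(region, peripheral_fraction(1000, 1300, region))
```

With an area ratio of 0.7 the remaining fraction is 0.3, which matches the 70% and 30% figures given as examples.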

[0068] Figure 9 shows an example of the display of the check-in screen W1 when check-in is successful. The check-in screen W1 is displayed on the output device 27 of the information processing terminal 2 and outputs whether the check-in was successful or not.

[0069] The display example (1) in Figure 9 shows an example of a check-in screen W1 in which an application installed on the information processing terminal 2 activates the camera 28 and displays the video from the camera 28 on the page in real time. The check-in screen W1 displays a boundary frame 106 with an annular boundary line. The boundary frame 106 is rectangular and indicates the boundary between the internal area 104 and the surrounding image recognition area 105. In addition, a message instructing the user to place the item to be checked in within the boundary frame 106 is displayed in the message display area 107 shown at the bottom of the check-in screen W1.

[0070] The surrounding image recognition area 105 is displayed at a lower brightness than the internal area 104, which is inside the boundary line representing the boundary frame 106. Therefore, the user tries to operate the camera 28 to take a picture so that the product is inside the boundary frame 106. As a result, the camera 28 captures not only the image in the internal area 104 but also the surrounding image recognition area 105, which is outside the boundary frame 106. The diagonal line crossing the internal area 104 and the surrounding image recognition area 105 represents the edge of the table on which the product to be checked in is placed. When the user presses the capture button displayed in the message display area 107 in the display example (1) of Figure 9, the camera 28 takes a picture of the product, and the display example (2) of Figure 9 is displayed.

[0071] When the user presses the "Retake" button displayed in the message display area 107 in the display example (2) of Figure 9, they return to the display example (1), allowing the user to retake the photo of the product. When the user presses the "OK" button displayed in the message display area 107, the information processing terminal 2 transmits the entire image, including the internal area 104 and the surrounding image recognition area 105, as the captured image 83 to the inference device 60. The object recognition AI model 43 of the inference device 60 identifies the product in the captured image 83.

[0072] A key feature of the user interface shown in the display examples (1) and (2) of Figure 9 is that the internal area 104 represented by the boundary frame 106 is a rectangle narrower than the shooting area 103 that represents the shooting range. Furthermore, it is also characterized by transmitting the captured image and area information, including the coordinates of the boundary frame 106 within that image, to the inference device 60. Although this boundary frame 106 is not drawn in the check-in video, the coordinates x, y of the top-left vertex of the boundary frame 106, and the width w and height h of the boundary frame 106 are transmitted to the inference device 60 as area information (x, y, w, h).

[0073] The display example (3) in Figure 9 shows an example of displaying the check-in result. The check-in result displays a message indicating that the check-in was successful because the identified product combination is valid and the captured image 83 is not fraudulent, and that a benefit (an example of a reward) will be sent to the user who successfully checked in.

[0074] Figure 10 shows an example of the display of the check-in screen W1 when check-in fails. In example (1) of Figure 10, only the drink and burger are displayed from a set containing multiple items. If the user presses the capture button in this state, the captured image will be as shown in example (2) of Figure 10, showing only the drink and burger. In this case, check-in is determined to have failed.

[0075] If the check-in is determined to be unsuccessful, a message indicating that check-in could not be completed will be displayed in the message display area 107, as shown in the example display (3) of Figure 10. The message display area 107 will also display information including a request to retake the photo of the set after confirming all items (drinks, burgers, and fries) included in the set being checked in, and the names of the items that the object recognition AI model 43 could not identify.
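The list of items that could not be identified can be derived by comparing the expected set against the detected labels. The following is a minimal sketch; the function name `missing_items` and the label strings are hypothetical, and a multiset comparison is assumed so that a set containing two of the same item would also be handled.

```python
from collections import Counter


def missing_items(required, detected_labels):
    """Return the required items that were not detected, sorted by name.

    Both arguments are lists of label strings (hypothetical format).
    Extra detections that are not part of the set are ignored.
    """
    # Counter subtraction keeps only items whose required count
    # exceeds the detected count.
    remaining = Counter(required) - Counter(detected_labels)
    return sorted(remaining.elements())


required_set = ["drink", "burger", "fries"]
print(missing_items(required_set, ["drink", "burger"]))
```

For the failure case of Figure 10, where only the drink and burger are captured, the result would contain the fries, which is the kind of information shown in the message display area 107.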

[0076] (Example of Functional Configuration of Reward System) Next, an example of the functional configuration of the reward system 10 according to the first embodiment will be described. Figure 11 is a block diagram showing an example of the functional configuration of the reward system 10 according to the first embodiment. The reward system 10 according to the first embodiment is configured as a client-server system in which the information processing terminal 2 is the client and the inference device 60 is the server. The information processing terminal 2 is used as the UI unit and the inference device 60 is used as the BE unit.

[0077] First, an example of the functional configuration of the information processing terminal 2 will be described. The information processing terminal 2 includes a shooting area display unit 81, a shooting unit 82, a captured image 83 stored in a recording device 22 (see Figure 2), a terminal-side communication unit 84, and a result display unit 85. The shooting area display unit 81, the shooting unit 82, the terminal-side communication unit 84, and the result display unit 85 are recorded as part of an inference program in the recording device 22, which is an example of a computer-readable storage medium.

[0078] The shooting area display unit 81 displays the shooting area 103 and boundary frame 106, etc., on the display device (output device 27) of the information processing terminal 2. The shooting unit 82 generates a captured image 83 of the product captured in the shooting area 103 using the functions of the camera 28 shown in Figure 2. The captured image 83 is recorded in the recording device 22 and output to the terminal-side communication unit 84 by a dedicated application launched on the information processing terminal 2.

[0079] The terminal-side communication unit 84 transmits the captured image 83 to the inference device 60 via the network N. In addition to the captured image 83, the terminal-side communication unit 84 also transmits area information of an internal region 104 that is set to be smaller than the captured image 83, and user information of the user operating the information processing terminal 2. The area information of the internal region 104 represents the position and size of the internal region 104 within the captured image 83, and is represented by (x, y, w, h) as shown in Figure 8. The terminal-side communication unit 84 also receives the object identification determination result from the inference device 60.

[0080] The result display unit 85 displays the check-in determination result received by the terminal-side communication unit 84. As will be described later, the check-in determination result includes either a success or failure in the check-in. If the check-in is successful, the user can expect to be given benefits linked to their user information. On the other hand, if the check-in fails, it includes information about products that the object recognition AI model 43 could not identify. Therefore, the user can retake photos of the products based on the check-in determination result displayed by the result display unit 85.

[0081] Next, an example of the functional configuration of the inference device 60 will be described. The inference device 60 comprises an inference device-side communication unit 91, captured images 83 recorded in a recording device 64, an image normalization unit 92, an object recognition AI model 43, an object recognition determination unit 93, an image classification AI model 53, a score determination unit 94, and a determination result table 95. The object recognition AI model 43, the image classification AI model 53, the inference device-side communication unit 91, the image normalization unit 92, the object recognition determination unit 93, the score determination unit 94, and the determination result table 95 are recorded as part of the inference program in the recording device 64, which is an example of a computer-readable storage medium.

[0082] The inference device-side communication unit 91 acquires a captured image 83 containing multiple products and a surrounding image identification region 105 set in a part of the captured image 83 for identifying the surrounding images of the products, which are transmitted from the terminal-side communication unit 84 of the information processing terminal 2. The surrounding image identification region 105 is a region identified from the captured image 83 by the coordinate information of one corner of a rectangular boundary line and the width and height measured from that corner, as shown in Figure 8, and is acquired as region information. The inference device-side communication unit 91 also acquires user information transmitted from the terminal-side communication unit 84 of the information processing terminal 2 via the network N. The captured image 83, region information, and user information are stored in the recording device 64 of the inference device 60. The inference device-side communication unit 91 also transmits the object identification determination result from the object identification determination unit 93 to the terminal-side communication unit 84 of the information processing terminal 2 via the network N.

[0083] The image normalization unit 92 normalizes the captured image 83 read from the recording device 64. Products are photographed in various aspect ratios depending on the type of information processing terminal 2. Normalization of the captured image 83 is performed so that the object recognition AI model 43 can process captured images 83 taken in various aspect ratios in a common manner. Normalization of the captured image 83 is extremely important for improving the accuracy of product recognition by the object recognition AI model 43.

[0084] The image normalization unit 92 calculates the size of the internal region 104 corresponding to the image after normalization. This is called the normalization frame. The image normalization unit 92 then determines the resampling ratio according to the ratio between the normalization frame and the size of the input captured image 83. After determining the region to be retained after resampling, the image normalization unit 92 actually resamples the captured image 83, thereby enabling normalization of the captured image 83 centered on the internal region 104.
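The paragraph above does not give formulas, so the following is only one possible reading of the normalization steps. It assumes that the normalization frame is the size the internal region should have in the normalized image, that the resampling ratio is chosen so the internal region fits inside that frame, and that the retained region is a window centered on the internal region. All names and parameters are hypothetical.

```python
def normalization_plan(region, out_w, out_h, frame_w, frame_h):
    """Return (ratio, keep) for normalizing around an internal region.

    region  : (x, y, w, h) of the internal region in the input image
    out_*   : size of the normalized image (assumed parameter)
    frame_* : normalization frame, i.e. the internal region size in
              the normalized image (assumed parameter)
    keep    : (x, y, w, h) window of the input image to retain
    """
    x, y, w, h = region
    # Use the smaller ratio so the internal region fits inside the frame.
    ratio = min(frame_w / w, frame_h / h)
    # Center of the internal region in input coordinates.
    cx, cy = x + w / 2, y + h / 2
    # Size of the input window that maps onto the output image.
    keep_w, keep_h = out_w / ratio, out_h / ratio
    keep = (cx - keep_w / 2, cy - keep_h / 2, keep_w, keep_h)
    return ratio, keep


# Internal region of a 1000 x 1300 capture, normalized to 500 x 650.
print(normalization_plan((100, 130, 800, 1040), 500, 650, 400, 520))
```

In this example the ratio is 0.5 and the retained window is the whole 1000 x 1300 capture. A window that extends past the image edges would need padding or clamping, which the sketch leaves out.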

[0085] The object recognition AI model 43 is copied from the recording device 34 of the learning device 30 to the recording device 64 of the inference device 60. The object recognition AI model 43 takes the normalized captured image 83 as input and sets an object detection region 110 (see Figure 14 described later) where an item is detected for each of the multiple items shown in the captured image 83, and identifies an item for each object detection region 110.

[0086] The object recognition determination unit 93 performs a first determination for user authentication based on the position of the object detection area 110 relative to the surrounding image recognition area 105. As shown in Figure 15, which will be described later, the object recognition determination unit 93 determines that the first determination for authentication has failed if the overlap between the object detection area 110 and the surrounding image recognition area 105 is greater than or equal to a predetermined amount (for example, 10 to 30% or more of the area of the surrounding image recognition area 105).

[0087] Furthermore, the object identification determination unit 93 determines that the first determination performed for authentication has failed if the overlap between the object detection area 110 and the surrounding image identification area 105 is less than the predetermined amount but the set of identified products does not satisfy the set condition. The set condition is used to determine whether the set of identified products corresponds to a predetermined combination of multiple different products. Conversely, the object identification determination unit 93 determines that the first determination performed for authentication has succeeded if the overlap with the surrounding image identification area 105 is less than the predetermined amount and the set of identified products satisfies the set condition.
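The two conditions of the first determination can be combined as in the sketch below. The function name, the label strings, and the default limit of 0.2 (picked from within the 10 to 30% range given as an example) are assumptions for illustration only.

```python
from collections import Counter


def first_determination(overlap_ratio, labels, required_set, max_overlap=0.2):
    """Return True when the first determination succeeds.

    overlap_ratio : share of the surrounding area covered by detections
    labels        : product labels identified in the captured image
    required_set  : predetermined combination of products
    max_overlap   : predetermined amount (assumed value)
    """
    # Fail when detections extend too far into the surrounding area.
    if overlap_ratio >= max_overlap:
        return False
    # Set condition: every required product must have been identified.
    missing = Counter(required_set) - Counter(labels)
    return not missing


combo = ["drink", "burger", "fries"]
print(first_determination(0.05, ["burger", "drink", "fries"], combo))
print(first_determination(0.05, ["burger", "drink"], combo))
```

The overlap check runs first, so an image whose products spill into the surrounding area fails regardless of which products were identified.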

[0088] The object identification determination result determined by the first determination is transmitted to the information processing terminal 2 via the inference device side communication unit 91. The object identification determination result is also recorded in the determination result table 95. The contents recorded in the determination result table 95 are referred to as appropriate by the administrator of the inference device 60.

[0089] Thus, the inference device 60 can implement a process that does not allow check-in if the area of the product detected by the object recognition AI model 43 deviates from the shooting range in the center of the screen, that is, if the image of the product encroaches on the surrounding image recognition area 105 and the portion of the surrounding image recognition area 105 not covered by the product becomes less than a certain area (for example, less than 70 to 90% of the area of the surrounding image recognition area 105). Therefore, the reward granting system 10 can realize a mechanism that naturally leads the user to photograph the surrounding image recognition area 105. Furthermore, in this embodiment, the object recognition AI model 43 can identify products without depending on the size or aspect ratio of a specific boundary frame 106.

[0090] The image classification AI model 53 is copied from the recording device 34 of the learning device 30 to the recording device 64 of the inference device 60. When the object recognition determination unit 93 determines that the first determination is successful, the image classification AI model 53 classifies the captured image 83 into positive or negative examples and outputs a score.

[0091] The score determination unit 94 performs a second determination for authentication, determining the captured image 83 as a positive example if the score output by the image classification AI model 53 is less than the classification threshold, and determining the captured image 83 as a negative example if the score is equal to or greater than the classification threshold. This second determination is performed to detect check-in fraud using the captured image 83, which includes not only the internal region 104 shown in Figure 8 but also the surrounding image identification region 105.

[0092] The scores output by the image classification AI model 53 span a range of values, and using only one classification threshold may result in misclassifying a captured image 83 that should be classified as a positive example as a negative example, or conversely, misclassifying a captured image 83 that should be classified as a negative example as a positive example. Therefore, the score determination unit 94 divides the classification threshold into a first classification threshold and a second classification threshold greater than the first classification threshold, and determines the captured image 83 accordingly. As shown in Figure 17 described later, the score determination unit 94 determines "Pass", representing a positive example, if the score is less than the first classification threshold, and "Fail", representing a negative example, if the score is equal to or greater than the second classification threshold. Captured images 83 determined as "Fail" are audited by the administrator. In addition, if the score is equal to or greater than the first classification threshold and less than the second classification threshold, the score determination unit 94 determines "Borderline", meaning that the image cannot be classified as either a positive or a negative example. Captured images 83 determined as "Borderline" are reviewed by the administrator. The score output by the image classification AI model 53 is recorded in the determination result table 95 as the result of the second determination by the score determination unit 94, and the object identification determination unit 93 can refer to the result of the second determination.
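The two-threshold rule can be written compactly as below. The threshold values 0.3 and 0.7 and the function name are placeholders chosen for the example; the description does not state concrete values.

```python
def classify_score(score, first_threshold=0.3, second_threshold=0.7):
    """Map a classifier score to "Pass", "Borderline", or "Fail".

    Lower scores indicate a positive (legitimate) example, following
    the description. Threshold values are assumed placeholders.
    """
    if not first_threshold < second_threshold:
        raise ValueError("first_threshold must be below second_threshold")
    if score < first_threshold:
        return "Pass"        # positive example
    if score >= second_threshold:
        return "Fail"        # negative example, audited by the administrator
    return "Borderline"      # neither; reviewed by the administrator


for s in (0.1, 0.3, 0.69, 0.7):
    print(s, classify_score(s))
```

A score exactly equal to the first threshold falls into "Borderline" and a score exactly equal to the second threshold falls into "Fail", matching the "less than" and "equal to or greater than" wording of the paragraph.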

[0093] In the reward system 10, rewards are awarded to users whose first determination is successful and whose second determination is "Pass". For example, a user whose captured image 83 is determined to be a positive example by the score determination unit 94 is awarded a reward. On the other hand, if the score determination unit 94 determines that the captured image 83 is a negative example, the user's authentication fails. A user who has submitted a captured image 83 that is determined to be a negative example is highly likely to have performed an unauthorized operation. For users who have performed an unauthorized operation, even if a reward has already been given, the administrator may revoke the reward or delete the account.

[0094] <Example of processing by information processing terminal and inference device> In the check-in service provided to the user by the reward system 10, the inference device 60 receives from the information processing terminal 2 an image taken by the application and coordinate information of the boundary frame 106 in the image as a check-in token. It is preferable that the reward system 10 does not accept images from any client, but only images taken from a specific application or a specific web page. Therefore, secure session management is required, for example, so that only a specific user can send the captured image 83 used for check-in determination.

[0095] Here, the check-in service performed by the information processing terminal 2 and the inference device 60, including the processing on the inference device 60 side, will be explained with reference to the example of the screen displayed on the information processing terminal 2 shown in Figure 9. Figure 12 is a sequence diagram showing an example of the processing between the information processing terminal 2 and the inference device 60.

[0096] The user operating the information processing terminal 2 launches an application and activates the camera 28 through the application. When the camera 28 is activated, the output device 27 displays the shooting area 103, which includes the internal area 104, the surrounding image recognition area 105, and the like, as shown in the display example (1) of Figure 9 (S21).

[0097] When a user takes a picture of a set (multiple products) at once (S22), a captured image 83 showing multiple products is displayed, as shown in the display example (2) of Figure 9. In addition, a message is displayed in the message display area 107 to confirm whether the user wants to send the captured image to the server (inference device 60) and check it in. When the user presses the OK button, the terminal-side communication unit 84 sends the captured image 83, area information, and user information to the inference device 60 (S23).

[0098] The inference device side communication unit 91 of the inference device 60 receives the captured image 83, region information, and user information from the information processing terminal 2 (S31). The inference device side communication unit 91 records the captured image 83, region information, and user information acquired from the information processing terminal 2 in the recording device 64 (S32).

[0099] Next, object recognition inference processing is performed (S33), and the object recognition result is output to the object recognition determination unit 93. Here, an example of the object recognition inference processing in step S33 will be described. Figure 13 is a flowchart showing an example of object recognition inference processing.

[0100] First, the image normalization unit 92 acquires the captured image 83 read from the recording device 64 and normalizes the captured image 83 (S41).

[0101] Next, the object recognition AI model 43 detects the internal region 104 from the normalized captured image 83 (S42). Then, the object recognition AI model 43 extracts the object detection region of the object (for example, a product) in the normalized captured image 83 and performs object recognition (assigning a product name) (S43), and proceeds to step S34.

[0102] Figure 14 shows an example of object recognition results. In step S33 of Figure 12 and the object recognition inference process shown in Figure 13, products are individually identified based on the captured image 83 in which multiple products are visible. For example, in the shooting examples (1) to (4) of Figure 14, even if the cup, burger, and fries are arranged arbitrarily, the product name of each identified product is attached as a label to the object detection area 110 indicated by the bounding box detected for that product. Furthermore, as shown in shooting examples (3) and (4), even if part of the cup or burger is hidden by the fries, the products are correctly identified.

[0103] Returning to the explanation of Figure 12, in the object recognition inference process, the object recognition AI model 43 detects products based on the captured image 83. At this time, the object recognition AI model 43 outputs the label (identifier string) of the identified product, the score of the identified product, and detection area information (x, y, w, h) representing the object detection area 110 to the object recognition determination unit 93 as the object recognition result.

[0104] The object identification determination unit 93 performs object identification determination processing based on the object identification result (S34). Here, an example of the object identification determination processing in step S34 will be described. Figure 15 is a flowchart of an example of object identification determination processing. Object identification determination processing is an example of a first determination performed for user authentication.

[0105] First, the object recognition determination unit 93 compares the surrounding image recognition area 105 acquired by the inference device side communication unit 91 in step S31 with the object detection area 110 identified from the detection area information (S51). Next, the object recognition determination unit 93 determines whether the overlap of the object detection area 110 with respect to the surrounding image recognition area 105 is greater than or equal to a predetermined amount (S52). The overlap of the object detection area 110 with respect to the surrounding image recognition area 105 is determined, for example, by the ratio of the area of the object detection area 110 that extends into the surrounding image recognition area 105 to the area of the surrounding image recognition area 105. For example, with the threshold set to a value in the range of 10 to 30% of the area of the surrounding image recognition area 105, the overlap is determined to be greater than or equal to the predetermined amount when the overlapping area reaches that threshold. Note that coordinate calculation or collision detection checks may be used to determine the overlap of the object detection area 110 with respect to the surrounding image recognition area 105.
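The overlap ratio described in steps S51 and S52 can be illustrated with a short coordinate calculation. The following is a minimal sketch, not the implementation of the embodiment. It assumes that the surrounding image recognition area 105 is the part of the captured image lying outside a rectangular internal area 104, and that all rectangles are given as (x, y, w, h) with the origin at the top-left corner. The names `Rect`, `surrounding_overlap_ratio`, and `overlaps_too_much`, and the default threshold of 0.2 (one value within the 10 to 30% range), are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float

    @property
    def area(self) -> float:
        return max(self.w, 0.0) * max(self.h, 0.0)


def intersection_area(a: Rect, b: Rect) -> float:
    """Area shared by two axis-aligned rectangles (0.0 if they do not overlap)."""
    left = max(a.x, b.x)
    top = max(a.y, b.y)
    right = min(a.x + a.w, b.x + b.w)
    bottom = min(a.y + a.h, b.y + b.h)
    return max(right - left, 0.0) * max(bottom - top, 0.0)


def surrounding_overlap_ratio(detection: Rect, image: Rect, internal: Rect) -> float:
    """Ratio of the detection area lying in the surrounding region to that region's area.

    Assumption: the surrounding region is the captured image minus the
    internal rectangle, and the internal rectangle lies inside the image.
    """
    surrounding_area = image.area - internal.area
    if surrounding_area <= 0:
        raise ValueError("internal region must be smaller than the image")
    inside_image = intersection_area(detection, image)
    inside_internal = intersection_area(detection, internal)
    return (inside_image - inside_internal) / surrounding_area


def overlaps_too_much(detection: Rect, image: Rect, internal: Rect,
                      threshold: float = 0.2) -> bool:
    """True when the overlap is greater than or equal to the chosen threshold."""
    return surrounding_overlap_ratio(detection, image, internal) >= threshold
```

For a 100 x 100 image with an 80 x 80 internal area offset by 10 pixels, a detection box lying entirely inside the internal area yields a ratio of 0, while a 50 x 50 box anchored at the image corner yields 900 / 3600 = 0.25 and would be rejected under the assumed threshold.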

[0106] In step S52, if the object recognition determination unit 93 determines that the overlap of the object detection area 110 with respect to the surrounding image recognition area 105 is greater than or equal to a predetermined amount (YES in S52), it determines that the check-in is "failed". The object recognition determination unit 93 then outputs the object recognition determination result of "failure" (S53) and returns to step S35 in Figure 12.

[0107] In step S52, if the object identification determination unit 93 determines that the overlap is less than the predetermined amount (NO in S52), it determines whether the set of products identified by the object identification AI model 43 satisfies the set condition (S54). Specifically, the object identification determination unit 93 confirms, by pattern matching on the identifier strings, that the combination of product names included in the set menu is contained in the labels output from the object identification AI model 43.

[0108] In step S54, if the object identification determination unit 93 determines that the set of identified items satisfies the set condition (YES in S54), it determines that the check-in is "successful," that is, the first determination is successful. The object identification determination unit 93 then outputs the object identification determination result of "success" (S55), and returns to step S35 in Figure 12.

[0109] On the other hand, in step S54, if the object identification determination unit 93 determines that the set of identified products does not satisfy the set condition (NO in S54), it determines that the check-in is "failed," that is, the first determination is a failure. The object identification determination unit 93 then outputs the object identification determination result, which includes the fact that the check-in was determined to be "failed" and the product names that do not match the combination (S56), and returns to step S35 in Figure 12. As the product names that do not match the predetermined combination among the products identified by the object identification AI model 43 are output, the information processing terminal 2 displays a message for the products that do not match the predetermined combination, as shown in the display example (3) in Figure 10.
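The flow of steps S52 to S56 can be summarized as follows. This is a minimal sketch under stated assumptions: the overlap ratios have already been computed for each detection, the set condition is treated as "every product of the set menu appears among the detected labels", and the products reported on failure are the detected labels that are not part of the set menu. The function name `determine_check_in`, the `Determination` tuple, the 0.2 limit, and the label "Salad" are hypothetical; "Cup" and "Potato" follow the labels mentioned for Figure 16.

```python
from collections import Counter
from typing import NamedTuple, Sequence, Tuple


class Determination(NamedTuple):
    result: str                   # "success" or "failure"
    mismatched: Tuple[str, ...]   # detected labels outside the set menu


def determine_check_in(labels: Sequence[str],
                       overlap_ratios: Sequence[float],
                       set_menu: Sequence[str],
                       overlap_limit: float = 0.2) -> Determination:
    """First determination: overlap check (S52), then set condition (S54)."""
    # S52 -> S53: any detection reaching too far into the surrounding area fails.
    if any(ratio >= overlap_limit for ratio in overlap_ratios):
        return Determination("failure", ())

    # S54: the set menu must be contained in the detected labels.
    required = Counter(set_menu)
    found = Counter(labels)
    missing = required - found
    if not missing:
        return Determination("success", ())        # S55

    # S56: report the detected products that do not belong to the set.
    extras = found - required
    return Determination("failure", tuple(sorted(extras.elements())))
```

For example, `determine_check_in(["Cup", "Burger", "Salad"], [0.0, 0.0, 0.0], ["Cup", "Burger", "Potato"])` returns a failure that names "Salad", which corresponds to the kind of information the terminal could use for the message in display example (3) of Figure 10.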

[0110] Here, we will explain an example of object recognition results by the object recognition AI model 43. Figure 16 shows an example of object recognition results and check-in results. The object recognition result (1) of the captured image 83 shown in Figure 16 is indicated by the object detection region 110, which is a bounding box of a rectangular frame indicated by a dashed line, showing how the object recognition AI model 43 detected each product in the captured image 83. Each object detection region 110 is labeled with the product name identified by the object recognition AI model 43. Since the object detection region 110 of each product detected by the object recognition AI model 43 is inside the internal region 104, the check-in is determined to be "successful".

[0111] The object recognition result (2) of the captured image 83 shown in Figure 16 is likewise indicated by the object detection regions 110, which are rectangular frames drawn with dashed lines, showing how the object recognition AI model 43 detected each product in the captured image 83. However, since the overlap between the surrounding image recognition region 105 and the object detection regions 110 labeled "Cup" and "Potato" is greater than or equal to the predetermined amount, the check-in is judged to be "failure".

[0112] Returning to the explanation of Figure 12, the inference device communication unit 91 transmits the object identification determination result to the information processing terminal 2 regardless of whether the check-in determination is successful or unsuccessful (S35). The object identification determination result, which is the result of the first determination, is recorded in the determination result table 95, as shown in Figure 18.

[0113] The terminal-side communication unit 84 of the information processing terminal 2 receives the object identification determination result from the inference device 60 (S24). The output device 27 of the information processing terminal 2 displays the determination result based on the object identification determination result (S25), and the processing of the information processing terminal 2 ends.

[0114] On the other hand, in the inference device 60, if the check-in is determined to be "successful" in step S34, that is, if the first determination is determined to be successful, the image classification inference process is continued (S36). Here, an example of the image classification inference process in step S36 will be described. Figure 17 is a flowchart showing an example of the image classification inference process.

[0115] First, the image classification AI model 53 classifies the captured image 83 read from the recording device 64 and outputs a score (S61). Next, the score determination unit 94 compares the score with a classification threshold and performs a second determination for authentication, determining whether the captured image 83 is a positive or negative example.

[0116] Specifically, the score determination unit 94 determines whether the score is less than the first classification threshold (S62). If the score is less than the first classification threshold (YES in S62), the score determination unit 94 writes "Pass" to the determination result table 95 and terminates the process. A "Pass" for the captured image 83 means that it is a valid image.

[0117] If the score is equal to or greater than the first classification threshold (NO in S62), the score determination unit 94 determines whether the score is equal to or greater than the second classification threshold, which is greater than the first classification threshold (S64). If the score is equal to or greater than the second classification threshold (YES in S64), the score determination unit 94 writes "Fail" to the determination result table 95 and terminates the process. A "Fail" for the captured image 83 means that it is an invalid image.

[0118] If the score is not equal to or greater than the second classification threshold (NO in S64), the score is equal to or greater than the first classification threshold but less than the second classification threshold. In this case, the score determination unit 94 writes "Borderline" to the determination result table 95 and terminates the process. If the captured image 83 is "Borderline", it means that it is not possible to determine whether it is a valid image or an invalid image.
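The three-way decision of steps S62 and S64 can be written compactly. The sketch below follows the comparisons stated above (a lower score indicates a valid image); the function name `classify_score` and the threshold values used in the usage note are hypothetical, since the description does not give concrete values.

```python
def classify_score(score: float, first_threshold: float,
                   second_threshold: float) -> str:
    """Second determination: map a classification score to a table entry."""
    if not first_threshold < second_threshold:
        raise ValueError("second threshold must be greater than the first")
    if score < first_threshold:        # YES in S62
        return "Pass"
    if score >= second_threshold:      # YES in S64
        return "Fail"
    return "Borderline"                # between the two thresholds
```

With assumed thresholds of 0.3 and 0.7, a score of 0.1 gives "Pass", a score of 0.3 gives "Borderline", and a score of 0.7 gives "Fail".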

[0119] <Example of the configuration of the judgment result table> Figure 18 shows an example of the configuration of the judgment result table 95. The judgment result table 95 manages user information, captured images 83, and the judgment results of each AI model. The judgment result table 95 has the following items: user ID, image reception date and time, captured image, object identification result, object identification judgment result, image classification result, future processing, and authentication result.

[0120] The User ID field stores the User ID of the user who sent the captured image 83. If a single user sends images multiple times, a record is created each time an image is sent. The Image Reception Date and Time field stores the date and time (year, month, day, and time) when the inference device 60 received the captured image 83 sent from the information processing terminal 2. The Captured Image field stores the file of the captured image 83 received from the information processing terminal 2. Here, an example of a JPEG image file is shown, but any image file extension is acceptable. The Captured Image field may also store location information (path, etc.) indicating the location of the captured image 83 file.

[0121] The object recognition result item stores the object recognition result obtained when the object recognition AI model 43 identifies the products in the captured image 83. Here, the name of the product included in the set menu is stored as the object recognition result. The object recognition judgment result item stores "success" or "failure" as the object recognition judgment result.

[0122] The image classification result item stores the image classification result determined by the score determination unit 94 based on the score obtained by the image classification AI model 53 in classifying the captured image 83 as either a positive or negative example. As described above, if the image classification AI model 53 classified the captured image 83 as a positive example, it is represented as "Pass," and if it classified it as a negative example, it is represented as "Fail." Since the image classification AI model 53 represents the image classification result as a score, if neither the score for the positive example nor the negative example is significantly high and manual verification is required, the image classification result is represented as "Borderline." Also, if the object identification determination result is "Failure," the image classification inference process is not performed, and the image classification result remains blank.

[0123] The future processing item stores the processing to be performed next, based on the object identification determination result and the image classification result. The authentication result item stores the final authentication result. The final authentication result may be registered by the administrator, because even if the first determination is successful, the second determination may fail.
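One record of the determination result table 95 could be modeled as shown below. This is only an illustrative sketch: the field names and types are assumptions derived from the items listed for Figure 18, and the sample values in the usage note are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class DeterminationRecord:
    """One row of the determination result table (one row per received image)."""
    user_id: str
    received_at: datetime                      # image reception date and time
    captured_image: str                        # image file name or storage path
    object_identification_result: Tuple[str, ...]   # identified product names
    object_identification_determination: str   # "success" or "failure"
    # Left blank (None) when the first determination failed, because the
    # image classification inference process is then not performed.
    image_classification_result: Optional[str] = None   # "Pass" / "Fail" / "Borderline"
    future_processing: Optional[str] = None
    authentication_result: Optional[str] = None          # may be set by the administrator
```

A record such as `DeterminationRecord("user2", datetime(2025, 1, 1, 12, 0), "image.jpg", ("Cup",), "failure")` leaves the image classification result empty, matching the case in which the object identification determination result is "failure".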

[0124] If, as with user1, the object identification determination result is "successful" and the image classification result is "Pass," the final user authentication is determined to be "successful," and the user is granted a benefit (an example of a reward).

[0125] If, as with user2, the object identification determination result for the first submitted captured image 83 is "failure," the final user authentication is determined to be "failure," and the user is not granted any benefits. However, if the object identification determination result for the second submitted captured image 83 is "successful," an image classification result is obtained. If the image classification result is "Pass," the final user authentication is determined to be "successful," and the user is granted benefits.

[0126] As with user3, if the object identification determination result is "successful" but the image classification result is "Fail," there is a high possibility that the captured image 83 is fraudulent. For this reason, the administrator of the inference device 60 audits the captured image 83. If it is confirmed that the captured image 83 was obtained fraudulently, the benefits for user3 will be revoked. In other words, even if the object identification determination result is "successful" and the image classification result is "Pass," if the final user authentication is determined to be "failed," the user will not be granted any benefits. In this embodiment, since the image classification AI model 53 classifies fraudulent activity in the captured image 83, users attempting fraudulent operations (cheating) will inevitably have to repeatedly try fraudulent operations. As a result, the history of fraudulent operations is accumulated in the determination result table 95, making it easier for the administrator to take measures such as revoking benefits for users attempting fraud or suspending their accounts.

[0127] In cases like user4, where the object identification determination result is "successful" but the image classification result is "Borderline," the image classification AI model 53 was unable to correctly classify the captured image 83, so the administrator checks the captured image 83. Therefore, the final user authentication will be either "successful" or "failed," and the result confirmed by the administrator is stored in the authentication result field.
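The four cases of user1 to user4 can be condensed into a small routing function. This is a sketch under assumptions, not the logic of the embodiment: it returns a pair of (authentication result, future processing), uses `None` for an authentication result that is still awaiting the administrator, and the function name `route_authentication` and the wording of the future processing strings are hypothetical.

```python
from typing import Optional, Tuple


def route_authentication(object_determination: str,
                         classification: Optional[str]
                         ) -> Tuple[Optional[str], str]:
    """Return (authentication_result, future_processing) for one record."""
    # First determination failed: no image classification is performed.
    if object_determination != "success":
        return "failure", "no benefit; the user may submit another image"
    if classification == "Pass":          # case of user1
        return "success", "grant benefit"
    if classification == "Fail":          # case of user3
        return None, "administrator audits the image for fraud"
    if classification == "Borderline":    # case of user4
        return None, "administrator checks the image manually"
    raise ValueError(f"unexpected classification result: {classification!r}")
```

In this sketch the "Fail" and "Borderline" cases leave the authentication result open, reflecting that the final result in those cases is registered after the administrator has examined the captured image 83.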

[0128] In the reward system 10 according to the first embodiment described above, product packaging, which contains a lot of noise such as deformation due to folding and dirt, is used as a check-in point, and it is possible to identify combinations of these, i.e., products included in a "set menu," with high accuracy. For this reason, the object recognition AI model 43 can simultaneously identify multiple products to be checked in from products that appear in any captured image 83. In addition, the image classification AI model 53 outputs a score as the result of classifying whether the captured image 83 was fraudulently created or tampered with using the images captured in the surrounding image recognition area 105, and the score determination unit 94 can determine the score and classify the image. For this reason, the inference device 60 can use product packaging, which contains a lot of noise such as deformation due to folding and dirt, as a check-in point, and it is possible to achieve both identification of combinations of these, i.e., "set menus," and fraud prevention measures, thereby preventing fraud.

[0129] Furthermore, by combining the object recognition AI model 43 and the image classification AI model 53, the reward system 10 can be constructed to be robust against fraud while identifying set menus containing various types of products at high speed (processing time within 1 second) and with high accuracy. Specifically, it is possible to detect a set of predetermined products from a captured image 83 containing noise and occlusion, taken using the camera 28 mounted on the information processing terminal 2, and to check whether the captured image has not been intentionally altered or processed.

[0130] Furthermore, the output device 27 of the information processing terminal 2 displays the check-in screen W1, which the user uses to perform the check-in operation. The check-in screen W1 displays a rectangular boundary frame 106 in the center of the screen that is narrower than the actual shooting area 103. This makes it easier for the user to operate the system so that the product to be checked in is captured within the boundary frame 106.

[0131] Furthermore, the object recognition determination unit 93 determines the object recognition determination result as "failure" and does not allow check-in if the object detection area of the product detected by the object recognition AI model 43 extends beyond the internal area 104 specified by the area information. If the object recognition determination result is determined as "failure," a message prompting the user to take a picture so that the product is inside the boundary frame 106 is displayed on the check-in screen W1. Therefore, the user can try checking in again by taking the picture again.

[0132] Furthermore, determining whether the captured image 83 is a fraudulent image requires, in particular, an image of the surrounding image recognition region 105. By naturally encouraging the user to take a picture so that the product is inside the boundary frame 106, an image of the surrounding image recognition region 105 large enough to classify whether it is a fraudulent image is also captured. As a result, the classification accuracy of the image classification AI model 53 is improved, and the tamper resistance of the reward system 10 can be enhanced.

[0133] Furthermore, if the object recognition result is determined to be successful and the captured image 83 is classified as legitimate, users can be rewarded with, for example, in-game rewards. This makes it possible to promote sales by incentivizing users to purchase the menu items that are subject to check-in.

[0134] [Second Embodiment] Next, a reward system according to a second embodiment of the present invention will be described. In the client-server system configured with the reward system 10 according to the first embodiment, the approach of placing the AI model on the server side is only one embodiment. For example, by using a small AI model for check-in purposes, the service can be flexibly designed to match the scale and budget of the initiative. It is also possible to install the small AI model on the information processing terminal 2 and allow check-ins without using network communication.

[0135] Here, the reward system 10A according to the second embodiment will be described with reference to Figure 19. Figure 19 is a block diagram showing an example of the overall configuration of the reward system 10A according to the second embodiment.

[0136] The reward system 10A comprises an information processing terminal 2A and an inference device 60A. In addition to the functional units of the information processing terminal 2 described with reference to Figure 11, the information processing terminal 2A includes a terminal object recognition AI model 45, an image normalization unit 86, and an object recognition determination unit 87. Therefore, the shooting area display unit 81, the shooting unit 82, the terminal-side communication unit 84, the result display unit 85, the image normalization unit 86, the object recognition determination unit 87, and the terminal object recognition AI model 45 are recorded as part of the inference program in a recording device 22 (see Figure 2), which is an example of a computer-readable storage medium. The terminal object recognition AI model 45 is an optimized AI model converted from the object recognition AI model 43 by the object recognition AI model conversion unit 44 shown in Figure 4. By preparing the terminal object recognition AI model 45 in advance, the information processing terminal 2A can make the terminal object recognition AI model 45 perform operations equivalent to those of the object recognition AI model 43.

[0137] The image normalization unit 86 has the same function as the image normalization unit 92 shown in Figure 11. The object identification determination unit 87 has the same function as the object identification determination unit 93 shown in Figure 11. Therefore, in the information processing terminal 2A according to the second embodiment, the image normalization unit 86 normalizes the captured image 83, the terminal object identification AI model 45 identifies the products in the normalized captured image 83, and the object identification determination unit 87 can independently perform the process of determining whether to check in based on the object identification result.

[0138] The captured image 83 is transmitted to the inference device 60A via the terminal-side communication unit 84. The inference device 60A has the inference device-side communication unit 91, image normalization unit 92, score determination unit 94, image classification AI model 53, and determination result table 95 recorded as part of the inference program in a recording device 64, which is an example of a computer-readable storage medium. The inference device-side communication unit 91 records the captured image 83 in the recording device 64. The image classification AI model 53 classifies the captured image 83 into positive or negative examples and outputs a score, and the score determination unit 94 records the image classification result with its determined score in the determination result table 95.

[0139] In the reward system 10A according to the second embodiment described above, the terminal object recognition AI model 45 provided on the information processing terminal 2A identifies the products shown in the captured image 83, and the object recognition determination unit 87 makes a determination. Therefore, if the purpose is only check-in, the processing can be completed on the information processing terminal 2A, thus reducing the operational load on the inference device 60A.

[0140] [Third Embodiment] Next, a reward system according to a third embodiment of the present invention will be described. The reward system according to the third embodiment consists only of an information processing terminal 2B.

[0141] Here, the reward system 10B according to the third embodiment will be described with reference to Figure 20. Figure 20 is a block diagram showing an example of the overall configuration of the reward system 10B according to the third embodiment.

[0142] The reward system 10B includes an information processing terminal 2B. In addition to the functional units of the information processing terminal 2A described with reference to Figure 19, the information processing terminal 2B includes a terminal image classification AI model 55, a score determination unit 88, and a determination result table 89. Therefore, the shooting area display unit 81, the shooting unit 82, the terminal-side communication unit 84, the result display unit 85, the image normalization unit 86, the object recognition determination unit 87, the score determination unit 88, the determination result table 89, the terminal object recognition AI model 45, and the terminal image classification AI model 55 are recorded as inference programs in a recording device 22 (see Figure 2), which is an example of a computer-readable storage medium.

[0143] The terminal image classification AI model 55 is an optimized AI model converted from the image classification AI model 53 by the image classification AI model conversion unit 54 shown in Figure 4. By preparing the terminal image classification AI model 55 in advance, the information processing terminal 2B can make the terminal image classification AI model 55 perform the same operations as the image classification AI model 53.

[0144] Similar to the reward system 10A according to the second embodiment, the information processing terminal 2B can independently perform the process of identifying products from the captured image 83 and determining whether a check-in has occurred. Furthermore, in the information processing terminal 2B, the terminal image classification AI model 55 classifies the captured image 83 into positive or negative examples and outputs a score, and the score determination unit 88 records the image classification result determined by the score in the determination result table 89. The determination result table 89 may be encrypted so that the user cannot read its contents. In addition, the information in the determination result table 89 may be transmitted to a server managed by an administrator.

[0145] In the reward system 10B according to the third embodiment described above, the terminal object recognition AI model 45 provided on the information processing terminal 2B identifies the products shown in the captured image 83, and the object recognition determination unit 87 makes a determination. Furthermore, the terminal image classification AI model 55 provided on the information processing terminal 2B classifies the captured image 83 into positive or negative examples and outputs a score, and the score determination unit 88 makes a determination based on this score. Therefore, even in an offline environment, the information processing terminal 2B can identify products from the captured image 83 and perform the process of classifying the captured image 83 into positive or negative examples.

[0146] The AI models according to the second and third embodiments can be operated on both a server and a client. If the check-in measure is long-term or permanent, it can be operated on the client, which is the information processing terminal 2, and if it is short-term, it can be operated on the server, which is the inference device 60, thereby achieving low costs.

[0147] [Variation] Furthermore, by using WebGPU EP (Execution Provider), it is possible to load the AI model directly into a web browser and perform inference without going through an application. Therefore, the reward system is not necessarily an architecture that depends on a client/server system.

[0148] Furthermore, the AI models capable of realizing each of the above embodiments do not depend on any specific AI model (YOLO, ResNet mentioned above). AI models constructed using various technologies may be used as the object recognition AI model 43 and the image classification AI model 53.

[0149] It should be noted that the present invention is not limited to the embodiments described above, and various other applications and modifications are possible as long as they do not deviate from the gist of the present invention as described in the claims. For example, the embodiments described above describe the configuration of the device and system in detail and concretely in order to explain the present invention in an easy-to-understand manner, and are not necessarily limited to having all the configurations described. Furthermore, it is possible to replace some of the configurations of the embodiments described here with the configurations of other embodiments, and it is also possible to add the configurations of other embodiments to the configuration of one embodiment. In addition, it is possible to add, delete, or replace some of the configurations of each embodiment with other configurations. Furthermore, the control lines and information lines shown are those that are considered necessary for explanation, and do not necessarily represent all control lines and information lines in the actual product. In practice, it can be assumed that almost all configurations are interconnected.

[0150] 2...Information processing terminal, 10...Reward system, 30...Learning device, 41...Learning data for object recognition, 42...Object recognition learning unit, 43...Object recognition AI model, 44...Object recognition AI model conversion unit, 45...Object recognition AI model for terminal, 51...Learning data for image classification, 52...Image classification learning unit, 53...Image classification AI model, 54...Image classification AI model conversion unit, 55...Image classification AI model for terminal, 60...Inference device, 81...Shooting area display unit, 82...Shooting unit, 83...Captured image, 84...Terminal side communication unit, 85...Result display unit, 91...Inference device side communication unit, 92...Image normalization unit, 93...Object identification determination unit, 94...Score determination unit, 95...Determination result table, 101...Shootable area, 102...Display area, 103...Shooting area, 104...Internal area, 105...Surrounding image identification area, 106...Boundary frame, 107...Message display area, W1...Check-in screen

Claims

1. A computer-readable storage medium storing an inference program for causing a computer to execute the following steps:

a procedure for acquiring a captured image in which multiple objects are photographed, and a peripheral image identification region set in a part of the captured image for identifying the surrounding image of the objects; a procedure for an object identification model to set an object detection region where an object has been detected for each of the multiple objects in the captured image, and to identify the object for each object detection region; and a procedure for determining the success or failure of user authentication based on the position of the object detection region relative to the peripheral image identification region.

2. A computer-readable storage medium storing the inference program according to claim 1, which determines that authentication has failed if the overlap between the object detection area and the surrounding image recognition area is greater than or equal to a predetermined amount.

3. A computer-readable storage medium storing the inference program according to claim 2, wherein the first determination performed for authentication is determined to be a failure if the overlap between the object detection area and the surrounding image recognition area is less than a predetermined amount and the set of identified objects does not satisfy the set condition, and the first determination is determined to be a success if the overlap with the surrounding image recognition area is less than a predetermined amount and the set of identified objects satisfies the set condition.

4. A computer-readable storage medium storing the inference program according to claim 3, wherein the set condition is that the set of identified objects corresponds to a predetermined combination of a plurality of different objects.

5. A computer-readable storage medium storing the inference program according to claim 4, which outputs information on objects that do not fall under a predetermined combination among the objects identified by the object identification model.

6. A computer-readable storage medium storing the inference program according to claim 3, comprising: a step of outputting a score in which the image classification model classifies the captured image as a positive example or a negative example when the first determination is determined to be successful; and a step of performing a second determination for authentication, in which the captured image is determined to be a positive example if the score is less than a classification threshold, and the captured image is determined to be a negative example if the score is equal to or greater than the classification threshold.

7. A computer-readable storage medium storing the inference program according to claim 3, wherein the display unit of an information processing terminal that photographs multiple objects displays a ring-shaped (closed-loop) boundary line indicating the surrounding image identification area.

8. A computer-readable storage medium storing the inference program according to claim 7, wherein the boundary line is rectangular, the area of the peripheral image identification region is smaller than the area of the shooting region displayed on the display unit, and the peripheral image identification region is identified from the captured image by the coordinate information of one corner of the rectangular boundary line and the width and height based on that one corner.

9. A computer-readable storage medium storing the inference program according to claim 8, wherein the display unit displays the peripheral image identification region at a lower brightness than the area inside the boundary line.

10. An inference method comprising the steps of: acquiring a captured image in which multiple objects are photographed, and a peripheral image identification region set in a part of the captured image for identifying the surrounding image of the objects; setting an object detection region where an object identification model has detected an object for each of the multiple objects in the captured image, and identifying the object for each object detection region; and determining the success or failure of user authentication based on the position of the object detection region relative to the peripheral image identification region.

11. An inference device comprising: a communication unit that acquires a captured image in which multiple objects are photographed and a peripheral image identification region set in a part of the captured image for identifying the surrounding image of the objects; an object identification model that sets an object detection region for each of the multiple objects in the captured image and identifies the object for each object detection region; and a determination unit that determines the success or failure of user authentication based on the position of the object detection region relative to the peripheral image identification region.

12. A computer-readable storage medium storing a learning program for causing a computer to execute: a procedure of training an object recognition model to identify multiple objects appearing in a predetermined combination in a captured image for object recognition training; and a procedure of recording the trained object recognition model in a recording unit.

13. A learning method comprising the steps of: training an object recognition model to identify multiple objects that appear in a predetermined combination in a captured image for object recognition training; and recording the trained object recognition model in a recording unit.

14. A learning device comprising an object recognition learning unit that trains an object recognition model to identify multiple objects appearing in a predetermined combination in a captured image for object recognition training, and records the trained object recognition model in a recording unit.
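
For illustration only, and not as part of the claims, the determinations recited in claims 3 to 6 and the rectangle representation of claim 8 can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the names `Rect`, `overlap_area`, `first_determination`, `extraneous_objects`, and `second_determination` are hypothetical; measuring overlap as a fraction of each object detection region's area, the default value `0.1`, and treating the set condition as exact equality with the predetermined combination are all assumptions. The case in which the overlap is not less than the predetermined amount is governed by claim 2, which is not reproduced here, so the sketch returns `None` for it rather than guessing.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Rect:
    """Rectangle given by one corner plus width and height (cf. claim 8)."""
    x: float
    y: float
    w: float
    h: float

    @property
    def area(self) -> float:
        return self.w * self.h


# Hypothetical pairing: (identified object label, object detection region).
Detection = Tuple[str, Rect]


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles, 0.0 if they are disjoint."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(dx, 0.0) * max(dy, 0.0)


def first_determination(
    detections: List[Detection],
    region: Rect,
    required: Iterable[str],
    max_overlap_ratio: float = 0.1,  # assumed "predetermined amount"
) -> Optional[bool]:
    """Sketch of the first determination of claim 3.

    Returns True (success) or False (failure) when every object detection
    region overlaps the peripheral image identification region by less than
    the predetermined amount. Returns None otherwise, because that case is
    defined by claim 2, which is not shown in this section.
    """
    for _, box in detections:
        if box.area > 0 and overlap_area(box, region) / box.area >= max_overlap_ratio:
            return None
    identified = {label for label, _ in detections}
    # Assumption: the set condition of claim 4 is exact equality.
    return identified == set(required)


def extraneous_objects(detections: List[Detection], required: Iterable[str]) -> Set[str]:
    """Labels of identified objects outside the predetermined combination (cf. claim 5)."""
    return {label for label, _ in detections} - set(required)


def second_determination(score: float, classification_threshold: float) -> str:
    """Claim 6 as worded: positive below the threshold, negative at or above it."""
    return "positive" if score < classification_threshold else "negative"
```

A caller would run `first_determination` first and call `second_determination` only when it returns `True`, passing the score output by the image classification model. Keeping the undecided case as `None` avoids asserting an outcome that this section of the claims does not state.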