Inference program, inference method, inference device, learning program, learning method, and learning device

The use of object recognition and image classification models in the reward system accurately identifies multiple objects and prevents fraud, addressing the limitations of conventional systems in handling set menus and fraudulent submissions.

JP2026100395A · Active · Publication Date: 2026-06-19 · CYGAMES INC

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: CYGAMES INC
Filing Date: 2024-12-09
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Conventional reward systems fail to identify multiple products in a captured image, such as set menus, and are vulnerable to fraud, as they rely on image recognition technology for single objects and cannot handle occlusions or fraudulent submissions.

Method used

An inference program and learning program that utilize object recognition and image classification models, such as YOLO and ResNet, to identify multiple objects in a captured image and detect fraudulent attempts by classifying images into positive or negative examples.

Benefits of technology

Accurately identifies multiple objects in a captured image and detects fraud, ensuring successful user authentication and preventing fraudulent activities in reward systems.

✦ Generated by Eureka AI based on patent content.


Abstract

The present invention provides an inference device that can correctly identify multiple objects in a captured image and determine whether authentication is successful or unsuccessful. [Solution] The inference device 60 includes an inference device-side communication unit 91 and an object identification AI model 43. The inference device-side communication unit 91 acquires a captured image 83 in which multiple objects are photographed, together with a surrounding image identification region that is set in a part of the captured image 83 for identifying the surrounding image of an object. The object identification AI model 43 sets, for each of the multiple objects in the captured image 83, an object detection region where the object is detected, and identifies the object in each object detection region. The success or failure of user authentication is determined based on the position of the object detection region relative to the surrounding image identification region.

Description

Technical Field

[0001] The present invention relates to an inference program, an inference method, an inference device, a learning program, a learning method, and a learning device.

Background Art

[0002] Conventionally, a reward giving system has been provided that identifies a product from a captured image of a single product and gives a reward (including benefits such as points, in-game currency, in-game items, etc.) to the user who transmitted the captured image when the captured product is appropriate. Since a benefit is given to a user whose captured product is determined to be appropriate, it is expected that the user's desire to purchase will increase.

[0003] Patent Document 1 describes a system including a terminal device and a server. The terminal device described in Patent Document 1 determines whether any one of a plurality of types of objects is imaged in an image of a space imaged by an imaging device of the portable terminal device based on local feature amounts regarding shapes for detecting a plurality of types of objects. When it is determined that any one of the types of objects is imaged, the image determined to have the object imaged is taken in as an acquisition target object image, and the acquisition target object image is transmitted to the server. The server receives the transmitted acquisition target object image, determines a feature correlation amount of the acquisition target object image with respect to the one type of object based on the image feature amount of the one type of object among the plurality of types of objects, and determines whether the acquisition target object image is an image that images the one type of object based on the determined feature correlation amount.

Prior Art Documents

Patent Documents

[0004]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0005] Conventional reward systems assume that a single product is captured in the image, and if the results of image processing and object recognition on the captured image are deemed valid, the system determines that user authentication is successful and rewards the user. However, conventional reward systems could not identify multiple products (for example, a combination of burgers and drinks) such as set menus in restaurants or food delivery services. Set menus have multiple types of drinks and burgers, resulting in numerous combinations. Furthermore, it is necessary to assume that the user will arrange the products in any order and photograph them from any angle. For this reason, image recognition technology for a single object, such as that disclosed in Patent Document 1, could not be used at all.

[0006] Furthermore, images containing multiple products raise a new challenge: occlusion. For example, a burger may appear in front of a drink cup and obscure part of the cup, or conversely, a drink may appear in front of a burger and obscure part of the burger. A mechanism was therefore needed that correctly identifies the multiple items of a set menu, photographed in a predetermined combination, and reliably authenticates the user.

[0007] Furthermore, in a reward system that rewards users for submitting images of set menus taken at any location, it is also necessary to address user fraud. Fraudulent activity includes attempts to successfully authenticate by displaying an image of a set menu (for example, an image obtained from the internet) on a smartphone or PC (Personal Computer) screen and then taking a picture of this displayed image.

[0008] This invention was made in view of the above circumstances, and aims to correctly identify multiple objects in a captured image, detect fraud, and determine whether authentication is successful or unsuccessful.

Means for Solving the Problem

[0009] The inference program according to the present invention causes a computer to perform the following steps: acquire a captured image in which multiple objects are photographed, and region information of a surrounding image identification region for identifying the surrounding image of an object set in a part of the captured image; set an object detection region where an object has been detected for each of the multiple objects in the captured image by an object identification model, and identify the multiple objects in the captured image for each object detection region; and determine whether user authentication is successful or unsuccessful based on the position of the object detection region relative to the surrounding image identification region. The above inference program is one aspect of the present invention, and an inference method and inference apparatus that reflect one aspect of the present invention are configured in the same manner as the above inference program.
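The steps of [0009] can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `Box` and `Detection` types, the `inner_area` helper, the border `margin`, and above all the decision rule (authentication succeeds only when every object detection region lies wholly inside the area enclosed by the surrounding image identification region) are assumptions introduced here. The text states only that the determination depends on the position of the object detection region relative to that region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in image pixel coordinates."""
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, other: "Box") -> bool:
        """Return True when `other` lies entirely inside this rectangle."""
        return (self.x1 <= other.x1 and self.y1 <= other.y1
                and other.x2 <= self.x2 and other.y2 <= self.y2)


@dataclass(frozen=True)
class Detection:
    """One object detection region and the label identified for it."""
    label: str
    box: Box


def inner_area(image_w: float, image_h: float, margin: float) -> Box:
    """ASSUMPTION: the surrounding image identification region is a border
    band of width `margin`; the inner area is what the band encloses."""
    return Box(margin, margin, image_w - margin, image_h - margin)


def authenticate(detections: list[Detection], inner: Box) -> bool:
    """ASSUMED rule: succeed only if at least one object was detected and
    every detection region stays clear of the surrounding band."""
    return bool(detections) and all(inner.contains(d.box) for d in detections)
```

Under this assumed rule, a detection that touches the border band causes authentication to fail, which leaves the band free for inspecting what surrounds the products.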

[0010] Furthermore, the learning program according to the present invention causes a computer to perform the following steps: a procedure to have an object recognition model learn to identify multiple objects that appear in a predetermined combination in a captured image for object recognition learning, and a procedure to record the learned object recognition model in a recording unit. The above-described learning program is one aspect of the present invention, and a learning method and learning apparatus that reflect one aspect of the present invention are configured in the same manner as the above-described learning program.

Effects of the Invention

[0011] According to the present invention, it is possible to correctly identify multiple objects in a captured image and detect fraud, thereby determining whether authentication is successful or unsuccessful, and thus preventing fraud. Other issues, configurations, and effects not mentioned above will be clarified by the following description of the embodiments.

Brief Description of the Drawings

[0012]
[Figure 1] An overall configuration diagram showing an overview of the reward system according to the first embodiment of the present invention.
[Figure 2] A block diagram showing an example of the hardware configuration of an information processing terminal according to the first embodiment of the present invention.
[Figure 3] A block diagram showing the hardware configuration of a learning device according to the first embodiment of the present invention.
[Figure 4] A block diagram showing an example of the functional configuration of a learning device according to the first embodiment of the present invention.
[Figure 5] A flowchart showing an example of object recognition learning processing according to the first embodiment of the present invention.
[Figure 6] A flowchart showing an example of image classification learning processing according to the first embodiment of the present invention.
[Figure 7] A block diagram showing the hardware configuration of an inference device according to the first embodiment of the present invention.
[Figure 8] A diagram showing an example of a shooting area, an internal area, and a peripheral image identification area according to the first embodiment of the present invention.
[Figure 9] A diagram showing an example of the display of a check-in screen when the check-in according to the first embodiment of the present invention is successful.
[Figure 10] A diagram showing an example of the display of a check-in screen when the check-in according to the first embodiment of the present invention fails.
[Figure 11] A block diagram showing an example of the functional configuration of a reward granting system according to the first embodiment of the present invention.
[Figure 12] A sequence diagram showing an example of the processing of an information processing terminal and an inference device according to the first embodiment of the present invention.
[Figure 13] A flowchart showing an example of object recognition inference processing according to the first embodiment of the present invention.
[Figure 14] A diagram showing an example of object recognition results according to the first embodiment of the present invention.
[Figure 15] A flowchart showing an example of object recognition determination processing according to the first embodiment of the present invention.
[Figure 16] A diagram showing an example of object recognition results and check-in results according to the first embodiment of the present invention.
[Figure 17] A flowchart showing an example of image classification inference processing according to the first embodiment of the present invention.
[Figure 18] A diagram showing a configuration example of a determination result table according to the first embodiment of the present invention.
[Figure 19] A block diagram showing a functional configuration example of a reward giving system according to the second embodiment of the present invention.
[Figure 20] A block diagram showing a functional configuration example of a reward giving system according to the third embodiment of the present invention.

Embodiments for Carrying Out the Invention

[0013] Hereinafter, embodiments for carrying out the present invention will be described with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same function or configuration are denoted by the same reference numerals, and redundant descriptions are omitted.

[0014] [First Embodiment] Before a reward (including benefits such as points, in-game currency, and in-game items) is given to a user who has photographed an actual product with a camera, two processes are combined: a process of determining, by object recognition processing, that the photographed image shows the correct combination of objects (first determination), and a process of confirming that the photographed image is not fraudulent (second determination). This combined process of authenticating the user is called check-in. A system that gives a reward to a user authenticated by check-in is called a reward giving system, and is therefore also called a check-in system. In the reward giving system, a user who has performed a fraudulent operation is determined to have failed authentication, and no reward is given to this user; a reward that has already been given may also be canceled. Realizing such a reward giving system is important for promoting products and / or services.
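The way check-in combines the first and second determinations can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, and comparing the detected labels with the expected set menu as an order-independent multiset is an assumed reading of "correct combination".

```python
from collections import Counter


def first_determination(detected_labels: list[str], expected_set: list[str]) -> bool:
    """ASSUMPTION: the combination is correct when the detected labels match
    the expected set menu exactly, regardless of the order of detection."""
    return Counter(detected_labels) == Counter(expected_set)


def check_in(detected_labels: list[str], expected_set: list[str],
             classified_as_positive: bool) -> bool:
    """Check-in succeeds only when both determinations pass.

    `classified_as_positive` stands in for the second determination,
    i.e. the photographed image was not judged to be fraudulent.
    """
    return first_determination(detected_labels, expected_set) and classified_as_positive


# Usage: a burger-and-drink set photographed in either order.
expected = ["burger", "drink"]
result = check_in(["drink", "burger"], expected, classified_as_positive=True)
```

Keeping the two determinations as separate inputs means a correct set menu photographed from a screen still fails, because the second determination rejects it.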

[0015] Traditionally, user authentication was determined using methods such as QR codes or near-field communication (NFC). However, these methods required modifications to the POS (Point of Sale) system, resulting in significant implementation costs. There are also systems that use BLE (Bluetooth Low Energy) beacons for user authentication, which do not require modifications to the POS system. However, the small devices that emit BLE beacons can only be used for check-ins at specific locations, deploying them to thousands of stores tends to be costly, and a beacon-based check-in makes it difficult to confirm that a user has purchased a specific product.

[0016] In contrast, the reward system according to the first embodiment of the present invention targets a set menu consisting of predetermined products, and confirms a user's purchase by having the user photograph the purchased set menu with a smartphone camera. The embodiments described below describe a reward system that can identify the type of product, in particular a set menu consisting of a combination of specific, diverse products (an example of an object), and that is tamper resistant so as to prevent fraudulent authentication.

[0017] <Example of the overall configuration of a reward system> First, an example of the configuration of a reward-granting system according to the first embodiment of the present invention will be described. This reward-granting system is configured by combining a learning device that learns object recognition processing and image classification processing, an inference device that infers object recognition and image classification, and an information processing terminal. Figure 1 is an overall configuration diagram showing an overview of the reward system 10 according to the first embodiment of the present invention. Hereinafter, the object that is the target of object identification processing, etc., may be described as a product.

[0018] The reward system 10 comprises a tablet terminal 2_1, a PC (Personal Computer) 2_2, a learning device 30, and an inference device 60. The tablet terminal 2_1 and PC 2_2 used by users can connect to the inference device 60 via a network N such as the Internet. The tablet terminal 2_1 and PC 2_2 used by administrators can connect to the learning device 30 and the inference device 60. In the following description, the tablet terminal 2_1 and PC 2_2 will be collectively referred to as the information processing terminal 2.

[0019] The learning device 30 is a device that trains an object recognition AI model 43 (see Figure 4 below), which is an example of an object recognition model, to perform object recognition processing, and trains an image classification AI model 53 (see Figure 4 below) to perform image classification processing. The inference device 60 is a device that has the trained object recognition AI model 43 identify products in the captured image, and has the trained image classification AI model 53 perform binary classification of the captured image into positive or negative examples.

[0020] Therefore, the learning device 30 and the inference device 60 manage the programs used as the object recognition AI model 43 and the image classification AI model 53, as well as various types of data. The object recognition AI model 43 uses YOLO (You Only Look Once), which will be described later. The image classification AI model 53 uses ResNet, which will be described later.

[0021] Information processing terminal 2 will be described as being operated by a user. The user is assumed to be a person who purchases a set of products, takes a picture of the set of products, and operates information processing terminal 2 to request authentication.

[0022] In the tablet terminal 2_1, which constitutes the information processing terminal 2, a touch panel display device is used in which the input device 26 and output device 27 are integrated. In PC 2_2, the input device 26 and output device 27 are separate units. Alternatively, PC 2_2 may be configured as a desktop PC, with the input device 26 and output device 27 connected separately to the desktop PC.

[0023] The information processing terminal 2 selects a program based on the operation signals input from the input device 26 by the user's actions, and outputs to the output device 27 a video signal suited to the screen of the output device 27. The output device 27 displays the video based on the video signal. The operation signals input from the input device 26 are, for example, signals corresponding to each operation button on the keyboard. Through the input device 26, the user can input instructions, instruct the inference device 60 to execute an inference program, or operate an inference program recorded in the recording device 22 of the terminal. Examples of operations on the inference program input from the input device 26 include various command inputs such as launching an application program, taking a picture, and instructing the transmission of a captured image. Another example of an operation performed from the input device 26 is a tap operation, such as touching the screen of the output device 27 with a finger or pen.

[0024] The information processing terminal 2 performs processes such as reading image data from the recording device 22 and executing a program, and displaying a screen on the output device 27 in accordance with operation signals input from the input device 26. For example, in response to operations performed by the user through the input device 26, the information processing terminal 2 displays a screen on the output device 27 in which an image based on the image data read from the recording device 22 has been drawn.

[0025] During training, the information processing terminal 2 is used by the administrator. The information processing terminal 2 receives input to send images of multiple products in a predetermined combination, intended for object recognition training, to the learning device 30, causing the object recognition AI model 43 to learn how to identify products. The information processing terminal 2 also receives input to send images of multiple products in a predetermined combination, intended for image classification, to the learning device 30, causing the image classification AI model 53 to learn how to classify the images into positive or negative examples. Furthermore, the information processing terminal 2 can also perform data augmentation on the images used for object recognition training.
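The text does not say which augmentations are applied to the object recognition training images. As one hedged example, the sketch below shows a horizontal flip that keeps the labeled regions aligned with the flipped image. The image is modeled as a list of pixel rows, and each label as a hypothetical tuple `(name, x1, y1, x2, y2)` with half-open pixel bounds; both representations are assumptions made for illustration.

```python
def hflip(image, boxes):
    """Mirror an image left to right and mirror its labeled boxes with it.

    image: list of rows, each row a list of pixel values.
    boxes: list of (name, x1, y1, x2, y2) with x in [0, width], half-open.
    """
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # Column c moves to width - 1 - c, so the span [x1, x2) becomes
    # [width - x2, width - x1); the vertical bounds are unchanged.
    flipped_boxes = [
        (name, width - x2, y1, width - x1, y2)
        for (name, x1, y1, x2, y2) in boxes
    ]
    return flipped_image, flipped_boxes


# Usage: a 2x4 image with a "drink" label covering the first column.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
labels = [("drink", 0, 0, 1, 2)]
aug_image, aug_labels = hflip(image, labels)
```

Because users may arrange the products in any order, a flip of this kind yields an additional plausible arrangement from a single labeled photograph.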

[0026] The inference program according to this embodiment, which operates on the inference device 60, causes the object recognition AI model 43 to identify multiple products in the captured image 83 based on instruction information input via the input device 26 (an example of an input unit). The inference program according to this embodiment also causes the image classification AI model 53 to classify the captured image into positive or negative examples.

[0027] <Example Hardware Configuration for a Reward System> Next, we will describe an example of the hardware configuration of each terminal and device included in the reward system 10 according to the first embodiment. Figure 2 is a block diagram showing an example of the hardware configuration of the information processing terminal 2. The hardware configuration examples of the learning device 30 and the inference device 60 will be described later.

[0028] (Example of an information processing terminal configuration) Information processing terminal 2 is an example of a computer capable of executing various programs. This information processing terminal 2 is equipped with a processor 21, a recording device 22, and a network interface 24, each connected to a bus 23.

[0029] The processor 21 is composed of at least one of the following: a CPU (Central Processing Unit), an MPU (Microprocessor Unit), a GPU (Graphics Processing Unit), and an FPGA (Field Programmable Gate Array). The processor 21 reads the program code of the application that realizes each function according to this embodiment from the recording device 22, loads it into a temporary storage unit (not shown) provided in the recording device 22, and executes the program code. The processor 21 performs calculations based on information acquired from the inference device 60, and performs processing necessary to draw the application's GUI on the output device 27 of the information processing terminal 2. The processor 21 also performs processing of the OS (Operating System) of the information processing terminal 2, and manages data input and output performed in various parts of the information processing terminal 2. Furthermore, when the processor 21 handles information related to user authentication, it can output an image signal to the output device 27 through the input / output interface 25.

[0030] The recording device 22 is composed of, for example, ROM (Read Only Memory) and RAM (Random Access Memory). ROM can be an optical disc, magneto-optical disc, DVD (Digital Versatile Disc)-ROM, CD-ROM, Blu-ray® disc, etc. RAM can be SRAM (Static RAM), DRAM (Dynamic RAM), etc. Variables and parameters generated during the processing of the processor 21 are temporarily written to the recording device 22, and these variables and parameters are read out by the processor 21 as needed.

[0031] Furthermore, the recording device 22 is composed of at least one of the following: an HDD (Hard Disk Drive), an SSD (Solid State Drive), and flash memory. The recording device 22 stores the OS of the information processing terminal 2, various parameters, programs for making the information processing terminal 2 function, and application programs used for user authentication. As described above, the recording device 22 stores programs and data necessary for the processor 21 to operate, and is used as an example of a computer-readable, non-transitory storage medium that stores programs executed by the information processing terminal 2.

[0032] For example, a NIC (Network Interface Card) can be used for the network interface 24. The network interface 24 can transmit and receive various types of data between the learning device 30 and the inference device 60, and communicate with other information processing terminals 2, via a dedicated line connected to the terminals of the NIC and through the network N.

[0033] The input / output interface 25 converts the operation signals received from the input device 26 into data in a predetermined format and passes the converted data to the processor 21. The input / output interface 25 also converts the data of the screen drawn by the processor 21 into a video signal and outputs it to the output device 27.

[0034] The input device 26 is a device that receives input instructions or various types of information from the user. An example of the input device 26 is a pointing device that can input coordinate information of a location specified by the user. This pointing device could be a mouse, a touch panel device, etc. A touch panel device is configured by combining the input device 26 and the output device 27. The input device 26 may also be a keyboard, mouse, etc.

[0035] The output device 27 is a device that outputs information processed by the processor 21. An example of an output device 27 is a display device (such as a display unit or touch panel). When the output device 27 is a display device, the image (for example, a captured image) based on the video signal received from the input / output interface 25 is displayed on the display device.

[0036] Camera 28 can capture various images through the operation of a user using the information processing terminal 2. Camera 28 saves still images as captured images to the recording device 22. Camera 28 can also save moving images as captured images to the recording device 22. When taking images using camera 28, an image of the product is displayed on the output device 27, so the user can confirm the position of the product in the captured image.

[0037] <Example of hardware configuration for a learning device> Next, we will describe an example of the hardware configuration of the learning device 30. Figure 3 is a block diagram showing the hardware configuration of the learning device 30 according to the first embodiment of the present invention. The learning device 30 is one example of a system, composed of one or more devices, for generating the trained models, namely the object recognition AI model 43 and the image classification AI model 53 (see Figure 4 below). However, in the following embodiments, for the sake of explanation, it will be described as a single device. The system for generating the object recognition AI model 43 and the image classification AI model 53 can also mean the learning device 30. The same applies to the inference device 60 below.

[0038] The learning device 30 comprises a processor 31, an input device 32, a display device 33, a recording device 34, and a communication device 35. These components are connected by a bus 36. Interfaces are assumed to be interposed between the bus 36 and each component as needed. The learning device 30 includes configurations similar to those of a typical server or PC.

[0039] The processor 31 controls the operation of the entire learning device 30. For example, the processor 31 is at least one of the following: CPU, MPU, GPU, and FPGA. The processor 31 performs various processes by reading and executing programs and data stored in the recording device 34. The processor 31 may be composed of multiple processors.

[0040] The input device 32 is a user interface that receives input from the user to the learning device 30, and is, for example, a touch panel, touchpad, keyboard, mouse, or buttons. The display device 33 is a display that shows application screens and the like to the user of the learning device 30 according to the control of the processor 31.

[0041] The recording device 34 (an example of a recording unit) includes a main memory and an auxiliary memory. The main memory is a semiconductor memory such as RAM. RAM is a volatile storage medium that allows for high-speed reading and writing of information and is used as a storage area and work area when the processor 31 processes information. The main memory may also include ROM, which is a read-only non-volatile storage medium. The auxiliary memory stores various programs and data used by the processor 31 when executing each program. The auxiliary memory may be any non-volatile storage or non-volatile memory that can store information, and may be removable.

[0042] The communication device 35 exchanges data with the information processing terminal 2 or other computers such as a server via a network, and is, for example, a wireless LAN module. The communication device 35 can also be other wireless communication devices or modules such as a Bluetooth® module, or wired communication devices or modules such as an Ethernet® module or a USB interface. The system configuration and data structure of this embodiment are described in detail below.

[0043] <Example of functional configuration of a learning device> Next, we will explain the process of generating the object recognition AI model 43 and the image classification AI model 53. Figure 4 is a block diagram showing an example of the functional configuration of a learning device 30 according to the first embodiment of the present invention. The learning device 30 comprises an object recognition learning unit 42, an object recognition AI model conversion unit 44, an image classification learning unit 52, and an image classification AI model conversion unit 54. The learning device 30 also comprises object recognition learning data 41, an object recognition AI model 43, a terminal object recognition AI model 45, image classification learning data 51, an image classification AI model 53, and a terminal image classification AI model 55, all recorded in a recording device 34.

[0044] (Object recognition learning) First, we will explain an example of the configuration and processing involved in object recognition learning. The object recognition learning unit 42 reads the object recognition learning data 41 prepared in the recording device 34 shown in Figure 3 and generates an object recognition AI model 43 that has been trained to perform object recognition processing. Object recognition processing is the process of identifying multiple products that appear in a captured image. The object recognition learning data 41 stores a large number of captured images for object recognition training. The captured images show multiple products in predetermined combinations, and each product is pre-labeled with a label representing its product name. The object recognition AI model 43 generated by the object recognition learning unit 42 is stored in the recording device 34. When another object recognition learning data 41 is prepared, the object recognition learning unit 42 can train the object recognition AI model 43 read from the recording device 34 to perform object recognition processing again.

[0045] Returning to Figure 4, the object recognition AI model 43 described above is constructed using, for example, YOLO. YOLO is a real-time deep learning model used for object detection. YOLO's greatest feature is its ability to perform object detection at high speed and with high accuracy by inferring object detection and labeling end-to-end. Furthermore, because YOLO captures the background pattern and object features as a whole, it has fewer false positives and higher generalization performance than other methods. In particular, even when one product is placed in front of another product, causing occlusion, YOLO can correctly identify the object if its features partially match.

[0046] This section explains the YOLO training method. For YOLO training, a "web application for capturing training data," implemented using the same code as the prototype web application, is used to photograph items from a set menu. Data is prepared by selecting and labeling the areas of the photographed items. At this time, the training data used is cropped and reduced to the same aspect ratio as during inference.
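Cropping and reducing a training photograph to the aspect ratio used at inference could look like the sketch below. The centered crop and the 640 x 640 target used in the usage lines are assumptions; the text does not give the inference resolution or say where the crop is taken.

```python
def center_crop_box(width: int, height: int, target_w: int, target_h: int):
    """Return (left, top, right, bottom) of the largest centered crop whose
    aspect ratio is target_w:target_h."""
    # Compare aspect ratios by cross-multiplication to stay in integers.
    if width * target_h > height * target_w:
        # Source is too wide: keep the full height, trim the sides.
        crop_h = height
        crop_w = height * target_w // target_h
    else:
        # Source is too tall (or already matches): keep the full width.
        crop_w = width
        crop_h = width * target_h // target_w
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


def crop_and_reduce_plan(width: int, height: int, target_w: int, target_h: int):
    """Return the crop box and the scale factor that shrinks it to the target."""
    box = center_crop_box(width, height, target_w, target_h)
    scale = target_w / (box[2] - box[0])
    return box, scale


# Usage with an assumed 640 x 640 inference size.
landscape_box, landscape_scale = crop_and_reduce_plan(4000, 3000, 640, 640)
portrait_box, portrait_scale = crop_and_reduce_plan(1080, 1920, 640, 640)
```

Labeled regions would need the same offset and scale applied so that they stay aligned with the cropped, reduced image.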

[0047] The object recognition AI model conversion unit 44 generates a terminal object recognition AI model 45 by optimizing the object recognition AI model 43 read from the recording device 34. The terminal object recognition AI model 45 uses less memory than the original object recognition AI model 43, and the size of the model file itself is also reduced. Therefore, the terminal object recognition AI model 45 can be operated even on an information processing terminal 2 with limited resources. The terminal object recognition AI model 45 is used in the second and third embodiments described later.
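The text does not name the optimization that the conversion unit 44 applies. One common way to reduce memory use and file size is weight quantization, and the sketch below illustrates symmetric 8-bit quantization on a plain list of weights. It is an assumed technique shown for illustration only, with hypothetical function names, and is not described in the source.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization: one scale maps floats to int8."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    quantized = array(
        "b",  # signed 8-bit integers, one byte per weight
        (max(-127, min(127, round(w / scale))) for w in weights),
    )
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]


# Usage: compare storage for 32-bit floats and 8-bit integers.
weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
float_bytes = len(array("f", weights).tobytes())
int8_bytes = len(q.tobytes())
restored = dequantize(q, scale)
```

The trade-off is a small rounding error per weight (at most about half of `scale`) in exchange for storing one byte per weight instead of a 32-bit float.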

[0048] Figure 5 is a flowchart showing an example of object recognition learning processing. First, the object recognition learning unit 42 acquires object recognition learning data 41 from the recording device 34 (S1). Next, the object recognition learning unit 42 generates an object recognition AI model 43 that has been trained to perform object recognition processing based on the object recognition learning data 41 (S2). Next, the object recognition learning unit 42 records the trained object recognition AI model 43 in the recording device 34 (S3). If the terminal object recognition AI model 45 is not used, this process is terminated.

[0049] When using the terminal object recognition AI model 45, the object recognition AI model conversion unit 44 converts the object recognition AI model 43 read from the recording device 34 into the terminal object recognition AI model 45 (S4). Next, the object recognition AI model conversion unit 44 records the terminal object recognition AI model 45 into the recording device 34 (S5), and the process ends.

[0050] (Image classification learning) Next, we will describe an example of the configuration and processing involved in image classification learning. Conventionally, there have been fraudulent attempts to pass user authentication in reward systems without actually purchasing any products, by photographing an image obtained from the internet or another source that is displayed on a screen or printed out. Since such fraud must be detected, this embodiment constructs an image classification AI model 53, which classifies images into positive or negative examples, as an example of an image classification model.

[0051] The image classification learning unit 52 reads the image classification learning data 51 prepared in the recording device 34 and generates an image classification AI model 53 that has been trained to perform image classification processing. Image classification processing classifies a captured image showing multiple products in a predetermined combination as either a positive example or a negative example. The image classification learning data 51 stores a large number of captured images for image classification learning. Genuine captured images showing multiple products in a predetermined combination are used as positive examples, and fraudulent images are used as negative examples. The image classification AI model 53 generated by the image classification learning unit 52 is stored in the recording device 34. When another set of image classification learning data 51 is prepared, the image classification learning unit 52 can retrain the image classification AI model 53 read from the recording device 34.

[0052] Positive example images include the object recognition learning data 41. Negative example images are not illustrated, but include, for example, images created by processing other images with a paint tool, printed images, and the like.

[0053] The image classification AI model 53 is constructed using, for example, ResNet. ResNet is a general-purpose architecture rather than one specialized for a particular purpose, and it performs extremely well in image classification. It is characterized by its ability to keep improving performance even with very deep networks (such as 152 layers) and to achieve high accuracy in image recognition tasks. Training ResNet is straightforward: it learns binary classification using the data from the YOLO training described above as positive examples, and images of fraudulent check-ins deliberately created by hand by debugging staff as negative examples.

[0054] The image classification AI model conversion unit 54 generates a terminal image classification AI model 55 by optimizing the image classification AI model 53 read from the recording device 34. The terminal image classification AI model 55 uses less memory than the original image classification AI model 53, and the size of the model file itself is also reduced. Therefore, the terminal image classification AI model 55 can be operated even on an information processing terminal 2 with limited resources. The terminal image classification AI model 55 is used in the third embodiment described later.

[0055] The object recognition AI model conversion unit 44 and the image classification AI model conversion unit 54 described above each convert the AI model into ONNX, a highly efficient file format for representing neural networks, thereby reducing the file size. The terminal object recognition AI model 45 and the terminal image classification AI model 55 are small enough to be loaded into the RAM or VRAM of the information processing terminal 2, which makes deployment to the information processing terminal 2 realistic. Furthermore, the terminal object recognition AI model 45 and the terminal image classification AI model 55 perform inference on ONNX Runtime, middleware responsible for the optimized execution of neural network models. Execution is thereby automatically optimized for the CPU's SIMD instructions and the GPU instructions of the information processing terminal 2, enabling inference at a practical speed.

[0056] Figure 6 is a flowchart showing an example of image classification learning processing. First, the image classification learning unit 52 acquires image classification training data 51 from the recording device 34 (S11). Next, the image classification learning unit 52 generates an image classification AI model 53 that has been trained to perform image classification processing based on the image classification training data 51 (S12). Next, the image classification learning unit 52 records the trained image classification AI model 53 in the recording device 34 (S13). If the terminal image classification AI model 55 is not used, this process is terminated.

[0057] When using the terminal image classification AI model 55, the image classification AI model conversion unit 54 converts the image classification AI model 53 read from the recording device 34 into the terminal image classification AI model 55 (S14). Next, the image classification AI model conversion unit 54 records the terminal image classification AI model 55 into the recording device 34 (S15), and the process ends.

[0058] (Example hardware configuration for inference device) Next, we will describe an example configuration of the inference device 60. Figure 7 is a block diagram showing the hardware configuration of an inference device 60 according to the first embodiment of the present invention. The inference device 60 comprises a processor 61, an input device 62, a display device 63, a recording device 64, and a communication device 65. These components are connected by a bus 66. Interfaces are assumed to be interposed between the bus 66 and each component as needed. The inference device 60 includes a configuration similar to that of a general server or PC.

[0059] The processor 61 controls the operation of the entire inference device 60. For example, the processor 61 is at least one of a CPU, MPU, GPU, and FPGA. The processor 61 performs various processes by reading and executing programs and data stored in the recording device 64. The processor 61 may be composed of multiple processors.

[0060] The input device 62 is a user interface that receives input from the user to the inference device 60, and is, for example, a touch panel, touchpad, keyboard, mouse, or buttons. The display device 63 is a display that shows application screens and the like to the user of the inference device 60 according to the control of the processor 61.

[0061] The recording device 64 includes a main memory and an auxiliary memory. The main memory is a semiconductor memory such as RAM. RAM is a volatile storage medium that allows for high-speed reading and writing of information and is used as a storage area and work area when the processor 61 processes information. The main memory may also include ROM, which is a read-only non-volatile storage medium. The auxiliary memory stores various programs and data used by the processor 61 when executing each program. The auxiliary memory may be any non-volatile storage or non-volatile memory that can store information and may be removable.

[0062] The communication device 65 exchanges data with the information processing terminal 2 or other computers such as a server via a network, and is, for example, a wireless LAN module. The communication device 65 can also be other wireless communication devices or modules such as a Bluetooth® module, or wired communication devices or modules such as an Ethernet® module or a USB interface.

[0063] <Description of the inference device's functions> Next, the functions of the inference device 60 will be described. The inference device 60 consists of a user interface part (referred to as the UI part) that acquires check-in images and a backend part (referred to as the BE part) that verifies the check-in images using two AI models. First, the user interface part that the user directly sees will be described. The check-in user interface in this embodiment and each area will be described with reference to Figures 8 to 10.

[0064] <Explanation of each area> First, with reference to Figure 8, we will explain each area displayed on the display device (output device 27) of the information processing terminal 2. Figure 8 shows examples of the shooting area, internal area, and surrounding image recognition area.

[0065] The shooting area display unit 81 represents the screen displayed on the information processing terminal 2, which is used in portrait orientation. The shooting area 101 is the area enclosed by a dashed line in the figure and is the area that the camera 28 can capture. The display area 102 is the hatched area in the figure. Of the area that can be captured in the shooting area 101, the display device displays the portion that falls within the display area 102.

[0066] The shooting area 103 is the area enclosed by the dashed line in the figure, and represents the size of the captured image. The product captured by the camera 28 is recorded in the recording device 22 shown in Figure 2 as a captured image 83 that is the size displayed in the shooting area 103. The aspect ratio of the shooting area 103 is set to, for example, 10:13.
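The relationship between the area the camera can capture and a captured image with a fixed aspect ratio can be illustrated with a small calculation. The following is a minimal sketch, assuming that the captured image is obtained as the largest centered crop of the camera frame; the function name `centered_crop_box` and the centered-crop policy are illustrative assumptions, and only the 10:13 ratio comes from the text.

```python
from fractions import Fraction


def centered_crop_box(frame_w: int, frame_h: int,
                      ratio_w: int = 10, ratio_h: int = 13):
    """Return (x, y, w, h) of the largest centered crop with the given aspect ratio.

    Assumption: the crop is centered in the camera frame. The 10:13 default is
    the example ratio given for the shooting area 103.
    """
    if frame_w <= 0 or frame_h <= 0:
        raise ValueError("frame size must be positive")
    target = Fraction(ratio_w, ratio_h)  # width / height
    if Fraction(frame_w, frame_h) > target:
        # Frame is wider than the target: keep the full height, trim the width.
        h = frame_h
        w = int(h * target)
    else:
        # Frame is taller than (or equal to) the target: keep the full width.
        w = frame_w
        h = int(w / target)
    x = (frame_w - w) // 2
    y = (frame_h - h) // 2
    return x, y, w, h


portrait = centered_crop_box(1080, 1920)
landscape = centered_crop_box(1920, 1080)
```

Exact rational arithmetic is used so that the crop size does not depend on floating-point rounding.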

[0067] The internal region 104 is the area inside the shooting area 103, obtained by scaling the shooting area 103 down about its center point. The area of the internal region 104 is adjusted to be, for example, 70% or less of the area of the shooting area 103. The position of the internal region 104 is determined by the x and y coordinates of its upper left corner, and its size is determined by its height h and width w. The area remaining after removing the internal region 104 from the shooting area 103 is called the surrounding image recognition area 105. The area of the surrounding image recognition area 105 is smaller than the area of the shooting area 103. For example, the area of the surrounding image recognition area 105 is adjusted to be 30% or more of the area of the shooting area 103.
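The geometry described above can be sketched as follows. This is an illustrative calculation, assuming that the internal region keeps the aspect ratio of the shooting area; the function names and the sample size of 1000 x 1300 are hypothetical. Because area scales with the square of the linear scale factor, an area ratio r corresponds to scaling the width and height by the square root of r.

```python
import math


def internal_region(shoot_w: float, shoot_h: float, area_ratio: float = 0.7):
    """Return (x, y, w, h) of a region scaled down about the center point.

    area_ratio is the internal region's share of the shooting area (0.7 is the
    example upper bound from the text). Width and height are each scaled by
    sqrt(area_ratio) so that the area is scaled by area_ratio.
    """
    if not 0 < area_ratio <= 1:
        raise ValueError("area_ratio must be in (0, 1]")
    s = math.sqrt(area_ratio)
    w, h = shoot_w * s, shoot_h * s
    x, y = (shoot_w - w) / 2, (shoot_h - h) / 2  # upper left corner
    return x, y, w, h


def surrounding_area(shoot_w: float, shoot_h: float, region) -> float:
    """Area left after removing the internal region from the shooting area."""
    _, _, w, h = region
    return shoot_w * shoot_h - w * h


region = internal_region(1000, 1300, 0.7)
share = surrounding_area(1000, 1300, region) / (1000 * 1300)  # about 0.3
```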

[0068] Figure 9 shows an example of the display of the check-in screen W1 when check-in is successful. The check-in screen W1 is displayed on the output device 27 of the information processing terminal 2 and outputs whether the check-in was successful or not.

[0069] Figure 9 shows an example of a check-in screen W1 in which an application installed on the information processing terminal 2 activates the camera 28 and displays the video from the camera 28 on the page in real time. The check-in screen W1 displays a boundary frame 106 drawn as a closed boundary line. The boundary frame 106 is rectangular and indicates the boundary between the internal area 104 and the surrounding image recognition area 105. In addition, a message instructing the user to place the items to be checked in within the boundary frame 106 is displayed in the message display area 107 at the bottom of the check-in screen W1.

[0070] The brightness of the peripheral image recognition area 105 is displayed lower than the brightness of the internal area 104, which is inside the boundary line representing the boundary frame 106. Therefore, the user tries to operate the camera 28 to take a picture so that the product is inside the boundary frame 106. As a result, the camera 28 takes a picture not only of the image in the internal area 104, but also of the image in the peripheral image recognition area 105, which is outside the boundary frame 106. The diagonal line crossing the internal area 104 and the peripheral image recognition area 105 represents the edge of the table on which the product to be checked in is placed. When the user presses the capture button displayed in the message display area 107 in the display example (1) of Figure 9, the camera 28 takes a picture of the product, and the display example (2) of Figure 9 is displayed.

[0071] When the user presses the "Retake" button displayed in the message display area 107 in the display example (2) of Figure 9, they return to the display example (1), allowing the user to retake the photo of the product. When the user presses the "OK" button displayed in the message display area 107, the information processing terminal 2 transmits the entire image, including the internal area 104 and the surrounding image recognition area 105, as the captured image 83 to the inference device 60. The object recognition AI model 43 of the inference device 60 identifies the product in the captured image 83.

[0072] A key feature of the user interface shown in the display examples (1) and (2) in Figure 9 is that the internal region 104 represented by the boundary frame 106 is a rectangle narrower than the shooting region 103 representing the shooting range. Furthermore, it is characterized by transmitting the captured image and region information, including the coordinates of the boundary frame 106 within that image, to the inference device 60. Although this boundary frame 106 is not drawn in the check-in video, the coordinates x,y of the top-left vertex of the boundary frame 106, and the width w and height h of the boundary frame 106 are transmitted to the inference device 60 as region information (x, y, w, h).

[0073] Display example (3) in Figure 9 shows an example of displaying the check-in judgment result. The check-in judgment result displays a message indicating that the check-in was successful because the identified product combination is valid and the captured image 83 is not fraudulent, and that a benefit (an example of a reward) will be sent to the user who successfully checked in.

[0074] Figure 10 shows an example of the display of the check-in screen W1 when check-in fails. In the example display (1) in Figure 10, only the drink and burger are displayed from a set containing multiple items. If the user presses the capture button in this state, the captured image will show only the drink and burger, as shown in the example display (2) in Figure 10. In this case, the check-in is determined to have failed.

[0075] If the check-in is determined to have failed, a message indicating that the check-in could not be completed is displayed in the message display area 107, as shown in display example (3) in Figure 10. The message display area 107 also displays a request to retake the photo after confirming that all the items included in the set to be checked in are present, together with the names of the items that the object recognition AI model 43 could not identify.

[0076] (Example of a reward system's functional configuration) Next, an example of the functional configuration of the reward system 10 according to the first embodiment will be described. Figure 11 is a block diagram showing an example of the functional configuration of the reward system 10 according to the first embodiment. The reward system 10 according to the first embodiment is configured as a client-server system with the information processing terminal 2 as the client and the inference device 60 as the server.

[0077] First, we will explain an example of the functional configuration of information processing terminal 2. The information processing terminal 2 includes a shooting area display unit 81, a shooting unit 82, a captured image 83 stored in a recording device 22 (see Figure 2), a terminal-side communication unit 84, and a result display unit 85.

[0078] The shooting area display unit 81 displays the shooting area 103 and boundary frame 106, etc., on the display device (output device 27) of the information processing terminal 2. The imaging unit 82 generates an image 83 of the product captured in the imaging area 103 using the functions of the camera 28 shown in Figure 2. The image 83 is recorded in the recording device 22 and output to the terminal-side communication unit 84 by a dedicated application launched on the information processing terminal 2.

[0079] The terminal-side communication unit 84 transmits the captured image 83 to the inference device 60 via the network N. In addition to the captured image 83, the terminal-side communication unit 84 also transmits region information of an internal region 104 that is set to be smaller than the captured image 83, and user information of the user operating the information processing terminal 2. The region information of the internal region 104 represents the position and size of the internal region 104 within the captured image 83, and is represented by (x, y, w, h) as shown in Figure 8. The terminal-side communication unit 84 also receives the object identification determination result from the inference device 60.
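The data sent by the terminal-side communication unit 84 can be pictured as a small structured message. The following is a minimal sketch, assuming a JSON payload with a base64-encoded image; the class names, the field names (`image`, `region`, `user_id`), and the JSON encoding are all hypothetical, since the text only states that the captured image, the region information (x, y, w, h), and the user information are transmitted.

```python
import base64
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RegionInfo:
    """Position and size of the internal region 104 within the captured image."""
    x: int
    y: int
    w: int
    h: int


@dataclass(frozen=True)
class CheckInRequest:
    """Hypothetical message sent from the terminal to the inference device."""
    image: bytes
    region: RegionInfo
    user_id: str


def encode_request(req: CheckInRequest) -> str:
    """Serialize the request to JSON (image bytes are base64-encoded)."""
    return json.dumps({
        "image": base64.b64encode(req.image).decode("ascii"),
        "region": asdict(req.region),
        "user_id": req.user_id,
    })


def decode_request(payload: str) -> CheckInRequest:
    """Parse the JSON payload on the receiving side and validate the region."""
    data = json.loads(payload)
    region = RegionInfo(**data["region"])
    if region.w <= 0 or region.h <= 0:
        raise ValueError("region must have positive width and height")
    return CheckInRequest(base64.b64decode(data["image"]), region, data["user_id"])
```

Validating the region on the receiving side is a design choice of this sketch rather than something the text specifies.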

[0080] The result display unit 85 displays the check-in judgment result received by the terminal-side communication unit 84. As will be described later, the check-in judgment result indicates either success or failure of the check-in. If the check-in is successful, the user can expect to be given benefits linked to their user information. If the check-in fails, the result includes information about the products that the object recognition AI model 43 could not identify. The user can therefore retake the photo of the products based on the check-in judgment result displayed by the result display unit 85.

[0081] Next, we will describe an example of the functional configuration of the inference device 60. The inference device 60 includes an inference device side communication unit 91, captured images 83 recorded in a recording device 64, an image normalization unit 92, an object recognition AI model 43, an object recognition determination unit 93, an image classification AI model 53, a score determination unit 94, and a determination result table 95.

[0082] The inference device side communication unit 91 acquires, from the terminal-side communication unit 84 of the information processing terminal 2, a captured image 83 containing multiple products and a surrounding image recognition area 105 that is set in a part of the captured image 83 for identifying the surroundings of the products. The surrounding image recognition area 105 is specified within the captured image 83 by the coordinates of one corner of a rectangular boundary line and the width and height measured from that corner, as shown in Figure 8, and is acquired as region information. The inference device side communication unit 91 also acquires user information transmitted from the terminal-side communication unit 84 of the information processing terminal 2 via the network N. The captured image 83, region information, and user information are stored in the recording device 64 of the inference device 60. The inference device side communication unit 91 also transmits the object identification determination result produced by the object recognition determination unit 93 to the terminal-side communication unit 84 of the information processing terminal 2 via the network N.

[0083] The image normalization unit 92 normalizes the captured image 83 read from the recording device 64. Products are photographed in various aspect ratios depending on the type of information processing terminal 2. Normalization of the captured image 83 is performed so that the object recognition AI model 43 can process captured images 83 taken in various aspect ratios in a common manner. Normalization of the captured image 83 is extremely important for improving the accuracy of product recognition by the object recognition AI model 43.

[0084] The image normalization unit 92 calculates the size of the internal region 104 corresponding to the image after normalization. This is called the normalization frame. The image normalization unit 92 then determines the resampling ratio according to the ratio between the normalization frame and the size of the input captured image 83. After determining the region to be retained after resampling, the image normalization unit 92 actually resamples the captured image 83, thereby enabling normalization of the captured image 83 centered on the internal region 104.
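One possible reading of this normalization procedure is sketched below. It is an interpretation, not the method as specified: it assumes that the resampling ratio is chosen so that the internal region 104 fits the normalization frame, and that the retained region is the window of the input image that maps onto the normalized output while staying centered on the internal region. The function name, argument layout, and sample sizes are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def plan_normalization(internal: Box,
                       frame_size: Tuple[float, float],
                       out_size: Tuple[float, float]) -> Tuple[float, Box]:
    """Return (resampling ratio, region of the input image to retain).

    internal   : internal region in input-image pixels
    frame_size : size of the normalization frame in output pixels
    out_size   : size of the normalized output image in pixels
    """
    x, y, w, h = internal
    frame_w, frame_h = frame_size
    out_w, out_h = out_size
    if min(w, h, frame_w, frame_h, out_w, out_h) <= 0:
        raise ValueError("all sizes must be positive")
    # Scale so that the internal region fits inside the normalization frame.
    scale = min(frame_w / w, frame_h / h)
    # Keep the window centered on the internal region's center point.
    cx, cy = x + w / 2, y + h / 2
    keep_w, keep_h = out_w / scale, out_h / scale
    return scale, (cx - keep_w / 2, cy - keep_h / 2, keep_w, keep_h)


scale, keep = plan_normalization((150, 195, 700, 910), (350, 455), (500, 650))
```

The retained region can extend past the edges of the input image when the internal region is off-center, so an actual resampling step would need a padding or clamping policy, which this sketch leaves open.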

[0085] The object recognition AI model 43 is copied from the recording device 34 of the learning device 30 to the recording device 64 of the inference device 60. The object recognition AI model 43 takes the normalized captured image 83 as input and sets an object detection region 110 (see Figure 14 described later) where an object is detected for each of the multiple products shown in the captured image 83, and identifies the product for each object detection region 110.

[0086] The object recognition determination unit 93 performs a first determination for user authentication based on the position of the object detection area 110 relative to the surrounding image recognition area 105. As shown in Figure 15, which will be described later, the object recognition determination unit 93 determines that the first determination has failed if the overlap between the object detection area 110 and the surrounding image recognition area 105 is greater than or equal to a predetermined amount (for example, 10-30% or more of the area of the surrounding image recognition area 105).

[0087] The object recognition determination unit 93 also determines that the first determination has failed if the overlap between the object detection area 110 and the surrounding image recognition area 105 is less than the predetermined amount but the set of identified products does not satisfy the set condition. The set condition is used to determine whether the set of identified products corresponds to a predetermined combination of multiple different products. Accordingly, the object recognition determination unit 93 determines that the first determination has succeeded if the overlap with the surrounding image recognition area 105 is less than the predetermined amount and the set of identified products satisfies the set condition.

[0088] The object identification determination result determined by the first determination is transmitted to the information processing terminal 2 via the inference device side communication unit 91. The object identification determination result is also recorded in the determination result table 95. The contents recorded in the determination result table 95 are referred to as appropriate by the administrator of the inference device 60.

[0089] Thus, the inference device 60 can implement processing that refuses check-in if the region of a product detected by the object recognition AI model 43 strays from the shooting range in the center of the screen, that is, if the image of the product encroaches on the surrounding image recognition area 105 and the uncovered part of the surrounding image recognition area 105 falls below a certain area (for example, less than 70-90% of the area of the surrounding image recognition area 105). The reward system 10 can therefore realize a mechanism that naturally leads the user to capture the surrounding image recognition area 105. Furthermore, in this embodiment, the object recognition AI model 43 can identify products without depending on the size or aspect ratio of a specific boundary frame 106.

[0090] The image classification AI model 53 is copied from the recording device 34 of the learning device 30 to the recording device 64 of the inference device 60. When the object recognition determination unit 93 determines that the first determination is successful, the image classification AI model 53 classifies the captured image 83 into positive or negative examples and outputs a score.

[0091] The score determination unit 94 performs a second determination for authentication, determining the captured image 83 as a positive example if the score output by the image classification AI model 53 is less than the classification threshold, and determining the captured image 83 as a negative example if the score is equal to or greater than the classification threshold. This second determination is performed to detect check-in fraud using the captured image 83, which includes not only the internal region 104 shown in Figure 8 but also the surrounding image identification region 105.

[0092] The scores output by the image classification AI model 53 span a range of values, and using only one classification threshold may result in a captured image 83 that should be classified as a positive example being misclassified as a negative example or, conversely, a captured image 83 that should be classified as a negative example being misclassified as a positive example. Therefore, the score determination unit 94 splits the classification threshold into a first classification threshold and a second classification threshold greater than the first, and judges the captured image 83 against both. As shown in Figure 17 described later, the score determination unit 94 determines "Pass", representing a positive example, if the score is less than the first classification threshold, and "Fail", representing a negative example, if the score is equal to or greater than the second classification threshold. Captured images 83 determined as "Fail" are audited by the administrator. If the score is equal to or greater than the first classification threshold and less than the second classification threshold, the score determination unit 94 determines "Borderline", meaning that the image cannot be classified as either a positive or a negative example. Captured images 83 determined as "Borderline" are reviewed by the administrator. The score output by the image classification AI model 53 is recorded in the determination result table 95 as the result of the second determination by the score determination unit 94, and the object recognition determination unit 93 can refer to the result of the second determination.
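The two-threshold judgment described above can be sketched as a small function. The comparison directions follow the text (a lower score means a positive example); the function name and the sample threshold values 0.3 and 0.7 are hypothetical, since the text does not give concrete thresholds.

```python
def judge_score(score: float, first_threshold: float, second_threshold: float) -> str:
    """Classify a score as "Pass", "Borderline", or "Fail".

    score <  first_threshold                     -> "Pass" (positive example)
    first_threshold <= score < second_threshold  -> "Borderline"
    score >= second_threshold                    -> "Fail" (negative example)
    """
    if not first_threshold < second_threshold:
        raise ValueError("second_threshold must be greater than first_threshold")
    if score < first_threshold:
        return "Pass"
    if score >= second_threshold:
        return "Fail"
    return "Borderline"


# Hypothetical thresholds, chosen only for illustration.
verdicts = [judge_score(s, 0.3, 0.7) for s in (0.1, 0.3, 0.5, 0.7, 0.9)]
```

Note that a score exactly equal to the first threshold falls into "Borderline" and a score exactly equal to the second threshold falls into "Fail", matching the "equal to or greater than" wording.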

[0093] In the reward system 10, rewards are awarded to users whose first judgment is determined to be successful and whose second judgment is determined to be "Pass". For example, a user whose captured image 83 is determined to be a positive example by the score judgment unit 94 will be awarded a reward. On the other hand, if the score judgment unit 94 determines that the captured image 83 is a negative example, the user's authentication will fail. A user who has submitted a captured image 83 that is determined to be a negative example is highly likely to have performed an unauthorized operation. For users who have performed an unauthorized operation, even if a reward has been given, the reward may be revoked by the administrator or the account may be deleted.
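How the two determinations combine into a reward decision can be sketched as follows. The rule that a reward requires a successful first determination and a "Pass" in the second comes from the text. How a "Borderline" image is handled before the administrator's review is not specified, so withholding the reward in that case is an assumption of this sketch, as are the names `Outcome` and `decide_check_in`.

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    grant_reward: bool   # whether the reward is granted at this point
    admin_review: bool   # whether the image is flagged for the administrator


def decide_check_in(first_ok: bool, second_verdict: Optional[str]) -> Outcome:
    """Combine the first determination with the second-determination verdict."""
    if not first_ok:
        # The second determination only runs after a successful first one.
        return Outcome(grant_reward=False, admin_review=False)
    if second_verdict == "Pass":
        return Outcome(grant_reward=True, admin_review=False)
    if second_verdict == "Fail":
        return Outcome(grant_reward=False, admin_review=True)
    if second_verdict == "Borderline":
        # Assumption: withhold the reward until the administrator has reviewed.
        return Outcome(grant_reward=False, admin_review=True)
    raise ValueError(f"unexpected verdict: {second_verdict!r}")
```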

[0094] <Example of processing between an information processing terminal and an inference device> In the check-in service provided to the user by the reward system 10, the inference device 60 receives an image taken by the application and the coordinate information of the boundary frame 106 in the image from the information processing terminal 2 as a check-in token. It is preferable that the reward system 10 does not accept images from any client, but only images taken from a specific application or a specific web page. Therefore, secure session management is required, for example, so that only a specific user can send the captured image 83 used for check-in determination.

[0095] Here, we will explain the check-in service performed by the information processing terminal 2 and the inference device 60, including the processing on the inference device 60 side, with reference to the example screens displayed on the information processing terminal 2 shown in Figure 9. Figure 12 is a sequence diagram showing an example of processing between the information processing terminal 2 and the inference device 60.

[0096] The user operating the information processing terminal 2 launches an application and activates the camera 28 through the application. When the camera 28 is activated, the output device 27 displays the internal area 104 and the shooting area 103, which includes the surrounding image identification area 105, etc., as shown in the display example (1) of Figure 9 (S21).

[0097] When a user takes a picture of a set (multiple products) at once (S22), a captured image 83 showing multiple products is displayed, as shown in the display example (2) of Figure 9. In addition, a message is displayed in the message display area 107 asking the user whether to send the captured image to the server (inference device 60) and check it in. When the user presses the OK button, the terminal-side communication unit 84 sends the captured image 83, area information, and user information to the inference device 60 (S23).

[0098] The inference device side communication unit 91 of the inference device 60 receives the captured image 83, region information, and user information from the information processing terminal 2 (S31). The inference device side communication unit 91 records the captured image 83, region information, and user information acquired from the information processing terminal 2 in the recording device 64 (S32).

[0099] Next, object identification inference processing is performed (S33), and the object identification result is output to the object identification determination unit 93. Here, an example of the object identification inference processing in step S33 will be described. Figure 13 is a flowchart showing an example of object recognition inference processing.

[0100] First, the image normalization unit 92 acquires the captured image 83 read from the recording device 64 and normalizes the captured image 83 (S41).

[0101] Next, the object recognition AI model 43 detects the internal region 104 from the normalized captured image 83 (S42). Then, the object recognition AI model 43 extracts the object detection region of the product shown in the normalized captured image 83 and performs object recognition (assigning the product name) (S43), and proceeds to step S34.

[0102] Figure 14 shows an example of object identification results. In step S33 of Figure 12 and the object recognition inference process shown in Figure 13, products are individually identified from the captured image 83 in which multiple products are visible. For example, in shooting examples (1) to (4) of Figure 14, even though the cup, burger, and fries are arranged arbitrarily, the product name of each identified product is attached as a label to the object detection area 110 indicated by the bounding box detected for that product. Furthermore, as shown in shooting examples (3) and (4), the products are correctly identified even when part of the cup or burger is hidden by the fries.

[0103] Returning to the explanation of Figure 12. In the object recognition inference process, the object recognition AI model 43 detects products based on the captured image 83. At this time, the object recognition AI model 43 outputs the label (identifier string) of the identified product, the score of the identified product, and detection area information (x, y, w, h) representing the object detection area 110 to the object recognition determination unit 93 as the object recognition result.

[0104] The object recognition determination unit 93 performs object identification determination processing based on the object identification result (S34). Here, an example of the object identification determination processing in step S34 will be described. Figure 15 is a flowchart illustrating an example of object identification determination processing. Object identification determination processing is an example of the first determination performed for user authentication.

[0105] First, the object recognition determination unit 93 compares the surrounding image recognition area 105 acquired by the inference device side communication unit 91 in step S31 with the object detection area 110 identified from the detection area information (S51). Next, the object recognition determination unit 93 determines whether the overlap of the object detection area 110 with respect to the surrounding image recognition area 105 is greater than or equal to a predetermined amount (S52). The overlap of the object detection area 110 with respect to the surrounding image recognition area 105 is determined, for example, by the ratio of the area of the object detection area 110 that extends into the surrounding image recognition area 105 to the area of the surrounding image recognition area 105. For example, if the overlapping area of the object detection area 110 with respect to the surrounding image recognition area 105 exceeds 10-30% of the area of the surrounding image recognition area 105, it is determined that the overlap is greater than or equal to the predetermined amount. Note that coordinate calculation or collision detection checks may be used to determine the overlap of the object detection area 110 with respect to the surrounding image recognition area 105.
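The ratio used in step S52 can be sketched with plain rectangle arithmetic. This is an illustration under stated assumptions, not the patented implementation: the surrounding image recognition area is modeled as the ring left when the internal area is removed from the whole captured image, boxes are (x, y, w, h) tuples with a top-left origin, and the default threshold of 0.10 is simply one value picked from the 10-30% range mentioned above. All function names are hypothetical.

```python
def area(box):
    """Area of an (x, y, w, h) box; degenerate boxes count as zero."""
    return max(box[2], 0.0) * max(box[3], 0.0)


def intersection(a, b):
    """Rectangle where a and b overlap (zero width or height if disjoint)."""
    x1 = max(a[0], b[0])
    y1 = max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    return (x1, y1, max(x2 - x1, 0.0), max(y2 - y1, 0.0))


def overlap_ratio(detection, image, inner):
    """Share of the surrounding ring (image minus inner) covered by detection."""
    ring_area = area(image) - area(intersection(image, inner))
    if ring_area <= 0:
        return 0.0
    in_image = intersection(detection, image)
    in_inner = intersection(in_image, inner)
    return (area(in_image) - area(in_inner)) / ring_area


def overlap_reaches_threshold(detection, image, inner, threshold=0.10):
    """True when the overlap is greater than or equal to the assumed threshold."""
    return overlap_ratio(detection, image, inner) >= threshold


image = (0, 0, 100, 100)   # whole captured image (illustrative size)
inner = (10, 10, 80, 80)   # internal area inside the boundary frame
ratio_outside = overlap_ratio((0, 0, 50, 50), image, inner)
ratio_inside = overlap_ratio((20, 20, 30, 30), image, inner)
```

With these sample boxes the first detection covers a quarter of the ring, while the second lies entirely inside the internal area and contributes no overlap.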

[0106] In step S52, if the object recognition determination unit 93 determines that the overlap of the object detection area 110 with respect to the surrounding image recognition area 105 is greater than or equal to the predetermined amount (YES in S52), it determines that the check-in is "failed". The object recognition determination unit 93 then outputs the object recognition determination result of "failure" (S53) and returns to step S35 in Figure 12.

[0107] In step S52, if the object identification determination unit 93 determines that the overlap is less than the predetermined amount (NO in S52), it determines whether the set of products identified by the object identification AI model 43 satisfies the set condition (S54). Specifically, the object identification determination unit 93 confirms, by pattern matching of the identifier strings, that the combination of product names included in the set menu is included in the labels output by the object identification AI model 43.

[0108] In step S54, if the object identification determination unit 93 determines that the set of identified items satisfies the set condition (YES in S54), it determines that the check-in is "successful," that is, the first determination is successful. The object identification determination unit 93 then outputs the object identification determination result of "success" (S55) and returns to step S35 in Figure 12.

[0109] On the other hand, in step S54, if the object identification determination unit 93 determines that the set of identified products does not satisfy the set condition (NO in S54), it determines that the check-in is "failed," that is, the first determination is a failure. The object identification determination unit 93 then outputs the object identification determination result, which includes the fact that the check-in was determined to be "failed" and the product names that do not belong to the combination (S56), and returns to step S35 in Figure 12. As the product names that do not belong to the predetermined combination among the products identified by the object identification AI model 43 are output, the information processing terminal 2 displays a message for the products that do not belong to the predetermined combination, as shown in the display example (3) in Figure 10.
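Steps S52 to S56 can be summarized as a short decision function. This is a hedged sketch: the function name, the argument layout, and the default threshold are hypothetical, the per-detection overlap ratios are assumed to have been computed beforehand, and plain string equality over multisets stands in for the "pattern matching of the identifier string". Treating a missing or an extra product as a failure is also an assumption about how the set condition is evaluated.

```python
from collections import Counter


def first_determination(labels, overlaps, set_menu, threshold=0.10):
    """Return (result, unexpected_labels) following steps S52 to S56.

    labels    : product labels output by the object recognition model
    overlaps  : per-detection overlap ratio with the surrounding area
    set_menu  : labels that make up the predetermined combination
    threshold : assumed value of the "predetermined amount"
    """
    # S52/S53: any detection overlapping the surrounding area too much fails.
    if any(ratio >= threshold for ratio in overlaps):
        return "failure", []

    # S54: compare the identified products with the set menu as multisets.
    found, required = Counter(labels), Counter(set_menu)
    unexpected = sorted((found - required).elements())
    missing = required - found
    if not unexpected and not missing:
        return "success", []       # S55: the set condition is satisfied
    return "failure", unexpected   # S56: report products outside the set


menu = ["burger", "fries", "cup"]
ok = first_determination(["cup", "burger", "fries"], [0.0, 0.0, 0.0], menu)
too_close = first_determination(["cup", "burger", "fries"], [0.2, 0.0, 0.0], menu)
wrong_item = first_determination(["cup", "burger", "nuggets"], [0.0, 0.0, 0.0], menu)
```

The list returned on failure corresponds to the product names that do not belong to the combination, which the terminal can use when displaying a message.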

[0110] Here, we will explain an example of object recognition results from the object recognition AI model 43. Figure 16 shows examples of object identification results and check-in results. The object recognition result (1) of the captured image 83 shown in Figure 16 is indicated by the object detection region 110, which is a bounding box of a rectangular frame shown by a dashed line, showing how the object recognition AI model 43 detected each product in the captured image 83. Each object detection region 110 is labeled with the product name identified by the object recognition AI model 43. Since the object detection region 110 of each product detected by the object recognition AI model 43 is inside the internal region 104, the check-in is determined to be "successful".

[0111] The object recognition result (2) of the captured image 83 shown in Figure 16 is also shown in the object detection region 110, which is a rectangular frame indicated by a dashed line, showing how the object recognition AI model 43 detected each product in the captured image 83. However, since the overlap between the surrounding image recognition region 105 and the object detection region 110 labeled "Cup" and "Potato" is greater than a predetermined amount, the check-in is judged as "failure".

[0112] Returning to the explanation of Figure 12. The inference device communication unit 91 transmits the object identification determination result to the information processing terminal 2 regardless of whether the check-in determination is successful or unsuccessful (S35). The object identification determination result, which is the result of the first determination, is recorded in the determination result table 95, as shown in Figure 18.

[0113] The terminal-side communication unit 84 of the information processing terminal 2 receives the object identification determination result from the inference device 60 (S24). The output device 27 of the information processing terminal 2 displays the determination result based on the object identification determination result (S25), and the processing of the information processing terminal 2 ends.

[0114] On the other hand, in the inference device 60, if the check-in is determined to be "successful" in step S34, that is, if the first determination is determined to be successful, the image classification inference process is continued (S36). Here, an example of the image classification inference process in step S36 will be described. Figure 17 is a flowchart showing an example of image classification inference processing.

[0115] First, the image classification AI model 53 classifies the captured image 83 read from the recording device 64 and outputs a score (S61). Next, the score determination unit 94 compares the score with a classification threshold and performs a second determination for authentication, determining whether the captured image 83 is a positive or negative example.

[0116] Specifically, the score determination unit 94 determines whether the score is less than the first classification threshold (S62). If the score is less than the first classification threshold (YES in S62), the score determination unit 94 writes "Pass" to the determination result table 95 and terminates the process. A "Pass" for the captured image 83 means that it is a valid image.

[0117] If the score is equal to or greater than the first classification threshold (NO in S62), the score determination unit 94 determines whether the score is equal to or greater than the second classification threshold, which is greater than the first classification threshold (S64). If the score is equal to or greater than the second classification threshold (YES in S64), the score determination unit 94 writes "Fail" to the determination result table 95 and terminates the process. A "Fail" for the captured image 83 means that it is an invalid image.

[0118] If the score is not equal to or greater than the second classification threshold (NO in S64), the score is equal to or greater than the first classification threshold but less than the second classification threshold. In this case, the score determination unit 94 writes "Borderline" to the determination result table 95 and terminates processing. If the captured image 83 is "Borderline", it means that it is not possible to determine whether it is a valid image or an invalid image.
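The two-threshold decision in steps S62 and S64 can be written as a three-way classification. In this sketch the function name and the numeric thresholds in the examples are illustrative assumptions; the text only requires that the second classification threshold be greater than the first.

```python
def classify_score(score, first_threshold, second_threshold):
    """Map a classification score to "Pass", "Borderline", or "Fail"."""
    if not first_threshold < second_threshold:
        raise ValueError("second threshold must be greater than the first")
    if score < first_threshold:      # YES in S62: treated as a valid image
        return "Pass"
    if score >= second_threshold:    # YES in S64: treated as an invalid image
        return "Fail"
    return "Borderline"              # between the thresholds: cannot decide


# Illustrative thresholds; the patent does not give concrete values.
results = [classify_score(s, 0.3, 0.7) for s in (0.1, 0.3, 0.69, 0.7, 0.95)]
```

Note that a score exactly equal to the first threshold falls into "Borderline", and a score exactly equal to the second threshold falls into "Fail", matching the "less than" and "equal to or greater than" wording above.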

[0119] <Example of the structure of the judgment result table> Figure 18 shows an example of the configuration of the judgment result table 95. The judgment result table 95 manages user information, captured images 83, and the judgment results of each AI model. The judgment result table 95 includes the following items: user ID, image reception date and time, captured image, object identification result, object identification judgment result, image classification result, future processing, and authentication result.

[0120] The user ID item stores the user ID of the user who submitted the captured image 83. If a single user submits images multiple times, a record is created each time an image is submitted. The image reception date and time item stores the date and time (year, month, day, and time) at which the inference device 60 received the captured image 83 transmitted from the information processing terminal 2. The captured image item stores the file of the captured image 83 received from the information processing terminal 2. Here, an example is shown in which a JPEG image file is stored, but any image file is acceptable regardless of its extension. Alternatively, the captured image item may store location information (such as a path) indicating where the file of the captured image 83 is located.

[0121] The object identification result item stores the object recognition result obtained by the object recognition AI model 43, which identifies the products in the captured image 83. Here, the names of the products included in the set menu are stored as the object recognition result. The object identification judgment result item stores either "Success" or "Failure" as the result of the object identification determination.

[0122] The image classification result item stores the image classification result determined by the score determination unit 94 based on the score obtained by the image classification AI model 53 in classifying the captured image 83 as either a positive or negative example. As described above, if the image classification AI model 53 classified the captured image 83 as a positive example, it is represented as "Pass," and if it classified it as a negative example, it is represented as "Fail." Note that since the image classification AI model 53 represents the image classification result as a score, if neither the score for the positive example nor the negative example is significantly high and manual verification is required, the image classification result is represented as "Borderline." Also, if the object identification determination result is "Failure," the image classification inference process is not performed, and the image classification result remains blank.

[0123] The future processing item stores the next processing to be performed based on the object identification judgment result and the image classification result. The authentication result item stores the final authentication result. The final authentication result may be registered by the administrator, because the second determination may fail even if the first determination is successful.
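One record of the judgment result table 95 could be modeled as follows. This is a sketch under assumptions: the class and field names are hypothetical renderings of the items listed above, the captured image is represented by a path string, and `None` stands for a blank cell. The sample values are illustrative and not taken from Figure 18.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class JudgmentRecord:
    """One row of the judgment result table (hypothetical field names)."""
    user_id: str
    received_at: datetime                     # image reception date and time
    captured_image: str                       # image file or its location (path)
    object_identification_result: List[str]   # product names identified
    object_identification_judgment: str       # "Success" or "Failure"
    image_classification_result: Optional[str] = None  # blank if not classified
    future_processing: Optional[str] = None
    authentication_result: Optional[str] = None


# A submission that failed the first determination: classification stays blank.
record = JudgmentRecord(
    user_id="user2",
    received_at=datetime(2024, 12, 9, 12, 30),
    captured_image="images/example.jpg",
    object_identification_result=["cup", "burger"],
    object_identification_judgment="Failure",
)
```

Because a record is created for every submission, a second image from the same user would simply be another `JudgmentRecord` with the same `user_id`.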

[0124] If, as with user1, the object recognition result is "successful" and the image classification result is "Pass," the final user authentication is determined to be "successful," and the user is granted a benefit (an example of a reward).

[0125] If, like user2, the object recognition result for the first submitted image 83 is "failure," the final user authentication is determined to be "failure," and the user is not granted any rewards. However, if the object recognition result for the second submitted image 83 is "successful," an image classification result is obtained. If the image classification result is "Pass," the final user authentication is determined to be "successful," and the user is granted rewards.

[0126] As with user3, if the object identification determination result is "successful" but the image classification result is "Fail," there is a high possibility that the captured image 83 is fraudulent. For this reason, the administrator of the inference device 60 audits the captured image 83. If it is confirmed that the captured image 83 was obtained fraudulently, measures such as revoking the benefits for user3 are taken. In other words, even if the object identification determination result is "successful" and the image classification result is "Pass," if the final user authentication is determined to be "failed," the user will not be granted any benefits. In this embodiment, since the image classification AI model 53 classifies fraudulent activity in the captured image 83, users attempting fraudulent activity (cheating) inevitably have to repeat their attempts. As a result, the history of fraudulent activities accumulates in the determination result table 95, making it easier for the administrator to take measures such as revoking benefits for users attempting fraud or suspending their accounts.

[0127] As with user4, even if the object identification determination result is "successful," if the image classification result is "Borderline," the image classification AI model 53 was unable to classify the captured image 83 decisively, and the administrator reviews the captured image 83. The final user authentication therefore becomes either "successful" or "failed," and the result of the administrator's review is stored in the authentication result item.
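The four cases described for user1 to user4 can be condensed into one small function. This is a sketch only: the function name and the return convention (a provisional authentication result plus a flag indicating that the administrator must review the image) are assumptions, and the final result in the review cases is left undecided because the text says the administrator registers it.

```python
from typing import Optional, Tuple


def provisional_authentication(
    object_judgment: str, classification: Optional[str]
) -> Tuple[Optional[str], bool]:
    """Return (authentication_result, needs_admin_review)."""
    if object_judgment != "Success":
        # user2, first submission: no image classification is performed.
        return "Failure", False
    if classification == "Pass":
        # user1: both determinations succeeded, so the benefit is granted.
        return "Success", False
    if classification in ("Fail", "Borderline"):
        # user3 and user4: the administrator audits or reviews the image.
        return None, True
    raise ValueError("unexpected image classification result")


cases = {
    "user1": provisional_authentication("Success", "Pass"),
    "user2": provisional_authentication("Failure", None),
    "user3": provisional_authentication("Success", "Fail"),
    "user4": provisional_authentication("Success", "Borderline"),
}
```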

[0128] In the reward system 10 according to the first embodiment described above, product packaging, which contains a lot of noise such as deformation and dirt due to folding, is used as a check-in point, and combinations of these, i.e., products included in a "set menu," can be identified with high accuracy. For this reason, the object recognition AI model 43 can simultaneously identify multiple products to be checked in from products that appear in any captured image 83. In addition, the image classification AI model 53 outputs a score as the result of classifying whether the captured image 83 was fraudulently created or tampered with using the images captured in the surrounding image recognition area 105, and the score determination unit 94 can determine the score and classify the image. For this reason, the inference device 60 can use product packaging, which contains a lot of noise such as deformation and dirt due to folding, as a check-in point, and simultaneously identify combinations of these, i.e., "set menus," and implement fraud prevention measures, thereby preventing fraud.

[0129] Furthermore, by combining the object recognition AI model 43 and the image classification AI model 53, the reward system 10 can be constructed to be robust against fraud while identifying set menus containing various products at high speed (processing time within 1 second) and with high accuracy. Specifically, it is possible to detect a set of predetermined products from captured images 83, which include noise and occlusion, taken using the camera 28 mounted on the information processing terminal 2, and to check whether the captured images have been intentionally altered or processed.

[0130] Furthermore, the output device 27 of the information processing terminal 2 displays the check-in screen W1, which the user uses to perform the check-in operation. The check-in screen W1 displays a rectangular boundary frame 106 in the center of the screen, which is narrower than the actual shooting area 103. This makes it easier for the user to operate the system so that the product to be checked in is captured within the boundary frame 106.
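A centered rectangular frame that is narrower than the shooting area can be derived from the image size and a margin. The sketch below is an assumption-laden illustration: the function name, the `margin_ratio` parameter, and its default of 10% per side are hypothetical, since the text does not state how much smaller the boundary frame 106 is than the shooting area 103.

```python
def centered_frame(width, height, margin_ratio=0.1):
    """Return (x, y, w, h) of a frame centered in a width x height area.

    margin_ratio is the assumed fraction of each dimension left outside
    the frame on every side; that margin forms the surrounding area.
    """
    if not 0 <= margin_ratio < 0.5:
        raise ValueError("margin_ratio must be in [0, 0.5)")
    margin_x = round(width * margin_ratio)
    margin_y = round(height * margin_ratio)
    return (margin_x, margin_y, width - 2 * margin_x, height - 2 * margin_y)


frame = centered_frame(1000, 800)
```

The returned tuple uses one corner plus a width and height, the same form in which detection areas are expressed, so the two can be compared directly.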

[0131] Furthermore, the object recognition determination unit 93 determines the object recognition determination result as "failure" and does not allow check-in if the object detection area of the product detected by the object recognition AI model 43 extends beyond the internal area 104 specified by the area information. If the object recognition determination result is determined as "failure," a message prompting the user to take a picture so that the product is inside the boundary frame 106 is displayed on the check-in screen W1. Therefore, the user can try checking in again by taking the picture again.

[0132] Furthermore, determining whether the captured image 83 is a fraudulent image requires, in particular, an image of the surrounding image recognition region 105. By naturally guiding the user to take a picture so that the product is inside the boundary frame 106, an image of the surrounding image recognition region 105 large enough to classify whether the captured image is fraudulent is also captured. As a result, the classification accuracy of the image classification AI model 53 can be improved, and the tamper resistance of the reward system 10 can be increased.

[0133] Furthermore, if the object recognition result is determined to be successful and the captured image 83 is classified as legitimate, users can, for example, be given in-game rewards. This makes it possible to promote sales by incentivizing users to purchase the menu items that are subject to check-in.

[0134] [Second Embodiment] Next, a reward system according to a second embodiment of the present invention will be described. In the client-server system configured with the reward system 10 according to the first embodiment, the approach of deploying the AI model on the server side is only one embodiment. For example, by using a small AI model for check-in purposes, the service can be flexibly designed to match the scale and budget of the initiative. It is also possible to install the small AI model on the information processing terminal 2 and allow check-ins without using network communication.

[0135] Here, the reward system 10A according to the second embodiment will be described with reference to Figure 19. Figure 19 is a block diagram showing an example of the overall configuration of the reward system 10A according to the second embodiment.

[0136] The reward system 10A includes an information processing terminal 2A and an inference device 60A. In addition to the functional units of the information processing terminal 2 described with reference to Figure 11, the information processing terminal 2A includes a terminal object recognition AI model 45, an image normalization unit 86, and an object recognition determination unit 87. The terminal object recognition AI model 45 is an optimized AI model converted from the object recognition AI model 43 by the object recognition AI model conversion unit 44 shown in Figure 4. By preparing the terminal object recognition AI model 45 in advance, the information processing terminal 2A can make the terminal object recognition AI model 45 perform the same operations as the object recognition AI model 43.

[0137] The image normalization unit 86 has the same function as the image normalization unit 92 shown in Figure 11. The object identification determination unit 87 has the same function as the object identification determination unit 93 shown in Figure 11. Therefore, in the information processing terminal 2A according to the second embodiment, the image normalization unit 86 normalizes the captured image 83, the terminal object recognition AI model 45 identifies the products in the normalized captured image 83, and the object recognition determination unit 87 can independently perform the process of determining whether to check in based on the object recognition result.

[0138] The captured image 83 is transmitted to the inference device 60A via the terminal-side communication unit 84. The inference device-side communication unit 91 records the captured image 83 in the recording device 64. The image classification AI model 53 classifies the captured image 83 into positive or negative examples and outputs a score, and the score determination unit 94 records the image classification result with the determined score in the determination result table 95.

[0139] In the reward system 10A according to the second embodiment described above, the terminal object recognition AI model 45 provided on the information processing terminal 2A identifies the products shown in the captured image 83, and the object recognition determination unit 87 makes a determination. Therefore, if the purpose is only check-in, the processing can be completed on the information processing terminal 2A, thus reducing the operational load on the inference device 60A.

[0140] [Third Embodiment] Next, a reward system according to a third embodiment of the present invention will be described. The reward system according to the third embodiment consists only of an information processing terminal 2B.

[0141] Here, the reward system 10B according to the third embodiment will be described with reference to Figure 20. Figure 20 is a block diagram showing an example of the overall configuration of the reward system 10B according to the third embodiment.

[0142] The reward system 10B includes an information processing terminal 2B. In addition to the functional units of the information processing terminal 2A described with reference to Figure 19, the information processing terminal 2B includes a terminal image classification AI model 55, a score determination unit 88, and a determination result table 89.

[0143] The terminal image classification AI model 55 is an optimized AI model converted from the image classification AI model 53 by the image classification AI model conversion unit 54 shown in Figure 4. By preparing the terminal image classification AI model 55 in advance, the information processing terminal 2B can make the terminal image classification AI model 55 perform the same operations as the image classification AI model 53.

[0144] Similar to the reward system 10A according to the second embodiment, the information processing terminal 2B can independently perform the process of identifying products from the captured image 83 and determining whether a check-in has occurred. Furthermore, in the information processing terminal 2B, the terminal image classification AI model 55 classifies the captured image 83 into positive or negative examples and outputs a score, and the score determination unit 88 records the image classification result determined by the score in the determination result table 89. The determination result table 89 may be encrypted so that the user cannot read its contents. In addition, the information in the determination result table 89 may be transmitted to a server managed by the administrator.

[0145] In the reward system 10B according to the third embodiment described above, the terminal object recognition AI model 45 provided in the information processing terminal 2B identifies the products in the captured image 83, and the object recognition determination unit 87 makes a determination. Furthermore, the terminal image classification AI model 55 provided in the information processing terminal 2B outputs a score by classifying the captured image 83 as either a positive or negative example, and the score determination unit 88 makes a determination based on that score. Therefore, even in an offline environment, the information processing terminal 2B can identify products from the captured image 83 and perform the process of classifying the captured image 83 as either a positive or negative example.

[0146] The AI models according to the second and third embodiments can be operated on both a server and a client. If the check-in measure is long-term or permanent, it can be operated on the client, which is the information processing terminal 2, and if it is short-term, it can be operated on the server, which is the inference device 60, thereby achieving low costs.

[0147] [Modifications] Furthermore, by using the WebGPU EP (Execution Provider), it is possible to load AI models directly into a web browser and perform inference without going through an application. Therefore, the reward system does not necessarily depend on a client/server architecture.

[0148] Furthermore, the AI models capable of realizing each of the above embodiments do not depend on any specific AI model (such as YOLO or ResNet mentioned above). AI models constructed using various technologies may be used as the object recognition AI model 43 and the image classification AI model 53.

[0149] It should be noted that the present invention is not limited to the embodiments described above, and various other applications and modifications can be adopted as long as they do not deviate from the gist of the present invention as described in the claims. For example, the embodiments described above explain the configuration of the apparatus and system in detail and specifically in order to illustrate the present invention clearly, and the invention is not necessarily limited to embodiments having all of the configurations described. Furthermore, it is possible to replace some of the configurations of an embodiment described here with the configurations of other embodiments, and it is also possible to add the configurations of other embodiments to the configuration of one embodiment. In addition, it is possible to add other configurations to, delete, or replace some of the configurations of each embodiment. Furthermore, the control lines and information lines shown are those deemed necessary for explanatory purposes, and not all control lines and information lines in the actual product are necessarily shown. In reality, it is safe to assume that almost all components are interconnected.

[Explanation of Symbols]

[0150] 2…Information processing terminal, 10…Reward system, 30…Learning device, 41…Learning data for object recognition, 42…Object recognition learning unit, 43…Object recognition AI model, 44…Object recognition AI model conversion unit, 45…Object recognition AI model for terminal, 51…Learning data for image classification, 52…Image classification learning unit, 53…Image classification AI model, 54…Image classification AI model conversion unit, 55…Image classification AI model for terminal, 60…Inference device, 81… 82…Shooting area display unit, 83…Shooting unit, 84…Terminal side communication unit, 85…Result display unit, 91…Inference device side communication unit, 92…Image normalization unit, 93…Object recognition determination unit, 94…Score determination unit, 95…Determination result table, 101…Shootable area, 102…Display area, 103…Shooting area, 104…Internal area, 105…Surrounding image recognition area, 106…Boundary frame, 107…Message display area, W1…Check-in screen

Claims

1. An inference program for causing a computer to execute: a procedure for acquiring a captured image in which multiple objects are photographed, and a surrounding image identification region set in a part of the captured image for identifying the surrounding image of the objects; a procedure in which an object recognition model sets, for each of the multiple objects captured in the image, an object detection region where the object is detected, and identifies the object for each object detection region; and a procedure for determining the success or failure of user authentication based on the position of the object detection region relative to the surrounding image identification region.

2. If the overlap between the object detection region and the surrounding image identification region is greater than or equal to a predetermined amount, the authentication is determined to have failed. The inference program according to claim 1.

3. If the overlap between the object detection area and the surrounding image recognition area is less than a predetermined amount, and the set of identified objects does not satisfy the set condition, the first determination performed for authentication is determined to be a failure. If the overlap with the surrounding image recognition area is less than a predetermined amount, and the set of identified objects satisfies the set condition, the first determination is determined to be a success. The inference program according to claim 2.

4. The aforementioned set condition is that the set of identified objects corresponds to a predetermined combination of a plurality of different objects. The inference program according to claim 3.

5. The system outputs information on the objects identified by the object identification model that do not fall under any predetermined combination. The inference program according to claim 4.

6. If the first determination is deemed successful, the image classification model outputs a score that classifies the captured image as either a positive or negative example. The procedure includes a second determination for authentication, in which the captured image is determined to be a positive example if the score is less than the classification threshold, and the captured image is determined to be a negative example if the score is equal to or greater than the classification threshold. The inference program according to claim 3.

7. On the display unit of the information processing terminal that photographs multiple objects, a ring-shaped boundary line indicating the surrounding image identification area is displayed. The inference program according to claim 3.

8. The aforementioned boundary line is rectangular in shape. The area of the surrounding image identification region is smaller than the area of the shooting region displayed on the display unit. The aforementioned surrounding image identification region is identified from the captured image by the coordinate information of one corner of the rectangular boundary line, and the width and height based on that corner. The inference program according to claim 7.

9. The display unit shows the brightness of the surrounding image identification area as lower than the brightness of the area inside the boundary line. The inference program according to claim 8.

10. An inference method comprising: a step of acquiring a captured image in which multiple objects are photographed, and a surrounding image identification region set in a part of the captured image for identifying the surrounding image of the objects; a step in which an object recognition model sets, for each of the multiple objects captured in the image, an object detection region where the object is detected, and identifies the object for each object detection region; and a step of determining whether user authentication is successful or unsuccessful based on the position of the object detection region relative to the surrounding image identification region.

11. An inference device comprising: a communication unit that acquires a captured image in which multiple objects are photographed, and a surrounding image identification region set in a part of the captured image for identifying the surrounding image of the objects; an object identification model that sets, for each of the multiple objects captured in the image, an object detection region where the object is detected, and identifies the object for each object detection region; and a determination unit that determines whether user authentication is successful or unsuccessful based on the position of the object detection region relative to the surrounding image identification region.

12. A learning program for causing a computer to execute: a procedure for having an object recognition model learn to identify multiple objects that appear in a predetermined combination in a captured image for object recognition training; and a procedure for recording the trained object recognition model in a recording unit.

13. A learning method comprising: a step of having an object recognition model learn to identify multiple objects that appear in a predetermined combination in a captured image for object recognition training; and a step of recording the trained object recognition model in a recording unit.

14. A learning device comprising an object recognition learning unit that causes an object recognition model to learn to identify multiple objects that appear in a predetermined combination in a captured image for object recognition training, and records the trained object recognition model in a recording unit.