system

The system uses image analysis and generative AI to automate license confirmation, addressing inefficiencies in conventional verification methods by reducing time and effort in police inquiries.

JP2026104514A · Pending · Publication Date: 2026-06-25 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
SOFTBANK GROUP CORP
Filing Date
2024-12-13
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Conventional license confirmation during police questioning is time-consuming and inefficient, placing a burden on both police officers and citizens. Past verification histories are also not managed effectively, which leads to repeated confirmation of the same person.

Method used

A system in which image acquisition means input driver's license images, generative AI analyzes them for text and facial recognition, and the results are compared with past verification history in a database to automate identity verification.

Benefits of technology

This approach reduces the burden on users and police officers by enabling rapid, efficient identity verification, minimizing duplicate checks and improving overall verification efficiency.

✦ Generated by Eureka AI based on patent content.


Abstract

[Problem] To provide a system. [Solution] A system including: means for inputting image data containing identification information using an image acquisition device; means for analyzing the input image data using generative AI technology and recognizing specific information; means for comparing the specific information with past identification history recorded in an information storage device; means for determining the necessity of identity verification based on the specific information and the comparison result; means for outputting the determination result; and means for providing immediate notification on a display device upon confirmation, to minimize waiting time.

Description

Technical Field

[0001] The technology of this disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a persona chatbot control method performed by at least one processor, including steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a chatbot character, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] In conventional license confirmation during police questioning, police officers must check license information manually. This takes time and burdens both police officers and ordinary citizens. In addition, past confirmation histories have not been properly managed, so the same person may be confirmed repeatedly, which reduces the efficiency of the confirmation work. The lack of a system for performing identity verification quickly and efficiently is therefore an issue.

Means for Solving the Problems

[0005] This invention provides a system that uses image acquisition means to input image data containing personal identification information, analyzes that image data using generative AI to recognize specific information, and then compares it with past verification history recorded in a database to quickly verify the information. This allows the system to determine whether re-verification is needed and, if necessary, to automate identity verification procedures. Furthermore, by using the camera of a mobile device, the system facilitates on-site operation and improves the efficiency of verification work.

[0006] "Image acquisition means" refers to a device or function for acquiring image data containing personal identification information from an external source.

[0007] "Generative AI" is an artificial intelligence technology that analyzes input data and extracts or recognizes specific information.

[0008] "Specific information" refers to the necessary data extracted from input image data by generative AI, and is used for identifying individuals.

[0009] A "database" is an information system used to record and manage past verification history.

[0010] "Verification" is the process of comparing specific information with past verification history to determine whether they match or differ.

[0011] "Identity verification" is a procedure to confirm that an individual is indeed the person correctly identified.

[0012] "Mobile devices" refer to portable electronic devices such as mobile phones, smartphones, and tablets.

Brief Explanation of the Drawings

[0013]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map to which a plurality of emotions are mapped.
[Figure 10] An emotion map to which a plurality of emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.

MODE FOR CARRYING OUT THE INVENTION

[0014] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0015] First, the terms used in the following description will be explained.

[0016] In the following embodiments, the numbered processor (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of multiple arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of multiple types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0017] In the following embodiments, the numbered RAM (Random Access Memory) is a memory in which information is temporarily stored and is used as a work memory by the processor.

[0018] In the following embodiments, the numbered storage is one or more non-volatile storage devices that store various programs and various parameters, etc. Examples of non-volatile storage devices include flash memory (SSD (Solid State Drive)), magnetic disks (e.g., hard disks), or magnetic tapes, and the like.

[0019] In the following embodiments, the numbered communication I / F (Interface) is an interface that includes a communication processor and an antenna, etc. The communication I / F controls communication between multiple computers. Examples of communication standards applied to the communication I / F include wireless communication standards including 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), or Bluetooth (registered trademark), and the like.

[0020] In the following embodiments, "A and / or B" is synonymous with "at least one of A and B." That is, "A and / or B" means that it may be A alone, or B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and / or B" applies when expressing three or more things linked by "and / or."

[0021] [First Embodiment]

[0022] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0023] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0024] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0025] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0026] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0027] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and / or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0028] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0029] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0030] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0031] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0032] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0033] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0034] The present invention provides a system for efficiently verifying driver's license information during police questioning, using a mobile terminal carried by the officer. The terminal takes a picture of the user's driver's license with its camera and sends the image to a server. Upon receiving the image, the server uses generative AI to analyze the text information and facial photograph within the image and extract the necessary specific information.

[0035] The server then uses the extracted specific information to access the database and compare it with past verification history. If an existing verification history exists in the database, it decides to skip the detailed verification process and notifies the terminal that verification is complete. This process reduces the burden on both the user and the police officer, and enables rapid identity verification.

[0036] As a concrete example, when a police officer takes a picture of a driver's license with the camera on their terminal during a traffic checkpoint, that information is automatically sent to the server. Subsequently, if the same user's license information has been verified in the past, it will immediately display "Verified" based on that information. This allows users to avoid duplicate verification, and police officers can improve their work efficiency.

[0037] The following describes the processing flow.

[0038] Step 1:

[0039] The user takes a picture of their driver's license using the terminal's camera and saves the image data to the terminal. The terminal then prepares to process this image data.

[0040] Step 2:

[0041] The terminal sends the image data of the photographed driver's license to the server. A secure communication protocol is used for transmission to protect the data.

[0042] Step 3:

[0043] The server analyzes the image data received from the terminal using generative AI. The generative AI recognizes the text on the driver's license (name, license number, expiration date, etc.) and extracts the necessary specific information by analyzing the facial photograph.

[0044] Step 4:

[0045] The server accesses the database based on the extracted specific information and searches for past verification history. If a matching history is found, it retrieves the information and performs a comparison.

[0046] Step 5:

[0047] The server determines whether reconfirmation is necessary based on the results of a comparison with past confirmation history. If it is determined that reconfirmation is not necessary, it generates a confirmation result that includes that statement.

[0048] Step 6:

[0049] The server sends the verification result to the terminal. The terminal displays the received result to the user and notifies them that the verification is complete.
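The six steps above can be sketched as a small server-side routine. This is only an illustration under stated assumptions: the generative-AI analysis is replaced by a stub that returns fixed sample data, the database is an in-memory dictionary, and the names `SpecificInfo`, `VerificationHistory`, `handle_license_image`, and `stub_analyze` are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class SpecificInfo:
    """Fields the analysis step is assumed to extract from the license image."""
    name: str
    license_number: str
    expiration_date: str  # kept as text, e.g. "2030-01-31"


class VerificationHistory:
    """In-memory stand-in for the database of past verification history."""

    def __init__(self) -> None:
        self._records: Dict[str, SpecificInfo] = {}

    def record(self, info: SpecificInfo) -> None:
        self._records[info.license_number] = info

    def find(self, license_number: str) -> Optional[SpecificInfo]:
        return self._records.get(license_number)


def handle_license_image(
    image: bytes,
    analyze: Callable[[bytes], SpecificInfo],
    history: VerificationHistory,
) -> dict:
    """Steps 3 to 6: analyze, look up history, decide, and build the result."""
    info = analyze(image)                        # Step 3: extract specific information
    past = history.find(info.license_number)     # Step 4: search past history
    needs_reverification = past != info          # Step 5: no record, or a mismatch
    status = "Re-verification required" if needs_reverification else "Verified"
    return {"status": status, "needs_reverification": needs_reverification}


def stub_analyze(image: bytes) -> SpecificInfo:
    """Placeholder for the generative-AI analysis; returns fixed sample data."""
    return SpecificInfo("Taro Yamada", "123456789012", "2030-01-31")


history = VerificationHistory()
first = handle_license_image(b"fake-image", stub_analyze, history)
# Simulate that this person has now been verified once in the past.
history.record(stub_analyze(b"fake-image"))
second = handle_license_image(b"fake-image", stub_analyze, history)
print(first["status"], "/", second["status"])
```

In this sketch the first request finds no history and asks for re-verification, while the second finds a matching record and is reported as verified, which mirrors the skip behavior described in paragraph [0035].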

[0050] (Example 1)

[0051] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0052] Traditional identity verification methods often required manual verification, resulting in inefficiency and time-consuming processes. Furthermore, the need for repeated verification added significant complexity and slow processing, placing a heavy burden on both police officers and users. There is a need to address these issues and provide a fast and efficient method of identity verification.

[0053] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0054] In this invention, the server includes means for inputting image data containing personal identification information using image acquisition means, means for analyzing the input image data using generative AI and recognizing specific information, and means for executing a face recognition algorithm using the analyzed specific information. This makes it possible to perform identity verification quickly and efficiently. This method significantly reduces the effort required even when re-verification is necessary, thereby reducing the burden on police officers and users.

[0055] "Image acquisition means" refers to devices or methods for acquiring image data that includes personal identification information.

[0056] "Generative AI" is a technology that uses generative artificial intelligence to analyze data and derive new information.

[0057] "Identifiable information" refers to information necessary to identify an individual, primarily including text data and facial image data.

[0058] A "face recognition algorithm" refers to a computational method used to identify individuals by analyzing facial features from images.

[0059] A "database" is a collection of information where past verification history and other related information are recorded and managed.

[0060] "Verification" is the process of comparing a large amount of information with newly acquired information to determine whether they match.

[0061] "Identity verification" refers to a series of procedures to confirm an individual's identity and determine whether they meet the required criteria.

[0062] A "portable information processing device" refers to a device that is easily portable and capable of performing calculations and information processing.

[0063] This invention provides a system for police officers to quickly verify the identity of individuals during questioning. Specifically, it uses a mobile terminal as a portable information processing device. The terminal's built-in camera captures an image of the user's driver's license. This image data is transmitted to a server via a secure protocol.

[0064] The server analyzes the received image data using a generative AI model and extracts text information from the image using OCR technology. Simultaneously, it applies a facial recognition algorithm to extract features from the facial photograph. Through these processes, specific information for identifying an individual is generated. The generated information is used to query a database and is compared with past verification history.

[0065] For example, if a police officer photographs a driver's license with a camera during a traffic checkpoint, this information is processed in real time. This allows for a quick determination of whether the user is verified, and the information is sent to the device. An example of a prompt message used in this process is, "Extract text information from this image and identify and return the name, license number, and date of birth."
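The prompt quoted above asks the model to return three fields. The following sketch shows how a server might package that prompt with the image and parse the reply. It assumes, beyond what the source states, that the model is asked to answer as a JSON object with the keys `name`, `license_number`, and `date_of_birth`. The helpers `build_request` and `parse_reply` are hypothetical, and no real model API is called: a sample reply string stands in for the model output.

```python
import base64
import json

PROMPT = (
    "Extract text information from this image and identify and return "
    "the name, license number, and date of birth."
)
# Assumed reply keys; the disclosure does not specify a reply format.
REQUIRED_KEYS = ("name", "license_number", "date_of_birth")


def build_request(image: bytes) -> dict:
    """Bundle the prompt and the base64-encoded image into one payload."""
    return {
        "prompt": PROMPT
        + " Answer as a JSON object with keys: "
        + ", ".join(REQUIRED_KEYS)
        + ".",
        "image_base64": base64.b64encode(image).decode("ascii"),
    }


def parse_reply(reply_text: str) -> dict:
    """Parse the model's reply and reject it if a required field is missing."""
    data = json.loads(reply_text)
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    missing = [key for key in REQUIRED_KEYS if not data.get(key)]
    if missing:
        raise ValueError(f"reply is missing fields: {missing}")
    return {key: str(data[key]).strip() for key in REQUIRED_KEYS}


# Sample reply text standing in for a real model response.
sample_reply = (
    '{"name": "Taro Yamada", "license_number": "123456789012", '
    '"date_of_birth": "1990-04-01"}'
)
request = build_request(b"fake-image-bytes")
fields = parse_reply(sample_reply)
print(fields["license_number"])
```

Validating the reply before it reaches the database lookup keeps an incomplete or malformed model answer from being treated as a usable identity record.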

[0066] This streamlines the repetitive process of identity verification, significantly reducing the burden on both police officers and users. Furthermore, the faster processing allows police officers to focus on other tasks.

[0067] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0068] Step 1:

[0069] The user presents their driver's license to the police officer. The police officer uses a terminal to take a picture of the license. The input for this step is the physical driver's license, and the output is digital image data. Autofocus and exposure adjustment functions are in place to ensure the image is clear and accurate.

[0070] Step 2:

[0071] The terminal digitizes the captured image data and sends it to the server via a secure network protocol (e.g., HTTPS). The input to this process is image data, and the output is encrypted image data. A checksum is generated before data transfer to ensure data integrity.

[0072] Step 3:

[0073] The server inputs the received image data into the generative AI model. Here, the prompt "Extract text information from this image and identify and return the name, license number, and date of birth" is used. The input for this step is encrypted image data, and the output is the specific information (text information and feature data from the facial image). OCR technology and facial recognition algorithms are used in combination to analyze the information.

[0074] Step 4:

[0075] The server uses the extracted specific information to make a matching request to the database. The input for this step is the specific information, and the output is the verification result. Indexes are used to optimize database searches, enabling faster matching.

[0076] Step 5:

[0077] The server determines the need for identity verification based on the database matching results. The input for this step is the matching result, and the output is the determination regarding the need for verification. If the information has been previously verified, a flag is set to skip the re-verification procedure.

[0078] Step 6:

[0079] The server sends the judgment result back to the terminal, which then displays the information to the user and the police officer. The input for this step is the judgment result, and the output is a visual display of the confirmed status. Visual icons or text are displayed on the terminal to indicate that the confirmation is complete.
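Steps 2, 4, and 5 above mention a checksum, an indexed database search, and a skip flag. The sketch below illustrates these three ideas with the Python standard library only. The choices are assumptions rather than details from the disclosure: SHA-256 is used as the checksum, SQLite as the database, and the table name `verification_history`, its columns, and the function names are hypothetical.

```python
import hashlib
import sqlite3


def checksum(data: bytes) -> str:
    """SHA-256 digest used here as the integrity checksum (an assumption)."""
    return hashlib.sha256(data).hexdigest()


def receive(image: bytes, claimed_checksum: str) -> bytes:
    """Server side: reject the upload if the checksum does not match."""
    if checksum(image) != claimed_checksum:
        raise ValueError("checksum mismatch: image was altered in transit")
    return image


def open_history_db() -> sqlite3.Connection:
    """Create an in-memory history table with an index on the license number."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE verification_history ("
        " license_number TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " verified_at TEXT NOT NULL)"
    )
    # The index lets the lookup below avoid scanning the whole table.
    conn.execute(
        "CREATE INDEX idx_history_license"
        " ON verification_history (license_number)"
    )
    return conn


def skip_reverification(
    conn: sqlite3.Connection, license_number: str, name: str
) -> bool:
    """Return the skip flag: True when a matching past verification exists."""
    row = conn.execute(
        "SELECT 1 FROM verification_history"
        " WHERE license_number = ? AND name = ? LIMIT 1",
        (license_number, name),
    ).fetchone()
    return row is not None


image = b"fake-license-image"
sent_checksum = checksum(image)            # terminal computes this before transfer
received = receive(image, sent_checksum)   # server verifies it after transfer

conn = open_history_db()
conn.execute(
    "INSERT INTO verification_history VALUES (?, ?, ?)",
    ("123456789012", "Taro Yamada", "2024-12-01T10:00:00"),  # sample data
)
print(skip_reverification(conn, "123456789012", "Taro Yamada"))
print(skip_reverification(conn, "999999999999", "Hanako Suzuki"))
```

A transport such as HTTPS already protects data in transit, so the explicit checksum here is an additional end-to-end check, in line with Step 2.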

[0080] (Application Example 1)

[0081] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0082] The problem that this invention aims to solve is that verifying an individual's identity with conventional methods is time-consuming and cumbersome, so it cannot be done quickly and efficiently. Furthermore, it is necessary to minimize visitor waiting times and ensure smooth entry into facilities.

[0083] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0084] In this invention, the server includes means for inputting image data containing identification information using an image acquisition device, means for analyzing the input image data using generative AI technology and recognizing specific information, and means for comparing the specific information with past identification history recorded in an information storage device. This enables rapid identity verification of individuals and minimizes waiting times for visitors.

[0085] An "image acquisition device" is a device for inputting image data, and includes the camera of a mobile information terminal.

[0086] "Identifying information" refers to data that includes information necessary to identify an individual.

[0087] "Generative AI technology" is an artificial intelligence technology used to analyze image data and recognize specific information.

[0088] An "information storage device" is a device that has the function of recording and storing past identification history.

[0089] "Identifiable information" refers to information necessary for identifying a person, extracted through image analysis.

[0090] "Verification" is the process of comparing extracted specific information with past identification history.

[0091] A "display device" is a device that visually displays confirmed notifications and other related information.

[0092] In this embodiment of the invention, a system is constructed to perform rapid and efficient personal identification by combining a mobile information terminal and a server. The mobile information terminal is equipped with a camera and software that supports image analysis, and is used to acquire image data containing identification information.

[0093] The terminal's camera takes a picture of the visitor's identification and sends the image to the server. The server analyzes the image data using generative AI technology and extracts specific information. It is desirable to use the latest image recognition technology as the generative AI model.

[0094] After the specific information is extracted, the server accesses the information storage device and compares the specific information with past identification history. A database management system (DBMS) is used as the information storage device; for example, it might be built as an SQL database. Based on the comparison results, the server notifies the terminal that the information has been verified.

[0095] The terminal visually notifies the user via a display device once verification is complete. This notification minimizes user waiting time and enables smooth entry management.

[0096] As a concrete example, it can be used at company reception areas and security gates. When a visitor arrives at the reception, the staff member can quickly verify their identity using a mobile device, receive an immediate "verified" message, and allow the visitor to pass through.

[0097] An example of a prompt used is, "Analyze the text information and facial photograph on the identification document and compare it with past verification history." Based on this prompt, the generative AI model performs appropriate analysis and recognition.

[0098] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0099] Step 1:

[0100] The terminal uses its camera to photograph the user's identification document and acquires image data. This image data becomes the input.

[0101] Step 2:

[0102] The terminal sends the acquired image data to the server. The server receives it and prepares for the next processing step. At this point, the server has the image data as input.

[0103] Step 3:

[0104] The server analyzes the received image data using a generative AI model. It applies the prompt "Analyze the text information and facial photograph from the identification document and compare it with past verification history," extracting specific information from the text and facial photograph. This analysis result becomes the intermediate output.

[0105] Step 4:

[0106] The server accesses the information storage device based on the extracted specific information and compares it with past identification history. The relevant history is retrieved from the database, and the comparison result is obtained as output.

[0107] Step 5:

[0108] The server determines the user's verification status based on the matching results and, if verified, sends that information to the terminal. The terminal receives this notification and displays "Verified" on its display device.

[0109] Step 6:

[0110] The terminal ultimately provides the user with visual notifications and offers further guidance based on the information. Specifically, the user can receive a confirmed notification and promptly enter the facility.

[0111] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0112] This invention is a system that integrates an emotion engine to analyze the user's emotional state, in addition to a system that streamlines license verification during police questioning. This system uses a mobile device and considers not only license information but also the user's emotional state to realize a safer and more effective verification process.

[0113] The terminal first takes a picture of the user's driver's license and sends the image to the server. At the same time, the terminal's camera also takes a picture of the user's face, and the emotion engine analyzes their emotions. Emotional information includes states such as tension, anger, and relief. Based on this, the server uses a generative AI to extract information from the driver's license and uses the emotion engine to analyze the user's emotional state.

[0114] The server compares the extracted license information with the database, checks past verification history, and determines the need for identity verification. Simultaneously, it takes into account the results of the emotion analysis to determine the appropriate response based on the user's state. Specifically, if the user is calm, the system continues normal processing; if tension or suspicion is detected, it guides the user through additional verification methods.

[0115] For example, if a user presents their driver's license to a terminal during a checkpoint, the emotion engine captures the user's subtle emotions along with the license information, and this is analyzed on the server. As a result, if verified data exists and the user is confirmed to be in a state of confidence, "Verification Complete" is quickly displayed. On the other hand, if necessary, further dialogue-based verification may be requested.

[0116] In this way, the invention integrates generative AI and an emotion engine, providing flexibility and reliability not found in conventional license verification systems. This makes the verification process safer and more effective.

[0117] The following describes the processing flow.

[0118] Step 1:

[0119] The user uses the terminal's camera to take pictures of their driver's license and their face simultaneously. The terminal then prepares to process this image data.

[0120] Step 2:

[0121] The terminal sends the image data of the driver's license and the facial image data to the server. A secure communication protocol is used for transmission to ensure data security.

[0122] Step 3:

[0123] The server analyzes the image data received from the terminal using generative AI. The generative AI recognizes text information (name, license number, expiration date, etc.) from the driver's license and also analyzes the facial photograph to extract the necessary specific information.

[0124] Step 4:

[0125] The server activates an emotion engine based on facial image data to analyze the user's emotional state. The emotion engine detects emotions such as tension, joy, and anger from facial expressions.

[0126] Step 5:

[0127] The server accesses the database using specific information and searches for past verification history. It checks if there is a history of verification and uses that information to decide whether to simplify the verification process.

[0128] Step 6:

[0129] The server makes a final decision based on the matching results and the emotion analysis results. If the user's emotional state is unstable and past verification has not been completed, additional identity verification methods will be initiated.

[0130] Step 7:

[0131] The server sends the final result to the terminal. The terminal displays the received result to the user and notifies them that the verification process is complete. This ensures that the user receives the most appropriate instructions immediately.
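The final decision in Steps 5 and 6 combines the history match with the estimated emotion. The sketch below is one possible reading of that rule, not the disclosed implementation. The emotion labels come from the text (tension, anger, joy, relief, calm), but which labels count as "unstable" and how the case of an unstable user with an existing history is handled are assumptions: here it follows paragraph [0114] and requests additional verification. The names `Emotion`, `Outcome`, and `decide` are hypothetical.

```python
from enum import Enum


class Emotion(Enum):
    CALM = "calm"
    RELIEF = "relief"
    JOY = "joy"
    TENSION = "tension"
    ANGER = "anger"


# Assumption: these labels are treated as an "unstable" emotional state.
UNSTABLE = {Emotion.TENSION, Emotion.ANGER}


class Outcome(Enum):
    COMPLETE = "Verification Complete"
    NORMAL = "Proceed with normal verification"
    ADDITIONAL = "Additional verification requested"


def decide(has_matching_history: bool, emotion: Emotion) -> Outcome:
    """Combine the history match (Step 5) with the emotion result (Step 4)."""
    if emotion in UNSTABLE:
        # Tension or anger detected: guide the user to additional verification.
        return Outcome.ADDITIONAL
    if has_matching_history:
        # Past verification exists and the user is stable: skip re-verification.
        return Outcome.COMPLETE
    # No usable history, but nothing unusual: run the normal procedure.
    return Outcome.NORMAL


print(decide(True, Emotion.CALM).value)
print(decide(False, Emotion.CALM).value)
print(decide(False, Emotion.TENSION).value)
```

Keeping the rule in one small function makes the policy easy to adjust, for example if an operator prefers to skip additional verification whenever a matching history exists.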

[0132] (Example 2)

[0133] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0134] Conventional verification systems primarily process personal identification information based on visual data, making it difficult to provide dynamic responses that take into account the user's emotional state. As a result, no additional verification methods or appropriate responses were available for users whose identity verification was uncertain, which hindered improvements in security and efficiency.

[0135] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0136] In this invention, the server includes means for acquiring personal identification information and emotional state using image acquisition means, means for analyzing image data using generative AI, and means for classifying an individual's emotions based on emotional state information. This enables not only a determination of whether identity verification is needed but also flexible responses adapted to the user's emotional state.

[0137] "Image acquisition means" refers to devices or methods for acquiring image data that includes personal identification information, and specifically refers to the camera equipment of a portable information terminal.

[0138] "Generative AI" is a type of artificial intelligence technology used to analyze input image data and recognize specific information that is necessary.

[0139] "Emotional state information" refers to data that indicates an individual's emotional state and includes elements that classify emotions such as tension, anger, and relief.

[0140] A "database" is a collection of information that stores and manages previously recorded personal identification information and emotional state data, and supports matching against that information.

[0141] "Identity verification" is the process of confirming that the information provided by an individual matches past databases and determining the reliability of that information.

[0142] The system of the present invention has a configuration that combines a portable information terminal and a server for verifying the user's license. The terminal is equipped with a high-performance imaging device and is capable of acquiring images of the user's presented license and the user's face. The acquired image data is transmitted to the server via the internet and is encrypted and protected through a secure protocol.

[0143] The server utilizes a generative AI model with high accuracy in specific information recognition to analyze the received image data. Specifically, it uses a general cloud service that provides optical character recognition (OCR) technology capable of extracting text information from image data. Furthermore, the server incorporates a dedicated algorithm for emotion classification to analyze emotional states using facial image data. This allows for real-time identification of emotions such as "tension," "anger," and "relief."

[0144] The acquired license information and emotional state information are compared against past verification history stored in the server's database. Based on the comparison results and the user's emotional state, the server determines the next step in the verification process. If the user is in a relaxed state, the normal verification procedure proceeds; if the user is in a tense state, additional verification measures are offered to the user.

[0145] For example, consider a scenario where a user presents their driver's license to a mobile device at a checkpoint. The device automatically takes photos of the license and the user's face and immediately sends them to the server. The server uses AI to extract information such as "Ichiro Tanaka" and "License Number 12345678," and an emotion engine confirms that the user is at ease. If the information has already been verified in the database, the device displays "Verification Complete." An example of a prompt message would be, "Analyze the user's emotions and perform a comparison with past data along with the license information."
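
As an illustration of how the extracted items in this scenario might be turned into structured fields, the following sketch applies regular expressions to recognized text. It is a substitute technique shown for clarity, not the generative AI analysis itself; the text layout ("Name: ..." and "License Number ..." on separate lines) and the name `extract_fields` are assumptions.

```python
import re
from typing import Dict, Optional

# Hypothetical layout of the recognized text; real license text will differ.
NAME_RE = re.compile(r"Name[:\s]+(?P<name>[^\n]+)")
NUMBER_RE = re.compile(r"License Number[:\s]+(?P<number>\d+)")


def extract_fields(recognized_text: str) -> Dict[str, Optional[str]]:
    """Pull the name and license number out of recognized text, if present."""
    name = NAME_RE.search(recognized_text)
    number = NUMBER_RE.search(recognized_text)
    return {
        "name": name.group("name").strip() if name else None,
        "license_number": number.group("number") if number else None,
    }
```

A field that cannot be found is returned as `None`, so the caller can decide whether to request a new photograph.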

[0146] This system can enhance safety and efficiency by providing a new verification process that combines generative AI models with sentiment analysis.

[0147] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0148] Step 1:

[0149] The device uses its camera to photograph the user's driver's license and the user's face. As input, the device acquires image data of the driver's license and the user's face. Specifically, the device captures high-resolution images while adjusting autofocus and exposure. As output, image data is generated.

[0150] Step 2:

[0151] The device sends the captured image data to the server. The input is the image data acquired from the device. Specifically, the device encrypts the data and uploads it to the server using a secure protocol (e.g., HTTPS). The output is the server receiving the image data.

[0152] Step 3:

[0153] The server analyzes the received image data using a generative AI model to extract license information. The input is image data sent from the terminal. Specifically, optical character recognition (OCR) technology is used to extract text information such as name, date of birth, and license number. The output is specific information.

[0154] Step 4:

[0155] The server analyzes emotional states based on image data. The input is an image of the user's face. Specifically, it uses an emotion analysis algorithm to classify emotions such as tension, anger, and relief. The output is information about the user's emotional state.

[0156] Step 5:

[0157] The server compares the extracted license information with past verification history in the database. The inputs are license information and database history information. Specifically, it performs a database search to check for matching information. The output is the matching result.

[0158] Step 6:

[0159] The server determines the necessity of identity verification and the appropriate response based on the user's situation, using the matching results and emotional state information. Inputs include the matching results and emotional state information. Specifically, if the user is in a state of confidence, the server determines "verification complete"; if the user is in a state of tension, for example, it considers additional verification measures. Outputs include the verification result and proposed additional actions.

[0160] Step 7:

[0161] The terminal displays the final confirmation result to the user based on instructions from the server. The input is the decision result from the server. Specifically, the terminal displays messages such as "Confirmation complete" or "Further confirmation required" on the screen. The output is information displayed to the user.
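
The upload in Step 2 can be sketched with the Python standard library. This is only one possible shape: the JSON body with base64-encoded images, the field names, and the function `build_upload_request` are assumptions, and the endpoint URL is supplied by the caller. The sketch builds the request without sending it.

```python
import base64
import json
import urllib.request


def build_upload_request(
    license_image: bytes, face_image: bytes, url: str
) -> urllib.request.Request:
    """Build an HTTPS POST request carrying both images (not sent here)."""
    if not url.startswith("https://"):
        # Refuse plain HTTP so the images are only sent over a secure protocol.
        raise ValueError("image data must be sent over HTTPS")
    payload = {
        # Field names are hypothetical.
        "license_image": base64.b64encode(license_image).decode("ascii"),
        "face_image": base64.b64encode(face_image).decode("ascii"),
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the transfer; TLS then provides the encryption in transit.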

[0162] (Application Example 2)

[0163] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as a "server" and the smart device 14 as a "terminal".

[0164] Traditional identity verification systems often involve manual verification processes for documents such as driver's licenses, which is time-consuming and makes it difficult to provide appropriate responses based on the emotional state of the person being verified. Therefore, there is a need to streamline the verification process and enhance security accordingly. Furthermore, it is necessary to enable flexible responses tailored to individual circumstances by considering the emotional state of the user.

[0165] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0166] In this invention, the server includes a device that inputs image data containing identity verification information using an image acquisition device, a device that analyzes the image data using generative artificial intelligence and recognizes specific information, and a device that analyzes emotional states from facial image data using an emotion analysis engine. This enables more efficient identity verification processes and flexible responses based on emotion analysis.

[0167] An "image acquisition device" is a device used to collect image data, including identity verification information, and specifically refers to the camera on a portable information terminal.

[0168] "Generative artificial intelligence" refers to an algorithm used to analyze data and recognize specific information, and its role is to analyze the input information.

[0169] "Identifiable information" refers to information extracted from image data that is particularly necessary for identification and is an important element in identity verification.

[0170] "Information recording medium" refers to databases and storage devices that store past verification history and data related to personal identification.

[0171] The "emotion analysis engine" is a system that analyzes an individual's emotional state based on facial image data and has the function of automatically identifying the user's emotions.

[0172] The "judgment result" is a conclusion derived based on the acquired information and analysis results, indicating whether or not identity verification is possible.

[0173] The system for realizing this invention consists mainly of a portable information terminal and a server. The terminal uses an image acquisition device (for example, a smartphone camera) to input image data including identity verification information. The user presents their driver's license to the terminal, and the camera simultaneously takes a picture of the license and their face. This allows for the efficient collection of the user's identity information.

[0174] The server uses generative artificial intelligence software (for example, Google® Cloud Vision AI) to analyze the input image data and recognize specific information. Specifically, it extracts important information such as name and date of birth from the driver's license. Then, it compares this information with past verification history stored in a database system (information recording medium) to determine whether verification is necessary.

[0175] Furthermore, by using an emotion analysis engine (for example, Microsoft® Face API), the user's emotional state is analyzed from facial image data. This analysis obtains information such as whether the user is relaxed or feeling tense or anxious. This emotional information is then used as data to enable flexible responses tailored to the user's situation.
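
One way to turn the output of an emotion analysis engine into a single label is sketched below. The input format (a mapping from emotion label to a score between 0 and 1), the threshold, and both function names are assumptions for illustration; the sketch does not reproduce the response format of any particular service.

```python
from typing import Mapping


def classify_emotion(scores: Mapping[str, float], threshold: float = 0.5) -> str:
    """Return the highest-scoring label, or "unknown" if no score is confident."""
    if not scores:
        return "unknown"
    label, score = max(scores.items(), key=lambda item: item[1])
    return label if score >= threshold else "unknown"


def needs_additional_verification(emotion: str) -> bool:
    # Treating "tense", "anxious", and "unknown" as triggers is an assumption.
    return emotion in {"tense", "anxious", "unknown"}
```

Falling back to "unknown" when no score is confident keeps an ambiguous expression from being treated as relaxed.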

[0176] For example, at the entrance of a large building, when a user presents their driver's license on a terminal, facial recognition and emotion analysis are performed smoothly. If the information matches the pre-registered data, entry is granted immediately. If the user feels uneasy, they may be guided through additional verification procedures.

[0177] In this way, efficient identity verification and robust security management are achieved simultaneously. In this invention, an example of a prompt message for effectively utilizing the generative AI model is: "Extract the owner's name and date of birth from the driver's license image and perform sentiment analysis. If it matches the list data, generate an entry permission message."

[0178] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0179] Step 1:

[0180] The user uses the camera on their portable information terminal to take pictures of their driver's license and their face. The terminal then obtains image data of the driver's license and their face. The input at this stage is raw image data captured by the camera, and this data is processed into a user-friendly format before being passed on to the next process.

[0181] Step 2:

[0182] The terminal sends the acquired image data to the server. The server analyzes the image data using a generative AI model and extracts specific information from the driver's license, such as the owner's name and date of birth. This process outputs the specific information in text format. The input is the acquired image data, and the output is the specific information in text format.

[0183] Step 3:

[0184] The server compares the extracted specific information with past verification history stored on the information recording medium. The server uses database queries to confirm the match and determines whether identity verification is necessary based on the input information. The result of this step is output as a determination of whether verification is possible or not.

[0185] Step 4:

[0186] The server uses an emotion analysis engine to analyze the user's emotional state from facial image data. Facial image data is used as input, and the result of the emotion analysis is output as the user's emotional state, such as relaxed, tense, or anxious.

[0187] Step 5:

[0188] The server determines appropriate feedback for the user based on the results of matching specific information and sentiment analysis. Based on the input data, which includes specific information and emotional state, the server determines whether to grant entry or require additional verification, and outputs a message accordingly.

[0189] Step 6:

[0190] The user receives feedback from the server on their device and follows the displayed instructions. Based on the instructions, they choose to proceed with the entry process or undergo additional verification. The input in this step is feedback from the server and is output as the user's action.

[0191] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0192] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and / or summarization.

[0193] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart device 14.

[0194] [Second Embodiment]

[0195] Figure 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0196] As shown in Figure 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0197] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0198] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0199] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0200] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0201] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0202] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0203] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0204] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0205] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0206] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0207] The present invention provides a system for efficiently verifying driver's license information during questioning by police officers using a mobile terminal carried by the officer. The terminal takes a picture of the user's driver's license with its camera and sends the image to a server. Upon receiving the image, the server uses generative AI to analyze the text information and facial photograph within the image and extract the necessary specific information.

[0208] The server then uses the extracted specific information to access the database and compare it with past verification history. If an existing verification history exists in the database, it decides to skip the detailed verification process and notifies the terminal that verification is complete. This process reduces the burden on both the user and the police officer, and enables rapid identity verification.

[0209] As a concrete example, when a police officer takes a picture of a driver's license with the camera on their terminal during a traffic checkpoint, that information is automatically sent to the server. Subsequently, if the same user's license information has been verified in the past, it will immediately display "Verified" based on that information. This allows users to avoid duplicate verification, and police officers can improve their work efficiency.

[0210] The following describes the processing flow.

[0211] Step 1:

[0212] The user takes a picture of their driver's license using the device's camera and saves the image data to the device. The device then prepares to process this image data.

[0213] Step 2:

[0214] The device sends the image data of the driver's license that it has photographed to the server. A secure communication protocol is used for transmission to ensure the safety of the data.

[0215] Step 3:

[0216] The server analyzes the image data received from the terminal using generative AI. The generative AI recognizes the text on the driver's license (name, license number, expiration date, etc.) and extracts the necessary specific information by analyzing the facial photograph.

[0217] Step 4:

[0218] The server accesses the database based on the extracted specific information and searches for past verification history. If a matching history is found, it retrieves the information and performs a comparison.

[0219] Step 5:

[0220] The server determines whether reconfirmation is necessary based on the results of a comparison with past confirmation history. If it is determined that reconfirmation is not necessary, it generates a confirmation result that includes that statement.

[0221] Step 6:

[0222] The server sends the verification result to the terminal. The terminal displays the received result to the user and notifies them that the verification is complete.
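
The server-side part of this flow (Steps 3 to 5) can be outlined as follows. This is a structural sketch only: `run_verification` and the result keys are hypothetical, and the generative AI analysis and the database search are passed in as functions because the text does not fix their implementation.

```python
from typing import Callable, Dict, Optional


def run_verification(
    image: bytes,
    analyze: Callable[[bytes], Dict[str, str]],
    find_history: Callable[[Dict[str, str]], Optional[dict]],
) -> Dict[str, object]:
    """Analyze the image, search past history, and decide on reconfirmation."""
    info = analyze(image)          # Step 3: extraction of specific information
    history = find_history(info)   # Step 4: search for past verification history
    reconfirm = history is None    # Step 5: no matching history means reconfirm
    message = "Reconfirmation required" if reconfirm else "Verified"
    return {
        "info": info,
        "reconfirmation_needed": reconfirm,
        "message": message,
    }
```

Injecting the two functions keeps the decision logic testable with simple stand-ins, independent of any specific model or database.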

[0223] (Example 1)

[0224] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0225] Traditional identity verification methods often required manual verification, resulting in inefficiency and time-consuming processes. Furthermore, the need for repeated verification added significant complexity and slow processing, placing a heavy burden on both police officers and users. There is a need to address these issues and provide a fast and efficient method of identity verification.

[0226] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0227] In this invention, the server includes means for inputting image data containing personal identification information using image acquisition means, means for analyzing the input image data using generative AI and recognizing specific information, and means for executing a face recognition algorithm using the analyzed specific information. This makes it possible to perform identity verification quickly and efficiently. This method significantly reduces the effort required even when re-verification is necessary, thereby reducing the burden on police officers and users.

[0228] "Image acquisition means" refers to devices or methods for acquiring image data that includes personal identification information.

[0229] "Generative AI" is a technology that uses generative artificial intelligence to analyze data and derive new information.

[0230] "Identifiable information" refers to information necessary to identify an individual, primarily including text data and facial image data.

[0231] A "face recognition algorithm" refers to a computational method used to identify individuals by analyzing facial features from images.

[0232] A "database" is a collection of information where past verification history and other related information are recorded and managed.

[0233] "Verification" is the process of comparing a large amount of information with newly acquired information to determine whether they match.

[0234] "Identity verification" refers to a series of procedures to confirm an individual's identity and determine whether they meet the required criteria.

[0235] A "portable information processing device" refers to a device that is easily portable and capable of performing calculations and information processing.

[0236] This invention provides a system for police officers to quickly verify the identity of individuals during questioning. Specifically, it uses a mobile terminal as a portable information processing device. The terminal's built-in camera captures an image of the user's driver's license. This image data is transmitted to a server via a secure protocol.

[0237] The server analyzes the received image data using a generative AI model and extracts text information from the image using OCR technology. Simultaneously, it applies a facial recognition algorithm to extract features from the facial photograph. Through these processes, specific information for identifying an individual is generated. The generated information is then used to query the database and is compared with past verification history.

[0238] For example, if a police officer photographs a driver's license with a camera during a traffic checkpoint, this information is processed in real time. This allows for a quick determination of whether the user is verified, and the information is sent to the device. An example of a prompt message used in this process is, "Extract text information from this image and identify and return the name, license number, and date of birth."
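
How the reply to such a prompt might be handled is sketched below. The prompt text is the one quoted above; everything else is an assumption, in particular that the model is additionally asked to answer as a JSON object with the keys `name`, `license_number`, and `date_of_birth`. The function `parse_model_reply` is hypothetical.

```python
import json

PROMPT = (
    "Extract text information from this image and identify and return "
    "the name, license number, and date of birth."
)
# Assumption: an extra instruction asks for a JSON reply with these keys.
FORMAT_HINT = 'Reply as JSON with keys "name", "license_number", "date_of_birth".'
REQUIRED_FIELDS = ("name", "license_number", "date_of_birth")


def parse_model_reply(reply: str) -> dict:
    """Parse a JSON reply and keep only the three requested fields."""
    data = json.loads(reply)
    missing = [field for field in REQUIRED_FIELDS if not data.get(field)]
    if missing:
        raise ValueError(f"model reply is missing fields: {missing}")
    return {field: str(data[field]).strip() for field in REQUIRED_FIELDS}
```

Rejecting incomplete replies early prevents a partial extraction from being matched against the verification history.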

[0239] This streamlines the repetitive process of identity verification, significantly reducing the burden on both police officers and users. Furthermore, the faster processing allows police officers to focus on other tasks.

[0240] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0241] Step 1:

[0242] The user presents their driver's license to the police officer. The police officer uses a terminal to take a picture of the license. The input for this step is the physical driver's license, and the output is digital image data. Autofocus and exposure adjustment functions are in place to ensure the image is clear and accurate.

[0243] Step 2:

[0244] The device digitizes the captured image data and sends it to the server via a secure network protocol (e.g., HTTPS). The input to this process is image data, and the output is encrypted image data. A checksum is generated before data transfer to ensure data integrity.

[0245] Step 3:

[0246] The server inputs the received image data into the generative AI model. Here, the prompt "Extract text information from this image and identify and return the name, license number, and date of birth" is used. The input for this step is encrypted image data, and the output is identified information (text information, feature data from the facial image). OCR technology and facial recognition algorithms are used in combination to analyze the information.

[0247] Step 4:

[0248] The server uses the extracted specific information to make a matching request to the database. The input for this step is the specific information, and the output is the verification result. Indexes are used to optimize database searches, enabling faster matching.

[0249] Step 5:

[0250] The server determines the need for identity verification based on the database matching results. The input for this step is the matching result, and the output is the determination regarding the need for verification. If the information has been previously verified, a flag is set to skip the re-verification procedure.

[0251] Step 6:

[0252] The server sends the judgment result back to the terminal, which then displays the information to the user and the police officer. The input for this step is the judgment result, and the output is a visual display of the confirmed status. Visual icons or text are displayed on the terminal to indicate that the confirmation is complete.
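
The checksum mentioned in Step 2 can be illustrated as follows. The text does not name an algorithm, so SHA-256 is an assumption, as are the function names; the terminal would compute the checksum before transfer and the server would recompute it on receipt.

```python
import hashlib
import hmac


def make_checksum(image_bytes: bytes) -> str:
    """Compute a hex checksum of the image data (SHA-256 is an assumption)."""
    return hashlib.sha256(image_bytes).hexdigest()


def verify_checksum(image_bytes: bytes, expected_hex: str) -> bool:
    """Recompute the checksum on the receiving side and compare it."""
    return hmac.compare_digest(make_checksum(image_bytes), expected_hex)
```

A mismatch indicates that the image data changed in transit and should be sent again.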

[0253] (Application Example 1)

[0254] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0255] The problem that this invention aims to solve is that verifying an individual's identity with conventional methods is time-consuming and cumbersome, which prevents rapid and efficient verification. Furthermore, it is necessary to minimize visitor waiting times and ensure smooth entry into facilities.

[0256] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0257] In this invention, the server includes means for inputting image data containing identification information using an image acquisition device, means for analyzing the input image data using generative AI technology and recognizing specific information, and means for comparing the specific information with past identification history recorded in an information storage device. This enables rapid identity verification of individuals and minimizes waiting times for visitors.

[0258] An "image acquisition device" is a device for inputting image data, and includes the camera of a mobile information terminal.

[0259] "Identifying information" refers to data that includes information necessary to identify an individual.

[0260] "Generative AI technology" is an artificial intelligence technology used to analyze image data and recognize specific information.

[0261] An "information storage device" is a device that has the function of recording and storing past identification history.

[0262] "Identifiable information" refers to information necessary for identifying a person, extracted through image analysis.

[0263] "Verification" is the process of comparing extracted specific information with past identification history.

[0264] A "display device" is a device that visually displays confirmed notifications and other related information.

[0265] In this embodiment of the invention, a system is constructed to perform rapid and efficient personal identification by combining a mobile information terminal and a server. The mobile information terminal is equipped with a camera and software that supports image analysis, and is used to acquire image data containing identification information.

[0266] The terminal's camera takes a picture of the visitor's identification and sends the image to the server. The server analyzes the image data using generative AI technology and extracts specific information. It is desirable to use the latest image recognition technology as the generative AI model.

[0267] After the specific information is extracted, the server accesses the information storage device and compares the specific information with past identification history. A database management system (DBMS) is used as the information storage device; for example, it may be built as an SQL database. Based on the comparison results, the server notifies the mobile device that the information has been verified.
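
A minimal sketch of such an SQL-backed history, using SQLite from the Python standard library, is shown below. The table name, column names, the index on the identification number, and all function names are assumptions made for illustration.

```python
import sqlite3


def open_history_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the history store and create the (hypothetical) schema if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS verification_history ("
        " id INTEGER PRIMARY KEY,"
        " id_number TEXT NOT NULL,"
        " verified_at TEXT NOT NULL)"
    )
    # An index on the lookup column keeps matching fast as the history grows.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_history_id_number"
        " ON verification_history (id_number)"
    )
    return conn


def record_verification(conn: sqlite3.Connection, id_number: str, verified_at: str) -> None:
    """Store one completed verification."""
    conn.execute(
        "INSERT INTO verification_history (id_number, verified_at) VALUES (?, ?)",
        (id_number, verified_at),
    )
    conn.commit()


def is_verified(conn: sqlite3.Connection, id_number: str) -> bool:
    """Return True if a past verification exists for this identification number."""
    row = conn.execute(
        "SELECT 1 FROM verification_history WHERE id_number = ? LIMIT 1",
        (id_number,),
    ).fetchone()
    return row is not None
```

Parameterized queries are used so that text extracted from an image is never interpolated directly into SQL.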

[0268] The terminal visually notifies the user via a display device once verification is complete. This notification minimizes user waiting time and enables smooth entry management.

[0269] As a concrete example, it can be used at company reception areas and security gates. When a visitor arrives at the reception, the staff member can quickly verify their identity using a mobile device, receive an immediate "verified" message, and allow the visitor to pass through.

[0270] An example of a prompt used is, "Analyze the text information and facial photograph on the identification document and compare it with past verification history." Based on this prompt, the generative AI model performs appropriate analysis and recognition.
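One way the prompt and the captured image might be packaged for the generative AI model is sketched below. The JSON field names `prompt` and `image_base64` are hypothetical; the embodiment does not specify a request format, and a real model service would define its own.

```python
import base64
import json

# Prompt text quoted from the embodiment.
PROMPT = (
    "Analyze the text information and facial photograph on the identification "
    "document and compare it with past verification history."
)


def build_analysis_request(image_bytes: bytes, prompt: str = PROMPT) -> str:
    """Serialize the prompt and image into a JSON body (hypothetical shape)."""
    payload = {
        "prompt": prompt,
        # Binary image data is base64-encoded so it can travel inside JSON.
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
    }
    return json.dumps(payload)


# Placeholder bytes standing in for a captured image.
sample_image = b"\x89PNG placeholder bytes"
body = build_analysis_request(sample_image)
decoded = json.loads(body)
```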

[0271] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0272] Step 1:

[0273] The device uses its camera to photograph the user's identification document and acquires image data. This image data becomes the input.

[0274] Step 2:

[0275] The terminal sends the acquired image data to the server. The server receives it and prepares for the next processing step. At this point, the server has the image data as input.

[0276] Step 3:

[0277] The server analyzes the received image data using a generative AI model. It applies the prompt "Analyze the text information and facial photograph from the identification document and compare it with past verification history," extracting specific information from the text and facial photograph. This analysis result becomes the intermediate output.

[0278] Step 4:

[0279] The server accesses the information storage device based on the extracted specific information and compares it with past identification history. The relevant history is retrieved from the database, and the comparison result is obtained as output.

[0280] Step 5:

[0281] The server determines the user's verification status based on the matching results and, if verified, sends that information to the terminal. The terminal receives this notification and displays "Verified" on its display device.

[0282] Step 6:

[0283] Finally, the terminal provides the user with a visual notification and guidance on further action based on that information. Specifically, upon receiving the "Verified" notification, the user can promptly enter the facility.
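Steps 3 to 5 of the flow above can be sketched as a single server-side function. This is an illustration under stated assumptions: the generative AI analysis and the history lookup are passed in as plain callables, and `fake_extract`, `fake_history`, the sample values, and the "Verification required" fallback message are all hypothetical.

```python
from typing import Callable, Dict

Info = Dict[str, str]


def run_verification(
    image: bytes,
    extract_info: Callable[[bytes], Info],
    history_contains: Callable[[Info], bool],
) -> str:
    """Analyze the image, match it against history, and choose the message to display."""
    info = extract_info(image)        # Step 3: generative AI analysis (stubbed here)
    matched = history_contains(info)  # Step 4: comparison with past history
    # Step 5: "Verified" comes from the text; the fallback message is an assumption.
    return "Verified" if matched else "Verification required"


def fake_extract(image: bytes) -> Info:
    """Stand-in for the generative AI model; returns fixed sample values."""
    return {"name": "Ichiro Tanaka", "document_number": "12345678"}


known_documents = {"12345678"}


def fake_history(info: Info) -> bool:
    """Stand-in for the information storage device lookup."""
    return info["document_number"] in known_documents


message = run_verification(b"image bytes", fake_extract, fake_history)
print(message)
```

Passing the analysis and lookup as parameters keeps the control flow testable without a model or a database.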

[0284] Furthermore, an emotion engine for estimating the user's emotions may be combined. That is, the specific processing unit 290 may estimate the user's emotions using the emotion identification model 59 and perform specific processing using the user's emotions.

[0285] The present invention is a system for streamlining license confirmation during police questioning that additionally integrates an emotion engine for analyzing the user's emotional state. This system uses a mobile terminal and takes into account not only license information but also the user's emotional state to realize a safer and more effective confirmation process.

[0286] First, the terminal takes a picture of the user's license and sends the image to the server. At this time, the camera of the terminal also takes a picture of the user's face and performs emotion analysis by the emotion engine. Emotion information includes states such as tension, anger, and relief. Thereby, the server extracts the information described on the license using the generative AI and analyzes the user's emotional state using the emotion engine.

[0287] The server compares the extracted license information with the database, checks the past confirmation history, and then determines the necessity for identity confirmation. At the same time, taking into account the results of the emotion analysis, it decides on an appropriate response according to the user's state. Specifically, when the user is in a calm state, the system continues with normal processing, and when signs of tension or suspicion are observed, it guides additional confirmation means.

[0288] For example, when the user presents the license to the terminal during questioning, the terminal captures the user's subtle emotional cues together with the license information, and the server analyzes both. If verified data exists and the user is confirmed to be in a relaxed state, "Confirmation completed" is displayed promptly. Otherwise, additional confirmation through further dialogue may be instructed as needed.

[0289] In this way, the invention integrates generative AI and an emotion engine, providing flexibility and reliability not found in conventional license verification systems. This makes the verification process safer and more effective.

[0290] The following describes the processing flow.

[0291] Step 1:

[0292] The user uses the device's camera to take pictures of their driver's license and their face simultaneously. The device then prepares to process these image data.

[0293] Step 2:

[0294] The device sends image data of the driver's license and facial image data to the server. A secure communication protocol is used for transmission to ensure data security.

[0295] Step 3:

[0296] The server analyzes the image data received from the terminal using generative AI. The generative AI recognizes text information (name, license number, expiration date, etc.) from the driver's license and also analyzes the facial photograph to extract the necessary specific information.

[0297] Step 4:

[0298] The server activates an emotion engine based on facial image data to analyze the user's emotional state. The emotion engine detects emotions such as tension, joy, and anger from facial expressions.

[0299] Step 5:

[0300] The server accesses the database using specific information and searches for past verification history. It checks if there is a history of verification and uses that information to decide whether to simplify the verification process.

[0301] Step 6:

[0302] Based on the verification result and the emotion analysis result, the server makes a final judgment. If the user's emotional state is unstable and no past verification is on record, the server prompts additional identity verification measures.

[0303] Step 7:

[0304] The server transmits the final result to the terminal. The terminal displays the received result to the user and notifies the completion of the confirmation process. Thereby, the optimal instruction is immediately conveyed to the user.
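The final judgment in Step 6 can be read as a small decision rule. The sketch below shows one possible reading, not a definitive one: which emotions count as "unstable" (`UNSTABLE`), the function name, and the "Normal verification" message for the remaining cases are assumptions made for illustration.

```python
from enum import Enum


class Emotion(Enum):
    TENSION = "tension"
    JOY = "joy"
    ANGER = "anger"
    RELIEF = "relief"


# Assumption: tension and anger are treated as unstable states.
UNSTABLE = {Emotion.TENSION, Emotion.ANGER}


def final_judgment(has_history: bool, emotion: Emotion) -> str:
    """Combine the history match and the emotion result into one outcome."""
    unstable = emotion in UNSTABLE
    if has_history and not unstable:
        # Past verification exists and the user is calm: simplified path.
        return "Confirmation completed"
    if not has_history and unstable:
        # No past verification and an unstable state: additional measures.
        return "Additional verification required"
    # Remaining combinations fall back to the normal procedure (assumption).
    return "Normal verification"


print(final_judgment(True, Emotion.RELIEF))
print(final_judgment(False, Emotion.TENSION))
```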

[0305] (Example 2)

[0306] Next, Example 2 will be described. In the following description, the data processing device 12 is referred to as the "server", and the smart glasses 214 are referred to as the "terminal".

[0307] In conventional verification systems, personal verification information is processed mainly on the basis of visual information, which makes it difficult to respond dynamically to the user's emotional state. As a result, additional verification means or appropriate responses may be lacking for users whose identity verification is uncertain, and it is difficult to improve safety and efficiency.

[0308] The specific processing by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following respective means.

[0309] In this invention, the server includes means for acquiring personal verification information and emotional state using image acquisition means, means for analyzing image data by generative AI, and means for classifying an individual's emotion based on emotional state information. Thereby, not only a determination of the necessity of identity verification but also a flexible response adapted to the user's emotional state becomes possible.

[0310] The "image acquisition means" is a device or method for acquiring image data including personal verification information, and specifically refers to the imaging device of a portable information terminal.

[0311] "Generative AI" is a type of artificial intelligence technology used to analyze input image data and recognize specific information that is necessary.

[0312] "Emotional state information" refers to data that indicates an individual's emotional state and includes elements that classify emotions such as tension, anger, and relief.

[0313] A "database" is an aggregate of information that stores and manages historical personal identification information and emotional state data recorded in the past, and assists in matching that information.

[0314] "Identity verification" is the process of confirming that the information provided by an individual matches past databases and determining the reliability of that information.

[0315] The system of the present invention has a configuration that combines a portable information terminal and a server for verifying the user's license. The terminal is equipped with a high-performance imaging device and is capable of acquiring images of the user's presented license and the user's face. The acquired image data is transmitted to the server via the internet and is encrypted and protected through a secure protocol.

[0316] The server utilizes a generative AI model with high accuracy in specific information recognition to analyze the received image data. Specifically, it uses a general cloud service that provides optical character recognition (OCR) technology capable of extracting text information from image data. Furthermore, the server incorporates a dedicated algorithm for emotion classification to analyze emotional states using facial image data. This allows for real-time identification of emotions such as "tension," "anger," and "relief."

[0317] The acquired license information and emotional state information are compared against past verification history stored in the server's database. Based on the comparison results and the user's emotional state, the server determines the next step in the verification process. If the user is in a relaxed state, the normal verification procedure proceeds; if the user is in a tense state, additional verification measures are offered to the user.

[0318] For example, consider a scenario where a user presents their driver's license to a mobile device at a checkpoint. The device automatically takes photos of the license and the user's face and immediately sends them to the server. The server uses AI to extract information such as "Ichiro Tanaka" and "License Number 12345678," and an emotion engine confirms that the user is at ease. If the information has already been verified in the database, the device displays "Verification Complete." An example of a prompt message would be, "Analyze the user's emotions and perform a comparison with past data along with the license information."

[0319] This system can enhance safety and efficiency by providing a new verification process that combines generative AI models with sentiment analysis.

[0320] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0321] Step 1:

[0322] The device uses its camera to photograph the user's driver's license and the user's face. As input, the device acquires image data of the driver's license and the user's face. Specifically, the device captures high-resolution images while adjusting autofocus and exposure. As output, image data is generated.

[0323] Step 2:

[0324] The device sends the captured image data to the server. The input is the image data acquired from the device. Specifically, the device encrypts the data and uploads it to the server using a secure protocol (e.g., HTTPS). The output is the server receiving the image data.

[0325] Step 3:

[0326] The server analyzes the received image data using a generative AI model to extract license information. The input is image data sent from the terminal. Specifically, optical character recognition (OCR) technology is used to extract text information such as name, date of birth, and license number. The output is specific information.

[0327] Step 4:

[0328] The server analyzes emotional states based on image data. The input is an image of the user's face. Specifically, it uses an emotion analysis algorithm to classify emotions such as tension, anger, and relief. The output is information about the user's emotional state.

[0329] Step 5:

[0330] The server compares the extracted license information with past verification history in the database. The inputs are license information and database history information. Specifically, it performs a database search to check for matching information. The output is the matching result.

[0331] Step 6:

[0332] The server determines the necessity of identity verification and the appropriate response based on the user's situation, using the matching results and emotional state information. Inputs include the matching results and emotional state information. Specifically, if the user is in a state of confidence, the server determines "verification complete"; if the user is in a state of tension, for example, it considers additional verification measures. Outputs include the verification result and proposed additional actions.

[0333] Step 7:

[0334] The terminal displays the final confirmation result to the user based on instructions from the server. The input is the decision result from the server. Specifically, the terminal displays messages such as "Confirmation complete" or "Further confirmation required" on the screen. The output is information displayed to the user.
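The emotion classification in Step 4 can be illustrated as picking the highest-scoring label. This is a minimal sketch assuming that an upstream emotion analysis algorithm already yields one score per emotion; the score values and the function name `classify_emotion` are hypothetical.

```python
from typing import Dict

# Emotion labels named in the embodiment.
EMOTIONS = ("tension", "anger", "relief")


def classify_emotion(scores: Dict[str, float]) -> str:
    """Return the known emotion label with the highest score."""
    missing = [label for label in EMOTIONS if label not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    return max(EMOTIONS, key=lambda label: scores[label])


# Made-up scores standing in for the output of the emotion analysis algorithm.
label = classify_emotion({"tension": 0.15, "anger": 0.05, "relief": 0.80})
print(label)  # relief
```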

[0335] (Application Example 2)

[0336] Next, Application Example 2 will be described. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0337] Traditional identity verification systems often involve manual verification processes for documents such as driver's licenses, which is time-consuming and makes it difficult to provide appropriate responses based on the emotional state of the person being verified. Therefore, there is a need to streamline the verification process and enhance security accordingly. Furthermore, it is necessary to enable flexible responses tailored to individual circumstances by considering the emotional state of the user.

[0338] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0339] In this invention, the server includes a device that inputs image data containing identity verification information using an image acquisition device, a device that analyzes the image data using generative artificial intelligence and recognizes specific information, and a device that analyzes emotional states from facial image data using an emotion analysis engine. This enables more efficient identity verification processes and flexible responses based on emotion analysis.

[0340] An "image acquisition device" is a device used to collect image data, including identity verification information, and specifically refers to the camera on a portable information terminal.

[0341] "Generative artificial intelligence" refers to an algorithm used to analyze data and recognize specific information, and its role is to analyze the input information.

[0342] "Identifiable information" refers to information extracted from image data that is particularly necessary for identification and is an important element in identity verification.

[0343] "Information recording medium" refers to databases and storage devices that store past verification history and data related to personal identification.

[0344] The "emotion analysis engine" is a system that analyzes an individual's emotional state based on facial image data and has the function of automatically identifying the user's emotions.

[0345] The "judgment result" is a conclusion derived based on the acquired information and analysis results, indicating whether or not identity verification is possible.

[0346] The system for realizing this invention consists mainly of a portable information terminal and a server. The terminal uses an image acquisition device (for example, a smartphone camera) to input image data including identity verification information. The user presents their driver's license to the terminal, and the camera simultaneously takes a picture of the license and their face. This allows for the efficient collection of the user's identity information.

[0347] The server uses generative artificial intelligence software (e.g., Google Cloud Vision AI) to analyze the input image data and recognize specific information. Specifically, it extracts important information such as name and date of birth from the driver's license. Then, it compares this information with past verification history stored in a database system (information recording medium) to determine whether verification is necessary.

[0348] Furthermore, by using an emotion analysis engine (for example, Microsoft Face API), the user's emotional state is analyzed from facial image data. This analysis obtains information such as whether the user is relaxed or feeling tense or anxious. This emotional information is then used as data to enable flexible responses tailored to the user's situation.

[0349] For example, at the entrance of a large building, when a user presents their driver's license on a terminal, facial recognition and emotion analysis are performed smoothly. If the information matches the pre-registered data, entry is granted immediately. If the user feels uneasy, they may be guided through additional verification procedures.

[0350] In this way, efficient identity verification and reliable security management are achieved simultaneously. In this invention, an example of a prompt for effectively utilizing the generative AI model is: "Extract the owner's name and date of birth from the driver's license image and perform sentiment analysis. If it matches the list data, generate an entry permission message."

[0351] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0352] Step 1:

[0353] The user uses the camera on the portable information terminal to photograph the driver's license and the user's face. The terminal thereby obtains image data of the driver's license and the face. The input at this stage is raw image data captured by the camera, and this data is converted into a format suitable for processing before being passed on to the next process.

[0354] Step 2:

[0355] The terminal sends the acquired image data to the server. The server analyzes the image data using a generative AI model and extracts specific information from the driver's license, such as the owner's name and date of birth. This process outputs the specific information in text format. The input is the acquired image data, and the output is the specific information in text format.

[0356] Step 3:

[0357] The server compares the extracted specific information with past verification history stored on the information recording medium. The server uses database queries to check for a match and determines whether identity verification is necessary based on the input information. The result of this step is output as a determination of whether or not verification is possible.

[0358] Step 4:

[0359] The server uses an emotion analysis engine to analyze the user's emotional state from facial image data. Facial image data is used as input, and the result of the emotion analysis is output as the user's emotional state, such as relaxed, tense, or anxious.

[0360] Step 5:

[0361] The server determines appropriate feedback for the user based on the results of matching specific information and sentiment analysis. Based on the input data, which includes specific information and emotional state, the server determines whether to grant entry or require additional verification, and outputs a message accordingly.

[0362] Step 6:

[0363] The user receives feedback from the server on the terminal and follows the displayed instructions. Based on the instructions, the user chooses to proceed with the entry process or undergo additional verification. The input in this step is the feedback from the server, and the output is the user's action.
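Step 2 of the flow above outputs the specific information in text format. A minimal sketch of turning that text into structured data follows, assuming the model has been asked to answer in JSON with the keys `name` and `date_of_birth` (ISO 8601). The key names, the `LicenseInfo` class, and the sample date are hypothetical and are not stated in the embodiment.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LicenseInfo:
    name: str
    date_of_birth: date


def parse_license_info(model_output: str) -> LicenseInfo:
    """Parse the model's JSON text into a typed record.

    Raises an exception (ValueError or KeyError) if the text is not valid
    JSON, a key is missing, or the date is malformed.
    """
    data = json.loads(model_output)
    return LicenseInfo(
        name=data["name"].strip(),
        date_of_birth=date.fromisoformat(data["date_of_birth"]),
    )


# Made-up model output for illustration only.
sample_output = '{"name": "Ichiro Tanaka", "date_of_birth": "1990-04-01"}'
info = parse_license_info(sample_output)
print(info)
```

Parsing into a typed record before the database comparison in Step 3 means malformed model output fails early instead of producing a silent mismatch.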

[0364] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0365] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions and inference data, such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0366] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0367] [Third Embodiment]

[0368] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0369] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0370] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0371] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0372] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0373] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0374] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0375] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0376] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0377] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0378] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0379] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0380] The present invention provides a system for efficiently verifying driver's license information during questioning by police officers using a mobile terminal carried by the officer. The terminal takes a picture of the user's driver's license with its camera and sends the image to a server. Upon receiving the image, the server uses generative AI to analyze the text information and facial photograph within the image and extract the necessary specific information.

[0381] The server then uses the extracted specific information to access the database and compare it with past verification history. If a verification history exists in the database, the server decides to skip the detailed verification process and notifies the terminal that verification is complete. This process reduces the burden on both the user and the police officer, and enables rapid identity verification.

[0382] As a concrete example, when a police officer takes a picture of a driver's license with the camera on the terminal at a traffic checkpoint, that information is automatically sent to the server. If the same user's license information has been verified in the past, the terminal immediately displays "Verified" based on that information. This allows users to avoid duplicate verification and allows police officers to improve their work efficiency.

[0383] The following describes the processing flow.

[0384] Step 1:

[0385] The user takes a picture of their driver's license using the device's camera and saves the image data to the device. The device then prepares to process this image data.

[0386] Step 2:

[0387] The device sends the image data of the driver's license that it has photographed to the server. A secure communication protocol is used for transmission to ensure the safety of the data.

[0388] Step 3:

[0389] The server analyzes the image data received from the terminal using generative AI. The generative AI recognizes the text on the driver's license (name, license number, expiration date, etc.) and analyzes the facial photograph to extract the necessary specific information.

[0390] Step 4:

[0391] The server accesses the database based on the extracted specific information and searches for past verification history. If a matching history is found, it retrieves the information and performs a comparison.

[0392] Step 5:

[0393] The server determines whether reconfirmation is necessary based on the results of a comparison with past confirmation history. If it is determined that reconfirmation is not necessary, it generates a confirmation result that includes that statement.

[0394] Step 6:

[0395] The server sends the verification result to the terminal. The terminal displays the received result to the user and notifies them that the verification is complete.
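Step 2 of the flow above calls for a secure communication protocol. A minimal sketch follows, assuming HTTPS as that protocol and using only the Python standard library. The endpoint URL, the content type, and the function name are hypothetical, and the request is only constructed here, not sent.

```python
import urllib.request
from urllib.parse import urlparse


def build_upload_request(url: str, image_bytes: bytes) -> urllib.request.Request:
    """Build a POST request for the image, refusing any non-HTTPS endpoint."""
    if urlparse(url).scheme != "https":
        raise ValueError("image data must be sent over HTTPS")
    return urllib.request.Request(
        url,
        data=image_bytes,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )


# Hypothetical endpoint on a reserved example domain.
request = build_upload_request("https://verification.example/upload", b"image bytes")
# urllib.request.urlopen(request) would perform the actual upload.
print(request.get_method(), request.full_url)
```

Checking the URL scheme before building the request makes an accidental plaintext upload fail immediately on the terminal.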

[0396] (Example 1)

[0397] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0398] Traditional identity verification methods often required manual verification, resulting in inefficiency and time-consuming processes. Furthermore, the need for repeated verification added significant complexity and slow processing, placing a heavy burden on both police officers and users. There is a need to address these issues and provide a fast and efficient method of identity verification.

[0399] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0400] In this invention, the server includes means for inputting image data containing personal identification information using image acquisition means, means for analyzing the input image data using generative AI and recognizing specific information, and means for executing a face recognition algorithm using the analyzed specific information. This makes it possible to perform identity verification quickly and efficiently. This method significantly reduces the effort required even when re-verification is necessary, thereby reducing the burden on police officers and users.

[0401] "Image acquisition means" refers to devices or methods for acquiring image data that includes personal identification information.

[0402] "Generative AI" is a technology that uses generative artificial intelligence to analyze data and derive new information.

[0403] "Identifiable information" refers to information necessary to identify an individual, primarily including text data and facial image data.

[0404] A "face recognition algorithm" refers to a computational method used to identify individuals by analyzing facial features from images.

[0405] A "database" is a collection of information where past verification history and other related information are recorded and managed.

[0406] "Verification" is the process of comparing a large amount of information with newly acquired information to determine whether they match.

[0407] "Identity verification" refers to a series of procedures to confirm an individual's identity and determine whether they meet the required criteria.

[0408] A "portable information processing device" refers to a device that is easily portable and capable of performing calculations and information processing.

[0409] This invention provides a system for police officers to quickly verify the identity of individuals during questioning. Specifically, it uses a mobile terminal as a portable information processing device. The terminal's built-in camera captures an image of the user's driver's license. This image data is transmitted to a server via a secure protocol.

[0410] The server analyzes the received image data using a generative AI model and extracts text information from the image using OCR technology. Simultaneously, it applies a facial recognition algorithm to extract features from the facial photograph. Through these processes, specific information for identifying an individual is generated. The generated information is used to query a database and is compared with past verification history.

[0411] For example, if a police officer photographs a driver's license with a camera during a traffic checkpoint, this information is processed in real time. This allows for a quick determination of whether the user is verified, and the information is sent to the device. An example of a prompt message used in this process is, "Extract text information from this image and identify and return the name, license number, and date of birth."

[0412] This streamlines the repetitive process of identity verification, significantly reducing the burden on both police officers and users. Furthermore, the faster processing allows police officers to focus on other tasks.
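
As a rough illustration of how the prompt quoted in [0411] might travel with the image, the following sketch packages both into one request body and validates the reply. The JSON layout, the field names, and the idea that the model is asked to answer in JSON are assumptions made for this sketch; the specification does not define a wire format. The date of birth in the sample reply is a placeholder.

```python
import base64
import json

# Prompt text quoted in the specification.
PROMPT = (
    "Extract text information from this image and identify and return "
    "the name, license number, and date of birth."
)

# Hypothetical field names; the specification only names the three items.
REQUIRED_FIELDS = ("name", "license_number", "date_of_birth")


def build_request(image_bytes: bytes) -> str:
    """Package the prompt and the image as one JSON body (assumed layout)."""
    return json.dumps({
        "prompt": PROMPT,
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
    })


def parse_reply(reply_text: str) -> dict:
    """Parse the model's reply, assuming it was instructed to answer in JSON."""
    data = json.loads(reply_text)
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    missing = [f for f in REQUIRED_FIELDS if not data.get(f)]
    if missing:
        raise ValueError(f"reply is missing fields: {missing}")
    return {f: str(data[f]).strip() for f in REQUIRED_FIELDS}


# Example with a placeholder reply (no real model is called here).
sample_reply = (
    '{"name": "Ichiro Tanaka", "license_number": "12345678", '
    '"date_of_birth": "1990-01-01"}'
)
fields = parse_reply(sample_reply)
```

Rejecting a reply with missing fields, rather than passing partial data to the database comparison, is one possible design choice; a real system might instead ask the officer to retake the photograph.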

[0413] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0414] Step 1:

[0415] The user presents their driver's license to the police officer. The police officer uses a terminal to take a picture of the license. The input for this step is the physical driver's license, and the output is digital image data. Autofocus and exposure adjustment functions are in place to ensure the image is clear and accurate.

[0416] Step 2:

[0417] The device digitizes the captured image data and sends it to the server via a secure network protocol (e.g., HTTPS). The input to this process is image data, and the output is encrypted image data. A checksum is generated before data transfer to ensure data integrity.

[0418] Step 3:

[0419] The server inputs the received image data into the generative AI model. Here, the prompt "Extract text information from this image and identify and return the name, license number, and date of birth" is used. The input for this step is encrypted image data, and the output is identified information (text information, feature data from the facial image). OCR technology and facial recognition algorithms are used in combination to analyze the information.

[0420] Step 4:

[0421] The server uses the extracted specific information to make a matching request to the database. The input for this step is the specific information, and the output is the verification result. Indexes are used to optimize database searches, enabling faster matching.

[0422] Step 5:

[0423] The server determines the need for identity verification based on the database matching results. The input for this step is the matching result, and the output is the determination regarding the need for verification. If the information has been previously verified, a flag is set to skip the re-verification procedure.

[0424] Step 6:

[0425] The server sends the judgment result back to the terminal, which then displays the information to the user and the police officer. The input for this step is the judgment result, and the output is a visual display of the confirmed status. Visual icons or text are displayed on the terminal to indicate that the confirmation is complete.
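
Two details of the flow above, the checksum generated before transfer (Step 2) and the flag that skips re-verification (Step 5), can be sketched as follows. SHA-256 is an assumed choice, since the specification does not name a checksum algorithm, and the function names and status strings are hypothetical.

```python
import hashlib
import hmac


def make_checksum(image_bytes: bytes) -> str:
    """Terminal side: digest computed before the transfer (SHA-256 assumed)."""
    return hashlib.sha256(image_bytes).hexdigest()


def verify_checksum(image_bytes: bytes, expected: str) -> bool:
    """Server side: recompute the digest and compare it in constant time."""
    return hmac.compare_digest(make_checksum(image_bytes), expected)


def decide(previously_verified: bool) -> dict:
    """Step 5: set a flag so that re-verification is skipped for known licenses."""
    return {
        "skip_reverification": previously_verified,
        "status": "verified" if previously_verified else "verification_required",
    }
```

A checksum of this kind detects accidental corruption in transit; protection against tampering would rely on the secure protocol (e.g., HTTPS) mentioned in Step 2.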

[0426] (Application Example 1)

[0427] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0428] The problem that this invention aims to solve is that verifying an individual's identity with conventional methods is time-consuming and cumbersome. Furthermore, it is necessary to minimize visitor waiting times and ensure smooth entry into facilities.

[0429] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0430] In this invention, the server includes means for inputting image data containing identification information using an image acquisition device, means for analyzing the input image data using generative AI technology and recognizing specific information, and means for comparing the specific information with past identification history recorded in an information storage device. This enables rapid identity verification of individuals and minimizes waiting times for visitors.

[0431] An "image acquisition device" is a device for inputting image data, and includes the camera of a mobile information terminal.

[0432] "Identifying information" refers to data that includes information necessary to identify an individual.

[0433] "Generative AI technology" is an artificial intelligence technology used to analyze image data and recognize specific information.

[0434] An "information storage device" is a device that has the function of recording and storing past identification history.

[0435] "Identifiable information" refers to information necessary for identifying a person, extracted through image analysis.

[0436] "Verification" is the process of comparing extracted specific information with past identification history.

[0437] A "display device" is a device that visually displays confirmed notifications and other related information.

[0438] In this embodiment of the invention, a system is constructed to perform rapid and efficient personal identification by combining a mobile information terminal and a server. The mobile information terminal is equipped with a camera and software that supports image analysis, and is used to acquire image data containing identification information.

[0439] The terminal's camera takes a picture of the visitor's identification and sends the image to the server. The server analyzes the image data using generative AI technology and extracts specific information. It is desirable to use the latest image recognition technology as the generative AI model.

[0440] After the specific information is extracted, the server accesses the information storage device and compares the specific information with past identification history. A database management system (DBMS) is used as the information storage device; for example, it may be built as an SQL database. Based on the comparison results, the server notifies the terminal that the information has been verified.
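
A minimal sketch of how the past identification history could be held in an SQL database is given below, using SQLite from the Python standard library. The table name, column names, and the index on the document number are assumptions for illustration; the specification states only that a DBMS such as an SQL database may be used.

```python
import sqlite3


def open_history_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) history table and an index for fast lookups."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS identification_history (
               id INTEGER PRIMARY KEY,
               document_number TEXT NOT NULL,
               holder_name TEXT NOT NULL,
               verified_at TEXT NOT NULL
           )"""
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_history_document "
        "ON identification_history (document_number)"
    )
    return conn


def record_verification(conn, document_number, holder_name, verified_at):
    """Store one completed verification."""
    conn.execute(
        "INSERT INTO identification_history "
        "(document_number, holder_name, verified_at) VALUES (?, ?, ?)",
        (document_number, holder_name, verified_at),
    )
    conn.commit()


def has_history(conn, document_number, holder_name) -> bool:
    """Return True if a matching verification record already exists."""
    row = conn.execute(
        "SELECT 1 FROM identification_history "
        "WHERE document_number = ? AND holder_name = ? LIMIT 1",
        (document_number, holder_name),
    ).fetchone()
    return row is not None
```

Parameterized queries are used so that text extracted from an image is never interpolated directly into SQL.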

[0441] The terminal visually notifies the user via a display device once verification is complete. This notification minimizes user waiting time and enables smooth entry management.

[0442] As a concrete example, it can be used at company reception areas and security gates. When a visitor arrives at the reception, the staff member can quickly verify their identity using a mobile device, receive an immediate "verified" message, and allow the visitor to pass through.

[0443] An example of a prompt used is, "Analyze the text information and facial photograph on the identification document and compare it with past verification history." Based on this prompt, the generative AI model performs appropriate analysis and recognition.

[0444] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0445] Step 1:

[0446] The device uses its camera to photograph the user's identification document and acquires image data. This image data becomes the input.

[0447] Step 2:

[0448] The terminal sends the acquired image data to the server. The server receives it and prepares for the next processing step. At this point, the server has the image data as input.

[0449] Step 3:

[0450] The server analyzes the received image data using a generative AI model. It applies the prompt "Analyze the text information and facial photograph from the identification document and compare it with past verification history," extracting specific information from the text and facial photograph. This analysis result becomes the intermediate output.

[0451] Step 4:

[0452] The server accesses the information storage device based on the extracted specific information and compares it with past identification history. The relevant history is retrieved from the database, and the comparison result is obtained as output.

[0453] Step 5:

[0454] The server determines the user's verification status based on the matching results and, if verified, sends that information to the terminal. The terminal receives this notification and displays "Verified" on its display device.

[0455] Step 6:

[0456] The terminal ultimately provides the user with visual notifications and offers further guidance based on the information. Specifically, the user can receive a confirmed notification and promptly enter the facility.
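
Steps 5 and 6 above, in which the server sends the verification status and the terminal displays it, might look like the following sketch. The message structure, its JSON encoding, and the guidance strings are hypothetical; only the "Verified" display text comes from the description.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VerificationNotice:
    """Message sent from the server to the terminal (assumed structure)."""
    verified: bool
    guidance: str


def build_notice(match_found: bool) -> VerificationNotice:
    """Server side: turn the matching result into a notice."""
    if match_found:
        return VerificationNotice(True, "Please proceed to the entrance.")
    return VerificationNotice(False, "Please wait for confirmation at reception.")


def encode_notice(notice: VerificationNotice) -> str:
    """Serialize the notice for transmission."""
    return json.dumps(asdict(notice))


def render_on_terminal(payload: str) -> str:
    """Terminal side: build the text shown on the display device."""
    data = json.loads(payload)
    headline = "Verified" if data["verified"] else "Not verified"
    return f"{headline}\n{data['guidance']}"
```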

[0457] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0458] This invention adds an emotion engine that analyzes the user's emotional state to the system that streamlines license verification during police questioning. The system uses a mobile device and considers not only license information but also the user's emotional state to realize a safer and more effective verification process.

[0459] The terminal first takes a picture of the user's driver's license and sends the image to the server. At the same time, the terminal's camera also takes a picture of the user's face, and the emotion engine analyzes their emotions. Emotional information includes states such as tension, anger, and relief. Based on this, the server uses a generative AI to extract information from the driver's license and uses the emotion engine to analyze the user's emotional state.

[0460] The server compares the extracted license information with the database, checks past verification history, and determines the need for identity verification. Simultaneously, it takes into account the results of sentiment analysis to determine the appropriate response based on the user's state. Specifically, if the user is calm, the system continues normal processing; if tension or suspicion is detected, it guides the user through additional verification methods.

[0461] For example, if a user presents their driver's license to a terminal during a checkpoint, the emotion engine captures the user's subtle emotions along with the license information, and this is analyzed on the server. As a result, if verified data exists and the user is confirmed to be in a state of confidence, "Verification Complete" is quickly displayed. On the other hand, if necessary, further dialogue-based verification may be requested.

[0462] In this way, the invention integrates generative AI and an emotion engine, providing flexibility and reliability not found in conventional license verification systems. This makes the verification process safer and more effective.

[0463] The following describes the processing flow.

[0464] Step 1:

[0465] The user uses the device's camera to take pictures of their driver's license and their face simultaneously. The device then prepares to process these image data.

[0466] Step 2:

[0467] The device sends image data of the driver's license and facial image data to the server. A secure communication protocol is used for transmission to ensure data security.

[0468] Step 3:

[0469] The server analyzes the image data received from the terminal using a generative AI. The generative AI recognizes text information (name, license number, expiration date, etc.) from the driver's license and also analyzes the facial photograph to extract the necessary specific information.

[0470] Step 4:

[0471] The server activates an emotion engine based on facial image data to analyze the user's emotional state. The emotion engine detects emotions such as tension, joy, and anger from facial expressions.

[0472] Step 5:

[0473] The server accesses the database using specific information and searches for past verification history. It checks if there is a history of verification and uses that information to decide whether to simplify the verification process.

[0474] Step 6:

[0475] The server makes a final decision based on the matching results and sentiment analysis results. If the user's sentiment is unstable and past verification has not been completed, additional identity verification methods will be initiated.

[0476] Step 7:

[0477] The server sends the final result to the terminal. The terminal displays the received result to the user and notifies them that the verification process is complete. This ensures that the user receives the most appropriate instructions immediately.
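
The final decision in Step 6 combines the matching result with the emotion analysis result. One reading of the rules in [0460], [0461], and Step 6 is sketched below. Which emotion labels count as unstable, and the handling of a calm user without verification history, are assumptions of this sketch rather than rules stated in the description.

```python
from enum import Enum


class Outcome(Enum):
    COMPLETE = "Verification Complete"
    STANDARD = "standard verification"
    ADDITIONAL = "additional verification"


# Assumed set of labels treated as an unstable emotional state.
UNSTABLE = {"tension", "anger", "suspicion"}


def final_decision(has_history: bool, emotion: str) -> Outcome:
    """Combine the database match and the emotion label into one outcome."""
    if emotion in UNSTABLE:
        # Tension or suspicion leads to additional verification methods.
        return Outcome.ADDITIONAL
    if has_history:
        # Verified data exists and the user is calm.
        return Outcome.COMPLETE
    # Calm user without history: normal processing continues (assumption).
    return Outcome.STANDARD
```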

[0478] (Example 2)

[0479] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0480] Conventional verification systems primarily process personal identification information based on visual data, making it difficult to provide dynamic responses that take into account the user's emotional state. As a result, they lack additional verification methods or appropriate responses for cases in which identity verification is uncertain, which hinders improvements in security and efficiency.

[0481] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0482] In this invention, the server includes means for acquiring personal identification information and emotional state using image acquisition means, means for analyzing image data using generative AI, and means for classifying an individual's emotions based on emotional state information. This makes it possible not only to determine the need for identity verification but also to respond flexibly according to the user's emotional state.

[0483] "Image acquisition means" refers to devices or methods for acquiring image data that includes personal identification information, and specifically refers to the camera equipment of a portable information terminal.

[0484] "Generative AI" is a type of artificial intelligence technology used to analyze input image data and recognize specific information that is necessary.

[0485] "Emotional state information" refers to data that indicates an individual's emotional state and includes elements that classify emotions such as tension, anger, and relief.

[0486] A "database" is a collection of information that stores and manages previously recorded personal identification information and emotional state data, and assists in matching that information.

[0487] "Identity verification" is the process of confirming that the information provided by an individual matches past databases and determining the reliability of that information.

[0488] The system of the present invention has a configuration that combines a portable information terminal and a server for verifying the user's license. The terminal is equipped with a high-performance imaging device and is capable of acquiring images of the user's presented license and the user's face. The acquired image data is transmitted to the server via the internet and is encrypted and protected through a secure protocol.

[0489] The server utilizes a generative AI model with high accuracy in specific information recognition to analyze the received image data. Specifically, it uses a general cloud service that provides optical character recognition (OCR) technology capable of extracting text information from image data. Furthermore, the server incorporates a dedicated algorithm for emotion classification to analyze emotional states using facial image data. This allows for real-time identification of emotions such as "tension," "anger," and "relief."

[0490] The acquired license information and emotional state information are compared against past verification history stored in the server's database. Based on the comparison results and the user's emotional state, the server determines the next step in the verification process. If the user is in a relaxed state, the normal verification procedure proceeds; if the user is in a tense state, additional verification measures are offered to the user.

[0491] For example, consider a scenario where a user presents their driver's license to a mobile device at a checkpoint. The device automatically takes photos of the license and the user's face and immediately sends them to the server. The server uses AI to extract information such as "Ichiro Tanaka" and "License Number 12345678," and an emotion engine confirms that the user is at ease. If the information has already been verified in the database, the device displays "Verification Complete." An example of a prompt message would be, "Analyze the user's emotions and perform a comparison with past data along with the license information."

[0492] This system can enhance safety and efficiency by providing a new verification process that combines generative AI models with sentiment analysis.

[0493] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0494] Step 1:

[0495] The device uses its camera to photograph the user's driver's license and the user's face. As input, the device acquires image data of the driver's license and the user's face. Specifically, the device captures high-resolution images while adjusting autofocus and exposure. As output, image data is generated.

[0496] Step 2:

[0497] The device sends the captured image data to the server. The input is the image data acquired from the device. Specifically, the device encrypts the data and uploads it to the server using a secure protocol (e.g., HTTPS). The output is the server receiving the image data.

[0498] Step 3:

[0499] The server analyzes the received image data using a generative AI model to extract license information. The input is image data sent from the terminal. Specifically, optical character recognition (OCR) technology is used to extract text information such as name, date of birth, and license number. The output is specific information.

[0500] Step 4:

[0501] The server analyzes emotional states based on image data. The input is an image of the user's face. Specifically, it uses an emotion analysis algorithm to classify emotions such as tension, anger, and relief. The output is information about the user's emotional state.

[0502] Step 5:

[0503] The server compares the extracted license information with past verification history in the database. The inputs are license information and database history information. Specifically, it performs a database search to check for matching information. The output is the matching result.

[0504] Step 6:

[0505] The server determines the necessity of identity verification and the appropriate response based on the user's situation, using the matching results and emotional state information. Inputs include the matching results and emotional state information. Specifically, if the user is in a state of confidence, the server determines "verification complete"; if the user is in a state of tension, for example, it considers additional verification measures. Outputs include the verification result and proposed additional actions.

[0506] Step 7:

[0507] The terminal displays the final confirmation result to the user based on instructions from the server. The input is the decision result from the server. Specifically, the terminal displays messages such as "Confirmation complete" or "Further confirmation required" on the screen. The output is information displayed to the user.
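
Step 4 classifies emotions such as tension, anger, and relief, and Step 7 displays a message. The sketch below assumes the emotion analysis algorithm returns one score per label and picks the highest; the score format, the 0.5 threshold, the "uncertain" fallback, and the use of "relief" as the calm state are all assumptions made for illustration.

```python
def classify_emotion(scores: dict[str, float], min_score: float = 0.5) -> str:
    """Return the top-scoring label, or "uncertain" if no score reaches min_score."""
    if not scores:
        return "uncertain"
    label, score = max(scores.items(), key=lambda kv: kv[1])
    return label if score >= min_score else "uncertain"


def terminal_message(match_found: bool, emotion: str) -> str:
    """Step 7: choose the text shown on the terminal screen."""
    if match_found and emotion == "relief":
        return "Confirmation complete"
    return "Further confirmation required"


# Example with placeholder scores.
label = classify_emotion({"tension": 0.2, "anger": 0.1, "relief": 0.7})
message = terminal_message(True, label)
```

Falling back to "uncertain" when no label is clearly dominant routes ambiguous cases to further confirmation rather than silently treating them as calm.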

[0508] (Application Example 2)

[0509] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0510] Traditional identity verification systems often involve manual verification processes for documents such as driver's licenses, which is time-consuming and makes it difficult to provide appropriate responses based on the emotional state of the person being verified. Therefore, there is a need to streamline the verification process and enhance security accordingly. Furthermore, it is necessary to enable flexible responses tailored to individual circumstances by considering the emotional state of the user.

[0511] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0512] In this invention, the server includes a device that inputs image data containing identity verification information using an image acquisition device, a device that analyzes the image data using generative artificial intelligence and recognizes specific information, and a device that analyzes emotional states from facial image data using an emotion analysis engine. This enables more efficient identity verification processes and flexible responses based on emotion analysis.

[0513] An "image acquisition device" is a device used to collect image data, including identity verification information, and specifically refers to the camera on a portable information terminal.

[0514] "Generative artificial intelligence" refers to an algorithm used to analyze data and recognize specific information, and its role is to analyze the input information.

[0515] "Identifiable information" refers to information extracted from image data that is particularly necessary for identification and is an important element in identity verification.

[0516] "Information recording medium" refers to databases and storage devices that store past verification history and data related to personal identification.

[0517] The "emotion analysis engine" is a system that analyzes an individual's emotional state based on facial image data and has the function of automatically identifying the user's emotions.

[0518] The "judgment result" is a conclusion derived based on the acquired information and analysis results, indicating whether or not identity verification is possible.

[0519] The system for realizing this invention consists mainly of a portable information terminal and a server. The terminal uses an image acquisition device (for example, a smartphone camera) to input image data including identity verification information. The user presents their driver's license to the terminal, and the camera simultaneously takes a picture of the license and their face. This allows for the efficient collection of the user's identity information.

[0520] The server uses generative artificial intelligence software (e.g., Google Cloud Vision AI) to analyze the input image data and recognize specific information. Specifically, it extracts important information such as name and date of birth from the driver's license. Then, it compares this information with past verification history stored in a database system (information recording medium) to determine whether verification is necessary.

[0521] Furthermore, by using an emotion analysis engine (for example, Microsoft Face API), the user's emotional state is analyzed from facial image data. This analysis obtains information such as whether the user is relaxed or feeling tense or anxious. This emotional information is then used as data to enable flexible responses tailored to the user's situation.

[0522] For example, at the entrance of a large building, when a user presents their driver's license on a terminal, facial recognition and emotion analysis are performed smoothly. If the information matches the pre-registered data, entry is granted immediately. If the user feels uneasy, they may be guided through additional verification procedures.

[0523] In this way, efficient identity verification and robust security management are achieved simultaneously. In this invention, an example of a prompt for effectively utilizing the generative AI model is: "Extract the owner's name and date of birth from the driver's license image and perform sentiment analysis. If it matches the list data, generate an entry permission message."

[0524] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0525] Step 1:

[0526] The user uses the camera on their portable information terminal to take pictures of their driver's license and their face. The terminal then obtains image data of the driver's license and their face. The input at this stage is raw image data captured by the camera, and this data is converted into a format suitable for processing before being passed on to the next process.

[0527] Step 2:

[0528] The terminal sends the acquired image data to the server. The server analyzes the image data using a generative AI model and extracts specific information from the driver's license, such as the owner's name and date of birth. This process outputs the specific information in text format. The input is the acquired image data, and the output is the specific information in text format.

[0529] Step 3:

[0530] The server compares the extracted specific information with past verification history stored on the information recording medium. The server uses database queries to confirm the match and determines whether identity verification is necessary based on the input information. The result of this step is output as a determination of whether verification is possible or not.

[0531] Step 4:

[0532] The server uses an emotion analysis engine to analyze the user's emotional state from facial image data. Facial image data is used as input, and the result of the emotion analysis is output as the user's emotional state, such as relaxed, tense, or anxious.

[0533] Step 5:

[0534] The server determines appropriate feedback for the user based on the results of matching specific information and sentiment analysis. Based on the input data, which includes specific information and emotional state, the server determines whether to grant entry or require additional verification, and outputs a message accordingly.

[0535] Step 6:

[0536] The user receives feedback from the server on their device and follows the displayed instructions. Based on the instructions, they choose to proceed with the entry process or undergo additional verification. The input in this step is feedback from the server and is output as the user's action.
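
Matching the extracted name and date of birth against list data (Steps 2, 3, and 5) is sensitive to formatting differences in the extracted text. The sketch below normalizes both fields before comparison. The normalization rules, the accepted date formats, the message strings, and the use of "relaxed" as the state that permits entry are assumptions; the description does not specify how matching is performed, and the date of birth shown is a placeholder.

```python
import unicodedata
from datetime import date, datetime


def normalize_name(name: str) -> str:
    """Fold full-width variants, collapse whitespace, and ignore case."""
    folded = unicodedata.normalize("NFKC", name)
    return " ".join(folded.split()).casefold()


def parse_birth_date(text: str) -> date:
    """Accept a few common year-month-day spellings (assumed formats)."""
    for fmt in ("%Y-%m-%d", "%Y/%m/%d", "%Y.%m.%d"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def entry_message(name: str, birth_text: str,
                  registered: set[tuple[str, date]], emotion: str) -> str:
    """Grant entry only when the list data matches and the user is relaxed."""
    key = (normalize_name(name), parse_birth_date(birth_text))
    if key in registered and emotion == "relaxed":
        return "Entry permitted"
    return "Additional verification required"


# Example list data with a placeholder date of birth.
registered = {(normalize_name("Ichiro Tanaka"), date(1990, 1, 1))}
result = entry_message("ICHIRO  TANAKA", "1990/01/01", registered, "relaxed")
```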

[0537] The specific processing unit 290 transmits the result of the specific processing to the headset terminal 314. In the headset terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0538] The data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.
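
The input and output of the data generation model 58 described above (a prompt plus inference data in, an inference result out) can be expressed as a small interface. The class and method names below are hypothetical, and the stand-in model performs no real inference; it only shows how a caller could stay independent of the particular generative AI behind the interface.

```python
from dataclasses import dataclass
from typing import Literal, Protocol

Modality = Literal["audio", "text", "image"]


@dataclass(frozen=True)
class InferenceRequest:
    """A prompt containing instructions plus the inference data it applies to."""
    prompt: str
    modality: Modality
    data: bytes


class DataGenerationModel(Protocol):
    """Anything that can run inference on a request and return text."""
    def infer(self, request: InferenceRequest) -> str: ...


class EchoModel:
    """Stand-in that only reports what it received."""
    def infer(self, request: InferenceRequest) -> str:
        return f"{request.modality}:{len(request.data)} bytes; prompt={request.prompt!r}"


def run(model: DataGenerationModel, request: InferenceRequest) -> str:
    """Call whichever model implementation was supplied."""
    return model.infer(request)
```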

[0539] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and specific processing may also be performed by the headset terminal 314.

[0540] [Fourth Embodiment]

[0541] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0542] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0543] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0544] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0545] The microphone 238 receives voice signals from the user 20 and receives instructions from the user 20. The microphone 238 captures the voice signals from the user 20, converts the captured voice into audio data, and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0546] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0547] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0548] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0549] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0550] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0551] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0552] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0553] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0554] The present invention provides a system for efficiently verifying driver's license information during police questioning, using a mobile terminal carried by the officer. The terminal photographs the user's driver's license with its camera and sends the image to a server. Upon receiving the image, the server uses generative AI to analyze the text information and facial photograph within the image and extract the necessary specific information.

[0555] The server then uses the extracted specific information to access the database and compare it with past verification history. If a matching verification history already exists in the database, the server decides to skip the detailed verification process and notifies the terminal that verification is complete. This process reduces the burden on both the user and the police officer, and enables rapid identity verification.

[0556] As a concrete example, when a police officer photographs a driver's license with the terminal's camera at a traffic checkpoint, the image is automatically sent to the server. If the same user's license information has been verified in the past, the terminal immediately displays "Verified" based on that history. This spares users duplicate verification and improves the police officer's work efficiency.

[0557] The following describes the processing flow.

[0558] Step 1:

[0559] The user takes a picture of their driver's license using the device's camera and saves the image data to the device. The device then prepares to process this image data.

[0560] Step 2:

[0561] The device sends the image data of the driver's license that it has photographed to the server. A secure communication protocol is used for transmission to ensure the safety of the data.

[0562] Step 3:

[0563] The server analyzes the image data received from the terminal using generative AI. The generative AI recognizes the text on the driver's license (name, license number, expiration date, etc.) and extracts the necessary specific information by analyzing the facial photograph.

[0564] Step 4:

[0565] The server accesses the database based on the extracted specific information and searches for past verification history. If a matching history is found, it retrieves the information and performs a comparison.

[0566] Step 5:

[0567] The server determines whether reconfirmation is necessary based on the result of the comparison with past confirmation history. If it determines that reconfirmation is not necessary, it generates a confirmation result stating so.

[0568] Step 6:

[0569] The server sends the verification result to the terminal. The terminal displays the received result to the user and notifies them that the verification is complete.
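As a rough illustration of Steps 4 to 6 above, the server-side decision could be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of this disclosure: the names `LicenseInfo`, `VerificationHistory`, and `decide` are hypothetical, the generative AI analysis of Step 3 is assumed to have already produced the extracted fields, an in-memory dictionary stands in for the database, and the expiration date and the "Verification required" message are made-up sample values.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass(frozen=True)
class LicenseInfo:
    """Specific information assumed to be extracted from the license image."""
    name: str
    license_number: str
    expiration_date: date


class VerificationHistory:
    """Hypothetical in-memory stand-in for the verification-history database."""

    def __init__(self) -> None:
        self._records: Dict[str, LicenseInfo] = {}

    def record(self, info: LicenseInfo) -> None:
        self._records[info.license_number] = info

    def find(self, license_number: str) -> Optional[LicenseInfo]:
        return self._records.get(license_number)


def decide(info: LicenseInfo, history: VerificationHistory) -> dict:
    """Steps 4 to 6: look up past history, compare, and build the result."""
    past = history.find(info.license_number)
    # Reconfirmation is skipped only when a stored record matches exactly.
    already_verified = past is not None and past == info
    return {
        "license_number": info.license_number,
        "reconfirmation_required": not already_verified,
        "message": "Verified" if already_verified else "Verification required",
    }


history = VerificationHistory()
known = LicenseInfo("Ichiro Tanaka", "12345678", date(2028, 4, 1))
history.record(known)
print(decide(known, history)["message"])
```

Keying the lookup on the license number and then comparing the whole record is one possible design; it treats a changed name or expiration date as a reason to verify again.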

[0570] (Example 1)

[0571] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0572] Traditional identity verification methods often required manual verification, resulting in inefficiency and time-consuming processes. Furthermore, the need for repeated verification added significant complexity and slow processing, placing a heavy burden on both police officers and users. There is a need to address these issues and provide a fast and efficient method of identity verification.

[0573] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0574] In this invention, the server includes means for inputting image data containing personal identification information using image acquisition means, means for analyzing the input image data using generative AI and recognizing specific information, and means for executing a face recognition algorithm using the analyzed specific information. This makes it possible to perform identity verification quickly and efficiently. This method significantly reduces the effort required even when re-verification is necessary, thereby reducing the burden on police officers and users.

[0575] "Image acquisition means" refers to devices or methods for acquiring image data that includes personal identification information.

[0576] "Generative AI" is a technology that uses generative artificial intelligence to analyze data and derive new information.

[0577] "Specific information" refers to information necessary to identify an individual, primarily including text data and facial image data.

[0578] A "face recognition algorithm" refers to a computational method used to identify individuals by analyzing facial features from images.

[0579] A "database" is a collection of information where past verification history and other related information are recorded and managed.

[0580] "Verification" is the process of comparing a large amount of information with newly acquired information to determine whether they match.

[0581] "Identity verification" refers to a series of procedures to confirm an individual's identity and determine whether they meet the required criteria.

[0582] A "portable information processing device" refers to a device that is easily portable and capable of performing calculations and information processing.

[0583] This invention provides a system for police officers to quickly verify the identity of individuals during questioning. Specifically, it uses a mobile terminal as a portable information processing device. The terminal's built-in camera captures an image of the user's driver's license. This image data is transmitted to a server via a secure protocol.

[0584] The server analyzes the received image data using a generative AI model and extracts text information from the image using OCR technology. Simultaneously, it applies a facial recognition algorithm to extract features from the facial photograph. Through these processes, specific information for identifying an individual is generated. The generated information is used to query a database and is compared with past verification history.

[0585] For example, if a police officer photographs a driver's license with a camera during a traffic checkpoint, this information is processed in real time. This allows for a quick determination of whether the user is verified, and the information is sent to the device. An example of a prompt message used in this process is, "Extract text information from this image and identify and return the name, license number, and date of birth."

[0586] This streamlines the repetitive process of identity verification, significantly reducing the burden on both police officers and users. Furthermore, the faster processing allows police officers to focus on other tasks.
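The prompt quoted in the preceding description could be sent to a model and its reply validated roughly as follows. This is only a sketch: `extract_specific_info`, `fake_model`, and the JSON reply format are assumptions made for illustration, and the stub returns sample values rather than calling any real generative AI service.

```python
import json
from typing import Callable

# Prompt text quoted from the description above.
PROMPT = (
    "Extract text information from this image and identify and return "
    "the name, license number, and date of birth."
)

REQUIRED_FIELDS = ("name", "license_number", "date_of_birth")


def extract_specific_info(
    image_bytes: bytes,
    model: Callable[[str, bytes], str],
) -> dict:
    """Send the prompt and image to a model and validate its JSON reply.

    `model` is a hypothetical callable (prompt, image) -> JSON text; the
    JSON reply format is an assumption made for this sketch.
    """
    reply = model(PROMPT, image_bytes)
    data = json.loads(reply)
    missing = [name for name in REQUIRED_FIELDS if not data.get(name)]
    if missing:
        raise ValueError(f"model reply is missing fields: {missing}")
    return {name: str(data[name]).strip() for name in REQUIRED_FIELDS}


def fake_model(prompt: str, image: bytes) -> str:
    """Stub standing in for a real generative AI service (sample values)."""
    return json.dumps({
        "name": "Ichiro Tanaka",
        "license_number": "12345678",
        "date_of_birth": "1990-01-01",
    })


info = extract_specific_info(b"<image bytes>", fake_model)
print(info["name"], info["license_number"])
```

Rejecting replies with missing fields keeps incomplete extractions from reaching the database comparison.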

[0587] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0588] Step 1:

[0589] The user presents their driver's license to the police officer. The police officer uses a terminal to take a picture of the license. The input for this step is the physical driver's license, and the output is digital image data. Autofocus and exposure adjustment functions are in place to ensure the image is clear and accurate.

[0590] Step 2:

[0591] The device digitizes the captured image data and sends it to the server via a secure network protocol (e.g., HTTPS). The input to this process is image data, and the output is encrypted image data. A checksum is generated before data transfer to ensure data integrity.

[0592] Step 3:

[0593] The server inputs the received image data into the generative AI model. Here, the prompt "Extract text information from this image and identify and return the name, license number, and date of birth" is used. The input for this step is encrypted image data, and the output is specific information (text information and feature data from the facial image). OCR technology and facial recognition algorithms are used in combination to analyze the information.

[0594] Step 4:

[0595] The server uses the extracted specific information to make a matching request to the database. The input for this step is the specific information, and the output is the verification result. Indexes are used to optimize database searches, enabling faster matching.

[0596] Step 5:

[0597] The server determines the need for identity verification based on the database matching results. The input for this step is the matching result, and the output is the determination regarding the need for verification. If the information has been previously verified, a flag is set to skip the re-verification procedure.

[0598] Step 6:

[0599] The server sends the judgment result back to the terminal, which then displays the information to the user and the police officer. The input for this step is the judgment result, and the output is a visual display of the confirmed status. Visual icons or text are displayed on the terminal to indicate that the confirmation is complete.
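Step 2 above mentions generating a checksum before transfer to ensure data integrity. One way this might look is sketched below. The text does not name a hash algorithm, so SHA-256 is an assumption, and `package_image` and `verify_package` are hypothetical helper names. Transport encryption (for example, HTTPS) would be handled separately and is not shown.

```python
import hashlib
import hmac


def package_image(image_bytes: bytes) -> dict:
    """Terminal side: attach a SHA-256 checksum before upload."""
    return {
        "image": image_bytes,
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
    }


def verify_package(package: dict) -> bool:
    """Server side: recompute the checksum and compare it with the sent one."""
    expected = hashlib.sha256(package["image"]).hexdigest()
    # compare_digest performs a constant-time comparison of the two digests.
    return hmac.compare_digest(expected, package["sha256"])


package = package_image(b"example license image bytes")
print(verify_package(package))

# Simulate corruption in transit: the image changes but the checksum does not.
tampered = dict(package, image=b"altered bytes")
print(verify_package(tampered))
```

A plain checksum detects accidental corruption only; detecting deliberate tampering would require a keyed construction, which is outside what the text describes.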

[0600] (Application Example 1)

[0601] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0602] The problem that this invention aims to solve is that verifying an individual's identity with conventional methods is time-consuming and cumbersome. Furthermore, it is necessary to minimize visitor waiting times and ensure smooth entry into facilities.

[0603] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0604] In this invention, the server includes means for inputting image data containing identification information using an image acquisition device, means for analyzing the input image data using generative AI technology and recognizing specific information, and means for comparing the specific information with past identification history recorded in an information storage device. This enables rapid identity verification of individuals and minimizes waiting times for visitors.

[0605] An "image acquisition device" is a device for inputting image data, and includes the camera of a mobile information terminal.

[0606] "Identifying information" refers to data that includes information necessary to identify an individual.

[0607] "Generative AI technology" is an artificial intelligence technology used to analyze image data and recognize specific information.

[0608] An "information storage device" is a device that has the function of recording and storing past identification history.

[0609] "Specific information" refers to information necessary for identifying a person, extracted through image analysis.

[0610] "Verification" is the process of comparing extracted specific information with past identification history.

[0611] A "display device" is a device that visually displays confirmed notifications and other related information.

[0612] In this embodiment of the invention, a system is constructed to perform rapid and efficient personal identification by combining a mobile information terminal and a server. The mobile information terminal is equipped with a camera and software that supports image analysis, and is used to acquire image data containing identification information.

[0613] The terminal's camera takes a picture of the visitor's identification and sends the image to the server. The server analyzes the image data using generative AI technology and extracts specific information. It is desirable to use the latest image recognition technology as the generative AI model.

[0614] After the specific information is extracted, the server accesses the information storage device and compares the specific information with past identification history. A database management system (DBMS) is used as the information storage device; for example, it may be built as an SQL database. Based on the comparison results, the server notifies the mobile device that the information has been verified.
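For illustration, the comparison against past identification history in an SQL database might be sketched as below, using Python's built-in `sqlite3` module with an in-memory database. The table name `verification_history`, its columns, the index, and the sample row are all assumptions; the text does not specify a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE verification_history (
        id INTEGER PRIMARY KEY,
        license_number TEXT NOT NULL,
        name TEXT NOT NULL,
        verified_at TEXT NOT NULL
    )
    """
)
# Index on the lookup key so matching does not scan the whole table.
conn.execute(
    "CREATE INDEX idx_history_license ON verification_history (license_number)"
)
conn.execute(
    "INSERT INTO verification_history (license_number, name, verified_at) "
    "VALUES (?, ?, ?)",
    ("12345678", "Ichiro Tanaka", "2024-12-13T09:00:00"),
)
conn.commit()


def find_latest_verification(connection, license_number, name):
    """Return the most recent matching history row, or None."""
    return connection.execute(
        "SELECT license_number, name, verified_at "
        "FROM verification_history "
        "WHERE license_number = ? AND name = ? "
        "ORDER BY verified_at DESC LIMIT 1",
        (license_number, name),
    ).fetchone()


print(find_latest_verification(conn, "12345678", "Ichiro Tanaka"))
print(find_latest_verification(conn, "00000000", "Nobody"))
```

Parameterized queries (the `?` placeholders) are used so that text extracted from an image is never interpolated directly into the SQL statement.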

[0615] The terminal visually notifies the user via a display device once verification is complete. This notification minimizes user waiting time and enables smooth entry management.

[0616] As a concrete example, it can be used at company reception areas and security gates. When a visitor arrives at the reception, the staff member can quickly verify their identity using a mobile device, receive an immediate "verified" message, and allow the visitor to pass through.

[0617] An example of a prompt used is, "Analyze the text information and facial photograph on the identification document and compare it with past verification history." Based on this prompt, the generative AI model performs appropriate analysis and recognition.

[0618] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0619] Step 1:

[0620] The device uses its camera to photograph the user's identification document and acquires image data. This image data becomes the input.

[0621] Step 2:

[0622] The terminal sends the acquired image data to the server. The server receives it and prepares for the next processing step. At this point, the server has the image data as input.

[0623] Step 3:

[0624] The server analyzes the received image data using a generative AI model. It applies the prompt "Analyze the text information and facial photograph from the identification document and compare it with past verification history," extracting specific information from the text and facial photograph. This analysis result becomes the intermediate output.

[0625] Step 4:

[0626] The server accesses the information storage device based on the extracted specific information and compares it with past identification history. The relevant history is retrieved from the database, and the comparison result is obtained as output.

[0627] Step 5:

[0628] The server determines the user's verification status based on the matching results and, if verified, sends that information to the terminal. The terminal receives this notification and displays "Verified" on its display device.

[0629] Step 6:

[0630] The terminal ultimately provides the user with visual notifications and offers further guidance based on the information. Specifically, the user can receive a confirmed notification and promptly enter the facility.

[0631] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the user's emotions.

[0632] In addition to streamlining license verification during police questioning, this invention integrates an emotion engine that analyzes the user's emotional state. The system uses a mobile device and considers not only license information but also the user's emotional state to realize a safer and more effective verification process.

[0633] The terminal first takes a picture of the user's driver's license and sends the image to the server. At the same time, the terminal's camera also takes a picture of the user's face, and the emotion engine analyzes their emotions. Emotional information includes states such as tension, anger, and relief. Based on this, the server uses a generative AI to extract information from the driver's license and uses the emotion engine to analyze the user's emotional state.

[0634] The server compares the extracted license information with the database, checks past verification history, and determines the need for identity verification. Simultaneously, it takes into account the results of the emotion analysis to determine the appropriate response based on the user's state. Specifically, if the user is calm, the system continues normal processing; if tension or suspicion is detected, it guides the user through additional verification methods.

[0635] For example, if a user presents their driver's license to a terminal during a checkpoint, the emotion engine captures the user's subtle emotions along with the license information, and this is analyzed on the server. As a result, if verified data exists and the user is confirmed to be in a state of confidence, "Verification Complete" is quickly displayed. On the other hand, if necessary, further dialogue-based verification may be requested.

[0636] In this way, the invention integrates generative AI and an emotion engine, providing flexibility and reliability not found in conventional license verification systems. This makes the verification process safer and more effective.

[0637] The following describes the processing flow.

[0638] Step 1:

[0639] The user uses the device's camera to take pictures of their driver's license and their face simultaneously. The device then prepares to process these image data.

[0640] Step 2:

[0641] The device sends image data of the driver's license and facial image data to the server. A secure communication protocol is used for transmission to ensure data security.

[0642] Step 3:

[0643] The server analyzes the image data received from the terminal using generative AI. The generative AI recognizes text information (name, license number, expiration date, etc.) from the driver's license and also analyzes the facial photograph to extract the necessary specific information.

[0644] Step 4:

[0645] The server activates an emotion engine based on facial image data to analyze the user's emotional state. The emotion engine detects emotions such as tension, joy, and anger from facial expressions.

[0646] Step 5:

[0647] The server accesses the database using specific information and searches for past verification history. It checks if there is a history of verification and uses that information to decide whether to simplify the verification process.

[0648] Step 6:

[0649] The server makes a final decision based on the matching results and the emotion analysis results. If the user's emotional state is unstable and no past verification has been completed, additional identity verification methods are initiated.

[0650] Step 7:

[0651] The server sends the final result to the terminal. The terminal displays the received result to the user and notifies them that the verification process is complete. This ensures that the user receives the most appropriate instructions immediately.
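The final decision in Step 6 combines two inputs: the matching result and the emotion analysis result. A minimal sketch of one possible reading is shown below. The text spells out only two cases (verified and calm, and unverified and unstable), so the handling of the mixed cases, the set of "unstable" emotions, and the names `Emotion`, `Outcome`, and `final_decision` are assumptions made for illustration.

```python
from enum import Enum


class Emotion(Enum):
    RELIEF = "relief"
    JOY = "joy"
    TENSION = "tension"
    ANGER = "anger"


class Outcome(Enum):
    VERIFICATION_COMPLETE = "Verification Complete"
    STANDARD_VERIFICATION = "Standard verification required"
    ADDITIONAL_VERIFICATION = "Additional verification required"


# Assumption: which emotions count as "unstable" is not fixed by the text.
UNSTABLE = {Emotion.TENSION, Emotion.ANGER}


def final_decision(previously_verified: bool, emotion: Emotion) -> Outcome:
    """Combine the matching result with the emotion analysis result."""
    unstable = emotion in UNSTABLE
    if previously_verified and not unstable:
        return Outcome.VERIFICATION_COMPLETE
    if not previously_verified and unstable:
        return Outcome.ADDITIONAL_VERIFICATION
    # Mixed cases are not spelled out in the text; this sketch falls back
    # to the ordinary verification procedure for them.
    return Outcome.STANDARD_VERIFICATION


print(final_decision(True, Emotion.RELIEF).value)
print(final_decision(False, Emotion.TENSION).value)
```

Keeping the rule in a single pure function makes the policy easy to review and to change if a different treatment of the mixed cases is preferred.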

[0652] (Example 2)

[0653] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0654] Conventional verification systems primarily process personal identification information based on visual data, making it difficult to provide dynamic responses that take into account the user's emotional state. This resulted in a lack of additional verification methods or appropriate responses for users with unstable identity verification, hindering improvements in security and efficiency.

[0655] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0656] In this invention, the server includes means for acquiring personal identification information and emotional state using image acquisition means, means for analyzing image data using generative AI, and means for classifying an individual's emotions based on emotional state information. This enables not only a determination of whether identity verification is needed but also flexible responses adapted to the user's emotional state.

[0657] "Image acquisition means" refers to devices or methods for acquiring image data that includes personal identification information, and specifically refers to the camera equipment of a portable information terminal.

[0658] "Generative AI" is a type of artificial intelligence technology used to analyze input image data and recognize specific information that is necessary.

[0659] "Emotional state information" refers to data that indicates an individual's emotional state and includes elements that classify emotions such as tension, anger, and relief.

[0660] A "database" is an aggregate of information that stores and manages historical personal identification information and emotional state data recorded in the past, and assists in matching that information.

[0661] "Identity verification" is the process of confirming that the information provided by an individual matches past databases and determining the reliability of that information.

[0662] The system of the present invention has a configuration that combines a portable information terminal and a server for verifying the user's license. The terminal is equipped with a high-performance imaging device and is capable of acquiring images of the user's presented license and the user's face. The acquired image data is transmitted to the server via the internet and is encrypted and protected through a secure protocol.

[0663] The server utilizes a generative AI model with high accuracy in specific information recognition to analyze the received image data. Specifically, it uses a general cloud service that provides optical character recognition (OCR) technology capable of extracting text information from image data. Furthermore, the server incorporates a dedicated algorithm for emotion classification to analyze emotional states using facial image data. This allows for real-time identification of emotions such as "tension," "anger," and "relief."
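The emotion classification described above can be pictured as choosing a label from per-emotion scores. The sketch below assumes that some upstream facial-expression model (not shown, and not specified in the text) returns a score per label; the function name `classify_emotion`, the "unknown" fallback, and the 0.5 threshold are illustrative assumptions.

```python
from typing import Dict

# Labels named in the description.
LABELS = ("tension", "anger", "relief")


def classify_emotion(scores: Dict[str, float], min_confidence: float = 0.5) -> str:
    """Pick the highest-scoring label, or "unknown" when confidence is low.

    `scores` is assumed to come from an upstream facial-expression model;
    the 0.5 threshold is an arbitrary illustrative value.
    """
    unexpected = set(scores) - set(LABELS)
    if unexpected:
        raise ValueError(f"unexpected labels: {sorted(unexpected)}")
    if not scores:
        return "unknown"
    label = max(scores, key=scores.get)
    return label if scores[label] >= min_confidence else "unknown"


print(classify_emotion({"tension": 0.1, "anger": 0.05, "relief": 0.85}))
print(classify_emotion({"tension": 0.4, "anger": 0.3, "relief": 0.3}))
```

Returning "unknown" for low-confidence results lets later steps treat an unclear reading differently from a clear one.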

[0664] The acquired license information and emotional state information are compared against past verification history stored in the server's database. Based on the comparison results and the user's emotional state, the server determines the next step in the verification process. If the user is in a relaxed state, the normal verification procedure proceeds; if the user is in a tense state, additional verification measures are offered to the user.

[0665] For example, consider a scenario where a user presents their driver's license to a mobile device at a checkpoint. The device automatically takes photos of the license and the user's face and immediately sends them to the server. The server uses AI to extract information such as "Ichiro Tanaka" and "License Number 12345678," and an emotion engine confirms that the user is at ease. If the information has already been verified in the database, the device displays "Verification Complete." An example of a prompt message would be, "Analyze the user's emotions and perform a comparison with past data along with the license information."

[0666] This system can enhance safety and efficiency by providing a new verification process that combines generative AI models with emotion analysis.

[0667] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0668] Step 1:

[0669] The device uses its camera to photograph the user's driver's license and the user's face. As input, the device acquires image data of the driver's license and the user's face. Specifically, the device captures high-resolution images while adjusting autofocus and exposure. As output, image data is generated.

[0670] Step 2:

[0671] The device sends the captured image data to the server. The input is the image data acquired from the device. Specifically, the device encrypts the data and uploads it to the server using a secure protocol (e.g., HTTPS). The output is the server receiving the image data.

[0672] Step 3:

[0673] The server analyzes the received image data using a generative AI model to extract license information. The input is the image data sent from the terminal. Specifically, optical character recognition (OCR) technology is used to extract text information such as name, date of birth, and license number. The output is the specific information.

[0674] Step 4:

[0675] The server analyzes emotional states based on image data. The input is an image of the user's face. Specifically, it uses an emotion analysis algorithm to classify emotions such as tension, anger, and relief. The output is information about the user's emotional state.

[0676] Step 5:

[0677] The server compares the extracted license information with past verification history in the database. The inputs are license information and database history information. Specifically, it performs a database search to check for matching information. The output is the matching result.

[0678] Step 6:

[0679] The server determines the necessity of identity verification and the appropriate response based on the user's situation, using the matching results and emotional state information. Inputs include the matching results and emotional state information. Specifically, if the user is in a state of confidence, the server determines "verification complete"; if the user is in a state of tension, for example, it considers additional verification measures. Outputs include the verification result and proposed additional actions.

[0680] Step 7:

[0681] The terminal displays the final confirmation result to the user based on instructions from the server. The input is the decision result from the server. Specifically, the terminal displays messages such as "Confirmation complete" or "Further confirmation required" on the screen. The output is information displayed to the user.

[0682] (Application Example 2)

[0683] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0684] Traditional identity verification systems often involve manual verification processes for documents such as driver's licenses, which is time-consuming and makes it difficult to provide appropriate responses based on the emotional state of the person being verified. Therefore, there is a need to streamline the verification process and enhance security accordingly. Furthermore, it is necessary to enable flexible responses tailored to individual circumstances by considering the emotional state of the user.

[0685] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0686] In this invention, the server includes a device that inputs image data containing identity verification information using an image acquisition device, a device that analyzes the image data using generative artificial intelligence and recognizes specific information, and a device that analyzes emotional states from facial image data using an emotion analysis engine. This enables more efficient identity verification processes and flexible responses based on emotion analysis.

[0687] An "image acquisition device" is a device used to collect image data, including identity verification information, and specifically refers to the camera on a portable information terminal.

[0688] "Generative artificial intelligence" refers to an algorithm used to analyze data and recognize specific information, and its role is to analyze the input information.

[0689] "Specific information" refers to information extracted from image data that is particularly necessary for identification and is an important element in identity verification.

[0690] "Information recording medium" refers to databases and storage devices that store past verification history and data related to personal identification.

[0691] The "emotion analysis engine" is a system that analyzes an individual's emotional state based on facial image data and has the function of automatically identifying the user's emotions.

[0692] The "judgment result" is a conclusion derived based on the acquired information and analysis results, indicating whether or not identity verification is possible.

[0693] The system for realizing this invention consists mainly of a portable information terminal and a server. The terminal uses an image acquisition device (for example, a smartphone camera) to input image data including identity verification information. The user presents their driver's license to the terminal, and the camera simultaneously takes a picture of the license and their face. This allows for the efficient collection of the user's identity information.

[0694] The server uses generative artificial intelligence software (e.g., Google Cloud Vision AI) to analyze the input image data and recognize specific information. Specifically, it extracts important information such as name and date of birth from the driver's license. Then, it compares this information with past verification history stored in a database system (information recording medium) to determine whether verification is necessary.

[0695] Furthermore, by using an emotion analysis engine (for example, Microsoft Face API), the user's emotional state is analyzed from facial image data. This analysis obtains information such as whether the user is relaxed or feeling tense or anxious. This emotional information is then used as data to enable flexible responses tailored to the user's situation.

[0696] For example, at the entrance of a large building, when a user presents their driver's license on a terminal, facial recognition and emotion analysis are performed smoothly. If the information matches the pre-registered data, entry is granted immediately. If the user feels uneasy, they may be guided through additional verification procedures.

[0697] In this way, efficient identity verification and robust security management are achieved simultaneously. In this invention, an example of a prompt message for effectively utilizing the generative AI model is: "Extract the owner's name and date of birth from the driver's license image and perform emotion analysis. If it matches the list data, generate an entry permission message."

[0698] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0699] Step 1:

[0700] The user uses the camera on their portable information terminal to take pictures of their driver's license and their face. The terminal then obtains image data of the driver's license and their face. The input at this stage is raw image data captured by the camera, and this data is processed into a user-friendly format before being passed on to the next process.

[0701] Step 2:

[0702] The terminal sends the acquired image data to the server. The server analyzes the image data using a generative AI model and extracts specific information from the driver's license, such as the owner's name and date of birth. This process outputs the specific information in text format. The input is the acquired image data, and the output is the specific information in text format.

[0703] Step 3:

[0704] The server compares the extracted specific information with past verification history stored on the information recording medium. The server uses database queries to confirm the match and determines whether identity verification is necessary based on the input information. The result of this step is output as a determination of whether verification is possible or not.

[0705] Step 4:

[0706] The server uses an emotion analysis engine to analyze the user's emotional state from facial image data. Facial image data is used as input, and the result of the emotion analysis is output as the user's emotional state, such as relaxed, tense, or anxious.

[0707] Step 5:

[0708] The server determines appropriate feedback for the user based on the results of matching the specific information and of the emotion analysis. Based on the input data, which includes the specific information and the emotional state, the server determines whether to grant entry or require additional verification, and outputs a message accordingly.

[0709] Step 6:

[0710] The user receives feedback from the server on their device and follows the displayed instructions. Based on the instructions, they choose to proceed with the entry process or undergo additional verification. The input in this step is feedback from the server and is output as the user's action.
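The entry decision in Steps 3 to 5 depends on whether the extracted name and date of birth match registered list data. The sketch below shows one way such a match might be made tolerant of formatting differences (full-width characters, case, extra spaces) that can arise in extracted text. The helper names, the "relaxed" condition, the message strings, and the sample record are assumptions for illustration.

```python
import unicodedata
from typing import Iterable, Tuple


def normalize(text: str) -> str:
    """Fold width/compatibility forms and case, and collapse whitespace."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return " ".join(folded.split())


def is_registered(
    name: str,
    date_of_birth: str,
    registered: Iterable[Tuple[str, str]],
) -> bool:
    """Return True when (name, date of birth) appears in the registered list."""
    key = (normalize(name), normalize(date_of_birth))
    return any((normalize(n), normalize(d)) == key for n, d in registered)


def entry_message(name, date_of_birth, emotion, registered) -> str:
    """Grant entry only for a registered, relaxed user (assumed rule)."""
    if is_registered(name, date_of_birth, registered) and emotion == "relaxed":
        return "Entry permitted"
    return "Additional verification required"


# Sample list data (made-up values).
REGISTERED = [("Ichiro Tanaka", "1990-01-01")]
print(entry_message("ichiro  tanaka", "1990-01-01", "relaxed", REGISTERED))
print(entry_message("Ichiro Tanaka", "1990-01-01", "tense", REGISTERED))
```

Normalizing both sides before comparison avoids rejecting a registered user only because the extracted text used a different character width or spacing.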

[0711] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input in response to the result of the specific processing. The control unit 46A transmits the audio data indicating the user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0712] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0713] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[0714] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping. Specifically, this mapping is an emotion map (see Figure 9). Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[0715] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions are located. Further out of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[0716] These emotions are distributed at the 3 o'clock position on the emotion map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[0717] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the closer an emotion is to the outside of the emotion map 400, the more visible it becomes (that is, the more it is expressed in actions).
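
The layout of the emotion map 400 described above can be illustrated with a small sketch. It assumes, purely for illustration, that a position on the map is given as Cartesian coordinates with the center of the concentric circles at the origin; the names `MapPosition` and `describe`, the `outer_radius` parameter, the "both" and "mixed" labels on the axes, and the linear "visibility" score are hypothetical and do not appear in the source.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MapPosition:
    """A point on the emotion map; the center of the circles is (0, 0)."""
    x: float  # negative = left half, positive = right half
    y: float  # positive = upper side, negative = lower side

    @property
    def radius(self) -> float:
        """Distance from the center of the concentric circles."""
        return math.hypot(self.x, self.y)


def describe(pos: MapPosition, outer_radius: float = 1.0) -> dict:
    """Summarize where a position falls on the map.

    origin:     left half = reaction in the brain, right half = situational
                judgment, vertical axis = both.
    valence:    upper side = pleasure, lower side = displeasure; the
                horizontal axis is labeled "mixed" (an assumption).
    visibility: 0.0 at the center (inner thoughts) up to 1.0 at the outer
                edge (expressed in actions); the linear scale is an assumption.
    """
    if pos.x < 0:
        origin = "reaction"
    elif pos.x > 0:
        origin = "situation"
    else:
        origin = "both"

    if pos.y > 0:
        valence = "pleasure"
    elif pos.y < 0:
        valence = "displeasure"
    else:
        valence = "mixed"

    visibility = min(pos.radius / outer_radius, 1.0)
    return {"origin": origin, "valence": valence, "visibility": visibility}


if __name__ == "__main__":
    # The 3 o'clock position: right half, on the horizontal axis.
    print(describe(MapPosition(1.0, 0.0)))
```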

[0718] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "response," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.

[0719] The emotion map defines two emotions that promote learning. One is a negative emotion located around the middle of "repentance" and "reflection" on the situation side. In other words, it arises when the robot experiences negative feelings such as "I never want to feel this way again" or "I don't want to be scolded again." The other is a positive emotion located around "desire" on the reaction side. In other words, it arises when the robot has positive feelings such as "I want more" or "I want to know more."

[0720] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
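
How the user's emotion might be determined from the emotion values can be sketched as follows. The source does not state the selection rule, so this sketch assumes that the emotion with the highest value is chosen and that emotions whose values fall within a tolerance of it are reported as similar; the function name `determine_emotion`, the `tolerance` parameter, and the sample values are hypothetical, and the neural network itself is represented only by its output.

```python
from __future__ import annotations


def determine_emotion(
    emotion_values: dict[str, float], tolerance: float = 0.1
) -> tuple[str, list[str]]:
    """Pick the highest-valued emotion and list the emotions close to it.

    emotion_values maps each emotion name on the emotion map to the value
    output by the (not shown) pre-trained neural network.
    """
    if not emotion_values:
        raise ValueError("emotion_values must not be empty")

    # Assumption: the user's emotion is the one with the largest value.
    top = max(emotion_values, key=emotion_values.get)
    top_value = emotion_values[top]

    # Emotions mapped close together are trained to have similar values,
    # so several of them may lie within the tolerance of the top value.
    similar = sorted(
        name
        for name, value in emotion_values.items()
        if name != top and top_value - value <= tolerance
    )
    return top, similar


if __name__ == "__main__":
    # Made-up values in the spirit of the Figure 10 example.
    values = {"reassured": 0.82, "calm": 0.80, "confident": 0.78, "anxious": 0.10}
    print(determine_emotion(values))
```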

[0721] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[0722] In the above embodiment, an example was given in which the specific processing is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and the specific processing may be distributed across multiple computers, including the computer 22. For example, the data generation model 58 may be provided in a device external to the data processing device 12, and the external device may generate data according to the input data.

[0723] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes the specific processing according to the specific processing program 56.

[0724] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[0725] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[0726] The following types of processors can be used as hardware resources to perform specific processing. Examples of processors include a CPU, a general-purpose processor that functions as a hardware resource to perform specific processing by executing software, i.e., a program. Other examples of processors include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which have circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using memory.

[0727] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[0728] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[0729] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[0730] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that you may delete unnecessary parts, add new elements, or replace elements in the descriptions and illustrations presented above, as long as you do not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[0731] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[0732] The following is further disclosed regarding the embodiments described above.

[0733] (Claim 1)

[0734] A means for inputting image data containing personal identification information using an image acquisition means,

[0735] A means for analyzing the input image data using generative AI and recognizing specific information,

[0736] A means of comparing past verification history recorded in the database with specific information,

[0737] A means for determining the necessity of identity verification based on the aforementioned specific information and the matching results,

[0738] A means for outputting the judgment result,

[0739] A system that includes this.

[0740] (Claim 2)

[0741] The system according to claim 1, wherein the image acquisition means is a camera of a mobile terminal.

[0742] (Claim 3)

[0743] The system according to claim 1, wherein the database includes historical information on personal verification, and it is possible to determine whether re-verification is unnecessary.

[0744] "Example 1"

[0745] (Claim 1)

[0746] A means for inputting image data containing personal identification information using an image acquisition means,

[0747] A means for analyzing the input image data using generative AI and recognizing specific information,

[0748] A means for executing a face recognition algorithm using the analyzed specific information,

[0749] A means of comparing past verification history recorded in the database with specific information,

[0750] A means for determining the necessity of identity verification based on the aforementioned specific information and the matching results,

[0751] A means for outputting the judgment result,

[0752] A system that includes this.

[0753] (Claim 2)

[0754] The system according to claim 1, wherein the image acquisition means is a camera of a portable information processing device.

[0755] (Claim 3)

[0756] The system according to claim 1, wherein the database includes historical information on personal identification, and it is possible to make a decision to omit the re-verification procedure after matching.

[0757] "Application Example 1"

[0758] (Claim 1)

[0759] A means for inputting image data containing identification information using an image acquisition device,

[0760] A means of analyzing input image data using generative AI technology and recognizing specific information,

[0761] A means for comparing past identification history and specific information recorded in an information storage device,

[0762] A means for determining the necessity of person verification based on the aforementioned specific information and matching results,

[0763] A means for outputting the judgment result,

[0764] A means for providing immediate notification upon confirmation and a display device to minimize waiting time,

[0765] A system that includes this.

[0766] (Claim 2)

[0767] The system according to claim 1, wherein the image acquisition device is a camera for a portable information terminal.

[0768] (Claim 3)

[0769] The system according to claim 1, wherein the information storage device includes a history of identification and verification, can determine whether re-verification is unnecessary, and further has a function for managing simple entry permits.

[0770] "Example 2 of combining an emotion engine"

[0771] (Claim 1)

[0772] A means for acquiring image data containing personal identification information and an individual's emotional state using an image acquisition means,

[0773] A means for analyzing the input image data using generative AI and recognizing specific information,

[0774] A means for classifying an individual's emotions based on analyzed emotional state information,

[0775] A means of comparing past verification history recorded in the database with specific information,

[0776] A means for determining the necessity of identity verification based on the aforementioned specific information, the matching results, and emotional state information,

[0777] A means for outputting the judgment result,

[0778] A system that includes this.

[0779] (Claim 2)

[0780] The system according to claim 1, wherein the image acquisition means is a camera device of a portable information terminal.

[0781] (Claim 3)

[0782] The system according to claim 1, wherein the database includes historical information on personal identification and emotional state data, and it is possible to determine whether re-confirmation is unnecessary and to take countermeasures based on the individual's state.

[0783] "Application example 2 when combining with an emotional engine"

[0784] (Claim 1)

[0785] A device that uses an image acquisition device to input image data including identity verification information,

[0786] A device that analyzes input image data using generative artificial intelligence and recognizes specific information,

[0787] A device that compares past verification history recorded in an information recording medium with specific information,

[0788] A device that determines the necessity of personal identification based on the aforementioned specific information and the matching results,

[0789] A device that outputs the judgment result,

[0790] A device that analyzes emotional states from input facial image data using an emotion analysis engine,

[0791] A device that takes into account the results of emotion analysis and performs appropriate processing according to the individual's situation,

[0792] A system that includes this.

[0793] (Claim 2)

[0794] The system according to claim 1, wherein the image acquisition device is a camera for a portable information terminal.

[0795] (Claim 3)

[0796] The system according to claim 1, wherein the information recording medium includes historical information for identity verification, and it is possible to determine whether re-verification is unnecessary.

[Explanation of Symbols]

[0797]
10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A means for inputting image data containing identification information using an image acquisition device,
A means of analyzing input image data using generative AI technology and recognizing specific information,
A means for comparing past identification history and specific information recorded in an information storage device,
A means for determining the necessity of person verification based on the aforementioned specific information and matching results,
A means for outputting the judgment result,
A means for providing immediate notification upon confirmation and a display device to minimize waiting time,
A system that includes this.

2. The system according to claim 1, wherein the image acquisition device is a camera for a portable information terminal.

3. The system according to claim 1, wherein the information storage device includes a history of identification and verification, can determine whether re-verification is unnecessary, and further has a function for managing simple entry permits.