system

The system addresses labor and security issues in television shopping by using voice recognition and fraud detection to streamline order processing, ensuring accurate identity verification and secure payment, thus improving user satisfaction and security.

JP2026100573A · Pending · Publication Date: 2026-06-19 · SOFTBANK GROUP CORP

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Application
Current Assignee / Owner: SOFTBANK GROUP CORP
Filing Date: 2024-12-09
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Conventional television shopping systems face issues such as high labor costs, opportunity losses during peak hours, complexity for elderly and technologically unfamiliar users, misunderstandings in order content, input mistakes, and insufficient identity verification and secure payment methods, leading to reduced customer satisfaction and security risks.

Method used

A system that utilizes voice recognition technology to accurately extract order information, confirms orders through voice output, verifies user identity via voice pattern matching, supports multiple payment methods, suggests products based on purchase history, and detects fraudulent orders using emotion analysis and fraud detection mechanisms.

Benefits of technology

Provides a smooth, secure, and inclusive purchasing experience by minimizing errors, ensuring accurate identity verification, and preventing fraudulent transactions, thereby enhancing user satisfaction and system reliability.

✦ Generated by Eureka AI based on patent content.


Abstract

【Problem】 To provide a system. 【Solution means】 A system including: program means for accurately recognizing voice input; program means for extracting order information from the recognized voice data; display means for repeating the extracted order information back to the customer; authentication means for performing identity verification by voice pattern matching; payment processing means that enables selection from multiple payment methods; recommendation means for recommending products based on purchase history; sentiment analysis means for analyzing the user's sentiment; and fraud detection means for detecting fraudulent orders.

Description

Technical Field

[0001] The technology of the present disclosure relates to a system.

Background Art

[0002] Patent Document 1 discloses a method for controlling a persona chatbot, which is performed by at least one processor, and includes steps of receiving a user utterance, adding the user utterance to a prompt including an instruction sentence related to an explanation of a character of the chatbot, encoding the prompt, and inputting the encoded prompt into a language model to generate a chatbot utterance in response to the user utterance.

Prior Art Documents

Patent Documents

[0003]

Patent Document 1

Summary of the Invention

Problems to be Solved by the Invention

[0004] Conventional television shopping relied on telephone operators to process orders, which caused problems such as increased labor costs and lost sales opportunities due to operator shortages during peak hours. In addition, for elderly people and users unfamiliar with technology, the voice ordering process is often complex, and misunderstandings of order content and input mistakes reduce customer satisfaction. Furthermore, insufficient identity verification and the difficulty of ensuring a secure payment method were also issues.

Means for Solving the Problems

[0005] This invention includes a program means that accurately recognizes voice input and a program means that precisely extracts order information from the recognized voice data. It also includes a display means that confirms the content by repeating the extracted order information back to the customer, and an authentication means that verifies the user's identity by voice pattern matching. Furthermore, it provides a payment processing means that allows the user to select from multiple payment methods and a recommendation means that recommends products based on the user's purchase history. It also includes an emotion analysis means that analyzes the user's emotions from the voice data and generates an appropriate response, and a fraud detection means that detects fraudulent orders, thereby providing a more inclusive and secure purchasing experience.

[0006] "Voice input" is a means of operation in which a user uses their voice to convey information or instructions to a system.

[0007] "Recognition" is the process of analyzing sounds and data to understand their meaning and derive appropriate actions.

[0008] "Program means" refers to computer software designed to achieve a specific function.

[0009] "Order information" refers to specific data related to the purchase of goods, including product numbers and quantities.

[0010] "Display means" refers to devices or equipment used to visually present information to users.

[0011] "Identity verification" is an authentication process that verifies the user's identity and prevents impersonation.

[0012] "Voice pattern matching" is a technology that analyzes the characteristics of a voice to identify a specific person.

[0013] "Authentication means" refers to hardware or software used to verify a user's identity and authority.

[0014] "Payment processing means" refers to a function for completing payment of the price in transactions of goods and services.

[0015] "Purchase history" refers to the records of goods and services purchased by the user in the past.

[0016] "Product recommendation" refers to the act of recommending additional products based on the interests and relevance of the user.

[0017] "Emotion analysis means" refers to a technology that reads emotions from the user's voice and text and adjusts responses.

[0018] "Fraud detection means" refers to a process of detecting abnormal patterns and behaviors and preventing fraudulent acts.

Brief Description of Drawings

[0019]
[Figure 1] A conceptual diagram showing an example of the configuration of a data processing system according to the first embodiment.
[Figure 2] A conceptual diagram showing an example of the main functions of a data processing device and a smart device according to the first embodiment.
[Figure 3] A conceptual diagram showing an example of the configuration of a data processing system according to the second embodiment.
[Figure 4] A conceptual diagram showing an example of the main functions of a data processing device and smart glasses according to the second embodiment.
[Figure 5] A conceptual diagram showing an example of the configuration of a data processing system according to the third embodiment.
[Figure 6] A conceptual diagram showing an example of the main functions of a data processing device and a headset-type terminal according to the third embodiment.
[Figure 7] A conceptual diagram showing an example of the configuration of a data processing system according to the fourth embodiment.
[Figure 8] A conceptual diagram showing an example of the main functions of a data processing device and a robot according to the fourth embodiment.
[Figure 9] An emotion map to which a plurality of emotions are mapped.
[Figure 10] An emotion map to which a plurality of emotions are mapped.
[Figure 11] A sequence diagram showing the processing flow of the data processing system in Example 1.
[Figure 12] A sequence diagram showing the processing flow of the data processing system in Application Example 1.
[Figure 13] A sequence diagram showing the processing flow of the data processing system in Example 2 when an emotion engine is combined.
[Figure 14] A sequence diagram showing the processing flow of the data processing system in Application Example 2 when an emotion engine is combined.

Mode for Carrying Out the Invention

[0020] Hereinafter, an example of an embodiment of a system according to the technology of the present disclosure will be described with reference to the accompanying drawings.

[0021] First, the terms used in the following description will be described.

[0022] In the following embodiments, a processor with a reference numeral (hereinafter simply referred to as "processor") may be a single arithmetic unit or a combination of a plurality of arithmetic units. Also, the processor may be a single type of arithmetic unit or a combination of a plurality of types of arithmetic units. Examples of arithmetic units include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a GPGPU (General-Purpose computing on Graphics Processing Units), an APU (Accelerated Processing Unit), and the like.

[0023] In the following embodiments, a RAM (Random Access Memory) with a reference numeral is a memory in which information is temporarily stored and is used as a work memory by the processor.

[0024] In the following embodiments, a storage with a reference numeral is one or more non-volatile storage devices that store various programs and various parameters. Examples of non-volatile storage devices include flash memory (e.g., an SSD (Solid State Drive)), magnetic disks (e.g., hard disks), and magnetic tapes.

[0025] In the following embodiments, a communication interface (I/F) with a reference numeral is an interface that includes a communication processor, an antenna, and the like. The communication interface manages communication between multiple computers. Examples of communication standards applicable to the communication interface include wireless communication standards such as 5G (5th Generation Mobile Communication System), Wi-Fi (registered trademark), and Bluetooth (registered trademark).

[0026] In the following embodiments, "A and/or B" is synonymous with "at least one of A and B." That is, "A and/or B" means that it may be A alone, B alone, or a combination of A and B. Furthermore, in this specification, the same concept as "A and/or B" applies when three or more items are linked by "and/or."

[0027] [First Embodiment]

[0028] Figure 1 shows an example of the configuration of the data processing system 10 according to the first embodiment.

[0029] As shown in Figure 1, the data processing system 10 includes a data processing device 12 and a smart device 14. An example of the data processing device 12 is a server.

[0030] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0031] The smart device 14 comprises a computer 36, a reception device 38, an output device 40, a camera 42, and a communication interface 44. The computer 36 comprises a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The reception device 38, output device 40, and camera 42 are also connected to the bus 52.

[0032] The reception device 38 is equipped with a touch panel 38A and a microphone 38B, etc., and receives user input. The touch panel 38A receives user input by detecting contact with an object (e.g., a pen or finger). The microphone 38B receives user input by detecting the user's voice. The control unit 46A transmits data indicating the user input received by the touch panel 38A and microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the data indicating the user input.

[0033] The output device 40 includes a display 40A and a speaker 40B, and presents data to the user 20 by outputting the data in a form perceptible to the user 20 (e.g., audio and/or text). The display 40A displays visible information such as text and images according to instructions from the processor 46. The speaker 40B outputs audio according to instructions from the processor 46. The camera 42 is a small digital camera equipped with an optical system such as a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor.

[0034] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various types of information between processor 46 and processor 28 via network 54.

[0035] Figure 2 shows an example of the main functions of the data processing device 12 and the smart device 14.

[0036] As shown in Figure 2, in the data processing device 12, a specific processing is performed by the processor 28. A specific processing program 56 is stored in the storage 32. The specific processing program 56 is an example of a "program" related to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 according to the specific processing program 56 executed on the RAM 30.

[0037] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0038] In the smart device 14, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The reception output program 60 is used in conjunction with a specific processing program 56 by the data processing system 10. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0039] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart device 14 as the "terminal".

[0040] This invention is a system that utilizes voice recognition technology to streamline television shopping orders and provide users with a smooth and secure purchasing experience. The operation of the entire system is described below.

[0041] The server first captures the user's voice input with high accuracy and converts the voice into text data using a speech recognition engine. At this stage, information such as product numbers and quantities related to the order is extracted. The terminal repeats the extracted order information to the user for confirmation. This prevents input errors.

[0042] Subsequently, the server performs voice pattern matching and verifies the user's identity by comparing it with their registration information. Once verified, the user selects their preferred payment method through their terminal. This system supports multiple payment methods, allowing for seamless transaction completion.

[0043] Furthermore, the server analyzes the user's past purchase history and suggests related products, giving the user new purchasing opportunities. It also analyzes the user's emotions from their speech and, when it detects a positive reaction, encourages the purchase of the suggested products.

[0044] Furthermore, the server employs a fraud detection algorithm to monitor unusual order patterns and immediately issues an alert if there are any signs of fraud. This further enhances security.

[0045] As a concrete example, when a user says, "This product looks interesting," the server captures the statement using speech recognition and identifies product number 789 as order data. The terminal then confirms, "Is one unit of product number 789 correct?" and the order is confirmed when the user responds, "Yes." After a verification process, the user selects credit card payment and the transaction is completed. Furthermore, if the user has a history of purchasing similar products in the past, the server suggests related products by saying, "How about this product as well?", naturally continuing the conversation. In this way, the present invention allows users to enjoy a safe and comfortable shopping experience.

[0046] The following describes the processing flow.

[0047] Step 1:

[0048] The user begins talking about the product they want to buy. Specifically, they state their order by voice, such as "I want this product."

[0049] Step 2:

[0050] The terminal captures the user's speech as audio data and sends that data to the server.

[0051] Step 3:

[0052] The server passes the received audio data to a speech recognition engine, which converts it into text. This extracts the product numbers and quantities needed for the order.

[0053] Step 4:

[0054] The terminal repeats the extracted order information to the user, confirming, for example, "Is it correct that the item number is 456 and the quantity is 2?"

[0055] Step 5:

[0056] The user responds with "yes" or "no" to the repeated order details.

[0057] Step 6:

[0058] The server performs voice pattern matching and verifies the user's identity by comparing it with user information.

[0059] Step 7:

[0060] Once the user's identity has been verified, the terminal presents the user with multiple payment methods and prompts them to choose. It displays options such as "Please choose from credit card, bank transfer, or deferred payment."

[0061] Step 8:

[0062] The user selects their preferred payment method, and the terminal sends this selection information to the server.

[0063] Step 9:

[0064] The server analyzes the user's past purchase history and suggests additional related products, displaying a message such as "We also recommend this product."

[0065] Step 10:

[0066] The server performs sentiment analysis, reading the user's emotions from their utterances and adjusting the response accordingly. For example, if the user is happy, it might generate a response such as "I'm glad you're satisfied."

[0067] Step 11:

[0068] The server monitors all order data and checks for any fraudulent order patterns. If an anomaly is detected, an alert is immediately issued and the responsible person is notified.
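
As a non-limiting illustration, Steps 7 and 8 above (presenting payment methods and receiving the user's choice) could be sketched as follows. The keyword table, the function names, and the rule of asking again when the utterance is ambiguous are assumptions made for this sketch and are not specified in the disclosure.

```python
from typing import Optional

# Hypothetical keyword table for the three options named in Step 7.
PAYMENT_KEYWORDS = {
    "credit_card": ("credit card", "card"),
    "bank_transfer": ("bank transfer", "transfer"),
    "deferred_payment": ("deferred payment", "deferred", "pay later"),
}


def payment_prompt() -> str:
    """Message the terminal presents to the user (wording taken from Step 7)."""
    return "Please choose from credit card, bank transfer, or deferred payment."


def select_payment_method(utterance: str) -> Optional[str]:
    """Map a recognized utterance to exactly one payment method.

    Returns None when no method or more than one method is mentioned,
    in which case the terminal would repeat the prompt.
    """
    text = utterance.lower()
    matches = [
        method
        for method, keywords in PAYMENT_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]
    return matches[0] if len(matches) == 1 else None
```

Returning None instead of guessing keeps the terminal in the selection step until the choice is unambiguous, which is in line with the stated aim of preventing input errors.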

[0069] (Example 1)

[0070] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0071] Conventional voice-recognition ordering systems are prone to erroneous orders because of low voice-input accuracy, and their payment and identity verification procedures are complex. Furthermore, suggestions based on purchase history are not sufficiently effective, so the user's purchasing experience does not improve. In addition, the accuracy of fraudulent order detection is insufficient, posing a security risk.

[0072] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0073] In this invention, the server includes processing means for acquiring voice input with high accuracy, programming means for converting the acquired voice data into text data, and information processing means for extracting order information from the converted voice data. This enables the acquisition of voice input with high accuracy and the generation of accurate order information.

[0074] "A processing method for acquiring audio input with high accuracy" refers to a function that captures audio in an optimal state and removes noise and external interference to maintain sound quality.

[0075] "A program that converts acquired audio data into text data" refers to a program that converts human speech into digital text information and makes it into a format that can be processed by a computer.

[0076] "Information processing means for extracting order information from converted audio data" refers to a function that analyzes the text information converted from audio and identifies order-related information such as product number and quantity from it.

[0077] A "customer confirmation method via voice output" is a means of prompting a user to confirm their order by playing back the details of their order in audio.

[0078] "An identification method that uses voice data to verify identity" refers to a function that analyzes the characteristics of the voice and compares it with existing registration information to verify the user's identity.

[0079] "A payment selection method that allows users to choose from multiple different payment methods" refers to a function that allows users to select from various payment methods when making a transaction.

[0080] A "product suggestion method that proposes related products based on past purchase history" is a function that suggests new related products to a user based on their purchase history.

[0081] "An emotion analysis tool for analyzing user emotions" is a function that determines the user's emotions from their voice or speech and adjusts its response accordingly.

[0082] An "anomaly detection mechanism that detects abnormal orders and issues warnings" is a function that detects suspicious activity that deviates from normal order patterns and issues warnings as necessary.

[0083] This invention is a system that utilizes speech recognition technology to enable smooth transactions using the user's voice. The server, terminal, and user all work together, progressing through the following steps.

[0084] The server uses high-performance microphones and voice input devices to acquire voice input with high accuracy. After acquiring the voice data, a speech recognition engine (e.g., Google® Cloud Speech-to-Text) is used to convert the voice into text data. The generative AI model used in this conversion enables highly accurate and natural speech recognition. In this process, it is important to minimize ambient noise by utilizing noise cancellation technology and acoustic models.

[0085] From the converted text data, the server uses regular expressions and natural language processing techniques to identify order information such as product numbers and quantities. The server then has the terminal confirm the extracted information with the user by voice. This confirmation uses speech synthesis technology and a natural conversational style, such as, "Is it correct to order one item of product number 789?"
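
As a non-limiting illustration, the regular-expression extraction and the confirmation message described above could be sketched as follows. The utterance patterns, the `Order` type, and the default quantity of one are assumptions made for this sketch; the disclosure does not specify the expressions actually used.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns: "product number 789" and "2 units" / "two units".
PRODUCT_RE = re.compile(r"(?:product|item)\s+number\s+(\d+)", re.IGNORECASE)
QUANTITY_RE = re.compile(
    r"\b(\d+|one|two|three|four|five)\s+(?:units?|pieces?)\b", re.IGNORECASE
)
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


@dataclass
class Order:
    product_number: str
    quantity: int


def extract_order(text: str) -> Optional[Order]:
    """Extract the product number and quantity from recognized text."""
    product = PRODUCT_RE.search(text)
    if product is None:
        return None  # nothing that looks like an order
    quantity = 1  # assumed default when no quantity is spoken
    match = QUANTITY_RE.search(text)
    if match:
        token = match.group(1).lower()
        quantity = int(token) if token.isdigit() else NUMBER_WORDS[token]
    return Order(product.group(1), quantity)


def confirmation_prompt(order: Order) -> str:
    """Build the sentence that is read back to the user."""
    unit = "unit" if order.quantity == 1 else "units"
    return (
        f"Is it correct to order {order.quantity} {unit} "
        f"of product number {order.product_number}?"
    )
```

The resulting string would be passed to whatever speech synthesis component the terminal uses.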

[0086] In the identity verification step, the server analyzes the voice pattern and uses identification technology to compare it with registered information. This allows for highly accurate verification of the user's identity. After authentication is complete, the user can choose from multiple payment methods through the terminal, and interfaces for credit cards and electronic money are provided.
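
As a non-limiting illustration, the comparison of a voice pattern with registered information could be sketched as a cosine-similarity check between two feature vectors. This sketch assumes the voice has already been converted into a fixed-length numeric vector by some upstream component; the threshold of 0.8 and the function names are hypothetical and are not taken from the disclosure.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no voice information
    return dot / (norm_a * norm_b)


def verify_speaker(
    sample: Sequence[float],
    enrolled: Sequence[float],
    threshold: float = 0.8,  # hypothetical acceptance threshold
) -> bool:
    """Accept the speaker when the sample is close enough to the enrolled profile."""
    return cosine_similarity(sample, enrolled) >= threshold
```

In practice the threshold would be tuned to balance false acceptances against false rejections.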

[0087] Furthermore, the server uses machine learning algorithms to analyze past purchase history and suggest related products. For example, if a user asks, "What products do you recommend?", it can suggest new products that take into account the trends of products purchased in the past.
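
As a non-limiting illustration, suggesting related products from purchase history could be sketched with simple co-purchase counting. This is a deliberately simple stand-in for the machine learning algorithm mentioned above, which the disclosure does not specify; the data layout and function names are assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List


def build_co_purchase(histories: Iterable[Iterable[str]]) -> Dict[str, Counter]:
    """Count how often each pair of products appears in the same history."""
    co: Dict[str, Counter] = {}
    for basket in histories:
        for a, b in combinations(sorted(set(basket)), 2):
            co.setdefault(a, Counter())[b] += 1
            co.setdefault(b, Counter())[a] += 1
    return co


def recommend(user_history: Iterable[str], co: Dict[str, Counter], top_n: int = 3) -> List[str]:
    """Rank products the user has not bought by co-purchase count."""
    owned = set(user_history)
    scores: Counter = Counter()
    for item in owned:
        for other, count in co.get(item, Counter()).items():
            if other not in owned:
                scores[other] += count
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]
```

For example, if other customers who bought a sofa often also bought a cushion, a user whose history contains only the sofa would be offered the cushion first.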

[0088] Finally, the server uses fraud detection algorithms to monitor abnormal order patterns in real time and issue alerts to administrators as needed. This significantly improves the overall security of the system.
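
As a non-limiting illustration, monitoring for abnormal order patterns could be sketched as a z-score test that compares a new order quantity with the user's past quantities. The choice of a z-score test, the threshold of 3.0, and the minimum history length are assumptions made for this sketch; the disclosure does not specify the fraud detection algorithm.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def is_anomalous(
    quantity: int,
    past_quantities: Sequence[int],
    z_threshold: float = 3.0,  # hypothetical cut-off
    min_history: int = 5,      # hypothetical minimum sample size
) -> bool:
    """Flag a quantity that deviates strongly from the user's usual orders."""
    if len(past_quantities) < min_history:
        return False  # too little history to judge
    mu = mean(past_quantities)
    sigma = pstdev(past_quantities)
    if sigma == 0:
        return quantity != mu  # every past order was identical
    return abs(quantity - mu) / sigma > z_threshold


def fraud_alert(user_id: str, quantity: int, past_quantities: Sequence[int]) -> Optional[str]:
    """Return an alert message for the administrator, or None if the order looks normal."""
    if is_anomalous(quantity, past_quantities):
        return f"ALERT: unusual order quantity {quantity} for user {user_id}"
    return None
```

A production system would likely combine several signals (order frequency, delivery address changes, payment failures) rather than quantity alone.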

[0089] Examples of prompt messages include the following:

[0090] "I would like to order one of these items."

[0091] "I'll pay by credit card."

[0092] "Tell me your recommended products."

[0093] Through these steps, the present invention provides users with an efficient and secure purchasing experience.

[0094] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0095] Step 1:

[0096] The server receives voice input from the user. The input is the user's speech, and the server uses a high-performance voice input device to obtain clear voice data while removing ambient noise. The output is digitized voice data.

[0097] Step 2:

[0098] The server uses a speech recognition engine to convert the acquired audio data into text data. The input is the audio data from step 1, which is then converted into text information using an AI model. The output is the user's spoken content as text data.

[0099] Step 3:

[0100] The server extracts order information from the converted text data. The input is the text data obtained in step 2. Regular expressions or natural language processing techniques are used to identify order-related information such as product numbers and quantities. The output is a dataset of order information.

[0101] Step 4:

[0102] The terminal confirms the order information received from the server with the user via voice. The input is the order information from step 3, and speech synthesis technology is used to generate a confirmation voice in a natural conversational format, such as "Is it okay to order item number 789, one unit?". The output is a voice output to the user.

[0103] Step 5:

[0104] The server verifies the user's identity using their voice pattern. The input is the voice data obtained in step 1, which is identified by comparing it with a registered voice profile. The output is the identity verification result.

[0105] Step 6:

[0106] The user selects their preferred payment method from a list of options displayed on the terminal. The input is a list of payment methods displayed on the terminal, from which the user makes their selection. The output is the selected payment method.

[0107] Step 7:

[0108] The server analyzes past purchase history and suggests related products. The input is the user's purchase history data, and a machine learning algorithm is used to select relevant products. The output is information on the suggested products.

[0109] Step 8:

[0110] The server runs a fraud detection algorithm and monitors for unusual orders. The input is the order information obtained in step 3 and the user's activity log, which is compared to typical order patterns. The output is the fraud detection result, and an alert is issued if necessary.

[0111] (Application Example 1)

[0112] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0113] Traditional online shopping systems suffer from usability issues, as users are required to perform numerous steps to find products and complete the ordering process. Furthermore, there is a risk of fraudulent orders. There is a need to solve these problems and provide a highly accurate and secure purchasing experience.

[0114] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0115] In this invention, the server includes processing means for recognizing voice input with high accuracy, processing means for extracting order-related data from the recognized voice information, and support means for quickly identifying products based on voice recognition and confirming them with the user. This allows users to smoothly order products using only their voice, while also detecting fraudulent orders and enabling a safe and comfortable purchasing experience.

[0116] "Voice input" refers to the process of recording a user's spoken words as digital data.

[0117] "Processing means" refers to a device or program that analyzes and calculates data such as audio information to achieve a specific purpose.

[0118] "Audio information" refers to audio data emitted by the user, which is recorded as an acoustic signal.

[0119] "Order data" refers to purchase information, including the products to be purchased and their quantities.

[0120] "Display means" refers to a device or apparatus that visually presents information to the user.

[0121] "Identity verification" is a procedure to confirm whether a user is a legitimate registered user.

[0122] "Authentication means" refers to a device or technology used to prove the authority or identity of a specific individual.

[0123] "Payment method" refers to the means and procedures for paying for a purchase.

[0124] A "payment processing method" is a system or function for processing the exchange of payment.

[0125] A "recommendation means" is a device or program for recommending products based on the user's preferences and history.

[0126] "Emotional analysis means" refers to a technology or program that determines a user's psychological state from their voice data.

[0127] "Fraud detection measures" are devices or systems that identify unusual or fraudulent patterns or behaviors.

[0128] "Support measures" refer to devices or technologies designed to assist with specific tasks or activities.

[0129] The voice recognition system in this invention is implemented as a smartphone application and streamlines online shopping through voice input by the user.

[0130] First, when the user speaks about the product into their smartphone's microphone, the device captures the audio as digital data. This audio data is then converted into text using speech recognition software. Speech recognition technologies such as the speech_recognition library are utilized in this process. Order-related information is extracted from the text data, identifying the product number and quantity.
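
As a non-limiting illustration, converting recorded audio into text with the speech_recognition library could be sketched as follows. The sketch assumes the SpeechRecognition package is installed, that the audio is available as a WAV file, and that the Google Web Speech API backend (`recognize_google`) is reachable over the network; the function names `transcribe_wav` and `clean_transcript` are hypothetical.

```python
import re
from typing import Optional


def clean_transcript(text: str) -> str:
    """Collapse whitespace so later order extraction sees a tidy string."""
    return re.sub(r"\s+", " ", text).strip()


def transcribe_wav(path: str, language: str = "en-US") -> Optional[str]:
    """Return the transcript of a WAV file, or None if the speech was unintelligible."""
    # Imported lazily so clean_transcript stays usable without the package.
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the entire file
    try:
        # Sends the audio to the Google Web Speech API (network required).
        return clean_transcript(recognizer.recognize_google(audio, language=language))
    except sr.UnknownValueError:
        return None  # speech could not be understood
    except sr.RequestError as exc:
        raise RuntimeError("speech recognition service unavailable") from exc
```

Returning None for unintelligible speech lets the caller ask the user to repeat the order instead of proceeding with a guess.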

[0131] Next, the server confirms the order information with the user via voice or text, and finalizes the order based on the user's response. After this confirmation, the server verifies the user's identity using authentication methods. Security is ensured by using voice pattern matching technology and comparing it with registered voice data.

[0132] Subsequently, the server processes the payment, allowing the user to select their preferred payment method from several options. This process utilizes fintech technology to ensure a smooth transaction. Furthermore, the server analyzes past purchase history and suggests related products, providing users with new purchasing opportunities. Sentiment analysis technology is used to extract interest from the user's positive utterances and improve the accuracy of recommendations.
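
As a non-limiting illustration, using positive utterances to refine recommendations could be sketched with a small keyword lexicon. The word lists, the category matching by substring, and the function names are assumptions made for this sketch; the disclosure does not specify the sentiment analysis technique, and a real system would more likely use a trained model.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical lexicons for this sketch only.
POSITIVE = {"love", "like", "great", "nice", "interesting"}
NEGATIVE = {"hate", "dislike", "bad", "expensive", "boring"}


def sentiment_score(utterance: str) -> int:
    """Positive minus negative word count; above zero means a positive utterance."""
    words = re.findall(r"[a-z']+", utterance.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def interest_by_category(utterances: Iterable[str], categories: Iterable[str]) -> Counter:
    """Accumulate interest for categories mentioned in positive utterances."""
    categories = list(categories)
    interest: Counter = Counter()
    for utterance in utterances:
        score = sentiment_score(utterance)
        if score <= 0:
            continue  # only positive utterances signal interest
        lowered = utterance.lower()
        for category in categories:
            if category in lowered:
                interest[category] += score
    return interest


def rerank(candidates: List[Tuple[str, str]], interest: Counter) -> List[Tuple[str, str]]:
    """Order (product, category) pairs by interest; ties keep their original order."""
    return sorted(candidates, key=lambda pc: -interest.get(pc[1], 0))
```

Because Python's sort is stable, products in categories with no recorded interest keep the order produced by the purchase-history analysis.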

[0133] Furthermore, the server is equipped with a fraud detection algorithm that immediately issues a warning if it detects an abnormal order pattern, thereby preventing fraudulent transactions.

[0134] As a concrete example, when a user says "I'd like to order one of these sofas" on their smartphone, the system recognizes the voice and identifies the product number and quantity. A confirmation prompt is generated asking, "Product number 5678, one unit, is that correct?" After the user responds, identity verification and payment procedures are completed, and related products are suggested based on the user's past purchase history of similar items, such as "Would you also like this cushion?" This process allows users to easily complete orders using only their voice.

[0135] An example of a prompt for a generative AI model might be, "Design an application that allows users to order desired products by voice, and smoothly handle confirmation and payment."

[0136] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0137] Step 1:

[0138] The user speaks into their smartphone's microphone about the product they wish to purchase. The input is the user's voice data, which the device acquires as digital data. This voice data serves as basic information for processing in the next step.

[0139] Step 2:

[0140] The device converts the acquired audio data into text data using the speech_recognition library. The input is audio data, and phoneme analysis is performed by the speech recognition engine to obtain the output as text containing the order details. This text data contains the order information intended by the user.

[0141] Step 3:

[0142] The terminal analyzes order information from the text data and extracts the product number and quantity. The input is the text data obtained in step 2, and by using natural language processing technology to identify and extract the necessary data, the product number and quantity are obtained as output.

[0143] Step 4:

[0144] The server confirms the extracted order information with the user via voice or text. The input is the product information obtained in step 3, and the server generates a confirmation message, which is then delivered to the user as synthesized speech or text. This confirmation helps prevent input errors.

[0145] Step 5:

[0146] The server receives the user's response and decides whether to finalize the order based on it. The input is the user's voice response or text data, which is analyzed to determine whether the user has approved the order. If the response is affirmative, the order is finalized.

[0147] Step 6:

[0148] The server performs user identity verification. Inputs include voice patterns and user registration information, and authentication methods are used to verify that the user is legitimate. The output is either an approval or rejection of the identity verification.

[0149] Step 7:

[0150] The server initiates the payment process, allowing the user to select their preferred payment method. The input consists of the confirmed order and the identity verification result, and fintech technology is used to perform the appropriate payment procedure. The output is a status indicating that the transaction is complete.

[0151] Step 8:

[0152] The server analyzes past purchase history and suggests related products to the user. The input is the user's purchase history data, and appropriate products are selected based on database analysis. The output is a list of products suggested to the user.

[0153] Step 9:

[0154] The server monitors for abnormal order patterns using a fraud detection algorithm. The input is all order data, which is compared to pre-configured fraud patterns. If an anomaly is detected, an alert is issued.

[0155] This entire process allows users to shop easily and securely using voice commands.

[0156] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0157] This invention relates to a system equipped with a program that accurately recognizes user voice input and extracts order information from the voice data. The server converts the voice data into text using a speech recognition engine and extracts the product number and quantity. This data is repeated to the user via a terminal for confirmation.

[0158] Identity verification is performed by the server using voice pattern matching technology, ensuring accurate user authentication through the authentication method. After authentication is complete, the terminal presents the user with multiple payment methods and allows them to select their preferred method. The selection information is sent to the server, and the appropriate payment processing is carried out.

[0159] A distinctive feature of this system is its emotion analysis method, which incorporates an emotion engine. The server analyzes the user's voice tone and speech content using the emotion engine to estimate the user's emotions. Based on this emotion information, the server generates an appropriate response that matches the user's mood, and the terminal presents that response to the user.

[0160] For example, if a user expresses dissatisfaction, the emotion engine instantly detects that negative emotion. Based on this information, the server generates a response such as, "Shall I explain more about this product?", providing an interaction that mitigates the negative emotion. The emotion engine also takes past emotional patterns into consideration, using this information to generate future responses that ensure users always have a positive experience.
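The behavior described here can be illustrated with a small sketch. It assumes a keyword-based estimator in place of the emotion identification model, and a short history buffer to represent "past emotional patterns". The word lists, response texts other than the one quoted above, and the class name are hypothetical.

```python
import re
from collections import Counter, deque

# Hypothetical keyword lists standing in for the emotion identification model.
NEGATIVE_WORDS = {"expensive", "confusing", "disappointed", "annoyed", "unsure"}
POSITIVE_WORDS = {"great", "love", "nice", "perfect", "happy"}

RESPONSES = {
    "negative": "Shall I explain more about this product?",
    "positive": "I'm glad you like it. Shall we proceed with the order?",
    "neutral": "Is there anything else I can help you with?",
}


def estimate_emotion(utterance: str) -> str:
    """Label an utterance as negative, positive, or neutral by keyword count."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    negative = len(words & NEGATIVE_WORDS)
    positive = len(words & POSITIVE_WORDS)
    if negative > positive:
        return "negative"
    if positive > negative:
        return "positive"
    return "neutral"


class EmotionEngine:
    """Chooses a response from the current emotion and recent history."""

    def __init__(self, history_size: int = 5) -> None:
        self.history = deque(maxlen=history_size)

    def respond(self, utterance: str) -> str:
        emotion = estimate_emotion(utterance)
        self.history.append(emotion)
        if emotion == "neutral":
            # With no clear signal now, fall back on the dominant recent emotion.
            emotion = Counter(self.history).most_common(1)[0][0]
        return RESPONSES[emotion]


engine = EmotionEngine()
for line in ("This is too expensive and confusing", "I am disappointed", "Okay"):
    print(engine.respond(line))
```

In this sketch the final neutral "Okay" still receives the explanatory response because the recent history is predominantly negative.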

[0161] Furthermore, this system uses fraud detection mechanisms to immediately issue alerts and notify the responsible personnel if abnormal order patterns are detected. This further enhances security.

[0162] Thus, the system of the present invention tightly integrates analysis, recognition, authentication, and payment, and provides a comprehensive platform that improves the user experience through an emotion engine.

[0163] The following describes the processing flow.

[0164] Step 1:

[0165] The user makes a verbal utterance indicating their intention to purchase. Specifically, they request a product by saying, "I want to buy this."

[0166] Step 2:

[0167] The device captures the user's speech as audio data and sends that data to the server.

[0168] Step 3:

[0169] The server inputs the transmitted voice data into a speech recognition engine and converts it into text. At this stage, order information, including product numbers and quantities, is extracted.

[0170] Step 4:

[0171] The terminal repeats the extracted order information to the user, asking for confirmation: "Is it correct that the item number is 123 and the quantity is 2?"

[0172] Step 5:

[0173] The user responds to the order confirmation with "yes" or "no". If the user answers "no", the device prompts them to re-enter the order details.

[0174] Step 6:

[0175] The server analyzes the user's voice pattern and verifies the user's identity by comparing the pattern with existing user information.

[0176] Step 7:

[0177] Once identity verification is complete, the device presents the user with several payment options, displaying a message such as, "Please choose from credit card, bank transfer, or deferred payment."

[0178] Step 8:

[0179] The user selects their preferred payment method. This selection is sent to the server via the terminal, and the payment process begins.

[0180] Step 9:

[0181] The server uses an emotion engine to analyze the user's voice tone and assess their emotional state. Based on this information, it adjusts its next response.

[0182] Step 10:

[0183] The server checks the user's past emotional patterns and purchase history, and suggests related products and special offers. It displays a message like, "We also recommend this product," through the user's device.

[0184] Step 11:

[0185] The server activates a fraud detection algorithm and issues real-time alerts if any unusual orders are detected. This notification is then sent to the responsible person, enabling a swift response.
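The read-back and re-entry loop of Steps 4 and 5 above can be sketched as follows. The reply vocabularies, the attempt limit, and the callback-based interface are assumptions made for illustration. Treating an unclear reply the same as "no" is also an assumption.

```python
import re
from typing import Callable, Optional, Tuple

AFFIRMATIVE = {"yes", "yeah", "correct", "right"}
NEGATIVE = {"no", "nope", "not", "wrong", "incorrect"}


def interpret_reply(reply: str) -> Optional[bool]:
    """Map a transcribed reply to True (yes), False (no), or None (unclear)."""
    words = set(re.findall(r"[a-z]+", reply.lower()))
    if words & NEGATIVE:  # a negation wins over words such as "right"
        return False
    if words & AFFIRMATIVE:
        return True
    return None


def confirm_order(
    get_order: Callable[[], Tuple[str, int]],
    ask: Callable[[str], str],
    max_attempts: int = 3,
) -> Optional[Tuple[str, int]]:
    """Read the order back until the user confirms or attempts run out."""
    for _ in range(max_attempts):
        item_number, quantity = get_order()
        reply = ask(
            f"Is it correct that the item number is {item_number} "
            f"and the quantity is {quantity}?"
        )
        if interpret_reply(reply) is True:
            return item_number, quantity
        # "no" or an unclear reply: loop so the user can re-enter the order.
    return None


# Scripted stand-ins for the speech front end: the first attempt is rejected.
orders = iter([("123", 3), ("123", 2)])
replies = iter(["No", "Yes"])
result = confirm_order(lambda: next(orders), lambda prompt: next(replies))
print(result)
```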

[0186] (Example 2)

[0187] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0188] In modern society, ordering systems using voice input are widespread, but the accuracy of voice recognition and the generation of responses suited to the user's emotions remain insufficient. Furthermore, from a security standpoint, mechanisms for detecting fraudulent orders are lacking. As a result, the user experience deteriorates and the reliability of the system is compromised.

[0189] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0190] In this invention, the server includes control means for recognizing voice input with high accuracy, response generation means for generating and presenting responses based on emotional information, and anomaly monitoring means for detecting fraudulent orders. This enables the user's voice to be recognized with high accuracy, allows responses to be adapted to the user's emotions during order processing, and prevents fraudulent orders, thereby improving the user experience and the reliability of the system.

[0191] "Voice input" is a method for users to communicate information to a system using spoken language.

[0192] "High-precision recognition" refers to analyzing audio information, converting it into text or numbers, and minimizing errors.

[0193] "Control means" refers to devices or programs that have functions for inputting, analyzing, converting, and outputting various types of data.

[0194] "Order details" refers to specific purchase information, such as product numbers and quantities, extracted from the user's voice.

[0195] "Display means" refers to monitors and display devices used to present text and images to users.

[0196] "Authentication methods" refer to technologies and devices used to verify a user's personal information and identify them as that person.

[0197] A "payment management system" is a system that offers multiple payment methods and allows users to make payments using the method they choose.

[0198] A "suggestion method" refers to a device or program that has the function of suggesting products or services suitable for the user based on their past purchase history.

[0199] "Emotional evaluation methods" refer to technologies that analyze emotional elements from a user's voice or text data to determine their emotional state.

[0200] An "anomaly monitoring system" is a system that has the function of detecting patterns that differ from normal operation and warning of fraud or abnormalities.

[0201] A "response generation means" is a process or apparatus for constructing and presenting appropriate responses or messages based on analyzed data and emotional information.

[0202] The system for carrying out the present invention recognizes voice input and integrates order processing, user authentication, sentiment analysis, and anomaly monitoring based on that input. Specific embodiments are as follows.

[0203] First, the user inputs their voice using a device. This device can be a standard smartphone or tablet. When the user gives their order instructions by voice, the device transmits the voice data to the server via the internet.

[0204] Next, the server uses speech recognition software (for example, a cloud-based API that provides speech recognition services) to convert the speech data into text data. A specific example is using a speech recognition service to convert the voice command "I want to order 3 iPhone(registered trademark) 12s" into the text data "Product number iPhone12, quantity 3".
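The conversion in this example can be sketched with a regular expression, assuming the speech recognition service has already returned the transcript. The pattern and the function name are hypothetical, and the plural handling is deliberately naive.

```python
import re
from typing import Optional

# Matches "... order <digits> <product name>", with an optional plural "s".
# Naive by design: a product name that really ends in "s" would be truncated.
ORDER_PATTERN = re.compile(
    r"order\s+(?P<quantity>\d+)\s+(?P<product>.+?)s?\s*$", re.IGNORECASE
)


def normalize_order(transcript: str) -> Optional[str]:
    """Turn a transcript into 'Product number X, quantity N', or None."""
    match = ORDER_PATTERN.search(transcript.strip().rstrip("."))
    if match is None:
        return None
    # Remove inner whitespace so "iPhone 12" becomes "iPhone12".
    product = re.sub(r"\s+", "", match.group("product"))
    quantity = int(match.group("quantity"))
    return f"Product number {product}, quantity {quantity}"


print(normalize_order("I want to order 3 iPhone 12s"))
```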

[0205] The server then extracts order details such as product number and quantity from the converted text data. This information is sent to the terminal and repeated to the user via screen display and voice. The user reviews the information and, if necessary, re-enters it via voice input.

[0206] In the identity verification step, the server uses voice pattern matching technology to authenticate the user. Specifically, it analyzes the user's voice characteristics and matches them with pre-registered information to confirm their identity.
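One common way to match voice characteristics is to compare feature vectors by cosine similarity. The sketch below assumes the voice features have already been extracted as fixed-length numeric vectors. The threshold value and the sample vectors are hypothetical, and the specification does not state which matching technique is used.

```python
import math
from typing import Sequence

SIMILARITY_THRESHOLD = 0.85  # hypothetical acceptance threshold


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no voice information
    return dot / (norm_a * norm_b)


def verify_speaker(
    sample: Sequence[float],
    enrolled: Sequence[float],
    threshold: float = SIMILARITY_THRESHOLD,
) -> bool:
    """Accept the speaker if the sample is close enough to the enrolled voice."""
    return cosine_similarity(sample, enrolled) >= threshold


enrolled_voice = [0.2, 0.8, 0.4]      # pre-registered features (illustrative)
same_speaker = [0.25, 0.75, 0.45]
other_speaker = [0.9, 0.1, 0.05]
print(verify_speaker(same_speaker, enrolled_voice))
print(verify_speaker(other_speaker, enrolled_voice))
```

In practice the threshold trades false acceptances against false rejections and would need tuning on real enrollment data.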

[0207] In addition, the server uses an emotion analysis engine to analyze the user's voice tone and speech content to determine the user's emotional state. Based on this emotional information, it generates responses that are empathetic to the user's mood, such as "How are you feeling today? Do you need any support?" This improves the user experience.

[0208] Furthermore, the server is equipped with an anomaly monitoring system that detects fraudulent and abnormal order patterns. When an anomaly occurs, it issues an alert and notifies the administrator to take appropriate action. In this way, the security of the entire system is maintained.

[0209] A possible example of a specific prompt message would include: "Convert the user's voice input to text and extract the order details. Analyze the user's emotional state and generate a response based on that."

[0210] This system enables efficient voice-based order processing and provides users with a comfortable and safe user experience.

[0211] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0212] Step 1:

[0213] The user uses a terminal to place an order by voice. At this stage, the user's voice data is entered. The terminal receives this voice data and sends it to the server. This transmitted voice data becomes the basis for the next processing step.

[0214] Step 2:

[0215] The server converts the received audio data into text data using speech recognition software. This conversion process utilizes a cloud-based speech recognition API, which analyzes the speech to accurately convert it into text data. This converted text data is then used for subsequent data processing to extract order information.

[0216] Step 3:

[0217] The server performs text analysis to extract order information from the text data. This process involves applying natural language processing techniques to the text data to identify specific order details such as product numbers and quantities. The extracted order information is sent to the terminal, where the user is prompted to confirm it visually or audibly.

[0218] Step 4:

[0219] The user reviews the order information displayed on the terminal or read back aloud. If the information is correct, the user sends a "Confirmation Complete" input back to the server via the terminal. This response allows the order process to proceed.

[0220] Step 5:

[0221] The server verifies the user's identity using authentication methods. Specifically, it uses voice pattern matching technology to compare the user's voice with registered voice data. If this matching is successful, user authentication is complete. This authentication is a crucial step for secure payments and protection of personal information.

[0222] Step 6:

[0223] The terminal visually presents the user with payment method options, including credit cards, e-money, and bank transfers. The user selects their preferred payment method through the terminal, and the selection is sent to the server, where the payment process begins.

[0224] Step 7:

[0225] The server uses an emotion analysis engine to analyze the user's emotional state from the voice data. This process utilizes a generative AI model to estimate the user's emotions from the tone and content of the voice. The estimated emotional information then becomes input data for the server to generate a response and provide an appropriate dialogue.

[0226] Step 8:

[0227] The server generates a response to present to the user, taking into account emotional and order information. This response is considerate of the user's emotional state and may include questions such as "Is there anything else I can help you with?" or suggestions for support. The generated response is sent to the terminal and presented to the user.

[0228] Step 9:

[0229] The server utilizes a fraud detection system to monitor unusual orders. This system identifies deviations from normal order patterns and issues warnings if fraud is suspected. This ensures consistent and secure operation of the system.
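The "deviation from normal order patterns" in Step 9 could, for instance, be measured statistically. The following sketch assumes a z-score test on order quantity against a user's past orders. The threshold and the sample data are hypothetical, and the specification does not prescribe this method.

```python
from statistics import mean, pstdev
from typing import Sequence

Z_THRESHOLD = 3.0  # hypothetical: flag values over 3 standard deviations away


def is_anomalous(
    value: float,
    normal_values: Sequence[float],
    z_threshold: float = Z_THRESHOLD,
) -> bool:
    """Return True if value deviates strongly from the normal pattern."""
    if len(normal_values) < 2:
        return False  # not enough history to define "normal"
    mu = mean(normal_values)
    sigma = pstdev(normal_values)
    if sigma == 0:
        return value != mu  # every past value identical: any change stands out
    return abs(value - mu) / sigma > z_threshold


past_quantities = [1, 2, 1, 3, 2, 1, 2]
print(is_anomalous(40, past_quantities))
print(is_anomalous(2, past_quantities))
```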

[0230] (Application Example 2)

[0231] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart device 14 will be referred to as the "terminal."

[0232] With the advancement of information and communication technology, services utilizing voice recognition are becoming widespread. However, challenges remain, such as limited recognition accuracy when placing orders via voice input and inadequate handling of user dissatisfaction. Detecting fraudulent orders and offering diverse payment options are also challenges. In particular, there is a need for flexible responses that adapt to users' emotions.

[0233] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0234] In this invention, the server includes a program means for recognizing voice input with high accuracy, an emotion analysis means, and a means for analyzing the user's feelings from their speech and generating special offers based on individual inquiries. This improves the accuracy of voice recognition and enables flexible responses that adapt to the user's emotions, providing the user with a comfortable ordering experience.

[0235] "Program means for recognizing voice input with high accuracy" refers to software technology that converts the voice spoken by the user into a digital signal and precisely extracts its content as text data.

[0236] "Program means for extracting order information" refers to software technology that identifies, from recognized voice data, the items the user wishes to purchase and their quantities, and compiles them into an order.

[0237] "Display means" refers to devices or software that visually display information extracted by the system to the user, enabling them to confirm and select information.

[0238] "Authentication methods" refer to technologies that analyze voice patterns to verify the user's identity and guarantee secure transactions.

[0239] A "payment processing means" is a processing means for completing the payment process according to the payment method selected by the user.

[0240] "Recommendation methods" refer to algorithms and technologies used to recommend appropriate products and services to users based on their past purchase history.

[0241] "Emotional analysis means" refers to technology that analyzes the emotions contained in a user's voice and identifies the user's mental state.

[0242] "Fraud detection measures" refer to algorithms and technologies used to detect fraudulent or abnormal order behavior and ensure security.

[0243] "A means of analyzing user utterances to understand their feelings and generating special offers based on individual inquiries" refers to a method of analyzing the content of user utterances and the emotions contained therein, and providing special offers and information tailored to the user.

[0244] The system realizing this invention recognizes user voice input with high accuracy and processes order information based on it. The server converts the voice data into text format using a speech recognition engine and extracts the order details. The Google Cloud Speech-to-Text API is used for speech recognition to ensure high accuracy.

[0245] The server extracts order information from the voice input, which is then repeated back to the user via the terminal for confirmation. This confirmation process utilizes the terminal's display, allowing the user to visually verify the information.

[0246] For identity verification, the server uses voice pattern matching technology to perform voice authentication. This ensures highly accurate user authentication. This process plays a role in preventing unauthorized access and manipulation.

[0247] Next, regarding payment processing, the server proposes multiple payment methods to the user, and the payment is processed according to the method selected by the user. The payment processing is required to be secure and smooth, and various payment APIs are applied.
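Processing "according to the method selected by the user" can be sketched as a dispatch table. The handlers below are placeholders that only return a status. A real handler would call a payment API, which the specification does not name. All identifiers and the integer amount are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class PaymentResult:
    method: str
    amount: int
    status: str


def _placeholder_handler(method: str) -> Callable[[int], PaymentResult]:
    """Build a stand-in handler; a real one would call a payment API."""
    def handler(amount: int) -> PaymentResult:
        return PaymentResult(method=method, amount=amount, status="completed")
    return handler


# Payment methods mentioned in the specification, keyed by lowercase name.
PAYMENT_HANDLERS: Dict[str, Callable[[int], PaymentResult]] = {
    name: _placeholder_handler(name)
    for name in ("credit card", "bank transfer", "deferred payment", "e-money")
}


def process_payment(selected_method: str, amount: int) -> PaymentResult:
    """Route the payment to the handler for the user's selected method."""
    handler = PAYMENT_HANDLERS.get(selected_method.strip().lower())
    if handler is None:
        raise ValueError(f"unsupported payment method: {selected_method!r}")
    if amount <= 0:
        raise ValueError("amount must be positive")
    return handler(amount)


print(process_payment("Credit Card", 4980))
```

Keeping the handlers in a table means a new payment method can be added without changing the dispatch logic.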

[0248] In emotion analysis, the server uses IBM Watson® Tone Analyzer to analyze the tone of the user's utterances and estimate their emotional state. Based on these analysis results, services and special offers tailored to the user's emotions are generated, improving user satisfaction.

[0249] The fraud detection mechanism allows the server to immediately send a notification to the responsible person if it detects an abnormal order pattern. This improves the overall security of the system.

[0250] For example, if a user says, "I'm tired today, so I'd like to order a pizza," and sentiment analysis determines that the user is feeling "tired," then an offer such as "Would you like a smoothie with your pizza?" will be generated.

[0251] An example of a prompt message might be, "If the user is in a mood for relaxation, suggest a seasonal drink."
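The pizza example can be sketched as a mapping from an estimated mood to an offer template. Keyword matching stands in here for the sentiment analysis service, and the mood labels, keyword sets, and the "relaxed" offer text are hypothetical.

```python
import re
from typing import Optional

# Hypothetical keyword sets standing in for the sentiment analysis service.
MOOD_KEYWORDS = {
    "tired": {"tired", "exhausted", "sleepy"},
    "relaxed": {"relaxed", "relaxing", "calm"},
}

# Offer templates per mood; "{item}" is replaced by the ordered product.
OFFERS = {
    "tired": "Would you like a smoothie with your {item}?",
    "relaxed": "Would you like a seasonal drink with your {item}?",
}


def detect_mood(utterance: str) -> Optional[str]:
    """Return the first mood whose keywords appear in the utterance."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    for mood, keywords in MOOD_KEYWORDS.items():
        if words & keywords:
            return mood
    return None


def generate_offer(utterance: str, item: str) -> Optional[str]:
    """Build a mood-specific offer, or None if no mood is detected."""
    mood = detect_mood(utterance)
    if mood is None:
        return None
    return OFFERS[mood].format(item=item)


print(generate_offer("I'm tired today, so I'd like to order a pizza", "pizza"))
```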

[0252] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0253] Step 1:

[0254] Acquiring voice input

[0255] The user places an order by voice into the terminal. This voice data is recorded on the terminal and sent to the server. The input is the user's voice, and the output is digital voice data. The terminal performs signal processing, cutting out silent portions and compressing the audio before transmission.

[0256] Step 2:

[0257] Speech recognition processing

[0258] The server converts the received audio data into text using the Google Cloud Speech-to-Text API. The input is digital audio data, and the output is extracted text data. In this step, the server performs audio preprocessing, such as noise filtering, to improve recognition accuracy.

[0259] Step 3:

[0260] Extraction of order information

[0261] The server extracts order information, such as product names and quantities, from the converted text. The input is text data, and the output is structured order information. Keyword matching and natural language processing techniques are used to separate relevant information within the text.

[0262] Step 4:

[0263] Identity verification through voice pattern matching

[0264] The server analyzes voice patterns and performs identity verification by matching them with pre-registered voices. The input is voice feature data, and the output is the authentication result. Here, voice feature extraction technology is used, and the authentication process is performed by a machine learning model.

[0265] Step 5:

[0266] Emotion analysis

[0267] The server uses IBM Watson Tone Analyzer to estimate the user's emotional state from their speech. The input is text data of the user's utterances, and the output is an emotional label and score. The server analyzes the nuances of the utterances and processes the data to infer the user's psychological state.

[0268] Step 6:

[0269] Confirmation display and option presentation

[0270] The server extracts order information and sends it to the terminal, which then displays it on its screen for the user to confirm. The input is the order information, and the output is the display screen. The terminal uses a caching mechanism to optimize response speed.

[0271] Step 7:

[0272] Generating and presenting special offers

[0273] Based on the sentiment analysis results, the server generates a special offer tailored to the user's mood and sends it to the terminal. The input is a sentiment label, and the output is special offer data. The server uses a generative AI model to create a prompt message and notifies the user of relevant information.

[0274] Step 8:

[0275] Payment processing

[0276] The server processes the payment based on the payment method selected by the user. The input is the selected payment method, and the output is a payment completion notification. The server executes transactions using encryption technology to keep them secure while processing them quickly.

[0277] Step 9:

[0278] Fraud detection and notification

[0279] The server employs fraud detection measures throughout the entire process and immediately sends an alert to the responsible person if an abnormal order pattern is detected. The input is order pattern data, and the output is a warning notification. The server monitors patterns using machine learning algorithms and performs actions to detect anomalies early.
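The silence removal mentioned in Step 1 can be sketched with a frame-energy threshold. The sketch assumes audio samples normalized to the range -1.0 to 1.0 and trims only leading and trailing silence, which is one possible reading of "cutting out silent portions". The threshold and frame size are hypothetical.

```python
from typing import List, Sequence


def trim_silence(
    samples: Sequence[float],
    threshold: float = 0.02,
    frame_size: int = 4,
) -> List[float]:
    """Drop leading and trailing frames whose mean absolute amplitude is low."""
    frames = [
        list(samples[i:i + frame_size])
        for i in range(0, len(samples), frame_size)
    ]

    def is_voiced(frame: List[float]) -> bool:
        return sum(abs(s) for s in frame) / len(frame) >= threshold

    voiced = [index for index, frame in enumerate(frames) if is_voiced(frame)]
    if not voiced:
        return []  # the whole recording is silence
    first, last = voiced[0], voiced[-1]
    # Keep everything between the first and last voiced frame, pauses included.
    return [s for frame in frames[first:last + 1] for s in frame]


recording = (
    [0.0] * 4 + [0.5, -0.4, 0.3, -0.2] + [0.0] * 4
    + [0.1, 0.2, -0.1, 0.0] + [0.0] * 8
)
trimmed = trim_silence(recording)
print(len(recording), len(trimmed))
```

Pauses inside the utterance are kept in this sketch because removing them could change how the speech is recognized.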

[0280] The specific processing unit 290 transmits the result of the specific processing to the smart device 14. In the smart device 14, the control unit 46A causes the output device 40 to output the result of the specific processing. The microphone 38B acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 38B to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0281] Data generation model 58 is a so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AIs such as ChatGPT (registered trademark) (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (registered trademark) (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. A prompt including instructions is input to the data generation model 58, together with inference data such as voice data indicating voice, text data indicating text, and image data indicating an image. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference result in a data format such as voice data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0282] In the above embodiment, an example form in which specific processing is performed by the data processing device 12 is cited. However, the technology of the present disclosure is not limited to this, and specific processing may be performed by the smart device 14.

[0283] [Second Embodiment]

[0284] FIG. 3 shows an example of the configuration of the data processing system 210 according to the second embodiment.

[0285] As shown in FIG. 3, the data processing system 210 includes a data processing device 12 and smart glasses 214. An example of the data processing device 12 is a server.

[0286] The data processing device 12 includes a computer 22, a database 24, and a communication I/F 26. The computer 22 is an example of the "computer" according to the technology of the present disclosure. The computer 22 includes a processor 28, a RAM 30, and a storage 32. The processor 28, the RAM 30, and the storage 32 are connected to a bus 34. Also, the database 24 and the communication I/F 26 are connected to the bus 34. The communication I/F 26 is connected to a network 54. Examples of the network 54 include a WAN (Wide Area Network) and/or a LAN (Local Area Network).

[0287] The smart glasses 214 include a computer 36, a microphone 238, a speaker 240, a camera 42, and a communication interface 44. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, and camera 42 are also connected to the bus 52.

[0288] The microphone 238 receives instructions from the user 20 by capturing the user's voice. It converts the captured voice into audio data and outputs the data to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0289] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0290] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0291] Figure 4 shows an example of the main functions of the data processing device 12 and the smart glasses 214. As shown in Figure 4, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0292] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0293] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0294] In the smart glasses 214, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0295] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0296] This invention is a system that utilizes voice recognition technology to streamline television shopping orders and provide users with a smooth and secure purchasing experience. The operation of the entire system is described below.

[0297] The server first captures the user's voice input with high accuracy and converts the voice into text data using a speech recognition engine. At this stage, information such as product numbers and quantities related to the order is extracted. The terminal repeats the extracted order information to the user for confirmation. This prevents input errors.

[0298] Subsequently, the server performs voice pattern matching and verifies the user's identity by comparing the voice pattern with the user's registration information. Once verified, the user selects their preferred payment method through their terminal. This system supports multiple payment methods, allowing for seamless transaction completion.

[0299] Furthermore, the server analyzes the user's past purchase history and suggests related products, providing the user with new purchasing opportunities. It also analyzes the user's emotions from their speech and, if positive feedback is received, employs an approach to encourage the purchase of suggested products.

[0300] Furthermore, the server employs a fraud detection algorithm to monitor unusual order patterns and immediately issues an alert if there are any signs of fraud. This further enhances security.

[0301] As a concrete example, when a user says, "This product looks interesting," the server captures the statement using speech recognition and identifies product number 789 as order data. The terminal then confirms, "Is one unit of product number 789 correct?" and the order is confirmed when the user responds, "Yes." After a verification process, the user selects credit card payment and the transaction is completed. Furthermore, if the user has a history of purchasing similar products in the past, the server suggests related products by saying, "How about this product as well?", naturally continuing the conversation. In this way, the present invention allows users to enjoy a safe and comfortable shopping experience.

[0302] The following describes the processing flow.

[0303] Step 1:

[0304] The user begins talking about the product they want to buy. Specifically, they state their order by voice, such as "I want this product."

[0305] Step 2:

[0306] The terminal captures the user's speech as voice data and transmits the data to the server.

[0307] Step 3:

[0308] The server passes the received voice data to a voice recognition engine and converts it into text. Thereby, the product numbers and quantities required for the order are extracted.

[0309] Step 4:

[0310] The terminal repeats the extracted order information to the user for confirmation. For example, it asks, "Is product number 456, quantity 2, correct?"

[0311] Step 5:

[0312] The user answers "yes" or "no" to the repeated order content.

[0313] Step 6:

[0314] The server performs voice pattern matching and verifies the identity by comparing with user information.

[0315] Step 7:

[0316] Upon receiving the successful identity verification result, the terminal presents multiple payment methods to the user and prompts for a selection. For example, it presents the message "Please choose from credit card, bank transfer, or deferred payment."

[0317] Step 8:

[0318] The user selects the desired payment method, and the terminal transmits the selection information to the server.

[0319] Step 9:

[0320] The server analyzes the user's past purchase history and proposes additional related products. For example, it presents the message "This product is also recommended."

[0321] Step 10:

[0322] The server performs sentiment analysis, reading the user's emotions from their utterances and adjusting the response accordingly. For example, if the user is happy, it might generate a response such as "I'm glad you're satisfied."

[0323] Step 11:

[0324] The server monitors all order data and checks for any fraudulent order patterns. If an anomaly is detected, an alert is immediately issued and the responsible person is notified.
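The read-back and yes/no exchange of Steps 4 and 5 above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function names and the word lists are hypothetical, and a real system would need to handle far more phrasings.

```python
from typing import Optional

# Hypothetical word lists (assumptions for illustration only).
YES_WORDS = {"yes", "yeah", "correct", "right", "ok", "okay"}
NO_WORDS = {"no", "nope", "wrong", "incorrect"}


def build_confirmation(product_number: str, quantity: int) -> str:
    """Build the read-back prompt that the terminal speaks in Step 4."""
    return f"Is product number {product_number}, quantity {quantity}, correct?"


def interpret_reply(reply: str) -> Optional[bool]:
    """Map the user's reply in Step 5 to True, False, or None if unclear."""
    words = {w.strip(".,!?").lower() for w in reply.split()}
    # Check negative words first so that "No, that is not correct" is a "no".
    if words & NO_WORDS:
        return False
    if words & YES_WORDS:
        return True
    return None  # unclear reply: the terminal could ask again
```

Returning None for an unclear reply leaves it to the caller to repeat the question rather than guess.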

[0325] (Example 1)

[0326] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0327] Conventional transaction systems utilizing voice recognition suffer from erroneous orders caused by the low accuracy of voice input, and from complicated payment and identity verification procedures. Furthermore, suggestions based on purchase history are not sufficiently effective, so the user's purchasing experience does not improve. In addition, the accuracy of fraudulent order detection is insufficient, posing a security risk.

[0328] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0329] In this invention, the server includes processing means for acquiring voice input with high accuracy, programming means for converting the acquired voice data into text data, and information processing means for extracting order information from the converted voice data. This enables the acquisition of voice input with high accuracy and the generation of accurate order information.

[0330] "Processing means for acquiring voice input with high accuracy" refers to a function that captures voice in an optimal state and removes noise and external interference to maintain sound quality.

[0331] "Programming means for converting the acquired voice data into text data" refers to a program that converts human speech into digital text information in a format that can be processed by a computer.

[0332] "Information processing means for extracting order information from the converted voice data" refers to a function that analyzes the text information converted from the voice and identifies order-related information such as product number and quantity.

[0333] A "customer confirmation method via voice output" is a means of prompting a user to confirm their order by playing back the details of their order in audio.

[0334] "An identification method that uses voice data to verify identity" refers to a function that analyzes the characteristics of the voice and compares it with existing registration information to verify the user's identity.

[0335] "A payment selection method that allows users to choose from multiple different payment methods" refers to a function that allows users to select from various payment methods when making a transaction.

[0336] A "product suggestion method that proposes related products based on past purchase history" is a function that suggests new related products to a user based on their purchase history.

[0337] "An emotion analysis tool for analyzing user emotions" is a function that determines the user's emotions from their voice or speech and adjusts its response accordingly.

[0338] An "anomaly detection mechanism that detects abnormal orders and issues warnings" is a function that detects suspicious activity that deviates from normal order patterns and issues warnings as necessary.

[0339] This invention is a system that utilizes speech recognition technology to enable smooth transactions using the user's voice. The server, terminal, and user all work together, progressing through the following steps.

[0340] The server uses high-performance microphones and voice input devices to acquire voice input with high accuracy. After acquiring the voice data, a speech recognition engine (e.g., Google Cloud Speech-to-Text) is used to convert the voice into text data. The generative AI model used in this conversion achieves highly accurate and natural speech recognition. In this process, it is important to minimize ambient noise by utilizing noise cancellation technology and acoustic models.

[0341] From the converted text data, the server uses regular expressions and natural language processing techniques to identify order information such as product numbers and quantities. The information extracted by the server is then confirmed by voice to the user via the terminal. This confirmation method uses speech synthesis technology and employs a natural conversational style, such as, "Is it correct to order one item of product number 789?"
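As a minimal sketch of the regular-expression approach mentioned above, the following extracts a product number and a quantity from recognized text. The phrasing it expects ("product number N", a count followed by "unit", "item", or "piece"), the default quantity of one, and all names are assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Assumed phrasing; real utterances would need broader patterns or NLP.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
PRODUCT_RE = re.compile(r"product\s+number\s+(\d+)", re.IGNORECASE)
QUANTITY_RE = re.compile(
    r"\b(\d+|one|two|three|four|five)\s+(?:units?|items?|pieces?)\b",
    re.IGNORECASE,
)


def extract_order(text: str) -> Optional[Tuple[str, int]]:
    """Return (product_number, quantity), or None if no product is named."""
    product = PRODUCT_RE.search(text)
    if product is None:
        return None
    quantity = QUANTITY_RE.search(text)
    if quantity is None:
        count = 1  # assumption: default to a single unit
    else:
        token = quantity.group(1).lower()
        count = int(token) if token.isdigit() else NUMBER_WORDS[token]
    return product.group(1), count
```

The product number is kept as a string so that leading zeros, if any, survive extraction.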

[0342] In the identity verification step, the server analyzes the voice pattern and uses identification technology to compare it with registered information. This allows for highly accurate verification of the user's identity. After authentication is complete, the user can choose from multiple payment methods through the terminal, and interfaces for credit cards and electronic money are provided.
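The comparison between a voice pattern and registered information could, for example, be done on fixed-length feature vectors. The sketch below assumes such vectors have already been extracted from the audio by some other component; cosine similarity and the threshold of 0.85 are illustrative choices, not something the specification prescribes.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no voice information
    return dot / (norm_a * norm_b)


def verify_speaker(
    sample: Sequence[float],
    enrolled: Sequence[float],
    threshold: float = 0.85,  # hypothetical value; would be tuned on real data
) -> bool:
    """Accept the speaker if the sample is close enough to the enrolled profile."""
    return cosine_similarity(sample, enrolled) >= threshold
```

Raising the threshold reduces false acceptances at the cost of rejecting more legitimate users.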

[0343] Furthermore, the server uses machine learning algorithms to analyze past purchase history and suggest related products. For example, if a user asks, "What products do you recommend?", it can suggest new products that take into account the trends of products purchased in the past.
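The specification does not name a particular machine learning algorithm. As a simple stand-in, the sketch below ranks products by how often they were bought together with items the user already purchased. The function name and data layout are hypothetical.

```python
from collections import Counter
from typing import List


def recommend(
    user_history: List[str],
    all_orders: List[List[str]],
    top_n: int = 3,
) -> List[str]:
    """Suggest items that co-occur with the user's past purchases."""
    owned = set(user_history)
    scores: Counter = Counter()
    for basket in all_orders:
        items = set(basket)
        if items & owned:  # basket shares at least one item with the user
            for item in items - owned:
                scores[item] += 1
    # Sort by descending count, then by name so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]
```

For instance, a user who bought a sofa would be offered the items that most often appear in other orders containing a sofa.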

[0344] Finally, the server uses fraud detection algorithms to monitor abnormal order patterns in real time and issue alerts to administrators as needed. This improves the overall security of the system.
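One way to flag an order that deviates from a user's normal pattern is a z-score test on the order amount. This is only a sketch under stated assumptions: the specification does not fix the algorithm, and the function name, the use of order amounts, and the threshold of 3.0 are illustrative.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(
    amount: float,
    past_amounts: Sequence[float],
    z_threshold: float = 3.0,  # hypothetical cut-off
) -> bool:
    """Return True if the amount is far from the user's usual order amounts."""
    if len(past_amounts) < 2:
        return False  # not enough history to judge
    mu = mean(past_amounts)
    sigma = stdev(past_amounts)
    if sigma == 0:
        return amount != mu  # every past order was identical
    return abs(amount - mu) / sigma > z_threshold
```

An order flagged in this way would then trigger the alert to the administrator described above.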

[0345] Examples of prompt messages include the following:

[0346] "I would like to order one of these items."

[0347] "I'll pay by credit card"

[0348] "Tell me your recommended products."

[0349] Through these steps, the present invention provides users with an efficient and secure purchasing experience.

[0350] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0351] Step 1:

[0352] The server receives voice input from the user. The input is the user's speech, and the server uses a high-performance voice input device to obtain clear voice data while removing ambient noise. The output is digitized voice data.

[0353] Step 2:

[0354] The server uses a speech recognition engine to convert the acquired audio data into text data. The input is the audio data from step 1, which is then converted into text information using an AI model. The output is the user's spoken content as text data.

[0355] Step 3:

[0356] The server extracts order information from the converted text data. The input is the text data obtained in step 2. Regular expressions or natural language processing techniques are used to identify order-related information such as product numbers and quantities. The output is a dataset of order information.

[0357] Step 4:

[0358] The terminal confirms the order information received from the server with the user via voice. The input is the order information from step 3, and speech synthesis technology is used to generate a confirmation voice in a natural conversational format, such as "Is it okay to order item number 789, one unit?". The output is a voice output to the user.

[0359] Step 5:

[0360] The server verifies the user's identity using their voice pattern. The input is the voice data obtained in step 1, which is identified by comparing it with a registered voice profile. The output is the identity verification result.

[0361] Step 6:

[0362] The user selects their preferred payment method from a list of options displayed on the terminal. The input is a list of payment methods displayed on the terminal, from which the user makes their selection. The output is the selected payment method.

[0363] Step 7:

[0364] The server analyzes past purchase history and suggests related products. The input is the user's purchase history data, and a machine learning algorithm is used to select relevant products. The output is information on the suggested products.

[0365] Step 8:

[0366] The server runs a fraud detection algorithm and monitors for unusual orders. The input is the order information obtained in step 3 and the user's activity log, which is compared to typical order patterns. The output is the fraud detection result, and an alert is issued if necessary.

[0367] (Application Example 1)

[0368] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0369] Traditional online shopping systems suffer from usability issues, as users are required to perform numerous steps to find products and complete the ordering process. Furthermore, there is a risk of fraudulent orders. There is a need to solve these problems and provide a highly accurate and secure purchasing experience.

[0370] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0371] In this invention, the server includes processing means for recognizing voice input with high accuracy, processing means for extracting order-related data from the recognized voice information, and support means for quickly identifying products based on voice recognition and confirming them with the user. This allows users to smoothly order products using only their voice, while also detecting fraudulent orders and enabling a safe and comfortable purchasing experience.

[0372] "Voice input" refers to the process of recording a user's spoken words as digital data.

[0373] "Processing means" refers to a device or program that analyzes and calculates data such as audio information to achieve a specific purpose.

[0374] "Audio information" refers to audio data emitted by the user, which is recorded as an acoustic signal.

[0375] "Order data" refers to purchase information, including the products to be purchased and their quantities.

[0376] "Display means" refers to a device or apparatus that visually presents information to the user.

[0377] "Identity verification" is a procedure to confirm whether a user is a legitimate registered user.

[0378] "Authentication means" refers to a device or technology used to prove the authority or identity of a specific individual.

[0379] "Payment method" refers to the means and procedures for paying for a purchase.

[0380] A "payment processing method" is a system or function for processing the exchange of payment.

[0381] A "recommendation means" is a device or program for recommending products based on the user's preferences and history.

[0382] "Emotional analysis means" refers to a technology or program that determines a user's psychological state from their voice data.

[0383] "Fraud detection measures" are devices or systems that identify unusual or fraudulent patterns or behaviors.

[0384] "Support measures" refer to devices or technologies designed to assist with specific tasks or activities.

[0385] The voice recognition system in this invention is implemented as a smartphone application and streamlines online shopping through voice input by the user.

[0386] First, when the user speaks about the product into their smartphone's microphone, the device captures the audio as digital data. This audio data is then converted into text using speech recognition software. Speech recognition technologies such as the speech_recognition library are utilized in this process. Order-related information is extracted from the text data, identifying the product number and quantity.
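A minimal sketch of the conversion step with the speech_recognition library is shown below. It reads a recorded WAV file rather than a live microphone to keep the example short. The choice of the library's Google Web Speech recognizer and the function name are assumptions; the specification does not state which backend is used.

```python
from typing import Optional


def transcribe_wav(path: str, language: str = "en-US") -> Optional[str]:
    """Return the recognized text for a WAV file, or None if unintelligible."""
    # Imported here so this module still loads where the package is absent.
    # Third-party package: pip install SpeechRecognition
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the entire file
    try:
        # Sends the audio to a web API, so network access is required.
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None  # the speech could not be understood
```

Network or service errors surface as the library's RequestError and are deliberately left to the caller to handle.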

[0387] Next, the server confirms the order information with the user via voice or text, and finalizes the order based on the user's response. After this confirmation, the server verifies the user's identity using authentication methods. Security is ensured by using voice pattern matching technology and comparing it with registered voice data.

[0388] Subsequently, the server processes the payment, allowing the user to select their preferred payment method from several options. This process utilizes fintech technology to ensure a smooth transaction. Furthermore, the server analyzes past purchase history and suggests related products, providing users with new purchasing opportunities. Sentiment analysis technology is used to extract interest from the user's positive utterances and improve the accuracy of recommendations.

[0389] Furthermore, the server is equipped with a fraud detection algorithm that immediately issues a warning if it detects an abnormal order pattern, thereby preventing fraudulent transactions.

[0390] As a concrete example, when a user says "I'd like to order one of these sofas" on their smartphone, the system recognizes the voice and identifies the product number and quantity. A confirmation prompt is generated asking, "Product number 5678, one unit, is that correct?" After the user responds, identity verification and payment procedures are completed, and related products are suggested based on the user's past purchase history of similar items, such as "Would you also like this cushion?" This process allows users to easily complete orders using only their voice.

[0391] An example of a prompt for a generative AI model might be, "Design an application that allows users to order desired products by voice, and smoothly handle confirmation and payment."

[0392] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0393] Step 1:

[0394] The user speaks into their smartphone's microphone about the product they wish to purchase. The input is the user's voice data, which the device acquires as digital data. This voice data serves as basic information for processing in the next step.

[0395] Step 2:

[0396] The device converts the acquired audio data into text data using the speech_recognition library. The input is audio data, and phoneme analysis is performed by the speech recognition engine to obtain the output as text containing the order details. This text data contains the order information intended by the user.

[0397] Step 3:

[0398] The terminal analyzes order information from the text data and extracts the product number and quantity. The input is the text data obtained in step 2, and by using natural language processing technology to identify and extract the necessary data, the product number and quantity are obtained as output.

[0399] Step 4:

[0400] The server confirms the extracted order information with the user via voice or text. The input is the product information obtained in step 3, and the server generates a confirmation message, which is then delivered to the user as synthesized speech or text. This confirmation helps prevent input errors.

[0401] Step 5:

[0402] The server receives the user's response and decides whether to confirm the order based on it. The input is the user's response as voice or text data, which is analyzed to make the final decision. If the response is affirmative, the order is finalized.

[0403] Step 6:

[0404] The server performs user identity verification. Inputs include voice patterns and user registration information, and authentication methods are used to verify that the user is legitimate. The output is either an approval or rejection of the identity verification.

[0405] Step 7:

[0406] The server initiates the payment process, allowing the user to select their preferred payment method. The input consists of order confirmation and identity verification information, and fintech technology is used to perform the appropriate payment procedure. The output is a status indicating that the transaction is complete.

[0407] Step 8:

[0408] The server analyzes past purchase history and suggests related products to the user. The input is the user's purchase history data, and appropriate products are selected based on database analysis. The output is a list of products suggested to the user.

[0409] Step 9:

[0410] The server monitors for abnormal order patterns using a fraud detection algorithm. The input is all order data, which is compared to pre-configured fraud patterns. If an anomaly is detected, an alert is issued.

[0411] This entire process allows users to shop easily and securely using voice commands.
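Step 9 above compares order data with pre-configured fraud patterns. A rule table is one straightforward way to express such patterns. In the sketch below, the fields of the order record and every rule and threshold are hypothetical examples, not patterns taken from the specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Order:
    user_id: str
    product_number: str
    quantity: int
    orders_last_hour: int       # orders by this user in the past hour
    new_shipping_address: bool  # True if the address was never used before


# Each pre-configured pattern is a named predicate over an order.
FRAUD_PATTERNS: Dict[str, Callable[[Order], bool]] = {
    "bulk_quantity": lambda o: o.quantity >= 20,
    "rapid_repeat": lambda o: o.orders_last_hour >= 5,
    "bulk_to_new_address": lambda o: o.new_shipping_address and o.quantity >= 5,
}


def matched_patterns(order: Order) -> List[str]:
    """Return the names of all fraud patterns the order matches."""
    return [name for name, rule in FRAUD_PATTERNS.items() if rule(order)]
```

A non-empty result would correspond to issuing an alert; returning the pattern names tells the responsible person why the order was flagged.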

[0412] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform specific processing using the user's emotions.

[0413] This invention relates to a system equipped with a program that accurately recognizes user voice input and extracts order information from the voice data. The server converts the voice data into text using a speech recognition engine and extracts the product number and quantity. This data is repeated to the user via a terminal for confirmation.

[0414] Identity verification is performed by the server using voice pattern matching technology, ensuring accurate user authentication through the authentication method. After authentication is complete, the terminal presents the user with multiple payment methods and allows them to select their preferred method. The selection information is sent to the server, and the appropriate payment processing is carried out.

[0415] A distinctive feature of this system is its emotion analysis method, which incorporates an emotion engine. The server analyzes the user's voice tone and speech content using the emotion engine to estimate the user's emotions. Based on this emotion information, the server generates an appropriate response that matches the user's mood, and the terminal presents that response to the user.

[0416] For example, if a user expresses dissatisfaction, the emotion engine instantly detects that negative emotion. Based on this information, the server generates a response such as, "Shall I explain more about this product?", providing an interaction that mitigates the negative emotion. The emotion engine also takes past emotional patterns into consideration, using this information to generate future responses that ensure users always have a positive experience.
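The mapping from an estimated emotion to a response can be sketched as follows. The emotion engine described above analyzes voice tone as well as speech content; this sketch substitutes a small keyword lexicon over the text only, purely for illustration. The word lists and function names are hypothetical, while the response sentences are taken from this description.

```python
# Hypothetical lexicons; a real emotion engine would use a trained model.
NEGATIVE_WORDS = {"expensive", "confusing", "disappointed", "unsure", "bad"}
POSITIVE_WORDS = {"great", "love", "nice", "perfect", "interesting"}

RESPONSES = {
    "negative": "Shall I explain more about this product?",
    "positive": "I'm glad you're satisfied.",
    "neutral": "Is there anything else I can help you with?",
}


def estimate_emotion(utterance: str) -> str:
    """Classify an utterance as 'positive', 'negative', or 'neutral'."""
    words = {w.strip(".,!?").lower() for w in utterance.split()}
    negative = len(words & NEGATIVE_WORDS)
    positive = len(words & POSITIVE_WORDS)
    if negative > positive:
        return "negative"
    if positive > negative:
        return "positive"
    return "neutral"


def choose_response(utterance: str) -> str:
    """Pick the response that matches the estimated emotion."""
    return RESPONSES[estimate_emotion(utterance)]
```

Taking past emotional patterns into account, as the description mentions, would require storing the estimated labels per user, which is omitted here.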

[0417] Furthermore, this system uses fraud detection mechanisms to immediately issue alerts and notify the responsible personnel if abnormal order patterns are detected. This further enhances security.

[0418] Thus, the system of the present invention tightly integrates analysis, recognition, authentication, and payment, and provides a comprehensive platform that improves the user experience through an emotion engine.

[0419] The following describes the processing flow.

[0420] Step 1:

[0421] The user makes a verbal utterance indicating their intention to purchase. Specifically, they request a product by saying, "I want to buy this."

[0422] Step 2:

[0423] The device captures the user's speech as audio data and sends that data to the server.

[0424] Step 3:

[0425] The server inputs the transmitted voice data into a speech recognition engine and converts it into text. At this stage, order information, including product numbers and quantities, is extracted.

[0426] Step 4:

[0427] The terminal repeats the extracted order information to the user, asking for confirmation: "Is it correct that the item number is 123 and the quantity is 2?"

[0428] Step 5:

[0429] The user responds to the order confirmation with "yes" or "no". If the user answers "no", the device prompts them to re-enter the order details.

[0430] Step 6:

[0431] The server performs voice pattern analysis and verifies the user's identity by comparing it with existing user information.

[0432] Step 7:

[0433] Once identity verification is complete, the terminal presents the user with several payment options, displaying a message such as, "Please choose from credit card, bank transfer, or deferred payment."

[0434] Step 8:

[0435] The user selects their preferred payment method. This selection is sent to the server via the terminal, and the payment process begins.

[0436] Step 9:

[0437] The server uses an emotion engine to analyze the user's voice tone and assess their emotional state. Based on this information, it adjusts its next response.

[0438] Step 10:

[0439] The server checks the user's past emotional patterns and purchase history, and suggests related products and special offers. It displays a message like, "We also recommend this product," through the user's device.

[0440] Step 11:

[0441] The server activates a fraud detection algorithm and issues real-time alerts if any unusual orders are detected. This notification is then sent to the responsible person, enabling a swift response.
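The main path of the flow above, including the return to order entry when the user answers "no" in Step 5, can be modeled as a small finite state machine. The state and event names are hypothetical, and the REJECTED state for a failed identity check is an assumption, since the description does not say what happens in that case.

```python
from enum import Enum, auto
from typing import Iterable


class State(Enum):
    AWAITING_ORDER = auto()
    AWAITING_CONFIRMATION = auto()
    VERIFYING_IDENTITY = auto()
    SELECTING_PAYMENT = auto()
    COMPLETED = auto()
    REJECTED = auto()  # assumption: outcome of a failed identity check


# (current state, event) -> next state
TRANSITIONS = {
    (State.AWAITING_ORDER, "order_extracted"): State.AWAITING_CONFIRMATION,
    (State.AWAITING_CONFIRMATION, "yes"): State.VERIFYING_IDENTITY,
    (State.AWAITING_CONFIRMATION, "no"): State.AWAITING_ORDER,  # re-enter order
    (State.VERIFYING_IDENTITY, "verified"): State.SELECTING_PAYMENT,
    (State.VERIFYING_IDENTITY, "not_verified"): State.REJECTED,
    (State.SELECTING_PAYMENT, "payment_selected"): State.COMPLETED,
}


def next_state(state: State, event: str) -> State:
    """Advance the dialog; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events: Iterable[str]) -> State:
    """Replay a sequence of events starting from order entry."""
    state = State.AWAITING_ORDER
    for event in events:
        state = next_state(state, event)
    return state
```

Emotion analysis, product suggestion, and fraud monitoring (Steps 9 to 11) run alongside this path and are left out of the table for brevity.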

[0442] (Example 2)

[0443] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the smart glasses 214 will be referred to as the "terminal".

[0444] In modern society, ordering systems using voice input are widespread, but the accuracy of voice recognition and the generation of responses suited to the user's emotions are insufficient. Furthermore, from a security standpoint, there is a lack of mechanisms to detect fraudulent orders. As a result, the user experience deteriorates and the reliability of the system is compromised.

[0445] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0446] In this invention, the server includes control means for recognizing voice input with high accuracy, response generation means for generating and presenting responses based on emotional information, and anomaly monitoring means for detecting fraudulent orders. This enables high-accuracy recognition of the user's voice, allows responses suited to the user's emotions during actual order processing, and prevents fraudulent orders, thereby improving the user experience and the reliability of the system.

[0447] "Voice input" is a method for users to communicate information to a system using spoken language.

[0448] "High-precision recognition" refers to analyzing audio information, converting it into text or numbers, and minimizing errors.

[0449] "Control means" refers to devices or programs that have functions for inputting, analyzing, converting, and outputting various types of data.

[0450] "Order details" refers to specific purchase information, such as product numbers and quantities, extracted from the user's voice.

[0451] "Display means" refers to monitors and display devices used to present text and images to users.

[0452] "Authentication methods" refer to technologies and devices used to verify a user's personal information and identify them as that person.

[0453] A "payment management system" is a system that offers multiple payment methods and allows users to make payments using the method they choose.

[0454] A "suggestion method" refers to a device or program that has the function of suggesting products or services suitable for the user based on their past purchase history.

[0455] "Emotional evaluation methods" refer to technologies that analyze emotional elements from a user's voice or text data to determine their emotional state.

[0456] An "anomaly monitoring system" is a system that has the function of detecting patterns that differ from normal operation and warning of fraud or abnormalities.

[0457] A "response generation means" is a process or apparatus for constructing and presenting appropriate responses or messages based on analyzed data and emotional information.

[0458] The system for carrying out the present invention recognizes voice input and integrates order processing, user authentication, sentiment analysis, and anomaly monitoring based on that input. Specific embodiments are as follows.

[0459] First, the user uses a device to input their voice. This device can be a standard smartphone or tablet. When the user gives their order instructions by voice, this voice data is transmitted to the server via the internet through the device.

[0460] Next, the server uses speech recognition software (for example, a cloud-based API that provides speech recognition services) to convert the voice data into text data. A specific example is using a speech recognition service to convert the voice command "I want to order 3 iPhone 12s" into the text data "Product number iPhone 12, quantity 3".
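When products are referred to by name, as in the "iPhone 12" example above, digits that belong to the product name must not be mistaken for the quantity. The sketch below matches names from a catalog and reads the quantity only from a number directly in front of the name. The catalog contents and the function name are hypothetical.

```python
import re
from typing import List, Optional, Tuple

CATALOG = ["iPhone 12", "iPhone 12 Pro", "Galaxy S21"]  # hypothetical catalog


def extract_named_order(
    text: str, catalog: List[str] = CATALOG
) -> Optional[Tuple[str, int]]:
    """Return (product_name, quantity) for the first catalog name found."""
    # Try longer names first so "iPhone 12 Pro" is not read as "iPhone 12".
    for name in sorted(catalog, key=len, reverse=True):
        pattern = re.compile(r"(?:(\d+)\s+)?" + re.escape(name), re.IGNORECASE)
        match = pattern.search(text)
        if match:
            # Assumption: quantity defaults to 1 when no number precedes the name.
            quantity = int(match.group(1)) if match.group(1) else 1
            return name, quantity
    return None
```

The trailing plural "s" in "iPhone 12s" is simply ignored because the pattern does not require a word boundary after the name.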

[0461] The server then extracts order details such as product number and quantity from the converted text data. This information is sent to the terminal and repeated to the user via screen display and voice. The user reviews the information and, if necessary, re-enters it via voice input.

[0462] In the identity verification step, the server uses voice pattern matching technology to authenticate the user. Specifically, it analyzes the user's voice characteristics and matches them with pre-registered information to confirm their identity.

[0463] In addition, the server uses an emotion analysis engine to analyze the user's voice tone and speech content to determine the user's emotional state. Based on this emotional information, it generates responses that are empathetic to the user's mood, such as "How are you feeling today? Do you need any support?" This improves the user experience.

[0464] Furthermore, the server is equipped with an anomaly monitoring system that detects fraudulent and abnormal order patterns. When an anomaly occurs, it issues an alert and notifies the administrator to take appropriate action. In this way, the security of the entire system is maintained.

[0465] An example of a specific prompt message is: "Convert the user's voice input to text and extract the order details. Analyze the user's emotional state and generate a response based on that."

[0466] This system enables efficient voice-based order processing and provides users with a comfortable and safe user experience.

[0467] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0468] Step 1:

[0469] The user uses a terminal to place an order by voice. At this stage, the user's voice data is entered. The terminal receives this voice data and sends it to the server. This transmitted voice data becomes the basis for the next processing step.

[0470] Step 2:

[0471] The server converts the received audio data into text data using speech recognition software. This conversion process utilizes a cloud-based speech recognition API, which analyzes the speech to accurately convert it into text data. This converted text data is then used for subsequent data processing to extract order information.

[0472] Step 3:

[0473] The server performs text analysis to extract order information from the text data. This process involves applying natural language processing techniques to the text data to identify specific order details such as product numbers and quantities. The extracted order information is sent to the terminal, where the user is prompted to confirm it visually or audibly.

[0474] Step 4:

[0475] The user checks the order information displayed on the terminal or repeated aloud. If the information is correct, the user sends a "Confirmation Complete" input back to the server via the terminal. This response triggers the order process to proceed.

[0476] Step 5:

[0477] The server verifies the user's identity using authentication methods. Specifically, it uses voice pattern matching technology to compare the user's voice with registered voice data. If this matching is successful, user authentication is complete. This authentication is a crucial step for secure payments and protection of personal information.

[0478] Step 6:

[0479] The terminal presents the user with payment method options, including credit cards, e-money, and bank transfers, visually displaying a variety of payment methods to the user. The user selects their preferred payment method through the terminal, and the selection is sent to the server, where the payment process begins.

[0480] Step 7:

[0481] The server uses an emotion analysis engine to analyze the user's emotional state from the voice data. This process utilizes a generative AI model to estimate the user's emotions from the tone and content of the voice. The estimated emotional information then becomes input data for the server to generate a response and provide an appropriate dialogue.

[0482] Step 8:

[0483] The server generates a response to present to the user, taking into account emotional and order information. This response is considerate of the user's emotional state and may include questions such as "Is there anything else I can help you with?" or suggestions for support. The generated response is sent to the terminal and presented to the user.

[0484] Step 9:

[0485] The server utilizes a fraud detection system to monitor unusual orders. This system identifies deviations from normal order patterns and issues warnings if fraud is suspected. This ensures consistent and secure operation of the system.
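The payment selection in Step 6 above can be sketched as an enumeration of the offered methods plus a simple matcher for the user's spoken choice. The three methods are those named in Step 6; the class and function names, the prompt wording, and the substring matching are assumptions for illustration.

```python
from enum import Enum
from typing import Optional


class PaymentMethod(Enum):
    CREDIT_CARD = "credit card"
    E_MONEY = "e-money"
    BANK_TRANSFER = "bank transfer"


def payment_prompt() -> str:
    """Build the message that lists the available payment methods."""
    options = [method.value for method in PaymentMethod]
    return "Please choose from " + ", ".join(options[:-1]) + ", or " + options[-1] + "."


def parse_selection(utterance: str) -> Optional[PaymentMethod]:
    """Return the payment method named in the utterance, or None if absent."""
    lowered = utterance.lower()
    for method in PaymentMethod:
        if method.value in lowered:
            return method
    return None
```

The selected method would then be sent to the server, where the payment process begins.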

[0486] (Application Example 2)

[0487] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the smart glasses 214 will be referred to as the "terminal."

[0488] With the advancement of information and communication technology, services utilizing voice recognition are becoming widespread. However, challenges remain, such as insufficient recognition accuracy when placing orders via voice input and inadequate handling of user dissatisfaction. Detecting fraudulent orders and offering diverse payment options also remain challenging. In particular, there is a need for flexible responses attuned to users' emotions.

[0489] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0490] In this invention, the server includes a programming means for recognizing voice input with high accuracy, an emotion analysis means, and a means for analyzing the user's feelings from their speech and generating special offers based on individual inquiries. This improves the accuracy of voice recognition and enables flexible responses attuned to the user's emotions, providing the user with a comfortable ordering experience.

[0491] "Program means for recognizing voice input with high accuracy" refers to software technology that converts the voice spoken by the user into a digital signal and precisely extracts its content as text data.

[0492] "Program means for extracting order information" refers to software technology that identifies the items the user wishes to purchase and their quantities from the recognized voice data and compiles them into an order.

[0493] "Display means" refers to devices or software that visually display information extracted by the system to the user, enabling them to confirm and select information.

[0494] "Authentication means" refers to technology that analyzes voice patterns to verify the user's identity and ensure secure transactions.

[0495] A "payment processing means" is a processing means for completing the payment process according to the payment method selected by the user.

[0496] "Recommendation means" refers to algorithms and technologies used to recommend appropriate products and services to users based on their past purchase history.

[0497] "Emotion analysis means" refers to technology that analyzes the emotions contained in a user's voice and identifies the user's mental state.

[0498] "Fraud detection means" refers to algorithms and technologies used to detect fraudulent or abnormal order behavior and ensure security.

[0499] "A means of analyzing user utterances to understand their feelings and generating special offers based on individual inquiries" refers to a method of analyzing the content of user utterances and the emotions contained therein, and providing special offers and information tailored to the user.

[0500] The system realizing this invention recognizes user voice input with high accuracy and processes order information based on it. The server converts the voice data into text using a speech recognition engine and extracts the order details. The Google Cloud Speech-to-Text API is used for speech recognition to ensure high accuracy.

[0501] The server extracts order information from the voice input, which is then repeated back to the user via the terminal for confirmation. This confirmation process utilizes the terminal's display, allowing the user to visually verify the information.

[0502] For identity verification, the server uses voice pattern matching technology to perform voice authentication. This ensures highly accurate user authentication. This process plays a role in preventing unauthorized access and manipulation.

[0503] Next, regarding payment processing, the server proposes multiple payment methods to the user, and the payment is processed according to the method selected by the user. The payment processing is required to be secure and smooth, and various payment APIs are applied.
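As a non-limiting illustration, processing a payment according to the method selected by the user can be sketched as a lookup from the selected method to a handler. The handler functions, the method names, and `process_payment` are hypothetical stand-ins for real payment APIs, which the specification does not name.

```python
# Hypothetical handlers; a real system would call a payment provider's API here.
def pay_by_credit_card(amount):
    return f"credit card charged {amount}"


def pay_by_bank_transfer(amount):
    return f"bank transfer requested for {amount}"


def pay_deferred(amount):
    return f"deferred payment registered for {amount}"


# Payment methods offered to the user (illustrative names).
PAYMENT_HANDLERS = {
    "credit card": pay_by_credit_card,
    "bank transfer": pay_by_bank_transfer,
    "deferred payment": pay_deferred,
}


def process_payment(method, amount):
    """Route the payment to the handler for the method the user selected."""
    handler = PAYMENT_HANDLERS.get(method.strip().lower())
    if handler is None:
        # Unknown method: reject and return the supported options for re-prompting.
        return {
            "status": "rejected",
            "reason": f"unsupported method: {method}",
            "options": sorted(PAYMENT_HANDLERS),
        }
    return {"status": "completed", "detail": handler(amount)}
```

Returning the supported options on rejection lets the terminal present the choices to the user again instead of failing the order.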

[0504] In emotion analysis, the server uses IBM Watson Tone Analyzer to analyze the user's voice tone and estimate their emotional state. Based on these analysis results, services and special offers tailored to the user's emotions are generated, improving the user's experience satisfaction.

[0505] The fraud detection mechanism allows the server to immediately send a notification to the responsible person if it detects an abnormal order pattern. This improves the overall security of the system.

[0506] For example, if a user says, "I'm tired today, so I'd like to order a pizza," and sentiment analysis determines that the user is feeling "tired," then an offer such as "Would you like a smoothie with your pizza?" will be generated.

[0507] An example of a prompt message might be, "If the user is in a mood for relaxation, suggest a seasonal drink."
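As a non-limiting illustration, the construction of such a prompt from an emotion label can be sketched as follows. The mapping `OFFER_HINTS`, the function `build_offer_prompt`, and the score threshold are assumptions made for this sketch; only the "tired" and "relaxed" examples are taken from the description above.

```python
# Emotion label -> instruction for the generative AI model (illustrative mapping).
OFFER_HINTS = {
    "tired": "suggest a refreshing side item such as a smoothie",
    "relaxed": "suggest a seasonal drink",
}
DEFAULT_HINT = "do not add an offer; just confirm the order"


def build_offer_prompt(order_text, emotion_label, score, min_score=0.6):
    """Build a prompt for the generative model from the emotion analysis result.

    An offer instruction is used only when the label is known and its score is
    at least min_score (an assumed threshold); otherwise no offer is requested.
    """
    hint = OFFER_HINTS.get(emotion_label) if score >= min_score else None
    if hint is None:
        hint = DEFAULT_HINT
    return (
        f"The user ordered: {order_text}\n"
        f"Estimated emotion: {emotion_label} (score {score:.2f})\n"
        f"Instruction: {hint}."
    )
```

Falling back to a plain confirmation when the emotion estimate is uncertain avoids presenting offers based on a weak signal.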

[0508] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0509] Step 1:

[0510] Acquiring voice input

[0511] The user places an order by voice into the terminal. This voice data is recorded on the terminal and sent to the server. The input is the user's voice, and the output is digital voice data. The terminal performs signal processing, cutting out silent portions and compressing the audio before transmission.
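As a non-limiting illustration, the terminal-side signal processing can be sketched as follows, assuming 16-bit PCM samples. This sketch trims only leading and trailing silence using a fixed amplitude threshold and uses lossless zlib compression as a stand-in for an audio codec; the function names and the threshold value are hypothetical.

```python
import zlib
from array import array


def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples whose absolute amplitude is below threshold."""
    start = 0
    end = len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def pack_for_upload(samples, threshold=500):
    """Trim silence, encode as 16-bit PCM bytes, and compress before transmission."""
    trimmed = trim_silence(samples, threshold)
    pcm = array("h", trimmed).tobytes()
    return zlib.compress(pcm)


def unpack_on_server(payload):
    """Reverse pack_for_upload: decompress and decode the 16-bit samples."""
    decoded = array("h")
    decoded.frombytes(zlib.decompress(payload))
    return list(decoded)
```

Because zlib is lossless, the server recovers exactly the trimmed samples; a production system would more likely use a dedicated speech codec.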

[0512] Step 2:

[0513] Speech recognition processing

[0514] The server converts the received audio data into text using the Google Cloud Speech-to-Text API. The input is digital audio data, and the output is extracted text data. In this step, the server performs audio preprocessing, such as noise filtering, to improve recognition accuracy.

[0515] Step 3:

[0516] Extraction of order information

[0517] The server extracts order information, such as product names and quantities, from the converted text. The input is text data, and the output is structured order information. Keyword matching and natural language processing techniques are used to separate relevant information within the text.

[0518] Step 4:

[0519] Identity verification through voice pattern matching

[0520] The server analyzes voice patterns and performs identity verification by matching them with pre-registered voices. The input is voice feature data, and the output is the authentication result. Here, voice feature extraction technology is used, and the authentication process is performed by a machine learning model.
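As a non-limiting illustration, matching voice feature data against a pre-registered voice can be sketched as a similarity comparison between two feature vectors. The specification does not state how features are extracted or compared; the use of cosine similarity, the threshold of 0.85, and the function names are assumptions made for this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify_speaker(features, enrolled, threshold=0.85):
    """Return the authentication result: accepted when similarity reaches the threshold."""
    score = cosine_similarity(features, enrolled)
    return {"accepted": score >= threshold, "score": score}
```

The threshold trades false acceptances against false rejections and would need to be tuned on real enrollment data.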

[0521] Step 5:

[0522] Emotion analysis

[0523] The server uses IBM Watson Tone Analyzer to estimate the user's emotional state from their speech. The input is text data of the user's utterances, and the output is an emotional label and score. The server analyzes the nuances of the utterances and processes the data to infer the user's psychological state.

[0524] Step 6:

[0525] Confirmation display and option presentation

[0526] The server extracts order information and sends it to the terminal, which then displays it on its screen for the user to confirm. The input is the order information, and the output is the display screen. The terminal uses a caching mechanism to optimize response speed.

[0527] Step 7:

[0528] Generating and presenting special offers

[0529] Based on the sentiment analysis results, the server generates a special offer tailored to the user's mood and sends it to the terminal. The input is a sentiment label, and the output is special offer data. The server uses a generative AI model to create a prompt message and notifies the user of relevant information.

[0530] Step 8:

[0531] Payment processing

[0532] The server processes the payment based on the payment method selected by the user. The input is the selected payment method, and the output is a payment completion notification. The server executes the transaction using encryption technology so that payment is both secure and fast.

[0533] Step 9:

[0534] Fraud detection and notification

[0535] The server employs fraud detection measures throughout the entire process and immediately sends an alert to the responsible person if an abnormal order pattern is detected. The input is order pattern data, and the output is a warning notification. The server monitors patterns using machine learning algorithms and performs actions to detect anomalies early.
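As a non-limiting illustration, detecting a deviation from normal order patterns can be sketched with a simple z-score test on order quantities. This statistical test is a stand-in for the machine learning algorithms mentioned above, and the function names, the threshold of 3.0, and the `notify` callback are hypothetical.

```python
from statistics import mean, pstdev


def is_anomalous(history, new_value, z_threshold=3.0):
    """Flag new_value when it lies more than z_threshold standard deviations from the mean."""
    if len(history) < 2:
        return False  # not enough data to define a normal pattern
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return new_value != mu
    return abs(new_value - mu) / sigma > z_threshold


def check_order(history, quantity, notify):
    """Check one order quantity and send a warning through notify when it is anomalous."""
    flagged = is_anomalous(history, quantity)
    if flagged:
        notify(f"Unusual order quantity: {quantity}")
    return flagged
```

For example, passing `alerts.append` as `notify` collects the warnings in a list, which stands in for the notification sent to the responsible person.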

[0536] The specific processing unit 290 transmits the result of the specific processing to the smart glasses 214. In the smart glasses 214, the control unit 46A causes the speaker 240 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0537] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 is input with prompts containing instructions, and with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 infers from the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0538] In the above embodiment, an example was given in which specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the smart glasses 214.

[0539] [Third Embodiment]

[0540] Figure 5 shows an example of the configuration of the data processing system 310 according to the third embodiment.

[0541] As shown in Figure 5, the data processing system 310 includes a data processing device 12 and a headset terminal 314. An example of the data processing device 12 is a server.

[0542] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0543] The headset terminal 314 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a display 343. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and display 343 are also connected to the bus 52.

[0544] The microphone 238 accepts instructions from the user 20 by picking up the user's voice. The microphone 238 captures the voice of the user 20, converts the captured voice into audio data, and outputs the audio data to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0545] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor, and captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0546] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0547] Figure 6 shows an example of the main functions of the data processing device 12 and the headset terminal 314. As shown in Figure 6, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0548] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0549] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0550] In the headset terminal 314, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0551] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the headset terminal 314 will be referred to as the "terminal".

[0552] This invention is a system that utilizes voice recognition technology to streamline television shopping orders and provide users with a smooth and secure purchasing experience. The operation of the entire system is described below.

[0553] The server first captures the user's voice input with high accuracy and converts the voice into text data using a speech recognition engine. At this stage, information such as product numbers and quantities related to the order is extracted. The terminal repeats the extracted order information to the user for confirmation. This prevents input errors.

[0554] Subsequently, the server performs voice pattern matching and verifies the user's identity by comparing it with their registration information. Once verified, the user selects their preferred payment method through their terminal. This system supports multiple payment methods, allowing for seamless transaction completion.

[0555] Furthermore, the server analyzes the user's past purchase history and suggests related products, providing the user with new purchasing opportunities. It also analyzes the user's emotions from their speech and, when it detects a positive reaction, encourages the purchase of the suggested products.

[0556] Furthermore, the server employs a fraud detection algorithm to monitor unusual order patterns and immediately issues an alert if there are any signs of fraud. This further enhances security.

[0557] As a concrete example, when a user says, "This product looks interesting," the server captures the statement using speech recognition and identifies product number 789 as order data. The terminal then confirms, "Is one unit of product number 789 correct?" and the order is confirmed when the user responds, "Yes." After a verification process, the user selects credit card payment and the transaction is completed. Furthermore, if the user has a history of purchasing similar products in the past, the server suggests related products by saying, "How about this product as well?", naturally continuing the conversation. In this way, the present invention allows users to enjoy a safe and comfortable shopping experience.
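As a non-limiting illustration, the read-back confirmation in this example can be sketched as a pair of functions: one that builds the confirmation prompt and one that interprets the user's reply. The function names and the accepted reply words are assumptions made for this sketch; a real system would interpret the reply from the speech recognition result.

```python
# Reply words treated as confirmation or rejection (illustrative, not exhaustive).
YES_WORDS = {"yes", "yeah", "correct", "that's right"}
NO_WORDS = {"no", "nope", "wrong"}


def confirmation_prompt(product_number, quantity):
    """Build the sentence the terminal reads back to the user."""
    unit = "unit" if quantity == 1 else "units"
    return f"Is {quantity} {unit} of product number {product_number} correct?"


def interpret_reply(text):
    """Return True for a confirmation, False for a rejection, None when unclear."""
    normalized = text.strip().lower().rstrip(".!")
    if normalized in YES_WORDS:
        return True
    if normalized in NO_WORDS:
        return False
    return None
```

Returning `None` for an unclear reply allows the terminal to repeat the question rather than confirming or cancelling the order by mistake.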

[0558] The following describes the processing flow.

[0559] Step 1:

[0560] The user begins talking about the product they want to buy. Specifically, they state their order by voice, such as "I want this product."

[0561] Step 2:

[0562] The device captures the user's speech as audio data and sends that data to the server.

[0563] Step 3:

[0564] The server passes the received audio data to a speech recognition engine, which converts it into text. This extracts the product numbers and quantities needed for the order.

[0565] Step 4:

[0566] The terminal repeats the extracted order information to the user, confirming, for example, "Is it correct that the item number is 456 and the quantity is 2?"

[0567] Step 5:

[0568] The user responds with "yes" or "no" to the repeated order details.

[0569] Step 6:

[0570] The server performs voice pattern matching and verifies the user's identity by comparing it with user information.

[0571] Step 7:

[0572] Once the user's identity has been verified, the terminal presents the user with multiple payment methods and prompts them to choose. It displays a prompt such as "Please choose from credit card, bank transfer, or deferred payment."

[0573] Step 8:

[0574] The user selects their preferred payment method, and the device sends this selection information to the server.

[0575] Step 9:

[0576] The server analyzes the user's past purchase history and suggests additional related products, displaying a message such as "We also recommend this product."

[0577] Step 10:

[0578] The server performs sentiment analysis, reading the user's emotions from their utterances and adjusting the response accordingly. For example, if the user is happy, it might generate a response such as "I'm glad you're satisfied."

[0579] Step 11:

[0580] The server monitors all order data and checks for any fraudulent order patterns. If an anomaly is detected, an alert is immediately issued and the responsible person is notified.

[0581] (Example 1)

[0582] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0583] Conventional transaction systems utilizing voice recognition suffer from erroneous orders caused by the low accuracy of voice input, and from complicated payment and identity verification procedures. Furthermore, suggestions based on purchase history are not sufficiently effective, so the user's purchasing experience does not improve. In addition, the accuracy of fraudulent order detection is insufficient, posing a security risk.

[0584] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0585] In this invention, the server includes processing means for acquiring voice input with high accuracy, programming means for converting the acquired voice data into text data, and information processing means for extracting order information from the converted voice data. This enables the acquisition of voice input with high accuracy and the generation of accurate order information.

[0586] "Processing means for acquiring voice input with high accuracy" refers to a function that captures voice in an optimal state and removes noise and external interference to maintain sound quality.

[0587] "Programming means for converting the acquired voice data into text data" refers to a program that converts human speech into digital text information in a format that can be processed by a computer.

[0588] "Information processing means for extracting order information from converted audio data" refers to a function that analyzes the text information converted from audio and identifies order-related information such as product number and quantity from it.

[0589] A "customer confirmation method via voice output" is a means of prompting a user to confirm their order by playing back the details of their order in audio.

[0590] "An identification method that uses voice data to verify identity" refers to a function that analyzes the characteristics of the voice and compares it with existing registration information to verify the user's identity.

[0591] "A payment selection method that allows users to choose from multiple different payment methods" refers to a function that allows users to select from various payment methods when making a transaction.

[0592] A "product suggestion method that proposes related products based on past purchase history" is a function that suggests new related products to a user based on their purchase history.

[0593] "An emotion analysis tool for analyzing user emotions" is a function that determines the user's emotions from their voice or speech and adjusts its response accordingly.

[0594] An "anomaly detection mechanism that detects abnormal orders and issues warnings" is a function that detects suspicious activity that deviates from normal order patterns and issues warnings as necessary.

[0595] This invention is a system that utilizes speech recognition technology to enable smooth transactions using the user's voice. The server, terminal, and user all work together, progressing through the following steps.

[0596] The server uses high-performance microphones and voice input devices to acquire voice input with high accuracy. After acquiring the voice data, a speech recognition engine (e.g., Google Cloud Speech-to-Text) is used to convert the voice into text data. The generative AI model used in this conversion achieves highly accurate and natural speech recognition. In this process, it is important to minimize ambient noise by utilizing noise cancellation technology and acoustic models.

[0597] From the converted text data, the server uses regular expressions and natural language processing techniques to identify order information such as product numbers and quantities. The information extracted by the server is then confirmed by voice with the user via the terminal. This confirmation uses speech synthesis technology and a natural conversational style, such as, "Is it correct to order one item of product number 789?"
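As a non-limiting illustration, the regular-expression part of this extraction can be sketched as follows. The patterns, the supported number words, and the default quantity of 1 when no quantity is spoken are assumptions made for this sketch and cover only simple English phrasings.

```python
import re

# Spoken number words supported by this sketch (illustrative subset).
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

# Matches phrases such as "product number 789" or "item number 456".
PRODUCT_RE = re.compile(r"\b(?:product|item)\s+number\s+(\d+)", re.IGNORECASE)

# Matches phrases such as "2 units", "one item", or "three pieces".
QUANTITY_RE = re.compile(
    r"\b(\d+|one|two|three|four|five)\s+(?:units?|items?|pieces?)\b",
    re.IGNORECASE,
)


def extract_order(text):
    """Return {'product_number', 'quantity'} from recognized text, or None if no product is found."""
    product = PRODUCT_RE.search(text)
    if product is None:
        return None
    quantity = QUANTITY_RE.search(text)
    if quantity is None:
        count = 1  # assumed default; the read-back confirmation lets the user correct it
    else:
        token = quantity.group(1).lower()
        count = int(token) if token.isdigit() else NUMBER_WORDS[token]
    return {"product_number": product.group(1), "quantity": count}
```

For instance, `extract_order("I would like two units of product number 789")` yields product number "789" with quantity 2, which can then be read back to the user for confirmation.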

[0598] In the identity verification step, the server analyzes the voice pattern and uses identification technology to compare it with registered information. This allows for highly accurate verification of the user's identity. After authentication is complete, the user can choose from multiple payment methods through the terminal, and interfaces for credit cards and electronic money are provided.

[0599] Furthermore, the server uses machine learning algorithms to analyze past purchase history and suggest related products. For example, if a user asks, "What products do you recommend?", it can suggest new products that take into account the trends of products purchased in the past.
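As a non-limiting illustration, suggesting related products from purchase history can be sketched with simple co-purchase counting. This counting scheme is a stand-in for the machine learning algorithms mentioned above, and the function name `recommend` and the data layout (one list of product names per past order) are assumptions made for this sketch.

```python
from collections import Counter


def recommend(purchase_histories, user_items, top_n=3):
    """Rank products that were bought together with the user's past purchases.

    purchase_histories: list of past orders, each a list of product names.
    user_items: products the user has already purchased.
    """
    owned = set(user_items)
    scores = Counter()
    for basket in purchase_histories:
        basket_set = set(basket)
        overlap = len(basket_set & owned)
        if overlap == 0:
            continue  # this order shares nothing with the user's history
        for item in basket_set - owned:
            scores[item] += overlap
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]
```

Products the user already owns are excluded so that the suggestions are always new to that user.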

[0600] Finally, the server uses fraud detection algorithms to monitor abnormal order patterns in real time and issue alerts to administrators as needed. This significantly improves the overall security of the system.

[0601] Examples of prompt messages include the following:

[0602] "I would like to order one of these items."

[0603] "I'll pay by credit card."

[0604] "Tell me your recommended products."

[0605] Through these steps, the present invention provides users with an efficient and secure purchasing experience.

[0606] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0607] Step 1:

[0608] The server receives voice input from the user. The input is the user's speech, and the server uses a high-performance voice input device to obtain clear voice data while removing ambient noise. The output is digitized voice data.

[0609] Step 2:

[0610] The server uses a speech recognition engine to convert the acquired audio data into text data. The input is the audio data from step 1, which is then converted into text information using an AI model. The output is the user's spoken content as text data.

[0611] Step 3:

[0612] The server extracts order information from the converted text data. The input is the text data obtained in step 2. Regular expressions or natural language processing techniques are used to identify order-related information such as product numbers and quantities. The output is a dataset of order information.

[0613] Step 4:

[0614] The terminal confirms the order information received from the server with the user via voice. The input is the order information from step 3, and speech synthesis technology is used to generate a confirmation voice in a natural conversational format, such as "Is it okay to order item number 789, one unit?". The output is a voice output to the user.

[0615] Step 5:

[0616] The server verifies the user's identity using their voice pattern. The input is the voice data obtained in step 1, which is identified by comparing it with a registered voice profile. The output is the identity verification result.

[0617] Step 6:

[0618] The user selects their preferred payment method from a list of options displayed on the terminal. The input is a list of payment methods displayed on the terminal, from which the user makes their selection. The output is the selected payment method.

[0619] Step 7:

[0620] The server analyzes past purchase history and suggests related products. The input is the user's purchase history data, and a machine learning algorithm is used to select relevant products. The output is information on the suggested products.

[0621] Step 8:

[0622] The server runs a fraud detection algorithm and monitors for unusual orders. The input is the order information obtained in step 3 and the user's activity log, which is compared to typical order patterns. The output is the fraud detection result, and an alert is issued if necessary.

[0623] (Application Example 1)

[0624] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0625] Traditional online shopping systems suffer from usability issues, as users are required to perform numerous steps to find products and complete the ordering process. Furthermore, there is a risk of fraudulent orders. There is a need to solve these problems and provide a highly accurate and secure purchasing experience.

[0626] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0627] In this invention, the server includes processing means for recognizing voice input with high accuracy, processing means for extracting order-related data from the recognized voice information, and support means for quickly identifying products based on voice recognition and confirming them with the user. This allows users to smoothly order products using only their voice, while also detecting fraudulent orders and enabling a safe and comfortable purchasing experience.

[0628] "Voice input" refers to the process of recording a user's spoken words as digital data.

[0629] "Processing means" refers to a device or program that analyzes and calculates data such as audio information to achieve a specific purpose.

[0630] "Audio information" refers to audio data emitted by the user, which is recorded as an acoustic signal.

[0631] "Order data" refers to purchase information, including the products to be purchased and their quantities.

[0632] "Display means" refers to a device or apparatus that visually presents information to the user.

[0633] "Identity verification" is a procedure to confirm whether a user is a legitimate registered user.

[0634] "Authentication means" refers to a device or technology used to prove the authority or identity of a specific individual.

[0635] "Payment method" refers to the means and procedures for paying for a purchase.

[0636] A "payment processing method" is a system or function for processing the exchange of payment.

[0637] A "recommendation means" is a device or program for recommending products based on the user's preferences and history.

[0638] "Emotional analysis means" refers to a technology or program that determines a user's psychological state from their voice data.

[0639] "Fraud detection measures" are devices or systems that identify unusual or fraudulent patterns or behaviors.

[0640] "Support measures" refer to devices or technologies designed to assist with specific tasks or activities.

[0641] The voice recognition system in this invention is implemented as a smartphone application and streamlines online shopping through voice input by the user.

[0642] First, when the user speaks about the product into their smartphone's microphone, the device captures the audio as digital data. This audio data is then converted into text using speech recognition software. Speech recognition technologies such as the speech_recognition library are utilized in this process. Order-related information is extracted from the text data, identifying the product number and quantity.

[0643] Next, the server confirms the order information with the user via voice or text, and finalizes the order based on the user's response. After this confirmation, the server verifies the user's identity using the authentication means. Security is ensured by using voice pattern matching technology to compare the user's voice with registered voice data.

[0644] Subsequently, the server processes the payment, allowing the user to select their preferred payment method from several options. This process utilizes fintech technology to ensure a smooth transaction. Furthermore, the server analyzes past purchase history and suggests related products, providing users with new purchasing opportunities. Sentiment analysis technology is used to extract interest from the user's positive utterances and improve the accuracy of recommendations.

[0645] Furthermore, the server is equipped with a fraud detection algorithm that immediately issues a warning if it detects an abnormal order pattern, thereby preventing fraudulent transactions.

[0646] As a concrete example, when a user says "I'd like to order one of these sofas" on their smartphone, the system recognizes the voice and identifies the product number and quantity. A confirmation prompt is generated asking, "Product number 5678, one unit, is that correct?" After the user responds, identity verification and payment procedures are completed, and related products are suggested based on the user's past purchase history of similar items, such as "Would you also like this cushion?" This process allows users to easily complete orders using only their voice.

[0647] An example of a prompt for a generative AI model might be, "Design an application that allows users to order desired products by voice, and smoothly handle confirmation and payment."

[0648] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0649] Step 1:

[0650] The user speaks into their smartphone's microphone about the product they wish to purchase. The input is the user's voice data, which the device acquires as digital data. This voice data serves as basic information for processing in the next step.

[0651] Step 2:

[0652] The device converts the acquired audio data into text data using the speech_recognition library. The input is audio data, and phoneme analysis is performed by the speech recognition engine to obtain the output as text containing the order details. This text data contains the order information intended by the user.

[0653] Step 3:

[0654] The terminal analyzes order information from the text data and extracts the product number and quantity. The input is the text data obtained in step 2, and by using natural language processing technology to identify and extract the necessary data, the product number and quantity are obtained as output.

[0655] Step 4:

[0656] The server confirms the extracted order information with the user via voice or text. The input is the product information obtained in step 3, and the server generates a confirmation message, which is then delivered to the user as synthesized speech or text. This confirmation helps prevent input errors.

[0657] Step 5:

[0658] The server receives the user's response and decides whether to confirm the order based on the response. The input is either the user's voice response or text data, which is analyzed to make the final order confirmation. If the response is affirmative, the order is finalized.

[0659] Step 6:

[0660] The server performs user identity verification. Inputs include voice patterns and user registration information, and authentication methods are used to verify that the user is legitimate. The output is either an approval or rejection of the identity verification.

[0661] Step 7:

[0662] The server initiates the payment process, allowing the user to select their preferred payment method. The input consists of the order confirmation and the identity verification result, and the appropriate payment procedure is carried out for the selected method. The output is a status indicating that the transaction is complete.

[0663] Step 8:

[0664] The server analyzes past purchase history and suggests related products to the user. The input is the user's purchase history data, and appropriate products are selected based on database analysis. The output is a list of products suggested to the user.

[0665] Step 9:

[0666] The server monitors for abnormal order patterns using a fraud detection algorithm. The input is all order data, which is compared to pre-configured fraud patterns. If an anomaly is detected, an alert is issued.

[0667] This entire process allows users to shop easily and securely using voice commands.
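As a non-limiting illustration, the extraction in Step 3 and the confirmation message in Step 4 can be sketched as follows. The sketch assumes the recognized text follows a simple "product number N, quantity M" phrasing and uses a regular expression in place of the natural language processing the description mentions. The names `Order`, `extract_order`, and `confirmation_message` are hypothetical and do not appear in the specification.

```python
import re
from typing import NamedTuple, Optional


class Order(NamedTuple):
    """Structured order information extracted from recognized text."""
    product_number: str
    quantity: int


# Assumed phrasing: "... product number 123 ... quantity 2 ...".
# A real system would need a far more tolerant parser.
_ORDER_PATTERN = re.compile(
    r"(?:product|item)\s+number\s+(?P<number>\d+).*?"
    r"quantity\s+(?:is\s+|of\s+)?(?P<qty>\d+)",
    re.IGNORECASE,
)


def extract_order(text: str) -> Optional[Order]:
    """Return the product number and quantity, or None if not found."""
    match = _ORDER_PATTERN.search(text)
    if match is None:
        return None
    return Order(match.group("number"), int(match.group("qty")))


def confirmation_message(order: Order) -> str:
    """Build the message that is read back or shown to the user."""
    return (
        f"Is it correct that the item number is {order.product_number} "
        f"and the quantity is {order.quantity}?"
    )
```

Returning `None` when nothing matches lets the caller ask the user to repeat the order rather than guessing at missing values.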

[0668] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform the specific processing using the estimated emotions.

[0669] This invention relates to a system equipped with a program that accurately recognizes user voice input and extracts order information from the voice data. The server converts the voice data into text using a speech recognition engine and extracts the product number and quantity. This data is repeated to the user via a terminal for confirmation.

[0670] Identity verification is performed by the server using voice pattern matching technology, ensuring accurate user authentication through the authentication method. After authentication is complete, the terminal presents the user with multiple payment methods and allows them to select their preferred method. The selection information is sent to the server, and the appropriate payment processing is carried out.

[0671] A distinctive feature of this system is its emotion analysis method, which incorporates an emotion engine. The server analyzes the user's voice tone and speech content using the emotion engine to estimate the user's emotions. Based on this emotion information, the server generates an appropriate response that matches the user's mood, and the terminal presents that response to the user.

[0672] For example, if a user expresses dissatisfaction, the emotion engine instantly detects that negative emotion. Based on this information, the server generates a response such as, "Shall I explain more about this product?", providing an interaction that mitigates the negative emotion. The emotion engine also takes past emotional patterns into consideration, using this information to generate future responses that ensure users always have a positive experience.

[0673] Furthermore, this system uses fraud detection mechanisms to immediately issue alerts and notify the responsible personnel if abnormal order patterns are detected. This further enhances security.

[0674] Thus, the system of the present invention tightly integrates analysis, recognition, authentication, and payment, and provides a comprehensive platform that uses an emotion engine to improve the user experience.

[0675] The following describes the processing flow.

[0676] Step 1:

[0677] The user makes a verbal utterance indicating their intention to purchase. Specifically, they request a product by saying, "I want to buy this."

[0678] Step 2:

[0679] The device captures the user's speech as audio data and sends that data to the server.

[0680] Step 3:

[0681] The server inputs the transmitted voice data into a speech recognition engine and converts it into text. At this stage, order information, including product numbers and quantities, is extracted.

[0682] Step 4:

[0683] The terminal repeats the extracted order information to the user, asking for confirmation: "Is it correct that the item number is 123 and the quantity is 2?"

[0684] Step 5:

[0685] The user responds to the order confirmation with "yes" or "no". If the user answers "no", the device prompts them to re-enter the order details.

[0686] Step 6:

[0687] The server performs voice pattern analysis and verifies the user's identity by comparing it with existing user information.

[0688] Step 7:

[0689] Once identity verification is complete, the terminal presents the user with several payment options, displaying a message such as, "Please choose from credit card, bank transfer, or deferred payment."

[0690] Step 8:

[0691] The user selects their preferred payment method. This selection is sent to the server via the terminal, and the payment process begins.

[0692] Step 9:

[0693] The server uses an emotion engine to analyze the user's voice tone and assess their emotional state. Based on this information, it adjusts its next response.

[0694] Step 10:

[0695] The server checks the user's past emotional patterns and purchase history, and suggests related products and special offers. It displays a message like, "We also recommend this product," through the user's device.

[0696] Step 11:

[0697] The server activates a fraud detection algorithm and issues real-time alerts if any unusual orders are detected. This notification is then sent to the responsible person, enabling a swift response.
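As a non-limiting illustration, the branch taken in Step 5 (proceed on "yes", re-enter on "no") can be sketched as follows. The word lists and the action names `verify_identity`, `reenter_order`, and `repeat_confirmation` are assumptions made for this sketch and are not defined in the specification.

```python
from typing import Optional

# Hypothetical vocabularies; a deployed system would need a richer model.
YES_WORDS = {"yes", "yeah", "correct", "right"}
NO_WORDS = {"no", "nope", "not", "wrong", "incorrect"}


def interpret_confirmation(reply: str) -> Optional[bool]:
    """Return True for an affirmative reply, False for a negative one,
    and None when the reply cannot be classified."""
    words = {w.strip(".,!?").lower() for w in reply.split()}
    # Negatives are checked first so "not right" is treated as a refusal.
    if words & NO_WORDS:
        return False
    if words & YES_WORDS:
        return True
    return None


def next_action(reply: str) -> str:
    """Map the user's reply to the next step of the flow."""
    decision = interpret_confirmation(reply)
    if decision is True:
        return "verify_identity"      # continue to Step 6
    if decision is False:
        return "reenter_order"        # prompt the user to restate the order
    return "repeat_confirmation"      # ask the confirmation question again
```

The third outcome is a design choice of the sketch: an unclear reply repeats the question instead of being treated as a refusal.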

[0698] (Example 2)

[0699] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0700] In modern society, ordering systems using voice input are widespread, but voice recognition accuracy remains insufficient, as does the generation of responses suited to the user's emotions. Furthermore, from a security standpoint, mechanisms for detecting fraudulent orders are lacking. As a result, the user experience deteriorates and the reliability of the system is compromised.

[0701] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0702] In this invention, the server includes control means for recognizing voice input with high accuracy, response generation means for generating and presenting responses based on emotional information, and anomaly monitoring means for detecting fraudulent orders. This enables high-accuracy recognition of the user's voice, responses suited to the user's emotions during actual order processing, and prevention of fraudulent orders, thereby improving the user experience and the reliability of the system.

[0703] "Voice input" is a method for users to communicate information to a system using spoken language.

[0704] "High-precision recognition" refers to analyzing audio information, converting it into text or numbers, and minimizing errors.

[0705] "Control means" refers to devices or programs that have functions for inputting, analyzing, converting, and outputting various types of data.

[0706] "Order details" refers to specific purchase information, such as product numbers and quantities, extracted from the user's voice.

[0707] "Display means" refers to monitors and display devices used to present text and images to users.

[0708] "Authentication methods" refer to technologies and devices used to verify a user's personal information and identify them as that person.

[0709] A "payment management system" is a system that offers multiple payment methods and allows users to make payments using the method they choose.

[0710] A "suggestion method" refers to a device or program that has the function of suggesting products or services suitable for the user based on their past purchase history.

[0711] "Emotional evaluation methods" refer to technologies that analyze emotional elements from a user's voice or text data to determine their emotional state.

[0712] An "anomaly monitoring system" is a system that has the function of detecting patterns that differ from normal operation and warning of fraud or abnormalities.

[0713] A "response generation means" is a process or apparatus for constructing and presenting appropriate responses or messages based on analyzed data and emotional information.

[0714] The system for carrying out the present invention recognizes voice input and integrates order processing, user authentication, sentiment analysis, and anomaly monitoring based on that input. Specific embodiments are as follows.

[0715] First, the user uses a device to input their voice. This device can be a standard smartphone or tablet. When the user gives their order instructions by voice, this voice data is transmitted to the server via the internet through the device.

[0716] Next, the server uses speech recognition software (for example, a cloud-based API that provides speech recognition services) to convert the voice data into text data. A specific example is using a speech recognition service to convert the voice command "I want to order 3 iPhone 12s" into the text data "Product number iPhone 12, quantity 3".

[0717] The server then extracts order details such as product number and quantity from the converted text data. This information is sent to the terminal and repeated to the user via screen display and voice. The user reviews the information and, if necessary, re-enters it via voice input.

[0718] In the identity verification step, the server uses voice pattern matching technology to authenticate the user. Specifically, it analyzes the user's voice characteristics and matches them with pre-registered information to confirm their identity.
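As a non-limiting illustration, the matching of a user's voice characteristics against pre-registered information can be sketched as a similarity comparison between two feature vectors. The sketch assumes a separate feature extractor has already produced fixed-length numeric vectors. Cosine similarity and the threshold of 0.85 are assumptions chosen for this sketch, since the specification does not state which measure is used.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no voice information
    return dot / (norm_a * norm_b)


def verify_speaker(
    enrolled: Sequence[float],
    sample: Sequence[float],
    threshold: float = 0.85,  # assumed value; would be tuned on real data
) -> bool:
    """Approve the identity check when the sample is close to the enrolled voice."""
    return cosine_similarity(enrolled, sample) >= threshold
```

Raising the threshold reduces false acceptances at the cost of rejecting more legitimate users, so its value would be chosen according to the security requirements of the payment step.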

[0719] In addition, the server uses an emotion analysis engine to analyze the user's voice tone and speech content to determine the user's emotional state. Based on this emotional information, it generates responses that are empathetic to the user's mood, such as "How are you feeling today? Do you need any support?" This improves the user experience.

[0720] Furthermore, the server is equipped with an anomaly monitoring system that detects fraudulent and abnormal order patterns. When an anomaly occurs, it issues an alert and notifies the administrator to take appropriate action. In this way, the security of the entire system is maintained.
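As a non-limiting illustration, one simple form of anomaly monitoring flags an order whose quantity deviates strongly from the user's past orders. The z-score test, the threshold of 3.0, and the function name `is_anomalous_quantity` are assumptions of this sketch. The specification does not say which detection technique the anomaly monitoring system uses.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_anomalous_quantity(
    history: Sequence[int],
    quantity: int,
    z_threshold: float = 3.0,  # assumed cut-off
) -> bool:
    """Return True when `quantity` is far outside the user's usual range."""
    if len(history) < 2:
        return False  # too little history to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Every past order had the same quantity; any change is unusual.
        return quantity != mu
    return abs(quantity - mu) / sigma > z_threshold
```

An order flagged by such a check would then trigger the alert and administrator notification described above.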

[0721] A possible example of a specific prompt message would include: "Convert the user's voice input to text and extract the order details. Analyze the user's emotional state and generate a response based on that."

[0722] This system enables efficient voice-based order processing and provides users with a comfortable and safe user experience.

[0723] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0724] Step 1:

[0725] The user uses a terminal to place an order by voice. At this stage, the user's voice data is entered. The terminal receives this voice data and sends it to the server. This transmitted voice data becomes the basis for the next processing step.

[0726] Step 2:

[0727] The server converts the received audio data into text data using speech recognition software. This conversion process utilizes a cloud-based speech recognition API, which analyzes the speech to accurately convert it into text data. This converted text data is then used for subsequent data processing to extract order information.

[0728] Step 3:

[0729] The server performs text analysis to extract order information from the text data. This process involves applying natural language processing techniques to the text data to identify specific order details such as product numbers and quantities. The extracted order information is sent to the terminal, where the user is prompted to confirm it visually or audibly.

[0730] Step 4:

[0731] The user confirms the order information displayed on the terminal or repeated aloud. If the information confirmed by the user is correct, they send a "Confirmation Complete" input back to the server via the terminal. This response triggers the order process to proceed.

[0732] Step 5:

[0733] The server verifies the user's identity using authentication methods. Specifically, it uses voice pattern matching technology to compare the user's voice with registered voice data. If this matching is successful, user authentication is complete. This authentication is a crucial step for secure payments and protection of personal information.

[0734] Step 6:

[0735] The terminal presents the user with payment method options, including credit cards, e-money, and bank transfers, visually displaying a variety of payment methods to the user. The user selects their preferred payment method through the terminal, and the selection is sent to the server, where the payment process begins.

[0736] Step 7:

[0737] The server uses an emotion analysis engine to analyze the user's emotional state from the voice data. This process utilizes a generative AI model to estimate the user's emotions from the tone and content of the voice. The estimated emotional information then becomes input data for the server to generate a response and provide an appropriate dialogue.

[0738] Step 8:

[0739] The server generates a response to present to the user, taking into account emotional and order information. This response is considerate of the user's emotional state and may include questions such as "Is there anything else I can help you with?" or suggestions for support. The generated response is sent to the terminal and presented to the user.

[0740] Step 9:

[0741] The server utilizes a fraud detection system to monitor unusual orders. This system identifies deviations from normal order patterns and issues warnings if fraud is suspected. This ensures consistent and secure operation of the system.
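As a non-limiting illustration, the response generation in Step 8 can be sketched as combining order information with an emotion label. The description uses a generative AI model for this step. The sketch substitutes a fixed lookup table so that it stays self-contained, and the labels `negative`, `neutral`, and `positive` are assumptions rather than labels defined in the specification.

```python
# Hypothetical follow-up phrases keyed by an assumed emotion label.
FOLLOW_UPS = {
    "negative": "I'm sorry for the trouble. Shall I explain more about this product?",
    "neutral": "Is there anything else I can help you with?",
    "positive": "I'm glad you're satisfied. Is there anything else I can help you with?",
}


def generate_response(emotion: str, product_number: str, quantity: int) -> str:
    """Compose a reply that reflects both the order and the user's mood."""
    order_part = (
        f"Your order of {quantity} x product number {product_number} "
        "has been received."
    )
    # Unknown labels fall back to the neutral phrasing.
    follow_up = FOLLOW_UPS.get(emotion, FOLLOW_UPS["neutral"])
    return f"{order_part} {follow_up}"
```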

[0742] (Application Example 2)

[0743] Next, we will explain application example 2. In the following explanation, the data processing device 12 will be referred to as the "server," and the headset-type terminal 314 will be referred to as the "terminal."

[0744] With the advancement of information and communication technology, services utilizing voice recognition are becoming widespread. However, challenges remain, such as limited recognition accuracy when placing orders via voice input and inadequate handling of user dissatisfaction. Detecting fraudulent orders and offering diverse payment options also remain problematic. In particular, there is a need for flexible responses suited to users' emotions.

[0745] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[0746] In this invention, the server includes a programming means for recognizing voice input with high accuracy, an emotion analysis means, and a means for analyzing the user's feelings from their speech and generating special offers based on individual inquiries. This improves the accuracy of voice recognition and enables flexible responses suited to the user's emotions, providing the user with a comfortable ordering experience.

[0747] "A programming means for recognizing voice input with high accuracy" refers to software technology that converts the voice spoken by the user into a digital signal and precisely extracts its content as text data.

[0748] "Programming means for extracting order information" refers to software technology that identifies the items and quantities of the items the user wishes to purchase from recognized voice data and compiles them into an order.

[0749] "Display means" refers to devices or software that visually display information extracted by the system to the user, enabling them to confirm and select information.

[0750] "Authentication methods" refer to technologies that analyze voice patterns to verify the user's identity and guarantee secure transactions.

[0751] A "payment processing means" is a processing means for completing the payment process according to the payment method selected by the user.

[0752] "Recommendation methods" refer to algorithms and technologies used to recommend appropriate products and services to users based on their past purchase history.

[0753] "Emotional analysis means" refers to technology that analyzes the emotions contained in a user's voice and identifies the user's mental state.

[0754] "Fraud detection measures" refer to algorithms and technologies used to detect fraudulent or abnormal order behavior and ensure security.

[0755] "A means of analyzing user utterances to understand their feelings and generating special offers based on individual inquiries" refers to a method of analyzing the content of user utterances and the emotions contained therein, and providing special offers and information tailored to the user.

[0756] The system realizing this invention recognizes user voice input with high accuracy and processes order information based on it. The server converts the voice data into text format using a speech recognition engine and extracts the order details. The Google Cloud Speech-to-Text API is used for speech recognition to ensure high accuracy.

[0757] The server extracts order information from the voice input, which is then repeated back to the user via the terminal for confirmation. This confirmation process utilizes the terminal's display, allowing the user to visually verify the information.

[0758] For identity verification, the server uses voice pattern matching technology to perform voice authentication. This ensures highly accurate user authentication. This process plays a role in preventing unauthorized access and manipulation.

[0759] Next, regarding payment processing, the server proposes multiple payment methods to the user, and the payment is processed according to the method selected by the user. The payment processing is required to be secure and smooth, and various payment APIs are applied.

[0760] In emotion analysis, the server uses IBM Watson Tone Analyzer to analyze the user's voice tone and estimate their emotional state. Based on the analysis results, services and special offers tailored to the user's emotions are generated, improving user satisfaction.

[0761] The fraud detection mechanism allows the server to immediately send a notification to the responsible person if it detects an abnormal order pattern. This improves the overall security of the system.

[0762] For example, if a user says, "I'm tired today, so I'd like to order a pizza," and sentiment analysis determines that the user is feeling "tired," then an offer such as "Would you like a smoothie with your pizza?" will be generated.

[0763] An example of a prompt message might be, "If the user is in a mood for relaxation, suggest a seasonal drink."
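As a non-limiting illustration, a prompt of this kind can be assembled from the user's utterance and the emotion analysis result. The `EmotionEstimate` structure, the 0.0 to 1.0 score range, and the `min_score` cut-off are assumptions of this sketch. The specification states only that an emotion label and score are obtained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionEstimate:
    """Result of emotion analysis (hypothetical structure)."""
    label: str    # e.g. "tired"
    score: float  # assumed confidence in the range 0.0 to 1.0


def build_offer_prompt(
    utterance: str,
    emotion: EmotionEstimate,
    min_score: float = 0.5,  # assumed cut-off below which the mood is ignored
) -> str:
    """Build the text prompt passed to a generative AI model."""
    lines = [f'The user said: "{utterance}".']
    if emotion.score >= min_score:
        lines.append(
            f"The user appears to be {emotion.label} "
            f"(confidence {emotion.score:.2f})."
        )
        lines.append("Suggest one additional item that suits this mood.")
    else:
        lines.append("The user's mood is unclear; do not make a mood-based offer.")
    return "\n".join(lines)
```

Suppressing the offer when the score is low avoids presenting a mood-based suggestion on the strength of an unreliable estimate.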

[0764] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[0765] Step 1:

[0766] Acquiring voice input

[0767] The user places an order by voice into the terminal. This voice data is recorded on the terminal and sent to the server. The input is the user's voice, and the output is digital voice data. The terminal performs signal processing, cutting out silent portions and compressing the audio before transmission.

[0768] Step 2:

[0769] Speech recognition processing

[0770] The server converts the received audio data into text using the Google Cloud Speech-to-Text API. The input is digital audio data, and the output is extracted text data. In this step, the server performs audio preprocessing, such as noise filtering, to improve recognition accuracy.

[0771] Step 3:

[0772] Extraction of order information

[0773] The server extracts order information, such as product names and quantities, from the converted text. The input is text data, and the output is structured order information. Keyword matching and natural language processing techniques are used to separate relevant information within the text.

[0774] Step 4:

[0775] Identity verification through voice pattern matching

[0776] The server analyzes voice patterns and performs identity verification by matching them with pre-registered voices. The input is voice feature data, and the output is the authentication result. Here, voice feature extraction technology is used, and the authentication process is performed by a machine learning model.

[0777] Step 5:

[0778] Emotion analysis

[0779] The server uses IBM Watson Tone Analyzer to estimate the user's emotional state from their speech. The input is text data of the user's utterances, and the output is an emotional label and score. The server analyzes the nuances of the utterances and processes the data to infer the user's psychological state.

[0780] Step 6:

[0781] Confirmation display and option presentation

[0782] The server extracts order information and sends it to the terminal, which then displays it on its screen for the user to confirm. The input is the order information, and the output is the display screen. The terminal uses a caching mechanism to optimize response speed.

[0783] Step 7:

[0784] Generating and presenting special offers

[0785] Based on the sentiment analysis results, the server generates a special offer tailored to the user's mood and sends it to the terminal. The input is a sentiment label, and the output is special offer data. The server uses a generative AI model to create a prompt message and notifies the user of relevant information.

[0786] Step 8:

[0787] Payment processing

[0788] The server processes the payment based on the payment method selected by the user. The input is the selected payment method, and the output is a payment completion notification. The server uses encryption technology so that transactions are executed securely and quickly.

[0789] Step 9:

[0790] Fraud detection and notification

[0791] The server employs fraud detection measures throughout the entire process and immediately sends an alert to the responsible person if an abnormal order pattern is detected. The input is order pattern data, and the output is a warning notification. The server monitors patterns using machine learning algorithms and performs actions to detect anomalies early.
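As a non-limiting illustration, the removal of silent portions that the terminal performs in Step 1 can be sketched as follows. The sketch assumes the audio is a sequence of signed 16-bit PCM sample values and trims only the leading and trailing silence. The amplitude threshold of 500 and the function name `trim_silence` are assumptions made for this sketch.

```python
from typing import List, Sequence


def trim_silence(samples: Sequence[int], threshold: int = 500) -> List[int]:
    """Drop leading and trailing samples whose absolute amplitude is
    below `threshold`; quiet samples between loud ones are kept."""
    start = 0
    end = len(samples)
    # Advance past quiet samples at the beginning.
    while start < end and abs(samples[start]) < threshold:
        start += 1
    # Step back over quiet samples at the end.
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return list(samples[start:end])
```

Keeping the quiet samples inside the utterance preserves natural pauses between words, which speech recognition engines generally rely on.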

[0792] The specific processing unit 290 transmits the result of the specific processing to the headset-type terminal 314. In the headset-type terminal 314, the control unit 46A causes the speaker 240 and display 343 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[0793] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of the data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. The data generation model 58 receives as input a prompt containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompt, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[0794] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the headset-type terminal 314.

[0795] [Fourth Embodiment]

[0796] Figure 7 shows an example of the configuration of the data processing system 410 according to the fourth embodiment.

[0797] As shown in Figure 7, the data processing system 410 includes a data processing device 12 and a robot 414. An example of the data processing device 12 is a server.

[0798] The data processing device 12 comprises a computer 22, a database 24, and a communication interface 26. The computer 22 is an example of a "computer" related to the technology of this disclosure. The computer 22 comprises a processor 28, RAM 30, and storage 32. The processor 28, RAM 30, and storage 32 are connected to a bus 34. The database 24 and the communication interface 26 are also connected to the bus 34. The communication interface 26 is connected to a network 54. An example of the network 54 is a WAN (Wide Area Network) and / or a LAN (Local Area Network).

[0799] The robot 414 includes a computer 36, a microphone 238, a speaker 240, a camera 42, a communication interface 44, and a controlled object 443. The computer 36 includes a processor 46, RAM 48, and storage 50. The processor 46, RAM 48, and storage 50 are connected to a bus 52. The microphone 238, speaker 240, camera 42, and controlled object 443 are also connected to the bus 52.

[0800] The microphone 238 receives instructions from the user 20 by capturing the voice of the user 20. The microphone 238 converts the captured voice into audio data and outputs it to the processor 46. The speaker 240 outputs audio according to the instructions from the processor 46.

[0801] Camera 42 is a small digital camera equipped with an optical system including a lens, aperture, and shutter, and an image sensor such as a CMOS (Complementary Metal-Oxide-Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor. Camera 42 captures images of the area around the user 20 (for example, an imaging range defined by a field of view equivalent to the width of a typical healthy person's field of vision).

[0802] Communication interface 44 is connected to network 54. Communication interfaces 44 and 26 are responsible for the exchange of various information between processor 46 and processor 28 via network 54. The exchange of various information between processor 46 and processor 28 using communication interfaces 44 and 26 is performed in a secure manner.

[0803] The controlled object 443 includes a display device, LEDs in the eyes, and motors that drive the arms, hands, and feet. The posture and gestures of the robot 414 are controlled by controlling the motors of the arms, hands, and feet. Some of the robot 414's emotions can be expressed by controlling these motors. Furthermore, the robot 414's facial expressions can also be expressed by controlling the illumination state of the LEDs in its eyes.

[0804] Figure 8 shows an example of the main functions of the data processing device 12 and the robot 414. As shown in Figure 8, the data processing device 12 performs specific processing using the processor 28. The storage 32 stores the specific processing program 56.

[0805] The specific processing program 56 is an example of a "program" relating to the technology of this disclosure. The processor 28 reads the specific processing program 56 from the storage 32 and executes the read specific processing program 56 on the RAM 30. The specific processing is realized by the processor 28 operating as a specific processing unit 290 in accordance with the specific processing program 56 executed on the RAM 30.

[0806] The storage 32 stores the data generation model 58 and the emotion identification model 59. The data generation model 58 and the emotion identification model 59 are used by the specific processing unit 290.

[0807] In robot 414, the processor 46 performs the reception output processing. The storage 50 stores the reception output program 60. The processor 46 reads the reception output program 60 from the storage 50 and executes the read reception output program 60 on the RAM 48. The reception output processing is realized by the processor 46 operating as a control unit 46A according to the reception output program 60 executed on the RAM 48.

[0808] Next, the specific processing performed by the specific processing unit 290 of the data processing device 12 will be described. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0809] This invention is a system that utilizes voice recognition technology to streamline television shopping orders and provide users with a smooth and secure purchasing experience. The operation of the entire system is described below.

[0810] The server first captures the user's voice input with high accuracy and converts the voice into text data using a speech recognition engine. At this stage, information such as product numbers and quantities related to the order is extracted. The terminal repeats the extracted order information to the user for confirmation. This prevents input errors.

[0811] Subsequently, the server performs voice pattern matching and verifies the user's identity by comparing it with their registration information. Once verified, the user selects their preferred payment method through their terminal. This system supports multiple payment methods, allowing for seamless transaction completion.

[0812] Furthermore, the server analyzes the user's past purchase history and suggests related products, providing the user with new purchasing opportunities. It also analyzes the user's emotions from their speech and, if positive feedback is received, employs an approach to encourage the purchase of suggested products.

[0813] Furthermore, the server employs a fraud detection algorithm to monitor unusual order patterns and immediately issues an alert if there are any signs of fraud. This further enhances security.

[0814] As a concrete example, when a user says, "This product looks interesting," the server captures the statement using speech recognition and identifies product number 789 as order data. The terminal then confirms, "Is one unit of product number 789 correct?" and the order is confirmed when the user responds, "Yes." After a verification process, the user selects credit card payment and the transaction is completed. Furthermore, if the user has a history of purchasing similar products in the past, the server suggests related products by saying, "How about this product as well?", naturally continuing the conversation. In this way, the present invention allows users to enjoy a safe and comfortable shopping experience.

[0815] The following describes the processing flow.

[0816] Step 1:

[0817] The user begins talking about the product they want to buy. Specifically, they state their order by voice, such as "I want this product."

[0818] Step 2:

[0819] The device captures the user's speech as audio data and sends that data to the server.

[0820] Step 3:

[0821] The server passes the received audio data to a speech recognition engine, which converts it into text. This extracts the product numbers and quantities needed for the order.

[0822] Step 4:

[0823] The terminal repeats the extracted order information to the user, confirming, for example, "Is it correct that the item number is 456 and the quantity is 2?"

[0824] Step 5:

[0825] The user responds with "yes" or "no" to the repeated order details.

[0826] Step 6:

[0827] The server performs voice pattern matching and verifies the user's identity by comparing it with user information.

[0828] Step 7:

[0829] Once the user's identity has been verified, the terminal presents the user with multiple payment methods and prompts them to choose, displaying a message such as "Please choose from credit card, bank transfer, or deferred payment."

[0830] Step 8:

[0831] The user selects their preferred payment method, and the device sends this selection information to the server.

[0832] Step 9:

[0833] The server analyzes the user's past purchase history and suggests additional related products, displaying a message such as "We also recommend this product."

[0834] Step 10:

[0835] The server performs sentiment analysis, reading the user's emotions from their utterances and adjusting the response accordingly. For example, if the user is happy, it might generate a response such as "I'm glad you're satisfied."

[0836] Step 11:

[0837] The server monitors all order data and checks for any fraudulent order patterns. If an anomaly is detected, an alert is immediately issued and the responsible person is notified.

[0838] (Example 1)

[0839] Next, we will describe Example 1. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0840] Conventional transaction systems that use voice recognition are prone to erroneous orders because of low voice-input accuracy, and their payment and identity verification procedures are complex. Furthermore, suggestions based on purchase history are not sufficiently effective, so the user's purchasing experience does not improve. In addition, fraudulent orders are not detected accurately enough, which poses a security risk.

[0841] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 1 is realized by the following means.

[0842] In this invention, the server includes processing means for acquiring voice input with high accuracy, programming means for converting the acquired voice data into text data, and information processing means for extracting order information from the converted voice data. This enables the acquisition of voice input with high accuracy and the generation of accurate order information.

[0843] "Processing means for acquiring voice input with high accuracy" refers to a function that captures voice under good recording conditions and removes noise and external interference to maintain sound quality.

[0844] "Programming means for converting the acquired voice data into text data" refers to a program that converts human speech into digital text information in a format that a computer can process.

[0845] "Information processing means for extracting order information from the converted voice data" refers to a function that analyzes the text converted from the voice data and identifies order-related information such as product number and quantity.

[0846] A "customer confirmation method via voice output" is a means of prompting a user to confirm their order by playing back the details of their order in audio.

[0847] "An identification method that uses voice data to verify identity" refers to a function that analyzes the characteristics of the voice and compares it with existing registration information to verify the user's identity.

[0848] "A payment selection method that allows users to choose from multiple different payment methods" refers to a function that allows users to select from various payment methods when making a transaction.

[0849] A "product suggestion method that proposes related products based on past purchase history" is a function that suggests new related products to a user based on their purchase history.

[0850] "An emotion analysis tool for analyzing user emotions" is a function that determines the user's emotions from their voice or speech and adjusts its response accordingly.

[0851] An "anomaly detection mechanism that detects abnormal orders and issues warnings" is a function that detects suspicious activity that deviates from normal order patterns and issues warnings as necessary.

[0852] This invention is a system that utilizes speech recognition technology to enable smooth transactions using the user's voice. The server, terminal, and user all work together, progressing through the following steps.

[0853] The server uses high-performance microphones and voice input devices to acquire voice input with high accuracy. After acquiring the voice data, a speech recognition engine (e.g., Google Cloud Speech-to-Text) is used to convert the voice into text data. The generative AI model used in this conversion achieves highly accurate and natural speech recognition. In this process, it is important to minimize ambient noise by utilizing noise cancellation technology and acoustic models.

[0854] From the converted text data, the server uses regular expressions and natural language processing techniques to identify order information such as product numbers and quantities. The server then reads the extracted information back to the user by voice via the terminal for confirmation. This confirmation uses speech synthesis technology and a natural conversational style, such as, "Is it correct to order one item of product number 789?"
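The extraction and read-back described in this paragraph can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: it assumes English utterances of the form "product number 789" with an optional quantity, the regular expressions and the names `extract_order` and `confirmation_message` are hypothetical, and a quantity of 1 is assumed when none is spoken.

```python
import re
from typing import NamedTuple, Optional


class Order(NamedTuple):
    product_number: str
    quantity: int


# Hypothetical phrasings: "product number 789" or "item number 789".
_PRODUCT_RE = re.compile(r"(?:product|item)\s+number\s+(\d+)", re.IGNORECASE)
# Hypothetical phrasings: "quantity 2", "quantity is 2", "3 units", "3 pieces".
_QUANTITY_RE = re.compile(
    r"(?:quantity\s+(?:is\s+|of\s+)?(\d+)|(\d+)\s+(?:units?|pieces?))",
    re.IGNORECASE,
)


def extract_order(text: str) -> Optional[Order]:
    """Return the product number and quantity found in recognized text."""
    product = _PRODUCT_RE.search(text)
    if product is None:
        return None
    # Search for the quantity outside the product-number match so the
    # product number itself is never read as a quantity.
    rest = text[:product.start()] + " " + text[product.end():]
    match = _QUANTITY_RE.search(rest)
    quantity = int(match.group(1) or match.group(2)) if match else 1  # assumed default
    return Order(product.group(1), quantity)


def confirmation_message(order: Order) -> str:
    """Build the sentence that the terminal would read back to the user."""
    unit = "unit" if order.quantity == 1 else "units"
    return (
        f"Is it correct to order {order.quantity} {unit} "
        f"of product number {order.product_number}?"
    )
```

In a deployed system the regular expressions would likely be only a first pass, with a natural language processing step handling phrasings that the patterns miss.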

[0855] In the identity verification step, the server analyzes the voice pattern and uses identification technology to compare it with registered information. This allows for highly accurate verification of the user's identity. After authentication is complete, the user can choose from multiple payment methods through the terminal, and interfaces for credit cards and electronic money are provided.
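As a rough sketch of the comparison with registered information, the following assumes that an upstream model (not shown, and not specified in this description) has already converted each voice sample into a fixed-length feature vector. The use of cosine similarity, the 0.8 threshold, and the names `cosine_similarity` and `verify_speaker` are illustrative assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify_speaker(
    sample: Sequence[float],
    enrolled: Sequence[float],
    threshold: float = 0.8,  # assumed value; would be tuned on real data
) -> bool:
    """Accept the speaker when the sample is close enough to the enrolled profile."""
    return cosine_similarity(sample, enrolled) >= threshold
```

The threshold trades false acceptances against false rejections, so a payment flow would probably choose it more conservatively than a flow that only personalizes suggestions.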

[0856] Furthermore, the server uses machine learning algorithms to analyze past purchase history and suggest related products. For example, if a user asks, "What products do you recommend?", it can suggest new products that take into account the trends of products purchased in the past.
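The description does not name a particular machine learning algorithm, so the sketch below substitutes a simple co-occurrence count as a stand-in: products that other orders contain alongside the user's past purchases are ranked higher. The function name `recommend` and the data layout (each order as a set of product identifiers) are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Set


def recommend(
    user_history: Set[str],
    all_orders: Iterable[Set[str]],
    top_n: int = 3,
) -> List[str]:
    """Rank products that co-occur with the user's past purchases."""
    scores: Counter = Counter()
    for basket in all_orders:
        overlap = basket & user_history
        if not overlap:
            continue  # this order shares nothing with the user's history
        for product in basket - user_history:
            scores[product] += len(overlap)
    # Sort by score (descending), then by product id for a stable result.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [product for product, _ in ranked[:top_n]]
```

For a user whose history contains only a sofa, orders that pair sofas with cushions would push cushions to the top of the list.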

[0857] Finally, the server uses fraud detection algorithms to monitor for abnormal order patterns in real time and issue alerts to administrators as needed. This significantly improves the overall security of the system.
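One simple way to flag an abnormal order is to compare a numeric feature, such as the ordered quantity, against the user's own history. The sketch below uses a z-score test; the choice of statistic, the threshold of 3.0, and the name `is_anomalous` are assumptions made for illustration and are not taken from this description.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_anomalous(
    value: float,
    history: Sequence[float],
    z_threshold: float = 3.0,  # assumed cut-off
) -> bool:
    """Flag a value that lies far from the mean of past observations."""
    if len(history) < 2:
        return False  # too little data to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # All past values were identical, so any different value is unusual.
        return value != mu
    return abs(value - mu) / sigma > z_threshold
```

A real-time monitor would call such a check for each incoming order and notify an administrator when it returns `True`.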

[0858] Examples of prompt messages include the following:

[0859] "I would like to order one of these items."

[0860] "I'll pay by credit card."

[0861] "Tell me your recommended products."

[0862] Through these steps, the present invention provides users with an efficient and secure purchasing experience.

[0863] The flow of the specific processing in Example 1 will be explained using Figure 11.

[0864] Step 1:

[0865] The server receives voice input from the user. The input is the user's speech, and the server uses a high-performance voice input device to obtain clear voice data while removing ambient noise. The output is digitized voice data.

[0866] Step 2:

[0867] The server uses a speech recognition engine to convert the acquired audio data into text data. The input is the audio data from step 1, which is then converted into text information using an AI model. The output is the user's spoken content as text data.

[0868] Step 3:

[0869] The server extracts order information from the converted text data. The input is the text data obtained in step 2. Regular expressions or natural language processing techniques are used to identify order-related information such as product numbers and quantities. The output is a dataset of order information.

[0870] Step 4:

[0871] The terminal confirms the order information received from the server with the user via voice. The input is the order information from step 3, and speech synthesis technology is used to generate a confirmation voice in a natural conversational format, such as "Is it okay to order item number 789, one unit?". The output is a voice output to the user.

[0872] Step 5:

[0873] The server verifies the user's identity using their voice pattern. The input is the voice data obtained in step 1, which is identified by comparing it with a registered voice profile. The output is the identity verification result.

[0874] Step 6:

[0875] The user selects their preferred payment method from a list of options displayed on the terminal. The input is a list of payment methods displayed on the terminal, from which the user makes their selection. The output is the selected payment method.

[0876] Step 7:

[0877] The server analyzes past purchase history and suggests related products. The input is the user's purchase history data, and a machine learning algorithm is used to select relevant products. The output is information on the suggested products.

[0878] Step 8:

[0879] The server runs a fraud detection algorithm and monitors for unusual orders. The input is the order information obtained in step 3 and the user's activity log, which is compared to typical order patterns. The output is the fraud detection result, and an alert is issued if necessary.
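The inputs and outputs listed for Steps 1 to 8 suggest a linear pipeline in which each stage consumes the output of an earlier one. The sketch below wires such a pipeline together from injected callables; every name in it (`process_order`, `OrderResult`, the status strings, and the stage parameters) is hypothetical, and holding a flagged order for review is an assumption about what issuing an alert might entail.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Order = Tuple[str, int]  # (product number, quantity)


@dataclass
class OrderResult:
    status: str
    order: Optional[Order] = None
    payment_method: Optional[str] = None
    suggestions: List[str] = field(default_factory=list)
    fraud_alert: bool = False


def process_order(
    audio: bytes,  # Step 1 output: digitized voice data
    *,
    recognize: Callable[[bytes], str],
    extract: Callable[[str], Optional[Order]],
    confirm: Callable[[Order], bool],
    verify: Callable[[bytes], bool],
    choose_payment: Callable[[], str],
    recommend: Callable[[Order], List[str]],
    detect_fraud: Callable[[Order], bool],
) -> OrderResult:
    text = recognize(audio)            # Step 2: voice data to text
    order = extract(text)              # Step 3: text to order information
    if order is None:
        return OrderResult(status="no_order_found")
    if not confirm(order):             # Step 4: read back and confirm
        return OrderResult(status="not_confirmed", order=order)
    if not verify(audio):              # Step 5: voice-based identity check
        return OrderResult(status="identity_rejected", order=order)
    payment = choose_payment()         # Step 6: user picks a payment method
    suggestions = recommend(order)     # Step 7: related products
    alert = detect_fraud(order)        # Step 8: fraud check
    status = "held_for_review" if alert else "completed"  # assumed policy
    return OrderResult(status, order, payment, suggestions, alert)
```

Passing the stages in as callables keeps the flow testable without a microphone, a recognition service, or a payment provider.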

[0880] (Application Example 1)

[0881] Next, we will explain Application Example 1. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0882] Traditional online shopping systems suffer from usability issues, as users are required to perform numerous steps to find products and complete the ordering process. Furthermore, there is a risk of fraudulent orders. There is a need to solve these problems and provide a highly accurate and secure purchasing experience.

[0883] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 1 is realized by the following means.

[0884] In this invention, the server includes processing means for recognizing voice input with high accuracy, processing means for extracting order-related data from the recognized voice information, and support means for quickly identifying products based on voice recognition and confirming them with the user. This allows users to smoothly order products using only their voice, while also detecting fraudulent orders and enabling a safe and comfortable purchasing experience.

[0885] "Voice input" refers to the process of recording a user's spoken words as digital data.

[0886] "Processing means" refers to a device or program that analyzes and calculates data such as audio information to achieve a specific purpose.

[0887] "Audio information" refers to audio data emitted by the user, which is recorded as an acoustic signal.

[0888] "Order data" refers to purchase information, including the products to be purchased and their quantities.

[0889] "Display means" refers to a device or apparatus that visually presents information to the user.

[0890] "Identity verification" is a procedure to confirm whether a user is a legitimate registered user.

[0891] "Authentication means" refers to a device or technology used to prove the authority or identity of a specific individual.

[0892] "Payment method" refers to the means and procedures for paying for a purchase.

[0893] A "payment processing method" is a system or function for processing the exchange of payment.

[0894] A "recommendation means" is a device or program for recommending products based on the user's preferences and history.

[0895] "Emotional analysis means" refers to a technology or program that determines a user's psychological state from their voice data.

[0896] "Fraud detection measures" are devices or systems that identify unusual or fraudulent patterns or behaviors.

[0897] "Support measures" refer to devices or technologies designed to assist with specific tasks or activities.

[0898] The voice recognition system in this invention is implemented as a smartphone application and streamlines online shopping through voice input by the user.

[0899] First, when the user speaks about the product into their smartphone's microphone, the device captures the audio as digital data. This audio data is then converted into text using speech recognition software. Speech recognition technologies such as the speech_recognition library are utilized in this process. Order-related information is extracted from the text data, identifying the product number and quantity.
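A minimal sketch of the conversion step with the `speech_recognition` library (the SpeechRecognition package) might look like the following. It assumes the audio has already been saved as a WAV file and uses the library's `recognize_google` method, which sends the audio to Google's Web Speech API and therefore needs network access. The function name `transcribe_file` and the `sr_module` parameter are hypothetical; the parameter exists only so that the function can be exercised without the library installed.

```python
from typing import Optional


def transcribe_file(
    path: str,
    language: str = "en-US",
    sr_module=None,  # hypothetical hook for substituting a stand-in module
) -> Optional[str]:
    """Return the recognized text, or None if the speech was unintelligible."""
    if sr_module is None:
        # Third-party dependency: pip install SpeechRecognition
        import speech_recognition as sr_module

    recognizer = sr_module.Recognizer()
    with sr_module.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr_module.UnknownValueError:
        # Raised by the library when no words could be recognized.
        return None
```

Network or service failures surface as the library's `RequestError`, which this sketch deliberately leaves to the caller to handle.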

[0900] Next, the server confirms the order information with the user via voice or text, and finalizes the order based on the user's response. After this confirmation, the server verifies the user's identity using the authentication means. Security is ensured by applying voice pattern matching technology to compare the user's voice with registered voice data.

[0901] Subsequently, the server processes the payment, allowing the user to select their preferred payment method from several options. This process utilizes fintech technology to ensure a smooth transaction. Furthermore, the server analyzes past purchase history and suggests related products, providing users with new purchasing opportunities. Sentiment analysis technology is used to extract interest from the user's positive utterances and improve the accuracy of recommendations.

[0902] Furthermore, the server is equipped with a fraud detection algorithm that immediately issues a warning if it detects an abnormal order pattern, thereby preventing fraudulent transactions.

[0903] As a concrete example, when a user says "I'd like to order one of these sofas" on their smartphone, the system recognizes the voice and identifies the product number and quantity. A confirmation prompt is generated asking, "Product number 5678, one unit, is that correct?" After the user responds, identity verification and payment procedures are completed, and related products are suggested based on the user's past purchase history of similar items, such as "Would you also like this cushion?" This process allows users to easily complete orders using only their voice.

[0904] An example of a prompt for a generative AI model might be, "Design an application that allows users to order desired products by voice, and smoothly handle confirmation and payment."

[0905] The flow of a specific process in Application Example 1 will be explained using Figure 12.

[0906] Step 1:

[0907] The user speaks into their smartphone's microphone about the product they wish to purchase. The input is the user's voice data, which the device acquires as digital data. This voice data serves as basic information for processing in the next step.

[0908] Step 2:

[0909] The device converts the acquired audio data into text data using the speech_recognition library. The input is audio data, and phoneme analysis is performed by the speech recognition engine to obtain the output as text containing the order details. This text data contains the order information intended by the user.

[0910] Step 3:

[0911] The terminal analyzes order information from the text data and extracts the product number and quantity. The input is the text data obtained in step 2, and by using natural language processing technology to identify and extract the necessary data, the product number and quantity are obtained as output.

[0912] Step 4:

[0913] The server confirms the extracted order information with the user via voice or text. The input is the product information obtained in step 3, and the server generates a confirmation message, which is then delivered to the user as synthesized speech or text. This confirmation helps prevent input errors.

[0914] Step 5:

[0915] The server receives the user's response and decides whether to confirm the order based on it. The input is the user's voice response or text data, which is analyzed to make the final decision. If the response is affirmative, the order is finalized.

[0916] Step 6:

[0917] The server performs user identity verification. Inputs include voice patterns and user registration information, and authentication methods are used to verify that the user is legitimate. The output is either an approval or rejection of the identity verification.

[0918] Step 7:

[0919] The server initiates the payment process, allowing the user to select their preferred payment method. The input consists of order confirmation and identity verification information, and fintech technology is used to perform the appropriate payment procedure. The output is a status indicating that the transaction is complete.

[0920] Step 8:

[0921] The server analyzes past purchase history and suggests related products to the user. The input is the user's purchase history data, and appropriate products are selected based on database analysis. The output is a list of products suggested to the user.

[0922] Step 9:

[0923] The server monitors for abnormal order patterns using a fraud detection algorithm. The input is all order data, which is compared to pre-configured fraud patterns. If an anomaly is detected, an alert is issued.

[0924] This entire process allows users to shop easily and securely using voice commands.
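The comparison against pre-configured fraud patterns in Step 9 can be pictured as a table of named rules applied to each order. In the sketch below the record fields, the rule names, and every threshold are invented for illustration; the description does not specify which patterns are configured.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class OrderRecord:
    user_id: str
    product_number: str
    quantity: int
    total_amount: float
    orders_last_hour: int


# Hypothetical pre-configured patterns; thresholds are placeholders.
FRAUD_RULES: Dict[str, Callable[[OrderRecord], bool]] = {
    "bulk_quantity": lambda o: o.quantity >= 50,
    "high_amount": lambda o: o.total_amount >= 500_000,
    "rapid_repeat": lambda o: o.orders_last_hour >= 10,
}


def matched_fraud_patterns(
    order: OrderRecord,
    rules: Dict[str, Callable[[OrderRecord], bool]] = FRAUD_RULES,
) -> List[str]:
    """Return the names of all configured patterns that the order matches."""
    return [name for name, rule in rules.items() if rule(order)]
```

An empty result means no alert; a non-empty result gives the responsible person the reason for the alert along with the order.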

[0925] Furthermore, an emotion engine that estimates the user's emotions may be incorporated. That is, the specific processing unit 290 may use the emotion identification model 59 to estimate the user's emotions and perform specific processing using the user's emotions.

[0926] This invention relates to a system equipped with a program that accurately recognizes user voice input and extracts order information from the voice data. The server converts the voice data into text using a speech recognition engine and extracts the product number and quantity. This data is repeated to the user via a terminal for confirmation.

[0927] Identity verification is performed by the server using voice pattern matching technology, ensuring accurate user authentication through the authentication method. After authentication is complete, the terminal presents the user with multiple payment methods and allows them to select their preferred method. The selection information is sent to the server, and the appropriate payment processing is carried out.

[0928] A distinctive feature of this system is its emotion analysis method, which incorporates an emotion engine. The server analyzes the user's voice tone and speech content using the emotion engine to estimate the user's emotions. Based on this emotion information, the server generates an appropriate response that matches the user's mood, and the terminal presents that response to the user.

[0929] For example, if a user expresses dissatisfaction, the emotion engine instantly detects that negative emotion. Based on this information, the server generates a response such as, "Shall I explain more about this product?", providing an interaction that mitigates the negative emotion. The emotion engine also takes past emotional patterns into consideration, using this information to generate future responses that ensure users always have a positive experience.
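How past emotional patterns might influence the next response can be sketched with a short rolling history. The following assumes that the emotion engine has already labeled each utterance as "positive", "negative", or "neutral"; the class `EmotionTracker`, the window size, and the tie-breaking rule are assumptions, while the response sentences are taken from examples in this description.

```python
from collections import Counter, deque

RESPONSES = {
    "negative": "Shall I explain more about this product?",
    "positive": "I'm glad you're satisfied.",
    "neutral": "Is there anything else I can help you with?",
}


class EmotionTracker:
    """Keeps the most recent emotion labels for one user."""

    def __init__(self, window: int = 5) -> None:
        self._history: deque = deque(maxlen=window)

    def observe(self, emotion: str) -> None:
        self._history.append(emotion)

    def dominant(self) -> str:
        """Most frequent recent label; the most recent one wins ties."""
        if not self._history:
            return "neutral"
        counts = Counter(self._history)
        best = max(counts.values())
        return next(e for e in reversed(self._history) if counts[e] == best)


def choose_response(current: str, tracker: EmotionTracker) -> str:
    """Pick a response from the current emotion and the recent trend."""
    tracker.observe(current)
    # A negative utterance is addressed at once; otherwise follow the trend.
    key = "negative" if current == "negative" else tracker.dominant()
    return RESPONSES.get(key, RESPONSES["neutral"])
```

With this rule, a user who has been dissatisfied in recent turns continues to receive the mitigating response even after one neutral utterance.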

[0930] Furthermore, this system uses fraud detection mechanisms to immediately issue alerts and notify the responsible personnel if abnormal order patterns are detected. This further enhances security.

[0931] Thus, the system of the present invention tightly integrates analysis, recognition, authentication, and payment, and provides a comprehensive platform for improving the user experience through an emotion engine.

[0932] The following describes the processing flow.

[0933] Step 1:

[0934] The user makes a verbal utterance indicating their intention to purchase. Specifically, they request a product by saying, "I want to buy this."

[0935] Step 2:

[0936] The device captures the user's speech as audio data and sends that data to the server.

[0937] Step 3:

[0938] The server inputs the transmitted voice data into a speech recognition engine and converts it into text. At this stage, order information, including product numbers and quantities, is extracted.

[0939] Step 4:

[0940] The terminal repeats the extracted order information to the user, asking for confirmation: "Is it correct that the item number is 123 and the quantity is 2?"

[0941] Step 5:

[0942] The user responds to the order confirmation with "yes" or "no". If the user answers "no", the device prompts them to re-enter the order details.

[0943] Step 6:

[0944] The server performs voice pattern analysis and verifies the user's identity by comparing it with existing user information.

[0945] Step 7:

[0946] Once the device has completed identity verification, it will present the user with several payment options, displaying a message such as, "Please choose from credit card, bank transfer, or deferred payment."

[0947] Step 8:

[0948] The user selects their preferred payment method. This selection is sent to the server via the terminal, and the payment process begins.

[0949] Step 9:

[0950] The server uses an emotion engine to analyze the user's voice tone and assess their emotional state. Based on this information, it adjusts its next response.

[0951] Step 10:

[0952] The server checks the user's past emotional patterns and purchase history, and suggests related products and special offers. It displays a message like, "We also recommend this product," through the user's device.

[0953] Step 11:

[0954] The server activates a fraud detection algorithm and issues real-time alerts if any unusual orders are detected. This notification is then sent to the responsible person, enabling a swift response.
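Steps 4 and 5 of the flow above form a small loop: read the order back, interpret the reply, and ask for the order again when the answer is "no". A minimal sketch follows, assuming English replies and a limit of three attempts; the word lists and the names `classify_reply` and `confirm_order` are hypothetical.

```python
from typing import Callable, Optional, Tuple

Order = Tuple[str, int]  # (item number, quantity)

# Hypothetical vocabularies; a real system would cover far more phrasings.
_YES = {"yes", "yeah", "correct", "that's right"}
_NO = {"no", "nope", "incorrect", "that's wrong"}


def classify_reply(reply: str) -> Optional[bool]:
    """True for an affirmative reply, False for a negative one, None if unclear."""
    normalized = reply.strip().lower().rstrip(".!")
    if normalized in _YES:
        return True
    if normalized in _NO:
        return False
    return None


def confirm_order(
    get_order: Callable[[], Order],
    ask: Callable[[str], str],
    max_attempts: int = 3,  # assumed limit
) -> Optional[Order]:
    """Repeat the read-back until the user confirms or attempts run out."""
    for _ in range(max_attempts):
        item_number, quantity = order = get_order()
        prompt = (
            f"Is it correct that the item number is {item_number} "
            f"and the quantity is {quantity}?"
        )
        if classify_reply(ask(prompt)) is True:
            return order
        # "no" or an unclear reply: have the user state the order again.
    return None
```

Here `get_order` stands for capturing and extracting a fresh order, and `ask` stands for speaking the prompt and returning the recognized reply.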

[0955] (Example 2)

[0956] Next, we will describe Example 2. In the following description, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[0957] In modern society, ordering systems using voice input are widespread, but their voice recognition accuracy and their ability to generate responses suited to the user's emotions are insufficient. Furthermore, from a security standpoint, they lack mechanisms to detect fraudulent orders. As a result, the user experience deteriorates and the reliability of the system is compromised.

[0958] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Example 2 is realized by the following means.

[0959] In this invention, the server includes control means for recognizing voice input with high accuracy, response generation means for generating and presenting responses based on emotional information, and anomaly monitoring means for detecting fraudulent orders. This enables high-accuracy recognition of the user's voice, allows responses suited to the user's emotions during order processing, and prevents fraudulent orders, thereby improving the user experience and the reliability of the system.

[0960] "Voice input" is a method for users to communicate information to a system using spoken language.

[0961] "High-precision recognition" refers to analyzing audio information, converting it into text or numbers, and minimizing errors.

[0962] "Control means" refers to devices or programs that have functions for inputting, analyzing, converting, and outputting various types of data.

[0963] "Order details" refers to specific purchase information, such as product numbers and quantities, extracted from the user's voice.

[0964] "Display means" refers to monitors and display devices used to present text and images to users.

[0965] "Authentication methods" refer to technologies and devices used to verify a user's personal information and identify them as that person.

[0966] A "payment management system" is a system that offers multiple payment methods and allows users to make payments using the method they choose.

[0967] A "suggestion method" refers to a device or program that has the function of suggesting products or services suitable for the user based on their past purchase history.

[0968] "Emotional evaluation methods" refer to technologies that analyze emotional elements from a user's voice or text data to determine their emotional state.

[0969] An "anomaly monitoring system" is a system that has the function of detecting patterns that differ from normal operation and warning of fraud or abnormalities.

[0970] A "response generation means" is a process or apparatus for constructing and presenting appropriate responses or messages based on analyzed data and emotional information.

[0971] The system for carrying out the present invention recognizes voice input and integrates order processing, user authentication, sentiment analysis, and anomaly monitoring based on that input. Specific embodiments are as follows.

[0972] First, the user uses a device to input their voice. This device can be a standard smartphone or tablet. When the user gives their order instructions by voice, this voice data is transmitted to the server via the internet through the device.

[0973] Next, the server uses speech recognition software (for example, a cloud-based API that provides speech recognition services) to convert the voice data into text data. A specific example is using a speech recognition service to convert the voice command "I want to order 3 iPhone 12s" into the text data "Product number iPhone 12, quantity 3".
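The "iPhone 12" example shows why extraction cannot simply take the first number in the text: the product name itself contains digits. One way to handle this, sketched below, is to match the utterance against a product catalog first and then look for the quantity only in the words that precede the matched name. The catalog contents, the name `extract_named_order`, the default quantity of 1, and the assumption that the quantity is spoken before the product name are all illustrative.

```python
import re
from typing import Optional, Sequence, Tuple


def extract_named_order(
    text: str,
    catalog: Sequence[str],
) -> Optional[Tuple[str, int]]:
    """Return (product name, quantity) for the first catalog product found."""
    lowered = text.lower()
    # Try longer names first so that "iPhone 12 Pro" is not cut to "iPhone 12".
    for name in sorted(catalog, key=len, reverse=True):
        index = lowered.find(name.lower())
        if index == -1:
            continue
        # Only digits before the product name can be the quantity, which
        # keeps the "12" in "iPhone 12" from being misread.
        numbers = re.findall(r"\d+", lowered[:index])
        quantity = int(numbers[-1]) if numbers else 1  # assumed default
        return name, quantity
    return None
```

For the utterance in this paragraph, the function would return the product "iPhone 12" with quantity 3, which can then be formatted as the text data shown above.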

[0974] The server then extracts order details such as product number and quantity from the converted text data. This information is sent to the terminal and repeated to the user via screen display and voice. The user reviews the information and, if necessary, re-enters it via voice input.

[0975] In the identity verification step, the server uses voice pattern matching technology to authenticate the user. Specifically, it analyzes the user's voice characteristics and matches them with pre-registered information to confirm their identity.

[0976] In addition, the server uses an emotion analysis engine to analyze the user's voice tone and speech content to determine the user's emotional state. Based on this emotional information, it generates responses that are empathetic to the user's mood, such as "How are you feeling today? Do you need any support?" This improves the user experience.

[0977] Furthermore, the server is equipped with an anomaly monitoring system that detects fraudulent and abnormal order patterns. When an anomaly occurs, it issues an alert and notifies the administrator to take appropriate action. In this way, the security of the entire system is maintained.

[0978] A possible example of a specific prompt message would include: "Convert the user's voice input to text and extract the order details. Analyze the user's emotional state and generate a response based on that."

[0979] This system enables efficient voice-based order processing and provides users with a comfortable and safe user experience.

[0980] The flow of the specific processing in Example 2 will be explained using Figure 13.

[0981] Step 1:

[0982] The user uses a terminal to place an order by voice. At this stage, the user's voice data is entered. The terminal receives this voice data and sends it to the server. This transmitted voice data becomes the basis for the next processing step.

[0983] Step 2:

[0984] The server converts the received audio data into text data using speech recognition software. This conversion process utilizes a cloud-based speech recognition API, which analyzes the speech to accurately convert it into text data. This converted text data is then used for subsequent data processing to extract order information.

[0985] Step 3:

[0986] The server performs text analysis to extract order information from the text data. This process involves applying natural language processing techniques to the text data to identify specific order details such as product numbers and quantities. The extracted order information is sent to the terminal, where the user is prompted to confirm it visually or audibly.

[0987] Step 4:

[0988] The user checks the order information displayed on the terminal or read back aloud. If the information is correct, the user sends a "Confirmation Complete" input back to the server via the terminal. This response triggers the order process to proceed.

[0989] Step 5:

[0990] The server verifies the user's identity using authentication methods. Specifically, it uses voice pattern matching technology to compare the user's voice with registered voice data. If this matching is successful, user authentication is complete. This authentication is a crucial step for secure payments and protection of personal information.

[0991] Step 6:

[0992] The terminal presents the user with payment method options, including credit cards, e-money, and bank transfers, visually displaying a variety of payment methods to the user. The user selects their preferred payment method through the terminal, and the selection is sent to the server, where the payment process begins.

[0993] Step 7:

[0994] The server uses an emotion analysis engine to analyze the user's emotional state from the voice data. This process utilizes a generative AI model to estimate the user's emotions from the tone and content of the voice. The estimated emotional information then becomes input data for the server to generate a response and provide an appropriate dialogue.

[0995] Step 8:

[0996] The server generates a response to present to the user, taking into account emotional and order information. This response is considerate of the user's emotional state and may include questions such as "Is there anything else I can help you with?" or suggestions for support. The generated response is sent to the terminal and presented to the user.

[0997] Step 9:

[0998] The server utilizes a fraud detection system to monitor unusual orders. This system identifies deviations from normal order patterns and issues warnings if fraud is suspected. This ensures consistent and secure operation of the system.

[0999] (Application Example 2)

[1000] Next, we will explain Application Example 2. In the following explanation, the data processing device 12 will be referred to as the "server" and the robot 414 as the "terminal".

[1001] With the advancement of information and communication technology, services utilizing voice recognition are becoming widespread. However, challenges remain, such as limited recognition accuracy when placing orders via voice input and inadequate handling of user dissatisfaction. Detecting fraudulent orders and offering diverse payment options also remain challenges. In particular, there is a need for flexible responses suited to users' emotions.

[1002] The specific processing performed by the specific processing unit 290 of the data processing device 12 in Application Example 2 is realized by the following means.

[1003] In this invention, the server includes a programming means for recognizing voice input with high accuracy, an emotion analysis means, and a means for analyzing the user's feelings from their speech and generating special offers based on individual inquiries. This improves the accuracy of voice recognition and enables flexible responses suited to the user's emotions, providing the user with a comfortable ordering experience.

[1004] "A programmatic means for recognizing voice input with high accuracy" refers to software technology that converts the voice spoken by the user into a digital signal and precisely extracts its content as text data.

[1005] "Programming means for extracting order information" refers to software technology that identifies the items and quantities of the items the user wishes to purchase from recognized voice data and compiles them into an order.

[1006] "Display means" refers to devices or software that visually display information extracted by the system to the user, enabling them to confirm and select information.

[1007] "Authentication methods" refer to technologies that analyze voice patterns to verify the user's identity and guarantee secure transactions.

[1008] A "payment processing means" is a processing means for completing the payment process according to the payment method selected by the user.

[1009] "Recommendation methods" refer to algorithms and technologies used to recommend appropriate products and services to users based on their past purchase history.

[1010] "Emotional analysis means" refers to technology that analyzes the emotions contained in a user's voice and identifies the user's mental state.

[1011] "Fraud detection measures" refer to algorithms and technologies used to detect fraudulent or abnormal order behavior and ensure security.

[1012] "A means of analyzing user utterances to understand their feelings and generating special offers based on individual inquiries" refers to a method of analyzing the content of user utterances and the emotions contained therein, and providing special offers and information tailored to the user.

[1013] The system realizing this invention recognizes user voice input with high accuracy and processes order information based on it. The server converts the voice data into text format using a speech recognition engine and extracts the order details. The Google Cloud Speech-to-Text API is used for speech recognition to ensure high accuracy.

[1014] The server extracts order information from the voice input, which is then repeated back to the user via the terminal for confirmation. This confirmation process utilizes the terminal's display, allowing the user to visually verify the information.

[1015] For identity verification, the server uses voice pattern matching technology to perform voice authentication. This ensures highly accurate user authentication. This process plays a role in preventing unauthorized access and manipulation.

[1016] Next, regarding payment processing, the server proposes multiple payment methods to the user, and the payment is processed according to the method selected by the user. The payment processing is required to be secure and smooth, and various payment APIs are applied.

[1017] In emotion analysis, the server uses IBM Watson Tone Analyzer to analyze the user's voice tone and estimate their emotional state. Based on these analysis results, services and special offers tailored to the user's emotions are generated, improving the user's experience satisfaction.

[1018] The fraud detection mechanism allows the server to immediately send a notification to the responsible person if it detects an abnormal order pattern. This improves the overall security of the system.

[1019] For example, if a user says, "I'm tired today, so I'd like to order a pizza," and sentiment analysis determines that the user is feeling "tired," then an offer such as "Would you like a smoothie with your pizza?" will be generated.

[1020] An example of a prompt message might be, "If the user is in a mood for relaxation, suggest a seasonal drink."

[1021] The flow of a specific process in Application Example 2 will be explained using Figure 14.

[1022] Step 1:

[1023] Acquiring voice input

[1024] The user places an order by voice into the terminal. This voice data is recorded on the terminal and sent to the server. The input is the user's voice, and the output is digital voice data. The terminal performs signal processing, cutting out silent portions and compressing the audio before transmission.
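The terminal-side signal processing in Step 1 could be sketched as follows. This is a minimal illustration, not the method of the disclosure: it assumes 16-bit PCM samples, a simple peak-amplitude threshold for detecting silence, and lossless zlib compression. The function names, the threshold of 500, and the frame size of 160 samples are all hypothetical.

```python
import struct
import zlib


def trim_silence(samples, threshold=500, frame_size=160):
    """Drop every frame whose peak amplitude stays below `threshold`."""
    kept = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if max(abs(s) for s in frame) >= threshold:
            kept.extend(frame)
    return kept


def pack_for_upload(samples):
    """Serialize 16-bit little-endian PCM samples and compress them."""
    raw = struct.pack(f"<{len(samples)}h", *samples)
    return zlib.compress(raw)


def unpack_upload(payload):
    """Server-side inverse of pack_for_upload."""
    raw = zlib.decompress(payload)
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))
```

A production terminal would more likely use a voice activity detector and a speech codec; the threshold and zlib are stand-ins chosen so that the sketch depends only on the standard library.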

[1025] Step 2:

[1026] Speech recognition processing

[1027] The server converts the received audio data into text using the Google Cloud Speech-to-Text API. The input is digital audio data, and the output is extracted text data. In this step, the server performs audio preprocessing, such as noise filtering, to improve recognition accuracy.
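As one illustration of the audio preprocessing mentioned in Step 2, the sketch below smooths the received samples with a moving-average filter before they would be passed to the speech recognition engine. The disclosure does not state which noise filter the server uses, so the filter choice, the function name `moving_average`, and the window size are assumptions. The call to the speech recognition API itself is not shown.

```python
def moving_average(samples, window=5):
    """Low-pass filter: replace each sample by the mean of its neighbourhood."""
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    return smoothed
```

A moving average attenuates rapid sample-to-sample fluctuations while leaving slowly varying content largely intact, which is why it serves here as a simple stand-in for noise filtering.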

[1028] Step 3:

[1029] Extraction of order information

[1030] The server extracts order information, such as product names and quantities, from the converted text. The input is text data, and the output is structured order information. Keyword matching and natural language processing techniques are used to separate relevant information within the text.
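The keyword matching described in Step 3 might look like the following sketch. It assumes English text, a small hypothetical product catalog, and quantities expressed either as digits or as a few number words placed directly before the product name. The names `CATALOG`, `NUMBER_WORDS`, and `extract_order` are illustrative only.

```python
import re

# Hypothetical catalog and number words, for illustration only.
CATALOG = {"pizza", "smoothie", "blender"}
NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3,
                "four": 4, "five": 5}


def extract_order(text):
    """Return a list of {"product", "quantity"} dicts found in the text."""
    tokens = re.findall(r"[a-z]+|\d+", text.lower())
    items = []
    for i, token in enumerate(tokens):
        # Accept a simple plural form such as "blenders".
        singular = token[:-1] if token.endswith("s") else token
        name = token if token in CATALOG else singular
        if name not in CATALOG:
            continue
        quantity = 1
        if i > 0:
            prev = tokens[i - 1]
            quantity = int(prev) if prev.isdigit() else NUMBER_WORDS.get(prev, 1)
        items.append({"product": name, "quantity": quantity})
    return items
```

For the utterance "I'm tired today, so I'd like to order a pizza", this sketch yields a single item with product "pizza" and quantity 1. A real implementation would need a language-specific tokenizer and more robust natural language processing.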

[1031] Step 4:

[1032] Identity verification through voice pattern matching

[1033] The server analyzes voice patterns and performs identity verification by matching them with pre-registered voices. The input is voice feature data, and the output is the authentication result. Here, voice feature extraction technology is used, and the authentication process is performed by a machine learning model.
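The matching in Step 4 can be illustrated by comparing a voice feature vector from the current utterance against the pre-registered one. The disclosure states that a machine learning model performs the authentication; the sketch below substitutes plain cosine similarity with a fixed threshold as a simplified stand-in. The feature vectors, the threshold of 0.85, and the function names are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify_speaker(enrolled, candidate, threshold=0.85):
    """Return the authentication result and the underlying similarity score."""
    score = cosine_similarity(enrolled, candidate)
    return {"authenticated": score >= threshold, "score": score}
```

Returning the score alongside the decision lets a caller log borderline cases or ask the user to repeat the utterance, although the disclosure does not describe such handling.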

[1034] Step 5:

[1035] Emotion analysis

[1036] The server uses IBM Watson Tone Analyzer to estimate the user's emotional state from their speech. The input is text data of the user's utterances, and the output is an emotional label and score. The server analyzes the nuances of the utterances and processes the data to infer the user's psychological state.
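To make the input and output of Step 5 concrete, the sketch below maps utterance text to an emotion label and a score. It does not call the tone analysis service named above; a tiny hypothetical word lexicon stands in for it, and the label names, the lexicon, and the scoring rule are all assumptions.

```python
import re
from collections import Counter

# Hypothetical lexicon mapping words to emotion labels.
EMOTION_LEXICON = {
    "tired": "tired", "exhausted": "tired", "sleepy": "tired",
    "happy": "joy", "great": "joy", "excited": "joy",
    "angry": "anger", "annoyed": "anger",
}


def analyze_emotion(text):
    """Return the most frequent emotion label and its share of all hits."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = Counter(EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON)
    if not hits:
        return {"label": "neutral", "score": 0.0}
    label, count = hits.most_common(1)[0]
    return {"label": label, "score": count / sum(hits.values())}
```

The score here is simply the fraction of emotion words that carry the winning label, which is far cruder than a trained tone model but keeps the same label-and-score output shape.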

[1037] Step 6:

[1038] Confirmation display and option presentation

[1039] The server extracts order information and sends it to the terminal, which then displays it on its screen for the user to confirm. The input is the order information, and the output is the display screen. The terminal uses a caching mechanism to optimize response speed.

[1040] Step 7:

[1041] Generating and presenting special offers

[1042] Based on the sentiment analysis results, the server generates a special offer tailored to the user's mood and sends it to the terminal. The input is a sentiment label, and the output is special offer data. The server uses a generative AI model to create a prompt message and notifies the user of relevant information.
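The prompt creation in Step 7 could be sketched as below. The function builds the instruction text that would be passed to a generative AI model; the model call itself is omitted. The function name `build_offer_prompt`, the prompt wording, the order item format, and the minimum score of 0.5 are assumptions made for illustration.

```python
def build_offer_prompt(emotion_label, score, order_items, min_score=0.5):
    """Build a prompt for the generative model, or None if no offer applies."""
    if emotion_label == "neutral" or score < min_score:
        return None
    products = ", ".join(item["product"] for item in order_items) or "nothing yet"
    return (
        f"The user sounds {emotion_label} (confidence {score:.2f}) "
        f"and is ordering: {products}. "
        "Suggest one complementary product in a single friendly sentence."
    )
```

Returning `None` for neutral or low-confidence results keeps the system from pushing offers when the emotion estimate is weak, which is a design choice of this sketch rather than something the disclosure specifies.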

[1043] Step 8:

[1044] Payment processing

[1045] The server processes the payment based on the payment method selected by the user. The input is the selected payment method, and the output is a payment completion notification. The server executes transactions using encryption technology so that they are secure and fast.
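Selecting among multiple payment methods, as in Step 8, is commonly implemented as a dispatch table. The sketch below assumes two hypothetical methods and placeholder handlers that return a result record instead of calling a real payment API. Encryption of the transaction is outside the scope of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentResult:
    method: str
    amount: int
    status: str


def _charge_credit_card(amount):
    # Placeholder: a real handler would call a card payment API here.
    return PaymentResult("credit_card", amount, "completed")


def _charge_cash_on_delivery(amount):
    # Cash on delivery is only settled when the parcel arrives.
    return PaymentResult("cash_on_delivery", amount, "pending_delivery")


# Hypothetical registry of supported payment methods.
PAYMENT_HANDLERS = {
    "credit_card": _charge_credit_card,
    "cash_on_delivery": _charge_cash_on_delivery,
}


def available_methods():
    """Methods the server can propose to the user."""
    return sorted(PAYMENT_HANDLERS)


def process_payment(method, amount):
    """Run the handler for the method the user selected."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    handler = PAYMENT_HANDLERS.get(method)
    if handler is None:
        raise ValueError(f"unsupported payment method: {method}")
    return handler(amount)
```

Adding a new payment method then only requires registering one more handler in `PAYMENT_HANDLERS`.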

[1046] Step 9:

[1047] Fraud detection and notification

[1048] The server employs fraud detection measures throughout the entire process and immediately sends an alert to the responsible person if an abnormal order pattern is detected. The input is order pattern data, and the output is a warning notification. The server monitors patterns using machine learning algorithms and performs actions to detect anomalies early.
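A very simple form of the anomaly monitoring in Step 9 is shown below. The disclosure refers to machine learning algorithms; this sketch instead uses a z-score on past order amounts as an easily inspected stand-in. The threshold of 3.0, the function names, and the `notify` callback are assumptions.

```python
from statistics import mean, pstdev


def is_anomalous(history, new_amount, z_threshold=3.0):
    """Flag an amount that deviates strongly from the user's past orders."""
    if len(history) < 2:
        return False  # Not enough data to judge.
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return new_amount != mu
    return abs(new_amount - mu) / sigma > z_threshold


def check_order(history, new_amount, notify):
    """Call notify(message) when the order looks abnormal."""
    if is_anomalous(history, new_amount):
        notify(f"Abnormal order amount: {new_amount}")
        return True
    return False
```

In use, `notify` would be whatever channel reaches the responsible person, for example a function that sends an e-mail or posts to an operations dashboard.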

[1049] The specific processing unit 290 transmits the result of the specific processing to the robot 414. In the robot 414, the control unit 46A causes the speaker 240 and the controlled object 443 to output the result of the specific processing. The microphone 238 acquires audio indicating user input for the result of the specific processing. The control unit 46A transmits the audio data indicating user input acquired by the microphone 238 to the data processing device 12. In the data processing device 12, the specific processing unit 290 acquires the audio data.

[1050] Data generation model 58 is a type of so-called generative AI (Artificial Intelligence). Examples of data generation model 58 include generative AI such as ChatGPT (Internet search <URL: https://openai.com/blog/chatgpt>) and Gemini (Internet search <URL: https://gemini.google.com/?hl=ja>). The data generation model 58 is obtained by performing deep learning on a neural network. Prompts containing instructions, together with inference data such as audio data representing speech, text data representing text, and image data representing images, are input to the data generation model 58. The data generation model 58 performs inference on the input inference data according to the instructions indicated by the prompts, and outputs the inference results in data formats such as audio data and text data. Here, inference refers to, for example, analysis, classification, prediction, and/or summarization.

[1051] In the above embodiment, an example was given in which the specific processing is performed by the data processing device 12, but the technology of this disclosure is not limited thereto, and the specific processing may also be performed by the robot 414.

[1052] Furthermore, the emotion identification model 59, acting as an emotion engine, may determine the user's emotion according to a specific mapping. Specifically, the emotion identification model 59 may determine the user's emotion according to an emotion map (see Figure 9), which is one such mapping. Similarly, the emotion identification model 59 may also determine the robot's emotion, and the specific processing unit 290 may perform the specific processing using the robot's emotion.

[1053] Figure 9 shows an emotion map 400 in which multiple emotions are mapped. In the emotion map 400, emotions are arranged in concentric circles radiating from the center. The closer to the center of the concentric circles, the more primitive the emotions located there. Toward the outside of the concentric circles, emotions representing states and actions arising from mental states are located. Emotion is a concept that includes feelings and mental states. On the left side of the concentric circles, emotions that are generally generated from reactions occurring in the brain are located. On the right side of the concentric circles, emotions that are generally induced by situational judgment are located. Above and below the concentric circles, emotions that are generally generated from reactions occurring in the brain and induced by situational judgment are located. In addition, the emotion of "pleasure" is located on the upper side of the concentric circles, and the emotion of "displeasure" is located on the lower side. Thus, in the emotion map 400, multiple emotions are mapped based on the structure in which emotions arise, and emotions that are likely to occur simultaneously are mapped close together.

[1054] These emotions are distributed at the 3 o'clock position on the emotion map 400, and usually fluctuate between feelings of security and anxiety. In the right half of the emotion map 400, situational awareness takes precedence over internal feelings, resulting in a calm impression.

[1055] The inside of the emotion map 400 represents inner thoughts, while the outside represents actions. Therefore, the further toward the outside of the emotion map 400 an emotion is located, the more visible (expressed in actions) it becomes.

[1056] Here, human emotions are based on various balances, such as posture and blood sugar levels. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. Similarly, in robots, cars, motorcycles, etc., emotions can be created based on various balances, such as posture and battery level. When these balances deviate from the ideal, it results in discomfort, and when they approach the ideal, it results in pleasure. The emotion map can be generated, for example, based on Dr. Mitsuyoshi's emotion map (Research on a system for analyzing brain physiological signals of speech emotion recognition and emotion, Tokushima University, doctoral dissertation: https://ci.nii.ac.jp/naid/500000375379). The left half of the emotion map contains emotions belonging to a region called "reaction," where sensation is dominant. The right half of the emotion map contains emotions belonging to a region called "situation," where situational awareness is dominant.

[1057] The emotion map defines two emotions that promote learning. One is the negative emotion located around the middle between "repentance" and "reflection" on the situation side. In other words, it arises when the robot experiences negative feelings such as "I never want to feel this way again" or "I don't want to be scolded again." The other is the positive emotion located around "desire" on the reaction side. In other words, it arises when the robot has positive feelings such as "I want more" or "I want to know more."

[1058] The emotion identification model 59 inputs user input into a pre-trained neural network, obtains emotion values representing each emotion shown in the emotion map 400, and determines the user's emotion. This neural network is pre-trained based on multiple training data sets, which are combinations of user input and emotion values representing each emotion shown in the emotion map 400. Furthermore, this neural network is trained so that emotions located close together have similar values, as shown in the emotion map 900 in Figure 10. Figure 10 shows an example where multiple emotions such as "reassured," "calm," and "confident" have similar emotion values.
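The final step of this determination, turning per-emotion values into a decision, can be sketched as follows. The neural network itself is not shown. The emotion values in the usage note are invented for illustration, and the function names and the margin of 0.1 are assumptions.

```python
def determine_emotion(emotion_values):
    """Return the emotion with the highest value."""
    return max(emotion_values, key=emotion_values.get)


def dominant_emotions(emotion_values, margin=0.1):
    """Return all emotions whose value lies within `margin` of the top value."""
    top = max(emotion_values.values())
    return sorted(name for name, value in emotion_values.items()
                  if top - value <= margin)
```

With hypothetical values such as `{"reassured": 0.82, "calm": 0.80, "confident": 0.78, "anxious": 0.10}`, `determine_emotion` selects "reassured", while `dominant_emotions` returns "calm", "confident", and "reassured" together, reflecting that neighbouring emotions on the map receive similar values.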

[1059] The above description primarily focuses on the functions of the data processing device 12 in relation to this disclosure. However, the system related to this disclosure is not necessarily implemented on a server. The system related to this disclosure may be implemented as a general information processing system. This disclosure may be implemented, for example, as a software program that runs on a personal computer or as an application that runs on a smartphone. The method related to this disclosure may be provided to users in SaaS (Software as a Service) format.

[1060] In the above embodiment, an example was given in which a specific process is performed by a single computer 22. However, the technology of this disclosure is not limited thereto, and a distributed processing of the specific process may be performed by multiple computers, including computer 22. For example, a data generation model 58 may be provided in an external device of the data processing device 12, and the external device may generate data according to the input data.

[1061] In the above embodiment, an example was given in which the specific processing program 56 is stored in the storage 32, but the technology of this disclosure is not limited thereto. For example, the specific processing program 56 may be stored in a portable, computer-readable, non-transitory storage medium such as a USB (Universal Serial Bus) memory. The specific processing program 56 stored in the non-transitory storage medium is installed in the computer 22 of the data processing device 12. The processor 28 executes specific processing according to the specific processing program 56.

[1062] Alternatively, the specific processing program 56 may be stored in a storage device such as a server connected to the data processing device 12 via the network 54, and the specific processing program 56 may be downloaded and installed on the computer 22 in response to a request from the data processing device 12.

[1063] Furthermore, it is not necessary to store the entirety of the specific processing program 56 in a storage device such as a server connected to the data processing device 12 via the network 54, or to store the entirety of the specific processing program 56 in the storage 32; it is acceptable to store only a portion of the specific processing program 56.

[1064] The following types of processors can be used as hardware resources to perform specific processing. One example is a CPU, which is a general-purpose processor that functions as a hardware resource performing specific processing by executing software, i.e., a program. Other examples include dedicated electrical circuits, such as FPGAs (Field-Programmable Gate Arrays), PLDs (Programmable Logic Devices), or ASICs (Application Specific Integrated Circuits), which are processors with circuit configurations specifically designed to perform specific processing. All of these processors have built-in or connected memory, and all of them perform specific processing by using the memory.

[1065] The hardware resource that performs a specific process may consist of one of these various processors, or it may consist of a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). Alternatively, the hardware resource that performs a specific process may consist of a single processor.

[1066] Examples of configurations using a single processor include, firstly, a configuration in which one or more CPUs and software are combined to form a single processor, and this processor functions as a hardware resource that performs a specific process. Secondly, there is a configuration using a processor that realizes the functions of the entire system, including multiple hardware resources that perform a specific process, on a single IC chip, as exemplified by SoCs (System-on-a-chip). In this way, a specific process is realized using one or more of the above types of processors as hardware resources.

[1067] Furthermore, the hardware structure of these various processors can more specifically utilize electrical circuits that combine circuit elements such as semiconductor devices. Also, the specific processing described above is merely an example. Therefore, it goes without saying that unnecessary steps can be deleted, new steps added, or the processing order rearranged, as long as it does not deviate from the main purpose.

[1068] The descriptions and illustrations presented above are detailed explanations of the technical aspects of this disclosure and are merely examples of the technical aspects. For example, the above descriptions of the structure, function, operation, and effect are examples of the structure, function, operation, and effect of the technical aspects of this disclosure. Therefore, it goes without saying that unnecessary parts may be deleted, new elements added, or elements replaced in the descriptions and illustrations presented above, as long as this does not deviate from the essence of the technical aspects of this disclosure. Furthermore, in order to avoid confusion and facilitate understanding of the technical aspects of this disclosure, explanations of common technical knowledge and the like that do not require special explanation to enable the implementation of the technical aspects of this disclosure have been omitted from the descriptions and illustrations presented above.

[1069] All documents, patent applications, and technical standards described herein are incorporated by reference to the same extent as if each individual document, patent application, and technical standard were specifically and individually noted to be incorporated by reference.

[1070] The following is further disclosed regarding the embodiments described above.

[1071] (Claim 1)

[1072] A programmatic means for recognizing voice input with high accuracy,

[1073] A programmatic means for extracting order information from recognized voice data,

[1074] A display means for repeating the extracted order information to the customer,

[1075] An authentication method that verifies identity by matching voice patterns,

[1076] A payment processing method that allows users to select from multiple payment methods,

[1077] A suggestion method that recommends products based on purchase history,

[1078] A sentiment analysis tool for analyzing user emotions,

[1079] A fraud detection method for detecting fraudulent orders,

[1080] A system including the above.

[1081] (Claim 2)

[1082] The system according to claim 1, comprising a programming means for converting recognized speech data into text.

[1083] (Claim 3)

[1084] The system according to claim 1, wherein the emotion analysis means includes a computer program for detecting the user's tone of voice and generating a response.

[1085] "Example 1"

[1086] (Claim 1)

[1087] A processing means for acquiring voice input with high accuracy,

[1088] A program that converts acquired audio data into text data,

[1089] Information processing means for extracting order information from converted audio data,

[1090] An output device for confirming the extracted order information with the customer via voice,

[1091] An identification method that uses voice data to verify identity,

[1092] A payment selection method that allows users to choose from multiple different payment methods,

[1093] A product presentation method that suggests related products based on past purchase history,

[1094] A means of analyzing user emotions,

[1095] An anomaly detection means that detects abnormal orders and issues a warning,

[1096] A system including the above.

[1097] (Claim 2)

[1098] The system according to claim 1, comprising a programming means for converting acquired speech data into text using a speech recognition engine.

[1099] (Claim 3)

[1100] The system according to claim 1, further comprising a computational program that analyzes the user's voice tone using emotion analysis means and generates a response based on the analysis results.

[1101] "Application Example 1"

[1102] (Claim 1)

[1103] A processing means for recognizing voice input with high accuracy,

[1104] A processing means for extracting order-related data from recognized voice information,

[1105] A display means that repeats the extracted order data to the user,

[1106] An authentication method that verifies identity by matching voice patterns,

[1107] A payment processing method that allows users to select from multiple payment methods,

[1108] A suggestion method that recommends products based on purchase history,

[1109] A means of analyzing the emotions of users,

[1110] A fraud detection method for detecting fraudulent orders,

[1111] A support system that quickly identifies products based on voice recognition and confirms them with the user,

[1112] A system including the above.

[1113] (Claim 2)

[1114] The system according to claim 1, further comprising processing means for converting recognized speech information into text.

[1115] (Claim 3)

[1116] The system according to claim 1, wherein the emotion analysis means includes a computer program for detecting the user's tone of voice and generating a response.

[1117] "Example 2 of combining an emotion engine"

[1118] (Claim 1)

[1119] A control means for recognizing voice input with high accuracy,

[1120] A control means for extracting order details from recognized voice information,

[1121] A display means that repeats the extracted order details to the user,

[1122] An authentication method that performs personal verification by matching voice patterns,

[1123] A payment management system that allows users to select multiple payment methods,

[1124] A suggestion method that recommends products based on past purchase history,

[1125] A means of emotional evaluation to analyze the emotions of users,

[1126] An anomaly monitoring mechanism for detecting fraudulent orders,

[1127] A response generation means that generates and presents a response based on emotional information,

[1128] A system including the above.

[1129] (Claim 2)

[1130] The system according to claim 1, further comprising control means for converting recognized voice information into a string.

[1131] (Claim 3)

[1132] The system according to claim 1, wherein the emotion evaluation means includes information processing means for detecting the user's speech tone and generating a response.

[1133] "Application example 2 when combining with an emotional engine"

[1134] (Claim 1)

[1135] A programmatic means for recognizing voice input with high accuracy,

[1136] A programmatic means for extracting order information from recognized voice data,

[1137] A display means for presenting extracted order information to the user,

[1138] Authentication methods that use voice patterns for verification,

[1139] A payment processing method that allows users to select multiple payment methods,

[1140] A suggestion method that recommends items based on purchase history,

[1141] An emotional analysis tool for analyzing the mental state of users,

[1142] Fraud detection means to detect fraudulent purchases,

[1143] A means of analyzing user emotions from their speech and generating special offers based on individual inquiries,

[1144] A system including the above.

[1145] (Claim 2)

[1146] The system according to claim 1, comprising a programming means for converting recognized voice data into a string.

[1147] (Claim 3)

[1148] The system according to claim 1, wherein the emotion analysis means includes a calculation program for detecting the user's speech tone and generating a response.

[Explanation of Symbols]

[1149] 10, 210, 310, 410 Data processing system
12 Data processing device
14 Smart device
214 Smart glasses
314 Headset-type terminal
414 Robot

Claims

1. A programmatic means for recognizing voice input with high accuracy,
A programmatic means for extracting order information from recognized voice data,
A display means for repeating the extracted order information to the customer,
An authentication method that verifies identity by matching voice patterns,
A payment processing method that allows users to select from multiple payment methods,
A suggestion method that recommends products based on purchase history,
A sentiment analysis tool for analyzing user emotions,
A fraud detection method for detecting fraudulent orders,
A system including the above.

2. The system according to claim 1, comprising a programming means for converting recognized speech data into text.

3. The system according to claim 1, wherein the emotion analysis means includes a computer program for detecting the user's tone of voice and generating a response.